[00:00:01] hmm [00:00:05] nope [00:00:10] !log reedy@deploy1001 Finished scap: log event fixes (duration: 33m 18s) [00:00:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:00:15] 6 seconds over [00:00:21] MatmaRex: done [00:00:29] ha [00:00:58] Reedy: thanks. everything looks fine [00:31:22] 10Operations, 10Discovery, 10Traffic, 10Wikidata, and 2 others: Consider switching to HTTPS for Wikidata query service links - https://phabricator.wikimedia.org/T153563 (10Smalyshev) 05Open>03declined I don't think it's happening. If we decide otherwise in the future, we can reopen, but for now I don't... [01:22:48] (03PS3) 1020after4: Add phabricator-antivandalism extension to the library path [puppet] - 10https://gerrit.wikimedia.org/r/445329 (https://phabricator.wikimedia.org/T199741) [01:23:07] (03PS4) 1020after4: Add phabricator-antivandalism extension to the library path [puppet] - 10https://gerrit.wikimedia.org/r/445329 (https://phabricator.wikimedia.org/T199741) [01:23:12] (03CR) 1020after4: [C: 031] Add phabricator-antivandalism extension to the library path [puppet] - 10https://gerrit.wikimedia.org/r/445329 (https://phabricator.wikimedia.org/T199741) (owner: 1020after4) [02:28:37] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.12) (duration: 08m 35s) [02:28:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:46:01] 10Operations, 10Discovery, 10WMDE-Tech-Communication, 10Wikidata, and 2 others: announce breaking change: http > https for entities in rdf - https://phabricator.wikimedia.org/T154015 (10Smalyshev) [02:47:17] 10Operations, 10Performance-Team, 10vm-requests: Increase webperf1002/webperf2002 space from 50GB to 500 GB (Ganeti) - https://phabricator.wikimedia.org/T199853 (10Krinkle) [02:47:54] 10Operations, 10Performance-Team, 10vm-requests: Increase webperf1002/webperf2002 space from 50GB to 500 GB (Ganeti) - https://phabricator.wikimedia.org/T199853 (10Krinkle) [02:47:56] 
10Operations, 10Performance-Team, 10monitoring, 10Patch-For-Review: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837 (10Krinkle) [02:55:19] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.13) (duration: 09m 43s) [02:55:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:05:46] !log l10nupdate@deploy1001 ResourceLoader cache refresh completed at Wed Jul 18 03:05:46 UTC 2018 (duration 10m 27s) [03:05:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:26:05] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 908.00 seconds [04:01:11] (03CR) 10Krinkle: "If this also runs in beta, might be worth cherry-picking there." [puppet] - 10https://gerrit.wikimedia.org/r/417233 (https://phabricator.wikimedia.org/T187765) (owner: 10Gilles) [04:02:19] (03PS1) 10BBlack: foundationwiki: 302s for www and m as well [puppet] - 10https://gerrit.wikimedia.org/r/446521 (https://phabricator.wikimedia.org/T188776) [04:03:29] (03PS3) 10Krinkle: Serve WebP variants for the hottest thumbnails [puppet] - 10https://gerrit.wikimedia.org/r/434055 (https://phabricator.wikimedia.org/T27611) (owner: 10Gilles) [04:03:52] (03CR) 10BBlack: [C: 032] foundationwiki: 302s for www and m as well [puppet] - 10https://gerrit.wikimedia.org/r/446521 (https://phabricator.wikimedia.org/T188776) (owner: 10BBlack) [04:04:57] (03CR) 10Krinkle: "Cherry-picked to deployment-puppetmaster03." [puppet] - 10https://gerrit.wikimedia.org/r/434055 (https://phabricator.wikimedia.org/T27611) (owner: 10Gilles) [04:07:07] bblack: Hm.. m-dot not in the regex? 
[04:07:25] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 298.84 seconds [04:10:26] Krinkle: I'm kind of assuming the automattic site won't have an m-dot, as that's a pretty mediawiki-specific pattern [04:10:55] bblack: Right, so m-dot would unconditionally redirect to the wiki? [04:10:58] I don't know, we can ask for more feedback on that tomorrow on the ticket perhaps [04:11:07] I just meant that the commit msg says to 302 m-dot. [04:11:29] oh, right, I updated the rest of the commit text, but didn't change the subject line [04:11:39] bad subject line, from before I pulled that part back out [04:11:46] kk [04:15:29] 10Operations, 10Performance-Team, 10monitoring, 10Patch-For-Review: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837 (10Krinkle) [04:15:42] 10Operations, 10Performance-Team, 10monitoring, 10Patch-For-Review: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837 (10Krinkle) [04:56:32] !log Starting s1 failover pre steps - https://phabricator.wikimedia.org/T197069 [04:56:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:08:39] (03PS5) 10Marostegui: mariadb: Promote db1067 to s1 master [puppet] - 10https://gerrit.wikimedia.org/r/445354 (https://phabricator.wikimedia.org/T197069) [05:09:12] (03CR) 10Marostegui: mariadb: Promote db1067 to s1 master [puppet] - 10https://gerrit.wikimedia.org/r/445354 (https://phabricator.wikimedia.org/T197069) (owner: 10Marostegui) [05:09:26] !log mwscript deleteEqualMessages.php --wiki test2wiki (deleted 2 pages) - T45917 [05:09:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:09:30] T45917: Delete all redundant "MediaWiki" pages for system messages - https://phabricator.wikimedia.org/T45917 [05:09:32] !log mwscript deleteEqualMessages.php --wiki metawiki (deleted 1 page) - T45917 [05:09:35] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:09:59] (03CR) 10Marostegui: [C: 032] mariadb: Promote db1067 to s1 master [puppet] - 10https://gerrit.wikimedia.org/r/445354 (https://phabricator.wikimedia.org/T197069) (owner: 10Marostegui) [05:10:34] (03PS4) 10Marostegui: db-eqiad.php: Set up s1 on read only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/445369 (https://phabricator.wikimedia.org/T197069) [05:10:38] (03CR) 10Marostegui: db-eqiad.php: Set up s1 on read only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/445369 (https://phabricator.wikimedia.org/T197069) (owner: 10Marostegui) [05:11:02] (03CR) 10Marostegui: db-eqiad.php: Promote db1067 to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/445371 (https://phabricator.wikimedia.org/T197069) (owner: 10Marostegui) [05:11:07] (03PS3) 10Marostegui: db-eqiad.php: Promote db1067 to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/445371 (https://phabricator.wikimedia.org/T197069) [05:11:13] !log mwscript deleteEqualMessages.php --wiki frwiki (5 pages) - T45917 [05:11:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:11:27] !log mwscript deleteEqualMessages.php --wiki ruwiki (4 pages) - T45917 [05:11:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:11:41] !log mwscript deleteEqualMessages.php --wiki dewiki (1 page) - T45917 [05:11:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:13:41] gtid replication control seems to be working fine [05:16:02] :) [05:26:27] (03CR) 10Jcrespo: [C: 031] db-eqiad.php: Set up s1 on read only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/445369 (https://phabricator.wikimedia.org/T197069) (owner: 10Marostegui) [05:27:48] (03PS1) 10KartikMistry: apertium-kaz-tat: Fix Build-Dep [debs/contenttranslation/apertium-kaz-tat] - 10https://gerrit.wikimedia.org/r/446525 [05:48:07] We are going to take over deploy1001 for the s1 failover, if 
something needs to be deployed please coordinate with us [05:48:32] I am going to merge but NOT deploy the read_only change so we are ready for when the time comes [05:49:09] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Set up s1 on read only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/445369 (https://phabricator.wikimedia.org/T197069) (owner: 10Marostegui) [05:50:47] (03Merged) 10jenkins-bot: db-eqiad.php: Set up s1 on read only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/445369 (https://phabricator.wikimedia.org/T197069) (owner: 10Marostegui) [05:51:01] (03CR) 10jenkins-bot: db-eqiad.php: Set up s1 on read only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/445369 (https://phabricator.wikimedia.org/T197069) (owner: 10Marostegui) [05:51:40] (03PS1) 10Marostegui: Revert "db-eqiad.php: Set up s1 on read only" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446527 [05:51:50] (03PS4) 10Marostegui: db-eqiad.php: Promote db1067 to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/445371 (https://phabricator.wikimedia.org/T197069) [06:00:04] jynus and marostegui: That opportune time is upon us again. Time for a Database Maintenance deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180718T0600). [06:00:05] jynus: ready? 
[06:00:16] yes [06:00:20] !log Starting s1 failover from db1052 to db1067 - T197069 [06:00:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:00:24] T197069: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069 [06:00:28] deploying [06:00:57] I can still edit [06:01:01] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Promote db1067 to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/445371 (https://phabricator.wikimedia.org/T197069) (owner: 10Marostegui) [06:01:07] still deploying [06:01:11] I know [06:01:13] :-) [06:01:25] just updating status :-) [06:01:31] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Set s1 on ready only for maintenance T197069 (duration: 01m 08s) [06:01:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:01:34] read only set on mysql too [06:01:44] I confirm it is up [06:01:52] heartbeat killed [06:02:15] confirm positions [06:02:29] db1052-bin.005945 [06:02:34] 479414160 [06:02:36] up to date [06:02:37] yep [06:02:55] restarting puppet [06:03:09] (03Merged) 10jenkins-bot: db-eqiad.php: Promote db1067 to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/445371 (https://phabricator.wikimedia.org/T197069) (owner: 10Marostegui) [06:03:20] (03PS2) 10Marostegui: Revert "db-eqiad.php: Set up s1 on read only" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446527 [06:03:39] (03CR) 10jenkins-bot: db-eqiad.php: Promote db1067 to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/445371 (https://phabricator.wikimedia.org/T197069) (owner: 10Marostegui) [06:03:43] heartbeat started [06:03:43] I can see heartbeat working as expected [06:03:50] @67 [06:04:19] deploying changes [06:04:26] (03PS1) 10Tim Starling: Make "sql wikishared" work again [puppet] - 10https://gerrit.wikimedia.org/r/446530 (https://phabricator.wikimedia.org/T199316) [06:04:42] did you set 67 read_only 0? 
[06:04:45] can be done at the same time [06:04:46] not yet [06:04:51] doing it now [06:04:58] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Set db1067 as master T197069 (duration: 00m 53s) [06:05:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:05:09] why u do this 2 me? [06:05:28] mysql read only off on db1067 [06:05:33] removing mediawiki read only now [06:05:46] (03CR) 10Tim Starling: [C: 032] "Already tested on deploy1001" [puppet] - 10https://gerrit.wikimedia.org/r/446530 (https://phabricator.wikimedia.org/T199316) (owner: 10Tim Starling) [06:05:48] :( read only, Story of my life. [06:05:49] (03CR) 10Marostegui: [V: 032 C: 032] Revert "db-eqiad.php: Set up s1 on read only" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446527 (owner: 10Marostegui) [06:06:34] still cannot edit [06:06:44] Deploying [06:06:51] (I know it has not been deployed yet) [06:07:14] (03CR) 10Tim Starling: "Hold off on deploy due to failover test" [puppet] - 10https://gerrit.wikimedia.org/r/446530 (https://phabricator.wikimedia.org/T199316) (owner: 10Tim Starling) [06:07:17] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: read only OFF after failover T197069 (duration: 00m 53s) [06:07:20] done! 
[06:07:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:07:21] T197069: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069 [06:07:23] let's test [06:07:35] I can edit on enwiki [06:08:05] yep, we are back [06:08:12] log it manually [06:08:18] and I will check rcs, and logs [06:08:19] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Set up s1 on read only" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446527 (owner: 10Marostegui) [06:08:39] !log s1 failover finished T197069 [06:08:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:08:55] interestingly errors only happened at 6:00 [06:09:03] Going to do the cleanup stuff [06:09:04] I guess edits that were caught in the middle [06:09:25] if anyone sees any issue editing on enwiki, speak up! [06:11:47] That seemed fairly painless :) [06:14:48] well, in my opinion, it could be way less painless if we got rid of all deployments [06:15:20] hopefully in the future just being a few seconds [06:16:33] marostegui: tell me when you are done with the topology changes/tendril (or I can do it) [06:16:49] jynus: starting slave on db1052 now :) [06:16:56] cool [06:17:01] you can go ahead [06:17:10] I will keep doing all the semi-sync and gtid related tasks [06:17:11] I will keep an eye on icinga [06:17:22] as we may miss issues there due to downtime [06:17:28] yeah [06:17:32] downtime is till 8:55 [06:17:51] especially on replication to cloud/dbstore [06:18:04] or mariadb 10.0 [06:18:09] I think it is catching up now [06:18:13] yeah [06:20:22] 17 replication errors on mediawiki at 6:18 [06:21:02] could be the stop/start slave for the semi sync enablement I am doing now [06:21:04] not sure why [06:21:06] ah [06:21:07] ok [06:21:19] that would explain it [06:21:24] Yep [06:21:25] I am done now [06:22:55] you upgrading tendril then? 
[06:23:09] I can do that, yes [06:23:18] both dbs [06:23:26] ok, I will jump those two steps and keep going with the rest [06:23:27] I checked those before for correctness [06:23:28] thanks [06:23:36] (03PS2) 10Marostegui: wmnet: Update s1-master alias [dns] - 10https://gerrit.wikimedia.org/r/445363 (https://phabricator.wikimedia.org/T197069) [06:24:02] (03CR) 10Marostegui: wmnet: Update s1-master alias [dns] - 10https://gerrit.wikimedia.org/r/445363 (https://phabricator.wikimedia.org/T197069) (owner: 10Marostegui) [06:24:05] (03CR) 10Marostegui: [C: 032] wmnet: Update s1-master alias [dns] - 10https://gerrit.wikimedia.org/r/445363 (https://phabricator.wikimedia.org/T197069) (owner: 10Marostegui) [06:24:34] running puppet on prometheus [06:25:36] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 237, down: 1, dormant: 0, excluded: 0, unused: 0 [06:26:22] that is not us, could be vendor network maintenance^ [06:26:45] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 241, down: 0, dormant: 0, excluded: 0, unused: 0 [06:29:40] dns alias in place [06:30:21] I just realised that we also moved db1118 smoothly (mysql 8.0!) 
[06:30:39] sure, because we don't trust GTID :-) [06:30:50] yeah, but it went fine XD [06:30:53] I wonder how you did it with no replication error [06:30:55] from 10 to 10.1 master [06:31:02] (on any other host) [06:33:57] I checked, and gtid is enabled on all hosts but 67, 1118, dbstore1002 and wikireplicas [06:34:04] yep [06:34:07] I just finished doing it [06:34:27] I will also check semisync status [06:34:56] I think I did all the slaves, but worth double checking yes [06:36:02] Rpl_semi_sync_master_clients 8 on 67 [06:37:18] it is not enabled on 1099 and db1105; on the other side, I am not sure if it should be enabled on 52 and 1118 [06:37:48] yeah, probably not needed in db1052 or db1118 [06:40:45] I am more interested in the first 2, should I enable it on them? [06:40:56] Yeah, I think so (or I can do it) [06:42:20] I will do it [06:42:27] tendril tree for s1 looks good - thank you! [06:42:36] meanwhile, tell me what you think we should do with the other 2 [06:42:48] for 1052 and db1118? [06:42:50] yes [06:42:57] they are depooled I guess [06:43:04] I would not enable it there [06:43:07] but they may alert of lag [06:43:14] so we may want to disable such alerts [06:43:29] I think db1118 has notifications disabled [06:43:31] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team: Rack/cable/configure asw2-b-eqiad switch stack - https://phabricator.wikimedia.org/T183585 (10Marostegui) [06:43:35] And for db1052 I was going to do it too [06:44:04] +1 [06:44:05] db1118 has notifications disabled - confirmed [06:44:31] I will work with db1052 and will also disable its notifications [06:45:23] !log enable semi-sync on db1099:3311 and db1105:3311 [06:45:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:46:23] for some strange reason, I think the plugin was not even enabled on 1105 [06:47:05] and on db1099? 
[06:47:22] (03PS1) 10Marostegui: db1052: Disable notifications, upgrade socket [puppet] - 10https://gerrit.wikimedia.org/r/446533 (https://phabricator.wikimedia.org/T199861) [06:47:30] installed but off [06:48:00] I am also going to disable the slave on db1067 [06:48:19] even if it is technically not replicating from anywhere right now [06:48:34] yeah, but no need to keep it enabled [06:49:05] so there are 10 clients right now [06:49:24] (03CR) 10Muehlenhoff: [C: 031] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/446318 (https://phabricator.wikimedia.org/T198756) (owner: 10Filippo Giunchedi) [06:49:28] tell me when you are done with these 2 so I can check there are 8 back again [06:50:37] which 2? [06:50:50] 1052 and 1118 [06:50:54] ah [06:51:08] db1118 I am not touching it [06:51:14] I thought we were not doing anything with it [06:51:14] are you going to restart 1052? [06:51:30] I asked if you wanted to disable semisync on db1118? [06:51:32] I was going to wait a bit to restart db1052 (just in case db1067 has issues) [06:51:35] Ah sorry, I missed that [06:51:39] I will disable it on both hosts [06:51:42] :) [06:51:49] I can do it, too [06:51:55] just asking if it was desirable [06:52:03] Yeah, I think it is [06:52:27] if you are going to wait for db1052 restart, note icinga may complain [06:52:31] disabled on db1052 [06:52:34] due to socket relocation [06:52:40] or just create temporary alias [06:53:00] disabled on db1118 [06:53:09] oh sorry [06:53:17] I thought you merged that patch [06:53:19] ignore me [06:53:22] nope, not yet :) [06:53:26] ok, I get it now [06:53:50] (03CR) 10Jcrespo: [C: 031] db1052: Disable notifications, upgrade socket [puppet] - 10https://gerrit.wikimedia.org/r/446533 (https://phabricator.wikimedia.org/T199861) (owner: 10Marostegui) [06:54:04] ok, so I think we are done? [06:54:08] db1067: Rpl_semi_sync_master_clients | 8 [06:54:09] yep [06:54:10] We are! 
[06:56:25] And downtime expired already, so we are back with alerts [06:56:34] cool [07:18:52] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team: Rack/cable/configure asw2-b-eqiad switch stack - https://phabricator.wikimedia.org/T183585 (10jcrespo) As an addemdum to T183585#4427995, because of T180918, we need to depool, ahead of the maintenance, the other replica dbs, too, but that shou... [07:22:36] !log updated tor on apt.wikimedia.org to 0.3.3.9 [07:22:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:23:59] uh [07:24:06] someone needs to whitelist the Wikimania IP from Phabricator [07:24:11] TOO MANY REQUESTS [07:24:11] You ("197.101.76.150") are issuing too many requests too quickly. [07:25:33] !log Deploy schema change on db1052 - https://phabricator.wikimedia.org/T191316 https://phabricator.wikimedia.org/T192926 https://phabricator.wikimedia.org/T89737 https://phabricator.wikimedia.org/T195193 [07:25:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:27:22] (03CR) 10Mbch331: Move foundationwiki domain (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446333 (https://phabricator.wikimedia.org/T188776) (owner: 10Reedy) [07:27:59] (03PS3) 10Jcrespo: Disable phabricator rate limits [puppet] - 10https://gerrit.wikimedia.org/r/445328 (https://phabricator.wikimedia.org/T198974) (owner: 1020after4) [07:28:53] (03CR) 10Jcrespo: [C: 032] Disable phabricator rate limits [puppet] - 10https://gerrit.wikimedia.org/r/445328 (https://phabricator.wikimedia.org/T198974) (owner: 1020after4) [07:30:49] (03PS1) 10Jforrester: Follow-up 7fe5d447: Fix typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446535 [07:31:21] Thanks, jynus. :-) [07:32:13] Everyone OK if I do a quick deploy right now? https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/446535 typo fix in the 404 page. [07:33:07] James_F: Fine by me. 
We have finished our failover an hour ago [07:33:24] James_F: go for it [07:34:42] Kk. [07:35:56] (03CR) 10Jforrester: [C: 032] "Quick deploy for typo." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446535 (owner: 10Jforrester) [07:37:13] (03Merged) 10jenkins-bot: Follow-up 7fe5d447: Fix typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446535 (owner: 10Jforrester) [07:37:15] !log upgrading tor on radium to 0.3.3.9 [07:37:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:41:18] !log jforrester@deploy1001 Synchronized wmf-config/missing.php: Fix typo in missing.php Ia3aae912d9m (duration: 00m 54s) [07:41:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:42:22] All done, deploy over. [07:52:57] (03CR) 10Mark Bergsma: [C: 032] Move FSM connect state handling to the FSM itself [debs/pybal] - 10https://gerrit.wikimedia.org/r/434163 (owner: 10Mark Bergsma) [07:53:55] (03Merged) 10jenkins-bot: Move FSM connect state handling to the FSM itself [debs/pybal] - 10https://gerrit.wikimedia.org/r/434163 (owner: 10Mark Bergsma) [07:56:00] (03CR) 10jenkins-bot: Follow-up 7fe5d447: Fix typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446535 (owner: 10Jforrester) [07:59:11] Hi [07:59:13] I need help [07:59:29] Which commit message I can set for patch for task T199783 [07:59:29] T199783: it.wiktionary : request for $wgAbuseFilterAvailableActions[] = 'block' - https://phabricator.wikimedia.org/T199783 [07:59:51] (03PS1) 10Marostegui: db-eqiad.php: Depool db1088 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446537 (https://phabricator.wikimedia.org/T199368) [08:02:56] (03PS4) 10Mark Bergsma: Implement BGP FSM events 4 and 5 (passive start) [debs/pybal] - 10https://gerrit.wikimedia.org/r/436297 [08:03:01] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1088 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446537 (https://phabricator.wikimedia.org/T199368) (owner: 10Marostegui) 
[08:03:03] Zoranzoki21: not a mediawiki deployer, but I would do something like [configuration file]: [English summary of what you intend to do]\n\nFurther explanation, if needed\n\nBug: T199783 [08:03:24] if the change is trivial, you may not need a detailed explanation [08:03:35] 10Operations, 10PoolCounter: Migrate pool counters to stretch - https://phabricator.wikimedia.org/T199876 (10MoritzMuehlenhoff) [08:03:45] jynus: OK [08:03:48] 10Operations, 10PoolCounter: Migrate pool counters to stretch - https://phabricator.wikimedia.org/T199876 (10MoritzMuehlenhoff) p:05Triage>03Normal a:03MoritzMuehlenhoff [08:04:24] Zoranzoki21: it is ok to upload an incomplete patch, it can be amended before deploy if needed [08:04:41] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1088 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446537 (https://phabricator.wikimedia.org/T199368) (owner: 10Marostegui) [08:05:15] Zoranzoki21: a guide is available at https://www.mediawiki.org/wiki/Special:MyLanguage/Gerrit/Commit_message_guidelines [08:05:18] jynus: I know. 
Will try :) [08:06:10] and you can do as I do, check what other people deploy and copy the style :-), as I don't know better [08:06:24] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1088 for alter table (duration: 00m 53s) [08:06:26] !log Deploy schema change on db1088 T144010 T51190 T199368 [08:06:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:06:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:06:32] T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 [08:06:33] T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010 [08:06:33] T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 [08:06:50] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1088" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446538 [08:07:46] (03CR) 10Mark Bergsma: ">" (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/436297 (owner: 10Mark Bergsma) [08:08:03] (03CR) 10Mark Bergsma: [C: 032] Implement BGP FSM events 4 and 5 (passive start) [debs/pybal] - 10https://gerrit.wikimedia.org/r/436297 (owner: 10Mark Bergsma) [08:08:25] 10Operations, 10Traffic, 10Patch-For-Review: Setup wikimediafoundation.org domain for July 30 launch of new site - https://phabricator.wikimedia.org/T198922 (10Varnent) The IP address is 192.0.66.2 [08:08:40] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1088 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446537 (https://phabricator.wikimedia.org/T199368) (owner: 10Marostegui) [08:08:50] (03Merged) 10jenkins-bot: Implement BGP FSM events 4 and 5 (passive start) [debs/pybal] - 10https://gerrit.wikimedia.org/r/436297 (owner: 10Mark Bergsma) [08:09:21] (03CR) 10Muehlenhoff: "https://puppet-compiler.wmflabs.org/compiler02/11806/" [puppet] - 10https://gerrit.wikimedia.org/r/446242 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [08:11:41] (03PS4) 
10Mark Bergsma: Implement BGP FSM event 14 [debs/pybal] - 10https://gerrit.wikimedia.org/r/436298 [08:11:43] (03PS1) 10Zoranzoki21: abusefilter.php: Set $wgAbuseFilterActions['block'] = true for it.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446539 (https://phabricator.wikimedia.org/T199783) [08:12:23] (03PS2) 10Zoranzoki21: abusefilter.php: Set $wgAbuseFilterActions['block'] = true for it.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446539 (https://phabricator.wikimedia.org/T199783) [08:13:09] (03CR) 10Mark Bergsma: [C: 032] Implement BGP FSM event 14 [debs/pybal] - 10https://gerrit.wikimedia.org/r/436298 (owner: 10Mark Bergsma) [08:13:53] (03Merged) 10jenkins-bot: Implement BGP FSM event 14 [debs/pybal] - 10https://gerrit.wikimedia.org/r/436298 (owner: 10Mark Bergsma) [08:14:16] (03PS4) 10Mark Bergsma: Correct incoming connection interaction with BGP FSM [debs/pybal] - 10https://gerrit.wikimedia.org/r/436299 [08:14:46] (03CR) 10Filippo Giunchedi: [C: 032] wdqs: use syslogidentifier in systemd units [puppet] - 10https://gerrit.wikimedia.org/r/446318 (https://phabricator.wikimedia.org/T198756) (owner: 10Filippo Giunchedi) [08:14:55] (03PS3) 10Filippo Giunchedi: wdqs: use syslogidentifier in systemd units [puppet] - 10https://gerrit.wikimedia.org/r/446318 (https://phabricator.wikimedia.org/T198756) [08:15:46] (03CR) 10Mark Bergsma: [C: 032] Correct incoming connection interaction with BGP FSM [debs/pybal] - 10https://gerrit.wikimedia.org/r/436299 (owner: 10Mark Bergsma) [08:16:26] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1088" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446538 (owner: 10Marostegui) [08:16:28] (03Merged) 10jenkins-bot: Correct incoming connection interaction with BGP FSM [debs/pybal] - 10https://gerrit.wikimedia.org/r/436299 (owner: 10Mark Bergsma) [08:18:06] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1088" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446538 
(owner: 10Marostegui) [08:19:21] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1088 after alter table (duration: 00m 53s) [08:19:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:19:24] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1088" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446538 (owner: 10Marostegui) [08:20:24] (03PS1) 10Filippo Giunchedi: admin: fix setting terminal's titlebar (filippo) [puppet] - 10https://gerrit.wikimedia.org/r/446541 [08:20:46] (03PS3) 10Mark Bergsma: Move NaiveBGPPeeringTestCase to test_peering [debs/pybal] - 10https://gerrit.wikimedia.org/r/436766 [08:20:48] (03PS2) 10Mark Bergsma: Cleanup NaiveBGPPeeringTestCase [debs/pybal] - 10https://gerrit.wikimedia.org/r/436769 [08:20:50] (03PS2) 10Mark Bergsma: Extend NaiveBGPPeering unit testing [debs/pybal] - 10https://gerrit.wikimedia.org/r/436807 [08:20:52] (03PS2) 10Mark Bergsma: Test UPDATE generation of the NaiveBGPPeering [debs/pybal] - 10https://gerrit.wikimedia.org/r/436808 [08:20:54] (03PS2) 10Mark Bergsma: Fix handling of withdrawals for Inet Unicast [debs/pybal] - 10https://gerrit.wikimedia.org/r/436809 [08:20:56] (03PS2) 10Mark Bergsma: Split off BGP factory/peering classes into a separate module [debs/pybal] - 10https://gerrit.wikimedia.org/r/436822 [08:22:49] (03CR) 10Filippo Giunchedi: [C: 032] admin: fix setting terminal's titlebar (filippo) [puppet] - 10https://gerrit.wikimedia.org/r/446541 (owner: 10Filippo Giunchedi) [08:23:53] (03PS1) 10Marostegui: db-eqiad.php: Depool db1085 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446542 (https://phabricator.wikimedia.org/T199368) [08:26:03] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1085 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446542 (https://phabricator.wikimedia.org/T199368) (owner: 10Marostegui) [08:27:02] (03CR) 10Mark Bergsma: [C: 032] Move NaiveBGPPeeringTestCase to test_peering [debs/pybal] - 
10https://gerrit.wikimedia.org/r/436766 (owner: 10Mark Bergsma) [08:27:43] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1085 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446542 (https://phabricator.wikimedia.org/T199368) (owner: 10Marostegui) [08:27:45] (03Merged) 10jenkins-bot: Move NaiveBGPPeeringTestCase to test_peering [debs/pybal] - 10https://gerrit.wikimedia.org/r/436766 (owner: 10Mark Bergsma) [08:29:00] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1085 for alter table (duration: 01m 01s) [08:29:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:29:19] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1085 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446542 (https://phabricator.wikimedia.org/T199368) (owner: 10Marostegui) [08:29:22] !log Deploy schema change on db1085 with replication, this will generate lag on labsdb hosts for s6 T144010 T51190 T199368 [08:29:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:29:27] T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 [08:29:28] T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010 [08:29:28] T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 [08:31:43] (03CR) 10Mark Bergsma: [C: 032] Cleanup NaiveBGPPeeringTestCase [debs/pybal] - 10https://gerrit.wikimedia.org/r/436769 (owner: 10Mark Bergsma) [08:31:52] !log drain + reboot analytics1030 for kernel updates [08:31:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:32:28] (03Merged) 10jenkins-bot: Cleanup NaiveBGPPeeringTestCase [debs/pybal] - 10https://gerrit.wikimedia.org/r/436769 (owner: 10Mark Bergsma) [08:33:03] (03PS5) 10Rush: Add phabricator-antivandalism extension to the library path [puppet] - 10https://gerrit.wikimedia.org/r/445329 (https://phabricator.wikimedia.org/T199741) (owner: 1020after4) [08:34:49] (03CR) 
10Rush: "Do we have any concerns about private/security related things being stored in swift?" [puppet] - 10https://gerrit.wikimedia.org/r/432528 (https://phabricator.wikimedia.org/T182085) (owner: 1020after4) [08:35:33] (03CR) 10Rush: [C: 032] Add phabricator-antivandalism extension to the library path [puppet] - 10https://gerrit.wikimedia.org/r/445329 (https://phabricator.wikimedia.org/T199741) (owner: 1020after4) [08:36:51] 10Operations, 10DNS, 10Release-Engineering-Team, 10Traffic, and 4 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776 (10Varnent) >>! In T188776#4431684, @BBlack wrote: > Obviously, those whole-site redirects configured within o... [08:37:01] !log deploy antivandalism stuff to phab https://gerrit.wikimedia.org/r/c/operations/puppet/+/445329 (needs herald to fully enable) [08:37:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:38:51] !log restart apache on phab1001 [08:38:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:40:14] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446544 [08:42:54] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446544 (owner: 10Marostegui) [08:44:09] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446544 (owner: 10Marostegui) [08:45:44] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1085 after alter table (duration: 00m 53s) [08:45:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:47:20] marostegui: deploying a schema change on s6? 
[08:47:51] yeah, db1086 [08:47:58] I will down the children [08:48:02] *85 [08:48:19] Yeah, I logged it [08:48:25] That will cause replication lag on labs s6 [08:48:34] (I think I logged it, let me double check) [08:48:44] Yep, I did :) [08:48:50] it is not so much the log [08:48:55] as icinga [08:48:59] Oh! [08:49:03] Yes! I forgot to downtime db1125 [08:49:04] Sorry [08:49:21] db1125 is not that important, but I don't know it yet by heart [08:49:29] (03PS1) 10ArielGlenn: add snapshot1005 back to mediawiki scap targets [puppet] - 10https://gerrit.wikimedia.org/r/446546 [08:50:18] !log reboot ms-be1040 for tests with SSBD microcode [08:50:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:51:39] (03CR) 10ArielGlenn: [C: 032] add snapshot1005 back to mediawiki scap targets [puppet] - 10https://gerrit.wikimedia.org/r/446546 (owner: 10ArielGlenn) [08:51:44] !log run xfs_repair on filesystems reporting negative space available on ms-be1043 - T199198 [08:51:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:51:47] T199198: Some swift filesystems reporting negative disk usage - https://phabricator.wikimedia.org/T199198 [08:51:52] (03PS2) 10ArielGlenn: add snapshot1005 back to mediawiki scap targets [puppet] - 10https://gerrit.wikimedia.org/r/446546 [08:54:31] (03PS3) 10Elukey: Add IPv6 PTR records for Analytics hosts [dns] - 10https://gerrit.wikimedia.org/r/446235 (https://phabricator.wikimedia.org/T199180) [08:55:33] (03CR) 10Elukey: [C: 032] Add IPv6 PTR records for Analytics hosts [dns] - 10https://gerrit.wikimedia.org/r/446235 (https://phabricator.wikimedia.org/T199180) (owner: 10Elukey) [09:06:56] RECOVERY - mediawiki-installation DSH group on snapshot1005 is OK: OK [09:08:41] (03PS1) 10ArielGlenn: update mac address for snapshot1005 after mainboard replacement [puppet] - 10https://gerrit.wikimedia.org/r/446550 [09:10:04] (03PS1) 10Jcrespo: mariadb: Allow reimage of es1019 [puppet] - 
10https://gerrit.wikimedia.org/r/446551 (https://phabricator.wikimedia.org/T197073) [09:10:36] (03CR) 10ArielGlenn: [C: 032] update mac address for snapshot1005 after mainboard replacement [puppet] - 10https://gerrit.wikimedia.org/r/446550 (owner: 10ArielGlenn) [09:10:50] (03PS1) 10Elukey: Fix analytics1038's IPv6 PTR record [dns] - 10https://gerrit.wikimedia.org/r/446552 [09:11:18] (03CR) 10Elukey: [C: 032] Fix analytics1038's IPv6 PTR record [dns] - 10https://gerrit.wikimedia.org/r/446552 (owner: 10Elukey) [09:12:30] (03PS1) 10Jcrespo: mariadb: Depool es1018 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446553 (https://phabricator.wikimedia.org/T197073) [09:13:12] (03CR) 10Elukey: [C: 032] "To keep archives happy: there was a mistake in analytics1038's PTR (same as 1035's), fixed with https://gerrit.wikimedia.org/r/446552" [dns] - 10https://gerrit.wikimedia.org/r/446235 (https://phabricator.wikimedia.org/T199180) (owner: 10Elukey) [09:13:21] (03PS2) 10Jcrespo: mariadb: Depool es1019 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446553 (https://phabricator.wikimedia.org/T197073) [09:14:08] (03PS2) 10Jcrespo: mariadb: Allow reimage of es1019 [puppet] - 10https://gerrit.wikimedia.org/r/446551 (https://phabricator.wikimedia.org/T197073) [09:14:15] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446544 (owner: 10Marostegui) [09:15:09] (03CR) 10Jcrespo: [C: 032] mariadb: Allow reimage of es1019 [puppet] - 10https://gerrit.wikimedia.org/r/446551 (https://phabricator.wikimedia.org/T197073) (owner: 10Jcrespo) [09:18:27] Reedy: https://phabricator.wikimedia.org/T188776#4433198 - possible to get the redirect reversed? 
[09:30:51] (03PS1) 10Mobrovac: EventStreams: Consume messages from the local Kafka cluster [puppet] - 10https://gerrit.wikimedia.org/r/446556 (https://phabricator.wikimedia.org/T199813) [09:36:35] !log draining restbase1007 for eventual reboot for tests with SSBD microcode [09:36:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:37:44] (03CR) 10Mobrovac: "PCC OK - https://puppet-compiler.wmflabs.org/compiler02/11807/" [puppet] - 10https://gerrit.wikimedia.org/r/446556 (https://phabricator.wikimedia.org/T199813) (owner: 10Mobrovac) [09:43:36] jouncebot: now [09:43:36] No deployments scheduled for the next 1 hour(s) and 16 minute(s) [09:50:27] (03PS3) 10Zoranzoki21: Enable AbuseFilter block option on itwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446539 (https://phabricator.wikimedia.org/T199783) [09:51:25] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1 [09:55:46] elastic1030 is struggling ^, will wait a bit and restart it if it does not recover [09:56:45] thanks! 
[09:58:09] !log contint1001 dropping unused docker containers [09:58:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:01:29] (03PS1) 10Arturo Borrero Gonzalez: cloudvps: labtest: only have one active net node at a time [puppet] - 10https://gerrit.wikimedia.org/r/446562 (https://phabricator.wikimedia.org/T196752) [10:02:00] !log clearing internal cache on elastic1030 [10:02:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:03:19] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1 [10:03:28] (03CR) 10Arturo Borrero Gonzalez: [C: 032] cloudvps: labtest: only have one active net node at a time [puppet] - 10https://gerrit.wikimedia.org/r/446562 (https://phabricator.wikimedia.org/T196752) (owner: 10Arturo Borrero Gonzalez) [10:05:44] (03PS1) 10Ladsgroup: Enable ores extension wp10 storage in Basque Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446563 (https://phabricator.wikimedia.org/T198358) [10:10:56] !log T156137: banning elastic1030 from the cluster (high load) [10:10:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:11:00] T156137: Reduce impact of GC pauses on elasticsearch response time - https://phabricator.wikimedia.org/T156137 [10:12:37] 10Operations: missed pages from kafka outage on July 11 2018 - https://phabricator.wikimedia.org/T199890 (10ArielGlenn) [10:13:05] 10Operations, 10User-ArielGlenn: missed pages from kafka outage on July 11 2018 - https://phabricator.wikimedia.org/T199890 (10ArielGlenn) [10:24:48] !log installing cups/libcups security updates for jessie [10:24:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:26:17] 10Operations: Decommission servermon - 
https://phabricator.wikimedia.org/T198939 (10Volans) I've submitted https://github.com/voxpupuli/puppetboard/pull/477 upstream, and went ahead and applied it to our puppetboard installation, so that we can enable some query endpoints. As a start I propose to enable: ``` ['f... [10:27:05] (03PS1) 10Volans: puppetboard: enable some query endpoints [puppet] - 10https://gerrit.wikimedia.org/r/446564 (https://phabricator.wikimedia.org/T198939) [10:27:09] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1 [10:29:45] (03PS2) 10Mark Bergsma: Test Server invariants [debs/pybal] - 10https://gerrit.wikimedia.org/r/445207 (https://phabricator.wikimedia.org/T184715) [10:29:47] (03PS1) 10Mark Bergsma: Cleanup setServers and clarify its use [debs/pybal] - 10https://gerrit.wikimedia.org/r/446565 [10:30:24] !log ran puppet clean/deactivate for lawrencium, long-standing source of errors in package deployments (T191360) [10:30:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:30:27] T191360: decom spare server lawrencium/WMF3542 - https://phabricator.wikimedia.org/T191360 [10:30:53] 10Operations, 10ops-eqiad, 10decommission: decom spare server lawrencium/WMF3542 - https://phabricator.wikimedia.org/T191360 (10MoritzMuehlenhoff) [10:35:51] (03CR) 10Mark Bergsma: Test Server invariants (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/445207 (https://phabricator.wikimedia.org/T184715) (owner: 10Mark Bergsma) [10:40:40] 10Operations, 10Wikimedia-Mailing-lists: Mailman issues a "403 Forbidden" error when subscribing to a list - https://phabricator.wikimedia.org/T195750 (10dbs) Hi, I ran into this problem over the last few days trying to subscribe to wikidata-tech while on the eduroam network (my public IP address of 194.94.98.... 
[10:43:54] (03CR) 10Vgutierrez: [C: 031] Fix socket message for Python3 [software/conftool] - 10https://gerrit.wikimedia.org/r/446336 (owner: 10Volans) [11:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: It is that lovely time of the day again! You are hereby commanded to deploy European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180718T1100). [11:00:04] Zoranzoki21 and Amir1: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:22] o/ [11:00:22] I can SWAT today [11:00:33] Amir1: go ahead while I review other commits [11:01:05] (03CR) 10Ladsgroup: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446563 (https://phabricator.wikimedia.org/T198358) (owner: 10Ladsgroup) [11:01:49] Zoranzoki21: around for SWAT? 
[11:01:59] I am here [11:02:23] (03Merged) 10jenkins-bot: Enable ores extension wp10 storage in Basque Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446563 (https://phabricator.wikimedia.org/T198358) (owner: 10Ladsgroup) [11:02:39] (03CR) 10jenkins-bot: Enable ores extension wp10 storage in Basque Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446563 (https://phabricator.wikimedia.org/T198358) (owner: 10Ladsgroup) [11:02:49] (03PS14) 10Zfilipin: Create Publisher namespace in Bengali Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/444664 (https://phabricator.wikimedia.org/T199028) (owner: 10Zoranzoki21) [11:03:52] Tested on mwdebug1002 [11:03:56] moving forward [11:05:45] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:446563|Enable ores extension wp10 storage in Basque Wikipedia (T198358)]] (duration: 00m 54s) [11:05:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:05:49] T198358: Enable ores extension wp10 storage in Basque Wikipedia - https://phabricator.wikimedia.org/T198358 [11:06:16] ^ I'm done, I have to be afk for an hour, if you see any increase in errors in any wikis feel free to revert this [11:06:29] \o/ [11:06:33] <3 AmandaNP [11:06:36] *Amir1 [11:06:37] sorry [11:06:59] I tested in beta cluster yesterday so it should be fine [11:07:04] but you never know :D [11:07:21] Amir1: ok, taking over swat, if logs go crazy I'll revert it :) [11:07:58] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/444664 (https://phabricator.wikimedia.org/T199028) (owner: 10Zoranzoki21) [11:09:31] (03Merged) 10jenkins-bot: Create Publisher namespace in Bengali Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/444664 (https://phabricator.wikimedia.org/T199028) (owner: 10Zoranzoki21) [11:11:58] (03CR) 10Volans: [C: 032] Fix socket message for Python3 [software/conftool] - 
10https://gerrit.wikimedia.org/r/446336 (owner: 10Volans) [11:12:00] (03CR) 10jenkins-bot: Create Publisher namespace in Bengali Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/444664 (https://phabricator.wikimedia.org/T199028) (owner: 10Zoranzoki21) [11:12:04] Zoranzoki21: 444664 is at mwdebug1002 for testing [11:12:12] zeljkof: testing [11:13:21] (03Merged) 10jenkins-bot: Fix socket message for Python3 [software/conftool] - 10https://gerrit.wikimedia.org/r/446336 (owner: 10Volans) [11:14:03] zeljkof: ok. lgtm in public [11:14:08] ok is all [11:14:25] Zoranzoki21: ok, deploying [11:15:51] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:444664|Create Publisher namespace in Bengali Wikisource (T199028)]] (duration: 00m 54s) [11:15:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:15:55] T199028: Create Publisher:Namespace in Bengali Wikisource - https://phabricator.wikimedia.org/T199028 [11:16:03] zeljkof: please and script [11:16:18] Zoranzoki21: it's deployed, running the script [11:16:37] actually I remembered, I didn't make the tables [11:16:43] it's going to blow up ASAP [11:16:53] I ran to the office [11:17:24] Amir1: o.O [11:17:31] swat is yours, please fix :) [11:18:28] (03CR) 10Zfilipin: "Script output at T199028#4433765" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/444664 (https://phabricator.wikimedia.org/T199028) (owner: 10Zoranzoki21) [11:18:48] Zoranzoki21: script output https://phabricator.wikimedia.org/T199028#4433765 [11:19:05] Zoranzoki21: waiting for Amir1 to fix something, will continue after him [11:19:18] ok [11:20:08] !log making ores tables on euwiki [11:20:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:24:53] It doesn't work but nothing is broken [11:24:57] please proceed [11:25:10] !log installing policykit security updates on trusty [11:25:11] I can make edits, I can do whatever [11:25:12] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:25:16] zeljkof: ^ [11:25:42] Amir1: ok, continuing with swat [11:26:30] Zoranzoki21: merging 446539, can you test it at mwdebug once it merges? [11:26:46] zeljkof: I can [11:26:52] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446539 (https://phabricator.wikimedia.org/T199783) (owner: 10Zoranzoki21) [11:28:07] (03Merged) 10jenkins-bot: Enable AbuseFilter block option on itwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446539 (https://phabricator.wikimedia.org/T199783) (owner: 10Zoranzoki21) [11:28:20] (03CR) 10jenkins-bot: Enable AbuseFilter block option on itwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446539 (https://phabricator.wikimedia.org/T199783) (owner: 10Zoranzoki21) [11:28:58] Zoranzoki21: 446539 is at mwdebug for testing [11:29:35] zeljkof: testing [11:30:54] zeljkof: ok is all. You can deploy [11:31:02] ok, deploying [11:32:10] !log zfilipin@deploy1001 Synchronized wmf-config/abusefilter.php: SWAT: [[gerrit:446539|Enable AbuseFilter block option on itwiktionary (T199783)]] (duration: 00m 54s) [11:32:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:32:13] T199783: it.wiktionary : request for $wgAbuseFilterAvailableActions[] = 'block' - https://phabricator.wikimedia.org/T199783 [11:32:15] PROBLEM - Check systemd state on ms-be1022 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[11:32:38] Zoranzoki21: it's deployed, please test and thanks for deploying with #releng :) [11:33:48] everything seems ok [11:34:49] !log EU SWAT finished [11:34:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:35:46] !log installing imagemagick security updates [11:35:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:36:59] (03PS1) 10Reedy: Fix typo in missing.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446576 [11:38:12] (03Abandoned) 10Reedy: Fix typo in missing.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446576 (owner: 10Reedy) [11:40:22] 10Operations, 10ops-eqiad, 10DC-Ops: Replace wtp1043's sda - https://phabricator.wikimedia.org/T196886 (10faidon) What's going on with this? [11:50:29] 10Operations, 10DNS, 10Release-Engineering-Team, 10Traffic, and 4 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776 (10Reedy) Brandon switched them to 302-ing, which is a "temporary redirect". If crawlers and other sites aren'... [11:52:11] so we won't have any failed requests from euwiki and it will start working but I need to investigate why it failed [11:52:50] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Datasets-General-or-Unknown, 10Patch-For-Review: rack upgraded storage capacity in labstore100[67].eqiad.wmnet - https://phabricator.wikimedia.org/T196651 (10ArielGlenn) The cert on labstore1006 was valid from Mar 14 through Jun 12, dead already. The command ar... 
[12:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180718T1200) [12:00:11] (03PS3) 10Jcrespo: mariadb: Depool es1019 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446553 (https://phabricator.wikimedia.org/T197073) [12:02:52] !log installing PHP security update on bohrium [12:02:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:05:42] (03PS1) 10Ladsgroup: Enable new backend for Special:Tags in fawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446579 (https://phabricator.wikimedia.org/T199334) [12:16:33] !log installing PHP security updates on yubiauth servers [12:16:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:24:05] !log manually added queued.max.messages.kbytes: 65535 to eventstreams on scb2002 as test for T199813 [12:24:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:24:09] T199813: EventStreams accumulates too much memory on SCB nodes in CODFW - https://phabricator.wikimedia.org/T199813 [12:26:15] PROBLEM - Check systemd state on scb2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [12:26:29] 10Operations, 10CommRel-Internals, 10Wikimedia-Mailing-lists: Close https://lists.wikimedia.org/mailman/listinfo/cep and keep the archive for now - https://phabricator.wikimedia.org/T155683 (10Quiddity) [12:26:44] PROBLEM - eventstreams on scb2002 is CRITICAL: connect to address 10.192.48.43 and port 8092: Connection refused [12:27:09] 10Operations, 10CommRel-Internals, 10Wikimedia-Mailing-lists: Close https://lists.wikimedia.org/mailman/listinfo/cep and keep the archive for now - https://phabricator.wikimedia.org/T155683 (10Quiddity) I've attempted a description update. Quim or Erica will have to confirm or fix the plan though. 
[12:27:25] RECOVERY - Check systemd state on scb2002 is OK: OK - running: The system is fully operational [12:27:45] RECOVERY - eventstreams on scb2002 is OK: HTTP OK: HTTP/1.1 200 OK - 1066 bytes in 0.092 second response time [12:28:51] (03PS1) 10Arturo Borrero Gonzalez: cloudvps: reimage/rename labnet1004 to cloudnet1004 [puppet] - 10https://gerrit.wikimedia.org/r/446581 (https://phabricator.wikimedia.org/T199906) [12:31:51] (03CR) 10Arturo Borrero Gonzalez: [C: 032] cloudvps: reimage/rename labnet1004 to cloudnet1004 [puppet] - 10https://gerrit.wikimedia.org/r/446581 (https://phabricator.wikimedia.org/T199906) (owner: 10Arturo Borrero Gonzalez) [12:35:55] (03PS2) 10Volans: puppetboard: enable some query endpoints [puppet] - 10https://gerrit.wikimedia.org/r/446564 (https://phabricator.wikimedia.org/T198939) [12:36:38] !log installing PHP security updates on icinga.wikimedia.org [12:36:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:36:41] (03PS1) 10Arturo Borrero Gonzalez: cloudvps: rename labnet1004 to cloudnet1004 [dns] - 10https://gerrit.wikimedia.org/r/446582 (https://phabricator.wikimedia.org/T199906) [12:37:18] (03CR) 10Arturo Borrero Gonzalez: [C: 032] cloudvps: rename labnet1004 to cloudnet1004 [dns] - 10https://gerrit.wikimedia.org/r/446582 (https://phabricator.wikimedia.org/T199906) (owner: 10Arturo Borrero Gonzalez) [12:38:46] (03PS1) 10Zoranzoki21: Enable VisualEditor at fiwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446583 (https://phabricator.wikimedia.org/T192135) [12:41:02] (03CR) 10Jforrester: [C: 04-2] "Needs sign-off from the product owner first, obviously." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446583 (https://phabricator.wikimedia.org/T192135) (owner: 10Zoranzoki21) [12:42:25] PROBLEM - Ensure legal html en.wb on en.wikibooks.org is CRITICAL: additional\sterms\smay\sapply\. 
By\susing\sthis\ssite,\syou\sagree\sto\sthe a\shref=(https:)?\/\/foundation\.wikimedia\.org\/wiki\/Terms_of_UseTerms\sof\sUse/a html not found [12:43:44] PROBLEM - Ensure legal html en.wp on en.wikipedia.org is CRITICAL: additional\sterms\smay\sapply\. By\susing\sthis\ssite,\syou\sagree\sto\sthe a\shref=(https:)?\/\/foundation\.wikimedia\.org\/wiki\/Terms_of_UseTerms\sof\sUse/a html not found [12:48:16] this is probably a check failure [12:48:20] (03PS2) 10Zoranzoki21: Enable VisualEditor at fiwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446583 (https://phabricator.wikimedia.org/T192135) [12:48:28] !log installing PHP security updates on tungsten [12:48:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:48:32] jynus: Fallout from the foundationwiki move [12:48:34] foundation.wikimedia.org [12:48:36] vs [12:48:44] wikimediafoundation.org [12:49:21] let me search for the ticket and notify it, I don't think it is high priority, as the text is actually legible, etc. [12:50:02] and the link works (redirects) [12:50:11] yes [12:51:27] !log installing PHP5 security updates (remaining hosts which only use -cli) [12:51:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:52:10] !log T156137: restarting elastic1030 to disable G1GC [12:52:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:52:16] T156137: Reduce impact of GC pauses on elasticsearch response time - https://phabricator.wikimedia.org/T156137 [12:54:33] !log T156137: unbanning elastic1030 [12:54:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:18] this is strange, because the check already isues foundation. [12:56:23] but it complains about that? 
[12:56:27] !log restart eventstreams on scb2001 as precautionary after mem consumption too high (still investigating a fix) [12:56:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:57:10] !log restart eventstreams on scb2003 as precautionary after mem consumption too high [12:57:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:57:55] !log restart eventstreams on scb200[5,6] as precautionary after mem consumption too high [12:57:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:58:06] 10Operations, 10DNS, 10Release-Engineering-Team, 10Traffic, and 4 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776 (10BBlack) >>! In T188776#4433198, @Varnent wrote: > Sorry if the above was not clear - the corp site's primar... [12:58:58] !log Stop MySQL on db1052 to upgrade socket and MySQL [12:59:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:04] zeljkof: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for MediaWiki train - European version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180718T1300). 
[13:00:28] Oh, I see [13:00:40] the check is actually right (not sure about the mobile link) [13:00:53] but the actual website needs change [13:03:33] (03PS2) 10Marostegui: db1052: Disable notifications, upgrade socket [puppet] - 10https://gerrit.wikimedia.org/r/446533 (https://phabricator.wikimedia.org/T199861) [13:07:11] (03CR) 10Marostegui: [C: 032] db1052: Disable notifications, upgrade socket [puppet] - 10https://gerrit.wikimedia.org/r/446533 (https://phabricator.wikimedia.org/T199861) (owner: 10Marostegui) [13:09:06] (03PS3) 10Volans: puppetboard: enable some query endpoints [puppet] - 10https://gerrit.wikimedia.org/r/446564 (https://phabricator.wikimedia.org/T198939) [13:10:23] (03PS1) 10Zfilipin: group1 wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446585 [13:10:25] (03CR) 10Zfilipin: [C: 032] group1 wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446585 (owner: 10Zfilipin) [13:11:41] zeljkof: Please let me know when the train is done. (I have a config change for Beta Cluster's Commons to deploy for some MCR testing) [13:11:46] (03Merged) 10jenkins-bot: group1 wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446585 (owner: 10Zfilipin) [13:11:56] anomie: sure [13:12:03] (03CR) 10jenkins-bot: group1 wikis to 1.32.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446585 (owner: 10Zfilipin) [13:13:04] !log zfilipin@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.13 [13:13:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:13:43] 10Operations, 10DNS, 10Release-Engineering-Team, 10Traffic, and 4 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776 (10BBlack) I should mention a few other technical issues that have been crossing my mind as we try to wade thr... 
[13:13:57] !log zfilipin@deploy1001 Synchronized php: group1 wikis to 1.32.0-wmf.13 (duration: 00m 53s) [13:14:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:16:49] anomie: I'm done [13:16:53] thanks [13:17:25] !log Deploy schema change on db1052 - T187089 [13:17:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:17:36] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089 [13:18:29] (03CR) 10Anomie: [C: 032] "Deploying Beta Cluster config change" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442918 (https://phabricator.wikimedia.org/T197818) (owner: 10Daniel Kinzler) [13:21:58] RECOVERY - Check systemd state on ms-be1028 is OK: OK - running: The system is fully operational [13:23:50] 10Operations, 10Operations-Software-Development: Systemd session creation fails under I/O load - https://phabricator.wikimedia.org/T199911 (10Volans) [13:25:07] RECOVERY - Check systemd state on ms-be1036 is OK: OK - running: The system is fully operational [13:25:22] (03CR) 10Volans: [C: 032] puppetboard: enable some query endpoints [puppet] - 10https://gerrit.wikimedia.org/r/446564 (https://phabricator.wikimedia.org/T198939) (owner: 10Volans) [13:29:28] anomie: are spikes related to you? https://grafana.wikimedia.org/dashboard/db/varnish-http-errors?refresh=1m&orgId=1&from=now-12h&to=now [13:30:50] zeljkof: Not that I know of. Jenkins hasn't even said the patch is merged yet. 
[13:30:50] looks like it was temporary [13:31:21] I'm super-careful since wmf.13 just got pushed to group1 [13:31:36] looks like things are back to normal, as far as I can see [13:31:42] zeljkof: And if it's not Beta Cluster, it really shouldn't be me because the only file my patch touches is InitialiseSettings-labs.php [13:31:54] zeljkof: https://logstash.wikimedia.org/goto/7da512e640704c81e41b5c47b33ce5b6 [13:32:00] it's mostly 500 [13:32:12] you have the 5xx on logstash, too [13:32:25] (no excuse :-)) [13:32:55] jynus: ok, looks like search related [13:32:59] and back to normal [13:33:04] * zeljkof wipes sweat [13:36:51] 10Operations, 10Patch-For-Review: Decommission servermon - https://phabricator.wikimedia.org/T198939 (10Volans) The query tab is now enabled, limited to the above endpoints. This should cover most cases, although Puppetboard's query support is not great to be honest. @faidon might have something in store... [13:49:53] 10Operations, 10DNS, 10Release-Engineering-Team, 10Traffic, and 4 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776 (10Varnent) Thank you for the information. It is not an ideal solution, but at this point it does not seem wor... 
[13:52:51] (03CR) 10Anomie: MCR Enable MCR write-both mode on commons beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442918 (https://phabricator.wikimedia.org/T197818) (owner: 10Daniel Kinzler) [13:52:55] (03CR) 10Anomie: [C: 032] MCR Enable MCR write-both mode on commons beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442918 (https://phabricator.wikimedia.org/T197818) (owner: 10Daniel Kinzler) [13:53:06] (03PS2) 10Gergő Tisza: Enable TemplateStyles on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/444567 (https://phabricator.wikimedia.org/T197603) [13:53:08] (03PS1) 10Gergő Tisza: Deploy TemplateStyles to frwiki and zhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446586 (https://phabricator.wikimedia.org/T189022) [13:53:14] 10Operations, 10DNS, 10Release-Engineering-Team, 10Traffic, and 4 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776 (10Varnent) Regarding where pages on the Governance Wiki are going in the months following the launch of the n... 
[13:53:28] !log installing ruby2.1 security updates for jessie [13:53:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:30] (03PS4) 10Anomie: MCR Enable MCR write-both mode on commons beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442918 (https://phabricator.wikimedia.org/T197818) (owner: 10Daniel Kinzler) [13:56:53] (03CR) 10Anomie: MCR Enable MCR write-both mode on commons beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442918 (https://phabricator.wikimedia.org/T197818) (owner: 10Daniel Kinzler) [13:56:56] (03CR) 10Anomie: [C: 032] MCR Enable MCR write-both mode on commons beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442918 (https://phabricator.wikimedia.org/T197818) (owner: 10Daniel Kinzler) [13:58:41] (03Merged) 10jenkins-bot: MCR Enable MCR write-both mode on commons beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442918 (https://phabricator.wikimedia.org/T197818) (owner: 10Daniel Kinzler) [13:59:07] (03CR) 10jenkins-bot: MCR Enable MCR write-both mode on commons beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442918 (https://phabricator.wikimedia.org/T197818) (owner: 10Daniel Kinzler) [13:59:12] (03CR) 10Volans: [C: 032] Release 1.0.1 [software/conftool] - 10https://gerrit.wikimedia.org/r/446343 (owner: 10Volans) [13:59:55] !log anomie@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: Sync labs config file, no prod impact (duration: 00m 54s) [13:59:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:49] (03Merged) 10jenkins-bot: Release 1.0.1 [software/conftool] - 10https://gerrit.wikimedia.org/r/446343 (owner: 10Volans) [14:01:26] (03CR) 10Muehlenhoff: [C: 031] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/446346 (owner: 10Volans) [14:01:46] (03PS2) 10Muehlenhoff: Blacklist cdrom kernel module [puppet] - 10https://gerrit.wikimedia.org/r/445954 [14:02:32] moritzm: lol --^ [14:02:40] I love these changes 
[14:03:12] So we can't mass eject cdroms in the dc? [14:04:03] elukey: wait until I'm at the obscure network stuff :-) [14:04:20] (03CR) 10Muehlenhoff: [C: 032] Blacklist cdrom kernel module [puppet] - 10https://gerrit.wikimedia.org/r/445954 (owner: 10Muehlenhoff) [14:05:10] Reedy: you can still use a knitting needle! [14:06:15] Reedy: that would be an awesome way to say good morning to our folks in the DC [14:09:29] (03PS1) 10Volans: Bump setup.py version to 1.0.1 [software/conftool] - 10https://gerrit.wikimedia.org/r/446589 [14:15:29] (03CR) 10Volans: [C: 032] Bump setup.py version to 1.0.1 [software/conftool] - 10https://gerrit.wikimedia.org/r/446589 (owner: 10Volans) [14:16:47] (03Merged) 10jenkins-bot: Bump setup.py version to 1.0.1 [software/conftool] - 10https://gerrit.wikimedia.org/r/446589 (owner: 10Volans) [14:19:41] (03PS1) 10Andrew Bogott: labtestn: pass keystone_host to designate profile [puppet] - 10https://gerrit.wikimedia.org/r/446591 [14:20:36] (03CR) 10jerkins-bot: [V: 04-1] labtestn: pass keystone_host to designate profile [puppet] - 10https://gerrit.wikimedia.org/r/446591 (owner: 10Andrew Bogott) [14:21:20] (03PS2) 10Andrew Bogott: labtestn: pass keystone_host to designate profile [puppet] - 10https://gerrit.wikimedia.org/r/446591 [14:22:22] !log rebooting vega for some tests [14:22:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:26:13] (03CR) 10Andrew Bogott: [C: 032] labtestn: pass keystone_host to designate profile [puppet] - 10https://gerrit.wikimedia.org/r/446591 (owner: 10Andrew Bogott) [14:27:34] PROBLEM - puppet last run on chromium is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [14:29:23] RECOVERY - Memory correctable errors -EDAC- on cp1053 is OK: (C)4 ge (W)2 ge 0 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=cp1053&var-datasource=eqiad%2520prometheus%252Fops [14:29:31] (03PS1) 10Kosta Harlan: Enable ORES extension on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446595 (https://phabricator.wikimedia.org/T199913) [14:31:43] RECOVERY - puppet last run on labtestservices2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:32:04] PROBLEM - Check systemd state on labtestservices2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [14:33:24] 10Operations, 10DNS, 10Release-Engineering-Team, 10Traffic, and 4 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776 (10Varnent) General (not comprehensive) list of pages on new site: - Home - About -- Board -- Staff and contra... 
[14:37:15] (03PS1) 10Marostegui: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446598 (https://phabricator.wikimedia.org/T199790) [14:37:40] !log rebooting poolcounter2001 for some tests [14:37:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:25] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446598 (https://phabricator.wikimedia.org/T199790) (owner: 10Marostegui) [14:40:35] 10Operations, 10Wikimedia-SVG-rendering, 10Upstream: Incorrect text positioning in SVG rasterization (scale/transform; font-size; kerning) - https://phabricator.wikimedia.org/T36947 (10TheDJ) [14:41:16] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446598 (https://phabricator.wikimedia.org/T199790) (owner: 10Marostegui) [14:41:32] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446598 (https://phabricator.wikimedia.org/T199790) (owner: 10Marostegui) [14:42:47] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 for alter table - T199790 (duration: 00m 52s) [14:42:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:42:51] T199790: Special:Log/Fanghong results in fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" - https://phabricator.wikimedia.org/T199790 [14:43:20] !log uploaded conftool (with python-conftool and python3-conftool) version 1.0.1 for jessie and stretch to apt.w.o [14:43:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:46:14] !log volans@sarin conftool action : set/pooled=yes; selector: name=mw2224.codfw.wmnet [14:46:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:46:18] * volans testing new conftool version, ignore the above message [14:47:31] !log Partition 
commonswiki.logging table on db1103:3314 - T199790 [14:47:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:02] !log mobrovac@deploy1001 Started deploy [eventstreams/deploy@09f0efe]: Set the maximum rdkafka receive buffer size to 64MB - T199813 [14:48:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:06] T199813: EventStreams accumulates too much memory on SCB nodes in CODFW - https://phabricator.wikimedia.org/T199813 [14:48:44] (03PS1) 10Andrew Bogott: labtestn: make the services node more like a normal services node [puppet] - 10https://gerrit.wikimedia.org/r/446601 [14:49:51] (03CR) 10Andrew Bogott: [C: 032] labtestn: make the services node more like a normal services node [puppet] - 10https://gerrit.wikimedia.org/r/446601 (owner: 10Andrew Bogott) [14:50:30] !log mobrovac@deploy1001 Finished deploy [eventstreams/deploy@09f0efe]: Set the maximum rdkafka receive buffer size to 64MB - T199813 (duration: 02m 28s) [14:50:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:47] 10Operations, 10ops-codfw, 10Analytics, 10EventBus, and 4 others: EventStreams accumulates too much memory on SCB nodes in CODFW - https://phabricator.wikimedia.org/T199813 (10mobrovac) p:05Unbreak!>03High a:03mobrovac Lowering the priority as fixing the receive buffer size to 64MB should do the tric... [14:51:54] (03PS1) 10Andrew Bogott: Add profile::openstack::labtestn::cloudrepo [puppet] - 10https://gerrit.wikimedia.org/r/446602 [14:53:04] (03CR) 10Andrew Bogott: [C: 032] Add profile::openstack::labtestn::cloudrepo [puppet] - 10https://gerrit.wikimedia.org/r/446602 (owner: 10Andrew Bogott) [14:53:13] RECOVERY - puppet last run on chromium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:54:33] PROBLEM - puppet last run on labtestservices2002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [14:55:38] (03PS2) 10Kosta Harlan: Enable ORES extension on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446595 (https://phabricator.wikimedia.org/T199913) [14:57:25] 10Operations, 10CommRel-Specialists-Support, 10User-Johan: Community Relations support for the 2018 data center switchover - https://phabricator.wikimedia.org/T199676 (10Johan) p:05Triage>03Normal [14:59:14] (03CR) 10Sbisson: [C: 031] Enable ORES extension on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446595 (https://phabricator.wikimedia.org/T199913) (owner: 10Kosta Harlan) [15:05:36] (03PS1) 10Andrew Bogott: labtestn: various puppet/hiera things for dns [puppet] - 10https://gerrit.wikimedia.org/r/446603 [15:07:15] (03PS2) 10Andrew Bogott: labtestn: various puppet/hiera things for dns [puppet] - 10https://gerrit.wikimedia.org/r/446603 [15:07:51] (03CR) 10Reedy: "Noting this enables it on production testwiki, not beta testwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446595 (https://phabricator.wikimedia.org/T199913) (owner: 10Kosta Harlan) [15:08:17] (03CR) 10Andrew Bogott: [C: 032] labtestn: various puppet/hiera things for dns [puppet] - 10https://gerrit.wikimedia.org/r/446603 (owner: 10Andrew Bogott) [15:09:23] (03CR) 10Catrope: [C: 031] Enable ORES extension on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446595 (https://phabricator.wikimedia.org/T199913) (owner: 10Kosta Harlan) [15:13:14] (03PS5) 10Andrew Bogott: Delegate 185.15.56.0/24 to labs-ns0/ns1 [dns] - 10https://gerrit.wikimedia.org/r/445303 (https://phabricator.wikimedia.org/T199374) [15:13:20] (03PS1) 10Andrew Bogott: labtest: set up dns and recursor names for labtestservices2002 [dns] - 10https://gerrit.wikimedia.org/r/446604 [15:14:25] (03CR) 10Andrew Bogott: [C: 032] labtest: set up dns and recursor names for labtestservices2002 [dns] - 10https://gerrit.wikimedia.org/r/446604 (owner: 10Andrew Bogott) [15:15:04] 
(03PS2) 10Volans: Add zone_validator script [dns] - 10https://gerrit.wikimedia.org/r/444649 (https://phabricator.wikimedia.org/T182028) [15:18:26] (03PS1) 10Herron: mailman: whitelist hackathon IP addresses from rate limiting [puppet] - 10https://gerrit.wikimedia.org/r/446607 [15:18:41] ^ chasemp [15:19:44] looking tx herron [15:20:38] (03CR) 10Rush: [C: 031] "Thanks dude" [puppet] - 10https://gerrit.wikimedia.org/r/446607 (owner: 10Herron) [15:20:59] (03CR) 10Herron: [C: 032] mailman: whitelist hackathon IP addresses from rate limiting [puppet] - 10https://gerrit.wikimedia.org/r/446607 (owner: 10Herron) [15:25:51] !log installing conftool 1.0.1 on jessie and stretch [15:25:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:30:05] PROBLEM - Host 208.80.153.55 is DOWN: PING CRITICAL - Packet loss = 100% [15:32:14] ^ 55.153.80.208.in-addr.arpa domain name pointer labtest-recursor1.wikimedia.org [15:35:34] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 [15:42:44] (03PS1) 10Arturo Borrero Gonzalez: cloudvps: cleanup labnet1004 resources [dns] - 10https://gerrit.wikimedia.org/r/446611 (https://phabricator.wikimedia.org/T199906) [15:43:08] (03CR) 10Arturo Borrero Gonzalez: [C: 032] cloudvps: cleanup labnet1004 resources [dns] - 10https://gerrit.wikimedia.org/r/446611 (https://phabricator.wikimedia.org/T199906) (owner: 10Arturo Borrero Gonzalez) [15:47:54] 10Operations, 10ops-eqiad, 10cloud-services-team: Relabel labnet1004.eqiad.wmnet as cloudnet1004.eqiad.wmnet - https://phabricator.wikimedia.org/T199921 (10aborrero) p:05Triage>03Normal [15:53:06] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Datasets-General-or-Unknown, 10Patch-For-Review: rack upgraded storage capacity in labstore100[67].eqiad.wmnet - https://phabricator.wikimedia.org/T196651 (10Bstorm) I just assumed it probably was, if acme wasn't running. 
[15:56:05] (03PS1) 10Andrew Bogott: designate: install packages from jessie-backports on Jessie [puppet] - 10https://gerrit.wikimedia.org/r/446612 [16:00:05] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the Morning SWAT (Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180718T1600). [16:00:05] Amir1 and kostajh: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [16:00:19] o/ [16:00:38] I'm here! [16:00:54] I'll do the SWAT [16:01:29] (03PS3) 10Catrope: Enable ORES extension on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446595 (https://phabricator.wikimedia.org/T199913) (owner: 10Kosta Harlan) [16:01:31] Cool [16:01:35] (03CR) 10Catrope: [C: 032] Enable ORES extension on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446595 (https://phabricator.wikimedia.org/T199913) (owner: 10Kosta Harlan) [16:02:08] (03PS1) 10Dbarratt: Enable Special:Block Feedback Reqeust [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446613 (https://phabricator.wikimedia.org/T199919) [16:03:05] (03CR) 10Arturo Borrero Gonzalez: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/446612 (owner: 10Andrew Bogott) [16:03:18] (03Merged) 10jenkins-bot: Enable ORES extension on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446595 (https://phabricator.wikimedia.org/T199913) (owner: 10Kosta Harlan) [16:03:36] (03CR) 10jenkins-bot: Enable ORES extension on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446595 (https://phabricator.wikimedia.org/T199913) (owner: 10Kosta Harlan) [16:04:08] (03CR) 10Andrew Bogott: [C: 032] designate: install packages from jessie-backports on Jessie [puppet] - 
10https://gerrit.wikimedia.org/r/446612 (owner: 10Andrew Bogott) [16:05:00] kostajh: Your patch is on mwdebug1002 now. Do you know how to test there? (With the WikimediaDebug extension) [16:05:39] RoanKattouw: Per my CR... it's supposed to be for beta not prod [16:05:49] RoanKattouw: I don't [16:05:52] Did you create the tables for prod testwiki? [16:05:57] Yes I did [16:06:10] And my understanding is he wants it for prod testwiki. We already have it on beta enwiki [16:06:31] kostajh: Install the WikimediaDebug browser extension, then do this: https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug#Staging_changes [16:06:45] (the part below the ssh instructions that is; I did the top half) [16:07:06] (03PS2) 10Catrope: Enable new backend for Special:Tags in fawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446579 (https://phabricator.wikimedia.org/T199334) (owner: 10Ladsgroup) [16:07:13] (03CR) 10Catrope: [C: 032] Enable new backend for Special:Tags in fawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446579 (https://phabricator.wikimedia.org/T199334) (owner: 10Ladsgroup) [16:08:30] (03Merged) 10jenkins-bot: Enable new backend for Special:Tags in fawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446579 (https://phabricator.wikimedia.org/T199334) (owner: 10Ladsgroup) [16:08:42] (03CR) 10jenkins-bot: Enable new backend for Special:Tags in fawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446579 (https://phabricator.wikimedia.org/T199334) (owner: 10Ladsgroup) [16:08:46] RoanKattouw: so set it to mwdebug1002 and visit https://test.wikimedia.beta.wmflabs.org/wiki/Special:NewPagesFeed?ores=true ? 
That works [16:10:24] (03PS2) 10Dbarratt: Enable Special:Block Feedback Request [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446613 (https://phabricator.wikimedia.org/T199919) [16:10:50] (03PS3) 10Dbarratt: Enable Special:Block Feedback Request [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446613 (https://phabricator.wikimedia.org/T199919) [16:11:56] (03PS3) 10Mark Bergsma: Test Server invariants [debs/pybal] - 10https://gerrit.wikimedia.org/r/445207 (https://phabricator.wikimedia.org/T184715) [16:11:58] (03PS1) 10Mark Bergsma: Remove Server.modified [debs/pybal] - 10https://gerrit.wikimedia.org/r/446614 [16:12:46] kostajh: Great! Then I'll deploy it [16:13:24] Hmph also the menu items are bigger in RC on testwiki [16:13:57] RoanKattouw: that's from the OOUI update last week, IIRC [16:14:09] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable ORES on testwiki (T199913) (duration: 00m 55s) [16:14:10] Yeah I'm trying to figure out why it's taller and not having any luck [16:14:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:14:13] T199913: beta-update-databases-eqiad broken with 'PageTriage requires ORES to be installed' - https://phabricator.wikimedia.org/T199913 [16:14:14] Is there a task for it already? 
[16:14:43] Oh wow it's the padding rule [16:14:53] Setting padding-left: 6px makes everything taller by an unreasonable amount [16:15:30] RoanKattouw: it's discussed in https://phabricator.wikimedia.org/T199466 [16:15:59] Having any padding at all triggers it [16:21:24] Amir1: Your patch is on mwdebug1002, please test [16:21:28] kostajh: No I think this is a different bug [16:22:02] RoanKattouw: on it [16:22:37] I'll file it [16:22:49] RoanKattouw: sorry, not the same as the alignment issue but I believe that the commit referenced there is the one that changed the height of the menu items [16:23:40] RoanKattouw: It works fine, if there is no error on mwdebug logstash, please proceed [16:27:30] hello, is there time in SWAT for another patch? i'll add it to the page if so. https://gerrit.wikimedia.org/r/c/mediawiki/extensions/VisualEditor/+/446529 [16:28:33] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable new backend for Special:Tags on fawikisource (T199334) (duration: 00m 54s) [16:28:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:28:37] T199334: Temporarily add config and use it to use change_tag_def table instead of change_tag table for Special:Tags - https://phabricator.wikimedia.org/T199334 [16:29:47] Thanks! [16:31:42] 10Operations, 10Wikimedia-Mailing-lists: Mailman issues a "403 Forbidden" error when subscribing to a list - https://phabricator.wikimedia.org/T195750 (10herron) Since the 429 "too often" error is thrown after multiple subscription attempts from the same IP maybe the phone was also connected to wifi at the tim... 
[16:31:53] (03PS1) 10Herron: mailman: increase per IP and per email rate limits from 1 to 5/hour [puppet] - 10https://gerrit.wikimedia.org/r/446617 (https://phabricator.wikimedia.org/T195750) [16:32:32] (03PS1) 10Vgutierrez: WIP: provide ACMEv2 support based on certbot/acme library [software/certcentral] - 10https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717) [16:33:09] (03CR) 10Herron: [C: 032] mailman: increase per IP and per email rate limits from 1 to 5/hour [puppet] - 10https://gerrit.wikimedia.org/r/446617 (https://phabricator.wikimedia.org/T195750) (owner: 10Herron) [16:33:11] (03CR) 10Vgutierrez: [C: 04-1] "heavily WIP" [software/certcentral] - 10https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717) (owner: 10Vgutierrez) [16:33:13] RoanKattouw: you're swatting? can i add https://gerrit.wikimedia.org/r/c/mediawiki/extensions/VisualEditor/+/446529 ? [16:33:40] (03CR) 10jerkins-bot: [V: 04-1] WIP: provide ACMEv2 support based on certbot/acme library [software/certcentral] - 10https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717) (owner: 10Vgutierrez) [16:34:42] 10Operations, 10ops-eqsin, 10Traffic: cp5006 unresponsive - https://phabricator.wikimedia.org/T187157 (10RobH) Scheduling a dell technician visist scheduling via email, as it involves Dell support, and them selecting and dispatching a tech. (I have to have the tech's name 48 hours before they arrive to the... [16:37:54] (03CR) 10Anomie: [C: 031] Make "sql wikishared" work again [puppet] - 10https://gerrit.wikimedia.org/r/446530 (https://phabricator.wikimedia.org/T199316) (owner: 10Tim Starling) [16:39:39] Wait a minute, morning swat is at 9 now?! [16:43:36] (03PS1) 10Cmjohnson: Adding mgmt dns for cp10[75-90] [dns] - 10https://gerrit.wikimedia.org/r/446619 (https://phabricator.wikimedia.org/T195923) [16:47:10] Niharika: on wednesday, so we don't clash with SoS. 
It combined that change with the EU Train window announcement [16:47:16] s/It/I/ [16:47:27] 10Operations, 10Traffic, 10Patch-For-Review: Setup wikimediafoundation.org domain for July 30 launch of new site - https://phabricator.wikimedia.org/T198922 (10BBlack) Thanks! Since that's also the IP they use for policy.wikimedia.org, we can at least have some confidence in the basics of the TLS config, fr... [16:50:15] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), and 2 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832 (10mmodell) And now it's back again on July 1st: {F23884869} [16:50:41] MatmaRex: Sorry for the late response, will deploy that now [16:51:28] RoanKattouw: thanks. i can also put it in the evening swat if you don't have time [16:51:30] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), and 2 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832 (10mmodell) This looks really strange. {F23884895} [16:51:36] If you don't mind, that'd be great [16:51:46] I'm at Wikimania and trying to follow a conversation :) [16:52:12] right. okay [16:54:53] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), and 2 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832 (10Paladox) I guess that may have been the period where th... 
[16:55:38] (03PS1) 10ArielGlenn: add a blank line before the SyslogIdentifier line [puppet] - 10https://gerrit.wikimedia.org/r/446621 [16:56:30] (03CR) 10ArielGlenn: [C: 032] add a blank line before the SyslogIdentifier line [puppet] - 10https://gerrit.wikimedia.org/r/446621 (owner: 10ArielGlenn) [16:58:04] 10Operations, 10Traffic, 10Patch-For-Review: Setup wikimediafoundation.org domain for July 30 launch of new site - https://phabricator.wikimedia.org/T198922 (10Varnent) Excellent! I pinged Automattic about this earlier today to setup a meeting with us, Reaktiv, and Automattic to finalize things. Who should b... [17:02:05] RECOVERY - puppet last run on labtestservices2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:09:26] (03PS1) 10Sbisson: Enable PageTriage ORES filters on labs enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446624 (https://phabricator.wikimedia.org/T198747) [17:12:56] greg-g: Okay. It's a bit unlikely that west coast folks will be able to swat in that window though. I guess we do have several EU people who swat so it'll probably be okay. [17:17:14] 10Operations, 10Commons, 10Edit-Review-Improvements, 10Growth-Team, and 2 others: I can't edit and can't view preview - https://phabricator.wikimedia.org/T199930 (10Seadick) [17:18:21] Niharika: it used to be at 8 :) [17:19:59] Ah, I didn't know. [17:22:23] Niharika: is there any deployment in progress or about to be? [17:22:51] jouncebot: now [17:22:52] No deployments scheduled for the next 1 hour(s) and 37 minute(s) [17:22:56] volans: ^ [17:23:46] ehehe, thanks :) but sometimes schedule and reality are different ;) [17:26:18] 10Operations, 10Wikimedia-General-or-Unknown: I can't edit and can't view preview - https://phabricator.wikimedia.org/T199930 (10Framawiki) [17:56:35] PROBLEM - configured eth on labtestnet2002 is CRITICAL: eth1 reporting no carrier. 
[18:01:00] 10Operations, 10ops-esams, 10decommission, 10Patch-For-Review: Decommission cp300[3456] - https://phabricator.wikimedia.org/T167376 (10RobH) a:05RobH>03None [18:03:07] 10Operations, 10ops-eqiad, 10DBA, 10decommission: Decommission db1056 - https://phabricator.wikimedia.org/T193736 (10RobH) a:05Cmjohnson>03None [18:03:09] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decom spare server osmium/wmf4546 - https://phabricator.wikimedia.org/T191364 (10RobH) a:05Cmjohnson>03None [18:04:37] 10Operations, 10Traffic, 10Patch-For-Review: Setup wikimediafoundation.org domain for July 30 launch of new site - https://phabricator.wikimedia.org/T198922 (10BBlack) I guess me! [18:07:04] 10Operations, 10Performance-Team, 10vm-requests: Increase webperf1002/webperf2002 space from 50GB to 500 GB (Ganeti) - https://phabricator.wikimedia.org/T199853 (10herron) p:05Triage>03Normal It might be worth considering hardware for this purpose. Most Ganeti hosts have roughly 1T free disk space, and... 
[18:08:52] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decom zinc/WMF3298 - https://phabricator.wikimedia.org/T191352 (10RobH) a:05Cmjohnson>03None [18:09:05] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decom vanadium/WMF3291 - https://phabricator.wikimedia.org/T191351 (10RobH) a:05Cmjohnson>03None [18:09:55] 10Operations, 10ops-eqiad, 10decommission: decom spare server lawrencium/WMF3542 - https://phabricator.wikimedia.org/T191360 (10RobH) a:05Cmjohnson>03None [18:10:21] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decommission uranium/WMF3128 - https://phabricator.wikimedia.org/T191348 (10RobH) a:05Cmjohnson>03None [18:10:23] 10Operations, 10decommission: Decommission old server wmf4077 - https://phabricator.wikimedia.org/T190086 (10RobH) a:05Cmjohnson>03None [18:12:40] 10Operations, 10Performance-Team, 10vm-requests: Increase webperf1002/webperf2002 space from 50GB to 500 GB (Ganeti) - https://phabricator.wikimedia.org/T199853 (10Imarlier) Dedicated hardware makes me so sad :-( Is there any sort of shared storage option (Swift or otherwise)? We could also use public clou... [18:15:00] 10Operations, 10Performance-Team, 10vm-requests: Increase webperf1002/webperf2002 space from 50GB to 500 GB (Ganeti) - https://phabricator.wikimedia.org/T199853 (10herron) Looping in @fgiunchedi re: swift [18:15:02] moritzm: ping [18:17:25] 10Operations, 10User-ArielGlenn: missed pages from kafka outage on July 11 2018 - https://phabricator.wikimedia.org/T199890 (10herron) p:05Triage>03High Anecdotally the same has happened to me with aql. Both delayed and dropped SMS to my US mobile number (area code 646). I chased the alerts through our i... [18:20:11] (03PS2) 10Bstorm: dumps distribution: failing web services over to labstore1006 [puppet] - 10https://gerrit.wikimedia.org/r/446497 (https://phabricator.wikimedia.org/T196651) [18:20:14] moritzm: any chance something changed in libpng recently : https://phabricator.wikimedia.org/T198370 ? 
[18:20:43] matanya: 09:45 moritzm: installing libpng security updates on trusty (Debian already updated) [18:20:54] Sounds like something might've [18:21:08] that is july 1st, and the report is june 28 [18:21:37] Yes [18:21:40] so unless the user has a time machine i have some doubt about it [18:21:46] Well, no, that's 13th July [18:21:55] I don't see any other libpng SAL entries [18:22:32] i see those: [18:22:35] 2017-01-18 [18:22:35] 11:09 restarting mediawiki canary servers to pick up cairo and libpng updates [18:22:35] 2017-01-17 [18:22:35] 13:33 installing libpng security updates [18:22:46] (03CR) 10Bstorm: [C: 032] dumps distribution: failing web services over to labstore1006 [puppet] - 10https://gerrit.wikimedia.org/r/446497 (https://phabricator.wikimedia.org/T196651) (owner: 10Bstorm) [18:22:50] oops, wrong year [18:22:54] heh [18:23:30] any july 13 is past june 28 on any year i know of [18:23:44] I know [18:23:49] But AFAIK trusty isn't serving MW in prod [18:24:02] The question would be when those debian ones were applied [18:24:13] exactly what i typed [18:24:25] (Debian already updated) = when? [18:25:07] Possibly something to do with a stretch point release [18:25:41] those trusty updates were ancient, the respective bugs have been fixed in Debian _long_ before, let me look [18:26:31] the libpng version in stretch is unchanged since we migrated from jessie to stretch [18:26:50] 10Operations, 10Operations-Software-Development: Systemd session creation fails under I/O load - https://phabricator.wikimedia.org/T199911 (10herron) p:05Triage>03Normal [18:27:36] (03PS3) 10Bstorm: WIP dumps: fail over dumps web to labstore1006 [dns] - 10https://gerrit.wikimedia.org/r/446476 (https://phabricator.wikimedia.org/T196651) [18:28:16] PROBLEM - puppet last run on labstore1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[acme-setup-acme-dumps] [18:31:27] matanya, Reedy: also, these are files which would have been processed by thumbor, not mediawiki or am I missing something? [18:31:49] I think you are correct [18:32:40] 10Operations, 10Core-Platform-Team, 10WMF-JobQueue, 10Services (designing), and 2 others: Exception "Job queue is read-only" - https://phabricator.wikimedia.org/T199594 (10herron) p:05Triage>03High [18:35:34] 10Operations, 10Analytics, 10ChangeProp, 10MediaWiki-JobQueue, and 2 others: Consider the possibility of separating ChangeProp and JobQueue on Kafka level - https://phabricator.wikimedia.org/T199431 (10herron) p:05Triage>03Normal [18:36:12] 10Operations, 10Analytics, 10ChangeProp, 10Services (designing), 10Wikimedia-Incident: Separate dev Change-Prop from production Kafka cluster - https://phabricator.wikimedia.org/T199427 (10herron) p:05Triage>03Normal [18:36:49] it would be helpful to have an isolated reproducer [18:37:31] there was an imagemagick update deployed on the 4th of July, but that happened after the first symptoms were reported [18:38:04] 10Operations, 10DBA, 10MediaWiki-Maintenance-scripts: sql enwik gives a poor error message when db doesn't exist - https://phabricator.wikimedia.org/T199008 (10herron) p:05Triage>03Low [18:38:06] imagemagick being what thumbor uses for scaling [18:38:56] this is an unrelated commit to thumbor extensions happening just before the first report: https://phabricator.wikimedia.org/rTHMBREXT25dd4de9422c664a9b7d35bfeec5a06a86880a41 [18:39:16] actually, I think the more likely culprit is the update of python-thumbor-wikimedia, build-tagged on 26th June [18:39:21] 10Operations, 10Traffic: Deploy initial ATS test clusters in core DCs - https://phabricator.wikimedia.org/T199720 (10herron) p:05Triage>03Normal [18:39:48] and updated on 26th of June at 11:09 [18:40:10] which matches the time of the initial report [18:40:19] much more likely [18:40:59] Gilles is on vacation, I'll investigate 
this tomorrow with Filippo [18:41:05] (03PS1) 10Bstorm: Revert "dumps distribution: failing web services over to labstore1006" [puppet] - 10https://gerrit.wikimedia.org/r/446637 [18:41:05] following up on the task [18:41:59] Thanks much [18:42:02] 10Operations, 10fundraising-tech-ops, 10netops: NAT and DNS for fundraising monitor host - https://phabricator.wikimedia.org/T198516 (10herron) p:05Triage>03Normal [18:42:39] I was nagged on the wikis by users, as this is a visual and change, and we all know how users are annoyed by moving their cheese :) [18:43:28] PROBLEM - HTTPS on labstore1006 is CRITICAL: SSL CRITICAL - Certificate dumps.wikimedia.org expired [18:43:34] (03CR) 10Bstorm: [C: 032] Revert "dumps distribution: failing web services over to labstore1006" [puppet] - 10https://gerrit.wikimedia.org/r/446637 (owner: 10Bstorm) [18:43:43] matanya: ack, we'll look into it tomorrow [18:44:17] danke [18:48:28] RECOVERY - puppet last run on labstore1006 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [18:55:49] 10Operations, 10MediaWiki-extensions-VipsScaler, 10Multimedia, 10Wikimedia-Incident: Thumbs from VipsScaler fail (HTTP 500; proton.svc.codfw refuses port 80) - https://phabricator.wikimedia.org/T56751 (10Krinkle) [18:56:02] 10Operations, 10MediaWiki-extensions-VipsScaler, 10Multimedia, 10Wikimedia-Incident: Thumbs from VipsScaler fail (HTTP 500; proton.svc.codfw refuses port 80) - https://phabricator.wikimedia.org/T56751 (10Krinkle) [18:56:29] 10Operations, 10Core-Platform-Team, 10MediaWiki-Maintenance-scripts: sql enwik gives a poor error message when db doesn't exist - https://phabricator.wikimedia.org/T199008 (10Marostegui) [18:57:11] 10Operations, 10ops-eqsin, 10Traffic: cp5006 unresponsive - https://phabricator.wikimedia.org/T187157 (10RobH) Email from Dell: > Hi Rob, > > The part dispatched is done and the reference number for this dispatch is DPS 91911999981. 
> > As such, our onsite engineer will email you the security informati... [18:57:21] (03CR) 10Kosta Harlan: [C: 031] "Looks good!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446624 (https://phabricator.wikimedia.org/T198747) (owner: 10Sbisson) [18:57:39] (03CR) 10RobH: [C: 031] Adding mgmt dns for cp10[75-90] [dns] - 10https://gerrit.wikimedia.org/r/446619 (https://phabricator.wikimedia.org/T195923) (owner: 10Cmjohnson) [19:00:04] Deploy window MediaWiki train - Americas version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180718T1900) [19:04:36] (03Abandoned) 10Zoranzoki21: Deploy TemplateStyles to frwiki and zhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416708 (https://phabricator.wikimedia.org/T189022) (owner: 10Zoranzoki21) [19:05:40] 10Operations, 10MediaWiki-extensions-VipsScaler, 10Multimedia, 10Wikimedia-Incident: Thumbs from VipsScaler fail (HTTP 500; 10.2.1.21 refuses port 80) - https://phabricator.wikimedia.org/T56751 (10Krinkle) [19:08:32] 10Operations, 10Collaboration-Team-Triage, 10Growth-Team, 10MediaWiki-Redirects, and 3 others: Flow notification links on mobile point to desktop - https://phabricator.wikimedia.org/T107108 (10EBernhardson) [19:08:44] 10Operations, 10MediaWiki-extensions-VipsScaler, 10Multimedia, 10Wikimedia-Incident: Thumbs from VipsScaler fail (HTTP 500; 10.2.1.21 refuses port 80) - https://phabricator.wikimedia.org/T56751 (10Krinkle) This IP, oddly, resolves to proton.svc.codfw.wmnet, which is unrelated (Proton is the Chromium PDF re... 
[19:09:34] 10Operations, 10MediaWiki-extensions-VipsScaler, 10Multimedia, 10Wikimedia-Incident: Thumbs from VipsScaler fail (HTTP 500; 10.2.1.21 refuses port 80) - https://phabricator.wikimedia.org/T199937 (10Krinkle) [19:10:06] 10Operations, 10MediaWiki-extensions-VipsScaler, 10Multimedia, 10Wikimedia-Incident: Thumbs from VipsScaler fail (HTTP 500; 10.2.1.21 refuses port 80) - https://phabricator.wikimedia.org/T199937 (10Krinkle) This is happening again. Easy to reproduce at (03PS1) 10Reedy: Unset $wgVipsThumbnailerHost [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446639 (https://phabricator.wikimedia.org/T199937) [19:19:14] (03PS2) 10Krinkle: Unset $wgVipsThumbnailerHost [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446639 (https://phabricator.wikimedia.org/T199937) (owner: 10Reedy) [19:19:24] (03CR) 10Krinkle: [C: 031] "Was just writing the same, merged commit messages." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446639 (https://phabricator.wikimedia.org/T199937) (owner: 10Reedy) [19:19:30] Reedy: Go ahead :) [19:21:08] (03CR) 10Reedy: [C: 032] Unset $wgVipsThumbnailerHost [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446639 (https://phabricator.wikimedia.org/T199937) (owner: 10Reedy) [19:22:44] (03Merged) 10jenkins-bot: Unset $wgVipsThumbnailerHost [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446639 (https://phabricator.wikimedia.org/T199937) (owner: 10Reedy) [19:23:02] (03CR) 10jenkins-bot: Unset $wgVipsThumbnailerHost [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446639 (https://phabricator.wikimedia.org/T199937) (owner: 10Reedy) [19:24:19] !log reedy@deploy1001 Synchronized wmf-config/CommonSettings.php: Unset wgVipsThumbnailHost T199937 (duration: 00m 55s) [19:24:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:24:24] T199937: Thumbs from VipsScaler fail (HTTP 500; 10.2.1.21 refuses port 80) - https://phabricator.wikimedia.org/T199937 [19:31:03] HI, I talk with user 
Seadick which opened task T199930. I saw to he earlier opened T124417. Can you check, is there problem with databases? [19:31:03] T124417: Connection to Wikimedia projects slow/timing out for some users - https://phabricator.wikimedia.org/T124417 [19:31:04] T199930: I can't edit and can't view preview - https://phabricator.wikimedia.org/T199930 [19:33:36] We passed standard things: cleaning cache memory and cookies. [19:37:14] DB problems wouldn't give ERR_CONNECTION_RESET [19:37:31] Internet connection problems [19:37:32] Firewall issues [19:37:40] He talked with his provider [19:37:47] I asked his provider same [19:37:53] He no have problems with internet connection [19:38:08] Many ISPs tech support don't have a clue [19:38:29] It started to happening when is MediaWiki 1.32/wmf.13 released [19:39:03] Do the .12 wikis work fine for him? [19:39:18] Which are .12 wikis? [19:39:39] Most of the wikipedias [19:40:09] I will ask him [19:40:14] 10Operations, 10MediaWiki-extensions-VipsScaler, 10Multimedia, 10Wikimedia-log-errors: VipsScaler broken for MediaWiki production (/usr/bin/vips: No such file) - https://phabricator.wikimedia.org/T199938 (10Krinkle) [19:40:16] Wait [19:40:27] 10Operations, 10MediaWiki-extensions-VipsScaler, 10Multimedia, 10Patch-For-Review, 10Wikimedia-Incident: Thumbs from VipsScaler fail (HTTP 500; 10.2.1.21 refuses port 80) - https://phabricator.wikimedia.org/T199937 (10Krinkle) 05Open>03Resolved a:03Krinkle It's still failing but now with: ```count... [19:41:26] (03CR) 10Cmjohnson: [C: 032] Adding mgmt dns for cp10[75-90] [dns] - 10https://gerrit.wikimedia.org/r/446619 (https://phabricator.wikimedia.org/T195923) (owner: 10Cmjohnson) [19:44:43] Reedy: He have same problem [19:44:50] So it's nothing to do with .13 [19:45:19] The fact there's not a wider problem (or so far as reported)... Suggests it's his computer/network/internet connection [19:45:28] Reedy: Ok. He tryed to edit my user page on serbian wikipedia. 
He got same error [19:52:09] (03PS1) 10Reedy: Install libvips-tools on mw app servers [puppet] - 10https://gerrit.wikimedia.org/r/446640 (https://phabricator.wikimedia.org/T199938) [20:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Services – Parsoid / Citoid / Mobileapps / ORES / … . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180718T2000). [20:03:51] (03PS2) 10Krinkle: Install libvips-tools on mw app servers [puppet] - 10https://gerrit.wikimedia.org/r/446640 (https://phabricator.wikimedia.org/T199938) (owner: 10Reedy) [20:04:53] Reedy: Edge, mozilla, chrome and in other browsers on his PC have this problem.. I checked router.. Everything with him is ok [20:05:04] Everything isn't ok [20:07:13] It could be an ISP peer or carrier [20:11:48] Reedy: I am back.. He have and on phone this problem [20:13:37] But he no have this problem on non-wmf projects [20:20:21] (03PS1) 10Andrew Bogott: labtestn: use the same DB as labtest for designate [puppet] - 10https://gerrit.wikimedia.org/r/446685 [20:21:27] (03CR) 10Andrew Bogott: [C: 032] labtestn: use the same DB as labtest for designate [puppet] - 10https://gerrit.wikimedia.org/r/446685 (owner: 10Andrew Bogott) [20:26:09] (03PS1) 10EBernhardson: Create prometheus::resource_config [puppet] - 10https://gerrit.wikimedia.org/r/446687 [20:26:41] (03CR) 10jerkins-bot: [V: 04-1] Create prometheus::resource_config [puppet] - 10https://gerrit.wikimedia.org/r/446687 (owner: 10EBernhardson) [20:28:56] 10Operations, 10DBA, 10JADE, 10TechCom-RFC, and 2 others: Extension:JADE scalability concerns due to creating a page per revision - https://phabricator.wikimedia.org/T196547 (10Krinkle) [20:30:38] (03PS2) 10EBernhardson: Create prometheus::resource_config [puppet] - 10https://gerrit.wikimedia.org/r/446687 [20:33:38] T199930... 
[20:33:39] T199930: I can't edit and can't view preview - https://phabricator.wikimedia.org/T199930 [20:43:56] Is 109.252.54.0/23 maybe blacklisted? [20:46:12] (03PS1) 10Andrew Bogott: labtestn: use a labtestn-specific designate host, labtestservices2002 [puppet] - 10https://gerrit.wikimedia.org/r/446690 [20:46:52] (03CR) 10Andrew Bogott: [C: 032] labtestn: use a labtestn-specific designate host, labtestservices2002 [puppet] - 10https://gerrit.wikimedia.org/r/446690 (owner: 10Andrew Bogott) [20:57:34] (03PS1) 10Bstorm: dumps distribution: set the ttl lower to prepare for web failover on dumps [dns] - 10https://gerrit.wikimedia.org/r/446692 (https://phabricator.wikimedia.org/T196651) [20:59:41] (03PS1) 10Andrew Bogott: nova with neutron: enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/446693 [20:59:55] (03CR) 10Bstorm: [C: 032] dumps distribution: set the ttl lower to prepare for web failover on dumps [dns] - 10https://gerrit.wikimedia.org/r/446692 (https://phabricator.wikimedia.org/T196651) (owner: 10Bstorm) [21:00:30] (03CR) 10Andrew Bogott: [C: 032] nova with neutron: enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/446693 (owner: 10Andrew Bogott) [21:02:05] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), and 2 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832 (10mmodell) >>! In T182832#4434775, @Paladox wrote: > I gu... [21:02:11] hi, I get NET::ERR_CERT_AUTHORITY_INVALID on en.wp now... [21:03:09] yannf: Can you get some advanced details to see which certificate you are actually seeing? [21:03:55] If this is in your bot, its likely one of the libraries are just severely out of date or the certificate store is missing [21:03:59] or something like that [21:04:30] bawolff: Hi, do you have ideas about task T199930. 
I tryed all but nothing new is not happened [21:04:31] T199930: I can't edit and can't view preview - https://phabricator.wikimedia.org/T199930 [21:05:18] https://phabricator.wikimedia.org/T199950 [21:05:48] Zoranzoki21: Connection problem on the user end. Possibly some sort of anti-virus or security product that is malfunctioning (E.g. if it was MITM'ing in order to scan for insecure websites) [21:06:28] Or at least that'd be my guess [21:07:02] yannf:, well.... that looks very wrong [21:09:30] bawolff: He no have antivirus or security product in system [21:09:43] yannf: C=TR, ST=ANKARA, O=Bilgi Teknolojileri ve Iletisim Kurumu, CN=erisimengellisayfa [21:09:56] yannf: So someone is man in the middling that connection from the sounds of it [21:10:06] yannf: Do you have any "https" proxies setup? [21:10:36] no [21:11:11] I am travelling, so my timezone was wrong [21:11:41] so I fixed that [21:11:49] but same again [21:12:33] yannf: Are you travelling in Turkey? [21:12:43] yes [21:12:44] (03CR) 10Muehlenhoff: [C: 031] "Looks fine, I'll merge this tomorrow." [puppet] - 10https://gerrit.wikimedia.org/r/446640 (https://phabricator.wikimedia.org/T199938) (owner: 10Reedy) [21:12:57] wikipedia is blocked there [21:13:11] ah :/ [21:13:48] Yeah, this looks like the turkish government trying to MITM your connection [21:14:28] The organization set on the certificate = "Bilgi Teknolojileri ve Iletisim Kurumu" which is the national telecom of turkey or something [21:14:37] commons.wikimedia.org is OK [21:15:48] Meta is also OK [21:15:48] They also are totally not following best practises for TLS [21:15:53] guess that's not suprising [21:16:04] yannf: They are probably going by domain name [21:16:16] yannf: you may be able to get around if you disable SNI [21:16:51] SNI? 
[21:17:16] Server Name Indication [21:18:01] I don't know if it's actually possible to easily disable [21:18:31] yannf: Basically, SNI is part of HTTPS where it tells the webserver which site you're visiting, and it sends it before encrypting [21:18:58] If you don't send it, some websites will stop working, but their blocking might stop working too (depending on the method used to block wikipedia) [21:20:04] The fact that commons works suggests that they are not blocking based on IP. So that leaves either DNS hijacking, or looking at SNI. SNI is probably the more likely method being employed [21:20:06] i only remember them blocking en wiki, not commons or meta as they are on wikimedia.org [21:20:09] not wikipedia.org [21:20:29] Of course you may not want to try and circumvent foreign laws while in a foreign country [21:21:44] paladox: Yes. Both en.wikipedia.org and meta.wikimedia.org use the same IP address. So to block them, gov can either change what IP address is associated with en.wikipedia.org (DNS poisoning). Or they can look at the SNI "header" (Technically TLS extension not header), to see which domain you are visiting [21:21:58] oh [21:21:59] If they were blocking via IP address, both commons and wikipedia would be blocked because they use the same IP [21:22:25] fr.wikipedia.org ks [21:22:25] is also blocked [21:22:41] yannf all of wikipedia.org will be blocked. [21:23:43] SNI (for the most part) isn't used on wikipedia (AFAIK), so technically it's not needed to send one. DNS poisoning (If they were doing that, which they probably aren't) can be gotten around by using a different DNS server (e.g.
Google's 8.8.8.8 or Cloudflare's 1.1.1.1) [21:23:43] I will use Lantern [21:23:48] or for best results TOR [21:28:05] https://en.wikisource.org is also OK [21:29:32] anyway thanks [21:31:59] now blocked from editing :// [21:32:04] with Lantern [21:32:35] :( [21:32:50] time to sleep [21:32:58] yeah, it sucks we don't support anonymization products [21:33:05] I will look at solutions tomorrow [21:33:09] you can get ipblock-exempt flag somehow [21:33:22] but i think it's complicated [21:33:33] {{fact}} :-) [21:33:44] ipbe's can be granted by any sysop [21:33:44] https://meta.wikimedia.org/wiki/Global_IP_block_exemptions [21:33:49] bye [21:33:54] GIPBE is my turf :) [21:34:05] Ah [21:34:08] but it is not hard for established users to get one if they have a need for it [21:34:48] I've never used it, just heard people complain. But most of the people complaining are the types who want to use TOR straight from the start and never established themselves as legit [21:35:18] Problem is that TOR and other anonymization services are abused by vandals and sock puppets [21:35:24] so we block them [21:35:27] yep [21:35:35] TorBlock even does that for us automatically [21:36:02] I've been through all the arguments. Seems mostly there are no good solutions [21:36:02] Even established users have been later 'caught' abusing anonymization services :( [21:36:11] so it is a bit of a problem, yes [21:37:36] I'm partial to the solution at http://homes.soic.indiana.edu/henry/publications/wpes13.pdf -- but that only really solves the vandalism issue (And just barely at that, not as much as users actually want), not the sockpuppet issue [21:38:07] And also would need to be implemented (Kind of complicated.
Would require support on the client side which would also annoy potential users) [21:39:38] in terms of the anonymous blocking papers, I also kind of like http://www-users.cs.umn.edu/~hopper/bnymble.pdf (the BLAC one seems more privacy preserving, but the bnymble one is much simpler) [21:47:32] I'm afraid I cannot help in implementing that. [21:47:44] * Hauskatze goes to find an extension that lacks codesniffer [21:51:20] It's quite a bit above me too [21:54:59] Hauskatze: https://www.mediawiki.org/wiki/User:Legoktm/ci [21:55:28] and I was doing it by hand [21:55:43] <3 [21:55:47] :D [21:56:07] Hauskatze: also have I shown you my auto add PHPCS script? [21:56:37] Hauskatze: https://git.legoktm.com/legoktm/bin/src/master/add_codesniffer.py it adds codesniffer, runs composer, and then auto-fixes all the sniffs that are possible in their own commits [21:56:50] legoktm: I don't think so. I usually do 'phpcbf' (a.k.a. composer fix) while doing that stuff [21:57:05] but I saw the other day you fixed them one by one [21:57:15] which I think might be better for reviewers? [21:57:17] all scripted, 0 manual work :) [21:57:18] yep [21:57:23] it makes it super easy to review [21:57:31] I WANT THAT!!!11 [21:57:36] https://git.legoktm.com/legoktm/bin/src/master/add_codesniffer.py [21:57:37] :) [21:57:47] just needs python3 [21:58:21] so I invoke the script from some random folder and python add_codesniffer.py path/to/the/extension ? [21:58:46] invoke the script from the extension directory [21:58:54] python ../../whatever/add_codesniffer.py [21:59:15] ah, so I'll just copy the script as needed [21:59:20] * Hauskatze bookmarks [22:00:13] hmm...
'import subprocesses' <-- I think I had some troubles with that in the past [22:00:27] well, I can try doing some now and see how it goes [22:00:58] 10Operations, 10Wikimedia-SVG-rendering, 10Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987 (10JoKalliauer) [22:10:26] legoktm: \AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 286, in check_call etc etc [22:10:37] system cannot find the specified file [22:11:18] Oh [22:11:29] Maybe hack in your composer path there? [22:12:12] I need to find where that file is I guess [22:12:20] (03PS2) 10EBernhardson: [WIP] Add mjolnir kafka daemon to primary elasticsearch clusters [puppet] - 10https://gerrit.wikimedia.org/r/445254 (https://phabricator.wikimedia.org/T198490) [22:12:37] 10Operations, 10Wikimedia-SVG-rendering, 10Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987 (10JoKalliauer) Merged from T184369 If you use (quote and double-quote) >font-family="'font name'" instead of >font-family="font name" or... [22:15:52] This is weird; subprocess.py is there on /lib/ [22:16:06] I hate when this happens [22:19:49] 10Operations, 10Wikimedia-SVG-rendering, 10Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987 (10Aklapper) [22:24:07] No luck [22:26:46] In the meanwhile I'll update to Python 3.7.0 [22:35:43] Hauskatze: wait can you post the full stacktrace please? [22:36:03] too late good sir [22:36:31] because I think it was that python couldn't find composer...it definitely found subprocess.py [22:37:14] I'm finishing with pyton 3.7.0 x86 [22:37:24] I'll re-try [22:40:42] legoktm: https://phabricator.wikimedia.org/P7374 [22:41:12] where is composer installed to? 
[22:41:27] Blackout/composer.json [22:41:40] wait [22:41:46] you mean the composer program [22:41:50] let me fetch [22:42:23] yes [22:45:40] on AppData\Composer [22:45:49] PROBLEM - Disk space on elastic1024 is CRITICAL: DISK CRITICAL - free space: /srv 51735 MB (10% inode=99%) [22:45:55] note that Python 36-32 was also there [22:52:46] Hauskatze: replace 'composer' with the full path to your composer.exe? [22:53:31] hmm, I don't understand... I invoke the add_codesniffer.py script using 'python' not composer [22:55:50] RECOVERY - Disk space on elastic1024 is OK: DISK OK [23:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: How many deployers does it take to do Evening SWAT (Max 6 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180718T2300). [23:00:04] MatmaRex: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:17] yeah hi [23:03:40] Hauskatze: I mean to edit the script itself [23:03:52] add_composer? [23:03:58] I think I can do that [23:04:59] is anyone swatting? [23:05:40] Not me. 
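[editor's note] The failure discussed above (check_call raising "system cannot find the specified file") is what subprocess does when the program it is asked to launch is not found; as noted in the log, it was composer that could not be located, not subprocess.py. A minimal sketch of the suggested fix follows. It is not the contents of add_codesniffer.py: the helper names resolve_executable and run_tool, and the fallback parameter, are hypothetical and only illustrate resolving the program path before calling it.

```python
import os
import shutil
import subprocess
import sys


def resolve_executable(name, fallback=None):
    """Return a usable path for `name`, or raise FileNotFoundError."""
    # shutil.which searches PATH; on Windows it also tries the PATHEXT
    # suffixes, so plain "composer" can match composer.bat or composer.exe.
    found = shutil.which(name)
    if found:
        return found
    # Hypothetical fallback: a full path supplied by the person running it.
    if fallback and os.path.isfile(fallback):
        return fallback
    raise FileNotFoundError(f"{name!r} not found on PATH; pass its full path")


def run_tool(name, args, fallback=None):
    """Run `name` with `args`, failing early with a readable message."""
    exe = resolve_executable(name, fallback)
    subprocess.check_call([exe, *args])


if __name__ == "__main__":
    # Demo only: the current Python interpreter stands in for composer.
    run_tool("no-such-tool", ["--version"], fallback=sys.executable)
```

With something like this in place, a missing composer would be reported by name instead of surfacing as a traceback inside subprocess.py, and a full path to composer.exe could be passed in without editing every call site.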
[23:05:41] I can SWAT [23:06:12] thanks [23:06:28] Hauskatze: https://paste.fedoraproject.org/paste/qAGbEyB0NMdkjw7dK3uQAg/raw [23:06:34] Hauskatze: lets switch to pm [23:06:57] legoktm: I'll try that [23:07:03] legoktm: ok, I'll /accept [23:08:11] 10Puppet, 10Toolforge, 10Goal: Fully puppetize Grid Engine - https://phabricator.wikimedia.org/T88711 (10Bstorm) [23:28:11] MatmaRex: both of your changes are on mwdebug1002, check please [23:28:31] on it [23:30:07] thcipriani: both look good [23:30:17] ok, going live [23:33:02] !log thcipriani@deploy1001 Synchronized php-1.32.0-wmf.13/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWMediaDialog.js: SWAT: [[gerrit:446529|ve.ui.MWMediaDialog: Fix confusion between #getSetupProcess and #getReadyProcess]] T185944 T199841 (duration: 00m 56s) [23:33:07] ^ that's one [23:33:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:33:08] T199841: [Regression wmf.13] All kinds of buttons from different panels of Media Settings dialog are appearing at the bottom of the dialog for a split second while editing an image - https://phabricator.wikimedia.org/T199841 [23:33:08] T185944: OOUI window ready process runs before window is actually ready (Category drop down list gets sticky when it's open while closing the dialog) - https://phabricator.wikimedia.org/T185944 [23:35:37] !log thcipriani@deploy1001 Synchronized php-1.32.0-wmf.13/includes/logging/LogEventsList.php: SWAT: [[gerrit:446635|LogEventsList: Use GET in HTMLForm]] T199856 (duration: 00m 54s) [23:35:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:35:41] T199856: Special:Log form should be using GET not POST - https://phabricator.wikimedia.org/T199856 [23:35:45] ^ and that's two [23:35:48] MatmaRex: all live now [23:35:58] thanks! 
[23:36:01] yw :) [23:49:05] Puppet, Toolforge, Goal: Fully puppetize Grid Engine - https://phabricator.wikimedia.org/T88711 (Bstorm) [23:51:19] PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet operation_type={create_container,remove_container,start_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [23:52:29] RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1