[00:00:05] that sounds right
[00:00:05] does Bjones still have contribs/logs?
[00:00:21] Of course, we could just decline it... As they're not WMF staff any more?
[00:00:51] I cannot take that decision :)
[00:01:04] but james o'almighty can :P
[00:01:14] Rename User Cleanup starting...
[00:01:14] Found possible log entry of the rename, please check: Bjones with comment Consistency on 20130222064759
[00:01:14] Found 0 edits to be re-attributed from table revision for uid 95
[00:02:31] so it fixed itself
[00:02:45] Reedy: you are correct they are not staff anymore so I'm certainly happy with whatever as long as you don't think it will cause problems later on :)
[00:02:55] Jamesofur: Everything looks consistent
[00:02:58] The user row is correct
[00:03:06] awesome
[00:03:07] I have no idea if someone fixed it manually at some other point
[00:03:07] even better :D
[00:03:24] I wouldn't even count out the idea that I did... though I don't remember it
[00:03:41] If it was just the user row fscked, but the other tables fine...
[00:03:46] The SQL update would be trivial
[00:04:53] for uid 0
[00:05:00] weird?
[00:05:10] or expected?
[00:05:51] I guess that means something doesn't exist
[00:06:02] I guess we have no idea if the maintenance script even works
[00:06:48] (03PS2) 10Paladox: phab1001: add interface::add_ip6_mapped [puppet] - 10https://gerrit.wikimedia.org/r/368957 (https://phabricator.wikimedia.org/T163938) (owner: 10Dzahn)
[00:06:50] if the row looks good then I don't worry much anymore
[00:07:03] $olduser = User::newFromName( $this->getOption( 'olduser' ) );
[00:07:03] $newuser = User::newFromName( $this->getOption( 'newuser' ) );
[00:07:55] $this->doUpdates( $olduser, $newuser, $newuser->getId() );
[00:07:55] $this->doUpdates( $olduser, $newuser, 0 );
[00:08:01] Yeah, I guess it counts as expected
[00:08:17] so the script is okay?
[00:08:32] Dunno
[00:08:47] You'd think it would do a select on user id where user text is old
[00:09:03] They have no old revision rows
[00:47:31] 10Operations, 10Epic, 10Goal, 10Services (doing): Consider a lower virtual node count - https://phabricator.wikimedia.org/T172149#3488217 (10Reedy)
[01:29:16] PROBLEM - Check Varnish expiry mailbox lag on cp1099 is CRITICAL: CRITICAL: expiry mailbox lag is 2025030
[01:31:09] (03PS3) 10Dzahn: phab1001: add interface::add_ip6_mapped [puppet] - 10https://gerrit.wikimedia.org/r/368957 (https://phabricator.wikimedia.org/T163938)
[01:38:55] (03CR) 10Dzahn: [C: 032] phab1001: add interface::add_ip6_mapped [puppet] - 10https://gerrit.wikimedia.org/r/368957 (https://phabricator.wikimedia.org/T163938) (owner: 10Dzahn)
[02:07:16] (03PS1) 10Dzahn: add IPv6 records for phab1001.eqiad.wmnet. [dns] - 10https://gerrit.wikimedia.org/r/368969 (https://phabricator.wikimedia.org/T163938)
[02:07:31] (03CR) 10jerkins-bot: [V: 04-1] add IPv6 records for phab1001.eqiad.wmnet. [dns] - 10https://gerrit.wikimedia.org/r/368969 (https://phabricator.wikimedia.org/T163938) (owner: 10Dzahn)
[02:10:14] ok jenkins, fair enough, quad-A record, not penta-A record
[02:10:26] (03PS2) 10Dzahn: add IPv6 records for phab1001.eqiad.wmnet. [dns] - 10https://gerrit.wikimedia.org/r/368969 (https://phabricator.wikimedia.org/T163938)
[02:13:40] (03PS3) 10Dzahn: add IPv6 records for phab1001.eqiad.wmnet. [dns] - 10https://gerrit.wikimedia.org/r/368969 (https://phabricator.wikimedia.org/T163938)
[02:19:10] (03CR) 10Dzahn: [C: 032] add IPv6 records for phab1001.eqiad.wmnet. [dns] - 10https://gerrit.wikimedia.org/r/368969 (https://phabricator.wikimedia.org/T163938) (owner: 10Dzahn)
[02:25:41] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.11) (duration: 07m 46s)
[02:25:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:32:17] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Aug 1 02:32:17 UTC 2017 (duration 6m 36s)
[02:32:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:35:16] (03CR) 10Dzahn: [C: 032] "yes, replaced by https://gerrit.wikimedia.org/r/#/c/368841/" [puppet] - 10https://gerrit.wikimedia.org/r/368956 (owner: 10Paladox)
[02:35:29] (03PS3) 10Dzahn: phabricator: Remove old rsync file [puppet] - 10https://gerrit.wikimedia.org/r/368956 (owner: 10Paladox)
[02:49:16] RECOVERY - Check Varnish expiry mailbox lag on cp1099 is OK: OK: expiry mailbox lag is 325
[03:26:37] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 861.99 seconds
[03:42:46] !log mobrovac@tin Started deploy [restbase/deploy@0d12138]: Add nl.wikinews - T171897
[03:42:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:42:59] T171897: Configure Wikimedia Parsoid servers to know about Dutch Wikinews - https://phabricator.wikimedia.org/T171897
[03:50:43] !log mobrovac@tin Finished deploy [restbase/deploy@0d12138]: Add nl.wikinews - T171897 (duration: 07m 57s)
[03:50:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:50:54] T171897: Configure Wikimedia Parsoid servers to know about Dutch Wikinews - https://phabricator.wikimedia.org/T171897
[04:13:56] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 128.54 seconds
[04:15:04] 10Operations, 10Traffic, 10Performance, 10Services (later): Look into solutions for replaying traffic to testing environment(s) - https://phabricator.wikimedia.org/T129682#2112642 (10mobrovac) Since we are sampling live traffic, should we decline this task? @Eevans thoughts?
[04:16:21] 10Operations, 10RESTBase, 10service-runner, 10Services (done), 10User-mobrovac: enable restbase syslog/file logging - https://phabricator.wikimedia.org/T112648#3488427 (10mobrovac) 05Open>03Resolved a:03mobrovac Yup. RESTBase is logging locally as well. Resolving.
[05:12:46] PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target) timed out before a response was received
[05:13:36] RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy
[05:21:45] !log Restart MySQL on labsdb1003 as it is totally stuck
[05:21:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:45:27] (03PS2) 10Marostegui: mariadb: Add db2073 to s4 [puppet] - 10https://gerrit.wikimedia.org/r/368804 (https://phabricator.wikimedia.org/T170662)
[05:47:13] (03CR) 10Marostegui: [C: 032] mariadb: Add db2073 to s4 [puppet] - 10https://gerrit.wikimedia.org/r/368804 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui)
[05:50:26] PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL: CRITICAL: 1.67% of data above the critical threshold [1000.0]
[05:52:24] (03PS1) 10Marostegui: mariadb: db2073 is going to s4, not s1 [puppet] - 10https://gerrit.wikimedia.org/r/368980
[05:53:49] (03CR) 10Marostegui: [C: 032] mariadb: db2073 is going to s4, not s1 [puppet] - 10https://gerrit.wikimedia.org/r/368980 (owner: 10Marostegui)
[05:57:29] (03PS1) 10Marostegui: mariadb: db2073 yaml file for s4 [puppet] - 10https://gerrit.wikimedia.org/r/368981
[05:58:51] (03CR) 10Marostegui: [C: 032] mariadb: db2073 yaml file for s4 [puppet] - 10https://gerrit.wikimedia.org/r/368981 (owner: 10Marostegui)
[06:02:58] (03PS1) 10Marostegui: s4.hosts: Add db2073 [software] - 10https://gerrit.wikimedia.org/r/368982 (https://phabricator.wikimedia.org/T170662)
[06:04:10] (03CR) 10Marostegui: [C: 032] s4.hosts: Add db2073 [software] - 10https://gerrit.wikimedia.org/r/368982 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui)
[06:04:49] (03Merged) 10jenkins-bot: s4.hosts: Add db2073 [software] - 10https://gerrit.wikimedia.org/r/368982 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui)
[06:07:45] (03PS1) 10Marostegui: db-codfw.php: Depool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368983 (https://phabricator.wikimedia.org/T170662)
[06:10:24] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368983 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui)
[06:11:55] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368983 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui)
[06:12:08] (03CR) 10jenkins-bot: db-codfw.php: Depool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368983 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui)
[06:13:10] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2065 - T170662 (duration: 00m 43s)
[06:13:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:13:21] T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662
[06:16:17] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 122, down: 1, dormant: 0, excluded: 0, unused: 0
[06:16:27] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0
[06:17:48] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 122, down: 1, dormant: 0, excluded: 0, unused: 0
[06:20:21] !log Stop MySQL on db2065 to copy its data to db2073 - T170662
[06:20:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:20:32] T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662
[06:24:57] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 124, down: 0, dormant: 0, excluded: 0, unused: 0
[06:25:28] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 124, down: 0, dormant: 0, excluded: 0, unused: 0
[06:25:37] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0
[06:31:50] (03PS3) 10ArielGlenn: write dump output files to temporary location, move in place when done [dumps] - 10https://gerrit.wikimedia.org/r/368744 (https://phabricator.wikimedia.org/T169849)
[06:32:11] (03CR) 10jerkins-bot: [V: 04-1] write dump output files to temporary location, move in place when done [dumps] - 10https://gerrit.wikimedia.org/r/368744 (https://phabricator.wikimedia.org/T169849) (owner: 10ArielGlenn)
[06:35:19] (03PS1) 10Marostegui: mariadb: Add db2074 as a new slave for s3 [puppet] - 10https://gerrit.wikimedia.org/r/368985 (https://phabricator.wikimedia.org/T170662)
[06:39:12] 10Operations, 10Pybal, 10Traffic: lvs servers report 'Memory allocation problem' on bootup - https://phabricator.wikimedia.org/T82849#3488505 (10ema) I've sent a patch upstream covering the virtual service removal case: http://archive.linuxvirtualserver.org/html/lvs-devel/2017-07/msg00016.html
[06:39:25] (03CR) 10Marostegui: "https://puppet-compiler.wmflabs.org/compiler02/7239/" [puppet] - 10https://gerrit.wikimedia.org/r/368985 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui)
[06:44:37] RECOVERY - carbon-cache too many creates on graphite1001 is OK: OK: Less than 1.00% above the threshold [500.0]
[07:02:28] (03PS4) 10ArielGlenn: write dump output files to temporary location, move in place when done [dumps] - 10https://gerrit.wikimedia.org/r/368744 (https://phabricator.wikimedia.org/T169849)
[07:03:09] (03PS6) 10Ema: pybal: bind instrumentation TCP port to private addresses [puppet] - 10https://gerrit.wikimedia.org/r/348074 (https://phabricator.wikimedia.org/T103882)
[07:04:32] (03CR) 10Elukey: [C: 032] hive: remove etc default config in favor of hive-env.sh [puppet/cdh] - 10https://gerrit.wikimedia.org/r/368820 (https://phabricator.wikimedia.org/T172107) (owner: 10Elukey)
[07:08:43] (03PS2) 10Ema: Instrumentation fixes [debs/pybal] (1.13) - 10https://gerrit.wikimedia.org/r/365610 (https://phabricator.wikimedia.org/T103882)
[07:10:11] (03PS1) 10Ema: Set empty PYTHONPATH in tox.ini [debs/pybal] (1.13) - 10https://gerrit.wikimedia.org/r/368986
[07:12:14] (03PS1) 10Elukey: hive: fix default parameters for metastore/server [puppet/cdh] - 10https://gerrit.wikimedia.org/r/368987 (https://phabricator.wikimedia.org/T172107)
[07:12:32] (03CR) 10Ema: [C: 032] Set empty PYTHONPATH in tox.ini [debs/pybal] (1.13) - 10https://gerrit.wikimedia.org/r/368986 (owner: 10Ema)
[07:13:38] (03PS2) 10Elukey: hive: fix default parameters for metastore/server [puppet/cdh] - 10https://gerrit.wikimedia.org/r/368987 (https://phabricator.wikimedia.org/T172107)
[07:14:55] (03PS3) 10Ema: Instrumentation fixes [debs/pybal] (1.13) - 10https://gerrit.wikimedia.org/r/365610 (https://phabricator.wikimedia.org/T103882)
[07:16:00] (03CR) 10Ema: [C: 032] pybal: bind instrumentation TCP port to private addresses [puppet] - 10https://gerrit.wikimedia.org/r/348074 (https://phabricator.wikimedia.org/T103882) (owner: 10Ema)
[07:16:36] (03CR) 10Elukey: [C: 032] hive: fix default parameters for metastore/server [puppet/cdh] - 10https://gerrit.wikimedia.org/r/368987 (https://phabricator.wikimedia.org/T172107) (owner: 10Elukey)
[07:21:38] (03PS1) 10Elukey: cdh::hive: add hive-env.sh to properly set jvm opts [puppet] - 10https://gerrit.wikimedia.org/r/368988 (https://phabricator.wikimedia.org/T172107)
[07:23:15] (03CR) 10Ema: [C: 032] Instrumentation fixes [debs/pybal] (1.13) - 10https://gerrit.wikimedia.org/r/365610 (https://phabricator.wikimedia.org/T103882) (owner: 10Ema)
[07:23:45] (03PS1) 10Ema: 1.13.11: instrumentation fixes, test execution [debs/pybal] - 10https://gerrit.wikimedia.org/r/368989 (https://phabricator.wikimedia.org/T103882)
[07:23:57] (03CR) 10Elukey: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/7242/analytics1003.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/368988 (https://phabricator.wikimedia.org/T172107) (owner: 10Elukey)
[07:24:47] (03CR) 10Jcrespo: "This missed adding the shard to hiera, puppet should fail. If not, that is an error." [puppet] - 10https://gerrit.wikimedia.org/r/368804 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui)
[07:24:49] (03PS1) 10Ema: 1.13.11: instrumentation fixes, test execution [debs/pybal] (1.13) - 10https://gerrit.wikimedia.org/r/368990 (https://phabricator.wikimedia.org/T103882)
[07:25:10] (03CR) 10Marostegui: "it did, i fixed in another commit :)" [puppet] - 10https://gerrit.wikimedia.org/r/368804 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui)
[07:25:50] (03CR) 10Ema: [C: 032] 1.13.11: instrumentation fixes, test execution [debs/pybal] - 10https://gerrit.wikimedia.org/r/368989 (https://phabricator.wikimedia.org/T103882) (owner: 10Ema)
[07:26:53] (03CR) 10Ema: [C: 032] 1.13.11: instrumentation fixes, test execution [debs/pybal] (1.13) - 10https://gerrit.wikimedia.org/r/368990 (https://phabricator.wikimedia.org/T103882) (owner: 10Ema)
[07:27:58] PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:28:29] this one is me--^
[07:28:43] need another round of cdh fix + sha update
[07:28:47] * elukey is full of joy
[07:29:20] wiki<3 to you elukey
[07:29:26] * elukey hugs ema
[07:29:27] (03PS1) 10Elukey: hive: remove dependency between /etc/default files and daemons [puppet/cdh] - 10https://gerrit.wikimedia.org/r/368991 (https://phabricator.wikimedia.org/T172107)
[07:30:35] (03CR) 10Elukey: [C: 032] hive: remove dependency between /etc/default files and daemons [puppet/cdh] - 10https://gerrit.wikimedia.org/r/368991 (https://phabricator.wikimedia.org/T172107) (owner: 10Elukey)
[07:32:12] (03PS1) 10Elukey: modules::cdh: update to the latest sha [puppet] - 10https://gerrit.wikimedia.org/r/368992 (https://phabricator.wikimedia.org/T172107)
[07:32:33] (03CR) 10Elukey: [V: 032 C: 032] modules::cdh: update to the latest sha [puppet] - 10https://gerrit.wikimedia.org/r/368992 (https://phabricator.wikimedia.org/T172107) (owner: 10Elukey)
[07:33:37] !log pybal 1.13.11 uploaded to apt.w.o T103882
[07:33:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:07] RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[07:35:34] !log lvs4003, lvs4004 (ulsfo secondaries): upgrade to pybal 1.13.11 - one-packet-scheduling, instrumentation fixes. T104442, T103882
[07:35:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:46] T104442: Investigate better DNS cache/lookup solutions - https://phabricator.wikimedia.org/T104442
[07:40:38] (03PS1) 10Elukey: hive: fix hive-env.sh variable substitution [puppet/cdh] - 10https://gerrit.wikimedia.org/r/368994 (https://phabricator.wikimedia.org/T172107)
[07:40:44] !log lvs4001, lvs4002 (ulsfo primaries): upgrade to pybal 1.13.11 - one-packet-scheduling, instrumentation fixes. T104442, T103882
[07:40:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:55] T104442: Investigate better DNS cache/lookup solutions - https://phabricator.wikimedia.org/T104442
[07:43:18] (03PS2) 10Elukey: hive: fix hive-env.sh variable substitution [puppet/cdh] - 10https://gerrit.wikimedia.org/r/368994 (https://phabricator.wikimedia.org/T172107)
[07:49:12] (03CR) 10Elukey: [V: 032 C: 032] hive: fix hive-env.sh variable substitution [puppet/cdh] - 10https://gerrit.wikimedia.org/r/368994 (https://phabricator.wikimedia.org/T172107) (owner: 10Elukey)
[07:50:52] (03PS1) 10Elukey: modules::cdh: update to the latest sha [puppet] - 10https://gerrit.wikimedia.org/r/368995 (https://phabricator.wikimedia.org/T172107)
[07:52:29] (03CR) 10Elukey: [C: 032] modules::cdh: update to the latest sha [puppet] - 10https://gerrit.wikimedia.org/r/368995 (https://phabricator.wikimedia.org/T172107) (owner: 10Elukey)
[08:03:23] !log lvs3*: upgrade to pybal 1.13.11 - one-packet-scheduling, instrumentation fixes. T104442, T103882
[08:03:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:03:35] T104442: Investigate better DNS cache/lookup solutions - https://phabricator.wikimedia.org/T104442
[08:12:27] 10Operations, 10Phabricator, 10Release-Engineering-Team (Kanban): reinstall iridium (phabricator) as phab1001 with jessie - https://phabricator.wikimedia.org/T152129#2839436 (10mmodell) p:05Normal>03High
[08:19:09] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2065" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368997
[08:25:07] PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL: CRITICAL: 1.67% of data above the critical threshold [1000.0]
[08:32:04] !log lvs2004-2006 (codfw secondaries): upgrade to pybal 1.13.11 - one-packet-scheduling, instrumentation fixes. T104442, T103882
[08:32:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:14] T104442: Investigate better DNS cache/lookup solutions - https://phabricator.wikimedia.org/T104442
[08:38:29] 10Operations, 10ops-codfw, 10User-fgiunchedi: ms-be2024 not powering on - https://phabricator.wikimedia.org/T171275#3488617 (10fgiunchedi) 05Open>03Resolved Machine is back up, I only had to change bios boot from uefi to legacy, thanks @papaul !
[08:39:11] 10Operations, 10monitoring: Add RIPE atlas data to Prometheus - https://phabricator.wikimedia.org/T167689#3488619 (10fgiunchedi) thanks @Aklapper ! fixed with operations and monitoring
[08:39:25] 10Operations, 10monitoring: Export ipsec counters as Prometheus metrics - https://phabricator.wikimedia.org/T154619#3488621 (10fgiunchedi) thanks @Aklapper ! fixed with operations and monitoring
[08:50:32] 10Operations, 10ops-codfw, 10DBA: Move some masters away from B6 - https://phabricator.wikimedia.org/T169501#3488647 (10Marostegui) Hi @Papaul We are planning to switchover db2019, and we will promote db2051 to master which is in C6, but we'd like to move it to B8 for instance? Could you let us know which...
[08:50:58] 10Operations, 10Ops-Access-Requests, 10User-Addshore: Requesting access to mwlog1001.eqiad.wmnet for goransm - https://phabricator.wikimedia.org/T171958#3488648 (10Addshore) @ayounsi Access is needed to mediawiki logs created by code added to the WikimediaEvents extension. The log file is cunningly called W...
[08:55:34] !log lvs2001-2003 (codfw primaries): upgrade to pybal 1.13.11 - one-packet-scheduling, instrumentation fixes. T104442, T103882
[08:55:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:45] T104442: Investigate better DNS cache/lookup solutions - https://phabricator.wikimedia.org/T104442
[09:01:20] 10Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 10Hindi-Sites, and 2 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3488685 (10Jayprakash12345) >>! In T168765#3487579, @MarcoAurelio wrote: >> This solution to disable Flow isn't acceptable. > > I totally d...
[09:01:34] 10Operations, 10Android-app-feature-Compilations, 10Reading-Infrastructure-Team-Backlog, 10Traffic, 10Wikipedia-Android-App-Backlog: Determine how to upload Zim files to Swift infrastructure - https://phabricator.wikimedia.org/T172123#3488686 (10fgiunchedi) I'll add some thoughts/braindump below: 1. For...
[09:02:40] (03PS1) 1020after4: phab1001: Allow listen_address to be empty for migration from iridium [puppet] - 10https://gerrit.wikimedia.org/r/369001
[09:03:39] (03CR) 10jerkins-bot: [V: 04-1] phab1001: Allow listen_address to be empty for migration from iridium [puppet] - 10https://gerrit.wikimedia.org/r/369001 (owner: 1020after4)
[09:09:21] (03PS2) 1020after4: phab1001: Allow listen_address to be empty for migration from iridium [puppet] - 10https://gerrit.wikimedia.org/r/369001
[09:10:29] 10Operations, 10Android-app-feature-Compilations, 10Traffic, 10Wikipedia-Android-App-Backlog, 10Reading-Infrastructure-Team-Backlog (Kanban): Determine URL paths for Zim files - https://phabricator.wikimedia.org/T172148#3488728 (10fgiunchedi) Swift's basic grouping for files are "containers" (or "buckets...
[09:10:43] (03CR) 10jerkins-bot: [V: 04-1] phab1001: Allow listen_address to be empty for migration from iridium [puppet] - 10https://gerrit.wikimedia.org/r/369001 (owner: 1020after4)
[09:11:37] (03PS3) 1020after4: phab1001: Allow listen_address to be empty for migration from iridium [puppet] - 10https://gerrit.wikimedia.org/r/369001
[09:12:02] (03CR) 10Volans: "recheck" [software/cumin] - 10https://gerrit.wikimedia.org/r/366735 (owner: 10Volans)
[09:12:36] (03CR) 10jerkins-bot: [V: 04-1] phab1001: Allow listen_address to be empty for migration from iridium [puppet] - 10https://gerrit.wikimedia.org/r/369001 (owner: 1020after4)
[09:12:59] (03PS4) 1020after4: phab1001: Allow listen_address to be empty for migration from iridium [puppet] - 10https://gerrit.wikimedia.org/r/369001
[09:13:44] (03CR) 10jerkins-bot: [V: 04-1] Logging: add a custom trace() logging level [software/cumin] - 10https://gerrit.wikimedia.org/r/366735 (owner: 10Volans)
[09:17:49] (03CR) 10Aklapper: "@Volans: Good point, thanks! I compared whois' "parent" output for the IPs of the last 36 disabled accounts that are listed on https://pha" [puppet] - 10https://gerrit.wikimedia.org/r/368775 (owner: 10Aklapper)
[09:19:47] (03PS1) 10Filippo Giunchedi: install_server: default to ext4 [puppet] - 10https://gerrit.wikimedia.org/r/369003 (https://phabricator.wikimedia.org/T169605)
[09:20:27] PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL: CRITICAL: 1.67% of data above the critical threshold [1000.0]
[09:23:34] (03PS1) 10Filippo Giunchedi: graphite: bump too many creates thresholds [puppet] - 10https://gerrit.wikimedia.org/r/369004
[09:24:09] 10Operations, 10DBA, 10Wikimedia-Site-requests, 10Wikimedia-maintenance-script-run: Unbreak/finish global merge of Yuvipanda into YuviPanda - https://phabricator.wikimedia.org/T104686#3488739 (10MarcoAurelio) p:05High>03Unbreak! a:05Legoktm>03None It is blocking editing from this editor and is run...
[09:30:24] (03PS2) 10Filippo Giunchedi: graphite: adjust max creates alert [puppet] - 10https://gerrit.wikimedia.org/r/369004
[09:46:01] (03PS1) 10Ema: pybal::monitoring: check_pybal_ipvs_diff interval/timeouts [puppet] - 10https://gerrit.wikimedia.org/r/369006 (https://phabricator.wikimedia.org/T134893)
[09:55:54] I am going to stop CI for a few minutes to push a mass amount of changes to the mediawiki extensions
[10:08:34] (03CR) 10Filippo Giunchedi: [C: 032] graphite: adjust max creates alert [puppet] - 10https://gerrit.wikimedia.org/r/369004 (owner: 10Filippo Giunchedi)
[10:10:48] PROBLEM - puppet last run on puppetmaster1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:12:46] !log Stopped Zuul / CI for mass mediawiki extension changes
[10:12:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:13:19] 10Operations, 10Ops-Access-Requests, 10User-Addshore: Requesting access to mwlog1001.eqiad.wmnet for goransm - https://phabricator.wikimedia.org/T171958#3488786 (10GoranSMilovanovic) @Addshore Thanks Adam. @ayounsi My immediate manager in WMDE Software Engineering, @Tobi_WMDE_SW, is on vacation. Hopefully,...
[10:13:47] PROBLEM - zuul_gearman_service on contint1001 is CRITICAL: connect to address 127.0.0.1 and port 4730: Connection refused
[10:14:11] 10Operations, 10Ops-Access-Requests, 10User-Addshore: Requesting access to mwlog1001.eqiad.wmnet for goransm - https://phabricator.wikimedia.org/T171958#3488787 (10Addshore) We can always get @Abraham to jump in and OK this too if he is not too busy!
[10:14:38] PROBLEM - zuul_service_running on contint1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-server
[10:15:08] ACKNOWLEDGEMENT - zuul_gearman_service on contint1001 is CRITICAL: connect to address 127.0.0.1 and port 4730: Connection refused amusso Stopped Zuul for mass spam to Gerrit
[10:15:08] ACKNOWLEDGEMENT - zuul_service_running on contint1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-server amusso Stopped Zuul for mass spam to Gerrit
[10:21:38] RECOVERY - carbon-cache too many creates on graphite1001 is OK: OK: Less than 1.00% above the threshold [500.0]
[10:21:47] RECOVERY - zuul_gearman_service on contint1001 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 4730
[10:22:38] RECOVERY - zuul_service_running on contint1001 is OK: PROCS OK: 2 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-server
[10:22:56] grrpuppet
[10:24:37] ACKNOWLEDGEMENT - zuul_gearman_service on contint1001 is CRITICAL: connect to address 127.0.0.1 and port 4730: Connection refused amusso Mass spam to Gerrit and puppet disabled to prevent Zuul from coming back up
[10:24:37] ACKNOWLEDGEMENT - zuul_service_running on contint1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-server amusso Mass spam to Gerrit and puppet disabled to prevent Zuul from coming back up
[10:24:53] !log contint1001 stopped puppet agent to prevent Zuul server to come back up
[10:24:57] RECOVERY - puppet last run on puppetmaster1001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[10:25:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:58] 10Operations, 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Performance-Team, and 6 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3488793 (10thiemowmde)
[10:31:07] !log Enabling Zuul/CI again and reenabling puppet on contint1001
[10:31:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:05:06] (03CR) 10Marostegui: [C: 032] Revert "db-codfw.php: Depool db2065" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368997 (owner: 10Marostegui)
[11:06:12] 10Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 10Patch-For-Review, 10User-Urbanecm: Reopen Wikinews Dutch - https://phabricator.wikimedia.org/T168764#3488875 (10Urbanecm)
[11:09:49] (03Merged) 10jenkins-bot: Revert "db-codfw.php: Depool db2065" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368997 (owner: 10Marostegui)
[11:09:59] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2065" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368997 (owner: 10Marostegui)
[11:10:42] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2065 - T170662 (duration: 00m 43s)
[11:10:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:10:52] T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662
[11:12:01] 10Operations, 10Wikidata, 10User-notice, 10Wikimedia-Incident: Wikidata and dewiki databases locked - https://phabricator.wikimedia.org/T171928#3488879 (10jcrespo) I've almost finished the above incident documentation. However, I am unsure about which are the right actionables and their priorities (last se...
[11:12:10] 10Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 10Hindi-Sites, and 2 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3488880 (10Urbanecm) Just a note: Even if your community decides to have flow disabled, it'll probably stay enabled because system administr...
[11:14:45] 10Operations, 10Wikidata, 10User-notice, 10Wikimedia-Incident: Wikidata and dewiki databases locked - https://phabricator.wikimedia.org/T171928#3488882 (10Marostegui) From my side, I would prefer option "b" (monitoring read-only status on the active masters) My reasoning for this is: I wouldn't like puppe... [11:16:29] (03PS2) 10Marostegui: mariadb: Add db2074 as a new slave for s3 [puppet] - 10https://gerrit.wikimedia.org/r/368985 (https://phabricator.wikimedia.org/T170662) [11:21:02] (03CR) 10Marostegui: [C: 032] mariadb: Add db2074 as a new slave for s3 [puppet] - 10https://gerrit.wikimedia.org/r/368985 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui) [11:25:23] (03PS5) 10Steinsplitter: Same namespace for global mail blacklist as for global spam blacklist. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368770 [11:30:48] (03PS1) 10Marostegui: db-codfw.php: Depool db2057 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369335 (https://phabricator.wikimedia.org/T170662) [11:33:47] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2057 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369335 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui) [11:35:12] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2057 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369335 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui) [11:36:19] (03CR) 10jenkins-bot: db-codfw.php: Depool db2057 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369335 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui) [11:36:35] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2057 - T170662 (duration: 00m 43s) [11:36:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:36:46] T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662 [11:37:07] 10Operations, 10Wikidata, 10User-notice, 10Wikimedia-Incident: Wikidata and 
dewiki databases locked - https://phabricator.wikimedia.org/T171928#3480468 (10mark) I agree; there's a very good reason for setting masters to read-only when something happened, because it needs manual intervention to investigate... [11:50:07] 10Operations, 10Analytics, 10EventBus, 10User-Elukey: Eventbus does not handle gracefully changes in DNS recursors - https://phabricator.wikimedia.org/T171048#3452124 (10elukey) p:05High>03Low The remaining step is to explore the possibility of having a logic to cache the statsd IP only for a limited a... [11:52:17] (03CR) 10Paladox: "I've cherrypicked this onto puppet-phabricator to test for any errors" [puppet] - 10https://gerrit.wikimedia.org/r/369001 (owner: 1020after4) [11:52:36] !log Stop MySQL on db2057 to copy its data to db2074 - T170662 [11:52:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:52:46] T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662 [11:54:52] (03CR) 10Paladox: [C: 031] "puppet passes" [puppet] - 10https://gerrit.wikimedia.org/r/369001 (owner: 1020after4) [12:02:11] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] [12:02:28] looking ^ [12:04:49] !log stop eventlogging_sync on analytics-slaves && rename all CookieBlock* tables (log db) to CookieBlock*_backup - T171883 [12:04:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:04:59] T171883: Drop CookieBlock* tables from EventLogging DB - https://phabricator.wikimedia.org/T171883 [12:05:03] marostegui: ---^ [12:05:24] my idea is to stop EL sync + renamed on db1046/7/store1002 [12:05:28] *rename [12:06:25] !log 100% cpu spike on elastic1023 caused percentiles to jump for a short period of time (T169498) [12:06:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:06:36] T169498: Investigate load spikes on the 
elasticsearch cluster in eqiad - https://phabricator.wikimedia.org/T169498 [12:08:11] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0] [12:09:50] elukey: that sounds good! [12:10:26] marostegui: proceeding! [12:18:46] marostegui: done! I didn't set "set session sql_log_bin=0" on db104[67] because I forgot, shouldn't be an issue but sorry :( [12:19:05] elukey: Thanks :-) [12:46:13] 10Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 10Hindi-Sites, and 2 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3489005 (10Trizek-WMF) >>! In T168765#3488685, @Jayprakash12345 wrote: > Some user opposing to enable this. So I was made new section for be... [12:49:23] !log restart hive daemons on analytics1003 to pick up new jvm settings (bigger Xmx, JMX ports) [12:49:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:50:04] 10Operations, 10Ops-Access-Requests, 10User-MarcoAurelio: Requesting access to deployment-prep for @MarcoAurelio - https://phabricator.wikimedia.org/T172182#3489008 (10MarcoAurelio) [12:54:08] (03CR) 10Mforns: [C: 031] "LGTM! Luca can you please merge if it looks good to you?" [puppet] - 10https://gerrit.wikimedia.org/r/366049 (https://phabricator.wikimedia.org/T170986) (owner: 10Mforns) [12:58:24] o/ _joe_ [12:58:42] (03PS6) 10Elukey: Add MediaWikiPingback to EL purging white-list [puppet] - 10https://gerrit.wikimedia.org/r/366049 (https://phabricator.wikimedia.org/T170986) (owner: 10Mforns) [12:58:44] Just saw you won't be able to help me out with the ores stress test today [13:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170801T1300). Please do the needful.
[13:00:04] James_F: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [13:00:14] o/ [13:00:25] o/ [13:00:40] Hey. [13:00:53] (03CR) 10Elukey: [C: 032] Add MediaWikiPingback to EL purging white-list [puppet] - 10https://gerrit.wikimedia.org/r/366049 (https://phabricator.wikimedia.org/T170986) (owner: 10Mforns) [13:01:02] <_joe_> halfak|Mobile: yeah sorry I'm blocking a few people until I've fixed the puppet-compiler [13:01:07] hashar: I can swat,or do you want to? [13:01:09] (03PS3) 10Hashar: Enable OOjs UI EditPage on all wikis except Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366867 (owner: 10Jforrester) [13:01:10] <_joe_> so I don't really have time today [13:01:20] zeljkof: I will do it [13:01:34] hashar: ok [13:01:36] Ok. Gotcha. Godspeed [13:01:52] James_F: I am around for the //Enable OOjs UI EditPage on all wikis except Commons// [13:02:01] <_joe_> I'm around tomorrow though [13:02:05] OK. [13:02:11] <_joe_> and I should hopefully be done [13:02:47] James_F: I guess the patch should be attached to T162849 ? [13:02:47] T162849: Support WMF communities in run-up to switching EditPage over to OOUI - https://phabricator.wikimedia.org/T162849 [13:03:17] hashar: It’s fine. [13:03:24] ok [13:03:32] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366867 (owner: 10Jforrester) [13:04:32] Hmm. Tomorrow is bad until the end of wikimania. Maybe I can still do the test today. I'll get a patch together for more workers figure out how to apply it to just the ores nodes. 
[13:04:42] 10Operations, 10ArchCom-RfC, 10Traffic, 10Services (designing): Make API usage limits easier to understand, implement, and more adaptive to varying request costs / concurrency limiting - https://phabricator.wikimedia.org/T167906#3489044 (10Anomie) [13:05:25] (03Merged) 10jenkins-bot: Enable OOjs UI EditPage on all wikis except Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366867 (owner: 10Jforrester) [13:05:44] James_F: it is on mwdebug1001 [13:05:46] (03PS1) 10Mforns: Add Kartographer schema to EventLogging white-list [puppet] - 10https://gerrit.wikimedia.org/r/369350 (https://phabricator.wikimedia.org/T171622) [13:06:00] (03CR) 10jerkins-bot: [V: 04-1] Add Kartographer schema to EventLogging white-list [puppet] - 10https://gerrit.wikimedia.org/r/369350 (https://phabricator.wikimedia.org/T171622) (owner: 10Mforns) [13:06:27] !log Compress s2 on db1102 - T172169 [13:06:32] (03CR) 10jenkins-bot: Enable OOjs UI EditPage on all wikis except Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366867 (owner: 10Jforrester) [13:06:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:06:37] T172169: Compress InnoDB on db1102 - https://phabricator.wikimedia.org/T172169 [13:08:05] Oh, 1001, not 1002. :-) [13:08:25] James_F: it is on both now :-) [13:08:27] sorry! [13:08:29] * James_F grins. [13:09:03] hashar: Yeah, LGTM. [13:09:29] syncing [13:10:09] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable OOjs UI EditPage on all wikis except Commons (duration: 00m 44s) [13:10:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:34] Thank you! 
[13:11:52] (03PS3) 10Ema: VCL mobile redirect: allow other params alongside title= [puppet] - 10https://gerrit.wikimedia.org/r/368814 (https://phabricator.wikimedia.org/T154227) (owner: 10BBlack) [13:18:21] (03PS4) 10Ema: VCL mobile redirect: allow other params alongside title= [puppet] - 10https://gerrit.wikimedia.org/r/368814 (https://phabricator.wikimedia.org/T154227) (owner: 10BBlack) [13:26:54] !log ema@neodymium conftool action : set/pooled=no; selector: name=achernar.wikimedia.org,service=pdns_recursor [13:27:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:42] (03PS1) 10Elukey: hive: add jmx monitoring classes [puppet/cdh] - 10https://gerrit.wikimedia.org/r/369351 (https://phabricator.wikimedia.org/T172107) [13:28:03] (03CR) 10jerkins-bot: [V: 04-1] hive: add jmx monitoring classes [puppet/cdh] - 10https://gerrit.wikimedia.org/r/369351 (https://phabricator.wikimedia.org/T172107) (owner: 10Elukey) [13:30:54] (03PS2) 10Elukey: hive: add jmx monitoring classes [puppet/cdh] - 10https://gerrit.wikimedia.org/r/369351 (https://phabricator.wikimedia.org/T172107) [13:34:12] (03CR) 10Elukey: [C: 04-1] "Self -1, jmx strings are wrong :)" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/369351 (https://phabricator.wikimedia.org/T172107) (owner: 10Elukey) [13:38:21] PROBLEM - Check Varnish expiry mailbox lag on cp1099 is CRITICAL: CRITICAL: expiry mailbox lag is 2181161 [13:40:43] (03CR) 10Herron: [C: 032] Change mailman DEFAULT_DMARC_MODERATION_ACTION to 1 (munge from) [puppet] - 10https://gerrit.wikimedia.org/r/361685 (https://phabricator.wikimedia.org/T168467) (owner: 10Herron) [13:40:51] (03PS2) 10Herron: Change mailman DEFAULT_DMARC_MODERATION_ACTION to 1 (munge from) [puppet] - 10https://gerrit.wikimedia.org/r/361685 (https://phabricator.wikimedia.org/T168467) [13:41:01] 10Operations, 10Citoid, 10VisualEditor, 10Services (watching), 10User-mobrovac: Wiley requests for DOI and some other publishers don't work in production - 
https://phabricator.wikimedia.org/T165105#3489125 (10Samwalton9) I received this response today: > I have confirmed with network and Sys-Ops team... [13:41:08] 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10User-Elukey: thorium - failed git clone of geowiki-data-private - https://phabricator.wikimedia.org/T171923#3489126 (10Ottomata) Weird, I'm not sure what's up here. The /var/lib/stats/.gitconfig file looks good, and it is [[ https://github.com/wikim... [13:42:23] (03PS3) 10Elukey: hive: add jmx monitoring classes [puppet/cdh] - 10https://gerrit.wikimedia.org/r/369351 (https://phabricator.wikimedia.org/T172107) [13:46:27] ottomata: --^ whenever you have time :) [13:46:36] o/ [13:49:22] (03CR) 10Ottomata: hive: add jmx monitoring classes (031 comment) [puppet/cdh] - 10https://gerrit.wikimedia.org/r/369351 (https://phabricator.wikimedia.org/T172107) (owner: 10Elukey) [13:52:50] (03PS4) 10Elukey: hive: add jmx monitoring classes [puppet/cdh] - 10https://gerrit.wikimedia.org/r/369351 (https://phabricator.wikimedia.org/T172107) [13:53:12] !log ema@neodymium conftool action : set/pooled=yes; selector: name=achernar.wikimedia.org,service=pdns_recursor [13:53:13] (03PS3) 10BBlack: OCSP: Warn less, retry more [puppet] - 10https://gerrit.wikimedia.org/r/368779 (https://phabricator.wikimedia.org/T172116) [13:53:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:09] (03CR) 10BBlack: [V: 032 C: 032] OCSP: Warn less, retry more [puppet] - 10https://gerrit.wikimedia.org/r/368779 (https://phabricator.wikimedia.org/T172116) (owner: 10BBlack) [13:58:43] <_joe_> win 19 [14:03:20] (03CR) 10Aklapper: "...and 160.90.*: https://phabricator.wikimedia.org/people/logs/query/i6e14cDvRQyg/#R" [puppet] - 10https://gerrit.wikimedia.org/r/368775 (owner: 10Aklapper) [14:03:50] (03PS2) 10Volans: Tests: simplify and improve parametrized tests [software/cumin] - 10https://gerrit.wikimedia.org/r/366733 (https://phabricator.wikimedia.org/T154588) 
[14:03:52] (03PS2) 10Volans: CLI: simplify imports and introspection [software/cumin] - 10https://gerrit.wikimedia.org/r/366734 [14:03:54] (03PS2) 10Volans: Logging: add a custom trace() logging level [software/cumin] - 10https://gerrit.wikimedia.org/r/366735 [14:03:56] (03PS2) 10Volans: Transports: convert hosts to ClusterShell's NodeSet [software/cumin] - 10https://gerrit.wikimedia.org/r/366736 (https://phabricator.wikimedia.org/T170394) [14:03:58] (03PS3) 10Volans: Query: add multi-query support [software/cumin] - 10https://gerrit.wikimedia.org/r/366737 (https://phabricator.wikimedia.org/T170394) [14:04:00] (03PS2) 10Volans: Transports: improve Command class [software/cumin] - 10https://gerrit.wikimedia.org/r/367823 (https://phabricator.wikimedia.org/T171679) [14:04:02] (03PS2) 10Volans: CLI: add an option to ignore exit codes of commands [software/cumin] - 10https://gerrit.wikimedia.org/r/367824 (https://phabricator.wikimedia.org/T171679) [14:04:07] (03PS2) 10Volans: Transports: improve target management [software/cumin] - 10https://gerrit.wikimedia.org/r/367825 (https://phabricator.wikimedia.org/T171684) [14:04:07] (03PS1) 10Volans: Style: fix some newly reported vulture violations [software/cumin] - 10https://gerrit.wikimedia.org/r/369384 [14:05:12] (03CR) 10Elukey: "recheck" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/369351 (https://phabricator.wikimedia.org/T172107) (owner: 10Elukey) [14:06:32] (03PS5) 10Elukey: hive: add jmx monitoring classes [puppet/cdh] - 10https://gerrit.wikimedia.org/r/369351 (https://phabricator.wikimedia.org/T172107) [14:07:07] (03Abandoned) 10MarcoAurelio: Alter ContentTranslation default namespace destination for zhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/363370 (https://phabricator.wikimedia.org/T168727) (owner: 10MarcoAurelio) [14:07:10] (03CR) 10Ottomata: Kafka broker profile and roles for new 'aggregate' (TBD) cluster and 'simple' cluster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/356232 
(https://phabricator.wikimedia.org/T166162) (owner: 10Ottomata) [14:07:40] (03CR) 10Volans: "@aklapper: from what you're saying I guess that those IP ranges are not part of WP Zero and are just providers from which we get spam. If " [puppet] - 10https://gerrit.wikimedia.org/r/368775 (owner: 10Aklapper) [14:08:49] (03CR) 10Elukey: "recheck" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/369351 (https://phabricator.wikimedia.org/T172107) (owner: 10Elukey) [14:11:31] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 2125940 [14:12:08] !log restart varnish backend on cp1074 (mailbox lag) [14:12:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:12:53] (03PS1) 10Giuseppe Lavagetto: Use own differ instead of puppet catalog diff [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/369385 [14:13:43] (03PS5) 10BBlack: VCL mobile redirect: allow other params alongside title= [puppet] - 10https://gerrit.wikimedia.org/r/368814 (https://phabricator.wikimedia.org/T154227) [14:14:08] (03CR) 10BBlack: [V: 032 C: 032] VCL mobile redirect: allow other params alongside title= [puppet] - 10https://gerrit.wikimedia.org/r/368814 (https://phabricator.wikimedia.org/T154227) (owner: 10BBlack) [14:15:46] 10Operations, 10Ops-Access-Requests, 10Beta-Cluster-Infrastructure, 10User-MarcoAurelio: Requesting access to deployment-prep for @MarcoAurelio - https://phabricator.wikimedia.org/T172182#3489233 (10MarcoAurelio) [14:21:32] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 0 [14:22:21] PROBLEM - Check Varnish expiry mailbox lag on cp1049 is CRITICAL: CRITICAL: expiry mailbox lag is 2044320 [14:27:57] !log restart varnish backend on cp1049 (mailbox lag) [14:28:05] !log lvs1004-1006 (eqiad secondaries): upgrade to pybal 1.13.11 - one-packet-scheduling, instrumentation fixes. 
T104442, T103882 [14:28:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:28:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:28:17] T104442: Investigate better DNS cache/lookup solutions - https://phabricator.wikimedia.org/T104442 [14:28:43] (03CR) 10RobH: [C: 031] install_server: default to ext4 [puppet] - 10https://gerrit.wikimedia.org/r/369003 (https://phabricator.wikimedia.org/T169605) (owner: 10Filippo Giunchedi) [14:32:21] RECOVERY - Check Varnish expiry mailbox lag on cp1049 is OK: OK: expiry mailbox lag is 0 [14:34:58] 10Operations, 10ops-codfw, 10User-fgiunchedi: ms-be2024 not powering on - https://phabricator.wikimedia.org/T171275#3489266 (10Papaul) @fgiunchedi you welcome. Please see below for return label information for the bad main board just for reference. {F8924492} [14:35:32] (03PS1) 10Marostegui: s3.hosts: Add db2074 to s3 list [software] - 10https://gerrit.wikimedia.org/r/369389 (https://phabricator.wikimedia.org/T170662) [14:38:16] 10Operations, 10ops-codfw, 10DBA: Move some masters away from B6 - https://phabricator.wikimedia.org/T169501#3489286 (10Papaul) Hi @Marostegui We can do this tomorrow with no problem. [14:38:40] 10Operations, 10ops-codfw, 10DBA: Move some masters away from B6 - https://phabricator.wikimedia.org/T169501#3489287 (10Marostegui) >>! In T169501#3489286, @Papaul wrote: > Hi @Marostegui > > We can do this tomorrow with no problem. Sounds good, any specific time? [14:39:36] ACKNOWLEDGEMENT - puppet last run on labvirt1016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues andrew bogott Ill probably need to reboot to get around this. [14:40:25] (03CR) 10Aklapper: "@Volans: I agree but I do not see any other solution currently." 
[puppet] - 10https://gerrit.wikimedia.org/r/368775 (owner: 10Aklapper) [14:40:32] (03CR) 10Marostegui: [C: 032] s3.hosts: Add db2074 to s3 list [software] - 10https://gerrit.wikimedia.org/r/369389 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui) [14:41:13] (03Merged) 10jenkins-bot: s3.hosts: Add db2074 to s3 list [software] - 10https://gerrit.wikimedia.org/r/369389 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui) [14:45:18] !log lvs1001-1003 (eqiad primaries): upgrade to pybal 1.13.11 - one-packet-scheduling, instrumentation fixes. T104442, T103882 [14:45:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:28] T104442: Investigate better DNS cache/lookup solutions - https://phabricator.wikimedia.org/T104442 [14:46:36] !log Deploy InnoDB compression on s3 - db2074 for the following tables (revision, pagelinks and templatelinks) - T170662 [14:46:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:46:45] T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662 [14:47:05] 10Operations, 10Wikidata, 10User-notice, 10Wikimedia-Incident: Wikidata and dewiki databases locked - https://phabricator.wikimedia.org/T171928#3489355 (10jcrespo) I have started working on more complete monitoring, useful if we go over the route of human monitoring rather than automation, here is one exam... [14:47:11] (03CR) 10Elukey: [C: 032] hive: add jmx monitoring classes [puppet/cdh] - 10https://gerrit.wikimedia.org/r/369351 (https://phabricator.wikimedia.org/T172107) (owner: 10Elukey) [14:49:29] (03PS1) 10Elukey: role::analytics_cluster::hive: add jvm monitoring [puppet] - 10https://gerrit.wikimedia.org/r/369392 (https://phabricator.wikimedia.org/T172107) [14:51:43] (03CR) 10Paladox: [C: 04-1] "We can't block every ip used by a spammer instead we could block them on mediawiki." 
[puppet] - 10https://gerrit.wikimedia.org/r/368775 (owner: 10Aklapper) [14:52:43] (03CR) 10Elukey: [C: 032] role::analytics_cluster::hive: add jvm monitoring [puppet] - 10https://gerrit.wikimedia.org/r/369392 (https://phabricator.wikimedia.org/T172107) (owner: 10Elukey) [14:54:51] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [14:55:27] mdholloway: I need to reboot your VM 'mobile-puppetmaster' — it'll be down for a few minutes. Is it OK if I do that now? [14:55:43] andrewbogott: go for it! [14:55:46] thanks [14:55:55] andrewbogott: was just replying to your email. thanks for the heads-up [14:56:47] !log rebooting labvirt1016 [14:56:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:02:52] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [15:03:46] (03CR) 1020after4: [C: 031] "https://puppet-compiler.wmflabs.org/compiler02/7246/" [puppet] - 10https://gerrit.wikimedia.org/r/369001 (owner: 1020after4) [15:04:49] (03PS1) 10Jcrespo: mariadb: Add new python3 script to check the health of a server [puppet] - 10https://gerrit.wikimedia.org/r/369397 (https://phabricator.wikimedia.org/T171928) [15:05:43] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Add new python3 script to check the health of a server [puppet] - 10https://gerrit.wikimedia.org/r/369397 (https://phabricator.wikimedia.org/T171928) (owner: 10Jcrespo) [15:05:45] 10Operations, 10ops-codfw, 10DBA: Move some masters away from B6 - https://phabricator.wikimedia.org/T169501#3489466 (10Papaul) 10:00 am CDT [15:07:45] 10Operations, 10ops-codfw, 10DBA: Move some masters away from B6 - https://phabricator.wikimedia.org/T169501#3489472 (10Marostegui) Excellent, I will get the server ready by then, just to change the IP on it and power it off :) Thanks a lot! 
[15:13:08] (03CR) 10Volans: "addressed comments" (032 comments) [software/cumin] - 10https://gerrit.wikimedia.org/r/366733 (https://phabricator.wikimedia.org/T154588) (owner: 10Volans) [15:15:07] (03CR) 10Volans: "addressed comments" (0313 comments) [software/cumin] - 10https://gerrit.wikimedia.org/r/366737 (https://phabricator.wikimedia.org/T170394) (owner: 10Volans) [15:17:20] (03PS1) 10Marostegui: db1055.yaml: Update socket location [puppet] - 10https://gerrit.wikimedia.org/r/369400 (https://phabricator.wikimedia.org/T148507) [15:17:34] !log Stop MySQL on db1055 for maintenance - https://phabricator.wikimedia.org/T148507 [15:17:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:30] 10Operations, 10ops-ulsfo, 10Traffic: setup/install cp402[5-8].ulsfo.wmnet - https://phabricator.wikimedia.org/T172198#3489494 (10RobH) [15:19:42] (03CR) 10Marostegui: "https://puppet-compiler.wmflabs.org/compiler02/7249/" [puppet] - 10https://gerrit.wikimedia.org/r/369400 (https://phabricator.wikimedia.org/T148507) (owner: 10Marostegui) [15:21:17] (03CR) 10Marostegui: [C: 032] db1055.yaml: Update socket location [puppet] - 10https://gerrit.wikimedia.org/r/369400 (https://phabricator.wikimedia.org/T148507) (owner: 10Marostegui) [15:21:21] 10Operations, 10DBA, 10Wikimedia-Site-requests, 10Wikimedia-maintenance-script-run: Unbreak/finish global merge of Yuvipanda into YuviPanda - https://phabricator.wikimedia.org/T104686#3489517 (10Jdforrester-WMF) p:05Unbreak!>03High This affects one account, and is not a risk to life or limb, nor are th... 
[15:23:58] (03CR) 10Ayounsi: [C: 032] [DNM] ContInt: Upgrade npm from 2.15.2 to 3.8.3 in CI [puppet] - 10https://gerrit.wikimedia.org/r/368459 (https://phabricator.wikimedia.org/T161861) (owner: 10Jforrester) [15:24:10] (03PS2) 10Ayounsi: [DNM] ContInt: Upgrade npm from 2.15.2 to 3.8.3 in CI [puppet] - 10https://gerrit.wikimedia.org/r/368459 (https://phabricator.wikimedia.org/T161861) (owner: 10Jforrester) [15:24:13] 10Operations, 10Wikimedia-Site-requests, 10Wikimedia-maintenance-script-run: Unbreak/finish global merge of Yuvipanda into YuviPanda - https://phabricator.wikimedia.org/T104686#3489523 (10Marostegui) I am removing the DBA tag, as the number of edits is really low, so there should be no issues with the databa... [15:25:44] (03CR) 10Paladox: [DNM] ContInt: Upgrade npm from 2.15.2 to 3.8.3 in CI (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/368459 (https://phabricator.wikimedia.org/T161861) (owner: 10Jforrester) [15:27:25] (03PS3) 10Hashar: contint: upgrade npm from 2.15.2 to 3.8.3 in CI [puppet] - 10https://gerrit.wikimedia.org/r/368459 (https://phabricator.wikimedia.org/T161861) (owner: 10Jforrester) [15:27:45] 10Operations, 10Puppet, 10Traffic, 10Mobile, and 5 others: URLs with title query string parameter and additional query string parameters do not redirect to mobile site - https://phabricator.wikimedia.org/T154227#3489526 (10Jdlrobson) 05Open>03Resolved a:03Jdlrobson Tested on a mobile device. I can to... 
[15:27:46] (03CR) 10Paladox: [C: 031] contint: upgrade npm from 2.15.2 to 3.8.3 in CI [puppet] - 10https://gerrit.wikimedia.org/r/368459 (https://phabricator.wikimedia.org/T161861) (owner: 10Jforrester) [15:31:35] (03CR) 10Ayounsi: [V: 032 C: 032] contint: upgrade npm from 2.15.2 to 3.8.3 in CI [puppet] - 10https://gerrit.wikimedia.org/r/368459 (https://phabricator.wikimedia.org/T161861) (owner: 10Jforrester) [15:33:48] !log Stop s3 on db1069 - replication stuck [15:33:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:38:05] 10Operations, 10monitoring, 10netops: "MySQL server has gone away" from librenms logs - https://phabricator.wikimedia.org/T171714#3489551 (10ayounsi) I don't see this error in the log files anymore, maybe that was temporary during the service move? [15:38:17] 10Operations, 10ops-ulsfo, 10Traffic: setup/install cp402[5-8].ulsfo.wmnet - https://phabricator.wikimedia.org/T172198#3489552 (10RobH) ``` network ports update: robh@asw-ulsfo# show | compare [edit interfaces xe-1/0/9] - description cp4019.ulsfo.wmnet; + description cp4025; - disable; + enable; [e... [15:39:55] !log stopping pybal on 1002 for impending reboot [15:39:59] heh [15:40:03] !log stopping pybal on lvs1002 for impending reboot [15:40:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:40:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:41:12] 10Operations, 10monitoring, 10User-fgiunchedi: Programmatic generation of grafana dashboards - https://phabricator.wikimedia.org/T171482#3489555 (10fgiunchedi) [15:42:06] !log db1069: Migrate trwiktionary.page from TokuDB to InnoDB [15:42:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:42:58] marostegui: noooo poor toku! 
[15:42:59] :P [15:43:02] haha [15:43:35] (03PS1) 10Andrew Bogott: labs firstboot: use the new puppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/369403 [15:43:54] !log rebooting lvs1002 [15:44:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:44:59] (03PS1) 10RobH: adding cp402[5-8] install params [puppet] - 10https://gerrit.wikimedia.org/r/369404 (https://phabricator.wikimedia.org/T172198) [15:45:06] andrewbogott: s/: /: / :P [15:45:26] we really need to add that test to bd808's commit message checker [15:45:28] (03PS2) 10Andrew Bogott: labs firstboot: use the new puppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/369403 [15:45:34] (03CR) 10RobH: [C: 032] adding cp402[5-8] install params [puppet] - 10https://gerrit.wikimedia.org/r/369404 (https://phabricator.wikimedia.org/T172198) (owner: 10RobH) [15:45:35] Sometimes replication gets totally stuck with toku :( - https://jira.mariadb.org/browse/MDEV-10796 [15:45:39] elukey: ^ [15:47:40] ahhh what a nice one [15:49:01] (03PS2) 10Giuseppe Lavagetto: Use own differ instead of puppet catalog diff [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/369385 [15:49:09] (03PS2) 10Jcrespo: mariadb: Add new python3 script to check the health of a server [puppet] - 10https://gerrit.wikimedia.org/r/369397 (https://phabricator.wikimedia.org/T171928) [15:51:40] (03CR) 10jerkins-bot: [V: 04-1] Use own differ instead of puppet catalog diff [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/369385 (owner: 10Giuseppe Lavagetto) [15:51:50] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Add new python3 script to check the health of a server [puppet] - 10https://gerrit.wikimedia.org/r/369397 (https://phabricator.wikimedia.org/T171928) (owner: 10Jcrespo) [15:52:15] (03PS3) 10Andrew Bogott: labs firstboot: use the new puppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/369403 [15:53:22] PROBLEM - Check Varnish expiry mailbox lag on cp1072 is CRITICAL: CRITICAL: expiry 
mailbox lag is 2067679 [15:53:40] (03CR) 10Andrew Bogott: [C: 032] labs firstboot: use the new puppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/369403 (owner: 10Andrew Bogott) [15:54:45] !log varnish backend restart on cp1072 (mailbox lag) [15:54:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:56:11] (03PS3) 10Jcrespo: mariadb: Add new python3 script to check the health of a server [puppet] - 10https://gerrit.wikimedia.org/r/369397 (https://phabricator.wikimedia.org/T171928) [15:58:10] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Add new python3 script to check the health of a server [puppet] - 10https://gerrit.wikimedia.org/r/369397 (https://phabricator.wikimedia.org/T171928) (owner: 10Jcrespo) [16:00:04] godog, moritzm, and _joe_: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170801T1600). [16:00:05] twentyafterfour and Amir1: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [16:00:12] o/ [16:01:29] (03PS4) 10Jcrespo: mariadb: Add new python3 script to check the health of a server [puppet] - 10https://gerrit.wikimedia.org/r/369397 (https://phabricator.wikimedia.org/T171928) [16:01:34] I'll take a look [16:02:33] jynus: ok to merge this? 
https://gerrit.wikimedia.org/r/#/c/366887/ [16:02:52] (03PS2) 10Filippo Giunchedi: deployment-prep: enable reusable TC on HHVM [puppet] - 10https://gerrit.wikimedia.org/r/364148 (https://phabricator.wikimedia.org/T103886) (owner: 10Giuseppe Lavagetto) [16:02:54] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Add new python3 script to check the health of a server [puppet] - 10https://gerrit.wikimedia.org/r/369397 (https://phabricator.wikimedia.org/T171928) (owner: 10Jcrespo) [16:02:58] godog: ok [16:03:09] (03PS3) 10DCausse: [WIP] Bump version of the ltr plugin [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/364462 [16:03:12] (03PS3) 10Giuseppe Lavagetto: Use own differ instead of puppet catalog diff [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/369385 [16:03:21] RECOVERY - Check Varnish expiry mailbox lag on cp1072 is OK: OK: expiry mailbox lag is 0 [16:03:26] ack, thanks [16:03:26] (03PS4) 10Filippo Giunchedi: mediawiki: increase the batch size of dispatchChanges cronjob [puppet] - 10https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: 10Ladsgroup) [16:04:31] (03CR) 10Filippo Giunchedi: [C: 032] mediawiki: increase the batch size of dispatchChanges cronjob [puppet] - 10https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: 10Ladsgroup) [16:04:31] Thanks jynus [16:05:50] (03CR) 10Jcrespo: "FileExceptionError is a valid exception, is this due to https://bugs.launchpad.net/pyflakes/+bug/1169552 ?" [puppet] - 10https://gerrit.wikimedia.org/r/369397 (https://phabricator.wikimedia.org/T171928) (owner: 10Jcrespo) [16:06:19] _joe_: https://gerrit.wikimedia.org/r/#/c/364148 is up for puppet swat, ok to merge or you want to do it yourself? [16:06:24] Amir1: merged [16:06:35] Thanks [16:06:43] I look grafana [16:13:44] (03CR) 10Bearloga: [C: 031] "Thanks! All the columns are correct & accounted for." 
[puppet] - 10https://gerrit.wikimedia.org/r/369350 (https://phabricator.wikimedia.org/T171622) (owner: 10Mforns) [16:15:44] (03CR) 10Ayounsi: [C: 032] Add codfw frack to Smokeping, Icinga and Rancid [puppet] - 10https://gerrit.wikimedia.org/r/368824 (https://phabricator.wikimedia.org/T171970) (owner: 10Ayounsi) [16:15:50] (03PS2) 10Ayounsi: Add codfw frack to Smokeping, Icinga and Rancid [puppet] - 10https://gerrit.wikimedia.org/r/368824 (https://phabricator.wikimedia.org/T171970) [16:16:04] (03PS5) 10Jcrespo: mariadb: Add new python3 script to check the health of a server [puppet] - 10https://gerrit.wikimedia.org/r/369397 (https://phabricator.wikimedia.org/T171928) [16:17:58] !log restarting elastic on relforge100x servers to pick up new version of the plugins [16:18:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:18:32] (03CR) 10Dzahn: [C: 031] Add codfw frack to Smokeping, Icinga and Rancid [puppet] - 10https://gerrit.wikimedia.org/r/368824 (https://phabricator.wikimedia.org/T171970) (owner: 10Ayounsi) [16:25:14] !log MediaWiki Train: Creating new branch wmf/1.30.0-wmf.12 from master. See T170631 for deployment blockers. [16:25:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:25:24] T170631: 1.30.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T170631 [16:26:53] (03PS1) 10Urbanecm: Assign autopatrol to all holders of autoreview on arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369414 (https://phabricator.wikimedia.org/T167071) [16:27:05] 10Operations, 10ops-ulsfo, 10Traffic: setup/install cp402[5-8].ulsfo.wmnet - https://phabricator.wikimedia.org/T172198#3489707 (10RobH) [16:30:52] twentyafterfour i think you linked the wrong task? [16:31:06] Hello, where's todays Morning SWAT? 
[16:31:13] jouncebot next [16:31:13] In 0 hour(s) and 28 minute(s): Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170801T1700) [16:34:10] Urbanecm i dont think there is a morning swat today for some reason [16:34:46] twentyafterfour: oops, I missed the branch cut...... [16:34:52] (again) [16:35:15] That's the reason I'm asking Zppix :) [16:35:58] twentyafterfour: could I still get something in? [16:36:01] Is there anybody who will be available during the Evening one? [16:36:06] apologies!!!! [16:36:07] (03PS3) 10Giuseppe Lavagetto: deployment-prep: enable reusable TC on HHVM [puppet] - 10https://gerrit.wikimedia.org/r/364148 (https://phabricator.wikimedia.org/T103886) [16:36:18] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] deployment-prep: enable reusable TC on HHVM [puppet] - 10https://gerrit.wikimedia.org/r/364148 (https://phabricator.wikimedia.org/T103886) (owner: 10Giuseppe Lavagetto) [16:39:09] 10Operations, 10ops-ulsfo, 10Traffic: setup/install cp402[5-8].ulsfo.wmnet - https://phabricator.wikimedia.org/T172198#3489738 (10RobH) [16:39:33] PROBLEM - Juniper alarms on pfw3-codfw is CRITICAL: JNX_ALARMS CRITICAL - The requested table is empty or does not exist [16:40:31] 10Operations, 10Traffic: setup/install cp402[5-8].ulsfo.wmnet - https://phabricator.wikimedia.org/T172198#3489494 (10RobH) a:05RobH>03BBlack These are now ready for use, and are calling into puppet with role spare. [16:40:33] PROBLEM - Router interfaces on pfw3-codfw is CRITICAL: CRITICAL: host 208.80.153.197, interfaces up: 41, down: 10, dormant: 0, excluded: 0, unused: 0 [16:40:47] AndyRussG: I can merge a patch as soon as branch finishes [16:41:05] Zppix: yes I linked the wrong task [16:41:09] twentyafterfour: weee thanks so much! [16:41:26] AndyRussG: just tell me which patch you need [16:42:16] twentyafterfour i can fix the log page if you wish? 
[16:42:31] Zppix: sure [16:42:33] Zppix it's on twitter too :) [16:42:36] Zppix: not important, it would be different than the other 2 locations [16:42:51] https://tools.wmflabs.org/sal/production and the task [16:43:18] greg-g i realise that but i help where i can :) [16:43:56] my point is there isn't a need and it could only create confusion [16:44:01] twentyafterfour: this is the Gerrit change for the update to CN's wmf_deploy branch: https://gerrit.wikimedia.org/r/#/c/369416/ [16:44:47] In core, the submodule points to the tip of that branch, which will now be 86153ab94d62 [16:45:26] (twentyafterfour: as soon as the Gerrit change merges, that is...) [16:45:37] thx again!!!! [16:46:32] 10Operations, 10ops-ulsfo, 10Traffic: setup/install cp4022 - https://phabricator.wikimedia.org/T171967#3489756 (10RobH) [16:46:38] 10Operations, 10ops-ulsfo, 10Traffic: setup/install cp4022 - https://phabricator.wikimedia.org/T171967#3481679 (10RobH) 05stalled>03Open [16:46:40] 10Operations, 10ops-ulsfo, 10Traffic, 10Patch-For-Review: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3489758 (10RobH) [16:46:47] AndyRussG: got it [16:47:27] Zppix: greg-g: I posted a comment on the phabricator task mentioning that it was linked to the wrong task ID, in case anyone is further confused. [16:49:22] twentyafterfour: :) [16:51:35] 10Operations, 10ops-ulsfo, 10Traffic: setup/install cp4022 - https://phabricator.wikimedia.org/T171967#3489784 (10RobH) a:05RobH>03BBlack cp4022 ready for use. [16:52:41] 10Operations, 10ops-ulsfo, 10Traffic, 10Patch-For-Review: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3489786 (10RobH) cp402[1-8] are all racked and ready for use. 
[16:55:38] (03CR) 10Giuseppe Lavagetto: [C: 032] Use own differ instead of puppet catalog diff [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/369385 (owner: 10Giuseppe Lavagetto) [16:57:14] (03PS4) 10Paladox: Gerrit: Set auth.userNameToLowerCase [puppet] - 10https://gerrit.wikimedia.org/r/368196 [16:57:18] (03PS3) 10Paladox: Gerrit: Remove ldap user and password from secure.config [puppet] - 10https://gerrit.wikimedia.org/r/366910 [16:58:18] (03PS1) 10Giuseppe Lavagetto: puppet-compiler: bump version again [puppet] - 10https://gerrit.wikimedia.org/r/369421 [16:59:21] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] puppet-compiler: bump version again [puppet] - 10https://gerrit.wikimedia.org/r/369421 (owner: 10Giuseppe Lavagetto) [16:59:49] (03PS31) 10Paladox: Gerrit: Add support for scap [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) [17:00:05] gwicke, cscott, arlolra, subbu, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170801T1700). [17:00:07] * paladox sets status ^^ to rebase [17:00:17] Nothing for ORES today [17:02:17] ACKNOWLEDGEMENT - Router interfaces on pfw3-codfw is CRITICAL: CRITICAL: host 208.80.153.197, interfaces up: 41, down: 10, dormant: 0, excluded: 0, unused: 0: Ayounsi ACK, investigating [17:02:49] (03CR) 10Paladox: "@Chad or @Thcipriani could you re review please? 
:)" [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) (owner: 10Paladox) [17:02:59] no parsoid deploy today [17:04:27] (03PS2) 10Lucas Werkmeister (WMDE): Log 'WikibaseQualityConstraints' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367914 (https://phabricator.wikimedia.org/T171281) [17:04:39] 10Operations, 10monitoring: Monitor hardware thermal issues - https://phabricator.wikimedia.org/T125205#3489844 (10ayounsi) [17:04:42] 10Operations, 10monitoring, 10Patch-For-Review: Several hosts return "internal IPMI error" in the check_ipmi_temp check - https://phabricator.wikimedia.org/T167121#3489842 (10ayounsi) 05Resolved>03Open db2040 is alerting as unknown as well: ``` root@db2040:~# ipmi-sensors ID | Name | Type | Reading |... [17:05:03] (03CR) 10Lucas Werkmeister (WMDE): "Rebased due to conflict with Ib1a761c0a3 (WMDE channel, removed yesterday)." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367914 (https://phabricator.wikimedia.org/T171281) (owner: 10Lucas Werkmeister (WMDE)) [17:05:23] 10Operations, 10ops-ulsfo, 10Traffic, 10Patch-For-Review: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3489850 (10BBlack) Excellent news! I'll try to squeeze in replacing one of the clusters ASAP, which will decom another 6x of the old cp to let us move further. [17:10:01] 10Operations, 10ops-ulsfo, 10Traffic, 10Patch-For-Review: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3489868 (10RobH) [17:11:09] 10Operations, 10ops-ulsfo, 10hardware-requests, 10Patch-For-Review: Decommission cp400[1-4] - https://phabricator.wikimedia.org/T169020#3384648 (10RobH) All of these systems have now been wiped and moved around in the racks in ulsfo. racktables shows their current position, but since wipe and movement in... 
[17:11:13] 10Operations, 10ops-ulsfo, 10hardware-requests: Decommission cp4011, cp4012, cp4019, cp4020 - https://phabricator.wikimedia.org/T167377#3331138 (10RobH) All of these systems have now been wiped and moved around in the racks in ulsfo. racktables shows their current position, but since wipe and movement in th... [17:15:20] 10Operations, 10monitoring, 10Patch-For-Review: Several hosts return "internal IPMI error" in the check_ipmi_temp check - https://phabricator.wikimedia.org/T167121#3489883 (10jcrespo) Should we try restarting it or upgrading firmware? [17:18:44] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] [17:20:24] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1951 bytes in 0.137 second response time [17:21:14] PROBLEM - MariaDB Slave Lag: s5 on db2045 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 612.12 seconds [17:21:24] PROBLEM - MariaDB Slave Lag: s5 on db2059 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 620.68 seconds [17:21:24] PROBLEM - MariaDB Slave Lag: s5 on db2038 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 622.51 seconds [17:21:34] PROBLEM - MariaDB Slave Lag: s5 on db2066 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 631.89 seconds [17:21:45] PROBLEM - MariaDB Slave Lag: s5 on db2052 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 644.88 seconds [17:23:27] !log MediaWiki train for 1.30.0-wmf.12 - finished `scap prep` & `scap patch` refs T168053 [17:23:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:23:39] T168053: 1.30.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T168053 [17:24:14] RECOVERY - MariaDB Slave Lag: s5 on db2045 is OK: OK slave_sql_lag Replication lag: 39.33 seconds [17:24:24] RECOVERY - MariaDB Slave Lag: s5 on db2038 is OK: OK slave_sql_lag 
Replication lag: 0.20 seconds [17:24:25] there seems to be wikidata or dewiki lag issues [17:24:34] RECOVERY - MariaDB Slave Lag: s5 on db2066 is OK: OK slave_sql_lag Replication lag: 0.50 seconds [17:24:49] started on 17:11 [17:24:54] RECOVERY - MariaDB Slave Lag: s5 on db2052 is OK: OK slave_sql_lag Replication lag: 0.50 seconds [17:24:57] anything deployed then? [17:25:15] 10Operations, 10Epic, 10Goal, 10Services (doing): End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939#3489900 (10Eevans) Truncating everything but the 3 Wikipedia Parsoid tables would only net 8.1T, so it does not appear that there is any alternative to cu... [17:25:24] RECOVERY - MariaDB Slave Lag: s5 on db2059 is OK: OK slave_sql_lag Replication lag: 0.44 seconds [17:25:24] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1952 bytes in 0.148 second response time [17:26:15] jynus: Might've been me fixing https://phabricator.wikimedia.org/T104686 [17:26:35] mmm [17:26:45] 10Operations, 10Wikidata, 10Patch-For-Review, 10User-notice, 10Wikimedia-Incident: Wikidata and dewiki databases locked - https://phabricator.wikimedia.org/T171928#3489904 (10Esc3300) Maybe as an action point for (unlikely) future incidents, when Wikidata goes into read-only the subscriptions mentioned a... [17:26:53] But there weren't many edits etc [17:26:55] Reedy: no problem, but please log [17:27:09] https://phabricator.wikimedia.org/T104686#3489523 [17:27:10] :P [17:27:24] I think there was only one wiki with edits [17:27:40] So unless some of the wikis still had a load of say log entries to move... 
[17:27:46] It shouldn't have been changing much [17:28:45] 123K errors on log [17:29:28] Reedy: that is Manuel's comment, not mine [17:29:43] hmm [17:30:02] I do not necessarily agree with him [17:30:09] PROBLEM - MariaDB Slave Lag: s5 on db1049 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1140.07 seconds [17:31:04] Still getting poor performance [17:31:10] On English Wikisource [17:31:43] MergeUser::mergeDatabaseTables [17:31:53] it certainly is that job [17:31:57] That is strange, we have done much bigger renames lately [17:32:02] Without a single issue [17:32:02] ShakespeareFan00 it's most likely your phone line. [17:32:03] 1536 seconds running an update [17:32:05] (hance my comment there) [17:32:08] *hence [17:32:17] paladox: I don't have problems with any other site [17:32:23] on dewiki [17:32:24] I was using YouTube perfectly acceptably [17:32:36] And YouTube is crazy bandwidth [17:32:37] ShakespeareFan00 YouTube uses special caching techniques [17:32:57] paladox: So why doesn't MediaWiki? [17:33:32] ShakespeareFan00: MediaWiki caches too, but YouTube is spread all over the world. [17:33:41] One thing that would help considerably for my editing purposes would be 'diff' based edits...
[17:33:51] ShakespeareFan00: I don't have problems with Wikisource and I'm in the same country :) [17:33:52] So it doesn't have to send the "whole" text block back [17:33:54] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0] [17:34:00] paladox [17:34:01] Hmm [17:34:10] (PS1) Ottomata: Change default druid extension load for 0.9.2 upgrade [puppet] - https://gerrit.wikimedia.org/r/369432 (https://phabricator.wikimedia.org/T170590) [17:34:31] It's only WMF sites that do this though [17:34:39] Other sites don't take nearly as long [17:35:02] (CR) Ottomata: [V: 2 C: 2] Change default druid extension load for 0.9.2 upgrade [puppet] - https://gerrit.wikimedia.org/r/369432 (https://phabricator.wikimedia.org/T170590) (owner: Ottomata) [17:35:09] ShakespeareFan00 try upgrading to Vodafone directly or BT Business. Your provider only provides ADSL [17:35:12] that's slowwww [17:35:24] My speed test says about 9Mbps [17:35:32] Given it's supposed to be 8Mbps [17:35:37] Speed is only one metric [17:35:39] There's latency [17:35:44] <_joe_> ShakespeareFan00: please read https://wikitech.wikimedia.org/wiki/Reporting_a_connectivity_issue and file the appropriate task [17:35:47] There's the route it takes to get to the Wikimedia servers [17:36:05] <_joe_> We have no other reports from Vodafone UK users, btw [17:36:24] Anyway in the past, slow response has typically been down to "throttling" or "filtering" [17:36:39] ShakespeareFan00 that's illegal I think, or I may be wrong [17:36:40] Maybe it's time Commons bit the bullet and DID actually remove the porn XD [17:36:57] paladox: Throttling isn't illegal [17:37:05] It's traffic management [17:37:10] ShakespeareFan00 they signed an agreement with the UK government [17:37:19] BT promised never to throttle [17:37:38] paladox: What BT "said" and what they do are different things..
[17:37:55] You can still "traffic manage" without doing active throttling [17:38:17] (and I will be making some enquiries to Avast about the reliability of their web filter soon anyway) [17:38:37] see https://ec.europa.eu/digital-single-market/en/news/new-rules-roaming-charges-and-open-internet [17:38:40] (PS1) MaxSem: Redo "enable mapframe for euwiki, ptwiki and uawikimedia" [mediawiki-config] - https://gerrit.wikimedia.org/r/369434 [17:38:56] When I was having problems with Imgur, it was directly due to "content filtering" [17:39:04] the rules came in June 2017 [17:39:06] <_joe_> ShakespeareFan00: I'd disable Avast web filtering before testing for reporting the task, btw [17:39:23] _joe_: Naturally [17:39:28] (PS1) BBlack: VCL: block PKP output [puppet] - https://gerrit.wikimedia.org/r/369435 [17:39:33] <_joe_> ShakespeareFan00: just making sure :) [17:39:42] ShakespeareFan00 it's illegal to throttle in the EU according to the new rules that came into force in June 2017 [17:39:45] Operations, Wikidata, Patch-For-Review, User-notice, Wikimedia-Incident: Wikidata and dewiki databases locked - https://phabricator.wikimedia.org/T171928#3489939 (jcrespo) > Wikidata goes into read-only the subscriptions mentioned Yes, definitely some extensions in the past do not behave per... [17:39:48] <_joe_> paladox: please stop spreading FUD about things you clearly are not informed enough about (like Wikimedia's caching infrastructure and/or internet-level networking) [17:39:57] <_joe_> it's not helping [17:39:58] (CR) BBlack: [V: 2 C: 2] VCL: block PKP output [puppet] - https://gerrit.wikimedia.org/r/369435 (owner: BBlack) [17:40:34] _joe_: Does the MediaWiki side throttle for high volume users as an anti-overload measure?
[17:40:37] I wonder why that UPDATE is taking so long...there are no edits there :| [17:40:57] Google for example has specific anti-overload scripts [17:41:05] <_joe_> ShakespeareFan00: the only throttles I think you can conceivably hit would show you an error message, not slowness [17:41:23] _joe_: Thanks [17:41:37] ShakespeareFan00: (and really, anyone else reading) - as a general rule, all the antivirus/malware "web security" software you can get for a PC is a Bad Idea, it will cause more problems and/or active security harm than it solves [17:41:45] <_joe_> I would rather look at your ISP and its peering with us, but this is a shot in the dark tbh [17:41:46] marostegui: I was thinking schema differences [17:41:53] step 1: disable your Avast stuff [17:41:59] but they are all the same [17:42:11] it doesn't help that it is doing a full table scan [17:42:27] bblack: I only have Avast, because certain entities I use to do web stuff for insisted... [17:42:32] *used [17:42:58] http://robert.ocallahan.org/2017/01/disable-your-antivirus-software-except.html [17:43:04] http://robert.ocallahan.org/2017/01/disable-your-antivirus-software-except.html [17:43:07] And in any case I should not be encountering malware editing Wikisource [17:43:09] jynus: That server has been complaining a lot about memory issues lately, too [17:43:12] oops second link should've been: [17:43:17] https://jhalderm.com/pub/papers/interception-ndss17.pdf [17:43:25] that is a 64 million update [17:43:38] even if it only updates 40K rows [17:44:15] jynus: It will update 0 rows (I checked on another server) [17:44:30] then the query is horrible [17:44:43] and either an index or batching should be done [17:44:54] I think it finished now [17:45:09] to be fair, db1049 is going to be decommissioned soon [17:45:15] It finished now [17:45:21] Odd..
[17:45:26] Avast web shield isn't on for me [17:45:31] but it took 10-1000 times more than the other hosts [17:45:42] the most-relevant TL;DR from the second link (the long paper) for this topic is this from the conclusion section: [17:45:45] "We [17:45:46] (03PS1) 10Ayounsi: Icinga: remove Juniper Alarms check as not exposed via SNMP [puppet] - 10https://gerrit.wikimedia.org/r/369436 (https://phabricator.wikimedia.org/T171970) [17:45:46] vulnerabilities (e.g., fail to validate certificates)" [17:45:49] investigated popular antivirus and corporate proxies, finding that [17:45:51] nearly all reduce connection security and that many introduce [17:45:59] jynus: That server has probably an horrible buffer pool too, it has been depooled for months too [17:46:05] I think it also created issues on s4 [17:46:14] bblack: This is the wrong channel for a debate on this [17:46:41] I will for now assume it's a temporary issue [17:46:56] But it's been occuring intermitently for a few days [17:47:02] But only with WMF sites [17:47:16] jynus: As I said, maybe something was done differently for this special case, because we have been renaming users the last few weeks, with a LOT more edits, I am talking about...more than 50k ones [17:47:26] With no issues at all [17:47:36] Thanks for the input people [17:48:00] And for this one, the largest wiki had only 900 edits [17:48:52] Anyone here good with technical stuff [17:48:53] Reedy: did the rename finish all over the place yet? [17:48:53] ? [17:48:57] I had a suggestion [17:49:07] Yeah [17:49:11] ShakespeareFan00: the relevance is we've had other intermittent issue reports in the past (about users with slowness or connectfail) that pointed at AV-like products [17:49:19] RECOVERY - MariaDB Slave Lag: s5 on db1049 is OK: OK slave_sql_lag Replication lag: 10.81 seconds [17:49:34] bblack: It was diasabled... which is puzzling [17:49:38] ShakespeareFan00: ditching such software is kind of step 1 in my mind with any such investigation. 
it's actively harmful anyways, regardless of impact on the specific WMF issue [17:50:02] bblack : Do you ahve any reports of issues with ZoneAlarm [17:50:03] ? [17:50:19] Why do you want to run a software firewall on your desktop? [17:50:22] I used to use Comodo until Avast decided it MUST do things it's own way [17:50:29] I don't even bother trying to figure out which needles in the haystack of AV products happen to not cause a specific issue. It's all bad. [17:51:02] as the ex-mozilla blogger says: just use the windows built-in one and don't install 3rd party AV. [17:51:17] Reedy: I've sometimes had websites that were dead linked pull a fast one and attempt to connect with bad sites [17:51:21] I am also still on XP [17:51:25] oh [17:51:30] so don't have the most recent Windows firewall [17:51:31] so let's rewind from step 1 then [17:51:37] rofl [17:51:51] (And yes I know about the Windows XP is expired, and I am an Idiot...) [17:52:02] step 0: get away from XP very quickly. like, pour gasoline on it, take a few steps back, and light it [17:52:17] (03PS1) 10Bearloga: statistics::discovery: Fix scheduled command [puppet] - 10https://gerrit.wikimedia.org/r/369438 (https://phabricator.wikimedia.org/T170494) [17:52:24] bblack: When I have the budget , I am getting Debian on a new system [17:52:25] ;) [17:52:35] https://wikitech.wikimedia.org/wiki/HTTPS/Browser_Recommendations#For_Users_of_Microsoft_Windows [17:53:00] also, we're working on (but haven't yet finalized dates) for plans that will make MSIE-on-XP fail to connect to us at all, probably before the end of the year [17:53:19] Chrome or FF on XP will still work, but is still a worse idea than ditching XP altogether [17:53:22] (03PS2) 10Debt: Redo "enable mapframe for euwiki, ptwiki and uawikimedia" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369434 (https://phabricator.wikimedia.org/T167619) (owner: 10MaxSem) [17:53:27] ottomata gehel: the tiniest patch awaits a +2 :) please and thank you [17:53:31] 
ShakespeareFan00: yes, I think the last time this happened, we said it's a Windows XP issue, and even if your installation isn't completely borked (how long since it was cleanly installed ?) the software suppliers won't have bug tested their software on XP, so unintended issues will be common. [17:53:36] bblack: Do you have a phase out for Firefox as well? [17:53:42] (03CR) 10Debt: [C: 031] "Looks good!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369434 (https://phabricator.wikimedia.org/T167619) (owner: 10MaxSem) [17:53:44] (03CR) 10Chad: "So, I'm not sure if we should be reusing the existing SSH keypair that we have...we never really sorted that question." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) (owner: 10Paladox) [17:53:48] Old FF maybe [17:53:57] Given that Mozilla's formally announced they will drop XP support soon [17:54:06] no, if you have the latest (well last available real release before they stop supporting it for new versions) Firefox or Chrome for XP, they will continue to work for quite far into the future [17:54:32] bblack: That was a long term suggestion though [17:54:36] as they support fairly modern crypto. 
in the long run on XP, FF will outlive Chrome eventually (as the last Chrome for XP only supports RSA certs, and FF supports ECDSA) [17:55:00] (03CR) 10Ayounsi: [C: 032] Icinga: remove Juniper Alarms check as not exposed via SNMP [puppet] - 10https://gerrit.wikimedia.org/r/369436 (https://phabricator.wikimedia.org/T171970) (owner: 10Ayounsi) [17:55:09] bearloga: i'm off today, but can have a look in ~1h if you don't find anyone [17:55:16] (03PS2) 10Ayounsi: Icinga: remove Juniper Alarms check as not exposed via SNMP [puppet] - 10https://gerrit.wikimedia.org/r/369436 (https://phabricator.wikimedia.org/T171970) [17:55:20] bblack: Nothing I do on Wikimedia sites technically "needs" crypto anyway [17:55:23] Other than logins [17:55:29] (03PS32) 10Paladox: Gerrit: Add support for scap [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) [17:55:37] (03CR) 10Paladox: Gerrit: Add support for scap (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) (owner: 10Paladox) [17:55:38] well unfortunately our policy disagrees with that opinion :) [17:56:08] https Isn't any event secure against snooping anyway [17:56:23] If someones determined enough... [17:56:31] Anyway... [17:56:35] On something else... [17:56:46] the most minimal and defensible version of the argument goes something like: we require and enforce strong crypto on all connections to us, because there are definitely people in the world who need it with us, and if we don't force it, the attackers can downgrade those peoples' connections to insecure ones [17:57:02] Currently when saving an edit, it saves the entire block [17:57:28] https is fairly robust against mass surveillance. nothing is robust against targeted and well-financed surveillance of a specific individual. 
[17:57:45] debt, there's no morning SWAT today [17:57:45] It's a shame there isn't a technical means to implement "diff" based saves, so that you don't have to send back 1K blocks when all you've changed is 4-5 characters as a typo [17:57:46] I mean eventually they'll just send someone into your house and physically compromise systems, keyboards, etc [17:58:08] MaxSem: so, 4pm PT today then? [17:58:12] we can only do what we can, which is prevent easy remote mass surveillance [17:58:15] yup [17:58:21] MaxSem: coolio. see ya then! [17:58:35] I will ask my question about "diff" based saves in #mediawiki [17:59:21] Diff based saves? You'd still have to send the whole text to the server to diff it. [17:59:25] (PS2) Bearloga: statistics::discovery: Fix scheduled command [puppet] - https://gerrit.wikimedia.org/r/369438 (https://phabricator.wikimedia.org/T170494) [17:59:51] Rainbow sparkles: The client has a copy [18:00:03] You'd make a diff in the client when saving... [18:00:04] Diffing client-side sounds ugly. [18:00:15] Not impossible though? [18:00:52] I wouldn't trust it. The edge cases could result in bad text being submitted. [18:01:04] That's fair [18:01:19] I mean not impossible, but between various browsers & such you'd have a ton of competing implementations. Someone having a busted JS implementation could result in bad diffs. [18:01:27] And then my innocent behavior busts your work.
[18:02:33] It's an interesting idea in theory, but not really practical imho [18:03:01] The other related concern was to do "real-time" collaborative edits you can see [18:03:14] without the "edit conflict" warnings you get currently [18:03:30] Somehow Google Docs manages to have a "Refresh" and real-time view [18:04:48] Google also has a lot more engineers and money than us :) [18:04:57] But yeah [18:04:57] The future will be awesome :D [18:05:22] Wikis are supposed to be collaborative [18:05:40] It's a shame that at present they need you to form an orderly queue when editing [18:05:42] ;) [18:05:57] Rather than you being able to see the other edits being made as you edit in real time. [18:06:27] Of course. Hard problems to solve :) [18:07:03] Diff based "transactions" when saving would help "real-time" as you could have some kind of journaling [18:07:19] which is how some file systems now work? [18:08:30] I suppose [18:08:32] :) [18:09:40] (PS5) Dzahn: phab1001: Allow listen_address to be empty for migration from iridium [puppet] - https://gerrit.wikimedia.org/r/369001 (owner: 20after4) [18:13:15] (CR) Dzahn: [C: 2] phab1001: Allow listen_address to be empty for migration from iridium (1 comment) [puppet] - https://gerrit.wikimedia.org/r/369001 (owner: 20after4) [18:14:12] (PS6) Dzahn: phabricator: Allow listen_address to be empty for migration from iridium [puppet] - https://gerrit.wikimedia.org/r/369001 (https://phabricator.wikimedia.org/T163938) (owner: 20after4) [18:14:27] (CR) 20after4: [C: 1] phabricator: Allow listen_address to be empty for migration from iridium (1 comment) [puppet] - https://gerrit.wikimedia.org/r/369001 (https://phabricator.wikimedia.org/T163938) (owner: 20after4) [18:18:21] (PS1) Paladox: Revert "phabricator: make phab1001 use role::spare for now" [puppet] - https://gerrit.wikimedia.org/r/369444 [18:18:37] (03CR) 10jerkins-bot: [V: 04-1] Revert "phabricator: make phab1001 use role::spare
for now" [puppet] - 10https://gerrit.wikimedia.org/r/369444 (owner: 10Paladox) [18:19:54] PROBLEM - Restbase root url on restbase-dev1004 is CRITICAL: connect to address 10.64.0.89 and port 7231: Connection refused [18:20:22] ^^^^ known [18:20:34] PROBLEM - Check systemd state on restbase-dev1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [18:20:43] (03PS2) 10Paladox: Revert "phabricator: make phab1001 use role::spare for now" [puppet] - 10https://gerrit.wikimedia.org/r/369444 [18:21:34] RECOVERY - Check systemd state on restbase-dev1004 is OK: OK - running: The system is fully operational [18:22:06] (03PS3) 10Dzahn: Revert "phabricator: make phab1001 use role::spare for now" [puppet] - 10https://gerrit.wikimedia.org/r/369444 (https://phabricator.wikimedia.org/T163938) (owner: 10Paladox) [18:23:28] (03PS4) 10Paladox: phabricator: make phab1001 use role::phabricator_server [puppet] - 10https://gerrit.wikimedia.org/r/369444 (https://phabricator.wikimedia.org/T163938) [18:24:04] (03CR) 10Dzahn: [C: 032] "yep, this can be done now after https://gerrit.wikimedia.org/r/#/c/369001/" [puppet] - 10https://gerrit.wikimedia.org/r/369444 (https://phabricator.wikimedia.org/T163938) (owner: 10Paladox) [18:24:06] (03CR) 10Dzahn: [C: 032] phabricator: make phab1001 use role::phabricator_server [puppet] - 10https://gerrit.wikimedia.org/r/369444 (https://phabricator.wikimedia.org/T163938) (owner: 10Paladox) [18:26:48] (03PS1) 10Dzahn: Revert "phabricator/admins: give phab admins access to phab1001" [puppet] - 10https://gerrit.wikimedia.org/r/369445 [18:26:50] (03PS3) 10Ottomata: statistics::discovery: Fix scheduled command [puppet] - 10https://gerrit.wikimedia.org/r/369438 (https://phabricator.wikimedia.org/T170494) (owner: 10Bearloga) [18:26:52] (03CR) 10Ottomata: [V: 032 C: 032] statistics::discovery: Fix scheduled command [puppet] - 10https://gerrit.wikimedia.org/r/369438 (https://phabricator.wikimedia.org/T170494) (owner: 
10Bearloga) [18:27:17] ottomata: thank you! :D [18:27:34] yw! [18:27:46] (03PS1) 10Mobrovac: RESTBase: Add the Recommendation API URI [puppet] - 10https://gerrit.wikimedia.org/r/369446 (https://phabricator.wikimedia.org/T170877) [18:27:48] (03CR) 10Paladox: [C: 031] "Was fixed by doing https://gerrit.wikimedia.org/r/#/c/369444/ so this is now unneeded +1" [puppet] - 10https://gerrit.wikimedia.org/r/369445 (owner: 10Dzahn) [18:28:54] RECOVERY - Restbase root url on restbase-dev1004 is OK: HTTP OK: HTTP/1.1 200 - 15600 bytes in 0.014 second response time [18:29:05] (03CR) 10Dzahn: "yep, exactly. i also wonder about the other 2 lines in this file" [puppet] - 10https://gerrit.wikimedia.org/r/369445 (owner: 10Dzahn) [18:29:54] PROBLEM - Check systemd state on phab1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [18:30:05] icinga-wm: try again [18:30:29] (03CR) 10Paladox: [C: 031] "Ah phabricator::logmail::ensure: i guess is either the mail or something to do with logs. where as the line under it is the metric emails" [puppet] - 10https://gerrit.wikimedia.org/r/369445 (owner: 10Dzahn) [18:30:54] RECOVERY - Check systemd state on phab1001 is OK: OK - running: The system is fully operational [18:30:58] (03CR) 10Dzahn: "yea, whatever it is, but don't we want it ENABLED tomorrow?" 
[puppet] - 10https://gerrit.wikimedia.org/r/369445 (owner: 10Dzahn) [18:31:28] mutante ^^ yep [18:31:29] yea, phab::logmail is the mail to admins with stats [18:31:47] yep [18:32:36] (03PS3) 10Dzahn: Revert "phabricator/admins: give phab admins access to phab1001" [puppet] - 10https://gerrit.wikimedia.org/r/369445 [18:32:56] (03PS4) 10Dzahn: Revert "phabricator/admins: give phab admins access to phab1001" [puppet] - 10https://gerrit.wikimedia.org/r/369445 (https://phabricator.wikimedia.org/T163938) [18:33:06] (03PS1) 10Dzahn: phabricator: enable stats mail and dumps on phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/369447 (https://phabricator.wikimedia.org/T163938) [18:34:27] (03CR) 10Paladox: [C: 031] phabricator: enable stats mail and dumps on phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/369447 (https://phabricator.wikimedia.org/T163938) (owner: 10Dzahn) [18:37:19] (03CR) 10Dzahn: [C: 032] Revert "phabricator/admins: give phab admins access to phab1001" [puppet] - 10https://gerrit.wikimedia.org/r/369445 (https://phabricator.wikimedia.org/T163938) (owner: 10Dzahn) [18:38:09] 10Operations, 10Android-app-feature-Compilations, 10Traffic, 10Wikipedia-Android-App-Backlog, 10Reading-Infrastructure-Team-Backlog (Kanban): Determine URL paths for Zim files - https://phabricator.wikimedia.org/T172148#3490229 (10Fjalapeno) @fgiunchedi We can keep a separate database / list of all the Z... 
[18:38:15] (03PS1) 10Reedy: OOUIHTMLForm does not support the 'cols' parameter for textareas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369448 (https://phabricator.wikimedia.org/T172199) [18:39:36] (03PS2) 10Dzahn: phabricator: enable stats mail and dumps on phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/369447 (https://phabricator.wikimedia.org/T163938) [18:40:32] (03PS2) 10Reedy: OOUIHTMLForm does not support the 'cols' parameter for textareas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369448 (https://phabricator.wikimedia.org/T172199) [18:40:42] (03CR) 10Reedy: "PS2 removes displayformat (noop) attributes too" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369448 (https://phabricator.wikimedia.org/T172199) (owner: 10Reedy) [18:40:48] (03PS3) 10Dzahn: phabricator: enable stats mail and dumps on phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/369447 (https://phabricator.wikimedia.org/T163938) [18:41:10] (03CR) 10Dzahn: [C: 032] phabricator: enable stats mail and dumps on phab1001 [puppet] - 10https://gerrit.wikimedia.org/r/369447 (https://phabricator.wikimedia.org/T163938) (owner: 10Dzahn) [18:41:38] (03PS1) 10Urbanecm: Rename Wikinews namespace to Wikinieuws on nl.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369449 (https://phabricator.wikimedia.org/T172211) [18:41:50] (03CR) 10Mobrovac: [C: 031] "PCC OK - https://puppet-compiler.wmflabs.org/compiler02/7250/" [puppet] - 10https://gerrit.wikimedia.org/r/369446 (https://phabricator.wikimedia.org/T170877) (owner: 10Mobrovac) [18:42:34] !log twentyafterfour@tin Started deploy [phabricator/deployment@3d728e1]: (no justification provided) [18:42:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:43:22] !log twentyafterfour@tin Finished deploy [phabricator/deployment@3d728e1]: (no justification provided) (duration: 00m 48s) [18:43:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:44:11] (03PS13) 
10Urbanecm: Initial configuration for hiwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368165 (https://phabricator.wikimedia.org/T168765) [18:44:15] (03PS6) 10Urbanecm: Initial configuration for wikimania2018wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368168 (https://phabricator.wikimedia.org/T155038) [18:44:17] (03PS2) 10Urbanecm: Make wikiquote.png equivalent to enwikiquote.png [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368244 (https://phabricator.wikimedia.org/T171887) [18:47:39] !log twentyafterfour@tin Started deploy [phabricator/deployment@3d728e1]: testing phab1001 deployment [18:47:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:47:55] !log twentyafterfour@tin Finished deploy [phabricator/deployment@3d728e1]: testing phab1001 deployment (duration: 00m 16s) [18:48:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:49:03] (03CR) 10Thcipriani: "> So, I'm not sure if we should be reusing the existing SSH keypair that we have...we never really sorted that question." [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) (owner: 10Paladox) [18:49:18] !log twentyafterfour@tin Started deploy [phabricator/deployment@3d728e1]: testing phab1001 deployment [18:49:20] !log twentyafterfour@tin Finished deploy [phabricator/deployment@3d728e1]: testing phab1001 deployment (duration: 00m 02s) [18:49:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:49:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:50:02] PROBLEM - Check systemd state on phab1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[18:50:14] twentyafterfour ^^ [18:50:19] i'm guessing that's phd [18:50:53] yes, it is [18:50:54] ● phd.service loaded failed failed phabricator-phd [18:51:07] phd shouldn't be running though [18:51:10] that's kind of normal until now [18:51:11] yea [18:51:15] yep [18:51:27] we'll just ACK that until tomorrow [18:52:07] ACKNOWLEDGEMENT - Check systemd state on phab1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn https://phabricator.wikimedia.org/T163938 [18:52:15] (03CR) 10Chad: "In which case, we'll need to do the following:" [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) (owner: 10Paladox) [18:53:22] RainbowSprinkles i'm guessing you're going to generate the new key? :) [18:53:44] Probably Daniel should, since the private one needs to go in a repo I can't touch :p [18:53:47] (or another opsen) [18:53:57] ok [18:54:19] RainbowSprinkles about restoring ssh::userkey{} call, won't that fail? [18:54:22] since scap does it [18:56:17] We'll need two keys [18:56:22] The one that's there now is for replication [18:56:25] The other one will be for scap [18:56:33] i think it's good to have separate keys, yep [18:56:40] (they have to have different names, which they should based on the $title?) [18:56:40] and i can make a new one [18:57:48] about admin groups i would ask what "assist in managing gerrit server" typically means and then either remove the whole admin group or give it meaningful sudo privileges lines, like typically: start/stop/restart service and view log files [18:58:43] (03Draft1) 10Paladox: gerrit: Merge gerrit-admin into gerrit-root [puppet] - 10https://gerrit.wikimedia.org/r/369452 [18:58:46] (03PS2) 10Paladox: gerrit: Merge gerrit-admin into gerrit-root [puppet] - 10https://gerrit.wikimedia.org/r/369452 [18:58:53] i wouldn't recommend just moving the entire admin group to roots without checking what is actually needed [18:58:56] mutante: Yeah.
Basically all they can do is read log files right now [18:59:15] (03CR) 10Luke081515: [C: 031] Assign autopatrol to all holders of autoreview on arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369414 (https://phabricator.wikimedia.org/T167071) (owner: 10Urbanecm) [18:59:52] RainbowSprinkles mutante done https://gerrit.wikimedia.org/r/369452 :) [18:59:54] (03CR) 10Luke081515: [C: 031] Make wikiquote.png equivalent to enwikiquote.png [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368244 (https://phabricator.wikimedia.org/T171887) (owner: 10Urbanecm) [19:00:02] most of them are releng anyways [19:00:05] twentyafterfour: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170801T1900). Please do the needful. [19:00:07] Actually, let's just drop the group [19:00:14] done [19:00:14] I don't think Roan needs it (he already is a root anyway) [19:00:14] (03CR) 10Luke081515: [C: 031] Optimalize all PNGs in this repo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368423 (https://phabricator.wikimedia.org/T170569) (owner: 10Urbanecm) [19:00:39] RainbowSprinkles so i remove catrope from gerrit-root? [19:00:49] Yeah, just remove him and Christian for now, make it releng-only [19:00:54] ok [19:00:59] I don't think either really need it (and the former already can get it himself) [19:01:18] (03CR) 10Luke081515: [C: 031] Rename Wikinews namespace to Wikinieuws on nl.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369449 (https://phabricator.wikimedia.org/T172211) (owner: 10Urbanecm) [19:01:25] done [19:01:25] (03PS3) 10Paladox: gerrit: Merge gerrit-admin into gerrit-root [puppet] - 10https://gerrit.wikimedia.org/r/369452 [19:01:27] (03CR) 10Dzahn: [C: 04-1] "the "normal" way is giving admins the sudo privileges to start/stop/restart the service and read logs. 
the trend should be towards more fi" [puppet] - 10https://gerrit.wikimedia.org/r/369452 (owner: 10Paladox) [19:02:42] Er, wait, I misread that patch. [19:02:43] Hmm [19:02:54] This is escalating access for releng users... [19:03:05] !log twentyafterfour@tin Started scap: Sync 1.30.0-wmf.12 and build l10n refs T168053 [19:03:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:17] T168053: 1.30.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T168053 [19:03:25] yep merging gerrit-admin into gerrit-root :) [19:03:46] Meh, let's not tackle this just yet, bad diversion [19:03:51] ok [19:03:58] Daniel's right, we should be more fine-grained here. [19:04:02] * RainbowSprinkles will figure that later [19:04:31] ok [19:04:59] (03PS1) 10Ayounsi: Icinga: remove juniper alerts service check if set to false [puppet] - 10https://gerrit.wikimedia.org/r/369454 [19:05:34] (03CR) 10Dzahn: [C: 031] Icinga: remove juniper alerts service check if set to false [puppet] - 10https://gerrit.wikimedia.org/r/369454 (owner: 10Ayounsi) [19:06:21] RainbowSprinkles re reading i now get how to do the ssh:: call. We will use a different user for the scap deploy. [19:06:25] Or am i wrong :) [19:06:36] Same user, different key [19:06:44] Users can have multiple acceptable ssh keys :) [19:07:07] oh [19:07:15] RainbowSprinkles: I am not a root, am I? [19:07:25] i thought we removed ssh::userkey due to some puppet error [19:07:28] You are, but Imma remove you :p [19:07:29] with scap [19:07:38] paladox: Because we were trying to reuse the same key twice. 
[19:07:40] :) [19:07:44] oh [19:07:45] i see [19:07:46] heh [19:08:03] * paladox will wait for the new code then update the patch to use two keys [19:08:24] code = key [19:11:22] (03PS1) 10Dzahn: gerrit/admins: remove catrope from Gerrit roots [puppet] - 10https://gerrit.wikimedia.org/r/369456 [19:13:06] (03PS4) 10Chad: Gerrit: clean up server permissions [puppet] - 10https://gerrit.wikimedia.org/r/369452 (owner: 10Paladox) [19:13:14] mutante: I hijacked the other change ^ [19:13:25] ah, ok [19:14:00] :) [19:14:06] (03CR) 10Jforrester: [C: 031] "Oops, sorry. LGTM." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369448 (https://phabricator.wikimedia.org/T172199) (owner: 10Reedy) [19:14:53] RainbowSprinkles: that is the only admin group for qchris, so it means revoking his shell access.. is he really not using it ? [19:15:13] I mean that's a question for him, but I don't think he's used it in well over a year :) [19:15:35] qchris_ might be about.... :p [19:15:37] (yes that was a ping) [19:15:44] qchris_: :) hi [19:16:20] Apr 13 2017 [19:16:27] 10Operations, 10Android-app-feature-Compilations, 10Reading-Infrastructure-Team-Backlog, 10Traffic, 10Wikipedia-Android-App-Backlog: Determine how to upload Zim files to Swift infrastructure - https://phabricator.wikimedia.org/T172123#3490510 (10Mholloway) Hey @fgiunchedi, > > 1. For production swift ac... [19:16:41] Ah, ok. 
We can leave him for now then :) [19:17:05] Amending [19:17:29] ok [19:17:51] (03PS5) 10Chad: Gerrit: clean up server permissions [puppet] - 10https://gerrit.wikimedia.org/r/369452 (owner: 10Paladox) [19:17:59] (03Abandoned) 10Chad: gerrit/admins: remove catrope from Gerrit roots [puppet] - 10https://gerrit.wikimedia.org/r/369456 (owner: 10Dzahn) [19:18:09] (killed your change, sorry :p) [19:18:43] (03PS1) 10Ottomata: New Oliver Keyes key [puppet] - 10https://gerrit.wikimedia.org/r/369457 (https://phabricator.wikimedia.org/T171696) [19:18:45] (03CR) 10Dzahn: [C: 032] Gerrit: clean up server permissions [puppet] - 10https://gerrit.wikimedia.org/r/369452 (owner: 10Paladox) [19:18:57] no problem, i was going to click the same button :) [19:19:12] !log restarting Cassandra, restbase1004-a.eqiad.wmnet, aberrant read latency [19:19:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:19:31] RoanKattouw: I assume if you actually needed gerrit server access you could just DIY :p [19:21:29] oo mutante, merging, ok? [19:21:33] (03CR) 10Ottomata: [V: 032 C: 032] New Oliver Keyes key [puppet] - 10https://gerrit.wikimedia.org/r/369457 (https://phabricator.wikimedia.org/T171696) (owner: 10Ottomata) [19:22:31] mutante: ? [19:22:40] merged. [19:22:54] yea, that's fine [19:23:37] (03PS6) 10Dzahn: Gerrit: clean up server permissions [puppet] - 10https://gerrit.wikimedia.org/r/369452 (owner: 10Paladox) [19:24:03] actually, that wasn't merged yet [19:24:52] 10Operations, 10Android-app-feature-Compilations, 10Traffic, 10Wikipedia-Android-App-Backlog, 10Reading-Infrastructure-Team-Backlog (Kanban): Determine URL paths for Zim files - https://phabricator.wikimedia.org/T172148#3490549 (10Mholloway) Chatted with @Fjalapeno about this. Sounds like we'll end up k...
[19:26:03] PROBLEM - HHVM rendering on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:27:02] RECOVERY - HHVM rendering on mw1201 is OK: HTTP OK: HTTP/1.1 200 OK - 75615 bytes in 2.418 second response time [19:32:04] mutante, RainbowSprinkles: Agreed. I have not used shell access lately. Feel free to remove my access. If I run into a situation that I cannot debug/fix without shell access, I'll just ask for assistance :-) [19:32:28] You sure? It's harmless to leave it in place. Was just doing some spring cleaning :) [19:33:04] !log twentyafterfour@tin Finished scap: Sync 1.30.0-wmf.12 and build l10n refs T168053 (duration: 29m 58s) [19:33:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:33:14] T168053: 1.30.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T168053 [19:33:15] I do not mind having access. Doing a quick check of the logs was useful in the past. [19:33:31] But ops are helpful :-) [19:33:37] Either way is fine by me. [19:34:28] keep it as it is then :) [19:35:37] Ok. Cool. Thanks. [19:37:14] 10Operations, 10Epic, 10Goal, 10Services (doing): End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939#3413888 (10mobrovac) How about putting VE in maintenance mode for 30 minutes and truncating all of the Parsoid tables? How much space would we gain by tru... [19:38:23] mutante: So quick patch to restore his? [19:38:28] 10Operations, 10Epic, 10Goal, 10Services (doing): End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939#3413888 (10Pchelolo) What we could do to safely truncate HTML tables is to do the following: Step 1: instead of just reading from the HTML/Data-Parsoid t... 
[19:39:16] RainbowSprinkles: If I needed Gerrit server access I would harass you until I stopped needing it :P [19:39:47] * RainbowSprinkles adds RoanKattouw to his /ignore list :p [19:42:01] 10Operations, 10Epic, 10Goal, 10Services (doing): End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939#3490607 (10mobrovac) Initially I started laying out an algo similar to yours, @Pchelolo, but soon enough realised the intensity of labour needed. Given th... [19:43:42] 10Operations, 10Epic, 10Goal, 10Services (doing): End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939#3490610 (10Pchelolo) @mobrovac Let's create a subtask and invite VE people to discuss the possibility of the maintenance window? [20:04:43] (03CR) 10Ayounsi: [C: 032] Icinga: remove juniper alerts service check if set to false [puppet] - 10https://gerrit.wikimedia.org/r/369454 (owner: 10Ayounsi) [20:04:49] (03PS2) 10Ayounsi: Icinga: remove juniper alerts service check if set to false [puppet] - 10https://gerrit.wikimedia.org/r/369454 [20:05:50] (03PS1) 10Jdlrobson: Update wordmark for Wikipedia Atikamekw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369464 (https://phabricator.wikimedia.org/T168203) [20:07:08] 10Operations, 10Wikimania-Hackathon-2017-Organization: Urgent - Wikimania needs hosting on a server for onsite conference guide - https://phabricator.wikimedia.org/T172217#3490714 (10Reedy) [20:08:55] RainbowSprinkles: but you amended it so he was never removed [20:09:00] i think it's all good [20:09:08] (03PS3) 10Jdlrobson: Add new mobile watermark for Urdu Wikipedia. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/368444 (https://phabricator.wikimedia.org/T171769) (owner: 10Reception123) [20:11:47] mutante: Oh dur [20:11:49] You're right [20:12:42] 10Operations, 10Android-app-feature-Compilations, 10Reading-Infrastructure-Team-Backlog, 10Wikipedia-Android-App-Backlog (Android-app-release-v2.7.20x-Dookie💩): Add libgumbo-dev to apt.wikimedia.org - https://phabricator.wikimedia.org/T172218#3490717 (10Mholloway) [20:13:48] 10Operations, 10Wikimania-Hackathon-2017-Organization: Urgent - Wikimania needs hosting on a server for onsite conference guide - https://phabricator.wikimedia.org/T172217#3490687 (10Dzahn) Where is the code please? is it in a public repo? it would need a security review before it can get on any production in... [20:14:02] 10Operations, 10Wikimania-Hackathon-2017-Organization: Wikimania needs hosting on a server for onsite conference guide - https://phabricator.wikimedia.org/T172217#3490734 (10Reedy) p:05Triage>03High [20:14:33] 10Operations, 10Wikimania-Hackathon-2017-Organization: Wikimania needs hosting on a server for onsite conference guide - https://phabricator.wikimedia.org/T172217#3490736 (10Dzahn) This is _really_ short notice for something like this, i'm afraid there is not enough time to get this properly reviewed and into... 
[20:15:59] 10Operations, 10Android-app-feature-Compilations, 10Packaging, 10Reading-Infrastructure-Team-Backlog, 10Wikipedia-Android-App-Backlog (Android-app-release-v2.7.20x-Dookie💩): Add libgumbo-dev to apt.wikimedia.org - https://phabricator.wikimedia.org/T172218#3490737 (10Mholloway) [20:16:01] 10Operations, 10Wikimania-Hackathon-2017-Organization, 10Release-Engineering-Team (Watching / External): Wikimania needs hosting on a server for onsite conference guide - https://phabricator.wikimedia.org/T172217#3490738 (10demon) [20:17:02] 10Operations, 10Wikimania-Hackathon-2017-Organization, 10Release-Engineering-Team (Watching / External): Wikimania needs hosting on a server for onsite conference guide - https://phabricator.wikimedia.org/T172217#3490741 (10Dzahn) > I have a technical person who is in charge Is that person here on Phabrica... [20:17:11] 10Operations, 10Wikimania-Hackathon-2017-Organization, 10Release-Engineering-Team (Watching / External): Wikimania needs hosting on a server for onsite conference guide - https://phabricator.wikimedia.org/T172217#3490742 (10demon) a:05bbogaert>03None [20:26:04] (03PS33) 10Paladox: Gerrit: Add support for scap [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) [20:27:25] 10Operations, 10Wikimania-Hackathon-2017-Organization, 10Release-Engineering-Team (Watching / External): Wikimania needs hosting on a server for onsite conference guide - https://phabricator.wikimedia.org/T172217#3490755 (10Dzahn) Hey, so from Operations point of view.. it's a little short notice but we can... 
[20:28:12] 10Operations, 10Android-app-feature-Compilations, 10Packaging, 10Reading-Infrastructure-Team-Backlog, 10Wikipedia-Android-App-Backlog (Android-app-release-v2.7.20x-Dookie💩): Add libgumbo-dev to apt.wikimedia.org - https://phabricator.wikimedia.org/T172218#3490757 (10Mholloway) [20:29:17] 10Operations, 10Android-app-feature-Compilations, 10Packaging, 10Reading-Infrastructure-Team-Backlog, 10Wikipedia-Android-App-Backlog (Android-app-release-v2.7.20x-Dookie💩): Add libgumbo-dev to apt.wikimedia.org - https://phabricator.wikimedia.org/T172218#3490717 (10Mholloway) a:05Mholloway>03None [20:30:41] 10Operations, 10Android-app-feature-Compilations, 10Packaging, 10Reading-Infrastructure-Team-Backlog, 10Wikipedia-Android-App-Backlog (Android-app-release-v2.7.20x-Dookie💩): Add libgumbo-dev to apt.wikimedia.org - https://phabricator.wikimedia.org/T172218#3490777 (10Mholloway) [20:33:59] RainbowSprinkles what will the new ssh key be used for? [20:34:11] is the old one being used for replication [20:34:14] and the new one for scap? [20:34:43] Yes [20:36:04] thanks [20:37:28] (03PS1) 1020after4: group0 wikis to 1.30.0-wmf.12 refs refs T168053 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369486 [20:37:28] (03CR) 1020after4: [C: 032] group0 wikis to 1.30.0-wmf.12 refs refs T168053 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369486 (owner: 1020after4) [20:37:47] 10Operations, 10Android-app-feature-Compilations, 10Packaging, 10Reading-Infrastructure-Team-Backlog, 10Wikipedia-Android-App-Backlog (Android-app-release-v2.7.20x-Dookie💩): Unable to locate package libgumbo-dev - https://phabricator.wikimedia.org/T172218#3490787 (10Mholloway) [20:37:54] !!! 
[20:39:07] 10Operations, 10Android-app-feature-Compilations, 10Packaging, 10Reading-Infrastructure-Team-Backlog, 10Wikipedia-Android-App-Backlog (Android-app-release-v2.7.20x-Dookie💩): Unable to locate package libgumbo-dev - https://phabricator.wikimedia.org/T172218#3490717 (10Mholloway) [20:39:22] (03Merged) 10jenkins-bot: group0 wikis to 1.30.0-wmf.12 refs refs T168053 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369486 (owner: 1020after4) [20:39:30] (03CR) 10jenkins-bot: group0 wikis to 1.30.0-wmf.12 refs refs T168053 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369486 (owner: 1020after4) [20:39:36] ok, i am making a new deployment key for gerrit [20:39:46] so that is secrets/keyholder/ [20:39:53] unlike the other key [20:40:32] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 wikis to 1.30.0-wmf.12 refs refs T168053 [20:40:39] (03PS1) 10Jdlrobson: Disable page previews on a variety of pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369494 (https://phabricator.wikimedia.org/T170893) [20:40:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:40:44] T168053: 1.30.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T168053 [20:40:58] refs refs? 
[20:41:38] (03CR) 10jerkins-bot: [V: 04-1] Disable page previews on a variety of pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369494 (https://phabricator.wikimedia.org/T170893) (owner: 10Jdlrobson) [20:44:04] RainbowSprinkles ^^ [20:44:06] (03PS1) 10Jdlrobson: Exclude files from Special:ShortPages on commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369503 (https://phabricator.wikimedia.org/T170687) [20:44:17] create mode 100644 modules/secret/secrets/keyholder/gerrit [20:44:19] create mode 100644 modules/secret/secrets/keyholder/gerrit.pub [20:44:42] there it is, puppet should be able to use it now [20:44:50] Does this https://gerrit.wikimedia.org/r/#/c/363726/ look correct now? [20:45:27] eh, you are adding a new pubkey? [20:45:47] mutante nope i thought i needed to create a copy of it for it to work [20:45:47] no, that's not the one i just created [20:45:48] (03PS34) 10Paladox: Gerrit: Add support for scap [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) [20:45:57] well, here's the thing [20:45:58] but looks like it will work with what you just did so amended :) [20:46:06] i did say that i think pub key should be in pub repo [20:46:17] but then you guys showed me how that's not the case for any existing keypair [20:46:28] and i said i don't want to make you change them all [20:46:33] ok [20:46:55] of course i can add the new pub key in pub repo [20:47:09] but then it has to be the one i made and not the one you made [20:47:35] ok [20:47:42] mutante what you did should work [20:47:42] :) [20:48:32] (03CR) 10Paladox: [C: 031] "Tested and works" [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) (owner: 10Paladox) [20:49:47] RainbowSprinkles added back ssh:: param and works locally at least :) [20:49:49] is it gerrit/gerrit? [20:49:53] not gerrit/deploy?
[20:50:00] gerrit/gerrit [20:50:16] i see other things have /deploy [20:50:42] hmm [20:51:37] will that work? [20:51:41] not very familiar with scap setup.. re: that gerrit2 deploy user. [20:51:44] It doesn't have to be /deploy [20:52:31] ok, and that gerrit2 user looks right to you too? [20:53:11] the user already exists on cobalt [20:53:18] do we really need to add it in puppet then? [20:53:25] yeh [20:53:29] scap depends on that [20:53:32] oh, because the package created it, right [20:53:33] Well, it'll need to be controlled there as we'll be dismantling the deb package [20:53:36] and now we won't use the package anymore [20:53:43] yep, makes sense [20:53:47] And yeah, scap::target{} requires => User[] [20:54:09] yep [20:54:29] 10Operations, 10Wikimania-Hackathon-2017-Organization, 10Release-Engineering-Team (Watching / External): Wikimania needs hosting on a server for onsite conference guide - https://phabricator.wikimedia.org/T172217#3490872 (10eyoung) This was supposed to be something that our Volunteer team was going to handle... [20:55:04] (03CR) 10Dzahn: [C: 031] Gerrit: Add support for scap [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) (owner: 10Paladox) [20:55:39] (03PS35) 10Paladox: Gerrit: Add support for scap [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) [20:57:20] 10Operations, 10fundraising-tech-ops, 10netops: bonded/redundant network connections for fundraising hosts - https://phabricator.wikimedia.org/T171962#3481495 (10faidon) (T119654 is a restricted task, I have no access to it) While I don't think bonded ports are particularly problematic, I think we should be... 
[20:58:13] 10Operations, 10Wikimania-Hackathon-2017-Organization, 10Release-Engineering-Team (Watching / External): Wikimania needs hosting on a server for onsite conference guide - https://phabricator.wikimedia.org/T172217#3490687 (10MaxSem) > I thought I had added him on the original request. I've tried adding as a s... [20:58:35] hashar: you have a good point of contact for this? https://phabricator.wikimedia.org/T172218 [20:59:31] Or anyone else? [20:59:37] 10Operations, 10Android-app-feature-Compilations, 10Packaging, 10Reading-Infrastructure-Team-Backlog, 10Wikipedia-Android-App-Backlog (Android-app-release-v2.7.20x-Dookie💩): Unable to locate package libgumbo-dev - https://phabricator.wikimedia.org/T172218#3490717 (10Paladox) It's in the stretch repo http... [20:59:58] paladox: thanks! [21:00:06] you're welcome :) [21:00:07] coreyfloyd: it is not in Jessie per https://packages.debian.org/search?keywords=libgumbo-dev [21:00:17] and what Paladox said :) [21:00:25] :) [21:00:27] 10Operations, 10Android-app-feature-Compilations, 10Packaging, 10Reading-Infrastructure-Team-Backlog, 10Wikipedia-Android-App-Backlog (Android-app-release-v2.7.20x-Dookie💩): Unable to locate package libgumbo-dev - https://phabricator.wikimedia.org/T172218#3490904 (10Paladox) You could possibly manually i...
[21:00:31] potentially MAYBE it can be rebuilt / backported to Jessie which is something for operations team [21:00:48] or easier: switch to a stretch VPS cloud (TM) [21:01:04] hashar: ahh got it… might be something we can do [21:01:10] mdholloway: ^ [21:01:12] coreyfloyd: and definitely avoid targeting Trusty [21:01:30] if you still have things in production that are using Trusty, they should migrate to Jessie [21:01:34] (afaik) [21:01:38] hashar: ok, yeah we are dusting off the old mobile apps project for some new work [21:01:39] honestly if you're going to update you're better off with stretch unless you absolutely cannot [21:01:50] So it may be time to make some changes to it [21:02:54] 10Operations, 10Epic, 10Goal, 10Services (doing): End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939#3490932 (10Eevans) >>! In T169939#3490593, @mobrovac wrote: > How about putting VE in maintenance mode for 30 minutes and truncating all of the Parsoid ta... [21:02:56] coreyfloyd: hashar: Zppix: paladox: thanks, i'll switch to stretch then!
[21:03:01] We can investigate that for sure… I don’t think anything would block us from updating to stretch [21:03:08] :) [21:03:11] yeah, should be fine [21:03:12] What mdholloway said [21:03:15] (03PS8) 10Ottomata: Kafka broker profile and roles for new 'jumbo' cluster and 'simple' cluster [puppet] - 10https://gerrit.wikimedia.org/r/356232 (https://phabricator.wikimedia.org/T166162) [21:03:43] (03PS9) 10Ottomata: Kafka broker profile and roles for new 'jumbo' cluster and 'simple' cluster [puppet] - 10https://gerrit.wikimedia.org/r/356232 (https://phabricator.wikimedia.org/T166162) [21:03:56] coreyfloyd i left a comment that should at least allow you to install the package manually [21:04:04] fixing puppet error as it will detect it is installed [21:04:05] :) [21:04:55] paladox: 👍 [21:05:23] :) [21:08:29] coreyfloyd: and if you target production, better double check with them which distribution will be used (most probably jessie for now) [21:09:53] coreyfloyd: and maybe you can get the HTML rendered pages directly from RESTBase :] [21:10:18] anyway good luck! [21:11:34] hashar: coreyfloyd: i was thinking we probably won't be running stretch in production for quite some time. anyway, this unblocks us for the moment :] thanks again! [21:11:50] 10Operations, 10Wikimania-Hackathon-2017-Organization, 10Release-Engineering-Team (Watching / External): Wikimania needs hosting on a server for onsite conference guide - https://phabricator.wikimedia.org/T172217#3490973 (10Dzahn) @eyoung I have mailed Antoine and asked him to please join us here on the tick... [21:16:16] hashar: btw, a couple of our instances in the Mobile project are displaying (deprecated xxxx-xx-xx) messages next to the image name. is there any special process for keeping cloud vps instances up to date? 
[21:16:21] (03CR) 10Dzahn: [C: 032] Gerrit: Add support for scap [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) (owner: 10Paladox) [21:16:32] mdholloway is that puppet? [21:16:41] (03CR) 10Chad: [C: 031] "Let's give it a shot. Should test on gerrit2001 prior to cobalt (let's disable puppet). Also should probably do the same with tin/wasat." [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) (owner: 10Paladox) [21:17:18] paladox: yeah, i'm trying to puppetize this and Do Things Right [21:17:20] (03CR) 10Chad: "Heh, too late to object anyway :p" [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) (owner: 10Paladox) [21:17:23] 10Operations, 10Epic, 10Goal, 10Services (doing): End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939#3491011 (10Pchelolo) >>! In T169939#3490932, @Eevans wrote: > Truncating the Wikipedia Parsoid tables (and nothing else) would free ~21T. By comparison,... [21:17:24] mdholloway ah when you create your instance you used an image that's now deprecated [21:17:33] you can ignore it [21:17:33] RainbowSprinkles: best timing :) [21:17:40] disabled puppet on cobalt [21:17:54] you can do apt-get update && apt-get upgrade to update to latest debian updates [21:17:56] mutante thanks :) [21:18:15] paladox: ok, just wanted to make sure there's no negative consequence i should be worried about [21:18:17] runs puppet on gerrit2001 [21:18:19] mdholloway: Generally, "deprecated" isn't a big deal as long as it's a supported release of Debian. 
[21:18:22] ok [21:19:00] It just means that particular image isn't what would be used if you rebuilt that machine [21:19:02] Notice: /Stage[main]/Scap/Package[scap]/ensure: ensure changed 'purged' to '3.6.0-2' [21:19:08] Error: Execution of '/usr/bin/scap deploy-local --repo gerrit/gerrit -D log_json:False' returned 70: 21:18:36 WARNING - Unhandled error: [21:19:08] :) [21:19:21] hmmmm [21:19:21] run it twice [21:19:30] 'k [21:19:36] i had to do it twice. :) [21:19:52] Config file not found: 'tin.eqiad.wmnet/gerrit/gerrit/.git/DEPLOY_HEAD' [21:20:01] ah [21:20:01] runs puppet on tin (?) [21:20:05] yeh [21:20:12] please [21:20:32] secret(): invalid secret keyholder/gerrit_root [21:20:40] it's called "gerrit" [21:20:44] where does _root come from [21:21:00] runs puppet on wasat too [21:21:27] eh, wrong, on naos [21:21:50] PROBLEM - puppet last run on gerrit2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Package[gerrit/gerrit] [21:21:55] paladox: the private key is not called "gerrit_root" [21:22:02] nope it should not [21:22:09] for some reason it looks for that [21:22:18] hmm [21:22:33] does it get it from here https://gerrit.wikimedia.org/r/#/c/363726/34/hieradata/role/common/deployment_server.yaml [21:22:40] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:22:41] gerrit-root: [21:22:41] trusted_groups: [21:22:41] - gerrit-root [21:22:51] yea, i see it [21:23:06] they keyholder agents [21:23:15] line 55 [21:23:15] i guess we could copy gerrit to gerrit_root (key) [21:23:22] ah [21:23:32] in that place it's not the group name [21:23:36] it's the name of the SSH key [21:23:39] that keyholder loads [21:24:04] yep [21:25:00] PROBLEM - puppet last run on naos is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:25:10] i'll fix it.. waiting for git pull.. 
ehm [21:25:46] (03Draft1) 10Paladox: gerrit: Copy gerrit sh keys into gerrit_root [labs/private] - 10https://gerrit.wikimedia.org/r/369551 [21:25:48] (03PS2) 10Paladox: gerrit: Copy gerrit sh keys into gerrit_root [labs/private] - 10https://gerrit.wikimedia.org/r/369551 [21:25:53] mutante ^^ [21:26:04] for a second i thought git clone is broken [21:26:07] but it was just me [21:26:26] oh [21:26:41] no, a different fix. let me show you [21:26:43] (03PS3) 10Paladox: gerrit: Copy gerrit ssh keys into gerrit_root [labs/private] - 10https://gerrit.wikimedia.org/r/369551 [21:26:44] one sec [21:26:49] ok [21:26:56] (03CR) 10Chad: [V: 032 C: 032] gerrit: Copy gerrit ssh keys into gerrit_root [labs/private] - 10https://gerrit.wikimedia.org/r/369551 (owner: 10Paladox) [21:27:08] (03CR) 10Chad: [V: 032 C: 032] gerrit: Copy gerrit ssh keys into gerrit_root [labs/private] - 10https://gerrit.wikimedia.org/r/369551 (owner: 10Paladox) [21:27:14] no, dont [21:27:19] Oh. [21:27:19] Shit. [21:27:29] (03PS1) 10Chad: nvm I'm dumb: Revert "gerrit: Copy gerrit ssh keys into gerrit_root" [labs/private] - 10https://gerrit.wikimedia.org/r/369552 [21:27:35] (03CR) 10Chad: [V: 032 C: 032] nvm I'm dumb: Revert "gerrit: Copy gerrit ssh keys into gerrit_root" [labs/private] - 10https://gerrit.wikimedia.org/r/369552 (owner: 10Chad) [21:29:25] uploading..slowness..hmm [21:29:42] (03PS1) 10Dzahn: gerrit/deployment_server: fix scap for Gerrit [puppet] - 10https://gerrit.wikimedia.org/r/369553 (https://phabricator.wikimedia.org/T157414) [21:29:56] there, that's the fix [21:30:19] you have a keyholder agent with a key.. 
and then a group to go with it [21:31:09] (03CR) 10Dzahn: [V: 032 C: 032] gerrit/deployment_server: fix scap for Gerrit [puppet] - 10https://gerrit.wikimedia.org/r/369553 (https://phabricator.wikimedia.org/T157414) (owner: 10Dzahn) [21:31:46] ok [21:32:03] ah [21:32:04] thanks [21:32:35] runs puppet again both deployment servers [21:32:39] :) [21:32:41] expects icinga [21:32:56] it creates the group now and all thaat [21:33:03] :) [21:33:22] ok, no errors there [21:33:27] now on gerrit2001 again [21:33:41] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:33:47] :) [21:34:10] RECOVERY - puppet last run on naos is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:34:49] it is still missing the config file on the deployment server [21:35:00] gerrit/gerrit/.git/DEPLOY_HEAD [21:35:16] oh [21:35:18] ah [21:35:21] hmm [21:35:23] they are happy now but that file did not show up [21:35:32] try scap deploy from tin [21:35:50] PROBLEM - Keyholder SSH agent on tin is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. [21:35:58] ooh, duh ^ :) [21:36:04] thanks icinga [21:36:13] i guess that's why :) [21:36:20] PROBLEM - Keyholder SSH agent on naos is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. [21:37:20] RECOVERY - Keyholder SSH agent on naos is OK: OK: Keyholder is armed with all configured keys. [21:37:28] is happy that he doesn't have to enter 10 different passphrases anymore for that [21:37:31] but just one [21:37:34] :) [21:37:51] RECOVERY - Keyholder SSH agent on tin is OK: OK: Keyholder is armed with all configured keys. [21:38:24] RainbowSprinkles: do you want to take over and try actual scap run from here? [21:38:44] does it need initial deploy step? 
[21:38:56] still kept puppet disabled on cobalt [21:39:40] Hmm [21:40:12] Ah, missing dsh group file [21:40:12] so how do we get the config on tin [21:40:13] 21:40:02 deploy failed: [Errno 2] Error: /etc/dsh/group/gerrit is not a file.: '/etc/dsh/group/gerrit' [21:40:17] It's on tin :) [21:40:19] oh, didnt see that [21:40:27] just Config file not found: 'tin.eqiad.wmnet/gerrit/gerrit/.git/DEPLOY_HEAD' [21:40:57] Oh, that's on gerrit's end. We need to execute from tin first [21:41:14] ok, right. yea, i have never done that [21:41:26] thcipriani: Is there an automagic way I can get the dsh group ^ [21:41:33] Or do I need a file{} stanza in my manifest? [21:42:48] RainbowSprinkles: you can get a dsh file by adding servers into hieradata/common/scap/dsh.yaml or by just putting a list of servers in the scap directory [21:43:23] Oh, I can just put it in scap dir? #til [21:43:29] so /srv/deployment/gerrit/gerrit/gerrit [21:43:40] Dear jebus [21:43:43] department of redundancy department [21:43:58] er /srv/deployment/gerrit/gerrit/scap/gerrit [21:43:59] :) gerrit/gerrit/gerrit/DEPLOY :) [21:44:12] !log demon@tin Started deploy [gerrit/gerrit@15f1544]: (no justification provided) [21:44:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:44:35] dearrrrr jeasus [21:44:47] are you just deploying to gerrit2001? 
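A minimal sketch of the second option mentioned above (putting a list of servers in the scap directory), assuming the file is a plain dsh-style list with one hostname per line. The helper name write_dsh_group is hypothetical and not part of scap; the path and host in the usage comment are the ones quoted in the log.

```shell
# Hypothetical helper (not part of scap): write a dsh-style target list,
# one hostname per line, into a repo's scap directory.
# Usage: write_dsh_group <scap-dir> <group-name> <host>...
write_dsh_group() {
    scap_dir=$1
    group=$2
    shift 2
    mkdir -p "$scap_dir" || return 1
    # printf repeats the format once per remaining argument (one host per line)
    printf '%s\n' "$@" > "$scap_dir/$group"
}

# Example, using the path and host named in the log:
# write_dsh_group /srv/deployment/gerrit/gerrit/scap gerrit cobalt.wikimedia.org
```

Whether scap picks the file up presumably still depends on the repo's scap configuration pointing at that group name, so this only covers creating the list itself.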
[21:44:48] (deploy aborted) [21:45:00] mutante: Actually this is harmless, nothing replaces existing services [21:45:06] We can turn puppet on there [21:45:07] ok :) [21:45:15] yep all harmless [21:45:19] it's all in /srv/ :) [21:45:31] puppet re-enabled [21:45:39] !log demon@tin Started deploy [gerrit/gerrit@15f1544]: (no justification provided) [21:45:45] !log demon@tin Finished deploy [gerrit/gerrit@15f1544]: (no justification provided) (duration: 00m 06s) [21:45:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:45:50] All lies ^ [21:45:55] hah [21:45:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:46:16] RainbowSprinkles: oh! you still need the dsh_targets line [21:46:28] cobalt is getting new ferm rule for deployment ssh .. now [21:46:34] and scap user [21:46:37] !log demon@tin Started deploy [gerrit/gerrit@15f1544]: (no justification provided) [21:46:43] finished [21:46:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:46:54] thcipriani: Got that just now :) [21:46:58] :) [21:47:05] * thcipriani deploy stalks [21:47:07] !log demon@tin Started deploy [gerrit/gerrit@15f1544]: Initial deploy [21:47:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:47:27] 21:47:07 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'gerrit/gerrit', '-g', 'default', 'fetch', '--refresh-config'] on cobalt.wikimedia.org returned [255]: Received disconnect from 2620:0:861:3:208:80:154:81: 2: Too many authentication failures for gerrit2 from 2620:0:861:101:10:64:0:196 port 43486 ssh2 [21:47:47] !log demon@tin Finished deploy [gerrit/gerrit@15f1544]: Initial deploy (duration: 00m 40s) [21:47:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:47:58] (lies, still) [21:48:09] maiden voyage [21:48:45] !log demon@tin Started deploy [gerrit/gerrit@15f1544]: Initial deploy x2 [21:48:54] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:48:59] Yep, continued authentication failures [21:49:02] Something's not right key-wise [21:49:07] should double-check that you can ssh in: SSH_AUTH_SOCK=/run/keyholder/proxy.sock -l gerrit2 cobolt.wikimedia.org [21:49:15] er, missing an ssh in there [21:49:19] doint we need to activate the key thingy on the gerrit hosts? [21:49:36] SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l gerrit2 cobolt.wikimedia.org [21:49:43] paladox: i did that [21:49:47] with "keyholder arm" [21:49:49] ok [21:49:50] hmm [21:50:12] oh i see, dosen't gerrit hosts need to be added to known_hosts [21:50:14] cobalt [21:50:16] on the deployment servers [21:50:23] It already is known [21:50:26] All hosts are known [21:50:27] ok [21:50:44] This is auth failures, not unknown host or key failure [21:50:53] ok [21:51:38] i will be happy to debug it and check if the key is really loaded in keyholder, but i have to first get one thing done that requires short "afk" [21:52:45] Well I've got sudo on the gerrit side, trying to debug [21:53:22] try with "ssh -i /path/to/the/private_key" [21:53:33] and use the key directly.. see if it works [21:53:48] Um, I can't? I shouldn't be able to if I'm not root [21:53:53] (that's the point of keyholder...) [21:54:46] right, you don't have sudo on tin.. 
*nod* [21:55:06] if you do the: SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l gerrit2 [gerrithost], and it doesn't give the command "agent refused to sign key" then keyholder *should* be ok, permissions wise (/etc/keyholder-auth.d) [21:55:30] https://phabricator.wikimedia.org/P5838 is what I gots [21:55:46] tin:/etc/keyholder.d# ssh -i gerrit gerrit2@cobalt.wikimedia.org [21:55:46] Password: [21:57:56] on cobalt, there is no new key in /var/lib/gerrit2/.ssh/authorized_keys [21:58:02] that is the home of gerrit2 user [21:58:13] file hasn't been updated since May 2 [21:58:21] RECOVERY - puppet last run on gerrit2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:58:38] Hmm, maybe because manage_home is false? [21:58:44] i had the same thought [21:58:47] I thought scap::target handled that [21:58:49] but not sure yet [21:59:19] sorry, i have to go afk and will totally continue at this step unless you find more meanwhile [21:59:53] RainbowSprinkles i can try manage_home to see if it has any problems [22:00:04] MaxSem: Dear anthropoid, the time has come. Please deploy Kartographer deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170801T2200). [22:00:04] MaxSem: A patch you scheduled for Kartographer deploy is about to be deployed. Please be available during the process. [22:00:19] wee [22:01:10] RainbowSprinkles: it does handle that but the key path in /etc/ssh/userkeys/[blah] IIRC [22:01:59] check /etc/ssh/sshd_config -> AuthorizedKeysFile to be sure [22:02:50] gerrit2 is there, but it's probably the old one from ssh::userkeys{}? 
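The two checks suggested above (logging in through the keyholder proxy socket, and looking up AuthorizedKeysFile in sshd_config) could be wrapped roughly as follows. This is a sketch, not the tooling used in the channel: both function names are hypothetical, and the only things taken from the log are the socket path /run/keyholder/proxy.sock and the config path /etc/ssh/sshd_config.

```shell
# Print each AuthorizedKeysFile path configured in an sshd_config file.
# Commented-out lines are skipped because their first field starts with '#'.
# Usage: authorized_keys_paths [config-file]   (defaults to /etc/ssh/sshd_config)
authorized_keys_paths() {
    awk 'tolower($1) == "authorizedkeysfile" {
             for (i = 2; i <= NF; i++) print $i
         }' "${1:-/etc/ssh/sshd_config}"
}

# Attempt a login through the keyholder proxy socket and run a no-op command.
# A non-zero exit status means the login (or the connection) failed.
# Usage: keyholder_ssh_check <user> <host>
keyholder_ssh_check() {
    SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l "$1" "$2" true
}

# Example (run on the deployment host, then on the target host respectively):
# keyholder_ssh_check gerrit2 cobalt.wikimedia.org
# authorized_keys_paths
```

If the paths printed on the target do not include the file where the new public key was installed, sshd will not consult that key, which would be consistent with the password prompt seen above.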
[22:03:04] Ya looks like it [22:04:22] (03PS3) 10MaxSem: Redo "enable mapframe for euwiki, ptwiki and uawikimedia" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369434 (https://phabricator.wikimedia.org/T167619) [22:04:41] tested manage home [22:04:46] and should not cause any problems [22:05:17] (03PS1) 10EBernhardson: Add syslog-udp for logstash testing on 11515 [puppet] - 10https://gerrit.wikimedia.org/r/369559 (https://phabricator.wikimedia.org/T166126) [22:05:38] (03Draft1) 10Paladox: Gerrit: Enable manage_home in scap [puppet] - 10https://gerrit.wikimedia.org/r/369560 [22:05:40] (03PS2) 10Paladox: Gerrit: Enable manage_home in scap [puppet] - 10https://gerrit.wikimedia.org/r/369560 [22:05:49] mutante RainbowSprinkles thcipriani ^^ :) [22:06:00] Well, we don't know that'll even fix things. [22:06:16] (03CR) 10MaxSem: [C: 032] Redo "enable mapframe for euwiki, ptwiki and uawikimedia" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369434 (https://phabricator.wikimedia.org/T167619) (owner: 10MaxSem) [22:06:44] RainbowSprinkles: I think I see the problem. If you say manage_home => false then scap::target expects you to take care of the keys [22:06:52] Ah ok, so that will fix things [22:06:55] We think :) [22:07:07] wait [22:07:08] there's two manage_home [22:07:09] one for user and one for scap [22:07:14] er sorry manage_user [22:07:39] for scap::target. Not manage_home for the user. [22:07:49] Yeah, for the target [22:08:04] yep [22:08:37] jouncebot: next [22:08:37] In 0 hour(s) and 51 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170801T2300) [22:09:01] (03CR) 10EBernhardson: "puppet compiler: http://puppet-compiler.wmflabs.org/7251/logstash1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/369559 (https://phabricator.wikimedia.org/T166126) (owner: 10EBernhardson) [22:09:33] if manage_user is set that will cause puppet to fail. 
[22:09:37] (03CR) 10Thcipriani: "inline comment to fix ssh." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/369560 (owner: 10Paladox) [22:09:41] manage_home under user should work [22:10:00] ah [22:10:09] * MaxSem makes another sacrifice to the great Zuul [22:11:08] what do i put in ssh::userkey ? [22:11:18] (03PS1) 10Chad: Adding targets since we don't have a dsh list [software/gerrit] - 10https://gerrit.wikimedia.org/r/369562 [22:11:24] oh [22:11:27] i see [22:12:28] RainbowSprinkles is it secret('gerrit/gerrit') under ssh:userkey? [22:13:21] Whatever Daniel made the file named [22:13:24] (03PS2) 10EBernhardson: Add syslog-udp for logstash testing on 11515 [puppet] - 10https://gerrit.wikimedia.org/r/369559 (https://phabricator.wikimedia.org/T166126) [22:13:26] But tbh, we should get manage_home working [22:13:28] ok [22:13:32] secret("keyholder/gerrit.pub") [22:13:42] (this doesn't have to be rushed, puppet is running fine everywhere) [22:14:28] thanks [22:14:55] (03PS3) 10Paladox: Gerrit: Enable manage_home in scap [puppet] - 10https://gerrit.wikimedia.org/r/369560 [22:15:06] (03CR) 10Paladox: Gerrit: Enable manage_home in scap (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/369560 (owner: 10Paladox) [22:15:38] (03Merged) 10jenkins-bot: Redo "enable mapframe for euwiki, ptwiki and uawikimedia" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369434 (https://phabricator.wikimedia.org/T167619) (owner: 10MaxSem) [22:16:37] (03CR) 10Paladox: [C: 031] Adding targets since we don't have a dsh list [software/gerrit] - 10https://gerrit.wikimedia.org/r/369562 (owner: 10Chad) [22:16:51] (03CR) 10jenkins-bot: Redo "enable mapframe for euwiki, ptwiki and uawikimedia" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369434 (https://phabricator.wikimedia.org/T167619) (owner: 10MaxSem) [22:17:47] works on pt: [22:17:54] (03PS3) 10EBernhardson: Add syslog-udp for logstash testing on 11515 [puppet] - 10https://gerrit.wikimedia.org/r/369559 
(https://phabricator.wikimedia.org/T166126) [22:18:41] (03CR) 10Chad: [V: 032 C: 032] Adding targets since we don't have a dsh list [software/gerrit] - 10https://gerrit.wikimedia.org/r/369562 (owner: 10Chad) [22:19:44] thcipriani can we define mutiple keys in ssh::userkey? [22:19:53] i get [22:19:53] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: File[/etc/ssh/userkeys/gerrit2] [22:20:20] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/369434/3 (duration: 00m 47s) [22:20:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:20:48] debt, https://sv.wikipedia.org/wiki/Anv%C3%A4ndare:MaxSem/sandl%C3%A5da [22:22:23] paladox: it looks like you need to define skey to use ssh::userkey multiple times [22:22:41] something like source => []? [22:22:47] the brackets (array) [22:23:13] (03CR) 10Bmansurov: "I think the SVG should be run through svgo so that it's optimized." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368194 (https://phabricator.wikimedia.org/T168203) (owner: 10Reception123) [22:23:49] Does sshkey support multiple sources? [22:23:55] (03PS1) 10Ladsgroup: Write to term_full_entity_id column in wb_terms table in prod too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369567 (https://phabricator.wikimedia.org/T167229) [22:24:04] (03CR) 10EBernhardson: "puppet-compiler: http://puppet-compiler.wmflabs.org/7254/logstash1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/369559 (https://phabricator.wikimedia.org/T166126) (owner: 10EBernhardson) [22:24:10] it seems so [22:24:17] "skey => "? 
[22:24:19] see https://github.com/wikimedia/puppet/blob/48a40fc3aaa43f30080b205ddc6330853e0eb9de/modules/mediawiki/manifests/users.pp#L26 [22:24:24] paladox: if you don't define skey then you'll get a duplicate declaration each time, you need to add soemthing like: skey => 'gerrit-scap' to the ssh::userkey declaration [22:24:53] if you don't use the optional parameter, it just uses the username as ressource name [22:25:00] which works.. as long as you don't have more than one [22:25:10] (03CR) 10Bmansurov: Update wordmark for Wikipedia Atikamekw (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369464 (https://phabricator.wikimedia.org/T168203) (owner: 10Jdlrobson) [22:25:13] ah [22:25:40] goes to drive a car [22:25:43] (03PS2) 10Jdlrobson: Update wordmark for Wikipedia Atikamekw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369464 (https://phabricator.wikimedia.org/T168203) [22:25:48] will be back [22:26:37] (03CR) 10Bmansurov: [C: 031] Add new mobile watermark for Urdu Wikipedia. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368444 (https://phabricator.wikimedia.org/T171769) (owner: 10Reception123) [22:27:02] (03CR) 10Ayounsi: [C: 031] Add syslog-udp for logstash testing on 11515 [puppet] - 10https://gerrit.wikimedia.org/r/369559 (https://phabricator.wikimedia.org/T166126) (owner: 10EBernhardson) [22:27:35] MaxSem: you swatted early! [22:28:02] wanted to have a bit more time, so grabbed a window [22:28:15] MaxSem: gotcha! 
[22:28:17] (03CR) 10Bmansurov: [C: 031] Disable page previews on a variety of pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369494 (https://phabricator.wikimedia.org/T170893) (owner: 10Jdlrobson) [22:28:50] thanks thcipriani that works :) [22:28:56] * paladox amends [22:29:10] (03CR) 10Ayounsi: [C: 032] Add syslog-udp for logstash testing on 11515 [puppet] - 10https://gerrit.wikimedia.org/r/369559 (https://phabricator.wikimedia.org/T166126) (owner: 10EBernhardson) [22:29:17] (03PS4) 10Ayounsi: Add syslog-udp for logstash testing on 11515 [puppet] - 10https://gerrit.wikimedia.org/r/369559 (https://phabricator.wikimedia.org/T166126) (owner: 10EBernhardson) [22:29:50] (03PS4) 10Paladox: Gerrit: Enable manage_home in scap [puppet] - 10https://gerrit.wikimedia.org/r/369560 [22:30:28] i get https://phabricator.wikimedia.org/P5840 [22:30:32] :) [22:31:16] "Notice: /Stage[main]/Security::Access/File[/etc/security/access.conf.d/60-scap-allow-gerrit2]/ensure: removed" can be ignored, im running something manually from my user as i always had problems adding the ssh key on labs to that user [22:31:41] (03CR) 10Bmansurov: [C: 031] Update wordmark for Wikipedia Atikamekw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369464 (https://phabricator.wikimedia.org/T168203) (owner: 10Jdlrobson) [22:35:50] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [22:36:11] ebernhardson, ^ [22:36:38] MaxSem: did you have to do the null edit to get the maps to work? [22:36:44] MaxSem: yea. it happens :( [22:40:50] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0] [22:41:50] MaxSem: mapframe isn't working for me yet on my test pages on euwiki and ptwiki [22:42:56] what am I doing wrong? https://eu.wikipedia.org/wiki/Lankide:DTankersley_(WMF)/map_sample [22:43:23] I tried the null edit and it wasn't working. 
[22:43:30] I tried different browsers and it didn't work [22:43:37] now - using your link - it works. [22:43:39] what the what [22:43:44] I did a null edit and it worked [22:45:00] let me try it again [22:46:21] crazy internet things. I think I hit 'cancel' instead of 'save' for my null eidt [22:46:23] *edit [22:46:28] duuuuuurrrr [22:47:50] thanks again, MaxSem, you're the bestest! :) [22:52:05] (03PS11) 10Paladox: gerrit: DO NOT MERGE [software/gerrit] - 10https://gerrit.wikimedia.org/r/363738 [22:52:17] (03PS10) 10Paladox: Gerrit: Upgrading gerrit to 2.14.2 (DO NOT MERGE) [software/gerrit] - 10https://gerrit.wikimedia.org/r/363734 [22:55:06] debt, :P [22:57:40] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] [22:57:59] !log push procs off swap for labcontrol1001 w/ swapoff -a & swapon -a [22:58:02] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [22:58:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170801T2300). [23:00:06] Sagan and Jdlrobson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:17] \o [23:00:19] jdlrobson: you can start first [23:00:41] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [23:01:29] Who's doing it? 
[23:02:56] (03PS3) 10Reedy: Update wordmark for Wikipedia Atikamekw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369464 (https://phabricator.wikimedia.org/T168203) (owner: 10Jdlrobson) [23:02:59] (03CR) 10Reedy: [C: 032] Update wordmark for Wikipedia Atikamekw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369464 (https://phabricator.wikimedia.org/T168203) (owner: 10Jdlrobson) [23:03:14] looks like you Reedy ? :) [23:03:21] No one else was volunteering :P [23:03:40] 10Operations, 10Wikimania-Hackathon-2017-Organization, 10Release-Engineering-Team (Watching / External): Wikimania needs hosting on a server for onsite conference guide - https://phabricator.wikimedia.org/T172217#3491406 (10bd808) There is an existing [[https://tools.wmflabs.org/openstack-browser/project/wik... [23:03:45] (03CR) 10Reedy: [C: 04-1] "This one fails lint" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369494 (https://phabricator.wikimedia.org/T170893) (owner: 10Jdlrobson) [23:04:12] 10Operations, 10Android-app-feature-Compilations, 10Packaging, 10Reading-Infrastructure-Team-Backlog, 10Wikipedia-Android-App-Backlog (Android-app-release-v2.7.20x-Dookie💩): Unable to locate package libgumbo-dev - https://phabricator.wikimedia.org/T172218#3491408 (10Mholloway) 05Open>03Resolved a:03... [23:04:17] (03CR) 10Reedy: [C: 04-1] Disable page previews on a variety of pages (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369494 (https://phabricator.wikimedia.org/T170893) (owner: 10Jdlrobson) [23:05:01] (03Merged) 10jenkins-bot: Update wordmark for Wikipedia Atikamekw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369464 (https://phabricator.wikimedia.org/T168203) (owner: 10Jdlrobson) [23:05:16] (03PS4) 10Reedy: Add new mobile watermark for Urdu Wikipedia. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/368444 (https://phabricator.wikimedia.org/T171769) (owner: 10Reception123) [23:05:19] (03CR) 10Reedy: [C: 032] Add new mobile watermark for Urdu Wikipedia. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368444 (https://phabricator.wikimedia.org/T171769) (owner: 10Reception123) [23:06:19] (03CR) 10jenkins-bot: Update wordmark for Wikipedia Atikamekw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369464 (https://phabricator.wikimedia.org/T168203) (owner: 10Jdlrobson) [23:06:51] (03Merged) 10jenkins-bot: Add new mobile watermark for Urdu Wikipedia. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368444 (https://phabricator.wikimedia.org/T171769) (owner: 10Reception123) [23:07:15] (03PS2) 10Jdlrobson: Disable page previews on a variety of pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369494 (https://phabricator.wikimedia.org/T170893) [23:08:09] !log reedy@tin Synchronized static/images/mobile/copyright/: 2 new workmarks (duration: 00m 48s) [23:08:16] 10Operations, 10Android-app-feature-Compilations, 10Packaging, 10Reading-Infrastructure-Team-Backlog, 10Wikipedia-Android-App-Backlog (Android-app-release-v2.7.20x-Dookie💩): Unable to locate package libgumbo-dev - https://phabricator.wikimedia.org/T172218#3491434 (10Paladox) If installing it manually wor... [23:08:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:08:52] (03CR) 10jenkins-bot: Add new mobile watermark for Urdu Wikipedia. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/368444 (https://phabricator.wikimedia.org/T171769) (owner: 10Reception123) [23:09:10] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [23:09:38] !log reedy@tin Synchronized wmf-config/InitialiseSettings.php: New wordmarks T168203 T171769 (duration: 00m 47s) [23:09:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:09:48] T171769: Use the correct Urdu Wikipedia logo on mobile site - https://phabricator.wikimedia.org/T171769 [23:09:48] T168203: Logo change on mobile version of Wikipedia Atikamekw - https://phabricator.wikimedia.org/T168203 [23:10:10] (03PS3) 10Reedy: Disable page previews on a variety of pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369494 (https://phabricator.wikimedia.org/T170893) (owner: 10Jdlrobson) [23:10:13] (03CR) 10Reedy: [C: 032] Disable page previews on a variety of pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369494 (https://phabricator.wikimedia.org/T170893) (owner: 10Jdlrobson) [23:12:03] (03CR) 10Reedy: [C: 04-1] "shouldn't touch multiversion/vendor" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368423 (https://phabricator.wikimedia.org/T170569) (owner: 10Urbanecm) [23:12:20] (03PS2) 10Reedy: Optimalize all PNGs in this repo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368423 (https://phabricator.wikimedia.org/T170569) (owner: 10Urbanecm) [23:12:30] (03Merged) 10jenkins-bot: Disable page previews on a variety of pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369494 (https://phabricator.wikimedia.org/T170893) (owner: 10Jdlrobson) [23:12:43] (03CR) 10jenkins-bot: Disable page previews on a variety of pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369494 (https://phabricator.wikimedia.org/T170893) (owner: 10Jdlrobson) [23:13:47] !log reedy@tin Synchronized wmf-config/InitialiseSettings.php: Disable popups on Special pages 
T170893 (duration: 00m 47s) [23:13:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:13:59] T170893: Disable page previews on various special pages - https://phabricator.wikimedia.org/T170893 [23:14:01] (03PS3) 10Reedy: Make wikiquote.png equivalent to enwikiquote.png [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368244 (https://phabricator.wikimedia.org/T171887) (owner: 10Urbanecm) [23:14:05] (03CR) 10Reedy: [C: 032] Make wikiquote.png equivalent to enwikiquote.png [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368244 (https://phabricator.wikimedia.org/T171887) (owner: 10Urbanecm) [23:14:15] Reedy: heh, you are fast [23:14:30] Reedy: Mind if I sneak https://gerrit.wikimedia.org/r/#/c/369570/ onto the list? [23:14:50] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [23:14:50] Sure [23:14:55] Cool thanks [23:15:44] (03Merged) 10jenkins-bot: Make wikiquote.png equivalent to enwikiquote.png [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368244 (https://phabricator.wikimedia.org/T171887) (owner: 10Urbanecm) [23:15:54] (03CR) 10jenkins-bot: Make wikiquote.png equivalent to enwikiquote.png [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368244 (https://phabricator.wikimedia.org/T171887) (owner: 10Urbanecm) [23:15:59] (03PS3) 10Reedy: Optimalize all PNGs in this repo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368423 (https://phabricator.wikimedia.org/T170569) (owner: 10Urbanecm) [23:16:12] (03CR) 10Reedy: [C: 032] Optimalize all PNGs in this repo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368423 (https://phabricator.wikimedia.org/T170569) (owner: 10Urbanecm) [23:19:38] (03CR) 10Paladox: [C: 031] "Tested and works" [puppet] - 10https://gerrit.wikimedia.org/r/369560 (owner: 10Paladox) [23:19:51] (03CR) 10Paladox: [C: 031] "By that i mean no puppet errors" [puppet] - 10https://gerrit.wikimedia.org/r/369560 (owner: 10Paladox) [23:19:57] 
(03Merged) 10jenkins-bot: Optimalize all PNGs in this repo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368423 (https://phabricator.wikimedia.org/T170569) (owner: 10Urbanecm) [23:20:11] (03CR) 10jenkins-bot: Optimalize all PNGs in this repo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368423 (https://phabricator.wikimedia.org/T170569) (owner: 10Urbanecm) [23:21:31] !log reedy@tin Synchronized docroot/noc/css/images/: optimise pngs (duration: 00m 47s) [23:21:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:23:11] Reedy: thanks. looks like they are all synced [23:23:24] !log reedy@tin Synchronized static/apple-touch/: optimise pngs (duration: 00m 46s) [23:23:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:24:44] !log reedy@tin Synchronized static/images/project-logos/: optimise pngs (duration: 00m 46s) [23:24:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:26:37] !log reedy@tin Synchronized php-1.30.0-wmf.12/resources/src/mediawiki.rcfilters/mw.rcfilters.Controller.js: T172156 T171514 (duration: 00m 46s) [23:26:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:26:48] T172156: Make limit sticky adhere to preference limit value - https://phabricator.wikimedia.org/T172156 [23:26:48] T171514: Make number of days and number of changes settings sticky - https://phabricator.wikimedia.org/T171514 [23:27:16] (03PS2) 10Reedy: Rename Wikinews namespace to Wikinieuws on nl.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369449 (https://phabricator.wikimedia.org/T172211) (owner: 10Urbanecm) [23:27:19] (03CR) 10Reedy: [C: 032] Rename Wikinews namespace to Wikinieuws on nl.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369449 (https://phabricator.wikimedia.org/T172211) (owner: 10Urbanecm) [23:27:58] (03PS1) 10BBlack: add "issue globalsign.com" to CAA recs [dns] - 
10https://gerrit.wikimedia.org/r/369575 (https://phabricator.wikimedia.org/T155806) [23:29:10] (03PS5) 10Paladox: Gerrit: Enable manage_home in scap [puppet] - 10https://gerrit.wikimedia.org/r/369560 [23:29:23] (03Merged) 10jenkins-bot: Rename Wikinews namespace to Wikinieuws on nl.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369449 (https://phabricator.wikimedia.org/T172211) (owner: 10Urbanecm) [23:29:35] (03CR) 10jenkins-bot: Rename Wikinews namespace to Wikinieuws on nl.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369449 (https://phabricator.wikimedia.org/T172211) (owner: 10Urbanecm) [23:30:53] !log reedy@tin Synchronized wmf-config/InitialiseSettings.php: wikinews - T172211 (duration: 00m 46s) [23:31:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:31:02] T172211: Rename Wikinews namespace to Wikinieuws on nl.wikinews, with Wikinews as alias - https://phabricator.wikimedia.org/T172211 [23:31:21] (03PS2) 10Reedy: Assign autopatrol to all holders of autoreview on arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369414 (https://phabricator.wikimedia.org/T167071) (owner: 10Urbanecm) [23:31:24] (03CR) 10Reedy: [C: 032] Assign autopatrol to all holders of autoreview on arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369414 (https://phabricator.wikimedia.org/T167071) (owner: 10Urbanecm) [23:34:48] (03Merged) 10jenkins-bot: Assign autopatrol to all holders of autoreview on arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369414 (https://phabricator.wikimedia.org/T167071) (owner: 10Urbanecm) [23:36:01] PROBLEM - Check systemd state on labpuppetmaster1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[23:36:18] (03PS3) 10Reedy: OOUIHTMLForm does not support the 'cols' parameter for textareas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369448 (https://phabricator.wikimedia.org/T172199) [23:36:21] (03CR) 10Reedy: [C: 032] OOUIHTMLForm does not support the 'cols' parameter for textareas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369448 (https://phabricator.wikimedia.org/T172199) (owner: 10Reedy) [23:36:28] (03CR) 10jenkins-bot: Assign autopatrol to all holders of autoreview on arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369414 (https://phabricator.wikimedia.org/T167071) (owner: 10Urbanecm) [23:36:38] !log reedy@tin Synchronized wmf-config/InitialiseSettings.php: T167071 (duration: 00m 47s) [23:36:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:48] T167071: Categories created by editors should not require administrative review on ar.wiki - https://phabricator.wikimedia.org/T167071 [23:38:51] (03Merged) 10jenkins-bot: OOUIHTMLForm does not support the 'cols' parameter for textareas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369448 (https://phabricator.wikimedia.org/T172199) (owner: 10Reedy) [23:39:10] (03CR) 10jenkins-bot: OOUIHTMLForm does not support the 'cols' parameter for textareas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/369448 (https://phabricator.wikimedia.org/T172199) (owner: 10Reedy) [23:39:56] !log reedy@tin Synchronized wmf-config/MetaContactPages.php: Unbreak ContactPage T172199 (duration: 00m 46s) [23:40:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:40:08] T172199: Speciaal:Contactpagina on nlwiki gives an internal error - https://phabricator.wikimedia.org/T172199 [23:41:00] !log reedy@tin Synchronized wmf-config/CommonSettings.php: Unbreak ContactPage T172199 (duration: 00m 46s) [23:41:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:41:17] (03PS2) 10Reedy: extension.json usage for 
Scribunto [mediawiki-config] - https://gerrit.wikimedia.org/r/368554 (https://phabricator.wikimedia.org/T139800)
[23:41:20] (CR) Reedy: [C: 2] extension.json usage for Scribunto [mediawiki-config] - https://gerrit.wikimedia.org/r/368554 (https://phabricator.wikimedia.org/T139800) (owner: Reedy)
[23:42:43] (Merged) jenkins-bot: extension.json usage for Scribunto [mediawiki-config] - https://gerrit.wikimedia.org/r/368554 (https://phabricator.wikimedia.org/T139800) (owner: Reedy)
[23:42:52] (CR) jenkins-bot: extension.json usage for Scribunto [mediawiki-config] - https://gerrit.wikimedia.org/r/368554 (https://phabricator.wikimedia.org/T139800) (owner: Reedy)
[23:43:17] (PS1) EBernhardson: Revert "Add syslog-udp for logstash testing on 11515" [puppet] - https://gerrit.wikimedia.org/r/369577
[23:43:44] !log reedy@tin Synchronized wmf-config/extension-list: json! (duration: 00m 46s)
[23:43:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:44:45] (PS2) Reedy: Make babel use Database and SUL wikis use metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/368429 (https://phabricator.wikimedia.org/T145366)
[23:44:45] !log reedy@tin Synchronized wmf-config/CommonSettings.php: wfLoadExtension Scribunto (duration: 00m 46s)
[23:44:47] (CR) Reedy: [C: 2] Make babel use Database and SUL wikis use metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/368429 (https://phabricator.wikimedia.org/T145366) (owner: Reedy)
[23:44:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:46:08] (Merged) jenkins-bot: Make babel use Database and SUL wikis use metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/368429 (https://phabricator.wikimedia.org/T145366) (owner: Reedy)
[23:47:27] !log reedy@tin Synchronized wmf-config/CommonSettings.php: Make Babel use databasey stuff T145366 (duration: 00m 46s)
[23:47:36] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:47:36] T145366: Create and populate babel database table on Wikimedia wikis - https://phabricator.wikimedia.org/T145366
[23:53:32] Reedy: https://en.wikipedia.org/static/images/project-logos/enwikiquote.png and https://en.wikipedia.org/static/images/project-logos/wikiquote.png are still different for me. or is that just my cache?
[23:53:45] It's very likely it's cached
[23:53:45] nvm
[23:53:48] it was my cache
[23:54:08] Reedy: then thx for SWATing :)