[00:03:49] (CR) Krinkle: Enforce that interface-admin is the only group that can edit non-own CSS/JS (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/421125 (https://phabricator.wikimedia.org/T190015) (owner: Gergő Tisza)
[00:06:09] (CR) Krinkle: [C: +1] Test that all wikis are in one of the shard dblists [mediawiki-config] - https://gerrit.wikimedia.org/r/455587 (https://phabricator.wikimedia.org/T202904) (owner: Anomie)
[00:44:53] Operations, netops: rancid pubkey auth to Junos 17.4 failure - https://phabricator.wikimedia.org/T202952 (ayounsi) p:Triage>Normal
[00:48:33] (PS1) Ayounsi: Rancid, comment out cr2-eqdfw until pubkey auth issue is solved [puppet] - https://gerrit.wikimedia.org/r/455749 (https://phabricator.wikimedia.org/T202952)
[00:50:18] (CR) Ayounsi: [C: +2] Rancid, comment out cr2-eqdfw until pubkey auth issue is solved [puppet] - https://gerrit.wikimedia.org/r/455749 (https://phabricator.wikimedia.org/T202952) (owner: Ayounsi)
[01:59:44] (CR) Alex Monk: [C: +2] Certcentral integration tests [software/certcentral] - https://gerrit.wikimedia.org/r/454045 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[02:01:10] (Merged) jenkins-bot: Certcentral integration tests [software/certcentral] - https://gerrit.wikimedia.org/r/454045 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[02:02:28] (CR) jenkins-bot: Certcentral integration tests [software/certcentral] - https://gerrit.wikimedia.org/r/454045 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[02:20:10] (PS19) Zoranzoki21: Enable Extension:Newsletter on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151)
[02:33:07] (CR) Alex Monk: [C: +2] Deliver certificates in every save mode [software/certcentral] - https://gerrit.wikimedia.org/r/454794 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[02:34:30] (Merged) jenkins-bot: Deliver certificates in every save mode [software/certcentral] - https://gerrit.wikimedia.org/r/454794 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[02:34:56] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.18) (duration: 13m 17s)
[02:34:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:35:47] (CR) jenkins-bot: Deliver certificates in every save mode [software/certcentral] - https://gerrit.wikimedia.org/r/454794 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[02:36:53] (CR) Alex Monk: [C: +2] Implement DNS01 challenge support [software/certcentral] - https://gerrit.wikimedia.org/r/454845 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[02:38:15] (Merged) jenkins-bot: Implement DNS01 challenge support [software/certcentral] - https://gerrit.wikimedia.org/r/454845 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[02:38:19] (Merged) jenkins-bot: Provide support in the API for different certificate save modes [software/certcentral] - https://gerrit.wikimedia.org/r/455153 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[02:39:38] (CR) jenkins-bot: Implement DNS01 challenge support [software/certcentral] - https://gerrit.wikimedia.org/r/454845 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[02:39:43] (CR) jenkins-bot: Provide support in the API for different certificate save modes [software/certcentral] - https://gerrit.wikimedia.org/r/455153 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[02:40:10] (CR) Alex Monk: [C: +2] "(this is a good start - obviously we will need to return in the near future to handle the important TODOs, but this can go into master)" [software/certcentral] - https://gerrit.wikimedia.org/r/454845 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[02:45:09] !log l10nupdate@deploy1001 ResourceLoader cache refresh completed at Tue Aug 28 02:45:08 UTC 2018 (duration 10m 12s)
[02:45:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:31:43] (CR) Andrew Bogott: [C: +2] region-migrate: handle floating IPs [puppet] - https://gerrit.wikimedia.org/r/455726 (https://phabricator.wikimedia.org/T191790) (owner: Andrew Bogott)
[03:31:51] (PS2) Andrew Bogott: region-migrate: handle floating IPs [puppet] - https://gerrit.wikimedia.org/r/455726 (https://phabricator.wikimedia.org/T191790)
[04:18:18] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 21 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[04:23:19] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 16 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[04:52:05] * Krinkle staging on deploy1001/mwdebug1002
[04:55:19] !log krinkle@deploy1001 sync-file aborted: (beta) 8d773b6cf (duration: 00m 02s)
[04:55:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:55:39] meh, why not.
[04:56:27] !log krinkle@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: (beta) 8d773b6cf (duration: 00m 50s)
[04:56:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:01:43] (PS1) Marostegui: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/455755
[05:04:00] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/455755 (owner: Marostegui)
[05:05:45] (Merged) jenkins-bot: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/455755 (owner: Marostegui)
[05:07:21] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 (duration: 00m 49s)
[05:07:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:07:25] !log Deploy schema change db1103:3314
[05:07:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:12:51] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui)
[05:13:14] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui) @Banyek requested a cloak Aug 27th 2018
[05:13:50] (CR) jenkins-bot: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/455755 (owner: Marostegui)
[05:26:36] Jenkins was taking 42min to merge the commit, I was still staging...
[05:26:44] * Krinkle tries again
[05:28:23] !log krinkle@deploy1001 Synchronized php-1.32.0-wmf.18/resources/src/startup/startup.js: one line to rule them all - I4942bfd236c72b (duration: 00m 48s)
[05:28:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:32:52] * Krinkle releases deploy handle
[05:44:24] (PS1) Volans: Rebuild wheels for pycryptodome security update [software/netbox-deploy] - https://gerrit.wikimedia.org/r/455757
[05:46:05] (CR) Volans: "FYI, as reported by github vulnerable dependency check." [software/netbox-deploy] - https://gerrit.wikimedia.org/r/455757 (owner: Volans)
[05:48:42] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` ['cp3038.esams.wmnet', 'cp3040.esams.wmnet'] ``` The log can be found in `/var/l...
[05:54:29] (CR) Volans: [V: +2 C: +2] Rebuild wheels for pycryptodome security update [software/netbox-deploy] - https://gerrit.wikimedia.org/r/455757 (owner: Volans)
[05:55:56] !log volans@deploy1001 Started deploy [netbox/deploy@5e70423]: Security upgrade of dependency
[05:55:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:56:30] !log volans@deploy1001 Finished deploy [netbox/deploy@5e70423]: Security upgrade of dependency (duration: 00m 34s)
[05:56:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:41] (PS1) Elukey: Switch archiva.wikimedia.org to archiva1001 [dns] - https://gerrit.wikimedia.org/r/455760 (https://phabricator.wikimedia.org/T192639)
[06:08:53] (PS1) Elukey: Move archiva.wikimedia.org from meitnerium to archiva1001 [puppet] - https://gerrit.wikimedia.org/r/455761 (https://phabricator.wikimedia.org/T192639)
[06:13:57] PROBLEM - Varnish HTTP text-frontend - port 3124 on cp3040 is CRITICAL: connect to address 10.20.0.175 and port 3124: Connection refused
[06:13:57] PROBLEM - Check the NTP synchronisation status of timesyncd on cp3040 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:13:57] PROBLEM - dhclient process on cp3040 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:14:14] that's me, ignore ^
[06:16:07] RECOVERY - dhclient process on cp3040 is OK: PROCS OK: 0 processes with command name dhclient
[06:21:21] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp3038.esams.wmnet', 'cp3040.esams.wmnet'] ``` and were **ALL** successful.
[06:22:57] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp3040 is OK: HTTP OK: HTTP/1.1 200 OK - 502 bytes in 0.168 second response time
[06:24:26] Operations, Analytics, Analytics-Kanban, netops, Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (elukey)
[06:29:20] (PS1) Smalyshev: Add loadCategoriesDaily.sh script [puppet] - https://gerrit.wikimedia.org/r/455762
[06:30:14] PROBLEM - puppet last run on mw1323 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/cgroup-mediawiki-clean]
[06:30:19] (PS2) Elukey: Move archiva.wikimedia.org from meitnerium to archiva1001 [puppet] - https://gerrit.wikimedia.org/r/455761 (https://phabricator.wikimedia.org/T192639)
[06:31:34] PROBLEM - puppet last run on mw1278 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/cgroup-mediawiki-clean]
[06:43:54] RECOVERY - Check the NTP synchronisation status of timesyncd on cp3040 is OK: OK: synced at Tue 2018-08-28 06:43:52 UTC.
[06:44:46] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler02/12251/" [puppet] - https://gerrit.wikimedia.org/r/455761 (https://phabricator.wikimedia.org/T192639) (owner: Elukey)
[06:45:49] (CR) Gehel: [C: +2] Add loadCategoriesDaily.sh script [puppet] - https://gerrit.wikimedia.org/r/455762 (owner: Smalyshev)
[06:47:39] !log Deploy schema change on s8 primary master (db1071)
[06:47:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:27] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/446325 (https://phabricator.wikimedia.org/T198756) (owner: Filippo Giunchedi)
[06:52:26] (PS1) Marostegui: db-codfw.php: Repool db2033 [mediawiki-config] - https://gerrit.wikimedia.org/r/455764 (https://phabricator.wikimedia.org/T201757)
[06:54:07] (CR) Marostegui: [C: +2] db-codfw.php: Repool db2033 [mediawiki-config] - https://gerrit.wikimedia.org/r/455764 (https://phabricator.wikimedia.org/T201757) (owner: Marostegui)
[06:55:25] (Merged) jenkins-bot: db-codfw.php: Repool db2033 [mediawiki-config] - https://gerrit.wikimedia.org/r/455764 (https://phabricator.wikimedia.org/T201757) (owner: Marostegui)
[06:56:53] RECOVERY - puppet last run on mw1278 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:21] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool db2033 - T201757 (duration: 00m 49s)
[06:57:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:57:26] T201757: Degraded RAID on db2033 - https://phabricator.wikimedia.org/T201757
[06:57:59] (PS1) Addshore: InterwikiSortOrders.php doc / comment [mediawiki-config] - https://gerrit.wikimedia.org/r/455765 (https://phabricator.wikimedia.org/T170745)
[06:59:17] Operations, ops-codfw, Patch-For-Review: Degraded RAID on db2033 - https://phabricator.wikimedia.org/T201757 (Marostegui) Open>Resolved Server repooled
[07:00:24] RECOVERY - puppet last run on mw1323 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:00:40] Operations, ops-codfw, DBA: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T202824 (Marostegui) Open>Resolved All good! Thank you! ``` logicaldrive 1 (3.3 TB, RAID 1+0, OK) physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK) physicaldrive 1I:1:2 (port...
[07:03:22] (PS6) Filippo Giunchedi: mtail: Escape the '.' in /w/load.php for varnishrls.mtail [puppet] - https://gerrit.wikimedia.org/r/454724 (https://phabricator.wikimedia.org/T202479) (owner: Krinkle)
[07:04:17] (CR) jenkins-bot: db-codfw.php: Repool db2033 [mediawiki-config] - https://gerrit.wikimedia.org/r/455764 (https://phabricator.wikimedia.org/T201757) (owner: Marostegui)
[07:05:15] (PS1) Smalyshev: Enable daily category diffs for internal [puppet] - https://gerrit.wikimedia.org/r/455766
[07:06:52] (CR) Filippo Giunchedi: [C: +2] mtail: Escape the '.' in /w/load.php for varnishrls.mtail [puppet] - https://gerrit.wikimedia.org/r/454724 (https://phabricator.wikimedia.org/T202479) (owner: Krinkle)
[07:07:02] (CR) Smalyshev: "If dailies loading for tests works today, we can enable this for internal too." [puppet] - https://gerrit.wikimedia.org/r/455766 (owner: Smalyshev)
[07:11:12] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui)
[07:13:58] (CR) Ema: [C: -1] "See inline. Also please add https://phabricator.wikimedia.org/P7488 as modules/varnish/files/tests/text/26-restbase-accept-ignore-semver.v" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/455036 (https://phabricator.wikimedia.org/T202682) (owner: Ppchelko)
[07:15:58] Operations, Analytics, vm-requests: eqiad (1) - VM request for Piwik/Matomo - https://phabricator.wikimedia.org/T202963 (elukey) p:Triage>Normal
[07:16:14] (PS1) Muehlenhoff: Bump meta package for new ABI in L1TF fixes Remove obsolete linux-meta-4.4 package [debs/linux-meta] - https://gerrit.wikimedia.org/r/455767
[07:16:39] (CR) Ema: [C: -1] "> Also please add https://phabricator.wikimedia.org/P7488" [puppet] - https://gerrit.wikimedia.org/r/455036 (https://phabricator.wikimedia.org/T202682) (owner: Ppchelko)
[07:18:38] Operations, ops-codfw, DBA, Patch-For-Review: db2088 rebooted itself and came back sick - https://phabricator.wikimedia.org/T202822 (Marostegui) I have started to compare the main tables for s1 and s2 as the server caught up already
[07:18:47] (CR) Muehlenhoff: [C: +2] Bump meta package for new ABI in L1TF fixes Remove obsolete linux-meta-4.4 package [debs/linux-meta] - https://gerrit.wikimedia.org/r/455767 (owner: Muehlenhoff)
[07:19:31] (CR) Jcrespo: "Extra suggestions." (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/455587 (https://phabricator.wikimedia.org/T202904) (owner: Anomie)
[07:19:47] (CR) Filippo Giunchedi: graphite: alert when eqiad and codfw drift in number of thumbnails (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455553 (https://phabricator.wikimedia.org/T199073) (owner: Filippo Giunchedi)
[07:22:29] (CR) Elukey: [C: -1] "Left a couple of comments but thanks for working on this! I just opened a task to upgrade bohrium to stretch :) - T202962" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/453553 (owner: Dzahn)
[07:24:01] Operations, Analytics, vm-requests: eqiad (1) - VM request for Piwik/Matomo - https://phabricator.wikimedia.org/T202963 (elukey)
[07:24:21] (PS2) Filippo Giunchedi: mediawiki: use syslogidentifier in systemd units [puppet] - https://gerrit.wikimedia.org/r/446325 (https://phabricator.wikimedia.org/T198756)
[07:26:24] (CR) Filippo Giunchedi: [C: +2] mediawiki: use syslogidentifier in systemd units [puppet] - https://gerrit.wikimedia.org/r/446325 (https://phabricator.wikimedia.org/T198756) (owner: Filippo Giunchedi)
[07:28:16] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` ['cp3039.esams.wmnet'] ``` The log can be found in `/var/log/wmf-auto-reimage/20...
[07:30:24] (CR) Volans: [C: +1] "I'm not expert in prometheus syntax, but the logic is ok." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455553 (https://phabricator.wikimedia.org/T199073) (owner: Filippo Giunchedi)
[07:36:18] Operations, Traffic: Make cp1099 the new pinkunicorn - https://phabricator.wikimedia.org/T202966 (ema)
[07:36:27] Operations, Traffic: Make cp1099 the new pinkunicorn - https://phabricator.wikimedia.org/T202966 (ema) p:Triage>Normal
[07:36:33] !log imported linux-meta/1.19 to apt.wikimedia.org/jessie-wikimedia
[07:36:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:37:11] (PS1) Jcrespo: mariadb: Increase core memory usage to 80% of physical memory [puppet] - https://gerrit.wikimedia.org/r/455769
[07:37:54] !log imported python-requests-mock/1.3.0-3~wmf1+stretch to apt.wikimedia.org/stretch-wikimedia
[07:37:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:38:17] Operations, MediaWiki-extensions-Translate, Language-2018-July-September, MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), and 4 others: 503 error attempting to open multiple projects (Wikipedia and meta wiki are loading very slowly) - https://phabricator.wikimedia.org/T195293 (Niker...
[07:39:34] !log Deploy schema change on s3:advisorswiki - T202904 T197891
[07:39:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:39:40] T197891: Schema change to drop default from externallinks.el_index_60 - https://phabricator.wikimedia.org/T197891
[07:39:40] T202904: advisorswiki is not in any s?.dblist - https://phabricator.wikimedia.org/T202904
[07:41:56] !log Deploy schema change on s3:advisorswiki - T202904 https://phabricator.wikimedia.org/T195193
[07:42:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:19] !log Deploy schema change on s3:advisorswiki - T202904 T196379
[07:45:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:25] T196379: Schema change: Add unique index on archive.ar_rev_id - https://phabricator.wikimedia.org/T196379
[07:45:25] T202904: advisorswiki is not in any s?.dblist - https://phabricator.wikimedia.org/T202904
[07:45:52] (PS2) Filippo Giunchedi: graphite: alert when eqiad and codfw drift in number of thumbnails [puppet] - https://gerrit.wikimedia.org/r/455553 (https://phabricator.wikimedia.org/T199073)
[07:46:21] !log Deploy schema change on s3:advisorswiki - T202904 T192926
[07:46:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:46:27] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[07:47:05] !log Deploy schema change on s3:advisorswiki - T202904 T199368
[07:47:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:47:11] T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368
[07:47:45] (PS3) Filippo Giunchedi: graphite: alert when eqiad and codfw drift in number of thumbnails [puppet] - https://gerrit.wikimedia.org/r/455553 (https://phabricator.wikimedia.org/T199073)
[07:47:53] (CR) Filippo Giunchedi: [C: +2] graphite: alert when eqiad and codfw drift in number of thumbnails [puppet] - https://gerrit.wikimedia.org/r/455553 (https://phabricator.wikimedia.org/T199073) (owner: Filippo Giunchedi)
[07:49:40] (CR) Muehlenhoff: [C: +1] "Looks good, one comment." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[07:56:38] Operations, Performance-Team, Wikidata, Wikidata-Query-Service: Wikidata produces a lot of failed requests for recentchanges API - https://phabricator.wikimedia.org/T202764 (jcrespo) I got this on mediawiki logs at the same time than one of the retries. Are you sure you are not using a deprecated...
[07:57:46] !log rebooting multatuli for some tests
[07:57:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:59:49] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp3039.esams.wmnet'] ``` and were **ALL** successful.
[08:06:43] (CR) Marostegui: [C: +1] mariadb: Increase core memory usage to 80% of physical memory [puppet] - https://gerrit.wikimedia.org/r/455769 (owner: Jcrespo)
[08:07:07] Operations, SRE-Access-Requests, Patch-For-Review: Please add aaron to perf-team - https://phabricator.wikimedia.org/T202650 (ArielGlenn)
[08:26:02] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui) @Banyek confirms he has access to logstash and tendril
[08:27:35] (PS6) Volans: spicerack, cookbooks: install and configure [puppet] - https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079)
[08:28:25] (CR) Volans: "done" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[08:30:49] (PS2) Filippo Giunchedi: diamond: send metrics to graphite1004 [puppet] - https://gerrit.wikimedia.org/r/454874 (https://phabricator.wikimedia.org/T196484)
[08:31:07] (PS2) ArielGlenn: adding new shell user Kalliope Tsouroupidou [puppet] - https://gerrit.wikimedia.org/r/454712 (https://phabricator.wikimedia.org/T202486) (owner: RobH)
[08:31:33] (CR) Filippo Giunchedi: [C: +2] diamond: send metrics to graphite1004 [puppet] - https://gerrit.wikimedia.org/r/454874 (https://phabricator.wikimedia.org/T196484) (owner: Filippo Giunchedi)
[08:31:35] (CR) jerkins-bot: [V: -1] adding new shell user Kalliope Tsouroupidou [puppet] - https://gerrit.wikimedia.org/r/454712 (https://phabricator.wikimedia.org/T202486) (owner: RobH)
[08:32:42] Operations, Dumps-Generation: Reboots of dumps/snapshot hosts for L1TF/microcode updates - https://phabricator.wikimedia.org/T202623 (MoritzMuehlenhoff)
[08:33:44] (CR) Muehlenhoff: [C: +1] spicerack, cookbooks: install and configure [puppet] - https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[08:35:31] (PS7) Volans: spicerack, cookbooks: install and configure [puppet] - https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079)
[08:36:27] (PS1) Banyek: admin: make Balazs Pocze (banyek) root [puppet] - https://gerrit.wikimedia.org/r/455773 (https://phabricator.wikimedia.org/T202521)
[08:37:18] (CR) Volans: [C: +2] spicerack, cookbooks: install and configure [puppet] - https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[08:37:40] (CR) Marostegui: [C: +1] admin: make Balazs Pocze (banyek) root [puppet] - https://gerrit.wikimedia.org/r/455773 (https://phabricator.wikimedia.org/T202521) (owner: Banyek)
[08:38:28] (CR) ArielGlenn: "recheck" [puppet] - https://gerrit.wikimedia.org/r/454712 (https://phabricator.wikimedia.org/T202486) (owner: RobH)
[08:38:39] (CR) Banyek: [C: +2] admin: make Balazs Pocze (banyek) root [puppet] - https://gerrit.wikimedia.org/r/455773 (https://phabricator.wikimedia.org/T202521) (owner: Banyek)
[08:39:43] (CR) ArielGlenn: [C: +2] adding new shell user Kalliope Tsouroupidou [puppet] - https://gerrit.wikimedia.org/r/454712 (https://phabricator.wikimedia.org/T202486) (owner: RobH)
[08:39:57] (PS3) ArielGlenn: adding new shell user Kalliope Tsouroupidou [puppet] - https://gerrit.wikimedia.org/r/454712 (https://phabricator.wikimedia.org/T202486) (owner: RobH)
[08:41:30] (PS2) Banyek: admin: make Balazs Pocze (banyek) root [puppet] - https://gerrit.wikimedia.org/r/455773 (https://phabricator.wikimedia.org/T202521)
[08:45:30] (PS1) Volans: spicerack: fix cookbooks path in config [puppet] - https://gerrit.wikimedia.org/r/455775 (https://phabricator.wikimedia.org/T199079)
[08:48:14] (PS2) ArielGlenn: adding sguebo to restricted and analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/455613 (https://phabricator.wikimedia.org/T202362) (owner: RobH)
[08:48:53] (CR) Volans: [C: +2] spicerack: fix cookbooks path in config [puppet] - https://gerrit.wikimedia.org/r/455775 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[08:49:02] (PS2) Volans: spicerack: fix cookbooks path in config [puppet] - https://gerrit.wikimedia.org/r/455775 (https://phabricator.wikimedia.org/T199079)
[08:49:05] (CR) ArielGlenn: [C: +2] adding sguebo to restricted and analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/455613 (https://phabricator.wikimedia.org/T202362) (owner: RobH)
[08:49:30] apergos: sorry, go ahead I can submit mine after you
[08:49:36] Operations, Patch-For-Review, Tor: rack/setup/install torrelay1001.wikimedia.org - https://phabricator.wikimedia.org/T196701 (MoritzMuehlenhoff) >>! In T196701#4536718, @Dzahn wrote: > migration plan: > > goal: keep the same fingerprints > > - stop tor service on radium > - rsync datadir contents (...
[08:50:07] done and puppet-merged
[08:50:27] ack, thx
[08:50:28] (PS3) Volans: spicerack: fix cookbooks path in config [puppet] - https://gerrit.wikimedia.org/r/455775 (https://phabricator.wikimedia.org/T199079)
[08:51:28] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Samuel Guebo - https://phabricator.wikimedia.org/T202362 (ArielGlenn)
[08:55:52] (PS1) Alexandros Kosiaris: cxserver: Add lingocloud key [labs/private] - https://gerrit.wikimedia.org/r/455779
[08:56:25] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui) @Banyek confirms he's got root access to neodymium, puppetmaster1001, db1089 and db2048
[08:56:33] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui)
[08:57:34] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui)
[08:58:16] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` ['cp3043.esams.wmnet'] ``` The log can be found in `/var/log/wmf-auto-reimage/20...
[08:58:50] (PS1) ArielGlenn: add phedenskog to perf-roots group [puppet] - https://gerrit.wikimedia.org/r/455781 (https://phabricator.wikimedia.org/T202658)
[09:00:28] (CR) ArielGlenn: [C: +2] add phedenskog to perf-roots group [puppet] - https://gerrit.wikimedia.org/r/455781 (https://phabricator.wikimedia.org/T202658) (owner: ArielGlenn)
[09:01:17] akosiaris: https://gerrit.wikimedia.org/r/#/c/mediawiki/services/cxserver/deploy/+/454744/ - to review. I'll merge cxserver patch for client.
[09:01:33] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui)
[09:01:57] kart_: merging https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/454745/
[09:02:04] (PS3) Alexandros Kosiaris: Add LingoCloud MT config [puppet] - https://gerrit.wikimedia.org/r/454745 (https://phabricator.wikimedia.org/T202604) (owner: KartikMistry)
[09:02:09] (CR) Alexandros Kosiaris: [V: +2 C: +2] Add LingoCloud MT config [puppet] - https://gerrit.wikimedia.org/r/454745 (https://phabricator.wikimedia.org/T202604) (owner: KartikMistry)
[09:03:38] akosiaris: key too :)
[09:04:02] already added in the repo
[09:05:42] kart_: puppet change (and key) deployed everywhere
[09:05:54] (PS1) DCausse: Setup es 5.x backport branch [software/elasticsearch/plugins] (5.x) - https://gerrit.wikimedia.org/r/455782
[09:05:59] akosiaris: cool. I'll update cxserver.
[09:06:48] (PS2) ArielGlenn: adding user ktsouroupidou to groups [puppet] - https://gerrit.wikimedia.org/r/454713 (https://phabricator.wikimedia.org/T202486) (owner: RobH)
[09:08:16] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui)
[09:08:21] !log kartik@deploy1001 Started deploy [cxserver/deploy@e2e5674]: Update cxserver to 98cbefd and LingoCloud deployment (T186715)
[09:08:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:08:27] T186715: Add new MT Client for LingoCloud - https://phabricator.wikimedia.org/T186715
[09:09:04] (CR) ArielGlenn: [C: +2] adding user ktsouroupidou to groups [puppet] - https://gerrit.wikimedia.org/r/454713 (https://phabricator.wikimedia.org/T202486) (owner: RobH)
[09:11:04] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Kalliope Tsouroupidou - https://phabricator.wikimedia.org/T202486 (ArielGlenn)
[09:11:49] Operations, CommRel-Internals, Wikimedia-Mailing-lists: Close https://lists.wikimedia.org/mailman/listinfo/cep and keep the archive for now - https://phabricator.wikimedia.org/T155683 (Qgil) Thank you! (Just a Community Relations internal thing: this task doesn't require evaluation.)
[09:11:59] !log kartik@deploy1001 Finished deploy [cxserver/deploy@e2e5674]: Update cxserver to 98cbefd and LingoCloud deployment (T186715) (duration: 03m 38s)
[09:12:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:12:32] (PS1) Volans: spicerack: fine tune permission for config file [puppet] - https://gerrit.wikimedia.org/r/455783 (https://phabricator.wikimedia.org/T199079)
[09:12:53] akosiaris: Thanks. Basic tests looks good.
[09:13:29] (PS5) Reedy: Add fluidsynth to wikimedia servers [puppet] - https://gerrit.wikimedia.org/r/445603 (https://phabricator.wikimedia.org/T184598)
[09:13:31] (CR) Volans: [C: +2] spicerack: fine tune permission for config file [puppet] - https://gerrit.wikimedia.org/r/455783 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[09:14:23] (PS1) Banyek: admin: add Banyek to icinga users [puppet] - https://gerrit.wikimedia.org/r/455784 (https://phabricator.wikimedia.org/T202521)
[09:14:35] Operations, SRE-Access-Requests, Patch-For-Review: Phabricator: Allow aklapper to delete personal Herald filter rules - https://phabricator.wikimedia.org/T202503 (Aklapper) Works. Thank you!
[09:14:57] kart_: cool!
[09:15:27] (CR) Banyek: [C: +2] admin: add Banyek to icinga users [puppet] - https://gerrit.wikimedia.org/r/455784 (https://phabricator.wikimedia.org/T202521) (owner: Banyek)
[09:16:43] jouncebot: next
[09:16:43] In 1 hour(s) and 43 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180828T1100)
[09:17:11] Operations, SRE-Access-Requests: request to add imarlier to perf-roots - https://phabricator.wikimedia.org/T202657 (ArielGlenn) @Imarlier we should get your manager sign-off (whoever you report to) for this. I don't see it in the comments or on the original task; after that this can move along.
[09:17:58] (CR) Marostegui: [C: -1] "You have to add yourself to those lines too" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/455784 (https://phabricator.wikimedia.org/T202521) (owner: Banyek)
[09:19:53] (PS2) Banyek: admin: add Banyek to icinga users [puppet] - https://gerrit.wikimedia.org/r/455784 (https://phabricator.wikimedia.org/T202521)
[09:20:18] Operations, Graphite, Nodepool, Zuul: Improve graphite failover - https://phabricator.wikimedia.org/T88997 (fgiunchedi)
[09:21:30] Operations, Graphite, Zuul: Improve graphite failover - https://phabricator.wikimedia.org/T88997 (hashar) #Nodepool is legacy / will be gone.
[09:24:08] (PS3) Banyek: admin: add Banyek to icinga users [puppet] - https://gerrit.wikimedia.org/r/455784 (https://phabricator.wikimedia.org/T202521)
[09:24:47] Operations, SRE-Access-Requests: Access to restbase servers (including sudo) for Imarlier - https://phabricator.wikimedia.org/T202563 (ArielGlenn) Agreed, restbase-roots looks like the right group for you. We now need manager approval, and then this can go to the SRE meeting next week for review.
[09:25:04] Operations, SRE-Access-Requests: Access to restbase servers (including sudo) for Imarlier - https://phabricator.wikimedia.org/T202563 (ArielGlenn)
[09:25:30] (CR) Marostegui: [C: 1] admin: add Banyek to icinga users [puppet] - https://gerrit.wikimedia.org/r/455784 (https://phabricator.wikimedia.org/T202521) (owner: Banyek)
[09:26:32] (PS4) Banyek: admin: add Banyek to icinga users [puppet] - https://gerrit.wikimedia.org/r/455784 (https://phabricator.wikimedia.org/T202521)
[09:27:28] (CR) Banyek: [C: 2] admin: add Banyek to icinga users [puppet] - https://gerrit.wikimedia.org/r/455784 (https://phabricator.wikimedia.org/T202521) (owner: Banyek)
[09:29:42] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp3043.esams.wmnet'] ``` and were **ALL** successful.
[09:30:17] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui)
[09:32:07] Operations, SRE-Access-Requests, Performance-Team (Radar): add perf-team admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (ArielGlenn) Does everyone on the team need it or or just @Imarlier ? Do they need root to get files into place or are some other permissions su...
[09:34:56] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` ['cp3046.esams.wmnet'] ``` The log can be found in `/var/log/wmf-auto-reimage/20...
[09:40:18] Operations, Performance-Team, Wikidata, Wikidata-Query-Service: Wikidata produces a lot of failed requests for recentchanges API - https://phabricator.wikimedia.org/T202764 (Ladsgroup) >>! In T202764#4537363, @jcrespo wrote: > ``` > Warning: API call had warnings trying to get remote JsonConfig:...
[09:41:57] (PS1) Banyek: admin: banyek added to sms contact group in nagios [puppet] - https://gerrit.wikimedia.org/r/455786 (https://phabricator.wikimedia.org/T202521)
[09:42:48] I added you as reviewer
[09:42:54] sorry, nothere
[09:42:55] Operations, SRE-Access-Requests, Release-Engineering-Team (Watching / External): Add contint-roots to releases{1,2}001 - https://phabricator.wikimedia.org/T201470 (ArielGlenn) If it's not just about installing the package and restarting but also troubleshooting, it sounds like you'll want an SRE pers...
[09:43:11] (CR) Marostegui: [C: 1] admin: banyek added to sms contact group in nagios [puppet] - https://gerrit.wikimedia.org/r/455786 (https://phabricator.wikimedia.org/T202521) (owner: Banyek)
[09:44:02] (PS2) Banyek: admin: banyek added to sms contact group in nagios [puppet] - https://gerrit.wikimedia.org/r/455786 (https://phabricator.wikimedia.org/T202521)
[09:45:33] (CR) Banyek: [C: 2] admin: banyek added to sms contact group in nagios [puppet] - https://gerrit.wikimedia.org/r/455786 (https://phabricator.wikimedia.org/T202521) (owner: Banyek)
[09:51:11] (CR) Jonas Kress (WMDE): [C: 1] InterwikiSortOrders.php doc / comment [mediawiki-config] - https://gerrit.wikimedia.org/r/455765 (https://phabricator.wikimedia.org/T170745) (owner: Addshore)
[09:54:30] (PS2) ArielGlenn: adding Christoph Jauera to production shell users [puppet] - https://gerrit.wikimedia.org/r/454873 (https://phabricator.wikimedia.org/T202475) (owner: RobH)
[09:56:11] Operations, SRE-Access-Requests, wikidiff2, Patch-For-Review, User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (ArielGlenn) @WMDE-Fisch I have updated the real name to reflect the name given here (see http...
[09:56:23] Operations, SRE-Access-Requests, wikidiff2, Patch-For-Review, User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (ArielGlenn)
[09:57:06] (CR) Addshore: adding Christoph Jauera to production shell users (1 comment) [puppet] - https://gerrit.wikimedia.org/r/454873 (https://phabricator.wikimedia.org/T202475) (owner: RobH)
[09:58:32] (CR) Thiemo Kreuz (WMDE): adding Christoph Jauera to production shell users (1 comment) [puppet] - https://gerrit.wikimedia.org/r/454873 (https://phabricator.wikimedia.org/T202475) (owner: RobH)
[09:59:19] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui)
[09:59:39] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui) @Banyek has successfully downtimed a service on icinga
[10:01:11] Operations, SRE-Access-Requests, User-Addshore: Requesting Access to view EventLogging data for gabriel-wmde / gbirke - https://phabricator.wikimedia.org/T202072 (ArielGlenn)
[10:01:31] (CR) WMDE-Fisch: "That's more correct, thanks. You could also change the EMail though." (3 comments) [puppet] - https://gerrit.wikimedia.org/r/454873 (https://phabricator.wikimedia.org/T202475) (owner: RobH)
[10:01:38] (PS1) Thiemo Kreuz (WMDE): thiemowmde's legal name changed [puppet] - https://gerrit.wikimedia.org/r/455799
[10:02:31] Operations, SRE-Access-Requests, wikidiff2, Patch-For-Review, User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (WMDE-Fisch) >>! In T202475#4537669, @ArielGlenn wrote: > @WMDE-Fisch I have updated the real...
[10:02:38] Operations, Performance-Team, Wikidata, Wikidata-Query-Service: Wikidata produces a lot of failed requests for recentchanges API - https://phabricator.wikimedia.org/T202764 (jcrespo) @Ladsgroup That could be a separate issue and could be handled on a separate task (I don't know). The on topic her...
[10:06:20] (PS2) Thiemo Kreuz (WMDE): thiemowmde's legal name changed [puppet] - https://gerrit.wikimedia.org/r/455799
[10:06:22] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp3046.esams.wmnet'] ``` and were **ALL** successful.
[10:07:01] (PS3) ArielGlenn: adding Christoph Jauera to production shell users [puppet] - https://gerrit.wikimedia.org/r/454873 (https://phabricator.wikimedia.org/T202475) (owner: RobH)
[10:08:11] (CR) Thiemo Kreuz (WMDE): thiemowmde's legal name changed (2 comments) [puppet] - https://gerrit.wikimedia.org/r/455799 (owner: Thiemo Kreuz (WMDE))
[10:08:25] (PS4) ArielGlenn: adding Christoph Jauera to production shell users [puppet] - https://gerrit.wikimedia.org/r/454873 (https://phabricator.wikimedia.org/T202475) (owner: RobH)
[10:08:28] (CR) Jcrespo: "Should we apply this as is, limit it only to 512GB machines, or any other pattern?" [puppet] - https://gerrit.wikimedia.org/r/455769 (owner: Jcrespo)
[10:09:43] Operations, SRE-Access-Requests, wikidiff2, Patch-For-Review, User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (ArielGlenn) Please have another look at the patch and let me know.
[10:10:59] (CR) WMDE-Fisch: [C: 1] adding Christoph Jauera to production shell users [puppet] - https://gerrit.wikimedia.org/r/454873 (https://phabricator.wikimedia.org/T202475) (owner: RobH)
[10:11:20] (CR) ArielGlenn: adding Christoph Jauera to production shell users (1 comment) [puppet] - https://gerrit.wikimedia.org/r/454873 (https://phabricator.wikimedia.org/T202475) (owner: RobH)
[10:11:34] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ema) Open>Resolved a:ema Done! \o/ The only cache host running jessie is cp1008, which will be replaced soon by cp1099: T202966.
[10:11:53] (CR) Thiemo Kreuz (WMDE): [C: 1] adding Christoph Jauera to production shell users [puppet] - https://gerrit.wikimedia.org/r/454873 (https://phabricator.wikimedia.org/T202475) (owner: RobH)
[10:13:05] Operations, SRE-Access-Requests, wikidiff2, Patch-For-Review, User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (WMDE-Fisch) >>! In T202475#4537719, @ArielGlenn wrote: > Please have another look at the patc...
[10:14:15] (CR) ArielGlenn: [C: 2] adding Christoph Jauera to production shell users [puppet] - https://gerrit.wikimedia.org/r/454873 (https://phabricator.wikimedia.org/T202475) (owner: RobH)
[10:18:11] (PS2) ArielGlenn: adding Christoph Jauera to releasers-wikidiff2 [puppet] - https://gerrit.wikimedia.org/r/454879 (https://phabricator.wikimedia.org/T202475) (owner: RobH)
[10:18:28] (PS3) ArielGlenn: adding Christoph Jauera to releasers-wikidiff2 [puppet] - https://gerrit.wikimedia.org/r/454879 (https://phabricator.wikimedia.org/T202475) (owner: RobH)
[10:20:31] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui) Only pending the pwstore access. The rest is done and confirmed.
[10:20:46] (CR) ArielGlenn: [C: 2] adding Christoph Jauera to releasers-wikidiff2 [puppet] - https://gerrit.wikimedia.org/r/454879 (https://phabricator.wikimedia.org/T202475) (owner: RobH)
[10:21:50] PROBLEM - IPMI Sensor Status on cp3038 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical]
[10:22:48] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/455803
[10:23:27] Operations, SRE-Access-Requests, Patch-For-Review: Please add aaron to perf-team - https://phabricator.wikimedia.org/T202650 (ArielGlenn)
[10:24:31] (CR) Thiemo Kreuz (WMDE): adding Christoph Jauera to production shell users (1 comment) [puppet] - https://gerrit.wikimedia.org/r/454873 (https://phabricator.wikimedia.org/T202475) (owner: RobH)
[10:26:24] (PS2) ArielGlenn: add aaron to perf-team and perf-roots groups [puppet] - https://gerrit.wikimedia.org/r/454887 (https://phabricator.wikimedia.org/T202650) (owner: RobH)
[10:26:47] (PS3) ArielGlenn: add aaron to perf-team and perf-roots groups [puppet] - https://gerrit.wikimedia.org/r/454887 (https://phabricator.wikimedia.org/T202650) (owner: RobH)
[10:28:12] (CR) ArielGlenn: [C: 2] add aaron to perf-team and perf-roots groups [puppet] - https://gerrit.wikimedia.org/r/454887 (https://phabricator.wikimedia.org/T202650) (owner: RobH)
[10:28:14] jouncebot: next
[10:28:14] In 0 hour(s) and 31 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180828T1100)
[10:29:25] Operations, SRE-Access-Requests, Patch-For-Review: Please add aaron to perf-team - https://phabricator.wikimedia.org/T202650 (ArielGlenn)
[10:30:17] (PS1) Gerrit Patch Uploader: Set category collation to 'uca-et-u-kn' on Estonian-language wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/455804 (https://phabricator.wikimedia.org/T202977)
[10:30:22] (CR) Arturo Borrero Gonzalez: [C: -1] "https://puppet-compiler.wmflabs.org/compiler02/12258/" [puppet] - https://gerrit.wikimedia.org/r/455615 (https://phabricator.wikimedia.org/T202636) (owner: Arturo Borrero Gonzalez)
[10:30:24] (CR) Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [mediawiki-config] - https://gerrit.wikimedia.org/r/455804 (https://phabricator.wikimedia.org/T202977) (owner: Gerrit Patch Uploader)
[10:31:27] heads up, later today I'm going to move statsd.eqiad.wmnet to graphite1004, last time we had to restart a bunch of services to pick up the dns change as per T157022
[10:31:28] T157022: Suspected faulty SSD on graphite1001 - https://phabricator.wikimedia.org/T157022
[10:32:51] (PS1) Filippo Giunchedi: lower statsd/carbon CNAMES TTL [dns] - https://gerrit.wikimedia.org/r/455805 (https://phabricator.wikimedia.org/T88997)
[10:33:46] (PS2) Filippo Giunchedi: lower statsd/carbon CNAMES TTL [dns] - https://gerrit.wikimedia.org/r/455805 (https://phabricator.wikimedia.org/T88997)
[10:33:56] (CR) Filippo Giunchedi: [C: 2] lower statsd/carbon CNAMES TTL [dns] - https://gerrit.wikimedia.org/r/455805 (https://phabricator.wikimedia.org/T88997) (owner: Filippo Giunchedi)
[10:34:19] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/455803 (owner: Marostegui)
[10:35:38] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/455803 (owner: Marostegui)
[10:36:44] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1103:3314 (duration: 00m 50s)
[10:36:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:36:56] (PS1) Marostegui: db-eqiad.php: Depool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/455806
[10:37:10] (CR) ArielGlenn: "Let's get your name fixes done in this patchset, and look at any other issues afterwards." [puppet] - https://gerrit.wikimedia.org/r/455799 (owner: Thiemo Kreuz (WMDE))
[10:38:13] (Abandoned) Filippo Giunchedi: Shift carbon/statsd write traffic to graphite1004 [dns] - https://gerrit.wikimedia.org/r/454872 (https://phabricator.wikimedia.org/T196484) (owner: Filippo Giunchedi)
[10:38:20] (PS1) Filippo Giunchedi: Switch statsd/carbon to graphite1004 [dns] - https://gerrit.wikimedia.org/r/455808 (https://phabricator.wikimedia.org/T196484)
[10:39:39] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/455806 (owner: Marostegui)
[10:40:57] (Merged) jenkins-bot: db-eqiad.php: Depool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/455806 (owner: Marostegui)
[10:42:02] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 48s)
[10:42:05] !log Deploy schema change on db1081
[10:42:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:42:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:43:46] Urbanecm: around?
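The 10:31:27 heads-up says that during the earlier statsd.eqiad.wmnet move (T157022), several services had to be restarted before they picked up the DNS change. That is the typical symptom of a client that resolves a name once at startup and then keeps the address. Lowering the CNAME TTL only helps clients that actually look the name up again.

Below is a minimal Python sketch of a statsd-style UDP sender that re-resolves its target after a fixed interval. Everything in it is hypothetical and illustrative only, and it says nothing about how any Wikimedia service is actually built:

- the class name `ReResolvingSender`
- the 60-second `max_age`
- the injectable `resolver` and `clock` hooks
- the default port 8125, which is statsd's conventional port

```python
import socket
import time
from typing import Callable, Optional, Tuple

Address = Tuple[str, int]


def system_resolver(host: str, port: int) -> Address:
    """Return the first IPv4/UDP address the system resolver gives for host:port."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_DGRAM)
    ip, resolved_port = infos[0][4][:2]
    return ip, resolved_port


class ReResolvingSender:
    """Hypothetical statsd-style UDP sender that looks its target up again
    once the cached address is older than max_age seconds."""

    def __init__(
        self,
        host: str,
        port: int = 8125,
        max_age: float = 60.0,
        resolver: Callable[[str, int], Address] = system_resolver,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.host = host
        self.port = port
        self.max_age = max_age
        self.resolver = resolver
        self.clock = clock
        self._address: Optional[Address] = None
        self._resolved_at = 0.0
        self._sock: Optional[socket.socket] = None

    def address(self) -> Address:
        now = self.clock()
        # Resolve on first use, and again whenever the cached answer is stale,
        # so a CNAME change is picked up without restarting the process.
        if self._address is None or now - self._resolved_at >= self.max_age:
            self._address = self.resolver(self.host, self.port)
            self._resolved_at = now
        return self._address

    def send(self, line: str) -> None:
        # line is a statsd plaintext metric, e.g. "example.counter:1|c"
        if self._sock is None:
            self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._sock.sendto(line.encode("utf-8"), self.address())
```

Provided the resolver returns a fresh answer, each sender moves to the new target within `max_age` of the switch. The cost is one lookup per interval instead of one per process lifetime.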
[10:44:18] Operations, SRE-Access-Requests, wikidiff2, Patch-For-Review, User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (ArielGlenn)
[10:44:40] !log joal@deploy1001 Started deploy [analytics/refinery@8cded22]: Regular weekly deploy of analytics Hadoop jobs
[10:44:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:45:30] (PS2) Arturo Borrero Gonzalez: cloudvps: labtestn: add network link with labtest deployment network [puppet] - https://gerrit.wikimedia.org/r/455615 (https://phabricator.wikimedia.org/T202636)
[10:47:14] (PS3) ArielGlenn: thiemowmde's legal name changed [puppet] - https://gerrit.wikimedia.org/r/455799 (owner: Thiemo Kreuz (WMDE))
[10:48:15] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/455803 (owner: Marostegui)
[10:48:17] (CR) jenkins-bot: db-eqiad.php: Depool db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/455806 (owner: Marostegui)
[10:48:23] (CR) ArielGlenn: [C: 2] thiemowmde's legal name changed [puppet] - https://gerrit.wikimedia.org/r/455799 (owner: Thiemo Kreuz (WMDE))
[10:50:33] PROBLEM - Request latencies on argon is CRITICAL: instance=10.64.32.133:6443 verb={GET,PUT} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:50:52] PROBLEM - etcd request latencies on neon is CRITICAL: instance=10.64.0.40:6443 operation={get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:50:52] PROBLEM - etcd request latencies on argon is CRITICAL: instance=10.64.32.133:6443 operation={compareAndSwap,get} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:51:13] !log T202549 downtime cloudcontrol1003.wikimedia.org in icinga for 2h
[10:51:13] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=LIST https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:51:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:51:18] T202549: cloudvps: eqiad1: move nova db to m5-master - https://phabricator.wikimedia.org/T202549
[10:52:01] Operations, SRE-Access-Requests, Patch-For-Review: request to add phendeskog to perf-roots - https://phabricator.wikimedia.org/T202658 (ArielGlenn) As soon as the user verifies that access works as expected, we can close this ticket.
[10:52:02] PROBLEM - etcd request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 operation={compareAndSwap,get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:52:02] PROBLEM - Request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 verb={GET,LIST,PUT} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:52:21] !log Import nova_api_eqiad1 and nova_eqiad1 into m5 master - T202549
[10:52:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:40] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Kalliope Tsouroupidou - https://phabricator.wikimedia.org/T202486 (ArielGlenn) As soon as the user verifies that access works as expected, we can close this ticket.
[10:52:52] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Samuel Guebo - https://phabricator.wikimedia.org/T202362 (ArielGlenn) As soon as the user verifies that access works as expected, we can close this ticket.
[10:53:04] Operations, SRE-Access-Requests, wikidiff2, Patch-For-Review, User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (ArielGlenn) As soon as the user verifies that access works as expected, we can close this tic...
[10:53:10] Operations, Proton, Services (doing): Requests to MW 404 when on HTTPS - https://phabricator.wikimedia.org/T202982 (mobrovac) p:Triage>High
[10:53:22] Operations, SRE-Access-Requests, Patch-For-Review: Please add aaron to perf-team - https://phabricator.wikimedia.org/T202650 (ArielGlenn) As soon as the user verifies that access works as expected, we can close this ticket.
[10:53:54] !log joal@deploy1001 Finished deploy [analytics/refinery@8cded22]: Regular weekly deploy of analytics Hadoop jobs (duration: 09m 14s)
[10:53:55] (PS12) Arturo Borrero Gonzalez: cloudvps: eqiad1: move nova DBs to m5-master [puppet] - https://gerrit.wikimedia.org/r/454774 (https://phabricator.wikimedia.org/T202549)
[10:53:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:54:14] Operations, SRE-Access-Requests: Requesting access to restricted production access and analytics-privatedata-users for Ty Hargrove - https://phabricator.wikimedia.org/T202363 (ArielGlenn)
[10:54:50] (PS1) Filippo Giunchedi: graphite: use keepLastValue for thumbs drift alert [puppet] - https://gerrit.wikimedia.org/r/455811 (https://phabricator.wikimedia.org/T199073)
[10:54:54] (CR) Arturo Borrero Gonzalez: [C: 2] cloudvps: eqiad1: move nova DBs to m5-master [puppet] - https://gerrit.wikimedia.org/r/454774 (https://phabricator.wikimedia.org/T202549) (owner: Arturo Borrero Gonzalez)
[10:55:37] (PS2) Filippo Giunchedi: graphite: use keepLastValue for thumbs drift alert [puppet] - https://gerrit.wikimedia.org/r/455811 (https://phabricator.wikimedia.org/T199073)
[10:56:36] (CR) Filippo Giunchedi: [C: 2] graphite: use keepLastValue for thumbs drift alert [puppet] - https://gerrit.wikimedia.org/r/455811 (https://phabricator.wikimedia.org/T199073) (owner: Filippo Giunchedi)
[10:57:02] RECOVERY - Request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:57:22] RECOVERY - etcd request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:57:22] RECOVERY - etcd request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:57:23] RECOVERY - etcd request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:57:23] RECOVERY - Request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:57:43] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[11:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: It is that lovely time of the day again! You are hereby commanded to deploy European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180828T1100).
[11:00:04] Aleksey_WMDE and Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:15] Here
[11:00:18] Here
[11:00:46] !log T202549 delete mysql-server from cloudcontrol100[3,4.wikimedia.org
[11:00:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:00:51] T202549: cloudvps: eqiad1: move nova db to m5-master - https://phabricator.wikimedia.org/T202549
[11:01:33] Who is deploying?
[11:02:05] I can SWAT today
[11:02:16] Aleksey_WMDE: you're not a deployer, right?
[11:02:21] Cool!
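Filippo's patch (gerrit 455811) switches the thumbs drift alert to Graphite's `keepLastValue`. This render function carries the last received datapoint forward across gaps (`None` values) instead of leaving holes. An optional `limit` restricts this to gaps of at most that many points. The Python below is a minimal sketch of those documented semantics on a plain list. It is not Graphite's implementation, and the name `keep_last_value` is hypothetical.

```python
import math
from typing import List, Optional


def keep_last_value(values: List[Optional[float]], limit: float = math.inf) -> List[Optional[float]]:
    """Fill runs of None with the last non-None value, but only for runs of
    at most `limit` points; longer gaps are left as None."""
    out = list(values)
    last_index = None  # index of the most recent non-None datapoint
    i, n = 0, len(out)
    while i < n:
        if out[i] is not None:
            last_index = i
            i += 1
            continue
        # Measure the run of consecutive None values starting at i.
        j = i
        while j < n and out[j] is None:
            j += 1
        if last_index is not None and (j - i) <= limit:
            for k in range(i, j):
                out[k] = out[last_index]
        i = j
    return out
```

Leading gaps stay `None`, because there is no earlier value to carry forward.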
[11:02:29] zeljkof: Nope :)
[11:03:21] Aleksey_WMDE: ok, I'll ping you once your patch is at mwdebug1002
[11:03:29] Urbanecm: please stand by, your patches are important to us ;)
[11:03:30] Got it
[11:04:38] (PS1) Faidon Liambotis: ssh-agent-proxy: support RSA SHA2 operations [puppet] - https://gerrit.wikimedia.org/r/455812 (https://phabricator.wikimedia.org/T202952)
[11:04:47] Will do zeljkof :D
[11:05:05] (PS1) ArielGlenn: add Ty Hargrove to shell users [puppet] - https://gerrit.wikimedia.org/r/455813 (https://phabricator.wikimedia.org/T202363)
[11:05:57] (PS3) Zfilipin: Wikidata: Use new item ID formatter for Q1-Q100 [mediawiki-config] - https://gerrit.wikimedia.org/r/455390 (https://phabricator.wikimedia.org/T201833) (owner: WMDE-leszek)
[11:06:36] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/455390 (https://phabricator.wikimedia.org/T201833) (owner: WMDE-leszek)
[11:07:19] (PS2) ArielGlenn: add Ty Hargrove to shell users [puppet] - https://gerrit.wikimedia.org/r/455813 (https://phabricator.wikimedia.org/T202363)
[11:07:53] (Merged) jenkins-bot: Wikidata: Use new item ID formatter for Q1-Q100 [mediawiki-config] - https://gerrit.wikimedia.org/r/455390 (https://phabricator.wikimedia.org/T201833) (owner: WMDE-leszek)
[11:08:18] Urbanecm: sorry to do this to you again but can I swap two SWAT slots with you? I have some WLM-related patches which are urgent and some of those config changes don't seem to be
[11:08:19] (CR) ArielGlenn: [C: 2] add Ty Hargrove to shell users [puppet] - https://gerrit.wikimedia.org/r/455813 (https://phabricator.wikimedia.org/T202363) (owner: ArielGlenn)
[11:08:53] tgr: there might be time for all, you can go after Aleksey_WMDE and I'll try to deploy as much of Urbanecm's patches as possible
[11:09:19] that's even better, thanks
[11:09:29] zeljkof, please reload the calendar, I've made some ordering changes. thx
[11:09:42] Aleksey_WMDE: your patch is at mwdebug1002, please test and let me know if I can deploy
[11:09:50] Urbanecm: will do, thanks
[11:09:57] Got it. Give me 5 minutes
[11:10:13] tgr: just please update the deployment calendar, you are next after Aleksey_WMDE
[11:11:26] (CR) Zfilipin: [C: 1] Remove uzwiki from commonsuploads.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/455628 (https://phabricator.wikimedia.org/T202847) (owner: Urbanecm)
[11:11:36] done
[11:12:03] (PS1) ArielGlenn: add thargrove to restricted and analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/455814 (https://phabricator.wikimedia.org/T202363)
[11:12:12] (CR) Zfilipin: [C: 1] Create namespace aliases in zhwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/455394 (https://phabricator.wikimedia.org/T202821) (owner: Urbanecm)
[11:12:17] tgr: thanks!
[11:13:01] (PS3) Arturo Borrero Gonzalez: cloudvps: labtestn: add network link with labtest deployment network [puppet] - https://gerrit.wikimedia.org/r/455615 (https://phabricator.wikimedia.org/T202636)
[11:13:25] zeljkof: Checked. No issues spotted.
[11:13:34] Aleksey_WMDE: ok, deploying
[11:14:12] (CR) ArielGlenn: [C: 2] add thargrove to restricted and analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/455814 (https://phabricator.wikimedia.org/T202363) (owner: ArielGlenn)
[11:14:16] tgr: can I deploy config changes while you wait for vendor/core patch to merge?
[11:14:31] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:455390|Wikidata: Use new item ID formatter for Q1-Q100 (T201833)]] (duration: 00m 49s)
[11:14:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:14:36] T201833: Use link formatter that uses cache instead of wb_terms for items Q1-Q100 - https://phabricator.wikimedia.org/T201833
[11:14:45] sure
[11:14:53] shouldn't take long, though
[11:15:12] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Ty Hargrove - https://phabricator.wikimedia.org/T202363 (ArielGlenn)
[11:15:34] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Ty Hargrove - https://phabricator.wikimedia.org/T202363 (ArielGlenn) As soon as the user verifies that access works as expected, we can close this ticket.
[11:16:04] Aleksey_WMDE: deployed, please check and thanks for deploying with #releng :)
[11:16:23] zeljkof: Thank you! Will do!
[11:16:32] tgr: swat is yours, if you think your patches are quick to deploy, I'll wait
[11:17:36] is there any gotcha to deploying a vendor backport? I don't think I have done that before
[11:17:50] tgr: I don't think I've done it either :/
[11:18:18] Operations, SRE-Access-Requests, User-Addshore: Requesting Access to view EventLogging data for gabriel-wmde / gbirke - https://phabricator.wikimedia.org/T202072 (ArielGlenn) We just need the manager/sponsor sign-off and then this can go ahead.
[11:19:09] Operations, SRE-Access-Requests, Performance-Team (Radar): add perf-team admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (ArielGlenn) p:Triage>Normal
[11:19:29] zeljkof: it's apparently not that fast after all so I'll just wait for the config changes
[11:19:58] tgr: ok, I'll start with config changes for Urbanecm, let me know when your patches are merged, so I'll stop
[11:20:15] (CR) Zfilipin: [C: 1] Permissions changes in ruwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/455232 (https://phabricator.wikimedia.org/T201265) (owner: Urbanecm)
[11:20:36] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/455628 (https://phabricator.wikimedia.org/T202847) (owner: Urbanecm)
[11:21:36] (PS4) Arturo Borrero Gonzalez: cloudvps: labtestn: add network link with labtest deployment network [puppet] - https://gerrit.wikimedia.org/r/455615 (https://phabricator.wikimedia.org/T202636)
[11:21:55] (Merged) jenkins-bot: Remove uzwiki from commonsuploads.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/455628 (https://phabricator.wikimedia.org/T202847) (owner: Urbanecm)
[11:22:29] Operations, SRE-Access-Requests, User-Addshore: Requesting Access to view EventLogging data for gabriel-wmde / gbirke - https://phabricator.wikimedia.org/T202072 (Addshore) >>! In T202072#4538007, @ArielGlenn wrote: > We just need the manager/sponsor sign-off and then this can go ahead. It looks lik...
[11:22:43] Urbanecm: 455628 is at mwdebug1002
[11:23:08] zeljkof, working fine, please deploy
[11:23:12] (PS2) Zfilipin: Create namespace aliases in zhwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/455394 (https://phabricator.wikimedia.org/T202821) (owner: Urbanecm)
[11:23:37] Urbanecm: deploying
[11:24:19] !log zfilipin@deploy1001 Synchronized dblists/commonsuploads.dblist: SWAT: [[gerrit:455628|Remove uzwiki from commonsuploads.dblist (T202847)]] (duration: 00m 48s)
[11:24:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:24:24] T202847: Granting upload rights to all or certain users on Uzwiki - https://phabricator.wikimedia.org/T202847
[11:24:26] Urbanecm: deployed
[11:24:32] ack
[11:25:02] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/455394 (https://phabricator.wikimedia.org/T202821) (owner: Urbanecm)
[11:26:19] (Merged) jenkins-bot: Create namespace aliases in zhwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/455394 (https://phabricator.wikimedia.org/T202821) (owner: Urbanecm)
[11:27:08] Urbanecm: 455394 is at mwdebug1002
[11:27:13] ack
[11:27:26] (PS2) Zfilipin: Permissions changes in ruwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/455232 (https://phabricator.wikimedia.org/T201265) (owner: Urbanecm)
[11:27:54] Operations, SRE-Access-Requests, User-Addshore: Requesting Access to view EventLogging data for gabriel-wmde / gbirke - https://phabricator.wikimedia.org/T202072 (ArielGlenn) >>! In T202072#4538020, @Addshore wrote: ... > It looks like @RobH marked the other 2 tickets as signed off, I wonder if this...
[11:27:56] working, please deploy zeljkof
[11:28:07] Urbanecm: deploying
[11:29:04] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:455394|Create namespace aliases in zhwikiversity (T202821)]] (duration: 00m 48s)
[11:29:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:09] T202821: Create namespace aliases in zhwikiversity - https://phabricator.wikimedia.org/T202821
[11:29:14] Urbanecm: deployed
[11:29:16] ack
[11:29:19] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/455232 (https://phabricator.wikimedia.org/T201265) (owner: Urbanecm)
[11:30:59] (Merged) jenkins-bot: Permissions changes in ruwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/455232 (https://phabricator.wikimedia.org/T201265) (owner: Urbanecm)
[11:31:53] Operations, Dumps-Generation: Reboots of dumps/snapshot hosts for L1TF/microcode updates - https://phabricator.wikimedia.org/T202623 (ArielGlenn)
[11:32:43] Urbanecm: 455232 is at mwdebug1002
[11:32:52] ack
[11:33:22] please deploy it zeljkof
[11:33:30] Urbanecm: ok
[11:34:07] (PS2) Zfilipin: Translation of scnwiktionary sitename was removed, add it back [mediawiki-config] - https://gerrit.wikimedia.org/r/455627 (https://phabricator.wikimedia.org/T202926) (owner: Urbanecm)
[11:34:37] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:455232|Permissions changes in ruwikinews (T201265)]] (duration: 00m 48s)
[11:34:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:34:42] T201265: Change group rights in Russian Wikinews (move, rollback) - https://phabricator.wikimedia.org/T201265
[11:34:45] Urbanecm: deployed
[11:34:47] ack
[11:34:53] zeljkof: Urbanecm Just in case if you missed: I've noticed some errors in Logstash on mwdebug1002
[11:35:12] I cannot watch logstash, I don't have access to it :(
[11:35:18] Aleksey_WMDE: did not see it, let me check
[11:35:30] At 2018-08-28T11:31:41
[11:36:49] Aleksey_WMDE: hm, you think it's serious? scap did not complain during deployment
[11:37:06] Urbanecm: 455232 is deployed
[11:37:11] Well, I've sent an email on 11:31:31 and apparently I did not turn mwdebug1002
[11:37:13] zeljkof: I have no clue... :)
[11:37:30] *turn mwdebug1002 off
[11:37:54] (an email via MW interface)
[11:38:03] Aleksey_WMDE: there are many messages in logs, unless one of them happens a lot I tend to ignore them :/
[11:38:17] Urbanecm: I didn't understand you
[11:38:30] (PS5) Arturo Borrero Gonzalez: cloudvps: labtestn: add network link with labtest deployment network [puppet] - https://gerrit.wikimedia.org/r/455615 (https://phabricator.wikimedia.org/T202636)
[11:38:42] merged, finally. I had no idea vendor tests are so slow.
[11:38:43] Urbanecm: is it related to one of the commits for today?
[11:38:46] It's just the error happened exactly before you wrote "Urbanecm: 455232 is at mwdebug1002"
[11:38:57] At the time Aleksey_WMDE mentioned, I've used Special:EmailUser
[11:39:00] Aleksey_WMDE: could be a coincidence :/
[11:39:06] Urbanecm: ah
[11:39:06] And I did not turn mwdebug1002 extension off
[11:39:16] So it is not related to commit, but might be relevant to the error.
[11:39:18] tgr: ok, go ahead, let me know when you're done
[11:39:19] I don't see it, so I don't know.
[11:39:45] Urbanecm: tgr will now deploy a couple of urgent patches, I'll continue when he's done
[11:39:50] ack
[11:41:35] (CR) Zfilipin: [C: 1] Translation of scnwiktionary sitename was removed, add it back [mediawiki-config] - https://gerrit.wikimedia.org/r/455627 (https://phabricator.wikimedia.org/T202926) (owner: Urbanecm)
[11:43:51] (CR) Zfilipin: [C: 1] Allow subpages in main namespace in zhwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/455380 (https://phabricator.wikimedia.org/T202007) (owner: 星耀晨曦)
[11:44:07] (PS3) Zfilipin: Allow subpages in main namespace in zhwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/455380 (https://phabricator.wikimedia.org/T202007) (owner: 星耀晨曦)
[11:49:19] uh, scap died on me
[11:49:28] scap sync-file php-1.32.0-wmf.18/vendor/ '...'
[11:49:37] 11:47:21 sync-file failed: Command 'find -O2 '/srv/mediawiki-staging/php-1.32.0-wmf.18/vendor/' -not -type d -name '*.php' -not -name 'autoload_static.php' -or -name '*.inc' | xargs -n1 -P30 -exec php -l >/dev/null 2>&1' returned non-zero exit status 123
[11:49:43] zeljkof: any idea?
[11:50:04] Operations, Commons, Multimedia, media-storage, User-Josve05a: Specific revisions of multiple files missing from Swift - 404 Not Found returned - https://phabricator.wikimedia.org/T124101 (Raymond) Original of https://commons.wikimedia.org/wiki/File:ATX-Netzteil.jpg is missing too: File not f...
[11:50:13] does that mean some kind of lint error?
[11:50:48] PROBLEM - IPMI Sensor Status on cp3039 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical]
[11:50:49] tgr: huh, I have no idea, hashar can you help? cc addshore Amir1 apergos
[11:51:07] * addshore reads up
[11:51:20] (CR) jenkins-bot: Wikidata: Use new item ID formatter for Q1-Q100 [mediawiki-config] - https://gerrit.wikimedia.org/r/455390 (https://phabricator.wikimedia.org/T201833) (owner: WMDE-leszek)
[11:51:22] (CR) jenkins-bot: Remove uzwiki from commonsuploads.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/455628 (https://phabricator.wikimedia.org/T202847) (owner: Urbanecm)
[11:51:24] (CR) jenkins-bot: Create namespace aliases in zhwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/455394 (https://phabricator.wikimedia.org/T202821) (owner: Urbanecm)
[11:51:24] what command?
[11:51:26] (CR) jenkins-bot: Permissions changes in ruwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/455232 (https://phabricator.wikimedia.org/T201265) (owner: Urbanecm)
[11:51:40] (PS6) Arturo Borrero Gonzalez: cloudvps: labtestn: add network link with labtest deployment network [puppet] - https://gerrit.wikimedia.org/r/455615 (https://phabricator.wikimedia.org/T202636)
[11:51:45] Fatal error: syntax error, unexpected T_CONST, expecting T_VARIABLE in /srv/mediawiki-staging/php-1.32.0-wmf.18/vendor/psy/psysh/test/ClassWithSecrets.php on line 16
[11:51:50] argh
[11:51:54] the output goes to /dev/null makes it unreadable
[11:51:58] why are we linting test files again?
[11:52:26] it's in libraries...
[11:52:32] test files probably shouldn't be in vendor anyway
[11:52:49] yeah, tell that to composer
[11:53:08] * apergos is reading along with bemusement
[11:53:18] tgr: https://github.com/wikimedia/mediawiki-vendor/blob/master/.gitignore#L39-L45
[11:53:20] zeljkof: wanna deploy the remaining config patches in the meanwhile?
[11:53:42] vendor and extensions/TemplateStyles has some undeployed changes, but that should not get in the way
[11:54:26] hm
[11:54:27] tgr: I would rather wait until the problem is resolved
[11:54:44] Urbanecm: are the two remaining commits urgent? can we move them to tomorrow?
[11:55:00] Yes, we can
[11:55:01] addshore: won't that cause problems for the next person running composer update?
[11:55:31] https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=1801425&oldid=1801420
[11:55:36] it shouldn't do
[11:55:42] I don't believe it has caused any issues so far
[11:56:13] we have a lint whitelist in vendor: https://github.com/wikimedia/mediawiki-vendor/blob/master/composer.json#L146
[11:56:23] so why is scap not using that?
[11:56:44] (CR) Arturo Borrero Gonzalez: [C: 2] "Compiler happy: https://puppet-compiler.wmflabs.org/compiler02/12263/" [puppet] - https://gerrit.wikimedia.org/r/455615 (https://phabricator.wikimedia.org/T202636) (owner: Arturo Borrero Gonzalez)
[11:56:50] Urbanecm: great, thanks
[11:56:58] Operations, docker-pkg, Patch-For-Review: releng/mediawiki-phpcs-dryrun fails to upload to docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T200722 (akosiaris) The image is too large and fills up /var/lib/nginx which is a 1G tmpfs, ending up failing. Let me figure out why and how we...
[11:56:59] yw
[12:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180828T1200)
[12:01:27] damn git submodules
[12:01:28] addshore: can you +2 https://gerrit.wikimedia.org/r/#/c/mediawiki/vendor/+/455817 ?
[12:01:46] zeljkof, can you please run namespaceDupes for zhwikiversity? We touched namespaces, just to be sure
[12:01:52] tgr: shouldn't that also remove some files?
[12:02:02] I guess they are already committed?
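The failing scap step above pipes `find` into `xargs php -l`. The following is a hypothetical Python model (not scap's code) of two details of that command. First, `find` joins adjacent tests with an implicit `-and`, which binds tighter than `-or`, so the expression selects `(not a directory AND *.php AND not autoload_static.php) OR *.inc`. Second, GNU xargs exits with status 123 when any invocation of the command exits with a status from 1 to 125, so status 123 means at least one `php -l` run failed. The `skip_dirs` parameter and the example paths are assumptions, showing one way test directories could be left out; nothing in the log says scap or the vendor lint whitelist works this way.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence


def lint_targets(paths: Iterable[str], skip_dirs: Sequence[str] = ()) -> List[str]:
    """Model the file selection of the scap `find` expression.

    Paths are assumed to name regular files, so `-not -type d` always holds.
    `skip_dirs` is a hypothetical extension, not part of the scap command.
    """
    selected = []
    for path in paths:
        *dirs, name = path.split("/")
        if any(d in skip_dirs for d in dirs):
            continue  # hypothetical: skip anything under a test directory
        # Implicit -and binds tighter than -or:
        # (*.php AND NOT autoload_static.php) OR *.inc
        php_file = fnmatchcase(name, "*.php") and name != "autoload_static.php"
        if php_file or fnmatchcase(name, "*.inc"):
            selected.append(path)
    return selected


def xargs_exit_status(child_statuses: Iterable[int]) -> int:
    """Simplified GNU xargs rule: 123 if any child exited with 1-125, else 0.

    Other documented cases (child exit 255 -> 124, killed by signal -> 125,
    command not runnable -> 126/127) are deliberately left out of this sketch.
    """
    return 123 if any(1 <= s <= 125 for s in child_statuses) else 0


# Illustrative paths only; the psysh test file is the one from the log.
files = [
    "vendor/psy/psysh/src/Shell.php",
    "vendor/psy/psysh/test/ClassWithSecrets.php",
    "vendor/composer/autoload_static.php",
    "vendor/example/legacy.inc",
]
print(lint_targets(files))
print(lint_targets(files, skip_dirs=("test", "tests")))
print(xargs_exit_status([0, 0, 1]))
```

With `skip_dirs=("test", "tests")` the psysh fixture `test/ClassWithSecrets.php` drops out of the lint list, while `autoload_static.php` is excluded by name either way.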
[12:02:26] oh, right
[12:03:37] (PS1) Faidon Liambotis: ssh-agent-proxy: make key location configurable [puppet] - https://gerrit.wikimedia.org/r/455818
[12:03:39] (PS1) Faidon Liambotis: ssh-agent-proxy: add default values to --help [puppet] - https://gerrit.wikimedia.org/r/455819
[12:03:42] (PS1) Faidon Liambotis: ssh-agent-proxy: switch to logger and add --debug [puppet] - https://gerrit.wikimedia.org/r/455820
[12:03:44] (PS1) Faidon Liambotis: keyholder: add a \n to a content => line [puppet] - https://gerrit.wikimedia.org/r/455821
[12:03:50] Operations, Performance-Team, Wikidata, Wikidata-Query-Service, User-Addshore: Wikidata produces a lot of failed requests for recentchanges API - https://phabricator.wikimedia.org/T202764 (Addshore)
[12:03:58] addshore: fixed
[12:04:27] tgr: +2ed
[12:04:37] thanks!
[12:07:17] Urbanecm: sure
[12:09:18] (CR) Volans: [C: -1] "Using formatter_class=argparse.ArgumentDefaultsHelpFormatter in the call to ArgumentParser should make it even more DRY and automatic ;)" [puppet] - https://gerrit.wikimedia.org/r/455819 (owner: Faidon Liambotis)
[12:10:38] Urbanecm: https://phabricator.wikimedia.org/T202821#4538192
[12:10:42] thx
[12:10:56] (CR) Zfilipin: "script output at T202821#4538192" [mediawiki-config] - https://gerrit.wikimedia.org/r/455394 (https://phabricator.wikimedia.org/T202821) (owner: Urbanecm)
[12:12:32] zeljkof, can you please run mwscript namespaceDupes.php zhwikiversity --fix --add-prefix=T202821 as well, to resolve conflicts?
[12:12:33] T202821: Create namespace aliases in zhwikiversity - https://phabricator.wikimedia.org/T202821
[12:13:06] Urbanecm: sure
[12:13:20] (CR) Volans: [C: 1] "Makes sense to me and code looks good." [puppet] - https://gerrit.wikimedia.org/r/455818 (owner: Faidon Liambotis)
[12:13:33] Operations, ops-codfw, DBA, Patch-For-Review: db2088 rebooted itself and came back sick - https://phabricator.wikimedia.org/T202822 (Marostegui) s2 finished checking - all good
[12:14:15] Operations, netops, Patch-For-Review: rancid pubkey auth to Junos 17.4 failure - https://phabricator.wikimedia.org/T202952 (faidon) This was logged every time a login was attempted, in netmon1002's /var/log/auth.log with this: `Aug 28 00:08:07 netmon1002 /ssh-agent-proxy[12127]: [
(CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/455812 (https://phabricator.wikimedia.org/T202952) (owner: Faidon Liambotis)
[12:14:39] Urbanecm: done! https://phabricator.wikimedia.org/T202821#4538200
[12:14:42] thanks
[12:16:18] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[12:18:27] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[12:20:34] this one seemed a brief issue of mcrouter (port 11213)
[12:20:52] "127.0.0.1:11213": SERVER ERROR
[12:20:58] on multiple hosts
[12:22:42] yeah, we've seen brief spikes of those
[12:23:10] I don't know the current status of the investigation, though
[12:23:41] but last time I looked it didn't have any production impact, the app servers which triggered the key failures were continuing to work fine
[12:25:24] I checked a couple of appserver's mcrouter journal log, and all of them reported mc1035.eqiad.wmnet temporarily misbehaving IIUC
[12:25:39] and IIRC Joe mentioned this to me a while ago
[12:27:32] (CR) Volans: [C: -1] "Change looks good, some minor comments inline." (3 comments) [puppet] - https://gerrit.wikimedia.org/r/455820 (owner: Faidon Liambotis)
[12:28:01] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/455821 (owner: Faidon Liambotis)
[12:29:18] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[12:31:37] !log tgr@deploy1001 Synchronized php-1.32.0-wmf.18/vendor/: SWAT: [[gerrit|455770|Update wikimedia/css-sanitizer to 2.0.0 (T197617)]] (duration: 01m 16s)
[12:31:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:31:47] T197617: TemplateStyles should be able to add skin-specific CSS - https://phabricator.wikimedia.org/T197617
[12:33:47] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[12:34:36] !log tgr@deploy1001 Synchronized php-1.32.0-wmf.18/extensions/TemplateStyles/: SWAT: [[gerrit:455771|Hoist selectors for html and body element (T197617)]] (duration: 00m 48s)
[12:34:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:34:41] elukey mobrovac FYI I wanted to point statsd.eqiad.wmnet to graphite1004, there might need some restarts needed and I was planning to do that tomorrow morning, would that work for you?
[12:35:07] yep!
[12:36:01] sweet, hopefully we have to do less restarts than last year
[12:37:11] zeljkof: took a while but done
[12:37:25] thanks for the help, addshore
[12:37:38] tgr: great, I was about to ask if you've managed to resolve the problem
[12:37:48] thanks addshore
[12:38:25] Urbanecm: sorry for delaying your patches.
[12:38:32] Nothing happened :)
[12:38:37] Did you finally resolve your scap problem?
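The memcached alerts in this stretch report the share of recent datapoints that crossed a threshold ("40.00% of data above the critical threshold [5000.0]"), and the recoveries quote a lower threshold of 1000.0, presumably the warning level. A minimal sketch of that kind of rule, assuming the check takes the non-null datapoints in its time window, computes the percentage above the critical and warning thresholds, and compares each against a configured percentage (40% here). The function names, the strict `>` comparison and the empty-window behaviour are assumptions; this is not the actual Icinga/Graphite check script.

```python
from typing import Optional, Sequence


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-null datapoints strictly above `threshold`."""
    # Missing datapoints (null in Graphite's JSON output) are skipped.
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0  # assumption: an empty window counts as "nothing above"
    return 100.0 * sum(v > threshold for v in values) / len(values)


def evaluate(datapoints: Sequence[Optional[float]], warning: float,
             critical: float, percentage: float) -> str:
    """Return CRITICAL, WARNING or OK by comparing shares against `percentage`."""
    if percent_above(datapoints, critical) >= percentage:
        return "CRITICAL"
    if percent_above(datapoints, warning) >= percentage:
        return "WARNING"
    return "OK"


# Hypothetical window of error-rate samples: 2 of the 5 non-null values exceed
# 5000, i.e. 40%, which meets the 40% percentage and yields CRITICAL.
window = [800.0, 6000.0, 7000.0, None, 900.0, 950.0]
print(evaluate(window, warning=1000.0, critical=5000.0, percentage=40.0))
```

Counting a share of the window rather than a single sample is what lets brief mcrouter spikes like the ones discussed here trip the alert and then recover within a few minutes.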
[12:38:40] Did not watch this chan
[12:41:02] (PS2) Faidon Liambotis: ssh-agent-proxy: make key location configurable [puppet] - https://gerrit.wikimedia.org/r/455818
[12:41:04] (PS2) Faidon Liambotis: ssh-agent-proxy: add default values to --help [puppet] - https://gerrit.wikimedia.org/r/455819
[12:41:05] Urbanecm: yeah, the underlying issue is tracked in T202984
[12:41:06] (PS2) Faidon Liambotis: ssh-agent-proxy: switch to logger and add --debug [puppet] - https://gerrit.wikimedia.org/r/455820
[12:41:08] T202984: Scap sync fails because it tries to lint test files - https://phabricator.wikimedia.org/T202984
[12:41:08] (PS2) Faidon Liambotis: keyholder: add a \n to a content => line [puppet] - https://gerrit.wikimedia.org/r/455821
[12:41:49] (PS1) Elukey: Lower to 5M the TTL for archiva's CNAME record [dns] - https://gerrit.wikimedia.org/r/455824 (https://phabricator.wikimedia.org/T192639)
[12:43:01] Operations, Wikimedia-Mailing-lists: Password reset request for wikimedia-nd mailing list - https://phabricator.wikimedia.org/T202247 (ArielGlenn) An email update should have gone to all of the accounts listed on the info page here: https://lists.wikimedia.org/mailman/listinfo/wikimedia-nd Please check t...
[12:45:18] (PS2) Elukey: Switch archiva.wikimedia.org to archiva1001 [dns] - https://gerrit.wikimedia.org/r/455760 (https://phabricator.wikimedia.org/T192639)
[12:45:25] Operations, docker-pkg, Patch-For-Review: releng/mediawiki-phpcs-dryrun fails to upload to docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T200722 (hashar) Ahhh, good to know there is a limit :] The image holds a copy of mediawiki/core which should not be necessary: ``` RUN git clon...
[12:48:29] moritzm: is it possible that the logfile directive in /etc/memcached is not picked up at all due to the systemd unit?
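Volans' review suggests passing `formatter_class=argparse.ArgumentDefaultsHelpFormatter` to `ArgumentParser`, and the ssh-agent-proxy series also switches to a logger with a `--debug` flag. A minimal sketch of both ideas in plain Python. The `--key-dir` option, its `/etc/keys` default and the `agent-proxy` logger name are hypothetical placeholders, not ssh-agent-proxy's actual interface.

```python
import argparse
import logging
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # ArgumentDefaultsHelpFormatter appends " (default: ...)" to the help text
    # of arguments that have help strings, so defaults are not repeated by hand.
    parser = argparse.ArgumentParser(
        description="hypothetical agent proxy",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # Hypothetical option and default, standing in for a configurable key location.
    parser.add_argument("--key-dir", default="/etc/keys", help="key directory")
    parser.add_argument("--debug", action="store_true", help="enable debug logging")
    return parser


def setup_logger(debug: bool) -> logging.Logger:
    # A named logger whose level follows the --debug flag.
    logger = logging.getLogger("agent-proxy")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    return logger


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    setup_logger(args.debug).debug("using key directory %s", args.key_dir)
    return args
```

Running the parser with `--help` would then print `key directory (default: /etc/keys)` without the default being written into the help string itself, which is the DRY effect the review comment points at.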
[12:48:40] I tried to check for a logfile but nothing is there
[12:50:17] PROBLEM - toolschecker: tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds
[12:50:57] RECOVERY - toolschecker: tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 1015 bytes in 0.018 second response time
[12:53:15] elukey: looking
[12:53:16] (CR) Elukey: [C: 2] Lower to 5M the TTL for archiva's CNAME record [dns] - https://gerrit.wikimedia.org/r/455824 (https://phabricator.wikimedia.org/T192639) (owner: Elukey)
[12:55:02] Operations, ops-codfw, Discovery, Wikidata, and 2 others: add SSDs to wdqs200[12] - https://phabricator.wikimedia.org/T202777 (Gehel) @Papaul: I'm ready to reimage wdqs2002 today. Ping me when you're around and I'll shut it down.
[12:56:31] elukey: yeah, the config handling in the init script is vastly different
[12:57:23] (PS2) Filippo Giunchedi: prometheus: alert on unusual day-over-day logstash ingestion rate change [puppet] - https://gerrit.wikimedia.org/r/455576 (https://phabricator.wikimedia.org/T202307)
[12:59:33] (CR) Filippo Giunchedi: "> Patch Set 1: Code-Review+1" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/455576 (https://phabricator.wikimedia.org/T202307) (owner: Filippo Giunchedi)
[13:00:04] Deploy window MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180828T1300)
[13:00:30] (CR) Filippo Giunchedi: [C: 2] prometheus: alert on unusual day-over-day logstash ingestion rate change [puppet] - https://gerrit.wikimedia.org/r/455576 (https://phabricator.wikimedia.org/T202307) (owner: Filippo Giunchedi)
[13:00:33] Operations, SRE-Access-Requests, wikidiff2, User-Addshore: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202476 (thiemowmde) `ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCopJe4A8f90fa+Y0GYwvw1k/UaWNvWRNJ9m+ym7GXoMDpVYELQAdeYm52ApG/UsQ...
[13:08:07] PROBLEM - Filesystem available is greater than filesystem size on ms-be2041 is CRITICAL: cluster=swift device=/dev/sdd1 fstype=xfs instance=ms-be2041:9100 job=node mountpoint=/srv/swift-storage/sdd1 site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2041&var-datasource=codfw%2520prometheus%252Fops
[13:10:03] next batch of swift hosts we're trying out ext4 :)
[13:15:15] !log repair sdd1 on ms-be2041 - T199198
[13:15:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:21] T199198: Some swift filesystems reporting negative disk usage - https://phabricator.wikimedia.org/T199198
[13:21:19] (PS3) Alexandros Kosiaris: Display etcd /mediawiki-config values in noc.w.o [puppet] - https://gerrit.wikimedia.org/r/455578
[13:21:21] (PS1) Alexandros Kosiaris: docker-registry: Allow image layers up to 3g to be pushed [puppet] - https://gerrit.wikimedia.org/r/455829 (https://phabricator.wikimedia.org/T200722)
[13:22:17] PROBLEM - HTTP releases-jenkins.wikimedia.org on releases1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:22:22] (CR) jerkins-bot: [V: -1] docker-registry: Allow image layers up to 3g to be pushed [puppet] - https://gerrit.wikimedia.org/r/455829 (https://phabricator.wikimedia.org/T200722) (owner: Alexandros Kosiaris)
[13:23:27] RECOVERY - HTTP releases-jenkins.wikimedia.org on releases1001 is OK: HTTP OK: HTTP/1.1 200 OK - 18405 bytes in 9.132 second response time
[13:24:13] (PS1) Alexandros Kosiaris: Parameterize tmpfs size [puppet/nginx] - https://gerrit.wikimedia.org/r/455830 (https://phabricator.wikimedia.org/T200722)
[13:31:24] (CR) Elukey: [C: 1] Add cache app directors for analytics_ui, superset and thorium [puppet] - https://gerrit.wikimedia.org/r/455743 (https://phabricator.wikimedia.org/T202011) (owner: Ottomata)
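The ms-be2041 alert fires when a filesystem reports more space available than its total size, which shows up as negative disk usage (T199198). A hypothetical Python sketch of that consistency check, built on the counters `statvfs(2)` exposes through `os.statvfs`. It assumes the alert compares the same two quantities (available bytes versus total bytes), which the alert name suggests but the log does not spell out; `FsUsage`, `usage_from_counts` and `usage_of` are illustrative names.

```python
import os
from typing import NamedTuple


class FsUsage(NamedTuple):
    size: int   # total bytes: f_blocks * f_frsize
    avail: int  # bytes available to unprivileged users: f_bavail * f_frsize
    used: int   # (f_blocks - f_bfree) * f_frsize; negative if the counters are broken

    @property
    def inconsistent(self) -> bool:
        # The condition named by the alert: "available is greater than size".
        return self.avail > self.size


def usage_from_counts(blocks: int, bfree: int, bavail: int, frsize: int) -> FsUsage:
    """Turn raw statvfs block counters into byte totals."""
    return FsUsage(size=blocks * frsize, avail=bavail * frsize,
                   used=(blocks - bfree) * frsize)


def usage_of(path: str) -> FsUsage:
    """Read the counters for a mounted filesystem (POSIX only)."""
    st = os.statvfs(path)
    return usage_from_counts(st.f_blocks, st.f_bfree, st.f_bavail, st.f_frsize)
```

On a host, `usage_of("/srv/swift-storage/sdd1")` would apply the same test to the mountpoint named in the alert; a healthy filesystem should always report `inconsistent` as `False`.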
[13:31:54] (PS1) Ottomata: Install ipython on all analytics nodes [puppet] - https://gerrit.wikimedia.org/r/455833
[13:32:23] moritzm: I tried in deployment-prep to play a bit with memcached's options, and I made logging work (to syslog) only using -v or -vv directly from the ExecStart command
[13:32:38] if I enable them from /etc/memcached.conf nothing happens
[13:33:11] godog: sure, that's fine
[13:33:29] elukey: ack
[13:33:47] mobrovac: sweet
[13:33:58] -v verbose (print errors/warnings while in event loop)
[13:34:23] I am wondering if --^ should be enabled for mc1035 (and the others, but we can't really roll restart all memcached shards) to debug this issue
[13:34:25] (CR) Ottomata: [C: 2] Install ipython on all analytics nodes [puppet] - https://gerrit.wikimedia.org/r/455833 (owner: Ottomata)
[13:35:37] (PS2) Ottomata: Add cache app directors for analytics_ui, superset and thorium [puppet] - https://gerrit.wikimedia.org/r/455743 (https://phabricator.wikimedia.org/T202011)
[13:35:44] (CR) Ottomata: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/12264/cp1075.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/455743 (https://phabricator.wikimedia.org/T202011) (owner: Ottomata)
[13:39:50] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui) I have signed @Banyek GPG's key
[13:41:09] (PS1) Ottomata: Route yarn.wikimedia.org to hadoop_ui director on analytics-tool1001 [puppet] - https://gerrit.wikimedia.org/r/455837 (https://phabricator.wikimedia.org/T202011)
[13:42:34] (CR) Ottomata: [C: 2] Route yarn.wikimedia.org to hadoop_ui director on analytics-tool1001 [puppet] - https://gerrit.wikimedia.org/r/455837 (https://phabricator.wikimedia.org/T202011) (owner: Ottomata)
[13:47:51] (PS1) Ottomata: Route huewq.wikimedia.org to hadoop_ui director on analytics-tool1001 [puppet] - https://gerrit.wikimedia.org/r/455839 (https://phabricator.wikimedia.org/T202011)
[13:49:32] ottomata: huewq.wikimedia.org :)
[13:50:08] (CR) Muehlenhoff: [C: 1] tor_relay: temp allow rsync of datadir for migration [puppet] - https://gerrit.wikimedia.org/r/455745 (https://phabricator.wikimedia.org/T196701) (owner: Dzahn)
[13:51:50] (CR) Ottomata: [C: 2] Route huewq.wikimedia.org to hadoop_ui director on analytics-tool1001 [puppet] - https://gerrit.wikimedia.org/r/455839 (https://phabricator.wikimedia.org/T202011) (owner: Ottomata)
[14:00:18] (PS1) Ottomata: Route turnilo.wikimedia.org to turnilo director on analytics-tool1002 [puppet] - https://gerrit.wikimedia.org/r/455840 (https://phabricator.wikimedia.org/T202011)
[14:01:12] (CR) Ottomata: [C: 2] Route turnilo.wikimedia.org to turnilo director on analytics-tool1002 [puppet] - https://gerrit.wikimedia.org/r/455840 (https://phabricator.wikimedia.org/T202011) (owner: Ottomata)
[14:02:30] Operations, SRE-Access-Requests, wikidiff2, Patch-For-Review, User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (WMDE-Fisch) >>! In T202475#4537876, @ArielGlenn wrote: > As soon as the user verifies that ac...
[14:07:58] (PS1) Ottomata: Route turnilo.wikimedia.org to turnilo director on analytics-tool1003 [puppet] - https://gerrit.wikimedia.org/r/455842 (https://phabricator.wikimedia.org/T202011)
[14:08:11] RECOVERY - Filesystem available is greater than filesystem size on ms-be2041 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2041&var-datasource=codfw%2520prometheus%252Fops
[14:10:05] (PS2) Ottomata: Route superset.wikimedia.org to superset director on analytics-tool1003 [puppet] - https://gerrit.wikimedia.org/r/455842 (https://phabricator.wikimedia.org/T202011)
[14:10:29] (CR) Ottomata: [V: 2 C: 2] Route superset.wikimedia.org to superset director on analytics-tool1003 [puppet] - https://gerrit.wikimedia.org/r/455842 (https://phabricator.wikimedia.org/T202011) (owner: Ottomata)
[14:17:02] Operations, SRE-Access-Requests, Patch-For-Review: Please add aaron to perf-team - https://phabricator.wikimedia.org/T202650 (Imarlier) @aaron Can you verify that you have access to perf-team hosts (eg, webperf1001)?
[14:17:22] Operations, SRE-Access-Requests, Patch-For-Review: Please add aaron to perf-team - https://phabricator.wikimedia.org/T202650 (Imarlier) a:aaron
[14:18:31] Operations, SRE-Access-Requests, Performance-Team (Radar): add perf-team admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (Dzahn) They need to upload files to sitemap.wikimedia.org's document root. I think we always want at least 2 people to avoid SPOFs. Correct m...
[14:19:35] Operations, SRE-Access-Requests, Performance-Team (Radar): add perf-team admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (Imarlier) @ArielGlenn For the moment, I'm the one working on this, but I'm not sure that will continue to be the case so I'd prefer if it were...
[14:20:17] Operations, SRE-Access-Requests, Performance-Team (Radar): add perf-team admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (Imarlier) (In other words, pretty much what @Dzahn wrote while I was writing my comment :-) )
[14:21:20] Operations, SRE-Access-Requests: Access to restbase servers (including sudo) for Imarlier - https://phabricator.wikimedia.org/T202563 (Imarlier) a:Imarlier>VColeman @VColeman Can you approve this, please?
[14:21:42] (PS3) Herron: logstash: reduce replica count on old logstash indices [puppet] - https://gerrit.wikimedia.org/r/454354 (https://phabricator.wikimedia.org/T201971)
[14:22:50] (PS4) Herron: logstash: reduce replica count on old logstash indices [puppet] - https://gerrit.wikimedia.org/r/454354 (https://phabricator.wikimedia.org/T201971)
[14:22:55] Operations, SRE-Access-Requests: request to add imarlier to perf-roots - https://phabricator.wikimedia.org/T202657 (Imarlier) a:VColeman @VColeman Can you approve this, please?
[14:23:02] Operations, SRE-Access-Requests, Performance-Team (Radar): add perf-team admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (Dzahn) I'm thinking whether sitemap.wm.org should have been on the releases servers rather than here. Same kind of minimal Apache but we alread...
[14:23:24] Operations, SRE-Access-Requests, wikidiff2, Patch-For-Review, User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (ArielGlenn) Sorry about this, but the task asks for access to upload things onto the release...
[14:24:02] Operations, SRE-Access-Requests, Patch-For-Review: request to add phendeskog to perf-roots - https://phabricator.wikimedia.org/T202658 (Imarlier) @Peter Can you confirm this, please? (@Krinkle can probably give you hints about the easiest way to do that...)
[14:27:55] Operations, docker-pkg, Patch-For-Review: releng/mediawiki-phpcs-dryrun fails to upload to docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T200722 (hashar) @akosiaris I have refactored the Jenkins job and we will most probably no more need the faulty container `releng/mediawiki-phpc...
[14:29:00] Operations, SRE-Access-Requests, Performance-Team (Radar): add perf-team admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (ArielGlenn) There's where we would like sitemap.wm.o to be and where the various workarounds will not be a PITA if we have it there. The direct...
[14:29:44] (PS3) Elukey: Switch archiva.wikimedia.org to archiva1001 [dns] - https://gerrit.wikimedia.org/r/455760 (https://phabricator.wikimedia.org/T192639)
[14:32:37] Operations, SRE-Access-Requests, wikidiff2, Patch-For-Review, User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (WMDE-Fisch) >>! In T202475#4538616, @ArielGlenn wrote: > Sorry about this, but the task asks...
[14:32:40] Operations, SRE-Access-Requests, Performance-Team (Radar): add perf-team admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (Imarlier) >>! In T202910#4538615, @Dzahn wrote: > I'm thinking whether sitemap.wm.org should have been on the releases servers rather than here...
[14:32:53] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1081" [mediawiki-config] - https://gerrit.wikimedia.org/r/455846
[14:34:59] (PS29) Gehel: Prep work for multi-instance elasticsearch refactor [puppet] - https://gerrit.wikimedia.org/r/440498 (https://phabricator.wikimedia.org/T198351) (owner: EBernhardson)
[14:45:33] (CR) Anomie: Test that all wikis are in one of the shard dblists (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/455587 (https://phabricator.wikimedia.org/T202904) (owner: Anomie)
[14:45:46] (PS3) Anomie: Test that all wikis are in one of the shard dblists [mediawiki-config] - https://gerrit.wikimedia.org/r/455587 (https://phabricator.wikimedia.org/T202904)
[14:46:47] (PS4) Anomie: Test that all wikis are in one of the shard dblists [mediawiki-config] - https://gerrit.wikimedia.org/r/455587 (https://phabricator.wikimedia.org/T202904)
[14:47:03] (CR) Herron: "> I'm +1 on the idea, though I believe reducing replica count should" [puppet] - https://gerrit.wikimedia.org/r/454354 (https://phabricator.wikimedia.org/T201971) (owner: Herron)
[14:47:15] (PS5) Anomie: Test that all wikis are in one of the section dblists [mediawiki-config] - https://gerrit.wikimedia.org/r/455587 (https://phabricator.wikimedia.org/T202904)
[14:47:32] Operations, ops-eqiad, DC-Ops, cloud-services-team: labvirt1003 raid warning - https://phabricator.wikimedia.org/T200203 (Cmjohnson)
[14:47:48] Operations, ops-codfw, DBA: db2042 RAID battery failed - https://phabricator.wikimedia.org/T202051 (Marostegui) This server started to recharge its BBU again: ``` WARNING: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK -...
[14:48:18] Operations, ops-eqiad, DC-Ops, cloud-services-team: labvirt1003 raid warning - https://phabricator.wikimedia.org/T200203 (Cmjohnson) @Bstorm The disk has been swapped. Please resolve this once satisfied
[14:49:47] !log depooling scb1002 for hardware diagnostics
[14:49:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:57] Operations, Analytics, Traffic, Services (blocked): Add Accept header to webrequest logs - https://phabricator.wikimedia.org/T170606 (mobrovac) While the proposed solution will work for us in this case, I second @Ottomata's thoughts that having this header (or the lack thereof) included in the lo...
[14:56:29] Operations, Electron-PDFs, Proton, Patch-For-Review, and 3 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748 (phuedx) :point_up: per @mobrovac's review of https://gerrit.wikimedia.org/r/#/c/mediawiki/services/chromium-render/+/451817/.
[14:56:33] (CR) Ayounsi: [C: 1] "Not tested but read through it, tiny nitpick otherwise LGTM!" [puppet] - https://gerrit.wikimedia.org/r/455818 (owner: Faidon Liambotis)
[14:57:18] (CR) Ayounsi: [C: 1] ssh-agent-proxy: make key location configurable (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455818 (owner: Faidon Liambotis)
[14:58:24] Operations, SRE-Access-Requests, Performance-Team (Radar): add perf-team admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (ArielGlenn) Pinging @BBlack again so we can firm up the final resting place for this thing, or at least put the final nail in its coffin :-D
[14:58:32] (CR) Jcrespo: [C: 1] Test that all wikis are in one of the section dblists [mediawiki-config] - https://gerrit.wikimedia.org/r/455587 (https://phabricator.wikimedia.org/T202904) (owner: Anomie)
[14:59:23] Operations, docker-pkg, Patch-For-Review: releng/mediawiki-phpcs-dryrun fails to upload to docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T200722 (akosiaris) Open>Resolved a:akosiaris Thanks. Resolving. I'll try and see if merging the changes listed above makes sense a...
[14:59:53] (CR) Ayounsi: [C: 1] "Not tested but looks straightforward enough." [puppet] - https://gerrit.wikimedia.org/r/455821 (owner: Faidon Liambotis)
[15:00:42] (PS1) Volans: mediawiki: add siteinfo-related methods [software/spicerack] - https://gerrit.wikimedia.org/r/455851 (https://phabricator.wikimedia.org/T199079)
[15:01:44] (CR) jerkins-bot: [V: -1] mediawiki: add siteinfo-related methods [software/spicerack] - https://gerrit.wikimedia.org/r/455851 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:03:45] (PS2) Volans: mediawiki: add siteinfo-related methods [software/spicerack] - https://gerrit.wikimedia.org/r/455851 (https://phabricator.wikimedia.org/T199079)
[15:05:21] jouncebot: next
[15:05:21] In 0 hour(s) and 54 minute(s): Puppet SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180828T1600)
[15:07:27] Operations, Maps, Maps-Sprint, Reading-Infrastructure-Team-Backlog: Decommission maps-test cluster - https://phabricator.wikimedia.org/T202898 (Gehel)
[15:10:02] Operations, Wikimedia-Mailing-lists: Password reset request for wikimedia-nd mailing list - https://phabricator.wikimedia.org/T202247 (Geekdidi) Awesome. Thanks. And...yes, I've a second real person <3
[15:11:02] (PS1) Urbanecm: Introduce engineer user group on Czech Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/455853 (https://phabricator.wikimedia.org/T203000)
[15:13:49] (PS1) DCausse: Add esperanto and stop using gnupg.net as the key server [software/elasticsearch/plugins] (5.x) - https://gerrit.wikimedia.org/r/455854
[15:14:58] (CR) DCausse: [V: 2 C: 2] Setup es 5.x backport branch [software/elasticsearch/plugins] (5.x) - https://gerrit.wikimedia.org/r/455782 (owner: DCausse)
[15:16:29] (CR) Filippo Giunchedi: [C: 1] logstash: reduce replica count on old logstash indices [puppet] - https://gerrit.wikimedia.org/r/454354 (https://phabricator.wikimedia.org/T201971) (owner: Herron)
[15:17:01] (CR) Filippo Giunchedi: [C: 1] "> Patch Set 4:" [puppet] - https://gerrit.wikimedia.org/r/454354 (https://phabricator.wikimedia.org/T201971) (owner: Herron)
[15:18:07] Operations, netops, Patch-For-Review: rancid pubkey auth to Junos 17.4 failure - https://phabricator.wikimedia.org/T202952 (thcipriani) ugh! Seeing the actual error I realize that a 3rd party user brought this up previously and it was fixed in the 3rd party repo (https://phabricator.wikimedia.org/sou...
[15:21:50] (CR) Tjones: [C: 1] "The parts I understand look good to me! ;)" [software/elasticsearch/plugins] (5.x) - https://gerrit.wikimedia.org/r/455854 (owner: DCausse)
[15:22:08] Operations, ops-eqiad, DC-Ops: Replace memory bank on scb1002 - https://phabricator.wikimedia.org/T196901 (Cmjohnson) I misread the memory and didn't have 4GB so I replaced all of the memory with same type just 8GB DIMM in each socket. So the server now had 2X the memory it once had.
[15:24:12] Operations, ops-eqiad, DC-Ops: Replace memory bank on scb1002 - https://phabricator.wikimedia.org/T196901 (Cmjohnson) @MoritzMuehlenhoff Feel free to resolve once satisfied.
[15:29:38] Operations, ops-eqiad, DC-Ops: Replace memory bank on scb1002 - https://phabricator.wikimedia.org/T196901 (MoritzMuehlenhoff) Open>Resolved a:MoritzMuehlenhoff Thanks, I've repooled the server. Closing the task, will reopen in case there are still issues.
[15:30:03] (CR) Herron: [C: 2] "> LGTM, suggestion inline (but feel free to ignore)" [puppet] - https://gerrit.wikimedia.org/r/454354 (https://phabricator.wikimedia.org/T201971) (owner: Herron)
[15:30:14] (PS5) Herron: logstash: reduce replica count on old logstash indices [puppet] - https://gerrit.wikimedia.org/r/454354 (https://phabricator.wikimedia.org/T201971)
[15:31:54] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1081" [mediawiki-config] - https://gerrit.wikimedia.org/r/455846 (owner: Marostegui)
[15:32:04] (CR) Banyek: "I am not sure how this works, but it Looks Good To Me™ - Maybe it could be +1 but I go for 0." [mediawiki-config] - https://gerrit.wikimedia.org/r/455846 (owner: Marostegui)
[15:32:12] RECOVERY - Device not healthy -SMART- on labvirt1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=labvirt1003&var-datasource=eqiad%2520prometheus%252Fops
[15:33:15] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1081" [mediawiki-config] - https://gerrit.wikimedia.org/r/455846 (owner: Marostegui)
[15:33:32] (CR) Marostegui: [C: 2] "> I am not sure how this works, but it Looks Good To Me™ - Maybe it" [mediawiki-config] - https://gerrit.wikimedia.org/r/455846 (owner: Marostegui)
[15:34:31] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1081 (duration: 00m 48s)
[15:34:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:38] (CR) Banyek: "That was my intention, indeed. :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/455846 (owner: Marostegui)
[15:34:48] (PS1) Marostegui: db-eqiad.php: Depool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/455856
[15:35:12] !log shutting down wdqs1004 to add new disks - T202779
[15:35:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:16] T202779: add SSDs to wdqs100[45] - https://phabricator.wikimedia.org/T202779
[15:35:26] !log gehel@puppetmaster1001 conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=wdqs,name=wdqs1004.eqiad.wmnet
[15:35:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:56] Operations, Release-Engineering-Team: Keyholder phab repo duplicate work - https://phabricator.wikimedia.org/T203003 (thcipriani)
[15:36:06] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/455856 (owner: Marostegui)
[15:36:30] Operations, Release-Engineering-Team: Keyholder phab repo duplicate work - https://phabricator.wikimedia.org/T203003 (thcipriani) For an example of the problem see T202952
[15:37:23] (Merged) jenkins-bot: db-eqiad.php: Depool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/455856 (owner: Marostegui)
[15:38:32] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1097:3314 (duration: 00m 48s)
[15:38:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:38:36] !log Deploy schema change on db1097:3314
[15:38:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:52] (CR) Thcipriani: "Should solve the immediate issue in T202952." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455812 (https://phabricator.wikimedia.org/T202952) (owner: Faidon Liambotis)
[15:40:57] Operations, ops-eqiad, Discovery, Wikidata, and 2 others: add SSDs to wdqs100[45] - https://phabricator.wikimedia.org/T202779 (ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['wdqs1004.eqiad.wmnet'] ``` The log can be found in `/var/log/w...
[15:41:44] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1081" [mediawiki-config] - https://gerrit.wikimedia.org/r/455846 (owner: Marostegui)
[15:41:46] (CR) jenkins-bot: db-eqiad.php: Depool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/455856 (owner: Marostegui)
[15:43:36] Operations, Release-Engineering-Team: Keyholder phab repo duplicate work - https://phabricator.wikimedia.org/T203003 (thcipriani)
[15:44:01] (PS3) Faidon Liambotis: keyholder: add a \n to a content => line [puppet] - https://gerrit.wikimedia.org/r/455821
[15:44:03] (PS2) Faidon Liambotis: ssh-agent-proxy: support RSA SHA2 operations [puppet] - https://gerrit.wikimedia.org/r/455812 (https://phabricator.wikimedia.org/T202952)
[15:44:05] (PS3) Faidon Liambotis: ssh-agent-proxy: make key location configurable [puppet] - https://gerrit.wikimedia.org/r/455818
[15:44:07] (PS3) Faidon Liambotis: ssh-agent-proxy: add default values to --help [puppet] - https://gerrit.wikimedia.org/r/455819
[15:44:09] (PS3) Faidon Liambotis: ssh-agent-proxy: switch to logger and add --debug [puppet] - https://gerrit.wikimedia.org/r/455820
[15:44:13] (PS1) Faidon Liambotis: ssh-agent-proxy: clear up client handling logic [puppet] - https://gerrit.wikimedia.org/r/455858
[15:44:57] (CR) Faidon Liambotis: "Wait, what's the "phabricator keyholder repository"? Are we maintaining two versions of this codebase?" [puppet] - https://gerrit.wikimedia.org/r/455812 (https://phabricator.wikimedia.org/T202952) (owner: Faidon Liambotis)
[15:45:58] Operations, netops, Patch-For-Review: rancid pubkey auth to Junos 17.4 failure - https://phabricator.wikimedia.org/T202952 (Volans) Is there any reason why we are keeping two versions of the same code in different places? We should unify and use only one of them IMHO.
[15:47:50] Operations, ops-eqiad, Discovery, Wikidata, and 2 others: add SSDs to wdqs100[45] - https://phabricator.wikimedia.org/T202779 (Cmjohnson) Added 2 ssds to wdqs1004 and gehel is re-installing. Will do wdqs1005 at a later time/date.
[15:51:49] Operations, Release-Engineering-Team: Keyholder phab repo duplicate work - https://phabricator.wikimedia.org/T203003 (faidon) Thanks for filing this! I lost about an hour debugging and (re-)fixing the above issue today, so +1 to everything you said :) Let's sort this out and all other potential cases l...
[15:51:52] herron, btw, the other change I have involving labs and exim is https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/439774/
[15:53:12] (CR) Thcipriani: "> Wait, what's the "phabricator keyholder repository"? Are we" [puppet] - https://gerrit.wikimedia.org/r/455812 (https://phabricator.wikimedia.org/T202952) (owner: Faidon Liambotis)
[15:54:29] thcipriani: yeah, I already responded to the task, our updates are crossing each other :)
[15:55:16] heh, so many available communication channels
[15:55:28] I'm very :( about this
[15:55:43] (CR) Gehel: "LGTM, minor comments inline" (4 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/455851 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:56:02] will you take the lead in coalescing the two repos or is something that you expect SRE to do?
[15:56:18] just making sure we're not each waiting for each other ;)
[15:57:06] once I saw the error message and realized I had seen it before, it made for a frustrating morning. I can look at merging the two repos. I'll add folks that are these tickets to review once I have something.
[15:57:19] ok
[15:57:27] thanks :)
[15:57:46] I'm not sure exactly how to do this just yet. The simplest idea might just be a subtree in the puppet repo, but I don't think that's common?
[15:57:55] no it's not
[15:58:03] Operations, Release-Engineering-Team: Keyholder phab repo duplicate work - https://phabricator.wikimedia.org/T203003 (mmodell) I agree that we should use the upstream repo from puppet and I should have refactored the puppet code as soon as I created rKEYHOLDER. The question then is, do we install via deb...
[15:58:36] heh, then I guess ^ is my next question
[15:59:01] Operations, Wikimedia-Mailing-lists: Password reset request for wikimedia-nd mailing list - https://phabricator.wikimedia.org/T202247 (ArielGlenn) If you have access back, please go ahead and close this (or comment here and I'll do so).
[15:59:33] volans: ^
[15:59:38] I'm about to run into a meeting unfortunately
[15:59:53] me too :)
[16:00:04] godog, moritzm, and _joe_: Your horoscope predicts another unfortunate Puppet SWAT(Max 6 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180828T1600).
[16:00:04] Ebe123, Reedy, and Reedy: A patch you scheduled for Puppet SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[16:00:20] Thank you. Ready
[16:00:26] Reedy and Reedy :)
[16:00:36] lolol
[16:00:54] needs more Reedys
[16:01:26] thcipriani: I can comment on the task after the meeting
[16:01:35] volans: thanks!
[16:02:11] (CR) Faidon Liambotis: "Setting aside the two repos for a moment and to Tyler's earlier point:" [puppet] - https://gerrit.wikimedia.org/r/455812 (https://phabricator.wikimedia.org/T202952) (owner: Faidon Liambotis)
[16:03:01] (PS6) Vgutierrez: [WIP] Validate challenges before pushing them to the ACME directory [software/certcentral] - https://gerrit.wikimedia.org/r/455159 (https://phabricator.wikimedia.org/T199711)
[16:04:21] Operations, ops-eqiad, Discovery, Wikidata, and 2 others: add SSDs to wdqs100[45] - https://phabricator.wikimedia.org/T202779 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['wdqs1004.eqiad.wmnet'] ``` and were **ALL** successful.
[16:04:26] (CR) jerkins-bot: [V: -1] [WIP] Validate challenges before pushing them to the ACME directory [software/certcentral] - https://gerrit.wikimedia.org/r/455159 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[16:05:26] PROBLEM - puppet last run on labvirt1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:06:48] (CR) Thcipriani: [C: 1] "> Setting aside the two repos for a moment and to Tyler's earlier" [puppet] - https://gerrit.wikimedia.org/r/455812 (https://phabricator.wikimedia.org/T202952) (owner: Faidon Liambotis)
[16:10:26] RECOVERY - puppet last run on labvirt1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:11:12] Reedy: apache changes again during puppet swat? :P
[16:11:35] elukey: :D
[16:11:35] jokes aside, I'd ask to Moritz to review https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/445603/ before merging
[16:11:50] He's CC'd
[16:12:50] (I just added him)
[16:15:21] Operations, Wikimedia-Mailing-lists: Password reset request for wikimedia-nd mailing list - https://phabricator.wikimedia.org/T202247 (Geekdidi) Open>Resolved
[16:15:39] Operations, Wikimedia-Mailing-lists: Password reset request for wikimedia-nd mailing list - https://phabricator.wikimedia.org/T202247 (Geekdidi) Done
[16:17:23] Reedy: I cannot help now for the apache changes (and I'd also need to review/test them a bit), so if anybody picks them up later on the new domain might get out very soon, otherwise I'll try to help tomorrow morning EU time if it is ok
[16:18:05] Sure. It's gonna need doing one way or the other ASAP :)
[16:18:33] yep yep I saw the timings in the task
[16:22:21] Reedy: also https://puppet-compiler.wmflabs.org/compiler02/12268/mw1269.eqiad.wmnet/change.mw1269.eqiad.wmnet.err
[16:22:27] (CR) Herron: [C: -1] "What are some examples of highly undesirable divergence?" [puppet] - https://gerrit.wikimedia.org/r/439774 (owner: Alex Monk)
[16:22:47] (a somehow related change that we'll probably merge next week https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/452323/)
[16:23:03] the template is missing for the new domain
[16:23:15] err vhost sorry
[16:25:30] !log starting branch cut for 1.32.0-wmf.19
[16:25:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:59] (CR) Alex Monk: "The inability to configure hosts in labs to route wiki mail like prod. Realm branches in templates are much better than separate files for" [puppet] - https://gerrit.wikimedia.org/r/439774 (owner: Alex Monk)
[16:29:18] Operations, ops-eqiad: Degraded RAID on sodium - https://phabricator.wikimedia.org/T202705 (Cmjohnson) created a self dispatch ticket with Dell. You have successfully submitted request WO10392987.
[16:30:22] So removing qsynth will be in a future patch?
[16:33:43] Operations, ops-eqiad: Decommisson and store old row D network gear. - https://phabricator.wikimedia.org/T170474 (Cmjohnson) @ayounsi Removed all the old network gear from row D. The ex4200's are going to be decommissioned. Do we want to hold on to the 43 EX4550's?
[16:33:47] But I see that timidity-daemon is not there anymore either. Restructuring?
[16:35:23] Operations, ops-eqiad, decommission, Patch-For-Review, cloud-services-team (Kanban): Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T184832 (Cmjohnson)
[16:35:32] Operations, DBA, decommission, Goal: reclaim and return all cisco servers - https://phabricator.wikimedia.org/T128821 (Cmjohnson)
[16:35:38] Operations, ops-eqiad, decommission, Patch-For-Review, cloud-services-team (Kanban): Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T184832 (Cmjohnson) Open>Resolved
[16:35:47] (CR) Elukey: [C: -1] "https://puppet-compiler.wmflabs.org/compiler02/12268/mw1269.eqiad.wmnet/change.mw1269.eqiad.wmnet.err :(" [puppet] - https://gerrit.wikimedia.org/r/455369 (https://phabricator.wikimedia.org/T202819) (owner: Reedy)
[16:38:15] (PS2) DCausse: [cirrus] Increase number of shards for wikidata content and commons file [mediawiki-config] - https://gerrit.wikimedia.org/r/454588
[16:38:46] (PS1) Ottomata: Remove now unused turnilo, superset, hue, yarn puppetization from thorium [puppet] - https://gerrit.wikimedia.org/r/455864 (https://phabricator.wikimedia.org/T202011)
[16:39:19] (CR) Ottomata: "Will merge this and clean up thorium if all goes well with migration of sites" [puppet] - https://gerrit.wikimedia.org/r/455864 (https://phabricator.wikimedia.org/T202011) (owner: Ottomata)
[16:42:57] Operations, DBA, decommission, Goal: reclaim and return all cisco servers - https://phabricator.wikimedia.org/T128821 (Cmjohnson) a:RobH All CISCO servers are off the rack and fully decommissioned. I do not have any clue on what to do next with them but would love to get them out of the stora...
[16:43:15] Operations, ops-eqiad: Decommisson and store old row D network gear. - https://phabricator.wikimedia.org/T170474 (ayounsi) Some ex4200 need to be kept to replace the OOB (mgmt) switches (as many as we have OOB switches (+ a couple spares). The EX4550's are not needed. Is 43 a typo? that sounds like a lot.
[16:44:06] (CR) 20after4: [C: 1] Gerrit: Clone avatars repo into /var/www/avatars [puppet] - https://gerrit.wikimedia.org/r/440104 (https://phabricator.wikimedia.org/T191183) (owner: Paladox)
[16:45:39] (CR) Alex Monk: "It's also probably worth noting that comments and reviews like these work against the goal of having beta more closely mirror production. " [puppet] - https://gerrit.wikimedia.org/r/439774 (owner: Alex Monk)
[16:46:45] (CR) 20after4: [C: 1] Gerrit: Add support for avatars url in apache [puppet] - https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) (owner: Paladox)
[16:55:07] Operations, Traffic: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717 (Krenair) ```alex@alex-laptop:~$ openssl s_client -starttls smtp -connect mx1001.wikimedia.org:25 2>/dev/null | openssl x509 -noout -text | grep Issuer: Issuer: C = US, O = Let's Enc...
[16:56:16] Operations, decommission, Goal: reclaim and return all cisco servers - https://phabricator.wikimedia.org/T128821 (Marostegui)
[16:56:44] Operations, Mail, Patch-For-Review: mail.wikimedia.org SSL cert expiring Mon 23 Oct 2017 - https://phabricator.wikimedia.org/T174081 (Krenair)
[16:56:49] Operations, Patch-For-Review: letsencrypt::cert::integrated and non-http servers - https://phabricator.wikimedia.org/T174720 (Krenair) Open>Resolved a:herron Looks like @herron resolved this last year
[16:57:33] (PS1) Dduvall: group0 to 1.32.0-wmf.19 [mediawiki-config] - https://gerrit.wikimedia.org/r/455866
[16:57:38] Operations, Traffic: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717 (Krenair) And actually that's the whole list on this ticket. Anything else missing @bblack or can this be closed?
[16:57:42] (CR) Ottomata: "https://puppet-compiler.wmflabs.org/compiler02/12269/thorium.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/455864 (https://phabricator.wikimedia.org/T202011) (owner: Ottomata)
[16:59:38] Operations, ops-eqiad, Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban): rack/setup/install cloudstore1008 & cloudstore1009 - https://phabricator.wikimedia.org/T193655 (Cmjohnson) Both cloudstores are updated ge-0/0/14 up up cloudstore1008 ge-6/0/17 up up clou...
[17:00:05] cscott, arlolra, subbu, halfak, and Amir1: Your horoscope predicts another unfortunate Services – Graphoid / Parsoid / Citoid / ORES deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180828T1700).
[17:00:32] Puppet, Cloud-Services: Migrate references from $instance.eqiad.wmflabs to $instance.$project.eqiad.wmflabs - https://phabricator.wikimedia.org/T153608 (Krenair)
[17:02:23] I guess https://gerrit.wikimedia.org/r/c/operations/puppet/+/445603 will go for the next Puppet SWAT
[17:03:08] (PS1) Smalyshev: Fix daily loader to cd do proper dir [puppet] - https://gerrit.wikimedia.org/r/455867
[17:03:24] Operations, Beta-Cluster-Infrastructure, Patch-For-Review: Mails through deployment-mx SPF & DKIM fails - https://phabricator.wikimedia.org/T87338 (Krenair) And actually something I heard the other day made it sound like we may not be able to do that by default with our neutron setup either.
[17:05:36] !log dduvall@deploy1001 Started scap: testwiki to php-1.32.0-wmf.19 and rebuild l10n cache
[17:05:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:06:48] Operations, Traffic: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717 (BBlack) Yeah I think this is closeable. This was just our initial "convert all the low-hanging fruit" ticket for the previous iteration of LE support.
[17:07:21] Operations, Beta-Cluster-Infrastructure, Patch-For-Review: Mails through deployment-mx SPF & DKIM fails - https://phabricator.wikimedia.org/T87338 (Krenair) Open>Resolved a:Krenair Anyway although we may struggle to make beta behaviour exactly mirror prod, the initial purpose of this tick...
[17:11:29] Operations, Traffic: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717 (Krenair)
[17:12:09] Operations, SRE-Access-Requests, wikidiff2, Patch-For-Review, User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (Legoktm) >>! In T202475#4538527, @WMDE-Fisch wrote: >>>! In T202475#4537876, @ArielGlenn wrot...
[17:12:14] Operations, Traffic: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717 (Krenair) Open>Resolved Yep cool
[17:18:55] (CR) Muehlenhoff: [C: 1] "Looks fine. Will the mediawiki side of this use the recently introduced wrapper function for executing binaries in firejail?" [puppet] - https://gerrit.wikimedia.org/r/445603 (https://phabricator.wikimedia.org/T184598) (owner: Reedy)
[17:22:55] PROBLEM - Filesystem available is greater than filesystem size on ms-be2043 is CRITICAL: cluster=swift device=/dev/sdh1 fstype=xfs instance=ms-be2043:9100 job=node mountpoint=/srv/swift-storage/sdh1 site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2043&var-datasource=codfw%2520prometheus%252Fops
[17:28:01] (PS1) Arturo Borrero Gonzalez: cloudvps: labtestn: fix bridge mapping [puppet] - https://gerrit.wikimedia.org/r/455873 (https://phabricator.wikimedia.org/T202636)
[17:28:53] (CR) Arturo Borrero Gonzalez: [C: 2] cloudvps: labtestn: fix bridge mapping [puppet] - https://gerrit.wikimedia.org/r/455873 (https://phabricator.wikimedia.org/T202636) (owner: Arturo Borrero Gonzalez)
[17:44:05] (CR) Dzahn: "has this change been tested on one of the VPS instances?" [puppet] - https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) (owner: Paladox)
[17:44:30] (PS1) Ottomata: Fix for password lookup function in superset_config.py [puppet] - https://gerrit.wikimedia.org/r/455876 (https://phabricator.wikimedia.org/T201430)
[17:46:44] Operations, SRE-Access-Requests, wikidiff2, User-Addshore: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202476 (RStallman-legalteam) The NDA is now signed and on file with legal.
[17:48:25] Operations, SRE-Access-Requests, wikidiff2, User-Addshore: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202476 (ArielGlenn)
[17:50:14] (PS2) Ottomata: Fix for password lookup function in superset_config.py [puppet] - https://gerrit.wikimedia.org/r/455876 (https://phabricator.wikimedia.org/T201430)
[17:50:17] (CR) Ottomata: [V: 2 C: 2] Fix for password lookup function in superset_config.py [puppet] - https://gerrit.wikimedia.org/r/455876 (https://phabricator.wikimedia.org/T201430) (owner: Ottomata)
[17:50:47] Operations, SRE-Access-Requests, wikidiff2, User-Addshore: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202476 (ArielGlenn)
[17:51:50] !log arlolra@deploy1001 Started deploy [parsoid/deploy@d29d829]: Updating Parsoid to 61086f6
[17:51:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:52:36] !log rebooting multatuli for microcode tests
[17:52:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:55:05] (PS3) Volans: mediawiki: add siteinfo-related methods [software/spicerack] - https://gerrit.wikimedia.org/r/455851 (https://phabricator.wikimedia.org/T199079)
[17:55:57] Operations, SRE-Access-Requests, wikidiff2, User-Addshore: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202476 (ArielGlenn)
[17:56:13] (CR) Volans: "replies inline" (4 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/455851 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[17:57:34] Operations, Data-Services, SRE-Access-Requests, Patch-For-Review: Access to dumps servers - https://phabricator.wikimedia.org/T201350 (Dzahn) So.. just to understand for myself.. the sitemaps.wikimedia.org site is hosted on misc_static_sites and not dumps servers or these labs servers now besides...
[17:59:47] (PS1) Volans: home: update my own .gitconfig [puppet] - https://gerrit.wikimedia.org/r/455879
[18:00:04] Niharika: Dear deployers, time to do the deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180828T1800).
[18:00:10] Operations, Data-Services, SRE-Access-Requests, Patch-For-Review: Access to dumps servers - https://phabricator.wikimedia.org/T201350 (Imarlier) @Dzahn sitemaps.wikimedia.org is hosted on misc_static_sites. Dumps servers would not work because the production front-end Varnish servers need to be...
[18:00:35] (CR) Volans: [C: 2] home: update my own .gitconfig [puppet] - https://gerrit.wikimedia.org/r/455879 (owner: Volans)
[18:00:59] !log arlolra@deploy1001 Finished deploy [parsoid/deploy@d29d829]: Updating Parsoid to 61086f6 (duration: 09m 10s)
[18:01:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:04:07] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/455812 (https://phabricator.wikimedia.org/T202952) (owner: Faidon Liambotis)
[18:04:44] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/455818 (owner: Faidon Liambotis)
[18:04:56] James_F o/ I'm doing the TW beta cluster deployment now. Do you have a moment to see if the patches in https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180828T1800 are in the right order?
[18:04:59] (PS1) ArielGlenn: add thiemowmde to shell users [puppet] - https://gerrit.wikimedia.org/r/455882 (https://phabricator.wikimedia.org/T202476)
[18:05:06] And anything else I might be missing? :)
[18:05:09] Niharika: Looking.
[18:06:28] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/455858 (owner: Faidon Liambotis)
[18:07:05] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/455820 (owner: Faidon Liambotis)
[18:08:56] (PS20) Bstorm: WIP toolforge: write/move a sonofgridengine module and toolforge profile [puppet] - https://gerrit.wikimedia.org/r/448791 (https://phabricator.wikimedia.org/T200557)
[18:09:43] (CR) jerkins-bot: [V: -1] WIP toolforge: write/move a sonofgridengine module and toolforge profile [puppet] - https://gerrit.wikimedia.org/r/448791 (https://phabricator.wikimedia.org/T200557) (owner: Bstorm)
[18:11:06] (Restored) Niharika29: Deploy TemplateWizard to beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/455722 (owner: Niharika29)
[18:11:11] !log dduvall@deploy1001 Finished scap: testwiki to php-1.32.0-wmf.19 and rebuild l10n cache (duration: 65m 35s)
[18:11:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:34] (PS2) Niharika29: Deploy TemplateWizard to beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/455722
[18:11:35] !log Updated Parsoid to 61086f6 (T199246, T198221)
[18:11:36] (CR) Alex Monk: "I'm adding Bryan who wrote that library as a reviewer. Bryan, what do you think? Might it be easiest to just make a file just containing y" [puppet] - https://gerrit.wikimedia.org/r/455277 (https://phabricator.wikimedia.org/T48254) (owner: Alex Monk)
[18:11:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:41] T198221: Support dir for refs for in Parsoid - https://phabricator.wikimedia.org/T198221
[18:11:42] T199246: Parsoid should expose content inside to editors instead of hiding it in the meta tag's data-parsoid attribute - https://phabricator.wikimedia.org/T199246
[18:12:34] (PS3) Niharika29: Deploy TemplateWizard to beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/455722
[18:12:40] (PS4) Niharika29: Deploy TemplateWizard to beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/455722
[18:13:13] (CR) Dzahn: [C: 1] "UID matches, key matches the one on ticket. should be added to the new group for releasers-wikidiff2 in the next patch" [puppet] - https://gerrit.wikimedia.org/r/455882 (https://phabricator.wikimedia.org/T202476) (owner: ArielGlenn)
[18:14:01] James_F: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/455722/ Looks okay?
[18:14:15] (CR) ArielGlenn: [C: 2] add thiemowmde to shell users [puppet] - https://gerrit.wikimedia.org/r/455882 (https://phabricator.wikimedia.org/T202476) (owner: ArielGlenn)
[18:14:22] (PS2) ArielGlenn: add thiemowmde to shell users [puppet] - https://gerrit.wikimedia.org/r/455882 (https://phabricator.wikimedia.org/T202476)
[18:15:06] Operations, Data-Services, SRE-Access-Requests, Patch-For-Review: Access to dumps servers - https://phabricator.wikimedia.org/T201350 (Dzahn) > any other lab servers due to network security policies Ok thanks, honestly i don't understand how it is/was related to labs at all. I do understand th...
[18:15:28] Operations, Data-Services, SRE-Access-Requests, Patch-For-Review: Access to dumps servers - https://phabricator.wikimedia.org/T201350 (Dzahn) So technically this access here is not needed and maybe should be reverted for good measure if it's not used?
[18:16:03] (CR) Jforrester: [C: 1] Deploy TemplateWizard to beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/455722 (owner: Niharika29)
[18:18:09] (PS1) ArielGlenn: add thiemowmde to releasers-wikidiff2 [puppet] - https://gerrit.wikimedia.org/r/455885 (https://phabricator.wikimedia.org/T202476)
[18:18:53] Operations, SRE-Access-Requests, Performance-Team (Radar): add perf-team admins to releases servers (was: webserver misc static servers) - https://phabricator.wikimedia.org/T202910 (Dzahn)
[18:19:08] (CR) Kaldari: [C: 2] Add TemplateWizard extension [mediawiki-config] - https://gerrit.wikimedia.org/r/455598 (owner: Niharika29)
[18:19:56] (CR) ArielGlenn: [C: 2] add thiemowmde to releasers-wikidiff2 [puppet] - https://gerrit.wikimedia.org/r/455885 (https://phabricator.wikimedia.org/T202476) (owner: ArielGlenn)
[18:20:06] testwiki is loading very slowly. (~40sec for main page) https://test.wikipedia.org/ No problems at test2 or mw.o
[18:20:06] (CR) Kaldari: [C: 2] Deploy TemplateWizard to beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/455722 (owner: Niharika29)
[18:20:25] (Merged) jenkins-bot: Add TemplateWizard extension [mediawiki-config] - https://gerrit.wikimedia.org/r/455598 (owner: Niharika29)
[18:20:33] (CR) Kaldari: [C: 2] Add TemplateWizard extension [mediawiki-config] - https://gerrit.wikimedia.org/r/455725 (owner: Niharika29)
[18:21:06] (CR) Kaldari: [C: 2] Add TemplateWizard extension [mediawiki-config] - https://gerrit.wikimedia.org/r/455601 (owner: Niharika29)
[18:21:36] (Merged) jenkins-bot: Deploy TemplateWizard to beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/455722 (owner: Niharika29)
[18:21:56] Operations, SRE-Access-Requests, wikidiff2, Patch-For-Review, User-Addshore: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202476 (ArielGlenn) In about 30 minutes this change should be live. After that, we will just be waiti...
[18:22:07] quiddity: 1.32.0-wmf.19 was deployed to testwiki early. the slowness may just be due to cold hhvm bytecode caches, but i'll take a closer look before group0 deployment
[18:22:25] nod. ty :)
[18:23:07] (Merged) jenkins-bot: Add TemplateWizard extension [mediawiki-config] - https://gerrit.wikimedia.org/r/455601 (owner: Niharika29)
[18:25:27] (PS2) Kaldari: Add TemplateWizard extension [mediawiki-config] - https://gerrit.wikimedia.org/r/455725 (owner: Niharika29)
[18:25:50] Operations, DNS, Traffic, Mobile, Patch-For-Review: Many misc wikis lack mobile domains - https://phabricator.wikimedia.org/T152882 (Dzahn) Meanwhile new wikis have been created without "m" like: wikimania.wikimedia.org has address 198.35.26.96 Host wikimania.m.wikimedia.org not found: 3(NXD...
[18:26:03] (CR) Kaldari: [V: 2 C: 2] Add TemplateWizard extension [mediawiki-config] - https://gerrit.wikimedia.org/r/455725 (owner: Niharika29)
[18:27:18] (Merged) jenkins-bot: Add TemplateWizard extension [mediawiki-config] - https://gerrit.wikimedia.org/r/455725 (owner: Niharika29)
[18:31:18] (PS1) Dzahn: add missing mobile domain for wikimania.wikimedia [dns] - https://gerrit.wikimedia.org/r/455886 (https://phabricator.wikimedia.org/T152882)
[18:31:31] (CR) jenkins-bot: Add TemplateWizard extension [mediawiki-config] - https://gerrit.wikimedia.org/r/455598 (owner: Niharika29)
[18:31:33] (CR) jenkins-bot: Deploy TemplateWizard to beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/455722 (owner: Niharika29)
[18:31:35] (CR) jenkins-bot: Add TemplateWizard extension [mediawiki-config] - https://gerrit.wikimedia.org/r/455601 (owner: Niharika29)
[18:31:37] (CR) jenkins-bot: Add TemplateWizard extension [mediawiki-config] - https://gerrit.wikimedia.org/r/455725 (owner: Niharika29)
[18:31:46] !log kaldari@deploy1001 Synchronized wmf-config/extension-list: (no justification provided) (duration: 00m 57s)
[18:31:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:33:31] (PS21) Bstorm: WIP toolforge: write/move a sonofgridengine module and toolforge profile [puppet] - https://gerrit.wikimedia.org/r/448791 (https://phabricator.wikimedia.org/T200557)
[18:33:41] Operations, SRE-Access-Requests: Please add everyone on the performance team to perf-roots - https://phabricator.wikimedia.org/T202648 (aaron)
[18:33:43] Operations, SRE-Access-Requests, Patch-For-Review: Please add aaron to perf-team - https://phabricator.wikimedia.org/T202650 (aaron) Open>Resolved Confirmed.
[18:33:57] !log kaldari@deploy1001 Synchronized wmf-config/extension-list: (no justification provided) (duration: 00m 55s)
[18:34:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:34:06] quiddity: i'm seeing a hefty "onload" performance regression across the board at around 05:36 UTC https://grafana.wikimedia.org/dashboard/db/performance-metrics?orgId=1&from=1535410859299&to=1535449642942
[18:34:12] (CR) jerkins-bot: [V: -1] WIP toolforge: write/move a sonofgridengine module and toolforge profile [puppet] - https://gerrit.wikimedia.org/r/448791 (https://phabricator.wikimedia.org/T200557) (owner: Bstorm)
[18:34:54] Operations, ops-eqiad, Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (RobH) >>! In T199125#4539328, @Cmjohnson wrote: > I took the raid controller offer to see what I could do to connect a disk to the...
[18:35:08] !log kaldari@deploy1001 Synchronized wmf-config/InitialiseSettings.php: for TemplateWizard (duration: 00m 55s)
[18:35:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:35:51] ^ Krinkle are you aware of any perf regression?
[18:36:12] fwiw, it falls within the services deployment window
[18:36:16] !log kaldari@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: for TemplateWizard (duration: 00m 55s)
[18:36:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:36:59] (CR) Dzahn: [C: 2] swift: Fix checks on drive/filesystem titles to allow for labs ones [puppet] - https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: Alex Monk)
[18:37:20] marxarelli: all fine.
[18:37:24] marxarelli: but thanks for noticing.
[18:37:31] !log kaldari@deploy1001 Synchronized wmf-config/CommonSettings.php: for TemplateWizard (duration: 00m 55s)
[18:37:32] I can only reproduce (logged-in/out, firefox/chromium) on testwiki. Though it seems to be speeding up now.
[18:37:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:37:47] Operations, hardware-requests: Request for swift ms-be expansion - https://phabricator.wikimedia.org/T201937 (RobH) So we're currently investigating an issue with the Perc H740P controller on T199125. Once we know what is up with that, we'll be able to quote this out.
[18:37:49] Operations, hardware-requests: Request for swift ms-be refresh - https://phabricator.wikimedia.org/T201938 (RobH) So we're currently investigating an issue with the Perc H740P controller on T199125. Once we know what is up with that, we'll be able to quote this out.
[18:37:51] Krinkle: cool, thanks
[18:38:03] marxarelli: last weeks' branch made a huge improvement in the load metric by accidentally deferring our JS payload to after the metric tool sees the load event.
[18:38:14] marxarelli: last night, I rolled out a fix for that.
[18:38:37] it should still be a bit better than last week, but not as much as it appeared.
[18:38:49] I'll investigate and report on it more at a later time.
[18:39:14] sounds good
[18:39:22] (PS1) Superyetkin: Set $wgCategoryCollation = uca-az on azwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/455887 (https://phabricator.wikimedia.org/T201770)
[18:39:36] (PS1) Papaul: DHCP: Change MAC address to embedded NIC 1 for Moritzm to test 10GB drivers [puppet] - https://gerrit.wikimedia.org/r/455888
[18:40:00] quiddity: i'm seeing improved load times on testwiki as well after running a few dozen curl commands against Special:Random, so i suspect it was just cold bytecode caches
[18:40:33] (CR) jerkins-bot: [V: -1] DHCP: Change MAC address to embedded NIC 1 for Moritzm to test 10GB drivers [puppet] - https://gerrit.wikimedia.org/r/455888 (owner: Papaul)
[18:41:00] nod. TIL that was a thing. :) (I always warm my car up for ~30 seconds before moving, so I will think of it like that... >.> )
[18:41:31] (PS1) Andrew Bogott: region-migrate: migrate DNS for floating IPs [puppet] - https://gerrit.wikimedia.org/r/455889 (https://phabricator.wikimedia.org/T191790)
[18:41:51] haha, not completely dissimilar
[18:42:30] (PS3) Dzahn: swift: Fix checks on drive/filesystem titles to allow for labs ones [puppet] - https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: Alex Monk)
[18:42:48] (CR) Dzahn: "manual rebase to detach it from Change-Id: I90632b779ec2716128 so this can be merged without the parent" [puppet] - https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: Alex Monk)
[18:43:05] (CR) jerkins-bot: [V: -1] swift: Fix checks on drive/filesystem titles to allow for labs ones [puppet] - https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: Alex Monk)
[18:43:07] (CR) Andrew Bogott: [C: 2] region-migrate: migrate DNS for floating IPs [puppet] - https://gerrit.wikimedia.org/r/455889 (https://phabricator.wikimedia.org/T191790) (owner: Andrew Bogott)
[18:44:15] (PS4) Ppchelko: Replace the semver patch version in Accept with x [puppet] - https://gerrit.wikimedia.org/r/455036 (https://phabricator.wikimedia.org/T202682)
[18:44:17] (CR) Dzahn: [C: -1] "no that was wrong as well.. too much changed meanwhile" [puppet] - https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: Alex Monk)
[18:47:04] (PS4) Dzahn: swift: Fix checks on drive/filesystem titles to allow for labs ones [puppet] - https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: Alex Monk)
[18:48:26] (CR) Dzahn: "ok, so after another rebase we are left with just the change in init_device.pp but not the one in mount_filesystem.pp, if you look at moun" [puppet] - https://gerrit.wikimedia.org/r/402758 (https://phabricator.wikimedia.org/T184236) (owner: Alex Monk)
[18:48:50] (PS22) Bstorm: WIP toolforge: write/move a sonofgridengine module and toolforge profile [puppet] - https://gerrit.wikimedia.org/r/448791 (https://phabricator.wikimedia.org/T200557)
[18:50:04] Operations, SRE-Access-Requests: Access to restbase servers (including sudo) for Imarlier - https://phabricator.wikimedia.org/T202563 (VColeman) approved
[18:50:34] Operations, SRE-Access-Requests: request to add imarlier to perf-roots - https://phabricator.wikimedia.org/T202657 (VColeman) approved
[18:50:44] (PS4) Dzahn: netbox: move IP addresses from class to Hiera [puppet] - https://gerrit.wikimedia.org/r/455273
[18:51:26] (CR) Dzahn: [C: 2] "reducing hardcoded IP addresses in puppet classes" [puppet] - https://gerrit.wikimedia.org/r/455273 (owner: Dzahn)
[18:51:59] (PS1) Nuria: Deploy wikistats for master branch [puppet] - https://gerrit.wikimedia.org/r/455892 (https://phabricator.wikimedia.org/T203017)
[18:52:17] (PS2) Nuria: Deploy wikistats for master branch [puppet] - https://gerrit.wikimedia.org/r/455892 (https://phabricator.wikimedia.org/T203017)
[18:54:03] (PS3) Nuria: Deploy wikistats from master branch [puppet] - https://gerrit.wikimedia.org/r/455892 (https://phabricator.wikimedia.org/T203017)
[18:58:34] (CR) Ppchelko: "Done. Thank you :)" [puppet] - https://gerrit.wikimedia.org/r/455036 (https://phabricator.wikimedia.org/T202682) (owner: Ppchelko)
[19:00:04] marxarelli: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for MediaWiki train - Americas version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180828T1900).
[19:02:41] (PS5) Ppchelko: Replace the semver patch version in Accept with x [puppet] - https://gerrit.wikimedia.org/r/455036 (https://phabricator.wikimedia.org/T202682)
[19:02:47] (PS2) Dzahn: DHCP: Change MAC address to embedded NIC 1 for Moritzm to test 10GB drivers [puppet] - https://gerrit.wikimedia.org/r/455888 (https://phabricator.wikimedia.org/T196477) (owner: Papaul)
[19:02:57] (PS3) Dzahn: DHCP: Change MAC address to embedded NIC 1 for Moritzm to test 10GB drivers [puppet] - https://gerrit.wikimedia.org/r/455888 (https://phabricator.wikimedia.org/T196477) (owner: Papaul)
[19:03:08] (CR) Dzahn: [C: 2] "fixed: Line 3: Expected one space after 'Bug:'" [puppet] - https://gerrit.wikimedia.org/r/455888 (https://phabricator.wikimedia.org/T196477) (owner: Papaul)
[19:03:22] (CR) Ppchelko: "Also added the possibility to omit the `/"` in the end of the profile and added a test for that" [puppet] - https://gerrit.wikimedia.org/r/455036 (https://phabricator.wikimedia.org/T202682) (owner: Ppchelko)
[19:05:22] (CR) Dduvall: [C: 2] group0 to 1.32.0-wmf.19 [mediawiki-config] - https://gerrit.wikimedia.org/r/455866 (owner: Dduvall)
[19:06:56] (Merged) jenkins-bot: group0 to 1.32.0-wmf.19 [mediawiki-config] - https://gerrit.wikimedia.org/r/455866 (owner: Dduvall)
[19:08:09] Operations, ops-codfw, Patch-For-Review: rack/setup/install backup2001 - https://phabricator.wikimedia.org/T196477 (Dzahn) MAC in DHCP has changed to embedded NIC 1 and puppet ran on install2002 to update config.
[19:08:31] (PS2) Gehel: Fix daily loader to cd do proper dir [puppet] - https://gerrit.wikimedia.org/r/455867 (owner: Smalyshev)
[19:09:24] (CR) Gehel: [C: 2] Fix daily loader to cd do proper dir [puppet] - https://gerrit.wikimedia.org/r/455867 (owner: Smalyshev)
[19:10:57] (CR) Dzahn: "do you remember why this is stalled?" [puppet] - https://gerrit.wikimedia.org/r/439504 (https://phabricator.wikimedia.org/T196835) (owner: Paladox)
[19:11:32] (CR) Dzahn: "oh yea, of course it needs Change-Id: Ib2061118279b606 nevermind my comment" [puppet] - https://gerrit.wikimedia.org/r/439504 (https://phabricator.wikimedia.org/T196835) (owner: Paladox)
[19:12:19] mutante: lots of puppet action going on. anything that i should be aware for train deployment? i'm about to switch group0 over
[19:12:57] (CR) Dzahn: "former comments are still valid. once there is a date that has been decided it will be unstalled" [puppet] - https://gerrit.wikimedia.org/r/439444 (https://phabricator.wikimedia.org/T196812) (owner: Paladox)
[19:13:38] marxarelli: not for anything i am merging at least.. no.. all unrelated to mediawiki
[19:13:56] right on. ty
[19:14:02] rolling group0 then...
[19:14:09] no worries, also i'll stop merging for now and get lunch :)
[19:14:18] :)
[19:15:11] !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: group0 to 1.32.0-wmf.19
[19:15:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:16:28] (CR) MarcoAurelio: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/455887 (https://phabricator.wikimedia.org/T201770) (owner: Superyetkin)
[19:16:30] (CR) Herron: [C: -1] "> The inability to configure hosts in labs to route wiki mail like" (6 comments) [puppet] - https://gerrit.wikimedia.org/r/439774 (owner: Alex Monk)
[19:17:03] (CR) jerkins-bot: [V: -1] Set $wgCategoryCollation = uca-az on azwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/455887 (https://phabricator.wikimedia.org/T201770) (owner: Superyetkin)
[19:18:08] (CR) jenkins-bot: group0 to 1.32.0-wmf.19 [mediawiki-config] - https://gerrit.wikimedia.org/r/455866 (owner: Dduvall)
[19:18:37] (CR) Zoranzoki21: [C: 1] "With code is everything ok. Only composer making problems. -1 by Jenkins is not related to code." [mediawiki-config] - https://gerrit.wikimedia.org/r/455887 (https://phabricator.wikimedia.org/T201770) (owner: Superyetkin)
[19:19:19] (PS1) Bstorm: Revert "dumps: give access to perf-team" [puppet] - https://gerrit.wikimedia.org/r/455902
[19:19:26] (CR) Jforrester: [C: 1] add missing mobile domain for wikimania.wikimedia [dns] - https://gerrit.wikimedia.org/r/455886 (https://phabricator.wikimedia.org/T152882) (owner: Dzahn)
[19:19:34] (PS2) Bstorm: Revert "dumps: give access to perf-team" [puppet] - https://gerrit.wikimedia.org/r/455902
[19:20:40] Operations, Data-Services, SRE-Access-Requests, Patch-For-Review: Access to dumps servers - https://phabricator.wikimedia.org/T201350 (Bstorm) I think that's a good idea, personally. I'll roll it on back. https://gerrit.wikimedia.org/r/c/operations/puppet/+/455902 Unless we have any objections...
[19:23:33] (CR) Gehel: [C: 1] mediawiki: add siteinfo-related methods (4 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/455851 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[19:26:00] !log temporarily disabling puppet agents for rolling kernel updates/reboots on puppetmaster hosts
[19:26:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:26:58] Operations, SRE-Access-Requests: request to add imarlier to perf-roots - https://phabricator.wikimedia.org/T202657 (ArielGlenn)
[19:27:31] Operations, SRE-Access-Requests: Access to restbase servers (including sudo) for Imarlier - https://phabricator.wikimedia.org/T202563 (ArielGlenn)
[19:30:09] (PS1) ArielGlenn: add imarlier to perf-roots [puppet] - https://gerrit.wikimedia.org/r/455904 (https://phabricator.wikimedia.org/T202657)
[19:32:18] (CR) ArielGlenn: [C: 2] add imarlier to perf-roots [puppet] - https://gerrit.wikimedia.org/r/455904 (https://phabricator.wikimedia.org/T202657) (owner: ArielGlenn)
[19:32:55] PROBLEM - Host rhodium is DOWN: PING CRITICAL - Packet loss = 100%
[19:34:09] rhodium taking longer than it should to reboot… looking
[19:34:17] ah that would be why rhodium puppet-merge failed. ugh
[19:34:25] RECOVERY - Host rhodium is UP: PING OK - Packet loss = 0%, RTA = 0.57 ms
[19:35:18] :(
[19:35:29] ran it from there, should all be ok now
[19:36:10] Operations, SRE-Access-Requests, Patch-For-Review: request to add imarlier to perf-roots - https://phabricator.wikimedia.org/T202657 (ArielGlenn)
[19:36:31] cool, sorry for the trouble. rhodium was the first host. rolling through the rest of eqiad now, then codfw
[19:36:56] no worries
[19:37:22] Operations, SRE-Access-Requests, Patch-For-Review: request to add imarlier to perf-roots - https://phabricator.wikimedia.org/T202657 (ArielGlenn) In 60 minutes or so this should be live, and after that if you could just check that access works as expected, this ticket can be closed.
[19:38:46] PROBLEM - Host puppetmaster1001 is DOWN: PING CRITICAL - Packet loss = 100%
[19:39:05] RECOVERY - Host puppetmaster1001 is UP: PING WARNING - Packet loss = 50%, RTA = 110.47 ms
[19:41:18] Operations, Beta-Cluster-Infrastructure, Jenkins, Patch-For-Review, Release-Engineering-Team (Kanban): Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (Krenair) >>! In T192561#4530411, @Dzahn wrote: > Production deployment servers don't have a...
[19:43:55] PROBLEM - Host puppetmaster2001 is DOWN: PING CRITICAL - Packet loss = 100%
[19:44:15] RECOVERY - Host puppetmaster2001 is UP: PING OK - Packet loss = 0%, RTA = 36.11 ms
[19:47:14] !log puppetmaster reboots finished, re-enabling puppet agents
[19:47:15] (CR) BryanDavis: ircecho: Add support for authenticating with SASL (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455277 (https://phabricator.wikimedia.org/T48254) (owner: Alex Monk)
[19:47:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:47:41] (PS4) Volans: mediawiki: add siteinfo-related methods [software/spicerack] - https://gerrit.wikimedia.org/r/455851 (https://phabricator.wikimedia.org/T199079)
[19:48:47] (CR) Volans: "done" (2 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/455851 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[19:49:10] (PS3) Dzahn: Add fixcopyright(\.m)?\.wikimedia\.org [dns] - https://gerrit.wikimedia.org/r/455368 (https://phabricator.wikimedia.org/T202819) (owner: Reedy)
[19:52:45] (PS1) Andrew Bogott: region-migrate: migrate web proxies to the new VM [puppet] - https://gerrit.wikimedia.org/r/455911 (https://phabricator.wikimedia.org/T191790)
[19:55:22] (CR) Andrew Bogott: [C: 2] region-migrate: migrate web proxies to the new VM [puppet] - https://gerrit.wikimedia.org/r/455911 (https://phabricator.wikimedia.org/T191790) (owner: Andrew Bogott)
[20:16:45] (CR) Smalyshev: [C: 1] "The script looks working now, so let's merge it." [puppet] - https://gerrit.wikimedia.org/r/455766 (owner: Smalyshev)
[20:16:47] (CR) Urbanecm: [C: 1] Set $wgCategoryCollation = uca-az on azwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/455887 (https://phabricator.wikimedia.org/T201770) (owner: Superyetkin)
[20:16:58] (CR) Urbanecm: [C: 1] "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/455887 (https://phabricator.wikimedia.org/T201770) (owner: Superyetkin)
[20:20:41] Puppet, Cloud-Services, Toolforge, Documentation: Document our GridEngine set up - https://phabricator.wikimedia.org/T88733 (srodlund) a:srodlund
[20:25:41] (CR) Legoktm: "This is causing fatals: 'TemplateWizard requires TemplateData to be installed.'" [mediawiki-config] - https://gerrit.wikimedia.org/r/455725 (owner: Niharika29)
[20:25:47] (CR) Urbanecm: [C: 1] "To SWATter: mwscript updateCollation.php --wiki=azwiki --previous-collation=uppercase should be run after deployment." [mediawiki-config] - https://gerrit.wikimedia.org/r/455887 (https://phabricator.wikimedia.org/T201770) (owner: Superyetkin)
[20:25:52] jouncebot: now
[20:25:56] jouncebot: next
[20:25:58] For the next 0 hour(s) and 34 minute(s): MediaWiki train - Americas version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180828T1900)
[20:25:59] In 2 hour(s) and 34 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180828T2300)
[20:26:53] marxarelli: are you done with the train? I need to deploy a mw-config patch to unbreak beta cluster
[20:28:21] legoktm: all done!
[20:29:53] (PS1) Legoktm: Only enable TemplateWizard if TemplateData is also enabled [mediawiki-config] - https://gerrit.wikimedia.org/r/456015
[20:30:06] PROBLEM - MariaDB Slave Lag: s8 on dbstore2001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.36 seconds
[20:30:10] (CR) Dzahn: [C: 2] Add fixcopyright(\.m)?\.wikimedia\.org [dns] - https://gerrit.wikimedia.org/r/455368 (https://phabricator.wikimedia.org/T202819) (owner: Reedy)
[20:30:29] (CR) Jforrester: [C: 1] Only enable TemplateWizard if TemplateData is also enabled [mediawiki-config] - https://gerrit.wikimedia.org/r/456015 (owner: Legoktm)
[20:30:41] (CR) Legoktm: [C: 2] Only enable TemplateWizard if TemplateData is also enabled [mediawiki-config] - https://gerrit.wikimedia.org/r/456015 (owner: Legoktm)
[20:31:03] (CR) Jforrester: Add TemplateWizard extension (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/455725 (owner: Niharika29)
[20:31:59] (Merged) jenkins-bot: Only enable TemplateWizard if TemplateData is also enabled [mediawiki-config] - https://gerrit.wikimedia.org/r/456015 (owner: Legoktm)
[20:33:37] !log legoktm@deploy1001 Synchronized wmf-config/CommonSettings.php: beta only: Only enable TemplateWizard if TemplateData is also enabled (duration: 00m 56s)
[20:33:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:34:28] (CR) jenkins-bot: Only enable TemplateWizard if TemplateData is also enabled [mediawiki-config] - https://gerrit.wikimedia.org/r/456015 (owner: Legoktm)
[20:35:29] Operations, Analytics, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345 (RobH) So we ordered spare systems on T195418. The price per unit is listed on that #procurement task. The specs are: Dual Intel Xeon Silver 4110 2.1G, 8C/16...
[20:36:24] (CR) Ema: "One small issue with the test!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455036 (https://phabricator.wikimedia.org/T202682) (owner: Ppchelko)
[20:37:30] Operations, Analytics, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345 (RobH)
[20:37:45] RECOVERY - MariaDB Slave Lag: s8 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 56.42 seconds
[20:39:36] (CR) Gehel: [C: 1] "LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/455851 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[20:43:14] (PS1) Ottomata: Install binary pyarrow package to /usr/lib/spark2/python on install [debs/spark2] (debian) - https://gerrit.wikimedia.org/r/456019 (https://phabricator.wikimedia.org/T202812)
[20:43:17] (PS6) Ppchelko: Replace the semver patch version in Accept with x [puppet] - https://gerrit.wikimedia.org/r/455036 (https://phabricator.wikimedia.org/T202682)
[20:43:32] (CR) Ppchelko: Replace the semver patch version in Accept with x (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455036 (https://phabricator.wikimedia.org/T202682) (owner: Ppchelko)
[20:58:20] (PS1) Ottomata: spark2 - Render a custom spark-env.sh that defaults to using python3 (and ipython3 for driver) [puppet] - https://gerrit.wikimedia.org/r/456020
[20:59:04] (CR) jerkins-bot: [V: -1] spark2 - Render a custom spark-env.sh that defaults to using python3 (and ipython3 for driver) [puppet] - https://gerrit.wikimedia.org/r/456020 (owner: Ottomata)
[21:01:24] (PS2) Ottomata: spark2 - custom spark-env.sh that defaults to using python3 (and ipython3) [puppet] - https://gerrit.wikimedia.org/r/456020
[21:03:02] Operations, Beta-Cluster-Infrastructure, Jenkins, Patch-For-Review, Release-Engineering-Team (Kanban): Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (Dzahn) Thanks, i see: ``` # Parsoid JavaScript dependencies are updated on beta via npm p...
[21:08:31] (PS4) Jforrester: Remove obsolete $wgPopupsBetaFeature, Part I: CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/450906 (owner: Prtksxna)
[21:08:33] (PS2) Jforrester: Remove obsolete $wgPopupsBetaFeature, Part II: InitialiseSettings-labs [mediawiki-config] - https://gerrit.wikimedia.org/r/452863
[21:08:35] (PS7) Jforrester: Remove obsolete $wgPopupsBetaFeature, Part III: InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/444574 (owner: Prtksxna)
[21:09:35] ACKNOWLEDGEMENT - High lag on wdqs1004 is CRITICAL: 9535 ge 3600 Gehel catching up on updates after reimage https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[21:10:56] PROBLEM - Disk space on elastic1024 is CRITICAL: DISK CRITICAL - free space: /srv 49592 MB (10% inode=99%)
[21:11:06] ACKNOWLEDGEMENT - High lag on wdqs1010 is CRITICAL: 7720 ge 3600 Gehel catching up on updates after data transfer to wdqs1004 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[21:19:46] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[21:20:15] (CR) Gehel: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/454588 (owner: DCausse)
[21:20:58] (CR) Jdlrobson: [C: 1] Remove obsolete $wgPopupsBetaFeature, Part III: InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/444574 (owner: Prtksxna)
[21:24:05] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[21:24:40] (Abandoned) Gehel: WIP - Install SonarQube server [puppet] - https://gerrit.wikimedia.org/r/306892 (owner: Racodond)
[21:27:34] Operations, Analytics, Analytics-Kanban, Patch-For-Review: Move internal sites hosted on thorium to ganeti instance(s) - https://phabricator.wikimedia.org/T202011 (Ottomata)
[21:30:16] PROBLEM - Disk space on elastic1024 is CRITICAL: DISK CRITICAL - free space: /srv 52606 MB (10% inode=99%)
[21:31:45] PROBLEM - Filesystem available is greater than filesystem size on ms-be2040 is CRITICAL: cluster=swift device=/dev/sdd1 fstype=xfs instance=ms-be2040:9100 job=node mountpoint=/srv/swift-storage/sdd1 site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2040&var-datasource=codfw%2520prometheus%252Fops
[21:32:36] ACKNOWLEDGEMENT - Disk space on elastic1024 is CRITICAL: DISK CRITICAL - free space: /srv 51948 MB (10% inode=99%): Gehel shards are being moved away from elastic1024
[21:37:56] RECOVERY - Disk space on elastic1024 is OK: DISK OK
[21:46:41] Operations, MediaWiki-Containers: Homepage for https://docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T179696 (colewhite) {T202504}
[21:50:46] PROBLEM - DPKG on stat1006 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[21:51:04] !log install nmap on stat1006 && chmod 500 /usr/bin/nmap
[21:51:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:57:33] (PS5) Alex Monk: ircecho: Add support for authenticating with SASL [puppet] - https://gerrit.wikimedia.org/r/455277 (https://phabricator.wikimedia.org/T48254)
[22:05:14] jouncebot: now
[22:05:14] No deployments scheduled for the next 0 hour(s) and 54 minute(s)
[22:05:50] OK if I deploy a patch for T203006 and T203029 ?
[22:05:51] T203029: Config:Dashiki:* on Meta can't be opened - https://phabricator.wikimedia.org/T203029
[22:05:51] T203006: PHP Fatal Error: Call to undefined method ZeroBanner\\ZeroConfig::getLicenseObject() - https://phabricator.wikimedia.org/T203006
[22:05:57] (/cc mdholloway .)
[22:08:20] * mdholloway looks up, waves
[22:08:28] I'll take silence as assent.
[22:17:35] Operations, SRE-Access-Requests, Release-Engineering-Team (Watching / External): Add contint-roots to releases{1,2}001 - https://phabricator.wikimedia.org/T201470 (thcipriani) >>! In T201470#4537642, @ArielGlenn wrote: > If it's not just about installing the package and restarting but also troublesho...
[22:36:33] !log jforrester@deploy1001 Synchronized php-1.32.0-wmf.19/extensions/JsonConfig/includes/JCSingleton.php: Hot-deploy T203006 fix (duration: 00m 57s)
[22:36:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:36:39] T203006: PHP Fatal Error: Call to undefined method ZeroBanner\\ZeroConfig::getLicenseObject() - https://phabricator.wikimedia.org/T203006
[22:37:37] !log jforrester@deploy1001 Synchronized php-1.32.0-wmf.18/extensions/JsonConfig/includes/JCSingleton.php: Hot-deploy T203006 fix (duration: 00m 56s)
[22:37:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:40:03] OK, well, it worked for T203006 but not T203029 it seems.
[22:40:04] T203029: Config:Dashiki:* on Meta can't be opened - https://phabricator.wikimedia.org/T203029
[22:40:07] I give up the conch.
[22:47:46] Operations, JADE, TechCom, Epic, and 3 others: Deploy JADE extension to production - https://phabricator.wikimedia.org/T183381 (awight) stalled>Open
[22:48:36] Operations, JADE, TechCom, Epic, and 3 others: Deploy JADE extension to production - https://phabricator.wikimedia.org/T183381 (awight) Unblocking as I have steps to take, then we'll resubmit for another round of technical and community review.
[22:50:38] Operations, JADE, TechCom, Epic, and 3 others: Deploy JADE extension to production - https://phabricator.wikimedia.org/T183381 (awight)
[22:55:01] (PS2) Dzahn: add missing mobile domain for wikimania.wikimedia [dns] - https://gerrit.wikimedia.org/r/455886 (https://phabricator.wikimedia.org/T152882)
[22:55:27] (CR) Dzahn: [C: 2] add missing mobile domain for wikimania.wikimedia [dns] - https://gerrit.wikimedia.org/r/455886 (https://phabricator.wikimedia.org/T152882) (owner: Dzahn)
[23:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Evening SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180828T2300).
[23:00:04] mdholloway: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:01:32] Operations, JADE, TechCom, Epic, and 3 others: Deploy JADE extension to production - https://phabricator.wikimedia.org/T183381 (awight)
[23:02:13] Operations, SRE-Access-Requests, Release-Engineering-Team (Watching / External): Add contint-roots to releases{1,2}001 - https://phabricator.wikimedia.org/T201470 (thcipriani) If there are security concerns about adding contint-roots to the releases machines, it might be a better option for SRE to ha...
[23:05:05] (CR) Dzahn: [C: 2] "https://wikimania.m.wikimedia.org/wiki/Main_Page" [dns] - https://gerrit.wikimedia.org/r/455886 (https://phabricator.wikimedia.org/T152882) (owner: Dzahn)
[23:05:18] mdholloway: around for SWAT? I can SWAT if so.
[23:05:56] thcipriani: thanks, but James_F actually already deployed the backport i'd scheduled
[23:06:03] oh, missed that :)
[23:06:18] so, nothing further, i think!
[23:06:20] Ha, sorry.
[23:06:24] sorry i forgot to de-schedule
[23:06:25] It was UBN enough.
[23:06:27] My fault.
[23:06:59] no worries! Not having to deploy is, believe it or not, not a disappointment for me in most cases :)
[23:07:40] James_F: i'm trying to enable the fundraising role to check out why the fix isn't working for T203029 but vagrant isn't cooperating :(
[23:07:41] T203029: Config:Dashiki:* on Meta can't be opened - https://phabricator.wikimedia.org/T203029
[23:07:47] thcipriani: ha, fair enough :)
[23:08:14] mdholloway: Fun. :-(
[23:08:26] Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Resource Statement, Duplicate declaration: Git::Clone[mediawiki/extensions/ParserFunctions] is already declared in file /vagrant/puppet/modules/mediawiki/manifests/extension.pp:162; cannot redeclare at /vagrant/puppet/modules/mediawiki/manifests/extension.pp:162 at
[23:08:26] /vagrant/puppet/modules/mediawiki/manifests/extension.pp:162:5 at /vagrant/puppet/modules/role/manifests/parserfunctions.pp:5 on node vagrant.mediawiki-vagrant.dev
[23:10:37] mdholloway: try `vagrant roles disable parserfunctions`
[23:10:46] Operations, JADE, TechCom, Epic, and 3 others: Deploy JADE extension to production - https://phabricator.wikimedia.org/T183381 (awight)
[23:11:38] it looks like that is getting included twice in your Puppet manifests, probably because you have it enabled globally and then the payments role enables it again
[23:11:40] bd808: `'parserfunctions' is not currently enabled.` is there a convenient way to check what's pulling it in as a dependency?
[23:12:05] i don't have it explicitly enabled.
[23:12:47] `git grep parserfunctions`... there are a bunch of roles that pull it in. We should fix the payments role to install it globally
[23:13:24] quick hack: edit puppet/modules/payments/manifests/init.pp and remove 'payments:ParserFunctions', from line 61
[23:13:50] ah, looks like it's coming in through both visualeditor and zero in my installation
[23:14:22] i'll disable VE and if that doesn't fix it, i'll do the parserfunctions manifest hack
[23:17:49] (PS1) Dzahn: add .m. mobile domain for wikimaniateam.wikimedia [dns] - https://gerrit.wikimedia.org/r/456047 (https://phabricator.wikimedia.org/T152882)
[23:18:41] (PS2) Dzahn: add .m. mobile domain for wikimaniateam.wikimedia [dns] - https://gerrit.wikimedia.org/r/456047 (https://phabricator.wikimedia.org/T152882)
[23:18:52] (CR) Jforrester: [C: 1] add .m. mobile domain for wikimaniateam.wikimedia [dns] - https://gerrit.wikimedia.org/r/456047 (https://phabricator.wikimedia.org/T152882) (owner: Dzahn)
[23:19:13] (CR) Dzahn: [C: 2] add .m. mobile domain for wikimaniateam.wikimedia [dns] - https://gerrit.wikimedia.org/r/456047 (https://phabricator.wikimedia.org/T152882) (owner: Dzahn)
[23:26:07] (CR) Dzahn: [V: 2 C: 2] tor_relay: temp allow rsync of datadir for migration [puppet] - https://gerrit.wikimedia.org/r/455745 (https://phabricator.wikimedia.org/T196701) (owner: Dzahn)
[23:26:16] (PS3) Dzahn: tor_relay: temp allow rsync of datadir for migration [puppet] - https://gerrit.wikimedia.org/r/455745 (https://phabricator.wikimedia.org/T196701)
[23:26:51] (CR) jerkins-bot: [V: -1] tor_relay: temp allow rsync of datadir for migration [puppet] - https://gerrit.wikimedia.org/r/455745 (https://phabricator.wikimedia.org/T196701) (owner: Dzahn)
[23:26:59] (CR) Dzahn: [V: 2 C: 2] tor_relay: temp allow rsync of datadir for migration [puppet] - https://gerrit.wikimedia.org/r/455745 (https://phabricator.wikimedia.org/T196701) (owner: Dzahn)
[23:27:22] (CR) Dzahn: [V: 2 C: 2] "aware of the jenkins issue, it is a temp thing that 
will be reverted" [puppet] - 10https://gerrit.wikimedia.org/r/455745 (https://phabricator.wikimedia.org/T196701) (owner: 10Dzahn) [23:29:48] (03PS1) 10Dzahn: Revert "tor_relay: temp allow rsync of datadir for migration" [puppet] - 10https://gerrit.wikimedia.org/r/456049 [23:30:32] (03CR) 10Dzahn: [C: 04-1] "not yet, but creating it now as a reminder" [puppet] - 10https://gerrit.wikimedia.org/r/456049 (owner: 10Dzahn) [23:43:47] mutante, I wonder if the wikis in https://phabricator.wikimedia.org/T152882 can just be done all in one go [23:45:06] Krenair: yea, i am not sure. wikimania-related was easier so i did the low-hanging fruit [23:45:44] wouldnt touch for example login or zero just like that [23:46:07] 10Operations, 10DNS, 10Traffic, 10Mobile, 10Patch-For-Review: Many misc wikis lack mobile domains - https://phabricator.wikimedia.org/T152882 (10Krenair) [23:46:13] and how much does it matter for private ones like boardgovcom.. i dunno [23:47:05] PROBLEM - MariaDB Slave Lag: s2 on dbstore2001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 317.99 seconds [23:47:46] why is wikitech.m not expected, btw [23:48:14] mutante, that's kind of complicated [23:48:41] MobileFrontend is the extension that is used to serve these mobile variant domains [23:48:56] yep [23:48:57] it expects to get some info from varnish [23:49:06] but these sites do not sit behind varnish [23:49:25] ah, right, makes sense [23:49:29] heh [23:49:35] well I wouldn't say that what I just said made sense [23:49:40] it's hysterical raisins [23:50:03] when wikitech becomes normal and fully integrated into the cluster it would make sense to have a mobile variant just like (almost) everywhere else [23:50:18] well, that wikitech isnt part of the cluster [23:50:23] yea, *nod* [23:50:46] right now wikitech and its labtest evil twin are directly exposed [23:51:56] I think there is a way to get MobileFrontend to function without Varnish, I must have achieved it at some point because I used to run it 
locally when developing VE stuff [23:52:06] idk how feasible that setup is in wikimedia prod [23:52:24] yea, it's because it used to be the labscontroller before horizon [23:52:33] careful [23:52:44] labscontroller sounds like labcontrol which has it's own meaning :) [23:53:02] I think historically it was located on the same host too :)) [23:53:31] virt0 then virt1000 (?) [23:53:33] oops. eh.. "cloud-vps-admin-web-ui' [23:53:49] yes [23:53:59] though of course at that time 'cloud-vps' was not a thing [23:54:09] or it was but we didn't call it that yet [23:54:18] yea, it was "labs" [23:56:06] btw, something just started working on beta [23:56:10] RECOVERY - English Wikipedia Main page on beta-cluster is OK [23:56:19] I think the only lingering reason to keep wikitech exposed directly to the internet without going via LVS+nginx+varnish and friends is in case something breaks [23:56:25] though there is wikitech-static now, so *shrug* [23:56:48] mutante, yeah idk why that broke briefly [23:56:58] maybe someone merged something bad [23:57:49] yea, there is wikitech-static and syncing and monitoring that the sync works [23:59:11] searched Icinga for it right now.. found an MW version warning [23:59:24] mutante, come the thought of it, that beta alert [23:59:28] socket timeout [23:59:43] means nginx/varnish took too long to start talking [23:59:50] if mediawiki was broken you'd get something different
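At 23:11:40 mdholloway asked whether there is a convenient way to see what pulls `parserfunctions` in as a dependency. A `git grep` only lists direct mentions. The Python sketch below goes one step further: it builds a role-to-role include graph and prints every chain from an enabled role down to the target. It assumes, hypothetically, that each role lives in `puppet/modules/role/manifests/<name>.pp` and pulls in other roles with `include ::role::<name>` or `require ::role::<name>`. The helper names (`load_manifests`, `role_graph`, `paths_to`) are made up for illustration, and the toy manifests are not the real mediawiki-vagrant roles.

```python
"""Sketch: list the include chains that pull a role into a mediawiki-vagrant setup.

Hypothetical layout assumed: puppet/modules/role/manifests/<name>.pp, with
role-to-role dependencies written as `include ::role::x` / `require role::x`.
"""
import re
from pathlib import Path

# Matches `include ::role::foo`, `include role::foo`, `require ::role::foo`.
ROLE_REF = re.compile(r"\b(?:include|require)\s+(?:::)?role::(\w+)")


def load_manifests(manifest_dir):
    """Map role name -> manifest source for every *.pp file in manifest_dir."""
    return {p.stem: p.read_text() for p in Path(manifest_dir).glob("*.pp")}


def role_graph(manifests):
    """Map each role to the set of roles its manifest directly includes."""
    return {
        name: set(ROLE_REF.findall(src)) - {name}
        for name, src in manifests.items()
    }


def paths_to(graph, enabled, target):
    """Return every include chain from an enabled role down to target.

    Plain depth-first enumeration; fine for a few hundred roles, but the
    number of chains can grow quickly in a densely connected graph.
    """
    chains = []

    def walk(role, chain):
        if role == target:
            chains.append(chain)
            return
        for dep in sorted(graph.get(role, ())):
            if dep not in chain:  # guard against include cycles
                walk(dep, chain + [dep])

    for role in sorted(enabled):
        walk(role, [role])
    return chains


if __name__ == "__main__":
    # Toy manifests (hypothetical roles), standing in for load_manifests(...).
    toy = {
        "toy_editor": "class role::toy_editor {\n  include ::role::parserfunctions\n}\n",
        "toy_mobile": "class role::toy_mobile {\n  require role::toy_editor\n}\n",
        "parserfunctions": "class role::parserfunctions {\n"
                           "  mediawiki::extension { 'ParserFunctions': }\n}\n",
    }
    for chain in paths_to(role_graph(toy), {"toy_editor", "toy_mobile"}, "parserfunctions"):
        print(" -> ".join(chain))
    # toy_editor -> parserfunctions
    # toy_mobile -> toy_editor -> parserfunctions
```

Including the same class twice is harmless in Puppet. The error in the log comes from a resource (`Git::Clone[mediawiki/extensions/ParserFunctions]`) being declared twice. So the chains tell you which roles to disable, not which manifest line conflicts.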
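Krenair's point at the end of the log is that a socket timeout means nginx/Varnish never started answering, while a broken MediaWiki would produce a different result. The Python sketch below is a probe that keeps those two failure modes apart, using only the standard `urllib` modules. It is a hypothetical illustration, not the actual Icinga check behind the beta-cluster Main page alert. The function names, labels, and the 10-second default timeout are assumptions.

```python
"""Sketch: distinguish 'front end never answered' from 'backend returned an error'.

Hypothetical check logic; not the real beta-cluster Icinga plugin.
"""
import socket
import urllib.error
import urllib.request


def classify(outcome):
    """Map a probe outcome (an HTTP status int or an exception) to a label."""
    if isinstance(outcome, int):
        if 200 <= outcome < 400:
            return "OK"
        return f"CRITICAL: HTTP {outcome}"
    # HTTPError subclasses URLError, so test it first: the server did answer,
    # just with an error status (e.g. a 5xx from a broken backend).
    if isinstance(outcome, urllib.error.HTTPError):
        return f"CRITICAL: HTTP {outcome.code}"
    # A timeout arrives wrapped in URLError while connecting, or bare while
    # waiting for the response; unwrap .reason when it is present.
    reason = getattr(outcome, "reason", outcome)
    if isinstance(reason, (socket.timeout, TimeoutError)):
        return "CRITICAL: socket timeout"
    return f"CRITICAL: connection error ({reason})"


def probe(url, timeout=10.0):
    """Fetch url once and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status)
    except OSError as exc:  # URLError, HTTPError and socket timeouts are all OSError
        return classify(exc)
```

`urlopen` raises `HTTPError` for 4xx/5xx responses rather than returning them, which is why the exception branch also has to report HTTP status codes.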