[00:20:22] (PS1) Thcipriani: gerrit: deploy gervert via scap3 [puppet] - https://gerrit.wikimedia.org/r/506932
[02:19:31] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 52861928 and 3 seconds
[02:23:27] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 48 seconds
[02:41:44] (CR) Gergő Tisza: wikitech: Disable Gerrit accounts when blocked on wikitech (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/506587 (https://phabricator.wikimedia.org/T218654) (owner: BryanDavis)
[03:59:41] PROBLEM - Check systemd state on ms-be2015 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[04:04:29] PROBLEM - puppet last run on wtp1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:19:51] PROBLEM - puppet last run on db1069 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:21:11] PROBLEM - EDAC syslog messages on db1107 is CRITICAL: 16.36 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=db1107&var-datasource=eqiad+prometheus/ops
[04:30:55] RECOVERY - puppet last run on wtp1026 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:34:59] RECOVERY - Check systemd state on ms-be2015 is OK: OK - running: The system is fully operational
[04:36:55] PROBLEM - puppet last run on labsdb1010 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[04:51:39] RECOVERY - puppet last run on db1069 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[05:03:23] RECOVERY - puppet last run on labsdb1010 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[05:10:23] Operations, ops-eqiad, Analytics, Analytics-EventLogging, DBA: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (Marostegui)
[05:10:46] Operations, ops-eqiad, Analytics, Analytics-EventLogging, DBA: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (Marostegui) p:Triage→Normal
[05:10:57] ACKNOWLEDGEMENT - EDAC syslog messages on db1107 is CRITICAL: 16.26 ge 4 Marostegui T222050 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=db1107&var-datasource=eqiad+prometheus/ops
[05:11:16] Operations, ops-eqiad, Analytics, Analytics-EventLogging, DBA: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (Marostegui) I've acked the alert for now.
[05:17:15] (PS1) Marostegui: pcX.yaml: Add hostname as a comment [puppet] - https://gerrit.wikimedia.org/r/506936
[05:27:14] (CR) Marostegui: [C: +2] pcX.yaml: Add hostname as a comment [puppet] - https://gerrit.wikimedia.org/r/506936 (owner: Marostegui)
[05:27:59] PROBLEM - Check systemd state on analytics1050 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[05:36:04] (PS1) Marostegui: mariadb: Move db2045 from s8 to x1 [puppet] - https://gerrit.wikimedia.org/r/506937 (https://phabricator.wikimedia.org/T219493)
[05:38:52] (CR) Marostegui: [C: +2] mariadb: Move db2045 from s8 to x1 [puppet] - https://gerrit.wikimedia.org/r/506937 (https://phabricator.wikimedia.org/T219493) (owner: Marostegui)
[05:51:22] Operations, ops-eqiad, Analytics, Analytics-EventLogging, DBA: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (elukey) Thanks a lot @Marostegui! We can shutdown the host without any problem, it only needs a ~10m heads up to properly stop eve...
[06:00:45] (PS1) Rxy: Merge interface-editor to interface-admin at jawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/506939 (https://phabricator.wikimedia.org/T222018)
[06:06:50] (PS1) Ammarpad: Set wgArticleCountMethod='any' for bgwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/506943 (https://phabricator.wikimedia.org/T222044)
[06:30:36] PROBLEM - puppet last run on mw1323 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/cgroup-mediawiki-clean]
[06:41:48] (CR) Elukey: "Thanks for the feedback!
Going to incorporate your suggestions in the final change, better safe than sorry with the admin module :)" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: Elukey)
[06:48:31] (PS2) Muehlenhoff: Add sd-pam processes to filter list for debdeploy [puppet] - https://gerrit.wikimedia.org/r/506595 (https://phabricator.wikimedia.org/T135991)
[06:50:04] (CR) Muehlenhoff: [C: +2] Add sd-pam processes to filter list for debdeploy [puppet] - https://gerrit.wikimedia.org/r/506595 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[06:54:06] (PS5) Elukey: admin: allow users to be removed preserving their home directories [puppet] - https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171)
[06:56:56] RECOVERY - puppet last run on mw1323 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:13:28] !log updated stretch netboot image for 9.9 point release
[07:13:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:13:42] ^ marostegui: can you retry?
[07:14:21] (PS2) Muehlenhoff: Remove support for trusty in two Prometheus exporters [puppet] - https://gerrit.wikimedia.org/r/503265
[07:14:41] moritzm: yeah thanks!
[07:15:55] (CR) Muehlenhoff: [C: +2] Remove support for trusty in two Prometheus exporters [puppet] - https://gerrit.wikimedia.org/r/503265 (owner: Muehlenhoff)
[07:21:16] (PS2) Muehlenhoff: dnsrecursor: Remove support for Ubuntu/trusty [puppet] - https://gerrit.wikimedia.org/r/500413
[07:22:15] moritzm: working good! thanks :)
[07:25:54] cool :-)
[07:28:00] PROBLEM - Check systemd state on ms-be2014 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[07:29:43] (PS3) Muehlenhoff: dnsrecursor: Remove support for Ubuntu/trusty [puppet] - https://gerrit.wikimedia.org/r/500413
[07:35:06] (PS4) Elukey: profile::analytics::refinery::repository: use the 'analitics-deploy' user [puppet] - https://gerrit.wikimedia.org/r/506609 (https://phabricator.wikimedia.org/T220971)
[07:37:03] (CR) Elukey: [C: +2] profile::analytics::refinery::repository: use the 'analitics-deploy' user [puppet] - https://gerrit.wikimedia.org/r/506609 (https://phabricator.wikimedia.org/T220971) (owner: Elukey)
[07:40:21] (PS4) Muehlenhoff: dnsrecursor: Remove support for Ubuntu/trusty [puppet] - https://gerrit.wikimedia.org/r/500413
[07:43:12] (PS1) Marostegui: db-codfw.php: Move db2045 to x1 [mediawiki-config] - https://gerrit.wikimedia.org/r/506944 (https://phabricator.wikimedia.org/T219493)
[07:43:48] (PS2) Jcrespo: mariadb: Reenable notifications for db2097 [puppet] - https://gerrit.wikimedia.org/r/506871 (https://phabricator.wikimedia.org/T220572)
[07:44:03] (CR) Marostegui: [C: +1] mariadb: Reenable notifications for db2097 [puppet] - https://gerrit.wikimedia.org/r/506871 (https://phabricator.wikimedia.org/T220572) (owner: Jcrespo)
[07:44:44] !log Stop replication on db2034 (x1 master) for maintenance - T219493
[07:44:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:44:52] T219493: Decommission 2 codfw x1 hosts db2033 and db2034 - https://phabricator.wikimedia.org/T219493
[07:45:18] RECOVERY - Check systemd state on ms-be2014 is OK: OK - running: The system is fully operational
[07:45:30] (PS2) Muehlenhoff: Remove support for Ubuntu/trusty in base packages [puppet] - https://gerrit.wikimedia.org/r/498126
[07:45:48] Operations, ops-eqiad, Analytics, Analytics-EventLogging, DBA: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (Marostegui) a:Cmjohnson Thanks - not sure how to proceed as the
`dmesg` entries show that the issue is fixed but icinga is sti...
[07:47:21] !log Stop mysql on db2034 (lag will happen on x1 codfw) - T219493
[07:47:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:48:43] (CR) Jcrespo: [C: +2] mariadb: Reenable notifications for db2097 [puppet] - https://gerrit.wikimedia.org/r/506871 (https://phabricator.wikimedia.org/T220572) (owner: Jcrespo)
[07:49:30] (CR) Marostegui: [C: +2] db-codfw.php: Move db2045 to x1 [mediawiki-config] - https://gerrit.wikimedia.org/r/506944 (https://phabricator.wikimedia.org/T219493) (owner: Marostegui)
[07:50:40] (Merged) jenkins-bot: db-codfw.php: Move db2045 to x1 [mediawiki-config] - https://gerrit.wikimedia.org/r/506944 (https://phabricator.wikimedia.org/T219493) (owner: Marostegui)
[07:51:39] (PS1) Marostegui: db-codfw.php: Set db2045 with IP instead of hostname [mediawiki-config] - https://gerrit.wikimedia.org/r/506946
[07:53:47] (PS2) Marostegui: db-codfw.php: Set db2045 with IP instead of hostname [mediawiki-config] - https://gerrit.wikimedia.org/r/506946
[07:55:24] Operations, MediaWiki-General-or-Unknown, serviceops, Core Platform Team (PHP7 (TEC4)), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (Joe) >>! In T219279#5138236, @Anomie wrote: >>>! In T219279#509...
[07:55:32] (CR) Marostegui: [C: +2] db-codfw.php: Set db2045 with IP instead of hostname [mediawiki-config] - https://gerrit.wikimedia.org/r/506946 (owner: Marostegui)
[07:55:34] (CR) jenkins-bot: db-codfw.php: Move db2045 to x1 [mediawiki-config] - https://gerrit.wikimedia.org/r/506944 (https://phabricator.wikimedia.org/T219493) (owner: Marostegui)
[07:56:28] (PS3) Giuseppe Lavagetto: Add Language::ucfirst overrides for php 7.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/505487 (https://phabricator.wikimedia.org/T219279)
[07:56:43] (Merged) jenkins-bot: db-codfw.php: Set db2045 with IP instead of hostname [mediawiki-config] - https://gerrit.wikimedia.org/r/506946 (owner: Marostegui)
[07:57:29] (CR) Giuseppe Lavagetto: [C: +2] Add Language::ucfirst overrides for php 7.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/505487 (https://phabricator.wikimedia.org/T219279) (owner: Giuseppe Lavagetto)
[07:57:48] (PS12) Jcrespo: mariadb-snapshots: Setup full daily snapshots for all codfw sections [puppet] - https://gerrit.wikimedia.org/r/500980 (https://phabricator.wikimedia.org/T206203)
[07:57:50] (PS10) Jcrespo: mariadb-snapshots: Stop replication during transfer [puppet] - https://gerrit.wikimedia.org/r/501546 (https://phabricator.wikimedia.org/T206203)
[07:57:52] (PS1) Jcrespo: mariadb-backups: Setup db2097 as the source of some codfw backups [puppet] - https://gerrit.wikimedia.org/r/506947 (https://phabricator.wikimedia.org/T206203)
[07:58:29] (Merged) jenkins-bot: Add Language::ucfirst overrides for php 7.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/505487 (https://phabricator.wikimedia.org/T219279) (owner: Giuseppe Lavagetto)
[07:58:40] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Move db2045 from s8 to x1 T219493 (duration: 00m 55s)
[07:58:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:44] T219493: Decommission 2 codfw
x1 hosts db2033 and db2034 - https://phabricator.wikimedia.org/T219493
[08:02:02] <_joe_> marostegui: I'm testing my change on mwdebug1002, it will take me some time to be sure it's ok
[08:02:09] <_joe_> if you're in a hurry, we can revert
[08:02:25] _joe_: no, no rush from my side, not planning to push anything
[08:03:09] <_joe_> ack, good!
[08:06:44] (CR) jenkins-bot: db-codfw.php: Set db2045 with IP instead of hostname [mediawiki-config] - https://gerrit.wikimedia.org/r/506946 (owner: Marostegui)
[08:06:46] (CR) jenkins-bot: Add Language::ucfirst overrides for php 7.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/505487 (https://phabricator.wikimedia.org/T219279) (owner: Giuseppe Lavagetto)
[08:11:37] (PS1) Jcrespo: mariadb-backups: Productionize db2098 for backup source of s2 and s3 [puppet] - https://gerrit.wikimedia.org/r/506948 (https://phabricator.wikimedia.org/T220572)
[08:12:42] (PS2) Jcrespo: mariadb-backups: Productionize db2098 for backup source of s2 and s3 [puppet] - https://gerrit.wikimedia.org/r/506948 (https://phabricator.wikimedia.org/T220572)
[08:14:09] (CR) Jcrespo: [C: +2] mariadb-backups: Productionize db2098 for backup source of s2 and s3 [puppet] - https://gerrit.wikimedia.org/r/506948 (https://phabricator.wikimedia.org/T220572) (owner: Jcrespo)
[08:23:14] !log joal@deploy1001 Started deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy)
[08:23:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:40] !log oblivian@deploy1001 Synchronized wmf-config/Php72ToUpper.php: Adding unicode overrides table for php 7.2 T219279 (duration: 00m 54s)
[08:23:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:44] T219279: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279
[08:24:51] (PS1) Marostegui: install_server: Remove
db2045 [puppet] - https://gerrit.wikimedia.org/r/506949
[08:25:03] !log stop dbstore2001:s2 for cloning to db2098 T220572
[08:25:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:07] T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572
[08:25:23] !log oblivian@deploy1001 Synchronized wmf-config/CommonSettings.php: Enable unicode overrides table for php 7.2 T219279 (duration: 00m 53s)
[08:25:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:38] (CR) Marostegui: [C: +2] install_server: Remove db2045 [puppet] - https://gerrit.wikimedia.org/r/506949 (owner: Marostegui)
[08:28:19] !log restart keyholder-proxy on deploy1001 (attempt to see if new analytics scap settings got applied)
[08:28:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:07] <_joe_> marostegui: I'm done with scap
[08:31:13] _joe_: thanks!
[08:32:23] I am waiting for db2098 to show the new checks to downtime them (they are already disabled)
[08:33:55] !log restart keyholder on deploy1001 + rearm keys
[08:33:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:38:52] !log joal@deploy1001 Finished deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy) (duration: 15m 38s)
[08:38:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:22] Operations: Integrate Stretch 9.9 point update - https://phabricator.wikimedia.org/T222053 (MoritzMuehlenhoff)
[08:39:29] Operations: Integrate Stretch 9.9 point update - https://phabricator.wikimedia.org/T222053 (MoritzMuehlenhoff) p:Triage→Normal
[08:39:55] !log begin migration of bast4002 to prometheus v2 - T187987
[08:39:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:59] T187987: 100% of Prometheus traffic served by Prometheus v2 - https://phabricator.wikimedia.org/T187987
[08:43:57] (PS1) Filippo Giunchedi: hieradata: bast4002 to Prometheus v2 [puppet] - https://gerrit.wikimedia.org/r/506950 (https://phabricator.wikimedia.org/T187987)
[08:44:20] (CR) Filippo Giunchedi: [C: +2] hieradata: bast4002 to Prometheus v2 [puppet] - https://gerrit.wikimedia.org/r/506950 (https://phabricator.wikimedia.org/T187987) (owner: Filippo Giunchedi)
[08:51:54] Operations: Integrate Stretch 9.9 point update - https://phabricator.wikimedia.org/T222053 (MoritzMuehlenhoff)
[08:53:28] (PS1) Arturo Borrero Gonzalez: openstack: clientpackages: decouple configuration [puppet] - https://gerrit.wikimedia.org/r/506952 (https://phabricator.wikimedia.org/T220051)
[08:54:05] PROBLEM - Varnish traffic drop between 30min ago and now at ulsfo on icinga1001 is CRITICAL: 54.74 le 60 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[08:54:17] migration artifact ^
[08:54:23] :_)
[08:54:54] (PS10) Giuseppe Lavagetto: profile::mediawiki::maintenance: systemd-timer based periodic jobs [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250)
[08:55:09] RECOVERY - Varnish traffic drop between 30min ago and now at ulsfo on icinga1001 is OK: (C)60 le (W)70 le 93.88 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[08:57:20] (CR) Filippo Giunchedi: [C: +2] logstash: ramp up logs retention [puppet] - https://gerrit.wikimedia.org/r/505741 (https://phabricator.wikimedia.org/T220103) (owner: Filippo Giunchedi)
[08:57:30] (PS3) Filippo Giunchedi: logstash: ramp up logs retention [puppet] - https://gerrit.wikimedia.org/r/505741 (https://phabricator.wikimedia.org/T220103)
[08:58:41] (PS1) Giuseppe Lavagetto: Send 1% of anonymous users to PHP7.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/506953 (https://phabricator.wikimedia.org/T219150)
[08:59:07] (PS1) Volans: sre.hosts.downtime: covert to class-based API [cookbooks] - https://gerrit.wikimedia.org/r/506954 (https://phabricator.wikimedia.org/T221212)
[08:59:11] (PS1) Volans: cookbook API: drop get_title() support [software/spicerack] - https://gerrit.wikimedia.org/r/506955
[08:59:13] (PS1) Volans: cookbook API: add class API [software/spicerack] - https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212)
[08:59:28] !log joal@deploy1001 Started deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy) - bis
[08:59:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:00:08] (PS2) Volans: sre.hosts.downtime: convert to class-based API [cookbooks] - https://gerrit.wikimedia.org/r/506954 (https://phabricator.wikimedia.org/T221212)
[09:00:46] (PS2) Arturo Borrero Gonzalez: openstack: clientpackages: decouple configuration [puppet] - https://gerrit.wikimedia.org/r/506952
(https://phabricator.wikimedia.org/T220051)
[09:02:59] (CR) Volans: "Example of conversion of coookbook to the class-based API proposed in I7b4f305d8718bf544974d0ea8a6c0df8eb5936c6" [cookbooks] - https://gerrit.wikimedia.org/r/506954 (https://phabricator.wikimedia.org/T221212) (owner: Volans)
[09:03:13] (CR) Volans: "Proposal to add a class-based API to spicerack to enable additional integrations. An example of converted cookbook is available here: Ic87" [software/spicerack] - https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212) (owner: Volans)
[09:03:52] (PS13) Jcrespo: mariadb-snapshots: Setup full daily snapshots for all codfw sections [puppet] - https://gerrit.wikimedia.org/r/500980 (https://phabricator.wikimedia.org/T206203)
[09:03:54] (PS11) Jcrespo: mariadb-snapshots: Stop replication during transfer [puppet] - https://gerrit.wikimedia.org/r/501546 (https://phabricator.wikimedia.org/T206203)
[09:03:56] (PS2) Jcrespo: mariadb-backups: Setup db2097 as the source of some codfw backups [puppet] - https://gerrit.wikimedia.org/r/506947 (https://phabricator.wikimedia.org/T206203)
[09:03:58] (PS1) Jcrespo: mariadb: Productionize db2099 for backup source of s4 and s5 [puppet] - https://gerrit.wikimedia.org/r/506957 (https://phabricator.wikimedia.org/T220572)
[09:04:00] (CR) jerkins-bot: [V: -1] cookbook API: drop get_title() support [software/spicerack] - https://gerrit.wikimedia.org/r/506955 (owner: Volans)
[09:04:02] (CR) jerkins-bot: [V: -1] cookbook API: add class API [software/spicerack] - https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212) (owner: Volans)
[09:05:05] (CR) jerkins-bot: [V: -1] mariadb: Productionize db2099 for backup source of s4 and s5 [puppet] - https://gerrit.wikimedia.org/r/506957 (https://phabricator.wikimedia.org/T220572) (owner: Jcrespo)
[09:05:25] (PS2) Jcrespo: mariadb: Productionize db2099 for backup
source of s4 and s5 [puppet] - https://gerrit.wikimedia.org/r/506957 (https://phabricator.wikimedia.org/T220572)
[09:06:53] (CR) Muehlenhoff: [C: +1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/506683 (owner: Elukey)
[09:07:12] (PS3) Muehlenhoff: profile::kerberos: make krb.conf working with multiple KDC servers [puppet] - https://gerrit.wikimedia.org/r/506683 (owner: Elukey)
[09:08:18] (CR) Fsero: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/506953 (https://phabricator.wikimedia.org/T219150) (owner: Giuseppe Lavagetto)
[09:08:42] (CR) Muehlenhoff: [C: +2] profile::kerberos: make krb.conf working with multiple KDC servers [puppet] - https://gerrit.wikimedia.org/r/506683 (owner: Elukey)
[09:09:27] (PS3) Arturo Borrero Gonzalez: openstack: clientpackages: decouple configuration [puppet] - https://gerrit.wikimedia.org/r/506952 (https://phabricator.wikimedia.org/T220051)
[09:09:56] (CR) Arturo Borrero Gonzalez: [C: +2] "PCC as expected: https://puppet-compiler.wmflabs.org/compiler1002/16129/" [puppet] - https://gerrit.wikimedia.org/r/506952 (https://phabricator.wikimedia.org/T220051) (owner: Arturo Borrero Gonzalez)
[09:10:25] (CR) Arturo Borrero Gonzalez: [V: +2 C: +2] openstack: clientpackages: decouple configuration [puppet] - https://gerrit.wikimedia.org/r/506952 (https://phabricator.wikimedia.org/T220051) (owner: Arturo Borrero Gonzalez)
[09:11:44] (PS3) Jcrespo: mariadb: Productionize db2099 for backup source of s4 and s5 [puppet] - https://gerrit.wikimedia.org/r/506957 (https://phabricator.wikimedia.org/T220572)
[09:13:51] !log stop dbstore2002:s4 for cloning to db2099 T220572
[09:13:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:13:55] T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572
[09:14:53] (CR) Jcrespo: [C: +2] mariadb:
Productionize db2099 for backup source of s4 and s5 [puppet] - https://gerrit.wikimedia.org/r/506957 (https://phabricator.wikimedia.org/T220572) (owner: Jcrespo)
[09:19:19] (PS5) Elukey: admin: add the analytics-hdfs system user to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/504839 (https://phabricator.wikimedia.org/T220971)
[09:20:14] (PS6) Elukey: admin: add the 'analytics' system user to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/504839 (https://phabricator.wikimedia.org/T220971)
[09:21:26] (CR) Elukey: "I was finally able to move the 'analytics' scap user to 'analytics-deploy', so now we can use 'analytics' without conflicts. The 'analytic" [puppet] - https://gerrit.wikimedia.org/r/504839 (https://phabricator.wikimedia.org/T220971) (owner: Elukey)
[09:26:41] PROBLEM - Disk space on notebook1004 is CRITICAL: DISK CRITICAL - free space: /srv 4479 MB (3% inode=83%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[09:26:41] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb={LIST,PATCH,PUT} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[09:26:56] checking notebook
[09:27:06] (PS7) Elukey: admin: add the 'analytics' system user to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/504839 (https://phabricator.wikimedia.org/T220971)
[09:27:39] PROBLEM - Request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 verb=PATCH https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[09:27:47] !log joal@deploy1001 Finished deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy) - bis (duration: 28m 19s)
[09:27:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:57] PROBLEM - etcd request latencies on neon is CRITICAL: instance=10.64.0.40:6443 operation={compareAndSwap,get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[09:27:59] RECOVERY - Disk space on notebook1004 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[09:28:25] PROBLEM - etcd request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 operation={compareAndSwap,get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[09:30:13] RECOVERY - Request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[09:30:59] RECOVERY - etcd request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[09:31:47] RECOVERY - etcd request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[09:31:49] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[09:32:04] (PS1) Arturo Borrero Gonzalez: openstack: drop labtest/labtestn unused code [puppet] - https://gerrit.wikimedia.org/r/506962 (https://phabricator.wikimedia.org/T218026)
[09:32:49] (CR) Lucas Werkmeister (WMDE): [C: +1] Add namespace "Aldono" at eo.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/505765 (https://phabricator.wikimedia.org/T221525) (owner: Tulsi Bhagat)
[09:34:23] (PS4) Jbond: standard refactor: remove standard class from base classes [puppet] - https://gerrit.wikimedia.org/r/506682 (https://phabricator.wikimedia.org/T221225)
[09:35:23] (CR) Jbond: [C: +2] standard refactor: remove standard class from base classes [puppet] - https://gerrit.wikimedia.org/r/506682 (https://phabricator.wikimedia.org/T221225) (owner: Jbond)
[09:36:00] (PS1) Jbond: Revert "standard refactor: remove standard class from base classes" [puppet] - https://gerrit.wikimedia.org/r/506964
[09:36:46] (CR) jerkins-bot: [V: -1] Revert "standard refactor: remove standard class from base classes" [puppet] -
https://gerrit.wikimedia.org/r/506964 (owner: Jbond)
[09:41:08] (PS1) Muehlenhoff: Add ferm rules for the Kerberos KDC [puppet] - https://gerrit.wikimedia.org/r/506965
[09:41:44] (PS11) Arturo Borrero Gonzalez: standard: introduce a wrapper profile [puppet] - https://gerrit.wikimedia.org/r/506614 (https://phabricator.wikimedia.org/T221225)
[09:43:55] (PS2) Volans: cookbook API: drop get_title() support [software/spicerack] - https://gerrit.wikimedia.org/r/506955
[09:43:57] (PS2) Volans: cookbook API: add class API [software/spicerack] - https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212)
[09:43:59] (PS1) Volans: setup.py: force urllib3 version [software/spicerack] - https://gerrit.wikimedia.org/r/506966
[09:44:16] (PS6) Ema: varnish: run VTC tests against remote PCC [puppet] - https://gerrit.wikimedia.org/r/506868 (https://phabricator.wikimedia.org/T128188)
[09:45:14] (CR) Gehel: "A few comments already. I'll do a second pass at some point."
(5 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212) (owner: Volans)
[09:47:17] (CR) Jbond: [C: +1] "Thanks for the changes LGTM" [puppet] - https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: Elukey)
[09:49:52] (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/504839 (https://phabricator.wikimedia.org/T220971) (owner: Elukey)
[09:50:14] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: Elukey)
[09:52:13] (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/506965 (owner: Muehlenhoff)
[09:54:07] (CR) Elukey: [C: +1] Add ferm rules for the Kerberos KDC [puppet] - https://gerrit.wikimedia.org/r/506965 (owner: Muehlenhoff)
[09:57:06] (PS12) Arturo Borrero Gonzalez: standard: introduce a wrapper profile and use it in CloudVPS [puppet] - https://gerrit.wikimedia.org/r/506614 (https://phabricator.wikimedia.org/T221225)
[09:57:26] (CR) Volans: [C: -1] "I don't think that stacking multiple reports results into a single check is a good idea.
Each report should have it's own Icinga check IMH" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/505657 (https://phabricator.wikimedia.org/T215378) (owner: CRusnov)
[09:58:38] (CR) Giuseppe Lavagetto: [C: +2] profile::mediawiki::maintenance: systemd-timer based periodic jobs [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) (owner: Giuseppe Lavagetto)
[09:58:51] (PS11) Giuseppe Lavagetto: profile::mediawiki::maintenance: systemd-timer based periodic jobs [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250)
[09:59:10] (PS7) Ema: varnish: run VTC tests against remote PCC [puppet] - https://gerrit.wikimedia.org/r/506868 (https://phabricator.wikimedia.org/T128188)
[10:01:14] (PS2) Muehlenhoff: Add ferm rules for the Kerberos KDC [puppet] - https://gerrit.wikimedia.org/r/506965
[10:01:57] (CR) Arturo Borrero Gonzalez: [C: +2] standard: introduce a wrapper profile and use it in CloudVPS [puppet] - https://gerrit.wikimedia.org/r/506614 (https://phabricator.wikimedia.org/T221225) (owner: Arturo Borrero Gonzalez)
[10:05:17] (PS3) Muehlenhoff: Add ferm rules for the Kerberos KDC [puppet] - https://gerrit.wikimedia.org/r/506965
[10:07:03] (CR) Muehlenhoff: [C: +2] Add ferm rules for the Kerberos KDC [puppet] - https://gerrit.wikimedia.org/r/506965 (owner: Muehlenhoff)
[10:08:40] (PS8) Ema: varnish: run VTC tests against remote PCC [puppet] - https://gerrit.wikimedia.org/r/506868 (https://phabricator.wikimedia.org/T128188)
[10:11:18] (PS12) Arturo Borrero Gonzalez: sudo: decouple sudo from sudo-ldap [puppet] - https://gerrit.wikimedia.org/r/506435 (https://phabricator.wikimedia.org/T221225)
[10:20:35] Operations, Traffic: cp3037 is currently unreachable - https://phabricator.wikimedia.org/T222041 (Vgutierrez) Requested a power drain via remote hands: > Powered the server cp3037 for at least 15 seconds.
After I plugged the power cable back in I was not be able to turn them manually on cp3037 mgmt inte...
[10:22:32] (PS9) Ema: varnish: run VTC tests against remote PCC [puppet] - https://gerrit.wikimedia.org/r/506868 (https://phabricator.wikimedia.org/T128188)
[10:28:30] (PS10) Ema: varnish: run VTC tests against remote PCC [puppet] - https://gerrit.wikimedia.org/r/506868 (https://phabricator.wikimedia.org/T128188)
[10:30:04] jan_drewniak: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190429T1030).
[10:30:22] (CR) Vgutierrez: [C: +1] "this looks awesome" [puppet] - https://gerrit.wikimedia.org/r/506868 (https://phabricator.wikimedia.org/T128188) (owner: Ema)
[10:31:16] PROBLEM - Check systemd state on ms-be2015 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:34:14] (CR) Volans: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/502829 (https://phabricator.wikimedia.org/T220625) (owner: Gehel)
[10:35:10] RECOVERY - Check systemd state on ms-be2015 is OK: OK - running: The system is fully operational
[10:39:19] (CR) Volans: "Nits inline" (3 comments) [cookbooks] - https://gerrit.wikimedia.org/r/504570 (https://phabricator.wikimedia.org/T220946) (owner: Mathew.onipe)
[10:40:26] (CR) Ema: [C: +2] varnish: run VTC tests against remote PCC [puppet] - https://gerrit.wikimedia.org/r/506868 (https://phabricator.wikimedia.org/T128188) (owner: Ema)
[10:56:20] PROBLEM - High lag on wdqs1003 is CRITICAL: 3667 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[10:56:24] (CR) Volans: [C: -2] "Sorry but I don't think this is a viable approach."
(033 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/506610 (owner: 10Mathew.onipe) [10:57:40] (03PS3) 10Ema: Normalize thumbnail URLs to avoid cachebusting [puppet] - 10https://gerrit.wikimedia.org/r/495643 (https://phabricator.wikimedia.org/T216339) (owner: 10Gilles) [11:00:04] MaxSem, RoanKattouw, and Niharika: It is that lovely time of the day again! You are hereby commanded to deploy European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190429T1100). [11:00:04] rxy, Tulsi, and kart_: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:08] o/ [11:00:22] \o/ [11:02:03] (03CR) 10Ema: "I've addressed some of the comments myself." (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/495643 (https://phabricator.wikimedia.org/T216339) (owner: 10Gilles) [11:03:13] \0 [11:03:22] Who can SWAT? [11:03:46] I haven't deploy access [11:04:00] OK. 
So, it is me then :) [11:04:10] thanks :) [11:04:43] (03PS2) 10KartikMistry: Allow admins to add or remove patroller group at enwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506860 (https://phabricator.wikimedia.org/T222008) (owner: 10Rxy) [11:06:03] (03PS1) 10Ema: varnish: remove admission probability VTC test [puppet] - 10https://gerrit.wikimedia.org/r/506977 [11:06:36] (03CR) 10KartikMistry: [C: 03+2] Allow admins to add or remove patroller group at enwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506860 (https://phabricator.wikimedia.org/T222008) (owner: 10Rxy) [11:06:44] (03CR) 10Ema: [C: 03+2] varnish: remove admission probability VTC test [puppet] - 10https://gerrit.wikimedia.org/r/506977 (owner: 10Ema) [11:07:07] (03PS4) 10Ema: Normalize thumbnail URLs to avoid cachebusting [puppet] - 10https://gerrit.wikimedia.org/r/495643 (https://phabricator.wikimedia.org/T216339) (owner: 10Gilles) [11:07:49] (03Merged) 10jenkins-bot: Allow admins to add or remove patroller group at enwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506860 (https://phabricator.wikimedia.org/T222008) (owner: 10Rxy) [11:09:16] (03PS2) 10KartikMistry: Merge interface-editor to interface-admin at jawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506939 (https://phabricator.wikimedia.org/T222018) (owner: 10Rxy) [11:10:39] rxy: first patch in mwdebug1002. Please test. [11:11:17] kart_: Ok, It works expectedly [11:11:22] please deploy to prod [11:11:44] (03CR) 10jenkins-bot: Allow admins to add or remove patroller group at enwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506860 (https://phabricator.wikimedia.org/T222008) (owner: 10Rxy) [11:11:58] rxy: OK. 
[11:14:18] !log kartik@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit|506860|Allow admins to add or remove patroller group at enwikivoyage (T222008)]] (duration: 00m 55s) [11:14:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:14:22] T222008: Allow admins to add/remove patroller user group on en.wikivoyage - https://phabricator.wikimedia.org/T222008 [11:14:37] rxy: OK. Done. Now, next patch. [11:15:04] 10Operations, 10Continuous-Integration-Infrastructure, 10Traffic, 10Patch-For-Review: Make CI run Varnish VCL tests - https://phabricator.wikimedia.org/T128188 (10ema) VTC tests can now be run from dev workstations against PCC: ` ema@ariel:~/wmf/operations-puppet$ cd modules/varnish/files/tests ; ./run.py... [11:15:27] the first patch is work expectedly at server: mw1253.eqiad.wmnet [11:15:34] (prod ), ok, next [11:16:01] (03CR) 10KartikMistry: [C: 03+2] Merge interface-editor to interface-admin at jawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506939 (https://phabricator.wikimedia.org/T222018) (owner: 10Rxy) [11:17:00] (03Merged) 10jenkins-bot: Merge interface-editor to interface-admin at jawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506939 (https://phabricator.wikimedia.org/T222018) (owner: 10Rxy) [11:18:12] rxy: at mwdebug1002. Please test. [11:19:27] kart_: It work expectedly. Please deploy to prod. [11:19:41] OK! [11:21:11] ah. 
paste newline :/ [11:21:12] (03PS1) 10Arturo Borrero Gonzalez: cloudvps: introduce proper base role/profile for VM instances [puppet] - 10https://gerrit.wikimedia.org/r/506979 (https://phabricator.wikimedia.org/T221225) [11:21:45] !log kartik@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit|506939| (T222018)]] (duration: 00m 53s) [11:21:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:21:49] T222018: Merge Interface Editor userrights to Interface Administrator, and remove Interface Editor from ja.wp - https://phabricator.wikimedia.org/T222018 [11:22:45] rxy: deployed. [11:22:49] (03CR) 10jenkins-bot: Merge interface-editor to interface-admin at jawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506939 (https://phabricator.wikimedia.org/T222018) (owner: 10Rxy) [11:22:54] Tulsi: your patch next now. [11:22:55] the second patch is work expectedly at server: mwdebug1002.eqiad.wmnet . thanks ! [11:23:11] OK kart_. I am ready to test it. 
[11:23:22] (03CR) 10KartikMistry: [C: 03+2] Add namespace "Aldono" at eo.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/505765 (https://phabricator.wikimedia.org/T221525) (owner: 10Tulsi Bhagat) [11:23:39] (03PS1) 10Ema: cumin aliases: upload_ats is upload [puppet] - 10https://gerrit.wikimedia.org/r/506980 (https://phabricator.wikimedia.org/T219967) [11:24:58] (03CR) 10Arturo Borrero Gonzalez: "PCC as expected: https://puppet-compiler.wmflabs.org/compiler1002/16140/" [puppet] - 10https://gerrit.wikimedia.org/r/506979 (https://phabricator.wikimedia.org/T221225) (owner: 10Arturo Borrero Gonzalez) [11:25:10] (03PS2) 10Gehel: cloudelastic: allow jobrunners and mwmaint nodes to access cloudelastic [puppet] - 10https://gerrit.wikimedia.org/r/502829 (https://phabricator.wikimedia.org/T220625) [11:25:23] (03PS4) 10KartikMistry: Add namespace "Aldono" at eo.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/505765 (https://phabricator.wikimedia.org/T221525) (owner: 10Tulsi Bhagat) [11:25:44] Tulsi: sorry, forgot to rebase. few minutes.. [11:26:01] (03CR) 10Gehel: [C: 03+2] cloudelastic: allow jobrunners and mwmaint nodes to access cloudelastic [puppet] - 10https://gerrit.wikimedia.org/r/502829 (https://phabricator.wikimedia.org/T220625) (owner: 10Gehel) [11:26:21] Ah. Okay. No problem. [11:28:31] Tulsi: On mwdebug1002, please test. [11:28:53] kart_, Working fine. Please deploy it! [11:28:59] cool. 
[11:29:36] (03PS4) 10Mathew.onipe: elasticsearch: config file for aligning puppet config [puppet] - 10https://gerrit.wikimedia.org/r/506378 (https://phabricator.wikimedia.org/T218932) [11:30:00] (03CR) 10Ema: "pcc lgtm https://puppet-compiler.wmflabs.org/compiler1002/16143/" [puppet] - 10https://gerrit.wikimedia.org/r/506980 (https://phabricator.wikimedia.org/T219967) (owner: 10Ema) [11:30:03] !log kartik@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit|505765|Add namespace "Aldono" at eo.wiktionary (T221525)]] (duration: 00m 54s) [11:30:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:30:07] T221525: Add namespace "Aldono" at EO wiktionary - https://phabricator.wikimedia.org/T221525 [11:30:15] Tulsi: done. [11:30:16] kart_: Requires `mwmaintenancescript namespaceDupes.php --wiki=eowiktionary --fix` after deployment. [11:31:26] Tulsi: oh. Please mention that in the deployment page next time. I'm new SWATer. [11:31:32] Let me do that. [11:31:43] I've mention that in gerrit [11:31:49] ah [11:32:07] (03CR) 10Volans: [C: 03+1] "LGTM, a nit and a suggestion inline" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/506980 (https://phabricator.wikimedia.org/T219967) (owner: 10Ema) [11:33:08] Tulsi: can you check? [11:33:11] now. [11:33:28] what results you get? [11:33:34] no pages to fix [11:33:59] (03CR) 10jenkins-bot: Add namespace "Aldono" at eo.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/505765 (https://phabricator.wikimedia.org/T221525) (owner: 10Tulsi Bhagat) [11:34:17] Tulsi: yes. 0. [11:34:29] What should be the result? 
[11:34:43] Can you paste the whole output to https://phabricator.wikimedia.org/T221525 [11:34:46] (03PS35) 10Vgutierrez: trafficserver: Provide support for multiple ATS instances [puppet] - 10https://gerrit.wikimedia.org/r/504601 (https://phabricator.wikimedia.org/T221217) [11:34:48] (03PS22) 10Vgutierrez: trafficserver: Provide support for inbound TLS traffic [puppet] - 10https://gerrit.wikimedia.org/r/506159 (https://phabricator.wikimedia.org/T221594) [11:34:50] (03PS15) 10Vgutierrez: trafficserver: Allow disabling caching requests [puppet] - 10https://gerrit.wikimedia.org/r/506390 (https://phabricator.wikimedia.org/T221594) [11:34:52] (03PS2) 10Vgutierrez: prometheus: Support several instances of the trafficserver exporter [puppet] - 10https://gerrit.wikimedia.org/r/506659 (https://phabricator.wikimedia.org/T221217) [11:34:52] Yeah. It's Okay [11:34:54] (03PS24) 10Vgutierrez: trafficserver: Provide a TLS terminator profile and backend+TLS role [puppet] - 10https://gerrit.wikimedia.org/r/506398 (https://phabricator.wikimedia.org/T221594) [11:34:56] (03PS1) 10Vgutierrez: trafficserver: Provide a unified monitoring define [puppet] - 10https://gerrit.wikimedia.org/r/506986 (https://phabricator.wikimedia.org/T221217) [11:35:23] Tulsi: cool. 
[11:36:41] (03CR) 10jerkins-bot: [V: 04-1] trafficserver: Provide a unified monitoring define [puppet] - 10https://gerrit.wikimedia.org/r/506986 (https://phabricator.wikimedia.org/T221217) (owner: 10Vgutierrez) [11:37:06] (03PS2) 10Ema: cumin aliases: upload_ats is upload [puppet] - 10https://gerrit.wikimedia.org/r/506980 (https://phabricator.wikimedia.org/T219967) [11:37:55] Tulsi: https://phabricator.wikimedia.org/T221525#5143290 [11:38:50] kart_: that’s how it should look, as far as I know [11:38:58] (03CR) 10Ema: "Comments addressed, updated pcc here: https://puppet-compiler.wmflabs.org/compiler1002/16144/" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/506980 (https://phabricator.wikimedia.org/T219967) (owner: 10Ema) [11:39:44] Lucas_WMDE: OK. Thanks! [11:39:56] kart_, the change is not showing exactly to when i have tested it [11:40:05] Namespace will be: Aldono [11:40:45] it's showing Ablono [11:40:53] kart_: there could also be “n links to fix, n were resolvable”, which would still be fine (e. g. 
https://phabricator.wikimedia.org/T218796#5082331) [11:41:01] it would only be a problem if not all problems were resolvable [11:41:15] https://wikitech.wikimedia.org/wiki/Adding_Namespaces#Deployment has some more info [11:41:22] (03PS2) 10Vgutierrez: trafficserver: Provide a unified monitoring define [puppet] - 10https://gerrit.wikimedia.org/r/506986 (https://phabricator.wikimedia.org/T221217) [11:41:23] (03PS25) 10Vgutierrez: trafficserver: Provide a TLS terminator profile and backend+TLS role [puppet] - 10https://gerrit.wikimedia.org/r/506398 (https://phabricator.wikimedia.org/T221594) [11:41:34] jouncebot: now [11:41:34] For the next 0 hour(s) and 18 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190429T1100) [11:41:59] (03CR) 10jerkins-bot: [V: 04-1] trafficserver: Provide a unified monitoring define [puppet] - 10https://gerrit.wikimedia.org/r/506986 (https://phabricator.wikimedia.org/T221217) (owner: 10Vgutierrez) [11:42:00] kart_, https://prnt.sc/ni8rwi [11:42:29] Oh sorry [11:42:31] my bad [11:42:35] Everything is fine [11:42:45] Thank you for the deployment. [11:42:49] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/506980 (https://phabricator.wikimedia.org/T219967) (owner: 10Ema) [11:43:19] Tulsi: OK! You're welcome. 
[11:43:39] :) [11:43:53] (03CR) 10Ema: [C: 03+2] cumin aliases: upload_ats is upload [puppet] - 10https://gerrit.wikimedia.org/r/506980 (https://phabricator.wikimedia.org/T219967) (owner: 10Ema) [11:48:30] (03Abandoned) 10Jbond: Revert "standard refactor: remove standard class from base classes" [puppet] - 10https://gerrit.wikimedia.org/r/506964 (owner: 10Jbond) [11:51:11] (03PS1) 10Jbond: rename standard: update all profiles [puppet] - 10https://gerrit.wikimedia.org/r/506987 [11:51:13] (03PS1) 10Jbond: rename standard: update roles [puppet] - 10https://gerrit.wikimedia.org/r/506988 [11:51:15] (03PS1) 10Jbond: rename standard: roles with hiera updates [puppet] - 10https://gerrit.wikimedia.org/r/506989 [11:51:18] (03PS1) 10Jbond: rename standard: update the mx role [puppet] - 10https://gerrit.wikimedia.org/r/506990 [11:51:20] (03PS1) 10Jbond: rename standard: move standard out of site.pp and into pybaltest role [puppet] - 10https://gerrit.wikimedia.org/r/506991 [11:51:22] (03PS1) 10Jbond: rename standard: move the standard out of site.pp and into role analytics_cluster::hadoop::client [puppet] - 10https://gerrit.wikimedia.org/r/506992 [11:51:24] (03PS1) 10Jbond: rename standard: move standard out of site.pp and into role::ores::redis [puppet] - 10https://gerrit.wikimedia.org/r/506993 [11:51:28] (03PS1) 10Jbond: rename staging: remove standard from site.pp [puppet] - 10https://gerrit.wikimedia.org/r/506994 [11:51:30] (03PS1) 10Jbond: rename standard: update site.pp default [puppet] - 10https://gerrit.wikimedia.org/r/506995 [11:52:54] (03PS1) 10Jbond: rename standard: cp3022.esams.wmnet dose not appear to exist anymore [puppet] - 10https://gerrit.wikimedia.org/r/507004 [11:52:58] (03PS2) 10Giuseppe Lavagetto: Send 1% of anonymous users to PHP7.2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506953 (https://phabricator.wikimedia.org/T219150) [11:53:04] (03CR) 10jerkins-bot: [V: 04-1] rename standard: move the standard out of site.pp and into role 
analytics_cluster::hadoop::client [puppet] - 10https://gerrit.wikimedia.org/r/506992 (owner: 10Jbond) [11:53:08] PROBLEM - HHVM rendering on mwdebug1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [11:54:04] (03PS12) 10Giuseppe Lavagetto: profile::mediawiki::maintenance: systemd-timer based periodic jobs [puppet] - 10https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) [11:54:16] RECOVERY - HHVM rendering on mwdebug1002 is OK: HTTP OK: HTTP/1.1 200 OK - 77918 bytes in 0.162 second response time https://wikitech.wikimedia.org/wiki/Application_servers [11:55:12] (03PS2) 10Arturo Borrero Gonzalez: cloudvps: introduce proper base role/profile for VM instances [puppet] - 10https://gerrit.wikimedia.org/r/506979 (https://phabricator.wikimedia.org/T221225) [11:55:41] (03PS3) 10Vgutierrez: trafficserver: Provide a unified monitoring define [puppet] - 10https://gerrit.wikimedia.org/r/506986 (https://phabricator.wikimedia.org/T221217) [11:55:44] (03PS26) 10Vgutierrez: trafficserver: Provide a TLS terminator profile and backend+TLS role [puppet] - 10https://gerrit.wikimedia.org/r/506398 (https://phabricator.wikimedia.org/T221594) [11:55:51] (03CR) 10Giuseppe Lavagetto: [V: 03+2 C: 03+2] profile::mediawiki::maintenance: systemd-timer based periodic jobs [puppet] - 10https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) (owner: 10Giuseppe Lavagetto) [11:56:03] <_joe_> I don't have all day for you, jenkins [11:56:24] !log kartik@deploy1001 Synchronized php-1.34.0-wmf.1/extensions/ContentTranslation: SWAT: [[gerrit|506971|Change the way we calculate total unmodified MT (T221930)]] (duration: 00m 56s) [11:56:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:56:28] T221930: Threshold to prevent publishing needs more precision - https://phabricator.wikimedia.org/T221930 [11:56:46] Lol _joe_ [11:56:47] !log EU-Midday SWAT done. 
Thanks. [11:56:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:57:02] _joe_: It serves you well :) [11:58:18] (PS3) Arturo Borrero Gonzalez: cloudvps: introduce proper base role/profile for VM instances [puppet] - https://gerrit.wikimedia.org/r/506979 (https://phabricator.wikimedia.org/T221225) [11:58:30] <_joe_> (just to be clear, it was a simple rebase because ops/puppet is ff-only, I don't endorse skipping CI in any other case, and only if the change is small) [11:59:22] (CR) Volans: "some replies inline, thanks a lot for the thoughts and first pass" (4 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212) (owner: Volans) [11:59:44] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [12:03:15] _joe_: honestly, i think everyone wants to skip CI every now and then so i doubt anyone would complain about you doing it :D [12:04:48] PROBLEM - puppet last run on lvs3004 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [12:04:55] (03PS2) 10Jbond: rename standard: move the standard out of site.pp [puppet] - 10https://gerrit.wikimedia.org/r/506992 [12:07:14] !log stop dbstore2002:s3 and dbstore2001:s5 for cloning to db2098/99 T220572 [12:07:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:18] T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 [12:09:23] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/506992 (owner: 10Jbond) [12:10:23] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/506991 (owner: 10Jbond) [12:15:30] (03CR) 10Jbond: "Hello All Sorry for the change spam, this patch set series is a rework of Arturo's change[1] I have tried to split this down as much as po" [puppet] - 10https://gerrit.wikimedia.org/r/506988 (owner: 10Jbond) [12:15:52] (03PS2) 10Jbond: rename standard: update roles [puppet] - 10https://gerrit.wikimedia.org/r/506988 [12:16:05] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/506990 (owner: 10Jbond) [12:16:14] (03PS2) 10Jbond: rename standard: update all profiles [puppet] - 10https://gerrit.wikimedia.org/r/506987 [12:16:20] (03PS5) 10Mathew.onipe: elasticsearch: config file for aligning puppet config [puppet] - 10https://gerrit.wikimedia.org/r/506378 (https://phabricator.wikimedia.org/T218932) [12:18:15] (03PS1) 10Arturo Borrero Gonzalez: vagrant: refactor roles into profiles [puppet] - 10https://gerrit.wikimedia.org/r/507005 (https://phabricator.wikimedia.org/T221225) [12:19:35] (03PS4) 10Vgutierrez: trafficserver: Provide a unified monitoring define [puppet] - 10https://gerrit.wikimedia.org/r/506986 (https://phabricator.wikimedia.org/T221217) [12:19:37] (03PS27) 10Vgutierrez: trafficserver: Provide a TLS terminator profile and backend+TLS role [puppet] - 
10https://gerrit.wikimedia.org/r/506398 (https://phabricator.wikimedia.org/T221594) [12:19:39] (03PS1) 10Vgutierrez: nagios_common: Provide check_https_hostheader_port_url check [puppet] - 10https://gerrit.wikimedia.org/r/507006 (https://phabricator.wikimedia.org/T221594) [12:19:41] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, one note on the commit message." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/506994 (owner: 10Jbond) [12:20:47] (03PS2) 10Jbond: rename staging - Kubernetes workers: remove standard from site.pp [puppet] - 10https://gerrit.wikimedia.org/r/506994 [12:21:37] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, was partly decomissioned in https://phabricator.wikimedia.org/T130883" [puppet] - 10https://gerrit.wikimedia.org/r/507004 (owner: 10Jbond) [12:22:21] (03CR) 10Jbond: [C: 03+2] rename standard: cp3022.esams.wmnet dose not appear to exist anymore [puppet] - 10https://gerrit.wikimedia.org/r/507004 (owner: 10Jbond) [12:22:47] (03PS1) 10Jbond: rename standard: update comments [puppet] - 10https://gerrit.wikimedia.org/r/507007 [12:23:55] 10Operations, 10ops-esams, 10decommission: decom cp3011-22 (12 machines) - https://phabricator.wikimedia.org/T130883 (10jbond) intend to get this in a second but in case i forget the following DNS entries need cleaning up templates/10.in-addr.arpa:146 1H IN PTR cp3022.mgmt.esams.wmnet. templates/wmnet:cp3... 
[12:23:58] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [12:24:10] (PS2) Jbond: rename standard: cp3022.esams.wmnet dose not appear to exist anymore [puppet] - https://gerrit.wikimedia.org/r/507004 (https://phabricator.wikimedia.org/T130883) [12:24:25] (PS3) Jbond: rename standard: cp3022.esams.wmnet dose not appear to exist anymore [puppet] - https://gerrit.wikimedia.org/r/507004 (https://phabricator.wikimedia.org/T130883) [12:24:35] (PS6) Mathew.onipe: elasticsearch: config file for aligning puppet config [puppet] - https://gerrit.wikimedia.org/r/506378 (https://phabricator.wikimedia.org/T218932) [12:25:36] (CR) Jbond: [C: +2] rename standard: cp3022.esams.wmnet dose not appear to exist anymore [puppet] - https://gerrit.wikimedia.org/r/507004 (https://phabricator.wikimedia.org/T130883) (owner: Jbond) [12:27:36] PROBLEM - Check systemd state on ms-be2014 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:29:29] (03PS4) 10Arturo Borrero Gonzalez: cloudvps: introduce proper base role/profile for VM instances [puppet] - 10https://gerrit.wikimedia.org/r/506979 (https://phabricator.wikimedia.org/T221225) [12:29:31] (03PS13) 10Arturo Borrero Gonzalez: sudo: decouple sudo from sudo-ldap [puppet] - 10https://gerrit.wikimedia.org/r/506435 (https://phabricator.wikimedia.org/T221225) [12:30:38] RECOVERY - puppet last run on lvs3004 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [12:30:43] (03PS3) 10Jbond: rename standard: update roles [puppet] - 10https://gerrit.wikimedia.org/r/506988 [12:31:23] (03CR) 10jerkins-bot: [V: 04-1] sudo: decouple sudo from sudo-ldap [puppet] - 10https://gerrit.wikimedia.org/r/506435 (https://phabricator.wikimedia.org/T221225) (owner: 10Arturo Borrero Gonzalez) [12:31:47] (03PS2) 10Jbond: rename standard: update comments [puppet] - 10https://gerrit.wikimedia.org/r/507007 [12:31:49] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/506995 (owner: 10Jbond) [12:34:12] (03PS3) 10Jbond: rename standard: update all profiles [puppet] - 10https://gerrit.wikimedia.org/r/506987 [12:34:14] (03PS4) 10Jbond: rename standard: update roles [puppet] - 10https://gerrit.wikimedia.org/r/506988 [12:34:52] (03CR) 10Jbond: [C: 03+2] rename standard: update comments [puppet] - 10https://gerrit.wikimedia.org/r/507007 (owner: 10Jbond) [12:35:06] (03PS3) 10Jbond: rename standard: update comments [puppet] - 10https://gerrit.wikimedia.org/r/507007 [12:38:18] RECOVERY - Check systemd state on ms-be2014 is OK: OK - running: The system is fully operational [12:42:04] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "+1. the WMCS part looks just fine." [puppet] - 10https://gerrit.wikimedia.org/r/506987 (owner: 10Jbond) [12:42:31] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good. 
Totally unrelated to this change; but I'm wondering if has_default_mail_relay:false is actually still needed for RT and rackta" [puppet] - 10https://gerrit.wikimedia.org/r/506989 (owner: 10Jbond) [12:45:16] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/506987 (owner: 10Jbond) [12:46:04] !log resume rollout rsyslog 8.1901.0-1 to jessie hosts - T219764 [12:46:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:09] T219764: Upgrade jessie hosts to rsyslog 8.1901.0-1 - https://phabricator.wikimedia.org/T219764 [12:46:53] (03PS2) 10Fsero: registryha: feat: new records for lvs Updated: added reverse records Bug: T221101 Change-Id: I028ef9e338c97f6ad67b27ee7524fcfdbea64d22 [dns] - 10https://gerrit.wikimedia.org/r/506369 (https://phabricator.wikimedia.org/T221101) [12:48:24] (03PS3) 10Fsero: registryha: feat: new records for lvs Updated: added reverse records [dns] - 10https://gerrit.wikimedia.org/r/506369 (https://phabricator.wikimedia.org/T221101) [12:49:27] (03PS1) 10Jbond: decom cp3011-22: Removing old DNS entries [dns] - 10https://gerrit.wikimedia.org/r/507010 (https://phabricator.wikimedia.org/T130883) [12:49:57] (03PS1) 10CDanis: admin: add yubikey ssh key for cdanis [puppet] - 10https://gerrit.wikimedia.org/r/507011 [12:53:53] (03CR) 10Muehlenhoff: [C: 04-1] "The mgmt entries are only removed once the servers are physically taken down by dc ops, see https://wikitech.wikimedia.org/wiki/Server_Lif" [dns] - 10https://gerrit.wikimedia.org/r/507010 (https://phabricator.wikimedia.org/T130883) (owner: 10Jbond) [12:55:28] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/507011 (owner: 10CDanis) [12:56:26] (03CR) 10CDanis: [C: 03+2] admin: add yubikey ssh key for cdanis [puppet] - 10https://gerrit.wikimedia.org/r/507011 (owner: 10CDanis) [12:56:43] (03CR) 10Jbond: "> Patch Set 1: Code-Review-1" [dns] - 10https://gerrit.wikimedia.org/r/507010 
(https://phabricator.wikimedia.org/T130883) (owner: 10Jbond) [12:59:52] (03CR) 10Muehlenhoff: [C: 04-1] "Yeah, they were shut down back in 2017:" [dns] - 10https://gerrit.wikimedia.org/r/507010 (https://phabricator.wikimedia.org/T130883) (owner: 10Jbond) [13:05:06] (03CR) 10Jbond: "Thanks for the explanation Moritz. Added mark and will leave the change here until they are gone" [dns] - 10https://gerrit.wikimedia.org/r/507010 (https://phabricator.wikimedia.org/T130883) (owner: 10Jbond) [13:13:27] (03CR) 10Elukey: [C: 03+1] rename standard: update all profiles [puppet] - 10https://gerrit.wikimedia.org/r/506987 (owner: 10Jbond) [13:17:06] (03PS6) 10Fsero: registryha: feat: introducing LVS configuration [puppet] - 10https://gerrit.wikimedia.org/r/506367 (https://phabricator.wikimedia.org/T221101) [13:17:40] (03CR) 10Ema: [C: 03+1] "A few minor comments, LGTM otherwise. Really nice work!" (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/504601 (https://phabricator.wikimedia.org/T221217) (owner: 10Vgutierrez) [13:17:49] !log rolling security updates for libpng [13:17:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:28:06] (03CR) 10Elukey: [C: 03+1] "Despite the length of the change, it is indeed way more clear to review, thanks! 
I checked that only roles were changed and that standard " [puppet] - 10https://gerrit.wikimedia.org/r/506988 (owner: 10Jbond) [13:30:15] (03PS7) 10Mathew.onipe: elasticsearch: config file for aligning puppet config [puppet] - 10https://gerrit.wikimedia.org/r/506378 (https://phabricator.wikimedia.org/T218932) [13:30:45] (03CR) 10Ottomata: [C: 03+1] rename standard: move the standard out of site.pp [puppet] - 10https://gerrit.wikimedia.org/r/506992 (owner: 10Jbond) [13:30:54] (03CR) 10Ottomata: [C: 03+1] admin: add the 'analytics' system user to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/504839 (https://phabricator.wikimedia.org/T220971) (owner: 10Elukey) [13:31:12] (03CR) 10Elukey: [C: 04-1] "This needs also a change in role::requesttracker (standard -> profile::standard)" [puppet] - 10https://gerrit.wikimedia.org/r/506989 (owner: 10Jbond) [13:31:47] (03CR) 10Elukey: [C: 03+1] rename standard: update the mx role [puppet] - 10https://gerrit.wikimedia.org/r/506990 (owner: 10Jbond) [13:32:02] (03CR) 10Elukey: [C: 03+1] rename standard: move standard out of site.pp and into pybaltest role [puppet] - 10https://gerrit.wikimedia.org/r/506991 (owner: 10Jbond) [13:32:18] (03CR) 10Elukey: [C: 03+1] rename standard: move the standard out of site.pp [puppet] - 10https://gerrit.wikimedia.org/r/506992 (owner: 10Jbond) [13:32:36] (03CR) 10Elukey: [C: 03+1] rename standard: move standard out of site.pp and into role::ores::redis [puppet] - 10https://gerrit.wikimedia.org/r/506993 (owner: 10Jbond) [13:32:54] (03CR) 10Gilles: "Thanks for fixing all these issues. 
./run.py used to get pretty far for me, and then on that machine I didn't have varnishtest installed s" [puppet] - 10https://gerrit.wikimedia.org/r/495643 (https://phabricator.wikimedia.org/T216339) (owner: 10Gilles) [13:32:58] (03CR) 10Elukey: [C: 03+1] rename staging - Kubernetes workers: remove standard from site.pp [puppet] - 10https://gerrit.wikimedia.org/r/506994 (owner: 10Jbond) [13:33:26] (03CR) 10Elukey: [C: 03+1] rename standard: update site.pp default [puppet] - 10https://gerrit.wikimedia.org/r/506995 (owner: 10Jbond) [13:42:32] (03CR) 10Giuseppe Lavagetto: [C: 03+1] registryha: feat: new records for lvs Updated: added reverse records [dns] - 10https://gerrit.wikimedia.org/r/506369 (https://phabricator.wikimedia.org/T221101) (owner: 10Fsero) [13:43:23] (03CR) 10Herron: [C: 03+1] transparency report: allow members of LDAP 'nda' to see private site [puppet] - 10https://gerrit.wikimedia.org/r/506848 (https://phabricator.wikimedia.org/T221744) (owner: 10Dzahn) [13:44:28] (03CR) 10Herron: [C: 03+1] ldap-admins: add foks, add admin group on labweb hosts [puppet] - 10https://gerrit.wikimedia.org/r/506542 (https://phabricator.wikimedia.org/T220860) (owner: 10Dzahn) [13:44:46] (03CR) 10Fsero: [C: 03+2] registryha: feat: new records for lvs Updated: added reverse records [dns] - 10https://gerrit.wikimedia.org/r/506369 (https://phabricator.wikimedia.org/T221101) (owner: 10Fsero) [13:45:43] !log DNS: creating docker-registry.svc.(eqiad|codfw).wmnet RRs [13:45:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:45:46] 10Operations, 10User-fgiunchedi: Upgrade jessie hosts to rsyslog 8.1901.0-1 - https://phabricator.wikimedia.org/T219764 (10fgiunchedi) 05Open→03Resolved a:03fgiunchedi I've finished rolling out 8.1901.0-1 to jessie hosts in production, @Krenair looks like this can be resolved? I'll handle the stretch rsy... 
[13:46:37] (03PS5) 10CRusnov: Cleanups to the oldhardware report [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/506008 (https://phabricator.wikimedia.org/T220422) [13:47:01] (03Abandoned) 10CRusnov: Cleanups to the oldhardware report [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/506718 (https://phabricator.wikimedia.org/T220422) (owner: 10CRusnov) [13:49:30] ACKNOWLEDGEMENT - High lag on wdqs1003 is CRITICAL: 5343 ge 3600 Gehel update lag above threshold, but starting to recover https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [13:50:07] 10Operations, 10Thumbor, 10Traffic: SwiftMedia URL rewrite returns some 404s with wrong Content-Length - https://phabricator.wikimedia.org/T222071 (10ema) [13:50:14] 10Operations, 10Thumbor, 10Traffic: SwiftMedia URL rewrite returns some 404s with wrong Content-Length - https://phabricator.wikimedia.org/T222071 (10ema) p:05Triage→03Normal [13:51:02] (03CR) 10Giuseppe Lavagetto: [C: 03+1] "LGTM, see my question inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/506367 (https://phabricator.wikimedia.org/T221101) (owner: 10Fsero) [13:51:58] (03CR) 10Mathew.onipe: [C: 04-1] elasticsearch: config file for aligning puppet config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/506378 (https://phabricator.wikimedia.org/T218932) (owner: 10Mathew.onipe) [13:53:58] (03CR) 10Fsero: registryha: feat: introducing LVS configuration (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/506367 (https://phabricator.wikimedia.org/T221101) (owner: 10Fsero) [13:55:37] (03PS36) 10Vgutierrez: trafficserver: Provide support for multiple ATS instances [puppet] - 10https://gerrit.wikimedia.org/r/504601 (https://phabricator.wikimedia.org/T221217) [13:55:49] (03CR) 10Vgutierrez: trafficserver: Provide support for multiple ATS instances (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/504601 (https://phabricator.wikimedia.org/T221217) (owner: 
10Vgutierrez) [13:56:42] !log rolling security updates for imagemagick [13:56:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:20] PROBLEM - HHVM rendering on mw1312 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [14:02:48] RECOVERY - HHVM rendering on mw1312 is OK: HTTP OK: HTTP/1.1 200 OK - 77843 bytes in 0.828 second response time https://wikitech.wikimedia.org/wiki/Application_servers [14:05:05] (03CR) 10Elukey: "Aaron/Timo: Thoughts? :)" [puppet] - 10https://gerrit.wikimedia.org/r/504831 (https://phabricator.wikimedia.org/T214275) (owner: 10Elukey) [14:07:31] (03CR) 10Krinkle: [C: 03+1] "The reason it's a no-op is presumably because mediawiki_memcached_servers remains non-empty for the time being? LGTM." [puppet] - 10https://gerrit.wikimedia.org/r/504831 (https://phabricator.wikimedia.org/T214275) (owner: 10Elukey) [14:07:47] (03CR) 10Ladsgroup: "I think this is good to go now." [puppet] - 10https://gerrit.wikimedia.org/r/504776 (owner: 10Ladsgroup) [14:10:56] 10Operations, 10Performance-Team, 10Thumbor, 10Traffic: SwiftMedia URL rewrite returns some 404s with wrong Content-Length - https://phabricator.wikimedia.org/T222071 (10Gilles) a:03Gilles [14:11:35] (03CR) 10Elukey: "> The reason it's a no-op is presumably because mediawiki_memcached_servers" [puppet] - 10https://gerrit.wikimedia.org/r/504831 (https://phabricator.wikimedia.org/T214275) (owner: 10Elukey) [14:11:45] (03CR) 10Gilles: "Filed reason for the failure: https://phabricator.wikimedia.org/T222072" [puppet] - 10https://gerrit.wikimedia.org/r/495643 (https://phabricator.wikimedia.org/T216339) (owner: 10Gilles) [14:15:23] (03CR) 10Giuseppe Lavagetto: [C: 03+1] "Please remove the addition of the discovery data for now." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/506367 (https://phabricator.wikimedia.org/T221101) (owner: 10Fsero) [14:19:54] (03PS7) 10Fsero: registryha: feat: introducing LVS configuration [puppet] - 10https://gerrit.wikimedia.org/r/506367 (https://phabricator.wikimedia.org/T221101) [14:21:45] (03PS1) 10Muehlenhoff: Ignore libpng for tiff service restarts [puppet] - 10https://gerrit.wikimedia.org/r/507020 [14:22:33] (03PS5) 10Gilles: Normalize thumbnail URLs to avoid cachebusting [puppet] - 10https://gerrit.wikimedia.org/r/495643 (https://phabricator.wikimedia.org/T216339) [14:23:12] (03CR) 10Gilles: Normalize thumbnail URLs to avoid cachebusting (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/495643 (https://phabricator.wikimedia.org/T216339) (owner: 10Gilles) [14:23:38] 10Operations, 10puppet-compiler, 10Jenkins: compiler1002.puppet-diffs.eqiad.wmflabs disk is full - https://phabricator.wikimedia.org/T222072 (10Gilles) [14:29:00] PROBLEM - Check systemd state on ms-be2014 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [14:34:20] 10Operations, 10puppet-compiler, 10Jenkins: compiler1002.puppet-diffs.eqiad.wmflabs disk is full - https://phabricator.wikimedia.org/T222072 (10Vgutierrez) I've cleaned outputs older than 31 days, that gave us almost 5G: ` root@compiler1002:/srv/jenkins-workspace/puppet-compiler/output# find ./ -type d -ctim... 
[14:35:30] (03PS37) 10Vgutierrez: trafficserver: Provide support for multiple ATS instances [puppet] - 10https://gerrit.wikimedia.org/r/504601 (https://phabricator.wikimedia.org/T221217) [14:35:47] (03CR) 10Ottomata: [C: 03+1] admin: allow analytics-admins to control jupyter user units [puppet] - 10https://gerrit.wikimedia.org/r/504067 (owner: 10Elukey) [14:35:55] (03CR) 10Muehlenhoff: [C: 03+1] rename standard: move standard out of site.pp and into role::ores::redis [puppet] - 10https://gerrit.wikimedia.org/r/506993 (owner: 10Jbond) [14:37:34] (03CR) 10Vgutierrez: "pcc looks happy after rebasing the change, https://puppet-compiler.wmflabs.org/compiler1002/16159/" [puppet] - 10https://gerrit.wikimedia.org/r/504601 (https://phabricator.wikimedia.org/T221217) (owner: 10Vgutierrez) [14:38:06] RECOVERY - Check systemd state on ms-be2014 is OK: OK - running: The system is fully operational [14:38:13] (03PS2) 10Muehlenhoff: admin: Remove my access from ores [puppet] - 10https://gerrit.wikimedia.org/r/504776 (owner: 10Ladsgroup) [14:39:13] (03CR) 10Muehlenhoff: [C: 03+2] admin: Remove my access from ores [puppet] - 10https://gerrit.wikimedia.org/r/504776 (owner: 10Ladsgroup) [14:39:27] 10Operations, 10User-fgiunchedi: Upgrade jessie hosts to rsyslog 8.1901.0-1 - https://phabricator.wikimedia.org/T219764 (10Krenair) Thanks! [14:42:13] (03CR) 10Ema: [C: 03+1] "Nice!" 
[puppet] - 10https://gerrit.wikimedia.org/r/504601 (https://phabricator.wikimedia.org/T221217) (owner: 10Vgutierrez) [14:44:46] (03PS8) 10Vgutierrez: acme_chief: Prevalidate CN/SNI list [software/acme-chief] - 10https://gerrit.wikimedia.org/r/504512 (https://phabricator.wikimedia.org/T220518) [14:46:14] (03PS1) 10Ema: Add profile::cache::varnish::frontend::text [puppet] - 10https://gerrit.wikimedia.org/r/507022 (https://phabricator.wikimedia.org/T219967) [14:48:06] 10Operations, 10puppet-compiler, 10Jenkins: compiler1002.puppet-diffs.eqiad.wmflabs disk is full - https://phabricator.wikimedia.org/T222072 (10Gilles) 05Open→03Resolved a:03Gilles [14:49:47] !log added uid=sukhe,ou=people,dc=wikimedia,dc=org to nda ldap group T221990 [14:49:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:49:51] T221990: LDAP access to the nda group for sukhe - https://phabricator.wikimedia.org/T221990 [14:50:10] (03CR) 10jenkins-bot: acme_chief: Prevalidate CN/SNI list [software/acme-chief] - 10https://gerrit.wikimedia.org/r/504512 (https://phabricator.wikimedia.org/T220518) (owner: 10Vgutierrez) [14:50:23] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: decommission db2014,db2020, db2021, db2022, db2024, db2031 - https://phabricator.wikimedia.org/T221424 (10Papaul) [14:51:43] (03PS8) 10Mathew.onipe: elasticsearch: config file for aligning puppet config [puppet] - 10https://gerrit.wikimedia.org/r/506378 (https://phabricator.wikimedia.org/T218932) [14:52:34] (03PS1) 10Vgutierrez: Release 0.17 [software/acme-chief] - 10https://gerrit.wikimedia.org/r/507026 (https://phabricator.wikimedia.org/T220518) [14:53:38] (03PS9) 10Mathew.onipe: elasticsearch: config file for aligning puppet config [puppet] - 10https://gerrit.wikimedia.org/r/506378 (https://phabricator.wikimedia.org/T218932) [14:55:18] 10Operations, 10monitoring: Icinga meta-monitoring: automatically sync contact list - https://phabricator.wikimedia.org/T222074 (10Volans) [14:55:26] 
10Operations, 10monitoring: Icinga meta-monitoring: automatically sync contact list - https://phabricator.wikimedia.org/T222074 (10Volans) p:05Triage→03Normal [14:56:01] 10Operations, 10RESTBase, 10RESTBase-API, 10serviceops, and 4 others: Make RESTBase spec standard compliant and switch to OpenAPI 3.0 - https://phabricator.wikimedia.org/T218218 (10CCicalese_WMF) [14:56:20] 10Operations, 10RESTBase, 10RESTBase-API, 10serviceops, and 4 others: Make RESTBase spec standard compliant and switch to OpenAPI 3.0 - https://phabricator.wikimedia.org/T218218 (10CCicalese_WMF) [14:57:50] PROBLEM - HHVM jobrunner on mw2161 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.073 second response time https://wikitech.wikimedia.org/wiki/Jobrunner [14:59:08] RECOVERY - HHVM jobrunner on mw2161 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.081 second response time https://wikitech.wikimedia.org/wiki/Jobrunner [14:59:46] (03CR) 10Thcipriani: [C: 03+1] "Now that the releng offsite is over, we should get this merged." 
[puppet] - 10https://gerrit.wikimedia.org/r/504973 (https://phabricator.wikimedia.org/T182756) (owner: 10Hashar) [15:04:01] (03PS1) 10Lucas Werkmeister (WMDE): Serialize empty lists as objects on Test Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507030 (https://phabricator.wikimedia.org/T138104) [15:04:03] (03PS1) 10Lucas Werkmeister (WMDE): Serialize empty lists as objects on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507031 (https://phabricator.wikimedia.org/T138104) [15:04:06] (03PS1) 10Lucas Werkmeister (WMDE): Serialize empty lists as objects on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507032 (https://phabricator.wikimedia.org/T138104) [15:07:06] (03PS4) 10Jbond: rename standard: update all profiles [puppet] - 10https://gerrit.wikimedia.org/r/506987 [15:07:08] (03PS5) 10Jbond: rename standard: update roles [puppet] - 10https://gerrit.wikimedia.org/r/506988 [15:07:10] (03PS2) 10Jbond: rename standard: roles with hiera updates [puppet] - 10https://gerrit.wikimedia.org/r/506989 [15:07:12] (03PS2) 10Jbond: rename standard: update the mx role [puppet] - 10https://gerrit.wikimedia.org/r/506990 [15:07:15] (03PS2) 10Jbond: rename standard: move standard out of site.pp and into pybaltest role [puppet] - 10https://gerrit.wikimedia.org/r/506991 [15:07:17] (03PS3) 10Jbond: rename standard: move the standard out of site.pp and into role analytics_cluster::hadoop::client [puppet] - 10https://gerrit.wikimedia.org/r/506992 [15:07:19] (03PS2) 10Jbond: rename standard: move standard out of site.pp and into role::ores::redis [puppet] - 10https://gerrit.wikimedia.org/r/506993 [15:07:21] (03PS3) 10Jbond: rename staging: remove standard from site.pp [puppet] - 10https://gerrit.wikimedia.org/r/506994 [15:07:23] (03PS2) 10Jbond: rename standard: update site.pp default [puppet] - 10https://gerrit.wikimedia.org/r/506995 [15:08:06] (03CR) 10Jbond: [C: 03+2] rename standard: update all profiles [puppet] - 
10https://gerrit.wikimedia.org/r/506987 (owner: 10Jbond) [15:08:37] (03CR) 10Jbond: [C: 03+2] rename standard: update roles [puppet] - 10https://gerrit.wikimedia.org/r/506988 (owner: 10Jbond) [15:08:48] (03PS3) 10Giuseppe Lavagetto: profile::mediawiki::maintenance: migrate tor job to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/482793 (https://phabricator.wikimedia.org/T211250) [15:08:53] (03CR) 10jerkins-bot: [V: 04-1] rename standard: move the standard out of site.pp and into role analytics_cluster::hadoop::client [puppet] - 10https://gerrit.wikimedia.org/r/506992 (owner: 10Jbond) [15:10:06] 10Operations, 10puppet-compiler, 10User-herron: Prevent puppet catalog compiler workers from running out of disk space - https://phabricator.wikimedia.org/T222075 (10herron) p:05Triage→03Normal [15:10:21] 10Operations, 10monitoring, 10puppet-compiler, 10User-herron: Prevent puppet catalog compiler workers from running out of disk space - https://phabricator.wikimedia.org/T222075 (10herron) [15:12:06] (03CR) 10Jforrester: "Presumably this should happen to testcommonswiki too?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507032 (https://phabricator.wikimedia.org/T138104) (owner: 10Lucas Werkmeister (WMDE)) [15:12:08] (03CR) 10Jbond: "> Patch Set 1: Code-Review-1" [puppet] - 10https://gerrit.wikimedia.org/r/506989 (owner: 10Jbond) [15:12:12] (03CR) 10Jbond: [C: 03+2] rename standard: roles with hiera updates [puppet] - 10https://gerrit.wikimedia.org/r/506989 (owner: 10Jbond) [15:12:53] (03CR) 10Lucas Werkmeister (WMDE): "Thanks, I forgot about testcommonswiki. Separate change or same as this one?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507032 (https://phabricator.wikimedia.org/T138104) (owner: 10Lucas Werkmeister (WMDE)) [15:13:08] PROBLEM - puppet last run on cp2019 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:13:12] PROBLEM - puppet last run on cp4026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:14:02] (03CR) 10Jbond: [C: 03+2] rename standard: update the mx role [puppet] - 10https://gerrit.wikimedia.org/r/506990 (owner: 10Jbond) [15:14:14] ^^looking [15:15:30] ^^ seems fine now, suspect it is caused by me merging a lot of changes [15:16:18] PROBLEM - puppet last run on cp2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:16:59] (03PS2) 10Lucas Werkmeister (WMDE): Serialize empty lists as objects on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507032 (https://phabricator.wikimedia.org/T138104) [15:17:01] (03PS1) 10Lucas Werkmeister (WMDE): Serialize empty lists as objects on Test Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507034 (https://phabricator.wikimedia.org/T138104) [15:17:29] (03CR) 10Lucas Werkmeister (WMDE): "Went for a separate change, though I plan to SWAT them all tomorrow (hopefully there’ll be enough time)." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507032 (https://phabricator.wikimedia.org/T138104) (owner: 10Lucas Werkmeister (WMDE)) [15:17:31] (03PS4) 10Giuseppe Lavagetto: profile::mediawiki::maintenance: migrate tor job to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/482793 (https://phabricator.wikimedia.org/T211250) [15:17:44] PROBLEM - puppet last run on krypton is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:18:00] PROBLEM - puppet last run on mw1286 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:18:26] RECOVERY - puppet last run on cp2019 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:19:43] (03CR) 10Jforrester: [C: 03+1] Serialize empty lists as objects on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507032 (https://phabricator.wikimedia.org/T138104) (owner: 10Lucas Werkmeister (WMDE)) [15:20:16] (03CR) 10Giuseppe Lavagetto: [C: 03+2] profile::mediawiki::maintenance: migrate tor job to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/482793 (https://phabricator.wikimedia.org/T211250) (owner: 10Giuseppe Lavagetto) [15:21:36] RECOVERY - puppet last run on cp2006 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:25:18] 10Operations, 10Puppet, 10puppet-compiler: Frequent puppet failures - https://phabricator.wikimedia.org/T221529 (10jbond) Have not had time to look at this in depth yet however i did just notice an issue while applying a refactor change[1] while applying the change set i got the following eror on a few host... 
[15:25:48] (03CR) 10Jbond: [C: 03+2] rename standard: move standard out of site.pp and into pybaltest role [puppet] - 10https://gerrit.wikimedia.org/r/506991 (owner: 10Jbond) [15:25:58] (03CR) 10CDanis: "ping :)" [puppet] - 10https://gerrit.wikimedia.org/r/504898 (https://phabricator.wikimedia.org/T196336) (owner: 10CDanis) [15:26:55] (03PS3) 10Jbond: rename standard: move standard out of site.pp and into pybaltest role [puppet] - 10https://gerrit.wikimedia.org/r/506991 [15:26:57] (03PS4) 10Jbond: rename standard: move the standard out of site.pp and into role analytics_cluster::hadoop::client [puppet] - 10https://gerrit.wikimedia.org/r/506992 [15:26:59] (03PS3) 10Jbond: rename standard: move standard out of site.pp and into role::ores::redis [puppet] - 10https://gerrit.wikimedia.org/r/506993 [15:27:01] (03PS4) 10Jbond: rename staging: remove standard from site.pp [puppet] - 10https://gerrit.wikimedia.org/r/506994 [15:27:03] (03PS3) 10Jbond: rename standard: update site.pp default [puppet] - 10https://gerrit.wikimedia.org/r/506995 [15:27:10] (03PS10) 10Mathew.onipe: elasticsearch: config file for aligning puppet config [puppet] - 10https://gerrit.wikimedia.org/r/506378 (https://phabricator.wikimedia.org/T218932) [15:27:31] (03CR) 10CRusnov: icinga: pause nsca on reloads (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/504898 (https://phabricator.wikimedia.org/T196336) (owner: 10CDanis) [15:28:07] (03CR) 10jerkins-bot: [V: 04-1] rename standard: move the standard out of site.pp and into role analytics_cluster::hadoop::client [puppet] - 10https://gerrit.wikimedia.org/r/506992 (owner: 10Jbond) [15:28:28] (03CR) 10CDanis: icinga: pause nsca on reloads (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/504898 (https://phabricator.wikimedia.org/T196336) (owner: 10CDanis) [15:28:43] (03PS1) 10Mholloway: Update WikimediaEditorTasks config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507035 (https://phabricator.wikimedia.org/T221951) [15:28:48] 
(03PS5) 10Jbond: rename standard: move the standard out of site.pp [puppet] - 10https://gerrit.wikimedia.org/r/506992 [15:28:54] PROBLEM - Check systemd state on ms-be2015 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [15:29:48] (03CR) 10Jbond: [C: 03+2] rename standard: move the standard out of site.pp [puppet] - 10https://gerrit.wikimedia.org/r/506992 (owner: 10Jbond) [15:29:51] (03CR) 10CRusnov: icinga: pause nsca on reloads (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/504898 (https://phabricator.wikimedia.org/T196336) (owner: 10CDanis) [15:29:56] (03PS6) 10Paladox: gerrit: Enable G1 GC [puppet] - 10https://gerrit.wikimedia.org/r/327763 (https://phabricator.wikimedia.org/T221026) [15:30:11] (03PS7) 10Paladox: gerrit: Enable G1 GC [puppet] - 10https://gerrit.wikimedia.org/r/327763 (https://phabricator.wikimedia.org/T221026) [15:30:20] (03CR) 10Jbond: [C: 03+2] rename standard: move standard out of site.pp and into role::ores::redis [puppet] - 10https://gerrit.wikimedia.org/r/506993 (owner: 10Jbond) [15:30:51] (03PS4) 10Jbond: rename standard: move standard out of site.pp and into role::ores::redis [puppet] - 10https://gerrit.wikimedia.org/r/506993 [15:31:51] (03PS5) 10Jbond: rename staging: remove standard from site.pp [puppet] - 10https://gerrit.wikimedia.org/r/506994 [15:32:00] (03CR) 10Dbrant: [C: 03+1] Update WikimediaEditorTasks config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507035 (https://phabricator.wikimedia.org/T221951) (owner: 10Mholloway) [15:32:02] (03PS4) 10Jbond: rename standard: update site.pp default [puppet] - 10https://gerrit.wikimedia.org/r/506995 [15:33:37] (03CR) 10Jbond: [C: 03+2] rename staging: remove standard from site.pp [puppet] - 10https://gerrit.wikimedia.org/r/506994 (owner: 10Jbond) [15:34:06] RECOVERY - Check systemd state on ms-be2015 is OK: OK - running: The system is fully operational [15:34:20] PROBLEM - puppet last run on labweb1002 is CRITICAL: CRITICAL: 
Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:34:30] (03CR) 10Jbond: [C: 03+2] rename standard: update site.pp default [puppet] - 10https://gerrit.wikimedia.org/r/506995 (owner: 10Jbond) [15:36:36] (03CR) 10Mholloway: [C: 03+2] Update WikimediaEditorTasks config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507035 (https://phabricator.wikimedia.org/T221951) (owner: 10Mholloway) [15:36:50] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission osm-db200[12] and osm-web200[1234] - https://phabricator.wikimedia.org/T187445 (10Papaul) osm-db2001 ` papaul@asw-a-codfw> show interfaces ge-5/0/2 descriptions Interface Admin Link Description ge-5/0/2 down down DISABLED `... [15:37:37] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission osm-db200[12] and osm-web200[1234] - https://phabricator.wikimedia.org/T187445 (10Papaul) [15:37:41] (03Merged) 10jenkins-bot: Update WikimediaEditorTasks config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507035 (https://phabricator.wikimedia.org/T221951) (owner: 10Mholloway) [15:37:56] (03CR) 10jenkins-bot: Update WikimediaEditorTasks config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507035 (https://phabricator.wikimedia.org/T221951) (owner: 10Mholloway) [15:38:37] (03CR) 10Jforrester: [C: 03+1] "Woohoo." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/506953 (https://phabricator.wikimedia.org/T219150) (owner: 10Giuseppe Lavagetto) [15:40:37] !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Update WikimediaEditorTasks counter config (T221951) (duration: 00m 58s) [15:40:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:40:41] (03CR) 10CRusnov: "> Patch Set 3: Code-Review-1" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/505657 (https://phabricator.wikimedia.org/T215378) (owner: 10CRusnov) [15:40:41] T221951: Change unlock criteria for Add/translate descriptions to 24h and 3 edits - https://phabricator.wikimedia.org/T221951 [15:40:44] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::periodic_job: use a shell-safe name [puppet] - 10https://gerrit.wikimedia.org/r/507040 [15:41:43] (03CR) 10Giuseppe Lavagetto: [C: 03+2] profile::mediawiki::periodic_job: use a shell-safe name [puppet] - 10https://gerrit.wikimedia.org/r/507040 (owner: 10Giuseppe Lavagetto) [15:44:14] RECOVERY - puppet last run on krypton is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:44:28] RECOVERY - puppet last run on mw1286 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:44:58] RECOVERY - puppet last run on cp4026 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [15:46:09] (03CR) 10CRusnov: "Inline replies." 
(032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/505657 (https://phabricator.wikimedia.org/T215378) (owner: 10CRusnov) [15:48:50] (03PS4) 10CRusnov: Add a check_netbox_report icinga check [puppet] - 10https://gerrit.wikimedia.org/r/505657 (https://phabricator.wikimedia.org/T215378) [15:50:59] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::maintenance: run tor job every 20 minutes, not 20 seconds [puppet] - 10https://gerrit.wikimedia.org/r/507044 [15:51:15] (03CR) 10CDanis: icinga: pause nsca on reloads (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/504898 (https://phabricator.wikimedia.org/T196336) (owner: 10CDanis) [15:51:37] PROBLEM - puppet last run on labweb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:40] (03PS1) 10Mathew.onipe: icinga: create and apply config check [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) [15:52:16] <_joe_> labweb is my fault [15:52:20] <_joe_> I'll fix it [15:52:36] 10Operations, 10Traffic, 10Core Platform Team Backlog (Next), 10Services (next): Have Varnish set the `X-Request-Id` header for incoming external requests - https://phabricator.wikimedia.org/T221976 (10ema) p:05Triage→03Normal [15:53:09] (03PS2) 10Mathew.onipe: icinga: create and apply cirrus config check [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) [15:55:48] (03CR) 10Jbond: [C: 03+1] "LGTM and thanks :)" [puppet] - 10https://gerrit.wikimedia.org/r/507020 (owner: 10Muehlenhoff) [15:56:47] (03PS2) 10Giuseppe Lavagetto: profile::mediawiki::maintenance: run tor job every 20 minutes, not 20 seconds [puppet] - 10https://gerrit.wikimedia.org/r/507044 [15:57:46] (03CR) 10Jforrester: "Oops." 
[puppet] - 10https://gerrit.wikimedia.org/r/507044 (owner: 10Giuseppe Lavagetto) [15:58:52] (03CR) 10Giuseppe Lavagetto: [C: 03+2] profile::mediawiki::maintenance: run tor job every 20 minutes, not 20 seconds [puppet] - 10https://gerrit.wikimedia.org/r/507044 (owner: 10Giuseppe Lavagetto) [16:01:19] (03CR) 10CRusnov: icinga: pause nsca on reloads (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/504898 (https://phabricator.wikimedia.org/T196336) (owner: 10CDanis) [16:03:41] PROBLEM - HHVM rendering on mw2135 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [16:03:41] PROBLEM - HHVM rendering on mw2255 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [16:04:02] (03PS1) 10Jbond: debdeploy: ignore emacs for service restarts [puppet] - 10https://gerrit.wikimedia.org/r/507056 [16:04:43] RECOVERY - HHVM rendering on mw2135 is OK: HTTP OK: HTTP/1.1 200 OK - 78082 bytes in 0.286 second response time https://wikitech.wikimedia.org/wiki/Application_servers [16:04:43] RECOVERY - HHVM rendering on mw2255 is OK: HTTP OK: HTTP/1.1 200 OK - 78082 bytes in 0.304 second response time https://wikitech.wikimedia.org/wiki/Application_servers [16:06:42] (03PS4) 10Jforrester: SDC: Enable feature flag for depicts in UW on Test Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499523 (https://phabricator.wikimedia.org/T217024) [16:07:08] (03CR) 10Jforrester: "The time is nigh." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/499523 (https://phabricator.wikimedia.org/T217024) (owner: 10Jforrester) [16:08:27] PROBLEM - Check the last execution of MediaWiki job mediawiki_tor_exit_node on mwmaint2001 is CRITICAL: NRPE: Command check_check_MediaWiki not defined [16:09:10] (03CR) 10Matthias Mullie: [C: 03+1] SDC: Enable feature flag for depicts in UW on Test Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499523 (https://phabricator.wikimedia.org/T217024) (owner: 10Jforrester) [16:09:15] (03PS2) 10Jforrester: SDC: Configure initial qualifiers for Test Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/502845 [16:09:28] (03CR) 10jerkins-bot: [V: 04-1] SDC: Configure initial qualifiers for Test Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/502845 (owner: 10Jforrester) [16:13:04] (03CR) 10Vgutierrez: [C: 03+1] Add profile::cache::varnish::frontend::text [puppet] - 10https://gerrit.wikimedia.org/r/507022 (https://phabricator.wikimedia.org/T219967) (owner: 10Ema) [16:17:39] 10Operations, 10ops-codfw, 10Traffic, 10decommission: Decommission acamar and achernar - https://phabricator.wikimedia.org/T198286 (10Papaul) ` papaul@asw-a-codfw# run show interfaces ge-5/0/13 descriptions Interface Admin Link Description ge-5/0/13 down down DISABLED papaul@asw-b-codfw#... [16:19:14] (03CR) 10Gehel: [C: 04-1] "minor comments inline. The principle looks good, I'll do a more in depth review asap." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: 10Mathew.onipe) [16:19:42] 10Operations, 10ops-codfw, 10Traffic, 10decommission: Decommission acamar and achernar - https://phabricator.wikimedia.org/T198286 (10Papaul) [16:22:44] I'm going to deploy some config changes. 
[16:22:47] (03CR) 10Jforrester: [C: 03+2] SDC: Enable feature flag for depicts in UW on Test Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499523 (https://phabricator.wikimedia.org/T217024) (owner: 10Jforrester) [16:23:48] (03Merged) 10jenkins-bot: SDC: Enable feature flag for depicts in UW on Test Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499523 (https://phabricator.wikimedia.org/T217024) (owner: 10Jforrester) [16:25:21] (03PS2) 10Jforrester: Rename UploadWizard depicts/statements toggle [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506639 (owner: 10Matthias Mullie) [16:25:33] (03CR) 10Jforrester: [C: 03+2] Rename UploadWizard depicts/statements toggle [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506639 (owner: 10Matthias Mullie) [16:26:39] (03Merged) 10jenkins-bot: Rename UploadWizard depicts/statements toggle [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506639 (owner: 10Matthias Mullie) [16:28:01] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SDC: Enable feature flag for depicts in UW on Test Commons (duration: 00m 53s) [16:28:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:30:01] (03PS11) 10Mathew.onipe: elasticsearch: config file for aligning puppet config [puppet] - 10https://gerrit.wikimedia.org/r/506378 (https://phabricator.wikimedia.org/T218932) [16:30:04] (03PS3) 10Mathew.onipe: icinga: create and apply cirrus config check [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) [16:33:23] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SDC UW config cleanup: Add wmgMediaInfoEnableUploadWizardDepicts to IS (duration: 00m 53s) [16:33:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:33:42] !log jforrester@deploy1001 sync-file aborted: SDC UW config cleanup: Switch to wmgMediaInfoEnableUploadWizardStatements in CS (duration: 00m 01s) 
[16:33:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:34:21] (03PS2) 10Bstorm: cloudstore: introduce rsync framework for secondary cluster [puppet] - 10https://gerrit.wikimedia.org/r/506847 (https://phabricator.wikimedia.org/T209527) [16:34:46] (03CR) 10jenkins-bot: SDC: Enable feature flag for depicts in UW on Test Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499523 (https://phabricator.wikimedia.org/T217024) (owner: 10Jforrester) [16:34:48] (03CR) 10jenkins-bot: Rename UploadWizard depicts/statements toggle [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506639 (owner: 10Matthias Mullie) [16:34:58] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: SDC UW config cleanup: Switch to wmgMediaInfoEnableUploadWizardStatements in CS (duration: 00m 53s) [16:35:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:08] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SDC UW config cleanup: Drop wmgMediaInfoEnableUploadWizardDepicts from IS (duration: 00m 53s) [16:38:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:39:43] (03PS1) 10Jforrester: SDC: Stop setting wgMediaInfoEnableFilePageDepicts, no longer read [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507060 [16:40:52] (03CR) 10Jforrester: [C: 04-1] "Not until wmf.3 is everywhere and won't roll back (relies on 573a80f being deployed)." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/507060 (owner: 10Jforrester) [16:42:19] (03PS1) 10Giuseppe Lavagetto: wikitech::web: re-add manually the cronjob for tor exit nodes [puppet] - 10https://gerrit.wikimedia.org/r/507061 [16:42:21] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::periodic_job: add syslog identifier [puppet] - 10https://gerrit.wikimedia.org/r/507062 [16:43:57] (03CR) 10Giuseppe Lavagetto: [C: 03+2] wikitech::web: re-add manually the cronjob for tor exit nodes [puppet] - 10https://gerrit.wikimedia.org/r/507061 (owner: 10Giuseppe Lavagetto) [16:45:42] 10Operations, 10Research, 10SRE-Access-Requests: Revoke @pirroh's shell access - https://phabricator.wikimedia.org/T222085 (10Miriam) [16:48:13] (03PS12) 10Mathew.onipe: elasticsearch: config file for aligning puppet config [puppet] - 10https://gerrit.wikimedia.org/r/506378 (https://phabricator.wikimedia.org/T218932) [16:48:18] (03PS4) 10Mathew.onipe: icinga: create and apply cirrus config check [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) [16:48:24] RECOVERY - puppet last run on labweb1001 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [16:51:27] 10Operations, 10SRE-Access-Requests: Allow analytics-admins to control jupyter user units - https://phabricator.wikimedia.org/T222087 (10Ottomata) [16:51:44] 10Operations, 10SRE-Access-Requests: Allow analytics-admins to control jupyter user units - https://phabricator.wikimedia.org/T222087 (10Ottomata) a:03elukey This was approved in SRE meeting. 
[16:53:22] (03PS13) 10Mathew.onipe: elasticsearch: config file for aligning puppet config [puppet] - 10https://gerrit.wikimedia.org/r/506378 (https://phabricator.wikimedia.org/T218932) [16:53:24] (03PS5) 10Mathew.onipe: icinga: create and apply cirrus config check [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) [16:53:28] (03CR) 10Giuseppe Lavagetto: [C: 03+2] profile::mediawiki::periodic_job: add syslog identifier [puppet] - 10https://gerrit.wikimedia.org/r/507062 (owner: 10Giuseppe Lavagetto) [16:57:03] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (10RobH) a:05RobH→03None IRC Update: This is ready for installation by the #DBA team, one of them will steal this task later this week. [16:57:31] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (10jcrespo) a:03jcrespo [16:59:29] (03CR) 10Muehlenhoff: "I think we don't need to track end user applications. On a technical level there's nothing which prevents it, but that's quite a rabbit ho" [puppet] - 10https://gerrit.wikimedia.org/r/507056 (owner: 10Jbond) [17:00:04] gehel and onimisionipe: I, the Bot under the Fountain, allow thee, The Deployer, to do Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190429T1700). [17:01:09] here here [17:02:06] RECOVERY - puppet last run on labweb1002 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [17:03:11] 10Operations, 10Puppet, 10puppet-compiler: Frequent puppet failures - https://phabricator.wikimedia.org/T221529 (10crusnov) >>! In T221529#5143984, @jbond wrote: > Have not had time to look at this in depth yet however i did just notice an issue while applying a refactor change[1] > > while applying the cha... 
[17:04:40] 10Operations, 10ops-codfw, 10DBA, 10Goal, 10Patch-For-Review: rack/setup/install db2[103-120].codfw.wmnet (18 hosts) - https://phabricator.wikimedia.org/T221532 (10Papaul) [17:05:08] (03PS3) 10Jforrester: SDC: Configure initial qualifiers for Test Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/502845 [17:06:45] (03PS1) 10Alex Monk: role::spare::system: replace standard with profile::standard [puppet] - 10https://gerrit.wikimedia.org/r/507066 [17:07:47] (03CR) 10Giuseppe Lavagetto: [C: 03+2] gerrit: deploy gervert via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/506932 (owner: 10Thcipriani) [17:08:09] (03PS2) 10Giuseppe Lavagetto: gerrit: deploy gervert via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/506932 (owner: 10Thcipriani) [17:08:57] !log onimisionipe@deploy1001 Started deploy [wdqs/wdqs@9273213]: Blazegraph upgrade for new LDF version and GUI updates [17:08:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:09:01] 10Operations, 10fundraising-tech-ops, 10netops, 10Patch-For-Review: Revoke production prometheus fundraising access - https://phabricator.wikimedia.org/T217355 (10cwdent) commit 5e6951f6082627a1f61f38d27532a48197eada4f Author: Casey Dentinger Date: Mon Apr 29 17:03:... 
[17:09:31] 10Operations, 10fundraising-tech-ops, 10netops, 10Patch-For-Review: Revoke production prometheus fundraising access - https://phabricator.wikimedia.org/T217355 (10cwdent) commit 173b34cd01bdc0aa5998af347406d9775755656f (HEAD -> master, origin/master, origin/HEAD) Author: Casey Dentinger 10Operations, 10fundraising-tech-ops, 10netops, 10Patch-For-Review: Revoke production prometheus fundraising access - https://phabricator.wikimedia.org/T217355 (10cwdent) [17:12:03] (03CR) 10Jbond: "LGTM thanks arturo, this was originally in the wrong PS i must have made a mistake when moving it" [puppet] - 10https://gerrit.wikimedia.org/r/507066 (owner: 10Alex Monk) [17:13:02] 10Operations, 10ops-codfw, 10DBA, 10Goal, 10Patch-For-Review: rack/setup/install db2[103-120].codfw.wmnet (18 hosts) - https://phabricator.wikimedia.org/T221532 (10Papaul) [17:13:30] PROBLEM - Nginx local proxy to apache on mw1348 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.007 second response time https://wikitech.wikimedia.org/wiki/Application_servers [17:14:50] RECOVERY - Nginx local proxy to apache on mw1348 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.044 second response time https://wikitech.wikimedia.org/wiki/Application_servers [17:15:33] (03PS1) 10Paladox: beta: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507069 [17:15:55] !log onimisionipe@deploy1001 Finished deploy [wdqs/wdqs@9273213]: Blazegraph upgrade for new LDF version and GUI updates (duration: 06m 58s) [17:15:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:16:30] (03PS2) 10Paladox: beta: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507069 [17:16:49] (03PS1) 10Paladox: zuul: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507070 [17:16:53] (03PS3) 10Bstorm: cloudstore: introduce rsync framework for secondary cluster [puppet] - 10https://gerrit.wikimedia.org/r/506847 
(https://phabricator.wikimedia.org/T209527) [17:17:06] (03PS2) 10Paladox: zuul: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507070 [17:17:20] (03PS1) 10Dzahn: admins: revoke access for user pirroh [puppet] - 10https://gerrit.wikimedia.org/r/507071 (https://phabricator.wikimedia.org/T222085) [17:18:58] (03CR) 10BBlack: [C: 03+2] facter3/puppet5: update interface fact parsing [puppet] - 10https://gerrit.wikimedia.org/r/506651 (https://phabricator.wikimedia.org/T219803) (owner: 10Jbond) [17:21:08] (03CR) 10Dzahn: [C: 03+2] admins: revoke access for user pirroh [puppet] - 10https://gerrit.wikimedia.org/r/507071 (https://phabricator.wikimedia.org/T222085) (owner: 10Dzahn) [17:21:58] (03CR) 10Jbond: [C: 03+1] "forgot to actully +1" [puppet] - 10https://gerrit.wikimedia.org/r/507066 (owner: 10Alex Monk) [17:22:44] (03CR) 10Jbond: "https://puppet-compiler.wmflabs.org/compiler1002/16171/" [puppet] - 10https://gerrit.wikimedia.org/r/506651 (https://phabricator.wikimedia.org/T219803) (owner: 10Jbond) [17:23:55] (03CR) 10Bstorm: [C: 03+2] cloudstore: introduce rsync framework for secondary cluster [puppet] - 10https://gerrit.wikimedia.org/r/506847 (https://phabricator.wikimedia.org/T209527) (owner: 10Bstorm) [17:24:05] (03PS4) 10Bstorm: cloudstore: introduce rsync framework for secondary cluster [puppet] - 10https://gerrit.wikimedia.org/r/506847 (https://phabricator.wikimedia.org/T209527) [17:24:55] (03PS14) 10Mathew.onipe: elasticsearch: config file for aligning puppet config [puppet] - 10https://gerrit.wikimedia.org/r/506378 (https://phabricator.wikimedia.org/T218932) [17:24:57] (03PS6) 10Mathew.onipe: icinga: create and apply cirrus config check [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) [17:24:59] 10Operations, 10Research, 10SRE-Access-Requests, 10Patch-For-Review: Revoke @pirroh's shell access - https://phabricator.wikimedia.org/T222085 (10Dzahn) Thanks @Miriam for making the ticket. Done. 
puppet is removing the user. for example on bast1002: ` Apr 29 17:22:17 bast1002 puppet-agent[1667]: (/Stage... [17:27:03] (03PS1) 10Paladox: scap: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507072 [17:27:36] (03PS2) 10Paladox: scap: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507072 [17:28:35] godog: rsyslog_exporter: error handling stats line: unknown pstats type: 0 [17:28:36] (03PS1) 10Paladox: reportupdater: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507073 [17:28:51] (03PS2) 10Paladox: reportupdater: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507073 [17:29:36] (03PS3) 10Paladox: reportupdater: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507073 [17:29:59] (03PS1) 10Paladox: toollabs: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507074 [17:30:15] (03PS2) 10Paladox: toollabs: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507074 [17:30:44] (03PS1) 10Paladox: analytics: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507075 [17:30:57] (03PS2) 10Paladox: analytics: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507075 [17:31:24] (03PS1) 10Paladox: authdns: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507076 [17:31:37] (03PS2) 10Paladox: authdns: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507076 [17:32:06] (03PS1) 10Paladox: statistics: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507077 [17:32:23] (03PS2) 10Paladox: statistics: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507077 [17:32:27] (03PS1) 10Jgreen: remove deprecated silverpop 1024-bit domainkey [dns] - 10https://gerrit.wikimedia.org/r/507078 (https://phabricator.wikimedia.org/T214525) [17:32:41] (03PS3) 10Paladox: statistics: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507077 [17:33:09] (03PS1) 10Paladox: 
snapshot: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507079 [17:33:50] (03PS2) 10Paladox: puppetmaster: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507079 [17:34:06] (03CR) 10Jgreen: [C: 03+2] remove deprecated silverpop 1024-bit domainkey [dns] - 10https://gerrit.wikimedia.org/r/507078 (https://phabricator.wikimedia.org/T214525) (owner: 10Jgreen) [17:34:43] (03PS1) 10Paladox: mgmt: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507083 [17:35:01] (03PS2) 10Paladox: mgmt: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507083 [17:35:08] PROBLEM - puppet last run on deploy2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Scap_source[gervert/deploy] [17:35:46] (03PS1) 10Paladox: phragile: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507084 [17:35:51] !log authdns-update to deploy T214525 [17:36:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:36:06] T214525: remove IBM/Silverpop 1024-bit domain key - https://phabricator.wikimedia.org/T214525 [17:36:09] (03PS3) 10Dzahn: mgmt: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507083 (owner: 10Paladox) [17:36:15] (03PS2) 10Paladox: phragile: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507084 [17:36:45] (03CR) 10Dzahn: [C: 03+2] mgmt: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507083 (owner: 10Paladox) [17:37:00] thanks mutante! 
[17:37:01] (03PS3) 10Paladox: scap: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507072 [17:37:34] 10Operations, 10Patch-For-Review: Replacement of network::constant's special_hosts - https://phabricator.wikimedia.org/T220894 (10Krenair) [17:37:38] (03PS4) 10Paladox: statistics: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507077 [17:37:42] (03PS15) 10Mathew.onipe: elasticsearch: config file for aligning puppet config [puppet] - 10https://gerrit.wikimedia.org/r/506378 (https://phabricator.wikimedia.org/T218932) [17:37:44] (03PS7) 10Mathew.onipe: icinga: create and apply cirrus config check [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) [17:38:12] (03PS1) 10Paladox: toolforge: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507087 [17:38:28] (03PS2) 10Paladox: toolforge: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507087 [17:38:55] 10Operations, 10DNS, 10Traffic, 10fundraising-tech-ops, 10Patch-For-Review: remove IBM/Silverpop 1024-bit domain key - https://phabricator.wikimedia.org/T214525 (10Jgreen) 05Open→03Resolved a:03Jgreen this is deployed [17:39:03] (03PS1) 10Paladox: openldap: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507088 [17:39:24] (03PS2) 10Paladox: openldap: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507088 [17:39:34] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [17:39:57] (03PS3) 10Paladox: openldap: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507088 [17:40:42] (03PS3) 10Paladox: puppetmaster: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507079 [17:40:53] (03CR) 10Jbond: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/507056 (owner: 
10Jbond) [17:40:57] (03PS3) 10Paladox: toollabs: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507074 [17:41:50] 10Operations, 10Research, 10SRE-Access-Requests, 10Patch-For-Review: Request access to data for citation usage research - https://phabricator.wikimedia.org/T198662 (10Dzahn) [17:41:53] 10Operations, 10Research, 10SRE-Access-Requests, 10Patch-For-Review: Revoke @pirroh's shell access - https://phabricator.wikimedia.org/T222085 (10Dzahn) 05Open→03Resolved [17:43:14] 10Operations, 10Research, 10SRE-Access-Requests, 10Patch-For-Review: Revoke @pirroh's shell access - https://phabricator.wikimedia.org/T222085 (10Miriam) Thank you so much @dzahn! [17:43:42] (03PS3) 10Ottomata: Refactor eventgate-analytics to eventgate [deployment-charts] - 10https://gerrit.wikimedia.org/r/506166 (https://phabricator.wikimedia.org/T218346) [17:45:11] (03CR) 10BryanDavis: toollabs: Stop cloning over /p/ (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/507074 (owner: 10Paladox) [17:46:33] (03CR) 10Dzahn: "the ticket is https://phabricator.wikimedia.org/T218844" [puppet] - 10https://gerrit.wikimedia.org/r/507074 (owner: 10Paladox) [17:46:43] (03PS3) 10BBlack: wm.org no-op cleanup: no empty left-hand-side [dns] - 10https://gerrit.wikimedia.org/r/501610 [17:46:45] (03PS3) 10BBlack: wm.org no-op cleanup: prefer @ to wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/501611 [17:46:47] (03PS3) 10BBlack: wm.org no-op cleanup: Group on hostnames for meta [dns] - 10https://gerrit.wikimedia.org/r/501612 [17:46:49] (03PS3) 10BBlack: wm.org no-op cleanup: re-arrange top section [dns] - 10https://gerrit.wikimedia.org/r/501613 [17:46:51] (03PS3) 10BBlack: ns[012].wikimedia.org: 1D TTLs [dns] - 10https://gerrit.wikimedia.org/r/501614 [17:46:56] (03PS4) 10BBlack: wm.org: 1h non-dyna records for foo.dcname entries [dns] - 10https://gerrit.wikimedia.org/r/501615 [17:46:58] (03PS1) 10BBlack: wm.org no-op cleanup: move other meta up from end [dns] - 
10https://gerrit.wikimedia.org/r/507093 [17:47:00] (03PS4) 10Dzahn: toollabs: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507074 (https://phabricator.wikimedia.org/T218844) (owner: 10Paladox) [17:47:11] (03PS4) 10Ottomata: Refactor eventgate-analytics to eventgate [deployment-charts] - 10https://gerrit.wikimedia.org/r/506166 (https://phabricator.wikimedia.org/T218346) [17:47:26] (03PS4) 10Dzahn: openldap: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507088 (https://phabricator.wikimedia.org/T218844) (owner: 10Paladox) [17:47:31] (03PS3) 10Dzahn: toolforge: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507087 (https://phabricator.wikimedia.org/T218844) (owner: 10Paladox) [17:47:35] (03PS3) 10Dzahn: phragile: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507084 (https://phabricator.wikimedia.org/T218844) (owner: 10Paladox) [17:47:41] (03PS5) 10Ottomata: Refactor eventgate-analytics to eventgate [deployment-charts] - 10https://gerrit.wikimedia.org/r/506166 (https://phabricator.wikimedia.org/T218346) [17:47:43] (03PS4) 10Dzahn: puppetmaster: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507079 (https://phabricator.wikimedia.org/T218844) (owner: 10Paladox) [17:47:46] (03PS5) 10Dzahn: statistics: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507077 (https://phabricator.wikimedia.org/T218844) (owner: 10Paladox) [17:47:51] (03PS3) 10Dzahn: authdns: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507076 (https://phabricator.wikimedia.org/T218844) (owner: 10Paladox) [17:47:56] (03PS3) 10Dzahn: analytics: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507075 (https://phabricator.wikimedia.org/T218844) (owner: 10Paladox) [17:48:22] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] Add www4.bibl.ulaval.ca to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/503561 
(https://phabricator.wikimedia.org/T220704) (owner: 10Framawiki) [17:49:37] (03PS16) 10Mathew.onipe: elasticsearch: config file for aligning puppet config [puppet] - 10https://gerrit.wikimedia.org/r/506378 (https://phabricator.wikimedia.org/T218932) [17:49:39] (03PS8) 10Mathew.onipe: icinga: create and apply cirrus config check [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) [17:49:52] (03CR) 10BryanDavis: toolforge: Stop cloning over /p/ (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/507087 (https://phabricator.wikimedia.org/T218844) (owner: 10Paladox) [17:50:41] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] Publish throttle-analyze at noc [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481267 (https://phabricator.wikimedia.org/T187894) (owner: 10Framawiki) [17:51:50] (03PS1) 10Bstorm: cloudstore: add ferm rules for rsync on the scratch/maps cluster [puppet] - 10https://gerrit.wikimedia.org/r/507094 (https://phabricator.wikimedia.org/T209527) [17:55:33] (03PS1) 10Alex Monk: [DNM][WIP] See what happens in puppet-compiler if we remove the ssh class [puppet] - 10https://gerrit.wikimedia.org/r/507095 [17:56:05] jouncebot, next [17:56:05] In 0 hour(s) and 3 minute(s): Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190429T1800) [17:56:06] (03CR) 10Lucas Werkmeister (WMDE): "Looks okay to me, but is there any guidance on how/where to run these tests? When I try it locally, it fails." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/486382 (https://phabricator.wikimedia.org/T214632) (owner: 10Framawiki) [17:56:25] (03CR) 10Bstorm: [C: 03+2] cloudstore: add ferm rules for rsync on the scratch/maps cluster [puppet] - 10https://gerrit.wikimedia.org/r/507094 (https://phabricator.wikimedia.org/T209527) (owner: 10Bstorm) [17:56:36] (03CR) 10Alex Monk: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/507095 (owner: 10Alex Monk) [17:58:14] hmm gerrit feels slow [17:59:11] cc thcipriani ^^ [18:00:04] Deploy window Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190429T1800) [18:00:04] framawiki, Lucas_WMDE, and Krenair: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:12] o/ [18:00:13] o/ [18:00:16] o/ [18:00:57] does jouncebot not ping SWATters anymore? [18:01:14] addshore, Antoine (hashar), Katie (aude), Max (MaxSem), Mukunda (twentyafterfour), Roan (RoanKattouw), Sébastien (Dereckson), Tyler (thcipriani), Niharika (Niharika), or Željko (zeljkof) [18:02:42] framawiki: are you a deployer? [18:02:54] no, sorry [18:03:03] no problem [18:03:07] I could deploy your changes [18:03:23] (03PS1) 10Dzahn: requesttracker: remove has_default_mail_relay: false [puppet] - 10https://gerrit.wikimedia.org/r/507096 [18:03:24] would be great, thanks! 
[18:03:33] if the SWAT people aren’t around… [18:03:39] let’s wait until :05 perhaps [18:03:40] (03PS2) 10Framawiki: Add www4.bibl.ulaval.ca to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/503561 (https://phabricator.wikimedia.org/T220704) [18:03:56] (03PS5) 10Framawiki: [tests] wgExtraNamespaces should not contain colons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486382 (https://phabricator.wikimedia.org/T214632) [18:04:06] I can do the SWAT [18:04:17] yay [18:04:19] thanks [18:05:39] (03PS6) 10Framawiki: Publish throttle-analyze at noc [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481267 (https://phabricator.wikimedia.org/T187894) [18:06:31] Lucas_WMDE: Your testcommons change depends on the production wikidata change which is not proposed to be deployed in this window [18:06:37] oh [18:06:39] right [18:06:42] let me shuffle them around a bit [18:06:52] I can deploy the test wikidata change, but you'll have to fix the dependency tree if you want to deploy the test commons change [18:07:03] I'll do framawiki's changes first so you can fix that [18:07:24] yep, thanks [18:07:26] (03CR) 10Catrope: [C: 03+2] Add www4.bibl.ulaval.ca to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/503561 (https://phabricator.wikimedia.org/T220704) (owner: 10Framawiki) [18:08:18] PROBLEM - Check systemd state on cloudstore1009 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[18:08:39] (03PS1) 10Bstorm: cloudstore: correct problems in ferm rules [puppet] - 10https://gerrit.wikimedia.org/r/507097 (https://phabricator.wikimedia.org/T209527) [18:08:47] (03Merged) 10jenkins-bot: Add www4.bibl.ulaval.ca to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/503561 (https://phabricator.wikimedia.org/T220704) (owner: 10Framawiki) [18:09:21] (03PS2) 10Lucas Werkmeister (WMDE): Serialize empty lists as objects on Test Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507030 (https://phabricator.wikimedia.org/T138104) [18:09:23] (03PS2) 10Lucas Werkmeister (WMDE): Serialize empty lists as objects on Test Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507034 (https://phabricator.wikimedia.org/T138104) [18:09:25] (03PS2) 10Lucas Werkmeister (WMDE): Serialize empty lists as objects on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507031 (https://phabricator.wikimedia.org/T138104) [18:09:27] (03PS3) 10Lucas Werkmeister (WMDE): Serialize empty lists as objects on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507032 (https://phabricator.wikimedia.org/T138104) [18:09:31] threads going up [18:09:37] according to https://gerrit.wikimedia.org/r/monitoring?part=graph&graph=activeThreads [18:10:16] framawiki: Your first change (copy uploads) is on mwdebug1002, please test (insofar as it can be tested) [18:10:26] paladox: normal during deploy? [18:10:30] nope [18:10:39] this is likely the same problem we've been having [18:10:42] ok [18:10:43] Lucas_WMDE: jouncebot didn't ping because someone removed all the {{ircnick}} templates from the ping line; I've fixed it. [18:10:52] ah, okay, thanks [18:10:54] users will notice slowness of gerrit.
[18:10:56] (03CR) 10Bstorm: [C: 03+2] cloudstore: correct problems in ferm rules [puppet] - 10https://gerrit.wikimedia.org/r/507097 (https://phabricator.wikimedia.org/T209527) (owner: 10Bstorm) [18:10:57] like me [18:11:11] paladox: cant say i do [18:11:26] some users will see it compared to others. [18:11:33] ack [18:11:36] just like the phab issue with apache. [18:12:05] (03CR) 10Jgreen: [C: 03+2] Add failover URL and public IP for frmon* [dns] - 10https://gerrit.wikimedia.org/r/506707 (https://phabricator.wikimedia.org/T221475) (owner: 10Cdentinger) [18:12:07] (03PS6) 10Jgreen: Add failover URL and public IP for frmon* [dns] - 10https://gerrit.wikimedia.org/r/506707 (https://phabricator.wikimedia.org/T221475) (owner: 10Cdentinger) [18:12:13] RoanKattouw: lgtm on mwdebug1002 [18:12:52] (03PS7) 10Catrope: Publish throttle-analyze at noc [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481267 (https://phabricator.wikimedia.org/T187894) (owner: 10Framawiki) [18:13:00] (03CR) 10Catrope: [C: 03+2] Publish throttle-analyze at noc [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481267 (https://phabricator.wikimedia.org/T187894) (owner: 10Framawiki) [18:13:23] (03PS6) 10Catrope: [tests] wgExtraNamespaces should not contain colons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486382 (https://phabricator.wikimedia.org/T214632) (owner: 10Framawiki) [18:13:27] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Add www4.bibl.ulaval.ca to wgCopyUploadsDomains (T220704) (duration: 00m 53s) [18:13:28] (03CR) 10Catrope: [C: 03+2] [tests] wgExtraNamespaces should not contain colons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486382 (https://phabricator.wikimedia.org/T214632) (owner: 10Framawiki) [18:13:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:13:31] T220704: Please add to $wgCopyUploadsDomains - https://phabricator.wikimedia.org/T220704 [18:14:13] (03Merged) 10jenkins-bot: Publish 
throttle-analyze at noc [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481267 (https://phabricator.wikimedia.org/T187894) (owner: 10Framawiki) [18:14:38] (03Merged) 10jenkins-bot: [tests] wgExtraNamespaces should not contain colons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486382 (https://phabricator.wikimedia.org/T214632) (owner: 10Framawiki) [18:16:12] framawiki: Your other two patches (throttle-analyze at noc, and tests) are on mwdebug1002, please test (I'm guessing the test change doesn't directly affect anything, only the unit tests for the repo) [18:16:56] (03PS3) 10Lucas Werkmeister (WMDE): Serialize empty lists as objects on Test Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507030 (https://phabricator.wikimedia.org/T138104) [18:16:58] (03PS3) 10Lucas Werkmeister (WMDE): Serialize empty lists as objects on Test Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507034 (https://phabricator.wikimedia.org/T138104) [18:17:00] (03PS3) 10Lucas Werkmeister (WMDE): Serialize empty lists as objects on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507031 (https://phabricator.wikimedia.org/T138104) [18:17:02] (03PS4) 10Lucas Werkmeister (WMDE): Serialize empty lists as objects on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507032 (https://phabricator.wikimedia.org/T138104) [18:17:17] RoanKattouw: Yes test patch is not testable like that [18:18:07] (03CR) 10Dzahn: [C: 03+2] requesttracker: remove has_default_mail_relay: false [puppet] - 10https://gerrit.wikimedia.org/r/507096 (owner: 10Dzahn) [18:18:14] (03PS2) 10Dzahn: requesttracker: remove has_default_mail_relay: false [puppet] - 10https://gerrit.wikimedia.org/r/507096 [18:18:31] (03PS1) 10Ottomata: Enable cirrussearch-request logging to eventgate-analytics for group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507099 (https://phabricator.wikimedia.org/T214080) [18:19:04] Yeah I thought so. 
Well let me know if the throttle-analyze patch is working [18:19:06] RoanKattouw: about the noc one, file is not available as it should be (https://noc.wikimedia.org/conf/highlight.php?file=throttle-analyze.php is 404), that looks the same as it was in the last tried then reverted deploy (see Urbanecm's comment in https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/481267/) [18:19:19] 10Operations, 10cloud-services-team, 10serviceops: Change /r/p/ to /r/ on all hosts (where https://gerrit.wikimedia.org/r/p/ exists) - https://phabricator.wikimedia.org/T222093 (10Paladox) [18:19:29] yup, exactly [18:19:30] Do you think there may be cache problems? [18:19:45] Hmm I also don't know if noc respects XWD [18:19:55] XWD? [18:19:56] It looks pretty safe so I'll just deploy it [18:19:59] (03CR) 10Thcipriani: [C: 03+1] gerrit: Enable G1 GC [puppet] - 10https://gerrit.wikimedia.org/r/327763 (https://phabricator.wikimedia.org/T221026) (owner: 10Paladox) [18:20:08] Sorry, X-Wikimedia-Debug, the header used by the WikimediaDebug browser extension [18:20:09] perhaps it is possible to curl localhost from server or something like that [18:20:26] ah, i see RoanKattouw [18:20:55] but at worst, even if it doesn't work, it's unlikely to crash the whole thing. [18:20:59] so as you want RoanKattouw [18:21:02] let me know if you have some space left in SWAT, it's full officially [18:21:03] (03CR) 10Ppchelko: [C: 03+1] Enable cirrussearch-request logging to eventgate-analytics for group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507099 (https://phabricator.wikimedia.org/T214080) (owner: 10Ottomata) [18:21:16] 10Operations, 10TechCom, 10Core Platform Team (Session Management Service (CDP2)), 10Core Platform Team Backlog (Later), and 6 others: Establish an SLA for session storage - https://phabricator.wikimedia.org/T211721 (10EvanProdromou) @aaron I lost the thread here. Could you give me some candidate numbers f...
[18:21:24] !log catrope@deploy1001 Synchronized docroot/noc: Publish throttle-analyze at noc (T187894) (duration: 00m 53s) [18:21:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:29] T187894: Publish throttle-analyze.php to noc.wikimedia.org - https://phabricator.wikimedia.org/T187894 [18:21:31] Urbanecm: If you're willing to wait for the end, sure [18:21:42] better than waiting for tomorrow eu :) [18:21:52] 10Operations, 10TechCom, 10Core Platform Team (Session Management Service (CDP2)), 10Core Platform Team Backlog (Later), and 6 others: Establish an SLA for session storage - https://phabricator.wikimedia.org/T211721 (10EvanProdromou) @Eevans have we been considering cross-DC writes in the performance testi... [18:22:06] framawiki: OK well https://noc.wikimedia.org/conf/highlight.php?file=throttle-analyze.php looks like it's working [18:22:17] yay [18:22:18] yeah for me too! [18:22:26] 10Operations, 10TechCom, 10Core Platform Team (Session Management Service (CDP2)), 10Core Platform Team Backlog (Later), and 6 others: Establish an SLA for session storage - https://phabricator.wikimedia.org/T211721 (10EvanProdromou) @Krinkle so, we need p99 and max on the above table? and no p50? Also, I... [18:22:27] so lgtm [18:22:31] !log authdns-update for T221475 [18:22:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:22:35] T221475: Network setup for frmon2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T221475 [18:22:41] was just cache problem [18:23:42] (03CR) 10Dzahn: "> I'm wondering if has_default_mail_relay:false is actually still needed for RT and racktables?" 
[puppet] - 10https://gerrit.wikimedia.org/r/506989 (owner: 10Jbond) [18:23:45] or as you said noc doesn't respect debug tag, and it only worked when it went into production [18:25:07] (03CR) 10Catrope: [C: 03+2] Serialize empty lists as objects on Test Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507030 (https://phabricator.wikimedia.org/T138104) (owner: 10Lucas Werkmeister (WMDE)) [18:25:11] OK, Lucas_WMDE next [18:25:22] RoanKattouw: are you deploying my changes? [18:25:45] I could also do it [18:25:46] RoanKattouw, added the patch to the calendar [18:25:50] (03PS5) 10Paladox: toolforge: update origin URL for integration/composer.git clones [puppet] - 10https://gerrit.wikimedia.org/r/507074 (https://phabricator.wikimedia.org/T218844) [18:25:53] (03CR) 10Ppchelko: Refactor eventgate-analytics to eventgate (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/506166 (https://phabricator.wikimedia.org/T218346) (owner: 10Ottomata) [18:26:15] (03Merged) 10jenkins-bot: Serialize empty lists as objects on Test Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507030 (https://phabricator.wikimedia.org/T138104) (owner: 10Lucas Werkmeister (WMDE)) [18:26:31] (03CR) 10Paladox: toolforge: update origin URL for integration/composer.git clones (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/507074 (https://phabricator.wikimedia.org/T218844) (owner: 10Paladox) [18:26:32] 10Operations, 10cloud-services-team, 10serviceops: Change /r/p/ to /r/ on all hosts (where https://gerrit.wikimedia.org/r/p/ exists) - https://phabricator.wikimedia.org/T222093 (10bd808) @Paladox can you make the description more concrete? I think you have some idea from your Puppet patches about specific gi... 
[18:26:43] !log otto@deploy1001 scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging] [18:26:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:45] !log otto@deploy1001 scap-helm eventgate-analytics cluster staging completed [18:26:45] !log otto@deploy1001 scap-helm eventgate-analytics finished [18:26:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:53] 10Operations, 10cloud-services-team, 10serviceops: Change /r/p/ to /r/ on all hosts (where https://gerrit.wikimedia.org/r/p/ exists) - https://phabricator.wikimedia.org/T222093 (10bd808) [18:28:27] Lucas_WMDE: testwikidata on mwdebug1002, please test [18:28:47] RoanKattouw: works, okay to deploy [18:28:54] thanks for your work RoanKattouw. [18:29:04] (03CR) 10Catrope: [C: 03+2] Serialize empty lists as objects on Test Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507034 (https://phabricator.wikimedia.org/T138104) (owner: 10Lucas Werkmeister (WMDE)) [18:29:06] (03PS6) 10Ottomata: Refactor eventgate-analytics to eventgate [deployment-charts] - 10https://gerrit.wikimedia.org/r/506166 (https://phabricator.wikimedia.org/T218346) [18:29:10] (03CR) 10Ottomata: Refactor eventgate-analytics to eventgate (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/506166 (https://phabricator.wikimedia.org/T218346) (owner: 10Ottomata) [18:29:17] (03CR) 10Dzahn: "btw, also see https://phabricator.wikimedia.org/T103161" [puppet] - 10https://gerrit.wikimedia.org/r/327763 (https://phabricator.wikimedia.org/T221026) (owner: 10Paladox) [18:29:32] (03PS8) 10Dzahn: gerrit: Enable G1 GC [puppet] - 10https://gerrit.wikimedia.org/r/327763 (https://phabricator.wikimedia.org/T221026) (owner: 10Paladox) [18:29:34] !log 
jbond@cumin1001 conftool action : set/pooled=no; selector: name=mw122[1-6].eqiad.wmnet [18:29:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:30:13] (03Merged) 10jenkins-bot: Serialize empty lists as objects on Test Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507034 (https://phabricator.wikimedia.org/T138104) (owner: 10Lucas Werkmeister (WMDE)) [18:30:15] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Serialize empty lists as objects on Test Wikidata (T138104) (duration: 00m 53s) [18:30:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:30:19] T138104: Do not serialize empty containers (descriptions/aliases/sitelinks) as empty array [] - https://phabricator.wikimedia.org/T138104 [18:31:32] (03PS1) 10Ottomata: eventgate-analytics - precache mediawiki/cirrussearch/request schema [deployment-charts] - 10https://gerrit.wikimedia.org/r/507100 [18:32:21] (03CR) 10Ottomata: [V: 03+2 C: 03+2] eventgate-analytics - precache mediawiki/cirrussearch/request schema [deployment-charts] - 10https://gerrit.wikimedia.org/r/507100 (owner: 10Ottomata) [18:33:00] (03CR) 10Dzahn: [C: 03+1] "per Tyler's comments on ticket, many folks on the upstream mailing list running large Gerrit installations use G1. 
though how about the "" [puppet] - 10https://gerrit.wikimedia.org/r/327763 (https://phabricator.wikimedia.org/T221026) (owner: 10Paladox) [18:33:01] Lucas_WMDE: And testcommons is now ready on mwdebug1002 [18:33:15] !log otto@deploy1001 scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging] [18:33:16] !log otto@deploy1001 scap-helm eventgate-analytics cluster staging completed [18:33:16] !log otto@deploy1001 scap-helm eventgate-analytics finished [18:33:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:33:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:33:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:33:22] RECOVERY - puppet last run on deploy2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [18:33:44] (03CR) 10Paladox: "> per Tyler's comments on ticket, many folks on the upstream mailing" [puppet] - 10https://gerrit.wikimedia.org/r/327763 (https://phabricator.wikimedia.org/T221026) (owner: 10Paladox) [18:33:46] (03CR) 10Dzahn: [C: 03+2] gerrit: Enable G1 GC [puppet] - 10https://gerrit.wikimedia.org/r/327763 (https://phabricator.wikimedia.org/T221026) (owner: 10Paladox) [18:33:46] hm, doesn’t seem to be working as expected [18:34:01] (03CR) 10Catrope: "> Patch Set 1:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506374 (https://phabricator.wikimedia.org/T221829) (owner: 10Urbanecm) [18:34:02] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:34:12] thanks mutante!! 
[18:34:17] (03CR) 10Catrope: [C: 03+2] Change wikimaniawiki logo to Wikimania 2019 version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506374 (https://phabricator.wikimedia.org/T221829) (owner: 10Urbanecm) [18:34:42] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=mw122[1-6].eqiad.wmnet [18:34:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:34:45] Lucas_WMDE: That would be because I forgot to hit enter, oops. Try now. [18:35:20] (03Merged) 10jenkins-bot: Change wikimaniawiki logo to Wikimania 2019 version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506374 (https://phabricator.wikimedia.org/T221829) (owner: 10Urbanecm) [18:35:48] 10Operations, 10RESTBase-Cassandra, 10Patch-For-Review: consider moving Cassandra to G1GC in production - https://phabricator.wikimedia.org/T103161 (10Dzahn) also T221026#5143639 for usign G1 GC on Gerrit [18:36:10] okay, now it’s working [18:36:19] (my XWD also disabled itself in between, fun) [18:36:21] please deploy [18:37:16] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Serialize empty lists as objects on Test Commons (T138104) (duration: 00m 54s) [18:37:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:37:21] T138104: Do not serialize empty containers (descriptions/aliases/sitelinks) as empty array [] - https://phabricator.wikimedia.org/T138104 [18:37:54] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=mw122[8-9].eqiad.wmnet [18:37:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:38:15] (03CR) 10Dzahn: "deployed on cobalt. 
javaOptions have been changed and "Scheduling refresh of Exec[systemd daemon-reload for gerrit.service" but that does " [puppet] - 10https://gerrit.wikimedia.org/r/327763 (https://phabricator.wikimedia.org/T221026) (owner: 10Paladox) [18:39:13] jbond42: pooling appservers during deployment might lead to them being out of sync [18:39:14] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:39:31] oh sorry i will stop [18:39:40] jbond42: just run "scap pull" on them [18:39:44] one sec i will repool [18:39:53] ack [18:39:57] that should get them up2date.. cool [18:40:07] RoanKattouw, before syncing mine, please can you check the hash of OpenStackNovaUser.php on labweb1001.wikimedia.org against the version to be deployed? [18:40:42] Andrew did test it at one point and I have a feeling it didn't get reverted, so technically the change might already be out there [18:41:04] mutante: I imagine it's hard but we should fix that (from a process perspective) at some point [18:41:12] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=mw122[8-9].eqiad.wmnet [18:41:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:41:22] !log otto@deploy1001 scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging] [18:41:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:41:25] !log otto@deploy1001 scap-helm eventgate-analytics cluster staging completed [18:41:25] !log otto@deploy1001 scap-helm eventgate-analytics finished [18:41:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:41:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:42:02] !log catrope@deploy1001 Synchronized static/images/project-logos/: Change wikimaniawiki logo to 
Wikimania 2019 version (T221829) (duration: 00m 54s) [18:42:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:42:06] T221829: Change logo for wikimaniawiki to Wikimania 2019 specific version - https://phabricator.wikimedia.org/T221829 [18:42:33] cdanis: yea.. maybe something like a lock file that is created by jouncebot [18:42:44] Urbanecm: Deployed and URLs purged, please verify [18:42:48] thanks RoanKattouw [18:42:54] I'm always paranoid that I might not have done the purges correctly [18:43:01] LGTM RoanKattouw ! [18:43:11] at least, purges shouldn't be able to turn the sites down :) [18:43:15] mutante: I was more thinking something like a cookbook that knew to always scap pull before repooling [18:43:20] Krenair: I do not have ssh access to labweb1001 [18:43:28] huh [18:43:39] Crap I didn't realize that your patch was an OpenStackManager patch [18:43:44] I'm not sure if I can deploy those [18:44:09] hieradata/role/eqiad/wmcs/openstack/eqiad1/labweb.yaml: - deployment [18:44:34] !log otto@deploy1001 scap-helm eventgate-analytics upgrade production -f analytics/eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw] [18:44:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:44:37] !log otto@deploy1001 scap-helm eventgate-analytics cluster codfw completed [18:44:37] !log otto@deploy1001 scap-helm eventgate-analytics finished [18:44:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:44:41] 10Operations, 10Security-Team, 10Traffic: scan external ranges with current Nessus rulesets - https://phabricator.wikimedia.org/T222097 (10chasemp) [18:44:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:45:01] ssh labweb1001.eqiad.wmnet gives me an error message, ssh labweb1001.wikimedia.org times out, seemingly because port 22 isn't open [18:45:08] it's not eqiad.wmnet 
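The ordering suggested in the 18:39 to 18:43 exchange (always "scap pull" on a host before repooling it) could be sketched roughly as below. This is only an illustration: the step strings reuse the wording seen in this log ("set/pooled=no", "scap pull", "set/pooled=yes"), while `repool_plan` and the "maintenance" placeholder are hypothetical names, not part of any existing cookbook or tool.

```python
from typing import List, Tuple

# One step is (action, host). The action strings are labels only; this
# sketch does not execute anything.
Step = Tuple[str, str]


def repool_plan(hosts: List[str]) -> List[Step]:
    """Build an ordered plan: depool, do the work, sync code, then repool.

    Hypothetical helper: the point is only that every host gets a
    "scap pull" step before its "set/pooled=yes" step, so a host that
    missed a deployment while depooled is brought up to date first.
    """
    plan: List[Step] = [("set/pooled=no", host) for host in hosts]
    # Placeholder for whatever work required the depool.
    plan += [("maintenance", host) for host in hosts]
    for host in hosts:
        plan.append(("scap pull", host))
        plan.append(("set/pooled=yes", host))
    return plan


if __name__ == "__main__":
    for action, host in repool_plan(["mw1228.eqiad.wmnet", "mw1229.eqiad.wmnet"]):
        print(f"{host}: {action}")
```

Keeping the plan as plain data makes the ordering easy to check before anything is run against real hosts.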
[18:45:20] you still have to proxy via a bastion to get to labweb1001.wikimedia.org [18:45:27] wtf [18:45:31] OK let me try that [18:45:39] !log otto@deploy1001 scap-helm eventgate-analytics upgrade production -f analytics/eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad] [18:45:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:45:41] !log otto@deploy1001 scap-helm eventgate-analytics cluster eqiad completed [18:45:41] !log otto@deploy1001 scap-helm eventgate-analytics finished [18:45:43] I'll have to modify my SSH config to even make that happen [18:45:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:45:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:46:19] this change was made against silver.wikimedia.org back in 2015-2016 [18:46:29] OK that worked [18:48:02] I'm not actually sure if it still makes sense for those machines to be in wikimedia.org instead of eqiad.wmnet [18:48:19] $ sha1sum nova/OpenStackNovaUser.php [18:48:19] 6c0e04aae3b3e435c2f80099ac601311bcb95cf5 nova/OpenStackNovaUser.php [18:48:25] That's what's curently on there [18:48:26] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:49:15] 6c0e04aae3b3e435c2f80099ac601311bcb95cf5 is the new version [18:49:23] okay so the change is already deployed [18:49:44] just wasn't merged on the branch [18:50:02] I confirmed that on the deploy server too, that's the hash I got after doing git pull ; git submodule update extensions/OpenStackManager [18:50:09] and probably wasn't staged on the deployment server [18:50:16] No doesn't look like it [18:51:02] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:51:19] well I guess it's all sorted then. 
Assuming the other file is also already up to date [18:52:10] PROBLEM - Request latencies on acrux is CRITICAL: instance=10.192.0.93:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:52:54] (03CR) 10Urbanecm: "> Patch Set 3:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486382 (https://phabricator.wikimedia.org/T214632) (owner: 10Framawiki) [18:53:30] RECOVERY - Request latencies on acrux is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:53:32] (03CR) 10jenkins-bot: Add www4.bibl.ulaval.ca to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/503561 (https://phabricator.wikimedia.org/T220704) (owner: 10Framawiki) [18:53:34] (03CR) 10jenkins-bot: Publish throttle-analyze at noc [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481267 (https://phabricator.wikimedia.org/T187894) (owner: 10Framawiki) [18:53:36] (03CR) 10jenkins-bot: [tests] wgExtraNamespaces should not contain colons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486382 (https://phabricator.wikimedia.org/T214632) (owner: 10Framawiki) [18:53:38] (03CR) 10jenkins-bot: Serialize empty lists as objects on Test Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507030 (https://phabricator.wikimedia.org/T138104) (owner: 10Lucas Werkmeister (WMDE)) [18:53:40] (03CR) 10jenkins-bot: Serialize empty lists as objects on Test Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507034 (https://phabricator.wikimedia.org/T138104) (owner: 10Lucas Werkmeister (WMDE)) [18:53:42] (03CR) 10jenkins-bot: Change wikimaniawiki logo to Wikimania 2019 version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/506374 (https://phabricator.wikimedia.org/T221829) (owner: 10Urbanecm) [18:55:34] 10Operations, 10TechCom, 10Core Platform Team (Session Management Service (CDP2)), 10Core Platform Team Backlog (Later), and 6 others: Establish an SLA for session storage - 
https://phabricator.wikimedia.org/T211721 (10Eevans) >>! In T211721#5144537, @EvanProdromou wrote: > @Eevans have we been considering... [18:57:39] 10Operations, 10TechCom, 10Core Platform Team (Session Management Service (CDP2)), 10Core Platform Team Backlog (Later), and 6 others: Establish an SLA for session storage - https://phabricator.wikimedia.org/T211721 (10mobrovac) >>! In T211721#5144690, @Eevans wrote: >>>! In T211721#5144537, @EvanProdromou... [18:59:09] !log Deployed patch for T221739 [18:59:10] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [18:59:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:01:18] 10Operations, 10Domains, 10Traffic: Register wiki(m|p)edia.ro - https://phabricator.wikimedia.org/T222080 (10Dzahn) Hi @Strainu please contact @CRoslof directly in this matter. Thank you! 
[19:01:48] !log deploying config change to enable cirrusssearch-request logging to eventgate-analytics for group0 wikis - T214080 [19:01:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:01:52] T214080: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 [19:01:53] (03CR) 10Ottomata: [C: 03+2] Enable cirrussearch-request logging to eventgate-analytics for group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507099 (https://phabricator.wikimedia.org/T214080) (owner: 10Ottomata) [19:01:58] (03PS2) 10Ottomata: Enable cirrussearch-request logging to eventgate-analytics for group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507099 (https://phabricator.wikimedia.org/T214080) [19:02:32] (03PS1) 10Dzahn: remove bast2001.mgmt [dns] - 10https://gerrit.wikimedia.org/r/507102 (https://phabricator.wikimedia.org/T219492) [19:04:29] 10Operations, 10TechCom, 10Core Platform Team (Session Management Service (CDP2)), 10Core Platform Team Backlog (Later), and 6 others: Establish an SLA for session storage - https://phabricator.wikimedia.org/T211721 (10Eevans) >>! In T211721#5144702, @mobrovac wrote: >>>! In T211721#5144690, @Eevans wrote:... 
[19:04:50] (03CR) 10jenkins-bot: Enable cirrussearch-request logging to eventgate-analytics for group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507099 (https://phabricator.wikimedia.org/T214080) (owner: 10Ottomata) [19:05:52] !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable cirrussearch-request logging to eventgate-analytics for group0 wikis - T214080 (duration: 00m 53s) [19:05:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:07:16] !log otto@deploy1001 sync-file aborted: Enable cirrussearch-request logging to eventgate-analytics for group0 wikis - T214080 (duration: 00m 02s) [19:07:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:07:21] T214080: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 [19:10:33] 10Operations, 10ops-eqiad, 10DC-Ops, 10Epic, and 2 others: Move cloudvirt hosts to 10Gb ethernet - https://phabricator.wikimedia.org/T216195 (10Dzahn) [19:10:36] 10Operations, 10ops-eqiad, 10DC-Ops, 10Epic, and 2 others: relocate/reimage cloudvirt1007 with 10G interfaces - https://phabricator.wikimedia.org/T221047 (10Dzahn) 05Open→03Stalled [19:11:00] 10Operations, 10fundraising-tech-ops, 10netops, 10Patch-For-Review: Network setup for frmon2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T221475 (10cwdent) @papaul it looks like you wired this up in T196557 but I have tried the hw addresses reported by racadm and am not seeing dhcp packets, co... 
[19:11:14] (03PS1) 10Bstorm: cloudstore: edit ferm rules a bit more [puppet] - 10https://gerrit.wikimedia.org/r/507104 (https://phabricator.wikimedia.org/T209527) [19:12:06] (03CR) 10jerkins-bot: [V: 04-1] cloudstore: edit ferm rules a bit more [puppet] - 10https://gerrit.wikimedia.org/r/507104 (https://phabricator.wikimedia.org/T209527) (owner: 10Bstorm) [19:13:11] (03PS2) 10Bstorm: cloudstore: edit ferm rules a bit more [puppet] - 10https://gerrit.wikimedia.org/r/507104 (https://phabricator.wikimedia.org/T209527) [19:14:23] (03PS3) 10Bstorm: cloudstore: edit ferm rules a bit more [puppet] - 10https://gerrit.wikimedia.org/r/507104 (https://phabricator.wikimedia.org/T209527) [19:15:24] (03CR) 10Bstorm: [C: 03+2] cloudstore: edit ferm rules a bit more [puppet] - 10https://gerrit.wikimedia.org/r/507104 (https://phabricator.wikimedia.org/T209527) (owner: 10Bstorm) [19:17:21] the bad news: ops prometheus in eqiad might be about to both OOM again. the good news: pretty sure I know exactly what caused it last week 🙃 [19:17:22] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=mw123[0-5].eqiad.wmnet [19:17:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:17:53] progress :D [19:18:23] (03PS2) 10Dzahn: remove bast2001.mgmt [dns] - 10https://gerrit.wikimedia.org/r/507102 (https://phabricator.wikimedia.org/T219492) [19:20:59] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=mw123[0-5].eqiad.wmnet [19:21:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:21:30] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=mw123[8-9].eqiad.wmnet [19:21:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:24:35] cwd: re: frmon2001 and DHCP.. so does it mean you used wmf-auto-reimage to boot that host? 
[19:25:16] RECOVERY - Check systemd state on cloudstore1009 is OK: OK - running: The system is fully operational [19:25:31] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=mw123[8-9].eqiad.wmnet [19:25:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:26:05] mutante: uuuh [19:26:09] i dont' think so? [19:26:17] tftpboot [19:26:23] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=mw124[0-4].eqiad.wmnet [19:26:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:26:32] cwd: how did you make it boot? [19:26:39] hitting F12 in console? [19:26:39] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=mw124[0-4].eqiad.wmnet [19:26:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:26:48] using the mgmt interface [19:27:32] mutante: yeah shift+esc+2 [19:28:14] cwd: so also not the " racadm config -g cfgServerInfo -o cfgServerFirstBootDevice PXE" in racadm.. i see... looking at install server logs [19:29:06] mutante: i have not typed such a thing [19:29:18] but i do see a lot of noise on the console [19:29:32] i think it's power cycling? [19:30:13] papaul: is that you^? [19:31:01] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=mw124[0-4].eqiad.wmnet [19:31:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:32:07] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=mw124[5-9].eqiad.wmnet [19:32:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:32:37] cwd: the reason is there is no DHCP server config for frmon2001 yet. The MAC address needs to be added .. 
a stanza in modules/install_server/files/dhcpd/ [19:32:39] (03CR) 10CDanis: icinga: pause nsca on reloads (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/504898 (https://phabricator.wikimedia.org/T196336) (owner: 10CDanis) [19:33:07] (03PS1) 10Papaul: DNS: remove mgmt DNS for acamar and achernar [dns] - 10https://gerrit.wikimedia.org/r/507105 [19:33:22] mutante: well we actually have the dhcp server on frack infra [19:34:07] cwd: ooh.. then everything is different for frack.. and ignore my comments [19:35:09] :) [19:35:38] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=mw124[5-9].eqiad.wmnet [19:35:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:36:05] 10Operations, 10fundraising-tech-ops, 10netops, 10Patch-For-Review: Network setup for frmon2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T221475 (10Dzahn) a:05cwdent→03Papaul [19:36:27] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=mw125[0-3].eqiad.wmnet [19:36:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:36:57] mutante: your triaging is appreciated :) [19:37:30] :) [19:39:38] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=mw125[0-3].eqiad.wmnet [19:39:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:39:54] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=mw125[4-8].eqiad.wmnet [19:39:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:41:17] jouncebot: now [19:41:17] No deployments scheduled for the next 0 hour(s) and 18 minute(s) [19:41:45] mutante: thanks for merging the g1gc patch, you haven't restarted gerrit have you? 
[19:41:58] thcipriani: no, because they were in the middle of deploying in that moment [19:42:04] cool [19:42:10] I will do that now then [19:42:19] yep:) [19:43:36] !log gerrit restart for https://gerrit.wikimedia.org/r/327763 T221026 [19:43:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:43:40] T221026: Gerrit thread use GC thrashing - https://phabricator.wikimedia.org/T221026 [19:44:14] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=mw125[4-8].eqiad.wmnet [19:44:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:44:45] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=mw126[1-4].eqiad.wmnet [19:44:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:44:59] !log gerrit back [19:45:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:45:08] (03PS2) 10Alex Monk: Remove the ssh module's unused init.pp [puppet] - 10https://gerrit.wikimedia.org/r/507095 [19:45:17] 10Operations, 10monitoring, 10Wikimedia-Incident: prometheus: usable dashboard for meta-metrics about Prometheus itself (query durations etc) - https://phabricator.wikimedia.org/T222102 (10CDanis) [19:46:48] 10Operations, 10monitoring, 10Wikimedia-Incident: prometheus: usable dashboard for meta-metrics about Prometheus itself (query durations etc) - https://phabricator.wikimedia.org/T222102 (10CDanis) [19:48:13] (03CR) 10Herron: [C: 04-1] "Looks good overall, just needs a syntax fix. Please see inline." 
(032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/506400 (https://phabricator.wikimedia.org/T220987) (owner: 10Jbond) [19:52:51] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=mw126[1-4].eqiad.wmnet [19:52:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:56:41] 10Operations, 10Core Platform Team Kanban (Blocked Externally), 10Services (blocked), 10User-Eevans, 10User-fgiunchedi: New upstream jvm-tools - https://phabricator.wikimedia.org/T178839 (10mobrovac) 05Open→03Stalled @Eevans @fgiunchedi is there a plan to resume this work or should we close this tick... [19:59:28] 10Operations, 10Electron-PDFs, 10Core Platform Team Kanban (Done with CPT), 10Design, and 4 others: Use "Charter" as preferred typeface on Electron - https://phabricator.wikimedia.org/T181200 (10mobrovac) 05Open→03Declined Declining since #proton is replacing #electron-pdfs as the default PDF rendering... [19:59:51] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=mw126[5-9].eqiad.wmnet [19:59:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:04] cscott, arlolra, subbu, bearND, and halfak: It is that lovely time of the day again! You are hereby commanded to deploy Services – Parsoid / Citoid / Mobileapps / ORES / …. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190429T2000). [20:00:04] ottomata and Pchelolo: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for eventgate deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190429T2000). [20:00:43] 10Operations, 10Electron-PDFs, 10Core Platform Team Kanban (Done with CPT), 10Services (done): pdfrender logs to /var/log/syslog as well as to /srv/log/pdfrender - https://phabricator.wikimedia.org/T191191 (10mobrovac) 05Open→03Declined Declining since #electron-pdfs is going away in favour of #proton . 
[20:00:58] i did not schedule that deploy properly...timezone faila. [20:01:22] anyway we already did it, but then discovered that the code the config was enabling wasn't deployed last week. so we will wait until the group0 train moves tomorrow. [20:01:43] 10Operations, 10Analytics, 10Analytics-EventLogging, 10Performance-Team (Radar): Upgrade python-kafka - https://phabricator.wikimedia.org/T221848 (10kchapman) [20:02:12] 10Operations, 10Cassandra, 10Core Platform Team Kanban (Done with CPT), 10Services (done), 10User-Eevans: Revisit default settings for c-foreach-restart - https://phabricator.wikimedia.org/T198787 (10mobrovac) I believe this is done, correct @Eevans ? [20:02:45] 10Operations, 10MediaWiki-Containers, 10Release-Engineering-Team, 10Core Platform Team Kanban (Done with CPT), and 4 others: FY2017/18 Program 6 - Outcome 2 - Objective 3: Integrated, container-based development environment - https://phabricator.wikimedia.org/T170456 (10mobrovac) [20:02:55] 10Operations, 10ops-codfw: rack/setup/install bast2002.wikimedia.org - https://phabricator.wikimedia.org/T196665 (10Papaul) [20:07:03] (03PS1) 10Kosta Harlan: GrowthExperiments: Enable SpecialHomepage feature for cs/kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507114 (https://phabricator.wikimedia.org/T221266) [20:07:08] (03PS1) 10Kosta Harlan: GrowthExperiments: Begin experiment for Homepage with cs/kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507115 (https://phabricator.wikimedia.org/T221266) [20:08:33] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=mw126[5-9].eqiad.wmnet [20:08:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:37] 10Operations, 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Performance-Team (Radar): Upgrade python-kafka - https://phabricator.wikimedia.org/T221848 (10Ottomata) [20:10:06] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=mw127[0-4].eqiad.wmnet 
[20:10:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:11:32] 10Operations, 10Discovery-Search, 10procurement: Price estimates for Search Platform hardware request for 2019-2020 - https://phabricator.wikimedia.org/T222104 (10Gehel) [20:11:44] robh: ^ (as discussed) [20:12:38] (03PS4) 10Alex Monk: Revert "Revert "Revert "sshd_config: Increase MaxAuthTries""" [puppet] - 10https://gerrit.wikimedia.org/r/377269 (https://phabricator.wikimedia.org/T172333) (owner: 10Alexandros Kosiaris) [20:12:40] (03PS1) 10Alex Monk: Move deployment_hosts out of network::constants [puppet] - 10https://gerrit.wikimedia.org/r/507116 (https://phabricator.wikimedia.org/T220894) [20:13:30] 10Operations, 10Patch-For-Review: Replacement of network::constant's special_hosts - https://phabricator.wikimedia.org/T220894 (10Krenair) [20:13:36] (03CR) 10jerkins-bot: [V: 04-1] Move deployment_hosts out of network::constants [puppet] - 10https://gerrit.wikimedia.org/r/507116 (https://phabricator.wikimedia.org/T220894) (owner: 10Alex Monk) [20:14:47] (03PS2) 10Alex Monk: Move deployment_hosts out of network::constants [puppet] - 10https://gerrit.wikimedia.org/r/507116 (https://phabricator.wikimedia.org/T220894) [20:18:05] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=mw127[0-4].eqiad.wmnet [20:18:07] 10Operations, 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Performance-Team (Radar): Upgrade python-kafka - https://phabricator.wikimedia.org/T221848 (10Ottomata) Built 1.4.6. .debs and added them to https://apt.wikimedia.org/wikimedia/pool/main/p/python-kafka/. @gilles, can you do the u... 
[20:18:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:36] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=mw127[5-9].eqiad.wmnet [20:18:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:38] (03CR) 10Alex Monk: "should run PCC after parent commit is merged" [puppet] - 10https://gerrit.wikimedia.org/r/507116 (https://phabricator.wikimedia.org/T220894) (owner: 10Alex Monk) [20:19:32] 10Operations, 10monitoring, 10Wikimedia-Incident: prometheus: current query limits are insufficient to prevent OOMs - https://phabricator.wikimedia.org/T222105 (10CDanis) [20:19:42] !log arlolra@deploy1001 Started deploy [parsoid/deploy@7859b58]: Updating Parsoid to c9dab9d [20:19:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:21:37] (03CR) 10Dzahn: [C: 03+2] DNS: remove mgmt DNS for acamar and achernar [dns] - 10https://gerrit.wikimedia.org/r/507105 (owner: 10Papaul) [20:23:01] 10Operations, 10ops-codfw, 10Traffic, 10decommission, 10Patch-For-Review: Decommission acamar and achernar - https://phabricator.wikimedia.org/T198286 (10Dzahn) [20:23:12] (03PS3) 10Dzahn: remove bast2001.mgmt [dns] - 10https://gerrit.wikimedia.org/r/507102 (https://phabricator.wikimedia.org/T219492) [20:25:26] (03PS2) 10Jbond: logstash: add ulog parser to logstash [puppet] - 10https://gerrit.wikimedia.org/r/506400 (https://phabricator.wikimedia.org/T220987) [20:25:58] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=mw127[5-9].eqiad.wmnet [20:26:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:26:18] !log arlolra@deploy1001 Finished deploy [parsoid/deploy@7859b58]: Updating Parsoid to c9dab9d (duration: 06m 36s) [20:26:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:27:56] (03CR) 10Dzahn: [C: 03+2] remove bast2001.mgmt [dns] - 10https://gerrit.wikimedia.org/r/507102 
(https://phabricator.wikimedia.org/T219492) (owner: 10Dzahn) [20:30:22] 10Operations, 10TechCom, 10Core Platform Team (Session Management Service (CDP2)), 10Core Platform Team Backlog (Later), and 6 others: Establish an SLA for session storage - https://phabricator.wikimedia.org/T211721 (10EvanProdromou) @mobrovac I think it's less about cross-DC writes to Kask, and more about... [20:32:30] 10Operations, 10Thumbor, 10serviceops, 10Patch-For-Review, 10Performance-Team (Radar): Build Thumbor packages for buster - https://phabricator.wikimedia.org/T221562 (10kchapman) [20:35:48] 10Operations, 10Performance-Team, 10Traffic: Send peering requests to AS with the worst TTFB - https://phabricator.wikimedia.org/T219486 (10kchapman) a:03Gilles [20:37:00] !log Deployed patch for T222014 [20:37:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:37:31] !log add BGP session to AS4922 in eqiad [20:37:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:40:46] !log Updated Parsoid to c9dab9d (T106578, T113194, T205338, T219072, T219938, T221384, T219943) [20:40:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:41:00] T205338: Extract and use a DOMHandler interface (API) for HTML -> wt handlers defined on DOM nodes - https://phabricator.wikimedia.org/T205338 [20:41:00] T221384: Wikipeg cache is unsafe when rule variables are set to null/undefined - https://phabricator.wikimedia.org/T221384 [20:41:01] T106578: Sanitizer entity restrictions should match HTML5 - https://phabricator.wikimedia.org/T106578 [20:41:01] T113194: MediaWiki fails to parse – (en dash) - https://phabricator.wikimedia.org/T113194 [20:41:01] T219938: Port HTML5Treebuilder - https://phabricator.wikimedia.org/T219938 [20:41:02] T219943: Create a composer library for wikipeg - https://phabricator.wikimedia.org/T219943 [20:41:02] T219072: Extend JS/PHP hybrid testing to other Parsoid components - 
https://phabricator.wikimedia.org/T219072 [20:49:14] 10Operations, 10TechCom, 10Core Platform Team (Session Management Service (CDP2)), 10Core Platform Team Backlog (Later), and 6 others: Establish an SLA for session storage - https://phabricator.wikimedia.org/T211721 (10EvanProdromou) On a related note, do we want or need an SLA on consistency? [20:50:05] 10Operations, 10fundraising-tech-ops, 10netops, 10Patch-For-Review: Network setup for frmon2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T221475 (10Papaul) a:05Papaul→03cwdent @cwdent the switch ports were not setup. You should be good now. ` papaul@fasw-c-codfw# run show interfaces ge-[0... [20:53:36] 10Operations, 10monitoring, 10Wikimedia-Incident: prometheus: some sort of IRC alerts on restarts? - https://phabricator.wikimedia.org/T222108 (10CDanis) [20:53:44] 10Operations, 10ops-eqiad, 10DC-Ops, 10fundraising-tech-ops: decommission frav1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T222109 (10Jgreen) [20:54:18] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: access for foks to labweb (in one way or another) (or make changePassword.php work on mwmaint hosts) - https://phabricator.wikimedia.org/T220860 (10Dzahn) a:03jrbs Let's confirm that it doesn't actually work on mwmaint and what the error is. And also... [20:55:27] 10Operations, 10ops-codfw, 10Traffic, 10decommission, 10Patch-For-Review: Decommission acamar and achernar - https://phabricator.wikimedia.org/T198286 (10Papaul) [20:55:43] 10Operations, 10ops-codfw, 10Traffic, 10decommission, 10Patch-For-Review: Decommission acamar and achernar - https://phabricator.wikimedia.org/T198286 (10Papaul) 05Open→03Resolved Complete [20:59:16] mutante: paladox just sent me https://bugs.chromium.org/p/gerrit/issues/detail?id=3259#c5 which does seem similar to what we've been seeing [21:00:03] thcipriani: ack, i am reading the same thing. 
the timeout setting doesn't seem to do anything [21:00:04] bawolff and Reedy: I, the Bot under the Fountain, allow thee, The Deployer, to do Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190429T2100). [21:00:56] that would explain why I see a sendmail thread that occasionally hangs and seems to just block other threads. [21:00:58] thcipriani: paladox: so.. yea.. fixed in java 9 and backported to SOME 1.8 versions... which we use of course [21:01:06] yeh [21:01:45] https://bugs.openjdk.java.net/browse/JDK-8075484 [21:01:58] shows it was fixed in 1.8.72 and 1.8.92 [21:02:05] i do NOT see u181 in the list [21:02:08] so it affects us [21:02:08] doesn't explicitly mention 1.8.82. [21:02:16] yeh [21:03:10] ah [21:03:11] "Unfortunately the fix was not backported to the native java version that you get when you run apt-get install java (1.8.0_181)" [21:03:23] that's in the issue tyler linked [21:03:25] well neat. [21:04:10] a java bug .. but a gerrit bug as well... nice [21:04:41] best of both worlds :P (not) [21:04:48] hehe, yea [21:05:16] upstream no longer have openjdk 8 for jessie [21:05:22] since the removal of jessie-backports [21:05:34] 10Operations, 10monitoring, 10Wikimedia-Incident: figure out why Kafka dashboard hammers Prometheus, and fix it - https://phabricator.wikimedia.org/T222112 (10CDanis) [21:07:17] thcipriani: well.. if we would just install the package upgrades: [21:07:18] (03PS3) 10Jbond: logstash: add ulog parser to logstash [puppet] - 10https://gerrit.wikimedia.org/r/506400 (https://phabricator.wikimedia.org/T220987) [21:07:19] Inst openjdk-8-jdk [8u181-b13-2] (8u212-b01-1~deb8u1 Wikimedia:8/jessie-wikimedia [amd64]) [] [21:07:30] then we would be 8u212 [21:07:45] so not even having to build new ones [21:08:08] paladox: u212 ? 
[21:08:18] ohh [21:08:22] yes let's use 212 [21:08:25] that should include it [21:09:15] (03PS4) 10Jbond: logstash: add ulog parser to logstash [puppet] - 10https://gerrit.wikimedia.org/r/506400 (https://phabricator.wikimedia.org/T220987) [21:09:54] thcipriani: should i JFDI ? [21:10:02] mutante: paladox that has actually been on my list for a while. I was hoping to isolate the tweaks we were making to gerrit's heap, so hadn't upgraded just yet. Ironic if that's been the cause. [21:10:12] heh [21:10:13] mutante: yes please [21:10:16] heh [21:10:56] !log cobalt (gerrit) upgrading openjdk 8 minor version [21:10:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:11:21] done.. the package side at least [21:12:53] k, another gerrit restart should probably do it then. [21:13:26] !log restarting gerrit [21:13:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:15:01] ok, it's back [21:15:45] thanks mutante! [21:16:13] ^ thank you mutante [21:16:46] np. lmk how the threads are looking [21:17:02] 10Operations, 10monitoring, 10Wikimedia-Incident: figure out why Kafka dashboard hammers Prometheus, and fix it - https://phabricator.wikimedia.org/T222112 (10CDanis) Very easy to reproduce this presently. Not an OOM but close. https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&from=1556564400... [21:19:02] mutante: not sure if it is relevant or actually impactful, but I'm seeing a lot more tcp attemptfails on cobalt post-restart https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&orgId=1&var-server=cobalt&var-datasource=eqiad%20prometheus%2Fops&var-cluster=misc [21:19:08] it's still just a few a second though [21:19:53] ahh and looks back to baseline now [21:20:42] cdanis: thanks. i see it and just noticed that too.. 
looks like back [21:22:23] cdanis: when zooming out to last 7 days it looks more normal [21:24:21] (03PS5) 10Alex Monk: network::constants: Move mysql_root_clients from special_hosts [puppet] - 10https://gerrit.wikimedia.org/r/505407 (https://phabricator.wikimedia.org/T220894) [21:24:45] threads are looking good! [21:25:15] :) nice [21:32:00] (03PS1) 10Bstorm: cloudstore: change direction a bit on the rsync methods [puppet] - 10https://gerrit.wikimedia.org/r/507206 (https://phabricator.wikimedia.org/T209527) [21:32:34] (03CR) 10jerkins-bot: [V: 04-1] cloudstore: change direction a bit on the rsync methods [puppet] - 10https://gerrit.wikimedia.org/r/507206 (https://phabricator.wikimedia.org/T209527) (owner: 10Bstorm) [21:33:51] 10Operations, 10monitoring, 10Wikimedia-Incident: prometheus: upgrade to 2.9.2 - https://phabricator.wikimedia.org/T222113 (10CDanis) [21:33:54] (03PS2) 10Bstorm: cloudstore: change direction a bit on the rsync methods [puppet] - 10https://gerrit.wikimedia.org/r/507206 (https://phabricator.wikimedia.org/T209527) [21:34:04] bstorm_: rsync::quickdatacopy already does all the things you want [21:34:13] and much simpler / fewer code lines [21:35:22] (03CR) 10Bstorm: [C: 03+2] cloudstore: change direction a bit on the rsync methods [puppet] - 10https://gerrit.wikimedia.org/r/507206 (https://phabricator.wikimedia.org/T209527) (owner: 10Bstorm) [21:36:47] That might be very cool. [21:36:56] I'll take a look mutante just in case it's all set there [21:37:16] Quick and simple is what I want :) [21:37:43] 10Operations, 10Patch-For-Review: Replacement of network::constant's special_hosts - https://phabricator.wikimedia.org/T220894 (10Krenair) >>! In T220894#5134357, @jcrespo wrote: > I think it is better to hardcode the constants on `modules/profile/manifests/mariadb/ferm.pp` (for now, not as an ideal situation)... [21:39:42] socket error (and threads are slightly increasing) let's see if the java update will correctly timeout sendmail! 
[21:40:06] (03PS1) 10CDanis: prometheus: 10M max-samples for all instances [puppet] - 10https://gerrit.wikimedia.org/r/507210 (https://phabricator.wikimedia.org/T222105) [21:40:44] 10Operations, 10Patch-For-Review: Replacement of network::constant's special_hosts - https://phabricator.wikimedia.org/T220894 (10Krenair) [21:42:09] (03PS6) 10Paladox: toolforge: update origin URL for integration/composer.git clones [puppet] - 10https://gerrit.wikimedia.org/r/507074 (https://phabricator.wikimedia.org/T218844) [21:45:06] (03Abandoned) 10Volans: Fix records for camera [dns] - 10https://gerrit.wikimedia.org/r/467709 (owner: 10Volans) [21:46:11] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/504898 (https://phabricator.wikimedia.org/T196336) (owner: 10CDanis) [21:48:26] seems the timeout is working (if i presume the socket errors are sendmail) threads are going down slightly [21:51:23] (03CR) 10Alex Monk: "PS5: detached from problematic parent commit, made base::firewall changes necessary to make this one work without it." [puppet] - 10https://gerrit.wikimedia.org/r/505407 (https://phabricator.wikimedia.org/T220894) (owner: 10Alex Monk) [21:51:30] (03CR) 10Alex Monk: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/505407 (https://phabricator.wikimedia.org/T220894) (owner: 10Alex Monk) [21:52:39] (03CR) 10Alex Monk: [C: 04-1] "detached child commit and reduced its scope a bit. 
might try to fix this later, but it's not crucial to getting rid of network::constants " [puppet] - 10https://gerrit.wikimedia.org/r/505406 (owner: 10Alex Monk) [21:59:49] (03CR) 10Volans: "Few comments inline" (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: 10Mathew.onipe) [22:00:45] (03Abandoned) 10Alex Monk: mariadb: Replace role::mariadb::ferm with profile::mariadb::ferm [puppet] - 10https://gerrit.wikimedia.org/r/505406 (owner: 10Alex Monk) [22:09:22] (03CR) 10Volans: "> Patch Set 3:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/505657 (https://phabricator.wikimedia.org/T215378) (owner: 10CRusnov) [22:15:43] 10Operations, 10cloud-services-team, 10serviceops: Change /r/p/ to /r/ on all hosts (where https://gerrit.wikimedia.org/r/p/ exists) - https://phabricator.wikimedia.org/T222093 (10Paladox) p:05Triage→03High [22:16:09] (03PS7) 10Paladox: toolforge: update origin URL for integration/composer.git clones [puppet] - 10https://gerrit.wikimedia.org/r/507074 (https://phabricator.wikimedia.org/T218844) [22:16:15] (03Abandoned) 10Paladox: toolforge: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507087 (https://phabricator.wikimedia.org/T218844) (owner: 10Paladox) [22:17:44] (03PS1) 10Bstorm: cloudstore: finish up the script for sync [puppet] - 10https://gerrit.wikimedia.org/r/507212 (https://phabricator.wikimedia.org/T209527) [22:18:43] 10Operations, 10cloud-services-team, 10serviceops: Change /r/p/ to /r/ on all hosts (where https://gerrit.wikimedia.org/r/p/ exists) - https://phabricator.wikimedia.org/T222093 (10Paladox) [22:19:19] (03PS5) 10Paladox: puppetmaster: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507079 (https://phabricator.wikimedia.org/T218844) [22:19:54] (03PS4) 10Paladox: authdns: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507076 (https://phabricator.wikimedia.org/T218844) [22:19:58] (03PS3) 10Paladox: 
beta: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507069 [22:20:14] (03PS4) 10Paladox: beta: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507069 (https://phabricator.wikimedia.org/T218844) [22:20:16] 10Operations, 10Wikimedia-Site-requests, 10acl*stewards: Create accounts for new stewards in closed wikis - https://phabricator.wikimedia.org/T222117 (10Base) [22:20:34] (03PS5) 10Paladox: openldap: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507088 (https://phabricator.wikimedia.org/T218844) [22:20:47] (03PS3) 10Paladox: zuul: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507070 [22:21:06] (03PS4) 10Paladox: zuul: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507070 (https://phabricator.wikimedia.org/T218844) [22:21:13] (03PS4) 10Paladox: scap: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507072 [22:21:15] (03CR) 10Bstorm: [C: 03+2] cloudstore: finish up the script for sync [puppet] - 10https://gerrit.wikimedia.org/r/507212 (https://phabricator.wikimedia.org/T209527) (owner: 10Bstorm) [22:21:24] (03PS5) 10Paladox: scap: Stop cloning over /p/ [puppet] - 10https://gerrit.wikimedia.org/r/507072 (https://phabricator.wikimedia.org/T218844) [22:27:07] (03PS1) 10Bstorm: cloudstore: cleanup extraneous bits [puppet] - 10https://gerrit.wikimedia.org/r/507213 (https://phabricator.wikimedia.org/T209527) [22:28:56] PROBLEM - Check systemd state on ms-be2014 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[22:29:05] (03CR) 10Bstorm: [C: 03+2] cloudstore: cleanup extraneous bits [puppet] - 10https://gerrit.wikimedia.org/r/507213 (https://phabricator.wikimedia.org/T209527) (owner: 10Bstorm) [22:38:06] RECOVERY - Check systemd state on ms-be2014 is OK: OK - running: The system is fully operational [22:42:23] (03PS1) 10Bstorm: cloudstore: the cluster ip must be passed through [puppet] - 10https://gerrit.wikimedia.org/r/507216 [22:43:11] (03CR) 10jerkins-bot: [V: 04-1] cloudstore: the cluster ip must be passed through [puppet] - 10https://gerrit.wikimedia.org/r/507216 (owner: 10Bstorm) [22:45:06] (03PS2) 10Bstorm: cloudstore: the cluster ip must be passed through [puppet] - 10https://gerrit.wikimedia.org/r/507216 [22:45:32] (03PS1) 10Ayounsi: Add the Juniper to Netbox import script. [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/507217 [22:45:59] (03CR) 10jerkins-bot: [V: 04-1] cloudstore: the cluster ip must be passed through [puppet] - 10https://gerrit.wikimedia.org/r/507216 (owner: 10Bstorm) [22:47:11] (03PS3) 10Bstorm: cloudstore: the cluster ip must be passed through [puppet] - 10https://gerrit.wikimedia.org/r/507216 (https://phabricator.wikimedia.org/T209527) [22:48:23] (03CR) 10Bstorm: [C: 03+2] cloudstore: the cluster ip must be passed through [puppet] - 10https://gerrit.wikimedia.org/r/507216 (https://phabricator.wikimedia.org/T209527) (owner: 10Bstorm) [22:54:06] How do you do "forgot password" / reset on wikitech currently? [22:55:58] You can't, I don't think [22:56:51] well then.. 
can't create a user [22:56:56] password i used doesn't work [22:57:01] user is already created [22:57:11] emailing it automatically has a warning to NOT use it [22:57:20] can't re-create same user again [22:59:02] mutante you could set their password and email it (i guess, but ask them to immediately change it) [22:59:16] paladox: yea, that's why i asked how i set the password :) [22:59:30] ah, I'm not entirely sure how it's done in prod [22:59:34] that's exactly what i wanted to do [22:59:58] but first i need to know the working password [23:00:04] MaxSem, RoanKattouw, and Niharika: Time to snap out of that daydream and deploy Evening SWAT (Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190429T2300). [23:00:04] kaldari: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:01:31] (03PS1) 10EBernhardson: Add cloudelastic servers to wgCirrusSearchClusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) [23:01:42] i'm also shipping something (not quite in the deployment calendar yet...) can run SWAT [23:02:18] (03CR) 10jerkins-bot: [V: 04-1] Add cloudelastic servers to wgCirrusSearchClusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [23:03:47] kaldari: around? gerrit doesn't want to rebase your patch. 
But looks trivial enough i can do it myself in a minute [23:03:56] !log smalyshev@deploy1001 Started deploy [wdqs/wdqs@65796ad]: New deploy with GUI fix [23:03:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:07:22] (03PS1) 10Bstorm: cloudstore: correct python syntax [puppet] - 10https://gerrit.wikimedia.org/r/507220 (https://phabricator.wikimedia.org/T209527) [23:08:00] 10Operations, 10Analytics, 10EventBus, 10serviceops, and 4 others: Enabling api-request eventgate to group1 caused minor service disruptions - https://phabricator.wikimedia.org/T218255 (10mobrovac) [23:08:37] (03CR) 10Bstorm: [C: 03+2] cloudstore: correct python syntax [puppet] - 10https://gerrit.wikimedia.org/r/507220 (https://phabricator.wikimedia.org/T209527) (owner: 10Bstorm) [23:09:13] mutante, yeah password reset is disabled at the moment. you should be able to use changePassword.php [23:10:12] something like `mwscript changePassword.php labswiki --user HMonroy --password ` [23:10:18] from a labweb server [23:11:15] (probably) [23:11:54] Mhm, I forgot about that [23:12:24] (03PS2) 10EBernhardson: Add cloudelastic servers to wgCirrusSearchClusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) [23:13:14] (03CR) 10jerkins-bot: [V: 04-1] Add cloudelastic servers to wgCirrusSearchClusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [23:15:15] (03PS3) 10EBernhardson: Add cloudelastic servers to wgCirrusSearchClusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) [23:16:06] (03CR) 10jerkins-bot: [V: 04-1] Add cloudelastic servers to wgCirrusSearchClusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [23:16:19] (03PS2) 10EBernhardson: Add static.inaturalist.org to $wgCopyUploadsDomains 
for Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/504469 (https://phabricator.wikimedia.org/T221154) (owner: 10Kaldari) [23:17:54] (03PS4) 10EBernhardson: Add cloudelastic servers to wgCirrusSearchClusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) [23:18:48] (03CR) 10jerkins-bot: [V: 04-1] Add cloudelastic servers to wgCirrusSearchClusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [23:19:50] (03PS2) 10Nuria: admin: allow analytics-admins to control jupyter user units [puppet] - 10https://gerrit.wikimedia.org/r/504067 (owner: 10Elukey) [23:22:05] (03PS1) 10Bstorm: cloudstore: touch up the script a bit from testing [puppet] - 10https://gerrit.wikimedia.org/r/507222 (https://phabricator.wikimedia.org/T209527) [23:22:58] Krenair: thanks, i got it working with slappasswd and modify-ldap-user but had to install the former locally [23:23:24] waiting for examples of changePassword.php myself to fix another access request thing [23:23:50] [00:09:44] something like `mwscript changePassword.php labswiki --user HMonroy --password ` [23:24:20] PROBLEM - Host db1093 is DOWN: PING CRITICAL - Packet loss = 100% [23:26:16] (03PS2) 10Bstorm: cloudstore: touch up the script a bit from testing [puppet] - 10https://gerrit.wikimedia.org/r/507222 (https://phabricator.wikimedia.org/T209527) [23:27:03] (03CR) 10Bstorm: [C: 03+2] cloudstore: touch up the script a bit from testing [puppet] - 10https://gerrit.wikimedia.org/r/507222 (https://phabricator.wikimedia.org/T209527) (owner: 10Bstorm) [23:29:21] (03PS5) 10EBernhardson: Add cloudelastic servers to wgCirrusSearchClusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) [23:29:32] kaldari: i'm just going to ship your patch, seems safe enough [23:30:11] (03CR) 10jerkins-bot: [V: 04-1] Add cloudelastic servers to 
wgCirrusSearchClusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [23:30:22] (03CR) 10EBernhardson: [C: 03+2] Add static.inaturalist.org to $wgCopyUploadsDomains for Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/504469 (https://phabricator.wikimedia.org/T221154) (owner: 10Kaldari) [23:31:24] (03Merged) 10jenkins-bot: Add static.inaturalist.org to $wgCopyUploadsDomains for Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/504469 (https://phabricator.wikimedia.org/T221154) (owner: 10Kaldari) [23:31:57] (03PS6) 10EBernhardson: Add cloudelastic servers to wgCirrusSearchClusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) [23:33:28] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T221154: Add static.inaturalist.org to $wgCopyUploadDomains for Commons (duration: 00m 54s) [23:33:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:33:32] T221154: Add static.inaturalist.org to $wgCopyUploadsDomains for Commons - https://phabricator.wikimedia.org/T221154 [23:33:38] (03CR) 10jenkins-bot: Add static.inaturalist.org to $wgCopyUploadsDomains for Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/504469 (https://phabricator.wikimedia.org/T221154) (owner: 10Kaldari) [23:34:59] !log smalyshev@deploy1001 Finished deploy [wdqs/wdqs@65796ad]: New deploy with GUI fix (duration: 31m 04s) [23:35:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:35:08] here :) [23:35:27] Thanks for the swat! 
[23:37:22] kaldari: :) [23:37:32] PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [23:37:36] :S [23:37:58] sorry I missed you :P [23:38:19] But yeah, it was good to just deploy. Thanks! [23:39:02] 5xx alert may be related, the timing was exactly right. But it seems to have subsided already [23:39:34] not sure how though, that patch shouldn't be able to trigger 5xx... [23:39:51] oh, i bet it's the hhvm memory thing [23:40:50] (03CR) 10EBernhardson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [23:41:35] (03PS7) 10EBernhardson: Add cloudelastic servers to wgCirrusSearchClusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) [23:42:14] (03CR) 10EBernhardson: [C: 03+2] Add cloudelastic servers to wgCirrusSearchClusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [23:43:18] (03Merged) 10jenkins-bot: Add cloudelastic servers to wgCirrusSearchClusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [23:45:01] (03CR) 10jenkins-bot: Add cloudelastic servers to wgCirrusSearchClusters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/507219 (https://phabricator.wikimedia.org/T220625) (owner: 10EBernhardson) [23:45:22] RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [23:45:58] (03PS1) 10Ayounsi: 
Netbox: add j2nb support [puppet] - 10https://gerrit.wikimedia.org/r/507224 [23:54:21] !log ebernhardson@deploy1001 Synchronized tests/: T220625 Add cloudelastic servers to wgCirrusSearchClusters (1/5) (duration: 00m 53s) [23:54:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:54:25] T220625: Initialize CirrusSearch on cloudelastic - https://phabricator.wikimedia.org/T220625 [23:55:33] !log ebernhardson@deploy1001 Synchronized wmf-config/LabsServices.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (2/5) (duration: 00m 52s) [23:55:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:56:47] !log ebernhardson@deploy1001 Synchronized wmf-config/ProductionServices.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (3/5) (duration: 00m 50s) [23:56:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:58:34] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (4/5) (duration: 00m 52s) [23:58:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:59:46] !log ebernhardson@deploy1001 Synchronized wmf-config/CirrusSearch-production.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (5/5) (duration: 00m 52s) [23:59:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:59:49] T220625: Initialize CirrusSearch on cloudelastic - https://phabricator.wikimedia.org/T220625