[00:14:09] Operations, ops-codfw, serviceops: wtp2005 hardware issue - https://phabricator.wikimedia.org/T257903 (Papaul) @Dzahn I noticed that this server is not present in Icinga and has status "failed" in Netbox. Can we turn this task to a decommission task if the server is no longer needed in production an...
[00:21:22] Operations, MW-on-K8s, TechCom-RFC, serviceops: Decide on logging in k8s for ShellBox - https://phabricator.wikimedia.org/T263545 (tstarling) > https://pracucci.com/php-on-kubernetes-application-logging-via-unix-pipe.html provides two options for how php-fit could be set up to get the logs from w...
[01:46:27] Operations, MW-on-K8s, serviceops: Decide on logging in k8s for ShellBox - https://phabricator.wikimedia.org/T263545 (tstarling)
[04:37:59] (PS4) KartikMistry: ContentTranslation: Remove testwiki from extension1 cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/628790 (https://phabricator.wikimedia.org/T263417)
[04:38:44] (PS4) KartikMistry: ContentTranslation: Remove test2wiki from wgContentTranslationAsBetaFeature [mediawiki-config] - https://gerrit.wikimedia.org/r/628780
[05:02:35] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2074 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12739 and previous config saved to /var/cache/conftool/dbconfig/20200923-050234-marostegui.json
[05:02:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:03:15] !log Remove triggers from db2094:3313 for MCR schema change T238966
[05:03:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:03:19] T238966: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966
[05:14:32] (PS1) Marostegui: mariadb: Remove es2014 puppet references [puppet] - https://gerrit.wikimedia.org/r/629252 (https://phabricator.wikimedia.org/T262889)
[05:16:01] !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission
[05:16:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:16:44] (CR) Marostegui: [C: +2] mariadb: Remove es2014 puppet references [puppet] - https://gerrit.wikimedia.org/r/629252 (https://phabricator.wikimedia.org/T262889) (owner: Marostegui)
[05:17:34] Operations, ops-eqiad, DBA, DC-Ops: (Need By: 2020-09-15) rack/setup/install db1150 (see note on hostname) - https://phabricator.wikimedia.org/T260817 (jcrespo) Any update on this? A week has passed beyond the "Need by" date.
[05:21:54] Operations, DBA, Performance-Team, Platform Engineering, User-Kormat: Remove sections from db configs - https://phabricator.wikimedia.org/T263127 (Marostegui) >>! In T263127#6485437, @daniel wrote: > Contributions queries are somewhat special, we may want to keep them separate in case we want...
[05:33:38] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[05:33:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:34:56] Operations, ops-codfw, DC-Ops, decommission-hardware: decommission es2014.codfw.wmnet - https://phabricator.wikimedia.org/T262889 (Marostegui) a:Marostegui→Papaul
[05:34:57] Operations, ops-codfw, DC-Ops, decommission-hardware: decommission es2014.codfw.wmnet - https://phabricator.wikimedia.org/T262889 (Marostegui) Host ready for #dc-ops steps!
[05:37:53] !log Purge global_status_log table on tendril - T252331
[05:37:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:37:58] T252331: tendril_purge_global_status_log_5m and global_status_log needs more frequent purging - https://phabricator.wikimedia.org/T252331
[05:43:27] PROBLEM - HTTPS-dbtree on dbmonitor1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dbtree.wikimedia.org
[05:45:15] RECOVERY - HTTPS-dbtree on dbmonitor1001 is OK: HTTP OK: HTTP/1.1 200 OK - 92419 bytes in 0.574 second response time https://wikitech.wikimedia.org/wiki/Dbtree.wikimedia.org
[05:55:31] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2084 T262856', diff saved to https://phabricator.wikimedia.org/P12741 and previous config saved to /var/cache/conftool/dbconfig/20200923-055531-marostegui.json
[05:55:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:55:39] T262856: Investigate indexes of wb_changes - https://phabricator.wikimedia.org/T262856
[05:57:02] (CR) Santhosh: [C: +2] ContentTranslation: Remove test2wiki from wgContentTranslationAsBetaFeature [mediawiki-config] - https://gerrit.wikimedia.org/r/628780 (owner: KartikMistry)
[05:57:52] (Merged) jenkins-bot: ContentTranslation: Remove test2wiki from wgContentTranslationAsBetaFeature [mediawiki-config] - https://gerrit.wikimedia.org/r/628780 (owner: KartikMistry)
[05:58:51] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db2084 after index removal T262856', diff saved to https://phabricator.wikimedia.org/P12742 and previous config saved to /var/cache/conftool/dbconfig/20200923-055850-marostegui.json
[05:58:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:02:21] (CR) Volans: [C: +2] dns: add icinga check mode [software/netbox-extras] - https://gerrit.wikimedia.org/r/628898 (owner: Volans)
[06:04:26] (CR) Santhosh: [C: +1] ContentTranslation: Remove testwiki from extension1 cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/628790 (https://phabricator.wikimedia.org/T263417) (owner: KartikMistry)
[06:06:46] (PS1) KartikMistry: Revert "ContentTranslation: Remove test2wiki from wgContentTranslationAsBetaFeature" [mediawiki-config] - https://gerrit.wikimedia.org/r/629194
[06:08:12] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool es2012 for decommmissioning', diff saved to https://phabricator.wikimedia.org/P12743 and previous config saved to /var/cache/conftool/dbconfig/20200923-060812-marostegui.json
[06:08:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:08:20] (CR) KartikMistry: [C: +2] "Revert." [mediawiki-config] - https://gerrit.wikimedia.org/r/629194 (owner: KartikMistry)
[06:09:08] (Merged) jenkins-bot: Revert "ContentTranslation: Remove test2wiki from wgContentTranslationAsBetaFeature" [mediawiki-config] - https://gerrit.wikimedia.org/r/629194 (owner: KartikMistry)
[06:12:06] (PS1) Marostegui: es2012: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/629258 (https://phabricator.wikimedia.org/T263613)
[06:12:40] (CR) Marostegui: [C: +2] es2012: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/629258 (https://phabricator.wikimedia.org/T263613) (owner: Marostegui)
[06:19:04] (CR) Nikerabbit: "Do not deploy before tables are created." [mediawiki-config] - https://gerrit.wikimedia.org/r/628790 (https://phabricator.wikimedia.org/T263417) (owner: KartikMistry)
[06:21:23] (PS1) Volans: dns: make the generation script executable [software/netbox-extras] - https://gerrit.wikimedia.org/r/629259 (https://phabricator.wikimedia.org/T258729)
[06:23:19] (CR) Volans: [C: +2] "Just file mode change, self merging" [software/netbox-extras] - https://gerrit.wikimedia.org/r/629259 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[06:31:40] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool es2018 T263615', diff saved to https://phabricator.wikimedia.org/P12744 and previous config saved to /var/cache/conftool/dbconfig/20200923-063140-marostegui.json
[06:31:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:31:48] T263615: decommission es2018.codfw.wmnet - https://phabricator.wikimedia.org/T263615
[06:32:44] (PS1) Marostegui: es2018: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/629315 (https://phabricator.wikimedia.org/T263615)
[06:33:24] (CR) Marostegui: [C: +2] es2018: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/629315 (https://phabricator.wikimedia.org/T263615) (owner: Marostegui)
[06:34:01] !log Stop MySQL on es2012 and es2018 T263613 T263615
[06:34:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:34:07] T263613: decommission es2012.codfw.wmnet - https://phabricator.wikimedia.org/T263613
[06:49:49] (PS2) Urbanecm: Enable wgCheckUserLogLogins at all wikis but few large wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/629227 (https://phabricator.wikimedia.org/T253802)
[06:49:56] (CR) Urbanecm: Enable wgCheckUserLogLogins at all wikis but few large wikis (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/629227 (https://phabricator.wikimedia.org/T253802) (owner: Urbanecm)
[06:50:22] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2022 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:51:40] (PS3) Urbanecm: Enable wgCheckUserLogLogins at all wikis but few large wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/629227 (https://phabricator.wikimedia.org/T253802)
[06:54:05] (PS1) Jcrespo: remote backups: Modify the scheduler so it parallelizes per destination [software/wmfbackups] - https://gerrit.wikimedia.org/r/629316 (https://phabricator.wikimedia.org/T257551)
[06:54:30] (CR) jerkins-bot: [V: -1] remote backups: Modify the scheduler so it parallelizes per destination [software/wmfbackups] - https://gerrit.wikimedia.org/r/629316 (https://phabricator.wikimedia.org/T257551) (owner: Jcrespo)
[06:56:02] PROBLEM - Check systemd state on ms-be2022 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:56:55] (PS2) Jcrespo: remote backups: Modify the scheduler so it parallelizes per destination [software/wmfbackups] - https://gerrit.wikimedia.org/r/629316 (https://phabricator.wikimedia.org/T257551)
[06:57:18] (CR) jerkins-bot: [V: -1] remote backups: Modify the scheduler so it parallelizes per destination [software/wmfbackups] - https://gerrit.wikimedia.org/r/629316 (https://phabricator.wikimedia.org/T257551) (owner: Jcrespo)
[06:58:06] (PS3) Jcrespo: remote backups: Modify the scheduler so it parallelizes per destination [software/wmfbackups] - https://gerrit.wikimedia.org/r/629316 (https://phabricator.wikimedia.org/T257551)
[07:09:07] (CR) Jcrespo: [C: +2] remote backups: Modify the scheduler so it parallelizes per destination [software/wmfbackups] - https://gerrit.wikimedia.org/r/629316 (https://phabricator.wikimedia.org/T257551) (owner: Jcrespo)
[07:09:26] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2084 to re-add change_revision_id index T262856', diff saved to https://phabricator.wikimedia.org/P12745 and previous config saved to /var/cache/conftool/dbconfig/20200923-070926-marostegui.json
[07:09:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:32] T262856: Investigate indexes of wb_changes - https://phabricator.wikimedia.org/T262856
[07:11:29] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12746 and previous config saved to /var/cache/conftool/dbconfig/20200923-071129-marostegui.json
[07:11:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:20:25] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be2022 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[07:27:01] (CR) Muehlenhoff: [C: +1] "Looks good to me, I like it!" [puppet] - https://gerrit.wikimedia.org/r/626723 (owner: Jbond)
[07:30:15] RECOVERY - Check systemd state on ms-be2022 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:48:35] Operations, Product-Infrastructure-Team-Backlog, Wikifeeds, serviceops, Patch-For-Review: [Bug] The feed/featured endpoint is broken - https://phabricator.wikimedia.org/T263043 (Joe) a:Joe→None De-assigning from me as the immediate bug is solved, and Petr is working on the long-term fix.
[07:49:43] Operations, Analytics-Radar, Patch-For-Review: Move Hue to a Buster VM - https://phabricator.wikimedia.org/T258768 (elukey) Stalled→Open Also opened https://github.com/cloudera/hue/pull/1271
[07:50:44] Operations, Product-Infrastructure-Team-Backlog, RESTBase, Wikifeeds, serviceops: wikifeeds OpenAPI spec test doesn't fail if the response from `feed/featured` is malformed - https://phabricator.wikimedia.org/T263097 (Joe) a:Joe→None
[08:03:19] Operations, Product-Infrastructure-Team-Backlog, RESTBase, Wikifeeds, serviceops: Wikifeeds should send uncachable response in case of some upstream failure - https://phabricator.wikimedia.org/T263100 (Joe) a:Joe→None
[08:03:27] (CR) ArielGlenn: "Before I look at this in detail:" [puppet] - https://gerrit.wikimedia.org/r/629121 (https://phabricator.wikimedia.org/T259067) (owner: Cparle)
[08:03:49] Operations, Patch-For-Review: Migrate dbmonitor hosts to Buster - https://phabricator.wikimedia.org/T224589 (Joe) a:Joe→None
[08:04:48] (CR) Filippo Giunchedi: [C: +2] profile: add queues to rsyslog kafka output [puppet] - https://gerrit.wikimedia.org/r/627865 (https://phabricator.wikimedia.org/T226703) (owner: Filippo Giunchedi)
[08:06:51] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12747 and previous config saved to /var/cache/conftool/dbconfig/20200923-080651-marostegui.json
[08:06:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:06:56] T262856: Investigate indexes of wb_changes - https://phabricator.wikimedia.org/T262856
[08:09:06] Operations, DBA, Performance-Team, Platform Engineering, User-Kormat: Remove sections from db configs - https://phabricator.wikimedia.org/T263127 (daniel) >>! In T263127#6486164, @Marostegui wrote: >>>! In T263127#6485437, @daniel wrote: >> Contributions queries are somewhat special, we may w...
[08:14:34] (PS5) Filippo Giunchedi: am: use status.cgi JSON as source for problems [debs/prometheus-icinga-exporter] - https://gerrit.wikimedia.org/r/628090
[08:15:33] godog: we don't need a new problem source, we are already over-supplied ;)
[08:15:44] Hey everyone, there is currently one wiki pending creation, and I would like to check with the mighty SREs if it is wise to create a wiki when we run via codfw. The task is T262812, if interested.
[08:15:45] T262812: Create private arbcom-ru wiki - https://phabricator.wikimedia.org/T262812
[08:15:53] ping marostegui ^^
[08:16:00] kormat: haha! well played
[08:16:10] Urbanecm: sweet, is it done already?
[08:16:31] marostegui: no, I don't want to create a wiki when we run via codfw unless you tell me it's fine :-)
[08:17:37] Urbanecm: I don't know if the process changes in anyway, from a DB point of view, I think it should be transparent (like any other write) but MW-wise that I don't know :)
[08:19:26] marostegui: thanks.
[08:21:09] For the MW side of things, I think creating a wiki is only a (complex) config change - but I'm not sure about traffic and services side of things
[08:23:22] Urbanecm: Yeah, I cannot help you with that :(
[08:23:45] I totally understand, I'll try to catch someone with more knowledge about those sides later :-).
[08:23:57] Operations, DBA, Performance-Team, Platform Engineering, User-Kormat: Remove sections from db configs - https://phabricator.wikimedia.org/T263127 (jcrespo) Daniel: load groups support may be ok- however some research should be done that they *actually* provide a performance benefit. What was...
[08:23:59] but I have one reason less to worry about - DB creation 🙂
[08:29:53] !log installing dbus security updates on buster
[08:29:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:00] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12748 and previous config saved to /var/cache/conftool/dbconfig/20200923-083200-marostegui.json
[08:32:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:05] T262856: Investigate indexes of wb_changes - https://phabricator.wikimedia.org/T262856
[08:32:12] Operations, MW-on-K8s, serviceops: Decide on logging in k8s for ShellBox - https://phabricator.wikimedia.org/T263545 (tstarling) @Joe says that application logs can go to logstash while php-fpm error logs will go to k8s. I want to say that the simplest way to send application logs to logstash is to u...
[08:38:03] (PS5) Ema: varnish: install packages specifying archive component [puppet] - https://gerrit.wikimedia.org/r/629094 (https://phabricator.wikimedia.org/T261632)
[08:38:37] (CR) Arturo Borrero Gonzalez: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/628879 (https://phabricator.wikimedia.org/T263339) (owner: Bstorm)
[08:44:36] (PS1) Volans: netbox: add check for uncommitted DNS changes [puppet] - https://gerrit.wikimedia.org/r/629321 (https://phabricator.wikimedia.org/T258729)
[08:45:18] (CR) Ema: [C: +2] varnish: install packages specifying archive component [puppet] - https://gerrit.wikimedia.org/r/629094 (https://phabricator.wikimedia.org/T261632) (owner: Ema)
[08:45:30] (CR) jerkins-bot: [V: -1] netbox: add check for uncommitted DNS changes [puppet] - https://gerrit.wikimedia.org/r/629321 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[08:46:40] (PS2) Volans: netbox: add check for uncommitted DNS changes [puppet] - https://gerrit.wikimedia.org/r/629321 (https://phabricator.wikimedia.org/T258729)
[09:00:51] (PS3) Volans: netbox: add check for uncommitted DNS changes [puppet] - https://gerrit.wikimedia.org/r/629321 (https://phabricator.wikimedia.org/T258729)
[09:06:57] (PS1) Arturo Borrero Gonzalez: hiera: labtestvirt2003: fix hiera key names [puppet] - https://gerrit.wikimedia.org/r/629323 (https://phabricator.wikimedia.org/T261724)
[09:09:07] (PS15) Elukey: WIP: profile::hadoop::common: add datanode mountpoints override [puppet] - https://gerrit.wikimedia.org/r/629165
[09:11:00] (PS16) Elukey: WIP: profile::hadoop::common: add datanode mountpoints override [puppet] - https://gerrit.wikimedia.org/r/629165
[09:11:55] (CR) Arturo Borrero Gonzalez: [C: +2] hiera: labtestvirt2003: fix hiera key names [puppet] - https://gerrit.wikimedia.org/r/629323 (https://phabricator.wikimedia.org/T261724) (owner: Arturo Borrero Gonzalez)
[09:14:39] (PS17) Elukey: profile::hadoop::common: add datanode mountpoints override [puppet] - https://gerrit.wikimedia.org/r/629165
[09:18:10] Operations, DBA, Performance-Team, Platform Engineering, User-Kormat: Remove sections from db configs - https://phabricator.wikimedia.org/T263127 (ArielGlenn) >>! In T263127#6470471, @Marostegui wrote: > So in the past, the hosts serving those 5 groups used to have different schema partitioni...
[09:18:11] (PS18) Elukey: profile::hadoop::common: add datanode mountpoints override [puppet] - https://gerrit.wikimedia.org/r/629165
[09:30:07] (PS1) JMeybohm: service: add TLS endpoint for mathoid 1/2 [puppet] - https://gerrit.wikimedia.org/r/629325 (https://phabricator.wikimedia.org/T255875)
[09:30:09] (PS1) JMeybohm: service: add TLS endpoint for mathoid 2/2 [puppet] - https://gerrit.wikimedia.org/r/629326 (https://phabricator.wikimedia.org/T255875)
[09:30:11] (PS1) JMeybohm: services_proxy: switch mathoid to the TLS endpoint [puppet] - https://gerrit.wikimedia.org/r/629327 (https://phabricator.wikimedia.org/T255875)
[09:30:13] (PS1) JMeybohm: lvs: Remove mathoid non-TLS endpoint 1/2 [puppet] - https://gerrit.wikimedia.org/r/629328 (https://phabricator.wikimedia.org/T255875)
[09:30:15] (PS1) JMeybohm: lvs: Remove mathoid non-TLS endpoint 2/2 [puppet] - https://gerrit.wikimedia.org/r/629329 (https://phabricator.wikimedia.org/T255875)
[09:34:48] (CR) Giuseppe Lavagetto: [C: -1] "The idea is sound and the code is flawless. I would nonetheless get rid of the wmflib::codename type and just validate input within the fu" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/626723 (owner: Jbond)
[09:34:51] (CR) JMeybohm: [C: +1] services: add TLS encrypted endpoint for wikifeeds (1/2) [puppet] - https://gerrit.wikimedia.org/r/629154 (https://phabricator.wikimedia.org/T255878) (owner: Giuseppe Lavagetto)
[09:35:07] Operations, Traffic, Performance-Team (Radar): Depooling single text caching server in esams had a disproportionate performance impact - https://phabricator.wikimedia.org/T238085 (ema)
[09:35:13] Operations, Traffic, Performance-Team (Radar), Sustainability (Incident Followup): Edge cache response time per server should be monitored - https://phabricator.wikimedia.org/T238086 (ema) Resolved→Open The dashboard has stopped working on September 1st: {F32360683}
[09:35:36] (CR) JMeybohm: [C: -1] "You may want to add a stanza for the TLS endpoint to modules/lvs/manifests/monitor_services.pp" [puppet] - https://gerrit.wikimedia.org/r/629155 (https://phabricator.wikimedia.org/T255878) (owner: Giuseppe Lavagetto)
[09:36:23] (CR) Giuseppe Lavagetto: [C: -1] service: add TLS endpoint for mathoid 1/2 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/629325 (https://phabricator.wikimedia.org/T255875) (owner: JMeybohm)
[09:37:05] (CR) Giuseppe Lavagetto: [C: +1] service: add TLS endpoint for mathoid 2/2 [puppet] - https://gerrit.wikimedia.org/r/629326 (https://phabricator.wikimedia.org/T255875) (owner: JMeybohm)
[09:37:18] (CR) Giuseppe Lavagetto: [C: +1] services_proxy: switch mathoid to the TLS endpoint [puppet] - https://gerrit.wikimedia.org/r/629327 (https://phabricator.wikimedia.org/T255875) (owner: JMeybohm)
[09:37:39] (CR) Giuseppe Lavagetto: [C: +1] lvs: Remove mathoid non-TLS endpoint 1/2 [puppet] - https://gerrit.wikimedia.org/r/629328 (https://phabricator.wikimedia.org/T255875) (owner: JMeybohm)
[09:37:54] (CR) Giuseppe Lavagetto: [C: +1] lvs: Remove mathoid non-TLS endpoint 2/2 [puppet] - https://gerrit.wikimedia.org/r/629329 (https://phabricator.wikimedia.org/T255875) (owner: JMeybohm)
[09:39:08] (PS2) JMeybohm: service: add TLS endpoint for mathoid 1/2 [puppet] - https://gerrit.wikimedia.org/r/629325 (https://phabricator.wikimedia.org/T255875)
[09:39:10] (PS2) JMeybohm: service: add TLS endpoint for mathoid 2/2 [puppet] - https://gerrit.wikimedia.org/r/629326 (https://phabricator.wikimedia.org/T255875)
[09:39:12] (PS2) JMeybohm: services_proxy: switch mathoid to the TLS endpoint [puppet] - https://gerrit.wikimedia.org/r/629327 (https://phabricator.wikimedia.org/T255875)
[09:39:14] (PS2) JMeybohm: lvs: Remove mathoid non-TLS endpoint 1/2 [puppet] - https://gerrit.wikimedia.org/r/629328 (https://phabricator.wikimedia.org/T255875)
[09:39:16] (PS2) JMeybohm: lvs: Remove mathoid non-TLS endpoint 2/2 [puppet] - https://gerrit.wikimedia.org/r/629329 (https://phabricator.wikimedia.org/T255875)
[09:39:43] (CR) Alexandros Kosiaris: [C: -1] "I can see the appeal of a newer and more modern function, not so sure about the naming. Couple of inline comments." (5 comments) [puppet] - https://gerrit.wikimedia.org/r/626723 (owner: Jbond)
[09:40:00] (CR) JMeybohm: service: add TLS endpoint for mathoid 1/2 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/629325 (https://phabricator.wikimedia.org/T255875) (owner: JMeybohm)
[09:40:39] Operations, Prod-Kubernetes, serviceops, Kubernetes: Move zotero to use TLS only - https://phabricator.wikimedia.org/T255869 (JMeybohm) a:JMeybohm
[09:41:15] Operations, observability: Grafana error: "parse error at char 1: unexpected character: '\\ufeff'" when copy-pasting metric names - https://phabricator.wikimedia.org/T263624 (ema) p:Triage→Medium
[09:42:42] (CR) Elukey: "Andrew: this seems to work, my original idea didn't work and puppet defeated me. Let me know if you like this, I am not strongly in favor " [puppet] - https://gerrit.wikimedia.org/r/629165 (owner: Elukey)
[09:43:13] "puppet defeated me" - we should have a 12-steps support group for this
[09:44:23] (PS6) Kormat: bsection: Script for binary-searching log files. [puppet] - https://gerrit.wikimedia.org/r/627841
[09:45:11] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12750 and previous config saved to /var/cache/conftool/dbconfig/20200923-094511-marostegui.json
[09:45:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:45:18] T262856: Investigate indexes of wb_changes - https://phabricator.wikimedia.org/T262856
[09:48:56] (PS8) Lucas Werkmeister (WMDE): Remove $wgExtraLanguageNames from Wikidata and Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/620050 (https://phabricator.wikimedia.org/T260118) (owner: Guergana Tzatchkova)
[09:49:11] I’m going to deploy some config changes for Wikidata now
[09:49:34] (CR) Lucas Werkmeister (WMDE): [C: +2] Remove $wgExtraLanguageNames from Wikidata and Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/620050 (https://phabricator.wikimedia.org/T260118) (owner: Guergana Tzatchkova)
[09:50:25] (Merged) jenkins-bot: Remove $wgExtraLanguageNames from Wikidata and Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/620050 (https://phabricator.wikimedia.org/T260118) (owner: Guergana Tzatchkova)
[09:55:44] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:620050|Remove $wgExtraLanguageNames from Wikidata and Commons (T260118)]], part 1/2 (duration: 01m 16s)
[09:55:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:50] T260118: Move content of $wgExtraLanguageNames on Wikidata to default Terms languages - https://phabricator.wikimedia.org/T260118
[09:57:05] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:620050|Remove $wgExtraLanguageNames from Wikidata and Commons (T260118)]], part 2/2 (production no-op) (duration: 01m 04s)
[09:57:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:57] (PS2) Lucas Werkmeister (WMDE): Configure entityDataCachePaths for Wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/629133
[09:58:08] (CR) Lucas Werkmeister (WMDE): [C: +2] Configure entityDataCachePaths for Wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/629133 (owner: Lucas Werkmeister (WMDE))
[09:58:50] (Merged) jenkins-bot: Configure entityDataCachePaths for Wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/629133 (owner: Lucas Werkmeister (WMDE))
[09:59:08] !log update puppet compilter's facts
[09:59:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:18] "compilter's"
[09:59:21] nice one Luca
[09:59:29] hehe
[10:01:02] a filter with computed criteria? :>
[10:01:45] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/Wikibase.php: Config: [[gerrit:629133|Configure entityDataCachePaths for Wikibase]] (duration: 01m 05s)
[10:01:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:53] Operations, Traffic: Upgrade a production cache node to Varnish 6 - https://phabricator.wikimedia.org/T263557 (ema) I've upgraded deployment-cache-text06 to Varnish 6, and https://en.wikipedia.beta.wmflabs.org looks fine. Later today I'll use Varnish 6 to run our VTC tests, and then proceed with the upgr...
[10:01:57] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12751 and previous config saved to /var/cache/conftool/dbconfig/20200923-100156-marostegui.json
[10:02:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:02:02] T262856: Investigate indexes of wb_changes - https://phabricator.wikimedia.org/T262856
[10:02:05] (PS1) JMeybohm: service: add TLS endpoint for zotero 1/3 [puppet] - https://gerrit.wikimedia.org/r/629334 (https://phabricator.wikimedia.org/T255869)
[10:02:07] (PS1) JMeybohm: service: add TLS endpoint for zotero 2/3 [puppet] - https://gerrit.wikimedia.org/r/629335 (https://phabricator.wikimedia.org/T255869)
[10:02:09] (PS1) JMeybohm: service: add TLS endpoint for zotero 3/3 [puppet] - https://gerrit.wikimedia.org/r/629336 (https://phabricator.wikimedia.org/T255869)
[10:02:11] (PS1) JMeybohm: services_proxy: switch zotero to the TLS endpoint [puppet] - https://gerrit.wikimedia.org/r/629337 (https://phabricator.wikimedia.org/T255869)
[10:02:13] (PS1) JMeybohm: lvs: Remove zotero non-TLS endpoint 1/2 [puppet] - https://gerrit.wikimedia.org/r/629338 (https://phabricator.wikimedia.org/T255869)
[10:02:15] (PS1) JMeybohm: lvs: Remove zotero non-TLS endpoint 2/2 [puppet] - https://gerrit.wikimedia.org/r/629339 (https://phabricator.wikimedia.org/T255869)
[10:05:14] Operations, Traffic, Performance-Team (Radar), Sustainability (Incident Followup): Edge cache response time per server should be monitored - https://phabricator.wikimedia.org/T238086 (Gilles) That's because of the switchover, the metric is now coming from the "codfw prometheus/ops" instead of the...
[10:07:22] Operations, Traffic, Performance-Team (Radar): Depooling single text caching server in esams had a disproportionate performance impact - https://phabricator.wikimedia.org/T238085 (Gilles)
[10:07:24] Operations, Traffic, Performance-Team (Radar), Sustainability (Incident Followup): Edge cache response time per server should be monitored - https://phabricator.wikimedia.org/T238086 (Gilles) Open→Resolved For now I've switched the source, we'll have to remember doing it again when the pr...
[10:08:01] Operations, DBA, Blocked-on-schema-change, User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (Kormat) s2 codfw progress: [] db2088.codfw.wmnet [] db2095.codfw.wmnet sanitarium [] db2098.codfw.wmnet dbstore [] db2104.codfw.wmnet [] db2107.c...
[10:09:50] (CR) Giuseppe Lavagetto: [C: +1] service: add TLS endpoint for mathoid 1/2 [puppet] - https://gerrit.wikimedia.org/r/629325 (https://phabricator.wikimedia.org/T255875) (owner: JMeybohm)
[10:12:03] (CR) Volans: "Compiler results available at https://puppet-compiler.wmflabs.org/compiler1001/25333/" [puppet] - https://gerrit.wikimedia.org/r/629321 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[10:20:17] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime
[10:20:18] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[10:20:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:20:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:21] !log kormat@cumin1001 dbctl commit (dc=all): 'db2088:3312 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12752 and previous config saved to /var/cache/conftool/dbconfig/20200923-102120-kormat.json
[10:21:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:25] T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[10:33:35] (PS1) Effie Mouzeli: service_proxy: enable ipv6 on envoy config [puppet] - https://gerrit.wikimedia.org/r/629343 (https://phabricator.wikimedia.org/T255568)
[10:40:37] (PS3) Jbond: wmflib::debian::version: update the os_version [puppet] - https://gerrit.wikimedia.org/r/626723
[10:41:55] (CR) jerkins-bot: [V: -1] wmflib::debian::version: update the os_version [puppet] - https://gerrit.wikimedia.org/r/626723 (owner: Jbond)
[10:42:03] !log kormat@cumin1001 dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12753 and previous config saved to /var/cache/conftool/dbconfig/20200923-104202-kormat.json
[10:42:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:42:08] T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[10:42:17] (PS1) Muehlenhoff: Add a helper to dump/restore memcached for reboots [puppet] - https://gerrit.wikimedia.org/r/629344 (https://phabricator.wikimedia.org/T233933)
[10:42:52] (CR) jerkins-bot: [V: -1] Add a helper to dump/restore memcached for reboots [puppet] - https://gerrit.wikimedia.org/r/629344 (https://phabricator.wikimedia.org/T233933) (owner: Muehlenhoff)
[10:48:31] (CR) Muehlenhoff: [C: +1] "LGTM, two nits inline" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/629321 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[10:51:17] (PS4) Volans: netbox: add check for uncommitted DNS changes [puppet] - https://gerrit.wikimedia.org/r/629321 (https://phabricator.wikimedia.org/T258729)
[10:51:28] (CR) Volans: "Thanks, comments addressed." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/629321 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[10:53:43] (PS2) Muehlenhoff: Add a helper to dump/restore memcached for reboots [puppet] - https://gerrit.wikimedia.org/r/629344 (https://phabricator.wikimedia.org/T233933)
[10:55:38] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/629321 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[10:56:18] (CR) Volans: [C: +2] netbox: add check for uncommitted DNS changes [puppet] - https://gerrit.wikimedia.org/r/629321 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[10:56:51] (PS3) Muehlenhoff: Add a helper to dump/restore memcached for reboots [puppet] - https://gerrit.wikimedia.org/r/629344 (https://phabricator.wikimedia.org/T233933)
[10:57:06] !log kormat@cumin1001 dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12754 and previous config saved to /var/cache/conftool/dbconfig/20200923-105705-kormat.json
[10:57:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:57:12] T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[11:00:05] Amir1, Lucas_WMDE, awight, and Urbanecm: How many deployers does it take to do European mid-day backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200923T1100).
[11:00:05] kart_: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:26] * Urbanecm waves
[11:00:32] kart_: do you want to self-service?
[11:00:32] * kart_ is here
[11:00:51] Urbanecm: can you help in table creation? Never did that :)
[11:00:54] (PS4) Jbond: wmflib::debian::version: update the os_version [puppet] - https://gerrit.wikimedia.org/r/626723
[11:01:10] kart_: sure
[11:01:28] Thanks. We need to create table first and then config patch need to deploy after that.
[11:01:40] Only on 'testwiki'.
[11:01:42] kart_: okay. Where can I find the table schema?
[11:01:54] (I guess it is somewhere in the extension's repo)
[11:02:24] ContentTranslation/sql
[11:02:33] all of them need to create.
[11:02:33] (CR) Jbond: "Thanks for the quick responses, i have made comments inline but ultimately i agree that the signature is not ideal." (5 comments) [puppet] - https://gerrit.wikimedia.org/r/626723 (owner: Jbond)
[11:03:41] kart_: so, we need to create cx_lists, cx_notification_log, cx_suggestions, cx_translations and cx_translators, all at testwiki only, is that right?
[11:04:07] Urbanecm: yeah.
[11:04:11] thanks
[11:05:40] Urbanecm: let me know command or full log it too, so I can note down for future ref.
[11:05:45] will do
[11:05:47] (CR) jerkins-bot: [V: -1] wmflib::debian::version: update the os_version [puppet] - https://gerrit.wikimedia.org/r/626723 (owner: Jbond)
[11:08:22] kart_: So, tables should be created, see the paste: https://www.irccloud.com/pastebin/r9jUcD4W/
[11:08:55] !log Create ContentTranslation tables at testwiki using SQL files from `/srv/mediawiki/php-1.36.0-wmf.10/extensions/ContentTranslation/sql` (T263417
[11:08:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:09:00] T263417: Exclude testwikis and private wikis from old unpublished CX draft purge script run - https://phabricator.wikimedia.org/T263417
[11:09:29] kart_: as for the syntax, I used `mwscript mysql.php --wiki=testwiki --write < /srv/mediawiki/php-1.36.0-wmf.10/extensions/ContentTranslation/sql/lists.sql`
[11:09:49] (and similar command for the rest of the tables)
[11:10:41] Noted. Thanks!
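The table creation above amounts to one `mwscript mysql.php --wiki=testwiki --write < FILE` run per schema file in `ContentTranslation/sql`. The Python sketch below only assembles those command strings so they can be reviewed before anything is run. It assumes that every `*.sql` file in that directory is a table definition to apply; only `lists.sql` is confirmed by name in the log, and the helper `table_creation_commands` is hypothetical.

```python
import os
import shlex

# Directory quoted in the !log entry above.
SQL_DIR = "/srv/mediawiki/php-1.36.0-wmf.10/extensions/ContentTranslation/sql"


def table_creation_commands(sql_dir, filenames, wiki="testwiki"):
    """Build one mwscript command string per .sql file, in sorted order.

    Assumption: every *.sql file in the directory should be applied.
    Non-.sql files (e.g. a README) are skipped.
    """
    commands = []
    for name in sorted(f for f in filenames if f.endswith(".sql")):
        path = os.path.join(sql_dir, name)
        # Same shape as the command quoted in the log; shlex.quote guards
        # against odd characters in the wiki name or path.
        commands.append(
            f"mwscript mysql.php --wiki={shlex.quote(wiki)} --write < {shlex.quote(path)}"
        )
    return commands


if __name__ == "__main__":
    # Print only; nothing is executed.
    if os.path.isdir(SQL_DIR):
        for cmd in table_creation_commands(SQL_DIR, os.listdir(SQL_DIR)):
            print(cmd)
```

Printing instead of executing keeps a human in the loop, which matches how the tables were created here: by hand, one file at a time.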
[11:11:18] kart_: do you want me to deploy the config change too?
[11:11:19] kart_: those tables are only meant to be on testwiki, right?
[11:11:34] marostegui: yep.
[11:11:40] kart_: got it, thank you
[11:12:03] Urbanecm: Please go ahead, since you've already done warmup :)
[11:12:07] sure :)
[11:12:09] !log kormat@cumin1001 dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12755 and previous config saved to /var/cache/conftool/dbconfig/20200923-111209-kormat.json
[11:12:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:12:15] T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[11:12:28] (PS5) Urbanecm: ContentTranslation: Remove testwiki from extension1 cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/628790 (https://phabricator.wikimedia.org/T263417) (owner: KartikMistry)
[11:12:32] (CR) Urbanecm: [C: +2] ContentTranslation: Remove testwiki from extension1 cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/628790 (https://phabricator.wikimedia.org/T263417) (owner: KartikMistry)
[11:12:38] kart_: will you be able to test it?
[11:13:04] Urbanecm: Yeah. Good idea to test it before deploy.
[11:13:04] hi, am I too late to add two patches to backport?
[11:13:10] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[11:13:14] kostajh: not at all, please add them to the calendar :)
[11:13:18] k, doing it now
[11:13:24] (Merged) jenkins-bot: ContentTranslation: Remove testwiki from extension1 cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/628790 (https://phabricator.wikimedia.org/T263417) (owner: KartikMistry)
[11:13:32] kart_: cool, thanks
[11:13:47] (PS1) Kosta Harlan: Mark pageviews as not used in the mobile postedit notice [extensions/GrowthExperiments] (wmf/1.36.0-wmf.10) - https://gerrit.wikimedia.org/r/629201 (https://phabricator.wikimedia.org/T263611)
[11:13:48] kart_: pulled onto mwdebug2001
[11:14:10] (PS1) Kosta Harlan: Fix GrowthTasksApi lazy-loading flags for pages with no views [extensions/GrowthExperiments] (wmf/1.36.0-wmf.10) - https://gerrit.wikimedia.org/r/629202 (https://phabricator.wikimedia.org/T263611)
[11:15:19] Urbanecm: looks like we've DB error: Cannot access the database: Unknown error (10.192.16.105))
[11:15:31] looking
[11:15:40] looking too
[11:15:48] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[11:16:23] Urbanecm: added them
[11:16:37] so the DB looks ok
[11:17:31] (CR) Jbond: [C: +1] "LGTM minor optional nit" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/629344 (https://phabricator.wikimedia.org/T233933) (owner: Muehlenhoff)
[11:17:36] marostegui: indeed, the tables are there
[11:18:10] kart_: are you sure false is right value for wmgContentTranslationCluster?
[11:19:26] kart_: I feel I know what the issue is
[11:19:47] Operations, CAS-SSO, Patch-For-Review, User-jbond: mod_auth_cas segfaulting on Stretch Apache setups using OpenSSL 1.0.2 and 1.1 (netmon/yarn) - https://phabricator.wikimedia.org/T257587 (MoritzMuehlenhoff) Open→Declined There isn't a technically viable solution to this bug other than mov...
[11:19:49] kart_: once we set wmgContentTranslationCluster to false, it means wgContentTranslationCluster is set to false, but wgContentTranslationDatabase is still set to wikishared
[11:20:02] that means we're querying s3 servers for wikishared database, which does not exist, hence the DB error
[11:20:49] kart_: you need to upload a followup patch that touches lines 3461-3465 of CommonSettings.php to set wgContentTranslationDatabase to false as well (the extension's default)
[11:21:02] ah.
[11:21:20] https://pastebin.com/SdFR5CN1 - full backtrace.
[11:21:47] thanks (but I have logstash opened :-))
[11:21:59] (CR) Urbanecm: [C: +2] Fix GrowthTasksApi lazy-loading flags for pages with no views [extensions/GrowthExperiments] (wmf/1.36.0-wmf.10) - https://gerrit.wikimedia.org/r/629202 (https://phabricator.wikimedia.org/T263611) (owner: Kosta Harlan)
[11:22:02] (CR) Urbanecm: [C: +2] Mark pageviews as not used in the mobile postedit notice [extensions/GrowthExperiments] (wmf/1.36.0-wmf.10) - https://gerrit.wikimedia.org/r/629201 (https://phabricator.wikimedia.org/T263611) (owner: Kosta Harlan)
[11:22:37] Urbanecm: :D
[11:23:18] kart_: https://gerrit.wikimedia.org/g/mediawiki/extensions/ContentTranslation/+/f345acce2292fc238733f23bfba2115065fe07ad/includes/Database.php#20, this definitely tries to query wikishared database at s3 servers. Moving https://gerrit.wikimedia.org/g/operations/mediawiki-config/+/f93fdfac6e3d30a0b0cf0602d1d6077182b5b7c4/wmf-config/CommonSettings.php#3465 to be in the if should fix this IMO
[11:23:20] what do you think kart_ ?
[11:23:31] (PS5) Jbond: wmflib::debian::version: update the os_version [puppet] - https://gerrit.wikimedia.org/r/626723
[11:24:16] Urbanecm: right.
[11:24:24] (PS6) Jbond: wmflib::debian::version: update the os_version [puppet] - https://gerrit.wikimedia.org/r/626723
[11:24:40] can you upload a follow-up patch please? I can also just revert if you need to think more about it
[11:24:45] kostajh: ftr, I +2'ed the backports, will ping you once they're ready to be tested
[11:25:13] (CR) Muehlenhoff: Add a helper to dump/restore memcached for reboots (2 comments) [puppet] - https://gerrit.wikimedia.org/r/629344 (https://phabricator.wikimedia.org/T233933) (owner: Muehlenhoff)
[11:25:43] (CR) jerkins-bot: [V: -1] wmflib::debian::version: update the os_version [puppet] - https://gerrit.wikimedia.org/r/626723 (owner: Jbond)
[11:26:21] Urbanecm: cool I’m here when you’re ready :)
[11:26:28] (PS1) Volans: netbox: set timeout for nrpe check [puppet] - https://gerrit.wikimedia.org/r/629358 (https://phabricator.wikimedia.org/T258729)
[11:26:32] cool :)
[11:26:56] Operations, ops-eqiad, DBA, netops, and 3 others: Upgrade eqiad rack D4 to 10G switch - https://phabricator.wikimedia.org/T196487 (jijiki)
[11:27:13] !log kormat@cumin1001 dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12756 and previous config saved to /var/cache/conftool/dbconfig/20200923-112712-kormat.json
[11:27:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:18] T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[11:27:23] (PS1) KartikMistry: Set ContentTranslationDatabase to wikishared only if ContentTranslationCluster is set [mediawiki-config] - https://gerrit.wikimedia.org/r/629360 (https://phabricator.wikimedia.org/T263417)
[11:27:40] Urbanecm: Uploaded fix.
[11:27:49] thanks, looking
[11:27:50] (PS4) Muehlenhoff: Add a helper to dump/restore memcached for reboots [puppet] - https://gerrit.wikimedia.org/r/629344 (https://phabricator.wikimedia.org/T233933)
[11:27:59] (CR) Urbanecm: [C: +2] Set ContentTranslationDatabase to wikishared only if ContentTranslationCluster is set [mediawiki-config] - https://gerrit.wikimedia.org/r/629360 (https://phabricator.wikimedia.org/T263417) (owner: KartikMistry)
[11:28:53] (Merged) jenkins-bot: Set ContentTranslationDatabase to wikishared only if ContentTranslationCluster is set [mediawiki-config] - https://gerrit.wikimedia.org/r/629360 (https://phabricator.wikimedia.org/T263417) (owner: KartikMistry)
[11:29:30] kart_: pulled onto mwdebug2001, can you test it again?
[11:29:52] Sure
[11:30:40] (PS7) Jbond: wmflib::debian::version: update the os_version [puppet] - https://gerrit.wikimedia.org/r/626723
[11:30:57] (Merged) jenkins-bot: Fix GrowthTasksApi lazy-loading flags for pages with no views [extensions/GrowthExperiments] (wmf/1.36.0-wmf.10) - https://gerrit.wikimedia.org/r/629202 (https://phabricator.wikimedia.org/T263611) (owner: Kosta Harlan)
[11:31:00] (Merged) jenkins-bot: Mark pageviews as not used in the mobile postedit notice [extensions/GrowthExperiments] (wmf/1.36.0-wmf.10) - https://gerrit.wikimedia.org/r/629201 (https://phabricator.wikimedia.org/T263611) (owner: Kosta Harlan)
[11:31:18] Urbanecm: seems working, but let me also check if content is saved in testwiki DBs. What's best way to check that?
[11:31:29] (CR) Volans: [C: +2] "self merging to test it, I'm not sure if there is a max timeout hardcoded in nrpe" [puppet] - https://gerrit.wikimedia.org/r/629358 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[11:31:33] kart_: ssh to mwmaint2001, and do "sql testwiki"
[11:34:32] (PS1) Urbanecm: Revert "Disable deprecated warning in Language::commafy() for non numeric string" [core] (wmf/1.36.0-wmf.10) - https://gerrit.wikimedia.org/r/629203 (https://phabricator.wikimedia.org/T263592)
[11:34:55] (CR) Urbanecm: [V: +2 C: +2] "reverting undeployed patch that got merged to a wmf branch" [core] (wmf/1.36.0-wmf.10) - https://gerrit.wikimedia.org/r/629203 (https://phabricator.wikimedia.org/T263592) (owner: Urbanecm)
[11:36:33] Operations, CAS-SSO, User-jbond: Cross data center setup for CAS - https://phabricator.wikimedia.org/T233931 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff We have a working dual DC setup with a replicated session store in memcached/mcrouter and the U2F tokens are synched with a sy...
[11:36:36] Operations, Security-Team, CAS-SSO, User-jbond: Further steps for CAS/web SSO - https://phabricator.wikimedia.org/T233921 (MoritzMuehlenhoff)
[11:36:41] Urbanecm: seems I'm still getting empty tables :/
[11:38:00] Urbanecm: oh and I can see my other draft, that means, DB is not being used.
[11:38:15] 😞
[11:38:47] !log Revert https://gerrit.wikimedia.org/r/c/mediawiki/core/+/629188 and fetch to deploy1001 to unblock EU B&C deployment (T237467; cc twentyafterfour)
[11:38:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:38:52] T237467: Invariant failed: Bad UTF-8 (full string verification) - https://phabricator.wikimedia.org/T237467
[11:39:15] kart_: what is supposed to fill those tables? Jobs, or direct POST requests?
[11:39:33] Urbanecm: direct as content should be saved.
[11:40:08] kart_: hmm. I suggest to revert both patches, and investigate why it doesn't work later
[11:40:17] (PS1) Ayounsi: Add vrrp_master_pinning in eqiad [homer/public] - https://gerrit.wikimedia.org/r/629364 (https://phabricator.wikimedia.org/T263212)
[11:40:31] Urbanecm: yeah. Please.
[11:40:35] doing
[11:40:52] Urbanecm: Please keep cx_* tables.
[11:41:04] kart_: Sure, I'm not dropping anything :)
[11:41:09] :)
[11:41:16] (PS1) Urbanecm: Revert "ContentTranslation: Remove testwiki from extension1 cluster" [mediawiki-config] - https://gerrit.wikimedia.org/r/629204 (https://phabricator.wikimedia.org/T263417)
[11:41:25] (PS1) Urbanecm: Revert "Set ContentTranslationDatabase to wikishared only if ContentTranslationCluster is set" [mediawiki-config] - https://gerrit.wikimedia.org/r/629205 (https://phabricator.wikimedia.org/T263417)
[11:41:27] (CR) Urbanecm: [C: +2] Revert "ContentTranslation: Remove testwiki from extension1 cluster" [mediawiki-config] - https://gerrit.wikimedia.org/r/629204 (https://phabricator.wikimedia.org/T263417) (owner: Urbanecm)
[11:41:32] (CR) Urbanecm: [C: +2] Revert "Set ContentTranslationDatabase to wikishared only if ContentTranslationCluster is set" [mediawiki-config] - https://gerrit.wikimedia.org/r/629205 (https://phabricator.wikimedia.org/T263417) (owner: Urbanecm)
[11:42:16] (Merged) jenkins-bot: Revert "ContentTranslation: Remove testwiki from extension1 cluster" [mediawiki-config] - https://gerrit.wikimedia.org/r/629204 (https://phabricator.wikimedia.org/T263417) (owner: Urbanecm)
[11:42:19] (Merged) jenkins-bot: Revert "Set ContentTranslationDatabase to wikishared only if ContentTranslationCluster is set" [mediawiki-config] - https://gerrit.wikimedia.org/r/629205 (https://phabricator.wikimedia.org/T263417) (owner: Urbanecm)
[11:42:31] kart_: done
[11:42:48] Urbanecm: thanks a lot! I'll come up with better patches tomorrow :)
[11:42:54] sure!
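Urbanecm's diagnosis (11:19 to 11:23) can be modelled as a small function of the two settings. In this hypothetical Python model, which is not the actual `CommonSettings.php` or ContentTranslation code, an unset (`false`) cluster means queries go to the wiki's own section (s3 for testwiki, per the log), and an unset database means the wiki's own database is used. Overriding only the cluster while the database stays `wikishared` yields the broken pairing of s3 servers with a `wikishared` database. The proposed fix sets the database only when a cluster is configured. As the log shows, this removed the DB error but drafts still did not land in the testwiki tables, so both patches were reverted; the model covers only the DB error.

```python
# Hypothetical model of the ContentTranslation settings discussed in the log.
# False mirrors the extension default for both settings.


def buggy_settings(cluster):
    """Cluster is overridden, but the database is left at 'wikishared'."""
    return cluster, "wikishared"


def fixed_settings(cluster):
    """Use 'wikishared' only when an external cluster is configured."""
    if cluster:
        return cluster, "wikishared"
    return False, False


def query_target(cluster, database, wiki="testwiki", section="s3"):
    """Return (servers, database) that queries would be sent to.

    Assumption: no cluster means the wiki's own section; no database
    means the wiki's own database.
    """
    servers = cluster if cluster else section
    db = database if database else wiki
    return servers, db


if __name__ == "__main__":
    print("buggy:", query_target(*buggy_settings(False)))  # s3 + wikishared
    print("fixed:", query_target(*fixed_settings(False)))  # s3 + testwiki
```

The model makes the mismatch visible: the servers and the database name were coming from two independent settings, and only one of them was switched away from the shared `extension1` cluster.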
[11:43:17] kostajh: I pulled your patches to mwdebug2001, can you test them, please?
[11:43:26] Urbanecm: yep
[11:44:56] (CR) Ayounsi: "Changes for 1 devices: ['cr1-eqiad.wikimedia.org']" [homer/public] - https://gerrit.wikimedia.org/r/629364 (https://phabricator.wikimedia.org/T263212) (owner: Ayounsi)
[11:45:34] kostajh: thanks, please let me know once done :)
[11:46:56] Urbanecm: yeah, looks good
[11:47:17] thanks, syncing :)
[11:47:17] Operations, observability: Grafana error: "parse error at char 1: unexpected character: '\\ufeff'" when copy-pasting metric names - https://phabricator.wikimedia.org/T263624 (ema) Similarly, if I copy a value from Grafana and paste it in my terminal I get this sort of stuff: ` sum(irate
Operations, ops-eqiad, DBA, DC-Ops: (Need By: 2020-08-31) rack/setup/install es10[26-34].eqiad.wmnet - https://phabricator.wikimedia.org/T260370 (Marostegui) @wiki_willy do you think it is feasible to have these hosts racked&installed by 30th Oct?
[11:49:26] !log urbanecm@deploy1001 Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/modules/help/ext.growthExperiments.PostEdit.js: 1ab31a966edc4748f82f75bb370371733c2ca090: Mark pageviews as not used in the mobile postedit notice (T263611) (duration: 01m 06s)
[11:49:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:49:30] T263611: [wmf.10] mobile - mw-ge-small-task-card-pageviews skeleton persistently displayed in post-edit dialog - https://phabricator.wikimedia.org/T263611
[11:49:43] thanks Urbanecm!
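The Grafana task above concerns an invisible U+FEFF (byte order mark) that travels with copied metric names and is rejected by the PromQL parser at char 1. Below is a minimal Python sketch for cleaning a pasted query. It assumes that dropping every Unicode format character (category `Cf`, which covers U+FEFF and zero-width spaces) is safe for PromQL text. `clean_pasted_query` is a hypothetical helper, not part of Grafana or Prometheus, and `example_requests_total` is a made-up metric name.

```python
import unicodedata


def clean_pasted_query(text: str) -> str:
    """Drop invisible format characters (e.g. U+FEFF) and surrounding whitespace.

    Assumption: category "Cf" characters never carry meaning in a PromQL query.
    """
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return visible.strip()


if __name__ == "__main__":
    pasted = "\ufeffsum(irate(example_requests_total[5m]))"
    print(repr(clean_pasted_query(pasted)))
```

Filtering by Unicode category rather than listing code points by hand also catches related invisible characters such as U+200B without enumerating them.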
[11:49:47] no problem
[11:50:41] (the second patch is now being synced)
[11:51:21] !log urbanecm@deploy1001 Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.Homepage.GrowthTasksApi.js: 73b5ce82b3913232b708405147f0bb6d27128974: Fix GrowthTasksApi lazy-loading flags for pages with no views (T263611) (duration: 01m 05s)
[11:51:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:51:30] kostajh: done all :)
[11:51:32] anything else?
[11:52:27] !log installing GNUTLS bugfix updates from buster 10.5 point release
[11:52:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:55:15] (PS1) Ema: cp4027: upgrade to Varnish 6 [puppet] - https://gerrit.wikimedia.org/r/629366 (https://phabricator.wikimedia.org/T263557)
[11:57:03] Operations, DBA, Blocked-on-schema-change, User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (Kormat)
[11:57:22] (CR) Huji: [C: +1] Enable wgCheckUserLogLogins at all wikis but few large wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/629227 (https://phabricator.wikimedia.org/T253802) (owner: Urbanecm)
[11:58:41] (PS1) Ema: varnish: use component/varnish6 for VTC tests [puppet] - https://gerrit.wikimedia.org/r/629368 (https://phabricator.wikimedia.org/T263557)
[11:59:06] (CR) jerkins-bot: [V: -1] varnish: use component/varnish6 for VTC tests [puppet] - https://gerrit.wikimedia.org/r/629368 (https://phabricator.wikimedia.org/T263557) (owner: Ema)
[11:59:11] marostegui: Sorry I missed your question about 'wmgContentTranslationCluster' earlier during deployment.
[11:59:26] marostegui: Do you know what was wrong there?
[11:59:46] kart_: Which question?
[12:01:23] marostegui: Sorry. It was by Urbanecm.
[12:01:28] :-)
[12:01:30] I really need coffee now.
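The `dbctl commit` entries for db2088:3312 follow a staged repool pattern: 25% at 10:42, 50% at 10:57, 75% at 11:12 and 100% at 11:27, roughly 15 minutes apart. The Python sketch below reproduces only that observed cadence and the commit-message wording quoted in the log. The 15-minute interval is inferred from the timestamps, `repool_plan` is a hypothetical helper, and the sketch does not call dbctl or conftool.

```python
from datetime import datetime, timedelta

# Percentages and interval inferred from the db2088:3312 entries in this log;
# illustrative defaults only, not dbctl settings.
STEPS = (25, 50, 75, 100)
INTERVAL = timedelta(minutes=15)


def repool_plan(instance, reason, start, steps=STEPS, interval=INTERVAL):
    """Return (time, commit message) pairs for a staged repool."""
    plan = []
    for i, pct in enumerate(steps):
        # Message format copied from the !log lines, e.g.
        # 'db2088:3312 (re)pooling @ 25%: schema change T259831'
        message = f"{instance} (re)pooling @ {pct}%: {reason}"
        plan.append((start + i * interval, message))
    return plan


if __name__ == "__main__":
    start = datetime(2020, 9, 23, 10, 42)
    for when, msg in repool_plan("db2088:3312", "schema change T259831", start):
        print(when.strftime("%H:%M"), msg)
```

The same cadence appears later for db2104 (13:05, 13:20, 13:35, 13:50). db2074 deviated from it: it was depooled again after its first 25% step and then jumped from 25% to 75%, so fixed steps describe the common case rather than a rule.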
[12:03:11] (PS1) Effie Mouzeli: hieradata: enable onhost memcached on mwdeb1001 [puppet] - https://gerrit.wikimedia.org/r/629369 (https://phabricator.wikimedia.org/T244340)
[12:06:36] (CR) Ema: [C: +2] cp4027: upgrade to Varnish 6 [puppet] - https://gerrit.wikimedia.org/r/629366 (https://phabricator.wikimedia.org/T263557) (owner: Ema)
[12:06:52] (CR) Ema: [V: +2 C: +2] varnish: use component/varnish6 for VTC tests [puppet] - https://gerrit.wikimedia.org/r/629368 (https://phabricator.wikimedia.org/T263557) (owner: Ema)
[12:08:29] Operations, Traffic, Patch-For-Review: Upgrade a production cache node to Varnish 6 - https://phabricator.wikimedia.org/T263557 (ema) All cache_text VTC tests green with 6.0.6, proceeding with the upgrade of cp4027.
[12:09:14] !log cp4027: depool and upgrade varnish to 6.0.6-1wm1 T263557
[12:09:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:09:20] T263557: Upgrade a production cache node to Varnish 6 - https://phabricator.wikimedia.org/T263557
[12:13:59] Operations, DBA, Performance-Team, Platform Engineering, User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (Kormat)
[12:21:36] (PS1) KartikMistry: WIP: ContentTranslation: Do not use wikishared for testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/629371 (https://phabricator.wikimedia.org/T263417)
[12:22:46] !log cp4027: repool with varnish 6.0.6-1wm1 T263557
[12:22:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:22:51] T263557: Upgrade a production cache node to Varnish 6 - https://phabricator.wikimedia.org/T263557
[12:29:02] Operations, DBA, Performance-Team, Platform Engineering, User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (Kormat) Another reason to remove load groups where possible is they make it very difficult to predict what effect depooling a db server will have on...
[12:31:17] (CR) Volans: [C: +1] "Looks coherent with the task. Evaluating the consequence of the different return paths was already achieved in the task AFAIUI" [homer/public] - https://gerrit.wikimedia.org/r/629364 (https://phabricator.wikimedia.org/T263212) (owner: Ayounsi)
[12:34:13] seeing a lot of ErrorException from line 57 of /srv/mediawiki/php-1.36.0-wmf.9/extensions/GlobalUsage/includes/ApiQueryGlobalUsage.php: PHP Notice: Undefined index:
[12:34:48] and it's all happening on mw2262 & mw2287 for some reason
[12:35:29] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2074 (re)pooling @ 25%: Slowly db2074 ', diff saved to https://phabricator.wikimedia.org/P12758 and previous config saved to /var/cache/conftool/dbconfig/20200923-123528-root.json
[12:35:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:36:50] !log kormat@cumin1001 dbctl commit (dc=all): 'Add db2088:3312 to api while db2104 gets depooled T259831', diff saved to https://phabricator.wikimedia.org/P12759 and previous config saved to /var/cache/conftool/dbconfig/20200923-123649-kormat.json
[12:36:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:36:55] T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[12:37:09] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime
[12:37:09] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[12:37:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:37:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:38:07] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2074', diff saved to https://phabricator.wikimedia.org/P12760 and previous config saved to /var/cache/conftool/dbconfig/20200923-123806-marostegui.json
[12:38:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:38:13] Operations: Integrate Buster 10.5 point release - https://phabricator.wikimedia.org/T259519 (MoritzMuehlenhoff)
[12:38:17] Puppet: puppetdb seems to be slow on host reimage - https://phabricator.wikimedia.org/T263578 (jbond) I had a look around today and i couldn't see anything obvious. I think it would be useful to increase the log level on puppetdb the next time we reimage something that uses codfw. I checked and autovacum...
[12:39:22] !log kormat@cumin1001 dbctl commit (dc=all): 'db2104 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12761 and previous config saved to /var/cache/conftool/dbconfig/20200923-123922-kormat.json
[12:39:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:40:45] (CR) Jbond: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/628970 (owner: Dzahn)
[12:49:42] !log installing libunwind bugfix updates from buster 10.5 point release
[12:49:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:50:15] (PS1) Muehlenhoff: Add library hints for libunwind [puppet] - https://gerrit.wikimedia.org/r/629373
[12:51:16] (PS1) Filippo Giunchedi: pontoon: add kafka settings to o11y [puppet] - https://gerrit.wikimedia.org/r/629374
[12:51:32] (CR) Muehlenhoff: [C: +2] Add library hints for libunwind [puppet] - https://gerrit.wikimedia.org/r/629373 (owner: Muehlenhoff)
[12:52:17] (CR) Filippo Giunchedi: [C: +2] pontoon: add kafka settings to o11y [puppet] - https://gerrit.wikimedia.org/r/629374 (owner: Filippo Giunchedi)
[12:59:11] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2074 (re)pooling @ 25%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12762 and previous config saved to /var/cache/conftool/dbconfig/20200923-125911-root.json
[12:59:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:02:51] Operations: Integrate Buster 10.5 point release - https://phabricator.wikimedia.org/T259519 (MoritzMuehlenhoff)
[13:04:09] !log installing multipath-tools bugfix updates from buster 10.5 point release
[13:04:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:19] !log kormat@cumin1001 dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12763 and previous config saved to /var/cache/conftool/dbconfig/20200923-130518-kormat.json
[13:05:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:24] T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[13:08:42] (CR) Ottomata: "LG! Some ideas" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/629165 (owner: Elukey)
[13:14:15] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2074 (re)pooling @ 75%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12764 and previous config saved to /var/cache/conftool/dbconfig/20200923-131414-root.json
[13:14:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:11] !log installing ruby-json security updates
[13:20:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:22] !log kormat@cumin1001 dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12765 and previous config saved to /var/cache/conftool/dbconfig/20200923-132022-kormat.json
[13:20:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:27] T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[13:23:31] (PS1) Filippo Giunchedi: pontoon: use Python API for enroll [puppet] - https://gerrit.wikimedia.org/r/629378
[13:23:33] (PS1) Filippo Giunchedi: pontoon: cleanup/update ENC [puppet] - https://gerrit.wikimedia.org/r/629379
[13:24:10] (CR) jerkins-bot: [V: -1] pontoon: use Python API for enroll [puppet] - https://gerrit.wikimedia.org/r/629378 (owner: Filippo Giunchedi)
[13:29:18] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2074 (re)pooling @ 100%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12766 and previous config saved to /var/cache/conftool/dbconfig/20200923-132918-root.json
[13:29:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:25] !log kormat@cumin1001 dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12768 and previous config saved to /var/cache/conftool/dbconfig/20200923-133525-kormat.json
[13:35:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:31] T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[13:35:43] Operations: Integrate Buster 10.5 point release - https://phabricator.wikimedia.org/T259519 (MoritzMuehlenhoff)
[13:41:21] (PS1) Muehlenhoff: Configure bastions for Buster on next reimage [puppet] - https://gerrit.wikimedia.org/r/629380 (https://phabricator.wikimedia.org/T243057)
[13:42:06] (PS2) Filippo Giunchedi: pontoon: use Python API for enroll [puppet] - https://gerrit.wikimedia.org/r/629378
[13:42:08] (PS2) Filippo Giunchedi: pontoon: cleanup/update ENC [puppet] - https://gerrit.wikimedia.org/r/629379
[13:50:01] (CR) Muehlenhoff: [C: +2] Configure bastions for Buster on next reimage [puppet] - https://gerrit.wikimedia.org/r/629380 (https://phabricator.wikimedia.org/T243057) (owner: Muehlenhoff)
[13:50:29] !log kormat@cumin1001 dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12769 and previous config saved to /var/cache/conftool/dbconfig/20200923-135028-kormat.json
[13:50:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:35] T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[13:51:02] (CR) CDanis: [C: +1] Add vrrp_master_pinning in eqiad [homer/public] - https://gerrit.wikimedia.org/r/629364 (https://phabricator.wikimedia.org/T263212) (owner: Ayounsi)
[13:59:34] (PS1) Lucas Werkmeister (WMDE): Enable repo config propagateChangeVisibility everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/629383
[14:01:36] (PS1) Elukey: Add sre.hadoop.init_hadoop_workers.py [cookbooks] - https://gerrit.wikimedia.org/r/629384 (https://phabricator.wikimedia.org/T262189)
[14:03:30] (CR) Volans: Add sre.hadoop.init_hadoop_workers.py (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/629384 (https://phabricator.wikimedia.org/T262189) (owner: Elukey)
[14:04:06] (CR) Elukey: Add sre.hadoop.init_hadoop_workers.py (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/629384 (https://phabricator.wikimedia.org/T262189) (owner: Elukey)
[14:04:45] (PS2) Elukey: Add sre.hadoop.init_hadoop_workers.py [cookbooks] - https://gerrit.wikimedia.org/r/629384 (https://phabricator.wikimedia.org/T262189)
[14:04:57] time-to-minus-1-from-riccardo ~= 5 secs
[14:04:59] :D
[14:11:58] lol
[14:14:19] (CR) Tobias Andersson: [C: +1] Enable repo config propagateChangeVisibility everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/629383 (owner: Lucas Werkmeister (WMDE))
[14:14:49] (CR) Muehlenhoff: Manage /etc/apt/sources.list via Puppet (3 comments) [puppet] - https://gerrit.wikimedia.org/r/626693 (https://phabricator.wikimedia.org/T156562) (owner: Muehlenhoff)
[14:15:11] (PS8) Muehlenhoff: Manage /etc/apt/sources.list via Puppet [puppet] - https://gerrit.wikimedia.org/r/626693 (https://phabricator.wikimedia.org/T156562)
[14:19:45] (PS9) Muehlenhoff: Manage /etc/apt/sources.list via Puppet [puppet] - https://gerrit.wikimedia.org/r/626693 (https://phabricator.wikimedia.org/T156562)
[14:20:35] (CR) Filippo Giunchedi: "LGTM overall, see inline" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/626693 (https://phabricator.wikimedia.org/T156562) (owner: Muehlenhoff)
[14:22:12] (PS3) Elukey: Add sre.hadoop.init_hadoop_workers.py [cookbooks] - https://gerrit.wikimedia.org/r/629384 (https://phabricator.wikimedia.org/T262189)
[14:25:22] I’ll deploy a small Wikidata config change, since the deployment calendar looks free at the moment
[14:25:23] PROBLEM - Check systemd state on stat1006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:27:21] (CR) Lucas Werkmeister (WMDE): [C: +2] Enable repo config propagateChangeVisibility everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/629383 (owner: Lucas Werkmeister (WMDE))
[14:28:22] (Merged) jenkins-bot: Enable repo config propagateChangeVisibility everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/629383 (owner: Lucas Werkmeister (WMDE))
[14:28:59] quickly testing on mwdebug2001
[14:31:55] seems fine, syncing (in two steps)
[14:32:16] (PS1) Herron: icinga: switch active server from icinga1001 to alert1001 [puppet] - https://gerrit.wikimedia.org/r/629408 (https://phabricator.wikimedia.org/T247966)
[14:33:48] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/Wikibase.php: Config: [[gerrit:629383|Enable repo config propagateChangeVisibility everywhere]], 1/2 (duration: 01m 06s)
[14:33:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:11] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:629383|Enable repo config propagateChangeVisibility everywhere]], 2/2 (duration: 01m 06s)
[14:35:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:23] !log grew prometheus1004 prometheus-ops filesystem to 1.6T
[14:37:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:50] (PS4) Elukey: Add sre.hadoop.init_hadoop_workers.py [cookbooks] - https://gerrit.wikimedia.org/r/629384 (https://phabricator.wikimedia.org/T262189)
[14:40:55] (PS5) Elukey: Add sre.hadoop.init-hadoop-workers.py [cookbooks] - https://gerrit.wikimedia.org/r/629384 (https://phabricator.wikimedia.org/T262189)
[14:42:25] (PS1) Tchanders: SpecialUnblock: Allow getTargetAndType to accept null $par [core] (wmf/1.36.0-wmf.10) - https://gerrit.wikimedia.org/r/629390 (https://phabricator.wikimedia.org/T263642)
[14:42:29] (PS1) Fdans: dumps::web::fetches::stat_dumps: add rsync job for pageview complete [puppet] - https://gerrit.wikimedia.org/r/629409 (https://phabricator.wikimedia.org/T251777)
[14:42:49] (CR) Elukey: "Merging to test on cumin1001." [cookbooks] - https://gerrit.wikimedia.org/r/629384 (https://phabricator.wikimedia.org/T262189) (owner: Elukey)
[14:42:51] (CR) Elukey: [C: +2] Add sre.hadoop.init-hadoop-workers.py [cookbooks] - https://gerrit.wikimedia.org/r/629384 (https://phabricator.wikimedia.org/T262189) (owner: Elukey)
[14:44:24] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime
[14:44:25] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[14:44:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:42] !log kormat@cumin1001 dbctl commit (dc=all): 'db2126 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12770 and previous config saved to /var/cache/conftool/dbconfig/20200923-144441-kormat.json
[14:44:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:47] T259831: Schema change to make
change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 [14:45:29] (03CR) 10Muehlenhoff: Manage /etc/apt/sources.list via Puppet (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/626693 (https://phabricator.wikimedia.org/T156562) (owner: 10Muehlenhoff) [14:45:48] (03CR) 10DannyS712: [C: 03+1] SpecialUnblock: Allow getTargetAndType to accept null $par [core] (wmf/1.36.0-wmf.10) - 10https://gerrit.wikimedia.org/r/629390 (https://phabricator.wikimedia.org/T263642) (owner: 10Tchanders) [14:47:00] (03PS10) 10Muehlenhoff: Manage /etc/apt/sources.list via Puppet [puppet] - 10https://gerrit.wikimedia.org/r/626693 (https://phabricator.wikimedia.org/T156562) [14:48:10] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime [14:48:11] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [14:48:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:18] (03PS1) 10Mholloway: Update push-notifications config and image [deployment-charts] - 10https://gerrit.wikimedia.org/r/629410 [14:49:24] (03PS1) 10Elukey: sre.hadoop.init-hadoop-workers: add more logging and fix run_async usage [cookbooks] - 10https://gerrit.wikimedia.org/r/629411 [14:51:45] (03CR) 10Mholloway: [C: 03+2] Update push-notifications config and image [deployment-charts] - 10https://gerrit.wikimedia.org/r/629410 (owner: 10Mholloway) [14:52:01] (03PS2) 10Elukey: sre.hadoop.init-hadoop-workers: add more logging and fix run_async usage [cookbooks] - 10https://gerrit.wikimedia.org/r/629411 [14:53:21] (03CR) 10Elukey: [C: 03+2] sre.hadoop.init-hadoop-workers: add more logging and fix run_async usage [cookbooks] - 10https://gerrit.wikimedia.org/r/629411 (owner: 10Elukey) [14:54:30] (03Merged) 10jenkins-bot: Update push-notifications config and image [deployment-charts] - 10https://gerrit.wikimedia.org/r/629410 (owner: 10Mholloway) [14:55:01] (03PS2) 
10Jforrester: SpecialUnblock: Allow getTargetAndType to accept null $par [core] (wmf/1.36.0-wmf.10) - 10https://gerrit.wikimedia.org/r/629390 (https://phabricator.wikimedia.org/T263642) (owner: 10Tchanders) [14:55:15] (03PS11) 10Jbond: Manage /etc/apt/sources.list via Puppet [puppet] - 10https://gerrit.wikimedia.org/r/626693 (https://phabricator.wikimedia.org/T156562) (owner: 10Muehlenhoff) [14:55:30] (03CR) 10Jbond: [C: 03+1] "LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/626693 (https://phabricator.wikimedia.org/T156562) (owner: 10Muehlenhoff) [14:55:44] (03CR) 10Tchanders: [C: 03+1] SpecialUnblock: Allow getTargetAndType to accept null $par [core] (wmf/1.36.0-wmf.10) - 10https://gerrit.wikimedia.org/r/629390 (https://phabricator.wikimedia.org/T263642) (owner: 10Tchanders) [14:55:57] 10Operations, 10DBA, 10Performance-Team, 10Platform Engineering, 10User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (10Krinkle) [14:57:47] 10Operations, 10DBA, 10Performance-Team, 10Platform Engineering, 10User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (10Krinkle) I've added to the task description: > * [ ] Understanding and agreement on which of these (if any) we need to keep, and why. To be carried... [14:59:38] 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: (Need By: 2020-08-31) rack/setup/install es10[26-34].eqiad.wmnet - https://phabricator.wikimedia.org/T260370 (10wiki_willy) Hi @Marostegui - I think that should be doable. During my sync up with @Cmjohnson and @RobH tomorrow, we'll discuss and see if we can get... [15:00:00] 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: (Need By: 2020-08-31) rack/setup/install es10[26-34].eqiad.wmnet - https://phabricator.wikimedia.org/T260370 (10Marostegui) Thank you! 
[15:01:30] (PS1) Elukey: sre.hadoop.init-hadoop-workers.py: fix disk labels window [cookbooks] - https://gerrit.wikimedia.org/r/629413
[15:13:09] !log kormat@cumin1001 dbctl commit (dc=all): 'db2126 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12771 and previous config saved to /var/cache/conftool/dbconfig/20200923-151308-kormat.json
[15:13:12] (PS1) Herron: dns: point icinga CNAMEs to alert1001 [dns] - https://gerrit.wikimedia.org/r/629415 (https://phabricator.wikimedia.org/T247966)
[15:13:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:14] T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[15:14:27] (CR) JMeybohm: "In PCC this looks okay (https://puppet-compiler.wmflabs.org/compiler1003/25339/)." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/629343 (https://phabricator.wikimedia.org/T255568) (owner: Effie Mouzeli)
[15:14:52] Operations, DNS, Traffic, netbox: Netbox DNS change not effective in gdns - https://phabricator.wikimedia.org/T255748 (BBlack) @Volans is this closeable now?
[15:14:55] (CR) JMeybohm: [C: -1] service_proxy: enable ipv6 on envoy config [puppet] - https://gerrit.wikimedia.org/r/629343 (https://phabricator.wikimedia.org/T255568) (owner: Effie Mouzeli)
[15:17:11] (PS1) Elukey: sre.hadoop.init-hadoop-workers.py: better handling of disk labels/letters [cookbooks] - https://gerrit.wikimedia.org/r/629416
[15:19:15] (PS1) CDanis: allow useful Jenkins URLs [puppet] - https://gerrit.wikimedia.org/r/629417 (https://phabricator.wikimedia.org/T178458)
[15:19:35] (CR) Elukey: [C: +2] sre.hadoop.init-hadoop-workers.py: better handling of disk labels/letters [cookbooks] - https://gerrit.wikimedia.org/r/629416 (owner: Elukey)
[15:21:10] !log elukey@cumin1001 START - Cookbook sre.hadoop.init-hadoop-workers
[15:21:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:21:23] !log elukey@cumin1001 END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
[15:21:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:21:49] Operations, DNS, Traffic, netbox: Netbox DNS change not effective in gdns - https://phabricator.wikimedia.org/T255748 (Volans) Open→Resolved a:Volans I think so didn't get any report of issues.
[15:27:06] (PS1) Elukey: sre.hadoop.init-hadoop-workers.py: fix usage of tune2fs [cookbooks] - https://gerrit.wikimedia.org/r/629420
[15:28:13] !log kormat@cumin1001 dbctl commit (dc=all): 'db2126 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12772 and previous config saved to /var/cache/conftool/dbconfig/20200923-152812-kormat.json
[15:28:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:18] T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[15:28:39] (CR) Elukey: [C: +2] sre.hadoop.init-hadoop-workers.py: fix usage of tune2fs [cookbooks] - https://gerrit.wikimedia.org/r/629420 (owner: Elukey)
[15:30:02] !log elukey@cumin1001 START - Cookbook sre.hadoop.init-hadoop-workers
[15:30:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:31:21] (CR) Urbanecm: [C: +2] "train blocker" [core] (wmf/1.36.0-wmf.10) - https://gerrit.wikimedia.org/r/629390 (https://phabricator.wikimedia.org/T263642) (owner: Tchanders)
[15:33:21] Operations, ops-codfw, DC-Ops, decommission-hardware: decommission es2014.codfw.wmnet - https://phabricator.wikimedia.org/T262889 (Papaul) ` papaul@asw-a-codfw# show | compare [edit interfaces interface-range vlan-private1-a-codfw] - member ge-1/0/5; [edit interfaces interface-range disabled]...
[15:33:32] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
[15:33:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:38] * elukey dances
[15:35:36] Operations, ops-codfw, DC-Ops, decommission-hardware: decommission es2014.codfw.wmnet - https://phabricator.wikimedia.org/T262889 (Papaul)
[15:36:44] (CR) Filippo Giunchedi: [C: +1] Manage /etc/apt/sources.list via Puppet [puppet] - https://gerrit.wikimedia.org/r/626693 (https://phabricator.wikimedia.org/T156562) (owner: Muehlenhoff)
[15:37:25] !log elukey@cumin1001 START - Cookbook sre.hadoop.init-hadoop-workers
[15:37:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:38:32] (CR) ArielGlenn: "Consider adding a README if there isn't one already; see e.g. https://github.com/wikimedia/puppet/blob/production/modules/dumps/files/web/" [puppet] - https://gerrit.wikimedia.org/r/629409 (https://phabricator.wikimedia.org/T251777) (owner: Fdans)
[15:39:09] Operations, Product-Infrastructure-Team-Backlog, RESTBase, Wikifeeds, and 2 others: wikifeeds OpenAPI spec test doesn't fail if the response from `feed/featured` is malformed - https://phabricator.wikimedia.org/T263097 (LGoto)
[15:39:15] Operations, Product-Infrastructure-Team-Backlog, RESTBase, Wikifeeds, and 2 others: wikifeeds OpenAPI spec test doesn't fail if the response from `feed/featured` is malformed - https://phabricator.wikimedia.org/T263097 (LGoto) p:Triage→Low
[15:39:40] (CR) Filippo Giunchedi: graphite-carbon: disable internal log rotation and use logrotate (1 comment) [puppet] - https://gerrit.wikimedia.org/r/628423 (https://phabricator.wikimedia.org/T263103) (owner: Herron)
[15:40:07] (CR) Fdans: "@ArielGlenn thank you, there will be a subsequent change that adds the public page for the dumps, effectively publishing them, but this ch" [puppet] - https://gerrit.wikimedia.org/r/629409 (https://phabricator.wikimedia.org/T251777) (owner: Fdans)
[15:40:26] (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1003/25340/" [puppet] - https://gerrit.wikimedia.org/r/629223 (owner: Dzahn)
[15:40:44] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
[15:40:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:42:13] (CR) Filippo Giunchedi: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/628973 (owner: Dzahn)
[15:42:15] (PS2) Dzahn: nutcracker: hiera-lookup, add data types [puppet] - https://gerrit.wikimedia.org/r/628461
[15:43:03] (PS1) ArielGlenn: use previous pagerange in guessing intervals for page content dumps [dumps] - https://gerrit.wikimedia.org/r/629422 (https://phabricator.wikimedia.org/T263319)
[15:43:16] !log kormat@cumin1001 dbctl commit (dc=all): 'db2126 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12773 and previous config saved to /var/cache/conftool/dbconfig/20200923-154315-kormat.json
[15:43:19] !log elukey@cumin1001 START - Cookbook sre.hadoop.init-hadoop-workers
[15:43:19] !log elukey@cumin1001 END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
[15:43:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:22] T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[15:43:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:52] (PS1) Jbond: wmflib: drop Wmflib::Sourceurl and replace with Stdlib::Filesource [puppet] - https://gerrit.wikimedia.org/r/629423
[15:44:03] !log elukey@cumin1001 START - Cookbook sre.hadoop.init-hadoop-workers
[15:44:04] !log elukey@cumin1001 END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
[15:44:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:44:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:44:14] sorry for the spam
[15:45:09] !log elukey@cumin1001 START - Cookbook sre.hadoop.init-hadoop-workers
[15:45:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:45:14] (CR) jerkins-bot: [V: -1] wmflib: drop Wmflib::Sourceurl and replace with Stdlib::Filesource [puppet] - https://gerrit.wikimedia.org/r/629423 (owner: Jbond)
[15:45:32] !log pt1979@cumin2001 START - Cookbook sre.dns.netbox
[15:45:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:03] (CR) Dzahn: [V: +1 C: +2] "noop on scb hosts - https://puppet-compiler.wmflabs.org/compiler1003/25342/" [puppet] - https://gerrit.wikimedia.org/r/628461 (owner: Dzahn)
[15:48:30] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
[15:48:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:54] !log elukey@cumin1001 START - Cookbook sre.hadoop.init-hadoop-workers
[15:48:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:41] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[15:50:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:49] (CR) Cwhite: "Overall, LGTM. I'd prefer not to merge until the migration to buster is complete in case we have to match in the interim." (2 comments) [debs/prometheus-icinga-exporter] - https://gerrit.wikimedia.org/r/628090 (owner: Filippo Giunchedi)
[15:51:21] (CR) Cwhite: "> Patch Set 5:" (2 comments) [debs/prometheus-icinga-exporter] - https://gerrit.wikimedia.org/r/628090 (owner: Filippo Giunchedi)
[15:51:34] (Merged) jenkins-bot: SpecialUnblock: Allow getTargetAndType to accept null $par [core] (wmf/1.36.0-wmf.10) - https://gerrit.wikimedia.org/r/629390 (https://phabricator.wikimedia.org/T263642) (owner: Tchanders)
[15:51:46] !log pt1979@cumin2001 START - Cookbook sre.dns.netbox
[15:51:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:52:06] (PS1) Dzahn: acme_chief: replace hiera() with lookup() [puppet] - https://gerrit.wikimedia.org/r/629424
[15:52:30] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
[15:52:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:52:56] (PS2) Dzahn: acme_chief: replace hiera() with lookup() [puppet] - https://gerrit.wikimedia.org/r/629424
[15:53:53] !log pt1979@cumin2001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:53:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:54:14] (PS6) Filippo Giunchedi: am: use status.cgi JSON as source for problems [debs/prometheus-icinga-exporter] - https://gerrit.wikimedia.org/r/628090
[15:54:46] (PS2) Jbond: wmflib: drop Wmflib::Sourceurl and replace with Stdlib::Filesource [puppet] - https://gerrit.wikimedia.org/r/629423
[15:54:50] (CR) Filippo Giunchedi: am: use status.cgi JSON as source for problems (1 comment) [debs/prometheus-icinga-exporter] - https://gerrit.wikimedia.org/r/628090 (owner: Filippo Giunchedi)
[15:55:05] (PS1) Dzahn: statistics::web: hiera->lookup, add data type [puppet] - https://gerrit.wikimedia.org/r/629425
[15:55:25] (PS1) Ahmon Dancy: Disable deprecated warning in Language::commafy() for non numeric string [core] (wmf/1.36.0-wmf.10) - https://gerrit.wikimedia.org/r/629426 (https://phabricator.wikimedia.org/T263592)
[15:55:27] Operations, ops-codfw, DC-Ops, decommission-hardware: decommission es2014.codfw.wmnet - https://phabricator.wikimedia.org/T262889 (Papaul)
[15:56:10] !log urbanecm@deploy1001 Synchronized php-1.36.0-wmf.10/includes/specials/SpecialUnblock.php: 3234fad0d9b370b1cf75093dd13c0e1639619f08: SpecialUnblock: Allow getTargetAndType to accept null $par (T263642) (duration: 01m 08s)
[15:56:10] (CR) DannyS712: [C: +1] Disable deprecated warning in Language::commafy() for non numeric string [core] (wmf/1.36.0-wmf.10) - https://gerrit.wikimedia.org/r/629426 (https://phabricator.wikimedia.org/T263592) (owner: Ahmon Dancy)
[15:56:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:14] T263642: TypeError when visiting Special:Unblock with no subpage - https://phabricator.wikimedia.org/T263642
[15:56:33] Urbanecm https://www.mediawiki.org/wiki/Special:Unblock no longer throwing
[15:56:40] thanks, indeed
[15:57:29] !log urbanecm@deploy1001 Synchronized php-1.36.0-wmf.10/includes/specials/SpecialBlock.php: 3234fad0d9b370b1cf75093dd13c0e1639619f08: SpecialUnblock: Allow getTargetAndType to accept null $par (T263642) (duration: 01m 07s)
[15:57:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:57:44] (PS1) Dzahn: diffscan: hiera->lookup, add data types [puppet] - https://gerrit.wikimedia.org/r/629427
[15:57:53] DannyS712: ad your +1 few rows earlier, https://gerrit.wikimedia.org/r/c/mediawiki/core/+/629219 isn't merged through :)
[15:57:58] !log updating firmware on mw1360, troubleshooting nic failure issue T262151
[15:58:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:58:03] T262151: mw1360's NIC is faulty - https://phabricator.wikimedia.org/T262151
[15:58:19] !log kormat@cumin1001 dbctl commit (dc=all): 'db2126 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12774 and previous config saved to /var/cache/conftool/dbconfig/20200923-155819-kormat.json
[15:58:21] (PS3) Muehlenhoff: Remove stretch-backports from bootstrapvz config [puppet] - https://gerrit.wikimedia.org/r/610121 (https://phabricator.wikimedia.org/T256881)
[15:58:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:58:24] T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[15:59:23] Urbanecm why give a +1 when you can give a +2 ? https://gerrit.wikimedia.org/r/c/mediawiki/core/+/629219 should be merging now
[15:59:39] I meant the wmf/ branch version :)
[16:00:10] !log kormat@cumin1001 dbctl commit (dc=all): 'Remove db2088:3312 from api now that db2104/db2126 are done T259831', diff saved to https://phabricator.wikimedia.org/P12775 and previous config saved to /var/cache/conftool/dbconfig/20200923-160010-kormat.json
[16:00:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:00:44] !log switching icinga over from icinga1001 to alert1001 T247966
[16:00:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:00:49] T247966: Migrate role::alerting_host to Buster - https://phabricator.wikimedia.org/T247966
[16:01:24] (CR) Herron: [C: +2] dns: point icinga CNAMEs to alert1001 [dns] - https://gerrit.wikimedia.org/r/629415 (https://phabricator.wikimedia.org/T247966) (owner: Herron)
[16:02:02] Operations, netops: Configure BGP route damping on Anycast sessions - https://phabricator.wikimedia.org/T262372 (CDanis) I'm no expert here, but seems reasonable enough to me.
[16:05:44] (CR) Herron: [C: +2] icinga: switch active server from icinga1001 to alert1001 [puppet] - https://gerrit.wikimedia.org/r/629408 (https://phabricator.wikimedia.org/T247966) (owner: Herron)
[16:06:03] (PS1) Evrifaessa: Merge branch 'master' of https://gerrit.wikimedia.org/r/operations/mediawiki-config [mediawiki-config] - https://gerrit.wikimedia.org/r/629428
[16:06:05] (PS1) Evrifaessa: Set $wgCategoryCollation = uca-tr on trwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/629429 (https://phabricator.wikimedia.org/T263628)
[16:06:40] Operations, Traffic: Clean up DNS server puppetization - https://phabricator.wikimedia.org/T240285 (BBlack) Open→Resolved The new puppetization has been stable for quite a while now, we can resolve this, as it's kind of ambiguous what if any further improvements are warranted outside of any speci...
[16:06:44] Operations, Traffic, netops, Patch-For-Review, Performance-Team (Radar): Anycast AuthDNS - https://phabricator.wikimedia.org/T98006 (BBlack)
[16:06:49] (Abandoned) Evrifaessa: Merge branch 'master' of https://gerrit.wikimedia.org/r/operations/mediawiki-config [mediawiki-config] - https://gerrit.wikimedia.org/r/629428 (owner: Evrifaessa)
[16:08:17] Operations, ops-eqiad, serviceops: mw1360's NIC is faulty - https://phabricator.wikimedia.org/T262151 (RobH) All tests passed with no issues. I've updated the firmware to the newest version of bios, which is the mainboard firmware. The system no longer sees the NIC. I suppose we should try to reim...
[16:09:37] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job={swagger_check_citoid_cluster_codfw,swagger_check_citoid_cluster_eqiad} site={codfw,eqiad} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:11:38] Operations, Traffic: Define 3-host infra cluster for traffic pops - https://phabricator.wikimedia.org/T96852 (BBlack)
[16:11:42] Operations, Traffic: Consolidate misc servers at edge sites - https://phabricator.wikimedia.org/T257323 (BBlack)
[16:12:00] Operations, Traffic: Define 3-host infra cluster for traffic pops - https://phabricator.wikimedia.org/T96852 (BBlack) ^ Remaining work superseded by new plans in the ticket this was closed into.
[16:13:35] Operations, ops-eqiad, serviceops: mw1360's NIC is faulty - https://phabricator.wikimedia.org/T262151 (ops-monitoring-bot) Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts: ` mw1360.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/202009231613_robh_1862_...
[16:13:40] Operations, ops-eqiad, serviceops: mw1360's NIC is faulty - https://phabricator.wikimedia.org/T262151 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1360.eqiad.wmnet'] ` Of which those **FAILED**: ` ['mw1360.eqiad.wmnet'] `
[16:16:40] Operations, ops-eqiad, serviceops: mw1360's NIC is faulty - https://phabricator.wikimedia.org/T262151 (ops-monitoring-bot) Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts: ` mw1360.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/202009231616_robh_4519_...
[16:17:42] (CR) Ahmon Dancy: [C: +2] Disable deprecated warning in Language::commafy() for non numeric string [core] (wmf/1.36.0-wmf.10) - https://gerrit.wikimedia.org/r/629426 (https://phabricator.wikimedia.org/T263592) (owner: Ahmon Dancy)
[16:18:08] Operations, Parsing-Team, TechCom, serviceops, and 4 others: Strategy for storing parser output for "old revision" (Popular diffs and permalinks) - https://phabricator.wikimedia.org/T244058 (daniel) a:holger.knust Assigning Holger, per yesterday's meeting.
[16:19:54] Operations, Jade, TechCom, Epic, and 3 others: Deploy pilot of Jade to a small set of wikis. - https://phabricator.wikimedia.org/T183381 (calbon) Open→Resolved a:calbon
[16:19:59] (PS1) Filippo Giunchedi: prometheus: aggregation rules for ats-tls client TTFB [puppet] - https://gerrit.wikimedia.org/r/629430 (https://phabricator.wikimedia.org/T263536)
[16:20:07] PROBLEM - Maps - OSM synchronization lag - eqiad on alert1001 is CRITICAL: 2.1e+07 ge 2.592e+05 https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=11&fullscreen&orgId=1
[16:20:18] Operations, ops-codfw, DC-Ops: codfw: Fixing Accounting/Netbox error - https://phabricator.wikimedia.org/T263658 (wiki_willy)
[16:21:56] Operations, observability, Patch-For-Review, Performance-Team (Radar): Fully migrate producers off statsd - https://phabricator.wikimedia.org/T205870 (calbon)
[16:22:46] Operations: Strongswan Icinga check: do not report issues about depooled hosts - https://phabricator.wikimedia.org/T148976 (BBlack) p:Medium→Triage This was mostly about cache nodes back when those had ipsec, I think. The remaining case that uses ipsec anymore is the memcached cluster. Does this ma...
[16:23:41] Operations, Traffic: High number of failed inbound TFO connections in esams Mon-Fri - https://phabricator.wikimedia.org/T143562 (BBlack) Open→Declined No movement in 4 years. If there are new/ongoing TFO issues, someone should make a new ticket about them!
[16:23:56] Puppet, SRE-tools, Patch-For-Review, Python3-Porting, and 3 others: Forward port Python2 files to Python3 in Puppet Repository - https://phabricator.wikimedia.org/T247364 (crusnov) These all pass Python 3 tox and have no patches for automated porting, so seem ready for the Python 3 future. They a...
[16:25:46] Operations, RESTBase, RESTBase-API, Traffic, Services (next): RESTBase support for www.wikimedia.org missing - https://phabricator.wikimedia.org/T133178 (BBlack) @Krinkle - Is this ticket still worth pursuing at all?
[16:27:58] Operations, ops-eqiad, serviceops: mw1360's NIC is faulty - https://phabricator.wikimedia.org/T262151 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1360.eqiad.wmnet'] ` Of which those **FAILED**: ` ['mw1360.eqiad.wmnet'] `
[16:29:39] PROBLEM - Stale file for node-exporter textfile in eqiad on alert1001 is CRITICAL: cluster=misc file=smartmon.prom instance=relforge1004 job=node site=eqiad https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile
[16:30:04] Operations, CX-cxserver, Citoid, RESTBase, and 3 others: Decom legacy ex-parsoidcache cxserver, citoid, and restbase service hostnames - https://phabricator.wikimedia.org/T133001 (BBlack) @Pchelolo what about `https://cxserver.wikimedia.org/` - Can it be removed? Or is it better to just ignore i...
[16:30:42] (PS1) Effie Mouzeli: WIP canary_appserver: install memcached if use_onhost_memcache [puppet] - https://gerrit.wikimedia.org/r/629431 (https://phabricator.wikimedia.org/T244340)
[16:31:27] (CR) jerkins-bot: [V: -1] WIP canary_appserver: install memcached if use_onhost_memcache [puppet] - https://gerrit.wikimedia.org/r/629431 (https://phabricator.wikimedia.org/T244340) (owner: Effie Mouzeli)
[16:32:02] Operations, Jade, TechCom, Epic, and 3 others: Deploy pilot of Jade to a small set of wikis. - https://phabricator.wikimedia.org/T183381 (DannyS712) @calbon jade doesn't appear to be enabled in production on any wiki. Per `InitialiseSettings` ` lang=php 'wmgUseJADE' => [ 'default' => false, ], `
[16:33:29] PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 226 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:33:53] PROBLEM - OSPF status on cr2-codfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[16:34:12] Operations, ORES, Machine Learning Platform (Research): Investigate memory usage of ORES in kubernetes - https://phabricator.wikimedia.org/T210264 (calbon) Open→Resolved
[16:34:15] Operations, Machine Learning Platform, ORES: [Epic] Deploy ORES in kubernetes cluster - https://phabricator.wikimedia.org/T182331 (calbon)
[16:35:03] RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 9 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:37:23] !log upload dnsdist_1.4.0-1~deb10u2 to apt.wm.o (buster) - T252132
[16:37:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:31] T252132: Deploy Wikidough: Experimental DNS-over-HTTPS (DoH) public resolver - https://phabricator.wikimedia.org/T252132
[16:37:47] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[16:38:26] Operations, Traffic, HTTPS: Inbound TLS for tier-1 varnish backend caches - https://phabricator.wikimedia.org/T109321 (BBlack) Open→Invalid There is no more varnish-be
[16:38:28] Operations, Traffic, HTTPS, codfw-rollout: HTTPS for internal service traffic - https://phabricator.wikimedia.org/T108580 (BBlack)
[16:38:35] (Merged) jenkins-bot: Disable deprecated warning in Language::commafy() for non numeric string [core] (wmf/1.36.0-wmf.10) - https://gerrit.wikimedia.org/r/629426 (https://phabricator.wikimedia.org/T263592) (owner: Ahmon Dancy)
[16:38:47] Operations, Traffic, HTTPS, codfw-rollout: HTTPS for internal service traffic - https://phabricator.wikimedia.org/T108580 (BBlack)
[16:38:57] Operations, Traffic, HTTPS, codfw-rollout: Outbound HTTPS for varnish backend instances - https://phabricator.wikimedia.org/T109325 (BBlack) Open→Invalid There is no more varnish-be
[16:40:32] (CR) Hashar: [C: +1] "That will definitely cause Jenkins to load the whole build history which is a few hundred of thousands XML files. That is the reason they" [puppet] - https://gerrit.wikimedia.org/r/629417 (https://phabricator.wikimedia.org/T178458) (owner: CDanis)
[16:44:03] Operations, Traffic, HTTPS, codfw-rollout: HTTPS for internal service traffic - https://phabricator.wikimedia.org/T108580 (BBlack) All subtasks gone, but there are technically stlil a few edges cases showing up in the trafficserver backend-facing config. Specifically: ` $ grep 'replacement: htt...
[16:45:03] Operations, Traffic, Wikimedia-General-or-Unknown, HTTPS: Wikimedia's recent upgrade to nginx v. 1.13.6 breaks older Android HTTP libraries - https://phabricator.wikimedia.org/T180269 (BBlack) Open→Declined We've long since moved on from this. Nginx isn't even terminating our public TLS...
[16:50:37] Operations, Data-Persistence, Data-Persistence-Backup, observability: check-mariadb-backups pkg_resources.VersionConflict: 0.2 (/usr/lib/python3/dist-packages), Requirement.parse('wmfbackups==0.1' - https://phabricator.wikimedia.org/T263662 (herron)
[16:50:53] Operations, Data-Persistence, Data-Persistence-Backup, observability: check-mariadb-backups pkg_resources.VersionConflict: 0.2 (/usr/lib/python3/dist-packages), Requirement.parse('wmfbackups==0.1') - https://phabricator.wikimedia.org/T263662 (herron)
[16:51:09] (PS1) Volans: dns: convert check Icinga to read/save state [software/netbox-extras] - https://gerrit.wikimedia.org/r/629432 (https://phabricator.wikimedia.org/T258729)
[16:52:57] Operations, Traffic, Goal: Establish timeline and methodology for upcoming deprecation of non-forward-secret ciphers and TLSv1.0 - https://phabricator.wikimedia.org/T192559 (BBlack) Open→Resolved a: BBlack A lot of this planning is already-done, and the remainder of the plans are in progre...
[16:54:36] Operations, Data-Persistence, Data-Persistence-Backup, observability: check-mariadb-backups pkg_resources.VersionConflict: 0.2 (/usr/lib/python3/dist-packages), Requirement.parse('wmfbackups==0.1') - https://phabricator.wikimedia.org/T263662 (jcrespo) Was there an OS upgrade on the change from ic...
[16:55:50] Operations, Traffic: nginx HTTP 500 rate increase on specific cache hosts - https://phabricator.wikimedia.org/T226805 (BBlack) Open→Declined This has been idle over a year, and some of the software stack referenced here doesn't exist anymore.
[16:56:59] Operations, CX-cxserver, Citoid, RESTBase, and 3 others: Decom legacy ex-parsoidcache cxserver, citoid, and restbase service hostnames - https://phabricator.wikimedia.org/T133001 (KartikMistry) >>! In T133001#6488134, @BBlack wrote: > @Pchelolo what about `https://cxserver.wikimedia.org/` - Can i...
[16:58:00] Operations, Traffic: Wikipedia is unavailable on Symbian phone's browsers - https://phabricator.wikimedia.org/T227828 (BBlack) Open→Declined I don't think there's much we can do here. We can expect there will be more tickets like this over time as we deprecate and remove legacy TLS standards, fr...
[16:58:53] (PS1) Ssingh: hieradata: update preferred cipher suite order for Wikidough [puppet] - https://gerrit.wikimedia.org/r/629434 (https://phabricator.wikimedia.org/T252132)
[17:00:05] Operations, Data-Persistence, Data-Persistence-Backup, observability: check-mariadb-backups pkg_resources.VersionConflict: 0.2 (/usr/lib/python3/dist-packages), Requirement.parse('wmfbackups==0.1') - https://phabricator.wikimedia.org/T263662 (herron) >>! In T263662#6488323, @jcrespo wrote: > Was...
[17:00:15] Operations, Data-Persistence, Data-Persistence-Backup, observability: check-mariadb-backups pkg_resources.VersionConflict: 0.2 (/usr/lib/python3/dist-packages), Requirement.parse('wmfbackups==0.1') - https://phabricator.wikimedia.org/T263662 (jcrespo) I see the issue, after apt upgrade, the 0.2 v...
[17:01:54] (CR) Ssingh: "https://puppet-compiler.wmflabs.org/compiler1001/25343/malmok.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/629434 (https://phabricator.wikimedia.org/T252132) (owner: Ssingh)
[17:02:07] Operations, Jade, TechCom, Epic, and 3 others: Deploy pilot of Jade to a small set of wikis. - https://phabricator.wikimedia.org/T183381 (ACraze)
[17:02:38] Operations, Data-Persistence, Data-Persistence-Backup, observability: check-mariadb-backups pkg_resources.VersionConflict: 0.2 (/usr/lib/python3/dist-packages), Requirement.parse('wmfbackups==0.1') - https://phabricator.wikimedia.org/T263662 (jcrespo) Is there an alert2001 to do the same fix? `...
[17:02:48] Operations, Jade, TechCom, Epic, and 3 others: Deploy pilot of Jade to a small set of wikis. - https://phabricator.wikimedia.org/T183381 (ACraze)
[17:03:34] Operations, Traffic: Analyze the impact of removing TLSv1/v1.1 on puppetmasters - https://phabricator.wikimedia.org/T242991 (BBlack) @jbond any further thoughts here? We do still have ~55 jessies: ` conf[2001-2003].codfw.wmnet,dbmonitor1001.wikimedia.org,helium.eqiad.wmnet,heze.codfw.wmnet,kraz.wikimed...
[17:04:05] (PS1) Elukey: sre.hadoop.init-hadoop-workers.py: add journalnode partition [cookbooks] - https://gerrit.wikimedia.org/r/629435 (https://phabricator.wikimedia.org/T262189)
[17:05:08] Operations, Data-Persistence, Data-Persistence-Backup, observability: check-mariadb-backups pkg_resources.VersionConflict: 0.2 (/usr/lib/python3/dist-packages), Requirement.parse('wmfbackups==0.1') - https://phabricator.wikimedia.org/T263662 (jcrespo) Open→Resolved a: jcrespo This shou...
[17:05:19] Operations, Data-Persistence, Data-Persistence-Backup, observability: check-mariadb-backups pkg_resources.VersionConflict: 0.2 (/usr/lib/python3/dist-packages), Requirement.parse('wmfbackups==0.1') - https://phabricator.wikimedia.org/T263662 (herron) >>! In T263662#6488341, @jcrespo wrote: > I se...
[17:06:06] Operations, Data-Persistence, Data-Persistence-Backup, observability: check-mariadb-backups pkg_resources.VersionConflict: 0.2 (/usr/lib/python3/dist-packages), Requirement.parse('wmfbackups==0.1') - https://phabricator.wikimedia.org/T263662 (herron) >>! In T263662#6488383, @jcrespo wrote: > This...
[17:07:04] Operations, Data-Persistence, Data-Persistence-Backup, observability: check-mariadb-backups pkg_resources.VersionConflict: 0.2 (/usr/lib/python3/dist-packages), Requirement.parse('wmfbackups==0.1') - https://phabricator.wikimedia.org/T263662 (jcrespo) I will, it is just that by the time we add >=...
[17:09:02] Operations, Traffic, Security: HTTP MediaWiki API GET requests to Wikimedia wikis should not be redirected to HTTPS when they have a session cookie or Authorization header - https://phabricator.wikimedia.org/T247490 (BBlack) Yeah this is an interesting angle on things. Currently for all traffic to c...
[17:09:29] Operations, ops-eqiad, serviceops: mw1360's NIC is faulty - https://phabricator.wikimedia.org/T262151 (RobH) So both myself and Papaul have looked into this, checking multiple items: * updated bios and idrac to newest firmware revisions * nic is enabled in bios * nic has error message in idrac inven...
[17:15:26] (PS2) Volans: dns: convert check Icinga to read/save state [software/netbox-extras] - https://gerrit.wikimedia.org/r/629432 (https://phabricator.wikimedia.org/T258729)
[17:17:04] (CR) CDanis: "> Patch Set 1: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/629417 (https://phabricator.wikimedia.org/T178458) (owner: CDanis)
[17:19:53] (PS2) Elukey: sre.hadoop.init-hadoop-workers.py: add journalnode partition [cookbooks] - https://gerrit.wikimedia.org/r/629435 (https://phabricator.wikimedia.org/T262189)
[17:20:21] Operations, Jade, TechCom, Epic, and 3 others: Deploy pilot of Jade to a small set of wikis. - https://phabricator.wikimedia.org/T183381 (ACraze)
[17:20:31] (CR) CRusnov: [C: +1] "Looks good. Thanks for the shuld change." [software/netbox-extras] - https://gerrit.wikimedia.org/r/629432 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[17:23:11] Operations, ops-eqiad, serviceops: mw1360's NIC is faulty - https://phabricator.wikimedia.org/T262151 (RobH) Papaul suggests that we drain power (unplug since we don't have switched PDUs in normal racks) and let sit for a couple minutes and then plug it all back in. Worth a shot, since it has cleare...
[17:23:19] Operations, ops-eqiad, serviceops: mw1360's NIC is faulty - https://phabricator.wikimedia.org/T262151 (RobH) a: Cmjohnson
[17:24:03] (PS1) Dave Pifke: webperf: new python-ua-parser navtiming dependency [puppet] - https://gerrit.wikimedia.org/r/629436 (https://phabricator.wikimedia.org/T260580)
[17:25:26] Operations, ops-eqiad, serviceops: mw1360's NIC is faulty - https://phabricator.wikimedia.org/T262151 (RobH)
[17:25:32] Operations, ops-eqiad, serviceops: mw1360's NIC is faulty - https://phabricator.wikimedia.org/T262151 (RobH) p: Triage→Medium
[17:26:17] (PS1) Dzahn: service::uwsgi: replace hiera() with lookup() [puppet] - https://gerrit.wikimedia.org/r/629437
[17:27:43] Operations, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Traffic, and 2 others: Spike: CentralNotice: Verify that our Special:HideBanners cookie storm works as efficiently as possible - https://phabricator.wikimedia.org/T117435 (BBlack) Open→Resolved a: BBlack Resolving for...
[17:29:21] !log migrating ganeti instances off ganeti5002 for troubleshooting per T261130
[17:29:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:29:27] T261130: ganeti5002 was down / powered off, machine check entries in SEL - https://phabricator.wikimedia.org/T261130
[17:30:07] Operations, Traffic: restrict upload cache access for private wikis - https://phabricator.wikimedia.org/T129839 (BBlack) Is this still an ongoing concern? No updates since 2016
[17:30:23] (PS19) Elukey: profile::hadoop::common: add datanode mountpoints override [puppet] - https://gerrit.wikimedia.org/r/629165
[17:32:56] Operations, Traffic: Parametrization of VCL is inconsistent - https://phabricator.wikimedia.org/T137747 (BBlack) Open→Invalid Very old ticket references very old stuff. If there are still similar concerns in more-modern cache puppetization, someone should make a new ticket!
[17:33:31] (PS20) Elukey: profile::hadoop::common: add datanode mountpoints override [puppet] - https://gerrit.wikimedia.org/r/629165
[17:33:57] Operations, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Traffic: CentralNotice: Review and update Varnish caching for Special:BannerLoader - https://phabricator.wikimedia.org/T149873 (BBlack) Are we still working on something here, or is this best closed and any remaining concerns op...
[17:37:03] Operations, MediaWiki-API, Traffic, Patch-For-Review: Varnish does not cache Action API responses when logged in - https://phabricator.wikimedia.org/T155314 (BBlack) Anything to still pursue here? It's been a few years. Obviously, one path towards fixing these things is to not emit `Vary: Cookie...
[17:37:40] Operations, Traffic, Mobile: Samsung Internet's desktop mode getting redirected to mobile site - https://phabricator.wikimedia.org/T158599 (BBlack) Open→Declined Reopen or make a new ticket if this is still an issue for a real user, it's too-stale with no movement as-is.
[17:40:30] Operations, Pybal, Traffic, netops, Patch-For-Review: Frequent RST returned by appservers to LVS hosts - https://phabricator.wikimedia.org/T163674 (BBlack) Open→Declined Declining for lack of movement and lack of urgency.
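The check-mariadb-backups failure (T263662) is a pinned-requirement mismatch. The check asked for `wmfbackups==0.1`, the apt upgrade installed 0.2, and `pkg_resources` therefore raised `VersionConflict`. jcrespo's remark about adding `>=` points at loosening the pin to a lower bound. The stdlib-only sketch below shows why `==` breaks on upgrade while `>=` survives. It uses a simplified comparison of plain dotted integers, not the full PEP 440 logic that `pkg_resources` or `packaging` implement, and the helper names are hypothetical.

```python
import operator
from typing import Tuple

# Simplified: only the two operators discussed in the log, and only versions made
# of dotted integers. Real tools implement full PEP 440 version semantics.
_OPERATORS = {">=": operator.ge, "==": operator.eq}


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted version such as "0.1" into (0, 1)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed: str, requirement: str) -> bool:
    """Return True if `installed` meets a requirement like "wmfbackups==0.1"."""
    for symbol, compare in _OPERATORS.items():
        if symbol in requirement:
            _name, wanted = requirement.split(symbol, 1)
            return compare(parse_version(installed), parse_version(wanted))
    raise ValueError(f"unsupported requirement: {requirement!r}")


print(satisfies("0.2", "wmfbackups==0.1"))  # False: the exact pin breaks after the upgrade
print(satisfies("0.2", "wmfbackups>=0.1"))  # True: a lower bound tolerates it
```

Comparing tuples of integers rather than raw strings keeps "0.10" newer than "0.9". A naive string comparison would get that wrong.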
[17:40:53] Operations, Traffic, netops: High amount of unexpected ICMP dest unreachable toward esams cache clusters - https://phabricator.wikimedia.org/T167691 (BBlack) Open→Declined `ssl_do_wait_shutdown` never really did anything, declining this one for lack of urgency (are there users impacted?) and m...
[17:41:48] Operations, Traffic, Patch-For-Review: Uncacheable content handling: hfp vs hfm - https://phabricator.wikimedia.org/T180434 (BBlack) Open→Resolved a: ema Looks like this was resolved long ago and didn't block the V5 upgrade
[17:43:46] Operations, CheckUser, Traffic: Log source port for anonymous users and expose it for sysops/checkusers - https://phabricator.wikimedia.org/T181368 (BBlack) Is this still desirable for checkusers? Infrastructure has changed since then and is still-changing, but we could probably find a way to pass t...
[17:43:55] (PS21) Elukey: profile::hadoop::common: add datanode mountpoints override [puppet] - https://gerrit.wikimedia.org/r/629165
[17:44:33] (PS1) Dzahn: maps: hiera()->lookup(), add data types [puppet] - https://gerrit.wikimedia.org/r/629439
[17:44:50] (PS1) Volans: netbox: convert Icinga check in timer [puppet] - https://gerrit.wikimedia.org/r/629440 (https://phabricator.wikimedia.org/T258729)
[17:45:05] (CR) Volans: [C: +2] dns: convert check Icinga to read/save state [software/netbox-extras] - https://gerrit.wikimedia.org/r/629432 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[17:45:18] Operations, Traffic: Consider using vmod_var instead of temporary headers in VCL - https://phabricator.wikimedia.org/T198620 (BBlack) This might be a useful project still, as it might help clarify our remaining frontend VCL going forward. Maybe keep this for a backburner thing to attack post-V6-upgrade.
[17:45:34] (CR) jerkins-bot: [V: -1] maps: hiera()->lookup(), add data types [puppet] - https://gerrit.wikimedia.org/r/629439 (owner: Dzahn)
[17:45:51] (CR) jerkins-bot: [V: -1] netbox: convert Icinga check in timer [puppet] - https://gerrit.wikimedia.org/r/629440 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[17:46:15] Operations, Traffic: cp3040: kernel crash in ipsec code shortly after reboot - https://phabricator.wikimedia.org/T201666 (BBlack) Open→Invalid Ipsec for cp nodes is long gone, as is this kernel I'm sure
[17:46:38] (PS1) Effie Mouzeli: change variable use_onhost_memcache to use_onhost_memcached [puppet] - https://gerrit.wikimedia.org/r/629441
[17:48:01] (PS2) Volans: netbox: convert Icinga check in timer [puppet] - https://gerrit.wikimedia.org/r/629440 (https://phabricator.wikimedia.org/T258729)
[17:48:52] Operations, netops: Configure BGP route damping on Anycast sessions - https://phabricator.wikimedia.org/T262372 (jbond) Sorry missed this looks good to me
[17:49:16] (PS2) Effie Mouzeli: change variable use_onhost_memcache to use_onhost_memcached [puppet] - https://gerrit.wikimedia.org/r/629441
[17:50:20] Operations, Traffic: add Icinga alert on Varnish backends that are close to maxing out their allowed connections to their applayer backends - https://phabricator.wikimedia.org/T224738 (BBlack) Open→Invalid We don't have varnish-be anymore.
[17:50:45] (CR) Volans: "Compiler output at: https://puppet-compiler.wmflabs.org/compiler1003/25349/netbox1001.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/629440 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[17:52:31] Operations, Traffic, Patch-For-Review: Investigate esams text varnish backend fetch failures - https://phabricator.wikimedia.org/T226375 (BBlack) Open→Resolved a: ema Long-ago dealt with it looks like, and in any case varnish-be doesn't exist anymore.
[17:53:55] Operations, Traffic: mobile commons GET dying in Varnish layer(?) under oddly specific conditions - https://phabricator.wikimedia.org/T226776 (BBlack) Open→Declined Declining for now, as multiple implicated parts of the software stack have changed significantly since this report, and nothing was...
[17:54:03] (Abandoned) Effie Mouzeli: change variable use_onhost_memcache to use_onhost_memcached [puppet] - https://gerrit.wikimedia.org/r/629441 (owner: Effie Mouzeli)
[17:54:14] (CR) Effie Mouzeli: "pcc https://puppet-compiler.wmflabs.org/compiler1002/25350/" [puppet] - https://gerrit.wikimedia.org/r/629441 (owner: Effie Mouzeli)
[17:54:51] (Restored) Effie Mouzeli: change variable use_onhost_memcache to use_onhost_memcached [puppet] - https://gerrit.wikimedia.org/r/629441 (owner: Effie Mouzeli)
[17:55:35] (Abandoned) Effie Mouzeli: WIP canary_appserver: install memcached if use_onhost_memcache [puppet] - https://gerrit.wikimedia.org/r/629431 (https://phabricator.wikimedia.org/T244340) (owner: Effie Mouzeli)
[17:56:10] (CR) Hashar: [C: +1] "Any URL really :]" [puppet] - https://gerrit.wikimedia.org/r/629417 (https://phabricator.wikimedia.org/T178458) (owner: CDanis)
[17:56:21] (PS1) Jbond: puppetmaster: update web site to use strong ssl ciphers [puppet] - https://gerrit.wikimedia.org/r/629442 (https://phabricator.wikimedia.org/T242991)
[17:56:34] (PS1) Dzahn: oozie: hiera->lookup, add data types [puppet] - https://gerrit.wikimedia.org/r/629443
[17:57:39] Operations, Traffic, Patch-For-Review: Analyze the impact of removing TLSv1/v1.1 on puppetmasters - https://phabricator.wikimedia.org/T242991 (jbond) >>! In T242991#6488366, @BBlack wrote: > @jbond any further thoughts here? We do still have ~55 jessies: > > ` > conf[2001-2003].codfw.wmnet,dbmonito...
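Several entries in this stretch (T242991, jbond's patch moving the puppetmaster site to "strong" ciphers, and BBlack's list of ~55 remaining jessie hosts) concern what happens once a server stops accepting TLSv1.0 and TLSv1.1. The sketch below expresses that policy with Python's `ssl` module on a server-side context. It is only an illustration. It is not how the puppet `ssl_ciphersuite()` function works, and the cipher string is an assumed example, not Wikimedia's "strong" list.

```python
import ssl


def strict_server_context() -> ssl.SSLContext:
    """Build a server-side context that refuses TLSv1.0 and TLSv1.1 handshakes."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Clients that can only negotiate TLS 1.0/1.1 will fail the handshake.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Assumed example of a forward-secret, AEAD-only list for TLS 1.2.
    # OpenSSL configures TLS 1.3 suites separately, so this call does not affect them.
    ctx.set_ciphers("ECDHE+AESGCM")
    return ctx


ctx = strict_server_context()
print(ctx.minimum_version.name)  # TLSv1_2
```

The practical question in the ticket is the client side: each remaining host's TLS stack has to negotiate TLS 1.2 with one of the allowed suites before the server-side floor can be raised.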
[17:57:40] (CR) jerkins-bot: [V: -1] oozie: hiera->lookup, add data types [puppet] - https://gerrit.wikimedia.org/r/629443 (owner: Dzahn)
[17:58:11] (CR) Dzahn: [C: +1] "nice. incomplete list of things already using "strong" cipher list:" [puppet] - https://gerrit.wikimedia.org/r/629442 (https://phabricator.wikimedia.org/T242991) (owner: Jbond)
[17:58:55] Operations, Traffic: cp3032 and cp3040 occasional failed fetches - https://phabricator.wikimedia.org/T235736 (BBlack) Open→Declined Probably related to the transient memory issues discussed in various tickets: T164768 T165063 T249809 . In any case this is almost a year old with no investigation,...
[18:00:05] dancy and twentyafterfour: #bothumor My software never has bugs. It just develops random features. Rise for Train log triage with CPT. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200923T1800).
[18:00:05] RoanKattouw, Niharika, and Urbanecm: Time to snap out of that daydream and deploy Morning backport window. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200923T1800).
[18:00:05] Evrifaessa and ryankemper: A patch you scheduled for Morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[18:01:02] (CR) Dzahn: [C: +1] "note there is also: modules/puppetmaster/manifests/init.pp: $ssl_settings = ssl_ciphersuite('apache', 'compat')" [puppet] - https://gerrit.wikimedia.org/r/629442 (https://phabricator.wikimedia.org/T242991) (owner: Jbond)
[18:01:32] (CR) Ssingh: "Merging based on PCC above and the previous review: https://gerrit.wikimedia.org/r/c/operations/puppet/+/616067/2#message-37b53697abddc8df" [puppet] - https://gerrit.wikimedia.org/r/629434 (https://phabricator.wikimedia.org/T252132) (owner: Ssingh)
[18:01:33] (CR) Ssingh: [C: +2] hieradata: update preferred cipher suite order for Wikidough [puppet] - https://gerrit.wikimedia.org/r/629434 (https://phabricator.wikimedia.org/T252132) (owner: Ssingh)
[18:01:51] RoanKattouw, Niharika and Urbanecm: you here?
[18:02:13] I can deploy today!
[18:02:39] yay \o/
[18:03:04] (PS2) Jbond: puppetmaster: update web site to use strong ssl ciphers [puppet] - https://gerrit.wikimedia.org/r/629442 (https://phabricator.wikimedia.org/T242991)
[18:03:20] (CR) Urbanecm: [C: +2] Set $wgCategoryCollation = uca-tr on trwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/629429 (https://phabricator.wikimedia.org/T263628) (owner: Evrifaessa)
[18:04:33] (PS2) Urbanecm: Set $wgCategoryCollation = uca-tr on trwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/629429 (https://phabricator.wikimedia.org/T263628) (owner: Evrifaessa)
[18:04:42] (CR) Urbanecm: [C: +2] Set $wgCategoryCollation = uca-tr on trwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/629429 (https://phabricator.wikimedia.org/T263628) (owner: Evrifaessa)
[18:05:47] (Merged) jenkins-bot: Set $wgCategoryCollation = uca-tr on trwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/629429 (https://phabricator.wikimedia.org/T263628) (owner: Evrifaessa)
[18:05:57] ryankemper: hey, do you want to deploy your patch yourself, once I finish with Evrifaessa's patch?
[18:06:06] (CR) Jbond: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/629442 (https://phabricator.wikimedia.org/T242991) (owner: Jbond)
[18:06:12] (CR) Jbond: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/25351" [puppet] - https://gerrit.wikimedia.org/r/629442 (https://phabricator.wikimedia.org/T242991) (owner: Jbond)
[18:06:46] Urbanecm: absolutely
[18:06:59] Evrifaessa: I'm syncing that, because the servers need to recompute sorting keys first
[18:07:05] ryankemper: okay, I'll ping you once I'm done
[18:09:37] Operations, ops-codfw, DC-Ops: codfw: Fixing Accounting/Netbox error - https://phabricator.wikimedia.org/T263658 (Papaul) Open→Resolved Done
[18:10:30] Operations, Machine Learning Platform, ORES: [Epic] Deploy ORES in kubernetes cluster - https://phabricator.wikimedia.org/T182331 (ACraze)
[18:10:36] Operations, Machine Learning Platform, ORES, Release Pipeline (Blubber): Build blubber file for ORES - https://phabricator.wikimedia.org/T210268 (ACraze) Open→Declined Declining this as we've discovered ORES will not fit on k8s in its current design Here is a draft PR that we started on...
[18:11:29] Urbanecm: Did you sync it?
[18:11:37] Evrifaessa: yes, but logmsgbot is away
[18:11:49] !log Logmsgbot seems to be down
[18:11:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:12:22] the categories are still sorted with the old collation model, not uca-tr
[18:12:24] !log urbanecm@deploy1001: scap sync-file wmf-config/InitialiseSettings.php 'b1554f36be68106c9364f4aa2fd70d759ad74356: Set $wgCategoryCollation = uca-tr on trwikiquote (T263628)'
[18:12:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:12:30] T263628: Set $wgCategoryCollation = uca-tr on trwikiquote - https://phabricator.wikimedia.org/T263628
[18:12:32] Evrifaessa: yes, because I didn't run the recomputing script yet
[18:12:47] SREs: Can someone restart logmsgbot please?
[18:12:51] https://wikitech.wikimedia.org/wiki/Logmsgbot
[18:13:32] !log Start of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwikiquote --previous-collation=uppercase # T263628
[18:13:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:47] !log End of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwikiquote --previous-collation=uppercase # T263628
[18:13:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:52] Evrifaessa: what about now?
[18:14:11] seems okay
[18:14:18] cool
[18:14:23] ty :3
[18:15:23] ryankemper: I'm done, but please restart logmsgbot before deploying :-)
[18:15:36] https://wikitech.wikimedia.org/wiki/Logmsgbot should show how to do it
[18:15:49] Operations, Machine Learning Platform, ORES: Build helm charts for ORES - https://phabricator.wikimedia.org/T210269 (ACraze) Open→Declined
[18:15:52] Operations, Machine Learning Platform, ORES: [Epic] Deploy ORES in kubernetes cluster - https://phabricator.wikimedia.org/T182331 (ACraze)
[18:16:20] Cool, working on restarting it
[18:16:53] thanks
[18:18:11] There's no `tcpircbot-logmsgbot` on `icinga1001` whatsoever. Hmm...
[18:18:24] https://www.irccloud.com/pastebin/9waMywYr/
[18:18:39] hmm...
[18:18:45] ...the docs are probably terribly outdated then
[18:19:19] (PS22) Elukey: profile::hadoop::common: add datanode mountpoints override [puppet] - https://gerrit.wikimedia.org/r/629165
[18:19:23] ryankemper: what about tcpircbot? See https://gerrit.wikimedia.org/g/operations/puppet/+/b9027c2d851357c5421affdff883c39678030708/modules/profile/manifests/tcpircbot.pp#35
[18:19:56] Urbanecm: no, and `systemctl list-units | grep -i irc` turns up nothing
[18:20:05] interesting
[18:21:05] Could this have been moved to k8s or something?
[18:21:10] have no idea
[18:21:13] Seems unlikely but I'm grasping for any possible straws haha
[18:21:28] Okay, I'll post in #sre and see if anyone has context, and meanwhile I'll proceed w/ my deploy
[18:22:17] thcipriani patch for T263675 is pending
[18:22:17] T263675: Undefined class constant 'LIMIT' - https://phabricator.wikimedia.org/T263675
[18:22:38] (CR) Ryan Kemper: [V: +2 C: +2] "backport deploy-ing this now" [mediawiki-config] - https://gerrit.wikimedia.org/r/628978 (https://phabricator.wikimedia.org/T263073) (owner: Ryan Kemper)
[18:22:44] ryankemper: https://gerrit.wikimedia.org/g/operations/puppet/+/b9027c2d851357c5421affdff883c39678030708/hieradata/common.yaml#1079 seems to say it should be still on icinga?
[18:23:19] hmm
[18:23:36] ah icinga.wikimedia.org points to `alert1001`
[18:23:43] There we go
[18:23:44] aha!
[18:23:56] https://www.irccloud.com/pastebin/jFbpfLxB/
[18:24:07] good
[18:24:29] it's running, but not present here, because it failed to authenticate against nickserv in time
[18:24:33] DannyS712: thanks :)
[18:24:43] usually restarting helps with my IRC bots :-)
[18:24:56] (it's in -overflow instead if interested)
[18:25:18] thcipriani should I wait to cherry pick it, or do so now? (Or can you once it merges on master, assuming it gets +2? I need to leave in ~15 minutes)
[18:25:33] Urbanecm: It's already restarting itself so don't think that will help
[18:25:38] aha
[18:26:00] we can allow unregistered users to go here temporarily in theory 🙂
[18:26:08] https://www.irccloud.com/pastebin/UphuhzDs/
[18:26:17] what?
[18:27:48] DannyS712: I'm in a meeting at the moment, we can handle cherry picking assuming it merges to master
[18:29:02] Urbanecm: so is logmsgbot just responsible for echoing the log messages? i.e. everything is appearing in the SAL fine?
[18:29:07] (CR) Elukey: "Ok looks good: https://puppet-compiler.wmflabs.org/compiler1003/25354/" [puppet] - https://gerrit.wikimedia.org/r/629165 (owner: Elukey)
[18:29:09] I think that's the case but want to sanity check for I proceed
[18:29:18] before I*
[18:29:20] ryankemper: exactly, if you do a manual !log entry, we should be fine SAL-side
[18:29:31] Great, proceeding
[18:31:00] !log HEAD of `/srv/mediawiki-staging` is now at 7a96d63d862eacf5244eec79b63d29d78fbaa6f7 as expected
[18:31:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:34:44] (PS7) Dave Pifke: arclamp: serve SVGs, compressed logs from Swift [puppet] - https://gerrit.wikimedia.org/r/623068 (https://phabricator.wikimedia.org/T244776)
[18:34:53] PROBLEM - tcpircbot_service_running on alert1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args tcpircbot.py https://wikitech.wikimedia.org/wiki/Logmsgbot
[18:36:07] RECOVERY - tcpircbot_service_running on alert1001 is OK: PROCS OK: 1 process with command name python, args tcpircbot.py https://wikitech.wikimedia.org/wiki/Logmsgbot
[18:36:33] (CR) Jbond: [C: +1] "LGTM but see comment (which you can treat as optional)" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/629427 (owner: Dzahn)
[18:37:19] Operations, RESTBase, RESTBase-API, Traffic, Services (next): RESTBase support for www.wikimedia.org missing - https://phabricator.wikimedia.org/T133178 (Krinkle) Yes, in so far that the primary RESTBase URL is still completely broken if accessed through the canonical version of that domainna...
[18:39:59] The above didn't break anything but also hasn't seemed to fix the problem we're trying to solve, digging in a bit
[18:41:37] yeah, I notice that logmsgbot seems to be connected regardless of if the tcpircbot-logmsgbot service is running on alert1001, and the whois for logmsgbot is (~logmsgbot@nat.openstack.eqiad1.wikimediacloud.org) ?
[18:42:14] not super familiar with how this was set up
[18:43:53] there is a little bit of info at https://wikitech.wikimedia.org/wiki/Logmsgbot
[18:44:19] herron: you say the issue is it's not working on the new host?
[18:45:12] also https://wikitech.wikimedia.org/wiki/Ircecho
[18:45:58] i see ircecho is running on alert1001
[18:46:16] for some reason tcpircbot can't connect as nick logmsgbot, but it connected fine as logmsgbot2 who is now voiced in this channel
[18:46:42] wonder if there's a nick conflict?
[18:47:40] did the old one get killed? then it may take some time until it is timed-out ..unless someone would manually run the "nickserv ghost" command
[18:47:57] !log Above deploy appears successful, test requests seem to be taking 40ms instead of the previous 140ms
[18:48:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:48:15] logmsgbot is still connected, but the whois looks off
[18:48:50] Ah wait, I didn't do the scap sync-file part of the deploy, deploy's not quite done yet
[18:50:13] herron: ack, that runs in cloud.. and not sure either why it does.. trying to find something in openstack-browser
[18:53:49] herron: asked wmcs channel about it
[18:54:56] !log `scap sync-file wmf-config/ProductionServices.php 'Config: [[gerrit:628978|cloudelastic: envoy sits in front now (T263073)]]'` from `ryankemper@deploy1001:/srv/mediawiki-staging`
[18:55:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:55:02] T263073: Large, steady increase in unprocessed cloudelastic job.cirrusSearchElasticaWrite messages - https://phabricator.wikimedia.org/T263073
[18:56:57] Operations, MW-on-K8s, serviceops: Decide on logging in k8s for ShellBox - https://phabricator.wikimedia.org/T263545 (crusnov) p: Triage→Medium
[18:57:04] in puppet I can only see the alerting_host role using profile::icinga::logmsgbot -> profile::tcpircbot -> tcpircbot::instance. Don't see those in cloud roles.
[18:57:14] mutante: thanks, following along there as well
[18:57:29] !log (Above deploy complete)
[18:57:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:57:48] I guess on the SAL "above" is technically "below"...oh well
[18:58:40] (CR) CRusnov: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/629440 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[18:59:13] (PS1) Jeena Huneidi: CI profile: move ruamel requirement to publisher [puppet] - https://gerrit.wikimedia.org/r/629449 (https://phabricator.wikimedia.org/T255835)
[18:59:42] ryankemper: if you read https://twitter.com/wikimedia_sal it's above again :) (you can edit SAL in wiki if you ever feel the need to)
[19:00:04] dancy and twentyafterfour: Your horoscope predicts another unfortunate Mediawiki train - American Version deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200923T1900).
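The trwikiquote change at 18:12 switched `$wgCategoryCollation` from the "uppercase" collation (the `--previous-collation` value passed to `updateCollation.php`) to `uca-tr`. That explains why category pages only showed the new order after the stored sort keys were recomputed. The sketch below contrasts a code-point sort of uppercased titles with a sort keyed on the Turkish alphabet. It is a toy stand-in for ICU's Turkish-tailored UCA, not MediaWiki's Collation classes, and the sample titles are made up.

```python
from typing import List

# Turkish alphabet order (29 letters); Q, W and X are not part of it.
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
_RANK = {letter: index for index, letter in enumerate(TURKISH_ALPHABET)}


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish dotted/dotless I rules (str.lower() maps I to i)."""
    return text.replace("İ", "i").replace("I", "ı").lower()


def uppercase_key(title: str) -> str:
    """Toy 'uppercase' collation: uppercase, then compare by code point."""
    return title.upper()


def turkish_key(title: str) -> List[int]:
    """Rank letters by Turkish alphabet position; other characters sort after."""
    return [_RANK.get(ch, len(TURKISH_ALPHABET) + ord(ch)) for ch in turkish_lower(title)]


titles = ["Zeki", "Çiçek", "Ahmet", "Şule", "Deniz"]
print(sorted(titles, key=uppercase_key))  # ['Ahmet', 'Deniz', 'Zeki', 'Çiçek', 'Şule']
print(sorted(titles, key=turkish_key))    # ['Ahmet', 'Çiçek', 'Deniz', 'Şule', 'Zeki']
```

With code-point ordering, letters such as Ç and Ş (U+00C7, U+015E) land after Z. A Turkish-aware key places them right after C and S.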
[19:00:17] (CR) jerkins-bot: [V: -1] CI profile: move ruamel requirement to publisher [puppet] - https://gerrit.wikimedia.org/r/629449 (https://phabricator.wikimedia.org/T255835) (owner: Jeena Huneidi)
[19:02:07] Operations, Traffic, observability, Patch-For-Review: Aggregated metrics for ats-tls <-> clients ttfb percentiles - https://phabricator.wikimedia.org/T263536 (crusnov) p: Triage→Medium a: fgiunchedi
[19:03:19] (PS2) Jeena Huneidi: CI profile: move ruamel requirement to publisher [puppet] - https://gerrit.wikimedia.org/r/629449 (https://phabricator.wikimedia.org/T255835)
[19:03:33] Operations: Allow easier ICU transitions in MediaWiki - https://phabricator.wikimedia.org/T263437 (crusnov) p: Triage→Medium
[19:12:18] mutante: I ghosted the nick and restarted tcpircbot-logmsgbot on alert1001 quickly and it's back now
[19:12:35] but will possibly happen again on the next restart of tcpircbot-logmsgbot
[19:13:07] so yeah will be good to track down with cloud
[19:15:43] Operations, Jade, TechCom, Epic, and 3 others: Deploy pilot of Jade to a small set of wikis. - https://phabricator.wikimedia.org/T183381 (ACraze)
[19:17:21] herron: cool, yep, sounds good.
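The logmsgbot outage came down to a nick collision. A stale `logmsgbot` session, whose whois pointed at a Cloud VPS NAT address, still held the nick, so tcpircbot came up as `logmsgbot2` until herron ghosted the old session and restarted the service. The sketch below shows the raw IRC lines such a reclaim involves. It assumes Atheme-style NickServ syntax (`GHOST <nick> <password>`, `IDENTIFY <account> <password>`). The password is a placeholder, and this is not tcpircbot's code.

```python
from typing import List


def reclaim_nick_commands(nick: str, password: str) -> List[str]:
    """Raw IRC lines to drop a stale session holding `nick` and take the nick back.

    Assumes Atheme-style NickServ syntax (GHOST <nick> <password> and
    IDENTIFY <account> <password>); other services packages may differ.
    """
    return [
        f"PRIVMSG NickServ :GHOST {nick} {password}",    # disconnect the stale client
        f"NICK {nick}",                                   # switch away from the fallback nick
        f"PRIVMSG NickServ :IDENTIFY {nick} {password}",  # authenticate as the nick's owner
    ]


def to_wire(lines: List[str]) -> bytes:
    """IRC messages are terminated by CR LF on the wire."""
    return "".join(line + "\r\n" for line in lines).encode("utf-8")


for line in reclaim_nick_commands("logmsgbot", "placeholder-password"):
    print(line)
```

As herron notes above, the collision may recur on the next tcpircbot-logmsgbot restart until the cloud-side client is tracked down.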
[19:18:57] (03CR) 10Krinkle: webperf: new python-ua-parser navtiming dependency (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/629436 (https://phabricator.wikimedia.org/T260580) (owner: 10Dave Pifke) [19:21:49] 10Operations: problems installing OS on new install servers (bootstrapping install servers in POPs) - https://phabricator.wikimedia.org/T263684 (10Dzahn) [19:22:40] 10Operations: problems installing OS on new install servers (bootstrapping install servers in POPs) - https://phabricator.wikimedia.org/T263684 (10Dzahn) [19:22:42] 10Operations, 10Patch-For-Review: serve tftpboot environment from the install servers and create one in each edge POP - https://phabricator.wikimedia.org/T252526 (10Dzahn) [19:23:10] 10Operations, 10Patch-For-Review: Sort out plan for install* servers in edge sites - https://phabricator.wikimedia.org/T242602 (10Dzahn) [19:23:15] 10Operations, 10Patch-For-Review: serve tftpboot environment from the install servers and create one in each edge POP - https://phabricator.wikimedia.org/T252526 (10Dzahn) 05Open→03Stalled currently stalled on T263684 [19:23:55] (03PS1) 10Ahmon Dancy: Make SpecialMobileHistory::LIMIT protected [extensions/MobileFrontend] (wmf/1.36.0-wmf.10) - 10https://gerrit.wikimedia.org/r/629398 (https://phabricator.wikimedia.org/T263675) [19:23:56] 10Operations: problems installing OS on new install servers (bootstrapping install servers in POPs) - https://phabricator.wikimedia.org/T263684 (10Dzahn) p:05Triage→03High [19:24:32] (03PS1) 10Mholloway: Update mobileapps to 2020-09-23-191819-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/629458 [19:24:59] (03CR) 10Mholloway: [C: 04-1] "Hold for service deploy window" [deployment-charts] - 10https://gerrit.wikimedia.org/r/629458 (owner: 10Mholloway) [19:28:39] (03PS1) 10Mholloway: Update wikifeeds to 2020-09-23-192431-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/629459 [19:29:06] (03CR) 10Mholloway: [C: 04-1] "Hold for service 
deployment window" [deployment-charts] - 10https://gerrit.wikimedia.org/r/629459 (owner: 10Mholloway) [19:29:52] (03PS1) 10Ahmon Dancy: ApiQueryGlobalUsage: handle undefined indexes [extensions/GlobalUsage] (wmf/1.36.0-wmf.10) - 10https://gerrit.wikimedia.org/r/629399 (https://phabricator.wikimedia.org/T263601) [19:30:05] PROBLEM - ganeti-confd running on ganeti5002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 111 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti [19:31:07] PROBLEM - ganeti-noded running on ganeti5002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti [19:31:07] PROBLEM - ganeti-mond running on ganeti5002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-mond https://wikitech.wikimedia.org/wiki/Ganeti [19:33:08] i gotta ack those [19:36:34] (03PS1) 10Andrew Bogott: Added bogus secrets for barbican testing [labs/private] - 10https://gerrit.wikimedia.org/r/629460 (https://phabricator.wikimedia.org/T263680) [19:38:46] 10Operations, 10ops-eqsin, 10serviceops: ganeti5002 was down / powered off, machine check entries in SEL - https://phabricator.wikimedia.org/T261130 (10RobH) Ok, export of the SEL (have to clear it to run the hw diagnostic or it throws error for errors in SEL) /admin1-> racadm getsel Record: 1 Date/Ti... [19:38:49] (03CR) 10Ebernhardson: "It's only been a couple months since the last increase, and 59 is a *lot* of shards. I think we need to do more investigation and better u" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/628980 (https://phabricator.wikimedia.org/T260083) (owner: 10Ryan Kemper) [19:39:32] 10Operations, 10ops-eqsin, 10serviceops: ganeti5002 was down / powered off, machine check entries in SEL - https://phabricator.wikimedia.org/T261130 (10RobH) Removed system from ganeti cluster via directions on wikitech for extended downtime. will do hw testing on it next. 
[19:39:41] (03PS1) 10Mholloway: Update proton to 2020-09-23-192904-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/629461 [19:40:01] (03CR) 10Mholloway: [C: 04-1] "Hold for service deploy window" [deployment-charts] - 10https://gerrit.wikimedia.org/r/629461 (owner: 10Mholloway) [19:42:40] !log ganeti5002 firmware update before hw testing via T261130 [19:42:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:42:48] T261130: ganeti5002 was down / powered off, machine check entries in SEL - https://phabricator.wikimedia.org/T261130 [19:43:09] (03PS2) 10Effie Mouzeli: service_proxy: enable ipv6 on envoy config [puppet] - 10https://gerrit.wikimedia.org/r/629343 (https://phabricator.wikimedia.org/T255568) [19:43:13] (03CR) 10Effie Mouzeli: service_proxy: enable ipv6 on envoy config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/629343 (https://phabricator.wikimedia.org/T255568) (owner: 10Effie Mouzeli) [19:43:58] (03CR) 10Effie Mouzeli: "> But I would definitely recommend to disable puppet and test on one server per role (I guess) before this is rolled out fleet wide." [puppet] - 10https://gerrit.wikimedia.org/r/629343 (https://phabricator.wikimedia.org/T255568) (owner: 10Effie Mouzeli) [19:51:28] 10Operations, 10Traffic, 10Performance-Team (Radar): experiment with a "unified" ATS-BE pool - https://phabricator.wikimedia.org/T263291 (10crusnov) p:05Triage→03Medium [19:54:13] (03PS1) 10Dzahn: remove shinken module, profile, role [puppet] - 10https://gerrit.wikimedia.org/r/629464 [19:55:36] (03PS2) 10Dzahn: remove shinken module, profile, role [puppet] - 10https://gerrit.wikimedia.org/r/629464 [19:56:23] mw train seems still blocked for the moment, anyone mind if i get a head start on service deployments? 
[19:58:17] 10Operations, 10ops-eqiad, 10DC-Ops, 10fundraising-tech-ops: (Need By: 2020-09-30) rack/setup/install frdb1004.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T260379 (10Jgreen) [19:59:22] 10Operations, 10ops-eqiad, 10DC-Ops, 10fundraising-tech-ops: (Need By: 2020-09-30) rack/setup/install frdb1004.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T260379 (10Jgreen) [20:00:04] chrisalbon and accraze: May I have your attention please! Services – Graphoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200923T2000) [20:00:31] (03CR) 10Dave Pifke: webperf: new python-ua-parser navtiming dependency (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/629436 (https://phabricator.wikimedia.org/T260580) (owner: 10Dave Pifke) [20:00:33] (03CR) 10Dzahn: "Yuvipanda, let me know if you would like to keep this code." [puppet] - 10https://gerrit.wikimedia.org/r/629464 (owner: 10Dzahn) [20:02:12] !log mholloway-shell@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' . [20:02:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:03:30] 10Operations, 10ops-eqsin, 10serviceops: ganeti5002 was down / powered off, machine check entries in SEL - https://phabricator.wikimedia.org/T261130 (10RobH) ` Technical Support will need this information to diagnose the problem. Please record the information below. Service Tag : FLX09X2 Error Code : 2000-0... 
[20:05:10] 10Operations, 10ops-codfw, 10serviceops: wtp2005 hardware issue - https://phabricator.wikimedia.org/T257903 (10Dzahn) a:05Papaul→03Dzahn [20:05:53] (03CR) 10Ahmon Dancy: [C: 03+2] Make SpecialMobileHistory::LIMIT protected [extensions/MobileFrontend] (wmf/1.36.0-wmf.10) - 10https://gerrit.wikimedia.org/r/629398 (https://phabricator.wikimedia.org/T263675) (owner: 10Ahmon Dancy) [20:06:16] !log mholloway-shell@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' . [20:06:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:08:57] !log mholloway-shell@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' . [20:09:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:10:27] (03CR) 10Mholloway: [C: 03+2] Update proton to 2020-09-23-192904-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/629461 (owner: 10Mholloway) [20:13:01] (03Merged) 10jenkins-bot: Update proton to 2020-09-23-192904-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/629461 (owner: 10Mholloway) [20:13:05] 10Operations, 10ops-codfw, 10serviceops: wtp2005 hardware issue - https://phabricator.wikimedia.org/T257903 (10Dzahn) >>! In T257903#6307464, @akosiaris wrote: > Yeah I think we can for now. The replacing hosts have been racked and have the role(insetup) applied so we can take it from here. Thanks! Meanwhil... 
[20:14:03] 10Operations, 10ops-codfw, 10serviceops: wtp2005 hardware issue - https://phabricator.wikimedia.org/T257903 (10Dzahn) p:05Low→03Medium [20:14:24] (03PS1) 10Dzahn: decom wtp2005 [puppet] - 10https://gerrit.wikimedia.org/r/629468 (https://phabricator.wikimedia.org/T257903) [20:15:03] (03PS1) 10Bartosz Dziewoński: Simplify DiscussionTools config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/629469 [20:15:05] (03PS1) 10Bartosz Dziewoński: Move DiscussionTools out of beta on arwiki, cswiki, huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/629470 (https://phabricator.wikimedia.org/T249394) [20:15:16] !log mholloway-shell@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' . [20:15:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:16:03] (03PS2) 10Dzahn: decom wtp2005.codfw.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/629468 (https://phabricator.wikimedia.org/T257903) [20:18:11] !log mholloway-shell@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' . [20:18:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:22:58] !log mholloway-shell@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' . 
[20:23:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:23:09] (03CR) 10Mholloway: [C: 03+2] Update wikifeeds to 2020-09-23-192431-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/629459 (owner: 10Mholloway) [20:25:31] (03Merged) 10jenkins-bot: Update wikifeeds to 2020-09-23-192431-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/629459 (owner: 10Mholloway) [20:26:02] (03Merged) 10jenkins-bot: Make SpecialMobileHistory::LIMIT protected [extensions/MobileFrontend] (wmf/1.36.0-wmf.10) - 10https://gerrit.wikimedia.org/r/629398 (https://phabricator.wikimedia.org/T263675) (owner: 10Ahmon Dancy) [20:27:34] !log mholloway-shell@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' . [20:27:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:27:55] (03CR) 10Ahmon Dancy: [C: 03+2] ApiQueryGlobalUsage: handle undefined indexes [extensions/GlobalUsage] (wmf/1.36.0-wmf.10) - 10https://gerrit.wikimedia.org/r/629399 (https://phabricator.wikimedia.org/T263601) (owner: 10Ahmon Dancy) [20:28:19] (03CR) 10Andrew Bogott: [V: 03+2 C: 03+2] Added bogus secrets for barbican testing [labs/private] - 10https://gerrit.wikimedia.org/r/629460 (https://phabricator.wikimedia.org/T263680) (owner: 10Andrew Bogott) [20:28:31] (03CR) 10Dzahn: [C: 03+2] "{"wtp2005.codfw.wmnet": {"weight": 10, "pooled": "inactive"}, "tags": "dc=codfw,cluster=parsoid,service=parsoid"}" [puppet] - 10https://gerrit.wikimedia.org/r/629468 (https://phabricator.wikimedia.org/T257903) (owner: 10Dzahn) [20:28:52] !log mholloway-shell@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' . 
[20:28:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:28:58] (03PS1) 10Andrew Bogott: OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) [20:30:02] (03CR) 10jerkins-bot: [V: 04-1] OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) (owner: 10Andrew Bogott) [20:30:13] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission [20:30:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:30:46] !log dzahn@cumin1001 END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) [20:30:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:31:05] (03Merged) 10jenkins-bot: ApiQueryGlobalUsage: handle undefined indexes [extensions/GlobalUsage] (wmf/1.36.0-wmf.10) - 10https://gerrit.wikimedia.org/r/629399 (https://phabricator.wikimedia.org/T263601) (owner: 10Ahmon Dancy) [20:32:34] (03CR) 10Mholloway: [C: 03+2] Update mobileapps to 2020-09-23-191819-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/629458 (owner: 10Mholloway) [20:33:34] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: wtp2005 hardware issue - https://phabricator.wikimedia.org/T257903 (10Dzahn) >>! In T257903#6485866, @Papaul wrote: > I noticed that this server is not present in Icinga and has status "failed" in Netbox. When I tried to run the decom cookbook I... 
[20:33:53] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission [20:33:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:34:11] (03PS2) 10Andrew Bogott: OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) [20:34:49] (03Merged) 10jenkins-bot: Update mobileapps to 2020-09-23-191819-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/629458 (owner: 10Mholloway) [20:34:52] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: wtp2005 hardware issue - https://phabricator.wikimedia.org/T257903 (10Dzahn) Arr.. it still shows up in MediaWiki config: ` Looking for matches in puppetmaster1001.eqiad.wmnet:/var/lib/git/operations/puppet Looking for matches in puppetmaster10... [20:35:16] (03CR) 10jerkins-bot: [V: 04-1] OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) (owner: 10Andrew Bogott) [20:36:05] (03PS3) 10Andrew Bogott: OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) [20:36:33] !log mholloway-shell@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' . 
[20:36:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:36:43] (03CR) 10BryanDavis: [C: 03+1] remove shinken module, profile, role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/629464 (owner: 10Dzahn) [20:37:01] (03PS4) 10Andrew Bogott: OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) [20:38:01] (03CR) 10jerkins-bot: [V: 04-1] OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) (owner: 10Andrew Bogott) [20:38:09] !log mholloway-shell@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [20:38:09] !log mholloway-shell@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' . [20:38:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:38:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:40:11] (03PS5) 10Andrew Bogott: OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) [20:41:16] (03CR) 10jerkins-bot: [V: 04-1] OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) (owner: 10Andrew Bogott) [20:41:30] !log dancy@deploy1001 Started scap: (no justification provided) [20:41:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:42:10] (03PS1) 10Dzahn: remove wtp2005 from wgLinterSubmitterWhitelist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/629475 (https://phabricator.wikimedia.org/T257903) [20:42:11] !log mholloway-shell@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' 
for release 'nontls' . [20:42:11] !log mholloway-shell@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [20:42:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:42:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:42:50] !log dancy@deploy1001 Started scap: Deploying fixes for T263601 and T263675 to 1.36.0-wmf.10 [20:42:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:42:58] T263601: ApiQueryGlobalUsage.php Undefined index error when accessing $pageIds - https://phabricator.wikimedia.org/T263601 [20:42:59] T263675: Undefined class constant 'LIMIT' - https://phabricator.wikimedia.org/T263675 [20:43:47] 10Operations, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Djellel Difallah - https://phabricator.wikimedia.org/T263692 (10Isaac) [20:44:13] (03CR) 10Dzahn: [C: 03+1] "Any deployer is welcome to pick this up 😊" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/629475 (https://phabricator.wikimedia.org/T257903) (owner: 10Dzahn) [20:44:41] 10Operations, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Djellel Difallah - https://phabricator.wikimedia.org/T263692 (10Isaac) @leila if you could give approval, that'd be appreciated @DED if you could confirm you've signed the L3 agreement in the task description and make sur... [20:46:46] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: wtp2005 hardware issue - https://phabricator.wikimedia.org/T257903 (10Dzahn) ` Found match(es) in the Puppet or mediawiki-config repositories (see above), proceed anyway? Type "done" to proceed > done Scheduling downtime on Icinga server alert100... [20:47:31] rzl: mutante: o/ do you happen to know whether wikifeeds is currently in a state in which it's safe to deploy a change to main_app.image without risking a regression of T263043? 
[20:47:32] T263043: [Bug] The feed/featured endpoint is broken - https://phabricator.wikimedia.org/T263043 [20:48:02] 10Operations, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Djellel Difallah - https://phabricator.wikimedia.org/T263692 (10leila) Approved on my end. (team manager) [20:50:27] (03PS6) 10Andrew Bogott: OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) [20:51:30] (03CR) 10jerkins-bot: [V: 04-1] OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) (owner: 10Andrew Bogott) [20:52:07] (03PS7) 10Andrew Bogott: OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) [20:52:27] (03CR) 10jerkins-bot: [V: 04-1] OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) (owner: 10Andrew Bogott) [20:52:40] !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [20:52:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:52:47] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: wtp2005 hardware issue - https://phabricator.wikimedia.org/T257903 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `wtp2005.codfw.wmnet` - wtp2005.codfw.wmnet (**FAIL**) - **Failed downtime host on... [20:53:38] (03PS8) 10Andrew Bogott: OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) [20:54:04] volans: hi, can i bug you about a problem with sre.dns.netbox? 
[20:54:39] (03CR) 10jerkins-bot: [V: 04-1] OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) (owner: 10Andrew Bogott) [20:55:20] mutante what's up? [20:55:40] (03PS9) 10Andrew Bogott: OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) [20:56:22] volans: i used sre.hosts.decom to remove wtp2005 and got to the DNS generation step. and then sre.dns.netbox cookbook failed to run [20:56:29] Cumin ecit_code=2 [20:56:31] (03PS10) 10Andrew Bogott: OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) [20:56:40] and now i am not sure what is best to do next [20:57:20] let me have a look [20:57:22] it also tells me about the manual change that is needed additionally, but that is like what i saw last time [20:57:36] thank you [20:57:44] (03PS11) 10Andrew Bogott: OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) [20:58:08] the desired change is just "-wtp2005" where it shows up [20:58:43] and that host wtp2005 was maybe in a weird zombie state. 
not in puppetdb but still in conftool and DNS [20:58:54] doesn't matter [20:58:56] ok [20:59:15] meanwhile i am uploading the manual change doing the same [20:59:16] I'm running the cookbook in dry run mode to see what does it say while checking the logs [20:59:59] (03PS12) 10Andrew Bogott: OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) [21:00:15] 10Operations, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Djellel Difallah - https://phabricator.wikimedia.org/T263692 (10DED) Since I am using the same phab username (with updated email and MediaWiki linked accounts), my signature is still valid: > You signed this document on O... [21:01:01] (03CR) 10jerkins-bot: [V: 04-1] OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) (owner: 10Andrew Bogott) [21:02:40] (03PS13) 10Andrew Bogott: OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) [21:02:43] so the change was correctly committed to the exported netbox repository although cumin sais it failed [21:02:50] the netbox status was 'failed' [21:02:59] ok [21:04:04] (03PS14) 10Andrew Bogott: OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) [21:04:28] (03PS1) 10Dzahn: decom wtp2005.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/629476 (https://phabricator.wikimedia.org/T257903) [21:04:39] interestingly enough the exit code was rc=1 [21:04:48] while the script doesn't use 1 as exit code in any place [21:05:02] volans: should I merge that ^ ? 
[21:05:20] sure, codfw has not yet been migrated, that's the one that will take effect [21:05:28] the netbox one is noop still [21:05:30] alright [21:05:38] (03CR) 10Dzahn: [C: 03+2] decom wtp2005.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/629476 (https://phabricator.wikimedia.org/T257903) (owner: 10Dzahn) [21:05:52] papaul confirmed the IP is gone in netbox. seems alright [21:06:00] volans: needs to move to dc-ops team [21:06:14] (03PS15) 10Andrew Bogott: OpenStack: add initial manifests for OpenStack Barbican, a secrets API [puppet] - 10https://gerrit.wikimedia.org/r/629472 (https://phabricator.wikimedia.org/T263680) [21:06:30] lol [21:06:56] !log volans@cumin1001 START - Cookbook sre.dns.netbox [21:06:59] ran authdns-update. and thanks volans [21:07:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:07:11] mutante: I'm running the cookbook again with --force 4f55095 [21:07:17] ah, *nod* [21:07:23] to force push that change even if there are no diffs anymore [21:07:28] (see -h for more details) [21:07:37] good to know,ok [21:08:13] from the logs is not clear what exactly failed, cumin says the script returned 1 but it doesn't return 1 ever, so it's weird [21:08:21] but 1 is the default for so many things that who knows [21:09:14] this isn't to do with the new icinga stuff? [21:09:25] 10Operations, 10ops-codfw, 10serviceops: decom wtp2005 (was: wtp2005 hardware issue) - https://phabricator.wikimedia.org/T257903 (10Dzahn) a:05Dzahn→03Papaul [21:09:55] I assume you have that Traceback I am seeing already [21:11:09] the spicerack.remote.RemoteExecutionError one? 
[21:11:19] yes [21:11:27] yeah sure [21:12:01] chaomodus: I don't see how [21:12:05] this is what failed [21:12:07] node=netbox1001.wikimedia.org, rc=1, command='cd /tmp && runuser -u netbox -- python3 /srv/deployment/netbox-extras/dns/generate_dns_snippets.py push "/tmp/dns-c25pcHBldHM-8yzfk42e" "4f550956bb2faebc7c0b6665a86949580d5fa15f"' [21:12:22] !log volans@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [21:12:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:12:33] ok change pushed to the dns osts [21:14:17] I'll look more into it tomorrow if anything else could have made it exit with 1 [21:14:32] 10Operations, 10ops-codfw, 10serviceops: decom wtp2005 (was: wtp2005 hardware issue) - https://phabricator.wikimedia.org/T257903 (10Dzahn) @Papaul This should now be ready for decom. After some intial issue with the DNS removal it is now gone. The only thing left is a MW config change but that can go anytime... [21:14:54] yep, thanks volans, and papaul you can go ahead [21:15:04] * volans back off [21:15:12] the only thing left is the removal from MW config and that can be anytime [21:24:14] !log dancy@deploy1001 Finished scap: (no justification provided) (duration: 42m 52s) [21:24:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:26:37] (03PS1) 10Ahmon Dancy: group1 wikis to 1.36.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/629481 [21:26:39] (03CR) 10Ahmon Dancy: [C: 03+2] group1 wikis to 1.36.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/629481 (owner: 10Ahmon Dancy) [21:27:11] jouncebot: next [21:27:11] In 1 hour(s) and 32 minute(s): Evening backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200923T2300) [21:27:36] (03Merged) 10jenkins-bot: group1 wikis to 1.36.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/629481 (owner: 10Ahmon Dancy) [21:29:12] !log dancy@deploy1001 rebuilt and synchronized 
wikiversions files: group1 wikis to 1.36.0-wmf.10 [21:29:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:30:02] (03CR) 10BryanDavis: [C: 03+1] toolforge: use locales::all for the grid [puppet] - 10https://gerrit.wikimedia.org/r/628879 (https://phabricator.wikimedia.org/T263339) (owner: 10Bstorm) [21:30:16] !log dancy@deploy1001 Synchronized php: group1 wikis to 1.36.0-wmf.10 (duration: 01m 04s) [21:30:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:30:34] 10Operations, 10ops-codfw, 10serviceops: decom wtp2005 (was: wtp2005 hardware issue) - https://phabricator.wikimedia.org/T257903 (10Papaul) ` edit interfaces interface-range disabled] member ge-3/0/2 { ... } + member ge-4/0/21; [edit interfaces] - ge-4/0/21 { - description wtp2005; - en... [21:30:47] (03CR) 10Bstorm: [C: 03+2] toolforge: use locales::all for the grid [puppet] - 10https://gerrit.wikimedia.org/r/628879 (https://phabricator.wikimedia.org/T263339) (owner: 10Bstorm) [21:33:50] !log pt1979@cumin2001 START - Cookbook sre.dns.netbox [21:33:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:35:00] 10Operations, 10ops-codfw, 10serviceops: decom wtp2005 (was: wtp2005 hardware issue) - https://phabricator.wikimedia.org/T257903 (10Dzahn) >>! In T257903#6489392, @Dzahn wrote: > Arr.. it still shows up in MediaWiki config: removal added to today's evening deploy window (formerly SWAT) https://wikitech.wik... 
[21:37:20] 10Operations, 10Push-Notification-Service, 10serviceops, 10Product-Infrastructure-Team-Backlog (Kanban): Timeout when fetching OAuth token - https://phabricator.wikimedia.org/T263695 (10Mholloway) [21:37:41] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) [21:37:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:38:28] !log pt1979@cumin2001 START - Cookbook sre.dns.netbox [21:38:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:39:29] 10Operations, 10Push-Notification-Service, 10serviceops, 10Product-Infrastructure-Team-Backlog (Kanban): Timeout when fetching Google OAuth2 token for FCM - https://phabricator.wikimedia.org/T263695 (10Mholloway) [21:39:43] (03PS3) 10Dzahn: remove shinken module, profile, role [puppet] - 10https://gerrit.wikimedia.org/r/629464 (https://phabricator.wikimedia.org/T236547) [21:40:32] !log pt1979@cumin2001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [21:40:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:42:09] 10Operations, 10ops-codfw, 10serviceops: decom wtp2005 (was: wtp2005 hardware issue) - https://phabricator.wikimedia.org/T257903 (10Papaul) Removed mgmt DNS. what left is just to removed the disk from the server and unrack it. [21:44:07] (03CR) 10Dzahn: [C: 03+1] "The only tests that could be done here: wtp2005.codfw.wmnet is gone from DNS and would not be allowed anymore to submit lint issues." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/629475 (https://phabricator.wikimedia.org/T257903) (owner: 10Dzahn) [21:45:00] 10Operations, 10ops-eqsin, 10serviceops: ganeti5002 was down / powered off, machine check entries in SEL - https://phabricator.wikimedia.org/T261130 (10RobH) I've created SR1037478758 to dispatch a replacement mainboard. I'll open an inbound shipment ticket with SG3 once I get notification of the shipment,... 
[21:48:20] 10Operations, 10Patch-For-Review: serve tftpboot environment from the install servers and create one in each edge POP - https://phabricator.wikimedia.org/T252526 (10Dzahn) [21:48:24] 10Operations, 10vm-requests, 10Patch-For-Review: esams,ulsfo,eqsin: one VM request each for install_servers - https://phabricator.wikimedia.org/T254157 (10Dzahn) [21:48:53] (03PS4) 10Dzahn: site: add new POP install servers with insetup role [puppet] - 10https://gerrit.wikimedia.org/r/601342 (https://phabricator.wikimedia.org/T252526) [21:49:43] (03CR) 10Dzahn: [C: 03+2] site: add new POP install servers with insetup role [puppet] - 10https://gerrit.wikimedia.org/r/601342 (https://phabricator.wikimedia.org/T252526) (owner: 10Dzahn) [21:54:05] 10Operations, 10Patch-For-Review: serve tftpboot environment from the install servers and create one in each edge POP - https://phabricator.wikimedia.org/T252526 (10Dzahn) [21:54:16] 10Operations, 10vm-requests, 10Patch-For-Review: esams,ulsfo,eqsin: one VM request each for install_servers - https://phabricator.wikimedia.org/T254157 (10Dzahn) 05Open→03Resolved VMs have been created and added to site.pp with insetup role. There are some problems with installing OS but that shall be a... [21:55:14] 10Operations, 10Push-Notification-Service, 10serviceops, 10Product-Infrastructure-Team-Backlog (Kanban): Timeout when fetching Google OAuth2 token for FCM - https://phabricator.wikimedia.org/T263695 (10Mholloway) Outgoing requests to APNS via url-downloader are working fine, so I suspect the problem is in... 
[21:56:29] Operations, Analytics, serviceops-radar, Article-Recommendation, and 2 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (Dzahn)
[21:56:50] Operations, vm-requests: Site: 1 VM request for recommender-systems - https://phabricator.wikimedia.org/T215421 (Dzahn) Stalled→Declined Please reopen this ticket if you still want the VM and want to restart the discussion on the parent task.
[21:59:26] Operations, vm-requests: eqiad: New Ganeti instance for Hue (an-tool1009) - https://phabricator.wikimedia.org/T258771 (Dzahn) I see there is an-tool1009 on buster in site.pp. Is this already done?
[22:26:20] Operations, Product-Infrastructure-Data, Epic, Goal, Patch-For-Review: automatically collect network error reports from users' browsers (Network Error Logging API) - https://phabricator.wikimedia.org/T257527 (CDanis) So, comparing the same 24h window (19:00 UTC Tuesday -- 19:00 UTC Wednesday)...
[22:27:41] !log ganeti5003 - gnt-instance start install5001
[22:27:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:43:30] (PS1) Dzahn: install_server: move POP installserver VMs to ttyS0 file [puppet] - https://gerrit.wikimedia.org/r/629486 (https://phabricator.wikimedia.org/T263684)
[22:44:15] (PS2) Dzahn: install_server: move POP installserver VMs to ttyS0 file [puppet] - https://gerrit.wikimedia.org/r/629486 (https://phabricator.wikimedia.org/T263684)
[22:46:18] (CR) Papaul: [C: +2] install_server: move POP installserver VMs to ttyS0 file [puppet] - https://gerrit.wikimedia.org/r/629486 (https://phabricator.wikimedia.org/T263684) (owner: Dzahn)
[22:47:28] (CR) Dzahn: "also remove the "option pxelinux.pathprefix "buster-installer/"" used for testing on install5001" [puppet] - https://gerrit.wikimedia.org/r/629486 (https://phabricator.wikimedia.org/T263684) (owner: Dzahn)
[22:48:32] Operations, Push-Notification-Service, serviceops, Product-Infrastructure-Team-Backlog (Kanban): Timeout when fetching Google OAuth2 token for FCM - https://phabricator.wikimedia.org/T263695 (Mholloway) The http.Agent with the proxy setup has to be passed to the Credential as well as to AppOption...
[22:50:56] Operations, Product-Infrastructure-Data, Epic, Goal, Patch-For-Review: automatically collect network error reports from users' browsers (Network Error Logging API) - https://phabricator.wikimedia.org/T257527 (CDanis)
[22:51:06] !log ganeti5003 - rebooting install5001
[22:51:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:00:04] RoanKattouw, Niharika, and Urbanecm: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Evening backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200923T2300).
[23:00:04] Mutante: A patch you scheduled for Evening backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:31] oh hi, yes
[23:00:44] RoanKattouw: i have a very simple change that removes a decom'ed server
[23:00:58] that would be the only one in this slot
[23:01:23] or Urbanecm or Nikerabbit
[23:01:29] sorry, wrong nick
[23:01:37] i meant Niharika
[23:02:00] Operations, Push-Notification-Service, serviceops, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Timeout when fetching Google OAuth2 token for FCM - https://phabricator.wikimedia.org/T263695 (Mholloway) a:Mholloway
[23:04:33] !log ganeti4003 - rebooting install4001
[23:04:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:10:36] !log ganeti5003 - rebooting install5001 - OS install on 3001/4001/5001 T263684
[23:10:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:10:41] T263684: problems installing OS on new install servers (bootstrapping install servers in POPs) - https://phabricator.wikimedia.org/T263684
[23:22:10] (PS1) Mholloway: Update push-notifications to 2020-09-23-230744-publish [deployment-charts] - https://gerrit.wikimedia.org/r/629491 (https://phabricator.wikimedia.org/T263695)
[23:23:18] mutante: yes
[23:23:22] sorry, I was afk
[23:23:49] Urbanecm: it's not urgent by any means, but if you don't mind..
[23:23:56] i saw the empty slot and added it
[23:23:56] not at all :)
[23:24:03] :) thanks
[23:24:14] (CR) Urbanecm: [C: +2] remove wtp2005 from wgLinterSubmitterWhitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/629475 (https://phabricator.wikimedia.org/T257903) (owner: Dzahn)
[23:24:20] so yea, the only test here is: confirm it's gone from DNS, and i did
[23:24:37] yup, saw that in the comment - thanks
[23:25:01] (Merged) jenkins-bot: remove wtp2005 from wgLinterSubmitterWhitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/629475 (https://phabricator.wikimedia.org/T257903) (owner: Dzahn)
[23:27:19] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 22382a97ec252488a346fbf0c3d40bc974d0cdbe: remove wtp2005 from wgLinterSubmitterWhitelist (T257903) (duration: 01m 04s)
[23:27:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:27:24] T257903: decom wtp2005 (was: wtp2005 hardware issue) - https://phabricator.wikimedia.org/T257903
[23:27:26] mutante: should be live :)
[23:28:07] Urbanecm: thank you :) nothing else to do here. window is already over i think
[23:28:31] we have 32 minutes :)
[23:28:52] (unless my clocks are misconfigured :))
[23:28:55] can't confirm that non-existing server can't submit stuff, it's already down
[23:29:11] I did not mean the time but that there is nothing else in the calendar
[23:29:20] ah, got it
[23:29:58] sure, I guess it is fine, a down server isn't supposed to send anything :)
[23:30:36] this would have only mattered if that IP address is soon reused by something else
[23:30:59] and even then.. a random non-appserver would have been whitelisted to submit lint issues.. which it would never have done
[23:31:14] (CR) Mholloway: [C: +2] Update push-notifications to 2020-09-23-230744-publish [deployment-charts] - https://gerrit.wikimedia.org/r/629491 (https://phabricator.wikimedia.org/T263695) (owner: Mholloway)
[23:31:28] but still... decom is not fully done if a server is still in config :p
[23:31:36] mutante: exactly :)
[23:31:43] do we reuse ip addresses often?
[23:31:46] (just out of curiosity)
[23:32:10] yes, but the brand-new part is that we no longer manually control which one is used when
[23:32:30] since they are generated from netbox data (soon)
[23:32:57] i see
[23:33:16] reusing IPs is common. reusing host names is not common anymore but has happened and each time i disliked that
[23:33:31] (Merged) jenkins-bot: Update push-notifications to 2020-09-23-230744-publish [deployment-charts] - https://gerrit.wikimedia.org/r/629491 (https://phabricator.wikimedia.org/T263695) (owner: Mholloway)
[23:33:56] understandably - "are you talking about the old one or the new one", right?
[23:34:12] at one point icinga was "neon"
[23:34:33] now neon is a k8s:staging::master
[23:34:46] sometimes you get attached to the name, muscle memory
[23:35:05] shouldn't we update https://wikitech.wikimedia.org/wiki/Neon then?
[23:35:10] it's just in eqiad because our naming scheme was "chemical elements" and there are just not enough in the pool
[23:35:20] stars in codfw ..pool is large enough
[23:35:30] :D we need some more elements
[23:35:32] heh, yes
[23:36:05] and yeah, muscle memory is a thing - it took me a while to ssh to codfw servers, and it will take me a while to ssh to eqiad hosts once we switch back
[23:36:15] the problem is kind of solved though
[23:36:15] (is there a plan for switchback btw?)
[23:36:28] because we now use foo1001 pattern for basically everything
[23:36:43] which makes sense
[23:37:13] (but we might have more than 999 servers for one role in one dc far in the future?)
[23:37:25] !log mholloway-shell@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
[23:37:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:38:42] Urbanecm: yes, but the exact date is still TBD (https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Schedule_for_2020_switch)
[23:38:55] thanks
[23:40:10] !log mholloway-shell@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
[23:40:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:42:45] !log mholloway-shell@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
[23:42:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:43:00] (PS1) Urbanecm: Add new Racine namespace to frwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/629492 (https://phabricator.wikimedia.org/T263525)
[23:43:34] (CR) Urbanecm: [C: +2] Add new Racine namespace to frwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/629492 (https://phabricator.wikimedia.org/T263525) (owner: Urbanecm)
[23:44:18] (Merged) jenkins-bot: Add new Racine namespace to frwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/629492 (https://phabricator.wikimedia.org/T263525) (owner: Urbanecm)
[23:44:35] !log urbanecm@deploy1001 sync-file aborted: (no justification provided) (duration: 00m 00s)
[23:44:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:46:10] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: cbd77e3dff0d56b851b3d15b4d267d1faacfae26: Add new Racine namespace to frwiktionary (T263525) (duration: 01m 05s)
[23:46:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:46:15] T263525: New Namespace for the French Wiktionary : "Racine" - https://phabricator.wikimedia.org/T263525
[23:46:38] Operations: problems installing OS on new install servers (bootstrapping install servers in POPs) - https://phabricator.wikimedia.org/T263684 (Dzahn) Open→Resolved OS has been installed on all 3 servers. The reason was the wrong tty file in DHCP (thanks @Papaul for pointing it out) and some remnants from a...
[23:46:41] Operations, Patch-For-Review: serve tftpboot environment from the install servers and create one in each edge POP - https://phabricator.wikimedia.org/T252526 (Dzahn)
[23:47:02] Operations, Patch-For-Review: Sort out plan for install* servers in edge sites - https://phabricator.wikimedia.org/T242602 (Dzahn)
[23:47:10] Operations, Patch-For-Review: serve tftpboot environment from the install servers and create one in each edge POP - https://phabricator.wikimedia.org/T252526 (Dzahn) Stalled→Open p:Medium→High
[23:52:52] !log alert1001 - systemctl restart ircecho because icinga-wm left the chat
[23:52:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:58:12] Operations, Push-Notification-Service, serviceops, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Timeout when fetching Google OAuth2 token for FCM - https://phabricator.wikimedia.org/T263695 (Mholloway) Open→Resolved Messages are now getting through to FCM.