[01:02:57] (03PS2) 10Zoranzoki21: Disable search engine indexing in some namespaces of Icelandic Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532771 (https://phabricator.wikimedia.org/T231179) [01:12:25] (03PS1) 10Zoranzoki21: Enable Page Previews as default on zhwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532836 (https://phabricator.wikimedia.org/T230624) [01:17:42] (03CR) 10Urbanecm: [C: 03+1] "lgtm" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532836 (https://phabricator.wikimedia.org/T230624) (owner: 10Zoranzoki21) [02:02:52] (03PS1) 10Zoranzoki21: Create HIDPI logo for Wikispecies [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532839 (https://phabricator.wikimedia.org/T230113) [02:04:29] (03CR) 10Zoranzoki21: "Urbanecm: I used optipng with -o5 is it correct? I forgot again..." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532839 (https://phabricator.wikimedia.org/T230113) (owner: 10Zoranzoki21) [02:16:57] (03CR) 10Urbanecm: "> Patch Set 1:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532839 (https://phabricator.wikimedia.org/T230113) (owner: 10Zoranzoki21) [02:18:47] (03CR) 10Zoranzoki21: "kizule@kizule:~/development/mediawiki-config$ optipng -o7 static/images/project-logos/specieswiki-2x.png" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532839 (https://phabricator.wikimedia.org/T230113) (owner: 10Zoranzoki21) [02:19:10] (03CR) 10Zoranzoki21: ":D" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532839 (https://phabricator.wikimedia.org/T230113) (owner: 10Zoranzoki21) [02:42:19] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link 
https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [02:43:53] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [02:49:39] PROBLEM - ElasticSearch shard size check - 9243 on search.svc.codfw.wmnet is CRITICAL: CRITICAL - commonswiki_content_1556151793(82gb) https://wikitech.wikimedia.org/wiki/Search%23If_it_has_been_indexed [03:15:09] (03PS12) 10CRusnov: backends: add Netbox backend [software/cumin] - 10https://gerrit.wikimedia.org/r/514840 (https://phabricator.wikimedia.org/T205900) [03:17:28] (03CR) 10CRusnov: "Tah!" 
(0320 comments) [software/cumin] - 10https://gerrit.wikimedia.org/r/514840 (https://phabricator.wikimedia.org/T205900) (owner: 10CRusnov) [03:21:35] (03CR) 10jerkins-bot: [V: 04-1] backends: add Netbox backend [software/cumin] - 10https://gerrit.wikimedia.org/r/514840 (https://phabricator.wikimedia.org/T205900) (owner: 10CRusnov) [03:25:07] 10Operations, 10Icinga, 10observability: Have a link to the alert in the icinga alert email - https://phabricator.wikimedia.org/T231274 (10Mathew.onipe) p:05Triage→03Normal [03:26:46] 10Operations, 10Packaging, 10serviceops, 10CPT Initiatives (Session Management Service (CDP2)), 10Core Platform Team Workboards (Green): Need help to create and deploy Debian-packaged Python 3 app - https://phabricator.wikimedia.org/T229980 (10Mathew.onipe) p:05Triage→03Normal [03:27:17] 10Operations, 10Packaging, 10serviceops, 10CPT Initiatives (Session Management Service (CDP2)), 10Core Platform Team Workboards (Green): Need help to create and deploy Debian-packaged Python 3 app - https://phabricator.wikimedia.org/T229980 (10Mathew.onipe) I changed the priority of this to normal. Feel... [03:47:27] 10Operations, 10Discovery-Search (Current work): Run jstack / jmap / etc... with PrivateTmp=true - https://phabricator.wikimedia.org/T230774 (10Mathew.onipe) @Gehel I think you meant: https://wikitech.wikimedia.org/wiki/Search#Using_jstack_or_jmap_or_other_similar_tools_to_view_logs [03:56:55] (03PS13) 10CRusnov: backends: add Netbox backend [software/cumin] - 10https://gerrit.wikimedia.org/r/514840 (https://phabricator.wikimedia.org/T205900) [03:57:36] (03CR) 10CRusnov: "- Added documentation builds and example configuration with explanation." 
[software/cumin] - 10https://gerrit.wikimedia.org/r/514840 (https://phabricator.wikimedia.org/T205900) (owner: 10CRusnov) [04:16:31] (03PS4) 10CRusnov: Add Netbox instance addresses [dns] - 10https://gerrit.wikimedia.org/r/532502 (https://phabricator.wikimedia.org/T223291) [04:18:45] (03PS5) 10CRusnov: Add Netbox instance addresses [dns] - 10https://gerrit.wikimedia.org/r/532502 (https://phabricator.wikimedia.org/T223291) [04:19:55] (03PS6) 10CRusnov: Add Netbox instance addresses [dns] - 10https://gerrit.wikimedia.org/r/532502 (https://phabricator.wikimedia.org/T223291) [04:20:39] (03CR) 10CRusnov: "thanks for nitpicks ;)" (038 comments) [dns] - 10https://gerrit.wikimedia.org/r/532502 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov) [04:40:14] (03CR) 10Vgutierrez: [C: 04-1] lvs: allow access to wdqs lvs on port 8888 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/529053 (https://phabricator.wikimedia.org/T176875) (owner: 10Mathew.onipe) [04:48:00] (03PS11) 10Mathew.onipe: lvs: allow access to wdqs lvs on port 8888 [puppet] - 10https://gerrit.wikimedia.org/r/529053 (https://phabricator.wikimedia.org/T176875) [04:48:02] (03PS6) 10Mathew.onipe: elasticsearch: ship logs to local syslog server [puppet] - 10https://gerrit.wikimedia.org/r/531922 (https://phabricator.wikimedia.org/T225125) [04:48:29] (03CR) 10Mathew.onipe: lvs: allow access to wdqs lvs on port 8888 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/529053 (https://phabricator.wikimedia.org/T176875) (owner: 10Mathew.onipe) [04:55:47] (03PS3) 10Vgutierrez: Release 8.0.5-1wm4 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/532723 (https://phabricator.wikimedia.org/T231287) [04:55:53] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool pc1009" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532847 [04:57:23] (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool pc1009" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532847 (owner: 10Marostegui) 
[04:58:18] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool pc1009" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532847 (owner: 10Marostegui) [04:58:35] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool pc1009" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532847 (owner: 10Marostegui) [04:58:55] (03PS1) 10Marostegui: report_users: Add dbproxy1019 IP [software] - 10https://gerrit.wikimedia.org/r/532848 [05:01:24] (03CR) 10Vgutierrez: [C: 04-1] lvs: allow access to wdqs lvs on port 8888 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/529053 (https://phabricator.wikimedia.org/T176875) (owner: 10Mathew.onipe) [05:01:41] (03PS2) 10Marostegui: report_users: Add dbproxy1016 IP [software] - 10https://gerrit.wikimedia.org/r/532848 [05:02:51] (03CR) 10Marostegui: [C: 03+2] report_users: Add dbproxy1016 IP [software] - 10https://gerrit.wikimedia.org/r/532848 (owner: 10Marostegui) [05:03:12] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1063 - https://phabricator.wikimedia.org/T231199 (10Marostegui) Thanks! ` root@db1063:~# megacli -LDPDInfo -aAll Adapter #0 Number of Virtual Disks: 1 Virtual Drive: 0 (Target Id: 0) Name : RAID Level : Primary-1, Secondary-0, RAI... [05:03:21] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1063 - https://phabricator.wikimedia.org/T231199 (10Marostegui) 05Open→03Resolved [05:03:52] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool pc1009 after optimize T210725 (duration: 00m 54s) [05:03:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:03:59] T210725: Replace parsercache keys to something more meaningful on db-XXXX.php - https://phabricator.wikimedia.org/T210725 [05:04:19] 10Operations, 10DBA: Disable/remove unused features on Tendril - https://phabricator.wikimedia.org/T231185 (10Marostegui) [05:13:04] (03PS40) 10CRusnov: profile::netbox: Reorganize for splitting front and back-end. 
[puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) [05:14:05] (03CR) 10CRusnov: "Thank you for the feedback. I have implemented most/all of the suggested changes." (0317 comments) [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov) [05:15:00] (03CR) 10jerkins-bot: [V: 04-1] profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov) [05:15:14] 10Operations, 10DBA, 10OTRS: Switchover m1 primary master: db1063 to db1135 - https://phabricator.wikimedia.org/T231403 (10Marostegui) [05:15:31] 10Operations, 10DBA, 10OTRS: Switchover m1 primary master: db1063 to db1135 - https://phabricator.wikimedia.org/T231403 (10Marostegui) p:05Triage→03Normal [05:16:00] 10Operations, 10DBA, 10OTRS: Switchover m1 primary master: db1063 to db1135 - https://phabricator.wikimedia.org/T231403 (10Marostegui) @akosiaris @Dzahn @jcrespo @ayounsi let me know if that proposed day and time would work for you. Thanks! [05:19:06] !log Start dropping neodymium grants across all the databases, parsercache, es, dbstore... T229796 [05:19:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:19:12] T229796: Remove sarin and neodymium GRANTs from all the databases - https://phabricator.wikimedia.org/T229796 [05:20:30] (03PS41) 10CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) [05:22:22] (03CR) 10jerkins-bot: [V: 04-1] profile::netbox: Reorganize for splitting front and back-end. 
[puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov) [05:24:48] (03PS4) 10Vgutierrez: Release 8.0.5-1wm4 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/532723 (https://phabricator.wikimedia.org/T231287) [05:34:14] 10Operations, 10DBA, 10OTRS: Switchover m1 primary master: db1063 to db1135 - https://phabricator.wikimedia.org/T231403 (10ayounsi) 👍 [05:34:53] 10Operations, 10Traffic, 10Patch-For-Review: Investigate HTTP/2 limits on trafficserver - https://phabricator.wikimedia.org/T231287 (10Vgutierrez) After some discussion with upstream developers, https://github.com/apache/trafficserver/pull/5888 has been submitted and it's been included in https://gerrit.wiki... [05:35:47] 10Operations, 10Traffic, 10Patch-For-Review: Investigate HTTP/2 limits on trafficserver - https://phabricator.wikimedia.org/T231287 (10Vgutierrez) p:05Triage→03Normal [05:54:23] (03PS1) 10Vgutierrez: ATS: Fix indentation on Trafficserver::Network_settings [puppet] - 10https://gerrit.wikimedia.org/r/532851 [05:54:41] !log Remove old rows from pc1010 - T210725 [05:54:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:55:10] T210725: Replace parsercache keys to something more meaningful on db-XXXX.php - https://phabricator.wikimedia.org/T210725 [05:58:12] (03PS2) 10Vgutierrez: ATS: Fix indentation on Trafficserver::Network_settings [puppet] - 10https://gerrit.wikimedia.org/r/532851 [06:02:18] (03PS1) 10Vgutierrez: ATS: Allow configuring HTTP/2 limits [puppet] - 10https://gerrit.wikimedia.org/r/532852 (https://phabricator.wikimedia.org/T231287) [06:02:21] (03PS1) 10Vgutierrez: ATS: Disable HTTP/2 max priority frames limit [puppet] - 10https://gerrit.wikimedia.org/r/532853 (https://phabricator.wikimedia.org/T231287) [06:06:02] (03CR) 10Vgutierrez: [C: 03+2] "PCC shows the expected NOOP: https://puppet-compiler.wmflabs.org/compiler1001/18069/" [puppet] - 10https://gerrit.wikimedia.org/r/532851 
(owner: 10Vgutierrez) [06:10:49] (03CR) 10Vgutierrez: [C: 03+2] ATS: Allow configuring HTTP/2 limits [puppet] - 10https://gerrit.wikimedia.org/r/532852 (https://phabricator.wikimedia.org/T231287) (owner: 10Vgutierrez) [06:12:12] (03CR) 10Vgutierrez: "pcc seems happy: https://puppet-compiler.wmflabs.org/compiler1001/18070/" [puppet] - 10https://gerrit.wikimedia.org/r/532852 (https://phabricator.wikimedia.org/T231287) (owner: 10Vgutierrez) [06:12:38] (03CR) 10Vgutierrez: "pcc looks happy: https://puppet-compiler.wmflabs.org/compiler1002/18071/" [puppet] - 10https://gerrit.wikimedia.org/r/532853 (https://phabricator.wikimedia.org/T231287) (owner: 10Vgutierrez) [06:21:31] 10Operations, 10DBA: Remove sarin and neodymium GRANTs from all the databases - https://phabricator.wikimedia.org/T229796 (10Marostegui) [06:21:53] 10Operations, 10DBA: Remove sarin and neodymium GRANTs from all the databases - https://phabricator.wikimedia.org/T229796 (10Marostegui) 05Open→03Resolved `neodymium` grants have been removed everywhere [06:21:55] 10Operations, 10ops-codfw, 10decommission, 10Patch-For-Review: Decommission sarin - https://phabricator.wikimedia.org/T220504 (10Marostegui) [06:21:57] 10Operations, 10ops-eqiad, 10decommission, 10Patch-For-Review: Decommission neodymium - https://phabricator.wikimedia.org/T220503 (10Marostegui) [06:25:05] 10Operations, 10DBA: Decommission db2053.codfw.wmnet - https://phabricator.wikimedia.org/T231407 (10Marostegui) [06:25:17] 10Operations, 10DBA: Decommission db2053.codfw.wmnet - https://phabricator.wikimedia.org/T231407 (10Marostegui) p:05Triage→03Normal [06:26:18] (03PS1) 10Marostegui: db2053: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/532856 (https://phabricator.wikimedia.org/T231407) [06:27:23] (03CR) 10Marostegui: [C: 03+2] db2053: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/532856 (https://phabricator.wikimedia.org/T231407) (owner: 10Marostegui) [06:28:41] 10Operations, 10DBA, 
10Patch-For-Review: Decommission db2053.codfw.wmnet - https://phabricator.wikimedia.org/T231407 (10Marostegui) [06:29:00] 10Operations, 10DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (10Marostegui) [06:31:48] (03PS1) 10Marostegui: db-eqiad,db-codfw.php: Remove db2053 from config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532857 (https://phabricator.wikimedia.org/T231407) [06:32:21] (03CR) 10Marostegui: [C: 03+2] db-eqiad,db-codfw.php: Remove db2053 from config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532857 (https://phabricator.wikimedia.org/T231407) (owner: 10Marostegui) [06:32:36] 10Operations, 10DBA, 10Patch-For-Review: Decommission db2053.codfw.wmnet - https://phabricator.wikimedia.org/T231407 (10Marostegui) [06:33:17] (03Merged) 10jenkins-bot: db-eqiad,db-codfw.php: Remove db2053 from config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532857 (https://phabricator.wikimedia.org/T231407) (owner: 10Marostegui) [06:33:33] (03CR) 10jenkins-bot: db-eqiad,db-codfw.php: Remove db2053 from config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532857 (https://phabricator.wikimedia.org/T231407) (owner: 10Marostegui) [06:34:41] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Remove db2053 from config T231407 (duration: 00m 55s) [06:34:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:34:58] T231407: Decommission db2053.codfw.wmnet - https://phabricator.wikimedia.org/T231407 [06:35:39] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Remove db2053 from config T231407 (duration: 00m 53s) [06:36:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:41:43] !log Upgrade mysql on s7 codfw hosts: db2054, db2061, db2068, db2077 - T230106 [06:41:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:41:48] T230106: Switchover codfw primary database masters to new hosts - 
https://phabricator.wikimedia.org/T230106 [07:20:29] (03PS1) 10Dzahn: install_server: switch miscweb servers to buster installer [puppet] - 10https://gerrit.wikimedia.org/r/532862 (https://phabricator.wikimedia.org/T224247) [07:21:14] (03PS10) 10DannyS712: General cleanup of initialize settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532280 (https://phabricator.wikimedia.org/T231178) [07:23:31] (03CR) 10Dzahn: [C: 03+2] install_server: switch miscweb servers to buster installer [puppet] - 10https://gerrit.wikimedia.org/r/532862 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn) [07:23:58] (03PS2) 10Dzahn: install_server: switch miscweb servers to buster installer [puppet] - 10https://gerrit.wikimedia.org/r/532862 (https://phabricator.wikimedia.org/T224247) [07:26:04] (03CR) 10Filippo Giunchedi: [C: 03+1] swiftrepl: bring close to as-is in production [software] - 10https://gerrit.wikimedia.org/r/532793 (https://phabricator.wikimedia.org/T231110) (owner: 10CDanis) [07:28:22] (03CR) 10Ema: [C: 03+1] Release 8.0.5-1wm4 (031 comment) [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/532723 (https://phabricator.wikimedia.org/T231287) (owner: 10Vgutierrez) [07:33:38] (03CR) 10Vgutierrez: [C: 03+2] Release 8.0.5-1wm4 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/532723 (https://phabricator.wikimedia.org/T231287) (owner: 10Vgutierrez) [07:39:09] (03CR) 10Ema: "hieradata/role/common/cache/text_ats.yaml needs to be updated too" [puppet] - 10https://gerrit.wikimedia.org/r/532695 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn) [07:39:41] (03PS1) 10Dzahn: Revert "install_server: switch miscweb servers to buster installer" [puppet] - 10https://gerrit.wikimedia.org/r/532866 [07:41:50] (03CR) 10Dzahn: "talked with Ema and he would like to use envoyproxy on this host for TLS termination and currently we only have stretch packages so the je" [puppet] - 10https://gerrit.wikimedia.org/r/532866 (owner: 10Dzahn) [07:42:36] (03CR) 
10Dzahn: [C: 03+2] Revert "install_server: switch miscweb servers to buster installer" [puppet] - 10https://gerrit.wikimedia.org/r/532866 (owner: 10Dzahn) [07:44:01] (03PS1) 10Marostegui: dbproxy1018: Productionize dbproxy1018, will replace dbproxy1010 [puppet] - 10https://gerrit.wikimedia.org/r/532867 (https://phabricator.wikimedia.org/T202367) [07:45:23] (03PS2) 10Marostegui: dbproxy1018: Productionize dbproxy1018, will replace dbproxy1010 [puppet] - 10https://gerrit.wikimedia.org/r/532867 (https://phabricator.wikimedia.org/T202367) [07:45:42] (03PS3) 10Marostegui: dbproxy1018: Productionize dbproxy1018, will replace dbproxy1010 [puppet] - 10https://gerrit.wikimedia.org/r/532867 (https://phabricator.wikimedia.org/T202367) [07:47:30] (03PS1) 10Vgutierrez: Update wikisource non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532870 (https://phabricator.wikimedia.org/T133548) [07:47:32] (03PS1) 10Vgutierrez: Update wiktionary non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532871 (https://phabricator.wikimedia.org/T133548) [07:47:34] (03PS1) 10Vgutierrez: Update wikivoyage non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532872 (https://phabricator.wikimedia.org/T133548) [07:47:36] (03PS1) 10Vgutierrez: Update wikiversity non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532873 (https://phabricator.wikimedia.org/T133548) [07:47:38] (03PS1) 10Vgutierrez: Update wikiquote non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532874 (https://phabricator.wikimedia.org/T133548) [07:47:40] (03PS1) 10Vgutierrez: Update wikinews non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532875 (https://phabricator.wikimedia.org/T133548) [07:47:42] (03PS1) 10Vgutierrez: Update wikibooks non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532876 (https://phabricator.wikimedia.org/T133548) [07:47:46] O:) [07:48:11] (03CR) 10Marostegui: [C: 03+2] dbproxy1018: Productionize dbproxy1018, will replace 
dbproxy1010 [puppet] - 10https://gerrit.wikimedia.org/r/532867 (https://phabricator.wikimedia.org/T202367) (owner: 10Marostegui) [07:48:34] <_joe_> mutante: I do have the buster packages for envoy built [07:48:44] <_joe_> I shall test and upload them [07:52:07] (03PS1) 10Marostegui: dbproxy1018: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/532877 [07:52:22] (03PS2) 10Dzahn: trafficserver/varnish: replace krypton with miscweb1001, rename director [puppet] - 10https://gerrit.wikimedia.org/r/532695 (https://phabricator.wikimedia.org/T224247) [07:52:24] _joe_: ooh! heh, a lot of good timing around this :) [07:52:26] ema: ^ [07:52:45] mutante: please, use ATS instead of trafficserver on commit messages <3 [07:53:08] (03CR) 10Marostegui: [C: 03+2] dbproxy1018: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/532877 (owner: 10Marostegui) [07:53:37] vgutierrez: ok. or want a standardized topic branch name? that makes nice Gerrit URLs [07:53:50] with lists of changes [07:54:31] hmm regarding the topic name I use the phab task [08:00:37] (03PS1) 10Ema: ATS: get rid of alternate_domains not overriding caching [puppet] - 10https://gerrit.wikimedia.org/r/532878 (https://phabricator.wikimedia.org/T227432) [08:02:36] 10Operations, 10Commons, 10MediaWiki-File-management, 10media-storage, 10Patch-For-Review: bring swiftrepl back to life - https://phabricator.wikimedia.org/T231110 (10aaron) I do worry about the risk of data loss if swiftrepl is also deleting files based on container list differences. Between some FileB... [08:03:07] vgutierrez: ok. by the way i just do that because i use the puppet module names [08:03:23] yeah... 
maybe is our (traffic) fault [08:03:34] we decided on ATS as a convention for trafficserver changes [08:03:44] alright [08:04:51] (03Abandoned) 10Ema: ATS: get rid of alternate_domains not overriding caching [puppet] - 10https://gerrit.wikimedia.org/r/532878 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema) [08:06:22] !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime [08:06:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:07:28] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [08:07:40] PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-text site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d [08:08:20] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [08:08:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:08:32] (03PS1) 10Vgutierrez: Update wikimediafoundation non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532879 (https://phabricator.wikimedia.org/T133548) [08:10:25] !log uploaded trafficserver-8.0.5-1wm4 to apt.wikimedia.org (stretch) - T231287 [08:10:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:10:30] T231287: Investigate HTTP/2 limits on trafficserver - https://phabricator.wikimedia.org/T231287 [08:10:58] RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d [08:11:10] (03PS3) 10Dzahn: ATS/varnish: replace krypton with miscweb1001, rename director [puppet] - 10https://gerrit.wikimedia.org/r/532695 (https://phabricator.wikimedia.org/T224247) [08:11:44] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [08:12:10] (03CR) 10Ema: [C: 03+1] ATS/varnish: replace krypton with miscweb1001, rename director [puppet] - 10https://gerrit.wikimedia.org/r/532695 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn) [08:13:51] (03PS2) 10Ema: Revert "webserver_misc_apps: do not install envoy" [puppet] - 10https://gerrit.wikimedia.org/r/532380 (https://phabricator.wikimedia.org/T210411) [08:16:29] 10Operations, 10netops: Review switches ACL to connect from tools-bastion to dbproxy1018 - https://phabricator.wikimedia.org/T231418 (10Marostegui) [08:16:49] 10Operations, 10netops: Review switches ACL to connect from tools-bastion to dbproxy1018 - https://phabricator.wikimedia.org/T231418 (10Marostegui) [08:16:54] 10Operations, 10netops: Review switches ACL to connect from tools-bastion to dbproxy1018 - https://phabricator.wikimedia.org/T231418 (10Marostegui) p:05Triage→03Normal [08:17:35] !log Deploy grants on labsdb hosts for dbproxy1018 - T202367 [08:17:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:17:41] T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4] - https://phabricator.wikimedia.org/T202367 [08:18:48] (03PS1) 10Vgutierrez: Update wikipedia non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532880 
(https://phabricator.wikimedia.org/T133548) [08:18:57] finally.. fourth attempt :/ [08:18:58] (03CR) 10Dzahn: [C: 03+2] Revert "webserver_misc_apps: do not install envoy" [puppet] - 10https://gerrit.wikimedia.org/r/532380 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema) [08:19:08] (03PS3) 10Dzahn: Revert "webserver_misc_apps: do not install envoy" [puppet] - 10https://gerrit.wikimedia.org/r/532380 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema) [08:26:06] (03PS2) 10Filippo Giunchedi: prometheus: bump logstash rate of ingestion threshold [puppet] - 10https://gerrit.wikimedia.org/r/532707 (https://phabricator.wikimedia.org/T228878) [08:26:36] (03CR) 10Filippo Giunchedi: [C: 03+2] prometheus: bump logstash rate of ingestion threshold [puppet] - 10https://gerrit.wikimedia.org/r/532707 (https://phabricator.wikimedia.org/T228878) (owner: 10Filippo Giunchedi) [08:27:55] (03PS1) 10Vgutierrez: Redirect wikimania.com to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/532881 (https://phabricator.wikimedia.org/T133548) [08:27:57] (03PS1) 10Vgutierrez: Redirect mediawiki.com to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/532882 (https://phabricator.wikimedia.org/T133548) [08:31:11] (03CR) 10Ema: [C: 03+1] Redirect mediawiki.com to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/532882 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:31:20] (03CR) 10Ema: [C: 03+1] Redirect wikimania.com to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/532881 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:31:26] (03CR) 10Ema: [C: 03+1] Update wikipedia non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532880 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:32:01] (03CR) 10Ema: [C: 03+1] "Didn't we get rid of wpzero though?" 
[dns] - 10https://gerrit.wikimedia.org/r/532879 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:32:13] (03CR) 10Ema: [C: 03+1] Update wikibooks non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532876 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:32:25] (03CR) 10Ema: [C: 03+1] Update wikinews non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532875 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:32:40] (03CR) 10Ema: [C: 03+1] Update wikiquote non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532874 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:32:51] (03CR) 10Ema: [C: 03+1] Update wikiversity non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532873 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:33:12] (03CR) 10Ema: [C: 03+1] Update wikivoyage non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532872 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:33:25] (03CR) 10Ema: [C: 03+1] Update wiktionary non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532871 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:33:44] (03CR) 10Ema: [C: 03+1] Update wikisource non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532870 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:34:29] that should grant me the title of committer of the day [08:35:00] :) [08:35:01] (03CR) 10Dzahn: "zero.wikipedia.org has been dropped in 9943d176e50882ec11a478" [dns] - 10https://gerrit.wikimedia.org/r/532879 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:37:09] (03CR) 10Vgutierrez: "> Patch Set 1:" [dns] - 10https://gerrit.wikimedia.org/r/532879 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:38:21] (03CR) 10Vgutierrez: [C: 03+2] Update wikisource non canonical domains [dns] - 
10https://gerrit.wikimedia.org/r/532870 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:39:08] (03CR) 10Dzahn: [C: 03+1] "> Patch Set 1:" [dns] - 10https://gerrit.wikimedia.org/r/532879 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:40:15] (03CR) 10Vgutierrez: [C: 03+2] Update wiktionary non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532871 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:41:34] (03CR) 10Vgutierrez: [C: 03+2] Update wikivoyage non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532872 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [08:54:56] (03PS1) 10Dzahn: misc_apps::httpd: do not load SSL httpd module [puppet] - 10https://gerrit.wikimedia.org/r/532948 (https://phabricator.wikimedia.org/T224247) [09:01:43] (03CR) 10Dzahn: [C: 03+2] "all VirtualHosts on krypton/miscweb1001 are 80, nothing is 443 since a long time when this moved behind varnish in the first place" [puppet] - 10https://gerrit.wikimedia.org/r/532948 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn) [09:03:57] (03CR) 10DCausse: elasticsearch: ship logs to local syslog server (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/531922 (https://phabricator.wikimedia.org/T225125) (owner: 10Mathew.onipe) [09:05:40] PROBLEM - Disk space on contint1001 is CRITICAL: DISK CRITICAL - free space: /srv 50402 MB (5% inode=93%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=contint1001&var-datasource=eqiad+prometheus/ops [09:06:16] !log miscweb1001 - a2dismod ssl; restart apache - stop listening on 443 to make room for envoy [09:06:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:08:00] (03PS1) 10Vgutierrez: Feed more parked domains to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/532949 
(https://phabricator.wikimedia.org/T133548) [09:09:56] (03PS1) 10Vgutierrez: ncredir: Add non-canonical-redirect-6 [puppet] - 10https://gerrit.wikimedia.org/r/532950 (https://phabricator.wikimedia.org/T133548) [09:09:58] (03PS1) 10Vgutierrez: ncredir: Add redirection rules for domains added in non-canonical-cert-6 [puppet] - 10https://gerrit.wikimedia.org/r/532951 (https://phabricator.wikimedia.org/T133548) [09:10:31] (03CR) 10Vgutierrez: [C: 03+2] Update wikiversity non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532873 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [09:11:13] !log miscweb2001 - edit /etc/apache2/ports.conf and replace port 444 with 443 again; a2dismod ssl; systemctl restart apache2; systemctl restart envoyproxy; now also has envoy listening on 443, matches miscweb1001 and manual hack removed (T210411) [09:11:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:11:19] T210411: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 [09:12:10] (03CR) 10Vgutierrez: [C: 03+2] Update wikiquote non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532874 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [09:12:37] 10Operations, 10Continuous-Integration-Infrastructure, 10serviceops, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO (201907): contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (10Marostegui) [09:12:40] 10Operations, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team-TODO (201907): contint1001: DISK WARNING - free space: /srv 88397 MB (10% inode=94%): - https://phabricator.wikimedia.org/T219850 (10Marostegui) 05Resolved→03Open This is alerting again: ` [09:05:40] <+icinga-wm> PROBLEM... 
[09:13:59] (03PS2) 10Vgutierrez: ncredir: Add non-canonical-redirect-6 [puppet] - 10https://gerrit.wikimedia.org/r/532950 (https://phabricator.wikimedia.org/T230470) [09:14:03] (03PS2) 10Vgutierrez: ncredir: Add redirection rules for domains added in non-canonical-cert-6 [puppet] - 10https://gerrit.wikimedia.org/r/532951 (https://phabricator.wikimedia.org/T133548) [09:14:05] (03CR) 10Vgutierrez: [C: 03+2] Update wikinews non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532875 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [09:14:20] !log mwdebug1002 - restart php-fpm [09:14:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:14:42] RECOVERY - PHP opcache health on mwdebug1002 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [09:15:32] onimisionipe: do you know about the ElasticSearch shard size check? [09:15:38] apparently this is too large: CRITICAL - commonswiki_content_1556235298(60.333333333333336gb) [09:15:45] (03CR) 10Vgutierrez: [C: 03+2] Update wikibooks non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532876 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [09:15:50] alert probably when over 60GB i guess [09:16:30] (03CR) 10Vgutierrez: [C: 03+2] Update wikimediafoundation non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532879 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [09:19:10] (03PS3) 10Vgutierrez: ncredir: Add redirection rules for domains added in non-canonical-cert-6 [puppet] - 10https://gerrit.wikimedia.org/r/532951 (https://phabricator.wikimedia.org/T133548) [09:19:56] !log notebook1003 - systemctl start jupyter-iflorez-singleuser [09:20:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:23:03] (03CR) 10Vgutierrez: [C: 03+2] Update wikipedia non canonical domains [dns] - 10https://gerrit.wikimedia.org/r/532880 
(https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [09:25:12] (03CR) 10Vgutierrez: [C: 03+2] Redirect wikimania.com to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/532881 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [09:25:15] (03PS4) 10Vgutierrez: ncredir: Add redirection rules for domains added in non-canonical-cert-6 [puppet] - 10https://gerrit.wikimedia.org/r/532951 (https://phabricator.wikimedia.org/T133548) [09:25:34] (03CR) 10Vgutierrez: [C: 03+2] Redirect mediawiki.com to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/532882 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [09:27:34] 10Operations, 10Traffic: STL file downloads result in ERR_SPDY_PROTOCOL_ERROR - https://phabricator.wikimedia.org/T231422 (10Gilles) [09:28:00] !log notebook1004 - systemctl start jupyter-ebernhardson-singleuser (T231365) [09:28:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:28:05] T231365: Trouble accessing Jupyter Lab - https://phabricator.wikimedia.org/T231365 [09:28:06] RECOVERY - Check systemd state on notebook1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:28:26] PROBLEM - Disk space on contint1001 is CRITICAL: DISK CRITICAL - free space: /srv 50705 MB (5% inode=93%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=contint1001&var-datasource=eqiad+prometheus/ops [09:28:54] (03CR) 10Vgutierrez: [C: 03+2] ncredir: Add non-canonical-redirect-6 [puppet] - 10https://gerrit.wikimedia.org/r/532950 (https://phabricator.wikimedia.org/T230470) (owner: 10Vgutierrez) [09:29:05] (03PS3) 10Vgutierrez: ncredir: Add non-canonical-redirect-6 [puppet] - 10https://gerrit.wikimedia.org/r/532950 (https://phabricator.wikimedia.org/T230470) [09:30:56] ACKNOWLEDGEMENT - Check systemd state on 
notebook1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn https://phabricator.wikimedia.org/T231365#5444987 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:33:23] 10Operations, 10Traffic: cergen fails signing CSR - https://phabricator.wikimedia.org/T231423 (10ema) [09:33:30] 10Operations, 10Traffic: cergen fails signing CSR - https://phabricator.wikimedia.org/T231423 (10ema) p:05Triage→03High [09:34:26] (03CR) 10Ema: [C: 03+1] ncredir: Add redirection rules for domains added in non-canonical-cert-6 [puppet] - 10https://gerrit.wikimedia.org/r/532951 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [09:35:16] (03CR) 10Vgutierrez: [C: 03+2] ncredir: Add redirection rules for domains added in non-canonical-cert-6 [puppet] - 10https://gerrit.wikimedia.org/r/532951 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [09:35:25] (03PS5) 10Vgutierrez: ncredir: Add redirection rules for domains added in non-canonical-cert-6 [puppet] - 10https://gerrit.wikimedia.org/r/532951 (https://phabricator.wikimedia.org/T133548) [09:37:22] 10Operations, 10Traffic, 10serviceops: Error pulling image from docker registry - https://phabricator.wikimedia.org/T231388 (10ema) A proper fix for this issue is blocked on cergen bug T231423. I am going to disable TLS between ATS and eqiad's docker-registry for the time being. cp1075 is also in eqiad, so... 
[09:41:01] 10Operations, 10Traffic: Cannot download STL files due to network error - https://phabricator.wikimedia.org/T231422 (10Gilles) [09:42:12] (03PS1) 10Ema: ATS: temporarily use plain HTTP to access docker-registry [puppet] - 10https://gerrit.wikimedia.org/r/532953 (https://phabricator.wikimedia.org/T227432) [09:43:14] 10Operations, 10DNS, 10Domains, 10Traffic, 10Patch-For-Review: Could not reach wikipedia from domain wikipedia.fi - https://phabricator.wikimedia.org/T230470 (10Vgutierrez) 05Open→03Resolved a:03Vgutierrez https is happy now after adding wikipedia.fi as part of non-canonical-redirect-6 in https://g... [09:44:19] 10Operations, 10DNS, 10Domains, 10Traffic, 10Patch-For-Review: Could not reach wikipedia from domain wikipedia.fi - https://phabricator.wikimedia.org/T230470 (10Vgutierrez) oh, it also works for https://wikipedia.fi: ` willikins:~ vgutierrez$ curl https://wikipedia.fi -o /dev/null -v 2>&1 |fgrep -i Locat... [09:48:37] (03PS1) 10Dzahn: misc_apps::httpd: allow port 80 from deployment servers [puppet] - 10https://gerrit.wikimedia.org/r/532954 (https://phabricator.wikimedia.org/T224247) [09:52:00] (03PS2) 10Ema: ATS: temporarily use plain HTTP to access docker-registry [puppet] - 10https://gerrit.wikimedia.org/r/532953 (https://phabricator.wikimedia.org/T227432) [09:52:41] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/18074/krypton.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/532954 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn) [09:56:26] !log upgrading trafficserver on cp5001 to version 8.0.5-1wm4 - T231287 [09:56:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:56:32] T231287: Investigate HTTP/2 limits on trafficserver - https://phabricator.wikimedia.org/T231287 [09:58:08] !log repooling cp5001 - T231287 [09:58:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:01:28] (03PS2) 10Vgutierrez: ATS: Allow configuring 
HTTP/2 settings [puppet] - 10https://gerrit.wikimedia.org/r/532852 (https://phabricator.wikimedia.org/T231287) [10:01:30] (03PS2) 10Vgutierrez: ATS: Disable HTTP/2 max priority frames limit [puppet] - 10https://gerrit.wikimedia.org/r/532853 (https://phabricator.wikimedia.org/T231287) [10:05:26] (03PS1) 10Pmiazga: Enable AMC Outreach modal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532957 (https://phabricator.wikimedia.org/T231206) [10:08:36] (03PS1) 10Dzahn: iegreview: require a mysql client to be installed [puppet] - 10https://gerrit.wikimedia.org/r/532959 (https://phabricator.wikimedia.org/T224247) [10:08:55] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [10:09:20] (03CR) 10Dzahn: [C: 03+2] iegreview: require a mysql client to be installed [puppet] - 10https://gerrit.wikimedia.org/r/532959 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn) [10:09:38] (03PS1) 10Ema: profile::docker::registry: whitelist ATS nodes [puppet] - 10https://gerrit.wikimedia.org/r/532960 (https://phabricator.wikimedia.org/T227432) [10:10:25] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [10:17:36] (03PS1) 10Dzahn: webserver_misc_apps: only include envoy if on stretch [puppet] - 10https://gerrit.wikimedia.org/r/532962 [10:18:13] (03CR) 10jerkins-bot: [V: 04-1] webserver_misc_apps: only include envoy if on stretch [puppet] - 10https://gerrit.wikimedia.org/r/532962 (owner: 10Dzahn) [10:18:46] (03PS2) 10Dzahn: webserver_misc_apps: only include envoy if on stretch [puppet] - 10https://gerrit.wikimedia.org/r/532962 (https://phabricator.wikimedia.org/T210411) [10:19:17] (03PS2) 10Ema: profile::docker::registry: whitelist ATS nodes [puppet] - 10https://gerrit.wikimedia.org/r/532960 (https://phabricator.wikimedia.org/T227432) [10:20:11] (03CR) 10Vgutierrez: [C: 03+2] Feed more parked domains to the non canonical redirect service [dns] - 10https://gerrit.wikimedia.org/r/532949 (https://phabricator.wikimedia.org/T133548) (owner: 10Vgutierrez) [10:22:54] (03CR) 10Dzahn: [C: 03+2] webserver_misc_apps: only include envoy if on stretch [puppet] - 10https://gerrit.wikimedia.org/r/532962 (https://phabricator.wikimedia.org/T210411) (owner: 10Dzahn) [10:25:08] (03PS4) 10Dzahn: ATS/varnish: replace krypton with miscweb1001, rename director [puppet] - 10https://gerrit.wikimedia.org/r/532695 (https://phabricator.wikimedia.org/T224247) [10:34:25] 10Operations, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team-TODO (201907): contint1001: DISK WARNING - free space: /srv 88397 MB (10% inode=94%): - https://phabricator.wikimedia.org/T219850 (10Dzahn) /srv/jenkins 753G (!) @hashar [10:37:21] 10Operations, 10DBA, 10OTRS: Switchover m1 primary master: db1063 to db1135 - https://phabricator.wikimedia.org/T231403 (10Dzahn) Fine with me, speaking for RT and racktables. 
Note that RT is separate from OTRS which is more critical. [10:38:32] 10Operations, 10DBA, 10OTRS: Switchover m1 primary master: db1063 to db1135 - https://phabricator.wikimedia.org/T231403 (10Marostegui) >>! In T231403#5445157, @Dzahn wrote: > Fine with me, speaking for RT and racktables. Note that RT is separate from OTRS which is more critical. Correct! My bad, sorry! Amen... [10:38:43] 10Operations, 10DBA: Switchover m1 primary master: db1063 to db1135 - https://phabricator.wikimedia.org/T231403 (10Marostegui) [10:42:33] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532966 (https://phabricator.wikimedia.org/T128546) [10:42:33] !log start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T212886) [10:42:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:42:41] T212886: Wikidata support for nap.wikisource - https://phabricator.wikimedia.org/T212886 [10:43:28] (03PS3) 10Ema: profile::docker::registry: whitelist ATS nodes [puppet] - 10https://gerrit.wikimedia.org/r/532960 (https://phabricator.wikimedia.org/T227432) [10:52:06] (03PS4) 10Ema: docker_registry_ha: whitelist ATS nodes [puppet] - 10https://gerrit.wikimedia.org/r/532960 (https://phabricator.wikimedia.org/T227432) [10:53:32] (03CR) 10Ema: "pcc looks sane https://puppet-compiler.wmflabs.org/compiler1001/18083/" [puppet] - 10https://gerrit.wikimedia.org/r/532960 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema) [10:53:55] (03PS1) 10Dzahn: remove wikiba.se microsite puppetization [puppet] - 10https://gerrit.wikimedia.org/r/532972 (https://phabricator.wikimedia.org/T99531) [10:55:50] (03CR) 10Vgutierrez: [C: 03+2] ATS: make sure that the systemd service is enabled [puppet] - 10https://gerrit.wikimedia.org/r/532652 (owner: 10Vgutierrez) [10:56:01] (03PS3) 10Vgutierrez: ATS: make sure that the systemd service is enabled [puppet] - 
10https://gerrit.wikimedia.org/r/532652 [10:58:09] (03PS1) 10Dzahn: ATS/acme_chief/varnish: remove wikiba.se [puppet] - 10https://gerrit.wikimedia.org/r/532973 (https://phabricator.wikimedia.org/T155359) [10:58:55] uh... [10:59:00] jouncebot: next [10:59:00] In 0 hour(s) and 0 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190828T1100) [11:00:04] Amir1, Lucas_WMDE, and Urbanecm: #bothumor I οΏ½ Unicode. All rise for European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190828T1100). [11:00:04] kart_, raynor, matthiasmullie, and jan_drewniak: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:17] (03PS1) 10Dzahn: labs.yaml: remove wikibase in cloud vps [puppet] - 10https://gerrit.wikimedia.org/r/532975 (https://phabricator.wikimedia.org/T99531) [11:00:19] I can SWAT today! [11:00:20] (03PS1) 10Dzahn: ATS: remove wikiba.se backend [puppet] - 10https://gerrit.wikimedia.org/r/532976 (https://phabricator.wikimedia.org/T99531) [11:00:33] o/ [11:00:40] o/ [11:00:56] p.s. I can do my own [11:01:10] Urbanecm: I'm here too. [11:01:25] o/ [11:02:07] (03CR) 10Vgutierrez: [C: 04-1] "I'd get rid of the acme_chief certificate configuration on a second commit, otherwise puppet is going to fail on the cache nodes if this c" [puppet] - 10https://gerrit.wikimedia.org/r/532973 (https://phabricator.wikimedia.org/T155359) (owner: 10Dzahn) [11:03:08] cool! 
[11:03:21] kart_: +2'ed your config change, will ping you when it's on mwdebug [11:03:28] (03PS8) 10Urbanecm: Enable webfonts for ru,uk,be of wikisource, and for sourceswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/509739 (https://phabricator.wikimedia.org/T220752) (owner: 10Vladis13) [11:03:39] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/509739 (https://phabricator.wikimedia.org/T220752) (owner: 10Vladis13) [11:03:50] sure! [11:04:02] jan_drewniak: do you want to do it yourself? [11:04:15] Urbanecm: yup [11:04:33] ok, then let's wait for the end of all other scheduled patches, I'll then hand SWAT over to you jan_drewniak [11:04:43] (03Merged) 10jenkins-bot: Enable webfonts for ru,uk,be of wikisource, and for sourceswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/509739 (https://phabricator.wikimedia.org/T220752) (owner: 10Vladis13) [11:04:46] sounds good [11:04:55] !log end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T212886) [11:05:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:05:01] T212886: Wikidata support for nap.wikisource - https://phabricator.wikimedia.org/T212886 [11:05:11] kart_: please test on mwdebug1002 and let me know [11:05:43] OK [11:05:56] (03CR) 10Vgutierrez: [C: 04-2] "After some thoughts... I'd block this till we have to re-image the text nodes to get rid of nginx, otherwise we need to run manual operati" [puppet] - 10https://gerrit.wikimedia.org/r/532973 (https://phabricator.wikimedia.org/T155359) (owner: 10Dzahn) [11:06:14] matthiasmullie: I see you +2'ed your backport, do you want to deploy it yourself, or do you prefer me doing it? [11:06:28] I can do my own once you're done! 
[11:06:46] (03CR) 10jenkins-bot: Enable webfonts for ru,uk,be of wikisource, and for sourceswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/509739 (https://phabricator.wikimedia.org/T220752) (owner: 10Vladis13) [11:07:02] ok, will let you know once you can deploy it matthiasmullie [11:07:07] thanks [11:07:26] (03PS2) 10Urbanecm: Enable AMC Outreach modal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532957 (https://phabricator.wikimedia.org/T231206) (owner: 10Pmiazga) [11:07:37] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532957 (https://phabricator.wikimedia.org/T231206) (owner: 10Pmiazga) [11:08:09] raynor: you're next, will let you know once the patch is at mwdebug1002 to be tested [11:08:38] (03Merged) 10jenkins-bot: Enable AMC Outreach modal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532957 (https://phabricator.wikimedia.org/T231206) (owner: 10Pmiazga) [11:08:56] (03CR) 10jenkins-bot: Enable AMC Outreach modal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532957 (https://phabricator.wikimedia.org/T231206) (owner: 10Pmiazga) [11:09:11] Urbanecm, thx, I'm waiting [11:09:13] raynor: your patch is at mwdebug1002, please test and let me know [11:09:16] Urbanecm: wgULSWebfontsEnabled: false - I don't know what is wrong.. [11:09:46] Urbanecm, on it [11:09:53] kart_: looking [11:09:55] where do you see that? [11:10:07] https://ru.wikisource.org [11:11:20] kart_: https://usercontent.irccloud-cdn.com/file/t9T0bQ75/image.png [11:11:34] seems to be okay at mwdebug1002? [11:11:40] (03PS3) 10Vgutierrez: ATS: Disable HTTP/2 max priority frames limit [puppet] - 10https://gerrit.wikimedia.org/r/532853 (https://phabricator.wikimedia.org/T231287) [11:12:15] Urbanecm: seems OK now :) [11:12:26] ok, so might be a temporary issue kart_ :-) [11:12:33] can I sync kart_ ? [11:12:35] thanks raynor [11:12:41] Yeah, go ahead. 
[11:13:01] Urbanecm - so far looks good, I cannot test it properly as I don't have an account with 100+ edits ;/ [11:13:07] please proceed to prod, IMHO looks good [11:13:12] thanks raynor [11:13:38] (03PS4) 10Vgutierrez: ATS: Disable HTTP/2 max priority frames limit on cp5001 [puppet] - 10https://gerrit.wikimedia.org/r/532853 (https://phabricator.wikimedia.org/T231287) [11:14:46] (03CR) 10Vgutierrez: [C: 03+2] ATS: Allow configuring HTTP/2 settings [puppet] - 10https://gerrit.wikimedia.org/r/532852 (https://phabricator.wikimedia.org/T231287) (owner: 10Vgutierrez) [11:14:58] (03PS3) 10Vgutierrez: ATS: Allow configuring HTTP/2 settings [puppet] - 10https://gerrit.wikimedia.org/r/532852 (https://phabricator.wikimedia.org/T231287) [11:15:11] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 4ebddb8: Enable webfonts for ru,uk,be of wikisource, and for sourceswiki (T220752) (duration: 00m 55s) [11:15:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:15:19] synced kart_ [11:15:19] T220752: Enable webfonts by default in UniversalLanguageSelector - https://phabricator.wikimedia.org/T220752 [11:15:33] Urbanecm: Thanks! [11:16:22] yw [11:16:35] raynor: syncing yours [11:16:50] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 622cb63: Enable AMC Outreach modal (T231206) (duration: 00m 54s) [11:16:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:16:56] T231206: Turn on AMC outreach modal - https://phabricator.wikimedia.org/T231206 [11:17:09] thx Urbanecm, let me double check prod just to be sure [11:17:14] sure raynor [11:20:21] Whom to ping for broken Grafana? [11:20:33] _joe_: ^ [11:21:50] _joe_ is out today, maybe godog? [11:21:58] OK! [11:22:16] kart_ maybe creating a task so it can be followed up on phab? 
[11:22:17] (03PS3) 10Zoranzoki21: Disable search engine indexing in some namespaces of Icelandic Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532771 (https://phabricator.wikimedia.org/T231179) [11:22:22] Urbanecm, looks like everything is ok, thx! [11:22:26] (03PS2) 10Zoranzoki21: Enable Page Previews as default on zhwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532836 (https://phabricator.wikimedia.org/T230624) [11:22:29] marostegui: ok [11:22:36] (03PS2) 10Zoranzoki21: Create HIDPI logo for Wikispecies [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532839 (https://phabricator.wikimedia.org/T230113) [11:22:45] jan_drewniak: go ahead with your change, and hand SWAT to matthiasmullie once you're done [11:23:22] Urbanecm: Ok, I'll start mine now [11:23:24] matthiasmullie: once you're done with your deployment, please hand SWAT back to me, I'll continue with my stuff [11:24:05] (03CR) 10Jdrewniak: [C: 03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532966 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [11:25:05] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532966 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [11:25:07] Urbanecm: /here [11:25:18] 10Operations, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team-TODO (201907): contint1001: DISK WARNING - free space: /srv 88397 MB (10% inode=94%): - https://phabricator.wikimedia.org/T219850 (10hashar) The builds are growing insane again. In megabytes: ` contint1001:/srv$ du /srv/jenkins... 
[11:25:23] Zoranzoki21: will ping you once it will be your turn, other deployments are in progress [11:25:33] Ok [11:26:46] (03CR) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532966 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [11:26:52] (03CR) 10Vgutierrez: [C: 03+2] ATS: Disable HTTP/2 max priority frames limit on cp5001 [puppet] - 10https://gerrit.wikimedia.org/r/532853 (https://phabricator.wikimedia.org/T231287) (owner: 10Vgutierrez) [11:27:03] (03PS5) 10Vgutierrez: ATS: Disable HTTP/2 max priority frames limit on cp5001 [puppet] - 10https://gerrit.wikimedia.org/r/532853 (https://phabricator.wikimedia.org/T231287) [11:28:35] 10Operations, 10Traffic, 10Patch-For-Review: Investigate HTTP/2 limits on trafficserver - https://phabricator.wikimedia.org/T231287 (10Vgutierrez) 05Open→03Resolved a:03Vgutierrez [11:28:53] godog: https://phabricator.wikimedia.org/T231432 - let me know if more info is needed. [11:30:03] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:532966| Bumping portals to master (T128546)]] (duration: 00m 53s) [11:30:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:30:09] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546 [11:30:43] I have something to add to the SWAT when it's done [11:30:56] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:532966| Bumping portals to master (T128546)]] (duration: 00m 52s) [11:31:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:31:09] !log Optimize pc1010 after deleting old rows - T210725 [11:31:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:31:15] T210725: Replace parsercache keys to something more meaningful on db-XXXX.php - https://phabricator.wikimedia.org/T210725 [11:32:00] 
Urbanecm: alrighty I'm done [11:32:12] thanks jan_drewniak [11:32:20] Amir1: okay, will ping you when it will be possible [11:32:44] matthiasmullie: you can deploy your change. Please hand SWAT to me once you're done [11:33:11] (03PS1) 10Ladsgroup: Enable WRITE_BOTH for items term store for testwikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532983 (https://phabricator.wikimedia.org/T225055) [11:34:30] jouncebot: next [11:34:30] In 0 hour(s) and 25 minute(s): Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190828T1200) [11:34:32] jouncebot: now [11:34:32] For the next 0 hour(s) and 25 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190828T1100) [11:34:34] ah yeah swat is going on [11:35:55] * Urbanecm waves to hashar [11:36:13] matthiasmullie: still around? [11:36:41] once swat is done, I will +2 a change that is only for beta ( https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/530144/ ) [11:36:49] which is a noop for prod [11:36:57] 10Operations, 10Traffic, 10Patch-For-Review: Puppetize ATS TLS configuration for incoming traffic - https://phabricator.wikimedia.org/T221594 (10Vgutierrez) 05Open→03Resolved [11:37:00] 10Operations, 10Traffic: Evaluate ATS TLS stack - https://phabricator.wikimedia.org/T220383 (10Vgutierrez) [11:37:28] Urbanecm: I have to go now on bus.. 
I will request deployment of my patches for next SWAT [11:37:39] ok Zoranzoki21 [11:37:44] But you can deploy it if you can :) [11:37:48] Bye [11:38:33] (03PS4) 10Urbanecm: Whitelist jenkins for edit rate limits on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/530144 (https://phabricator.wikimedia.org/T230481) (owner: 10Jakob) [11:38:45] * Urbanecm is going to +2 ^^ [11:38:53] (03CR) 10Urbanecm: [C: 03+2] "noop for prod" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/530144 (https://phabricator.wikimedia.org/T230481) (owner: 10Jakob) [11:39:02] 10Operations, 10Traffic: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez) [11:39:04] hashar: +2'ed on your behalf [11:39:15] 10Operations, 10Traffic: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez) p:05Triage→03Normal [11:39:32] (no one should be using deployment machine rn, so it's safe to do now) [11:39:52] Urbanecm: yeah still here [11:40:06] (03Merged) 10jenkins-bot: Whitelist jenkins for edit rate limits on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/530144 (https://phabricator.wikimedia.org/T230481) (owner: 10Jakob) [11:40:15] will sync my changes now [11:40:21] thanks matthiasmullie [11:40:26] let me know when you're done [11:40:34] 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui) [11:42:53] !log mlitn@deploy1001 Synchronized php-1.34.0-wmf.20/extensions/WikibaseMediaInfo: [SDC] Check existence of objects before using it (duration: 00m 54s) [11:42:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:43:41] (03PS1) 10Vgutierrez: hiera: Move nginx from port 443 to 4443 on cp2002 [puppet] - 10https://gerrit.wikimedia.org/r/532984 
(https://phabricator.wikimedia.org/T231433) [11:43:44] (03PS1) 10Vgutierrez: hiera: Move ats-tls from port 8443 to port 443 on cp2002 [puppet] - 10https://gerrit.wikimedia.org/r/532985 (https://phabricator.wikimedia.org/T231433) [11:43:58] Urbanecm: done! [11:44:05] thanks matthiasmullie [11:44:22] Amir1: want to sync now? [11:44:26] (not sure how much time it requires) [11:44:41] Yeah, it's five minutes tops [11:44:51] go ahead then Amir1 [11:45:24] (03PS2) 10Ladsgroup: Enable WRITE_BOTH for items term store for testwikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532983 (https://phabricator.wikimedia.org/T225055) [11:45:33] (03CR) 10Ladsgroup: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532983 (https://phabricator.wikimedia.org/T225055) (owner: 10Ladsgroup) [11:46:37] (03Merged) 10jenkins-bot: Enable WRITE_BOTH for items term store for testwikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532983 (https://phabricator.wikimedia.org/T225055) (owner: 10Ladsgroup) [11:48:27] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:532983|Enable WRITE_BOTH for items term store for testwikidatawiki (T225055)]] (duration: 00m 54s) [11:48:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:48:34] T225055: Switch `tmpItemTermsMigrationStages` to MIGRATION_WRITE_BOTH - https://phabricator.wikimedia.org/T225055 [11:48:38] (03CR) 10jenkins-bot: Whitelist jenkins for edit rate limits on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/530144 (https://phabricator.wikimedia.org/T230481) (owner: 10Jakob) [11:48:47] Urbanecm: I'm done [11:48:54] thanks Amir1 [11:49:17] (03PS3) 10Urbanecm: Create HIDPI logo for Wikispecies [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532839 (https://phabricator.wikimedia.org/T230113) (owner: 10Zoranzoki21) [11:49:23] (03CR) 10Urbanecm: [C: 03+2] Create HIDPI logo for Wikispecies [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/532839 (https://phabricator.wikimedia.org/T230113) (owner: 10Zoranzoki21) [11:50:39] (03Merged) 10jenkins-bot: Create HIDPI logo for Wikispecies [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532839 (https://phabricator.wikimedia.org/T230113) (owner: 10Zoranzoki21) [11:50:42] (03CR) 10jenkins-bot: Enable WRITE_BOTH for items term store for testwikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532983 (https://phabricator.wikimedia.org/T225055) (owner: 10Ladsgroup) [11:51:01] (03PS3) 10Urbanecm: Enable Page Previews as default on zhwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532836 (https://phabricator.wikimedia.org/T230624) (owner: 10Zoranzoki21) [11:51:50] Urbanecm: thank you :) [11:51:57] happy to help! [11:52:17] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: SWAT: f86baa3: Create HIDPI logo for Wikispecies (1/2, T230113) (duration: 00m 54s) [11:52:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:52:23] T230113: Create HIDPI logo for Wikispecies - https://phabricator.wikimedia.org/T230113 [11:52:47] (03CR) 10jenkins-bot: Create HIDPI logo for Wikispecies [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532839 (https://phabricator.wikimedia.org/T230113) (owner: 10Zoranzoki21) [11:53:17] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532836 (https://phabricator.wikimedia.org/T230624) (owner: 10Zoranzoki21) [11:53:52] (03PS1) 10Vgutierrez: hiera: Move nginx from port 443 to 4443 on cp1076 [puppet] - 10https://gerrit.wikimedia.org/r/532987 (https://phabricator.wikimedia.org/T231433) [11:53:54] (03PS1) 10Vgutierrez: hiera: Move ats-tls from port 8443 to 4443 on cp1076 [puppet] - 10https://gerrit.wikimedia.org/r/532988 (https://phabricator.wikimedia.org/T231433) [11:53:56] (03PS1) 10Vgutierrez: hiera: Move nginx from port 443 to port 4443 on cp3034 [puppet] - 
10https://gerrit.wikimedia.org/r/532989 (https://phabricator.wikimedia.org/T231433) [11:53:58] (03PS1) 10Vgutierrez: hiera: Move ats-tls from port 8443 to port 443 on cp3034 [puppet] - 10https://gerrit.wikimedia.org/r/532990 (https://phabricator.wikimedia.org/T231433) [11:54:00] (03PS1) 10Vgutierrez: hiera: Move nginx to port 4443 on cp4021 [puppet] - 10https://gerrit.wikimedia.org/r/532991 (https://phabricator.wikimedia.org/T231433) [11:54:02] (03PS1) 10Vgutierrez: hiera: Move ats-tls from port 8443 to port 443 on cp4021 [puppet] - 10https://gerrit.wikimedia.org/r/532992 (https://phabricator.wikimedia.org/T231433) [11:54:27] (03Merged) 10jenkins-bot: Enable Page Previews as default on zhwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532836 (https://phabricator.wikimedia.org/T230624) (owner: 10Zoranzoki21) [11:54:29] (03PS4) 10Urbanecm: Disable search engine indexing in some namespaces of Icelandic Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532771 (https://phabricator.wikimedia.org/T231179) (owner: 10Zoranzoki21) [11:54:37] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: f86baa3: Create HIDPI logo for Wikispecies (T230113) (duration: 00m 52s) [11:54:38] (03PS5) 10Urbanecm: Disable search engine indexing in some namespaces of Icelandic Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532771 (https://phabricator.wikimedia.org/T231179) (owner: 10Zoranzoki21) [11:54:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:54:43] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532771 (https://phabricator.wikimedia.org/T231179) (owner: 10Zoranzoki21) [11:55:21] 10Operations, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team-TODO (201907): contint1001: DISK WARNING - free space: /srv 88397 MB (10% inode=94%): - https://phabricator.wikimedia.org/T219850 (10hashar) There are a lot of `mw-debug-cli.log` 
files which are 130MBytes. It is generated by Me... [11:55:54] !log Purge /static/images/project-logos/specieswiki-1.5x.png and /static/images/project-logos/specieswiki-2x.png (T230113) [11:55:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:57:34] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 2aebc15: Enable Page Previews as default on zhwikivoyage (T230624) (duration: 00m 52s) [11:57:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:57:40] T230624: Enable Page Previews as default on zhwikivoyage(Chinese Wikivoyage) - https://phabricator.wikimedia.org/T230624 [11:58:26] (03PS11) 10Urbanecm: General cleanup of initialize settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532280 (https://phabricator.wikimedia.org/T231178) (owner: 10DannyS712) [11:58:54] (03Abandoned) 10Urbanecm: Assign all rights assigned to suppress group to oversight group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/531147 (https://phabricator.wikimedia.org/T230601) (owner: 10Urbanecm) [11:59:04] (03Merged) 10jenkins-bot: Disable search engine indexing in some namespaces of Icelandic Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532771 (https://phabricator.wikimedia.org/T231179) (owner: 10Zoranzoki21) [11:59:16] (03PS2) 10Urbanecm: [rowiki] Allow sysops to name patrollers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/531956 (https://phabricator.wikimedia.org/T231099) (owner: 10Strainu) [11:59:22] (03CR) 10Urbanecm: [C: 03+2] [rowiki] Allow sysops to name patrollers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/531956 (https://phabricator.wikimedia.org/T231099) (owner: 10Strainu) [12:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190828T1200) [12:01:31] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 34f1552: Disable search engine indexing in 
some namespaces of Icelandic Wikipedia (T231179) (duration: 00m 54s) [12:01:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:01:37] T231179: Disable search engine indexing (via noindex) in some namespaces of Icelandic Wikipedia - https://phabricator.wikimedia.org/T231179 [12:02:29] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [12:02:47] uhoh, didn't look at the watch carefully [12:03:01] !log EU SWAT is taking few mins out of the sanity break, last patch [12:03:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:04:00] (03CR) 10Urbanecm: [V: 03+2 C: 03+2] "busy ci, jenkins is likely to pass" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/531956 (https://phabricator.wikimedia.org/T231099) (owner: 10Strainu) [12:05:26] (03CR) 10jerkins-bot: [V: 04-1] General cleanup of initialize settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532280 (https://phabricator.wikimedia.org/T231178) (owner: 10DannyS712) [12:05:47] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 389919f: [rowiki] Allow sysops to name patrollers (T231099) (duration: 00m 53s) [12:05:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:05:54] T231099: Patroller rights changes for ro.wp - https://phabricator.wikimedia.org/T231099 [12:06:02] !log Closing EU SWAT [12:06:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:23] 10Operations, 10serviceops, 10PHP 7.2 support, 10PHP 7.3 support: PHP 7.2 is very slow on an allocation-intensive benchmark - https://phabricator.wikimedia.org/T230861 (10ssastry) >>! 
In T230861#5433344, @tstarling wrote: > Tracing of TagTk::__destruct() shows that tokens are freed as it goes, they're not... [12:16:07] 10Operations, 10ops-eqiad: Degraded RAID on helium - https://phabricator.wikimedia.org/T224794 (10Jclark-ctr) 05Open→03Resolved [12:16:11] 10Operations, 10DBA, 10serviceops, 10Goal: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (10Jclark-ctr) [12:16:59] !log contint1001: manually gzip a few mw-debug-cli.log.gz files # T219850 [12:17:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:05] T219850: contint1001: DISK WARNING - free space: /srv 88397 MB (10% inode=94%): - https://phabricator.wikimedia.org/T219850 [12:33:26] (03CR) 10jenkins-bot: Enable Page Previews as default on zhwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532836 (https://phabricator.wikimedia.org/T230624) (owner: 10Zoranzoki21) [12:33:27] RECOVERY - Disk space on contint1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=contint1001&var-datasource=eqiad+prometheus/ops [12:52:16] kart_: ack, if I can't get around taking a look today I'm back on Mon FYI [12:58:03] (03PS5) 10Ema: docker_registry_ha: allow eqiad/codfw varnish/ATS text nodes [puppet] - 10https://gerrit.wikimedia.org/r/532960 (https://phabricator.wikimedia.org/T227432) [13:00:04] zeljkof: How many deployers does it take to do MediaWiki train - European version deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190828T1300). [13:00:08] kart_: looks like it was the k8s switch to codfw, see task [13:00:23] PROBLEM - Check systemd state on notebook1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:00:24] thank you for the reminder jouncebot :P [13:00:56] (03CR) 10jenkins-bot: Disable search engine indexing in some namespaces of Icelandic Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/532771 (https://phabricator.wikimedia.org/T231179) (owner: 10Zoranzoki21) [13:03:07] 10Operations, 10Discovery-Search, 10Elasticsearch: Reindex commonswiki as shards have grown beyond critical threshold - https://phabricator.wikimedia.org/T231446 (10Mathew.onipe) [13:03:29] 10Operations, 10Discovery-Search, 10Elasticsearch: Reindex commonswiki as shards have grown beyond critical threshold - https://phabricator.wikimedia.org/T231446 (10Mathew.onipe) p:05Triage→03Normal [13:06:47] (03CR) 10jenkins-bot: [rowiki] Allow sysops to name patrollers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/531956 (https://phabricator.wikimedia.org/T231099) (owner: 10Strainu) [13:09:13] (03PS12) 10Mathew.onipe: lvs: allow access to wdqs lvs on port 8888 [puppet] - 10https://gerrit.wikimedia.org/r/529053 (https://phabricator.wikimedia.org/T176875) [13:09:15] (03PS7) 10Mathew.onipe: elasticsearch: ship logs to local syslog server [puppet] - 10https://gerrit.wikimedia.org/r/531922 (https://phabricator.wikimedia.org/T225125) [13:12:21] (03PS13) 10Mathew.onipe: lvs: allow access to wdqs lvs on port 8888 [puppet] - 10https://gerrit.wikimedia.org/r/529053 (https://phabricator.wikimedia.org/T176875) [13:12:23] (03PS8) 10Mathew.onipe: elasticsearch: ship logs to local syslog server [puppet] - 10https://gerrit.wikimedia.org/r/531922 (https://phabricator.wikimedia.org/T225125) [13:14:23] (03PS1) 10Ema: phabricator::main: whitelist ATS hosts [puppet] - 10https://gerrit.wikimedia.org/r/533009 (https://phabricator.wikimedia.org/T227432) [13:15:16] !log Optimize pc2010 after deleting old rows - T210725 [13:15:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:15:24] T210725: Replace 
parsercache keys to something more meaningful on db-XXXX.php - https://phabricator.wikimedia.org/T210725 [13:17:11] (03PS2) 10Ema: phabricator::main: whitelist ATS hosts [puppet] - 10https://gerrit.wikimedia.org/r/533009 (https://phabricator.wikimedia.org/T227432) [13:18:14] !log Change min_replicas to 3 on s6 for eqiad and codfw T231019 [13:18:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:20] T231019: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019 [13:20:47] !log Change min_replicas to 4 on s8 for eqiad and codfw T231019 [13:20:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:24:59] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [13:25:04] (03CR) 10Ema: [C: 03+2] docker_registry_ha: allow eqiad/codfw varnish/ATS text nodes [puppet] - 10https://gerrit.wikimedia.org/r/532960 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema) [13:25:19] (03CR) 10Ema: [C: 03+2] phabricator::main: whitelist ATS hosts [puppet] - 10https://gerrit.wikimedia.org/r/533009 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema) [13:26:52] (03PS1) 10Zfilipin: group1 wikis to 1.34.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533012 [13:26:54] (03CR) 10Zfilipin: [C: 03+2] group1 wikis to 1.34.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533012 (owner: 10Zfilipin) [13:28:09] (03Merged) 10jenkins-bot: group1 wikis to 1.34.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533012 (owner: 10Zfilipin) [13:28:30] (03CR) 10jenkins-bot: group1 wikis to 1.34.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533012 (owner: 10Zfilipin) [13:29:37] !log zfilipin@deploy1001 rebuilt and synchronized 
wikiversions files: group1 wikis to 1.34.0-wmf.20 [13:29:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:30:33] !log zfilipin@deploy1001 Synchronized php: group1 wikis to 1.34.0-wmf.20 (duration: 00m 55s) [13:30:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:39] PROBLEM - termbox codfw on termbox.svc.codfw.wmnet is CRITICAL: /termbox (get rendered termbox) is CRITICAL: Test get rendered termbox returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service [13:33:38] spike in errors in logs, reverting train [13:34:42] (03PS1) 10Ema: Add discovery CNAME webserver-misc-apps -> miscweb1001 [dns] - 10https://gerrit.wikimedia.org/r/533014 (https://phabricator.wikimedia.org/T210411) [13:36:15] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [13:38:40] !log zfilipin@deploy1001 rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.34.0-wmf.20" [13:38:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:38:49] RECOVERY - termbox codfw on termbox.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service [13:39:25] 10Operations, 10DNS, 10Domains, 10Traffic: Could not reach wikipedia from domain wikipedia.fi - https://phabricator.wikimedia.org/T230470 (10Zache) seems to work, thanks [13:40:09] (03PS1) 10Zfilipin: Revert "group1 wikis to 1.34.0-wmf.20" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533017 [13:40:11] (03CR) 10Zfilipin: [C: 03+2] Revert "group1 wikis to 
1.34.0-wmf.20" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533017 (owner: 10Zfilipin) [13:40:53] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [13:41:15] (03PS3) 10Ema: ATS: temporarily use plain HTTP to access docker-registry [puppet] - 10https://gerrit.wikimedia.org/r/532953 (https://phabricator.wikimedia.org/T227432) [13:42:10] (03PS2) 10Dzahn: Add discovery CNAME webserver-misc-apps -> miscweb1001 [dns] - 10https://gerrit.wikimedia.org/r/533014 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema) [13:42:39] (03CR) 10Dzahn: [C: 03+2] Add discovery CNAME webserver-misc-apps -> miscweb1001 [dns] - 10https://gerrit.wikimedia.org/r/533014 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema) [13:43:19] (03CR) 10Ema: [C: 03+2] ATS: temporarily use plain HTTP to access docker-registry [puppet] - 10https://gerrit.wikimedia.org/r/532953 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema) [13:43:25] (03Merged) 10jenkins-bot: Revert "group1 wikis to 1.34.0-wmf.20" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533017 (owner: 10Zfilipin) [13:43:45] (03CR) 10jenkins-bot: Revert "group1 wikis to 1.34.0-wmf.20" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533017 (owner: 10Zfilipin) [13:47:17] (03CR) 10CDanis: [C: 03+2] swiftrepl: bring close to as-is in production [software] - 10https://gerrit.wikimedia.org/r/532793 (https://phabricator.wikimedia.org/T231110) (owner: 10CDanis) [13:47:29] (03CR) 10Ema: ATS/varnish: replace krypton with miscweb1001, rename director (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/532695 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn) 
[13:47:48] (03Merged) 10jenkins-bot: swiftrepl: bring close to as-is in production [software] - 10https://gerrit.wikimedia.org/r/532793 (https://phabricator.wikimedia.org/T231110) (owner: 10CDanis) [13:48:41] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler={proxy:fcgi://127.0.0.1:9000,proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluste [13:48:41] ethod=GET [13:49:24] 10Operations, 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10Release-Engineering-Team-TODO (201907): contint1001: DISK WARNING - free space: /srv 88397 MB (10% inode=94%): - https://phabricator.wikimedia.org/T219850 (10hashar) 05Open→03Resolved Some of the jobs (`mediawiki-quibble-*`, F... [13:49:27] 10Operations, 10Continuous-Integration-Infrastructure, 10serviceops, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO (201907): contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (10hashar) [13:51:12] (03PS13) 10Jhedden: openstack: Add codfw1dev nova API and metadata to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907) [13:58:03] !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be [13:58:05] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [13:58:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:59:13] !log cp1075 ats-be repooled to resume testing T228629 [13:59:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:59:19] T228629: ATS Backends: Test live cache_text traffic - https://phabricator.wikimedia.org/T228629 [13:59:56] (03PS1) 10Mathew.onipe: Reindex commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533023 (https://phabricator.wikimedia.org/T231446) [14:01:10] (03PS2) 10Mathew.onipe: Reshard commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533023 (https://phabricator.wikimedia.org/T231446) [14:02:12] 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui) [14:02:55] 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui) [14:03:25] 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (10Marostegui) [14:03:39] (03PS1) 10Dzahn: ATS/varnish: add director miscweb and switch racktables to it [puppet] - 10https://gerrit.wikimedia.org/r/533024 (https://phabricator.wikimedia.org/T224247) [14:04:24] (03CR) 10Dzahn: ATS/varnish: replace krypton with miscweb1001, rename director (031 comment) [puppet] - 
10https://gerrit.wikimedia.org/r/532695 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn) [14:05:20] (03CR) 10Dzahn: ATS/varnish: add director miscweb and switch racktables to it (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/533024 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn) [14:05:53] (03CR) 10Dzahn: ATS/varnish: add director miscweb and switch racktables to it (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/533024 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn) [14:07:26] (03CR) 10Ema: ATS/varnish: add director miscweb and switch racktables to it (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/533024 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn) [14:09:18] (03PS2) 10Dzahn: ATS/varnish: add director miscweb and switch racktables to it [puppet] - 10https://gerrit.wikimedia.org/r/533024 (https://phabricator.wikimedia.org/T224247) [14:09:20] (03CR) 10Dzahn: ATS/varnish: add director miscweb and switch racktables to it (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/533024 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn) [14:13:11] (03PS14) 10Jhedden: openstack: Add codfw1dev nova API and metadata to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907) [14:17:14] (03PS1) 10Ema: restbase: TLS termination with envoy [puppet] - 10https://gerrit.wikimedia.org/r/533028 (https://phabricator.wikimedia.org/T210411) [14:18:48] (03CR) 10DCausse: Reshard commonswiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533023 (https://phabricator.wikimedia.org/T231446) (owner: 10Mathew.onipe) [14:19:07] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/18091/" [puppet] - 10https://gerrit.wikimedia.org/r/533024 (https://phabricator.wikimedia.org/T224247) (owner: 10Dzahn) [14:19:18] (03PS15) 10Jhedden: openstack: Add codfw1dev nova API and metadata to haproxy [puppet] - 
10https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907) [14:20:45] (03CR) 10DCausse: Reshard commonswiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533023 (https://phabricator.wikimedia.org/T231446) (owner: 10Mathew.onipe) [14:22:40] jouncebot: now [14:22:40] For the next 0 hour(s) and 37 minute(s): MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190828T1300) [14:22:45] jouncebot: next [14:22:45] In 1 hour(s) and 37 minute(s): Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190828T1600) [14:23:48] (03CR) 10Ema: "pcc seems fine https://puppet-compiler.wmflabs.org/compiler1002/18092/" [puppet] - 10https://gerrit.wikimedia.org/r/533028 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema) [14:27:43] (03PS1) 10Strainu: [rowiki] Allow sysops to remove patrollers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533029 (https://phabricator.wikimedia.org/T231099) [14:35:12] !log racktables - down for maintenance [14:35:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:36:04] (03PS16) 10Jhedden: openstack: Add codfw1dev nova API and metadata to haproxy [puppet] - 10https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907) [14:40:00] (03PS3) 10Mathew.onipe: Reshard commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533023 (https://phabricator.wikimedia.org/T231446) [14:40:50] (03CR) 10Mathew.onipe: Reshard commonswiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533023 (https://phabricator.wikimedia.org/T231446) (owner: 10Mathew.onipe) [14:44:22] (03PS1) 10CDanis: puppetmaster: restore proxing of legacy certificate API [puppet] - 10https://gerrit.wikimedia.org/r/533035 (https://phabricator.wikimedia.org/T231423) [14:46:18] (03PS1) 10Ottomata: Restore proxy rule for legacy /certificate.* API to puppet master [puppet] - 
10https://gerrit.wikimedia.org/r/533037 (https://phabricator.wikimedia.org/T231423) [14:46:37] (03PS2) 10CDanis: puppetmaster: restore proxing of legacy certificate API [puppet] - 10https://gerrit.wikimedia.org/r/533035 (https://phabricator.wikimedia.org/T231423) [14:46:47] hah [14:46:48] ok! [14:46:56] (03PS42) 10CRusnov: profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) [14:47:50] (03Abandoned) 10Ottomata: Restore proxy rule for legacy /certificate.* API to puppet master [puppet] - 10https://gerrit.wikimedia.org/r/533037 (https://phabricator.wikimedia.org/T231423) (owner: 10Ottomata) [14:47:56] (03CR) 10Ottomata: [C: 03+1] puppetmaster: restore proxing of legacy certificate API [puppet] - 10https://gerrit.wikimedia.org/r/533035 (https://phabricator.wikimedia.org/T231423) (owner: 10CDanis) [14:48:16] (03CR) 10CDanis: [C: 03+2] puppetmaster: restore proxing of legacy certificate API [puppet] - 10https://gerrit.wikimedia.org/r/533035 (https://phabricator.wikimedia.org/T231423) (owner: 10CDanis) [14:48:58] (03CR) 10jerkins-bot: [V: 04-1] profile::netbox: Reorganize for splitting front and back-end. [puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov) [14:53:30] 10Operations, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team-TODO (201907): contint1001: DISK WARNING - free space: /srv 88397 MB (10% inode=94%): - https://phabricator.wikimedia.org/T219850 (10hashar) Once clenanup has completed: ` $ df -h /srv Filesystem Size Us... 
[14:59:57] (03PS2) 10Ema: restbase: TLS termination with envoy [puppet] - 10https://gerrit.wikimedia.org/r/533028 (https://phabricator.wikimedia.org/T210411) [14:59:59] (03PS1) 10Ema: Add discovery hostname to docker-registry certificate [puppet] - 10https://gerrit.wikimedia.org/r/533039 (https://phabricator.wikimedia.org/T210411) [15:03:31] jouncebot: now [15:03:32] No deployments scheduled for the next 0 hour(s) and 56 minute(s) [15:06:59] (03PS2) 10Ema: Add discovery hostname to docker-registry certificate [puppet] - 10https://gerrit.wikimedia.org/r/533039 (https://phabricator.wikimedia.org/T210411) [15:07:48] (03CR) 10Ema: [C: 03+2] Add discovery hostname to docker-registry certificate [puppet] - 10https://gerrit.wikimedia.org/r/533039 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema) [15:09:12] (03CR) 10Volans: "@holger, I should be able to do another pass tomorrow, sorry for the delay." [software/cassandra-table-properties] - 10https://gerrit.wikimedia.org/r/524921 (https://phabricator.wikimedia.org/T220246) (owner: 10Holger Knust) [15:15:42] !log restart puppetdb on compiler1002.puppet-diffs.eqiad.wmflabs [15:15:51] 10Operations, 10netbox, 10Patch-For-Review: Setup Swift Storage for Netbox image (was: netbox won't allow me to upload photos of the rack) - https://phabricator.wikimedia.org/T209182 (10fgiunchedi) Had a chat with @crusnov about swift replication and backups, to summarize: 1. We can use swift container sync... [15:15:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:00] (03PS43) 10CRusnov: profile::netbox: Reorganize for splitting front and back-end. 
[puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) [15:24:26] (03PS1) 10Ema: Revert "ATS: temporarily use plain HTTP to access docker-registry" [puppet] - 10https://gerrit.wikimedia.org/r/533041 (https://phabricator.wikimedia.org/T227432) [15:29:13] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1004 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [15:29:14] (03CR) 10Jhedden: "PCC results https://puppet-compiler.wmflabs.org/compiler1002/18098/" [puppet] - 10https://gerrit.wikimedia.org/r/530580 (https://phabricator.wikimedia.org/T223907) (owner: 10Jhedden) [15:31:30] 10Operations, 10Traffic, 10serviceops, 10Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (10Dzahn) [15:32:54] (03PS2) 10Herron: eventgate-main: add new kafka-main brokers to broker list [deployment-charts] - 10https://gerrit.wikimedia.org/r/529428 (https://phabricator.wikimedia.org/T225005) [15:33:24] (03CR) 10Ottomata: [C: 03+1] eventgate-main: add new kafka-main brokers to broker list [deployment-charts] - 10https://gerrit.wikimedia.org/r/529428 (https://phabricator.wikimedia.org/T225005) (owner: 10Herron) [15:38:40] 10Operations, 10netbox, 10Patch-For-Review: Setup Swift Storage for Netbox image (was: netbox won't allow me to upload photos of the rack) - https://phabricator.wikimedia.org/T209182 (10Volans) Thanks for the update. For the replication I guess one-way it's ok, we don't plan to have a/a backend in Netbox an... 
[15:43:04] (03PS3) 10CDanis: swiftrepl: log on replications [software] - 10https://gerrit.wikimedia.org/r/531964 (https://phabricator.wikimedia.org/T231110) [15:55:34] PROBLEM - nova-compute proc maximum on cloudvirt1024 is CRITICAL: connect to address 10.64.20.43 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [15:55:47] PROBLEM - nova-compute proc minimum on cloudvirt1024 is CRITICAL: connect to address 10.64.20.43 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [15:56:21] PROBLEM - ensure kvm processes are running on cloudvirt1024 is CRITICAL: connect to address 10.64.20.43 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [15:58:39] (03CR) 10CDanis: [C: 03+2] swiftrepl: log on replications [software] - 10https://gerrit.wikimedia.org/r/531964 (https://phabricator.wikimedia.org/T231110) (owner: 10CDanis) [15:59:07] (03Merged) 10jenkins-bot: swiftrepl: log on replications [software] - 10https://gerrit.wikimedia.org/r/531964 (https://phabricator.wikimedia.org/T231110) (owner: 10CDanis) [16:00:04] MaxSem, RoanKattouw, Niharika, and Urbanecm: #bothumor My software never has bugs. It just develops random features. Rise for Morning SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190828T1600). [16:00:04] No GERRIT patches in the queue for this window AFAICS. 
[16:01:44] !log contint2001: upgraded Debian packages / Jenkins [16:01:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:03:02] jouncebot: now [16:03:02] For the next 0 hour(s) and 56 minute(s): Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190828T1600) [16:03:18] !log imported new jenkins package to thirdparty/ci stretch-wikimedia [16:03:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:03:29] I am going to upgrade the CI Jenkins on contint1001 [16:03:41] hashar: swat is empty, btw [16:03:46] great! [16:06:32] !log upgrading Jenkins on contint1001 [16:06:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:10:43] (03CR) 10Cwhite: [C: 03+2] profile, varnishkafka: remove logster cron entries from varnishkafka hosts [puppet] - 10https://gerrit.wikimedia.org/r/531730 (https://phabricator.wikimedia.org/T229357) (owner: 10Cwhite) [16:10:50] (03PS2) 10Cwhite: profile, varnishkafka: remove logster cron entries from varnishkafka hosts [puppet] - 10https://gerrit.wikimedia.org/r/531730 (https://phabricator.wikimedia.org/T229357) [16:30:25] the mediawiki exceptions alert is real btw, anyone looking ? [16:32:05] looks like it might be this [16:32:06] mostly from scandium? [16:32:06] RuntimeException from line 449 of /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/services/ServiceContainer.php: Circular dependency when creating service! MobileFrontend.AMC.UserMode -> MobileFrontend.AMC.Manager -> Mobi [16:32:13] it doesn't show up on mediawiki-new-errors console [16:32:20] That's a known issue [16:32:39] That.. was supposedly fixed [16:32:39] https://phabricator.wikimedia.org/T231014 [16:33:55] ah, mhhh ok I'll keep digging, I see elevated 5xx too starting ~15.15 [16:34:15] it looks like it is that godog [16:34:18] it correlates very well [16:34:52] yeah that's likely it! 
[16:36:10] I'll update the task [16:36:23] The other is https://phabricator.wikimedia.org/T231456 [16:38:17] thanks Reedy cdanis <3 [16:41:03] Reedy: It was fixed in wmf.20 because it was marked as a wmf.20 blocker. [16:41:18] but not backported? [16:41:18] I didn't follow-up and so didn't notice that the error was reported against wmf.19. [16:41:24] heh, fair enough [16:41:28] No, doing so now. [16:41:36] though, the stack trace in the bug does say .19 :P [16:41:55] Sure, but I didn't read the task because the Web team had dealt with it. [16:46:13] PROBLEM - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports [16:46:39] (03CR) 10Filippo Giunchedi: [C: 04-1] "Thanks all for the feedback! I brought this up at the infra-foundations team meeting today and while it seemed like a easy win on paper it" [puppet] - 10https://gerrit.wikimedia.org/r/528462 (https://phabricator.wikimedia.org/T228379) (owner: 10Filippo Giunchedi) [16:46:50] (03Abandoned) 10Filippo Giunchedi: monitoring::host: rename critical to paging [puppet] - 10https://gerrit.wikimedia.org/r/528462 (https://phabricator.wikimedia.org/T228379) (owner: 10Filippo Giunchedi) [16:47:05] (03Abandoned) 10Filippo Giunchedi: monitoring::service rename critical to paging [puppet] - 10https://gerrit.wikimedia.org/r/528463 (https://phabricator.wikimedia.org/T228379) (owner: 10Filippo Giunchedi) [16:47:13] (03PS5) 10Jforrester: Stop setting wgGraphIsTrusted (no longer used) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/522536 [16:47:32] (03CR) 10Jforrester: [C: 03+2] Stop setting wgGraphIsTrusted (no longer used) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/522536 (owner: 10Jforrester) [16:50:23] (03Merged) 10jenkins-bot: Stop setting wgGraphIsTrusted (no longer used) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/522536 (owner: 10Jforrester) [16:50:43] (03CR) 10jenkins-bot: Stop setting 
wgGraphIsTrusted (no longer used) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/522536 (owner: 10Jforrester) [16:51:47] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Stop setting wgGraphIsTrusted (no longer used) (duration: 00m 56s) [16:51:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:54:54] Hmm, I note that all the MobileFrontend errors are on HHVM and none on PHP7; are we accidentally/intentionally not pushing any PHP7 traffic on the mobile domains? [16:55:09] Or does the bug magically not occur on PHP7? [16:55:20] 10Operations, 10WMF-Legal, 10serviceops: Move old transparency report pages to historical URLs and setup redirect - https://phabricator.wikimedia.org/T230638 (10Varnent) >>! In T230638#5443831, @BBlack wrote: > @Varnent: For the redirects: just the main https://transparency.wikimedia.org/ URL? Or also the s... [17:05:25] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 35.71% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [17:06:13] !log jforrester@deploy1001 Synchronized php-1.34.0-wmf.19/extensions/MobileFrontend/includes: T231014 Postpone call to MobileContext::shouldDisplayMobileView() (duration: 00m 55s) [17:06:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:06:29] T231014: ServiceContainer.php: Circular dependency when creating MobileFrontend service "AMC.UserMode > AMC.Manager > FeaturesManager > UserModes > AMC.UserMode" - https://phabricator.wikimedia.org/T231014 [17:12:37] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1004 is OK: OK: Less than 70.00% above the threshold [25.0] https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen 
[17:12:47] James_F, Reedy - yeah, we fixed the issue and it went live with 1.34-wmf.20. We thought that is pretty rare thing so we didn't backport it to wmf.19. Sorry for the trouble [17:13:31] raynor: It was rare because no-one was using AMC much. Oh well. :-) [17:14:17] we had some users, but not much. Not backporting it to wmf.19 is my fault. I should thing about that before enabling the Outreach modal [17:14:27] now we expect much more AMC users, that's why this error started to happen much more often [17:14:29] James_F: Are you planning on deploying the backported Kartographer fix? [17:14:35] !log jforrester@deploy1001 Synchronized php-1.34.0-wmf.20/extensions/Kartographer/includes/ApiQueryMapData.php: T231453 Fix array access as object (duration: 00m 54s) [17:14:39] aha :) [17:14:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:14:43] T231453: ApiQueryMapData.php: PHP Notice: Cannot access property on non-object - https://phabricator.wikimedia.org/T231453 [17:14:47] thanks! [17:15:03] James_F, is there anything what I can do to help you with the issue? [17:17:28] mdholloway: :-) [17:17:38] raynor: No, it looks all fixed. [17:17:43] (He says, optimistically.) [17:19:27] ok. thanks [17:20:15] PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is CRITICAL: 51.24 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [17:20:36] (I'm fake rolling the train to group1 on mwdebug1002.) [17:23:23] RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is OK: (C)60 le (W)70 le 73.28 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [17:24:41] OK, I'm clear on prod. 
[17:36:57] RECOVERY - Check systemd state on notebook1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:42:49] !log re-enable both sides of the reline link between knams and esams - T230448 [17:42:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:42:56] T230448: Aug 28th: turn off 1/3 esams-knams lasers in advance of Relined PA-988002 maintenance - https://phabricator.wikimedia.org/T230448 [17:46:59] 10Operations, 10Traffic, 10netops: Aug 28th: turn off 1/3 esams-knams lasers in advance of Relined PA-988002 maintenance - https://phabricator.wikimedia.org/T230448 (10ayounsi) 05Open→03Resolved Confirmed working. [18:02:33] I don't see any deployments scheduled for now, so I plan to move the train forward, since it's unblocked now. Please let me know if you think that's not a good idea :) [18:17:15] 10Operations, 10ops-eqiad, 10DC-Ops: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC) - https://phabricator.wikimedia.org/T227541 (10RobH) a:05RobH→03None Removing myself as assignee since this has all the servers populated in the task description. mw and thumbor hosts @joe stated he would add a followup c... 
[18:20:14] nobody complained, so moving the train forward [18:21:16] (03PS1) 10Zfilipin: group1 wikis to 1.34.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533071 [18:21:30] (03CR) 10Zfilipin: [C: 03+2] group1 wikis to 1.34.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533071 (owner: 10Zfilipin) [18:22:51] (03Merged) 10jenkins-bot: group1 wikis to 1.34.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533071 (owner: 10Zfilipin) [18:23:19] (03CR) 10jenkins-bot: group1 wikis to 1.34.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533071 (owner: 10Zfilipin) [18:24:26] !log zfilipin@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.20 [18:24:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:25:04] 10Operations, 10Cassandra, 10Core Platform Team Workboards (Clinic Duty Team), 10User-Eevans: puppetize turning off reserved space for cassandra /srv - https://phabricator.wikimedia.org/T132632 (10Eevans) a:03Eevans [18:25:09] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [18:25:20] !log zfilipin@deploy1001 Synchronized php: group1 wikis to 1.34.0-wmf.20 (duration: 00m 53s) [18:25:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:36] 10Operations, 10ops-eqiad, 10DC-Ops: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 (10RobH) [18:27:39] PROBLEM - termbox codfw on termbox.svc.codfw.wmnet is CRITICAL: /termbox (get rendered termbox) is CRITICAL: Test get rendered termbox returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service [18:28:32] 10Operations, 10ops-eqiad, 10DC-Ops: 
b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 (10RobH) [18:30:16] 10Operations, 10ops-eqiad, 10DC-Ops: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 (10RobH) [18:30:33] 10Operations, 10ops-eqiad, 10DC-Ops: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 (10RobH) a:05RobH→03None [18:30:42] 10Operations, 10ops-eqiad, 10DC-Ops: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 (10RobH) p:05Triage→03High [18:31:18] 10Operations, 10ops-eqiad, 10DC-Ops: b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC) - https://phabricator.wikimedia.org/T227542 (10RobH) [18:31:27] 10Operations, 10ops-eqiad, 10DC-Ops: b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC) - https://phabricator.wikimedia.org/T227542 (10RobH) p:05Triage→03High [18:32:41] !log rebooting restbase-dev1006 -- T229421 [18:32:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:32:49] T229421: restbase-dev1006: ACPI errors - https://phabricator.wikimedia.org/T229421 [18:35:29] RECOVERY - termbox codfw on termbox.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service [18:39:35] 10Operations, 10ops-eqiad, 10DC-Ops: b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC) - https://phabricator.wikimedia.org/T227542 (10RobH) a:05RobH→03None [18:39:57] 10Operations, 10ops-eqiad, 10DC-Ops: b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC) - https://phabricator.wikimedia.org/T227542 (10RobH) [18:40:06] 10Operations, 10ops-eqiad, 10DC-Ops: b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC) - https://phabricator.wikimedia.org/T227542 (10RobH) [18:40:42] 10Operations, 10ops-eqiad, 10DC-Ops: a3-eqiad pdu refresh - https://phabricator.wikimedia.org/T227139 (10RobH) a:05RobH→03None [18:40:48] 10Operations, 10ops-eqiad, 10DC-Ops: a4-eqiad pdu refresh - 
https://phabricator.wikimedia.org/T227140 (10RobH) a:05RobH→03None [18:40:54] 10Operations, 10ops-eqiad, 10DC-Ops: a5-eqiad pdu refresh - https://phabricator.wikimedia.org/T227141 (10RobH) a:05RobH→03None [18:41:18] 10Operations, 10ops-eqiad, 10DC-Ops: dbproxy1012 and dbprov1001 alerting on PS Redundancy - https://phabricator.wikimedia.org/T228859 (10RobH) a:05RobH→03None [18:42:55] 10Operations, 10ops-eqiad, 10DC-Ops: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC) - https://phabricator.wikimedia.org/T227539 (10RobH) a:05RobH→03None [18:47:46] 10Operations, 10Cassandra, 10Core Platform Team Workboards (Clinic Duty Team): Revisit default settings for c-foreach-restart - https://phabricator.wikimedia.org/T198787 (10Eevans) [18:56:19] * Krinkle poking around at mwdebug1002 [18:57:01] 10Operations, 10netops: Review switches ACL to connect from tools-bastion to dbproxy1018 - https://phabricator.wikimedia.org/T231418 (10ayounsi) `lang=diff [edit firewall family inet filter labs-instance-in4 term labsdb-tcp4 from destination-address] 10.64.37.28/32 { ... } + 10.64.37.27/32; [ed... [18:57:39] !log update cloud firewall policies on cr1/2-eqiad - T231418 [18:57:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:57:45] T231418: Review switches ACL to connect from tools-bastion to dbproxy1018 - https://phabricator.wikimedia.org/T231418 [18:58:18] 10Operations, 10netops: Review switches ACL to connect from tools-bastion to dbproxy1018 - https://phabricator.wikimedia.org/T231418 (10ayounsi) 05Open→03Resolved a:03ayounsi [19:12:59] Krinkle: Still poking? Want to hot-patch T231488 for testing. 
[19:12:59] T231488: UploadFromChunks.php: Call to undefined method MediaWiki\FileBackend\FSFile\TempFSFileFactory::getTempFSFile() - https://phabricator.wikimedia.org/T231488 [19:15:34] !log Live hacking php-1.34.0-wmf.20/includes/upload/UploadFromChunks.php on mwdebug1002 for T231488 [19:15:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:18:45] (03PS1) 10Ayounsi: Add bash script between FNM and notify script [puppet] - 10https://gerrit.wikimedia.org/r/533081 (https://phabricator.wikimedia.org/T226810) [19:24:51] James_F: go ahead, [19:24:56] I've reset the server now [19:27:01] (03CR) 10Ayounsi: "https://puppet-compiler.wmflabs.org/compiler1002/18099/" [puppet] - 10https://gerrit.wikimedia.org/r/533081 (https://phabricator.wikimedia.org/T226810) (owner: 10Ayounsi) [19:34:04] Unfortunately I'm really struggling to replicate the issue, which makes proving that this fixes it hard. [19:34:43] RECOVERY - ElasticSearch shard size check - 9243 on search.svc.eqiad.wmnet is OK: OK - All good! https://wikitech.wikimedia.org/wiki/Search%23If_it_has_been_indexed [19:34:46] Chunked uploads seem to never return from the "assembling" stage in UploadWizard… [19:35:36] 10Operations, 10Cassandra, 10Core Platform Team Workboards (Clinic Duty Team), 10User-Eevans: puppetize turning off reserved space for cassandra /srv - https://phabricator.wikimedia.org/T132632 (10Eevans) Interestingly, reserved space on the main data volumes in the production cluster already have zero res... [19:38:33] I'm going to deploy it as-is. 
[19:39:13] !log jforrester@deploy1001 Synchronized php-1.34.0-wmf.20/includes/upload/UploadFromChunks.php: T231488 Speculatively hot-deploy fix ahead of landing in git (duration: 00m 54s) [19:39:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:39:19] T231488: UploadFromChunks.php: Call to undefined method MediaWiki\FileBackend\FSFile\TempFSFileFactory::getTempFSFile() - https://phabricator.wikimedia.org/T231488 [19:46:10] 10Operations, 10Cassandra, 10Core Platform Team Workboards (Clinic Duty Team): puppetize turning off reserved space for cassandra /srv - https://phabricator.wikimedia.org/T132632 (10Eevans) I do not know how it came to pass that machines are getting setup without reserved space, but given how long this issue... [20:00:04] cscott, arlolra, subbu, bearND, halfak, and accraze: It is that lovely time of the day again! You are hereby commanded to deploy Services – Parsoid / Citoid / Mobileapps / ORES / …. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190828T2000). [20:01:48] (03Abandoned) 10EBernhardson: Improve diff output with sorting [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/442228 (owner: 10EBernhardson) [20:02:05] (03Abandoned) 10Cwhite: admin: admin data and access for Abijeet Patro [puppet] - 10https://gerrit.wikimedia.org/r/529125 (https://phabricator.wikimedia.org/T230020) (owner: 10Cwhite) [20:02:43] (03Abandoned) 10EBernhardson: Install mysql client to mediawiki canary appservers [puppet] - 10https://gerrit.wikimedia.org/r/460451 (owner: 10EBernhardson) [20:12:08] (Prod is clean again,) [20:28:11] (03CR) 10Ayounsi: [C: 03+1] "Reviewed the changes between PS38 and PS43." 
[puppet] - 10https://gerrit.wikimedia.org/r/514395 (https://phabricator.wikimedia.org/T223291) (owner: 10CRusnov) [20:32:11] PROBLEM - Host cloudvirtan1001.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [20:32:59] PROBLEM - Host cloudvirtan1002.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [20:33:09] PROBLEM - Host cloudvirtan1005.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [20:34:49] PROBLEM - Host cloudvirtan1004.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [20:35:55] PROBLEM - Host cloudvirtan1003.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [21:00:12] 10Operations, 10Wikimedia-Mailing-lists, 10Security, 10User-Josve05a: Stop storing Mailman passwords in plain text - https://phabricator.wikimedia.org/T181803 (10Josve05a) [21:00:58] 10Operations, 10Wikimedia-Mailing-lists, 10Privacy, 10Security, 10User-Josve05a: Stop storing Mailman passwords in plain text - https://phabricator.wikimedia.org/T181803 (10Josve05a) [21:12:35] (03CR) 10Jhedden: [C: 03+1] wmcs::nfs: add ipv6 mapped address [puppet] - 10https://gerrit.wikimedia.org/r/531241 (https://phabricator.wikimedia.org/T102099) (owner: 10Jbond) [21:18:34] (03PS1) 10BBlack: Revert "Add TXT verify for brave.com for wikipedia.org" [dns] - 10https://gerrit.wikimedia.org/r/533102 [21:20:51] cmjohnson1, ottomata ^ [21:21:55] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [21:33:17] (03CR) 10BBlack: [C: 03+2] Revert "Add TXT verify for brave.com for wikipedia.org" [dns] - 10https://gerrit.wikimedia.org/r/533102 (owner: 10BBlack) [21:35:27] XioNoX: assuming is due to T225128 [21:35:28] T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. 
- https://phabricator.wikimedia.org/T225128 [21:36:11] RECOVERY - Host cloudvirtan1001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.40 ms [21:36:28] ok! [21:36:59] I see that some have their notifcation muted, and others don't [21:37:38] should they be downtimed for longer? [21:37:59] why use mute notification vs. downtime? (same goes for other hosts in the queue) [21:38:07] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [21:38:13] PROBLEM - Memory correctable errors -EDAC- on thumbor1004 is CRITICAL: 4.001 ge 4 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=thumbor1004&var-datasource=eqiad+prometheus/ops [21:38:29] i downtimed a while ago using decom cookbook [21:38:47] more than a month ago tho [21:38:49] RECOVERY - Host cloudvirtan1004.mgmt is UP: PING OK - Packet loss = 0%, RTA = 5.41 ms [21:39:07] ah, did their downtime end? [21:39:14] maybe jclark-ctr is working on it? [21:39:16] !log Set downtime/ack for showmount on labstore1004 (T229448) [21:39:21] not sure what the cookbook does [21:39:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:39:23] T229448: showmount not working on labstore1004 - https://phabricator.wikimedia.org/T229448 [21:41:17] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [21:42:49] RECOVERY - Host cloudvirtan1002.mgmt is UP: PING OK - Packet loss = 0%, RTA = 2.24 ms [21:42:59] RECOVERY - Host cloudvirtan1005.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms [21:43:34] https://phabricator.wikimedia.org/T225128 cloudvirtan10[1...5] in processes of being moved [21:51:45] jclark-ctr: make sure the hosts (and their mgmt interface) are downtimed in our monitoring before powering them down. I can walk you through icinga if needed [21:56:16] 10Operations, 10Commons, 10MediaWiki-File-management, 10Traffic, and 2 others: Picture from Commons not found from Singapore - https://phabricator.wikimedia.org/T231086 (10CDanis) Fortunately this occurrence seems to be quite rare. On each Swift frontend host, I: * grepped today's logs for GETs that resul... [21:57:29] RECOVERY - Host cloudvirtan1003.mgmt is UP: PING WARNING - Packet loss = 44%, RTA = 126.07 ms [22:14:39] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban, 10netops: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (10Jclark-ctr) Host moved cmjohnson. advised to move out if row B in to 10G racks leave 1 in B ` host... [22:20:48] ACKNOWLEDGEMENT - Check the Netbox report-s- librenms for fail status. on netmon1002 is CRITICAL: librenms.LibreNMS CRITICAL Ayounsi https://phabricator.wikimedia.org/T231502 https://wikitech.wikimedia.org/wiki/Netbox%23Reports [22:30:15] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban, 10netops: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. 
- https://phabricator.wikimedia.org/T225128 (10Jclark-ctr) a:05Jclark-ctr→03Cmjohnson [22:30:49] PROBLEM - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports [22:41:49] PROBLEM - HHVM rendering on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [22:43:15] RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 74727 bytes in 0.168 second response time https://wikitech.wikimedia.org/wiki/Application_servers [23:00:04] MaxSem, RoanKattouw, Niharika, and Urbanecm: #bothumor My software never has bugs. It just develops random features. Rise for Evening SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190828T2300). [23:00:05] No GERRIT patches in the queue for this window AFAICS. [23:08:30] (03PS1) 10Tks4Fish: Add scielo.br to wgCopyUploadsDomains for commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533114 (https://phabricator.wikimedia.org/T231402) [23:14:04] (03CR) 10Urbanecm: [C: 03+2] Add scielo.br to wgCopyUploadsDomains for commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533114 (https://phabricator.wikimedia.org/T231402) (owner: 10Tks4Fish) [23:15:36] (03Merged) 10jenkins-bot: Add scielo.br to wgCopyUploadsDomains for commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533114 (https://phabricator.wikimedia.org/T231402) (owner: 10Tks4Fish) [23:16:18] Tks4Fish: going to sync that patch now [23:16:26] okay, thanks :) [23:16:34] (03CR) 10jenkins-bot: Add scielo.br to wgCopyUploadsDomains for commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/533114 (https://phabricator.wikimedia.org/T231402) (owner: 10Tks4Fish) [23:17:29] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 8bfe43a: Add scielo.br to wgCopyUploadsDomains for commonswiki (T231402) 
(duration: 00m 55s) [23:17:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:17:39] T231402: Add to the wgCopyUploadsDomains whitelist of Wikimedia Commons - https://phabricator.wikimedia.org/T231402 [23:18:03] Tks4Fish: done! Feel free to close the task as resolved now :) [23:18:15] okay, thanks a lot man, really appreciate it [23:18:20] (and cg for your first gerrit contribution) [23:18:32] happy to help! [23:41:30] (03CR) 10Volans: "Some nit and a question inline, looks good otherwise." (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/533081 (https://phabricator.wikimedia.org/T226810) (owner: 10Ayounsi) [23:52:45] 10Operations, 10Traffic: Unexpectedly received mobile version of an article while logged out - https://phabricator.wikimedia.org/T231504 (10Mholloway) [23:56:27] 10Operations, 10Traffic: Unexpectedly received mobile version of an article while logged out - https://phabricator.wikimedia.org/T231504 (10Mholloway)