[00:10:12] (PS1) BryanDavis: Toolforge: "Tool Labs" legacy name updates [puppet] - https://gerrit.wikimedia.org/r/498754
[00:17:03] (PS1) QChris: Add .gitreview [debs/python-git-archive-all] - https://gerrit.wikimedia.org/r/498755
[00:17:05] (CR) QChris: [V: +2 C: +2] Add .gitreview [debs/python-git-archive-all] - https://gerrit.wikimedia.org/r/498755 (owner: QChris)
[01:31:11] PROBLEM - Wikitech and wt-static content in sync on labweb1002 is CRITICAL: wikitech-static CRIT - wikitech and wikitech-static out of sync (200154s 200000s) https://wikitech.wikimedia.org/wiki/Wikitech-static
[01:47:29] PROBLEM - Wikitech and wt-static content in sync on labweb1001 is CRITICAL: wikitech-static CRIT - wikitech and wikitech-static out of sync (200154s 200000s) https://wikitech.wikimedia.org/wiki/Wikitech-static
[01:57:53] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[02:01:39] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.095 second response time https://phabricator.wikimedia.org/T174916
[02:06:55] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[02:09:21] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time https://phabricator.wikimedia.org/T174916
[02:09:35] Operations, Patch-For-Review: Tracking and Reducing cron-spam to root@ - https://phabricator.wikimedia.org/T132324 (GTirloni)
[02:13:23] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[02:14:31] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time https://phabricator.wikimedia.org/T174916
[02:18:33] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
https://phabricator.wikimedia.org/T174916
[02:19:43] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 1.561 second response time https://phabricator.wikimedia.org/T174916
[02:23:43] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[02:28:45] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.523 second response time https://phabricator.wikimedia.org/T174916
[02:32:47] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[02:50:55] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 8.438 second response time https://phabricator.wikimedia.org/T174916
[02:57:27] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[02:58:35] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.090 second response time https://phabricator.wikimedia.org/T174916
[03:03:53] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[03:27:21] PROBLEM - puppet last run on cloudvirt1029 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[03:33:39] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.186 second response time https://phabricator.wikimedia.org/T174916
[03:35:21] (PS5) CRusnov: Netbox module for Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072)
[03:37:33] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[03:39:31] (CR) jerkins-bot: [V: -1] Netbox module for Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072) (owner: CRusnov)
[03:43:09] PROBLEM - puppet last run on authdns1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:46:44] (PS6) CRusnov: Netbox module for Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072)
[03:50:43] (CR) jerkins-bot: [V: -1] Netbox module for Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072) (owner: CRusnov)
[03:58:59] RECOVERY - puppet last run on cloudvirt1029 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[04:02:59] (PS7) CRusnov: Netbox module for Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072)
[04:08:12] (CR) jerkins-bot: [V: -1] Netbox module for Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072) (owner: CRusnov)
[04:09:31] RECOVERY - puppet last run on authdns1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:10:57] (PS8) CRusnov: Netbox module for Spicerack [software/spicerack] - https://gerrit.wikimedia.org/r/493138
(https://phabricator.wikimedia.org/T217072)
[04:21:27] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.210 second response time https://phabricator.wikimedia.org/T174916
[04:25:21] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[04:30:21] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time https://phabricator.wikimedia.org/T174916
[04:31:51] !log restarted pdfrender on scb1003 to try to help flapping
[04:31:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:31:45] RECOVERY - Wikitech and wt-static content in sync on labweb1002 is OK: wikitech-static OK - wikitech and wikitech-static in sync (115848 200000s) https://wikitech.wikimedia.org/wiki/Wikitech-static
[05:48:01] RECOVERY - Wikitech and wt-static content in sync on labweb1001 is OK: wikitech-static OK - wikitech and wikitech-static in sync (115848 200000s) https://wikitech.wikimedia.org/wiki/Wikitech-static
[06:05:46] (PS1) Marostegui: db-eqiad.php: Depool db1118 [mediawiki-config] - https://gerrit.wikimedia.org/r/498766
[06:06:41] (PS2) Marostegui: db-eqiad.php: Depool db1118 [mediawiki-config] - https://gerrit.wikimedia.org/r/498766
[06:06:59] (PS1) Marostegui: check_private_data_report: Add labsdb1012 [puppet] - https://gerrit.wikimedia.org/r/498767
[06:08:01] (CR) Marostegui: [C: +2] check_private_data_report: Add labsdb1012 [puppet] - https://gerrit.wikimedia.org/r/498767 (owner: Marostegui)
[06:08:29] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1118 [mediawiki-config] - https://gerrit.wikimedia.org/r/498766 (owner: Marostegui)
[06:09:41] (Merged) jenkins-bot: db-eqiad.php: Depool db1118 [mediawiki-config] - https://gerrit.wikimedia.org/r/498766 (owner: Marostegui)
[06:09:54] (CR) jenkins-bot: db-eqiad.php: Depool db1118 [mediawiki-config] -
https://gerrit.wikimedia.org/r/498766 (owner: Marostegui)
[06:10:53] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1118 for schema change and upgrade (duration: 00m 54s)
[06:10:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:14:03] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (Marostegui)
[06:17:17] (PS1) Marostegui: install_server: Add db1139,db1140,dbprov200* [puppet] - https://gerrit.wikimedia.org/r/498768 (https://phabricator.wikimedia.org/T218985)
[06:42:27] Operations, ops-eqiad, DBA: db1078 s3 primary DB master BBU pre-failure - https://phabricator.wikimedia.org/T219115 (Marostegui)
[06:42:45] Operations, ops-eqiad, DBA: db1078 s3 primary DB master BBU pre-failure - https://phabricator.wikimedia.org/T219115 (Marostegui) p:Triage→High Setting this to high priority as this is s3 primary database master.
[06:44:53] !log Deploy schema change on s1 codfw master, this will generate lag on codfw
[06:44:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:29] !log Stop MySQL on db1118 for upgrade
[06:45:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:52:29] (PS1) 星耀晨曦: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/498770 (https://phabricator.wikimedia.org/T219113)
[06:53:15] (CR) jerkins-bot: [V: -1] Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/498770 (https://phabricator.wikimedia.org/T219113) (owner: 星耀晨曦)
[06:54:06] (PS1) Vgutierrez: archiva: Switch to the directory based deployment used by acme-chief [puppet] - https://gerrit.wikimedia.org/r/498771 (https://phabricator.wikimedia.org/T207295)
[06:54:50] (PS1) Marostegui: db-eqiad.php: Slowly repool db1118 [mediawiki-config] - https://gerrit.wikimedia.org/r/498772
[06:56:50] (CR) Marostegui: [C: +2] db-eqiad.php: Slowly repool db1118 [mediawiki-config] - https://gerrit.wikimedia.org/r/498772 (owner: Marostegui)
[06:57:58] (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1118 [mediawiki-config] - https://gerrit.wikimedia.org/r/498772 (owner: Marostegui)
[06:59:05] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1118 after mysql upgrade (duration: 00m 50s)
[06:59:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:17] (CR) Vgutierrez: [C: +2] "pcc shows the expected path changes: https://puppet-compiler.wmflabs.org/compiler1002/15308/" [puppet] - https://gerrit.wikimedia.org/r/498771 (https://phabricator.wikimedia.org/T207295) (owner: Vgutierrez)
[07:03:56] (PS1) Meshvogel: db::views: Bring back abuse_filter_history table [puppet] - https://gerrit.wikimedia.org/r/498773 (https://phabricator.wikimedia.org/T123978)
[07:03:58] (CR) Welcome, new contributor!: "Thank
you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [puppet] - https://gerrit.wikimedia.org/r/498773 (https://phabricator.wikimedia.org/T123978) (owner: Meshvogel)
[07:04:16] (CR) jenkins-bot: db-eqiad.php: Slowly repool db1118 [mediawiki-config] - https://gerrit.wikimedia.org/r/498772 (owner: Marostegui)
[07:07:27] (PS2) 星耀晨曦: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/498770 (https://phabricator.wikimedia.org/T219113)
[07:08:08] (PS5) Giuseppe Lavagetto: Direct 0.1% of anonymous users to php7 [mediawiki-config] - https://gerrit.wikimedia.org/r/494756 (https://phabricator.wikimedia.org/T216676)
[07:09:14] (PS1) Vgutierrez: dumps: Switch to the directory based deployment used by acme-chief [puppet] - https://gerrit.wikimedia.org/r/498774 (https://phabricator.wikimedia.org/T207295)
[07:13:03] (CR) Vgutierrez: [C: +2] "everything looks as expected:" [puppet] - https://gerrit.wikimedia.org/r/498774 (https://phabricator.wikimedia.org/T207295) (owner: Vgutierrez)
[07:18:56] (PS1) Marostegui: db-eqiad.php: More traffic to db1118 [mediawiki-config] - https://gerrit.wikimedia.org/r/498775
[07:20:36] (CR) Marostegui: [C: +2] db-eqiad.php: More traffic to db1118 [mediawiki-config] - https://gerrit.wikimedia.org/r/498775 (owner: Marostegui)
[07:22:48] (Merged) jenkins-bot: db-eqiad.php: More traffic to db1118 [mediawiki-config] - https://gerrit.wikimedia.org/r/498775 (owner: Marostegui)
[07:24:02] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1118 (duration: 00m 49s)
[07:24:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:26:26] (CR) jenkins-bot: db-eqiad.php: More traffic to db1118 [mediawiki-config] - https://gerrit.wikimedia.org/r/498775 (owner: Marostegui)
[07:28:10] (CR) Mathew.onipe: elasticsearch: convert check to
py3 (2 comments) [puppet] - https://gerrit.wikimedia.org/r/498292 (https://phabricator.wikimedia.org/T215439) (owner: Mathew.onipe)
[07:28:12] (PS3) Mathew.onipe: elasticsearch: convert check to py3 [puppet] - https://gerrit.wikimedia.org/r/498292 (https://phabricator.wikimedia.org/T215439)
[07:30:46] (PS1) Vgutierrez: openldap: Switch to the directory based deployment used by acme-chief [puppet] - https://gerrit.wikimedia.org/r/498776 (https://phabricator.wikimedia.org/T207295)
[07:34:07] Operations, Operations-Software-Development, Continuous-Integration-Config, Patch-For-Review: Puppet tox: properly lint both Py2 and Py3 files - https://phabricator.wikimedia.org/T184435 (jcrespo) +1 for the migration. Assuming it is easily reversible, we can even try to change it and see what ha...
[07:40:06] !log disable puppet in production openldap servers before merging https://gerrit.wikimedia.org/r/498776
[07:40:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:54] (CR) Vgutierrez: [C: +2] "PCC shows the expected changes in the 5 affected nodes: https://puppet-compiler.wmflabs.org/compiler1002/15309/" [puppet] - https://gerrit.wikimedia.org/r/498776 (https://phabricator.wikimedia.org/T207295) (owner: Vgutierrez)
[07:42:35] (PS2) Muehlenhoff: Add dbus-daemon to filter_services list of debdeploy [puppet] - https://gerrit.wikimedia.org/r/498321 (https://phabricator.wikimedia.org/T135991)
[07:45:26] (PS1) Marostegui: db-eqiad.php: Fully repool db1118 [mediawiki-config] - https://gerrit.wikimedia.org/r/498777
[07:45:43] PROBLEM - Labs LDAP on labtestservices2001 is CRITICAL: Could not bind to the LDAP server https://wikitech.wikimedia.org/wiki/LDAP%23Troubleshooting
[07:46:22] that's me
[07:46:38] (CR) Muehlenhoff: [C: +2] Add dbus-daemon to filter_services list of debdeploy [puppet] - https://gerrit.wikimedia.org/r/498321 (https://phabricator.wikimedia.org/T135991) (owner:
Muehlenhoff)
[07:46:43] apparmor blocking access to the new paths :/
[07:47:58] (PS1) Vgutierrez: Revert "openldap: Switch to the directory based deployment used by acme-chief" [puppet] - https://gerrit.wikimedia.org/r/498778
[07:48:29] (CR) Vgutierrez: [C: +2] Revert "openldap: Switch to the directory based deployment used by acme-chief" [puppet] - https://gerrit.wikimedia.org/r/498778 (owner: Vgutierrez)
[07:48:44] (PS2) Vgutierrez: Revert "openldap: Switch to the directory based deployment used by acme-chief" [puppet] - https://gerrit.wikimedia.org/r/498778
[07:49:05] PROBLEM - puppet last run on labtestservices2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[slapd]
[07:49:54] the joy of tech debt, that host is even up for decom: https://phabricator.wikimedia.org/T218022
[07:53:21] RECOVERY - Labs LDAP on labtestservices2001 is OK: LDAP OK - 0.114 seconds response time https://wikitech.wikimedia.org/wiki/LDAP%23Troubleshooting
[07:54:19] RECOVERY - puppet last run on labtestservices2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:58:21] !log disable puppet and downtime host in icinga for labtestservices2001 - T218022
[07:58:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:24] T218022: decommission: labtestservices2001.wikimedia.org - https://phabricator.wikimedia.org/T218022
[07:59:55] RECOVERY - Disk space on notebook1004 is OK: DISK OK
[08:02:06] (PS1) Vgutierrez: Revert "Revert "openldap: Switch to the directory based deployment used by acme-chief"" [puppet] - https://gerrit.wikimedia.org/r/498779
[08:03:13] (CR) jerkins-bot: [V: -1] Revert "Revert "openldap: Switch to the directory based deployment used by acme-chief"" [puppet] - https://gerrit.wikimedia.org/r/498779 (owner: Vgutierrez)
[08:03:19] (CR) Marostegui: [C: +2] db-eqiad.php: Fully repool db1118
[mediawiki-config] - https://gerrit.wikimedia.org/r/498777 (owner: Marostegui)
[08:03:23] yeah yeah.. the commit message :)
[08:04:03] (PS2) Vgutierrez: Revert "Revert "openldap: Switch to the directory based deployment used by acme-chief"" [puppet] - https://gerrit.wikimedia.org/r/498779
[08:04:30] (Merged) jenkins-bot: db-eqiad.php: Fully repool db1118 [mediawiki-config] - https://gerrit.wikimedia.org/r/498777 (owner: Marostegui)
[08:05:39] (CR) Vgutierrez: [C: +2] Revert "Revert "openldap: Switch to the directory based deployment used by acme-chief"" [puppet] - https://gerrit.wikimedia.org/r/498779 (owner: Vgutierrez)
[08:05:46] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1118 (duration: 00m 49s)
[08:05:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:07:56] (CR) Hashar: [C: +1] "Well done :]" [puppet] - https://gerrit.wikimedia.org/r/498731 (https://phabricator.wikimedia.org/T219085) (owner: Alex Monk)
[08:08:20] !log reenabling puppet in openldap servers
[08:08:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:55] (CR) jenkins-bot: db-eqiad.php: Fully repool db1118 [mediawiki-config] - https://gerrit.wikimedia.org/r/498777 (owner: Marostegui)
[08:17:52] (PS1) Marostegui: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/498780
[08:19:07] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/498780 (owner: Marostegui)
[08:20:12] (Merged) jenkins-bot: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/498780 (owner: Marostegui)
[08:20:48] (CR) jenkins-bot: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - https://gerrit.wikimedia.org/r/498780 (owner: Marostegui)
[08:21:37] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1105:3311
(duration: 00m 49s)
[08:21:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:47] (PS4) Mathew.onipe: elasticsearch: convert check to py3 [puppet] - https://gerrit.wikimedia.org/r/498292 (https://phabricator.wikimedia.org/T215439)
[08:22:12] (PS1) Vgutierrez: lists: Switch to the directory based deployment used by acme-chief [puppet] - https://gerrit.wikimedia.org/r/498781 (https://phabricator.wikimedia.org/T207295)
[08:28:11] Operations, MediaWiki-Cache, MW-1.33-notes (1.33.0-wmf.21; 2019-03-12), Patch-For-Review, and 3 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (elukey) Today I have executed some tests on mw222...
[08:31:22] (PS1) Jcrespo: mariadb: Narrow the search for the treated backup [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/498782 (https://phabricator.wikimedia.org/T206203)
[08:31:45] (CR) jerkins-bot: [V: -1] mariadb: Narrow the search for the treated backup [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/498782 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[08:36:57] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[08:37:31] checking --^
[08:51:04] (PS1) Dzahn: add Icinga notes_url to various NRPE monitor checks, pt 1 [puppet] - https://gerrit.wikimedia.org/r/498784
[08:51:06] (CR) Mathew.onipe: multi-instance for elastic deployment-prep (2 comments) [puppet] - https://gerrit.wikimedia.org/r/497698 (https://phabricator.wikimedia.org/T213940) (owner: Mathew.onipe)
[08:51:24] (PS2) Mathew.onipe: multi-instance for elastic deployment-prep [puppet] - https://gerrit.wikimedia.org/r/497698 (https://phabricator.wikimedia.org/T213940)
[08:53:31] RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational
[08:53:41] (PS4) Dzahn: contint: change `/r/p/` to `/r/` for gerrit links [puppet] - https://gerrit.wikimedia.org/r/498057 (https://phabricator.wikimedia.org/T218844) (owner: MarcoAurelio)
[08:55:05] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/498785
[08:55:33] (PS1) Muehlenhoff: Remove access for melodykramer [puppet] - https://gerrit.wikimedia.org/r/498786
[08:58:43] come on jeeeenkins
[09:03:24] (PS2) Jcrespo: mariadb: Narrow the search for the treated backup [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/498782 (https://phabricator.wikimedia.org/T206203)
[09:03:40] (CR) Marostegui: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/498785 (owner: Marostegui)
[09:04:06] (PS1) Jcrespo: mariadb-backups: Fix big on state updating logic [puppet] - https://gerrit.wikimedia.org/r/498788 (https://phabricator.wikimedia.org/T206203)
[09:07:17] ACKNOWLEDGEMENT - Long running screen/tmux on prometheus2004 is CRITICAL: CRIT: Long running SCREEN process. (user: root PID: 7256, 2060027s 1728000s). Filippo Giunchedi known
[09:07:19] mutante: jenkins broken?
[09:07:36] marostegui: either that or really slow
[09:07:49] I am not seeing it processing any jobs
[09:07:51] https://integration.wikimedia.org/zuul/
[09:08:02] yeah, it is empty
[09:08:44] the zuul service on contint1001 is active (runnin)
[09:09:58] following the wikitech docs to restart it
[09:10:14] !log contint1001 - restarting zuul
[09:10:16] (CR) Marostegui: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/498785 (owner: Marostegui)
[09:10:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:27] i see log activity now
[09:10:45] (CR) Dzahn: [C: +1] "recheck" [puppet] - https://gerrit.wikimedia.org/r/498057 (https://phabricator.wikimedia.org/T218844) (owner: MarcoAurelio)
[09:10:59] yeah, I see your change now being checked
[09:11:03] cool
[09:11:20] did the exact command from https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Restart
[09:11:21] and now mine
[09:12:06] yep, graphs look better too
[09:13:21] (CR) Marostegui: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/498785 (owner: Marostegui)
[09:13:26] (CR) Dzahn: [C: +2] contint: change `/r/p/` to `/r/` for gerrit links [puppet] - https://gerrit.wikimedia.org/r/498057 (https://phabricator.wikimedia.org/T218844) (owner: MarcoAurelio)
[09:13:30] merged the contint change to fix gerrit links (unrelated but also CI)
[09:13:34] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/498785 (owner: Marostegui)
[09:15:07] hashar: hi. fyi just restarted zuul. it had empty queue and after simple restart works again. then merged that change to fix the Gerrit URLs from /r/p/ to /r/
[09:15:13] first thing unrelated to second
[09:15:23] eek
[09:15:46] which change was it?
[09:16:03] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 (duration: 00m 49s)
[09:16:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:19] hashar: https://gerrit.wikimedia.org/r/498057 with your +1
[09:16:22] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/498785 (owner: Marostegui)
[09:16:23] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - https://gerrit.wikimedia.org/r/498785 (owner: Marostegui)
[09:16:56] mutante: ah . Well it has nothing to do with Zuul :]
[09:16:58] (CR) Muehlenhoff: "recheck" [puppet] - https://gerrit.wikimedia.org/r/498786 (owner: Muehlenhoff)
[09:17:10] hashar: i know.. that's why i said it has nothinng to do with it :)
[09:17:46] then I am confused as to why you had to restart Zuul
[09:17:53] then I am still waking up :]
[09:18:36] !log Upgrade db2055
[09:18:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:18:49] (PS1) Marostegui: db-eqiad.php: Depool db1083 [mediawiki-config] - https://gerrit.wikimedia.org/r/498789
[09:19:51] hashar: these are 2 entirely unrelated events. a) zuul did not work, restarting it fixed it. b) after zuul worked again i merged something
[09:20:08] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1083 [mediawiki-config] - https://gerrit.wikimedia.org/r/498789 (owner: Marostegui)
[09:20:27] AH
[09:21:16] (Merged) jenkins-bot: db-eqiad.php: Depool db1083 [mediawiki-config] - https://gerrit.wikimedia.org/r/498789 (owner: Marostegui)
[09:21:48] so either I ruin my morning trying to figure what went wrong with zuul OR ...
I happily ignore that incident since it is resolved
[09:22:22] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1083+ (duration: 00m 49s)
[09:22:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:58] hashar: yes, pick B
[09:23:13] [B]
[09:24:14] (PS2) Muehlenhoff: Remove access for melodykramer [puppet] - https://gerrit.wikimedia.org/r/498786
[09:24:24] !log contint1001: manually compressing Zuul log files sudo -u zuul gzip --best /var/log/zuul/*.log.????-??-??
[09:24:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:32] somehow I thought logrotate was compressing them
[09:25:53] (CR) Muehlenhoff: [C: +2] Remove access for melodykramer [puppet] - https://gerrit.wikimedia.org/r/498786 (owner: Muehlenhoff)
[09:26:14] (CR) jenkins-bot: db-eqiad.php: Depool db1083 [mediawiki-config] - https://gerrit.wikimedia.org/r/498789 (owner: Marostegui)
[09:33:10] Operations, Cloud-VPS, Toolforge, LDAP, and 2 others: LDAP server running out of memory frequently and disrupting Cloud VPS clients - https://phabricator.wikimedia.org/T217280 (hashar) >>! In T217280#5008363, @GTirloni wrote: > Most active talkers to LDAP (in bytes, ~10min packet capture): > >...
[09:34:04] !log Upgrade db2062
[09:34:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:35:00] (CR) Filippo Giunchedi: logstash: send varnish syslogs via kafka logging pipeline (1 comment) [puppet] - https://gerrit.wikimedia.org/r/498467 (https://phabricator.wikimedia.org/T213899) (owner: Herron)
[09:38:07] (CR) Filippo Giunchedi: [C: +1] Switch mjolnir to rsyslog based structured logging [puppet] - https://gerrit.wikimedia.org/r/498232 (https://phabricator.wikimedia.org/T218833) (owner: EBernhardson)
[09:39:47] (CR) Filippo Giunchedi: [C: +1] Fix pinning for smartmontools [puppet] - https://gerrit.wikimedia.org/r/498205 (https://phabricator.wikimedia.org/T216711) (owner: Muehlenhoff)
[09:41:32] (CR) Filippo Giunchedi: [C: +1] "Nice!" [puppet] - https://gerrit.wikimedia.org/r/496776 (https://phabricator.wikimedia.org/T215277) (owner: Jbond)
[09:42:28] !log uploaded openjdk 8u212-b01-1~deb8u1 to apt.wikimedia.org/jessie-wikimedia/main
[09:42:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:46] (CR) DCausse: [C: +1] Enable WBCS on Testcommons too [mediawiki-config] - https://gerrit.wikimedia.org/r/498442 (https://phabricator.wikimedia.org/T218715) (owner: Smalyshev)
[09:42:52] (PS1) Marostegui: db-eqiad.php: Slowly repool db1083 [mediawiki-config] - https://gerrit.wikimedia.org/r/498791
[09:49:11] (CR) Marostegui: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/498791 (owner: Marostegui)
[09:50:06] mutante: looks like zuul is doing the same thing again
[09:50:13] as in, not processing anything
[09:50:22] (CR) Marostegui: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/498791 (owner: Marostegui)
[09:52:09] Although the logs shows lots of activity
[09:55:09] hashar: ^
[09:58:31] i see activity for beta-scap-eqiad
[09:58:34] (PS2) Arturo Borrero Gonzalez: Toolforge: "Tool Labs" legacy name updates
[puppet] - https://gerrit.wikimedia.org/r/498754 (owner: BryanDavis)
[09:58:36] It looks like it has a big pile of things to process as per zuul.w.o (just updated that)
[09:58:42] Operations, MediaWiki-Cache, Performance-Team (Radar), User-Elukey: mcrouter does not remove a memcached shard from consistent hashing when timeouts happen - https://phabricator.wikimedia.org/T208934 (elukey) Very interesting reading about FailoverRoute and deletes: https://github.com/facebook/mc...
[09:58:56] catching up with the backlog that piled up while it was down. i guess
[09:59:48] (CR) Marostegui: [C: +2] db-eqiad.php: Slowly repool db1083 [mediawiki-config] - https://gerrit.wikimedia.org/r/498791 (owner: Marostegui)
[10:00:28] (CR) Arturo Borrero Gonzalez: [C: +2] "Much of this code (toollabs*) is going away soon, but I'm merging anyway." [puppet] - https://gerrit.wikimedia.org/r/498754 (owner: BryanDavis)
[10:01:06] (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1083 [mediawiki-config] - https://gerrit.wikimedia.org/r/498791 (owner: Marostegui)
[10:01:08] (PS1) Muehlenhoff: Add Cumin aliases for Hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/498792
[10:02:18] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1083 (duration: 00m 49s)
[10:02:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:58] (CR) Arturo Borrero Gonzalez: [C: +2] network: labs: remove outdated entries [puppet] - https://gerrit.wikimedia.org/r/498698 (owner: Alex Monk)
[10:06:58] (PS1) Marostegui: db-eqiad.php: More traffic to db1083 [mediawiki-config] - https://gerrit.wikimedia.org/r/498794
[10:07:23] !log installing ntfs-3g security updates
[10:07:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:07:29] (PS3) Arturo Borrero Gonzalez: network: Allow customisation of cumin list on a per-project basis [puppet] -
https://gerrit.wikimedia.org/r/498697 (owner: Alex Monk)
[10:09:41] (CR) Arturo Borrero Gonzalez: [C: +2] "Honestly I don't like the inlined hiera() call. This file has several of them already, so merging anyway." [puppet] - https://gerrit.wikimedia.org/r/498697 (owner: Alex Monk)
[10:09:44] (CR) jenkins-bot: db-eqiad.php: Slowly repool db1083 [mediawiki-config] - https://gerrit.wikimedia.org/r/498791 (owner: Marostegui)
[10:10:37] (CR) Marostegui: [C: +2] db-eqiad.php: More traffic to db1083 [mediawiki-config] - https://gerrit.wikimedia.org/r/498794 (owner: Marostegui)
[10:11:09] (PS2) Arturo Borrero Gonzalez: network: labs: remove outdated entries [puppet] - https://gerrit.wikimedia.org/r/498698 (owner: Alex Monk)
[10:11:41] (Merged) jenkins-bot: db-eqiad.php: More traffic to db1083 [mediawiki-config] - https://gerrit.wikimedia.org/r/498794 (owner: Marostegui)
[10:12:53] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
[10:12:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:14:41] (PS2) Muehlenhoff: Add Cumin aliases for Hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/498792
[10:14:46] (PS2) Arturo Borrero Gonzalez: openstack: Add an instance spread check for deployment-prep [puppet] - https://gerrit.wikimedia.org/r/498699 (https://phabricator.wikimedia.org/T219088) (owner: Alex Monk)
[10:16:34] (PS3) Jcrespo: BackupStatistics: Narrow the search for the treated backup [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/498782 (https://phabricator.wikimedia.org/T206203)
[10:16:37] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 39 probes of 399 (alerts on 35) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[10:16:47] (CR) Arturo Borrero Gonzalez: [C: +2] openstack: Add an
instance spread check for deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/498699 (https://phabricator.wikimedia.org/T219088) (owner: 10Alex Monk) [10:16:58] (03CR) 10jerkins-bot: [V: 04-1] BackupStatistics: Narrow the search for the treated backup [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/498782 (https://phabricator.wikimedia.org/T206203) (owner: 10Jcrespo) [10:17:27] (03PS2) 10Jcrespo: mariadb-backups: Fix big on state updating logic [puppet] - 10https://gerrit.wikimedia.org/r/498788 (https://phabricator.wikimedia.org/T206203) [10:17:49] PROBLEM - puppet last run on cloudvirtan1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[initramfs-tools] [10:17:51] (03PS3) 10Muehlenhoff: Add Cumin aliases for Hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/498792 [10:18:11] (03PS3) 10Jcrespo: mariadb-backups: Fix bug on state updating logic [puppet] - 10https://gerrit.wikimedia.org/r/498788 (https://phabricator.wikimedia.org/T206203) [10:18:52] (03PS4) 10Muehlenhoff: Add Cumin aliases for Hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/498792 [10:19:43] (03PS5) 10Filippo Giunchedi: profile: kafkatee instance for udp2log compat [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) [10:20:38] (03CR) 10jenkins-bot: db-eqiad.php: More traffic to db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498794 (owner: 10Marostegui) [10:21:52] (03PS1) 10Alex Monk: network::constants: Move hiera calls to the parameters [puppet] - 10https://gerrit.wikimedia.org/r/498796 [10:21:53] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 20 probes of 399 (alerts on 35) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts [10:21:57] (03CR) 10Filippo Giunchedi: profile: kafkatee instance for udp2log compat (032 
comments) [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) (owner: 10Filippo Giunchedi) [10:22:26] (03CR) 10Marostegui: [C: 03+1] "Looks good, I haven't actually tested the query itself, but I assume you did :-)" [puppet] - 10https://gerrit.wikimedia.org/r/498788 (https://phabricator.wikimedia.org/T206203) (owner: 10Jcrespo) [10:22:30] (03CR) 10Muehlenhoff: [C: 03+2] Add Cumin aliases for Hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/498792 (owner: 10Muehlenhoff) [10:22:54] (03CR) 10jerkins-bot: [V: 04-1] network::constants: Move hiera calls to the parameters [puppet] - 10https://gerrit.wikimedia.org/r/498796 (owner: 10Alex Monk) [10:23:00] (03PS1) 10Arturo Borrero Gonzalez: Revert "network: Allow customisation of cumin list on a per-project basis" [puppet] - 10https://gerrit.wikimedia.org/r/498797 [10:23:09] (03PS1) 10Marostegui: db-eqiad.php: More traffic to db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498798 [10:23:29] (03CR) 10jerkins-bot: [V: 04-1] Revert "network: Allow customisation of cumin list on a per-project basis" [puppet] - 10https://gerrit.wikimedia.org/r/498797 (owner: 10Arturo Borrero Gonzalez) [10:24:04] (03CR) 10Alex Monk: [C: 04-1] "? that page doesn't explain how to do this?" 
[puppet] - 10https://gerrit.wikimedia.org/r/498797 (owner: 10Arturo Borrero Gonzalez) [10:24:21] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: More traffic to db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498798 (owner: 10Marostegui) [10:24:58] (03CR) 10Alex Monk: [C: 04-1] "Specifically this was needed so $CUMIN_MASTERS in profile::puppetdb would be set correctly" [puppet] - 10https://gerrit.wikimedia.org/r/498797 (owner: 10Arturo Borrero Gonzalez) [10:25:25] (03Merged) 10jenkins-bot: db-eqiad.php: More traffic to db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498798 (owner: 10Marostegui) [10:26:26] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s) [10:26:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:27:02] (03PS2) 10Arturo Borrero Gonzalez: Revert "network: Allow customisation of cumin list on a per-project basis" [puppet] - 10https://gerrit.wikimedia.org/r/498797 [10:27:17] (03CR) 10Alex Monk: [C: 04-1] "per PS1" [puppet] - 10https://gerrit.wikimedia.org/r/498797 (owner: 10Arturo Borrero Gonzalez) [10:27:42] !log installing Java security updates on Hadoop/Druid test cluster [10:27:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:29:48] 10Operations, 10serviceops: SRE FY2019 Q4 goal: complete the transition to PHP7 - https://phabricator.wikimedia.org/T219127 (10Joe) p:05Triage→03Normal [10:30:04] jan_drewniak: #bothumor I � Unicode. All rise for Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190325T1030). 
[10:31:13] 10Operations, 10serviceops: SRE FY2019 Q4 goal: complete the transition to PHP7 - https://phabricator.wikimedia.org/T219127 (10Joe) [10:31:21] (03CR) 10jenkins-bot: db-eqiad.php: More traffic to db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498798 (owner: 10Marostegui) [10:32:23] (03PS2) 10Alex Monk: network::constants: Move hiera calls to the parameters [puppet] - 10https://gerrit.wikimedia.org/r/498796 [10:32:56] (03PS1) 10Marostegui: db-eqiad.php: More traffic to db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498799 [10:33:14] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498800 (https://phabricator.wikimedia.org/T128546) [10:33:25] (03PS19) 10Dzahn: icinga/planet: add generic check_lastmod plugin and check planet updates [puppet] - 10https://gerrit.wikimedia.org/r/472713 (https://phabricator.wikimedia.org/T203208) [10:34:01] (03CR) 10Alex Monk: "I'm dealing with the inline hiera calls in If770c8e5" [puppet] - 10https://gerrit.wikimedia.org/r/498697 (owner: 10Alex Monk) [10:34:17] (03CR) 10Dzahn: "i resolved all of the original comments except one optional one about datetime. 
boldly going ahead" [puppet] - 10https://gerrit.wikimedia.org/r/472713 (https://phabricator.wikimedia.org/T203208) (owner: 10Dzahn) [10:34:19] (03CR) 10Jdrewniak: [C: 03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498800 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:35:25] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498800 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:36:11] 10Operations, 10serviceops, 10Beta-Feature: Remove php7 beta feature - https://phabricator.wikimedia.org/T219128 (10Joe) p:05Triage→03Normal [10:37:41] (03CR) 10Volans: [C: 03+1] "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/498797 (owner: 10Arturo Borrero Gonzalez) [10:37:44] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:498800| Bumping portals to master (T128546)]] (duration: 00m 49s) [10:37:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:37:48] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546 [10:37:55] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: More traffic to db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498799 (owner: 10Marostegui) [10:38:34] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:498800| Bumping portals to master (T128546)]] (duration: 00m 49s) [10:38:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:39:03] (03Merged) 10jenkins-bot: db-eqiad.php: More traffic to db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498799 (owner: 10Marostegui) [10:40:00] !log disable deprecation warnings on elasticsearch eqiad - T218994 [10:40:01] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s) [10:40:01] 10Operations, 
10CirrusSearch, 10Discovery-Search, 10Patch-For-Review: Deprecation warning on elasticsearch 6 expected [retry_on_conflict] - https://phabricator.wikimedia.org/T218994 (10Gehel) >>! In T218994#5048792, @EBernhardson wrote: > I suspect the following would mute the error messages: > ` > curl -X... [10:40:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:40:03] T218994: Deprecation warning on elasticsearch 6 expected [retry_on_conflict] - https://phabricator.wikimedia.org/T218994 [10:40:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:40:24] (03PS1) 10Ema: trafficserver (8.0.3-1wm1) stretch-wikimedia; urgency=medium [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/498801 [10:40:33] 10Operations, 10Traffic, 10serviceops: Allow directing a percentage of API traffic to PHP7 - https://phabricator.wikimedia.org/T219129 (10Joe) [10:40:50] (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498802 [10:41:43] (03CR) 10Alex Monk: [C: 04-1] "We have PuppetDB in deployment-prep. The quickest and most easily maintainable solution was this. 
If you revert it we will have to maintai" [puppet] - 10https://gerrit.wikimedia.org/r/498797 (owner: 10Arturo Borrero Gonzalez) [10:42:03] (03CR) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498800 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:42:05] (03CR) 10jenkins-bot: db-eqiad.php: More traffic to db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498799 (owner: 10Marostegui) [10:44:11] RECOVERY - puppet last run on cloudvirtan1003 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [10:48:05] (03PS20) 10Dzahn: icinga/planet: add generic check_lastmod plugin and check planet updates [puppet] - 10https://gerrit.wikimedia.org/r/472713 (https://phabricator.wikimedia.org/T203208) [10:51:48] (03CR) 10Dzahn: [V: 03+1 C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/15311/icinga1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/472713 (https://phabricator.wikimedia.org/T203208) (owner: 10Dzahn) [10:52:25] (03PS21) 10Dzahn: icinga/planet: add generic check_lastmod plugin and check planet updates [puppet] - 10https://gerrit.wikimedia.org/r/472713 (https://phabricator.wikimedia.org/T203208) [10:54:47] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Fully repool db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498802 (owner: 10Marostegui) [10:55:55] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498802 (owner: 10Marostegui) [10:56:54] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1083 (duration: 00m 48s) [10:56:54] (03CR) 10Alex Monk: [C: 04-1] "(We were attempting to minimise cherry-picks on the puppetmaster (see T135427 among other tasks) but code-reviews like this have been maki" [puppet] - 10https://gerrit.wikimedia.org/r/498797 (owner: 10Arturo Borrero Gonzalez) [10:56:55] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:59:45] (03CR) 10Dzahn: "yep, this will lead to more cherry-picks and more "labs is not like prod" :/" [puppet] - 10https://gerrit.wikimedia.org/r/498797 (owner: 10Arturo Borrero Gonzalez) [11:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for European Mid-day SWAT(Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190325T1100). [11:00:05] Lucas_WMDE, Daimona, and _joe_: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:07] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "While I think I understand the problem you're trying to solve, this patch doesn't work in general. It worked on the patch you've tried it " (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498731 (https://phabricator.wikimedia.org/T219085) (owner: 10Alex Monk) [11:00:13] o/ [11:00:14] o/ [11:00:17] <_joe_> o/ [11:00:41] Lucas_WMDE, Daimona, and _joe_ if any of you are deployers, go ahead with your changes, while I get ready [11:00:52] I'm not [11:01:05] o/ [11:01:05] (03CR) 10Volans: profile: kafkatee instance for udp2log compat (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) (owner: 10Filippo Giunchedi) [11:01:09] Lucas_WMDE, _joe_: are you deployers? [11:01:10] <_joe_> I am, I will merge the first of my changes if Lucas_WMDE is not around [11:01:11] I’m a deployer [11:01:18] <_joe_> ok then go on [11:01:21] <_joe_> :) [11:01:33] <_joe_> zeljkof: I will merge the second of my patches a bit later [11:01:46] <_joe_> after everything else is deployed [11:01:46] Lucas_WMDE, _joe_: anybody wants to deploy Daimona's patches? 
[11:02:00] <_joe_> zeljkof: I don't really, I've my hands full [11:02:20] _joe_: ok [11:02:22] (03CR) 10Dzahn: [C: 04-1] Gerrit: Support switching ldap servers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/494811 (owner: 10Paladox) [11:02:31] Lucas_WMDE: can you deploy Daimona's patches? [11:02:58] I think I can, yeah [11:03:12] Lucas_WMDE: great, thanks [11:03:19] btw it looks like my patch will take a while to go through Zuul (backport on Wikibase) [11:03:19] in that case, swat is taken care of [11:03:30] I'm around if anybody needs me [11:03:31] if anyone else wants to go first…? (or is that verboten, having multiple SWAT things in Zuul?) [11:03:42] _joe_ ^ [11:03:44] <_joe_> Lucas_WMDE: no I can wait :) [11:03:48] ok :) [11:03:57] <_joe_> and yes, I'd prefer to keep a clean rebase tree [11:04:03] I think it's ok to deploy a config change while a backport is in CI [11:04:27] I can also use the time to look at Daimona’s patches to see if I’m actually comfortable deploying them ^^ [11:04:33] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1083 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498802 (owner: 10Marostegui) [11:04:38] _joe_: are config and core/extension/skin patches connected? [11:04:57] meaning, while a backport is in CI, is it ok to deploy a config change? 
[11:05:01] Lucas_WMDE cool, let me know :-) [11:05:16] <_joe_> zeljkof: not really, right [11:05:17] since a config change takes a minute and a backport ten or so [11:05:28] <_joe_> ok, lemme do my first change then [11:05:34] Just please ping me, in the meanwhile I'm tweaking the blackout on itwp [11:06:11] 2× ack [11:06:17] (03PS5) 10Giuseppe Lavagetto: Use the local proxy for search under php7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494755 (https://phabricator.wikimedia.org/T215491) [11:06:28] (03CR) 10Giuseppe Lavagetto: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494755 (https://phabricator.wikimedia.org/T215491) (owner: 10Giuseppe Lavagetto) [11:06:31] (03CR) 10Alex Monk: wmf_style lint: Show new errors that already appear on other lines (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498731 (https://phabricator.wikimedia.org/T219085) (owner: 10Alex Monk) [11:07:06] (03CR) 10Ema: [C: 03+2] trafficserver (8.0.3-1wm1) stretch-wikimedia; urgency=medium [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/498801 (owner: 10Ema) [11:07:33] (03Merged) 10jenkins-bot: Use the local proxy for search under php7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494755 (https://phabricator.wikimedia.org/T215491) (owner: 10Giuseppe Lavagetto) [11:07:41] PROBLEM - Check correctness of the icinga configuration on icinga1001 is CRITICAL: Icinga configuration contains errors https://wikitech.wikimedia.org/wiki/Icinga [11:07:52] <_joe_> uh can someone look? 
^^ [11:08:31] _joe_: Service check command 'check_lastmod' specified in service 'check updates on en.planet.wikimedia.org' not defined [11:08:36] cc mutante [11:08:58] (03CR) 10Arturo Borrero Gonzalez: "Perhaps the hiera calls should be on a profile, something like:" [puppet] - 10https://gerrit.wikimedia.org/r/498796 (owner: 10Alex Monk) [11:09:01] <_joe_> ok deploying [11:09:41] !log trafficserver 8.0.3-1wm1 uploaded to stretch-wikimedia [11:09:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:07] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] Enable logging of private filters on commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497236 (https://phabricator.wikimedia.org/T218527) (owner: 10Ammarpad) [11:10:30] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] Remove $wgAbuseFilterProfile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486280 (https://phabricator.wikimedia.org/T191039) (owner: 10Daimona Eaytoy) [11:10:31] ACKNOWLEDGEMENT - Check correctness of the icinga configuration on icinga1001 is CRITICAL: Icinga configuration contains errors daniel_zahn WIP https://wikitech.wikimedia.org/wiki/Icinga [11:10:41] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] Remove $wgAbuseFilterRuntimeProfile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486470 (https://phabricator.wikimedia.org/T191039) (owner: 10Daimona Eaytoy) [11:11:02] (03CR) 10Alex Monk: "We'd have to replace the network::constants includes all over the codebase and I'm not sure you can just include profiles inside random cl" [puppet] - 10https://gerrit.wikimedia.org/r/498796 (owner: 10Alex Monk) [11:11:13] _joe_: volans: yes, i am fixing it, it's from a new check [11:13:47] <_joe_> ok, the change looks good on mwdebug1002 [11:13:53] <_joe_> deploying everywhere [11:15:31] (03CR) 10jenkins-bot: Use the local proxy for search under php7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494755 (https://phabricator.wikimedia.org/T215491) (owner: 
10Giuseppe Lavagetto) [11:16:19] !log oblivian@deploy1001 Synchronized wmf-config/LabsServices.php: switching to use of the local proxy for search in php7 (duration: 00m 50s) [11:16:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:16:53] btw, is there a convenient way to look up the person behind a (production) shell username? [11:17:07] (right now I’m guessing that oblivian must be _joe_) [11:17:16] <_joe_> yes [11:18:25] PROBLEM - puppet last run on etcd1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:18:33] (03CR) 10Volans: [C: 03+1] "I'll try to answer to the previous comments:" [puppet] - 10https://gerrit.wikimedia.org/r/498797 (owner: 10Arturo Borrero Gonzalez) [11:18:34] grep -A1 "name: $username" modules/admin/data/data.yaml [11:18:34] !log oblivian@deploy1001 Synchronized wmf-config/ProductionServices.php: switching to use of the local proxy for search in php7 (duration: 00m 50s) [11:18:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:18:36] (checking modules/admin/data/data.yaml in operations/puppet.git works, I guess) [11:18:43] <_joe_> Lucas_WMDE: you can see in puppet yes [11:18:48] okay, thanks mutante :) [11:18:56] <_joe_> ok, done [11:18:59] <_joe_> I'm done [11:19:00] perhaps I’ll build myself a nifty shell alias for that [11:19:03] alright [11:19:10] my backport looks like it’s almost through gate-and-submit [11:19:19] so I’ll wait for that before doing Daimona’s patches [11:19:40] ack [11:19:46] !log cp-ats-eqiad: upgrade trafficserver to 8.0.3-1wm1 [11:19:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:23:06] !log switch codfw prometheus from prometheus2003 to prometheus2004 [11:23:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:23:44] <_joe_> Lucas_WMDE: lemme know when you're done and I can merge my second patch [11:23:49] !log filippo@puppetmaster1001 
conftool action : set/pooled=yes; selector: name=prometheus2004.codfw.wmnet [11:23:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:24:02] ok [11:24:11] (oooh, your second patch looks exciting! yay php7) [11:24:19] (03PS1) 10Dzahn: nagios_common: add check command check_lastmod [puppet] - 10https://gerrit.wikimedia.org/r/498807 (https://phabricator.wikimedia.org/T203208) [11:24:41] (03CR) 10Jbond: [C: 03+2] Create and mtail parser for ulogd and install it on the syslog server [puppet] - 10https://gerrit.wikimedia.org/r/496776 (https://phabricator.wikimedia.org/T215277) (owner: 10Jbond) [11:24:53] (03CR) 10Alex Monk: [C: 04-1] "The commit message talks about solving a different problem than what was assumed to be the purpose of the patch being reverted." [puppet] - 10https://gerrit.wikimedia.org/r/498797 (owner: 10Arturo Borrero Gonzalez) [11:24:59] (03PS8) 10Jbond: Create and mtail parser for ulogd and install it on the syslog server [puppet] - 10https://gerrit.wikimedia.org/r/496776 (https://phabricator.wikimedia.org/T215277) [11:26:22] (03PS2) 10Dzahn: nagios_common: add check command check_lastmod [puppet] - 10https://gerrit.wikimedia.org/r/498807 (https://phabricator.wikimedia.org/T203208) [11:26:34] !log filippo@puppetmaster1001 conftool action : set/pooled=no; selector: name=prometheus2003.codfw.wmnet [11:26:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:26:41] looks like the final jenkins build is taking a few more minutes, sorry about that… [11:27:51] (03CR) 10Dzahn: [C: 03+2] nagios_common: add check command check_lastmod [puppet] - 10https://gerrit.wikimedia.org/r/498807 (https://phabricator.wikimedia.org/T203208) (owner: 10Dzahn) [11:28:12] (03PS3) 10Dzahn: nagios_common: add check command check_lastmod [puppet] - 10https://gerrit.wikimedia.org/r/498807 (https://phabricator.wikimedia.org/T203208) [11:29:37] 10Operations, 10serviceops: Use PHP7 to run maintenance scripts - 
https://phabricator.wikimedia.org/T219135 (10Joe) [11:29:48] fatalmonitor seems to be full of slow queries right now, is that expected? [11:30:10] a bunch of SELECT MASTER_GTID_WAIT('…', 10) apparently [11:30:28] checking [11:30:43] (03PS2) 10Vgutierrez: lists: Switch to the directory based deployment used by acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/498781 (https://phabricator.wikimedia.org/T207295) [11:30:47] meanwhile, my backport finally got merged, deploying [11:31:03] (03Abandoned) 10Ppchelko: Map syslog{severity,facility}-text to {severity,facility}_label [puppet] - 10https://gerrit.wikimedia.org/r/497321 (https://phabricator.wikimedia.org/T211125) (owner: 10Ppchelko) [11:31:22] (03PS2) 10Muehlenhoff: Fix pinning for smartmontools [puppet] - 10https://gerrit.wikimedia.org/r/498205 (https://phabricator.wikimedia.org/T216711) [11:32:10] (03CR) 10Vgutierrez: [C: 03+2] "TLS material looks as expected:" [puppet] - 10https://gerrit.wikimedia.org/r/498781 (https://phabricator.wikimedia.org/T207295) (owner: 10Vgutierrez) [11:32:32] (03PS3) 10Vgutierrez: lists: Switch to the directory based deployment used by acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/498781 (https://phabricator.wikimedia.org/T207295) [11:32:46] (03PS9) 10Jbond: Create and mtail parser for ulogd and install it on the syslog server [puppet] - 10https://gerrit.wikimedia.org/r/496776 (https://phabricator.wikimedia.org/T215277) [11:32:58] (03PS3) 10Ppchelko: Create node-specific logstash filters for syslog. [puppet] - 10https://gerrit.wikimedia.org/r/498417 (https://phabricator.wikimedia.org/T211125) [11:33:00] sigh.. 
fighting with puppet on icinga but i got it [11:33:14] backport looks fine on mwdebug1002, deploying [11:34:28] (03CR) 10Alex Monk: [C: 04-1] "*The commit message talks about solving what was assumed to be the purpose of the patch being reverted, but it was mistaken about the purp" [puppet] - 10https://gerrit.wikimedia.org/r/498797 (owner: 10Arturo Borrero Gonzalez) [11:35:24] (03PS10) 10Jbond: Create and mtail parser for ulogd and install it on the syslog server [puppet] - 10https://gerrit.wikimedia.org/r/496776 (https://phabricator.wikimedia.org/T215277) [11:35:40] !log lucaswerkmeister-wmde@deploy1001 Synchronized php-1.33.0-wmf.22/extensions/Wikibase/repo: SWAT: [[gerrit:498354| Revert "OutputPageBeforeHTML: do nothing for non entity pages" (T218907)]] (duration: 01m 06s) [11:35:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:35:43] T218907: $UNIQ[hash]$ on Special:Undelete for items - https://phabricator.wikimedia.org/T218907 [11:35:46] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] "As said before, please don't merge this before April 4." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498371 (https://phabricator.wikimedia.org/T218766) (owner: 10WMDE-Fisch) [11:35:50] (03CR) 10Volans: [C: 03+2] check_icinga: don't page on secondary host alerts [software/external-monitoring] - 10https://gerrit.wikimedia.org/r/498425 (owner: 10Volans) [11:35:56] _joe_: do you want to do your second patch before I proceed with Daimona’s patches? [11:35:57] <_joe_> Lucas_WMDE: uhm can you pause for a sec? [11:36:19] (03Merged) 10jenkins-bot: check_icinga: don't page on secondary host alerts [software/external-monitoring] - 10https://gerrit.wikimedia.org/r/498425 (owner: 10Volans) [11:36:22] anything wrong? 
[11:36:37] PROBLEM - HHVM rendering on mwdebug1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [11:36:53] <_joe_> we're looking at what you mentioned before [11:36:59] <_joe_> what's up with mwdebug1002? [11:37:07] <_joe_> did you deploy your change there? [11:37:11] yes [11:37:15] (03PS3) 10Muehlenhoff: Fix pinning for smartmontools [puppet] - 10https://gerrit.wikimedia.org/r/498205 (https://phabricator.wikimedia.org/T216711) [11:37:15] looks like HTMLCacheUpdateJob::invalidateTitles avalanche [11:37:18] but also everywhere else by now [11:37:45] RECOVERY - HHVM rendering on mwdebug1002 is OK: HTTP OK: HTTP/1.1 200 OK - 79984 bytes in 0.363 second response time https://wikitech.wikimedia.org/wiki/Application_servers [11:37:52] the mwaint1002 script I spotted just stopped [11:39:01] <_joe_> jynus: uhm they should have a set concurrency [11:39:04] (03PS1) 10Dzahn: nagios_common: add command config for lastmod plugin [puppet] - 10https://gerrit.wikimedia.org/r/498810 (https://phabricator.wikimedia.org/T203208) [11:39:11] jynus: could it be CentralNotice (the invalidate titles stuff I mean)? 
[11:39:58] <_joe_> and indeed [11:39:58] the spike has stopped [11:40:01] <_joe_> https://grafana.wikimedia.org/d/000000400/jobqueue-eventbus?panelId=24&fullscreen&orgId=1 [11:40:01] (03CR) 10Dzahn: [C: 03+2] nagios_common: add command config for lastmod plugin [puppet] - 10https://gerrit.wikimedia.org/r/498810 (https://phabricator.wikimedia.org/T203208) (owner: 10Dzahn) [11:40:14] <_joe_> marostegui: no shit, look at htmlcacheupdate in that graph :P [11:40:16] (03PS2) 10Dzahn: nagios_common: add command config for lastmod plugin [puppet] - 10https://gerrit.wikimedia.org/r/498810 (https://phabricator.wikimedia.org/T203208) [11:40:26] <_joe_> Lucas_WMDE: you can go on I guess [11:40:39] _joe_: that matches the db activity I saw [11:40:47] <_joe_> but it's very limited [11:40:51] _joe_: I was done anyways for the moment – do you want to go before Daimona’s patches or should I do those? [11:40:59] <_joe_> Lucas_WMDE: go on [11:41:00] which led to lag, which led to errors [11:41:01] ok [11:41:20] Daimona: I assume the $wgAbuseFilterRuntimeProfile can’t be tested beyond “wiki still works” [11:41:21] good job ATS, you handled 12K purges per second! :) [11:41:28] any idea about enabling logging of private filters? 
[11:41:34] we need someone to look at cause [11:41:38] Here I am [11:41:52] (03PS9) 10Lucas Werkmeister (WMDE): Remove $wgAbuseFilterRuntimeProfile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486470 (https://phabricator.wikimedia.org/T191039) (owner: 10Daimona Eaytoy) [11:41:53] <_joe_> jynus: so the cause as far as the db is concerned is the jobs [11:41:56] 20K errors per minute https://logstash.wikimedia.org/goto/1d50bc74e44326a7fa0d9fd4bb2d9f59 [11:42:04] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486470 (https://phabricator.wikimedia.org/T191039) (owner: 10Daimona Eaytoy) [11:42:05] _joe_: and what caused the jobs :-P [11:42:07] I'll test it on Special:AbuseFilter and check if profiling still appears, plus a look on logstash [11:42:11] <_joe_> now where the jobs came from, that's another problem :D [11:42:14] he he [11:42:21] <_joe_> I don't think we have a good tracking of that [11:42:22] ok sounds good [11:42:25] As for the second, I'll lurk on Special:RecentChanges on commons and wait for a filter hit to show up [11:42:31] _joe_: however, db should be able to throttle automatically [11:42:33] <_joe_> mobrovac, Pchelolo maybe you have ideas? 
[11:42:50] or it used to, when concurrency was lower and connections reused [11:42:58] * Pchelolo reading the backscroll [11:43:08] <_joe_> jynus: and we probably need to look at rate-limiting better such events [11:43:12] (03PS1) 10Vgutierrez: mirrors: Switch to the directory based deployment used by acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/498811 (https://phabricator.wikimedia.org/T207295) [11:43:23] (03Merged) 10jenkins-bot: Remove $wgAbuseFilterRuntimeProfile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486470 (https://phabricator.wikimedia.org/T191039) (owner: 10Daimona Eaytoy) [11:43:26] <_joe_> even if the concurrency of htmlCacheUpdate quadrupled for a few minutes, it still seems low [11:43:33] <_joe_> so I guess those jobs run very fast [11:44:09] (03CR) 10Muehlenhoff: [C: 03+2] Fix pinning for smartmontools [puppet] - 10https://gerrit.wikimedia.org/r/498205 (https://phabricator.wikimedia.org/T216711) (owner: 10Muehlenhoff) [11:44:10] Daimona: the first patch is on mwdebug1002, please test [11:44:37] checking [11:44:40] (03PS3) 10Dzahn: nagios_common: add command config for lastmod plugin [puppet] - 10https://gerrit.wikimedia.org/r/498810 (https://phabricator.wikimedia.org/T203208) [11:44:45] RECOVERY - puppet last run on etcd1001 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [11:45:24] marostegui: https://grafana.wikimedia.org/d/000000363/mediawiki-mysql-loadbalancer?orgId=1&from=1553510712431&to=1553514312431 [11:46:33] marostegui: https://grafana.wikimedia.org/d/000000303/mysql-replication-lag?panelId=1&fullscreen&orgId=1&from=now-1h&to=now [11:46:58] !log cp-ats-codfw: upgrade trafficserver to 8.0.3-1wm1 [11:46:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:47:05] Lucas_WMDE looks good! 
[11:47:12] (03PS4) 10Lucas Werkmeister (WMDE): Remove $wgAbuseFilterProfile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486280 (https://phabricator.wikimedia.org/T191039) (owner: 10Daimona Eaytoy) [11:47:18] alright, thanks! [11:47:33] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486280 (https://phabricator.wikimedia.org/T191039) (owner: 10Daimona Eaytoy) [11:47:35] jynus: so one graph says db1118 and the other says db1105 and db1099 (recent changes hosts, which are the ones I saw on tendril lagging) [11:48:03] _joe_: the max concurrency of htmlCacheUpdate is 10, I can make it lower if needed, but 10 seems to be quite a low number.. [11:48:08] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:486470|Remove $wgAbuseFilterRuntimeProfile (T191039)]] (duration: 00m 49s) [11:48:10] (03CR) 10jenkins-bot: Remove $wgAbuseFilterRuntimeProfile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486470 (https://phabricator.wikimedia.org/T191039) (owner: 10Daimona Eaytoy) [11:48:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:48:11] T191039: Re-enable filter profiling on every wiki - https://phabricator.wikimedia.org/T191039 [11:48:36] (03Merged) 10jenkins-bot: Remove $wgAbuseFilterProfile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486280 (https://phabricator.wikimedia.org/T191039) (owner: 10Daimona Eaytoy) [11:48:46] regarding where they came from - judging by the increase in p75 root_job_latency they were 1 big template update unfolding [11:48:49] (03CR) 10jenkins-bot: Remove $wgAbuseFilterProfile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486280 (https://phabricator.wikimedia.org/T191039) (owner: 10Daimona Eaytoy) [11:49:06] I can look in kafka which template that was if needed [11:49:07] <_joe_> Pchelolo: it seems! 
but in reality we've never gone over 6 or so [11:49:21] <_joe_> also it changes if they're per-wiki or distributed across wikis [11:49:26] Daimona: the second one is also on mwdebug1002 [11:49:35] * Daimona is testing [11:49:42] (note, this is another $wgAbuseFilterProfile removal, not yet enable logging of private filters) [11:50:41] Yep [11:50:54] Well, I have to say that profiling has disappeared, but don't revert yet [11:51:02] I have to check one more thing [11:51:14] (03PS1) 10Dzahn: nagios_common: rename lastmod plugin config to contain "check_" [puppet] - 10https://gerrit.wikimedia.org/r/498812 [11:51:49] Ahh dang [11:52:06] The removal of these globals from the AF codebase is still in master [11:52:11] oh [11:52:12] not on wmf.22? [11:52:15] Nope [11:52:19] (03CR) 10Dzahn: [C: 03+2] nagios_common: rename lastmod plugin config to contain "check_" [puppet] - 10https://gerrit.wikimedia.org/r/498812 (owner: 10Dzahn) [11:52:23] both? or just the second one? [11:52:23] However [11:52:29] (03PS2) 10Dzahn: nagios_common: rename lastmod plugin config to contain "check_" [puppet] - 10https://gerrit.wikimedia.org/r/498812 [11:52:35] I thought they were set to true as default in AF, but actually they weren't [11:52:36] Both [11:52:47] I guess there's not time to push a new commit for backport in wmf.22, right? [11:52:55] probably not [11:53:02] let’s revert the config changes for now, I’d say [11:53:13] So either it's reverted and SWATted again next weel, or wait for the train [11:53:24] I guess they're both fine [11:53:30] What you feel more comfortable with [11:53:34] It's the same [11:53:37] It won't cause any harm [11:53:44] I’m not sure what you mean, sorry [11:53:45] Just people won't have profiling data until train [11:53:53] Ah darn [11:53:55] I mean [11:54:28] You can either revert it now, and then I'll add it to a SWAT window next week; or deploy it all the same, as it won't cause major harm. 
[11:54:29] <_joe_> yeah please let's revert if something doesn't work [11:54:33] _joe_: I believe they were caused by an update on https://en.wikipedia.org/wiki/Module:Language/data/iana_scripts [11:54:35] okay reverting [11:54:42] Yay, good [11:54:55] <_joe_> Pchelolo: just for reference, how did you find that? [11:54:58] (03PS1) 10Lucas Werkmeister (WMDE): Revert "Remove $wgAbuseFilterProfile" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498813 [11:55:04] kafkacat [11:55:11] (03PS1) 10Lucas Werkmeister (WMDE): Revert "Remove $wgAbuseFilterRuntimeProfile" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498814 [11:55:15] <_joe_> Pchelolo: even better - is it on wikitech? [11:55:17] <_joe_> :P [11:55:20] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498813 (owner: 10Lucas Werkmeister (WMDE)) [11:55:26] <_joe_> so you went to look at one of these jobs [11:55:38] <_joe_> and you found the root job and searched that, right? [11:56:09] mutante: FYI puppet is still not running correctly on icinga [11:56:26] (03Merged) 10jenkins-bot: Revert "Remove $wgAbuseFilterProfile" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498813 (owner: 10Lucas Werkmeister (WMDE)) [11:56:47] _joe_: I'll make a TODO item to write that up on wikitech [11:57:00] (03PS2) 10Lucas Werkmeister (WMDE): Revert "Remove $wgAbuseFilterRuntimeProfile" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498814 [11:57:06] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498814 (owner: 10Lucas Werkmeister (WMDE)) [11:57:35] basically fetched some jobs with an offset that resembles the mid point of the incident and it was pretty obvious what's the majority of the jobs are about [11:57:40] I need to script that.. 
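Editorial note: the approach described above (pull a batch of job events from around the middle of the incident, then see what most of them have in common) could be scripted roughly as follows. This is only a minimal sketch, not the script that was used. It assumes the events arrive as one JSON object per line, for example piped from a kafkacat consumer, and the default field name `root_event.signature` is an assumption about the job event schema; pass whatever field actually identifies the root job or template.

```python
import json
import sys
from collections import Counter


def dig(event, dotted_path):
    """Follow a dotted path such as 'root_event.signature' into nested dicts."""
    value = event
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def tally(lines, field):
    """Count how often each value of `field` occurs among JSON-encoded events."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON message
        value = dig(event, field)
        if value is not None:
            counts[str(value)] += 1
    return counts


def main(argv, stream):
    """Print the ten most common values of the chosen field (hypothetical default)."""
    field = argv[1] if len(argv) > 1 else "root_event.signature"
    for value, count in tally(stream, field).most_common(10):
        print(f"{count:8d}  {value}")
```

To use it, call `main(sys.argv, sys.stdin)` at the bottom of the file and pipe the fetched messages into the script; if one root job dominates the sample, it will sit at the top of the output.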
[11:58:06] <_joe_> Pchelolo: so that others can debug such issues as well, yep :) [11:58:10] (03Merged) 10jenkins-bot: Revert "Remove $wgAbuseFilterRuntimeProfile" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498814 (owner: 10Lucas Werkmeister (WMDE)) [11:58:18] so, concurrency of 10 htmlCacheUpdates with batchSize of 300 is too much I guess? [11:58:23] <_joe_> Lucas_WMDE: can I go on with my change? [11:58:34] Daimona: both reverts are on mwdebug1002, please test [11:58:40] _joe_: sorry, still in the process of cleaning up [11:58:50] (03PS6) 10Giuseppe Lavagetto: Direct 0.1% of anonymous users to php7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494756 (https://phabricator.wikimedia.org/T216676) [11:59:02] (03CR) 10jenkins-bot: Revert "Remove $wgAbuseFilterProfile" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498813 (owner: 10Lucas Werkmeister (WMDE)) [11:59:04] we can partition htmlCacheUpdate per MySQL shard just like we do with refreshLinks, it's not hard [11:59:04] (03CR) 10jenkins-bot: Revert "Remove $wgAbuseFilterRuntimeProfile" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498814 (owner: 10Lucas Werkmeister (WMDE)) [11:59:24] that job just never caused any issues, so we didn't do it before [11:59:28] <_joe_> Pchelolo: that's probably advisable? I can't talk right now I have to SWAT :) [11:59:34] Lucas_WMDE reverts look good, please go ahead [11:59:39] ok. will file a ticket. [11:59:39] <_joe_> but marostegui / jynus might [11:59:47] volans: i am aware and working on it. the nagios_common stuff is horrible [12:00:08] first it wants the config without check_ prefix, then the other way around..
meh [12:00:40] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:498814|Revert "Remove $wgAbuseFilterRuntimeProfile" (T191039)]] (duration: 00m 51s) [12:00:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:47] T191039: Re-enable filter profiling on every wiki - https://phabricator.wikimedia.org/T191039 [12:00:47] _joe_: done [12:00:54] Pchelolo: yeah, probably we need to adjust that value, we can discuss on the task [12:00:57] feel free to proceed with your patch if you want to extend the SWAT a bit [12:01:01] Pchelolo- there is 2 sides of things to consider- job concurrency and mediawiki code checks to prevent those [12:01:18] Daimona: no time for enable logging of private filters on commonswiki, sorry [12:01:31] <_joe_> Lucas_WMDE: yeah I'll merge, it's not really something that is part of a SWAT window, it's the kind of know SRE can move at any time [12:01:45] Lucas_WMDE Yeah no prob I'll reschedule [12:01:48] Thanks! [12:02:07] okay [12:02:07] Pchelolo: any of the 2 would work, probably mentioning both on the ticket should be advisable [12:02:27] (03CR) 10Giuseppe Lavagetto: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494756 (https://phabricator.wikimedia.org/T216676) (owner: 10Giuseppe Lavagetto) [12:02:44] jynus: could you point me to where the MW checks for that live? [12:03:01] I don't know, that is why I suggest adding people that knows :-) [12:03:08] (03PS1) 10Daimona Eaytoy: Revert "Revert "Remove $wgAbuseFilterProfile"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498817 (https://phabricator.wikimedia.org/T191039) [12:03:17] either core platform or performance, as a wild guess [12:03:34] (03Merged) 10jenkins-bot: Direct 0.1% of anonymous users to php7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494756 (https://phabricator.wikimedia.org/T216676) (owner: 10Giuseppe Lavagetto) [12:03:49] ok. 
will find out [12:03:56] Pchelolo: HTMLCacheUpdateJob::invalidateTitles is the function I have in my logs [12:03:58] (03PS1) 10Daimona Eaytoy: Revert "Revert "Remove $wgAbuseFilterRuntimeProfile"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498818 (https://phabricator.wikimedia.org/T191039) [12:04:58] Pchelolo: My suggestion is to see if it is an issue more in the infra or in the code, or in both, and ask for help from those in the know [12:05:58] (03PS1) 10Dzahn: nagios_common: another fix to check_lastmod suffix and naming [puppet] - 10https://gerrit.wikimedia.org/r/498819 [12:06:18] Pchelolo: if you file a ticket, add me and I will be glad to complete it with all the info I have if that helps [12:06:26] (03CR) 10Dzahn: [C: 03+2] nagios_common: another fix to check_lastmod suffix and naming [puppet] - 10https://gerrit.wikimedia.org/r/498819 (owner: 10Dzahn) [12:06:43] (03CR) 10jerkins-bot: [V: 04-1] nagios_common: another fix to check_lastmod suffix and naming [puppet] - 10https://gerrit.wikimedia.org/r/498819 (owner: 10Dzahn) [12:06:49] ok. thank you.
will add you as a subscriber [12:07:45] !log installing openssl1.0 security updates on stretch [12:07:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:41] !log oblivian@deploy1001 Synchronized wmf-config/CommonSettings.php: Move 0.1% of anonymous users to php7 T212828 (duration: 00m 49s) [12:08:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:51] T212828: SRE FY2019 Q3 goal: Ramp-up serving traffic to PHP 7 - https://phabricator.wikimedia.org/T212828 [12:08:53] Lucas_WMDE I'm sorry, the other revert hasn't been deployed [12:09:11] (Searching link) [12:09:17] I still see some logspam [12:09:34] https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/498813/ [12:10:01] Daimona: hm [12:10:06] (03CR) 10jenkins-bot: Direct 0.1% of anonymous users to php7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494756 (https://phabricator.wikimedia.org/T216676) (owner: 10Giuseppe Lavagetto) [12:10:25] I didn’t `scap sync-file` the first revert, since I never `scap sync-file`d the second change [12:10:36] perhaps I did the reverts in the wrong order? [12:10:39] (03PS2) 10Dzahn: nagios_common: another fix to check_lastmod and naming [puppet] - 10https://gerrit.wikimedia.org/r/498819 [12:10:53] let me check [12:10:59] Thanks [12:11:44] no, that looks right to me [12:11:54] I sync-file’d InitialiseSettings.php both times [12:12:03] (03CR) 10Arturo Borrero Gonzalez: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/498796 (owner: 10Alex Monk) [12:12:12] and never sync-file’d wmf-config/abusefilter.php [12:12:25] Uhm that's weird [12:12:45] <_joe_> what logspam do you see? 
[12:13:01] But in fact, I don't see logs for the second deployment [12:13:24] It's "PHP Notice: Undefined index: runtime" [12:13:35] and "PHP Notice: Undefined index: condCount" [12:13:36] <_joe_> the logspam specific to abusefilter seems to have stopped [12:13:49] Daimona: there’s no log entry for the second deployment because that was never scap-sync’ed [12:13:52] Uh right, the last one is now 4 minutes old [12:13:55] ^ Indeed [12:13:57] you notified me of the issue when it was still on mwdebug1002 [12:14:14] the deployment to the debug server is separate and not logged [12:14:15] Yay I just saw logspam for some minutes after the revert and thought something wrong was still in prod [12:14:18] (unless I misunderstood something) [12:14:20] okay [12:14:31] Yes yes I misunderstood what I saw [12:14:44] <_joe_> Daimona: it's a bit strange, yes, but it could be lag ingesting data in logstash maybe? I dunno tbh [12:14:52] Ah I have an explanation [12:14:54] _joe_: you’re done with the php7 patch, right?
[12:14:57] <_joe_> yes [12:15:01] !log EU SWAT finished [12:15:02] (03PS3) 10Dzahn: nagios_common: another fix to check_lastmod location and naming [puppet] - 10https://gerrit.wikimedia.org/r/498819 (https://phabricator.wikimedia.org/T203208) [12:15:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:15:04] It happened when we first enabled profiling everywhere [12:15:26] It's due to profiling data being created, the code is trying to retrieve it while it still doesn't exist for all filters [12:15:42] Mostly, it's for edits which started before the deploy and were saved later I think [12:15:48] I just thought it would have stopped earlier [12:16:40] Uh actually not that [12:16:56] <_joe_> I don't think that's how it works, no [12:16:58] Not exactly, at least: for edits *stashed* in cache before the deploy, and saved *from cache* later [12:17:12] This is the right explanation :-) [12:17:14] <_joe_> yes [12:17:25] <_joe_> that's possible indeed [12:17:27] I got confused [12:17:29] (03CR) 10Dzahn: [C: 03+2] nagios_common: another fix to check_lastmod location and naming [puppet] - 10https://gerrit.wikimedia.org/r/498819 (https://phabricator.wikimedia.org/T203208) (owner: 10Dzahn) [12:17:38] Yep, I checked last time [12:22:35] PROBLEM - puppet last run on db1089 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:23:25] PROBLEM - puppet last run on an-worker1089 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [12:29:10] thanks for merging the r/p/ -> /r/ patch of mine mutante [12:31:25] (03PS1) 10Dzahn: nagios_common: remove duplicate config, check_command already uses it [puppet] - 10https://gerrit.wikimedia.org/r/498825 [12:32:29] 10Operations, 10Wikidata, 10Wikidata-Termbox-Hike, 10serviceops, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (10RazShuty) Hey @akosiaris, not sure I see it in there, maybe I'm lost a bit... can you point me out to where the SSR is in https://www.media... [12:34:05] (03PS2) 10Dzahn: nagios_common: remove duplicate config, check_command already uses it [puppet] - 10https://gerrit.wikimedia.org/r/498825 [12:34:37] (03CR) 10Dzahn: [C: 03+2] nagios_common: remove duplicate config, check_command already uses it [puppet] - 10https://gerrit.wikimedia.org/r/498825 (owner: 10Dzahn) [12:41:41] (03PS1) 10Muehlenhoff: Fix filtering of auto-restarted/ignored services which are running a legacy init script [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/498828 [12:42:07] (03CR) 10Alex Monk: "I might be able to have a go. I imagine I'll make a new patch set that depends on this." [puppet] - 10https://gerrit.wikimedia.org/r/498796 (owner: 10Alex Monk) [12:42:58] hauskatze: welcome! thanks for the patch [12:43:19] (03CR) 10Alex Monk: [C: 04-1] "(the concern around inline hiera calls may get fixed by a mix of If770c8e5 and something else discussed there which I'm going to attempt)" [puppet] - 10https://gerrit.wikimedia.org/r/498797 (owner: 10Arturo Borrero Gonzalez) [12:43:41] RECOVERY - puppet last run on db1089 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [12:44:39] volans: finally both icinga config and puppet on icinga is fixed. 
sry for noise, happens a lot when trying to add new check commands and there are 2 ways to do it [12:44:51] 10Operations, 10Wikidata, 10Wikidata-Termbox-Hike, 10serviceops, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (10Joe) >>! In T212189#5053087, @RazShuty wrote: > Hey @akosiaris, not sure I see it in there, maybe I'm lost a bit... can you point me out to... [12:44:59] mutante: ack, thx for fixing [12:46:39] 10Operations, 10serviceops: Use PHP7 to run all async jobs - https://phabricator.wikimedia.org/T219148 (10Joe) [12:47:01] RECOVERY - Check correctness of the icinga configuration on icinga1001 is OK: Icinga configuration is correct https://wikitech.wikimedia.org/wiki/Icinga [12:47:23] (03CR) 10Alex Monk: wmf_style lint: Show new errors that already appear on other lines (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498731 (https://phabricator.wikimedia.org/T219085) (owner: 10Alex Monk) [12:48:17] !log reloading icinga config [12:48:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:49:04] 10Operations, 10Wikidata, 10Wikidata-Termbox-Hike, 10serviceops, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (10RazShuty) Thanks a lot! 
[12:49:32] RECOVERY - puppet last run on an-worker1089 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [12:51:45] (03PS1) 10Muehlenhoff: Enable base::service_auto_restart for rsync/namenode standby [puppet] - 10https://gerrit.wikimedia.org/r/498834 (https://phabricator.wikimedia.org/T135991) [12:51:54] (03PS4) 10Eevans: prometheus: collect session storage Cassandra metrics [puppet] - 10https://gerrit.wikimedia.org/r/497848 (https://phabricator.wikimedia.org/T209108) [12:52:40] 10Operations, 10serviceops: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150 (10Joe) [12:53:47] (03PS1) 10Dzahn: icinga/planet: merge monitoring into a single host and class [puppet] - 10https://gerrit.wikimedia.org/r/498835 (https://phabricator.wikimedia.org/T203208) [12:56:23] (03CR) 10Tchanders: [C: 03+1] Enforce 8 char password length requirements for non-privileged users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496202 (https://phabricator.wikimedia.org/T211622) (owner: 10Dmaza) [12:59:51] (03PS1) 10Jbond: mtail: Add hostname to ulogd metric data [puppet] - 10https://gerrit.wikimedia.org/r/498837 [13:00:33] (03CR) 10Jbond: [C: 03+1] "LGTM" [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/498828 (owner: 10Muehlenhoff) [13:00:36] (03CR) 10jerkins-bot: [V: 04-1] mtail: Add hostname to ulogd metric data [puppet] - 10https://gerrit.wikimedia.org/r/498837 (owner: 10Jbond) [13:00:59] (03PS2) 10Dzahn: icinga/planet: merge monitoring into a single host and class [puppet] - 10https://gerrit.wikimedia.org/r/498835 (https://phabricator.wikimedia.org/T203208) [13:01:30] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/498834 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [13:04:14] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/15314/icinga1001.wikimedia.org/" [puppet] - 
10https://gerrit.wikimedia.org/r/498835 (https://phabricator.wikimedia.org/T203208) (owner: 10Dzahn) [13:14:29] (03PS2) 10Jbond: mtail: Add hostname to ulogd metric data [puppet] - 10https://gerrit.wikimedia.org/r/498837 [13:15:32] (03PS1) 10Dzahn: mediawiki::maintenance: run tor_exit cron with PHP 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/498845 (https://phabricator.wikimedia.org/T219135) [13:17:15] (03PS5) 10Gehel: elasticsearch: convert check to py3 [puppet] - 10https://gerrit.wikimedia.org/r/498292 (https://phabricator.wikimedia.org/T215439) (owner: 10Mathew.onipe) [13:17:48] !log mwmaint1002 - manually running tor_exit_node cron command and test with PHP 7.2 [13:17:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:48] (03CR) 10Dzahn: [V: 03+1 C: 03+1] "tested without issues" [puppet] - 10https://gerrit.wikimedia.org/r/498845 (https://phabricator.wikimedia.org/T219135) (owner: 10Dzahn) [13:19:44] (03CR) 10Gehel: [C: 03+2] elasticsearch: convert check to py3 [puppet] - 10https://gerrit.wikimedia.org/r/498292 (https://phabricator.wikimedia.org/T215439) (owner: 10Mathew.onipe) [13:22:50] (03PS1) 10Ema: cp1076: use ATS backends instead of Varnish [puppet] - 10https://gerrit.wikimedia.org/r/498849 (https://phabricator.wikimedia.org/T213263) [13:26:03] (03PS1) 10Muehlenhoff: Add agetty to filter_services list of debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/498852 (https://phabricator.wikimedia.org/T135991) [13:28:40] !log planet - manually updating en version since new monitoring check warned it wasn't current (T203208) [13:28:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:28:44] T203208: Add monitoring for planet updating - https://phabricator.wikimedia.org/T203208 [13:29:15] (03CR) 10Dzahn: [V: 03+1 C: 03+2] mediawiki::maintenance: run tor_exit cron with PHP 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/498845 (https://phabricator.wikimedia.org/T219135) (owner: 10Dzahn) 
[13:29:32] (03PS2) 10Dzahn: mediawiki::maintenance: run tor_exit cron with PHP 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/498845 (https://phabricator.wikimedia.org/T219135) [13:31:58] (03CR) 10Jbond: [C: 04-1] ferm::service: Allow ensure absent without proto/port (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498645 (owner: 10Alex Monk) [13:32:42] (03CR) 10Alex Monk: ferm::service: Allow ensure absent without proto/port (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498645 (owner: 10Alex Monk) [13:33:07] (03PS3) 10Alex Monk: ferm::service: Allow ensure absent without proto/port [puppet] - 10https://gerrit.wikimedia.org/r/498645 [13:33:54] (03PS4) 10Jcrespo: mariadb-backups: Fix bug on state updating logic [puppet] - 10https://gerrit.wikimedia.org/r/498788 (https://phabricator.wikimedia.org/T206203) [13:35:26] (03CR) 10Jbond: "This CR add's hostname to the labels so the data can be a bit more useful." [puppet] - 10https://gerrit.wikimedia.org/r/498837 (owner: 10Jbond) [13:37:03] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/498645 (owner: 10Alex Monk) [13:40:57] (03CR) 10Jcrespo: [C: 03+2] mariadb-backups: Fix bug on state updating logic [puppet] - 10https://gerrit.wikimedia.org/r/498788 (https://phabricator.wikimedia.org/T206203) (owner: 10Jcrespo) [13:41:12] !log cp1076: depool varnish-fe and point it to cp-ats T213263 [13:41:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:41:15] T213263: Partial cache_upload traffic switchover to ATS and switchback to Varnish - https://phabricator.wikimedia.org/T213263 [13:41:48] !log ema@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=nginx [13:41:49] !log ema@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=varnish-fe [13:41:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:41:51] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:42:42] (03CR) 10Ema: [C: 03+2] cp1076: use ATS backends instead of Varnish [puppet] - 10https://gerrit.wikimedia.org/r/498849 (https://phabricator.wikimedia.org/T213263) (owner: 10Ema) [13:42:52] (03PS2) 10Ema: cp1076: use ATS backends instead of Varnish [puppet] - 10https://gerrit.wikimedia.org/r/498849 (https://phabricator.wikimedia.org/T213263) [13:43:22] (03CR) 10Andrew Bogott: [C: 03+1] puppet_alert - Log to syslog instead of stdout [puppet] - 10https://gerrit.wikimedia.org/r/498353 (https://phabricator.wikimedia.org/T218987) (owner: 10GTirloni) [13:46:18] (03PS2) 10Marostegui: install_server: Add db1139,db1140,dbprov200* [puppet] - 10https://gerrit.wikimedia.org/r/498768 (https://phabricator.wikimedia.org/T218985) [13:50:30] PROBLEM - check updates on en.planet.wikimedia.org on en.planet.wikimedia.org is CRITICAL: CRITICAL - Content not updated recently (69188 69120) https://wikitech.wikimedia.org/wiki/Planet.wikimedia.org [13:51:12] (03PS1) 10Dzahn: icinga/planet: fix seconds->hour calc and add comments [puppet] - 10https://gerrit.wikimedia.org/r/498865 (https://phabricator.wikimedia.org/T203208) [13:52:15] !log cp1076: repool varnish-fe, frontend misses served by cp-ats T213263 [13:52:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:52:19] T213263: Partial cache_upload traffic switchover to ATS and switchback to Varnish - https://phabricator.wikimedia.org/T213263 [13:52:44] !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=nginx [13:52:45] !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=varnish-fe [13:52:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:52:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:53:30] (03CR) 10Dzahn: [C: 03+2] icinga/planet: fix seconds->hour calc and add comments 
[puppet] - 10https://gerrit.wikimedia.org/r/498865 (https://phabricator.wikimedia.org/T203208) (owner: 10Dzahn) [13:54:25] (03CR) 10Gehel: [C: 03+1] "LGTM, waiting for Mathew to be around to merge." [puppet] - 10https://gerrit.wikimedia.org/r/497698 (https://phabricator.wikimedia.org/T213940) (owner: 10Mathew.onipe) [13:54:43] (03PS3) 10Gehel: multi-instance for elastic deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/497698 (https://phabricator.wikimedia.org/T213940) (owner: 10Mathew.onipe) [13:55:38] (03CR) 10Dzahn: [C: 04-1] "labsdb10[10-12] should be labsdb101[0-2]" [puppet] - 10https://gerrit.wikimedia.org/r/498768 (https://phabricator.wikimedia.org/T218985) (owner: 10Marostegui) [13:58:15] (03PS4) 10GTirloni: puppet_alert - Log to syslog instead of stdout [puppet] - 10https://gerrit.wikimedia.org/r/498353 (https://phabricator.wikimedia.org/T218987) [14:00:19] (03CR) 10GTirloni: [C: 03+2] puppet_alert - Log to syslog instead of stdout [puppet] - 10https://gerrit.wikimedia.org/r/498353 (https://phabricator.wikimedia.org/T218987) (owner: 10GTirloni) [14:05:09] (03PS3) 10Marostegui: install_server: Add db1139,db1140,dbprov200* [puppet] - 10https://gerrit.wikimedia.org/r/498768 (https://phabricator.wikimedia.org/T218985) [14:07:15] mutante: ^ [14:09:42] (03PS5) 10Gehel: Pass flag use_nodejs10 for maps services [puppet] - 10https://gerrit.wikimedia.org/r/495735 (https://phabricator.wikimedia.org/T215523) (owner: 10MSantos) [14:10:12] (03CR) 10Jcrespo: [C: 04-1] "I took over the setup, if you want to do it instead, first coordinate with me." [puppet] - 10https://gerrit.wikimedia.org/r/498768 (https://phabricator.wikimedia.org/T218985) (owner: 10Marostegui) [14:12:59] marostegui: so.. as to why the previous version would not work: it's because it's shell globbing: https://phabricator.wikimedia.org/P8262 [14:13:41] marostegui: that looks good now. 
what was new to me is that have special partman recipes to prevent reinstall [14:14:04] (03PS3) 10Ottomata: [EventBus] Decrease timeout and use hasty mode for analytics. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496446 (https://phabricator.wikimedia.org/T218260) (owner: 10Ppchelko) [14:14:16] (03CR) 10Gehel: [C: 04-1] "PCC still failing with `eventlogging_service_uri` declared as HTTPS but actually being HTTP. If an HTTPS endpoint exists, maybe this is th" [puppet] - 10https://gerrit.wikimedia.org/r/495735 (https://phabricator.wikimedia.org/T215523) (owner: 10MSantos) [14:15:31] (03CR) 10Marostegui: "> I took over the setup, if you want to do it instead, first" [puppet] - 10https://gerrit.wikimedia.org/r/498768 (https://phabricator.wikimedia.org/T218985) (owner: 10Marostegui) [14:17:04] (03PS4) 10Andrew Bogott: ldap: add an index for 'sudoHost' [puppet] - 10https://gerrit.wikimedia.org/r/498396 (https://phabricator.wikimedia.org/T46722) [14:17:18] RECOVERY - check updates on en.planet.wikimedia.org on en.planet.wikimedia.org is OK: OK - Website content is current (70796 = 86400) https://wikitech.wikimedia.org/wiki/Planet.wikimedia.org [14:17:48] ^ new check, confirms it works [14:19:05] (03PS1) 10Ppchelko: Enable rsyslog logging for node services. 
[puppet] - 10https://gerrit.wikimedia.org/r/498872 (https://phabricator.wikimedia.org/T211125) [14:19:07] (03CR) 10Andrew Bogott: [C: 03+2] ldap: add an index for 'sudoHost' [puppet] - 10https://gerrit.wikimedia.org/r/498396 (https://phabricator.wikimedia.org/T46722) (owner: 10Andrew Bogott) [14:19:20] (03CR) 10CDanis: [C: 03+1] "> This CR add's hostname to the labels so the data can be a bit more" [puppet] - 10https://gerrit.wikimedia.org/r/498837 (owner: 10Jbond) [14:19:32] (03PS1) 10Ottomata: Re-enable eventgate-analytics api-request logging for group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498873 (https://phabricator.wikimedia.org/T214080) [14:22:27] (03PS4) 10Marostegui: install_server: Add db1139,db1140,dbprov200* [puppet] - 10https://gerrit.wikimedia.org/r/498768 (https://phabricator.wikimedia.org/T218985) [14:23:26] (03PS5) 10Marostegui: install_server: Add db1139,db1140 [puppet] - 10https://gerrit.wikimedia.org/r/498768 (https://phabricator.wikimedia.org/T218985) [14:23:36] (03CR) 10EBernhardson: [C: 04-1] "Patch ready, but mjolnir update isn't deployed yet" [puppet] - 10https://gerrit.wikimedia.org/r/498232 (https://phabricator.wikimedia.org/T218833) (owner: 10EBernhardson) [14:26:15] (03PS1) 10Gehel: Revert "elasticsearch: hide deprecation warning for ParseField" [puppet] - 10https://gerrit.wikimedia.org/r/498877 [14:26:41] (03CR) 10jerkins-bot: [V: 04-1] Revert "elasticsearch: hide deprecation warning for ParseField" [puppet] - 10https://gerrit.wikimedia.org/r/498877 (owner: 10Gehel) [14:29:32] (03PS2) 10Gehel: Revert "elasticsearch: hide deprecation warning for ParseField" [puppet] - 10https://gerrit.wikimedia.org/r/498877 [14:29:48] !log updating slapd indexes on seaborgium, serpens, ldap-eqiad-replica01, ldap-eqiad-replica02 for 498396 [14:29:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:33] (03CR) 10DCausse: [C: 03+1] Revert "elasticsearch: hide deprecation warning for ParseField" [puppet] 
- 10https://gerrit.wikimedia.org/r/498877 (owner: 10Gehel) [14:30:51] (03CR) 10Gehel: [C: 03+2] Revert "elasticsearch: hide deprecation warning for ParseField" [puppet] - 10https://gerrit.wikimedia.org/r/498877 (owner: 10Gehel) [14:30:53] (03Restored) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/127236 (owner: 10Hashar) [14:31:07] (03Abandoned) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/127236 (owner: 10Hashar) [14:32:39] 10Operations, 10ops-eqiad, 10DBA: db1078 s3 primary DB master BBU pre-failure - https://phabricator.wikimedia.org/T219115 (10Cmjohnson) Your case was successfully submitted. Please note your Case ID: 5337355107 for future reference. [14:32:54] (03PS1) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498882 [14:33:25] (03CR) 10Muehlenhoff: [C: 03+2] Fix filtering of auto-restarted/ignored services which are running a legacy init script [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/498828 (owner: 10Muehlenhoff) [14:33:52] 10Operations, 10ops-eqiad, 10DBA: db1078 s3 primary DB master BBU pre-failure - https://phabricator.wikimedia.org/T219115 (10Marostegui) Thanks! 
[14:35:50] 10Operations, 10cloud-services-team (Kanban): labweb1001: mwscript php7.2: command not found - https://phabricator.wikimedia.org/T219157 (10GTirloni) [14:35:57] (03CR) 10Hashar: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498882 (owner: 10Hashar) [14:37:56] (03Abandoned) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498882 (owner: 10Hashar) [14:38:32] (03CR) 10Vgutierrez: [C: 03+2] "TLS material looks as expected:" [puppet] - 10https://gerrit.wikimedia.org/r/498811 (https://phabricator.wikimedia.org/T207295) (owner: 10Vgutierrez) [14:38:45] (03PS2) 10Vgutierrez: mirrors: Switch to the directory based deployment used by acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/498811 (https://phabricator.wikimedia.org/T207295) [14:40:02] (03CR) 10MSantos: "> Patch Set 5: Code-Review-1" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/495735 (https://phabricator.wikimedia.org/T215523) (owner: 10MSantos) [14:41:27] (03PS2) 10Ppchelko: Enable rsyslog logging for node services. [puppet] - 10https://gerrit.wikimedia.org/r/498872 (https://phabricator.wikimedia.org/T211125) [14:47:29] (03PS1) 10Vgutierrez: exim: Switch to the directory based deployment used by acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/498892 (https://phabricator.wikimedia.org/T207295) [14:48:35] (03PS3) 10Ppchelko: Expose rsyslog_udp_port to services configs. 
[puppet] - 10https://gerrit.wikimedia.org/r/498872 (https://phabricator.wikimedia.org/T211125) [14:48:37] (03PS6) 10Marostegui: install_server: Add db1139,db1140 [puppet] - 10https://gerrit.wikimedia.org/r/498768 (https://phabricator.wikimedia.org/T218985) [14:49:47] (03CR) 10Vgutierrez: [C: 03+2] "TLS material looks as expected:" [puppet] - 10https://gerrit.wikimedia.org/r/498892 (https://phabricator.wikimedia.org/T207295) (owner: 10Vgutierrez) [14:49:59] (03PS2) 10Vgutierrez: exim: Switch to the directory based deployment used by acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/498892 (https://phabricator.wikimedia.org/T207295) [14:50:32] (03PS4) 10Ppchelko: Expose rsyslog_udp_port to services configs. [puppet] - 10https://gerrit.wikimedia.org/r/498872 (https://phabricator.wikimedia.org/T211125) [14:50:39] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/498852 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [14:50:52] (03PS1) 10Andrew Bogott: nfs-mounts: set up NFS for the PAWS project [puppet] - 10https://gerrit.wikimedia.org/r/498893 (https://phabricator.wikimedia.org/T219077) [14:52:20] PROBLEM - HHVM jobrunner on mw1308 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Jobrunner [14:53:36] RECOVERY - HHVM jobrunner on mw1308 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.005 second response time https://wikitech.wikimedia.org/wiki/Jobrunner [14:55:02] (03CR) 10Ppchelko: "Puppet compiler: https://puppet-compiler.wmflabs.org/compiler1002/15323/" [puppet] - 10https://gerrit.wikimedia.org/r/498872 (https://phabricator.wikimedia.org/T211125) (owner: 10Ppchelko) [14:56:25] (03PS2) 10Andrew Bogott: nfs-mounts: set up NFS for the PAWS project [puppet] - 10https://gerrit.wikimedia.org/r/498893 (https://phabricator.wikimedia.org/T219077) [15:00:00] (03CR) 10CRusnov: "New update fixes issues below, I believe. 
It's more heavyweight than just exposing the API (it attempts to make the API's peculiarities hi" (034 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072) (owner: 10CRusnov) [15:00:51] !log updating passenger on rhodium [15:00:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:27] (03PS1) 10Vgutierrez: gerrit: Switch to the directory based deployment used by acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/498900 (https://phabricator.wikimedia.org/T207295) [15:06:25] 10Operations, 10Operations-Software-Development, 10Continuous-Integration-Config, 10Patch-For-Review: Puppet tox: properly lint both Py2 and Py3 files - https://phabricator.wikimedia.org/T184435 (10Bstorm) >>! In T184435#5051826, @Volans wrote: > > Also it seems that at least some files under `modules/ope... [15:07:05] (03CR) 10Ppchelko: [C: 03+1] Re-enable eventgate-analytics api-request logging for group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498873 (https://phabricator.wikimedia.org/T214080) (owner: 10Ottomata) [15:07:46] PROBLEM - puppet last run on wtp1046 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:08:46] (03CR) 10Vgutierrez: [C: 03+2] "TLS material looks as expected:" [puppet] - 10https://gerrit.wikimedia.org/r/498900 (https://phabricator.wikimedia.org/T207295) (owner: 10Vgutierrez) [15:09:59] (03PS1) 10Andrew Bogott: openldap: spruce up the anti-memory-leak cron for replicas [puppet] - 10https://gerrit.wikimedia.org/r/498902 [15:10:55] (03CR) 10jerkins-bot: [V: 04-1] openldap: spruce up the anti-memory-leak cron for replicas [puppet] - 10https://gerrit.wikimedia.org/r/498902 (owner: 10Andrew Bogott) [15:12:19] (03PS2) 10Andrew Bogott: openldap: spruce up the anti-memory-leak cron for replicas [puppet] - 10https://gerrit.wikimedia.org/r/498902 [15:13:08] (03PS3) 10Andrew Bogott: openldap: spruce up the anti-memory-leak cron for replicas [puppet] - 10https://gerrit.wikimedia.org/r/498902 [15:13:19] (03PS4) 10Gehel: multi-instance for elastic deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/497698 (https://phabricator.wikimedia.org/T213940) (owner: 10Mathew.onipe) [15:13:49] (03PS3) 10Jbond: mtail: Add hostname to ulogd metric data [puppet] - 10https://gerrit.wikimedia.org/r/498837 [15:14:41] (03CR) 10Jbond: [C: 03+2] mtail: Add hostname to ulogd metric data [puppet] - 10https://gerrit.wikimedia.org/r/498837 (owner: 10Jbond) [15:15:48] (03CR) 10Gehel: [C: 04-1] "> Patch Set 5:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/495735 (https://phabricator.wikimedia.org/T215523) (owner: 10MSantos) [15:15:54] (03CR) 10Gehel: [C: 03+2] multi-instance for elastic deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/497698 (https://phabricator.wikimedia.org/T213940) (owner: 10Mathew.onipe) [15:16:08] gehel: Thanks! 
[15:16:14] (03PS5) 10Gehel: multi-instance for elastic deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/497698 (https://phabricator.wikimedia.org/T213940) (owner: 10Mathew.onipe) [15:16:16] (03CR) 10Jbond: "> Patch Set 2: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/498837 (owner: 10Jbond) [15:16:36] (03PS1) 10Vgutierrez: icinga: Switch to the directory based deployment used by acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/498904 (https://phabricator.wikimedia.org/T207295) [15:18:23] (03CR) 10Filippo Giunchedi: profile: kafkatee instance for udp2log compat (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) (owner: 10Filippo Giunchedi) [15:18:58] (03PS1) 10Dzahn: add vikipedia.com as parked domain [dns] - 10https://gerrit.wikimedia.org/r/498905 [15:19:17] (03CR) 10Vgutierrez: [C: 03+1] "TLS material looks as expected:" [puppet] - 10https://gerrit.wikimedia.org/r/498904 (https://phabricator.wikimedia.org/T207295) (owner: 10Vgutierrez) [15:20:11] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@ddf26d0]: Ship new logging support code [15:20:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:20:35] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/498904 (https://phabricator.wikimedia.org/T207295) (owner: 10Vgutierrez) [15:20:55] (03CR) 10CDanis: [C: 03+1] icinga: Switch to the directory based deployment used by acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/498904 (https://phabricator.wikimedia.org/T207295) (owner: 10Vgutierrez) [15:21:07] (03CR) 10CRusnov: "I'll take another pass at this after we wrap up some other tasks." 
[puppet] - 10https://gerrit.wikimedia.org/r/496836 (https://phabricator.wikimedia.org/T212526) (owner: 10CRusnov) [15:21:33] (03CR) 10Vgutierrez: [C: 03+2] icinga: Switch to the directory based deployment used by acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/498904 (https://phabricator.wikimedia.org/T207295) (owner: 10Vgutierrez) [15:21:39] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@ddf26d0]: Ship new logging support code (duration: 01m 29s) [15:21:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:21:42] (03PS2) 10Vgutierrez: icinga: Switch to the directory based deployment used by acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/498904 (https://phabricator.wikimedia.org/T207295) [15:23:56] (03PS7) 10Marostegui: install_server: Add db1139,db1140 [puppet] - 10https://gerrit.wikimedia.org/r/498768 (https://phabricator.wikimedia.org/T218985) [15:27:26] (03CR) 10Volans: [C: 04-1] "Few glitches here and there to improve." 
(0310 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/496527 (https://phabricator.wikimedia.org/T203963) (owner: 10CRusnov) [15:27:28] (03PS1) 10Jbond: mtail: add hostname to systemd metric [puppet] - 10https://gerrit.wikimedia.org/r/498911 [15:31:24] (03PS1) 10Santhosh: ExternalGuidance: Allow google translate hosts as known services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498913 (https://phabricator.wikimedia.org/T218948) [15:31:35] (03PS2) 10Muehlenhoff: Add agetty to filter_services list of debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/498852 (https://phabricator.wikimedia.org/T135991) [15:33:26] (03CR) 10Muehlenhoff: [C: 03+2] Add agetty to filter_services list of debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/498852 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [15:34:06] RECOVERY - puppet last run on wtp1046 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:34:43] (03CR) 10Volans: "reply inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) (owner: 10Filippo Giunchedi) [15:37:43] (03CR) 10Mobrovac: Create node-specific logstash filters for syslog. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498417 (https://phabricator.wikimedia.org/T211125) (owner: 10Ppchelko) [15:38:29] (03CR) 10Mobrovac: [C: 03+1] Expose rsyslog_udp_port to services configs. 
[puppet] - 10https://gerrit.wikimedia.org/r/498872 (https://phabricator.wikimedia.org/T211125) (owner: 10Ppchelko) [15:38:40] (03PS1) 10Mathew.onipe: Revert "multi-instance for elastic deployment-prep" [puppet] - 10https://gerrit.wikimedia.org/r/498916 [15:38:52] (03PS1) 10Gehel: Revert "multi-instance for elastic deployment-prep" [puppet] - 10https://gerrit.wikimedia.org/r/498917 [15:39:07] (03PS2) 10Gehel: Revert "multi-instance for elastic deployment-prep" [puppet] - 10https://gerrit.wikimedia.org/r/498917 [15:39:10] (03CR) 10Ppchelko: Create node-specific logstash filters for syslog. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498417 (https://phabricator.wikimedia.org/T211125) (owner: 10Ppchelko) [15:40:37] (03CR) 10Gehel: [C: 03+2] Revert "multi-instance for elastic deployment-prep" [puppet] - 10https://gerrit.wikimedia.org/r/498917 (owner: 10Gehel) [15:40:49] onimisionipe: ^ [15:40:59] Thanks! [15:41:59] (03PS1) 10Vgutierrez: acme_chief: Clean old file based certificate files (1/2) [puppet] - 10https://gerrit.wikimedia.org/r/498920 (https://phabricator.wikimedia.org/T207295) [15:42:01] (03PS1) 10Vgutierrez: acme_chief: Clean old file based certificate files (2/2) [puppet] - 10https://gerrit.wikimedia.org/r/498921 (https://phabricator.wikimedia.org/T207295) [15:44:03] (03PS1) 10CDanis: icinga: delete any commands not explicitly added by puppet [puppet] - 10https://gerrit.wikimedia.org/r/498922 [15:44:54] (03CR) 10jerkins-bot: [V: 04-1] icinga: delete any commands not explicitly added by puppet [puppet] - 10https://gerrit.wikimedia.org/r/498922 (owner: 10CDanis) [15:44:56] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/496813 (https://phabricator.wikimedia.org/T211125) (owner: 10Ppchelko) [15:45:15] (03CR) 10Mobrovac: Create node-specific logstash filters for syslog. 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498417 (https://phabricator.wikimedia.org/T211125) (owner: 10Ppchelko) [15:46:22] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/498872 (https://phabricator.wikimedia.org/T211125) (owner: 10Ppchelko) [15:46:31] (03PS2) 10CDanis: icinga: delete any commands not explicitly added by puppet [puppet] - 10https://gerrit.wikimedia.org/r/498922 [15:47:17] (03CR) 10Bartosz Dziewoński: [C: 03+1] Enable VisualEditor at fiwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446583 (https://phabricator.wikimedia.org/T192135) (owner: 10Zoranzoki21) [15:48:26] (03CR) 10Muehlenhoff: [C: 03+1] icinga: delete any commands not explicitly added by puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498922 (owner: 10CDanis) [15:48:42] !log remove 2nd AS7568 router in Equinix Singapore [15:48:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:42] (03PS4) 10Ppchelko: Create node-specific logstash filters for syslog. [puppet] - 10https://gerrit.wikimedia.org/r/498417 (https://phabricator.wikimedia.org/T211125) [15:49:44] (03CR) 10Ppchelko: Create node-specific logstash filters for syslog. 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498417 (https://phabricator.wikimedia.org/T211125) (owner: 10Ppchelko) [15:49:48] (03CR) 10CDanis: [C: 03+2] "PCC looks good https://puppet-compiler.wmflabs.org/compiler1002/15326/" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498922 (owner: 10CDanis) [15:50:23] (03PS3) 10CDanis: icinga: delete any commands not explicitly added by puppet [puppet] - 10https://gerrit.wikimedia.org/r/498922 [15:51:49] (03PS4) 10CDanis: icinga: delete any commands not explicitly added by puppet [puppet] - 10https://gerrit.wikimedia.org/r/498922 [15:52:45] (03CR) 10CDanis: [C: 03+2] icinga: delete any commands not explicitly added by puppet [puppet] - 10https://gerrit.wikimedia.org/r/498922 (owner: 10CDanis) [15:54:56] PROBLEM - puppet last run on elastic2028 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[openjdk-8-jdk] [15:56:09] (03PS1) 10Bartosz Dziewoński: Enable VisualEditor in Draft namespace ("Mustand") in etwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498924 (https://phabricator.wikimedia.org/T192254) [15:59:22] (03PS1) 10DCausse: mwgrep add support for elasticsearch6 [puppet] - 10https://gerrit.wikimedia.org/r/498927 (https://phabricator.wikimedia.org/T219162) [15:59:56] (03CR) 10Mobrovac: [C: 03+1] "LGTM, nits in-lined" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498417 (https://phabricator.wikimedia.org/T211125) (owner: 10Ppchelko) [16:03:59] (03CR) 10SBassett: [C: 03+1] "Works on mwmaint1002 for me w/ a few minimal test searches." 
[puppet] - 10https://gerrit.wikimedia.org/r/498927 (https://phabricator.wikimedia.org/T219162) (owner: 10DCausse) [16:06:52] 10Operations, 10RESTBase, 10RESTBase-API, 10serviceops, and 3 others: Make RESTBase spec standard compliant and switch to OpenAPI 3.0 - https://phabricator.wikimedia.org/T218218 (10Pchelolo) a:05holger.knust→03Clarakosi For step 2 we need to switch `hyperswitch` to upstream swagger. Nowadays the swag... [16:08:04] (03PS1) 10Dzahn: Revert "mediawiki::maintenance: run tor_exit cron with PHP 7.2" [puppet] - 10https://gerrit.wikimedia.org/r/498928 [16:08:34] (03CR) 10Dzahn: "works fine in prod but fails and causes cron spam on labweb1001" [puppet] - 10https://gerrit.wikimedia.org/r/498928 (owner: 10Dzahn) [16:10:10] (03PS2) 10Dzahn: Revert "mediawiki::maintenance: run tor_exit cron with PHP 7.2" [puppet] - 10https://gerrit.wikimedia.org/r/498928 [16:10:42] RECOVERY - puppet last run on elastic2028 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [16:13:25] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@6eda7d8]: Ship new logging support code via new simplified virtualenv deployment [16:13:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:16:03] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@6eda7d8]: Ship new logging support code via new simplified virtualenv deployment (duration: 02m 38s) [16:16:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:14] aww, somehow the scap !log no longer mentions the failed deploy [16:18:41] 10Operations, 10Patch-For-Review: Tracking and Reducing cron-spam to root@ - https://phabricator.wikimedia.org/T132324 (10elukey) [16:18:53] (03CR) 10Esanders: [C: 03+1] "wfm" [puppet] - 10https://gerrit.wikimedia.org/r/498927 (https://phabricator.wikimedia.org/T219162) (owner: 10DCausse) [16:19:06] !log updating Jenkins plugins and restarting [16:19:08] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:19:42] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@ba09eb5]: Ship new logging support code via new simplified virtualenv deployment [16:19:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:19:51] 10Operations, 10serviceops, 10Patch-For-Review: Use PHP7 to run maintenance scripts - https://phabricator.wikimedia.org/T219135 (10Dzahn) @Joe I tried with the first one above and it was fine in production, mwmaint1002, but then i noticed cron spam from labweb1001/1002. `/usr/local/bin/mwscript: line 25: ph... [16:20:14] (03PS1) 10Krinkle: arclamp: Document YAML config file structure and logging logic [puppet] - 10https://gerrit.wikimedia.org/r/498932 (https://phabricator.wikimedia.org/T176916) [16:20:16] (03PS1) 10Krinkle: arclamp: Rename YAML file from xenon-log to arclamp-log-xenon [puppet] - 10https://gerrit.wikimedia.org/r/498933 (https://phabricator.wikimedia.org/T176916) [16:20:18] (03PS1) 10Krinkle: arclamp: Make Redis subscription channel configurable [puppet] - 10https://gerrit.wikimedia.org/r/498934 (https://phabricator.wikimedia.org/T176916) [16:20:33] (03PS2) 10DCausse: mwgrep: add support for elasticsearch6 [puppet] - 10https://gerrit.wikimedia.org/r/498927 (https://phabricator.wikimedia.org/T219162) [16:21:17] (03CR) 10DCausse: "diff from PS1 is just better error handling" [puppet] - 10https://gerrit.wikimedia.org/r/498927 (https://phabricator.wikimedia.org/T219162) (owner: 10DCausse) [16:24:50] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1020 - https://phabricator.wikimedia.org/T214778 (10Cmjohnson) 05Open→03Resolved This server raid appears to be in optimal condition. I verified the h/w and icinga is not reporting a degraded raid. 
Resolving for now [16:26:08] (03CR) 10EBernhardson: [C: 03+1] mwgrep: add support for elasticsearch6 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498927 (https://phabricator.wikimedia.org/T219162) (owner: 10DCausse) [16:28:26] 10Operations, 10serviceops, 10Patch-For-Review: Use PHP7 to run maintenance scripts - https://phabricator.wikimedia.org/T219135 (10Dzahn) ok, not all of them, just this: ` [labweb1001:~] $ sudo crontab -u www-data -l # HEADER: This file was autogenerated at 2019-03-25 13:48:37 +0000 by puppet .. # Puppet N... [16:28:52] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@ba09eb5]: Ship new logging support code via new simplified virtualenv deployment (duration: 09m 10s) [16:28:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:43] 10Operations, 10Analytics, 10Product-Analytics, 10Patch-For-Review, 10User-Elukey: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 (10mforns) @aborrero ping :] [16:29:52] 10Operations, 10Analytics, 10Product-Analytics, 10Patch-For-Review, 10User-Elukey: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 (10mforns) p:05High→03Normal [16:30:53] 10Operations, 10ops-eqiad, 10Operations-Software-Development, 10monitoring, 10Patch-For-Review: ms-be1043 sdk failed - https://phabricator.wikimedia.org/T218544 (10Cmjohnson) You have successfully submitted request SR988320478. A disk has been ordered [16:31:08] 10Operations, 10Analytics, 10Product-Analytics, 10Patch-For-Review, 10User-Elukey: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 (10elukey) This is my bad, I should I have followed up on this task. There are more variables since I added my last comment to add i... 
[16:32:52] (03CR) 10KartikMistry: [C: 03+1] ExternalGuidance: Allow google translate hosts as known services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498913 (https://phabricator.wikimedia.org/T218948) (owner: 10Santhosh) [16:35:02] (03PS1) 10Mathew.onipe: don't change old cluster name [puppet] - 10https://gerrit.wikimedia.org/r/498937 (https://phabricator.wikimedia.org/T213940) [16:35:24] (03CR) 10jerkins-bot: [V: 04-1] don't change old cluster name [puppet] - 10https://gerrit.wikimedia.org/r/498937 (https://phabricator.wikimedia.org/T213940) (owner: 10Mathew.onipe) [16:39:32] (03PS6) 10CRusnov: Add report which checks against puppetdb and compares serial numbers [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/495267 (https://phabricator.wikimedia.org/T212526) [16:41:41] (03CR) 10CRusnov: "Okay I have excluded ripe-atlas-v1 machines (and set up so we can add arbitrary type slugs to that list). I have also made it so that None" (031 comment) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/495267 (https://phabricator.wikimedia.org/T212526) (owner: 10CRusnov) [16:45:14] (03CR) 10CDanis: [C: 03+1] mtail: add hostname to systemd metric [puppet] - 10https://gerrit.wikimedia.org/r/498911 (owner: 10Jbond) [16:45:47] 10Operations, 10Analytics, 10Product-Analytics, 10Patch-For-Review, 10User-Elukey: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 (10diego) Finally it's not just me squeezing notebooks memory :) [16:47:03] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@2fb5038]: Ship new logging support code via new simplified virtualenv deployment [16:47:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:48:43] (03PS2) 10Mathew.onipe: don't change old cluster name [puppet] - 10https://gerrit.wikimedia.org/r/498937 (https://phabricator.wikimedia.org/T213940) [16:50:51] (03PS3) 10Dzahn: Revert "mediawiki::maintenance: run tor_exit cron with PHP 
7.2" [puppet] - 10https://gerrit.wikimedia.org/r/498928 [16:52:15] 10Operations, 10ops-eqiad: Degraded RAID on sodium - https://phabricator.wikimedia.org/T212010 (10RobH) Ok, synced up with Chris and have the following going on: * emailed our dell account team, they opened SR 987845644 with Ivan Martinez His email (last Friday): > Hello Rob/Chris, > > > > My name is... [16:53:17] (03CR) 10Dzahn: [C: 03+2] Revert "mediawiki::maintenance: run tor_exit cron with PHP 7.2" [puppet] - 10https://gerrit.wikimedia.org/r/498928 (owner: 10Dzahn) [16:56:55] (03PS1) 10Krinkle: arclamp: Rename xenon-log script to arclamp-log [puppet] - 10https://gerrit.wikimedia.org/r/498942 (https://phabricator.wikimedia.org/T176916) [16:56:55] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@2fb5038]: Ship new logging support code via new simplified virtualenv deployment (duration: 09m 52s) [16:56:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:57:19] 10Operations, 10ops-eqiad, 10DC-Ops: icinga1001 mysterious reboots - https://phabricator.wikimedia.org/T210108 (10Cmjohnson) [16:57:24] 10Operations, 10ops-eqiad, 10monitoring, 10Patch-For-Review: icinga1001 crashed - https://phabricator.wikimedia.org/T214760 (10Cmjohnson) 05Stalled→03Resolved I believe this is done now -- resolving [16:58:36] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban: confirm gpu form factor in stat1005 - https://phabricator.wikimedia.org/T216528 (10Cmjohnson) [16:58:46] (03CR) 10EBernhardson: [C: 03+1] [cirrus] switch low volume wikis to eqiad (elastic 6.5.4) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498318 (https://phabricator.wikimedia.org/T218878) (owner: 10DCausse) [17:00:04] gehel and onimisionipe: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190325T1700). 
[17:00:06] fyi, barring any objections I'm going to use the wdqs deploy window to ship a mw-config change (start using elastic6 for low volume wikis) [17:00:28] here here.. Ok dcausse [17:00:59] (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498318 (https://phabricator.wikimedia.org/T218878) (owner: 10DCausse) [17:01:23] 10Operations, 10Analytics, 10hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (10Cmjohnson) [17:01:28] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-Kanban: confirm gpu form factor in stat1005 - https://phabricator.wikimedia.org/T216528 (10Cmjohnson) 05Open→03Resolved Resolving [17:02:06] 10Operations, 10serviceops, 10Patch-For-Review: Use PHP7 to run maintenance scripts - https://phabricator.wikimedia.org/T219135 (10Krinkle) Duplicate of T195392? [17:02:14] (03Merged) 10jenkins-bot: [cirrus] switch low volume wikis to eqiad (elastic 6.5.4) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498318 (https://phabricator.wikimedia.org/T218878) (owner: 10DCausse) [17:02:28] (03CR) 10jenkins-bot: [cirrus] switch low volume wikis to eqiad (elastic 6.5.4) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498318 (https://phabricator.wikimedia.org/T218878) (owner: 10DCausse) [17:03:30] PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916 [17:05:19] (03PS2) 10Dzahn: admin: add perf-roots to admin::groups for mwmaint* [puppet] - 10https://gerrit.wikimedia.org/r/497840 (https://phabricator.wikimedia.org/T217813) (owner: 10Effie Mouzeli) [17:06:10] (03CR) 10Dzahn: [C: 03+2] "approved in SRE meeting" [puppet] - 10https://gerrit.wikimedia.org/r/497840 (https://phabricator.wikimedia.org/T217813) (owner: 10Effie Mouzeli) [17:08:54] 10Operations, 10PHP 7.0 support, 10Patch-For-Review: Audit and sync INI settings as needed between HHVM and PHP 7 - https://phabricator.wikimedia.org/T211488 (10Joe) I 
just finished going through all settings. Given a lot of settings don't correspond 1:1, or have no correspondence at all between the two ap... [17:09:16] (03PS1) 10Zoranzoki21: Remove namespace 104 from FlaggedRevs configuration for arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498947 [17:10:30] !log dcausse@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T218878: [cirrus] switch low volume wikis to eqiad (elastic 6.5.4) (duration: 00m 49s) [17:10:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:10:33] T218878: Upgrade to elasticsearch 6.5.4 for cirrus / codfw - https://phabricator.wikimedia.org/T218878 [17:10:39] (03PS2) 10Zoranzoki21: Remove namespace 104 from FlaggedRevs configuration for arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498947 (https://phabricator.wikimedia.org/T217507) [17:10:46] (03PS1) 10Krinkle: arclamp: Rename xenon-grep to arclamp-grep [puppet] - 10https://gerrit.wikimedia.org/r/498948 (https://phabricator.wikimedia.org/T176916) [17:10:56] onimisionipe: I'm done [17:11:10] dcausse: alright. 
Thanks [17:11:16] will continue from here [17:11:16] !log restart mjolnir-kafka-msearch-daemon on relforge100[12] [17:11:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:11:56] (03PS1) 10Urbanecm: Throttle rule for Wikimedia Hackathon 2019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498949 (https://phabricator.wikimedia.org/T213869) [17:14:26] !log onimisionipe@deploy1001 Started deploy [wdqs/wdqs@281aaf8]: New build for blazegraph and updater plus GUI updates [17:14:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:14:58] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 3.950 second response time https://phabricator.wikimedia.org/T174916 [17:16:34] (03PS1) 10Krinkle: arclamp: Rename xenon-log to arclamp-log [puppet] - 10https://gerrit.wikimedia.org/r/498950 (https://phabricator.wikimedia.org/T176916) [17:18:58] PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916 [17:19:31] 10Operations, 10serviceops, 10Patch-For-Review: Use PHP7 to run maintenance scripts - https://phabricator.wikimedia.org/T219135 (10Dzahn) [17:20:06] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.007 second response time https://phabricator.wikimedia.org/T174916 [17:21:26] 10Operations, 10serviceops, 10Patch-For-Review: Use PHP7 to run maintenance scripts - https://phabricator.wikimedia.org/T219135 (10Dzahn) >>! In T219135#5054561, @Krinkle wrote: > Duplicate of T195392? Yes. Just not sure how to merge them best. Unfortunately Phabricator still does not actually do that and o... 
[17:22:17] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Grant root on MediaWiki maintenance hosts to perf-roots - https://phabricator.wikimedia.org/T217813 (10Dzahn) [17:23:02] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Grant root on MediaWiki maintenance hosts to perf-roots - https://phabricator.wikimedia.org/T217813 (10Dzahn) 05Open→03Resolved This request has been approved in today's SRE meeting and code is now merged. perf-roots members now have shell access... [17:24:10] PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916 [17:24:39] !log restart pdfrender on scb1004 [17:24:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:24:58] !log onimisionipe@deploy1001 Finished deploy [wdqs/wdqs@281aaf8]: New build for blazegraph and updater plus GUI updates (duration: 10m 31s) [17:24:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:25:05] (03PS2) 10Bstorm: sonofgridengine: link hostgroup processing to host processing [puppet] - 10https://gerrit.wikimedia.org/r/498519 (https://phabricator.wikimedia.org/T216151) [17:25:16] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time https://phabricator.wikimedia.org/T174916 [17:26:28] (03CR) 10Bstorm: [C: 03+2] sonofgridengine: link hostgroup processing to host processing [puppet] - 10https://gerrit.wikimedia.org/r/498519 (https://phabricator.wikimedia.org/T216151) (owner: 10Bstorm) [17:27:00] (03PS3) 10GTirloni: nfs-mounts: set up NFS for the PAWS project [puppet] - 10https://gerrit.wikimedia.org/r/498893 (https://phabricator.wikimedia.org/T219077) (owner: 10Andrew Bogott) [17:28:13] (03CR) 10GTirloni: [C: 03+2] nfs-mounts: set up NFS for the PAWS project [puppet] - 10https://gerrit.wikimedia.org/r/498893 (https://phabricator.wikimedia.org/T219077) (owner: 10Andrew Bogott) [17:29:32] zeljkof or others involved 
with SWATs: I saw "Gerrit hashtag: xxx" in Deployments in a wikipage. Does it mean I can just assign this hashtag to a change and it'll be all necessary to schedule a change? [17:29:58] Urbanecm: not sure who wrote that, twentyafterfour? [17:30:06] I'm not sure how it works [17:30:14] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [17:30:19] Ok, thanks zeljkof [17:30:47] Urbanecm: that's part of a plan for the near future, yes [17:31:04] what do I need to do if I want to schedule a change for tomorrow EU SWAT? :-) [17:31:38] Urbanecm: for now use the old process but we would like to switch to using hashtags very soon [17:31:59] twentyafterfour, okay. What place should I watch to be notified about the switch? [17:32:12] or what is "very soon", if that's defined :) [17:32:14] I guess all we really need to do to switch is document it and promote the idea. 
thcipriani and I have been working on this [17:32:33] I'm not sure I guess we just need buy-in from swatters [17:32:44] (03CR) 10Filippo Giunchedi: "For later reference while I'm looking into this, a current log entry for restbase looks like this now in beta:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498417 (https://phabricator.wikimedia.org/T211125) (owner: 10Ppchelko) [17:32:47] ok, thanks twentyafterfour [17:33:04] Urbanecm: we will surely update the deployments page with more instructions when it's ready [17:33:05] twentyafterfour: Urbanecm, probably an announce email to wikitech-l, like most things :) [17:33:27] ok, thanks again twentyafterfour greg-g [17:34:12] (03CR) 10Filippo Giunchedi: "> Patch Set 4:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498417 (https://phabricator.wikimedia.org/T211125) (owner: 10Ppchelko) [17:38:16] (03PS10) 10MacFan4000: Set wgNoticeProjects for wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471663 (https://phabricator.wikimedia.org/T208694) [17:39:12] (03PS5) 10Ppchelko: Create node-specific logstash filters for syslog. 
[puppet] - 10https://gerrit.wikimedia.org/r/498417 (https://phabricator.wikimedia.org/T211125) [17:39:36] (03CR) 10Ppchelko: "The 'severity_label' in the output posted was affected by https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/497321/ that was cherry-p" [puppet] - 10https://gerrit.wikimedia.org/r/498417 (https://phabricator.wikimedia.org/T211125) (owner: 10Ppchelko) [17:40:30] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 229, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [17:45:06] PROBLEM - Packet loss ratio for UDP on logstash1007 is CRITICAL: 0.3411 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash [17:45:36] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 55, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [17:45:40] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [17:49:00] RECOVERY - Packet loss ratio for UDP on logstash1007 is OK: (C)0.1 ge (W)0.05 ge 0 https://grafana.wikimedia.org/dashboard/db/logstash [17:51:25] (03PS2) 10Krinkle: arclamp: Rename xenon-grep to arclamp-grep [puppet] - 10https://gerrit.wikimedia.org/r/498948 (https://phabricator.wikimedia.org/T176916) [17:51:27] (03PS2) 10Krinkle: arclamp: Rename xenon-log to arclamp-log [puppet] - 10https://gerrit.wikimedia.org/r/498950 (https://phabricator.wikimedia.org/T176916) [17:53:06] (03PS2) 10Krinkle: arclamp: Rename xenon-log script to arclamp-log in Puppet [puppet] - 10https://gerrit.wikimedia.org/r/498942 (https://phabricator.wikimedia.org/T176916) [17:53:08] (03PS3) 10Krinkle: arclamp: Rename xenon-grep to arclamp-grep [puppet] - 
10https://gerrit.wikimedia.org/r/498948 (https://phabricator.wikimedia.org/T176916) [17:53:10] (03PS3) 10Krinkle: arclamp: Rename provisioning of xenon-log to arclamp-log [puppet] - 10https://gerrit.wikimedia.org/r/498950 (https://phabricator.wikimedia.org/T176916) [17:56:23] (03CR) 10Krinkle: "Clean compiler output for the first 4 patches including this one:" [puppet] - 10https://gerrit.wikimedia.org/r/498942 (https://phabricator.wikimedia.org/T176916) (owner: 10Krinkle) [18:00:04] Deploy window Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190325T1800) [18:00:04] dmaza, Zoranzoki21, dcausse, ottomata, and MatmaRex: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:15] here [18:00:19] o/ [18:00:56] (03CR) 10Krinkle: [C: 03+1] "need mwgrep :)" [puppet] - 10https://gerrit.wikimedia.org/r/498927 (https://phabricator.wikimedia.org/T219162) (owner: 10DCausse) [18:00:57] Here :) [18:01:41] Reedy: Hi, Have you think about https://phabricator.wikimedia.org/T217486? [18:02:21] I suppose I can SWAT [18:02:29] here [18:02:35] (03PS1) 10CDanis: sync_icinga_state: copy permissions/group/owner as well [puppet] - 10https://gerrit.wikimedia.org/r/498954 [18:03:14] Can you deploy my patches first, I have no much time [18:04:07] Reedy: Nobody from SWAT member will merge the patch, until your approval. 
[18:04:22] PROBLEM - Packet loss ratio for UDP on logstash1007 is CRITICAL: 0.1234 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash [18:04:43] Zoranzoki21: yes just after dmaza config patch, I've just hit °2 [18:04:44] (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496202 (https://phabricator.wikimedia.org/T211622) (owner: 10Dmaza) [18:04:46] +2 [18:04:56] dcausse: Ok [18:05:42] hi [18:06:22] i suppose i might be over the limit, i can reschedule to next swat if we run out of time [18:06:32] this is so cool. 3 SWAT members are online [18:06:56] RECOVERY - Packet loss ratio for UDP on logstash1007 is OK: (C)0.1 ge (W)0.05 ge 0 https://grafana.wikimedia.org/dashboard/db/logstash [18:07:37] (03PS7) 10DCausse: Enforce 8 char password length requirements for non-privileged users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496202 (https://phabricator.wikimedia.org/T211622) (owner: 10Dmaza) [18:08:59] Wait, would that (the copy upload thing) even work [18:09:09] We used to have commons firewalled off... [18:09:09] (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496202 (https://phabricator.wikimedia.org/T211622) (owner: 10Dmaza) [18:10:13] (03Merged) 10jenkins-bot: Enforce 8 char password length requirements for non-privileged users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496202 (https://phabricator.wikimedia.org/T211622) (owner: 10Dmaza) [18:11:29] dmaza: live on mwdebug1002, can you test? [18:11:49] yes, testing. 
[18:13:36] looks good here [18:13:39] dcausse: [18:13:45] ok [18:13:46] (03PS1) 10Paladox: Merge branch 'stable-2.15' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/498961 [18:14:05] (03CR) 10Paladox: [V: 03+2 C: 03+2] Merge branch 'stable-2.15' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/498961 (owner: 10Paladox) [18:14:20] Question: How I can en-masse remove myself as reviewer? [18:14:52] (03CR) 10jenkins-bot: Enforce 8 char password length requirements for non-privileged users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496202 (https://phabricator.wikimedia.org/T211622) (owner: 10Dmaza) [18:15:26] !log dcausse@deploy1001 Synchronized wmf-config/CommonSettings.php: T211622: Enforce 8 char password length requirements for non-privileged users (duration: 00m 50s) [18:15:28] bawolff: I don't know about commons firewall. But testwiki works fine with this's patch config as community want for pawiki. [18:15:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:39] T211622: Change password length requirement and ensure enforcement for non-privileged users (from 1 to 8) - https://phabricator.wikimedia.org/T211622 [18:15:41] Well if testwiki works fine, then that's cool [18:15:43] dmaza: done, still waiting for CI on your other patch [18:15:54] dcausse: thank you.. 
i'll be here [18:16:01] (03PS8) 10Zoranzoki21: Add category at wgGettingStartedExcludedCategories for srwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482534 [18:16:09] historically it didn't work, but maybe that fixed [18:16:24] (03PS9) 10Zoranzoki21: Add categories for all Croatian projects at wmgBabelMainCategory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482548 [18:16:34] (03PS3) 10Zoranzoki21: Remove namespace 104 from FlaggedRevs configuration for arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498947 (https://phabricator.wikimedia.org/T217507) [18:16:42] (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446583 (https://phabricator.wikimedia.org/T192135) (owner: 10Zoranzoki21) [18:17:28] bawolff: Yeah that is not worked in past. But Now it is working fine. I uploaded https://test.wikipedia.org/wiki/File:AgentCARTER.png via api. [18:18:01] (03Merged) 10jenkins-bot: Enable VisualEditor at fiwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446583 (https://phabricator.wikimedia.org/T192135) (owner: 10Zoranzoki21) [18:18:39] dcausse: mwdebug? [18:20:14] bawolff: The only problem is here that zeljkofilipin told me to get +1 from Reedy or some other trusted user. But whenever I called Reedy on IRC, I did not get any reply. [18:20:21] dcausse: i'm going to get a coffee real quick, be back in < 10 mins [18:20:46] (03CR) 10Ayounsi: [C: 03+1] "I haven't tested it, but the logic and code looks good to me." 
[software/netbox-reports] - 10https://gerrit.wikimedia.org/r/495267 (https://phabricator.wikimedia.org/T212526) (owner: 10CRusnov) [18:20:59] Jayprakash12345: Well its kind of late for him now I guess [18:21:19] 10Operations, 10MediaWiki-Cache, 10MW-1.33-notes (1.33.0-wmf.21; 2019-03-12), 10Patch-For-Review, and 3 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10elukey) Today I've ran tcpdump on mc1022 and stop... [18:21:51] Zoranzoki21: live on mwdebug1002, can you test? [18:21:56] dcausse: Sure [18:22:07] dmaza: FlaggedRev is on mwdebug1002 as well [18:22:17] dcausse: thank you.. testing [18:22:50] dcausse: Ok. [18:22:52] works [18:23:20] Zoranzoki21: deploying [18:25:04] PROBLEM - HHVM rendering on mwdebug1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [18:26:08] (03CR) 10jenkins-bot: Enable VisualEditor at fiwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446583 (https://phabricator.wikimedia.org/T192135) (owner: 10Zoranzoki21) [18:26:59] !log dcausse@deploy1001 Synchronized dblists/visualeditor-nondefault.dblist: T192135 (duration: 00m 50s) [18:27:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:04] T192135: Enable visual editor at fi.wikibooks.org - https://phabricator.wikimedia.org/T192135 [18:27:36] RECOVERY - HHVM rendering on mwdebug1002 is OK: HTTP OK: HTTP/1.1 200 OK - 80016 bytes in 5.706 second response time https://wikitech.wikimedia.org/wiki/Application_servers [18:28:15] I am back [18:28:23] Zoranzoki21: you're change is live [18:28:40] Which [18:28:50] For visual editor [18:28:58] Zoranzoki21: I'd need a +1 for https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/482534 [18:29:01] dcausse: looks good here [18:29:03] Zoranzoki21: yes [18:29:10] dmaza: ok shipping [18:29:39] Ok... 
[18:31:00] What I have all of patches [18:31:04] I am on phone [18:31:42] ... [18:31:51] Zoranzoki21: none of your last 3 patches have a +1 :( [18:31:59] Ok [18:32:07] !log dcausse@deploy1001 Synchronized php-1.33.0-wmf.22/extensions/FlaggedRevs/: T218949: Fix reject changes when user is partially blocked (duration: 00m 51s) [18:32:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:32:10] T218949: When flagged revisions are in effect, users cannot review revisions if partially blocked - https://phabricator.wikimedia.org/T218949 [18:32:14] dmaza: ^ live [18:32:14] But I need for FR merged patch [18:32:38] dcausse: thank you [18:32:38] * ottomata back [18:32:39] Zoranzoki21: perhaps create a phab task attached to them? [18:32:45] It is there? [18:32:56] Included already [18:33:04] Zoranzoki21: only https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/498947 have a phab task [18:33:09] but no +1 :( [18:33:23] Can you give me link for FR patch [18:33:33] I am on phone [18:33:38] Now [18:33:46] Zoranzoki21: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/498947 [18:34:41] I see task number in commit message [18:35:03] What I nees [18:35:04] Zoranzoki21: yes, reviewing now [18:35:06] *need [18:35:58] (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498947 (https://phabricator.wikimedia.org/T217507) (owner: 10Zoranzoki21) [18:36:09] (03CR) 10Lucas Werkmeister (WMDE): Remove namespace 104 from FlaggedRevs configuration for arwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498947 (https://phabricator.wikimedia.org/T217507) (owner: 10Zoranzoki21) [18:36:17] Zoranzoki21: I'm going to ship the FlaggedRev patch [18:37:00] Hmm [18:37:09] (03Merged) 10jenkins-bot: Remove namespace 104 from FlaggedRevs configuration for arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498947 (https://phabricator.wikimedia.org/T217507) (owner: 10Zoranzoki21) 
[18:37:23] (03CR) 10jenkins-bot: Remove namespace 104 from FlaggedRevs configuration for arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498947 (https://phabricator.wikimedia.org/T217507) (owner: 10Zoranzoki21) [18:37:52] Everything is ok? [18:38:43] Zoranzoki21: can you test FlaggedRev on arwiki&mwdebug1002? [18:38:56] I will [18:39:17] Zoranzoki21: it's live on mwdebug1002 [18:39:50] Ok is [18:41:06] !log dcausse@deploy1001 scap failed: average error rate on 6/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) [18:41:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:42:06] (03PS1) 10DCausse: Revert "Remove namespace 104 from FlaggedRevs configuration for arwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498966 [18:42:17] Lucas_WMDE: thanks [18:42:26] What is going on? [18:42:50] Zoranzoki21: your patch is broken, see comments from Lucas, I'm reverting [18:42:51] dcausse: did it blow up on mwdebug1002? [18:43:01] ok [18:43:03] Lucas_WMDE: canary servers [18:43:24] Zoranzoki21: OMG.. Let's revert it and I will do it for another window [18:43:29] good thing we have those [18:43:32] Oops.. 
dcausse [18:43:38] !log restart mjolnir-kafka-msearch-daemon across cirrus elasticsearch servers [18:43:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:43:49] (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498966 (owner: 10DCausse) [18:44:55] (03Merged) 10jenkins-bot: Revert "Remove namespace 104 from FlaggedRevs configuration for arwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498966 (owner: 10DCausse) [18:46:46] !log dcausse@deploy1001 Synchronized wmf-config/flaggedrevs.php: revert T217507 (duration: 00m 49s) [18:46:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:46:52] T217507: FlaggedRevs still treats a removed namespace as if it still exists (arwiki) - https://phabricator.wikimedia.org/T217507 [18:47:03] ottomata: around? [18:47:07] yup [18:47:55] (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496446 (https://phabricator.wikimedia.org/T218260) (owner: 10Ppchelko) [18:48:06] ottomata: is there something you can test on mwdebug1002? [18:49:05] dcausse: yes i thiink so! i can get to a group0 wiki on mwdebug ya... [18:49:08] got to remember, tes.wm.? [18:49:10] test.wp i mean? [18:49:21] yes [18:49:33] yes, dcausse if you deploy those two patches there, i can test on mwdebug1002 [18:49:39] ottomata: ok [18:49:58] dcausse: looks like my patch is throwing some exceptions.. how/who/when can revert this? I don't know the process [18:50:27] dmaza: prep a revert patch and I'll ship it [18:50:40] k.. let me see how to do that [18:51:13] (03PS4) 10DCausse: [EventBus] Decrease timeout and use hasty mode for analytics. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/496446 (https://phabricator.wikimedia.org/T218260) (owner: 10Ppchelko) [18:51:31] (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496446 (https://phabricator.wikimedia.org/T218260) (owner: 10Ppchelko) [18:51:44] dcausse: fyi those patches can go together, i can't test the first one without the second one really. [18:51:50] but if we need to revert we can just revert the 2nd one [18:51:55] ottomata: ok [18:52:33] (03Merged) 10jenkins-bot: [EventBus] Decrease timeout and use hasty mode for analytics. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496446 (https://phabricator.wikimedia.org/T218260) (owner: 10Ppchelko) [18:52:40] dmaza: is it PHP Warning: in_array() expects parameter 2 to be an array or collection ? [18:52:47] yes [18:52:52] if yes it's my fault, I shipped a bad patch [18:52:59] I reverted it [18:53:00] oh [18:53:12] it was only canary servers [18:53:20] sorry.. so we are good then? [18:53:34] dmaza: double check that the errors are gone but yes I think we're good [18:53:52] ok.. I'll keep an eye out. Thank you [18:54:00] PROBLEM - Disk space on elastic1017 is CRITICAL: DISK CRITICAL - free space: /srv 28242 MB (5% inode=99%) [18:54:24] (03PS2) 10DCausse: Re-enable eventgate-analytics api-request logging for group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498873 (https://phabricator.wikimedia.org/T214080) (owner: 10Ottomata) [18:54:26] (03CR) 10jenkins-bot: Revert "Remove namespace 104 from FlaggedRevs configuration for arwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498966 (owner: 10DCausse) [18:54:28] (03CR) 10jenkins-bot: [EventBus] Decrease timeout and use hasty mode for analytics. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/496446 (https://phabricator.wikimedia.org/T218260) (owner: 10Ppchelko) [18:56:14] PROBLEM - Packet loss ratio for UDP on logstash1008 is CRITICAL: 0.1136 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash [18:56:39] (03CR) 10DCausse: [C: 03+2] Re-enable eventgate-analytics api-request logging for group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498873 (https://phabricator.wikimedia.org/T214080) (owner: 10Ottomata) [18:57:30] RECOVERY - Packet loss ratio for UDP on logstash1008 is OK: (C)0.1 ge (W)0.05 ge 0 https://grafana.wikimedia.org/dashboard/db/logstash [18:58:02] (03Merged) 10jenkins-bot: Re-enable eventgate-analytics api-request logging for group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498873 (https://phabricator.wikimedia.org/T214080) (owner: 10Ottomata) [18:58:56] ottomata: should be live on mwdebug1002 [18:59:03] k testing [19:00:37] any objections if I extend the SWAT window? [19:00:52] dcausse: looks good to me, proceed [19:01:02] ok [19:01:11] 10Operations, 10serviceops, 10Core Platform Team (Security, stability, performance and scalability (TEC1)), 10Core Platform Team Kanban (Doing), and 3 others: Enabling api-request eventgate to group1 caused minor service disruptions - https://phabricator.wikimedia.org/T218255 (10Pchelolo) [19:02:49] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@1cbf290]: Roll update to mjolnir-bulk-daemon es6 handling of super_detect_noop [19:02:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:02:56] (03PS1) 10Bstorm: sonofgridengine: SGE_DEBUG_LEVEL required for sge shadow to run [puppet] - 10https://gerrit.wikimedia.org/r/498973 (https://phabricator.wikimedia.org/T211055) [19:02:58] ottomata: "I see Unable to deliver all events: 202: 1 events hastily received." as ERROR in logstash from mwdebug1002 [19:03:04] is this OK? 
[19:03:08] (03PS8) 10Arturo Borrero Gonzalez: WMCS: introduce sssd, replacing nscd/nslcd [puppet] - 10https://gerrit.wikimedia.org/r/498359 (https://phabricator.wikimedia.org/T218126) [19:03:35] 10Operations, 10serviceops, 10Core Platform Team (Security, stability, performance and scalability (TEC1)), 10Core Platform Team Kanban (Doing), and 3 others: Enabling api-request eventgate to group1 caused minor service disruptions - https://phabricator.wikimedia.org/T218255 (10Ottomata) [19:03:42] Unable? [19:03:43] dcausse: no, it's not... [19:03:52] as error [19:04:00] i'm trying to load logstash too [19:04:12] RECOVERY - Disk space on elastic1017 is OK: DISK OK [19:04:14] open logstash and filter using host:mwdebug1002 [19:04:20] (03CR) 10jerkins-bot: [V: 04-1] WMCS: introduce sssd, replacing nscd/nslcd [puppet] - 10https://gerrit.wikimedia.org/r/498359 (https://phabricator.wikimedia.org/T218126) (owner: 10Arturo Borrero Gonzalez) [19:05:00] ottomata: we forgot about https://github.com/wikimedia/mediawiki-extensions-EventBus/blob/master/includes/EventBus.php#L157 [19:05:22] ah ha [19:05:23] should I revert both config patches? [19:05:26] eventbus status check. [19:05:28] hmm [19:05:44] so dcausse i believe its actually working, its just that eventbus extension thinks that 202 is an error [19:05:57] we should just check 2xx. [19:06:16] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@1cbf290]: Roll update to mjolnir-bulk-daemon es6 handling of super_detect_noop (duration: 03m 27s) [19:06:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:06:23] (03CR) 10jenkins-bot: Re-enable eventgate-analytics api-request logging for group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498873 (https://phabricator.wikimedia.org/T214080) (owner: 10Ottomata) [19:06:29] we can revert the group0 change [19:06:34] no need to revert the other one. 
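Note on the status check discussed above: ottomata's suggestion ("we should just check 2xx") can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the EventBus extension's actual PHP code. Both function names are hypothetical, and the strict variant's default of 201 is an assumed example, since the log does not say which code the extension compared against. The point is only that a range check accepts the 202 seen in the "hastily received" message, while an exact-match check reports it as an error.

```python
def is_delivery_success(status_code: int) -> bool:
    """Treat every HTTP 2xx status as a successful delivery."""
    return 200 <= status_code < 300


def is_delivery_success_strict(status_code: int, expected: int = 201) -> bool:
    """Hypothetical strict check: only one exact code counts as success.

    The default of 201 is an assumption for illustration; the log does not
    say which code the extension compared against.
    """
    return status_code == expected


if __name__ == "__main__":
    # Compare both checks for a created, an accepted, and a failed response.
    for code in (201, 202, 500):
        print(code, is_delivery_success(code), is_delivery_success_strict(code))
```

With the range check, a 202 response counts as delivered, which matches ottomata's reading that the events were arriving and only the error reporting was wrong.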
[19:07:05] (03PS9) 10Arturo Borrero Gonzalez: WMCS: introduce sssd, replacing nscd/nslcd [puppet] - 10https://gerrit.wikimedia.org/r/498359 (https://phabricator.wikimedia.org/T218126) [19:07:07] (03PS1) 10DCausse: Revert "Re-enable eventgate-analytics api-request logging for group0 wikis" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498975 [19:07:09] ottomata: I need to go afk for some time, I'll create a change for EventBus later this evening unless you create a change [19:07:15] ottomata: ok reverting [19:07:18] Pcheloloi'll do now [19:07:20] ok thanks dcausse [19:07:23] thanks for catching that [19:07:31] np [19:07:45] it would have cause group0 wikis to all report those errors for all api requests, even though all was working fine. [19:07:54] we can SWAT both tomorrow [19:08:10] ok reverting both then [19:08:51] (03CR) 10DCausse: [C: 03+2] Revert "Re-enable eventgate-analytics api-request logging for group0 wikis" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498975 (owner: 10DCausse) [19:09:05] (03PS1) 10DCausse: Revert "[EventBus] Decrease timeout and use hasty mode for analytics." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498976 [19:09:22] dcausse: you don't need to revert the decrease timeout one !:) [19:09:33] ok ok :)= [19:09:47] it won't matter if you do, but we'll just have to re-swat it tomorrow [19:09:49] ottomata: so I need to ship it :) [19:09:52] oh [19:09:57] yeah, if we're here let's ship it [19:09:59] if you don't mind [19:09:59] (03Merged) 10jenkins-bot: Revert "Re-enable eventgate-analytics api-request logging for group0 wikis" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498975 (owner: 10DCausse) [19:10:01] it will be inactive [19:10:10] ok [19:10:20] (03Abandoned) 10DCausse: Revert "[EventBus] Decrease timeout and use hasty mode for analytics." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/498976 (owner: 10DCausse) [19:10:27] until its enabled on a wiki, but its nice to just have it out there so we don't need to move multiple patches at once tomorrow [19:13:14] !log dcausse@deploy1001 Synchronized wmf-config/CommonSettings.php: T218260 (duration: 00m 49s) [19:13:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:13:23] T218260: Decrease timeout for EventBus extension for analytics events - https://phabricator.wikimedia.org/T218260 [19:13:24] ottomata: live ^ [19:14:12] MatmaRex: around [19:14:22] ? [19:14:26] dcausse: yeah [19:14:34] (03CR) 10Bstorm: [C: 03+2] sonofgridengine: SGE_DEBUG_LEVEL required for sge shadow to run [puppet] - 10https://gerrit.wikimedia.org/r/498973 (https://phabricator.wikimedia.org/T211055) (owner: 10Bstorm) [19:14:59] MatmaRex: ok to deploy your patch? [19:15:05] sure [19:15:22] (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498924 (https://phabricator.wikimedia.org/T192254) (owner: 10Bartosz Dziewoński) [19:15:28] ok shipping [19:15:58] (03PS2) 10DCausse: Enable VisualEditor in Draft namespace ("Mustand") in etwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498924 (https://phabricator.wikimedia.org/T192254) (owner: 10Bartosz Dziewoński) [19:17:15] (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498924 (https://phabricator.wikimedia.org/T192254) (owner: 10Bartosz Dziewoński) [19:17:32] (03CR) 10jenkins-bot: Revert "Re-enable eventgate-analytics api-request logging for group0 wikis" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498975 (owner: 10DCausse) [19:17:45] thankd dcausse [19:17:50] (03PS2) 10DCausse: [cirrus] switch all wikis to eqiad (elastic 6.5.4) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498319 (https://phabricator.wikimedia.org/T218878) [19:18:12] (03Merged) 10jenkins-bot: Enable VisualEditor in Draft namespace 
("Mustand") in etwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498924 (https://phabricator.wikimedia.org/T192254) (owner: 10Bartosz Dziewoński) [19:18:25] (03CR) 10jenkins-bot: Enable VisualEditor in Draft namespace ("Mustand") in etwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498924 (https://phabricator.wikimedia.org/T192254) (owner: 10Bartosz Dziewoński) [19:18:41] MatmaRex: it's live on mwdebug1002 [19:19:26] dcausse: seems good [19:19:33] chaomodus: I'm seeing puppet errors on the puppet compiler hosts "Could not find data item profile::puppetdb::microservice::enabled in any Hiera data file " [19:19:37] MatmaRex: thanks for testing, shipping [19:19:43] Do you have a guess about whether I should set that to 'true' or 'false' for those hosts? [19:19:52] (It looks to me like you introduced that var) [19:20:23] my recommendation would be to set it to false [19:20:28] ok, will try [19:20:43] Sorry about the trouble [19:21:04] (03PS3) 10DCausse: [cirrus] switch all wikis to eqiad (elastic 6.5.4) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498319 (https://phabricator.wikimedia.org/T218878) [19:21:06] Can you think of other things I might need to provide, related to that change? 
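Note on the Hiera error quoted above: the compiler fails on the first key it cannot find, so missing keys surface one at a time (the conversation soon runs into a second one, profile::puppetdb::microservice::port). A rough way to list every missing key in one pass is sketched below. It is a plain Python illustration, not Puppet or Hiera code. The helper name, the REQUIRED_KEYS list, and the host_data dictionary are hypothetical; only the two key names and the recommended value of false for "enabled" come from the log.

```python
from typing import Any, Dict, Iterable, List


def missing_hiera_keys(data: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required keys that have no entry in the data at all.

    Membership is tested with `in`, not truthiness, so a key that is
    explicitly set to False still counts as present.
    """
    return [key for key in required if key not in data]


# Key names taken from the log; treating them as the full list is an assumption.
REQUIRED_KEYS = [
    "profile::puppetdb::microservice::enabled",
    "profile::puppetdb::microservice::port",
]

# Hypothetical per-host data after applying the "set it to false" suggestion.
host_data = {"profile::puppetdb::microservice::enabled": False}

if __name__ == "__main__":
    print(missing_hiera_keys(host_data, REQUIRED_KEYS))
```

Checking the whole list up front avoids the fix-one, hit-the-next cycle seen in the conversation.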
[19:21:15] !log dcausse@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T192254 (duration: 00m 49s) [19:21:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:21:19] T192254: Enable Visual Editor in Draft namespace (102) in etwiki - https://phabricator.wikimedia.org/T192254 [19:21:24] MatmaRex: done [19:21:45] shipping my match now [19:22:10] thanks [19:22:18] (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498319 (https://phabricator.wikimedia.org/T218878) (owner: 10DCausse) [19:23:24] (03Merged) 10jenkins-bot: [cirrus] switch all wikis to eqiad (elastic 6.5.4) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498319 (https://phabricator.wikimedia.org/T218878) (owner: 10DCausse) [19:23:53] oops, now it wants profile::puppetdb::microservice::port [19:24:45] chaomodus: ^ ? [19:24:57] (sorry, I'm acting a bit helpless because I don't know what any of this is) [19:26:16] boh [19:28:43] (03CR) 10jenkins-bot: [cirrus] switch all wikis to eqiad (elastic 6.5.4) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498319 (https://phabricator.wikimedia.org/T218878) (owner: 10DCausse) [19:32:49] dcausse: is SWAT completed? 
[19:32:58] volans: almost [19:33:08] !log dcausse@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [cirrus] switch all wikis to eqiad (elastic 6.5.4) (duration: 00m 50s) [19:33:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:33:22] volans: gimme a sec to monitor ^ [19:33:23] dcausse: ack, no prob, just wasn't sure and I need to restart Icinga but don't want in the middle of deployments ;) [19:33:30] no hurry [19:33:32] take your time [19:33:35] ok [19:37:29] !log morning SWAT done [19:37:33] volans: ^ [19:37:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:38:17] dcausse: ack, thanks [19:40:19] !log restart icinga on icinga1001 to reset modified attributes [19:40:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:42:14] 10Operations, 10CirrusSearch, 10Discovery-Search, 10Patch-For-Review: Deprecation warning on elasticsearch 6 - https://phabricator.wikimedia.org/T218994 (10dcausse) [19:45:49] PROBLEM - Packet loss ratio for UDP on logstash1009 is CRITICAL: 0.1044 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash [19:48:14] !log elasticsearch search cluster: SET "logger.org.elasticsearch.deprecation.index.query.functionscore.ScoreFunctionBuilder" to "ERROR" to chi/psi/omega@eqiad (T218994) [19:48:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:48:17] T218994: Deprecation warning on elasticsearch 6 - https://phabricator.wikimedia.org/T218994 [19:49:49] PROBLEM - Mjolnir bulk update failure check - eqiad on icinga1001 is CRITICAL: 5.425e+06 gt 2 https://grafana.wikimedia.org/d/000000591/elasticsearch-mjolnir-bulk-updates?orgId=1&from=now-7d&to=now&panelId=1&fullscreen [19:50:49] RECOVERY - Packet loss ratio for UDP on logstash1009 is OK: (C)0.1 ge (W)0.05 ge 0.009885 https://grafana.wikimedia.org/dashboard/db/logstash [19:54:31] !log elasticsearch search cluster: SET 
"logger.org.elasticsearch.common.logging.DeprecationLogger" to "ERROR" to psi/omega@eqiad (T218994) [19:54:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:54:34] T218994: Deprecation warning on elasticsearch 6 - https://phabricator.wikimedia.org/T218994 [19:55:30] 10Operations, 10CirrusSearch, 10Discovery-Search, 10Patch-For-Review: Deprecation warning on elasticsearch 6 - https://phabricator.wikimedia.org/T218994 (10dcausse) [20:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: That opportune time is upon us again. Time for a Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190325T2000). [20:06:31] 10Operations, 10MediaWiki-Cache, 10MW-1.33-notes (1.33.0-wmf.21; 2019-03-12), 10Patch-For-Review, and 3 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10elukey) Even more interesting, is that if I run t... [20:11:23] (03CR) 10Cwhite: [C: 03+1] service::uwsgi: Allow instances to disable logging config [puppet] - 10https://gerrit.wikimedia.org/r/498516 (https://phabricator.wikimedia.org/T217932) (owner: 10BryanDavis) [20:14:00] (03CR) 10Cwhite: [C: 03+1] uwsgi::app: Handle ensure absent [puppet] - 10https://gerrit.wikimedia.org/r/498641 (owner: 10Alex Monk) [20:15:30] 10Operations, 10CirrusSearch, 10Discovery-Search, 10Patch-For-Review: Deprecation warning on elasticsearch 6 - https://phabricator.wikimedia.org/T218994 (10EBernhardson) [20:16:09] (03PS2) 10Jforrester: [BETA] SDC: Stop setting up old-style federation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498432 [20:16:19] (03CR) 10Jforrester: [C: 03+2] "Let's try this out." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/498432 (owner: 10Jforrester) [20:17:23] (03Merged) 10jenkins-bot: [BETA] SDC: Stop setting up old-style federation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498432 (owner: 10Jforrester) [20:22:22] 10Operations, 10CirrusSearch, 10Discovery-Search, 10Patch-For-Review: Deprecation warning on elasticsearch 6 - https://phabricator.wikimedia.org/T218994 (10EBernhardson) [20:22:29] (03PS2) 10Jforrester: SDC: Stop setting wgWBRepoSettings['foreignRepositories'] old-style federation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498433 [20:23:30] (03CR) 10jenkins-bot: [BETA] SDC: Stop setting up old-style federation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498432 (owner: 10Jforrester) [20:27:01] (03CR) 10Cwhite: "Looks pretty good. One suggestion inline." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498655 (owner: 10Alex Monk) [20:28:00] 10Operations, 10CirrusSearch, 10Discovery-Search, 10Patch-For-Review: Deprecation warning on elasticsearch 6 - https://phabricator.wikimedia.org/T218994 (10EBernhardson) Went through logstash and filtered one error at a time to collect a list of new warnings. Task description has been updated will all of t... [20:29:11] (03CR) 10Cwhite: "Looks good. One suggestion inline." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498645 (owner: 10Alex Monk) [20:32:29] !log T218994 set various deprecation channels on all six cirrus elasticsearch clusters to ERROR [20:32:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:32:32] T218994: Deprecation warning on elasticsearch 6 - https://phabricator.wikimedia.org/T218994 [20:33:03] 10Operations, 10CirrusSearch, 10Discovery-Search, 10Patch-For-Review: Deprecation warning on elasticsearch 6 - https://phabricator.wikimedia.org/T218994 (10EBernhardson) [20:39:41] (03CR) 10Ayounsi: [C: 03+2] Add Icinga alert to ping-offload dashboard alerts [puppet] - 10https://gerrit.wikimedia.org/r/498264 (https://phabricator.wikimedia.org/T190090) (owner: 10Ayounsi) [20:41:58] (03CR) 10Alex Monk: ferm::service: Allow ensure absent without proto/port (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498645 (owner: 10Alex Monk) [20:43:36] (03CR) 10CRusnov: [C: 03+1] "LGTM, dashboard appears to be the correct one." [puppet] - 10https://gerrit.wikimedia.org/r/498264 (https://phabricator.wikimedia.org/T190090) (owner: 10Ayounsi) [20:47:36] (03PS3) 10Ayounsi: Add Icinga alert to ping-offload dashboard alerts [puppet] - 10https://gerrit.wikimedia.org/r/498264 (https://phabricator.wikimedia.org/T190090) [20:50:34] (03CR) 10Hashar: "That follow up a proposal from last week and IIRC Giuseppe and Faidon participated as well (hence adding you two as reviewers)." [puppet] - 10https://gerrit.wikimedia.org/r/498431 (owner: 10Hashar) [20:59:36] (03CR) 10Cwhite: "My main concern with this is how the script handles unreachable puppetmasters or puppetmasters that for some reason cannot perform the mer" [puppet] - 10https://gerrit.wikimedia.org/r/497069 (owner: 10Andrew Bogott) [21:00:04] bawolff and Reedy: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Weekly Security deployment window . 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190325T2100). [21:01:40] * ebernhardson cringes [21:02:11] :/ [21:05:59] I've a config fix for production I'd like to push if the window is not being used? [21:06:57] addshore: Also, is https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/490104/ needed to land? When? Who will do that? :-) [21:07:52] it isn;t blocking anything, but someone will land it, eventually! [21:08:11] Kk, as long as you've not been awkwardly waiting for me to land it, cool. ;-) [21:08:23] (03CR) 10Jforrester: [C: 03+2] SDC: Stop setting wgWBRepoSettings['foreignRepositories'] old-style federation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498433 (owner: 10Jforrester) [21:09:21] (03Merged) 10jenkins-bot: SDC: Stop setting wgWBRepoSettings['foreignRepositories'] old-style federation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498433 (owner: 10Jforrester) [21:09:39] (03PS6) 10MSantos: Pass flag use_nodejs10 for maps services [puppet] - 10https://gerrit.wikimedia.org/r/495735 (https://phabricator.wikimedia.org/T215523) [21:11:39] (03PS1) 10Jforrester: Revert "SDC: Stop setting wgWBRepoSettings['foreignRepositories'] old-style federation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498992 [21:11:50] addshore: Nope, turns out we need "old style federation". [21:11:55] (03CR) 10Jforrester: [C: 03+2] Revert "SDC: Stop setting wgWBRepoSettings['foreignRepositories'] old-style federation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498992 (owner: 10Jforrester) [21:12:17] we're not using the window [21:12:58] (03Merged) 10jenkins-bot: Revert "SDC: Stop setting wgWBRepoSettings['foreignRepositories'] old-style federation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498992 (owner: 10Jforrester) [21:13:08] Thanks, bawolff. [21:15:17] (03CR) 10Addshore: [C: 04-1] "IS.php would need syncing frist." 
(032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490104 (https://phabricator.wikimedia.org/T214557) (owner: 10WMDE-leszek) [21:16:00] James_F: sorry, didnt see that above until just now (was writing a comment on the other patch) [21:16:08] addshore: :-) [21:16:13] yes, we still need the foreign repos config, we will need that until the entity source config is defined [21:16:24] And then unset it at run-time? [21:16:29] right now code in wikibase generated the entity source config from the legacy foreign repo config [21:16:46] well, it stays in globals, just is not used, other than to setup the entity source based stuff [21:17:03] Can we get the entity source config definition to execute earlier somehow? [21:17:27] hmmmmmmm??? [21:17:40] (03CR) 10jenkins-bot: SDC: Stop setting wgWBRepoSettings['foreignRepositories'] old-style federation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498433 (owner: 10Jforrester) [21:17:42] (03CR) 10jenkins-bot: Revert "SDC: Stop setting wgWBRepoSettings['foreignRepositories'] old-style federation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498992 (owner: 10Jforrester) [21:18:26] what is done is decided in https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/repo/includes/WikibaseRepo.php#L444-L449 [21:18:46] Oh, hmm. [21:18:47] everything below those selected lines is creating the config needed, but from foreignRepositories [21:19:29] addshore: https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2019.03.25/mediawiki?id=AWm2sJT9NBo9dX1kaP1A&_g=h@44136fa was the fatal if you care, but it sounds like you have a grip on things. [21:20:35] when which patch was tested? but basically yes, right now if you remove foreignRepo config, wihtout having entitysource config defined in mediawiki-config, then commons will forget how to load things from wikidata [21:21:16] About 10 minutes ago. 
:-) [21:22:13] PROBLEM - Check for gridmaster host resolution TCP on labservices1001 is CRITICAL: DNS CRITICAL - 0.014 seconds response time (No ANSWER SECTION found) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [21:22:27] PROBLEM - Check for gridmaster host resolution TCP on labservices1002 is CRITICAL: DNS CRITICAL - 0.019 seconds response time (No ANSWER SECTION found) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [21:22:31] PROBLEM - Check for gridmaster host resolution UDP on cloudservices1004 is CRITICAL: DNS CRITICAL - 0.017 seconds response time (No ANSWER SECTION found) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [21:22:35] PROBLEM - Check for gridmaster host resolution TCP on cloudservices1004 is CRITICAL: DNS CRITICAL - 0.018 seconds response time (No ANSWER SECTION found) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [21:22:43] PROBLEM - Check for gridmaster host resolution TCP on cloudservices1003 is CRITICAL: DNS CRITICAL - 0.018 seconds response time (No ANSWER SECTION found) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [21:22:45] (03CR) 10Addshore: SDC: Stop setting wgWBRepoSettings['foreignRepositories'] old-style federation (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498433 (owner: 10Jforrester) [21:22:49] PROBLEM - Check for gridmaster host resolution UDP on cloudservices1003 is CRITICAL: DNS CRITICAL - 0.011 seconds response time (No ANSWER SECTION found) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [21:23:03] PROBLEM - Check for gridmaster host resolution UDP on labservices1001 is CRITICAL: DNS CRITICAL - 0.014 seconds response time (No ANSWER SECTION found) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [21:23:09] PROBLEM - Check for gridmaster host resolution UDP on labservices1002 is CRITICAL: DNS CRITICAL - 0.019 
seconds response time (No ANSWER SECTION found) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [21:23:49] (03PS1) 10Jforrester: Revert "[BETA] SDC: Stop setting up old-style federation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498993 [21:23:57] (03PS2) 10Jforrester: Revert "[BETA] SDC: Stop setting up old-style federation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498993 [21:24:02] (03CR) 10Jforrester: [C: 03+2] Revert "[BETA] SDC: Stop setting up old-style federation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498993 (owner: 10Jforrester) [21:24:37] (03CR) 10EBernhardson: [C: 03+1] "mjolnir has been deployed and should have everything in place" [puppet] - 10https://gerrit.wikimedia.org/r/498232 (https://phabricator.wikimedia.org/T218833) (owner: 10EBernhardson) [21:25:01] (03Merged) 10jenkins-bot: Revert "[BETA] SDC: Stop setting up old-style federation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498993 (owner: 10Jforrester) [21:25:41] ebernhardson: When does https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/498442 need to land? Now? [21:26:05] (03PS7) 10EBernhardson: Switch mjolnir to rsyslog based structured logging [puppet] - 10https://gerrit.wikimedia.org/r/498232 (https://phabricator.wikimedia.org/T218833) [21:26:44] James_F: now would be good? It's the only thing still running the old code in Wikibase extension [21:27:03] oh, i guess stas is off today. I can schedule it for SWAT [21:27:03] Kk. 
[21:27:05] (03CR) 10Jforrester: [C: 03+2] Enable WBCS on Testcommons too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498442 (https://phabricator.wikimedia.org/T218715) (owner: 10Smalyshev) [21:27:09] or you can do it now :) [21:27:12] (03PS3) 10Jforrester: Enable WBCS on Testcommons too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498442 (https://phabricator.wikimedia.org/T218715) (owner: 10Smalyshev) [21:27:18] (03CR) 10Jforrester: [C: 03+2] "…" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498442 (https://phabricator.wikimedia.org/T218715) (owner: 10Smalyshev) [21:27:39] Get Things Fixed™ [21:28:29] (03Merged) 10jenkins-bot: Enable WBCS on Testcommons too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498442 (https://phabricator.wikimedia.org/T218715) (owner: 10Smalyshev) [21:28:44] I'm not quite in the Reedy school of "unless it's C-1'ed I'll deploy it randomly at 03:00 on a holiday Sunday from orbit", but scheduling is meant to serve us, not the other way around. ;-) [21:28:48] (03CR) 10jenkins-bot: Revert "[BETA] SDC: Stop setting up old-style federation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498993 (owner: 10Jforrester) [21:28:50] (03CR) 10jenkins-bot: Enable WBCS on Testcommons too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498442 (https://phabricator.wikimedia.org/T218715) (owner: 10Smalyshev) [21:30:00] haha, from orbit [21:30:27] i hear the internet in orbit isn't soo bad, they have teleconferencing :P [21:30:29] From orbit, via an arthritic rabid carrier pigeon. [21:30:33] Happy? [21:31:17] ebernhardson: Search on mwdebug1002 looks fine to me. Happy for me to sync? [21:31:39] PROBLEM - puppet last run on labtestmetal2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata] [21:31:47] James_F: yup, looks the same [21:31:57] Kk. 
[21:33:06] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T218715 Enable WBCS on Testcommons too (duration: 00m 50s) [21:33:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:33:09] T218715: Deploy WikibaseCirrusSearch in production and run all searches through it - https://phabricator.wikimedia.org/T218715 [21:33:44] 10Operations, 10MediaWiki-Cache, 10Patch-For-Review, 10Performance-Team (Radar), 10User-Elukey: Consider removing the last traces of nutcracker in Mediawiki configs - https://phabricator.wikimedia.org/T214275 (10kchapman) [21:35:38] ebernhardson: Whilst I'm at it, does https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/497988/ want to head out? (Initial WikibaseLexemeCirrusSearch set-up. Code is out on wmf.22 so the whole stack should be fine.) [21:37:45] James_F: sure it can go, doesn't really matter what is deployed since its a noop with variables set to false [21:37:52] Yeah. [21:37:57] (03CR) 10Jforrester: [C: 03+2] Deploy WikibaseLexemeCirrusSearch: Part 1 - set up variables [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497988 (https://phabricator.wikimedia.org/T216206) (owner: 10Smalyshev) [21:38:03] (03PS2) 10Jforrester: Deploy WikibaseLexemeCirrusSearch: Part 1 - set up variables [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497988 (https://phabricator.wikimedia.org/T216206) (owner: 10Smalyshev) [21:38:08] (03CR) 10Jforrester: [C: 03+2] "…" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497988 (https://phabricator.wikimedia.org/T216206) (owner: 10Smalyshev) [21:38:26] It'd be nice if "(Merge Conflict)" was highlighted in red rather than blending into the rest of the UI. 
[21:39:10] (03Merged) 10jenkins-bot: Deploy WikibaseLexemeCirrusSearch: Part 1 - set up variables [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497988 (https://phabricator.wikimedia.org/T216206) (owner: 10Smalyshev) [21:39:16] James_F it is [21:39:21] (from 2.16) :) [21:39:33] paladox: Sure, but 2.16 is not here. [21:39:45] yup. [21:39:51] (03CR) 10jenkins-bot: Deploy WikibaseLexemeCirrusSearch: Part 1 - set up variables [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497988 (https://phabricator.wikimedia.org/T216206) (owner: 10Smalyshev) [21:40:38] !log apply transport-in4 filter to cr1/2-eqiad - T190090 [21:40:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:40:43] T190090: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090 [21:43:01] 10Operations, 10netops: eqiad - eqord Telia link down - IC-314533 - https://phabricator.wikimedia.org/T218307 (10ayounsi) 05Open→03Resolved Link is back up. Cf. email thread for full details. But tl;dr; outdated cable label caused onsite tech to unplug our link. 
[21:43:14] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T216206 Deploy WikibaseLexemeCirrusSearch: Part 1 - set up variables, sub-part a (duration: 00m 50s) [21:43:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:43:17] T216206: Set up WikibaseLexemeCirrusSearch extension for Elastic code in WikibaseLexeme - https://phabricator.wikimedia.org/T216206 [21:44:32] !log jforrester@deploy1001 Synchronized wmf-config/Wikibase.php: T216206 Deploy WikibaseLexemeCirrusSearch: Part 1 - set up variables, sub-part b (duration: 00m 49s) [21:44:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:45:37] (03CR) 10Jforrester: [C: 03+1] "This should be good to go; wmf.22 is everywhere, tomorrow wmf.23 will get cut and if this is live it'll go out with wmf.23 and wmf.22 prop" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497989 (https://phabricator.wikimedia.org/T216206) (owner: 10Smalyshev) [21:46:21] 10Operations, 10Traffic, 10netops, 10Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090 (10ayounsi) [21:48:00] 10Operations, 10Traffic, 10netops, 10Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090 (10ayounsi) 05Open→03Resolved Everything needed here is done. Full doc on https://wikitech.wikimedia.org/wiki/Ping_offload Will open a followup task once the Ganeti clust... 
[21:48:32] (03Abandoned) 10Jforrester: SDC: Stop setting up old-style federation, no longer read [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498434 (owner: 10Jforrester) [21:50:21] (03PS1) 10Jforrester: [BETA] SDC: Temporarily disable Depicts on Beta Commons for testing T219221 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498994 [21:50:42] (03CR) 10Jforrester: [C: 03+2] [BETA] SDC: Temporarily disable Depicts on Beta Commons for testing T219221 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498994 (owner: 10Jforrester) [21:51:45] (03Merged) 10jenkins-bot: [BETA] SDC: Temporarily disable Depicts on Beta Commons for testing T219221 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498994 (owner: 10Jforrester) [21:51:59] (03CR) 10jenkins-bot: [BETA] SDC: Temporarily disable Depicts on Beta Commons for testing T219221 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498994 (owner: 10Jforrester) [21:52:11] (03PS2) 10Jforrester: Deploy WikibaseLexemeCirrusSearch: Part 2 - extensionlist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497989 (https://phabricator.wikimedia.org/T216206) (owner: 10Smalyshev) [22:14:22] (03PS1) 10EBernhardson: Turn on Elastica logging channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499002 (https://phabricator.wikimedia.org/T219234) [22:16:53] (03PS1) 10Krinkle: speed-tests: Make static rev 872156204 more standalone [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499003 (https://phabricator.wikimedia.org/T185446) [22:19:01] (03CR) 10Volans: [C: 03+1] "LGTM. I think I know what it was that way." 
[puppet] - 10https://gerrit.wikimedia.org/r/498954 (owner: 10CDanis) [22:19:18] cdanis: s/what/why/ :) [22:22:15] (03PS1) 10Andrew Bogott: puppet compiler workers: include the cloud puppet enc tool [puppet] - 10https://gerrit.wikimedia.org/r/499006 [22:22:17] (03PS1) 10Andrew Bogott: puppet-compiler: restore the ability to export facts without puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/499007 [22:22:50] (03PS2) 10Andrew Bogott: puppet compiler workers: include the cloud puppet enc tool [puppet] - 10https://gerrit.wikimedia.org/r/499006 [22:25:06] (03CR) 10Andrew Bogott: [C: 03+2] puppet compiler workers: include the cloud puppet enc tool [puppet] - 10https://gerrit.wikimedia.org/r/499006 (owner: 10Andrew Bogott) [22:26:31] (03CR) 10Alex Monk: "I've been thinking about this. The problem is network::constants isn't set up like a normal class, it's not responsible for producing Pupp" [puppet] - 10https://gerrit.wikimedia.org/r/498796 (owner: 10Alex Monk) [22:28:28] (03CR) 10Krinkle: [C: 03+2] speed-tests: Make static rev 872156204 more standalone [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499003 (https://phabricator.wikimedia.org/T185446) (owner: 10Krinkle) [22:28:42] Rolling out a quick patch for speed-tests [22:29:27] (03Merged) 10jenkins-bot: speed-tests: Make static rev 872156204 more standalone [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499003 (https://phabricator.wikimedia.org/T185446) (owner: 10Krinkle) [22:30:30] (03PS1) 10Krinkle: speed-tests: Actually add the images [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499009 (https://phabricator.wikimedia.org/T185446) [22:30:35] (03CR) 10Krinkle: [C: 03+2] speed-tests: Actually add the images [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499009 (https://phabricator.wikimedia.org/T185446) (owner: 10Krinkle) [22:30:54] (03PS1) 10Jforrester: Revert "[BETA] SDC: Temporarily disable Depicts on Beta Commons for testing T219221" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/499010 [22:31:06] (03PS2) 10Jforrester: Revert "[BETA] SDC: Temporarily disable Depicts on Beta Commons for testing T219221" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499010 [22:31:11] (03CR) 10Jforrester: [C: 03+2] Revert "[BETA] SDC: Temporarily disable Depicts on Beta Commons for testing T219221" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499010 (owner: 10Jforrester) [22:31:36] (03Merged) 10jenkins-bot: speed-tests: Actually add the images [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499009 (https://phabricator.wikimedia.org/T185446) (owner: 10Krinkle) [22:32:13] Krinkle: FYI as you're using the config repo. No-op for prod. ^^ [22:32:21] (03Merged) 10jenkins-bot: Revert "[BETA] SDC: Temporarily disable Depicts on Beta Commons for testing T219221" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499010 (owner: 10Jforrester) [22:32:43] James_F: ok. pulled [22:32:48] !log krinkle@deploy1001 Synchronized docroot/wikipedia.org/speed-tests/Banksy.enwiki.872156204: T185446 (duration: 00m 49s) [22:32:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:32:52] T185446: Create static version of wiki page as reference page for our tests - https://phabricator.wikimedia.org/T185446 [22:33:49] * Krinkle releases deploy handle [22:35:35] (03CR) 10jenkins-bot: speed-tests: Make static rev 872156204 more standalone [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499003 (https://phabricator.wikimedia.org/T185446) (owner: 10Krinkle) [22:35:37] (03CR) 10jenkins-bot: speed-tests: Actually add the images [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499009 (https://phabricator.wikimedia.org/T185446) (owner: 10Krinkle) [22:35:39] (03CR) 10jenkins-bot: Revert "[BETA] SDC: Temporarily disable Depicts on Beta Commons for testing T219221" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499010 (owner: 10Jforrester) [22:36:15] (03CR) 10Cwhite: [C: 03+1] "LGTM" [puppet] - 
10https://gerrit.wikimedia.org/r/498954 (owner: 10CDanis) [22:39:16] (03PS2) 10CDanis: sync_icinga_state: copy permissions/group/owner as well [puppet] - 10https://gerrit.wikimedia.org/r/498954 [22:39:53] (03CR) 10Cwhite: [C: 03+1] "LGTM. Will see about deploying tomorrow. Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/498232 (https://phabricator.wikimedia.org/T218833) (owner: 10EBernhardson) [22:40:20] (03CR) 10CDanis: [C: 03+2] sync_icinga_state: copy permissions/group/owner as well [puppet] - 10https://gerrit.wikimedia.org/r/498954 (owner: 10CDanis) [22:41:06] (03CR) 10CDanis: [C: 03+2] "going to puppet-merge this and manually run puppet & execute it off-cycle on icinga2001 just to be sure" [puppet] - 10https://gerrit.wikimedia.org/r/498954 (owner: 10CDanis) [22:45:06] 10Operations, 10Traffic, 10Wikidata, 10Wikidata-Query-Service, 10User-Smalyshev: Reduce / remove the aggessive cache busting behaviour of wdqs-updater - https://phabricator.wikimedia.org/T217897 (10Smalyshev) > I'm still a bit confused about this logic inside the updater, especially with this id validati... [23:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Evening SWAT (Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190325T2300). [23:00:04] Krenair and ebernhardson: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:01:57] (03PS1) 10Krinkle: profiler: Remove code for virtually prepended frames in excimer samples [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499011 (https://phabricator.wikimedia.org/T176916) [23:03:11] (03CR) 10Krinkle: [C: 04-1] "The comment about some traces not starting with a file may be a problem. 
Tim mentions' `php -a` and post-shutdown as potential candidates." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499011 (https://phabricator.wikimedia.org/T176916) (owner: 10Krinkle) [23:06:25] I can SWAT is no-one else is doing so. [23:06:41] Krenair, ebernhardson: You here? [23:07:04] yep [23:07:24] Krenair: For yours, I just sling it out, right? db-labs isn't called in prod? [23:07:30] yes [23:07:33] (03CR) 10Jforrester: [C: 03+2] db-labs: add new slave deployment-db06 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498729 (https://phabricator.wikimedia.org/T219087) (owner: 10Alex Monk) [23:07:36] WFM. [23:07:48] there's a labs realm safeguard around the whole thing [23:08:08] * James_F nods. [23:08:09] both inside the file and around the inclusion [23:08:34] (03Merged) 10jenkins-bot: db-labs: add new slave deployment-db06 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498729 (https://phabricator.wikimedia.org/T219087) (owner: 10Alex Monk) [23:08:47] (03PS3) 10Jforrester: Deploy WikibaseLexemeCirrusSearch: Part 2 - extensionlist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497989 (https://phabricator.wikimedia.org/T216206) (owner: 10Smalyshev) [23:08:49] (03CR) 10jenkins-bot: db-labs: add new slave deployment-db06 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498729 (https://phabricator.wikimedia.org/T219087) (owner: 10Alex Monk) [23:08:52] (03CR) 10Jforrester: [C: 03+2] Deploy WikibaseLexemeCirrusSearch: Part 2 - extensionlist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497989 (https://phabricator.wikimedia.org/T216206) (owner: 10Smalyshev) [23:10:14] James_F: here [23:10:14] (03PS2) 10Jforrester: [BETA] Enable WikibaseLexemeCirrusSearch on beta wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497990 (https://phabricator.wikimedia.org/T216206) (owner: 10Smalyshev) [23:10:30] ebernhardson: Good to go with https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/499002 ? 
[23:10:36] my patch should be hard to tell any difference, there are a few call sites that log to Elastica channel but i don't expect it to log anything today [23:10:39] James_F: yup [23:10:41] (03CR) 10Jforrester: [C: 03+2] Turn on Elastica logging channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499002 (https://phabricator.wikimedia.org/T219234) (owner: 10EBernhardson) [23:10:45] Kk. [23:10:58] i guess if it does log something, found a new problem :) [23:11:14] (03CR) 10Jforrester: "Do we actually want to do this? Lexeme shouldn't really be installed in repo mode on Commons at all…" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497991 (https://phabricator.wikimedia.org/T216206) (owner: 10Smalyshev) [23:11:21] * James_F grins. [23:12:27] (03Merged) 10jenkins-bot: Deploy WikibaseLexemeCirrusSearch: Part 2 - extensionlist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497989 (https://phabricator.wikimedia.org/T216206) (owner: 10Smalyshev) [23:14:32] (03Merged) 10jenkins-bot: Turn on Elastica logging channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499002 (https://phabricator.wikimedia.org/T219234) (owner: 10EBernhardson) [23:14:39] ebernhardson: Worth putting on mwdebug? [23:19:04] ebernhardson: Or should I just sync? (It's on mwdebug1002.) [23:19:23] James_F: just sync should be fine [23:19:30] Kk. 
[23:19:53] (03CR) 10jenkins-bot: Deploy WikibaseLexemeCirrusSearch: Part 2 - extensionlist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497989 (https://phabricator.wikimedia.org/T216206) (owner: 10Smalyshev) [23:19:55] (03CR) 10jenkins-bot: Turn on Elastica logging channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499002 (https://phabricator.wikimedia.org/T219234) (owner: 10EBernhardson) [23:20:37] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT T219234 Turn on Elastica logging channel (duration: 00m 51s) [23:20:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:20:41] T219234: Job runner request timeouts in Elastica\Task - https://phabricator.wikimedia.org/T219234 [23:20:49] Anything more? [23:22:34] OK, I'm declaring SWAT closed. [23:22:36] Be best. [23:26:31] \o/ [23:27:56] thanks James_F, it looks okay [23:29:13] Krenair: Groovy. [23:34:26] (03PS2) 10Krinkle: profiler: Fix stack frame mangling for Excimer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499011 (https://phabricator.wikimedia.org/T176916) [23:37:12] (03PS3) 10Krinkle: profiler: Fix stack frame mangling for Excimer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499011 (https://phabricator.wikimedia.org/T176916) [23:37:34] (03PS4) 10Krinkle: profiler: Fix stack frame mangling for Excimer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/499011 (https://phabricator.wikimedia.org/T176916) [23:52:06] (03PS1) 10Alex Monk: scap: Make wmflabs php7 behaviour match prod's [puppet] - 10https://gerrit.wikimedia.org/r/499025 (https://phabricator.wikimedia.org/T219242) [23:53:29] (03PS1) 10Andrew Bogott: puppet compiler: collect facts from cloud VMs as well as prod hosts [puppet] - 10https://gerrit.wikimedia.org/r/499026