[01:36:35] (PS3) Vladis13: Enable webfonts for ru,uk,be of wiki,wikisource, and for sourceswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/509739 (https://phabricator.wikimedia.org/T220752)
[01:41:30] (PS4) Vladis13: Enable webfonts for ru,uk,be of wikisource, and for sourceswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/509739 (https://phabricator.wikimedia.org/T220752)
[04:33:29] RECOVERY - snapshot of s6 in codfw on db1115 is OK: snapshot for s6 at codfw taken less than 4 days ago and larger than 90 GB: Last one 2019-08-12 03:33:37 from db2097.codfw.wmnet:3316 (497 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[04:48:52] (CR) Santhosh: [C: -1] Enable webfonts for ru,uk,be of wikisource, and for sourceswiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/509739 (https://phabricator.wikimedia.org/T220752) (owner: Vladis13)
[04:52:50] (PS5) Vladis13: Enable webfonts for ru,uk,be of wikisource, and for sourceswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/509739 (https://phabricator.wikimedia.org/T220752)
[04:55:37] (PS6) Vladis13: Enable webfonts for ru,uk,be of wikisource, and for sourceswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/509739 (https://phabricator.wikimedia.org/T220752)
[05:01:37] (PS2) Marostegui: filtered_tables: Remove math table [puppet] - https://gerrit.wikimedia.org/r/529346 (https://phabricator.wikimedia.org/T196055)
[05:02:03] !log Remove math table from s1 - T196055
[05:02:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:02:12] T196055: Remove table `math` from the database - https://phabricator.wikimedia.org/T196055
[05:02:25] (PS7) Vladis13: Enable webfonts for ru,uk,be of wikisource, and for sourceswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/509739 (https://phabricator.wikimedia.org/T220752)
[05:04:33] !log Remove math table from s3 - T196055
[05:04:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:11:31] (PS1) Marostegui: check_depooled: Fixed typo, changed dbctl command [software] - https://gerrit.wikimedia.org/r/529628
[05:11:47] (PS2) Marostegui: mariadb: Promote db1133 to m5 master [puppet] - https://gerrit.wikimedia.org/r/529331 (https://phabricator.wikimedia.org/T229657)
[05:12:05] (CR) Marostegui: [C: +2] "Thanks - done at https://gerrit.wikimedia.org/r/529628" [software] - https://gerrit.wikimedia.org/r/529366 (owner: Marostegui)
[05:12:29] (CR) Marostegui: [C: +2] check_depooled: Fixed typo, changed dbctl command [software] - https://gerrit.wikimedia.org/r/529628 (owner: Marostegui)
[05:12:57] (Merged) jenkins-bot: check_depooled: Fixed typo, changed dbctl command [software] - https://gerrit.wikimedia.org/r/529628 (owner: Marostegui)
[05:20:25] (PS1) Marostegui: mariadb: Provision db2121 into s7 [puppet] - https://gerrit.wikimedia.org/r/529629 (https://phabricator.wikimedia.org/T228969)
[05:22:44] (CR) Marostegui: [C: +2] mariadb: Provision db2121 into s7 [puppet] - https://gerrit.wikimedia.org/r/529629 (https://phabricator.wikimedia.org/T228969) (owner: Marostegui)
[05:43:03] (CR) Giuseppe Lavagetto: [C: +2] Remove the service object for the default schema [software/conftool] - https://gerrit.wikimedia.org/r/527564 (owner: Giuseppe Lavagetto)
[05:43:10] (CR) jerkins-bot: [V: -1] Remove the service object for the default schema [software/conftool] - https://gerrit.wikimedia.org/r/527564 (owner: Giuseppe Lavagetto)
[05:43:52] (CR) Marostegui: [C: +2] filtered_tables: Remove math table [puppet] - https://gerrit.wikimedia.org/r/529346 (https://phabricator.wikimedia.org/T196055) (owner: Marostegui)
[05:43:59] (PS3) Marostegui: filtered_tables: Remove math table [puppet] - https://gerrit.wikimedia.org/r/529346 (https://phabricator.wikimedia.org/T196055)
[05:44:35] (PS2) Giuseppe Lavagetto: Remove the service object for the default schema [software/conftool] - https://gerrit.wikimedia.org/r/527564
[05:44:37] (PS2) Giuseppe Lavagetto: kvobject: fix some class property ordering [software/conftool] - https://gerrit.wikimedia.org/r/527565
[05:48:27] Operations, Math: Clean up artifacts from LaTeX based math rendering - https://phabricator.wikimedia.org/T195847 (Marostegui)
[05:49:44] Operations, Math: Clean up artifacts from LaTeX based math rendering - https://phabricator.wikimedia.org/T195847 (Marostegui)
[05:56:34] (PS2) Vgutierrez: ATS: Fix OCSP stapling configuration [puppet] - https://gerrit.wikimedia.org/r/529332 (https://phabricator.wikimedia.org/T221594)
[05:56:36] (PS2) Vgutierrez: ATS: Allow writing OCSP responses in /etc/acmecerts [puppet] - https://gerrit.wikimedia.org/r/529335 (https://phabricator.wikimedia.org/T221594)
[05:57:45] (CR) jerkins-bot: [V: -1] ATS: Fix OCSP stapling configuration [puppet] - https://gerrit.wikimedia.org/r/529332 (https://phabricator.wikimedia.org/T221594) (owner: Vgutierrez)
[06:01:08] (PS3) Vgutierrez: ATS: Fix OCSP stapling configuration [puppet] - https://gerrit.wikimedia.org/r/529332 (https://phabricator.wikimedia.org/T221594)
[06:01:10] (PS3) Vgutierrez: ATS: Allow writing OCSP responses in /etc/acmecerts [puppet] - https://gerrit.wikimedia.org/r/529335 (https://phabricator.wikimedia.org/T221594)
[06:02:20] (CR) jerkins-bot: [V: -1] ATS: Fix OCSP stapling configuration [puppet] - https://gerrit.wikimedia.org/r/529332 (https://phabricator.wikimedia.org/T221594) (owner: Vgutierrez)
[06:04:28] (PS15) Vgutierrez: ATS: Include TLS instance in cache upload role [puppet] - https://gerrit.wikimedia.org/r/513970 (https://phabricator.wikimedia.org/T221594)
[06:10:12] (PS4) Vgutierrez: ATS: Fix OCSP stapling configuration [puppet] - https://gerrit.wikimedia.org/r/529332 (https://phabricator.wikimedia.org/T221594)
[06:18:30] (PS1) Marostegui: db-eqiad,db-codfw.php: Pool db2121 into s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/529634 (https://phabricator.wikimedia.org/T228969)
[06:22:24] (PS4) Vgutierrez: ATS: Allow writing OCSP responses in /etc/acmecerts [puppet] - https://gerrit.wikimedia.org/r/529335 (https://phabricator.wikimedia.org/T221594)
[06:29:09] (CR) Giuseppe Lavagetto: [C: +1] db-eqiad,db-codfw.php: Pool db2121 into s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/529634 (https://phabricator.wikimedia.org/T228969) (owner: Marostegui)
[06:29:20] thanks _joe_
[06:31:43] (PS16) Vgutierrez: ATS: Include TLS instance in cache upload role [puppet] - https://gerrit.wikimedia.org/r/513970 (https://phabricator.wikimedia.org/T221594)
[06:33:10] Can someone review this https://gerrit.wikimedia.org/r/c/operations/puppet/+/481533
[06:33:13] and https://gerrit.wikimedia.org/r/c/operations/dns/+/481532
[06:33:31] PROBLEM - puppet last run on db1079 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/apt2xml] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:33:32] It's been sitting since December 2019
[06:33:35] It's been sitting since December 2018
[06:36:52] hmmm it looks good to me
[06:39:25] (CR) Vgutierrez: [C: +2] Add redirects from 'lzh' to 'zh-classical' [puppet] - https://gerrit.wikimedia.org/r/481533 (https://phabricator.wikimedia.org/T167513) (owner: Fomafix)
[06:39:37] (PS6) Vgutierrez: Add redirects from 'lzh' to 'zh-classical' [puppet] - https://gerrit.wikimedia.org/r/481533 (https://phabricator.wikimedia.org/T167513) (owner: Fomafix)
[06:40:28] Thanks!
[06:41:06] I didn't make the patch, but it has been long wanted by the community. <3
[06:41:34] (CR) Vgutierrez: [C: +2] Add 'lzh' as alias for 'zh-classical' [dns] - https://gerrit.wikimedia.org/r/481532 (https://phabricator.wikimedia.org/T167513) (owner: Fomafix)
[06:41:42] (PS5) Vgutierrez: Add 'lzh' as alias for 'zh-classical' [dns] - https://gerrit.wikimedia.org/r/481532 (https://phabricator.wikimedia.org/T167513) (owner: Fomafix)
[06:55:53] RECOVERY - puppet last run on db1079 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:07:45] (PS5) Vgutierrez: ATS: Fix OCSP stapling configuration [puppet] - https://gerrit.wikimedia.org/r/529332 (https://phabricator.wikimedia.org/T221594)
[07:07:47] (PS5) Vgutierrez: ATS: Allow writing OCSP responses in /etc/acmecerts [puppet] - https://gerrit.wikimedia.org/r/529335 (https://phabricator.wikimedia.org/T221594)
[07:08:55] (CR) jerkins-bot: [V: -1] ATS: Fix OCSP stapling configuration [puppet] - https://gerrit.wikimedia.org/r/529332 (https://phabricator.wikimedia.org/T221594) (owner: Vgutierrez)
[07:09:05] ffs
[07:13:26] (PS6) Vgutierrez: ATS: Fix OCSP stapling configuration [puppet] - https://gerrit.wikimedia.org/r/529332 (https://phabricator.wikimedia.org/T221594)
[07:13:28] (PS6) Vgutierrez: ATS: Allow writing OCSP responses in /etc/acmecerts [puppet] - https://gerrit.wikimedia.org/r/529335 (https://phabricator.wikimedia.org/T221594)
[07:15:09] PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[python3.7] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:15:36] Operations, DNS, Traffic, Wikimedia-Apache-configuration, Patch-For-Review: Redirect lzh.wikipedia to zh-classical.wikipedia - https://phabricator.wikimedia.org/T167513 (Viztor) Open→Resolved a:Viztor Thanks everyone, this has been deployed.
[07:17:13] (CR) Marostegui: [C: +2] db-eqiad,db-codfw.php: Pool db2121 into s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/529634 (https://phabricator.wikimedia.org/T228969) (owner: Marostegui)
[07:17:52] (PS17) Vgutierrez: ATS: Include TLS instance in cache upload role [puppet] - https://gerrit.wikimedia.org/r/513970 (https://phabricator.wikimedia.org/T221594)
[07:18:16] nice.. another CR that will be able to vote and drink alcohol (in the EU) before being merged
[07:18:16] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Pool db2121 into s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/529634 (https://phabricator.wikimedia.org/T228969) (owner: Marostegui)
[07:18:28] what a wonderful mess :)
[07:19:30] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Provision db2121 into s7 T228969 (duration: 00m 48s)
[07:19:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:19:40] T228969: Productionize db21[21-31} - https://phabricator.wikimedia.org/T228969
[07:20:26] (CR) jenkins-bot: db-eqiad,db-codfw.php: Pool db2121 into s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/529634 (https://phabricator.wikimedia.org/T228969) (owner: Marostegui)
[07:20:56] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Provision db2121 into s7 T228969 (duration: 00m 47s)
[07:21:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:24:02] (CR) Vgutierrez: ATS: Fix OCSP stapling configuration (1 comment) [puppet] - https://gerrit.wikimedia.org/r/529332 (https://phabricator.wikimedia.org/T221594) (owner: Vgutierrez)
[07:24:19] (CR) Vgutierrez: ATS: Allow writing OCSP responses in /etc/acmecerts (1 comment) [puppet] - https://gerrit.wikimedia.org/r/529335 (https://phabricator.wikimedia.org/T221594) (owner: Vgutierrez)
[07:24:36] <_joe_> vgutierrez: why?
[07:24:58] <_joe_> why is it so hard to merge?
[07:25:26] nah, I've messed it up a little bit rebasing the 3 changes
[07:25:37] now it's ready to be merged
[07:25:47] as soon as ema gives me the green light
[07:26:02] cause that one it's going to mess with our upload cluster a little bit
[07:26:18] !log marostegui@cumin1001 dbctl commit (dc=all): 'Pool db2121 into s7', diff saved to https://phabricator.wikimedia.org/P8899 and previous config saved to /var/cache/conftool/dbconfig/20190812-072617-marostegui.json
[07:26:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:32:16] (PS18) Vgutierrez: ATS: Include TLS instance in cache upload role [puppet] - https://gerrit.wikimedia.org/r/513970 (https://phabricator.wikimedia.org/T221594)
[07:33:44] (PS1) Marostegui: mariadb: Promote db2105 to s3 codfw master [puppet] - https://gerrit.wikimedia.org/r/529704 (https://phabricator.wikimedia.org/T230106)
[07:34:38] !log Switchover s3 codfw master db2043 -> db2105 - T230106
[07:34:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:34:45] T230106: Switchover codfw primary database masters to new hosts - https://phabricator.wikimedia.org/T230106
[07:35:55] (PS7) Vgutierrez: ATS: Allow writing OCSP responses in /etc/acmecerts [puppet] - https://gerrit.wikimedia.org/r/529335 (https://phabricator.wikimedia.org/T221594)
[07:35:57] (PS19) Vgutierrez: ATS: Include TLS instance in cache upload role [puppet] - https://gerrit.wikimedia.org/r/513970 (https://phabricator.wikimedia.org/T221594)
[07:39:00] (PS1) Marostegui: db-codfw.php: Re-organize s3 codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/529705 (https://phabricator.wikimedia.org/T230106)
[07:39:12] (PS8) Vgutierrez: ATS: Allow writing OCSP responses in /etc/acmecerts [puppet] - https://gerrit.wikimedia.org/r/529335 (https://phabricator.wikimedia.org/T221594)
[07:39:14] (PS20) Vgutierrez: ATS: Include TLS instance in cache upload role [puppet] - https://gerrit.wikimedia.org/r/513970 (https://phabricator.wikimedia.org/T221594)
[07:40:00] (CR) Marostegui: [C: +2] mariadb: Promote db2105 to s3 codfw master [puppet] - https://gerrit.wikimedia.org/r/529704 (https://phabricator.wikimedia.org/T230106) (owner: Marostegui)
[07:43:01] RECOVERY - puppet last run on stat1004 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:43:15] !log marostegui@cumin1001 dbctl commit (dc=all): 'Promote db2105 to s3 codfw master T230106', diff saved to https://phabricator.wikimedia.org/P8900 and previous config saved to /var/cache/conftool/dbconfig/20190812-074314-marostegui.json
[07:43:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:28] T230106: Switchover codfw primary database masters to new hosts - https://phabricator.wikimedia.org/T230106
[07:43:54] (CR) Marostegui: [C: +2] db-codfw.php: Re-organize s3 codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/529705 (https://phabricator.wikimedia.org/T230106) (owner: Marostegui)
[07:44:51] (Merged) jenkins-bot: db-codfw.php: Re-organize s3 codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/529705 (https://phabricator.wikimedia.org/T230106) (owner: Marostegui)
[07:46:27] (CR) jenkins-bot: db-codfw.php: Re-organize s3 codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/529705 (https://phabricator.wikimedia.org/T230106) (owner: Marostegui)
[07:46:28] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Promote db2105 as s3 codfw master (duration: 00m 47s)
[07:46:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:47:18] (PS7) Vgutierrez: ATS: Fix OCSP stapling configuration [puppet] - https://gerrit.wikimedia.org/r/529332 (https://phabricator.wikimedia.org/T221594)
[07:47:20] (PS9) Vgutierrez: ATS: Allow writing OCSP responses in /etc/acmecerts [puppet] - https://gerrit.wikimedia.org/r/529335 (https://phabricator.wikimedia.org/T221594)
[07:47:22] (PS21) Vgutierrez: ATS: Include TLS instance in cache upload role [puppet] - https://gerrit.wikimedia.org/r/513970 (https://phabricator.wikimedia.org/T221594)
[07:53:34] Operations, DNS, Traffic, Wikimedia-Apache-configuration, Patch-For-Review: Redirect lzh.wikipedia to zh-classical.wikipedia - https://phabricator.wikimedia.org/T167513 (Fomafix) Resolved→Open `lzh.wikipedia.org` works: ` $ curl https://lzh.wikipedia.org/
(PS1) Marostegui: db-codfw.php: Reorganize s3 codfw weights [mediawiki-config] - https://gerrit.wikimedia.org/r/529707 (https://phabricator.wikimedia.org/T220170)
[08:07:31] !log marostegui@cumin1001 dbctl commit (dc=codfw): 'Reorganize s3 codfw weights T220170', diff saved to https://phabricator.wikimedia.org/P8901 and previous config saved to /var/cache/conftool/dbconfig/20190812-080731-marostegui.json
[08:07:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:07:39] T220170: Address Database infrastructure blockers on datacenter switchover & multi-dc deployment - https://phabricator.wikimedia.org/T220170
[08:09:39] (CR) Marostegui: [C: +2] db-codfw.php: Reorganize s3 codfw weights [mediawiki-config] - https://gerrit.wikimedia.org/r/529707 (https://phabricator.wikimedia.org/T220170) (owner: Marostegui)
[08:10:48] (Merged) jenkins-bot: db-codfw.php: Reorganize s3 codfw weights [mediawiki-config] - https://gerrit.wikimedia.org/r/529707 (https://phabricator.wikimedia.org/T220170) (owner: Marostegui)
[08:11:03] (CR) jenkins-bot: db-codfw.php: Reorganize s3 codfw weights [mediawiki-config] - https://gerrit.wikimedia.org/r/529707 (https://phabricator.wikimedia.org/T220170) (owner: Marostegui)
[08:12:32] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Reorganize s3 codfw weights T220170 (duration: 00m 48s)
[08:12:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:40] T220170: Address Database infrastructure blockers on datacenter switchover & multi-dc deployment - https://phabricator.wikimedia.org/T220170
[08:15:32] Operations, DNS, Traffic, Wikimedia-Apache-configuration, Patch-For-Review: Redirect lzh.wikipedia to zh-classical.wikipedia - https://phabricator.wikimedia.org/T167513 (Vgutierrez) yeah, it's deployed, and actually I think we can drop the lzh.m.wikipedia.org rule from there, checking other l...
[08:15:49] (PS1) Elukey: Raise Hadoop HDFS Nanenode's heap size [puppet] - https://gerrit.wikimedia.org/r/529709
[08:17:35] (CR) Arturo Borrero Gonzalez: [C: +1] "LGTM, with some comments inline for possible improvements." (3 comments) [puppet] - https://gerrit.wikimedia.org/r/529436 (https://phabricator.wikimedia.org/T223907) (owner: Jhedden)
[08:18:48] (PS2) Elukey: Raise Hadoop HDFS Nanenode's heap size [puppet] - https://gerrit.wikimedia.org/r/529709
[08:19:35] (CR) Elukey: [C: +2] Raise Hadoop HDFS Nanenode's heap size [puppet] - https://gerrit.wikimedia.org/r/529709 (owner: Elukey)
[08:22:33] !log restart Analytics hadoop HDFS namenodes to pick up new heap settings
[08:22:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:29] (PS1) Vgutierrez: redirects.dat: Drop unnecessary lzh.m.wikipedia.org rule [puppet] - https://gerrit.wikimedia.org/r/529710 (https://phabricator.wikimedia.org/T167513)
[08:23:31] (PS1) Vgutierrez: nc_redirects: Add lzh -> zh-classical language redirection for wikipedia.com [puppet] - https://gerrit.wikimedia.org/r/529711 (https://phabricator.wikimedia.org/T167513)
[08:29:24] (CR) Vgutierrez: [C: +2] redirects.dat: Drop unnecessary lzh.m.wikipedia.org rule [puppet] - https://gerrit.wikimedia.org/r/529710 (https://phabricator.wikimedia.org/T167513) (owner: Vgutierrez)
[08:29:40] (CR) Vgutierrez: [C: +2] nc_redirects: Add lzh -> zh-classical language redirection for wikipedia.com [puppet] - https://gerrit.wikimedia.org/r/529711 (https://phabricator.wikimedia.org/T167513) (owner: Vgutierrez)
[08:30:27] Operations, DBA, decommission: Decommission db2043.codfw.wmnet - https://phabricator.wikimedia.org/T230311 (Marostegui)
[08:30:28] (PS2) Vgutierrez: redirects.dat: Drop unnecessary lzh.m.wikipedia.org rule [puppet] - https://gerrit.wikimedia.org/r/529710 (https://phabricator.wikimedia.org/T167513)
[08:30:40] Operations, DBA, decommission: Decommission db2043.codfw.wmnet - https://phabricator.wikimedia.org/T230311 (Marostegui) p:Triage→Normal
[08:30:41] (PS2) Vgutierrez: nc_redirects: Add lzh -> zh-classical language redirection for wikipedia.com [puppet] - https://gerrit.wikimedia.org/r/529711 (https://phabricator.wikimedia.org/T167513)
[08:32:19] (CR) Giuseppe Lavagetto: profile:templates:services_proxy: Enable ipv6 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/529401 (https://phabricator.wikimedia.org/T224538) (owner: Effie Mouzeli)
[08:34:34] (CR) Giuseppe Lavagetto: [C: +1] "fine with me, the code looks correct and AIUI all bases have been covered. I think we shall remove the selection at a later point when we " [puppet] - https://gerrit.wikimedia.org/r/425027 (https://phabricator.wikimedia.org/T195392) (owner: Giuseppe Lavagetto)
[08:34:46] (PS2) Filippo Giunchedi: base: don't CRITICAL on per-host puppet failures [puppet] - https://gerrit.wikimedia.org/r/528719 (https://phabricator.wikimedia.org/T229262)
[08:35:00] (CR) Filippo Giunchedi: [C: +2] base: don't CRITICAL on per-host puppet failures [puppet] - https://gerrit.wikimedia.org/r/528719 (https://phabricator.wikimedia.org/T229262) (owner: Filippo Giunchedi)
[08:39:22] (CR) Gehel: [C: -2] "the correct approach in is https://gerrit.wikimedia.org/r/c/operations/puppet/+/529362" [puppet] - https://gerrit.wikimedia.org/r/528885 (https://phabricator.wikimedia.org/T229621) (owner: Mathew.onipe)
[08:41:54] (CR) Effie Mouzeli: "We could restrict both on 127.0.0.1 and ::1, I left it on all interfaces since it was listening already on v4, but additionally because th" [puppet] - https://gerrit.wikimedia.org/r/529401 (https://phabricator.wikimedia.org/T224538) (owner: Effie Mouzeli)
[08:47:05] Operations, DNS, Traffic, Wikimedia-Apache-configuration, Patch-For-Review: Redirect lzh.wikipedia to zh-classical.wikipedia - https://phabricator.wikimedia.org/T167513 (Vgutierrez) Open→Resolved
[08:47:07] (CR) Gehel: [C: -1] Add maps reboot cookbook (3 comments) [cookbooks] - https://gerrit.wikimedia.org/r/511819 (https://phabricator.wikimedia.org/T224072) (owner: Mathew.onipe)
[08:53:36] (PS1) Marostegui: mariadb: Decommission db2043 [puppet] - https://gerrit.wikimedia.org/r/529716 (https://phabricator.wikimedia.org/T230311)
[08:53:50] Operations, ops-codfw, media-storage: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T230275 (fgiunchedi) a:Papaul @papaul please replace `1I:1:1`, thanks!
[08:55:38] (PS1) Marostegui: db-eqiad,db-codfw.php: Remove db2043 from config [mediawiki-config] - https://gerrit.wikimedia.org/r/529717 (https://phabricator.wikimedia.org/T230311)
[08:59:12] (PS2) Filippo Giunchedi: prometheus: add sentry4 outlet OIDs [puppet] - https://gerrit.wikimedia.org/r/528857 (https://phabricator.wikimedia.org/T148541)
[08:59:18] (CR) Filippo Giunchedi: [C: +2] prometheus: add sentry4 outlet OIDs [puppet] - https://gerrit.wikimedia.org/r/528857 (https://phabricator.wikimedia.org/T148541) (owner: Filippo Giunchedi)
[09:06:08] !log depool and pool back mw1222
[09:06:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:49] (PS1) Fsero: bug: TILLER_NAMESPACE variable in admin shouldnt be set [puppet] - https://gerrit.wikimedia.org/r/529718
[09:07:43] (CR) jerkins-bot: [V: -1] bug: TILLER_NAMESPACE variable in admin shouldnt be set [puppet] - https://gerrit.wikimedia.org/r/529718 (owner: Fsero)
[09:08:38] (PS2) Fsero: helmfile: bug: TILLER_NAMESPACE variable in admin shouldnt be set [puppet] - https://gerrit.wikimedia.org/r/529718
[09:09:59] (PS3) Fsero: helmfile: bug: TILLER_NAMESPACE variable in admin shouldnt be set [puppet] - https://gerrit.wikimedia.org/r/529718
[09:11:07] (CR) Fsero: [C: +2] helmfile: bug: TILLER_NAMESPACE variable in admin shouldnt be set [puppet] - https://gerrit.wikimedia.org/r/529718 (owner: Fsero)
[09:12:38] Operations, observability, Patch-For-Review, User-fgiunchedi: Replace Torrus with Prometheus snmp_exporter for PDUs monitoring - https://phabricator.wikimedia.org/T148541 (fgiunchedi)
[09:14:47] (CR) Gehel: [C: +1] "PCC looks reasonable for the part that I understand: https://puppet-compiler.wmflabs.org/compiler1001/17850/" [puppet] - https://gerrit.wikimedia.org/r/529362 (https://phabricator.wikimedia.org/T229621) (owner: Jbond)
[09:18:33] (PS1) Fsero: helmfile: bug: TILLER_NAMESPACE extra line [puppet] - https://gerrit.wikimedia.org/r/529719
[09:19:16] (CR) Marostegui: [C: +2] db-eqiad,db-codfw.php: Remove db2043 from config [mediawiki-config] - https://gerrit.wikimedia.org/r/529717 (https://phabricator.wikimedia.org/T230311) (owner: Marostegui)
[09:20:12] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Remove db2043 from config [mediawiki-config] - https://gerrit.wikimedia.org/r/529717 (https://phabricator.wikimedia.org/T230311) (owner: Marostegui)
[09:20:20] (CR) Ema: [C: +1] ATS: Fix OCSP stapling configuration [puppet] - https://gerrit.wikimedia.org/r/529332 (https://phabricator.wikimedia.org/T221594) (owner: Vgutierrez)
[09:20:30] (CR) jenkins-bot: db-eqiad,db-codfw.php: Remove db2043 from config [mediawiki-config] - https://gerrit.wikimedia.org/r/529717 (https://phabricator.wikimedia.org/T230311) (owner: Marostegui)
[09:20:45] (PS2) Fsero: helmfile: bug: TILLER_NAMESPACE extra line [puppet] - https://gerrit.wikimedia.org/r/529719
[09:21:23] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Remove db2043 from config T230311 (duration: 00m 48s)
[09:21:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:21:32] T230311: Decommission db2043.codfw.wmnet - https://phabricator.wikimedia.org/T230311
[09:21:58] Operations, DBA, decommission, Patch-For-Review: Decommission db2043.codfw.wmnet - https://phabricator.wikimedia.org/T230311 (Marostegui)
[09:22:01] (CR) Vgutierrez: [C: +2] ATS: Fix OCSP stapling configuration [puppet] - https://gerrit.wikimedia.org/r/529332 (https://phabricator.wikimedia.org/T221594) (owner: Vgutierrez)
[09:22:02] !log Remove db2043 from tendril and zarcillo T230311
[09:22:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:10] (PS8) Vgutierrez: ATS: Fix OCSP stapling configuration [puppet] - https://gerrit.wikimedia.org/r/529332 (https://phabricator.wikimedia.org/T221594)
[09:22:19] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Remove db2043 from config T230311 (duration: 00m 47s)
[09:22:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:36] Operations, ops-eqiad, cloud-services-team: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T230289 (wiki_willy) a:Cmjohnson Just a heads up Chris, the system is under warranty thru June 2021. Thanks, Willy
[09:24:00] (PS3) Fsero: helmfile: bug: TILLER_NAMESPACE extra line [puppet] - https://gerrit.wikimedia.org/r/529719
[09:24:21] !log Remove empty table testcommonswiki. globalblocks from s4 - T230055
[09:24:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:28] T230055: Remove globalblocks tables from wikis - https://phabricator.wikimedia.org/T230055
[09:24:55] (PS4) Fomafix: Add redirects from 'sgs' to 'bat-smg' [puppet] - https://gerrit.wikimedia.org/r/481540 (https://phabricator.wikimedia.org/T204830)
[09:27:27] (PS4) Fsero: helmfile: bug: TILLER_NAMESPACE extra line [puppet] - https://gerrit.wikimedia.org/r/529719
[09:27:29] (CR) Vgutierrez: [C: +2] ATS: Allow writing OCSP responses in /etc/acmecerts [puppet] - https://gerrit.wikimedia.org/r/529335 (https://phabricator.wikimedia.org/T221594) (owner: Vgutierrez)
[09:27:35] (PS10) Vgutierrez: ATS: Allow writing OCSP responses in /etc/acmecerts [puppet] - https://gerrit.wikimedia.org/r/529335 (https://phabricator.wikimedia.org/T221594)
[09:29:28] (PS5) Fsero: helmfile: bug: TILLER_NAMESPACE extra line [puppet] - https://gerrit.wikimedia.org/r/529719
[09:29:30] (PS2) Marostegui: mariadb: Decommission db2043 [puppet] - https://gerrit.wikimedia.org/r/529716 (https://phabricator.wikimedia.org/T230311)
[09:30:19] (CR) Marostegui: [C: +2] mariadb: Decommission db2043 [puppet] - https://gerrit.wikimedia.org/r/529716 (https://phabricator.wikimedia.org/T230311) (owner: Marostegui)
[09:31:43] (PS4) Effie Mouzeli: profile:templates:services_proxy: Enable ipv6 and listen only locally [puppet] - https://gerrit.wikimedia.org/r/529401 (https://phabricator.wikimedia.org/T224538)
[09:31:55] Operations, DBA, decommission, Patch-For-Review: Decommission db2043.codfw.wmnet - https://phabricator.wikimedia.org/T230311 (Marostegui)
[09:32:32] (PS11) Vgutierrez: ATS: Allow writing OCSP responses in /etc/acmecerts [puppet] - https://gerrit.wikimedia.org/r/529335 (https://phabricator.wikimedia.org/T221594)
[09:32:46] !log Stop MySQL on db2043 T230311
[09:32:52] (CR) Fsero: [C: +2] helmfile: bug: TILLER_NAMESPACE extra line [puppet] - https://gerrit.wikimedia.org/r/529719 (owner: Fsero)
[09:32:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:32:54] T230311: Decommission db2043.codfw.wmnet - https://phabricator.wikimedia.org/T230311
[09:33:01] (PS6) Fsero: helmfile: bug: TILLER_NAMESPACE extra line [puppet] - https://gerrit.wikimedia.org/r/529719
[09:33:24] (PS5) Fomafix: Add redirects from 'sgs' to 'bat-smg' [puppet] - https://gerrit.wikimedia.org/r/481540 (https://phabricator.wikimedia.org/T204830)
[09:33:37] Operations, DBA, decommission, Patch-For-Review: Decommission db2043.codfw.wmnet - https://phabricator.wikimedia.org/T230311 (Marostegui)
[09:33:55] (PS2) Fomafix: Add 'vro' as alias for 'fiu-vro' [puppet] - https://gerrit.wikimedia.org/r/527915 (https://phabricator.wikimedia.org/T31186)
[09:34:11] Operations, ops-codfw, decommission: Decommission db2043.codfw.wmnet - https://phabricator.wikimedia.org/T230311 (Marostegui) a:Marostegui→RobH This host is ready for #dc-ops to decommission
[09:34:31] Operations, DBA: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (Marostegui)
[09:34:42] Operations, DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[09:36:04] !log Remove empty table enwikivoyage.globalblocks from s5 - T230055
[09:36:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:12] T230055: Remove globalblocks tables from wikis - https://phabricator.wikimedia.org/T230055
[09:36:22] (PS7) Fsero: helmfile: bug: TILLER_NAMESPACE extra line [puppet] - https://gerrit.wikimedia.org/r/529719
[09:36:53] !log Disable puppet on mwmaint for 425027
[09:37:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:42] (PS2) Fomafix: Add 'egl' as alias for 'eml' [puppet] - https://gerrit.wikimedia.org/r/527933 (https://phabricator.wikimedia.org/T36217)
[09:42:45] (CR) Effie Mouzeli: [C: +2] profile::mediawiki default php to php7 [puppet] - https://gerrit.wikimedia.org/r/425027 (https://phabricator.wikimedia.org/T195392) (owner: Giuseppe Lavagetto)
[09:42:56] (PS15) Effie Mouzeli: profile::mediawiki default php to php7 [puppet] - https://gerrit.wikimedia.org/r/425027 (https://phabricator.wikimedia.org/T195392) (owner: Giuseppe Lavagetto)
[09:44:46] (PS16) Effie Mouzeli: profile::mediawiki default php to php7 [puppet] - https://gerrit.wikimedia.org/r/425027 (https://phabricator.wikimedia.org/T195392) (owner: Giuseppe Lavagetto)
[09:50:26] (PS1) Marostegui: db2127: Clarify it is a candidate master [puppet] - https://gerrit.wikimedia.org/r/529725 (https://phabricator.wikimedia.org/T230106)
[09:56:12] (CR) Marostegui: [C: +2] db2127: Clarify it is a candidate master [puppet] - https://gerrit.wikimedia.org/r/529725 (https://phabricator.wikimedia.org/T230106) (owner: Marostegui)
[09:56:43] (03PS9) 10Ema: systemd: add support for mask and unmask [puppet] - 10https://gerrit.wikimedia.org/r/529328 [09:56:45] (03PS3) 10Ema: ATS: ensure trafficserver is not auto-started upon installation [puppet] - 10https://gerrit.wikimedia.org/r/529402 [10:01:07] !log Remove empty table wikidatawiki.globalblocks from s8 - T230055 [10:01:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:01:15] T230055: Remove globalblocks tables from wikis - https://phabricator.wikimedia.org/T230055 [10:03:47] (03CR) 10Vgutierrez: [C: 03+1] systemd: add support for mask and unmask [puppet] - 10https://gerrit.wikimedia.org/r/529328 (owner: 10Ema) [10:07:57] !log Upgrade trafficserver to 8.0.3-1wm3 in cp5001 - T221594 [10:08:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:08:05] T221594: Puppetize ATS TLS configuration for incoming traffic - https://phabricator.wikimedia.org/T221594 [10:09:34] !log Remove empty table globalblocks from s3 (where it exists) - T230055 [10:09:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:09:42] T230055: Remove globalblocks tables from wikis - https://phabricator.wikimedia.org/T230055 [10:18:48] (03PS3) 10Fomafix: Add 'cmn' as alias for 'zh' [puppet] - 10https://gerrit.wikimedia.org/r/528835 (https://phabricator.wikimedia.org/T23915) [10:19:33] (03PS8) 10Vgutierrez: prometheus: Identify trafficserver instances using the layer label [puppet] - 10https://gerrit.wikimedia.org/r/508289 (https://phabricator.wikimedia.org/T221217) [10:21:12] 10Operations, 10DNS, 10Traffic, 10Wikidata, 10Wikidata.org: attempt to visit en.wikidata.org subdomain stalls - https://phabricator.wikimedia.org/T202840 (10Aklapper) 05Open→03Declined I'm going to boldly decline as the current situation is intentional (see https://gerrit.wikimedia.org/r/#/c/operatio... 
[10:22:47] (03PS5) 10Effie Mouzeli: profile:templates:services_proxy: Enable ipv6 and listen only locally [puppet] - 10https://gerrit.wikimedia.org/r/529401 (https://phabricator.wikimedia.org/T224538) [10:23:43] (03CR) 10Giuseppe Lavagetto: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/529401 (https://phabricator.wikimedia.org/T224538) (owner: 10Effie Mouzeli) [10:25:21] jouncebot: now [10:25:22] No deployments scheduled for the next 0 hour(s) and 4 minute(s) [10:25:26] jouncebot: next [10:25:27] In 0 hour(s) and 4 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190812T1030) [10:28:20] 10Operations, 10Patch-For-Review: puppetdb queue size went up since July 30 - https://phabricator.wikimedia.org/T230002 (10Volans) 05Open→03Resolved a:03Volans As the queue on grafana has gone back to zero too I'll resolve it for now. Thanks a lot for the fix @jbond [10:28:22] 10Puppet, 10Patch-For-Review: Upgrade Puppet Masters and Puppet DB servers - https://phabricator.wikimedia.org/T228657 (10Volans) [10:28:25] !log Disable puppet on all servers running a services_proxy - T224538 [10:28:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:28:33] T224538: Socket Errors on PHP7 - https://phabricator.wikimedia.org/T224538 [10:30:04] jan_drewniak: I, the Bot under the Fountain, allow thee, The Deployer, to do Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190812T1030). 
[10:32:02] (03CR) 10Ema: [C: 03+2] systemd: add support for mask and unmask [puppet] - 10https://gerrit.wikimedia.org/r/529328 (owner: 10Ema) [10:34:23] (03CR) 10Effie Mouzeli: [C: 03+2] profile:templates:services_proxy: Enable ipv6 and listen only locally [puppet] - 10https://gerrit.wikimedia.org/r/529401 (https://phabricator.wikimedia.org/T224538) (owner: 10Effie Mouzeli) [10:34:35] (03PS6) 10Effie Mouzeli: profile:templates:services_proxy: Enable ipv6 and listen only locally [puppet] - 10https://gerrit.wikimedia.org/r/529401 (https://phabricator.wikimedia.org/T224538) [10:34:42] (03PS1) 10Elukey: role::kerberos::kdc: add support for replication [puppet] - 10https://gerrit.wikimedia.org/r/529733 (https://phabricator.wikimedia.org/T226089) [10:37:26] (03PS2) 10Elukey: role::kerberos::kdc: add support for replication [puppet] - 10https://gerrit.wikimedia.org/r/529733 (https://phabricator.wikimedia.org/T226089) [10:39:15] !log Restarting nginx on mwmaint2001.codfw.wmnet,mwmaint1002.eqiad.wmnet,scandium.eqiad.wmnet,snapshot[1005-1009].eqiad.wmnet, deploy2001.codfw.wmnet,deploy1001.eqiad.wmnet [10:39:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:42:10] (03CR) 10Santhosh: [C: 03+1] "+1 from me." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/509739 (https://phabricator.wikimedia.org/T220752) (owner: 10Vladis13) [10:45:26] (03CR) 10Filippo Giunchedi: [C: 03+1] monitoring: remove hostname from mgmt definitions [puppet] - 10https://gerrit.wikimedia.org/r/526165 (owner: 10Cwhite) [10:46:34] (03CR) 10Filippo Giunchedi: [C: 03+1] icinga: disable autocomplete.js in icinga search text input [puppet] - 10https://gerrit.wikimedia.org/r/528586 (owner: 10Cwhite) [10:47:06] !log Enabling puppet and rolling restarting nginx across the fleet - T224538 [10:47:08] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM https://puppet-compiler.wmflabs.org/compiler1002/17855/cp2010.codfw.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/529399 (https://phabricator.wikimedia.org/T229357) (owner: 10Cwhite) [10:47:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:47:14] T224538: Socket Errors on PHP7 - https://phabricator.wikimedia.org/T224538 [10:47:43] !log Upgrade trafficserver to 8.0.3-1wm3 in cp5002 - T221594 [10:47:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:47:50] T221594: Puppetize ATS TLS configuration for incoming traffic - https://phabricator.wikimedia.org/T221594 [10:48:23] (03CR) 10Volans: "Post merge -1. For the way we use it, the --run option is used only in systemd timers, that don't want to exit with the result of the repo" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/529132 (owner: 10CRusnov) [10:51:14] (03CR) 10Volans: [C: 03+2] "LGTM" [software/conftool] - 10https://gerrit.wikimedia.org/r/529185 (owner: 10CDanis) [10:51:50] (03CR) 10Volans: [C: 03+2] "LGTM" [software/conftool] - 10https://gerrit.wikimedia.org/r/529367 (owner: 10CDanis) [10:53:49] (03Merged) 10jenkins-bot: dbctl: reduce argparse boilerplate [software/conftool] - 10https://gerrit.wikimedia.org/r/529185 (owner: 10CDanis) [10:53:52] (03CR) 10Volans: [C: 03+1] "LGTM, one question on the deployment procedure inline." 
(031 comment) [software/conftool] - 10https://gerrit.wikimedia.org/r/529396 (https://phabricator.wikimedia.org/T229677) (owner: 10CDanis) [10:54:31] (03Merged) 10jenkins-bot: dbctl: clarify some CLI help messages [software/conftool] - 10https://gerrit.wikimedia.org/r/529367 (owner: 10CDanis) [10:59:38] (03PS3) 10Elukey: profile::kerberos::kadminserver: add support for replication [puppet] - 10https://gerrit.wikimedia.org/r/529733 (https://phabricator.wikimedia.org/T226089) [10:59:58] !log restarting trafficserver in cp5002 [11:00:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: My dear minions, it's time we take the moon! Just kidding. Time for European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190812T1100). [11:00:05] No GERRIT patches in the queue for this window AFAICS. [11:04:37] (03CR) 10Elukey: "Non blocking comment :)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/529399 (https://phabricator.wikimedia.org/T229357) (owner: 10Cwhite) [11:05:48] (03PS4) 10Elukey: profile::kerberos::kadminserver: add support for replication [puppet] - 10https://gerrit.wikimedia.org/r/529733 (https://phabricator.wikimedia.org/T226089) [11:07:46] (03CR) 10Elukey: "Currently a no-op https://puppet-compiler.wmflabs.org/compiler1001/17857/kerberos1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/529733 (https://phabricator.wikimedia.org/T226089) (owner: 10Elukey) [11:08:43] (03CR) 10Elukey: "Moritz: this is a proposal for a first version of the replication between krb hosts, let's discuss it whenever you have time :)" [puppet] - 10https://gerrit.wikimedia.org/r/529733 (https://phabricator.wikimedia.org/T226089) (owner: 10Elukey) [11:09:12] ack jouncebot [11:09:25] looks like no EU SWAT today :) [11:10:05] * Amir1 comes up with something to deploy for sake of deployment [11:10:25] :D [11:11:31] If any ops around to 
deploy this, it would be great https://gerrit.wikimedia.org/r/c/operations/puppet/+/528526 [11:21:59] * Urbanecm is going to do an emergency deploy for T230304 [11:22:27] Amir1: Since I don't see you on deploy1001, I assume the air is clear? [11:22:50] Urbanecm: yea yeah [11:22:53] go [11:22:58] thanks [11:24:12] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable global abuse filters on warwiki as an emergency measure (T230304) (duration: 00m 48s) [11:24:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:24:57] (03PS1) 10Urbanecm: Enable global abuse filters on warwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529743 (https://phabricator.wikimedia.org/T230304) [11:25:50] (03CR) 10Urbanecm: [C: 03+2] Enable global abuse filters on warwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529743 (https://phabricator.wikimedia.org/T230304) (owner: 10Urbanecm) [11:26:52] (03Merged) 10jenkins-bot: Enable global abuse filters on warwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529743 (https://phabricator.wikimedia.org/T230304) (owner: 10Urbanecm) [11:27:08] (03CR) 10jenkins-bot: Enable global abuse filters on warwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529743 (https://phabricator.wikimedia.org/T230304) (owner: 10Urbanecm) [11:27:11] thanks everyone, I'm done [11:34:44] !log restart atsmtail@backend on cp1076 [11:34:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:34:56] one more emergency deploy for the same task [11:36:44] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: More restrictive account creation throttle (T230304) (duration: 00m 47s) [11:36:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:37:33] (03PS1) 10Urbanecm: Temporary make account creation limits more restrictive [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529746 
(https://phabricator.wikimedia.org/T230304) [11:37:50] (03CR) 10Urbanecm: [C: 03+2] Temporary make account creation limits more restrictive [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529746 (https://phabricator.wikimedia.org/T230304) (owner: 10Urbanecm) [11:38:48] (03Merged) 10jenkins-bot: Temporary make account creation limits more restrictive [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529746 (https://phabricator.wikimedia.org/T230304) (owner: 10Urbanecm) [11:39:04] (03CR) 10jenkins-bot: Temporary make account creation limits more restrictive [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529746 (https://phabricator.wikimedia.org/T230304) (owner: 10Urbanecm) [11:39:08] Done with the above [11:52:08] thanks Urbanecm [11:52:13] a lot [11:55:02] yw hauskatze [11:56:01] (03PS4) 10Mathew.onipe: cloudelastic: fix monitored ip addresses [puppet] - 10https://gerrit.wikimedia.org/r/529362 (https://phabricator.wikimedia.org/T229621) (owner: 10Jbond) [11:56:03] (03PS3) 10Mathew.onipe: lvs: allow access to wdqs lvs on port 8888 [puppet] - 10https://gerrit.wikimedia.org/r/529053 (https://phabricator.wikimedia.org/T176875) [11:56:44] (03CR) 10Mathew.onipe: lvs: allow access to wdqs lvs on port 8888 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/529053 (https://phabricator.wikimedia.org/T176875) (owner: 10Mathew.onipe) [11:57:47] PROBLEM - mobileapps endpoints health on scb2006 is CRITICAL: /{domain}/v1/page/media/{title} (Get media in test page) is CRITICAL: Test Get media in test page returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [12:01:01] RECOVERY - mobileapps endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [12:07:01] (03PS15) 10Mathew.onipe: Add maps reboot cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/511819 (https://phabricator.wikimedia.org/T224072) [12:07:19] (03CR) 
10Mathew.onipe: Add maps reboot cookbook (033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/511819 (https://phabricator.wikimedia.org/T224072) (owner: 10Mathew.onipe) [12:08:42] (03PS10) 10Fomafix: Add additional aliases for sr-cyrl and sr-latn next to sr-ec and sr-el [puppet] - 10https://gerrit.wikimedia.org/r/368248 (https://phabricator.wikimedia.org/T117845) [12:15:24] (03PS1) 10Marostegui: mariadb: Provision db2122 into s7 [puppet] - 10https://gerrit.wikimedia.org/r/529775 (https://phabricator.wikimedia.org/T228969) [12:20:50] (03CR) 10Marostegui: [C: 03+2] mariadb: Provision db2122 into s7 [puppet] - 10https://gerrit.wikimedia.org/r/529775 (https://phabricator.wikimedia.org/T228969) (owner: 10Marostegui) [12:31:58] (03PS1) 10Ema: ATS: add hiera flag to disable systemd hardening [puppet] - 10https://gerrit.wikimedia.org/r/529780 [12:33:49] (03CR) 10Vgutierrez: [C: 03+1] ATS: add hiera flag to disable systemd hardening [puppet] - 10https://gerrit.wikimedia.org/r/529780 (owner: 10Ema) [12:34:40] (03CR) 10Ema: [C: 03+2] ATS: add hiera flag to disable systemd hardening [puppet] - 10https://gerrit.wikimedia.org/r/529780 (owner: 10Ema) [12:38:38] (03PS1) 10Ema: ATS: disable systemd hardening on some hosts [puppet] - 10https://gerrit.wikimedia.org/r/529784 [12:45:18] (03CR) 10Ema: [C: 03+2] ATS: disable systemd hardening on some hosts [puppet] - 10https://gerrit.wikimedia.org/r/529784 (owner: 10Ema) [12:48:59] (03PS1) 10Elukey: profile::kerberos::kdc: add debconf settings [puppet] - 10https://gerrit.wikimedia.org/r/529786 (https://phabricator.wikimedia.org/T226089) [12:51:00] !log cp1076,cp5001,cp5002: ats-backend-restart to disable ATS systemd hardening features [12:51:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:00] RECOVERY - Device not healthy -SMART- on helium is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=helium&var-datasource=eqiad+prometheus/ops [13:13:20] (03PS1) 10Fsero: prometheus, k8s: enabling services prometheus service discovery [puppet] - 10https://gerrit.wikimedia.org/r/529789 [13:15:27] (03PS2) 10Fsero: prometheus, k8s: enabling services prometheus service discovery [puppet] - 10https://gerrit.wikimedia.org/r/529789 [13:15:46] 10Operations, 10User-fgiunchedi: CPU scaling governor audit - https://phabricator.wikimedia.org/T225713 (10Gehel) elastic[1032-1052].eqiad.wmnet,elastic[2025-2036].codfw.wmnet have been configured with `set /system1/oemhp_power1 oemhp_powerreg=os`. This will take effect after next rolling restart. [13:15:55] (03PS1) 10Filippo Giunchedi: facilities: introduce monitor_pdu_phase for ulsfo PDUs [puppet] - 10https://gerrit.wikimedia.org/r/529790 (https://phabricator.wikimedia.org/T148541) [13:15:58] (03PS1) 10Filippo Giunchedi: prometheus: generate targets for single phase PDUs [puppet] - 10https://gerrit.wikimedia.org/r/529791 (https://phabricator.wikimedia.org/T148541) [13:16:31] (03CR) 10jerkins-bot: [V: 04-1] facilities: introduce monitor_pdu_phase for ulsfo PDUs [puppet] - 10https://gerrit.wikimedia.org/r/529790 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [13:16:50] (03CR) 10jerkins-bot: [V: 04-1] prometheus: generate targets for single phase PDUs [puppet] - 10https://gerrit.wikimedia.org/r/529791 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [13:17:10] (03CR) 10CDanis: "Giuseppe, could you find some time to review this in the next couple of days?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/528938 (https://phabricator.wikimedia.org/T229631) (owner: 10CDanis) [13:17:52] (03CR) 10Fsero: "PCC seems happy https://puppet-compiler.wmflabs.org/compiler1001/17862/" [puppet] - 10https://gerrit.wikimedia.org/r/529789 (owner: 10Fsero) [13:17:54] (03CR) 10Effie Mouzeli: [V: 03+1] "Looks good https://puppet-compiler.wmflabs.org/compiler1001/17861/" [puppet] - 10https://gerrit.wikimedia.org/r/525856 (https://phabricator.wikimedia.org/T185089) (owner: 10Ppchelko) [13:18:52] !log gehel@cumin2001 START - Cookbook sre.elasticsearch.rolling-reboot [13:18:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:51] (03PS2) 10Filippo Giunchedi: facilities: introduce monitor_pdu_phase for ulsfo PDUs [puppet] - 10https://gerrit.wikimedia.org/r/529790 (https://phabricator.wikimedia.org/T148541) [13:25:53] (03PS2) 10Filippo Giunchedi: prometheus: generate targets for single phase PDUs [puppet] - 10https://gerrit.wikimedia.org/r/529791 (https://phabricator.wikimedia.org/T148541) [13:26:32] (03PS1) 10Ema: ATS: restart atsmtail upon trafficserver restart [puppet] - 10https://gerrit.wikimedia.org/r/529792 [13:28:59] (03PS2) 10Fsero: Revert "k8s, cache: disabling codfw services for k8s cluster recreation" [puppet] - 10https://gerrit.wikimedia.org/r/528409 (https://phabricator.wikimedia.org/T228837) (owner: 10Alexandros Kosiaris) [13:29:33] (03CR) 10Vgutierrez: [C: 03+1] ATS: restart atsmtail upon trafficserver restart [puppet] - 10https://gerrit.wikimedia.org/r/529792 (owner: 10Ema) [13:30:04] (03CR) 10Ema: [C: 03+1] Revert "k8s, cache: disabling codfw services for k8s cluster recreation" [puppet] - 10https://gerrit.wikimedia.org/r/528409 (https://phabricator.wikimedia.org/T228837) (owner: 10Alexandros Kosiaris) [13:30:29] (03PS3) 10Filippo Giunchedi: facilities: introduce monitor_pdu_phase for ulsfo PDUs [puppet] - 10https://gerrit.wikimedia.org/r/529790 (https://phabricator.wikimedia.org/T148541) 
[13:30:31] (03PS3) 10Filippo Giunchedi: prometheus: generate targets for single phase PDUs [puppet] - 10https://gerrit.wikimedia.org/r/529791 (https://phabricator.wikimedia.org/T148541) [13:30:33] (03CR) 10Fsero: [C: 03+2] Revert "k8s, cache: disabling codfw services for k8s cluster recreation" [puppet] - 10https://gerrit.wikimedia.org/r/528409 (https://phabricator.wikimedia.org/T228837) (owner: 10Alexandros Kosiaris) [13:31:37] (03CR) 10Effie Mouzeli: [V: 03+1] "https://puppet-compiler.wmflabs.org/compiler1001/17865/graphite1004.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/525856 (https://phabricator.wikimedia.org/T185089) (owner: 10Ppchelko) [13:31:59] (03PS2) 10Ema: ATS: restart atsmtail upon trafficserver restart [puppet] - 10https://gerrit.wikimedia.org/r/529792 [13:33:18] (03CR) 10Ema: [C: 03+2] ATS: restart atsmtail upon trafficserver restart [puppet] - 10https://gerrit.wikimedia.org/r/529792 (owner: 10Ema) [13:34:14] (03CR) 10Filippo Giunchedi: [C: 03+2] facilities: introduce monitor_pdu_phase for ulsfo PDUs [puppet] - 10https://gerrit.wikimedia.org/r/529790 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [13:34:21] (03PS4) 10Filippo Giunchedi: facilities: introduce monitor_pdu_phase for ulsfo PDUs [puppet] - 10https://gerrit.wikimedia.org/r/529790 (https://phabricator.wikimedia.org/T148541) [13:34:38] (03CR) 10Effie Mouzeli: [V: 03+1 C: 03+2] Remove RESTBase graphite alerts. [puppet] - 10https://gerrit.wikimedia.org/r/525856 (https://phabricator.wikimedia.org/T185089) (owner: 10Ppchelko) [13:34:49] (03PS2) 10Effie Mouzeli: Remove RESTBase graphite alerts. 
[puppet] - 10https://gerrit.wikimedia.org/r/525856 (https://phabricator.wikimedia.org/T185089) (owner: 10Ppchelko) [13:36:51] (03PS5) 10Filippo Giunchedi: facilities: introduce monitor_pdu_phase for ulsfo PDUs [puppet] - 10https://gerrit.wikimedia.org/r/529790 (https://phabricator.wikimedia.org/T148541) [13:38:14] (03CR) 10Filippo Giunchedi: [V: 03+2 C: 03+2] facilities: introduce monitor_pdu_phase for ulsfo PDUs [puppet] - 10https://gerrit.wikimedia.org/r/529790 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [13:41:07] 10Operations, 10serviceops, 10PHP 7.2 support, 10Patch-For-Review: Socket Errors on PHP7 - https://phabricator.wikimedia.org/T224538 (10jijiki) 05Open→03Resolved {F30008369} Fixed! [13:44:51] PROBLEM - Mediawiki Cirrussearch update rate - codfw on icinga1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [50.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [13:46:15] (03CR) 10Filippo Giunchedi: [C: 03+2] prometheus: generate targets for single phase PDUs [puppet] - 10https://gerrit.wikimedia.org/r/529791 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [13:46:15] PROBLEM - ElasticSearch numbers of masters eligible - 9443 on search.svc.codfw.wmnet is CRITICAL: CRITICAL - Found 2 eligible masters. https://wikitech.wikimedia.org/wiki/Search%23Expected_eligible_masters_check_and_alert [13:46:21] (03PS1) 10Gehel: elasticsearch: add sleep() to terminate request and consume write queue [cookbooks] - 10https://gerrit.wikimedia.org/r/529794 [13:46:23] (03PS4) 10Filippo Giunchedi: prometheus: generate targets for single phase PDUs [puppet] - 10https://gerrit.wikimedia.org/r/529791 (https://phabricator.wikimedia.org/T148541) [13:46:39] gehel: you aware? 
^^^^ [13:47:01] yep, rolling restart in progress, that alert is new and is not managed by the cookbook [13:47:08] got it [13:47:09] Ok [13:47:14] so it's a test of the alert :D [13:47:19] :) [13:47:19] he, he, he ... [13:47:29] volans: are you working? [13:47:34] RECOVERY - ElasticSearch numbers of masters eligible - 9443 on search.svc.codfw.wmnet is OK: OK - All good https://wikitech.wikimedia.org/wiki/Search%23Expected_eligible_masters_check_and_alert [13:47:46] new runbook on wikitech: to test a new alert of ES, perform a rolling restart of the whole cluster :-P [13:47:51] gehel: yep [13:48:05] lol [13:48:14] volans: I thought you had a few days off [13:48:46] yes I was off last week since Wed. mid afternoon [13:51:00] volans: do we have a good way to downtime a single service check (as opposed to the full host) in icinga? [13:51:20] !log gehel@cumin2001 END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97) [13:51:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:51:57] (03PS1) 10Elukey: Add no-cache response header for Wikistats V2's index.html [puppet] - 10https://gerrit.wikimedia.org/r/529795 (https://phabricator.wikimedia.org/T230136) [13:52:13] gehel: no, service downtime was not yet added to spicerack, but it's trivial to add using self._get_command_string() in the Icinga class [13:54:17] actually adding a service_command() method that uses the above to make a proper abstraction [13:54:49] actually, the update rate should not be triggered, so should not be downtimed [13:55:07] the only "bad" coupling requires is that the service description must be known [13:55:20] and that is defined in puppet and sometimes changed because is also the one shown in the UI [13:55:24] yep, but that's essential coupling [13:55:35] *required [13:56:17] that aside if you need that feature is probably ~10m of which 8 of them to write proper docstrings and tests :) [13:56:49] (03CR) 10Mathew.onipe: [C: 04-1] elasticsearch: add 
sleep() to terminate request and consume write queue (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/529794 (owner: 10Gehel) [13:57:55] 10Operations, 10Readers-Web-Backlog, 10Traffic: [Bug] iPadOS 13 shows the desktop version of Safari with a broken layout - https://phabricator.wikimedia.org/T229875 (10phuedx) >>! In T229875#5396278, @Jdlrobson wrote: > We shouldn't add ` jouncebot: next [14:21:26] In 2 hour(s) and 38 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190812T1700) [14:21:28] jouncebot: now [14:21:28] No deployments scheduled for the next 2 hour(s) and 38 minute(s) [14:22:05] * Urbanecm is going to do (yet another) emergency deploy for T230304 [14:23:02] (03CR) 10Filippo Giunchedi: [C: 03+2] hieradata: let Prometheus on PoPs talk to snmp_exporter [puppet] - 10https://gerrit.wikimedia.org/r/529797 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [14:23:10] (03PS2) 10Filippo Giunchedi: hieradata: let Prometheus on PoPs talk to snmp_exporter [puppet] - 10https://gerrit.wikimedia.org/r/529797 (https://phabricator.wikimedia.org/T148541) [14:24:23] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Account creation throttle to 2 everywhere (T230304) (duration: 00m 47s) [14:24:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:25:43] (03PS1) 10Urbanecm: Set account create throttle to 2 everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529798 (https://phabricator.wikimedia.org/T230304) [14:26:10] (03CR) 10Urbanecm: [C: 03+2] Set account create throttle to 2 everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529798 (https://phabricator.wikimedia.org/T230304) (owner: 10Urbanecm) [14:27:21] (03Merged) 10jenkins-bot: Set account create throttle to 2 everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529798 (https://phabricator.wikimedia.org/T230304) (owner: 
10Urbanecm) [14:28:23] (03CR) 10jenkins-bot: Set account create throttle to 2 everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529798 (https://phabricator.wikimedia.org/T230304) (owner: 10Urbanecm) [14:31:49] (03CR) 10Filippo Giunchedi: "Untested but LGTM, although the same config will be needed in staging/k8s.pp too I think" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/529789 (owner: 10Fsero) [14:32:52] RECOVERY - Disk space on cp5001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=cp5001&var-datasource=eqsin+prometheus/ops [14:34:08] RECOVERY - Mediawiki Cirrussearch update rate - codfw on icinga1001 is OK: OK: Less than 1.00% under the threshold [80.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [14:36:52] (03PS5) 10MSantos: First version of the wikifeeds chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/526679 (https://phabricator.wikimedia.org/T229287) [14:37:09] (03CR) 10MSantos: "> Patch Set 4:" (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/526679 (https://phabricator.wikimedia.org/T229287) (owner: 10MSantos) [14:37:32] (03PS1) 10Ema: ATS: add capabilities needed by traffic_crashlog [puppet] - 10https://gerrit.wikimedia.org/r/529799 [14:37:40] (03PS1) 10Filippo Giunchedi: prometheus: don't poll st4OutletCapabilities [puppet] - 10https://gerrit.wikimedia.org/r/529800 (https://phabricator.wikimedia.org/T148541) [14:39:13] (03PS2) 10Filippo Giunchedi: prometheus: don't poll st4OutletCapabilities [puppet] - 10https://gerrit.wikimedia.org/r/529800 (https://phabricator.wikimedia.org/T148541) [14:39:51] !log disable puppet on mw122[4-8] [14:39:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:40:08] (03PS3) 10Filippo Giunchedi: prometheus: don't poll st4OutletCapabilities 
[puppet] - 10https://gerrit.wikimedia.org/r/529800 (https://phabricator.wikimedia.org/T148541) [14:42:26] (03PS1) 10Ema: ATS: disable compress plugins [puppet] - 10https://gerrit.wikimedia.org/r/529801 (https://phabricator.wikimedia.org/T227432) [14:42:38] (03CR) 10Filippo Giunchedi: [C: 03+2] prometheus: don't poll st4OutletCapabilities [puppet] - 10https://gerrit.wikimedia.org/r/529800 (https://phabricator.wikimedia.org/T148541) (owner: 10Filippo Giunchedi) [14:43:24] (03PS2) 10Ema: ATS: add capabilities needed by traffic_crashlog [puppet] - 10https://gerrit.wikimedia.org/r/529799 [14:43:26] (03PS2) 10Ema: ATS: disable compress plugin [puppet] - 10https://gerrit.wikimedia.org/r/529801 (https://phabricator.wikimedia.org/T227432) [14:43:45] 10Operations, 10Readers-Web-Backlog, 10Traffic: [Bug] iPadOS 13 shows the desktop version of Safari with a broken layout - https://phabricator.wikimedia.org/T229875 (10Jdlrobson) >>! In T229875#5409585, @phuedx wrote: >>>! In T229875#5396278, @Jdlrobson wrote: >> We shouldn't add ` (03PS1) 10Mathew.onipe: remote: make RemoteHosts iterable via a generator [software/spicerack] - 10https://gerrit.wikimedia.org/r/529802 [14:49:41] (03PS4) 10Elukey: Add Cache-Control response header for Wikistats V2's index.html [puppet] - 10https://gerrit.wikimedia.org/r/529795 (https://phabricator.wikimedia.org/T230136) [14:49:52] (03CR) 10jerkins-bot: [V: 04-1] remote: make RemoteHosts iterable via a generator [software/spicerack] - 10https://gerrit.wikimedia.org/r/529802 (owner: 10Mathew.onipe) [14:50:52] (03CR) 10Ema: [C: 03+1] Add Cache-Control response header for Wikistats V2's index.html [puppet] - 10https://gerrit.wikimedia.org/r/529795 (https://phabricator.wikimedia.org/T230136) (owner: 10Elukey) [14:52:17] (03PS2) 10Mathew.onipe: remote: make RemoteHosts iterable via a generator [software/spicerack] - 10https://gerrit.wikimedia.org/r/529802 [14:54:08] (03PS3) 10Mathew.onipe: remote: make RemoteHosts iterable via a generator 
[software/spicerack] - 10https://gerrit.wikimedia.org/r/529802 [14:59:10] (03CR) 10Vgutierrez: [C: 03+1] ATS: disable compress plugin [puppet] - 10https://gerrit.wikimedia.org/r/529801 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema) [14:59:14] (03CR) 10Effie Mouzeli: [C: 03+2] hieradata: enable php72_only on mw122[4-8] [puppet] - 10https://gerrit.wikimedia.org/r/529796 (https://phabricator.wikimedia.org/T219150) (owner: 10Effie Mouzeli) [14:59:23] (03PS2) 10Effie Mouzeli: hieradata: enable php72_only on mw122[4-8] [puppet] - 10https://gerrit.wikimedia.org/r/529796 (https://phabricator.wikimedia.org/T219150) [14:59:33] (03CR) 10Ema: [C: 03+2] ATS: add capabilities needed by traffic_crashlog [puppet] - 10https://gerrit.wikimedia.org/r/529799 (owner: 10Ema) [14:59:45] (03CR) 10Ema: [C: 03+2] ATS: disable compress plugin [puppet] - 10https://gerrit.wikimedia.org/r/529801 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema) [15:00:26] (03PS3) 10Effie Mouzeli: hieradata: enable php72_only on mw122[4-8] [puppet] - 10https://gerrit.wikimedia.org/r/529796 (https://phabricator.wikimedia.org/T219150) [15:01:47] (03CR) 10Ayounsi: [C: 03+2] Ignore SJ Manufacturing ThruPower devices for LibreNMS report [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/529439 (owner: 10Ayounsi) [15:01:56] !log cp1076, cp500[12]: restart trafficserver with compress plugin disabled [15:02:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:00] RECOVERY - Check the Netbox report-s- librenms for fail status. 
on netmon1002 is OK: librenms.LibreNMS OK https://wikitech.wikimedia.org/wiki/Netbox%23Reports [15:05:31] !log rolling restart php-fpm on mw122[4-8] - T219150 [15:05:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:39] T219150: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150 [15:10:47] (03PS1) 10Mathew.onipe: icinga: add timeout option to elastic checks [puppet] - 10https://gerrit.wikimedia.org/r/529806 [15:14:53] (03PS1) 10Alaa Sarhan: Revert "Revert "Switch property terms migration to WRITE_NEW on client wikis"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529807 [15:18:44] 10Operations, 10serviceops, 10Performance-Team (Radar), 10User-jijiki: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150 (10jijiki) [15:18:58] 10Operations, 10serviceops, 10Performance-Team (Radar), 10User-jijiki: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150 (10jijiki) [15:26:05] 10Operations, 10MediaWiki-extensions-Mailgun, 10cloud-services-team, 10serviceops, and 4 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (10jijiki) @Dzahn Thank you! We'll see how things are since we now have merged https://gerrit.wikimedia.org/r/425027, and... 
[15:26:22] (CR) Volans: [C: -1] "Question inline" (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/529802 (owner: Mathew.onipe) [15:26:27] Operations, MediaWiki-extensions-Mailgun, cloud-services-team, serviceops, and 4 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (jijiki) [15:37:52] (CR) Gehel: [C: -1] icinga: add timeout option to elastic checks (1 comment) [puppet] - https://gerrit.wikimedia.org/r/529806 (owner: Mathew.onipe) [15:39:24] (Abandoned) Arturo Borrero Gonzalez: k8s: kubelet: stop requiring ::k8s::infrastructure_config [puppet] - https://gerrit.wikimedia.org/r/519375 (https://phabricator.wikimedia.org/T215531) (owner: Arturo Borrero Gonzalez) [15:58:36] (PS1) Ema: ATS: unset Accept-Encoding [puppet] - https://gerrit.wikimedia.org/r/529810 [15:59:44] (PS2) Ema: ATS: unset Accept-Encoding [puppet] - https://gerrit.wikimedia.org/r/529810 [17:00:04] gehel and onimisionipe: I, the Bot under the Fountain, allow thee, The Deployer, to do Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190812T1700). [17:27:47] didn't know there was deployment today [17:37:49] !log onimisionipe@deploy1001 Started deploy [wdqs/wdqs@8579f50]: Updated GUI, New endpoints and New Blazegraph and Updater build [17:37:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:39:08] Operations, netops: Instability of the Level3 link between cr2-eqiad and cr2-esams - https://phabricator.wikimedia.org/T228827 (ayounsi) Email sent to our account rep to know what they can do. 
[17:42:53] !log onimisionipe@deploy1001 Finished deploy [wdqs/wdqs@8579f50]: Updated GUI, New endpoints and New Blazegraph and Updater build (duration: 05m 04s) [17:42:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:05] MaxSem, RoanKattouw, Niharika, and Urbanecm: I, the Bot under the Fountain, allow thee, The Deployer, to do Morning SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190812T1800). [18:00:05] No GERRIT patches in the queue for this window AFAICS. [18:00:14] (PS6) Fomafix: Add redirects from 'sgs' to 'bat-smg' [puppet] - https://gerrit.wikimedia.org/r/481540 (https://phabricator.wikimedia.org/T204830) [18:03:49] (PS2) Fomafix: Add 'rup' as alias for 'roa-rup' [puppet] - https://gerrit.wikimedia.org/r/527917 (https://phabricator.wikimedia.org/T17988) [18:33:01] (PS3) Jdlrobson: Remove unused remnant from old menu click tracking [mediawiki-config] - https://gerrit.wikimedia.org/r/527615 (https://phabricator.wikimedia.org/T228681) [18:35:55] !log mforns@deploy1001 Started deploy [analytics/refinery@5418d3b]: deploying analytics-refinery up to 5418d3be5f65f7325324d0c15c51b3ca722dde1c [18:36:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:46:00] PROBLEM - High CPU load on API appserver on mw1231 is CRITICAL: CRITICAL - load average: 57.87, 30.62, 17.59 https://wikitech.wikimedia.org/wiki/Application_servers [18:46:10] PROBLEM - High CPU load on API appserver on mw1229 is CRITICAL: CRITICAL - load average: 48.42, 23.49, 14.34 https://wikitech.wikimedia.org/wiki/Application_servers [18:46:10] PROBLEM - High CPU load on API appserver on mw1234 is CRITICAL: CRITICAL - load average: 48.21, 24.40, 14.59 https://wikitech.wikimedia.org/wiki/Application_servers [18:46:16] PROBLEM - High CPU load on API appserver on mw1312 is CRITICAL: CRITICAL - load average: 74.46, 42.18, 26.49 
https://wikitech.wikimedia.org/wiki/Application_servers [18:46:20] PROBLEM - High CPU load on API appserver on mw1233 is CRITICAL: CRITICAL - load average: 66.14, 33.87, 18.42 https://wikitech.wikimedia.org/wiki/Application_servers [18:46:42] PROBLEM - High CPU load on API appserver on mw1276 is CRITICAL: CRITICAL - load average: 64.11, 30.38, 19.63 https://wikitech.wikimedia.org/wiki/Application_servers [18:46:44] PROBLEM - High CPU load on API appserver on mw1285 is CRITICAL: CRITICAL - load average: 64.66, 30.92, 19.89 https://wikitech.wikimedia.org/wiki/Application_servers [18:46:44] PROBLEM - High CPU load on API appserver on mw1282 is CRITICAL: CRITICAL - load average: 60.10, 29.30, 19.51 https://wikitech.wikimedia.org/wiki/Application_servers [18:46:48] PROBLEM - High CPU load on API appserver on mw1232 is CRITICAL: CRITICAL - load average: 66.30, 29.85, 16.55 https://wikitech.wikimedia.org/wiki/Application_servers [18:46:48] PROBLEM - High CPU load on API appserver on mw1288 is CRITICAL: CRITICAL - load average: 67.36, 36.14, 22.35 https://wikitech.wikimedia.org/wiki/Application_servers [18:46:50] PROBLEM - High CPU load on API appserver on mw1287 is CRITICAL: CRITICAL - load average: 68.35, 32.92, 20.85 https://wikitech.wikimedia.org/wiki/Application_servers [18:47:02] PROBLEM - High CPU load on API appserver on mw1316 is CRITICAL: CRITICAL - load average: 76.49, 44.41, 27.41 https://wikitech.wikimedia.org/wiki/Application_servers [18:47:04] PROBLEM - High CPU load on API appserver on mw1279 is CRITICAL: CRITICAL - load average: 69.62, 34.31, 21.12 https://wikitech.wikimedia.org/wiki/Application_servers [18:47:06] PROBLEM - High CPU load on API appserver on mw1281 is CRITICAL: CRITICAL - load average: 66.92, 33.84, 21.28 https://wikitech.wikimedia.org/wiki/Application_servers [18:47:08] PROBLEM - High CPU load on API appserver on mw1290 is CRITICAL: CRITICAL - load average: 69.38, 33.63, 21.07 https://wikitech.wikimedia.org/wiki/Application_servers 
[18:47:08] PROBLEM - High CPU load on API appserver on mw1278 is CRITICAL: CRITICAL - load average: 61.58, 32.20, 20.60 https://wikitech.wikimedia.org/wiki/Application_servers [18:47:10] PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [18:47:16] PROBLEM - High CPU load on API appserver on mw1314 is CRITICAL: CRITICAL - load average: 81.09, 46.39, 28.27 https://wikitech.wikimedia.org/wiki/Application_servers [18:47:18] PROBLEM - High CPU load on API appserver on mw1230 is CRITICAL: CRITICAL - load average: 51.15, 24.89, 14.23 https://wikitech.wikimedia.org/wiki/Application_servers [18:47:20] PROBLEM - High CPU load on API appserver on mw1235 is CRITICAL: CRITICAL - load average: 50.28, 28.61, 15.95 https://wikitech.wikimedia.org/wiki/Application_servers [18:47:22] PROBLEM - High CPU load on API appserver on mw1277 is CRITICAL: CRITICAL - load average: 67.92, 35.47, 22.05 https://wikitech.wikimedia.org/wiki/Application_servers [18:47:40] PROBLEM - High CPU load on API appserver on mw1286 is CRITICAL: CRITICAL - load average: 68.20, 39.99, 23.97 https://wikitech.wikimedia.org/wiki/Application_servers [18:47:46] PROBLEM - High CPU load on API appserver on mw1284 is CRITICAL: CRITICAL - load average: 71.18, 40.83, 24.46 https://wikitech.wikimedia.org/wiki/Application_servers [18:48:06] PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/caption/translation/from/{source}/to/{target} (Caption translation suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [18:48:06] PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [18:48:22] RECOVERY - High CPU load on API appserver on mw1282 is OK: OK - load average: 30.00, 29.20, 20.60 https://wikitech.wikimedia.org/wiki/Application_servers [18:48:28] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [18:48:28] PROBLEM - restbase endpoints health on restbase2020 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:48:38] PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [18:48:38] PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [18:48:38] PROBLEM - restbase endpoints health on restbase1022 is CRITICAL: /en.wikipedia.org/v1/transform/wikitext/to/html/{title} (Transform wikitext to html) 
timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:48:42] RECOVERY - High CPU load on API appserver on mw1279 is OK: OK - load average: 28.63, 29.79, 20.79 https://wikitech.wikimedia.org/wiki/Application_servers [18:48:48] PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [18:48:54] RECOVERY - High CPU load on API appserver on mw1230 is OK: OK - load average: 20.51, 22.40, 14.44 https://wikitech.wikimedia.org/wiki/Application_servers [18:49:00] RECOVERY - High CPU load on API appserver on mw1277 is OK: OK - load average: 27.84, 31.19, 21.87 https://wikitech.wikimedia.org/wiki/Application_servers [18:49:24] PROBLEM - High CPU load on API appserver on mw1229 is CRITICAL: CRITICAL - load average: 54.01, 35.87, 20.80 https://wikitech.wikimedia.org/wiki/Application_servers [18:49:30] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [18:49:34] PROBLEM - High CPU load on API appserver on mw1313 is CRITICAL: CRITICAL - load average: 78.76, 51.22, 32.07 https://wikitech.wikimedia.org/wiki/Application_servers [18:49:56] RECOVERY - High CPU load on API appserver on mw1276 is OK: OK - load average: 20.45, 30.08, 22.08 https://wikitech.wikimedia.org/wiki/Application_servers [18:49:58] RECOVERY - High CPU load on API appserver on mw1285 is OK: OK - load average: 21.29, 29.38, 21.81 https://wikitech.wikimedia.org/wiki/Application_servers [18:49:58] PROBLEM - High CPU load on API appserver on mw1315 is CRITICAL: CRITICAL - load average: 72.29, 51.02, 32.76 
https://wikitech.wikimedia.org/wiki/Application_servers [18:50:04] RECOVERY - High CPU load on API appserver on mw1287 is OK: OK - load average: 20.71, 30.17, 22.44 https://wikitech.wikimedia.org/wiki/Application_servers [18:50:20] RECOVERY - High CPU load on API appserver on mw1281 is OK: OK - load average: 20.66, 28.33, 21.72 https://wikitech.wikimedia.org/wiki/Application_servers [18:50:22] RECOVERY - High CPU load on API appserver on mw1278 is OK: OK - load average: 22.56, 29.67, 22.09 https://wikitech.wikimedia.org/wiki/Application_servers [18:50:34] PROBLEM - High CPU load on API appserver on mw1235 is CRITICAL: CRITICAL - load average: 62.45, 42.50, 23.64 https://wikitech.wikimedia.org/wiki/Application_servers [18:50:56] PROBLEM - High CPU load on API appserver on mw1317 is CRITICAL: CRITICAL - load average: 72.05, 53.66, 34.60 https://wikitech.wikimedia.org/wiki/Application_servers [18:51:06] PROBLEM - High CPU load on API appserver on mw1312 is CRITICAL: CRITICAL - load average: 67.03, 53.91, 35.99 https://wikitech.wikimedia.org/wiki/Application_servers [18:51:34] PROBLEM - High CPU load on API appserver on mw1276 is CRITICAL: CRITICAL - load average: 64.02, 39.43, 26.01 https://wikitech.wikimedia.org/wiki/Application_servers [18:51:40] PROBLEM - High CPU load on API appserver on mw1288 is CRITICAL: CRITICAL - load average: 61.35, 48.12, 31.37 https://wikitech.wikimedia.org/wiki/Application_servers [18:51:55] PROBLEM - High CPU load on API appserver on mw1316 is CRITICAL: CRITICAL - load average: 79.73, 58.15, 37.67 https://wikitech.wikimedia.org/wiki/Application_servers [18:51:55] PROBLEM - High CPU load on API appserver on mw1279 is CRITICAL: CRITICAL - load average: 71.95, 40.40, 25.91 https://wikitech.wikimedia.org/wiki/Application_servers [18:51:58] PROBLEM - High CPU load on API appserver on mw1290 is CRITICAL: CRITICAL - load average: 64.55, 49.26, 31.41 https://wikitech.wikimedia.org/wiki/Application_servers [18:52:00] PROBLEM - High CPU load on 
API appserver on mw1278 is CRITICAL: CRITICAL - load average: 62.01, 39.08, 26.08 https://wikitech.wikimedia.org/wiki/Application_servers [18:52:08] PROBLEM - High CPU load on API appserver on mw1314 is CRITICAL: CRITICAL - load average: 82.91, 57.57, 37.19 https://wikitech.wikimedia.org/wiki/Application_servers [18:52:08] PROBLEM - High CPU load on API appserver on mw1230 is CRITICAL: CRITICAL - load average: 51.66, 29.91, 18.38 https://wikitech.wikimedia.org/wiki/Application_servers [18:52:14] PROBLEM - High CPU load on API appserver on mw1277 is CRITICAL: CRITICAL - load average: 68.12, 41.04, 26.84 https://wikitech.wikimedia.org/wiki/Application_servers [18:52:28] PROBLEM - High CPU load on API appserver on mw1280 is CRITICAL: CRITICAL - load average: 72.06, 41.75, 26.40 https://wikitech.wikimedia.org/wiki/Application_servers [18:52:36] PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/caption/addition/{target} (Caption addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) timed out before a response was received: /{domain}/v1/caption/translation/from/{source}/to/{target} (Caption translation suggestions) timed out before [18:52:36] eceived: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received: /{domain}/v1/article/cre [18:52:36] eed} (article.creation.morelike - bad article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [18:52:36] PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: 
/en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:52:44] PROBLEM - High CPU load on API appserver on mw1289 is CRITICAL: CRITICAL - load average: 73.46, 40.94, 25.54 https://wikitech.wikimedia.org/wiki/Application_servers [18:53:08] PROBLEM - restbase endpoints health on restbase2012 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:53:10] PROBLEM - High CPU load on API appserver on mw1283 is CRITICAL: CRITICAL - load average: 79.54, 42.74, 25.84 https://wikitech.wikimedia.org/wiki/Application_servers [18:53:12] PROBLEM - restbase endpoints health on restbase2013 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:53:12] PROBLEM - restbase endpoints health on restbase1020 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:53:13] PROBLEM - High CPU load on API appserver on mw1285 is CRITICAL: CRITICAL - load average: 71.19, 46.61, 29.61 https://wikitech.wikimedia.org/wiki/Application_servers [18:53:13] PROBLEM - High CPU load on API appserver on mw1282 is CRITICAL: CRITICAL - load average: 82.88, 48.39, 29.61 
https://wikitech.wikimedia.org/wiki/Application_servers [18:53:20] PROBLEM - High CPU load on API appserver on mw1287 is CRITICAL: CRITICAL - load average: 74.91, 48.38, 30.68 https://wikitech.wikimedia.org/wiki/Application_servers [18:53:20] PROBLEM - restbase endpoints health on restbase1021 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:53:20] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not [18:53:20] xistent title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton [18:53:30] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:53:32] PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/caption/addition/{target} (Caption addition suggestions) timed out before a response was received: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response wa [18:53:32] ain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [18:53:32] PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: /{domain}/v1/page/media/{title} (Get media in test page) timed out before a response was received: /{domain}/v1/page/summary/{title} (Get summary for test page) timed out before a response was received: /{domain}/v1/transform/html/to/mobile-html/{title} (Get preview mobile HTML for test page) timed out before a response was received https://wikitech.wikimedia.org/wiki/ [18:53:32] ng/mobileapps [18:53:36] PROBLEM - High CPU load on API appserver on mw1281 is CRITICAL: CRITICAL - load average: 77.65, 50.18, 31.31 https://wikitech.wikimedia.org/wiki/Application_servers [18:53:38] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:53:40] PROBLEM - restbase endpoints health on restbase2017 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:53:50] PROBLEM - PHP7 rendering on mw1347 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [18:54:04] PROBLEM - restbase endpoints health on restbase1023 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:54:18] PROBLEM - restbase endpoints health on restbase1024 is CRITICAL: 
/en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:54:20] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:54:30] PROBLEM - PHP7 rendering on mw1348 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [18:54:32] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:54:40] PROBLEM - restbase endpoints health on restbase1025 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:54:40] PROBLEM - restbase endpoints health on restbase2014 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:54:56] PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:54:58] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: /{domain}/v1/page/media/{title} (Get media in test page) timed out before a 
response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [18:55:00] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton [18:55:04] RECOVERY - restbase endpoints health on restbase1022 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:55:08] RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [18:55:20] RECOVERY - PHP7 rendering on mw1347 is OK: HTTP OK: HTTP/1.1 200 OK - 78683 bytes in 1.708 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [18:55:40] RECOVERY - restbase endpoints health on restbase1023 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:55:44] PROBLEM - restbase endpoints health on restbase2019 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:55:56] RECOVERY - restbase endpoints health on restbase1024 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:56:00] RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:56:02] PROBLEM - High CPU load on API appserver on mw1312 is CRITICAL: CRITICAL - load average: 61.33, 56.61, 41.88 https://wikitech.wikimedia.org/wiki/Application_servers [18:56:02] RECOVERY - PHP7 rendering on mw1348 is OK: HTTP OK: HTTP/1.1 200 OK - 78683 bytes in 2.478 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [18:56:04] PROBLEM - High CPU load on API appserver on mw1313 is 
CRITICAL: CRITICAL - load average: 58.54, 53.05, 39.48 https://wikitech.wikimedia.org/wiki/Application_servers [18:56:14] RECOVERY - restbase endpoints health on restbase2014 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:56:22] RECOVERY - restbase endpoints health on restbase2012 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:56:30] PROBLEM - High CPU load on API appserver on mw1315 is CRITICAL: CRITICAL - load average: 74.81, 57.10, 40.64 https://wikitech.wikimedia.org/wiki/Application_servers [18:56:34] RECOVERY - restbase endpoints health on restbase2020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:56:40] PROBLEM - restbase endpoints health on restbase1027 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:57:16] PROBLEM - termbox eqiad on termbox.svc.eqiad.wmnet is CRITICAL: /termbox (get rendered termbox) is CRITICAL: Test get rendered termbox returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service [18:57:30] PROBLEM - High CPU load on API appserver on mw1317 is CRITICAL: CRITICAL - load average: 68.02, 57.30, 41.74 https://wikitech.wikimedia.org/wiki/Application_servers [18:57:58] RECOVERY - restbase endpoints health on restbase1025 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:58:32] PROBLEM - High CPU load on API appserver on mw1278 is CRITICAL: CRITICAL - load average: 62.53, 42.49, 31.72 https://wikitech.wikimedia.org/wiki/Application_servers [18:58:46] PROBLEM - High CPU load on API appserver on mw1277 is CRITICAL: CRITICAL - load average: 73.04, 48.14, 34.51 
https://wikitech.wikimedia.org/wiki/Application_servers [18:59:00] PROBLEM - restbase endpoints health on restbase1026 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:59:08] RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:59:42] PROBLEM - High CPU load on API appserver on mw1283 is CRITICAL: CRITICAL - load average: 80.70, 52.88, 34.44 https://wikitech.wikimedia.org/wiki/Application_servers [18:59:42] PROBLEM - High CPU load on API appserver on mw1276 is CRITICAL: CRITICAL - load average: 75.43, 53.50, 37.80 https://wikitech.wikimedia.org/wiki/Application_servers [18:59:42] PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [18:59:44] PROBLEM - High CPU load on API appserver on mw1285 is CRITICAL: CRITICAL - load average: 53.29, 48.83, 35.81 https://wikitech.wikimedia.org/wiki/Application_servers [18:59:44] PROBLEM - High CPU load on API appserver on mw1282 is CRITICAL: CRITICAL - load average: 85.67, 56.24, 37.72 https://wikitech.wikimedia.org/wiki/Application_servers [18:59:52] PROBLEM - High CPU load on API appserver on mw1297 is CRITICAL: CRITICAL - load average: 83.09, 42.59, 23.54 https://wikitech.wikimedia.org/wiki/Application_servers [18:59:52] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 
2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase [18:59:58] (PS2) Fomafix: Add 'cbk' as alias for 'cbk-zam' [puppet] - https://gerrit.wikimedia.org/r/527912 (https://phabricator.wikimedia.org/T124657) [19:00:10] PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/media/{title} (Get media in test page) timed out before a response was received: /{domain}/v1/page/random/title (retrieve a random article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29 [19:00:12] PROBLEM - mobileapps endpoints health on scb1004 is CRITICAL: /{domain}/v1/page/media/{title} (Get media in test page) timed out before a response was received: /{domain}/v1/media/image/featured/{year}/{month}/{day} (retrieve featured image data for April 29, 2016) timed out before a response was received: /{domain}/v1/page/mobile-html/{title} (Get page content HTML for test page) timed out before a response was received: /{domai [19:00:12] /title (retrieve a random article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [19:00:26] PROBLEM - PHP7 rendering on mw1348 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:00:32] PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:00:34] RECOVERY - restbase endpoints health on restbase2019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:00:36] PROBLEM - mobileapps endpoints health on scb2004 is CRITICAL: /{domain}/v1/page/media/{title} (Get media in test page) timed out before a response 
was received: /{domain}/v1/media/image/featured/{year}/{month}/{day} (retrieve featured image data for April 29, 2016) timed out before a response was received: /{domain}/v1/page/mobile-sections/{title} (retrieve test page via mobile-sections) timed out before a response was received h [19:00:36] ikimedia.org/wiki/Services/Monitoring/mobileapps [19:00:38] PROBLEM - restbase endpoints health on restbase1023 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:00:40] PROBLEM - Nginx local proxy to apache on mw1347 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [19:00:44] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: /{domain}/v1/media/image/featured/{year}/{month}/{day} (retrieve featured image data for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [19:00:58] PROBLEM - restbase endpoints health on restbase1024 is CRITICAL: /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:00:58] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:01:04] PROBLEM - restbase endpoints health on restbase2015 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:01:08] PROBLEM - restbase endpoints health on restbase2016 is 
CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:01:10] PROBLEM - mobileapps endpoints health on scb2005 is CRITICAL: /{domain}/v1/page/media/{title} (Get media in test page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [19:01:10] PROBLEM - PHP7 rendering on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:01:18] PROBLEM - restbase endpoints health on restbase2014 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:01:24] PROBLEM - restbase endpoints health on restbase2012 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:01:26] RECOVERY - restbase endpoints health on restbase1020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:01:30] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: /{domain}/v1/page/media/{title} (Get media in test page) timed out before a response was received: /{domain}/v1/media/image/featured/{year}/{month}/{day} (retrieve featured image data for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [19:01:30] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received 
https://wikitech.wikimedia.org/wiki/RESTBase [19:01:30] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/media/image/featured/{year}/{month}/{day} (retrieve featured image data for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29 [19:01:32] RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:01:32] RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase [19:01:34] RECOVERY - restbase endpoints health on restbase1027 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:01:34] RECOVERY - restbase endpoints health on restbase1021 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:01:36] PROBLEM - mobileapps endpoints health on scb2006 is CRITICAL: /{domain}/v1/media/image/featured/{year}/{month}/{day} (retrieve featured image data for April 29, 2016) timed out before a response was received: /{domain}/v1/page/summary/{title} (Get summary for test page) timed out before a response was received: /{domain}/v1/transform/html/to/mobile-html/{title} (Get preview mobile HTML for test page) timed out before a response w [19:01:36] ://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [19:01:38] PROBLEM - restbase endpoints health on restbase2020 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:01:42] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:01:50] PROBLEM - HTTP availability for Nginx -SSL 
terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [19:01:52] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [19:01:52] RECOVERY - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29 [19:02:02] RECOVERY - PHP7 rendering on mw1348 is OK: HTTP OK: HTTP/1.1 200 OK - 78682 bytes in 0.564 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:02:14] RECOVERY - restbase endpoints health on restbase1017 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:02:14] RECOVERY - Nginx local proxy to apache on mw1347 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 659 bytes in 0.047 second response time https://wikitech.wikimedia.org/wiki/Application_servers [19:02:18] RECOVERY - mobileapps endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [19:02:18] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [19:02:20] PROBLEM - HTTP availability for Varnish at eqiad on icinga1001 is CRITICAL: job=varnish-text site=eqiad https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts 
https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d [19:02:22] RECOVERY - restbase endpoints health on restbase1023 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:02:28] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [19:02:34] RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:02:36] RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:02:36] RECOVERY - restbase endpoints health on restbase1024 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:02:40] PROBLEM - High CPU load on API appserver on mw1312 is CRITICAL: CRITICAL - load average: 74.56, 58.96, 47.35 https://wikitech.wikimedia.org/wiki/Application_servers [19:02:42] PROBLEM - High CPU load on API appserver on mw1313 is CRITICAL: CRITICAL - load average: 69.28, 54.24, 44.35 https://wikitech.wikimedia.org/wiki/Application_servers [19:02:44] RECOVERY - PHP7 rendering on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 78682 bytes in 0.977 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:02:46] PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-text site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d [19:02:48] RECOVERY - restbase endpoints health on restbase2015 is OK: All endpoints are 
healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:02:48] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:02:48] RECOVERY - mobileapps endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [19:02:50] RECOVERY - restbase endpoints health on restbase2016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:02:54] RECOVERY - restbase endpoints health on restbase2014 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:02:58] PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-text site=ulsfo https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d [19:03:02] RECOVERY - restbase endpoints health on restbase2012 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:03:04] RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:03:06] RECOVERY - restbase endpoints health on restbase2013 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:03:06] RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase [19:03:08] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [19:03:08] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy 
https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29 [19:03:08] PROBLEM - High CPU load on API appserver on mw1285 is CRITICAL: CRITICAL - load average: 42.83, 44.84, 36.49 https://wikitech.wikimedia.org/wiki/Application_servers [19:03:08] PROBLEM - High CPU load on API appserver on mw1315 is CRITICAL: CRITICAL - load average: 77.68, 56.88, 45.19 https://wikitech.wikimedia.org/wiki/Application_servers [19:03:12] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [19:03:12] RECOVERY - mobileapps endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [19:03:14] RECOVERY - restbase endpoints health on restbase2020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:03:16] RECOVERY - High CPU load on API appserver on mw1297 is OK: OK - load average: 11.73, 30.34, 22.86 https://wikitech.wikimedia.org/wiki/Application_servers [19:03:26] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is CRITICAL: cluster=cache_text site=eqsin https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [19:03:32] RECOVERY - mobileapps endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [19:03:32] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:03:36] RECOVERY - restbase endpoints health on restbase2017 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:03:48] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1004 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0] 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [19:03:54] RECOVERY - termbox eqiad on termbox.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service [19:03:58] RECOVERY - restbase endpoints health on restbase1026 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [19:04:08] PROBLEM - High CPU load on API appserver on mw1317 is CRITICAL: CRITICAL - load average: 83.80, 61.24, 47.21 https://wikitech.wikimedia.org/wiki/Application_servers [19:04:20] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:04:32] RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:04:34] RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:04:44] PROBLEM - High CPU load on API appserver on mw1342 is CRITICAL: CRITICAL - load average: 78.65, 47.77, 34.47 https://wikitech.wikimedia.org/wiki/Application_servers [19:04:52] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:05:00] RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:05:02] RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:05:04] RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:05:08] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [19:05:08] (PS2) Fomafix: Add 'bho' as alias for 'bh' [puppet] - https://gerrit.wikimedia.org/r/528782 (https://phabricator.wikimedia.org/T41968) [19:05:12] RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:05:12] RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:05:34] RECOVERY - HTTP availability for Varnish at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d [19:06:00] RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d [19:06:12] RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d [19:06:40] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [19:06:44] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [19:07:08] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [19:07:58] RECOVERY - High CPU load on API appserver on mw1342 is OK: OK - load average: 25.63, 37.71, 33.23 https://wikitech.wikimedia.org/wiki/Application_servers [19:08:36] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1004 is OK: OK: Less than 70.00% above the threshold [25.0] https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [19:09:32] RECOVERY - High CPU load on API appserver on mw1283 is OK: OK - load average: 16.43, 24.25, 29.24 https://wikitech.wikimedia.org/wiki/Application_servers [19:09:34] RECOVERY - High CPU load on API appserver on mw1285 is OK: OK - load average: 15.02, 23.09, 29.05 https://wikitech.wikimedia.org/wiki/Application_servers [19:09:58] RECOVERY - High CPU load on API appserver on mw1278 is OK: OK - load average: 15.24, 23.35, 29.67 
https://wikitech.wikimedia.org/wiki/Application_servers [19:10:36] RECOVERY - High CPU load on API appserver on mw1229 is OK: OK - load average: 9.88, 15.53, 23.29 https://wikitech.wikimedia.org/wiki/Application_servers [19:11:08] RECOVERY - High CPU load on API appserver on mw1276 is OK: OK - load average: 13.83, 21.73, 29.76 https://wikitech.wikimedia.org/wiki/Application_servers [19:11:10] RECOVERY - High CPU load on API appserver on mw1282 is OK: OK - load average: 14.83, 22.02, 29.47 https://wikitech.wikimedia.org/wiki/Application_servers [19:11:48] RECOVERY - High CPU load on API appserver on mw1277 is OK: OK - load average: 15.76, 21.23, 29.05 https://wikitech.wikimedia.org/wiki/Application_servers [19:12:16] RECOVERY - High CPU load on API appserver on mw1289 is OK: OK - load average: 16.04, 21.02, 29.08 https://wikitech.wikimedia.org/wiki/Application_servers [19:12:52] RECOVERY - High CPU load on API appserver on mw1287 is OK: OK - load average: 16.50, 20.35, 28.91 https://wikitech.wikimedia.org/wiki/Application_servers [19:13:06] RECOVERY - High CPU load on API appserver on mw1281 is OK: OK - load average: 11.95, 18.68, 29.27 https://wikitech.wikimedia.org/wiki/Application_servers [19:13:18] RECOVERY - High CPU load on API appserver on mw1230 is OK: OK - load average: 8.29, 13.63, 22.44 https://wikitech.wikimedia.org/wiki/Application_servers [19:13:36] RECOVERY - High CPU load on API appserver on mw1280 is OK: OK - load average: 14.17, 19.17, 29.04 https://wikitech.wikimedia.org/wiki/Application_servers [19:14:17] !log gehel@cumin2001 START - Cookbook sre.elasticsearch.rolling-reboot [19:14:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:14:40] RECOVERY - High CPU load on API appserver on mw1279 is OK: OK - load average: 17.80, 20.00, 28.96 https://wikitech.wikimedia.org/wiki/Application_servers [19:14:51] (PS2) Fomafix: Add 'nrf' as alias for 'nrm' [puppet] - https://gerrit.wikimedia.org/r/527909
(https://phabricator.wikimedia.org/T25216) [19:15:19] !log mforns@deploy1001 Finished deploy [analytics/refinery@5418d3b]: deploying analytics-refinery up to 5418d3be5f65f7325324d0c15c51b3ca722dde1c (duration: 39m 23s) [19:15:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:15:30] RECOVERY - High CPU load on API appserver on mw1312 is OK: OK - load average: 20.76, 24.45, 35.09 https://wikitech.wikimedia.org/wiki/Application_servers [19:15:32] RECOVERY - High CPU load on API appserver on mw1313 is OK: OK - load average: 21.46, 25.78, 35.20 https://wikitech.wikimedia.org/wiki/Application_servers [19:16:52] RECOVERY - High CPU load on API appserver on mw1286 is OK: OK - load average: 17.59, 19.21, 29.12 https://wikitech.wikimedia.org/wiki/Application_servers [19:17:58] RECOVERY - High CPU load on API appserver on mw1290 is OK: OK - load average: 16.80, 19.18, 29.18 https://wikitech.wikimedia.org/wiki/Application_servers [19:18:10] RECOVERY - High CPU load on API appserver on mw1235 is OK: OK - load average: 12.40, 14.39, 23.39 https://wikitech.wikimedia.org/wiki/Application_servers [19:18:32] RECOVERY - High CPU load on API appserver on mw1317 is OK: OK - load average: 22.44, 25.81, 34.63 https://wikitech.wikimedia.org/wiki/Application_servers [19:18:34] RECOVERY - High CPU load on API appserver on mw1284 is OK: OK - load average: 14.47, 18.00, 28.92 https://wikitech.wikimedia.org/wiki/Application_servers [19:18:50] PROBLEM - High CPU load on API appserver on mw1346 is CRITICAL: CRITICAL - load average: 75.22, 45.32, 33.53 https://wikitech.wikimedia.org/wiki/Application_servers [19:19:14] RECOVERY - High CPU load on API appserver on mw1288 is OK: OK - load average: 15.02, 18.67, 28.99 https://wikitech.wikimedia.org/wiki/Application_servers [19:19:42] RECOVERY - High CPU load on API appserver on mw1314 is OK: OK - load average: 18.83, 23.20, 34.38 https://wikitech.wikimedia.org/wiki/Application_servers [19:20:02] RECOVERY - High CPU 
load on API appserver on mw1231 is OK: OK - load average: 12.59, 16.35, 23.17 https://wikitech.wikimedia.org/wiki/Application_servers [19:20:20] PROBLEM - ElasticSearch health check for shards on 9643 on search.svc.codfw.wmnet is CRITICAL: CRITICAL - elasticsearch inactive shards 930 threshold =0.2 breach: initializing_shards: 0, status: yellow, timed_out: False, active_primary_shards: 1450, active_shards: 3419, active_shards_percent_as_number: 78.6157737410899, cluster_name: production-search-psi-codfw, task_max_waiting_in_queue_millis: 0, delayed_unassigned_shards: 0, n [19:20:20] 2, number_of_pending_tasks: 0, relocating_shards: 0, number_of_in_flight_fetch: 0, number_of_data_nodes: 12, unassigned_shards: 930 https://wikitech.wikimedia.org/wiki/Search%23Administration [19:20:30] PROBLEM - ElasticSearch numbers of masters eligible - 9643 on search.svc.codfw.wmnet is CRITICAL: CRITICAL - Found 2 eligible masters. https://wikitech.wikimedia.org/wiki/Search%23Expected_eligible_masters_check_and_alert [19:20:46] RECOVERY - High CPU load on API appserver on mw1315 is OK: OK - load average: 19.51, 25.40, 35.70 https://wikitech.wikimedia.org/wiki/Application_servers [19:21:04] RECOVERY - High CPU load on API appserver on mw1316 is OK: OK - load average: 19.85, 22.75, 34.50 https://wikitech.wikimedia.org/wiki/Application_servers [19:21:58] RECOVERY - ElasticSearch health check for shards on 9643 on search.svc.codfw.wmnet is OK: OK - elasticsearch status production-search-psi-codfw: number_of_nodes: 14, unassigned_shards: 77, relocating_shards: 0, number_of_data_nodes: 14, active_primary_shards: 1450, cluster_name: production-search-psi-codfw, initializing_shards: 3, timed_out: False, status: yellow, task_max_waiting_in_queue_millis: 159, number_of_pending_tasks: 6, [19:21:58] ght_fetch: 0, delayed_unassigned_shards: 0, active_shards_percent_as_number: 98.16049666590021, active_shards: 4269 https://wikitech.wikimedia.org/wiki/Search%23Administration [19:22:04] RECOVERY - 
High CPU load on API appserver on mw1346 is OK: OK - load average: 25.18, 36.19, 32.34 https://wikitech.wikimedia.org/wiki/Application_servers [19:22:08] RECOVERY - ElasticSearch numbers of masters eligible - 9643 on search.svc.codfw.wmnet is OK: OK - All good https://wikitech.wikimedia.org/wiki/Search%23Expected_eligible_masters_check_and_alert [19:26:36] RECOVERY - High CPU load on API appserver on mw1234 is OK: OK - load average: 12.58, 14.61, 22.83 https://wikitech.wikimedia.org/wiki/Application_servers [19:26:46] RECOVERY - High CPU load on API appserver on mw1233 is OK: OK - load average: 12.56, 18.45, 22.81 https://wikitech.wikimedia.org/wiki/Application_servers [19:27:12] RECOVERY - High CPU load on API appserver on mw1232 is OK: OK - load average: 10.57, 13.60, 22.97 https://wikitech.wikimedia.org/wiki/Application_servers [19:31:36] PROBLEM - High CPU load on API appserver on mw1346 is CRITICAL: CRITICAL - load average: 74.98, 45.89, 37.64 https://wikitech.wikimedia.org/wiki/Application_servers [19:34:48] RECOVERY - High CPU load on API appserver on mw1346 is OK: OK - load average: 21.96, 35.37, 35.27 https://wikitech.wikimedia.org/wiki/Application_servers [19:40:51] (CR) Alex Monk: x509: Expose the OCSP URI of a Certificate as a property (1 comment) [software/acme-chief] - https://gerrit.wikimedia.org/r/516604 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez) [19:49:52] (CR) Alex Monk: [C: +1] "comment inline is me nitpicking, this is good to go" [software/acme-chief] - https://gerrit.wikimedia.org/r/516604 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez) [19:59:00] PROBLEM - ElasticSearch numbers of masters eligible - 9643 on search.svc.codfw.wmnet is CRITICAL: CRITICAL - Found 2 eligible masters.
https://wikitech.wikimedia.org/wiki/Search%23Expected_eligible_masters_check_and_alert [20:00:05] cscott, arlolra, subbu, bearND, halfak, and accraze: Time to snap out of that daydream and deploy Services – Parsoid / Citoid / Mobileapps / ORES / …. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190812T2000). [20:00:48] (CR) Alex Monk: "$ pip3 show cryptography" [software/acme-chief] - https://gerrit.wikimedia.org/r/529202 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez) [20:02:14] RECOVERY - ElasticSearch numbers of masters eligible - 9643 on search.svc.codfw.wmnet is OK: OK - All good https://wikitech.wikimedia.org/wiki/Search%23Expected_eligible_masters_check_and_alert [20:03:58] (CR) Alex Monk: "this works though: from cryptography.x509 import ocsp" [software/acme-chief] - https://gerrit.wikimedia.org/r/529202 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez) [20:08:22] !log gehel@cumin2001 END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97) [20:08:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:21] !log mbsantos@deploy1001 Started deploy [mobileapps/deploy@615004f]: Update service-mobileapp-node to f0a2847 [20:12:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:17:26] !log mbsantos@deploy1001 Finished deploy [mobileapps/deploy@615004f]: Update service-mobileapp-node to f0a2847 (duration: 05m 05s) [20:17:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:31:23] (CR) Alex Monk: ocsp: Provide basic functionality to perform OCSP requests (2 comments) [software/acme-chief] - https://gerrit.wikimedia.org/r/529202 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez) [20:33:57] Operations, DNS, Traffic, Wikimedia-Apache-configuration: Sundown aliases `minnan` and `zh-cfr` for `nan`/`zh-min-nan` - https://phabricator.wikimedia.org/T230382
(Fomafix) [20:44:21] (PS1) Fomafix: Remove aliases 'minnan' and 'zh-cfr' [dns] - https://gerrit.wikimedia.org/r/529829 (https://phabricator.wikimedia.org/T230382) [20:44:29] (PS1) Fomafix: Remove aliases 'minnan' and 'zh-cfr' [puppet] - https://gerrit.wikimedia.org/r/529830 (https://phabricator.wikimedia.org/T230382) [21:00:04] Reedy and sbassett: #bothumor My software never has bugs. It just develops random features. Rise for Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190812T2100). [21:03:02] lol [21:21:22] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [22:06:04] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [23:00:04] MaxSem, RoanKattouw, and Niharika: Your horoscope predicts another unfortunate Evening SWAT (Max 6 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190812T2300). [23:00:05] No GERRIT patches in the queue for this window AFAICS. [23:24:17] !log add samplicator to buster-wikimedia repo [23:24:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:47:32] (CR) Viztor: [C: +1] "looks good." [mediawiki-config] - https://gerrit.wikimedia.org/r/529600 (https://phabricator.wikimedia.org/T230294) (owner: Zoranzoki21)