[00:00:04] Deploy window Martin Luther King Jr. Day (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200120T0000)
[00:08:37] (PS1) Bmansurov: Add recommendation-api chart [deployment-charts] - https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230)
[00:08:50] (CR) jerkins-bot: [V: -1] Add recommendation-api chart [deployment-charts] - https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: Bmansurov)
[00:11:14] (PS2) Bmansurov: Add recommendation-api chart [deployment-charts] - https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230)
[00:11:46] (CR) jerkins-bot: [V: -1] Add recommendation-api chart [deployment-charts] - https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: Bmansurov)
[04:03:00] (PS1) Legoktm: openstack: Add Python 3 support to wmcs-cold-migrate [puppet] - https://gerrit.wikimedia.org/r/565793 (https://phabricator.wikimedia.org/T229920)
[04:03:02] (PS1) Legoktm: openstack: Add Python 3 support to wmcs-cold-nova-migrate [puppet] - https://gerrit.wikimedia.org/r/565794 (https://phabricator.wikimedia.org/T229920)
[04:03:04] (PS1) Legoktm: openstack: Add Python 3 support to wmcs-live-migrate [puppet] - https://gerrit.wikimedia.org/r/565795 (https://phabricator.wikimedia.org/T229920)
[04:03:06] (PS1) Legoktm: openstack: Add Python 3 support to wmcs-makedomain [puppet] - https://gerrit.wikimedia.org/r/565796 (https://phabricator.wikimedia.org/T229920)
[04:03:08] (PS1) Legoktm: openstack: Add Python 3 support to wmcs-region-migrate-quotas [puppet] - https://gerrit.wikimedia.org/r/565797 (https://phabricator.wikimedia.org/T229920)
[04:03:10] (PS1) Legoktm: openstack: Add Python 3 support to wmcs-region-migrate-security-groups [puppet] - https://gerrit.wikimedia.org/r/565798 (https://phabricator.wikimedia.org/T229920)
[04:03:12] (PS1) Legoktm: openstack: Add Python 3 support to wmcs-region-migrate [puppet] - https://gerrit.wikimedia.org/r/565799 (https://phabricator.wikimedia.org/T229920)
[04:20:11] (PS1) Legoktm: scap: Port mwgrep to Python 3 and other cleanup [puppet] - https://gerrit.wikimedia.org/r/565800
[05:31:32] (CR) Ammarpad: "This was not deployed again. I scheduled I think three times, it's being skipped always not sure why. If someone can schedule/deploy it th" [mediawiki-config] - https://gerrit.wikimedia.org/r/557439 (https://phabricator.wikimedia.org/T240728) (owner: Ammarpad)
[05:52:37] (CR) Mholloway: [C: +1] Re-enable delayed new upload jobs for MachineVision extension [mediawiki-config] - https://gerrit.wikimedia.org/r/565615 (https://phabricator.wikimedia.org/T241072) (owner: Cparle)
[05:54:09] (CR) Mholloway: [C: +1] Remove handler deleted from the MachineVision extension (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/565614 (https://phabricator.wikimedia.org/T241242) (owner: Cparle)
[06:05:12] (PS1) BryanDavis: Deprecate Jessie based Kubernetes types [software/tools-webservice] - https://gerrit.wikimedia.org/r/565807
[06:08:00] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1121, pool db1084 into vslow T232446', diff saved to https://phabricator.wikimedia.org/P10214 and previous config saved to /var/cache/conftool/dbconfig/20200120-060759-marostegui.json
[06:08:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:08:04] T232446: Compress new Wikibase tables - https://phabricator.wikimedia.org/T232446
[06:08:55] !log Compress db1121 - T232446
[06:08:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:09:24] !log Stop replication on db1107
[06:09:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:12:40] (PS6) BryanDavis: Report error messages on stderr [software/tools-webservice] - https://gerrit.wikimedia.org/r/496565
[06:12:42] (PS6) BryanDavis: Remove lighttpd-precise handling [software/tools-webservice] - https://gerrit.wikimedia.org/r/496566
[06:12:44] (PS6) BryanDavis: Improve support for extra_args [software/tools-webservice] - https://gerrit.wikimedia.org/r/496567
[06:12:46] (PS5) BryanDavis: Rename internal "toollabs" package to "toolforge" [software/tools-webservice] - https://gerrit.wikimedia.org/r/563605
[06:12:48] (PS13) BryanDavis: Make Kubernetes the default backend and warn when guessing [software/tools-webservice] - https://gerrit.wikimedia.org/r/443190 (https://phabricator.wikimedia.org/T154504) (owner: Nehajha)
[06:12:50] (PS7) BryanDavis: kubernetes: Set php7.3 as the default type [software/tools-webservice] - https://gerrit.wikimedia.org/r/496564
[06:12:52] (PS2) BryanDavis: Deprecate Jessie based Kubernetes types [software/tools-webservice] - https://gerrit.wikimedia.org/r/565807
[06:13:48] (CR) BryanDavis: Report error messages on stderr (1 comment) [software/tools-webservice] - https://gerrit.wikimedia.org/r/496565 (owner: BryanDavis)
[06:19:12] Operations, ops-codfw, DBA: db2085 crashed - https://phabricator.wikimedia.org/T243148 (Marostegui) @Papaul errors on the OS logs: ` Jan 19 07:06:56 db2085 kernel: [3257752.197344] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4 Jan 19 07:06:56 db2085 kernel: [3257752.1...
[06:19:53] Operations, ops-codfw, DBA: db2085 crashed - memory issues - https://phabricator.wikimedia.org/T243148 (Marostegui)
[06:22:02] (PS1) Marostegui: db2085: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/565808 (https://phabricator.wikimedia.org/T243148)
[06:23:19] (CR) Marostegui: [C: +2] db2085: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/565808 (https://phabricator.wikimedia.org/T243148) (owner: Marostegui)
[07:03:47] (PS1) Marostegui: mariadb: Move es2021 from spare to es4 slave [puppet] - https://gerrit.wikimedia.org/r/565816 (https://phabricator.wikimedia.org/T243052)
[07:06:45] PROBLEM - MariaDB Slave Lag: s8 on db2085 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 46986.86 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[07:07:16] ^ expected as I have started mysql on that host after crashing
[07:07:18] (PS2) Marostegui: mariadb: Move es2021 from spare to es4 slave [puppet] - https://gerrit.wikimedia.org/r/565816 (https://phabricator.wikimedia.org/T243052)
[07:08:28] (CR) Marostegui: [C: +2] mariadb: Move es2021 from spare to es4 slave [puppet] - https://gerrit.wikimedia.org/r/565816 (https://phabricator.wikimedia.org/T243052) (owner: Marostegui)
[07:10:10] !log Stop MySQL on es2020 to clone es2021 - T243052
[07:10:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:13] T243052: Productionize es1020-es1025, es2020-es2025 - https://phabricator.wikimedia.org/T243052
[07:15:14] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2084 T239453', diff saved to https://phabricator.wikimedia.org/P10215 and previous config saved to /var/cache/conftool/dbconfig/20200120-071513-marostegui.json
[07:15:16] !log Remove partitions from revision on db2084:3314 T239453
[07:15:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:18] T239453: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453
[07:15:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:24:51] (PS1) Marostegui: install_server: Do not reimage es2020, es2021 [puppet] - https://gerrit.wikimedia.org/r/565830 (https://phabricator.wikimedia.org/T243052)
[07:25:50] (CR) Marostegui: [C: +2] install_server: Do not reimage es2020, es2021 [puppet] - https://gerrit.wikimedia.org/r/565830 (https://phabricator.wikimedia.org/T243052) (owner: Marostegui)
[07:28:06] Operations, Commons, MediaWiki-extensions-PagedTiffHandler, Multimedia, Patch-For-Review: Large TIFF files do not pass file verification (related to version of image magick installed) - https://phabricator.wikimedia.org/T240455 (Alicia_Fagerving_WMSE)
[07:28:31] Operations, Commons, MediaWiki-extensions-PagedTiffHandler, Multimedia, Patch-For-Review: Large TIFF files do not pass file verification (related to version of image magick installed) - https://phabricator.wikimedia.org/T240455 (Alicia_Fagerving_WMSE)
[07:30:38] (PS1) Marostegui: db1121: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/565832 (https://phabricator.wikimedia.org/T232446)
[07:31:33] (CR) Marostegui: [C: +2] db1121: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/565832 (https://phabricator.wikimedia.org/T232446) (owner: Marostegui)
[07:49:56] Operations, Performance-Team, serviceops: Increased latency in CODFW API and APP monitoring urls (~07:20 UTC 19 Jan 2020) - https://phabricator.wikimedia.org/T243149 (elukey) p:Triage→Normal
[07:56:50] RECOVERY - MariaDB Slave Lag: s8 on db2085 is OK: OK slave_sql_lag Replication lag: 0.46 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[08:10:04] RECOVERY - Host cp3061 is UP: PING OK - Packet loss = 0%, RTA = 83.35 ms
[08:10:38] !log Compare data on db2085:3318 - T243148
[08:10:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:42] T243148: db2085 crashed - memory issues - https://phabricator.wikimedia.org/T243148
[08:16:23] (PS3) Giuseppe Lavagetto: conftool::safe_service_restarts: better support for non lvs servers [puppet] - https://gerrit.wikimedia.org/r/565574
[08:18:47] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 41 probes of 514 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[08:19:55] RECOVERY - snapshot of s7 in codfw on db1115 is OK: snapshot for s7 at codfw taken less than 4 days ago and larger than 90 GB: Last one 2020-01-20 06:11:49 from db2100.codfw.wmnet:3317 (925 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[08:23:53] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 28 probes of 514 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[08:28:26] Operations, ops-esams, Traffic: Upgrade BIOS and IDRAC firmware on esams caches - https://phabricator.wikimedia.org/T243167 (MoritzMuehlenhoff)
[08:28:39] Operations, ops-esams, Traffic: Upgrade BIOS and IDRAC firmware on esams caches - https://phabricator.wikimedia.org/T243167 (MoritzMuehlenhoff) p:Triage→High
[08:33:10] (PS4) Giuseppe Lavagetto: conftool::safe_service_restarts: better support for non lvs servers [puppet] - https://gerrit.wikimedia.org/r/565574
[08:38:28] (PS1) Giuseppe Lavagetto: deployment-prep: stop declaring lvs_services [puppet] - https://gerrit.wikimedia.org/r/565977
[08:39:07] (PS1) Muehlenhoff: Record new MOU date for nathante [puppet] - https://gerrit.wikimedia.org/r/565978
[08:41:07] (PS1) Marostegui: db1125: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/565979 (https://phabricator.wikimedia.org/T232446)
[08:41:49] (CR) Muehlenhoff: [C: +2] Record new MOU date for nathante [puppet] - https://gerrit.wikimedia.org/r/565978 (owner: Muehlenhoff)
[08:41:57] (CR) Marostegui: [C: +2] db1125: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/565979 (https://phabricator.wikimedia.org/T232446) (owner: Marostegui)
[08:44:09] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1094', diff saved to https://phabricator.wikimedia.org/P10216 and previous config saved to /var/cache/conftool/dbconfig/20200120-084408-marostegui.json
[08:44:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:15] !log Upgrade db1094
[08:44:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:09] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P10217 and previous config saved to /var/cache/conftool/dbconfig/20200120-084908-marostegui.json
[08:49:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:27] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[08:51:58] !log Upgrade db1139:3311 db1139:3316
[08:51:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:38] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P10218 and previous config saved to /var/cache/conftool/dbconfig/20200120-085537-marostegui.json
[08:55:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:08] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[09:01:30] !log installing Java security updates on an-conf*
[09:01:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:03:38] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P10219 and previous config saved to /var/cache/conftool/dbconfig/20200120-090336-marostegui.json
[09:03:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:57] !log restarting CAS to pick up Java security updates
[09:05:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:18] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P10220 and previous config saved to /var/cache/conftool/dbconfig/20200120-090617-marostegui.json
[09:06:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:22] !log Upgrade db1129
[09:06:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:08:53] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1094', diff saved to https://phabricator.wikimedia.org/P10221 and previous config saved to /var/cache/conftool/dbconfig/20200120-090850-marostegui.json
[09:08:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:43] (CR) ArielGlenn: "We'll want this same entry in several other places, i.e. all the analytics (stats1007) diectories that are fetched. Is this the language w" [puppet] - https://gerrit.wikimedia.org/r/565403 (owner: Dzahn)
[09:19:12] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[09:19:30] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1129', diff saved to https://phabricator.wikimedia.org/P10222 and previous config saved to /var/cache/conftool/dbconfig/20200120-091929-marostegui.json
[09:19:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:44] (PS1) Ema: cache: configure netconsole on cp3061 [puppet] - https://gerrit.wikimedia.org/r/565982 (https://phabricator.wikimedia.org/T242579)
[09:20:24] (CR) jerkins-bot: [V: -1] cache: configure netconsole on cp3061 [puppet] - https://gerrit.wikimedia.org/r/565982 (https://phabricator.wikimedia.org/T242579) (owner: Ema)
[09:22:24] (PS2) Ema: cache: configure netconsole on cp3061 [puppet] - https://gerrit.wikimedia.org/r/565982 (https://phabricator.wikimedia.org/T242579)
[09:26:43] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1129', diff saved to https://phabricator.wikimedia.org/P10223 and previous config saved to /var/cache/conftool/dbconfig/20200120-092642-marostegui.json
[09:26:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:32:35] (PS1) Elukey: Revert "Deploy Apache BigTop's apt repository on analytics1031" [puppet] - https://gerrit.wikimedia.org/r/565984
[09:36:05] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1129', diff saved to https://phabricator.wikimedia.org/P10224 and previous config saved to /var/cache/conftool/dbconfig/20200120-093603-marostegui.json
[09:36:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:25] (CR) Elukey: [C: +2] Revert "Deploy Apache BigTop's apt repository on analytics1031" [puppet] - https://gerrit.wikimedia.org/r/565984 (owner: Elukey)
[09:41:14] (CR) Arturo Borrero Gonzalez: [C: +1] openstack: Add Python 3 support to wmcs-region-migrate [puppet] - https://gerrit.wikimedia.org/r/565799 (https://phabricator.wikimedia.org/T229920) (owner: Legoktm)
[09:42:06] (CR) Arturo Borrero Gonzalez: [C: +1] openstack: Add Python 3 support to wmcs-region-migrate-security-groups [puppet] - https://gerrit.wikimedia.org/r/565798 (https://phabricator.wikimedia.org/T229920) (owner: Legoktm)
[09:42:42] (CR) Arturo Borrero Gonzalez: [C: +1] openstack: Add Python 3 support to wmcs-region-migrate-quotas [puppet] - https://gerrit.wikimedia.org/r/565797 (https://phabricator.wikimedia.org/T229920) (owner: Legoktm)
[09:43:26] (CR) Arturo Borrero Gonzalez: [C: +1] openstack: Add Python 3 support to wmcs-makedomain [puppet] - https://gerrit.wikimedia.org/r/565796 (https://phabricator.wikimedia.org/T229920) (owner: Legoktm)
[09:44:15] (CR) Filippo Giunchedi: install_server: introduce raid0 standard partman recipe (1 comment) [puppet] - https://gerrit.wikimedia.org/r/564959 (https://phabricator.wikimedia.org/T156955) (owner: Filippo Giunchedi)
[09:44:22] (CR) Arturo Borrero Gonzalez: [C: +1] openstack: Add Python 3 support to wmcs-live-migrate [puppet] - https://gerrit.wikimedia.org/r/565795 (https://phabricator.wikimedia.org/T229920) (owner: Legoktm)
[09:44:47] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1129', diff saved to https://phabricator.wikimedia.org/P10225 and previous config saved to /var/cache/conftool/dbconfig/20200120-094445-marostegui.json
[09:44:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:45:03] (CR) Arturo Borrero Gonzalez: [C: +1] openstack: Add Python 3 support to wmcs-cold-nova-migrate [puppet] - https://gerrit.wikimedia.org/r/565794 (https://phabricator.wikimedia.org/T229920) (owner: Legoktm)
[09:55:46] (CR) Giuseppe Lavagetto: [C: +2] "this has also the nice side effect to add php service restarts to mwdebug servers and on scandium." [puppet] - https://gerrit.wikimedia.org/r/565574 (owner: Giuseppe Lavagetto)
[10:01:11] !log jmm@cumin2001 START - Cookbook sre.hosts.decommission
[10:01:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:46] !log jmm@cumin2001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
[10:01:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:52] Operations, Patch-For-Review: Migrate URL downloaders to Buster - https://phabricator.wikimedia.org/T224551 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: `aluminium.wikimedia.org` - aluminium.wikimedia.org (**FAIL**) - Downtimed host on Icinga - No manage...
[10:04:23] !log jmm@cumin2001 START - Cookbook sre.hosts.decommission
[10:04:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:55] !log jmm@cumin2001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
[10:04:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:00] Operations, Patch-For-Review: Migrate URL downloaders to Buster - https://phabricator.wikimedia.org/T224551 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: `alcyone.wikimedia.org` - alcyone.wikimedia.org (**FAIL**) - Downtimed host on Icinga - No management...
[10:05:46] (CR) Arturo Borrero Gonzalez: [C: +1] openstack: Add Python 3 support to wmcs-cold-migrate [puppet] - https://gerrit.wikimedia.org/r/565793 (https://phabricator.wikimedia.org/T229920) (owner: Legoktm)
[10:06:43] !log removing alcyone/aluminium in Ganeti
[10:06:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:06:55] !log removing alcyone/aluminium in Ganeti T224551
[10:06:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:06:58] T224551: Migrate URL downloaders to Buster - https://phabricator.wikimedia.org/T224551
[10:12:24] (CR) Arturo Borrero Gonzalez: [C: +1] "LGTM. This is only required for a very specific version combo of docker/iptables/debian. I would add at least a check to see if this manif" [puppet] - https://gerrit.wikimedia.org/r/565752 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[10:14:09] (PS3) Ema: cache: configure netconsole on cp3061 [puppet] - https://gerrit.wikimedia.org/r/565982 (https://phabricator.wikimedia.org/T242579)
[10:14:25] PROBLEM - Widespread puppet agent failures- no resources reported on icinga1001 is CRITICAL: 0.0113 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[10:14:48] (PS2) Giuseppe Lavagetto: deployment-prep: stop declaring lvs_services [puppet] - https://gerrit.wikimedia.org/r/565977
[10:16:16] (CR) Giuseppe Lavagetto: [C: +2] deployment-prep: stop declaring lvs_services [puppet] - https://gerrit.wikimedia.org/r/565977 (owner: Giuseppe Lavagetto)
[10:16:34] (CR) Arturo Borrero Gonzalez: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/565431 (https://phabricator.wikimedia.org/T229441) (owner: Alex Monk)
[10:16:48] _joe_: there are puppet failures, e.g. on mw1293: lookup() did not find a value for the name 'has_lvs', known?
[10:19:48] (CR) Legoktm: "> I believe you don't need this for Debian Bullseye (buster+1) or iptables >= 1.8.3." [puppet] - https://gerrit.wikimedia.org/r/565752 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[10:21:36] (PS4) Ema: cache: configure netconsole on cp3061 [puppet] - https://gerrit.wikimedia.org/r/565982 (https://phabricator.wikimedia.org/T242579)
[10:22:48] (PS1) Cparle: Remove handler deleted from the MachineVision extension on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/565987 (https://phabricator.wikimedia.org/T241242)
[10:23:40] (CR) Cparle: Remove handler deleted from the MachineVision extension (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/565614 (https://phabricator.wikimedia.org/T241242) (owner: Cparle)
[10:23:54] (CR) jerkins-bot: [V: -1] Remove handler deleted from the MachineVision extension on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/565987 (https://phabricator.wikimedia.org/T241242) (owner: Cparle)
[10:27:45] (PS5) Ema: cache: configure netconsole on cp3061 [puppet] - https://gerrit.wikimedia.org/r/565982 (https://phabricator.wikimedia.org/T242579)
[10:29:32] (PS2) Cparle: Remove handler deleted from the MachineVision extension on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/565987 (https://phabricator.wikimedia.org/T241242)
[10:30:11] (CR) Ema: "https://puppet-compiler.wmflabs.org/compiler1003/20455/" [puppet] - https://gerrit.wikimedia.org/r/565982 (https://phabricator.wikimedia.org/T242579) (owner: Ema)
[10:30:46] (CR) jerkins-bot: [V: -1] Remove handler deleted from the MachineVision extension on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/565987 (https://phabricator.wikimedia.org/T241242) (owner: Cparle)
[10:33:00] (CR) Vgutierrez: [C: +1] cache: configure netconsole on cp3061 [puppet] - https://gerrit.wikimedia.org/r/565982 (https://phabricator.wikimedia.org/T242579) (owner: Ema)
[10:34:12] (CR) Ema: [C: +1] "Perhaps mention the addition of $extra_certs to the profile in the commit log. Other than that +1." [puppet] - https://gerrit.wikimedia.org/r/565625 (https://phabricator.wikimedia.org/T242374) (owner: Vgutierrez)
[10:34:58] (PS18) Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. [puppet] - https://gerrit.wikimedia.org/r/558620
[10:34:59] (PS15) Giuseppe Lavagetto: lvs::monitor: fix most technical debt [puppet] - https://gerrit.wikimedia.org/r/564690
[10:35:01] (PS3) Giuseppe Lavagetto: lvs::monitor: use unique identifiers for services [puppet] - https://gerrit.wikimedia.org/r/565290
[10:35:33] (CR) Arturo Borrero Gonzalez: [C: +1] "> Patch Set 3:" [puppet] - https://gerrit.wikimedia.org/r/565752 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[10:38:54] <_joe_> moritzm: no it's pretty strange indeed
[10:39:19] Operations, DC-Ops, netops: Juniper network device audit - all sites - https://phabricator.wikimedia.org/T213843 (faidon) @ayounsi, what's the status here?
[10:40:08] (PS3) Vgutierrez: varnishkafka (1.0.14-1) buster-wikimedia; urgency=medium [software/varnish/varnishkafka] (debian) - https://gerrit.wikimedia.org/r/565581 (https://phabricator.wikimedia.org/T242093)
[10:40:12] <_joe_> moritzm: as in, I just realized somehow we removed it from common.yaml
[10:41:15] (CR) jerkins-bot: [V: -1] varnishkafka (1.0.14-1) buster-wikimedia; urgency=medium [software/varnish/varnishkafka] (debian) - https://gerrit.wikimedia.org/r/565581 (https://phabricator.wikimedia.org/T242093) (owner: Vgutierrez)
[10:45:13] Operations, Puppet, User-jbond: Add check for changes applied at all runs - https://phabricator.wikimedia.org/T242910 (jbond)
[10:45:41] (PS1) Giuseppe Lavagetto: jobrunner: add has_lvs explicitly in production [puppet] - https://gerrit.wikimedia.org/r/565995
[10:46:00] (CR) Giuseppe Lavagetto: [C: +2] jobrunner: add has_lvs explicitly in production [puppet] - https://gerrit.wikimedia.org/r/565995 (owner: Giuseppe Lavagetto)
[10:47:47] Operations, Parsoid: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 65536 bytes) in /srv/mediawiki/php-1.35.0-wmf.15/includes/OutputHandler.php on line 111 - https://phabricator.wikimedia.org/T243177 (Marostegui)
[10:48:30] <_joe_> moritzm: fixed
[10:50:30] looks good, thanks!
[10:51:19] Operations, Puppet, User-jbond: puppet-merge locking dosn;t seem to work in all instances - https://phabricator.wikimedia.org/T240970 (jbond) Open→Resolved I believe this was caused by a number oif issues which have since been fixed * [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/...
[10:54:48] (PS7) Vgutierrez: ATS: Deploy wikiworkshop TLS certificate [puppet] - https://gerrit.wikimedia.org/r/565625 (https://phabricator.wikimedia.org/T242374)
[10:56:27] (CR) Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. (2 comments) [puppet] - https://gerrit.wikimedia.org/r/558620 (owner: Giuseppe Lavagetto)
[10:58:21] (PS19) Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. [puppet] - https://gerrit.wikimedia.org/r/558620
[10:58:23] (PS16) Giuseppe Lavagetto: lvs::monitor: fix most technical debt [puppet] - https://gerrit.wikimedia.org/r/564690
[10:58:25] (PS4) Giuseppe Lavagetto: lvs::monitor: use unique identifiers for services [puppet] - https://gerrit.wikimedia.org/r/565290
[11:00:29] (CR) Jbond: wmflib: Introduce a more usable data structure to describe services. (4 comments) [puppet] - https://gerrit.wikimedia.org/r/558620 (owner: Giuseppe Lavagetto)
[11:02:59] (CR) Giuseppe Lavagetto: [C: +2] "The change will be progressively applied to all affected clusters." [puppet] - https://gerrit.wikimedia.org/r/558620 (owner: Giuseppe Lavagetto)
[11:05:46] (PS6) Ema: cache: configure netconsole on cp3061 [puppet] - https://gerrit.wikimedia.org/r/565982 (https://phabricator.wikimedia.org/T242579)
[11:10:40] (PS7) Ema: cache: configure netconsole on cp3061 [puppet] - https://gerrit.wikimedia.org/r/565982 (https://phabricator.wikimedia.org/T242579)
[11:14:05] !log deploying wikiworkshop TLS certificate on the text cluster - T242374
[11:14:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:14:09] T242374: Set up git-driven static microsite for wikiworkshop.org - https://phabricator.wikimedia.org/T242374
[11:14:37] (PS4) Legoktm: codesearch: Use iptables from buster-backports for docker compatibility [puppet] - https://gerrit.wikimedia.org/r/565752 (https://phabricator.wikimedia.org/T242319)
[11:14:57] (CR) Vgutierrez: [C: +2] ATS: Deploy wikiworkshop TLS certificate [puppet] - https://gerrit.wikimedia.org/r/565625 (https://phabricator.wikimedia.org/T242374) (owner: Vgutierrez)
[11:18:21] (CR) Legoktm: "New PCC: https://puppet-compiler.wmflabs.org/compiler1003/20456/" [puppet] - https://gerrit.wikimedia.org/r/565752 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[11:21:10] (CR) Arturo Borrero Gonzalez: [C: +1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/565752 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[11:27:33] (CR) Jbond: [C: -1] "Looks like there is a bug in the ca_manager script." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/565312 (owner: Filippo Giunchedi)
[11:30:05] !log add SSL validation to conftool/etcd expected no-op (https://gerrit.wikimedia.org/r/c/operations/puppet/+/561817)
[11:30:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:30:10] (CR) Jbond: [C: +2] etcd: enable ssl validation [puppet] - https://gerrit.wikimedia.org/r/561817 (https://phabricator.wikimedia.org/T240941) (owner: Jbond)
[11:30:18] (PS1) Giuseppe Lavagetto: role::cloudelastic: fix the deep reference to lvs::configuration [puppet] - https://gerrit.wikimedia.org/r/566007
[11:31:33] (CR) Giuseppe Lavagetto: "I'm not fixing the tech debt here, but I'm in the middle of a fairly large rollout and I don't have the time for a thorough review." [puppet] - https://gerrit.wikimedia.org/r/566007 (owner: Giuseppe Lavagetto)
[11:32:00] (PS1) Jbond: Revert "etcd: enable ssl validation" [puppet] - https://gerrit.wikimedia.org/r/566008
[11:32:19] !log reverting untill joes change is finished - add SSL validation to conftool/etcd expected no-op (https://gerrit.wikimedia.org/r/c/operations/puppet/+/561817)
[11:32:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:32:33] (CR) Jbond: [V: +2 C: +2] Revert "etcd: enable ssl validation" [puppet] - https://gerrit.wikimedia.org/r/566008 (owner: Jbond)
[11:33:23] (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/20458/" [puppet] - https://gerrit.wikimedia.org/r/566007 (owner: Giuseppe Lavagetto)
[11:33:39] (PS2) Giuseppe Lavagetto: role::cloudelastic: fix the deep reference to lvs::configuration [puppet] - https://gerrit.wikimedia.org/r/566007
[11:35:13] (PS1) Jbond: etcd: enable ssl validation [puppet] - https://gerrit.wikimedia.org/r/566009 (https://phabricator.wikimedia.org/T240941)
[11:40:52] !log jmm@cumin2001 START - Cookbook sre.hosts.decommission
[11:40:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:29] !log jmm@cumin2001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
[11:41:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:34] Operations, Patch-For-Review: Migrate URL downloaders to Buster - https://phabricator.wikimedia.org/T224551 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: `alsafi.wikimedia.org` - alsafi.wikimedia.org (**FAIL**) - Downtimed host on Icinga - No management i...
[11:43:07] (03PS5) 10Arturo Borrero Gonzalez: dynamicproxy: urlproxy: introduce support for domain-based routing [puppet] - 10https://gerrit.wikimedia.org/r/565556 (https://phabricator.wikimedia.org/T234617) [11:43:56] !log removing alsafi in Ganeti T224551 [11:43:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:43:59] T224551: Migrate URL downloaders to Buster - https://phabricator.wikimedia.org/T224551 [11:49:46] RECOVERY - Widespread puppet agent failures- no resources reported on icinga1001 is OK: (C)0.01 ge (W)0.006 ge 0.00212 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [11:50:40] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Overall, good start. I 've left a number of comments." (035 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: 10Bmansurov) [11:56:45] (03PS1) 10Vgutierrez: Merge remote-tracking branch 'origin/varnishv51' into debian [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/566012 [12:02:22] (03Abandoned) 10Vgutierrez: Merge remote-tracking branch 'origin/varnishv51' into debian [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/566012 (owner: 10Vgutierrez) [12:04:12] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/565982 (https://phabricator.wikimedia.org/T242579) (owner: 10Ema) [12:07:44] !log jmm@cumin2001 START - Cookbook sre.hosts.decommission [12:07:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:17] !log jmm@cumin2001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [12:08:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:23] 10Operations, 10Patch-For-Review: Migrate URL downloaders to Buster - https://phabricator.wikimedia.org/T224551 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by 
jmm@cumin2001 for hosts: `actinium.wikimedia.org` - actinium.wikimedia.org (**FAIL**) - Downtimed host on Icinga - No manageme... [12:09:14] !log removing actinium in Ganeti T224551 [12:09:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:09:17] T224551: Migrate URL downloaders to Buster - https://phabricator.wikimedia.org/T224551 [12:18:31] !log elukey@cumin1001 START - Cookbook sre.zookeeper.roll-restart-zookeeper [12:18:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:24:19] 10Operations, 10ops-codfw, 10serviceops: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10jijiki) @Papaul @Joe Do you think it is ok if we rack everything else, have the new servers in rows ABD up and running, then decom some servers in C and rack the new ones? [12:24:57] (03PS2) 10Muehlenhoff: Remove actinium|alcyone|alsafi|aluminium [puppet] - 10https://gerrit.wikimedia.org/r/565271 (https://phabricator.wikimedia.org/T224551) [12:25:06] !log elukey@cumin1001 END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) [12:25:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:27:27] (03CR) 10Muehlenhoff: [C: 03+2] Remove actinium|alcyone|alsafi|aluminium [puppet] - 10https://gerrit.wikimedia.org/r/565271 (https://phabricator.wikimedia.org/T224551) (owner: 10Muehlenhoff) [12:27:56] (03PS2) 10Muehlenhoff: Remove DNS records for actinium|alcyone|alsafi|aluminium [dns] - 10https://gerrit.wikimedia.org/r/565282 (https://phabricator.wikimedia.org/T224551) [12:29:49] (03CR) 10Arturo Borrero Gonzalez: dynamicproxy: urlproxy: introduce support for domain-based routing (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/565556 (https://phabricator.wikimedia.org/T234617) (owner: 10Arturo Borrero Gonzalez) [12:30:47] (03CR) 10Muehlenhoff: [C: 03+2] Remove DNS records for actinium|alcyone|alsafi|aluminium [dns] - 
10https://gerrit.wikimedia.org/r/565282 (https://phabricator.wikimedia.org/T224551) (owner: 10Muehlenhoff) [12:32:01] 10Operations, 10Traffic, 10Inuka-Team (Kanban), 10Patch-For-Review, 10Performance-Team (Radar): Code for InukaPageView instrumentation - https://phabricator.wikimedia.org/T238029 (10AMuigai) @Neil_P._Quinn_WMF This is related to the other ticket... [12:33:13] 10Operations, 10Research, 10Traffic, 10Patch-For-Review: Set up git-driven static microsite for wikiworkshop.org - https://phabricator.wikimedia.org/T242374 (10Vgutierrez) >>! In T242374#5812968, @BBlack wrote: > Most of this has been configured now, the remaining slightly difficult bit is configuring an a... [12:33:57] 10Operations, 10ops-codfw, 10User-jbond: (No Need By Date Provided) codfw: rack/setup/install puppetmaster2003.codfw.wmnet - https://phabricator.wikimedia.org/T239732 (10jbond) [12:36:18] (03CR) 10Vgutierrez: "recheck" [software/varnish/varnishkafka] (debian) - 10https://gerrit.wikimedia.org/r/565581 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez) [12:38:09] (03PS4) 10Vgutierrez: varnishkafka (1.0.14-1) buster-wikimedia; urgency=medium [software/varnish/varnishkafka] (debian) - 10https://gerrit.wikimedia.org/r/565581 (https://phabricator.wikimedia.org/T242093) [12:40:25] (03CR) 10Elukey: [C: 03+1] varnishkafka (1.0.14-1) buster-wikimedia; urgency=medium [software/varnish/varnishkafka] (debian) - 10https://gerrit.wikimedia.org/r/565581 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez) [12:41:02] (03CR) 10Vgutierrez: [C: 03+2] varnishkafka (1.0.14-1) buster-wikimedia; urgency=medium [software/varnish/varnishkafka] (debian) - 10https://gerrit.wikimedia.org/r/565581 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez) [12:41:43] 10Operations, 10Parsoid-PHP, 10serviceops, 10Wikimedia-production-error: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (10ssastry) [12:42:21] 10Operations, 10Parsoid-PHP, 
10serviceops, 10Wikimedia-production-error: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (10ssastry) 05duplicate→03Open [12:42:49] 10Operations, 10Parsoid-PHP, 10serviceops, 10Wikimedia-production-error: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (10ssastry) [12:45:24] !log uploaded varnishkafka 1.0.14-1 to apt.wm.o (buster) - T242093 [12:45:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:28] T242093: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 [12:47:31] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/20459/" [puppet] - 10https://gerrit.wikimedia.org/r/565288 (https://phabricator.wikimedia.org/T224551) (owner: 10Muehlenhoff) [12:47:38] (03PS2) 10Muehlenhoff: profile::url_downloader: Remove support for jessie [puppet] - 10https://gerrit.wikimedia.org/r/565288 (https://phabricator.wikimedia.org/T224551) [12:50:43] (03PS1) 10Elukey: Revert "Increase Spark's crypto settings in Hadoop test" [puppet] - 10https://gerrit.wikimedia.org/r/566017 [12:53:42] (03CR) 10Muehlenhoff: [C: 03+2] profile::url_downloader: Remove support for jessie [puppet] - 10https://gerrit.wikimedia.org/r/565288 (https://phabricator.wikimedia.org/T224551) (owner: 10Muehlenhoff) [12:57:42] 10Operations, 10Patch-For-Review: Migrate URL downloaders to Buster - https://phabricator.wikimedia.org/T224551 (10MoritzMuehlenhoff) >>! In T224551#5811699, @Dzahn wrote: > But running puppet recreates the squid3 file. So this will happen again next time it gets restarted and needs a follow-up fix. This was... [12:58:19] 10Operations, 10Patch-For-Review: Migrate URL downloaders to Buster - https://phabricator.wikimedia.org/T224551 (10MoritzMuehlenhoff) 05Open→03Resolved This is complete. The new Buster instances are urldownloader[12]00[12] and the old jessie systems have been removed. 
[12:58:21] 10Operations: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (10MoritzMuehlenhoff) [12:58:31] 10Operations: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (10MoritzMuehlenhoff) [13:05:42] 10Operations, 10Traffic: Analyze the impact of removing TLSv1/v1.1 on puppetmasters - https://phabricator.wikimedia.org/T242991 (10jbond) I created a quick ruby script ` lang=ruby #!/usr/bin/env ruby require "open-uri" require 'json' print(JSON.pretty_generate(JSON.parse(URI.parse('https://www.howsmyssl.com/a... [13:06:34] (03CR) 10Jbond: [C: 03+2] etcd: enable ssl validation [puppet] - 10https://gerrit.wikimedia.org/r/566009 (https://phabricator.wikimedia.org/T240941) (owner: 10Jbond) [13:06:47] !log add SSL validation to conftool/etcd expected no-op (https://gerrit.wikimedia.org/r/c/operations/puppet/+/566009) [13:06:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:19:59] (03PS2) 10Elukey: Revert "Increase Spark's crypto settings in Hadoop test" [puppet] - 10https://gerrit.wikimedia.org/r/566017 [13:20:26] (03PS1) 10Jbond: puppetmatser: add new puppetmaster2003 [puppet] - 10https://gerrit.wikimedia.org/r/566022 (https://phabricator.wikimedia.org/T239732) [13:21:31] (03CR) 10Elukey: [C: 03+2] Revert "Increase Spark's crypto settings in Hadoop test" [puppet] - 10https://gerrit.wikimedia.org/r/566017 (owner: 10Elukey) [13:24:05] 10Operations, 10User-jbond: Add cas authentication to puppetboard - https://phabricator.wikimedia.org/T238924 (10jbond) 05Open→03Resolved [13:24:07] 10Operations, 10Security-Team, 10User-jbond: Further steps for CAS/web SSO - https://phabricator.wikimedia.org/T233921 (10jbond) [13:25:16] 10Puppet, 10User-jbond: puppet-=merge not working on the private repo - https://phabricator.wikimedia.org/T241075 (10jbond) 05Open→03Resolved a:03jbond [13:29:01] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" (031 comment) [puppet] - 
10https://gerrit.wikimedia.org/r/566022 (https://phabricator.wikimedia.org/T239732) (owner: 10Jbond) [13:31:21] (03PS2) 10Jbond: puppetmatser: add new puppetmaster2003 [puppet] - 10https://gerrit.wikimedia.org/r/566022 (https://phabricator.wikimedia.org/T239732) [13:33:03] (03CR) 10Jbond: [C: 03+2] puppetmatser: add new puppetmaster2003 [puppet] - 10https://gerrit.wikimedia.org/r/566022 (https://phabricator.wikimedia.org/T239732) (owner: 10Jbond) [13:38:31] (03CR) 10Filippo Giunchedi: puppetmaster: install cassandra-ca-manager (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/565312 (owner: 10Filippo Giunchedi) [13:38:46] (03PS2) 10Filippo Giunchedi: puppetmaster: install cassandra-ca-manager [puppet] - 10https://gerrit.wikimedia.org/r/565312 [13:43:50] (03CR) 10Jbond: "puppet code looks good to me but not able to make a call as to whether the fundamental change is a good idea as I'm not familiar with the " [puppet] - 10https://gerrit.wikimedia.org/r/565312 (owner: 10Filippo Giunchedi) [13:44:03] (03CR) 10Filippo Giunchedi: "> Patch Set 1: Code-Review-1" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/565312 (owner: 10Filippo Giunchedi) [13:44:05] (03PS8) 10Ema: cache: configure netconsole on cp3061 [puppet] - 10https://gerrit.wikimedia.org/r/565982 (https://phabricator.wikimedia.org/T242579) [13:46:19] 10Operations, 10Puppet, 10Patch-For-Review, 10User-jbond: Clean up SSL configuration - https://phabricator.wikimedia.org/T240941 (10jbond) [13:47:03] (03CR) 10Jbond: [C: 03+1] "> Patch Set 2:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/565312 (owner: 10Filippo Giunchedi) [13:48:12] PROBLEM - Check systemd state on puppetmaster2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:48:26] ^^^ I'll get this [13:49:12] RECOVERY - Check systemd state on puppetmaster2003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:58:22] jbond42_: thanks, what was it? [13:58:59] (03PS1) 10Muehlenhoff: Remove obsolete raid0-lvm-srv.cfg [puppet] - 10https://gerrit.wikimedia.org/r/566028 (https://phabricator.wikimedia.org/T156955) [13:59:49] volans: I have only just built puppet on 2003; when puppet installs puppet-master it tries to start the puppet-master service, which we don't use and which always fails. just need to reset the state [14:00:38] ack, I thought we had it disabled in our puppetization [14:01:02] we do, but the postinstall script in the deb tries to start it anyway [14:01:04] (03CR) 10Ema: [C: 03+2] cache: configure netconsole on cp3061 [puppet] - 10https://gerrit.wikimedia.org/r/565982 (https://phabricator.wikimedia.org/T242579) (owner: 10Ema) [14:03:14] k [14:04:51] (03PS1) 10Jbond: puppetmaster: ensure puppetmaster[12]002 have the puppetmaster::backend role [puppet] - 10https://gerrit.wikimedia.org/r/566031 [14:06:08] (03CR) 10Ema: [C: 03+1] puppetmaster: ensure puppetmaster[12]002 have the puppetmaster::backend role [puppet] - 10https://gerrit.wikimedia.org/r/566031 (owner: 10Jbond) [14:06:26] (03CR) 10Jbond: [C: 03+2] puppetmaster: ensure puppetmaster[12]002 have the puppetmaster::backend role [puppet] - 10https://gerrit.wikimedia.org/r/566031 (owner: 10Jbond) [14:07:22] (03CR) 10Filippo Giunchedi: [C: 03+1] Remove obsolete raid0-lvm-srv.cfg [puppet] - 10https://gerrit.wikimedia.org/r/566028 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [14:10:39] (03PS1) 10Jbond: puppet-merge: test [puppet] - 10https://gerrit.wikimedia.org/r/566034 [14:10:55] (03CR) 10Jbond: [V: 03+2 C: 03+2] puppet-merge: test [puppet] - 10https://gerrit.wikimedia.org/r/566034 (owner: 10Jbond) [14:11:32] (03PS1) 
10Muehlenhoff: Switch some of the WMCS systems to standardized Partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/566035 (https://phabricator.wikimedia.org/T156955) [14:13:24] (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1003/20462/" [puppet] - 10https://gerrit.wikimedia.org/r/565312 (owner: 10Filippo Giunchedi) [14:13:33] jbond42_: merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/565312 unless there's work ongoing ? [14:14:15] godog: no you should be good [14:14:22] (03CR) 10Marostegui: [C: 03+1] Remove obsolete raid0-lvm-srv.cfg [puppet] - 10https://gerrit.wikimedia.org/r/566028 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [14:14:43] (03CR) 10Filippo Giunchedi: [C: 03+2] puppetmaster: install cassandra-ca-manager [puppet] - 10https://gerrit.wikimedia.org/r/565312 (owner: 10Filippo Giunchedi) [14:14:44] cheers [14:15:01] (03PS3) 10Filippo Giunchedi: puppetmaster: install cassandra-ca-manager [puppet] - 10https://gerrit.wikimedia.org/r/565312 [14:18:22] (03PS1) 10Elukey: Revert "Revert "Increase Spark's crypto settings in Hadoop test"" [puppet] - 10https://gerrit.wikimedia.org/r/566038 [14:19:18] (03CR) 10Elukey: [C: 03+2] Revert "Revert "Increase Spark's crypto settings in Hadoop test"" [puppet] - 10https://gerrit.wikimedia.org/r/566038 (owner: 10Elukey) [14:26:07] (03CR) 10Mholloway: [C: 03+1] Remove handler deleted from the MachineVision extension on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/565987 (https://phabricator.wikimedia.org/T241242) (owner: 10Cparle) [14:30:32] (03PS1) 10Ema: profile::netconsole: set local_ip to ipaddress by default [puppet] - 10https://gerrit.wikimedia.org/r/566041 (https://phabricator.wikimedia.org/T242579) [14:32:40] (03PS1) 10Arturo Borrero Gonzalez: [RFC] kubernetes: add support for multiple objects of any kind [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/566045 (https://phabricator.wikimedia.org/T156626) [14:32:42] (03CR) 
10jerkins-bot: [V: 04-1] profile::netconsole: set local_ip to ipaddress by default [puppet] - 10https://gerrit.wikimedia.org/r/566041 (https://phabricator.wikimedia.org/T242579) (owner: 10Ema) [14:33:23] (03PS1) 10Ema: cache: enable netconsole on all upload@esams [puppet] - 10https://gerrit.wikimedia.org/r/566046 (https://phabricator.wikimedia.org/T242579) [14:34:28] (03CR) 10Ema: [C: 03+1] Release 8.0.5-1wm12 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/564584 (https://phabricator.wikimedia.org/T242620) (owner: 10Vgutierrez) [14:35:50] (03CR) 10Vgutierrez: [V: 03+2 C: 03+2] Release 8.0.5-1wm12 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/564584 (https://phabricator.wikimedia.org/T242620) (owner: 10Vgutierrez) [14:39:30] (03PS2) 10Ema: profile::netconsole: set local_ip to ipaddress [puppet] - 10https://gerrit.wikimedia.org/r/566041 (https://phabricator.wikimedia.org/T242579) [14:39:32] (03PS2) 10Ema: cache: enable netconsole on all upload@esams [puppet] - 10https://gerrit.wikimedia.org/r/566046 (https://phabricator.wikimedia.org/T242579) [14:42:00] (03PS2) 10Muehlenhoff: Remove obsolete raid0-lvm-srv.cfg [puppet] - 10https://gerrit.wikimedia.org/r/566028 (https://phabricator.wikimedia.org/T156955) [14:42:34] (03CR) 10Filippo Giunchedi: [C: 03+1] Switch some of the WMCS systems to standardized Partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/566035 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [14:44:21] (03CR) 10Muehlenhoff: [C: 03+2] Remove obsolete raid0-lvm-srv.cfg [puppet] - 10https://gerrit.wikimedia.org/r/566028 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [14:49:09] (03PS3) 10Ema: profile::netconsole: use ipaddress and interface_primary [puppet] - 10https://gerrit.wikimedia.org/r/566041 (https://phabricator.wikimedia.org/T242579) [14:49:11] (03PS3) 10Ema: cache: enable netconsole on all upload@esams [puppet] - 10https://gerrit.wikimedia.org/r/566046 
(https://phabricator.wikimedia.org/T242579) [14:49:43] (03CR) 10Jhedden: [C: 03+1] "+1 on the cloudcephmon hosts. I think this would also be fine for the labweb and cloudnet servers, but I'm not sure why they are using a n" [puppet] - 10https://gerrit.wikimedia.org/r/566035 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [14:57:23] (03CR) 10Ema: "pcc here: https://puppet-compiler.wmflabs.org/compiler1001/20466/" [puppet] - 10https://gerrit.wikimedia.org/r/566041 (https://phabricator.wikimedia.org/T242579) (owner: 10Ema) [14:57:42] (03CR) 10Jhedden: [C: 03+1] dumps-distribution: switch to sharing out only the pertinent dir on nfs [puppet] - 10https://gerrit.wikimedia.org/r/565405 (https://phabricator.wikimedia.org/T242798) (owner: 10Bstorm) [15:00:29] (03CR) 10Ema: "pcc here: https://puppet-compiler.wmflabs.org/compiler1002/20467/" [puppet] - 10https://gerrit.wikimedia.org/r/566046 (https://phabricator.wikimedia.org/T242579) (owner: 10Ema) [15:01:22] (03CR) 10Vgutierrez: [C: 03+1] cache: enable netconsole on all upload@esams [puppet] - 10https://gerrit.wikimedia.org/r/566046 (https://phabricator.wikimedia.org/T242579) (owner: 10Ema) [15:02:44] (03CR) 10Muehlenhoff: "I think the noswap is just some historical cruft, a number of partman recipes have been copied around in the past, removing or adding some" [puppet] - 10https://gerrit.wikimedia.org/r/566035 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [15:08:19] (03CR) 10Ema: [C: 03+1] Switch DNS servers and contemporary LVSes to standard Partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/561837 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [15:08:30] (03PS17) 10Giuseppe Lavagetto: lvs::monitor: fix most technical debt [puppet] - 10https://gerrit.wikimedia.org/r/564690 [15:08:32] (03PS5) 10Giuseppe Lavagetto: lvs::monitor: use unique identifiers for services [puppet] - 10https://gerrit.wikimedia.org/r/565290 [15:08:34] (03PS1) 10Giuseppe Lavagetto: 
cache::varnish::frontend: fix for no-lvs case [puppet] - 10https://gerrit.wikimedia.org/r/566050 [15:10:06] (03CR) 10Ema: [C: 03+2] cache: enable netconsole on all upload@esams [puppet] - 10https://gerrit.wikimedia.org/r/566046 (https://phabricator.wikimedia.org/T242579) (owner: 10Ema) [15:11:36] (03CR) 10Ema: [C: 03+1] cache::varnish::frontend: fix for no-lvs case [puppet] - 10https://gerrit.wikimedia.org/r/566050 (owner: 10Giuseppe Lavagetto) [15:14:55] (03CR) 10Giuseppe Lavagetto: [C: 03+2] cache::varnish::frontend: fix for no-lvs case [puppet] - 10https://gerrit.wikimedia.org/r/566050 (owner: 10Giuseppe Lavagetto) [15:20:08] !log rolling upgrade of ats to version 8.0.5-1wm12 - T242620 T242778 [15:20:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:20:12] T242620: ats-tls is having issues when varnish-fe goes away - https://phabricator.wikimedia.org/T242620 [15:20:12] T242778: ATS strict round robin parent select policy doesn't work as expected - https://phabricator.wikimedia.org/T242778 [15:22:59] !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@2a1f493]: Update mobileapps to 1848cf5 [15:23:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:41] (03PS1) 10Volans: tests: simplify and centralize skipif markers [software/spicerack] - 10https://gerrit.wikimedia.org/r/566053 [15:27:43] (03PS1) 10Volans: ganeti: add initial support for gnt-instance [software/spicerack] - 10https://gerrit.wikimedia.org/r/566054 (https://phabricator.wikimedia.org/T231068) [15:28:54] !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@2a1f493]: Update mobileapps to 1848cf5 (duration: 05m 55s) [15:28:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:30:43] <_joe_> mdholloway: hi! 
[15:31:01] _joe_: o/ [15:37:56] (03PS2) 10Giuseppe Lavagetto: admin: upgrade Hugh Nowlan to root shell user (ops) [puppet] - 10https://gerrit.wikimedia.org/r/564171 (https://phabricator.wikimedia.org/T242309) (owner: 10Dzahn) [15:38:45] (03CR) 10Giuseppe Lavagetto: [C: 03+2] admin: upgrade Hugh Nowlan to root shell user (ops) [puppet] - 10https://gerrit.wikimedia.org/r/564171 (https://phabricator.wikimedia.org/T242309) (owner: 10Dzahn) [15:40:13] <_joe_> mdholloway: sorry I was fighting with CI [15:52:04] (03PS1) 10Elukey: profile::analytics::cluster::packages::common: add libcrypto.so link [puppet] - 10https://gerrit.wikimedia.org/r/566062 (https://phabricator.wikimedia.org/T240934) [15:52:30] (03PS1) 10Ema: netconsole:: rename to netconsole::client [puppet] - 10https://gerrit.wikimedia.org/r/566063 (https://phabricator.wikimedia.org/T242579) [15:52:32] (03PS1) 10Ema: netconsole: add netconsole::server [puppet] - 10https://gerrit.wikimedia.org/r/566064 (https://phabricator.wikimedia.org/T242579) [15:55:49] (03PS2) 10Elukey: profile::analytics::cluster::packages::common: add libcrypto.so link [puppet] - 10https://gerrit.wikimedia.org/r/566062 (https://phabricator.wikimedia.org/T240934) [15:58:30] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1003/20471/" [puppet] - 10https://gerrit.wikimedia.org/r/566062 (https://phabricator.wikimedia.org/T240934) (owner: 10Elukey) [16:01:15] (03PS4) 10Vgutierrez: ATS: Set connect timeout and TTFB timeouts to different values [puppet] - 10https://gerrit.wikimedia.org/r/564708 (https://phabricator.wikimedia.org/T242620) [16:08:03] 10Operations, 10Traffic: ATS strict round robin parent select policy doesn't work as expected - https://phabricator.wikimedia.org/T242778 (10Vgutierrez) 05Open→03Resolved a:03Vgutierrez Solved in 8.0.5-1wm12 [16:08:07] 10Operations, 10Traffic, 10Patch-For-Review: ats-tls is having issues when varnish-fe goes away - https://phabricator.wikimedia.org/T242620 (10Vgutierrez) [16:14:02] !log 
Change email assigned to User:Sadsadas (T243222) [16:14:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:14:06] T243222: Reset password for User:Sadsadas - https://phabricator.wikimedia.org/T243222 [16:18:32] (03CR) 10Vgutierrez: "pcc looks happy as expected: https://puppet-compiler.wmflabs.org/compiler1002/20473/" [puppet] - 10https://gerrit.wikimedia.org/r/564708 (https://phabricator.wikimedia.org/T242620) (owner: 10Vgutierrez) [16:26:40] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/566035 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [16:28:42] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM. Please collect +1 from at least andrew as well-" [puppet] - 10https://gerrit.wikimedia.org/r/565431 (https://phabricator.wikimedia.org/T229441) (owner: 10Alex Monk) [16:31:03] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM. But I wonder if the 'and' formatting has anything to do with py2 vs py3. In such case, I would recommend to merge that in a differen" [puppet] - 10https://gerrit.wikimedia.org/r/565458 (https://phabricator.wikimedia.org/T229920) (owner: 10Andrew Bogott) [16:34:53] (03CR) 10Muehlenhoff: [C: 03+1] "One comment on the comment, but the code change looks good" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/566062 (https://phabricator.wikimedia.org/T240934) (owner: 10Elukey) [16:41:14] (03CR) 10Jdlrobson: "Ammarpad, When scheduling a deploy you also need to be at the deploy to test it. 
The only reason it should be skipped is if 1) nobody is a" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/557439 (https://phabricator.wikimedia.org/T240728) (owner: 10Ammarpad) [16:43:30] (03PS1) 10Filippo Giunchedi: thumbor: ship logs to localhost / logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/566069 (https://phabricator.wikimedia.org/T242609) [16:46:32] (03CR) 10Elukey: profile::analytics::cluster::packages::common: add libcrypto.so link (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/566062 (https://phabricator.wikimedia.org/T240934) (owner: 10Elukey) [16:52:25] (03Abandoned) 10Jakob: termbox: update eqiad service to latest [deployment-charts] - 10https://gerrit.wikimedia.org/r/562241 (owner: 10Jakob) [16:52:33] (03Abandoned) 10Jakob: termbox: update codfw service to latest [deployment-charts] - 10https://gerrit.wikimedia.org/r/562240 (owner: 10Jakob) [16:52:36] (03Abandoned) 10Jakob: termbox: update staging service to latest [deployment-charts] - 10https://gerrit.wikimedia.org/r/562239 (owner: 10Jakob) [16:54:06] (03CR) 10Jhedden: [C: 03+1] CloudVPS: codfw1dev: Fix default SSH rule to use correct range [puppet] - 10https://gerrit.wikimedia.org/r/565431 (https://phabricator.wikimedia.org/T229441) (owner: 10Alex Monk) [16:55:18] (03CR) 10Filippo Giunchedi: "Should DTRT, compiler run at https://puppet-compiler.wmflabs.org/compiler1002/20475/" [puppet] - 10https://gerrit.wikimedia.org/r/566069 (https://phabricator.wikimedia.org/T242609) (owner: 10Filippo Giunchedi) [17:26:13] 10Operations, 10Beta-Cluster-Infrastructure, 10observability: deployment-cache-upload05: Several millions of logstash error entries - https://phabricator.wikimedia.org/T243129 (10Reedy) [17:33:38] (03PS2) 10Volans: ganeti: add initial support for gnt-instance [software/spicerack] - 10https://gerrit.wikimedia.org/r/566054 (https://phabricator.wikimedia.org/T231068) [17:41:37] (03PS1) 10Giuseppe Lavagetto: conftool: add kubeproxy service [puppet] - 
10https://gerrit.wikimedia.org/r/566075 [17:41:39] (03PS1) 10Giuseppe Lavagetto: lvs: switch all services on k8s to use the same conftool service [puppet] - 10https://gerrit.wikimedia.org/r/566076 [17:41:41] (03PS1) 10Giuseppe Lavagetto: conftool: remove unused services from kubernetes [puppet] - 10https://gerrit.wikimedia.org/r/566077 [18:00:51] (03CR) 10Vgutierrez: [C: 03+1] netconsole:: rename to netconsole::client [puppet] - 10https://gerrit.wikimedia.org/r/566063 (https://phabricator.wikimedia.org/T242579) (owner: 10Ema) [18:00:59] (03CR) 10Vgutierrez: [C: 03+1] netconsole: add netconsole::server [puppet] - 10https://gerrit.wikimedia.org/r/566064 (https://phabricator.wikimedia.org/T242579) (owner: 10Ema) [18:02:49] (03CR) 10Effie Mouzeli: [C: 03+1] "LGTM, I can merge it tomorrow and roll it out to 1-2 hosts to see how it goes" [puppet] - 10https://gerrit.wikimedia.org/r/566069 (https://phabricator.wikimedia.org/T242609) (owner: 10Filippo Giunchedi) [18:03:18] 10Operations, 10Beta-Cluster-Infrastructure: Upgrade puppet in deployment-prep - https://phabricator.wikimedia.org/T243226 (10Reedy) [18:14:50] 10Operations, 10Beta-Cluster-Infrastructure: Upgrade puppet in deployment-prep - https://phabricator.wikimedia.org/T243226 (10MarcoAurelio) FWIW: ` maurelio@deployment-deploy01:~$ sudo puppet --version 4.8.2 ` to which version do we need to upgrade? [18:20:18] 10Operations, 10Beta-Cluster-Infrastructure: Upgrade puppet in deployment-prep - https://phabricator.wikimedia.org/T243226 (10Reedy) 5.5 I think [20:01:36] 10Operations, 10Beta-Cluster-Infrastructure: Upgrade puppet in deployment-prep - https://phabricator.wikimedia.org/T243226 (10Krenair) am guessing this is just us needing to get a new puppetmaster with buster instead of stretch