[00:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: (Dis)respected human, time to deploy Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190205T0000). Please do the needful. [00:00:05] No GERRIT patches in the queue for this window AFAICS. [00:03:03] Ergh, misplaced my patches [00:06:09] RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy [00:06:13] RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy [00:06:19] RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy [00:06:20] (03PS2) 10MaxSem: Remove explicit right grants for group 'confirmed' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486422 (https://phabricator.wikimedia.org/T214655) [00:06:21] RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy [00:06:25] RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy [00:06:25] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy [00:06:27] RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy [00:06:27] (03CR) 10MaxSem: [C: 03+2] Remove explicit right grants for group 'confirmed' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486422 (https://phabricator.wikimedia.org/T214655) (owner: 10MaxSem) [00:06:39] RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy [00:06:49] RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy [00:07:05] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy [00:07:34] (03Merged) 10jenkins-bot: Remove explicit right grants for group 'confirmed' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486422
(https://phabricator.wikimedia.org/T214655) (owner: 10MaxSem) [00:09:55] (03CR) 10jenkins-bot: Remove explicit right grants for group 'confirmed' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486422 (https://phabricator.wikimedia.org/T214655) (owner: 10MaxSem) [00:11:39] !log maxsem@deploy1001 Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 00m 46s) [00:11:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:12:13] (03CR) 10BryanDavis: [C: 03+1] striker: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/486453 (owner: 10Muehlenhoff) [00:13:08] (03CR) 10BryanDavis: [C: 03+1] "I checked the Cloud VPS projects and the few Trusty instances that remain are not using this module as far as I can tell." [puppet] - 10https://gerrit.wikimedia.org/r/486455 (owner: 10Muehlenhoff) [00:17:33] 10Operations, 10MediaWiki-Email, 10Composer, 10Upstream: PHP classes are loaded from system packages before Composer packages in WMF production - https://phabricator.wikimedia.org/T215224 (10Tgr) [00:18:59] (03PS4) 10MaxSem: Set confirmed permissions after extensions are loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486405 (https://phabricator.wikimedia.org/T213003) [00:19:04] (03CR) 10MaxSem: [C: 03+2] Set confirmed permissions after extensions are loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486405 (https://phabricator.wikimedia.org/T213003) (owner: 10MaxSem) [00:20:00] 10Operations, 10Wikimedia-General-or-Unknown, 10Patch-For-Review: Remove pear/mail packages from WMF MW app servers - https://phabricator.wikimedia.org/T195364 (10Tgr) Note that the pear/mail files (the ones we get from composer as well) [[https://github.com/pear/Mail_Mime/blob/7b2f93fa5219da99e9997f497b916b... 
[00:20:07] (03Merged) 10jenkins-bot: Set confirmed permissions after extensions are loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486405 (https://phabricator.wikimedia.org/T213003) (owner: 10MaxSem) [00:20:09] 10Operations, 10MediaWiki-Email, 10Composer, 10Upstream: PEAR PHP classes are loaded from system packages instead of Composer packages in WMF production - https://phabricator.wikimedia.org/T215224 (10Tgr) [00:21:09] (03CR) 10jenkins-bot: Set confirmed permissions after extensions are loaded [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486405 (https://phabricator.wikimedia.org/T213003) (owner: 10MaxSem) [00:22:50] 10Operations, 10MediaWiki-Email, 10Composer, 10Upstream: PEAR PHP classes are loaded from system packages instead of Composer packages in WMF production - https://phabricator.wikimedia.org/T215224 (10Tgr) [00:23:13] 10Operations, 10PHP 7.0 support, 10Patch-For-Review: Audit and sync INI settings as needed between HHVM and PHP 7 - https://phabricator.wikimedia.org/T211488 (10Tgr) [00:24:00] !log maxsem@deploy1001 Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/486405/ (duration: 00m 46s) [00:24:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:26:45] 10Operations, 10MediaWiki-Email, 10Composer, 10Upstream: PEAR PHP classes are loaded from system packages instead of Composer packages in WMF production - https://phabricator.wikimedia.org/T215224 (10Tgr) [00:30:27] PROBLEM - puppet last run on cloudvirtan1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [00:47:07] !log add BGP sessions to AS64050 on cr1-eqsin [00:47:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:56:57] RECOVERY - puppet last run on cloudvirtan1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [01:20:33] 10Operations, 10ops-eqiad, 10netops: Replace eqiad mgmt switches with EX4200s - https://phabricator.wikimedia.org/T213128 (10ayounsi) This is fine, I only need 1 for tests, once in prod they can do without. [01:36:55] (03CR) 10Volans: [C: 04-1] "Question inline" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/486858 (https://phabricator.wikimedia.org/T207920) (owner: 10Mathew.onipe) [01:40:01] 10Operations, 10Continuous-Integration-Config: cergen CI fails to run on Debian Stretch because cryptography dependency cannot be built against newer openssl version - https://phabricator.wikimedia.org/T212395 (10Volans) [01:50:04] (03PS2) 10Dzahn: delete ifttt roles and module [puppet] - 10https://gerrit.wikimedia.org/r/486126 [01:53:00] (03CR) 10Dzahn: [C: 03+2] "I have talked to multiple people and consensus was we have checked what could be checked and it's very unlikely this is used." [puppet] - 10https://gerrit.wikimedia.org/r/486126 (owner: 10Dzahn) [01:55:57] (03PS5) 10Dzahn: planet: add data types and hardcode https link, not proto-relative [puppet] - 10https://gerrit.wikimedia.org/r/486701 [02:08:04] 10Operations, 10MediaWiki-Email, 10Composer, 10Upstream: PEAR PHP classes are loaded from system packages instead of Composer packages in WMF production - https://phabricator.wikimedia.org/T215224 (10MaxSem) Is there any reason why we shouldn't just nuke the PEAR packages? 
[02:19:17] RECOVERY - BGP status on cr2-esams is OK: BGP OK - up: 403, down: 6, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [02:21:47] !log delete 2nd as9121 router on cr2-esams [02:21:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:24:06] !log remove BGP session to as6412 on cr2-eqiad (gone from IX) [02:24:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:25:58] (03CR) 10Dzahn: [C: 03+2] planet: add data types and hardcode https link, not proto-relative [puppet] - 10https://gerrit.wikimedia.org/r/486701 (owner: 10Dzahn) [02:28:46] (03CR) 10Dzahn: "noop everywhere, 2 x prod and 1 x labs" [puppet] - 10https://gerrit.wikimedia.org/r/486701 (owner: 10Dzahn) [02:30:53] (03PS1) 10Volans: puppet: add delete() method to remove a host [software/spicerack] - 10https://gerrit.wikimedia.org/r/487981 (https://phabricator.wikimedia.org/T205884) [02:32:03] 10Operations, 10MediaWiki-Email, 10Composer, 10Upstream: PEAR PHP classes are loaded from system packages instead of Composer packages in WMF production - https://phabricator.wikimedia.org/T215224 (10Tgr) >>! In T215224#4926242, @MaxSem wrote: > Is there any reason why we shouldn't just nuke the PEAR packa... [02:33:30] (03PS1) 10Volans: sre.hosts: add decommission cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/487982 (https://phabricator.wikimedia.org/T205886) [02:35:25] (03CR) 10Mathew.onipe: elasticsearch_cluster: fix issues from test result (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/486858 (https://phabricator.wikimedia.org/T207920) (owner: 10Mathew.onipe) [02:43:59] (03PS2) 10CRusnov: Reorganize and add tox/CI support for repository. [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/487612 [02:47:56] (03CR) 10CRusnov: "I think that patch set 2 addresses requests." 
(034 comments) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/487612 (owner: 10CRusnov) [03:12:59] (03CR) 10Volans: [C: 04-1] "I've tested locally and found some issue." (035 comments) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/487612 (owner: 10CRusnov) [03:19:46] (03CR) 10Volans: [C: 04-1] "also we need a .gitignore for the .tox directory" [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/487612 (owner: 10CRusnov) [03:47:20] (03PS3) 10CRusnov: Reorganize and add tox/CI support for repository. [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/487612 [03:50:35] (03PS4) 10CRusnov: Reorganize and add tox/CI support for repository. [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/487612 [03:51:32] (03CR) 10CRusnov: "I think I have addressed your issues. :)" (035 comments) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/487612 (owner: 10CRusnov) [04:12:45] (03PS5) 10CRusnov: Reorganize and add tox/CI support for repository. [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/487612 [04:25:23] (03PS1) 10Gergő Tisza: Add PHP version to MediaWiki logs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487987 [05:42:59] (03CR) 10Elukey: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/14522/cp1075.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/487883 (owner: 10Muehlenhoff) [05:45:06] (03CR) 10Elukey: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/14523/" [puppet] - 10https://gerrit.wikimedia.org/r/487898 (owner: 10Muehlenhoff) [06:29:25] PROBLEM - puppet last run on mw1300 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/var/lib/hphpd/hphpd.ini] [06:51:57] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 45 probes of 418 (alerts on 35) - https://atlas.ripe.net/measurements/1791210/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts [06:53:51] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: db1114 crashed - https://phabricator.wikimedia.org/T214720 (10Marostegui) p:05Triage→03Normal [06:54:05] (03PS2) 10Marostegui: phabricator.my.cnf.erb: Disable local_infile [puppet] - 10https://gerrit.wikimedia.org/r/487884 (https://phabricator.wikimedia.org/T214248) [06:55:15] (03CR) 10Marostegui: [C: 03+2] phabricator.my.cnf.erb: Disable local_infile [puppet] - 10https://gerrit.wikimedia.org/r/487884 (https://phabricator.wikimedia.org/T214248) (owner: 10Marostegui) [06:55:55] RECOVERY - puppet last run on mw1300 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [06:57:13] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 1 probes of 418 (alerts on 35) - https://atlas.ripe.net/measurements/1791210/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts [07:03:16] (03PS1) 10Marostegui: phabricator_instance.my.cnf.erb: Disable local_infile [puppet] - 10https://gerrit.wikimedia.org/r/487995 (https://phabricator.wikimedia.org/T214248) [07:04:19] (03CR) 10Marostegui: [C: 03+2] phabricator_instance.my.cnf.erb: Disable local_infile [puppet] - 10https://gerrit.wikimedia.org/r/487995 (https://phabricator.wikimedia.org/T214248) (owner: 10Marostegui) [07:05:10] !log Reboot mysql on db1117:3323 (this will make the dbproxies complain) T214248 [07:05:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:13:11] !log Taking mysqldump from dbstore1002.staging - T210478 [07:13:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:13:14] T210478: Migrate dbstore1002 to a multi instance setup 
on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 [07:15:01] marostegui: for a second I've read reboot dbstore1002 and I thought you felt adventurous after SF :D [07:15:35] hahaha [07:15:42] I am not that brave! [07:41:17] (03PS1) 10Marostegui: db-eqiad.php: Depool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487996 (https://phabricator.wikimedia.org/T210713) [07:42:29] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487996 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [07:43:33] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487996 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [07:44:43] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1076 T210713 (duration: 00m 47s) [07:44:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:44:46] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [07:44:47] !log Deploy schema change on db1076 - T210713 [07:44:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:49:55] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487996 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [07:53:35] 10Operations, 10Analytics, 10User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (10Marostegui) p:05Triage→03Normal [07:54:34] 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (10Marostegui) I have suggested to use `labsdb1012` as a hostname, as this host has the same hardware as the other labsdb1009-1011 and will be setup the same way of t... 
[07:56:30] !log Upgrade MySQL and kernel on db1076 [07:56:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:57:46] (03PS2) 10Muehlenhoff: vagrant: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/487899 [07:58:03] 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (10elukey) >>! In T215231#4926445, @Marostegui wrote: > I have suggested to use `labsdb1012` as a hostname, as this host has the same hardware as the other labsdb1009... [08:01:44] (03PS1) 10Marostegui: install_server: Allow install labsdb1012 [puppet] - 10https://gerrit.wikimedia.org/r/487997 (https://phabricator.wikimedia.org/T215231) [08:01:50] (03CR) 10Muehlenhoff: [C: 03+2] vagrant: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/487899 (owner: 10Muehlenhoff) [08:02:09] (03PS5) 10Vgutierrez: Ditch certcentral config template, configure in puppet [puppet] - 10https://gerrit.wikimedia.org/r/468604 (owner: 10Alex Monk) [08:11:20] (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487998 [08:14:08] (03PS2) 10Muehlenhoff: strongswan: Stop supporting trusty [puppet] - 10https://gerrit.wikimedia.org/r/486442 [08:14:31] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Slowly repool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487998 (owner: 10Marostegui) [08:15:34] (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487998 (owner: 10Marostegui) [08:15:40] (03PS3) 10Muehlenhoff: strongswan: Stop supporting trusty [puppet] - 10https://gerrit.wikimedia.org/r/486442 [08:16:31] (03CR) 10Muehlenhoff: [C: 03+2] strongswan: Stop supporting trusty [puppet] - 10https://gerrit.wikimedia.org/r/486442 (owner: 10Muehlenhoff) [08:16:39] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 
(duration: 00m 45s) [08:16:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:17:01] 10Operations, 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10Patch-For-Review: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670 (10Marostegui) An attempt to run mydumper for T210478 on dbstore1002 made it crash. [08:18:59] sigh [08:19:23] :( [08:19:33] (03PS2) 10Muehlenhoff: shinken: Remove trusty support [puppet] - 10https://gerrit.wikimedia.org/r/487900 [08:22:43] (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487998 (owner: 10Marostegui) [08:27:22] marostegui: if the x1 replication fails I can take care of it :D [08:28:23] :) [08:28:29] I will let you know once the recovery is finished [08:28:42] (03CR) 10Elukey: [C: 03+1] install_server: Allow install labsdb1012 [puppet] - 10https://gerrit.wikimedia.org/r/487997 (https://phabricator.wikimedia.org/T215231) (owner: 10Marostegui) [08:29:55] (03PS2) 10Marostegui: install_server: Allow install labsdb1012 [puppet] - 10https://gerrit.wikimedia.org/r/487997 (https://phabricator.wikimedia.org/T215231) [08:30:00] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487999 [08:30:42] (03CR) 10Marostegui: [C: 03+2] install_server: Allow install labsdb1012 [puppet] - 10https://gerrit.wikimedia.org/r/487997 (https://phabricator.wikimedia.org/T215231) (owner: 10Marostegui) [08:31:19] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Increase traffic for db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487999 (owner: 10Marostegui) [08:32:20] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487999 (owner: 10Marostegui) [08:33:26] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Increase traffic for db1076 (duration: 00m 46s) [08:33:26] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:33:30] (03CR) 10Vgutierrez: [C: 03+2] Ditch certcentral config template, configure in puppet [puppet] - 10https://gerrit.wikimedia.org/r/468604 (owner: 10Alex Monk) [08:33:40] (03PS6) 10Vgutierrez: Ditch certcentral config template, configure in puppet [puppet] - 10https://gerrit.wikimedia.org/r/468604 (owner: 10Alex Monk) [08:33:50] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487999 (owner: 10Marostegui) [08:41:30] 10Puppet, 10Beta-Cluster-Infrastructure, 10monitoring: Puppet failure on deployment-prometheus01.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T214558 (10fgiunchedi) 05Open→03Resolved a:03fgiunchedi I've ran puppet on the vm to make it successful and then disabled puppet again, reso... [08:43:24] 10Operations, 10monitoring, 10Goal, 10User-fgiunchedi: Include apache_exporter in puppet module apache - https://phabricator.wikimedia.org/T187434 (10fgiunchedi) [08:43:44] 10Operations, 10Goal, 10Technical-Debt, 10User-CDanis, 10User-fgiunchedi: Reduce technical debt in metrics monitoring - https://phabricator.wikimedia.org/T177195 (10fgiunchedi) [08:43:48] 10Operations, 10monitoring, 10Goal, 10User-fgiunchedi, 10cloud-services-team (Kanban): Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196 (10fgiunchedi) 05Open→03Resolved a:03fgiunchedi >>! In T177196#4909149, @MoritzMuehlenhoff wrote: > I think this ta... 
[08:45:54] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488001 [08:49:39] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Increase traffic for db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488001 (owner: 10Marostegui) [08:52:41] (03PS1) 10Vgutierrez: certcentral: Get rid of authorisedhost exported file resource [puppet] - 10https://gerrit.wikimedia.org/r/488002 (https://phabricator.wikimedia.org/T213301) [08:54:31] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488001 (owner: 10Marostegui) [08:54:45] (03PS2) 10Muehlenhoff: striker: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/486453 [08:56:30] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 (duration: 00m 45s) [08:56:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:56:37] (03CR) 10jerkins-bot: [V: 04-1] striker: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/486453 (owner: 10Muehlenhoff) [08:57:46] (03CR) 10Muehlenhoff: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/486453 (owner: 10Muehlenhoff) [08:59:32] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review, 10User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (10jcrespo) I don't think we should setup new hosts using multi-source. 
[08:59:43] (03PS1) 10Elukey: Rename sX-analytics-slave to replicas and update dbstore CNAME targets [dns] - 10https://gerrit.wikimedia.org/r/488004 (https://phabricator.wikimedia.org/T210478) [09:04:38] (03CR) 10Muehlenhoff: [C: 03+2] striker: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/486453 (owner: 10Muehlenhoff) [09:06:40] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review, 10User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (10Marostegui) >>! In T215231#4926627, @jcrespo wrote: > I don't think we should setup new hosts using multi-source. This host will act as a cu... [09:06:44] (03CR) 10Vgutierrez: [C: 03+2] certcentral: Get rid of authorisedhost exported file resource [puppet] - 10https://gerrit.wikimedia.org/r/488002 (https://phabricator.wikimedia.org/T213301) (owner: 10Vgutierrez) [09:07:19] (03PS2) 10Vgutierrez: certcentral: Get rid of authorisedhost exported file resource [puppet] - 10https://gerrit.wikimedia.org/r/488002 (https://phabricator.wikimedia.org/T213301) [09:07:43] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review, 10User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (10jcrespo) > current labsdb hosts Those will be on multi-instance soon(TM). [09:08:39] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review, 10User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (10Marostegui) >>! In T215231#4926632, @jcrespo wrote: >> current labsdb hosts > > Those will be on multi-instance soon(TM). Yes, that was dis... 
[09:08:51] (03CR) 10Elukey: "Jaime/Manuel: I am asking around for a .svc.eqiad.wmnet solution, but I found these old CNAMEs and I thought to refactor them to keep ever" [dns] - 10https://gerrit.wikimedia.org/r/488004 (https://phabricator.wikimedia.org/T210478) (owner: 10Elukey) [09:10:13] (03CR) 10Jcrespo: [C: 03+1] "I am ok with this, but if you deploy it as is, it will break the current usage of dbstore1002." [dns] - 10https://gerrit.wikimedia.org/r/488004 (https://phabricator.wikimedia.org/T210478) (owner: 10Elukey) [09:11:22] !log Start all slaves on dbstore1002 - T213670 [09:11:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:11:25] T213670: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670 [09:12:14] (03CR) 10Elukey: "> I am ok with this, but if you deploy it as is, it will break the" [dns] - 10https://gerrit.wikimedia.org/r/488004 (https://phabricator.wikimedia.org/T210478) (owner: 10Elukey) [09:12:45] (03PS1) 10Muehlenhoff: contint::packages::php: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/488005 [09:14:20] (03CR) 10jerkins-bot: [V: 04-1] contint::packages::php: Remove support for trusty [puppet] - 10https://gerrit.wikimedia.org/r/488005 (owner: 10Muehlenhoff) [09:18:31] (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488006 [09:19:11] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review, 10User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (10elukey) From our perspective, we will use the replica only for ETL (so with Sqoop) and we'll not grant any user access to the host. Having mu... 
[09:21:20] 10Operations, 10Certcentral, 10Traffic, 10Goal: Deploy managed LetsEncrypt certs for all public use-cases - https://phabricator.wikimedia.org/T213705 (10Vgutierrez) [09:21:42] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488001 (owner: 10Marostegui) [09:21:46] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Fully repool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488006 (owner: 10Marostegui) [09:22:25] (03PS1) 10Muehlenhoff: releases::reprepro: Remove trusty [puppet] - 10https://gerrit.wikimedia.org/r/488009 [09:22:30] (03Abandoned) 10Gilles: Hide debugging headers [puppet] - 10https://gerrit.wikimedia.org/r/480977 (https://phabricator.wikimedia.org/T210484) (owner: 10Gilles) [09:22:46] 10Operations, 10Analytics, 10Performance-Team, 10Traffic: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (10Gilles) [09:23:57] PROBLEM - Disk space on contint1001 is CRITICAL: DISK CRITICAL - free space: / 2598 MB (5% inode=57%) [09:24:16] (03CR) 10Joal: [C: 03+1] "+1 for me - Thanks for the better naming elukey :)" [dns] - 10https://gerrit.wikimedia.org/r/488004 (https://phabricator.wikimedia.org/T210478) (owner: 10Elukey) [09:25:21] 10Operations: Archival of home directories on servers with very large homes - https://phabricator.wikimedia.org/T215171 (10JAllemandou) For analytics-machines, should we use hdfs as an archive store? And, if we archive, how long should we keep the archives? 
[09:25:52] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488006 (owner: 10Marostegui) [09:26:51] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1076 (duration: 00m 46s) [09:26:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:27:44] (03CR) 10Marostegui: [C: 04-1] Rename sX-analytics-slave to replicas and update dbstore CNAME targets (032 comments) [dns] - 10https://gerrit.wikimedia.org/r/488004 (https://phabricator.wikimedia.org/T210478) (owner: 10Elukey) [09:30:29] (03PS1) 10Marostegui: db-eqiad.php: Depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488010 (https://phabricator.wikimedia.org/T210713) [09:30:55] (03CR) 10Elukey: Rename sX-analytics-slave to replicas and update dbstore CNAME targets (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/488004 (https://phabricator.wikimedia.org/T210478) (owner: 10Elukey) [09:31:50] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488010 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [09:32:32] (03PS2) 10Elukey: Rename sX-analytics-slave to replicas and update dbstore CNAME targets [dns] - 10https://gerrit.wikimedia.org/r/488004 (https://phabricator.wikimedia.org/T210478) [09:32:54] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488006 (owner: 10Marostegui) [09:33:13] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488010 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [09:33:27] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1090:3312 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488010 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [09:34:06] (03CR) 10Marostegui: [C: 03+1] Rename sX-analytics-slave to 
replicas and update dbstore CNAME targets [dns] - 10https://gerrit.wikimedia.org/r/488004 (https://phabricator.wikimedia.org/T210478) (owner: 10Elukey) [09:34:11] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1090:3312 (duration: 00m 45s) [09:34:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:34:45] !log Deploy schema change on db1090:3312 - T210713 [09:34:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:34:48] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [09:42:00] !log contint1001: docker image prune -f [09:42:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:42:09] RECOVERY - Disk space on contint1001 is OK: DISK OK [10:00:00] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1090:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488013 [10:08:38] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] memcached: Unconditionally use systemd [puppet] - 10https://gerrit.wikimedia.org/r/487898 (owner: 10Muehlenhoff) [10:10:47] (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1090:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488013 (owner: 10Marostegui) [10:11:52] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1090:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488013 (owner: 10Marostegui) [10:12:56] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1090:3312 (duration: 00m 46s) [10:12:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:13:40] (03PS4) 10Arturo Borrero Gonzalez: graphite: refactor into role/profile [puppet] - 10https://gerrit.wikimedia.org/r/487481 [10:15:30] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] graphite: refactor into role/profile [puppet] - 10https://gerrit.wikimedia.org/r/487481 (owner: 10Arturo Borrero Gonzalez) [10:16:38] godog: hello, for when you are 
available I got a bunch of metrics to delete from Graphite (the ones for nodepool). https://phabricator.wikimedia.org/T215172 [10:17:35] they should have been garbage collected by now though ( graphite::whisper_cleanup has a 15 days ttl for them) [10:17:59] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "Thank you all for the reviews :-)" [puppet] - 10https://gerrit.wikimedia.org/r/487481 (owner: 10Arturo Borrero Gonzalez) [10:19:10] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1090:3312" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488013 (owner: 10Marostegui) [10:21:15] (03PS1) 10Hashar: graphite: nodepool is gone, drop whisper_cleanup [puppet] - 10https://gerrit.wikimedia.org/r/488017 (https://phabricator.wikimedia.org/T215172) [10:22:27] hashar: kk, thanks for the task [10:22:36] (03CR) 10Hashar: "Gotta check whether the metrics actually got purged from disk. The TTL is 15 days and Nodepool has been dropped ages ago." [puppet] - 10https://gerrit.wikimedia.org/r/488017 (https://phabricator.wikimedia.org/T215172) (owner: 10Hashar) [10:23:06] godog: and greg hinted at the metrics should have been all purged by now thanks to whisper_cleanup. So I created https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/488017/ to remove the garbage collector :D [10:23:16] hopefully the directory is empty now ;) [10:24:43] (03PS4) 10Arturo Borrero Gonzalez: wmcs: monitoring: refactor code into roles/profiles [puppet] - 10https://gerrit.wikimedia.org/r/487482 [10:24:51] hashar: hehe it should indeed, I'll take a closer look later in the week [10:25:54] 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-fgiunchedi: ms-be2047 spontaneous reboots - https://phabricator.wikimedia.org/T209921 (10fgiunchedi) >>! In T209921#4925508, @Papaul wrote: > @fgiunchedi I replaced the problematic server with the new one Dell shipped to me. The OS is installed and pupp... 
[10:26:44] (03PS1) 10Hashar: jenkins: stop purging nodepool agents config history [puppet] - 10https://gerrit.wikimedia.org/r/488019 (https://phabricator.wikimedia.org/T126552) [10:27:14] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] wmcs: monitoring: refactor code into roles/profiles [puppet] - 10https://gerrit.wikimedia.org/r/487482 (owner: 10Arturo Borrero Gonzalez) [10:27:32] (03PS2) 10Hashar: jenkins: stop purging nodepool agents config history [puppet] - 10https://gerrit.wikimedia.org/r/488019 (https://phabricator.wikimedia.org/T126552) [10:28:02] (03CR) 10Hashar: "Previously added by https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/308165/ for T126552. Affects contint1001/contint2001 :)" [puppet] - 10https://gerrit.wikimedia.org/r/488019 (https://phabricator.wikimedia.org/T126552) (owner: 10Hashar) [10:28:07] (03CR) 10Hashar: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/488019 (https://phabricator.wikimedia.org/T126552) (owner: 10Hashar) [10:29:28] I may have messed puppet in grafana servers [10:29:29] Error while evaluating a Function Call, Could not find data item profile::grafana::readonly_domain in any Hiera data file and no default supplied at /etc/puppet/modules/profile/manifests/grafana.pp:6:22 [10:29:43] cc godog [10:30:05] probably due to this [10:30:06] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/487481/ [10:30:08] checking now [10:33:24] wait, only in labmon servers? [10:34:37] PROBLEM - puppet last run on labmon1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:34:41] PROBLEM - puppet last run on labmon1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:36:15] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1020 - https://phabricator.wikimedia.org/T214778 (10fgiunchedi) a:05fgiunchedi→03Cmjohnson >>! 
In T214778#4918910, @Cmjohnson wrote: > This is a HP server, while the f/w can probably be updated remotely it would be best if I did the update on-site with the s... [10:36:35] (03PS1) 10Marostegui: db-eqiad.php: Depool db1074 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488020 (https://phabricator.wikimedia.org/T210713) [10:36:53] godog: graphite servers are fine: [10:36:56] https://www.irccloud.com/pastebin/RCg55QaF/ [10:37:00] sorry for the noise [10:37:17] (03PS2) 10Tim Eulitz: Disable confirmation prompt on rollback by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487861 (https://phabricator.wikimedia.org/T215019) [10:37:26] arturo: profile::grafana::readonly_domain is set in hieradata/role/common/labs/monitoring.yaml. but the labmon role is now wmcs::monitoring [10:37:41] yes, already writting a patch [10:37:50] 10Operations, 10media-storage: ms-be1034 crash - https://phabricator.wikimedia.org/T214838 (10fgiunchedi) a:03Cmjohnson Similarly to {T214778} this host will need a firmware/bios/etc upgrade, assigning to @Cmjohnson. Let me know when ok to take the host offline. [10:38:15] arturo: np! 
thanks for the heads up [10:38:16] thanks moritzm [10:40:46] (03PS1) 10Arturo Borrero Gonzalez: hiera: wmcs: monitoring: rename file [puppet] - 10https://gerrit.wikimedia.org/r/488021 [10:40:55] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1074 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488020 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [10:41:57] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1074 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488020 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [10:42:09] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1074 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488020 (https://phabricator.wikimedia.org/T210713) (owner: 10Marostegui) [10:42:58] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1074 (duration: 00m 47s) [10:42:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:43:06] !log Deploy schema change on db1074 with replication, lag will be generated on s2 - T210713 [10:43:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:43:09] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [10:43:13] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] hiera: wmcs: monitoring: rename file [puppet] - 10https://gerrit.wikimedia.org/r/488021 (owner: 10Arturo Borrero Gonzalez) [10:46:36] (03PS1) 10MarcoAurelio: [WIP] exim: Ban recurrent spam to our lists [puppet] - 10https://gerrit.wikimedia.org/r/488022 [10:48:56] 10Operations, 10Puppet, 10Packaging: Prepare puppet for Debian buster - https://phabricator.wikimedia.org/T213546 (10MoritzMuehlenhoff) [10:48:58] (03CR) 10Hashar: [C: 03+1] "There are a bunch of:" [puppet] - 10https://gerrit.wikimedia.org/r/488005 (owner: 10Muehlenhoff) [10:49:28] (03CR) 10Filippo Giunchedi: mathoid: Update prometheus-stats.conf (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/486396 
(owner: 10Alexandros Kosiaris) [10:50:24] (03CR) 10WMDE-Fisch: [C: 03+1] Disable confirmation prompt on rollback by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487861 (https://phabricator.wikimedia.org/T215019) (owner: 10Tim Eulitz) [10:50:29] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [10:50:33] RECOVERY - puppet last run on labmon1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:51:21] (03PS7) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) [10:51:29] (03CR) 10Filippo Giunchedi: [C: 03+1] Fix PNG transparency for more cases [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/487785 (https://phabricator.wikimedia.org/T198370) (owner: 10Gilles) [10:52:09] (03CR) 10jerkins-bot: [V: 04-1] Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [10:53:30] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM, re: deployment for thumbor haproxy isn't used yet" [puppet] - 10https://gerrit.wikimedia.org/r/487895 (owner: 10Muehlenhoff) [10:54:27] (03PS2) 10MarcoAurelio: [WIP] exim: Ban recurrent spam to our lists [puppet] - 10https://gerrit.wikimedia.org/r/488022 [10:55:54] (03CR) 10MarcoAurelio: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/488022 (owner: 10MarcoAurelio) [10:56:10] (03CR) 10Filippo Giunchedi: [C: 04-1] "LGTM overall, see inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/487878 (owner: 10Muehlenhoff) [10:57:44] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM, looks like es isn't going to get restarted automatically after this change (?)" [puppet] - 10https://gerrit.wikimedia.org/r/487787 (https://phabricator.wikimedia.org/T76090) (owner: 10Gehel) [10:58:11] (03CR) 10MarcoAurelio: 
"https://puppet-compiler.wmflabs.org/compiler1002/107/ fails." [puppet] - 10https://gerrit.wikimedia.org/r/488022 (owner: 10MarcoAurelio) [11:00:02] (03CR) 10Filippo Giunchedi: [C: 04-1] "LGTM, see inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/486493 (https://phabricator.wikimedia.org/T213708) (owner: 10Cwhite) [11:00:35] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1074" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488024 [11:02:16] (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1074" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488024 (owner: 10Marostegui) [11:03:12] (03PS8) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) [11:03:21] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1074" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488024 (owner: 10Marostegui) [11:03:35] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1074" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488024 (owner: 10Marostegui) [11:04:05] (03CR) 10jerkins-bot: [V: 04-1] Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [11:04:47] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1074 (duration: 00m 46s) [11:04:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:05:59] (03CR) 10Muehlenhoff: statsite/statsd: Unconditionally use systemd (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/487878 (owner: 10Muehlenhoff) [11:08:04] (03CR) 10Muehlenhoff: aptrepo: add prometheus-node-exporter components for all dists (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/486493 (https://phabricator.wikimedia.org/T213708) (owner: 10Cwhite) [11:09:48] (03PS9) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 
(https://phabricator.wikimedia.org/T116011) [11:10:43] (03CR) 10jerkins-bot: [V: 04-1] Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [11:13:26] (03PS3) 10Muehlenhoff: statsite/statsd: Unconditionally use systemd [puppet] - 10https://gerrit.wikimedia.org/r/487878 [11:15:43] (03PS10) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) [11:16:30] (03CR) 10jerkins-bot: [V: 04-1] Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [11:19:01] PROBLEM - very high load average likely xfs on ms-be2047 is CRITICAL: CRITICAL - load average: 296.20, 297.24, 289.92 [11:19:50] 10Operations, 10Performance-Team, 10Traffic, 10media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (10Gilles) I've looked at bloomd's capabilities and it doesn't have a concept of things expiring, which makes sense given the nature of a bloom filter.... [11:20:10] 10Operations, 10Wikimedia-Mailing-lists: Ban recurrent spam to Wikimedia mailing lists (January 2019) - https://phabricator.wikimedia.org/T215251 (10MarcoAurelio) [11:21:14] (03CR) 10Gilles: [V: 03+2 C: 03+2] Fix PNG transparency for more cases [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/487785 (https://phabricator.wikimedia.org/T198370) (owner: 10Gilles) [11:22:58] (03PS11) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) [11:23:53] (03CR) 10jerkins-bot: [V: 04-1] Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [11:27:40] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM!" 
[puppet] - 10https://gerrit.wikimedia.org/r/487878 (owner: 10Muehlenhoff) [11:29:24] (03CR) 10MarcoAurelio: "Filed T215251 for this one." [puppet] - 10https://gerrit.wikimedia.org/r/488022 (owner: 10MarcoAurelio) [11:33:04] (03PS3) 10Marostegui: analytics-dbstore.sql: Initial research user role [puppet] - 10https://gerrit.wikimedia.org/r/487000 (https://phabricator.wikimedia.org/T214469) [11:33:08] (03CR) 10Filippo Giunchedi: [C: 03+1] role::graphite::base: Unconditionally use systemd [puppet] - 10https://gerrit.wikimedia.org/r/487874 (owner: 10Muehlenhoff) [11:33:41] (03CR) 10Filippo Giunchedi: [C: 03+1] rsyslog::kafka_shipper: raise rsyslog MaxMessageSize from 8k to 64k [puppet] - 10https://gerrit.wikimedia.org/r/480793 (https://phabricator.wikimedia.org/T205849) (owner: 10Herron) [11:35:26] !log added firmware-enriched buster netboot image (T213546) [11:35:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:35:28] T213546: Prepare puppet for Debian buster - https://phabricator.wikimedia.org/T213546 [11:38:31] PROBLEM - puppet last run on mw1232 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:42:27] (03PS2) 10Arturo Borrero Gonzalez: wmcs: prefix all of our customs scripts with 'wmcs-' [puppet] - 10https://gerrit.wikimedia.org/r/487368 [11:43:49] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [11:44:22] (03CR) 10jerkins-bot: [V: 04-1] wmcs: prefix all of our customs scripts with 'wmcs-' [puppet] - 10https://gerrit.wikimedia.org/r/487368 (owner: 10Arturo Borrero Gonzalez) [11:45:30] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "Catalog seems fine: https://puppet-compiler.wmflabs.org/compiler1002/14528/" [puppet] - 10https://gerrit.wikimedia.org/r/487368 (owner: 10Arturo Borrero Gonzalez) [11:46:09] (03CR) 10Arturo Borrero Gonzalez: [V: 03+2 C: 03+2] "Removing jenkins to avoid the -1 from the python linters. 
We will handle that separately." [puppet] - 10https://gerrit.wikimedia.org/r/487368 (owner: 10Arturo Borrero Gonzalez) [11:51:35] (03PS1) 10Arturo Borrero Gonzalez: openstack: admin_scripts: fix relationships [puppet] - 10https://gerrit.wikimedia.org/r/488028 [11:51:37] PROBLEM - puppet last run on cloudcontrol1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:52:17] (03CR) 10jerkins-bot: [V: 04-1] openstack: admin_scripts: fix relationships [puppet] - 10https://gerrit.wikimedia.org/r/488028 (owner: 10Arturo Borrero Gonzalez) [11:52:25] PROBLEM - puppet last run on cloudcontrol1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:54:25] (03PS2) 10Arturo Borrero Gonzalez: openstack: admin_scripts: fix relationships [puppet] - 10https://gerrit.wikimedia.org/r/488028 [11:55:07] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "Catalog is fine: https://puppet-compiler.wmflabs.org/compiler1002/14529/" [puppet] - 10https://gerrit.wikimedia.org/r/488028 (owner: 10Arturo Borrero Gonzalez) [11:56:09] (03PS12) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) [11:57:15] (03PS2) 10Alexandros Kosiaris: Introduce blubberoid.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/485814 (https://phabricator.wikimedia.org/T212251) [11:57:23] (03CR) 10jerkins-bot: [V: 04-1] Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [11:57:37] (03CR) 10Alexandros Kosiaris: [C: 03+2] Introduce blubberoid.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/485814 (https://phabricator.wikimedia.org/T212251) (owner: 10Alexandros Kosiaris) [11:59:32] (03PS2) 10Alexandros Kosiaris: Introduce blubberoid.wikimedia.org in varnish [puppet] - 10https://gerrit.wikimedia.org/r/485823 
(https://phabricator.wikimedia.org/T212251) [11:59:37] (03CR) 10Alexandros Kosiaris: [C: 03+2] Introduce blubberoid.wikimedia.org in varnish [puppet] - 10https://gerrit.wikimedia.org/r/485823 (https://phabricator.wikimedia.org/T212251) (owner: 10Alexandros Kosiaris) [12:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: It is that lovely time of the day again! You are hereby commanded to deploy European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190205T1200). [12:00:04] tim_WMDE: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:01:14] (03PS1) 10Muehlenhoff: Temporarily readd graphite2002 for buster d-i tests [puppet] - 10https://gerrit.wikimedia.org/r/488031 [12:05:08] 10Operations, 10decommission, 10User-fgiunchedi: Return graphite200[12] to spares pool - https://phabricator.wikimedia.org/T199321 (10MoritzMuehlenhoff) Please don't proceed with decom for now; Filippo uses graphite2001 for prometheus 2 tests and I'm using graphite2002 for some buster tests. [12:05:12] (03PS1) 10Arturo Borrero Gonzalez: openstack: admin_scripts: also rename puppet dirs to include the prefix [puppet] - 10https://gerrit.wikimedia.org/r/488032 [12:05:25] (03PS1) 10Gilles: Skip PDF JPX test on Jessie [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/488033 (https://phabricator.wikimedia.org/T215253) [12:05:25] 10Operations, 10ops-codfw, 10decommission, 10monitoring: Decom graphite2002 - https://phabricator.wikimedia.org/T200210 (10MoritzMuehlenhoff) Please don't proceed with decom for now; I'm using graphite2002 for some buster tests. 
[12:05:43] (03CR) 10jerkins-bot: [V: 04-1] openstack: admin_scripts: also rename puppet dirs to include the prefix [puppet] - 10https://gerrit.wikimedia.org/r/488032 (owner: 10Arturo Borrero Gonzalez) [12:07:36] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "Catalog is fine: https://puppet-compiler.wmflabs.org/compiler1002/14530/" [puppet] - 10https://gerrit.wikimedia.org/r/488032 (owner: 10Arturo Borrero Gonzalez) [12:08:29] (03CR) 10Arturo Borrero Gonzalez: [V: 03+2 C: 03+2] "Again, removing jenkins to avoid the -1 from the python linters. We will handle the python code update/cleanup separately." [puppet] - 10https://gerrit.wikimedia.org/r/488032 (owner: 10Arturo Borrero Gonzalez) [12:08:38] (03PS2) 10Arturo Borrero Gonzalez: openstack: admin_scripts: also rename puppet dirs to include the prefix [puppet] - 10https://gerrit.wikimedia.org/r/488032 [12:08:47] (03CR) 10Arturo Borrero Gonzalez: [V: 03+2 C: 03+2] openstack: admin_scripts: also rename puppet dirs to include the prefix [puppet] - 10https://gerrit.wikimedia.org/r/488032 (owner: 10Arturo Borrero Gonzalez) [12:09:59] PROBLEM - puppet last run on labtestcontrol2003 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/wmcs-nova-quota-sync],File[/root/wmcs-nova-quota-sync/readme.md] [12:10:42] (03CR) 10Gilles: [V: 03+2 C: 03+2] Skip PDF JPX test on Jessie [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/488033 (https://phabricator.wikimedia.org/T215253) (owner: 10Gilles) [12:11:47] PROBLEM - puppet last run on labcontrol1001 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. 
Failed resources (up to 3 shown): File[/root/wmcs-nova-quota-sync/readme.md],File[/usr/local/sbin/wmcs-nova-quota-sync] [12:12:28] (03PS1) 10Gilles: Version bump [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/488037 (https://phabricator.wikimedia.org/T198370) [12:12:47] RECOVERY - puppet last run on cloudcontrol1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:13:33] RECOVERY - puppet last run on cloudcontrol1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [12:15:05] tim_WMDE: got someone to deploy your thing? [12:15:28] Nope, if you would be available, please go ahead [12:15:51] (03PS3) 10Addshore: Disable confirmation prompt on rollback by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487861 (https://phabricator.wikimedia.org/T215019) (owner: 10Tim Eulitz) [12:15:55] (03CR) 10Addshore: [C: 03+2] Disable confirmation prompt on rollback by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487861 (https://phabricator.wikimedia.org/T215019) (owner: 10Tim Eulitz) [12:15:58] tim_WMDE: will do! [12:16:12] Thanks! 
[12:17:14] (03Merged) 10jenkins-bot: Disable confirmation prompt on rollback by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487861 (https://phabricator.wikimedia.org/T215019) (owner: 10Tim Eulitz) [12:18:47] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable confirmation prompt on rollback by default T215019 (duration: 00m 47s) [12:18:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:18:50] T215019: Add an option for wikis to set the default of showing confirmation prompts for rollbacks - https://phabricator.wikimedia.org/T215019 [12:18:50] tim_WMDE: done [12:18:52] (03PS1) 10Arturo Borrero Gonzalez: openstack: spreadcheck: use the 'wmcs-' prefix [puppet] - 10https://gerrit.wikimedia.org/r/488039 [12:18:53] !log swat done [12:18:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:19:58] Thanks again! [12:20:57] (03PS2) 10Arturo Borrero Gonzalez: openstack: spreadcheck: use the 'wmcs-' prefix [puppet] - 10https://gerrit.wikimedia.org/r/488039 [12:21:33] tim_WMDE: np! 
[12:21:47] (03PS2) 10Muehlenhoff: Temporarily readd graphite2002 for buster d-i tests [puppet] - 10https://gerrit.wikimedia.org/r/488031 [12:21:49] (03PS13) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) [12:22:48] (03CR) 10jerkins-bot: [V: 04-1] Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [12:23:23] (03PS14) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) [12:23:29] (03PS4) 10Marostegui: analytics-dbstore.sql: Initial research user role [puppet] - 10https://gerrit.wikimedia.org/r/487000 (https://phabricator.wikimedia.org/T214469) [12:23:40] (03CR) 10jenkins-bot: Disable confirmation prompt on rollback by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/487861 (https://phabricator.wikimedia.org/T215019) (owner: 10Tim Eulitz) [12:23:42] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "Catalog apparently fine: https://puppet-compiler.wmflabs.org/compiler1002/14531/" [puppet] - 10https://gerrit.wikimedia.org/r/488039 (owner: 10Arturo Borrero Gonzalez) [12:24:01] 10Operations, 10monitoring, 10Patch-For-Review: Serve >= 50% of production Prometheus systems with Prometheus v2 - https://phabricator.wikimedia.org/T187987 (10fgiunchedi) I ran a test conversion on graphite2001 using `prometheus-storage-migrator` and a snapshot of data taken from prometheus2003 and parallel... 
[12:24:18] (03CR) 10jerkins-bot: [V: 04-1] Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [12:24:53] (03PS3) 10Muehlenhoff: Temporarily readd graphite2002 for buster d-i tests [puppet] - 10https://gerrit.wikimedia.org/r/488031 [12:25:24] (03PS15) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) [12:25:44] (03CR) 10Muehlenhoff: [C: 03+2] Temporarily readd graphite2002 for buster d-i tests [puppet] - 10https://gerrit.wikimedia.org/r/488031 (owner: 10Muehlenhoff) [12:26:19] (03CR) 10jerkins-bot: [V: 04-1] Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [12:29:34] (03PS1) 10Arturo Borrero Gonzalez: openstack: spreadcheck: typo in relationship [puppet] - 10https://gerrit.wikimedia.org/r/488041 [12:30:56] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: spreadcheck: typo in relationship [puppet] - 10https://gerrit.wikimedia.org/r/488041 (owner: 10Arturo Borrero Gonzalez) [12:31:19] PROBLEM - puppet last run on cloudcontrol1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:32:05] PROBLEM - puppet last run on cloudcontrol1003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [12:36:25] RECOVERY - puppet last run on labtestcontrol2003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:36:37] RECOVERY - puppet last run on cloudcontrol1004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [12:37:23] RECOVERY - puppet last run on cloudcontrol1003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [12:38:15] RECOVERY - puppet last run on labcontrol1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [12:41:28] has anything changed between yesterday and today with jenkins voting? I have stuff that passes locally and was passing yesterday but is now failing jshint. I can't figure out what's changed? [12:43:09] hopefully someone in -releng would know details [12:47:34] Does anyone know which script is used to build the release notes pages on mw.org: https://www.mediawiki.org/wiki/MediaWiki_1.32/wmf.23 ? [12:48:41] he-rron and mu-tante ain't avalaible yet at this time right? [12:48:42] It appears use the gerrit field "Uploader" instead of "Owner" for credit [12:49:16] (03PS16) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) [12:49:46] (03CR) 10Arturo Borrero Gonzalez: "There are a couple of typos in the commit message." [puppet] - 10https://gerrit.wikimedia.org/r/487882 (owner: 10Jbond) [12:50:15] (03CR) 10jerkins-bot: [V: 04-1] Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [12:50:39] edsanders, https://phabricator.wikimedia.org/diffusion/MREL/browse/master/make-deploy-notes/uploadChangelog.php ? 
[12:52:31] https://phabricator.wikimedia.org/diffusion/MREL/browse/master/make-deploy-notes/make-deploy-notes [12:53:09] probably [12:53:16] (03PS17) 10Jbond: Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) [12:54:13] (03CR) 10jerkins-bot: [V: 04-1] Create module for managing ulogd [puppet] - 10https://gerrit.wikimedia.org/r/486513 (https://phabricator.wikimedia.org/T116011) (owner: 10Jbond) [12:55:58] Thanks, cc Krinkle, Reedy, James_F the author information it outputs is misleading. We should try to get the patch owner, not the last person to have modified it [13:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190205T1300) [13:11:50] 10Operations: Archival of home directories on servers with very large homes - https://phabricator.wikimedia.org/T215171 (10elukey) I'd say that for analytics hosts (like stat boxes) we should follow what we thought it was happening, namely homes left there so we can follow up with the owners to figure out if dat... [13:12:41] (03CR) 10Elukey: [C: 03+2] Rename sX-analytics-slave to replicas and update dbstore CNAME targets [dns] - 10https://gerrit.wikimedia.org/r/488004 (https://phabricator.wikimedia.org/T210478) (owner: 10Elukey) [13:12:45] (03PS3) 10Elukey: Rename sX-analytics-slave to replicas and update dbstore CNAME targets [dns] - 10https://gerrit.wikimedia.org/r/488004 (https://phabricator.wikimedia.org/T210478) [13:20:26] 10Operations, 10Patch-For-Review: ferm: Log dropped packets - https://phabricator.wikimedia.org/T116011 (10jbond) I have created a simple module for configuereing ulogd2 avalible in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/486513/. The questions is what do we want to log and how do we want to lo... 
[13:33:45] 10Operations, 10ops-codfw: Degraded RAID on thumbor2002 - https://phabricator.wikimedia.org/T215185 (10jijiki) [13:33:47] 10Operations, 10ops-codfw, 10serviceops, 10User-jijiki: Degraded RAID on thumbor2002 - https://phabricator.wikimedia.org/T214813 (10jijiki) [13:44:07] RECOVERY - very high load average likely xfs on ms-be2047 is OK: OK - load average: 0.02, 4.38, 76.31 [13:45:27] 10Operations, 10MediaWiki-Cache, 10MW-1.33-notes (1.33.0-wmf.16; 2019-02-05), 10Patch-For-Review, and 3 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10elukey) >>! In T203786#4922717, @elukey wrote: >... [13:46:31] finally found a repro \o/ --^ [13:52:12] 10Operations, 10ops-codfw, 10serviceops, 10User-jijiki: Degraded RAID on thumbor2002 - https://phabricator.wikimedia.org/T214813 (10jijiki) @Papaul At your convenience, please replace the drive again, the machine should boot. The server has been already depooled since last week. If the server fails to b... [13:52:55] 10Operations: Archival of home directories on servers with very large homes - https://phabricator.wikimedia.org/T215171 (10MoritzMuehlenhoff) The archival mechanism doesn't seem very robust either; e.g. for user "banyek" the home is still around on e.g. cumin2001 or puppetmaster1001. 
[13:55:15] !log swift codfw-prod: add ms-be2047 - T209395 T209921 [13:55:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:55:20] T209395: rack/setup/install new ms-be servers ms-be204[4-9] ,ms-be2050 - https://phabricator.wikimedia.org/T209395 [13:55:20] T209921: ms-be2047 spontaneous reboots - https://phabricator.wikimedia.org/T209921 [13:58:32] 10Operations, 10cloud-services-team, 10monitoring: Upgrade Prometheus to 2.6 in deployment-prep and tools - https://phabricator.wikimedia.org/T215272 (10fgiunchedi) p:05Triage→03Normal [14:00:04] Deploy window MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190205T1400) [14:03:27] (03PS1) 10Marostegui: analytics-grants.sql.erb: Clean up GRANTs [puppet] - 10https://gerrit.wikimedia.org/r/488052 (https://phabricator.wikimedia.org/T214469) [14:04:40] (03PS4) 10Vgutierrez: certcentral: Implement staging time [software/certcentral] - 10https://gerrit.wikimedia.org/r/485594 (https://phabricator.wikimedia.org/T213737) [14:05:07] !log Delete non used grants from dbstore1002: log, warehouse,project_illustration, cognate\_wiktionary, datasets - T212487 T210478 [14:05:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:05:15] T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 [14:05:16] T212487: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 [14:05:21] (03CR) 10Filippo Giunchedi: "I've cherry-picked this patch on deployment-puppetmaster and it is running on deployment-prometheus02 as expected." 
[puppet] - 10https://gerrit.wikimedia.org/r/486051 (https://phabricator.wikimedia.org/T187987) (owner: 10Filippo Giunchedi) [14:11:11] (03PS2) 10Marostegui: analytics-grants.sql.erb: Clean up GRANTs [puppet] - 10https://gerrit.wikimedia.org/r/488052 (https://phabricator.wikimedia.org/T214469) [14:11:25] 10Operations, 10Puppet, 10Continuous-Integration-Config: Improve CI checks to cover more of the code base - https://phabricator.wikimedia.org/T215275 (10jbond) [14:12:20] !log Deploy schema change on db1066 (s2 master) - T210713 [14:12:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:12:23] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [14:13:07] (03PS8) 10Jbond: Improve CI checks to ensure a basic catalouge compiles on all supported OS's [puppet] - 10https://gerrit.wikimedia.org/r/487882 (https://phabricator.wikimedia.org/T215275) [14:18:34] (03CR) 10Gilles: [V: 03+2 C: 03+2] Version bump [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/488037 (https://phabricator.wikimedia.org/T198370) (owner: 10Gilles) [14:20:55] (03PS1) 10Gilles: Upgrade to 2.3 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/488060 (https://phabricator.wikimedia.org/T198370) [14:22:46] (03CR) 10Elukey: [C: 03+1] analytics-grants.sql.erb: Clean up GRANTs [puppet] - 10https://gerrit.wikimedia.org/r/488052 (https://phabricator.wikimedia.org/T214469) (owner: 10Marostegui) [14:23:16] (03CR) 10Marostegui: [C: 03+2] analytics-grants.sql.erb: Clean up GRANTs [puppet] - 10https://gerrit.wikimedia.org/r/488052 (https://phabricator.wikimedia.org/T214469) (owner: 10Marostegui) [14:24:03] (03PS1) 10Jgreen: point payments.frdev.wm.o to codfw payments IP [dns] - 10https://gerrit.wikimedia.org/r/488062 [14:25:04] (03PS2) 10Gilles: Upgrade to 2.3 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/488060 (https://phabricator.wikimedia.org/T215253) [14:25:17] (03CR) 10Jgreen: [C: 
03+2] point payments.frdev.wm.o to codfw payments IP [dns] - 10https://gerrit.wikimedia.org/r/488062 (owner: 10Jgreen) [14:25:38] (03PS4) 10Alexandros Kosiaris: mathoid: Update prometheus-stats.conf [deployment-charts] - 10https://gerrit.wikimedia.org/r/486396 [14:25:40] 10Operations, 10Analytics, 10Product-Analytics, 10User-Elukey: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 (10elukey) @aborrero thanks a lot! As far as I can see the limits are applied for each user separately, but my use case is a bit different - I'd need to a... [14:25:58] (03PS3) 10Gilles: Upgrade to 2.3 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/488060 (https://phabricator.wikimedia.org/T198370) [14:26:17] (03PS2) 10Filippo Giunchedi: graphite: nodepool is gone, drop whisper_cleanup [puppet] - 10https://gerrit.wikimedia.org/r/488017 (https://phabricator.wikimedia.org/T215172) (owner: 10Hashar) [14:26:24] (03CR) 10Filippo Giunchedi: [C: 03+2] graphite: nodepool is gone, drop whisper_cleanup [puppet] - 10https://gerrit.wikimedia.org/r/488017 (https://phabricator.wikimedia.org/T215172) (owner: 10Hashar) [14:26:31] !log authdns-update for payments dev/testing hostname [14:26:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:49] (03CR) 10Alexandros Kosiaris: "The inline comment did not need addressing, however testing did unearth an interesting discrepancy between statsd timers and prometheus ti" (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/486396 (owner: 10Alexandros Kosiaris) [14:30:50] !log Deploy schema change on s4 codfw master with replication, lag will be generated on s4 codfw - T210713 [14:30:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:53] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 [14:33:57] 10Operations, 10Patch-For-Review: ferm: Log dropped packets - 
https://phabricator.wikimedia.org/T116011 (10MoritzMuehlenhoff) >>! In T116011#4927275, @jbond wrote: > I have created a simple module for configuereing ulogd2 avalible in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/486513/. The question... [14:42:45] 10Operations, 10MediaWiki-Email, 10Composer, 10Upstream: PEAR PHP classes are loaded from system packages instead of Composer packages in WMF production - https://phabricator.wikimedia.org/T215224 (10Anomie) The composer-added include path is being overwritten in WMF production by https://gerrit.wikimedia.... [14:53:18] (03CR) 10CDanis: [C: 03+1] "Nice! Looks good to me assuming still no PCC diffs. Just one question" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/486051 (https://phabricator.wikimedia.org/T187987) (owner: 10Filippo Giunchedi) [14:53:34] (03CR) 10jerkins-bot: [V: 04-1] prometheus: add feature flag for v2 compat [puppet] - 10https://gerrit.wikimedia.org/r/486051 (https://phabricator.wikimedia.org/T187987) (owner: 10Filippo Giunchedi) [14:57:43] (03PS1) 10Anomie: Preserve Composer's include paths [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488070 (https://phabricator.wikimedia.org/T215126) [14:59:48] (03PS2) 10Anomie: Preserve Composer's include paths [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488070 (https://phabricator.wikimedia.org/T215126) [15:01:31] (03CR) 10Alexandros Kosiaris: [C: 03+1] "Looks correct, and from what I gather using python-apt is only useful for faster parsing, so I guess we can live without it." 
[docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/487849 (owner: 10Hashar) [15:01:34] (03CR) 10Anomie: Preserve Composer's include paths (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/488070 (https://phabricator.wikimedia.org/T215126) (owner: 10Anomie) [15:06:50] 10Operations, 10monitoring: Expose linux kernel firewall and connections statistics - https://phabricator.wikimedia.org/T215277 (10jcrespo) [15:06:58] (03CR) 10Alexandros Kosiaris: [C: 03+1] cli.defaults was altered by read_config() [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/487848 (owner: 10Hashar) [15:09:54] 10Operations, 10MobileFrontend, 10Traffic, 10Readers-Web-Backlog (Tracking): Remove .m. subdomain, serve mobile and desktop variants through the same URL - https://phabricator.wikimedia.org/T214998 (10pmiazga) [15:11:07] gilles: T215250 doesn't have any projects, maybe #performance or #media-storage or #thumbor would be nice? [15:11:08] T215250: Estimate size of Commons image corpus at given resolution - https://phabricator.wikimedia.org/T215250 [15:11:29] or even #analytics [15:15:27] !log force curator action 'replicas' to set older logstash indices to 1 replica - T213078 [15:15:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:15:30] T213078: Replicas set to two on logstash indices regardless of index age - https://phabricator.wikimedia.org/T213078 [15:15:34] (03PS2) 10Alexandros Kosiaris: ntp: Update comment for usage of all_networks [puppet] - 10https://gerrit.wikimedia.org/r/483442 [15:15:40] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] ntp: Update comment for usage of all_networks [puppet] - 10https://gerrit.wikimedia.org/r/483442 (owner: 10Alexandros Kosiaris) [15:19:32] (03CR) 10Alexandros Kosiaris: [C: 03+2] "Per https://puppet-compiler.wmflabs.org/compiler1002/14532/fermium.wikimedia.org/ the only diff is the addition of the localhost IPs, whic" [puppet] - 10https://gerrit.wikimedia.org/r/483441 (owner: 
10Alexandros Kosiaris) [15:19:41] (03PS2) 10Alexandros Kosiaris: mailmain: Switch to using aggregate_networks [puppet] - 10https://gerrit.wikimedia.org/r/483441 [15:19:46] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] mailmain: Switch to using aggregate_networks [puppet] - 10https://gerrit.wikimedia.org/r/483441 (owner: 10Alexandros Kosiaris) [15:20:54] mailmain? [15:21:02] * vgutierrez sends some coffee to ak [15:21:09] *akosiaris [15:21:15] damn jetlag, I need coffee as well [15:22:13] vgutierrez: old patches that I hadn't merged. Looks like a good time to do so now [15:22:30] yeah, small typo in the commit msg ;P [15:22:39] I 'll probably have an update for varnish soon [15:22:47] ah, did not notice... [15:23:05] well... it's merged now, so it will be like that forever [15:24:00] so now you need to please the OCD gods with some offer [15:24:34] volans: your presence is being requested ^. Something about a sacrifice? :P [15:27:34] (03CR) 10Alexandros Kosiaris: "PCC is happy at https://puppet-compiler.wmflabs.org/compiler1002/14533/" [puppet] - 10https://gerrit.wikimedia.org/r/483432 (owner: 10Alexandros Kosiaris) [15:30:50] (03CR) 10DCausse: [C: 03+1] elasticsearch: exit the JVM on OutOfMemoryError [puppet] - 10https://gerrit.wikimedia.org/r/487787 (https://phabricator.wikimedia.org/T76090) (owner: 10Gehel) [15:31:18] (03PS1) 10Alexandros Kosiaris: Add mailmain to typos [puppet] - 10https://gerrit.wikimedia.org/r/488073 [15:34:00] (03CR) 10Alexandros Kosiaris: [C: 03+2] Add mailmain to typos [puppet] - 10https://gerrit.wikimedia.org/r/488073 (owner: 10Alexandros Kosiaris) [15:34:15] 10Operations, 10monitoring: Expose linux kernel firewall and connections statistics - https://phabricator.wikimedia.org/T215277 (10CDanis) Neat! I only did a quick look, but I couldn't find an existing Prometheus exporter for ulogd. I suppose we could write one, or we could just write an `mtail` module for i... 
[15:46:50] (03PS1) 10Elukey: Introduce systemd::slice::user [puppet] - 10https://gerrit.wikimedia.org/r/488077 (https://phabricator.wikimedia.org/T212824) [15:46:56] (03PS1) 10Elukey: Introduce profile::analytics::cluster::limits::statistics [puppet] - 10https://gerrit.wikimedia.org/r/488078 (https://phabricator.wikimedia.org/T212824) [15:47:04] (03PS1) 10Elukey: profile::toolforge::bastion: use systemd::slice::user [puppet] - 10https://gerrit.wikimedia.org/r/488079 [15:48:47] PROBLEM - MariaDB Slave Lag: s4 on db1102 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 795.38 seconds [15:49:52] (03PS2) 10Elukey: Introduce systemd::slice::user [puppet] - 10https://gerrit.wikimedia.org/r/488077 (https://phabricator.wikimedia.org/T212824) [15:49:54] (03PS2) 10Elukey: Introduce profile::analytics::cluster::limits::statistics [puppet] - 10https://gerrit.wikimedia.org/r/488078 (https://phabricator.wikimedia.org/T212824) [15:49:57] (03PS2) 10Elukey: profile::toolforge::bastion: use systemd::slice::user [puppet] - 10https://gerrit.wikimedia.org/r/488079 [15:50:49] arturo: o/ - I tried to generalize your awesome user-.slice code in ---^ [15:51:01] so it could be easily re-used (mostly by me :P) [15:51:17] let me know if you like the idea or not, naming etc.. can be amended of course [15:54:11] elukey: o/ --- I assume you pointed to wikibugs, which I ignore :-P [15:54:46] ??? 
[15:55:40] (03PS3) 10Elukey: Introduce systemd::slice::user [puppet] - 10https://gerrit.wikimedia.org/r/488077 (https://phabricator.wikimedia.org/T212824) [15:55:42] (03PS3) 10Elukey: Introduce profile::analytics::cluster::limits::statistics [puppet] - 10https://gerrit.wikimedia.org/r/488078 (https://phabricator.wikimedia.org/T212824) [15:55:44] (03PS3) 10Elukey: profile::toolforge::bastion: use systemd::slice::user [puppet] - 10https://gerrit.wikimedia.org/r/488079 [15:55:47] (still fixing typos) [15:59:03] elukey: I ignore wikibugs in IRC [15:59:31] arturo: ahhhh you mean that you filter notifications on IRC, sorry I didn't get it :D [15:59:57] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM! Good catch on the / 1000 operation" [deployment-charts] - 10https://gerrit.wikimedia.org/r/486396 (owner: 10Alexandros Kosiaris) [16:01:38] (03CR) 10CDanis: [C: 03+1] mathoid: Update prometheus-stats.conf (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/486396 (owner: 10Alexandros Kosiaris) [16:02:09] (03CR) 10Filippo Giunchedi: prometheus: add feature flag for v2 compat (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/486051 (https://phabricator.wikimedia.org/T187987) (owner: 10Filippo Giunchedi) [16:02:32] yess, sorry elukey :-) [16:03:16] arturo: anyway, to answer your question, yes I added you to some code reviews :) [16:06:30] (03CR) 10Gehel: "> LGTM, looks like es isn't going to get restarted automatically after this change (?)" [puppet] - 10https://gerrit.wikimedia.org/r/487787 (https://phabricator.wikimedia.org/T76090) (owner: 10Gehel) [16:08:48] elukey: cool, I will follow up on gerrit [16:12:15] RECOVERY - MariaDB Slave Lag: s4 on db1102 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [16:12:20] 10Operations, 10ops-codfw, 10Traffic, 10Patch-For-Review: rack/setup/install LVS200[7-10] - https://phabricator.wikimedia.org/T196560 (10Vgutierrez) So, the NIC issue reported in T203194 seems to be fixed after upgrading the NIC firmware to 
version 21.40 (https://www.dell.com/support/home/us/en/04/drivers/... [16:14:51] 10Operations, 10decommission, 10User-fgiunchedi: Return graphite200[12] to spares pool - https://phabricator.wikimedia.org/T199321 (10RobH) a:05RobH→03MoritzMuehlenhoff Ok, I'm assignign this to Mortiz until these systems are ready for decom. Please unassign when you are done! Filippo please comment wh... [16:34:37] (03PS1) 10Thcipriani: plugins: add healthcheck plugin [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/488088 (https://phabricator.wikimedia.org/T214326) [16:36:08] (03PS1) 10Fsero: Bump helm to 2.12.2 for security and features [debs/helm] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/488089 (https://phabricator.wikimedia.org/T215244) [16:36:28] (03CR) 10Paladox: "Already added in https://gerrit.wikimedia.org/r/c/operations/software/gerrit/+/488088 :)" [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/488088 (https://phabricator.wikimedia.org/T214326) (owner: 10Thcipriani) [16:41:43] (03CR) 10Fsero: "I've created a new branch since it reflects the target OS for the package and also because master depends on a gbp-import and I cannot reb" [debs/helm] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/488089 (https://phabricator.wikimedia.org/T215244) (owner: 10Fsero) [16:43:32] (03PS3) 10Cwhite: aptrepo: add prometheus-node-exporter components for all dists [puppet] - 10https://gerrit.wikimedia.org/r/486493 (https://phabricator.wikimedia.org/T213708) [16:44:33] (03CR) 10Paladox: "Sorry wrong link, https://gerrit.wikimedia.org/r/c/operations/software/gerrit/+/487913" [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/488088 (https://phabricator.wikimedia.org/T214326) (owner: 10Thcipriani) [16:46:12] (03PS17) 10Cwhite: prometheus: upgrade to node-exporter 0.17 [puppet] - 10https://gerrit.wikimedia.org/r/486192 (https://phabricator.wikimedia.org/T213708) [16:50:05] (03PS2) 10EBernhardson: 
mwgrep: Query all search clusters [puppet] - 10https://gerrit.wikimedia.org/r/487924 (https://phabricator.wikimedia.org/T215199) [17:00:04] godog and _joe_: Time to snap out of that daydream and deploy Puppet SWAT(Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190205T1700). [17:00:05] No GERRIT patches in the queue for this window AFAICS. [17:01:42] (03CR) 10DCausse: [C: 04-1] mwgrep: Query all search clusters (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/487924 (https://phabricator.wikimedia.org/T215199) (owner: 10EBernhardson) [17:05:11] (03PS1) 10Paladox: gerrit: Switch db from mysql to H2 [puppet] - 10https://gerrit.wikimedia.org/r/488093 [17:05:34] 10Operations, 10hardware-requests: codfw spare pool system for partman testing - https://phabricator.wikimedia.org/T215301 (10RobH) p:05Triage→03Normal [17:05:43] (03CR) 10jerkins-bot: [V: 04-1] gerrit: Switch db from mysql to H2 [puppet] - 10https://gerrit.wikimedia.org/r/488093 (owner: 10Paladox) [17:07:48] (03PS2) 10Paladox: gerrit: Switch db from mysql to H2 [puppet] - 10https://gerrit.wikimedia.org/r/488093 [17:08:11] (03CR) 10Paladox: [C: 04-1] "Pending upgrade to 2.16." [puppet] - 10https://gerrit.wikimedia.org/r/488093 (owner: 10Paladox) [17:10:05] (03CR) 10Cwhite: "Have you thoughts about the v1 configuration files that become unmanaged after this CS?" 
[puppet] - 10https://gerrit.wikimedia.org/r/486051 (https://phabricator.wikimedia.org/T187987) (owner: 10Filippo Giunchedi) [17:11:39] 10Operations, 10hardware-requests: codfw spare pool system for partman testing - https://phabricator.wikimedia.org/T215301 (10RobH) [17:12:21] (03PS3) 10EBernhardson: mwgrep: Query all search clusters [puppet] - 10https://gerrit.wikimedia.org/r/487924 (https://phabricator.wikimedia.org/T215199) [17:12:32] (03CR) 10EBernhardson: mwgrep: Query all search clusters (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/487924 (https://phabricator.wikimedia.org/T215199) (owner: 10EBernhardson) [17:15:37] 10Operations, 10hardware-requests: codfw spare pool system for partman testing - https://phabricator.wikimedia.org/T215301 (10RobH) [17:16:22] (03CR) 10Fsero: "i removed the golang-glide dependency because the buster or unstable version installed trigger a bug while getting helm dependencies." [debs/helm] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/488089 (https://phabricator.wikimedia.org/T215244) (owner: 10Fsero) [17:16:54] 10Operations, 10hardware-requests: codfw spare pool system for partman testing - https://phabricator.wikimedia.org/T215301 (10RobH) [17:17:31] 10Operations, 10hardware-requests: codfw spare pool system for partman testing - https://phabricator.wikimedia.org/T215301 (10RobH) network port asw-d-codfw:ge-8/0/4 labeled as 'theemin' and set to private1-d-codfw vlan [17:19:53] (03PS1) 10CDanis: theemin.codfw and theemin.mgmt.codfw [dns] - 10https://gerrit.wikimedia.org/r/488094 (https://phabricator.wikimedia.org/T215301) [17:20:42] (03PS3) 10Herron: [WIP] exim: Ban recurrent spam to our lists [puppet] - 10https://gerrit.wikimedia.org/r/488022 (owner: 10MarcoAurelio) [17:21:33] !log scandium -- copy /srv/visualdiff/testrecude/testrun.ids from ruthenium to the same locatio [17:21:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:21:46] (03CR) 10RobH: [C: 03+1] 
theemin.codfw and theemin.mgmt.codfw [dns] - 10https://gerrit.wikimedia.org/r/488094 (https://phabricator.wikimedia.org/T215301) (owner: 10CDanis) [17:22:07] (03CR) 10CDanis: [C: 03+2] theemin.codfw and theemin.mgmt.codfw [dns] - 10https://gerrit.wikimedia.org/r/488094 (https://phabricator.wikimedia.org/T215301) (owner: 10CDanis) [17:22:25] (03Abandoned) 10Cdentinger: rename frmon to frmon.frdev, slight retab [dns] - 10https://gerrit.wikimedia.org/r/487909 (https://phabricator.wikimedia.org/T215182) (owner: 10Cdentinger) [17:22:39] !log scandium - restart parsoid-vd service [17:22:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:22:41] (03PS1) 10Cdentinger: point frmon.frdev.wm.o right to the public IP [dns] - 10https://gerrit.wikimedia.org/r/488095 [17:24:03] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work): Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (10EBernhardson) @gehel Since we are all finally back I want to make sure this is progressing, it's very important for juli... [17:25:18] 10Operations, 10monitoring: Expose linux kernel firewall and connections statistics - https://phabricator.wikimedia.org/T215277 (10jbond) The data tracks metrics on all connections so may contain to much information for Prometheus. an example running on a test machine (jbond-puppet-client.puppet.eqiad.wmflabs... [17:27:19] (03PS2) 10Cdentinger: point frmon.frdev.wm.o right to the public IP [dns] - 10https://gerrit.wikimedia.org/r/488095 (https://phabricator.wikimedia.org/T215182) [17:28:52] (03PS3) 10Paladox: gerrit: Switch db from mysql to H2 [puppet] - 10https://gerrit.wikimedia.org/r/488093 [17:32:15] (03CR) 10Herron: [C: 03+1] "Looks good, thanks for this! Want me to merge?" 
[puppet] - 10https://gerrit.wikimedia.org/r/483432 (owner: 10Alexandros Kosiaris) [17:33:34] (03PS2) 10Fsero: Bump helm to 2.12.2 for security and features [debs/helm] (debian/stretch-wikimedia) - 10https://gerrit.wikimedia.org/r/488089 (https://phabricator.wikimedia.org/T215244) [17:37:05] (03PS3) 10Cdentinger: point frmon.frdev.wm.o right to the public IP [dns] - 10https://gerrit.wikimedia.org/r/488095 (https://phabricator.wikimedia.org/T215182) [17:39:18] 10Operations, 10Gerrit: Convert Gerrit to use H2 as the database after 2.16 upgrade - https://phabricator.wikimedia.org/T211139 (10Paladox) I've just successfully switched https://gerrit.git.wmflabs.org/r/q/status:open from using mysql to h2 for that single table. It's very easy to migrate, setup just a fake... [17:39:30] (03PS4) 10Paladox: gerrit: Switch db from mysql to H2 [puppet] - 10https://gerrit.wikimedia.org/r/488093 (https://phabricator.wikimedia.org/T211139) [17:43:24] @seen Hauskatze [17:43:24] mutante: Last time I saw hauskatze they were quitting the network with reason: Quit: hauskatze N/A at 2/5/2019 4:33:06 PM (1h10m17s ago) [17:43:41] (03PS9) 10Paladox: Gerrit: Add flogger javaopts [puppet] - 10https://gerrit.wikimedia.org/r/463519 (https://phabricator.wikimedia.org/T200739) [17:44:14] (03CR) 10DCausse: mwgrep: Query all search clusters (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/487924 (https://phabricator.wikimedia.org/T215199) (owner: 10EBernhardson) [17:44:20] (03Abandoned) 10Paladox: Gerrit: Switch to the mariadb connector [puppet] - 10https://gerrit.wikimedia.org/r/384588 (https://phabricator.wikimedia.org/T176164) (owner: 10Paladox) [17:44:52] (03PS3) 10Paladox: Gerrit: Support git protocol version 2 [puppet] - 10https://gerrit.wikimedia.org/r/473643 [17:45:33] (03Abandoned) 10Paladox: Gerrit: Make header a blue and the text white [puppet] - 10https://gerrit.wikimedia.org/r/458593 (https://phabricator.wikimedia.org/T200739) (owner: 10Paladox) [17:45:50] (03PS3) 10Paladox: 
Gerrit: Update soy templates for gerrit 2.16 [puppet] - 10https://gerrit.wikimedia.org/r/473264 [17:46:02] (03PS1) 10CDanis: theemin testing: add dhcp and netboot configs [puppet] - 10https://gerrit.wikimedia.org/r/488097 (https://phabricator.wikimedia.org/T215301) [17:46:13] 10Operations, 10hardware-requests, 10Patch-For-Review: codfw spare pool system for partman testing - https://phabricator.wikimedia.org/T215301 (10RobH) [17:46:18] (03PS2) 10Paladox: Insert the description of the change. [puppet] - 10https://gerrit.wikimedia.org/r/356034 [17:46:22] (03Abandoned) 10Paladox: Insert the description of the change. [puppet] - 10https://gerrit.wikimedia.org/r/356034 (owner: 10Paladox) [17:47:07] (03PS1) 10Paladox: gerrit: Remove symlink to mysql java connector [puppet] - 10https://gerrit.wikimedia.org/r/488099 [17:47:09] (03CR) 10CDanis: [C: 03+2] theemin testing: add dhcp and netboot configs [puppet] - 10https://gerrit.wikimedia.org/r/488097 (https://phabricator.wikimedia.org/T215301) (owner: 10CDanis) [17:47:30] (03PS2) 10Paladox: gerrit: Remove symlink to mysql java connector [puppet] - 10https://gerrit.wikimedia.org/r/488099 [17:50:34] (03PS9) 10Paladox: ircecho: Migrate from OptionParser to ArgumentParser [puppet] - 10https://gerrit.wikimedia.org/r/480760 [17:52:48] (03Abandoned) 10Thcipriani: plugins: add healthcheck plugin [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/488088 (https://phabricator.wikimedia.org/T214326) (owner: 10Thcipriani) [17:53:19] 10Operations, 10hardware-requests, 10Patch-For-Review: codfw spare pool system for partman testing - https://phabricator.wikimedia.org/T215301 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cdanis on cumin2001.codfw.wmnet for hosts: ` ['theemin.codfw.wmnet'] ` The log can be found in `/var/lo... 
[17:55:44] (03PS4) 10Herron: lists: reject recurrent spam based on subject as stopgap [puppet] - 10https://gerrit.wikimedia.org/r/488022 (https://phabricator.wikimedia.org/T215251) (owner: 10MarcoAurelio) [17:56:45] (03CR) 10jerkins-bot: [V: 04-1] lists: reject recurrent spam based on subject as stopgap [puppet] - 10https://gerrit.wikimedia.org/r/488022 (https://phabricator.wikimedia.org/T215251) (owner: 10MarcoAurelio) [17:58:54] (03PS3) 10Dzahn: jenkins: stop purging nodepool agents config history [puppet] - 10https://gerrit.wikimedia.org/r/488019 (https://phabricator.wikimedia.org/T126552) (owner: 10Hashar) [17:59:11] (03PS10) 10Paladox: Gerrit: Add flogger javaopts [puppet] - 10https://gerrit.wikimedia.org/r/463519 (https://phabricator.wikimedia.org/T200739) [17:59:28] (03PS11) 10Paladox: Gerrit: Add flogger javaopts [puppet] - 10https://gerrit.wikimedia.org/r/463519 (https://phabricator.wikimedia.org/T200739) [18:00:04] cscott, arlolra, subbu, halfak, and Amir1: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Services – Graphoid / Parsoid / Citoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190205T1800). [18:01:17] (03CR) 10Thcipriani: [V: 03+2 C: 03+2] "plugin builds for me" [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/487913 (https://phabricator.wikimedia.org/T214326) (owner: 10Paladox) [18:01:19] (03CR) 10Dzahn: [C: 03+2] jenkins: stop purging nodepool agents config history [puppet] - 10https://gerrit.wikimedia.org/r/488019 (https://phabricator.wikimedia.org/T126552) (owner: 10Hashar) [18:02:05] (03PS5) 10Herron: lists: reject recurrent spam based on subject as stopgap [puppet] - 10https://gerrit.wikimedia.org/r/488022 (https://phabricator.wikimedia.org/T215251) (owner: 10MarcoAurelio) [18:04:09] (03CR) 10Herron: [C: 03+2] "Looks ok to me, but ideally we can follow this up to reject based on more criteria (i.e. 
subject and from headers) as well as identify a l" [puppet] - 10https://gerrit.wikimedia.org/r/488022 (https://phabricator.wikimedia.org/T215251) (owner: 10MarcoAurelio) [18:05:05] (03PS6) 10Herron: lists: reject recurrent spam based on subject as stopgap [puppet] - 10https://gerrit.wikimedia.org/r/488022 (https://phabricator.wikimedia.org/T215251) (owner: 10MarcoAurelio) [18:06:31] (03CR) 10Herron: [C: 03+2] lists: reject recurrent spam based on subject as stopgap [puppet] - 10https://gerrit.wikimedia.org/r/488022 (https://phabricator.wikimedia.org/T215251) (owner: 10MarcoAurelio) [18:08:35] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work): Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (10Dzahn) @EBernhardson One reason that blocks progress is that no SSH public key has been pasted yet. The other is that du... [18:08:55] (03PS4) 10Jgreen: point frmon.frdev.wm.o right to the public IP [dns] - 10https://gerrit.wikimedia.org/r/488095 (https://phabricator.wikimedia.org/T215182) (owner: 10Cdentinger) [18:09:15] 10Operations, 10SRE-Access-Requests, 10Discovery-Search (Current work): Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (10Dzahn) Also,, have the talks to legal about the NDA happened? [18:10:32] (03PS1) 10Thcipriani: Plugins: Add healthcheck plugin jar [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/488101 (https://phabricator.wikimedia.org/T214326) [18:10:46] 10Operations, 10hardware-requests, 10Patch-For-Review: codfw spare pool system for partman testing - https://phabricator.wikimedia.org/T215301 (10RobH) a:05CDanis→03Papaul When attempting to PXE boot this host: 10:05 < cdanis> : "PXE-E61: Media test failure, check cable" Then I went ahead and attach... 
[18:12:10] (03CR) 10Jgreen: [C: 03+2] point frmon.frdev.wm.o right to the public IP [dns] - 10https://gerrit.wikimedia.org/r/488095 (https://phabricator.wikimedia.org/T215182) (owner: 10Cdentinger) [18:13:03] (03PS2) 10Dzahn: releases::reprepro: Remove trusty [puppet] - 10https://gerrit.wikimedia.org/r/488009 (owner: 10Muehlenhoff) [18:13:04] 10Operations, 10hardware-requests, 10Patch-For-Review: codfw spare pool system for partman testing - https://phabricator.wikimedia.org/T215301 (10RobH) a:05Papaul→03CDanis [18:13:44] !log authdns-update to deploy 7fee817fd3 [18:13:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:17:01] 10Operations, 10Wikimedia-Mailing-lists, 10Patch-For-Review: Ban recurrent spam to Wikimedia mailing lists (January 2019) - https://phabricator.wikimedia.org/T215251 (10herron) p:05Triage→03Normal Thanks for the patch! As mentioned in https://gerrit.wikimedia.org/r/488022 a reject rule is now in place b... [18:17:39] !log contint1001/contint2001 -manually deleting crontab lines unpuppetized in gerrit:488019 (T209361) [18:17:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:17:42] T209361: Phase out Nodepool from production - https://phabricator.wikimedia.org/T209361 [18:18:44] (03CR) 10Dzahn: "i deleted the lines manually from crontab on both contint1001 and contint2001 since we did not do "ensure => absent" here first" [puppet] - 10https://gerrit.wikimedia.org/r/488019 (https://phabricator.wikimedia.org/T126552) (owner: 10Hashar) [18:19:06] (03PS4) 10EBernhardson: mwgrep: Query all search clusters [puppet] - 10https://gerrit.wikimedia.org/r/487924 (https://phabricator.wikimedia.org/T215199) [18:19:09] (03CR) 10EBernhardson: mwgrep: Query all search clusters (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/487924 (https://phabricator.wikimedia.org/T215199) (owner: 10EBernhardson) [18:19:52] (03CR) 10Dzahn: [C: 03+2] releases::reprepro: Remove trusty [puppet] - 
10https://gerrit.wikimedia.org/r/488009 (owner: 10Muehlenhoff) [18:20:39] (03PS4) 10Herron: rsyslog::kafka_shipper: raise rsyslog MaxMessageSize from 8k to 64k [puppet] - 10https://gerrit.wikimedia.org/r/480793 (https://phabricator.wikimedia.org/T205849) [18:21:06] 10Operations, 10Performance-Team, 10Traffic, 10media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (10ori) What orders of magnitude are we talking about with respect to: total number of thumbnails in Swift, and number of thumbnail requests per second? [18:21:50] !log bsitzmann@deploy1001 Started deploy [mobileapps/deploy@2959e12]: Update mobileapps to 107c1b1 (T214714) [18:21:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:52] T214714: Improper CSS being served to apps. - https://phabricator.wikimedia.org/T214714 [18:26:32] !log bsitzmann@deploy1001 Finished deploy [mobileapps/deploy@2959e12]: Update mobileapps to 107c1b1 (T214714) (duration: 04m 43s) [18:26:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:39] 10Operations, 10Cloud-Services, 10Patch-For-Review: rack/setup/install cloudcontrol2001-dev & cloudvirt200[123]-dev - https://phabricator.wikimedia.org/T214448 (10Papaul) [18:27:28] (03PS3) 10Arturo Borrero Gonzalez: DNS: Add production DNS enties for cloudcontrol2001-dev and cloudvirt200[123]-dev [dns] - 10https://gerrit.wikimedia.org/r/486504 (https://phabricator.wikimedia.org/T214448) (owner: 10Papaul) [18:29:09] !log arlolra@deploy1001 Started deploy [parsoid/deploy@a4acfa6]: Updating Parsoid to fb67a71 [18:29:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:29:37] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] DNS: Add production DNS enties for cloudcontrol2001-dev and cloudvirt200[123]-dev [dns] - 10https://gerrit.wikimedia.org/r/486504 (https://phabricator.wikimedia.org/T214448) (owner: 10Papaul) [18:35:50] (03PS6) 10Dzahn: 
jenkins: add data types to parameters [puppet] - https://gerrit.wikimedia.org/r/485094
[18:36:09] (CR) Dzahn: "@Hashar i addressed all your concerns i think. ok to go ahead?" [puppet] - https://gerrit.wikimedia.org/r/485094 (owner: Dzahn)
[18:38:02] (CR) Dzahn: [V: +1] "@Hashar any concerns? should i just go ahead?" [puppet] - https://gerrit.wikimedia.org/r/485096 (owner: Dzahn)
[18:39:04] !log arlolra@deploy1001 Finished deploy [parsoid/deploy@a4acfa6]: Updating Parsoid to fb67a71 (duration: 09m 54s)
[18:39:04] Operations, Performance-Team, Traffic, media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (ori) It seems that Swift has [[ https://docs.openstack.org/swift/latest/api/object-expiration.html | built-in support for object expiration ]], whic...
[18:39:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:41:33] (CR) Arturo Borrero Gonzalez: "I find the class name confusing." (6 comments) [puppet] - https://gerrit.wikimedia.org/r/488077 (https://phabricator.wikimedia.org/T212824) (owner: Elukey)
[18:44:11] (CR) Arturo Borrero Gonzalez: "These limits are pretty high. But beware that the limits will be applied to your own personal account. So running things with 'sudo' (like" [puppet] - https://gerrit.wikimedia.org/r/488078 (https://phabricator.wikimedia.org/T212824) (owner: Elukey)
[18:44:53] (CR) Arturo Borrero Gonzalez: "> These limits are pretty high. But beware that the limits will be" [puppet] - https://gerrit.wikimedia.org/r/488078 (https://phabricator.wikimedia.org/T212824) (owner: Elukey)
[18:45:21] (PS8) Dzahn: gerrit: add icinga https check for actual JSON content [puppet] - https://gerrit.wikimedia.org/r/487901 (https://phabricator.wikimedia.org/T215033)
[18:45:50] Operations, SRE-Access-Requests, Discovery-Search (Current work): Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (Nuria) ping @Julia.glen to paste ssh public keys in ticket (maybe a paste in phab that is best)
[18:46:41] Operations, SRE-Access-Requests, Discovery-Search (Current work): Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (RStallman-legalteam) Sorry for the all hands delay. If the applicable contract is with Glenbrook Networks, Inc, it looks...
[18:48:59] (CR) Dzahn: "@Volans What do you think, i am now avoiding any "if $slave" and the check will always be attached to a virtual "gerrit.wikimedia.org" hos" [puppet] - https://gerrit.wikimedia.org/r/487901 (https://phabricator.wikimedia.org/T215033) (owner: Dzahn)
[18:49:21] mutante: I'll have a look in few
[18:52:36] no rush at all, thanks volans
[18:53:55] !log arlolra@deploy1001 Started deploy [parsoid/deploy@a4acfa6]: (no justification provided)
[18:53:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:56:01] !log arlolra@deploy1001 Finished deploy [parsoid/deploy@a4acfa6]: (no justification provided) (duration: 02m 06s)
[18:56:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190205T1900)
[19:03:01] Operations, hardware-requests, Patch-For-Review: codfw spare pool system for partman testing - https://phabricator.wikimedia.org/T215301 (CDanis) # Installed server with standard `raid1-lvm-ext4-srv.cfg` partman config * Booted fine * Went into BIOS and swapped boot order of SATA devices (afterward...
[19:05:23] Operations, Performance-Team, Traffic, media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Gilles) This hadn't been considered because I didn't know that feature existed... Checking on the production Swift servers, swift-object-expirer isn...
[19:07:23] Operations, Performance-Team, Traffic, media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Gilles) Well, seems to have been that simple all along: https://packages.debian.org/jessie-backports/swift-object-expirer
[19:07:32] (CR) Dzahn: "@jijiki What do you think? The comments on the meeting were like "this should not be expected by users, except on maintenance servers". ri" [puppet] - https://gerrit.wikimedia.org/r/479142 (https://phabricator.wikimedia.org/T211512) (owner: Dzahn)
[19:10:12] (PS1) CDanis: add 'dualboot' fork of raid1-lvm-ext4-srv & try it on theemin [puppet] - https://gerrit.wikimedia.org/r/488110 (https://phabricator.wikimedia.org/T215301)
[19:11:11] Operations, SRE-Access-Requests: Requesting access to deployment, contint-admins, and contint-docker for Brennen Bearnes - https://phabricator.wikimedia.org/T215328 (brennen)
[19:11:15] (CR) CDanis: [C: +2] add 'dualboot' fork of raid1-lvm-ext4-srv & try it on theemin [puppet] - https://gerrit.wikimedia.org/r/488110 (https://phabricator.wikimedia.org/T215301) (owner: CDanis)
[19:11:59] Operations, hardware-requests, Patch-For-Review: request to assign wmf6937 (mw1298, former imagescaler) (now: wmf4727) as phab1002 - https://phabricator.wikimedia.org/T195623 (Dzahn) a:RobH Just assigning since i have questions for you.
[19:12:22] Operations, Performance-Team, Traffic, media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Gilles) For setting it on X% of access, preferably async, we'll just need a little logic to check that the thumbnail has a corresponding original. O...
[19:13:42] Operations, SRE-Access-Requests: Requesting access to deployment, contint-admins, and contint-docker for Brennen Bearnes - https://phabricator.wikimedia.org/T215328 (brennen)
[19:22:39] Operations, Performance-Team, Traffic, media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (ori) (It might have to be X% of access //on varnish//, since I assume the most oft-requested thumbnails enjoy a very high varnish cache hit rate. Yo...
[19:23:47] PROBLEM - MariaDB Slave Lag: s5 on dbstore2001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 329.31 seconds
[19:24:45] Operations, hardware-requests, Patch-For-Review: request to assign wmf6937 (mw1298, former imagescaler) (now: wmf4727) as phab1002 - https://phabricator.wikimedia.org/T195623 (RobH) Ok, I reviewed this in IRC with @dzahn and have the following action items: * we dont upgrade memory on existing sys...
[19:25:07] Operations, Performance-Team, Traffic, media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Gilles) We discussed this a bunch and we believe that basing this on Swift access alone with an expiry of a month is fine. The hottest things can't...
[19:29:06] Operations, SRE-Access-Requests, Discovery-Search (Current work): Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (Julia.glen) My public key is https://phabricator.wikimedia.org/P8051
[19:30:36] Operations, decommission, hardware-requests: decommission wmf6937 as phab1002, reimage as mw1298 - https://phabricator.wikimedia.org/T215332 (RobH) p:Triage→Normal
[19:31:01] Operations, Patch-For-Review: Reallocate former image scalers - https://phabricator.wikimedia.org/T192457 (RobH)
[19:31:03] Operations, decommission, hardware-requests: decommission wmf6937 as phab1002, reimage as mw1298 - https://phabricator.wikimedia.org/T215332 (RobH)
[19:32:14] Operations, hardware-requests, Patch-For-Review: codfw spare pool system for partman testing - https://phabricator.wikimedia.org/T215301 (CDanis) # Overwrote first 512 bytes of /dev/sda and /dev/sdb with zeros * Boot automatically fell through disk to using PXE # Reimaged system using `raid1-lvm-ex...
[19:33:41] (PS1) CDanis: dualboot: grub-installer/only_debian --> false [puppet] - https://gerrit.wikimedia.org/r/488112 (https://phabricator.wikimedia.org/T215301)
[19:34:25] (CR) CDanis: [C: +2] dualboot: grub-installer/only_debian --> false [puppet] - https://gerrit.wikimedia.org/r/488112 (https://phabricator.wikimedia.org/T215301) (owner: CDanis)
[19:37:59] Operations, Wikimedia-Mailing-lists, Patch-For-Review: Ban recurrent spam to Wikimedia mailing lists (January 2019) - https://phabricator.wikimedia.org/T215251 (Quiddity) >>! In T215251#4928519, @herron wrote: > Thanks for the patch! As mentioned in https://gerrit.wikimedia.org/r/488022 a reject rul...
[19:40:58] Operations, hardware-requests: requesting wmf7622 as phabricator system in eqiad - https://phabricator.wikimedia.org/T215335 (RobH) p:Triage→Normal
[19:41:51] Operations, hardware-requests: requesting wmf7622 as phabricator system in eqiad - https://phabricator.wikimedia.org/T215335 (RobH)
[19:42:37] (PS1) Ladsgroup: Use separate DB connection for ID insertions on testwikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/488114 (https://phabricator.wikimedia.org/T215147)
[19:44:53] RECOVERY - MariaDB Slave Lag: s5 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 38.75 seconds
[19:45:31] Operations, hardware-requests: requesting WMF7426 as phabricator system in eqiad - https://phabricator.wikimedia.org/T215335 (RobH)
[19:46:17] Operations, hardware-requests, Patch-For-Review: request to assign wmf6937 (mw1298, former imagescaler) (now: wmf4727) as phab1002 - https://phabricator.wikimedia.org/T195623 (RobH) Open→Resolved T215332 and T215335 filed as followup, resolving this task.
[19:46:30] Operations, hardware-requests: requesting WMF7426 as phabricator system in eqiad - https://phabricator.wikimedia.org/T215335 (RobH) a:RobH→faidon
[19:49:14] (CR) Volans: "It would be nice to have CI run on this before merging. Apart a small detail (see inline) LGTM otherwise" (1 comment) [software/netbox-reports] - https://gerrit.wikimedia.org/r/487612 (owner: CRusnov)
[19:52:04] Operations, Performance-Team, Traffic, media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (ori) The worry I had was this: a thumbnail that is requested once a minute on average probably has an approximately similar varnish cache hit rate t...
[19:53:26] (CR) Cwhite: "https://puppet-compiler.wmflabs.org/compiler1002/14539/" [puppet] - https://gerrit.wikimedia.org/r/486192 (https://phabricator.wikimedia.org/T213708) (owner: Cwhite)
[19:54:14] Operations, hardware-requests, Patch-For-Review: codfw spare pool system for partman testing - https://phabricator.wikimedia.org/T215301 (CDanis) # Overwrote first 512 bytes of /dev/sda and /dev/sdb with zeros * Boot automatically fell through disk to using PXE # Reimaged system again, with the...
[19:54:40] (CR) CRusnov: "> Patch Set 5:" (1 comment) [software/netbox-reports] - https://gerrit.wikimedia.org/r/487612 (owner: CRusnov)
[19:54:45] Operations, hardware-requests: requesting WMF7426 as phabricator system in eqiad - https://phabricator.wikimedia.org/T215335 (RobH) a:faidon→None
[19:55:36] (PS6) CRusnov: Reorganize and add tox/CI support for repository. [software/netbox-reports] - https://gerrit.wikimedia.org/r/487612
[19:56:48] (CR) Volans: "Compiler results available here:" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/487901 (https://phabricator.wikimedia.org/T215033) (owner: Dzahn)
[19:58:35] Operations, hardware-requests: requesting WMF7426 as phabricator system in eqiad - https://phabricator.wikimedia.org/T215335 (RobH) a:Dzahn So I filed this on behalf of a conversation with @dzahn regarding parent task T195623. We still need to have direct confirmation on this task that the use of th...
[19:59:37] Operations, Wikimedia-Mailing-lists, Patch-For-Review: Ban recurrent spam to Wikimedia mailing lists (January 2019) - https://phabricator.wikimedia.org/T215251 (MarcoAurelio) Thanks for the emails @Quiddity and for the tweaks/merging @herron. Ideally I'd say to force this kind of emails to be tagged...
[19:59:44] Operations, hardware-requests, Patch-For-Review: codfw spare pool system for partman testing - https://phabricator.wikimedia.org/T215301 (CDanis) Grub does seem to be installed on both MBRs, with just minimal differences (going to guess pointing at the different physical disk for /): `root@theemin:~...
[20:00:04] twentyafterfour: (Dis)respected human, time to deploy MediaWiki train - Americas version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190205T2000). Please do the needful.
[20:00:26] Operations, ops-eqiad: WMF7426 fails to accept racadm powercycle commands - https://phabricator.wikimedia.org/T215338 (RobH) p:Triage→Normal
[20:02:51] (CR) Hashar: "Eek, I thought puppet took care about it automatically. Thanks for the clean up :)" [puppet] - https://gerrit.wikimedia.org/r/488019 (https://phabricator.wikimedia.org/T126552) (owner: Hashar)
[20:05:59] (CR) Dzahn: "nope, puppet never removes anything unless told to, same as for files and users" [puppet] - https://gerrit.wikimedia.org/r/488019 (https://phabricator.wikimedia.org/T126552) (owner: Hashar)
[20:13:39] (CR) EBernhardson: jobrunner: support php7 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/481866 (owner: Giuseppe Lavagetto)
[20:18:08] Operations, hardware-requests, Patch-For-Review: codfw spare pool system for partman testing - https://phabricator.wikimedia.org/T215301 (CDanis) a:CDanis→Papaul @Papaul when you're back in codfw and have a spare moment, can you please unplug the disk on SATA port A from wmf6653 (in row D), a...
[20:25:17] Operations, hardware-requests, Patch-For-Review: codfw spare pool system for partman testing - https://phabricator.wikimedia.org/T215301 (CDanis)
[20:25:17] Operations: sw raid1 doesnt install grub on sdb - https://phabricator.wikimedia.org/T215183 (CDanis)
[20:26:02] (PS1) Dzahn: admins: remove empty OIT admin group [puppet] - https://gerrit.wikimedia.org/r/488119
[20:28:23] (CR) Dzahn: "unless we generally want to keep groups around to avoid the GIDs being reused or having to check for that when creating new ones.. as just" [puppet] - https://gerrit.wikimedia.org/r/488119 (owner: Dzahn)
[20:35:00] Operations, monitoring: Expose linux kernel firewall and connections statistics - https://phabricator.wikimedia.org/T215277 (CDanis) Ah yes, sorry to be unclear -- the full extent of the data is definitely too much for Prometheus. We could create a few aggregations of the data in some counters, however...
[20:38:41] (CR) CDanis: [C: +1] gerrit: add icinga https check for actual JSON content (1 comment) [puppet] - https://gerrit.wikimedia.org/r/487901 (https://phabricator.wikimedia.org/T215033) (owner: Dzahn)
[20:47:31] Operations, SRE-Access-Requests, Discovery-Search (Current work): Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (Dzahn)
[20:52:25] (PS1) Dzahn: admins: create user with analytics-privatedata access for juliaglen [puppet] - https://gerrit.wikimedia.org/r/488120 (https://phabricator.wikimedia.org/T214623)
[20:57:10] Operations, SRE-Access-Requests, Discovery-Search (Current work), Patch-For-Review: Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (Dzahn) I talked on IRC with @EBernhardson and clarified he requested group is analytics-privatedat...
[21:02:48] (CR) Gergő Tisza: [C: +1] "This works, but maybe it would be more robust to just use" [mediawiki-config] - https://gerrit.wikimedia.org/r/488070 (https://phabricator.wikimedia.org/T215126) (owner: Anomie)
[21:04:55] Operations, Wikimedia-Mailing-lists, User-herron: Ban recurrent spam to Wikimedia mailing lists (January 2019) - https://phabricator.wikimedia.org/T215251 (herron)
[21:11:05] Operations, Core Platform Team (PHP7 (TEC4)), Core Platform Team Kanban (Doing), HHVM, and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (Tgr)
[21:15:48] (PS2) Gergő Tisza: Add PHP version to MediaWiki logs [mediawiki-config] - https://gerrit.wikimedia.org/r/487987 (https://phabricator.wikimedia.org/T215350)
[21:17:35] (CR) Krinkle: [C: -1] Preserve Composer's include paths (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/488070 (https://phabricator.wikimedia.org/T215126) (owner: Anomie)
[21:18:53] Operations, SRE-Access-Requests, Discovery-Search (Current work), Patch-For-Review: Analytics query access for search platform NLP contractor @Julia.glen - https://phabricator.wikimedia.org/T214623 (TJones) Thanks, @Dzahn!
[21:18:57] (PS5) Herron: rsyslog::kafka_shipper: raise rsyslog MaxMessageSize from 8k to 64k [puppet] - https://gerrit.wikimedia.org/r/480793 (https://phabricator.wikimedia.org/T205849)
[21:19:41] PROBLEM - Check systemd state on ms-be2038 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[21:20:27] (CR) Herron: [C: +2] rsyslog::kafka_shipper: raise rsyslog MaxMessageSize from 8k to 64k [puppet] - https://gerrit.wikimedia.org/r/480793 (https://phabricator.wikimedia.org/T205849) (owner: Herron)
[21:32:36] (CR) Jforrester: [C: +1] "Fancy." [mediawiki-config] - https://gerrit.wikimedia.org/r/487987 (https://phabricator.wikimedia.org/T215350) (owner: Gergő Tisza)
[21:38:20] (CR) Anomie: Preserve Composer's include paths (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/488070 (https://phabricator.wikimedia.org/T215126) (owner: Anomie)
[21:41:56] (CR) Krinkle: [C: +1] Preserve Composer's include paths (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/488070 (https://phabricator.wikimedia.org/T215126) (owner: Anomie)
[21:47:49] Operations, hardware-requests: requesting WMF7426 as phabricator system in eqiad - https://phabricator.wikimedia.org/T215335 (Dzahn) Talked about it on IRC , the direct comparison between CPUs is: https://ark.intel.com/compare/123550,83359 The amount of RAM is what we needed and disk space is enough....
[21:51:11] RECOVERY - Check systemd state on ms-be2038 is OK: OK - running: The system is fully operational
[22:00:04] (CR) Anomie: Preserve Composer's include paths (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/488070 (https://phabricator.wikimedia.org/T215126) (owner: Anomie)
[22:03:48] twentyafterfour: No train cut for wmf.16 yet?
[22:03:50] (PS1) Paladox: Gerrit: Introduce support for SiteNotice in PolyGerrit [puppet] - https://gerrit.wikimedia.org/r/488192
[22:11:02] (PS2) Reedy: Remove pear packages from MW Application Servers [puppet] - https://gerrit.wikimedia.org/r/434710 (https://phabricator.wikimedia.org/T195364)
[22:11:17] Operations, hardware-requests: requesting WMF7426 as phabricator system in eqiad - https://phabricator.wikimedia.org/T215335 (Dzahn) a:Dzahn→RobH
[22:11:23] (PS3) Reedy: Remove pear packages from MW Application Servers [puppet] - https://gerrit.wikimedia.org/r/434710 (https://phabricator.wikimedia.org/T195364)
[22:12:06] Operations, MobileFrontend, Traffic, Readers-Web-Backlog (Tracking): Remove .m. subdomain, serve mobile and desktop variants through the same URL - https://phabricator.wikimedia.org/T214998 (Jdlrobson) Thanks for raising this. This is clearly not something we ever got satisfyingly right and clear...
[22:13:53] (CR) Reedy: "So these packages existing are causing some issues - T215126 and T215224" [puppet] - https://gerrit.wikimedia.org/r/434710 (https://phabricator.wikimedia.org/T195364) (owner: Reedy)
[22:14:17] (CR) Gergő Tisza: "@Krinkle do you have any concerns with the current text of the patch?" [mediawiki-config] - https://gerrit.wikimedia.org/r/483339 (owner: Gergő Tisza)
[22:16:47] (CR) Krinkle: [C: +1] Preserve Composer's include paths (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/488070 (https://phabricator.wikimedia.org/T215126) (owner: Anomie)
[22:18:44] Operations, hardware-requests: requesting WMF7426 as phabricator system in eqiad - https://phabricator.wikimedia.org/T215335 (RobH) a:RobH→faidon @faidon, Please approve the allocation of our last dual cpu spare pool system in eqiad to allocation as the secondary phabricator system in eqiad. On...
[22:21:45] (CR) Nuria: [C: +1] admins: create user with analytics-privatedata access for juliaglen [puppet] - https://gerrit.wikimedia.org/r/488120 (https://phabricator.wikimedia.org/T214623) (owner: Dzahn)
[22:21:57] (CR) Krinkle: [C: +1] "LGTM, and I understand now where it does the logic. I do think though, we should aim to remove this confusing exemption. If its use case f" (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/483339 (owner: Gergő Tisza)
[22:25:28] (PS4) Dzahn: varnish/trafficserver: switch parsoid-tests backend, rename director [puppet] - https://gerrit.wikimedia.org/r/486423 (https://phabricator.wikimedia.org/T201366)
[22:27:23] (CR) Dzahn: [C: +2] "thanks Valentin for reviewing, copied missing file from ruthenium, restarted parsoid-vd..checked in Icinga 5 hours later it was still runn" [puppet] - https://gerrit.wikimedia.org/r/486423 (https://phabricator.wikimedia.org/T201366) (owner: Dzahn)
[22:33:28] Operations: parsoid-vd - "no such file or directory, open '/srv/visualdiff/testreduce/testrun.ids" - https://phabricator.wikimedia.org/T215049 (Dzahn) a:Dzahn
[22:34:49] (CR) Gergő Tisza: [C: +1] Preserve Composer's include paths (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/488070 (https://phabricator.wikimedia.org/T215126) (owner: Anomie)
[22:35:08] Operations, Parsoid, Patch-For-Review: rack/setup/install scandium.eqiad.wmnet (parsoid test box) - https://phabricator.wikimedia.org/T201366 (Dzahn) @ssastry ^ After i copied `/srv/visualdiff/testreduce/testrun.ids` from ruthenium to scandium and restarted parsoid-vd.. it ran for over 5 hours now a...
[22:37:37] Operations, Parsoid, Patch-For-Review: rack/setup/install scandium.eqiad.wmnet (parsoid test box) - https://phabricator.wikimedia.org/T201366 (Dzahn) @ssastry also regarding our chat about how to handle the testrun.ids next time .. i just saw T215049 has been created
[22:44:22] Operations: parsoid-vd - "no such file or directory, open '/srv/visualdiff/testreduce/testrun.ids" - https://phabricator.wikimedia.org/T215049 (Dzahn) Thanks @GTirloni for looking at this. This server is the new parsoid testing server, to replace ruthenium which runs on jessie. The missing file here is...
[22:45:48] Operations: parsoid-vd - "no such file or directory, open '/srv/visualdiff/testreduce/testrun.ids" - https://phabricator.wikimedia.org/T215049 (Dzahn) p:Triage→Low prio is now lower .. the service is fixed and no more monitoring alerts are expected. it stays open to make sure next time we install a...
[22:49:42] Operations, decommission, hardware-requests: decommission wmf6937 as phab1002, reimage as mw1298 - https://phabricator.wikimedia.org/T215332 (Dzahn) @RobH Want me to take it for the first couple check boxes and then give it back?
[22:49:47] Operations, MobileFrontend, Traffic, Readers-Web-Backlog (Tracking): Remove .m. subdomain, serve mobile and desktop variants through the same URL - https://phabricator.wikimedia.org/T214998 (Tgr) Re: SEO, Google's [[https://developers.google.com/search/mobile-sites/mobile-seo/|Mobile SEO Overview...
[22:50:46] Operations, hardware-requests: requesting WMF7426 as phabricator system in eqiad - https://phabricator.wikimedia.org/T215335 (faidon) Is there a task describing the plans for a secondary Phabricator system? How did we come up with those specs?
[22:52:43] (PS2) Paladox: Gerrit: Introduce support for SiteNotice in PolyGerrit [puppet] - https://gerrit.wikimedia.org/r/488192
[22:53:07] James_F: make-wmf-branch totally blew up I'm still trying to repair the half-done branch cut
[22:53:39] it broke at the one place where we don't have a plausible resume path
[22:54:29] Ah. :-(
[22:54:46] Was this because of the WikibaseQuality extension being removed?
[22:54:49] (CR) BryanDavis: [C: +1] Add PHP version to MediaWiki logs [mediawiki-config] - https://gerrit.wikimedia.org/r/487987 (https://phabricator.wikimedia.org/T215350) (owner: Gergő Tisza)
[22:57:22] Operations, fundraising-tech-ops, netops: Deploy pfw policy to allow https to frmon.frdev.wikimedia.org - https://phabricator.wikimedia.org/T215364 (cwdent)
[22:58:34] Operations, fundraising-tech-ops, netops: Deploy pfw policy to allow https to frmon.frdev.wikimedia.org - https://phabricator.wikimedia.org/T215364 (cwdent)
[23:03:03] Operations, MobileFrontend, Traffic, Readers-Web-Backlog (Tracking): Remove .m. subdomain, serve mobile and desktop variants through the same URL - https://phabricator.wikimedia.org/T214998 (tstarling) It complicates SEO in the sense that, when I wrote this task, I was looking at Google Search Co...
[23:22:52] (PS3) Paladox: Gerrit: Introduce support for SiteNotice in PolyGerrit [puppet] - https://gerrit.wikimedia.org/r/488192
[23:28:28] (PS4) Paladox: Gerrit: Introduce support for SiteNotice in PolyGerrit [puppet] - https://gerrit.wikimedia.org/r/488192
[23:28:59] (PS5) Paladox: Gerrit: Introduce support for SiteNotice in PolyGerrit [puppet] - https://gerrit.wikimedia.org/r/488192 (https://phabricator.wikimedia.org/T215323)
[23:43:39] (PS1) Volans: administrative: add owner getter to Reason class [software/spicerack] - https://gerrit.wikimedia.org/r/488204 (https://phabricator.wikimedia.org/T205884)