[00:01:05] (03CR) 10Dzahn: [V: 03+1 C: 03+2] parsoid-php: on beta, add sudo privs for php-fpm restarts [puppet] - 10https://gerrit.wikimedia.org/r/547349 (https://phabricator.wikimedia.org/T236275) (owner: 10Dzahn) [00:04:16] 10Operations, 10ops-esams, 10DC-Ops, 10Patch-For-Review: decom amslvs1-4 (dc work) - https://phabricator.wikimedia.org/T87790 (10Papaul) 05Open→03Resolved a:03Papaul complete [00:04:18] 10Operations, 10ops-esams, 10Epic: Remove all decommissioned hardware - https://phabricator.wikimedia.org/T184063 (10Papaul) [00:05:32] (03PS1) 10Alex Monk: Replace old star.tools.wmflabs.org certificate with new one from acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/547354 (https://phabricator.wikimedia.org/T236962) [00:05:36] 10Operations, 10ops-esams, 10DC-Ops, 10decommission: decom bast3003 (65R8Q4J, formerly amslvs4) - https://phabricator.wikimedia.org/T216199 (10Papaul) [00:07:12] (03CR) 10Alex Monk: "Related change: https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/1580edcab5fd8ee0412508ee97899e33b93b69a9%5E%21/#F0" [puppet] - 10https://gerrit.wikimedia.org/r/547354 (https://phabricator.wikimedia.org/T236962) (owner: 10Alex Monk) [00:08:37] (03PS1) 10Alex Monk: Remove old absented star.tools.wmflabs.org cert [puppet] - 10https://gerrit.wikimedia.org/r/547357 (https://phabricator.wikimedia.org/T236962) [00:11:06] (03CR) 10Catrope: "If the dblist is not pulled in by CommonSettings.php, does it still matter perf-wise?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/546894 (https://phabricator.wikimedia.org/T208369) (owner: 10Gergő Tisza) [00:11:48] (03CR) 10Dzahn: "Notice: /Stage[main]/Profile::Parsoid/Sudo::User[scap3_restart_php]/File[/etc/sudoers.d/scap3_restart_php]/ensure: defined content as '{md" [puppet] - 10https://gerrit.wikimedia.org/r/547349 (https://phabricator.wikimedia.org/T236275) (owner: 10Dzahn) [00:12:33] (03PS1) 10Papaul: DNS: Remove mgmt DNS for bast3003 [dns] - 10https://gerrit.wikimedia.org/r/547358 [00:13:20] (03CR) 10Dzahn: [C: 03+1] DNS: Remove mgmt DNS for bast3003 [dns] - 10https://gerrit.wikimedia.org/r/547358 (owner: 10Papaul) [00:13:29] (03CR) 10Papaul: [C: 03+2] DNS: Remove mgmt DNS for bast3003 [dns] - 10https://gerrit.wikimedia.org/r/547358 (owner: 10Papaul) [00:14:51] 10Operations, 10ops-esams, 10DC-Ops, 10decommission, 10Patch-For-Review: decom bast3003 (65R8Q4J, formerly amslvs4) - https://phabricator.wikimedia.org/T216199 (10Papaul) [00:16:55] (03CR) 10Dzahn: "since the second line matches the command defined in https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/deploy/+/546985/2/scap/en" [puppet] - 10https://gerrit.wikimedia.org/r/547349 (https://phabricator.wikimedia.org/T236275) (owner: 10Dzahn) [00:17:19] 10Operations, 10ops-esams, 10DC-Ops, 10decommission, 10Patch-For-Review: decom bast3003 (65R8Q4J, formerly amslvs4) - https://phabricator.wikimedia.org/T216199 (10Papaul) 05Open→03Resolved a:03Papaul complete [00:17:22] 10Operations, 10ops-esams, 10DC-Ops, 10Patch-For-Review: ESAMS Refresh/Rebuild (October 2019) - https://phabricator.wikimedia.org/T235805 (10Papaul) [00:17:24] 10Operations, 10ops-esams, 10Patch-For-Review: install/designate other machine as esams bastion - https://phabricator.wikimedia.org/T184936 (10Papaul) [00:22:00] PROBLEM - Host cp3056 is DOWN: PING CRITICAL - Packet loss = 100% [00:30:43] (03PS1) 10Alex Monk: tools-static: Allow X-Forwarded-Proto: https header [puppet] - 
10https://gerrit.wikimedia.org/r/547360 (https://phabricator.wikimedia.org/T236952) [00:34:44] @seen arlolra [00:34:44] mutante: Last time I saw arlolra they were leaving the channel #wikimedia-operations at 10/30/2019 8:46:31 PM (3h48m12s ago) [00:38:06] (03PS1) 10Alex Monk: toolforge: Remove direct TLS termination support from static-server [puppet] - 10https://gerrit.wikimedia.org/r/547363 (https://phabricator.wikimedia.org/T236952) [00:39:48] (03PS2) 10Alex Monk: toolforge: Remove direct TLS termination support from static-server [puppet] - 10https://gerrit.wikimedia.org/r/547363 (https://phabricator.wikimedia.org/T236952) [00:40:34] (03PS1) 10Alex Monk: toolforge: Remove old absented star.wmflabs.org certificate [puppet] - 10https://gerrit.wikimedia.org/r/547364 (https://phabricator.wikimedia.org/T236952) [00:40:57] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.4/extensions/WikiLove/resources/ext.wikiLove.icon.vector.css: T236958 Fix Vector icon after upstream change (duration: 01m 02s) [00:41:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:41:03] T236958: 1.35.0-wmf.4: Wikilove background image no longer displayed - https://phabricator.wikimedia.org/T236958 [00:56:02] 10Operations, 10netops: Outbound BGP graceful shutdown - https://phabricator.wikimedia.org/T211728 (10faidon) a:05faidon→03None Reviving this... is this still waiting for my feedback? If so, LGTM :) [00:59:08] 10Operations, 10netops: Peer with SFMIX at ULSFO in 200 Paul - https://phabricator.wikimedia.org/T124843 (10faidon) 05Stalled→03Declined We reached out a while ago and based on the conversations there, I don't think it's good idea right now. [01:00:33] 10Operations, 10netops, 10observability: create a test for multicast relay - https://phabricator.wikimedia.org/T82038 (10faidon) 05Open→03Declined We've lived since 2013 without it, I don't think the extra complexity here would be warranted! 
[01:35:44] 10Operations, 10Traffic: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 (10Vgutierrez) [01:35:47] 10Operations, 10Traffic: ats-tls shows a huge amount of ESTABLISHED sockets even when the server is depooled - https://phabricator.wikimedia.org/T236458 (10Vgutierrez) 05Open→03Resolved This's been successfully mitigated by 9002f6dfc959fccf527b7c7a3778947496858695 [02:13:40] PROBLEM - Check the Netbox report cables for fail status. on netbox1001 is CRITICAL: cables.Cables CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports [02:49:27] (03PS1) 10Vgutierrez: hiera: Move nginx from port 443 to port 4443 on cp4030 [puppet] - 10https://gerrit.wikimedia.org/r/547369 (https://phabricator.wikimedia.org/T231627) [02:49:28] (03PS1) 10Vgutierrez: hiera: Move ats-tls from port 8443 to port 443 on cp4030 [puppet] - 10https://gerrit.wikimedia.org/r/547370 (https://phabricator.wikimedia.org/T231627) [02:51:15] !log switch from nginx to ats-tls on cp4030 - T231627 [02:51:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:51:22] T231627: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 [02:51:51] (03CR) 10Vgutierrez: [C: 03+2] hiera: Move nginx from port 443 to port 4443 on cp4030 [puppet] - 10https://gerrit.wikimedia.org/r/547369 (https://phabricator.wikimedia.org/T231627) (owner: 10Vgutierrez) [02:53:25] (03CR) 10Vgutierrez: [C: 03+2] hiera: Move ats-tls from port 8443 to port 443 on cp4030 [puppet] - 10https://gerrit.wikimedia.org/r/547370 (https://phabricator.wikimedia.org/T231627) (owner: 10Vgutierrez) [03:04:01] 10Operations, 10Traffic, 10Patch-For-Review: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 (10Vgutierrez) [03:09:15] (03PS1) 10Vgutierrez: hiera: Move nginx from port 443 to port 4443 on cp4031 [puppet] - 10https://gerrit.wikimedia.org/r/547371 (https://phabricator.wikimedia.org/T231627) 
[03:09:17] (03PS1) 10Vgutierrez: hier: Move ats-tls from port 8443 to port 443 on cp4031 [puppet] - 10https://gerrit.wikimedia.org/r/547372 (https://phabricator.wikimedia.org/T231627) [03:09:31] !log switch from nginx to ats-tls on cp4031 - T231627 [03:09:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:09:36] T231627: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 [03:09:56] (03CR) 10jerkins-bot: [V: 04-1] hiera: Move nginx from port 443 to port 4443 on cp4031 [puppet] - 10https://gerrit.wikimedia.org/r/547371 (https://phabricator.wikimedia.org/T231627) (owner: 10Vgutierrez) [03:11:12] (03PS2) 10Vgutierrez: hiera: Move nginx from port 443 to port 4443 on cp4031 [puppet] - 10https://gerrit.wikimedia.org/r/547371 (https://phabricator.wikimedia.org/T231627) [03:12:04] (03CR) 10Vgutierrez: [C: 03+2] hiera: Move nginx from port 443 to port 4443 on cp4031 [puppet] - 10https://gerrit.wikimedia.org/r/547371 (https://phabricator.wikimedia.org/T231627) (owner: 10Vgutierrez) [03:13:10] (03CR) 10Vgutierrez: [C: 03+2] hier: Move ats-tls from port 8443 to port 443 on cp4031 [puppet] - 10https://gerrit.wikimedia.org/r/547372 (https://phabricator.wikimedia.org/T231627) (owner: 10Vgutierrez) [03:13:20] (03PS2) 10Vgutierrez: hier: Move ats-tls from port 8443 to port 443 on cp4031 [puppet] - 10https://gerrit.wikimedia.org/r/547372 (https://phabricator.wikimedia.org/T231627) [03:13:47] (03PS3) 10Vgutierrez: hiera: ats-tls from port 8443 to port 443 on cp4031 [puppet] - 10https://gerrit.wikimedia.org/r/547372 (https://phabricator.wikimedia.org/T231627) [03:14:17] (03PS4) 10Vgutierrez: hiera: Move ats-tls from port 8443 to port 443 on cp4031 [puppet] - 10https://gerrit.wikimedia.org/r/547372 (https://phabricator.wikimedia.org/T231627) [03:22:36] 10Operations, 10Traffic, 10Patch-For-Review: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 (10Vgutierrez) [03:22:46] 
10Operations, 10Parsoid-PHP, 10serviceops: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (10ssastry) With that change, 40% of the urls don't OOM anymore. [03:30:38] (03PS2) 10Revi: Enable partial blocks on kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/546913 (https://phabricator.wikimedia.org/T236752) [03:30:52] RECOVERY - Check the Netbox report cables for fail status. on netbox1001 is OK: cables.Cables OK https://wikitech.wikimedia.org/wiki/Netbox%23Reports [03:31:23] (03PS1) 10Vgutierrez: hiera: Set nginx on port 4443 for cache text_ats on ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/547373 (https://phabricator.wikimedia.org/T231627) [03:31:25] (03PS1) 10Vgutierrez: hiera: Set ats-tls on port 443 for cache text_ats on ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/547374 (https://phabricator.wikimedia.org/T231627) [03:38:03] (03PS2) 10Vgutierrez: hiera: Set nginx on port 4443 for cache text_ats on ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/547373 (https://phabricator.wikimedia.org/T231627) [03:38:05] (03PS2) 10Vgutierrez: hiera: Set ats-tls on port 443 for cache text_ats on ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/547374 (https://phabricator.wikimedia.org/T231627) [03:44:52] !log switch from nginx to ats-tls on cp4032 - T231627 [03:45:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:45:02] T231627: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 [03:45:14] (03CR) 10Vgutierrez: [C: 03+2] hiera: Set nginx on port 4443 for cache text_ats on ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/547373 (https://phabricator.wikimedia.org/T231627) (owner: 10Vgutierrez) [03:47:25] (03CR) 10Vgutierrez: [C: 03+2] hiera: Set ats-tls on port 443 for cache text_ats on ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/547374 (https://phabricator.wikimedia.org/T231627) (owner: 10Vgutierrez) [04:26:33] PROBLEM - Disk space on elastic1018 
is CRITICAL: DISK CRITICAL - free space: /srv 24713 MB (5% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1018&var-datasource=eqiad+prometheus/ops [04:29:47] RECOVERY - Disk space on elastic1018 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1018&var-datasource=eqiad+prometheus/ops [04:44:40] (03PS3) 10DannyS712: Partial cleanup of InitializeSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/546369 (https://phabricator.wikimedia.org/T231178) [05:04:21] PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 24833 MB (5% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [05:26:02] 10Operations, 10Traffic, 10Patch-For-Review: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 (10Vgutierrez) [05:26:45] RECOVERY - Disk space on elastic1025 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=elastic1025&var-datasource=eqiad+prometheus/ops [06:30:43] 10Operations, 10Patch-For-Review: Build cergen for buster - https://phabricator.wikimedia.org/T235405 (10elukey) On puppetmaster2001 I cannot see /etc/apt/sources.list.d/buster-cergen.list, hence the new package version seems not available.. expected? 
[06:37:32] !log upgrade cergen to 0.2.5 on puppetmaster1001 [06:37:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:37:46] (03PS1) 10ArielGlenn: fix up index.html file production for misc (and so adds/changes) dumps [dumps] - 10https://gerrit.wikimedia.org/r/547377 (https://phabricator.wikimedia.org/T236875) [06:39:47] vgutierrez: o/ hola, just to triple check cp3056 is under maintenance? [06:40:12] I just checked icinga and it shows red [06:43:00] ah ok https://phabricator.wikimedia.org/T236497 [06:43:01] afaik yes [06:43:03] will ack the alarm [06:43:07] it has nvme issues [06:43:59] yep yep just wanted to triple check, I just opened icinga :) [06:44:12] (03PS1) 10Elukey: profile::graphite::alerts: remove old Eventlogging alarms [puppet] - 10https://gerrit.wikimedia.org/r/547378 (https://phabricator.wikimedia.org/T159170) [06:44:49] (03CR) 10Elukey: [C: 03+2] profile::graphite::alerts: remove old Eventlogging alarms [puppet] - 10https://gerrit.wikimedia.org/r/547378 (https://phabricator.wikimedia.org/T159170) (owner: 10Elukey) [06:48:55] <_joe_> elukey: what changed in the new cergen? 
[06:49:19] <_joe_> elukey: if anything of substance, I'd suggest emailing ops-l [06:55:21] _joe_ minor things, I packaged some fixes that Andrew did + I added the support for java truststore passwords [06:55:40] (03CR) 10Jcrespo: "> Patch Set 2: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/547283 (https://phabricator.wikimedia.org/T223602) (owner: 10Jcrespo) [06:55:44] but I can send an email as FYI for sure [06:56:31] (03CR) 10Jcrespo: "> Patch Set 2:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/545411 (https://phabricator.wikimedia.org/T223602) (owner: 10Jforrester) [07:00:32] <_joe_> elukey: no I thought it was more breaking, it's ok [07:18:05] (03CR) 10Effie Mouzeli: [C: 03+2] mediawiki: Add 'caught_by' to php7-fatal-error log message [puppet] - 10https://gerrit.wikimedia.org/r/546230 (https://phabricator.wikimedia.org/T234283) (owner: 10Krinkle) [07:42:58] ottomata: hola [07:43:59] xotwod: you know the rule [07:44:04] don't ask to ask, just ask (tm) [07:45:18] I did about 15 global blocks, and around 10 of them never showed up. [07:45:24] StewardBot logged them in -stewards [07:45:41] Jon Kolbert globally blocked 23.226.41.0/24 (expiration 07:13, 31 October 2021) with the following comment: [[m:NOP|No Open Proxy]]: Webhost: Contact [[m:Special:Contact/stewards|stewards]] if you are affected [07:45:54] yet no blocks show here https://meta.wikimedia.org/wiki/Special:Contributions/23.226.41.0/24 [08:10:53] xotwod: I think you should open a security task on phabricator (although I am not sure what is the policy for that really) [08:11:11] https://phabricator.wikimedia.org/maniphest/task/edit/form/2/ [08:13:28] 10Operations, 10observability, 10Performance-Team (Radar): Revisit Grafana/Icinga notification strategy - https://phabricator.wikimedia.org/T203485 (10Addshore) >>! In T203485#5620999, @Krinkle wrote: > This task is a placeholder for improving the current situation. It is not a blocker. Various teams at WMF... 
[08:19:20] (03PS2) 10ArielGlenn: fix up index.html file production for misc (and so adds/changes) dumps [dumps] - 10https://gerrit.wikimedia.org/r/547377 (https://phabricator.wikimedia.org/T236875) [08:21:31] (03PS1) 10Giuseppe Lavagetto: Rake: Add yaml validation [deployment-charts] - 10https://gerrit.wikimedia.org/r/547403 [08:21:37] (03PS1) 10Addshore: Setup wikidata alerts from grafana dashboard [puppet] - 10https://gerrit.wikimedia.org/r/547404 (https://phabricator.wikimedia.org/T203485) [08:21:46] (03CR) 10jerkins-bot: [V: 04-1] Rake: Add yaml validation [deployment-charts] - 10https://gerrit.wikimedia.org/r/547403 (owner: 10Giuseppe Lavagetto) [08:22:15] <_joe_> \o/ [08:23:25] (03CR) 10ArielGlenn: [C: 03+2] fix up index.html file production for misc (and so adds/changes) dumps [dumps] - 10https://gerrit.wikimedia.org/r/547377 (https://phabricator.wikimedia.org/T236875) (owner: 10ArielGlenn) [08:24:16] o/ hi all, I'd love it if I could get https://gerrit.wikimedia.org/r/547404 merged so we can add some lovely extra alerting layer for us for wikidata [08:24:26] also, I'd love to know who is on the "wikidata" contact group! 
[08:25:30] !log ariel@deploy1001 Started deploy [dumps/dumps@f2b6d78]: couple of fixup scripts, bug fix for incr dumps index.html generation [08:25:33] !log ariel@deploy1001 Finished deploy [dumps/dumps@f2b6d78]: couple of fixup scripts, bug fix for incr dumps index.html generation (duration: 00m 03s) [08:25:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:25:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:35:13] added it to puppet swat for now [08:43:06] addshore: o/ modules/nagios_common/files/contactgroups.cfg [08:44:00] notes_url is usually a wikipage [08:44:03] in wikitech [08:47:01] (03PS2) 10Giuseppe Lavagetto: Rake: Add yaml validation [deployment-charts] - 10https://gerrit.wikimedia.org/r/547403 [08:47:19] (03CR) 10jerkins-bot: [V: 04-1] Rake: Add yaml validation [deployment-charts] - 10https://gerrit.wikimedia.org/r/547403 (owner: 10Giuseppe Lavagetto) [09:01:29] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Couple of comments, but otherwise, pretty nice idea" (035 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/547403 (owner: 10Giuseppe Lavagetto) [09:04:16] (03PS3) 10Giuseppe Lavagetto: Rake: Add yaml validation [deployment-charts] - 10https://gerrit.wikimedia.org/r/547403 [09:04:30] (03CR) 10jerkins-bot: [V: 04-1] Rake: Add yaml validation [deployment-charts] - 10https://gerrit.wikimedia.org/r/547403 (owner: 10Giuseppe Lavagetto) [09:08:04] (03PS1) 10Ema: cache: reimage cp5009 as text_ats [puppet] - 10https://gerrit.wikimedia.org/r/547481 (https://phabricator.wikimedia.org/T227432) [09:08:19] (03CR) 10Giuseppe Lavagetto: Rake: Add yaml validation (035 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/547403 (owner: 10Giuseppe Lavagetto) [09:09:16] (03PS4) 10Giuseppe Lavagetto: Rake: Add yaml validation [deployment-charts] - 10https://gerrit.wikimedia.org/r/547403 [09:09:30] (03CR) 10jerkins-bot: [V: 04-1] Rake: Add yaml validation [deployment-charts] - 
10https://gerrit.wikimedia.org/r/547403 (owner: 10Giuseppe Lavagetto) [09:10:33] !log depool cp5009 and reimage as text_ats T227432 [09:10:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:10:39] T227432: Replace Varnish backends with ATS on cache text nodes - https://phabricator.wikimedia.org/T227432 [09:11:18] (03PS5) 10Giuseppe Lavagetto: Rake: Add yaml validation [deployment-charts] - 10https://gerrit.wikimedia.org/r/547403 [09:14:54] (03CR) 10Vgutierrez: [C: 03+1] cache: reimage cp5009 as text_ats [puppet] - 10https://gerrit.wikimedia.org/r/547481 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema) [09:15:51] (03CR) 10Ema: [C: 03+2] cache: reimage cp5009 as text_ats [puppet] - 10https://gerrit.wikimedia.org/r/547481 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema) [09:15:55] 10Operations, 10Puppet, 10puppet-compiler, 10User-jbond: puppet-compi;ler: fix git permissions - https://phabricator.wikimedia.org/T236986 (10jbond) p:05Triage→03Normal [09:16:24] 10Operations, 10Puppet, 10puppet-compiler, 10User-jbond: puppet-compiler: fix git permissions - https://phabricator.wikimedia.org/T236986 (10jbond) [09:16:38] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 8 others: Picture from Commons not found from Singapore - https://phabricator.wikimedia.org/T231086 (10fgiunchedi) >>! In T231086#5619797, @aaron wrote: >>>! In T231086#5601608, @fgiunchedi wrote: >> swiftrepl is puppetized now to run... [09:17:00] 10Operations, 10Traffic, 10Patch-For-Review: Replace Varnish backends with ATS on cache text nodes - https://phabricator.wikimedia.org/T227432 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts: ` ['cp5009.eqsin.wmnet'] ` The log can be found in `/var/log/wm... 
[09:18:16] (03PS1) 10Awight: Enable Book Referencing on the beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547484 (https://phabricator.wikimedia.org/T236894) [09:23:03] !log temporarily stop logstash on logstash2006 to test performance with two ingesters only - T215904 [09:23:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:23:09] T215904: Better understanding of Logstash performance - https://phabricator.wikimedia.org/T215904 [09:26:55] PROBLEM - logstash process on logstash2006 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 498 (logstash), command name java, args logstash https://wikitech.wikimedia.org/wiki/Logstash [09:27:19] PROBLEM - logstash JSON linesTCP port on logstash2006 is CRITICAL: connect to address 127.0.0.1 and port 11514: Connection refused https://wikitech.wikimedia.org/wiki/Logstash [09:27:33] PROBLEM - logstash syslog TCP port on logstash2006 is CRITICAL: connect to address 127.0.0.1 and port 10514: Connection refused https://wikitech.wikimedia.org/wiki/Logstash [09:28:57] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge::proxy: Remove old star.wmflabs.org absent resource [puppet] - 10https://gerrit.wikimedia.org/r/547293 (owner: 10Alex Monk) [09:31:26] 10Operations, 10Traffic: ats-be on the text cluster is experiencing broken connections - https://phabricator.wikimedia.org/T236988 (10Vgutierrez) [09:33:12] (03PS1) 10Effie Mouzeli: (WIP) logging: remove hhvm references in filters [puppet] - 10https://gerrit.wikimedia.org/r/547489 (https://phabricator.wikimedia.org/T229792) [09:37:52] !log temporarily stop logstash on logstash2005 to test performance with two ingesters only - T215904 [09:37:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:37:57] T215904: Better understanding of Logstash performance - https://phabricator.wikimedia.org/T215904 [09:42:49] PROBLEM - logstash syslog TCP port on logstash2005 is CRITICAL: connect to address 127.0.0.1 and 
port 10514: Connection refused https://wikitech.wikimedia.org/wiki/Logstash [09:43:09] PROBLEM - logstash JSON linesTCP port on logstash2005 is CRITICAL: connect to address 127.0.0.1 and port 11514: Connection refused https://wikitech.wikimedia.org/wiki/Logstash [09:43:18] (03PS1) 10Elukey: role::analytics_cluster::hadoop::worker: deploy TLS keys [puppet] - 10https://gerrit.wikimedia.org/r/547491 [09:43:27] PROBLEM - logstash process on logstash2005 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 498 (logstash), command name java, args logstash https://wikitech.wikimedia.org/wiki/Logstash [09:43:54] !log ema@cumin1001 START - Cookbook sre.hosts.downtime [09:43:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:44:28] ema: this will likely fail too ^^^ [09:44:45] the config change to use the new puppetdb has not yet been merged [09:46:00] !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [09:46:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:47:31] volans: looks like it worked [09:50:28] (03PS6) 10Giuseppe Lavagetto: Rake: Add yaml validation [deployment-charts] - 10https://gerrit.wikimedia.org/r/547403 (https://phabricator.wikimedia.org/T236899) [09:50:48] lucky you :) [09:51:19] :) [09:51:37] (03PS1) 10Elukey: Add TLS fake cert files for analytics1042 [labs/private] - 10https://gerrit.wikimedia.org/r/547492 [09:51:53] (03CR) 10Elukey: [V: 03+2 C: 03+2] Add TLS fake cert files for analytics1042 [labs/private] - 10https://gerrit.wikimedia.org/r/547492 (owner: 10Elukey) [09:52:02] ema: puppetdb was updated at 09:43:50,844 [09:52:18] so yeah by 4 seconds :D [09:52:25] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Rake: Add yaml validation [deployment-charts] - 10https://gerrit.wikimedia.org/r/547403 (https://phabricator.wikimedia.org/T236899) (owner: 10Giuseppe Lavagetto) [09:52:38] (03Merged) 10jenkins-bot: Rake: Add yaml validation [deployment-charts] - 
10https://gerrit.wikimedia.org/r/547403 (https://phabricator.wikimedia.org/T236899) (owner: 10Giuseppe Lavagetto) [09:53:47] (03PS1) 10Jbond: puppet_compiler: ensure all working dirs have correct owner [puppet] - 10https://gerrit.wikimedia.org/r/547493 (https://phabricator.wikimedia.org/T236986) [09:54:52] (03PS1) 10Giuseppe Lavagetto: blubberoid: new chart version fixing TLS [deployment-charts] - 10https://gerrit.wikimedia.org/r/547494 [09:56:55] 10Operations, 10Puppet, 10User-jbond: Add a CI check for the use of hiera() function - https://phabricator.wikimedia.org/T220820 (10jbond) 05Open→03Resolved a:03jbond [09:57:23] (03PS1) 10Elukey: Add secrets for the Hadoop Analytics TLS config in hiera [labs/private] - 10https://gerrit.wikimedia.org/r/547496 [09:57:25] jouncebot next [09:57:25] In 1 hour(s) and 2 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191031T1100) [09:57:28] 10Operations, 10Puppet, 10Release-Engineering-Team, 10puppet-compiler, 10User-jbond: add compiler1003 to jenkins - https://phabricator.wikimedia.org/T236468 (10jbond) 05Open→03Resolved a:03jbond Closing please reopen if further issues observed [09:57:51] (03CR) 10Elukey: [V: 03+2 C: 03+2] Add secrets for the Hadoop Analytics TLS config in hiera [labs/private] - 10https://gerrit.wikimedia.org/r/547496 (owner: 10Elukey) [10:01:38] (03CR) 10Giuseppe Lavagetto: [C: 03+2] blubberoid: new chart version fixing TLS [deployment-charts] - 10https://gerrit.wikimedia.org/r/547494 (owner: 10Giuseppe Lavagetto) [10:01:57] (03Merged) 10jenkins-bot: blubberoid: new chart version fixing TLS [deployment-charts] - 10https://gerrit.wikimedia.org/r/547494 (owner: 10Giuseppe Lavagetto) [10:02:51] (03PS1) 10Elukey: Fix typo in hiera config for the Hadoop Cluster [labs/private] - 10https://gerrit.wikimedia.org/r/547497 [10:03:05] (03CR) 10Elukey: [V: 03+2 C: 03+2] Fix typo in hiera config for the Hadoop Cluster [labs/private] - 
10https://gerrit.wikimedia.org/r/547497 (owner: 10Elukey) [10:04:33] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] Replace old star.tools.wmflabs.org certificate with new one from acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/547354 (https://phabricator.wikimedia.org/T236962) (owner: 10Alex Monk) [10:05:03] (03Abandoned) 10Arturo Borrero Gonzalez: toolforge: docker registry: use new SSL certificate by acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/547221 (owner: 10Arturo Borrero Gonzalez) [10:05:05] (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/19191/analytics1042.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/547491 (owner: 10Elukey) [10:05:37] !log oblivian@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' . [10:05:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:07:01] !log oblivian@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' . 
[10:07:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:13:22] !log bounce logstash on logstash2004 [10:13:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:13:44] (03PS1) 10Giuseppe Lavagetto: tls: env variables need to be strings in yaml [deployment-charts] - 10https://gerrit.wikimedia.org/r/547499 [10:16:24] (03PS2) 10Effie Mouzeli: logging: remove hhvm references [puppet] - 10https://gerrit.wikimedia.org/r/547489 (https://phabricator.wikimedia.org/T229792) [10:16:40] (03CR) 10Giuseppe Lavagetto: [C: 03+2] tls: env variables need to be strings in yaml [deployment-charts] - 10https://gerrit.wikimedia.org/r/547499 (owner: 10Giuseppe Lavagetto) [10:16:53] (03Merged) 10jenkins-bot: tls: env variables need to be strings in yaml [deployment-charts] - 10https://gerrit.wikimedia.org/r/547499 (owner: 10Giuseppe Lavagetto) [10:18:23] 10Operations, 10Traffic: Replace Varnish backends with ATS on cache text nodes - https://phabricator.wikimedia.org/T227432 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp5009.eqsin.wmnet'] ` and were **ALL** successful. [10:18:38] !log oblivian@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' . [10:18:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:19:24] !log oblivian@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' . [10:19:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:23:31] PROBLEM - Check systemd state on logstash2004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [10:29:21] (03PS1) 10Jbond: Move the nginx submodule into the repository - part 1 [puppet] - 10https://gerrit.wikimedia.org/r/547500 (https://phabricator.wikimedia.org/T230206) [10:29:23] (03PS1) 10Jbond: Move the zookeeper zabbix into the repository - part 2 [puppet] - 10https://gerrit.wikimedia.org/r/547501 (https://phabricator.wikimedia.org/T230206) [10:29:34] !log oblivian@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' . [10:29:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:30:06] (03CR) 10Alexandros Kosiaris: [C: 04-1] Setup wikidata alerts from grafana dashboard (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/547404 (https://phabricator.wikimedia.org/T203485) (owner: 10Addshore) [10:30:36] !log oblivian@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' . [10:30:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:31:28] (03CR) 10jerkins-bot: [V: 04-1] Move the nginx submodule into the repository - part 1 [puppet] - 10https://gerrit.wikimedia.org/r/547500 (https://phabricator.wikimedia.org/T230206) (owner: 10Jbond) [10:32:10] (03CR) 10Addshore: Setup wikidata alerts from grafana dashboard (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/547404 (https://phabricator.wikimedia.org/T203485) (owner: 10Addshore) [10:32:33] akosiaris: no idea which Id I should use, I used one similar to the ones i had seen elsewhere in puppet [10:33:01] addshore: the one that corresponds to the dashboard you want to alert on [10:33:08] hence my suggestion [10:33:19] it's part of the URL [10:34:06] fixing [10:34:07] both [10:34:58] (03PS2) 10Addshore: Setup wikidata alerts from grafana dashboard [puppet] - 10https://gerrit.wikimedia.org/r/547404 (https://phabricator.wikimedia.org/T203485) [10:35:19] (03CR) 10Jbond: [V: 03+2] "Overriding 
CI, Errors are caused because fixtures doesn't know to add the module from environment/modules further both patch sets will be " [puppet] - 10https://gerrit.wikimedia.org/r/547500 (https://phabricator.wikimedia.org/T230206) (owner: 10Jbond) [10:35:23] !log oblivian@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . [10:35:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:37:02] !log oblivian@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' . [10:37:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:38:43] !log pool cp5009 with ATS backend T227432 [10:38:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:38:48] T227432: Replace Varnish backends with ATS on cache text nodes - https://phabricator.wikimedia.org/T227432 [10:39:08] !log oblivian@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' . [10:39:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:39:54] akosiaris: also, while merging that one it would be great if I could get a reminder / update of what contacts are in wikidata [10:40:15] in icinga for wikidata? [10:40:25] in icinga for the wikidata contact group [10:41:32] addshore: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/nagios_common/files/contactgroups.cfg#52 [10:41:49] thanks *adds a link to that in the docs* [10:41:54] I thought that was private somewhere [10:42:49] Any idea if icinga has a mattermost or generic web hook plugin for contact groups? 
[10:46:35] (CR) WMDE-Fisch: [C: +1] Enable Book Referencing on the beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/547484 (https://phabricator.wikimedia.org/T236894) (owner: Awight) [10:52:37] (CR) Elukey: Move the zookeeper zabbix into the repository - part 2 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/547501 (https://phabricator.wikimedia.org/T230206) (owner: Jbond) [10:53:12] (PS2) Jbond: Move the zabbix into the repository - part 2 [puppet] - https://gerrit.wikimedia.org/r/547501 (https://phabricator.wikimedia.org/T230206) [10:54:46] !log bounce logstash on logstash2004 [10:54:48] (CR) Jbond: Move the zabbix into the repository - part 2 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/547501 (https://phabricator.wikimedia.org/T230206) (owner: Jbond) [10:54:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:56:29] (PS1) Giuseppe Lavagetto: blubberoid/scaffold: ports is an array [deployment-charts] - https://gerrit.wikimedia.org/r/547503 [10:57:27] (CR) Filippo Giunchedi: [C: +1] "LGTM!" 
[puppet] - 10https://gerrit.wikimedia.org/r/547489 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [10:57:53] 10Operations, 10ops-eqiad, 10DC-Ops: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 (10Jclark-ctr) Starting pdu refresh eqiad b8 [10:57:55] (03CR) 10Giuseppe Lavagetto: [C: 03+2] blubberoid/scaffold: ports is an array [deployment-charts] - 10https://gerrit.wikimedia.org/r/547503 (owner: 10Giuseppe Lavagetto) [10:58:06] (03Merged) 10jenkins-bot: blubberoid/scaffold: ports is an array [deployment-charts] - 10https://gerrit.wikimedia.org/r/547503 (owner: 10Giuseppe Lavagetto) [10:58:13] PROBLEM - logstash process on logstash2004 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 498 (logstash), command name java, args logstash https://wikitech.wikimedia.org/wiki/Logstash [10:58:48] (03CR) 10Filippo Giunchedi: [C: 03+1] "> Patch Set 2: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/547489 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [10:59:01] PROBLEM - logstash JSON linesTCP port on logstash2004 is CRITICAL: connect to address 127.0.0.1 and port 11514: Connection refused https://wikitech.wikimedia.org/wiki/Logstash [10:59:01] (03PS2) 10KartikMistry: Enable CX out of beta in Albanian Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547086 (https://phabricator.wikimedia.org/T236064) [10:59:17] PROBLEM - logstash syslog TCP port on logstash2004 is CRITICAL: connect to address 127.0.0.1 and port 10514: Connection refused https://wikitech.wikimedia.org/wiki/Logstash [10:59:25] !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime [10:59:26] !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [10:59:29] PROBLEM - Maps - OSM synchronization lag - codfw on icinga1001 is CRITICAL: 2.988e+05 ge 2.592e+05 https://wikitech.wikimedia.org/wiki/Maps/Runbook 
https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=12&fullscreen&orgId=1 [10:59:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:59:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:59:48] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/547144 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [10:59:53] !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime [10:59:54] !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [10:59:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:59:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:05] Amir1, Lucas_WMDE, awight, and Urbanecm: (Dis)respected human, time to deploy European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191031T1100). Please do the needful. [11:00:05] kart_ and Tpt: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:15] !log oblivian@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' . [11:00:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:21] Sure [11:00:37] kart_: do you want to self-deploy, or should I? [11:00:39] RECOVERY - logstash JSON linesTCP port on logstash2004 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11514 https://wikitech.wikimedia.org/wiki/Logstash [11:00:40] Hi! 
[11:00:55] RECOVERY - logstash syslog TCP port on logstash2004 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 10514 https://wikitech.wikimedia.org/wiki/Logstash [11:01:01] PROBLEM - Maps - OSM synchronization lag - eqiad on icinga1001 is CRITICAL: 2.989e+05 ge 2.592e+05 https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=11&fullscreen&orgId=1 [11:01:16] (CR) KartikMistry: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/547086 (https://phabricator.wikimedia.org/T236064) (owner: KartikMistry) [11:01:25] RECOVERY - logstash process on logstash2004 is OK: PROCS OK: 1 process with UID = 498 (logstash), command name java, args logstash https://wikitech.wikimedia.org/wiki/Logstash [11:01:28] Urbanecm: deploying.. [11:01:37] ack kart_ [11:02:05] (Merged) jenkins-bot: Enable CX out of beta in Albanian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/547086 (https://phabricator.wikimedia.org/T236064) (owner: KartikMistry) [11:02:14] Urbanecm: I've two changes, one depending on the other, is that ok? https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ProofreadPage/+/547495 https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ProofreadPage/+/547498 [11:02:22] Tpt[m]: yup :-) [11:02:44] Urbanecm: Great! Thanks! 
[11:02:49] yw [11:03:21] !log cp5008: restart ats-be to clear "backend process restarted" alert [11:03:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:03:54] Operations, ops-eqiad, DC-Ops: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 (aborrero) [11:05:53] RECOVERY - traffic_server backend process restarted on cp5008 is OK: (C)2 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=eqsin+prometheus/ops&var-instance=cp5008&var-layer=backend [11:06:43] (CR) Filippo Giunchedi: "LGTM to my untrained eye, see inline" (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/547307 (https://phabricator.wikimedia.org/T236386) (owner: Ottomata) [11:07:44] !log kartik@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit|547086|Enable ContentTranslation out of Beta in Albanian WP (T236064)]] (duration: 01m 02s) [11:07:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:07:50] T236064: Request for ContentTranslation to be enabled by default on sq.wikipedia - https://phabricator.wikimedia.org/T236064 [11:08:37] (CR) Filippo Giunchedi: "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870) (owner: Cwhite) [11:09:27] Urbanecm: I'm done [11:09:39] okay [11:09:43] (CR) Elukey: Refactor profile::analytics::refinery::job::import_mediawiki_dumps (1 comment) [puppet] - https://gerrit.wikimedia.org/r/546966 (https://phabricator.wikimedia.org/T234333) (owner: Joal) [11:11:03] Tpt[m]: will ping you once your patch is ready to be tested [11:11:39] thank you! 
[11:12:05] (PS1) Arturo Borrero Gonzalez: toolforge: new k8s: rename hiera keys for consistency [puppet] - https://gerrit.wikimedia.org/r/547504 (https://phabricator.wikimedia.org/T214513) [11:14:53] (CR) Arturo Borrero Gonzalez: "Heads up Jason, this may break your ceph testing VMs." [puppet] - https://gerrit.wikimedia.org/r/547504 (https://phabricator.wikimedia.org/T214513) (owner: Arturo Borrero Gonzalez) [11:16:06] (CR) Filippo Giunchedi: [C: -1] "See inline, the concept LGTM" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/546195 (owner: Jbond) [11:16:13] PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash-codfw instance=kafkamon1001:9501 job=burrow partition={0,1,2,3,4,5} site=eqiad topic={udp_localhost-info,udp_localhost-warning} https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource [11:16:13] /ops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All [11:18:27] known ^ that's me [11:18:36] Tpt[m]: please test your change at mwdebug1001 and let me know [11:20:38] addshore: icinga v1 having a good interface for plugging into things? I'd say no [11:20:40] (CR) Elukey: "Had a broader chat with my team, we'll start looking into Airflow next Q and we don't want to block you on this. Eventually, when we'll st" [puppet] - https://gerrit.wikimedia.org/r/544989 (owner: EBernhardson) [11:20:42] Urbanecm: I just tested. Everything looks good. 
[11:20:44] Please deploy [11:20:54] akosiaris: okay :D [11:21:18] great Tpt[m] [11:22:14] (CR) Arturo Borrero Gonzalez: [C: +2] toolforge: new k8s: rename hiera keys for consistency [puppet] - https://gerrit.wikimedia.org/r/547504 (https://phabricator.wikimedia.org/T214513) (owner: Arturo Borrero Gonzalez) [11:23:19] !log urbanecm@deploy1001 Synchronized php-1.35.0-wmf.4/extensions/ProofreadPage/: SWAT: e0d5ce9: Add page navigation tabs in correct order skin-side and remove js requirement for Vector tab icons (T231250); ed17da2: Makes sure that Vector default background does not override the navigation arrows (T236969) (duration: 01m 02s) [11:23:25] Tpt[m]: here you are! [11:23:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:23:27] (CR) DCausse: [C: +1] Support /entity/ and other Wikidata URLs for Commons (1 comment) [puppet] - https://gerrit.wikimedia.org/r/526757 (https://phabricator.wikimedia.org/T222321) (owner: Smalyshev) [11:23:28] T231250: Use icons for previous, next, index tabs on Wikisource pages - https://phabricator.wikimedia.org/T231250 [11:23:31] T236969: Next, previous and index tabs on Wikisource are broken in Vector - https://phabricator.wikimedia.org/T236969 [11:24:34] !log EU SWAT done [11:24:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:24:49] jynus: for when you have time: https://gerrit.wikimedia.org/r/c/operations/puppet/+/547271 [11:24:54] sorry for bothering you [11:25:12] (CR) Filippo Giunchedi: "See inline, LGTM overall" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/546290 (https://phabricator.wikimedia.org/T236505) (owner: Cwhite) [11:25:18] The error is from something else actually, I need to run explain on stuff https://tendril.wikimedia.org/report/slow_queries?host=%5Edb&user=wikiuser&schema=wikidatawiki&qmode=eq&query=&hours=24 [11:25:21] (PS1) Jbond: puppet git: add a descriptive config version [puppet] - 
https://gerrit.wikimedia.org/r/547505 (https://phabricator.wikimedia.org/T228854) [11:25:23] (PS1) Jbond: motd: add the config version to the MOTD [puppet] - https://gerrit.wikimedia.org/r/547506 (https://phabricator.wikimedia.org/T228854) [11:25:23] sorry, Amir, I have time-sensitive maintenance to prepare [11:25:31] I can have a look later in the day [11:25:41] Urbanecm: Outside of mwdebug1001 I only see the first change deployed and not the second. Is it normal? [11:25:56] Tpt[m]: that's strange [11:26:21] Amir1: or you can ask another root too 0:-) [11:26:31] Tpt[m]: hmm, now I'm looking, it might be some kind of cache [11:26:49] probably [11:26:54] (CR) Ayounsi: [C: +1] devices: refactor signature [software/homer] - https://gerrit.wikimedia.org/r/543889 (owner: Volans) [11:26:59] (CR) Filippo Giunchedi: "See inline" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/546992 (https://phabricator.wikimedia.org/T236505) (owner: Cwhite) [11:27:00] Sure, [11:27:36] Tpt[m]: you might try ?debug=1 parameter, or simply wait :-) [11:27:44] let me know if it still doesn't look okay in some time! [11:28:42] Urbanecm: Special:Version is not updated yet for me so it's maybe related [11:30:11] Urbanecm: It works now. Sorry for having bothered you [11:30:19] no problem :) [11:32:06] (CR) Alexandros Kosiaris: [C: +2] "Thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/547404 (https://phabricator.wikimedia.org/T203485) (owner: 10Addshore) [11:32:19] (03CR) 10Ayounsi: [C: 03+1] netbox: allow to select the devices from Netbox [software/homer] - 10https://gerrit.wikimedia.org/r/543890 (https://phabricator.wikimedia.org/T228388) (owner: 10Volans) [11:33:04] thanks akosiaris :) It's going to be great being able to have a set of alarms just for us [11:33:11] (03PS1) 10Jcrespo: mariadb: depool pc1008 temporarily [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547508 (https://phabricator.wikimedia.org/T227543) [11:36:00] :) [11:36:35] (03PS2) 10Jbond: puppet git: add a descriptive config version [puppet] - 10https://gerrit.wikimedia.org/r/547505 (https://phabricator.wikimedia.org/T228854) [11:37:01] !log jynus@cumin1001 dbctl commit (dc=all): 'Depool db1119, db1113 T227543', diff saved to https://phabricator.wikimedia.org/P9507 and previous config saved to /var/cache/conftool/dbconfig/20191031-113659-jynus.json [11:37:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:37:07] T227543: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 [11:38:42] 10Operations, 10observability, 10serviceops, 10Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (10jijiki) Please ping if there are more things to be done for this task:) [11:39:31] 10Operations, 10serviceops, 10HHVM, 10MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), and 2 others: Remove HHVM from production - https://phabricator.wikimedia.org/T229792 (10jijiki) p:05Triage→03High [11:40:47] (03CR) 10Jcrespo: [C: 03+2] mariadb: depool pc1008 temporarily [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547508 (https://phabricator.wikimedia.org/T227543) (owner: 10Jcrespo) [11:41:34] (03Merged) 10jenkins-bot: mariadb: depool pc1008 temporarily [mediawiki-config] - 
https://gerrit.wikimedia.org/r/547508 (https://phabricator.wikimedia.org/T227543) (owner: Jcrespo) [11:42:58] ^ expect the parser cache miss ratio to increase by 30% [11:43:33] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: depooling pc1008 T227543 (duration: 01m 01s) [11:43:36] (PS3) Jbond: puppet git: add a descriptive config version [puppet] - https://gerrit.wikimedia.org/r/547505 (https://phabricator.wikimedia.org/T228854) [11:43:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:43:39] T227543: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 [11:44:51] RECOVERY - logstash process on logstash2005 is OK: PROCS OK: 1 process with UID = 498 (logstash), command name java, args logstash https://wikitech.wikimedia.org/wiki/Logstash [11:45:19] RECOVERY - logstash process on logstash2006 is OK: PROCS OK: 1 process with UID = 498 (logstash), command name java, args logstash https://wikitech.wikimedia.org/wiki/Logstash [11:45:31] RECOVERY - logstash syslog TCP port on logstash2006 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 10514 https://wikitech.wikimedia.org/wiki/Logstash [11:45:49] RECOVERY - logstash JSON linesTCP port on logstash2005 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11514 https://wikitech.wikimedia.org/wiki/Logstash [11:46:07] RECOVERY - logstash syslog TCP port on logstash2005 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 10514 https://wikitech.wikimedia.org/wiki/Logstash [11:46:33] RECOVERY - logstash JSON linesTCP port on logstash2006 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11514 https://wikitech.wikimedia.org/wiki/Logstash [11:48:20] (CR) Volans: [C: +2] devices: refactor signature [software/homer] - https://gerrit.wikimedia.org/r/543889 (owner: Volans) [11:48:30] (CR) Volans: [C: +2] netbox: allow to select the devices from Netbox [software/homer] - 
10https://gerrit.wikimedia.org/r/543890 (https://phabricator.wikimedia.org/T228388) (owner: 10Volans) [11:48:37] (03CR) 10Volans: [C: 03+2] setup.py: remove unused test dependency [software/homer] - 10https://gerrit.wikimedia.org/r/543965 (owner: 10Volans) [11:48:54] !log setting pc1008 as a replica of active pc1010 [11:48:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:49:52] RECOVERY - Too many messages in kafka logging-eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All [11:50:39] (03PS1) 10Arturo Borrero Gonzalez: toolforge: new k8s: rename node to worker [puppet] - 10https://gerrit.wikimedia.org/r/547509 (https://phabricator.wikimedia.org/T214513) [11:51:10] (03Merged) 10jenkins-bot: devices: refactor signature [software/homer] - 10https://gerrit.wikimedia.org/r/543889 (owner: 10Volans) [11:52:06] (03Merged) 10jenkins-bot: netbox: allow to select the devices from Netbox [software/homer] - 10https://gerrit.wikimedia.org/r/543890 (https://phabricator.wikimedia.org/T228388) (owner: 10Volans) [11:52:08] (03Merged) 10jenkins-bot: setup.py: remove unused test dependency [software/homer] - 10https://gerrit.wikimedia.org/r/543965 (owner: 10Volans) [11:53:13] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: new k8s: rename node to worker [puppet] - 10https://gerrit.wikimedia.org/r/547509 (https://phabricator.wikimedia.org/T214513) (owner: 10Arturo Borrero Gonzalez) [11:56:42] PROBLEM - Juniper alarms on asw2-b-eqiad is CRITICAL: JNX_ALARMS CRITICAL - 1 red alarms, 0 yellow alarms https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm [11:56:48] (03CR) 10Effie Mouzeli: "https://puppet-compiler.wmflabs.org/compiler1003/19196/logstash1007.eqiad.wmnet/" [puppet] 
- 10https://gerrit.wikimedia.org/r/547489 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [11:57:54] RECOVERY - Juniper alarms on asw2-b-eqiad is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm [11:59:03] (03CR) 10Alexandros Kosiaris: "You can probably ship eqiad, codfw as well in the same patch. But keep in mind there is some minor setup work that needs to be done to get" (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/547307 (https://phabricator.wikimedia.org/T236386) (owner: 10Ottomata) [12:02:56] (03PS3) 10Effie Mouzeli: logging: remove hhvm references [puppet] - 10https://gerrit.wikimedia.org/r/547489 (https://phabricator.wikimedia.org/T229792) [12:04:33] (03PS4) 10Jbond: puppet git: add a descriptive config version [puppet] - 10https://gerrit.wikimedia.org/r/547505 (https://phabricator.wikimedia.org/T228854) [12:05:09] (03CR) 10jerkins-bot: [V: 04-1] logging: remove hhvm references [puppet] - 10https://gerrit.wikimedia.org/r/547489 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [12:08:27] (03PS2) 10Jbond: motd: add the config version to the MOTD [puppet] - 10https://gerrit.wikimedia.org/r/547506 (https://phabricator.wikimedia.org/T228854) [12:14:35] (03PS4) 10Effie Mouzeli: logging: remove hhvm references [puppet] - 10https://gerrit.wikimedia.org/r/547489 (https://phabricator.wikimedia.org/T229792) [12:16:24] PROBLEM - MariaDB Slave Lag: pc2 on pc2008 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 311.39 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave [12:16:28] (03CR) 10Effie Mouzeli: "> > Patch Set 2: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/547489 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [12:17:09] I will handle pc2008 [12:18:30] (not user facing, under maintenance) [12:20:24] (03CR) 10Alexandros Kosiaris: [C: 03+1] Optimize 
archiva-gitfat-link script [puppet] - 10https://gerrit.wikimedia.org/r/547281 (https://phabricator.wikimedia.org/T235668) (owner: 10Ottomata) [12:20:43] (03Abandoned) 10Alexandros Kosiaris: archiva: Increase the cron gitfat internal [puppet] - 10https://gerrit.wikimedia.org/r/541775 (owner: 10Alexandros Kosiaris) [12:22:47] (03PS1) 10Filippo Giunchedi: hieradata: set kafka-logging default to 6 partitions [puppet] - 10https://gerrit.wikimedia.org/r/547519 (https://phabricator.wikimedia.org/T215904) [12:22:55] (03CR) 10Elukey: "Looks good to me, even if I am still a bit ignorant about this part of archiva. Was the new script tested on Archiva?" [puppet] - 10https://gerrit.wikimedia.org/r/547281 (https://phabricator.wikimedia.org/T235668) (owner: 10Ottomata) [12:23:04] (03CR) 10jerkins-bot: [V: 04-1] hieradata: set kafka-logging default to 6 partitions [puppet] - 10https://gerrit.wikimedia.org/r/547519 (https://phabricator.wikimedia.org/T215904) (owner: 10Filippo Giunchedi) [12:23:43] :( [12:23:48] (03CR) 10Elukey: [C: 03+1] Optimize archiva-gitfat-link script [puppet] - 10https://gerrit.wikimedia.org/r/547281 (https://phabricator.wikimedia.org/T235668) (owner: 10Ottomata) [12:24:19] (03PS2) 10Filippo Giunchedi: hieradata: set kafka-logging default to 6 partitions [puppet] - 10https://gerrit.wikimedia.org/r/547519 (https://phabricator.wikimedia.org/T215904) [12:26:08] 10Operations, 10ops-eqiad, 10DC-Ops: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 (10Jclark-ctr) finished pdu refresh, netbox updated, [12:27:14] 10Operations, 10ops-eqiad, 10DC-Ops: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 (10Jclark-ctr) a:05Cmjohnson→03RobH [12:29:05] (03PS1) 10Elukey: Enable the encrypted shuffle functionality in Hadoop Analytics [puppet] - 10https://gerrit.wikimedia.org/r/547522 (https://phabricator.wikimedia.org/T236995) [12:31:56] (03PS1) 10Ayounsi: Add the ability to ignore some or 
all Junos warnings [software/homer] - https://gerrit.wikimedia.org/r/547523 [12:36:32] (PS2) Ayounsi: Add the ability to ignore some or all Junos warnings [software/homer] - https://gerrit.wikimedia.org/r/547523 [12:39:46] PROBLEM - Check the Netbox report librenms for fail status. on netbox1001 is CRITICAL: librenms.LibreNMS CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports [12:41:56] PROBLEM - IPMI Sensor Status on analytics1062 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures [12:42:28] RECOVERY - Check systemd state on logstash2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:47:44] (PS1) Jcrespo: Revert "mariadb: depool pc1008 temporarily" [mediawiki-config] - https://gerrit.wikimedia.org/r/547525 [12:49:34] (CR) Ottomata: "Thank you!" 
[puppet] - 10https://gerrit.wikimedia.org/r/547378 (https://phabricator.wikimedia.org/T159170) (owner: 10Elukey) [12:58:26] ottomata: yeah I saw the airflow tasks float by and I've already chatted with a couple people about it, since I did an eval a few years back and it looked promising [13:00:56] (03PS2) 10Ayounsi: Update config to match new esams infra [homer/public] - 10https://gerrit.wikimedia.org/r/545660 (https://phabricator.wikimedia.org/T235805) [13:00:58] (03CR) 10DCausse: [C: 03+1] "https://gerrit.wikimedia.org/r/c/operations/puppet/+/532772 needs to be reverted as well" [puppet] - 10https://gerrit.wikimedia.org/r/526757 (https://phabricator.wikimedia.org/T222321) (owner: 10Smalyshev) [13:09:29] (03CR) 10Ottomata: Add eventgate-logging-external instance in staging (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/547307 (https://phabricator.wikimedia.org/T236386) (owner: 10Ottomata) [13:10:01] (03PS1) 10CDanis: WIP [puppet] - 10https://gerrit.wikimedia.org/r/547527 [13:15:49] (03PS3) 10BBlack: Un-submodule for nginx: move to prod env [1/2] [puppet] - 10https://gerrit.wikimedia.org/r/521323 (https://phabricator.wikimedia.org/T183454) [13:15:51] (03PS3) 10BBlack: Un-submodule for nginx: rename to orig path [2/2] [puppet] - 10https://gerrit.wikimedia.org/r/521324 [13:16:07] !log jynus@cumin1001 dbctl commit (dc=all): 'Repool db1119, db1113 at 10% T227543', diff saved to https://phabricator.wikimedia.org/P9509 and previous config saved to /var/cache/conftool/dbconfig/20191031-131606-jynus.json [13:16:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:16:20] T227543: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 [13:18:08] (03CR) 10jerkins-bot: [V: 04-1] Un-submodule for nginx: move to prod env [1/2] [puppet] - 10https://gerrit.wikimedia.org/r/521323 (https://phabricator.wikimedia.org/T183454) (owner: 10BBlack) [13:18:24] (03CR) 10Effie Mouzeli: "I won't +1 only 
because I had a quick look rather than a detailed one" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite) [13:18:39] (03CR) 10Jcrespo: [C: 03+2] Revert "mariadb: depool pc1008 temporarily" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547525 (owner: 10Jcrespo) [13:19:31] (03Merged) 10jenkins-bot: Revert "mariadb: depool pc1008 temporarily" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547525 (owner: 10Jcrespo) [13:20:38] RECOVERY - MariaDB Slave Lag: pc2 on pc2008 is OK: OK slave_sql_lag Replication lag: 0.04 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave [13:21:19] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: repool pc1008 T227543 (duration: 01m 02s) [13:21:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:21:28] T227543: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 [13:21:59] hit rate is going up again [13:25:25] yeah, we are back to 80% hit rate [13:28:05] (03CR) 10Ottomata: Add eventgate-logging-external instance in staging (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/547307 (https://phabricator.wikimedia.org/T236386) (owner: 10Ottomata) [13:30:18] (03CR) 10Ottomata: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/547281 (https://phabricator.wikimedia.org/T235668) (owner: 10Ottomata) [13:30:31] (03PS2) 10Ottomata: Optimize archiva-gitfat-link script [puppet] - 10https://gerrit.wikimedia.org/r/547281 (https://phabricator.wikimedia.org/T235668) [13:31:38] (03CR) 10Effie Mouzeli: lvs, prometheus, profile: add swagger exporter jobs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite) [13:33:12] (03CR) 10Ottomata: [C: 03+2] Optimize archiva-gitfat-link script [puppet] - 10https://gerrit.wikimedia.org/r/547281 
(https://phabricator.wikimedia.org/T235668) (owner: 10Ottomata) [13:39:36] (03PS2) 10CDanis: WIP [puppet] - 10https://gerrit.wikimedia.org/r/547527 [13:40:47] !log upload xdebug 2.7.0-1+wmf2 to component/php72 - T234418 [13:40:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:59] T234418: Upgrade our php-xdebug package for php7.2 - https://phabricator.wikimedia.org/T234418 [13:44:01] (03PS3) 10CDanis: WIP [puppet] - 10https://gerrit.wikimedia.org/r/547527 [13:45:04] (03PS1) 10Ayounsi: FNM, fix netflow parsing for udp (no flag) flows [puppet] - 10https://gerrit.wikimedia.org/r/547534 [13:46:03] (03PS4) 10CDanis: WIP [puppet] - 10https://gerrit.wikimedia.org/r/547527 [13:48:09] (03PS5) 10CDanis: WIP [puppet] - 10https://gerrit.wikimedia.org/r/547527 [13:48:25] (03PS11) 10Joal: Refactor profile::analytics::refinery::job::import_mediawiki_dumps [puppet] - 10https://gerrit.wikimedia.org/r/546966 (https://phabricator.wikimedia.org/T234333) [13:49:38] (03CR) 10CDanis: [C: 03+1] FNM, fix netflow parsing for udp (no flag) flows [puppet] - 10https://gerrit.wikimedia.org/r/547534 (owner: 10Ayounsi) [13:50:27] (03CR) 10jerkins-bot: [V: 04-1] Refactor profile::analytics::refinery::job::import_mediawiki_dumps [puppet] - 10https://gerrit.wikimedia.org/r/546966 (https://phabricator.wikimedia.org/T234333) (owner: 10Joal) [13:50:30] (03CR) 10BBlack: [C: 03+1] FNM, fix netflow parsing for udp (no flag) flows [puppet] - 10https://gerrit.wikimedia.org/r/547534 (owner: 10Ayounsi) [13:52:04] (03PS4) 10BBlack: Un-submodule for nginx: move to prod env [1/2] [puppet] - 10https://gerrit.wikimedia.org/r/521323 (https://phabricator.wikimedia.org/T183454) [13:52:06] (03PS4) 10BBlack: Un-submodule for nginx: rename to orig path [2/2] [puppet] - 10https://gerrit.wikimedia.org/r/521324 (https://phabricator.wikimedia.org/T230206) [13:52:40] (03PS3) 10Jbond: Move the zabbix into the repository - part 2 [puppet] - 10https://gerrit.wikimedia.org/r/547501 
(https://phabricator.wikimedia.org/T230206) [13:52:59] (03PS5) 10BBlack: Un-submodule for nginx: move to prod env [1/2] [puppet] - 10https://gerrit.wikimedia.org/r/521323 (https://phabricator.wikimedia.org/T230206) [13:53:01] (03PS5) 10BBlack: Un-submodule for nginx: rename to orig path [2/2] [puppet] - 10https://gerrit.wikimedia.org/r/521324 (https://phabricator.wikimedia.org/T230206) [13:53:03] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "I tested this in toolsbeta and tools and everything looks fine" [puppet] - 10https://gerrit.wikimedia.org/r/546995 (https://phabricator.wikimedia.org/T236826) (owner: 10Arturo Borrero Gonzalez) [13:54:14] (03CR) 10Jbond: [V: 03+2 C: 03+1] "LGTM also overriding CI as we know it fails" [puppet] - 10https://gerrit.wikimedia.org/r/521323 (https://phabricator.wikimedia.org/T230206) (owner: 10BBlack) [13:54:27] (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/521324 (https://phabricator.wikimedia.org/T230206) (owner: 10BBlack) [13:55:02] (03CR) 10jerkins-bot: [V: 04-1] Un-submodule for nginx: move to prod env [1/2] [puppet] - 10https://gerrit.wikimedia.org/r/521323 (https://phabricator.wikimedia.org/T230206) (owner: 10BBlack) [13:55:35] (03CR) 10BBlack: [C: 03+2] Un-submodule for nginx: move to prod env [1/2] [puppet] - 10https://gerrit.wikimedia.org/r/521323 (https://phabricator.wikimedia.org/T230206) (owner: 10BBlack) [13:56:33] (03CR) 10BBlack: [C: 03+2] Un-submodule for nginx: rename to orig path [2/2] [puppet] - 10https://gerrit.wikimedia.org/r/521324 (https://phabricator.wikimedia.org/T230206) (owner: 10BBlack) [13:56:44] (03PS6) 10BBlack: Un-submodule for nginx: rename to orig path [2/2] [puppet] - 10https://gerrit.wikimedia.org/r/521324 (https://phabricator.wikimedia.org/T230206) [13:57:15] (03CR) 10BBlack: [V: 03+2 C: 03+2] Un-submodule for nginx: rename to orig path [2/2] [puppet] - 10https://gerrit.wikimedia.org/r/521324 (https://phabricator.wikimedia.org/T230206) (owner: 10BBlack) [14:06:35] 
(03PS1) 10BBlack: Final commit, deprecate old nginx submodule repo [puppet/nginx] - 10https://gerrit.wikimedia.org/r/547536 (https://phabricator.wikimedia.org/T230206) [14:06:41] (03CR) 10jerkins-bot: [V: 04-1] Final commit, deprecate old nginx submodule repo [puppet/nginx] - 10https://gerrit.wikimedia.org/r/547536 (https://phabricator.wikimedia.org/T230206) (owner: 10BBlack) [14:07:15] (03CR) 10BBlack: [V: 03+2 C: 03+2] Final commit, deprecate old nginx submodule repo [puppet/nginx] - 10https://gerrit.wikimedia.org/r/547536 (https://phabricator.wikimedia.org/T230206) (owner: 10BBlack) [14:07:25] (03CR) 10jerkins-bot: [V: 04-1] Final commit, deprecate old nginx submodule repo [puppet/nginx] - 10https://gerrit.wikimedia.org/r/547536 (https://phabricator.wikimedia.org/T230206) (owner: 10BBlack) [14:08:17] (03CR) 10Elukey: "John a couple of questions:" [puppet] - 10https://gerrit.wikimedia.org/r/547501 (https://phabricator.wikimedia.org/T230206) (owner: 10Jbond) [14:10:40] (03PS12) 10Joal: Refactor profile::analytics::refinery::job::import_mediawiki_dumps [puppet] - 10https://gerrit.wikimedia.org/r/546966 (https://phabricator.wikimedia.org/T234333) [14:12:03] (03CR) 10Jbond: "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/547501 (https://phabricator.wikimedia.org/T230206) (owner: 10Jbond) [14:12:10] (03Abandoned) 10Jbond: Move the zabbix into the repository - part 2 [puppet] - 10https://gerrit.wikimedia.org/r/547501 (https://phabricator.wikimedia.org/T230206) (owner: 10Jbond) [14:12:40] (03Abandoned) 10Jbond: Move the nginx submodule into the repository - part 1 [puppet] - 10https://gerrit.wikimedia.org/r/547500 (https://phabricator.wikimedia.org/T230206) (owner: 10Jbond) [14:12:46] (03CR) 10jerkins-bot: [V: 04-1] Refactor profile::analytics::refinery::job::import_mediawiki_dumps [puppet] - 10https://gerrit.wikimedia.org/r/546966 (https://phabricator.wikimedia.org/T234333) (owner: 10Joal) [14:13:33] (03CR) 10Elukey: 
"https://puppet-compiler.wmflabs.org/compiler1003/19204/analytics1042.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/547522 (https://phabricator.wikimedia.org/T236995) (owner: 10Elukey) [14:16:05] (03PS1) 10Alexandros Kosiaris: Be more verbose about backup1001, backup2001 [dns] - 10https://gerrit.wikimedia.org/r/547537 [14:16:17] (03PS2) 10Ayounsi: FNM, fix netflow parsing for udp (no flag) flows [puppet] - 10https://gerrit.wikimedia.org/r/547534 [14:16:19] (03PS1) 10Ayounsi: FNM, Fix bug causing recovery emails not to be sent [puppet] - 10https://gerrit.wikimedia.org/r/547538 [14:23:55] (03PS13) 10Joal: Refactor profile::analytics::refinery::job::import_mediawiki_dumps [puppet] - 10https://gerrit.wikimedia.org/r/546966 (https://phabricator.wikimedia.org/T234333) [14:24:56] !log jynus@cumin1001 dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P9511 and previous config saved to /var/cache/conftool/dbconfig/20191031-142455-jynus.json [14:25:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:28:59] !log reloading ferm on db1119 [14:29:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:16] (03CR) 10Filippo Giunchedi: "LGTM" (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/547307 (https://phabricator.wikimedia.org/T236386) (owner: 10Ottomata) [14:31:08] (03PS1) 10DCausse: Revert "Revert "Add L and M to allowed statement starts"" [puppet] - 10https://gerrit.wikimedia.org/r/547541 (https://phabricator.wikimedia.org/T222321) [14:31:38] (03PS2) 10DCausse: Support /entity/ and other Wikidata URLs for Commons [puppet] - 10https://gerrit.wikimedia.org/r/526757 (https://phabricator.wikimedia.org/T222321) (owner: 10Smalyshev) [14:39:44] (03CR) 10Elukey: [C: 03+2] Enable the encrypted shuffle functionality in Hadoop Analytics [puppet] - 10https://gerrit.wikimedia.org/r/547522 (https://phabricator.wikimedia.org/T236995) (owner: 10Elukey) 
[14:48:13] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "My biggest objection is the standard is now called OpenAPI and not swagger; we should start using the appropriate name (and probably fix t" (037 comments) [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite) [14:50:12] !log jynus@cumin1001 dbctl commit (dc=all): 'Repool db1119 at 10%', diff saved to https://phabricator.wikimedia.org/P9512 and previous config saved to /var/cache/conftool/dbconfig/20191031-145010-jynus.json [14:50:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:53:27] (03PS1) 10Ema: varnish: make hitrate dstat plugin work w/o varnish-be [puppet] - 10https://gerrit.wikimedia.org/r/547547 (https://phabricator.wikimedia.org/T227432) [14:54:07] (03CR) 10jerkins-bot: [V: 04-1] varnish: make hitrate dstat plugin work w/o varnish-be [puppet] - 10https://gerrit.wikimedia.org/r/547547 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema) [14:55:45] (03PS2) 10Ema: varnish: make hitrate dstat plugin work w/o varnish-be [puppet] - 10https://gerrit.wikimedia.org/r/547547 (https://phabricator.wikimedia.org/T227432) [14:56:01] !log Password reset for SUL user `Darth AK` [14:56:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:58:12] (03PS1) 10Ottomata: eventgate 0.0.12 - log to stdout only [deployment-charts] - 10https://gerrit.wikimedia.org/r/547549 [15:07:37] (03PS2) 10Ottomata: Add eventgate-logging-external instance [deployment-charts] - 10https://gerrit.wikimedia.org/r/547307 (https://phabricator.wikimedia.org/T236386) [15:07:46] (03PS3) 10Ema: varnish: make hitrate dstat plugin work w/o varnish-be [puppet] - 10https://gerrit.wikimedia.org/r/547547 (https://phabricator.wikimedia.org/T227432) [15:08:04] (03CR) 10Ottomata: "Ok, I added the codfw and eqiad services too." 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/547307 (https://phabricator.wikimedia.org/T236386) (owner: 10Ottomata) [15:09:03] (03PS4) 10Jbond: check_puppetrun: alert critical after 24 hours [puppet] - 10https://gerrit.wikimedia.org/r/546195 [15:09:05] (03CR) 10Ema: [C: 03+2] varnish: make hitrate dstat plugin work w/o varnish-be [puppet] - 10https://gerrit.wikimedia.org/r/547547 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema) [15:09:07] (03CR) 10Ottomata: Add eventgate-logging-external instance (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/547307 (https://phabricator.wikimedia.org/T236386) (owner: 10Ottomata) [15:11:29] (03CR) 10Jbond: "Thanks updated" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/546195 (owner: 10Jbond) [15:13:31] (03PS1) 10Andrew Bogott: labtestwikitech: use local database for Echo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547553 (https://phabricator.wikimedia.org/T119154) [15:14:17] (03CR) 10Alexandros Kosiaris: [C: 04-1] Add eventgate-logging-external instance (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/547307 (https://phabricator.wikimedia.org/T236386) (owner: 10Ottomata) [15:14:24] (03CR) 10Nuria: [C: 03+1] Enable the encrypted shuffle functionality in Hadoop Analytics [puppet] - 10https://gerrit.wikimedia.org/r/547522 (https://phabricator.wikimedia.org/T236995) (owner: 10Elukey) [15:15:44] https://twitter.com/wikimediatech says this account is temporarily restricted to me, is that known? 
[15:16:27] (03PS2) 10Andrew Bogott: labtestwikitech: use local database for Echo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547553 (https://phabricator.wikimedia.org/T119154) [15:16:32] it is restricted to me too, probably twitter detected it as a bot and flagged it [15:16:45] not sure who controls it [15:16:46] no new tweets since Oct 25 either [15:17:18] I am guessing for the same reason- blocked on a captcha or similar [15:17:56] (03PS3) 10Ottomata: Add eventgate-logging-external instance [deployment-charts] - 10https://gerrit.wikimedia.org/r/547307 (https://phabricator.wikimedia.org/T236386) [15:18:02] (03CR) 10Ottomata: Add eventgate-logging-external instance (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/547307 (https://phabricator.wikimedia.org/T236386) (owner: 10Ottomata) [15:19:01] (03CR) 10Alexandros Kosiaris: [C: 03+1] eventgate 0.0.12 - log to stdout only [deployment-charts] - 10https://gerrit.wikimedia.org/r/547549 (owner: 10Ottomata) [15:19:37] (03CR) 10Alexandros Kosiaris: [C: 03+1] "Just make sure that the logs in logstash shipped via the logging pipeline are as satisfactory to you as the direct logstash approach" [deployment-charts] - 10https://gerrit.wikimedia.org/r/547549 (owner: 10Ottomata) [15:19:54] (03PS4) 10Ottomata: Add eventgate-logging-external instance [deployment-charts] - 10https://gerrit.wikimedia.org/r/547307 (https://phabricator.wikimedia.org/T236386) [15:20:23] (03CR) 10Ottomata: [C: 03+2] "K, will check it out with this new eventgate-logging-external in staging." [deployment-charts] - 10https://gerrit.wikimedia.org/r/547549 (owner: 10Ottomata) [15:20:43] (03PS5) 10Ottomata: Add eventgate-logging-external instance [deployment-charts] - 10https://gerrit.wikimedia.org/r/547307 (https://phabricator.wikimedia.org/T236386) [15:21:10] it’s weird, since Twitter is usually such a bot-friendly platform 😶 [15:24:06] lolol [15:24:07] jynus: I have the password for that account. 
I will try to see what twitter "flagged" in the account's tweet stream. [15:24:29] (03PS1) 10Elukey: Enable mapreduce.ssl.enabled in Hadoop Analytics [puppet] - 10https://gerrit.wikimedia.org/r/547557 (https://phabricator.wikimedia.org/T236995) [15:24:35] (03CR) 10Jforrester: "Can't you put it back in the wikitech dblist? That'd fix this (and likely other issues), right?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547553 (https://phabricator.wikimedia.org/T119154) (owner: 10Andrew Bogott) [15:24:40] you can see it if you click through to the profile [15:24:41] set up a mirror on Mastodon with https://moa.party/ or other ;) [15:24:44] but it's annoying [15:24:44] (03CR) 10Elukey: [C: 03+2] Enable mapreduce.ssl.enabled in Hadoop Analytics [puppet] - 10https://gerrit.wikimedia.org/r/547557 (https://phabricator.wikimedia.org/T236995) (owner: 10Elukey) [15:24:49] (03CR) 10Elukey: [V: 03+2 C: 03+2] Enable mapreduce.ssl.enabled in Hadoop Analytics [puppet] - 10https://gerrit.wikimedia.org/r/547557 (https://phabricator.wikimedia.org/T236995) (owner: 10Elukey) [15:25:07] (03CR) 10Ayounsi: [C: 03+2] FNM, fix netflow parsing for udp (no flag) flows [puppet] - 10https://gerrit.wikimedia.org/r/547534 (owner: 10Ayounsi) [15:26:04] ffs. they want to verify a phone number for the account :/ [15:26:15] "In order to make sure Twitter is as safe as possible, sometimes — like now — you may be asked to confirm you’re not a robot. Easy, right? Just complete the following to get back to the Tweets." [15:26:25] but... it is a robot [15:26:33] IIRC you’re allowed to have up to three accounts per phone number, fwiw [15:26:41] (03CR) 10Andrew Bogott: "> Can't you put it back in the wikitech dblist? That'd fix this (and likely other issues), right?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547553 (https://phabricator.wikimedia.org/T119154) (owner: 10Andrew Bogott) [15:26:53] bd808: maybe someone at the office and it is done by call? 
[15:27:21] I have a couple throw away SMS accounts that I can use :) [15:27:41] let’s !log that so twitter sees it :P [15:27:47] well, I was trying to avoid you future spam [15:29:00] (03CR) 10Jforrester: "> Patch Set 2:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547553 (https://phabricator.wikimedia.org/T119154) (owner: 10Andrew Bogott) [15:31:15] (03PS1) 10Jbond: backup::host: remove day and jobsdefault [puppet] - 10https://gerrit.wikimedia.org/r/547559 (https://phabricator.wikimedia.org/T221083) [15:34:01] (03CR) 10Andrew Bogott: "If there's a better way to write 0f90f50665158ccaa78a594c8b5c35a71207bc74 then I welcome that -- I don't understand how database assignmen" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547553 (https://phabricator.wikimedia.org/T119154) (owner: 10Andrew Bogott) [15:45:14] (03CR) 10Jforrester: [C: 03+1] "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/547283 (https://phabricator.wikimedia.org/T223602) (owner: 10Jcrespo) [15:51:15] (03PS1) 10Ayounsi: Initial netbox interfaces support [software/homer] - 10https://gerrit.wikimedia.org/r/547562 [15:52:35] (03PS1) 10Jhedden: grafana: set default undef on wpt_graphite_proxy_port [puppet] - 10https://gerrit.wikimedia.org/r/547563 (https://phabricator.wikimedia.org/T231870) [15:54:23] (03CR) 10jerkins-bot: [V: 04-1] Initial netbox interfaces support [software/homer] - 10https://gerrit.wikimedia.org/r/547562 (owner: 10Ayounsi) [15:57:14] (03PS1) 10Urbanecm: Change bawiki logo to an anniversary one [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547566 (https://phabricator.wikimedia.org/T237035) [15:59:35] (03CR) 10CDanis: [C: 03+1] grafana: set default undef on wpt_graphite_proxy_port [puppet] - 10https://gerrit.wikimedia.org/r/547563 (https://phabricator.wikimedia.org/T231870) (owner: 10Jhedden) [16:00:04] godog and _joe_: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Puppet SWAT(Max 6 patches) . 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191031T1600). [16:00:04] Addshore: A patch you scheduled for Puppet SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [16:00:04] (03CR) 10Jhedden: [C: 03+2] grafana: set default undef on wpt_graphite_proxy_port [puppet] - 10https://gerrit.wikimedia.org/r/547563 (https://phabricator.wikimedia.org/T231870) (owner: 10Jhedden) [16:02:32] (03PS2) 10Ayounsi: Initial netbox interfaces support [software/homer] - 10https://gerrit.wikimedia.org/r/547562 [16:02:37] addshore: https://gerrit.wikimedia.org/r/c/operations/puppet/+/547404/ is merged already \o/ [16:02:42] re: puppet swat [16:04:55] (03CR) 10jerkins-bot: [V: 04-1] Initial netbox interfaces support [software/homer] - 10https://gerrit.wikimedia.org/r/547562 (owner: 10Ayounsi) [16:05:58] (03CR) 10Jcrespo: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/547283 (https://phabricator.wikimedia.org/T223602) (owner: 10Jcrespo) [16:06:08] (03CR) 10Jcrespo: [C: 04-1] check_private_data: ignore comments on private.dblist [puppet] - 10https://gerrit.wikimedia.org/r/547283 (https://phabricator.wikimedia.org/T223602) (owner: 10Jcrespo) [16:08:33] (03PS1) 10Jbond: add seed [labs/private] - 10https://gerrit.wikimedia.org/r/547567 [16:08:51] (03CR) 10Jbond: [V: 03+2 C: 03+2] add seed [labs/private] - 10https://gerrit.wikimedia.org/r/547567 (owner: 10Jbond) [16:09:10] (03PS2) 10Jbond: add seed [labs/private] - 10https://gerrit.wikimedia.org/r/547567 [16:09:16] (03CR) 10Jbond: [V: 03+2 C: 03+2] add seed [labs/private] - 10https://gerrit.wikimedia.org/r/547567 (owner: 10Jbond) [16:09:26] !log jynus@cumin1001 dbctl commit (dc=all): 'Repool db1119, db1113 at 100%', diff saved to https://phabricator.wikimedia.org/P9513 and previous config saved to /var/cache/conftool/dbconfig/20191031-160925-jynus.json [16:09:30] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:12:58] (03PS3) 10Herron: ipsec: remove check_strongswan in favor of prometheus check [puppet] - 10https://gerrit.wikimedia.org/r/546666 (https://phabricator.wikimedia.org/T230236) [16:14:06] !log restart dbprov2002 after upgrade T236924 [16:14:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:14:16] T236924: dbprov2002 slower to generate snapshots - https://phabricator.wikimedia.org/T236924 [16:16:41] (03PS1) 10Jbond: backup::host: refactor [puppet] - 10https://gerrit.wikimedia.org/r/547568 (https://phabricator.wikimedia.org/T221083) [16:16:43] (03PS1) 10Jbond: backup::host: use fqdn_rand_string for password generation [puppet] - 10https://gerrit.wikimedia.org/r/547569 (https://phabricator.wikimedia.org/T221083) [16:18:42] (03CR) 10Bstorm: "This change is ready for review." [puppet] - 10https://gerrit.wikimedia.org/r/546182 (https://phabricator.wikimedia.org/T236290) (owner: 10Jhedden) [16:19:19] (03CR) 10Filippo Giunchedi: "LGTM! (not voting since I don't actually know for real)" [deployment-charts] - 10https://gerrit.wikimedia.org/r/547307 (https://phabricator.wikimedia.org/T236386) (owner: 10Ottomata) [16:19:21] (03CR) 10Jbond: [C: 04-1] "guess they are used" [puppet] - 10https://gerrit.wikimedia.org/r/547559 (https://phabricator.wikimedia.org/T221083) (owner: 10Jbond) [16:19:40] (03PS2) 10Jbond: backup::host: remove day and jobsdefault [puppet] - 10https://gerrit.wikimedia.org/r/547559 (https://phabricator.wikimedia.org/T221083) [16:19:55] (03PS2) 10Jbond: backup::host: refactor [puppet] - 10https://gerrit.wikimedia.org/r/547568 (https://phabricator.wikimedia.org/T221083) [16:20:07] (03PS2) 10Jbond: backup::host: use fqdn_rand_string for password generation [puppet] - 10https://gerrit.wikimedia.org/r/547569 (https://phabricator.wikimedia.org/T221083) [16:21:19] (03CR) 10Herron: [C: 03+1] "thx for the cleanup!" 
[puppet] - 10https://gerrit.wikimedia.org/r/547489 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [16:23:04] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/547489 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [16:23:33] !log Our @wikimediatech Twitter account is soft blocked pending phone number verification. bd808 trying to figure out a good way to do that verification for a bot account. [16:23:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:24:44] bd808: /r/totallynotrobots might know! :) [16:25:09] (03CR) 10Jcrespo: "Alex, what do you think?" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/547568 (https://phabricator.wikimedia.org/T221083) (owner: 10Jbond) [16:27:36] (03CR) 10Jcrespo: backup::host: use fqdn_rand_string for password generation (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/547569 (https://phabricator.wikimedia.org/T221083) (owner: 10Jbond) [16:28:57] (03CR) 10Jforrester: "recheck for CI purposes" [puppet] - 10https://gerrit.wikimedia.org/r/547489 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [16:29:02] "phone number verification" i.e. to prove there's a real person [16:29:05] grrrrr [16:30:18] yeah, its a pain. I was trying to use a google voice number I have and managed to get into a soft block for too many attempts now too. So probably 24h before it lets me try again. :/ [16:31:58] (03PS14) 10Jhedden: ceph: add k8s manifests for ceph deployment using rook [puppet] - 10https://gerrit.wikimedia.org/r/546182 (https://phabricator.wikimedia.org/T236290) [16:32:11] out of curioisity, is there a mirror of that twitter account into the fediverse? [16:32:53] liw: good suggestion you are assigned as a volunteer to implement! 
[16:34:00] (03CR) 10jerkins-bot: [V: 04-1] ceph: add k8s manifests for ceph deployment using rook [puppet] - 10https://gerrit.wikimedia.org/r/546182 (https://phabricator.wikimedia.org/T236290) (owner: 10Jhedden) [16:34:04] I'd have to touch Twitter, sorry [16:36:10] (03PS15) 10Jhedden: ceph: add k8s manifests for ceph deployment using rook [puppet] - 10https://gerrit.wikimedia.org/r/546182 (https://phabricator.wikimedia.org/T236290) [16:38:08] liw: you wouldn’t have to go via Twitter, you could make stashbot post to Mastodon directly [16:38:16] (current related code is mostly at https://gerrit.wikimedia.org/r/plugins/gitiles/labs/tools/stashbot/+/c8951e6959a69db1a741db70211850cd82e24d9a/stashbot/sal.py, I think) [16:38:57] liw: the topic has come up multiple times now (usually from greg-g). Mostly we need to make an account somewhere and like Lucas_WMDE says add support in the bot. [16:39:58] an account is easy to arrange; the code changes are more work (and I've failed to make them for my own CI system, alas) [16:40:28] (03CR) 10Bstorm: "Most of this is just things that I got curious about when reading it." (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/546182 (https://phabricator.wikimedia.org/T236290) (owner: 10Jhedden) [16:46:40] (03CR) 10Bstorm: ceph: add k8s manifests for ceph deployment using rook (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/546182 (https://phabricator.wikimedia.org/T236290) (owner: 10Jhedden) [16:47:35] (03CR) 10Herron: [C: 03+1] "Should be very useful!" 
[puppet] - 10https://gerrit.wikimedia.org/r/547505 (https://phabricator.wikimedia.org/T228854) (owner: 10Jbond) [16:51:44] (03PS1) 10Ayounsi: Implement lazy caching for NetboxData [software/homer] - 10https://gerrit.wikimedia.org/r/547576 [16:52:13] (03CR) 10Jhedden: ceph: add k8s manifests for ceph deployment using rook (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/546182 (https://phabricator.wikimedia.org/T236290) (owner: 10Jhedden) [16:54:06] (03CR) 10jerkins-bot: [V: 04-1] Implement lazy caching for NetboxData [software/homer] - 10https://gerrit.wikimedia.org/r/547576 (owner: 10Ayounsi) [16:55:45] (03PS2) 10Ayounsi: Implement lazy caching for NetboxData [software/homer] - 10https://gerrit.wikimedia.org/r/547576 [16:57:10] (03CR) 10Jforrester: "> Patch Set 2:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547553 (https://phabricator.wikimedia.org/T119154) (owner: 10Andrew Bogott) [16:57:51] (03CR) 10Bstorm: ceph: add k8s manifests for ceph deployment using rook (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/546182 (https://phabricator.wikimedia.org/T236290) (owner: 10Jhedden) [16:58:27] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM! nits inline" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/547505 (https://phabricator.wikimedia.org/T228854) (owner: 10Jbond) [16:59:42] (03CR) 10Volans: [C: 03+1] "LGTM!" [software/homer] - 10https://gerrit.wikimedia.org/r/547576 (owner: 10Ayounsi) [17:00:04] cscott, arlolra, subbu, halfak, accraze, and mdholloway: Your horoscope predicts another unfortunate Services – Graphoid / Parsoid / Citoid / ORES deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191031T1700). [17:00:55] (03CR) 10Herron: [C: 03+1] "LGTM! 
Minor formatting comment inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/547506 (https://phabricator.wikimedia.org/T228854) (owner: 10Jbond) [17:04:46] (03CR) 10Filippo Giunchedi: "LGTM, modulo what Keith was saying or prefix the print with "configuration version" or "puppet version" or sth like that" [puppet] - 10https://gerrit.wikimedia.org/r/547506 (https://phabricator.wikimedia.org/T228854) (owner: 10Jbond) [17:05:40] (03CR) 10Ayounsi: [C: 03+2] Implement lazy caching for NetboxData [software/homer] - 10https://gerrit.wikimedia.org/r/547576 (owner: 10Ayounsi) [17:08:22] (03CR) 10Filippo Giunchedi: "See inline re: --short, other than that LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/547505 (https://phabricator.wikimedia.org/T228854) (owner: 10Jbond) [17:08:27] 10Operations, 10ops-esams: wipe backup-array1 - https://phabricator.wikimedia.org/T237041 (10Papaul) [17:09:10] (03CR) 10Bstorm: "One other thing to think about here is that with these on bare metal servers, you will likely want ferm rules on the hosts." [puppet] - 10https://gerrit.wikimedia.org/r/546182 (https://phabricator.wikimedia.org/T236290) (owner: 10Jhedden) [17:09:45] (03PS6) 10Jbond: puppet git: add a descriptive config version [puppet] - 10https://gerrit.wikimedia.org/r/547505 (https://phabricator.wikimedia.org/T228854) [17:10:11] (03CR) 10Jbond: puppet git: add a descriptive config version (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/547505 (https://phabricator.wikimedia.org/T228854) (owner: 10Jbond) [17:10:57] Godog, indeed :))) [17:11:14] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM! Thanks for tackling this!" 
[puppet] - 10https://gerrit.wikimedia.org/r/547505 (https://phabricator.wikimedia.org/T228854) (owner: 10Jbond) [17:15:29] (03CR) 10Herron: [C: 03+2] ipsec: remove check_strongswan in favor of prometheus check [puppet] - 10https://gerrit.wikimedia.org/r/546666 (https://phabricator.wikimedia.org/T230236) (owner: 10Herron) [17:17:37] (03CR) 10Andrew Bogott: "> It can be in both." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547553 (https://phabricator.wikimedia.org/T119154) (owner: 10Andrew Bogott) [17:21:27] (03PS1) 10Ayounsi: Add some missing esams OSPF links [homer/public] - 10https://gerrit.wikimedia.org/r/547583 [17:23:24] (03PS1) 10Ayounsi: Intial interfaces templates [homer/public] - 10https://gerrit.wikimedia.org/r/547584 [17:24:11] (03PS3) 10Dzahn: mariadb: remove cobalt from ferm_misc rules [puppet] - 10https://gerrit.wikimedia.org/r/545333 (https://phabricator.wikimedia.org/T236187) [17:24:12] (03CR) 10Herron: [C: 03+1] puppet_compiler: ensure all working dirs have correct owner [puppet] - 10https://gerrit.wikimedia.org/r/547493 (https://phabricator.wikimedia.org/T236986) (owner: 10Jbond) [17:25:45] (03PS1) 10Ayounsi: Initial forwarding-options templating [homer/public] - 10https://gerrit.wikimedia.org/r/547586 [17:25:48] 10Operations, 10ops-codfw: Degraded RAID on db2120 - https://phabricator.wikimedia.org/T236453 (10Papaul) Create Dispatch: Success You have successfully submitted request SR1001682860. [17:28:22] (03PS1) 10Ayounsi: Initial templating for CR routing-options [homer/public] - 10https://gerrit.wikimedia.org/r/547587 [17:28:25] (03PS14) 10Joal: Refactor profile::analytics::refinery::job::import_mediawiki_dumps [puppet] - 10https://gerrit.wikimedia.org/r/546966 (https://phabricator.wikimedia.org/T234333) [17:31:04] (03PS6) 10CDanis: WIP [puppet] - 10https://gerrit.wikimedia.org/r/547527 [17:31:07] (03CR) 10Jforrester: "Sure." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/547553 (https://phabricator.wikimedia.org/T119154) (owner: 10Andrew Bogott) [17:34:17] (03CR) 10Jforrester: "Oh, I see, someone mapped wikitech.dblist as BOTH a config/type dblist and a DB allocation dblist, which makes everything complicated." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547553 (https://phabricator.wikimedia.org/T119154) (owner: 10Andrew Bogott) [17:34:25] (03CR) 10Herron: "🤔 hmm, we have the 6 logstash consumers between the two sites today, and will likely add another 6 more consumers in the process of buildi" [puppet] - 10https://gerrit.wikimedia.org/r/547519 (https://phabricator.wikimedia.org/T215904) (owner: 10Filippo Giunchedi) [17:34:44] (03PS7) 10CDanis: WIP [puppet] - 10https://gerrit.wikimedia.org/r/547527 [17:35:21] no parsoid deploy today [17:36:53] (03PS15) 10Elukey: Refactor profile::analytics::refinery::job::import_mediawiki_dumps [puppet] - 10https://gerrit.wikimedia.org/r/546966 (https://phabricator.wikimedia.org/T234333) (owner: 10Joal) [17:37:22] (03PS1) 10Ayounsi: Make NTP servers a variable and update esams IPs [homer/public] - 10https://gerrit.wikimedia.org/r/547589 [17:37:49] (03CR) 10Elukey: [C: 03+2] Refactor profile::analytics::refinery::job::import_mediawiki_dumps [puppet] - 10https://gerrit.wikimedia.org/r/546966 (https://phabricator.wikimedia.org/T234333) (owner: 10Joal) [17:37:55] (03PS1) 10Elukey: Revert TLS MapReduce shuffle configuration for Hadoop Analytics [puppet] - 10https://gerrit.wikimedia.org/r/547590 [17:42:04] (03CR) 10Volans: Initial forwarding-options templating (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/547586 (owner: 10Ayounsi) [17:45:32] (03PS1) 10Jforrester: Split out DB-related concerns for wikitech and test wikitech into s-wikitech, s-wikitech-wmcs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547596 [17:45:34] (03PS1) 10Jforrester: Follow-up 0f90f506: Leave labtestwiki in the wikitech dblist for config 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/547597 [17:46:21] andrewbogott: So, those two theoretically will fix all the config drift, but adding new DB sections into etcd is something that the DBAs need to do manually, I think. [17:46:31] andrewbogott: Sorry you had to deal with this mess. [17:46:57] James_F: there's an etcd-avoiding kludge I recommended in the configs for labtestwiki [17:47:11] (03CR) 10Jforrester: [C: 04-2] "Careful DBA involvement needed, I'm pretty sure." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547596 (owner: 10Jforrester) [17:47:34] cdanis: Yeah, but unfortunately the kludge though reasonable also made other things much worse. [17:47:39] ah [17:47:49] sorry [17:48:05] cdanis: Because the patch also unconfigured labtestwiki as not a 'wikitech', which meant that a bunch of config got dropped. [17:48:42] Not your fault, we're re-using the wikitech config dblist as a database section dblist, which makes everything break when you want them stored on different DB clusters. [17:50:15] aaah [17:50:39] So in my patch I make the 'DB section' lists explicit by having them start with 's'. ;-) [17:50:45] #HighTechBodge. [17:52:23] (03PS2) 10Dzahn: gerrit: change gerrit master_host to gerrit1001, remove duplicate [puppet] - 10https://gerrit.wikimedia.org/r/545342 (https://phabricator.wikimedia.org/T222391) [17:53:33] (03CR) 10Dzahn: "re: the puppet compiler failure that looks like broken yaml: http://www.yamllint.com/ says it's valid, fwiw. 
but moving some stuff around " [puppet] - 10https://gerrit.wikimedia.org/r/545342 (https://phabricator.wikimedia.org/T222391) (owner: 10Dzahn) [17:54:14] (03CR) 10Dzahn: "https://puppet-compiler.wmflabs.org/compiler1003/19212/" [puppet] - 10https://gerrit.wikimedia.org/r/545342 (https://phabricator.wikimedia.org/T222391) (owner: 10Dzahn) [17:55:14] (03CR) 10Filippo Giunchedi: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/547519 (https://phabricator.wikimedia.org/T215904) (owner: 10Filippo Giunchedi) [17:55:57] (03PS5) 10Jbond: check_puppetrun: alert critical after 24 hours [puppet] - 10https://gerrit.wikimedia.org/r/546195 [17:59:34] (03PS3) 10Dzahn: gerrit: change gerrit master_host to gerrit1001, remove duplicate [puppet] - 10https://gerrit.wikimedia.org/r/545342 (https://phabricator.wikimedia.org/T222391) [17:59:38] (03PS6) 10Jbond: check_puppetrun: alert critical after 24 hours [puppet] - 10https://gerrit.wikimedia.org/r/546195 [18:00:04] MaxSem, RoanKattouw, Niharika, and Urbanecm: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Morning SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191031T1800). [18:00:04] isaacj: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:06] !log fdans@deploy1001 Started deploy [analytics/refinery@8ca04df]: deploying refinery [18:00:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:25] here [18:00:37] PROBLEM - Host ps1-b8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [18:01:01] (03CR) 10Paladox: [C: 03+1] gerrit: change gerrit master_host to gerrit1001, remove duplicate [puppet] - 10https://gerrit.wikimedia.org/r/545342 (https://phabricator.wikimedia.org/T222391) (owner: 10Dzahn) [18:01:11] I can SWAT today! 
[18:01:15] !log fdans@deploy1001 Finished deploy [analytics/refinery@8ca04df]: deploying refinery (duration: 01m 09s) [18:01:27] Also here (jouncebot left me out, but patch us scheduled) [18:01:29] *is [18:01:32] (03PS2) 10Urbanecm: Undeploy reader surveys in English, Polish, and Russian. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547339 (https://phabricator.wikimedia.org/T232525) (owner: 10Isaac Johnson) [18:01:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:01:37] Urbanecm: thanks much! [18:01:39] thanks Urbanecm : just some undeploying of surveys from me [18:01:42] AndyRussG: yup, just see it. [18:02:06] cool! :) [18:03:05] AndyRussG: +2'ed your backports, will let you know once it's ready to be tested [18:03:16] (03CR) 10Dzahn: [C: 03+2] "This cleans it up and the result is gerrit1001 rsync will for now only allow rsync from itself. https://puppet-compiler.wmflabs.org/compil" [puppet] - 10https://gerrit.wikimedia.org/r/545342 (https://phabricator.wikimedia.org/T222391) (owner: 10Dzahn) [18:03:19] Urbanecm: ok sounds good, thanks! [18:03:28] James_F: Hi. Can you help me understand https://phabricator.wikimedia.org/P9514 ? It's re. https://phabricator.wikimedia.org/diffusion/EXSL/browse/master/composer.json [18:03:34] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547339 (https://phabricator.wikimedia.org/T232525) (owner: 10Isaac Johnson) [18:03:43] composer file is okay, but somewhat phpcbf refuses to work [18:04:13] hauskater: You need to add a .phpcs.xml file. [18:04:26] right [18:04:40] Yeah, the author forgot to add one [18:04:45] (03Merged) 10jenkins-bot: Undeploy reader surveys in English, Polish, and Russian. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547339 (https://phabricator.wikimedia.org/T232525) (owner: 10Isaac Johnson) [18:04:49] and a .gitignore one as well [18:06:29] (03CR) 10Cwhite: [C: 03+1] "Looks good!" 
[puppet] - 10https://gerrit.wikimedia.org/r/547144 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli) [18:06:56] (03CR) 10Dzahn: "only changes on prod gerrit server were to rsyncd setup as expected" [puppet] - 10https://gerrit.wikimedia.org/r/545342 (https://phabricator.wikimedia.org/T222391) (owner: 10Dzahn) [18:07:07] isaacj: could you check it is correctly undeployed, please? [18:07:08] mwdebug1001 [18:07:21] Urbanecm: yep, thanks, will take a few minutes [18:07:22] (03PS4) 10Umherirrender: Switch to wmf specific run mode for $wgDisableQueryPageUpdate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/530871 (https://phabricator.wikimedia.org/T78711) [18:07:27] sure [18:07:32] (03CR) 10Cwhite: profile: get exim metrics from lists (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/546992 (https://phabricator.wikimedia.org/T236505) (owner: 10Cwhite) [18:09:09] Urbanecm: actually that was nice and quick. all looks good. thanks! [18:09:11] (03PS4) 10Cwhite: mtail,profile: add smtp metrics collection with mtail [puppet] - 10https://gerrit.wikimedia.org/r/546290 (https://phabricator.wikimedia.org/T236505) [18:09:59] isaacj: ack, deploying [18:10:23] (03CR) 10Dzahn: [C: 03+2] install_server: remove cobalt from DHCP and partman [puppet] - 10https://gerrit.wikimedia.org/r/545336 (https://phabricator.wikimedia.org/T236187) (owner: 10Dzahn) [18:11:14] AndyRussG: your patch is available to being tested at mwdebug1001, please test and let me know. [18:12:18] Urbanecm: ok one sec! 
[18:12:33] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: fe08fbb: Undeploy reader surveys in English, Polish, and Russian (T232525) (duration: 01m 02s) [18:12:36] sure AndyRussG [18:12:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:12:39] T232525: Repeat demographics surveys for longer time period - https://phabricator.wikimedia.org/T232525 [18:13:49] (03PS3) 10Dzahn: site: turn former Gerrit server into a spare system [puppet] - 10https://gerrit.wikimedia.org/r/545328 (https://phabricator.wikimedia.org/T236187) [18:16:15] Urbanecm: looks fine! I don't see the expected backend result (via Kafka stream on the analytics cluster) but there's also no breakage [18:16:39] It's a simple change, just an update to an EventLogging schema [18:16:45] Urbanecm: so let's go ahead and deploy please [18:19:39] AndyRussG: syncing [18:19:44] (03PS4) 10Dzahn: site: turn former Gerrit server into a spare system [puppet] - 10https://gerrit.wikimedia.org/r/545328 (https://phabricator.wikimedia.org/T236187) [18:20:01] !log urbanecm@deploy1001 Synchronized php-1.35.0-wmf.3/extensions/CentralNotice: SWAT: 963e963: Update CentralNoticeImpression scheme for campaign fallback (T236627) (duration: 01m 01s) [18:20:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:20:09] T236627: CentralNotice: Adapt impression event schema for campaign fallback - https://phabricator.wikimedia.org/T236627 [18:20:58] Urbanecm: thanks much! [18:21:25] !log urbanecm@deploy1001 Synchronized php-1.35.0-wmf.4/extensions/CentralNotice: SWAT: 3e5b33f: Update CentralNoticeImpression scheme for campaign fallback (T236627) (duration: 00m 55s) [18:21:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:34] AndyRussG: done [18:22:04] Urbanecm: cool! Thanks so much!! 
:) [18:22:10] you're welcome [18:22:17] !log Morning SWAT done [18:22:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:22:26] and Urbanecm: validated that deployment looks good more broadly. thanks again! [18:22:34] great! [18:24:09] Urbanecm: are you sure that worked? https://en.wikipedia.org/wiki/Special:Version [18:24:25] I don't see the updated code pointed to under CentralNotice on the version special page [18:26:39] looking [18:27:42] thanks! [18:27:58] (03PS7) 10Jbond: puppet git: add a descriptive config version [puppet] - 10https://gerrit.wikimedia.org/r/547505 (https://phabricator.wikimedia.org/T228854) [18:28:00] (03PS3) 10Jbond: motd: add the config version to the MOTD [puppet] - 10https://gerrit.wikimedia.org/r/547506 (https://phabricator.wikimedia.org/T228854) [18:28:46] Urbanecm: Hmmm I just checked another way, and it does seem to be updated [18:28:52] AndyRussG: good [18:29:01] On the background call to send the event from the client side, it is sending the new schema version [18:29:22] AndyRussG: I see the commit at the deployment machine, which means it should be synced to all servers too [18:29:30] (03PS8) 10Jbond: puppet git: add a descriptive config version [puppet] - 10https://gerrit.wikimedia.org/r/547505 (https://phabricator.wikimedia.org/T228854) [18:30:15] Urbanecm: yes I see it's a different mistake! [18:30:16] aaargh [18:30:24] which one? [18:30:24] (03CR) 10Jbond: "i updated the msg format, makes it a bit nicer for umans but possibly a bit harder to parse. let me know what you think?" [puppet] - 10https://gerrit.wikimedia.org/r/547505 (https://phabricator.wikimedia.org/T228854) (owner: 10Jbond) [18:30:45] Urbanecm: a mistake in the new schema version, so we're not getting the fix we expected [18:30:54] aha :-) [18:31:05] it's ok, just means we have to do this again later :( [18:31:10] many thanks in any case!!! [18:31:25] and sorry to for the waste of time!!! 
[18:31:37] np [18:34:31] :) [18:36:25] (03PS1) 10Ottomata: analytics.wikimedia.org - set CORS header for redirects too [puppet] - 10https://gerrit.wikimedia.org/r/547617 (https://phabricator.wikimedia.org/T235494) [18:37:05] (03PS2) 10Ottomata: analytics.wikimedia.org - set CORS header for redirects too [puppet] - 10https://gerrit.wikimedia.org/r/547617 (https://phabricator.wikimedia.org/T235494) [18:37:28] (03CR) 10Andrew Bogott: [C: 03+1] "I mostly don't understand this but I am nonetheless in favor :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547596 (owner: 10Jforrester) [18:37:35] (03CR) 10Andrew Bogott: [C: 03+1] Follow-up 0f90f506: Leave labtestwiki in the wikitech dblist for config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547597 (owner: 10Jforrester) [18:39:24] (03PS8) 10CDanis: WIP [puppet] - 10https://gerrit.wikimedia.org/r/547527 [18:39:33] PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [18:44:08] (03CR) 10EBernhardson: "* renamed role to search::airflow" [puppet] - 10https://gerrit.wikimedia.org/r/544989 (owner: 10EBernhardson) [18:45:38] (03PS5) 10Cwhite: mtail,profile: add smtp metrics collection with mtail [puppet] - 10https://gerrit.wikimedia.org/r/546290 (https://phabricator.wikimedia.org/T236505) [18:45:55] (03PS1) 10Dzahn: gerrit: allow rsync of home dirs for server migrations [puppet] - 10https://gerrit.wikimedia.org/r/547619 (https://phabricator.wikimedia.org/T236187) [18:46:05] (03PS3) 10EBernhardson: airflow: Add upstream configuration [puppet] - 10https://gerrit.wikimedia.org/r/544996 [18:46:07] (03PS8) 10EBernhardson: airflow: Initial deployment for search platform [puppet] - 10https://gerrit.wikimedia.org/r/544989 (https://phabricator.wikimedia.org/T236180) [18:46:09] (03PS9) 10EBernhardson: airflow: Run webserver and scheduler processes [puppet] - 10https://gerrit.wikimedia.org/r/544990 (https://phabricator.wikimedia.org/T236180) [18:46:23] (03CR) 10Cwhite: mtail,profile: add smtp metrics collection with mtail (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/546290 (https://phabricator.wikimedia.org/T236505) (owner: 10Cwhite) [18:46:33] (03CR) 10jerkins-bot: [V: 04-1] gerrit: allow rsync of home dirs for server migrations [puppet] - 10https://gerrit.wikimedia.org/r/547619 (https://phabricator.wikimedia.org/T236187) (owner: 10Dzahn) [18:51:11] (03PS2) 10Dzahn: gerrit: allow rsync of home dirs for server migrations [puppet] - 10https://gerrit.wikimedia.org/r/547619 (https://phabricator.wikimedia.org/T236187) [18:51:45] (03CR) 10jerkins-bot: [V: 04-1] gerrit: allow rsync of home dirs for server migrations [puppet] - 10https://gerrit.wikimedia.org/r/547619 (https://phabricator.wikimedia.org/T236187) (owner: 10Dzahn) [18:51:47] PROBLEM - Check the Netbox report librenms for fail status. 
on netbox1001 is CRITICAL: librenms.LibreNMS CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports [18:55:06] (03PS3) 10Dzahn: gerrit: allow rsync of home dirs for server migrations [puppet] - 10https://gerrit.wikimedia.org/r/547619 (https://phabricator.wikimedia.org/T236187) [18:58:25] (03CR) 10Ottomata: [C: 03+2] analytics.wikimedia.org - set CORS header for redirects too [puppet] - 10https://gerrit.wikimedia.org/r/547617 (https://phabricator.wikimedia.org/T235494) (owner: 10Ottomata) [19:00:04] brennen and twentyafterfour: #bothumor I � Unicode. All rise for Mediawiki train - American Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191031T1900). [19:03:35] twentyafterfour: all looks quiet, promoting to all wikis. [19:05:22] (03PS1) 10Brennen Bearnes: all wikis to 1.35.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547634 [19:05:24] (03CR) 10Brennen Bearnes: [C: 03+2] all wikis to 1.35.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547634 (owner: 10Brennen Bearnes) [19:06:15] (03Merged) 10jenkins-bot: all wikis to 1.35.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547634 (owner: 10Brennen Bearnes) [19:06:41] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:08:43] (03CR) 10Andrew Bogott: [C: 03+1] "lmk when you're ready to test this post-merge and I'll merge it. 
(Or you can test with a cherry-pick probably)" [puppet] - 10https://gerrit.wikimedia.org/r/547360 (https://phabricator.wikimedia.org/T236952) (owner: 10Alex Monk) [19:08:58] !log brennen@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.4 [19:09:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:09:36] (03CR) 10Andrew Bogott: [C: 03+1] toolforge: Remove direct TLS termination support from static-server [puppet] - 10https://gerrit.wikimedia.org/r/547363 (https://phabricator.wikimedia.org/T236952) (owner: 10Alex Monk) [19:09:38] (03CR) 10Volans: "Nice work. Most of my comments are just replies, just a couple of minor nits and I think this first version could be merged." (0312 comments) [software/httpbb] - 10https://gerrit.wikimedia.org/r/545689 (https://phabricator.wikimedia.org/T236699) (owner: 10RLazarus) [19:10:24] 10Operations, 10observability, 10serviceops, 10Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (10Krinkle) >>! In T234283#5619101, @gerritbot wrote: > Change 546219 **merged** by Effie Mouzeli:... 
[19:11:58] (03CR) 10Volans: "Last minute comment" (031 comment) [software/httpbb] - 10https://gerrit.wikimedia.org/r/545689 (https://phabricator.wikimedia.org/T236699) (owner: 10RLazarus) [19:21:41] (03CR) 10Alex Monk: "Confirmed working as a cherry-pick" [puppet] - 10https://gerrit.wikimedia.org/r/547360 (https://phabricator.wikimedia.org/T236952) (owner: 10Alex Monk) [19:22:37] (03PS2) 10Andrew Bogott: tools-static: Allow X-Forwarded-Proto: https header [puppet] - 10https://gerrit.wikimedia.org/r/547360 (https://phabricator.wikimedia.org/T236952) (owner: 10Alex Monk) [19:25:39] (03CR) 10Andrew Bogott: [C: 03+2] tools-static: Allow X-Forwarded-Proto: https header [puppet] - 10https://gerrit.wikimedia.org/r/547360 (https://phabricator.wikimedia.org/T236952) (owner: 10Alex Monk) [19:26:44] (03CR) 10Herron: [C: 03+1] motd: add the config version to the MOTD [puppet] - 10https://gerrit.wikimedia.org/r/547506 (https://phabricator.wikimedia.org/T228854) (owner: 10Jbond) [19:28:27] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [19:28:35] PROBLEM - OSPF status on cr2-codfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [19:28:37] (03CR) 10Herron: [C: 03+1] hieradata: set kafka-logging default to 6 partitions [puppet] - 10https://gerrit.wikimedia.org/r/547519 (https://phabricator.wikimedia.org/T215904) (owner: 10Filippo Giunchedi) [19:29:46] 10Operations, 10Analytics, 10Analytics-Kanban, 10Traffic, 10observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (10JAllemandou) Some maths about datasize increase using approximated ratios for values TLS-Version, Key-Exchange, Auth and Cipher from... 
[19:29:57] !log fdans@deploy1001 Started deploy [analytics/refinery@af91ce6]: deploying refinery, second attempt [19:30:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:30:13] the eqiad-codfw link down is unplanned [19:31:07] !log Homer push to ulsfo [19:31:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:33:21] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/19220/gerrit1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/547619 (https://phabricator.wikimedia.org/T236187) (owner: 10Dzahn) [19:33:37] 10Operations, 10observability, 10serviceops, 10Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (10jijiki) @Krinkle we are looking into it, tx [19:34:20] (03CR) 10Paladox: [C: 03+1] gerrit: allow rsync of home dirs for server migrations [puppet] - 10https://gerrit.wikimedia.org/r/547619 (https://phabricator.wikimedia.org/T236187) (owner: 10Dzahn) [19:35:42] (03CR) 10Ayounsi: [V: 03+2 C: 03+2] Make NTP servers a variable and update esams IPs [homer/public] - 10https://gerrit.wikimedia.org/r/547589 (owner: 10Ayounsi) [19:36:50] !log fdans@deploy1001 Finished deploy [analytics/refinery@af91ce6]: deploying refinery, second attempt (duration: 06m 53s) [19:36:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:40:12] (03PS1) 10Volans: devices: allow to expose arbitrary metadata [software/homer] - 10https://gerrit.wikimedia.org/r/547638 (https://phabricator.wikimedia.org/T228388) [19:40:14] (03PS1) 10Volans: Netbox: expose additional metadata [software/homer] - 10https://gerrit.wikimedia.org/r/547639 (https://phabricator.wikimedia.org/T228388) [19:40:36] (03PS11) 10MarcoAurelio: Initial configuration for ge.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/545909 (https://phabricator.wikimedia.org/T236389) 
[19:41:38] 10Operations, 10ops-eqiad: rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (10wiki_willy) a:05Joe→03Jclark-ctr Assigning to @Jclark-ctr since he's going to be taking care of the install, but @Joe - let us know if there are any specific racking instructions for these.... [19:41:40] 10Operations, 10Analytics, 10Analytics-Kanban, 10Traffic, 10observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (10Ottomata) I don't expect addition of these fields to really impact Kafka or much else! :) > I think representing those values in a... [19:44:51] (03PS2) 10Ayounsi: Fix esams + eqsin OSPF links errors [homer/public] - 10https://gerrit.wikimedia.org/r/547583 [19:49:17] (03CR) 10Volans: [C: 04-1] "We're in a bit of a weird situation. We've the assumption of junos in homer/__init__ although we shouldn't, but is also premature to gener" (032 comments) [software/homer] - 10https://gerrit.wikimedia.org/r/547523 (owner: 10Ayounsi) [19:49:25] !log rsyncing home dirs from previous gerrit server cobalt to gerrit1001 [19:49:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:49:57] !log Homer push to eqsin [19:50:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:51:09] 10Operations, 10ops-eqiad, 10DC-Ops: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 (10wiki_willy) a:05Cmjohnson→03RobH Cable reseated (clip was bent) by @jclark-ctr - reassigning back to @RobH for configuration. 
[19:53:44] (03PS9) 10CDanis: rsync: add option to TLS-wrap communications [puppet] - 10https://gerrit.wikimedia.org/r/547527 [19:54:56] (03PS10) 10CDanis: rsync: add option to TLS-wrap communications [puppet] - 10https://gerrit.wikimedia.org/r/547527 [19:55:55] 10Operations, 10ops-eqiad, 10DC-Ops: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 (10Jclark-ctr) Reseated cable fixed bent clip. ` # pmshell 1: ps1-a1-eqiad 2: ps1-a2-eqiad 3: ps1-a3-eqiad 4: ps1-a4-eqiad 5: ps1-a5-eqiad 6: ps1-a6-eqiad 7:... [19:56:06] (03CR) 10CDanis: "This is a no-op without a hiera change for rsync::server (and a param change for rsync::quickdatacopy)." [puppet] - 10https://gerrit.wikimedia.org/r/547527 (owner: 10CDanis) [20:00:19] (03CR) 10CDanis: "Oh, this also needs adjustments to fw rules, will do." [puppet] - 10https://gerrit.wikimedia.org/r/547527 (owner: 10CDanis) [20:01:22] XioNoX: looks like they emailed re: the eqiad/codfw link [20:01:29] yep! [20:07:04] !log Homer push to all asw* - new NTP servers - T237011 [20:07:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:07:10] T237011: Update DNS/NTP servers on the esams PDUs/SCS - https://phabricator.wikimedia.org/T237011 [20:10:55] ACKNOWLEDGEMENT - OSPF status on cr2-codfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP CDanis Zayo case TTN-0003664585 https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [20:10:55] ACKNOWLEDGEMENT - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: CDanis Zayo case TTN-0003664585 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [20:12:53] !log Homer push to all msw* - new NTP servers - T237011 [20:12:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:58] T237011: Update DNS/NTP servers on the esams PDUs/SCS - https://phabricator.wikimedia.org/T237011 
[20:22:24] 10Operations, 10Analytics, 10Analytics-Kanban, 10Traffic, 10observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (10Nuria) >If we could encode this in the same way as x-analytics, it'd be more easy to refine into a map in wmf.webrequest. +1 to this... [20:25:29] (03CR) 10CDanis: [C: 03+1] FNM, Fix bug causing recovery emails not to be sent [puppet] - 10https://gerrit.wikimedia.org/r/547538 (owner: 10Ayounsi) [20:26:21] (03CR) 10Ayounsi: [V: 03+2 C: 03+2] "Tested and match prod." [homer/public] - 10https://gerrit.wikimedia.org/r/547583 (owner: 10Ayounsi) [20:27:23] !log restarting logstash on logstash1008 to test level->severity filter selector [20:27:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:29:37] (03PS1) 10Ayounsi: Fix cr2-eqsin ospf stub interfaces [homer/public] - 10https://gerrit.wikimedia.org/r/547645 [20:29:39] (03PS1) 10Ayounsi: Remove esams asw and csw (decom) [homer/public] - 10https://gerrit.wikimedia.org/r/547646 [20:29:41] (03PS1) 10Ayounsi: Add term vmhost to cr loopback4 filter [homer/public] - 10https://gerrit.wikimedia.org/r/547647 (https://phabricator.wikimedia.org/T236598) [20:32:37] PROBLEM - Check the Netbox report librenms for fail status. 
on netbox1001 is CRITICAL: librenms.LibreNMS CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports [20:32:41] (03CR) 10Ayounsi: [C: 03+2] FNM, Fix bug causing recovery emails not to be sent [puppet] - 10https://gerrit.wikimedia.org/r/547538 (owner: 10Ayounsi) [20:32:51] (03PS2) 10Ayounsi: FNM, Fix bug causing recovery emails not to be sent [puppet] - 10https://gerrit.wikimedia.org/r/547538 [20:34:57] (03PS2) 10Dzahn: install_server: remove cobalt from DHCP and partman [puppet] - 10https://gerrit.wikimedia.org/r/545336 (https://phabricator.wikimedia.org/T236187) [20:35:50] !log Homer push to all cr2-eqdfw - new NTP servers, remove border-in4 term unused-ips, add (unused) BGP_Wikimedia_pops, re-order ospf interfaces [20:35:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:36:22] (03CR) 10Dzahn: [C: 03+2] install_server: remove cobalt from DHCP and partman [puppet] - 10https://gerrit.wikimedia.org/r/545336 (https://phabricator.wikimedia.org/T236187) (owner: 10Dzahn) [20:37:17] (03PS5) 10Dzahn: site: turn former Gerrit server into a spare system [puppet] - 10https://gerrit.wikimedia.org/r/545328 (https://phabricator.wikimedia.org/T236187) [20:51:24] (03PS6) 10Dzahn: site: turn former Gerrit server into a spare system [puppet] - 10https://gerrit.wikimedia.org/r/545328 (https://phabricator.wikimedia.org/T236187) [20:54:17] 10Operations, 10ops-esams, 10DC-Ops, 10Patch-For-Review: ESAMS Refresh/Rebuild (October 2019) - https://phabricator.wikimedia.org/T235805 (10Papaul) [20:57:23] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/19221/" [puppet] - 10https://gerrit.wikimedia.org/r/545328 (https://phabricator.wikimedia.org/T236187) (owner: 10Dzahn) [21:00:17] (03PS4) 10Dzahn: mariadb: remove cobalt from ferm_misc rules [puppet] - 10https://gerrit.wikimedia.org/r/545333 (https://phabricator.wikimedia.org/T236187) [21:00:52] 10Operations, 10MediaWiki-General, 10serviceops-radar, 10Performance-Team (Radar), 
and 2 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10Eevans) [21:02:33] (03CR) 10Dzahn: [C: 03+2] mariadb: remove cobalt from ferm_misc rules [puppet] - 10https://gerrit.wikimedia.org/r/545333 (https://phabricator.wikimedia.org/T236187) (owner: 10Dzahn) [21:03:34] (03PS1) 10Gergő Tisza: GrowthExperiments (beta-only): make GE use local search on beta enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547648 (https://phabricator.wikimedia.org/T236823) [21:03:58] 10Operations, 10MediaWiki-General, 10serviceops-radar, 10Performance-Team (Radar), and 2 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10Eevans) [21:05:51] (03PS1) 10Dzahn: remove production IPs for cobalt.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/547650 (https://phabricator.wikimedia.org/T236187) [21:07:03] (03PS11) 10CDanis: rsync: add option to TLS-wrap communications [puppet] - 10https://gerrit.wikimedia.org/r/547527 [21:09:13] (03CR) 10jerkins-bot: [V: 04-1] rsync: add option to TLS-wrap communications [puppet] - 10https://gerrit.wikimedia.org/r/547527 (owner: 10CDanis) [21:09:33] PROBLEM - logstash syslog TCP port on logstash1009 is CRITICAL: connect to address 127.0.0.1 and port 10514: Connection refused https://wikitech.wikimedia.org/wiki/Logstash [21:10:23] (03PS12) 10CDanis: rsync: add option to TLS-wrap communications [puppet] - 10https://gerrit.wikimedia.org/r/547527 [21:11:09] RECOVERY - logstash syslog TCP port on logstash1009 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 10514 https://wikitech.wikimedia.org/wiki/Logstash [21:12:30] (03CR) 10jerkins-bot: [V: 04-1] rsync: add option to TLS-wrap communications [puppet] - 10https://gerrit.wikimedia.org/r/547527 (owner: 10CDanis) [21:13:57] PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - logstash-syslog-tcp_10514: Servers 
logstash1008.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [21:14:13] PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - logstash-syslog-tcp_10514: Servers logstash1008.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [21:14:33] PROBLEM - logstash syslog TCP port on logstash1008 is CRITICAL: connect to address 127.0.0.1 and port 10514: Connection refused https://wikitech.wikimedia.org/wiki/Logstash [21:14:39] PROBLEM - LVS HTTP IPv4 #page on logstash.svc.eqiad.wmnet is CRITICAL: connect to address 10.2.2.36 and port 10514: Connection refused https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [21:15:36] yo [21:15:37] RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [21:15:49] looks like there was some ongoing work in logstash ? [21:15:51] RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [21:15:56] hey, I’m at the store not sure what is up with logstash [21:16:11] RECOVERY - logstash syslog TCP port on logstash1008 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 10514 https://wikitech.wikimedia.org/wiki/Logstash [21:16:12] rolling restarts perhaps ? shdubsh ? [21:16:16] RECOVERY - LVS HTTP IPv4 #page on logstash.svc.eqiad.wmnet is OK: TCP OK - 0.000 second response time on 10.2.2.36 port 10514 https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [21:16:31] sorry, that's me. been working on it [21:16:34] ah ha [21:16:41] ok! no pb [21:16:42] kk [21:16:47] <_joe_> shdubsh: we gift you a sticker, sir [21:16:57] ah! 
ok [21:17:04] (03PS1) 10Jdlrobson: Do not load page previews on Special:BlankPage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547653 (https://phabricator.wikimedia.org/T235797) [21:17:43] 10Operations, 10ops-eqiad, 10DC-Ops, 10serviceops: mw1239 memory errors - https://phabricator.wikimedia.org/T227867 (10Dzahn) still a problem i think: {P9515} [21:18:51] !log ppchelko@deploy1001 Started deploy [restbase/deploy@9cac9ac]: Bump Parsoid-PHP traffic mirroring to 50% T235902 [21:18:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:18:57] T235902: Tracking: Shadow Parsoid/PHP deployment to production cluster to handle mirrored reparse traffic - https://phabricator.wikimedia.org/T235902 [21:19:41] ack [21:21:06] (03CR) 10VolkerE: [C: 03+1] Do not load page previews on Special:BlankPage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547653 (https://phabricator.wikimedia.org/T235797) (owner: 10Jdlrobson) [21:22:11] (03CR) 10Krinkle: [C: 04-1] "I assume Blankpage is used as example of what an article loads at minimum. Unless Popups becomes like TMH (in that, it only loads on some " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547653 (https://phabricator.wikimedia.org/T235797) (owner: 10Jdlrobson) [21:22:58] (03PS1) 10Cwhite: profile: select on severity and lowercase err [puppet] - 10https://gerrit.wikimedia.org/r/547654 (https://phabricator.wikimedia.org/T234283) [21:25:13] !log setting up ps1-b8-eqiad per T227543. 
it will reboot twice in the next 15 minutes, and then should start to clear up in icinga [21:25:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:25:19] T227543: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 [21:26:01] (03CR) 10Cwhite: [C: 03+2] profile: select on severity and lowercase err [puppet] - 10https://gerrit.wikimedia.org/r/547654 (https://phabricator.wikimedia.org/T234283) (owner: 10Cwhite) [21:27:11] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2052.codfw.wmnet - https://phabricator.wikimedia.org/T230883 (10Papaul) [21:29:23] RECOVERY - Host ps1-b8-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.74 ms [21:30:52] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2060.codfw.wmnet - https://phabricator.wikimedia.org/T231625 (10Papaul) [21:31:12] 10Operations, 10LDAP-Access-Requests: Add bawolff to either NDA or WMF ldap group - https://phabricator.wikimedia.org/T236636 (10Dzahn) "bawolff present in privileged LDAP group (wmf),but not present in data.yaml" ^ users in LDAP groups must also exist in the puppet admin module, please add them there in the... 
[21:32:35] !log ppchelko@deploy1001 Finished deploy [restbase/deploy@9cac9ac]: Bump Parsoid-PHP traffic mirroring to 50% T235902 (duration: 13m 44s) [21:32:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:32:40] T235902: Tracking: Shadow Parsoid/PHP deployment to production cluster to handle mirrored reparse traffic - https://phabricator.wikimedia.org/T235902 [21:32:47] jouncebot: now [21:32:47] No deployments scheduled for the next 1 hour(s) and 27 minute(s) [21:32:50] jouncebot: next [21:32:50] In 1 hour(s) and 27 minute(s): Evening SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191031T2300) [21:34:08] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2053.codfw.wmnet - https://phabricator.wikimedia.org/T231407 (10Papaul) [21:36:03] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: Decommission db2059.codfw.wmnet - https://phabricator.wikimedia.org/T230884 (10Papaul) [21:41:05] (03CR) 10RobH: [C: 03+2] ps1-b8-eqiad update for monitoring [puppet] - 10https://gerrit.wikimedia.org/r/547657 (https://phabricator.wikimedia.org/T227543) (owner: 10RobH) [21:41:46] 10Operations, 10ops-esams: Terminate OE10,11,12,13 Racks - https://phabricator.wikimedia.org/T237055 (10wiki_willy) [21:42:26] 10Operations, 10ops-esams: Terminate OE10,11,12,13 Racks - https://phabricator.wikimedia.org/T237055 (10wiki_willy) [21:42:28] 10Operations, 10ops-esams, 10DC-Ops, 10Patch-For-Review: ESAMS Refresh/Rebuild (October 2019) - https://phabricator.wikimedia.org/T235805 (10wiki_willy) [21:45:33] 10Operations, 10ops-esams: Terminate OE10,11,12,13 Racks - https://phabricator.wikimedia.org/T237055 (10wiki_willy) Emailed Jim Buatti last Tuesday to provide overview of what we're trying to do with the contract (have OE14,15,16 renew in May 2021 and terminate OE10,11,12,13 in Nov 2020 with clause to term ear... 
[21:45:51] RECOVERY - Keyholder SSH agent on deploy1001 is OK: OK: Keyholder is armed with all configured keys. https://wikitech.wikimedia.org/wiki/Keyholder [21:46:07] !log deploy1001 - move apach2modsec deployment key out of keyholder dir, keyholder arm to reload all other deployment keys including the new one for design (T235677) [21:46:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:46:12] T235677: Automatic pickup of Gerrit clone master doesn't happen (due to git-lfs not installed on production misc) - https://phabricator.wikimedia.org/T235677 [21:46:54] (03PS1) 10Papaul: DNS: Remove mgmt DNS for db2052,db2053.db2059 and db2060 [dns] - 10https://gerrit.wikimedia.org/r/547663 [21:48:02] (03PS13) 10CDanis: rsync: add option to TLS-wrap communications [puppet] - 10https://gerrit.wikimedia.org/r/547527 [21:49:29] (03CR) 10Papaul: [C: 03+2] DNS: Remove mgmt DNS for db2052,db2053.db2059 and db2060 [dns] - 10https://gerrit.wikimedia.org/r/547663 (owner: 10Papaul) [21:49:34] !log deploy1001 keyholder restart, keyholder arm ... [21:49:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:59:49] !log deploy1001 - recreating deploy_design deployment key as ED25519 and with the correct comment (the comment matters and must match path to the file for keyholder) (T235677) [21:59:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:59:54] T235677: Automatic pickup of Gerrit clone master doesn't happen (due to git-lfs not installed on production misc) - https://phabricator.wikimedia.org/T235677 [22:00:09] PROBLEM - Keyholder SSH agent on deploy1001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. 
https://wikitech.wikimedia.org/wiki/Keyholder [22:00:16] ^ me [22:01:14] twentyafterfour: ^ /etc/keyholder.d/deploy_design (ED25519) [22:01:21] it shows up in "status" now [22:02:19] except the old key is also still in one part of it [22:02:36] restarting keyholder one more time [22:03:15] ok, now it's cleaned up and lgtm [22:03:34] ok should I try to deploy? [22:03:40] yes please [22:04:11] RECOVERY - Keyholder SSH agent on deploy1001 is OK: OK: Keyholder is armed with all configured keys. https://wikitech.wikimedia.org/wiki/Keyholder [22:04:13] !log twentyafterfour@deploy1001 Started deploy [design/style-guide@4d8d085]: testing deploy_design [22:04:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:04:31] rescheduling service check.. should recover now [22:04:36] 22:04:13 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'design/style-guide', '-g', 'default', 'fetch', '--refresh-config'] on bromine.eqiad.wmnet returned [255]: Permission denied (publickey,keyboard-interactive). [22:04:44] mutante: ^ [22:05:01] do we need to run puppet on the target to get the authorized_keys updated? [22:05:16] yes, we do. running on bromine [22:05:20] bromine.eqiad.wmnet and vega.codfw.wmnet [22:05:35] bromine done. content changed. 
[22:05:40] cool [22:05:43] !log twentyafterfour@deploy1001 Finished deploy [design/style-guide@4d8d085]: testing deploy_design (duration: 01m 30s) [22:05:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:07:07] also on vega: [22:07:10] IOError: [Errno 13] Permission denied: '/srv/deployment/design/style-guide-cache/.config' [22:07:26] there might be an ownership change needed [22:07:38] !log twentyafterfour@deploy1001 Started deploy [design/style-guide@4d8d085]: testing deploy_design [22:07:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:07:58] IOError: [Errno 13] Permission denied: '/srv/deployment/design/style-guide-cache/.config' [22:08:01] 22:07:38 ERROR - deploy-local failed: [Errno 13] Permission denied: '/srv/deployment/design/style-guide-cache/.config' [22:08:09] yeah same io error on both servers now but at least ssh is working [22:08:29] ok.. first of all.. glad that SSH part is solved [22:08:34] it's probably owned by deploy_service? [22:08:38] then the permissions.. why did that even happen.. 
hrmm [22:08:43] looks [22:09:17] I think scap automatically deploys and it sets permissions according to the way the keys are configured, so it's probably been auto-deployed with deploy_service [22:09:20] i understand [22:09:31] it's deploy-service from the group previously used [22:09:36] right [22:12:14] !log bromine sudo find /srv/deployment/design/ -uid 498 -exec chown deploy-design:deploy-design {} \; [22:12:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:12:40] probably need the same on vega [22:12:45] done [22:12:45] !log twentyafterfour@deploy1001 deploy aborted: testing deploy_design (duration: 05m 07s) [22:12:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:12:57] !log vega sudo find /srv/deployment/design/ -uid 498 -exec chown deploy-design:deploy-design {} \; [22:13:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:14:10] (03CR) 10Jdlrobson: "If you are using Blankpage as an example of what an /article/ loads at minimum, this is severely flawed. It seems that the multimedia view" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547653 (https://phabricator.wikimedia.org/T235797) (owner: 10Jdlrobson) [22:14:14] ok testing again [22:14:24] at least the UIDs were the same on both.. that's being lucky :p [22:14:31] !log twentyafterfour@deploy1001 Started deploy [design/style-guide@4d8d085]: testing deploy_design [22:14:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:14:37] !log twentyafterfour@deploy1001 Finished deploy [design/style-guide@4d8d085]: testing deploy_design (duration: 00m 06s) [22:14:38] (03CR) 10Jdlrobson: "(other extensions do similar things to MultimediaViewer e.g. check for the existence of collapsible elements)." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/547653 (https://phabricator.wikimedia.org/T235797) (owner: 10Jdlrobson) [22:14:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:14:42] success!!! [22:14:45] yay:) [22:14:51] Volker_E: ^ [22:15:15] big *phew* [22:15:26] so now there is one thing to merge [22:15:31] yep [22:15:38] that is actually switching the docroot to the deployment_dir [22:15:48] i can do that right now [22:15:53] if you think it's fine [22:16:04] or another time if you want to schedule when it gets released [22:16:09] or want to check anything else first [22:16:30] Volker_E: ^ [22:16:30] mutante: that is a question for me? [22:16:48] yea, well.. for both of you [22:16:55] is there something you want to check is there [22:17:08] do you just want it to switch to the new version ASAP? [22:17:08] !log Homer push to cr1/2-codfw [22:17:10] mutante: we still have the different repos in different entry points, style guide in /style-guide/ [22:17:10] ? [22:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:18:12] Volker_E: yeah, you can see what is deployed by looking at files on deploy1001:/srv/deployment/design/style-guide [22:18:40] Volker_E: design/style-guide changed. design/strategy and design/landing-page are deployed like before.. 
so far [22:18:49] the change i mean is https://gerrit.wikimedia.org/r/c/operations/puppet/+/546254/1/modules/profile/manifests/microsites/design.pp [22:19:14] it changes /srv/org/wikimedia/design-style-guide to link to /srv/deployment/design/style-guide [22:19:27] the first path is the webserver docroot and the second where you deploy to [22:19:41] hmm my terminal session with deploy got stuck [22:23:12] Volker_E: probably just timeout after you were disconnected from the Internet for a short time [22:24:01] in case I'm gone I go through an area of low connectivity soonish [22:24:19] ah, on a train [22:26:05] no, Cali road, heart of all the Internets money and soul [22:26:40] lol [22:29:43] PROBLEM - Keyholder SSH agent on deploy1001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder [22:29:49] what... [22:30:54] i can't even confirm that is the case. keyholder status shows them all [22:32:36] the check is too smart. it detects that i moved the other key with the issue out of the way.. [22:33:03] (03CR) 10VolkerE: [C: 03+1] "Couldn't check the latest files with scap deployment, but it's current GitHub (cloned) master, it's fine." [puppet] - 10https://gerrit.wikimedia.org/r/546254 (owner: 1020after4) [22:34:32] Volker_E: it's cloned from gerrit not github [22:35:07] do you want to push to gerrit to make sure it's up to date? [22:35:12] (03PS1) 10Ayounsi: Fix OSPF metrics to match devices config [homer/public] - 10https://gerrit.wikimedia.org/r/547671 [22:35:20] twentyafterfour: Sure, but Gerrit itself is a clone to GitHub [22:35:30] which I did just yesterday [22:35:39] ok then we should be up to date [22:35:39] and everything in master is production-ready [22:35:56] is there a way for us to have the deployment automated [22:36:10] without me needing to login and deploy manually after pushing to Gerrit? [22:36:21] or is this not welcomed? 
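Editor's note: the docroot switch discussed above is done through the linked Puppet change. Purely as an illustration of the idea (the webserver docroot becomes a symlink to the deployment directory), here is a minimal Python sketch. The function name `point_docroot_at` and the `.new` temporary suffix are assumptions made for this example, not anything taken from the Puppet manifest.

```python
import os


def point_docroot_at(docroot, deploy_dir):
    """Make `docroot` a symlink to `deploy_dir`.

    The link is created under a temporary name and then renamed over
    `docroot`, so an existing symlink is replaced in one step. This fails
    if `docroot` is currently a real directory; that case would have to be
    moved aside first.
    """
    tmp_link = docroot + ".new"  # hypothetical temporary name
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an earlier failed run
    os.symlink(deploy_dir, tmp_link)
    os.replace(tmp_link, docroot)


# Example call with the paths mentioned in the chat (not executed here):
# point_docroot_at("/srv/org/wikimedia/design-style-guide",
#                  "/srv/deployment/design/style-guide")
```

Creating the link under a temporary name and renaming it avoids a window in which the docroot path does not exist.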
[22:36:57] Volker_E: no scap deployments are done manually though in the future we should eventually have a deployment pipeline with automated continuous deployment [22:37:01] that's a little way off still [22:37:05] it's kind of the point that a human checks it for sanity before it hits production [22:38:14] we're going for +2 requirement on GitHub master normally (only minor changes and the last 2 weeks have been an exception for this) [22:39:28] and we had master branch being transformed into GitHub pages production state automatically before switching over to Foundation infrastructure [22:40:13] the thing is, we don't trust github enough to have automated deployment from there. [22:41:39] i need to do something about the keyholder .. brb [22:42:01] sure, sure [22:42:06] replaces key for "apache2modsec" [22:42:42] what I'm saying is I still do the get the recent clone master to my machine, have another look, push it manually to Gerrit [22:42:46] and from there on [22:43:55] also, is the scap deployment env Apache, I need to do a 302 redirect [22:43:56] ? [22:44:06] (03PS1) 10Bstorm: newk8s: adjust things to be compatible with migration to the new cluster [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/547676 (https://phabricator.wikimedia.org/T236202) [22:44:18] when we get the deployment pipeline done then it should be possible to have things deployed automatically after passing tests [22:44:35] ok, makes sense [22:44:55] yes it is apache afaik [22:45:16] Volker_E: hi, could you please review my patches ? https://gerrit.wikimedia.org/r/#/c/546791/ , https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/GlobalBlocking/+/546792/ thanks. [22:45:17] (03CR) 10BryanDavis: Docker-images: create new docker images based on buster. 
(032 comments) [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/543124 (https://phabricator.wikimedia.org/T230961) (owner: 10Phamhi) [22:45:49] Volker_E: I could make a script for you that automates the deployment to basically just one command that you can run [22:46:03] rxy: Did you provide screenshots on the task, that would help for others in comparison in case voices are raised afterwards [22:46:52] twentyafterfour: less process more win (that'd be helpful and help save time for more important things) [22:47:08] k, will do [22:48:46] (03PS1) 10Ayounsi: Add BGP_from_core_LVS policy [homer/public] - 10https://gerrit.wikimedia.org/r/547678 (https://phabricator.wikimedia.org/T167841) [22:49:09] Volker_E: yes, it's apache. that did not change with the deployment method [22:49:21] mutante: thx [22:49:45] mutante: .htaccess directive for 302 is accepted? [22:51:04] Volker_E: please put the directives directly in the apache config. the file is at puppet/modules/profile/templates/design/design.wikimedia.org-httpd.erb [22:51:21] !log Homer push to cr1/2-eqiad [22:51:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:51:48] Volker_E: same syntax and all ... [22:51:57] mutante: good that I've asked, thx :) [22:52:39] RedirectMatch 301 /style-guide/(w|wiki)/(.*) "/style-guide/" [22:58:16] (03CR) 10Ayounsi: [V: 03+2 C: 03+2] Fix cr2-eqsin ospf stub interfaces [homer/public] - 10https://gerrit.wikimedia.org/r/547645 (owner: 10Ayounsi) [22:58:35] (03CR) 10Ayounsi: [V: 03+2 C: 03+2] Remove esams asw and csw (decom) [homer/public] - 10https://gerrit.wikimedia.org/r/547646 (owner: 10Ayounsi) [22:59:57] RECOVERY - Keyholder SSH agent on deploy1001 is OK: OK: Keyholder is armed with all configured keys. https://wikitech.wikimedia.org/wiki/Keyholder [23:00:04] MaxSem, RoanKattouw, Niharika, and Urbanecm: My dear minions, it's time we take the moon! Just kidding. Time for Evening SWAT(Max 6 patches) deploy. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191031T2300). [23:00:04] MatmaRex, kemayo, and Urbanecm: A patch you scheduled for Evening SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:14] !log replacing deployment keys for apache2secmod ; re-arming keyholder on deployment server [23:00:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:00:33] I can SWAT today! [23:00:45] hello [23:01:00] hi MatmaRex [23:03:06] MatmaRex: are your two changes dependant on each other? [23:03:21] no [23:03:31] ok [23:03:50] (03PS1) 10RobH: new *.wmflabs.org certificate [puppet] - 10https://gerrit.wikimedia.org/r/547680 (https://phabricator.wikimedia.org/T237066) [23:04:10] Urbanecm: Hi again! [23:05:18] (03CR) 10Urbanecm: [C: 03+2] Re-enable mobile editor A/B testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/546724 (https://phabricator.wikimedia.org/T236337) (owner: 10DLynch) [23:05:28] (03PS2) 10Urbanecm: Re-enable mobile editor A/B testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/546724 (https://phabricator.wikimedia.org/T236337) (owner: 10DLynch) [23:05:42] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/546724 (https://phabricator.wikimedia.org/T236337) (owner: 10DLynch) [23:06:30] (03Merged) 10jenkins-bot: Re-enable mobile editor A/B testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/546724 (https://phabricator.wikimedia.org/T236337) (owner: 10DLynch) [23:06:32] hi AndyRussG [23:07:12] MatmaRex: could you test your patch at mwdebug1001? 
[23:07:28] (only config) [23:07:38] yeah [23:07:56] (03CR) 10Urbanecm: [C: 03+2] Change bawiki logo to an anniversary one [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547566 (https://phabricator.wikimedia.org/T237035) (owner: 10Urbanecm) [23:08:39] (03Merged) 10jenkins-bot: Change bawiki logo to an anniversary one [mediawiki-config] - 10https://gerrit.wikimedia.org/r/547566 (https://phabricator.wikimedia.org/T237035) (owner: 10Urbanecm) [23:08:47] MatmaRex: okay [23:10:31] PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [23:11:26] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: SWAT: 54ee973: Change bawiki logo to an anniversary one (T237035) (duration: 00m 53s) [23:11:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:11:32] T237035: Change Bashkir Wikipedia (bawiki) project logo to an anniversary one - https://phabricator.wikimedia.org/T237035 [23:11:33] (03CR) 10Ayounsi: [V: 03+2 C: 03+2] Fix OSPF metrics to match devices config [homer/public] - 10https://gerrit.wikimedia.org/r/547671 (owner: 10Ayounsi) [23:13:48] !log Purge https://en.wikipedia.org/static/images/project-logos/bawiki* (T237035) [23:13:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:14:56] Urbanecm: sorry, give me a minute more, i just realized [23:15:00] i was testing it wrong [23:15:01] :) [23:15:13] MatmaRex: certainly [23:17:55] AndyRussG: your patch is at mwdebug1001, please test and let me know [23:18:46] Urbanecm: testing, thanks! [23:18:57] MatmaRex: and your backport is there as well [23:19:53] Urbanecm: the config patch looks good [23:20:21] MatmaRex: great, syncing [23:20:26] RoanKattouw: are you around? [23:20:36] urandom: Yeah what's up [23:20:38] (03CR) 10Cwhite: "I'm curious to know how this could end up in prometheus. 
It doesn't look like a problem except for the potential for numerous time series:" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/547505 (https://phabricator.wikimedia.org/T228854) (owner: 10Jbond) [23:20:50] I guess we're missing that last bullet point on https://phabricator.wikimedia.org/T222851 [23:21:00] I almost closed the issue before realizing that [23:21:22] RoanKattouw: making the echomarkseen API request a GET [23:21:46] Urbanecm: looks fine, thanks much! [23:21:50] Urbanecm: the other patch looks good as well [23:21:57] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 02bf4b8: Re-enable mobile editor A/B testing (T236337) (duration: 00m 52s) [23:22:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:22:02] T236337: Create new way to bucket contributors - https://phabricator.wikimedia.org/T236337 [23:22:03] AndyRussG: and MatmaRex: Good, will sync [23:22:23] RoanKattouw: and, if we're going to do that, the window to get it out on a train this year is limited [23:22:28] urandom: Ugh oh that's right [23:22:58] It's also a nontrivial thing and I have other "day job" stuff that we're trying to get onto next week's train [23:23:12] what makes it nontrivial? [23:23:16] How bad is it if we do that next month, after the November train apocalypse? [23:23:26] I'm not going to TechConf so I'll have some time in mid-Nov [23:23:48] I don't think it'll be super complicated, but it won't be 5 lines like the other things we did [23:23:59] Hmm I suppose we could literally make it a GET instead of a POST and keep everything else the same... [23:24:22] But I'd rather consoliate the requests [23:24:31] I can probably do this in mid-November (2-3 weeks from now) if that's OK? 
[23:24:32] !log urbanecm@deploy1001 Synchronized php-1.35.0-wmf.4/extensions/VisualEditor/: SWAT: 3686b82: Revert "Parse relative hrefs on image nodes like on regular links" (T237040) (duration: 00m 53s) [23:24:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:24:37] T237040: Visual editor changing automatically something in text while saving - https://phabricator.wikimedia.org/T237040 [23:24:45] (03CR) 10Cwhite: "A quick search says that reducing the number of partitions is unsupported. Is that out of date info though?" [puppet] - 10https://gerrit.wikimedia.org/r/547519 (https://phabricator.wikimedia.org/T215904) (owner: 10Filippo Giunchedi) [23:25:02] RoanKattouw: so before the window shuts in December? [23:25:29] RoanKattouw: I'll pass it along and see what everyone says, but just to be clear [23:26:09] When does the window close? [23:26:12] I heard rumors of a Thanksgiving train [23:26:21] But then when do deploys end in December? [23:26:31] I was just trying to find that... [23:26:41] !log urbanecm@deploy1001 Synchronized php-1.35.0-wmf.4/extensions/CentralNotice/extension.json: SWAT: dcd3ec3: Fix error in CentralNoticeImpression schema (T236627) (duration: 00m 51s) [23:26:41] Not sure if RelEng has made public statements about that yet [23:26:53] RoanKattouw: there was sth at the ops list [23:26:55] But assuming that there's a train in, say, the first week of Dec, let's aim for then? 
[23:27:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:27:32] RoanKattouw: https://wikitech.wikimedia.org/wiki/Deployments#Upcoming [23:27:33] T236627: CentralNotice: Adapt impression event schema for campaign fallback - https://phabricator.wikimedia.org/T236627 [23:27:43] RoanKattouw: OK [23:28:13] (03CR) 10Volans: rotatedump: Change to overwriting the daily timestamp dump rather than hour timestamps (031 comment) [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/546241 (owner: 10CRusnov) [23:28:16] RoanKattouw: I'll pass that along (I think that should be fine though) [23:28:31] AndyRussG: your patch is also deployed [23:28:47] Oh so we've got until Dec 23 [23:28:48] Yeah that should be fine [23:29:47] (03PS23) 10Cwhite: lvs, prometheus, profile: add swagger exporter jobs [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870) [23:30:57] (03CR) 10Cwhite: "This change is ready for review." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite) [23:31:22] Urbanecm: ok thanks! yeah I see it there [23:31:34] AndyRussG: great [23:32:10] Urbanecm: and working as expected this time!!! Thanks again :) [23:32:18] happy to help [23:33:36] !log Evening SWAT done [23:33:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:33:50] (03PS2) 10Bstorm: newk8s: adjust things to be compatible with migration to the new cluster [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/547676 (https://phabricator.wikimedia.org/T236202) [23:37:31] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state