[01:01:35] RECOVERY - Check systemd state on mwmaint1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:02:39] RECOVERY - Check the last execution of mediawiki_job_parser_cache_purging on mwmaint1002 is OK: OK: Status of the systemd unit mediawiki_job_parser_cache_purging https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[01:28:52] Operations, Commons, MediaWiki-File-management, Thumbor, Traffic: 500, Internal Server Error on Commons for images at specified size - https://phabricator.wikimedia.org/T250211 (Krinkle)
[04:47:08] * kart_ updating cxserver..
[04:55:08] OK. rebase will take time it seems..
[04:55:44] Operations, DBA: move db1114 to s8 - https://phabricator.wikimedia.org/T250224 (Marostegui) p:Triage→High
[04:55:56] (PS3) Marostegui: install_server: Allow reimage of labsdb1011 [puppet] - https://gerrit.wikimedia.org/r/587628 (https://phabricator.wikimedia.org/T249188)
[04:57:33] (CR) Marostegui: [C: +2] install_server: Allow reimage of labsdb1011 [puppet] - https://gerrit.wikimedia.org/r/587628 (https://phabricator.wikimedia.org/T249188) (owner: Marostegui)
[05:00:41] (PS1) Marostegui: mariadb: Move db1114 to s8 [puppet] - https://gerrit.wikimedia.org/r/588831 (https://phabricator.wikimedia.org/T250224)
[05:02:22] (CR) Marostegui: [C: +2] mariadb: Move db1114 to s8 [puppet] - https://gerrit.wikimedia.org/r/588831 (https://phabricator.wikimedia.org/T250224) (owner: Marostegui)
[05:02:27] (PS2) KartikMistry: Update cxserver to 2020-04-13-094138-production [deployment-charts] - https://gerrit.wikimedia.org/r/588540 (https://phabricator.wikimedia.org/T239459)
[05:05:21] (PS1) Marostegui: install_server: Allow reimage db1114 [puppet] - https://gerrit.wikimedia.org/r/588832 (https://phabricator.wikimedia.org/T250224)
[05:07:29] !log Remove db1114 from tendril - T250224
[05:07:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:07:36] T250224: move db1114 to s8 - https://phabricator.wikimedia.org/T250224
[05:07:44] (CR) Marostegui: [C: +2] install_server: Allow reimage db1114 [puppet] - https://gerrit.wikimedia.org/r/588832 (https://phabricator.wikimedia.org/T250224) (owner: Marostegui)
[05:08:31] marostegui: OK to deploy cxserver now or anything going on deploy1001?
[05:08:42] kart_: yep!
[05:08:50] kart_: good to deploy :)
[05:08:55] cool.
[05:09:01] (CR) KartikMistry: [C: +2] Update cxserver to 2020-04-13-094138-production [deployment-charts] - https://gerrit.wikimedia.org/r/588540 (https://phabricator.wikimedia.org/T239459) (owner: KartikMistry)
[05:09:18] (Merged) jenkins-bot: Update cxserver to 2020-04-13-094138-production [deployment-charts] - https://gerrit.wikimedia.org/r/588540 (https://phabricator.wikimedia.org/T239459) (owner: KartikMistry)
[05:11:19] !log kartik@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
[05:11:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:13:27] Operations, Patch-For-Review: Upgrade install servers to Buster - https://phabricator.wikimedia.org/T224576 (Marostegui) Looks like the puppet issue isn't yet fixed, no? (I thought it was, so commenting here just to make sure) I pushed a change to netboot.cfg which does show on a puppet run on install10...
[05:13:52] !log kartik@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
[05:13:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:14:57] Operations, DBA: move db1114 to s8 - https://phabricator.wikimedia.org/T250224 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1114.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202004150514_marostegui_23354.log`.
[05:17:37] !log kartik@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
[05:17:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:21:24] !log Remove db1114 from tendril and zarcillo T250224
[05:21:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:21:30] T250224: move db1114 to s8 - https://phabricator.wikimedia.org/T250224
[05:22:24] !log Update cxserver to 2020-04-13-094138-production (T239459, T249469)
[05:22:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:22:31] T239459: service-runner apps (wikifeeds/cxserver at the least) running on kubernetes emit logs with log level 50 - https://phabricator.wikimedia.org/T239459
[05:22:31] T249469: Change default Chinese MT service to Google Translate - https://phabricator.wikimedia.org/T249469
[05:24:15] Operations, CX-cxserver, Product-Infrastructure-Team-Backlog, Wikifeeds, and 4 others: service-runner apps (wikifeeds/cxserver at the least) running on kubernetes emit logs with log level 50 - https://phabricator.wikimedia.org/T239459 (KartikMistry) Verified: cxserver is now using named levels.
[05:25:20] !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime
[05:25:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:27:46] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[05:27:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:30:35] Operations, Core Platform Team, Traffic, serviceops, Performance-Team (Radar): Reduce rate of purges emitted by MediaWiki - https://phabricator.wikimedia.org/T250205 (Joe) a:Joe→None >>! In T250205#6056793, @daniel wrote: > @Joe You are assigned to this ticket, is this something you a...
[05:31:30] Operations, DBA: move db1114 to s8 - https://phabricator.wikimedia.org/T250224 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1114.eqiad.wmnet'] ` and were **ALL** successful.
[05:37:03] (CR) Elukey: [C: +2] Set 10G for the JVM's young gen size to cloudelastic-chi [puppet] - https://gerrit.wikimedia.org/r/588753 (https://phabricator.wikimedia.org/T231517) (owner: Elukey)
[05:41:09] (CR) Elukey: [C: +1] "Amazing, thanks a lot for all this work! \o/" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/588760 (https://phabricator.wikimedia.org/T224454) (owner: CDanis)
[05:45:06] Operations, DBA, Traffic, serviceops: Audit and harmonize timeouts across the stack - https://phabricator.wikimedia.org/T250251 (Joe)
[05:49:45] !log installing git security updates
[05:49:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:04:05] !log update to ats 8.0.7-rc0-1wm2 on cp[5006,5012] - T249335
[06:04:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:04:11] T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[06:14:57] Operations, DBA: move db1114 to s8 - https://phabricator.wikimedia.org/T250224 (Marostegui) I am populating this host from s8's snapshot from yesterday (600GB) already transferred.
[06:18:49] (CR) Elukey: "Added some comments! What are the hosts involved? It would be great to start using the puppet compiler to see diffs etc.." (7 comments) [puppet] - https://gerrit.wikimedia.org/r/588752 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[06:32:31] !log set uRPF log action back to log infra wide - T244147
[06:32:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:33:03] (CR) Hashar: "We talked yesterday about setting the target to /var/lib/zuul/.ssh/known_hosts as it is right now on contint1001. But I rather have it in" [puppet] - https://gerrit.wikimedia.org/r/588708 (https://phabricator.wikimedia.org/T224591) (owner: Hashar)
[06:42:30] !log Deploy schema change on labswiki - T250057
[06:42:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:42:36] T250057: type_acton index in logging table is lingering in production - https://phabricator.wikimedia.org/T250057
[06:43:08] !log re-set asw2-c-eqiad's licenses
[06:43:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:54] !log Deploy schema change on labtestwiki - T250057
[06:43:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:47:28] (CR) Muehlenhoff: role::mail::mx: enable jumpcloud test domain (1 comment) [puppet] - https://gerrit.wikimedia.org/r/588425 (https://phabricator.wikimedia.org/T244792) (owner: Jbond)
[06:47:52] !log Deploy schema change on s6 codfw with replication - T250057
[06:47:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:00] T250057: type_acton index in logging table is lingering in production - https://phabricator.wikimedia.org/T250057
[06:52:10] Operations, Patch-For-Review: Upgrade install servers to Buster - https://phabricator.wikimedia.org/T224576 (Dzahn) @Marostegui Please go to apt1001.wikimedia.org and you should see your change there. The "preseed" profile which includes /srv/autoinstall files is applied on the same hosts that have the A...
[06:52:36] marostegui: ^ your change should be on apt1001
[06:53:21] mutante: yeah, it was on apt1001 indeed
[06:53:37] marostegui: i should delete those files from install1003 and leave a message instead.. doing that
[06:53:56] Operations, Patch-For-Review: Upgrade install servers to Buster - https://phabricator.wikimedia.org/T224576 (Marostegui) Yeah, it was applied there indeed. Thanks for the clarification!
[06:53:57] yeah, that's useful
[06:53:58] thank you
[06:54:40] Operations, LDAP-Access-Requests: LDAP access to the wmf group for Matthew Williams - https://phabricator.wikimedia.org/T249844 (MoritzMuehlenhoff) Resolved→Open This needs a corresponding entry in data.yaml
[06:54:43] yw
[06:55:03] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[06:55:45] !log install1003 moving /srv/autoinstall to /root, running puppet, leaving a README file to point out it moved to apt1001
[06:55:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:56:34] (PS1) Muehlenhoff: Fix email entry for Stephen [puppet] - https://gerrit.wikimedia.org/r/588939 (https://phabricator.wikimedia.org/T250134)
[06:57:14] (PS1) Ema: cache: check that the purged process is running [puppet] - https://gerrit.wikimedia.org/r/588940 (https://phabricator.wikimedia.org/T249583)
[07:00:56] (CR) Muehlenhoff: [C: +2] Fix email entry for Stephen [puppet] - https://gerrit.wikimedia.org/r/588939 (https://phabricator.wikimedia.org/T250134) (owner: Muehlenhoff)
[07:01:24] (PS1) Ema: cache: test purged on cp2029 [puppet] - https://gerrit.wikimedia.org/r/588942 (https://phabricator.wikimedia.org/T249583)
[07:03:21] PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog2001.codfw.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[07:04:21] PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog2001.codfw.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[07:06:13] RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[07:07:03] RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[07:10:13] (CR) Marostegui: "Thanks for the fix" [puppet] - https://gerrit.wikimedia.org/r/588939 (https://phabricator.wikimedia.org/T250134) (owner: Muehlenhoff)
[07:20:16] (PS1) Ema: vcl: introduce wm_admission_policies [puppet] - https://gerrit.wikimedia.org/r/588945 (https://phabricator.wikimedia.org/T249809)
[07:20:55] (PS1) Giuseppe Lavagetto: mediawiki::php: prevent max_execution_time from triggering [puppet] - https://gerrit.wikimedia.org/r/588946 (https://phabricator.wikimedia.org/T248564)
[07:22:41] (PS1) Ayounsi: uRPF: switch back to log locally only [homer/public] - https://gerrit.wikimedia.org/r/588947 (https://phabricator.wikimedia.org/T244147)
[07:23:41] (CR) Giuseppe Lavagetto: [C: +2] parsoid: allow retries for connection resets in envoy [puppet] - https://gerrit.wikimedia.org/r/587732 (https://phabricator.wikimedia.org/T249705) (owner: Giuseppe Lavagetto)
[07:23:54] <_joe_> gah fat fingers
[07:24:06] <_joe_> I wanted to -1 that change :D
[07:24:42] (CR) Giuseppe Lavagetto: [C: -1] "While this could work, we've gone the other way and enabled retries in the downstream client, which seems the correct thing to do." [puppet] - https://gerrit.wikimedia.org/r/587732 (https://phabricator.wikimedia.org/T249705) (owner: Giuseppe Lavagetto)
[07:25:08] (Abandoned) Giuseppe Lavagetto: parsoid: allow retries for connection resets in envoy [puppet] - https://gerrit.wikimedia.org/r/587732 (https://phabricator.wikimedia.org/T249705) (owner: Giuseppe Lavagetto)
[07:25:11] heh :) but not submitted
[07:26:28] <_joe_> yeah
[07:27:28] (CR) Ayounsi: [C: +2] uRPF: switch back to log locally only [homer/public] - https://gerrit.wikimedia.org/r/588947 (https://phabricator.wikimedia.org/T244147) (owner: Ayounsi)
[07:28:00] (CR) Vgutierrez: [C: +1] cache: check that the purged process is running [puppet] - https://gerrit.wikimedia.org/r/588940 (https://phabricator.wikimedia.org/T249583) (owner: Ema)
[07:28:06] (CR) Vgutierrez: [C: +1] cache: test purged on cp2029 [puppet] - https://gerrit.wikimedia.org/r/588942 (https://phabricator.wikimedia.org/T249583) (owner: Ema)
[07:30:47] (CR) Giuseppe Lavagetto: cache: check that the purged process is running (1 comment) [puppet] - https://gerrit.wikimedia.org/r/588940 (https://phabricator.wikimedia.org/T249583) (owner: Ema)
[07:31:45] (PS1) Ayounsi: uRPF: sample and discard [homer/public] - https://gerrit.wikimedia.org/r/588948 (https://phabricator.wikimedia.org/T244147)
[07:35:30] !log restart cloudelastic-chi on cloudelastic1002 to apply new jvm settings - T231517
[07:35:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:39] T231517: Investigate and fix GC issues on cloudelastic machines - https://phabricator.wikimedia.org/T231517
[07:39:34] (CR) Ema: "VTC tests are green on both text and upload:" [puppet] - https://gerrit.wikimedia.org/r/588945 (https://phabricator.wikimedia.org/T249809) (owner: Ema)
[07:43:30] Operations, observability: production-logstash elastic cluster is yellow state - https://phabricator.wikimedia.org/T250133 (elukey) Question to understand the next steps - Should https://gerrit.wikimedia.org/r/588740 reduce space used or does it need a manual clean up? I still see the cluster in yellow s...
[07:49:58] (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/21933/" [puppet] - https://gerrit.wikimedia.org/r/588946 (https://phabricator.wikimedia.org/T248564) (owner: Giuseppe Lavagetto)
[07:50:07] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[07:58:53] (CR) Filippo Giunchedi: utils: fix hiera Debian package name (1 comment) [puppet] - https://gerrit.wikimedia.org/r/588729 (owner: Filippo Giunchedi)
[07:59:03] (CR) Ema: cache: check that the purged process is running (1 comment) [puppet] - https://gerrit.wikimedia.org/r/588940 (https://phabricator.wikimedia.org/T249583) (owner: Ema)
[07:59:21] !log Deploy schema change on s7 codfw master - T250057
[07:59:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:59:28] T250057: type_acton index in logging table is lingering in production - https://phabricator.wikimedia.org/T250057
[08:04:00] Operations, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Jim Maddock - https://phabricator.wikimedia.org/T249873 (fgiunchedi)
[08:08:22] (PS1) Marostegui: db1114: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/588949 (https://phabricator.wikimedia.org/T250224)
[08:10:10] (CR) Kormat: [C: +2] db1114: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/588949 (https://phabricator.wikimedia.org/T250224) (owner: Marostegui)
[08:10:24] Operations, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Jim Maddock - https://phabricator.wikimedia.org/T249873 (fgiunchedi)
[08:10:25] (PS2) Ema: cache: check that the purged process is running [puppet] - https://gerrit.wikimedia.org/r/588940 (https://phabricator.wikimedia.org/T249583)
[08:10:27] (PS2) Ema: cache: test purged on cp2029 [puppet] - https://gerrit.wikimedia.org/r/588942 (https://phabricator.wikimedia.org/T249583)
[08:14:21] !log marostegui@cumin1001 dbctl commit (dc=all): 'Pool db1114 on s8 with low weight T250224', diff saved to https://phabricator.wikimedia.org/P10985 and previous config saved to /var/cache/conftool/dbconfig/20200415-081421-marostegui.json
[08:14:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:14:29] T250224: move db1114 to s8 - https://phabricator.wikimedia.org/T250224
[08:15:55] (PS1) Filippo Giunchedi: admin: add Jim Maddock [puppet] - https://gerrit.wikimedia.org/r/588951 (https://phabricator.wikimedia.org/T249873)
[08:16:16] (CR) Giuseppe Lavagetto: [C: +1] vcl: toggle to block non-API traffic from public clouds (1 comment) [puppet] - https://gerrit.wikimedia.org/r/588135 (owner: Ema)
[08:18:16] (PS2) Filippo Giunchedi: admin: add Jim Maddock [puppet] - https://gerrit.wikimedia.org/r/588951 (https://phabricator.wikimedia.org/T249873)
[08:21:09] (CR) jerkins-bot: [V: -1] admin: add Jim Maddock [puppet] - https://gerrit.wikimedia.org/r/588951 (https://phabricator.wikimedia.org/T249873) (owner: Filippo Giunchedi)
[08:26:50] (PS4) Ema: vcl: toggle to block non-API traffic from public clouds [puppet] - https://gerrit.wikimedia.org/r/588135
[08:27:03] (CR) Ema: vcl: toggle to block non-API traffic from public clouds (1 comment) [puppet] - https://gerrit.wikimedia.org/r/588135 (owner: Ema)
[08:27:29] (CR) Ema: [C: +2] cache: check that the purged process is running [puppet] - https://gerrit.wikimedia.org/r/588940 (https://phabricator.wikimedia.org/T249583) (owner: Ema)
[08:29:19] Operations, Anti-Harassment, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (fgiunchedi) @Tchanders @dmaza @dbarratt @Mooeypoo from the task conversations it seems that ht...
[08:34:29] (PS2) Filippo Giunchedi: utils: fix hiera Debian package name [puppet] - https://gerrit.wikimedia.org/r/588729
[08:34:31] (PS3) Filippo Giunchedi: admin: add Jim Maddock [puppet] - https://gerrit.wikimedia.org/r/588951 (https://phabricator.wikimedia.org/T249873)
[08:35:19] (CR) Dzahn: [C: +1] admin: add Jim Maddock [puppet] - https://gerrit.wikimedia.org/r/588951 (https://phabricator.wikimedia.org/T249873) (owner: Filippo Giunchedi)
[08:35:59] (CR) Ema: [C: +2] cache: test purged on cp2029 [puppet] - https://gerrit.wikimedia.org/r/588942 (https://phabricator.wikimedia.org/T249583) (owner: Ema)
[08:36:14] (PS1) Vgutierrez: Revert "ATS: Re-enable parent proxies on ats-tls" [puppet] - https://gerrit.wikimedia.org/r/588954 (https://phabricator.wikimedia.org/T249335)
[08:38:00] (PS2) Dzahn: gerrit: remove unused parameter for cache_text_nodes [puppet] - https://gerrit.wikimedia.org/r/588699
[08:38:34] (CR) jerkins-bot: [V: -1] gerrit: remove unused parameter for cache_text_nodes [puppet] - https://gerrit.wikimedia.org/r/588699 (owner: Dzahn)
[08:38:57] (PS3) Dzahn: gerrit: remove unused parameter for cache_text_nodes [puppet] - https://gerrit.wikimedia.org/r/588699
[08:40:49] Operations, LDAP-Access-Requests: LDAP access to the wmf group for Sam Walton - https://phabricator.wikimedia.org/T250189 (fgiunchedi) Hi @Samwalton9, as far as I know for Superset access only the shell access is not required. I'm CC'ing @Nuria for signoff.
[08:41:34] Operations, LDAP-Access-Requests: LDAP access to the wmf group for Sam Walton - https://phabricator.wikimedia.org/T250189 (fgiunchedi) p:Triage→Medium
[08:41:39] Operations, LDAP-Access-Requests: LDAP access to the wmf group for Matthew Williams - https://phabricator.wikimedia.org/T249844 (fgiunchedi) p:Triage→Medium
[08:41:43] Operations, LDAP-Access-Requests: LDAP access to the wmf group for Sam Walton - https://phabricator.wikimedia.org/T250189 (Samwalton9) > as far as I know for Superset access only the shell access is not required Oh, sure, it wasn't clear to me if the 'shell access' question was about whether I already h...
[08:42:20] !log elastic (search cluster) reindex commmonswiki_content on cloudelastic (T246882)
[08:42:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:26] T246882: commonswiki shard size grew more than 50G in eqiad and codfw - https://phabricator.wikimedia.org/T246882
[08:42:34] (CR) jerkins-bot: [V: -1] gerrit: remove unused parameter for cache_text_nodes [puppet] - https://gerrit.wikimedia.org/r/588699 (owner: Dzahn)
[08:43:10] !log errata: elastic (search cluster) reindexing commonswiki_content on cloudelastic (T246882)
[08:43:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:06] Operations, DBA, Traffic, serviceops: Audit and harmonize timeouts across the stack - https://phabricator.wikimedia.org/T250251 (fgiunchedi) p:Triage→Medium
[08:44:30] Operations, MediaWiki-Parser, serviceops: purgeParserCache.php: Cannot purge this kind of parser cache - https://phabricator.wikimedia.org/T250231 (fgiunchedi) p:Triage→Medium
[08:44:35] Operations, Commons, MediaWiki-File-management, Thumbor, Traffic: 500, Internal Server Error on Commons for images at specified size - https://phabricator.wikimedia.org/T250211 (fgiunchedi) p:Triage→Medium
[08:46:09] (PS4) Dzahn: gerrit: remove unused parameter for cache_text_nodes [puppet] - https://gerrit.wikimedia.org/r/588699
[08:46:18] !log reset edac counters on scb1001
[08:46:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:47] (CR) jerkins-bot: [V: -1] gerrit: remove unused parameter for cache_text_nodes [puppet] - https://gerrit.wikimedia.org/r/588699 (owner: Dzahn)
[08:47:57] (PS1) Jcrespo: install_server: Remove backup2002 from recipe to test manual install [puppet] - https://gerrit.wikimedia.org/r/588955 (https://phabricator.wikimedia.org/T156955)
[08:48:30] (CR) Jcrespo: [C: +2] bacula: Increase max total size of Databases backups to 40 TB [puppet] - https://gerrit.wikimedia.org/r/588668 (https://phabricator.wikimedia.org/T138562) (owner: Jcrespo)
[08:49:25] Operations, ops-eqiad: Interface errors on asw2-c-eqiad - ge-3/0/9 (pc1009) - https://phabricator.wikimedia.org/T250257 (ayounsi) p:Triage→Medium
[08:52:36] (PS2) Jcrespo: install_server: Remove backup2002 from recipe to test manual install [puppet] - https://gerrit.wikimedia.org/r/588955 (https://phabricator.wikimedia.org/T156955)
[08:53:35] Operations, Patch-For-Review, User-fgiunchedi: Standardizing our partman recipes - https://phabricator.wikimedia.org/T156955 (jcrespo) Sorry, wrong bug^
[08:53:45] (PS3) Jcrespo: install_server: Remove backup2002 from recipe to test manual install [puppet] - https://gerrit.wikimedia.org/r/588955 (https://phabricator.wikimedia.org/T248934)
[08:54:27] !log depool cp1081 for debugging purposes
[08:54:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:33] !log kormat@cumin1001 dbctl commit (dc=all): 'Increase db1114's weight T250224', diff saved to https://phabricator.wikimedia.org/P10986 and previous config saved to /var/cache/conftool/dbconfig/20200415-085432-kormat.json
[08:54:39] \o/
[08:54:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:40] T250224: move db1114 to s8 - https://phabricator.wikimedia.org/T250224
[08:56:18] (CR) Dzahn: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/588708 (https://phabricator.wikimedia.org/T224591) (owner: Hashar)
[08:57:57] ACKNOWLEDGEMENT - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=purged site={codfw,eqiad,eqsin,esams,ulsfo} Ema testing purged https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:58:19] (CR) Jcrespo: [C: +2] install_server: Remove backup2002 from recipe to test manual install [puppet] - https://gerrit.wikimedia.org/r/588955 (https://phabricator.wikimedia.org/T248934) (owner: Jcrespo)
[09:04:33] PROBLEM - aqs endpoints health on aqs1005 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[09:05:18] PROBLEM - aqs endpoints health on aqs1006 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[09:05:23] PROBLEM - aqs endpoints health on aqs1009 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[09:05:28] PROBLEM - aqs endpoints health on aqs1004 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[09:05:33] PROBLEM - aqs endpoints health on aqs1007 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[09:06:10] Operations, observability: production-logstash elastic cluster is yellow state - https://phabricator.wikimedia.org/T250133 (fgiunchedi) The change did reduce space but not enough as it seems! I'll keep digging
[09:06:31] I am on aqs, I think I know what the problem is sigh
[09:07:17] !log repool cp1081
[09:07:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:55] PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - druid-public-broker_8082: Servers druid1006.eqiad.wmnet, druid1005.eqiad.wmnet, druid1004.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[09:08:15] !log restart druid brokers on druid100[4-6] - stuck after datasource deletion
[09:08:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:08:51] RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[09:09:25] RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[09:09:41] RECOVERY - aqs endpoints health on aqs1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[09:09:45] gooood
[09:09:47] RECOVERY - aqs endpoints health on aqs1009 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[09:09:53] RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[09:09:59] RECOVERY - aqs endpoints health on aqs1007 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[09:12:10] (CR) Arturo Borrero Gonzalez: cloudvps: Add metricsinfra prometheus server (1 comment) [puppet] - https://gerrit.wikimedia.org/r/588803 (https://phabricator.wikimedia.org/T250206) (owner: Jhedden)
[09:14:29] Operations, ops-codfw, DBA: (Need by: TBD) codfw: rack/setup/install backup2002/array backup2002-array1 - https://phabricator.wikimedia.org/T248934 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jynus on cumin2001.codfw.wmnet for hosts: ` backup2002.codfw.wmnet ` The log can be found in...
[09:14:32] Operations, ops-codfw, DBA: (Need by: TBD) codfw: rack/setup/install backup2002/array backup2002-array1 - https://phabricator.wikimedia.org/T248934 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['backup2002.codfw.wmnet'] ` Of which those **FAILED**: ` ['backup2002.codfw.wmnet'] `
[09:15:58] Operations, ops-codfw, DBA: (Need by: TBD) codfw: rack/setup/install backup2002/array backup2002-array1 - https://phabricator.wikimedia.org/T248934 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jynus on cumin2001.codfw.wmnet for hosts: ` ['backup2002.codfw.wmnet'] ` The log can be foun...
[09:16:22] sorry for the spam, I got it at the third try
[09:17:32] Operations, observability: production-logstash elastic cluster is yellow state - https://phabricator.wikimedia.org/T250133 (fgiunchedi) Some shards did get assigned due to the increased available space (see below) although not all, for now and to get the cluster to allocate all shards I'm thinking of pus...
[09:17:54] Operations, Traffic: pybal healthchecks reaching the applayer on specific requests - https://phabricator.wikimedia.org/T250258 (Vgutierrez)
[09:18:13] Operations, Traffic: pybal healthchecks reaching the applayer on specific requests - https://phabricator.wikimedia.org/T250258 (Vgutierrez) p:Triage→High
[09:19:50] (PS1) Filippo Giunchedi: logstash: temporarily keep one copy after 50 days [puppet] - https://gerrit.wikimedia.org/r/588958 (https://phabricator.wikimedia.org/T250133)
[09:20:15] (PS5) Dzahn: gerrit: remove unused parameter for cache_text_nodes [puppet] - https://gerrit.wikimedia.org/r/588699
[09:23:32] (CR) Arturo Borrero Gonzalez: [C: +2] dynamicproxy: add support for dynamic XFF per FQDN [puppet] - https://gerrit.wikimedia.org/r/583098 (https://phabricator.wikimedia.org/T135046) (owner: Arturo Borrero Gonzalez)
[09:28:36] (PS1) Ema: 0.5: purge full request URI, not just the path [software/purged] - https://gerrit.wikimedia.org/r/588959 (https://phabricator.wikimedia.org/T249583)
[09:29:18] (CR) Vgutierrez: [C: +1] 0.5: purge full request URI, not just the path [software/purged] - https://gerrit.wikimedia.org/r/588959 (https://phabricator.wikimedia.org/T249583) (owner: Ema)
[09:29:52] (CR) Elukey: [C: +1] logstash: temporarily keep one copy after 50 days [puppet] - https://gerrit.wikimedia.org/r/588958 (https://phabricator.wikimedia.org/T250133) (owner: Filippo Giunchedi)
[09:30:23] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics for andrew-wmde - https://phabricator.wikimedia.org/T249733 (Tobi_WMDE_SW) >>! In T249733#6042118, @MoritzMuehlenhoff wrote: > You have an updated NDA in our records, so that's covered. > > Adding @Nuria for approval o...
[09:31:44] (03CR) 10Filippo Giunchedi: [C: 03+2] logstash: temporarily keep one copy after 50 days [puppet] - 10https://gerrit.wikimedia.org/r/588958 (https://phabricator.wikimedia.org/T250133) (owner: 10Filippo Giunchedi) [09:33:16] (03PS1) 10Giuseppe Lavagetto: safe-service-restart: formatting fixed [puppet] - 10https://gerrit.wikimedia.org/r/588960 [09:33:18] (03PS1) 10Giuseppe Lavagetto: safe-service-restart: allow for a grace period after depooling [puppet] - 10https://gerrit.wikimedia.org/r/588961 [09:33:32] (03PS1) 10Vgutierrez: ATS: Disable KA for POST requests on eqiad [puppet] - 10https://gerrit.wikimedia.org/r/588962 (https://phabricator.wikimedia.org/T250258) [09:33:39] (03CR) 10Ema: [C: 03+2] 0.5: purge full request URI, not just the path [software/purged] - 10https://gerrit.wikimedia.org/r/588959 (https://phabricator.wikimedia.org/T249583) (owner: 10Ema) [09:34:30] 10Operations, 10netbox: Netbox racks consistency report - https://phabricator.wikimedia.org/T212878 (10Peachey88) [09:35:36] (03CR) 10Vgutierrez: "pcc looks happy: https://puppet-compiler.wmflabs.org/compiler1001/21935/" [puppet] - 10https://gerrit.wikimedia.org/r/588962 (https://phabricator.wikimedia.org/T250258) (owner: 10Vgutierrez) [09:35:38] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics for andrew-wmde - https://phabricator.wikimedia.org/T249733 (10Andrew-WMDE) >>! In T249733#6054088, @fgiunchedi wrote: > Also please specify whether Kerberos user is needed (cfr https://wikitech.wikimedia.org/wiki/Analyti... 
[09:36:18] 10Operations, 10DBA: move db1114 to s8 - https://phabricator.wikimedia.org/T250224 (10Marostegui) db1114 added to the Wikidata CPU dashboard [09:37:28] (03CR) 10Dzahn: [C: 03+2] Profile to inject Gerrit ssh public key to known_hosts [puppet] - 10https://gerrit.wikimedia.org/r/588708 (https://phabricator.wikimedia.org/T224591) (owner: 10Hashar) [09:38:55] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/588951 (https://phabricator.wikimedia.org/T249873) (owner: 10Filippo Giunchedi) [09:43:05] !log kormat@cumin1001 dbctl commit (dc=all): 'Increase db1114's weight some more T250224', diff saved to https://phabricator.wikimedia.org/P10988 and previous config saved to /var/cache/conftool/dbconfig/20200415-094305-kormat.json [09:43:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:43:12] T250224: move db1114 to s8 - https://phabricator.wikimedia.org/T250224 [09:43:28] (03CR) 10Ema: [C: 03+1] ATS: Disable KA for POST requests on eqiad [puppet] - 10https://gerrit.wikimedia.org/r/588962 (https://phabricator.wikimedia.org/T250258) (owner: 10Vgutierrez) [09:44:06] (03CR) 10Vgutierrez: [C: 03+2] ATS: Disable KA for POST requests on eqiad [puppet] - 10https://gerrit.wikimedia.org/r/588962 (https://phabricator.wikimedia.org/T250258) (owner: 10Vgutierrez) [09:45:22] !log force-run curator from logstash1008 - T250133 [09:45:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:45:29] T250133: production-logstash elastic cluster is yellow state - https://phabricator.wikimedia.org/T250133 [09:48:15] !log disable KA between ats-tls and varnish-fe for POST requests on eqiad - T250258 [09:48:15] !log jynus@cumin2001 START - Cookbook sre.hosts.downtime [09:48:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:48:21] T250258: pybal healthchecks reaching the applayer on specific requests - https://phabricator.wikimedia.org/T250258 [09:48:24] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:48:29] 10Operations, 10observability, 10Patch-For-Review: production-logstash elastic cluster is yellow state - https://phabricator.wikimedia.org/T250133 (10fgiunchedi) [09:50:37] !log jynus@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [09:50:39] 10Operations, 10ops-codfw, 10DBA: (Need by: TBD) codfw: rack/setup/install backup2002/array backup2002-array1 - https://phabricator.wikimedia.org/T248934 (10jcrespo) Manual install went through- not ideal, but enough to unblock this task. Now we only need to setup the storage- I can take care of this, as the... [09:50:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:52:51] (03CR) 10Ema: [C: 03+2] vcl: toggle to block non-API traffic from public clouds [puppet] - 10https://gerrit.wikimedia.org/r/588135 (owner: 10Ema) [09:53:23] (03CR) 10Filippo Giunchedi: [C: 03+1] cli: add pcc invocation to logging output [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/588703 (https://phabricator.wikimedia.org/T250169) (owner: 10Jbond) [09:54:39] (03PS1) 10Hashar: Revert "Profile to inject Gerrit ssh public key to known_hosts" [puppet] - 10https://gerrit.wikimedia.org/r/588965 [09:55:51] (03CR) 10Dzahn: [C: 03+2] Revert "Profile to inject Gerrit ssh public key to known_hosts" [puppet] - 10https://gerrit.wikimedia.org/r/588965 (owner: 10Hashar) [09:57:00] 10Operations, 10ops-codfw, 10DBA: (Need by: TBD) codfw: rack/setup/install backup2002/array backup2002-array1 - https://phabricator.wikimedia.org/T248934 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['backup2002.codfw.wmnet'] ` and were **ALL** successful. 
[09:57:49] (03CR) 10Filippo Giunchedi: [C: 03+2] admin: add Jim Maddock [puppet] - 10https://gerrit.wikimedia.org/r/588951 (https://phabricator.wikimedia.org/T249873) (owner: 10Filippo Giunchedi) [09:58:00] (03PS4) 10Filippo Giunchedi: admin: add Jim Maddock [puppet] - 10https://gerrit.wikimedia.org/r/588951 (https://phabricator.wikimedia.org/T249873) [10:00:02] (03PS1) 10Hashar: zuul: inject Gerrit ssh public key to known_hosts [puppet] - 10https://gerrit.wikimedia.org/r/588968 [10:01:04] (03PS1) 10Jbond: hiera_lookup: update hiera_lookup to use bundler [puppet] - 10https://gerrit.wikimedia.org/r/588969 [10:02:27] !log upload purged 0.5 to buster-wikimedia T249583 [10:02:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:02:33] T249583: Create vhtcpd replacement - https://phabricator.wikimedia.org/T249583 [10:03:55] (03PS1) 10Ema: TestWorkers: Wait for both fe and be channels to be consumed [software/purged] - 10https://gerrit.wikimedia.org/r/588970 [10:04:52] (03PS2) 10Dzahn: zuul: inject Gerrit ssh public key to known_hosts [puppet] - 10https://gerrit.wikimedia.org/r/588968 (https://phabricator.wikimedia.org/T224591) (owner: 10Hashar) [10:06:04] (03CR) 10Jbond: utils: fix hiera Debian package name (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/588729 (owner: 10Filippo Giunchedi) [10:06:39] (03CR) 10Dzahn: [C: 03+2] zuul: inject Gerrit ssh public key to known_hosts [puppet] - 10https://gerrit.wikimedia.org/r/588968 (https://phabricator.wikimedia.org/T224591) (owner: 10Hashar) [10:08:31] (03PS1) 10Ema: cache: test purged on cp3050 [puppet] - 10https://gerrit.wikimedia.org/r/588971 (https://phabricator.wikimedia.org/T249583) [10:09:37] (03CR) 10Ema: [C: 03+2] cache: test purged on cp3050 [puppet] - 10https://gerrit.wikimedia.org/r/588971 (https://phabricator.wikimedia.org/T249583) (owner: 10Ema) [10:09:43] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for Jim 
Maddock - https://phabricator.wikimedia.org/T249873 (10fgiunchedi) [10:12:02] 10Operations, 10Puppet, 10User-jbond: admin: create schema validation for admin.yaml - https://phabricator.wikimedia.org/T250259 (10jbond) [10:12:26] (03CR) 10Jbond: [C: 03+2] admin: show uid in uid test error [puppet] - 10https://gerrit.wikimedia.org/r/588387 (owner: 10Jbond) [10:12:33] (03PS1) 10Dzahn: ATS: switch backend for integration.wm.org to contint2001 [puppet] - 10https://gerrit.wikimedia.org/r/588973 (https://phabricator.wikimedia.org/T224591) [10:13:09] jbond42: lock file fight. yes, merge :) [10:13:45] ah, ok, done [10:14:17] thanks [10:14:32] (03PS2) 10Filippo Giunchedi: admin: add andrew-wmde to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/588653 (https://phabricator.wikimedia.org/T249733) [10:16:08] (03CR) 10Filippo Giunchedi: "Ready for review" [puppet] - 10https://gerrit.wikimedia.org/r/588653 (https://phabricator.wikimedia.org/T249733) (owner: 10Filippo Giunchedi) [10:24:06] (03CR) 10Jbond: "change looks good however I wonder if its useful to be able to toggle some of theses settings see inline" (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/588948 (https://phabricator.wikimedia.org/T244147) (owner: 10Ayounsi) [10:24:37] (03PS1) 10Filippo Giunchedi: admin: add Matthew Williams [puppet] - 10https://gerrit.wikimedia.org/r/588976 (https://phabricator.wikimedia.org/T249844) [10:25:06] !log cp3050: varnish-frontend-restart to clear mbox lag and see how long it takes to show up T249583 [10:25:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:25:12] T249583: Create vhtcpd replacement - https://phabricator.wikimedia.org/T249583 [10:26:32] (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/588653 (https://phabricator.wikimedia.org/T249733) (owner: 10Filippo Giunchedi) [10:28:06] (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/588976 
(https://phabricator.wikimedia.org/T249844) (owner: 10Filippo Giunchedi) [10:28:39] (03PS2) 10Jbond: pcc templates: refactor templates to make them more DRY [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/588735 (https://phabricator.wikimedia.org/T250169) [10:28:47] (03PS2) 10Jbond: pcc templates: add cli instructions to template footer [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/588736 (https://phabricator.wikimedia.org/T250169) [10:30:35] 10Operations, 10Core Platform Team, 10Traffic, 10serviceops, 10Performance-Team (Radar): Stop sending purges for `action=history` for linked pages. - https://phabricator.wikimedia.org/T250261 (10Joe) [10:37:02] 10Operations, 10Repository-Admins, 10Traffic: Requesting new gerrit project repository "operations/software/purged" - https://phabricator.wikimedia.org/T249606 (10ema) 05Open→03Resolved [10:42:27] (03PS1) 10Dzahn: ci::master: add envoy for TLS termination for integration [puppet] - 10https://gerrit.wikimedia.org/r/588980 (https://phabricator.wikimedia.org/T210411) [10:57:27] !log Deploy schema change on s8 codfw master - T250057 [10:57:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:57:33] T250057: type_acton index in logging table is lingering in production - https://phabricator.wikimedia.org/T250057 [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: How many deployers does it take to do European Mid-day SWAT(Max 6 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200415T1100). [11:00:05] tgr and awight: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:32] I'd be happy to deploy. tgr: or would you prefer to do your own? [11:01:06] thanks awight! [11:01:31] Great, I'll let you know in a minute when it's up for testing. 
[11:02:36] (03CR) 10Awight: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588701 (https://phabricator.wikimedia.org/T249956) (owner: 10Gergő Tisza) [11:02:50] (03PS2) 10Awight: Deploy Welcome Survey to Serbian Wikipedia and French Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588701 (https://phabricator.wikimedia.org/T249956) (owner: 10Gergő Tisza) [11:03:04] (03CR) 10Awight: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588701 (https://phabricator.wikimedia.org/T249956) (owner: 10Gergő Tisza) [11:04:19] (03Merged) 10jenkins-bot: Deploy Welcome Survey to Serbian Wikipedia and French Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588701 (https://phabricator.wikimedia.org/T249956) (owner: 10Gergő Tisza) [11:05:17] tgr: Live on mwdebug1001 [11:05:33] (03PS9) 10Hnowlan: profile::kubernetes: add the puppet CA cert to general.yaml [puppet] - 10https://gerrit.wikimedia.org/r/587799 (https://phabricator.wikimedia.org/T249633) [11:07:01] awight: works as expected [11:07:08] (03PS6) 10Jbond: role::mail::mx: enable jumpcloud test domain [puppet] - 10https://gerrit.wikimedia.org/r/588425 (https://phabricator.wikimedia.org/T244792) [11:09:02] !log awight@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:588701|Deploy Welcome Survey to Serbian Wikipedia and French Wiktionary (T249956)]] (duration: 01m 24s) [11:09:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:09:08] T249956: Deploy Welcome Survey to Serbian Wikipedia and French Wiktionary - https://phabricator.wikimedia.org/T249956 [11:09:23] (03PS2) 10Arturo Borrero Gonzalez: cloud: review direct references to cloudcontrol1003.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/587556 [11:09:23] thanks! 
[11:09:42] :-) of course [11:10:14] (03PS7) 10Jbond: role::mail::mx: enable jumpcloud test domain [puppet] - 10https://gerrit.wikimedia.org/r/588425 (https://phabricator.wikimedia.org/T244792) [11:10:27] (03CR) 10Jbond: "updated thanks" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/588425 (https://phabricator.wikimedia.org/T244792) (owner: 10Jbond) [11:11:19] (03PS10) 10Hnowlan: profile::kubernetes: add the puppet CA cert to general.yaml [puppet] - 10https://gerrit.wikimedia.org/r/587799 (https://phabricator.wikimedia.org/T249633) [11:14:08] (03CR) 10jerkins-bot: [V: 04-1] role::mail::mx: enable jumpcloud test domain [puppet] - 10https://gerrit.wikimedia.org/r/588425 (https://phabricator.wikimedia.org/T244792) (owner: 10Jbond) [11:14:30] (03PS11) 10Hnowlan: profile::kubernetes: add the puppet CA cert to general.yaml [puppet] - 10https://gerrit.wikimedia.org/r/587799 (https://phabricator.wikimedia.org/T249633) [11:17:11] (03CR) 10Hnowlan: "This change now writes the entire config as YAML, removing the need for templating and string mangling: https://puppet-compiler.wmflabs.or" [puppet] - 10https://gerrit.wikimedia.org/r/587799 (https://phabricator.wikimedia.org/T249633) (owner: 10Hnowlan) [11:21:03] (03PS8) 10Jbond: role::mail::mx: enable jumpcloud test domain [puppet] - 10https://gerrit.wikimedia.org/r/588425 (https://phabricator.wikimedia.org/T244792) [11:22:29] !log awight@deploy1001 Synchronized php-1.35.0-wmf.28/extensions/TwoColConflict: SWAT: [[gerrit:588966|Flatten exit logging (T248601)]] (duration: 01m 09s) [11:22:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:22:37] T248601: Adapt existing column metrics for talk page use case - https://phabricator.wikimedia.org/T248601 [11:22:41] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloud: review direct references to cloudcontrol1003.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/587556 (owner: 10Arturo Borrero Gonzalez) [11:23:15] !log EU SWAT complete 
[11:23:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:23:48] (03CR) 10Jbond: profile::kubernetes: add the puppet CA cert to general.yaml (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/587799 (https://phabricator.wikimedia.org/T249633) (owner: 10Hnowlan) [11:25:41] tgr: I realized that I hadn't double-synced InitialiseSettings.php--doing that now. [11:26:09] !log awight@deploy1001 sync-file aborted: SWAT: [[gerrit:588701|Deploy Welcome Survey to Serbian Wikipedia and French Wiktionary (T249956)]] (double-sync) (duration: 00m 02s) [11:26:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:26:15] T249956: Deploy Welcome Survey to Serbian Wikipedia and French Wiktionary - https://phabricator.wikimedia.org/T249956 [11:27:02] (03PS12) 10Hnowlan: profile::kubernetes: add the puppet CA cert to general.yaml [puppet] - 10https://gerrit.wikimedia.org/r/587799 (https://phabricator.wikimedia.org/T249633) [11:27:20] (03CR) 10Hnowlan: profile::kubernetes: add the puppet CA cert to general.yaml (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/587799 (https://phabricator.wikimedia.org/T249633) (owner: 10Hnowlan) [11:27:22] !log awight@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:588701|Deploy Welcome Survey to Serbian Wikipedia and French Wiktionary (T249956)]] (double-sync) (duration: 01m 03s) [11:27:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:32:11] (03PS1) 10Arturo Borrero Gonzalez: sbuild: chroot: update it using the root user [puppet] - 10https://gerrit.wikimedia.org/r/588987 (https://phabricator.wikimedia.org/T249837) [11:37:57] (03CR) 10Ayounsi: "Some more comments. 
It would be useful to run it on a few test hosts, like two LVS and two CP in the same DC, a ping1001 host, maybe a clo" (036 comments) [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/588036 (https://phabricator.wikimedia.org/T244153) (owner: 10CRusnov) [11:40:46] (03CR) 10Jbond: [C: 03+1] "puppet wise looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/587799 (https://phabricator.wikimedia.org/T249633) (owner: 10Hnowlan) [11:41:05] (03CR) 10Ayounsi: uRPF: sample and discard (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/588948 (https://phabricator.wikimedia.org/T244147) (owner: 10Ayounsi) [11:45:54] (03CR) 10Jbond: [C: 03+1] uRPF: sample and discard (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/588948 (https://phabricator.wikimedia.org/T244147) (owner: 10Ayounsi) [11:51:48] (03PS1) 10Cparle: Enable quality constraints on production commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588988 (https://phabricator.wikimedia.org/T248177) [11:58:11] (03CR) 10Filippo Giunchedi: utils: fix hiera Debian package name (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/588729 (owner: 10Filippo Giunchedi) [11:58:13] (03CR) 10Lucas Werkmeister (WMDE): Enable quality constraints on production commons (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588988 (https://phabricator.wikimedia.org/T248177) (owner: 10Cparle) [11:58:24] (03Abandoned) 10Filippo Giunchedi: utils: fix hiera Debian package name [puppet] - 10https://gerrit.wikimedia.org/r/588729 (owner: 10Filippo Giunchedi) [12:01:38] (03CR) 10Filippo Giunchedi: [C: 03+1] hiera_lookup: update hiera_lookup to use bundler [puppet] - 10https://gerrit.wikimedia.org/r/588969 (owner: 10Jbond) [12:02:06] (03CR) 10Filippo Giunchedi: [C: 03+2] admin: add Matthew Williams [puppet] - 10https://gerrit.wikimedia.org/r/588976 (https://phabricator.wikimedia.org/T249844) (owner: 10Filippo Giunchedi) [12:02:17] (03PS2) 10Filippo Giunchedi: admin: add 
Matthew Williams [puppet] - 10https://gerrit.wikimedia.org/r/588976 (https://phabricator.wikimedia.org/T249844) [12:03:25] !log puppetmaster1001: revoking ganeti01.svc.eqiad.wmnet and ganeti01.svc.codfw.wmnet certificates. adding eqiad and codfw to cergen .yaml file, recreating ganeti certs [12:03:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:03:39] (03CR) 10QEDK: [C: 03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/587301 (https://phabricator.wikimedia.org/T249643) (owner: 10Huji) [12:05:52] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] sbuild: chroot: update it using the root user [puppet] - 10https://gerrit.wikimedia.org/r/588987 (https://phabricator.wikimedia.org/T249837) (owner: 10Arturo Borrero Gonzalez) [12:06:01] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/588976 (https://phabricator.wikimedia.org/T249844) (owner: 10Filippo Giunchedi) [12:10:11] 10Operations, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Jim Maddock - https://phabricator.wikimedia.org/T249873 (10fgiunchedi) @jmads shell and basic authentication (nda ldap group) access should work now, please confirm! [12:11:22] (03CR) 10Filippo Giunchedi: [C: 03+2] admin: add andrew-wmde to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/588653 (https://phabricator.wikimedia.org/T249733) (owner: 10Filippo Giunchedi) [12:11:32] (03PS3) 10Filippo Giunchedi: admin: add andrew-wmde to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/588653 (https://phabricator.wikimedia.org/T249733) [12:17:03] (03PS1) 10Dzahn: update all ganeti certs, add eqiad and codfw [puppet] - 10https://gerrit.wikimedia.org/r/588995 [12:17:39] 10Operations, 10LDAP-Access-Requests: LDAP access to the wmf group for Matthew Williams - https://phabricator.wikimedia.org/T249844 (10fgiunchedi) 05Open→03Resolved >>! 
In T249844#6057883, @MoritzMuehlenhoff wrote: > This needs a corresponding entry in data.yaml {{done}} [12:22:33] (03PS2) 10Dzahn: update all ganeti certs, add eqiad and codfw [puppet] - 10https://gerrit.wikimedia.org/r/588995 [12:24:56] (03CR) 10Dzahn: "~/puppet/files/ssl$ for dc in eqiad codfw ulsfo eqsin esams; do openssl x509 -in ganeti01.svc.${dc}.wmnet.crt -text -noout | grep Subject:" [puppet] - 10https://gerrit.wikimedia.org/r/588995 (owner: 10Dzahn) [12:25:43] (03CR) 10Dzahn: [C: 03+2] "Per email thread these are just used by netbox." [puppet] - 10https://gerrit.wikimedia.org/r/588995 (owner: 10Dzahn) [12:28:21] (03CR) 10Urbanecm: [C: 04-1] Enable quality constraints on production commons (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588988 (https://phabricator.wikimedia.org/T248177) (owner: 10Cparle) [12:28:52] 10Operations, 10Pybal, 10Traffic: pybal healthchecks reaching the applayer on specific requests - https://phabricator.wikimedia.org/T250258 (10Aklapper) [12:31:56] 10Operations, 10Patch-For-Review, 10User-fgiunchedi: Standardizing our partman recipes - https://phabricator.wikimedia.org/T156955 (10elukey) [12:36:43] (03PS1) 10Elukey: Run the refine failure flag checker against the last 48h instead of 24 [puppet] - 10https://gerrit.wikimedia.org/r/589000 (https://phabricator.wikimedia.org/T240230) [12:36:47] (03PS1) 10Vgutierrez: ATS: Disable KA for websocket connections [puppet] - 10https://gerrit.wikimedia.org/r/589001 (https://phabricator.wikimedia.org/T250258) [12:39:58] (03PS2) 10Cparle: Enable quality constraints on production commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588988 (https://phabricator.wikimedia.org/T248177) [12:40:23] (03CR) 10Cparle: Enable quality constraints on production commons (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588988 (https://phabricator.wikimedia.org/T248177) (owner: 10Cparle) [12:40:31] (03CR) 10jerkins-bot: [V: 04-1] ATS: Disable KA for websocket 
connections [puppet] - 10https://gerrit.wikimedia.org/r/589001 (https://phabricator.wikimedia.org/T250258) (owner: 10Vgutierrez) [12:41:30] !log rolling upgrade to ATS 8.0.7-rc0-1wm2 in ulsfo and eqsin - T249335 [12:41:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:41:36] T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335 [12:42:51] (03CR) 10Vgutierrez: [C: 04-2] "blocked by T250258" [puppet] - 10https://gerrit.wikimedia.org/r/588954 (https://phabricator.wikimedia.org/T249335) (owner: 10Vgutierrez) [12:46:42] (03PS2) 10Vgutierrez: ATS: Disable KA for websocket connections [puppet] - 10https://gerrit.wikimedia.org/r/589001 (https://phabricator.wikimedia.org/T250258) [12:49:18] (03CR) 10Elukey: [C: 03+2] Run the refine failure flag checker against the last 48h instead of 24 [puppet] - 10https://gerrit.wikimedia.org/r/589000 (https://phabricator.wikimedia.org/T240230) (owner: 10Elukey) [12:49:31] !log kormat@cumin1001 dbctl commit (dc=all): 'Increase db1114's weight to 50% of target T250224', diff saved to https://phabricator.wikimedia.org/P10989 and previous config saved to /var/cache/conftool/dbconfig/20200415-124931-kormat.json [12:49:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:49:38] T250224: move db1114 to s8 - https://phabricator.wikimedia.org/T250224 [12:49:44] (03PS2) 10ArielGlenn: remove extraneous private/public wik checks [dumps] - 10https://gerrit.wikimedia.org/r/585534 (https://phabricator.wikimedia.org/T249508) [12:50:45] (03PS3) 10ArielGlenn: remove capability to dump private tables [dumps] - 10https://gerrit.wikimedia.org/r/585535 (https://phabricator.wikimedia.org/T249508) [12:51:08] (03CR) 10jerkins-bot: [V: 04-1] ATS: Disable KA for websocket connections [puppet] - 10https://gerrit.wikimedia.org/r/589001 (https://phabricator.wikimedia.org/T250258) (owner: 10Vgutierrez) [12:51:51] (03PS1) 10ArielGlenn: read special files from directory with the 
correct date [dumps] - 10https://gerrit.wikimedia.org/r/589004 (https://phabricator.wikimedia.org/T249508) [12:52:33] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [12:52:47] (03PS1) 10ArielGlenn: unit test for checking content of index.html file for a wiki dump run [dumps] - 10https://gerrit.wikimedia.org/r/589005 (https://phabricator.wikimedia.org/T249477) [12:53:04] (03PS3) 10Vgutierrez: ATS: Disable KA for websocket connections [puppet] - 10https://gerrit.wikimedia.org/r/589001 (https://phabricator.wikimedia.org/T250258) [12:53:41] (03PS1) 10ArielGlenn: unit test for private/public table type handling [dumps] - 10https://gerrit.wikimedia.org/r/589006 (https://phabricator.wikimedia.org/T249508) [12:54:12] (03CR) 10Lucas Werkmeister (WMDE): Enable quality constraints on production commons (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/588988 (https://phabricator.wikimedia.org/T248177) (owner: 10Cparle) [12:55:09] (03PS2) 10ArielGlenn: for 7z production in batches, skip files that exist at beginning of each batch [dumps] - 10https://gerrit.wikimedia.org/r/565301 (https://phabricator.wikimedia.org/T250260) [12:55:32] (03CR) 10jerkins-bot: [V: 04-1] for 7z production in batches, skip files that exist at beginning of each batch [dumps] - 10https://gerrit.wikimedia.org/r/565301 (https://phabricator.wikimedia.org/T250260) (owner: 10ArielGlenn) [12:55:47] (03PS1) 10Elukey: role::elasticsearch::cloudelastic: add more heap space for young gen [puppet] - 10https://gerrit.wikimedia.org/r/589007 (https://phabricator.wikimedia.org/T231517) [12:57:00] (03PS3) 10ArielGlenn: for 7z production in batches, skip files that exist at beginning of each batch [dumps] - 10https://gerrit.wikimedia.org/r/565301 (https://phabricator.wikimedia.org/T250260) [12:57:07] (03CR) 10Ottomata: [C: 
03+1] Run the refine failure flag checker against the last 48h instead of 24 [puppet] - 10https://gerrit.wikimedia.org/r/589000 (https://phabricator.wikimedia.org/T240230) (owner: 10Elukey) [12:58:43] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 62, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [12:58:54] (03PS1) 10ArielGlenn: add convenience bash script that runs all unit tests [dumps] - 10https://gerrit.wikimedia.org/r/589008 [13:00:04] James_F and liw: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Mediawiki train - American+European Version (secondary timeslot) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200415T1300). [13:01:08] Fun. [13:02:03] James_F, I didn't see a reason to roll back so I didn't [13:02:17] I guess we should push out the cherry picks. [13:02:27] liw: Ack. Thank you. [13:04:31] (03PS4) 10Jhedden: cloudvps: Add metricsinfra prometheus server [puppet] - 10https://gerrit.wikimedia.org/r/588803 (https://phabricator.wikimedia.org/T250206) [13:05:44] (03CR) 10Jhedden: cloudvps: Add metricsinfra prometheus server (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/588803 (https://phabricator.wikimedia.org/T250206) (owner: 10Jhedden) [13:07:10] (03CR) 10Ema: [C: 03+1] ATS: Disable KA for websocket connections [puppet] - 10https://gerrit.wikimedia.org/r/589001 (https://phabricator.wikimedia.org/T250258) (owner: 10Vgutierrez) [13:07:55] (03CR) 10Vgutierrez: [C: 03+2] ATS: Disable KA for websocket connections [puppet] - 10https://gerrit.wikimedia.org/r/589001 (https://phabricator.wikimedia.org/T250258) (owner: 10Vgutierrez) [13:10:39] !log contint2001: starting zuul-merger process # T224591 [13:10:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:45] T224591: Migrate contint* hosts to Buster - 
https://phabricator.wikimedia.org/T224591
[13:11:01] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:11:37] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 64, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:13:48] (PS1) ArielGlenn: fix an annoying typo in the test modules docs [dumps] - https://gerrit.wikimedia.org/r/589010
[13:15:14] Operations, Core Platform Team, MediaWiki-Parser, serviceops: purgeParserCache.php: Cannot purge this kind of parser cache - https://phabricator.wikimedia.org/T250231 (WDoranWMF)
[13:21:06] (CR) Ema: [V: +2 C: +2] Add 0036-VSV00004.patch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/587783 (https://phabricator.wikimedia.org/T249810) (owner: Ema)
[13:23:12] !log kormat@cumin1001 dbctl commit (dc=all): 'Increase db1114's weight to 100% of target, and reduce db1104 slightly T250224', diff saved to https://phabricator.wikimedia.org/P10990 and previous config saved to /var/cache/conftool/dbconfig/20200415-132310-kormat.json
[13:23:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:19] T250224: move db1114 to s8 - https://phabricator.wikimedia.org/T250224
[13:23:37] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.28/includes/page/PageArchive.php: T248727 Fix RevisionUndeleted hook to add (duration: 01m 08s)
[13:23:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:43] T248727: Replace ArticleRevisionUndeleted with RevisionUndeleted - https://phabricator.wikimedia.org/T248727
[13:24:26] Operations, DBA: move db1114 to s8 - https://phabricator.wikimedia.org/T250224 (Kormat) Open→Resolved db1114 is now fully pooled with weight 300. db1104 was also reduced from 350 to 300 to match.
[13:25:24] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.28/extensions/LiquidThreads/classes/DeletionController.php: T248727 Adjust to RevisionUndeleted hook now having (duration: 01m 06s)
[13:25:26] Operations, DBA: move db1114 to s8 - https://phabricator.wikimedia.org/T250224 (Marostegui) Thank you so much for the help! This should help s8 indeed. I expect also to repool db1092 tomorrow once the ALTERs are done. Later today I will also decrease the weight for the RC slaves, which we increased yeste...
[13:25:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:46] (PS1) Hashar: contint: enable zuul-merger on contint2001 [puppet] - https://gerrit.wikimedia.org/r/589013 (https://phabricator.wikimedia.org/T224591)
[13:26:50] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.28/extensions/Flow/Hooks.php: T248727 Adjust to RevisionUndeleted hook now having (duration: 01m 04s)
[13:26:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:27:06] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (hashar)
[13:29:26] James_F thanks for deploying the backports
[13:30:23] DannyS712: Thanks for fixing. :-)
[13:30:38] No problem - I broke it, I fix it :)
[13:32:16] !log upload varnish_5.1.3-1wm14 to buster-wikimedia T249810
[13:32:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:04] (CR) Dzahn: [C: +2] contint: enable zuul-merger on contint2001 [puppet] - https://gerrit.wikimedia.org/r/589013 (https://phabricator.wikimedia.org/T224591) (owner: Hashar)
[13:40:10] Operations, observability: production-logstash elastic cluster is yellow state - https://phabricator.wikimedia.org/T250133 (fgiunchedi) Curator still running on `2020-04-15 12:13:38,172 INFO forceMerging index logstash-mediawiki-2020.04.13 to 1 segments per shard. Please wait...` At any rate, the...
[13:42:49] Operations, Core Platform Team, MediaWiki-Cache, Traffic, and 2 others: Stop sending purges for `action=history` for linked pages. - https://phabricator.wikimedia.org/T250261 (Reedy)
[13:46:57] (PS4) Jforrester: Stop using $wgContentHandlerUseDB [mediawiki-config] - https://gerrit.wikimedia.org/r/583406 (owner: DannyS712)
[13:47:15] (PS5) Jforrester: Stop setting $wgContentHandlerUseDB, now unread [mediawiki-config] - https://gerrit.wikimedia.org/r/583406 (owner: DannyS712)
[13:47:33] (CR) Jforrester: [C: +2] Stop setting $wgContentHandlerUseDB, now unread [mediawiki-config] - https://gerrit.wikimedia.org/r/583406 (owner: DannyS712)
[13:48:38] (PS5) Jhedden: cloudvps: Add metricsinfra prometheus server [puppet] - https://gerrit.wikimedia.org/r/588803 (https://phabricator.wikimedia.org/T250206)
[13:48:40] (Merged) jenkins-bot: Stop setting $wgContentHandlerUseDB, now unread [mediawiki-config] - https://gerrit.wikimedia.org/r/583406 (owner: DannyS712)
[13:50:45] (PS3) Cparle: Enable quality constraints on production commons [mediawiki-config] - https://gerrit.wikimedia.org/r/588988 (https://phabricator.wikimedia.org/T248117)
[13:50:53] (CR) Cparle: Enable quality constraints on production commons (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/588988 (https://phabricator.wikimedia.org/T248117) (owner: Cparle)
[13:52:01] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: Stop setting wgContentHandlerUseDB, now unread (duration: 01m 06s)
[13:52:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:09] (PS4) ArielGlenn: for 7z production in batches, skip files that exist at beginning of each batch [dumps] - https://gerrit.wikimedia.org/r/565301 (https://phabricator.wikimedia.org/T250260)
[13:54:22] (PS2) Jforrester: mobile: Stop reading wmgMobileFrontend and wmgMinervaNeue, always true [mediawiki-config] - https://gerrit.wikimedia.org/r/586403
[13:54:41] (CR) Jforrester: [C: +2] mobile: Stop reading wmgMobileFrontend and wmgMinervaNeue, always true [mediawiki-config] - https://gerrit.wikimedia.org/r/586403 (owner: Jforrester)
[13:54:53] (PS2) ArielGlenn: add convenience bash script that runs all unit tests [dumps] - https://gerrit.wikimedia.org/r/589008
[13:55:42] (PS2) ArielGlenn: fix an annoying typo in the test modules docs [dumps] - https://gerrit.wikimedia.org/r/589010
[13:55:48] (Merged) jenkins-bot: mobile: Stop reading wmgMobileFrontend and wmgMinervaNeue, always true [mediawiki-config] - https://gerrit.wikimedia.org/r/586403 (owner: Jforrester)
[13:58:42] (PS2) Jforrester: Stop defining wmgMobileFrontend and wmgMinervaNeue, unread [mediawiki-config] - https://gerrit.wikimedia.org/r/586404
[13:58:52] (CR) Jforrester: [C: +2] Stop defining wmgMobileFrontend and wmgMinervaNeue, unread [mediawiki-config] - https://gerrit.wikimedia.org/r/586404 (owner: Jforrester)
[13:59:29] !log jforrester@deploy1001 Synchronized wmf-config/mobile.php: Stop reading wmgMobileFrontend and wmgMinervaNeue, always true (duration: 01m 06s)
[13:59:34] (PS3) Jforrester: Move mobile-labs into CommonSettings-labs [mediawiki-config] - https://gerrit.wikimedia.org/r/586405
[13:59:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:59:42] (PS3) Jforrester: Move mobile into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/586406
[13:59:46] (Merged) jenkins-bot: Stop defining wmgMobileFrontend and wmgMinervaNeue, unread [mediawiki-config] - https://gerrit.wikimedia.org/r/586404 (owner: Jforrester)
[14:00:10] (PS1) WMDE-leszek: Wikibase: Use false instead of database names for "local" entity sources on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/589019 (https://phabricator.wikimedia.org/T250183)
[14:00:26] (CR) WMDE-leszek: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/589019 (https://phabricator.wikimedia.org/T250183) (owner: WMDE-leszek)
[14:00:56] leszek_wmde: Want that deployed right now?
[14:01:21] Operations, SRE-Access-Requests: Requesting access to analytics for andrew-wmde - https://phabricator.wikimedia.org/T249733 (fgiunchedi) Thank you all! @Andrew-WMDE access should be working now (e.g. ssh access to stat hosts), please confirm! I'm not overly familiar with the Kerberos process so I'll def...
[14:01:45] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Stop defining wmgMobileFrontend and wmgMinervaNeue, unread (duration: 01m 06s)
[14:01:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:01:55] James_F: thanks, that would be cool
[14:02:01] Okie-dokie.
[14:02:10] (CR) Jforrester: [C: +2] Wikibase: Use false instead of database names for "local" entity sources on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/589019 (https://phabricator.wikimedia.org/T250183) (owner: WMDE-leszek)
[14:02:35] (PS2) WMDE-leszek: Wikibase: Use false instead of database names for "local" entity sources on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/589019 (https://phabricator.wikimedia.org/T250183)
[14:02:57] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 05s)
[14:03:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:15] (PS2) Jforrester: Remove $wgEnablePartialBlocks config [mediawiki-config] - https://gerrit.wikimedia.org/r/576086 (https://phabricator.wikimedia.org/T242912) (owner: Tchanders)
[14:03:41] leszek_wmde: Is this testable on mwdebug1001 or should I just sync it out?
[14:04:09] James_F: let me try on mwdebug1001 first
[14:04:11] (PS6) Jhedden: cloudvps: Add metricsinfra prometheus server [puppet] - https://gerrit.wikimedia.org/r/588803 (https://phabricator.wikimedia.org/T250206)
[14:04:44] * James_F waits for it to merge.
[14:05:34] (PS3) Jforrester: Wikibase: Use false instead of database names for "local" entity sources on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/589019 (https://phabricator.wikimedia.org/T250183) (owner: WMDE-leszek)
[14:05:39] (CR) Jforrester: [C: +2] "…" [mediawiki-config] - https://gerrit.wikimedia.org/r/589019 (https://phabricator.wikimedia.org/T250183) (owner: WMDE-leszek)
[14:06:27] (Merged) jenkins-bot: Wikibase: Use false instead of database names for "local" entity sources on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/589019 (https://phabricator.wikimedia.org/T250183) (owner: WMDE-leszek)
[14:06:44] (CR) CDanis: [C: +1] "Thanks for doing this!" [puppet] - https://gerrit.wikimedia.org/r/588135 (owner: Ema)
[14:07:06] leszek_wmde: Live on mwdebug1001.
[14:07:15] James_F: testing
[14:07:47] James_F: looking good. please proceed
[14:07:55] Excellent.
[14:08:41] (PS2) Jforrester: wmgExtraLanguageNames: Remove 'smn', supported by core since 1.35.0-wmf.26 [mediawiki-config] - https://gerrit.wikimedia.org/r/585673 (owner: Raimond Spekking)
[14:08:51] (PS3) Jforrester: wmgExtraLanguageNames: Remove 'smn', supported by core since 1.35.0-wmf.26 [mediawiki-config] - https://gerrit.wikimedia.org/r/585673 (owner: Raimond Spekking)
[14:09:28] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T250181 T250183 Wikibase: Use false instead of database names for 'local' entity sources on test wikis (duration: 01m 06s)
[14:09:35] (CR) Jforrester: [C: +2] wmgExtraLanguageNames: Remove 'smn', supported by core since 1.35.0-wmf.26 [mediawiki-config] - https://gerrit.wikimedia.org/r/585673 (owner: Raimond Spekking)
[14:09:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:09:37] T250183: Argument 1 passed to WikibaseQuality\ConstraintReport\Job\CheckConstraintsJob::setResultsSource() must be an instance of WikibaseQuality\ConstraintReport\Api\CachingResultsSource, instance of WikibaseQuality\ConstraintReport\Api\CheckingResultsSource given, called in /srv/mediawiki/php-1.35.0-wmf.27/extensions/WikibaseQualityConstraints/src/Job/CheckConstraintsJob.php on line 49 - https://phabricator.wikimedia.org/T250183
[14:09:37] T250181: test.wikidata.org - term store not updating - https://phabricator.wikimedia.org/T250181
[14:09:49] (PS1) Jbond: admin: add basic schema validation via tox [puppet] - https://gerrit.wikimedia.org/r/589021 (https://phabricator.wikimedia.org/T250259)
[14:10:25] (Merged) jenkins-bot: wmgExtraLanguageNames: Remove 'smn', supported by core since 1.35.0-wmf.26 [mediawiki-config] - https://gerrit.wikimedia.org/r/585673 (owner: Raimond Spekking)
[14:10:44] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 06s)
[14:10:47] (PS3) Jforrester: Remove $wgEnablePartialBlocks config [mediawiki-config] - https://gerrit.wikimedia.org/r/576086 (https://phabricator.wikimedia.org/T242912) (owner: Tchanders)
[14:10:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:53] (CR) Jforrester: [C: +2] Remove $wgEnablePartialBlocks config [mediawiki-config] - https://gerrit.wikimedia.org/r/576086 (https://phabricator.wikimedia.org/T242912) (owner: Tchanders)
[14:11:43] (Merged) jenkins-bot: Remove $wgEnablePartialBlocks config [mediawiki-config] - https://gerrit.wikimedia.org/r/576086 (https://phabricator.wikimedia.org/T242912) (owner: Tchanders)
[14:12:23] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: wmgExtraLanguageNames: Remove 'smn', supported by core since 1.35.0-wmf.26 (duration: 01m 06s)
[14:12:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:54] Operations, SRE-Access-Requests: Requesting access to analytics for andrew-wmde - https://phabricator.wikimedia.org/T249733 (fgiunchedi) @Andrew-WMDE re: kerberos, please check your email with a temporary password and instructions. For the record the procedure @elukey pointed me to is this: https://wikit...
[14:12:56] (PS2) Jforrester: Use MediaWikiServices::getAuthManager [mediawiki-config] - https://gerrit.wikimedia.org/r/585910 (owner: Umherirrender)
[14:14:35] (PS2) Jforrester: Stop setting legacy wmgWikibase(Repo/Client)Repositories for test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/586368 (https://phabricator.wikimedia.org/T248664) (owner: Addshore)
[14:14:42] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T242912 Remove wgEnablePartialBlocks config, no longer read (duration: 01m 07s)
[14:14:43] (PS1) Hashar: contint: allow masters to ssh to themselves [puppet] - https://gerrit.wikimedia.org/r/589023 (https://phabricator.wikimedia.org/T224591)
[14:14:45] (PS3) Jforrester: Stop setting legacy wmgWikibase(Repo/Client)Repositories for test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/586368 (https://phabricator.wikimedia.org/T248664) (owner: Addshore)
[14:14:47] (PS1) Hashar: contint: remove unused Gerrit firewall rule [puppet] - https://gerrit.wikimedia.org/r/589024
[14:14:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:14:48] T242912: Remove $wgEnablePartialBlocks feature flag - https://phabricator.wikimedia.org/T242912
[14:14:51] (CR) Jforrester: [C: +1] Stop setting legacy wmgWikibase(Repo/Client)Repositories for test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/586368 (https://phabricator.wikimedia.org/T248664) (owner: Addshore)
[14:14:59] (CR) Jforrester: [C: +2] Use MediaWikiServices::getAuthManager [mediawiki-config] - https://gerrit.wikimedia.org/r/585910 (owner: Umherirrender)
[14:15:51] (Merged) jenkins-bot: Use MediaWikiServices::getAuthManager [mediawiki-config] - https://gerrit.wikimedia.org/r/585910 (owner: Umherirrender)
[14:16:31] (PS6) Jforrester: Remove no-op GrowthExperiments beta settings [mediawiki-config] - https://gerrit.wikimedia.org/r/584184 (owner: Gergő Tisza)
[14:17:24] !log jforrester@deploy1001 Synchronized wmf-config/wikitech.php: Use MediaWikiServices::getAuthManager on wikitech (duration: 01m 06s)
[14:17:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:21] (CR) Jforrester: [C: +2] Remove no-op GrowthExperiments beta settings [mediawiki-config] - https://gerrit.wikimedia.org/r/584184 (owner: Gergő Tisza)
[14:19:27] (Merged) jenkins-bot: Remove no-op GrowthExperiments beta settings [mediawiki-config] - https://gerrit.wikimedia.org/r/584184 (owner: Gergő Tisza)
[14:19:47] (PS6) Jforrester: ProductionServices: Add 'parsoid' service to replace 'parsoidphp' [mediawiki-config] - https://gerrit.wikimedia.org/r/579021 (owner: C. Scott Ananian)
[14:19:53] (CR) Jforrester: [C: +2] ProductionServices: Add 'parsoid' service to replace 'parsoidphp' [mediawiki-config] - https://gerrit.wikimedia.org/r/579021 (owner: C. Scott Ananian)
[14:20:06] James_F: looks https://gerrit.wikimedia.org/r/589019 was synced. thanks a lot!
[14:20:15] leszek_wmde: Happy to help.
[14:20:35] (PS4) Jforrester: CommonSettings: Use 'parsoid' service in lieu of 'parsoidphp' [mediawiki-config] - https://gerrit.wikimedia.org/r/579042 (owner: C. Scott Ananian)
[14:20:41] (Merged) jenkins-bot: ProductionServices: Add 'parsoid' service to replace 'parsoidphp' [mediawiki-config] - https://gerrit.wikimedia.org/r/579021 (owner: C. Scott Ananian)
[14:20:43] (PS4) Jforrester: ProductionServices: Drop 'parsoidphp' service, we use 'parsoid' now [mediawiki-config] - https://gerrit.wikimedia.org/r/579043 (owner: C. Scott Ananian)
[14:22:59] !log jforrester@deploy1001 Synchronized wmf-config/ProductionServices.php: Add 'parsoid' service to replace 'parsoidphp' (duration: 01m 06s)
[14:23:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:13] (CR) Jforrester: [C: +2] CommonSettings: Use 'parsoid' service in lieu of 'parsoidphp' [mediawiki-config] - https://gerrit.wikimedia.org/r/579042 (owner: C. Scott Ananian)
[14:24:14] (Merged) jenkins-bot: CommonSettings: Use 'parsoid' service in lieu of 'parsoidphp' [mediawiki-config] - https://gerrit.wikimedia.org/r/579042 (owner: C. Scott Ananian)
[14:25:43] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 06s)
[14:25:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:04] (CR) Jbond: [C: +2] hiera_lookup: update hiera_lookup to use bundler [puppet] - https://gerrit.wikimedia.org/r/588969 (owner: Jbond)
[14:27:31] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: Use 'parsoid' service in lieu of 'parsoidphp' (duration: 01m 07s)
[14:27:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:27:43] (CR) Dzahn: contint: allow masters to ssh to themselves (1 comment) [puppet] - https://gerrit.wikimedia.org/r/589023 (https://phabricator.wikimedia.org/T224591) (owner: Hashar)
[14:28:06] Operations, ops-codfw, DBA: (Need by: TBD) codfw: rack/setup/install backup2002/array backup2002-array1 - https://phabricator.wikimedia.org/T248934 (jcrespo)
[14:29:02] (CR) Jforrester: [C: +2] ProductionServices: Drop 'parsoidphp' service, we use 'parsoid' now [mediawiki-config] - https://gerrit.wikimedia.org/r/579043 (owner: C. Scott Ananian)
[14:29:58] (Merged) jenkins-bot: ProductionServices: Drop 'parsoidphp' service, we use 'parsoid' now [mediawiki-config] - https://gerrit.wikimedia.org/r/579043 (owner: C. Scott Ananian)
[14:31:59] Operations, ops-codfw, DBA: (Need by: TBD) codfw: rack/setup/install backup2002/array backup2002-array1 - https://phabricator.wikimedia.org/T248934 (jcrespo) I've setup the array now: ` df -h /dev/mapper/array1-content 73T 24K 73T 1% /srv/content ` This is a summary of the setup, I will doc...
[14:32:08] (PS4) Jforrester: Wikidata: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569258 (https://phabricator.wikimedia.org/T242087) (owner: WMDE-leszek)
[14:32:12] !log jforrester@deploy1001 Synchronized wmf-config/ProductionServices.php: Drop 'parsoidphp' service, we use 'parsoid' now (duration: 01m 06s)
[14:32:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:20] (PS4) Jforrester: Wikidata client wikis: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569259 (https://phabricator.wikimedia.org/T242087) (owner: WMDE-leszek)
[14:32:36] (PS4) Jforrester: Commons: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569260 (https://phabricator.wikimedia.org/T242087) (owner: WMDE-leszek)
[14:32:47] (PS5) Jforrester: Wikidata/Wikibase: use entity source Wikibase setting for all wikibase-enabled wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/569261 (https://phabricator.wikimedia.org/T242087) (owner: WMDE-leszek)
[14:32:55] (PS5) Jforrester: Wikibase: Removed config option wmgUseEntitySourceBasedFederation [mediawiki-config] - https://gerrit.wikimedia.org/r/569263 (https://phabricator.wikimedia.org/T241975) (owner: WMDE-leszek)
[14:33:26] (CR) jerkins-bot: [V: -1] Wikidata client wikis: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569259 (https://phabricator.wikimedia.org/T242087) (owner: WMDE-leszek)
[14:33:29] (CR) Hashar: contint: allow masters to ssh to themselves (1 comment) [puppet] - https://gerrit.wikimedia.org/r/589023 (https://phabricator.wikimedia.org/T224591) (owner: Hashar)
[14:33:46] (CR) jerkins-bot: [V: -1] Commons: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569260 (https://phabricator.wikimedia.org/T242087) (owner: WMDE-leszek)
[14:34:38] OK, I'm done for now.
[14:36:13] !log rolling upgrade to ATS 8.0.7-rc0-1wm2 on cp[3064,3065,2042,2041,1090,1089] - T249335
[14:36:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:19] T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[14:38:07] (CR) Dzahn: [C: +2] "yes, nothing listens on 29418 on contint hosts. only on gerrit hosts." [puppet] - https://gerrit.wikimedia.org/r/589024 (owner: Hashar)
[14:38:22] (PS2) Dzahn: contint: remove unused Gerrit firewall rule [puppet] - https://gerrit.wikimedia.org/r/589024 (owner: Hashar)
[14:42:06] (PS2) Hashar: contint: allow masters to ssh to themselves [puppet] - https://gerrit.wikimedia.org/r/589023 (https://phabricator.wikimedia.org/T224591)
[14:42:08] (PS3) Hashar: contint: remove unused Gerrit firewall rule [puppet] - https://gerrit.wikimedia.org/r/589024
[14:42:32] (CR) Hashar: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/589023 (https://phabricator.wikimedia.org/T224591) (owner: Hashar)
[14:42:38] (CR) jerkins-bot: [V: -1] contint: allow masters to ssh to themselves [puppet] - https://gerrit.wikimedia.org/r/589023 (https://phabricator.wikimedia.org/T224591) (owner: Hashar)
[14:43:28] (PS1) Gergő Tisza: Fix GrowthExperiments helpdesk URL for frwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/589029 (https://phabricator.wikimedia.org/T235964)
[14:44:30] (CR) RLazarus: [C: +1] "Awesome, thanks!" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/589021 (https://phabricator.wikimedia.org/T250259) (owner: Jbond)
[14:44:51] (PS1) Vgutierrez: ATS: Enable inbound TLSv1.3 globally [puppet] - https://gerrit.wikimedia.org/r/589030 (https://phabricator.wikimedia.org/T170567)
[14:45:25] (CR) jerkins-bot: [V: -1] contint: remove unused Gerrit firewall rule [puppet] - https://gerrit.wikimedia.org/r/589024 (owner: Hashar)
[14:45:27] PROBLEM - Unmerged changes on repository puppet on labtestpuppetmaster2001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[14:47:12] Operations, Core Platform Team, MediaWiki-Cache, Traffic, and 2 others: Stop sending purges for `action=history` for linked pages. - https://phabricator.wikimedia.org/T250261 (fgiunchedi) p:Triage→Medium
[14:47:20] Operations, Pybal, Traffic: pybal healthchecks reaching the applayer on specific requests - https://phabricator.wikimedia.org/T250258 (Vgutierrez) p:High→Medium Lowering the priority to medium as the issue is not happening after merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/589...
[14:49:42] (PS1) ArielGlenn: check bz2 page content files for existence before running command batch [dumps] - https://gerrit.wikimedia.org/r/589032 (https://phabricator.wikimedia.org/T250260)
[14:52:55] Operations: Update Server Access Responsibilities document for Data Retention policy - https://phabricator.wikimedia.org/T83525 (fgiunchedi)
[14:53:23] (PS3) Hashar: contint: allow masters to ssh to themselves [puppet] - https://gerrit.wikimedia.org/r/589023 (https://phabricator.wikimedia.org/T224591)
[14:54:52] Operations, audits-data-retention: Implement Data Retention Guidelines - https://phabricator.wikimedia.org/T83531 (fgiunchedi)
[14:55:14] (PS1) DCausse: [wdqs] data-reload allow to reload only categories [cookbooks] - https://gerrit.wikimedia.org/r/589033
[14:56:25] (CR) jerkins-bot: [V: -1] contint: allow masters to ssh to themselves [puppet] - https://gerrit.wikimedia.org/r/589023 (https://phabricator.wikimedia.org/T224591) (owner: Hashar)
[14:56:55] (CR) jerkins-bot: [V: -1] [wdqs] data-reload allow to reload only categories [cookbooks] - https://gerrit.wikimedia.org/r/589033 (owner: DCausse)
[14:59:47] (PS4) Hashar: contint: allow masters to ssh to themselves [puppet] - https://gerrit.wikimedia.org/r/589023 (https://phabricator.wikimedia.org/T224591)
[15:00:47] (PS2) DCausse: [wdqs] data-reload allow to reload only categories [cookbooks] - https://gerrit.wikimedia.org/r/589033
[15:02:38] (PS2) Jbond: admin: add basic schema validation via tox [puppet] - https://gerrit.wikimedia.org/r/589021 (https://phabricator.wikimedia.org/T250259)
[15:03:32] (CR) Jbond: "updated thanks" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/589021 (https://phabricator.wikimedia.org/T250259) (owner: Jbond)
[15:04:16] (CR) Dzahn: contint: allow masters to ssh to themselves (1 comment) [puppet] - https://gerrit.wikimedia.org/r/589023 (https://phabricator.wikimedia.org/T224591) (owner: Hashar)
[15:05:09] Operations, Wikimedia-General-or-Unknown, Tor, WorkType-NewFunctionality: Run our own Tor client for Tor block - https://phabricator.wikimedia.org/T32716 (fgiunchedi) Open→Resolved a:fgiunchedi Boldly resolving as per comments, feel free to reopen.
[15:08:39] Operations, ops-codfw, ops-eqiad, ops-eqsin, and 2 others: Audit & update spares part tracking for all sites - https://phabricator.wikimedia.org/T243450 (Papaul)
[15:08:48] Operations, ops-codfw, DBA: (Need by: TBD) codfw: rack/setup/install backup2002/array backup2002-array1 - https://phabricator.wikimedia.org/T248934 (jcrespo) Open→Resolved
[15:09:26] (PS1) Ema: 0.6: use golang.org/x/net/ipv4 for the multicast reader [software/purged] - https://gerrit.wikimedia.org/r/589037 (https://phabricator.wikimedia.org/T249583)
[15:09:45] (PS30) Jforrester: Test if 2x logo version is 2 times bigger than 1x logo version [mediawiki-config] - https://gerrit.wikimedia.org/r/521181 (https://phabricator.wikimedia.org/T211413) (owner: Urbanecm)
[15:12:23] Operations, DBA, Goal: Set up backup strategy for es clusters - https://phabricator.wikimedia.org/T79922 (jcrespo) Codfw hw now available: T248934 (75TB total), only needs puppetization.
[15:13:11] (PS1) Dzahn: ci::firewall: replace ferm::rule with ferm::service [puppet] - https://gerrit.wikimedia.org/r/589038
[15:13:50] (CR) jerkins-bot: [V: -1] ci::firewall: replace ferm::rule with ferm::service [puppet] - https://gerrit.wikimedia.org/r/589038 (owner: Dzahn)
[15:13:53] Operations, Pybal, Traffic: pybal healthchecks reaching the applayer on specific requests - https://phabricator.wikimedia.org/T250258 (Vgutierrez)
[15:14:36] (PS2) Dzahn: ci::firewall: replace ferm::rule with ferm::service [puppet] - https://gerrit.wikimedia.org/r/589038
[15:15:52] (PS1) Ema: purged: stop passing udp port [puppet] - https://gerrit.wikimedia.org/r/589040 (https://phabricator.wikimedia.org/T249583)
[15:16:30] (CR) Vgutierrez: "pcc is happy: https://puppet-compiler.wmflabs.org/compiler1002/21940/" [puppet] - https://gerrit.wikimedia.org/r/589030 (https://phabricator.wikimedia.org/T170567) (owner: Vgutierrez)
[15:17:28] (CR) Volans: "Small detail inline, the rest looks sane to me but I'll leave it to others that have better knowledge of the various parts involved here ;" (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/589033 (owner: DCausse)
[15:17:31] (CR) jerkins-bot: [V: -1] ci::firewall: replace ferm::rule with ferm::service [puppet] - https://gerrit.wikimedia.org/r/589038 (owner: Dzahn)
[15:17:33] (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/21941/" [puppet] - https://gerrit.wikimedia.org/r/589023 (https://phabricator.wikimedia.org/T224591) (owner: Hashar)
[15:18:55] (CR) RLazarus: [C: +1] admin: add basic schema validation via tox (1 comment) [puppet] - https://gerrit.wikimedia.org/r/589021 (https://phabricator.wikimedia.org/T250259) (owner: Jbond)
[15:19:57] !log upgrading firmware on restbase2014
[15:20:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:20:18] jbond42: ok to merge the hiera_lookup change? it seems a while ago and potentially impactful
[15:20:31] mutante: yes sorry i forgot about it
[15:20:48] jbond42: np, done
[15:20:54] (PS2) Ema: 0.6: use golang.org/x/net/ipv4 for the multicast reader [software/purged] - https://gerrit.wikimedia.org/r/589037 (https://phabricator.wikimedia.org/T249583)
[15:21:05] (CR) Ema: [C: +1] ATS: Enable inbound TLSv1.3 globally [puppet] - https://gerrit.wikimedia.org/r/589030 (https://phabricator.wikimedia.org/T170567) (owner: Vgutierrez)
[15:21:07] (CR) Vgutierrez: [C: +1] 0.6: use golang.org/x/net/ipv4 for the multicast reader [software/purged] - https://gerrit.wikimedia.org/r/589037 (https://phabricator.wikimedia.org/T249583) (owner: Ema)
[15:21:07] thank
[15:22:27] RECOVERY - Unmerged changes on repository puppet on labtestpuppetmaster2001 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[15:26:46] (CR) Ema: [V: +2 C: +2] TestWorkers: Wait for both fe and be channels to be consumed [software/purged] - https://gerrit.wikimedia.org/r/588970 (owner: Ema)
[15:26:48] (PS1) Herron: logstash: reduce kafka-logging log retention to 5 days [puppet] - https://gerrit.wikimedia.org/r/589043 (https://phabricator.wikimedia.org/T250133)
[15:26:51] (CR) Ema: [C: +2] 0.6: use golang.org/x/net/ipv4 for the multicast reader [software/purged] - https://gerrit.wikimedia.org/r/589037 (https://phabricator.wikimedia.org/T249583) (owner: Ema)
[15:30:07] !log upload purged 0.6 to buster-wikimedia T249583
[15:30:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:30:16] T249583: Create vhtcpd replacement - https://phabricator.wikimedia.org/T249583
[15:30:32] (CR) Cwhite: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/589043 (https://phabricator.wikimedia.org/T250133) (owner: Herron)
[15:30:37] (CR) CRusnov: "Thanks for the feedback just a few juestions" (3 comments) [software/netbox-extras] - https://gerrit.wikimedia.org/r/588036 (https://phabricator.wikimedia.org/T244153) (owner: CRusnov)
[15:30:40] godog: papaul:
[15:31:53] Operations, procurement: Quote request for SSD expansion of prometheus hosts in eqiad - https://phabricator.wikimedia.org/T250286 (fgiunchedi)
[15:32:21] jbond42: yo!
[15:32:39] godog: nothing sorry :)
[15:32:48] haha! ok
[15:33:20] (PS4) Hashar: contint: remove unused Gerrit firewall rule [puppet] - https://gerrit.wikimedia.org/r/589024
[15:33:22] (CR) Ema: [C: +2] purged: stop passing udp port [puppet] - https://gerrit.wikimedia.org/r/589040 (https://phabricator.wikimedia.org/T249583) (owner: Ema)
[15:35:37] (CR) Jbond: ferm: Add status check (1 comment) [puppet] - https://gerrit.wikimedia.org/r/576101 (https://phabricator.wikimedia.org/T206951) (owner: Jbond)
[15:36:28] !log cp2029,cp3050: upgrade purged to 0.6, restart varnish-fe T249583
[15:36:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:36:34] T249583: Create vhtcpd replacement - https://phabricator.wikimedia.org/T249583
[15:37:00] (CR) Herron: "interestingly ps1 is a noop https://puppet-compiler.wmflabs.org/compiler1001/21942/" [puppet] - https://gerrit.wikimedia.org/r/589043 (https://phabricator.wikimedia.org/T250133) (owner: Herron)
[15:37:10] (PS2) Herron: logstash: reduce kafka-logging log retention to 5 days [puppet] - https://gerrit.wikimedia.org/r/589043 (https://phabricator.wikimedia.org/T250133)
[15:38:36] (PS3) Herron: logstash: reduce kafka-logging log retention to 5 days [puppet] - https://gerrit.wikimedia.org/r/589043 (https://phabricator.wikimedia.org/T250133)
[15:41:25] (CR) Herron: "here we are https://puppet-compiler.wmflabs.org/compiler1001/21945/" [puppet] - https://gerrit.wikimedia.org/r/589043 (https://phabricator.wikimedia.org/T250133) (owner: Herron)
[15:41:49] (PS3) DCausse: [wdqs] data-reload allow to reload only categories [cookbooks] - https://gerrit.wikimedia.org/r/589033
[15:42:13] (CR) DCausse: [wdqs] data-reload allow to reload only categories (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/589033 (owner: DCausse)
[15:47:45] (CR) Filippo Giunchedi: [C: +1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/589043 (https://phabricator.wikimedia.org/T250133) (owner: Herron)
[15:47:59] Operations, procurement: Quote request for SSD expansion of prometheus hosts in eqiad - https://phabricator.wikimedia.org/T250286 (RobH) I've already dropped a note to @fgiunchedi to please always use the form provided for [[ https://phabricator.wikimedia.org/maniphest/task/edit/form/66/ | filing a procu...
[15:51:15] (CR) Jbond: ferm: Add status check (1 comment) [puppet] - https://gerrit.wikimedia.org/r/576101 (https://phabricator.wikimedia.org/T206951) (owner: Jbond)
[15:52:09] PROBLEM - Host restbase2014 is DOWN: PING CRITICAL - Packet loss = 100%
[15:53:19] (CR) Herron: [C: +2] logstash: reduce kafka-logging log retention to 5 days [puppet] - https://gerrit.wikimedia.org/r/589043 (https://phabricator.wikimedia.org/T250133) (owner: Herron)
[15:54:22] (CR) Volans: [C: -1] "Some comments inline, I think there is a small bug. The rest looks sane." (9 comments) [puppet] - https://gerrit.wikimedia.org/r/576101 (https://phabricator.wikimedia.org/T206951) (owner: Jbond)
[15:57:08] Operations, MediaWiki-Cache, Page Content Service, Product-Infrastructure-Team-Backlog, and 3 others: cache_text cluster consistently backlogged on purge requests - https://phabricator.wikimedia.org/T249325 (ema) The work on [[https://gerrit.wikimedia.org/g/operations/software/purged | purged]]...
[15:57:17] RECOVERY - Host restbase2014 is UP: PING OK - Packet loss = 0%, RTA = 36.22 ms [15:58:27] RECOVERY - Check systemd state on restbase2014 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:58:58] Operations, LDAP-Access-Requests: LDAP access to the wmf group for Matthew Williams - https://phabricator.wikimedia.org/T249844 (mwilliams) Success! Thank you [16:02:53] (PS1) CRusnov: interface_automation: Restrict mgmt creation [software/netbox-extras] - https://gerrit.wikimedia.org/r/589048 (https://phabricator.wikimedia.org/T250287) [16:03:18] (CR) CRusnov: "This change is ready for review." [software/netbox-extras] - https://gerrit.wikimedia.org/r/589048 (https://phabricator.wikimedia.org/T250287) (owner: CRusnov) [16:03:28] jouncebot: next [16:03:28] In 1 hour(s) and 56 minute(s): Morning SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200415T1800) [16:08:35] !log volker-e@deploy1001 Started deploy [design/style-guide@a4d5794]: Deploy design/style-guide: [16:08:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:08:46] !log volker-e@deploy1001 Finished deploy [design/style-guide@a4d5794]: Deploy design/style-guide: (duration: 00m 11s) [16:08:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:09:00] (CR) Arturo Borrero Gonzalez: [C: +1] cloudvps: Add metricsinfra prometheus server [puppet] - https://gerrit.wikimedia.org/r/588803 (https://phabricator.wikimedia.org/T250206) (owner: Jhedden) [16:11:52] (PS1) Ema: multicast: set read buffer [software/purged] - https://gerrit.wikimedia.org/r/589049 (https://phabricator.wikimedia.org/T249583) [16:13:57] Operations, MediaWiki-Parser, serviceops, Core Platform Team Workboards (Clinic Duty Team), and 2 others: API action=parse should be poolcounter-limited if a re-parse is necessary -
https://phabricator.wikimedia.org/T243803 (Pchelolo) Open→Resolved [16:22:19] (CR) Alexandros Kosiaris: [C: -1] ci::firewall: replace ferm::rule with ferm::service (2 comments) [puppet] - https://gerrit.wikimedia.org/r/589038 (owner: Dzahn) [16:23:39] PROBLEM - Host restbase2014 is DOWN: PING CRITICAL - Packet loss = 100% [16:24:49] RECOVERY - Host restbase2014 is UP: PING OK - Packet loss = 0%, RTA = 36.18 ms [16:35:13] Operations, ChangeProp, Release Pipeline, Release-Engineering-Team-TODO, and 7 others: Migrate cpjobqueue to kubernetes - https://phabricator.wikimedia.org/T220399 (eprodromou) [16:35:22] (PS1) Hnowlan: changeprop: allow setting arbitrary keys for kafka options [deployment-charts] - https://gerrit.wikimedia.org/r/589059 (https://phabricator.wikimedia.org/T248677) [16:35:35] (PS4) CDanis: puppetize nic_saturation_exporter & run on memcache hosts [puppet] - https://gerrit.wikimedia.org/r/588760 (https://phabricator.wikimedia.org/T224454) [16:36:46] ACKNOWLEDGEMENT - MD RAID on restbase2014 is CRITICAL: CRITICAL: State: degraded, Active: 6, Working: 6, Failed: 0, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T250293 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [16:36:48] Operations, ops-codfw: Degraded RAID on restbase2014 - https://phabricator.wikimedia.org/T250293 (ops-monitoring-bot) [16:38:40] (CR) CDanis: [C: +2] "Thanks for the review!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/588760 (https://phabricator.wikimedia.org/T224454) (owner: CDanis) [16:39:02] Operations, ops-codfw: Degraded RAID on restbase2014 - https://phabricator.wikimedia.org/T250050 (Eevans) It looks like the machine was just rebooted: ` eevans@restbase2014:~$ date -R; uptime Wed, 15 Apr 2020 16:38:20 +0000 16:38:20 up 14 min, 1 user, load average: 1.96, 2.02, 1.56 eevans@restbase201...
[16:41:24] (PS1) CDanis: Revert "puppetize nic_saturation_exporter & run on memcache hosts" [puppet] - https://gerrit.wikimedia.org/r/589060 [16:41:32] (CR) CDanis: [V: +2 C: +2] Revert "puppetize nic_saturation_exporter & run on memcache hosts" [puppet] - https://gerrit.wikimedia.org/r/589060 (owner: CDanis) [16:41:57] jessie. [16:42:01] sigh, jessie. [16:44:46] Operations, ops-codfw: Degraded RAID on restbase2014 - https://phabricator.wikimedia.org/T250050 (Papaul) @Eevans yes was and i am working on it . i logged the message abut this at 15:19 today. thanks 15:19 papaul: upgrading firmware on restbase2014 [16:44:48] fist shaking ensues [16:51:06] Operations, CX-cxserver, Product-Infrastructure-Team-Backlog, Wikifeeds, and 4 others: service-runner apps running on kubernetes emit logs with log level 50 - https://phabricator.wikimedia.org/T239459 (akosiaris) [16:51:11] James_F: I just saw (on twitter) a sync for using false instead of the dB name for "local" entity sources [16:51:33] Is that needed? Cant we be explicit? The idea being we should only define that source once [16:52:22] (CR) Andrew Bogott: "Thanks for the review! Responses inline" (7 comments) [puppet] - https://gerrit.wikimedia.org/r/588752 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott) [16:53:50] (PS11) Andrew Bogott: Designate: replace standalone memcached with a mcrouter cluster [puppet] - https://gerrit.wikimedia.org/r/588752 (https://phabricator.wikimedia.org/T249941) [16:54:47] Operations, CX-cxserver, Product-Infrastructure-Team-Backlog, Wikifeeds, and 4 others: service-runner apps running on kubernetes emit logs with log level 50 - https://phabricator.wikimedia.org/T239459 (akosiaris) Adding owners of services in the subscribers list.
[16:55:03] Operations, CX-cxserver, Product-Infrastructure-Team-Backlog, Wikifeeds, and 4 others: service-runner apps running on kubernetes emit logs with log level 50 - https://phabricator.wikimedia.org/T239459 (akosiaris) [16:56:33] Operations, CX-cxserver, Product-Infrastructure-Team-Backlog, Wikifeeds, and 4 others: service-runner apps running on kubernetes emit logs with log level 50 - https://phabricator.wikimedia.org/T239459 (akosiaris) [16:56:36] Operations, Wikimedia-Logstash, observability, Patch-For-Review, User-fgiunchedi: Deprecate all non-Kafka logstash inputs - https://phabricator.wikimedia.org/T227080 (akosiaris) [16:57:12] (PS12) Andrew Bogott: Designate: replace standalone memcached with a mcrouter cluster [puppet] - https://gerrit.wikimedia.org/r/588752 (https://phabricator.wikimedia.org/T249941) [16:58:18] Operations, CX-cxserver, Product-Infrastructure-Team-Backlog, Wikifeeds, and 4 others: service-runner apps running on kubernetes emit logs with log level 50 - https://phabricator.wikimedia.org/T239459 (akosiaris) [16:58:21] Operations, Wikifeeds, Wikimedia-Logstash, observability: Move wikifeeds to the logging pipeline - https://phabricator.wikimedia.org/T245604 (akosiaris) [16:58:26] Operations, Product-Infrastructure-Team-Backlog, Proton, Wikimedia-Logstash, and 3 others: Move proton logging to new logging pipeline - https://phabricator.wikimedia.org/T219925 (akosiaris) [16:58:30] Operations, Page Content Service, Product-Infrastructure-Team-Backlog, Wikimedia-Logstash, and 3 others: Move mobileapps logging to new logging pipeline - https://phabricator.wikimedia.org/T219924 (akosiaris) [16:58:33] Operations, CX-cxserver, Wikimedia-Logstash, observability, and 3 others: Move cxserver logging to new logging pipeline - https://phabricator.wikimedia.org/T219921 (akosiaris) [16:58:37] Operations, Citoid, Wikimedia-Logstash, observability, and 2 others:
Move citoid logging to new logging pipeline - https://phabricator.wikimedia.org/T219919 (akosiaris) [16:58:40] (CR) Volans: [C: +1] "LGTM, thanks for the quick patch!" [software/netbox-extras] - https://gerrit.wikimedia.org/r/589048 (https://phabricator.wikimedia.org/T250287) (owner: CRusnov) [16:58:42] Operations, Wikimedia-Logstash, observability, service-runner, Patch-For-Review: Move service-runner to new logging infrastructure - https://phabricator.wikimedia.org/T211125 (akosiaris) [16:59:07] Operations, CX-cxserver, Product-Infrastructure-Team-Backlog, Wikifeeds, and 4 others: service-runner apps running on kubernetes emit logs with log level 50 - https://phabricator.wikimedia.org/T239459 (akosiaris) [17:07:26] Operations, ops-codfw: Degraded RAID on restbase2014 - https://phabricator.wikimedia.org/T250050 (Papaul) Create Dispatch: Success You have successfully submitted request SR1023108711. [17:07:30] (PS13) Andrew Bogott: Designate: replace standalone memcached with a mcrouter cluster [puppet] - https://gerrit.wikimedia.org/r/588752 (https://phabricator.wikimedia.org/T249941) [17:08:17] Operations, ops-eqiad, DC-Ops: Interface errors on asw2-c-eqiad - ge-3/0/9 (pc1009) - https://phabricator.wikimedia.org/T250257 (wiki_willy) i [17:08:29] Operations, ops-eqiad, DC-Ops: Interface errors on asw2-c-eqiad - ge-3/0/9 (pc1009) - https://phabricator.wikimedia.org/T250257 (wiki_willy) a:Cmjohnson [17:09:00] addshore: Ask leszek. [17:10:54] Operations, ops-eqiad, DC-Ops: Interface errors on asw2-c-eqiad - ge-3/0/9 (pc1009) - https://phabricator.wikimedia.org/T250257 (Cmjohnson) @ayounsi what is your thought here? cable going bad?
[17:15:38] (CR) Andrew Bogott: "pcc results are not very illuminating, but:" [puppet] - https://gerrit.wikimedia.org/r/588752 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott) [17:16:58] Operations, Release-Engineering-Team-TODO: Should 'doc' machines (i.e. doc1001) have contint-roots as a group? - https://phabricator.wikimedia.org/T245691 (hashar) The material is published as user `doc-uploader` and we have sudo access for that user. Only case I needed root on that machine was to fix s... [17:19:13] (PS1) CDanis: fix NIC saturation exporter to be jessie-compatible 😖 [puppet] - https://gerrit.wikimedia.org/r/589067 (https://phabricator.wikimedia.org/T224454) [17:21:01] Operations, ops-eqiad, DC-Ops: Interface errors on asw2-c-eqiad - ge-3/0/9 (pc1009) - https://phabricator.wikimedia.org/T250257 (ayounsi) From `asw2-c-eqiad> show interfaces ge-3/0/9 extensive | match error` They are `Framing errors: 8941` So yep, usual dance: swap cable, interfaces, etc.
[17:22:14] (PS1) Cmjohnson: Add mgmt dns for kubernetes100[7-14] [dns] - https://gerrit.wikimedia.org/r/589068 (https://phabricator.wikimedia.org/T241850) [17:25:24] (CR) RLazarus: [C: +1] "😞" [puppet] - https://gerrit.wikimedia.org/r/589067 (https://phabricator.wikimedia.org/T224454) (owner: CDanis) [17:25:59] (CR) CDanis: [C: +2] fix NIC saturation exporter to be jessie-compatible 😖 [puppet] - https://gerrit.wikimedia.org/r/589067 (https://phabricator.wikimedia.org/T224454) (owner: CDanis) [17:26:43] (PS2) Cmjohnson: Add mgmt dns for kubernetes100[7-14] [dns] - https://gerrit.wikimedia.org/r/589068 (https://phabricator.wikimedia.org/T241850) [17:27:02] (PS3) Cmjohnson: Add mgmt dns for kubernetes100[7-14] [dns] - https://gerrit.wikimedia.org/r/589068 (https://phabricator.wikimedia.org/T241850) [17:27:11] (PS1) CDanis: jessie-ified nic_saturation_exporter on memcache hosts [puppet] - https://gerrit.wikimedia.org/r/589070 (https://phabricator.wikimedia.org/T224454) [17:28:19] (CR) Cmjohnson: [C: +2] Add mgmt dns for kubernetes100[7-14] [dns] - https://gerrit.wikimedia.org/r/589068 (https://phabricator.wikimedia.org/T241850) (owner: Cmjohnson) [17:29:18] (CR) CDanis: [C: +2] "This was reviewed by elukey as f500c1d98cf, only difference here is removing python3-attr from the required packages (not available on jes" [puppet] - https://gerrit.wikimedia.org/r/589070 (https://phabricator.wikimedia.org/T224454) (owner: CDanis) [17:34:35] sigh, I think I broke puppet on the memcacheds, debugging now [17:35:34] (CR) Alex Monk: "this is fine, might want to make it just a simple list of projects which maps into the full list of openstack SD config objects, but this " [puppet] - https://gerrit.wikimedia.org/r/588803 (https://phabricator.wikimedia.org/T250206) (owner: Jhedden) [17:37:40] (CR) Ppchelko: changeprop: allow setting arbitrary keys for kafka options (3 comments) [deployment-charts] -
https://gerrit.wikimedia.org/r/589059 (https://phabricator.wikimedia.org/T248677) (owner: Hnowlan) [17:38:26] (PS1) CDanis: nic-saturation-exporter: remove erroneous override [puppet] - https://gerrit.wikimedia.org/r/589073 [17:39:37] (CR) CDanis: [C: +2] nic-saturation-exporter: remove erroneous override [puppet] - https://gerrit.wikimedia.org/r/589073 (owner: CDanis) [17:39:56] (CR) Alex Monk: [C: +1] cloudvps: Add metricsinfra prometheus server [puppet] - https://gerrit.wikimedia.org/r/588803 (https://phabricator.wikimedia.org/T250206) (owner: Jhedden) [17:42:16] (CR) Krinkle: [C: -1] "Per offline discussion, we'll roll this out on a per-wiki basis. Before we can do that though, we'll need WANCache to be able to toggle th" [mediawiki-config] - https://gerrit.wikimedia.org/r/575098 (owner: Aaron Schulz) [17:42:47] PROBLEM - Widespread puppet agent failures on icinga1001 is CRITICAL: 0.01017 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [17:47:29] (PS1) RLazarus: cergen: Add script for renewing mcrouter certs. [puppet] - https://gerrit.wikimedia.org/r/589076 [17:49:15] (PS3) Krinkle: Convert test2wiki from EventLogging repo to client-only [mediawiki-config] - https://gerrit.wikimedia.org/r/583149 (https://phabricator.wikimedia.org/T196309) [17:53:49] RECOVERY - Widespread puppet agent failures on icinga1001 is OK: (C)0.01 ge (W)0.006 ge 0.002543 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [18:00:04] RoanKattouw, Niharika, and Urbanecm: My dear minions, it's time we take the moon! Just kidding. Time for Morning SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200415T1800). [18:00:04] tgr: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process.
Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:02:18] I'll SWAT [18:03:57] (PS1) CDanis: prometheus/ops: add nic saturation exporter [puppet] - https://gerrit.wikimedia.org/r/589078 [18:04:15] (CR) Gergő Tisza: [C: +2] Fix GrowthExperiments helpdesk URL for frwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/589029 (https://phabricator.wikimedia.org/T235964) (owner: Gergő Tisza) [18:05:21] (Merged) jenkins-bot: Fix GrowthExperiments helpdesk URL for frwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/589029 (https://phabricator.wikimedia.org/T235964) (owner: Gergő Tisza) [18:06:03] (CR) Ppchelko: "I think that actually we do not need this right now, I have found the reason for our performance problems." [deployment-charts] - https://gerrit.wikimedia.org/r/589059 (https://phabricator.wikimedia.org/T248677) (owner: Hnowlan) [18:06:35] (CR) Nuria: [C: +1] Run the refine failure flag checker against the last 48h instead of 24 [puppet] - https://gerrit.wikimedia.org/r/589000 (https://phabricator.wikimedia.org/T240230) (owner: Elukey) [18:10:55] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:589029|Fix GrowthExperiments helpdesk URL for frwiktionary (T235964)]] (duration: 01m 06s) [18:11:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:11:01] (PS1) Ppchelko: Update change-prop container version to v0.9.5 [deployment-charts] - https://gerrit.wikimedia.org/r/589079 (https://phabricator.wikimedia.org/T248677) [18:11:03] T235964: Get the Growth experiment for the French Wiktionary - https://phabricator.wikimedia.org/T235964 [18:11:06] (CR) CDanis: [C: +2] "PCC looks good https://puppet-compiler.wmflabs.org/compiler1001/21951/" [puppet] - https://gerrit.wikimedia.org/r/589078 (owner: CDanis) [18:12:40] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: re-sync (duration: 01m
07s) [18:12:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:13:58] (CR) Ppchelko: "I did helm package; helm repo index . - I hope this time I did a correct sequence of actions, but please double-check. I keep missing this" [deployment-charts] - https://gerrit.wikimedia.org/r/589079 (https://phabricator.wikimedia.org/T248677) (owner: Ppchelko) [18:24:34] Operations, ops-eqiad: restbase1025 reported DIMM issues in getsel - https://phabricator.wikimedia.org/T250027 (Cmjohnson) The error came back but on A2 this time. Bad DIMM This is under warranty, I will order a new DIMM and update task with the Dell ticket number [18:24:54] Operations, ops-eqiad, serviceops: (Need by: TBD) rack/setup/install kubernetes10[07-14].eqiad.wmnet - https://phabricator.wikimedia.org/T241850 (Cmjohnson) [18:33:04] tgr: could you roll out https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/583149/ for me as well? [18:33:28] (PS1) CDanis: run nic_saturation_exporter on all hosts [puppet] - https://gerrit.wikimedia.org/r/589085 (https://phabricator.wikimedia.org/T224454) [18:33:56] oh that was half an hour ago [18:33:57] he [18:34:41] (CR) Krinkle: [C: +2] Convert test2wiki from EventLogging repo to client-only [mediawiki-config] - https://gerrit.wikimedia.org/r/583149 (https://phabricator.wikimedia.org/T196309) (owner: Krinkle) [18:34:47] * Krinkle staging on mwdebug1002 [18:35:30] Operations, DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for gr.wikimedia.org - https://phabricator.wikimedia.org/T245912 (JHedden) a:JHedden [18:35:59] (CR) jerkins-bot: [V: -1] run nic_saturation_exporter on all hosts [puppet] - https://gerrit.wikimedia.org/r/589085 (https://phabricator.wikimedia.org/T224454) (owner: CDanis) [18:36:03] (Merged) jenkins-bot: Convert test2wiki from EventLogging repo to client-only [mediawiki-config] -
https://gerrit.wikimedia.org/r/583149 (https://phabricator.wikimedia.org/T196309) (owner: Krinkle) [18:39:27] (PS2) CDanis: run nic_saturation_exporter on all physical hosts [puppet] - https://gerrit.wikimedia.org/r/589085 (https://phabricator.wikimedia.org/T224454) [18:44:05] !log krinkle@deploy1001 Synchronized wmf-config/CommonSettings.php: Idc81a885b2f3, T196309 (duration: 01m 07s) [18:44:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:44:12] T196309: EventLogging-based extensions cause errors on test2.wikipedia.org - https://phabricator.wikimedia.org/T196309 [18:44:54] Operations, MediaWiki-Parser, serviceops, Core Platform Team Workboards (Clinic Duty Team): purgeParserCache.php: Cannot purge this kind of parser cache - https://phabricator.wikimedia.org/T250231 (daniel) [18:46:20] (PS1) Bstorm: rails: ruby needs nodejs and bundler to be useful [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/589086 (https://phabricator.wikimedia.org/T141388) [18:46:57] Operations, DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for gr.wikimedia.org - https://phabricator.wikimedia.org/T245912 (JHedden) Open→Resolved `name="Updates on labsdb10{09,10,11,12}" $ sudo /usr/local/sbin/maintain-replica-indexes --database grwiki...
[18:47:34] (PS2) Bstorm: rails: ruby needs nodejs and bundler to be useful [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/589086 (https://phabricator.wikimedia.org/T141388) [18:50:25] (PS3) Bstorm: rails: ruby needs nodejs and bundler to be useful [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/589086 (https://phabricator.wikimedia.org/T141388) [18:51:07] (Restored) Thcipriani: Gerrit: apache proxy not pooled [puppet] - https://gerrit.wikimedia.org/r/579601 (https://phabricator.wikimedia.org/T246763) (owner: Thcipriani) [18:54:47] (PS3) CDanis: run nic_saturation_exporter on all physical hosts [puppet] - https://gerrit.wikimedia.org/r/589085 (https://phabricator.wikimedia.org/T224454) [18:57:00] Operations, Core Platform Team, MediaWiki-Cache, Traffic, and 2 others: Stop sending purges for `action=history` for linked pages. - https://phabricator.wikimedia.org/T250261 (Krinkle) The same also applies to the `action=raw&ctype=text/javascript` variants for CSS/JS pages. These make up a much... [18:57:53] (PS2) Ppchelko: Update change-prop container version to v0.9.5 [deployment-charts] - https://gerrit.wikimedia.org/r/589079 (https://phabricator.wikimedia.org/T248677) [18:58:59] (CR) CDanis: "PCC on a sampling of hosts looks okay: https://puppet-compiler.wmflabs.org/compiler1002/21954/" [puppet] - https://gerrit.wikimedia.org/r/589085 (https://phabricator.wikimedia.org/T224454) (owner: CDanis) [18:59:20] Okie-dokie. [18:59:47] (PS1) Jforrester: group1 wikis to 1.35.0-wmf.28 [mediawiki-config] - https://gerrit.wikimedia.org/r/589090 [18:59:49] (CR) Jforrester: [C: +2] group1 wikis to 1.35.0-wmf.28 [mediawiki-config] - https://gerrit.wikimedia.org/r/589090 (owner: Jforrester) [19:00:04] James_F and liw: It is that lovely time of the day again! You are hereby commanded to deploy Mediawiki train - American+European Version.
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200415T1900). [19:00:04] Hold onto your hats, everyone. [19:01:08] (Merged) jenkins-bot: group1 wikis to 1.35.0-wmf.28 [mediawiki-config] - https://gerrit.wikimedia.org/r/589090 (owner: Jforrester) [19:03:07] !log jforrester@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.28 [19:03:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:04:13] !log jforrester@deploy1001 Synchronized php: group1 wikis to 1.35.0-wmf.28 (duration: 01m 05s) [19:04:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:07:14] (PS2) RLazarus: cergen: Add script for renewing mcrouter certs. [puppet] - https://gerrit.wikimedia.org/r/589076 [19:07:26] Quiet so far. [19:07:51] (CR) Alexandros Kosiaris: [C: +1] "> I did helm package; helm repo index . - I hope this time I did a correct sequence of actions, but please double-check. I keep missing th" [deployment-charts] - https://gerrit.wikimedia.org/r/589079 (https://phabricator.wikimedia.org/T248677) (owner: Ppchelko) [19:22:08] Operations, Core Platform Team, MediaWiki-Cache, Traffic, and 2 others: Stop sending purges for `action=history` for linked pages. - https://phabricator.wikimedia.org/T250261 (daniel) > I propose we add a separate Title method for the subset of URLs that need purging for link updates (in other wo... [19:28:33] (PS2) C. Scott Ananian: Link to the phab task for VE/Parsoid being disabled on wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/583395 [19:28:52] (PS2) C.
Scott Ananian: The official name of Parsoid is 'Parsoid' [mediawiki-config] - https://gerrit.wikimedia.org/r/583394 [19:30:02] (PS1) Andrew Bogott: cloud-vps hiera: introduce openstack_controllers and keystone_api_fqdn [puppet] - https://gerrit.wikimedia.org/r/589095 (https://phabricator.wikimedia.org/T249941) [19:30:04] (PS1) Andrew Bogott: glance image_sync: use primary_glance_image_store to choose the image store [puppet] - https://gerrit.wikimedia.org/r/589096 (https://phabricator.wikimedia.org/T249941) [19:30:45] (CR) jerkins-bot: [V: -1] cloud-vps hiera: introduce openstack_controllers and keystone_api_fqdn [puppet] - https://gerrit.wikimedia.org/r/589095 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott) [19:33:47] (PS2) Andrew Bogott: cloud-vps hiera: introduce openstack_controllers and keystone_api_fqdn [puppet] - https://gerrit.wikimedia.org/r/589095 (https://phabricator.wikimedia.org/T249941) [19:33:49] (PS2) Andrew Bogott: glance image_sync: use primary_glance_image_store to choose the image store [puppet] - https://gerrit.wikimedia.org/r/589096 (https://phabricator.wikimedia.org/T249941) [19:35:51] (CR) Bstorm: [C: -1] "Current rails basically requires yarn because of webpacker. Fixing that."
[docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/589086 (https://phabricator.wikimedia.org/T141388) (owner: Bstorm) [19:37:54] (CR) jerkins-bot: [V: -1] glance image_sync: use primary_glance_image_store to choose the image store [puppet] - https://gerrit.wikimedia.org/r/589096 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott) [19:43:43] (CR) Jhedden: [C: +2] cloudvps: Add metricsinfra prometheus server [puppet] - https://gerrit.wikimedia.org/r/588803 (https://phabricator.wikimedia.org/T250206) (owner: Jhedden) [19:44:11] !log depool wdqs1006 to catch up on lag [19:44:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:45:54] (PS3) Andrew Bogott: glance image_sync: use primary_glance_image_store to choose the image store [puppet] - https://gerrit.wikimedia.org/r/589096 (https://phabricator.wikimedia.org/T249941) [19:49:21] (CR) jerkins-bot: [V: -1] glance image_sync: use primary_glance_image_store to choose the image store [puppet] - https://gerrit.wikimedia.org/r/589096 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott) [19:53:32] !log pool wdqs1006 caught up [19:53:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:54:55] (CR) Bstorm: [C: -1] "making sure the build works with my changes before putting up the new patch" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/589086 (https://phabricator.wikimedia.org/T141388) (owner: Bstorm) [20:00:04] halfak and accraze: (Dis)respected human, time to deploy Services – Graphoid / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200415T2000). Please do the needful.
[20:00:43] (PS4) Bstorm: rails: ruby needs nodejs and bundler to be useful [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/589086 (https://phabricator.wikimedia.org/T141388) [20:04:12] (PS4) Andrew Bogott: glance image_sync: use primary_glance_image_store to choose the image store [puppet] - https://gerrit.wikimedia.org/r/589096 (https://phabricator.wikimedia.org/T249941) [20:04:14] (PS1) Andrew Bogott: glance profiles: remove use of $nova_controller param [puppet] - https://gerrit.wikimedia.org/r/589112 (https://phabricator.wikimedia.org/T249941) [20:05:39] (PS3) RLazarus: maintenance: Migrate refreshlinks to periodic_job [puppet] - https://gerrit.wikimedia.org/r/587331 (https://phabricator.wikimedia.org/T211250) [20:07:19] Operations, Core Platform Team, MediaWiki-Cache, Traffic, and 2 others: Stop sending purges for `action=history` for linked pages. - https://phabricator.wikimedia.org/T250261 (Joe) I would frankly prefer to pass a flag to getCdnUrls, and return those dependent urls only if the flag has its defaul...
[20:07:36] (CR) jerkins-bot: [V: -1] glance image_sync: use primary_glance_image_store to choose the image store [puppet] - https://gerrit.wikimedia.org/r/589096 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott) [20:08:27] (CR) jerkins-bot: [V: -1] glance profiles: remove use of $nova_controller param [puppet] - https://gerrit.wikimedia.org/r/589112 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott) [20:11:12] !log bsitzmann@deploy1001 Started deploy [mobileapps/deploy@1907571]: Update mobileapps to ff34d0b5 [20:11:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:33] (CR) RLazarus: [C: +2] maintenance: Migrate refreshlinks to periodic_job [puppet] - https://gerrit.wikimedia.org/r/587331 (https://phabricator.wikimedia.org/T211250) (owner: RLazarus) [20:14:49] (PS5) Andrew Bogott: glance image_sync: use primary_glance_image_store to choose the image store [puppet] - https://gerrit.wikimedia.org/r/589096 (https://phabricator.wikimedia.org/T249941) [20:14:51] (PS2) Andrew Bogott: glance profiles: remove use of $nova_controller param [puppet] - https://gerrit.wikimedia.org/r/589112 (https://phabricator.wikimedia.org/T249941) [20:16:09] !log bsitzmann@deploy1001 Finished deploy [mobileapps/deploy@1907571]: Update mobileapps to ff34d0b5 (duration: 04m 57s) [20:16:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:09] (CR) Subramanya Sastry: Check out parsoid deploy modules using git::clone, not scap (2 comments) [puppet] - https://gerrit.wikimedia.org/r/577656 (owner: C.
Scott Ananian) [20:19:38] (CR) jerkins-bot: [V: -1] glance image_sync: use primary_glance_image_store to choose the image store [puppet] - https://gerrit.wikimedia.org/r/589096 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott) [20:19:45] (CR) jerkins-bot: [V: -1] glance profiles: remove use of $nova_controller param [puppet] - https://gerrit.wikimedia.org/r/589112 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott) [20:21:08] (PS6) Andrew Bogott: glance image_sync: use primary_glance_image_store to choose the image store [puppet] - https://gerrit.wikimedia.org/r/589096 (https://phabricator.wikimedia.org/T249941) [20:21:10] (PS3) Andrew Bogott: glance profiles: remove use of $nova_controller param [puppet] - https://gerrit.wikimedia.org/r/589112 (https://phabricator.wikimedia.org/T249941) [20:35:27] PROBLEM - PHP opcache health on mwdebug2001 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [20:40:58] (CR) Volans: [C: +1] "LGTM, if Valentin could do a second pass on the apache/acme config would be great."
[puppet] - https://gerrit.wikimedia.org/r/575603 (https://phabricator.wikimedia.org/T243927) (owner: CRusnov) [20:42:47] RECOVERY - PHP opcache health on mwdebug2001 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [20:43:54] (PS3) Andrew Bogott: cloud-vps hiera: introduce openstack_controllers and keystone_api_fqdn [puppet] - https://gerrit.wikimedia.org/r/589095 (https://phabricator.wikimedia.org/T249941) [20:43:56] (PS7) Andrew Bogott: glance image_sync: use primary_glance_image_store to choose the image store [puppet] - https://gerrit.wikimedia.org/r/589096 (https://phabricator.wikimedia.org/T249941) [20:43:58] (PS4) Andrew Bogott: glance profiles: remove use of $nova_controller param [puppet] - https://gerrit.wikimedia.org/r/589112 (https://phabricator.wikimedia.org/T249941) [20:48:35] (CR) Volans: [C: -1] "@jbond42, as requested, this is the cookbook I was referring to earlier in the meeting:" [puppet] - https://gerrit.wikimedia.org/r/576101 (https://phabricator.wikimedia.org/T206951) (owner: Jbond) [21:17:07] PROBLEM - PHP opcache health on mw2292 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [21:25:12] Operations, observability, Performance-Team (Radar): Decide on `service-runner` aggregated prometheus metrics and use of `service` label - https://phabricator.wikimedia.org/T247820 (colewhite) Patch needs a review/merge, etc. (cc.
@Ottomata @Pchelolo ) [21:25:19] Operations, observability, Performance-Team (Radar): Decide on `service-runner` aggregated prometheus metrics and use of `service` label - https://phabricator.wikimedia.org/T247820 (colewhite) a:colewhite→None [21:27:49] Operations, observability, Core Platform Team Workboards (External Code Reviews), Performance-Team (Radar): Decide on `service-runner` aggregated prometheus metrics and use of `service` label - https://phabricator.wikimedia.org/T247820 (Pchelolo) [21:31:01] Operations, Core Platform Team, MediaWiki-Cache, Traffic, and 2 others: Stop sending purges for `action=history` for linked pages. - https://phabricator.wikimedia.org/T250261 (Ladsgroup) An idea: `Title::getCdnUrls()` can be moved to `HtmlCacheUpdater` (recently introduced class). [21:34:13] Operations, Core Platform Team, MediaWiki-Cache, Traffic, and 2 others: Stop sending purges for `action=history` for linked pages. - https://phabricator.wikimedia.org/T250261 (daniel) Joe wrote: > I would frankly prefer to pass a flag to getCdnUrls, and return those dependent urls only if the fla...
[21:35:33] RECOVERY - PHP opcache health on mw2292 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[21:38:47] (PS8) Andrew Bogott: glance image_sync: use primary_glance_image_store to choose the image store [puppet] - https://gerrit.wikimedia.org/r/589096 (https://phabricator.wikimedia.org/T249941)
[21:38:49] (PS5) Andrew Bogott: glance profiles: remove use of $nova_controller param [puppet] - https://gerrit.wikimedia.org/r/589112 (https://phabricator.wikimedia.org/T249941)
[21:38:51] (PS1) Andrew Bogott: Glance profiles: remove firewall rule for labs_hosts_range [puppet] - https://gerrit.wikimedia.org/r/589139
[21:45:10] (PS5) Bstorm: rails: ruby needs nodejs and bundler to be useful [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/589086 (https://phabricator.wikimedia.org/T141388)
[21:48:46] !log removing duplicate indices from production ES clusters that were created when reindexing failed
[21:48:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:52:27] (CR) Bstorm: "This now works to get a rails server running. Because the rails package isn't installed, it generally assumes the user has built the app e" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/589086 (https://phabricator.wikimedia.org/T141388) (owner: Bstorm)
[21:53:12] (CR) Bstorm: rails: ruby needs nodejs and bundler to be useful (1 comment) [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/589086 (https://phabricator.wikimedia.org/T141388) (owner: Bstorm)
[22:08:24] mdholloway: Deploy away.
[22:08:32] James_F: thanks!
[22:10:35] (PS1) Andrew Bogott: Glance: use keystone_api_fqdn for endpoints [puppet] - https://gerrit.wikimedia.org/r/589143 (https://phabricator.wikimedia.org/T249941)
[22:10:37] (PS1) Andrew Bogott: Glance profiles: add param types and lookup() calls [puppet] - https://gerrit.wikimedia.org/r/589144
[22:11:43] !log mholloway-shell@deploy1001 Synchronized php-1.35.0-wmf.28/extensions/MachineVision: Fix: Initialize categories array for initial images (T250321) (duration: 01m 07s)
[22:11:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:11:50] T250321: Get categories for initial image data - https://phabricator.wikimedia.org/T250321
[22:11:55] PROBLEM - PHP opcache health on mw2293 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[22:13:57] (CR) jerkins-bot: [V: -1] Glance: use keystone_api_fqdn for endpoints [puppet] - https://gerrit.wikimedia.org/r/589143 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[22:15:01] (CR) jerkins-bot: [V: -1] Glance profiles: add param types and lookup() calls [puppet] - https://gerrit.wikimedia.org/r/589144 (owner: Andrew Bogott)
[22:16:20] (PS2) Andrew Bogott: Glance: use keystone_api_fqdn for endpoints [puppet] - https://gerrit.wikimedia.org/r/589143 (https://phabricator.wikimedia.org/T249941)
[22:16:22] (PS2) Andrew Bogott: Glance profiles: add param types and lookup() calls [puppet] - https://gerrit.wikimedia.org/r/589144
[22:22:59] RECOVERY - PHP opcache health on mw2293 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[22:24:05] (PS3) Andrew Bogott: Glance profiles: add param types and lookup() calls [puppet] - https://gerrit.wikimedia.org/r/589144
[22:28:45] (PS9) Andrew Bogott: glance image_sync: use primary_glance_image_store to choose the image store [puppet] - https://gerrit.wikimedia.org/r/589096 (https://phabricator.wikimedia.org/T249941)
[22:28:47] (PS6) Andrew Bogott: glance profiles: remove use of $nova_controller param [puppet] - https://gerrit.wikimedia.org/r/589112 (https://phabricator.wikimedia.org/T249941)
[22:28:49] (PS2) Andrew Bogott: Glance profiles: remove firewall rule for labs_hosts_range [puppet] - https://gerrit.wikimedia.org/r/589139
[22:28:51] (PS3) Andrew Bogott: Glance: use keystone_api_fqdn for endpoints [puppet] - https://gerrit.wikimedia.org/r/589143 (https://phabricator.wikimedia.org/T249941)
[22:28:53] (PS4) Andrew Bogott: Glance profiles: add param types and lookup() calls [puppet] - https://gerrit.wikimedia.org/r/589144
[22:31:30] (PS10) Andrew Bogott: glance image_sync: use primary_glance_image_store to choose the image store [puppet] - https://gerrit.wikimedia.org/r/589096 (https://phabricator.wikimedia.org/T249941)
[22:31:32] (PS7) Andrew Bogott: glance profiles: remove use of $nova_controller param [puppet] - https://gerrit.wikimedia.org/r/589112 (https://phabricator.wikimedia.org/T249941)
[22:31:34] (PS3) Andrew Bogott: Glance profiles: remove firewall rule for labs_hosts_range [puppet] - https://gerrit.wikimedia.org/r/589139
[22:31:36] (PS4) Andrew Bogott: Glance: use keystone_api_fqdn for endpoints [puppet] - https://gerrit.wikimedia.org/r/589143 (https://phabricator.wikimedia.org/T249941)
[22:31:38] (PS5) Andrew Bogott: Glance profiles: add param types and lookup() calls [puppet] - https://gerrit.wikimedia.org/r/589144
[22:42:08] Operations, ops-eqiad, DC-Ops: Netbox report accounting icinga alert - https://phabricator.wikimedia.org/T250053 (Jclark-ctr) @wiki_willy fixed accounting report coherence report only remaining for eqiad is flerovium-array2 will have to check U on site tomorrow
[22:55:20] Operations, Maps, Toolforge, Patch-For-Review, and 2 others: maps: whitelist/reduce ratelimit from requests with toolforge.org referrer - https://phabricator.wikimedia.org/T249815 (bd808) a:Urbanecm @Urbanecm Is this all fixed up now or is there more followup needed?
[22:56:24] Operations, Maps, Toolforge, Patch-For-Review, and 2 others: maps: whitelist/reduce ratelimit from requests with toolforge.org referrer - https://phabricator.wikimedia.org/T249815 (Urbanecm) Open→Resolved >>! In T249815#6060622, @bd808 wrote: > @Urbanecm Is this all fixed up now or is the...
[23:00:04] RoanKattouw, Niharika, and Urbanecm: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Evening SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200415T2300).
[23:00:04] No GERRIT patches in the queue for this window AFAICS.
[23:18:11] (CR) BryanDavis: [C: +2] "Let's give this a shot and iterate as needed" (1 comment) [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/589086 (https://phabricator.wikimedia.org/T141388) (owner: Bstorm)
[23:18:36] (Merged) jenkins-bot: rails: ruby needs nodejs and bundler to be useful [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/589086 (https://phabricator.wikimedia.org/T141388) (owner: Bstorm)
[23:44:24] Operations, Research: recommendation api's test on scb nodes are flapping - https://phabricator.wikimedia.org/T247732 (bmansurov) Thanks both. @leila I'm on it. I need to access logstash.wikimedia.org to see the logs. According to [[ https://wikitech.wikimedia.org/wiki/Logstash | the documentation ]]: >...
[23:56:43] PROBLEM - PHP opcache health on mw2296 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health