[00:00:04] bd808: Dear deployers, time to do the Striker deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190322T0000).
[00:00:09] RECOVERY - HHVM rendering on mwdebug1002 is OK: HTTP OK: HTTP/1.1 200 OK - 80651 bytes in 0.171 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[00:01:00] * bd808 missed an opportunity to get jouncebot to say "time to do the Time Warp again!"
[00:01:08] * James_F grins.
[00:01:24] * bd808 flings toast at James_F
[00:02:38] (PS1) Jforrester: [BETA] SDC: Add digitalRepresentationOf property for Beta Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/498270
[00:03:12] (CR) Jforrester: [C: +2] [BETA] SDC: Add digitalRepresentationOf property for Beta Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/498270 (owner: Jforrester)
[00:04:17] (PS2) CRusnov: Add synchronizing nodes to ganeti-netbox sync. [software/netbox-deploy] - https://gerrit.wikimedia.org/r/498268
[00:04:22] (Merged) jenkins-bot: [BETA] SDC: Add digitalRepresentationOf property for Beta Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/498270 (owner: Jforrester)
[00:04:49] (CR) CRusnov: "Tested working in test instance." [software/netbox-deploy] - https://gerrit.wikimedia.org/r/498268 (owner: CRusnov)
[00:13:42] bd808: For whatever reason l10n-update took less than six minutes but rsync common took five, which is rather high.
[00:20:57] !log jforrester@deploy1001 Finished scap: SWAT: Full scap for i18n rebuild for 498259 and 498113 (duration: 24m 49s)
[00:20:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:23:05] Amir1: All looks good?
[00:27:29] PROBLEM - Logstash rate of ingestion percent change compared to yesterday on icinga1001 is CRITICAL: 139.8 ge 130 https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1&panelId=2&fullscreen
[00:30:54] James_F: I'm going to do my scap3 deploy of Striker unless you have a strong objection
[00:32:05] bd808: Sorry, go for it.
[00:32:18] !log SWAT done, 12 minutes ago.
[00:32:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:32:26] :-)
[00:34:41] !log bd808@deploy1001 Started deploy [striker/deploy@c4726e3]: Django upgrade and various bug fixes (T192487, T182142, T176325, T217932)
[00:34:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:34:49] T176325: "No Phabricator accounts found for tool maintainers." while creating the a new diffusion repository - https://phabricator.wikimedia.org/T176325
[00:34:50] T217932: Change log routing to ELK cluster to use rsyslog->kafka rather than talking directly to the ELK cluster - https://phabricator.wikimedia.org/T217932
[00:34:50] T192487: Update Django to latest stable or LTS - https://phabricator.wikimedia.org/T192487
[00:34:50] T182142: Diffusion repository creation fails via toolsadmin - https://phabricator.wikimedia.org/T182142
[00:35:56] !log bd808@deploy1001 Finished deploy [striker/deploy@c4726e3]: Django upgrade and various bug fixes (T192487, T182142, T176325, T217932) (duration: 01m 15s)
[00:36:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:37:43] grrr
[00:47:43] !log krinkle@mwmaint1002 Fixing corrupt 'log_params' field of kawiki.logging row where log_id=1021367; T93110
[00:47:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:47:47] T93110: Invalid parameter for message "logentry-massmessage-failure" - https://phabricator.wikimedia.org/T93110
[00:47:56] legoktm: done
[00:48:09] \o/
[00:48:12] thanks :))
[00:49:46] legoktm: https://gist.github.com/Krinkle/0288308648331293c6f40504a0321588
[00:50:12] nice!
[00:50:16] looks mostly what I did as well
[00:50:58] (PS1) Herron: add dummy modsec file to pacify PCC [labs/private] - https://gerrit.wikimedia.org/r/498280
[00:51:24] legoktm: cool, I suppose there could be others.
[00:51:30] but we'll find them when we find them I suppose.
[00:51:35] (CR) Herron: [V: +2 C: +2] add dummy modsec file to pacify PCC [labs/private] - https://gerrit.wikimedia.org/r/498280 (owner: Herron)
[00:53:18] blerg... this is not my best day
[00:54:28] !log Striker down following upgrade. scap3 did not rebuild venv as expected. Manually resolved, but now having mysql library issues.
[00:54:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:55:14] hey, SAL logs to wiki again :D
[00:55:49] yay
[00:55:56] * Krenair finds task to token
[00:57:02] https://phabricator.wikimedia.org/T218708
[00:57:09] closed as duplicate of restricted but anyway
[00:57:27] Striker problem is that I build the wheels on a host with different mysql libs. Should be "easy" to fix if I can find the right place to try again.
[00:58:29] * Krinkle stalks Krenair profile to find the task once he's tokenized it
[00:59:31] I linked it Krinkle
[00:59:40] yeah, I see that now :D Thanks
[00:59:43] :D
[01:00:49] I'm looking through the non-dupe task to see if there's a reason for it to be secret other than all of us being paranoid
[01:02:03] There's one small part that people might want private, but it's kind of hard to justify keeping it private just for that
[01:03:51] is the non-dupe task fairly big to the point where it's hard to justify reverse-duping it?
[01:04:50] yes
[01:05:54] well that's kind of a pain but not a huge deal
[01:09:10] Krenair: it's public now https://phabricator.wikimedia.org/T218608
[01:09:50] cool
[01:18:02] !log bd808@deploy1001 Started deploy [striker/deploy@b4bcd08]: Update python wheels
[01:18:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:19:02] !log bd808@deploy1001 Finished deploy [striker/deploy@b4bcd08]: Update python wheels (duration: 01m 00s)
[01:19:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:24:43] (PS2) Papaul: DNS: Fix production DNS for dbprov2002 [dns] - https://gerrit.wikimedia.org/r/498119
[01:25:11] (CR) jerkins-bot: [V: -1] DNS: Fix production DNS for dbprov2002 [dns] - https://gerrit.wikimedia.org/r/498119 (owner: Papaul)
[01:31:08] !log labweb: upgraded mariadb packages installed on labweb100[12]
[01:31:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:31:46] IT'S ALIVE!!!
[02:03:50] Operations, Elasticsearch, Discovery-Search (Current work): Convert check_elasticsearch.py icinga plugin to py3 - https://phabricator.wikimedia.org/T215439 (Mathew.onipe)
[02:06:09] (PS1) Mathew.onipe: elasticsearch: convert check to py3 [puppet] - https://gerrit.wikimedia.org/r/498292 (https://phabricator.wikimedia.org/T215439)
[02:06:53] (CR) jerkins-bot: [V: -1] elasticsearch: convert check to py3 [puppet] - https://gerrit.wikimedia.org/r/498292 (https://phabricator.wikimedia.org/T215439) (owner: Mathew.onipe)
[02:12:32] (PS2) Mathew.onipe: elasticsearch: convert check to py3 [puppet] - https://gerrit.wikimedia.org/r/498292 (https://phabricator.wikimedia.org/T215439)
[02:57:33] Operations, Cloud-VPS, Toolforge, LDAP, and 2 others: LDAP server running out of memory frequently and disrupting Cloud VPS clients - https://phabricator.wikimedia.org/T217280 (Bstorm) Since I cannot tell if these tools nodes are NFS, LDAP or both, I'm still recording them here. Honestly, the ef...
[03:41:29] PROBLEM - puppet last run on cp3042 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:47:23] PROBLEM - puppet last run on mc1025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:55:53] PROBLEM - Check size of conntrack table on an-worker1079 is CRITICAL: CRITICAL: nf_conntrack is 93 % full
[03:55:53] PROBLEM - Check size of conntrack table on an-worker1093 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[03:56:11] PROBLEM - Check size of conntrack table on analytics1070 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
[03:57:09] RECOVERY - Check size of conntrack table on an-worker1079 is OK: OK: nf_conntrack is 63 % full
[03:57:11] RECOVERY - Check size of conntrack table on an-worker1093 is OK: OK: nf_conntrack is 57 % full
[03:57:27] RECOVERY - Check size of conntrack table on analytics1070 is OK: OK: nf_conntrack is 46 % full
[04:07:51] RECOVERY - puppet last run on cp3042 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[04:19:01] RECOVERY - puppet last run on mc1025 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[04:28:09] PROBLEM - Logstash rate of ingestion percent change compared to yesterday on icinga1001 is CRITICAL: 149 ge 130 https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1&panelId=2&fullscreen
[04:28:41] PROBLEM - puppet last run on ms-be1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:55:01] RECOVERY - puppet last run on ms-be1033 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[05:10:45] PROBLEM - puppet last run on phab1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:10:45] PROBLEM - puppet last run on dbproxy1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:37:11] RECOVERY - puppet last run on phab1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[05:37:11] RECOVERY - puppet last run on dbproxy1006 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[05:39:13] PROBLEM - puppet last run on an-worker1095 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:46:43] PROBLEM - puppet last run on mc1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:00:03] (PS1) Marostegui: Revert "db-codfw.php: Depool db2096" [mediawiki-config] - https://gerrit.wikimedia.org/r/498308
[06:00:55] (PS1) Marostegui: db-eqiad.php: Depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/498309
[06:02:12] (CR) Marostegui: [C: +2] Revert "db-codfw.php: Depool db2096" [mediawiki-config] - https://gerrit.wikimedia.org/r/498308 (owner: Marostegui)
[06:03:58] (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2096" [mediawiki-config] - https://gerrit.wikimedia.org/r/498308 (owner: Marostegui)
[06:05:23] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool db2096 after onsite maintenance (duration: 00m 51s)
[06:05:28] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/498309 (owner: Marostegui)
[06:05:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:06:31] (Merged) jenkins-bot: db-eqiad.php: Depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/498309 (owner: Marostegui)
[06:07:36] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1094 (duration: 00m 49s)
[06:07:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:10:49] RECOVERY - puppet last run on an-worker1095 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[06:13:03] RECOVERY - puppet last run on mc1033 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:19:22] (CR) Marostegui: DNS: Fix production DNS for dbprov2002 (1 comment) [dns] - https://gerrit.wikimedia.org/r/498119 (owner: Papaul)
[06:23:42] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1094" [mediawiki-config] - https://gerrit.wikimedia.org/r/498311
[06:24:46] (PS3) Papaul: DNS: Fix mgmt DNS for dbprov2002 [dns] - https://gerrit.wikimedia.org/r/498119
[06:24:53] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1094" [mediawiki-config] - https://gerrit.wikimedia.org/r/498311 (owner: Marostegui)
[06:25:59] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1094" [mediawiki-config] - https://gerrit.wikimedia.org/r/498311 (owner: Marostegui)
[06:26:32] (CR) Marostegui: [C: +2] DNS: Fix mgmt DNS for dbprov2002 [dns] - https://gerrit.wikimedia.org/r/498119 (owner: Papaul)
[06:26:59] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1094 (duration: 00m 50s)
[06:27:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:28:13] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:28:21] PROBLEM - puppet last run on cp1080 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/bash_autologout.sh]
[06:29:13] PROBLEM - netbox HTTPS on netmon1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 547 bytes in 0.008 second response time https://wikitech.wikimedia.org/wiki/Netbox
[06:31:28] Operations, Parsoid-PHP, Patch-For-Review: Install PHP7 on scandium - https://phabricator.wikimedia.org/T213493 (Joe) So first of all, why do wtp servers have php installed even? They should not, and they don't. And since this is for testing php-parsoid it needs to match what's on the application se...
[06:33:07] PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/gen_fingerprints]
[06:38:05] RECOVERY - netbox HTTPS on netmon1002 is OK: HTTP OK: HTTP/1.1 302 Found - 348 bytes in 0.515 second response time https://wikitech.wikimedia.org/wiki/Netbox
[06:38:23] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational
[06:38:53] RECOVERY - puppet last run on cp1080 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[06:45:49] PROBLEM - puppet last run on elastic1044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:48:55] PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 19 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/gen_fingerprints]
[06:56:29] (CR) Mathew.onipe: [WIP] Switch mjolnir to rsyslog based structured logging (1 comment) [puppet] - https://gerrit.wikimedia.org/r/498232 (https://phabricator.wikimedia.org/T218833) (owner: EBernhardson)
[06:58:19] (CR) Mathew.onipe: [C: +1] elasticsearch: deploy elasticsearch config for ES6 cirrus / codfw [puppet] - https://gerrit.wikimedia.org/r/498080 (https://phabricator.wikimedia.org/T218878) (owner: Gehel)
[06:59:27] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[07:00:31] (CR) Mathew.onipe: "We can use the hieradata/role/common/elasticsearch/cirrus.yaml since eqiad is updated already" [puppet] - https://gerrit.wikimedia.org/r/498079 (https://phabricator.wikimedia.org/T218878) (owner: Gehel)
[07:01:58] (CR) Mathew.onipe: "We can use the hieradata/role/common/elasticsearch/cirrus.yaml since eqiad is updated already" [puppet] - https://gerrit.wikimedia.org/r/498080 (https://phabricator.wikimedia.org/T218878) (owner: Gehel)
[07:07:21] (PS2) ArielGlenn: switch over wikidata entity dumps to use the misc dump config file [puppet] - https://gerrit.wikimedia.org/r/498159 (https://phabricator.wikimedia.org/T205825)
[07:09:02] (CR) ArielGlenn: [C: +2] switch over wikidata entity dumps to use the misc dump config file [puppet] - https://gerrit.wikimedia.org/r/498159 (https://phabricator.wikimedia.org/T205825) (owner: ArielGlenn)
[07:17:27] RECOVERY - puppet last run on elastic1044 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[07:29:28] (CR) DCausse: [C: -1] "need to switch search traffic back to eqiad first" [puppet] - https://gerrit.wikimedia.org/r/498079 (https://phabricator.wikimedia.org/T218878) (owner: Gehel)
[07:43:48] Operations, Parsoid-PHP, Patch-For-Review: Install PHP7 on scandium - https://phabricator.wikimedia.org/T213493 (MoritzMuehlenhoff) >>! In T213493#5047267, @Joe wrote: > So first of all, why do wtp servers have php installed even? They should not, and they don't. They do actually, scap pulls in php-...
[07:49:05] (CR) Urbanecm: [C: +1] "LGTM!" [mediawiki-config] - https://gerrit.wikimedia.org/r/493385 (https://phabricator.wikimedia.org/T173070) (owner: Ammarpad)
[07:50:14] (CR) jerkins-bot: [V: -1] Set default aliases for Project_talk namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/493385 (https://phabricator.wikimedia.org/T173070) (owner: Ammarpad)
[07:50:51] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/495918 (https://phabricator.wikimedia.org/T216885) (owner: Ammarpad)
[07:56:26] (PS3) Muehlenhoff: Add curl to standard packages [puppet] - https://gerrit.wikimedia.org/r/498045 (https://phabricator.wikimedia.org/T213527)
[07:58:26] (CR) Muehlenhoff: [C: +2] Add curl to standard packages [puppet] - https://gerrit.wikimedia.org/r/498045 (https://phabricator.wikimedia.org/T213527) (owner: Muehlenhoff)
[08:04:31] (PS4) Jcrespo: mariadb-backups: Make sure retention is handled correctly [puppet] - https://gerrit.wikimedia.org/r/498024 (https://phabricator.wikimedia.org/T210292)
[08:04:33] (PS4) Jcrespo: mariadb-snapshots: Allow the option to only postprocess snapshots [puppet] - https://gerrit.wikimedia.org/r/498029 (https://phabricator.wikimedia.org/T210292)
[08:04:35] (PS1) Jcrespo: mariadb-snapshots: Chose right backup source for s1 [puppet] - https://gerrit.wikimedia.org/r/498314 (https://phabricator.wikimedia.org/T210292)
[08:05:38] (CR) jerkins-bot: [V: -1] mariadb-backups: Make sure retention is handled correctly [puppet] - https://gerrit.wikimedia.org/r/498024 (https://phabricator.wikimedia.org/T210292) (owner: Jcrespo)
[08:06:11] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:06:34] (PS5) Jcrespo: mariadb-backups: Make sure retention is handled correctly [puppet] - https://gerrit.wikimedia.org/r/498024 (https://phabricator.wikimedia.org/T210292)
[08:09:25] (PS1) Jcrespo: WMFBackup: Make sure retention is handled correctly [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/498315 (https://phabricator.wikimedia.org/T210292)
[08:09:43] (CR) jerkins-bot: [V: -1] WMFBackup: Make sure retention is handled correctly [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/498315 (https://phabricator.wikimedia.org/T210292) (owner: Jcrespo)
[08:10:10] (CR) Jcrespo: [C: +2] mariadb-snapshots: Chose right backup source for s1 [puppet] - https://gerrit.wikimedia.org/r/498314 (https://phabricator.wikimedia.org/T210292) (owner: Jcrespo)
[08:10:23] (PS2) Jcrespo: mariadb-snapshots: Chose right backup source for s1 [puppet] - https://gerrit.wikimedia.org/r/498314 (https://phabricator.wikimedia.org/T210292)
[08:10:25] (PS1) Muehlenhoff: Add systemd to filter_services list of debdeploy [puppet] - https://gerrit.wikimedia.org/r/498316
[08:12:36] (PS1) Urbanecm: Initial configuration for hiwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/498317 (https://phabricator.wikimedia.org/T218155)
[08:13:21] (CR) jerkins-bot: [V: -1] Initial configuration for hiwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/498317 (https://phabricator.wikimedia.org/T218155) (owner: Urbanecm)
[08:16:04] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (Marostegui) I think we need to decide how to install these from a partitioning point of view. We can install them manually and not use t...
[08:22:40] (PS1) DCausse: [cirrus] switch low volume wikis to eqiad (elastic 6.5.4) [mediawiki-config] - https://gerrit.wikimedia.org/r/498318 (https://phabricator.wikimedia.org/T218878)
[08:22:45] (PS1) DCausse: [cirrus] switch all wikis to eqiad (elastic 6.5.4) [mediawiki-config] - https://gerrit.wikimedia.org/r/498319 (https://phabricator.wikimedia.org/T218878)
[08:24:36] (PS1) Muehlenhoff: Add dbus-daemon to filter_services list of debdeploy [puppet] - https://gerrit.wikimedia.org/r/498321 (https://phabricator.wikimedia.org/T135991)
[08:30:33] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (jcrespo) Cannot we try db.cfg, then mount and partition the ssds later? Also, we should consider using buster. > SATA disks All are SA...
[08:32:51] (PS1) Marostegui: db-codfw.php: Change parsercache key [mediawiki-config] - https://gerrit.wikimedia.org/r/498322 (https://phabricator.wikimedia.org/T210725)
[08:33:08] (PS6) Jcrespo: mariadb-backups: Make sure retention is handled correctly [puppet] - https://gerrit.wikimedia.org/r/498024 (https://phabricator.wikimedia.org/T210292)
[08:34:01] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:35:09] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (Marostegui) >>! In T218336#5047361, @jcrespo wrote: > Cannot we try db.cfg, then mount and partition the ssds later? Depending on which...
[08:37:56] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (jcrespo) > the OS might get installed on the SSDs. And it will be faster to just try once than to do 4 manual installs :-) > You know e...
[08:38:04] (PS2) Gehel: elasticsearch: upgrade to elastic 6.5.4 for cirrus / codfw [puppet] - https://gerrit.wikimedia.org/r/498079 (https://phabricator.wikimedia.org/T218878)
[08:38:06] (PS2) Gehel: elasticsearch: deploy elasticsearch config for ES6 cirrus / codfw [puppet] - https://gerrit.wikimedia.org/r/498080 (https://phabricator.wikimedia.org/T218878)
[08:40:45] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (Marostegui) >>! In T218336#5047371, @jcrespo wrote: >> the OS might get installed on the SSDs. > > And it will be faster to just try onc...
[08:42:35] (PS3) Gehel: elasticsearch: upgrade to elastic 6.5.4 for cirrus / codfw [puppet] - https://gerrit.wikimedia.org/r/498079 (https://phabricator.wikimedia.org/T218878)
[08:42:50] (CR) Jcrespo: [V: +2 C: +2] WMFBackup: Make sure retention is handled correctly [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/498315 (https://phabricator.wikimedia.org/T210292) (owner: Jcrespo)
[08:43:12] (CR) Jcrespo: [C: +2] mariadb-backups: Make sure retention is handled correctly [puppet] - https://gerrit.wikimedia.org/r/498024 (https://phabricator.wikimedia.org/T210292) (owner: Jcrespo)
[08:51:59] (CR) Dzahn: [C: +1] "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/498057 (https://phabricator.wikimedia.org/T218844) (owner: MarcoAurelio)
[08:53:33] (CR) Dzahn: [C: +1] phd: restart on failures [puppet] - https://gerrit.wikimedia.org/r/498170 (owner: CDanis)
[08:55:08] (CR) Dzahn: [C: +1] prometheus: collect session storage Cassandra metrics [puppet] - https://gerrit.wikimedia.org/r/497848 (https://phabricator.wikimedia.org/T209108) (owner: Eevans)
[08:56:07] (PS5) Jcrespo: mariadb-snapshots: Allow the option to only postprocess snapshots [puppet] - https://gerrit.wikimedia.org/r/498029 (https://phabricator.wikimedia.org/T210292)
[08:56:09] (PS1) Jcrespo: backup_mariadb: Output Log to /var/log [puppet] - https://gerrit.wikimedia.org/r/498324 (https://phabricator.wikimedia.org/T210292)
[08:57:23] (CR) jerkins-bot: [V: -1] backup_mariadb: Output Log to /var/log [puppet] - https://gerrit.wikimedia.org/r/498324 (https://phabricator.wikimedia.org/T210292) (owner: Jcrespo)
[08:58:03] (CR) Dzahn: "i agree having logs would be good, at least at the beginning. you can put them into /var/log/mediawiki/. We have once created that locatio" [puppet] - https://gerrit.wikimedia.org/r/486454 (https://phabricator.wikimedia.org/T189091) (owner: KartikMistry)
[09:00:36] (PS2) Jcrespo: backup_mariadb: Output Log to /var/log [puppet] - https://gerrit.wikimedia.org/r/498324 (https://phabricator.wikimedia.org/T210292)
[09:01:54] (CR) jerkins-bot: [V: -1] backup_mariadb: Output Log to /var/log [puppet] - https://gerrit.wikimedia.org/r/498324 (https://phabricator.wikimedia.org/T210292) (owner: Jcrespo)
[09:04:02] (PS6) Jcrespo: mariadb-snapshots: Allow the option to only postprocess snapshots [puppet] - https://gerrit.wikimedia.org/r/498029 (https://phabricator.wikimedia.org/T210292)
[09:04:04] (PS3) Jcrespo: backup_mariadb: Output Log to /var/log [puppet] - https://gerrit.wikimedia.org/r/498324 (https://phabricator.wikimedia.org/T210292)
[09:04:46] !log start tcpdump on mc1022 to gather traffic for analysis
[09:04:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:34] this will run until some TKOs in mcrouter will appear, I have set a maximum of 30G pcap files (500M each)
[09:05:36] (PS1) Jcrespo: backup_mariadb: Output Log to /var/log [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/498326 (https://phabricator.wikimedia.org/T210292)
[09:05:38] (PS1) Jcrespo: backup_mariadb: Fix syntax error [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/498327
[09:06:00] (CR) jerkins-bot: [V: -1] backup_mariadb: Fix syntax error [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/498327 (owner: Jcrespo)
[09:06:02] (CR) jerkins-bot: [V: -1] backup_mariadb: Output Log to /var/log [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/498326 (https://phabricator.wikimedia.org/T210292) (owner: Jcrespo)
[09:07:50] (PS2) Jcrespo: backup_mariadb: Output Log to /var/log [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/498326 (https://phabricator.wikimedia.org/T210292)
[09:07:56] (Abandoned) Jcrespo: backup_mariadb: Fix syntax error [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/498327 (owner: Jcrespo)
[09:08:11] (CR) jerkins-bot: [V: -1] backup_mariadb: Output Log to /var/log [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/498326 (https://phabricator.wikimedia.org/T210292) (owner: Jcrespo)
[09:21:25] (PS1) GTirloni: cloudvps: Exclude NFS shares from wmf-auto-restarts [puppet] - https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086)
[09:24:14] (CR) Muehlenhoff: [C: +1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086) (owner: GTirloni)
[09:30:27] (PS8) Giuseppe Lavagetto: profile::mediawiki::maintenance: systemd-timer based periodic jobs [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250)
[09:31:38] (PS1) Elukey: Add Hadoop TLS config to analytics1037 [puppet] - https://gerrit.wikimedia.org/r/498329 (https://phabricator.wikimedia.org/T217412)
[09:35:52] (PS2) Dzahn: phd: restart on failures [puppet] - https://gerrit.wikimedia.org/r/498170 (owner: CDanis)
[09:39:02] (CR) Dzahn: [C: +2] phd: restart on failures [puppet] - https://gerrit.wikimedia.org/r/498170 (owner: CDanis)
[09:39:32] (CR) Elukey: [C: +2] Add Hadoop TLS config to analytics1037 [puppet] - https://gerrit.wikimedia.org/r/498329 (https://phabricator.wikimedia.org/T217412) (owner: Elukey)
[09:39:42] (PS2) Elukey: Add Hadoop TLS config to analytics1037 [puppet] - https://gerrit.wikimedia.org/r/498329 (https://phabricator.wikimedia.org/T217412)
[09:39:52] (CR) Elukey: [V: +2 C: +2] Add Hadoop TLS config to analytics1037 [puppet] - https://gerrit.wikimedia.org/r/498329 (https://phabricator.wikimedia.org/T217412) (owner: Elukey)
[09:41:25] (CR) Dzahn: "deployed and service got refreshed on phab1001" [puppet] - https://gerrit.wikimedia.org/r/498170 (owner: CDanis)
[09:42:00] !log rebooting pool counters in codfw to pick up SSBD-enabled qemu
[09:42:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:48] (PS1) Ema: cp2005: use ATS backends instead of Varnish [puppet] - https://gerrit.wikimedia.org/r/498330 (https://phabricator.wikimedia.org/T213263)
[09:45:20] misses bot output from phab comments
[09:47:53] !log cp2005: depool varnish-fe in preparation of traffic switch to ATS T213263
[09:47:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:56] T213263: Partial cache_upload traffic switchover to ATS and switchback to Varnish - https://phabricator.wikimedia.org/T213263
[09:48:26] !log ema@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=nginx
[09:48:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:28] !log ema@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=varnish-fe
[09:48:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:47] (CR) Ema: [C: +2] cp2005: use ATS backends instead of Varnish [puppet] - https://gerrit.wikimedia.org/r/498330 (https://phabricator.wikimedia.org/T213263) (owner: Ema)
[09:50:06] (CR) Dzahn: [C: -1] "parameter 'sapis' expects an Array value, got String" [puppet] - https://gerrit.wikimedia.org/r/498104 (https://phabricator.wikimedia.org/T213493) (owner: Dzahn)
[09:53:23] PROBLEM - puppet last run on cp2005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 14 seconds ago with 1 failures. Failed resources (up to 3 shown)
[09:53:47] (PS1) Muehlenhoff: Add cn=gerritadmin to list of LDAP groups [puppet] - https://gerrit.wikimedia.org/r/498331
[09:54:03] cp2005 is me, please ignore
[09:54:03] (PS4) Dzahn: parsoid::testing: install PHP 7.2 [puppet] - https://gerrit.wikimedia.org/r/498104 (https://phabricator.wikimedia.org/T213493)
[09:54:24] thx ema
[09:55:11] (CR) jerkins-bot: [V: -1] parsoid::testing: install PHP 7.2 [puppet] - https://gerrit.wikimedia.org/r/498104 (https://phabricator.wikimedia.org/T213493) (owner: Dzahn)
[09:57:05] (PS1) Elukey: Rely on Hadoop defaults for the TLS config of the Analytics Test cluster [puppet] - https://gerrit.wikimedia.org/r/498333 (https://phabricator.wikimedia.org/T217412)
[09:57:27] (CR) Muehlenhoff: [C: +2] Add cn=gerritadmin to list of LDAP groups [puppet] - https://gerrit.wikimedia.org/r/498331 (owner: Muehlenhoff)
[09:57:47] (CR) Elukey: [C: +2] Rely on Hadoop defaults for the TLS config of the Analytics Test cluster [puppet] - https://gerrit.wikimedia.org/r/498333 (https://phabricator.wikimedia.org/T217412) (owner: Elukey)
[09:57:55] (PS2) Elukey: Rely on Hadoop defaults for the TLS config of the Analytics Test cluster [puppet] - https://gerrit.wikimedia.org/r/498333 (https://phabricator.wikimedia.org/T217412)
[09:58:37] RECOVERY - puppet last run on cp2005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:59:25] moritzm: good to merge?
[09:59:41] (CR) Dzahn: [V: +1] "https://puppet-compiler.wmflabs.org/compiler1002/15272/scandium.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/498104 (https://phabricator.wikimedia.org/T213493) (owner: Dzahn)
[10:00:20] elukey: please do!
[10:00:42] !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=nginx
[10:00:43] !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=varnish-fe
[10:00:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:00:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:34] !log cp2005: repooled, serving traffic via ATS T213263
[10:05:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:37] T213263: Partial cache_upload traffic switchover to ATS and switchback to Varnish - https://phabricator.wikimedia.org/T213263
[10:12:27] (PS2) GTirloni: wmf-auto-restart: Exclude NFS mountpoints [puppet] - https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086)
[10:12:29] Operations, Traffic, Patch-For-Review: Partial cache_upload traffic switchover to ATS and switchback to Varnish - https://phabricator.wikimedia.org/T213263 (ema)
[10:12:34] (CR) Giuseppe Lavagetto: profile::mediawiki::maintenance: systemd-timer based periodic jobs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) (owner: Giuseppe Lavagetto)
[10:12:50] (CR) GTirloni: "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086) (owner: GTirloni)
[10:13:33] (CR) jerkins-bot: [V: -1] wmf-auto-restart: Exclude NFS mountpoints [puppet] - https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086) (owner: GTirloni)
[10:14:11] (PS1) Vgutierrez: netbox: Switch to the directory based deployment used by acme-chief [puppet] - https://gerrit.wikimedia.org/r/498336 (https://phabricator.wikimedia.org/T207295)
[10:14:25] (CR) GTirloni: "Making this dynamic enough with hiera values was getting tricky on a per-project basis. It also added some confusion when nfsclient.pp has" [puppet] - https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086) (owner: GTirloni)
[10:16:01] (PS3) GTirloni: wmf-auto-restart: Exclude NFS mountpoints [puppet] - https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086)
[10:16:11] (PS5) Dzahn: parsoid::testing: install PHP 7.2 [puppet] - https://gerrit.wikimedia.org/r/498104 (https://phabricator.wikimedia.org/T213493)
[10:16:55] (CR) jerkins-bot: [V: -1] wmf-auto-restart: Exclude NFS mountpoints [puppet] - https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086) (owner: GTirloni)
[10:17:56] (PS4) GTirloni: wmf-auto-restart: Exclude NFS mountpoints [puppet] - https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086)
[10:20:32] (CR) Vgutierrez: [C: +2] "everything looking as expected:" [puppet] - https://gerrit.wikimedia.org/r/498336 (https://phabricator.wikimedia.org/T207295) (owner: Vgutierrez)
[10:20:52] !log scandium - manually removing all php* packages to let puppet reinstall 7.2 instead of 7.0
[10:20:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:04] (CR) Dzahn: [C: +2] parsoid::testing: install PHP 7.2 [puppet] - https://gerrit.wikimedia.org/r/498104 (https://phabricator.wikimedia.org/T213493) (owner: Dzahn)
[10:22:41] (PS6) Dzahn: parsoid::testing: install PHP 7.2 [puppet] - https://gerrit.wikimedia.org/r/498104 (https://phabricator.wikimedia.org/T213493)
[10:24:16] !log scandium - apt autoremove
[10:24:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:27:31] (CR) Arturo Borrero Gonzalez: [C: +1] wmf-auto-restart: Exclude NFS mountpoints [puppet] - https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086) (owner: GTirloni)
[10:29:36] (PS1) Vgutierrez: tendril: Switch to the directory based deployment used by acme-chief [puppet] - https://gerrit.wikimedia.org/r/498339 (https://phabricator.wikimedia.org/T207295)
[10:29:53] Operations, Parsoid-PHP, Patch-For-Review: Install PHP7 on scandium - https://phabricator.wikimedia.org/T213493 (Dzahn) >>! In T213493#5047267, @Joe wrote: > So first of all, why do wtp servers have php installed even? They should not, and they don't. See above, i checked with cumin and they all did...
[10:30:03] thanks for reviving wikibugs. whoever did it
[10:31:14] (CR) Vgutierrez: [C: +2] "everything looks as expected:" [puppet] - https://gerrit.wikimedia.org/r/498339 (https://phabricator.wikimedia.org/T207295) (owner: Vgutierrez)
[10:31:54] Operations, Parsoid-PHP, Patch-For-Review: Install PHP7 on scandium - https://phabricator.wikimedia.org/T213493 (Dzahn) >>! In T213493#5044326, @ssastry wrote: > Parsoid is targeting PHP 7.2 so any servers that will run Parsoid/PHP will need 7.2 as well. ` root@scandium:~# dpkg -l | grep php ii p...
[10:39:08] (03PS1) 10Vgutierrez: install: Switch to the directory based deployment used by acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/498342 (https://phabricator.wikimedia.org/T207295) [10:40:11] (03PS1) 10Ema: lvs: fix ldap-ro and ldap-ro-ssl depool thresholds [puppet] - 10https://gerrit.wikimedia.org/r/498343 (https://phabricator.wikimedia.org/T218133) [10:41:15] (03CR) 10Vgutierrez: [C: 03+1] lvs: fix ldap-ro and ldap-ro-ssl depool thresholds [puppet] - 10https://gerrit.wikimedia.org/r/498343 (https://phabricator.wikimedia.org/T218133) (owner: 10Ema) [10:43:31] (03CR) 10Ema: [C: 03+2] lvs: fix ldap-ro and ldap-ro-ssl depool thresholds [puppet] - 10https://gerrit.wikimedia.org/r/498343 (https://phabricator.wikimedia.org/T218133) (owner: 10Ema) [10:45:56] (03CR) 10Vgutierrez: [C: 03+2] "everything looks good:" [puppet] - 10https://gerrit.wikimedia.org/r/498342 (https://phabricator.wikimedia.org/T207295) (owner: 10Vgutierrez) [10:46:05] (03PS2) 10Vgutierrez: install: Switch to the directory based deployment used by acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/498342 (https://phabricator.wikimedia.org/T207295) [10:47:17] (03PS7) 10Jcrespo: mariadb-snapshots: Allow the option to only postprocess snapshots [puppet] - 10https://gerrit.wikimedia.org/r/498029 (https://phabricator.wikimedia.org/T210292) [10:51:18] (03PS8) 10Jcrespo: mariadb-snapshots: Allow the option to only postprocess snapshots [puppet] - 10https://gerrit.wikimedia.org/r/498029 (https://phabricator.wikimedia.org/T210292) [10:51:53] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/498321 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [10:52:29] (03CR) 10Jcrespo: [C: 03+2] mariadb-snapshots: Allow the option to only postprocess snapshots [puppet] - 10https://gerrit.wikimedia.org/r/498029 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [10:56:11] 10Operations, 10ops-eqiad, 10DBA: rack/setup/install 
db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (10Marostegui) [10:56:19] 10Operations, 10ops-eqiad, 10DBA: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (10Marostegui) [10:56:31] 10Operations, 10ops-eqiad, 10DBA: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (10Marostegui) 05Open→03Stalled [10:59:52] (03CR) 10Muehlenhoff: wmf-auto-restart: Exclude NFS mountpoints (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086) (owner: 10GTirloni) [11:08:11] (03CR) 10Jbond: wmf-auto-restart: Exclude NFS mountpoints (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086) (owner: 10GTirloni) [11:09:49] (03CR) 10Muehlenhoff: wmf-auto-restart: Exclude NFS mountpoints (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086) (owner: 10GTirloni) [11:10:30] (03CR) 10Jcrespo: [V: 03+2 C: 03+2] backup_mariadb: Output Log to /var/log [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/498326 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [11:10:46] (03CR) 10Jcrespo: [C: 03+2] backup_mariadb: Output Log to /var/log [puppet] - 10https://gerrit.wikimedia.org/r/498324 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [11:10:57] (03PS4) 10Jcrespo: backup_mariadb: Output Log to /var/log [puppet] - 10https://gerrit.wikimedia.org/r/498324 (https://phabricator.wikimedia.org/T210292) [11:18:51] !log lvs1005: bounce pybal to clear backends health icinga warning T218133 [11:18:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:18:55] T218133: Put our ldap servers behind LVS - https://phabricator.wikimedia.org/T218133 [11:19:09] (03PS5) 10GTirloni: wmf-auto-restart: Exclude NFS mountpoints [puppet] - 
10https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086) [11:20:50] (03CR) 10GTirloni: wmf-auto-restart: Exclude NFS mountpoints (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086) (owner: 10GTirloni) [11:22:07] !log lvs1002: bounce pybal to clear backends health icinga warning T218133 [11:22:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:23:14] (03CR) 10Effie Mouzeli: [C: 03+1] Add cn=gerritadmin to list of LDAP groups [puppet] - 10https://gerrit.wikimedia.org/r/498331 (owner: 10Muehlenhoff) [11:31:09] (03PS1) 10Jbond: mf-auto-restart: Exclude NFS mountpoints (alternate) [puppet] - 10https://gerrit.wikimedia.org/r/498351 (https://phabricator.wikimedia.org/T217086) [11:32:28] (03CR) 10Muehlenhoff: wmf-auto-restart: Exclude NFS mountpoints (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086) (owner: 10GTirloni) [11:32:30] (03CR) 10jerkins-bot: [V: 04-1] mf-auto-restart: Exclude NFS mountpoints (alternate) [puppet] - 10https://gerrit.wikimedia.org/r/498351 (https://phabricator.wikimedia.org/T217086) (owner: 10Jbond) [11:33:10] (03CR) 10Jbond: "this change is an alternate approch to https://gerrit.wikimedia.org/r/c/operations/puppet/+/498328" [puppet] - 10https://gerrit.wikimedia.org/r/498351 (https://phabricator.wikimedia.org/T217086) (owner: 10Jbond) [11:33:41] (03Abandoned) 10GTirloni: wmf-auto-restart: Exclude NFS mountpoints [puppet] - 10https://gerrit.wikimedia.org/r/498328 (https://phabricator.wikimedia.org/T217086) (owner: 10GTirloni) [11:34:32] (03CR) 10Dzahn: [C: 03+1] "will bring this up in the Monday meeting (Greek holiday) and get it merged if approved" [puppet] - 10https://gerrit.wikimedia.org/r/497840 (https://phabricator.wikimedia.org/T217813) (owner: 10Effie Mouzeli) [11:39:35] herron: o/ hi, it looks like you disabled puppet on deployment-mediawiki-09 in deployment-prep about 
a week ago for work on rsyslog templates. is that still in progress? [11:40:56] (deployment-mediawiki-07 doesn't show the same message, but puppet runs fail for a related-looking reason: `Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Could not find class profile::rsyslog::mwlog_shipper for deployment-mediawiki-07.deployment-prep.eqiad.wmflabs on node deployment-mediawiki-07.deployment-prep.eqiad.wmflabs`) [11:41:32] I’m afk at the moment but please feel free to re-enable [11:41:50] ok, thanks! [11:42:35] :) [11:45:18] (03PS2) 10Jbond: mf-auto-restart: Exclude NFS mountpoints (alternate) [puppet] - 10https://gerrit.wikimedia.org/r/498351 (https://phabricator.wikimedia.org/T217086) [11:49:00] Hmmm.... When I make a patch, CI doesn't run, what could have changed? [11:49:20] I'm white listed but Jenkins doesn't even vote +1 [11:49:41] PROBLEM - puppet last run on archiva1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:49:51] Just merged this: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/SecurePoll/+/498221 and nothing is happening looking at the zuul interface [11:50:48] I think this started since yesterday evening when I made patches and they didn't trigger Jenkins [11:51:08] And I'm on the whitelist and also, when I do "recheck", nothing happens [11:51:12] any help? 
[11:55:42] (03PS1) 10GTirloni: puppet_alert - Log to syslog instead of stdout [puppet] - 10https://gerrit.wikimedia.org/r/498353 (https://phabricator.wikimedia.org/T218987) [11:56:42] (03CR) 10GTirloni: [C: 03+1] mf-auto-restart: Exclude NFS mountpoints (alternate) [puppet] - 10https://gerrit.wikimedia.org/r/498351 (https://phabricator.wikimedia.org/T217086) (owner: 10Jbond) [11:56:48] (03CR) 10jerkins-bot: [V: 04-1] puppet_alert - Log to syslog instead of stdout [puppet] - 10https://gerrit.wikimedia.org/r/498353 (https://phabricator.wikimedia.org/T218987) (owner: 10GTirloni) [11:56:53] PROBLEM - DPKG on bast3002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:57:35] PROBLEM - puppet last run on cp3030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:57:50] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/498316 (owner: 10Muehlenhoff) [11:57:54] ^ bast3002 is me, should recover soon [11:58:01] was about to say that, just checked [11:58:54] xSavitar: could you also report on #wikimedia-releng for the zuul issue [11:59:07] (03PS2) 10GTirloni: puppet_alert - Log to syslog instead of stdout [puppet] - 10https://gerrit.wikimedia.org/r/498353 (https://phabricator.wikimedia.org/T218987) [11:59:26] mutante: Let me do that [12:00:43] RECOVERY - DPKG on bast3002 is OK: All packages OK [12:02:40] (03PS3) 10GTirloni: puppet_alert - Log to syslog instead of stdout [puppet] - 10https://gerrit.wikimedia.org/r/498353 (https://phabricator.wikimedia.org/T218987) [12:05:35] (03CR) 10GTirloni: [C: 03+2] profile::base::labs - Convert cronjobs to systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/498141 (https://phabricator.wikimedia.org/T210818) (owner: 10GTirloni) [12:05:45] (03PS5) 10GTirloni: profile::base::labs - Convert cronjobs to systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/498141 (https://phabricator.wikimedia.org/T210818) [12:13:54] (03PS1) 10Jbond: Add option to 
filter out services which don't actually need a restart [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/498357 [12:15:45] (03PS1) 10GTirloni: profile::base::labs - Fix timer definition [puppet] - 10https://gerrit.wikimedia.org/r/498358 (https://phabricator.wikimedia.org/T210818) [12:17:08] (03CR) 10GTirloni: [C: 03+2] profile::base::labs - Fix timer definition [puppet] - 10https://gerrit.wikimedia.org/r/498358 (https://phabricator.wikimedia.org/T210818) (owner: 10GTirloni) [12:20:01] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] mf-auto-restart: Exclude NFS mountpoints (alternate) [puppet] - 10https://gerrit.wikimedia.org/r/498351 (https://phabricator.wikimedia.org/T217086) (owner: 10Jbond) [12:21:21] RECOVERY - puppet last run on archiva1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [12:21:45] (03PS1) 10Arturo Borrero Gonzalez: toolforge: introduce sssd, replacing nscd/nslcd [puppet] - 10https://gerrit.wikimedia.org/r/498359 (https://phabricator.wikimedia.org/T218126) [12:22:41] (03CR) 10jerkins-bot: [V: 04-1] toolforge: introduce sssd, replacing nscd/nslcd [puppet] - 10https://gerrit.wikimedia.org/r/498359 (https://phabricator.wikimedia.org/T218126) (owner: 10Arturo Borrero Gonzalez) [12:23:58] (03PS2) 10Arturo Borrero Gonzalez: toolforge: introduce sssd, replacing nscd/nslcd [puppet] - 10https://gerrit.wikimedia.org/r/498359 (https://phabricator.wikimedia.org/T218126) [12:23:59] RECOVERY - puppet last run on cp3030 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:25:25] 10Operations, 10Patch-For-Review: Tracking and Reducing cron-spam to root@ - https://phabricator.wikimedia.org/T132324 (10jcrespo) [12:33:55] (03CR) 10Paladox: [C: 03+1] "These links are now broken from 2.16 (/p/)." 
[puppet] - 10https://gerrit.wikimedia.org/r/498057 (https://phabricator.wikimedia.org/T218844) (owner: 10MarcoAurelio) [12:36:44] 10Operations, 10Discovery-Search (Current work): update elasticsearch curator to 5.6.0 - https://phabricator.wikimedia.org/T218991 (10Gehel) [12:36:52] (03PS12) 10KartikMistry: Cron to run script to purge old CX drafts [puppet] - 10https://gerrit.wikimedia.org/r/486454 (https://phabricator.wikimedia.org/T189091) [12:37:38] 10Operations, 10Discovery-Search (Current work): update elasticsearch curator to 5.6.0 - https://phabricator.wikimedia.org/T218991 (10Gehel) Note that we should take this as an opportunity to fix T216235 as well. [12:39:19] (03PS1) 10Elukey: profile::hadoop::common: explicitly set if TLS keys are deployed or not [puppet] - 10https://gerrit.wikimedia.org/r/498365 (https://phabricator.wikimedia.org/T217412) [12:43:20] (03PS2) 10Elukey: profile::hadoop::common: explicitly set if TLS keys are deployed or not [puppet] - 10https://gerrit.wikimedia.org/r/498365 (https://phabricator.wikimedia.org/T217412) [12:49:20] (03PS3) 10Elukey: profile::hadoop::common: explicitly set if TLS keys are deployed or not [puppet] - 10https://gerrit.wikimedia.org/r/498365 (https://phabricator.wikimedia.org/T217412) [12:51:44] (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/15275/" [puppet] - 10https://gerrit.wikimedia.org/r/498365 (https://phabricator.wikimedia.org/T217412) (owner: 10Elukey) [12:52:42] (03PS1) 10Muehlenhoff: Remove obsolete elasticsearch-curator sync definition [puppet] - 10https://gerrit.wikimedia.org/r/498367 (https://phabricator.wikimedia.org/T216235) [12:54:27] (03CR) 10Gehel: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/498367 (https://phabricator.wikimedia.org/T216235) (owner: 10Muehlenhoff) [12:57:05] (03PS2) 10Muehlenhoff: Remove obsolete elasticsearch-curator sync definition [puppet] - 10https://gerrit.wikimedia.org/r/498367 (https://phabricator.wikimedia.org/T216235) 
[12:58:04] (03PS1) 10Paladox: contint: Remove curl package [puppet] - 10https://gerrit.wikimedia.org/r/498368 [12:58:26] (03CR) 10Muehlenhoff: [C: 03+2] Remove obsolete elasticsearch-curator sync definition [puppet] - 10https://gerrit.wikimedia.org/r/498367 (https://phabricator.wikimedia.org/T216235) (owner: 10Muehlenhoff) [12:58:36] (03PS2) 10Paladox: contint: Remove curl package [puppet] - 10https://gerrit.wikimedia.org/r/498368 [12:59:47] (03CR) 10Zppix: [C: 03+1] contint: Remove curl package [puppet] - 10https://gerrit.wikimedia.org/r/498368 (owner: 10Paladox) [12:59:55] (03CR) 10Muehlenhoff: "Is this causing puppet failures or similar? In https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/498045/ Antoine preferred to keep it" [puppet] - 10https://gerrit.wikimedia.org/r/498368 (owner: 10Paladox) [13:00:40] (03CR) 10Paladox: "> Is this causing puppet failures or similar? In https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/498045/" [puppet] - 10https://gerrit.wikimedia.org/r/498368 (owner: 10Paladox) [13:04:08] (03CR) 10Dzahn: "let's change it to use require_package or ensure_packages to avoid the duplicate while being able to keep it" [puppet] - 10https://gerrit.wikimedia.org/r/498368 (owner: 10Paladox) [13:05:02] (03PS3) 10Paladox: contint: Remove curl package [puppet] - 10https://gerrit.wikimedia.org/r/498368 [13:05:13] (03CR) 10Paladox: "> let's change it to use require_package or ensure_packages to avoid" [puppet] - 10https://gerrit.wikimedia.org/r/498368 (owner: 10Paladox) [13:05:30] (03CR) 10Muehlenhoff: [C: 03+1] contint: Remove curl package [puppet] - 10https://gerrit.wikimedia.org/r/498368 (owner: 10Paladox) [13:05:53] (03CR) 10Zppix: [C: 03+1] contint: Remove curl package [puppet] - 10https://gerrit.wikimedia.org/r/498368 (owner: 10Paladox) [13:07:45] (03PS4) 10Paladox: contint: Remove curl package [puppet] - 10https://gerrit.wikimedia.org/r/498368 [13:08:31] (03PS3) 10Arturo Borrero Gonzalez: toolforge: introduce sssd, replacing nscd/nslcd 
[puppet] - 10https://gerrit.wikimedia.org/r/498359 (https://phabricator.wikimedia.org/T218126) [13:09:27] (03CR) 10jerkins-bot: [V: 04-1] toolforge: introduce sssd, replacing nscd/nslcd [puppet] - 10https://gerrit.wikimedia.org/r/498359 (https://phabricator.wikimedia.org/T218126) (owner: 10Arturo Borrero Gonzalez) [13:09:58] (03CR) 10Dzahn: [C: 03+2] contint: Remove curl package [puppet] - 10https://gerrit.wikimedia.org/r/498368 (owner: 10Paladox) [13:10:16] (03PS5) 10Dzahn: contint: Remove curl package [puppet] - 10https://gerrit.wikimedia.org/r/498368 (owner: 10Paladox) [13:11:26] (03CR) 10Muehlenhoff: "One nit and one comment." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498351 (https://phabricator.wikimedia.org/T217086) (owner: 10Jbond) [13:12:46] (03CR) 10Dzahn: "this fixed the puppet run on contint1001. thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/498368 (owner: 10Paladox) [13:13:48] Zppix: i hear you reported it first.. curious what alerted you. icinga? [13:13:59] or puppet in cloud [13:14:18] mutante: I saw the puppet alert on icinga2 so i ran puppet agent -tv on jenkins-slave-01 and got the error message [13:14:44] Zppix: oh.. both.. Icinga2 in cloud and the slaves. gotcha:) thanks [13:14:49] np [13:15:06] I have a one line fix for a train blocker here. anyone interested in deploying it? [13:15:06] https://gerrit.wikimedia.org/r/c/mediawiki/core/+/498363 [13:16:24] is it testable? (I assume the message cache would need to be purged somehow) [13:16:25] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [13:17:17] RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [13:17:25] Lucas_WMDE: i don't see a good way to test it [13:17:49] duesen_: maybe find a message on beta that is broken, and then somehow reload the cache, and see that it's fixed?
[13:18:06] "somehow" is probably: call MessageCache::load() from eval.php [13:18:37] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:19:10] https://wikitech.wikimedia.org/wiki/MessageCache#How_to points to a ->clear() method [13:19:16] but yeah, a reproducer on some test wiki would be ideal [13:19:52] (03PS4) 10Arturo Borrero Gonzalez: toolforge: introduce sssd, replacing nscd/nslcd [puppet] - 10https://gerrit.wikimedia.org/r/498359 (https://phabricator.wikimedia.org/T218126) [13:19:58] (03PS1) 10WMDE-Fisch: Enable ReferencePreviews beta feature on de- and ar-wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498371 (https://phabricator.wikimedia.org/T218766) [13:20:37] (03CR) 10jerkins-bot: [V: 04-1] toolforge: introduce sssd, replacing nscd/nslcd [puppet] - 10https://gerrit.wikimedia.org/r/498359 (https://phabricator.wikimedia.org/T218126) (owner: 10Arturo Borrero Gonzalez) [13:20:41] (03CR) 10WMDE-Fisch: "Note: Deployment should NOT happen before April 4 2019." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498371 (https://phabricator.wikimedia.org/T218766) (owner: 10WMDE-Fisch) [13:24:11] (03PS1) 10Gehel: aptrepo: add component for elasticsearch-curator [puppet] - 10https://gerrit.wikimedia.org/r/498372 (https://phabricator.wikimedia.org/T216235) [13:24:13] (03PS1) 10Gehel: elasticsearch: use the new elasticsearch-curator APT component [puppet] - 10https://gerrit.wikimedia.org/r/498373 (https://phabricator.wikimedia.org/T218991) [13:24:33] Lucas_WMDE: clear() would mean that the next poor sod trying to load a page would trigger the re-cache, i think. may cause timeouts...
[13:25:30] (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: use the new elasticsearch-curator APT component [puppet] - 10https://gerrit.wikimedia.org/r/498373 (https://phabricator.wikimedia.org/T218991) (owner: 10Gehel) [13:25:38] Lucas_WMDE: but I have not investigated this [13:26:50] (03PS2) 10Gehel: elasticsearch: use the new elasticsearch-curator APT component [puppet] - 10https://gerrit.wikimedia.org/r/498373 (https://phabricator.wikimedia.org/T218991) [13:27:26] (03PS5) 10Arturo Borrero Gonzalez: toolforge: introduce sssd, replacing nscd/nslcd [puppet] - 10https://gerrit.wikimedia.org/r/498359 (https://phabricator.wikimedia.org/T218126) [13:28:18] (03CR) 10jerkins-bot: [V: 04-1] toolforge: introduce sssd, replacing nscd/nslcd [puppet] - 10https://gerrit.wikimedia.org/r/498359 (https://phabricator.wikimedia.org/T218126) (owner: 10Arturo Borrero Gonzalez) [13:30:27] (03CR) 10Muehlenhoff: [C: 04-1] aptrepo: add component for elasticsearch-curator (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498372 (https://phabricator.wikimedia.org/T216235) (owner: 10Gehel) [13:31:08] (03CR) 10Muehlenhoff: [C: 04-1] elasticsearch: use the new elasticsearch-curator APT component (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498373 (https://phabricator.wikimedia.org/T218991) (owner: 10Gehel) [13:33:34] (03CR) 10Gehel: aptrepo: add component for elasticsearch-curator (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498372 (https://phabricator.wikimedia.org/T216235) (owner: 10Gehel) [13:34:20] (03PS3) 10Gehel: elasticsearch: use the new elasticsearch-curator APT component [puppet] - 10https://gerrit.wikimedia.org/r/498373 (https://phabricator.wikimedia.org/T218991) [13:34:22] (03CR) 10Gehel: elasticsearch: use the new elasticsearch-curator APT component (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498373 (https://phabricator.wikimedia.org/T218991) (owner: 10Gehel) [13:35:51] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, two nits" (032
comments) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/498357 (owner: 10Jbond) [13:38:42] I tried running [13:38:46] $mc = \Wikimedia\TestingAccessWrapper::newFromObject( MessageCache::singleton() ) [13:38:49] var_dump( $mc->loadFromDb( 'en' )['Mycontris'] ) [13:38:53] on simplewiki [13:38:59] but it always gives me the same result, " My changes" [13:39:00] (03PS6) 10Arturo Borrero Gonzalez: toolforge: introduce sssd, replacing nscd/nslcd [puppet] - 10https://gerrit.wikimedia.org/r/498359 (https://phabricator.wikimedia.org/T218126) [13:39:05] which seems to be the latest revision [13:39:55] so that’s not helping :/ [13:40:14] (03PS2) 10Jbond: Add option to filter out services which don't actually need a restart [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/498357 [13:40:41] (03PS2) 10Gehel: aptrepo: add component for elasticsearch-curator [puppet] - 10https://gerrit.wikimedia.org/r/498372 (https://phabricator.wikimedia.org/T216235) [13:40:43] (03PS4) 10Gehel: elasticsearch: use the new elasticsearch-curator APT component [puppet] - 10https://gerrit.wikimedia.org/r/498373 (https://phabricator.wikimedia.org/T218991) [13:41:05] duesen_: can we not just null edit all mw namespaces pages or something? or is that not enough?
[13:41:14] (03CR) 10Jbond: Add option to filter out services which don't actually need a restart (032 comments) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/498357 (owner: 10Jbond) [13:43:59] (03CR) 10Muehlenhoff: [C: 03+1] aptrepo: add component for elasticsearch-curator [puppet] - 10https://gerrit.wikimedia.org/r/498372 (https://phabricator.wikimedia.org/T216235) (owner: 10Gehel) [13:44:10] (03PS7) 10Arturo Borrero Gonzalez: toolforge: introduce sssd, replacing nscd/nslcd [puppet] - 10https://gerrit.wikimedia.org/r/498359 (https://phabricator.wikimedia.org/T218126) [13:44:19] (03CR) 10Muehlenhoff: [C: 03+1] elasticsearch: use the new elasticsearch-curator APT component [puppet] - 10https://gerrit.wikimedia.org/r/498373 (https://phabricator.wikimedia.org/T218991) (owner: 10Gehel) [13:45:13] herron: hi again, is it all right if i remove the `profile::rsyslog::mwlog_shipper` class from the deployment-mediawiki- prefix config? that seems to be what's causing puppet to fail on those hosts. [13:45:44] (03CR) 10Gehel: [C: 03+2] aptrepo: add component for elasticsearch-curator [puppet] - 10https://gerrit.wikimedia.org/r/498372 (https://phabricator.wikimedia.org/T216235) (owner: 10Gehel) [13:46:28] mdholloway: yes, that can be removed. was testing it but can resume later on [13:46:40] ok, great! 
thank you : [13:46:43] :) [13:51:54] (03PS1) 10Elukey: hue: add ssl_ca_certs config tunable and https/http variations [puppet/cdh] - 10https://gerrit.wikimedia.org/r/498375 (https://phabricator.wikimedia.org/T217412) [13:53:33] (03PS1) 10Muehlenhoff: Remove sync defition for old PHP 7.2 repo [puppet] - 10https://gerrit.wikimedia.org/r/498376 (https://phabricator.wikimedia.org/T216712) [13:53:55] 10Operations, 10Toolforge, 10Toolforge-standards-committee, 10Traffic, and 2 others: Detect tools.wmflabs.org tools which are HTTP-only - https://phabricator.wikimedia.org/T128409 (10GTirloni) [13:53:59] (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/15276/" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/498375 (https://phabricator.wikimedia.org/T217412) (owner: 10Elukey) [13:54:14] (03CR) 10Elukey: [V: 03+2 C: 03+2] hue: add ssl_ca_certs config tunable and https/http variations [puppet/cdh] - 10https://gerrit.wikimedia.org/r/498375 (https://phabricator.wikimedia.org/T217412) (owner: 10Elukey) [13:56:12] (03CR) 10Gehel: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/498376 (https://phabricator.wikimedia.org/T216712) (owner: 10Muehlenhoff) [13:56:34] (03CR) 10Muehlenhoff: [C: 03+2] Remove sync defition for old PHP 7.2 repo [puppet] - 10https://gerrit.wikimedia.org/r/498376 (https://phabricator.wikimedia.org/T216712) (owner: 10Muehlenhoff) [13:56:44] (03PS1) 10Elukey: Update cdh module to its latest version [puppet] - 10https://gerrit.wikimedia.org/r/498377 [13:56:58] (03CR) 10Elukey: [C: 03+2] Update cdh module to its latest version [puppet] - 10https://gerrit.wikimedia.org/r/498377 (owner: 10Elukey) [13:57:08] (03PS2) 10Elukey: Update cdh module to its latest version [puppet] - 10https://gerrit.wikimedia.org/r/498377 [13:57:10] (03CR) 10Elukey: [V: 03+2 C: 03+2] Update cdh module to its latest version [puppet] - 10https://gerrit.wikimedia.org/r/498377 (owner: 10Elukey) [13:58:56] (03PS3) 10Jbond: mf-auto-restart: Exclude NFS 
mountpoints (alternate) [puppet] - 10https://gerrit.wikimedia.org/r/498351 (https://phabricator.wikimedia.org/T217086) [13:59:06] (03CR) 10Jbond: mf-auto-restart: Exclude NFS mountpoints (alternate) (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498351 (https://phabricator.wikimedia.org/T217086) (owner: 10Jbond) [14:01:07] does anyone know what's up with the increased logstash input over the past 12 hours? herron godog? [14:01:30] (03PS5) 10Gehel: elasticsearch: use the new elasticsearch-curator APT component [puppet] - 10https://gerrit.wikimedia.org/r/498373 (https://phabricator.wikimedia.org/T218991) [14:03:17] cdanis: looks like this might be the elasticsearch 6 upgrade raising a few deprecation warnings [14:03:43] looking [14:03:54] seems like it correlates with... input gelf/12201 getting traffic starting around 3-21 14:45 UTC [14:04:04] (03CR) 10Gehel: [C: 03+2] elasticsearch: use the new elasticsearch-curator APT component [puppet] - 10https://gerrit.wikimedia.org/r/498373 (https://phabricator.wikimedia.org/T218991) (owner: 10Gehel) [14:04:20] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498351 (https://phabricator.wikimedia.org/T217086) (owner: 10Jbond) [14:04:47] (03CR) 10Muehlenhoff: [C: 03+1] Add option to filter out services which don't actually need a restart [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/498357 (owner: 10Jbond) [14:05:30] * cdanis bbiab, meeting prep [14:05:43] cdanis: no I haven't looked into it, though it is indeed gelf [14:05:50] cdanis: at least there is too much traffic from the elasticsearch / cirrus cluster [14:06:02] 10Operations, 10serviceops, 10User-jijiki: create IRC channel for the Service Operations SRE subteam - https://phabricator.wikimedia.org/T211902 (10ArielGlenn) [14:06:06] give me a few minutes to prepare a patch to hide that [14:06:17] i'm not sure it's too much, it's just more than yesterday :) [14:06:28] (which is what the alert 
cares about) [14:07:04] it's mjolnir still using a deprecated field on the elastic bulk update API, will fix [14:07:56] * godog nods [14:08:03] thanks dcausse gehel ! [14:08:19] (03CR) 10Jbond: [C: 03+2] Add option to filter out services which don't actually need a restart [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/498357 (owner: 10Jbond) [14:08:42] dcausse: if needed, I can deploy a logging config change to hide that message for the moment [14:08:53] gehel: err... no it's logstash itself :) [14:08:54] dcausse: depending on how long it would take to fix and deploy mjolnir [14:09:23] (03CR) 10Jbond: [C: 03+2] mf-auto-restart: Exclude NFS mountpoints (alternate) [puppet] - 10https://gerrit.wikimedia.org/r/498351 (https://phabricator.wikimedia.org/T217086) (owner: 10Jbond) [14:09:50] (03PS4) 10Jbond: mf-auto-restart: Exclude NFS mountpoints (alternate) [puppet] - 10https://gerrit.wikimedia.org/r/498351 (https://phabricator.wikimedia.org/T217086) [14:09:50] gehel: the deprecation warnings I see are from logstash indices, I'll fix mjolnir but it's not what we see in kibana at the moment [14:11:20] gehel: bah ignore that ^ [14:11:51] (03PS1) 10GTirloni: wmcs: Add .py extension to various scripts [puppet] - 10https://gerrit.wikimedia.org/r/498379 (https://phabricator.wikimedia.org/T144169) [14:12:55] (03CR) 10jerkins-bot: [V: 04-1] wmcs: Add .py extension to various scripts [puppet] - 10https://gerrit.wikimedia.org/r/498379 (https://phabricator.wikimedia.org/T144169) (owner: 10GTirloni) [14:18:18] (03PS1) 10Dzahn: parsoid::testing: use profile::mediawiki::php to get PHP 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/498383 (https://phabricator.wikimedia.org/T213493) [14:21:19] 10Operations, 10Discovery-Search (Current work): Deprecation warning on elasticsearch 6 expected [retry_on_conflict] - https://phabricator.wikimedia.org/T218994 (10Gehel) [14:23:18] (03PS1) 10Filippo Giunchedi: [WIP] kafkatee instance for udp2log compat [puppet] - 
10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) [14:23:20] (03PS1) 10Filippo Giunchedi: [WIP] add kafkatee to mwlog [puppet] - 10https://gerrit.wikimedia.org/r/498387 (https://phabricator.wikimedia.org/T126989) [14:23:55] (03CR) 10jerkins-bot: [V: 04-1] [WIP] kafkatee instance for udp2log compat [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) (owner: 10Filippo Giunchedi) [14:24:28] 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review: Deprecation warning on elasticsearch 6 expected [retry_on_conflict] - https://phabricator.wikimedia.org/T218994 (10dcausse) Elastica is affected as well. [14:24:45] (03PS1) 10Gehel: elasticsearch: hide deprecation warning for ParseField [puppet] - 10https://gerrit.wikimedia.org/r/498388 (https://phabricator.wikimedia.org/T218994) [14:24:49] 10Operations, 10CirrusSearch, 10Discovery-Search, 10Patch-For-Review: Deprecation warning on elasticsearch 6 expected [retry_on_conflict] - https://phabricator.wikimedia.org/T218994 (10dcausse) [14:25:45] (03PS2) 10Gehel: elasticsearch: hide deprecation warning for ParseField [puppet] - 10https://gerrit.wikimedia.org/r/498388 (https://phabricator.wikimedia.org/T218994) [14:26:34] (03PS2) 10Dzahn: parsoid::testing: use profile::mediawiki::php to get PHP 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/498383 (https://phabricator.wikimedia.org/T213493) [14:26:44] (03PS2) 10Muehlenhoff: Add systemd to filter_services list of debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/498316 [14:26:48] (03PS2) 10Filippo Giunchedi: [WIP] kafkatee instance for udp2log compat [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) [14:26:50] (03CR) 10DCausse: [C: 03+1] elasticsearch: hide deprecation warning for ParseField [puppet] - 10https://gerrit.wikimedia.org/r/498388 (https://phabricator.wikimedia.org/T218994) (owner: 10Gehel) [14:27:44] (03CR) 10jerkins-bot: [V: 04-1] 
[WIP] kafkatee instance for udp2log compat [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) (owner: 10Filippo Giunchedi) [14:28:26] (03CR) 10Gehel: [C: 03+2] elasticsearch: hide deprecation warning for ParseField [puppet] - 10https://gerrit.wikimedia.org/r/498388 (https://phabricator.wikimedia.org/T218994) (owner: 10Gehel) [14:29:46] 10Operations, 10CirrusSearch, 10Discovery-Search, 10Patch-For-Review: Deprecation warning on elasticsearch 6 expected [retry_on_conflict] - https://phabricator.wikimedia.org/T218994 (10Gehel) disabling this logger for now, let's not forget to re-enable it once we've fixed the underlying issues! [14:29:56] (03CR) 10Dzahn: [V: 03+1] "https://puppet-compiler.wmflabs.org/compiler1001/15280/scandium.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/498383 (https://phabricator.wikimedia.org/T213493) (owner: 10Dzahn) [14:30:19] 10Operations, 10CirrusSearch, 10Discovery-Search: re-enable deprecation warning logger once issues are solved - https://phabricator.wikimedia.org/T218995 (10Gehel) [14:30:35] 10Operations, 10CirrusSearch, 10Discovery-Search: re-enable deprecation warning logger on elasticsearch once issues are solved - https://phabricator.wikimedia.org/T218995 (10Gehel) [14:30:53] (03PS3) 10Dzahn: parsoid::testing: use profile::mediawiki::php to get PHP 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/498383 (https://phabricator.wikimedia.org/T213493) [14:31:45] (03PS2) 10GTirloni: wmcs: Add .py extension to various scripts [puppet] - 10https://gerrit.wikimedia.org/r/498379 (https://phabricator.wikimedia.org/T144169) [14:32:33] (03CR) 10Dzahn: [C: 03+2] parsoid::testing: use profile::mediawiki::php to get PHP 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/498383 (https://phabricator.wikimedia.org/T213493) (owner: 10Dzahn) [14:32:51] (03CR) 10jerkins-bot: [V: 04-1] wmcs: Add .py extension to various scripts [puppet] - 10https://gerrit.wikimedia.org/r/498379 
(https://phabricator.wikimedia.org/T144169) (owner: 10GTirloni) [14:33:06] (03PS3) 10Ppchelko: Map syslog{severity,facility}-text to {severity,facility}_label [puppet] - 10https://gerrit.wikimedia.org/r/497321 (https://phabricator.wikimedia.org/T211125) [14:33:34] !log upgrading to elasticsearch-curator 5.6.0 on all elasticsearch nodes (including logstash) - T218991 [14:33:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:33:37] T218991: update elasticsearch curator to 5.6.0 - https://phabricator.wikimedia.org/T218991 [14:34:09] PROBLEM - puppet last run on elastic1045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:34:13] PROBLEM - puppet last run on elastic1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:34:42] !log scandium - apt-get remove --purge php* ; apt autoremove ; letting puppet reinstall php 7.2 one more time using mediawiki::profile::php now [14:34:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:34:59] PROBLEM - puppet last run on elastic1025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:35:08] (03CR) 10Ladsgroup: "> Patch Set 2: Code-Review-1" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/497316 (owner: 10Ladsgroup) [14:35:56] (03PS3) 10Filippo Giunchedi: [WIP] kafkatee instance for udp2log compat [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) [14:38:08] 10Operations, 10Parsoid-PHP, 10Patch-For-Review: Install PHP7 on scandium - https://phabricator.wikimedia.org/T213493 (10Dzahn) 05Open→03Resolved @ssastry Changed it one more time to use `profile::mediawiki::php` to setup PHP 7.2 and configure all the extensions just like on appservers. This should be id... 
[14:39:25] RECOVERY - puppet last run on elastic1045 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:39:29] RECOVERY - puppet last run on elastic1052 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:39:47] (03PS3) 10Muehlenhoff: Add systemd to filter_services list of debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/498316 [14:40:15] RECOVERY - puppet last run on elastic1025 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:41:39] (03CR) 10Muehlenhoff: [C: 03+2] Add systemd to filter_services list of debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/498316 (owner: 10Muehlenhoff) [14:45:06] (03PS3) 10GTirloni: wmcs: Add .py extension to various scripts [puppet] - 10https://gerrit.wikimedia.org/r/498379 (https://phabricator.wikimedia.org/T144169) [14:46:19] (03CR) 10jerkins-bot: [V: 04-1] wmcs: Add .py extension to various scripts [puppet] - 10https://gerrit.wikimedia.org/r/498379 (https://phabricator.wikimedia.org/T144169) (owner: 10GTirloni) [14:47:51] (03PS1) 10Gehel: logstash: elasticsearch-curator is now managed in its own component [puppet] - 10https://gerrit.wikimedia.org/r/498390 (https://phabricator.wikimedia.org/T216235) [14:48:30] (03CR) 10Paladox: [V: 03+2 C: 03+2] Merge branch 'stable-2.15' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/497120 (https://phabricator.wikimedia.org/T218515) (owner: 10Paladox) [14:49:04] (03PS4) 10GTirloni: wmcs: Add .py extension to various scripts [puppet] - 10https://gerrit.wikimedia.org/r/498379 (https://phabricator.wikimedia.org/T144169) [14:49:56] (03CR) 10jerkins-bot: [V: 04-1] wmcs: Add .py extension to various scripts [puppet] - 10https://gerrit.wikimedia.org/r/498379 (https://phabricator.wikimedia.org/T144169) (owner: 10GTirloni) [14:50:18] (03CR) 10Muehlenhoff: [C: 03+1] logstash: elasticsearch-curator is now managed in its own component [puppet] - 
https://gerrit.wikimedia.org/r/498390 (https://phabricator.wikimedia.org/T216235) (owner: Gehel) [14:50:33] (CR) Gehel: [C: +2] logstash: elasticsearch-curator is now managed in its own component [puppet] - https://gerrit.wikimedia.org/r/498390 (https://phabricator.wikimedia.org/T216235) (owner: Gehel) [14:52:10] gtirloni: that's probably going to be a lot of patch sets until all scripts pass pep8 at once.. i suppose [14:52:12] (PS3) Dzahn: confd: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/456317 (https://phabricator.wikimedia.org/T194724) [14:52:16] but cool! [14:52:33] oh, they all passed, except some clever code that breaks flake8 now, haha [14:52:41] oooh, nice :) [14:54:59] (CR) GTirloni: "Don't write clever code, folks!" [puppet] - https://gerrit.wikimedia.org/r/498379 (https://phabricator.wikimedia.org/T144169) (owner: GTirloni) [14:56:20] (CR) CRusnov: "Was going through my changesets and noticed this one, and of course there is request for this feature. The only thing that was pending was" [software/cumin] - https://gerrit.wikimedia.org/r/474087 (https://phabricator.wikimedia.org/T207037) (owner: CRusnov) [14:57:07] Operations, Elasticsearch, Discovery-Search (Current work), Patch-For-Review, User-fgiunchedi: cleanup reprepro configuration for elasticsearch-curator - https://phabricator.wikimedia.org/T216235 (Gehel) [14:57:12] Operations, Traffic, Wikidata, Wikidata-Query-Service, User-Smalyshev: Reduce / remove the aggessive cache busting behaviour of wdqs-updater - https://phabricator.wikimedia.org/T217897 (Addshore) >>! In T217897#5026499, @Smalyshev wrote: >> I guess the wdqs internal machines would have compar... [14:58:53] ottomata: I quickly checked kafkatee kafka ssl support but I don't see it, can you confirm that's not a thing yet? 
[14:58:59] 10Operations, 10serviceops, 10User-jijiki: create IRC channel for the Service Operations SRE subteam - https://phabricator.wikimedia.org/T211902 (10Dzahn) a:03mark Hi Mark, this ticket is resolved since a while except that one check box to setup an access list and add admins. I see currently you are on... [15:01:30] (03Abandoned) 10Filippo Giunchedi: [WIP] add kafkatee to mwlog [puppet] - 10https://gerrit.wikimedia.org/r/498387 (https://phabricator.wikimedia.org/T126989) (owner: 10Filippo Giunchedi) [15:02:26] (03PS1) 10Gehel: elasticsearch: hide deprecation warning for ParseField [puppet] - 10https://gerrit.wikimedia.org/r/498395 (https://phabricator.wikimedia.org/T218994) [15:05:22] (03PS1) 10Andrew Bogott: ldap: add an index for 'sudoHost' [puppet] - 10https://gerrit.wikimedia.org/r/498396 (https://phabricator.wikimedia.org/T46722) [15:05:41] (03CR) 10DCausse: [C: 03+1] elasticsearch: hide deprecation warning for ParseField [puppet] - 10https://gerrit.wikimedia.org/r/498395 (https://phabricator.wikimedia.org/T218994) (owner: 10Gehel) [15:06:01] (03CR) 10Gehel: [C: 03+2] elasticsearch: hide deprecation warning for ParseField [puppet] - 10https://gerrit.wikimedia.org/r/498395 (https://phabricator.wikimedia.org/T218994) (owner: 10Gehel) [15:06:25] (03PS2) 10Andrew Bogott: ldap: add an index for 'sudoHost' [puppet] - 10https://gerrit.wikimedia.org/r/498396 (https://phabricator.wikimedia.org/T46722) [15:07:22] (03CR) 10Andrew Bogott: "Moritz says 'after merging it IIRC slapindex needs to be manually run'" [puppet] - 10https://gerrit.wikimedia.org/r/498396 (https://phabricator.wikimedia.org/T46722) (owner: 10Andrew Bogott) [15:08:39] (03PS4) 10Filippo Giunchedi: profile: kafkatee instance for udp2log compat [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) [15:08:58] (03CR) 10Muehlenhoff: ldap: add an index for 'sudoHost' (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498396 
(https://phabricator.wikimedia.org/T46722) (owner: 10Andrew Bogott) [15:09:38] 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review: update elasticsearch curator to 5.6.0 - https://phabricator.wikimedia.org/T218991 (10Gehel) a:03Gehel [15:09:45] 10Operations, 10Elasticsearch, 10Discovery-Search (Current work), 10Patch-For-Review, 10User-fgiunchedi: cleanup reprepro configuration for elasticsearch-curator - https://phabricator.wikimedia.org/T216235 (10Gehel) a:03Gehel [15:10:46] (03PS17) 10Daimona Eaytoy: Update AbuseFilter config to keep the status quo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475772 [15:10:53] (03PS16) 10Daimona Eaytoy: Move all AbuseFilter config to abusefilter.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477063 (https://phabricator.wikimedia.org/T145931) [15:11:01] (03PS7) 10Daimona Eaytoy: Remove $wgAbuseFilterRuntimeProfile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/486470 (https://phabricator.wikimedia.org/T191039) [15:11:06] 10Operations, 10User-Elukey: Archival of home directories on servers with very large homes - https://phabricator.wikimedia.org/T215171 (10elukey) [15:11:10] (03PS3) 10Andrew Bogott: ldap: add an index for 'sudoHost' [puppet] - 10https://gerrit.wikimedia.org/r/498396 (https://phabricator.wikimedia.org/T46722) [15:12:13] (03PS16) 10CRusnov: Add system timer for running ganeti->netbox sync. 
[puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [15:12:27] (03CR) 10Dzahn: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/15281/" [puppet] - 10https://gerrit.wikimedia.org/r/456317 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [15:13:37] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/498396 (https://phabricator.wikimedia.org/T46722) (owner: 10Andrew Bogott) [15:15:21] (03PS4) 10Filippo Giunchedi: [WIP] mirror udp2log data into the logging pipeline [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494254 (https://phabricator.wikimedia.org/T126989) [15:15:29] (03PS1) 10Elukey: admin: allow users to be removed preserving their home directories [puppet] - 10https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) [15:17:44] (03CR) 10jerkins-bot: [V: 04-1] Update AbuseFilter config to keep the status quo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/475772 (owner: 10Daimona Eaytoy) [15:17:46] (03CR) 10Dzahn: [V: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/15282/scb1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/456316 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [15:18:35] (03CR) 10Elukey: "Moritz: I forgot about this task and I found some time to make a proposal, let me know you thoughts. It doesn't resolve all the problems o" [puppet] - 10https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: 10Elukey) [15:19:01] (03CR) 10jerkins-bot: [V: 04-1] Add system timer for running ganeti->netbox sync. 
[puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [15:26:58] !log restarting elasticsearch on elastic1046 for logging configuration change - T218994 [15:27:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:01] T218994: Deprecation warning on elasticsearch 6 expected [retry_on_conflict] - https://phabricator.wikimedia.org/T218994 [15:28:02] (03CR) 10Jforrester: "I guess this is an artefact to make it easy for the DBAs to disable a shard quickly without having to look up the wording we used last tim" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/497477 (owner: 10Reedy) [15:30:11] 10Operations, 10Parsoid-PHP, 10Patch-For-Review: Install PHP7 on scandium - https://phabricator.wikimedia.org/T213493 (10ssastry) Thanks! :-) [15:36:49] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Grant root on MediaWiki maintenance hosts to perf-roots - https://phabricator.wikimedia.org/T217813 (10Dzahn) a:05kchapman→03Dzahn We will have this in our Monday SRE meeting and it can get merged once approved. [15:40:02] 10Operations, 10CirrusSearch, 10Discovery-Search, 10Patch-For-Review: Deprecation warning on elasticsearch 6 expected [retry_on_conflict] - https://phabricator.wikimedia.org/T218994 (10Gehel) The elasticsearch security manager is preventing log4j2 to auto-reload it's configuration (more precisely, it can't... [15:43:56] (03CR) 10Herron: "This will need an update to the ssh_puppet_merge ferm as well to allow a prod puppetmaster to ssh to the labspuppetmaster" [puppet] - 10https://gerrit.wikimedia.org/r/497069 (owner: 10Andrew Bogott) [15:46:52] (03PS17) 10CRusnov: Add system timer for running ganeti->netbox sync. 
[puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [15:47:46] (03CR) 10Jbond: [C: 04-1] "just a few minor things, -1 mainly for the line71 comment" (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: 10Elukey) [15:47:59] (03CR) 10Gehel: [C: 04-1] "The migration to py3 looks fine in itself, but there are a few problems that were already there and need to be fixed." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498292 (https://phabricator.wikimedia.org/T215439) (owner: 10Mathew.onipe) [15:48:03] (03CR) 10jerkins-bot: [V: 04-1] Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [15:51:08] (03CR) 10Dzahn: [C: 04-1] "the puppet compiler says Error while evaluating a Resource Statement, Class[Tilerator]: has no parameter named 'use_node10js'. There is a" [puppet] - 10https://gerrit.wikimedia.org/r/495735 (https://phabricator.wikimedia.org/T215523) (owner: 10MSantos) [15:56:38] (03PS18) 10CRusnov: Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [15:57:18] (03CR) 10jerkins-bot: [V: 04-1] Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [15:57:46] 10Operations, 10decommission, 10hardware-requests, 10Patch-For-Review: decommission wmf6937 as phab1002, reimage as mw1298 - https://phabricator.wikimedia.org/T215332 (10Dzahn) T215335 is unblocked again. If we can do that and assign it to me as phab1003 then doing this decom task would be slightly easier.... 
[15:58:07] !log UBN hot-deploy for T218918: Only load latest revision in MessageCache::loadFromDB [15:58:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:58:10] T218918: Some interface messages (e.g. sitenotice, others) are loading old revisions of their messages - https://phabricator.wikimedia.org/T218918 [15:58:13] (03PS2) 10EBernhardson: [WIP] Switch mjolnir to rsyslog based structured logging [puppet] - 10https://gerrit.wikimedia.org/r/498232 (https://phabricator.wikimedia.org/T218833) [15:59:53] PROBLEM - Check systemd state on db2096 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:00:45] (03PS19) 10CRusnov: Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [16:00:45] PROBLEM - Check whether ferm is active by checking the default input chain on db2096 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly [16:00:52] (03CR) 10Elukey: "JBond: thanks a lot! 
Sloppy first attempt, will do better on Monday, I use Friday as excuse :)" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: 10Elukey) [16:05:15] (03CR) 10Jbond: [C: 04-1] admin: allow users to be removed preserving their home directories (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498399 (https://phabricator.wikimedia.org/T215171) (owner: 10Elukey) [16:06:57] !log Restart ferm on db2096 [16:06:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:08:51] RECOVERY - Check systemd state on db2096 is OK: OK - running: The system is fully operational [16:09:43] RECOVERY - Check whether ferm is active by checking the default input chain on db2096 is OK: OK ferm input default policy is set [16:15:37] (03PS1) 10Sbisson: Homepage: register mentors page on labs enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498416 (https://phabricator.wikimedia.org/T216631) [16:17:16] (03PS1) 10Ppchelko: Create node-specific logstash filters for syslog. [puppet] - 10https://gerrit.wikimedia.org/r/498417 (https://phabricator.wikimedia.org/T211125) [16:21:14] (03CR) 10Sbisson: [C: 03+2] Homepage: register mentors page on labs enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498416 (https://phabricator.wikimedia.org/T216631) (owner: 10Sbisson) [16:22:33] (03Merged) 10jenkins-bot: Homepage: register mentors page on labs enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498416 (https://phabricator.wikimedia.org/T216631) (owner: 10Sbisson) [16:23:21] (03PS20) 10CRusnov: Add system timer for running ganeti->netbox sync. 
[puppet] - https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [16:27:29] (PS3) EBernhardson: Switch mjolnir to rsyslog based structured logging [puppet] - https://gerrit.wikimedia.org/r/498232 (https://phabricator.wikimedia.org/T218833) [16:29:41] RECOVERY - Logstash rate of ingestion percent change compared to yesterday on icinga1001 is OK: (C)130 ge (W)110 ge 96.48 https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1&panelId=2&fullscreen [16:29:47] (PS4) EBernhardson: Switch mjolnir to rsyslog based structured logging [puppet] - https://gerrit.wikimedia.org/r/498232 (https://phabricator.wikimedia.org/T218833) [16:32:33] (PS5) EBernhardson: Switch mjolnir to rsyslog based structured logging [puppet] - https://gerrit.wikimedia.org/r/498232 (https://phabricator.wikimedia.org/T218833) [16:32:45] vgutierrez: Hi, are you actively working on https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/498015/ ? it's cherry-picked in beta and doesn't want to merge with master. Can I remove it from there? [16:36:43] that's already been solved and it's applied in production [16:46:39] (PS1) Paladox: Merge branch 'stable-2.15' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/498424 [16:47:03] that includes the fix to see ldap groups under polygerrit [16:48:05] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [16:50:36] (CR) Paladox: [V: +2 C: +2] Merge branch 'stable-2.15' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/498424 (owner: Paladox) [16:51:15] (PS1) Volans: check_icinga: don't page on secondary host alerts [software/external-monitoring] - https://gerrit.wikimedia.org/r/498425 [16:51:55] (PS2) Ppchelko: Create node-specific logstash filters for syslog. 
[puppet] - 10https://gerrit.wikimedia.org/r/498417 (https://phabricator.wikimedia.org/T211125) [16:51:57] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [16:53:39] (03PS1) 10Paladox: Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/498426 [16:54:24] (03CR) 10Ppchelko: "Tested in beta. Seems to work." [puppet] - 10https://gerrit.wikimedia.org/r/498417 (https://phabricator.wikimedia.org/T211125) (owner: 10Ppchelko) [16:59:19] (03PS2) 10Paladox: Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/498426 [16:59:21] (03PS1) 10Paladox: Update image-diff plugin [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/498427 [17:06:33] (03PS21) 10CRusnov: Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [17:08:20] (03PS1) 10Dzahn: openldap/offboard-user: add wikitech user deactivation (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/498429 [17:08:55] (03CR) 10jerkins-bot: [V: 04-1] openldap/offboard-user: add wikitech user deactivation (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/498429 (owner: 10Dzahn) [17:14:03] (03PS1) 10Hashar: gerrit: admins: ops -> gerritadmin [puppet] - 10https://gerrit.wikimedia.org/r/498431 [17:14:56] (03CR) 10CDanis: [C: 03+1] check_icinga: don't page on secondary host alerts (031 comment) [software/external-monitoring] - 10https://gerrit.wikimedia.org/r/498425 (owner: 10Volans) [17:17:02] (03PS2) 10Dzahn: openldap/offboard-user: add wikitech user deactivation (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/498429 [17:17:02] (03CR) 10jerkins-bot: [V: 04-1] openldap/offboard-user: add wikitech user deactivation (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/498429 (owner: 10Dzahn) [17:19:08] (03PS22) 
10CRusnov: Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [17:19:25] (03PS4) 10Jforrester: Added wmgWikibaseEntitySources setting for defining Wikibase "entity sources" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490104 (https://phabricator.wikimedia.org/T214557) (owner: 10WMDE-leszek) [17:19:27] (03PS3) 10Dzahn: openldap/offboard-user: add wikitech user deactivation (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/498429 [17:19:29] (03PS4) 10Jforrester: Added wmgWikibaseRepoLocalEntitySourceName to define the "local" source of Wikibase Repo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490633 (https://phabricator.wikimedia.org/T214557) (owner: 10WMDE-leszek) [17:19:31] (03PS6) 10Jforrester: DNM Define Wikibase "entity sources" on beta commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490108 (https://phabricator.wikimedia.org/T214557) (owner: 10WMDE-leszek) [17:22:22] (03CR) 10Dzahn: [C: 03+1] "more details at https://phabricator.wikimedia.org/T218912#5048420" [puppet] - 10https://gerrit.wikimedia.org/r/498431 (owner: 10Hashar) [17:22:41] PROBLEM - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.005 second response time https://wikitech.wikimedia.org/wiki/Help:Toolforge/Monitoring [17:23:20] (03PS1) 10Jforrester: [BETA] SDC: Stop setting up old-style federation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498432 [17:23:22] (03PS1) 10Jforrester: SDC: Stop setting wgWBRepoSettings['foreignRepositories'] old-style federation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498433 [17:23:24] (03PS1) 10Jforrester: SDC: Stop setting up old-style federation, no longer read [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498434 [17:23:44] (03CR) 
10Paladox: [V: 03+2 C: 03+2] Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/498426 (owner: 10Paladox) [17:24:06] 10Operations, 10Analytics, 10Analytics-Kanban, 10EventBus, and 3 others: eventgate-analytics k8s pods occasionally can't produce to kafka - https://phabricator.wikimedia.org/T218268 (10Ottomata) I tcpdumped while reproducing this on kubestage1001 and the kafka brokers. I could see the looped metadata requ... [17:29:18] !log otto@deploy1001 scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging] [17:29:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:29:37] (03PS4) 10Dzahn: openldap/offboard-user: add wikitech user deactivation (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/498429 [17:33:07] (03CR) 10Paladox: [C: 04-2] "We need to adjust the build to support this update." [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/498427 (owner: 10Paladox) [17:33:30] (03Abandoned) 10Paladox: WIP: Update gerrit to 2.16.5 [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/486711 (owner: 10Paladox) [17:36:57] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [17:38:25] 10Operations, 10Analytics, 10Analytics-Kanban, 10EventBus, and 3 others: eventgate-analytics k8s pods occasionally can't produce to kafka - https://phabricator.wikimedia.org/T218268 (10Ottomata) Interesting fact: kafka-jumbo1003 and kafka-jumbo1006 are the only brokers in this cluster that are not in Row... [17:40:47] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/kubernetes-api [17:45:41] RECOVERY - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.007 second response time https://wikitech.wikimedia.org/wiki/Help:Toolforge/Monitoring [17:48:31] 10Operations, 10CirrusSearch, 10Discovery-Search, 10Patch-For-Review: Deprecation warning on elasticsearch 6 expected [retry_on_conflict] - https://phabricator.wikimedia.org/T218994 (10EBernhardson) Can't we apply the setting directly to elasticsearch cluster settings? We already have this in the cluster... [17:48:57] (03PS23) 10CRusnov: Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [17:49:58] (03CR) 10jerkins-bot: [V: 04-1] Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [17:51:19] (03PS6) 10Paladox: WIP: Update gerrit to 2.16.7 [software/gerrit] (deploy/wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/495012 [17:53:26] (03PS24) 10CRusnov: Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [17:54:07] (03CR) 10jerkins-bot: [V: 04-1] Add system timer for running ganeti->netbox sync. 
[puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [17:55:33] (03PS1) 10Andrew Bogott: ldap replicas: enable alerting [puppet] - 10https://gerrit.wikimedia.org/r/498439 (https://phabricator.wikimedia.org/T46722) [17:59:56] (03PS7) 10Paladox: WIP: Update gerrit to 2.16.7 [software/gerrit] (deploy/wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/495012 [18:04:43] (03CR) 10Andrew Bogott: [C: 03+2] ldap replicas: enable alerting [puppet] - 10https://gerrit.wikimedia.org/r/498439 (https://phabricator.wikimedia.org/T46722) (owner: 10Andrew Bogott) [18:05:19] (03PS25) 10CRusnov: Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [18:05:53] (03PS1) 10Ladsgroup: Set $wmgWikibaseSiteGroup for wikimaniawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498440 (https://phabricator.wikimedia.org/T217730) [18:06:18] (03CR) 10jerkins-bot: [V: 04-1] Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [18:08:43] (03PS26) 10CRusnov: Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [18:09:45] (03CR) 10jerkins-bot: [V: 04-1] Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [18:11:08] (03CR) 10Paladox: [C: 03+1] gerrit: admins: ops -> gerritadmin [puppet] - 10https://gerrit.wikimedia.org/r/498431 (owner: 10Hashar) [18:13:14] (03PS27) 10CRusnov: Add system timer for running ganeti->netbox sync. 
[puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [18:13:36] !log otto@deploy1001 scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging] [18:13:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:13:42] !log removing 5 files for legal compliance [18:13:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:13:51] (03CR) 10jerkins-bot: [V: 04-1] Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [18:15:57] (03CR) 10Cwhite: "One comment inline. Otherwise looks good!" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498214 (https://phabricator.wikimedia.org/T217932) (owner: 10BryanDavis) [18:16:54] !log otto@deploy1001 scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging] [18:16:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:57] !log otto@deploy1001 scap-helm eventgate-analytics cluster staging completed [18:16:57] !log otto@deploy1001 scap-helm eventgate-analytics finished [18:16:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:17:15] (03PS28) 10CRusnov: Add system timer for running ganeti->netbox sync. 
[puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [18:19:32] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:21:38] !log otto@deploy1001 scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging] [18:21:39] !log otto@deploy1001 scap-helm eventgate-analytics cluster staging completed [18:21:39] !log otto@deploy1001 scap-helm eventgate-analytics finished [18:21:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:24] (03PS29) 10CRusnov: Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [18:26:37] (03PS30) 10CRusnov: Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [18:27:28] (03PS1) 10Smalyshev: Enable WBCS on Testcommons too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498442 (https://phabricator.wikimedia.org/T218715) [18:28:02] (03PS1) 10Ayounsi: Logstash: add Icinga notifications parsing [puppet] - 10https://gerrit.wikimedia.org/r/498443 [18:28:13] (03CR) 10jerkins-bot: [V: 04-1] Enable WBCS on Testcommons too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498442 (https://phabricator.wikimedia.org/T218715) (owner: 10Smalyshev) [18:28:42] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:30:04] (03PS31) 10CRusnov: Add system timer for running ganeti->netbox sync. 
[puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [18:30:22] (03PS2) 10Smalyshev: Enable WBCS on Testcommons too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498442 (https://phabricator.wikimedia.org/T218715) [18:31:11] 10Operations, 10Analytics, 10Analytics-Kanban, 10EventBus, and 3 others: eventgate-analytics k8s pods occasionally can't produce to kafka - https://phabricator.wikimedia.org/T218268 (10Ottomata) Ohooohooooo! Could this be related to IPv6? On a [[ https://logstash.wikimedia.org/goto/1a4ba12d778db9ae95d75db... [18:31:47] (03CR) 10Ayounsi: "https://puppet-compiler.wmflabs.org/compiler1001/15298/logstash1007.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/498443 (owner: 10Ayounsi) [18:34:20] (03CR) 10CRusnov: "It was a bit difficult to discover the proper set of ways to get splaying working, but i have done so." [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [18:36:41] (03CR) 10EBernhardson: [C: 03+1] Enable WBCS on Testcommons too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498442 (https://phabricator.wikimedia.org/T218715) (owner: 10Smalyshev) [18:37:53] * Krinkle staging on mwdebug1002 [18:39:11] (03CR) 10jenkins-bot: Homepage: register mentors page on labs enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498416 (https://phabricator.wikimedia.org/T216631) (owner: 10Sbisson) [18:41:15] !log krinkle@deploy1001 Synchronized php-1.33.0-wmf.22/extensions/Collection/: I2c4f5d005fc25c52 / T217835 (duration: 00m 52s) [18:41:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:41:23] T217835: PHP Fatal Error: Argument 1 passed to SpecialCollection::postZip() must be an instance of array, bool given - https://phabricator.wikimedia.org/T217835 [18:41:26] (03CR) 10Jforrester: [C: 03+1] Enable WBCS on Testcommons too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/498442 
(https://phabricator.wikimedia.org/T218715) (owner: 10Smalyshev) [18:41:49] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (10jcrespo) a:03jcrespo [18:43:12] PROBLEM - HHVM rendering on mwdebug1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [18:44:20] RECOVERY - HHVM rendering on mwdebug1002 is OK: HTTP OK: HTTP/1.1 200 OK - 80702 bytes in 0.210 second response time https://wikitech.wikimedia.org/wiki/Application_servers [18:46:39] (03PS1) 10Paladox: Add readonly plugin [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/498448 [18:46:59] (03PS2) 10Paladox: Add readonly plugin [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/498448 [18:48:56] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:50:12] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:53:15] !log otto@deploy1001 scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging] [18:53:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:53:18] !log otto@deploy1001 scap-helm eventgate-analytics cluster staging completed [18:53:18] !log otto@deploy1001 scap-helm eventgate-analytics finished [18:53:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:53:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:00:22] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [19:01:38] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [19:02:59] 10Operations, 10Analytics, 10Analytics-Kanban, 10EventBus, and 3 others: eventgate-analytics k8s pods occasionally can't produce to kafka - https://phabricator.wikimedia.org/T218268 (10Ottomata) From https://github.com/edenhill/librdkafka/wiki/FAQ > librdkafka will use the system resolver to resolve the... [19:11:47] (03CR) 10Herron: [C: 03+1] "Looks good to me! 
One nitpicky thing inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498443 (owner: 10Ayounsi) [19:19:13] (03PS2) 10Ayounsi: Logstash: add Icinga notifications parsing [puppet] - 10https://gerrit.wikimedia.org/r/498443 [19:19:47] (03CR) 10Ayounsi: Logstash: add Icinga notifications parsing (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498443 (owner: 10Ayounsi) [19:22:09] (03PS4) 10Andrew Bogott: puppet-merge: merge to wmcs puppetmasters as well [puppet] - 10https://gerrit.wikimedia.org/r/497069 [19:23:12] (03CR) 10jerkins-bot: [V: 04-1] puppet-merge: merge to wmcs puppetmasters as well [puppet] - 10https://gerrit.wikimedia.org/r/497069 (owner: 10Andrew Bogott) [19:23:13] (03PS1) 10Ottomata: eventgate-analytics - set broker.address.family: v4 to workaround k8s IPv6 issue [deployment-charts] - 10https://gerrit.wikimedia.org/r/498455 (https://phabricator.wikimedia.org/T218268) [19:24:44] (03CR) 10Ppchelko: eventgate-analytics - set broker.address.family: v4 to workaround k8s IPv6 issue (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/498455 (https://phabricator.wikimedia.org/T218268) (owner: 10Ottomata) [19:25:16] (03PS2) 10Ottomata: eventgate-analytics - set broker.address.family: v4 to workaround k8s IPv6 issue [deployment-charts] - 10https://gerrit.wikimedia.org/r/498455 (https://phabricator.wikimedia.org/T218268) [19:25:54] (03CR) 10Ppchelko: [C: 03+1] eventgate-analytics - set broker.address.family: v4 to workaround k8s IPv6 issue [deployment-charts] - 10https://gerrit.wikimedia.org/r/498455 (https://phabricator.wikimedia.org/T218268) (owner: 10Ottomata) [19:27:20] PROBLEM - Disk space on notebook1004 is CRITICAL: DISK CRITICAL - /mnt/hdfs is not accessible: Transport endpoint is not connected [19:31:49] (03CR) 10Herron: [C: 03+1] Logstash: add Icinga notifications parsing [puppet] - 10https://gerrit.wikimedia.org/r/498443 (owner: 10Ayounsi) [19:31:52] (03CR) 10Ottomata: [V: 03+2 C: 03+2] eventgate-analytics - set 
broker.address.family: v4 to workaround k8s IPv6 issue [deployment-charts] - 10https://gerrit.wikimedia.org/r/498455 (https://phabricator.wikimedia.org/T218268) (owner: 10Ottomata) [19:32:33] (03PS5) 10BryanDavis: striker: let uwsgi container and app logs flow to stdout/stderr [puppet] - 10https://gerrit.wikimedia.org/r/498214 (https://phabricator.wikimedia.org/T217932) [19:33:16] (03CR) 10BryanDavis: striker: let uwsgi container and app logs flow to stdout/stderr (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/498214 (https://phabricator.wikimedia.org/T217932) (owner: 10BryanDavis) [19:34:49] (03CR) 10Ayounsi: [C: 04-1] Add report which checks against puppetdb and compares serial numbers (031 comment) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/495267 (https://phabricator.wikimedia.org/T212526) (owner: 10CRusnov) [19:35:48] 10Operations, 10ops-eqiad: Degraded RAID on sodium - https://phabricator.wikimedia.org/T212010 (10RobH) a:05RobH→03Cmjohnson Reply from a Dell SR: > Hello Rob/Chris, > > > > My name is Ivan, Resolution Manager at Dell EMC. I was engaged on the case above regarding the HDD replacement issue you all... 
[19:36:48] !log otto@deploy1001 scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging] [19:36:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:36:50] !log otto@deploy1001 scap-helm eventgate-analytics cluster staging completed [19:36:50] !log otto@deploy1001 scap-helm eventgate-analytics finished [19:36:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:36:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:39:20] !log otto@deploy1001 scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging] [19:39:21] !log otto@deploy1001 scap-helm eventgate-analytics cluster staging completed [19:39:21] !log otto@deploy1001 scap-helm eventgate-analytics finished [19:39:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:39:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:39:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:42:50] (03CR) 10Cwhite: "Thank you for this! Comments inline." 
(034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498232 (https://phabricator.wikimedia.org/T218833) (owner: 10EBernhardson) [19:44:02] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [19:44:11] (03CR) 10Cwhite: [C: 03+1] striker: let uwsgi container and app logs flow to stdout/stderr [puppet] - 10https://gerrit.wikimedia.org/r/498214 (https://phabricator.wikimedia.org/T217932) (owner: 10BryanDavis) [19:44:24] (03PS1) 10Ottomata: eventgate-analytics - Incorrectly indexed 0.0.17, use 0.0.18 instead [deployment-charts] - 10https://gerrit.wikimedia.org/r/498461 [19:44:49] (03CR) 10Ottomata: [V: 03+2 C: 03+2] eventgate-analytics - Incorrectly indexed 0.0.17, use 0.0.18 instead [deployment-charts] - 10https://gerrit.wikimedia.org/r/498461 (owner: 10Ottomata) [19:46:26] !log otto@deploy1001 scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging] [19:46:28] !log otto@deploy1001 scap-helm eventgate-analytics cluster staging completed [19:46:28] !log otto@deploy1001 scap-helm eventgate-analytics finished [19:46:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:46:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:46:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:47:42] shdubsh: Will you deploy that puppet change for Striker logging at some point, or should I track down another willing root to do the +2 and post merge bits? [19:48:08] You may be better able to tell if things are going horribly wrong or not than most others [19:49:36] bd808: Sure, no problem. I'll do final checks and deploy. 
[19:50:00] shdubsh: no rush on my side either, just trying to figure out next steps :) [19:50:26] I got all the app level changes needed for it deployed yesterday [19:52:18] !log otto@deploy1001 scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics --set main_app.extra_kafka_conf={} [namespace: eventgate-analytics, clusters: staging] [19:52:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:52:21] !log otto@deploy1001 scap-helm eventgate-analytics cluster staging completed [19:52:21] !log otto@deploy1001 scap-helm eventgate-analytics finished [19:52:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:52:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:52:24] 10Operations, 10ops-eqiad: Degraded RAID on sodium - https://phabricator.wikimedia.org/T212010 (10RobH) Basically we need the email thread where this swap was approved and done by Dell to demonstrate the 4TB are dell supported and under their warranty. 
[19:52:26] 10Operations, 10cloud-services-team (Kanban): WMCS-related dashboards using Diamond metrics - https://phabricator.wikimedia.org/T210850 (10Bstorm) [19:52:30] 10Operations, 10Toolforge, 10monitoring, 10User-fgiunchedi, 10cloud-services-team (Kanban): Deprecate Diamond collectors in Tool Labs / Tool Forge - https://phabricator.wikimedia.org/T210991 (10Bstorm) [19:52:51] !log otto@deploy1001 scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics --set main_app.extra_kafka_conf= [namespace: eventgate-analytics, clusters: staging] [19:52:52] !log otto@deploy1001 scap-helm eventgate-analytics cluster staging completed [19:52:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:52:52] !log otto@deploy1001 scap-helm eventgate-analytics finished [19:52:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:52:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:55:26] !log otto@deploy1001 scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw] [19:55:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:55:29] !log otto@deploy1001 scap-helm eventgate-analytics cluster codfw completed [19:55:29] !log otto@deploy1001 scap-helm eventgate-analytics finished [19:55:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:55:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:55:42] 10Operations, 10Cloud-Services, 10Datasets-General-or-Unknown, 10User-ArielGlenn, 10cloud-services-team (Kanban): Adjust bandwidth/connection limits, memory settings on labstore1006,7 as appropriate - https://phabricator.wikimedia.org/T191491 (10Bstorm) a:05Bstorm→03None A brief check suggests not ma... 
[19:57:48] !log otto@deploy1001 scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad] [19:57:49] !log otto@deploy1001 scap-helm eventgate-analytics cluster eqiad completed [19:57:49] !log otto@deploy1001 scap-helm eventgate-analytics finished [19:57:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:57:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:57:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:58:32] (03CR) 10CRusnov: "> Patch Set 4: Code-Review-1" (031 comment) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/495267 (https://phabricator.wikimedia.org/T212526) (owner: 10CRusnov) [19:59:24] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [20:02:23] (03PS1) 10Herron: logstash: send varnish syslogs via kafka logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/498467 (https://phabricator.wikimedia.org/T213899) [20:03:39] (03CR) 10jerkins-bot: [V: 04-1] logstash: send varnish syslogs via kafka logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/498467 (https://phabricator.wikimedia.org/T213899) (owner: 10Herron) [20:04:52] !log otto@deploy1001 scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad] [20:04:53] !log otto@deploy1001 scap-helm eventgate-analytics cluster eqiad completed [20:04:53] !log otto@deploy1001 scap-helm eventgate-analytics finished [20:04:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:04:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:04:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:06:13] (03PS2) 
10Herron: logstash: send varnish syslogs via kafka logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/498467 (https://phabricator.wikimedia.org/T213899) [20:07:40] (03PS5) 10Andrew Bogott: puppet-merge: merge to wmcs puppetmasters as well [puppet] - 10https://gerrit.wikimedia.org/r/497069 [20:07:42] (03PS1) 10Andrew Bogott: git-sync-upstream: support a mode where only the /labs/private repo is updated [puppet] - 10https://gerrit.wikimedia.org/r/498473 [20:09:02] (03CR) 10jerkins-bot: [V: 04-1] puppet-merge: merge to wmcs puppetmasters as well [puppet] - 10https://gerrit.wikimedia.org/r/497069 (owner: 10Andrew Bogott) [20:10:39] (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler1002/15301/cp1080.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/498467 (https://phabricator.wikimedia.org/T213899) (owner: 10Herron) [20:14:44] (03PS2) 10Andrew Bogott: git-sync-upstream: support a mode where only the /labs/private repo is updated [puppet] - 10https://gerrit.wikimedia.org/r/498473 [20:14:45] !log otto@deploy1001 scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad] [20:14:46] (03PS6) 10Andrew Bogott: puppet-merge: merge to wmcs puppetmasters as well [puppet] - 10https://gerrit.wikimedia.org/r/497069 [20:14:46] !log otto@deploy1001 scap-helm eventgate-analytics cluster eqiad completed [20:14:46] !log otto@deploy1001 scap-helm eventgate-analytics finished [20:14:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:14:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:14:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:17:18] 10Operations, 10WMF-NDA-Requests: Volunteer NDA for Alex Monk - https://phabricator.wikimedia.org/T218448 (10Andrew) fyi @jcrespo, access levels (in particular 'cloud-wide root') are defined in the policy document 
here: https://wikitech.wikimedia.org/wiki/Help:Access_policies [20:17:25] (03CR) 10jerkins-bot: [V: 04-1] puppet-merge: merge to wmcs puppetmasters as well [puppet] - 10https://gerrit.wikimedia.org/r/497069 (owner: 10Andrew Bogott) [20:22:11] (03PS3) 10Andrew Bogott: git-sync-upstream: support a mode where only the /labs/private repo is updated [puppet] - 10https://gerrit.wikimedia.org/r/498473 [20:22:13] (03PS7) 10Andrew Bogott: puppet-merge: merge to wmcs puppetmasters as well [puppet] - 10https://gerrit.wikimedia.org/r/497069 [20:24:31] (03CR) 10jerkins-bot: [V: 04-1] puppet-merge: merge to wmcs puppetmasters as well [puppet] - 10https://gerrit.wikimedia.org/r/497069 (owner: 10Andrew Bogott) [20:25:19] 10Operations, 10Gerrit, 10Release-Engineering-Team, 10serviceops: Deploy multi-site plugin to cobalt and gerrit2001 - https://phabricator.wikimedia.org/T217174 (10Paladox) [20:26:42] (03CR) 10Andrew Bogott: [C: 03+2] git-sync-upstream: support a mode where only the /labs/private repo is updated [puppet] - 10https://gerrit.wikimedia.org/r/498473 (owner: 10Andrew Bogott) [20:44:33] (03PS6) 10EBernhardson: Switch mjolnir to rsyslog based structured logging [puppet] - 10https://gerrit.wikimedia.org/r/498232 (https://phabricator.wikimedia.org/T218833) [20:57:22] (03PS8) 10Zoranzoki21: Enable VisualEditor at fiwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446583 (https://phabricator.wikimedia.org/T192135) [20:59:24] (03CR) 10Zoranzoki21: "Can I get this -2 removed as it is approved per comment in task?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/446583 (https://phabricator.wikimedia.org/T192135) (owner: 10Zoranzoki21) [21:07:37] (03CR) 10Zoranzoki21: "> Removed Code-Review-2 by Jforrester " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446583 (https://phabricator.wikimedia.org/T192135) (owner: 10Zoranzoki21) [21:12:16] 10Operations, 10Icinga, 10monitoring: Icinga passive checks go awol and downtime stops working - https://phabricator.wikimedia.org/T196336 (10CDanis) p:05High→03Normal [21:13:38] PROBLEM - puppet last run on wdqs1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:16:46] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1004 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [21:30:43] (03PS6) 10Dmaza: Enforce 8 char password length requirements for non-privileged users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496202 (https://phabricator.wikimedia.org/T211622) [21:32:55] (03CR) 10Ayounsi: [C: 04-1] "> Patch Set 4:" (031 comment) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/495267 (https://phabricator.wikimedia.org/T212526) (owner: 10CRusnov) [21:35:51] (03CR) 10Cwhite: "Looks good! Comments inline are minor." 
(032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/498386 (https://phabricator.wikimedia.org/T126989) (owner: 10Filippo Giunchedi) [21:39:58] RECOVERY - puppet last run on wdqs1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:40:25] (03CR) 10Cwhite: [C: 03+2] "PCC looks good: https://puppet-compiler.wmflabs.org/compiler1002/15305/" [puppet] - 10https://gerrit.wikimedia.org/r/498214 (https://phabricator.wikimedia.org/T217932) (owner: 10BryanDavis) [21:40:55] (03PS6) 10Cwhite: striker: let uwsgi container and app logs flow to stdout/stderr [puppet] - 10https://gerrit.wikimedia.org/r/498214 (https://phabricator.wikimedia.org/T217932) (owner: 10BryanDavis) [21:44:38] (03CR) 10Paladox: Gerrit: Support switching ldap servers (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/494811 (owner: 10Paladox) [21:44:48] (03PS6) 10Paladox: Gerrit: Support switching ldap servers [puppet] - 10https://gerrit.wikimedia.org/r/494811 [22:04:09] (03CR) 10Cwhite: [C: 03+1] "We talked about the "level" field in the other CS, but this CS looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/498232 (https://phabricator.wikimedia.org/T218833) (owner: 10EBernhardson) [22:04:53] (03CR) 10Cwhite: [C: 03+1] logstash: send varnish syslogs via kafka logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/498467 (https://phabricator.wikimedia.org/T213899) (owner: 10Herron) [22:12:56] !log Restarted uwsgi-striker on labweb1001 [22:12:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:13:35] !log Restarted uwsgi-striker on labweb1002 [22:13:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:19:52] shdubsh: yuck. I just figured out that service::uwsgi puts in all the logging I just ripped out of striker::uwsgi. 
So logs are not flowing as hoped yet :/ [22:20:46] :( [22:21:51] netbox, ores, debmonitor, puppetboard, and striker use that common define [22:22:01] * bd808 looks to see what surgery can be done [22:22:06] (03PS3) 10MarcoAurelio: contint: change `/r/p/` to `/r/` for gerrit links [puppet] - 10https://gerrit.wikimedia.org/r/498057 (https://phabricator.wikimedia.org/T218844) [22:43:18] PROBLEM - puppet last run on ms-be1045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:09:44] RECOVERY - puppet last run on ms-be1045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:15:35] (03CR) 10Hashar: [C: 03+1] contint: change `/r/p/` to `/r/` for gerrit links [puppet] - 10https://gerrit.wikimedia.org/r/498057 (https://phabricator.wikimedia.org/T218844) (owner: 10MarcoAurelio) [23:17:34] (03PS1) 10BryanDavis: service::uwsgi: Allow instances to disable logging config [puppet] - 10https://gerrit.wikimedia.org/r/498516 (https://phabricator.wikimedia.org/T217932) [23:18:33] 10Operations, 10Continuous-Integration-Infrastructure, 10Packaging, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Upgrade jenkins-debian-glue to v0.20.0 - https://phabricator.wikimedia.org/T212774 (10hashar) 05Open→03Resolved Seems good so far. Thank you very much. I will later look at mig... 
[23:18:46] (03PS2) 10BryanDavis: service::uwsgi: Allow instances to disable logging config [puppet] - 10https://gerrit.wikimedia.org/r/498516 (https://phabricator.wikimedia.org/T217932) [23:21:12] (03PS20) 10Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - 10https://gerrit.wikimedia.org/r/437052 [23:21:40] (03CR) 10jerkins-bot: [V: 04-1] cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - 10https://gerrit.wikimedia.org/r/437052 (owner: 10Alex Monk) [23:23:14] (03PS21) 10Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - 10https://gerrit.wikimedia.org/r/437052 [23:25:55] (03PS22) 10Alex Monk: cumin: Allow Puppet DB backend to be used within Labs projects that use it [puppet] - 10https://gerrit.wikimedia.org/r/437052 [23:28:13] 10Operations, 10Gerrit, 10Release-Engineering-Team: Create Gerrit Administrator right policy - https://phabricator.wikimedia.org/T218686 (10Peachey88) [23:30:34] (03CR) 10BryanDavis: "* https://puppet-compiler.wmflabs.org/compiler1002/15306/labweb1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/498516 (https://phabricator.wikimedia.org/T217932) (owner: 10BryanDavis) [23:39:50] (03PS1) 10Paladox: Make WebSessionManager.Val#getAccountId public [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/498518 [23:40:21] (03CR) 10Paladox: [V: 03+2 C: 03+2] "(Almost merged upstream && already tested that this change builds)" [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/498518 (owner: 10Paladox) [23:45:54] 10Operations, 10ops-codfw, 10decommission, 10Patch-For-Review, 10cloud-services-team (Kanban): decommmision: labtestweb2001.wikimedia.org - https://phabricator.wikimedia.org/T218024 (10bd808) >>! In T218024#5045062, @aborrero wrote: > * labtestwiki seems to be a mediawiki database, for a testing wikitech... 
[23:59:12] (03PS1) 10Bstorm: sonofgridengine: link hostgroup processing to host processing [puppet] - 10https://gerrit.wikimedia.org/r/498519 (https://phabricator.wikimedia.org/T216151)