[00:06:23] (PS1) Cwhite: add bastion cluster definition [puppet] - https://gerrit.wikimedia.org/r/478372 (https://phabricator.wikimedia.org/T210486)
[00:06:29] (PS1) Dzahn: admins: add new group for proton admins [puppet] - https://gerrit.wikimedia.org/r/478373 (https://phabricator.wikimedia.org/T211382)
[00:09:13] Operations, Proton, SRE-Access-Requests, Patch-For-Review: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (Dzahn) >>! In T211382#4806114, @mobrovac wrote: > .. we currently don't have an admin group for proton (ping @akosiaris...
[00:10:06] (CR) Dzahn: "ah, getting away from everything being "misc", that's cool!" [puppet] - https://gerrit.wikimedia.org/r/478372 (https://phabricator.wikimedia.org/T210486) (owner: Cwhite)
[00:13:38] Operations, Commons, MediaWiki-File-management, MediaWiki-Maintenance-scripts, and 3 others: cronspam cleanup: Cron /usr/local/bin/foreachwiki maintenance/cleanupUploadStash.php > /dev/null - https://phabricator.wikimedia.org/T150375 (Dzahn) >>! In T150375#4803529, @jijiki wrot...
[00:14:36] (PS1) Smalyshev: Make dumps dir tagged as in-wdqs-data-dir [puppet] - https://gerrit.wikimedia.org/r/478374 (https://phabricator.wikimedia.org/T211462)
[00:54:43] (CR) Krinkle: Class wrapper for ProductionServices.php etc. (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/477956 (owner: Tim Starling)
[01:02:18] (CR) Krinkle: Class wrapper for ProductionServices.php etc. (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/477956 (owner: Tim Starling)
[01:08:29] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:08:41] RECOVERY - etcd request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:11:12] (CR) Krinkle: Excimer and Tideways support (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/478137 (owner: Tim Starling)
[01:22:16] (PS1) BryanDavis: wmcs: Add a cli script for managing dynamicproxy entries [puppet] - https://gerrit.wikimedia.org/r/478377 (https://phabricator.wikimedia.org/T211367)
[01:39:35] (CR) Volans: wmcs: Add a cli script for managing dynamicproxy entries (1 comment) [puppet] - https://gerrit.wikimedia.org/r/478377 (https://phabricator.wikimedia.org/T211367) (owner: BryanDavis)
[01:44:33] win 73
[01:48:39] Operations, Gerrit, Patch-For-Review: Remove port 29418 from cloning process - https://phabricator.wikimedia.org/T37611 (Tgr) I think this hurts the participation of enterprise MediaWiki developers as corporate firewalls typically disallow unknown ports (e.g. the NASA people complained recently about...
[02:11:06] Operations, Gerrit, Patch-For-Review: Remove port 29418 from cloning process - https://phabricator.wikimedia.org/T37611 (Krinkle) declined>Open Re-opening on the basis that since 2015 the aforementioned "Phabricator plans" no longer apply, and that our kernels are now one major version ahead...
[03:35:57] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 858.69 seconds
[03:50:47] (PS1) CDanis: cdanis: more fun with dotfiles [puppet] - https://gerrit.wikimedia.org/r/478387
[03:51:19] (CR) CDanis: [C: 2] cdanis: more fun with dotfiles [puppet] - https://gerrit.wikimedia.org/r/478387 (owner: CDanis)
[04:15:56] (PS1) CDanis: cdanis: dotfiles: kludge around compinit sudo warning [puppet] - https://gerrit.wikimedia.org/r/478388
[04:16:19] (CR) CDanis: [C: 2] cdanis: dotfiles: kludge around compinit sudo warning [puppet] - https://gerrit.wikimedia.org/r/478388 (owner: CDanis)
[06:05:25] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 204.00 seconds
[06:18:53] Operations, Gadgets, MediaWiki-Cache, MW-1.33-notes (1.33.0-wmf.6; 2018-11-27), and 4 others: Mcrouter periodically reports soft TKOs for mc[1,2]035 leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (aaron) >>! In T203786#4805938, @elukey wrote: > Today another mediaw...
[06:29:19] PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/apparmor.d/abstractions/ssl_certs]
[06:30:25] PROBLEM - puppet last run on kubestagetcd1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/bash_autologout.sh]
[06:30:55] PROBLEM - puppet last run on analytics1071 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/R/update-library.R]
[06:55:19] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[06:56:25] RECOVERY - puppet last run on kubestagetcd1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:53] RECOVERY - puppet last run on analytics1071 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:27:57] Operations, Developer-Advocacy, Gerrit: Remove port 29418 from cloning process - https://phabricator.wikimedia.org/T37611 (Aklapper)
[11:01:55] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/data/citation/{format}/{query} (Get citation for Darth Vader) timed out before a response was received
[11:03:01] RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy
[11:15:37] (CR) GTirloni: [C: 2] remove directorysize diamond collector [puppet] - https://gerrit.wikimedia.org/r/478368 (https://phabricator.wikimedia.org/T183454) (owner: Cwhite)
[11:15:45] (PS2) GTirloni: remove directorysize diamond collector [puppet] - https://gerrit.wikimedia.org/r/478368 (https://phabricator.wikimedia.org/T183454) (owner: Cwhite)
[13:54:09] !log decommissioning cassandra-c, restbase2002 -- T210843
[13:54:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:14] T210843: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843
[13:58:13] godog: hahah
[13:58:21] you beat me to it! (barely)
[13:59:35] godog: I came here to log exactly the same only to see your message
[14:00:17] urandom: haha fantastic! I thought about it earlier this morning and got to it only now
[14:01:01] godog: I have a script running on 2013 that polls the status output and sends me an email, I awoke this morning to an email that -b had finished
[14:01:33] I admit it though; I made coffee first
[14:01:37] * urandom hangs his head
[14:02:27] lol
[14:05:20] godog: these samsung-equipped nodes are slow to stream :/
[14:08:49] yeah no kidding, I was trying to gauge how long per decommission now
[14:09:59] seems like 6-8 hours
[14:11:08] so another 4 days minimum
[14:11:30] urandom: gotta go, ttyl!
[14:11:47] godog: ttyl!
[15:43:58] (PS1) Volans: validator: complete refactor of the validation [dns] - https://gerrit.wikimedia.org/r/478416 (https://phabricator.wikimedia.org/T182028)
[15:49:09] (CR) Volans: "I've found a bit of time to improve the current status, see the commit message for details." [dns] - https://gerrit.wikimedia.org/r/478416 (https://phabricator.wikimedia.org/T182028) (owner: Volans)
[16:14:59] (CR) Urbanecm: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/477979 (https://phabricator.wikimedia.org/T211312) (owner: MarcoAurelio)
[16:36:28] Operations, Mail: More restrictive DMARC policy for the wikimedia.org domain - https://phabricator.wikimedia.org/T211404 (putnik) I received a response from Mail.Ru, and as a result, the first priority is T211477.
[16:37:27] volans: nice :)
[16:37:41] cdanis: thx :)
[16:38:56] haha another one here
[16:39:16] enjoy your weekend guys :)
[17:01:37] (PS2) Volans: validator: complete refactor of the validation [dns] - https://gerrit.wikimedia.org/r/478416 (https://phabricator.wikimedia.org/T182028)
[18:51:04] Operations, ops-codfw, DBA, decommission: Decommission parsercache hosts: pc2004 pc2005 pc2006 (Dec 2018 lease return) - https://phabricator.wikimedia.org/T209858 (Papaul)
[19:40:19] (CR) Krinkle: [C: 1] config: move wgMFNoindexPages to InitialiseSettings-labs [mediawiki-config] - https://gerrit.wikimedia.org/r/476060 (https://phabricator.wikimedia.org/T206497) (owner: Imarlier)
[20:57:03] Operations, ops-codfw, Core Platform Team, Services (doing), and 2 others: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 (Eevans)
[20:59:22] !log decommissioning cassandra-a, restbase2003 -- T210843
[20:59:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:59:26] T210843: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843
[21:53:59] Operations, Operations-Software-Development, Patch-For-Review: Develop and deploy at least three Netbox reports to assist with data correctness and consistency - https://phabricator.wikimedia.org/T205899 (crusnov)
[21:54:10] Operations, Operations-Software-Development, Patch-For-Review: Develop and deploy at least three Netbox reports to assist with data correctness and consistency - https://phabricator.wikimedia.org/T205899 (crusnov)
[22:01:46] (PS1) Brian Wolff: Don't override builtin rate limits [mediawiki-config] - https://gerrit.wikimedia.org/r/478439 (https://phabricator.wikimedia.org/T209794)
[22:10:04] (Abandoned) Brian Wolff: Don't override builtin rate limits [mediawiki-config] - https://gerrit.wikimedia.org/r/478439 (https://phabricator.wikimedia.org/T209794) (owner: Brian Wolff)
[22:14:43] (PS1) Brian Wolff: Add changeemail rate limit [mediawiki-config] - https://gerrit.wikimedia.org/r/478440 (https://phabricator.wikimedia.org/T209794)
[22:16:10] (CR) Brian Wolff: [C: 2] Add changeemail rate limit [mediawiki-config] - https://gerrit.wikimedia.org/r/478440 (https://phabricator.wikimedia.org/T209794) (owner: Brian Wolff)
[22:17:15] (Merged) jenkins-bot: Add changeemail rate limit [mediawiki-config] - https://gerrit.wikimedia.org/r/478440 (https://phabricator.wikimedia.org/T209794) (owner: Brian Wolff)
[22:21:03] (CR) jenkins-bot: Add changeemail rate limit [mediawiki-config] - https://gerrit.wikimedia.org/r/478440 (https://phabricator.wikimedia.org/T209794) (owner: Brian Wolff)
[22:21:04] !log bawolff@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T209794 (duration: 00m 56s)
[22:21:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:31:32] Operations, monitoring, Availability, Patch-For-Review, Performance-Team (Radar): Perform a statsd and Graphite switch - https://phabricator.wikimedia.org/T206963 (Krinkle)
[22:32:00] Operations, monitoring, Availability, Performance-Team (Radar): Perform a statsd and Graphite switch - https://phabricator.wikimedia.org/T206963 (Krinkle)
[22:32:15] (Restored) Krinkle: [WIP] errorpages: Remove unused hhvm-fatal-error.php [mediawiki-config] - https://gerrit.wikimedia.org/r/412829 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[22:32:23] (Restored) Krinkle: mediawiki/hhvm: Move fatal-error.php to Puppet [puppet] - https://gerrit.wikimedia.org/r/379953 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[22:33:05] Operations, Core Platform Team Backlog (Watching / External), HHVM, Patch-For-Review, and 3 others: Correctly collect logs from php-fpm pools - https://phabricator.wikimedia.org/T211184 (Krinkle)
[23:03:50] (PS1) Krinkle: speed-tests: Remove old content and add a new article copy [mediawiki-config] - https://gerrit.wikimedia.org/r/478447 (https://phabricator.wikimedia.org/T185446)
[23:04:08] (CR) Krinkle: [C: 2] speed-tests: Remove old content and add a new article copy [mediawiki-config] - https://gerrit.wikimedia.org/r/478447 (https://phabricator.wikimedia.org/T185446) (owner: Krinkle)
[23:05:11] (Merged) jenkins-bot: speed-tests: Remove old content and add a new article copy [mediawiki-config] - https://gerrit.wikimedia.org/r/478447 (https://phabricator.wikimedia.org/T185446) (owner: Krinkle)
[23:07:07] !log krinkle@deploy1001 Synchronized docroot/wikipedia.org/speed-tests/: T185446 - I6cf29d598a11 (duration: 00m 47s)
[23:07:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:07:11] T185446: Add static Obama page as reference page for our tests - https://phabricator.wikimedia.org/T185446
[23:12:57] (CR) jenkins-bot: speed-tests: Remove old content and add a new article copy [mediawiki-config] - https://gerrit.wikimedia.org/r/478447 (https://phabricator.wikimedia.org/T185446) (owner: Krinkle)
[23:24:00] (PS9) Krinkle: mediawiki/hhvm: Move fatal-error.php to Puppet [puppet] - https://gerrit.wikimedia.org/r/379953 (https://phabricator.wikimedia.org/T113114)
[23:35:56] Operations, MediaWiki-Logging, Wikimedia-Logstash: Move mediawiki to new logging infrastructure - https://phabricator.wikimedia.org/T211124 (Krinkle)
[23:42:51] PROBLEM - puppet last run on cloudnet1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:52:09] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/data/citation/{format}/{query} (Get citation for Darth Vader) timed out before a response was received
[23:53:17] RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy