[00:11:31] RECOVERY - puppet last run on mw2238 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[00:16:19] (PS12) MaxSem: Script to do the initial data load from OSM for Maps project [puppet] - https://gerrit.wikimedia.org/r/293105 (owner: Gehel)
[00:20:59] (CR) Yurik: Script to do the initial data load from OSM for Maps project (2 comments) [puppet] - https://gerrit.wikimedia.org/r/293105 (owner: Gehel)
[01:03:29] Operations, DBA, Patch-For-Review: reimage or decom db servers on precise - https://phabricator.wikimedia.org/T125028#2403739 (mmodell) @jcrespo: Thanks, sounds good to me! Is the 01:00 AM UTC time slot ok for you? It's evening in my time zone but I know that's super late for europe and I'm not sure...
[01:19:46] (PS3) BBlack: stream.wm.o: move to cache_misc in DNS [dns] - https://gerrit.wikimedia.org/r/295385 (https://phabricator.wikimedia.org/T134871)
[01:21:04] (CR) BBlack: [C: 2] stream.wm.o: move to cache_misc in DNS [dns] - https://gerrit.wikimedia.org/r/295385 (https://phabricator.wikimedia.org/T134871) (owner: BBlack)
[01:22:48] !log stream.wikimedia.org (RCStream) DNS moved to cache_misc termination. If anyone reports bugs with rcstream services, revert https://gerrit.wikimedia.org/r/295385
[02:10:40] Operations, Wikimedia-SVG-rendering, Patch-For-Review: Install Amiri font (arabic) for svg - https://phabricator.wikimedia.org/T135347#2403743 (Uwe_a) Greetings, the one called "with-amiri.png" looks right, and is using a naskh typeface/font (Amiri as far as i can tell) , the other one looks exactly...
[02:38:07] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.6) (duration: 17m 08s)
[02:38:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:12:51] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.7) (duration: 17m 24s)
[03:12:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:19:55] !log l10nupdate@tin ResourceLoader cache refresh completed at Fri Jun 24 03:19:55 UTC 2016 (duration 7m 4s)
[03:20:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:54:32] !log data copy for labmon1001 verified complete with proper permissions, re-enabling and running puppet to start back up services
[03:54:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:54:54] RECOVERY - carbon-cache@d service on labmon1001 is OK: OK - carbon-cache@d is active
[03:55:04] RECOVERY - carbon-cache@a service on labmon1001 is OK: OK - carbon-cache@a is active
[03:55:14] RECOVERY - carbon-cache@c service on labmon1001 is OK: OK - carbon-cache@c is active
[03:55:43] RECOVERY - carbon-cache@b service on labmon1001 is OK: OK - carbon-cache@b is active
[03:55:53] RECOVERY - carbon-cache@f service on labmon1001 is OK: OK - carbon-cache@f is active
[03:56:04] RECOVERY - carbon-cache@e service on labmon1001 is OK: OK - carbon-cache@e is active
[03:56:12] bleh, yet graphite/apache is unhappy.
[03:56:43] RECOVERY - carbon-cache@h service on labmon1001 is OK: OK - carbon-cache@h is active
[03:57:03] RECOVERY - carbon-cache@g service on labmon1001 is OK: OK - carbon-cache@g is active
[03:57:54] bleh, must be permissions
[04:26:02] Operations, Wikimedia-SVG-rendering, Patch-For-Review: Install Amiri font (arabic) for svg - https://phabricator.wikimedia.org/T135347#2403794 (MoritzMuehlenhoff) Hi, then we're indeed all covered it seems. I'm not sure if there's a way to trigger a rerasterisation of an existing image, I'll try to f...
[05:05:21] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 24 probes of 231 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[05:11:31] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 12 probes of 231 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[05:12:54] Operations, MediaWiki-extensions-UniversalLanguageSelector, Wikimedia-SVG-rendering, I18n: MB Lateefi Fonts for Sindhi Wikipedia. - https://phabricator.wikimedia.org/T138136#2403797 (mehtab.ahmed) {F4196942} This file has been sent by author. Check if it's according to needs.
[05:33:38] (PS1) Muehlenhoff: etcd: Use PRODUCTION_NETWORKS in ferm rules [puppet] - https://gerrit.wikimedia.org/r/295778
[06:31:15] PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:24] PROBLEM - puppet last run on ms-be2026 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:14] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:44] PROBLEM - puppet last run on mw1276 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:35] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:04] (CR) Elukey: [C: 1] "Mr Giunchedi is a wise man, I rechecked and restbase1010 works fine. I also checked the patch with aqs1001 that is a host still running wi" [puppet] - https://gerrit.wikimedia.org/r/295123 (https://phabricator.wikimedia.org/T137422) (owner: Nicko)
[06:47:02] (PS1) Muehlenhoff: url_downloader: Use PRODUCTION_NETWORKS in ferm rules [puppet] - https://gerrit.wikimedia.org/r/295781
[06:55:55] RECOVERY - puppet last run on ms-be2026 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[06:56:01] (PS2) Elukey: Restore mc1007 memcached growth factor to 1.05 as the rest of the cluster. [puppet] - https://gerrit.wikimedia.org/r/295702 (https://phabricator.wikimedia.org/T129963)
[06:56:36] RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:57:05] RECOVERY - puppet last run on mw1276 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:55] RECOVERY - puppet last run on restbase2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:56] disabled puppet on mc1* hosts for https://gerrit.wikimedia.org/r/#/c/295702/, not really needed but better super safe :P
[06:58:04] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:42] (CR) Elukey: [C: 2] Restore mc1007 memcached growth factor to 1.05 as the rest of the cluster. [puppet] - https://gerrit.wikimedia.org/r/295702 (https://phabricator.wikimedia.org/T129963) (owner: Elukey)
[07:10:21] !log memcached on mc1007 restarted with growth factor 1.05 (T129963)
[07:10:22] T129963: Update memcached package and configuration options - https://phabricator.wikimedia.org/T129963
[07:10:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:15:03] all good, only mc1007 restarted as expected
[07:27:30] (PS1) Muehlenhoff: saltmaster/production: Use PRODUCTION_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/295782
[07:30:34] (PS1) Ema: tlsproxy: enable TCP Fast Open [puppet] - https://gerrit.wikimedia.org/r/295783 (https://phabricator.wikimedia.org/T108827)
[07:33:18] (PS1) Muehlenhoff: install_server: Use PRODUCTION_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/295784
[07:37:01] (PS2) Ema: tlsproxy: enable TCP Fast Open [puppet] - https://gerrit.wikimedia.org/r/295783 (https://phabricator.wikimedia.org/T108827)
[07:41:26] (CR) Ema: [C: 2 V: 2] tlsproxy: enable TCP Fast Open [puppet] - https://gerrit.wikimedia.org/r/295783 (https://phabricator.wikimedia.org/T108827) (owner: Ema)
[07:42:25] PROBLEM - puppet last run on cp3031 is CRITICAL: CRITICAL: Puppet last ran 15 hours ago
[07:43:04] PROBLEM - puppet last run on cp3043 is CRITICAL: CRITICAL: Puppet last ran 15 hours ago
[07:43:05] PROBLEM - puppet last run on cp3030 is CRITICAL: CRITICAL: Puppet last ran 15 hours ago
[07:43:44] PROBLEM - puppet last run on cp3042 is CRITICAL: CRITICAL: Puppet last ran 15 hours ago
[07:43:44] PROBLEM - puppet last run on cp3040 is CRITICAL: CRITICAL: Puppet last ran 15 hours ago
[07:43:55] PROBLEM - puppet last run on cp3041 is CRITICAL: CRITICAL: Puppet last ran 15 hours ago
[07:44:14] PROBLEM - puppet last run on cp3032 is CRITICAL: CRITICAL: Puppet last ran 15 hours ago
[07:45:40] sorry for the puppetagent icinga spam ^
[07:46:03] RECOVERY - puppet last run on cp3040 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:46:54] RECOVERY - puppet last run on cp3031 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[07:47:34] RECOVERY - puppet last run on cp3043 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:47:34] RECOVERY - puppet last run on cp3030 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:48:14] RECOVERY - puppet last run on cp3042 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:48:44] RECOVERY - puppet last run on cp3032 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:50:43] RECOVERY - puppet last run on cp3041 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:54:02] (PS1) Muehlenhoff: backup: Use PRODUCTION_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/295786
[07:57:54] (PS1) Muehlenhoff: kafka/analytics: Use PRODUCTION_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/295787
[08:09:23] (CR) Nicko: "Looking at the other PR, please note that you're now free to change the list of people who will receive the alerts (this is not part of th" [puppet] - https://gerrit.wikimedia.org/r/295123 (https://phabricator.wikimedia.org/T137422) (owner: Nicko)
[08:10:05] ACKNOWLEDGEMENT - etherpad_lite_process_running on etherpad1001 is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/bin/node /usr/share/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js Jcrespo https://phabricator.wikimedia.org/T138516 2 processes temporarily up for recovery - The acknowledgement expires at: 2016-07-04 09:06:20.
[08:13:11] (CR) Elukey: "Added Joal just to double check and be super safe. We use Camus to pull logs from kafka to HDFS. It is running on analytics1027, not sure " [puppet] - https://gerrit.wikimedia.org/r/295787 (owner: Muehlenhoff)
[08:14:48] elukey: regarding --^, not sure how it could affect us
[08:15:02] elukey: Is annalytics subnet in production ?
[08:16:04] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: puppet fail
[08:16:33] (PS3) Gehel: Notify TileratorUI on new expiry files [puppet] - https://gerrit.wikimedia.org/r/295450 (https://phabricator.wikimedia.org/T108459) (owner: Yurik)
[08:17:06] (CR) Gehel: Notify TileratorUI on new expiry files (3 comments) [puppet] - https://gerrit.wikimedia.org/r/295450 (https://phabricator.wikimedia.org/T108459) (owner: Yurik)
[08:24:08] Operations, DBA, Wikimedia-Etherpad, Patch-For-Review, User-notice: etherpad database issues - https://phabricator.wikimedia.org/T138516#2402605 (Nemo_bis) Thanks a lot for recovering the data, you're saving me some bitter teardrops.
[08:24:11] (CR) Gehel: [C: -1] Prepare scap3 deployment for WDQS (1 comment) [puppet] - https://gerrit.wikimedia.org/r/295437 (https://phabricator.wikimedia.org/T129144) (owner: Smalyshev)
[08:25:03] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:29:09] !log uploaded nodejs 4.4.6 for jessie-wikimedia to carbon
[08:29:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:31:58] Operations, DBA, Wikimedia-Etherpad: Correct m1 backups, that were being done supposing m1-master was still a slave - https://phabricator.wikimedia.org/T138559#2403895 (jcrespo)
[08:35:43] (CR) Gehel: Prepare scap3 deployment for WDQS (2 comments) [puppet] - https://gerrit.wikimedia.org/r/295437 (https://phabricator.wikimedia.org/T129144) (owner: Smalyshev)
[08:41:09] (CR) Smalyshev: Prepare scap3 deployment for WDQS (2 comments) [puppet] - https://gerrit.wikimedia.org/r/295437 (https://phabricator.wikimedia.org/T129144) (owner: Smalyshev)
[08:48:44] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:50:54] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy
[08:51:08] (PS1) Yuvipanda: Revert "graphite: Use mod_proxy for proxying" [puppet] - https://gerrit.wikimedia.org/r/295792 (https://phabricator.wikimedia.org/T138541)
[08:51:36] (CR) jenkins-bot: [V: -1] Revert "graphite: Use mod_proxy for proxying" [puppet] - https://gerrit.wikimedia.org/r/295792 (https://phabricator.wikimedia.org/T138541) (owner: Yuvipanda)
[08:54:32] (PS1) Yuvipanda: Revert "graphite: Use mod_proxy for proxying" [puppet] - https://gerrit.wikimedia.org/r/295793 (https://phabricator.wikimedia.org/T138541)
[08:57:44] Operations: Updates various services to nodejs 4.4.6 - https://phabricator.wikimedia.org/T138561#2403928 (MoritzMuehlenhoff)
[08:57:59] Operations, Services: Updates various services to nodejs 4.4.6 - https://phabricator.wikimedia.org/T138561#2403940 (MoritzMuehlenhoff)
[08:58:12] (Abandoned) Yuvipanda: Revert "graphite: Use mod_proxy for proxying" [puppet] - https://gerrit.wikimedia.org/r/295792 (https://phabricator.wikimedia.org/T138541) (owner: Yuvipanda)
[08:58:20] (PS2) Yuvipanda: Revert "graphite: Use mod_proxy for proxying" [puppet] - https://gerrit.wikimedia.org/r/295793 (https://phabricator.wikimedia.org/T138541)
[08:58:27] (CR) Yuvipanda: [C: 2 V: 2] Revert "graphite: Use mod_proxy for proxying" [puppet] - https://gerrit.wikimedia.org/r/295793 (https://phabricator.wikimedia.org/T138541) (owner: Yuvipanda)
[09:01:27] mhh yuvipanda not on irc? ^ is going to bounce graphite in production too
[09:01:34] yeah
[09:01:36] am looking at it
[09:01:41] as in, ran puppet and verifying
[09:01:42] etc
[09:01:49] sorry, forgot I'm in europe and you'd be around too
[09:02:01] ah, I see you as Guest41061 not yuvipanda
[09:02:07] oh
[09:02:09] really?
[09:02:12] augh
[09:02:12] me too
[09:02:33] better?
[09:02:37] yeah
[09:02:40] yeah!
[09:02:50] ran on 1003 seems ok, doing on 1001 now
[09:03:05] well it's running just now
[09:03:07] yuvipanda: ok thanks, let me know if you run into problems
[09:04:00] godog yup, will do. It seems to be all back just now
[09:04:48] Blocked-on-Operations, Operations, Graphite: "unexpected error" on graphite-web - https://phabricator.wikimedia.org/T138541#2403970 (yuvipanda) I've reverted the mod_proxy switch just now, and everything is back to normal. I'll re-introduce it later when I'm not at Wikimania. Apologies for the disrup...
[09:05:01] indeed, lgtm!
[09:06:31] (PS5) Elukey: Include a cassandra::instance::monitoring class [puppet] - https://gerrit.wikimedia.org/r/295123 (https://phabricator.wikimedia.org/T137422) (owner: Nicko)
[09:07:52] (PS1) Ema: tlsproxy: only enable TFO on default_server [puppet] - https://gerrit.wikimedia.org/r/295810 (https://phabricator.wikimedia.org/T108827)
[09:08:43] (CR) Elukey: [C: 2] Include a cassandra::instance::monitoring class [puppet] - https://gerrit.wikimedia.org/r/295123 (https://phabricator.wikimedia.org/T137422) (owner: Nicko)
[09:09:03] RECOVERY - graphite.wmflabs.org on labmon1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1701 bytes in 0.020 second response time
[09:09:51] !log scb100x stopping puppet to stop change-prop and clear the queue
[09:09:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:11:08] Operations, DBA, Patch-For-Review: reimage or decom db servers on precise - https://phabricator.wikimedia.org/T125028#2403996 (jcrespo) @mmodel please let's use the more specific T138460 for coordinating this, so you do not get spamed with other server's activity.
[09:12:11] PROBLEM - Disk space on notebook1001 is CRITICAL: Connection refused by host
[09:12:14] godog got a few minutes to track down labmon mystery? It's getting new data but doesn't seem to recognize old data
[09:12:23] PROBLEM - MegaRAID on notebook1001 is CRITICAL: Connection refused by host
[09:12:51] PROBLEM - configured eth on notebook1001 is CRITICAL: Connection refused by host
[09:13:01] PROBLEM - puppet last run on notebook1001 is CRITICAL: Connection refused by host
[09:13:21] PROBLEM - dhclient process on notebook1001 is CRITICAL: Connection refused by host
[09:13:22] PROBLEM - salt-minion processes on notebook1001 is CRITICAL: Connection refused by host
[09:13:32] PROBLEM - Check size of conntrack table on notebook1001 is CRITICAL: Connection refused by host
[09:13:43] (CR) ArielGlenn: [C: 1] saltmaster/production: Use PRODUCTION_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/295782 (owner: Muehlenhoff)
[09:13:52] Operations, DBA, Phabricator: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2404003 (jcrespo) @mmodell Maybe we can do the first part (depool db1048) ASAP, by removing all references to it, or to m3-slave (or just stop doing temporarily those long-running processes t...
[09:14:02] PROBLEM - DPKG on notebook1001 is CRITICAL: Connection refused by host
[09:15:27] yuvipanda: sure, do you have an example?
[09:15:51] (PS2) Ema: tlsproxy: only enable TFO on default_server [puppet] - https://gerrit.wikimedia.org/r/295810 (https://phabricator.wikimedia.org/T108827)
[09:16:01] (CR) Ema: [C: 2 V: 2] tlsproxy: only enable TFO on default_server [puppet] - https://gerrit.wikimedia.org/r/295810 (https://phabricator.wikimedia.org/T108827) (owner: Ema)
[09:16:52] godog (IRC) ah, nvm, it was just out for a bit
[09:16:58] it's all good now!
[09:16:59] brb
[09:17:17] hehe ok
[09:17:32] PROBLEM - changeprop endpoints health on scb1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.16, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[09:18:02] PROBLEM - changeprop endpoints health on scb1002 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.16.21, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[09:18:13] <_joe_> mobrovac: log it :)
[09:18:30] _joe_: logged it ^^^
[09:19:04] <_joe_> sorry didn't see it
[09:19:30] <_joe_> ETOOMUCHNOISE
[09:21:54] (CR) Gehel: [C: -1] Include a cassandra::instance::monitoring class (1 comment) [puppet] - https://gerrit.wikimedia.org/r/295125 (https://phabricator.wikimedia.org/T137422) (owner: Nicko)
[09:23:58] gehel: just merged Nicko's first change for cassandra::monitoring, it looks good for the moment. I manually ran puppet on AQS and a couple of restbase servers, it was a no-op. Icinga is also showing correctly instances checked, that is good too
[09:24:36] very good result from the mini kitchen hackaton :)
[09:24:45] elukey: I saw that (and nicko too). Thanks a lot!
[09:25:45] looking forward to apply the next one, so I'll get AQS alarms on the IRC channel again
[09:26:42] RECOVERY - changeprop endpoints health on scb1001 is OK: All endpoints are healthy
[09:27:25] (CR) Gehel: "Puppet compiler: https://puppet-compiler.wmflabs.org/3185/" [puppet] - https://gerrit.wikimedia.org/r/295125 (https://phabricator.wikimedia.org/T137422) (owner: Nicko)
[09:27:59] elukey: same for me and alerts on Maps!
[09:29:32] RECOVERY - changeprop endpoints health on scb1002 is OK: All endpoints are healthy
[09:29:39] (CR) Alexandros Kosiaris: Configure Kartotherian geoshapes support (1 comment) [puppet] - https://gerrit.wikimedia.org/r/295602 (https://phabricator.wikimedia.org/T134084) (owner: Yurik)
[09:32:02] (CR) Gehel: Configure Kartotherian geoshapes support (1 comment) [puppet] - https://gerrit.wikimedia.org/r/295602 (https://phabricator.wikimedia.org/T134084) (owner: Yurik)
[09:33:17] (CR) Alexandros Kosiaris: Configure Kartotherian geoshapes support (1 comment) [puppet] - https://gerrit.wikimedia.org/r/295602 (https://phabricator.wikimedia.org/T134084) (owner: Yurik)
[09:33:27] Operations, Patch-For-Review: Tracking and Reducing cron-spam from root@ - https://phabricator.wikimedia.org/T132324#2404086 (jcrespo)
[09:34:00] PROBLEM - Host notebook1001 is DOWN: PING CRITICAL - Packet loss = 100%
[09:35:48] (PS1) Hashar: contint: Java 8 on Jessie slaves [puppet] - https://gerrit.wikimedia.org/r/295880 (https://phabricator.wikimedia.org/T138506)
[09:39:44] (CR) Muehlenhoff: "Shouldn't that rather be a conditional, i.e. install openjdk-7 only on Ubuntu? Otherwise we have two versions installed along. Let's rathe" [puppet] - https://gerrit.wikimedia.org/r/295880 (https://phabricator.wikimedia.org/T138506) (owner: Hashar)
[09:40:00] (CR) Hashar: "Cherry picked on CI puppet master" [puppet] - https://gerrit.wikimedia.org/r/295880 (https://phabricator.wikimedia.org/T138506) (owner: Hashar)
[09:48:35] !log upgrade nodejs on restbase test systems (xenon/praseodymium/cerium/restbase-test) and restart restbase on those
[09:48:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:48:41] (PS1) Jcrespo: Repool db1059 with low weight, increase weight of db1061, db1062 [mediawiki-config] - https://gerrit.wikimedia.org/r/295882
[09:50:19] (CR) Jcrespo: [C: 2] Repool db1059 with low weight, increase weight of db1061, db1062 [mediawiki-config] - https://gerrit.wikimedia.org/r/295882 (owner: Jcrespo)
[09:52:23] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1059 with low weight, increase weight of db1061, db1062 (duration: 00m 33s)
[09:52:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:56:46] PROBLEM - mysqld processes on db1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[09:57:28] (CR) Hashar: "We can get both in parallel. Jenkins does not rely on Debian alternative system but set JAVA_HOME to point to the proper version. We thus " [puppet] - https://gerrit.wikimedia.org/r/295880 (https://phabricator.wikimedia.org/T138506) (owner: Hashar)
[09:58:40] jynus: ^ known about db1001 ?
[09:58:58] doesn't look like it is in service
[09:59:28] !log nginx rolling restart to enable TFO on all tlsproxies (T108827)
[09:59:29] T108827: Investigate TCP Fast Open for tlsproxy - https://phabricator.wikimedia.org/T108827
[09:59:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:00:01] godog, it is the failover from yestarday
[10:01:00] jynus: ah, so reimage / expired silence
[10:01:31] I forgot about it when its slave, now master, had issues
[10:02:36] (PS1) Yuvipanda: Revert "labs: Disable diamond labswide (enable for 3 projects)" [puppet] - https://gerrit.wikimedia.org/r/295884 (https://phabricator.wikimedia.org/T137753)
[10:02:42] (PS2) Yuvipanda: Revert "labs: Disable diamond labswide (enable for 3 projects)" [puppet] - https://gerrit.wikimedia.org/r/295884 (https://phabricator.wikimedia.org/T137753)
[10:04:40] ok! thanks
[10:05:10] (PS1) Yuvipanda: Revert "labs: remove shinken monitoring for tools / deployment-prep" [puppet] - https://gerrit.wikimedia.org/r/295886 (https://phabricator.wikimedia.org/T137753)
[10:05:25] (PS2) Yuvipanda: Revert "labs: remove shinken monitoring for tools / deployment-prep" [puppet] - https://gerrit.wikimedia.org/r/295886 (https://phabricator.wikimedia.org/T137753)
[10:05:29] (CR) Yuvipanda: [C: 2] Revert "labs: Disable diamond labswide (enable for 3 projects)" [puppet] - https://gerrit.wikimedia.org/r/295884 (https://phabricator.wikimedia.org/T137753) (owner: Yuvipanda)
[10:06:26] (CR) Hashar: [C: -1] "-1 per Moritz. Have to look a bit more about coexisting Java 7 and Java 8 installations." [puppet] - https://gerrit.wikimedia.org/r/295880 (https://phabricator.wikimedia.org/T138506) (owner: Hashar)
[10:11:00] (PS1) Jcrespo: Perform m1 backups from the master, as there is no slave available [puppet] - https://gerrit.wikimedia.org/r/295889 (https://phabricator.wikimedia.org/T138559)
[10:12:21] (PS3) Yuvipanda: Revert "labs: remove shinken monitoring for tools / deployment-prep" [puppet] - https://gerrit.wikimedia.org/r/295886 (https://phabricator.wikimedia.org/T137753)
[10:12:48] (CR) Yuvipanda: [C: 2 V: 2] Revert "labs: remove shinken monitoring for tools / deployment-prep" [puppet] - https://gerrit.wikimedia.org/r/295886 (https://phabricator.wikimedia.org/T137753) (owner: Yuvipanda)
[10:13:55] (PS1) Jcrespo: Update db1001 to be the new slave, not the master [dns] - https://gerrit.wikimedia.org/r/295890 (https://phabricator.wikimedia.org/T138559)
[10:16:50] (PS2) Jcrespo: Perform m1 backups from the master, as there is no slave available [puppet] - https://gerrit.wikimedia.org/r/295889 (https://phabricator.wikimedia.org/T138559)
[10:17:40] (CR) Jcrespo: [C: 2] Perform m1 backups from the master, as there is no slave available [puppet] - https://gerrit.wikimedia.org/r/295889 (https://phabricator.wikimedia.org/T138559) (owner: Jcrespo)
[10:18:41] !log upgrade nodejs on scb systems in codfw and restart node-based services
[10:18:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:20:17] !log gallium: restarted apache2 , potentially stuck proxy
[10:20:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:29:27] Operations, Traffic, Patch-For-Review: Investigate TCP Fast Open for tlsproxy - https://phabricator.wikimedia.org/T108827#2404243 (ema) We're currently running with TCP Fast Open enabled on all tlsproxies, limiting the number of concurrent pending TFO requests to 150 to mitigate the risk of Resource...
[10:34:56] (CR) Jcrespo: [C: 2] Update db1001 to be the new slave, not the master [dns] - https://gerrit.wikimedia.org/r/295890 (https://phabricator.wikimedia.org/T138559) (owner: Jcrespo)
[10:35:05] (CR) Jcrespo: [V: 2] Update db1001 to be the new slave, not the master [dns] - https://gerrit.wikimedia.org/r/295890 (https://phabricator.wikimedia.org/T138559) (owner: Jcrespo)
[10:46:30] java keeps confusing me
[10:46:40] a class can have a static method that returns ... a class!!
[10:46:51] or maybe it is a class inside a class
[10:48:38] (CR) Muehlenhoff: "Jenkins should be fine per" [puppet] - https://gerrit.wikimedia.org/r/295880 (https://phabricator.wikimedia.org/T138506) (owner: Hashar)
[10:53:44] hashar: what's confusing about that? A class is an object just like anything else...
[10:55:00] !log updated m1-slave dns to be db1001
[10:55:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:55:54] gehel: yeah, I must be confused because PHP does not support that :D
[10:56:09] (CR) Muehlenhoff: "$ANALYTICS_NETWORK is part of $PRODUCTION_NETWORKS, see /etc/ferm/conf.d/00_defs" [puppet] - https://gerrit.wikimedia.org/r/295787 (owner: Muehlenhoff)
[10:56:17] * gehel is confused like hell by PHP...
[10:57:48] Operations, Services: Updates various services to nodejs 4.4.6 - https://phabricator.wikimedia.org/T138561#2404268 (MoritzMuehlenhoff) scb in codfw and the restbase test systems have been upgraded without apparent problems so far.
[10:59:19] (PS1) Filippo Giunchedi: raid: bump hpssacli nrpe timeout [puppet] - https://gerrit.wikimedia.org/r/295892
[10:59:54] Operations, DBA, Wikimedia-Etherpad, Patch-For-Review, User-notice: etherpad database issues - https://phabricator.wikimedia.org/T138516#2404271 (jcrespo)
[10:59:56] Operations, DBA, Wikimedia-Etherpad, Patch-For-Review: Correct m1 backups, that were being done supposing m1-master was still a slave - https://phabricator.wikimedia.org/T138559#2404269 (jcrespo) Open>Resolved a:jcrespo
[11:15:12] Operations, DBA, Wikimedia-Etherpad, Patch-For-Review, User-notice: etherpad database issues - https://phabricator.wikimedia.org/T138516#2404306 (jcrespo) p:High>Low There is no pending recovery actions, just waiting to remove etherpad-restore and check backups continue as usual once...
[11:20:55] Operations, MediaWiki-extensions-UniversalLanguageSelector, Wikimedia-SVG-rendering, I18n: MB Lateefi Fonts for Sindhi Wikipedia. - https://phabricator.wikimedia.org/T138136#2404308 (mehtab.ahmed) Waiting for reply.
[11:23:34] (PS1) Hashar: contint: Android SDK deps on all slaves [puppet] - https://gerrit.wikimedia.org/r/295894 (https://phabricator.wikimedia.org/T138506)
[11:25:06] (CR) jenkins-bot: [V: -1] contint: Android SDK deps on all slaves [puppet] - https://gerrit.wikimedia.org/r/295894 (https://phabricator.wikimedia.org/T138506) (owner: Hashar)
[11:25:28] (PS2) Hashar: contint: Android SDK deps on all slaves [puppet] - https://gerrit.wikimedia.org/r/295894 (https://phabricator.wikimedia.org/T138506)
[11:26:16] Operations, ops-esams, DC-Ops: ms-be3003 sdk (bay 11) broken - https://phabricator.wikimedia.org/T83811#2404327 (fgiunchedi)
[11:30:38] (CR) Hashar: [C: 1] "cherry picked on CI puppetmaster. That aligns Jessie slaves with Trusty ones and let us execute the Android related tools. Ex:" [puppet] - https://gerrit.wikimedia.org/r/295894 (https://phabricator.wikimedia.org/T138506) (owner: Hashar)
[11:34:16] Operations, MediaWiki-extensions-UniversalLanguageSelector, Wikimedia-SVG-rendering, I18n: MB Lateefi Fonts for Sindhi Wikipedia. - https://phabricator.wikimedia.org/T138136#2391310 (MoritzMuehlenhoff) The font is packaged in Debian unstable as https://packages.qa.debian.org/f/fonts-sil-lateef.ht...
[11:38:53] Operations, DBA, Phabricator: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2404370 (jcrespo) Also, could this be part of your goal (maybe not technically, but give them equal importance as gallium)? I own these machines, but m3 only serves the Phabricator service, t...
[11:44:39] (CR) Elukey: [C: 1] "Thanks for the explanation Moritz!" [puppet] - https://gerrit.wikimedia.org/r/295787 (owner: Muehlenhoff)
[11:47:24] Operations, MediaWiki-extensions-UniversalLanguageSelector, Wikimedia-SVG-rendering, I18n: MB Lateefi Fonts for Sindhi Wikipedia. - https://phabricator.wikimedia.org/T138136#2404415 (Dereckson) **About the Debian package** I suspect the Debian package is for this font : http://software.sil.org/l...
[11:59:58] (PS1) Ema: diamond TCP collector: add TFO-related metrics [puppet] - https://gerrit.wikimedia.org/r/295900 (https://phabricator.wikimedia.org/T108827)
[12:01:06] (CR) Alexandros Kosiaris: [C: 1] Postgresql: init database with Puppet [puppet] - https://gerrit.wikimedia.org/r/295589 (https://phabricator.wikimedia.org/T138092) (owner: Gehel)
[12:01:08] (CR) jenkins-bot: [V: -1] diamond TCP collector: add TFO-related metrics [puppet] - https://gerrit.wikimedia.org/r/295900 (https://phabricator.wikimedia.org/T108827) (owner: Ema)
[12:01:24] akosiaris: thanks!
[12:03:01] (PS2) Ema: diamond TCP collector: add TFO-related metrics [puppet] - https://gerrit.wikimedia.org/r/295900 (https://phabricator.wikimedia.org/T108827)
[12:03:12] greg-g: transferring over from #wikimedia-mobile. Krenair might you be able to lend a hand with a deployment re: https://phabricator.wikimedia.org/T138578 ? tgr's at wikimania, so i don't think he's available. anomie starts in about and hour and and jdlrobson several hours later
[12:03:40] Krenair: if you and i can get to the bottom of this and resolve sooner it would be most appreciated. if not, i can continue discussion with anomie and jdlrobson though
[12:03:51] !log rebooting aqs1001.eqiad.wmnet for kernel upgrades
[12:03:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:04:13] uhhh
[12:04:19] !log elukey@palladium conftool action : set/pooled=no; selector: aqs1001.eqiad.wmnet
[12:04:20] the train needs to be rolled back
[12:04:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:04:42] https://phabricator.wikimedia.org/T138537 is not acceptable
[12:05:17] hashar: also pinging you ^^^^^^^
[12:05:18] you around?
[12:05:41] (PS4) Gehel: Postgresql: init database with Puppet [puppet] - https://gerrit.wikimedia.org/r/295589 (https://phabricator.wikimedia.org/T138092)
[12:06:56] yeah
[12:07:09] who knows how to / can rollback the train?
[12:07:22] I
[12:07:43] unlikely I am going to rollback all the wiki on spot given that task seems to only impact the checkuser wiki
[12:08:15] though I fully understand it is most probably extremely annoying at a quick glance
[12:08:15] hashar: the thing i'm typing about is https://phabricator.wikimedia.org/T138578 . i think duploktm is for something else
[12:08:15] (CR) Gehel: [C: 2] Postgresql: init database with Puppet [puppet] - https://gerrit.wikimedia.org/r/295589 (https://phabricator.wikimedia.org/T138092) (owner: Gehel)
[12:08:22] lets get some time to investigate the issue thoroughly and see if we can come with an easy fix
[12:08:32] oh
[12:08:41] was referring to checkuser css issue at https://phabricator.wikimedia.org/T138537
[12:08:42] hashar: it also affects nl.wikisource
[12:09:15] I don't think it's acceptable for any wiki to not have styles like this
[12:10:13] and there is https://phabricator.wikimedia.org/T138579 as well
[12:11:39] that plus the edit save regression I think is grounds to move group1 back to wmf.6
[12:11:40] let me first grab a snack nearby. be back in 4 - 6 minutes
[12:13:57] ok
[12:15:35] back
[12:16:13] https://nl.wikisource.org/wiki/Pagina:Darwin_-_Het_ontstaan_der_soorten_(1860).djvu/450 is quite "nice"
[12:17:24] hashar: postmodern design
[12:17:32] https://nl.wikisource.org/wiki/Pagina:Darwin_-_Het_ontstaan_der_soorten_(1860).djvu/450?debug=true
[12:17:40] renders fine apparently
[12:17:49] that would indicate some ressource loader / cache issue
[12:21:07] yes
[12:21:20] either someone needs to commit to debugging and fixing it or it needs to be reverted imo
[12:21:25] ema: bblack: is one of you familiar with resource loader caching ? https://checkuser.wikimedia.org/wiki/Main_Page lacks some css but passing ?debug=true seems to render just fine
[12:21:45] and I'm guessing the people who would normally do that (ori/Krinkle) aren't around right now, so I'd recommend reverting
[12:22:09] hmm, wmf.7 seems unlucky? i also noticed https://phabricator.wikimedia.org/T138585 about it.
[12:22:26] might have occurred rather recently [12:26:57] (03PS1) 10Alexandros Kosiaris: ganglia: Restrict ganglia aggregation to just production [puppet] - 10https://gerrit.wikimedia.org/r/295902 (https://phabricator.wikimedia.org/T122396) [12:28:58] (03PS1) 10Faidon Liambotis: Drain esams for network maintenance [dns] - 10https://gerrit.wikimedia.org/r/295904 [12:29:18] bblack, ema ^^ [12:29:51] happy friday morning / afternoon! [12:31:01] I dont think it is wmf.7 [12:31:01] (03CR) 10Faidon Liambotis: [C: 032] Drain esams for network maintenance [dns] - 10https://gerrit.wikimedia.org/r/295904 (owner: 10Faidon Liambotis) [12:31:20] filling a new task and shuffling tasks around [12:34:50] !log Random resource loader entries are apparently faulty causing issues with css and/or javascript T138586 [12:34:51] T138586: Resource loader stall cache / breakage - https://phabricator.wikimedia.org/T138586 [12:34:53] ACKNOWLEDGEMENT - HP RAID on ms-be2022 is CRITICAL: CHECK_NRPE: Socket timeout after 20 seconds. Filippo Giunchedi https://gerrit.wikimedia.org/r/#/c/295892/ [12:34:53] ACKNOWLEDGEMENT - HP RAID on ms-be2023 is CRITICAL: CHECK_NRPE: Socket timeout after 20 seconds. Filippo Giunchedi https://gerrit.wikimedia.org/r/#/c/295892/ [12:34:53] ACKNOWLEDGEMENT - HP RAID on ms-be2024 is CRITICAL: CHECK_NRPE: Socket timeout after 20 seconds. Filippo Giunchedi https://gerrit.wikimedia.org/r/#/c/295892/ [12:34:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:35:20] duploktm: dr0ptp4kt: I have filled a new "umbrella" task https://phabricator.wikimedia.org/T138586 [12:36:20] * aude waves [12:36:33] * dr0ptp4kt waves to aude [12:36:38] thx hashar [12:37:09] https://phabricator.wikimedia.org/T138586 and https://phabricator.wikimedia.org/T138585 are the reasons deployment was reverted? [12:37:12] duploktm: thanks for digging into as well. duploktm aren't you at wikimania? 
[12:37:24] (03CR) 10Alexandros Kosiaris: [C: 032] ganglia: Restrict ganglia aggregation to just production [puppet] - 10https://gerrit.wikimedia.org/r/295902 (https://phabricator.wikimedia.org/T122396) (owner: 10Alexandros Kosiaris) [12:37:26] dr0ptp4kt: I am [12:37:38] duploktm: parallel processing [12:37:55] (03CR) 10Faidon Liambotis: [C: 031] "That'll do. We should investigate if we can make it a little faster, 34s is crazy long." [puppet] - 10https://gerrit.wikimedia.org/r/295892 (owner: 10Filippo Giunchedi) [12:38:11] aude: see https://phabricator.wikimedia.org/T138586 and children and grandchildren tasks [12:38:12] i also noticed broken styles on mobile [12:38:23] but might be unrelated [12:38:35] aude: agree with your assessment [12:38:44] e.g. https://m.wikidata.org/wiki/Special:Search [12:38:47] where's the search box [12:39:12] * aude sees the one at top that is everywhere but there should be another one [12:39:15] aude: cf https://phabricator.wikimedia.org/T138578 [12:39:38] also https://en.m.wikipedia.org/wiki/Main_Page is showing the desktop main page [12:39:42] aude: mind adding that special:search thing to it? [12:39:50] dr0ptp4kt: go ahead [12:40:45] aude: the search bar actually works swimmingly for me on https://m.wikidata.org/wiki/Special:Search . it's the only place it's styled correctly with the new search bar, ironically! [12:40:57] aude: well, actually... [12:41:14] maybe i don't understand how it is new [12:41:21] aude: i should expect it would work. the group0 and group1 servers seem fine. it's the group2 servers, particularly the wikipedias, where that gray search chrome isn't looking good [12:41:24] https://m.wikidata.org/wiki/Special:SetSiteLink [12:41:25] e.g. [12:41:40] the input box is unstyled [12:41:51] aude: want to hop on google hangout for a few minutes? [12:41:58] * aude is at wikimania [12:42:14] aude: connection not so fast? 
[12:42:19] i am in a session :) [12:42:24] aude: ah [12:42:27] and probably not fast enough [12:42:35] aude: ok, lemme post a couple screenshots real quick [12:42:39] ok [12:43:01] aude: here's what i see [12:43:57] https://usercontent.irccloud-cdn.com/file/Q6F13cOS/search_works_for_me.png [12:44:04] aude: ^ there's what i see [12:44:16] aude: on wikidata anyway [12:44:29] on the group2 servers it's using the old styling, like so... [12:45:14] https://usercontent.irccloud-cdn.com/file/ycGs7Psp/search_legacy.png [12:45:25] guess i expect to see a second search box below the heading and explanation text [12:45:27] but unfortunately on tap it's mixing styles, like so... [12:45:31] maybe that's not how it is [12:45:55] https://usercontent.irccloud-cdn.com/file/MzyHj7RZ/search_group2_ontap.png [12:45:56] enwiki has no explanation and just has the search box + heading [12:47:18] https://phabricator.wikimedia.org/F4197935 [12:48:17] Wikipedias are on wmf.6 still [12:48:21] yeah [12:48:55] 06Operations, 10Wikimedia-SVG-rendering, 07Upstream: Incorrect text positioning in SVG rasterization (any extreme down scale) (fixed in upstream 2.40.13) - https://phabricator.wikimedia.org/T65703#2404617 (10Perhelion) >>! In T65703#2401510, @MoritzMuehlenhoff wrote: > hyper-old.png is the result with the cu... [12:51:39] aude: with grade c / noscript browsers there is a button, but normally it's suppressed by mfe iirc. or are you saying it used to have a search field in the main body of the page with a more modern rl-capable browser? here's what i see when i turn off js in ff: [12:52:00] all the examples are on wmf.7 apparently [12:52:28] dr0ptp4kt: i don't remember how it was before [12:52:44] (03PS1) 10Elukey: Add Thrift port (9160) to Ferm's whitelist to allow Cassandra bulk loading. 
[puppet] - 10https://gerrit.wikimedia.org/r/295907 [12:52:58] thought it was another case of form styles missing like https://m.wikidata.org/wiki/Special:SetLabelDescriptionAliases [12:53:27] hashar: are you saying the examples of resourceloader problems are on 1.27? on my side when i access en.m.wikipedia.org/wiki/Special:Version it says it's 1.28 [12:53:35] hashar: disregard [12:53:45] ohh [12:53:48] hashar: yes, it's wmf.6 there not wmf.7 [12:53:57] I was looking for a link such as https://en.m.wikipedia.org/wiki/Special:Version :D [12:54:12] yeah, it's 1.28.0-wmf.6 [12:54:30] excellent [12:54:31] PROBLEM - Router interfaces on cr2-knams is CRITICAL: CRITICAL: host 91.198.174.246, interfaces up: 52, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/0/0: down - Core: csw2-esams:xe-0/1/1 (GBLX leg 2) {#14007} [10Gbps DF CWDM C49]BR [12:54:36] that means we dont have to rollback to wmf.6 [12:54:54] but the issue is somewhere else (maybe the l10n update that ran over night or whatever other issue) [12:55:10] isn't it ? [12:55:32] oh I am confused [12:55:36] that is the mobile version [12:56:01] PROBLEM - puppet last run on mw2186 is CRITICAL: CRITICAL: puppet fail [12:56:10] PROBLEM - Router interfaces on mr1-esams is CRITICAL: CRITICAL: host 91.198.174.247, interfaces up: 33, down: 1, dormant: 0, excluded: 0, unused: 0BRge-2/0/0: down - Tilaa OOB swap [1Gbps DF]BR [12:56:51] RECOVERY - Router interfaces on cr2-knams is OK: OK: host 91.198.174.246, interfaces up: 53, down: 0, dormant: 0, excluded: 0, unused: 0 [12:57:37] hashar: perhaps. adding a cachebuster to https://en.m.wikipedia.org/wiki/Special:Version?cb=fooo2 just to be sure, i see it's a varnish cache miss and "in fact" it's 1.28.0-wmf.6. [12:57:58] hashar: i was worried maybe we were getting a bad cached page :) [12:58:09] well https://en.m.wikipedia.org/wiki/Special:Version is fine isn't it ? 
[12:58:15] says 1.28.0-wmf.6 for me on en wikipedia [12:58:32] PROBLEM - Host mr1-esams.oob is DOWN: PING CRITICAL - Packet loss = 100% [12:58:56] yeah we had wmf.7 rollbacked from wikipedia yesterday due to a regression in edit save timing which is tracked in a different task [12:59:00] so hashar was the train stopped before it went out last night, or was the train rolled back for group2? [12:59:10] we did deploy [12:59:12] hashar: nm, you answered right as i sent the question! [12:59:36] and after a couple hour or so rollbacked due to some bad interaction due to apparently AbuseFilter. https://phabricator.wikimedia.org/T138550 [12:59:45] hashar: ok. so it's conceivable the cache became polluted as a consequence. got it [13:00:43] Special:Version is never cached [13:01:19] duploktm: yeah, i know - but i saw the middle server showing a cache pass instead of a miss [13:01:27] duploktm: so i thought...just be sure :) [13:05:17] so basically [13:05:21] I have no clue what is going on really [13:05:36] been looking at various logs and there is nothing standing out [13:05:41] (03CR) 10Elukey: [C: 04-1] "https://puppet-compiler.wmflabs.org/3186/aqs1001.eqiad.wmnet/change.aqs1001.eqiad.wmnet.err" [puppet] - 10https://gerrit.wikimedia.org/r/295907 (owner: 10Elukey) [13:06:57] (03PS2) 10Elukey: Add Thrift port (9160) to Ferm's whitelist to allow Cassandra bulk loading. [puppet] - 10https://gerrit.wikimedia.org/r/295907 [13:08:14] hashar, dr0ptp4kt: Stuff is broken? Do we know any more than just mystery RL-not-working since the automatic l10nupdate? [13:09:10] anomie: I am not even sure the l10n update causes the issue [13:09:13] anomie: i don't know anything more than that. the group 0 and 1 servers are at 1.28-wmf.7 and the group 2 were apparently rolled back to 1.28-wmf.6. do i have that right hashar ? 
[13:09:29] it just shows up a lot of logstash events in the "resource loader" since 3:14am when the l10nupdate ran [13:09:32] hashar: do you know what time the problem started? [13:09:36] nop [13:10:32] anomie: a trivial example is https://checkuser.wikimedia.org/wiki/Main_Page showing now css :/ [13:14:24] https://jigsaw.w3.org/css-validator/validator?uri=https%3A%2F%2Fcheckuser.wikimedia.org%2Fwiki%2FMain_Page&profile=css3&usermedium=all&warning=1&vextwarning=&lang=en [13:14:44] (03CR) 10Muehlenhoff: [C: 04-1] Add Thrift port (9160) to Ferm's whitelist to allow Cassandra bulk loading. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/295907 (owner: 10Elukey) [13:20:02] (03PS3) 10Elukey: Add Thrift port (9160) to Ferm's whitelist to allow Cassandra bulk loading. [puppet] - 10https://gerrit.wikimedia.org/r/295907 [13:20:29] fwiw, other than the special page styles, i haven't seen any reports of css/js problems on wikidata [13:22:18] !log re-enabling puppet on maps1002 (still in pre-configuration state, only default role) [13:22:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:23:19] (03CR) 10Muehlenhoff: [C: 031] Add Thrift port (9160) to Ferm's whitelist to allow Cassandra bulk loading. [puppet] - 10https://gerrit.wikimedia.org/r/295907 (owner: 10Elukey) [13:24:31] (03CR) 10Elukey: [C: 032] Add Thrift port (9160) to Ferm's whitelist to allow Cassandra bulk loading. [puppet] - 10https://gerrit.wikimedia.org/r/295907 (owner: 10Elukey) [13:24:59] RECOVERY - puppet last run on mw2186 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:29:25] hashar: duploktm anomie aude is there any possibility the revert didn't actually update all of the server files to the old version? [13:29:40] that is, the actual mediawiki files [13:30:15] it's a different set of files [13:30:25] and was the i18n job re-run after the revert? [13:30:34] dr0ptp4kt: Not that I can think of. 
The revert should just have pointed the wikis at the different path. [13:30:37] i don't think it's necessary to re-run [13:30:55] * aude runs to next session [13:31:03] anomie: aude...fumbling around in the dark...long time since looking at these scripts [13:42:51] AH progress [13:43:12] if I inject a with solely the skins.vector.styles module it works fine (i.e.: /w/load.php?debug=false&lang=en&modules=skins.vector.styles&only=styles&skin=vector ) [13:44:53] dr0ptp4kt, aude, hashar : It's looking to me like either (a) RL somehow used to fix CSS errors in stuff like Common.css, but isn't doing that now, or (b) RL started doing something different that's making browsers recover differently from syntax errors in code included from Common.css. [13:45:11] 06Operations, 10Wikimedia-SVG-rendering, 13Patch-For-Review: Install Amiri font (arabic) for svg - https://phabricator.wikimedia.org/T135347#2404717 (10MoritzMuehlenhoff) @Uwe_a : You can query a refresh by passing ?action=purge to the URL, e.g. https://commons.wikimedia.org/wiki/File:Sinai_-_Camp_David_Trea... [13:45:59] PROBLEM - Router interfaces on cr2-knams is CRITICAL: CRITICAL: host 91.198.174.246, interfaces up: 47, down: 2, dormant: 0, excluded: 0, unused: 0BRxe-0/0/0: down - Core: csw2-esams:xe-2/1/1 (GBLX leg 1) {#14006} [10Gbps DF CWDM C61]BRxe-0/0/3: down - Core: csw2-esams:xe-3/1/1 [10Gbps DF CWDM C47]BR [13:46:08] that's ^^ expected [13:46:30] anomie: yeah that it is [13:46:40] (03PS3) 10Muehlenhoff: Add Amiri font to the scalers [puppet] - 10https://gerrit.wikimedia.org/r/295498 (https://phabricator.wikimedia.org/T135347) [13:46:40] anomie: If I drop the "site.styles" module it works just fine [13:46:44] For example, https://nl.wikisource.org/wiki/MediaWiki:Common.css has what's supposed to be "/* \nblah blah wikitext\n
 */", except some of them seem to be missing the "/*" and that's making the browser lose some of the styles in the vicinity.
[13:47:25] 	 When I look at checkuserwiki's MediaWiki:Common.css using eval.php, there's a missing "}" in one place.
[13:47:42] 	 I am quoting you on the task
[13:47:48] 	 (03CR) 10Krinkle: "fixme: It seems this change the behaviour for many wikis, including enwiki. It was previously part of default=>true. But is not listed in " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295600 (https://phabricator.wikimedia.org/T138425) (owner: 10Jdlrobson)
[13:48:58] 	 hey Krinkle you see the backscroll?
[13:49:27] 	 so looks like we have a good candidate now
[13:49:30] 	 Or.... Did RL even load Common.css merged with other styles before?
[13:49:39] 	 I have updated the task at https://phabricator.wikimedia.org/T138586
[13:49:42] 	 hashar: anomie i remember one time i messed up my comments in a js file and this very thing happened. seems like forever ago :)
[13:49:55] 	 anomie: I am comparing with enwiki
[13:51:09] 	 I did a diff of enwiki html vs checkuser html
[13:51:22] 	 and that is how I went to try tweaking the stylesheet url
[13:53:04] 	 (03CR) 10Muehlenhoff: [C: 032 V: 032] Add Amiri font to the scalers [puppet] - 10https://gerrit.wikimedia.org/r/295498 (https://phabricator.wikimedia.org/T135347) (owner: 10Muehlenhoff)
[13:53:05] 	 dr0ptp4kt: that is what I like about wmf: managers who have already experienced issues on the live site :-}
[13:53:24] * hashar digs in RL code
[13:54:54] 	 * 93ed259 - resourceloader: Create 'site.styles' module (Wed Jun 15 23:06:50 2016 -0700) 
[13:55:33] 	 hashar, dr0ptp4kt: Yeah, I think that's it. It looks like for whatever reason RL used to include the "site" module CSS as its own thing, so syntax errors there would be localized to just itself. Now that we have site.styles that's being merged with other CSS, syntax errors can bleed into other modules too.
[13:55:37] 	 06Operations, 10Traffic: Backport iproute2 4.x  from debian testing -> our jessie - https://phabricator.wikimedia.org/T138591#2404751 (10BBlack)
[13:57:22] 	 hashar, dr0ptp4kt: Yeah. Tested on mw1017 by reverting just https://gerrit.wikimedia.org/r/#/c/292972/5/includes/OutputPage.php and suddenly styles work again.
[13:57:34] 	 And then break again when I unrevert.
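The effect described above (a syntax error in one stylesheet bleeding into styles merged after it) can be sketched with a toy model. This is only an illustration under stated assumptions: `top_level_selectors` is a hypothetical helper that tracks brace depth, not a real CSS parser and not ResourceLoader code, and the sample selectors are made up. It mimics the missing "}" case: an unclosed block stays open until the end of the input, so any rule concatenated after it never appears as a top-level rule.

```python
def top_level_selectors(css):
    """Return selectors of top-level rules, using brace depth only (toy model)."""
    selectors = []
    depth = 0
    start = 0  # index where the next top-level selector would begin
    for i, ch in enumerate(css):
        if ch == "{":
            if depth == 0:
                selectors.append(css[start:i].strip())
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                start = i + 1
    return selectors


# Hypothetical site stylesheet with a missing closing brace.
site_css = ".site-notice { color: red;\n"
# Hypothetical, well-formed skin stylesheet.
skin_css = "#content { margin: 0; }\n"

# Served as one merged response: the skin rule is swallowed by the open block.
print(top_level_selectors(site_css + skin_css))  # ['.site-notice']

# Served as separate stylesheets: the error stays local to the site CSS.
print(top_level_selectors(site_css) + top_level_selectors(skin_css))  # ['.site-notice', '#content']
```

Real browsers apply more detailed error-recovery rules, but the outcome is similar in spirit: loading the site CSS on its own keeps a broken Common.css from affecting the skin styles.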
[13:58:05] 	 ah
[13:58:18] 	 that is what I love about you anomie: you know how to live hack :}}
[13:58:21] 	 Krinkle: any adverse impacts if touching https://github.com/wikimedia/mediawiki/commit/93ed259cf3e52a4f5e192c233b2df56d88a0e14c ?
[13:58:40] 	 dr0ptp4kt: hashar: It's being fixed now.
[13:58:42] 	 See my latest patch
[13:58:45] 	 deploying now
[13:58:50] 	 \O/
[13:58:50] 	 nice!
[13:59:06] 	 Flow and Math have uncommitted submodule changes on tin
[13:59:11] 	 Tssk!
[13:59:56] 	  hashar duploktm aude anomie Krinkle....now let's see if that mobile thing picks it up, too :)
[14:00:04] 	 if you can later come up with a test to ensure the site is loaded independently, that will be quite nice to have :}
[14:00:16] 	 hashar: pulls the test card :)
[14:00:24] 	 er...draws the test card :)
[14:00:30] 	 good idea
[14:00:35] 	 I love those one-liner fixes
[14:01:10] 	 that is proof that lines of code / commit counts are irrelevant for assessing the productivity / skill of a dev!
[14:01:23] 	 Krinkle: thank you !
[14:01:39] 	 so looks like we (I) will not have to rollback all wikis to wmf.6
[14:02:33] 	 !log scb100x disabled puppet to clear changeprop queues
[14:02:38] 	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:02:39] 	 anomie: thanks a ton for all the help!
[14:02:49] 	 nice work anomie et al
[14:02:59] 	 hashar: what's the roll forward plan for wmf.7?
[14:03:23] 	 I believe we will stick to the current situation until next week
[14:03:31] 	 hashar: okey dokey
[14:03:34] 	 so we get a chance to investigate the regression in edit saving
[14:03:48] 	 which seems to be related to some change in AbuseFilter and is probably harmless
[14:04:13] 	 then I guess it will be fixed today or over the week-end and we can give a try at wmf.7 on monday
[14:04:24] 	 and hopefully get wmf.8  cut on Tuesday as usual
[14:04:53] 	 I took a few traces yesterday around midnight my time, completed my investigation early on this week-end
[14:05:04] 	 hashar: thx
[14:05:12] 	 will sync with rest of #releng to figure out the .plan and I guess greg will do the announce
[14:05:16] 	 https://checkuser.wikimedia.org/wiki/Main_Page?bust now works as expected.
[14:05:33] 	 Krinkle: is the thing sync'd across the cluster?
[14:05:41] 	 yep, almost finished
[14:05:41] 	 !log krinkle@tin Synchronized php-1.28.0-wmf.7/includes/OutputPage.php: T138586 hotfix (duration: 00m 47s)
[14:05:42] 	 T138586: Skin stylesheet no longer unaffected by broken Common.css as of 1.28.0-wmf.7 - https://phabricator.wikimedia.org/T138586
[14:05:43] 	 and done
[14:05:46] 	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:05:52] 	 thx. any lag for rl?
[14:06:00] 	 Krinkle: ^
[14:06:05] 	 or just go check now?
[14:06:14] 	 capturing IRC logs to the task
[14:07:20] 	 PROBLEM - changeprop endpoints health on scb1002 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.16.21, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[14:07:58] 	 dr0ptp4kt: It's an HTML cache
[14:08:02] 	 duploktm: checkuser.wikimedia.org layout should be fixed now
[14:08:06] 	 dr0ptp4kt: So it's subject to Varnish html cache unfortunately.
[14:08:11] 	 PROBLEM - changeprop endpoints health on scb1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.16, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[14:08:13] 	 Krinkle anomie hashar looks like https://en.m.wikipedia.org/wiki/Main_Page?cb=1234234234 (note cachebuster param, also action=purge'd) is still pegged at the old stuff
[14:08:26] 	 so i wonder if that swat patch from last night needs to be reverted?
[14:08:26] 	 That's a separate, unrelated bug.
[14:08:30] 	 Krinkle: yeah
[14:08:32] 	 Yeah, probably. 
[14:08:48] 	 I just left a comment on jdlrobson 's config change that it wasn't a no-op as intended.
[14:08:52] 	 It changes behaviour of many wikis.
[14:08:54] 	 Including enwiki
[14:09:27] * hashar celebrates Krinkle 
[14:09:52] 	 so a bunch of pages will have their layout broken until the HTML cache is purged / expires in varnish
[14:10:30] 	 RECOVERY - nutcracker port on mw2246 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[14:10:47] 	 hashar: "only" 24 hours now.
[14:10:50] 	 RECOVERY - nutcracker process on mw2246 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker
[14:11:04] 	 and less considering that it only affects pages rendered after last night.
[14:11:10] 	 RECOVERY - MD RAID on mw2246 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[14:11:17] 	 we no longer cache for 30 days?
[14:11:26] 	 We dropped to 20d 2 months ago
[14:11:29] 	 RECOVERY - salt-minion processes on mw2246 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:11:30] 	 and since then to 10, 7, 2 and now 1d.
[14:11:39] 	 whaa
[14:11:41] 	 RECOVERY - Check size of conntrack table on mw2246 is OK: OK: nf_conntrack is 0 % full
[14:11:41] 	 hashar: https://phabricator.wikimedia.org/T124954
[14:11:43] 	 what a change!
[14:11:45] 	 well, it's simple enough to action=purge the big ones
[14:11:49] 	 RECOVERY - DPKG on mw2246 is OK: All packages OK
[14:11:59] 	 RECOVERY - Disk space on mw2246 is OK: DISK OK
[14:12:19] 	 RECOVERY - configured eth on mw2246 is OK: OK - interfaces up
[14:12:20] 	 RECOVERY - NTP on mw2246 is OK: NTP OK: Offset 0.002087950706 secs
[14:12:30] 	 RECOVERY - dhclient process on mw2246 is OK: PROCS OK: 0 processes with command name dhclient
[14:12:40] 	 I am still in awe at the amount of changes that have been conducted over the last couple years
[14:12:47] 	 hashar: Zuul's gate-and-submit seems stuck.
[14:13:12] 	 yeah there is a job stuck
[14:13:16] 	 (03PS1) 10Muehlenhoff: Move graphite ferm rules out of role:graphite::base [puppet] - 10https://gerrit.wikimedia.org/r/295919 
[14:13:22] 	 got cancelled and the whole queue is being reprocessed
[14:13:26] 	 oh, jobs have been restarted. Cool.
[14:13:42] 	 ty for the action.
[14:13:46] 	 (03PS2) 10Filippo Giunchedi: raid: bump hpssacli nrpe timeout [puppet] - 10https://gerrit.wikimedia.org/r/295892 
[14:13:52] 	 (03CR) 10Filippo Giunchedi: [C: 032 V: 032] raid: bump hpssacli nrpe timeout [puppet] - 10https://gerrit.wikimedia.org/r/295892 (owner: 10Filippo Giunchedi)
[14:14:01] 	 siebrand: translatewiki should really be moved to its own queue 
[14:14:21] 	 RECOVERY - Router interfaces on mr1-esams is OK: OK: host 91.198.174.247, interfaces up: 37, down: 0, dormant: 0, excluded: 0, unused: 0
[14:14:28] 	 siebrand: or we can move it to Differential if you are willing to be an early adopter :-}    Mukunda has CI support in Differential now!
[14:14:30] 	 hashar: If you say so... ;)
[14:14:39] 	 that is the future!!
[14:15:04] 	 Will we be able to have the same checkers, or will most checks no longer be performed?
[14:15:11] 	 RECOVERY - changeprop endpoints health on scb1001 is OK: All endpoints are healthy
[14:15:56] 	 (03CR) 10Filippo Giunchedi: [C: 031] diamond TCP collector: add TFO-related metrics [puppet] - 10https://gerrit.wikimedia.org/r/295900 (https://phabricator.wikimedia.org/T108827) (owner: 10Ema)
[14:16:52] 	 (03PS1) 10Elukey: Enable Thrift RCP service on aqs100[456] [puppet] - 10https://gerrit.wikimedia.org/r/295920 
[14:17:59] 	 RECOVERY - Host mr1-esams.oob is UP: PING OK - Packet loss = 0%, RTA = 81.39 ms
[14:18:10] 	 (03CR) 10Muehlenhoff: "PCC: http://puppet-compiler.wmflabs.org/3191/" [puppet] - 10https://gerrit.wikimedia.org/r/295919 (owner: 10Muehlenhoff)
[14:18:36] 	 !log shutting down ms-fe3002 due to on-site work
[14:18:41] 	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:18:57] 	 (03PS2) 10Elukey: Enable Thrift RCP service on aqs100[456] [puppet] - 10https://gerrit.wikimedia.org/r/295920 
[14:19:55] 	 siebrand: should be the same coverage. At least for the common entry point such as npm test / composer test etc
[14:20:17] 	 okay. I'll discuss with Nikerabbit.
[14:20:19] 	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 632 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5218064 keys - replication_delay is 632
[14:20:25] 	 siebrand: it is still preliminary work. The intent is to be able to migrate the simplest repo right now
[14:21:05] 	 siebrand: for translatewiki we would need to add shellint and puppet validate / puppet lint. The latter can probably be migrated to a ruby rake task instead
[14:21:30] 	 PROBLEM - Host ms-fe3002 is DOWN: PING CRITICAL - Packet loss = 100%
[14:24:06] 	 quick away, be back in a few minutes
[14:25:55] 	 (03CR) 10Elukey: [C: 032] "Puppet compiler looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/295920 (owner: 10Elukey)
[14:27:58] 	 06Operations, 10Traffic, 06Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2404878 (10BBlack) Since the last update (past ~4 days):  New usernames: ``` Electron_Bot Pahles KSFT Amalthea_(bot) Qsx753698 AlphamaBot `...
[14:28:30] 	 PROBLEM - Juniper alarms on csw2-esams.mgmt.esams.wmnet is CRITICAL: JNX_ALARMS CRITICAL - 1 red alarms, 0 yellow alarms
[14:28:42] 	 06Operations, 10netops, 13Patch-For-Review: block labs IPs from sending data to prod ganglia - https://phabricator.wikimedia.org/T115330#2404879 (10akosiaris) 05Open>03Resolved a:03akosiaris This is finally fixed in rOPUPb3ef0ad. labs VMs in https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_re...
[14:29:30] 	 PROBLEM - changeprop endpoints health on scb1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.16, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[14:29:30] 	 06Operations, 10Traffic, 13Patch-For-Review: Decrease max object TTL in varnishes - https://phabricator.wikimedia.org/T124954#2404883 (10Krinkle) @BBlack I agree that technically "Not Modified" is a lie from MediaWiki in that case, but I'm not convinced that behaviour is wrong or needs changing.  In many cas...
[14:31:24] 	 akosiaris: \o/
[14:32:04] 	 :-)
[14:32:53] 	 (03PS1) 10Ema: tlsproxy: document safe/unsafe TFO usage [puppet] - 10https://gerrit.wikimedia.org/r/295925 (https://phabricator.wikimedia.org/T108827) 
[14:35:10] 	 RECOVERY - HP RAID on ms-be2022 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[14:36:21] 	 (03PS1) 10Dr0ptp4kt: Restore mobile formatting for enwiki mdot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295926 (https://phabricator.wikimedia.org/T138425) 
[14:36:58] 	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5170257 keys - replication_delay is 0
[14:38:42] 	 hashar Krinkle anomie requesting review and deploy on ^^ . cc jdlrobson 
[14:39:13] 	 dr0ptp4kt: Is that the only wiki previously covered by 'default' and not explicitly listed there?
[14:39:21] 	 dr0ptp4kt: this one i have really no clue what it is going to do really :(
[14:39:41] 	 Krinkle: i don't know for sure, i didn't do the audit of the sites :(
[14:39:57] 	 Krinkle: one moment, lemme see if i can reconstruct
[14:39:59] 	 dr0ptp4kt: It's simple. The legacy dblist should contain all.dblist - expansion of the previously false values
[14:40:18] 	 (wikidata, wikiquote, wiktionary)
[14:40:24] 	 So presumably all wikipedias
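The rule stated above (legacy list = all.dblist minus the wikis that previously had the setting set to false) is a plain set difference. A minimal sketch, assuming each dblist is just a list of database names; `legacy_dblist` is a hypothetical helper, and the sample names below are illustrative, not the actual contents of any dblist:

```python
def legacy_dblist(all_wikis, previously_false):
    """Keep every wiki from all_wikis that was not explicitly set to false before."""
    excluded = set(previously_false)
    # Preserve the original ordering of all_wikis.
    return [wiki for wiki in all_wikis if wiki not in excluded]


# Illustrative sample only.
all_wikis = ["enwiki", "nlwiki", "wikidatawiki", "enwikiquote", "enwiktionary"]
previously_false = ["wikidatawiki", "enwikiquote", "enwiktionary"]

print(legacy_dblist(all_wikis, previously_false))  # ['enwiki', 'nlwiki']
```

Comparing the result against the list actually committed in the config change would show which wikis were dropped unintentionally.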
[14:40:40] 	 dr0ptp4kt: This is following up https://gerrit.wikimedia.org/r/#/c/295600/?
[14:40:48] 	 (03CR) 10Krinkle: Restore mobile formatting for enwiki mdot (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295926 (https://phabricator.wikimedia.org/T138425) (owner: 10Dr0ptp4kt)
[14:41:51] 	 (03PS1) 10Elukey: Add mw2246 (videoscaler) to the MW Scap DSH list. [puppet] - 10https://gerrit.wikimedia.org/r/295930 
[14:42:17] 	 _joe_ --^ can I proceed?
[14:42:19] 	 anomie: yeah. i think the whole list needs to be reviewed. but of course enwiki homepage is particularly painful
[14:42:27] 	 06Operations, 10Traffic, 13Patch-For-Review: Decrease max object TTL in varnishes - https://phabricator.wikimedia.org/T124954#2404938 (10BBlack) Well, it's certainly legal from some point of view.  But if you want to claim Not Modified on what are considered minor non-breaking changes then you have to live w...
[14:43:15] 	 (03PS3) 10Ema: diamond TCP collector: add TFO-related metrics [puppet] - 10https://gerrit.wikimedia.org/r/295900 (https://phabricator.wikimedia.org/T108827) 
[14:43:27] 	 (03CR) 10Ema: [C: 032 V: 032] diamond TCP collector: add TFO-related metrics [puppet] - 10https://gerrit.wikimedia.org/r/295900 (https://phabricator.wikimedia.org/T108827) (owner: 10Ema)
[14:43:40] 	 Krinkle: anomie hashar would you advise we fix this one glaring case now and address the rest later today (more isolated change this way, addresses specific problem in very visible pain spot, etc), or update the whole thing in one fell swoop?
[14:43:44] 	 (03PS2) 10Ema: tlsproxy: document safe/unsafe TFO usage [puppet] - 10https://gerrit.wikimedia.org/r/295925 (https://phabricator.wikimedia.org/T108827) 
[14:43:52] 	 (03CR) 10Ema: [C: 032 V: 032] tlsproxy: document safe/unsafe TFO usage [puppet] - 10https://gerrit.wikimedia.org/r/295925 (https://phabricator.wikimedia.org/T108827) (owner: 10Ema)
[14:43:57] 	 dr0ptp4kt: I'm working on running a comparison now
[14:44:02] 	 thanks anomie 
[14:47:25] 	 (03PS2) 10Dr0ptp4kt: Restore mobile formatting for enwiki mdot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295926 (https://phabricator.wikimedia.org/T138578) 
[14:47:42] <_joe_>	 elukey: sure thing
[14:47:51] <_joe_>	 just check the scaler doesn't start
[14:47:54] <_joe_>	 (the service)
[14:48:28] 	 RECOVERY - Juniper alarms on csw2-esams.mgmt.esams.wmnet is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms
[14:49:44] 	 dr0ptp4kt: On tin, /home/anomie/old-mobilemainpagelegacy.dblist has what I believe is the equivalent of the old list of wikis the setting was enabled on, and /home/anomie/mobilemainpagelegacy-removed.dblist is just the wikis that were removed from the list in that change.
[14:50:41] 	 06Operations, 10Traffic, 13Patch-For-Review: Decrease max object TTL in varnishes - https://phabricator.wikimedia.org/T124954#2404952 (10Krinkle) For as long as I can remember (at least 6 years), we've made countless breaking changes based on the basic assumption that caches roll over within ttl ("30 days")....
[14:52:08] 	 RECOVERY - Juniper alarms on cr2-esams is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms
[14:52:23] 	 06Operations, 10Monitoring: investigate speeding up hp raid checks - https://phabricator.wikimedia.org/T138597#2404965 (10fgiunchedi)
[14:52:34] 	 06Operations, 10Monitoring: investigate speeding up hp raid checks - https://phabricator.wikimedia.org/T138597#2404977 (10fgiunchedi) p:05Triage>03Low
[14:53:34] 	 (03PS2) 10Elukey: Add mw2246 (videoscaler) to the MW Scap DSH list. [puppet] - 10https://gerrit.wikimedia.org/r/295930 
[14:54:08] 	 (03CR) 10Filippo Giunchedi: "the http rule can be kept common I think, LGTM other than that" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/295919 (owner: 10Muehlenhoff)
[14:55:37] 	 (03CR) 10Elukey: [C: 032] Add mw2246 (videoscaler) to the MW Scap DSH list. [puppet] - 10https://gerrit.wikimedia.org/r/295930 (owner: 10Elukey)
[14:56:18] 	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 662 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5171334 keys - replication_delay is 662
[14:57:49] 	 RECOVERY - Host ms-fe3002 is UP: PING OK - Packet loss = 0%, RTA = 82.97 ms
[15:00:08] 	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "Looking a bit more at the output, I can do better. The new rules are somewhat more lax by allowing ulsfo and esams for no good reason for " [puppet] - 10https://gerrit.wikimedia.org/r/291819 (owner: 10Alexandros Kosiaris)
[15:02:57] 	 (03PS1) 10Faidon Liambotis: Revert "Drain esams for network maintenance" [dns] - 10https://gerrit.wikimedia.org/r/295934 
[15:04:08] 	 RECOVERY - HP RAID on ms-be2023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[15:05:18] 	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5170561 keys - replication_delay is 0
[15:05:20] 	 (03CR) 10Faidon Liambotis: [C: 032] Revert "Drain esams for network maintenance" [dns] - 10https://gerrit.wikimedia.org/r/295934 (owner: 10Faidon Liambotis)
[15:06:17] 	 RECOVERY - HP RAID on ms-be2024 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[15:07:36] 	 06Operations, 06Discovery, 06Maps, 10hardware-requests: 2 servers for maps-beta cluster - https://phabricator.wikimedia.org/T138600#2405018 (10Gehel)
[15:07:47] 	 06Operations, 06Discovery, 06Maps, 10hardware-requests: 2 servers for maps-beta cluster - https://phabricator.wikimedia.org/T138600#2405034 (10Gehel)
[15:08:28] 	 (03PS6) 10Alexandros Kosiaris: networks::constants: use slice_network_constants [puppet] - 10https://gerrit.wikimedia.org/r/291819 
[15:08:36] 	 06Operations, 06Discovery, 06Maps, 10hardware-requests: 2 servers for maps-beta cluster - https://phabricator.wikimedia.org/T138600#2405018 (10Gehel)
[15:09:46] 	 (03CR) 10jenkins-bot: [V: 04-1] networks::constants: use slice_network_constants [puppet] - 10https://gerrit.wikimedia.org/r/291819 (owner: 10Alexandros Kosiaris)
[15:09:51] 	 (03PS1) 10Ema: Run configtest upon config file modification [puppet/nginx] - 10https://gerrit.wikimedia.org/r/295937 
[15:11:13] 	 (03PS7) 10Alexandros Kosiaris: networks::constants: use slice_network_constants [puppet] - 10https://gerrit.wikimedia.org/r/291819 
[15:16:13] 	 (03PS8) 10Alexandros Kosiaris: networks::constants: use slice_network_constants [puppet] - 10https://gerrit.wikimedia.org/r/291819 
[15:16:17] 	 (03PS2) 10Ema: Reload nginx upon config file modification [puppet/nginx] - 10https://gerrit.wikimedia.org/r/295937 
[15:19:07] 	 RECOVERY - Router interfaces on cr2-knams is OK: OK: host 91.198.174.246, interfaces up: 57, down: 0, dormant: 0, excluded: 0, unused: 0
[15:19:29] 	 (03CR) 10Southparkfan: Reload nginx upon config file modification (031 comment) [puppet/nginx] - 10https://gerrit.wikimedia.org/r/295937 (owner: 10Ema)
[15:19:54] 	 do we still have knams?
[15:20:26] 	 anomie: i gather you checked out an older commit, sorted, and diffed?
[15:21:21] 	 SPF|Cloud: Ja
[15:22:39] 	 hm... now I become curious why we have that one if we already have esams (although I'm pretty sure there's a good reason for it - it's not like I doubt it's needed)... :)
[15:23:37] 	 dr0ptp4kt: I took `git show 425cf560e8572d0e26f0a28e3d627324005c6a35:dblists/mobilemainpagelegacy.dblist` and combined it with `/usr/local/bin/expanddblist '%% all.dblist - wikidata.dblist - wikiquote.dblist - wiktionary.dblist'`, then sorted, uniqued, and compared to the current version.
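The comparison described above amounts to a set difference between the old effective list and the current one. A minimal Python sketch follows, assuming each dblist is plain text with one wiki name per line; the helper names, the `#` comment handling, and the sample wiki names are illustrative assumptions, not taken from the log or from any actual tooling.

```python
def read_dblist(text):
    """Parse dblist-style text into a set of wiki names.

    Assumption: one name per line; blank lines and lines starting
    with '#' are ignored.
    """
    names = set()
    for line in text.splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#"):
            names.add(entry)
    return names


def removed_entries(old_explicit, old_expanded, current):
    """Return, sorted, the wikis in the old effective list but not in the current one.

    old_explicit: contents of the old dblist file (e.g. from `git show <commit>:<path>`)
    old_expanded: output of expanding the old list expression
    current:      contents of the current dblist file
    """
    old = read_dblist(old_explicit) | read_dblist(old_expanded)  # union also de-duplicates
    return sorted(old - read_dblist(current))


# Toy data only; these are not the real list contents.
removed = removed_entries("enwiki\ndewiki\n", "dewiki\nfrwiki\n", "frwiki\n")
print(len(removed), removed)
```

Using sets makes the "sorted, uniqued" step implicit, and the length of the result is the number of entries that would need to be added back.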
[15:23:56] 	 https://wikitech.wikimedia.org/wiki/Knams_cluster
[15:25:16] 	 yeah I know that page. Just wanted to ask if a slightly more detailed explanation can be given (if there's time for that) ;)
[15:27:06] 	 06Operations, 10ops-esams, 06DC-Ops, 10netops: Set up cr2-esams - https://phabricator.wikimedia.org/T118256#2405121 (10faidon)
[15:27:08] 	 06Operations, 10ops-esams, 06DC-Ops: Power cr2-esams PEM 2/PEM 3 - https://phabricator.wikimedia.org/T118166#2405119 (10faidon) 05Open>03Resolved Done!
[15:27:14] 	 You'd probably need to ask opsen
[15:28:10] 	 (03PS3) 10Ema: Reload nginx upon config file modification [puppet/nginx] - 10https://gerrit.wikimedia.org/r/295937 
[15:30:04] 	 SPF|Cloud: we do
[15:30:21] 	 for reaching peers of ours that are present there but not in esams
[15:30:22] 	 SPF|Cloud: esams is where our actual cache servers live, and knams is network/router stuff only, and they're linked to each other.  It has to do with where we can best get peering, transit, transport links, etc.
[15:30:34] 	 as Vancis/SARA are better connected
[15:31:13] 	 aha, I get it. thanks!
[15:38:38] 	 (03PS6) 10Gehel: Move es-tool to a proper python package [puppet] - 10https://gerrit.wikimedia.org/r/290765 
[15:42:20] 	 RECOVERY - mediawiki-installation DSH group on mw2246 is OK: OK
[15:44:31] 	 anomie: so if i understand correctly, the logical induction produced 461 (or is it 462? `diff -rcs mobilemainpagelegacy.dblist old-mobilemainpagelegacy.dblist | grep -wc '+'` yielded 461, but the -removed.dblist file shows 462 lines) entries that would have to be added back in...
[15:44:48] 	 ...assuming
[15:45:26] 	 ...total inaccuracy in the audit of the sites. i don't know if that was done via a content analysis or something like that. anyway, clearly a number of wikipedias were operational using that wg
[15:45:45] 	 jdlrobson: you around
[15:45:46] 	 ?
[15:49:51] 	 (03CR) 10jenkins-bot: [V: 04-1] Move es-tool to a proper python package [puppet] - 10https://gerrit.wikimedia.org/r/290765 (owner: 10Gehel)
[15:58:56] 	 06Operations, 10netops: Network ACL rules to allow traffic from Analytics to Production for port 9061 - https://phabricator.wikimedia.org/T138609#2405243 (10elukey)
[16:01:03] 	 06Operations, 10netops: Network ACL rules to allow traffic from Analytics to Production for port 9060 - https://phabricator.wikimedia.org/T138609#2405270 (10elukey)
[16:04:48] 	 dr0ptp4kt: still around ?
[16:05:17] 	 dr0ptp4kt: I don't think I can do anything about the task "en.wikipedia.org Mobile main page layout broken" https://phabricator.wikimedia.org/T138578 
[16:06:17] 	 hashar: yeah, i'm here
[16:06:31] 	 dr0ptp4kt: it does not seem related to 1.28.0-wmf.7, does it?
[16:07:53] 	 hashar: no. my recommendation is to swat patchset 2 that i submitted in https://phabricator.wikimedia.org/T138578#2405214
[16:10:31] 	 guess there are enough people able to deploy 
[16:10:47] 	 gotta write the .plan about the mediawiki train next steps
[16:14:00] 	 RECOVERY - changeprop endpoints health on scb1001 is OK: All endpoints are healthy
[16:20:43] 	 dr0ptp4kt: I posted a status update about wmf.7 rolling on https://phabricator.wikimedia.org/T136973#2405307  and sent an email to wikitech-l + engineering list
[16:23:20] 	 PROBLEM - changeprop endpoints health on scb1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.16, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[16:28:50] 	 I am off, have safe hackings!
[16:32:40] 	 dr0ptp4kt: I see 462 in your diff command. Are you using a version of mobilemainpagelegacy.dblist that has enwiki already added back in, maybe?
[16:33:48] 	 anomie: you know what, i think maybe so. sorry, on a call right now. i probably just messed up. so the numbers match in your file plus the diff?
[16:34:17] 	 dr0ptp4kt: Both have 462 lines for me.
[16:34:26] 	 anomie: cool
[16:52:10] 	 06Operations, 10procurement: esams: (3?) SAS 2TB disks for ms-be* systems - https://phabricator.wikimedia.org/T138618#2405440 (10fgiunchedi)
[16:54:28] 	 (03PS1) 10Ppchelko: Change-Prop: Remove the dependencies module. [puppet] - 10https://gerrit.wikimedia.org/r/295951 
[16:58:34] 	 (03CR) 10Giuseppe Lavagetto: [C: 032] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/295951 (owner: 10Ppchelko)
[17:02:49] 	 RECOVERY - changeprop endpoints health on scb1002 is OK: All endpoints are healthy
[17:03:20] 	 RECOVERY - changeprop endpoints health on scb1001 is OK: All endpoints are healthy
[17:05:06] <_joe_>	 !log re-started changeprop after disabling the dependency module
[17:05:11] 	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:09:35] 	 (03PS1) 10Ppchelko: Change-Prop: Limit the concurrency for ORES [puppet] - 10https://gerrit.wikimedia.org/r/295954 
[17:10:47] 	 (03CR) 10Mobrovac: [C: 031] Change-Prop: Limit the concurrency for ORES [puppet] - 10https://gerrit.wikimedia.org/r/295954 (owner: 10Ppchelko)
[17:12:54] 	 (03CR) 10Giuseppe Lavagetto: [C: 032] Change-Prop: Limit the concurrency for ORES [puppet] - 10https://gerrit.wikimedia.org/r/295954 (owner: 10Ppchelko)
[17:13:33] 	 (03PS2) 10Muehlenhoff: Move graphite ferm rules out of role:graphite::base [puppet] - 10https://gerrit.wikimedia.org/r/295919 
[17:19:56] 	 !log change-prop deploying df88a75b
[17:20:01] 	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:40:33] 	 (03PS1) 10Jdlrobson: English Wikipedia uses legacy main page formatting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295958 (https://phabricator.wikimedia.org/T138578) 
[17:40:54] 	 (03CR) 10jenkins-bot: [V: 04-1] English Wikipedia uses legacy main page formatting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295958 (https://phabricator.wikimedia.org/T138578) (owner: 10Jdlrobson)
[17:42:41] 	 PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:49:39] 	 RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[17:57:09] 	 (03PS2) 10Jdlrobson: Whitelist Wikis that use older mp- prefix [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295958 (https://phabricator.wikimedia.org/T138578) 
[17:57:48] 	 (03CR) 10jenkins-bot: [V: 04-1] Whitelist Wikis that use older mp- prefix [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295958 (https://phabricator.wikimedia.org/T138578) (owner: 10Jdlrobson)
[17:58:36] 	 (03PS3) 10Jdlrobson: Whitelist Wikis that use older mp- prefix [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295958 (https://phabricator.wikimedia.org/T138578) 
[18:11:15] 	 06Operations, 10MediaWiki-extensions-UniversalLanguageSelector, 10Wikimedia-SVG-rendering, 07I18n: MB Lateefi Fonts for Sindhi Wikipedia. - https://phabricator.wikimedia.org/T138136#2405619 (10mehtab.ahmed) I have sent mail to author about publishing the license. Can anyone tell me how much time this proce...
[18:13:07] 	 PROBLEM - Router interfaces on cr2-knams is CRITICAL: CRITICAL: host 91.198.174.246, interfaces up: 52, down: 1, dormant: 0, excluded: 0, unused: 0; xe-0/0/3: down - Core: asw-esams:xe-0/0/32 (Relined, SMF4303) [10Gbps DF CWDM C47]
[18:17:37] 	 RECOVERY - Router interfaces on cr2-knams is OK: OK: host 91.198.174.246, interfaces up: 57, down: 0, dormant: 0, excluded: 0, unused: 0
[18:20:23] 	 06Operations, 10ops-ulsfo, 06DC-Ops: ulsfo temperature-related exceptions - https://phabricator.wikimedia.org/T119631#2405622 (10RobH) 05Open>03Resolved a:03RobH This was completed months ago, and I neglected to close out this task.
[18:34:32] 	 (03CR) 10MaxSem: [C: 04-1] Whitelist Wikis that use older mp- prefix (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295958 (https://phabricator.wikimedia.org/T138578) (owner: 10Jdlrobson)
[18:35:19] 	 (03PS4) 10Jdlrobson: Whitelist Wikis that use older mp- prefix [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295958 (https://phabricator.wikimedia.org/T138578) 
[18:36:14] 	 (03CR) 10Alex Monk: [C: 032] "This looks important enough to do now, and should be safe, but I'll be cautious of course" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295958 (https://phabricator.wikimedia.org/T138578) (owner: 10Jdlrobson)
[18:36:54] 	 (03Merged) 10jenkins-bot: Whitelist Wikis that use older mp- prefix [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295958 (https://phabricator.wikimedia.org/T138578) (owner: 10Jdlrobson)
[18:38:09] 	 ^ am deploying
[18:39:05] 	 go ahead
[18:39:06] 	 first via mw1017 etc.
[18:39:14] 	 (I was about to myself)
[18:39:47] 	 :)
[18:40:05] 	 okay, sending it out
[18:41:21] 	 !log krenair@tin Synchronized dblists/mobilemainpagelegacy.dblist: https://gerrit.wikimedia.org/r/#/c/295958/4 - fix mobile main page rendering on a bunch of wikis, effectively putting them back to how they were a few days ago (duration: 00m 37s)
[18:41:25] 	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:41:38] 	 jdlrobson, ^
[18:41:56] 	 it looks correct to me now
[18:42:40] 	 purged enwiki main page
[18:43:15] 	 yup! all is good again!
[18:43:18] 	 w00t
[18:43:32] 	 thanks Krenair 
[18:43:36] 	 and MaxSem  :)
[19:24:49] 	 PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:27:08] 	 RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[19:52:21] 	 (03PS5) 10Smalyshev: Prepare scap3 deployment for WDQS [puppet] - 10https://gerrit.wikimedia.org/r/295437 (https://phabricator.wikimedia.org/T129144) 
[20:52:26] 	 06Operations, 10Ops-Access-Requests, 06Discovery, 10Wikidata, 10Wikidata-Query-Service: Enable WDQS admins to enable/disable updater service - https://phabricator.wikimedia.org/T138627#2405780 (10Smalyshev)
[20:54:18] 	 (03PS1) 10Smalyshev: Allow wdqs admins to control wdqs-updater service [puppet] - 10https://gerrit.wikimedia.org/r/295968 (https://phabricator.wikimedia.org/T138627) 
[20:57:50] 	 06Operations, 10Ops-Access-Requests, 10Deployment-Systems, 06Discovery, and 4 others: Add wdqs-admins to deploy-services group - https://phabricator.wikimedia.org/T138628#2405800 (10Smalyshev)
[20:59:10] 	 (03PS13) 10MaxSem: Script to do the initial data load from OSM for Maps project [puppet] - 10https://gerrit.wikimedia.org/r/293105 (owner: 10Gehel)
[21:00:13] 	 (03CR) 10jenkins-bot: [V: 04-1] Script to do the initial data load from OSM for Maps project [puppet] - 10https://gerrit.wikimedia.org/r/293105 (owner: 10Gehel)
[21:07:32] 	 (03PS14) 10MaxSem: Script to do the initial data load from OSM for Maps project [puppet] - 10https://gerrit.wikimedia.org/r/293105 (owner: 10Gehel)
[21:08:41] 	 (03CR) 10jenkins-bot: [V: 04-1] Script to do the initial data load from OSM for Maps project [puppet] - 10https://gerrit.wikimedia.org/r/293105 (owner: 10Gehel)
[21:12:40] 	 gwicke, mobrovac: on https://en.wikipedia.org/wiki/Mercator_projection : Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "/mathoid/local/v1/":): {\displaystyle \begin{align} \lambda &= \lambda_0 + \frac{x}{R}, \qquad \varphi &= 2\tan^{-1}\left[\exp\left(\frac{y}{R}\right)\right] - \frac{\pi}{2} \, . \end{align}}
[21:51:20] 	 (03PS15) 10MaxSem: Script to do the initial data load from OSM for Maps project [puppet] - 10https://gerrit.wikimedia.org/r/293105 (owner: 10Gehel)
[21:52:47] 	 (03CR) 10jenkins-bot: [V: 04-1] Script to do the initial data load from OSM for Maps project [puppet] - 10https://gerrit.wikimedia.org/r/293105 (owner: 10Gehel)
[22:32:18] 	 06Operations, 10procurement: esams: (3?) SAS 2TB disks for ms-be* systems - https://phabricator.wikimedia.org/T138618#2405440 (10Peachey88) this task is in the wrong user space when it comes to #procurement
[23:06:45] 	 06Operations, 10Ops-Access-Requests: Requesting access to deployment hosts (tin/terbium) for Brian Wolff - https://phabricator.wikimedia.org/T138635#2405970 (10dpatrick)
[23:07:33] 	 06Operations, 10Ops-Access-Requests: Requesting access to deployment hosts (tin/terbium) for Brian Wolff - https://phabricator.wikimedia.org/T138635#2405983 (10dpatrick) @Bawolff Please update the description with the information requested at https://wikitech.wikimedia.org/wiki/Requesting_shell_access, and aft...
[23:07:44] 	 06Operations, 10Ops-Access-Requests: Requesting access to deployment hosts (tin/terbium) for Brian Wolff - https://phabricator.wikimedia.org/T138635#2405984 (10dpatrick)
[23:08:11] 	 06Operations, 10Ops-Access-Requests: Requesting access to deployment hosts (tin/terbium) for Brian Wolff - https://phabricator.wikimedia.org/T138635#2405970 (10Krenair) To deploy security patches he should get full access given by deployment rights, which includes all mw* servers etc.
[23:11:02] 	 PROBLEM - CirrusSearch codfw 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[23:13:47] 	 06Operations, 10Ops-Access-Requests: Requesting access to deployment hosts (tin/terbium) for Brian Wolff - https://phabricator.wikimedia.org/T138635#2406009 (10dpatrick) >>! In T138635#2405986, @Krenair wrote: > To deploy security patches he should get full access given by deployment rights, which includes all...
[23:16:36] 	 06Operations, 10Ops-Access-Requests: Requesting access to deployment hosts (tin/terbium) for Brian Wolff - https://phabricator.wikimedia.org/T138635#2406022 (10Krenair) "deployment access" means the ability to actually use the deployment commands on tin/mira to sync MW code (not just the ability to log in to t...
[23:17:57] 	 06Operations, 10Ops-Access-Requests: Requesting access to deployment hosts (tin/terbium) for Brian Wolff - https://phabricator.wikimedia.org/T138635#2406023 (10dpatrick) >>! In T138635#2406022, @Krenair wrote: > "deployment access" means the ability to actually use the deployment commands on tin/mira to sync M...
[23:19:52] 	 RECOVERY - CirrusSearch codfw 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[23:34:58] 	 06Operations, 06Discovery, 10Wikidata, 10Wikidata-Query-Service, 10hardware-requests: Hardware request for codfw WDQS server - https://phabricator.wikimedia.org/T138637#2406030 (10Smalyshev)