[00:00:04] addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: (Dis)respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180111T0000). Please do the needful.
[00:00:05] RoanKattouw: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:17] RECOVERY - cassandra-b CQL 10.64.0.118:9042 on restbase1011 is OK: TCP OK - 0.036 second response time on 10.64.0.118 port 9042
[00:01:48] I'll do the SWAT, I'm the only customer anyway
[00:04:07] (PS2) Dzahn: rename phabricator_server to just phabricator [puppet] - https://gerrit.wikimedia.org/r/393709
[00:11:34] (PS3) Dzahn: rename phabricator_server to just phabricator [puppet] - https://gerrit.wikimedia.org/r/393709
[00:13:26] (CR) 20after4: [C: 1] rename phabricator_server to just phabricator [puppet] - https://gerrit.wikimedia.org/r/393709 (owner: Dzahn)
[00:14:18] (PS1) Krinkle: keys: Simplify and update keys.html styling to match other simple pages [mediawiki-config] - https://gerrit.wikimedia.org/r/403569
[00:14:44] (PS2) Krinkle: keys: Simplify and update keys.html styling to match other simple pages [mediawiki-config] - https://gerrit.wikimedia.org/r/403569 (https://phabricator.wikimedia.org/T181018)
[00:17:56] (PS22) Aaron Schulz: [WIP] Add mcrouter module and mcrouter_wancache profile [puppet] - https://gerrit.wikimedia.org/r/392221
[00:18:04] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/9694/" [puppet] - https://gerrit.wikimedia.org/r/393709 (owner: Dzahn)
[00:18:14] (PS4) Dzahn: rename phabricator_server to just phabricator [puppet] - https://gerrit.wikimedia.org/r/393709
[00:20:12] (CR) Dzahn: "thanks 20after4, i'll first confirm prod and then check
to change the role name on labs instances" [puppet] - https://gerrit.wikimedia.org/r/393709 (owner: Dzahn)
[00:20:56] (CR) Dzahn: "no-op in prod besides motd file" [puppet] - https://gerrit.wikimedia.org/r/393709 (owner: Dzahn)
[00:26:27] (PS3) Dzahn: wikilabels: convert roles to a profile and 2 roles [puppet] - https://gerrit.wikimedia.org/r/400252
[00:27:37] (CR) Alex Monk: "so now it has foundation branding?" [mediawiki-config] - https://gerrit.wikimedia.org/r/403569 (https://phabricator.wikimedia.org/T181018) (owner: Krinkle)
[00:29:19] (PS23) Aaron Schulz: [WIP] Add mcrouter module and mcrouter_wancache profile [puppet] - https://gerrit.wikimedia.org/r/392221
[00:31:23] (CR) Alex Monk: "given the potential for legal review here I've held off cherry-picking on deployment-puppetmaster02 like normal, though I did find this as" [puppet] - https://gerrit.wikimedia.org/r/403326 (owner: Alex Monk)
[00:32:38] RECOVERY - cassandra-c SSL 10.64.0.119:7001 on restbase1011 is OK: SSL OK - Certificate restbase1011-c valid until 2018-08-17 16:11:11 +0000 (expires in 218 days)
[00:34:56] (PS24) Aaron Schulz: [WIP] Add mcrouter module and mcrouter_wancache profile [puppet] - https://gerrit.wikimedia.org/r/392221
[00:36:45] (CR) Dzahn: [C: 2] wikilabels: convert roles to a profile and 2 roles [puppet] - https://gerrit.wikimedia.org/r/400252 (owner: Dzahn)
[00:46:05] (PS1) Dmaza: Revert "Restrict sending mails to new users" [mediawiki-config] - https://gerrit.wikimedia.org/r/403571 (https://phabricator.wikimedia.org/T184470)
[00:47:10] (PS1) Dzahn: wikilabels: fix profile class name in role [puppet] - https://gerrit.wikimedia.org/r/403572
[00:48:45] (CR) Dzahn: [C: 2] wikilabels: fix profile class name in role [puppet] - https://gerrit.wikimedia.org/r/403572 (owner: Dzahn)
[00:51:44] (PS2) Dmaza: Revert "Restrict sending mails to new users" config change [mediawiki-config] -
https://gerrit.wikimedia.org/r/403571 (https://phabricator.wikimedia.org/T184470)
[00:57:22] !log bootstrapping restbase1011-c -- T184100
[00:57:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:57:35] T184100: Reprovision legacy Cassandra nodes into new cluster - https://phabricator.wikimedia.org/T184100
[01:00:04] twentyafterfour: I, the Bot under the Fountain, allow thee, The Deployer, to do Phabricator update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180111T0100).
[01:00:04] No GERRIT patches in the queue for this window AFAICS.
[01:01:24] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/9695/" [puppet] - https://gerrit.wikimedia.org/r/399686 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[01:01:30] (PS2) Dzahn: confluent:kafka:jmxtrans: remove Ganglia support [puppet] - https://gerrit.wikimedia.org/r/399686 (https://phabricator.wikimedia.org/T177225)
[01:03:32] (Abandoned) Dzahn: drop optional Ganglia params from metrics::jvm [puppet/jmxtrans] - https://gerrit.wikimedia.org/r/399699 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[01:04:54] (PS25) Aaron Schulz: [WIP] Add mcrouter module and mcrouter_wancache profile [puppet] - https://gerrit.wikimedia.org/r/392221
[01:05:13] (PS6) Dzahn: redis: delete ganglia monitoring script [puppet] - https://gerrit.wikimedia.org/r/399248 (https://phabricator.wikimedia.org/T177225)
[01:05:45] (PS7) Dzahn: redis: delete ganglia monitoring script [puppet] - https://gerrit.wikimedia.org/r/399248 (https://phabricator.wikimedia.org/T177225)
[01:06:08] (CR) Dzahn: "heh, amending fixed the permission issue on this change.
so i can abandon it finally, since it's already done" [puppet] - https://gerrit.wikimedia.org/r/399248 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[01:06:10] (Abandoned) Dzahn: redis: delete ganglia monitoring script [puppet] - https://gerrit.wikimedia.org/r/399248 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[01:06:48] (PS2) Dzahn: site/logging/kafkatee: move includes from site to role [puppet] - https://gerrit.wikimedia.org/r/399702
[01:08:06] (PS26) Aaron Schulz: [WIP] Add mcrouter module and mcrouter_wancache profile [puppet] - https://gerrit.wikimedia.org/r/392221
[01:09:35] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/9696/oxygen.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/399702 (owner: Dzahn)
[01:11:17] (CR) Dzahn: "noop on oxygen.eqiad.wmnet" [puppet] - https://gerrit.wikimedia.org/r/399702 (owner: Dzahn)
[01:16:06] (PS1) Thcipriani: Scap canary: cache last good deploy time [puppet] - https://gerrit.wikimedia.org/r/403574 (https://phabricator.wikimedia.org/T183999)
[01:22:44] Operations, Ops-Access-Requests, Patch-For-Review: Request access to analytics cluster for bawolff - https://phabricator.wikimedia.org/T184582#3891944 (RobH)
[01:24:16] Operations, monitoring, User-fgiunchedi: Better organization for ops grafana dashboards - https://phabricator.wikimedia.org/T178690#3699692 (Dzahn) We need the following new dashboards / URLs (noticed as part of T183873): - service cluster A overview (single link) (replace link on https://wikitech.w...
[01:27:12] Operations, monitoring, Patch-For-Review: Uninstall ganglia from the fleet - https://phabricator.wikimedia.org/T177225#3891960 (Dzahn)
[02:30:54] Operations, Datasets-General-or-Unknown, Dumps-Generation, hardware-requests: Give misc dump crons their own host - https://phabricator.wikimedia.org/T181936#3892043 (hoo) >>!
In T181936#3890096, @ArielGlenn wrote: > How do you see your capacity needs increasing over the next few years? Do you h...
[02:37:09] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.15) (duration: 11m 10s)
[02:37:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:09:47] PROBLEM - puppet last run on ms-be1016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:26:17] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 854.41 seconds
[03:39:47] RECOVERY - puppet last run on ms-be1016 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[03:56:08] Operations, Cloud-VPS, Toolforge, Patch-For-Review, cloud-services-team (Kanban): Cloud: Labvirt and instance reboots for Meltdown - https://phabricator.wikimedia.org/T184189#3892128 (Andrew) First test was at Thu Jan 11 03:20:08 UTC 2018 {F12397322} Second test was at Thu Jan 11 03:35:23 U...
[04:00:14] Operations, Cloud-VPS, Toolforge, Patch-For-Review, cloud-services-team (Kanban): Cloud: Labvirt and instance reboots for Meltdown - https://phabricator.wikimedia.org/T184189#3892131 (Andrew) There's a slight change in performance but not much! At least on the newer labvirts it doesn't look...
[04:01:17] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 186.52 seconds
[04:23:05] (PS1) Revi: Create extendedconfirmed for kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/403584 (https://phabricator.wikimedia.org/T184675)
[04:24:05] (CR) jerkins-bot: [V: -1] Create extendedconfirmed for kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/403584 (https://phabricator.wikimedia.org/T184675) (owner: Revi)
[04:28:57] uhm I thought my patch didn't have any ; in it
[04:30:24] (CR) Revi: "Fail message:" [mediawiki-config] - https://gerrit.wikimedia.org/r/403584 (https://phabricator.wikimedia.org/T184675) (owner: Revi)
[05:09:07] PROBLEM - MariaDB Slave Lag: m3 on db1059 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 321.45 seconds
[05:24:08] RECOVERY - cassandra-c CQL 10.64.0.119:9042 on restbase1011 is OK: TCP OK - 0.040 second response time on 10.64.0.119 port 9042
[05:24:16] (PS3) Kaldari: Revert "Restrict sending mails to new users" config change [mediawiki-config] - https://gerrit.wikimedia.org/r/403571 (https://phabricator.wikimedia.org/T184470) (owner: Dmaza)
[05:27:15] (CR) Kaldari: [C: 1] "Looks good. Feel free to schedule SWAT deployment for this." [mediawiki-config] - https://gerrit.wikimedia.org/r/403571 (https://phabricator.wikimedia.org/T184470) (owner: Dmaza)
[05:38:13] (CR) Giuseppe Lavagetto: [C: 1] "Confirmed none of the IPs is occupied with" [dns] - https://gerrit.wikimedia.org/r/403425 (owner: Cmjohnson)
[05:43:30] (PS7) Giuseppe Lavagetto: puppetdb: refactor to role/profile [puppet] - https://gerrit.wikimedia.org/r/403388
[05:45:22] (CR) Giuseppe Lavagetto: [C: 2] puppetdb: refactor to role/profile [puppet] - https://gerrit.wikimedia.org/r/403388 (owner: Giuseppe Lavagetto)
[06:01:17] PROBLEM - puppet last run on sarin is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[06:07:28] PROBLEM - puppet last run on neodymium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:17:20] !log Force BBU relearn on db1059 - T184160
[06:17:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:17:35] T184160: db1059 BBU issues - https://phabricator.wikimedia.org/T184160
[06:19:32] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1096:3315" [mediawiki-config] - https://gerrit.wikimedia.org/r/403587
[06:19:36] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1096:3315" [mediawiki-config] - https://gerrit.wikimedia.org/r/403587
[06:21:05] !log Upgrade mariadb+kernel on db1089
[06:21:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:48] <_joe_> sarin/neodymium is me, I'll fix it in a few
[06:22:17] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1096:3315" [mediawiki-config] - https://gerrit.wikimedia.org/r/403587 (owner: Marostegui)
[06:23:40] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1096:3315" [mediawiki-config] - https://gerrit.wikimedia.org/r/403587 (owner: Marostegui)
[06:25:12] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1096:3315 - T174569 (duration: 01m 03s)
[06:25:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:25:26] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[06:26:23] (PS1) Marostegui: db-eqiad.php: Depool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/403588 (https://phabricator.wikimedia.org/T174569)
[06:26:37] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1096:3315" [mediawiki-config] - https://gerrit.wikimedia.org/r/403587 (owner: Marostegui)
[06:29:16] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/403588
(https://phabricator.wikimedia.org/T174569) (owner: Marostegui)
[06:30:47] (Merged) jenkins-bot: db-eqiad.php: Depool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/403588 (https://phabricator.wikimedia.org/T174569) (owner: Marostegui)
[06:31:00] (CR) jenkins-bot: db-eqiad.php: Depool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/403588 (https://phabricator.wikimedia.org/T174569) (owner: Marostegui)
[06:32:32] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1082 - T174569 (duration: 01m 02s)
[06:32:34] !log Deploy schema change on db1082.s5 with replication (this will generate lag on labs) - T174569
[06:32:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:32:43] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[06:32:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:40:07] <_joe_> c: care to explain that ban, please?
[06:40:18] RECOVERY - MariaDB Slave Lag: m3 on db1059 is OK: OK slave_sql_lag Replication lag: 36.51 seconds
[06:40:32] <_joe_> I don't really like random bans for things not happening in this channel
[06:41:08] it's not random, and it's a ban forward and not just a ban. dude has had a flappy connection for over a day
[06:41:27] (CR) Marostegui: "Why not changing site.pp and mysql-core_codfw.yaml at the same time here?" [puppet] - https://gerrit.wikimedia.org/r/403451 (https://phabricator.wikimedia.org/T176243) (owner: Jcrespo)
[06:41:32] <_joe_> flappy connection is not a reason for a ban in this channel, until there is no annoyance
[06:41:41] it is a reason because I said it is
[06:41:52] <_joe_> c: seriously?
[06:42:03] Yes, I'm quite serious.
[06:42:12] <_joe_> do you think this behaviour is constructive or civil?
[06:42:22] Wow...
[06:43:19] here and a few other channels where he has been constantly pinging out for hours on end, when he returns he will notice he was forwarded to a different channel (or the ban will automatically expire whatever comes first)
[06:43:50] <_joe_> what I am saying is I live in this channel and didn't notice that
[06:43:59] <_joe_> *here*
[06:44:01] c: That doesn't look like a reason to ban someone to me
[06:44:58] marostegui: it's more common than you think, so common that freenode dedicated a channel for users who have bad connections to be forwarded to
[06:45:03] _joe_: probably because of all the bots
[06:46:38] <_joe_> c: also, I really think you should learn how to respond with civility to people asking for explanation of your actions with civility. I might add I have ops rights in this channel and I was asking a fellow operator the reason of his choice. Being answered with "<@c> it is a reason because I said it is" is childish at best, an abuse of power at worst
[06:46:57] <_joe_> I'll go with childish and get back to working on the wikimedia infrastructure. Adieu.
[06:47:33] you might have op rights in this channel but I have op rights in every channel, it was a routine forward in one of several and not specific to this channel
[06:53:46] c: curious, do you do this for all channels? I didn't see it in eg the channels I manage in the #wikimedia- namespace.
[06:54:34] (PS1) Giuseppe Lavagetto: cumin: stop cross-referencing the puppetdb master [puppet] - https://gerrit.wikimedia.org/r/403590
[06:55:38] <_joe_> this will fix sarin/neodymium
[06:55:48] greg-g: the channels i noticed rfarrand flapping in, i did a quick scrollback and noticed it's been happening all day. i have channels split between two clients so i might have missed some
[06:56:25] oh, it was rfarrand you banned? please unban, she needs access to our channels to do work.
[06:56:30] <_joe_> rotfl
[06:56:44] hard to do work when you're constantly pinging out
[06:56:50] not really
[06:56:53] for hours on end
[06:57:42] c: please unban
[06:58:23] ok
[06:58:36] and in the rest of the channels you banned them, plesae
[06:58:38] (CR) Giuseppe Lavagetto: [C: 2] cumin: stop cross-referencing the puppetdb master [puppet] - https://gerrit.wikimedia.org/r/403590 (owner: Giuseppe Lavagetto)
[07:00:36] c: ^
[07:02:28] RECOVERY - puppet last run on neodymium is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[07:03:18] (PS1) Marostegui: db-eqiad.php: Depool db1099 s1 and s8 [mediawiki-config] - https://gerrit.wikimedia.org/r/403591 (https://phabricator.wikimedia.org/T162807)
[07:03:58] <_joe_> well that didn't require any effort at all
[07:04:05] greg-g: you may want to inform them that putting IRC on somewhere like irccloud, or a bouncer or screen+irssi, or remote anything would be immensely better than off their satellite internet connection
[07:05:49] c: sure, I'll make a note of that when I see them next. Can we also be more civil in our discourse in the future, please? "because I said it is" and "mines bigger than yours" aren't great ways to communicate with peers.
[07:06:17] RECOVERY - puppet last run on sarin is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:08:47] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1099 s1 and s8 [mediawiki-config] - https://gerrit.wikimedia.org/r/403591 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[07:09:01] (PS14) Giuseppe Lavagetto: role::puppetmaster::puppetdb: add Prometheus monitoring for puppetdb [puppet] - https://gerrit.wikimedia.org/r/394966 (owner: Elukey)
[07:10:16] (Merged) jenkins-bot: db-eqiad.php: Depool db1099 s1 and s8 [mediawiki-config] - https://gerrit.wikimedia.org/r/403591 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[07:10:26] (CR) jenkins-bot: db-eqiad.php: Depool db1099 s1 and s8 [mediawiki-config] - https://gerrit.wikimedia.org/r/403591 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[07:10:53] (CR) Giuseppe Lavagetto: [C: 2] role::puppetmaster::puppetdb: add Prometheus monitoring for puppetdb [puppet] - https://gerrit.wikimedia.org/r/394966 (owner: Elukey)
[07:12:12] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1099:3311 db1099:3318 - T162807 T184256 (duration: 01m 02s)
[07:12:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:12:27] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807
[07:15:39] <_joe_> expect some puppet failures, I'm installing the jmx exporter on nihal and that will restart puppetdb
[07:17:43] !log Removed 2FA from Amjaabc
[07:17:52] _joe_: thanks!
[07:17:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:18:00] going to check after you run puppet
[07:18:03] <_joe_> elukey: heya, worked like a charm
[07:18:08] <_joe_> see on nihal
[07:18:10] awesome
[07:18:17] PROBLEM - puppet last run on mw2224 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[07:18:36] <_joe_> I already found a thing that made me laugh about puppetlabs, btw. "999th percentile"
[07:19:40] curl http://10.192.16.184:9400/metrics -s | grep -v "#" |sort works perfectly \o/
[07:21:08] PROBLEM - puppet last run on ms-fe2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:21:38] PROBLEM - puppet last run on db1063 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:21:38] PROBLEM - puppet last run on es2011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:21:38] PROBLEM - puppet last run on puppetmaster1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:22:07] PROBLEM - puppet last run on cp4023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:22:11] the only strange thing is that puppetdb on nihal runs with -Xmx4G
[07:22:47] <_joe_> elukey: on nitrogen as well, I must have messed up something
[07:23:01] <_joe_> lemme see
[07:23:54] <_joe_> heh yeah
[07:23:57] and pcc was showing it up https://puppet-compiler.wmflabs.org/compiler02/9689/nitrogen.eqiad.wmnet/
[07:23:58] <_joe_> PEBKAC
[07:24:10] I didn't see it in the review sorry :(
[07:24:17] PROBLEM - puppet last run on rdb1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:24:27] PROBLEM - puppet last run on wtp1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:24:27] PROBLEM - puppet last run on wtp1027 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:24:27] PROBLEM - puppet last run on db1080 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[07:24:27] PROBLEM - puppet last run on poolcounter1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:24:41] <_joe_> this is expected ^^
[07:24:51] !log Drop external_user table from s3 - T184247
[07:25:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:03] T184247: Drop `external_user` from all databases - https://phabricator.wikimedia.org/T184247
[07:25:40] (PS1) Giuseppe Lavagetto: puppetdb: fix hiera key [puppet] - https://gerrit.wikimedia.org/r/403600
[07:26:02] <_joe_> elukey: ^^
[07:26:17] PROBLEM - puppet last run on ms-be1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:26:27] PROBLEM - puppet last run on restbase-dev1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:26:38] PROBLEM - puppet last run on elastic1041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:27:35] (CR) Giuseppe Lavagetto: [C: 2] puppetdb: fix hiera key [puppet] - https://gerrit.wikimedia.org/r/403600 (owner: Giuseppe Lavagetto)
[07:27:53] ah snap
[07:33:57] PROBLEM - puppet last run on mw1287 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:34:07] PROBLEM - puppet last run on mw1220 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:34:18] RECOVERY - puppet last run on wtp1037 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:34:47] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:36:17] PROBLEM - puppet last run on dbproxy1011 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[07:36:18] PROBLEM - puppet last run on wtp1042 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:36:18] PROBLEM - puppet last run on es1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:36:18] PROBLEM - puppet last run on ms-fe1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:36:28] PROBLEM - puppet last run on releases1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:36:38] PROBLEM - puppet last run on wtp1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:36:47] PROBLEM - puppet last run on mw1233 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:36:47] PROBLEM - puppet last run on scb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:36:58] PROBLEM - puppet last run on phab2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:36:58] PROBLEM - puppet last run on labsdb1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:37:27] PROBLEM - puppet last run on labsdb1011 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[07:40:18] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - https://gerrit.wikimedia.org/r/403601
[07:40:27] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - https://gerrit.wikimedia.org/r/403601
[07:41:55] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - https://gerrit.wikimedia.org/r/403601 (owner: Marostegui)
[07:43:25] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - https://gerrit.wikimedia.org/r/403601 (owner: Marostegui)
[07:43:41] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - https://gerrit.wikimedia.org/r/403601 (owner: Marostegui)
[07:44:43] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1082 - T174569 (duration: 01m 03s)
[07:44:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:44:56] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[07:46:06] (PS1) Marostegui: db-eqiad.php: Depool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/403604 (https://phabricator.wikimedia.org/T174569)
[07:46:07] RECOVERY - puppet last run on ms-fe2006 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[07:46:38] RECOVERY - puppet last run on es2011 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[07:47:25] !log reboot remaining mediawiki API servers for kernel security update (along with update to HHVM 3.18.6)
[07:47:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:47:40] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/403604 (https://phabricator.wikimedia.org/T174569) (owner: Marostegui)
[07:48:17] RECOVERY - puppet last run on mw2224 is OK: OK: Puppet is currently
enabled, last run 1 minute ago with 0 failures
[07:49:14] (Merged) jenkins-bot: db-eqiad.php: Depool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/403604 (https://phabricator.wikimedia.org/T174569) (owner: Marostegui)
[07:49:24] (CR) jenkins-bot: db-eqiad.php: Depool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/403604 (https://phabricator.wikimedia.org/T174569) (owner: Marostegui)
[07:50:36] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1110 - T174569 (duration: 01m 03s)
[07:50:43] !log Deploy schema change on db1110 - T174569
[07:50:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:50:48] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[07:50:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:51:17] RECOVERY - puppet last run on ms-be1031 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[07:51:27] RECOVERY - puppet last run on restbase-dev1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:51:38] RECOVERY - puppet last run on db1063 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:51:38] RECOVERY - puppet last run on elastic1041 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:51:38] RECOVERY - puppet last run on puppetmaster1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:51:58] RECOVERY - puppet last run on cp4023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:54:17] RECOVERY - puppet last run on rdb1008 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:54:18] RECOVERY - puppet last run on wtp1027 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:54:27] RECOVERY - puppet last run on db1080 is OK: OK: Puppet is currently
enabled, last run 4 minutes ago with 0 failures
[07:54:27] RECOVERY - puppet last run on poolcounter1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[08:02:27] RECOVERY - puppet last run on labsdb1011 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[08:03:57] RECOVERY - puppet last run on mw1287 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:04:07] RECOVERY - puppet last run on mw1220 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:04:47] RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:05:58] PROBLEM - puppet last run on elastic2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:06:17] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:06:17] RECOVERY - puppet last run on dbproxy1011 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[08:06:17] RECOVERY - puppet last run on wtp1042 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[08:06:17] RECOVERY - puppet last run on es1015 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[08:06:18] RECOVERY - puppet last run on ms-fe1008 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[08:06:27] PROBLEM - puppet last run on wtp2006 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[08:06:28] RECOVERY - puppet last run on releases1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[08:06:38] RECOVERY - puppet last run on wtp1046 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[08:06:47] RECOVERY - puppet last run on mw1233 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[08:06:47] RECOVERY - puppet last run on scb1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[08:06:58] RECOVERY - puppet last run on phab2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[08:07:00] RECOVERY - puppet last run on labsdb1010 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[08:07:18] PROBLEM - puppet last run on cp2021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:07:27] PROBLEM - puppet last run on mw2201 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:08:17] PROBLEM - puppet last run on labstore2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:08:25] <_joe_> uhm
[08:08:35] <_joe_> I guess nihal just ran puppet?
[08:09:52] 8m ago [08:09:56] puppetdb was restarted [08:11:06] mw2201 returns 502 from nihal [08:12:10] forced a re-run manually, worked [08:12:27] RECOVERY - puppet last run on mw2201 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [08:12:53] and I don't see weird things like ooms on nihal [08:17:56] !log rolling restart of logstash for kernel upgrade [08:18:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:21:38] !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=logstash1007.eqiad.wmnet [08:21:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:27:38] !log Fix data drifts on enwiki.archive on codfw - T162807 [08:27:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:27:49] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807 [08:31:27] RECOVERY - puppet last run on wtp2006 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [08:31:48] 10Operations, 10Wikimedia-Logstash: logstash group1 dashboard incorrectly shows testwikidatawiki - https://phabricator.wikimedia.org/T184655#3892348 (10Addshore) [08:32:18] RECOVERY - puppet last run on cp2021 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [08:33:17] RECOVERY - puppet last run on labstore2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:35:58] RECOVERY - puppet last run on elastic2006 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [08:36:17] RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [08:36:53] !log powercycling wtp2013 (apparently didn't come back up after reboot) [08:37:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:38:57] RECOVERY - Host wtp2013 is UP: PING OK - Packet loss = 0%, RTA = 0.32 
ms [08:45:38] (03CR) 10Jcrespo: "Ok" [puppet] - 10https://gerrit.wikimedia.org/r/403451 (https://phabricator.wikimedia.org/T176243) (owner: 10Jcrespo) [08:46:58] (03CR) 10Marostegui: "> Ok" [puppet] - 10https://gerrit.wikimedia.org/r/403451 (https://phabricator.wikimedia.org/T176243) (owner: 10Jcrespo) [08:48:54] (03PS1) 10Reedy: Add fonts-nono to mediawiki::packages::fonts [puppet] - 10https://gerrit.wikimedia.org/r/403605 (https://phabricator.wikimedia.org/T184664) [08:49:38] (03PS2) 10Reedy: Add fonts-noto to mediawiki::packages::fonts [puppet] - 10https://gerrit.wikimedia.org/r/403605 (https://phabricator.wikimedia.org/T184664) [08:50:17] 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3891777 (10Reedy) https://packages.debian.org/jessie/fonts-noto [08:55:40] !log Upgrade db1110 kernel - T184256 [08:55:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:57:00] !log reboot remaining mediawiki app servers in eqiad for kernel security update (along with update to HHVM 3.18.6) [08:57:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:00:01] !log logstash rolling restart completed [09:00:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:02:50] (03PS1) 10Marostegui: db-eqiad.php: Repool db1110 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403606 (https://phabricator.wikimedia.org/T184256) [09:04:22] (03CR) 10jerkins-bot: [V: 04-1] db-eqiad.php: Repool db1110 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403606 (https://phabricator.wikimedia.org/T184256) (owner: 10Marostegui) [09:04:29] !log reboot analytics1051->1054 for kernel updates [09:04:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:07:01] (03PS2) 10Jcrespo: mariadb: Promote db2040 to be the 
codfw-s7 master instead of db2029 [puppet] - 10https://gerrit.wikimedia.org/r/403451 (https://phabricator.wikimedia.org/T176243) [09:08:20] !log reboot of relforge* for kernel upgrade [09:08:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:14:17] 10Operations, 10monitoring: Netbox: postgres cannot be restarted w/ current config - https://phabricator.wikimedia.org/T184634#3892377 (10Volans) [09:24:02] (03PS4) 10Filippo Giunchedi: graphite: cleanup stale ORES metrics [puppet] - 10https://gerrit.wikimedia.org/r/401917 (https://phabricator.wikimedia.org/T169969) [09:24:23] (03CR) 10Marostegui: [C: 031] mariadb: Promote db2040 to be the codfw-s7 master instead of db2029 [puppet] - 10https://gerrit.wikimedia.org/r/403451 (https://phabricator.wikimedia.org/T176243) (owner: 10Jcrespo) [09:24:58] !log relforge reboot completed [09:25:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:25:28] (03PS3) 10Hashar: contint: convert Apache proxying to profiles [puppet] - 10https://gerrit.wikimedia.org/r/399311 [09:25:30] (03PS2) 10Marostegui: db-eqiad.php: Repool db1110 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403606 (https://phabricator.wikimedia.org/T184256) [09:25:32] (03CR) 10Filippo Giunchedi: [C: 032] graphite: cleanup stale ORES metrics [puppet] - 10https://gerrit.wikimedia.org/r/401917 (https://phabricator.wikimedia.org/T169969) (owner: 10Filippo Giunchedi) [09:27:49] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1110 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403606 (https://phabricator.wikimedia.org/T184256) (owner: 10Marostegui) [09:27:54] 10Operations, 10monitoring: Netbox: postgres cannot be restarted w/ current config - https://phabricator.wikimedia.org/T184634#3892416 (10Volans) [09:29:13] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1110 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403606 
(https://phabricator.wikimedia.org/T184256) (owner: 10Marostegui) [09:29:23] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1110 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403606 (https://phabricator.wikimedia.org/T184256) (owner: 10Marostegui) [09:31:03] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1110 with low weight - T174569 (duration: 01m 08s) [09:31:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:31:16] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569 [09:32:35] !log cleanup ores metrics older than 30d - T169969 [09:32:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:32:46] T169969: Regularly purge old ores graphite metrics - https://phabricator.wikimedia.org/T169969 [09:34:06] 10Operations, 10monitoring, 10Graphite, 10User-fgiunchedi: Audit groups of metrics in Graphite that allocate a lot of disk space - https://phabricator.wikimedia.org/T1075#3892439 (10fgiunchedi) [09:34:11] 10Operations, 10ORES, 10Graphite, 10Patch-For-Review, and 2 others: Regularly purge old ores graphite metrics - https://phabricator.wikimedia.org/T169969#3892437 (10fgiunchedi) 05Open>03Resolved All done! Agreed the parameter isn't the best, and naming is hard :( This task is done from my POV so tenta... 
[09:34:38] !log reboot analytics1055->1058 for kernel updates [09:34:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:35:22] (03CR) 10Hashar: "Rebased and compiled again at https://puppet-compiler.wmflabs.org/compiler02/9697/" [puppet] - 10https://gerrit.wikimedia.org/r/399311 (owner: 10Hashar) [09:36:02] (03CR) 10Alexandros Kosiaris: [C: 032] hfst: New upstream release [debs/contenttranslation/hfst] - 10https://gerrit.wikimedia.org/r/394967 (https://phabricator.wikimedia.org/T181463) (owner: 10KartikMistry) [09:40:43] 10Operations, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10hardware-requests: Give misc dump crons their own host - https://phabricator.wikimedia.org/T181936#3892448 (10Nikerabbit) For Content Translation we are expecting a stable increase in dumps size. See https://en.wikipedia.org/wiki/Special:Co... [09:41:26] !log reboot bast4002 for kernel security update [09:41:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:44:00] 10Operations, 10netops, 10Patch-For-Review: Evaluate NetBox as a Racktables replacement & IPAM - https://phabricator.wikimedia.org/T170144#3420547 (10Volans) While trying to fix the issues after the reboot for the kernel upgrade, I've opened T184634. But now it seems that the Postgres DB is empty (no tables... 
[09:54:07] (03Abandoned) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [puppet/zookeeper] - 10https://gerrit.wikimedia.org/r/66906 (owner: 10Hashar) [09:55:01] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403610 [09:55:39] (03Abandoned) 10Hashar: Add .gitreview [software/conftool] - 10https://gerrit.wikimedia.org/r/392795 (owner: 10Hashar) [09:55:42] (03Abandoned) 10Hashar: Fix flake8 issues [software/conftool] - 10https://gerrit.wikimedia.org/r/392793 (owner: 10Hashar) [09:57:09] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403610 (owner: 10Marostegui) [09:58:36] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403610 (owner: 10Marostegui) [09:58:46] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403610 (owner: 10Marostegui) [10:00:25] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1110 weight (duration: 01m 06s) [10:00:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:00:48] !log reboot analytics1059-61 for kernel updates [10:00:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:02:16] (03CR) 10Filippo Giunchedi: "LGTM, I haven't checked in detail the code on mw side, though when a file is patrolled does the header get removed altogether from swift?" 
[puppet] - 10https://gerrit.wikimedia.org/r/402471 (https://phabricator.wikimedia.org/T167400) (owner: 10Gergő Tisza) [10:06:10] !log rebooting rhenium for kernel security update [10:06:12] (03PS1) 10Jcrespo: mariadb: Move db2054 away from /tmp [puppet] - 10https://gerrit.wikimedia.org/r/403611 (https://phabricator.wikimedia.org/T148507) [10:06:18] (03CR) 10Filippo Giunchedi: "> LGTM, I haven't checked in detail the code on mw side, though when" [puppet] - 10https://gerrit.wikimedia.org/r/402471 (https://phabricator.wikimedia.org/T167400) (owner: 10Gergő Tisza) [10:06:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:06:27] (03PS2) 10Giuseppe Lavagetto: base::resolving: remove useless "else" clause [puppet] - 10https://gerrit.wikimedia.org/r/403439 [10:06:29] (03PS1) 10Giuseppe Lavagetto: base::resolving: properly extend the tests [puppet] - 10https://gerrit.wikimedia.org/r/403612 [10:07:31] (03PS2) 10Jcrespo: mariadb: Move db2054 away from /tmp [puppet] - 10https://gerrit.wikimedia.org/r/403611 (https://phabricator.wikimedia.org/T148507) [10:07:58] !log upgrade and restart db2054 [10:08:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:08:33] (03CR) 10Jcrespo: [C: 032] mariadb: Move db2054 away from /tmp [puppet] - 10https://gerrit.wikimedia.org/r/403611 (https://phabricator.wikimedia.org/T148507) (owner: 10Jcrespo) [10:13:12] (03PS1) 10Elukey: README.md: fix virtual env suggestions [software/druid_exporter] - 10https://gerrit.wikimedia.org/r/403613 [10:14:42] !log migrating instances off ganeti1002 for subsequent reboot for kernel security update [10:14:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:15:47] (03CR) 10Elukey: [V: 032 C: 032] README.md: fix virtual env suggestions [software/druid_exporter] - 10https://gerrit.wikimedia.org/r/403613 (owner: 10Elukey) [10:20:28] !log upload hfst_3.13.0~r3461-1+wmf1_amd64 to apt.wikimedia.org/jessie-wikimedia/main 
T181463 [10:20:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:20:40] T181463: Update hfst from upstream - https://phabricator.wikimedia.org/T181463 [10:21:13] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium] - 10https://gerrit.wikimedia.org/r/395176 (https://phabricator.wikimedia.org/T181464) (owner: 10KartikMistry) [10:21:32] (03PS3) 10Jcrespo: mariadb: Promote db2040 to be the codfw-s7 master instead of db2029 [puppet] - 10https://gerrit.wikimedia.org/r/403451 (https://phabricator.wikimedia.org/T176243) [10:21:34] (03PS1) 10Jcrespo: mariadb: Move db2051 socket away from /tmp [puppet] - 10https://gerrit.wikimedia.org/r/403614 (https://phabricator.wikimedia.org/T148507) [10:22:03] (03PS2) 10Jcrespo: mariadb: Move db2061 socket away from /tmp [puppet] - 10https://gerrit.wikimedia.org/r/403614 (https://phabricator.wikimedia.org/T148507) [10:23:01] (03CR) 10Jcrespo: [C: 032] mariadb: Move db2061 socket away from /tmp [puppet] - 10https://gerrit.wikimedia.org/r/403614 (https://phabricator.wikimedia.org/T148507) (owner: 10Jcrespo) [10:23:09] (03PS3) 10Jcrespo: mariadb: Move db2061 socket away from /tmp [puppet] - 10https://gerrit.wikimedia.org/r/403614 (https://phabricator.wikimedia.org/T148507) [10:23:52] !log upgrade and restart db2061 [10:24:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:25:27] (03CR) 10Alexandros Kosiaris: [C: 032] apertium: Update for new hfst [debs/contenttranslation/apertium] - 10https://gerrit.wikimedia.org/r/395176 (https://phabricator.wikimedia.org/T181464) (owner: 10KartikMistry) [10:25:43] (03PS1) 10Marostegui: db-eqiad.php: Restore db1110 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403615 [10:27:33] !log rolling reboot of sca/zotero clusters for kernel security update [10:27:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:28:14] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Restore 
db1110 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403615 (owner: 10Marostegui) [10:29:47] (03Merged) 10jenkins-bot: db-eqiad.php: Restore db1110 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403615 (owner: 10Marostegui) [10:30:01] (03CR) 10jenkins-bot: db-eqiad.php: Restore db1110 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403615 (owner: 10Marostegui) [10:31:20] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Restore db1110 original weight (duration: 01m 04s) [10:31:23] 10Operations, 10Chinese-Sites, 10I18n: Deploy Noto fonts or their derivatives for Chinese (and J&K?) - https://phabricator.wikimedia.org/T180924#3773371 (10Mahir256) Per T184664 Noto fonts will //soon// be available for rendering SVG images. [10:31:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:36:15] (03PS22) 10TerraCodes: Add wikidata and mediawiki.org to $wgLocalVirtualHosts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392999 (https://phabricator.wikimedia.org/T117302) [10:40:08] (03PS1) 10Faidon Liambotis: utils: fix Style/GlobalVars cop violations [puppet] - 10https://gerrit.wikimedia.org/r/403616 [10:40:34] (03CR) 10Faidon Liambotis: [C: 032] utils: fix Style/GlobalVars cop violations [puppet] - 10https://gerrit.wikimedia.org/r/403616 (owner: 10Faidon Liambotis) [10:40:39] (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/403439 (owner: 10Giuseppe Lavagetto) [10:41:08] !log upgrade and restart db2068 [10:41:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:43:22] !log Upgrade and restart db1099:3311 and db1099:3318 [10:43:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:44:54] db1099 might page :-( [10:45:19] I thought I downtimed it earlier, but I didn't. I just downtimed it, but I'm not sure if I was faster than icinga [10:46:32] * elukey hugs marostegui [10:46:45] *
marostegui appreciates it [10:46:52] fixed [10:46:57] even if I do not get hugs [10:47:18] Ah, you disabled notifications! [10:47:19] thanks [10:47:23] * marostegui hugs jynus [10:47:48] !log rebooting tin for kernel security update [10:48:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:48:07] what is db1099? [10:48:17] recentchanges 3311 and 3318 [10:48:22] ok [10:49:03] so "disable notification" will disable notifications instantly [10:49:23] downtime doesn't work because it will only ignore future downs, not if we are already degraded [10:50:22] yeah, I don't know if I was faster than icinga detecting the degradation on 3311 [10:50:42] tin is back up [10:50:45] normally there is time [10:51:01] because replication checks were extended in time and retries [10:51:28] also icinga has lately been very responsive attending the queue, probably due to akosiaris' fixes [10:51:52] I've done no fixes [10:52:14] I see what you are doing there and refuse any kind of involvement [10:52:16] :P [10:52:29] well, icinga has been much more responsive in the latest weeks/months [10:52:39] it used to take 30 seconds to process a command [10:52:44] now it takes very few [10:52:55] !log rearmed keyholder on tin [10:53:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:53:59] jynus: ok that's good. Let's pin it on volans though. He is the icinga expert. He is even reading the code !!!! [10:54:15] * volans hides [10:54:40] !log set kvm:migration_downtime to 30ms for both eqiad/codfw ganeti clusters. Then set migration_downtime 30000 for nitrogen/nihal [10:54:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:54:58] and if anyone asks why, the answer is java [10:55:24] marostegui: can you take care of removing the alert disables on db1099 when ready? [10:55:33] I am moving to other hosts [10:55:40] jynus: yep, will do!
thanks [10:55:50] thanks to you [10:56:19] (03PS1) 10Hashar: volans: getconf ARG_MAX [puppet] - 10https://gerrit.wikimedia.org/r/403617 [10:56:43] rotfl [10:56:52] I didn't realise volans was configured using puppet [10:56:52] !log upload apertium_3.4.2~r68466-3+wmf1_amd64 to apt.wikimedia.org/jessie-wikimedia/main T181464 [10:57:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:57:04] T181464: Update Apertium for new hfst - https://phabricator.wikimedia.org/T181464 [10:57:10] Reedy: I'm not real, I'm an AI... you got me [10:57:36] (03Abandoned) 10Hashar: volans: getconf ARG_MAX [puppet] - 10https://gerrit.wikimedia.org/r/403617 (owner: 10Hashar) [10:57:38] (03PS2) 10Giuseppe Lavagetto: base::resolving: properly extend the tests [puppet] - 10https://gerrit.wikimedia.org/r/403612 [10:57:44] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-arg-cat] - 10https://gerrit.wikimedia.org/r/397218 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [10:58:13] (03CR) 10jerkins-bot: [V: 04-1] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-arg-cat] - 10https://gerrit.wikimedia.org/r/397218 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [10:58:39] (03CR) 10Volans: [C: 031] "LGTM (nitpick inside)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/403612 (owner: 10Giuseppe Lavagetto) [11:02:33] !log migrating instances off ganeti1003 for subsequent reboot for kernel security update [11:02:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:58] !log upload cg3_1.0.0~r12254-1+wmf1_amd64 to apt.wikimedia.org/jessie-wikimedia/main [11:03:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:03:39] 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3891777 (10Mahir256)
I just noticed, @Reedy, that only a select few Noto fonts are actually in the noto-fonts packag... [11:07:02] !log reboot remaining job runners in eqiad for kernel security update (along with update to HHVM 3.18.6) [11:07:07] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-arg-cat] - 10https://gerrit.wikimedia.org/r/397218 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:07:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:07:52] (03PS3) 10Giuseppe Lavagetto: base::resolving: properly extend the tests [puppet] - 10https://gerrit.wikimedia.org/r/403612 [11:07:59] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-arg-cat] - 10https://gerrit.wikimedia.org/r/397218 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:04] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-arg-cat] - 10https://gerrit.wikimedia.org/r/397218 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:06] 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3892719 (10Reedy) >>! In T184664#3892703, @Mahir256 wrote: > I just noticed, @Reedy, that only a select few Noto fon... 
[11:08:08] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-urd-hin] - 10https://gerrit.wikimedia.org/r/403353 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:10] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-urd] - 10https://gerrit.wikimedia.org/r/403352 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:10] sorry about what's going to follow [11:08:12] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-tur] - 10https://gerrit.wikimedia.org/r/403351 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:14] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-tat] - 10https://gerrit.wikimedia.org/r/403350 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:16] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-swe-nor] - 10https://gerrit.wikimedia.org/r/403347 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:20] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-swe-dan] - 10https://gerrit.wikimedia.org/r/403346 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:22] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-swe] - 10https://gerrit.wikimedia.org/r/403345 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:25] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-srd-ita] - 10https://gerrit.wikimedia.org/r/403344 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:27] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-cat-srd] - 10https://gerrit.wikimedia.org/r/403340 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:28] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-cat] - 10https://gerrit.wikimedia.org/r/403339 
(https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:30] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-srd] - 10https://gerrit.wikimedia.org/r/403159 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:32] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-spa-arg] - 10https://gerrit.wikimedia.org/r/403136 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:34] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-spa] - 10https://gerrit.wikimedia.org/r/403135 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:36] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-sme-nob] - 10https://gerrit.wikimedia.org/r/403134 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:38] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-rus] - 10https://gerrit.wikimedia.org/r/403118 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:40] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-nob] - 10https://gerrit.wikimedia.org/r/403105 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:41] ;D [11:08:42] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-nno-nob] - 10https://gerrit.wikimedia.org/r/403101 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:44] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-nno] - 10https://gerrit.wikimedia.org/r/403100 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:46] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-mlt-ara] - 10https://gerrit.wikimedia.org/r/403099 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:48] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium] - 
10https://gerrit.wikimedia.org/r/397217 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:49] <_joe_> akosiaris: u flooding us with your script [11:08:50] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-is-sv] - 10https://gerrit.wikimedia.org/r/397527 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:52] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-mk-en] - 10https://gerrit.wikimedia.org/r/397742 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:54] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-mk-bg] - 10https://gerrit.wikimedia.org/r/397741 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:56] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-kaz-tat] - 10https://gerrit.wikimedia.org/r/397542 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:08:58] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-kaz] - 10https://gerrit.wikimedia.org/r/397540 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:00] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-ita] - 10https://gerrit.wikimedia.org/r/397528 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:02] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-isl-eng] - 10https://gerrit.wikimedia.org/r/397512 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:04] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-isl] - 10https://gerrit.wikimedia.org/r/397502 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:06] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-hin] - 10https://gerrit.wikimedia.org/r/397494 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:07] 
_joe_: obviously.. but I did apologize in advance [11:09:08] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-hbs-slv] - 10https://gerrit.wikimedia.org/r/397285 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:10] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-hbs-mkd] - 10https://gerrit.wikimedia.org/r/397284 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:12] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-hbs-eng] - 10https://gerrit.wikimedia.org/r/397283 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:14] (03CR) 10Giuseppe Lavagetto: [C: 032] base::resolving: properly extend the tests [puppet] - 10https://gerrit.wikimedia.org/r/403612 (owner: 10Giuseppe Lavagetto) [11:09:16] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-hbs] - 10https://gerrit.wikimedia.org/r/397279 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:18] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-fra] - 10https://gerrit.wikimedia.org/r/397234 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:20] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-fra-cat] - 10https://gerrit.wikimedia.org/r/397233 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:22] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-eus] - 10https://gerrit.wikimedia.org/r/397231 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:24] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-eo-es] - 10https://gerrit.wikimedia.org/r/397230 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:26] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-dan-nor] - 10https://gerrit.wikimedia.org/r/397229 
(https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:30] I wonder if jenkins is going to survive [11:09:30] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-dan] - 10https://gerrit.wikimedia.org/r/397228 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:33] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-cy-en] - 10https://gerrit.wikimedia.org/r/397227 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:34] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-crh-tur] - 10https://gerrit.wikimedia.org/r/397226 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:35] hashar: best to keep an eye on it [11:09:36] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-crh] - 10https://gerrit.wikimedia.org/r/397225 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:38] <_joe_> I doubt it [11:09:38] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-br-fr] - 10https://gerrit.wikimedia.org/r/397222 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:40] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-bel-rus] - 10https://gerrit.wikimedia.org/r/397221 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:42] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-bel] - 10https://gerrit.wikimedia.org/r/397220 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:44] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-arg] - 10https://gerrit.wikimedia.org/r/397219 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:09:48] "I'm not slacking, I just can't get any work done because akosiaris fscked jenkins up for the rest of the day" [11:09:52] <_joe_> zuul goes down with a patchset of 10 pages 
[11:10:03] .... [11:10:24] (03CR) 10jerkins-bot: [V: 04-1] apertium-swe-nor: Updated dependency on cg3 [debs/contenttranslation/apertium-swe-nor] - 10https://gerrit.wikimedia.org/r/403347 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:10:30] good start [11:10:30] (03CR) 10jerkins-bot: [V: 04-1] apertium-cat-srd: New upstream and updated dependencies [debs/contenttranslation/apertium-cat-srd] - 10https://gerrit.wikimedia.org/r/403340 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:10:31] see? it's working! [11:10:33] rofl [11:10:35] I accidentally CI [11:10:37] V-1 for everyone [11:10:41] akosiaris: that debian-glue job runs on a set of 4 instances which are barely used by anything else. So that should be all fine [11:11:06] akosiaris: and Zuul kindly queues the changes and waits for one of the 4 instances to be available to run another job [11:11:28] there ... eventuality rules!!! [11:11:29] <_joe_> hashar: http://s2.quickmeme.com/img/01/01601154924d2b0144f2dd8ab01c25933dc78a3b8a5b94be3786935a8ec600dc.jpg [11:11:39] _joe_: ahah [11:11:56] 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3892727 (10Reedy) So basically, we can ship this now, and then a task can be filed to add the others *after* T174431... [11:13:33] 10Operations, 10Analytics, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack/setup/install conf1004-conf1006 - https://phabricator.wikimedia.org/T166081#3284834 (10MoritzMuehlenhoff) Given that this task is stalled for a while now, we should reimage these servers with stretch before eventually putting...
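The 11:10:41 and 11:11:06 messages above describe the rechecks being queued and handed to a pool of 4 instances as each one frees up. A minimal sketch of that queueing behaviour follows. It is not Zuul's actual scheduler: the `schedule` function, the 5-minute job duration, and the change names are all hypothetical, chosen only for illustration.

```python
from collections import deque


def schedule(changes, workers=4, job_minutes=5):
    """Return (change, start_minute, worker) tuples for a fixed-size pool.

    Hypothetical model: every job takes job_minutes, and a queued change
    starts as soon as any worker is idle.
    """
    queue = deque(changes)
    free_at = [0] * workers  # minute at which each worker becomes idle
    plan = []
    while queue:
        change = queue.popleft()
        # Pick the worker that frees up first; the change waits until then.
        worker = min(range(workers), key=lambda w: free_at[w])
        start = free_at[worker]
        free_at[worker] = start + job_minutes
        plan.append((change, start, worker))
    return plan


if __name__ == "__main__":
    # Ten queued rechecks (names are made up) on a pool of four instances.
    for change, start, worker in schedule([f"change-{i}" for i in range(10)]):
        print(f"{change}: starts at minute {start} on instance {worker}")
```

Under these assumptions a burst of rechecks only lengthens the queue: at most 4 jobs run at any time, and the rest wait their turn.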
[11:14:14] !log migrating instances off ganeti1004 for subsequent reboot for kernel security update [11:14:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:15:02] (03CR) 10jerkins-bot: [V: 04-1] apertium-nno: Update dependency on cg3 [debs/contenttranslation/apertium-nno] - 10https://gerrit.wikimedia.org/r/403100 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:15:09] moritzm: it might be difficult since we'd need to test zookeeper and etcd stretch versions, me and _joe_ are planning to work on those hosts soon [11:15:43] (03PS2) 10Elukey: role::analytics_cluster::coordinator: add a profile to restart streaming jobs [puppet] - 10https://gerrit.wikimedia.org/r/395504 (https://phabricator.wikimedia.org/T176983) (owner: 10Joal) [11:15:46] <_joe_> elukey: I can take a look as far as etcd goes, moving to stretch would be a good thing probably [11:16:15] (03CR) 10jerkins-bot: [V: 04-1] role::analytics_cluster::coordinator: add a profile to restart streaming jobs [puppet] - 10https://gerrit.wikimedia.org/r/395504 (https://phabricator.wikimedia.org/T176983) (owner: 10Joal) [11:16:26] elukey: we'll need all those on stretch eventually anyway :-) [11:17:55] <_joe_> moritzm: since etcd has been removed as a package from both stretch and buster, I think I'll go on and do some shameful packaging from now on for it, btw [11:18:04] ack [11:18:13] 10Operations, 10DBA, 10Goal: Generate consistent logical database backups in CODFW - https://phabricator.wikimedia.org/T184699#3892765 (10jcrespo) [11:18:17] <_joe_> I'm also evaluating installing etcd3 on those machines, so we can work on an eventual transition [11:18:33] 10Operations, 10DBA, 10Goal: Generate consistent logical database backups in CODFW - https://phabricator.wikimedia.org/T184699#3892785 (10Marostegui) p:05Triage>03Normal [11:18:59] I really hate that stupid fringe ports like powerpc64le keep packages like this out of testing [11:20:00] (03CR) 10Giuseppe 
Lavagetto: "https://puppet-compiler.wmflabs.org/compiler02/9698/ says this is a noop." [puppet] - 10https://gerrit.wikimedia.org/r/403439 (owner: 10Giuseppe Lavagetto) [11:20:07] (03PS3) 10Giuseppe Lavagetto: base::resolving: remove useless "else" clause [puppet] - 10https://gerrit.wikimedia.org/r/403439 [11:20:09] all right so I'll check the current stretch zk version :) [11:20:25] 10Operations, 10DBA, 10Goal: Generate consistent logical database backups in CODFW - https://phabricator.wikimedia.org/T184699#3892789 (10jcrespo) [11:20:49] I don't know if our strategy of expanding the current zk/etcd clusters from 3 to 6 nodes would work though with different sw versions [11:21:20] (03CR) 10jerkins-bot: [V: 04-1] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-hbs-mkd] - 10https://gerrit.wikimedia.org/r/397284 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:21:26] akosiaris: I'll look into failures.. [11:22:21] elukey: stretch and jessie use the same 3.4.x series, probably just fine. worst case we build 3.4.5 packages for stretch and upgrade the cluster to 3.4.9 in a followup step [11:22:27] (03PS3) 10Elukey: role::analytics_cluster::coordinator: add a profile to restart streaming jobs [puppet] - 10https://gerrit.wikimedia.org/r/395504 (https://phabricator.wikimedia.org/T176983) (owner: 10Joal) [11:22:33] moritzm: ack! 
[11:22:34] (03CR) 10jerkins-bot: [V: 04-1] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-eo-es] - 10https://gerrit.wikimedia.org/r/397230 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:23:22] (03CR) 10jerkins-bot: [V: 04-1] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-dan-nor] - 10https://gerrit.wikimedia.org/r/397229 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:23:31] (03CR) 10Giuseppe Lavagetto: [C: 032] base::resolving: remove useless "else" clause [puppet] - 10https://gerrit.wikimedia.org/r/403439 (owner: 10Giuseppe Lavagetto) [11:26:44] (03PS7) 10Gilles: Smarter Varnish slow log [puppet] - 10https://gerrit.wikimedia.org/r/399176 (https://phabricator.wikimedia.org/T181315) [11:26:52] (03CR) 10Gilles: Smarter Varnish slow log (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/399176 (https://phabricator.wikimedia.org/T181315) (owner: 10Gilles) [11:27:09] (03PS8) 10Gilles: Smarter Varnish slow log [puppet] - 10https://gerrit.wikimedia.org/r/399176 (https://phabricator.wikimedia.org/T181315) [11:27:21] (03CR) 10KartikMistry: "Depends on newer apertium-cat and apertium-srd." [debs/contenttranslation/apertium-cat-srd] - 10https://gerrit.wikimedia.org/r/403340 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:29:40] (03CR) 10Joal: "Updates to the spark command line, then ready for me !" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/395504 (https://phabricator.wikimedia.org/T176983) (owner: 10Joal) [11:29:44] (03CR) 10Gilles: "https://puppet-compiler.wmflabs.org/compiler03/9700/" [puppet] - 10https://gerrit.wikimedia.org/r/399176 (https://phabricator.wikimedia.org/T181315) (owner: 10Gilles) [11:29:49] elukey: --^ [11:31:40] (03PS4) 10Elukey: role::analytics_cluster::coordinator: add a profile to restart streaming jobs [puppet] - 10https://gerrit.wikimedia.org/r/395504 (https://phabricator.wikimedia.org/T176983) (owner: 10Joal) [11:31:45] joal: --^ :_ [11:31:47] :) [11:31:49] 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3892865 (10jcrespo) [11:32:06] (03PS1) 10Filippo Giunchedi: hieradata: extend SMART eqiad deployment [puppet] - 10https://gerrit.wikimedia.org/r/403621 (https://phabricator.wikimedia.org/T86552) [11:32:38] (03CR) 10Filippo Giunchedi: [C: 031] Add fonts-noto to mediawiki::packages::fonts [puppet] - 10https://gerrit.wikimedia.org/r/403605 (https://phabricator.wikimedia.org/T184664) (owner: 10Reedy) [11:33:04] (03CR) 10Joal: [C: 031] "Good for me !" 
[puppet] - 10https://gerrit.wikimedia.org/r/395504 (https://phabricator.wikimedia.org/T176983) (owner: 10Joal) [11:33:44] !log migrating instances off ganeti1005 for subsequent reboot for kernel security update [11:33:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:34:21] (03CR) 10Reedy: ".15 is everywhere now, so this could go out whenever" [puppet] - 10https://gerrit.wikimedia.org/r/399434 (owner: 10Reedy) [11:36:07] (03PS5) 10Elukey: role::analytics_cluster::coordinator: add a profile to restart streaming jobs [puppet] - 10https://gerrit.wikimedia.org/r/395504 (https://phabricator.wikimedia.org/T176983) (owner: 10Joal) [11:36:10] (03PS2) 10KartikMistry: apertium-eo-es: Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-eo-es] - 10https://gerrit.wikimedia.org/r/397230 (https://phabricator.wikimedia.org/T171406) [11:36:50] (03PS6) 10Elukey: role::analytics_cluster::coordinator: add a profile to restart streaming jobs [puppet] - 10https://gerrit.wikimedia.org/r/395504 (https://phabricator.wikimedia.org/T176983) (owner: 10Joal) [11:39:36] (03CR) 10Elukey: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/9703/analytics1003.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/395504 (https://phabricator.wikimedia.org/T176983) (owner: 10Joal) [11:40:56] (03PS2) 10KartikMistry: apertium-nno: Update dependency on cg3 [debs/contenttranslation/apertium-nno] - 10https://gerrit.wikimedia.org/r/403100 (https://phabricator.wikimedia.org/T171406) [11:41:05] (03CR) 10Muehlenhoff: [C: 04-1] Add fonts-noto to mediawiki::packages::fonts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/403605 (https://phabricator.wikimedia.org/T184664) (owner: 10Reedy) [11:41:53] (03CR) 10Reedy: "Not 100% sure if we want to install all of the other packages as per https://phabricator.wikimedia.org/T184664#3892703" [puppet] - 10https://gerrit.wikimedia.org/r/403605 (https://phabricator.wikimedia.org/T184664) (owner: 10Reedy) 
[11:44:08] (03PS2) 10Giuseppe Lavagetto: base::resolving: explicitly pass arguments [puppet] - 10https://gerrit.wikimedia.org/r/403440 [11:45:04] (03CR) 10jerkins-bot: [V: 04-1] base::resolving: explicitly pass arguments [puppet] - 10https://gerrit.wikimedia.org/r/403440 (owner: 10Giuseppe Lavagetto) [11:45:08] RECOVERY - Long running screen/tmux on bast2001 is OK: OK: No SCREEN or tmux processes detected. [11:50:20] (03PS1) 10Reedy: Move packages onto individual lines in require_package() for OS versions [puppet] - 10https://gerrit.wikimedia.org/r/403623 [11:55:22] (03PS3) 10Giuseppe Lavagetto: base::resolving: explicitly pass arguments [puppet] - 10https://gerrit.wikimedia.org/r/403440 [11:56:28] (03PS3) 10Reedy: Add Noto fonts to mediawiki::packages::fonts [puppet] - 10https://gerrit.wikimedia.org/r/403605 (https://phabricator.wikimedia.org/T184664) [11:56:48] (03CR) 10jerkins-bot: [V: 04-1] base::resolving: explicitly pass arguments [puppet] - 10https://gerrit.wikimedia.org/r/403440 (owner: 10Giuseppe Lavagetto) [11:57:06] (03CR) 10jerkins-bot: [V: 04-1] Add Noto fonts to mediawiki::packages::fonts [puppet] - 10https://gerrit.wikimedia.org/r/403605 (https://phabricator.wikimedia.org/T184664) (owner: 10Reedy) [11:57:09] (03Abandoned) 10KartikMistry: Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-hbs-mkd] - 10https://gerrit.wikimedia.org/r/397284 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [11:57:59] (03PS1) 10KartikMistry: apertium-hbs-mkd: New upstream snapshot and updated cg3 dependency [debs/contenttranslation/apertium-hbs-mkd] - 10https://gerrit.wikimedia.org/r/403625 (https://phabricator.wikimedia.org/T171406) [11:59:26] (03PS4) 10Reedy: Add Noto fonts to mediawiki::packages::fonts [puppet] - 10https://gerrit.wikimedia.org/r/403605 (https://phabricator.wikimedia.org/T184664) [12:02:15] (03CR) 10KartikMistry: "Requires latest apertium-nno." 
[debs/contenttranslation/apertium-swe-nor] - 10https://gerrit.wikimedia.org/r/403347 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [12:02:19] (03CR) 10KartikMistry: "Requires latest apertium-nno." [debs/contenttranslation/apertium-dan-nor] - 10https://gerrit.wikimedia.org/r/397229 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [12:02:48] That concludes all failures, akosiaris :) [12:03:14] !log migrating instances off ganeti1006 for subsequent reboot for kernel security update [12:03:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:05:34] (03PS1) 10Marostegui: db-eqiad.php: Repool db1099:3318 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403626 [12:07:27] !log Stop replication in sync db1089 db1099:3311 - T162807 [12:07:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:39] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807 [12:10:46] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1099:3318 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403626 (owner: 10Marostegui) [12:13:16] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1099:3318 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403626 (owner: 10Marostegui) [12:13:29] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1099:3318 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403626 (owner: 10Marostegui) [12:15:20] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1099:3318 with low weight (duration: 01m 44s) [12:15:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:20:57] FYI I'll restore einsteinium as active for the Icinga server in few minutes [12:21:09] (03PS1) 10Volans: Revert "Temporary failover Icinga to tegmen" [puppet] - 10https://gerrit.wikimedia.org/r/403627 [12:21:12] (03PS1) 10Volans: Revert "Temporary failover 
Icinga to tegmen" [dns] - 10https://gerrit.wikimedia.org/r/403628 [12:21:23] (03CR) 10jerkins-bot: [V: 04-1] Revert "Temporary failover Icinga to tegmen" [puppet] - 10https://gerrit.wikimedia.org/r/403627 (owner: 10Volans) [12:27:43] (03PS2) 10Volans: Revert "Temporary failover Icinga to tegmen" [puppet] - 10https://gerrit.wikimedia.org/r/403627 [12:28:21] !log Start Icinga failover back to einsteinium - T170353 [12:28:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:28:33] T170353: Icinga: timeseries checks should have the link to a graph with the data - https://phabricator.wikimedia.org/T170353 [12:29:54] volans: how are downtimes synchronized between icinga instances? I just scheduled 3 elastic nodes for reboot on tegmen... [12:30:25] gehel: I run the sync of the state during the migration, so everything should be there, but let's double check [12:30:28] which nodes? [12:30:55] elastic2003, 2027 and 2001 [12:32:06] I still see them down in the Icinga UI, but not sure you already have completed the transition [12:32:20] starting now, I'll keep an eye on those ;) [12:32:28] (03CR) 10Volans: [C: 032] Revert "Temporary failover Icinga to tegmen" [puppet] - 10https://gerrit.wikimedia.org/r/403627 (owner: 10Volans) [12:32:37] worst case, they'll raise some alerts... [12:34:08] (03CR) 10Gehel: [C: 031] "All good to me for elastic, logstash and wdqs. Feel free to add maps1.* as well if you need." [puppet] - 10https://gerrit.wikimedia.org/r/403621 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [12:34:13] 10Operations, 10Electron-PDFs, 10OfflineContentGenerator, 10Services (designing): Improve stability and maintainability of our browser-based PDF render service - https://phabricator.wikimedia.org/T172815#3510411 (10Gilles) This might be of interest: https://github.com/alvarcarto/url-to-pdf-api looks very s... 
[12:34:21] !log rebooting naos for kernel security update [12:34:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:34:38] moritzm: did you just downtime it? where? [12:34:48] * volans migrating back icinga to einsteinium [12:35:13] volans: icinga.wikimedia.org, wherever that points to ATM :-) [12:35:22] ehehe good! [12:35:46] (03CR) 10Volans: [C: 032] Revert "Temporary failover Icinga to tegmen" [dns] - 10https://gerrit.wikimedia.org/r/403628 (owner: 10Volans) [12:36:05] !log migrating instances off ganeti1007 for subsequent reboot for kernel security update [12:36:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:38:35] !log rearmed keyholder on naos [12:38:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:39:27] !log Icinga failover back to einsteinium completed - T170353 [12:39:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:39:38] T170353: Icinga: timeseries checks should have the link to a graph with the data - https://phabricator.wikimedia.org/T170353 [12:40:40] gehel: all downtimes seem to still be there [12:40:46] kool [12:40:55] all seems good [12:42:14] PROBLEM - carbon-frontend-relay metric drops on graphite1001 is CRITICAL: TEST - IGNORE - volans https://grafana.wikimedia.org/dashboard/db/graphite-eqiad?orgId=1&panelId=21&fullscreen https://grafana.wikimedia.org/dashboard/db/graphite-codfw?orgId=1&panelId=21&fullscreen [12:42:22] RECOVERY - carbon-frontend-relay metric drops on graphite1001 is OK: OK: Less than 80.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/graphite-eqiad?orgId=1&panelId=21&fullscreen https://grafana.wikimedia.org/dashboard/db/graphite-codfw?orgId=1&panelId=21&fullscreen [12:50:23] 10Operations, 10monitoring: Icinga: timeseries checks should have the link to a graph with the data - https://phabricator.wikimedia.org/T170353#3893094 (10Volans) 05Open>03Resolved TL;DR: 
Everything is back to einsteinium now, and everything is working. Resolving. The only possible explanation I have rig... [12:52:03] 10Operations, 10monitoring: Puppet fail to properly refresh Icinga - https://phabricator.wikimedia.org/T184714#3893099 (10Volans) [12:52:34] 10Operations, 10monitoring: Puppet fail to properly refresh Icinga - https://phabricator.wikimedia.org/T184714#3893110 (10Volans) [12:52:54] 10Operations, 10monitoring: Puppet fail to properly refresh Icinga - https://phabricator.wikimedia.org/T184714#3893099 (10Volans) [12:53:02] (03PS1) 10Elukey: profile::analytics::refinery::job::stream_check: fix refinery path [puppet] - 10https://gerrit.wikimedia.org/r/403635 [12:53:41] <_joe_> volans: see what we did for apache in terms of restart/reload (module httpd) [12:54:11] ok, thanks [12:55:11] (03CR) 10Elukey: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/9706/analytics1003.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/403635 (owner: 10Elukey) [12:56:12] RECOVERY - Long running screen/tmux on bast4002 is OK: OK: No SCREEN or tmux processes detected. [13:00:04] Amir1: I, the Bot under the Fountain, allow thee, The Deployer, to do Wikidata deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180111T1300). [13:00:04] Lucas_WMDE: A patch you scheduled for Wikidata deploy is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[13:02:26] I’m here :) [13:06:01] I am too [13:06:15] Lucas_WMDE: you should ping me in real life if I miss it :P [13:06:20] come here, let's do it [13:08:03] 10Operations, 10Commons, 10Thumbor, 10media-storage, 10Performance-Team (Radar): Jessie rsvg/cairo can't render specific SVG file on Commons - https://phabricator.wikimedia.org/T170628#3893148 (10Gilles) [13:08:29] (03CR) 10Ladsgroup: [C: 032] Enable caching of constraint check results [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403195 (https://phabricator.wikimedia.org/T181060) (owner: 10Lucas Werkmeister (WMDE)) [13:10:03] (03Merged) 10jenkins-bot: Enable caching of constraint check results [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403195 (https://phabricator.wikimedia.org/T181060) (owner: 10Lucas Werkmeister (WMDE)) [13:10:13] (03CR) 10jenkins-bot: Enable caching of constraint check results [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403195 (https://phabricator.wikimedia.org/T181060) (owner: 10Lucas Werkmeister (WMDE)) [13:14:03] mwdebug1002 doesn't seem happy [13:14:13] shall I move forward with mwdebug1001? 
[13:17:10] I get this weird curl php extension is not installed in mwdebug1002 node: [13:17:14] https://www.irccloud.com/pastebin/leH57baD/ [13:18:43] 10Operations, 10Puppet, 10Traffic: pybal's "can-depool" logic only takes downServers into account - https://phabricator.wikimedia.org/T184715#3893173 (10ema) [13:19:53] 10Operations, 10Puppet, 10Traffic: pybal's "can-depool" logic only takes downServers into account - https://phabricator.wikimedia.org/T184715#3893187 (10ema) p:05Triage>03High [13:27:52] !log failover the ganeti master in eqiad to ganeti1004 [13:28:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:31:18] !log migrating instances off ganeti1001 for subsequent reboot for kernel security update [13:31:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:31:52] (03CR) 10Jcrespo: [C: 031] hieradata: extend SMART eqiad deployment [puppet] - 10https://gerrit.wikimedia.org/r/403621 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [13:34:02] !log rebooting bast2001 for kernel security update [13:34:12] !log upgrade and restart db2077 [13:34:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:34:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:36:44] (03PS1) 10Marostegui: db-eqiad.php: Increase weight db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403648 [13:37:30] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-hbs-mkd: New upstream snapshot and updated cg3 dependency [debs/contenttranslation/apertium-hbs-mkd] - 10https://gerrit.wikimedia.org/r/403625 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:38:41] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-nno: Update dependency on cg3 [debs/contenttranslation/apertium-nno] - 10https://gerrit.wikimedia.org/r/403100 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:38:46] (03CR) 10Alexandros Kosiaris: 
[C: 032] apertium-eo-es: Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-eo-es] - 10https://gerrit.wikimedia.org/r/397230 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:38:51] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-bel] - 10https://gerrit.wikimedia.org/r/397220 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:38:56] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-arg] - 10https://gerrit.wikimedia.org/r/397219 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:38:59] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-br-fr] - 10https://gerrit.wikimedia.org/r/397222 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:02] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-bel-rus] - 10https://gerrit.wikimedia.org/r/397221 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:06] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-crh] - 10https://gerrit.wikimedia.org/r/397225 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:11] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-crh-tur] - 10https://gerrit.wikimedia.org/r/397226 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:14] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-cy-en] - 10https://gerrit.wikimedia.org/r/397227 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:16] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-dan] - 
10https://gerrit.wikimedia.org/r/397228 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:19] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-hbs] - 10https://gerrit.wikimedia.org/r/397279 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:22] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-eus] - 10https://gerrit.wikimedia.org/r/397231 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:26] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-fra-cat] - 10https://gerrit.wikimedia.org/r/397233 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:31] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-fra] - 10https://gerrit.wikimedia.org/r/397234 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:35] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-hbs-eng] - 10https://gerrit.wikimedia.org/r/397283 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:38] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-hbs-slv] - 10https://gerrit.wikimedia.org/r/397285 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:41] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-kaz: Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-kaz] - 10https://gerrit.wikimedia.org/r/397540 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:43] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-hin] - 10https://gerrit.wikimedia.org/r/397494 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) 
[13:39:46] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-ita] - 10https://gerrit.wikimedia.org/r/397528 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:48] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-isl-eng] - 10https://gerrit.wikimedia.org/r/397512 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:51] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-isl] - 10https://gerrit.wikimedia.org/r/397502 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:55] (03CR) 10Alexandros Kosiaris: [C: 032] apertium: Depends on new cg3 [debs/contenttranslation/apertium] - 10https://gerrit.wikimedia.org/r/397217 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:57] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-kaz-tat: Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-kaz-tat] - 10https://gerrit.wikimedia.org/r/397542 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:39:59] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-mk-en: Update dependency on cg3 [debs/contenttranslation/apertium-mk-en] - 10https://gerrit.wikimedia.org/r/397742 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:02] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-mk-bg: Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-mk-bg] - 10https://gerrit.wikimedia.org/r/397741 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:04] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-is-sv: New upstream release and cg3 update [debs/contenttranslation/apertium-is-sv] - 10https://gerrit.wikimedia.org/r/397527 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:07] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-mlt-ara: Update 
dependency on cg3 [debs/contenttranslation/apertium-mlt-ara] - 10https://gerrit.wikimedia.org/r/403099 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:09] Amir1: are you done with your deployment window? [13:40:09] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-nob: Update dependency on cg3 [debs/contenttranslation/apertium-nob] - 10https://gerrit.wikimedia.org/r/403105 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:13] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-sme-nob: Updated dependency on cg3 [debs/contenttranslation/apertium-sme-nob] - 10https://gerrit.wikimedia.org/r/403134 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:15] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-nno-nob: Update dependency on cg3 [debs/contenttranslation/apertium-nno-nob] - 10https://gerrit.wikimedia.org/r/403101 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:17] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-rus: Update dependency on cg3 [debs/contenttranslation/apertium-rus] - 10https://gerrit.wikimedia.org/r/403118 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:20] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-spa: Updated dependency on cg3 [debs/contenttranslation/apertium-spa] - 10https://gerrit.wikimedia.org/r/403135 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:23] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-spa-arg: Updated dependency on cg3 [debs/contenttranslation/apertium-spa-arg] - 10https://gerrit.wikimedia.org/r/403136 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:25] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-cat: New upstream and updated dependency on cg3 [debs/contenttranslation/apertium-cat] - 10https://gerrit.wikimedia.org/r/403339 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:28] (03CR) 10Alexandros Kosiaris: [C: 032] 
apertium-srd: New upstream and updated cg3 dependency [debs/contenttranslation/apertium-srd] - 10https://gerrit.wikimedia.org/r/403159 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:30] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-srd-ita: Updated cg3 dependency [debs/contenttranslation/apertium-srd-ita] - 10https://gerrit.wikimedia.org/r/403344 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:33] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-swe-dan: updated dependency on cg3 [debs/contenttranslation/apertium-swe-dan] - 10https://gerrit.wikimedia.org/r/403346 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:36] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-swe: Updated dependency on cg3 [debs/contenttranslation/apertium-swe] - 10https://gerrit.wikimedia.org/r/403345 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:38] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-tur: Updated dependency on cg3 [debs/contenttranslation/apertium-tur] - 10https://gerrit.wikimedia.org/r/403351 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:41] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-tat: Updated dependency on cg3 [debs/contenttranslation/apertium-tat] - 10https://gerrit.wikimedia.org/r/403350 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:45] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-urd-hin: Updated dependency on cg3 [debs/contenttranslation/apertium-urd-hin] - 10https://gerrit.wikimedia.org/r/403353 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:48] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-urd: Updated dependency on cg3 [debs/contenttranslation/apertium-urd] - 10https://gerrit.wikimedia.org/r/403352 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:51] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) 
[debs/contenttranslation/apertium-arg-cat] - 10https://gerrit.wikimedia.org/r/397218 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [13:40:53] marostegui: Not yet [13:40:59] maybe next time I will "accidentally" kill wikibugs first [13:41:01] it should stay in mwdebug nodes for a while [13:41:09] that's the reason [13:41:13] and then submit a ton of reviews [13:41:22] You can move forward if there's anything [13:41:33] Amir1: cool, I will wait then. Thanks! [13:42:42] !log rebooting ores2* for kernel security update [13:42:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:48:01] (03PS2) 10Filippo Giunchedi: hieradata: extend SMART eqiad deployment [puppet] - 10https://gerrit.wikimedia.org/r/403621 (https://phabricator.wikimedia.org/T86552) [13:48:22] (03CR) 10Filippo Giunchedi: "> All good to me for elastic, logstash and wdqs. Feel free to add" [puppet] - 10https://gerrit.wikimedia.org/r/403621 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [13:50:19] (03PS2) 10Jcrespo: mariadb: Promote db2040 as the new codfw-s7 master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403453 (https://phabricator.wikimedia.org/T176243) [13:50:33] (03PS4) 10Jcrespo: mariadb: Promote db2040 to be the codfw-s7 master instead of db2029 [puppet] - 10https://gerrit.wikimedia.org/r/403451 (https://phabricator.wikimedia.org/T176243) [13:50:45] (03PS1) 10Ladsgroup: Revert "Enable caching of constraint check results" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403652 [13:50:52] (03CR) 10Ladsgroup: [C: 032] Revert "Enable caching of constraint check results" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403652 (owner: 10Ladsgroup) [13:52:17] (03Merged) 10jenkins-bot: Revert "Enable caching of constraint check results" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403652 (owner: 10Ladsgroup) [13:52:31] (03CR) 10jenkins-bot: Revert "Enable caching of constraint check results" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/403652 (owner: 10Ladsgroup) [13:54:02] I'm done with the deployment [13:55:17] (03CR) 10Jcrespo: [C: 032] mariadb: Promote db2040 to be the codfw-s7 master instead of db2029 [puppet] - 10https://gerrit.wikimedia.org/r/403451 (https://phabricator.wikimedia.org/T176243) (owner: 10Jcrespo) [13:55:23] Amir1: thanks [13:55:31] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403648 (owner: 10Marostegui) [13:55:32] Keep up the great work [13:56:13] /q marostegui [13:56:51] !log perform master switchover of s7 codfw [13:57:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:57:05] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403648 (owner: 10Marostegui) [13:58:45] 10Operations, 10Pybal, 10Traffic: Alert instrumentation returning 500 errors - https://phabricator.wikimedia.org/T184721#3893341 (10ema) [13:58:55] 10Operations, 10Pybal, 10Traffic: Alert instrumentation returning 500 errors - https://phabricator.wikimedia.org/T184721#3893351 (10ema) p:05Triage>03High [13:58:58] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1099:3318 weight (duration: 01m 15s) [13:59:01] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403648 (owner: 10Marostegui) [13:59:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:05] addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor I � Unicode. All rise for European Mid-day SWAT(Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180111T1400). [14:00:05] revi: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. 
Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [14:00:16] I can SWAT today [14:00:22] hoi, except... config is error-ing for something I didn't do [14:00:27] !log reboot tungsten for kernel security update [14:00:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:39] s/config/CI/ [14:00:47] revi: your patch will not be deployed since it has -1 vote from jenkins-bot [14:01:13] lemme test removing that line... [14:01:18] !log reboot hafnium for kernel security update [14:01:23] let's see if it fixes -1 [14:01:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:01:43] (if you can wait) [14:01:46] revi: jenkins is our source of truth, if it fails, the patch will not be merged [14:02:04] revi: I can wait, we have 59 minutes in the swat window left [14:02:26] (03PS2) 10Revi: Create extendedconfirmed for kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403584 (https://phabricator.wikimedia.org/T184675) [14:03:00] rip [14:03:01] failed [14:03:06] 10Operations, 10Pybal, 10Traffic: pybal's "can-depool" logic only takes downServers into account - https://phabricator.wikimedia.org/T184715#3893353 (10ema) [14:03:35] (03CR) 10jerkins-bot: [V: 04-1] Create extendedconfirmed for kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403584 (https://phabricator.wikimedia.org/T184675) (owner: 10Revi) [14:05:01] ok gotcha [14:05:31] You removed ]; at the end? 
[14:05:31] lol [14:05:44] lol no [14:05:44] (03CR) 10Reedy: [C: 04-1] Create extendedconfirmed for kowiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403584 (https://phabricator.wikimedia.org/T184675) (owner: 10Revi) [14:05:48] forgot to close one ]; [14:06:11] The diff says otherwise [14:06:19] and thought it was that [14:06:26] (03PS1) 10Joal: Correct analytics streams_check jar [puppet] - 10https://gerrit.wikimedia.org/r/403655 (https://phabricator.wikimedia.org/T176983) [14:06:27] PS3... [14:06:29] (03PS3) 10Revi: Create extendedconfirmed for kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403584 (https://phabricator.wikimedia.org/T184675) [14:06:47] yeah seems good [14:06:51] (I hope0 [14:06:52] ) [14:06:57] elukey: --^ [14:08:21] zeljkof: Jenkins-CI +2 now [14:09:00] revi: ok, reviewing [14:09:50] (03CR) 10Revi: ">" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403584 (https://phabricator.wikimedia.org/T184675) (owner: 10Revi) [14:09:57] just for the record :P [14:10:22] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403584 (https://phabricator.wikimedia.org/T184675) (owner: 10Revi) [14:10:51] revi: I hope you have learned something today ;) when Jenkins is not happy, it's (usually) your fault :D [14:10:56] yeah heh [14:11:23] mistakes happen, that's why we run all those tests to check [14:11:55] (03Merged) 10jenkins-bot: Create extendedconfirmed for kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403584 (https://phabricator.wikimedia.org/T184675) (owner: 10Revi) [14:12:09] (03CR) 10jenkins-bot: Create extendedconfirmed for kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403584 (https://phabricator.wikimedia.org/T184675) (owner: 10Revi) [14:13:11] revi: 403584 is at mwdebug1002, do you know how to test there? 
[14:13:45] https://tppr.me/KTVTo yup [14:13:58] revi: please test and let me know if I can deploy [14:13:59] !log set migration_downtime to 2000ms for seaborgium [14:14:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:57] zeljkof: Good to go! :D [14:15:16] revi: ok, deploying [14:16:55] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:403584|Create extendedconfirmed for kowiki (T184675)]] (duration: 01m 23s) [14:17:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:17:07] T184675: Create ‘extendedconfirmed’ for kowiki - https://phabricator.wikimedia.org/T184675 [14:17:09] !log upgrade and restart db2029 [14:17:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:17:52] revi: deployed, please check and thanks for deploying with #releng ;) [14:18:22] hm, scap said "14:16:55 1 apaches had sync errors" [14:18:30] ping me when deployment band is finished [14:18:36] I have to do maintenance on a database [14:18:37] zeljkof: confirmed, have a nice day! [14:18:47] jynus: I think I was the only one with SWAT this time? [14:18:57] so given I'm done, I think it's done for this hr [14:19:06] jynus: I am done, but there is a problem with one apache... [14:19:11] zeljkof do you have a mw number? [14:19:18] robh: scap said "ssh: connect to host mw1271.eqiad.wmnet port 22: No route to host" [14:19:24] jynus: ^ [14:19:43] (03PS2) 10Elukey: profile::analytics::refinery::job::streams_check: set correct spark jar [puppet] - 10https://gerrit.wikimedia.org/r/403655 (https://phabricator.wikimedia.org/T176983) (owner: 10Joal) [14:20:26] I don't see anything about mw1271 at https://tools.wmflabs.org/sal/production [14:21:14] looks like it's been down for 16h zeljkof [14:21:31] godog: uh oh :) [14:21:40] does anyone know any maintenance on it, or can I kick it? 
[14:21:50] I cannot find anything on phabricator [14:21:56] will look at alerts [14:23:44] Last State Change: 2018-01-10 21:37:57 [14:24:01] jynus: I'm powercycling it, mgmt is dead [14:24:24] (or rather output of console as shown over mgmt) [14:24:26] 10Operations, 10Cloud-VPS, 10Toolforge, 10Patch-For-Review, 10cloud-services-team (Kanban): Cloud: Labvirt and instance reboots for Meltdown - https://phabricator.wikimedia.org/T184189#3893388 (10chasemp) {F12453248} {F12453261} [14:24:56] moritzm, jynus: do I need to re-run scap? or should I just close EU SWAT window? [14:25:20] zeljkof: I'll re-run scap before I depool it, feel free to close the SWAT window [14:25:22] we probably need to pull everything [14:25:28] moritzm: ok, thanks [14:25:32] !log EU SWAT finished [14:25:39] both config but also code [14:25:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:25:55] it may have gone down before train yesterday [14:26:01] !log powercycling mw1271 [14:26:02] 10Operations, 10Cloud-VPS, 10Toolforge, 10Patch-For-Review, 10cloud-services-team (Kanban): Cloud: Labvirt and instance reboots for Meltdown - https://phabricator.wikimedia.org/T184189#3893391 (10chasemp) Definitely more expensive, potentially not so severe that it causes us major pains. 
[14:26:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:27:34] (03CR) 10Awight: "Try this in init():" [puppet] - 10https://gerrit.wikimedia.org/r/402665 (owner: 10Paladox) [14:28:57] RECOVERY - Host mw1271 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [14:29:01] (03PS1) 10Marostegui: db-eqiad.php: db1067,db1066,db1099:331{1,8} [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403658 (https://phabricator.wikimedia.org/T162807) [14:29:24] (03PS3) 10Jcrespo: mariadb: Promote db2040 as the new codfw-s7 master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403453 (https://phabricator.wikimedia.org/T176243) [14:29:34] (03CR) 10Giuseppe Lavagetto: "What is the license on the original work you used? Did you add an appropriate copyright notice?" [puppet] - 10https://gerrit.wikimedia.org/r/402665 (owner: 10Paladox) [14:29:58] (03CR) 10Marostegui: [C: 031] mariadb: Promote db2040 as the new codfw-s7 master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403453 (https://phabricator.wikimedia.org/T176243) (owner: 10Jcrespo) [14:30:29] (03CR) 10Marostegui: [C: 04-2] "Wait for s7 codfw failover to be completed" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403658 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [14:32:05] networks stack may be up, but ssh is not [14:32:26] oh, ignore me [14:32:31] Ping ops-team - I'm about to deploy our jobs definition - Please let me know of concerns [14:32:33] I was testing the wrong server [14:32:39] !log joal@tin Started deploy [analytics/refinery@ed8ecbc]: Patching interlanguage link and manually add a jar to our collection [14:32:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:33:04] joal: does it have anything to do with mediawiki servers? 
[14:33:13] if yes, I would wait a second [14:33:25] if no, no problem [14:34:17] jynus: mw1271 seems like a hardware error, I'm opening a dc-ops ticket for further checks [14:34:44] moritzm: I can either run pull or remove it from dsh [14:34:53] jynus: no MW servers related [14:35:03] jynus: purely analytics-internal [14:35:16] no issue then, we had an issue with a MW server [14:35:20] jynus: I have been asked to ping when deploying, even if not directly MW related, so I do :) [14:35:27] thanks [14:35:30] Arf - Good luck with that jynus [14:36:28] !log running scap pull on mw1271 [14:36:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:36:44] 10Operations, 10ops-eqiad: Hardware check on mw1271 - https://phabricator.wikimedia.org/T184722#3893398 (10MoritzMuehlenhoff) [14:36:48] !log joal@tin Finished deploy [analytics/refinery@ed8ecbc]: Patching interlanguage link and manually add a jar to our collection (duration: 04m 10s) [14:36:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:36:58] 10Operations, 10fundraising-tech-ops, 10netops: switch network port 2/0/3 (frdb1003) back to administration-vlan - https://phabricator.wikimedia.org/T184723#3893410 (10Jgreen) [14:37:03] moritzm: I guess even if we depool it, it will not harm to keep the server updated [14:37:46] !log jmm@puppetmaster1001 conftool action : set/pooled=inactive; selector: mw1271.eqiad.wmnet [14:37:56] jynus: definitely, thanks [14:37:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:37:57] PROBLEM - etc request latencies on chlorine is CRITICAL: CRITICAL - etcd_request_latencies is 87992 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:37:58] PROBLEM - etc request latencies on argon is CRITICAL: CRITICAL - etcd_request_latencies is 55657 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:38:24] I've set it to inactive for now, so that it doesn't cause further scap irritation when 
Chris powers it down for diagnostics [14:39:57] RECOVERY - etc request latencies on chlorine is OK: OK - etcd_request_latencies is 3509 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:39:58] RECOVERY - etc request latencies on argon is OK: OK - etcd_request_latencies is 17967 https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:40:13] 14:38:52 Finished rsync common (duration: 02m 45s) [14:40:57] marostegui: I deploy, then you can? [14:40:57] !log rolling reboot of prometheus in codfw for kernel security update [14:41:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:41:15] (03CR) 10Jcrespo: [C: 032] mariadb: Promote db2040 as the new codfw-s7 master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403453 (https://phabricator.wikimedia.org/T176243) (owner: 10Jcrespo) [14:41:18] jynus: sure, no rush [14:41:24] I am perfectly fine waiting for you :) [14:42:44] (03Merged) 10jenkins-bot: mariadb: Promote db2040 as the new codfw-s7 master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403453 (https://phabricator.wikimedia.org/T176243) (owner: 10Jcrespo) [14:42:58] (03CR) 10jenkins-bot: mariadb: Promote db2040 as the new codfw-s7 master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403453 (https://phabricator.wikimedia.org/T176243) (owner: 10Jcrespo) [14:43:33] (03CR) 10Elukey: [C: 032] profile::analytics::refinery::job::streams_check: set correct spark jar [puppet] - 10https://gerrit.wikimedia.org/r/403655 (https://phabricator.wikimedia.org/T176983) (owner: 10Joal) [14:45:50] !log jynus@tin Synchronized wmf-config/db-codfw.php: Promote db2040 as the new codfw-s7 master (duration: 01m 22s) [14:46:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:46:29] (03CR) 10Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/compiler02/9705/ compilation results seem ok. 
Per bblack's request, I reproduced the current (unintend" [puppet] - 10https://gerrit.wikimedia.org/r/403440 (owner: 10Giuseppe Lavagetto) [14:47:07] !log continue swift frontend eqiad roll-restart, ms-fe1007 / ms-fe1008 [14:47:15] zeljkof: no errors now [14:47:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:47:39] !log filippo@puppetmaster1001 conftool action : set/pooled=no; selector: name=ms-fe1007.eqiad.wmnet [14:47:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:49:07] PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.158 second response time [14:49:19] ^ me [14:50:07] RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.218 second response time [14:50:27] chasemp: OCD request from my side, could we change the name to something like: "Health of k8s worker nodes on checker.tools.wmflabs.org" ? :-P [14:50:32] (03PS2) 10Marostegui: db-eqiad.php: db1067,db1066,db1099:331{1,8} [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403658 (https://phabricator.wikimedia.org/T162807) [14:50:48] jynus: you done with your changes? [14:50:50] I always read it as a contradiction "All k8s worker nodes are healthy... is CRITICAL" [14:51:01] but might perfectly just be me ;) [14:51:05] volans: seems reasonable to me [14:51:44] marostegui: yes [14:51:53] ok! 
will deploy my changes then [14:52:03] (03PS1) 10Ema: Alerts instrumentation: return instance of bytes [debs/pybal] - 10https://gerrit.wikimedia.org/r/403664 (https://phabricator.wikimedia.org/T184721) [14:54:09] (03PS4) 10Giuseppe Lavagetto: base::resolving: explicitly pass arguments [puppet] - 10https://gerrit.wikimedia.org/r/403440 [14:54:18] PROBLEM - DPKG on ms-fe1007 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:54:52] yes yes [14:55:08] (03CR) 10Marostegui: [C: 032] db-eqiad.php: db1067,db1066,db1099:331{1,8} [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403658 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [14:56:55] (03Merged) 10jenkins-bot: db-eqiad.php: db1067,db1066,db1099:331{1,8} [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403658 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [14:57:00] (03CR) 10Giuseppe Lavagetto: [C: 031] Alerts instrumentation: return instance of bytes [debs/pybal] - 10https://gerrit.wikimedia.org/r/403664 (https://phabricator.wikimedia.org/T184721) (owner: 10Ema) [14:57:05] (03CR) 10jenkins-bot: db-eqiad.php: db1067,db1066,db1099:331{1,8} [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403658 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [14:57:50] jynus: great, thanks! 
:) [14:58:01] !log Upgrade mariadb and kernel on db1066 [14:58:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:58:18] RECOVERY - DPKG on ms-fe1007 is OK: All packages OK [14:58:53] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1067,db1099:3318,db1099:3311, depool db1066 (duration: 01m 19s) [14:59:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:01:12] !log filippo@puppetmaster1001 conftool action : set/pooled=yes; selector: name=ms-fe1007.eqiad.wmnet [15:01:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:03:38] PROBLEM - mediawiki-installation DSH group on mw1271 is CRITICAL: Host mw1271 is not in mediawiki-installation dsh group [15:05:22] (03CR) 10Ottomata: [C: 031] [WIP] coal: Consume EventLogging from Kafka instead of ZMQ [puppet] - 10https://gerrit.wikimedia.org/r/403560 (https://phabricator.wikimedia.org/T110903) (owner: 10Krinkle) [15:05:27] ACKNOWLEDGEMENT - mediawiki-installation DSH group on mw1271 is CRITICAL: Host mw1271 is not in mediawiki-installation dsh group Muehlenhoff T184722 [15:05:52] (03CR) 10Ottomata: [C: 031] "Oh WIP. 
I guess +1 to idea :)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/403560 (https://phabricator.wikimedia.org/T110903) (owner: 10Krinkle) [15:05:58] !log rolling reboot of prometheus in eqiad for kernel security update [15:06:08] (03CR) 10Ema: [C: 032] Alerts instrumentation: return instance of bytes [debs/pybal] - 10https://gerrit.wikimedia.org/r/403664 (https://phabricator.wikimedia.org/T184721) (owner: 10Ema) [15:06:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:06:17] (03PS1) 10Marostegui: mariadb: Update db1066 socket location [puppet] - 10https://gerrit.wikimedia.org/r/403670 [15:06:23] (03PS1) 10Ema: Alerts instrumentation: return instance of bytes [debs/pybal] (1.14) - 10https://gerrit.wikimedia.org/r/403671 (https://phabricator.wikimedia.org/T184721) [15:06:39] (03CR) 10Paladox: "> What is the license on the original work you used? Did you add an" [puppet] - 10https://gerrit.wikimedia.org/r/402665 (owner: 10Paladox) [15:07:07] (03CR) 10Marostegui: [C: 032] mariadb: Update db1066 socket location [puppet] - 10https://gerrit.wikimedia.org/r/403670 (owner: 10Marostegui) [15:08:10] (03PS15) 10Paladox: Update gerrit login display [puppet] - 10https://gerrit.wikimedia.org/r/402665 [15:08:12] (03CR) 10Ema: [C: 032] Alerts instrumentation: return instance of bytes [debs/pybal] (1.14) - 10https://gerrit.wikimedia.org/r/403671 (https://phabricator.wikimedia.org/T184721) (owner: 10Ema) [15:08:13] !log filippo@puppetmaster1001 conftool action : set/pooled=no; selector: name=ms-fe1008.eqiad.wmnet [15:08:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:02] (03PS1) 10Marostegui: db-eqiad.php: Increase weight db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403673 [15:12:00] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403673 (owner: 10Marostegui) [15:13:36] (03Merged) 10jenkins-bot: 
db-eqiad.php: Increase weight db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403673 (owner: 10Marostegui) [15:14:19] (03PS1) 10Jcrespo: mariadb: Update package to 10.1.30 [software] - 10https://gerrit.wikimedia.org/r/403674 [15:14:21] (03PS1) 10Jcrespo: dblist: Reorder s7.hosts so new codfw master is last [software] - 10https://gerrit.wikimedia.org/r/403675 [15:14:57] (03PS3) 10Giuseppe Lavagetto: wmflib: simplify the role() function, convert to the new API [puppet] - 10https://gerrit.wikimedia.org/r/402345 [15:15:01] (03CR) 10Jcrespo: [V: 032 C: 032] mariadb: Update package to 10.1.30 [software] - 10https://gerrit.wikimedia.org/r/403674 (owner: 10Jcrespo) [15:15:19] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1099:3311 weight (duration: 01m 23s) [15:15:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:15:34] (03CR) 10Jcrespo: [V: 032 C: 032] dblist: Reorder s7.hosts so new codfw master is last [software] - 10https://gerrit.wikimedia.org/r/403675 (owner: 10Jcrespo) [15:16:45] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403673 (owner: 10Marostegui) [15:18:46] !log clear trusty-wikimedia from apertium packages. The apertium services is a long time now on jessie and all users should have migrated by now. If not, they should [15:18:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:19:28] 10Operations, 10Cloud-VPS, 10Toolforge, 10Patch-For-Review, 10cloud-services-team (Kanban): Cloud: Labvirt and instance reboots for Meltdown - https://phabricator.wikimedia.org/T184189#3893588 (10chasemp) I did https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Meltdown_Response#PIlot_in_Toolforg... 
[15:20:07] (03PS1) 10Ema: Use pooled-and-up servers in can-depool logic [debs/pybal] - 10https://gerrit.wikimedia.org/r/403677 (https://phabricator.wikimedia.org/T184715) [15:20:23] (03CR) 10Mobrovac: "The concept looks good, but the patch needs a bit more work." (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/403464 (owner: 10Arlolra) [15:20:50] (03PS3) 10Cmjohnson: adding dns entries both production and mgmt for mw1338-mw1348. [dns] - 10https://gerrit.wikimedia.org/r/403425 [15:21:08] PROBLEM - puppet last run on notebook1001 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 6 minutes ago with 4 failures. Failed resources (up to 3 shown): Package[sudo],Package[php5-cli],Package[php5-curl],Package[php5-mysql] [15:21:27] (03CR) 10Cmjohnson: [C: 032] adding dns entries both production and mgmt for mw1338-mw1348. [dns] - 10https://gerrit.wikimedia.org/r/403425 (owner: 10Cmjohnson) [15:21:39] cmjohnson1: \o/ [15:22:17] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-dan-nor] - 10https://gerrit.wikimedia.org/r/397229 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [15:22:20] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-swe-nor] - 10https://gerrit.wikimedia.org/r/403347 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [15:22:24] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-cat-srd] - 10https://gerrit.wikimedia.org/r/403340 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [15:22:29] (03PS1) 10Jcrespo: mariadb: Promote db1055 to be the x1 eqiad master instead of db1031 [puppet] - 10https://gerrit.wikimedia.org/r/403678 (https://phabricator.wikimedia.org/T183469) [15:24:11] (03PS1) 10Jcrespo: dblist: Promote db1055 to be the x1 eqiad master instead of db1031 [software] - 10https://gerrit.wikimedia.org/r/403679 (https://phabricator.wikimedia.org/T183469) [15:24:41] 10Operations, 10Cloud-VPS, 10Toolforge, 
10Patch-For-Review, 10cloud-services-team (Kanban): Cloud: Labvirt and instance reboots for Meltdown - https://phabricator.wikimedia.org/T184189#3893609 (10chasemp) rush@tools-exec-1401:~$ dmesg | grep -i isolation > [ 0.000000] Kernel/User page tables isolation... [15:24:51] (03CR) 10Marostegui: [C: 031] mariadb: Promote db1055 to be the x1 eqiad master instead of db1031 [puppet] - 10https://gerrit.wikimedia.org/r/403678 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [15:25:06] (03CR) 10Marostegui: [C: 031] dblist: Promote db1055 to be the x1 eqiad master instead of db1031 [software] - 10https://gerrit.wikimedia.org/r/403679 (https://phabricator.wikimedia.org/T183469) (owner: 10Jcrespo) [15:26:57] !log filippo@puppetmaster1001 conftool action : set/pooled=yes; selector: name=ms-fe1008.eqiad.wmnet [15:27:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:52] 10Operations, 10Cloud-VPS, 10Toolforge, 10Patch-For-Review, 10cloud-services-team (Kanban): Cloud: Labvirt and instance reboots for Meltdown - https://phabricator.wikimedia.org/T184189#3893615 (10chasemp) [15:28:19] (03CR) 10Mobrovac: "LGTM, nice work! Some minor comments in-lined." 
(032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/403574 (https://phabricator.wikimedia.org/T183999) (owner: 10Thcipriani) [15:28:55] !log reboot ruthenium for kernel security update [15:29:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:33:37] PROBLEM - DPKG on db2077 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [15:33:54] twentyafterfour: (or when your around) more things to backport before the train continues :) [15:33:59] I'll add you to them [15:36:55] (03PS1) 10Matthias Mullie: Enable 3D on commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403680 (https://phabricator.wikimedia.org/T184728) [15:38:19] (03CR) 10jerkins-bot: [V: 04-1] Enable 3D on commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403680 (https://phabricator.wikimedia.org/T184728) (owner: 10Matthias Mullie) [15:38:45] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403681 [15:39:53] (03PS2) 10Matthias Mullie: Enable 3D on commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403680 (https://phabricator.wikimedia.org/T184728) [15:40:50] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403681 (owner: 10Marostegui) [15:41:16] (03CR) 10Alexandros Kosiaris: [C: 032] Depends on cg3 (>= 1.0.0~r12254) [debs/contenttranslation/apertium-dan-nor] - 10https://gerrit.wikimedia.org/r/397229 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [15:41:24] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-swe-nor: Updated dependency on cg3 [debs/contenttranslation/apertium-swe-nor] - 10https://gerrit.wikimedia.org/r/403347 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [15:41:30] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-cat-srd: New upstream and updated dependencies [debs/contenttranslation/apertium-cat-srd] - 
10https://gerrit.wikimedia.org/r/403340 (https://phabricator.wikimedia.org/T171406) (owner: 10KartikMistry) [15:42:21] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403681 (owner: 10Marostegui) [15:42:36] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403681 (owner: 10Marostegui) [15:43:57] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1099:3311 weight (duration: 01m 21s) [15:44:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:46:07] RECOVERY - puppet last run on notebook1001 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [15:48:03] (03CR) 10Matthias Mullie: [C: 04-1] "Don't merge; needs to be verified & merged at deployment time" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403680 (https://phabricator.wikimedia.org/T184728) (owner: 10Matthias Mullie) [15:48:50] 10Operations, 10Cloud-VPS, 10cloud-services-team: Reboot non-labvirt cloud provider hardware for meltdown - https://phabricator.wikimedia.org/T184730#3893707 (10chasemp) [15:49:01] 10Operations, 10Cloud-VPS, 10cloud-services-team: Reboot non-labvirt cloud provider hardware for meltdown - https://phabricator.wikimedia.org/T184730#3893724 (10chasemp) [15:49:06] 10Operations, 10Cloud-VPS, 10Toolforge, 10Patch-For-Review, 10cloud-services-team (Kanban): Cloud: Labvirt and instance reboots for Meltdown - https://phabricator.wikimedia.org/T184189#3875329 (10chasemp) [15:51:35] !log reboot oxygen for kernel security update [15:51:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:52:37] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0 [15:52:47] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, 
interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0 [15:59:36] !log reboot lithium for kernel security update [15:59:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:01:07] PROBLEM - puppet last run on scb2004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[set debconf flag seen for wireshark-common/install-setuid] [16:02:47] PROBLEM - DPKG on scb2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:03:18] PROBLEM - puppet last run on scb2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[set debconf flag seen for wireshark-common/install-setuid] [16:03:47] RECOVERY - DPKG on scb2001 is OK: All packages OK [16:05:35] !log upgrade apertium on scb200* nodes [16:05:43] !log rebooting notebook1001 for kernel security update [16:05:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:05:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:06:07] RECOVERY - puppet last run on scb2004 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [16:07:28] PROBLEM - Host notebook1001 is DOWN: PING CRITICAL - Packet loss = 100% [16:08:17] RECOVERY - puppet last run on scb2003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:08:47] RECOVERY - Host notebook1001 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [16:13:27] 10Operations, 10Puppet: Upgrade puppetDB to version 3.2 or newer - https://phabricator.wikimedia.org/T177253#3893843 (10herron) Spoke with Faidon about this a bit. There are some issues with the unstable puppetdb-4.4.1 package that must be fixed before it transitions into testing, etc. Since the timeframe fo... [16:17:32] (03CR) 10Volans: [C: 031] "The new logic looks good to me. 
If possible add some tests too to ensure it has the correct behaviour." [debs/pybal] - 10https://gerrit.wikimedia.org/r/403677 (https://phabricator.wikimedia.org/T184715) (owner: 10Ema) [16:18:49] (03PS1) 10Cmjohnson: Adding mac addresses for mw1338-48 [puppet] - 10https://gerrit.wikimedia.org/r/403691 (https://phabricator.wikimedia.org/T165519) [16:19:02] elukey ^ [16:19:07] !log rebooting mwlog1001 for kernel security update [16:19:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:20:38] (03CR) 10Elukey: Adding mac addresses for mw1338-48 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/403691 (https://phabricator.wikimedia.org/T165519) (owner: 10Cmjohnson) [16:20:56] ugh! tabs [16:20:58] PROBLEM - puppet last run on scb2006 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[cg3],Package[hfst] [16:21:36] cmjohnson1: except from that it looks fine from a quick look! I'll start the installs tomorrow if I have time [16:22:01] yeah, it's the script I use to pull out the mac addresses. 
I always forget that it does the tabs [16:22:27] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received [16:23:18] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy [16:24:09] (03PS2) 10Cmjohnson: Adding mac addresses for mw1338-48 [puppet] - 10https://gerrit.wikimedia.org/r/403691 (https://phabricator.wikimedia.org/T165519) [16:25:58] (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403694 [16:26:22] (03PS3) 10Cmjohnson: Adding mac addresses for mw1338-48 [puppet] - 10https://gerrit.wikimedia.org/r/403691 (https://phabricator.wikimedia.org/T165519) [16:26:48] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Request access to analytics cluster for bawolff - https://phabricator.wikimedia.org/T184582#3893878 (10RobH) [16:27:12] (03PS4) 10Cmjohnson: Adding mac addresses for mw1338-48 [puppet] - 10https://gerrit.wikimedia.org/r/403691 (https://phabricator.wikimedia.org/T165519) [16:28:28] !log rebooting mwlog2001 for kernel security update [16:28:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:28:41] (03CR) 10Cmjohnson: [C: 032] Adding mac addresses for mw1338-48 [puppet] - 10https://gerrit.wikimedia.org/r/403691 (https://phabricator.wikimedia.org/T165519) (owner: 10Cmjohnson) [16:30:50] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3893885 (10Cmjohnson) [16:31:07] (03CR) 10Volans: [C: 04-1] "With this approach the cache is local to the server and in case of failover to the other deployment server it will be lost." 
[puppet] - 10https://gerrit.wikimedia.org/r/403574 (https://phabricator.wikimedia.org/T183999) (owner: 10Thcipriani) [16:32:05] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3626294 (10Cmjohnson) the final 10 servers have been racked. 9 of 10 are now ready to be installed. There is an issue with the idrac setup on mw1340 but will b... [16:32:10] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Fully repool db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403694 (owner: 10Marostegui) [16:36:48] PROBLEM - DPKG on restbase-test2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:36:55] that's me ^ [16:37:07] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403694 (owner: 10Marostegui) [16:37:17] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1099:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403694 (owner: 10Marostegui) [16:37:21] (03PS1) 10Faidon Liambotis: Remove utils/expanderb.rb [puppet] - 10https://gerrit.wikimedia.org/r/403697 [16:37:25] (03PS1) 10Faidon Liambotis: base: wrap lines in check_puppetrun to < 110 [puppet] - 10https://gerrit.wikimedia.org/r/403698 [16:37:25] (03PS1) 10Faidon Liambotis: network: reword slice_network_constants' errors [puppet] - 10https://gerrit.wikimedia.org/r/403699 [16:37:27] (03PS1) 10Faidon Liambotis: wmflib/hiera: wrap long lines [puppet] - 10https://gerrit.wikimedia.org/r/403700 [16:37:29] (03CR) 10Arlolra: Switch to YAML configuration for Parsoid on ruthenium (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/403464 (owner: 10Arlolra) [16:37:48] RECOVERY - DPKG on restbase-test2001 is OK: All packages OK [16:38:10] (03CR) 10jerkins-bot: [V: 04-1] network: reword slice_network_constants' errors [puppet] - 10https://gerrit.wikimedia.org/r/403699 (owner: 10Faidon Liambotis) [16:39:17] !log marostegui@tin 
Synchronized wmf-config/db-eqiad.php: Fully repool db1099:3311 (duration: 01m 22s) [16:39:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:40:11] (03PS2) 10Faidon Liambotis: network: reword slice_network_constants' errors [puppet] - 10https://gerrit.wikimedia.org/r/403699 [16:40:13] (03PS2) 10Faidon Liambotis: wmflib/hiera: wrap long lines [puppet] - 10https://gerrit.wikimedia.org/r/403700 [16:41:07] (03CR) 10jerkins-bot: [V: 04-1] network: reword slice_network_constants' errors [puppet] - 10https://gerrit.wikimedia.org/r/403699 (owner: 10Faidon Liambotis) [16:44:11] (03PS1) 10Elukey: Allow to explicitly set the JAVA_HOME environment variable [puppet/cdh] - 10https://gerrit.wikimedia.org/r/403701 (https://phabricator.wikimedia.org/T166248) [16:49:38] PROBLEM - DPKG on scb1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:49:58] PROBLEM - DPKG on scb1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:49:58] PROBLEM - DPKG on scb1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:50:07] PROBLEM - DPKG on scb1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:50:58] RECOVERY - puppet last run on scb2006 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [16:51:11] (03PS1) 10Mobrovac: JobQueue: Use EventBus for HTMLCacheUpdate except en, commons, wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403703 (https://phabricator.wikimedia.org/T182023) [16:51:58] RECOVERY - DPKG on scb1004 is OK: All packages OK [16:51:58] RECOVERY - DPKG on scb1003 is OK: All packages OK [16:52:12] (03PS3) 10Faidon Liambotis: network: reword slice_network_constants' errors [puppet] - 10https://gerrit.wikimedia.org/r/403699 [16:52:14] (03PS3) 10Faidon Liambotis: wmflib/hiera: wrap long lines [puppet] - 10https://gerrit.wikimedia.org/r/403700 [16:52:57] (03CR) 10jerkins-bot: [V: 04-1] network: reword slice_network_constants' errors [puppet] - 
10https://gerrit.wikimedia.org/r/403699 (owner: 10Faidon Liambotis) [16:53:38] RECOVERY - DPKG on scb1002 is OK: All packages OK [16:53:53] (03PS4) 10Faidon Liambotis: network: reword slice_network_constants' errors [puppet] - 10https://gerrit.wikimedia.org/r/403699 [16:53:55] (03PS4) 10Faidon Liambotis: wmflib/hiera: wrap long lines [puppet] - 10https://gerrit.wikimedia.org/r/403700 [16:54:16] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler03/9708/" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/403701 (https://phabricator.wikimedia.org/T166248) (owner: 10Elukey) [16:54:42] !log upgrade and restart db1095- it may add some minutes of lag to some wikis on wikireplicas [16:54:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:55:04] !log start rolling restart of restbase-test / restbase-dev cluster [16:55:07] RECOVERY - DPKG on scb1001 is OK: All packages OK [16:55:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:57:13] !log upgrade apertium on scb100* nodes done [16:57:17] PROBLEM - puppet last run on scb1004 is CRITICAL: CRITICAL: Puppet has 17 failures. Last run 5 minutes ago with 17 failures. Failed resources (up to 3 shown): Package[prometheus-node-exporter],Package[sudo],Package[hunspell-vi],Package[myspell-en-gb] [16:57:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:59:06] akosiaris: Thanks! [16:59:09] (03CR) 10Mobrovac: Switch to YAML configuration for Parsoid on ruthenium (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/403464 (owner: 10Arlolra) [17:00:04] godog, moritzm, and _joe_: Dear deployers, time to do the Puppet SWAT(Max 8 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180111T1700). [17:00:04] Amir1: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be around during the process. 
Note: If you break AND fix the wikis, you will be rewarded with a sticker. [17:00:12] o/ [17:00:14] kart_: thanks as well [17:01:07] 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#3894034 (10Ottomata) Here's a Q: In cergen, I'm generating EC keys using a [[ https://cryptography.io/en/latest/hazmat/primitives/asymmetric/ec... [17:01:44] Amir1: if nobody takes care of it I'll do later on after analytics meetings :) [17:01:49] (https://gerrit.wikimedia.org/r/#/c/403366/2/modules/statistics/manifests/wmde/graphite.pp right?) [17:02:09] yup [17:02:11] elukey: I'll do it [17:02:17] godog: <3 [17:02:40] (03PS3) 10Filippo Giunchedi: statistics: Install php5-dom for wmde scripts [puppet] - 10https://gerrit.wikimedia.org/r/403366 (https://phabricator.wikimedia.org/T165463) (owner: 10Ladsgroup) [17:03:11] (03CR) 10Faidon Liambotis: [C: 031] PuppetDB backend: add support for API v4 (032 comments) [software/cumin] - 10https://gerrit.wikimedia.org/r/399821 (https://phabricator.wikimedia.org/T182575) (owner: 10Volans) [17:03:13] (03CR) 10Filippo Giunchedi: [C: 032] statistics: Install php5-dom for wmde scripts [puppet] - 10https://gerrit.wikimedia.org/r/403366 (https://phabricator.wikimedia.org/T165463) (owner: 10Ladsgroup) [17:03:47] Amir1: merged [17:04:38] Thanks! [17:04:53] We should be able to see the results by tomorrow [17:06:24] (03CR) 10Ottomata: "Will this work for all CDH daemons? Hive? Oozie? Etc.?" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/403701 (https://phabricator.wikimedia.org/T166248) (owner: 10Elukey) [17:09:49] (03CR) 10Faidon Liambotis: [C: 031] "\o/ Looks good overall, see some comments inline. 
Would have been better to see the NodeSet -> nodeset changes in a separate pre-commit, b" (032 comments) [software/cumin] - 10https://gerrit.wikimedia.org/r/402059 (owner: 10Volans) [17:17:02] 10Operations, 10ops-esams, 10hardware-requests: Procure and install LVS and miscellaneous servers - https://phabricator.wikimedia.org/T184068#3894067 (10RobH) [17:17:44] (03PS4) 10Arlolra: Switch to YAML configuration for Parsoid on ruthenium [puppet] - 10https://gerrit.wikimedia.org/r/403464 [17:19:45] (03PS5) 10Arlolra: Switch to YAML configuration for Parsoid on ruthenium [puppet] - 10https://gerrit.wikimedia.org/r/403464 [17:21:03] (03CR) 10Elukey: "Testing now the upgrade of the cluster, going to report soon. So far everything works, didn't test the coordinator daemons." [puppet/cdh] - 10https://gerrit.wikimedia.org/r/403701 (https://phabricator.wikimedia.org/T166248) (owner: 10Elukey) [17:21:46] 10Operations, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10hardware-requests: Give misc dump crons their own host - https://phabricator.wikimedia.org/T181936#3894073 (10RobH) In eqiad spares we have the following system: wmf4749 - Dual Intel Xeon E5-2640 v3 2.6GHz/8Core per CPU - 64GB RAM - dual 1... [17:21:59] apergos: ^ i think we have a spare and it's cheaper to allocate a spare that's a year old [17:22:01] than buy a new system [17:22:06] even though the spare has 64GB ram =] [17:22:17] RECOVERY - puppet last run on scb1004 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [17:22:34] so if that works for you, then escalate the task to mark or paravoid for them to sign off on the allocation and i'll get it moving =] [17:22:40] robh: how old is that spare, and how many cores does it have? [17:22:47] i posted on the task, it's dual 8 core cpu [17:22:49] so 16 total [17:22:51] 64 gb ram [17:23:00] and it's in warranty until 2019-03-24 [17:23:09] I have to talk to hoo about that "only 250% increase by the end of the year" too, that's a bit..
large [17:23:49] also dual 1tb LFF are pretty much the cheapest disk we can get (we can go slightly cheaper and go SFF chassis with 500GB SATA but that chassis costs more so it comes out even ;) [17:23:53] all right, will follow up on the ticket, thank you! [17:24:09] let me know if you need anything else [17:24:11] sorry it sat for so long! [17:24:12] not yet [17:24:30] just need to see if those specs are going to hold up given the wikidata needs [17:24:53] we also have 3 of these spares in eqiad, i don't think that matters for you since you are just asking for one [17:24:56] but fyi ;D [17:26:28] what is our service contract like for those? [17:26:42] i.e. if the box is down due to disk or board or whatever, how long before it's back up, typically? [17:26:43] ? [17:26:45] oh [17:26:56] all of our systems under warranty are next business day hardware replacement, or should be [17:27:21] that's good enough I believe, [17:27:40] i.e. rather than having a fallback host, one host that can be brought back into service within a couple days should be fine [17:27:45] all the data is stored elsewhere [17:28:35] cool, yeah and reconfirmed [17:28:40] it's pro support so the sata are covered [17:28:44] great [17:29:02] i think our basic warranty hosts (which for a good 4 month period didn't cover sata on dell after the first year) have all aged out [17:29:07] but this one is set. [17:29:18] pro support, next business day replacement on all hw.
[17:29:24] sweet [17:29:27] PROBLEM - haproxy failover on dbproxy1003 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [17:29:27] PROBLEM - haproxy failover on dbproxy1008 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [17:32:46] !log shutting down db1059 for maintenance [17:32:54] those proxies are due to^ [17:32:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:33:02] will come back when db1059 comes back [17:34:47] PROBLEM - Host db1059.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [17:36:41] (03PS2) 10Elukey: Allow to explicitly set the JAVA_HOME environment variable [puppet/cdh] - 10https://gerrit.wikimedia.org/r/403701 (https://phabricator.wikimedia.org/T166248) [17:39:58] RECOVERY - Host db1059.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.05 ms [17:40:07] (03PS16) 10Paladox: Update gerrit login display [puppet] - 10https://gerrit.wikimedia.org/r/402665 [17:40:09] 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3894190 (10kaldari) >So basically, we can ship this now, and then a task can be filed to add the others *after* T174... 
[17:40:19] 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3894192 (10kaldari) [17:40:27] RECOVERY - haproxy failover on dbproxy1003 is OK: OK check_failover servers up 2 down 0 [17:40:27] RECOVERY - haproxy failover on dbproxy1008 is OK: OK check_failover servers up 2 down 0 [17:40:38] PROBLEM - HP RAID on restbase1011 is CRITICAL: Return code of 255 is out of bounds [17:40:57] (03PS17) 10Paladox: Update gerrit login display [puppet] - 10https://gerrit.wikimedia.org/r/402665 [17:42:57] (03PS1) 10Cmjohnson: Adding mw1340 to dhcp file [puppet] - 10https://gerrit.wikimedia.org/r/403715 (https://phabricator.wikimedia.org/T165519) [17:43:41] 10Operations, 10Analytics, 10hardware-requests: Refresh or replace oxygen - https://phabricator.wikimedia.org/T181264#3894199 (10RobH) We don't have any spare hardware with SSDs, but do have spares with 1TB SATA. wmf4750 - Dell PowerEdge R430 - Dual Intel Xeon E5-2640 v3 2.6GHz - 64GB RAM Oxygen has only 79... [17:43:46] 10Operations, 10Analytics, 10hardware-requests: Refresh or replace oxygen - https://phabricator.wikimedia.org/T181264#3894202 (10RobH) a:03faidon [17:43:48] (03CR) 10Awight: [C: 031] "Elegant and useful!"
[puppet] - 10https://gerrit.wikimedia.org/r/402665 (owner: 10Paladox) [17:43:58] PROBLEM - Long running screen/tmux on restbase1011 is CRITICAL: Return code of 255 is out of bounds [17:44:07] RECOVERY - MegaRAID on db1059 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy [17:45:29] (03CR) 10Krinkle: "That wasn't a conscious decision (it's the default for this template, and thus far, with the exception of Toolforge error pages, there are" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403569 (https://phabricator.wikimedia.org/T181018) (owner: 10Krinkle) [17:46:25] (03CR) 10jerkins-bot: [V: 04-1] Adding mw1340 to dhcp file [puppet] - 10https://gerrit.wikimedia.org/r/403715 (https://phabricator.wikimedia.org/T165519) (owner: 10Cmjohnson) [17:47:36] (03PS2) 10Cmjohnson: Adding mw1340 to dhcp file [puppet] - 10https://gerrit.wikimedia.org/r/403715 (https://phabricator.wikimedia.org/T165519) [17:48:28] (03CR) 10Cmjohnson: [C: 032] Adding mw1340 to dhcp file [puppet] - 10https://gerrit.wikimedia.org/r/403715 (https://phabricator.wikimedia.org/T165519) (owner: 10Cmjohnson) [17:50:27] RECOVERY - DPKG on db2077 is OK: All packages OK [17:52:18] PROBLEM - IPMI Sensor Status on restbase1011 is CRITICAL: Return code of 255 is out of bounds [17:52:31] (03PS18) 10Paladox: Update gerrit login display [puppet] - 10https://gerrit.wikimedia.org/r/402665 [17:53:31] 10Puppet, 10Analytics, 10User-Elukey: analytics VPS project puppet errors - https://phabricator.wikimedia.org/T184482#3894231 (10Nuria) [17:53:38] (03CR) 10Awight: [C: 031] Update gerrit login display [puppet] - 10https://gerrit.wikimedia.org/r/402665 (owner: 10Paladox) [17:53:43] 10Puppet, 10Analytics, 10User-Elukey: analytics VPS project puppet errors - https://phabricator.wikimedia.org/T184482#3884526 (10Nuria) We will be killing that instance [17:54:14] 10Puppet, 10Analytics-Kanban, 10User-Elukey: analytics VPS project puppet errors - https://phabricator.wikimedia.org/T184482#3894238 (10Nuria) 
[17:54:19] (03CR) 10Paladox: "This is what it looks like now:" [puppet] - 10https://gerrit.wikimedia.org/r/402665 (owner: 10Paladox) [17:54:28] (03PS19) 10Paladox: Update gerrit login display [puppet] - 10https://gerrit.wikimedia.org/r/402665 [17:55:29] (03PS2) 10Ema: Use pooled-and-up servers in can-depool logic [debs/pybal] - 10https://gerrit.wikimedia.org/r/403677 (https://phabricator.wikimedia.org/T184715) [17:56:14] 10Puppet, 10Analytics-Kanban, 10User-Elukey: analytics VPS project puppet errors - https://phabricator.wikimedia.org/T184482#3894245 (10elukey) 05Open>03Resolved a:03elukey Instance deleted! [17:56:22] (03CR) 10Ema: "> The new logic looks good to me. If possible add some tests too to" [debs/pybal] - 10https://gerrit.wikimedia.org/r/403677 (https://phabricator.wikimedia.org/T184715) (owner: 10Ema) [17:56:27] 10Puppet, 10Analytics-Kanban, 10User-Elukey: analytics VPS project puppet errors - https://phabricator.wikimedia.org/T184482#3894248 (10elukey) 05Resolved>03Open [17:56:59] 10Puppet, 10Analytics-Kanban, 10User-Elukey: analytics VPS project puppet errors - https://phabricator.wikimedia.org/T184482#3884526 (10elukey) Just seen that there are more instances to fix. Some of them are under experiment at the moment, will try to fix them asap though. [17:57:45] 10Operations, 10Analytics, 10hardware-requests: EQIAD: (1) hardware request for eventlog1001 replacement - eventlog1002. - https://phabricator.wikimedia.org/T184551#3894260 (10RobH) a:03Ottomata So we have a spare server that would actually meet this requirement without ordering more hardware: wmf4751 - w... [18:00:04] cscott, arlolra, subbu, halfak, and Amir1: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Services – Graphoid / Parsoid / Citoid / ORES . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180111T1800). [18:00:04] No GERRIT patches in the queue for this window AFAICS. 
[18:00:57] !log upgrade and restart db1102- it may add some minutes of lag to some wikis on wikireplicas [18:01:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:01:26] 10Operations, 10hardware-requests: Replacement hardware for cumin masters - https://phabricator.wikimedia.org/T178392#3894298 (10RobH) Since this is a public task, I cannot put down our pricing. I'll just link to the associated tasks though. The alternative is we just allocate our standard lvs/misc machine,... [18:01:35] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3894300 (10Cmjohnson) a:03elukey assigning this to @elukey to complete installs. [18:04:17] 10Operations, 10ops-eqiad: Hardware check on mw1271 - https://phabricator.wikimedia.org/T184722#3894309 (10Cmjohnson) The server has a DIMM error on A1 ------------------------------------------------------------------------------- Record: 2 Date/Time: 01/10/2018 10:57:11 Source: system Severity:... 
[18:06:12] !log lvs4007: upgrade to latest jessie point release (8.10) T182656 and linux kernel 4.9.65-3+deb9u1~bpo8+2 (KPTI) T184267 [18:06:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:06:24] T182656: Integrate jessie 8.10 point release - https://phabricator.wikimedia.org/T182656 [18:08:54] PROBLEM - pybal on lvs4007 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal [18:09:04] PROBLEM - PyBal backends health check on lvs4007 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090 [18:11:02] that's me ^ [18:11:25] PROBLEM - Host mw1271.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [18:11:54] RECOVERY - pybal on lvs4007 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal [18:12:17] 10Operations, 10ops-eqiad: Hardware check on mw1271 - https://phabricator.wikimedia.org/T184722#3894338 (10Cmjohnson) I swapped the DIMM from A1 to B1 to see if the error persists on the DIMM bank or if it stays with the DIMM. [18:13:21] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: Decommission mw1180-1200 - https://phabricator.wikimedia.org/T183895#3894341 (10Cmjohnson) [18:13:31] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: Decommission mw1180-1200 - https://phabricator.wikimedia.org/T183895#3866486 (10Cmjohnson) removed from rack and racktables updated. [18:14:10] 10Operations, 10ops-eqiad, 10DBA: db1059 BBU issues - https://phabricator.wikimedia.org/T184160#3894345 (10Cmjohnson) Swapped the bbu....leaving this open to confirm everything is okay. 
[18:14:34] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3627421 (10Cmjohnson) [18:14:37] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: Decommission mw1180-1200 - https://phabricator.wikimedia.org/T183895#3894347 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson [18:15:29] 10Operations, 10ops-eqiad, 10DBA: db1059 BBU issues - https://phabricator.wikimedia.org/T184160#3894351 (10jcrespo) 05Open>03Resolved icinga check says things are ok- we will reopen if they reappear. Thank you for the help! [18:16:35] RECOVERY - Host mw1271.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.93 ms [18:18:56] 10Operations, 10Discovery, 10Discovery-Search, 10Elasticsearch: Increase time before alert for elasticsearch disk space issues - https://phabricator.wikimedia.org/T136702#3894358 (10debt) 05Open>03declined Closing this for now - the errors aren't so noisy right now. :) [18:23:08] 10Operations, 10Analytics, 10hardware-requests: EQIAD: (1) hardware request for eventlog1001 replacement - eventlog1002. - https://phabricator.wikimedia.org/T184551#3894361 (10Ottomata) a:05Ottomata>03faidon Great, that'll do just fine! Assigned to @faidon for approval. 
[18:27:05] (03CR) 10Ottomata: [C: 031] hieradata: extend SMART eqiad deployment [puppet] - 10https://gerrit.wikimedia.org/r/403621 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [18:28:18] (03CR) 10Ottomata: role::analytics_cluster::coordinator: add a profile to restart streaming jobs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/395504 (https://phabricator.wikimedia.org/T176983) (owner: 10Joal) [18:34:38] (03CR) 10Elukey: role::analytics_cluster::coordinator: add a profile to restart streaming jobs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/395504 (https://phabricator.wikimedia.org/T176983) (owner: 10Joal) [18:47:02] 10Operations, 10Puppet: Puppet hosts with their cert revoked can still run puppet - https://phabricator.wikimedia.org/T184444#3894450 (10herron) One observation... The apache puppet frontend (vhost on port 8140) is configured with `SSLCARevocationPath /var/lib/puppet/server/ssl/crl` but this directory contain... [18:57:45] PROBLEM - Restbase root url on restbase-test2001 is CRITICAL: connect to address 10.192.16.149 and port 7231: Connection refused [18:57:54] PROBLEM - Restbase root url on cerium is CRITICAL: connect to address 10.64.16.147 and port 7231: Connection refused [18:57:55] PROBLEM - Restbase root url on restbase-dev1005 is CRITICAL: connect to address 10.64.16.96 and port 7231: Connection refused [18:57:55] PROBLEM - Restbase root url on restbase-test2003 is CRITICAL: connect to address 10.192.16.151 and port 7231: Connection refused [18:57:55] PROBLEM - Restbase root url on restbase-dev1004 is CRITICAL: connect to address 10.64.0.89 and port 7231: Connection refused [18:58:04] PROBLEM - Restbase root url on restbase-dev1006 is CRITICAL: connect to address 10.64.48.10 and port 7231: Connection refused [18:58:15] PROBLEM - Restbase root url on restbase-test2002 is CRITICAL: connect to address 10.192.16.150 and port 7231: Connection refused [18:58:15] PROBLEM - Restbase root url on xenon 
is CRITICAL: connect to address 10.64.0.200 and port 7231: Connection refused [18:58:56] 10Operations, 10Puppet: Puppet hosts with their cert revoked can still run puppet - https://phabricator.wikimedia.org/T184444#3894529 (10herron) Removed the previous comment after realizing those symlinks of course point to the same file! It looks like `SSLCARevocationCheck` defaults to none and is currently... [19:00:04] addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: That opportune time is upon us again. Time for a Morning SWAT (Max 8 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180111T1900). [19:00:04] ebernhardson: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [19:00:44] I left my patch from yesterday's SWAT undeployed, sorry [19:00:51] As my penance I can do this SWAT [19:01:42] (03CR) 10Mobrovac: "Using the new wmf puppet style guide, role classes are supposed to only include profiles and have nothing else beside them~[1], so this co" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/403464 (owner: 10Arlolra) [19:02:19] 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#3894549 (10BBlack) No, it's not a problem. For certificates, `NIST P-256` (aka `secp256r1`, aka `prime256v1`, depending on who's talking) is re... 
[19:03:47] RoanKattouw: oooooh [19:04:08] RoanKattouw: I have a bunch of stuff we could swat before the train, or we can do it as part of the train slot [19:04:23] Feel free to add [19:04:34] {{doing}} [19:04:50] i have one patch in swat i believe [19:05:18] Yes you do [19:06:47] RoanKattouw: added 4 [19:07:01] Cool will look in a sec [19:08:20] the rb issues above are known [19:09:16] !log catrope@tin Synchronized php-1.31.0-wmf.16/extensions/Flow/modules/styles/flow/widgets/editor/mw.flow.ui.EditorWidget.less: T184631 (duration: 01m 22s) [19:09:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:09:29] T184631: Fallback input widget not styled properly - https://phabricator.wikimedia.org/T184631 [19:13:34] addshore: Re https://gerrit.wikimedia.org/r/#/c/403693/1/tests/phpunit/includes/RevisionTest.php I think you should have used 6s to flag the evilness of using an arbitrary huge page ID and assuming it won't be associated with a title ;) [19:14:22] 10Operations, 10ops-eqiad: Hardware check on mw1271 - https://phabricator.wikimedia.org/T184722#3894592 (10Peachey88) [19:14:36] ebernhardson: Your patch is on mwdebug1002, please test [19:14:50] addshore: Yours are going to take a while because I'm going to wait for all four to merge, then pull them all at once [19:15:05] RoanKattouw: hehe, Daniel wrote that one ;) [19:15:24] For a second there I was thinking this really has been a long day as I don't remember writing those tests! ;) [19:15:39] RoanKattouw: sounds good, the Flow patch has a new i18n key, would that need a full scap? 
[19:15:47] Ah yes good point [19:15:51] I did realize that earlier but had forgotten [19:15:59] That solves the annoying problem of having to sync files all over core anyway [19:16:06] If so, sync the core files first and make sure that is okay, then do a full scap for the Flow stuff [19:16:32] or, whatever ;) [19:17:15] RoanKattouw: looks good [19:17:29] addshore: Yes [19:20:13] !log catrope@tin Synchronized php-1.31.0-wmf.16/includes/: Deprecate old interwiki search result widget (duration: 02m 17s) [19:20:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:25:51] (03PS1) 10Dzahn: ci: replace apache with httpd for proxy/website [puppet] - 10https://gerrit.wikimedia.org/r/403730 [19:26:26] (03PS1) 10Dduvall: Use default SSL CA cert path for https requests [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/403731 [19:26:53] (03CR) 10Dzahn: "@Hashar i added follow-up on top of this one: https://gerrit.wikimedia.org/r/#/c/403730/" [puppet] - 10https://gerrit.wikimedia.org/r/399311 (owner: 10Hashar) [19:31:50] addshore: OK your patches are on 1002, please test [19:31:52] All four of them [19:32:01] okay!!! [19:32:13] 10Operations, 10Cloud-VPS, 10Toolforge, 10Patch-For-Review, 10cloud-services-team (Kanban): Cloud: Labvirt and instance reboots for Meltdown - https://phabricator.wikimedia.org/T184189#3894622 (10chasemp) It seems `tools-worker-1015` did not get the update as I forgot to reboot it. But I'm hoping we hav... [19:33:31] (03PS1) 10Dzahn: releases-mediawiki: replace apache module with httpd [puppet] - 10https://gerrit.wikimedia.org/r/403734 [19:33:58] RoanKattouw: all of the testable things look good :) [19:34:03] and, the sites are up :) [19:34:07] !log catrope@tin Started scap: SWAT [19:34:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:37:17] RoanKattouw: is there any way to watch the process of a scap as another user?
:P [19:40:16] addshore: only for scap3, not for mediawiki scapping. You can kinda-sorta watch the logs and know what's happening, but it's not what the scap user is seeing. [19:40:34] I see, thcipriani! [19:42:52] you can tail the logs on the log server actually. I used to have a python script to parse them and follow along [19:43:11] * bd808 looks to see if he still has such a thing [19:43:14] oooh [19:43:30] sounds like i should have another window in my tmux for log monitoring during deploying [19:43:55] neat :) [19:44:07] Well also the logs are still insanely verbose during the first stage [19:44:36] see, this was my suggestion for how to get T170484 working. Pipe log output to midi on your local machine. [19:44:37] T170484: Play elevator music while scap is running - https://phabricator.wikimedia.org/T170484 [19:44:48] yeah, the first stage is terrifying. [19:45:00] There's a fix in master but it's still not deployed to prod [19:45:10] yeah [19:45:22] T182643 [19:45:22] T182643: cache_git_info (from e.g. scap sync-file) is way way too verbose - https://phabricator.wikimedia.org/T182643 [19:45:34] I guess it hasn't been that long in terms of work weeks, the holidays were in between, so it feels longer ago [19:46:29] I can't find my log parsing script, but it wouldn't be hard to make another. The scap logs are nicely formed JSON which makes them easy to parse and then do fancier things with [19:46:38] definitely, there were some rumblings of a new release earlier this week IIRC. In process still, likely. There were a few features that weren't ready to go out. [19:47:27] 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#3894725 (10Ottomata) > Oook, I've set this [restricted certpath algorithms] on all jumbo Kafka brokers. Welp, something is totally crazy with P...
[19:49:15] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T184620#3894732 (10RStallman-legalteam) Tonina's NDA for LDAP access is signed and on file. Thanks! [19:53:37] RoanKattouw: flow is scraming a bit [19:53:39] or was [19:53:43] Hm? [19:53:58] i saw a spike of stuff in exception.log but may have just been a tiny spike [19:54:10] *looks at logstash* [19:54:30] A bunch of,, [{exception_id}] {exception_url} Flow\Exception\CatchableFatalErrorException from line 571 of /srv/mediawiki/php-1.31.0-wmf.15/extensions/Flow/Hooks.php: Argument 1 passed to Flow\Formatter\CheckUserQuery::getResult() must be an instance of CheckUser, S [19:54:36] .15 so not related [19:56:42] yay paravoid and mark :) [19:56:57] same trendy new job title :) [19:58:53] !log elasticsearch / cirrus / codfw rolling reboot completed. Cluster still recovering [19:59:04] PROBLEM - High CPU load on API appserver on mw1202 is CRITICAL: CRITICAL - load average: 53.40, 25.72, 17.45 [19:59:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:59:34] PROBLEM - High CPU load on API appserver on mw1201 is CRITICAL: CRITICAL - load average: 49.86, 33.40, 24.27 [19:59:53] hmm RoanKattouw high cpu load on api servers, that happened during a sync yesterday too [19:59:54] PROBLEM - High CPU load on API appserver on mw1233 is CRITICAL: CRITICAL - load average: 64.18, 37.81, 28.53 [20:00:04] no_justification: How many deployers does it take to do MediaWiki train deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180111T2000). [20:00:04] No GERRIT patches in the queue for this window AFAICS. 
[20:00:04] PROBLEM - High CPU load on API appserver on mw1227 is CRITICAL: CRITICAL - load average: 56.03, 35.09, 28.47 [20:00:11] though only with 2 serevrs [20:00:39] interesting [20:00:54] PROBLEM - High CPU load on API appserver on mw1203 is CRITICAL: CRITICAL - load average: 50.80, 28.20, 18.67 [20:01:04] RECOVERY - High CPU load on API appserver on mw1202 is OK: OK - load average: 21.85, 23.67, 17.77 [20:01:04] PROBLEM - High CPU load on API appserver on mw1226 is CRITICAL: CRITICAL - load average: 51.02, 29.22, 20.02 [20:01:14] they all seem to have recovered / are recovering again [20:01:34] PROBLEM - High CPU load on API appserver on mw1208 is CRITICAL: CRITICAL - load average: 58.57, 27.01, 18.00 [20:02:54] RECOVERY - High CPU load on API appserver on mw1203 is OK: OK - load average: 16.24, 22.83, 17.88 [20:03:04] RECOVERY - High CPU load on API appserver on mw1226 is OK: OK - load average: 21.25, 25.56, 19.83 [20:03:12] ugh: https://phabricator.wikimedia.org/T184749 looks like a serious blocker [20:03:25] maybe worthy of a rollback? [20:03:34] RECOVERY - High CPU load on API appserver on mw1201 is OK: OK - load average: 14.80, 24.06, 22.69 [20:03:34] RECOVERY - High CPU load on API appserver on mw1208 is OK: OK - load average: 17.98, 23.22, 17.80 [20:03:55] twentyafterfour: if data is being corrupted definitely worthy of a rollback, urghhh [20:04:05] yeah crappy [20:04:20] !log catrope@tin Finished scap: SWAT (duration: 30m 12s) [20:04:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:04:44] thanks for the swat RoanKattouw! [20:04:56] RoanKattouw: all clear? I guess I need to roll back group 1 due to https://phabricator.wikimedia.org/T184749 [20:05:11] Whoa [20:05:14] Yeah all clear [20:05:16] That's an insane bug [20:05:35] indeed [20:05:36] Immediately after saving, the text showed correctly, but purging the cache (with ?action=purge) shows distorted text ...... 
interesting [20:08:25] PROBLEM - Apache HTTP on mw2126 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:09:15] RECOVERY - Apache HTTP on mw2126 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.123 second response time [20:09:18] (03PS6) 10Arlolra: Switch to YAML configuration for Parsoid on ruthenium [puppet] - 10https://gerrit.wikimedia.org/r/403464 [20:09:28] (03PS1) 1020after4: Rollback group1 to wmf.15 due to T184749 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403743 [20:09:47] (03CR) 10jerkins-bot: [V: 04-1] Switch to YAML configuration for Parsoid on ruthenium [puppet] - 10https://gerrit.wikimedia.org/r/403464 (owner: 10Arlolra) [20:09:54] (03CR) 1020after4: [C: 032] Rollback group1 to wmf.15 due to T184749 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403743 (owner: 1020after4) [20:11:03] no_justification: sorry for the spurious ping from jouncebot, forgot to update the deploy calendar wiki with mukunda's name [20:11:04] PROBLEM - Disk space on stat1005 is CRITICAL: Return code of 255 is out of bounds [20:11:13] (03CR) 10Arlolra: "> this could be used as an opportunity to change things" [puppet] - 10https://gerrit.wikimedia.org/r/403464 (owner: 10Arlolra) [20:11:29] (03Merged) 10jenkins-bot: Rollback group1 to wmf.15 due to T184749 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403743 (owner: 1020after4) [20:11:44] (03CR) 10jenkins-bot: Rollback group1 to wmf.15 due to T184749 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403743 (owner: 1020after4) [20:12:18] !log rebooting labvirt1017 for kernel upgrade [20:12:30] !log twentyafterfour@tin rebuilt and synchronized wikiversions files: Rollback group1 to wmf.15 due to T184749 refs T180749 [20:12:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:44] PROBLEM - configured eth on stat1005 is CRITICAL: Return code of 255 is out of bounds [20:12:44] PROBLEM - MD RAID on stat1005 is CRITICAL: Return code of 255 is out of 
bounds [20:12:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:54] PROBLEM - Check the NTP synchronisation status of timesyncd on stat1005 is CRITICAL: Return code of 255 is out of bounds [20:12:58] PROBLEM - DPKG on stat1005 is CRITICAL: Return code of 255 is out of bounds [20:13:04] PROBLEM - Check systemd state on stat1005 is CRITICAL: Return code of 255 is out of bounds [20:13:15] PROBLEM - dhclient process on stat1005 is CRITICAL: Return code of 255 is out of bounds [20:16:24] PROBLEM - puppet last run on stat1005 is CRITICAL: Return code of 255 is out of bounds [20:18:22] addshore: twentyafterfour: I just saw the issue with wmf.16 and the rollback, no known ETA on a fix at this point right? [20:18:37] greg-g: no know ETA indeed [20:18:39] greg-g: right [20:18:51] we all just saw it I think [20:18:55] I was just about to wind down for the day [20:19:04] nobody found the culprit? [20:19:22] Platonides: it's most likely in Revision and RevisionStore [20:19:40] alright, I'll send an email around [20:20:05] (at a guess) [20:20:37] wow that bug was known and in prod for two and a half hours? [20:20:54] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/9709/releases1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/403734 (owner: 10Dzahn) [20:21:40] did the affected pages fix themselves after the rollback? [20:21:57] I think this is worthy of an incident report [20:22:12] greg-g, ^ [20:22:31] Platonides: good question [20:23:10] https://sv.wiktionary.org/wiki/Anv%C3%A4ndare:Skalman/test seems still double-utf8 after purging [20:23:23] what about all the wikis that are made of non-ascii text? [20:23:41] if they were stored wrong, that's a lot of bad edits that were crated [20:23:44] *created [20:24:02] is it just me or is this diff doing different things to do the 2 pages themselves? 
https://sv.wiktionary.org/w/index.php?title=sur&type=revision&diff=3085353&oldid=3023155 https://sv.wiktionary.org/w/index.php?title=sur&oldid=3023155 https://sv.wiktionary.org/w/index.php?title=sur&oldid=3085353 [20:25:06] oh no, again, a purge and the revision reflected the diff [20:25:37] Krenair: there is no magic rollback function in phabricator. What do you mean "what about all the wikis that are made of non-ascii text?" ... that's why I rolled back to wmf.15 asap [20:25:41] Krenair: yep yep, rollbacks/issues like this get incident reports and post-mortems [20:25:55] RECOVERY - DPKG on stat1005 is OK: All packages OK [20:25:57] (03PS4) 10Dzahn: contint: convert Apache proxying to profiles [puppet] - 10https://gerrit.wikimedia.org/r/399311 (owner: 10Hashar) [20:26:05] RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational [20:26:15] RECOVERY - dhclient process on stat1005 is OK: PROCS OK: 0 processes with command name dhclient [20:26:24] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [20:26:39] (03PS2) 10Thcipriani: Scap canary: cache last good deploy time [puppet] - 10https://gerrit.wikimedia.org/r/403574 (https://phabricator.wikimedia.org/T183999) [20:26:41] (03CR) 10Hashar: "Isn't it going to cause puppet to whine because of a duplicate ressource Class[:httpd] ? :(" [puppet] - 10https://gerrit.wikimedia.org/r/403730 (owner: 10Dzahn) [20:26:44] RECOVERY - configured eth on stat1005 is OK: OK - interfaces up [20:26:44] RECOVERY - MD RAID on stat1005 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 [20:27:12] shouldn't we be able to catch an encoding issue with a fairly straightforward unit test? 
[20:27:17] a purge would reload from the db, so the palces to start looking are changes to endcoding on the way to the db and decoding on the way out [20:27:21] (03CR) 10Dzahn: [C: 032] contint: convert Apache proxying to profiles [puppet] - 10https://gerrit.wikimedia.org/r/399311 (owner: 10Hashar) [20:27:39] twentyafterfour: not necessarily if there is db interaction involved [20:27:52] can anyone recreate in beta cluster? [20:28:20] wmf.16 is still on test wikis as well [20:30:43] * twentyafterfour thinks we should have jouncebot or *some*bot shout loudly when there are UBN blockers in phabricator [20:31:09] yup, I can reproduce locally with $wgLegacyEncoding = 'windows-1252'; and the test string "sammansättningar" [20:31:15] I can't recreate at https://deployment.wikimedia.beta.wmflabs.org/wiki/User:Bd808/sandbox -- do we have beta cluster wikis with the encoding from the bug report? [20:31:46] addshore: awesome. repro is first step to fixing [20:32:04] RECOVERY - High CPU load on API appserver on mw1227 is OK: OK - load average: 19.01, 21.05, 23.93 [20:32:26] twentyafterfour: the problem is "which UBNs". there are always some in Phab [20:32:27] interesting the purge is indeed needed..... first load shows the correct text [20:32:28] * twentyafterfour is still concerned about those high CPU alerts on API servers as well [20:32:42] bd808: the ones marked as blockers to the train deployment tasks [20:33:23] twentyafterfour: you come up with the conduit query and I'll code it into a bot :) [20:33:29] bd808: good question (re beta cluster encoding) [20:33:32] bd808: cool! [20:33:53] I would have rolled back a bit sooner with an alert, but not much sooner really [20:34:14] Indeed, has I see that I would have poke someone for a revert straight away! 
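For readers unfamiliar with the "double-utf8" symptom reported at 20:23:10, the sketch below shows what happens in general when UTF-8 bytes are wrongly treated as windows-1252 text. It uses the test string from 20:31:09 and Python's `cp1252` codec. It illustrates the symptom only and is not the MediaWiki code path, which had not been identified at this point in the log; the helper name `double_encode` is made up for illustration.

```python
def double_encode(text: str, legacy: str = "cp1252") -> str:
    """Simulate the corruption: UTF-8 bytes misread as a legacy encoding."""
    return text.encode("utf-8").decode(legacy)


original = "sammansättningar"
garbled = double_encode(original)

print(garbled)  # sammansÃ¤ttningar

# The two bytes of "ä" in UTF-8 (0xC3 0xA4) each became one character.
assert len(garbled) == len(original) + 1
```

A test of this shape only catches the bug if it exercises the same read and write path with a legacy encoding configured, which matches the point made at 20:27:39 about database interaction.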
[20:34:17] *had [20:34:47] enwiki, dawiki, svwiki, nlwiki, dawiktionary, svwiktionary are the windows-1252 wikis per wmf-config [20:34:50] I monitor the blockers but not CONSTANTLY, just occasionally [20:35:05] (03PS3) 10Thcipriani: Scap canary: cache last good deploy time [puppet] - 10https://gerrit.wikimedia.org/r/403574 (https://phabricator.wikimedia.org/T183999) [20:35:19] (03PS2) 10Dzahn: ci: replace apache with httpd for proxy/website [puppet] - 10https://gerrit.wikimedia.org/r/403730 [20:35:26] So, this only got as far as group 1 so only dawiktionary and svwiktionary will have had the issue [20:35:28] those are all of the wgLegacyEncoding overrides too [20:36:59] bd808: the created entries in text page didn't have the utf8 flag? [20:37:02] we can probably use cirrussearch to find the pages that have the telltale à in the wikitext for fixing the source [20:37:44] some utf8 encoded characters won't have that à [20:38:00] Platonides: no idea. this is far out of the parts of MW that I know about [20:38:13] twentyafterfour: I considered pinging but then got distracted by a meeting to start plus wasn't sure if it's actually a .16 issue and if it happens on any other wiki... Sorry :-/ [20:38:21] I mean, we can just go for LAL edits between the time this was deployed and the time of the rollback if we want to attempt to fix stuff [20:38:22] well, since you had it reproduced locally [20:38:35] (03CR) 1020after4: Scap canary: cache last good deploy time (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/403574 (https://phabricator.wikimedia.org/T183999) (owner: 10Thcipriani) [20:38:56] andre__: it's ok! 
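The idea of searching for a telltale "Ã", and the objection at 20:37:44 that not every corrupted character contains one, can be made concrete with a round-trip check. This is a hedged sketch of one possible heuristic, not a tool that was used in the incident; the function name `undo_double_encoding` is hypothetical.

```python
def undo_double_encoding(text, legacy="cp1252"):
    """Return the repaired text if `text` looks double-encoded, else None.

    The check reverses the suspected corruption: encode back to the legacy
    bytes, then decode those bytes as UTF-8. Correctly stored non-ASCII
    text normally fails the UTF-8 decode step, and pure ASCII text comes
    back unchanged, so both are reported as "not corrupted".
    """
    try:
        repaired = text.encode(legacy).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None
    return repaired if repaired != text else None


print(undo_double_encoding("sammansÃ¤ttningar"))  # sammansättningar
print(undo_double_encoding("sammansättningar"))   # None

# A corrupted euro sign contains no "Ã" at all, but the round trip finds it.
euro_garbled = "€".encode("utf-8").decode("cp1252")
assert "Ã" not in euro_garbled
```

Two caveats apply. Text that happens to survive the round trip by coincidence would be flagged wrongly, so results would need review before any automated fix. Python's `cp1252` codec also leaves five byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so corruption produced by a more permissive windows-1252 decoder may not be reversible with this exact call.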
[20:39:04] I dont know how many edits those 2 wikis get in 24 hours but I don't imagine the number is too great [20:39:12] bd808: look at the text table [20:39:13] it got seen pretty quickly, nonetheless [20:39:22] Platonides: addshore had the repro [20:39:42] oh, addshore then :) [20:39:47] run SELECT old_flags FROM text; [20:39:57] expected output is "utf-8" everywhere [20:40:12] (03CR) 10Thcipriani: "> With this approach the cache is local to the server and in case of" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/403574 (https://phabricator.wikimedia.org/T183999) (owner: 10Thcipriani) [20:40:14] (03CR) 10Dzahn: [C: 04-1] "http://puppet-compiler.wmflabs.org/9710/contint1001.wikimedia.org/change.contint1001.wikimedia.org.err" [puppet] - 10https://gerrit.wikimedia.org/r/403730 (owner: 10Dzahn) [20:40:15] (add some limits and sorting if not on a test instance with a handful of edits ;) ) [20:40:24] (03CR) 10Krinkle: load ActiveAbtract extension explicitly so class autoloading works (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/403114 (https://phabricator.wikimedia.org/T184177) (owner: 10ArielGlenn) [20:41:44] Platonides: at least for https://sv.wiktionary.org/w/index.php?title=sur&oldid=3023155 it has "old_flags: utf-8,gzip,external" in prod [20:41:50] (03CR) 10Thcipriani: Scap canary: cache last good deploy time (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/403574 (https://phabricator.wikimedia.org/T183999) (owner: 10Thcipriani) [20:41:57] so not that :( [20:42:19] there are some more intermediates... checking [20:42:23] would have been too easy to fix.. [20:42:55] RECOVERY - Check the NTP synchronisation status of timesyncd on stat1005 is OK: OK: synced at Thu 2018-01-11 20:42:51 UTC. [20:43:17] What am I saying, the issue wont be with Revision or RevisionStore, but with SqlBlobStore [20:43:35] for that page ... 
this edit is the start of the corruption -- https://sv.wiktionary.org/w/index.php?title=sur&diff=3085351&oldid=3023155 [20:43:48] * addshore goes to write a unit test [20:46:10] Here are the changes: https://phabricator.wikimedia.org/source/mediawiki/compare/?head=wmf%2F1.31.0-wmf.16&against=wmf%2F1.31.0-wmf.15 [20:50:59] I am seeing a weird RevisionStore error when doing a wmf.16 install [20:51:19] as if it is trying to use a different table than specified [20:52:05] !log rebooting labvirt1003 [20:52:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:52:27] I don't see anything significant related to SqlBlobStore in the diff between wmf.15 and wmf.16 [20:52:50] twentyafterfour: nothing used it in .15! [20:52:58] oh [20:53:18] I'm just reading commit messages anyway, looking for a clue [20:53:22] Platonides: what error? [20:53:54] PROBLEM - Host www.toolserver.org is DOWN: CRITICAL - Host Unreachable (www.toolserver.org) [20:53:54] RECOVERY - High CPU load on API appserver on mw1233 is OK: OK - load average: 19.51, 21.69, 23.65 [20:53:58] it said it couldn't insert the main page content into my_wiki with a connection open to my_core_wiki [20:54:03] twentyafterfour: all of this stuff is new / refactored from Revision, and came in in patches through .15 .16 (and whatever was before .15 [20:55:34] (03CR) 10Hashar: [C: 032] Use default SSL CA cert path for https requests [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/403731 (owner: 10Dduvall) [20:55:36] (03PS1) 10Ottomata: Revert yesterday's change to kafka-jumbo java.security [puppet] - 10https://gerrit.wikimedia.org/r/403753 (https://phabricator.wikimedia.org/T182993) [20:56:06] (03Merged) 10jenkins-bot: Use default SSL CA cert path for https requests [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/403731 (owner: 10Dduvall) [20:56:36] (03CR) 10Ottomata: [C: 032] Revert yesterday's change to kafka-jumbo java.security [puppet] - 
10https://gerrit.wikimedia.org/r/403753 (https://phabricator.wikimedia.org/T182993) (owner: 10Ottomata) [20:56:38] Well, I wrote some tests and they seem to pass.... https://gerrit.wikimedia.org/r/403754 [20:56:39] ok, I reproduced it [20:56:59] it fails in sqlite, too [20:57:14] !log restarting kafka-jumbo brokers to apply https://gerrit.wikimedia.org/r/#/c/403753/ [20:57:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:00:07] Right, let me do a bisect..... [21:00:14] (03CR) 10Hashar: [C: 031] "I wrote that to manually expand some erb templates. Nowadays I guess I would write a spec for it :]" [puppet] - 10https://gerrit.wikimedia.org/r/403697 (owner: 10Faidon Liambotis) [21:00:35] I was on it :P [21:00:47] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T184620#3895019 (10RobH) a:03RobH I'll add the patchset and ldap permissions later today. [21:02:34] RECOVERY - Host www.toolserver.org is UP: PING OK - Packet loss = 0%, RTA = 0.78 ms [21:03:20] this is odd: [21:03:22] The merge base 7fa2c9434e1aa2edec76bacd6180d1a75d310655 is bad. [21:03:23] This means the bug has been fixed between 7fa2c9434e1aa2edec76bacd6180d1a75d310655 and [09537008d55f929003bd153df63a3a66b5f779c4]. [21:03:57] heh, i just got the same outcome [21:04:15] why isn't it going on? [21:04:16] (03PS3) 10Dzahn: ci: replace apache with httpd for proxy/website [puppet] - 10https://gerrit.wikimedia.org/r/403730 [21:04:50] ah [21:04:53] Some good revs are not ancestor of the bad rev. [21:05:13] this is probably a common ancestor [21:06:32] doing an inverse bisect [21:08:24] so the bug was fixed on wmf.15 and not merged to master? 
[21:08:26] wmf.15 got fixed on 8e46998c588ad3239557dba076bb5c4e23010300 [21:08:33] which is itself a revert [21:08:55] yeah that just reverts everything [21:08:56] heh [21:08:57] Revert MCR Revision related patches for .15 branch [21:09:18] so you need to bisect between wmf.12 and wmf.16? [21:09:35] .12 ? [21:09:44] the one that came before .15 [21:09:44] why not .14? [21:09:53] that was skipped? [21:09:54] because .13 and .14 were skipped due to holidays [21:10:00] 6 steps left, gimmie a mo! [21:10:02] then yes [21:10:49] (03CR) 10Dzahn: "@hashar like this? http://puppet-compiler.wmflabs.org/9711/contint1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/403730 (owner: 10Dzahn) [21:11:24] 10Operations, 10Cloud-VPS, 10Toolforge, 10Patch-For-Review, 10cloud-services-team (Kanban): Cloud: Labvirt and instance reboots for Meltdown - https://phabricator.wikimedia.org/T184189#3895029 (10chasemp) https://graphite.wikimedia.org/render/?width=586&height=308&_salt=1515702661.283&target=servers.labv... [21:15:19] Platonides: yup 6af796f3e0cf3e66cd7d7e59af8445f5712d68fe is the first bad commit [21:15:24] MCR: Deprecate and gut Revision class [21:15:29] Change-Id: Ia4c20a91e98df0b9b14b138eb4825c55e5200384 [21:15:54] (03CR) 10Dzahn: "nitpick: please remove literal tab characters from .css file. there is a mix of spaces and tabs in it currently" [puppet] - 10https://gerrit.wikimedia.org/r/402665 (owner: 10Paladox) [21:16:06] oh good, that sounds like a really small commit that doesn’t do much [21:16:25] (03CR) 10Dzahn: "and the MIT license thing you mentioned above would be nice, i see you already added license info to the .js file, cool" [puppet] - 10https://gerrit.wikimedia.org/r/402665 (owner: 10Paladox) [21:17:07] mutante it was the eqcss file that needed that license, but i removed it :). [21:17:22] (03CR) 10Paladox: "@Dzahn it was the eqcss file that needed that license, but i removed it :)." 
[puppet] - 10https://gerrit.wikimedia.org/r/402665 (owner: 10Paladox) [21:17:36] paladox: :) ok [21:17:40] :) [21:17:42] Platonides: Lucas_WMDE https://phabricator.wikimedia.org/T184749#3895051 I have to stop looking for a bit now [21:17:47] will resume later or in the morning [21:21:08] hashar: https://gerrit.wikimedia.org/r/#/c/403730/ compiles without error [21:21:25] (03CR) 10Markusguenther: "Hi guys," [puppet] - 10https://gerrit.wikimedia.org/r/402665 (owner: 10Paladox) [21:21:40] wow that's a lot of code review [21:21:46] ( https://gerrit.wikimedia.org/r/#/c/374077/ ) [21:21:47] (03PS20) 10Paladox: Update gerrit login display [puppet] - 10https://gerrit.wikimedia.org/r/402665 [21:21:47] twentyafterfour: heh [21:21:59] fun [21:22:16] Patch Set 123: [21:22:17] Post-merge build succeeded. [21:22:57] (03CR) 10Paladox: "> Hi guys," [puppet] - 10https://gerrit.wikimedia.org/r/402665 (owner: 10Paladox) [21:23:55] mutante: will look at it tomorrow :) [21:24:00] thanks! [21:24:55] mutante: would you have some spare time to upload to apt.wm.o a zuul.deb and a debianized node module ? [21:27:48] (03PS21) 10Paladox: Update gerrit login display [puppet] - 10https://gerrit.wikimedia.org/r/402665 [21:28:39] hashar: the first yes, the second i dont know.. can it be done via ticket ? 
[21:28:49] 10Operations, 10Fr-tech-archived-from-FY-14/15, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 4 others: Spike: what to do about Special:RecordImpression and Banner History - https://phabricator.wikimedia.org/T88614#3895142 (10DStrine) [21:29:53] mutante: https://phabricator.wikimedia.org/T183569#3885851 ;D [21:30:19] mutante: it is an outdated dependency of the npm version in Jessie and that need a refresh to work with node6 :] [21:31:15] but yeah maybe that needs some second check [21:31:36] for zuul that is: [21:31:37] https://phabricator.wikimedia.org/T158243#3889124 [21:31:37] https://people.wikimedia.org/~hashar/debs/zuul_2.5.0-8-gcbc7f62-wmf6/ [21:31:44] and I have already upgraded contint machines [21:31:48] (03CR) 10Dzahn: [C: 031] "thanks markusguenther and paladox. it's cool when the original upstream author is on our gerrit change as well :)" [puppet] - 10https://gerrit.wikimedia.org/r/402665 (owner: 10Paladox) [21:36:22] hashar: looking :) [21:37:15] (03CR) 10Volans: "Addressed comments" (032 comments) [software/cumin] - 10https://gerrit.wikimedia.org/r/399821 (https://phabricator.wikimedia.org/T182575) (owner: 10Volans) [21:37:20] (03PS6) 10Volans: PuppetDB backend: add support for API v4 [software/cumin] - 10https://gerrit.wikimedia.org/r/399821 (https://phabricator.wikimedia.org/T182575) [21:37:22] (03PS4) 10Volans: Migration to Python 3 [software/cumin] - 10https://gerrit.wikimedia.org/r/402059 [21:37:32] (03CR) 10Volans: "Addressed comments" (032 comments) [software/cumin] - 10https://gerrit.wikimedia.org/r/402059 (owner: 10Volans) [21:37:34] PROBLEM - Disk space on ms-be2023 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdn1 is not accessible: Input/output error [21:40:57] (03PS22) 10Paladox: Update gerrit login display [puppet] - 10https://gerrit.wikimedia.org/r/402665 [21:41:12] mutante: I cleaned up the old package to get rid of some legacy cruft. 
Next week I will get a new package with updated dependencies [21:41:24] and some other patches :] [21:41:42] :) [21:42:46] 10Operations, 10Fr-tech-archived-from-FY-2015/16, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 6 others: Eliminate PHP backend call for Special:RecordImpression - https://phabricator.wikimedia.org/T106624#3895354 (10DStrine) [21:45:31] 10Operations, 10DNS, 10Fr-tech-archived-from-FY-2015/16, 10Traffic: donate.wikimedia.org needs an MX record - https://phabricator.wikimedia.org/T120322#3895466 (10DStrine) [21:47:00] hashar: Distribution: UNRELEASED [21:47:07] .changes put in a distribution not listed within it! [21:47:26] bah [21:49:07] mutante: is that the zuul package? [21:49:13] hashar: edited it to jessie-wikimedia [21:49:17] node-tunnel-agent [21:49:40] ahhhh [21:49:40] reprepro ls node-tunnel-agent [21:49:40] node-tunnel-agent | 0.4.3-1 | jessie-wikimedia | amd64, i386, source [21:49:44] there you go [21:49:49] you are awesome [21:49:59] I found the issue [21:50:05] I had it build with a local hack :/ [21:50:44] * hashar rebuilds [21:50:48] 10Operations, 10Analytics, 10Analytics-Cluster, 10Fr-tech-archived-from-FY-2015/16, and 6 others: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#3895652 (10DStrine) [21:51:28] 10Operations, 10Continuous-Integration-Infrastructure (shipyard), 10Patch-For-Review, 10Release-Engineering-Team (Kanban): npm 1.4.21 can't use a http proxy - https://phabricator.wikimedia.org/T183569#3857222 (10Dzahn) I uploaded the files from https://people.wikimedia.org/~hashar/debs/node-tunnel-agent_0.... 
[21:53:11] (03PS1) 10Hashar: Fix UNRELEASED in debian/changelog [debs/node-tunnel-agent] - 10https://gerrit.wikimedia.org/r/403760 [21:53:24] PROBLEM - Apache HTTP on mw2120 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:54:12] (03CR) 10Krinkle: [WIP] php7 manifests for mediawiki on stretch (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/394977 (owner: 10ArielGlenn) [21:54:14] RECOVERY - Apache HTTP on mw2120 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.124 second response time [21:54:29] hashar: regarding the zuul package, that looks already done by somebody else [21:54:36] (03PS1) 10RobH: adding user tonina to ldap section [puppet] - 10https://gerrit.wikimedia.org/r/403761 (https://phabricator.wikimedia.org/T184620) [21:54:38] zuul | 2.5.0-8-gcbc7f62-wmf3jessie1 | jessie-wikimedia | amd64, source [21:54:41] zuul | 2.5.0-8-gcbc7f62-wmf4jessie1 | jessie-wikimedia | amd64, source [21:54:48] nice [21:54:53] wait, or not [21:54:54] wmf6 ? [21:55:00] ah yeah should be wmf6 [21:55:06] and you can drop the others [21:55:16] i dont want to do try dropping anything [21:55:23] i just import new versions [21:55:29] and reprepro does the rest [21:55:29] https://gerrit.wikimedia.org/r/403760 edits the changelog. 
I have pushed the packages again on people.wm.o [21:55:35] (03CR) 10Krinkle: [WIP] php7 manifests for mediawiki on stretch (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/394977 (owner: 10ArielGlenn) [21:55:53] https://people.wikimedia.org/~hashar/debs/node-tunnel-agent_0.4.3/ [21:56:03] (03PS1) 10Ottomata: Allow certificates RSA keySize > 2048, Puppet generates certs like these [puppet] - 10https://gerrit.wikimedia.org/r/403762 (https://phabricator.wikimedia.org/T182993) [21:56:04] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests, 10Patch-For-Review: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T184620#3895692 (10RobH) p:05Triage>03Normal [21:56:06] been there trying to fight with the reprepro db [21:56:07] this way the Distribution: would be the proper one in both the .changes and the debian/changelog :) [21:56:26] (03CR) 10RobH: [C: 032] adding user tonina to ldap section [puppet] - 10https://gerrit.wikimedia.org/r/403761 (https://phabricator.wikimedia.org/T184620) (owner: 10RobH) [21:57:16] hashar: just merge it , i had manually made the same change before import [21:57:30] (03PS2) 10Ottomata: Allow certificates RSA keySize > 2048, Puppet generates certs like these [puppet] - 10https://gerrit.wikimedia.org/r/403762 (https://phabricator.wikimedia.org/T182993) [21:57:32] (03CR) 10Hashar: [C: 032] Fix UNRELEASED in debian/changelog [debs/node-tunnel-agent] - 10https://gerrit.wikimedia.org/r/403760 (owner: 10Hashar) [21:57:34] to rebuild it and reimport it we'd have to bump version [21:57:41] not worth it [21:57:45] I agree :D [21:58:04] that will let me migrate all the npm jobs to docker containers \o/ [21:58:05] (03CR) 10Ottomata: [C: 032] Allow certificates RSA keySize > 2048, Puppet generates certs like these [puppet] - 10https://gerrit.wikimedia.org/r/403762 (https://phabricator.wikimedia.org/T182993) (owner: 10Ottomata) [21:58:22] (03Merged) 10jenkins-bot: Fix UNRELEASED in debian/changelog 
[debs/node-tunnel-agent] - 10https://gerrit.wikimedia.org/r/403760 (owner: 10Hashar) [21:58:52] anomie: where are those calls? [21:58:57] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T184620#3895699 (10RobH) [21:59:08] hashar: zuul | 2.5.0-8-gcbc7f62-wmf6 | jessie-wikimedia | amd64, source [21:59:16] zuul | 2.5.0-8-gcbc7f62-wmf4jessie1 | jessie-wikimedia | amd64, source [21:59:23] Platonides: E_NOCONTEXT [21:59:25] Exporting indices... [21:59:25] Deleting files no longer referenced... [21:59:34] xD [21:59:37] https://phabricator.wikimedia.org/T184749#3895080 [21:59:47] mutante: contint1001 looks all fine! [22:00:01] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T184620#3890300 (10RobH) 05Open>03Resolved I've merged the access update to the admin module, and also added tonina to the wmde ldap group. [22:00:03] hashar: :) [22:00:29] mutante: and for the node module I will use it tomorrow. Last thing: I guess you can claim the mail I sent to the ops list wednesday asking for the packages to be uploaded! [22:00:36] mutante: Danke Schon!!! [22:01:01] hashar: de rien [22:01:15] ok, doing now [22:01:21] Platonides: EditPage::internalAttemptSave() → WikiPage::doEditContent() → WikiPage::doModify() → Revision::__construct() calls RevisionStore::newMutableRevisionFromArray() [22:01:34] RECOVERY - Disk space on ms-be2023 is OK: DISK OK [22:04:02] !log restarting kafka-jumbo brokers to apply https://gerrit.wikimedia.org/r/#/c/403762/ [22:04:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:04:56] (03CR) 10Krinkle: "Note: I'm new to the mediawiki manifests in puppet and don't know the best practices very well yet." 
[puppet] - 10https://gerrit.wikimedia.org/r/394977 (owner: 10ArielGlenn) [22:06:44] PROBLEM - puppet last run on ms-be2023 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[mountpoint-/srv/swift-storage/sdn1] [22:08:37] mutante: Do you know if this suggestion is worth persuing for us? - https://phabricator.wikimedia.org/T178457#3882976 [22:09:05] (03Abandoned) 10Dzahn: gerrit: correct variable name for list of servers [puppet] - 10https://gerrit.wikimedia.org/r/397733 (owner: 10Dzahn) [22:09:14] PROBLEM - puppet last run on labvirt1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:15:05] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 [22:15:14] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0 [22:16:03] Krinkle: yes, it seems like that is the right solution to create a tmp dir nowadays [22:16:19] but.. i dont see any usage of it in existing configs [22:20:29] (03PS1) 10Madhuvishy: WIP: nfsclient: Setup dumps mounts from new servers [puppet] - 10https://gerrit.wikimedia.org/r/403767 (https://phabricator.wikimedia.org/T171540) [22:25:33] (03CR) 10Dzahn: [C: 032] Add Noto fonts to mediawiki::packages::fonts [puppet] - 10https://gerrit.wikimedia.org/r/403605 (https://phabricator.wikimedia.org/T184664) (owner: 10Reedy) [22:25:50] (03CR) 10Dzahn: [C: 032] "awww.. 
unexpected dependency shows up :p" [puppet] - 10https://gerrit.wikimedia.org/r/403605 (https://phabricator.wikimedia.org/T184664) (owner: 10Reedy) [22:29:51] (03PS1) 10Ottomata: Also disable SHA224 [puppet] - 10https://gerrit.wikimedia.org/r/403774 (https://phabricator.wikimedia.org/T182993) [22:29:57] 10Operations, 10Packaging, 10Scap: SCAP: Upload debian package version 3.7.5-1 - https://phabricator.wikimedia.org/T184774#3895753 (10mmodell) p:05Triage>03High [22:30:13] (03PS1) 1020after4: Upgrade scap package to 3.7.5-1 [puppet] - 10https://gerrit.wikimedia.org/r/403775 (https://phabricator.wikimedia.org/T184774) [22:31:24] (03CR) 10Dzahn: [C: 031] "http://puppet-compiler.wmflabs.org/9712/" [puppet] - 10https://gerrit.wikimedia.org/r/397730 (owner: 10Dzahn) [22:31:27] 10Operations, 10Packaging, 10Scap, 10Patch-For-Review: SCAP: Upload debian package version 3.7.5-1 - https://phabricator.wikimedia.org/T184774#3895773 (10mmodell) [22:31:53] 10Operations, 10Packaging, 10Scap, 10Patch-For-Review: SCAP: Upload debian package version 3.7.5-1 - https://phabricator.wikimedia.org/T184774#3895753 (10mmodell) a:05akosiaris>03None [22:32:02] (03PS2) 10Dzahn: Move packages onto individual lines in require_package() for OS versions [puppet] - 10https://gerrit.wikimedia.org/r/403623 (owner: 10Reedy) [22:32:46] (03PS3) 10Dzahn: mediawiki: Move font packages onto individual lines in require_package() [puppet] - 10https://gerrit.wikimedia.org/r/403623 (owner: 10Reedy) [22:32:58] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/9713/" [puppet] - 10https://gerrit.wikimedia.org/r/403623 (owner: 10Reedy) [22:33:56] (03CR) 10Ottomata: [C: 032] Also disable SHA224 [puppet] - 10https://gerrit.wikimedia.org/r/403774 (https://phabricator.wikimedia.org/T182993) (owner: 10Ottomata) [22:34:02] (03PS2) 10Ottomata: Also disable SHA224 [puppet] - 10https://gerrit.wikimedia.org/r/403774 (https://phabricator.wikimedia.org/T182993) [22:34:06] (03CR) 10Ottomata: [V: 032 C: 032] Also 
disable SHA224 [puppet] - 10https://gerrit.wikimedia.org/r/403774 (https://phabricator.wikimedia.org/T182993) (owner: 10Ottomata) [22:34:31] (03PS5) 10Dzahn: Add Noto fonts to mediawiki::packages::fonts [puppet] - 10https://gerrit.wikimedia.org/r/403605 (https://phabricator.wikimedia.org/T184664) (owner: 10Reedy) [22:35:41] !log restarting kafka-jumbo brokers to apply https://gerrit.wikimedia.org/r/403774 [22:35:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:38:26] (03CR) 10Dzahn: "mw2246, videoscalers stretch:" [puppet] - 10https://gerrit.wikimedia.org/r/403605 (https://phabricator.wikimedia.org/T184664) (owner: 10Reedy) [22:38:36] (03CR) 10Dzahn: "works on stretch, fails on jessie" [puppet] - 10https://gerrit.wikimedia.org/r/403605 (https://phabricator.wikimedia.org/T184664) (owner: 10Reedy) [22:39:05] PROBLEM - puppet last run on mw1221 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:39:15] PROBLEM - puppet last run on mw1277 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:39:16] PROBLEM - puppet last run on mw2146 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:39:21] (03CR) 10Krinkle: [C: 031] "Please schedule (or ask someone to schedule) this commit for SWAT to see it deployed. No need to rebase until that happens, though." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392999 (https://phabricator.wikimedia.org/T117302) (owner: 10TerraCodes) [22:39:51] (03CR) 10Dzahn: "yea, the fonts for stretch are in an ">= jessie" section, not "> jessie"..." 
[puppet] - 10https://gerrit.wikimedia.org/r/403605 (https://phabricator.wikimedia.org/T184664) (owner: 10Reedy) [22:40:45] PROBLEM - puppet last run on mw1226 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:41:02] 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#3895798 (10Ottomata) Current status: kafka-jumbo running with Requested Signature Algorithms: ECDSA+SHA512:RSA+SHA512:ECDSA+SHA384:RSA+SHA384:E... [22:41:14] PROBLEM - puppet last run on mw2238 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:41:44] PROBLEM - puppet last run on mw2174 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:41:45] PROBLEM - puppet last run on mw1331 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:41:45] PROBLEM - puppet last run on mw2203 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:41:45] PROBLEM - puppet last run on mw2115 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:42:04] PROBLEM - puppet last run on mw2226 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. 
Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:42:05] PROBLEM - puppet last run on mw1256 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:42:05] PROBLEM - puppet last run on mw2185 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:42:07] PROBLEM - puppet last run on mw2105 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:42:14] PROBLEM - puppet last run on mw2117 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:42:15] PROBLEM - puppet last run on mw1267 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:42:15] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:43:04] PROBLEM - puppet last run on mw1234 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:43:24] PROBLEM - puppet last run on mw2104 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:43:35] aww man.. 
i'm already fixing that [22:43:44] PROBLEM - puppet last run on mw1224 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:43:55] PROBLEM - puppet last run on mw1313 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:44:04] PROBLEM - puppet last run on mw2110 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:44:07] PROBLEM - puppet last run on mw1223 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:44:07] PROBLEM - puppet last run on mw2111 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:44:07] PROBLEM - puppet last run on mw2245 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:44:07] PROBLEM - puppet last run on mw2193 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:44:07] PROBLEM - puppet last run on mw2189 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:44:14] PROBLEM - puppet last run on mw2130 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. 
Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:44:15] PROBLEM - puppet last run on mw1321 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:44:15] PROBLEM - puppet last run on mw1297 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:44:15] PROBLEM - puppet last run on mw1315 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:44:15] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:44:15] PROBLEM - puppet last run on mw2103 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:44:16] PROBLEM - puppet last run on mw1225 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. 
Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:45:57] (03PS1) 10Dzahn: mediawiki::fonts: fix noto font inclusion on jessie [puppet] - 10https://gerrit.wikimedia.org/r/403832 (https://phabricator.wikimedia.org/T184664) [22:46:56] 10Operations, 10Packaging, 10Scap, 10Patch-For-Review: SCAP: Upload debian package version 3.7.5-1 - https://phabricator.wikimedia.org/T184774#3895823 (10mmodell) cc: @fgiunchedi [22:47:08] (03CR) 10Dzahn: [C: 032] mediawiki::fonts: fix noto font inclusion on jessie [puppet] - 10https://gerrit.wikimedia.org/r/403832 (https://phabricator.wikimedia.org/T184664) (owner: 10Dzahn) [22:47:17] 10Operations, 10Packaging, 10Scap, 10Patch-For-Review, 10Release: SCAP: Upload debian package version 3.7.5-1 - https://phabricator.wikimedia.org/T184774#3895825 (10mmodell) [22:48:27] (03PS1) 10MaxSem: labs: add GlobalPreferences to Sanitarium [puppet] - 10https://gerrit.wikimedia.org/r/403833 (https://phabricator.wikimedia.org/T184666) [22:50:25] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to Production Shell for cy534 - https://phabricator.wikimedia.org/T184473#3895831 (10cy534) @RobH I signed the NDA acknowledgement, so it should be on file now! [22:50:45] PROBLEM - puppet last run on mw1240 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:51:04] PROBLEM - puppet last run on mw1281 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:51:12] mutante: are you force-running puppet on failed hosts? [22:51:14] PROBLEM - puppet last run on mw2112 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. 
Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:51:14] PROBLEM - puppet last run on mw2097 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:51:24] PROBLEM - puppet last run on thumbor2002 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:51:24] PROBLEM - puppet last run on mw1227 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:51:25] PROBLEM - puppet last run on mw2237 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:51:25] PROBLEM - puppet last run on scb2001 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:51:34] PROBLEM - puppet last run on thumbor1002 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:51:44] RECOVERY - puppet last run on mw2115 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [22:51:45] PROBLEM - puppet last run on mw2163 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:51:45] PROBLEM - puppet last run on mw2199 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. 
Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:51:45] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:51:46] PROBLEM - puppet last run on mw2205 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:51:54] PROBLEM - puppet last run on mw1254 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:52:04] RECOVERY - puppet last run on mw2226 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [22:52:04] RECOVERY - puppet last run on mw2254 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [22:52:05] RECOVERY - puppet last run on mw2105 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [22:52:05] PROBLEM - puppet last run on mw2210 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:52:14] RECOVERY - puppet last run on mw2117 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [22:52:14] PROBLEM - puppet last run on mw1206 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:52:15] PROBLEM - puppet last run on mw1272 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. 
Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:52:15] PROBLEM - puppet last run on mw1244 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:52:15] PROBLEM - puppet last run on mw1215 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[fonts-noto-hinted],Package[fonts-noto-unhinted] [22:52:33] volans: yes, in codfw [22:52:45] ? [22:52:54] ? [22:53:05] all these alerts are being handled by someone right? [22:53:10] yes [22:54:10] i'm running puppet to speed up the fix [22:54:59] * volans cannot avoid to re-paste the URL for advertisement ;) [22:55:00] https://wikitech.wikimedia.org/wiki/Cumin#Run_Puppet_only_if_last_run_failed [22:55:08] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to Production Shell for cy534 - https://phabricator.wikimedia.org/T184473#3895837 (10RobH) @RStallman-legalteam: Can you confirm legal has a signed NDA on file for @cy534? (Asking since I don't see their name on the spreadsheet yet.... 
[22:56:06] volans: thanks, i'm using it for the rest [22:57:13] mutante: when you're done please re-start ircecho on einsteinium, systemd is not doing it, I'm re-opening the task [22:58:13] volans: ok, and that's basically a feature right now :p [22:58:15] RECOVERY - puppet last run on mw1269 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:58:21] lol [22:58:27] no, it does start it, see above [22:58:34] i just kept killing it, that's why [22:58:43] that was probably the puppet run [22:58:55] indeed, 26,56 [22:59:02] ok [23:01:59] --failed-only is sure useful, it doesn't just affect mw*, also thumbor [23:02:12] (originally this would have been only imagescalers) [23:02:55] yeah the idea was to have something that can run everywhere, and it's a quick noop if puppet didn't fail the last time [23:03:08] good to know [23:04:28] 10Operations, 10IRCecho, 10monitoring, 10Patch-For-Review: ircecho doesn't reconnect on failure - https://phabricator.wikimedia.org/T184103#3895852 (10Volans) 05Resolved>03Open I'm re-opening it because ircecho was not restarted by systemd as expected... too late for debugging it now though. For context... [23:04:38] i think i suggested the part to include fonts on all appservers, yea.. that fixed issues for something [23:04:59] and that's why this error showed up on so many now, heh [23:06:59] volans: but i killed ircecho [23:07:43] mutante: ahhh, you did it? [23:07:58] yes, to stop the spam [23:08:08] then it's all fuss in my mind... I'm sorry, let me amend [23:08:56] 10Operations, 10IRCecho, 10monitoring, 10Patch-For-Review: ircecho doesn't reconnect on failure - https://phabricator.wikimedia.org/T184103#3895863 (10Volans) 05Open>03Resolved This was a misunderstanding on my side, @Dzahn actually stopped it manually. [23:09:01] i saw the spam, i killed ircecho, uploaded the fix, started cumin.. 
then when puppet ran it started ircecho again and i killed it once more [23:09:29] thanks for the clarification, I should have asked [23:09:32] and now it should be back to normal [23:09:41] i should have logged, np [23:09:43] clearly too late to think straight here... time to go to bed ;) [23:10:03] heh, good night! the "--failed-only" finished [23:10:14] good night [23:10:17] I should do the same [23:10:31] indeed apergos ;) [23:14:03] 10Operations, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10hardware-requests: Give misc dump crons their own host - https://phabricator.wikimedia.org/T181936#3895869 (10ArielGlenn) >>! In T181936#3892043, @hoo wrote: ... > Also keep in mind the strong-ish growth of Wikidata. Given all of this I'd e... [23:15:16] 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3891777 (10Dzahn) after the changes above, now: jessie-imagescaler: ``` [mw1293:~] $ dpkg -l | grep noto ii fo... [23:15:44] 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3895872 (10Dzahn) 05Open>03Resolved a:03Dzahn [23:16:50] 10Operations, 10Chinese-Sites, 10I18n: Deploy Noto fonts or their derivatives for Chinese (and J&K?) - https://phabricator.wikimedia.org/T180924#3895879 (10Dzahn) 05Open>03Resolved a:03Dzahn see T184664#3895870 noto fonts are now installed across mediawiki appservers, jessie and stretch have slightly... 
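Editorial note on the alert flood above: because ircecho was stopped partway through, the channel does not show a RECOVERY for every PROBLEM. A minimal sketch for listing hosts whose last logged puppet alert is still PROBLEM might look like the following. It is not a tool used in the channel; the function name `hosts_still_failing`, the regular expression, and the abbreviated `sample` text are assumptions made for illustration, and it assumes the log is in chronological order.

```python
import re

# Matches puppet alert lines shaped like the ones in this log, e.g.
# "[22:39:05] PROBLEM - puppet last run on mw1221 is CRITICAL: ..."
ALERT_RE = re.compile(
    r"\[(\d{2}:\d{2}:\d{2})\] (PROBLEM|RECOVERY) - puppet last run on (\S+) is "
)


def hosts_still_failing(log_text):
    """Return sorted hosts whose most recent puppet alert in log_text is PROBLEM."""
    last_state = {}
    for _time, state, host in ALERT_RE.findall(log_text):
        # Alerts are assumed chronological, so later ones overwrite earlier ones.
        last_state[host] = state
    return sorted(h for h, s in last_state.items() if s == "PROBLEM")


# Abbreviated sample built from alerts in this log (message bodies shortened).
sample = (
    "[22:41:45] PROBLEM - puppet last run on mw2115 is CRITICAL: CRITICAL: Puppet has 2 failures. "
    "[22:42:04] PROBLEM - puppet last run on mw2226 is CRITICAL: CRITICAL: Puppet has 2 failures. "
    "[22:51:24] PROBLEM - puppet last run on thumbor2002 is CRITICAL: CRITICAL: Puppet has 2 failures. "
    "[22:51:44] RECOVERY - puppet last run on mw2115 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures "
    "[22:52:04] RECOVERY - puppet last run on mw2226 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures"
)

print(hosts_still_failing(sample))  # ['thumbor2002']
```

A host appearing in the result only means no RECOVERY was echoed to the channel for it; the monitoring system itself remains the authoritative source.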
[23:18:13] (03PS1) 10Madhuvishy: dumps_distribution: Set up initial NFS exports [puppet] - 10https://gerrit.wikimedia.org/r/403837 (https://phabricator.wikimedia.org/T181431) [23:18:43] (03CR) 10jerkins-bot: [V: 04-1] dumps_distribution: Set up initial NFS exports [puppet] - 10https://gerrit.wikimedia.org/r/403837 (https://phabricator.wikimedia.org/T181431) (owner: 10Madhuvishy) [23:20:44] PROBLEM - Disk space on ms-be2023 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=87%) [23:20:52] (03PS2) 10Madhuvishy: dumps_distribution: Set up initial NFS exports [puppet] - 10https://gerrit.wikimedia.org/r/403837 (https://phabricator.wikimedia.org/T181431) [23:20:57] (03PS3) 10Dzahn: aptrepo: move Hiera calls into parameter of role class [puppet] - 10https://gerrit.wikimedia.org/r/397730 [23:22:26] 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3895894 (10kaldari) @Dzahn: Shouldn't stretch also show the plain "fonts-noto"? [23:22:55] (03PS23) 10Paladox: Update gerrit login display [puppet] - 10https://gerrit.wikimedia.org/r/402665 [23:24:46] 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3895901 (10Dzahn) @kaldari No, it is expected like this per these comments from Moritz and Reedy on https://gerrit.... 
[23:26:01] (03CR) 10Madhuvishy: [C: 032] dumps_distribution: Set up initial NFS exports [puppet] - 10https://gerrit.wikimedia.org/r/403837 (https://phabricator.wikimedia.org/T181431) (owner: 10Madhuvishy) [23:26:35] (03CR) 10Dzahn: "follow-up at https://gerrit.wikimedia.org/r/#/c/403832/" [puppet] - 10https://gerrit.wikimedia.org/r/403605 (https://phabricator.wikimedia.org/T184664) (owner: 10Reedy) [23:27:10] (03CR) 10Dzahn: [C: 032] aptrepo: move Hiera calls into parameter of role class [puppet] - 10https://gerrit.wikimedia.org/r/397730 (owner: 10Dzahn) [23:27:20] (03PS4) 10Dzahn: aptrepo: move Hiera calls into parameter of role class [puppet] - 10https://gerrit.wikimedia.org/r/397730 [23:29:58] (03Abandoned) 10Dzahn: prometheus: replace deprecated parser functions with validate_legacy [puppet] - 10https://gerrit.wikimedia.org/r/377331 (owner: 10Dzahn) [23:30:55] (03PS4) 10Dzahn: ganeti: create profiles, split monitoring/firewall classes [puppet] - 10https://gerrit.wikimedia.org/r/392564 [23:45:37] 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3895929 (10kaldari) @Dzahn: Ah, got it! Makes sense! [23:47:38] 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3895931 (10kaldari) When should we expect these to be available for use and/or show up on https://noc.wikimedia.org/... [23:54:15] 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3895932 (10Dzahn) Sorry, i don't know about fc-list, that's a Mediawiki config file and it seems last time it was up... 
[23:54:41] 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3895933 (10Dzahn) a:05Dzahn>03None [23:54:49] 10Operations, 10Commons, 10Wikimedia-SVG-rendering, 10media-storage, 10Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3891777 (10Dzahn) 05Resolved>03Open