[00:03:55] (PS4) Dzahn: facilities: Fix variable contains an uppercase letter [puppet] - https://gerrit.wikimedia.org/r/308338 (https://phabricator.wikimedia.org/T93645) (owner: Paladox)
[00:09:22] Operations, Commons, Wikimedia-SVG-rendering, media-storage: Install mscorefonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T140141#2613339 (kaldari) @MoritzMuehlenhoff: Is that test-case sufficient? (https://commons.wikimedia.org/wiki/File:Mscorefonts_svg_rendering_...
[00:13:16] !log Ran namespaceDupes maintenance script on frwiki
[00:13:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:14:33] Dereckson, deploying anything?
[00:14:49] yurik, MaxSem: ^
[00:15:07] * yurik looks
[00:15:32] oh, never mind, i'm done
[00:16:08] Krenair: yurik is
[00:16:25] Dereckson, Krenair, nope, i'm done
[00:16:33] Krenair: ah by the way, Math files are now served on wikitech
[00:16:40] was - idle time made that pretty clear, but I wanted to be sure so we didn't accidentally break anything between us
[00:17:40] (I want to test a potentially-breaking OSM patch on labtestwikitech before it goes to wikitech... this means putting it on tin, running sync-common on labtestweb2001, and hoping nobody runs scap or a sync-dir covering the directory while it's still on tin)
[00:19:14] Operations, Commons, Wikimedia-SVG-rendering, media-storage: Install mscorefonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T140141#2454595 (Yurik) We have similar issue in Graphoid T127683 -- all backend renderings look horrible with just one font, and don't support...
[00:20:06] huh, there's still a reviewer-bot rule adding springle to .sql changes :)
[00:26:12] Operations, Commons, Wikimedia-SVG-rendering, media-storage: Install mscorefonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T140141#2613377 (Dereckson)
[00:28:16] Operations, Commons, Wikimedia-SVG-rendering, media-storage: Install mscorefonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T140141#2454595 (Dereckson) EULA contains some provision which made these fonts non freely licensed: > Restrictions on Alteration. You may not...
[01:04:41] !log labtest ldap: created dc=codfw,ou=hosts,dc=wikimedia,dc=org
[01:04:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:07:24] !log Updated Wikidata's property suggester with data from Monday's json dump and applied the T132839 workarounds
[01:07:25] T132839: Property suggester suggests human properties for non-human items - https://phabricator.wikimedia.org/T132839
[01:07:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:30:18] Operations, Commons, Wikimedia-SVG-rendering, media-storage: Install mscorefonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T140141#2613499 (kaldari) >Until now, the decision was to only use freely licensed fonts. Well, depends on your definition of "freely licensed...
[01:59:48] (PS1) Legoktm: contint: Add PHP 7 packages from packages.sury.org [puppet] - https://gerrit.wikimedia.org/r/308918 (https://phabricator.wikimedia.org/T144872)
[02:01:13] (CR) jenkins-bot: [V: -1] contint: Add PHP 7 packages from packages.sury.org [puppet] - https://gerrit.wikimedia.org/r/308918 (https://phabricator.wikimedia.org/T144872) (owner: Legoktm)
[02:02:48] (PS2) Legoktm: contint: Add PHP 7 packages from packages.sury.org [puppet] - https://gerrit.wikimedia.org/r/308918 (https://phabricator.wikimedia.org/T144872)
[02:04:18] (CR) jenkins-bot: [V: -1] contint: Add PHP 7 packages from packages.sury.org [puppet] - https://gerrit.wikimedia.org/r/308918 (https://phabricator.wikimedia.org/T144872) (owner: Legoktm)
[02:05:57] (PS3) Legoktm: contint: Add PHP 7 packages from packages.sury.org [puppet] - https://gerrit.wikimedia.org/r/308918 (https://phabricator.wikimedia.org/T144872)
[02:06:39] !log T144826: Restarting Cassandra restbase2004-b.codfw.wmnet (putting back into service)
[02:06:40] T144826: restbase2004-b.codfw.wmnet data corruption - https://phabricator.wikimedia.org/T144826
[02:06:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:40:06] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.17) (duration: 17m 37s)
[02:40:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:41:07] Operations, Labs, wikitech.wikimedia.org: Can't login wikitech - https://phabricator.wikimedia.org/T144805#2613590 (Shizhao) >>! In T144805#2612937, @Peachey88 wrote: > @Shizhao When was the last time you logged into wikitech successfully? > > There was a mistake at one stage that resulted in a numb...
[02:43:54] Operations, Labs, wikitech.wikimedia.org: Can't login wikitech - https://phabricator.wikimedia.org/T144805#2613591 (Shizhao) ops, logged into wikitech successfully... .About 4 - 5 years ago? I don‘t remember
[03:16:36] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.18) (duration: 17m 58s)
[03:16:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:29:16] (PS4) Legoktm: contint: Add PHP 7 packages from packages.sury.org [puppet] - https://gerrit.wikimedia.org/r/308918 (https://phabricator.wikimedia.org/T144872)
[05:37:04] (CR) Legoktm: "I've cherry-picked this onto integration-puppetmaster with an additional hack to limit it to only the currently depooled integration-slave" [puppet] - https://gerrit.wikimedia.org/r/308918 (https://phabricator.wikimedia.org/T144872) (owner: Legoktm)
[05:38:52] !log enabling row aware allocation on elasticsearch codfw - T143571
[05:38:54] T143571: Make elasticsearch actually uses shard allocation awareness - https://phabricator.wikimedia.org/T143571
[05:38:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:27:26] (CR) Muehlenhoff: [C: 1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/308918 (https://phabricator.wikimedia.org/T144872) (owner: Legoktm)
[06:40:17] !log reimaging mw2140-mw2143 to jessie
[06:40:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:44:18] gehel: o/ feeling better today?
[06:55:05] elukey: I'm a new man!
[06:55:10] \o/
[07:00:52] !log increase cluster_concurrent_rebalance on elasticsearch codfw - T143571
[07:00:53] T143571: Make elasticsearch actually uses shard allocation awareness - https://phabricator.wikimedia.org/T143571
[07:00:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:02:57] (PS2) Muehlenhoff: role::statistics: Limit to production networks [puppet] - https://gerrit.wikimedia.org/r/308576
[07:03:14] (PS1) Elukey: Add the logrotate delaycompress setting everywhere [puppet] - https://gerrit.wikimedia.org/r/308940 (https://phabricator.wikimedia.org/T132324)
[07:03:54] (PS1) Madhuvishy: nfsclient: Create /data/scratch symlink only if mount is present [puppet] - https://gerrit.wikimedia.org/r/308941
[07:05:18] (CR) Muehlenhoff: [C: 2] role::statistics: Limit to production networks [puppet] - https://gerrit.wikimedia.org/r/308576 (owner: Muehlenhoff)
[07:27:10] Blocked-on-Operations, Operations, Recommendation-API: Backport python3-sklearn and python3-sklearn-lib from sid - https://phabricator.wikimedia.org/T133362#2229360 (MoritzMuehlenhoff) > Per discussions in backlog grooming, there is no dependency on sklearn at the moment, however, for future experime...
[07:27:16] Operations, Recommendation-API: Backport python3-sklearn and python3-sklearn-lib from sid - https://phabricator.wikimedia.org/T133362#2613729 (MoritzMuehlenhoff)
[07:27:46] !log reimaging mw2144->mw2147 to jessie
[07:27:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:28:49] for the record, I am going to set these hosts to inactive before proceeding :P
[07:29:11] good morning
[07:30:03] hashar: sorry for the gerrit spam but I only wanted to add Alex
[07:30:14] and gerrit added 10 people instead
[07:30:19] I am not sure how I did it
[07:33:11] elukey: maybe you have added a group?
[07:33:14] I am also auto added as a reviewer by a bot
[07:33:21] on some kind of changes based on repo / file names
[07:35:19] ahh ok makes sense, maybe I touched too many files in the puppet repo
[07:39:37] hashar nope, https://www.mediawiki.org/wiki/Git/Reviewers#operations.2Fpuppet
[07:41:10] (CR) Paladox: [C: 1] contint: Add PHP 7 packages from packages.sury.org [puppet] - https://gerrit.wikimedia.org/r/308918 (https://phabricator.wikimedia.org/T144872) (owner: Legoktm)
[07:43:35] I didn't know about this page :)
[07:53:33] that is the magic bot :}
[07:56:57] hashar: can I add a repo just adding it to the page?
[07:58:03] volans: on https://www.mediawiki.org/wiki/Git/Reviewers ? Yes
[07:58:16] there is a bot that parse the wikitext from time to time
[07:58:18] and update itself
[07:58:20] !log executed apt-get purge tmpreaper on gallium (T132324)
[07:58:21] T132324: Tracking and Reducing cron-spam from root@ - https://phabricator.wikimedia.org/T132324
[07:58:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:58:26] hashar: yes, there, operations/software is not there :D
[07:58:27] ok thanks
[08:03:21] (PS3) Hashar: zuul: refactor to use hiera [puppet] - https://gerrit.wikimedia.org/r/308778
[08:03:51] (CR) Ema: [C: 1] Move yarn.w.o traffic to stat1001.eqiad.wmnet [puppet] - https://gerrit.wikimedia.org/r/308776 (https://phabricator.wikimedia.org/T116192) (owner: Elukey)
[08:06:15] (PS2) Elukey: Move yarn.w.o traffic to stat1001.eqiad.wmnet [puppet] - https://gerrit.wikimedia.org/r/308776 (https://phabricator.wikimedia.org/T116192)
[08:06:32] !log reimaging mw2200-mw2203 to jessie
[08:06:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:07:37] (CR) Elukey: [C: 2] Move yarn.w.o traffic to stat1001.eqiad.wmnet [puppet] - https://gerrit.wikimedia.org/r/308776 (https://phabricator.wikimedia.org/T116192) (owner: Elukey)
[08:12:52] (PS2) Jcrespo: nagios: Add marostegui to the dba contact group [puppet] - https://gerrit.wikimedia.org/r/308686
[08:15:31] (CR) Marostegui: [C: 1] nagios: Add marostegui to the dba contact group [puppet] - https://gerrit.wikimedia.org/r/308686 (owner: Jcrespo)
[08:18:55] (CR) Jcrespo: [C: 2] nagios: Add marostegui to the dba contact group [puppet] - https://gerrit.wikimedia.org/r/308686 (owner: Jcrespo)
[08:21:48] (PS1) Marostegui: mariadb: pool db1064 after the reimage; depool db1019 which was replacing it [mediawiki-config] - https://gerrit.wikimedia.org/r/308945 (https://phabricator.wikimedia.org/T144723)
[08:27:14] (CR) Jcrespo: [C: -1] "There is a mismatch betwen the intended weight and the real one." [mediawiki-config] - https://gerrit.wikimedia.org/r/308945 (https://phabricator.wikimedia.org/T144723) (owner: Marostegui)
[08:27:21] (PS4) Hashar: zuul: refactor to use hiera [puppet] - https://gerrit.wikimedia.org/r/308778
[08:28:14] (CR) Jcrespo: mariadb: pool db1064 after the reimage; depool db1019 which was replacing it (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/308945 (https://phabricator.wikimedia.org/T144723) (owner: Marostegui)
[08:29:47] (CR) Hashar: [C: -1] "In hiera, I have moved "zuul::common" from common.yaml to each host hiera file. It seems to solve the compilation: https://puppet-compil" [puppet] - https://gerrit.wikimedia.org/r/308778 (owner: Hashar)
[08:36:34] (PS5) Hashar: zuul: refactor to use hiera [puppet] - https://gerrit.wikimedia.org/r/308778
[08:40:13] (CR) Hashar: "https://puppet-compiler.wmflabs.org/4001/" [puppet] - https://gerrit.wikimedia.org/r/308778 (owner: Hashar)
[08:40:33] (PS2) Marostegui: mariadb: pool db1064 after the reimage; depool db1019 which was replacing it [mediawiki-config] - https://gerrit.wikimedia.org/r/308945 (https://phabricator.wikimedia.org/T144723)
[08:42:31] (CR) Jcrespo: [C: 1] mariadb: pool db1064 after the reimage; depool db1019 which was replacing it [mediawiki-config] - https://gerrit.wikimedia.org/r/308945 (https://phabricator.wikimedia.org/T144723) (owner: Marostegui)
[08:47:54] (CR) Hashar: "The puppet compiler https://puppet-compiler.wmflabs.org/4001/ for gallium reports:" [puppet] - https://gerrit.wikimedia.org/r/308778 (owner: Hashar)
[08:48:44] (PS6) Hashar: zuul: refactor to use hiera [puppet] - https://gerrit.wikimedia.org/r/308778 (https://phabricator.wikimedia.org/T139527)
[08:49:43] (CR) Hashar: [C: 1] "Change linked to T139527 "role::zuul::configuration should be replaced by hiera"" [puppet] - https://gerrit.wikimedia.org/r/308778 (https://phabricator.wikimedia.org/T139527) (owner: Hashar)
[08:53:39] Puppet, Continuous-Integration-Infrastructure, Patch-For-Review, Technical-Debt, Zuul: role::zuul::configuration should be replaced by hiera - https://phabricator.wikimedia.org/T139527#2613906 (hashar) p:Triage>Normal a:hashar As part of migrating CI from gallium to a new host, I...
[08:55:50] Puppet, Continuous-Integration-Infrastructure, Patch-For-Review, Technical-Debt, Zuul: role::zuul::configuration should be replaced by hiera - https://phabricator.wikimedia.org/T139527#2613923 (Paladox) @hashar I can ajust but I doint know how I can run puppet without it deleting the /var/ww...
[08:56:04] (CR) Paladox: [C: 1] zuul: refactor to use hiera [puppet] - https://gerrit.wikimedia.org/r/308778 (https://phabricator.wikimedia.org/T139527) (owner: Hashar)
[08:57:05] Operations: ganglia-monitor and puppet failing on bast3001 - https://phabricator.wikimedia.org/T144778#2613924 (Volans) >>! In T144778#2610768, @fgiunchedi wrote: > @volans indeed, it doesn't seem to be able to start back up again by itself after a reboot. Also manually starting it via `/etc/init.d/ganglia-m...
[08:57:43] (CR) Marostegui: [C: 2] mariadb: pool db1064 after the reimage; depool db1019 which was replacing it [mediawiki-config] - https://gerrit.wikimedia.org/r/308945 (https://phabricator.wikimedia.org/T144723) (owner: Marostegui)
[08:58:11] (Merged) jenkins-bot: mariadb: pool db1064 after the reimage; depool db1019 which was replacing it [mediawiki-config] - https://gerrit.wikimedia.org/r/308945 (https://phabricator.wikimedia.org/T144723) (owner: Marostegui)
[09:00:55] (PS4) Filippo Giunchedi: introduce thumbor-admins group [puppet] - https://gerrit.wikimedia.org/r/302471 (https://phabricator.wikimedia.org/T139606)
[09:05:09] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: pooled db1064 and removed db1019 which was replacing it - T144723 (duration: 00m 52s)
[09:05:10] T144723: Reimage & upgrade db1064 - https://phabricator.wikimedia.org/T144723
[09:05:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:12:40] Operations: ganglia-monitor and puppet failing on bast3001 - https://phabricator.wikimedia.org/T144778#2613994 (fgiunchedi) >>! In T144778#2613924, @Volans wrote: >>>! In T144778#2610768, @fgiunchedi wrote: >> @volans indeed, it doesn't seem to be able to start back up again by itself after a reboot. Also ma...
[09:21:46] Operations: ganglia-monitor and puppet failing on bast3001 - https://phabricator.wikimedia.org/T144778#2614007 (Volans) Yeah, sorry my bad I was looking for the ganglia-monitor, now I've checked the init.d script and agree with you.
[09:26:38] Operations: ganglia-monitor and puppet failing on bast3001 - https://phabricator.wikimedia.org/T144778#2614023 (MoritzMuehlenhoff) But "systemctl list-units" still shows ganglia-monitor.service as failed.
[09:28:41] volans moritzm yeah it is a bit of a mess, unlikely I'll have time to look at it :(
[09:28:53] especially the part where "it works" but systemctl disagrees
[09:50:27] (PS1) Muehlenhoff: Update SSH key for Alexander Krause [puppet] - https://gerrit.wikimedia.org/r/308947 (https://phabricator.wikimedia.org/T142780)
[09:51:16] (CR) Alexandros Kosiaris: [C: 2] contint: Add PHP 7 packages from packages.sury.org [puppet] - https://gerrit.wikimedia.org/r/308918 (https://phabricator.wikimedia.org/T144872) (owner: Legoktm)
[09:51:20] (PS5) Alexandros Kosiaris: contint: Add PHP 7 packages from packages.sury.org [puppet] - https://gerrit.wikimedia.org/r/308918 (https://phabricator.wikimedia.org/T144872) (owner: Legoktm)
[09:51:22] (CR) Alexandros Kosiaris: [V: 2] contint: Add PHP 7 packages from packages.sury.org [puppet] - https://gerrit.wikimedia.org/r/308918 (https://phabricator.wikimedia.org/T144872) (owner: Legoktm)
[09:52:03] godog, exactly my case
[09:52:34] I saw it, and I was like I would like to figure this out
[09:52:51] but I should be finishing prometheus first
[09:53:18] (I promise I will work on that soon)
[09:53:26] jynus: yeah same here, I'm spinning the thumbor plate today/this week though
[09:53:57] the pending things regarding mysql should be totaly on my plate
[09:54:04] so do not worry
[09:54:24] although we should sync at some point to see what else is missing
[09:55:57] jynus: for sure, especially on https://phabricator.wikimedia.org/T143896
[09:56:18] oh, no that is what I say it is totally on me
[09:56:42] I am referring to other prometheus things such as graphs or pending deploys, etc.
[09:57:32] ah ok
[10:00:05] godog: Respected human, time to deploy Turn on shadow thumbor requests for small wikis (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160907T1000). Please do the needful.
[10:00:34] (CR) Elukey: [C: 1] Update SSH key for Alexander Krause [puppet] - https://gerrit.wikimedia.org/r/308947 (https://phabricator.wikimedia.org/T142780) (owner: Muehlenhoff)
[10:02:17] !log add mw:thumbor to read/write ACLs for thumbnail containers of a subset of wikis T139606
[10:02:17] T139606: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606
[10:02:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:04:06] (PS2) Muehlenhoff: Update SSH key for Alexander Krause [puppet] - https://gerrit.wikimedia.org/r/308947 (https://phabricator.wikimedia.org/T142780)
[10:06:24] (CR) Muehlenhoff: [C: 2] Update SSH key for Alexander Krause [puppet] - https://gerrit.wikimedia.org/r/308947 (https://phabricator.wikimedia.org/T142780) (owner: Muehlenhoff)
[10:11:49] (PS2) Alexandros Kosiaris: Rename ores deploy repo [puppet] - https://gerrit.wikimedia.org/r/296687 (https://phabricator.wikimedia.org/T139008) (owner: Ladsgroup)
[10:12:19] (PS3) Alexandros Kosiaris: Rename ores deploy repo [puppet] - https://gerrit.wikimedia.org/r/296687 (https://phabricator.wikimedia.org/T139008) (owner: Ladsgroup)
[10:12:26] (CR) Alexandros Kosiaris: [C: 2 V: 2] Rename ores deploy repo [puppet] - https://gerrit.wikimedia.org/r/296687 (https://phabricator.wikimedia.org/T139008) (owner: Ladsgroup)
[10:12:36] Operations, Ops-Access-Requests, Research-and-Data, Research-collaborations, and 2 others: Request access to data for WDQS research - https://phabricator.wikimedia.org/T142780#2614098 (MoritzMuehlenhoff) Open>Resolved @AlexKrauseTUD : I've updated your key, you should be able to login now.
[10:13:34] (PS3) Muehlenhoff: openldap: enable the memberof overlay [puppet] - https://gerrit.wikimedia.org/r/295357 (https://phabricator.wikimedia.org/T142817) (owner: Faidon Liambotis)
[10:17:37] (PS3) Filippo Giunchedi: swift: enable shadow thumb requests for small wikis [puppet] - https://gerrit.wikimedia.org/r/308746 (https://phabricator.wikimedia.org/T139606)
[10:19:25] (CR) Filippo Giunchedi: [C: 2] swift: enable shadow thumb requests for small wikis [puppet] - https://gerrit.wikimedia.org/r/308746 (https://phabricator.wikimedia.org/T139606) (owner: Filippo Giunchedi)
[10:20:25] akosiaris: good to merge your change too?
[10:21:34] {{done}}
[10:21:53] (PS1) Elukey: Extend the access to yarn.wikimedia.org to the wmf group [puppet] - https://gerrit.wikimedia.org/r/308951 (https://phabricator.wikimedia.org/T116192)
[10:22:15] godog: thanks
[10:24:12] np!
[10:24:30] you actually saved me from a mistake
[10:24:37] (CR) Elukey: [C: 2] Extend the access to yarn.wikimedia.org to the wmf group [puppet] - https://gerrit.wikimedia.org/r/308951 (https://phabricator.wikimedia.org/T116192) (owner: Elukey)
[10:24:43] I was running puppet without having it merged... not sure what I was thinking
[10:25:45] hehe yeah happened to me too, the next step usually involves cursing puppet
[10:26:11] hehe, so true
[10:27:29] (CR) Mark Bergsma: [C: 2] introduce thumbor-admins group [puppet] - https://gerrit.wikimedia.org/r/302471 (https://phabricator.wikimedia.org/T139606) (owner: Filippo Giunchedi)
[10:30:04] !log yarn.w.o is now available to all the users in the wmf ldap group (Basic Auth)
[10:30:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:32:11] !log https://yarn.wikimedia.org/ for the lazies
[10:32:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:33:19] mmm I updated https://wikitech.wikimedia.org/wiki/LDAP_Groups and now I realize why we have two requires in the apache config
[10:33:25] * elukey sends another code review
[10:33:37] anyone familiar with unattended upgrade ? I am wondering why we dont auto upgrade from "trusty-updates"
[10:37:31] (PS1) Alexandros Kosiaris: Revert "Rename ores deploy repo" [puppet] - https://gerrit.wikimedia.org/r/308953 (https://phabricator.wikimedia.org/T139008)
[10:37:50] mmmm actually one can't be in ops without being in the wmf group no?
[10:38:09] I don't see any reason to put a Require for ops and one for wmf
[10:38:20] if anybody disagree please let me know :)
[10:39:11] (CR) Alexandros Kosiaris: [C: 2] Revert "Rename ores deploy repo" [puppet] - https://gerrit.wikimedia.org/r/308953 (https://phabricator.wikimedia.org/T139008) (owner: Alexandros Kosiaris)
[10:44:22] (PS3) Urbanecm: Enable RC patrol for fiwiki and some related changes [mediawiki-config] - https://gerrit.wikimedia.org/r/308764 (https://phabricator.wikimedia.org/T144817)
[10:44:48] (PS4) Urbanecm: Enable RC patrol for fiwiki and some related changes [mediawiki-config] - https://gerrit.wikimedia.org/r/308764 (https://phabricator.wikimedia.org/T144817)
[10:56:57] (PS3) Gehel: graphite - fix storage_schemas order [puppet] - https://gerrit.wikimedia.org/r/308565
[10:58:11] (CR) Gehel: [C: 2] graphite - fix storage_schemas order [puppet] - https://gerrit.wikimedia.org/r/308565 (owner: Gehel)
[10:59:39] (CR) Hashar: "There php7.0 binary takes precedence over our alternative script :(" [puppet] - https://gerrit.wikimedia.org/r/308918 (https://phabricator.wikimedia.org/T144872) (owner: Legoktm)
[11:02:28] (PS1) Hashar: contint: prefer our bin/php alternative [puppet] - https://gerrit.wikimedia.org/r/308955 (https://phabricator.wikimedia.org/T144872)
[11:04:32] (CR) Hashar: "Alternative pinning is https://gerrit.wikimedia.org/r/#/c/308955/" [puppet] - https://gerrit.wikimedia.org/r/308918 (https://phabricator.wikimedia.org/T144872) (owner: Legoktm)
[11:04:39] !log filippo@palladium conftool action : set/pooled=no; selector: ms-fe1001.eqiad.wmnet
[11:04:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:05:29] lunch, I've depooled the host I'm testing the thumbor change in though
[11:06:53] (PS2) Giuseppe Lavagetto: Change-Prop: Increase concurrency for transclusions [puppet] - https://gerrit.wikimedia.org/r/308803 (owner: Ppchelko)
[11:08:17] (CR) Giuseppe Lavagetto: [C: 2] Change-Prop: Increase concurrency for transclusions [puppet] - https://gerrit.wikimedia.org/r/308803 (owner: Ppchelko)
[11:08:41] (PS2) Giuseppe Lavagetto: Change Prop: Send the if-unmodified-since header on transclusion updates [puppet] - https://gerrit.wikimedia.org/r/308773 (owner: Mobrovac)
[11:09:07] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Change Prop: Send the if-unmodified-since header on transclusion updates [puppet] - https://gerrit.wikimedia.org/r/308773 (owner: Mobrovac)
[11:11:34] (CR) Hashar: [C: 1] "I have cherry picked it on the CI puppetmaster, ran puppet and confirmed with salt that /usr/bin/php points to our shell script." [puppet] - https://gerrit.wikimedia.org/r/308955 (https://phabricator.wikimedia.org/T144872) (owner: Hashar)
[11:17:18] akosiaris: as a follow up to PHP7, I need to tweak the alternative priority for 'php'. We want our own shell script and not default to php7.0 https://gerrit.wikimedia.org/r/308955 :D
[11:17:31] confirmed that fix the prio on the permanent slaves
[11:17:48] once that change merge I will update the Nodepool jessie images (they are build from operations/puppet with no cherry pick possibility)
[11:20:03] (PS1) Mobrovac: Change-Prop: Fix concurrency calculation for transcludes [puppet] - https://gerrit.wikimedia.org/r/308957
[11:20:57] Operations, Ops-Access-Requests, Research-and-Data, Research-collaborations, and 2 others: Request access to data for WDQS research - https://phabricator.wikimedia.org/T142780#2614212 (AlexKrauseTUD) Thanks, now I can login :) As an information, if you are not already aware of this (I did not fi...
[11:21:01] (CR) Alexandros Kosiaris: [C: 2] contint: prefer our bin/php alternative [puppet] - https://gerrit.wikimedia.org/r/308955 (https://phabricator.wikimedia.org/T144872) (owner: Hashar)
[11:21:11] (PS2) Alexandros Kosiaris: contint: prefer our bin/php alternative [puppet] - https://gerrit.wikimedia.org/r/308955 (https://phabricator.wikimedia.org/T144872) (owner: Hashar)
[11:21:14] (CR) Alexandros Kosiaris: [V: 2] contint: prefer our bin/php alternative [puppet] - https://gerrit.wikimedia.org/r/308955 (https://phabricator.wikimedia.org/T144872) (owner: Hashar)
[11:21:40] hashar: done
[11:21:58] akosiaris: moritzm: and thank you a ton for the PHP7 quick review :}
[11:22:50] hashar: yw
[11:23:28] (CR) Mobrovac: [C: 1] "PCC says OK - https://puppet-compiler.wmflabs.org/4006/" [puppet] - https://gerrit.wikimedia.org/r/308957 (owner: Mobrovac)
[11:28:21] lunch &
[11:32:29] (PS2) Giuseppe Lavagetto: Change-Prop: Fix concurrency calculation for transcludes [puppet] - https://gerrit.wikimedia.org/r/308957 (owner: Mobrovac)
[11:33:27] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Change-Prop: Fix concurrency calculation for transcludes [puppet] - https://gerrit.wikimedia.org/r/308957 (owner: Mobrovac)
[11:37:59] Operations, ops-codfw: mw2202/mw2203 failed to install - https://phabricator.wikimedia.org/T144911#2614219 (MoritzMuehlenhoff)
[11:40:37] (PS1) Alexandros Kosiaris: puppetmaster: Redirect cron job stdout/stderr to /dev/null [puppet] - https://gerrit.wikimedia.org/r/308958
[11:45:25] RoanKattouw: https://gerrit.wikimedia.org/r/#/c/308915/ - should this deploy earlier than 11 PDT? I can be there in EU SWAT time.
[11:48:20] !log reimaging mw2153-mw2156 to jessie
[11:48:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:57:24] Error: Could not find dependency Apt::Repository[sury-php] for Package[php7.0-cli] at /puppet/modules/contint/manifests/packages/php.pp:44
[11:57:29] puppet never cease to amaze me
[11:58:35] (PS7) Muehlenhoff: Add modules-load.d/kmod configuration for br_netfilter for Linux >= 3.18 [puppet] - https://gerrit.wikimedia.org/r/306633 (https://phabricator.wikimedia.org/T142388)
[12:01:07] (CR) Alex Monk: "Why just wmf instead of nda+ops+wmf?" [puppet] - https://gerrit.wikimedia.org/r/308951 (https://phabricator.wikimedia.org/T116192) (owner: Elukey)
[12:02:10] (CR) Muehlenhoff: [C: 2] Add modules-load.d/kmod configuration for br_netfilter for Linux >= 3.18 [puppet] - https://gerrit.wikimedia.org/r/306633 (https://phabricator.wikimedia.org/T142388) (owner: Muehlenhoff)
[12:11:59] Operations, ops-codfw: mw2202/mw2203 failed to install - https://phabricator.wikimedia.org/T144911#2614266 (MoritzMuehlenhoff)
[12:20:42] !log mobileapps deploying fc09d0d
[12:20:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:22:33] (CR) Alexandros Kosiaris: [C: 2] puppetmaster: Redirect cron job stdout/stderr to /dev/null [puppet] - https://gerrit.wikimedia.org/r/308958 (owner: Alexandros Kosiaris)
[12:22:37] (PS2) Alexandros Kosiaris: puppetmaster: Redirect cron job stdout/stderr to /dev/null [puppet] - https://gerrit.wikimedia.org/r/308958
[12:22:39] (CR) Alexandros Kosiaris: [V: 2] puppetmaster: Redirect cron job stdout/stderr to /dev/null [puppet] - https://gerrit.wikimedia.org/r/308958 (owner: Alexandros Kosiaris)
[12:22:49] hola
[12:22:55] (PS2) Alexandros Kosiaris: pybal: Fix require_package Puppet 4.x syntax [puppet] - https://gerrit.wikimedia.org/r/308531
[12:22:58] (CR) Alexandros Kosiaris: [C: 2 V: 2] pybal: Fix require_package Puppet 4.x syntax [puppet] - https://gerrit.wikimedia.org/r/308531 (owner: Alexandros Kosiaris)
[12:23:03] qu
[12:28:54] hable
[12:35:01] !log filippo@palladium conftool action : set/pooled=yes; selector: ms-fe1001.eqiad.wmnet
[12:35:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:35:12] some glitches on fermium or not our fault?
[12:36:50] (PS1) Filippo Giunchedi: swift: add thumborhost port 8800 [puppet] - https://gerrit.wikimedia.org/r/308964
[12:37:53] jynus: ?
[12:37:58] jynus: it's strange, that also happened two hours ago and I restarted mailman to fix it
[12:38:16] !log restarted mailman on fermium
[12:38:17] oh, it sems something happened to the processes, true
[12:38:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:38:27] now the qrunner processes are back
[12:38:27] I only saw the mail queue first
[12:38:53] so I didn't see if it could be just mail doing stupid things in general
[12:39:27] if it happens again, I will investigate further
[12:39:36] will leave it for now
[12:39:38] (CR) Filippo Giunchedi: [C: 2] swift: add thumborhost port 8800 [puppet] - https://gerrit.wikimedia.org/r/308964 (owner: Filippo Giunchedi)
[12:41:47] Dereckson: hashar aude I can do EU swat today (again 2 of the changes are mine)
[12:41:51] Operations, ops-eqiad: Rack/setup sodium (carbon/mirror server replacement) - https://phabricator.wikimedia.org/T139171#2614341 (faidon) p:Triage>Normal
[12:42:18] Operations, ops-eqiad: Rack/setup sodium (carbon/mirror server replacement) - https://phabricator.wikimedia.org/T139171#2421526 (faidon) Open>Resolved Thanks Chris. I installed the system, reconfigured BIOS etc.; system is installed and up and running now.
[12:45:13] addshore: sounds good :} [12:45:44] [= [12:46:41] (03CR) 10BBlack: [C: 031] Add the logrotate delaycompress setting everywhere [puppet] - 10https://gerrit.wikimedia.org/r/308940 (https://phabricator.wikimedia.org/T132324) (owner: 10Elukey) [12:47:06] (03PS1) 10Ema: depool upload in ulsfo [dns] - 10https://gerrit.wikimedia.org/r/308967 (https://phabricator.wikimedia.org/T131502) [12:48:12] (03PS4) 10Gehel: maps - create project specific indices during initial data import [puppet] - 10https://gerrit.wikimedia.org/r/308587 [12:48:52] (03CR) 10BBlack: [C: 031] depool upload in ulsfo [dns] - 10https://gerrit.wikimedia.org/r/308967 (https://phabricator.wikimedia.org/T131502) (owner: 10Ema) [12:49:29] (03CR) 10Gehel: [C: 032] maps - create project specific indices during initial data import [puppet] - 10https://gerrit.wikimedia.org/r/308587 (owner: 10Gehel) [12:50:55] (03CR) 10Ema: [C: 032] depool upload in ulsfo [dns] - 10https://gerrit.wikimedia.org/r/308967 (https://phabricator.wikimedia.org/T131502) (owner: 10Ema) [12:51:24] !log depool upload in ulsfo [12:51:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:58:27] ohia Urbanecm! [12:58:36] !log enabling row aware allocation on elasticsearch eqiad - T143571 [12:58:37] T143571: Make elasticsearch actually uses shard allocation awareness - https://phabricator.wikimedia.org/T143571 [12:58:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:58:46] addshore: I'm here. Do you want something? ;) [12:59:26] https://gerrit.wikimedia.org/r/#/c/308764/4 looks all ready to go yes? :) [13:00:04] hashar, Dereckson, addshore, and aude: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160907T1300). Please do the needful. [13:00:04] Urbanecm and Addshore: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. 
Please be available during the process. [13:00:15] addshore: Yes, it can be deployed. [13:01:03] Krenair: hi! Thanks for the review on the yarn ldap patch. So for the nda I wasn't sure since I wanted to ask my team first (going to do that during standup today), meanwhile ops seemed redundant since it should be inside wmf.. But I could be wrong. Any suggestion? [13:01:19] elukey, ops is not a subset of wmf [13:01:44] why were you sure about wmf but not nda? [13:02:11] Urbanecm: in that case I will start SWAT now :) [13:02:13] because in my mind nda is bigger than wmf, so I wanted to ask first.. [13:02:14] (03CR) 10Addshore: [C: 032] Enable RC patrol for fiwiki and some related changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308764 (https://phabricator.wikimedia.org/T144817) (owner: 10Urbanecm) [13:02:40] (03Merged) 10jenkins-bot: Enable RC patrol for fiwiki and some related changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308764 (https://phabricator.wikimedia.org/T144817) (owner: 10Urbanecm) [13:02:41] Okay [13:02:42] Krenair: how is it possible that ops is not in wmf? [13:02:49] (asking because I am curious) [13:03:02] elukey, wmf is 7-8x the size of nda [13:03:20] Urbanecm: it is on mw1099 now, are you able to test? [13:03:27] elukey, because in the production cluster non-staff can have root [13:03:49] so there should be ops in that group who aren't in wmf [13:03:57] makes sense now [13:04:11] (03PS1) 10Giuseppe Lavagetto: [WiP] scap: introduce scap_source type [puppet] - 10https://gerrit.wikimedia.org/r/308973 [13:04:17] addshore: No, I have no patrol rights. [13:04:26] I'll ask for checking in the task. Is it ok addshore ? [13:04:44] Krenair: will make a patch to add ops and nda right after the meetings with my team, thanks! 
[13:05:21] (03CR) 10jenkins-bot: [V: 04-1] [WiP] scap: introduce scap_source type [puppet] - 10https://gerrit.wikimedia.org/r/308973 (owner: 10Giuseppe Lavagetto) [13:05:28] ack, I will roll it out everywhere now however [13:05:33] Okay. [13:06:04] (03Abandoned) 10Giuseppe Lavagetto: scap::source: use puppet to manage directory creation [puppet] - 10https://gerrit.wikimedia.org/r/306429 (owner: 10Giuseppe Lavagetto) [13:06:11] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:308764|Enable RC patrol for fiwiki and some related changes]] (duration: 00m 47s) [13:06:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:06:17] Urbanecm: it is everywhere [13:06:17] (03Abandoned) 10Giuseppe Lavagetto: scap::source: allow picking phabricator as a source. [puppet] - 10https://gerrit.wikimedia.org/r/306430 (owner: 10Giuseppe Lavagetto) [13:06:22] Thanks. [13:06:24] Closed. [13:06:24] Just my patches left in SWAT now :) [13:06:27] :) [13:07:17] (03PS2) 10Addshore: Enable mention status notifications on mediawikiwiki and metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307476 (https://phabricator.wikimedia.org/T144181) (owner: 10WMDE-leszek) [13:07:23] (03CR) 10Addshore: [C: 032] Enable mention status notifications on mediawikiwiki and metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307476 (https://phabricator.wikimedia.org/T144181) (owner: 10WMDE-leszek) [13:07:54] (03Merged) 10jenkins-bot: Enable mention status notifications on mediawikiwiki and metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/307476 (https://phabricator.wikimedia.org/T144181) (owner: 10WMDE-leszek) [13:08:26] elukey, okay.. what exactly is yarn.wm.o anyway? [13:09:35] hadoop administration? 
[13:09:53] Krenair: its the yarn resourcemanager job ui [13:09:57] you can't take any actions there [13:10:04] but you can view running jobs in yarn (hadoop) [13:10:07] so ja sorta [13:10:19] so you can view running queries, like tendril? [13:10:35] you can see parts of running hive queries, yeah [13:10:47] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:307476|Enable mention status notifications on mediawikiwiki and metawiki]] (duration: 00m 47s) [13:10:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:11:53] this is why I wanted to ask for nda vs wmf vs etc.. [13:12:07] but afaics nda+wmf+ops should be fine [13:12:13] thoughts ottomata ? [13:13:16] elukey: i think it should be fine [13:13:26] they can't take any actions using that ui [13:13:38] all right so let me add the remaining Require directives [13:13:44] I have the same opinion [13:15:24] !log addshore@tin Synchronized php-1.28.0-wmf.18/extensions/RevisionSlider/modules/ext.RevisionSlider.DiffPage.js: SWAT: [[gerrit:308943|Revert "Do not nest mw-content-text element when reloading a diff" (duration: 00m 47s) [13:15:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:16:31] 06Operations, 10MediaWiki-Sites, 05MW-1.26-release, 13Patch-For-Review, 07SEO: URLs for the same title without extra query parameters should have the same canonical link - https://phabricator.wikimedia.org/T67402#696646 (10Paladox) Fixing this task caused this T131414 problem since the patches broke supp... [13:16:32] EUSWAT done! 
:) [13:16:52] (03PS1) 10Faidon Liambotis: installserver: minor simplification of a ferm rule [puppet] - 10https://gerrit.wikimedia.org/r/308978 [13:16:54] (03PS1) 10Faidon Liambotis: role::installserver::mirrors fixups [puppet] - 10https://gerrit.wikimedia.org/r/308979 [13:16:56] (03PS1) 10Faidon Liambotis: Really split the mirrors role class [puppet] - 10https://gerrit.wikimedia.org/r/308980 [13:16:58] (03PS1) 10Faidon Liambotis: Add role::mirrors to sodium [puppet] - 10https://gerrit.wikimedia.org/r/308981 [13:17:47] (03PS1) 10Elukey: Allow the ops and nda LDAP groups to access yarn.w.o [puppet] - 10https://gerrit.wikimedia.org/r/308982 (https://phabricator.wikimedia.org/T116192) [13:18:40] hm, elukey, everyone who would have access to hadoop would have signed an NDA, right? [13:18:44] does that mean they are in the NDA group? [13:18:56] if so, maybe we can remove the 'wmf' group? [13:19:18] ha, maybe this is what you were asking me...:) [13:19:24] yes :D [13:19:51] I found https://wikitech.wikimedia.org/wiki/LDAP_Groups [13:20:06] https://phabricator.wikimedia.org/T129786 [13:21:53] 06Operations, 10MediaWiki-Sites, 05MW-1.26-release, 13Patch-For-Review, 07SEO: URLs for the same title without extra query parameters should have the same canonical link - https://phabricator.wikimedia.org/T67402#2614440 (10Paladox) We will need to revert https://phabricator.wikimedia.org/rMW155d555b83ec... [13:23:00] aye elukey ha, i dunno then, whatever. Krenair what do you think, wmf+nda for this? [13:23:18] wmf+nda+ops is the normal list of ldap groups [13:23:50] it is possible for folks to have access to hadoop without being in the analytics-privatedata-users group, but i think we'd make them sign an nda anyway they were getting just analytcs-users (which no one seems to really want) [13:23:54] nda should be enough, no? 
[13:24:07] or wmf+nda I suppose [13:24:17] it's possible that it isn't, but that's a group membership bug :/ [13:24:26] unfortunately not right now [13:24:30] no need for ops? all ops are in at least the other two, eh? [13:24:39] once we get wmf merged into nda you could just do nda [13:25:13] ok, well elukey in that case, nda sounds good enough [13:25:23] i don't think anyone outside of nda needs access now at least [13:25:28] and it will be sufficient in the future [13:26:14] ok so only nda without wmf and ops right? [13:26:19] no [13:26:20] nda+wmf [13:26:35] (03CR) 10Alexandros Kosiaris: [C: 032] installserver: minor simplification of a ferm rule [puppet] - 10https://gerrit.wikimedia.org/r/308978 (owner: 10Faidon Liambotis) [13:27:05] yes sir [13:27:49] elukey, I'm not trying to push you around here [13:27:54] (03PS2) 10Elukey: Allow the nda LDAP group to access yarn.w.o [puppet] - 10https://gerrit.wikimedia.org/r/308982 (https://phabricator.wikimedia.org/T116192) [13:28:14] it just doesn't make sense to have only nda with access right now [13:28:23] (03PS5) 10Filippo Giunchedi: introduce thumbor-admins group [puppet] - 10https://gerrit.wikimedia.org/r/302471 (https://phabricator.wikimedia.org/T139606) [13:28:34] Krenair: nono thanks for the help, it makes sense to do the right thing [13:28:58] 06Operations, 10hardware-requests, 10Continuous-Integration-Infrastructure (phase-out-gallium): Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2614474 (10faidon) Yes, this is inline with what I've previously said and it sounds fine with me. This is really no... 
[13:29:07] (03CR) 10Alexandros Kosiaris: [C: 032] role::installserver::mirrors fixups [puppet] - 10https://gerrit.wikimedia.org/r/308979 (owner: 10Faidon Liambotis) [13:29:17] (03CR) 10Elukey: [C: 032] Allow the nda LDAP group to access yarn.w.o [puppet] - 10https://gerrit.wikimedia.org/r/308982 (https://phabricator.wikimedia.org/T116192) (owner: 10Elukey) [13:29:25] (03PS3) 10Elukey: Allow the nda LDAP group to access yarn.w.o [puppet] - 10https://gerrit.wikimedia.org/r/308982 (https://phabricator.wikimedia.org/T116192) [13:30:12] paravoid: anything against https://gerrit.wikimedia.org/r/#/c/308940/ [13:30:15] ? [13:31:41] elukey: no, sounds fine to me [13:31:46] !log restbase start deploy of 3852f72 [13:31:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:33:33] (03CR) 10Alexandros Kosiaris: [C: 032] Really split the mirrors role class [puppet] - 10https://gerrit.wikimedia.org/r/308980 (owner: 10Faidon Liambotis) [13:33:39] (03PS2) 10Alexandros Kosiaris: Really split the mirrors role class [puppet] - 10https://gerrit.wikimedia.org/r/308980 (owner: 10Faidon Liambotis) [13:33:41] (03CR) 10Alexandros Kosiaris: [V: 032] Really split the mirrors role class [puppet] - 10https://gerrit.wikimedia.org/r/308980 (owner: 10Faidon Liambotis) [13:34:39] (03CR) 10Alexandros Kosiaris: [C: 031] Add role::mirrors to sodium [puppet] - 10https://gerrit.wikimedia.org/r/308981 (owner: 10Faidon Liambotis) [13:35:41] (03PS6) 10Filippo Giunchedi: introduce thumbor-admins group [puppet] - 10https://gerrit.wikimedia.org/r/302471 (https://phabricator.wikimedia.org/T139606) [13:38:27] 06Operations, 10hardware-requests, 10Continuous-Integration-Infrastructure (phase-out-gallium): Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2614527 (10hashar) [13:40:49] (03PS2) 10Elukey: Add the logrotate delaycompress setting everywhere [puppet] - 10https://gerrit.wikimedia.org/r/308940 
(https://phabricator.wikimedia.org/T132324) [13:41:43] (03PS1) 10Filippo Giunchedi: hieradata: move admin::groups to thumbor::mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/308987 [13:44:09] (03CR) 10Filippo Giunchedi: [C: 032] hieradata: move admin::groups to thumbor::mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/308987 (owner: 10Filippo Giunchedi) [13:44:52] 06Operations, 10ops-eqiad: rack/setup/deploy puppetmaster100[12] - https://phabricator.wikimedia.org/T143219#2614533 (10Joe) @Cmjohnson when do you think we can start using these? I really want to install and setup these as well so that we can declare our goal a win and concentrate on puppetDB. [13:46:14] gilles: thumbor is getting shadow thumbnail requests now \o/ [13:46:55] (03CR) 10Elukey: [C: 032] Add the logrotate delaycompress setting everywhere [puppet] - 10https://gerrit.wikimedia.org/r/308940 (https://phabricator.wikimedia.org/T132324) (owner: 10Elukey) [13:47:01] (03PS3) 10Elukey: Add the logrotate delaycompress setting everywhere [puppet] - 10https://gerrit.wikimedia.org/r/308940 (https://phabricator.wikimedia.org/T132324) [13:48:47] (03PS1) 10Hoo man: Specify contact groups in wdqs::monitor::services [puppet] - 10https://gerrit.wikimedia.org/r/308989 [13:50:00] (03CR) 10jenkins-bot: [V: 04-1] Specify contact groups in wdqs::monitor::services [puppet] - 10https://gerrit.wikimedia.org/r/308989 (owner: 10Hoo man) [13:55:31] (03PS1) 10Phuedx: Disable Wikidata descriptions for 6 Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308990 (https://phabricator.wikimedia.org/T143345) [13:55:43] !log restbase start end of 3852f72 [13:55:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:56:58] !log upgrading labvirt1014 to Linux 4.4 [13:57:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:58:16] !log reimaging mw2204->mw2207 to jessie [13:58:21] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:02:35] (03PS2) 10Rush: labstore: nfs-exports/nfs-exports-daemons settle on nfs-exportd [puppet] - 10https://gerrit.wikimedia.org/r/308886 [14:04:04] (03CR) 10Phuedx: [C: 04-2] "Waiting for a shepherd…" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308990 (https://phabricator.wikimedia.org/T143345) (owner: 10Phuedx) [14:05:06] (03CR) 10Rush: [C: 032] labstore: nfs-exports/nfs-exports-daemons settle on nfs-exportd [puppet] - 10https://gerrit.wikimedia.org/r/308886 (owner: 10Rush) [14:10:21] (03PS2) 10Phuedx: Disable Wikidata descriptions for 6 Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308990 (https://phabricator.wikimedia.org/T143345) [14:10:43] (03PS2) 10Hoo man: Specify contact groups in wdqs::monitor::services [puppet] - 10https://gerrit.wikimedia.org/r/308989 [14:11:20] (03PS1) 10BBlack: cache_upload: one-hit-wonder experiment, hit/2+ [puppet] - 10https://gerrit.wikimedia.org/r/308995 (https://phabricator.wikimedia.org/T144187) [14:12:42] (03CR) 10jenkins-bot: [V: 04-1] cache_upload: one-hit-wonder experiment, hit/2+ [puppet] - 10https://gerrit.wikimedia.org/r/308995 (https://phabricator.wikimedia.org/T144187) (owner: 10BBlack) [14:13:16] !log reimaging mw2157-2160 to jessie [14:13:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:15:17] (03PS1) 10Rush: labstore: nfs-export-daemon renamed to nfs-exportd [puppet] - 10https://gerrit.wikimedia.org/r/308996 [14:16:40] !log shutting down mw2120-mw2139 for hardware maintenance (T142726) [14:16:41] T142726: Multiple servers in codfw fail to respond to IPMI commands during reimaging - https://phabricator.wikimedia.org/T142726 [14:16:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:21:42] (03PS2) 10BBlack: cache_upload: one-hit-wonder experiment, hit/2+ [puppet] - 10https://gerrit.wikimedia.org/r/308995 (https://phabricator.wikimedia.org/T144187) 
[14:22:55] (03PS3) 10BBlack: cache_upload: one-hit-wonder experiment, hit/2+ [puppet] - 10https://gerrit.wikimedia.org/r/308995 (https://phabricator.wikimedia.org/T144187) [14:23:07] (03PS1) 10Ottomata: Point archiva at meitnerium [dns] - 10https://gerrit.wikimedia.org/r/308997 (https://phabricator.wikimedia.org/T123725) [14:23:11] 06Operations, 06Performance-Team, 10Thumbor, 13Patch-For-Review: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2614651 (10fgiunchedi) WIP dashboard, https://grafana.wikimedia.org/dashboard/db/thumbor [14:24:09] (03CR) 10Muehlenhoff: "Before that happens we'll need another rsync from titanium->meitnerium, won't we?" [dns] - 10https://gerrit.wikimedia.org/r/308997 (https://phabricator.wikimedia.org/T123725) (owner: 10Ottomata) [14:25:13] moritzm: that can't hurt, there may not have been changes since then but we might as well do it too [14:25:20] can you easily do that? [14:26:05] (03CR) 10Andrew Bogott: [C: 032] Kill references to $::instancename [puppet] - 10https://gerrit.wikimedia.org/r/308903 (https://phabricator.wikimedia.org/T101447) (owner: 10Alex Monk) [14:26:18] (03PS2) 10Andrew Bogott: Kill references to $::instancename [puppet] - 10https://gerrit.wikimedia.org/r/308903 (https://phabricator.wikimedia.org/T101447) (owner: 10Alex Monk) [14:26:43] let's quickly check with Daniel when he's online, he made the initial sync [14:29:02] !log Deployed d4ad9ddbdd21a6460bb3f3a6fa6b74998e33d020 of wikidata/query/deploy: UI improvements [14:29:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:29:20] (03CR) 10Rush: [C: 032] labstore: nfs-export-daemon renamed to nfs-exportd [puppet] - 10https://gerrit.wikimedia.org/r/308996 (owner: 10Rush) [14:29:24] (03PS2) 10Rush: labstore: nfs-export-daemon renamed to nfs-exportd [puppet] - 10https://gerrit.wikimedia.org/r/308996 [14:29:28] (03CR) 10Rush: [V: 032] labstore: nfs-export-daemon renamed to nfs-exportd [puppet] - 
10https://gerrit.wikimedia.org/r/308996 (owner: 10Rush) [14:33:11] ok [14:35:52] (03PS1) 10Cmjohnson: Adding mgmt dns entries for puppetmaster100[12] [dns] - 10https://gerrit.wikimedia.org/r/308999 [14:36:08] (03CR) 10jenkins-bot: [V: 04-1] Adding mgmt dns entries for puppetmaster100[12] [dns] - 10https://gerrit.wikimedia.org/r/308999 (owner: 10Cmjohnson) [14:38:33] 06Operations, 06Performance-Team, 10Thumbor: thumbor/exiftool deadlock, likely full pipe - https://phabricator.wikimedia.org/T144928#2614735 (10fgiunchedi) [14:38:48] 06Operations, 10Cassandra, 06Services: restbase2004.codfw.wmnet data corruption - https://phabricator.wikimedia.org/T144826#2614749 (10Eevans) [14:38:55] (03PS2) 10Cmjohnson: Adding mgmt dns entries for puppetmaster100[12] [dns] - 10https://gerrit.wikimedia.org/r/308999 [14:39:08] (03CR) 10jenkins-bot: [V: 04-1] Adding mgmt dns entries for puppetmaster100[12] [dns] - 10https://gerrit.wikimedia.org/r/308999 (owner: 10Cmjohnson) [14:40:00] (03PS4) 10BBlack: Remove geoiplookup DNS entries [dns] - 10https://gerrit.wikimedia.org/r/305422 (https://phabricator.wikimedia.org/T100902) [14:40:33] 06Operations, 10Cassandra, 06Services: restbase2004.codfw.wmnet data corruption - https://phabricator.wikimedia.org/T144826#2611596 (10Eevans) [14:40:48] (03PS1) 10Rush: labstore: nfs-exportd monitoring into a manifest [puppet] - 10https://gerrit.wikimedia.org/r/309000 [14:40:50] 06Operations: setup YubiHSM and laptop at office - https://phabricator.wikimedia.org/T123818#2614781 (10MoritzMuehlenhoff) a:05Muehlenhoff>03Dzahn As discussed, I'll prepare the script and we'll do the actual setup/some tests at the offsite. 
Assigning to Daniel as a reminder to bring that laptop with him :-) [14:42:04] (03PS1) 10Giuseppe Lavagetto: puppetmaster::puppetdb::client: add configs, directories [puppet] - 10https://gerrit.wikimedia.org/r/309002 [14:43:22] (03PS2) 10Giuseppe Lavagetto: puppetmaster::puppetdb::client: add configs, directories [puppet] - 10https://gerrit.wikimedia.org/r/309002 [14:43:34] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] puppetmaster::puppetdb::client: add configs, directories [puppet] - 10https://gerrit.wikimedia.org/r/309002 (owner: 10Giuseppe Lavagetto) [14:45:03] !log T144826: Removing compaction rate limit, increasing compactor threads from 10 to 20, and beginning scrub of local_group_globaldomain_T_mathoid_png.data [14:45:05] T144826: restbase2004.codfw.wmnet data corruption - https://phabricator.wikimedia.org/T144826 [14:45:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:45:09] (03PS3) 10Cmjohnson: Adding mgmt dns entries for puppetmaster100[12] [dns] - 10https://gerrit.wikimedia.org/r/308999 [14:45:18] !log T144826: Removing compaction rate limit, increasing compactor threads from 10 to 20, and beginning scrub of local_group_globaldomain_T_mathoid_png.data (restbase2004-c.codfw.wmnet) [14:45:19] T144826: restbase2004.codfw.wmnet data corruption - https://phabricator.wikimedia.org/T144826 [14:45:29] (03CR) 10Cmjohnson: [C: 032] Adding mgmt dns entries for puppetmaster100[12] [dns] - 10https://gerrit.wikimedia.org/r/308999 (owner: 10Cmjohnson) [14:45:59] 06Operations: ganglia-monitor and puppet failing on bast3001 - https://phabricator.wikimedia.org/T144778#2614828 (10Dzahn) a) killall -u ganglia b) systemctl start ganglia-monitor (do _not_ use /etc/init.d/ ) c) run puppet back to normal and works [14:46:40] 06Operations: ganglia-monitor and puppet failing on bast3001 - https://phabricator.wikimedia.org/T144778#2614830 (10Dzahn) 05Open>03Resolved a:03Dzahn ``` root@bast3001:~# systemctl list-units | grep ganglia 
ganglia-monitor-aggregator@3002.service loaded act... [14:49:04] !log T144826: Restarting Cassandra on restbase2004-c.codfw.wmnet (scrub complete, re-joining cluster) [14:49:05] T144826: restbase2004.codfw.wmnet data corruption - https://phabricator.wikimedia.org/T144826 [14:49:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:49:33] Morning bblack... thx for the pings... Hopefully we can get that through today :) [14:50:17] 06Operations: ganglia-monitor and puppet failing on bast3001 - https://phabricator.wikimedia.org/T144778#2614834 (10Dzahn) P.S. It should not be a mix of sysvinit/systemd. All services controlled by systemd, because there is this: 24 service { 'ganglia-monitor': 25 ensure => running, 26... [14:50:44] (03CR) 10Faidon Liambotis: [C: 04-2] "(do not merge yet)" [puppet] - 10https://gerrit.wikimedia.org/r/308981 (owner: 10Faidon Liambotis) [14:50:52] 06Operations: puppet run stopping qrunner on fermium - https://phabricator.wikimedia.org/T144933#2614835 (10MoritzMuehlenhoff) [14:52:04] 06Operations, 06Performance-Team, 10Thumbor: thumbor/exiftool deadlock, likely full pipe - https://phabricator.wikimedia.org/T144928#2614849 (10fgiunchedi) possibly related to this, while debugging I've observed thumbor doing one-byte reads (from exiftool pipe I'd imagine) ``` [pid 87929] read(14, "\341", 1... 
[14:55:20] (03PS1) 10Giuseppe Lavagetto: ssh::client: add ssh key to template [puppet] - 10https://gerrit.wikimedia.org/r/309006 [14:55:29] !log mobileapps deploying 8b929cfe [14:55:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:55:46] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] ssh::client: add ssh key to template [puppet] - 10https://gerrit.wikimedia.org/r/309006 (owner: 10Giuseppe Lavagetto) [14:56:38] 06Operations: ganglia-monitor and puppet failing on bast3001 - https://phabricator.wikimedia.org/T144778#2614852 (10MoritzMuehlenhoff) There's no systemd unit, the one running on bast3001 is auto-translated from the sysvinit script: root@bast3001:/etc/systemd/system# systemctl cat ganglia-monitor.service # /run... [15:02:09] !log shutting down mw2080-mw2085 for hardware maintenance (T142726) [15:02:11] T142726: Multiple servers in codfw fail to respond to IPMI commands during reimaging - https://phabricator.wikimedia.org/T142726 [15:02:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:03:37] (03PS1) 10Alex Monk: shinkengen: Remove use of puppetVars [puppet] - 10https://gerrit.wikimedia.org/r/309008 [15:03:39] (03PS1) 10Alex Monk: labs LDAP: remove puppetVar attribute [puppet] - 10https://gerrit.wikimedia.org/r/309009 [15:04:08] 06Operations: Multiple servers in codfw fail to respond to IPMI commands during reimaging - https://phabricator.wikimedia.org/T142726#2614875 (10MoritzMuehlenhoff) Thanks, third batch: mw2120-mw2139 mw2080-mw2085 [15:07:17] (03CR) 10Andrew Bogott: [C: 031] "Looks right to me but we should probably get a review from someone else more familiar with this" [puppet] - 10https://gerrit.wikimedia.org/r/309008 (owner: 10Alex Monk) [15:08:36] !log re-imaging labnet1002 for T136718 [15:08:37] T136718: labnet100[12].eqiad.wmnet need to be reimaged with RAID - https://phabricator.wikimedia.org/T136718 [15:08:40] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:10:06] (03PS1) 10Alex Monk: shinkengen: Remove old fix for ec2id -> fqdn ldap host entry migration [puppet] - 10https://gerrit.wikimedia.org/r/309011 [15:10:08] (03PS1) 10Alex Monk: shinkengen: Remove unused instance attributes [puppet] - 10https://gerrit.wikimedia.org/r/309012 [15:10:19] 06Operations, 10ops-eqiad: rack/setup/deploy puppetmaster100[12] - https://phabricator.wikimedia.org/T143219#2614921 (10Cmjohnson) [15:13:03] 06Operations, 10ops-eqiad: rack/setup/deploy puppetmaster100[12] - https://phabricator.wikimedia.org/T143219#2614929 (10Cmjohnson) [15:13:16] (03PS1) 10Gehel: graphite - add tests to configparser_format [puppet] - 10https://gerrit.wikimedia.org/r/309013 [15:13:26] 06Operations, 10Traffic, 13Patch-For-Review: Decom bits.wikimedia.org hostname - https://phabricator.wikimedia.org/T107430#2614930 (10BBlack) Looking at 24H of data from oxygen webrequest archive's `sampled-1000.json-20160907`, if I filter just for bits requests, `cut -d/ -f1-3` to coalesce long-path noise a... [15:14:07] 06Operations, 10ops-eqiad: rack/setup/deploy puppetmaster100[12] - https://phabricator.wikimedia.org/T143219#2561022 (10Cmjohnson) @Joe servers are accessible, feel free to take over and install once you figure out the partitioning scheme. [15:23:04] 06Operations, 06Performance-Team, 10Thumbor: thumbor error spawning ghostscript 'libcgroup initialization failed: Cgroup is not mounted' - https://phabricator.wikimedia.org/T144938#2614960 (10fgiunchedi) [15:24:12] 06Operations, 10hardware-requests, 10netops, 10Continuous-Integration-Infrastructure (phase-out-gallium): Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2614974 (10hashar) Looping #netops . We would need contint1001 to be moved to the public network with... 
[15:26:53] (03PS1) 10Alex Monk: throttle: raise limit for Amrita University Hackathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309015 [15:28:50] (03PS1) 10Elukey: Add a Substitute Apache directive to fix broken links in the Yarn UI [puppet] - 10https://gerrit.wikimedia.org/r/309016 (https://phabricator.wikimedia.org/T116192) [15:30:54] (03PS2) 10Rush: labstore: nfs-exportd monitoring into a manifest [puppet] - 10https://gerrit.wikimedia.org/r/309000 [15:31:25] (03CR) 1001tonythomas: [C: 031] "Thank you @Krenair" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309015 (owner: 10Alex Monk) [15:32:19] (03CR) 10Alex Monk: [C: 032] throttle: raise limit for Amrita University Hackathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309015 (owner: 10Alex Monk) [15:32:45] (03Merged) 10jenkins-bot: throttle: raise limit for Amrita University Hackathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309015 (owner: 10Alex Monk) [15:34:32] okay [15:34:37] mw2080.codfw.wmnet is a scap proxy [15:34:39] but seems to be down? [15:34:45] (03CR) 10Gehel: "As far as I can see, wdqs-admins should already be notified via mostly magical hiera indirection done at https://github.com/wikimedia/oper" [puppet] - 10https://gerrit.wikimedia.org/r/308989 (owner: 10Hoo man) [15:35:06] moritzm [15:35:39] (03PS1) 10BBlack: ciphersuites: chapoly ordering fixups [puppet] - 10https://gerrit.wikimedia.org/r/309017 [15:36:11] !log krenair@tin Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/309015/ (duration: 02m 50s) [15:36:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:37:37] (03CR) 10BBlack: [C: 032 V: 032] ciphersuites: chapoly ordering fixups [puppet] - 10https://gerrit.wikimedia.org/r/309017 (owner: 10BBlack) [15:40:52] 06Operations, 10ops-eqiad, 10media-storage: diagnose failed(?) 
sda on ms-be1022 - https://phabricator.wikimedia.org/T140597#2615147 (10Cmjohnson) @godog: I want to swap the ssd slots again. I am doing that now...can you reinstall and let me know what the msg logs state. Thanks [15:41:43] <_joe_> icinga-wm: ? [15:43:05] !log Rebooting host labstore1004 [15:43:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:44:09] thcipriani, when a scap proxy dies... what happens? [15:44:25] hosts choose other proxies? [15:44:38] possibly. It depends on how dead it is [15:44:43] ^ [15:45:16] I think this was physically powered off [15:45:36] each MW host gets a list of all proxies and then does a tcp connect with ever increasing TTL until it finds one that responds [15:46:16] a completely down host should never ack the tcp connect ping and therefore not get used [15:46:21] I meant to dig through this when this happened the other day, but the remaining scap servers use this https://github.com/wikimedia/scap/blob/master/scap/utils.py#L114 to find proxies [15:47:04] so if a proxy can't be connected to on port 22, it shouldn't get used as a proxy node [15:47:26] icinga-wm: ping [15:47:35] we should really switch the port to be the rsync daemon port [15:47:47] that's what I meant about "depends on how down it is" [15:48:26] eh, sort of debatable, if you can't ssh to the host, you can't get the new code from the deployment server in the first place...so it kinda needs both to work correctly. [15:48:47] hrm true [15:48:48] it'd be bad if rsync was up, but ssh was down, but vice-versa is also bad [15:49:09] this all needs a "version" to check too really [15:49:45] the master should send that out to the MW servers and they should ensure that the replica they clone from has that same version [15:51:38] is mw2080.codfw.wmnet the host that mutante was manually taking out of the dsh group on tin last week? 
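[editor's note] The proxy selection described in the 15:45-15:47 messages (try a TCP connect to each proxy with an ever-increasing TTL and use the first one that answers) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in scap/utils.py: the names tcp_probe and find_nearest_proxy, the max_ttl cap of 30, and the one-second timeout are all hypothetical.

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_probe(host: str, port: int, ttl: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connect to host:port succeeds with the given IP TTL.

    Hypothetical helper: a powered-off host never completes the handshake,
    so it always yields False.
    """
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except OSError:
        return False  # name resolution failed
    for family, socktype, proto, _canonname, sockaddr in infos:
        sock = socket.socket(family, socktype, proto)
        try:
            # Limit how many hops the packets may travel.
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return True
        except OSError:
            continue  # timeout, refused, or TTL expired in transit
        finally:
            sock.close()
    return False


def find_nearest_proxy(
    proxies: Iterable[str],
    port: int = 22,  # ssh, the port mentioned in the discussion
    max_ttl: int = 30,  # assumed upper bound, not taken from scap
    probe: Callable[[str, int, int], bool] = tcp_probe,
) -> Optional[str]:
    """Return the first proxy that answers at the lowest TTL, or None."""
    hosts = list(proxies)
    for ttl in range(1, max_ttl + 1):
        for host in hosts:
            if probe(host, port, ttl):
                return host
    return None
```

Because only a completed connect counts, a proxy that is physically powered off is skipped. A proxy with ssh up but rsync down would still be picked, which is the gap raised at 15:47:35. The probe argument is injectable so the selection logic can be exercised without touching the network.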
[15:52:06] down for hardware -- https://tools.wmflabs.org/sal/log/AVcFK6iSpirJUPGy-4xa [15:52:10] mw2187 I think [15:52:22] (03CR) 10Rush: [C: 032] labstore: nfs-exportd monitoring into a manifest [puppet] - 10https://gerrit.wikimedia.org/r/309000 (owner: 10Rush) [15:52:26] (03PS3) 10Rush: labstore: nfs-exportd monitoring into a manifest [puppet] - 10https://gerrit.wikimedia.org/r/309000 [15:52:28] (03CR) 10Rush: [V: 032] labstore: nfs-exportd monitoring into a manifest [puppet] - 10https://gerrit.wikimedia.org/r/309000 (owner: 10Rush) [15:53:04] !log graphite1002 swapping failed disk slot10 [15:53:05] looks like it's only been down for 50 minutes or so [15:53:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:53:25] <_joe_> volans: done [15:53:28] (03PS1) 10Filippo Giunchedi: thumbor: disable exiftool's stay open feature [puppet] - 10https://gerrit.wikimedia.org/r/309021 (https://phabricator.wikimedia.org/T144928) [15:53:43] _joe_: thanks [15:54:26] 06Operations, 06Discovery, 10Wikidata, 10Wikidata-Query-Service: some icinga checks on WDQS do not send notifications - https://phabricator.wikimedia.org/T144948#2615203 (10Gehel) [15:55:21] _joe_: since you're around: if mw2080 is down for maintenance, doesn't the dsh generation thing you made for puppet remove it from the dsh files? Or is there a missing step that needs to be performed? [15:55:48] <_joe_> it should, yes, as long as a puppet run has happened since it was disabled [15:56:05] <_joe_> thcipriani: why? [15:57:00] _joe_: eh, Krenair was syncing something that got hung trying to connect to mw2080 and bd808 found this https://tools.wmflabs.org/sal/log/AVcFK6iSpirJUPGy-4xa [15:57:16] 06Operations, 10ops-eqiad: graphite1002.eqiad.wmnet: slot=10 disk failed - https://phabricator.wikimedia.org/T141795#2615226 (10Cmjohnson) Disk replaced. [15:57:36] <_joe_> thcipriani: maybe moritzm didn't disable the hosts in conftool? 
[15:58:13] <_joe_> or maybe Krenair synced less than half an hour after moritzm did that [15:58:36] 06Operations, 10ops-eqiad, 10fundraising-tech-ops: rack/setup berryllium replacment - https://phabricator.wikimedia.org/T143902#2615247 (10Cmjohnson) Named frauth1001 racked in C1. DNS completed [15:58:53] it was about 31 minutes after [15:59:03] 06Operations, 10ops-eqiad, 10fundraising-tech-ops: rack/setup berryllium replacment - https://phabricator.wikimedia.org/T143902#2615248 (10Cmjohnson) [15:59:21] (03PS2) 10Filippo Giunchedi: thumbor: disable exiftool's stay open feature [puppet] - 10https://gerrit.wikimedia.org/r/309021 (https://phabricator.wikimedia.org/T144928) [15:59:27] fwiw host is missing from https://config-master.wikimedia.org/conftool/codfw/apaches, but I suppose the lack of puppet run could explain it [15:59:33] !log deploying wdqs, fix for T144913 [15:59:35] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] thumbor: disable exiftool's stay open feature [puppet] - 10https://gerrit.wikimedia.org/r/309021 (https://phabricator.wikimedia.org/T144928) (owner: 10Filippo Giunchedi) [15:59:35] T144913: wdqs-updater tries to apply updates from non-Entity pages, fails - https://phabricator.wikimedia.org/T144913 [15:59:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:59:48] <_joe_> thcipriani: yes it's missing from dsh too now [16:01:41] _joe_: ah, so maybe that's the weirdness: It's still in /etc/dsh/group/scap-proxies [16:02:09] (03PS1) 10Chad: Group1 to wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309022 [16:03:09] !log db1020 swapping failed disk slot 5 [16:03:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:05:18] 06Operations, 10ops-eqiad: db1020 degraded array - https://phabricator.wikimedia.org/T144793#2615275 (10Cmjohnson) Disk swapped. 
[16:05:53] _joe_: they're depooled with inactive, time window was too close [16:09:06] (03CR) 10Chad: [C: 031] "I say just go ahead and merge and mention it on wikitech-l or something. It can't be that big a deal." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208655 (https://phabricator.wikimedia.org/T94416) (owner: 10Aude) [16:09:14] !log restarted ircecho on neon after rotating the irc.log file [16:09:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:10:35] <_joe_> moritzm: no it's scap proxies [16:10:46] <_joe_> they're not under etcd control [16:10:54] <_joe_> as per my email at the time [16:12:49] 06Operations, 10ops-eqiad, 10fundraising-tech-ops: Rack/Setup pay-lvs1001[2] - https://phabricator.wikimedia.org/T143900#2615295 (10Cmjohnson) [16:13:06] (03CR) 10Jdlrobson: [C: 031] "Sam can you remove your -2?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308990 (https://phabricator.wikimedia.org/T143345) (owner: 10Phuedx) [16:14:46] (03PS1) 10Rush: labstore: fix manifest declaraton path for labstore::monitoring::exports [puppet] - 10https://gerrit.wikimedia.org/r/309025 [16:15:24] (03PS2) 10Rush: labstore: fix manifest declaraton path for labstore::monitoring::exports [puppet] - 10https://gerrit.wikimedia.org/r/309025 [16:16:09] (03CR) 10Rush: [C: 032 V: 032] labstore: fix manifest declaraton path for labstore::monitoring::exports [puppet] - 10https://gerrit.wikimedia.org/r/309025 (owner: 10Rush) [16:18:39] (03PS1) 10Gehel: wdqs - initial configuration of wdqs servers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/309026 [16:20:31] (03PS1) 10Filippo Giunchedi: swift: disable thumbor shadow traffic [puppet] - 10https://gerrit.wikimedia.org/r/309029 (https://phabricator.wikimedia.org/T121388) [16:21:47] RECOVERY - Apache HTTP on mw2204 is OK: HTTP OK: HTTP/1.1 200 OK - 10975 bytes in 0.076 second response time [16:22:46] just reimaged --^ [16:24:04] 06Operations, 07HHVM: Migrate deployment servers 
(tin/mira) to jessie - https://phabricator.wikimedia.org/T144578#2604309 (10demon) Talked about it in our team meeting, and I think we're inclined towards bare metal here. Couple of reasons: # Disk performance. Deployments require a lot of disk io: git and rsyn... [16:24:49] (03CR) 10Filippo Giunchedi: [C: 032] swift: disable thumbor shadow traffic [puppet] - 10https://gerrit.wikimedia.org/r/309029 (https://phabricator.wikimedia.org/T121388) (owner: 10Filippo Giunchedi) [16:27:10] (03CR) 10Jdlrobson: Disable Wikidata descriptions for 6 Wikipedias (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308990 (https://phabricator.wikimedia.org/T143345) (owner: 10Phuedx) [16:27:46] (03PS1) 10Jdlrobson: Revert "Enable Wikidata descriptions on all wikipedias" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309032 [16:28:15] jouncebot: next [16:28:15] In 1 hour(s) and 31 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160907T1800) [16:28:51] 06Operations, 10ops-codfw: mw2202/mw2203 failed to install - https://phabricator.wikimedia.org/T144911#2614219 (10elukey) Had the same problem with 220[567] and fixed it adding manually rootdelay=60 at boot. 
[16:29:41] (03PS1) 10Jdlrobson: Remove wikidata descriptions from additional projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309033 [16:30:18] (03CR) 10jenkins-bot: [V: 04-1] Remove wikidata descriptions from additional projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309033 (owner: 10Jdlrobson) [16:31:36] RECOVERY - dhclient process on mw2204 is OK: PROCS OK: 0 processes with command name dhclient [16:31:49] RECOVERY - nutcracker port on mw2204 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212 [16:31:53] cmjohnson1 chasemp I'm ready to go [16:32:01] Scheduled downtime on labstore1003 [16:32:06] RECOVERY - nutcracker process on mw2204 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker [16:32:07] okay....let's power her off [16:32:19] cmjohnson1: okay doing now [16:32:58] cmjohnson1: okay i've shut it down [16:33:13] k [16:33:26] RECOVERY - Check size of conntrack table on mw2204 is OK: OK: nf_conntrack is 0 % full [16:34:06] RECOVERY - Disk space on mw2204 is OK: DISK OK [16:34:16] RECOVERY - MD RAID on mw2204 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [16:34:40] silenced in icinga [16:35:07] RECOVERY - configured eth on mw2204 is OK: OK - interfaces up [16:35:45] (03CR) 10Volans: "Nice!" 
[puppet] - 10https://gerrit.wikimedia.org/r/309013 (owner: 10Gehel) [16:36:49] RECOVERY - MegaRAID on db1020 is OK: OK: optimal, 1 logical, 2 physical [16:38:16] RECOVERY - MegaRAID on graphite1002 is OK: OK: optimal, 2 logical, 4 physical [16:39:29] madhuvishy/chasemp: mgmt is up [16:40:07] 06Operations, 06Performance-Team, 10Thumbor: thumbor handling of originals 404 - https://phabricator.wikimedia.org/T144956#2615390 (10fgiunchedi) [16:40:07] RECOVERY - NTP on mw2204 is OK: NTP OK: Offset 0.0002719163895 secs [16:40:11] 06Operations, 10ops-eqiad: db1020 degraded array - https://phabricator.wikimedia.org/T144793#2615404 (10Cmjohnson) 05Open>03Resolved MegaRAID on db1020 is OK: OK: optimal, 1 logical, 2 physical [16:40:35] 06Operations, 10ops-eqiad: graphite1002.eqiad.wmnet: slot=10 disk failed - https://phabricator.wikimedia.org/T141795#2615406 (10Cmjohnson) 05Open>03Resolved MegaRAID on graphite1002 is OK: OK: optimal, 2 logical, 4 physical [16:40:48] cmjohnson1: cool I see that [16:41:27] (03PS1) 10Elukey: Add the apache::mod::substitute class [puppet] - 10https://gerrit.wikimedia.org/r/309035 [16:41:47] RECOVERY - DPKG on mw2204 is OK: All packages OK [16:41:53] cmjohnson1: how can i bring labstore1003 back up? [16:42:07] it's not coming up? [16:42:11] !log demon@tin Synchronized php-1.28.0-wmf.18/includes/libs/objectcache/WANObjectCache.php: for aaron <3 (duration: 02m 50s) [16:42:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:42:19] cmjohnson1: i can't ssh in so far [16:42:31] (03CR) 10Gehel: "@volans: not that I really wanted to cover all test cases, but since you mention it..." [puppet] - 10https://gerrit.wikimedia.org/r/309013 (owner: 10Gehel) [16:42:56] AaronSchulz: Done ^ [16:42:58] (03PS2) 10Gehel: wdqs - initial configuration of wdqs servers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/309026 [16:43:06] Hmm, couldn't sync to scap-proxy mw2080.codfw.wmnet. Known? 
[16:43:21] heh [16:43:23] ostriches: yes. host if fully down [16:43:36] *is [16:43:38] We should swap the proxy then :) [16:44:05] madhusvishy: it may help to turn the server on...hahaha...just did it [16:44:21] the toggle the on/off button comes in handy from time to tie [16:44:21] cmjohnson1: :) [16:45:06] PROBLEM - Host labstore1003 is DOWN: PING CRITICAL - Packet loss = 100% [16:45:21] Hmm, another apache in B3. [16:45:38] labstore1003 is me, I scheduled 15 minute downtime, but it should be back in a sec [16:46:27] RECOVERY - Host labstore1003 is UP: PING WARNING - Packet loss = 73%, RTA = 2.41 ms [16:46:52] cmjohnson1: looks good [16:46:59] awesome! [16:47:11] chasemp: all good, checking tools now, new kernel version came up [16:47:13] 06Operations, 10DBA, 10MediaWiki-Maintenance-scripts, 06Release-Engineering-Team, and 2 others: Add section for long-running tasks on the Deployment page (specially for database maintenance) - https://phabricator.wikimedia.org/T144661#2615428 (10greg) [16:47:22] (03PS3) 10Gehel: wdqs - initial configuration of wdqs servers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/309026 (https://phabricator.wikimedia.org/T144380) [16:47:26] cmjohnson1: thank you :) [16:47:32] YW [16:48:40] (03PS1) 10Legoktm: contint: Add some more PHP 7 packages [puppet] - 10https://gerrit.wikimedia.org/r/309039 [16:49:40] (03CR) 10Paladox: [C: 031] contint: Add some more PHP 7 packages [puppet] - 10https://gerrit.wikimedia.org/r/309039 (owner: 10Legoktm) [16:50:39] (03PS1) 10Muehlenhoff: zuul::merger: Convert to ferm service and restrict to labs + gallium [puppet] - 10https://gerrit.wikimedia.org/r/309041 [16:51:10] (03PS1) 10Chad: Remove mw2080 from scap-proxy list [puppet] - 10https://gerrit.wikimedia.org/r/309043 [16:51:14] (03CR) 10Hashar: [C: 031] "Need this to be merged in puppet.git then the image can be refreshed based on doc at https://wikitech.wikimedia.org/wiki/Nodepool#Manually" [puppet] - 10https://gerrit.wikimedia.org/r/309039 (owner: 
10Legoktm) [16:51:22] (03CR) 10Muehlenhoff: [C: 031] contint: Add some more PHP 7 packages [puppet] - 10https://gerrit.wikimedia.org/r/309039 (owner: 10Legoktm) [16:51:29] _joe_: Mind having a look at 309043? ^^ [16:51:46] moritzm: would you like to +2 that? :) [16:51:55] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.044 second response time [16:52:13] (03CR) 10Phuedx: Disable Wikidata descriptions for 6 Wikipedias (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308990 (https://phabricator.wikimedia.org/T143345) (owner: 10Phuedx) [16:52:23] (03PS3) 10Phuedx: Disable Wikidata descriptions for 6 Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308990 (https://phabricator.wikimedia.org/T143345) [16:52:34] legoktm: sure, let me do that [16:52:52] <_joe_> ostriches: maybe tomorrow? [16:53:27] 06Operations, 10hardware-requests: eqiad: (4) spare pool servers for kubernetes - https://phabricator.wikimedia.org/T141624#2615456 (10mark) p:05Low>03Normal Now we're making this the main goal for next quarter, raising priority again. However, TPMs (trusted platform modules) would be very useful for thes... [16:53:47] <_joe_> thcipriani, ostriches I decided to take a radical approach at scap::source - https://gerrit.wikimedia.org/r/#/c/308973/ [16:54:38] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.143 second response time [16:55:12] * thcipriani looks [16:55:40] <_joe_> thcipriani: the ruby style needs some grease, as you might notice [16:56:00] 06Operations, 10hardware-requests: eqiad: (4) worker servers for kubernetes - https://phabricator.wikimedia.org/T141624#2615461 (10mark) [16:57:35] ostriches: it's really quick change, papaul can you expedite mw2080 for the IPMI setting? 
[16:58:21] (03CR) 10Muehlenhoff: [C: 032] contint: Add some more PHP 7 packages [puppet] - 10https://gerrit.wikimedia.org/r/309039 (owner: 10Legoktm) [16:58:43] moritzm: Whatever is easiest :) [16:59:06] _joe_: this looks pretty neat, I like this approach (although this may say more about my feelings about the puppet dsl than anything :)) [16:59:16] (03PS4) 10Aaron Schulz: Avoid pointless ChronologyProtector duplicate key notices [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308662 [16:59:20] moritzm: thanks :) [16:59:21] (03PS5) 10Aaron Schulz: Avoid pointless ChronologyProtector duplicate key notices [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308662 [16:59:51] <_joe_> thcipriani: whenever you have to do too many execs, it's time to write a custom type [17:00:10] <_joe_> thcipriani: I think people are scared of custom types because the documentation is very very bad [17:01:11] indeed, docs for this are poor to nonexistent ime [17:04:38] RECOVERY - salt-minion processes on mw2204 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [17:07:25] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is inactive [17:10:06] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exportd is active [17:12:16] RECOVERY - puppet last run on mw2204 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:12:43] hoo: Did that jquery.uls module get pulled into wmf.18? Doesn't appear so [17:12:48] (03Abandoned) 10Muehlenhoff: Show latest 4.4.x release version in uname [debs/linux44] - 10https://gerrit.wikimedia.org/r/308151 (owner: 10Muehlenhoff) [17:14:19] ostriches: ho k [17:17:12] 06Operations: Multiple servers in codfw fail to respond to IPMI commands during reimaging - https://phabricator.wikimedia.org/T142726#2615567 (10MoritzMuehlenhoff) >>! 
In T142726#2614875, @MoritzMuehlenhoff wrote: > Thanks, third batch: > mw2120-mw2139 > mw2080-mw2085 I've re-enabled mw2080, please skip that... [17:19:12] ostriches: I'm currently powering mw2080 back up, we can deal with it on Friday when there are no deployments [17:20:57] (03CR) 10Aaron Schulz: [C: 032] Avoid pointless ChronologyProtector duplicate key notices [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308662 (owner: 10Aaron Schulz) [17:21:24] (03Merged) 10jenkins-bot: Avoid pointless ChronologyProtector duplicate key notices [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308662 (owner: 10Aaron Schulz) [17:21:50] moritzm: Ok [17:23:05] !log aaron@tin Synchronized wmf-config/redis.php: Avoid pointless ChronologyProtector duplicate key notices (duration: 00m 47s) [17:23:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:25:02] 06Operations, 10hardware-requests, 10netops, 10Continuous-Integration-Infrastructure (phase-out-gallium): Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2615667 (10RobH) So I can handle the vlan move and reimage. Just to confirm there is no data that is c... [17:26:06] (03PS2) 10Jdlrobson: Remove wikidata descriptions from additional projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309033 [17:26:41] (03CR) 10jenkins-bot: [V: 04-1] Remove wikidata descriptions from additional projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309033 (owner: 10Jdlrobson) [17:28:01] hey... https://integration.wikimedia.org/ci/job/operations-mw-config-phpunit/8993/console < am I reading this correctly that I cannot include generic project names in dblists?
[17:29:03] (03PS3) 10Jdlrobson: Remove wikidata descriptions from additional projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309033 [17:29:05] (03PS2) 10Jdlrobson: Revert "Enable Wikidata descriptions on all wikipedias" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309032 [17:29:10] ^ bd808 any idea? [17:29:50] (03CR) 10jenkins-bot: [V: 04-1] Remove wikidata descriptions from additional projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309033 (owner: 10Jdlrobson) [17:30:36] jdlrobson: yeah... I think you have to include other dblists to do that [17:31:00] bd808:oh dears... that makes things ever so confusing. Is there a logical reason not to allow that? [17:31:05] 06Operations, 10hardware-requests, 10netops, 10Continuous-Integration-Infrastructure (phase-out-gallium): Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2615745 (10RobH) a:03RobH checked in release engineering, its cool for me to reimage this now (after... [17:31:08] 06Operations, 06Release-Engineering-Team, 15User-greg, 07Wikimedia-Incident: Improve reminders for teams/people to address identified actionables from incident reports - https://phabricator.wikimedia.org/T141287#2615747 (10greg) [17:31:22] jdlrobson: well , yes. The dblist needs to know how to expand [17:32:07] the way to read other lists is like this example -- https://github.com/wikimedia/operations-mediawiki-config/blob/master/dblists/group2.dblist [17:33:01] 06Operations, 06Release-Engineering-Team, 15User-greg, 07Wikimedia-Incident: Improve reminders for teams/people to address identified actionables from incident reports - https://phabricator.wikimedia.org/T141287#2493130 (10greg) [17:33:02] bd808: ahhhh [17:33:04] that's magic [17:33:13] it is indeed. 
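[editor's note] The point bd808 makes above, that a dblist has to be told how to expand and does so by referring to other dblists, can be illustrated with a minimal Python sketch. This is not the MediaWiki-config implementation: the `%%` marker, the `+` and `-` operators, and the names `expand_dblist` and `lists` are all assumptions chosen for illustration, and the real rules live in the MWWikiversions.php file bd808 links to.

```python
def expand_dblist(name, lists):
    """Expand the list called `name` into a sorted list of database names.

    `lists` maps a list name to its raw lines (a stand-in for files on disk).
    In this sketch a line starting with "%%" is an expression over other
    lists (hypothetical syntax); any other non-empty line is taken as a
    literal database name. Text after "#" is ignored as a comment.
    """
    result = set()
    for raw in lists[name]:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("%%"):
            tokens = line[2:].split()
            # The first token names the starting list; the rest are
            # (operator, list name) pairs applied left to right.
            dbs = set(expand_dblist(tokens[0], lists))
            for op, term in zip(tokens[1::2], tokens[2::2]):
                other = set(expand_dblist(term, lists))
                if op == "+":
                    dbs |= other
                elif op == "-":
                    dbs -= other
                else:
                    raise ValueError(f"unknown operator: {op!r}")
            result |= dbs
        else:
            result.add(line)
    return sorted(result)


# Hypothetical lists: group2 is "everything not in group0 or group1".
example_lists = {
    "all": ["enwiki", "frwiki", "testwiki", "dewiki"],
    "group0": ["testwiki"],
    "group1": ["dewiki"],
    "group2": ["%% all - group0 - group1"],
}
print(expand_dblist("group2", example_lists))
```

Under these assumptions, a bare project name such as "wikipedia" on an ordinary line would be read as a literal database name rather than expanded, which is one plausible reading of the test failure jdlrobson hit.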
[17:33:24] 06Operations, 13Patch-For-Review: Audit/fix hosts with no RAID configured - https://phabricator.wikimedia.org/T136562#2615767 (10Andrew) [17:33:26] 06Operations, 06Labs: labnet100[12].eqiad.wmnet need to be reimaged with RAID - https://phabricator.wikimedia.org/T136718#2615765 (10Andrew) 05Open>03Resolved Labnet1001 is now the live network/api host, and I just reimaged labnet1002 with a raid. [17:34:08] (03PS1) 10محمد شعیب: Add massmessage-sender group to urwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309047 (https://phabricator.wikimedia.org/T144701) [17:34:08] jdlrobson: rules at https://github.com/wikimedia/operations-mediawiki-config/blob/9f6354c700fc7712ae784431495fc6fd3b15cf23/multiversion/MWWikiversions.php#L53 [17:34:20] (03CR) 10jenkins-bot: [V: 04-1] Add massmessage-sender group to urwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309047 (https://phabricator.wikimedia.org/T144701) (owner: 10محمد شعیب) [17:34:27] cool that was the magic i was looking for. Thanks bd808 [17:35:42] (03PS4) 10Jdlrobson: Remove wikidata descriptions from additional projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309033 [17:38:31] 06Operations, 06Release-Engineering-Team, 15User-greg, 07Wikimedia-Incident: Plan how to improve reminders for teams/people to address identified actionables from incident reports - https://phabricator.wikimedia.org/T141287#2493130 (10greg) [17:38:46] 06Operations, 06Release-Engineering-Team, 15User-greg, 07Wikimedia-Incident: Plan how to improve reminders for teams/people to address identified actionables from incident reports - https://phabricator.wikimedia.org/T141287#2493130 (10greg) [17:39:29] mutante: Right now I have self-signed certs on labtestwikitech and labtesthorizon. It should be simple to replace those with LE certs, right? [17:39:36] Can you point me to the puppet class I'd use? 
[17:39:48] 06Operations, 06Release-Engineering-Team, 15User-greg, 07Wikimedia-Incident: Plan how to improve reminders for teams/people to address identified actionables from incident reports - https://phabricator.wikimedia.org/T141287#2493130 (10greg) 05Open>03Resolved a:03greg With the retitling and documentin... [17:42:47] (03PS1) 10RobH: contint1001 install params [puppet] - 10https://gerrit.wikimedia.org/r/309050 [17:46:36] (03PS1) 10RobH: contint1001 moving to public vlan [dns] - 10https://gerrit.wikimedia.org/r/309052 [17:46:48] (03PS1) 10Urbanecm: Enable Extension:SandboxLink for tcywiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309053 (https://phabricator.wikimedia.org/T144925) [17:46:50] (03CR) 10RobH: [C: 032] contint1001 install params [puppet] - 10https://gerrit.wikimedia.org/r/309050 (owner: 10RobH) [17:46:58] ostriches: Argh, are you rolling the train to group 1? [17:47:16] (03CR) 10RobH: [C: 032] contint1001 moving to public vlan [dns] - 10https://gerrit.wikimedia.org/r/309052 (owner: 10RobH) [17:47:21] James_F: They see me rollin', they hatin'..... [17:47:53] ostriches: But after SWAT, right? 'Cos VE and other things in the train are totally screwed right now thanks to ULS. [17:48:21] andrewbogott: letsencrypt::cert::integrated is what you want [17:48:27] James_F: Yes after swat [17:48:45] OK, thanks. :-) [17:49:00] ostriches: that seems like a reasonable name [17:49:26] It's pretty easy, we're using it in the gerrit manifest now too. [17:50:22] phabricator should be switched to letsencrypt too :) [17:53:18] (03PS1) 10Urbanecm: Enable Education Program extension on ur.wp [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309054 (https://phabricator.wikimedia.org/T144927) [17:53:25] mutante: hiyaa, can you do a 'final' rsync to the new archiva box? 
[17:53:32] would like to merge this today [17:53:33] https://gerrit.wikimedia.org/r/#/c/308997/ [17:53:40] so we can try a archiva release tomorrow [17:53:54] ottomata i think mutante is on holiday for the rest of the week [17:53:57] oh! [17:53:58] ok. [17:54:12] ok i'll see if i can do the rsync myself then, ok thanks paladox [17:54:28] Your welcome [17:58:50] 06Operations, 10hardware-requests, 10netops, 10Continuous-Integration-Infrastructure (phase-out-gallium): Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2615947 (10RobH) [18:00:04] anomie, ostriches, thcipriani, hashar, and twentyafterfour: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160907T1800). Please do the needful. [18:00:04] RoanKattouw and Jdlrobson: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [18:00:10] \o [18:00:46] Why does not jouncebot know about my patch? [18:00:54] Urbanecm: Did you add it too late? [18:01:00] Before a minute. [18:01:11] I'm here [18:01:15] James_F: https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=820626&oldid=820480 [18:01:23] Urbanecm: Yes, I saw. [18:01:31] 06Operations, 10Monitoring, 10Traffic, 07HTTPS: adjust ssl certificate montioring to differentiate between standard and LE certificates. - https://phabricator.wikimedia.org/T144293#2615965 (10AlexMonk-WMF) causes the labtestwikitech alert that the labs team noticed [18:01:44] whoo, lots of changes. I can SWAT today. [18:03:08] Thanks thcipriani [18:04:53] hrm has it always been the case that jenkins-bot -1's dependencies https://gerrit.wikimedia.org/r/#/c/308922/ ? Or am I misreading this somehow? [18:06:22] lots of things in zuul. Let's get some config changes done while we wait on those patches to bake. 
[18:07:31] (03Merged) 10jenkins-bot: Revert "Enable Wikidata descriptions on all wikipedias" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309032 (owner: 10Jdlrobson) [18:09:26] jdlrobson: https://gerrit.wikimedia.org/r/#/c/309032/2 is live on mw1099, check please [18:10:45] thanks thcipriani checking [18:11:32] thcipriani: works! [18:11:39] jdlrobson: ack, going everywhere [18:12:30] RoanKattouw: thanks for uls fix. [18:12:39] (03PS1) 10محمد شعیب: Enable Education Program extension at urwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309062 (https://phabricator.wikimedia.org/T144927) [18:12:48] (03CR) 10jenkins-bot: [V: 04-1] Enable Education Program extension at urwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309062 (https://phabricator.wikimedia.org/T144927) (owner: 10محمد شعیب) [18:13:14] !log thcipriani@tin Synchronized dblists/nowikidatadescriptiontaglines.dblist: SWAT: [[gerrit:309032|Revert "Enable Wikidata descriptions on all wikipedias"]] (duration: 00m 47s) [18:13:20] ^ jdlrobson live everywhere [18:13:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:13:33] kart_: Sorry for not seeing your ping earlier. It would have been better to do in the 6am PDT SWAT probably, but we found this bug at like 6pm PDT, so nobody who could be around for the 6am PDT SWAT was still awake/around [18:13:58] RoanKattouw: no worries. [18:14:12] thanks thcipriani [18:14:25] confirmed that works! (you didn't swat the second yet right?) [18:14:37] RoanKattouw: regression fix is included in the new patch, I'll probably add in SWAT tomorrow after discussing with team. [18:14:41] jdlrobson: right, not yet, didn't realize that it was more of the same :) [18:15:35] RoanKattouw: FWIW, I was around at 06:00. 
;-) [18:16:08] That is really earley :) [18:16:37] RoanKattouw: https://gerrit.wikimedia.org/r/#/c/308915/ is live on mw1099, check please [18:20:02] Checking [18:20:24] grrrit-wm is having a bad time [18:20:58] thcipriani: Works great [18:21:05] !log aaron@tin Synchronized php-1.28.0-wmf.18/includes/db/loadbalancer/LoadBalancer.php: ddd35a6ccedb68bad41d17c67e4408afe5ca4ae6 (duration: 00m 45s) [18:21:10] (03PS1) 10Rush: labstore: statistics and scratch mount definitions [puppet] - 10https://gerrit.wikimedia.org/r/309063 [18:21:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:21:11] RoanKattouw: awesome, sync-dir fine here? [18:21:40] fine/the only way :) [18:23:07] Yup [18:24:18] !log thcipriani@tin Synchronized php-1.28.0-wmf.18/extensions/UniversalLanguageSelector: SWAT: [[gerrit:308915|Revert "Update jquery.uls to a9dc11b" (T144871)]] (duration: 00m 47s) [18:24:19] T144871: TypeError: this.options.quickList is null jquery.uls.lcd.js:319:9 - https://phabricator.wikimedia.org/T144871 [18:24:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:24:32] ^ RoanKattouw live everywhere [18:25:13] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309033 (owner: 10Jdlrobson) [18:25:32] (03CR) 10Madhuvishy: [C: 031] labstore: statistics and scratch mount definitions [puppet] - 10https://gerrit.wikimedia.org/r/309063 (owner: 10Rush) [18:25:39] (03Merged) 10jenkins-bot: Remove wikidata descriptions from additional projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309033 (owner: 10Jdlrobson) [18:26:55] jdlrobson: https://gerrit.wikimedia.org/r/#/c/309033/ is live on mw1099, check please [18:27:49] thcipriani: works! 
[18:27:57] jdlrobson: ack, going everywhere [18:28:10] thanks thcipriani [18:29:31] !log thcipriani@tin Synchronized dblists/nowikidatadescriptiontaglines.dblist: SWAT: [[gerrit:309033|Remove wikidata descriptions from additional projects]] (duration: 00m 45s) [18:29:32] ^ jdlrobson live everywhere [18:29:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:30:21] thcipriani: perfect! thanks for all your help! [18:30:42] jdlrobson: no problem, thanks for checking to make sure everything works as expected. [18:30:58] (03PS2) 10Thcipriani: Enable Extension:SandboxLink for tcywiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309053 (https://phabricator.wikimedia.org/T144925) (owner: 10Urbanecm) [18:31:30] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309053 (https://phabricator.wikimedia.org/T144925) (owner: 10Urbanecm) [18:32:09] (03Merged) 10jenkins-bot: Enable Extension:SandboxLink for tcywiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309053 (https://phabricator.wikimedia.org/T144925) (owner: 10Urbanecm) [18:32:44] Urbanecm: https://gerrit.wikimedia.org/r/#/c/309053 is live on mw1099, check please [18:33:22] Checking [18:33:50] Working [18:33:58] Urbanecm: ack, going everywhere [18:34:42] Okay [18:35:36] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:309053|Enable Extension:SandboxLink for tcywiki (T144925)]] (duration: 00m 47s) [18:35:37] T144925: Enable Extension:SandboxLink for Tulu Wikipedia (tcy wp) - https://phabricator.wikimedia.org/T144925 [18:35:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:35:42] ^ Urbanecm live everywhere now [18:35:49] Thx thcipriani [18:36:07] thanks for checking :) [18:36:41] (03CR) 10jenkins-bot: [V: 04-1] labstore: statistics and scratch mount definitions [puppet] - 10https://gerrit.wikimedia.org/r/309063 (owner: 10Rush) [18:37:00] !log deploying 
wdqs, fix for T144913 [18:37:01] T144913: wdqs-updater tries to apply updates from non-Entity pages, fails - https://phabricator.wikimedia.org/T144913 [18:39:37] RoanKattouw: https://gerrit.wikimedia.org/r/#/c/308923/ and https://gerrit.wikimedia.org/r/#/c/308922/ live on mw1099, check please [18:40:15] (03PS2) 10Rush: labstore: statistics and scratch mount definitions [puppet] - 10https://gerrit.wikimedia.org/r/309063 [18:41:16] thcipriani: Works, thanks [18:41:27] (03PS7) 10Eevans: Simplification of Cassandra Logstash filtering [puppet] - 10https://gerrit.wikimedia.org/r/282466 (https://phabricator.wikimedia.org/T130861) (owner: 10Jstenval) [18:41:52] RoanKattouw: cool, will go live everywhere, looks like Echo first then MobileFrontend, sound correct? [18:41:54] (03CR) 10Eevans: "@elukey ping?" [puppet] - 10https://gerrit.wikimedia.org/r/282466 (https://phabricator.wikimedia.org/T130861) (owner: 10Jstenval) [18:42:04] thcipriani: That's right, thanks for picking up on that, I'd forgotten [18:42:30] (03PS1) 10Legoktm: Match 'editcontentmodel' permission with 'move' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309066 (https://phabricator.wikimedia.org/T85847) [18:44:38] !log thcipriani@tin Synchronized php-1.28.0-wmf.18/extensions/Echo/modules/model/mw.echo.dm.ModelManager.js: SWAT: [[gerrit:308923|Add method to get local unread notifications in the manager (T141404)]] (duration: 00m 45s) [18:44:39] T141404: Add "Mark all as read" button for Notification badge in mobile - https://phabricator.wikimedia.org/T141404 [18:44:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:44:57] (03CR) 10Volans: [C: 031] "With my little knowledge of puppet/rake tests LGTM. Thanks for adding those gehel!" 
[puppet] - 10https://gerrit.wikimedia.org/r/309013 (owner: 10Gehel) [18:46:39] !log thcipriani@tin Synchronized php-1.28.0-wmf.18/extensions/MobileFrontend/resources/mobile.notifications.overlay/NotificationsOverlay.js: SWAT: [[gerrit:308922|Count local unread notifications when mark-all-read is clicked (T141404)]] (duration: 00m 44s) [18:46:40] T141404: Add "Mark all as read" button for Notification badge in mobile - https://phabricator.wikimedia.org/T141404 [18:46:42] ^ RoanKattouw live everywhere [18:46:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:46:48] yay thanks [18:46:57] :D [18:47:06] !log Morning SWAT complete [18:47:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:47:18] for some value of "morning" [18:51:46] (03PS2) 10Legoktm: Match 'editcontentmodel' permission with 'move' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309066 (https://phabricator.wikimedia.org/T85847) [18:56:21] (03CR) 10Brian Wolff: [C: 031] Match 'editcontentmodel' permission with 'move' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309066 (https://phabricator.wikimedia.org/T85847) (owner: 10Legoktm) [18:58:52] (03PS1) 10Hashar: contint: drop roles from contint1001 [puppet] - 10https://gerrit.wikimedia.org/r/309069 (https://phabricator.wikimedia.org/T140257) [19:00:04] 06Operations, 10hardware-requests, 10netops, 10Continuous-Integration-Infrastructure (phase-out-gallium), 13Patch-For-Review: Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2616279 (10hashar) I confirm the server content on contint1001.eqiad.wmnet can be... [19:00:04] ostriches: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160907T1900). 
[19:01:23] !log T139961: Starting RESTBase htmldumper processes in codfw (read testing) [19:01:24] T139961: 9x or 15x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961 [19:01:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:04:31] (03CR) 10Ottomata: "Ok, I need to do one more rsync, as I was just told that an artifact was very recently uploaded. I'm not 100% sure how you do this, but I" [dns] - 10https://gerrit.wikimedia.org/r/308997 (https://phabricator.wikimedia.org/T123725) (owner: 10Ottomata) [19:04:53] (03PS1) 10Yuvipanda: [WIP]labs: Move labs puppetmaster to use ENC [puppet] - 10https://gerrit.wikimedia.org/r/309071 [19:05:52] (03CR) 10Ottomata: "AH I found your screen!" [dns] - 10https://gerrit.wikimedia.org/r/308997 (https://phabricator.wikimedia.org/T123725) (owner: 10Ottomata) [19:06:06] (03CR) 10jenkins-bot: [V: 04-1] [WIP]labs: Move labs puppetmaster to use ENC [puppet] - 10https://gerrit.wikimedia.org/r/309071 (owner: 10Yuvipanda) [19:06:26] (03PS2) 10Ottomata: Point archiva at meitnerium [dns] - 10https://gerrit.wikimedia.org/r/308997 (https://phabricator.wikimedia.org/T123725) [19:06:55] * ostriches takes a bite of sandwich [19:06:58] Ok hur we go [19:07:08] (03PS2) 10Chad: Group1 to wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309022 [19:08:01] (03PS2) 10Yuvipanda: [WIP]labs: Move labs puppetmaster to use ENC [puppet] - 10https://gerrit.wikimedia.org/r/309071 [19:08:26] (03CR) 10jenkins-bot: [V: 04-1] [WIP]labs: Move labs puppetmaster to use ENC [puppet] - 10https://gerrit.wikimedia.org/r/309071 (owner: 10Yuvipanda) [19:09:40] (03PS3) 10Yuvipanda: [WIP]labs: Move labs puppetmaster to use ENC [puppet] - 10https://gerrit.wikimedia.org/r/309071 [19:10:10] (03CR) 10jenkins-bot: [V: 04-1] [WIP]labs: Move labs puppetmaster to use ENC [puppet] - 10https://gerrit.wikimedia.org/r/309071 (owner: 10Yuvipanda) [19:10:50] what the fuck now, jenkins [19:11:03] (03CR) 10Chad: 
[C: 032] Group1 to wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309022 (owner: 10Chad) [19:11:33] (03Merged) 10jenkins-bot: Group1 to wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309022 (owner: 10Chad) [19:11:42] (03Abandoned) 10Urbanecm: Enable Education Program extension on ur.wp [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309054 (https://phabricator.wikimedia.org/T144927) (owner: 10Urbanecm) [19:12:31] (03PS4) 10Yuvipanda: [WIP]labs: Move labs puppetmaster to use ENC [puppet] - 10https://gerrit.wikimedia.org/r/309071 [19:12:41] !log demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 to wmf.18 [19:12:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:14:34] (03PS2) 10Urbanecm: Add massmessage-sender group to urwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309047 (https://phabricator.wikimedia.org/T144701) (owner: 10محمد شعیب) [19:16:53] ostriches there was a patch that reverted changes in uls [19:17:07] Do you know if that got deployed earlier [19:17:37] The uls changes were breaking visualeditor as well as wikibase [19:18:27] The uls one got done during swat. 
[19:18:36] Ok [19:18:46] Just want to be sure [19:27:10] 06Operations, 10procurement: Test - https://phabricator.wikimedia.org/T144994#2616371 (10Pswaby) [19:29:18] 06Operations, 10procurement: Test - https://phabricator.wikimedia.org/T144995#2616390 (10Pswaby) [19:29:52] 06Operations, 10procurement: Test - https://phabricator.wikimedia.org/T144994#2616403 (10Pswaby) 05Open>03declined [19:33:04] (03PS1) 10Ppchelko: Change-Prop: Concurrency bump for transclusions [puppet] - 10https://gerrit.wikimedia.org/r/309077 [19:40:53] !log T139961: Actually starting RESTBase htmldumper processes in codfw (read testing) [19:40:54] T139961: 9x or 15x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961 [19:41:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:47:58] 06Operations: Update ICU version to 55.1 - https://phabricator.wikimedia.org/T143931#2616433 (10Bawolff) >>! In T143931#2608631, @MoritzMuehlenhoff wrote: > We can't easily upgrade icu to 55.1. Can we pinpoint the fix to a specific upstream commit? Current (unconfirmed) theory is r9764 and r9748. One theory (a... [19:48:48] I guess pokemon go continues it's rollout https://9to5mac.com/2016/09/07/ios-10-will-be-released-to-the-public-on-september-13-for-iphone-and-ipad/ [19:49:38] Here comes the rollout of super mario http://www.neowin.net/news/super-mario-run-is-coming-to-ios---the-world-rejoices [19:49:39] :) [19:52:18] !log restbase deploy start of 38d8c41 [19:52:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:52:27] bearND: mdholloway: ^^^ [19:54:22] RoanKattouw: do you happen to understand how dblists work in configuration changes? [19:54:54] I think I misunderstood them here - https://gerrit.wikimedia.org/r/#/c/309033/4/dblists/nowikidatadescriptiontaglines.dblist - unfortunately this means now i have enabled in a bunch of places that i shouldn't have. 
[19:55:37] jdlrobson: I don't think you can have a dblist that is both a computed dblist and lists wikis? But I might be wrong [19:55:45] yeh that's what i'm seeing [19:55:51] Also you have wikibooks twice [19:55:57] yep saw that too :) [19:56:07] it seems strange creating a dblist just for meta and mediawiki.org though [19:56:16] (the others can be captured in top6 wikipedias [19:56:20] Right [19:56:30] To capture mw.org you could use group0 :) [19:56:41] yup that's true [19:56:48] But yes you may just have to make a small dblist [19:56:54] although not sure about zerowiki [19:57:51] Also if this dblist is failing to parse completely, then maybe the nonwikidatadescriptiontaglines set is empty? [19:58:00] RoanKattouw: what's s4.dblist ? [19:58:16] sN are the database shards [19:58:31] So s1.dblist are all on one DB cluster, and s2.dblist on a second cluster, etc [19:59:16] From memory s1 is only enwiki and s4 is only commonswiki, s2 is the top15ish, s5 is three big Wikipedias, s6 is a few more, and s3 is everything else [19:59:29] And I forget what s7 is because that's relatively new compared to when I stopped paying attention to this stuff [19:59:36] !log mobileapps deploying 2cd4f6a [19:59:44] bearND: mdholloway: ^^^ [19:59:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:00:04] gwicke, cscott, arlolra, subbu, bearND, mdholloway, halfak, and Amir1: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160907T2000). Please do the needful. [20:00:28] Nothing for ORES [20:00:32] no parsoid deploy [20:01:46] (03PS1) 10Jdlrobson: Correct dblist definition [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309087 (https://phabricator.wikimedia.org/T143345) [20:01:56] RoanKattouw: thanks for all your help so far. Does the above look sane to you? 
^ [20:01:59] (03CR) 10Paladox: [C: 031] contint: drop roles from contint1001 [puppet] - 10https://gerrit.wikimedia.org/r/309069 (https://phabricator.wikimedia.org/T140257) (owner: 10Hashar) [20:02:47] (03PS5) 10Yuvipanda: [WIP]labs: Move labs puppetmaster to use ENC [puppet] - 10https://gerrit.wikimedia.org/r/309071 [20:04:33] (03PS2) 10Jdlrobson: Correct dblist definition [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309087 (https://phabricator.wikimedia.org/T143345) [20:07:53] !log restbase cassandra truncating local_group_wikipedia_T_feed_aggregated.data for T144990 [20:07:54] T144990: [CRASH] Content Service shouldn't send empty objects - https://phabricator.wikimedia.org/T144990 [20:08:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:08:13] urandom: bearND: mdholloway: ^ [20:11:30] (03CR) 10Smalyshev: "ho does this relate to https://gerrit.wikimedia.org/r/#/c/308989/ ?" [puppet] - 10https://gerrit.wikimedia.org/r/309026 (https://phabricator.wikimedia.org/T144380) (owner: 10Gehel) [20:11:39] !log restbase deploy end of 38d8c41 [20:11:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:12:05] (03CR) 10Smalyshev: [C: 031] Specify contact groups in wdqs::monitor::services [puppet] - 10https://gerrit.wikimedia.org/r/308989 (owner: 10Hoo man) [20:17:14] jdlrobson: Will look after lunch [20:20:24] (03PS1) 10BBlack: update bblack ssh keys [labs/private] - 10https://gerrit.wikimedia.org/r/309100 [20:22:50] mobrovac: boom. 
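Editorial note on the dblist discussion above: a "computed" dblist is described as one built from other lists rather than from literal wiki names. The sketch below shows that idea only. It assumes a single-line expression that starts with a `%%` marker and combines named lists with `+` (union) and `-` (difference), applied left to right. The function name `eval_dblist_expr` and the list contents are hypothetical and are not taken from the real mediawiki-config code or the real dblist files.

```python
def eval_dblist_expr(expr, lists):
    """Evaluate a computed-dblist style expression such as '%% a + b - c'.

    `lists` maps a list name to a set of wiki database names.
    Operators are applied strictly left to right (an assumption of this sketch).
    """
    # Drop the assumed '%%' marker, then split into names and operators.
    tokens = expr.strip().lstrip("%").split()
    if not tokens or len(tokens) % 2 == 0:
        raise ValueError("expected: name (op name)*")

    result = set(lists[tokens[0]])
    for op, name in zip(tokens[1::2], tokens[2::2]):
        if op == "+":
            result |= lists[name]   # union
        elif op == "-":
            result -= lists[name]   # difference
        else:
            raise ValueError(f"unknown operator: {op!r}")
    return result


# Made-up memberships, for illustration only.
example_lists = {
    "special": {"metawiki", "mediawikiwiki", "commonswiki", "wikidatawiki"},
    "s4": {"commonswiki"},
}
```

With these made-up sets, `eval_dblist_expr("%% special - s4", example_lists)` keeps everything in `special` except the wikis that live on `s4`. A file that mixes such an expression with literal wiki names has no place in this model, which matches the parsing problem described in the channel.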
[20:23:16] (03PS1) 10Andrew Bogott: labspuppetbackend: Rudimentary security [puppet] - 10https://gerrit.wikimedia.org/r/309101 [20:26:59] (03CR) 10BBlack: [C: 032 V: 032] update bblack ssh keys [labs/private] - 10https://gerrit.wikimedia.org/r/309100 (owner: 10BBlack) [20:27:20] 06Operations, 10Continuous-Integration-Infrastructure (phase-out-gallium), 13Patch-For-Review: Migrate CI services from gallium to contint1001 - https://phabricator.wikimedia.org/T137358#2616631 (10hashar) [20:32:04] (03CR) 10Gehel: "No, this is unrelated to https://gerrit.wikimedia.org/r/#/c/308989/" [puppet] - 10https://gerrit.wikimedia.org/r/309026 (https://phabricator.wikimedia.org/T144380) (owner: 10Gehel) [20:33:33] (03PS5) 10Volans: Automation: automatically reimage host [puppet] - 10https://gerrit.wikimedia.org/r/308520 (https://phabricator.wikimedia.org/T143536) [20:33:35] (03CR) 10Lydia Pintscher: [C: 04-1] "I'd like a clearer understanding of the consequences. We'll not just do this and see what breaks." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208655 (https://phabricator.wikimedia.org/T94416) (owner: 10Aude) [20:33:47] PROBLEM - Host mw2182 is DOWN: PING CRITICAL - Packet loss = 100% [20:33:47] PROBLEM - Host mw2183 is DOWN: PING CRITICAL - Packet loss = 100% [20:34:19] (03CR) 10Gehel: [C: 04-1] "This change does not actually fix the issue it is trying to solve. 
Details in https://phabricator.wikimedia.org/T144948" [puppet] - 10https://gerrit.wikimedia.org/r/308989 (owner: 10Hoo man) [20:34:29] RECOVERY - Host mw2182 is UP: PING OK - Packet loss = 0%, RTA = 37.36 ms [20:34:47] PROBLEM - Host mw2184 is DOWN: PING CRITICAL - Packet loss = 100% [20:35:59] RECOVERY - Host mw2184 is UP: PING OK - Packet loss = 0%, RTA = 36.35 ms [20:36:48] RECOVERY - Host mw2183 is UP: PING OK - Packet loss = 0%, RTA = 37.06 ms [20:38:07] (03CR) 10Volans: "Addressed comments, fixed some issue discovered while testing the functions while elukey was reimaging the hosts manually and added some m" [puppet] - 10https://gerrit.wikimedia.org/r/308520 (https://phabricator.wikimedia.org/T143536) (owner: 10Volans) [20:46:56] (03PS6) 10Yuvipanda: [WIP]labs: Move labs puppetmaster to use ENC [puppet] - 10https://gerrit.wikimedia.org/r/309071 [20:47:37] (03PS7) 10Yuvipanda: labs: Introduce an ENC for labs [puppet] - 10https://gerrit.wikimedia.org/r/309071 [20:48:36] (03PS1) 10Hashar: contint: allow ssh from contint1001 to labs instance [puppet] - 10https://gerrit.wikimedia.org/r/309153 (https://phabricator.wikimedia.org/T137323) [20:48:49] (03CR) 10Yuvipanda: [C: 04-1] "I think we should try to not put this in the code, but keep this in nginx/uwsgi. I'm much more inclined to trust nginx to make this work r" [puppet] - 10https://gerrit.wikimedia.org/r/309101 (owner: 10Andrew Bogott) [20:48:51] (03PS1) 10Hashar: contint: vary ssh from= for prod slave [puppet] - 10https://gerrit.wikimedia.org/r/309154 (https://phabricator.wikimedia.org/T137323) [20:50:24] (03CR) 10Hashar: "I have updated the labs security groups in labs projects contintcloud, integration and deployment-prep." 
[puppet] - 10https://gerrit.wikimedia.org/r/309153 (https://phabricator.wikimedia.org/T137323) (owner: 10Hashar) [20:50:56] (03PS8) 10Yuvipanda: labs: Introduce an ENC for labs [puppet] - 10https://gerrit.wikimedia.org/r/309071 [20:51:19] (03CR) 10Paladox: [C: 031] contint: vary ssh from= for prod slave [puppet] - 10https://gerrit.wikimedia.org/r/309154 (https://phabricator.wikimedia.org/T137323) (owner: 10Hashar) [20:54:39] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-0/0/1: down - Core: cr1-ulsfo:xe-1/2/0 (Telia, IC-313592, 51ms) {#11372} [10Gbps wave]BR [20:54:41] (03CR) 10Hashar: [C: 031] "https://puppet-compiler.wmflabs.org/4010/" [puppet] - 10https://gerrit.wikimedia.org/r/309154 (https://phabricator.wikimedia.org/T137323) (owner: 10Hashar) [20:55:50] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 66, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/2/0: down - Core: cr1-eqord:xe-0/0/1 (Telia, IC-313592, 51ms) {#1502} [10Gbps wave]BR [20:55:52] (03PS4) 10Gehel: wdqs - initial configuration of wdqs servers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/309026 (https://phabricator.wikimedia.org/T144380) [20:56:22] 06Operations, 10hardware-requests, 10netops, 10Continuous-Integration-Infrastructure (phase-out-gallium), 13Patch-For-Review: Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2616802 (10RobH) [20:57:14] 06Operations, 10hardware-requests, 10netops, 10Continuous-Integration-Infrastructure (phase-out-gallium), 13Patch-For-Review: Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2458291 (10RobH) a:05RobH>03hashar contint1001.wikimedia.org is online with p... [20:58:00] jynus: ready for PageAssessments roll-out. 
I'm ready to start in a few minutes [20:58:07] if you are [20:58:26] (03PS2) 10Hashar: sites.pp: rename contint1001 and drop role [puppet] - 10https://gerrit.wikimedia.org/r/309069 (https://phabricator.wikimedia.org/T140257) [20:59:27] (03CR) 10Paladox: [C: 031] sites.pp: rename contint1001 and drop role [puppet] - 10https://gerrit.wikimedia.org/r/309069 (https://phabricator.wikimedia.org/T140257) (owner: 10Hashar) [21:01:00] ori: is there a way to limit fatalmonitor to filter for a particular wiki? [21:01:13] (03CR) 10Gehel: [C: 032] wdqs - initial configuration of wdqs servers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/309026 (https://phabricator.wikimedia.org/T144380) (owner: 10Gehel) [21:01:46] (03CR) 10Hashar: "puppet compiler is not happy on https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/4011/console" [puppet] - 10https://gerrit.wikimedia.org/r/309069 (https://phabricator.wikimedia.org/T140257) (owner: 10Hashar) [21:03:10] (03PS6) 10Volans: Automation: automatically reimage host [puppet] - 10https://gerrit.wikimedia.org/r/308520 (https://phabricator.wikimedia.org/T143536) [21:04:44] !log T139961: Stopping RESTBase htmldumper in codfw [21:04:45] T139961: 9x or 15x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961 [21:04:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:08:41] (03CR) 10RobH: [C: 032] sites.pp: rename contint1001 and drop role [puppet] - 10https://gerrit.wikimedia.org/r/309069 (https://phabricator.wikimedia.org/T140257) (owner: 10Hashar) [21:08:46] (03PS3) 10RobH: sites.pp: rename contint1001 and drop role [puppet] - 10https://gerrit.wikimedia.org/r/309069 (https://phabricator.wikimedia.org/T140257) (owner: 10Hashar) [21:12:51] ACKNOWLEDGEMENT - High lag on wdqs1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [1800.0] Gehel updates are catching up after fix - T144913 [21:12:51] ACKNOWLEDGEMENT - High lag on 
wdqs1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [1800.0] Gehel updates are catching up after fix - T144913 [21:17:46] 06Operations, 10fundraising-tech-ops: Ensure all disaster recover documentation is in one central location - https://phabricator.wikimedia.org/T95841#2616881 (10Jgreen) [21:19:20] jdlrobson: Looks fine to me. I see you replaced meta+mw with special - s4, just checking you realize special.dblist includes a whole bunch of smaller wikis and also Wikidata [21:20:22] 06Operations, 10DBA, 10MediaWiki-Maintenance-scripts, 06Release-Engineering-Team, and 2 others: Add section for long-running tasks on the Deployment page (specially for database maintenance) - https://phabricator.wikimedia.org/T144661#2616900 (10greg) I'm also a big +1 on having those long running maint sc... [21:20:54] hmm, looks like jynus isn't available. marostegui: would you or someone else be available to help me monitor a new feature deployment? [21:21:19] 06Operations, 10Cassandra, 06Services, 10hardware-requests: 9x or 15x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2616901 (10Eevans) I ran some dumps from codfw in order to generate read traffic, and determine if the Intel SSDs out-perform the Samsung in reads as well as... [21:31:27] AaronSchulz: Any chance I could talk you into helping me monitor things during the PageAssessments roll-out. Ori isn't around and neither are jynus or marostegui. I'm also open to suggesting other people. Just need someone to monitor the English Wikivoyage job queue and database when I add the new parser function to the master assessment template. [21:34:04] I suppose. enwikivoyage? Is that even large? 
[21:36:11] !log Created tables for Translate extension on fr.wiktionary (T138972) [21:36:12] T138972: Install Translate Extension in the French Wiktionary - https://phabricator.wikimedia.org/T138972 [21:36:18] AaronSchulz: It's not that large, but it does have a master assessment template that is included on every page. [21:36:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:37:38] PROBLEM - Check correctness of the icinga configuration on neon is CRITICAL: Icinga configuration contains errors [21:38:01] AaronSchulz: I think I can do a lot of the monitoring myself, but I'm not an expert on this. I was planning on watching the job number via https://en.wikivoyage.org/w/api.php?action=query&meta=siteinfo&siprop=statistics&format=jsonfm (Is there a graph somewhere I can watch instead?) and then watching the lag graph at [21:38:02] https://tendril.wikimedia.org/host/view/db1015.eqiad.wmnet/3306. [21:38:26] the graphs are not by wiki [21:40:13] (03PS21) 1020after4: Scap swat command [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306259 (https://phabricator.wikimedia.org/T142880) [21:40:17] kaldari: you can also look at https://tendril.wikimedia.org/host/view/db1078.eqiad.wmnet/3306 (a main load s3 replica) [21:40:51] gehel: wdqs_codfw not find in icinga config [21:41:22] AaronSchulz: what's the difference between that one and db1015? [21:41:23] volans: yep, my doing... fix coming up [21:41:33] which I just chose randomly [21:41:40] gehel: ok, thanks! [21:41:40] volans: you're fast! [21:41:50] :) [21:43:40] AaronSchulz: I think db1015 is also a main load s3 replica, although I'm not 100% sure [21:44:15] ACKNOWLEDGEMENT - Check correctness of the icinga configuration on neon is CRITICAL: Icinga configuration contains errors Gehel related to new wdqs servers in codfw - gehel [21:44:19] they are all at https://tendril.wikimedia.org/tree [21:45:35] AaronSchulz: OK, looks like we're on the same page then. 
Is there anything better to watch for job queue load (besides monitoring the job number via the API)? [21:46:21] (03PS1) 10Gehel: wdqs - missing icinga group for codfw [puppet] - 10https://gerrit.wikimedia.org/r/309176 [21:46:48] seems like there should be a graph for that somewhere [21:48:31] (03CR) 10Gehel: [C: 032] wdqs - missing icinga group for codfw [puppet] - 10https://gerrit.wikimedia.org/r/309176 (owner: 10Gehel) [21:48:56] kaldari: I have no context about what are you deploying but for the JobQueue there are 2 dashboards on grafana [21:49:00] of that can help [21:49:56] and for databases you've tendril and some experimental grafana dashboard (search mysql) [21:50:57] volans: Cool, just found it and queued up the jobs I'm interested in. Thanks! [21:51:52] volans: I don't suppose there's any way to filter that grafana graph to a specific wiki? [21:52:07] db or job queue? [21:52:36] volans: job queue [21:53:19] RECOVERY - Check correctness of the icinga configuration on neon is OK: Icinga configuration is correct [21:53:22] kaldari: there is the raw log on fluorine.eqiad.wmnet in /a/mw-log/runJobs.php (warning 36GB) [21:54:00] the messages are emitted by mediawiki . Suffixed with STARTING when job starts [21:54:09] and then t= good [21:54:13] or "bad" [21:54:22] so you could grep [21:55:26] (03PS22) 1020after4: Scap swat command [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306259 (https://phabricator.wikimedia.org/T142880) [21:55:39] happy hacking sleep & [21:55:57] hashar: well, it's going to create a few hundred thousand jobs, so I need a slightly higher level view, but would be nice if it was wiki-specific. [21:56:10] (03PS1) 10Dereckson: Enable Translate on fr.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309182 (https://phabricator.wikimedia.org/T138972) [21:56:47] AaronSchulz: OK to pull the trigger now? 
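Editorial note on the per-wiki job monitoring discussed above: the siteinfo statistics query that kaldari links can be polled from a script instead of reloading the page. This is a minimal sketch, assuming the JSON response carries the job count at `query.statistics.jobs`. The helper names are hypothetical, and the count the API reports is likely an estimate, so it is better read as a trend than as an exact figure.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def statistics_url(host):
    """Build the siteinfo statistics URL for one wiki, e.g. en.wikivoyage.org."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return f"https://{host}/w/api.php?" + urlencode(params)


def job_count(response_text):
    """Extract the job count from a siteinfo statistics JSON response."""
    data = json.loads(response_text)
    return int(data["query"]["statistics"]["jobs"])


def fetch_job_count(host, timeout=10):
    """Fetch and parse the current job count for one wiki (performs a network call)."""
    with urlopen(statistics_url(host), timeout=timeout) as resp:
        return job_count(resp.read().decode("utf-8"))
```

Calling `fetch_job_count("en.wikivoyage.org")` in a loop with a sleep between calls gives a wiki-specific view that the site-wide Grafana dashboard does not provide.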
[21:57:11] sure [21:57:44] 06Operations, 10hardware-requests: eqiad: (4) worker servers for kubernetes - https://phabricator.wikimedia.org/T141624#2617138 (10RobH) 05stalled>03Open a:05mark>03RobH So the TPM may not be included by default on our orders, since we never bothered to check or use it in the past. I've dropped an ema... [21:58:16] AaronSchulz: done. just went from 0 jobs to 300 and climbing... [21:58:37] https://grafana.wikimedia.org/dashboard/db/job-queue-health is the overall site graph [21:58:47] (03CR) 10Hashar: "Host now has the ferm rules enabled. Have to ssh to it via the bastion. I have checked a few network flows and that looks fine." [puppet] - 10https://gerrit.wikimedia.org/r/309069 (https://phabricator.wikimedia.org/T140257) (owner: 10Hashar) [21:59:51] (03CR) 10Dereckson: [C: 04-1] "Fix is good, but top6wikipedias broke the current dblist convention. List of all the Wikipedia is called wikipedia.dblist, not wikipedias." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309087 (https://phabricator.wikimedia.org/T143345) (owner: 10Jdlrobson) [22:00:29] job queue number is going all over the place, but seems to be averaging about 300 or 400. Interesting that it's staying fairly level. [22:01:05] kaldari: on the DB side this is affecting only s3? 
[22:01:14] should be, yes [22:02:41] (03CR) 10Thibaut120094: [C: 031] Enable Translate on fr.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309182 (https://phabricator.wikimedia.org/T138972) (owner: 10Dereckson) [22:02:44] seeing a 3 second replication spike on the slave [22:02:55] !log ladsgroup@terbium:~$ mwscript extensions/ORES/maintenance/PurgeScoreCache.php --wiki=wikidatawiki --model damaging [22:03:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:04:07] nothing bad so far [22:04:21] no fatals [22:05:07] jobs at 469 [22:05:42] (03PS1) 10Dereckson: Improve dblist name coherence [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309186 [22:05:55] you almost doubled the # of writes on s3 master :) [22:06:53] replication lag back to 0 [22:07:02] volans: sweet :) [22:07:15] volans: where do you see that? [22:07:32] grafana [22:08:43] or tendril [22:12:18] becoming late, I'll be offline soon kaldari (FYI) [22:12:40] volans: I don't see that in grafana, how do you get it? [22:15:11] grafana support for mysql metrics is still experimental, look at the dashboard mysql-aggregated [22:15:28] select equiad, core, s3, master for example [22:16:06] or just see tendril for db1075 kaldari [22:30:24] what are the deployment plans for graphoid? [22:32:22] job number back under 100 now [22:32:59] yurik? [22:33:18] Platonides, sup [22:33:45] I see Extension:Graph is enabled in eg. mediawikiwiki but not in wikipedias [22:33:48] what are the plans? [22:34:27] is there some roadmap ? [22:34:43] Platonides, graph ext has been enabled everywhere for the past 2 years [22:35:42] a) I didnt know about it [22:35:54] b) seems I was confused due to the template indirection [22:36:21] we need to copy Module:Graph ? 
[22:36:43] hmm, it's there [22:37:49] yep [22:37:53] it works [22:37:54] sorry [22:38:02] now I just need to figure out how to add decimals [22:38:53] Platonides, current usage: https://grafana.wikimedia.org/dashboard/db/interactive-team-kpi [22:39:19] I'm adding one to eswiki right now :) [22:39:28] \m/ [22:41:47] https://es.wikipedia.org/w/index.php?title=Ortograf%C3%ADa_del_espa%C3%B1ol&type=revision&diff=93465648&oldid=93458332 [22:52:29] (03CR) 10Volans: "Few minor comments" (039 comments) [puppet] - 10https://gerrit.wikimedia.org/r/309071 (owner: 10Yuvipanda) [23:03:10] Dereckson: i don't understand your -1 [23:03:39] you want me to rename to wikipedia-top6 ? [23:03:44] or top6wikipedia [23:04:12] or other? [23:04:23] I think he means top6wikipedia, without an 's' at the end [23:04:30] reason being is that the other dblist names aren't plural [23:04:55] yeh but that looks strange. What do you think about wikipedia-top6 ? [23:05:21] Looks like jouncebot is broken? [23:05:22] we have group1-wikipedia [23:05:23] jouncebot: next [23:05:24] In 0 hour(s) and 54 minute(s): Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160908T0000) [23:05:27] so i guess top6-wikipedia would make most sense [23:05:31] It should have pinged about the SWAT [23:05:34] I'll do it today [23:05:57] \o [23:05:58] Hello. [23:06:22] jdlrobson: top6wikipedia looks good [23:06:32] Dereckson: i've suggested adding a '-' [23:06:41] top6-wikipedia also [23:06:44] since that will be consistent with group1-wikipedia [23:06:44] ebernhardson: You here for SWAT? 
[23:06:47] yes [23:06:54] (03PS3) 10Jdlrobson: Correct dblist definition [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309087 (https://phabricator.wikimedia.org/T143345) [23:06:54] RoanKattouw: yup [23:06:55] and there you are ^ [23:07:00] I also prepared a patch to fix flow_computed [23:07:03] OK, I'll do your patch first [23:07:10] Since Dereckson and jdlrobson are still figuring out dblist stuff [23:07:12] but I would like some input from matt [23:07:14] I'm here to test the ORES patch [23:07:18] RoanKattouw: it's figured out [23:07:21] and new patch is up [23:07:26] RoanKattouw: sure. mine is real simple, js change reverting back to a previously known-good state [23:07:29] as I don't know where flow_computed is used [23:07:30] OK, Amir1 first then, for timezone reasons [23:07:56] Thanks! [23:08:29] (03CR) 10Dereckson: [C: 031] "Yes, that makes naming more coherent." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309087 (https://phabricator.wikimedia.org/T143345) (owner: 10Jdlrobson) [23:08:58] Oh, crap, wait [23:09:00] My SSH key [23:09:07] * RoanKattouw tries to retrieve his SSH key [23:09:30] :) [23:09:37] new laptop? [23:09:44] Kind of [23:09:48] I spilled water on my laptop on Thursday [23:10:01] doh [23:10:09] So I'm using a loaner, and today they managed to transplant the hard drive to another machine [23:10:25] Going to try to pull my key off that machine, brb [23:10:32] (03CR) 10Dereckson: "@matt Where flow_computed is used, if used somewhere?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309186 (owner: 10Dereckson) [23:10:41] RoanKattouw: I can SWAT if you wish [23:10:58] Platonides, technically you don't need the module:graph -- that one is one of the implementations of "wiki template -> pretty graph". Its good, but it has a number of fairly complex bugs [23:11:10] Amir1, I'm looking to pack up. Should I stick around? 
[23:11:23] Hey good evening halfak [23:11:27] halfak: we've Math on wikitech [23:11:28] halfak: nah, don't worry. I've got this [23:12:39] Dereckson, saw that. Went and implemented the math I wanted right away and gave the task a token :) [23:12:42] Dereckson: Yes please [23:12:46] Okay [23:12:49] I thought my backup was done but it wasn't [23:12:55] (03PS4) 10Dereckson: Correct dblist definition [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309087 (https://phabricator.wikimedia.org/T143345) (owner: 10Jdlrobson) [23:13:03] FYI: https://wikitech.wikimedia.org/w/index.php?title=Analytics%2FData%2FPageview_hourly%2FSanitization&type=revision&diff=819735&oldid=390304 [23:13:12] Also I'm filling in for James to supervise his patches [23:13:16] Amir1, rock on. Catch you later! [23:13:33] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309087 (https://phabricator.wikimedia.org/T143345) (owner: 10Jdlrobson) [23:13:39] ack'ed [23:13:44] Thanks! have fun halfak [23:14:00] (03Merged) 10jenkins-bot: Correct dblist definition [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309087 (https://phabricator.wikimedia.org/T143345) (owner: 10Jdlrobson) [23:14:16] ebernhardson: okay so zuul merged your patches [23:14:48] jdlrobson: dblist update live on mw1099 [23:15:51] Dereckson: testing [23:16:06] https://gerrit.wikimedia.org/r/#/c/309156/ still waiting zuul [23:16:23] https://gerrit.wikimedia.org/r/#/c/309180/ still waiting zuul too [23:17:19] looks good to me Dereckson [23:17:55] jdlrobson: ack'ed [23:18:02] 309156 and 309180 merged [23:18:16] logs are good too jdlrobson [23:20:48] ebernhardson: RoanKattouw: your changes are live on mw1099 [23:21:14] I'm here filling for Roan [23:21:52] Dereckson: looks good from here [23:22:55] Dereckson: Are you sure it's live in mw1099? [23:23:05] Dereckson: is it live everywhere now? [23:23:25] jdlrobson: syncing [23:23:41] great. just checking. 
Had so many issues with this keen to see it all resolved :) [23:23:43] !log dereckson@tin Synchronized dblists/: Correct dblist definition (T143345, 1/2) (duration: 00m 49s) [23:23:46] T143345: Deploy Wikidata descriptions to mobile web stable channel Wikipedias 2nd half - https://phabricator.wikimedia.org/T143345 [23:23:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:24:06] https://www.wikidata.org/wiki/Special:Version in mw1099 doesn't say it's there [23:24:22] Amir1: what change? [23:24:30] The RoanKattouw's change [23:24:34] The ORES change WFM [23:24:48] !log dereckson@tin Synchronized wmf-config/CommonSettings.php: Correct dblist definition (T143345, 2/2) (duration: 00m 47s) [23:24:48] Amir1: gerrit id please [23:24:49] T143345: Deploy Wikidata descriptions to mobile web stable channel Wikipedias 2nd half - https://phabricator.wikimedia.org/T143345 [23:24:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:24:59] ah, 309174? [23:24:59] https://www.wikidata.org/w/index.php?title=Special:Contributions&limit=200&contribs=user&target=Stryn&namespace=2&tagfilter=&year=2016&month=-1&debug= is empty in prod and nonempty on mw1099 [23:25:37] https://gerrit.wikimedia.org/r/#/c/309174/ [23:25:50] Dereckson: Did you sync the VE patches to mw1099 too, or only the ORES patch? 
[23:26:02] RoanKattouw: It's empty for me in mw1099 [23:26:09] maybe my extension is broken [23:26:39] Amir1: so, I checked on mw1099, yes change is deployed [23:26:41] $conds[] = '(oresm_is_current != 0 OR oresm_is_current IS NULL)'; [23:26:46] for line 140 of /srv/mediawiki/php-1.28.0-wmf.18/extensions/ORES [23:26:58] okay [23:27:02] (append /includes/Hooks.php) [23:27:16] RoanKattouw: yes VE patches are too on mw1099 [23:28:19] er wait, nope I'm not sure [23:29:57] I'm testing in firefox [23:31:14] (03PS1) 10Alex Monk: check_ssl: Use a maximum percentage of certificate validity time for determining alert state [puppet] - 10https://gerrit.wikimedia.org/r/309203 (https://phabricator.wikimedia.org/T144293) [23:31:15] RoanKattouw: VE patches are now live on mw1099 [23:31:35] okay, I confirm it was something else [23:31:40] it's okay in mw1099 [23:31:41] Strangely, git submodule from /srv/mediawiki-staging/php-1.28.0-wmf.18 didn't updated the repo [23:32:15] I git fetch, git rebase in /srv/mediawiki-staging/php-1.28.0-wmf.18/extension/VisualEditor, then updated submodule, and at this time, we have on tin a correct state [23:32:36] Dereckson: synced yet? [23:32:45] jdlrobson: synced [23:32:58] jdlrobson: 23:24:48 < logmsgbot> !log dereckson@tin Synchronized wmf-config/CommonSettings.php: Correct dblist definition (T143345, 2/2) (duration: 00m 47s) [23:32:59] T143345: Deploy Wikidata descriptions to mobile web stable channel Wikipedias 2nd half - https://phabricator.wikimedia.org/T143345 [23:33:34] Amir1: "it's okay in mw1099" > VE or ORES? 
[23:33:41] ORES [23:33:44] ack'ed [23:33:44] thanks Dereckson looks good [23:34:14] thanks for checking [23:34:48] Dereckson: VE looks good [23:35:49] Amir1: syncing ORES to prod [23:36:03] !log dereckson@tin Synchronized php-1.28.0-wmf.18/extensions/ORES/includes/Hooks.php: Get results when the score is not stored too (T144999) (duration: 00m 46s) [23:36:04] Amir1: here you are ^ [23:36:05] T144999: User contribs seems to be empty when ores enabled - https://phabricator.wikimedia.org/T144999 [23:36:07] Now, VE. [23:36:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:36:16] Awesome [23:36:18] thanks [23:37:36] have fun [23:37:37] (03PS9) 10Yuvipanda: labs: Introduce an ENC for labs [puppet] - 10https://gerrit.wikimedia.org/r/309071 [23:37:38] thanks RoanKattouw [23:37:43] (03CR) 10Yuvipanda: [C: 032 V: 032] labs: Introduce an ENC for labs [puppet] - 10https://gerrit.wikimedia.org/r/309071 (owner: 10Yuvipanda) [23:38:09] !log dereckson@tin Synchronized php-1.28.0-wmf.17/extensions/VisualEditor/lib/ve: Fix bad serialization of DOM elements in cloneElement (through [[Gerrit:309155]]) (duration: 00m 47s) [23:38:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:38:19] Dereckson: mine get synced out to rest of cluster yet? 
looks fine on mw1099
[23:38:35] ebernhardson: not yet synced, I'll do it after ve
[23:38:46] kk
[23:39:00] !log dereckson@tin Synchronized php-1.28.0-wmf.18/extensions/VisualEditor/lib/ve: Fix bad serialization of DOM elements in cloneElement (through [[Gerrit:309156]]) (duration: 00m 47s)
[23:39:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:39:14] We need a small ephemeral workboard to show state of patches "in mw1099" "tested looks good" "synced to prod"
[23:41:21] !log dereckson@tin Synchronized php-1.28.0-wmf.18/extensions/VisualEditor/modules/ve-mw/ui/pages/ve.ui.MWParameterPage.js: Fix parent constructor call ([[Gerrit:309180]]) (duration: 00m 46s)
[23:41:24] RoanKattouw: VE synced to prod
[23:41:29] Yay
[23:41:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:41:32] (James_F )
[23:41:39] Ta.
[23:41:48] You're welcome.
[23:41:50] now WikimediaEvents
[23:42:05] (CR) Alex Monk: [C: -1] "not working yet" [puppet] - https://gerrit.wikimedia.org/r/309203 (https://phabricator.wikimedia.org/T144293) (owner: Alex Monk)
[23:43:28] !log dereckson@tin Synchronized php-1.28.0-wmf.18/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: Revert "Turn on CirrusSearch bm25 A/B test" (T143588) (duration: 00m 46s)
[23:43:29] T143588: Turn off BM25 AB test - https://phabricator.wikimedia.org/T143588
[23:43:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:44:02] (PS2) Alex Monk: check_ssl: Use a maximum percentage of certificate validity time for determining alert state [puppet] - https://gerrit.wikimedia.org/r/309203 (https://phabricator.wikimedia.org/T144293)
[23:45:02] !log dereckson@tin Synchronized php-1.28.0-wmf.17/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: Revert "Turn on CirrusSearch bm25 A/B test" (T143588) (duration: 00m 46s)
[23:45:03] T143588: Turn off BM25 AB test - https://phabricator.wikimedia.org/T143588
[23:45:06] ebernhardson: sync'ed
[23:45:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:45:12] Dereckson: looking
[23:48:53] Next: 309182 translate on fr.wikt
[23:48:59] SQL tables are already added
[23:49:35] (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/309182 (https://phabricator.wikimedia.org/T138972) (owner: Dereckson)
[23:50:25] (PS2) Dereckson: Enable Translate on fr.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/309182 (https://phabricator.wikimedia.org/T138972)
[23:50:36] (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/309182 (https://phabricator.wikimedia.org/T138972) (owner: Dereckson)
[23:51:10] PROBLEM - puppet last run on achernar is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/grain-ensure]
[23:51:13] (Merged) jenkins-bot: Enable Translate on fr.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/309182 (https://phabricator.wikimedia.org/T138972) (owner: Dereckson)
[23:51:15] Dereckson: all looks good
[23:51:23] ebernhardson: good :)
[23:51:31] 309182 live on mw1099
[23:52:00] (PS1) Yuvipanda: labs: Use the puppet enc on labtest [puppet] - https://gerrit.wikimedia.org/r/309208
[23:52:19] (PS2) Yuvipanda: labs: Use the puppet enc on labtest [puppet] - https://gerrit.wikimedia.org/r/309208
[23:52:30] (CR) Yuvipanda: [C: 2 V: 2] labs: Use the puppet enc on labtest [puppet] - https://gerrit.wikimedia.org/r/309208 (owner: Yuvipanda)
[23:52:32] looks good
[23:55:02] Syncing.
[23:55:38] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Enable Translate on fr.wiktionary (T138972) (duration: 00m 47s)
[23:55:39] T138972: Install Translate Extension in the French Wiktionary - https://phabricator.wikimedia.org/T138972
[23:55:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:58:47] any ops to +2 a patch? perhaps gehel ?
[23:59:12] * yurik suspects gehel is asleep :(