[00:00:04] twentyafterfour: Respected human, time to deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160714T0000). Please do the needful.
[00:00:48] jouncebot: doing the needful
[00:01:34] !log preparing to take Phabricator offline momentarily for scheduled maintenance / upgrade. Service should be restored within a couple of minutes.
[00:01:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:01:55] Operations, Discovery, Labs, Labs-Infrastructure, and 2 others: Update coastline data in OSM postgres db (osmdb.eqiad.wmnet) - https://phabricator.wikimedia.org/T140296#2460301 (MaxSem)
[00:04:39] (PS1) Dzahn: etherpad: move role to module, rename to ::server [puppet] - https://gerrit.wikimedia.org/r/298909
[00:06:50] !log Phabricator maintenance completed. Service restored
[00:06:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:11:43] (PS1) Dzahn: pmacct: move role to module, rename to ::netflow [puppet] - https://gerrit.wikimedia.org/r/298911
[00:14:55] (CR) Faidon Liambotis: [C: -1] "Why the rename?" [puppet] - https://gerrit.wikimedia.org/r/298911 (owner: Dzahn)
[00:16:55] (CR) Dzahn: "renaming to a "role foo::bar" format fixes all lint warnings about the correct autoloader layout" [puppet] - https://gerrit.wikimedia.org/r/298911 (owner: Dzahn)
[00:18:55] Epics just become much more epic: https://phabricator.wikimedia.org/T94620
[00:19:39] such graph
[00:20:06] these are awesome
[00:20:12] yay graphs
[00:20:46] cue complains about the amount of space they take up in 3, 2, 1, ...
[00:21:54] hmm
[00:22:00] bugzilla had those on a different page
[00:22:07] but I'm so glad those are back :D
[00:23:43] twentyafterfour, lolololol https://phabricator.wikimedia.org/T2001
[00:24:15] !log Started backfillUnreadWikis --rebuild and backfillReadBundles for all group 0 and group 1 wikis earlier
[00:24:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:24:39] MaxSem: ouch
[00:25:35] MaxSem that seems like a bug
[00:27:10] Bug 1 looks amazing
[00:27:12] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:27:16] morebots: wow
[00:27:16] I am a logbot running on tools-exec-1211.
[00:27:16] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[00:27:16] To log a message, type !log .
[00:28:37] MaxSem: it's like guitar hero and nyancat in one
[00:29:02] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy
[00:29:06] MaxSem twentyafterfour https://phabricator.wikimedia.org/T140326
[00:30:57] I gave it as an example upstream -- https://secure.phabricator.com/T4788#185751
[00:31:16] Operations, Parsoid: Delete Parsoid deb 0.4.0 package from releases wikimedia.org - https://phabricator.wikimedia.org/T140279#2460349 (Dzahn) a:Dzahn>None
[00:31:34] But honestly if anyone goes to T2001 they get what they deserve
[00:31:34] T2001: Documentation is out of date, incomplete (tracking) - https://phabricator.wikimedia.org/T2001
[00:35:34] !log Started backfillReadBundles on labswiki
[00:35:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:36:41] !log Ran backfillReadBundles on labtestwiki
[00:36:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:48:56] twentyafterfour bd808 as spotted by quiddity it happends here too https://phabricator.wikimedia.org/T126641
[00:49:21] and causing long page load even sometimes hitting the script limit causing http errors.
[00:50:08] I get a script timeout, if I try to load T126641. (I thought I'd broken everything, when I hit save on my comment, and that guitarhero madness appeared! :)
[00:50:09] T126641: [RFC] Devise plan for a cross-wiki watchlist back-end - https://phabricator.wikimedia.org/T126641
[00:50:53] I am trying to figure out how to disable it when some sort of sane size limit is exceeded.
[00:51:36] Really though it illustrates nicely just how insane some of our tracking tasks are
[00:51:52] Yep
[00:52:17] twentyafterfour we could disable the color's that are drawn if a size is reached
[00:52:49] and add scroll bars on the tasks so theres side scrolls and bottom to prevent the screen from being too big reducing a big chunk of load
[00:52:56] please
[00:53:42] paladox: it's not really that simple to do that
[00:53:51] Oh
[00:54:35] twentyafterfour what about if so many tasks are in task graph then it disables past the fixed amout
[00:55:41] paladox: I'm just trying to find an appropriate place to test the size of the graph. It's all very abstract
[00:55:52] Oh
[00:56:24] twentyafterfour you can test on the phabricator instances>
[00:56:26] ?
[00:56:36] im going to quickly update it now phab-01
[01:00:03] twentyafterfour all updated now. phab-01.wmflabs.org should now have task graphs
[01:33:23] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:35:03] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[02:03:34] Operations, Fundraising Tech Backlog: Add granularity limiter (g=) to wikimedia.org DKIM record(s) - https://phabricator.wikimedia.org/T140316#2460001 (dpatrick) FWIW, I'm fine with waiting until after T135410.
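The size cap discussed above was later applied as the 04:22 hotfix ("disable task graph on tasks with > 100 related tasks"). Below is a minimal Python sketch of that idea, under stated assumptions: the related tasks are found with a breadth-first walk over parent/subtask edges, and the walk stops as soon as the cap is passed, so a huge tracking task is never fully expanded. The function names, the adjacency-dict representation and the early bail-out are all hypothetical. This is not Phabricator's actual patch, which is PHP and is not shown in this log.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical cap mirroring the hotfix note "> 100 related tasks".
MAX_RELATED_TASKS = 100


def collect_related(root: str, edges: Dict[str, List[str]],
                    limit: int = MAX_RELATED_TASKS) -> Optional[Set[str]]:
    """Breadth-first walk over parent/subtask edges starting at `root`.

    Returns the related tasks (root excluded), or None as soon as more than
    `limit` have been found, so a huge tracking task is never fully expanded.
    """
    seen: Set[str] = {root}
    queue = deque([root])
    while queue:
        task = queue.popleft()
        for neighbour in edges.get(task, []):
            if neighbour in seen:
                continue  # task graphs can contain cycles; visit each task once
            seen.add(neighbour)
            if len(seen) - 1 > limit:
                return None  # over the cap: stop walking immediately
            queue.append(neighbour)
    seen.discard(root)
    return seen


def render_graph(root: str, edges: Dict[str, List[str]],
                 limit: int = MAX_RELATED_TASKS) -> str:
    """Describe what would be drawn: either a graph or a 'disabled' notice."""
    related = collect_related(root, edges, limit)
    if related is None:
        return f"{root}: task graph disabled (more than {limit} related tasks)"
    return f"{root}: graph with {len(related)} related tasks"


if __name__ == "__main__":
    small = {"T1": ["T2", "T3"], "T2": ["T4"]}
    print(render_graph("T1", small))  # T1: graph with 3 related tasks
    huge = {"TRACK": [f"sub{n}" for n in range(150)]}  # made-up tracking task
    print(render_graph("TRACK", huge))  # TRACK: task graph disabled (more than 100 related tasks)
```

Bailing out during the walk is what addresses the script timeouts reported on T126641. Checking the size only after the full graph had been built would still pay the cost of loading every related task.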
[02:37:00] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.8) (duration: 15m 17s)
[02:37:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:07:41] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.10) (duration: 15m 50s)
[03:07:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:14:57] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Jul 14 03:14:57 UTC 2016 (duration 7m 16s)
[03:15:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:18:46] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Tracking: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) (tracking) - https://phabricator.wikimedia.org/T10217#2460584 (Quiddity)
[03:25:31] PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: CRITICAL - failed 26 probes of 401 (alerts on 19) - https://atlas.ripe.net/measurements/1790945/#!map
[03:26:46] Operations, Wikimedia-Site-requests, I18n, Tracking: Wikis waiting to be renamed (tracking) - https://phabricator.wikimedia.org/T21986#2460613 (Quiddity)
[03:27:06] (PS2) KartikMistry: apertium-hbs-slv: New upstream, rebuild for Jessie and cleanup [debs/contenttranslation/apertium-hbs-slv] - https://gerrit.wikimedia.org/r/296203 (https://phabricator.wikimedia.org/T107306)
[03:29:48] (PS2) KartikMistry: apertium-hbs-eng: New upstream, rebuild for Jessie and cleanup [debs/contenttranslation/apertium-hbs-eng] - https://gerrit.wikimedia.org/r/296049 (https://phabricator.wikimedia.org/T107306)
[03:31:40] RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 8 probes of 401 (alerts on 19) - https://atlas.ripe.net/measurements/1790945/#!map
[03:42:38] Operations, Beta-Cluster-Infrastructure: [OPS] exim config points to mchenry.wmflabs.org - https://phabricator.wikimedia.org/T38996#2460667 (Quiddity)
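The two RIPE Atlas alerts above (26 failed probes of 401 is CRITICAL, 8 failed probes is OK, both with "alerts on 19") can be read as a simple count threshold. The Python sketch below shows that reading and reproduces both messages. It assumes the check goes CRITICAL once the failed-probe count reaches the "alerts on" value. The log does not show whether the real Icinga plugin compares with >= or >, and the function name is hypothetical.

```python
from typing import Tuple


def ripe_atlas_state(failed: int, total: int, alert_on: int) -> Tuple[str, str]:
    """Classify a ping measurement the way the Icinga messages above read.

    Assumption: CRITICAL once `failed` reaches `alert_on`; the real plugin's
    exact comparison (>= versus >) is not visible in the log.
    """
    if not 0 <= failed <= total:
        raise ValueError("failed probe count must be between 0 and total")
    state = "CRITICAL" if failed >= alert_on else "OK"
    return state, f"{state} - failed {failed} probes of {total} (alerts on {alert_on})"


if __name__ == "__main__":
    print(ripe_atlas_state(26, 401, 19))  # ('CRITICAL', 'CRITICAL - failed 26 probes of 401 (alerts on 19)')
    print(ripe_atlas_state(8, 401, 19))   # ('OK', 'OK - failed 8 probes of 401 (alerts on 19)')
```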
[03:48:21] PROBLEM - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 975.72 seconds
[03:48:38] Operations, Wikimedia-General-or-Unknown, Tracking: Upgrade/reinstall all servers to Ubuntu Precise Pangolin (12.04) (tracking) - https://phabricator.wikimedia.org/T38623#2460698 (Quiddity)
[04:09:41] RECOVERY - MariaDB Slave Lag: s7 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.64 seconds
[04:22:13] !log Phabricator hotfix: applied patch to disable task graph on tasks with > 100 related tasks.
[04:22:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:23:12] PROBLEM - puppet last run on mw2130 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:50:11] RECOVERY - puppet last run on mw2130 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:56:23] (CR) Papaul: [C: 1] ipmi: move role to module structure [puppet] - https://gerrit.wikimedia.org/r/298902 (owner: Dzahn)
[06:00:08] PROBLEM - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 625.72 seconds
[06:07:16] !log upgrading hhvm in codfw
[06:07:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:10:08] Operations, Beta-Cluster-Infrastructure, Labs, MediaWiki-General-or-Unknown: Create a poolcounter instance in deployment-prep - https://phabricator.wikimedia.org/T112501#2460920 (greg)
[06:18:24] PROBLEM - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 622.32 seconds
[06:29:55] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: puppet fail
[06:29:55] PROBLEM - puppet last run on db2058 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:10] (PS1) Greg Grossmeier: [Beta Cluster] Enable PoolCounter [mediawiki-config] - https://gerrit.wikimedia.org/r/298919 (https://phabricator.wikimedia.org/T38891)
[06:31:13] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:13] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: puppet fail
[06:31:25] PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:34] PROBLEM - puppet last run on neodymium is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:34] PROBLEM - puppet last run on mw2136 is CRITICAL: CRITICAL: puppet fail
[06:31:34] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:44] PROBLEM - puppet last run on mw2208 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:54] RECOVERY - MariaDB Slave Lag: s7 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.74 seconds
[06:32:24] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:23] PROBLEM - puppet last run on mw1289 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:39:13] RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[06:43:29] Operations, Beta-Cluster-Infrastructure: [OPS] exim config points to mchenry.wmflabs.org - https://phabricator.wikimedia.org/T38996#2460945 (hashar)
[06:45:26] !log restarted hhvm on mw1170
[06:45:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:45:33] RECOVERY - Apache HTTP on mw1170 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.828 second response time
[06:45:55] RECOVERY - HHVM rendering on mw1170 is OK: HTTP OK: HTTP/1.1 200 OK - 68759 bytes in 0.336 second response time
[06:56:04] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[06:56:33] RECOVERY - puppet last run on neodymium is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[06:56:43] RECOVERY - puppet last run on mw2208 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[06:56:54] RECOVERY - puppet last run on db2058 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:55] RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[06:57:14] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[06:58:04] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:33] RECOVERY - puppet last run on mw2136 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:33] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:14] RECOVERY - puppet last run on mw1289 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:11:06] (PS1) Ema: Attempt running puppet again in case of failure [puppet] - https://gerrit.wikimedia.org/r/298921
[07:27:33] Operations, Traffic, HTTPS, Patch-For-Review: Enforce HTTPS+HSTS on remaining one-off sites in wikimedia.org that don't use standard cache cluster termination - https://phabricator.wikimedia.org/T132521#2201391 (Peachey88) >>! In T132521#2454541, @demon wrote: > Did we ever make that an official...
[07:28:25] (CR) Giuseppe Lavagetto: puppetmaster: correct puppettization of the private repo (5 comments) [puppet] - https://gerrit.wikimedia.org/r/298258 (https://phabricator.wikimedia.org/T98173) (owner: Giuseppe Lavagetto)
[07:28:48] (PS4) Giuseppe Lavagetto: puppetmaster: correct puppetization of the private repo [puppet] - https://gerrit.wikimedia.org/r/298258 (https://phabricator.wikimedia.org/T98173)
[07:30:46] (CR) Mobrovac: [C: 1] puppet: add a function for performing conftool lookups [puppet] - https://gerrit.wikimedia.org/r/283151 (owner: Giuseppe Lavagetto)
[07:33:44] Operations, ops-eqiad, DC-Ops: dbstore1002.mgmt.eqiad.wmnet: "No more sessions are available for this type of connection!" - https://phabricator.wikimedia.org/T119488#2461075 (jcrespo) Ping for 19 July, @Cmjohnson.
[07:34:23] (PS3) Giuseppe Lavagetto: changeprop: add new precaching for ores new models [puppet] - https://gerrit.wikimedia.org/r/298707 (owner: Ladsgroup)
[07:36:12] PROBLEM - Apache HTTP on mw1275 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.017 second response time
[07:37:57] (CR) Giuseppe Lavagetto: [C: 2] changeprop: add new precaching for ores new models [puppet] - https://gerrit.wikimedia.org/r/298707 (owner: Ladsgroup)
[07:39:34] PROBLEM - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 862.29 seconds
[07:41:23] PROBLEM - Apache HTTP on mw1267 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.006 second response time
[07:42:34] PROBLEM - Apache HTTP on mw1278 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.012 second response time
[07:42:42] mmmmm
[07:46:35] mw1278 is an api server
[07:47:12] RECOVERY - Apache HTTP on mw1267 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.042 second response time
[07:47:21] Operations, ops-eqiad, Patch-For-Review: Decommission all old mediawiki appservers in eqiad - https://phabricator.wikimedia.org/T139353#2461108 (Joe) p:Triage>Low
[07:47:38] strangely enough, zhwiki seem to have gone
[07:47:44] *api issues
[07:49:36] there are some 503s with /w/api.php?centralauthtoken, perhaps unrelated?
[07:50:23] RECOVERY - Apache HTTP on mw1278 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.051 second response time
[07:50:28] ha
[07:50:35] I know what is that
[07:51:42] RECOVERY - Apache HTTP on mw1275 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.037 second response time
[07:51:54] https://phabricator.wikimedia.org/T139970
[07:53:09] ah nice!
[07:53:10] still 0 acknowledgements
[07:54:07] <_joe_> jynus: I guess you should notify that to tgr and anomie at least
[07:55:21] Operations, Ops-Access-Requests: Allow *-admin groups to see systemd logs for their units - https://phabricator.wikimedia.org/T137878#2461144 (mobrovac) Wouldn't it be simpler just to fwd the logs to syslog and allow admins to read `/var/log/syslog` (which they currently can't) ?
[07:55:37] !log removing api servers mw112[0-9] from service via conftool as first decom step (T139353)
[07:55:38] T139353: Decommission all old mediawiki appservers in eqiad - https://phabricator.wikimedia.org/T139353
[07:55:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:57:28] we have only mw1120.eqiad.wmnet in the range, silly me
[07:57:51] <_joe_> ?
[07:57:56] <_joe_> how's that?
[07:58:30] _joe_ I checked https://config-master.wikimedia.org/conftool/eqiad/api
[07:58:53] <_joe_> elukey: uhm
[07:59:14] I didn't check puppet though
[07:59:38] <_joe_> yeah weird
[07:59:38] seems consistent with conftool
[07:59:47] <_joe_> we must have decommed those a long time ago
[07:59:51] <_joe_> and I didn't remember
[07:59:53] <_joe_> meh
[08:00:15] so I guess I can proceed with 111[4-9] then
[08:00:27] <_joe_> the last ones remaining?
[08:00:31] <_joe_> because those are canaries
[08:00:41] yes last ones
[08:00:43] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: puppet fail
[08:00:45] <_joe_> and you have to remove them from other places
[08:00:49] <_joe_> as well
[08:00:51] ahh okok got it
[08:00:54] will look into puppet
[08:01:09] <_joe_> but yes for now just remove them from conftool
[08:01:49] !log removing api servers mw111[4-9] from service via conftool as first decom step (T139353)
[08:01:50] T139353: Decommission all old mediawiki appservers in eqiad - https://phabricator.wikimedia.org/T139353
[08:01:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:03:34] <_joe_> !log removing appservers mw1018-25 from service via conftool for decommissioning (T139353)
[08:03:34] T139353: Decommission all old mediawiki appservers in eqiad - https://phabricator.wikimedia.org/T139353
[08:03:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:05:14] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[08:05:24] PROBLEM - puppet last run on mw1288 is CRITICAL: CRITICAL: Puppet has 2 failures
[08:06:06] !log upgrading cache misc to varnishkafka 1.0.11-1
[08:06:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:06:25] PROBLEM - Apache HTTP on mw1281 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.017 second response time
[08:07:05] PROBLEM - Apache HTTP on mw1279 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.011 second response time
[08:07:08] (PS6) Giuseppe Lavagetto: puppet: add a function for performing conftool lookups [puppet] - https://gerrit.wikimedia.org/r/283151
[08:07:22] (CR) Giuseppe Lavagetto: [C: 2] puppet: add a function for performing conftool lookups [puppet] - https://gerrit.wikimedia.org/r/283151 (owner: Giuseppe Lavagetto)
[08:09:25] PROBLEM - Apache HTTP on mw1289 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.022 second response time
[08:12:15] RECOVERY - Apache HTTP on mw1281 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.045 second response time
[08:13:35] PROBLEM - Apache HTTP on mw1285 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.012 second response time
[08:15:35] RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.042 second response time
[08:15:35] PROBLEM - puppet last run on mw1280 is CRITICAL: CRITICAL: Puppet has 2 failures
[08:16:25] PROBLEM - Apache HTTP on mw1287 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.006 second response time
[08:16:31] Operations, ops-eqiad, DBA: dbstore1002 disk failure causing lag - https://phabricator.wikimedia.org/T140337#2461162 (jcrespo)
[08:17:36] PROBLEM - Apache HTTP on mw1280 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.004 second response time
[08:18:45] RECOVERY - Apache HTTP on mw1279 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.039 second response time
[08:18:51] !log running "megacli -PDOffline -PhysDrv '[32:6]' -aALL" on dbstore1002 to debug issue T140337
[08:18:52] T140337: dbstore1002 disk failure causing lag - https://phabricator.wikimedia.org/T140337
[08:18:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:19:25] PROBLEM - Apache HTTP on mw1288 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.007 second response time
[08:19:32] <_joe_> shit, again?
[08:19:36] now wait
[08:20:26] <_joe_> moritzm: is it you upgrading HHVM?
[08:21:13] I don't think so, he was upgrading codfw afaik
[08:22:21] and also there are some API servers this time
[08:22:30] <_joe_> those are all APIs
[08:22:55] PROBLEM - Apache HTTP on mw1284 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.010 second response time
[08:23:18] yea, in mw12[78*] were upgraded by me, there were a few still missing from yesterday's restart in eqiad
[08:24:20] no idea why those in particular were triggering alerts, I upgraded them in a similar manner to yesterday (one at a time with 30 seconds of delay between each server) and none of those caused an Icinga alert yesterday
[08:24:25] PROBLEM - MegaRAID on dbstore1002 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[08:25:34] RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.042 second response time
[08:25:59] Operations, ops-eqiad, DBA: dbstore1002 disk failure causing lag - https://phabricator.wikimedia.org/T140337#2461192 (jcrespo) It is not the disk, I am going to rebuild it into the RAID.
[08:27:04] RECOVERY - Apache HTTP on mw1289 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.402 second response time
[08:29:25] RECOVERY - Apache HTTP on mw1288 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.082 second response time
[08:31:04] RECOVERY - Apache HTTP on mw1284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.035 second response time
[08:31:04] RECOVERY - puppet last run on mw1288 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:34:55] RECOVERY - Apache HTTP on mw1287 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.129 second response time
[08:36:16] PROBLEM - Apache HTTP on mw1280 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.011 second response time
[08:38:16] RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.042 second response time
[08:39:30] !log restarted hhvm on mw1289 mw1280 mw1288 mw1284 mw1287
[08:39:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:40:23] !log deploying 0e9555f to scb nodes
[08:40:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:40:29] !log for ores
[08:40:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:41:05] RECOVERY - Disk space on ms-be3004 is OK: DISK OK
[08:42:06] RECOVERY - puppet last run on mw1280 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[08:45:54] okay, deployment is done
[08:47:03] _joe_: per our new deployment, you won't have memory issues anymore
[08:47:20] we let it be tested for 24 hours in labs
[08:47:53] (CR) Gehel: [C: -1] Move updater logs config to /etc/wdqs (2 comments) [puppet] - https://gerrit.wikimedia.org/r/298880 (https://phabricator.wikimedia.org/T139434) (owner: Smalyshev)
[08:48:27] <_joe_> Amir1: hopefully!
[08:49:04] I monitor scb memory for the next hour or so
[08:54:13] (CR) Mark Bergsma: [C: 1] Allow aklapper to delete files in Phabricator [puppet] - https://gerrit.wikimedia.org/r/298494 (owner: Aklapper)
[08:55:24] (Draft1) Addshore: WIP introduce wmde-analytics-admins group [puppet] - https://gerrit.wikimedia.org/r/298928
[09:00:54] Operations, media-storage: investigate swift used space spikes since June 2016 - https://phabricator.wikimedia.org/T140075#2461280 (fgiunchedi) Thanks @Fae for the through answer! >>! In T140075#2458566, @Fae wrote: > This is due to the NYPL uploads, which are coming to an end, though maybe with some fi...
[09:01:10] Operations, Ops-Access-Requests, Patch-For-Review, User-zeljkofilipin: MediaWiki deployment shell access request for zfilipin - https://phabricator.wikimedia.org/T140264#2461281 (zeljkofilipin)
[09:01:37] Operations, ops-eqiad, DBA: dbstore1002 disk failure causing lag - https://phabricator.wikimedia.org/T140337#2461285 (jcrespo) On rebuild I am getting more and more media/other/predictive errors, I think the drive should still be replaced, but @Cmjohnson has the last word on this. I will deal with th...
[09:01:55] Operations, ops-eqiad, DBA: dbstore1002 disk errors - https://phabricator.wikimedia.org/T140337#2461287 (jcrespo)
[09:03:29] (PS1) Elukey: Increase the Varnishkafka VSL API timeout to 700 seconds [puppet] - https://gerrit.wikimedia.org/r/298929 (https://phabricator.wikimedia.org/T136314)
[09:03:47] (CR) Zfilipin: [C: 1] "Greg gave +1 in phab" [puppet] - https://gerrit.wikimedia.org/r/298792 (https://phabricator.wikimedia.org/T140264) (owner: Thcipriani)
[09:04:24] Puppet, ORES, Revision-Scoring-As-A-Service, Easy: Puppet fails on new web node - https://phabricator.wikimedia.org/T140265#2461298 (Ladsgroup) Okay, There are two parts for this: # /srv/log directory is not there # the system tries to connect "tin.eqiad.wmnet" which is not possible. We foo...
[09:04:42] Operations: Upgrade phpredis client on zend - https://phabricator.wikimedia.org/T112694#2461300 (fgiunchedi) sigh, that's right @Legoktm ! It looks like {T98813} is stalled though, anyways I don't think and upgraded phpredis for wikitech would make a practical difference (?)
[09:05:11] (PS2) Elukey: Increase the Varnishkafka VSL API timeout to 700 seconds [puppet] - https://gerrit.wikimedia.org/r/298929 (https://phabricator.wikimedia.org/T136314)
[09:08:01] Operations, Ops-Access-Requests, WMDE-Analytics-Engineering: Requesting sudo access to analytics-wmde user on stat1002 for Addshore - https://phabricator.wikimedia.org/T140342#2461330 (Addshore)
[09:08:25] (CR) Elukey: "https://puppet-compiler.wmflabs.org/3332/" [puppet] - https://gerrit.wikimedia.org/r/298929 (https://phabricator.wikimedia.org/T136314) (owner: Elukey)
[09:09:18] (PS2) Addshore: WIP introduce wmde-analytics-admins group [puppet] - https://gerrit.wikimedia.org/r/298928 (https://phabricator.wikimedia.org/T140342)
[09:16:04] (CR) Legoktm: [C: -1] "The whole override stanza should be removed so we inherit production's value" [mediawiki-config] - https://gerrit.wikimedia.org/r/298919 (https://phabricator.wikimedia.org/T38891) (owner: Greg Grossmeier)
[09:16:07] (CR) Elukey: "I am also seeing tons of:" [puppet] - https://gerrit.wikimedia.org/r/298779 (https://phabricator.wikimedia.org/T132324) (owner: Elukey)
[09:19:42] Operations: Upgrade phpredis client on zend - https://phabricator.wikimedia.org/T112694#2461363 (Legoktm) I'm not sure...that's an @aaron question.
[09:25:46] (PS2) Giuseppe Lavagetto: scap: use conftool data to populate dsh groups [puppet] - https://gerrit.wikimedia.org/r/283201 (https://phabricator.wikimedia.org/T132529)
[09:27:03] (CR) jenkins-bot: [V: -1] scap: use conftool data to populate dsh groups [puppet] - https://gerrit.wikimedia.org/r/283201 (https://phabricator.wikimedia.org/T132529) (owner: Giuseppe Lavagetto)
[09:27:29] (PS2) Elukey: Remove cronspam coming from Gerrit log deletion [puppet] - https://gerrit.wikimedia.org/r/298779 (https://phabricator.wikimedia.org/T132324)
[09:29:20] (PS1) KartikMistry: Beta: Fix restbase_url for ContentTranslation [mediawiki-config] - https://gerrit.wikimedia.org/r/298930 (https://phabricator.wikimedia.org/T129284)
[09:29:33] (CR) Elukey: "All right the issue mentioned above has been fixed. I am going to watch the next logs to root@ to figure out what should need to be fixed." [puppet] - https://gerrit.wikimedia.org/r/298779 (https://phabricator.wikimedia.org/T132324) (owner: Elukey)
[09:30:45] (PS2) KartikMistry: Beta: Fix ContentTranslationRESTBase for Content Translation [mediawiki-config] - https://gerrit.wikimedia.org/r/298930 (https://phabricator.wikimedia.org/T129284)
[09:32:26] (PS1) Addshore: Add more to stats:wmde config [puppet] - https://gerrit.wikimedia.org/r/298931
[09:32:32] (PS1) Filippo Giunchedi: prometheus: run server with default group [puppet] - https://gerrit.wikimedia.org/r/298932
[09:34:51] godog: anything against me merging https://gerrit.wikimedia.org/r/#/c/298631/ ?
[09:35:00] (PS2) Elukey: Move node-specific versions to a cluster-wide setting [puppet] - https://gerrit.wikimedia.org/r/298631 (https://phabricator.wikimedia.org/T139639) (owner: Eevans)
[09:35:14] Operations, ops-eqiad, DBA: dbstore1002 disk errors - https://phabricator.wikimedia.org/T140337#2461377 (jcrespo) I think I killed the drive for good: ``` Rebuild Progress on Device at Enclosure 32, Slot 6 Completed 0% in 38 Minutes. Media Error Count: 777 Other Error Count: 2313 Predictive Failure...
[09:35:58] (PS3) Giuseppe Lavagetto: scap: use conftool data to populate dsh groups [puppet] - https://gerrit.wikimedia.org/r/283201 (https://phabricator.wikimedia.org/T132529)
[09:36:23] this one will remove a lot of boring code reviews! ---^
[09:36:45] <_joe_> it's not enough imo
[09:36:50] <_joe_> let me continue
[09:37:06] (CR) Filippo Giunchedi: [C: 1] Move node-specific versions to a cluster-wide setting [puppet] - https://gerrit.wikimedia.org/r/298631 (https://phabricator.wikimedia.org/T139639) (owner: Eevans)
[09:37:11] elukey: nope, go for it!
[09:37:45] (PS11) Addshore: Deploy RevisionSlider to test and test2 [mediawiki-config] - https://gerrit.wikimedia.org/r/296753 (https://phabricator.wikimedia.org/T138943)
[09:37:57] PROBLEM - puppet last run on eeden is CRITICAL: CRITICAL: puppet fail
[09:37:58] (CR) Elukey: [C: 2] Move node-specific versions to a cluster-wide setting [puppet] - https://gerrit.wikimedia.org/r/298631 (https://phabricator.wikimedia.org/T139639) (owner: Eevans)
[09:38:23] (CR) Addshore: "PS11 is a rebase" [mediawiki-config] - https://gerrit.wikimedia.org/r/296753 (https://phabricator.wikimedia.org/T138943) (owner: Addshore)
[09:39:11] mobrovac: --^
[09:39:43] elukey: grazie!
[09:39:52] (PS12) Addshore: Deploy RevisionSlider to test and test2 [mediawiki-config] - https://gerrit.wikimedia.org/r/296753 (https://phabricator.wikimedia.org/T138943)
[09:40:48] (PS1) Addshore: DNM WIP Enable RevisonSlider on dewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/298933 (https://phabricator.wikimedia.org/T140232)
[09:41:02] (CR) Addshore: [C: -1] DNM WIP Enable RevisonSlider on dewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/298933 (https://phabricator.wikimedia.org/T140232) (owner: Addshore)
[09:43:39] (CR) Mobrovac: [C: -1] "Needs the port too, see in-lined comment." (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/298930 (https://phabricator.wikimedia.org/T129284) (owner: KartikMistry)
[09:45:28] Operations, Commons, media-storage, User-notice: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2461393 (MoritzMuehlenhoff) Status update: This does not appear to be related to the upgrade of librsvg, but to some other d...
[09:48:28] RECOVERY - MariaDB Slave Lag: s7 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 187.10 seconds
[09:48:50] it worked
[09:53:18] (PS2) Filippo Giunchedi: prometheus: run server with default group [puppet] - https://gerrit.wikimedia.org/r/298932
[09:53:25] (CR) Filippo Giunchedi: [C: 2 V: 2] prometheus: run server with default group [puppet] - https://gerrit.wikimedia.org/r/298932 (owner: Filippo Giunchedi)
[09:58:07] Operations, User-zeljkofilipin, WMF-NDA: Add zeljkof to #mediawiki_security IRC channel - https://phabricator.wikimedia.org/T140225#2461443 (zeljkofilipin) Thanks!
[09:58:34] !log powercycle ms-be1012, adding back replaced disk
[09:58:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:01:37] RECOVERY - puppet last run on eeden is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[10:09:03] Operations, ops-eqiad: ms-be1012.eqiad.wmnet: slot=7 dev=sdh failed - https://phabricator.wikimedia.org/T140101#2461453 (fgiunchedi) Open>Resolved a:fgiunchedi thanks! yeah the disk was marked as foreign config, cleared it and readded, now rebuilding
[10:09:14] (PS1) Ladsgroup: ores: add File['/srv/log'] for web nodes in labs [puppet] - https://gerrit.wikimedia.org/r/298935 (https://phabricator.wikimedia.org/T140265)
[10:20:19] (PS1) Giuseppe Lavagetto: role::mediawiki::webserver: add conftool scripts [puppet] - https://gerrit.wikimedia.org/r/298939
[10:20:21] (PS1) Giuseppe Lavagetto: mediawiki::conftool: add mw-pool [puppet] - https://gerrit.wikimedia.org/r/298940
[10:23:20] (PS1) Yuvipanda: tools: Don't validate namespace creation [puppet] - https://gerrit.wikimedia.org/r/298941 (https://phabricator.wikimedia.org/T140303)
[10:23:44] (PS2) Yuvipanda: tools: Don't validate namespace creation [puppet] - https://gerrit.wikimedia.org/r/298941 (https://phabricator.wikimedia.org/T140303)
[10:26:41] (CR) Yuvipanda: [C: 2] tools: Don't validate namespace creation [puppet] - https://gerrit.wikimedia.org/r/298941 (https://phabricator.wikimedia.org/T140303) (owner: Yuvipanda)
[10:28:17] Hey Ops. I have a question about my yubikey (trying to make it work for accessing to prod). I always get this error:
[10:28:20] https://www.irccloud.com/pastebin/euPHNLtU/
[10:28:37] I can't add or remove it to the ssh-agent
[10:28:57] I'm following this: https://wikitech.wikimedia.org/wiki/Yubikey-SSH
[10:29:09] googled the error nothing useful came up
[10:32:08] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[10:33:19] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[10:38:39] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:38:41] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:39:56] Amir1: fyi, your changeprop config changes were deployed earlier today
[10:40:11] mobrovac: oh, I saw. Thanks
[10:40:11] (PS1) Yuvipanda: k8s: Disable service accounts [puppet] - https://gerrit.wikimedia.org/r/298946 (https://phabricator.wikimedia.org/T140347)
[10:40:39] mobrovac: we just deployed a change in ores that reduce pressure on other service greatly
[10:40:45] https://grafana.wikimedia.org/dashboard/db/ores?panelId=6&fullscreen
[10:40:46] (PS2) Yuvipanda: k8s: Disable service accounts [puppet] - https://gerrit.wikimedia.org/r/298946 (https://phabricator.wikimedia.org/T140347)
[10:40:53] this won't go down anymore
[10:47:24] (CR) Ema: [C: 1] Increase the Varnishkafka VSL API timeout to 700 seconds [puppet] - https://gerrit.wikimedia.org/r/298929 (https://phabricator.wikimedia.org/T136314) (owner: Elukey)
[10:47:37] (CR) Yuvipanda: [C: 2] k8s: Disable service accounts [puppet] - https://gerrit.wikimedia.org/r/298946 (https://phabricator.wikimedia.org/T140347) (owner: Yuvipanda)
[11:00:46] (Abandoned) Giuseppe Lavagetto: mediawiki: add conftool-specifc credentials and scripts [puppet] - https://gerrit.wikimedia.org/r/258979 (owner: Giuseppe Lavagetto)
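The Graphite alerts above report the share of recent datapoints that crossed a threshold. 11.11% is exactly 1/9, which is consistent with one datapoint out of a nine-point window exceeding 1000 req/min. A minimal sketch of that arithmetic, assuming the check simply counts non-empty datapoints above the threshold; the function names, the nine-point window and the `percentage` trigger are hypothetical, and reading 250 as the warning threshold is also an assumption, not the actual check_graphite code:

```python
def percent_above(datapoints, threshold):
    """Percentage of non-empty datapoints strictly above `threshold`."""
    # Graphite returns None for empty buckets; ignore those points.
    valid = [v for v in datapoints if v is not None]
    if not valid:
        return 0.0
    above = sum(1 for v in valid if v > threshold)
    return 100.0 * above / len(valid)


def check_status(datapoints, warning, critical, percentage):
    """Classify a series by how much of it breaches each threshold.

    `percentage` is a hypothetical trigger: the share of points (in %)
    that must exceed a threshold before that state is raised.
    """
    if percent_above(datapoints, critical) >= percentage:
        return "CRITICAL"
    if percent_above(datapoints, warning) >= percentage:
        return "WARNING"
    return "OK"


# Nine hypothetical samples of 5xx req/min, one of which exceeds 1000.
window = [310, 295, 1200, 280, 300, 290, 305, 298, 302]
print(f"{percent_above(window, 1000.0):.2f}%")  # 11.11%
```

Ignoring `None` buckets matters in practice: counting them in the denominator would dilute the percentage whenever metrics arrive late.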
[11:07:00] (PS2) Jcrespo: phab: notes on DB dependencies for rt & bz update jobs [puppet] - https://gerrit.wikimedia.org/r/298794 (owner: Rush)
[11:12:42] (CR) Jcrespo: [C: 2] phab: notes on DB dependencies for rt & bz update jobs [puppet] - https://gerrit.wikimedia.org/r/298794 (owner: Rush)
[11:17:21] Operations, ops-eqiad: rack/setup/install/deploy labsdb1009-labsdb1011 - https://phabricator.wikimedia.org/T136860#2461593 (jcrespo) Consider this resolved- I can take them from here. I need to get blocked by a meeting with @chasemp before implementing this.
[11:19:31] (PS3) Elukey: Increase the Varnishkafka VSL API timeout to 700 seconds [puppet] - https://gerrit.wikimedia.org/r/298929 (https://phabricator.wikimedia.org/T136314)
[11:19:44] Operations, ops-eqiad: rack/setup/install/deploy labsdb1009-labsdb1011 - https://phabricator.wikimedia.org/T136860#2461594 (jcrespo) I will ask you to move at some point in the future labsdb1008 back to the production network as db1095, but not anytime soon.
[11:21:59] (CR) Elukey: [C: 2] Increase the Varnishkafka VSL API timeout to 700 seconds [puppet] - https://gerrit.wikimedia.org/r/298929 (https://phabricator.wikimedia.org/T136314) (owner: Elukey)
[11:30:09] (PS3) KartikMistry: Beta: Fix restbase_url for ContentTranslation [mediawiki-config] - https://gerrit.wikimedia.org/r/298930 (https://phabricator.wikimedia.org/T129284)
[11:31:23] (CR) Jcrespo: "My only blocker for this is if this would cause a massive connection drop on all dbs, otherwise, it is good to go." [puppet] - https://gerrit.wikimedia.org/r/298033 (owner: Dzahn)
[11:32:40] (PS1) KartikMistry: Beta: Fix cxserver restbase_url [puppet] - https://gerrit.wikimedia.org/r/298947 (https://phabricator.wikimedia.org/T129284)
[11:35:01] (PS4) Jcrespo: typos file: add 'mariabd' and 'eqad' [puppet] - https://gerrit.wikimedia.org/r/298033 (owner: Dzahn)
[11:35:49] (CR) Mobrovac: [C: 1] Beta: Fix restbase_url for ContentTranslation [mediawiki-config] - https://gerrit.wikimedia.org/r/298930 (https://phabricator.wikimedia.org/T129284) (owner: KartikMistry)
[11:37:19] (PS4) Jcrespo: beta: send MariaDB errors to syslog [puppet] - https://gerrit.wikimedia.org/r/296713 (https://phabricator.wikimedia.org/T119370) (owner: Hashar)
[11:39:23] (CR) Jcrespo: [C: 2] beta: send MariaDB errors to syslog [puppet] - https://gerrit.wikimedia.org/r/296713 (https://phabricator.wikimedia.org/T119370) (owner: Hashar)
[11:50:39] (CR) Santhosh: [C: 1] Beta: Fix cxserver restbase_url [puppet] - https://gerrit.wikimedia.org/r/298947 (https://phabricator.wikimedia.org/T129284) (owner: KartikMistry)
[12:28:17] (CR) Mobrovac: [C: -1] "The current version is the correct one, this patch won't work." [puppet] - https://gerrit.wikimedia.org/r/298947 (https://phabricator.wikimedia.org/T129284) (owner: KartikMistry)
[12:47:01] (CR) Nikerabbit: Beta: Fix restbase_url for ContentTranslation (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/298930 (https://phabricator.wikimedia.org/T129284) (owner: KartikMistry)
[13:07:35] (PS5) Giuseppe Lavagetto: puppetmaster: correct puppetization of the private repo [puppet] - https://gerrit.wikimedia.org/r/298258 (https://phabricator.wikimedia.org/T98173)
[13:07:37] (PS1) Giuseppe Lavagetto: puppetmaster: puppetize private post-commit hook [puppet] - https://gerrit.wikimedia.org/r/298958 (https://phabricator.wikimedia.org/T98173)
[13:10:00] (CR) jenkins-bot: [V: -1] puppetmaster: puppetize private post-commit hook [puppet] - https://gerrit.wikimedia.org/r/298958 (https://phabricator.wikimedia.org/T98173) (owner: Giuseppe Lavagetto)
[13:12:40] Operations, Traffic, HTTPS, Patch-For-Review: Enforce HTTPS+HSTS on remaining one-off sites in wikimedia.org that don't use standard cache cluster termination - https://phabricator.wikimedia.org/T132521#2461831 (BBlack) We probably have to be a little bit careful about broad "https all the things...
[13:12:52] Operations, Commons, media-storage, User-notice: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2461832 (MoritzMuehlenhoff) Status update: The SVG is rendered fine with Debian unstable: https://people.wikimedia.org/~jmm/...
[13:13:45] (CR) Ottomata: Add more to stats:wmde config (1 comment) [puppet] - https://gerrit.wikimedia.org/r/298931 (owner: Addshore)
[13:14:22] Operations, DBA, Labs, Tool-Labs, Traffic: Antigng-bot improper non-api http requests - https://phabricator.wikimedia.org/T137707#2461834 (jcrespo) Open>Resolved a:jcrespo
[13:15:09] (PS2) Addshore: Add more to stats:wmde config [puppet] - https://gerrit.wikimedia.org/r/298931
[13:16:37] (CR) Ottomata: "To be consistent, can we call this analytics-wmde-users? The analytics-search-users group allows folks to sudo as the analyitcs-search us" [puppet] - https://gerrit.wikimedia.org/r/298928 (https://phabricator.wikimedia.org/T140342) (owner: Addshore)
[13:18:23] (CR) Ottomata: "You should add the user to admin::groups in hieradata/role/common/statistics/private.yaml" [puppet] - https://gerrit.wikimedia.org/r/298928 (https://phabricator.wikimedia.org/T140342) (owner: Addshore)
[13:19:06] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:19:24] (PS3) Addshore: Introduce wmde-analytics-admins group [puppet] - https://gerrit.wikimedia.org/r/298928 (https://phabricator.wikimedia.org/T140342)
[13:22:56] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[13:32:02] Operations, Analytics-Kanban, Traffic, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2461880 (elukey) We are now in pending verification of fix, let's see if oozie will not complain during the next couple of days. It...
[13:34:52] (PS2) Giuseppe Lavagetto: puppetmaster: puppetize private post-commit hook [puppet] - https://gerrit.wikimedia.org/r/298958 (https://phabricator.wikimedia.org/T98173)
[13:34:54] (PS6) Giuseppe Lavagetto: puppetmaster: correct puppetization of the private repo [puppet] - https://gerrit.wikimedia.org/r/298258 (https://phabricator.wikimedia.org/T98173)
[13:35:12] (CR) Ottomata: "Cool, now you need a phab ticket and ops approval :)" [puppet] - https://gerrit.wikimedia.org/r/298928 (https://phabricator.wikimedia.org/T140342) (owner: Addshore)
[13:36:59] (PS4) KartikMistry: Beta: Remove restbase_url for ContentTranslation [mediawiki-config] - https://gerrit.wikimedia.org/r/298930 (https://phabricator.wikimedia.org/T129284)
[13:37:02] (CR) jenkins-bot: [V: -1] puppetmaster: puppetize private post-commit hook [puppet] - https://gerrit.wikimedia.org/r/298958 (https://phabricator.wikimedia.org/T98173) (owner: Giuseppe Lavagetto)
[13:37:33] (CR) Giuseppe Lavagetto: [C: 2] puppetmaster: correct puppetization of the private repo [puppet] - https://gerrit.wikimedia.org/r/298258 (https://phabricator.wikimedia.org/T98173) (owner: Giuseppe Lavagetto)
[13:38:07] PROBLEM - Host payments1005 is DOWN: PING CRITICAL - Packet loss = 100%
[13:38:13] PROBLEM - Host payments1004 is DOWN: PING CRITICAL - Packet loss = 100%
[13:38:30] this time this is not codfw
[13:38:56] <_joe_> nope
[13:39:39] Jeff_Green: ^
[13:39:52] yep. forgot to set downtime, sorry
[13:40:41] PROBLEM - Router interfaces on pfw-eqiad is CRITICAL: CRITICAL: host 208.80.154.218, interfaces up: 108, down: 1, dormant: 0, excluded: 1, unused: 0; ge-11/0/1: down - payments4
[13:40:42] I also checked as a user and things seemed to work
[13:44:10] PROBLEM - puppet last run on rhodium is CRITICAL: CRITICAL: Puppet has 4 failures
[13:44:11] RECOVERY - Host payments1005 is UP: PING OK - Packet loss = 0%, RTA = 1.57 ms
[13:44:36] <_joe_> rhodium is me ofc
[13:44:40] (PS1) Gehel: WIP - postgresql::user should be idempotent [puppet] - https://gerrit.wikimedia.org/r/298960 (https://phabricator.wikimedia.org/T138092)
[13:46:10] (PS1) Giuseppe Lavagetto: puppetmaster::gitprivate: correctly support bare repositories [puppet] - https://gerrit.wikimedia.org/r/298961
[13:47:48] (CR) Giuseppe Lavagetto: [C: 2 V: 2] puppetmaster::gitprivate: correctly support bare repositories [puppet] - https://gerrit.wikimedia.org/r/298961 (owner: Giuseppe Lavagetto)
[13:50:00] RECOVERY - puppet last run on rhodium is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[13:50:20] PROBLEM - Host ganeti1004 is DOWN: PING CRITICAL - Packet loss = 100%
[13:50:40] RECOVERY - Host payments1004 is UP: PING OK - Packet loss = 16%, RTA = 2.10 ms
[13:51:22] ganeti1004 looks real
[13:52:15] ah no, perhaps it is T138414, cmjohnson1 ^ ?
[13:52:15] T138414: eqiad: Install SSD's into ganeti hosts - https://phabricator.wikimedia.org/T138414
[13:52:33] godog: it's me
[13:52:57] sorry, i didn't verify it was in maintenance
[13:53:05] (PS1) Giuseppe Lavagetto: puppetmaster: /srv/private is owned by gitpuppet on replicas [puppet] - https://gerrit.wikimedia.org/r/298962
[13:54:25] cmjohnson1: np, I've downtimed it now
[13:55:00] (CR) Giuseppe Lavagetto: [C: 2] puppetmaster: /srv/private is owned by gitpuppet on replicas [puppet] - https://gerrit.wikimedia.org/r/298962 (owner: Giuseppe Lavagetto)
[13:55:30] (CR) KartikMistry: "Production has restbase_url = https://@lang.wikipedia.org/api/rest_v1/page/html/@title and this seems working fine." [puppet] - https://gerrit.wikimedia.org/r/298947 (https://phabricator.wikimedia.org/T129284) (owner: KartikMistry)
[13:56:40] mobrovac: ^^
[13:57:26] aude: I'm ready when you are! :)_
[13:57:37] mobrovac: I'm wrong.
[13:57:42] * aude back in ~5 minutes
[14:00:04] aude: Dear anthropoid, the time has come. Please deploy RevisionSlider (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160714T1400).
[14:00:04] Addshore: A patch you scheduled for RevisionSlider is about to be deployed. Please be available during the process.
[14:00:51] Operations, ops-eqiad: eqiad: Install SSD's into ganeti hosts - https://phabricator.wikimedia.org/T138414#2461943 (Cmjohnson) a:Cmjohnson>akosiaris @akosiaris I added 2 800GB SSDs to ganeti1004. Assigning to you
[14:01:12] (CR) KartikMistry: "Sorry, Production has, $restbase_url = "http://restbase.svc.${::rb_site}.wmnet:7231/@lang.wikipedia.org/v1/page/html/@title", - which is s" [puppet] - https://gerrit.wikimedia.org/r/298947 (https://phabricator.wikimedia.org/T129284) (owner: KartikMistry)
[14:01:18] (PS1) Filippo Giunchedi: install_server: ms-be10[2-7] with jessie [puppet] - https://gerrit.wikimedia.org/r/298964
[14:01:30] (CR) Faidon Liambotis: "I don't understand. Why isn't modules/role/manifest/pmacct.pp correct in the autoloader layout for the class "role::pmacct"?" [puppet] - https://gerrit.wikimedia.org/r/298911 (owner: Dzahn)
[14:02:46] RECOVERY - Host ganeti1004 is UP: PING OK - Packet loss = 0%, RTA = 1.06 ms
[14:04:10] (CR) Filippo Giunchedi: [C: 2 V: 2] install_server: ms-be10[2-7] with jessie [puppet] - https://gerrit.wikimedia.org/r/298964 (owner: Filippo Giunchedi)
[14:04:54] Operations: eqiad: Install SSD's into ganeti hosts - https://phabricator.wikimedia.org/T138414#2461961 (Cmjohnson)
[14:06:45] Operations, ops-eqiad: db1054 degraded RAID (failed disk) - https://phabricator.wikimedia.org/T139026#2461965 (Cmjohnson) Open>Resolved Disk is online...resolving
[14:06:58] * aude back
[14:07:04] *waves*
[14:07:42] Hi aude. Have a nice deployment.
[14:07:55] (CR) Aude: [C: 2] Deploy RevisionSlider to test and test2 [mediawiki-config] - https://gerrit.wikimedia.org/r/296753 (https://phabricator.wikimedia.org/T138943) (owner: Addshore)
[14:07:58] :)
[14:09:45] since we are adding RevisionSlider to extension-list, i wonder if we need to add the extension to wmf.9 / wmf.8 also
[14:10:05] or else localisation update chokes?
[14:10:51] *reads https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Case_1c:_new_submodule_.28extension.2C_skin.2C_etc..29 *
[14:11:06] NOTE: When adding a new extension to one branch, you also need to add the extension to any other branches in use on the cluster (typically the wmf.{N-1} branch), even if the extension will not be enabled on any wikis running that branch. Otherwise the localization cache builder will fail.
[14:11:26] (PS1) Cmjohnson: Adding mgmt dns for mirror1001 (carbon mirror) [dns] - https://gerrit.wikimedia.org/r/298965
[14:12:18] aude: https://gerrit.wikimedia.org/r/#/c/297613/ ?
[14:12:19] addshore: thought so
[14:12:26] Apparently I already have a patch ready...
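For context on the autoloader question: Puppet derives a class's manifest file from its name. The first `::` segment names the module, any remaining segments become the path under that module's `manifests/` directory, and a bare module name maps to `init.pp`. So `role::pmacct` resolves to `modules/role/manifests/pmacct.pp`, and the proposed `role::pmacct::netflow` would resolve to `modules/role/manifests/pmacct/netflow.pp`. A small sketch of that mapping, assuming the `modules/` modulepath seen in the review comment; the function is illustrative and is not part of any Puppet tooling:

```python
def manifest_path(class_name, modulepath="modules"):
    """Return the manifest file Puppet's autoloader expects for a class.

    The first `::` segment is the module; any remaining segments become
    directories/file under that module's manifests/ directory, and a
    bare module name maps to init.pp.
    """
    segments = class_name.lstrip(":").split("::")
    module, rest = segments[0], segments[1:]
    if not rest:
        return f"{modulepath}/{module}/manifests/init.pp"
    return f"{modulepath}/{module}/manifests/{'/'.join(rest)}.pp"


for name in ("role::pmacct", "role::pmacct::netflow", "pmacct"):
    print(name, "->", manifest_path(name))
```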
[14:12:46] wmf8 is enabled also
[14:12:56] oooh, give me some ticks
[14:13:12] i think we are skipping wmf9, though it's still there on tin
[14:17:42] (PS1) Elukey: Add the deploy-analytics user/group to the refinery role [puppet] - https://gerrit.wikimedia.org/r/298967 (https://phabricator.wikimedia.org/T129151)
[14:18:42] (PS2) Elukey: Add the deploy-analytics user/group to the refinery role [puppet] - https://gerrit.wikimedia.org/r/298967 (https://phabricator.wikimedia.org/T129151)
[14:19:28] aude: afaik https://gerrit.wikimedia.org/r/#/c/298966/2 is correct for 8
[14:19:34] mutante: you merged https://gerrit.wikimedia.org/r/#/c/298688/ , but it's not live yet, is it?
[14:19:57] ok
[14:20:18] shame it wouldn't just let me cherry pick it accross in Gerrit!
[14:21:33] MatmaRex hi, i think he said he was done for that day of restarting gerrit and will instead do it today, you may want to remind him when he comes online
[14:22:00] paladox: ah. thanks
[14:22:32] Your welcome
[14:23:40] * aude waits for jenkins
[14:23:48] :)
[14:23:49] then will have to do scap when we are done
[14:23:55] for i18n stuff
[14:24:13] but will test things first on the test server
[14:24:33] which test server will you be using? :)
[14:25:02] https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug#Staging_changes
[14:25:05] it's new procedure
[14:25:14] yypu, but, 1017 or 1099?
[14:25:21] 1099
[14:25:32] awesome!
[14:25:34] 1017 might have other long running experiments
[14:25:35] (PS2) Cmjohnson: Adding mgmt and production dns including ipv6 for mirror1001 (carbon mirror) [dns] - https://gerrit.wikimedia.org/r/298965
[14:25:43] that we don't want to interfere with
[14:26:03] Operations, ops-codfw, DBA: db2034 degraded RAID - https://phabricator.wikimedia.org/T136583#2462072 (jcrespo) Open>Resolved A month without crashing, we will reopen if it happens again.
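The deployment note quoted at 14:11:06 amounts to an invariant: every branch in use on the cluster must contain every extension named in `wmf-config/extension-list`, or the localisation cache build fails. A hypothetical pre-flight check for that invariant, assuming the per-branch extension sets have already been collected (how to gather them from the branch checkouts is deliberately left out, and the sample data is invented for illustration):

```python
def missing_extensions(extension_list, branches):
    """Map each branch to the extensions it lacks (complete branches omitted).

    extension_list: extension names listed in wmf-config/extension-list.
    branches: {branch name: set of extensions present in that branch}.
    """
    wanted = set(extension_list)
    report = {}
    for branch, present in branches.items():
        missing = sorted(wanted - set(present))
        if missing:
            report[branch] = missing
    return report


# Hypothetical state: RevisionSlider is in wmf.9 but not yet in wmf.8.
branches = {"wmf.8": {"Echo"}, "wmf.9": {"Echo", "RevisionSlider"}}
print(missing_extensions(["Echo", "RevisionSlider"], branches))
# {'wmf.8': ['RevisionSlider']}
```

Running such a check before syncing `extension-list` would surface the need for a backport patch to the older branch, which is what the conversation arrives at manually.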
[14:26:33] (PS7) Mobrovac: Parsoid: Move to service::node [puppet] - https://gerrit.wikimedia.org/r/298436 (https://phabricator.wikimedia.org/T90668)
[14:28:18] PROBLEM - puppet last run on analytics1040 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:28:28] (CR) Aude: [C: 2] Deploy RevisionSlider to test and test2 [mediawiki-config] - https://gerrit.wikimedia.org/r/296753 (https://phabricator.wikimedia.org/T138943) (owner: Addshore)
[14:28:40] (CR) Cmjohnson: [C: 2] Adding mgmt and production dns including ipv6 for mirror1001 (carbon mirror) [dns] - https://gerrit.wikimedia.org/r/298965 (owner: Cmjohnson)
[14:29:54] (PS2) KartikMistry: Beta: Fix cxserver restbase_url [puppet] - https://gerrit.wikimedia.org/r/298947 (https://phabricator.wikimedia.org/T129284)
[14:29:57] (PS1) BBlack: cache_upload: 1d FE TTL cap [puppet] - https://gerrit.wikimedia.org/r/298968 (https://phabricator.wikimedia.org/T124954)
[14:29:59] (PS1) BBlack: cache_misc: 1d FE TTL cap like the other clusters [puppet] - https://gerrit.wikimedia.org/r/298969
[14:30:01] (PS1) BBlack: cache_misc: raise default_ttl to 1h [puppet] - https://gerrit.wikimedia.org/r/298970 (https://phabricator.wikimedia.org/T124954)
[14:30:03] (PS1) BBlack: caches: double the min_free_kbytes cap [puppet] - https://gerrit.wikimedia.org/r/298971 (https://phabricator.wikimedia.org/T135384)
[14:30:03] jenkins is so slow...
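The cache patches above introduce two knobs: a 1-day cap on frontend (FE) object TTLs and, for cache_misc, a `default_ttl` of 1 hour. In Varnish, `default_ttl` is the TTL given to objects when neither the backend response nor VCL assigns one. The sketch below is a rough model of how the two settings combine. It is not the actual VCL, and the function and constant names are hypothetical:

```python
ONE_HOUR = 3600
ONE_DAY = 86400


def frontend_ttl(backend_ttl, default_ttl=ONE_HOUR, cap=ONE_DAY):
    """Illustrative effective TTL (seconds) for an object in a frontend cache.

    backend_ttl is None when the response supplies no lifetime, in which
    case default_ttl is used; whichever value applies is clipped to cap.
    """
    ttl = default_ttl if backend_ttl is None else backend_ttl
    return min(ttl, cap)


print(frontend_ttl(7 * ONE_DAY))  # 86400: a week-long TTL is capped at 1d
print(frontend_ttl(None))         # 3600: no lifetime given, default_ttl applies
print(frontend_ttl(600))          # 600: short TTLs pass through unchanged
```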
[14:30:05] (PS1) BBlack: cache_text: raise FE mem size to 50% [puppet] - https://gerrit.wikimedia.org/r/298972 (https://phabricator.wikimedia.org/T135384)
[14:30:07] (PS1) BBlack: cache_upload: raise FE mem size to 50% [puppet] - https://gerrit.wikimedia.org/r/298973 (https://phabricator.wikimedia.org/T135384)
[14:30:13] haha :D
[14:30:51] ok, there's wmf.8
[14:31:54] Operations, Discovery, Discovery-Search-Backlog, Elasticsearch, and 3 others: Decommission elastic1001-1016 - https://phabricator.wikimedia.org/T139758#2441721 (Cmjohnson) The dns entries were removed but not propagated. I pushed the changed today.
[14:33:10] Operations, ops-eqiad: rack/setup/install/deploy labsdb1009-labsdb1011 - https://phabricator.wikimedia.org/T136860#2462120 (Cmjohnson) Open>Resolved per jcrespo, resolving this task...please create a new task for db1095
[14:33:35] Operations, ops-eqiad, Patch-For-Review: Decommission all old mediawiki appservers in eqiad - https://phabricator.wikimedia.org/T139353#2462123 (elukey) Next step for mw1114-mw1148 is to wait until next week before proceeding any further (better safe than sorry).
[14:34:01] seems that zuul is not triggered for https://gerrit.wikimedia.org/r/#/c/296753/
[14:34:05] Operations, ops-eqiad: Rack/Setup Carbon/Apt Server Replacement - https://phabricator.wikimedia.org/T139171#2462133 (Cmjohnson)
[14:34:24] and there is no rebase button (i don't think it needs rebase)
[14:34:28] aude: let me check
[14:34:33] (PS13) Addshore: Deploy RevisionSlider to test and test2 [mediawiki-config] - https://gerrit.wikimedia.org/r/296753 (https://phabricator.wikimedia.org/T138943)
[14:34:44] ok
[14:34:45] thanks
[14:34:48] try that one aude (removed the Depends-On)
[14:34:52] ok
[14:35:01] Operations, ops-eqiad: Rack/Setup Carbon/Apt Server Replacement - https://phabricator.wikimedia.org/T139171#2421526 (Cmjohnson) Mgmt and production DNS completed. I added to public vlan and assigned both ipv4 and ipv6.
[14:35:20] yeah the depends-on would prevent it from even entering the queue I guess
[14:35:27] jynus: hello ping me when you are ready
[14:35:33] (CR) Aude: [C: 2] Deploy RevisionSlider to test and test2 [mediawiki-config] - https://gerrit.wikimedia.org/r/296753 (https://phabricator.wikimedia.org/T138943) (owner: Addshore)
[14:35:38] (CR) Mobrovac: "PS7 PCC - https://puppet-compiler.wmflabs.org/3334/" [puppet] - https://gerrit.wikimedia.org/r/298436 (https://phabricator.wikimedia.org/T90668) (owner: Mobrovac)
[14:35:50] look like it is in the queue now
[14:35:55] seems to work now
[14:36:16] (CR) BBlack: [C: 2] cache_upload: 1d FE TTL cap [puppet] - https://gerrit.wikimedia.org/r/298968 (https://phabricator.wikimedia.org/T124954) (owner: BBlack)
[14:36:30] (CR) BBlack: [C: 2] cache_misc: 1d FE TTL cap like the other clusters [puppet] - https://gerrit.wikimedia.org/r/298969 (owner: BBlack)
[14:36:36] papaul, I will bring down es2011-es2013
[14:36:42] (CR) BBlack: [C: 2] cache_misc: raise default_ttl to 1h [puppet] - https://gerrit.wikimedia.org/r/298970 (https://phabricator.wikimedia.org/T124954) (owner: BBlack)
[14:36:46] (Merged) jenkins-bot: Deploy RevisionSlider to test and test2 [mediawiki-config] - https://gerrit.wikimedia.org/r/296753 (https://phabricator.wikimedia.org/T138943) (owner: Addshore)
[14:36:52] (CR) BBlack: [C: 2] caches: double the min_free_kbytes cap [puppet] - https://gerrit.wikimedia.org/r/298971 (https://phabricator.wikimedia.org/T135384) (owner: BBlack)
[14:36:57] It's in! :)
[14:37:02] \o/
[14:37:29] jynus: ok
[14:38:11] (CR) KartikMistry: "We need to use Production restbase_url as Beta doesn't have real articles which we can use for testing purpose." [puppet] - https://gerrit.wikimedia.org/r/298947 (https://phabricator.wikimedia.org/T129284) (owner: KartikMistry)
[14:38:41] !log aude@tin Synchronized wmf-config/extension-list: Add RevisionSlider to extension-list (duration: 00m 42s)
[14:38:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:38:58] then i'll test the next part on 1099
[14:39:09] tell me when it is there and I'll test it too :)
[14:39:33] (PS1) Faidon Liambotis: admin: ensure => absent Brett Simmer's account [puppet] - https://gerrit.wikimedia.org/r/298975
[14:39:35] (PS1) Faidon Liambotis: admin: add an NDA audit helper script [puppet] - https://gerrit.wikimedia.org/r/298976
[14:39:58] (CR) Faidon Liambotis: [C: 2] admin: ensure => absent Brett Simmer's account [puppet] - https://gerrit.wikimedia.org/r/298975 (owner: Faidon Liambotis)
[14:40:49] ok
[14:41:40] (CR) jenkins-bot: [V: -1] admin: add an NDA audit helper script [puppet] - https://gerrit.wikimedia.org/r/298976 (owner: Faidon Liambotis)
[14:41:42] (PS3) Elukey: Add the deploy-analytics user/group to the refinery role [puppet] - https://gerrit.wikimedia.org/r/298967 (https://phabricator.wikimedia.org/T129151)
[14:41:58] addshore: it's there (won't have i18n stuff)
[14:42:02] how do i see it?
[14:42:09] (PS1) Filippo Giunchedi: add ms-be102[2-7] [puppet] - https://gerrit.wikimedia.org/r/298978 (https://phabricator.wikimedia.org/T136631)
[14:42:21] I see it on specil version :)
[14:42:29] no i18n, let me test it quickly!
[14:42:34] it's a beta feature?
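Both restbase_url values quoted in the cxserver review are templates with `@lang` and `@title` placeholders. One is the public `https://@lang.wikipedia.org/api/rest_v1/page/html/@title`, and the other is the internal `http://restbase.svc.${::rb_site}.wmnet:7231/@lang.wikipedia.org/v1/page/html/@title`. A sketch of how such a template expands, assuming plain string substitution. The title handling (spaces to underscores, then percent-encoding) is a guess for illustration, not cxserver's actual implementation, and the `eqiad` site used in the test is only an example value for `${::rb_site}`:

```python
from urllib.parse import quote


def expand_restbase_url(template, lang, title):
    """Fill the @lang and @title placeholders of a restbase_url template.

    Title handling (spaces to underscores, then percent-encoding) is an
    assumption for illustration, not cxserver's exact behaviour.
    """
    encoded_title = quote(title.replace(" ", "_"), safe="")
    return template.replace("@lang", lang).replace("@title", encoded_title)


public = "https://@lang.wikipedia.org/api/rest_v1/page/html/@title"
print(expand_restbase_url(public, "es", "Gato doméstico"))
# https://es.wikipedia.org/api/rest_v1/page/html/Gato_dom%C3%A9stico
```

Encoding with `safe=""` also escapes `/` and `@` inside titles. That keeps a title such as "AC/DC" from adding a path segment, and it keeps a title containing `@lang` from being rewritten by the first substitution.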
[14:43:04] https://test.wikipedia.org/w/index.php?diff=229663&oldid=229662
[14:43:06] \o/
[14:43:06] yup
[14:43:10] all looks good to me
[14:43:24] ok
[14:44:35] !log aude@tin Synchronized wmf-config/InitialiseSettings.php: Enable RevisionSlider on test wikis (duration: 00m 27s)
[14:44:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:44:53] (PS4) Elukey: Add the deploy-analytics user/group to the refinery role [puppet] - https://gerrit.wikimedia.org/r/298967 (https://phabricator.wikimedia.org/T129151)
[14:44:55] (PS2) Faidon Liambotis: admin: add an NDA audit helper script [puppet] - https://gerrit.wikimedia.org/r/298976
[14:45:15] !log aude@tin Synchronized wmf-config/CommonSettings.php: Enable RevisionSlider on test wikis (duration: 00m 28s)
[14:45:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:45:23] (PS2) Filippo Giunchedi: add ms-be102[2-7] [puppet] - https://gerrit.wikimedia.org/r/298978 (https://phabricator.wikimedia.org/T136631)
[14:45:30] (CR) Filippo Giunchedi: [C: 2 V: 2] add ms-be102[2-7] [puppet] - https://gerrit.wikimedia.org/r/298978 (https://phabricator.wikimedia.org/T136631) (owner: Filippo Giunchedi)
[14:45:35] (PS1) Paladox: Redirect www. to non www. gerrit domain [puppet] - https://gerrit.wikimedia.org/r/298979
[14:45:52] (PS2) Paladox: Redirect www. to non www. gerrit domain [puppet] - https://gerrit.wikimedia.org/r/298979
[14:46:01] (PS5) Elukey: Add the deploy-analytics user/group to the refinery role [puppet] - https://gerrit.wikimedia.org/r/298967 (https://phabricator.wikimedia.org/T129151)
[14:46:05] now scap for i18n
[14:46:14] yup! :)
[14:46:54] (PS3) Paladox: Redirect www. to non www. gerrit domain [puppet] - https://gerrit.wikimedia.org/r/298979
[14:47:03] (PS4) Paladox: Redirect www. to non www. gerrit domain [puppet] - https://gerrit.wikimedia.org/r/298979
[14:47:53] !log aude@tin Started scap: (no message)
[14:47:55] !log aude@tin scap aborted: (no message) (duration: 00m 02s)
[14:47:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:48:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:48:05] !log shutting down es2011, es2012, es2013 for hardware maintenance T139714
[14:48:06] T139714: BIOS upgrade on certain codfw machines - https://phabricator.wikimedia.org/T139714
[14:48:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:48:09] !log aude@tin Started scap: Update i18n for RevisionSlider
[14:48:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:48:22] papaul, all these 3 are yours
[14:48:25] * aude waits
[14:48:44] PROBLEM - puppet last run on mw2105 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:49:04] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:49:05] PROBLEM - puppet last run on mw2240 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:49:05] (CR) Elukey: [C: 2] Add the deploy-analytics user/group to the refinery role [puppet] - https://gerrit.wikimedia.org/r/298967 (https://phabricator.wikimedia.org/T129151) (owner: Elukey)
[14:49:05] PROBLEM - puppet last run on mw1300 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:49:15] PROBLEM - puppet last run on mw2114 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:49:34] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:49:34] PROBLEM - puppet last run on mw1142 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:49:52] jynus: thanks
[14:49:55] PROBLEM - puppet last run on mw2244 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:50:05] PROBLEM - puppet last run on mw1294 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:50:05] PROBLEM - puppet last run on mw1143 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:50:11] ugh
[14:50:13] :/
[14:50:15] that's me I think
[14:50:16] PROBLEM - puppet last run on mw2247 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:50:18] Jul 14 14:44:39 mw2105 puppet-agent[757]: (/Stage[main]/Admin/Admin::Groupmembers[deployment]/Exec[deployment_ensure_members]/returns) gpasswd: user 'bsimmers' does not exist
[14:50:18] I'll prepare es2014,15,16 next
[14:50:22] what the..
[14:50:34] PROBLEM - puppet last run on mw1298 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:50:34] PROBLEM - puppet last run on mw2083 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:50:35] PROBLEM - puppet last run on mw2087 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:50:45] PROBLEM - puppet last run on mw1241 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:50:45] PROBLEM - puppet last run on mw1285 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:50:45] PROBLEM - puppet last run on mw2113 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:50:46] PROBLEM - puppet last run on mw2117 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:50:47] (PS1) Cmjohnson: Adding dhcpd file for mirror1001 [puppet] - https://gerrit.wikimedia.org/r/298980
[14:50:55] PROBLEM - puppet last run on mw2176 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:50:55] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:50:55] PROBLEM - puppet last run on mw1173 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:51:06] PROBLEM - puppet last run on mw2096 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:51:06] PROBLEM - puppet last run on mw1131 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:51:06] PROBLEM - puppet last run on mw1267 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:51:15] PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:51:15] PROBLEM - puppet last run on mw2163 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:51:26] PROBLEM - puppet last run on mw1204 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:51:26] PROBLEM - puppet last run on mw1253 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:51:34] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:51:34] PROBLEM - puppet last run on mw2109 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:51:35] PROBLEM - puppet last run on mw2123 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:51:52] (PS1) Faidon Liambotis: admin: brown paper bag fix for bsimmers [puppet] - https://gerrit.wikimedia.org/r/298981
[14:51:54] PROBLEM - puppet last run on mw1194 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:51:54] PROBLEM - puppet last run on mw2082 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:51:55] PROBLEM - puppet last run on mw2212 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:51:55] can't possibly be related to scap
[14:51:55] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:04] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:05] aude, it is not
[14:52:05] PROBLEM - puppet last run on mw1208 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:14] PROBLEM - puppet last run on mw2079 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:14] PROBLEM - puppet last run on mw1211 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:15] PROBLEM - puppet last run on mw1155 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:15] PROBLEM - puppet last run on mw2131 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:15] PROBLEM - puppet last run on mw2233 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:16] (PS2) Faidon Liambotis: admin: brown paper bag fix for bsimmers [puppet] - https://gerrit.wikimedia.org/r/298981
[14:52:16] PROBLEM - puppet last run on mw2134 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:16] PROBLEM - puppet last run on mw2090 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:23] (CR) Faidon Liambotis: [C: 2 V: 2] admin: brown paper bag fix for bsimmers [puppet] - https://gerrit.wikimedia.org/r/298981 (owner: Faidon Liambotis)
[14:52:34] PROBLEM - puppet last run on mw2143 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:34] PROBLEM - puppet last run on mw2182 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:35] PROBLEM - puppet last run on mw1154 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:35] PROBLEM - puppet last run on mw1021 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:35] PROBLEM - puppet last run on mw1213 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:36] PROBLEM - puppet last run on mw2168 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:36] PROBLEM - puppet last run on mw2093 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:44] PROBLEM - puppet last run on bast3001 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:44] PROBLEM - puppet last run on mw1304 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:44] PROBLEM - puppet last run on mw2092 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:52:45] PROBLEM - puppet last run on mw2196 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:53:05] btw, i can do swat when scap is done
[14:53:49] (PS3) KartikMistry: Beta: Fix cxserver restbase_url [puppet] - https://gerrit.wikimedia.org/r/298947 (https://phabricator.wikimedia.org/T129284)
[14:54:58] *twiddels thumbs for scap*
[14:55:03] :)
[14:55:16] :D
[14:55:54] (CR) Gehel: [C: -1] "I tested this on deployment-elastic05. Saddly, it does not work... No idea why, it does look good!" [puppet] - https://gerrit.wikimedia.org/r/295129 (owner: Nicko)
[14:56:09] !log cache_misc: manually raised default_ttl to 3600 (to match https://gerrit.wikimedia.org/r/#/c/298970/ without restarts)
[14:56:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:57:22] i still see "Could not find local user data" entries in the exception log :(
[14:57:31] (CR) Chad: "There's not even dns for this. Why?" [puppet] - https://gerrit.wikimedia.org/r/298979 (owner: Paladox)
[14:58:16] (CR) Paladox: "Oh, I have to set the dns?" [puppet] - https://gerrit.wikimedia.org/r/298979 (owner: Paladox)
[14:59:57] Operations, ops-eqiad: Rack/Setup Carbon/Apt Server Replacement - https://phabricator.wikimedia.org/T139171#2462280 (Cmjohnson) Only thing missing at this point is preferred partitioning. Please let me know and I will update and install.
[15:00:04] anomie, ostriches, thcipriani, hashar, twentyafterfour, and aude: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160714T1500).
[15:00:04] kart_: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[15:00:08] (CR) Subramanya Sastry: "ruthenium changes lgtm." [puppet] - https://gerrit.wikimedia.org/r/298436 (https://phabricator.wikimedia.org/T90668) (owner: Mobrovac)
[15:00:20] (CR) Chad: "In the dns repository. But still, why? Nobody's ever used this and no docs mention it. It just adds extra complexity." [puppet] - https://gerrit.wikimedia.org/r/298979 (owner: Paladox)
[15:00:57] soon as scap is done, i can swat
[15:01:03] \o/
[15:01:17] (CR) Paladox: "Oh, well to make sure that we redirect www. to main domain since sometimes browsers add www. prefix to things." [puppet] - https://gerrit.wikimedia.org/r/298979 (owner: Paladox)
[15:01:25] Who is SWAT'ng?
[15:01:30] kart_: i can
[15:01:37] aude: thanks!
[15:01:37] we have to wait for scap to finish though
[15:01:45] hopefully not too much longer
[15:01:48] aude: ok. Ping me when done.
[15:01:51] ok
[15:02:11] aude: we need to test Compact Language Links patch in testhosts first.
[15:02:16] yeah [15:02:17] aude: just note :) [15:02:52] RECOVERY - puppet last run on mw2105 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:07:47] (03CR) 10Chad: "Get a better browser then :P" [puppet] - 10https://gerrit.wikimedia.org/r/298979 (owner: 10Paladox) [15:07:54] jynus: es2011-es2013 complete [15:08:10] (03CR) 10Paladox: "I use ios 10." [puppet] - 10https://gerrit.wikimedia.org/r/298979 (owner: 10Paladox) [15:08:26] (03PS1) 10Cmjohnson: Updting dns entries for payments1001-1004 mgmt [dns] - 10https://gerrit.wikimedia.org/r/298984 [15:08:49] !log shutting down es2014, es2015, es2016 for hardware maintenance T139714 [15:08:50] T139714: BIOS upgrade on certain codfw machines - https://phabricator.wikimedia.org/T139714 [15:08:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:08:54] (03PS2) 10Cmjohnson: Adding dhcpd file for mirror1001 [puppet] - 10https://gerrit.wikimedia.org/r/298980 [15:08:57] papaul, ^ [15:09:09] PROBLEM - MariaDB Slave IO: es2 on es1015 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2003, Errmsg: error reconnecting to master repl@es2015.codfw.wmnet:3306 - retry-time: 60 retries: 86400 message: Cant connect to MySQL server on es2015.codfw.wmnet (111 Connection refused) [15:09:40] no problem, no user impact, I forgot about the returning replication [15:09:47] <_joe_> es1015 slave of the codfw one? 
[15:09:51] yes [15:10:00] replication topology is symetical [15:10:09] but right now it does not replicate anything [15:10:21] 06Operations, 10Traffic: Lower geodns TTLs from 600 to 300 - https://phabricator.wikimedia.org/T140365#2462333 (10BBlack) [15:10:21] (03PS4) 10KartikMistry: Beta: Fix cxserver restbase_url [puppet] - 10https://gerrit.wikimedia.org/r/298947 (https://phabricator.wikimedia.org/T129284) [15:10:54] 06Operations, 10Traffic, 10netops: Set up LVS for current AuthDNS - https://phabricator.wikimedia.org/T101525#1341583 (10BBlack) [15:10:56] 06Operations, 10Traffic: Lower geodns TTLs from 600 to 300 - https://phabricator.wikimedia.org/T140365#2462348 (10BBlack) [15:13:42] RECOVERY - puppet last run on mw2240 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [15:14:42] RECOVERY - puppet last run on mw1294 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [15:15:01] RECOVERY - puppet last run on mw1204 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [15:15:01] RECOVERY - puppet last run on mw1142 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:15:21] RECOVERY - puppet last run on mw1143 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [15:15:21] RECOVERY - puppet last run on mw1300 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:15:23] RECOVERY - puppet last run on mw2109 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:15:23] RECOVERY - puppet last run on mw2247 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [15:15:32] RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [15:15:41] RECOVERY - puppet last run on mw1241 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [15:15:41] RECOVERY - puppet last run on 
mw1285 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [15:16:11] RECOVERY - puppet last run on mw2083 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [15:16:12] RECOVERY - puppet last run on mw2082 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:16:12] RECOVERY - puppet last run on mw2114 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:16:21] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [15:16:23] RECOVERY - puppet last run on mw2244 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:16:23] RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:16:31] RECOVERY - puppet last run on mw2184 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [15:16:32] RECOVERY - puppet last run on mw2163 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [15:16:42] RECOVERY - puppet last run on mw1021 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [15:16:42] RECOVERY - puppet last run on mw1173 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:16:42] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [15:16:42] RECOVERY - puppet last run on mw1298 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [15:16:42] RECOVERY - puppet last run on mw2087 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:16:51] RECOVERY - puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:16:51] RECOVERY - puppet last run on mw1267 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures 
[15:16:52] RECOVERY - puppet last run on bast1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:16:53] RECOVERY - puppet last run on mw1253 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [15:17:02] RECOVERY - puppet last run on mw2113 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [15:17:02] RECOVERY - puppet last run on mw2117 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:17:02] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:17:22] RECOVERY - puppet last run on mw2070 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [15:17:23] RECOVERY - puppet last run on mw2123 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [15:17:32] RECOVERY - puppet last run on mw1131 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:17:41] RECOVERY - puppet last run on mw2096 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [15:17:41] RECOVERY - puppet last run on mw2127 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [15:17:42] RECOVERY - puppet last run on mw2067 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [15:17:43] RECOVERY - puppet last run on bast3001 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [15:17:43] RECOVERY - puppet last run on mw2143 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [15:17:43] RECOVERY - puppet last run on mw2249 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [15:17:52] (PS3) Andrew Bogott: Desigate policy: Allow projectadmins to manipulate domains [puppet] - https://gerrit.wikimedia.org/r/298280 [15:17:55] (PS1) Andrew Bogott: Lower default quotas for new
Labs projects [puppet] - https://gerrit.wikimedia.org/r/298985 (https://phabricator.wikimedia.org/T140158) [15:17:58] (CR) Reedy: "Last time I checked, iOS 10 isn't a browser." [puppet] - https://gerrit.wikimedia.org/r/298979 (owner: Paladox) [15:18:02] RECOVERY - puppet last run on mw1194 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [15:18:02] RECOVERY - puppet last run on mw1154 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [15:18:02] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [15:18:03] RECOVERY - puppet last run on mw2093 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [15:18:03] RECOVERY - puppet last run on mw2196 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:18:04] RECOVERY - puppet last run on mw2212 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:18:11] RECOVERY - puppet last run on mw2079 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:18:13] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:18:22] RECOVERY - puppet last run on mw1137 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [15:18:23] RECOVERY - puppet last run on mw2233 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:18:23] RECOVERY - puppet last run on mw2182 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [15:18:26] (CR) Paladox: "But safari is."
[puppet] - https://gerrit.wikimedia.org/r/298979 (owner: Paladox) [15:18:32] RECOVERY - puppet last run on mw2176 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:18:43] RECOVERY - puppet last run on mw2131 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [15:18:51] RECOVERY - puppet last run on mw1279 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [15:18:53] RECOVERY - puppet last run on mw2084 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [15:18:53] RECOVERY - puppet last run on mw1206 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [15:19:01] RECOVERY - puppet last run on mw2223 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [15:19:11] RECOVERY - puppet last run on mw2142 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:12] RECOVERY - puppet last run on mw1213 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:12] RECOVERY - puppet last run on mw1155 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:12] RECOVERY - puppet last run on mw2092 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:12] RECOVERY - puppet last run on mw2134 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:13] RECOVERY - puppet last run on mw2090 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [15:19:22] RECOVERY - puppet last run on mw2110 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:22] RECOVERY - puppet last run on snapshot1002 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [15:19:31] RECOVERY - puppet last run on mw1136 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [15:19:31] PROBLEM
- puppet last run on cp3010 is CRITICAL: CRITICAL: Puppet has 1 failures [15:19:32] RECOVERY - puppet last run on mw1208 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:46] (PS1) Giuseppe Lavagetto: puppetmaster::gitclone: actually clone the git repo if an origin is given [puppet] - https://gerrit.wikimedia.org/r/298986 [15:19:51] RECOVERY - puppet last run on mw1230 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:51] RECOVERY - puppet last run on mw1296 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [15:19:52] RECOVERY - puppet last run on mw1295 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [15:19:52] RECOVERY - puppet last run on mw1020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:02] RECOVERY - puppet last run on mw2203 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [15:20:02] RECOVERY - puppet last run on mw1304 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:12] RECOVERY - puppet last run on mw2098 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [15:20:12] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [15:20:12] RECOVERY - puppet last run on mw1179 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:22] RECOVERY - puppet last run on mw1180 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [15:20:22] RECOVERY - puppet last run on mw2101 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [15:20:22] RECOVERY - puppet last run on mw2130 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:23] RECOVERY - puppet last run on mw2242 is OK: OK: Puppet is currently enabled, last run 45 seconds ago
with 0 failures [15:20:31] RECOVERY - puppet last run on mw2085 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:32] RECOVERY - puppet last run on mw1274 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:41] RECOVERY - puppet last run on mw2174 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [15:20:42] RECOVERY - puppet last run on mira is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [15:20:42] RECOVERY - puppet last run on mw1278 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:42] RECOVERY - puppet last run on mw1275 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [15:20:42] RECOVERY - puppet last run on mw2062 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:43] RECOVERY - puppet last run on mw2111 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [15:20:50] (CR) Cmjohnson: [C: 2] Updting dns entries for payments1001-1004 mgmt [dns] - https://gerrit.wikimedia.org/r/298984 (owner: Cmjohnson) [15:20:52] RECOVERY - puppet last run on mw2150 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [15:20:53] RECOVERY - puppet last run on mw2172 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:20:58] Operations, Analytics-Kanban, Traffic, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2462364 (elukey) Seems working! After 11 UTC no more empty dt fields. ``` ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatal...
[15:21:01] RECOVERY - puppet last run on mw2094 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [15:21:07] (CR) jenkins-bot: [V: -1] puppetmaster::gitclone: actually clone the git repo if an origin is given [puppet] - https://gerrit.wikimedia.org/r/298986 (owner: Giuseppe Lavagetto) [15:21:09] (CR) Cmjohnson: [C: 2] Adding dhcpd file for mirror1001 [puppet] - https://gerrit.wikimedia.org/r/298980 (owner: Cmjohnson) [15:21:11] RECOVERY - puppet last run on mw1169 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:21:12] RECOVERY - puppet last run on mw2200 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:21:13] RECOVERY - puppet last run on snapshot1005 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [15:21:31] RECOVERY - puppet last run on wasat is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:21:32] RECOVERY - puppet last run on mw2107 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [15:21:32] RECOVERY - puppet last run on mw2168 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:21:32] RECOVERY - puppet last run on mw1132 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [15:21:52] RECOVERY - puppet last run on mw2152 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:21:52] RECOVERY - puppet last run on mw2237 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:21:53] RECOVERY - puppet last run on mw1233 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [15:22:01] RECOVERY - puppet last run on mw1245 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:22:11] RECOVERY - puppet last run on mw1116 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[15:22:12] RECOVERY - puppet last run on mw1178 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [15:22:12] RECOVERY - puppet last run on mw1181 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:22:12] RECOVERY - puppet last run on mw2106 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [15:22:12] RECOVERY - puppet last run on mw1240 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [15:22:12] RECOVERY - puppet last run on mw2202 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [15:22:21] RECOVERY - puppet last run on mw1138 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:22:22] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:22:22] RECOVERY - puppet last run on mw1297 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:22:41] RECOVERY - puppet last run on mw2133 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:22:42] RECOVERY - puppet last run on mw1134 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [15:22:42] RECOVERY - puppet last run on mw1198 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [15:22:42] RECOVERY - puppet last run on mw1256 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [15:22:43] RECOVERY - puppet last run on mw1270 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [15:22:43] RECOVERY - puppet last run on mw1165 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:23:13] RECOVERY - puppet last run on mw2231 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:23:21] RECOVERY - puppet last run on mw2167 is OK: OK: Puppet is currently enabled, last 
run 1 minute ago with 0 failures [15:23:22] RECOVERY - puppet last run on mw2193 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [15:23:41] RECOVERY - puppet last run on mw2219 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [15:23:42] RECOVERY - puppet last run on mw1022 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [15:23:51] RECOVERY - puppet last run on mw1200 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [15:23:52] RECOVERY - puppet last run on mw2186 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:23:52] RECOVERY - puppet last run on mw1262 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [15:24:02] RECOVERY - puppet last run on mw2209 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [15:24:02] RECOVERY - puppet last run on mw2068 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [15:24:04] aude: did scap finish? 
[15:24:13] RECOVERY - puppet last run on mw2248 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [15:24:22] RECOVERY - puppet last run on mw2091 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [15:24:41] RECOVERY - puppet last run on mw2144 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:24:42] RECOVERY - puppet last run on mw1174 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [15:24:42] RECOVERY - puppet last run on mw1219 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [15:24:43] RECOVERY - puppet last run on mw1145 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:24:51] RECOVERY - puppet last run on mw1141 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:25:36] aude: waiting :) [15:25:52] RECOVERY - puppet last run on mw2076 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:25:52] RECOVERY - puppet last run on mw2141 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:26:01] RECOVERY - puppet last run on mw1185 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:26:21] (03CR) 10Chad: [C: 04-1] "Regardless of the merits, there's a few inline things that need fixing." 
(032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/298979 (owner: 10Paladox) [15:26:32] RECOVERY - puppet last run on mw2069 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:26:47] jynus: es2014-es2016 complete [15:27:12] thank you, last batch is a bit more complex to shutdown (to avoid pages) [15:27:16] give 3 minutes [15:27:49] scap is almost done [15:27:49] 06Operations, 10ops-codfw, 10DBA: BIOS upgrade on certain codfw machines - https://phabricator.wikimedia.org/T139714#2462375 (10Papaul) [15:28:01] jynus: ok [15:28:33] (03PS2) 10Giuseppe Lavagetto: puppetmaster::gitclone: actually clone the git repo if an origin is given [puppet] - 10https://gerrit.wikimedia.org/r/298986 [15:29:00] super speedy scap! :D [15:29:42] heh [15:29:48] (03CR) 10jenkins-bot: [V: 04-1] puppetmaster::gitclone: actually clone the git repo if an origin is given [puppet] - 10https://gerrit.wikimedia.org/r/298986 (owner: 10Giuseppe Lavagetto) [15:30:17] (03Abandoned) 10Mholloway: Stop restricting carrier tagging to mobile subdomains [puppet] - 10https://gerrit.wikimedia.org/r/293887 (owner: 10Mholloway) [15:31:10] (03PS3) 10Giuseppe Lavagetto: puppetmaster::gitclone: actually clone the git repo if an origin is given [puppet] - 10https://gerrit.wikimedia.org/r/298986 [15:31:31] (03PS14) 10Paladox: Add missing roottree, file configs to gerrit.config.erb [puppet] - 10https://gerrit.wikimedia.org/r/298710 [15:31:32] addshore: well it takes a while to rsync 6 complete MW branches across 400+ servers ;) [15:31:49] with i18n stuff [15:32:03] bd808: indeed! I havn't ever looked into scap before and seen all of the things that it does (currently doing so now) [15:32:20] someday we'll have scap3 for MW and I think it will get a lot faster for the worst case (and a bit slower for the best case) [15:32:34] (03CR) 10Paladox: "Fixed syntax error." 
[puppet] - 10https://gerrit.wikimedia.org/r/298710 (owner: 10Paladox) [15:32:40] it does *all* the things :) [15:32:53] aude: yeh, that's what I have always had in my head :D [15:35:07] !log aude@tin Finished scap: Update i18n for RevisionSlider (duration: 46m 58s) [15:35:09] (03CR) 10Giuseppe Lavagetto: [C: 032] puppetmaster::gitclone: actually clone the git repo if an origin is given [puppet] - 10https://gerrit.wikimedia.org/r/298986 (owner: 10Giuseppe Lavagetto) [15:35:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:35:23] there ^ [15:35:32] aude: looks good to me! [15:35:50] i don't see i18n in the javascripts [15:36:06] mdholloway: re: the abandoned carrier tagging patch: if you look at the current tag_carrier (zero) VCL, we've now reduced it to the point where it's just copying existing headers around that already exist for all clusters/wikis, mobile or not. [15:36:06] (03PS5) 10Paladox: Redirect www. to non www. gerrit domain [puppet] - 10https://gerrit.wikimedia.org/r/298979 [15:36:10] oooh, what? *looks* [15:36:13] do you? [15:36:16] mdholloway: https://github.com/wikimedia/operations-puppet/blob/production/templates/varnish/zero.inc.vcl.erb [15:36:35] no, I don't either [15:36:41] i :( [15:36:44] but there is i18n on special:verison [15:37:00] could be these are not passed in the resource loader definitions [15:37:03] mdholloway: so this could also be approached from a different angle: get MFE/Zero to look at the global headers that are set everywhere rather than the X-CS/X-F-B headers set in zero.inc.vcl, and then deprecate remove tag_carrier's code completely afterwards. [15:37:04] PROBLEM - puppet last run on rhodium is CRITICAL: CRITICAL: puppet fail [15:37:07] yeh, all php based i18n is working, but JS is not [15:37:11] * aude needs to move on to swat, but we can investigate [15:37:21] kart_: ready in 2 minutes [15:37:23] aude: yeh, the JS i18n doesn't need to be sorted now [15:37:29] aude: here. 
[15:38:18] mdholloway: the only part that's not simple copying is the conditional override to X-CS:ON in cases other than if (req.url ~ "(action=zeroconfig|:ZeroRatedMobileAccess)($|&|\?)" || req.http.host ~ "^(zero|m)\.") { [15:38:18] (03CR) 10Nikerabbit: [C: 031] "Looks okay but I didn't test." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298930 (https://phabricator.wikimedia.org/T129284) (owner: 10KartikMistry) [15:38:37] (03PS1) 10Giuseppe Lavagetto: puppetmaster::gitprivate: remove unneeded dependency [puppet] - 10https://gerrit.wikimedia.org/r/298988 [15:38:41] mdholloway: but we could fix that another way, by changing the Vary output of those affected URL paths [15:39:44] (03CR) 10jenkins-bot: [V: 04-1] puppetmaster::gitprivate: remove unneeded dependency [puppet] - 10https://gerrit.wikimedia.org/r/298988 (owner: 10Giuseppe Lavagetto) [15:39:53] (03CR) 10Aude: [C: 032] Beta: Remove restbase_url for ContentTranslation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298930 (https://phabricator.wikimedia.org/T129284) (owner: 10KartikMistry) [15:40:06] (03CR) 10Aude: [C: 032] Deploy Compact Language Links as default (Stage 4.5) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298735 (https://phabricator.wikimedia.org/T136677) (owner: 10KartikMistry) [15:40:14] (03PS3) 10Aude: Deploy Compact Language Links as default (Stage 4.5) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298735 (https://phabricator.wikimedia.org/T136677) (owner: 10KartikMistry) [15:40:18] ah, they need to be rebased [15:40:26] (03PS5) 10Aude: Beta: Remove restbase_url for ContentTranslation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298930 (https://phabricator.wikimedia.org/T129284) (owner: 10KartikMistry) [15:40:29] (03PS6) 10Paladox: Redirect www. to non www. 
gerrit domain [puppet] - https://gerrit.wikimedia.org/r/298979
[15:40:45] !log shutting down es2018, pc2004, es2005 for hardware maintenance T139714
[15:40:46] T139714: BIOS upgrade on certain codfw machines - https://phabricator.wikimedia.org/T139714
[15:40:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:40:53] (CR) Paladox: Redirect www. to non www. gerrit domain (2 comments) [puppet] - https://gerrit.wikimedia.org/r/298979 (owner: Paladox)
[15:40:55] papaul, ^
[15:41:07] (PS2) Giuseppe Lavagetto: puppetmaster::gitprivate: remove unneeded dependency [puppet] - https://gerrit.wikimedia.org/r/298988
[15:41:11] (PS5) KartikMistry: Beta: Fix cxserver restbase_url [puppet] - https://gerrit.wikimedia.org/r/298947 (https://phabricator.wikimedia.org/T129284)
[15:42:12] (CR) Aude: [C: 2] Beta: Remove restbase_url for ContentTranslation [mediawiki-config] - https://gerrit.wikimedia.org/r/298930 (https://phabricator.wikimedia.org/T129284) (owner: KartikMistry)
[15:42:35] bblack: interesting, thanks -- i'll give that some thought and maybe kick it around with dr0ptp4kt next time we chat
[15:42:57] jynus: ok
[15:43:33] (Merged) jenkins-bot: Beta: Remove restbase_url for ContentTranslation [mediawiki-config] - https://gerrit.wikimedia.org/r/298930 (https://phabricator.wikimedia.org/T129284) (owner: KartikMistry)
[15:43:45] (CR) Giuseppe Lavagetto: [C: 2] puppetmaster::gitprivate: remove unneeded dependency [puppet] - https://gerrit.wikimedia.org/r/298988 (owner: Giuseppe Lavagetto)
[15:44:10] (PS4) Aude: Deploy Compact Language Links as default (Stage 4.5) [mediawiki-config] - https://gerrit.wikimedia.org/r/298735 (https://phabricator.wikimedia.org/T136677) (owner: KartikMistry)
[15:44:16] (CR) Aude: [C: 2] Deploy Compact Language Links as default (Stage 4.5) [mediawiki-config] - https://gerrit.wikimedia.org/r/298735 (https://phabricator.wikimedia.org/T136677) (owner: KartikMistry)
[15:45:22] aude: ping me for test host test.
[15:45:25] (Merged) jenkins-bot: Deploy Compact Language Links as default (Stage 4.5) [mediawiki-config] - https://gerrit.wikimedia.org/r/298735 (https://phabricator.wikimedia.org/T136677) (owner: KartikMistry)
[15:45:34] k
[15:46:17] kart_: please check
[15:46:57] aude: mw1017?
[15:47:04] mw1099
[15:47:27] we're supposed to use that for swat now
[15:47:41] OK. Checking.
[15:47:44] (PS1) Giuseppe Lavagetto: puppetmaster::gitclone: brown paper bag fix [puppet] - https://gerrit.wikimedia.org/r/298989
[15:48:37] (CR) Giuseppe Lavagetto: [C: 2 V: 2] puppetmaster::gitclone: brown paper bag fix [puppet] - https://gerrit.wikimedia.org/r/298989 (owner: Giuseppe Lavagetto)
[15:48:45] aude: all good. Go ahead!
[15:49:11] ok
[15:50:35] !log aude@tin Synchronized dblists/clldefault.dblist: Enable compact language lists on more wikis (duration: 00m 51s)
[15:50:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:50:54] RECOVERY - puppet last run on rhodium is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[15:51:09] kart_: ^
[15:51:39] aude: thanks. retesting.
[15:52:05] !log aude@tin Synchronized wmf-config/CommonSettings-labs.php: (no message) (duration: 00m 32s)
[15:52:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:54:14] PROBLEM - dhclient process on ganeti1004 is CRITICAL: Connection refused by host
[15:54:24] aude: looks good. Thanks.
[15:54:24] PROBLEM - Disk space on ganeti1004 is CRITICAL: Connection refused by host
[15:54:25] PROBLEM - NTP on ganeti1004 is CRITICAL: NTP CRITICAL: No response from NTP server
[15:54:25] PROBLEM - ganeti-confd running on ganeti1004 is CRITICAL: Connection refused by host
[15:54:45] PROBLEM - ganeti-mond running on ganeti1004 is CRITICAL: Connection refused by host
[15:55:12] kart_: great
[15:55:15] PROBLEM - ganeti-noded running on ganeti1004 is CRITICAL: Connection refused by host
[15:55:15] swat is done:)
[15:55:18] aude: https://gerrit.wikimedia.org/r/#/c/298930/ is pending, right?
[15:55:24] PROBLEM - MD RAID on ganeti1004 is CRITICAL: Connection refused by host
[15:55:24] PROBLEM - DPKG on ganeti1004 is CRITICAL: Connection refused by host
[15:55:24] PROBLEM - Check size of conntrack table on ganeti1004 is CRITICAL: Connection refused by host
[15:55:24] PROBLEM - puppet last run on ganeti1004 is CRITICAL: Connection refused by host
[15:55:26] aude: sync'ed it?
[15:55:34] yeah
[15:55:44] jynus: pc2004, pc2005, es2018 coomplete
[15:55:51] it will appear on beta automatically, but sync'ed it anyway
[15:55:53] thank you very much!!!!
[15:55:55] PROBLEM - configured eth on ganeti1004 is CRITICAL: Connection refused by host
[15:55:56] aude: OK. Can't really test it though :)
[15:55:59] Operations, ops-codfw, DBA: BIOS upgrade on certain codfw machines - https://phabricator.wikimedia.org/T139714#2462444 (Papaul)
[15:56:01] ok
[15:56:05] PROBLEM - salt-minion processes on ganeti1004 is CRITICAL: Connection refused by host
[15:56:18] i suppose check later that nothing is broken
[15:56:25] aude: okay
[15:56:52] jynus: yw
[15:57:15] let's hope no more crashes :-(
[16:00:05] godog, moritzm, and _joe_: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160714T1600).
[16:01:44] (PS1) Giuseppe Lavagetto: puppetmaster::gitclone: tighten directory permissions [puppet] - https://gerrit.wikimedia.org/r/298994
[16:04:04] (CR) Giuseppe Lavagetto: [C: 2] puppetmaster::gitclone: tighten directory permissions [puppet] - https://gerrit.wikimedia.org/r/298994 (owner: Giuseppe Lavagetto)
[16:05:55] Operations, Fundraising-Backlog, fundraising-tech-ops, Patch-For-Review: Allow Fundraising to A/B test wikipedia.org as send domain - https://phabricator.wikimedia.org/T135410#2462453 (Jgreen) Distilling the discussion to a proposed config, here's the DKIM record I think we would add to the wikip...
[16:09:53] PROBLEM - puppet last run on ms-be1027 is CRITICAL: CRITICAL: Puppet has 2 failures
[16:13:42] Operations, ops-eqiad: ms-be1021.eqiad.wmnet: slot=1I:1:2 dev=sdh failed - https://phabricator.wikimedia.org/T139767#2462473 (fgiunchedi)
[16:14:33] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: puppet fail
[16:15:23] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[16:15:44] ACKNOWLEDGEMENT - puppet last run on ms-be1027 is CRITICAL: CRITICAL: Puppet has 2 failures Filippo Giunchedi likely hw problems
[16:16:48] (CR) Hashar: [C: 1] "Aaron / Csteipp has been added back in a time when we ran tests on prod machines ( https://gerrit.wikimedia.org/r/#/c/120596/ ). That was" [puppet] - https://gerrit.wikimedia.org/r/298832 (owner: Chad)
[16:20:22] (PS1) Giuseppe Lavagetto: puppetmaster::gitclone: fix dependencies for git master [puppet] - https://gerrit.wikimedia.org/r/298999
[16:21:10] (PS2) Chad: WIP: Gerrit: Swap lead to point at production data [puppet] - https://gerrit.wikimedia.org/r/298673
[16:21:14] (PS2) Giuseppe Lavagetto: puppetmaster::gitclone: fix dependencies for git master [puppet] - https://gerrit.wikimedia.org/r/298999
[16:22:01] (PS1) Mobrovac: service::node: Output std out/err to a file [puppet] - https://gerrit.wikimedia.org/r/299000 (https://phabricator.wikimedia.org/T137878)
[16:22:24] PROBLEM - puppet last run on mw2101 is CRITICAL: CRITICAL: puppet fail
[16:22:50] <_joe_> dear god jenkins
[16:22:52] <_joe_> get a grip
[16:23:52] ?
[16:23:55] Operations, ops-eqiad: ms-be1021.eqiad.wmnet: slot=1I:1:2 dev=sdh failed - https://phabricator.wikimedia.org/T139767#2441989 (Cmjohnson) case opened ase ID: 5310374258 Case title: Failed Hard Drive Severity 2-Critical Degraded Customer tracking number: be1021 Product serial number: MXQ54101my P...
[16:24:05] <_joe_> it's being super-slow for small changes
[16:24:11] <_joe_> and then fast for large ones
[16:24:24] <_joe_> it's clearly in a bad mood
[16:24:28] gotta keep you guessing!
[16:25:31] (CR) jenkins-bot: [V: -1] service::node: Output std out/err to a file [puppet] - https://gerrit.wikimedia.org/r/299000 (https://phabricator.wikimedia.org/T137878) (owner: Mobrovac)
[16:25:33] (CR) Giuseppe Lavagetto: [C: 2] puppetmaster::gitclone: fix dependencies for git master [puppet] - https://gerrit.wikimedia.org/r/298999 (owner: Giuseppe Lavagetto)
[16:25:39] _joe_: the devil's in the detail, so jenkins pays special attention to small changesets
[16:27:17] Operations, ops-eqiad: Remove and destroy disks from old payments boxes decom server - https://phabricator.wikimedia.org/T140370#2462535 (Cmjohnson)
[16:27:32] Operations, ops-codfw, DBA: BIOS upgrade on certain codfw machines - https://phabricator.wikimedia.org/T139714#2462549 (jcrespo) I will check the logs and then close this if I do not see anything strange. We will reopen this or one of its child tasks if a crash happens.
[16:28:27] Operations, ops-eqiad, fundraising-tech-ops: Rack and setup payments1005-8 - https://phabricator.wikimedia.org/T136881#2462552 (Cmjohnson) Open>Resolved These servers are racked and setup...Jeff opted rename them the original payments1001-1004. Updated the server mgmt ip's and removed new na...
[16:28:28] (PS1) Filippo Giunchedi: admin: add test for absented users not in 'absented' group [puppet] - https://gerrit.wikimedia.org/r/299003
[16:29:14] (PS2) Mobrovac: service::node: Output std out/err to a file [puppet] - https://gerrit.wikimedia.org/r/299000 (https://phabricator.wikimedia.org/T137878)
[16:29:36] (CR) jenkins-bot: [V: -1] admin: add test for absented users not in 'absented' group [puppet] - https://gerrit.wikimedia.org/r/299003 (owner: Filippo Giunchedi)
[16:30:53] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[16:31:45] (PS4) Andrew Bogott: Desigate policy: Allow projectadmins to manipulate domains [puppet] - https://gerrit.wikimedia.org/r/298280
[16:32:20] (PS2) Filippo Giunchedi: admin: add test for absented users not in 'absented' group [puppet] - https://gerrit.wikimedia.org/r/299003
[16:32:36] Operations, ops-eqiad: Remove and destroy disks from old payments boxes decom server - https://phabricator.wikimedia.org/T140370#2462560 (Cmjohnson) p:Triage>Low
[16:35:07] (CR) Andrew Bogott: [C: 2] Desigate policy: Allow projectadmins to manipulate domains [puppet] - https://gerrit.wikimedia.org/r/298280 (owner: Andrew Bogott)
[16:37:44] (PS1) Giuseppe Lavagetto: puppetmaster::gitclone: fix ownership of post-receive hook [puppet] - https://gerrit.wikimedia.org/r/299005
[16:37:51] <_joe_> Desigate?
[16:37:58] (CR) Chad: "Compiles, makes the changes I wanted: https://puppet-compiler.wmflabs.org/3337/" [puppet] - https://gerrit.wikimedia.org/r/298673 (owner: Chad)
[16:38:45] (PS2) Giuseppe Lavagetto: puppetmaster::gitclone: fix ownership of post-receive hook [puppet] - https://gerrit.wikimedia.org/r/299005
[16:38:57] (CR) Giuseppe Lavagetto: [C: 2 V: 2] puppetmaster::gitclone: fix ownership of post-receive hook [puppet] - https://gerrit.wikimedia.org/r/299005 (owner: Giuseppe Lavagetto)
[16:39:04] (PS1) ArielGlenn: script to generate lists of db hosts by shard and/or dc [software] - https://gerrit.wikimedia.org/r/299006 (https://phabricator.wikimedia.org/T104459)
[16:40:11] (CR) Mobrovac: "PCC looking good - https://puppet-compiler.wmflabs.org/3338/" [puppet] - https://gerrit.wikimedia.org/r/299000 (https://phabricator.wikimedia.org/T137878) (owner: Mobrovac)
[16:40:53] PROBLEM - MD RAID on ms-be1027 is CRITICAL: CRITICAL: Active: 11, Working: 11, Failed: 1, Spare: 0
[16:41:43] PROBLEM - HP RAID on ms-be1027 is CRITICAL: CRITICAL: Slot 3: Failed: 2I:4:1, 2I:4:2 - OK: 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[16:45:08] Operations, ops-eqiad, DBA: dbstore1002 disk errors - https://phabricator.wikimedia.org/T140337#2462602 (Cmjohnson) I submitted a work order for a new disk Congratulations: Work Order SR932776781 was successfully submitted.
[16:45:49] Operations, ops-eqiad, hardware-requests: decommission WMF3155-WMF3175 (old lsearchd) - https://phabricator.wikimedia.org/T140372#2462603 (RobH)
[16:46:04] (PS1) Chad: WIP: Gerrit: Swap DNS to new host, lead [dns] - https://gerrit.wikimedia.org/r/299007
[16:48:23] RECOVERY - puppet last run on mw2101 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[16:49:29] Operations, hardware-requests, Continuous-Integration-Infrastructure (phase-out-gallium): Allocate contint1001 to releng and reimage it in labs-support-network - https://phabricator.wikimedia.org/T140257#2462643 (chasemp) @faidon and I chatted about this for a few minutes today. We talked about prim...
[16:50:50] (PS2) BBlack: Remove stream-lb.eqiad hostname [dns] - https://gerrit.wikimedia.org/r/298530 (https://phabricator.wikimedia.org/T134871)
[16:51:21] Operations, ops-eqiad, media-storage: diagnose failed disks on ms-be1027 - https://phabricator.wikimedia.org/T140374#2462651 (fgiunchedi)
[16:51:24] (PS2) Andrew Bogott: Lower default quotas for new Labs projects [puppet] - https://gerrit.wikimedia.org/r/298985 (https://phabricator.wikimedia.org/T140158)
[16:52:59] (CR) Rush: [C: 1] Lower default quotas for new Labs projects [puppet] - https://gerrit.wikimedia.org/r/298985 (https://phabricator.wikimedia.org/T140158) (owner: Andrew Bogott)
[16:53:25] (CR) Andrew Bogott: [C: 2] Lower default quotas for new Labs projects [puppet] - https://gerrit.wikimedia.org/r/298985 (https://phabricator.wikimedia.org/T140158) (owner: Andrew Bogott)
[16:58:30] (PS3) Rush: admin: add test for absented users not in 'absented' group [puppet] - https://gerrit.wikimedia.org/r/299003 (owner: Filippo Giunchedi)
[16:58:51] (CR) Mforns: [C: -1] "There are still a couple schemas that need its purging strategy defined. Please, wait until then to merge. ETA: couple days." [puppet] - https://gerrit.wikimedia.org/r/298721 (https://phabricator.wikimedia.org/T108850) (owner: Mforns)
[16:58:53] godog, chasemp: hey, thanks :)
[16:59:03] (CR) Rush: [C: 1] "this makes sense in that totally absented users should only be in the meta group to apply their absenting from their user stanza. thanks" [puppet] - https://gerrit.wikimedia.org/r/299003 (owner: Filippo Giunchedi)
[16:59:29] for writing a test case for the mess I created :)
[16:59:41] godog, chasemp: also while at it, you might want to review https://gerrit.wikimedia.org/r/#/c/298976/
[17:00:04] (PS3) Giuseppe Lavagetto: puppetmaster: puppetize private post-commit hook [puppet] - https://gerrit.wikimedia.org/r/298958 (https://phabricator.wikimedia.org/T98173)
[17:00:04] yurik, gwicke, cscott, arlolra, and subbu: Respected human, time to deploy Services – Graphoid / Parsoid / OCG / Citoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160714T1700). Please do the needful.
[17:00:15] no parsoid deploy
[17:09:30] (CR) Smalyshev: Move updater logs config to /etc/wdqs (2 comments) [puppet] - https://gerrit.wikimedia.org/r/298880 (https://phabricator.wikimedia.org/T139434) (owner: Smalyshev)
[17:09:41] (PS4) Smalyshev: Move updater logs config to /etc/wdqs [puppet] - https://gerrit.wikimedia.org/r/298880 (https://phabricator.wikimedia.org/T139434)
[17:10:33] paravoid: no worries! ok I've added myself to the review will likely take a closer look tomorrow
[17:13:19] http://www.commitstrip.com/en/2015/11/10/coder-epitaphs/
[17:21:43] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:21:43] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:23:12] Operations, hardware-requests, Continuous-Integration-Infrastructure (phase-out-gallium): Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2462877 (chasemp)
[17:23:42] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy
[17:23:42] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[17:24:52] mobrovac: ^ re: the flapping, not sure if you read the backscroll in -services but i did a deployment yesterday but bearND correctly pointed out that the siteinfo call we thought was the culprit hadn't actually been deployed yet
[17:26:06] (PS6) Faidon Liambotis: labstore: un-hieraize elevator/ioscheduler boot-setting [puppet] - https://gerrit.wikimedia.org/r/296731
[17:26:08] (PS6) Faidon Liambotis: cache: un-hieraize tcpmhash_entries boot setting [puppet] - https://gerrit.wikimedia.org/r/296730
[17:26:10] (PS6) Faidon Liambotis: Create a new grub module [puppet] - https://gerrit.wikimedia.org/r/296729
[17:26:12] (PS6) Faidon Liambotis: mediawiki: un-hieraize cgroup_enable boot-settings [puppet] - https://gerrit.wikimedia.org/r/296732
[17:27:55] (actually, i just built the deploy repo, bearND deployed it)
[17:28:00] (PS7) Faidon Liambotis: Create a new grub module [puppet] - https://gerrit.wikimedia.org/r/296729
[17:28:28] (CR) Faidon Liambotis: [C: 2 V: 2] Create a new grub module [puppet] - https://gerrit.wikimedia.org/r/296729 (owner: Faidon Liambotis)
[17:32:26] (PS7) Faidon Liambotis: labstore: un-hieraize elevator/ioscheduler boot-setting [puppet] - https://gerrit.wikimedia.org/r/296731
[17:32:28] (PS7) Faidon Liambotis: cache: un-hieraize tcpmhash_entries boot setting [puppet] - https://gerrit.wikimedia.org/r/296730
[17:32:30] (PS7) Faidon Liambotis: mediawiki: un-hieraize cgroup_enable boot-settings [puppet] - https://gerrit.wikimedia.org/r/296732
[17:32:45] (CR) Faidon Liambotis: [C: 2 V: 2] labstore: un-hieraize elevator/ioscheduler boot-setting [puppet] - https://gerrit.wikimedia.org/r/296731 (owner: Faidon Liambotis)
[17:33:37] PROBLEM - puppet last run on mc2013 is CRITICAL: CRITICAL: puppet fail
[17:34:26] PROBLEM - puppet last run on mw2125 is CRITICAL: CRITICAL: puppet fail
[17:35:37] RECOVERY - puppet last run on mc2013 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[17:35:54] mutante: hey, please tell when you have some time. I've got a strange issue with yubikey :)
[17:36:07] PROBLEM - puppet last run on mw2191 is CRITICAL: CRITICAL: puppet fail
[17:36:34] (CR) Faidon Liambotis: [C: 2] cache: un-hieraize tcpmhash_entries boot setting [puppet] - https://gerrit.wikimedia.org/r/296730 (owner: Faidon Liambotis)
[17:38:27] RECOVERY - puppet last run on mw2125 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:40:15] (CR) Faidon Liambotis: [C: 2] mediawiki: un-hieraize cgroup_enable boot-settings [puppet] - https://gerrit.wikimedia.org/r/296732 (owner: Faidon Liambotis)
[17:40:59] (CR) BBlack: [C: 2] "Confirmed sniffing DNS requests, no clients appear to be using this name (as expected)" [dns] - https://gerrit.wikimedia.org/r/298530 (https://phabricator.wikimedia.org/T134871) (owner: BBlack)
[17:41:50] (CR) Faidon Liambotis: [C: -1] "Same concern (why the rename?)" [puppet] - https://gerrit.wikimedia.org/r/298907 (owner: Dzahn)
[17:48:36] (PS1) Andrew Bogott: Allow projectadmins to query project limits. [puppet] - https://gerrit.wikimedia.org/r/299009
[17:48:39] mdholloway: yeah, i had the same thought
[17:50:37] (PS2) Andrew Bogott: Allow projectadmins to query project limits. [puppet] - https://gerrit.wikimedia.org/r/299009
[17:53:02] (CR) MaxSem: "So far, it still spams on every run:" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/298960 (https://phabricator.wikimedia.org/T138092) (owner: Gehel)
[17:54:28] (CR) Andrew Bogott: [C: 2] Allow projectadmins to query project limits. [puppet] - https://gerrit.wikimedia.org/r/299009 (owner: Andrew Bogott)
[18:00:58] Amir1: i'm here
[18:01:09] oh thanks :)
[18:01:29] Operations, Parsoid, Services, service-runner, and 2 others: Replace custom server.js with service-runner - https://phabricator.wikimedia.org/T90668#2463203 (mobrovac) OK, Parsoid in beta should now be ready. @ssastry @Arlolra mind testing it and playing with it to be sure we're on the right trac...
[18:01:34] mutante: I can't add or remove ssh agent to my yubikey
[18:01:40] https://wikitech.wikimedia.org/wiki/Yubikey-SSH
[18:01:44] I followed this
[18:01:56] 'ssh-add -s $OPENSC'
[18:02:00] doesn't work
[18:02:37] RECOVERY - puppet last run on mw2191 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[18:03:02] (PS2) Gehel: WIP - postgresql::user should be idempotent [puppet] - https://gerrit.wikimedia.org/r/298960 (https://phabricator.wikimedia.org/T138092)
[18:03:02] https://www.irccloud.com/pastebin/HRNlU2yB/
[18:04:37] Amir1: i think there was a bug that caused this.. and reconnecting the key made it work
[18:04:45] for others.. but it's been a while
[18:05:04] Operations, Parsoid, Services, service-runner, and 2 others: Replace custom server.js with service-runner - https://phabricator.wikimedia.org/T90668#2463217 (ssastry) >>! In T90668#2463203, @mobrovac wrote: > OK, Parsoid in beta should now be ready. @ssastry @Arlolra mind testing it and playing w...
[18:05:14] I took it out and put it back several times
[18:05:36] asks for passphrase, strange to me
[18:05:52] Amir1: are you on OSX?
[18:06:07] nope, xenial
[18:07:02] it asks for a passphrase but you havent set one?
[18:07:10] yeah
[18:07:12] you got this from a former user, right
[18:07:15] it was not brandnew
[18:07:17] Enter passphrase for PKCS#11:
[18:07:31] he said, he didn't configured it
[18:08:20] maybe he just set the PIN for physical access , but nothing else
[18:08:29] https://wikitech.wikimedia.org/wiki/Yubikey-SSH#Securing_physical_access_to_the_YubiKey
[18:08:43] i thought it also had a default pin
[18:08:47] yeah, that link says so
[18:08:53] so even if they didnt set one, you may have to use the default
[18:08:55] Now the PIN (6 digits, 123456 is the shipped default PIN):
[18:08:56] actually pin was the default so I changed it to something else
[18:09:07] "The YubiKey uses default values for PIN, PUK and management key, "
[18:09:16] did you see these steps yet?
[18:09:19] to change them
[18:09:28] pin, PUK and the management key, I changed them all
[18:09:42] all were default
[18:10:00] I followed each and every step
[18:10:13] until "Accessing the key"
[18:10:27] ok, so now it asks for a passphrase of the ssh key stored on it
[18:10:35] looking at docs
[18:10:40] but when I do ssh -I $OPENSC $HOST it doesn't ask for PIN, it asks password
[18:10:51] indeed
[18:10:52] Enter passphrase for PKCS#11:
[18:11:01] is what i get as well in os x, so its misleading from the ssh client point of view
[18:11:08] I didn't set any password for the key
[18:11:10] when you are actually entering the pin.
[18:12:22] when I ran "ssh-keygen -D $OPENSC -e" it didn't asks me to set a password
[18:12:42] it just sent out the ssh rsa public key
[18:20:35] Amir1: type the PIN
[18:20:40] when it asks you for "passphrase"
[18:20:48] okay, let me try that
[18:22:15] mutante: didn't work :(
[18:24:49] yesssss
[18:25:14] mutante: it worked now, I needed to run "eval `ssh-agent -s`" and then try PIN
[18:25:21] I never tried them both together
[18:25:23] :) yay
[18:34:03] (PS3) MaxSem: WIP - postgresql::user should be idempotent [puppet] - https://gerrit.wikimedia.org/r/298960 (https://phabricator.wikimedia.org/T138092) (owner: Gehel)
[18:38:07] (PS2) Dzahn: Allow aklapper to delete files in Phabricator [puppet] - https://gerrit.wikimedia.org/r/298494 (owner: Aklapper)
[18:38:15] (CR) Dzahn: [C: 2] Allow aklapper to delete files in Phabricator [puppet] - https://gerrit.wikimedia.org/r/298494 (owner: Aklapper)
[18:47:07] PROBLEM - puppet last run on mw2105 is CRITICAL: CRITICAL: puppet fail
[18:47:24] (PS3) Dzahn: admin: add yubikey ssh key for ladsgroup [puppet] - https://gerrit.wikimedia.org/r/298130 (owner: Ladsgroup)
[18:50:08] Operations, Ops-Access-Requests: Requesting shell access and access to groups 'analytics-privatedata-users' and 'researchers' for Mpany - https://phabricator.wikimedia.org/T140399#2463390 (Mpany)
[18:53:02] Operations, Traffic, Wikimedia-Blog, HTTPS: Switch blog to HTTPS-only - https://phabricator.wikimedia.org/T105905#2463436 (Tbayer) Update (@faidon: I'm still unsure if you are receiving the updates from the internal ticket with Automattic): On April 21, I had asked Automattic's support about the...
[18:53:56] (PS1) Yurik: Enable "geoshapes" graphs protocol [mediawiki-config] - https://gerrit.wikimedia.org/r/299013
[18:56:04] (PS1) Yurik: Enable "geoshapes" graphs protocol [puppet] - https://gerrit.wikimedia.org/r/299014
[18:56:11] gehel, ^
[19:00:04] ostriches: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160714T1900). Please do the needful.
[19:00:26] !log Dropping legacy Cassandra system_auth tables in RESTBase production to complete RBAC conversion : T139639
[19:00:27] T139639: Cassandra 2.2.6 post-upgrade checklist - https://phabricator.wikimedia.org/T139639
[19:00:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:00:44] Operations, Traffic, Wikimedia-Blog, HTTPS: Switch blog to HTTPS-only - https://phabricator.wikimedia.org/T105905#2463476 (faidon) @Tbayer, I still do. In fact I've been receiving these ticket (or series of tickets?) updates since Jul 24th 2015. They supposedly support HTTPS-by-default since Apri...
[19:00:46] jouncebot: what if I don't want to? Hmm? Ever think of that?
[19:00:51] So insensitive
[19:01:24] (CR) Dzahn: [C: 2] "confirmed via https://people.wikimedia.org/~ladsgroup/" [puppet] - https://gerrit.wikimedia.org/r/298130 (owner: Ladsgroup)
[19:07:52] ostriches, hi, will you have a chance to sync a minor config change later? https://gerrit.wikimedia.org/r/#/c/299013/
[19:08:36] yurik: That looks like it should go in swat?
[19:08:57] ostriches, true, but i was hoping to get it done earlier - the swat is at 3am for me :)
[19:09:56] (CR) Dzahn: [C: -1] "www.gerrit does not exist in DNS and has not been used before, and i dont think we should add it. the http->https redirect is just a catch" [puppet] - https://gerrit.wikimedia.org/r/298979 (owner: Paladox)
[19:10:22] (Abandoned) Paladox: Redirect www. to non www. gerrit domain [puppet] - https://gerrit.wikimedia.org/r/298979 (owner: Paladox)
[19:13:45] Operations, Traffic, Wikimedia-Blog, HTTPS: Switch blog to HTTPS-only - https://phabricator.wikimedia.org/T105905#2463578 (BBlack) Nice! The redirect functionality looks correct. However, the STS header is `strict-transport-security:max-age=31536000`, whereas it should be `strict-transport-secur...
[19:15:27] RECOVERY - puppet last run on mw2105 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:16:10] (PS1) Chad: Move remaining wikis to wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/299021
[19:17:54] (CR) Chad: [C: 2] Move remaining wikis to wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/299021 (owner: Chad)
[19:17:58] (CR) Chad: [C: 2] Enable "geoshapes" graphs protocol [mediawiki-config] - https://gerrit.wikimedia.org/r/299013 (owner: Yurik)
[19:18:34] (Merged) jenkins-bot: Move remaining wikis to wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/299021 (owner: Chad)
[19:19:30] !log demon@tin rebuilt wikiversions.php and synchronized wikiversions files: Move remaining wikis to wmf.10
[19:19:32] (CR) Dzahn: "you might want "ensure_resource('file'," from modules/stdlib/lib/puppet/parser/functions/ensure_resource.rb" [puppet] - https://gerrit.wikimedia.org/r/298935 (https://phabricator.wikimedia.org/T140265) (owner: Ladsgroup)
[19:19:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:19:41] (PS2) Chad: Enable "geoshapes" graphs protocol [mediawiki-config] - https://gerrit.wikimedia.org/r/299013 (owner: Yurik)
[19:21:54] !log demon@tin Synchronized wmf-config/CommonSettings.php: maps geoshapes stuff for yurik (duration: 00m 31s)
[19:21:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:22:05] ostriches, thanks!!!
[19:22:11] testing...
[19:22:11] yw
[19:22:53] !log demon@tin Synchronized wmf-config/CommonSettings-labs.php: maps geoshapes stuff for yurik (labs file for completeness) (duration: 00m 27s)
[19:22:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:32:23] (PS3) Faidon Liambotis: admin: add an NDA audit helper script [puppet] - https://gerrit.wikimedia.org/r/298976
[19:32:35] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0; xe-5/0/2: down - Core: cr2-ulsfo:xe-1/3/0 (Zayo, OGYX/124337//ZYO, 38.8ms) {#11541} [10Gbps wave]
[19:34:30] (PS2) Ladsgroup: ores: add File['/srv/log'] for web nodes in labs [puppet] - https://gerrit.wikimedia.org/r/298935 (https://phabricator.wikimedia.org/T140265)
[19:38:25] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/3/0: down - Core: cr1-codfw:xe-5/0/2 (Zayo, OGYX/124337//ZYO, 38.8ms) {#?} [10Gbps wave]
[19:39:01] (CR) Dzahn: "i think we should remove the users from the group (if they actually dont use it anymore) and also remove the group from ./hosts/ in hiera," [puppet] - https://gerrit.wikimedia.org/r/298832 (owner: Chad)
[19:40:21] (CR) Dzahn: "it's not really "zero rights" because it gives the log access which can still be useful" [puppet] - https://gerrit.wikimedia.org/r/298832 (owner: Chad)
[19:41:28] Operations: FDCsupport@wikimedia.org - https://phabricator.wikimedia.org/T140408#2463733 (Aklapper) Adding #Operations to hopefully find out. (For future reference, please associate projects to tasks, otherwise noone will see this task when searching in their baskets (=projects)). Thanks! :)
[19:41:52] Operations: FDCsupport@wikimedia.org: Add/remove some members - https://phabricator.wikimedia.org/T140408#2463735 (Aklapper)
[19:44:08] Operations: Update Cassandra in Wikimedia APT repository - https://phabricator.wikimedia.org/T140409#2463737 (Eevans)
[19:44:15] Operations, Cassandra: Update Cassandra in Wikimedia APT repository - https://phabricator.wikimedia.org/T140409#2463749 (Eevans)
[19:44:20] Operations: FDCsupport@wikimedia.org: Add/remove some members - https://phabricator.wikimedia.org/T140408#2463750 (eross) Okay, Thank you! *Emerauld Ross* Wikimedia Foundation I.T. Help Desk
[19:45:45] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[19:46:25] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0
[19:47:37] Operations: FDCsupport@wikimedia.org: Add/remove some members - https://phabricator.wikimedia.org/T140408#2463719 (Dzahn) Hello, fdcsupport@wikimedia is indeed still controlled by Operations. I say "still" because we'd be happy to have it moved over to OIT like we did with many other aliases already in T12...
[19:49:14] Operations, Fundraising-Backlog, fundraising-tech-ops, Patch-For-Review: Allow Fundraising to A/B test wikipedia.org as send domain - https://phabricator.wikimedia.org/T135410#2463782 (dpatrick) >>! In T135410#2462453, @Jgreen wrote: > Distilling the discussion to a proposed config, here's the DK...
[19:51:21] Operations: FDCsupport@wikimedia.org: Add/remove some members - https://phabricator.wikimedia.org/T140408#2463798 (eross) Hi, Yes that is correct dmenard to be added and klove to be removed. Next time I will check the email that you have sent to us. Thank you so much!
[19:53:46] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0; xe-5/0/2: down - Core: cr2-ulsfo:xe-1/3/0 (Zayo, OGYX/124337//ZYO, 38.8ms) {#11541} [10Gbps wave]
[19:53:50] ostriches: done with deploying?
[19:54:13] Operations: FDCsupport@wikimedia.org: Add/remove some members - https://phabricator.wikimedia.org/T140408#2463807 (Dzahn) @eross Done. fdsupport@ has been updated: ``` -fdcsupport: wolliff, klove +fdcsupport: wolliff, dmenard ``` So if you have access to mail to officeit@ you should be able to see some...
[19:54:13] Yeah
[19:54:19] i'd like to touch some files in the RevisionSlider extension and re-sync
[19:54:33] (CR) Ottomata: [C: 2] server.pp: fix zkCleanup cron [puppet/zookeeper] - https://gerrit.wikimedia.org/r/297757 (owner: Mklette)
[19:54:40] with hopes that i18n stuff works (it works in php and on beta, etc)
[19:54:41] Operations: FDCsupport@wikimedia.org: Add/remove some members - https://phabricator.wikimedia.org/T140408#2463810 (Dzahn) Open>Resolved a:Dzahn
[19:54:44] Blocked-on-Operations, Operations, Kartographer, Wikimedia-Extension-setup, and 3 others: Enable Interactive Maps (Kartographer) on Macedonian Wikipedia - https://phabricator.wikimedia.org/T139946#2463812 (CKoerner_WMF) Conversation is happening on the Macedonian Wikipedia. https://mk.wikipedia...
[19:54:46] (CR) Ottomata: "Thanks Mklette!" [puppet/zookeeper] - https://gerrit.wikimedia.org/r/297757 (owner: Mklette)
[19:55:29] (PS2) Ottomata: zookeeper::jmxtrans: Expose statsd & group_prefix parameters [puppet/zookeeper] - https://gerrit.wikimedia.org/r/294906 (owner: Ctrochalakis)
[19:55:31] Operations, Traffic, Wikimedia-Blog, HTTPS: Switch blog to HTTPS-only - https://phabricator.wikimedia.org/T105905#2463813 (Tbayer) >>! In T105905#2463476, @faidon wrote: > @Tbayer, I still do. In fact I've been receiving these ticket (or series of tickets?) updates since Jul 24th 2015. OK, from...
[19:56:14] !log aude@tin Synchronized php-1.28.0-wmf.10/extensions/RevisionSlider: touching js and resource files (duration: 00m 28s)
[19:56:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:57:07] hmmmm
[19:57:43] oh, think i know
[19:57:54] (CR) Ottomata: [C: 2] "Thanks Ctrochalakis!" [puppet/zookeeper] - https://gerrit.wikimedia.org/r/294906 (owner: Ctrochalakis)
[19:58:05] (CR) Dzahn: "yea, cool. we already use that in "releases", "iegreview" and "bugzilla_static" to avoid those conflicts between roles. that's the way to " [puppet] - https://gerrit.wikimedia.org/r/298935 (https://phabricator.wikimedia.org/T140265) (owner: Ladsgroup)
[19:58:27] (PS3) Dzahn: ores: add File['/srv/log'] for web nodes in labs [puppet] - https://gerrit.wikimedia.org/r/298935 (https://phabricator.wikimedia.org/T140265) (owner: Ladsgroup)
[19:58:28] or not
[19:59:46] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[20:02:33] (CR) Dzahn: [C: 2] ores: add File['/srv/log'] for web nodes in labs [puppet] - https://gerrit.wikimedia.org/r/298935 (https://phabricator.wikimedia.org/T140265) (owner: Ladsgroup)
[20:02:55] !log restarting hadoop-yarn-resourcemanager on analytics1002 and then analytics1001 to apply yarn log aggregation change
[20:02:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:04:03] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/3/0: down - Core: cr1-codfw:xe-5/0/2 (Zayo, OGYX/124337//ZYO, 38.8ms) {#?} [10Gbps wave]
[20:06:02] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0
[20:09:38] (PS1) 20after4: Fix path to jenkins homedir for nodepool slaves [puppet] - https://gerrit.wikimedia.org/r/299029
[20:14:29] (CR) Krinkle: "Where is User['jenkins'] defined?" [puppet] - https://gerrit.wikimedia.org/r/299029 (owner: 20after4)
[20:15:32] Operations, LDAP: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#2463875 (APalmer_WMF)
[20:16:13] (CR) 20after4: "krinkle: modules/jenkins/manifests/user.pp" [puppet] - https://gerrit.wikimedia.org/r/299029 (owner: 20after4)
[20:16:41] (CR) 20after4: "yeah I don't know how it gets set on the nodepool slaves" [puppet] - https://gerrit.wikimedia.org/r/299029 (owner: 20after4)
[20:18:39] Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#2463880 (Aklapper) Sounds very similar to T138369
[20:22:39] (PS2) Gehel: Enable "geoshapes" graphs protocol [puppet] - https://gerrit.wikimedia.org/r/299014 (owner: Yurik)
[20:24:48] (CR) Gehel: [C: 2] Enable "geoshapes" graphs protocol [puppet] - https://gerrit.wikimedia.org/r/299014 (owner: Yurik)
[20:30:34] dear ops, how do i get a HHVM patch deployed in our production? https://phabricator.wikimedia.org/T97253#2463909
[20:30:55] (HHVM patch and a PHP patch, if we still use PHP anywhere, both are on that task)
[20:31:08] wikitech is still on zend iirc
[20:31:39] MatmaRex: backport for the branch we're using if it's not a clean cherry pick...
[20:31:49] And then ask probably _joe_ ori etc
[20:32:06] Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#2462848 (Dzahn) Hi, could you please add the exact user name that you are using, for the new user and the existing contractor that already has access? When you say that you...
[20:32:18] MatmaRex: submit a task, tag with operations
[20:32:19] Reedy: the PHP patch was a clean cherry-pick for everything from 5.3 to 5.6, i'm guessing the HHVM one also is
[20:32:35] (…to 7 to master)
[20:32:39] PHP 5.5.9-1ubuntu4.17 (apache2handler)
[20:32:49] Looks like we're not using a wmf specific version there
[20:32:52] Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#2463917 (Dzahn) also, where is the login please?
[20:32:54] ori: heh. okay
[20:33:33] ah, it's a small change relatively
[20:33:52] three lines. (and a lot of tests, release notes and such)
[20:34:13] * MatmaRex files a task
[20:34:18] How desperate are we for it?
[20:34:28] Looks like we're behind on point releases, and on major releases
[20:34:56] Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#2463921 (APalmer_WMF) They're definitely related, in that access to the same website is the goal. But @siddharth11 had just joined, and I think wasn't certain if LDAP credenti...
[20:35:08] Reedy: sufficiently for me to look into the matter, implement both of those patches and get them merged, and i don't even usually write C ;)
[20:35:42] i'll file a task
[20:35:53] probably good for 10
[20:36:00] (i wonder if i could find out how many files on commons are affected…)
[20:37:03] sounds a lot of file accesses :)
[20:38:34] (CR) Dzahn: [C: 1] Add zfilipin to deployment group [puppet] - https://gerrit.wikimedia.org/r/298792 (https://phabricator.wikimedia.org/T140264) (owner: Thcipriani)
[20:38:44] Reedy: it results in binary crap in images.img_metadata, could probably be done with a sufficiently big sql query
[20:39:03] aha, bit easier then
[20:39:16] (at least sometimes it does, not always. still, could get a lower bound)
[20:40:08] Reedy: a lot of file accesses will be when i figure out a way to update the img_metadata for all those files ;)
[20:40:22] purging?
[20:40:32] hm
[20:40:41] i wonder if that would work. i assumed it wouldn't, but maybe
[20:40:48] It may
[20:40:53] bawolff would be able to advise
[20:40:54] !log restarted graphoid with the new settings, enabling geoshape protocol
[20:41:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:41:45] (PS1) Dzahn: admin: add addshore to deployers [puppet] - https://gerrit.wikimedia.org/r/299032 (https://phabricator.wikimedia.org/T140276)
[20:42:08] I'm sure we've purged pages to fix stuff like that before
[20:42:58] Operations, Phabricator: Phabricator weekly report not generated (or at least sent) - https://phabricator.wikimedia.org/T139950#2463947 (mmodell) @jcrespo: I thought we were ready to go ahead with the failover and then we could reenable the cron jobs afterward. @greg, @danny_b: I'll run the job manual...
[20:43:31] Operations, Analytics-Cluster, Analytics-Kanban, EventBus, Services: Better monitoring for Zookeeper - https://phabricator.wikimedia.org/T137302#2463951 (Ottomata) a:Ottomata
[20:45:34] Operations, Phabricator: Phabricator weekly report not generated (or at least sent) - https://phabricator.wikimedia.org/T139950#2447582 (Dzahn) Why don't you just enable the report separate from the other maintenance things? It's a separate thing that is sent to "absent' anyways. ``` 190 # project...
[20:51:25] (PS19) MaxSem: Script to do the initial data load from OSM for Maps project [puppet] - https://gerrit.wikimedia.org/r/293105 (owner: Gehel)
[20:51:41] (PS3) Mforns: Add white-list for EventLogging auto-purging [puppet] - https://gerrit.wikimedia.org/r/298721 (https://phabricator.wikimedia.org/T108850)
[20:53:00] (CR) jenkins-bot: [V: -1] Script to do the initial data load from OSM for Maps project [puppet] - https://gerrit.wikimedia.org/r/293105 (owner: Gehel)
[20:55:12] Puppet, Continuous-Integration-Config, Jenkins: jenkins homedir on nodepool slaves is in /home/jenkins but this doesn't seem to be anywhere in puppet - https://phabricator.wikimedia.org/T140417#2464009 (mmodell)
[20:55:39] Puppet, Continuous-Integration-Config, Jenkins: jenkins homedir on nodepool slaves is in /home/jenkins but this doesn't seem to be anywhere in puppet - https://phabricator.wikimedia.org/T140417#2463982 (mmodell)
[20:57:41] Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#2464012 (Dzahn) @APalmer_WMF Can you give us the username that you guys are using to login (as opposed to phabricator users) and where exactly you are trying to login?
[20:58:23] (PS1) MaxSem: WIP: puppetize grants, kill grants.sql [puppet] - https://gerrit.wikimedia.org/r/299033
[20:58:56] Operations: Deploy a PHP and HHVM patch (Exif values retrieved incorrectly if they appear before IFD) - https://phabricator.wikimedia.org/T140419#2464014 (matmarex)
[20:59:08] Reedy: ori: ^
[20:59:23] ooh, task graph
[20:59:33] (CR) jenkins-bot: [V: -1] WIP: puppetize grants, kill grants.sql [puppet] - https://gerrit.wikimedia.org/r/299033 (owner: MaxSem)
[20:59:54] yeah, pretty
[21:00:06] "Task graph too large to display (this task is connected to more than 100 other tasks)"
[21:00:07] Reedy: don't try it for bug 1, you'll be disappointed
[21:00:07] ffs
[21:00:09] ha
[21:00:15] so, it just hides it?
[21:00:17] That's shit
[21:00:26] Reedy: being worked on :P
[21:00:36] that was almost exactly my reaction today. :P
[21:00:42] it's really neet when it works though :D
[21:00:44] neat*
[21:00:54] greg-g: pfffft
[21:01:20] manyana
[21:02:39] MatmaRex: I like 'neet' better
[21:03:49] (PS1) Ottomata: Emit zookeeper server JMX metrics in zookeeper::jmxtrans class [puppet/zookeeper] - https://gerrit.wikimedia.org/r/299036 (https://phabricator.wikimedia.org/T137302)
[21:07:02] !log Started backfillUnreadWikis.php --rebuild on all group 2 wikis
[21:07:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:08:26] (PS1) Ottomata: Update zookeeper submodule and configure sending zookeeper jmx stats to statsd [puppet] - https://gerrit.wikimedia.org/r/299039 (https://phabricator.wikimedia.org/T137302)
[21:08:30] !log Started backfillReadBundles.php on all group 2 wikis
[21:08:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:08:43] (CR) Ottomata: "Requires that https://gerrit.wikimedia.org/r/#/c/299036/ is merged first." [puppet] - https://gerrit.wikimedia.org/r/299039 (https://phabricator.wikimedia.org/T137302) (owner: Ottomata)
[21:09:56] (CR) jenkins-bot: [V: -1] Update zookeeper submodule and configure sending zookeeper jmx stats to statsd [puppet] - https://gerrit.wikimedia.org/r/299039 (https://phabricator.wikimedia.org/T137302) (owner: Ottomata)
[21:10:02] (PS2) Ottomata: Update zookeeper submodule and configure sending zookeeper jmx stats to statsd [puppet] - https://gerrit.wikimedia.org/r/299039 (https://phabricator.wikimedia.org/T137302)
[21:16:44] Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#2464127 (APalmer_WMF) Hi @Dzahn, just sent you an email with this information. Since this task is public and we didn't know what other detail might be required, didn't fancy...
[21:18:08] Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#2464140 (Dzahn) There is no need to send private mail. The user names are public.
[21:22:43] PROBLEM - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/webservice/kubernetes - 299 bytes in 0.237 second response time
[21:26:55] halfak: what can you tell me about the Labs project 'persistence'?
[21:27:25] andrewbogott, nearly forgot about it. It's totally defunct and can be wiped out.
[21:27:28] Sorry for the trouble.
[21:27:33] ok! thanks
[21:27:35] :)
[21:28:05] Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#2464176 (APalmer_WMF) The user names are apalmer and jbuatti. We understand that they are public, but were leery of continuing to use Phabricator to talk about details of acce...
[21:29:02] Operations, Labs: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2464178 (chasemp)
[21:29:48] Operations, Labs: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2464190 (chasemp) p:Triage>Normal
[21:30:28] Operations, Ops-Access-Requests, Labs: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2464178 (chasemp)
[21:34:12] Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#2464201 (Dzahn) @APalmer_WMF We are using Phabricator for access requests all the time. Even for root shell. It's better to keep access things on tickets. That way others can...
[21:35:21] PROBLEM - check_payments_wiki on payments1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:36:01] Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#2462848 (JbuattiWMF) Hey, @Dzahn my Wikitech account name is "Jbuatti".
[21:39:58] (PS1) MaxSem: postgresql: fix user existence check [puppet] - https://gerrit.wikimedia.org/r/299075
[21:40:09] gehel, ^
[21:40:11] RECOVERY - check_payments_wiki on payments1004 is OK: HTTP OK: HTTP/1.1 200 OK - 269 bytes in 0.029 second response time
[21:41:17] Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#2464214 (APalmer_WMF) @Dzahn, thanks for clarifying -- must have misunderstood previous info from someone else about Google app credentials. My wikitech username is "Apalmer".
[21:44:35] MaxSem: thanks! Will have a look tomorrow
[21:45:32] gehel, this raises question though how the hell does this work right now? :P
[21:49:38] MaxSem: Yeah, I had a quick look and I'm not sure I understand. One more reason to use a standard puppet module and not write our own...
[21:50:38] MaxSem: I had a look at puppetlabs module. It of course depends on a recent version of stdlib... Which we don't have...
[21:50:39] Puppet, Continuous-Integration-Config, Jenkins: jenkins homedir on nodepool slaves is in /home/jenkins but this doesn't seem to be anywhere in puppet - https://phabricator.wikimedia.org/T140417#2464260 (hashar) The user for Jenkins are absolutely a complete mess. That boils down to: | User | Realm...
[21:54:49] Puppet, Continuous-Integration-Config, Jenkins: jenkins homedir on nodepool slaves is in /home/jenkins but this doesn't seem to be anywhere in puppet - https://phabricator.wikimedia.org/T140417#2464272 (mmodell) @hashar: Thanks for the detailed response. I'd like to make this a little less messy but...
[21:59:51] (PS2) MaxSem: WIP: puppetize grants, kill grants.sql [puppet] - https://gerrit.wikimedia.org/r/299033
[22:01:07] (PS1) Rush: madhu: transition to ops [puppet] - https://gerrit.wikimedia.org/r/299078 (https://phabricator.wikimedia.org/T140422)
[22:01:20] (CR) jenkins-bot: [V: -1] WIP: puppetize grants, kill grants.sql [puppet] - https://gerrit.wikimedia.org/r/299033 (owner: MaxSem)
[22:03:02] PROBLEM - puppet last run on mw2136 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:10:31] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0; xe-5/0/2: down - Core: cr2-ulsfo:xe-1/3/0 (Zayo, OGYX/124337//ZYO, 38.8ms) {#11541} [10Gbps wave]
[22:12:22] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[22:27:31] does anybody know how to scrub a line or two out of bots.wmflabs.org channel logs?
[22:28:03] RECOVERY - puppet last run on mw2136 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[22:29:15] why?
[22:29:49] Someone commented on a not-private phab ticket with what I'm pretty sure is a donor name
[22:30:36] wikibugs parroted it to IRC, and the logging bot has it in the wmflabs dir already
[22:30:50] I've made the phab ticket private
[22:30:54] should be easy for someone with access to the project..
[22:31:15] ah right, I should be able to find that out. Thanks!
[22:37:53] PROBLEM - puppet last run on mw2171 is CRITICAL: CRITICAL: puppet fail
[22:52:48] (PS1) RobH: setting new mirror systems partition scheme [puppet] - https://gerrit.wikimedia.org/r/299092
[22:55:03] (PS2) RobH: setting new mirror systems partition scheme [puppet] - https://gerrit.wikimedia.org/r/299092
[22:57:08] (CR) RobH: [C: 2] setting new mirror systems partition scheme [puppet] - https://gerrit.wikimedia.org/r/299092 (owner: RobH)
[22:57:25] Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#2464649 (Dzahn) @APalmer_WMF @JbuattiWMF Alright, thank you. I found your users in LDAP, I added you to the WMF group (based on the @wikimedia email addresses and the priva...
[22:57:57] Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#2464653 (Dzahn) This means you should now be able to login. Let me know if any problems.
[23:00:05] RoanKattouw, ostriches, MaxSem, awight, and Dereckson: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160714T2300). Please do the needful.
[23:00:05] ebernhardson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:00:13] <_joe_> ejegg: if it's a privacy issue, open a phab ticket (with the correct rights :))
[23:00:31] <_joe_> but if it's a security issue, a LOT of people keep logs here
[23:00:41] <_joe_> so consider the information leaked
[23:00:42] will do _joe_, it's a privacy thing
[23:01:08] got in touch with a bot op who's going to take care of the logs as soon as he can
[23:01:16] but there's probably more we need to do
[23:01:21] what else can be done?
[23:02:04] notify?
[23:02:56] so, i suppose its just me for swat so ... i'll ship it
[23:03:00] * ebernhardson looks up the new procedure again
[23:03:31] old commands still work :P
[23:03:45] yeah but there's new instructions
[23:03:53] ebernhardson: mw1099 instead of 1017, mostly
[23:03:56] like always using X-Wikimedia-Debug
[23:04:38] I've got 1099 problems, but wikipedia ain't one?
[23:05:07] something like that
[23:05:37] Operations, Security-Team, vm-requests, Patch-For-Review: provide ganeti VM for security team sectools - https://phabricator.wikimedia.org/T138650#2464659 (Dzahn) p:Triage>Normal
[23:05:45] Operations, Ops-Access-Requests: root access on security-tools instances for Darian Patrick - https://phabricator.wikimedia.org/T138873#2464661 (Dzahn)
[23:06:22] Operations, Phabricator: Phabricator weekly report not generated (or at least sent) - https://phabricator.wikimedia.org/T139950#2464662 (mmodell) Ok, I just ran the community_metrics script manually. >>! In T139950#2463954, @Dzahn wrote: > Why don't you just enable the report separate from the other m...
[23:07:03] RECOVERY - puppet last run on mw2171 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:07:08] Operations, Ops-Access-Requests, Labs, Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2464178 (bd808) * [x] [[https://tools.wmflabs.org/sal/log/AVXrojH3gCrwkbTdmhun|Added as admin on Tools project]] * [x] [[https://wikitech.wikimedia.or...
[23:10:59] !log ebernhardson@tin Synchronized php-1.28.0-wmf.10/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: T137169: Turn of TextCat A/B test (duration: 00m 34s)
[23:11:00] T137169: Part Deux: TextCat A/B test for Language Identification - turn off test - https://phabricator.wikimedia.org/T137169
[23:11:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:12:31] ok, swat completed
[23:13:42] (PS1) Dzahn: phabricator: re-enable community metrics mail [puppet] - https://gerrit.wikimedia.org/r/299093 (https://phabricator.wikimedia.org/T139950)
[23:15:39] Operations, ops-eqiad: Rack/Setup Carbon/Apt Server Replacement - https://phabricator.wikimedia.org/T139171#2464690 (Cmjohnson) Configured w/Raid 10. Tried installing but no pxe device found. Check cable needed.
[23:23:37] Operations, Phabricator, Patch-For-Review: Phabricator weekly report not generated (or at least sent) - https://phabricator.wikimedia.org/T139950#2464720 (Danny_B) And the weekly stats?
[23:25:49] Operations, DBA, Phabricator, Patch-For-Review: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2464723 (jcrespo) I do not feel confident with the current status- while I could rush it and do it today (the slave is ready), after checking that this shard is not yet...
[23:27:28] Operations, DBA, Phabricator, Patch-For-Review: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2464725 (jcrespo)
[23:34:34] (PS1) Aaron Schulz: Set the DBTransaction log [mediawiki-config] - https://gerrit.wikimedia.org/r/299095
[23:36:58] (PS2) Dzahn: admin: add addshore to deployers [puppet] - https://gerrit.wikimedia.org/r/299032 (https://phabricator.wikimedia.org/T140276)
[23:37:00] can we kill phabricator mailer for a bit?
[23:39:32] Danny_B: ? that's the opposite of that ticket that asks to reactive the maintenance mails
[23:39:47] ???
[23:39:52] * Danny_B is confused now
[23:40:03] ah
[23:40:08] what kind of mails do you mean? the stats mail to wikitech?
[23:40:18] no. notifications
[23:40:34] oh, ok. i dont get any of them via email. i prefer browser
[23:40:36] i found a way, how to enable most of the task dependecy graphs
[23:40:47] you can change it in your user profile
[23:40:53] to disable them all
[23:40:54] but it would involve massive mailing to many people
[23:40:59] oh
[23:41:10] hence why i need to disable the mailing for a bit
[23:41:11] gotcha
[23:41:23] uh what
[23:41:26] i can assume thousands of emails
[23:41:30] i definitely want to know what you're doing…
[23:41:31] ;)
[23:41:47] MatmaRex: killing dependencies on 4007
[23:42:11] Danny_B: didn't quiddity already kill them?
[23:42:12] it helps. tested on humans. proved
[23:42:19] MatmaRex: not at all
[23:43:00] ew, yeah, there's still lots of subtasks
[23:43:11] I only removed in an attempt to make 4007 not give a 10second complete timeout. Was waiting till today to discuss what to do with the rest.
[23:43:17] removed a few*
[23:44:14] Danny_B: why not just do it? can't be much more disruptive than the usual triaging of super old closed bugs you often do ;) and it can be easily filtered out, unlike that
[23:44:20] MatmaRex: there is 277 of them left
[23:44:43] MatmaRex: please refrain from irony and sarcasm, thank you
[23:44:59] well, the last part is true, though. it can be easily filtered out.
[23:45:06] i easily filtered out the quiddity ones this morning ;)
[23:46:22] 277 tracking tasks means at least 4 times 277 mails sent and that's if i only count myself and task creator and no other dependencies. it will be tens of thousands of mails. but it will solve the issue for most of the task graphs.
[23:46:39] hmm
[23:46:47] MatmaRex: fact that *you* filtered it doesn't imply that others will be comfortable with that
[23:47:04] Where does the 4x multiplier come from? (I probably need more coffee...)
[23:47:58] Danny_B: you could set a restrictive visibility policy on T4007 so that only you can view that task, and then remove the blockers. i'm not sure if that will send any emails to users subscribed to the blockers that can't view the parent… can you try that on some test instance?
[23:47:58] T4007: [DO NOT USE] Tracking bug [superseded by the #Tracking tag] - https://phabricator.wikimedia.org/T4007
[23:48:36] that will probably be safer/easier than fiddling with emails directly
[23:49:01] or, well, just go ahead and remove them…
[23:49:16] quiddity: "t4007 removed subtask" to myself as performer of the action. "this task removed parent" - to me and creator. that's 3 times. 4 time is that creator is typically subscribed to 4007 too, so the first mail applies. and it didn't count all the dependencies going lower and all other people subscribed
[23:49:50] Danny_B: you can disable emails to yourself in phab preferences, btw (at least temporarily for this, if you want them in general)
[23:50:26] MatmaRex: re policy: that would make the half perhaps. but still the "subtasks" subscribers would be notified
[23:50:37] MatmaRex: i don't care about mailspam, others do
[23:51:39] so what's difficult on disabling the phab mailing for a bit? ok, maybe some regular mails will be lost, but the benefit is much bigger than the possible loos of one two emails in the meantime
[23:52:18] twentyafterfour: any idea how to disable it?
[23:55:14] Danny_B: i'm not convinced that they will be notified if they can't see the parent. can we test that real quick?
[23:55:28] let's see, mmt
[23:56:18] MatmaRex: can you subscribe to https://phabricator.wikimedia.org/T73295
[23:56:27] i'll test on that
[23:56:58] can we test on a test task? :D
[23:57:18] i could imagine that if you disable it, it means they all end up in some queue.. and when you re-enable they all get delivered at once
[23:57:19] we can't test at all
[23:57:28] i can subscribe, but how exactly are you going to test?
[23:57:36] i am not admin nor owner of 4007 so i can't set up the policy
[23:57:41] i am
[23:57:47] you are what?
[23:58:00] i can set visibility policies
[23:58:33] lol, so then test yourself... why am i trying the impossible? ;-)
[23:58:44] mutante: i don't think so. why would they?
[23:59:17] Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#2464769 (Dzahn) Open>Resolved a:Dzahn
[23:59:37] 4007 is 12 ppl subscribed. that means 12×227 emails