[00:00:04] twentyafterfour: #bothumor I � Unicode. All rise for Phabricator update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171012T0000).
[00:00:04] No GERRIT patches in the queue for this window AFAICS.
[00:00:10] thanks thcipriani and sorry about all the hiccups this week
[00:00:31] bd808: Seems like the default should be to go to logstash though?
[00:00:35] jdlrobson: yw, no worries, glad all's working now :)
[00:00:36] hmm.
[00:01:01] awight: yeah, >=info should go to logstash once the channel is registered
[00:01:33] bd808: hey, this is a mean thing to corner you about but any thoughts about why I can’t “git submodule update” on tin-beta:/srv/deployment/ores/deploy ?
[00:01:35] !log Preparing to take phabricator offline for a hopefully momentary upgrade.
[00:01:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:01:43] Just one of the submodules is timing out trying to fetch from git-ssh.wmoi
[00:02:36] awight: hmm... not sure what exactly would cause that. gerrit git server bug? Corruption in the clone?
[00:03:29] harr. k I might try moving .git/modules/submodules/wikiclass
[00:04:29] ooh that was ridiculous. “git submodule sync” FTW.
[00:05:19] git-ssh is on phabricator, which isn't offline yet but it's about to be
[00:05:30] doing a fresh clone of that submodule repo locally hung...
[00:05:36] !log netmon2001 arm keyholder for rancid deploy
[00:05:39] * bd808 sees twentyafterfour's announce
[00:05:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:05:56] bd808: but I haven't taken it offline yet so that's not the cause
[00:06:06] I see a git process apparently stuck on phabricator
[00:06:20] /srv/repos/1914/
[00:06:21] it got to "remote: Compressing objects: 99% (728/731)" and stopped
[00:06:33] yup that's it
[00:06:41] that repo may be messed up I guess?
[00:06:45] it looks like it's just doing a lot of work?
[00:06:49] using a ton of cpu
[00:06:53] the files are huge
[00:06:54] might be legit
[00:07:03] I’m running now so that process is probably mine, not stuck.
[00:07:03] git pack-objects 100% of one core
[00:07:10] remote: Compressing objects: 100% (82/82), done.
[00:07:13] ok
[00:07:15] yep—please do not shoot :D
[00:07:32] This will get a lot better in the git-lfs future BTW.
[00:07:34] * bd808 kills his test clone
[00:07:40] ok tell me when you're done so I can do a quick service bounce to complete the upgrade
[00:07:53] twentyafterfour: ah thx. done!
[00:08:04] awight: phabricator supports git-lfs (I think) and it's on my todo for this quarter to look into replacing git-fat with git-lfs
[00:08:16] totally! I’m hoping to be able to help with that, so KIT
[00:08:32] :)
[00:08:56] !log upgrading phabricator to #phab-2017-10-11
[00:08:58] twentyafterfour: remember when we told you that working on phabricator was going to be a short term project to get it deployed? ;)
[00:09:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:09:20] ACKNOWLEDGEMENT - Host ftp-internal is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn ayounsi temp host
[00:19:11] bd808: heheh yeah, short term alright
[00:29:04] Operations, ops-ulsfo, Traffic, hardware-requests, Patch-For-Review: Decom cp4005-8,13-16 (8 nodes) - https://phabricator.wikimedia.org/T176366#3678459 (RobH) both ssds (sda/sdb) securely erased with hdparm on cp4005-8, cp4013-16 cp4005-8 have had their drac/ilom reset to factory defaults cp4...
[00:29:45] Operations, Gerrit, ORES, Scoring-platform-team, and 2 others: Support git-lfs files in gerrit - https://phabricator.wikimedia.org/T171758#3475312 (mmodell) Relatedly, phabricator had some work done on git-lfs support upstream, however, it seems to be undocumented and probably unfinished (it was...
[00:30:33] !log phabricator update finished (a while ago, I forgot to log it)
[00:30:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:33:09] Operations, netops: Allow syslog (-tls) from both wezen and lithium in labs - https://phabricator.wikimedia.org/T177820#3671443 (ayounsi) Routers and jnt updated.
[00:33:17] Operations, netops: Allow syslog (-tls) from both wezen and lithium in labs - https://phabricator.wikimedia.org/T177820#3678472 (ayounsi) Open>Resolved a:ayounsi
[00:40:07] bd808: okay, I do have a need to plumb the beta logs. I can’t read deployment-sca03.deployment-prep.eqiad.wmflabs:/srv/log/ores/main.log due to silly perms.
[00:40:23] The scary part is that production doesn’t seem to have any special plumbling.
[00:41:19] awight: do you have sudo in deployment-prep? If not, do you want it?
[00:41:21] wmgMonologChannels doesn’t seem to be the panacaea I’m looking for.
[00:41:25] SURE
[00:41:35] actually this is not on tin AFAIK
[00:41:42] It’s on sca03-beta
[00:42:03] PROBLEM - IPMI Sensor Status on cp4026 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical]
[00:42:15] right. the uwsgi logs are local to the deploy
[00:44:41] Just mentioning cos I think I would need sudo on sca03-beta to read these.
[00:44:59] or just the www-data group, either way
[00:45:16] I’m cross-eyed from debugging, don’t mind me.
[00:45:23] awight: It looks like the current project policy is "all members have sudo on all instances"
[00:45:29] ahahaha
[00:45:37] so you should be able to sudo less ...
[00:45:38] * awight levels wand
[00:46:35] there was a time when this was restricted, but I remember now that g.reg-g and I got that sorted :)
[00:46:42] bd808: Successfully sudoing. Thanks for the info!
[00:47:06] awight: first rule of debugging: when in doubt, ask for more rights :)
[00:47:07] [good thing too, this looks messy: no python application found, check your startup logs for errors]
[00:47:30] absolutely! And for random big favors that keep ops busy and out of your business
[00:48:15] as a manger of roots... I can neither confirm or deny that sentiment
[01:03:26] I already asked for +2 on operations/puppet, afaikt it didn't do much to keep ops busy, except busy laughing.
[01:22:29] twentyafterfour: nice one, I’ll be sure to leave one of those in the burning dumpster nonetheless.
[01:22:52] twentyafterfour: Hey uh, any idea why scap-beta would be doing this to me?
[01:22:55] > 01:03:14 Unable to find keyholder key for deploy_service
[01:23:10] Files in submodules aren’t being deployed.
[02:02:40] awight|afk: ywah ... I think the keyholder_key needs to be specified in scap config now
[02:12:56] hmmm weird. I can't see any reason it wouldn't be finding the keyholder_ key for deploy_service on beta deployment-tin
[02:15:53] It’s pretty consistent, at least the last 3 times.
[02:16:45] oh. keyholder_key doesn’t appear in any of my scap config.
[02:16:55] I’ll try to copy from another repo...
[02:18:30] Operations, MediaWiki-Platform-Team, TechCom-RfC, HHVM, NewPHP: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3678552 (alex-mashin) Is there a reason to migrate only to PHP 7.0 and not to PHP 7.1 of even 7.2, as it is going to be released soon? Also, certain...
[02:18:35] you probably need to specify keyholder_key or deploy_user
[02:20:12] Looks like restbase has the same error in its scap logs.
[02:20:17] Aha, thanks! deploy_user sounds like the ticket.
[02:20:52] or uh… parsoid/scap/scap.cfg has the line “keyholder_key: deploy_service"
[02:21:48] That was it. I’ll file a patch.
[02:27:11] https://gerrit.wikimedia.org/r/383766 fwiw
[02:28:11] * awight scowls. Still nothing in the submodules.
[02:30:18] The scap.cfg change wasn’t propagated either.
[02:32:38] .cfg change was livehacked actually, rather than through git, so maybe that’s not surprising that it wasn’t fetched by the deployment target.
[02:34:27] Spooky: the deployment process seems to rewrite .gitmodules
[02:37:27] The scap log shows something interesting, P6108
[02:37:30] https://phabricator.wikimedia.org/P6108
[02:38:02] Fetch from: http://deployment-tin.deployment-prep.eqiad.wmflabs/ores/deploy/.git \n Revision directory already exists (use --force to override)
[02:38:29] oh. I think I get it, it didn’t deploy right the first time, and scap is unwilling to re-deploy at the same revision.
[02:41:18] * awight abuses sudo
[02:41:52] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.2) (duration: 15m 39s)
[02:42:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:44:21] where is the repo of the scap package
[02:44:45] mutante: it's on phabricator I believe
[02:44:58] \o/ twentyafterfour success. Thanks for the clues!
[02:45:00] https://phabricator.wikimedia.org/source/scap.git ?
[02:45:14] legoktm: ok, thanks, i think that's it
[03:03:08] Operations, Scap: scap should not pull in HHVM on stretch hosts using PHP7 - https://phabricator.wikimedia.org/T178039#3678578 (Dzahn)
[03:06:40] Operations, Scap: scap should not pull in HHVM on stretch hosts using PHP7 - https://phabricator.wikimedia.org/T178039#3678591 (Dzahn)
[03:10:10] Operations, Scap: scap should not pull in HHVM on stretch hosts using PHP7 - https://phabricator.wikimedia.org/T178039#3678593 (Dzahn) 23:00 < MaxSem> uhh, why wold it depend on PHP? Yea, why?
[03:12:11] Operations, Scap: scap should not pull in HHVM on stretch hosts using PHP7 - https://phabricator.wikimedia.org/T178039#3678595 (demon) PHP should probably be a Suggests, not a Depends. It's only used by the master and only used for linting. This is a packaging issue -- easily fixed.
[03:18:00] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.3) (duration: 15m 57s)
[03:18:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:18] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Oct 12 03:25:17 UTC 2017 (duration 7m 17s)
[03:25:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:30:13] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 784.97 seconds
[04:12:57] (CR) KartikMistry: "Scheduled deployed on 16th October: https://wikitech.wikimedia.org/wiki/Deployments#Monday.2C.C2.A0October.C2.A016" [mediawiki-config] - https://gerrit.wikimedia.org/r/383357 (https://phabricator.wikimedia.org/T177836) (owner: KartikMistry)
[05:22:43] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 255.24 seconds
[05:41:17] (PS3) Giuseppe Lavagetto: Rakefile: brown paper bag fix [puppet] - https://gerrit.wikimedia.org/r/383620
[05:42:28] (CR) Giuseppe Lavagetto: [C: 2] Rakefile: brown paper bag fix [puppet] - https://gerrit.wikimedia.org/r/383620 (owner: Giuseppe Lavagetto)
[06:26:27] jouncebot: next
[06:26:27] In 1 hour(s) and 33 minute(s): ores_classification table cleanup (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171012T0800)
[07:11:06] (Abandoned) Thiemo Mättig (WMDE): remove unused injectrecentChanges option [mediawiki-config] - https://gerrit.wikimedia.org/r/371067 (owner: Thiemo Mättig (WMDE))
[07:12:25] (Abandoned) Thiemo Mättig (WMDE): Remove pointless showExternalRecentChanges option.
[mediawiki-config] - https://gerrit.wikimedia.org/r/371069 (owner: Thiemo Mättig (WMDE))
[07:12:50] Operations, Scap: scap should not pull in HHVM on stretch hosts using PHP7 - https://phabricator.wikimedia.org/T178039#3678740 (MoritzMuehlenhoff) If the scap package itself doesn't use PHP, it should not depend/recommend/suggest it; if the person using scap needs PHP for some workflow it should be pulle...
[07:18:40] !log purging html5-misnesting linter entries from database on terbium (T178040)
[07:18:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:18:49] T178040: Purge html5-misnesting tags from the database - https://phabricator.wikimedia.org/T178040
[07:24:51] !log installing gnupg2 update from stretch 9.2 point release
[07:24:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:26:25] Operations, User-fgiunchedi: Integrate stretch 9.2 point release - https://phabricator.wikimedia.org/T177739#3678747 (MoritzMuehlenhoff) None of the packages removed in 9.2 were present in our environment. These are fully rolled out: apt bind9 db5.3 dbus dns-root-data expect samba vim whois
[07:40:26] !log installing at-spi2-core update from stretch 9.2 point release
[07:40:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:34] (CR) Muehlenhoff: [C: 2] Use readline in generate-debdeploy-spec [debs/debdeploy] - https://gerrit.wikimedia.org/r/382689 (owner: Muehlenhoff)
[07:51:09] !log reinitialize cassandra on maps-test2004 to test vector tiles - T153282
[07:51:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:51:17] T153282: [epic] Migrate to a new vector tile structure - https://phabricator.wikimedia.org/T153282
[08:00:04] Amir1: Your horoscope predicts another unfortunate ores_classification table cleanup deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171012T0800).
[08:00:04] No GERRIT patches in the queue for this window AFAICS.
[08:00:18] :))))
[08:00:40] The person writing these stuff deserves an Academy award
[08:02:58] !log starting cleanup of ores_classification in wikidatawiki (T159753)
[08:03:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:03:04] T159753: Concerns about ores_classification table size on enwiki - https://phabricator.wikimedia.org/T159753
[08:04:05] (Abandoned) Elukey: role::analytics_cluster::druid::worker: introduce tlsproxy for druid [puppet] - https://gerrit.wikimedia.org/r/379533 (https://phabricator.wikimedia.org/T176223) (owner: Elukey)
[08:25:32] !log installing gnutls28 update from stretch 9.2 point release
[08:25:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:27] (CR) Elukey: Set up separate druid public-eqiad cluster. (2 comments) [puppet] - https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[08:36:12] (PS1) DCausse: Add additional namespaces to search results for bnwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/383794 (https://phabricator.wikimedia.org/T178041)
[08:46:41] Operations, Patch-For-Review, Release-Engineering-Team (Watching / External): Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#3678875 (hashar)
[08:46:43] Operations, Continuous-Integration-Infrastructure, Patch-For-Review, Release-Engineering-Team (Backlog): Secondary production Jenkins for CI - https://phabricator.wikimedia.org/T150771#3678873 (hashar) Open>stalled I would like to ultimately have the Jenkins in active/active. I miss time...
[08:47:01] Operations, Continuous-Integration-Infrastructure, Release-Engineering-Team, Jenkins: Upgrade ci ssh key to ecdsa - https://phabricator.wikimedia.org/T177826#3678877 (hashar) p:Triage>Low
[08:48:15] Operations, Continuous-Integration-Infrastructure, Release-Engineering-Team: CI is down (jenkins) - https://phabricator.wikimedia.org/T177174#3678879 (hashar) Open>declined It happens from time to time. We would need a thread dump to figure out what is going exactly.
[09:01:20] !log installing ncurses security updates
[09:01:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:37] !log installing nettle update from stretch 9.2 point release
[09:06:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:51] https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page gives error message
[09:20:40] Elitre: Caused by https://gerrit.wikimedia.org/r/383744
[09:21:14] legoktm: Reedy ^
[09:21:43] thanks.
[09:23:27] Operations, Continuous-Integration-Infrastructure, Release-Engineering-Team: CI is down (jenkins) - https://phabricator.wikimedia.org/T177174#3649345 (Gehel) More than a thread dump, we would need GC logs. If you are interested in digging into this further, I would suggest the following config to the...
[09:25:49] (CR) Alexandros Kosiaris: Set up separate druid public-eqiad cluster. (1 comment) [puppet] - https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[09:31:49] (CR) Jcrespo: "This is ok to me, but my aim is to make the default not critical, as stated on: T177782, the problem is that other ops opposed to that." [puppet] - https://gerrit.wikimedia.org/r/383713 (https://phabricator.wikimedia.org/T178008) (owner: Dzahn)
[09:32:15] (PS15) Elukey: Set up separate druid public-eqiad cluster.
[puppet] - https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[09:34:09] Operations, wikidiff2, User-Addshore, WMDE-QWERTY-Team-Board: Update and use php-wikidiff2 to 1.5 in production - https://phabricator.wikimedia.org/T177891#3678968 (Tobi_WMDE_SW) @MoritzMuehlenhoff T176485 has just been merged, so it would be super great if we could also include https://gerrit.wi...
[09:37:04] (PS1) Ema: cp3008: upgrade to Varnish 5 [puppet] - https://gerrit.wikimedia.org/r/383803 (https://phabricator.wikimedia.org/T177233)
[09:38:22] (CR) Ema: [C: 2] cp3008: upgrade to Varnish 5 [puppet] - https://gerrit.wikimedia.org/r/383803 (https://phabricator.wikimedia.org/T177233) (owner: Ema)
[09:40:08] Operations, wikidiff2, User-Addshore, WMDE-QWERTY-Team-Board: Update and use php-wikidiff2 to 1.5 in production - https://phabricator.wikimedia.org/T177891#3678991 (MoritzMuehlenhoff) >>! In T177891#3678968, @Tobi_WMDE_SW wrote: > @MoritzMuehlenhoff T176485 has just been merged, so it would be su...
[09:40:29] !log upgrading cp3008 to Varnish 5 T177233
[09:40:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:34] T177233: Upgrade cache_misc to Varnish 5 - https://phabricator.wikimedia.org/T177233
[09:40:53] Operations, wikidiff2, User-Addshore, WMDE-QWERTY-Team-Board: Update and use php-wikidiff2 to 1.5 in production - https://phabricator.wikimedia.org/T177891#3678994 (Tobi_WMDE_SW) @MoritzMuehlenhoff sounds great!
Thanks
[09:43:22] (CR) Ema: [C: 2] instrumentation: pools with one server are not misconfigured [debs/pybal] - https://gerrit.wikimedia.org/r/383591 (https://phabricator.wikimedia.org/T177815) (owner: Ema)
[09:43:34] (PS1) Ema: instrumentation: pools with one server are not misconfigured [debs/pybal] (1.14) - https://gerrit.wikimedia.org/r/383804 (https://phabricator.wikimedia.org/T177815)
[09:43:35] not sure if known yet: on commons the database is locked from time to time & lags
[09:44:40] (CR) Elukey: "New pcc: https://puppet-compiler.wmflabs.org/compiler02/8293/" [puppet] - https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[09:45:43] (CR) Ema: [V: 2 C: 2] instrumentation: pools with one server are not misconfigured [debs/pybal] (1.14) - https://gerrit.wikimedia.org/r/383804 (https://phabricator.wikimedia.org/T177815) (owner: Ema)
[09:49:44] (PS1) Ema: 1.14.1: pools with one server are not misconfigured [debs/pybal] - https://gerrit.wikimedia.org/r/383805 (https://phabricator.wikimedia.org/T177815)
[09:51:04] (PS1) Ema: 1.14.1: pools with one server are not misconfigured [debs/pybal] (1.14) - https://gerrit.wikimedia.org/r/383807 (https://phabricator.wikimedia.org/T177815)
[09:58:01] (CR) Ema: [V: 2 C: 2] 1.14.1: pools with one server are not misconfigured [debs/pybal] - https://gerrit.wikimedia.org/r/383805 (https://phabricator.wikimedia.org/T177815) (owner: Ema)
[09:58:13] (CR) Ema: [V: 2 C: 2] 1.14.1: pools with one server are not misconfigured [debs/pybal] (1.14) - https://gerrit.wikimedia.org/r/383807 (https://phabricator.wikimedia.org/T177815) (owner: Ema)
[10:01:07] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current), User-Ladsgroup: Review and fix file handle management in worker and celery processes - https://phabricator.wikimedia.org/T174402#3679022 (akosiaris) Given that up to now we are configuring the
PER PROCESS file limit, can I a...
[10:02:26] !log pybal 1.14.1 uploaded to apt.w.o
[10:02:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:11] !log lvs1005: upgrade pybal to 1.14.1 T177815
[10:03:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:18] T177815: Alerts on LVS services with one single realserver - https://phabricator.wikimedia.org/T177815
[10:04:04] RECOVERY - PyBal backends health check on lvs1005 is OK: PYBAL OK - All pools are healthy
[10:04:16] good boy
[10:12:59] (PS1) Muehlenhoff: Update debian/changelog [debs/debdeploy] - https://gerrit.wikimedia.org/r/383811
[10:16:14] (PS1) Muehlenhoff: Update changelog [debs/debdeploy] - https://gerrit.wikimedia.org/r/383813
[10:18:30] (CR) Thiemo Mättig (WMDE): [C: -1] Add configuration for statement indexing for Wikidata (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/383464 (https://phabricator.wikimedia.org/T175199) (owner: Smalyshev)
[10:27:01] hoo: was your response above also related to "on commons the database is locked from time to time & lags"?
[10:40:14] (CR) Muehlenhoff: [C: 2] Update changelog [debs/debdeploy] - https://gerrit.wikimedia.org/r/383813 (owner: Muehlenhoff)
[10:49:25] Operations, Continuous-Integration-Scaling, Nodepool, Release-Engineering-Team (Backlog), WorkType-NewFunctionality: Backport python-shade from debian/testing to jessie-wikimedia - https://phabricator.wikimedia.org/T107267#3679110 (hashar) Open>declined We are most probably never goin...
[11:15:13] (PS1) Muehlenhoff: Add a new component thirdparty/hwraid [puppet] - https://gerrit.wikimedia.org/r/383821 (https://phabricator.wikimedia.org/T158583)
[11:15:15] (PS1) Muehlenhoff: Adapt synchronisation of raid blobs to use thirdparty/hwraid [puppet] - https://gerrit.wikimedia.org/r/383822 (https://phabricator.wikimedia.org/T158583)
[11:27:34] !log finished cleanup of ores_classification in wikidatawiki, still needs more work (T159753)
[11:27:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:41] T159753: Concerns about ores_classification table size on enwiki - https://phabricator.wikimedia.org/T159753
[11:28:11] (PS1) Muehlenhoff: Add reference to Phab task [puppet] - https://gerrit.wikimedia.org/r/383823
[11:33:10] (CR) Muehlenhoff: [C: 2] Add reference to Phab task [puppet] - https://gerrit.wikimedia.org/r/383823 (owner: Muehlenhoff)
[11:41:23] Operations, DBA, Patch-For-Review: Create less overhead on bacula jobs when dumping production databases - https://phabricator.wikimedia.org/T162789#3679191 (jcrespo) Backups of local hosts worked, but x1 only backed up %wik% databases, we have to get a full list of databases there and add it to dump...
[11:43:35] (PS1) Ema: cp3010: upgrade to Varnish 5 [puppet] - https://gerrit.wikimedia.org/r/383824 (https://phabricator.wikimedia.org/T177233)
[11:44:34] (CR) Ema: [C: 2] cp3010: upgrade to Varnish 5 [puppet] - https://gerrit.wikimedia.org/r/383824 (https://phabricator.wikimedia.org/T177233) (owner: Ema)
[11:46:21] !log upgrading cp3010 to Varnish 5 T177233
[11:46:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:46:28] T177233: Upgrade cache_misc to Varnish 5 - https://phabricator.wikimedia.org/T177233
[12:00:42] (PS16) Elukey: Set up separate druid public-eqiad cluster.
[puppet] - https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[12:06:59] (CR) Elukey: [C: 2] Set up separate druid public-eqiad cluster. [puppet] - https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[12:08:06] all right I just merged --^ and now I am changing a lot of hadoop ferm rules etc..
[12:08:25] this should be completely fine but any analytics alarm that might fire is due to me probably
[12:12:48] I don't see hashar around, and I am not sure who is in charge of beta cluster, but looks like it's down?
[12:13:08] https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page returns "MediaWiki internal error."
[12:15:56] works for me...
[12:16:22] eddiegp: argh, let me try again
[12:16:33] zeljkof: No, you're right
[12:16:42] still broken for me :(
[12:16:46] Seems I've got a cached version of the main page
[12:16:53] Other pages don't work
[12:17:01] yeah, it broke when I clicked login page, that is right
[12:18:16] reported: https://phabricator.wikimedia.org/T178062
[12:18:39] Should have a look what was merged in master in the last hours/minutes.
[12:19:07] looking
[12:19:31] but who knows where the problem is, maybe in core, or in config, or extension...
[12:19:52] the error does mention AuthManager
[12:23:18] andre__: do you know who to ping to get https://phabricator.wikimedia.org/T178062 fixed?
[12:23:34] (PS3) Muehlenhoff: Provide a reboot wrapper for Cumin clients [puppet] - https://gerrit.wikimedia.org/r/383316
[12:24:01] zeljkof, somewhere between Ops and Releng, I'd say?
:)
[12:24:07] uh wait
[12:24:49] andre__: well, as far as I can see releng (hashar) is not around, not sure who in ops to ping
[12:25:00] whoever is on duty :D
[12:25:09] see topic
[12:25:16] ah, that's what it's for :)
[12:25:20] thanks
[12:25:41] XioNoX: could you please take a look at https://phabricator.wikimedia.org/T178062
[12:26:22] zeljkof, I have a suspicion what happened, let me comment on the task and ping Reedy :P
[12:26:36] andre__: please do :)
[12:26:41] regression of https://phabricator.wikimedia.org/T178033 maybe?
[12:27:31] That's what I though from looking at gerrits latest merges, a lot of patches from Reedy relating to this got merged this morning.
[12:28:32] there is a comment at https://gerrit.wikimedia.org/r/#/c/383744/
[12:28:47] "This knocked beta down" :'D
[12:28:51] Which is the latest one related to auth merged
[12:31:10] I'd say let's revert that one for now.
[12:32:53] Revert is at https://gerrit.wikimedia.org/r/#/c/383827/
[12:33:26] (CR) Muehlenhoff: [C: 2] Provide a reboot wrapper for Cumin clients [puppet] - https://gerrit.wikimedia.org/r/383316 (owner: Muehlenhoff)
[12:37:53] PROBLEM - puppet last run on pc1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures.
Failed resources (up to 3 shown): File[/usr/local/sbin/reboot-host]
[12:42:54] RECOVERY - puppet last run on pc1006 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[12:45:32] (PS3) Gehel: service::node - add a defined() guard on git deployment [puppet] - https://gerrit.wikimedia.org/r/347855
[12:51:12] (Abandoned) BBlack: geodns: US+CA prefer codfw over eqiad [dns] - https://gerrit.wikimedia.org/r/383402 (owner: BBlack)
[12:51:25] (PS2) BBlack: browsersec: bump to 29% 2017-10-12 [puppet] - https://gerrit.wikimedia.org/r/376315 (https://phabricator.wikimedia.org/T163251)
[12:52:01] (CR) BBlack: [C: 2] browsersec: bump to 29% 2017-10-12 [puppet] - https://gerrit.wikimedia.org/r/376315 (https://phabricator.wikimedia.org/T163251) (owner: BBlack)
[12:56:58] !log uploaded librsvg 2.40.18 for jessie-wikimedia to apt.wikimedia.org
[12:57:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:03] Operations, Commons, Thumbor, media-storage, Performance-Team (Radar): Jessie rsvg/cairo can't render specific SVG file on Commons - https://phabricator.wikimedia.org/T170628#3679486 (MoritzMuehlenhoff)
[13:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I, the Bot under the Fountain, allow thee, The Deployer, to do European Mid-day SWAT(Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171012T1300).
[13:00:06] Amir1: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:01:47] I can SWAT today
[13:01:51] that revert isn't on an ops repo, I'm not even sure we have merge rights there
[13:01:55] eddiegp: ^
[13:03:37] Amir1: around for SWAT? want to deploy yourself, or should I?
[13:04:30] (CR) Ottomata: "Hm, as Hadoop clients, I think druid nodes should be able to talk to all DataNodes so they can be a normal Hadoop client :) BUt yaaa thank" [puppet] - https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[13:06:01] (PS3) Zfilipin: Remove deprecated config variable for Wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/381642 (https://phabricator.wikimedia.org/T129475) (owner: Ladsgroup)
[13:06:54] looks like Amir1 is not around, nothing else for swat
[13:07:07] !log EU SWAT finished
[13:07:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:07:34] Oh yeah, just saw this.
[13:07:51] I'd like to have https://gerrit.wikimedia.org/r/#/c/383827/ merged.
[13:08:12] eddiegp: looks like I have +2 on that repo, should I merge it?
[13:08:25] Yes please.
[13:08:47] Shouldn't leave beta in a broken state.
[13:08:53] sure, I hesitated since I am not familiar with the code, but let's see if it fixes the problem
[13:09:39] +2d
[13:10:54] bblack: BTW, ldap/ops is an owner and thus has merge rights on all repos in gerrit (sf. https://gerrit.wikimedia.org/r/#/admin/projects/All-Projects,access "refs/* Owner: ldap/ops")
[13:11:24] zeljkof: Thanks :)
[13:12:17] Will need a few minutes to reach beta I guess.
[13:12:59] !log rolling restart of logstash to pick up thumbor logs (T150734)
[13:13:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:13:06] T150734: Make Thumbor logs available in ELK - https://phabricator.wikimedia.org/T150734
[13:16:51] eddiegp, XioNoX: looks like 383827 did not help, beta is still broken
[13:17:55] eddiegp: that's probably true, but I think that's meant for emergencies. In general, we're not going to review or make merge decision on non-emergency MediaWiki commits.
[13:18:53] bblack: yeah, that was just re: "not even sure we have merge rights there" ;)
[13:19:11] yeah, I've never exercised them AFAIK, in any case :)
[13:21:47] zeljkof: Right :/ But the error message changed! It still is "expected PrimaryAuthenticationProvider" but it is now "got AntiSpoofPreAuthenticationProvider" and not "got TitleBlacklistSomethingAuthentication" any more.
[13:22:04] progress :)
[13:22:13] Don't know if that means we've reverted the wrong commit or if we should look for another one ...
[13:23:47] I mean of course we could start reverting https://gerrit.wikimedia.org/r/#/c/383731/ now (the AntiSpoof one) but I'm not sure if that's going to help or just bring up the next error message :D
[13:26:30] zeljkof: ohh, sorry, my paperwork with Germans took way more than expected
[13:27:13] Amir1: there is still time, want to do the deploy yourself? or should I?
[13:27:26] as you wish
[13:27:40] it's deprecated so should not make any problems
[13:28:17] What did you expect from paperwork with germans? We're known for our bureaucracy for a good reason :P
[13:28:56] Amir1: if you want to deploy, go ahead :) if you need help, I can deploy
[13:28:56] making a bank account, they gave a me huge booklet with lots of contracts to sign
[13:29:28] (PS1) Elukey: role::an_cluster::database::meta: allow the druid pulic cluster to use mysql [puppet] - https://gerrit.wikimedia.org/r/383833 (https://phabricator.wikimedia.org/T176223)
[13:29:28] Amir1: deprecation just broke beta cluster ;)
[13:29:35] zeljkof: let me deploy and you go on with your work
[13:29:46] :)
[13:29:53] Amir1: great, I am around if you need me
[13:30:15] Operations, Traffic, Patch-For-Review, Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3679599 (BBlack) So, thinking ahead past `cache_misc` and assuming that's successful, probably the next target should `cache_upload` (lower complexity than text, and i...
[13:31:35] Thanks [13:32:12] (CR) Elukey: [C: 2] role::an_cluster::database::meta: allow the druid pulic cluster to use mysql [puppet] - https://gerrit.wikimedia.org/r/383833 (https://phabricator.wikimedia.org/T176223) (owner: Elukey) [13:33:21] (CR) Ladsgroup: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/381642 (https://phabricator.wikimedia.org/T129475) (owner: Ladsgroup) [13:33:23] !log updating thumbor1001 to librsvg 2.40.18 [13:33:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:34:33] (Merged) jenkins-bot: Remove deprecated config variable for Wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/381642 (https://phabricator.wikimedia.org/T129475) (owner: Ladsgroup) [13:34:48] (CR) Elukey: "> Hm, as Hadoop clients, I think druid nodes should be able to talk" [puppet] - https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata) [13:35:06] (PS2) BBlack: WMF-Last-Access-Global: not for wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/383353 (https://phabricator.wikimedia.org/T174640) [13:36:10] (CR) jenkins-bot: Remove deprecated config variable for Wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/381642 (https://phabricator.wikimedia.org/T129475) (owner: Ladsgroup) [13:37:45] it's in mwdebug1002, testing [13:37:59] (CR) BBlack: [C: 2] WMF-Last-Access-Global: not for wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/383353 (https://phabricator.wikimedia.org/T174640) (owner: BBlack) [13:38:16] confirming it works fine [13:38:19] going live [13:40:58] !log ladsgroup@tin Synchronized wmf-config/Wikibase.php: SWAT: Remove deprecated config variable for Wikibase (T129475) (duration: 01m 02s) [13:41:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:41:05] T129475: [Task] Remove obsolete beta feature settings - https://phabricator.wikimedia.org/T129475 
[13:41:30] the deployment is done, will keep monitoring fatalmonitor just in case [13:43:33] (CR) Elukey: "> > Hm, as Hadoop clients, I think druid nodes should be able to talk" [puppet] - https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata) [13:45:02] !log updating remaining thumbor servers to librsvg 2.40.18 [13:45:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:48:37] !log deployed the new Analytics Public Druid cluster - T176223 [13:48:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:48:45] T176223: Create Druid public cluster such AQS can query druid public data - https://phabricator.wikimedia.org/T176223 [13:51:44] (PS1) Ema: pybal: use Monitoring::Plugin in check_pybal [puppet] - https://gerrit.wikimedia.org/r/383834 (https://phabricator.wikimedia.org/T177961) [13:54:07] !log updating image scalers to librsvg 2.40.18 [13:54:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:01:32] Operations, Commons, Thumbor, media-storage, Performance-Team (Radar): Jessie rsvg/cairo can't render specific SVG file on Commons - https://phabricator.wikimedia.org/T170628#3679700 (MoritzMuehlenhoff) stalled>Resolved a:MoritzMuehlenhoff The new 2.4.18 backport has been rolled o... [14:12:43] Operations, Commons, Thumbor, media-storage, Performance-Team (Radar): Jessie rsvg/cairo can't render specific SVG file on Commons - https://phabricator.wikimedia.org/T170628#3679730 (Gilles) Resolved>Open The upgrade hasn't fixed the problem. It seems like converting the file without... [14:21:53] Operations, MediaWiki-Platform-Team, TechCom-RfC, HHVM, NewPHP: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3679755 (Anomie) >>! 
In T176370#3678552, @alex-mashin wrote: > Is there a reason to migrate only to PHP 7.0 and not to PHP 7.1 of even 7.2, as it is g... [14:23:41] Operations, MediaWiki-Platform-Team, TechCom-RfC, HHVM, NewPHP: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3679757 (Reedy) >>! In T176370#3679755, @Anomie wrote: >>>! In T176370#3678552, @alex-mashin wrote: >> Is there a reason to migrate only to PHP 7.0 an... [14:24:31] (PS1) Gehel: maps: introduce the base requirements for vector tiles and cleartables [puppet] - https://gerrit.wikimedia.org/r/383841 (https://phabricator.wikimedia.org/T157613) [14:24:38] (Abandoned) Gehel: [WIP] maps: move to vector tiles and cleartables [puppet] - https://gerrit.wikimedia.org/r/378245 (https://phabricator.wikimedia.org/T157613) (owner: Gehel) [14:28:53] (PS1) Hashar: Support --bare in git::clone() [puppet] - https://gerrit.wikimedia.org/r/383842 [14:28:55] (PS1) Hashar: ci: provide bare copy of puppet.git on Docker slaves [puppet] - https://gerrit.wikimedia.org/r/383843 [14:29:02] !log rebooting kubernetes masters to new 4.9.51 kernel [14:29:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:33:29] (PS1) Elukey: role::analytics_cluster::hadoop::ferm::namenode: disable httpfs access [puppet] - https://gerrit.wikimedia.org/r/383844 [14:34:10] (PS2) Hashar: ci: provide bare copy of puppet.git on Docker slaves [puppet] - https://gerrit.wikimedia.org/r/383843 [14:36:51] (PS2) Gehel: maps: introduce the base requirements for vector tiles and cleartables [puppet] - https://gerrit.wikimedia.org/r/383841 (https://phabricator.wikimedia.org/T157613) [14:37:40] (CR) Muehlenhoff: [C: 1] role::analytics_cluster::hadoop::ferm::namenode: disable httpfs access [puppet] - https://gerrit.wikimedia.org/r/383844 (owner: Elukey) [14:39:53] (CR) Elukey: [C: 2] role::analytics_cluster::hadoop::ferm::namenode: disable httpfs 
access [puppet] - https://gerrit.wikimedia.org/r/383844 (owner: Elukey) [14:41:38] !log rebooting kubernetes staging servers to new 4.9.51 kernel [14:41:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:43:42] (PS3) Eevans: cassandra: move machines from restbase to restbase_ng cluster [puppet] - https://gerrit.wikimedia.org/r/382506 (https://phabricator.wikimedia.org/T177501) [14:47:01] !log rebooting kubernetes workers to new 4.9.51 kernel [14:47:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:52:09] !log netmon1002 (librenms, rancid) - rebooting for kernel upgrade [14:52:13] (CR) Gehel: "Puppet compiler looks happy: https://puppet-compiler.wmflabs.org/compiler02/8299/" [puppet] - https://gerrit.wikimedia.org/r/383841 (https://phabricator.wikimedia.org/T157613) (owner: Gehel) [14:52:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:05] (PS2) Hashar: Support --bare in git::clone() [puppet] - https://gerrit.wikimedia.org/r/383842 [14:56:14] RECOVERY - Check systemd state on kubernetes1004 is OK: OK - running: The system is fully operational [14:57:20] !log netmon1002 - arming keyholder for rancid deploy [14:57:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:58:10] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current), User-Ladsgroup: Review and fix file handle management in worker and celery processes - https://phabricator.wikimedia.org/T174402#3679834 (Halfak) @akosiaris, see my notes in T177036 [15:03:24] moritzm: after you've restarted logstash RESTBase logs completely disappeared from there at the same time [15:04:49] (PS3) Hashar: Support --bare in git::clone() [puppet] - https://gerrit.wikimedia.org/r/383842 (https://phabricator.wikimedia.org/T178076) [15:05:25] (PS3) Hashar: ci: provide bare copy of puppet.git on Docker slaves [puppet] - 
https://gerrit.wikimedia.org/r/383843 (https://phabricator.wikimedia.org/T178076) [15:07:22] (CR) Hashar: "Cherry picked on CI puppet master. The follow up change https://gerrit.wikimedia.org/r/#/c/383843/ adds operations/puppet.git on the CI do" [puppet] - https://gerrit.wikimedia.org/r/383842 (https://phabricator.wikimedia.org/T178076) (owner: Hashar) [15:07:29] Pchelolo: no idea about that, I only restarted it [15:07:55] best open a task about it [15:08:21] (CR) Hashar: [V: 1 C: 1] "Cherry picked on the CI puppet master and that seems to provide the bare clone of puppet.git" [puppet] - https://gerrit.wikimedia.org/r/383843 (https://phabricator.wikimedia.org/T178076) (owner: Hashar) [15:08:27] moritzm: ye, that's weird indeed, I've just noticed they've dissappeared and that the time matched with the restart precisely [15:08:32] will create a task [15:11:59] !log gerrit2001 - rebooting for kernel upgrade [15:12:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:13:09] Operations, Commons, Thumbor, media-storage, Performance-Team (Radar): Jessie rsvg/cairo can't render specific SVG file on Commons - https://phabricator.wikimedia.org/T170628#3679897 (Gilles) It's specifically resizing it takes issue with. [15:17:27] !log mobrovac@tin Started restart [electron-render/deploy@8dd5f13]: Electron hanging - T174916 [15:17:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:17:33] T174916: electron/pdfrender hangs - https://phabricator.wikimedia.org/T174916 [15:23:11] Operations, Scap: scap should not pull in HHVM on stretch hosts using PHP7 - https://phabricator.wikimedia.org/T178039#3679940 (bd808) Scap [[https://phabricator.wikimedia.org/source/scap/browse/master/scap/tasks.py;96c10d0176573f19ce3beb86e24bba7ffdb29893$144-165|shells out to PHP]] for one particular w... 
[15:27:53] (PS1) Andrew Bogott: compiler-update-facts: restore optional use of PUPPET_MASTER env [puppet] - https://gerrit.wikimedia.org/r/383857 (https://phabricator.wikimedia.org/T97081) [15:28:38] (PS1) Dmaza: Enable $wgAbuseFilterProfile on ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/383858 (https://phabricator.wikimedia.org/T177641) [15:30:11] (PS2) Dmaza: Enable $wgAbuseFilterProfile on ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/383858 (https://phabricator.wikimedia.org/T177641) [15:45:23] (Draft1) Andrew Bogott: TESTING: this is a patch that introduces an intentional broken diff [puppet] - https://gerrit.wikimedia.org/r/383860 [15:58:20] btw, There are complaints about slave lag causing read only mode on commons this morning at COM:VP [15:59:14] RECOVERY - puppet last run on netmon1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [16:00:04] godog, moritzm, and _joe_: #bothumor My software never has bugs. It just develops random features. Rise for Puppet SWAT(Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171012T1600). [16:00:04] No GERRIT patches in the queue for this window AFAICS. 
[16:00:51] * elukey dances [16:00:59] Whoever did the humor in jouncebot needs some serious wikilove [16:01:01] (PS2) Andrew Bogott: TESTING: this is a patch that introduces an intentional broken diff [puppet] - https://gerrit.wikimedia.org/r/383860 [16:04:19] (CR) Thcipriani: [C: 1] ci: provide bare copy of puppet.git on Docker slaves [puppet] - https://gerrit.wikimedia.org/r/383843 (https://phabricator.wikimedia.org/T178076) (owner: Hashar) [16:15:42] (Abandoned) Thcipriani: Deploy ocg with scap3 [puppet] - https://gerrit.wikimedia.org/r/378241 (https://phabricator.wikimedia.org/T129142) (owner: Thcipriani) [16:16:08] (Draft2) Hashar: spec: subject type is infered from dir structue [puppet] - https://gerrit.wikimedia.org/r/383870 [16:21:02] (CR) BryanDavis: [C: 2] Remove Schema:CommandInvocation EventLogging [software/tools-webservice] - https://gerrit.wikimedia.org/r/381713 (https://phabricator.wikimedia.org/T166712) (owner: BryanDavis) [16:21:45] (Merged) jenkins-bot: Remove Schema:CommandInvocation EventLogging [software/tools-webservice] - https://gerrit.wikimedia.org/r/381713 (https://phabricator.wikimedia.org/T166712) (owner: BryanDavis) [16:22:07] (PS3) Dzahn: Phab: Allow aklapper to delete panels on dashboards [puppet] - https://gerrit.wikimedia.org/r/380959 (owner: Aklapper) [16:22:32] (CR) jerkins-bot: [V: -1] Phab: Allow aklapper to delete panels on dashboards [puppet] - https://gerrit.wikimedia.org/r/380959 (owner: Aklapper) [16:23:51] Operations, Scap: scap should not pull in HHVM on stretch hosts using PHP7 - https://phabricator.wikimedia.org/T178039#3680162 (thcipriani) Open>Resolved [16:24:44] Operations, Scap: scap should not pull in HHVM on stretch hosts using PHP7 - https://phabricator.wikimedia.org/T178039#3680177 (thcipriani) Resolved>Open Reopening until deployed. Landing the patch closed the task. 
[16:29:02] (PS2) BryanDavis: Extend webservice -h details [software/tools-webservice] - https://gerrit.wikimedia.org/r/383401 (owner: Merlijn van Deen) [16:33:18] (PS1) Chad: group2 to wmf.3 [mediawiki-config] - https://gerrit.wikimedia.org/r/383876 [16:33:20] (CR) Chad: [C: -2] group2 to wmf.3 [mediawiki-config] - https://gerrit.wikimedia.org/r/383876 (owner: Chad) [16:35:10] (CR) BryanDavis: [C: 2] Extend webservice -h details [software/tools-webservice] - https://gerrit.wikimedia.org/r/383401 (owner: Merlijn van Deen) [16:35:15] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current), User-Ladsgroup: Review and fix file handle management in worker and celery processes - https://phabricator.wikimedia.org/T174402#3680194 (awight) @akosiaris Good questions! I only just now found good [[ https://www.freedesk... [16:35:51] (Merged) jenkins-bot: Extend webservice -h details [software/tools-webservice] - https://gerrit.wikimedia.org/r/383401 (owner: Merlijn van Deen) [16:37:09] (Abandoned) BryanDavis: Explain scripts/webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/381296 (https://phabricator.wikimedia.org/T176018) (owner: Mridubhatnagar) [16:37:22] (Abandoned) BryanDavis: Understand the scripts/webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/380568 (https://phabricator.wikimedia.org/T176624) (owner: Sowjanyavemuri) [16:38:40] (PS1) BryanDavis: Bump debian package version [software/tools-webservice] - https://gerrit.wikimedia.org/r/383877 [16:38:50] (PS1) Cmjohnson: Removing remaining dns entries for db1022[3-4], netmont1001 [dns] - https://gerrit.wikimedia.org/r/383878 [16:39:38] (CR) BryanDavis: [C: 2] Bump debian package version [software/tools-webservice] - https://gerrit.wikimedia.org/r/383877 (owner: BryanDavis) [16:40:14] (Merged) jenkins-bot: Bump debian package version [software/tools-webservice] - 
https://gerrit.wikimedia.org/r/383877 (owner: BryanDavis) [16:42:39] Operations, ops-ulsfo: Check cp4026 power supply redundancy - https://phabricator.wikimedia.org/T178085#3680213 (herron) [16:43:19] (CR) Cmjohnson: [C: 2] Removing remaining dns entries for db1022[3-4], netmont1001 [dns] - https://gerrit.wikimedia.org/r/383878 (owner: Cmjohnson) [16:43:35] ACKNOWLEDGEMENT - IPMI Sensor Status on cp4026 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] Herron https://phabricator.wikimedia.org/T178085 [16:44:04] (PS1) Thcipriani: scap: upgrade to 3.7.1-1 [puppet] - https://gerrit.wikimedia.org/r/383879 (https://phabricator.wikimedia.org/T127762) [16:46:20] ACKNOWLEDGEMENT - Host lvs1009 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn HP sucks https://phabricator.wikimedia.org/rOPUPe4526a97ad2ba650ebcf0ab0e7b9e72036504990 [16:47:10] Operations, ops-eqiad, DBA: Decommission db1024 - https://phabricator.wikimedia.org/T164702#3680249 (Cmjohnson) [16:48:28] Operations, DBA, Patch-For-Review: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3680252 (Cmjohnson) [16:48:31] Operations, DBA, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3680253 (Cmjohnson) [16:48:33] Operations, ops-eqiad, DBA: Decommission db1024 - https://phabricator.wikimedia.org/T164702#3242621 (Cmjohnson) Open>Resolved done [16:49:16] Operations, Analytics: rack/setup/install furud.codfw.wmnet - https://phabricator.wikimedia.org/T176506#3680268 (herron) [16:49:18] Operations, Performance-Team: webpagetest-alerts: Difference in size authenticated - https://phabricator.wikimedia.org/T164209#3680270 (Dzahn) fyi, currently there is: CRITICAL: https://grafana.wikimedia.org/dashboard/db/webpagetest-alerts is alerting: Speed Index Internet 
Explorer Desktop [ALERT] alert... [16:49:20] Operations, ops-eqiad, DBA, Patch-For-Review: Decommission db1023 - https://phabricator.wikimedia.org/T166486#3680271 (Cmjohnson) [16:49:41] Operations: furud /mnt/2a 97% full - https://phabricator.wikimedia.org/T178087#3680257 (herron) [16:49:50] Krinkle: ^ an alert says there is an issue with speed for Internet Explorer [16:50:02] (PS2) Cmjohnson: Deleting mgmt entries for ms-be1001-12 & ms-fe1001-4 [dns] - https://gerrit.wikimedia.org/r/383628 [16:50:15] ACKNOWLEDGEMENT - Disk space on furud is CRITICAL: DISK CRITICAL - free space: /mnt/2a 1248862 MB (3% inode=96%): Herron https://phabricator.wikimedia.org/T178087 [16:50:18] should i make tickets for these? [16:50:32] i saw there is a component "webpagetest" in phab [16:50:38] and these are called webpagetest-alerts [16:51:07] (PS1) Ottomata: LVS for druid-public broker and overlord [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) [16:51:37] (CR) jerkins-bot: [V: -1] LVS for druid-public broker and overlord [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata) [16:52:44] (PS2) Ottomata: LVS for druid-public broker and overlord [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) [16:56:20] @seen phedenskog [16:56:20] mutante: phedenskog is in here, right now [16:56:57] oh :) hi, peter that question was for you then [16:57:29] Operations, User-fgiunchedi: Integrate stretch 9.2 point release - https://phabricator.wikimedia.org/T177739#3680277 (MoritzMuehlenhoff) These are fully rolled out: at-spi2-core gnupg2 gnutls28 libselinux ncurses nettle [17:00:04] gwicke, cscott, arlolra, subbu, halfak, and Amir1: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Services – Graphoid / Parsoid / OCG / Citoid / ORES deploy. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171012T1700). [17:00:04] No GERRIT patches in the queue for this window AFAICS. [17:00:14] no parsoid deploy today [17:00:24] No ORES yet. "soon" was a lie :( [17:01:10] halfak: so is the cake [17:01:36] :) [17:01:49] (CR) Cmjohnson: [C: 2] Deleting mgmt entries for ms-be1001-12 & ms-fe1001-4 [dns] - https://gerrit.wikimedia.org/r/383628 (owner: Cmjohnson) [17:06:34] bblack: lvs1007 - Failed resources (up to 3 shown): Exec[txqueuelen-eth3],Exec[txqueuelen-eth2]. known ? [17:06:35] Operations, DBA, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3680285 (Cmjohnson) [17:06:37] Operations, ops-eqiad, DBA, Patch-For-Review: Decommission db1023 - https://phabricator.wikimedia.org/T166486#3680284 (Cmjohnson) Open>Resolved [17:06:59] Operations, ops-eqiad, hardware-requests, monitoring: decom netmon1001 - https://phabricator.wikimedia.org/T171018#3680287 (Cmjohnson) [17:07:06] Operations, ops-eqiad, hardware-requests, monitoring: decom netmon1001 - https://phabricator.wikimedia.org/T171018#3450938 (Cmjohnson) Open>Resolved [17:07:16] Operations, OCG-General, Readers-Community-Engagement, Epic, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#3680293 (ovasileva) [17:09:00] mutante: yeah, sorry. a NIC was replaced and systemd renamed the replaced interfaces from eth2+eth3 to something like eth10+eth11, but it will go away on some future reinstall or whatever once hw issues are fixed [17:09:46] bblack: *nod* alright, np [17:10:19] lvs1009 - HP sucks , you reinstalled with stretch, currently it's down [17:10:31] saw that commit, heh :) [17:11:15] ACKNOWLEDGEMENT - puppet last run on lvs1007 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 24 minutes ago with 2 failures. 
Failed resources (up to 3 shown): Exec[txqueuelen-eth3],Exec[txqueuelen-eth2] daniel_zahn NIC was replaced, systemd renamed interfaces, will go away on future reinstall [17:12:15] Operations, ops-eqiad, Discovery, Discovery-Search, Elasticsearch: check elastic1022 power supply redundancy - https://phabricator.wikimedia.org/T177631#3680310 (debt) Moving to up next to keep an eye on this, our team doesn't expect to do any work with this ticket. [17:13:50] (PS3) Ottomata: LVS for druid-public broker [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) [17:23:46] (Abandoned) Muehlenhoff: Update debian/changelog [debs/debdeploy] - https://gerrit.wikimedia.org/r/383811 (owner: Muehlenhoff) [17:25:44] Operations, Analytics: rack/setup/install furud.codfw.wmnet - https://phabricator.wikimedia.org/T176506#3627866 (faidon) Open>Resolved This is all installed and in production for about a week now. [17:26:07] Operations, ops-eqiad, Patch-For-Review: rack/setup/install flerovium.eqiad.wmnet - https://phabricator.wikimedia.org/T176505#3680431 (faidon) Open>Resolved In production for about a week now. [17:31:15] PROBLEM - puppet last run on netmon1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[acme-setup-acme-librenms] [17:32:20] Operations, Research, Research-2017-18-Q2: Permissions to upload data to the analytics cluster from a machine at Drexel - https://phabricator.wikimedia.org/T177521#3680513 (mark) >>! In T177521#3661737, @elukey wrote: > So the last viable solution would be to create a temporary ssh key for Aaron's a... 
[17:34:19] (PS1) Herron: puppet: Add puppetcompiler1001 to site.pp [puppet] - https://gerrit.wikimedia.org/r/383883 (https://phabricator.wikimedia.org/T177843) [17:36:05] (CR) Herron: [C: 2] puppet: Add puppetcompiler1001 to site.pp [puppet] - https://gerrit.wikimedia.org/r/383883 (https://phabricator.wikimedia.org/T177843) (owner: Herron) [17:36:23] Operations, Research, Research-2017-18-Q2: Permissions to upload data to the analytics cluster from a machine at Drexel - https://phabricator.wikimedia.org/T177521#3680521 (elukey) @Ottomata, @Halfak - I believe that it would be easier for you guys to coordinate during your US daytime on one of the f... [17:51:03] (PS3) Smalyshev: Add configuration for statement indexing for Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/383464 (https://phabricator.wikimedia.org/T175199) [17:51:13] (CR) Smalyshev: Add configuration for statement indexing for Wikidata (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/383464 (https://phabricator.wikimedia.org/T175199) (owner: Smalyshev) [17:56:15] RECOVERY - puppet last run on netmon1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:56:57] oh.. wait. did somebody do that ^ :) [17:57:16] i was still sigh'ing that it wasn't gone, heh, very nice [17:57:17] gilles: About? You're running refreshFileHeaders.php for commonswiki on terbium, right? [17:58:00] Your script is spamming some notices about some undefined indices in Flow [18:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: It is that lovely time of the day again! You are hereby commanded to deploy Morning SWAT (Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171012T1800). [18:00:04] RoanKattouw: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. 
Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:34] I can SWAT. [18:00:41] I'm already on tin [18:00:42] I got it [18:00:50] Cool. :) [18:01:02] !log demon@tin Synchronized php-1.31.0-wmf.3/extensions/JsonConfig/includes/JCCache.php: Silence junk warnings (duration: 00m 50s) [18:01:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:02:35] (CR) Zoranzoki21: [C: 1] Add additional namespaces to search results for bnwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/383794 (https://phabricator.wikimedia.org/T178041) (owner: DCausse) [18:05:46] (CR) Zoranzoki21: [C: 1] Enable $wgAbuseFilterProfile on ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/383858 (https://phabricator.wikimedia.org/T177641) (owner: Dmaza) [18:11:37] no_justification: SWAT done or ongoing? [18:11:48] i'm planning to cause some CI disruption [18:12:57] MatmaRex: I hope no_justification isn't done because my patch is in the SWAT and it's not done yet [18:13:17] Derp, got distracted [18:13:31] i can see it was merged 6 minutes ago but i haven't seen a notification here that it was deployed [18:13:33] oh, alright [18:13:47] That's OK, my wmf.3 patch only took 6 mins to merge which is amazingly fast relatively speaking [18:13:59] It usually takes 15+ minutes to make its way through Zuul [18:14:17] (i need to force-merge https://gerrit.wikimedia.org/r/#/c/383762/ which will cause phpunit to fail everywhere until https://gerrit.wikimedia.org/r/#/c/383763/ is merged) [18:15:00] !log demon@tin Synchronized php-1.31.0-wmf.3/extensions/ORES/includes/Hooks.php: Make watchlist queries suck less (duration: 00m 50s) [18:15:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:33] RoanKattouw: You're live ^^^ [18:16:18] Thanks [18:17:18] (PS1) Dzahn: add IPv6 records for puppetcompiler1001 [dns] - https://gerrit.wikimedia.org/r/383890 
(https://phabricator.wikimedia.org/T177843) [18:20:30] (if no one tells me not to do it, i'll do it in 10 minutes) [18:21:14] (PS1) Dzahn: add mapped IPv6 address for puppetcompiler1001 [puppet] - https://gerrit.wikimedia.org/r/383892 (https://phabricator.wikimedia.org/T177843) [18:21:20] (PS2) Dzahn: add IPv6 records for puppetcompiler1001 [dns] - https://gerrit.wikimedia.org/r/383890 (https://phabricator.wikimedia.org/T177843) [18:22:57] Operations, OCG-General, Readers-Community-Engagement, Epic, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#3680651 (phuedx) [18:23:29] Operations, LDAP-Access-Requests, WMF-NDA-Requests, User-Addshore: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T177599#3680652 (RStallman-legalteam) Pablo has signed the NDA for LDAP access and it's on file in our contracts software. Thanks! [18:28:32] !log added Pablo (pgrass) to LDAP group 'wmde' (T177599) [18:28:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:28:38] T177599: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T177599 [18:29:50] Operations, LDAP-Access-Requests, WMF-NDA-Requests, User-Addshore: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T177599#3680687 (Dzahn) Open>Resolved a:Dzahn @Rstallman-legalteam Thank you for confirming. Done, i added him just now. @Pablo-WMDE You h... 
[18:46:07] Operations, ops-eqiad, Patch-For-Review: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3680725 (Cmjohnson) [18:46:18] Operations, ops-eqiad, Patch-For-Review: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3657517 (Cmjohnson) [18:47:13] Operations, ops-eqiad, Patch-For-Review: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3657517 (Cmjohnson) @elukey these are ready for you or @Ottomata Please reassign [18:51:45] Operations, Research, Research-2017-18-Q2: Permissions to upload data to the analytics cluster from a machine at Drexel - https://phabricator.wikimedia.org/T177521#3680733 (Ottomata) @halfak cool, ya post the public key on office wiki somewhere, let me know, and I'll get on it. [18:57:52] (PS1) Rush: openstack: labs-ip-alias-dump as a cron rather than exec [puppet] - https://gerrit.wikimedia.org/r/383896 [18:58:19] PROBLEM - puppet last run on netmon1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[acme-setup-acme-librenms] [19:00:04] no_justification: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for MediaWiki train. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171012T1900). [19:00:04] No GERRIT patches in the queue for this window AFAICS. 
[19:01:12] i ended up having to revert my mediawiki/vendor patch because everything sucks [19:01:19] but CI should be back in working order now [19:09:45] (CR) Chad: [C: 2] group2 to wmf.3 [mediawiki-config] - https://gerrit.wikimedia.org/r/383876 (owner: Chad) [19:14:29] (Merged) jenkins-bot: group2 to wmf.3 [mediawiki-config] - https://gerrit.wikimedia.org/r/383876 (owner: Chad) [19:16:37] (CR) jenkins-bot: group2 to wmf.3 [mediawiki-config] - https://gerrit.wikimedia.org/r/383876 (owner: Chad) [19:18:59] !log demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group2 to wmf.3 [19:19:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:20:18] (PS4) Smalyshev: Add configuration for statement indexing for Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/383464 (https://phabricator.wikimedia.org/T175199) [19:40:44] (PS19) Paladox: gerrit: Ajust scap files (DO NOT MERGE) [software/gerrit] - https://gerrit.wikimedia.org/r/363738 [19:40:45] (PS17) Paladox: Gerrit: Upgrading gerrit to 2.14.5-pre (DO NOT MERGE) [software/gerrit] - https://gerrit.wikimedia.org/r/363734 [19:46:59] !log disable puppet for lab*services* for merge of 383896 [19:47:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:47:15] (CR) Rush: [C: 2] openstack: labs-ip-alias-dump as a cron rather than exec [puppet] - https://gerrit.wikimedia.org/r/383896 (owner: Rush) [19:48:29] no_justification: yes, for https://phabricator.wikimedia.org/T175689 [19:49:02] no_justification: it should take about 3 more days to complete, based on my rough estimates [19:50:33] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 26 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [19:54:31] gilles: No worries on running it, just pinging re: logspam :) [19:55:10] no_justification: yes that warning is really spammy, 
I need to file a task for it. it didn't happen last time I used that script [19:55:23] okie dokie [19:55:33] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 9 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [19:56:14] Operations, wikidiff2, User-Addshore, WMDE-QWERTY-Team-Board: Update and use php-wikidiff2 to 1.5 in production - https://phabricator.wikimedia.org/T177891#3680957 (Legoktm) I tagged 1.5.1 and updated the packaging/changelog: https://gerrit.wikimedia.org/r/#/c/383907/ [19:57:27] Operations, Puppet, User-Joe: Set up puppet catalog diff on host with access to puppetmaster1001 and puppetmaster2001 - https://phabricator.wikimedia.org/T177843#3680959 (herron) [19:59:16] Operations, Scap: scap should not pull in HHVM on stretch hosts using PHP7 - https://phabricator.wikimedia.org/T178039#3680962 (Dzahn) I stopped hhvm on gerrit2001 but i can't remove the package just yet because that requirement also means both hhvm and scap get removed if you attempt to remove hhvm. [20:00:51] (PS1) Rush: openstack: pdns recursor module/profile/role [puppet] - https://gerrit.wikimedia.org/r/383909 (https://phabricator.wikimedia.org/T171494) [20:01:24] (CR) jerkins-bot: [V: -1] openstack: pdns recursor module/profile/role [puppet] - https://gerrit.wikimedia.org/r/383909 (https://phabricator.wikimedia.org/T171494) (owner: Rush) [20:09:32] Operations, Data-Services, cloud-services-team: Switch labstore servers to default SSH configuration - https://phabricator.wikimedia.org/T177914#3680996 (bd808) [20:11:59] (PS3) Bartosz Dziewoński: Revert "Revert "Limit thanks for new users at pl.wikipedia to 3 per day"" [mediawiki-config] - https://gerrit.wikimedia.org/r/383694 (https://phabricator.wikimedia.org/T169268) (owner: Dereckson) [20:12:11] (CR) Bartosz Dziewoński: "I removed the comment "Re-evaluate on 2017-08-01"." 
[mediawiki-config] - https://gerrit.wikimedia.org/r/383694 (https://phabricator.wikimedia.org/T169268) (owner: Dereckson)
[20:14:13] Operations, Data-Services, cloud-services-team: Switch labstore servers to default SSH configuration - https://phabricator.wikimedia.org/T177914#3681006 (chasemp) I believe paramiko is no longer in use. I know it's been removed for all the backup components that have been redone, but I'm unsure if t...
[20:17:06] (CR) Bartosz Dziewoński: "Scheduled for deployment today: https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171012T2300" [mediawiki-config] - https://gerrit.wikimedia.org/r/383694 (https://phabricator.wikimedia.org/T169268) (owner: Dereckson)
[20:17:08] Operations, Data-Services, cloud-services-team: Switch labstore servers to default SSH configuration - https://phabricator.wikimedia.org/T177914#3674761 (Paladox) backporting paramiko 2.0 will benefit zuul. Zuul uses this library and i had to update it and build zuul manually for the new gerrit updat...
[20:18:41] Operations, Puppet, User-Joe, cloud-services-team (FY2017-18): Upgrade to puppet 4 (4.8 or newer) - https://phabricator.wikimedia.org/T177254#3681039 (bd808)
[20:36:25] Operations, DBA: Wikimedia\Rdbms\DBQueryTimeoutError (not repeated) - https://phabricator.wikimedia.org/T178109#3681065 (Base)
[20:42:25] (PS1) Bearloga: Add profiles/roles for stats/ML on Wikimedia Cloud [puppet] - https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096)
[20:42:56] (CR) jerkins-bot: [V: -1] Add profiles/roles for stats/ML on Wikimedia Cloud [puppet] - https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096) (owner: Bearloga)
[20:44:51] (PS2) Bearloga: Add profiles/roles for stats/ML on Wikimedia Cloud [puppet] - https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096)
[20:47:08] (PS2) Rush: openstack: pdns recursor module/profile/role [puppet] - https://gerrit.wikimedia.org/r/383909 (https://phabricator.wikimedia.org/T171494)
[20:47:38] (CR) jerkins-bot: [V: -1] openstack: pdns recursor module/profile/role [puppet] - https://gerrit.wikimedia.org/r/383909 (https://phabricator.wikimedia.org/T171494) (owner: Rush)
[20:48:51] (CR) Bearloga: "IDK whether to make these more general than Discovery in case other teams (esp.
Research) would be interested in using these" [puppet] - https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096) (owner: Bearloga)
[20:58:22] (PS3) Rush: openstack: pdns recursor module/profile/role [puppet] - https://gerrit.wikimedia.org/r/383909 (https://phabricator.wikimedia.org/T171494)
[20:58:35] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current), User-Ladsgroup: Review and fix file handle management in worker and celery processes - https://phabricator.wikimedia.org/T174402#3681200 (awight) I used lsof to watch filehandle usage over the lifecycle of the celery service...
[21:03:13] (CR) Andrew Bogott: "I've confirmed that this actually works and helps us get proper diffs for cloud instances." [software/puppet-compiler] - https://gerrit.wikimedia.org/r/325042 (owner: Gerrit Patch Uploader)
[21:12:16] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current), User-Ladsgroup: Review and fix file handle management in worker and celery processes - https://phabricator.wikimedia.org/T174402#3681269 (awight) Now *this* is interesting. The Celery code involved in the failure, https://g...
[21:13:22] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current), User-Ladsgroup: Review and fix file handle management in worker and celery processes - https://phabricator.wikimedia.org/T174402#3681281 (Halfak) Before we have that discussion, we could just give it a shot to see if it work...
[21:15:00] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current), User-Ladsgroup: Review and fix file handle management in worker and celery processes - https://phabricator.wikimedia.org/T174402#3681285 (awight) Sounds good. It also sounds like the Celery developers would be open to a con...
[21:15:51] Operations, Research, Research-2017-18-Q2: Permissions to upload data to the analytics cluster from a machine at Drexel - https://phabricator.wikimedia.org/T177521#3681291 (Halfak) https://office.wikimedia.org/wiki/User:Halfak_(WMF)/id_rsa.pub_(temp)
[21:20:07] (PS1) BryanDavis: maintain-views: Add ip_changes view [puppet] - https://gerrit.wikimedia.org/r/383935 (https://phabricator.wikimedia.org/T173891)
[21:22:15] (CR) Bearloga: [C: -1] "After some consideration, I'm gonna shuffle some things around to make these roles/profiles more general for others' use." [puppet] - https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096) (owner: Bearloga)
[21:26:27] (CR) Brian Wolff: [C: 1] maintain-views: Add ip_changes view [puppet] - https://gerrit.wikimedia.org/r/383935 (https://phabricator.wikimedia.org/T173891) (owner: BryanDavis)
[21:40:19] Operations, Patch-For-Review: create endowment.wm.org microsite - https://phabricator.wikimedia.org/T136735#3681351 (Dzahn) Open>stalled
[21:41:59] Operations, RelEng-Archive-FY201718-Q1, Patch-For-Review, Release-Engineering-Team (Kanban), Security-General: setup releases1001.eqiad.wmnet (was: setup mwreleases1001) - https://phabricator.wikimedia.org/T164030#3681353 (Dzahn) a:Dzahn>demon Back to Chad. Jenkins should be usable now.
[21:49:39] (PS4) Rush: openstack: pdns recursor module/profile/role [puppet] - https://gerrit.wikimedia.org/r/383909 (https://phabricator.wikimedia.org/T171494)
[21:50:13] (CR) jerkins-bot: [V: -1] openstack: pdns recursor module/profile/role [puppet] - https://gerrit.wikimedia.org/r/383909 (https://phabricator.wikimedia.org/T171494) (owner: Rush)
[21:54:23] (PS5) Rush: openstack: pdns recursor module/profile/role [puppet] - https://gerrit.wikimedia.org/r/383909 (https://phabricator.wikimedia.org/T171494)
[22:01:02] (PS1) Andrew Bogott: DO NOT MERGE: no-op patch for testing [puppet] - https://gerrit.wikimedia.org/r/383942
[22:03:32] (PS6) Rush: openstack: pdns recursor module/profile/role [puppet] - https://gerrit.wikimedia.org/r/383909 (https://phabricator.wikimedia.org/T171494)
[22:04:02] (PS7) Rush: openstack: pdns recursor module/profile/role [puppet] - https://gerrit.wikimedia.org/r/383909 (https://phabricator.wikimedia.org/T171494)
[22:05:12] (PS1) Zoranzoki21: Rename $wmf* to $wmg* in wmf-config [mediawiki-config] - https://gerrit.wikimedia.org/r/383944 (https://phabricator.wikimedia.org/T45956)
[22:06:56] (CR) Rush: "http://puppet-compiler.wmflabs.org/8303/" [puppet] - https://gerrit.wikimedia.org/r/383909 (https://phabricator.wikimedia.org/T171494) (owner: Rush)
[22:10:40] (CR) jerkins-bot: [V: -1] Rename $wmf* to $wmg* in wmf-config [mediawiki-config] - https://gerrit.wikimedia.org/r/383944 (https://phabricator.wikimedia.org/T45956) (owner: Zoranzoki21)
[22:10:53] (PS1) Dzahn: ganglia: decom class should use "ensure => purged" [puppet] - https://gerrit.wikimedia.org/r/383945 (https://phabricator.wikimedia.org/T177225)
[22:10:55] (CR) Bearloga: [C: 1] "Er, nevermind. I guess other teams don't really need this stuff as much as Chelsy and I do in Discovery Analysis."
[puppet] - https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096) (owner: Bearloga)
[22:13:08] (PS1) Rush: openstack: cleanup ceilometer files and roles [puppet] - https://gerrit.wikimedia.org/r/383946 (https://phabricator.wikimedia.org/T171494)
[22:16:05] (CR) Dzahn: [C: 2] ganglia: decom class should use "ensure => purged" [puppet] - https://gerrit.wikimedia.org/r/383945 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[22:16:50] (CR) Andrew Bogott: [C: 2] maintain-views: Add ip_changes view [puppet] - https://gerrit.wikimedia.org/r/383935 (https://phabricator.wikimedia.org/T173891) (owner: BryanDavis)
[22:16:59] (PS2) Andrew Bogott: maintain-views: Add ip_changes view [puppet] - https://gerrit.wikimedia.org/r/383935 (https://phabricator.wikimedia.org/T173891) (owner: BryanDavis)
[22:44:35] !log demon@tin Synchronized php-1.31.0-wmf.3/includes/specialpage/ChangesListSpecialPage.php: silence notices (duration: 00m 47s)
[22:44:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:52:52] (PS1) BryanDavis: maintain-views: sort logging_whitelist [puppet] - https://gerrit.wikimedia.org/r/383950
[22:54:24] (CR) Andrew Bogott: [C: 2] maintain-views: sort logging_whitelist [puppet] - https://gerrit.wikimedia.org/r/383950 (owner: BryanDavis)
[23:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Evening SWAT (Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171012T2300).
[23:00:04] MaxSem, MatmaRex, and DMaza: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:18] sup
[23:00:23] hello
[23:00:28] hello
[23:01:39] Hello, I can SWAT this evening.
[23:04:02] DMaza: if you suspect it could affect performance, it's not swattable at evening, and you should first ask a +1 from some ops
[23:04:32] DMaza: if it's deployed during the morning SWAT, that allows a revert all the day if there is any issue
[23:04:50] I see, ok.. I'll re-schedule
[23:04:54] (PS2) Dereckson: Switch test wikis to HTML5 fragment mode in links [mediawiki-config] - https://gerrit.wikimedia.org/r/383473 (https://phabricator.wikimedia.org/T152540) (owner: MaxSem)
[23:05:26] thanks Dereckson
[23:05:30] (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/383473 (https://phabricator.wikimedia.org/T152540) (owner: MaxSem)
[23:07:06] (Merged) jenkins-bot: Switch test wikis to HTML5 fragment mode in links [mediawiki-config] - https://gerrit.wikimedia.org/r/383473 (https://phabricator.wikimedia.org/T152540) (owner: MaxSem)
[23:07:17] (CR) jenkins-bot: Switch test wikis to HTML5 fragment mode in links [mediawiki-config] - https://gerrit.wikimedia.org/r/383473 (https://phabricator.wikimedia.org/T152540) (owner: MaxSem)
[23:08:11] MaxSem: live on mwdebug1002
[23:10:41] MaxSem: live on mwdebug1002 now (wasn't actually the first time)
[23:11:04] heh, I was trying to figure out what's wrong :P
[23:11:47] sorry
[23:12:01] Dereckson: works
[23:12:43] (PS2) Dereckson: Add logging for email blocks [mediawiki-config] - https://gerrit.wikimedia.org/r/376792 (https://phabricator.wikimedia.org/T175419) (owner: MaxSem)
[23:13:12] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Switch test wikis to HTML5 fragment mode in links (T152540) (duration: 00m 47s)
[23:13:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:13:19] T152540: Migrate to HTML5 section ids - https://phabricator.wikimedia.org/T152540
[23:14:42] (CR) Dereckson: [C: 2] Add logging for
email blocks [mediawiki-config] - https://gerrit.wikimedia.org/r/376792 (https://phabricator.wikimedia.org/T175419) (owner: MaxSem)
[23:15:53] (Merged) jenkins-bot: Add logging for email blocks [mediawiki-config] - https://gerrit.wikimedia.org/r/376792 (https://phabricator.wikimedia.org/T175419) (owner: MaxSem)
[23:16:47] (CR) jenkins-bot: Add logging for email blocks [mediawiki-config] - https://gerrit.wikimedia.org/r/376792 (https://phabricator.wikimedia.org/T175419) (owner: MaxSem)
[23:17:19] live on mwdebug1002 (I guess you can send two mails to test that one?)
[23:17:30] Operations, Ops-Access-Requests, Research: Request public key change for a research fellow - https://phabricator.wikimedia.org/T177889#3681626 (leila)
[23:17:46] Operations, Ops-Access-Requests, DBA, cloud-services-team: Access to raw database tables on labsdb* for wmcs-admin users - https://phabricator.wikimedia.org/T178128#3681627 (bd808)
[23:18:08] works
[23:22:00] Dereckson: ^
[23:22:49] ack'ed
[23:25:19] !log dereckson@tin Synchronized wmf-config/CommonSettings.php: Add logging for email blocks (T175419) (duration: 00m 46s)
[23:25:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:25:52] Unrelated, but since the start of the SWAT we've some small increase in this issue:
[23:25:59] 90 Undefined index: 1 in /srv/mediawiki/php-1.31.0-wmf.3/includes/media/FormatMetadata.php on line 744
[23:26:10] (now 96)
[23:26:45] Hi Urbanecm
[23:26:57] (PS4) Dereckson: Revert "Revert "Limit thanks for new users at pl.wikipedia to 3 per day"" [mediawiki-config] - https://gerrit.wikimedia.org/r/383694 (https://phabricator.wikimedia.org/T169268)
[23:35:23] MatmaRex or Urbanecm > ping?
[23:35:58] i'm here
[23:36:03] (CR) Dereckson: [C: 2] Revert "Revert "Limit thanks for new users at pl.wikipedia to 3 per day"" [mediawiki-config] - https://gerrit.wikimedia.org/r/383694 (https://phabricator.wikimedia.org/T169268) (owner: Dereckson)
[23:36:13] not that i have much to test, i think? this was live and tested before
[23:36:43] if you've still a newbie account, you can try to fire 4 thanks
[23:37:04] (we should have a generator of new test accounts for such tests)
[23:37:52] but yes, was tested and working fine last time, so just fire a thanks and see no error occur sounds fine to me
[23:38:26] (CR) jerkins-bot: [V: -1] Revert "Revert "Limit thanks for new users at pl.wikipedia to 3 per day"" [mediawiki-config] - https://gerrit.wikimedia.org/r/383694 (https://phabricator.wikimedia.org/T169268) (owner: Dereckson)
[23:38:56] (PS4) Dzahn: Phab: Allow aklapper to delete panels on dashboards [puppet] - https://gerrit.wikimedia.org/r/380959 (owner: Aklapper)
[23:39:23] (CR) jerkins-bot: [V: -1] Phab: Allow aklapper to delete panels on dashboards [puppet] - https://gerrit.wikimedia.org/r/380959 (owner: Aklapper)
[23:41:05] (CR) Dzahn: [C: 1] "CI is broken with "23:39:20 ERROR: unknown environment 'testenv'"" [puppet] - https://gerrit.wikimedia.org/r/380959 (owner: Aklapper)
[23:41:31] mutante: and on another change, they can do a git checkout
[23:41:33] can't
[23:41:46] oops
[23:41:50] Operations, Patch-For-Review: create endowment.wm.org microsite - https://phabricator.wikimedia.org/T136735#3681663 (kaythaney) update! automatic / wordpress VIP are going to host the email for dns-admin/ abuse reports, as well as handle hosting of the domain. will know more next week about the timeline...
[23:41:52] https://integration.wikimedia.org/ci/job/operations-mw-config-typos/12519/console
[23:42:47] (CR) Dereckson: [V: 2 C: 2] "Manually merging, as the change passes the tests, and it looks there is a CI issue." [mediawiki-config] - https://gerrit.wikimedia.org/r/383694 (https://phabricator.wikimedia.org/T169268) (owner: Dereckson)
[23:43:05] (CR) jenkins-bot: Revert "Revert "Limit thanks for new users at pl.wikipedia to 3 per day"" [mediawiki-config] - https://gerrit.wikimedia.org/r/383694 (https://phabricator.wikimedia.org/T169268) (owner: Dereckson)
[23:43:14] MatmaRex: live on mwdebug1002
[23:44:38] Dereckson: testing
[23:46:05] ugh, i guess i need to register an account to test
[23:49:33] Dereckson: alright, verified. i get an error when trying to send 4th thanks
[23:49:50] (PS1) Tim Starling: Update dumps archive_index.html for the files I just uploaded [puppet] - https://gerrit.wikimedia.org/r/383958
[23:49:52] https://pl.wikipedia.org/wiki/Specjalna:Rejestr/thanks
[23:51:33] an error like a 500 error or like an expected error message?
[23:54:33] MatmaRex: ^
[23:54:51] Dereckson: no, the expected error message.
[23:55:19] https://i.imgur.com/yB3mpbi.png
[23:57:06] ack'ed, syncing
[23:57:14] Thanks a lot for testing
[23:58:47] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Restore thanks limit on pl.wikipedia (T169268) (duration: 00m 47s)
[23:58:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:58:54] T169268: Limiting thanks for new users at pl.wikipedia - https://phabricator.wikimedia.org/T169268