[00:00:06] PROBLEM - check_raid on silicon is CRITICAL: CRITICAL md0 status=[UU]. md1 status=[UU]. md2 status=[UU]. md3 status=[U_].
[00:00:49] ori: I think I know where the problem comes from... do you do git fetch before pulling in /a/common ?
[00:01:09] yes
[00:01:19] mh
[00:01:20] weird
[00:05:06] PROBLEM - check_raid on silicon is CRITICAL: CRITICAL md0 status=[UU]. md1 status=[UU]. md2 status=[UU]. md3 status=[U_].
[00:10:06] PROBLEM - check_raid on silicon is CRITICAL: CRITICAL md0 status=[UU]. md1 status=[UU]. md2 status=[UU]. md3 status=[U_].
[00:15:06] PROBLEM - check_raid on silicon is CRITICAL: CRITICAL md0 status=[UU]. md1 status=[UU]. md2 status=[UU]. md3 status=[U_].
[00:16:30] (PS1) Faidon Liambotis: ldap: switch CA name to Equifax_Secure_CA [operations/puppet] - https://gerrit.wikimedia.org/r/121283
[00:16:43] (CR) Faidon Liambotis: [C: 2 V: 2] ldap: switch CA name to Equifax_Secure_CA [operations/puppet] - https://gerrit.wikimedia.org/r/121283 (owner: Faidon Liambotis)
[00:19:15] mh.. I guess I'm going to investigate on the git hook stuff after my next deploy
[00:20:10] PROBLEM - check_raid on silicon is CRITICAL: CRITICAL md0 status=[UU]. md1 status=[UU]. md2 status=[UU]. md3 status=[U_].
[00:20:20] RECOVERY - Certificate expiration on virt0 is OK: (null)
[00:20:29] \o/
[00:20:52] springle: is es3 decommissioned?
[00:23:10] RECOVERY - Puppet freshness on labstore2 is OK: puppet ran at Thu Mar 27 00:22:53 UTC 2014
[00:25:10] PROBLEM - check_raid on silicon is CRITICAL: CRITICAL md0 status=[UU]. md1 status=[UU]. md2 status=[UU]. md3 status=[U_].
[00:26:12] RECOVERY - Certificate expiration on virt1000 is OK: (null)
[00:30:12] PROBLEM - check_raid on silicon is CRITICAL: CRITICAL md0 status=[UU]. md1 status=[UU]. md2 status=[UU]. md3 status=[U_].
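Editor's note: in the check_raid alerts above, each bracket lists the members of a Linux md array, with `U` for a member that is up and `_` for one that is missing or failed, so `md3 status=[U_]` is the degraded array. A minimal Python sketch of reading that notation follows; the function name `degraded_arrays` is hypothetical and this is not the actual check_raid plugin.

```python
import re

# Matches fragments such as "md3 status=[U_]" in the alert text.
MD_STATUS = re.compile(r"(md\d+) status=\[([U_]+)\]")


def degraded_arrays(alert: str) -> dict[str, int]:
    """Map each degraded md device to its number of missing members.

    Hypothetical helper for illustration only.
    """
    return {
        name: flags.count("_")
        for name, flags in MD_STATUS.findall(alert)
        if "_" in flags
    }


alert = (
    "CRITICAL md0 status=[UU]. md1 status=[UU]. "
    "md2 status=[UU]. md3 status=[U_]."
)
print(degraded_arrays(alert))  # {'md3': 1}
```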
[00:31:32] !log acknowledge silicon's check raid in icinga, already on rt7136
[00:31:38] Logged the message, RobH
[00:32:32] paravoid: Got maybe time for a small ci change that would make my day? https://gerrit.wikimedia.org/r/#/c/119750/
[00:35:17] PROBLEM - Certificate expiration on virt1000 is CRITICAL: SSL error: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
[00:38:22] (CR) Faidon Liambotis: [C: -1] contint: override .jshintrc file on gallium (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/119750 (owner: Hashar)
[01:03:30] RECOVERY - Certificate expiration on virt1000 is OK: (null)
[01:29:24] paravoid: yes https://rt.wikimedia.org/Ticket/Display.html?id=6544
[01:36:27] !log Reloading Zuul to deploy I8a7ccef26da45d9ed7c7705df77246001bb85544
[01:36:36] Logged the message, Master
[02:02:40] (PS2) Jforrester: Fix typo in noncirrus.dblist reference [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/120172
[02:03:13] (PS3) Krinkle: Fix typo in noncirrus.dblist noc symlink creation [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/120172 (owner: Jforrester)
[02:03:17] (CR) Krinkle: [C: 2] Fix typo in noncirrus.dblist noc symlink creation [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/120172 (owner: Jforrester)
[02:03:29] (Merged) jenkins-bot: Fix typo in noncirrus.dblist noc symlink creation [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/120172 (owner: Jforrester)
[02:04:01] !log LocalisationUpdate failed (1.23wmf18) at 2014-03-27 02:04:01+00:00
[02:04:13] Logged the message, Master
[02:04:37] !log LocalisationUpdate failed (1.23wmf19) at 2014-03-27 02:04:37+00:00
[02:04:44] Logged the message, Master
[02:11:33] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Mar 27 02:11:30 UTC 2014 (duration 11m 29s)
[02:11:39] Logged the message, Master
[02:14:20] PROBLEM - Puppet freshness on tantalum is CRITICAL: Last successful Puppet run was Wed 26 Mar 2014 05:12:06 PM UTC
[02:58:51] (PS1) Chad: Opt all remaining wikis with ~50k or less pages into Cirrus as Beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121294
[03:01:27] (CR) Chad: [C: 2] Opt all remaining wikis with ~50k or less pages into Cirrus as Beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121294 (owner: Chad)
[03:01:35] (Merged) jenkins-bot: Opt all remaining wikis with ~50k or less pages into Cirrus as Beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121294 (owner: Chad)
[03:03:26] !log demon synchronized noncirrus.dblist
[03:03:32] Logged the message, Master
[03:04:22] !log demon synchronized wmf-config/InitialiseSettings.php 'touch'
[03:04:28] Logged the message, Master
[03:09:13] ^d: nice, congrats
[03:10:41] <^d> :)
[03:11:01] <^d> Only 85 wikis don't have Cirrus at all yet.
[03:11:03] <^d> Almost there.
[03:13:12] <^d> This is my favorite Elasticsearch graph right now. http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=Elasticsearch+cluster+eqiad&m=cpu_report&s=by+name&mc=2&g=cpu_report
[03:13:15] (PS1) Springle: reassign db1034 to s7 [operations/puppet] - https://gerrit.wikimedia.org/r/121295
[03:13:21] <^d> Showing nice even daily traffic cycles finally.
[03:13:31] <^d> (Minus when Nik or I mess with things :))
[03:14:54] (CR) Springle: [C: 2] reassign db1034 to s7 [operations/puppet] - https://gerrit.wikimedia.org/r/121295 (owner: Springle)
[03:18:52] !log xtrabackup clone db1007 to db1034
[03:18:57] Logged the message, Master
[03:25:41] (CR) Mattflaschen: "It's no longer blocked by that bug. I need to rebase and retest." [operations/puppet] - https://gerrit.wikimedia.org/r/79955 (owner: Mattflaschen)
[03:34:55] (PS1) Jalexander: Add config file for bing webmaster tools. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121297
[03:44:00] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:58:52] (CR) Krinkle: contint: override .jshintrc file on gallium (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/119750 (owner: Hashar)
[04:00:00] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 307199 bytes in 8.144 second response time
[04:02:50] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 516 bytes in 0.026 second response time
[04:04:00] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 308675 bytes in 9.080 second response time
[04:09:28] poor gitblit
[05:15:20] PROBLEM - Puppet freshness on tantalum is CRITICAL: Last successful Puppet run was Wed 26 Mar 2014 05:12:06 PM UTC
[05:16:54] (CR) Mattflaschen: "Could exposing the core file to all Gerrit users like that reveal any sensitive information in RAM?" [operations/puppet] - https://gerrit.wikimedia.org/r/119225 (owner: Faidon Liambotis)
[05:19:02] (PS1) Chad: All wikis with 100k pages or less can get Cirrus too [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121305
[05:22:58] (PS3) Gilles: Add ?download parameter to images [operations/puppet] - https://gerrit.wikimedia.org/r/120617
[05:25:31] (CR) Gilles: "Thanks for the tip! I was able to test this locally. I think I've taken these regular expressions as far as they can be taken. The "downlo" [operations/puppet] - https://gerrit.wikimedia.org/r/120617 (owner: Gilles)
[05:26:55] (CR) Chad: [C: 2] All wikis with 100k pages or less can get Cirrus too [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121305 (owner: Chad)
[05:27:02] (Merged) jenkins-bot: All wikis with 100k pages or less can get Cirrus too [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121305 (owner: Chad)
[05:27:24] !log demon synchronized noncirrus.dblist
[05:27:50] !log demon synchronized wmf-config/InitialiseSettings.php
[05:27:56] Logged the message, Master
[05:30:29] I do not know if someone can look at this labs related bug https://bugzilla.wikimedia.org/show_bug.cgi?id=63152
[05:30:46] it has killed many tools
[05:34:26] .. right something happened .. it now does work
[06:55:31] congrats ^demon|zzz
[07:17:40] PROBLEM - Disk space on db1044 is CRITICAL: DISK CRITICAL - free space: / 143 MB (2% inode=69%):
[07:32:40] RECOVERY - Disk space on db1044 is OK: DISK OK
[07:56:09] (CR) Tobias Gritschacher: "I guess we do not have to wait until the banner is fixed. The current situation where people can get to en.wikidata.org and assume everyth" [operations/dns] - https://gerrit.wikimedia.org/r/119032 (owner: Faidon Liambotis)
[08:09:34] (PS10) Ori.livneh: Change home directory of vagrant user [operations/puppet] - https://gerrit.wikimedia.org/r/118053 (owner: Physikerwelt)
[08:09:39] (CR) Ori.livneh: [C: 2] Change home directory of vagrant user [operations/puppet] - https://gerrit.wikimedia.org/r/118053 (owner: Physikerwelt)
[08:16:20] PROBLEM - Puppet freshness on tantalum is CRITICAL: Last successful Puppet run was Wed 26 Mar 2014 05:12:06 PM UTC
[08:40:08] !log springle synchronized wmf-config/db-eqiad.php 's7 pool db1034, warm up'
[08:40:14] Logged the message, Master
[09:56:35] (PS1) Springle: s7 pool db1034 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121337
[09:56:54] (CR) Springle: [C: 2] s7 pool db1034 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121337 (owner: Springle)
[09:57:01] (Merged) jenkins-bot: s7 pool db1034 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121337 (owner: Springle)
[09:57:37] !log springle synchronized wmf-config/db-eqiad.php 's7 db1034 full steam'
[09:57:43] Logged the message, Master
[10:02:37] !log db1034 running custom mariadb 5.5.34 build. if in doubt, depool
[10:02:42] Logged the message, Master
[10:02:57] (PS2) ArielGlenn: download.wikimedia.org to misc web cluster with dataset1001 backend [operations/puppet] - https://gerrit.wikimedia.org/r/120998
[10:07:40] hi springle
[10:08:04] aude: hi ;)
[10:08:20] wondering if we ever did the schema changes for wb_terms table?
[10:08:28] 1) https://gerrit.wikimedia.org/r/#/c/99637/ make row ids use bigint
[10:08:39] 2) https://gerrit.wikimedia.org/r/#/c/99660/ composite indexes
[10:09:03] * springle looks
[10:09:27] thanks
[10:10:05] there must be bugzilla ticket for those
[10:10:33] https://bugzilla.wikimedia.org/show_bug.cgi?id=60540
[10:10:44] https://bugzilla.wikimedia.org/show_bug.cgi?id=60539
[10:12:05] aude: the indexes have been applied. the conversion od id to bigint has not yet been applied
[10:12:10] ok
[10:12:31] (PS3) ArielGlenn: download.wikimedia.org to misc web cluster with dataset1001 backend [operations/puppet] - https://gerrit.wikimedia.org/r/120998
[10:12:37] i think we were going to do something a bit different for wikidata
[10:12:39] I should have commented on 99660. will do so
[10:13:16] or that was for the indexes maybe
[10:13:23] different how?
[10:14:17] it was partitioning for indexes
[10:15:08] (CR) ArielGlenn: [C: 2] download.wikimedia.org to misc web cluster with dataset1001 backend [operations/puppet] - https://gerrit.wikimedia.org/r/120998 (owner: ArielGlenn)
[10:18:24] aude: wb_terms is partitioned on production slaves, but iirc daniel decided not to out that into the schema, and just leave it as a WMF production optimization
[10:18:32] s/out/put/
[10:18:48] that's what i remember
[10:18:58] which i think was a good idea. so far partitioning is helping, but as data set grows... who knows
[10:19:06] agree
[10:19:18] when do you think we can do the other change?
[10:19:57] not a hurry, perhaps but don't want to forget
[10:20:06] i have it as low priority, but not forgotten
[10:20:23] ok, that's fine
[10:20:37] because wb_terms id is still only 300m, and smaller primary key means smaller secondary indexes
[10:20:43] yep
[10:21:35] when wikidata gets it's own shard, then we can probably pull it all into line. the way wikidatawiki grows, i've been thinking that will be sooner rather than later
[10:21:43] ok
[10:23:20] hi :))
[10:25:48] springle: commented on the bugs
[10:26:44] aude: thank you
[10:54:33] (PS1) ArielGlenn: separate stanzas for http/https redirects for download.wm.o [operations/puppet] - https://gerrit.wikimedia.org/r/121344
[10:56:23] (CR) ArielGlenn: [C: 2] separate stanzas for http/https redirects for download.wm.o [operations/puppet] - https://gerrit.wikimedia.org/r/121344 (owner: ArielGlenn)
[10:59:14] (PS1) ArielGlenn: missed two vars in the download.wm.o redirect fixup [operations/puppet] - https://gerrit.wikimedia.org/r/121349
[11:00:52] (CR) ArielGlenn: [C: 2] missed two vars in the download.wm.o redirect fixup [operations/puppet] - https://gerrit.wikimedia.org/r/121349 (owner: ArielGlenn)
[11:17:20] PROBLEM - Puppet freshness on tantalum is CRITICAL: Last successful Puppet run was Wed 26 Mar 2014 05:12:06 PM UTC
[11:20:20] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[11:22:06] (PS1) Hashar: beta: point autoupdater to use /data/project [operations/puppet] - https://gerrit.wikimedia.org/r/121360
[11:22:22] (CR) ArielGlenn: [C: 2] dumps.wikimedia.org moved to dataset1001 [operations/dns] - https://gerrit.wikimedia.org/r/121000 (owner: ArielGlenn)
[11:22:26] apergos: while you are around would you mind merging in a path change for beta please? https://gerrit.wikimedia.org/r/121360 :-]
[11:22:48] !log springle synchronized wmf-config/db-eqiad.php 's5 depool db1021 for schema changes'
[11:22:50] cause some file not found errors on the eqiad beta cluster because it lacks /home/wikipedia :]
[11:22:53] Logged the message, Master
[11:24:03] I can't right now hashar, I screwed up the https redirection on the backend of the dataset hosts and need to figure out how to fix
[11:24:11] :(
[11:24:29] apergos: just need +2 / merge in repo, doesn't have to be on the prod puppetmaster :/
[11:24:44] though I can cherry pick it on our puppetmaster
[11:24:47] will just do that :]
[11:26:05] I keep forgetting we have our own puppet in eqiad
[11:27:00] hashar: sorry for not starting the backport of puppet-lint, busy as crazy
[11:27:11] matanya: that is not that much of a priority :]
[11:27:17] migrated production servers all week long
[11:27:19] matanya: and we can still use gem if it happens to be urgent hehe
[11:27:52] ugh, I didn't want this to rely on parent that won't be sommmitted, and now it's submitted and merge pending
[11:27:54] sigh
[11:28:25] ok, noted hashar thanks
[11:28:27] (CR) Hashar: [C: 1 V: 2] "cherry picked on deployment-salt.eqiad.wmflabs puppet master. That fixed the path issue. Compare:" [operations/puppet] - https://gerrit.wikimedia.org/r/121360 (owner: Hashar)
[11:28:42] apergos: you can use the cherry-pick button in Gerrit
[11:28:48] apergos: and cherry pick your change to the production branch
[11:28:59] this way its parent will be the tip of the branch and you can merge it :]
[11:29:29] I git commmit ammend and push it after the cherry pick?
[11:29:39] that's way too many mms in there
[11:31:25] (PS2) ArielGlenn: dumps.wikimedia.org moved to dataset1001 [operations/dns] - https://gerrit.wikimedia.org/r/121000
[11:32:15] (PS3) ArielGlenn: dumps.wikimedia.org moved to dataset1001 [operations/dns] - https://gerrit.wikimedia.org/r/121000
[11:32:50] (CR) ArielGlenn: [C: 2] dumps.wikimedia.org moved to dataset1001 [operations/dns] - https://gerrit.wikimedia.org/r/121000 (owner: ArielGlenn)
[11:51:44] !log springle synchronized wmf-config/db-eqiad.php 's5 repool db1021 warm up'
[11:51:50] Logged the message, Master
[12:40:20] (PS1) Hashar: beta: bring in scap related scripts on bastion [operations/puppet] - https://gerrit.wikimedia.org/r/121365
[12:43:33] (CR) Hashar: [C: 1 V: 2] "Applied on eqiad beta cluster puppet master deployment-salt.eqiad.wmflabs. Fixed the beta-code-update-eqiad jenkins job which takes care " [operations/puppet] - https://gerrit.wikimedia.org/r/121365 (owner: Hashar)
[12:56:45] paravoid: i know it isn't the place, but debian wiki is down
[13:07:04] !log springle synchronized wmf-config/db-eqiad.php 's5 db1021 full steam'
[13:07:10] Logged the message, Master
[13:18:12] (CR) ArielGlenn: "do not merge until further notice; on hold" [operations/dns] - https://gerrit.wikimedia.org/r/120999 (owner: ArielGlenn)
[13:25:10] RECOVERY - Puppet freshness on tantalum is OK: puppet ran at Thu Mar 27 13:25:02 UTC 2014
[13:55:20] springle: I assume db1034 is you, correct?
[13:55:31] paravoid: yeah, checking it now
[13:55:37] okay, great
[13:58:13] hashar: i tested the upstream debian package from jessie
[13:58:27] it works ok in presice
[13:58:38] you can take that, if you want
[13:58:53] springle: btw, any reason you're doing "silence notifications" instead of "acknowledge" in the icinga web intf?
[13:58:56] matanya: for what?
[13:59:02] puppet-lint
[13:59:30] [1830901.261363] mysqld[16175]: segfault at 0 ip 00000000006840d6 sp 00007f3d12452c00 error 4 in mysqld[400000+b54000]
[13:59:36] that's from db1034
[14:00:42] paravoid: i forgot to renable the silenced stuff from earlier
[14:00:45] and now tested the one from trusty. works ok in precise too. same dep's
[14:01:14] so basicly, if ops wish, backporting is quite easy
[14:01:16] springle: if you do acknowledge, it sticks until it recovers, then warns again
[14:01:38] Table './heartbeat/heartbeat' is marked as crashed and should be repaired
[14:01:45] fun stuff
[14:04:33] !log springle synchronized wmf-config/db-eqiad.php 's7 depool db1034 after crash'
[14:04:39] Logged the message, Master
[14:05:07] matanya: awesome. If you could handle the communication with ops to get the puppet-lint package updated in apt.wikimedia.org that would be nice :]
[14:05:20] matanya: I think I previously filled a RT then poked folks here.
[14:06:51] springle: Megan (who saw server errors on www and meta) is planning a fundraising/banner test this AM, she's asking if she should proceed or if she should hold off based on the database issue?.
[14:08:49] Jeff_Green: fine to go ahead. db1034 was something of a test case. it won't go back in until i figure out what killed it
[14:09:00] springle: great, thank you
[14:21:46] (PS5) Hashar: Create roles for syslog-ng [operations/puppet] - https://gerrit.wikimedia.org/r/119257
[14:21:52] (PS6) Hashar: Make syslog-ng basepath a parameter [operations/puppet] - https://gerrit.wikimedia.org/r/119256
[14:22:45] (PS7) Hashar: Make syslog-ng basepath a parameter [operations/puppet] - https://gerrit.wikimedia.org/r/119256
[14:23:13] (CR) Hashar: Make syslog-ng basepath a parameter (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/119256 (owner: Hashar)
[14:35:57] (PS1) Tim Landscheidt: Tools: Forward mail for root to the admin group [operations/puppet] - https://gerrit.wikimedia.org/r/121377
[14:38:51] (PS8) Hashar: Make syslog-ng basepath a parameter [operations/puppet] - https://gerrit.wikimedia.org/r/119256
[14:40:30] (PS6) Hashar: Create roles for syslog-ng [operations/puppet] - https://gerrit.wikimedia.org/r/119257
[14:43:38] (CR) Andrew Bogott: [C: 2] Make syslog-ng basepath a parameter [operations/puppet] - https://gerrit.wikimedia.org/r/119256 (owner: Hashar)
[14:43:43] \O/
[14:44:23] (CR) Andrew Bogott: [C: 2] Create roles for syslog-ng [operations/puppet] - https://gerrit.wikimedia.org/r/119257 (owner: Hashar)
[14:44:25] manybubbles: you there?
[14:44:31] pong
[14:44:40] (CR) coren: [C: -2] "I have a more general solution WIP right now." [operations/puppet] - https://gerrit.wikimedia.org/r/121377 (owner: Tim Landscheidt)
[14:44:51] hey so, i think the jvm deployment stuff is working
[14:45:11] i can't fully test it in analytics cluster because of network acls there, need to get access to archiva from those nodes
[14:45:22] but! can we try it for your stuff somehow?
[14:45:32] hmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
[14:45:34] probably
[14:45:40] i'm going to work on writing up docs today, working through it with you will help me do that too
[14:45:46] we can certainly see if it drops files into the right spots
[14:46:04] we'll always manually bounce the Elasticsearch nodes anyway
[14:46:19] well, we can just see if we can deploy them first :)
[14:47:01] ottomata: you made git-fat a dependency of deployment, so it needs to now be build for trusty as well
[14:47:01] step 1: deploy the .jars you need to archiva.
[14:47:02] step 2: git-fat add them to some deployment repo, enable git-fat on that deployment target
[14:47:02] step 3: do a deployment
[14:47:07] (speaking of git-fat)
[14:47:09] done done doooone!
[14:47:12] wait
[14:47:17] i did not mean 'done' the word
[14:47:21] i meant DUN the musical sound :p
[14:47:22] ok paravoid!
[14:47:41] um, ok, it is just a python script
[14:47:52] should I just change the distribution and add it to apt?
[14:49:53] you can even reprepro copy it
[14:49:59] if it works
[14:50:54] oh it should, lemme see
[14:55:22] paravoid, do we have wikimedia-trusty dist?
[14:55:25] yes
[14:55:26] is that what I shoudl copy it to?
[14:55:33] yes
[14:55:35] k
[14:57:17] (PS1) Hashar: beta: fix wrong include in role::beta::natfix [operations/puppet] - https://gerrit.wikimedia.org/r/121378
[14:59:09] hmm, maybe reprepro copy won't work?
[14:59:12] Will not copy as not found: git-fat.
[14:59:57] paravoid: you seen that before? ^
[15:00:12] the copy arguments are very counter-intuitive
[15:00:14] they're opposite
[15:00:17] did you notice that?
[15:00:30] it's copy
[15:00:36] if I recall correctly
[15:00:37] i did not notice that
[15:00:53] ah ha! there it goes
[15:01:05] that worked, thanks!
[15:01:15] (PS1) Hashar: Add bastion-eqiad.wmflabs.org is a bastion_hosts [operations/puppet] - https://gerrit.wikimedia.org/r/121379
[15:03:02] it's like strcpy ;)
[15:04:10] welllll woot! git-deploy + archiva + git-fat just worked for the first time
[15:04:11] yeehaw
[15:10:02] (PS2) Hashar: Add labs eqiad bastions in networks.pp [operations/puppet] - https://gerrit.wikimedia.org/r/121379
[15:10:40] (CR) Andrew Bogott: [C: 2] Add labs eqiad bastions in networks.pp [operations/puppet] - https://gerrit.wikimedia.org/r/121379 (owner: Hashar)
[15:11:16] (PS1) coren: Tool Labs: fixes mail to root (and postmaster) [operations/puppet] - https://gerrit.wikimedia.org/r/121380
[15:11:34] is wikimedia going to switch to WebScaleSQL?
[15:13:09] (CR) Andrew Bogott: [C: 2] beta: fix wrong include in role::beta::natfix [operations/puppet] - https://gerrit.wikimedia.org/r/121378 (owner: Hashar)
[15:14:06] domas: do you still need access to stat1?
[15:14:08] (CR) coren: [C: 2] "Tested already." [operations/puppet] - https://gerrit.wikimedia.org/r/121380 (owner: coren)
[15:14:24] matanya: what does it do, what is all that thing about revoking access, maybe I need it!
[15:14:38] I don't know when I need access to things, but when I need them it is better to have it
[15:14:39] :)
[15:14:58] domas: ottomata is auditing access to this machine
[15:15:04] i'm trying to help him
[15:15:12] what is so special about that machine to audit access to it
[15:15:24] it got some super private data stuffs
[15:15:25] do people have nothing better to do?
[15:15:28] naw it doesn't
[15:15:32] domas: yup, exactly
[15:15:32] :)
[15:15:36] then it's boring :P
[15:15:36] fine
[15:15:38] then leave my access
[15:15:40] :))
[15:15:47] we are migrating to a new machine
[15:15:54] just not migrating things we don't need to, to clean up
[15:15:55] it is being decommed, and we don't want to create them on the new replace box
[15:16:01] (PS6) Hoo man: Introduce an admins::release user group [operations/puppet] - https://gerrit.wikimedia.org/r/116019
[15:16:17] domas, you would only need access on stat1 if you intend to query research slave dbs and save the data and do computation on it
[15:16:21] mh... rush doesn't have IRC, does he?
[15:16:33] is does hoo
[15:16:51] right, then I need access!!!!
[15:16:52] his nick is chasemp
[15:17:01] though I enjoy building my own stats pipelines!
[15:17:15] * hoo eyes chasemp for review https://gerrit.wikimedia.org/r/116019
[15:17:31] domas: you have enough private data at facebook :P
[15:17:37] domas is a root
[15:17:43] as such he gets to keep access to all machines
[15:17:58] yay
[15:18:43] thank you mark
[15:18:55] (CR) Hoo man: [C: 1] "anyone?" [operations/puppet] - https://gerrit.wikimedia.org/r/120730 (owner: Hoo man)
[15:20:05] lets see who is left
[15:20:26] (Abandoned) Hoo man: Remove deprecated wgCopyrightIcon in favor of wgFooterIcons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/110668 (owner: Hoo man)
[15:32:02] (PS1) Springle: Host was depooled from tin. Make it stick until further post-crash testing can be done. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121387
[15:40:24] (PS1) Ottomata: Moving inclusion of standard and admins::roots out of analytics roles [operations/puppet] - https://gerrit.wikimedia.org/r/121389
[15:42:04] (PS2) Ottomata: Moving inclusion of standard and admins::roots out of analytics roles [operations/puppet] - https://gerrit.wikimedia.org/r/121389
[15:42:22] RobH: daily question : what do i need to do in order to get a newer version to apt.wikimedia.org ?
[15:42:46] matanya: , someone has to build and/or add the package there
[15:42:53] newer version of what?
[15:42:57] pupet-lint
[15:43:01] *puppet
[15:43:24] what ottomata said, as for how to do that....
[15:43:41] i guess an RT ticket detailing why and such
[15:43:46] aye
[15:43:59] but usually has to be a pretty detailed reason to not stick with the ubuntu defaults (cuz maintaining is a pain)
[15:44:26] but i have not been involved in that for awhile so if someone else knows more correct me (1)
[15:44:30] (!) even.
[15:44:58] doesn't hurt that matanya can point at a long list of things demonstrating use of lint....
[15:45:04] heh
[15:45:28] (CR) Springle: [C: 2 V: 2] Host was depooled from tin. Make it stick until further post-crash testing can be done. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121387 (owner: Springle)
[15:45:54] (CR) Ottomata: [C: 2 V: 2] Moving inclusion of standard and admins::roots out of analytics roles [operations/puppet] - https://gerrit.wikimedia.org/r/121389 (owner: Ottomata)
[15:45:57] RobH: i have backported the one from trusty, and tested the one from jessie
[15:46:15] they both act nicely and sane, and have the same dependencies
[15:46:32] Coren: i just merged a tool labs commit from you, hope that's ok
[15:46:35] on palladium
[15:47:03] so it is not much of a pain involved. i also tested on a trusty instance, worked good as well
[15:47:58] so apart from pushing the debian directory (to where? operations/debs/puppet-lint?) what should i do ?
[15:49:21] RobH: we can wait for trusty, i guess, but i'm not sure we need to
[15:49:23] (PS2) BBlack: lvs300[1234] dns for esams private subnet [operations/dns] - https://gerrit.wikimedia.org/r/121235
[15:49:25] so, RT ticket would be good, but if you want to make it super easy, make operations/debs/puppet-lint a git-buildpackage usable repo
[15:49:31] that would mean:
[15:49:35] master is just a fork of upstream
[15:49:38] ottomata: It was. Sorry, I got distracted mid-process.
[15:49:49] debian branch is a branch of master with changes only made to debian/ directory
[15:49:59] matanya: oh not saying its a no, ottomata has more info on this than me obviously =]
[15:50:01] debian/gbp.conf specifies which branches to use
[15:50:06] etc.
[15:50:10] then someone can just clone your repo
[15:50:11] (CR) BBlack: [C: 2 V: 2] lvs300[1234] dns for esams private subnet [operations/dns] - https://gerrit.wikimedia.org/r/121235 (owner: BBlack)
[15:50:13] git-buildpackage
[15:50:15] and have a .deb
[15:50:58] so just clone the upstream source deb tree to out tree basiclly ottomata ?
[15:51:35] (PS2) BBlack: private1-esams dhcp/preseed stuff [operations/puppet] - https://gerrit.wikimedia.org/r/121236
[15:52:16] (CR) BBlack: [C: 2 V: 2] private1-esams dhcp/preseed stuff [operations/puppet] - https://gerrit.wikimedia.org/r/121236 (owner: BBlack)
[15:54:48] matanya, i guess it depends on what upstream's layout looks like
[15:55:06] are they using a git repo? do they already have a debian directory? is it from the source package? is it from a source tarball?
[15:55:20] matanya: http://honk.sigxcpu.org/projects/git-buildpackage/manual-html/gbp.html
[15:55:30] ottomata: http://packages.ubuntu.com/source/trusty/puppet-lint
[15:55:33] (PS1) Ottomata: role::analytics must be included before standard so that ganglia_cluster is set properly [operations/puppet] - https://gerrit.wikimedia.org/r/121395
[15:55:39] ahhh from ubuntu hmmm right
[15:55:46] (CR) jenkins-bot: [V: -1] role::analytics must be included before standard so that ganglia_cluster is set properly [operations/puppet] - https://gerrit.wikimedia.org/r/121395 (owner: Ottomata)
[15:55:55] oh wait, there is already a deb built?
[15:55:59] is this the version you want?
[15:56:00] yes
[15:56:08] oh its just from future?
[15:56:13] yes
[15:56:16] trusty
[15:56:17] oh oh oh
[15:56:31] works well on precise
[15:56:33] i take all that back then, I don't think we need a repo…i am not certain, but we can probably just import it into apt…paravoid
[15:56:34] ?
[15:56:58] same deps and all that
[15:57:24] (PS2) Ottomata: role::analytics must be included before standard so that ganglia_cluster is set properly [operations/puppet] - https://gerrit.wikimedia.org/r/121395
[15:58:07] (CR) Ottomata: [C: 2 V: 2] role::analytics must be included before standard so that ganglia_cluster is set properly [operations/puppet] - https://gerrit.wikimedia.org/r/121395 (owner: Ottomata)
[16:02:22] greg-g: Sam's otherwise engaged so I'm going to start deploy prep for 1.23wmf20.
[16:04:00] \O/
[16:08:21] (PS1) Ottomata: Moving analytics ganglia cluster definition into site.pp [operations/puppet] - https://gerrit.wikimedia.org/r/121396
[16:08:29] (PS2) Ottomata: Moving analytics ganglia cluster definition into site.pp [operations/puppet] - https://gerrit.wikimedia.org/r/121396
[16:08:34] (CR) jenkins-bot: [V: -1] Moving analytics ganglia cluster definition into site.pp [operations/puppet] - https://gerrit.wikimedia.org/r/121396 (owner: Ottomata)
[16:08:50] (CR) Ottomata: [C: 2 V: 2] Moving analytics ganglia cluster definition into site.pp [operations/puppet] - https://gerrit.wikimedia.org/r/121396 (owner: Ottomata)
[16:12:08] (CR) CSteipp: "It would if there was anything sensitive in the process or environment, which is why all hardening guides tell you to turn off core dumps." [operations/puppet] - https://gerrit.wikimedia.org/r/119225 (owner: Faidon Liambotis)
[16:13:27] akosiaris: Did I hear that you have some magic way to detect whether or not a puppet patch is a no-op? https://gerrit.wikimedia.org/r/#/c/120005/2
[16:15:01] andrewbogott: he has, it is called catalog-differ
[16:15:28] bd808|deploy: alrighty then
[16:15:39] (Abandoned) Tim Landscheidt: Tools: Forward mail for root to the admin group [operations/puppet] - https://gerrit.wikimedia.org/r/121377 (owner: Tim Landscheidt)
[16:15:45] * bd808|deploy is cutting the branch now
[16:15:54] and i'm waiting to when he releases it with oss lisence
[16:16:38] ottomata: i'f you hear from paravoid on the package thingy, please let me know. i'm out, see you.
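Editor's note: the idea behind checking whether a puppet patch is a no-op is to compile the catalog for a host before and after the change and compare the resulting resources. The Python sketch below illustrates that idea only. It is not the catalog-differ tool mentioned above, and the catalog shape used here (a dict with a `resources` list of `type`, `title` and `parameters`) is a simplified assumption.

```python
def index_resources(catalog: dict) -> dict:
    """Key each resource by (type, title); the value is its parameters."""
    return {
        (res["type"], res["title"]): res.get("parameters", {})
        for res in catalog["resources"]
    }


def catalog_diff(old: dict, new: dict) -> dict:
    """Return resources added, removed, or changed between two catalogs."""
    before, after = index_resources(old), index_resources(new)
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(key for key in shared if before[key] != after[key]),
    }


def is_noop(old: dict, new: dict) -> bool:
    """A patch is a no-op for this host if nothing differs."""
    return not any(catalog_diff(old, new).values())


# Hypothetical catalogs compiled before and after a patch.
old_catalog = {"resources": [
    {"type": "File", "title": "/srv/.jshintrc", "parameters": {"content": "{}"}},
]}
new_catalog = {"resources": [
    {"type": "File", "title": "/srv/.jshintrc", "parameters": {"content": "{}"}},
]}
print(is_noop(old_catalog, new_catalog))  # True
```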
[16:20:41] (PS1) Mglaser: Added Markus Glaser's GPG keys [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121397
[16:21:14] (PS1) Tim Landscheidt: Tools: Work around problem with mail for nested service groups [operations/puppet] - https://gerrit.wikimedia.org/r/121398
[16:21:42] (CR) Hashar: contint: override .jshintrc file on gallium (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/119750 (owner: Hashar)
[16:23:15] (PS3) Hashar: contint: override .jshintrc file on gallium [operations/puppet] - https://gerrit.wikimedia.org/r/119750
[16:23:31] (PS4) Hashar: contint: override .jshintrc file on gallium [operations/puppet] - https://gerrit.wikimedia.org/r/119750
[16:25:03] hashar: so why isn't this *in* role::ci::slave?
[16:25:11] and the Mount as well
[16:25:33] paravoid: we need it on the host having both the integration/docroot.git repository and the role::ci::slave.
[16:25:46] paravoid: that is merely a hack for gallium :/
[16:25:50] ?
[16:26:26] on gallium we have integration/docroot.git cloned at /srv/ and it provides a non default .jshintrc file as /srv/.jshintrc
[16:27:00] why do we have it at /srv instead of /srv/www or something?
[16:27:02] then we have role::ci::slaves which setup jenkins workspaces under /srv/ssd/jenkins-slave/workspace/ Whenever a jshint job run on gallium host, jshint will fallback to use the /srv/.jshintrc , causing jobs to fail
[16:27:17] tech debt
[16:27:37] (CR) Rush: "to quote faidon "What's the rationale for this?"" [operations/puppet] - https://gerrit.wikimedia.org/r/116019 (owner: Hoo man)
[16:27:49] we have the sites listed as /srv/org/wikimedia/integration/ (for http://integration.wikimedia.org/ )
[16:28:12] chasemp: You're rush, right?
[16:28:18] which is what we are doing for blog / bugzilla / smoke ping etc
[16:28:30] hoo: true
[16:29:15] chasemp: Ok, point is to not give everyone access to private data... the users taht don't need the access should not have it
[16:29:26] That's pretty obvious I hope
[16:29:58] that's why I created a separate "group" for the release users so that they only get access to the bastion and their release machine
[16:30:13] (CR) Hashar: lvs: generic::upstart_job() now uses boolean values (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/118717 (owner: Hashar)
[16:30:32] (PS4) Hashar: lvs: generic::upstart_job() now uses boolean values [operations/puppet] - https://gerrit.wikimedia.org/r/118717
[16:31:12] hashar: okay, fix that then :)
[16:31:30] paravoid: no way I am going to pill even more puppet changes in my dashboard to fix all the /srv/org/ paths
[16:32:02] (CR) Hashar: openstack: generic::upstart_job() now uses boolean values (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/118716 (owner: Hashar)
[16:32:12] (PS4) Hashar: openstack: generic::upstart_job() now uses boolean values [operations/puppet] - https://gerrit.wikimedia.org/r/118716
[16:32:16] no way I'm gonna merge more hackish workarounds on the ground that we have tech debt :)
[16:32:40] fine to me
[16:32:41] :-D
[16:33:29] it just piss me of that providing a .jshintrc file containing {} requires me to spend a couple days refactoring all the jenkins jobs / apache conf relying on websites being at /srv/org/*
[16:34:21] you've built all of the manifests, who are you blaming for the tech debt?
[16:34:32] (CR) Hashar: twemproxy: generic::upstart_job() now uses boolean values (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/118718 (owner: Hashar)
[16:34:37] (PS4) Hashar: twemproxy: generic::upstart_job() now uses boolean values [operations/puppet] - https://gerrit.wikimedia.org/r/118718
[16:34:46] hoo: I understand there is a creation of a sub-group, but I don't have the context knowledge to know if. a) those are the right ppl to be in it. b) if it's worth creating. the context I'm missing is the usage context. these users do x that needs y, and these users do x that needs z. let's separate them. I may not be the best reviewer but for me overall it is not clear.
[16:35:40] private data is ambigious to me in this context, some is obviously more private than others
[16:35:50] chasemp: Well, all of our Databases etc.
[16:35:54] I guess that's pretty private
[16:36:09] all restricted users have access to that (now party...)
[16:37:12] (PS1) Hashar: Lint mediawiki::twemproxy [operations/puppet] - https://gerrit.wikimedia.org/r/121400
[16:38:47] (Abandoned) Hashar: contint: override .jshintrc file on gallium [operations/puppet] - https://gerrit.wikimedia.org/r/119750 (owner: Hashar)
[16:41:33] hoo: ok, starting over. why are the users mglaser and mah different from other restricted users?
[16:41:56] (CR) Andrew Bogott: "I've tested on labs and verified that this is a no-op." [operations/puppet] - https://gerrit.wikimedia.org/r/120005 (owner: Hashar)
[16:43:32] chasemp: Well, I think they don't need the full access plus they're both rather new shell users so they probably didn't get "used" to the whole thing
[16:43:48] like both don't need DB access etc.
[16:43:56] * bd808|deploy is checking out 1.23wmf20 in tin
[16:43:59] I doubt they even know they have that access :P
[16:44:25] chasemp: what's this in question to? re mark and markus? they're the tarball release team
[16:45:20] chasemp: see https://www.mediawiki.org/wiki/Release_Management_RFP/NicheWork_and_Hallo_Welt!
[16:46:08] greg-g: thank you
[16:46:12] np
[16:46:19] * greg-g finally opens up the gerrit change in question, he thinks
[16:50:34] hoo: are we giving the new release group different rights? I can't see what the difference is for them. (ie: would they now have less access than mortals?)
[16:50:54] greg-g: They have less access than mortals and restricted, yes [16:51:11] unless someone started epxosing private data on the actual bastion [16:51:20] if someone did and I'm going to find them and ... [16:51:37] s/and/that/ [16:52:04] hoo: this sounds like something prompted you to do this? [16:52:07] (03CR) 10Hashar: "The RT ticket for my account is 4101, not 401 :-] Apart from that seems fine." (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/121091 (owner: 10Rush) [16:52:40] greg-g: Yep, after I got my shell access I started reading up on the puppet etc. and I scared at how much access we give to so many people [16:52:54] this isn't about anyone in particular, but this sounded like a good start [16:53:02] especially as these are new shell users [16:53:17] I see [16:54:06] hoo: I assume this won't affect them much (hexmode and mglaser), but I'd check with them to see what production servers they use first [16:54:20] I'm all for a release group for them/that role, btw [16:54:24] greg-g: Would be great [16:54:44] lemme send an email, cc'ing, k? [16:54:46] (03PS1) 10BryanDavis: Remove 1.23wmf11 symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121403 [16:54:48] (03PS1) 10BryanDavis: Remove 1.23wmf12 symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121404 [16:54:49] we should have rather atomic groups for all purpose (with own unix groups assigned...) 
[16:54:50] (03PS1) 10BryanDavis: Add 1.23wmf20 symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121405 [16:54:52] (03PS1) 10BryanDavis: Wikipedias to 1.23wmf19 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121406 [16:54:54] (03PS1) 10BryanDavis: Group0 wikis to 1.23wmf20 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121407 [16:54:56] hoo: /me nods [16:55:23] I'm going to try working towards that, but my time is limited and review is slow over here [16:57:43] (03CR) 10BryanDavis: [C: 032] Remove 1.23wmf11 symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121403 (owner: 10BryanDavis) [16:58:14] (03CR) 10BryanDavis: [C: 032] Remove 1.23wmf12 symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121404 (owner: 10BryanDavis) [16:58:44] hoo: chat with chasemp, he's doing a ton of that right now, in general (well, laying the ground work) [16:58:45] (03CR) 10BryanDavis: [C: 032] Add 1.23wmf20 symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121405 (owner: 10BryanDavis) [16:59:18] I attempting to make this a manageable situation at the moment [16:59:26] * bd808|deploy gives zuul & jenkins a disapproving stare [17:00:47] bd808|deploy: I think it's stalled... [17:01:17] chasemp: +1 on that... feel free to add me to changes, I might even be able to +1, but I can't really approve (mssing the rights) [17:01:21] hashar: Any idea what's up with the zuul test queue? 
[17:01:36] damn again [17:01:40] happened yesterday as well [17:01:45] there is some rather weird issue in Zuul :/ [17:02:22] :( [17:02:39] something weird happened around 15:30 UTC [17:02:55] i really need to get zuul upgraded :D [17:03:12] YOu just can't trust demon dogs to do their job promptly these days [17:03:53] Zuul does add the jobs to the gearman server but for some reason the gearman client in Jenkins never receive/process them [17:03:54] :/ [17:05:19] It seems like it's only for some test types. Are the tests pinned to different slaves? /me speculates wildly [17:06:30] bd808|deploy: those tests are roaming [17:06:48] I think on Jenkins side some jobs simply disappear for some reason [17:07:38] Now https://gerrit.wikimedia.org/ is non-responsive for me. [17:07:53] down for me too [17:08:05] [18:06] how hard is it to write a java website that doesn't fall over every half hour [17:08:14] bd808|deploy: ahhh operations-puppet-validate is bound to gallium indeed [17:08:29] gerrit web seems to be back [17:08:35] as well as operations-mw-config-tests [17:08:53] so that must be gallium being weird again :(- [17:08:57] * bd808|deploy has fancy debugging intuition [17:09:01] still down for me :( [17:09:11] ahh, it loaded [17:09:19] it just took like forty seconds [17:09:31] !log Changes are stalled in Zuul because jobs tied to the 'gallium' slaves are not being processed. [17:09:32] i mean, the page loaded. the content is still loading. [17:09:38] Logged the message, Master [17:09:38] !log Jenkins restarting gallium slave. [17:09:42] will keep you updated. [17:09:44] Logged the message, Master [17:09:45] still loading. [17:09:49] oooh, it failed. [17:09:51] MatmaRex: :P [17:09:56] Code Review - Error. Server Unavailable. 0 [17:10:53] we have jenkins slaves running on the jenkins master machine? 
[17:11:02] (03PS1) 10Cmjohnson: adding dhcp for stat1003 [operations/puppet] - 10https://gerrit.wikimedia.org/r/121409 [17:11:11] https://wikitech.wikimedia.org/wiki/Gallium [17:11:19] That's pretty common. [17:11:55] doesn't cause stability issues with high load? [17:13:12] !log repooled Jenkins slave 'gallium' [17:13:18] Logged the message, Master [17:13:30] I am not sure Zuul will repool the jobs though :/ [17:13:34] hopefully it is smart enough [17:13:47] * bd808|deploy takes this opportunity for a short break before the long scap begins [17:17:05] (03CR) 10Greg Grossmeier: "I assume we need a way to verify your control of the associated private key?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121397 (owner: 10Mglaser) [17:17:22] greg-g: No, we usually take gerrit users as legt [17:17:24] * legit [17:17:33] also for ssh keys [17:18:00] hoo: I've had to add my ssh key to the officewiki to get access [17:18:15] that's only for staffers [17:18:29] so we're less strick with non-staffers? [17:18:36] probably [17:18:41] that's not good practice [17:18:42] at least for ssh keys [17:18:50] especially for something like signing the tarballs :) [17:18:54] so I *think* Zuul is proceeding stalled change [17:19:00] well, I guess you guys don't want us on officewiki [17:19:02] otherwise it's just a bit of security theater [17:19:11] not convinced officewiki is more secure [17:19:13] no, but some other out of band confirmation/test [17:19:16] (although we have access to its DB etc. 
after gaining the access :P) [17:19:44] it isn't about the wiki being secure, it's more about "you are who you say you are" as user creation on officewiki is restricted to HR [17:19:52] and HR has a passport/photo id of you :) [17:19:55] easily hackable [17:20:00] yep [17:20:02] sure, everything is [17:20:03] as easy or easier than gerrit [17:20:13] but that doesn't mean we don't do some verification [17:20:13] with the rigth timing one could probably change the ssh keys and then gain root [17:20:14] ugh [17:20:16] whatever [17:20:27] eg. all restricted users could very easily do that [17:20:54] so, you're arguing that we shouldn't verify things like gpg keys and such except via a gerrit change? [17:21:03] ie: not verify at all? [17:21:13] (03Merged) 10jenkins-bot: Remove 1.23wmf11 symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121403 (owner: 10BryanDavis) [17:21:16] greg-g: no no... but I think gerrit is better than officewiki [17:21:18] (03Merged) 10jenkins-bot: Remove 1.23wmf12 symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121404 (owner: 10BryanDavis) [17:21:25] (03Merged) 10jenkins-bot: Add 1.23wmf20 symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121405 (owner: 10BryanDavis) [17:21:49] not that it helps that much, but i was chatting with the ops person same time i put my ssh key on gerrit [17:21:50] hoo: my point is that is only one communication medium, officewiki provides 2. I'd expect we want a second medium of communication to verify this gerrit change [17:22:09] both would have to be hacked and someone impersonate me [17:22:20] !log Zuul managed to retrigger all jobs that got stalled. The root cause is that the 'gallium' slave was no more proceeding jobs. The way to fix it is to unpool the slave from the Jenkins web interface (mark it offline) and repool it. 
I also raised the number of executors, some of the executors might be stalled for some reasons [17:22:25] Logged the message, Master [17:22:28] bd808|deploy: you can resume. Sorry [17:22:32] whatever, I'll listen to whatever robh or csteipp tells me, that's it :) [17:22:36] :) [17:22:39] bd808|deploy: gallium slave in Jenkins went wild and was no more proceeding changes. [17:22:43] greg-g: True... but all of these web based things aren't 110% trusty with such things INO [17:22:45] * IMO [17:22:50] hashar: No worries. Thanks for fixing it [17:22:51] ? [17:23:01] RobH: https://gerrit.wikimedia.org/r/121397 [17:23:04] better use encrypted mail or so [17:23:14] * signed [17:23:35] * greg-g tries to remember to sign his weekly update emails, for whatever non-good that does ;) [17:23:38] So if someone is changing their own keys, and they aren't compromised, then there is no reason not to suspect gerrit changesets of being invalid. [17:23:49] RobH: this is a new key [17:24:37] ahh, so the quesiton is how do we know the dude submitting the patchset is indeed the person with those said keys? [17:24:46] !log Deleted /a/common/php-1.23wmf1[12] on tin [17:24:47] right [17:24:52] Logged the message, Master [17:25:04] I don't know what verification of his ssh key has been done [17:25:19] gerrit ssh key, that is [17:25:33] not his bast1001 key, his gerrit key [17:25:38] yea, i've done keys based off just that, but it was when i knew it was also the right gerrit user [17:25:45] just from dealing with their patchsets. [17:25:56] (03CR) 10Hashar: [C: 04-1] "I totally missed Matanya comment regarding scoping. See inline for a possible solution. 
Feel free to take over / merge while I am sleepin" (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/120009 (owner: 10Hashar) [17:25:59] RobH: I can confirm the gerrit user :P [17:26:03] greg-g: Ready for scap if it's cool with you [17:26:08] it's being used for his work for ages [17:26:19] there isnt any verification of the gerrit key at all, its kind of been overlooked i suppose =P [17:26:19] bd808|deploy: yeppers [17:26:33] someone could ask him [17:26:34] So yea, you guys are right we have done this as 'good enough' in the past, but greg raises a valid point. [17:26:39] !log bd808 Started scap: testwiki to php-1.23wmf20 and rebuild l10n cache [17:26:43] RobH: that's my worry :) we only have track record of patches, which is something, but... [17:26:48] Logged the message, Master [17:26:59] rephrase: everyones done everything thats been good enough in the past [17:27:09] -rw-r--r-- 1 mwdeploy mwdeploy 1 Mar 24 00:00 mw-UIDGenerator-squidhtcppurge-48 [17:27:11] greg-g: how dare you try to improve security process! [17:27:20] aude: from his email to me/chris about it, mutt tells me "S/MIME signature could NOT be verified." ;) [17:27:30] hmmm [17:27:41] vs. -rw-r--r-- 1 apache apache 3 Mar 27 17:20 /tmp/mw-UIDGenerator-squidhtcppurge-48 [17:28:04] i mean something like chat (skype / hangout) maybe if really concerned [17:28:07] I think the point is even if we personally verify this one person, we should tweak our parameters for accepting how someone is verified in the future. [17:28:19] what robh said :) [17:28:23] to try to reduce any real time verification [17:28:31] officewiki does that, but is only good for staff [17:28:36] also, I tend to think that gpg keys used for signing tarballs are REALLY FUCKING IMPORTANT [17:28:37] it doesnt scale. 
[17:28:45] * aude nods [17:28:46] :) [17:28:47] plus office wiki does not guarantee who is editing it [17:28:52] Skype would probably [17:28:58] well, unless their acocunt is compromised indeed [17:29:08] hashar: right, either you or the NSA is the one sending the skype message ;) [17:29:11] unless the attacker manage to stream some faked video of the user presenting his fingerprint [17:29:18] or MS, if they think you're doing something weird [17:29:32] I got my access by talking half an hour with Jeronim in Australia :-] [17:29:49] anywho... :) [17:30:00] (03PS1) 10Cmjohnson: giving stat1003 public ipv4/ipv6 dns & removing private entries [operations/dns] - 10https://gerrit.wikimedia.org/r/121412 [17:30:00] hashar: Then was that? :P [17:30:00] anywho i dunno what to do for this review, heh [17:30:05] :) [17:30:05] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [17:30:06] (03CR) 10jenkins-bot: [V: 04-1] giving stat1003 public ipv4/ipv6 dns & removing private entries [operations/dns] - 10https://gerrit.wikimedia.org/r/121412 (owner: 10Cmjohnson) [17:30:21] hoo: 2004?2005 ? Folks were needed to tweak mediawiki configuration. [17:30:32] hoo, if you can verify that is his key, can you add a comment to the patch? [17:30:43] hoo: deploy was very nice back in that time. vi a file, save it. bam instantly available (including php typos which produced blank pages) [17:30:46] "gpg key" [17:31:00] !log Zuul is all happy. [17:31:06] Logged the message, Master [17:31:49] csteipp: Uh... 
mh :/ I guess I can get his phone number or so to verify, but I'm not really in contact with him despite mail and IRC [17:32:34] once and a while, he visits the office here (but let's not wait for that) [17:32:38] (03PS2) 10Cmjohnson: giving stat1003 public ipv4/ipv6 dns & removing private entries [operations/dns] - 10https://gerrit.wikimedia.org/r/121412 [17:32:39] heh [17:32:42] hashar: /tmp/mw-UIDGenerator-squidhtcppurge-48 [17:32:47] do you know that thing? [17:32:47] (03CR) 10jenkins-bot: [V: 04-1] giving stat1003 public ipv4/ipv6 dns & removing private entries [operations/dns] - 10https://gerrit.wikimedia.org/r/121412 (owner: 10Cmjohnson) [17:33:14] for the future: All gpg keys used to sign tarballs are signed by other gpg keys used to sign the tarballs. [17:33:31] * hoo looks for someone with root :P [17:33:41] chown apache:apache /tmp/mw-UIDGenerator-squidhtcppurge-48 [17:33:45] that needs to be run on terbium [17:33:52] hashar: http://imgur.com/xNnFbig,AjgBwlY,dxA9Jas,WOVHMNG,2gnHC4Y [17:33:57] That's what it looked like yesterday [17:34:00] I don't dare to remove the file cause I'm not sure how it's getting recreated [17:34:02] empty jenkins, all idle [17:34:06] hashar: And huge list on zuul status [17:34:13] hoo: is this a temp fix or something that needs to be set in puppet? [17:34:23] hashar: (see Second Image, Third Image etc.) 
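The cross-signing rule proposed at 17:33:14 (every gpg key used to sign tarballs is itself signed by another tarball-signing key) can be checked mechanically once the signature data is available. A minimal Python sketch, assuming the signatures have already been extracted by some other means into a mapping from key ID to the set of key IDs that signed it. The key IDs and the function name are hypothetical, and nothing here calls gpg itself:

```python
from typing import Dict, List, Set


def keys_missing_cross_signature(signatures: Dict[str, Set[str]]) -> List[str]:
    """Return release keys not signed by any *other* release key.

    `signatures` maps each release key ID to the set of key IDs that
    have signed it (hypothetical, pre-extracted data).
    """
    release_keys = set(signatures)
    missing = []
    for key_id, signers in signatures.items():
        # Only signatures from other release keys count; self-signatures
        # and signatures from unrelated keys are ignored.
        other_release_signers = (signers & release_keys) - {key_id}
        if not other_release_signers:
            missing.append(key_id)
    return sorted(missing)


example = {
    "KEY_A": {"KEY_A", "KEY_B"},   # signed by another release key
    "KEY_B": {"KEY_A"},            # signed by another release key
    "KEY_C": {"KEY_C", "KEY_X"},   # only self-signed or signed by an outsider
}
print(keys_missing_cross_signature(example))
```

A key reported by this check would still need out-of-band confirmation of the kind discussed in the channel; the check only shows whether the rule holds for the data it is given.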
[17:34:26] RobH: not really [17:34:29] cuz i can do it but if we need to followup and make a more permanent fix i'll want to make an rt ticket [17:34:33] not sure why it has the wrong owner on trebium [17:34:36] fine on all over hosts [17:34:41] probably someone messed locally [17:34:44] ok, i can try fixing it there and we'll see if it crops back up [17:34:55] Krinkle: yesterday issue was probably different than the one we had today [17:35:01] I just shell uploaded two files and it failed on terbium for that reason [17:35:05] but it's fine on tin [17:35:06] Krinkle: or maybe yesterday that was ALL slaves being stalled :-/ [17:35:11] (03PS3) 10Cmjohnson: giving stat1003 public ipv4/ipv6 dns & removing private entries [operations/dns] - 10https://gerrit.wikimedia.org/r/121412 [17:35:16] hoo: try now [17:35:18] and it's apache owned on all other hosts (which makes sense) [17:35:20] hashar: but all executors were idle? [17:35:26] what do you mean by stalled? [17:35:31] RobH: Already did my stuff on tin [17:35:32] Krinkle: na there were some jobs being processed on slaves nodes in labs [17:35:45] Krinkle: stalled i.e. jobs shown as enqueued on the Zuul status page. [17:35:49] hashar: At some point yes, but I waited for those to finish to see [17:35:49] but it's pretty obvious cuase php did with an access denied to that file [17:35:59] and then literally all executors were idle, all of them, everywhere [17:36:05] well, its fixed now so should be ok for future runs [17:36:23] thanks, RobH [17:36:24] (03CR) 10Cmjohnson: [C: 032] giving stat1003 public ipv4/ipv6 dns & removing private entries [operations/dns] - 10https://gerrit.wikimedia.org/r/121412 (owner: 10Cmjohnson) [17:36:42] welcome =] [17:36:42] Krinkle: yeah fixed it yesterday by restarting Zuul the hard way. But that might be solvable by unpooling slaves and repooling them. [17:42:25] ottomata: so you want me to start doing archiva stuff? Can you point me at the server/send me login info and stuff? 
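The ownership mismatch described above (the file listed at 17:27:09 as owned by mwdeploy on terbium, and at 17:27:41 as owned by apache elsewhere) is the kind of thing a small check can surface before PHP hits an access-denied error. A minimal Python sketch, assuming the expected numeric uid is known; the function name is hypothetical and this is not part of any existing deployment tooling:

```python
import os
from typing import Iterable, List


def files_with_wrong_owner(paths: Iterable[str], expected_uid: int) -> List[str]:
    """Return the paths whose owning uid differs from expected_uid."""
    wrong = []
    for path in paths:
        # os.stat follows symlinks; st_uid is the numeric owner of the file.
        if os.stat(path).st_uid != expected_uid:
            wrong.append(path)
    return wrong


# Example call (not executed here). On a host where an "apache" account
# exists, its uid could be looked up with pwd.getpwnam("apache").pw_uid:
#   files_with_wrong_owner(["/tmp/mw-UIDGenerator-squidhtcppurge-48"], apache_uid)
```

The check only reports; changing the owner still needs root, as in the chown that was run on terbium.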
[17:42:35] I've been only half paying attention, unfortunatly [17:43:04] (03CR) 10Hashar: package-builder.pp: parameterized $pbuilder_root (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/120009 (owner: 10Hashar) [17:43:17] (03PS2) 10Hashar: package-builder.pp: parameterized $pbuilder_root [operations/puppet] - 10https://gerrit.wikimedia.org/r/120009 [17:43:43] hoo: That file permission thing sounds like https://bugzilla.wikimedia.org/show_bug.cgi?id=53791 [17:44:09] !log bd808 Finished scap: testwiki to php-1.23wmf20 and rebuild l10n cache (duration: 17m 29s) [17:44:14] Logged the message, Master [17:44:20] manybubbles: yeah i need to document, we need to work out process [17:44:32] 17m, not bad :) [17:44:32] what do you need to do, i forget? just deploy a jar? [17:44:35] or do you need to build it? [17:44:44] deploy a couple of jars, I think [17:45:18] Feel free to click around testwiki looking for l10n problems [17:45:19] do you have a project with pom, or just the jars? [17:45:47] bd808|deploy: What the heck [17:45:56] don't tell me anyone runs update.php on the cluster [17:46:49] ah yeah that works well [17:46:55] that is how beta cluster updates the databases every hours [17:47:14] hashar: well beta... but not production [17:47:20] terbium is production [17:47:59] hoo: sorry was kidding :-] [17:48:01] * hashar vanishes [17:49:55] (03CR) 10Ottomata: [C: 032 V: 032] Building new version 0.1.1 [operations/debs/git-fat] (debian) - 10https://gerrit.wikimedia.org/r/121252 (owner: 10Ottomata) [17:52:53] !log bd808 Purged l10n cache for 1.23wmf17 [17:52:59] Logged the message, Master [17:53:50] greg-g: Done with prep. WIll grab a quick lunch before the version switch [17:54:45] ottomata: what partitioning do you want on stat1003? [17:55:00] or do you wanna handle that yourself later? 
[17:55:11] ottomata: project either I suppose [17:55:28] I have zips containing the jars I need to deploy but I can dig up the poms [17:55:44] (03PS1) 10Tim Landscheidt: Add temporary PTR record for mail.tools.wmflabs.org [operations/dns] - 10https://gerrit.wikimedia.org/r/121416 [17:55:52] cmjohnson1: hmmm, cmjohnson1 if you can make / be a smallish raid 1 partition, that woudl be fine [17:55:56] i think raid-1-lvm would work [17:56:09] k [17:56:14] don't remember how many drives this thing has [17:56:37] manybubbles: either is fine, i think, i've never added jars without poms, but i think it is possible [17:57:06] bd808|LUNCH: sounds good, me too [17:57:20] ottomata 4 disks [17:57:40] manybubbles: http://archiva.apache.org/docs/2.0.1/userguide/deploy.html [17:57:50] all 3TB [17:57:57] (03Abandoned) 10Andrew Bogott: Add a script that updates labs instances after migration. [operations/puppet] - 10https://gerrit.wikimedia.org/r/115342 (owner: 10Andrew Bogott) [17:59:33] hm, cmjohnson1, what's on stat1? [17:59:44] is it really just one 10TB disk as it is showing??? is that some kind of storage device? [17:59:49] i only see /dev/sda [18:00:17] stat1 is storage [18:00:28] very different than what you have here [18:01:21] aye ok [18:01:24] hmmm [18:01:29] lemm echeck ticket to remember what I said [18:01:42] ottomata: https://gerrit.wikimedia.org/r/#/c/121399/ could/should replace the script we use for checking Cirrus's Elasticsearch [18:05:04] hmmm cmjohnson1, can we raid-1+0 these? [18:05:31] that would give us 6TB of space with mirrored redundancy (right?) [18:06:04] if we coudl raid 1+0 all 4, then we'd just have one logical volume group across the whole thing, and make a smallish / lvm partition, then I could lvm partition the rest myself [18:06:29] manybubbles: will be with ja in a few... 
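The capacity figure asked about at 18:05:31 follows from how RAID 1+0 works: disks are paired into mirrors and the mirrors are striped, so half of the raw space is usable. A minimal Python sketch of that arithmetic, assuming two-way mirrors and ignoring filesystem and LVM overhead; the function name is hypothetical:

```python
def raid10_usable_tb(disk_count: int, disk_size_tb: float) -> float:
    """Usable capacity of a RAID 1+0 array built from two-way mirrors."""
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("RAID 1+0 needs an even number of disks, at least 4")
    # Each mirrored pair holds one disk's worth of data; the pairs are striped.
    return (disk_count // 2) * disk_size_tb


# Four 3 TB disks, as reported for stat1003 in the channel.
print(raid10_usable_tb(4, 3))  # 6
```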
[18:07:44] ottomata: yeah 6TB [18:16:48] (03CR) 10BryanDavis: [C: 032] Wikipedias to 1.23wmf19 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121406 (owner: 10BryanDavis) [18:17:47] (03Merged) 10jenkins-bot: Wikipedias to 1.23wmf19 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121406 (owner: 10BryanDavis) [18:19:04] !log bd808 rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.23wmf19 [18:19:11] Logged the message, Master [18:22:03] hoo: Call to a member function getValue() on a non-object in /usr/local/apache/common-local/php-1.23wmf19/extensions/Wikidata/extensions/Wikibase/client/includes/hooks/SpecialWatchlistQueryHandler.php on line 51 [18:22:11] manybubbles: is there a way for you to know what the index errors are when nagios starts crying? [18:24:29] ottomata: the old one, not really [18:24:30] this one, yeah [18:24:56] the new one prints this out: [18:24:59] https://gist.github.com/nik9000/9814635 [18:25:21] hm ok cool [18:25:24] i mean, sure, looks fine to me [18:25:30] it should be more useful [18:25:32] actionable [18:25:34] hoo, aude : PHP Fatal error: Call to a member function getValue() on a non-object in /usr/local/apache/common-local/php-1.23wmf19/extensions/Wikidata/extensions/Wikibase/client/includes/hooks/SpecialWatchlistQueryHandler.php on line 51 [18:25:35] and rarer [18:26:15] thanks1 [18:26:31] bd808|deploy: I've seen that one a ton [18:26:34] has it spiked? [18:26:53] ottomata: so about elasticsearch plugins [18:27:07] jajja ok so [18:27:18] I've downloaded one plugin, verified all the hashes (md5 and sha512) against the same thing downloaded from another source [18:27:21] It seems to be climbing. 27 in the last 15 minutes and 'pedias just got wmf19 [18:29:43] bd808|deploy: ouh [18:29:48] let me check [18:31:14] hoo: Your patch is in there. I think that $opts can be null [18:31:28] is it common? 
[18:31:37] the rror, I mean [18:31:38] 56 in the last 15 minutes [18:31:44] :/ [18:32:16] * @param array|FormOptions $opts array until MW 1.22, FormOptions since MW 1.23 [18:33:06] looking for a trace on fluorine [18:33:12] "Call to a member function getValue() on a non-object" === null and the error moved when you moved the line [18:33:24] (03PS2) 10Cmjohnson: adding dhcp and netboot for stat1003 [operations/puppet] - 10https://gerrit.wikimedia.org/r/121409 [18:33:36] It's a fatal so I don't think there's a trace [18:33:51] wait [18:33:56] there's a special mobile watchlist [18:34:01] * hoo wtfs loud [18:34:26] * YuviPanda tells hoo about special mobile history page [18:34:46] bd808|deploy: I guess it's not this urgent... what about we do a deploy later today, there's another thing I want to have done today anyway [18:34:46] * YuviPanda also tells hoo about special mobile diff page and editing page and uh, I don't remember [18:35:02] YuviPanda: ... sounds like a sane thing... [18:35:39] going to have food fast (before it gets cold, sorry), then will do the patches and get stuff ready to deploy [18:36:01] hoo|away: okey doke [18:39:16] (03CR) 10BryanDavis: [C: 032] Group0 wikis to 1.23wmf20 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121407 (owner: 10BryanDavis) [18:39:25] (03Merged) 10jenkins-bot: Group0 wikis to 1.23wmf20 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121407 (owner: 10BryanDavis) [18:40:37] !log bd808 rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.23wmf20 [18:40:43] Logged the message, Master [18:46:17] (03CR) 10Matanya: "This should work. You could as well just fully qualified them." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/120009 (owner: 10Hashar) [18:49:19] (03PS3) 10Cmjohnson: adding dhcp and netboot for stat1003 [operations/puppet] - 10https://gerrit.wikimedia.org/r/121409 [18:50:12] (03CR) 10Cmjohnson: [C: 032] adding dhcp and netboot for stat1003 [operations/puppet] - 10https://gerrit.wikimedia.org/r/121409 (owner: 10Cmjohnson) [18:51:07] greg-g: {{done}} but hoo will need to make a patch and backport to wmf19 for the watchlist problem in https://bugzilla.wikimedia.org/show_bug.cgi?id=63087 [18:51:23] I'm kind of on that [18:51:29] there's two things I want backported [18:51:35] bot not yet in gerrit [18:51:36] on that [18:51:59] * bd808|deploy is just keeping the boss informed :) [18:52:16] :) [18:52:18] ty sir [18:52:25] (03CR) 10Matanya: [C: 031] Lint mediawiki::twemproxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/121400 (owner: 10Hashar) [18:53:04] If greg-g ain't happy ain't nobody happy [19:02:41] I wonder whether we should show Wikidata changes on mobile or not [19:02:47] don't have the stuff around... [19:03:48] YuviPanda: ^ [19:04:39] awjr: ^ [19:04:59] need some kind of desicion ASAP [19:05:07] * decision [19:05:25] hoo: let's move this to #wikimedia-mobile [19:15:44] ori, working today? [19:23:05] (03PS1) 10Manybubbles: Add icu analysis plugin for 1.0.1 [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/121436 [19:28:53] greg-g: bd808: Ok, I'm going to make the build now and prepare stuff (will take 20 minutes or so) [19:29:18] or maybe 25, maybe I can get that other hting also in [19:35:54] (03PS1) 10Jforrester: Enable VisualEditor by default on French Wiktionary & French Wikibooks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121439 [19:39:24] hoo: what other thing? :) [19:41:28] (03CR) 10coren: [C: 032] "That's a good way of doing it." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/121398 (owner: 10Tim Landscheidt) [19:42:05] greg-g: A problem where we didn't link the correct terms of use page [19:42:11] but that one is not included now [19:42:12] * greg-g nods [19:42:14] but I want to have that later [19:42:33] kk [19:42:39] SpecialMobileWatchlist is more important atm ;) [19:44:25] slightly :) [19:45:54] oh wait, wmf20 is out now either [19:46:54] yeah, on testwikis/mw.org [19:47:06] better fix both [19:48:18] ok, if you're ok, I can +2 both changes and then wait for jenkins [19:48:43] (03PS2) 10Tim Landscheidt: Add temporary PTR record for mail.tools.wmflabs.org [operations/dns] - 10https://gerrit.wikimedia.org/r/121416 [19:49:12] fix it please :) [19:49:22] Is this a good to go? [19:49:32] * hoo asks twice after the mistake yesterday [19:51:02] (03PS1) 10Manybubbles: Turn on git-fat for Elasticsearch plugins [operations/puppet] - 10https://gerrit.wikimedia.org/r/121445 [19:51:04] greg-g: ^ [19:51:18] now, yes (without seeing the changes) [19:51:23] where are they in gerrit? [19:51:42] oh, there they are /me forgets which channel does that announcing sometimes [19:51:46] https://gerrit.wikimedia.org/r/121434 [19:51:48] * greg-g was looking at -tech not -dev ;) [19:51:51] this is the relevant thing [19:53:18] yeah, go and fix :) [19:58:34] !log hoo synchronized php-1.23wmf19/extensions/Wikidata/ 'Update Wikidata to fix a problem with SpecialMobileWatchlist' [19:58:38] Can anyone please confirm? 
[19:58:40] Logged the message, Master [19:59:52] looks good to me [19:59:56] https://en.m.wikipedia.org/wiki/Special:Watchlist [20:01:22] !log hoo synchronized php-1.23wmf20/extensions/Wikidata/ 'Update Wikidata to fix a problem with SpecialMobileWatchlist' [20:01:28] Logged the message, Master [20:02:53] error logs look ok [20:03:09] greg-g: I guess I'm done for now, next patch will probably be in an hour or so [20:03:25] but that one is much less important [20:05:16] hoo: add it to a SWAT deploy then :) [20:05:33] will do [20:06:06] hoo: thanks much for the quick fix [20:11:42] You're welcome :) [20:18:03] (03PS1) 10Ori.livneh: hhvm on beta: use upstart [operations/puppet] - 10https://gerrit.wikimedia.org/r/121451 [20:19:05] (03PS2) 10Ori.livneh: hhvm on beta: use upstart [operations/puppet] - 10https://gerrit.wikimedia.org/r/121451 [20:24:05] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [20:37:55] (03CR) 10Ori.livneh: [C: 032] hhvm on beta: use upstart [operations/puppet] - 10https://gerrit.wikimedia.org/r/121451 (owner: 10Ori.livneh) [20:38:34] (03PS2) 10Ottomata: Turn on git-fat for Elasticsearch plugins [operations/puppet] - 10https://gerrit.wikimedia.org/r/121445 (owner: 10Manybubbles) [20:38:42] (03CR) 10Ottomata: [C: 032 V: 032] Turn on git-fat for Elasticsearch plugins [operations/puppet] - 10https://gerrit.wikimedia.org/r/121445 (owner: 10Manybubbles) [20:47:53] OK [20:47:56] Something's weird [20:48:05] For some reason MMV is no longer in beta on mw.org [20:48:18] marktraceur: it's always on, or just gone? [20:48:19] It's also not working, but I'm going to take that as a blessing in disguise [20:48:24] * greg-g nods [20:48:33] It's in the appearance prefs now [20:48:39] Which is what I'd expect if it weren't in beta [20:48:44] But it's supposed to be in beta. [20:48:56] well [20:49:07] is beta-labs-on-eqiad publicly accessible? [20:49:27] ori: not sure [20:49:36] andrewbogott: missed your ping earlier. 
what's up? [20:51:14] I see no reason for this to be the case... [20:51:21] The settings are all correct [20:55:22] greg-g: Got it. I assume you're cool with me joining the SWAT once she's fixed. [20:56:23] greg-g: FYI: Added me to tonight's SWAT, will prepare the deployment stuff later on, away for now [20:56:57] marktraceur: We do code around here, not spaying [20:57:05] hoo|away: kk [20:57:55] greg-g: May have to dabble a bit here [21:00:37] I love it when others do things on my task list: [21:00:38] greg@x200s:~$ task 17 done [21:00:38] Completed task 17 'follow up on EventLogging postmortem, and maintenance responsibilities'. [21:02:35] greg-g: what EventLogging postmortem? [21:02:49] oh, the one from a week or two ago [21:02:58] ori: yeah, that one, that christian pinged on today [21:03:30] :-P [21:03:37] It's embarrassing :-( [21:04:26] what is? [21:04:35] qchris: thanks for the ping, though, sincerely. [21:05:11] ori: The fact that we didn't manage to respond to greg-g's email within a whole week. That is just embarrassing. [21:06:25] PROBLEM - HTTP on aluminium is CRITICAL: Connection refused [21:07:25] RECOVERY - HTTP on aluminium is OK: HTTP OK: HTTP/1.1 302 Found - 557 bytes in 0.002 second response time [21:07:38] no page on wikitech for aluminum [21:13:10] ottomata: so no new files on elastic1001 [21:13:20] ok, so we merged [21:13:23] not that I could use them yet [21:13:25] but still [21:13:29] did you deploy? [21:13:32] deploy? [21:13:42] puppet does it? [21:13:45] yeah, all that does it tell git-deploy to run git-fat pull commands when you deploy [21:13:54] no, puppet sets it up so that git-deploy will do the right thing [21:14:14] ah [21:14:20] do you have deployment::targets for that repo? [21:14:33] I thought they were already in puppet [21:16:15] Added patch to SWAT [21:16:19] So angry [21:16:20] Cannot see [21:16:54] marktraceur: ? 
[21:17:20] greg-g: That was just a frustrating experience [21:17:43] * greg-g nods [21:23:31] manybubbles: they probably are [21:23:34] i didn't check [21:23:43] if they are, then you just need to do a deployment from tin [21:23:54] (do you know how?) [21:23:59] ori: I emailed you earlier as well… wondering about your moratorium on editing misc/package-builder.pp [21:24:08] hashar is blocked by that, which is indirectly blocking the labs migration [21:24:10] bd808: ^ [21:24:20] do you know if that is still in use? [21:24:22] as a task, i mean [21:24:43] ottomata: not a git-deploy, actually [21:25:09] ok cool [21:25:10] sooooo! [21:25:32] https://wikitech.wikimedia.org/wiki/Git-deploy#Deploy_the_repo_via_tin.eqiad.wmnet [21:25:38] i don't think you need the --force [21:25:40] dunno what that does [21:25:52] ori, andrewbogott: I reviewed tasks based on it last week. robla would know maybe if they are still sending it out. I think it's ok to be fixed though. If candidates turn in a 1 for 1 copy of the fixes we make they probably fail the test anyway. :) [21:26:13] robla: Any ideas? [21:26:29] in a meeting. emergency? [21:26:38] robla: not really. [21:26:49] We send out a tarball that is isolated from the ops/puppet repo [21:27:10] oh, in that case I will merge with abandon! [21:28:06] (03CR) 10Andrew Bogott: [C: 032] Lint misc::package-builder [operations/puppet] - 10https://gerrit.wikimedia.org/r/120005 (owner: 10Hashar) [21:28:46] hm… ottomata, you have a merge pending… shall I merge it as well? 
'Turn on git-fat for Elasticsearch plugins' [21:29:54] welp [21:30:47] (03CR) 10Andrew Bogott: [C: 032] package-builder.pp: convert notify to use 'message' [operations/puppet] - 10https://gerrit.wikimedia.org/r/120008 (owner: 10Hashar) [21:32:38] (03PS1) 10Ori.livneh: hhvm on beta: remove /etc/init.d/hhvm; simplify libmemcached check [operations/puppet] - 10https://gerrit.wikimedia.org/r/121530 [21:33:27] (03CR) 10Andrew Bogott: [C: 032] package-builder.pp: parameterized $pbuilder_root [operations/puppet] - 10https://gerrit.wikimedia.org/r/120009 (owner: 10Hashar) [21:33:40] bd808: i pointed that out on the patch in gerrit, but if you say it is ok, then great [21:34:49] matanya: I saw your comments. I think it will be fine. "Can't fix this because somebody might copy it for their homework" seems pretty lame. [21:35:15] i spoke to alex about it in the past [21:35:24] (03PS2) 10Andrew Bogott: Role classes wrapper for misc::package-pbuilder [operations/puppet] - 10https://gerrit.wikimedia.org/r/120013 (owner: 10Hashar) [21:35:25] he said it is better to leave it [21:35:38] but i'm more than happy it will be fixed [21:35:59] bd808: i'd love to modulrize it if you don't mind [21:36:12] it is one of the very last things that aren't yet [21:39:22] (03CR) 10Andrew Bogott: [C: 032] Role classes wrapper for misc::package-pbuilder [operations/puppet] - 10https://gerrit.wikimedia.org/r/120013 (owner: 10Hashar) [21:42:02] matanya: Can you hold off for a couple more weeks on a complete rewrite? [21:48:38] sure bd808 sorry, have severe network issues [21:49:48] matanya: I know how that works. My cable modem was jacked up badly for the last few days. I just got it fixed yesterday afternoon. [21:50:57] bd808: so my modem reboot every 2 minutes, i think the IT of my provider is playing the random cable game again [21:51:37] My problem was a combination of borderline signal strength and a flakey card at the headend. 
[21:52:41] Replaced 2 splitters, bypassed a third and got them to migrate me to another port on the headend. [21:53:59] !log Reloading Zuul to deploy I243258bc2b1770524285ec7 [21:54:07] Logged the message, Master [21:54:15] i give up. night all [21:55:00] bd808: regarding https://gerrit.wikimedia.org/r/#/c/120019/1 -- /was/ your global removed? Or is that in a patch that's pending elsewhere? [21:56:39] andrewbogott: it's in a pending patch -- https://gerrit.wikimedia.org/r/#/c/119524/ -- and cherry-picked into beta's puppetmaster [21:57:22] (03CR) 10Andrew Bogott: [C: 032] Allow user to specify mount point for role::labs::lvm::mnt [operations/puppet] - 10https://gerrit.wikimedia.org/r/119524 (owner: 10BryanDavis) [21:58:16] (03PS2) 10Andrew Bogott: role::labs::lvm::srv to mount second disk on /srv [operations/puppet] - 10https://gerrit.wikimedia.org/r/120019 (owner: 10Hashar) [21:58:32] * bd808 feels the +2 love [21:59:29] (03CR) 10Andrew Bogott: [C: 032] "Everybody wins!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/120019 (owner: 10Hashar) [22:01:25] (03CR) 10Ori.livneh: [C: 032] hhvm on beta: remove /etc/init.d/hhvm; simplify libmemcached check [operations/puppet] - 10https://gerrit.wikimedia.org/r/121530 (owner: 10Ori.livneh) [22:03:32] (03CR) 10Andrew Bogott: [C: 032] contint: use role::labs::lvm::mnt on eqiad slaves [operations/puppet] - 10https://gerrit.wikimedia.org/r/119483 (owner: 10Hashar) [22:06:38] (03PS1) 10Ori.livneh: hhvm on beta: fix upstart job filename (add .conf suffix) [operations/puppet] - 10https://gerrit.wikimedia.org/r/121542 [22:09:21] (03CR) 10Ori.livneh: [C: 032] hhvm on beta: fix upstart job filename (add .conf suffix) [operations/puppet] - 10https://gerrit.wikimedia.org/r/121542 (owner: 10Ori.livneh) [22:13:25] (03PS1) 10Ottomata: Puppetizing Camus cronjob [operations/puppet] - 10https://gerrit.wikimedia.org/r/121546 [22:14:10] (03CR) 10Ottomata: "This change depends on this being deployed: 
https://gerrit.wikimedia.org/r/#/c/121531" [operations/puppet] - 10https://gerrit.wikimedia.org/r/121546 (owner: 10Ottomata) [22:17:08] (03CR) 10jenkins-bot: [V: 04-1] Puppetizing Camus cronjob [operations/puppet] - 10https://gerrit.wikimedia.org/r/121546 (owner: 10Ottomata) [22:21:04] (03PS1) 10Ori.livneh: applicationserver::hhvm: add /var/run/hhvm [operations/puppet] - 10https://gerrit.wikimedia.org/r/121547 [22:23:48] (03CR) 10Ori.livneh: [C: 032] applicationserver::hhvm: add /var/run/hhvm [operations/puppet] - 10https://gerrit.wikimedia.org/r/121547 (owner: 10Ori.livneh) [22:37:06] test -- irc suddenly decided my nick was suspect, so that was an adventure [22:38:34] * hoo still wonders whethre anyone really ran update.php in production [22:40:31] * ebernhardson wonders why there is a script there just waiting to break things. shouldn't mediawiki protect itself? :P [22:42:27] OK, so, the friggin' preferences patches for MMV are still not enough [22:42:33] tgr is working on The Final Fix [22:42:55] But we aren't sorted yet, just so you know [22:43:02] one fix to rule them all? [22:43:09] Basically. 
[22:43:29] Turns out that ResourceLoaderGetConfigVars is pretty much useless for context-sensitive things [22:43:33] So that's my bad [22:54:53] OK, all three are in the list [22:59:56] * mwalker twiddles thumbs waiting for his deployment checkout of core to finish updating [23:02:32] James_F|Away, RoanKattouw I guess I'll start with the VE / OOjs stuff first -- beginning with the config change because that's easy :p [23:03:22] (03CR) 10Mwalker: [C: 032] Enable VisualEditor by default on French Wiktionary & French Wikibooks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121439 (owner: 10Jforrester) [23:06:16] mwalker: OK cool [23:06:23] (Sorry, wasn't paying attention for a second) [23:07:05] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [23:07:13] !log mwalker synchronized wmf-config/ 'Enabling VE on French Wiktionary & French Wikibooks' [23:07:19] Logged the message, Master [23:13:47] marktraceur, do you want me to deploy your changes one at a time? or should I bundle them all up? [23:13:48] hey [23:13:54] hey ori [23:14:01] mwalker: You can bundle them [23:14:09] how are we doing swats today? [23:14:10] i'm available [23:14:27] I dunno; I'm mostly making the order up as I go [23:14:43] OK, but you're on it today? [23:14:52] * RoanKattouw is designing a complex algorithm with David but is pingable if needed [23:15:14] yepyep [23:16:11] Ugh, Jenkins, slow bastard [23:19:28] (03PS1) 10Ori.livneh: MWMultiVersion: treat *.hhvm.beta as *.wikipedia.beta [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121556 [23:21:16] mwalker: OK if I interject with that quickly ^ ? should take me a sec and is scoped to beta. 
[23:21:51] !log mwalker synchronized php-1.23wmf20/resources/oojs-ui 'Syncing for Update OOjs UI to v0.1.0-pre' [23:21:52] ori, go for it [23:21:56] Logged the message, Master [23:22:07] (03CR) 10Ori.livneh: [C: 032] MWMultiVersion: treat *.hhvm.beta as *.wikipedia.beta [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121556 (owner: 10Ori.livneh) [23:22:14] (03Merged) 10jenkins-bot: MWMultiVersion: treat *.hhvm.beta as *.wikipedia.beta [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/121556 (owner: 10Ori.livneh) [23:22:20] greg-g: Hope https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=107804&oldid=107800 is OK – trying to prompt people to use the SWAT slots correctly. [23:22:23] !log ori updated /a/common to {{Gerrit|Ie4c431ae1}}: MWMultiVersion: treat *.hhvm.beta as *.wikipedia.beta [23:22:28] Logged the message, Master [23:22:58] James_F: not bad, not bad. [23:23:30] !log ori synchronized multiversion/MWMultiVersion.php 'Ie4c431ae1: MWMultiVersion: treat *.hhvm.beta as *.wikipedia.beta' [23:23:31] !log mwalker synchronized php-1.23wmf20/skins/vector/components/common.less 'Syncing for 'Follow-up to typography changes to Vector'' [23:23:35] Logged the message, Master [23:23:41] Logged the message, Master [23:25:14] ok marktraceur; pushing your stuff now [23:26:35] !log mwalker synchronized php-1.23wmf20/extensions/MultimediaViewer/ 'Updating multimediaviewer with {{Gerrit|121555}}' [23:26:46] Logged the message, Master [23:28:07] hoo, Nikerabbit -- pushing both your changes as soon as I bundle them and jenkins reviews them [23:28:32] ok [23:28:54] please notice that my stuff is for wmf19 and 20 [23:29:22] ah; I only saw your update for wmf19 [23:29:31] I'll do that one first; and then come back to wmf20 [23:36:25] hoo, I'm a little bit confused; the Wikibase extension doesn't seem to exist on the 1.23wmf19 branch [23:36:44] mwalker: sure [23:37:00] https://gerrit.wikimedia.org/r/121544 [23:37:06] 
https://gerrit.wikimedia.org/r/121545 [23:37:12] Wikidata uses its own build system [23:37:20] ah [23:37:22] that's a little complicated... [23:37:29] so just use these :P [23:37:40] it's also not called what I expected it to be called :p [23:37:42] mwalker: You, um...you pushed it out? [23:38:03] marktraceur, supposedly [23:38:05] Shit [23:38:10] Something else might be wrong then [23:38:30] :( [23:38:42] hmm... oh [23:38:44] I didn't actually [23:38:49] I forgot to update the submodule [23:38:57] marktraceur, ^ [23:39:01] Aha. [23:39:06] mwalker: Hate it when that happens [23:39:19] Carry on [23:40:33] !log mwalker synchronized php-1.23wmf20/extensions/MultimediaViewer/ 'Actually updating multimediaviewer with {{Gerrit|121555}}...' [23:40:38] gj. [23:40:39] Logged the message, Master [23:40:53] Molto bene [23:41:12] !log mwalker synchronized php-1.23wmf20/extensions/MultimediaViewer/ 'Updating Wikibase with {{Gerrit|121545}}...' [23:41:18] Logged the message, Master [23:41:25] hoo, ok; try your 1.20 changes [23:41:37] k [23:42:55] marktraceur: all good? [23:43:02] !log mwalker synchronized php-1.23wmf19/extensions/MultimediaViewer/ 'Updating Wikibase with {{Gerrit|121544}}...' [23:43:13] uhm [23:43:19] Logged the message, Master [23:43:19] Looks like [23:43:26] marktraceur: congrats :) [23:43:27] Wat [23:43:42] heh [23:43:47] mwalker: Why are you syncing MMV for wmf19? [23:44:21] because I suck [23:44:23] Hahaha [23:44:31] * marktraceur hugs mwalker [23:44:59] !log mwalker synchronized php-1.23wmf19/extensions/Wikidata/ 'Updating Wikibase (not multimedia viewer) with {{Gerrit|121544}}...' [23:45:07] Logged the message, Master [23:45:10] maybe I'm just not detail oriented enough for SWAT duty [23:45:13] nice summary :) [23:46:19] ... and it works [23:47:36] greg-g: How do I get in on this stuff [23:47:44] So mwalker doesn't have to flail about every week. 
:P [23:47:48] mwalker: Don't forget to sync wmf20 [23:48:35] !log mwalker synchronized php-1.23wmf20/extensions/Wikidata/ 'Updating Wikibase (not mulitmediaviewer) with {{Gerrit|121545}}...' [23:48:42] Logged the message, Master [23:48:53] hah; marktraceur this might just be because this is the first SWAT i've done [23:49:06] I'm not used to juggling so many things [23:49:31] Yeah [23:49:46] mwalker: and this is a big one, too [23:50:12] marktraceur: respond to my call to action 3 weeks ago? :P [23:50:53] hoo, interestingly; wikibase just had a couple of exceptions come through looking like: "2014-03-27 23:48:53 mw1202 wikidatawiki: [32b2520c] /w/api.php?action=wbgetentities&props=sitelinks&format=json&ids=Q8092315 Exception from line 98 of /usr/local/apache/common-local/php-1.23wmf19/extensions/Wikidata/extensions/Wikibase/lib/includes/store/sql/WikiPageEntityLookup.php: No such revision found for Q8092315: 118082697" [23:50:54] marktraceur: but seriously, uh, we should have some sort of initiation ritual [23:51:00] I'm guessing that's not related; but want to make sure [23:51:18] mwalker: Hey, you seem to have botched the deployment of the VE config change [23:51:24] mwalker: Unrelated [23:51:29] Heisenbug thing :P [23:51:34] mwalker: The change you +2ed touched *.dblist files, but those aren't in the wmf-config directory [23:51:46] damnit RoanKattouw; apparently I just cant do anything correctly today [23:52:09] looking [23:52:16] * greg-g hands mwalker a cup of tea [23:52:29] mwalker: Also, did you do my/Niklas's LocalisationUpdate change? SAL suggests you haven't [23:52:45] no; not yet; it's https://gerrit.wikimedia.org/r/#/c/121560/ [23:52:48] I was waiting on jenkins [23:52:51] Oh OK [23:53:18] Cool [23:53:27] (Dammit Jenkins.) [23:54:30] there it goes [23:55:45] RoanKattouw, can I sync-file *.dblist? 
[23:55:46] or should I do it one by one [23:55:52] (I dont know if sync file accepts wildcards) [23:56:08] wildcards are interpreted by the shell [23:56:47] !log mwalker synchronized visualeditor-default.dblist 'Actually syncing for {{Gerrit|121439}}' [23:56:53] Logged the message, Master [23:57:02] !log mwalker synchronized visualeditor.dblist 'Actually syncing for {{Gerrit|121439}}' [23:57:08] Logged the message, Master [23:58:41] !log mwalker synchronized php-1.23wmf19/extensions/LocalisationUpdate/ 'Updating for {{Gerrit|121560}}' [23:58:51] Logged the message, Master [23:59:12] James_F / RoanKattouw : does it all look good now? [23:59:22] * James_F checks. [23:59:25] we'll know in a couple hours :) [23:59:33] (re l10n) [23:59:52] hah; ya; if everyone on the french wikis storms the village pumps [23:59:56] greg-g: Sure; for the VE thing we'll need to wait a second.