[00:07:26] (PS1) Ori.livneh: Add eventlogging::service::reporter for reporting to StatsD [operations/puppet] - https://gerrit.wikimedia.org/r/123139
[00:09:10] (CR) Ori.livneh: [C: 2] Add eventlogging::service::reporter for reporting to StatsD [operations/puppet] - https://gerrit.wikimedia.org/r/123139 (owner: Ori.livneh)
[00:11:24] (PS1) Ori.livneh: Follow-up to I595f2c15d: add missing reporter.erb [operations/puppet] - https://gerrit.wikimedia.org/r/123140
[00:12:54] (CR) Ori.livneh: [C: 2] Follow-up to I595f2c15d: add missing reporter.erb [operations/puppet] - https://gerrit.wikimedia.org/r/123140 (owner: Ori.livneh)
[00:27:00] (PS1) Ori.livneh: eventlogging: make statsd:// consumer write to tungsten [operations/puppet] - https://gerrit.wikimedia.org/r/123143
[00:29:00] (CR) Ori.livneh: [C: 2] eventlogging: make statsd:// consumer write to tungsten [operations/puppet] - https://gerrit.wikimedia.org/r/123143 (owner: Ori.livneh)
[00:30:36] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 988.204227577
[00:43:42] (PS1) Tim Landscheidt: Tools: Alias tools.wmflabs.org to internal webproxy [operations/puppet] - https://gerrit.wikimedia.org/r/123149
[00:51:56] PROBLEM - RAID on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:51:59] (CR) Legoktm: Use the BetaFeatures whitelist for production to avoid accidental deploys (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121892 (owner: Jforrester)
[00:52:13] springle: db1047 again :/
[00:52:36] PROBLEM - DPKG on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:52:53] * springle sigh
[00:52:56] RECOVERY - RAID on db1047 is OK: OK: optimal, 3 logical, 6 physical
[00:53:26] RECOVERY - DPKG on db1047 is OK: All packages OK
[01:00:54] ebernhardson: did you update the submodules before syncing?
[01:01:30] ebernhardson: i don't think you did
[01:01:37] i'll do the bump / sync now
[01:02:30] !log ori synchronized php-1.23wmf20/extensions/EventLogging
[01:02:34] Logged the message, Master
[01:02:44] !log ori synchronized php-1.23wmf20/extensions/WikimediaEvents
[01:02:48] Logged the message, Master
[01:04:12] Someone knows about the SSL issues on gerrit transactions, right?
[01:04:56] !log ori synchronized php-1.23wmf19/extensions/EventLogging
[01:05:01] Logged the message, Master
[01:05:15] !log ori synchronized php-1.23wmf19/extensions/WikimediaEvents
[01:05:19] Logged the message, Master
[01:16:21] !log ori synchronized php-1.23wmf19/extensions/WikimediaEvents 'Undeployed change from earlier SWAT deploy'
[01:16:29] Logged the message, Master
[01:16:31] !log ori synchronized php-1.23wmf20/extensions/WikimediaEvents 'Undeployed change from earlier SWAT deploy'
[01:16:35] Logged the message, Master
[01:21:06] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[01:30:14] (PS1) Ori.livneh: Use utf-8 as default codec in mwgrep [operations/puppet] - https://gerrit.wikimedia.org/r/123156
[01:32:04] (CR) Ori.livneh: [C: 2] Use utf-8 as default codec in mwgrep [operations/puppet] - https://gerrit.wikimedia.org/r/123156 (owner: Ori.livneh)
[02:03:16] !log springle synchronized wmf-config/db-eqiad.php 's3 depool db1027 for upgrade'
[02:03:24] Logged the message, Master
[02:18:36] (PS1) Springle: Remove unneeded client packages. The wmf-mariadb package provides the usual command line utilities, and other packages are free to use the usual stock libmysqlclient* dependencies without conflict. [operations/puppet] - https://gerrit.wikimedia.org/r/123160
[02:22:23] !log springle synchronized wmf-config/db-eqiad.php 's3 repool db1027, warm up'
[02:22:27] Logged the message, Master
[02:24:03] (CR) Springle: [C: 2] Remove unneeded client packages. The wmf-mariadb package provides the usual command line utilities, and other packages are free to use the u [operations/puppet] - https://gerrit.wikimedia.org/r/123160 (owner: Springle)
[02:28:35] (PS1) coren: Tool Labs: Fix exim4 config [operations/puppet] - https://gerrit.wikimedia.org/r/123162
[02:29:18] !log LocalisationUpdate completed (1.23wmf19) at 2014-04-02 02:29:18+00:00
[02:29:22] Logged the message, Master
[02:31:12] (PS2) coren: Tool Labs: Fix exim4 config [operations/puppet] - https://gerrit.wikimedia.org/r/123162
[02:31:32] (PS1) Springle: Add db1027 to coredb topology in S3. This omission caused db1027 to only have the standard icinga checks. [operations/puppet] - https://gerrit.wikimedia.org/r/123163
[02:34:27] (CR) coren: [C: 2] Tool Labs: Fix exim4 config [operations/puppet] - https://gerrit.wikimedia.org/r/123162 (owner: coren)
[02:35:57] (PS2) Springle: Add db1027 to coredb topology in S3. This omission caused db1027 to only have the standard icinga checks. [operations/puppet] - https://gerrit.wikimedia.org/r/123163
[02:37:54] (CR) Springle: [C: 2] Add db1027 to coredb topology in S3. This omission caused db1027 to only have the standard icinga checks. [operations/puppet] - https://gerrit.wikimedia.org/r/123163 (owner: Springle)
[02:52:48] !log LocalisationUpdate completed (1.23wmf20) at 2014-04-02 02:52:48+00:00
[02:52:53] Logged the message, Master
[02:55:41] !log springle synchronized wmf-config/db-eqiad.php 's2 depool db1063 for upgrade'
[02:55:45] Logged the message, Master
[03:04:26] !log springle synchronized wmf-config/db-eqiad.php 's2 repool db1063, warm up'
[03:04:30] Logged the message, Master
[03:07:55] !log springle synchronized wmf-config/db-eqiad.php 's4 depool db1059 for upgrade'
[03:07:58] Logged the message, Master
[03:21:36] !log springle synchronized wmf-config/db-eqiad.php 's4 repool db1059, warm up'
[03:21:41] Logged the message, Master
[03:27:33] !log springle synchronized wmf-config/db-eqiad.php 's5 depool db1045 for upgrade'
[03:27:37] Logged the message, Master
[03:36:39] (PS1) coren: Labs: prevent ec2id fact from returing errors [operations/puppet] - https://gerrit.wikimedia.org/r/123168
[03:37:32] (CR) jenkins-bot: [V: -1] Labs: prevent ec2id fact from returing errors [operations/puppet] - https://gerrit.wikimedia.org/r/123168 (owner: coren)
[03:38:36] PROBLEM - Host db1045 is DOWN: PING CRITICAL - Packet loss = 100%
[03:38:55] (PS2) coren: Labs: prevent ec2id fact from returing errors [operations/puppet] - https://gerrit.wikimedia.org/r/123168
[03:39:40] (PS3) coren: Labs: prevent ec2id fact from returing errors [operations/puppet] - https://gerrit.wikimedia.org/r/123168
[03:40:30] (CR) jenkins-bot: [V: -1] Labs: prevent ec2id fact from returing errors [operations/puppet] - https://gerrit.wikimedia.org/r/123168 (owner: coren)
[03:41:37] Syntax error at '}'; expected '}'
[03:41:45] That is soooo helpful. really.
[03:42:18] :)
[03:43:48] !log springle synchronized wmf-config/db-eqiad.php 's5 repool db1045, warm up'
[03:43:52] Logged the message, Master
[03:43:52] Hm. The fail function documentation isn't as useful as I wanted. I was hoping it'd make the run fail when executed, not make the run fail to parse.
[03:46:01] !log springle synchronized wmf-config/db-eqiad.php 's6 depool db1006 for upgrade'
[03:46:03] (PS4) coren: Labs: prevent ec2id fact from returing errors [operations/puppet] - https://gerrit.wikimedia.org/r/123168
[03:46:06] Logged the message, Master
[03:46:18] Ah, or perhaps not. Maybe I used fail wrong. Fail at fail. :-)
[03:46:42] which is a success, sort of :)
[03:48:34] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Apr 2 03:48:31 UTC 2014 (duration 48m 30s)
[03:48:38] Logged the message, Master
[03:56:43] !log springle synchronized wmf-config/db-eqiad.php 's6 repool db1006, warm up'
[03:56:47] Logged the message, Master
[04:27:21] !log springle synchronized wmf-config/db-eqiad.php 's7 depool db1039 for upgrade'
[04:27:26] Logged the message, Master
[04:42:21] !log springle synchronized wmf-config/db-eqiad.php 's7 repool db1039, warm up'
[04:42:26] Logged the message, Master
[04:45:09] !log springle synchronized wmf-config/db-eqiad.php 's1 depool db1062 for upgrade'
[04:45:12] Logged the message, Master
[04:52:20] (CR) Andrew Bogott: [C: 2] "Looks good, but I'm not merging because I want to be awake when this applies everywhere..." [operations/puppet] - https://gerrit.wikimedia.org/r/123168 (owner: coren)
[04:53:11] !log springle synchronized wmf-config/db-eqiad.php 's1 repool db1062, warm up'
[04:53:16] Logged the message, Master
[05:31:31] !log springle synchronized wmf-config/db-eqiad.php 'normal loads for all upraded slaves'
[05:31:35] Logged the message, Master
[05:46:06] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[06:56:43] (PS5) Hashar: Labs: prevent ec2id fact from returing errors [operations/puppet] - https://gerrit.wikimedia.org/r/123168 (owner: coren)
[06:57:32] (CR) Hashar: "Amended topic to poke bug 63322" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/123168 (owner: coren)
[07:03:21] <_joe_> hi all
[07:59:46] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[08:00:46] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[08:08:47] (PS3) Matanya: Puppetize purge-checkuser [operations/puppet] - https://gerrit.wikimedia.org/r/122617 (owner: Hoo man)
[08:13:46] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[08:15:04] (PS1) Dzahn: authorize Giuseppe for all Icinga commands [operations/puppet] - https://gerrit.wikimedia.org/r/123179
[08:15:08] _joe_: ^
[08:16:46] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[08:18:29] (CR) Giuseppe Lavagetto: [C: 1] "Username and group permissions are correct." [operations/puppet] - https://gerrit.wikimedia.org/r/123179 (owner: Dzahn)
[08:20:00] (PS2) Dzahn: authorize Giuseppe for all Icinga commands [operations/puppet] - https://gerrit.wikimedia.org/r/123179
[08:38:36] (CR) Dzahn: [C: -1] "we need to add the matching Icinga contact first in private repo, which is pending the sms gateway address for local provider" [operations/puppet] - https://gerrit.wikimedia.org/r/123179 (owner: Dzahn)
[08:52:46] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[08:53:28] (CR) Dzahn: [C: 2] authorize Giuseppe for all Icinga commands [operations/puppet] - https://gerrit.wikimedia.org/r/123179 (owner: Dzahn)
[08:53:40] (PS5) Matanya: mail :lint [operations/puppet] - https://gerrit.wikimedia.org/r/109514
[08:55:44] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[08:56:27] aha, we've got a new ops
[08:57:38] MaxSem: yes, we do:)
[08:57:44] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[08:57:50] _joe_: ^ that's Max, he's in mobile
[08:58:01] MaxSem: it's _joe_
[08:58:13] hi _joe_ :)
[08:58:49] .ru - .it
[08:58:55] <_joe_> MaxSem: hi
[09:06:33] so what you'll be doing?
[09:06:58] (CR) Dzahn: [C: 2] "shut down yesterday" [operations/dns] - https://gerrit.wikimedia.org/r/122837 (owner: Matanya)
[09:08:50] i hope not a assigned to special project too soon, heh
[09:09:32] !log DNS update - removing ms10
[09:09:36] Logged the message, Master
[09:13:54] (CR) Dzahn: [C: 2] capella: decom [operations/dns] - https://gerrit.wikimedia.org/r/122719 (owner: Matanya)
[09:14:26] (PS1) Giuseppe Lavagetto: Adding Giuseppe to ichinga contact groups. [operations/puppet] - https://gerrit.wikimedia.org/r/123186
[09:14:29] !log DNS update - removing capella
[09:14:32] Logged the message, Master
[09:14:54] (CR) QChris: "(Keeping CR-1, to mark dependency on merge of https://gerrit.wikimedia.org/r/#/c/121531/" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/121546 (owner: Ottomata)
[09:16:22] (CR) Dzahn: [C: 2] "RT #7173" [operations/puppet] - https://gerrit.wikimedia.org/r/123186 (owner: Giuseppe Lavagetto)
[09:30:21] CUSTOM - Host magnesium is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[09:30:36] _joe_: ^ works, very niceee
[09:30:39] <_joe_> eheh
[09:31:05] ok, that shows as you can also do the other commands
[09:31:17] morning (kind of)
[09:31:29] f.e. ACK when something is critical but you know it's maintenance
[09:31:36] hi hashar
[09:31:50] what is that?
[09:32:26] paravoid: "custom notificatio" in icinga web ui to test if he is authorized for commands
[09:32:48] ah
[09:32:50] it doesnt do anything except send a custom notification
[09:33:33] <_joe_> just to test my privileges
[09:34:52] !log restarting gitblit on antimony
[09:34:56] Logged the message, Master
[09:35:04] that should also cause a recovery btw
[09:35:26] ACKNOWLEDGEMENT - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 516 bytes in 0.013 second response time daniel_zahn restarted it
[09:35:45] ACK for a critical service example
[09:38:03] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 503318 bytes in 9.305 second response time
[09:39:46] Coren: does virt0 still do anything
[09:40:14] it runs pmtpa labs?:)
[09:40:24] it is supposed to live for a bit more
[09:40:34] mutante: remind you of hume :)
[09:40:57] MaxSem: well yea, i just ask because there was the switch of wikitech and puppetmaster yesterday
[09:41:05] it already is not a puppetmaster or webserver
[09:41:16] anymore
[09:41:28] matanya: ok
[09:42:35] (PS1) Alexandros Kosiaris: Create u_kolossos db and user [operations/puppet] - https://gerrit.wikimedia.org/r/123191
[09:42:44] matanya: "remove db67 from coredb,decom db64,db65,db66,db70" let's just turn that into "remove db67" and kill the others already
[09:42:55] agreed
[09:42:56] !log rsynced brewster /srv to carbon
[09:42:57] it's just DHCP but they're off the list
[09:43:00] Logged the message, Master
[09:44:40] mutante: you will handle that one? did you speak to springle ?
[09:45:10] matanya: yep
[09:45:17] thanks
[09:45:27] no, didn't speak on IRC, just gerrit comment
[09:45:58] ok, your call
[09:51:54] (PS3) Matanya: webserver: fixing duplicate declaration of apache-mpm [operations/puppet] - https://gerrit.wikimedia.org/r/112423
[09:55:10] akosiaris: https://gerrit.wikimedia.org/r/#/c/116936/ ?
[09:55:25] matanya: correction, i already asked on gerrit
[09:55:48] mutante: you can do that ^ one too if you wish
[09:56:00] "## imminent decomission/reclaim from pmtpa pending 12th floor reorg "
[09:56:11] what does it mean: remove from puppet or not
[09:56:55] where do you see that?
[09:56:56] decom or reclaim ,pls
[09:57:12] line 643 of site.pp
[09:57:21] eh, or 651 in the patch
[09:57:31] akosiaris: if you are around, how long does it takes your script to compile catalogs? And did you get added to some git repository somewhere?
[09:58:55] mutante: you should ask springle or RobH i guess
[09:59:38] (PS2) Matanya: statistics: converted iptables to ferm rule [operations/puppet] - https://gerrit.wikimedia.org/r/117670
[09:59:39] hashar: on a 4 cpu labs machine around 30-40 minutes. To your second question the answer is "I have not yet got to it"
[10:00:00] ah that takes a while :-(
[10:00:22] embarrassingly parallel though
[10:00:40] gimme quantum computing and I will be done in a jiffie :-)
[10:01:03] you might find time/place for improvement
[10:01:37] akosiaris: I am pretty sure we can find out how to make it faster whenever it is properly open sourced / published :-D
[10:02:44] hashar: Yeah, I need to find some time for that. Hopefully next-next week I will be done with OSM and ops meeting and will do exactly that
[10:02:56] matanya: kind of reluctant on https://gerrit.wikimedia.org/r/#/c/116936
[10:03:06] well push , announce, get folks to play with it, merge proposed enhancements :]
[10:03:30] I can not find id deploy/id_rsa.pub is being used anywhere, but...
[10:03:32] matanya: same here, i skipped that key back in the original patch on purpose, not just missed it:)
[10:03:34] if*
[10:03:50] fair enough
[10:04:10] if you merge, we can find out who is using :D
[10:04:15] I am wondering whether we should merge it and see what breaks
[10:04:16] if any
[10:04:16] exactly
[10:05:04] anyway, your call
[10:06:05] mutante: maybe ask bd808|BUFFER ?
[10:06:24] I suppose with all the scap work, he and ori might be the best source around ?
[10:07:07] i'll poke him on SF times tonight
[10:07:31] akosiaris: yea, i'd rather not break it right now, just half day work today
[10:07:34] and off tomorrow
[10:07:57] i'd prefer to get some other decom stuff in
[10:08:06] but asking them is a good idea,yep
[10:09:15] today i learned: vim let you save a file encrypted with command X
[10:09:36] <_joe_> hashar: encrypted how?
[10:09:46] <_joe_> with gpg?
[10:10:01] (PS1) Hashar: ganglia: define in subclass at the top level [operations/puppet] - https://gerrit.wikimedia.org/r/123195
[10:10:01] it ask for some key
[10:10:03] (PS1) Hashar: ganglia: lint manifest! [operations/puppet] - https://gerrit.wikimedia.org/r/123196
[10:10:08] so a very weak encryption I guess
[10:10:49] (CR) Hashar: "Took me a while to find out the collector/aggregator classes since I was gripping for ganglia::collector hehe." [operations/puppet] - https://gerrit.wikimedia.org/r/123195 (owner: Hashar)
[10:11:14] <_joe_> I'm sure I can find some 'easy' gpg-to-emacs integration just for the sake of an editor war, anyway :P
[10:11:17] akosiaris: https://gerrit.wikimedia.org/r/#/c/123196/ could use your catalog script of doom :-] It is a huge lint
[10:13:37] hashar: vim can do blowfish from 7.3 and on, so strong encryption. But <7.3 is is pkzip which is indeed weak and insecure
[10:14:31] (PS4) Matanya: icinga: replace iptable with ferm rules [operations/puppet] - https://gerrit.wikimedia.org/r/117674
[10:16:26] hashar: doing 123196... it is together with two others on the same machine so no 30 mins but considerably more :-(
[10:16:43] tis ok
[10:16:51] by the end of the afternoon we will get some report
[10:17:35] akosiaris: want me to get your script wrapped in a jenkins job? So folks will be able to trigger the job manually by simply filling the change they are interested in
[10:17:49] this way we dont have to contextswitch(self) :]
[10:18:15] (PS5) Alexandros Kosiaris: sockpuppet: remove leftovers [operations/puppet] - https://gerrit.wikimedia.org/r/116936 (owner: Matanya)
[10:19:52] (CR) Alexandros Kosiaris: "Bryan, Ori: I added you because of the scap work and you might (or might not) be knowing the answer to the following question "Is the del" [operations/puppet] - https://gerrit.wikimedia.org/r/116936 (owner: Matanya)
[10:21:30] (Abandoned) Hashar: Noop change (DO NOT SUBMIT) [operations/debs/ganglia] - https://gerrit.wikimedia.org/r/122833 (owner: Hashar)
[10:22:44] mutante: are you using vim? There is a plugin to nicely align puppet arrows : https://github.com/godlygeek/tabular
[10:23:27] hashar: yes I want it added to jenkins for sure, but let me first open source it so people can indeed improve it. Btw what is the API for jenkins ? What do you expect from the script ? exit codes/something else ?
[10:23:30] hashar: yes, i use vim, that's interesting
[10:24:34] (CR) Hashar: [C: 1] lint role/deployment (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/122338 (owner: Dzahn)
[10:25:07] mutante: there is a puppet git://github.com/rodjek/vim-puppet.git not sure what it does though
[10:25:56] the github username sounds familiar
[10:26:11] akosiaris: whatever input the script uses can be made a parameter in the Jenkins jobs for folks to manually fill. The exit code is pretty simple: exit 0 means job success, exit 1 means job error
[10:26:38] akosiaris: if your shell script uses -e , any error in the script will abort the build and make it a failure
[10:27:46] :-)
[10:28:15] we can also have Jenkins collect some output files and attach them to the build page in the web interface.
[10:28:40] for mw/core we record the debug log, test results and the LocalSettings.php that got used.
[10:29:59] (PS2) Hashar: ganglia: class defined in class moved to top [operations/puppet] - https://gerrit.wikimedia.org/r/123195
[10:49:36] hashar / mutante i use https://github.com/scrooloose/syntastic
[10:50:31] matanya: ok.. now back to hume.. did you say the cron job is running?
[10:50:43] tim saied it is not
[10:50:51] should be on terbium
[10:50:59] so hume is good to go
[10:51:25] yea, i meant "running on terbium",, did it work and get merged after revert
[10:51:28] looks
[10:51:45] should have
[10:52:18] akosiaris: can have u_aude (or maybe u_wikidata_geo) db and user for postgres?
[10:52:23] do i make a bug ticket?
[10:52:51] ah, can make gerrit patch
[10:53:50] (CR) Dzahn: [C: -1] "got it, thanks Sean. Diederik?" [operations/puppet] - https://gerrit.wikimedia.org/r/111152 (owner: Diederik)
[10:54:14] aude: yes, please update bz ticket though
[10:54:17] ok
[10:54:35] https://bugzilla.wikimedia.org/show_bug.cgi?id=63382
[10:54:36] i suppose i'll wait for kolossos patch to be merged
[10:54:42] don't want merge conflicts
[10:55:02] no need, update the ticket, I 'll update the patch
[10:56:02] ok
[10:56:23] (CR) Dzahn: "mingle.corp.wikimedia.org looks broken.. is that wikimedia.mingle.thoughtworks.com now? how about redirecting to the new one?" [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/120341 (owner: 01tonythomas)
[10:56:29] done
[10:57:53] (CR) Dzahn: [C: 2] "because the old URL doesn't work anymore now (just a little while ago both worked)" [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/120341 (owner: 01tonythomas)
[10:58:08] (CR) Matanya: ganglia: lint manifest! (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/123196 (owner: Hashar)
[10:58:41] (CR) Dzahn: [V: 2] "because the old URL doesn't work anymore now (just a little while ago both worked)" [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/120341 (owner: 01tonythomas)
[11:01:59] (CR) Dzahn: decom: hume (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/122605 (owner: Matanya)
[11:03:11] (CR) Dzahn: "Change-Id: I448bf155c2436ac07506c7390d1d63cb620a0ebf is not merged yet, but since that doesn't actually run on hume that would not block i" [operations/puppet] - https://gerrit.wikimedia.org/r/122605 (owner: Matanya)
[11:04:23] (PS2) Alexandros Kosiaris: Create u_kolossos,u_aude dbs and users [operations/puppet] - https://gerrit.wikimedia.org/r/123191
[11:05:01] (CR) Matanya: decom: hume (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/122605 (owner: Matanya)
[11:05:15] matanya: did anyone say if we want logs or don't want logs?
[11:05:23] hoo said we don't
[11:05:30] tim said he doesn't care
[11:05:33] akosiaris: thanks
[11:05:44] aude: you are welcome
[11:05:44] matanya: fair, thanks
[11:07:03] (CR) Dzahn: [C: 2] Puppetize purge-checkuser [operations/puppet] - https://gerrit.wikimedia.org/r/122617 (owner: Hoo man)
[11:09:52] mark: can you please look at https://gerrit.wikimedia.org/r/#/c/118480/ ?
[11:10:03] (CR) Dzahn: "this has been created on terbium now. notice: /Stage[main]/Misc::Maintenance::Purge_checkuser/Cron[purge-checkuser]/ensure: created" [operations/puppet] - https://gerrit.wikimedia.org/r/122617 (owner: Hoo man)
[11:11:09] mutante: i pushed hume decom patches, if you didn't notice. letting you know, to prevent duplicate work
[11:12:11] matanya: i know, we just commented both on one of them
[11:12:35] * matanya facepalms
[11:20:43] !log running CheckUser/maintenance/purgeOldData.php on all wikis
[11:20:47] Logged the message, Master
[11:21:19] (PS1) Alexandros Kosiaris: brewster to carbon migration [operations/puppet] - https://gerrit.wikimedia.org/r/123202
[11:22:59] TimStarling: ^ running that manually once, works.. the cronjob got created on terbium as well
[11:23:27] (as apache)
[11:25:17] (CR) Dzahn: "i ran this manually once on terbium, as apache and watching it..since this did not run for quite a while" [operations/puppet] - https://gerrit.wikimedia.org/r/122617 (owner: Hoo man)
[11:28:51] (PS1) Alexandros Kosiaris: Move brewster services to carbon [operations/dns] - https://gerrit.wikimedia.org/r/123205
[11:28:53] (PS1) Alexandros Kosiaris: Remove DNS entries for brewster [operations/dns] - https://gerrit.wikimedia.org/r/123206
[11:33:33] mutante: apergos wanted to keep ms10. maybe note it in the decom ticket
[11:33:43] matanya: true!
[11:34:52] maybe we can use it in eqiad as a tridge replacement ?
[11:34:58] or mirror
[11:36:22] it takes forever to run that on loginwiki
[11:36:31] i'm still watching that
[11:36:51] purging check user data
[11:37:33] matanya: check with Ariel and RobH i suggest
[11:37:59] mutante: that is becuse we do a lot of checks there
[11:38:07] and i mean A LOT
[11:38:18] yep. 04:34 cu_changes table on loginwiki has 1804371 rows
[11:38:32] i'll ask them about ms10
[11:38:32] as opposed to something like
[11:38:38] minwiki: Purging data from cu_changes...7 rows.
[11:38:51] i guess two of those are mine
[11:39:07] heh
[11:39:37] it was expected this is slow when it runs the first time after X
[11:39:42] (CR) Mark Bergsma: "Some of these have been taken over by toolserver and are still using their old management ips. For convenience, let's keep them in until T" (1 comment) [operations/dns] - https://gerrit.wikimedia.org/r/118480 (owner: Matanya)
[11:39:50] that's why i did it
[11:40:01] so the cron should not take that long at all then
[11:40:23] (CR) Tim Landscheidt: "Will this solve http://apt.wikimedia.org/ returning nothing?" [operations/puppet] - https://gerrit.wikimedia.org/r/123202 (owner: Alexandros Kosiaris)
[11:47:40] matanya: Reedy: it finished (purging).. ended with zuwikt
[11:48:07] so 40 minutes ~ ?
[11:49:39] matanya: 27, but since it runs each day now.. should be way less
[11:50:07] we should pick an eye open on the few following days
[11:50:19] it doesn't log :p
[11:50:41] run it again and see? :D
[11:51:11] heh, yes:)
[11:51:26] Yup
[11:51:30] It removed 800k rows
[11:51:34] 1044164 now in cu_changes
[11:52:36] PROBLEM - MySQL Replication Heartbeat on db1037 is CRITICAL: CRIT replication delay 309 seconds
[11:54:08] springle: ^ dewiki and wikidatawiki db's busy
[11:54:44] matanya: done, 2-3 minutes
[11:54:47] :)
[11:54:57] nice
[11:56:36] RECOVERY - MySQL Replication Heartbeat on db1037 is OK: OK replication delay -1 seconds
[11:58:13] Anything else left on hume?
[11:59:46] we think .. no
[12:00:06] but it appears here, see this
[12:00:27] https://gerrit.wikimedia.org/r/#/c/122605/
[12:00:35] (PS2) Alexandros Kosiaris: brewster to carbon migration [operations/puppet] - https://gerrit.wikimedia.org/r/123202
[12:00:42] https://gerrit.wikimedia.org/r/#/c/122605/1/manifests/site.pp
[12:01:08] does terbium replace all of those includes? it should
[12:02:26] PROBLEM - Puppet freshness on carbon is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 09:02:01 AM UTC
[12:03:54] (CR) Dzahn: "this was merged instead now: Change-Id: I448bf155c2436ac07506c7390d1d63cb620a0ebf" [operations/puppet] - https://gerrit.wikimedia.org/r/74591 (owner: Reedy)
[12:04:03] They should've been moved a while ago
[12:05:40] should we claim "deploy terbium as hume replacement" is done?
[12:05:49] that's separate from decom hume
[12:06:06] I'd think so
[12:06:24] #4839: misc::maintenance:: home dependent entries
[12:06:29] hmmm,what's that
[12:07:05] ah, resolved by RobH and then just reopened automatically because of a reply
[12:08:27] closing with "anyone who thinks terbium lacks something that hume had, please speak up "
[12:11:27] mutante: zero wiki is going to be in the cluster? (i.e. centralauth, stewards and the rest)
[12:11:51] matanya: no idea
[12:14:38] (CR) Dzahn: [C: -1] "files/misc/scripts/purge-checkuser has been deleted now in Change-Id: I448bf155c2436ac07506c7390d1d63cb620a0ebf and this has a path confli" [operations/puppet] - https://gerrit.wikimedia.org/r/122605 (owner: Matanya)
[12:16:43] (PS2) Matanya: decom: hume [operations/puppet] - https://gerrit.wikimedia.org/r/122605
[12:16:52] fixed
[12:16:56] matanya: feel like finding this out?
[12:16:58] #7080: scap rsync slaves (mw1010 and mw1070) missing allow for 10.64.48.0/22
[12:17:06] that links to patches that have been merged
[12:17:22] i _think_ it's done
[12:17:38] looking
[12:19:59] looks done, i'll poke bd808|BUFFER on this too
[12:20:09] a bunch of questions for him today
[12:20:23] tyvm
[12:23:37] (PS1) Dzahn: delete empty pmtpa dsh files, remove mw10 [operations/puppet] - https://gerrit.wikimedia.org/r/123210
[12:24:18] (PS2) Dzahn: delete empty pmtpa dsh files, remove mw10 [operations/puppet] - https://gerrit.wikimedia.org/r/123210
[12:24:39] Reedy: ^
[12:25:03] :)
[12:25:20] (PS3) Dzahn: delete empty pmtpa dsh files, remove mw10,mw40,db10 [operations/puppet] - https://gerrit.wikimedia.org/r/123210
[12:29:55] (CR) Dzahn: [C: 2] delete empty pmtpa dsh files, remove mw10,mw40,db10 [operations/puppet] - https://gerrit.wikimedia.org/r/123210 (owner: Dzahn)
[12:39:07] (PS1) Dzahn: remove all Tampa appservers from DHCP [operations/puppet] - https://gerrit.wikimedia.org/r/123211
[12:39:40] (PS2) Dzahn: remove all Tampa appservers from DHCP [operations/puppet] - https://gerrit.wikimedia.org/r/123211
[12:40:24] matanya: there^ one more,, and now i'm out for today (actually half day which is over) cya
[12:40:30] be back on Fri
[12:40:57] thanks have fun on vacation
[12:41:06] thanks for the help as usual, cya
[12:43:32] (PS1) Hashar: parsoid: systemuser is only for production [operations/puppet] - https://gerrit.wikimedia.org/r/123212
[12:45:53] (CR) Hashar: "Applied on deployment-salt, the beta cluster puppetmaster. Solves:" [operations/puppet] - https://gerrit.wikimedia.org/r/123212 (owner: Hashar)
[12:46:20] (CR) Hashar: [C: 1 V: 2] parsoid: systemuser is only for production [operations/puppet] - https://gerrit.wikimedia.org/r/123212 (owner: Hashar)
[12:50:38] (PS1) MaxSem: Clean up DSH groups [operations/puppet] - https://gerrit.wikimedia.org/r/123213
[12:50:52] mutante|away, you might also be interested in^^^^
[12:57:44] join jenkins
[13:28:36] (CR) Andrew Bogott: [C: 2] Labs: prevent ec2id fact from returing errors [operations/puppet] - https://gerrit.wikimedia.org/r/123168 (owner: coren)
[13:36:59] (PS1) Andrew Bogott: Don't try to reset $certname [operations/puppet] - https://gerrit.wikimedia.org/r/123221
[13:37:10] Coren: fyi, ^
[13:39:36] (CR) Andrew Bogott: [C: 2] Don't try to reset $certname [operations/puppet] - https://gerrit.wikimedia.org/r/123221 (owner: Andrew Bogott)
[13:45:29] (CR) Hashar: ganglia: lint manifest! (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/123196 (owner: Hashar)
[13:45:44] (PS2) Hashar: ganglia: lint manifest! [operations/puppet] - https://gerrit.wikimedia.org/r/123196
[13:48:32] andrewbogott: Ay, yes. I keep forgetting that puppet syntax only /looks/ like it makes sense. :-)
[13:48:51] yep, me too :(
[13:50:02] hii Reedy, you there?
[13:52:01] andrewbogott: So at least that's going to prevent that same error from occuring again. As I've said, about half the tools instance got mangled puppet.conf out of it; I expect that's true of all projects and we should probably inventory.
[13:54:20] ottomata: mostly
[13:54:36] hey, just wondering about your stat1` account
[13:54:43] you are one of the last 3 I have yet to hear from :)
[13:54:46] heh
[13:54:52] do you want an account on the new stat1003 replacement server?
[13:54:53] I suspect I probably don't need it...
[13:55:04] I can't remember the last time I logged onto it
[13:55:09] you'd only need it if you need to connect to mysql research slaves AND you want a place to save query results and do number crunching
[13:55:10] I think it was poking at api request logs
[13:55:21] aye, cool, those aren't even on stat1/stat1003 anymore :)
[13:55:33] heh
[13:55:42] those are on stat1002
[13:55:45] anyway, ok cool, thanks
[13:55:46] I guess that's a no then
[13:55:53] andrewbogott: Could you please use the syntax "Bug: 63322" on a single line immediately prior to "Change-Id"? That triggers the Bugzilla comments by Gerrit Notification Bot.
[13:56:40] heya apergos, just noticed something with users on stat1002
[13:56:52] scfc_de: ok
[13:56:59] you did a couple of RTs for access there back in december
[13:57:00] hashar: answered you on the patch, from some reason it doesn't seem to show up here
[13:57:08] and you added the users in site.pp under the node
[13:57:19] that's fine, but there is an accounts::privatedata class in admins.pp
[13:57:26] Coren: thoughts on how to inventory?
[13:57:26] probably did, when I didn't know about the class
[13:57:27] that I've been using for users with access to private data
[13:57:32] ahh, cool, you know now
[13:57:35] yep
[13:57:39] ok cool, just letting you know that it exists for the future
[13:57:42] yep
[13:57:44] cool, ok thanks!
[13:58:42] apergos: what is your plan for ms10?
[13:58:44] andrewbogott: A simple grep for '' will do.
[13:59:50] use it for dump related things somewhere (new dc since that's where boxes get shipped now), maybe the new incrementals
[14:00:04] !log tried restarting some lsearchd services (carefully) to clear out some crashing when searching for a particular query term. It caused pool queue full errors.... serves me right for trying?
[14:00:10] Logged the message, Master
[14:00:13] Coren: sure enough, half a dozen or so. You fixed just by replacing that one string in /etc/puppet/puppet.conf? Or was there more to it?
[14:01:32] (PS1) Ottomata: Removing standard and admins::roots from base statistics class [operations/puppet] - https://gerrit.wikimedia.org/r/123225
[14:04:29] (CR) Ottomata: [C: 2 V: 2] Removing standard and admins::roots from base statistics class [operations/puppet] - https://gerrit.wikimedia.org/r/123225 (owner: Ottomata)
[14:05:00] andrewbogott: Just fixing the string works; I did it manually, I don't think we have enough to justify writing a script for it.
[14:05:35] yep, doing.
[14:08:00] andrewbogott: It should be rare enough; the only reason I even bothered making a puppet patch for it is because 'puppet breaks its own config in a way that makes it unable to run' is pretty bad even if it's rare. :-)
[14:08:43] Yeah, it seems important. And your fix seems sound, as a subsequent puppet run should fix things.
[14:09:08] Well, it's better to fail the run than self-destruct, certainly. :-)
[14:10:21] (PS1) Ottomata: Moving privatedata users on stat1002 into admins::privatedata class [operations/puppet] - https://gerrit.wikimedia.org/r/123229
[14:12:59] (CR) Ottomata: [C: 2 V: 2] Moving privatedata users on stat1002 into admins::privatedata class [operations/puppet] - https://gerrit.wikimedia.org/r/123229 (owner: Ottomata)
[14:14:02] so, matanya, question
[14:14:08] yes?
[14:14:11] if I merge that statistics ferm change
[14:14:17] will anything at all change on the server?
[14:14:25] yes
[14:14:26] sure
[14:14:31] or is it just a simplification of iptables stuff?
[14:14:48] i think I should merge that before I puppetize more of stat1003, eh?
[14:15:01] ottomata: ferm has a default DROP policy
[14:15:13] so make sure you can ferm::rules for everything you need
[14:15:18] you have*
[14:15:22] oh hm
[14:15:34] like, everything? webservices, ssh?
[14:15:39] or are there some default open ports?
[14:15:44] not ssh
[14:15:44] your default ip table rule will be drop
[14:15:52] but webservices yes
[14:15:55] !log Jenkins deleting pmtpa slaves (they all have been shutdown and jobs got deleted)
[14:16:00] Logged the message, Master
[14:16:04] hmmmm
[14:16:15] there are some default rules indeed. Look into base module's ferm::rules
[14:16:27] ok cool, will check it
[14:16:34] matanya: for now then, let's not delete the old stuff
[14:16:37] the iptables stuff
[14:16:43] let's add the new ::firewall class
[14:16:48] and I will include that on stat1003
[14:16:53] but not remove the iptables stuff from stat1
[14:17:01] when we decom stat1 we will remove those classes
[14:17:10] i'm all for that ottomata
[14:17:27] but if you use it in the first place it will be much easier
[14:17:40] (PS3) Ottomata: statistics: converted iptables to ferm rule [operations/puppet] - https://gerrit.wikimedia.org/r/117670 (owner: Matanya)
[14:17:55] yeah, so, i'm saying matanya, stat1003 will have that included at the start
[14:18:01] and I want to puppetize it now
[14:18:06] or just merge it after stat
[14:18:15] but i'm not ready to possibly disrupt users on stat1 yet
[14:18:21] * so just merge it after stat1 is dead
[14:18:21] (PS3) Hashar: contint: install puppet-lint from rubygems on labs [operations/puppet] - https://gerrit.wikimedia.org/r/120498
[14:19:43] pfff
[14:20:17] (CR) Matanya: [C: -1] "no need for this. puppet0lint was backported.
see RT #7154" [operations/puppet] - https://gerrit.wikimedia.org/r/120498 (owner: Hashar)
[14:20:32] (PS19) Alexandros Kosiaris: etherpad: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/107567 (owner: Matanya)
[14:20:47] matanya: awesome
[14:20:50] well, i want to have the ferm stuff to apply to stat1003 now though
[14:20:57] i want some overlap
[14:21:20] ottomata: so duplicate that line and add an if or something like that
[14:21:28] (Abandoned) Hashar: contint: install puppet-lint from rubygems on labs [operations/puppet] - https://gerrit.wikimedia.org/r/120498 (owner: Hashar)
[14:22:03] ottomata: go for overlap and you are asking for trouble
[14:22:08] akosiaris: don't forget you need https://gerrit.wikimedia.org/r/#/c/112423/ before the etherpad module
[14:23:44] (PS1) Hashar: contint: bring puppet-lint on lab slaves [operations/puppet] - https://gerrit.wikimedia.org/r/123235
[14:24:16] matanya: ^^ :-D
[14:24:49] (CR) Matanya: [C: 1] contint: bring puppet-lint on lab slaves [operations/puppet] - https://gerrit.wikimedia.org/r/123235 (owner: Hashar)
[14:25:33] (CR) Hashar: [C: 1 V: 2] "Applied on integration puppetmaster integration-puppetmaster." [operations/puppet] - https://gerrit.wikimedia.org/r/123235 (owner: Hashar)
[14:27:43] !log Jenkins applying label contintLabsSlave on slaves in labs used for ci (integration-slave1001 and 1002)
[14:27:48] Logged the message, Master
[14:28:08] hashar: please verify the version
[14:28:15] it should be 3.2 or so
[14:28:16] matanya: did
[14:28:18] great
[14:28:43] my work was worthwhile for someone :)
[14:29:09] akosiaris: trouble?!
[14:29:10] no i mean
[14:29:17] overlap in time when stat1 and stat1003 are both accessible to users
[14:29:32] i just want to ask a few users to log in and make sure things are as they shoudl be
[14:29:35] before I turn off stat1
[14:30:46] but wait, q
[14:30:49] ottomata: aah ok, I misunderstood then
[14:30:57] do we even need redis on stat1 / stat1003?
[14:31:00] is it even being included?
[14:31:06] hm, looking
[14:33:16] ok, i'm going to wait and see what happens when I puppetize stat1003, i don't think it will install redis
[14:33:23] don't see anywhere that is happening...
[14:43:53] matanya: https://integration.wikimedia.org/ci/job/operations-puppet-puppetlint/2/console :-D
[14:44:22] \o/
[14:45:01] and the 1.47MB log file is attached to builds https://integration.wikimedia.org/ci/job/operations-puppet-puppetlint/2/
[14:45:02] :D
[14:45:54] nice, very nice
[14:46:07] now we should filter/fix/clean
[14:46:11] (PS1) Ottomata: Removing gerrit_stats class from role::statistics::cruncher [operations/puppet] - https://gerrit.wikimedia.org/r/123237
[14:46:22] e.g no WARNING line has more than 80 characters (80chars)
[14:46:31] (CR) Ottomata: [C: 2 V: 2] Removing gerrit_stats class from role::statistics::cruncher [operations/puppet] - https://gerrit.wikimedia.org/r/123237 (owner: Ottomata)
[14:46:42] or not in autoload module layout for foles
[14:49:20] matanya: yeah probably want to drop a bunch of checks entirely
[14:49:34] yes
[14:49:49] if you see some obvious offenders, we can pass some --no-80chars or something
[14:49:49] (PS1) Ottomata: Removing some includes of erosen and mgrover accounts [operations/puppet] - https://gerrit.wikimedia.org/r/123240
[14:50:37] (PS2) Ottomata: Removing some includes of erosen and mgrover accounts [operations/puppet] - https://gerrit.wikimedia.org/r/123240
[14:51:51] hashar: just check http://puppet-lint.com/checks/ and decide what we want
[14:51:54] matanya: the puppet-lint is a macro in Jenkins
Job Builder. It is in integration/jenkins-job-builder-config.git macro.yaml or something
[14:52:03] similar list with puppet-lint --help :D
[14:52:19] I have edited the default message to report the internal name of the failing checks
[14:52:21] hashar: want to give me bit access in jenkins, or is it disallowed ?
[14:52:54] access is limited, not sure how to get it granted to moaar folks
[14:52:55] :/
[14:55:11] maybe I should get two jobs
[14:55:20] (CR) Ottomata: [C: 2 V: 2] Removing some includes of erosen and mgrover accounts [operations/puppet] - https://gerrit.wikimedia.org/r/123240 (owner: Ottomata)
[14:55:22] one that only takes care of error, and another that reports errors+warnings
[15:00:21] Hi StevenW, you there?
[15:03:03] (CR) Giuseppe Lavagetto: [C: -1] "The firewall rules for accepting connections has changed from a source of 208.80.152.0/22 to 10.0.0.0/8; which may be a good thing, but th" [operations/puppet] - https://gerrit.wikimedia.org/r/117670 (owner: Matanya)
[15:03:26] PROBLEM - Puppet freshness on carbon is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 09:02:01 AM UTC
[15:04:02] (PS1) Ottomata: Adding role and user classes to stat1003 [operations/puppet] - https://gerrit.wikimedia.org/r/123242
[15:05:36] RECOVERY - Puppet freshness on carbon is OK: puppet ran at Wed Apr 2 15:05:26 UTC 2014
[15:06:50] (PS2) Ottomata: Adding role and user classes to stat1003 [operations/puppet] - https://gerrit.wikimedia.org/r/123242
[15:09:07] (CR) Ottomata: [C: 2 V: 2] Adding role and user classes to stat1003 [operations/puppet] - https://gerrit.wikimedia.org/r/123242 (owner: Ottomata)
[15:09:46] (PS4) Matanya: statistics: converted iptables to ferm rule [operations/puppet] - https://gerrit.wikimedia.org/r/117670
[15:11:06] (CR) Matanya: "Thank you, fixed that, and wrote it in a nicer way.
As for your second question:" [operations/puppet] - https://gerrit.wikimedia.org/r/117670 (owner: Matanya)
[15:12:38] (PS5) Matanya: statistics: converted iptables to ferm rule [operations/puppet] - https://gerrit.wikimedia.org/r/117670
[15:12:43] (CR) Giuseppe Lavagetto: [C: 1] "I think that's good, then." [operations/puppet] - https://gerrit.wikimedia.org/r/117670 (owner: Matanya)
[15:14:34] (CR) Giuseppe Lavagetto: [C: 1] statistics: converted iptables to ferm rule [operations/puppet] - https://gerrit.wikimedia.org/r/117670 (owner: Matanya)
[15:14:51] don't merge it anyway _joe_
[15:15:00] it will break stat1
[15:15:51] <_joe_> matanya: I wouldn't merge :)
[15:16:15] <_joe_> matanya: why would it break stat1?
[15:16:45] because stat1 has this role, and other role such as webservices
[15:17:07] and since i didn't mention them in the patch, ferm will drop packets
[15:17:13] and users will be not happy
[15:17:28] <_joe_> ok so you need to add ferm rules to specific classes as well
[15:17:58] yes
[15:18:02] _joe_: role classes preferably but you are correct
[15:18:19] <_joe_> akosiaris: classes specific to the role :)
[15:18:28] just in time bd808
[15:18:54] when you have sec, two questiions.
[15:19:54] though _joe_ stat1 will be decomed soon and replaced with stat1003, so it is a good timing to fix things for the new host
[15:20:44] !log stopping puppet on stat1
[15:20:50] Logged the message, Master
[15:21:02] matanya: I haven't read backscroll yet; what's up?
[15:21:36] hi, one question is regarding https://gerrit.wikimedia.org/r/#/c/116936/
[15:22:43] the other regarding RT ticket 7080
[15:23:41] matanya: That ssh key is not used by scap. I can't say where it may or may not otherwise be used.
[15:24:10] good enough.
please comment there :)
[15:26:12] (PS1) Nuria: Making sure apache can write to /var/lib/wikimetrics directory [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/123244
[15:28:50] (CR) BryanDavis: "This ssh key is not used directly by scap. Scap operates using the ssh-agent of the invoking user. I can't say where it may or may not oth" [operations/puppet] - https://gerrit.wikimedia.org/r/116936 (owner: Matanya)
[15:30:49] (PS1) Manybubbles: Switch most group1 wikis to cirrus as primary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123246
[15:35:48] matanya: I replied to the RT ticket too.
[15:35:55] thanks
[15:39:45] (CR) Odder: "This should really get merged at some point." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115153 (owner: Odder)
[15:40:18] (CR) Odder: "Is there anything blocking this patch set from being merged?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/120152 (owner: Odder)
[15:40:29] (CR) Odder: "Is there anything blocking this patch set from being merged?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/120153 (owner: Odder)
[15:42:41] off for now will be back later tonight
[15:43:20] ^d: did you happen to sync the prefix search change?
[15:43:36] <^d> The caching config change?
[15:43:42] yeah
[15:43:44] <^d> No I didn't.
[15:44:14] cool. I'm just seeing a tidge higher load then normal
[15:44:22] must be someone hammering itwiki or something
[15:45:00] no big deal
[15:45:09] just wanted to get a baseline before we sync in 15 minutes
[15:46:44] yeah, that is artificial
[15:46:49] http://ganglia.wikimedia.org/latest/stacked.php?m=es_query_time&c=Elasticsearch%20cluster%20eqiad&r=hour&st=1396453581&host_regex=
[15:46:55] (CR) Alexandros Kosiaris: [C: 2] ganglia: lint manifest!
[operations/puppet] - https://gerrit.wikimedia.org/r/123196 (owner: Hashar)
[15:47:06] like someone trying to see how fast it is
[15:47:31] meh
[15:48:50] (CR) Alexandros Kosiaris: [C: 2] iptables.pp: retab to four spaces [operations/puppet] - https://gerrit.wikimedia.org/r/122788 (owner: Hashar)
[15:49:36] (PS2) Manybubbles: Switch most group1 wikis to cirrus as primary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123246
[15:51:56] akosiaris: please don't merge that ganglia lint change yet
[15:52:39] matanya: ?
[15:52:46] (CR) Giuseppe Lavagetto: "Wouldn't it be better to have that dir always owned by wikimetrics:wikimetrics and make the www-data user part of the wikimetrics group? I" [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/123244 (owner: Nuria)
[15:53:11] akosiaris: refering to https://gerrit.wikimedia.org/r/123196 there is a comment by me not resolved yet
[15:53:12] ^d: so we're actually not adding anything bigger then enwikisource
[15:53:50] <^d> Cannot parse. "Not going to..." or "Nothing we'll be adding is bigger than..."
[15:54:14] <_joe_> :)
[15:54:21] matanya: ok
[15:55:29] gerrit 503 :/
[15:55:47] ottomata: so before you start any of those machines, you probaby ought to ping me. just in case:)
[15:56:23] <_joe_> matanya: "works for me" (TM)
[15:56:57] <_joe_> doesn't work anymore.
[15:57:27] _joe_: i'm predicting issues :)
[15:57:57] ok cool, will do manybubbles
[16:01:23] gerrit web ui down for everyone or just me?
[16:01:29] everyone
[16:01:40] ^d: want to kick it ?
[16:01:50] bd808: trying
[16:01:50] <^d> No.
[16:01:52] <^d> No.
[16:01:57] <^d> Kicking it solves nothing.
[16:02:22] it came back for me, just slowly
[16:02:25] <^d> What the FUCK?
[16:02:30] s/kick/replace it with phabricator/
[16:02:47] <^d> qchris: I think we're hitting the limit.
[16:02:58] ^d: today everything is smaller then enwikisource
[16:03:05] <^d> Gotcha.
[16:03:43] greg-g: I'mma take the conch now
[16:03:49] bd808: we didn't test phab with this load just yet
[16:04:00] but: oh, yeah!
[16:04:13] matanya: True, but I know how to make php apps scale
[16:04:29] hmm, mediawiki ? :D
[16:04:57] <^d> it's some idiot crawling gerrit.
[16:05:05] <^d> and using up all the available connections.
[16:05:20] ^d: Ok. Let's bump the limit then :-)
[16:05:28] <^d> No.
[16:05:29] ^d: could you review https://gerrit.wikimedia.org/r/#/c/123246 ?
[16:05:32] <^d> Let's block the idiot.
[16:05:58] ^d: Ah .. it's just some crawler. Sorry :-)
[16:06:02] <^d> "GET /r/changes/?q=b9842c0ca4d3e5e3564b5a34d7380b0386e47363&o=CURRENT_REVISION&o=CURRENT_FILES&n=1 HTTP/1.1" 200 1109 T=0s "-" "python-requests/1.2.3 CPython/2.7.3 Linux/3.2.0-59-virtual"
[16:06:09] <^d> For every freaking sha1 you can imagine.
[16:06:18] <_joe_> ^d: yes I was seeing that
[16:06:19] :-D
[16:06:25] (CR) Manybubbles: [C: 2] Lower search suggestions to reasonable values [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122571 (owner: Chad)
[16:07:41] (CR) Matanya: "and +1 for Giuseppe's offer." (1 comment) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/123244 (owner: Nuria)
[16:07:57] (Merged) jenkins-bot: Lower search suggestions to reasonable values [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122571 (owner: Chad)
[16:08:36] (PS1) Chad: Bad crawlers don't get to crawl gerrit, breaks it for real humans [operations/puppet] - https://gerrit.wikimedia.org/r/123293
[16:08:55] <^d> Can I get an ops merge on 123293 and merge on sockpuppet ^^^?
[16:09:03] <^d> This is hurting gerrit for normal people.
[16:09:35] <_joe_> ^d: taking a look - I may not be the best one though
[16:09:57] Remote_Addr starting in 10?
[16:10:24] <^d> qchris: I think it's something from labs.
[16:10:27] <^d> But it's insane.
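Editorial aside: the crawler above was identified from a single repeated user agent in the Gerrit Apache log. A minimal sketch of that kind of check follows, assuming each log line ends with the quoted referer and user-agent fields as in the line ^d pasted. The regular expression and the names `count_user_agents` and `top_offenders` are hypothetical illustrations, not tooling used in the channel.

```python
import re
from collections import Counter

# Assumed layout, based on the one line quoted in the chat:
#   "REQUEST" STATUS BYTES ... "REFERER" "USER-AGENT"
# The user agent is taken to be the last double-quoted field on the line.
LINE_RE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ .*"(?P<agent>[^"]*)"\s*$'
)


def count_user_agents(lines):
    """Count requests per user agent; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group("agent")] += 1
    return counts


def top_offenders(lines, threshold):
    """Return (user agent, request count) pairs with at least `threshold` hits."""
    return [
        (agent, hits)
        for agent, hits in count_user_agents(lines).most_common()
        if hits >= threshold
    ]
```

Feeding the function the lines of an access log (for example `top_offenders(open(path), 1000)`, with `path` and the threshold chosen by the operator) would surface an agent such as the `python-requests` client seen here; grouping by client address instead would need a log format that records it.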
[16:10:31] _joe_: Day 2 is plenty of time on the job to start merging and applying emergency patches :)
[16:10:32] Ok.
[16:10:44] * James_F grins.
[16:11:05] <^d> Actually, ping Coren.
[16:11:14] <^d> Coren: I think someone in labs is being mean to gerrit.
[16:11:18] <_joe_> ^d: give me 5 mins (gerrit is slooow)
[16:12:03] tools-exec-03.eqiad.wmflabs.
[16:12:16] _joe_, welcome btw :) great to have you here!
[16:12:33] <_joe_> ^d: I'd prefer to ban IP AND user agent, but for now LGTM
[16:12:36] <_joe_> Eloquence: thanks
[16:12:41] <^d> _joe_: Well it's labs.
[16:12:48] <^d> There might be a better way than IP block.
[16:12:49] <_joe_> ok ok :)
[16:12:53] Uh, Gerrit has an avatar! https://gerrit.wikimedia.org/r/#/c/123196/
[16:13:05] <^d> scfc_de: Yes, it has for quite some time.
[16:13:27] (CR) Giuseppe Lavagetto: [C: 2] "This should solve the problem for future spikes." [operations/puppet] - https://gerrit.wikimedia.org/r/123293 (owner: Chad)
[16:13:55] Never noticed. Hard to recognize that should be a bird.
[16:14:17] !log banned tools-exec-03.eqiad.wmflabs. using manual iptables on ytterbium
[16:14:23] Logged the message, Master
[16:14:25] <^d> scfc_de: Diffy the Kung-Fu Review Cuckoo
[16:14:34] <^d> akosiaris: That's one way to do it.
[16:14:36] akosiaris: Yay.
[16:14:54] <_joe_> akosiaris: we're pushing a rule in apache
[16:15:09] scfc_de: find the user. and warn him :)
[16:15:32] <_joe_> akosiaris: going to run puppet on ytterbium and remove the iptables rule
[16:15:38] <^d> scfc_de: I used to have a bunch of stickers on my desk. Dunno what happened to them.
[16:15:55] _joe_: that is a partial solution, if it runs on other exec servers it won't help
[16:16:12] that = d^ patch
[16:16:49] <_joe_> matanya: ok, but it is still good for now. While we search for root cause
[16:17:07] agreed
[16:17:18] <^d> manybubbles: Trying to review.
[16:17:24] <^d> Gerrit's sick tho :\
[16:17:25] thanks
[16:17:31] yeah, I can wait
[16:17:37] we have a two hour window which is plenty for this
[16:17:38] <_joe_> akosiaris: you banned 10.68.16.32, why not 10.68.16.35 ?
[16:17:41] (CR) Chad: [C: 2] Switch most group1 wikis to cirrus as primary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123246 (owner: Manybubbles)
[16:17:43] <^d> Yay, done.
[16:17:47] <^d> Finally got through
[16:17:48] cool
[16:17:49] (Merged) jenkins-bot: Switch most group1 wikis to cirrus as primary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123246 (owner: Manybubbles)
[16:17:54] * manybubbles has the conch
[16:18:21] _joe_: I dont really want to ban anyone from labs
[16:18:33] but it had to be someone so that gerrit becomes usable again
[16:18:57] <^d> Indeed. We need to find the tool and shut it down.
[16:18:58] !log manybubbles synchronized cirrus.dblist 'Cirrus as primary for most of group1'
[16:19:04] Logged the message, Master
[16:19:07] <^d> I want gerrit to be accessible from tool/labs, generally though :)
[16:19:36] can one push changes from labs now that it is eqiad?
[16:19:45] or still not possible?
[16:19:47] <_joe_> akosiaris: yeah it stopped anyway :)
[16:19:55] <^d> matanya: You've always been able to over https.
[16:19:55] !log manybubbles synchronized wmf-config/InitialiseSettings.php 'Lower timeout on prefix searches and make the cirrus.dblist sync I just did take effect.'
[16:20:01] Logged the message, Master
[16:20:17] ^d: i meant over git
[16:20:23] <_joe_> ^d: ohch, a syntax error :|
[16:20:27] <^d> matanya: We don't push over git.
[16:20:34] <^d> We push over https or ssh.
[16:20:36] *git review
[16:20:41] <^d> git-review is ssh.
[16:20:53] yeah, does it work?
[16:20:54] <^d> and is an awful piece of software that I don't use.
[16:21:04] <^d> Wouldn't know, haven't used it in a year :p
[16:21:14] (PS1) Ottomata: Removing gerrit_stats class from misc/statistics.pp [operations/puppet] - https://gerrit.wikimedia.org/r/123294
[16:21:15] anyone:
[16:21:16] * matanya wonders why awful
[16:21:25] (PS2) Ottomata: Removing gerrit_stats class from misc/statistics.pp [operations/puppet] - https://gerrit.wikimedia.org/r/123294
[16:21:27] matanya: It should work if you are crazy enough to forward your ssh key into labs
[16:21:31] Fatal error: Call to a member function getWarningMessageText() on a non-object at /usr/local/...SpecialMobileDiff.php on line 167?
[16:21:33] (CR) Ottomata: [C: 2 V: 2] Removing gerrit_stats class from misc/statistics.pp [operations/puppet] - https://gerrit.wikimedia.org/r/123294 (owner: Ottomata)
[16:21:59] bd808: well, i'm not i guess :)
[16:22:04] manybubbles: I saw some of those yesterday I think. Let me check
[16:22:27] <^d> _joe_: Hm?
[16:22:41] ^d: and we're synced
[16:22:47] * manybubbles puts conch down
[16:23:45] manybubbles: That mobile diff error first shows in logstash on 2014-04-01T19:10:09.000Z
[16:23:50] ottomata: what's up?
[16:23:53] ^d: searches look to be going through for all kinds of wikis now but no real spike in traffic
[16:24:11] <^d> Yeah I saw.
[16:24:12] bd808: k. just noticed it while reading error logs - but the power tools are better:)
[16:24:34] I don't know if anyone has filed a bug for it yet
[16:25:08] hey ottomata, did i win our bet regarding the kafka brokers? i haven't seen a lot of errors lately ;)
[16:25:23] I'll make sure not to go anywhere for a while but I don't see anyone complaining:)
[16:26:05] <^d> We're primary search for 605 wikis now.
[16:26:08] <^d> That's pretty cool
[16:26:20] <^d> 68.5% of wikis
[16:26:22] (PS1) Ottomata: Making stat1003 use /srv instead of /a [operations/puppet] - https://gerrit.wikimedia.org/r/123295
[16:26:44] ^d, how much longer?:P
[16:27:08] <^d> Soon. Very soon.
[16:27:27] MaxSem: enwiki and dewiki are the only two that have active testers complaining
[16:27:42] <^d> We'll be able to get the rest as secondary as soon as we're done juggling the disk space around.
[16:27:46] and enwiki is one dude who is a super power user who we might take a long time to really support
[16:27:51] ^d: yup
[16:27:58] I'd so do that myself if I could
[16:28:27] the dewiki guy is finding some difficult problems but they aren't show stoppers
[16:28:41] (PS1) Giuseppe Lavagetto: Adding the whole range of offending IPs. [operations/puppet] - https://gerrit.wikimedia.org/r/123296
[16:28:45] but load is the big thing, which is why on friday I'll do another load test
[16:29:00] we'll have some improvements out there
[16:29:22] (CR) Alexandros Kosiaris: ganglia: lint manifest! (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/123196 (owner: Hashar)
[16:29:25] and we'll get 1.1.0 on tuesday if I'm right so it might be worth doing more load tests again after we've updated to use its shiny features
[16:29:39] <^d> We prolly could've done commons today
[16:29:43] <^d> Was good to be conservative.
[16:29:48] <^d> But probably could've done it
[16:30:05] <_joe_> I want to behave and not auto-push my changes
[16:30:19] manybubbles: I opened https://bugzilla.wikimedia.org/show_bug.cgi?id=63427 for the SpecialMobileDiff crash
[16:30:25] MaxSem: ^
[16:30:27] <_joe_> so, is anyone able to take a look at https://gerrit.wikimedia.org/r/123296?
[16:30:49] _joe_: yes. I don't like it
[16:31:04] <_joe_> akosiaris: ok, any better suggestion?
[16:31:16] since gerrit works, just find the user and tell him to stop
[16:31:23] <^d> Indeed.
[16:31:23] gerrit sucks akosiaris
[16:31:38] matanya: not arguing, but not the point
[16:31:49] i posted the same comment as you did on that patch, but it didn't go through
[16:31:50] I got root on labs, looking into it
[16:31:53] <_joe_> akosiaris: ok, then we need to revert the change before that (the one ^d made)
[16:32:12] _joe_: I think so too
[16:32:16] <^d> I'm fine with leaving akosiaris' iptables rule in place for now, revert my old change.
[16:32:22] (my coment isn't related to this topic here ) ^
[16:32:22] <^d> and find who it is on labs and ask them to stop
[16:32:43] <_joe_> ^d: ok, I'll play with my commit then :)
[16:33:19] see you all later
[16:35:13] (PS2) Giuseppe Lavagetto: Removing the bad_browser rule from apache. [operations/puppet] - https://gerrit.wikimedia.org/r/123296
[16:36:00] <_joe_> I can merge this right away, wow I'm starting not to mess up with gerrit
[16:36:35] (CR) Giuseppe Lavagetto: [C: 2] "This is a simple revert after some discussion." [operations/puppet] - https://gerrit.wikimedia.org/r/123296 (owner: Giuseppe Lavagetto)
[16:38:26] <_joe_> btw, changing the apache vhost does not trigger apache reload in puppet. I think this is a known issue anyway
[16:40:29] <_joe_> ok, good evening!
[16:41:17] <^d> _joe|away: Thanks for your help :)
[16:46:07] <^d> akosiaris: E-mailed ops list.
[16:46:17] <^d> Feel free to add/amend.
[16:46:47] (CR) Ottomata: "I like the idea, but I don't want to attempt to manage the 'www-data' user from inside of the apache module. Especially since the 'apache" [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/123244 (owner: Nuria)
[16:47:38] (CR) Ottomata: "Oops, I meant "I don't want to manage the 'www-data' user from inside of the wikimetrics module"."
[operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/123244 (owner: Nuria)
[16:51:19] (PS2) Ottomata: Making stat1003 use /srv instead of /a [operations/puppet] - https://gerrit.wikimedia.org/r/123295
[16:51:52] (PS3) Ottomata: Making stat1003 use /srv instead of /a [operations/puppet] - https://gerrit.wikimedia.org/r/123295
[16:53:13] (PS1) Andrew Bogott: Revert "Increase rate limits a bit more to allow for scripted migrations." [operations/puppet] - https://gerrit.wikimedia.org/r/123301
[16:53:15] (PS1) Andrew Bogott: Revert "Turn rate limits WAY up for nova api." [operations/puppet] - https://gerrit.wikimedia.org/r/123302
[16:55:04] (PS2) Ottomata: Making sure apache can write to /var/lib/wikimetrics directory [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/123244 (owner: Nuria)
[16:55:52] (CR) Andrew Bogott: [C: 2] Revert "Increase rate limits a bit more to allow for scripted migrations." [operations/puppet] - https://gerrit.wikimedia.org/r/123301 (owner: Andrew Bogott)
[16:55:55] (PS3) Ottomata: Making sure apache can write to /var/lib/wikimetrics directory [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/123244 (owner: Nuria)
[16:56:14] (CR) Andrew Bogott: [C: 2] Revert "Turn rate limits WAY up for nova api." [operations/puppet] - https://gerrit.wikimedia.org/r/123302 (owner: Andrew Bogott)
[16:56:25] <_joe|away> akosiaris: I added a few notes on what's happening on gerrit - it has suffered another hit a few minutes ago
[16:56:32] <_joe|away> now I *really* gotta run
[16:56:57] _joe|away: yeah I noticed too. go
[16:57:09] I should be going too
[16:57:21] <_joe|away> yeah I figured that as well
[16:57:23] <_joe|away> :)
[17:00:21] RobH: wikitech-static should be syncing again now.
The admin log is way behind so it may take a couple cycles of the cron before it catches up all the way (it seems to time out before it updates all the edits)
[17:00:49] !log fixed updating crons on wikitech-status, I think. Time will tell...
[17:00:55] Logged the message, Master
[17:03:00] (CR) Giuseppe Lavagetto: [C: -1] "My suggestion was: once you have the apache2 package in place, you do have www-data as well." [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/123244 (owner: Nuria)
[17:04:22] (PS4) Ottomata: Making stat1003 use /srv instead of /a [operations/puppet] - https://gerrit.wikimedia.org/r/123295
[17:04:27] (CR) Ottomata: [C: 2 V: 2] Making stat1003 use /srv instead of /a [operations/puppet] - https://gerrit.wikimedia.org/r/123295 (owner: Ottomata)
[17:05:06] andrewbogott: cool, thanks!
[17:06:37] RobH: there were a few problems, but the main one was that it was still trying to wget from 'labsconsole' which caused a cert failure.
[17:10:24] andrewbogott: thats amusing since i was one of the opponents of buying a new cert for that ;]
[17:10:40] 'its depreciated, folks will figure it out!'
[17:11:08] This both proves me right, and demonstrates why its still annoying, heh.
[17:13:24] ^d: Caught me during lunch. I'd ask pointed questions in Yuvi's direction given he has a gerrit/github interface bot.
[17:13:46] <^d> Coren: There's a thread on ops list about it now :)
[17:13:59] ^d: Indeed, and I replied there.
[17:14:46] * ^d twiddles thumbs
[17:20:54] * ^d clicks refresh a few times
[17:20:56] Ah, perhaps not on the thread; I hadn't noticed that the email was sent to me from it dropping the list. Hang on.
[17:21:50] (CR) Manybubbles: "Tested this on deployment-bastion (beta)." [operations/software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/121436 (owner: Manybubbles)
[17:26:18] <^d> Coren: Got it, thanks.
[17:27:20] <^d> Yeah, I figured as much.
It's pretty easy to spot in the gerrit apache log if it starts happening again.
[17:27:37] (CR) Manybubbles: [C: -1] "Ah, but beta is 1.1.0 not 1.0.1 so this is the wrong version. will correct" [operations/software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/121436 (owner: Manybubbles)
[17:37:35] (PS2) Manybubbles: Add icu analysis plugin for 1.1.0 [operations/software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/121436
[17:55:36] (PS1) Ottomata: Ensuring parent directories for limn::mobile_data_sync exist [operations/puppet] - https://gerrit.wikimedia.org/r/123350
[17:56:38] apergos: you still around?
[17:56:47] i'm puppetizing stat1003 and the dataset1001 mount failed
[17:56:52] do we need to do something to allow stat1003 to mount it?
[17:57:20] yes, you need to add it to the ataset module, we don't export to just anybody
[17:57:25] *dataet
[17:57:29] meh
[17:57:34] dataset
[17:58:12] files/exports
[17:59:00] got it, thanks
[17:59:31] (CR) Ottomata: [C: 2 V: 2] Ensuring parent directories for limn::mobile_data_sync exist [operations/puppet] - https://gerrit.wikimedia.org/r/123350 (owner: Ottomata)
[17:59:51] (PS1) Ottomata: Exporting dataset mount to stat1003 [operations/puppet] - https://gerrit.wikimedia.org/r/123352
[18:00:02] apergos: ^ look ok?
[18:00:50] oh it has an external ip? bleah
[18:01:09] anyways yeah that looks fine
[18:01:53] stat1 does too :/ I'm gong to add ferm to it
[18:01:57] in a bit
[18:02:09] (CR) Ottomata: [C: 2 V: 2] Exporting dataset mount to stat1003 [operations/puppet] - https://gerrit.wikimedia.org/r/123352 (owner: Ottomata)
[18:02:43] I know stat1 does but I was hoping we would get rid of that little issue :-D
[18:02:46] anyhoo
[18:05:56] you might have to re-export from ds1001 if you don't see it show up (exportfs -ra)
[18:08:18] should there be a website at wmflabs.org:80? apache is installed but not running on virt0, and i see no config for that website.
[18:09:01] ah thanks apergos, that did it :) [18:09:55] :-) [18:10:31] hm icinga has an http monitor for virt0 and says it's been down for 1d1h [18:11:12] i can't write udp from vanadium to tungsten (both in eqiad). could someone help me troubleshoot the link? [18:11:12] * jgage scans email for any in-progress work [18:13:26] PROBLEM - Puppet freshness on stat1 is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 03:12:29 PM UTC [18:14:38] (03CR) 10Ori.livneh: [C: 04-1] "Why?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/115153 (owner: 10Odder) [18:15:35] (03CR) 10Ottomata: [C: 032] "Looks good to me, you can merge right?" [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/121436 (owner: 10Manybubbles) [18:15:37] (03PS1) 10RobH: Jon Robson access to flourine [operations/puppet] - 10https://gerrit.wikimedia.org/r/123360 [18:15:48] ottomata: please do [18:16:00] (03PS2) 10Ori.livneh: Jon Robson access to fluorine [operations/puppet] - 10https://gerrit.wikimedia.org/r/123360 (owner: 10RobH) [18:16:03] can I get +2 on that repostiory? [18:16:10] ^d: ^^^ [18:16:20] the elasticsearch/plugins one [18:16:24] RobH: fluorine, not flourine :) [18:16:24] <^d> on puppet?!? [18:16:25] <^d> ;-) [18:17:06] (03CR) 10RobH: [C: 04-1] "3 day wait has to pass, if no issues this can merge on 2014-04-07" [operations/puppet] - 10https://gerrit.wikimedia.org/r/123360 (owner: 10RobH) [18:17:09] heh [18:17:19] (03CR) 10Odder: "Didn't you say you were removing your -1?" 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/115153 (owner: 10Odder) [18:17:37] <^d> manybubbles: done-ded [18:17:50] thank you [18:18:12] <^d> yw [18:19:18] (03CR) 10Ori.livneh: "The quickest way to persuade me to +2 this is to provide a clear explanation of why the need for better Hebrew support has to be met by en" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/115153 (owner: 10Odder) [18:19:59] oh ha, weird I was trying to do that for manybubbles too,a nd was like "wahaaaa what happened?" [18:20:03] i didn't save that?! [18:20:26] thanks guys [18:22:16] (03CR) 10Odder: "You said on 2014-03-15 that you are letting hewikisource community call the shot on this; on 2014-03-19 they asked if this was yet fixed." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/115153 (owner: 10Odder) [18:27:45] (03CR) 10Ori.livneh: "Well, probably you're right -- blocking it with a -1 is inconsistent with my earlier statements, so changing to a 0. Still, I'd be *happy*" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/115153 (owner: 10Odder) [18:32:16] RECOVERY - Puppet freshness on virt0 is OK: puppet ran at Wed Apr 2 18:32:14 UTC 2014 [18:40:20] (03PS1) 10Odder: Enable NewUserMessage extension on Urdu Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/123366 [18:43:22] (03PS1) 10coren: Tool Labs: Fix to exim jmail execution [operations/puppet] - 10https://gerrit.wikimedia.org/r/123369 [18:44:03] (03CR) 10Ottomata: "The original patch that Nuria submitted didn't work. if defined(Package['apache2']) wasn't useable because at the time that wikimetrics c" [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/123244 (owner: 10Nuria) [18:45:32] (03CR) 10Ottomata: "So, it turns out that redis is no longer included by role::statistics::crucher classes anyway. 
We will not need a redis ferm rule on stat" [operations/puppet] - 10https://gerrit.wikimedia.org/r/117670 (owner: 10Matanya) [18:47:18] (03CR) 10coren: [C: 032] "So standard, it's boring." [operations/puppet] - 10https://gerrit.wikimedia.org/r/123112 (owner: 10Yuvipanda) [18:47:28] Coren: woot! [18:47:34] (03CR) 10coren: [C: 032] "Known to work." [operations/puppet] - 10https://gerrit.wikimedia.org/r/123369 (owner: 10coren) [18:47:51] Coren: at this point it should 'just work'? [18:47:57] Coren: portgrabber? if it is pointed at the right box [18:48:06] * YuviPanda forces a puppet urn [18:48:07] *run [18:48:16] Hypothetically, yes. [18:48:25] Lemme restart the admin webservice. [18:49:05] Coren: jenkins bot hasn't gotten to it yet I think. [18:49:15] oh no [18:49:17] wat [18:49:40] i'm an idiot, nevermind [18:51:03] Coren: proxylistener is running :) [18:51:20] YuviPanda: /admin/ should have tried to tell it where it lives. Can you enumerate what has registered itself? [18:51:28] Coren: yeah, moment. [18:51:52] Coren: no, it hasn't asked for it. try again? [18:52:00] Coren: I just started it a second ago or so before your messag.e [18:52:33] Howzabout now? [18:57:21] YuviPanda: ^^? [18:57:29] Coren: looking [18:57:36] hmm, nothing [18:58:25] Coren: moment, meeting about to end. sorry [18:58:40] Ah, sorry, I thought your meeting was over. nvm. 
[19:01:08] Coren: yeah, was a lull, and now I'm having weird license discussions [19:03:01] !log ori synchronized php-1.23wmf19/extensions/WikimediaEvents 'Update WikimediaEvents for I7fdaa5524: Use simple random sampling to log deprecated usage at 1:100' [19:03:07] Logged the message, Master [19:03:28] !log ori synchronized php-1.23wmf20/extensions/WikimediaEvents 'Update WikimediaEvents for I7fdaa5524: Use simple random sampling to log deprecated usage at 1:100' [19:03:33] Logged the message, Master [19:16:42] (03PS1) 10coren: Tool Labs: bugfix portgrabber [operations/puppet] - 10https://gerrit.wikimedia.org/r/123415 [19:16:46] (03PS1) 10QChris: Allow more http load to gerrit [operations/puppet] - 10https://gerrit.wikimedia.org/r/123416 [19:20:56] wow [19:21:03] qchris: thanks [19:21:19] * Coren pushes Jenkins. [19:21:32] hashar: I doubt the change would buy us much. [19:21:48] hashar: ^d said we're like having 2-3 concurrent connections. [19:21:54] there might be some overload caused by Zuul fetching a ton of changes from Gerrit [19:21:58] ah [19:22:13] But just in case ...ytterbium is idling a lot anyways [19:22:33] what is antimony for ? [19:23:21] Mhmm ... I heard that name around gerrit :-D Let me check ... which machine runs git.wikimedia.org ...? [19:23:26] ahhh [19:23:38] https://wikitech.wikimedia.org/wiki/Antimony [19:23:40] so that should be unrelated to the Gerrit web interface being suddenly super slow [19:23:50] Yes. [19:25:01] I just had a random look ... antimony looks fine in ganglia. [19:25:08] Are we having problems with antimony? [19:27:01] qchris: just me confusing all our servers [19:27:08] was looking at antimony whenever we had some slowness [19:27:28] javac: invalid target release: 1.7 [19:27:30] bah maven [19:27:35] I should get a java course. [19:27:38] You at least know the machines :-) I only know ytterbium and stat{1,1001,1002} :-) [19:27:59] i still miss the best server name ever, eiximenis [19:28:07] hashar: Java loves you. 
Just love it back. [19:28:14] if i ever have a child, that will be its name. [19:28:48] better than dysprosium [19:29:07] if folks dislike the element names they can take it up with the folks who named said elements ;] [19:29:12] RobH: But I thought you were allergic [19:29:25] marktraceur: to children? i am and wont ever have one [19:29:26] true that [19:29:49] it would still be unfair to punish them since the beginning [19:32:06] (03CR) 10Matanya: "Will be glad to convert this patch to whatever needed." [operations/puppet] - 10https://gerrit.wikimedia.org/r/117670 (owner: 10Matanya) [19:36:45] (03CR) 10coren: [C: 032] "Tested and known to work." [operations/puppet] - 10https://gerrit.wikimedia.org/r/123415 (owner: 10coren) [19:38:03] "Java loves you" sounds to me like the desperate try of a creepy stalker. :-) [19:38:56] PROBLEM - RAID on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:39:56] RECOVERY - RAID on db1047 is OK: OK: optimal, 3 logical, 6 physical [19:42:26] PROBLEM - MySQL Processlist on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:42:56] PROBLEM - RAID on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:42:56] PROBLEM - MySQL Recent Restart on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:43:06] PROBLEM - SSH on db1047 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:43:16] RECOVERY - MySQL Processlist on db1047 is OK: OK 0 unauthenticated, 0 locked, 5 copy to table, 0 statistics [19:43:56] RECOVERY - MySQL Recent Restart on db1047 is OK: OK 678677 seconds since restart [19:43:56] RECOVERY - RAID on db1047 is OK: OK: optimal, 3 logical, 6 physical [19:43:56] RECOVERY - SSH on db1047 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [19:46:17] (03PS3) 10Hashar: ganglia: lint manifest! [operations/puppet] - 10https://gerrit.wikimedia.org/r/123196 [19:47:31] (03CR) 10Hashar: "The selector inside a selector is addressed in a follow up patch." 
(031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/123196 (owner: 10Hashar) [19:50:09] (03PS1) 10Hashar: ganglia: address selector in a define [operations/puppet] - 10https://gerrit.wikimedia.org/r/123422 [20:06:07] (03CR) 10Chad: [C: 031] Allow more http load to gerrit [operations/puppet] - 10https://gerrit.wikimedia.org/r/123416 (owner: 10QChris) [20:06:43] (03PS1) 10Ori.livneh: hhvm on beta: remove repack-libmemcached10 hack [operations/puppet] - 10https://gerrit.wikimedia.org/r/123428 [20:08:14] (03CR) 10Ori.livneh: [C: 032] hhvm on beta: remove repack-libmemcached10 hack [operations/puppet] - 10https://gerrit.wikimedia.org/r/123428 (owner: 10Ori.livneh) [20:09:55] (03CR) 10Ori.livneh: "hashar: this should no longer be required; could you try restoring hhvm on contint?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/121675 (owner: 10Hashar) [20:10:45] (03PS1) 10GWicke: Add an account for subbu on Parsoid / Cassandra test hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/123433 [20:10:54] (03CR) 10Chad: [C: 032 V: 032] Add icu analysis plugin for 1.1.0 [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/121436 (owner: 10Manybubbles) [20:13:27] (03Restored) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/89002 (owner: 10Hashar) [20:13:32] (03PS7) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/89002 [20:15:19] (03Abandoned) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/89002 (owner: 10Hashar) [20:23:19] !log deployed Parsoid 33471172 with deploy repo sha 5c620e54 [20:23:23] Logged the message, Master [20:35:39] (03CR) 10Hoo man: "nit picking..." 
(031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/123433 (owner: 10GWicke) [20:37:25] ottomata: I'm going to deploy some artifacts required for lucene development to archiva - they are required for building a project I'm working on. I'll make sure the verify shas like I do with the Elasticsearch plugins. [20:38:11] ok cool, sounds good [20:40:16] qchris: want me to merge that gerrit change? [20:40:23] +1 from me [20:40:32] ottomata: the gerrit one? [20:40:34] yes! [20:41:55] k [20:42:02] (03PS2) 10Ottomata: Allow more http load to gerrit [operations/puppet] - 10https://gerrit.wikimedia.org/r/123416 (owner: 10QChris) [20:42:08] (03CR) 10Ottomata: [C: 032 V: 032] Allow more http load to gerrit [operations/puppet] - 10https://gerrit.wikimedia.org/r/123416 (owner: 10QChris) [20:42:25] Thanks! [20:42:59] (03PS1) 10Ottomata: Adding new parser LineCountLogster [operations/debs/logster] - 10https://gerrit.wikimedia.org/r/123438 [20:44:03] and gerrit is down [20:44:21] works again, mh [20:44:36] PROBLEM - Host ms-be1003 is DOWN: PING CRITICAL - Packet loss = 100% [20:44:54] (03PS2) 10Ottomata: Adding new parser LineCountLogster [operations/debs/logster] - 10https://gerrit.wikimedia.org/r/123438 [20:45:15] hoo: https://gerrit.wikimedia.org/r/123416 got merged, which restarted gerrit. [20:45:55] ah, I see [20:50:23] ottomata: so I made a mistake and uploaded something without a pom. now I'm trying to correct it but it won't let me overwrite [20:51:23] hm [20:51:41] you uploaded it to mirrored? [20:52:26] pssh, whatever, try it now :) [20:52:28] if it was mirrored [20:52:31] i jsut unblocked redeployments [21:00:24] ottomata: cool - can you try reblocking them? 
I think it will let me redeploy artifacts I forgot [21:00:33] just not the same artifact twice [21:00:37] like, it'll let jar if you didn't pom [21:00:59] (03PS3) 10Ottomata: Adding new parser LineCountLogster [operations/debs/logster] - 10https://gerrit.wikimedia.org/r/123438 [21:01:14] ok [21:01:34] manybubbles: redeployments on wikimedia.mirrored are now blocked [21:01:42] thanks, will try again [21:03:59] (03CR) 10Ottomata: [C: 032 V: 032] Adding new parser LineCountLogster [operations/debs/logster] - 10https://gerrit.wikimedia.org/r/123438 (owner: 10Ottomata) [21:06:30] ottomata: bleh, turn redeployments back on.... [21:07:04] ha k [21:07:13] done [21:08:52] × Error encountered while uploading pom file: /var/lib/archiva/repositories/releases/com/google/guava/guava-parent/16.0.1/guava-parent-16.0.1.pom (No such file or directory) [21:09:08] does file exist? [21:09:29] or, rather, path [21:13:25] ah, I think I tried to upload it to the wrong spot [21:13:56] blast, it won't get to mirrored either... [21:13:59] ottomata: ^^ [21:14:26] PROBLEM - Puppet freshness on stat1 is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 03:12:29 PM UTC [21:15:06] (03PS1) 10Hashar: jobrunner: reduce polling on beta cluster [operations/puppet] - 10https://gerrit.wikimedia.org/r/123444 [21:15:18] I so wanted to get change 123456 :( [21:16:09] hashar: got to work a bit more tonight, heh? :P [21:16:24] is noc (on fenari) in some puppet / whatever repo? Doesn't seem like it [21:16:40] andre__: had a long nap this morning and lunched with my wife :-] So I have to keep up [21:17:18] andre__: somehow I am much more productive at night when the house is calm. Even if tired. [21:17:23] hoo: operations/mediawiki-config.git/wmf-config/ in Gerrit, kind of? [21:17:36] hashar: pretty often the same here, yeah [21:17:41] plus I like the night. 
:) [21:17:54] (03PS1) 10Ottomata: Bumping debian version to 0.0.6-1 [operations/debs/logster] (debian) - 10https://gerrit.wikimedia.org/r/123445 [21:17:56] (03CR) 10Hashar: "Feel free to add moaaar reviewers. I am not sure who knows about our crazy job system :-]" [operations/puppet] - 10https://gerrit.wikimedia.org/r/123444 (owner: 10Hashar) [21:18:00] andre__: :-] [21:18:10] andre__: I mean especially /usr/local/apache/common/docroot/noc/index.html [21:18:10] andre__: and now I am heading bed. Daughter suddenly awake (she is sick) [21:18:16] have a good night folks! [21:18:20] night! [21:18:23] oo manybubbles [21:18:26] you don't want releases, do you? [21:18:32] you want mirrored, right? [21:18:37] for deps? [21:18:43] yeah [21:18:49] fucking drop down [21:18:51] your error said releases [21:18:58] if I go back and forth it jumps [21:19:03] but, even breaks for mirrored [21:19:35] ah right, that one indeed is in git, but doesn't really seem to be a symlink [21:19:55] (03CR) 10Ottomata: [C: 032 V: 032] Bumping debian version to 0.0.6-1 [operations/debs/logster] (debian) - 10https://gerrit.wikimedia.org/r/123445 (owner: 10Ottomata) [21:23:24] ottomata: × Error encountered while uploading pom file: /var/lib/archiva/repositories/mirrored/com/google/guava/guava-parent/16.0.1/guava-parent-16.0.1.pom (No such file or directory) [21:24:15] wikisloooooooooooooooooooooooow from Italy [21:25:30] hm manybubbles, looking [21:25:36] thanks [21:25:37] guava-parent does not have a 16.0.1 directory [21:25:46] but guava does [21:25:51] not sure what happened [21:25:59] did you just try to upload guava 16.0.1? [21:26:20] i can probably just delete any 16.0.1 directory and restart archiva maybe? [21:26:34] just nuke it and see if that helps without a restart [21:26:43] ok [21:27:00] Vito: Looks ok from Germany [21:27:15] worked [21:27:32] manybubbles: I ahven't done anythign! [21:27:40] why?! [21:27:44] ha [21:27:44] why you spite me archiva! 
[21:27:47] also another user is experiencing troubles from AS3269 hoo [21:27:50] but I see 16.0.1 guava-parent now :) [21:27:52] iunno! [21:28:00] well, I'll keep plowing ahead [21:28:04] ok! [21:28:15] Vito: Probably network related [21:28:26] do the usual stuff, I guess you know what I mean [21:29:33] csteipp: do you have an OTRS access? there is someone who is reporting technical issues with a block, says blocking them at an IP that is an open proxy, yet they are at another. Tech person so they seem competent [21:29:48] or should this just be copied to a bugzilla? [21:29:49] OH manybubbles, maaan, i'm working on this CirrusSearch-slow.log thing for you [21:29:59] and I just added a parser to logster to allow for counting lines in files [21:30:07] yay! [21:30:07] that can report to statsd or graphite or ganglia [21:30:08] but! [21:30:12] no! [21:30:14] not bu! [21:30:16] but! [21:30:17] i just noticed that we ahve those logs in elasticsearch [21:30:18] i think? [21:30:21] hoo: probably, also AS3269 has stopped doing free peering [21:30:25] huh? [21:30:34] we have other slow logs [21:30:34] sDrewth: probably bugzilla [21:30:40] and I imagine everything goes to logstash [21:30:42] sorry [21:30:43] haha [21:30:43] i meant [21:30:44] logstash [21:30:50] you might also want to skim the xff log on fluorine (or let me do it) [21:30:53] yeah, and it woulda been cool if I had created the alert from logstash somehow [21:30:54] can report on that instead? [21:30:57] rather than what i'm about to do [21:30:57] (03CR) 10Giuseppe Lavagetto: [C: 031] "Did not know of that inter-repo dependency. This version surely makes your life easier :)" [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/123244 (owner: 10Nuria) [21:30:57] dunno [21:31:00] what should I do? 
[21:31:00] sure [21:31:02] go with what i've been working on [21:31:04] whatever you want [21:31:07] ahhhhhahmm [21:31:10] you da man [21:31:17] well, i guess you have elasticsearch stuff in ganglia already huh [21:31:18] alerts from logstash sound cool [21:31:23] might as well have this in ganglia now too? [21:31:28] i was about to put this in ganglia [21:31:33] that'd be good too [21:31:37] and then use check_ganglia to create icinga alerts [21:31:38] I'm happy with anything/everything [21:31:40] ok ok ok [21:32:34] * bd808 looks at a lot of pings from "logstash" mentions [21:32:43] haha [21:32:47] Vito: ... I'd assume good faith unless proven otherwise... but well :/ [21:33:00] bd808: do we have any icinga alerts built on anything in logstash (yet)? [21:33:05] hoo: clearcut bad faith [21:33:22] it's a commercial war from the incumbent towards small local ISPs [21:33:26] PROBLEM - Puppet freshness on stat1003 is CRITICAL: Last successful Puppet run was Wed 02 Apr 2014 06:33:14 PM UTC [21:33:28] hoo: the only issue with that, is that I have to include private data, so would have to do as security, which it isn't [21:33:33] which have no global infrastructure [21:33:53] ottomata: Not that I'm aware of. It would be cool to make a way to alert on the doc count from an elasticsearch search against the log corpus though [21:33:55] sDrewth: Well, make it security than anyway, people do that rather often... better than no bug [21:34:18] you might also use the privatecomment feature, but I'm not sure that's supposed for this ( andre__ ?) [21:34:19] yeah, so I just noticed that you are consuming the logs that manybubbles asked me to create an alert for [21:34:38] he wants to know if there are more than X lines added to CirrusSearch-slow.log in Y period [21:35:11] Yeah. 
That would be god for many things I imagine [21:35:16] *good [21:35:22] but we dont' have anything that does that yet [21:35:34] hmm ok, i'm just going to make this work as is for now, since i've already put a couple of hours into it :) [21:36:59] <_joe|away> bd808: that's also quite easy to do, given you have easy ways of submitting nagios checks and elasticsearch has a rest interface... [21:37:30] <_joe|away> s/nagios/ichinga/ [21:39:05] (03CR) 10Mattflaschen: "I haven't heard about any such database change to Beta Commons." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/122084 (owner: 10Mattflaschen) [21:39:49] There's an nsca output plugin for logstash apparently too: http://logstash.net/docs/1.4.0/outputs/nagios_nsca [21:40:26] PROBLEM - Check status of defined EventLogging jobs on vanadium is CRITICAL: CRITICAL: Stopped EventLogging jobs: reporter/statsd [21:40:29] <_joe|away> oh ok :P even easier, do not reinvent the wheel [21:40:29] sDrewth, hoo: If Bugzilla then I'd go for the "Security" product just to be on the safe side. Can still be relaxed by moving to a public ticket with a privatecomment later [21:41:20] ottomata, _joe|away: Here's an interesting blog post on monitoring for missed jobs: http://garthwaite.org/noticing-what-didnt-happen.html [21:42:13] <_joe|away> bd808: ok but this is for logstash, not for querying aggregates on ES - which is also interesting [21:42:31] I think both a way to alert icinga from logstash and a way to poll the elasticsearch fed by logstash from icinga would be useful to have [21:42:53] _joe|away: Yes. It's a slightly different check [21:43:26] RECOVERY - Check status of defined EventLogging jobs on vanadium is OK: OK: All defined EventLogging jobs are runnning. [21:43:35] <_joe|away> bd808: both things may apply. 
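A minimal sketch of the check discussed above: alert when more than X lines are appended to a log such as CirrusSearch-slow.log within one polling period. This is not the LineCountLogster parser from operations/debs/logster; the function names, the byte-offset bookkeeping, and the thresholds are assumptions made for illustration. The return values follow the conventional Nagios/Icinga plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL).

```python
import os

# Conventional Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def count_new_lines(path, last_offset):
    """Count lines appended to `path` since byte offset `last_offset`.

    Returns (new_line_count, new_offset). The caller is assumed to persist
    new_offset between polling runs (hypothetical bookkeeping).
    """
    size = os.path.getsize(path)
    if size < last_offset:
        # The file shrank, so it was probably rotated or truncated: start over.
        last_offset = 0
    with open(path, "rb") as f:
        f.seek(last_offset)
        data = f.read()
    return data.count(b"\n"), last_offset + len(data)


def status_for(count, warn, crit):
    """Map a per-period line count onto a plugin status code."""
    if count >= crit:
        return CRITICAL
    if count >= warn:
        return WARNING
    return OK
```

The "too few records in the last M minutes" case raised in the same discussion could reuse the same count with the comparison inverted, that is, alerting when the count falls below a lower bound rather than above an upper one.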
[21:43:53] I'd actually like an icinga check for the logstash input crashing which could be seen if there were less than N records in the last M minutes [21:44:15] <_joe|away> bd808: acking an ES-query plugin for ichinga should be very easy to do. [21:44:24] CRITICAL: too few errors, impossible [21:44:41] The damn thing likes to hang up randomly where the process is alive but it stops processing events [21:44:50] Nemo_bis: :) exactly. [21:45:45] <_joe|away> bd808: that's the downside of not using a proper syslog daemon but something much more complicated :) [21:46:38] _joe|away: Feel free to help think about HA for it -- https://bugzilla.wikimedia.org/show_bug.cgi?id=61785 [21:46:44] <_joe|away> (it is still the best option for a lot of things of course) [21:47:23] Our current udp2log transport was easy to hook up but is super sketchy and locks us into a single active logstash node [21:48:24] I think I'd like to see shippers on various hosts with a reliable message bus as transport to the core logstash cluster [21:48:47] <_joe|away> bd808: ehm, as soon as I have a clearer idea of the setup, I'd be happy to do that [21:48:50] <_joe|away> :) [21:49:34] _joe|away: I'd be glad to walk you through it. It's new, I built it and Ops need to learn it so I can hand over ownership :) [21:50:57] <_joe|away> ah, the newcomer trap! but I play the "it's almost midnight and I will have to take my step-daughter to school tomorrow" card [21:51:23] _joe|away: Today is not required. I'm here all the time. [21:51:37] <_joe|away> eheh [21:52:36] (03CR) 10BryanDavis: "Several pedantic puppet comments and a likely bash bug." 
(035 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/123444 (owner: 10Hashar) [21:56:29] (03PS1) 10Ori.livneh: Follow-up to Iacd6a8250: remove reference to repack-libmemcached10 [operations/puppet] - 10https://gerrit.wikimedia.org/r/123455 [22:04:35] (03PS1) 10Jean-Frédéric: Add Musées de la Haute-Saône to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/123459 [22:06:07] (03PS1) 10Ottomata: Using slope both for LineCountLogster line_rate metric [operations/debs/logster] - 10https://gerrit.wikimedia.org/r/123460 [22:06:23] (03CR) 10Ottomata: [C: 032 V: 032] Using slope both for LineCountLogster line_rate metric [operations/debs/logster] - 10https://gerrit.wikimedia.org/r/123460 (owner: 10Ottomata) [22:16:07] (03PS1) 10Chad: Opt all italian wikis into interwiki search [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/123461 [22:17:03] (03PS1) 10Ottomata: 0.0.7-1 version [operations/debs/logster] (debian) - 10https://gerrit.wikimedia.org/r/123462 [22:18:05] RoanKattouw, do you have time for a quick question about LVS [22:18:06] ? [22:18:54] (03Abandoned) 10Ottomata: 0.0.7-1 version [operations/debs/logster] (debian) - 10https://gerrit.wikimedia.org/r/123462 (owner: 10Ottomata) [22:21:51] gwicke: Syre [22:22:13] RoanKattouw, are you aware of any time limits for balanced tcp connections? 
[22:22:37] my understanding is that a single incoming tcp connection is mapped to a single backend as long as it's active [22:22:56] I'm not aware of any such limits [22:23:00] with incoming traffic traversing the LVS host, and return traffic going direct [22:23:05] Yes, that's my understanding too [22:23:16] so LVS would have information about continuing incoming traffic [22:23:48] so something like spdy which uses a single connection would just end up talking to a single backend all the time even with round-robin [22:23:56] I think so [22:24:08] k [22:24:11] thx [22:26:11] (03CR) 10Nemo bis: [C: 031] "Yes please, whenever possible. Local discussions on those wikis showed large consensus and openness to try all things Cirrus" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/123461 (owner: 10Chad) [22:34:21] (03PS2) 10GWicke: Add an account for subbu on Parsoid / Cassandra test hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/123433 [22:34:27] (03CR) 10GWicke: Add an account for subbu on Parsoid / Cassandra test hosts (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/123433 (owner: 10GWicke) [22:36:31] (03PS1) 10Ottomata: Adding logster module and using it to monitor CirrusSearch-slow.log [operations/puppet] - 10https://gerrit.wikimedia.org/r/123466 [22:37:46] (03PS1) 10coren: Tool Labs: Debugging proxylistener [operations/puppet] - 10https://gerrit.wikimedia.org/r/123467 [22:37:50] (03PS2) 10Ottomata: Adding logster module and using it to monitor CirrusSearch-slow.log [operations/puppet] - 10https://gerrit.wikimedia.org/r/123466 [22:41:52] (03PS1) 10Ottomata: 0.0.7-1 version [operations/debs/logster] (debian) - 10https://gerrit.wikimedia.org/r/123469 [22:42:04] (03CR) 10Ottomata: [C: 032 V: 032] 0.0.7-1 version [operations/debs/logster] (debian) - 10https://gerrit.wikimedia.org/r/123469 (owner: 10Ottomata) [22:58:16] (03PS1) 10Ori.livneh: Revert "Beta: use MemcachedPhpBagOStuff if running under HHVM" 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/123471 [22:58:48] (03CR) 10Ori.livneh: [C: 032] Revert "Beta: use MemcachedPhpBagOStuff if running under HHVM" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/123471 (owner: 10Ori.livneh) [22:58:55] (03Merged) 10jenkins-bot: Revert "Beta: use MemcachedPhpBagOStuff if running under HHVM" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/123471 (owner: 10Ori.livneh) [23:03:28] (03CR) 10Aaron Schulz: jobrunner: reduce polling on beta cluster (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/123444 (owner: 10Hashar) [23:07:42] (03CR) 10coren: [C: 032] Tool Labs: Debugging proxylistener [operations/puppet] - 10https://gerrit.wikimedia.org/r/123467 (owner: 10coren) [23:29:51] (03PS6) 10Ori.livneh: [WIP] Make role::graphite work in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/122859 (owner: 10BryanDavis) [23:30:17] (03PS7) 10Ori.livneh: Make role::graphite work in Labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/122859 (owner: 10BryanDavis) [23:35:17] (03CR) 10Ori.livneh: [C: 032] Make role::graphite work in Labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/122859 (owner: 10BryanDavis) [23:38:13] MaxSem: your fix hasn't been deployed yet, right? [23:38:28] greg-g, nope:( [23:38:29] today's swat, that is [23:39:02] ori: mwalker and or ebernhardson ping-a-ling re SWAT, Max here has a request :) https://gerrit.wikimedia.org/r/#/c/123454/ [23:39:18] oh, and if RoanKattouw is on duty today [23:39:22] ah [23:39:26] hokay [23:39:35] greg-g: Present [23:39:43] I really hate that google calneder can only give me desktop reminders if it's open [23:39:48] I'm only out on Tues & Thurs [23:39:56] really, we should just max a "both windows" deployer :) [23:40:04] * MaxSem hides [23:40:08] MaxSem: what say you? 
:) [23:40:13] hehe; I'll deploy it [23:40:23] ty mwalker [23:40:45] * mwalker is also going to have to write a bot that pings people when their windows are up [23:40:45] thanks mwalker, I'm still awake but it's a bit scary to deploy so late [23:40:53] np max [23:44:36] (03PS1) 10Aaron Schulz: Bumped wgJobBackoffThrottling for htmlCacheUpdate to 15 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/123480 [23:45:09] (03CR) 10Aaron Schulz: [C: 032] Bumped wgJobBackoffThrottling for htmlCacheUpdate to 15 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/123480 (owner: 10Aaron Schulz) [23:46:08] !log mwalker synchronized php-1.23wmf20/extensions/MobileFrontend 'SWAT deploy for MaxSem' [23:46:13] Logged the message, Master [23:46:24] (03Merged) 10jenkins-bot: Bumped wgJobBackoffThrottling for htmlCacheUpdate to 15 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/123480 (owner: 10Aaron Schulz) [23:46:46] !log ... deploy was for mobile frontend {{gerrit|123454}} [23:47:26] Logged the message, Master [23:47:50] thanks mwalker, I confirm that http://km.wiktionary.org/wiki/ពិសេស:MobileDiff/37051 now works [23:47:55] !log aaron synchronized wmf-config/CommonSettings.php 'Bumped wgJobBackoffThrottling for htmlCacheUpdate to 15' [23:48:01] Logged the message, Master [23:48:09] whoo; and I'm not seeing any fatals or exceptions related to mobile frontend [23:48:12] in the logs [23:52:53] * AaronSchulz wonders why db errors never have corresponding exception.log entries [23:53:08] * AaronSchulz wonders what was calling Title::countRevisionsBetween() [23:53:56] ty mwalker