[00:00:36] it is out of erik's budget [00:00:58] ahh, then my comment is immaterial [00:01:05] and spaces have been allocated mostly [00:01:33] except for speakers (and only a handful left) [00:02:24] !log profiling collector was pegged at 100% cpu and graphs were turned to swiss cheese due to a bad stats call in 1.20, now fixed [00:02:26] Logged the message, Master [00:05:12] aude: I'll just wait on the waiting list, it's no big deal [00:06:09] if my talk naturally gets added, I'll go, otherwise I'll pass my travel slot to someone else in tech who hasn't gone to wikimania [00:07:48] I was just kind of surprised that the community doesn't want to hear about how bots and tools will work in this system [00:08:08] Ryan_Lane: it has to work to get you in the program :) [00:10:19] Ryan_Lane: i think i see a spot where someone said they can't come [00:57:23] PROBLEM - mysqld processes on db58 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [01:40:53] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [01:43:17] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 254 seconds [01:46:09] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 5 seconds [03:18:12] RECOVERY - mysqld processes on db58 is OK: PROCS OK: 1 process with command name mysqld [03:21:29] PROBLEM - MySQL Replication Heartbeat on db58 is CRITICAL: CRIT replication delay 8343 seconds [03:28:50] PROBLEM - MySQL Slave Delay on db58 is CRITICAL: CRIT replication delay 7418 seconds [03:37:05] RECOVERY - MySQL Replication Heartbeat on db58 is OK: OK replication delay 0 seconds [03:37:14] RECOVERY - MySQL Slave Delay on db58 is OK: OK replication delay 0 seconds [04:32:32] PROBLEM - Host lvs6 is DOWN: PING CRITICAL - Packet loss = 100% [04:34:20] PROBLEM - BGP status on cr1-sdtpa is CRITICAL: CRITICAL: host 208.80.152.196, sessions up: 7, down: 1, shutdown: 0BRPeering with AS64600 not established - BR [04:37:30] New patchset: Jeremyb; "simplify wrapper" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5778 [04:37:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5778 [05:23:44] hm [05:23:46] anyone alive? [05:23:49] * jeremyb  [05:24:00] well, anyone with access :) [05:24:07] * jeremyb not [05:28:00] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [05:28:20] paravoid: you've seen lvs6? ^^ [05:29:28] that's what I was trying to do [05:35:03] RECOVERY - Host lvs6 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [05:35:12] RECOVERY - BGP status on cr1-sdtpa is OK: OK: host 208.80.152.196, sessions up: 8, down: 0, shutdown: 0 [05:35:40] !log powercycled lvs6, was dead and not responding to serial [05:35:43] Logged the message, Master [05:35:57] so, what's the answer? [05:37:06] what's the question? :) [05:37:26] what changed in your knowledge of lvs6? [05:37:45] or was it just unfamiliar and you figured it out but slower than if someone had been around? [05:38:25] newer dracs need "console com2" instead of "connect com2" [05:38:40] how friendly [05:38:43] I was typing "connect com2" and getting a cryptic message back [09:01:00] mutante: good luck with all the boxes :-]] [09:01:21] statistically there must be one with a screwed DIMM [09:04:35] hashar: arr, thanks. 
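A sketch of the DRAC detail above, assuming an illustrative management hostname; the two prompt commands are the ones quoted in the log, and racadm availability is an assumption about that firmware:
    # open the management controller's shell (placeholder address)
    ssh root@lvs6.mgmt.example.wmnet
    # at the DRAC prompt:
    #   connect com2     <- serial console on older DRAC firmware
    #   console com2     <- serial console on newer DRAC firmware
    # power-cycle a host that is dead on serial, if racadm is available:
    #   racadm serveraction powercycle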
actually, fail to connect to the very first one [09:04:45] told you :-D [09:05:04] try to get the other one, cause that first one might be the screwed one [09:05:34] * hashar hears a facepalm noise in The Netherlands [09:05:51] yea, but start with 1002 instead of 1001 in the naming scheme [09:47:33] New patchset: Nikerabbit; "Cron entries for TranslationNotifications" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5783 [09:47:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5783 [10:45:00] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.11.27:11000 (timeout) 10.0.11.32:11000 (timeout) 10.0.8.23:11000 (timeout) 10.0.8.39:11000 (timeout) [10:47:42] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [11:02:45] New review: Siebrand; "(no comment)" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/5783 [11:37:41] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.23:11000 (timeout) 10.0.8.39:11000 (timeout) [11:39:11] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [11:39:48] !log running authdns-update to add analysis mgmt names [11:39:50] Logged the message, Master [11:42:20] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [12:07:37] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.39:11000 (timeout) [12:11:58] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [12:16:19] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.23:11000 (timeout) [12:19:10] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [12:29:46] !log Sending US, Brazil, Indian traffic to upload.eqiad [12:29:49] Logged the message, Master [12:32:09] New patchset: Mark Bergsma; "Silence cron spam" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5791 [12:32:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5791 [12:32:34] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5791 [12:32:37] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5791 [12:33:04] hi hashar, could you maybe have a quick look at: https://gerrit.wikimedia.org/r/5717 it is about setting up stat1 for erik zachte [12:33:50] diederik: can't today sorry [12:34:36] mutante: if you can take the time to merge & restart mysql yeah please do it . Should be all about merging both changes, running puppet, crossing fingers and restarting mysql [12:34:43] ok, i'll shop around some more :) [12:35:41] hashar: ok, doing that now, cause installing these servers will take more time. 
i know one is already live anyways and the other has asher review [12:35:48] diederik: you will want an op to review it then merge / deploy it :-] [12:35:58] diederik: sorry, already too many stuff to track :-] [12:36:32] mutante: so you have 15% of servers installed :-] [12:36:37] diederik: you already have that:) [12:37:00] diederik: oh, no, i see, you added the right R packages..ok [12:37:46] hashar: i have preparational work like mgmt DNS entries, updated racktables, document the mgmt CLI commands .. .:P [12:41:55] New review: Dzahn; "yea, this is puppetizing a life hack which is good and should not actually change stuff and" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4395 [12:41:58] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4395 [12:44:35] New review: Dzahn; "more contint db config. has Asher review and Facebook-only config line has been removed" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4400 [12:44:38] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4400 [12:46:27] \o/ [12:46:53] hashar: but still needs a fix :p [12:47:05] error in the erb template [12:47:09] argHGH [12:47:15] !erb [12:47:15] to check the syntax of a puppet erb template: erb -x -T '-' mytemplate.erb | ruby -c [12:47:16] :) [12:47:36] mysql/log_slow_queries.cnf.erb:8: syntax error, unexpected ';', expecting ')' [12:47:55] mysql/log_slow_queries.cnf.erb:9: syntax error, unexpected ')' [12:48:39] whoever invented the ; as a line terminator back in the 1970's deserve a blame stick [12:49:32] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5717 [12:49:35] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5717 [12:50:28] ah, thanks Mark, i was going to do that next [12:50:33] diederik: ^ [12:50:45] thanks guys! [12:56:39] !g Icda77ab48e67624ceabf2d9b7b3b259d9d84aa53 [12:56:39] https://gerrit.wikimedia.org/r/Icda77ab48e67624ceabf2d9b7b3b259d9d84aa53 [12:56:47] OH MY GOD [12:56:53] that does not work :-( [12:58:31] New patchset: Hashar; "log_slow_query mysql template was invalid" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5793 [12:58:39] mutante: https://gerrit.wikimedia.org/r/5793 should fix the erg issue [12:58:48] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5793 [12:58:55] New review: Hashar; "ERB syntax errors corrected with https://gerrit.wikimedia.org/r/5793" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4400 [12:59:57] New review: Dzahn; "sure, syntax error fix" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/5793 [13:00:00] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5793 [13:01:21] hashar: much better:) applying configuration... [13:01:58] Contint::Test::Testswarm/File[/etc/mysql/conf.d/log_slow_queries.cnf]: Scheduling refresh of Service[mysql] [13:02:09] ahah [13:02:23] sounds like puppet is able to break mysql on its own [13:02:25] \o/ [13:03:50] so yeah, what is it waiting for right now [13:05:23] sure about the "subscribe" to the 2 files..? [13:07:33] want me to stop puppet and start mysql manually? 
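The !erb factoid above expands to a single pipeline; a minimal example against the template being debugged (the path is illustrative):
    # Render the ERB template to Ruby source without evaluating it, then let
    # Ruby syntax-check the result. A stray ';' or ')' inside a <%= %> tag
    # produces exactly the kind of error quoted above.
    erb -x -T '-' templates/mysql/log_slow_queries.cnf.erb | ruby -c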
how long is your current maintenance time ?;) [13:08:07] as short as you can :-] [13:08:16] puppet is sitting there, not stopping but not finishing either [13:08:40] I would ^C puppet and try restarting mysqld manually [13:08:46] seems down for now : https://integration.mediawiki.org/testswarm/ [13:11:24] ok, better. stopped mysql, puppet run finished [13:11:57] start: Job is already running: mysql [13:13:05] I am connecting to mysql [13:13:13] where's the PID file [13:13:16] looking [13:13:40] /var/run/mysqld empty :( [13:14:40] $ status mysql [13:14:40] mysql respawn/post-start, (post-start) process 24369 [13:14:47] service mysql restart "Since the script you are attempting to invoke has been converted ..Upstart job, you may also use the start(8) utility," [13:16:37] Misc::Contint::Test::Testswarm/Service[mysql]/ensure: ensure changed 'stopped' to 'running' [13:18:05] maybe /var/log/daemon.log has some clues ? :/ [13:18:35] gallium init: mysql post-start process (25287) terminated with status 1 [13:18:45] gallium init: mysql main process (25398) terminated with status 2 [13:19:02] init: mysql main process ended, respawning [13:21:10] 120425 13:20:13 [ERROR] Can't open the mysql.plugin table. Please run mysql_upgrade to create it. [13:21:14] hmm [13:21:17] maybe cause I am not root [13:21:23] mysqld: Table 'mysql.plugin' doesn't exist [13:21:34] how about this? leave the service { "mysql" in there but remove the subscribe to the files for now and it should be back to before, right? mysql was just started as a regular service before and no packages were changed [13:22:07] I think there is another issue [13:22:11] mysqld --help --verbose > /dev/null [13:22:16] let me paste the result of that [13:22:22] 120425 13:21:40 [ERROR] Can't open the mysql.plugin table. Please run mysql_upgrade to create it. [13:22:33] so maybe mysql got magically upgraded at some point ? :( [13:23:40] from dpkg.log 2012-04-05 13:34:23 upgrade mysql-server 5.1.41-3ubuntu12.10 5.1.61-0ubuntu0.10.04.1 [13:23:48] might not have been restarted [13:23:53] but unrelated to the puppet change.all you added was to ensure the service is running [13:24:00] yup [13:24:15] but maybe mysql was not fully upgraded [13:24:41] hence he was running in a state which would not let it restart [13:24:45] so it would have broken at next restart ..and we just triggered it, yep [13:25:06] that is what I suspect [13:26:10] how about dist-upgrading to new mysql versions we are being offered [13:26:28] you also had -ubuntu before , right [13:26:51] thats what made you remove the facebook-only config line [13:27:48] !log running apt-get upgrade on gallium [13:27:49] yeaht that is the stock one [13:27:50] Logged the message, Master [13:28:44] Setting up mysql-server-core-5.1 (5.1.62-0ubuntu0.10.04.1) ... etc... [13:28:56] anyone here can make changes to meta css file ? [13:29:09] if not any suggestions how to do that? [13:29:39] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5768 [13:29:42] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5768 [13:30:24] hashar: sigh, it also kind of hangs at setting it up now :p [13:31:33] ahah [13:31:37] we are screwed :-( [13:31:45] maybe cause there is lot of rows? [13:33:18] so indeed the script is at 'start mysql' [13:33:19] :( [13:33:20] /bin/bash -e /var/lib/dpkg/info/mysql-server-5.1.postinst configure [13:34:51] killed it, dpkg was done otherwise.. 
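A rough sketch of the recovery path being discussed, for an Upstart-managed MySQL 5.1 on Lucid; the credential handling is an assumption, not taken from gallium:
    status mysql                    # Upstart job state ("respawn/post-start" as seen above)
    sudo stop mysql                 # on Upstart, "service mysql restart" just defers to stop/start
    sudo start mysql
    sudo mysql_upgrade -u root -p   # recreates mysql.plugin and friends after the package upgrade
    tail -n 50 /var/log/daemon.log  # init and mysqld messages land here on this setup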
at least that doesnt look broken now [13:35:49] !gallium - dpkg-reconfigure mysql-server-5.1 [13:36:50] !log gallium - dpkg-reconfigure mysql-server-5.1, mysql does not start right [13:36:53] Logged the message, Master [13:40:17] hashar: mysql> ! [13:40:51] Krinkle: there :) [13:41:01] alrighty [13:41:08] mutante: just in time!! [13:41:18] I mean before Timo could even complain about https://integration.mediawiki.org/testswarm/ being dead hehe [13:41:28] almost ;-) [13:41:39] !log gallium/testswarm - back up after mysql upgrade and issue starting the service [13:41:40] but really, no problem. the swarm clients are long-running in the browsers [13:41:41] Logged the message, Master [13:41:56] they use frames and ajax for everything, the clients won't die or hang [13:42:08] they'll just keep trying every 30 seconds until it works again and then continue as if nothing happened [13:42:10] mutante: can you check if it logs any slow queries ? [13:42:26] should be in /var/log/mysql something [13:42:39] all my swarm clients are still connected and back in the swarm now [13:43:25] hashar: let me move your config files back, i just removed them to make sure they did not cause anything..one more restart then [13:43:53] actually, let puppet do it and see if nothing happens to the service there either [13:45:23] Contint::Test::Testswarm/File[/etc/mysql/conf.d/log_slow_queries.cnf]/ensure: defined content Scheduling refresh of Service[mysql] .... [13:45:30] * hashar crosses fingers [13:45:46] it takes so long again... [13:46:54] nope, not looking good ...:( [13:49:55] so that must be one of the changes ? [13:50:24] ohh [13:50:35] log_slow_queries.cnf has a line showing 'false' [13:50:43] must be the stupid template trick [13:52:27] !log gallium stopped puppet, moved log_slow_queries config, re-setting up mysql again [13:52:29] Logged the message, Master [13:54:58] !erb [13:54:58] to check the syntax of a puppet erb template: erb -x -T '-' mytemplate.erb | ruby -c [13:55:00] hashar: its the innodb buffer log size thing [13:55:19] back up again [13:55:35] :-( [13:55:38] buffer pool size i meant [13:55:52] the erg template must be wrong somehow [13:55:59] err ERB template must be wrong somehow [13:56:16] I am fixing the log_slow_queries.cnf.erb one [13:57:37] nnoDB: WARNING: over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index [13:57:51] well that sounds like the reason you want to change its size [13:58:03] New patchset: Ottomata; "statistics.pp - Ah, need libxt-dev in order for R to build and install Cairo R library." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5795 [13:58:06] brb, puppet wont break it for now [13:58:09] stopped the agent [13:58:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5795 [13:58:37] New patchset: Hashar; "gallium mysql templates were wrong again" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5796 [13:58:54] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5796 [13:58:57] mutante: taking a coffee. https://gerrit.wikimedia.org/r/5796 should fix the template [13:59:18] need to find out how the files will be generated though [13:59:29] I probably should have tested all of that in a labs first :-( [14:00:31] hashar: same here, and actually still need to install servers, well, it is up now and you got newer mysql packages [14:00:52] that is some progress! 
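One way to answer the "does it log any slow queries" question once the conf.d snippet renders correctly, assuming client credentials are already set up; the log location is the conventional one, not confirmed above:
    mysql -e "SHOW VARIABLES LIKE 'slow_query_log%'; SHOW VARIABLES LIKE 'long_query_time';"
    sudo ls -l /var/log/mysql/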
[14:01:02] heh:) [14:01:09] I will play with labs and polish it up [14:01:11] thanks Daniel! [14:01:30] np Antoine [14:03:16] !log gallium - don't start puppet unless the erb template fix for mysql has been merged [14:03:19] Logged the message, Master [14:29:55] hioooo [14:30:03] looking into another RT ticket [14:30:11] having trouble with some NFS mounts [14:30:17] what are these two IPs? [14:30:35] 208.80.152.185, 10.0.5.8 [14:38:35] hi domas, to continue the conversation about webstatscollector, (and forgive my lack of knowledge of berkekely-db) if the db is basically all in memory, do you still need to set the DB_CREATE and DB_TRUNCATE flags when you open the handle? [14:55:50] ottomata, 208.80.152.185 seems to be dataset2 [14:58:18] thanks Platonides [15:08:12] /topic [15:12:39] mark: if you have a moment, I am going to run the mgmt connections for row c, I put the port assignments in https://rt.wikimedia.org/Ticket/Display.html?id=2859 [15:18:52] damn it, my headphones just got pulled out of my ears by catching on something, and the silicone earbud popped off and disappeared [15:18:54] ;_; [15:19:03] I hate it when that happens [15:20:04] today just went to shit [15:20:10] all cuz i have no tunes [15:20:42] You should get some cans and look like a tool but rock out to the bass [15:21:07] i prolly should, i spend so much time in the dc anyhow [15:21:16] but i hate stuff ON my ears, they get warm [15:21:42] now i get to wear a single earbud and try to fashion a temp earbud collar out of earplug material ;] [15:22:06] when does the DC meetup group get to come visit? ;) [15:22:20] I use to wear ear defenders over earbuds when spending lots of time in the dc, mainly so I could hear my phone lol [15:22:23] or wikimania field trip ;) [15:23:11] i lost mine too [15:26:45] aude: uhhhhhh [15:26:53] i guess i need to do something about that [15:27:02] im going to say i will look into it, and promptly forget again ;] [15:27:20] :) [15:27:45] i thought i emailed our eq rep about this, cuz there is a 'pbx tour guide' checkbox in the user mgmt [15:27:51] lemme see if i can find the email and resend [15:27:57] it sounds boring yet some people would think it's interesting [15:28:04] RobH: cool :) [15:28:07] New review: Hashar; "Someone needs to check that the ERB templates actually generate a valid MySQL configuration. That ca..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/5796 [15:29:00] !log hashar: gallium: MySQL had issues most probably because of the mysql configuration snippets. https://gerrit.wikimedia.org/r/5796 might solve that. [15:29:04] Logged the message, Master [15:29:04] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [15:29:49] New review: Hashar; "I attempted testing them in labs but since I have no merge rights in the test branch, I can't get th..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/5796 [15:29:55] maaaaark mark mark mark [15:30:02] maybe you have info for me about this [15:30:03] yes? [15:30:10] i'm working on this ticket [15:30:10] http://rt.wikimedia.org/Ticket/Display.html?id=2162 [15:30:19] trying to mount two machines on the new stat1 server [15:30:36] aude: resent the email to our EQ rep, I know I wanted to get something setup for the board, organizers, etc. 
[15:30:47] but there is no way it will ever be an open 'everyone can sign up' kinda thing [15:30:49] RobH: sounds good [15:30:50] but, they are not happy, give messages like this with the current options [15:30:51] mount: 10.0.5.8:/home/wikipedia/wikistats: can't read superblock [15:30:57] but, if I add the nolock option [15:30:59] RobH: understand [15:30:59] it mounts [15:31:00] so would be like the board, the wikimania organizers who wanna see it, etc. [15:31:14] does it really need NFS? [15:31:16] and why? [15:31:21] mutante: so I wanted to test the gallium mysql templates but eventually gave up. I can't really have them deployed on labs. We will see that later next week when I am back from my looong weekend [15:31:25] we really hate NFS and are getting rid of it as much as possible [15:31:28] RobH: i think that works [15:31:29] good question, I am just trying to complete the ticket :) [15:32:08] I don't know what those remote NFS machines are [15:32:11] one is dataset2 [15:32:27] Erik Z uses this to generate some stats for the report card [15:32:32] we are going to replace this system eventually [15:32:35] but for now we are stuck with it [15:32:59] so it might be easiest just to get this running for now, until we get a new way of generating analytics data up ( > 6mo for sure) [15:33:25] nolock would be dangerous if anyone else is using the mounts, right? [15:33:35] either on the hosts or other remote nfs mounts [15:33:50] i doubt they are mounted anywhere other than bayes (deprecated) and stat1 (not yet) [15:34:00] so we could umount on bayes and use nolock… [15:34:08] or maybe if I umount on bayes I can mount on stat1 without nolock. [15:34:09] hm [15:34:17] will try that [15:34:34] but. in the meantime, do you what version of NFS those remote hosts are running? [15:34:48] if only /home/wikipedia/htdocs/wikipedia.org/wikistats is mounted [15:34:53] why don't we just move that directory onto stat1 then? [15:35:11] if nothing else mounts that dir, then there's no point in having NFS is there [15:35:19] and if something else does, locking is indeed a problem [15:35:21] there are 3 directories mounted, 2 from 10.0.5.8 and 1 from 208.80.152.185 [15:35:24] i think 208.80.152.185 is dataset1 [15:35:26] where xml dumps are stored [15:35:50] yeah [15:35:55] so the first one doesn't seem necessary [15:35:59] the second one is mediawiki [15:36:03] that might be necessary for something [15:36:15] and should probably be read-only? [15:36:28] hmm, maybe [15:36:35] the xmldump one probably for sure [15:36:39] yes [15:36:52] what machine is 10.0.5.8? what's it for? [15:36:54] mark, ottomate: i don't think we need NFS [15:37:00] 10.0.5.8 is /home [15:37:02] i think Erik just wants his data [15:37:15] can he just rsync it over? [15:37:16] sure, we should copy the existing wikistats directory off /home [15:37:19] I don't see why not [15:37:22] if nothing else uses that [15:37:28] so easiest is to see what data he uses / needs and copy it [15:37:30] and drop the mounts [15:37:32] indeed [15:37:42] yes, erik can rsync [15:37:44] if we copy /home/wikipedia over to stat1 and then have him work there from then on? [15:37:49] and just drop it elsewhere? [15:37:50] nono [15:37:52] not /home/wikipedia [15:37:52] no. [15:38:08] /home/wikipedia/htdocs/wikipedia.org/wikistats we can copy [15:38:11] ok [15:38:14] /home/wikipedia in its entirety we cannot [15:38:16] and /home/wikipedia/wikistats? 
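For reference, the two mount variants being compared earlier in this exchange; nolock disables client-side NFS locking, which is only safe if nothing else takes locks on that export:
    # fails here with "can't read superblock"
    sudo mount -t nfs 10.0.5.8:/home/wikipedia/wikistats /mnt/wikistats
    # mounts, but without lock support
    sudo mount -t nfs -o nolock 10.0.5.8:/home/wikipedia/wikistats /mnt/wikistats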
[15:38:17] ok [15:38:20] so what else is needed there needs to be investigated [15:38:23] ok [15:38:29] yeah, i'm not sure exactly what he needs, it is a little confusing [15:38:35] we'll talk to him abou thte wikistats things then [15:38:38] i'm sure it is :) [15:38:38] as for dataset1 [15:38:48] i am waiting for erik to come online [15:38:50] can/should we still NFS that? [15:38:54] and i''ll ask him all that stuff [15:39:01] I can imagine that the dataset mount is needed [15:39:05] but I don't know what it's being used for [15:39:09] possibly just to READ data dumps [15:39:19] in that case it can become a read-only NFS mount [15:39:20] that's sounds likely [15:39:50] yeah probably [15:40:17] * apergos grits their teeth a bit [15:40:24] there's some pagecount stuff he munges [15:40:33] yes he does that as well [15:40:36] it may be that he'll want write for that [15:40:44] I don't remember how it's set up, even though I set it up :-/ [15:40:49] don't think so [15:40:52] but i'll ask him [15:40:57] so I shoud stab you for introducing another NFS mount then eh apergos [15:41:00] no [15:41:03] yes [15:41:09] hah [15:41:10] I didn't introduce this [15:41:17] you just said you set it up [15:41:19] ther eis a gluster copy of the most recent 5 dumps [15:41:21] maybe erik rsync thoses files [15:41:37] I would love it if he could use those ventually [15:41:51] we can setup now if that makes sense [15:42:00] the pageview stuff should go via rsync or whatever [15:42:08] indeed [15:42:24] I think we sohuld make that mount available to him (the gluster volume) and we'll find out what works and doesn't work [15:42:31] ok, still having trouble mounting dataset1 though [15:42:31] access denied by server while mounting 208.80.152.185:/data [15:42:34] even in ro [15:42:37] he's the first user so I expct we'll find various problems [15:42:43] sudo mount -t nfs -v -o ro 208.80.152.185:/data /mnt/data [15:42:52] apergos: can you work with them on that? [15:42:56] did you add yourself into the stanza in pppet? [15:43:06] the host, I mean [15:43:28] hmm, for export stuff? [15:43:29] so there's three steps. 1) puppet change for exports of dataset2 (not dataset1, it doesn't exist) [15:43:32] mark: can you approve https://gerrit.wikimedia.org/r/#change,5795 as well? [15:43:37] dataset2 aye [15:43:38] ok cool [15:43:40] will check [15:43:56] 2) puppet run on ds2 [15:44:12] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5795 [15:44:15] 3) re-export ds2 (puppet can't do that right) [15:44:15] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5795 [15:44:31] it can't? [15:44:33] thanks mark! [15:44:35] k, i don't have access to ds2 [15:44:43] re: NFS on stat1: https://gerrit.wikimedia.org/r/#patch,sidebyside,5709,1,manifests/misc/statistics.pp <-- there are the NFS mounts, added FIXMEs, the IPs and pathes are per ezachte from an RT [15:44:57] I'll do the puppet run and the re-export [15:45:05] thanks [15:45:16] add a big FIXME to get stat1 away from NFS there [15:45:19] <^demon> Do we have a generic wikimedia logo in files/ somewhere for placing on hosts? [15:46:42] apergos, does this look right? [15:46:43] http://pastebin.com/QcYj7h0m [15:47:01] adding stat1.wikimedia.org to that list? [15:48:03] uhh, hang on, that pastebin is not right [15:49:02] there [15:49:02] http://pastebin.com/P8GyaG8c [15:49:05] that's better [15:49:07] is that the right file? 
[15:49:52] yes that's the right file [15:49:58] k will commit and push [15:50:13] not rw, we said we'ddo read only for stat1 right? [15:50:23] hm, all the others are rw though? [15:50:28] yes, the others are [15:50:33] the snapshot hosts write the dumps [15:50:35] I could do it ro, but maybe he needs it and we can just leave it the same? [15:50:39] bayes is rw too [15:51:31] but maybe he doesn't need it and then we shouldn't [15:51:41] so please find out, and find alternative ways if possible [15:51:42] apergos: original request says "as defined on bayes".. RT-2162 [15:51:54] ugh [15:51:54] fnie [15:51:56] *fine [15:51:58] bayes is not a golden master you should replicate ;-) [15:52:22] I would prefer read only and later I would prefer gluster [15:52:30] indeed [15:52:39] mark: i am on it :) [15:52:44] thanks [15:52:49] let's wait what erik says [15:54:11] great [15:54:21] don't know gluster :) [15:54:31] diederik, I can go ahead and commit ro now and apergos can make it so [15:54:32] that's the distributd filesystem labs is using [15:54:36] we can change to rw if/when he needs it [15:54:57] at least that way if ro is fine we don't have to think abou tit again later and I can close the ticket [15:55:22] You can think about tit later if you wish :D [15:55:37] can anyone tell me sumana's official new job title please? [15:55:49] do people have official job titles? [15:56:06] well new job title then :P [15:56:31] New patchset: Ottomata; "files/download/exports - allowing /data to be NFS mounted read only on stat1 for Erik Z." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5809 [15:56:39] apergos: yes, you're a "software developer" [15:56:41] cool, apergos, if you approve/merge [15:56:48] nothing fancy unfortunately ;) [15:56:48] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5809 [15:56:49] then you can run puppet and reload exports on ds2 [15:56:59] and I can try to mount [16:00:32] I'm a software developer because I told them to put that on the staff page back in the day [16:00:38] it's pretty much without content [16:00:59] ottomata: lookng [16:01:40] New review: ArielGlenn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/5809 [16:01:42] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5809 [16:01:55] New patchset: Demon; "Moving gitweb config to its own class, adding blame support (bug 36234)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5810 [16:02:09] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/5810 [16:02:56] whose libxt-dev edition? [16:03:01] that didn't get pushed out [16:03:20] otto and me [16:03:21] <^demon> Hrm, I did something wrong :\ [16:03:22] hm, eh wha? that is me i think [16:03:26] ok [16:03:35] doing so now [16:03:38] danke [16:04:01] apergos: thanks for email about bzip2 btw, so that block stuff is really important? and gzip does not offer that? 
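The three steps apergos lists (exports change in puppet, puppet run on ds2, manual re-export) end with something like this on the dumps host, followed by a read-only mount test from stat1; the commands are standard exportfs/mount usage, with paths taken from the discussion:
    # on dataset2, after puppet has written the new exports entry:
    sudo exportfs -ra               # re-read /etc/exports and re-export
    sudo exportfs -v | grep stat1   # confirm stat1 now appears, read-only
    # on stat1:
    sudo mount -t nfs -o ro 208.80.152.185:/data /mnt/data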
[16:04:05] found my earbud, YAY [16:04:11] gzip does not and yes it is [16:04:15] ok [16:04:27] New review: Catrope; "(no comment)" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/5783 [16:05:25] !log reshuffling cables in eqiad for serial and mgmt connections in a8, this may affect all eqiad mgmt and serial connections for the next 5 minutes [16:05:27] Logged the message, RobH [16:06:28] ok, try mounting now [16:06:37] New review: Catrope; "(no comment)" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/5810 [16:07:22] apergos, mark, ottomata: erik is joining us on IRC [16:07:51] ok [16:08:09] so erik: we are talking about your NFS moutns [16:08:18] and we would like to know what data you exactly need [16:09:07] we suspect the XML dumps and the pageview counts [16:09:09] New patchset: Demon; "Moving gitweb config to its own class, adding blame support (bug 36234)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5810 [16:09:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5810 [16:10:42] in: dumps & php files out ; htdocs & again dump server [16:11:00] htdocs we can move to stat1 itself [16:11:32] what files do you write? [16:12:28] htdocs on stats1 would be fine, what about switch to other data center on stat2 (some day)? [16:12:44] 208.80.152.185:/data on /mnt/data type nfs (rw,bg,tcp,rsize=8192,wsize=8192,timeo=14,intr,addr=208.80.152.185) [16:12:53] 10.0.5.8:/home/wikipedia/htdocs/wikipedia.org/wikistats on /mnt/htdocs type nfs (rw,addr=10.0.5.8) [16:13:02] ideally it would be stored on both [16:13:12] and ideally they're both setup the same way [16:13:15] yes, some day, I'm working on the order to support that dor the dump hosts [16:13:27] *for the [16:13:56] also, backups should be setup this time :) [16:14:15] php files to mine language specific keywords (it got lost on bayes' mount list while ago so I am serving from those from local cache) [16:14:27] !log done moving mgmt connections and serial connections in s8-eqiad for now [16:14:30] Logged the message, RobH [16:14:32] wow, it looks so much better. [16:14:43] RobH: it looked crap before ;) [16:14:46] er, what do you write on /data? I shoul dhave been more specific [16:14:50] yes, yes it did [16:14:55] it looks 100% better. [16:15:12] and php files, you mean mediawiki source files? [16:15:16] not done yet, but i dont wanna route all those serial connections in right now, will do that once I finish with the new mgmt calbes [16:15:18] cables [16:15:29] data: http://dumps.wikimedia.org/other/pagecounts-ez/ [16:15:44] do you write those files directly? [16:15:46] and perhaps the mediawiki php files can be pulled from git instead [16:15:48] in a cron/puppet job [16:16:00] pagecount files, csv files, celaned up projectcount files [16:16:10] cleaned up [16:16:23] I would like those to go over via a cron job with rsync [16:16:23] do we need to pubish that? the original files are already published [16:16:56] php: the language files I parse those to recognize 'Talk page' in Japanese [16:17:30] ezachte: any issue with getting a local copy of that from git on stat1? [16:17:35] instead of over NFS [16:17:47] ideally the xml dumps namespace tag should contain the localized version of the namespaces [16:18:01] or does it already have that? 
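On the namespace question just above: the <siteinfo> header of the pages XML dumps does carry the localized namespace names (as confirmed a few lines below), so they can be read without the PHP message files. A quick peek, with a placeholder dump path:
    bzcat /mnt/data/path/to/jawiki-20120401-pages-articles.xml.bz2 | head -n 100 | grep '<namespace '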
[16:18:11] Mark new idea for me, I'm also blank at GIT yet, should change in Berlin [16:18:43] it already does in the site info [16:18:44] DaBPunkt had a mailman question in #-tech. but he didn't actually say what the question is yet [16:18:48] ezachte: well, mediawiki is maintained in git, so unless it needs to be 100% guaranteed identical to what's on fenari, that should work, right? [16:19:02] diederik it does, but not e.g. 'category' keyword and some more [16:19:03] i can set up the git pull + cron or whatever [16:19:06] faw: did you hear about paravoid's new job? [16:19:14] but erik why don't you get that info from the xml file itself? [16:19:14] ottomata: there is some git stuff in puppet already [16:19:16] some definitions for it [16:19:16] cool [16:19:24] so use or extend those instead of using plain execs [16:19:26] k [16:19:39] (i'm happy to review) [16:19:44] ottomata, great [16:19:54] the category namespace is also in the site info [16:19:58] localized [16:20:02] danke, i don't yet understand what needs to be done (hard to follow this convo cause I don't know most of the stuff you are talking about) [16:20:11] will try to get a summary once you guys figure each piece out :) [16:21:08] so let's summarize this [16:21:09] if it turns out there are magic words or whatever that need to be gotten, then those could come from git for wmf-whichever is deployed at the time, I suppose [16:22:26] htdocs is copied to stat1 so we don't need the 10.0.0.58 mount [16:22:31] great [16:22:49] the other mount still seems relevant and need write rights, correct? [16:23:13] I would like these: pagecount files, csv files, celaned up projectcount files to go over via cron/rsync [16:23:24] instead of a mount? [16:23:27] yes [16:23:32] perfect [16:23:53] I would like the mount to be read-only, and eventually to be replaced by a mount of the gluster volume with the dumps on it [16:24:06] ezachte: if you email me and ottomata the exact source and target locations of pagecounts, csv files and project counts then we can take it from there [16:25:02] diederik will do [16:25:04] so, no mounts from 10.0.0.58 [16:25:08] ro mount on ds2 [16:25:11] (like we have now) [16:25:13] excellent [16:25:20] rsync/cron for some stuff that I will find out about [16:25:30] and git/cron (or something) for mediawiki on stat1 [16:25:31] ? [16:25:35] yes [16:25:38] and backups. [16:25:41] backups? [16:25:48] amanda backups for what's generated on stat1 [16:25:50] apergos you're right about category tag there are some more, need to check [16:26:00] and keep in mind all this needs to work on stat1001 too some time (other data center) [16:26:07] fine, ezachte [16:26:13] ok, I will get this setup on stat1 first [16:26:17] then ask more about amanda [16:26:19] New patchset: preilly; "Modify to allow carrier testing for Tunisia and Camerron" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5811 [16:26:32] mark: all will be puppetized and (hopefully) generic enought to include anywhere [16:26:36] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5811 [16:26:37] ottomata: ok, add a TODO in your puppet manifests for it so you don't forget :) [16:26:43] ottomata: awesome [16:26:45] k, and on the RT titcket [16:26:48] yes [16:26:54] I've added the gluster volume mount to my todo list [16:27:13] Change abandoned: preilly; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5447 [16:27:41] New review: preilly; "Add ACL for new carriers and redirect support for carriers landing page on m. domain" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/5811 [16:29:10] i updated the rt ticket [16:29:29] (http://rt.wikimedia.org/Ticket/Display.html?id=2162) [16:29:52] ok, i can copy htdocs from bayes, but that is the NFS mount [16:30:06] I don't have access to wherever it is actually hosted [16:34:15] nfs1 I guess [16:36:23] can someone who has access to that rsync it over? [16:36:32] does it need to be cron rsynced? [16:36:36] or jsut rsynced once? [16:38:44] it can be copied from bayes as well as anywhere else, assuming it's mounted there [16:38:51] i can copy it [16:38:53] and I don't know if it needs regular updates [16:39:00] i think it does [16:39:07] I really have no idea if the ffiles he uses change regularly [16:39:22] ezachte: do we need to copy htdocs once or regurarly [16:39:29] *regularly [16:40:09] wikistats makes daily updates to htdocs, [16:40:38] um wait a sec [16:40:39] !log starting delete script on ms-be3 [16:40:41] it writes there? [16:40:42] Logged the message, Mistress of the network gear. [16:41:07] sounds like it :) [16:41:09] what's a url that uses that? [16:41:27] not sure if we are talking about the same for me htdocs is stats.wikimedia.org [16:42:02] we are talking about bayes:/mnt/htdocs right? [16:42:37] yep [16:42:38] spence [16:43:08] ServerName stats.wikimedia.org [16:43:20] DocumentRoot /home/wikipedia/htdocs/wikipedia.org/wikistats [16:43:31] 10.0.5.8:/home on /home type nfs (rw,bg,tcp,rsize=8192,wsize=8192,timeo=14,intr,addr=10.0.5.8) [16:43:35] so that's pretty annoying [16:43:43] what is ? [16:44:15] apergos can you explain? [16:44:20] I'm writing [16:44:23] please be patient [16:44:41] if there are daily updates from bayes to /home/wikipedia/htdocs/wikipedia.org/wikistats now ... [16:44:43] (are there?) [16:44:59] yes [16:45:14] then it's going to be a pain to do without the moount [16:45:28] the point is that stats.wm.o uses that mount to serve up the data [16:45:41] example of daily output: http://stats.wikimedia.org/EN/TablesPageViewsMonthly.htm [16:45:56] is stats.wikimedia.org hosted on bayes? or elsewhere? [16:46:11] I would dislike intensely exporting some random filesystem off of stat1 and/or stat1001 to spence [16:46:19] stats.wm.o is on spence [16:46:24] it mounts /home from nfs1 [16:46:29] just like bayes does [16:46:32] elsewhere, I wouldn't know really for me the mount is all that mattered [16:47:14] ah, hm, [16:47:48] do we need to use nfs1? [16:48:01] can we use either spence or stat1 as the htdocs host [16:48:05] and nfs mount from one or the other? [16:49:06] using stat1 is a bad idea. it's being used for computation and other things, it shouldn't serve web data. [16:49:13] ok [16:49:23] well [16:49:25] hm [16:49:31] I will bet dollars to donuts that spence doesn't have the kind of room we need either [16:49:32] can we just rsync a data dir (or htdocs, i guess) [16:49:35] from stat1 to spence? 
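A sketch of the cron/rsync route for the generated pagecount, csv and projectcount files; the schedule, source directory, destination path and host are all placeholders until Erik sends the exact locations as agreed above:
    # illustrative crontab entry on stat1; assumes an ssh key for the target host
    # m   h   dom mon dow   command
    15  */4  *   *   *      rsync -rt /a/wikistats/out/ dumps.example.org:/data/public/other/pagecounts-ez/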
[16:49:39] I am doing a du, it could take a while to complete [16:49:39] do that regularly? [16:49:43] like a deploy of stats.wm? [16:50:03] what from stat1 to spence? [16:50:09] spence doesn't have the data on local disks [16:50:13] ah ok [16:50:16] I doubt it has room for the data [16:50:20] and it can't? [16:50:28] can we NFS htdocs on stat1 [16:50:29] hmm well in theory it does, the du shows 32 GB [16:50:34] allow erik to generate stuff there [16:50:40] * apergos grits teeth [16:50:41] and nfs mount htdocs from stat1 on spence? [16:51:04] (just don't see the need for nfs1 here, i guess.) [16:51:30] so the thing about nfs1 is that it gets backed up (I think) [16:51:34] this is a good thing [16:51:36] aye ok [16:51:53] I should really find out how much of /home does [16:51:56] so um, let's just leave it like it is, and mount it on stat1? [16:52:05] then stats.wm.org will continue working as it [16:52:08] as is* [16:52:16] yes but. [16:52:28] and this better go in the ticket [16:53:11] if we can rsync across once a day that eliminates one more mount [16:53:41] from stat1 to nfs1? [16:53:45] so let's look into making thathappen (from stat1 or in the future stat1001 to primary host for /home) [16:53:47] why not create a shell script for ezachte that he can run once he has finished running his scripts? [16:53:49] jeremyb, I did, and I'm quite happy for him :) [16:54:02] diederik, we can do that [16:54:06] and use the same one for cron if we want to [16:54:12] perfect [16:54:28] apergos, rsync from stat1 to nfs1? to spence? to bayes (naww) [16:54:44] once a day? i'd like bug fixes etc to be online asap, also I publish frequent progress report on dump/stats progress [16:54:45] rsync from stat1 to primary host for /home (now nfs1) [16:54:57] that's why is said give erik a script [16:54:58] I don't know if there is a nice puppet variable for it, [16:55:00] one can wish... [16:55:21] ok, if not i can make one [16:55:25] where, in generic-definitions? [16:55:45] I guess I should look at how other things that use /home are set up [16:55:51] ah no [16:55:52] yeah ok [16:55:53] will look [16:56:00] adding new info to RT ticket [16:56:11] ok [16:57:10] um, Q [16:57:23] is htdocs on nfs1 /home chnaged by anything else? [16:57:27] or is that all entierly erics? [16:57:33] see in class nfs::home there is a nice little class that any host can include to moount from nfs1 [16:57:35] we only need pushes from stat1, right? [16:57:39] without having to know which host it is [16:57:39] we don't have to worry about syncing both ways? [16:58:17] LeslieCarr: are you in the office today? [16:58:29] preilly: nope, sniffles + sore throat [16:58:35] figured i wouldn't spread it around [16:58:43] LeslieCarr: are you working today from home? [16:59:09] LeslieCarr: do you mind pushing this change out https://gerrit.wikimedia.org/r/#change,5811 the same way that you did the last time? [16:59:58] I don't know where a geeneric "primary /home server" definition would go [17:00:21] preilly: pushign it out, then clearing the mobile varnish cache iirc ? [17:00:29] or was it just pushing out and reloading varnish [17:00:38] LeslieCarr: yes [17:00:55] LeslieCarr: push purge reload [17:01:12] ok, bbiab then i will get this [17:01:33] LeslieCarr: how long is a bit? 
[17:03:06] now [17:03:48] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5811 [17:03:51] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5811 [17:05:04] updating the mobile caches now ... [17:05:33] LeslieCarr: okay cool [17:05:44] ottomata: it looks to me like this data in the one directory is not generated or touched by anyone else, [17:05:51] ok cool [17:06:11] !log purging mobile cache [17:06:14] Logged the message, Mistress of the network gear. [17:06:14] i.e. it's currerntly maintainted from bayes and hence woould now be generated locally and pushed out from stat1 [17:07:28] Hey ops folks, does anyone here have intimate familiarity with memcached? [17:07:42] !log reloaded mobile varnish configs [17:07:44] Logged the message, Mistress of the network gear. [17:07:50] preilly: how's it look ? [17:07:52] I'm getting strange results running mctest.php and I figure either there's some terrible network issue, or memc is really full and turnover is high [17:08:08] LeslieCarr: looks good [17:08:24] RoanKattouw: you can be sure that turnover is high if running against prod [17:08:29] ottomata: are we good? cause I am thinking abou afk-ish for the day [17:08:34] it's after 8 pm here [17:08:47] RoanKattouw: where's mctest.php? i haven't seen it [17:08:54] maintenance I think [17:08:59] Yes, maintenance dir [17:09:02] I've live-hacked it now [17:09:10] See /home/wikipedia/common/php-1.20wmf1/maintenance/mctest.php [17:09:24] Essentially what it's doing is setting test1 to 1, test2 to 2, etc up to 100 [17:09:32] Then it tries to read those keys back in that same order [17:10:01] So if there's an LRU turnover issue, the window for those things being evicted is very small [17:10:25] But when I run it repeatedly, I always get a few servers that have low success rates [17:10:40] Just now I got a 27%, 22% and two zeroes [17:11:32] i'm seeing that too [17:11:57] ( Using mwscript mctest.php --wiki=enwiki | grep -v 'get1: 100' ) [17:12:10] with different sets of servers returning 0 each time [17:12:12] But it's different boxes every time [17:12:15] apergos, sorry in meeting... [17:12:20] will be out and read in a sec [17:12:25] And it's usually zero but not always, sometimes it's 15 or whatever [17:12:51] e..g 10.0.2.233:11000 set: 100 get1: 45 incr: 96 get2: 0 time: 0.25783896446228 [17:13:06] Turnover shouldn't be such that 55% of my entries are gone after 1/4 of a second, right? That'd be insane [17:13:31] binasher: BTW do we have any percentile data for memc in graphite? [17:14:18] asher@fenari:~/wmf-config$ alias t='echo "stats" | nc -q1 10.0.11.25 11000 | grep evic' [17:14:18] asher@fenari:~/wmf-config$ t ; sleep 20 ; t [17:14:19] STAT evictions 1172358 [17:14:20] STAT evictions 1172561 [17:14:35] 609 evictions per minute on that one server based on a 20 sec sample.. [17:15:16] That seems excessive [17:15:38] what are the values going into keys in this test? [17:15:45] Integers [17:15:50] test1=1, test2=2, ... 
, test100=100 [17:16:06] Then after 100 set operations, it starts back at test1 and does 100 get operations [17:16:16] wiring individual mgmt cables takes forever [17:16:20] 3 of 8 done =P [17:16:23] Then 100 incrs (test1+=1, test2+=2, etc), then 100 gets to check that test$i == 2*$i [17:19:45] binasher: OK if you don't mind I'm gonna collect the #evictions for all memc boxes [17:19:53] See if some are affected worse than others [17:20:05] RoanKattouw: i don't see a ttl in mctest.php - does mw set a default? [17:20:17] Default = 0 = forever AFAIK [17:21:45] RoanKattouw: we are sadly sadly lacking in memcached statistics. ryan and i have both lamented the fact and wanted to install a ganglia module (there are a few that track everything we'd ever want) but having memcached on a subset of the app servers where that subset isn't defined in puppet turned into "well.. we'll have a dedicated memcached cluster soon.." [17:22:11] yeah, I did try at some point to make a module [17:22:13] Right [17:22:21] I gave up half way through to work on something else [17:22:55] i think there's a module or two on github we could just use [17:23:28] but half the servers it would be reporting on wouldn't be in use so the aggregate stats would be useless [17:23:32] 0 is indeed infinity [17:23:34] " is expiration time. If it's 0, the item never expires (although it may be deleted from the cache to make place for other items)." [17:24:08] binasher, I think you'd want per-node data [17:24:27] of course you would [17:24:31] then begin investigating if one of them suddenly has a much bigger miss rate [17:25:18] * jeremyb is thinking about how to puppetize it [17:25:32] you could have a custom fact for is currently in rotation [17:25:35] you'd want lots of things, including never running memcached on servers running php/apache [17:25:37] and set role based on that? [17:26:19] (where role is memcached in rotation or memcached out of rotation) [17:28:34] RoanKattouw: there's something wrong with mctest.php / mediawiki [17:28:55] apergos, if you are still there [17:29:00] yeah [17:29:03] if I'm going to do an rsync to nfs1 [17:29:05] i was running "tcpdump -A -s1500 host 10.0.11.37 and port 11000" on a run where I got "10.0.11.37:11000 set: 100 incr: 95 get: 0 time: 0.69881200790405" [17:29:06] what user do I do it as? [17:29:10] I don't have access myself [17:29:41] the tcpdump shows that all the operations succeeded and the correct values were returned for each set! [17:29:57] can't an rsync conf file go in with a stanza and a specified user? [17:30:29] might as well be ezachte, I wonder if that user exists there [17:31:01] no. ugh [17:31:07] RoanKattouw: and "stats items" on 10.0.11.37 shows that slabs 3,4,5,6 have had no evictions or out of memory situations, only larger slabs. [17:31:12] i think the rsync conf files are only if we run rsync daemon [17:31:13] no one does [17:31:15] which is fine, we could do that [17:31:23] so adding him isn't awesome [17:31:25] That's really damn scary [17:31:43] slab 6 is for 96 byte items [17:31:44] I am going to retry this with my patched script, because it checks the sets and incrs separately [17:31:56] so all of the mctest keys should go in there or lower, right? [17:32:00] is there a user there that I could rsync as? [17:32:08] what user owns the htdocs dir on nfs1? [17:32:40] ottomata: ezachte:wikidevs [17:32:56] ezachte has the wikistats dir [17:33:14] i guess it could be host based [17:33:32] the ssh key? 
[17:33:33] yeah [17:33:35] yeah, each set command says to allocate 1 byte [17:33:42] so this dir: [17:33:45] /home/wikipedia/htdocs/wikipedia.org/wikistats [17:33:50] uh huh [17:33:55] on nfs1, is owned by ezachte? [17:33:59] uh huh [17:34:33] RoanKattouw: i've checked a few servers that returned 0 from the test script, and all look ok on memory with 0 evictions from the tiniest few slabs [17:34:34] oh, so I can just rsync ssh as ezachte? [17:34:46] if we set up keys? [17:34:58] well the exachte user doesn't exist on nfs1 and probably shouldn't [17:35:08] how is it owned by ezachte then? [17:35:16] it's owned by that uid [17:35:19] ah [17:35:28] but that uid does not have a user [17:35:30] which is an unknown uid on nfs1 [17:35:41] right. but the information is still there in the inode [17:35:53] he had it mounted on bayes right [17:35:56] where he is a user [17:35:57] man, so, when we did puppet on our cluster at couchsurfing, we made sure that all users everywhere existed, and that all had the same uid everywhere [17:36:12] and then just granted access either with passwords or keys as needed [17:36:16] well there is a central place where we define users in puppet [17:36:18] yeah [17:36:29] but as far as having them exist everywhere, there's the access question [17:36:36] ok, can we make a wikidev user that is not associated witha peson? [17:36:58] as long as the user doesn't have a pw or authorized key, they shouldn't be able to access [17:37:08] and puppet will not add either of those by default…oh but ldap [17:37:10] hm yeah [17:37:14] anyway [17:37:22] wikidev user? [17:37:28] or something like that [17:37:29] ? [17:38:03] or maybe an existing user that we can add to wikidev grou [17:38:10] and chgrp and g+w on that dir? [17:38:25] the memcached slab under most pressure is 163 which allocates 304k per item. there, the lru is evicting things that haven't been accessed in 18 minutes [17:39:19] I don't know what a good approach is here [17:39:24] everwhere else, keys are lasting, well, longer than that at least [17:39:46] Q: how are deploys of other mediawiki sites currently handled? [17:39:50] who are the files owned by? [17:40:21] when we deploy mediawiki generally, there's a pile of scripts just for thta, ddsh is used to run commands that pull locally from the given host [17:40:40] it's done by "mwdeploy" I think (if that ever got completed) [17:40:56] there's an su to that user buried in one of the scripts iirc [17:42:32] scap and a pile of the sync-* scripts all rely on this mechanism [17:42:48] you can see those in the puppet repo, just grep -r for mwdeploy [17:42:52] binasher: 18 minutes? really? [17:43:42] Well 18 minutes >> 0.25 s [17:43:49] So yeah something's wrong with MediaWiki [17:44:13] Ryan_Lane: yup! STAT items:163:evicted_time 1081 [17:44:34] thats seconds since the most recently evicted item was requested [17:44:40] bah [17:44:46] for most keys, it's 9 hours [17:45:36] ok, does mwdeploy exist on nfs1? [17:47:46] no [17:47:51] it's not a deployment host [17:48:32] we are kind of turning it into one, no? [17:48:34] there's no user nore role accounts, just service accounts like ganglia, nagios, etc. [17:48:35] I assume so yes [17:48:44] (ww sorry) [17:49:01] i can't find a defintion for user { "mwdeploy" in puppet [17:49:15] find for mwdeploy only returns uses, not users [17:49:34] and yet it is a real user [17:49:45] aye, how's it get created? [17:49:54] anyway, the solution is to have a user that can ssh in and rsync to that dir on nfs1, right? 
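The same spot check binasher's stats alias does above, plus the per-slab view used for the eviction numbers; the IP and port are the ones from the discussion and should be swapped for whichever node is under suspicion:
    echo "stats"       | nc -q1 10.0.11.25 11000 | grep -E 'evictions|curr_items|get_(hits|misses)'
    echo "stats items" | nc -q1 10.0.11.25 11000 | grep -E 'evicted|age'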
doesn't matter what user it is [17:50:11] or we can do it the other way? [17:50:20] rsync pull on nfs1 from stat1 as ezachte [17:50:29] then we don't have to create special accounts on nfs1 [17:50:42] but, then ezachte loses manual deploy ability [17:50:48] well erik wants to be able to run it at will [17:50:52] right [17:51:00] also, more questions from ez in rt ticket: [17:51:09] I think before we add accounts to nfs1 I would ask mark to weigh in [17:51:09] Can I specify source and target folder? [17:51:09] There are thousands of html files, partly also in 28 languages. [17:51:10] Sometimes I want to update one single file quickly and see results. [17:51:26] Right now I keep track of progress of jobs with html file updated every 15 minutes or so: [17:51:26] http://stats.wikimedia.org/WikiCountsJobProgressCurrent.html [17:51:31] Is there a solution for that, running that sync job every 15 minutes is not as intended I assume. [17:52:14] rsync would allow an rsync of a single file of course, or a directory [17:52:40] as long as the tree itself is exposed as an rsync module in rsync.conf (say) or whatever [17:52:45] right, but i was just going to write a script for him to 'deploy' [17:53:03] uh huh [17:53:15] also, if he updates the files every 15 minutes to see the status of the jobs [17:53:17] that is pretty annoying [17:53:21] possible, but annoying [17:53:39] well it's one file [17:53:40] oo, we could run an rsync daemon module on stat1 as some user there [17:53:53] insead of a cp he does a run-this-script [17:54:12] yeah, and the rsync would catch it [17:54:25] i could just set up cron to 'deploy' every 15 mins [17:54:36] ok so the whole dir is 32gb [17:54:37] sorry [17:54:46] rsync daemon module on nfs1* [17:54:47] is what i meant [17:54:52] um [17:55:00] I dunno how many files [17:55:05] 10.0.5.8:/home/wikipedia/htdocs/wikipedia.org/wikistats [17:55:07] is 32GB? [17:55:14] so I dunno how loong it takes rsync to walk the entire tree [17:55:16] or /home/wikipedia? [17:55:17] yes, 32 gb [17:55:19] geez [17:55:21] for his dir [17:55:26] can we just do nfs ? [17:55:35] would be way simpler! [17:55:44] i can put comments in that we don't like it and that it is temporary [17:55:45] yes and then we would be in the same hole we are now [17:55:49] what hole? [17:56:04] where we're trying to get off of nfs and yet we haven't [17:56:16] well, we are going to replace this entire system eventually [17:56:24] so we won't need it at all in 6mo - 1 year [17:56:24] you can put it in there only if you ask mark tomorrow if he has any brighter ideas [17:56:53] if I wake up tomorrow a little less brain dead and think of something I will let you know also [17:57:02] this ticket is just so ezacthe can run his report card and stats.wm.org scripts on the new stat1 rather than the old bayes [17:57:11] ottomata: that's a very optimistic prediction [17:57:18] 6mo - 1y? [17:57:19] (I wasn't going to say it) [17:57:20] probably [17:57:22] hehe [17:57:26] yes [17:57:36] but i'm trying to convince them to let me just do nfs! don't make my argument weaker! [17:57:37] (everything temporary is long term. 
that's why I want to make sure there is followup with someone more clueful tomorrow0 [17:58:27] RECOVERY - mysqld processes on db57 is OK: PROCS OK: 1 process with command name mysqld [17:58:43] apergos, ok I will put a note on the RT ticket and work on som eother stuff for hte rest of the day :) [17:58:50] thanks for your help, will bug you and mark again tomorrow [17:58:52] I still don't see why hosts allow can't be made to work somehow [17:59:08] it won't be perfect but we're only talking about the one dir [17:59:15] for rsync? [17:59:18] we could make it work i think [17:59:21] uh huh [17:59:25] even with just an rsync module (and maybe hosts allow) [17:59:36] then hand him his script and profit [18:00:20] I need to make dinner. this means I need to do some dishes first... it's 9 pm so I really want to be gone. [18:00:30] ok? [18:00:57] thanks apergos! [18:01:00] jaaa, ok [18:01:06] no probs, appreciate the help [18:01:09] ok [18:01:12] i got other stuff to work on for the rest of the day so no prob [18:01:16] we can talk more tomorrow [18:01:17] ok [18:01:32] * apergos is (finally) off the clock! [18:01:50] byeeeeeee have a good dinner! [18:02:21] PROBLEM - MySQL Replication Heartbeat on db57 is CRITICAL: CRIT replication delay 40110 seconds [18:03:52] hi domas, to continue the conversation about webstatscollector, (and forgive my lack of knowledge of berkekely-db) if the db is basically all in memory, do you still need to set the DB_CREATE and DB_TRUNCATE flags when you open the handle? [18:15:15] PROBLEM - MySQL Slave Delay on db57 is CRITICAL: CRIT replication delay 39954 seconds [18:21:17] https://twitter.com/#!/DEVOPS_BORAT [18:22:28] dk [18:22:31] K4-713: a [18:22:35] Fail, sorry [18:22:38] diederik: you're a bit late ;) [18:22:44] i always am [18:23:16] diederik: you should still pass those flags [18:23:19] * K4-713 wanders through, looks vaguely confused [18:23:32] binasher: ok, thanks [18:48:20] don't know if this has been reported [18:48:27] but I can block account names that don't even exist [18:48:28] https://en.wikipedia.org/w/index.php?title=Special%3ALog&type=&user=&page=User%3AThehelpfulone+is+the+evil&year=&month=-1&tagfilter=&hide_patrol_log=1&hide_review_log=1 [18:48:54] this could be something for #-tech actually [18:49:33] probably :) [18:56:35] starting innobackupex from db1017 to db60 for new s1 slave [18:56:42] !log starting innobackupex from db1017 to db60 for new s1 slave [18:56:44] Logged the message, notpeter [18:56:49] woo woo [18:59:43] could someone respond to DaBPunkt on #wikimedia-tech? [19:00:28] (DaBPunkt's question is about mailing list administration) [19:03:18] robla: people are responding to him> [19:03:19] ? [19:03:31] yup, all's good, thanks! [19:05:37] RECOVERY - MySQL disk space on db59 is OK: DISK OK [19:13:31] hi guys, i have a puppet Q [19:13:43] i need to set up a clone of mediawiki core (+ schedueld pulls) on stat1 [19:14:06] i could just use git::clone + cron in a class in statistics.pp [19:14:17] !log starting innobackupex from db12 to db59 for new s1 slave, per mr. feldman's directions [19:14:20] Logged the message, notpeter [19:14:25] but, it would seem cleaner if I created a mediawiki::clone class in mediawiki.pp [19:14:36] so that in the future if someone else wants a clone of mediawiki, they could just include the class [19:15:02] i want to parameterize the class so that users can specify dest path, branch, and maybe if it should be pulled regularly (maybe not) [19:15:35] should I do that? 
or should I just keep my mediawiki git:clone specific to statistics stuff in statistics.pp [19:15:40] instead of making it generic? [19:16:51] now is when I type some irc screen names to try to make people read what I just wrote :) Ryan_Lane LeslieCarr mutante notpeter [19:18:00] hehe, damn you ottomata it works [19:18:22] RECOVERY - MySQL Replication Heartbeat on db57 is OK: OK replication delay 0 seconds [19:18:22] i knew it! [19:18:34] expand existing, I would say [19:18:40] RECOVERY - MySQL Slave Delay on db57 is OK: OK replication delay 0 seconds [19:18:45] (if I understand correctly [19:18:54] expand existing == make it generic? [19:18:58] yeah [19:18:58] if I do, you would be able to do [19:19:13] so making a new one is tempting, then we can actually proper puppetize mediawiki instead of doing the ghetto roleouts …. [19:19:32] class mediawiki::clone { 'blabla': path => '/bla/bla', branch => 'nonya' } [19:20:03] ok, will do, people can tell me not to in code review if they don't like it [19:20:16] oh man, I'm confuzzled. listen to LeslieCarr :)_ [19:20:19] thanks for the encouragement! [19:20:24] hehehe [19:20:31] always listen to me! mwhahaha! [19:20:36] will do. [19:20:54] oh, LeslieCarr [19:20:59] what is a good default clone path? [19:21:16] or i could not use a default [19:21:31] but if I ahd a default, then you could just do include mediawiki::clone [19:22:06] ottomata: I started this recently, then decided it would take me way too long and abandoned it :) [19:22:11] but yes, that would be nice [19:22:20] um, i'm not sure... [19:22:37] Ryan_Lane: happy to do it [19:22:41] \o/ [19:22:50] where's a good default path though? [19:22:57] /var/www/mediawiki? [19:22:59] something like that? [19:24:40] hrm, [19:26:55] sure [19:26:58] we can always update it [19:27:03] or, maybe i'll just use namevar for the path? like file? [19:27:08] that way it always needs to be specified anyway [19:27:11] ahhh, naw [19:27:16] that's fine for defines [19:27:21] but i don't like that for paramaterized classes [19:27:32] i'd like it if all classes *could* be used with include [19:27:36] rather than class { "…": [19:27:48] ok, i'll make it /var/www/mediawiki [19:29:53] hm, or maybe I will make it a define, hrrrm [19:29:58] agh, i dunno, class for now :p [19:31:55] one more Q for Ryan_Lane and LeslieCarr [19:32:21] do we often have more than one mediawiki running on the same host out of different directories? [19:33:21] i believe there is often a copy of the old mediawiki software (or new one) so that there's not two instances running but two copies on one machine [19:34:00] Yes, that's right [19:34:06] We almost always have two different versions running [19:34:17] Between like an hour ago and Monday morning, we only have one though [19:34:58] so it could be useful to have 2 different directories cloned from the same origin (maybe at different branches?) [19:39:22] PROBLEM - mysqld processes on db60 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [19:41:09] LeslieCarr: can you rollback that varnish config [19:43:09] awww, ok [19:47:39] haha [19:47:40] # Puppet Name: restartpuppet [19:47:40] 37 2 * * * /etc/init.d/puppet restart > /dev/null [19:48:31] LeslieCarr: thanks! [19:57:20] Ryan_Lane: following up on session replication from other channel: write session to db + local memcache on creation. check db on memcache miss, insert to memcache if valid. periodic async job to delete expired sessions. 
no extra db queries during page views, just one extra replace query on login, where we already are doing a write to update a last login timestamp which this could be combined with. i've implemented this a [19:57:21] php site before (written all the code.) its simple and works. [20:12:24] binasher: sounds good to me [20:21:04] PROBLEM - Apache HTTP on mw24 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:22:34] RECOVERY - Apache HTTP on mw24 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.041 second response time [21:12:36] preilly: sorry, got called by a vendor [21:12:50] New patchset: Lcarr; "revert requested by preilly Revert "Modify to allow carrier testing for Tunisia and Camerron"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5857 [21:13:06] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5857 [21:14:31] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5857 [21:14:33] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5857 [21:21:12] !log clearing mobile varnish cache [21:21:14] Logged the message, Mistress of the network gear. [21:21:38] !reloading varnish on mobile caches cp1041 cp1042 cp1043 cp1044 [21:22:02] * RoanKattouw hands LeslieCarr a !log [21:22:20] !log reloading varnish on mobile caches cp1041 cp1042 cp1043 cp1044 [21:22:22] thanks [21:22:22] Logged the message, Mistress of the network gear. [21:23:25] preilly: done [21:30:54] New patchset: Ottomata; "mediawiki.pp - added define for mediawiki_clone." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5858 [21:31:12] New patchset: Ottomata; "Cloning mediawiki into /a/mediawiki on stat1. This uses the new mediawiki_clone and depends on the parent commit." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5859 [21:31:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5858 [21:31:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5859 [21:32:29] would love review of those, although I imagine reviewing the commit that adds the mediawiki_clone define will require more than a passing glance [21:32:42] let's see who will I poke [21:32:44] hmmm [21:33:06] Ryan_Lane, since he said he was going to do this himself once [21:33:45] hm [21:33:46] trunk? [21:34:00] ah [21:34:04] you can specify the branch [21:34:20] yeah, it does what git::clone does by default [21:34:32] just passes the branch arg along to it [21:35:51] LeslieCarr: wow, talk about bureaucracy (re:juniper) [21:36:42] ottomata: wouldn't this use the trunk branch> [21:36:45] err [21:36:46] well [21:36:47] master [21:37:25] i think when you clone [21:37:29] it does master (or whatever?) by default [21:37:33] usually when you clone [21:37:37] on the cli [21:37:41] you don't bother specifying, right? [21:37:41] so [21:37:47] git clone http://....core.git [21:37:53] and master will be default [21:37:55] checked out [21:38:07] probably different, for example, for puppet repo [21:38:12] since there isn't a 'master', just 'production' [21:38:13] so [21:38:22] git clone http:://.…puppet.git [21:38:31] would end up with 'production' being checked out by default? 
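
(On the question just above: a plain clone checks out whatever branch the remote's HEAD symref points at, so a repository whose HEAD is production gives you production even though no master exists. Two ways to see or sidestep that; the --symref flag needs a reasonably recent git, and the /r/p/ anonymous URL form is an assumption:)

    # ask the remote what its HEAD points at (i.e. what a bare clone will check out)
    git ls-remote --symref https://gerrit.wikimedia.org/r/p/operations/puppet.git HEAD
    # after cloning, the same answer is recorded locally:
    git symbolic-ref refs/remotes/origin/HEAD
    # or avoid relying on the remote default entirely and name the branch:
    git clone -b production https://gerrit.wikimedia.org/r/p/operations/puppet.git
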
[21:39:00] I could make $branch = 'master' by default [21:39:10] which would be fine for mediawiki_clone define [21:39:19] since we know that mediawiki core has a master [21:39:25] but it shouldn't matter, right? [21:39:48] paravoid: shit i hadn't checked that out yet [21:39:49] grrrr [21:42:45] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [21:43:32] * halfak moved [21:43:32] ottomata: well, is this going to fully set up mediawiki? [21:43:43] ottomata: becuase it should use the current stable version by defaulty [21:43:47] unless you ask for master [21:43:53] or a different version [21:44:38] hello halfak, checking this out now [21:46:52] LeslieCarr: brb [21:47:53] New patchset: Lcarr; "adding halfak to admins::mortals RT 2707" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5861 [21:48:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5861 [21:48:52] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5861 [21:48:55] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5861 [21:49:23] Ryan_Lane [21:49:24] naw [21:49:31] i think erik just uses some of the code there for analytics [21:49:49] we can build something on top of mediawiki_clone later if we need that, right? [21:49:55] mediawiki_site [21:49:56] or somethign [21:50:06] it will need to probably set up much more than just the clone [21:50:08] LeslieCarr: back now [21:50:11] i think erik wants the lastest [21:50:16] Any luck? [21:50:19] he might even do some analysis on the codebase tself [21:52:17] halfak: try again [21:52:35] hence the cron in the next commit [21:52:38] that is pulling once a day [21:55:08] !log pushing dns update for scs-c1-eqiad and ps1-c#-eqiad [21:55:11] Logged the message, RobH [21:55:16] I'm in. Thanks LeslieCarr! [22:00:03] ottomata: this is going to run in production? [22:00:07] or in labs [22:00:07] ? [22:00:18] in production, we want to use a stable build [22:00:40] though I guess we are doing review before merge now [22:00:50] so it may be sane to use master [22:01:23] production, but the mediawiki_clone is just whatever you checkout [22:01:38] if you wanna comment on what erik Z should use, that is the next commit [22:01:53] where I am actually using the define and checking out master (by default) [22:02:25] i don't think I should change the default branch for the define itself [22:05:46] * Ryan_Lane nods [22:06:31] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/5859 [22:07:11] cool, tanks [22:07:15] will ask Erik Z about that [22:07:23] running low on battery [22:07:25] think i'm out for the day [22:07:29] * Ryan_Lane nods [22:07:32] later [22:07:34] thanks for the help evwybody! [22:11:31] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [22:14:22] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [22:14:55] !log restarted swift-container-auditor on ms-be3 [22:14:56] binasher: stumbled into my first occasion to put you on the reviewer list for a review: https://gerrit.wikimedia.org/r/#change,5803 [22:14:58] Logged the message, Mistress of the network gear. 
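
(Circling back to the session-replication outline binasher gave around 19:57: a minimal sketch of that write-through pattern, with made-up table, column, and cache-key names and a generic PDO/Memcached pairing rather than anything MediaWiki-specific:)

    <?php
    // write path: on login, seed memcached and do the one extra REPLACE
    // (which could be folded into the existing last-login-timestamp write)
    function sessionWrite($id, $data, Memcached $mc, PDO $db) {
        $mc->set("session:$id", $data, 3600);
        $stmt = $db->prepare(
            'REPLACE INTO user_session (sess_id, sess_data, sess_touched) VALUES (?, ?, NOW())');
        $stmt->execute([$id, $data]);
    }

    // read path: normal page views hit memcached only; the db is consulted on a miss
    function sessionRead($id, Memcached $mc, PDO $db) {
        $data = $mc->get("session:$id");
        if ($data !== false) {
            return $data;
        }
        $stmt = $db->prepare('SELECT sess_data FROM user_session WHERE sess_id = ?');
        $stmt->execute([$id]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        if ($row !== false) {
            $mc->set("session:$id", $row['sess_data'], 3600);  // repopulate cache if still valid
            return $row['sess_data'];
        }
        return false;
    }

    // expiry never happens in the request path; a periodic async job runs something like
    //   DELETE FROM user_session WHERE sess_touched < NOW() - INTERVAL 1 DAY
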
[22:23:11] robla: thanks - i'm going to need to look at all of afv5 in some depth to understand it all. seeing scripts that do IR score calculation in sql and and blocking insert into .. select queries that have to run on the master while locking the right side table for the duration look troublesome right off the bat. [22:29:00] Yeah there's some weirdness there [22:29:18] But there's nothing in the web-facing code that does anything scary as far as I'm aware [22:31:56] ah, word from Erik Z, I can use stable branch [22:32:05] Ryan_Lane, is there such a 'branch' in git repo? [22:32:16] should be [22:32:36] origin/REL1_19 [22:32:37] 'stable'? [22:32:39] ah, hm [22:32:39] I'd imagine [22:32:44] git branch 0r [22:32:45] so I have to make it fancy when it changes? [22:32:45] err [22:32:47] git branch -r [22:32:57] yeah see that [22:34:08] oo [22:34:12] git symbolic-ref [22:34:12] ? [22:36:54] Could someone copy some tar files to dataset2 /data/xmldatadumps/public/mediawiki and extract them for me please? [22:37:16] http://noc.wikimedia.org/~reedy/upload-1.17.4.tar http://noc.wikimedia.org/~reedy/upload-1.18.3.tar http://noc.wikimedia.org/~reedy/upload-1.19.0rc1.tar [22:37:38] Ryan_Lane, don't know if we can do this in git [22:37:45] eh? [22:37:50] You can use branches [22:37:52] but would it be possible to maintain a 'stable' branch that points to one of the releases? [22:37:52] what do you mean? [22:38:08] Just use a specific release [22:38:11] git checkout -b REL1_19 origin/REL1_19 [22:38:13] and change branches when a new stable comes out [22:38:21] boo, i want it to be automatic [22:38:25] it can be [22:38:26] i don't want to change puppet [22:38:35] it doesn't happen that frequently [22:38:37] sec [22:38:43] You'd also need to be running update.php aswell then [22:38:46] you'll need to do that anyway [22:38:47] yeah, but it still should be automatic [22:38:48] exactly [22:38:52] its not running mediawiki for real [22:39:00] erik z is just analytzing translations [22:39:03] from the codebase [22:39:15] oh [22:39:21] I thought it would be running it [22:39:35] naw he's just analyzing it for stats.wm.org [22:39:39] oh [22:39:48] then master is likely fine [22:40:05] aye, hm, ok [22:40:06] cool [22:40:16] what do I need to do in gerrit now then? [22:42:54] New review: Ryan Lane; "On further discussion, mediawiki won't be running, so the master branch is fine." [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5859 [22:45:16] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5858 [22:45:19] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5859 [22:45:21] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5858 [22:46:19] ottomata: it's merged in [22:50:46] danke! [22:51:30] cool! [22:56:17] ok, time to head home, its 7pm. [23:00:41] New patchset: Ottomata; "misc/statistics.pp - agh, wrong name on the require => Mediawiki_clone, fixed." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5865 [23:00:50] RobH, thanks for the setup! [23:00:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5865 [23:01:53] agh, Ryan_Lane if you are still around could you approve that one too? [23:02:29] i had a bad string name in there. I'm working with two working copies right now, and had fixed it in wrong one. Should be good now. 
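
(The "maintain a stable pointer automatically" idea was dropped above because MediaWiki never actually runs on stat1, but if it ever came back, one cron-able sketch — assuming the /a/mediawiki checkout from the earlier change and that release branches keep the RELx_y naming — would be:)

    # fast-forward a local 'stable' branch to the newest RELx_y branch on the remote
    cd /a/mediawiki
    git fetch --quiet origin
    latest=$(git branch -r | grep -oE 'origin/REL[0-9]+_[0-9]+$' | sort -V | tail -n 1)
    [ -n "$latest" ] && git checkout -q -B stable "$latest"
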
[23:44:58] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [23:48:26] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5865 [23:48:26] ottomata: done [23:48:29] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5865 [23:48:34] danke! [23:49:10] yw [23:50:40] PROBLEM - Puppet freshness on gallium is CRITICAL: Puppet has not run in the last 10 hours