[02:17:53] !log LocalisationUpdate completed (1.19) at Thu Mar 29 02:17:53 UTC 2012
[02:18:18] Logged the message, Master
[07:01:00] Any dev around? https://fr.wikipedia.org/w/index.php?title=Aide%3ALiens_internes&action=history&year=&month=-1&tagfilter=&deleted=1 makes no sense to me
[07:01:06] no logs of it whatsoever
[07:01:09] and makes no sense
[07:08:43] >
[07:08:43] Maplebed (another wmf-tech – give him/her a beer if you meet him/her)
[07:08:44] started a dump now, so we should have a running/up-to-date DB at the end of
[07:08:47] the week at latest.
[07:08:49] Good night everybody.
[07:08:52] >
[07:08:56] Oh, he's not here.
[07:08:59] I was going to say thank you.
[07:10:58] Poor Joan, when he wants to be kind the ~~victim~~ addressee is missing.
[07:13:38] I sent him an e-mail.
[07:13:49] Ah, technology.
[07:14:24] It took me a minute to figure out what his real name was.
[07:14:28] Googling is hard.
[07:40:50] mutante: apergos: hello. In puppet, do you happen to know how adding a user to a group ?
[07:41:05] I have created a jenkins group in manifests/admins.php but somehow I am not part of that group :D
[07:43:46] if you look at admins.pp you'll see examples
[07:44:01] in general you make a class which assigns a bunch of users to a given group
[07:44:18] that is what I did
[07:44:26] i copy pasted the analinterns group :)
[07:44:43] maybe I did not include that class :-D
[07:44:46] then make sure that the class gets included on the hosts where you want
[07:44:51] probably you forgot to do that
[07:45:51] it is applied. Hmm going to check syslog logs since I have access to them ;)
[08:01:53] hashar: hi, re!
[08:02:02] :)
[08:02:22] I am trying to find out how puppet got confused and do not add me to the jenkins user group on gallium
[08:02:33] though I have a declaration in manifests/admins.pp (at the bottom
[08:02:38] you cant "add existing user to existing group" using puppet methods
[08:02:45] OH MY GOD
[08:02:46] ;)
[08:02:48] besides using an Exec with usermod
[08:03:05] you can just add it to a group when creating the user
[08:03:07] hold on ..
[08:03:09] that explains my issue so :)
[08:04:19] arr, searching for old patchsets in gerrit ..not that easy
[08:04:30] the one I did ?
[08:04:47] owner:hashar project:operations/puppet
[08:05:02] one i made
[08:09:10] https://gerrit.wikimedia.org/r/#patch,sidebyside,2927,1,manifests/mail.pp
[08:10:42] mutante: I guess that could work
[08:10:55] still, it is not supposed to happen on new deployement
[08:11:47] are the class::XXXX just for documentation?
[08:15:12] when i wanted to do that i also found stuff like this: User["defaultaccount"] { groups +> "sshusers" } ,but it didnt seem to work, and when other people asked they usually get "> The issue with the group/user resources is that you cannot manage
[08:15:17] > existing groups/users.
[08:15:35] which class:XXXX ?
[08:15:39] sorry
[08:15:49] the class admins::jenkins for example
[08:15:54] I have added reedy, demon and I there
[08:16:05] but since we all already have an account, it seems to be useless to declare that class
[08:16:21] though it might help if one day we decide to reinstall jenkins from scratch
[08:16:25] (as I understand it)
[08:18:46] hmm, yes, it will ensure that the groups wikidev and jenkins exist, and that those users exist, and if the users dont exist yet it will create them with those groups
[08:19:10] fine
[08:19:10] just if it sees the groups and the users exist, it will not check if the users are in the groups
[08:19:19] or bother to change their grups
[08:19:33] could it be done at the user level ?
[08:19:47] something like: user { name => hashar, ensure => Group['jenkins'] } ?
[08:20:15] worth a try, but i dont know
[08:21:02] you could cheat and remove the user manually temp. then run puppet again ? :p
[08:21:02] that would not
[08:21:17] ensure is just about present / absent / latest I think
[08:25:04] maybe I could add to admins::jenkins class : User["hashar"] { groups +> "jenkins" }
[08:25:45] though we probably want to rewrite the admins stuff to inherits a class that would do that automatically
[08:26:14] mutante: http://dpaste.org/95vnX/ :)
[08:27:55] hrmm, i see "user" has attributes "user_role_add" and stuff, but that is all Solaris stuff, so depends on the provider
[08:28:01] hashar: yea, try it;)
[08:28:10] need to submit to gerrit :)
[08:30:36] mutante: https://gerrit.wikimedia.org/r/3903
[08:36:42] while you are around, I had bots kicked out of this channel :-D
[08:36:54] so this is a quiet place to talk about wm technical issues
[08:37:02] hashar: merged on sockpuppet, run it .. i'll be eager to hear
[08:37:25] if that works i'll replace the Exec i used
[08:41:19] * hashar logs on gallium
[08:41:26] $ groups
[08:41:27] wikidev
[08:41:57] $ grep jenkins /etc/group
[08:41:57] jenkins:x:561:
[08:42:04] we are screwed :-(
[08:43:41] well ok, at least confirmed the "plusignment" stuff doesnt work either, even though you find it in other answers
[08:44:14] gotta use the Exec i guess
[08:44:39] or cheat by removing users manually , then run puppet, in cases where you need to add existing users on boxes that are installed already
[08:44:56] or just manually add users to groups ? :-D
[08:45:03] anything in puppet logs related to it?
[08:45:15] or that
[08:45:33] * hashar looks at syslog
[08:45:41] /var/log/daemon.log
[08:45:49] I can't access that one
[08:45:54] lemme check
[08:46:07] I need to use the GBytes one on fenari :)
[08:46:16] ooh..ok
[08:46:33] oh it is only 21MB !!
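Per this discussion, adding an already-existing user to an existing group through the user/group resources did not work on gallium, so the fallback mentioned is an Exec running usermod, guarded so that it only fires when needed. A minimal Python sketch of that guard, assuming the check reads the jenkins entry of /etc/group (the `jenkins:x:561:` output shows an empty member list). The function names are hypothetical and this is not the manifest used on gallium. The fourth field of an /etc/group entry lists only supplementary members, so a user whose primary gid is 561 would not appear there.

```python
from typing import List, Optional, Tuple


def group_members(group_line: str) -> Tuple[str, List[str]]:
    """Split an /etc/group entry (name:password:gid:members) into name and member list."""
    name, _password, _gid, members = group_line.rstrip("\n").split(":", 3)
    # An empty members field ("jenkins:x:561:") yields an empty list, not [""].
    return name, [m for m in members.split(",") if m]


def usermod_argv(user: str, group_line: str) -> Optional[List[str]]:
    """Return the usermod command to run, or None if the user is already a member."""
    group, members = group_members(group_line)
    if user in members:
        return None  # already converged: nothing to do on this run
    # -a (append) keeps the user's other supplementary groups; plain -G would replace them.
    return ["usermod", "-a", "-G", group, user]
```

In a manifest, the same membership test would typically sit in the `unless` (or `onlyif`) parameter of the `exec` resource, so that repeated Puppet runs stay no-ops.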
[08:47:06] Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate definition: Group[jenkins] is already defined in file /var/lib/git/operations/puppet/manifests/admins.pp at line 65; cannot redefine
[08:47:25] <-- just like i remembered it when trying.. gets you duplicates
[08:47:50] let's revert
[08:49:37] mutante: want me to do the revert?
[08:49:50] yes pls
[08:50:13] you decide if you want to add the Exec instead or just do it manually
[08:50:25] i dont think
[08:50:35] it matters that much
[08:50:57] since this still works when setting up a new box or a new user
[08:51:44] just do it manually please :)
[08:52:51] reversion is https://gerrit.wikimedia.org/r/3904
[08:53:02] http://projects.puppetlabs.com/issues/3556
[08:54:01] so i think if the group "jenkins" would not have been defined before it might have worked with the +>
[08:54:22] but also http://groups.google.com/group/puppet-users/browse_thread/thread/6143acab84a9c0ea
[08:54:25] hrmmm
[08:55:57] mutante: puppet should just reuse whatever group was existing :-D
[08:56:32] Error 400 on SERVER: Only subclasses can override parameters at /var/lib/git/operations/puppet/manifests/admins.pp:1879 on node gallium.wikimedia.or
[08:56:35] ah
[08:58:12] does not make any sense to me :(
[09:04:43] it's related to "inclusion vs. inheritance" / "variable scope, overwrite inherited values"
[09:05:25] http://groups.google.com/group/puppet-users/browse_thread/thread/21caf3f1bb17d3
[09:05:26] * Damianz turns mutante into a pillow
[09:05:29] what I don't understand is that Leslie did apply that change yesterday
[09:07:35] Damianz: good night :P
[09:07:57] Night? It's like 10am :P
[09:09:03] hashar: you dont know why she removed the require => Group['jenkins'],?
[09:09:18] noooo idea
[09:09:23] Damianz: ok, then just "sleep well"?
[09:09:45] I wish :( Going to sniffy here with my cold in work.
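The `Duplicate definition: Group[jenkins]` error reflects Puppet's rule that a compiled catalog may declare each resource, identified by its type and title, only once. A second declaration fails even if it repeats the same parameters. A toy Python model of just that rule, not Puppet's implementation; the class and method names are hypothetical and the source locations are illustrative strings.

```python
class DuplicateDefinition(Exception):
    """Raised when a (type, title) pair is declared a second time."""


class Catalog:
    """Toy catalog: every resource, keyed by (type, title), may be declared once."""

    def __init__(self):
        # (type, title) -> (location of first declaration, parameters)
        self._resources = {}

    def declare(self, rtype, title, location, **params):
        key = (rtype, title)
        if key in self._resources:
            first_location = self._resources[key][0]
            # Mirrors the shape of Puppet's message; identical params would not help.
            raise DuplicateDefinition(
                f"{rtype}[{title}] is already defined at {first_location}; cannot redefine"
            )
        self._resources[key] = (location, params)

    def params(self, rtype, title):
        return dict(self._resources[(rtype, title)][1])


catalog = Catalog()
catalog.declare("Group", "jenkins", "admins.pp:65", ensure="present")
try:
    # A second class declaring the same group, even with the same ensure value.
    catalog.declare("Group", "jenkins", "other-class.pp:1", ensure="present")
except DuplicateDefinition as err:
    print(err)
```

This is why appending to a group from a second class ran into trouble once `Group['jenkins']` already existed in admins.pp. Puppet's override syntax is meant for adjusting an existing declaration, and it carries its own scoping rules, which is what the second error ("Only subclasses can override parameters") is about.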
[09:11:21] hashar: uid=1145(demon) gid=500(wikidev) groups=500(wikidev),561(jenkins
[09:11:28] uid=519(hashar) gid=500(wikidev) groups=500(wikidev),561(jenkins)
[09:11:29] mutante: thanks!
[09:11:44] which command generates that output ?
[09:11:54] id
[09:14:34] mutante: seems to work for me now. thx!
[09:14:44] yw!
[09:16:37] btw, i assume you used "git revert HEAD" for the quick revert?
[09:18:02] I dont remember
[09:18:14] probably git-review -d 1234 && git revert FETCH_HEAD
[09:18:33] ah,ok, thats why i was asking, cause i stopped using git-review again
[09:18:40] 451 git revert 985b21a
[09:18:40] 452 git amend
[09:18:41] 453 git-review
[09:18:42] after i overheard something about issues with it
[09:18:44] <--- I am that lame ;)
[09:18:50] got the sha1 from gerrit
[09:19:05] at least by using sha1 I am sure about what I am going to do
[09:19:21] doesnt seem that lame to me, thx
[09:19:33] yea, makes sure its the right one
[09:24:17] so it looks like I have fixed my issue
[09:24:20] jenkins and I are in the same boat
[09:24:31] which should let me play with git easily ;)
[09:27:34] mutante: thanks :-])))))))))))))))
[09:29:52] :)
[10:21:53] ugh
[10:22:04] running phpunit tests *requires* the ParserTree extension to be installed?
[10:22:06] wtf?
[10:22:13] or at least present
[10:22:34] oh, or maybe it doesn't?...
[10:23:14] whoops, my fault
[10:23:20] hi Raymond_
[10:24:38] hi Daniel_WMDE
[10:29:20] Daniel_WMDE: is one API test giving you an error (not failure)?
[10:29:35] Daniel_WMDE: I got this - http://p.defau.lt/?Lj6yKPGAzbT4vGQeSCwCvw
[10:30:14] i'm just running parser testst atm
[10:30:30] "Somebody needs to finish loving me"
[10:30:32] hehehe...
[10:33:33] yeah this one is also an API test
[10:50:10] what happened to esams bits, one server went down a day ago? load is slow for me from Europe
[10:50:17] bits is slow for me too
[10:52:50] http://ganglia.wikimedia.org/latest/?c=Bits%20caches%20pmtpa&m=load_one&r=week&s=by%20name&hc=4&mc=2 looks even weirder
[11:04:21] Hi!
[11:04:23] I don't know if it is relevant, but when I saved this edit (711 536 bytes long)
[11:04:26] https://pt.wikipedia.org/w/index.php?title=Wikip%C3%A9dia:P%C3%A1gina_de_testes/1&dir=prev&offset=20120329105132&limit=1&action=history
[11:04:27] I got the following error:
[11:04:29] PHP fatal error in /usr/local/apache/common-local/php-1.19/includes/objectcache/MemcachedClient.php line 987:
[11:04:31] Allowed memory size of 125829120 bytes exhausted (tried to allocate 3757034 bytes)
[11:05:40] Irony
[13:17:26] !log reedy synchronized wmf-config/InitialiseSettings.php 'Add participation namespace to metawiki per request'
[13:17:28] Logged the message, Master
[13:36:03] uh...
[13:36:12] wikidown?
[13:36:25] i'm not getting any responsein germany
[13:37:07] Netherlands neither
[13:37:30] Ganglia suggests problems with bits
[13:37:33] according to firebug, the problem is actually bits.wikimedia.org
[13:37:36] and/or uploads
[13:38:42] Daniel_WMDE: esams bits show zero outgoing traffic
[13:38:49] emailed wikitech-l
[13:38:49] * vvv pokes sysadmins
[13:39:03] Do we have an emergency contacts for that?
[13:39:21] RoanKattouw, mutante, ...
[13:39:22] Poke me if any channel topics should be updated
[13:39:46] TBloemink: well, bits on esams are down
[13:39:48] * RoanKattouw is not ops, doesn't know what's going on
[13:39:55] RoanKattouw: http://ganglia.wikimedia.org/latest/?c=Bits%20caches%20esams&m=load_one&r=hour&s=by%20name&hc=4&mc=2
[13:40:05] Ouch
[13:40:13] vvv, well, people can always be SMS'd but it doesn't look like a full emergency. or it is?
[13:40:20] Hmm, Ryan is on vacation, I forget whether Mark is
[13:40:31] Oh, I see
[13:40:37] cp3001 went down somehow
[13:40:46] Why the hell do we have *two* boxes for esams bits?
[13:40:48] MaxSem: well, I believe the Euorpean users are unable to open Wikipedia from browsers now
[13:40:50] Isn't that a bit low?
[13:41:13] Unless they have all bits cached
[13:41:22] So yes, this is emergency
[13:41:44] hahaha, to look up contact list I need to open officewiki first
[13:41:57] cruel fate:P
[13:42:06] hmm, it opens
[13:42:16] Yeah, just slow
[13:42:41] you could just adblock bits, then you wouldn't get js and such, if you were desperate for the date
[13:42:42] i'm not getting a response after minutes
[13:42:46] *data
[13:43:06] p858snake|l: that's not a good response to "wikipedia is offline in europe" :)
[13:43:37] * Reedy pings apergos
[13:43:39] Daniel_WMDE: I was referring more to the MaxSem opening up the private wiki :p
[13:44:44] MaxSem: one doc page suggests that /h/w/doc/contact-details must have stuff
[13:44:44] If you have root
[13:44:44] ideas whom to poke?
[13:45:12] Wasn't Mark the sysadmin located in Europe?
[13:45:20] Yes, and he's on vacation
[13:45:22] Oh
[13:45:30] As is Ryan, and Asher :s
[13:46:00] tim has root doesn't he, but hes not online atm, and its midnight for him
[13:46:12] It's almost 1am there
[13:46:17] We've got non west coast US ops too
[13:46:20] Australia still has DST for a little while longer
[13:46:25] vroadcast everyone? :P
[13:46:25] Yes, Peter is around
[13:46:26] RoanKattouw: I know, Hes +1 me
[13:47:22] how about mutante?
[13:47:30] notpeter...
[13:47:31] *pokedypoke*
[13:47:42] see #wikimedia-operations
[13:47:53] eyah, sorry, in ops
[13:48:04] hm... this is a bit silly, isn't it?
[13:48:18] there are *no* ops people on call?
[13:48:26] Daniel_WMDE: there's at least 3 online now
[13:48:28] Peter is working on it
[13:48:33] ok, good
[13:48:37] thanks for letting us know
[13:48:47] what's the difference between the two channels?
[13:49:03] <^demon> This one has no bots and it's the one we direct people to.
[13:49:14] MaxSem: -operations is supposed to have less offtopic
[13:49:59] looking better now, thanks
[13:50:16] OK, we should be coming back up
[13:51:24] thank you
[14:16:13] RobH: regarding the discussion in ops (not sure, if "outsiders" should talk in there ;)): Wouldn't it be enough to have a nagios server in a second location checking wether alert-location one is available (i.e. reachable) and have those two locations monitor each other? or did you want to be prepared for an outage of your sms gateway provider?
[14:16:51] i suggested that
[14:17:08] we had a long internal discussion, unfortunately i know folsk cannot see that ;]
[14:21:07] you can talk in there, it's ffine
[14:22:18] there are nagio-checks for nagios AFAIR
[15:11:35] Issue at Wikimedia Commons when generating Thumbnails: https://upload.wikimedia.org/wikipedia/commons/thumb/f/fc/Pleiades-venus-jupiter.JPG/1280px-Pleiades-venus-jupiter.JPG Is that known?
[15:12:52] sorry, the first link seems to work
[15:12:57] hmm
[15:12:58] Error creating thumbnail: convert: Output file write error --- out of disk space? `/tmp/transform_3327961-1.jpg' @ error/jpeg.c/EmitMessage/235.
[15:13:02] yep
[15:13:06] apergos: ^ scalers out of disk space?
[15:13:50] srv219 has gone up from 90% to nearly 100 in the last half hour
[15:14:12] Same on all the scalers
[15:15:15] It's known now
[15:15:18] the cro job that clears out tmp runs every 5 minutes.
[15:15:27] it cleans out stuff older then 10 mins I think
[15:16:41] There's a few other small files laying about
[15:16:56] wurfl.xml can go from any apache (not used now)
[15:17:10] mw-cache-1.18 is 366M on 219
[15:17:28] can that go? is everything switched over (including private wikis, test)?
[15:18:21] Yup, everything is now
[16:56:55] RobH: which end of the world is eqiad at? ;)
[16:57:14] ashburn va
[16:57:18] ah
[16:57:20] outside washington dc
[16:57:38] that and tampa fl are the two primary datacenter sites
[16:57:48] ic
[16:57:55] with a caching center in haarlem netherlands and a peering site down the road from the netherlands facility
[16:58:34] T3rminat0r: if you would like to donate datacenter space for a caching site, we would be interested :)
[16:58:57] LeslieCarr: haha, just a student... ;)
[16:59:13] ah too bad :-/
[17:00:13] just wondered at which end of this planet, rob's current "cave" was at ;)
[17:00:27] ah :)
[18:11:00] !log catrope synchronized php-1.19/extensions/ClickTracking/ClickTracking.hooks.php
[18:11:01] Logged the message, Master
[20:05:13] !log Stopping and starting Gerrit on manganese to apply Chad's change of the -1 text in the DB
[20:05:15] Logged the message, Mr. Obvious
[20:28:13] !log reedy synchronized wmf-config/InitialiseSettings.php 'Swap wgUseCommaCount to wgArticleCountMethod'
[20:28:15] Logged the message, Master
[20:54:27] hello
[20:54:42] does anyone know what happened to db47?
[20:56:20] nope, were you using it for something ?
[20:58:10] yes toolserver is replicating s6 from there
[20:58:31] LeslieCarr: do you have a hint for another host to use?
[20:59:25] no, trying to find the shard to db mapper ...
[20:59:42] <^demon|away> db.php?
[21:00:04] ah
[21:00:05] http://noc.wikimedia.org/conf/highlight.php?file=db.php
[21:00:12] thats the correct url?
[21:00:23] it says
[21:00:24] 's6' => array(
[21:00:24] 'db43' => 0, # hw died 12/18/2011
[21:00:24] 'db47' => 1000,
[21:00:24] 'db46' => 400, # snapshot host
[21:00:24] 'db50' => 1000,
[21:00:27] <^demon|away> Yeah, that displays the version currently on the site.
[21:00:35] <^demon|away> LeslieCarr: That file's at fenari:/home/wikipedia/common/wmf-config/ if you need it
[21:00:40] ah ok so i could use db50 too
[21:01:15] hrm, if db46 is supposed to be in rotation … why hasn't nagios yelled at us...
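The s6 entry quoted from db.php maps each database host to a load weight, with db43 set to 0 after its hardware died. A minimal sketch of how such weights can drive read distribution, assuming simple weighted random choice among hosts with a positive weight. This is an illustration only, not MediaWiki's LoadBalancer code; it ignores replication-lag checks and any special handling of the master, and the function names are hypothetical.

```python
import random

# Weights as quoted from db.php for s6 (the config's trailing comments omitted).
S6_LOADS = {"db43": 0, "db47": 1000, "db46": 400, "db50": 1000}


def read_shares(loads):
    """Expected fraction of reads each host receives under weighted random choice."""
    total = sum(loads.values())
    if total <= 0:
        raise ValueError("at least one host needs a positive weight")
    return {host: weight / total for host, weight in loads.items()}


def pick_host(loads, rng=random):
    """Pick one host with probability proportional to its weight; weight 0 is never chosen."""
    hosts = [host for host, weight in loads.items() if weight > 0]
    if not hosts:
        raise ValueError("at least one host needs a positive weight")
    return rng.choices(hosts, weights=[loads[h] for h in hosts], k=1)[0]
```

Under this model, db47 and db50 would each take 1000/2400 (about 42%) of reads and db46 about 17%, while db43 gets none.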
[21:01:57] She was asking about db47, not db46
[21:02:55] i had the feeling db47 was gone because of http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=&s=by+name&c=MySQL%2520pmtpa&tab=m&vn=
[21:03:12] and telnet through our tunnel is not answering really
[21:03:22] makes a connection but no mysql prompt
[21:10:12] am i wrong?
[21:11:27] what should i use as a master position then?
[21:19:30] im going to check out db47, still unsure why it didn't alarm us ...
[21:20:43] LeslieCarr: thanks
[21:20:54] db47 is frozen, i'll reboot it :(
[21:22:44] :-(
[21:23:00] LeslieCarr: about the jenkins group right from yesterday
[21:23:17] LeslieCarr: mutante and I finally did the group add manually :-D
[21:23:36] so the issue is fixed. Thanks for the time you spent on that yesterday afternoon
[21:33:42] fyi nosy no error messages or info as to why db47 locked up … but the last info i can find from it was about 9 hours ago
[21:34:59] LeslieCarr: thanks so far. it seems it still did not really recover did it?
[21:35:30] it should be recovering the db right now ...
[21:35:55] ok then ill wait :D
[21:36:01] RoanKattouw: around?
[21:36:12] Yeah
[21:37:08] a couple of months ago we had a problem on nl-wiki with 'maximum transclusiegrootte', did you follow/see that one?
[21:37:24] No
[21:38:34] ok, there is a limit, on the maximum size of a page when templates are expanded
[21:38:49] and with the monumentlists we are hitting that limit
[21:39:16] I am wondering why that limit is there, and if really need, if it can be raised
[21:39:49] What is this limit set to and how did you manage to hit it?
[21:42:21] https://nl.wikipedia.org/wiki/Categorie:Wikipedia:Pagina%27s_waarvoor_de_maximale_transclusiegrootte_is_overschreden
[21:42:33] not sure what the limit is
[21:42:58] we hit it because every tablerow is a template, and the lists are sometimes quite big
[21:46:24] Akoopal: Eww
[21:46:45] I guess there's not really a better way to do this though
[21:46:55] nope
[21:47:09] You should talk to Tim about this when he's awake
[21:47:14] and the whole wiki loves monuments system has been build on this
[21:47:29] hmm, ok
[21:47:36] he is US?
[21:47:58] Akoopal: Australia
[21:48:02] ahhh
[21:48:24] do you know the english term for this one btw?
[21:48:30] The local time there is 8:48am
[21:48:33] LeslieCarr: how long do you expect db47 to recover?
[21:48:40] Akoopal: Probably "maximum transclusion size"
[21:48:40] RoanKattouw: yeah, I know
[21:49:01] I have 2 direct collegues in AU
[21:49:26] 'Pages where template include size is exceeded'
[21:49:56] ok, thanks
[21:54:23] um, i am not certain, mysql is running again now ...
[21:56:12] but it's currently 34900 seconds behind :(
[21:56:55] nosy: http://nagios.wikimedia.org/nagios/cgi-bin/extinfo.cgi?type=2&host=db47&service=MySQL+Slave+Delay should refresh every 5 minutes
[21:56:59] if you want to see the delay ...
[22:13:14] LeslieCarr: the s6 copy now has another problem
[22:13:26] after restarting mysql
[22:13:27] what is it ?
[22:13:27] Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position'
[22:13:43] Connect_Retry: 60
[22:13:44] Master_Log_File: db47-bin.000144
[22:13:44] Read_Master_Log_Pos: 48644454
[22:14:39] maplebed: my mysql-foo is gone - you have any more ?
[22:15:03] which host is the problem on?
[22:15:08] db47?
[22:15:14] yes
[22:15:54] what master position should i use now?
[22:16:05] db47's replicating quickly … only about 6 hours back now ...
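Akoopal's explanation (every table row is a template, and some monument lists are long) turns the limit into a simple budget. A back-of-the-envelope sketch, assuming the MediaWiki default post-expand include limit of 2048 × 1024 = 2,097,152 bytes; nl.wikipedia's actual setting in 2012 is not stated in this log, and the per-row size is a hypothetical input. As I understand it, the output of nested templates can be counted at more than one level of nesting, so real pages may hit the limit sooner than this estimate suggests.

```python
# Assumption: MediaWiki's default post-expand include limit ($wgMaxArticleSize = 2048 KiB).
DEFAULT_POST_EXPAND_LIMIT = 2048 * 1024  # bytes


def max_rows(bytes_per_row, fixed_bytes=0, limit=DEFAULT_POST_EXPAND_LIMIT):
    """How many row templates fit before the post-expand include limit is exceeded.

    bytes_per_row: hypothetical average expanded size of one row template.
    fixed_bytes:   expanded size of everything else on the page (headers, other templates).
    """
    if bytes_per_row <= 0:
        raise ValueError("bytes_per_row must be positive")
    remaining = limit - fixed_bytes
    # Floor division gives whole rows; clamp at 0 if the rest of the page already overflows.
    return max(remaining // bytes_per_row, 0)
```

For example, rows that expand to 1 KiB each leave room for 2,048 rows on an otherwise empty page; splitting a list across several pages is the usual way to stay under such a budget.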
[22:16:27] seems the last info i have is not usable
[22:17:19] hi folks. who do i ask to (temporarily) raise the permitted number of account creations from one IP? someone is having a workshop for no.wikipedia, and will be creating a lot of accounts from that one IP on April 23
[22:17:26] nosy: mysuggestion: start replication from db47-bin.000145 at position 0, see if it works.
[22:17:31] jsoby: File a site request in Bugzilla
[22:17:36] mysql starts writing to a new binlog when you kick it,
[22:17:44] RoanKattouw, (Y). do i have to know the IP? (I don't)
[22:17:47] so if it crashed when it next started it would have started at position 0 in a new file.
[22:17:48] Yes
[22:17:59] right
[22:18:06] RoanKattouw, ok, will find out. thanks
[22:18:07] Not right now, but you'll have to know it at some point
[22:18:48] exactly
[22:20:19] maplebed: thanks
[22:20:26] nosy: that worked?
[22:20:30] \o/
[22:20:35] the replication works again and the replag looks reasonable
[22:21:30] if it lost a query replication may break again; if it does I recommend telling it to skip 1 query and just keep going.
[22:21:38] if that fails too many times in a row you'll need a fresh dump.
[22:21:46] Seconds_Behind_Master: 36787
[22:22:22] maplebed: dont say things like that :,(
[22:22:30] ;)
[22:22:36] my guess - that'll take about 3hrs to catch up (assuming you've got decent hardware)
[22:22:47] we have excluded a lot of replication error already
[22:23:00] slave-skip-errors=0,1213,1158,1053,1007,1062,1050
[22:23:03] I'm sure you'll be fine.
[22:23:04] thats our config
[22:23:12] think so too
[23:40:10] gn8 folks
[23:49:09] !log aaron synchronized php-1.19/includes/revisiondelete/RevisionDeleteUser.php 'deployed r114619'
[23:49:11] Logged the message, Master
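maplebed's fix rests on two facts from this log: the Toolserver slave was stuck at db47-bin.000144, and MySQL starts a new binlog file when it restarts. The sketch below derives the next binlog name and builds the statements for repointing a slave and for skipping one event. It also extrapolates catch-up time from two lag readings. The SQL follows MySQL 5.x syntax of that era; the function names and the linear extrapolation are assumptions for illustration. Position 0 worked in practice here, but the sketch defaults to 4, the offset of the first event after a binlog's 4-byte header.

```python
from typing import List


def next_binlog(current: str) -> str:
    """db47-bin.000144 -> db47-bin.000145, keeping the zero padding."""
    base, dot, seq = current.rpartition(".")
    if not dot or not seq.isdigit():
        raise ValueError(f"unexpected binlog name: {current!r}")
    return f"{base}.{int(seq) + 1:0{len(seq)}d}"


def repoint_slave(log_file: str, log_pos: int = 4) -> List[str]:
    """Statements to restart replication at the start of log_file."""
    return [
        "STOP SLAVE;",
        f"CHANGE MASTER TO MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos};",
        "START SLAVE;",
    ]


def skip_one_event() -> List[str]:
    """Skip the next event from the master, as suggested if a lost query breaks replication."""
    return [
        "STOP SLAVE;",
        "SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;",
        "START SLAVE;",
    ]


def catch_up_seconds(t1: float, lag1: float, t2: float, lag2: float) -> float:
    """Linear extrapolation from two (wall-clock seconds, Seconds_Behind_Master) readings."""
    if t2 <= t1:
        raise ValueError("readings must be in chronological order")
    shrink_rate = (lag1 - lag2) / (t2 - t1)  # lag removed per wall-clock second
    if shrink_rate <= 0:
        raise ValueError("lag is not shrinking; no finite estimate")
    return lag2 / shrink_rate
```

Fed the two db47 readings from the log (34900 seconds behind at 21:56:12, roughly 6 hours behind at 22:16:05, i.e. 1193 seconds later), `catch_up_seconds` gives about half an hour. That figure is only as good as the "about 6 hours" reading and the assumption that the replay rate stays constant.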