[00:05:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:06:29] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2729
[00:06:30] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2729
[00:11:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.044 seconds
[00:15:06] New patchset: Diederik; "Improvements: 1) IP range filtering and regular expression now work. 2) Started with unit-tests 3) Major refactoring" [analytics/udp-filters] (refactoring) - https://gerrit.wikimedia.org/r/2698
[00:19:08] New patchset: Diederik; "Improvements:" [analytics/udp-filters] (refactoring) - https://gerrit.wikimedia.org/r/2698
[00:27:53] how do we clear a cached dns miss again ?
[00:27:57] (other than just waiting it out ;)
[00:30:36] oh found it :)
[00:34:24] !log reinstalling neon
[00:34:26] Logged the message, Mistress of the network gear.
[00:34:59] New patchset: Lcarr; "Bringing neon back to life" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2730
[00:37:45] New patchset: Ryan Lane; "Use the simple scheduler, rather than chance." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2731
[00:38:20] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2731
[00:38:21] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2731
[00:40:09] PROBLEM - Host cp1017 is DOWN: PING CRITICAL - Packet loss = 100%
[00:41:57] RECOVERY - Host cp1017 is UP: PING OK - Packet loss = 0%, RTA = 26.43 ms
[00:44:57] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:46:00] PROBLEM - Backend Squid HTTP on cp1017 is CRITICAL: Connection refused
[00:46:45] PROBLEM - Frontend Squid HTTP on cp1017 is CRITICAL: Connection refused
[00:50:57] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.045 seconds
[00:55:20] New patchset: Bhartshorne; "syntax corrections, size correction, partition number correction" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2732
[00:55:47] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2732
[00:56:22] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2732
[00:57:45] hi maplebed, would this be a good time to install some software on stat1?
[00:59:36] New patchset: Ryan Lane; "Making labstore1-4 LDAP clients" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2733
[00:59:39] I think I do have some time before the end of the day.
[00:59:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2733
[01:00:00] drdee: I saw the request for virtualenv and pip again
[01:00:14] drdee: is it possible for you guys to list the specific python packages you need?
[01:00:17] as before
[01:00:26] also, the maxmind c library should already be there...
[01:00:27] Ryan_Lane: there's a wiki page and some RT tickets.
[01:00:47] * Ryan_Lane hates virtualenv and pip
[01:00:53] http://www.mediawiki.org/wiki/Analytics/Infrastructure/Stat1
[01:01:34] but I'm with you on virtualenv and pip. There's eternal conflict between languages wanting to manage their modules and distributions wanting to manage packages.
[01:01:41] distributions win, when ops has to manage a system.
[01:01:58] which means if you write software that depends on the language's package management, it's likely that we won't be able to install it easily.
[01:02:09] yes. and virtualenv and pip are a great way to ensure we'll never be able to rebuild a system
[01:02:32] at the same time, using the language's package management to speed development works fine, so long as it's converted to distribution-based package management before getting "deployed".
[01:02:39] RECOVERY - Frontend Squid HTTP on cp1017 is OK: HTTP OK HTTP/1.0 200 OK - 27545 bytes in 0.126 seconds
[01:02:50] (which is total fail when the dev system starts churning out "production" reports, as happens so incredibly frequently.)
[01:02:57] it'll never happen, though
[01:03:13] unlike ryan, I am an eternal optimist.
[01:03:16] :D
[01:03:22] I know better :D
[01:03:51] RECOVERY - Backend Squid HTTP on cp1017 is OK: HTTP OK HTTP/1.0 200 OK - 27400 bytes in 0.178 seconds
[01:04:06] I'm going to add: misc::package-builder
[01:04:15] since they need C, they'll likely need those too
[01:04:39] hm. that may be excessive
[01:04:58] heh. it doesn't even include C libraries
[01:05:03] I'll just include the package class
[01:05:05] for git
[01:12:20] New patchset: Ryan Lane; "Adding git and mysql-client to stat1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2734
[01:12:55] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2734
[01:13:18] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2733
[01:13:19] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2734
[01:13:19] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2733
[01:14:28] Ryan_Lane: virtualenv was requested by Andrew Otto; I understand his reasons but I also understand your concerns
[01:14:54] we gain flexibility and you will be more worried, that's the tradeoff
[01:15:11] well, it's fine for development
[01:15:15] but not for production
[01:15:48] once we are to go to production we can puppetize the whole thing
[01:16:03] why's it being developed on stat1, rather than labs, then?
[01:16:07] does it require private data?
[01:16:11] for development?
[01:16:14] yes
[01:16:21] hm
[01:16:53] Ryan_Lane: you said it should already have the maxmind geoip library? is it getting that not-through-puppet?
[01:16:54] it would be really really awesome if we can have private data on labs :)
[01:17:05] maplebed: I think it was installed manually
[01:17:13] drdee: we're discussing how to implement it
[01:17:18] possibly
[01:17:35] maplebed: we probably need to add classes for it
[01:17:55] puppet on this system is fairly odd
[01:17:55] there's already a generic::geoip class.
[01:18:02] ah
[01:18:20] that would be what's needed
[01:18:21] it installs /usr/share/GeoIP/stuff.
[01:18:26] that's the same thing?
[01:18:27] ok
[01:18:28] yep. adding it
[01:19:05] New patchset: Ryan Lane; "Adding geoip libraries to ensure it's managed by puppet" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2735
[01:19:29] which version of geoip is that?
[01:19:41] whatever comes with ubuntu lucid
[01:19:49] if you need a newer version, we'll need to backport
[01:19:54] huh. https://github.com/rcrowley/puppet-pip
[01:20:02] * Ryan_Lane pukes
[01:20:06] no thanks
[01:20:14] that's like managing ruby gems via puppet
[01:20:38] we don't use third party repos for apt. do we really want to trust one for system libraries? :)
[01:20:57] drdee: 1.4.6.dfsg-17
[01:21:02] that's the version in lucid
[01:21:38] if precise has a newer version, we can upgrade to precise for it, or it can be backported
[01:22:07] if we need an even newer version, it'll need to be backported into a package for whatever ubuntu version we want to use
[01:22:33] it's perfect
[01:22:36] great :)
[01:23:22] does it also install the dev libs of geoip?
[01:23:56] do you need them? was just about to ask that in the ticket
[01:24:26] oh. it does
[01:24:42] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2735
[01:24:42] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2735
[01:24:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:29:49] maplebed, ryan_lane: thanks for helping us out and my apologies
[01:29:58] oh. no worries
[01:30:01] we'll work something out
[01:30:10] hopefully we can work out what you need with ubuntu packages
[01:30:16] if not, we'll take a look at virtualenv and pip
[01:30:24] I'd *really* like to avoid that, though
[01:31:07] as long as very good documentation is kept, it may be acceptable
[01:31:23] meaning, whenever someone calls pip, they document the package they added
[01:31:42] one of the issues I have with pip is that it installs newer versions of python libraries than may be included with lucid
[01:32:08] which then makes code dependent on things we'd have to backport
[01:32:57] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.064 seconds
[01:35:57] New patchset: Bhartshorne; "adding class to install pip for stat1 - dev use only." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2736
[01:36:32] maplebed: I hope you'll backport whatever they need, when we run into that problem, too ;)
[01:36:34] Ryan_Lane: r2736 is a start down that road.
[01:36:54] they need stuff, they backport it.
[01:36:57] why we would need to backport stuff?
[01:37:19] drdee: for instance, if you are using python-blah version 2.3 via pip
[01:37:19] because of what Ryan_Lane just said - pip can get you newer versions than ubuntu.
[01:37:31] and lucid had 1.9
[01:37:43] you'll be using features that don't exist in 1.9
[01:37:45] and depending on them
[01:37:56] but that's why you use virtualenv
[01:38:00] then, when we puppetize it, you are missing the packages
[01:38:07] err, missing the correct version
[01:38:29] drdee: we don't use third party repositories in production, as a rule
[01:38:40] they can't be trusted, so we don't use them
[01:39:08] so, when we puppetize the system, whichever pip installed python libraries you are using need to come from ubuntu, or they need to be packaged
[01:39:22] if you are depending on a newer version of a library, it needs to be backported
[01:39:52] this is why we recommend trying to avoid using pip as much as possible
[01:40:08] first see if it's available as an ubuntu package, then use pip if not, and document it
[01:45:20] ok
[01:45:30] I'm mentioning this in the ticket
[01:47:19] New patchset: Bhartshorne; "adding class to install pip for stat1 - dev use only." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2736
[01:47:36] Ryan_Lane: if I "abandon" a change in gerrit, does it stick around for later viewing?
[01:47:44] yep
[01:47:45] or does it nuke it completely?
[01:47:47] ok.
[01:48:09] I'm going to abandon that change, but it's there for reference if we decide it's necessary later.
[01:48:23] oh. you aren't adding it to stat1?
[01:49:34] I thought you had convinced drdee not to use it.
[01:50:00] well, it's possible they'll actually need it for some things, as much as I hate that
[01:50:12] I suppose I can install it anyways and if they don't need it they won't use it.
[01:50:31] I'm writing a comment to the ticket that's basically "please don't use pip unless you have to use it, and if you do, make sure to document what pip is installing"
[01:50:45] also explaining why we don't want people to use it
[01:50:58] New patchset: Bhartshorne; "adding class to install pip for stat1 - dev use only." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2736
[01:51:14] we need to work out the private data policy in labs :)
[01:51:19] so that this wouldn't be an issue
[01:51:36] well, I guess it really is one anyway
[01:51:43] yup.
[01:51:53] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2736
[01:51:54] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2736
[01:52:07] we should probably write development recommendations for people in labs
[01:52:15] "how to not make ops' life hell"
[01:52:19] heh
[01:52:22] :)
[01:53:02] well, this ticket can act as a starting point for some docs, I guess.
[01:56:12] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 635s
[01:57:06] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 689s
[02:04:24] New patchset: Bhartshorne; "puppet exec enforces full paths; that's cool." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2737
[02:04:45] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:06:30] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2737
[02:06:31] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2737
[02:08:35] https://labsconsole.wikimedia.org/wiki/Help:Development_recommendations_for_easily_moving_to_production
[02:08:52] drdee, maplebed: ^^
[02:10:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.475 seconds
[02:14:00] anyone know where our favicons are stored? (specifically the foundation logo favicon)
[02:15:50] normally at /favicon, no?
[02:16:20] yep /favicon.ico
[02:16:22] it's standard
[02:16:32] Jamesofur: or do you mean on the filesystem on fenari?
[02:16:39] or the filesystem on the apaches?
[02:22:19] Ryan_Lane, I just want to grab the image to make it the favicon on the store. I may be able to grab it on fenari if that's the easiest spot or if it's somewhere in SVN or something where I can grab it
[02:22:44] just /favicon.ico on any site you want to get it from
[02:22:50] Jamesofur: eg http://wikimediafoundation.org/favicon.ico
[02:23:11] thanks!
[02:25:11] puppet on stat1 is spewing out 4700 lines of junk because of file ownership of /a.
[02:25:18] ::sigh::
[02:27:00] New patchset: Bhartshorne; "trying to remove 4700 lines of spam as stat1 tries to manage /a for ezachte." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2738
[02:27:23] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2738
[02:27:23] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2738
[02:42:33] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[03:08:14] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 23s
[03:09:17] RECOVERY - Puppet freshness on mw1002 is OK: puppet ran at Thu Feb 23 03:09:00 UTC 2012
[03:09:35] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 1s
[03:12:26] RECOVERY - Puppet freshness on db46 is OK: puppet ran at Thu Feb 23 03:12:05 UTC 2012
[04:19:07] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 213 seconds
[04:19:52] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 232 seconds
[04:23:19] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 202 seconds
[04:23:46] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 228 seconds
[06:08:29] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[06:14:29] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[06:14:29] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[07:42:06] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 0 seconds
[07:42:15] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 2 seconds
[08:16:06] PROBLEM - Lucene on search9 is CRITICAL: Connection timed out
[08:21:48] RECOVERY - Lucene on search9 is OK: TCP OK - 0.023 second response time on port 8123
[08:58:54] :o
[08:58:58] someone can fix labs pls
[08:59:25] it seems that someone "fixed" firewall or something like that
[08:59:42] it's not possible to connect to public instances
[09:00:06] ssmollett, mut-away...
[09:01:24] I think you'd be more likely to get results if you pinged apergos or mark
[09:07:36] petan|wk: when was this known to be working last?
[09:12:40] ah, forget me :)
[09:12:45] it was my fault heh
[09:12:53] ok then :-)
[10:27:42] New patchset: ArielGlenn; "verify objects; no-derive tag for uploads; success/failure message." [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/2739
[10:27:46] New review: gerrit2; "Lint check passed." [operations/dumps] (ariel); V: 1 - https://gerrit.wikimedia.org/r/2739
[11:33:54] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:35:51] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[12:43:47] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[13:01:09] can someone please have a look at gerrit and review/merge https://gerrit.wikimedia.org/r/#change,2677 (simple URL change)
[13:01:24] does not need puppet to be run right now, it is not that urgent :-D
[13:02:03] I have also submitted two changes to ignore python compiled files ( .pyc ) https://gerrit.wikimedia.org/r/#change,2587 and vim swap files (.swp) https://gerrit.wikimedia.org/r/#change,2587
[13:02:16] the python one is https://gerrit.wikimedia.org/r/#change,2514 sorry :D
[13:48:54] PROBLEM - RAID on search7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:50:42] RECOVERY - RAID on search7 is OK: OK: 1 logical device(s) checked
[16:03:40] mark: hi
[16:08:08] robh: rt 2497...please take a look when you get a chance
[16:08:21] checkin now
[16:09:55] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[16:10:31] cmjohnson1: so its the master, and I guess asher would like us to migrate master to another one (so he would do that)
[16:10:37] but lets see if i cannot get the disk id
[16:10:57] cmjohnson1: disk 10
[16:11:33] cmjohnson1: so normally we can hot swap this, but since asher would prefer we swap master first
[16:11:46] see if you cannot see what disk i mean, and then we can have asher do his thing before we replace
[16:12:33] robh: now that the cable is done (thanks) I can't actually change stuff over there because the dreaded rsync error is back... so I gotta work it out with the other end before switching anything else up :-/
[16:12:41] soooo aggravating
[16:12:54] i do not see any orange indicators on db22
[16:12:58] cmjohnson1: ct asked us to migrate master, not asher, my bad.
[16:13:26] master is usually disk1?
[16:14:00] cmjohnson1: no
[16:14:12] cmjohnson1: ok, disk 10 MAY be flashing an identify led
[16:14:20] but since the disk is bad, it may not be taking this command.
[16:14:36] do not swap drives on a db master
[16:15:16] mark: wasnt going to since ct asked us not to.
[16:15:25] robh: all are flashing but one ...i am checking disk diagram to confirm disk 10.
[16:15:28] just trying to see if the drive id command is working
[16:15:35] cmjohnson1: dont pull anything though
[16:15:55] mark: can you check and see if there any connections needed to scs-c1 to row b in pmtpa. I am moving scs-c1 to d1 today
[16:15:55] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[16:15:55] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[16:15:59] robh: no
[16:16:11] ok, well, i assigned the ticket to asher
[16:16:19] he can migrate which is master and assign back for further work
[16:16:23] for now its going to wait
[16:16:29] ok...so we'll standby
[16:16:33] it seems it wont take the id command on the bad disk, so later when we can do it
[16:16:39] i can id the good disk above it, then below it for you
[16:16:51] ok..that is great...
[16:16:53] then we can pull the bad disk while its up to see if we got it right, but it could cause downtime so it has to wait
[16:17:03] cmjohnson1: yeah, to csw5-pmtpa
[16:17:06] that can be a very temp run though
[16:17:12] csw5-pmtpa is going away soon, but is still critical now
[16:17:47] is it okay to relocate.?..i am going to have to disconnect the cables
[16:20:23] yes
[16:20:31] we only need it during emergencies
[16:20:56] just run a quick cable and make sure it works, don't need to do it neatly
[16:21:46] okay...thx
[17:04:23] robh: rt 2484 and rt 2485....when you get a chance....memory issues..thx
[17:17:31] New patchset: Andre Engels; "Porting my changes to the new version of the code. Main changes: * Including traits.py, containing a number of 'standard' traits, enabling them to be re-used without having to be recopied or rewritten * Including selectors.py, which does the same for " [analytics/reportcard] (andre/mobile) - https://gerrit.wikimedia.org/r/2740
[17:22:54] cmjohnson1: so for this stuff, always check here first
[17:22:58] http://noc.wikimedia.org/dbtree/
[17:23:13] this will let you see if its in the server rotation on mysql, in this case it is a slave on s7
[17:23:28] slaves should be able to be shutdown at any time for testing, as long as an ops level person brings it down gracefully
[17:23:40] okay
[17:25:07] robh: can you bring them down for me.
[17:25:16] you have copies to run on both right?
[17:25:23] cuz if its one at a time, db15 can go first
[17:25:43] go ahead and bring 15 down
[17:25:53] and i will take a look at 18 later
[17:26:07] ok, since 18 is in rotation, its best to keep it up until you are ready to test
[17:26:15] since it means it wont fall far behind in replication
[17:26:23] !log db15 shutting down for memtest
[17:26:26] Logged the message, RobH
[17:26:38] cmjohnson1: db15 is all yours
[17:27:36] ok...robh can you send me wikitech link for the pin out for the orange serial cables for scs
[17:28:38] http://wiki.wikked.net/wiki/OpenGear#Cisco_-_Opengear_cable
[17:35:10] morning, RobH, mark!
[17:35:28] mark: the installer is creating a GPT partition table.
[17:35:29] hiya
[17:35:45] maplebed: i didnt put in my email, but not having a partman on those isnt the worst
[17:35:53] since it means if they pxe boot they wont wipe out any data
[17:36:00] which is the drawback to pxe partman ;]
[17:36:22] for the longest time we didnt keep individual server names in there on databases (before they were db# and such)
[17:36:26] RobH: ::sigh::. we have 4 now, we'll probably get 5-10 more during the next year, and grow by a few a year beyond that.
[17:36:51] but yeah, we could do them by hand.
[17:36:54] well a quick look at it makes me think what you did should work
[17:37:12] so it means a lot of partman work, which both myself, daniel, peter, and mark can tell you, sucks.
[17:37:16] ;]
[17:39:03] and me and leslie too.
[17:39:27] yea, would be nice to have it automated =P
[17:39:39] ok, well, assuming we do give up, I've never done an install without partman; are there docs?
[17:40:38] we do not have any ourselves nah, its basic ubuntu installer stuff, so they would have it documented. basically you want to pxe boot and attach to console, then when it gets to the partitioning prompts it will stop for your input
[17:40:44] you want to choose manually partition option
[17:41:20] then it will show you all the disks, you want to then setup the bios 1m part on sda and sdb, then setup two identical partitions on those two disks at 120GB for the /
[17:41:30] and choose 'physical whatever for software raid' as the type
[17:41:55] then setup your swap and choose the raid setup option
[17:42:06] which then is pretty self explanatory but ping me when yer there if you want help =]
[17:44:56] oh, so do all the normal building a servers stuff but just have no entry for the host in netboot.cfg?
[17:48:49] hey guys. can i get a password reset on Mobile-feedback-l Moderator please
[17:48:59] now that i'm back id like to clean up that list
[17:49:08] if it's not a quick fix then i'll file a ticket
[17:51:25] * tfinc goes to cut a ticket
[17:51:34] http://rt.wikimedia.org/Ticket/Display.html?id=2508 hopefully one of you can take a look
[17:52:19] tfinc: I think you can request a password reminder.
[17:52:20] * maplebed verifies
[17:53:28] thanks maplebed
[17:53:40] i'm not seeing it but could easily be not looking in the right place
[17:56:16] ok, I have it narrowed down to the OAI harvester never starting on the new search indexer
[17:56:19] progress!
[17:56:39] but why....
[17:56:41] win!
[17:57:11] it's not an access thing, as the requests just go to the eqiad LBs
[17:58:02] it's actually that it never starts
[17:58:05] things like [main] INFO org.wikimedia.lsearch.oai.IncrementalUpdater - Authenticating ...
[17:58:09] should show up in the logs
[17:58:19] but even on debug, the letters oai never show up in the logs
[17:59:29] notpeter, where is the log for the OAI process?
[18:00:08] rainman-sr: on searchidx2 it just goes to log-all
[18:00:32] yep, that i know
[18:00:38] where is it on searchidx1001
[18:00:38] ah
[18:00:51] that's a very good question...
[18:00:57] how is that configed?
[18:00:58] the incremental updater is a special process
[18:01:31] i.e. it's a standalone java program
[18:01:44] ah, ok
[18:02:12] e.g. look at /home/rainman/scripts/search-inc-all
[18:02:52] and search-restart-indexer-searchidx2
[18:02:59] oh!
[18:03:02] which is calling it to start/restart indexing jobs
[18:03:02] ok
[18:03:18] ok, this is making a lot more sense!
[18:03:19] woo!
[18:03:23] ;)
[18:03:30] is it reasonable to just include that in the init script?
[18:11:42] * AaronSchulz wonders what keeps periodically breaking graphite
[18:14:24] maplebed: any luck ?
[18:19:23] LeslieCarr: any idea why noc.wikimedia.org/cgi-bin/report.py is slow?
[18:20:36] tfinc: sorry I got distracted.
[18:21:18] AaronSchulz: not at all :) i can look
[18:23:36] wow that's incredibly slow
[18:23:39] noc is slow
[18:23:53] yeah, but there's no real reason it should be like that
[18:24:05] cpu's are low, free memory
[18:24:10] quick, let's all click on it and see if it goes faster!
[18:24:20] no crazy errors in logs
[18:24:41] oh
[18:24:48] usually it is slow when someone sends crapload of profiling sections
[18:24:57] well this could be why Timeout waiting for output from CGI script /usr/lib/cgi-bin/report.py, referer: http://noc.wikimedia.org/cgi-bin/report.py?db=1.19
[18:25:07] which is not the case now either
[18:25:41] oh wait
[18:25:43] it isn't spence nowadays
[18:25:49] it's fenari now
[18:26:23] i'm going to guess something is wrong with the script (ms obvious!)
[18:26:27] no
[18:26:29] its professor
[18:26:37] and no, there's nothing wrong with the script
[18:27:46] root@professor:/tmp# du -sh stats.db
[18:27:46] 556M stats.db
[18:27:47] that would do it
[18:27:48] :)
[18:27:51] ah :)
[18:27:53] hehe
[18:28:02] * AaronSchulz tried to clear profiling
[18:28:10] it can't run through that big of a db ?
[18:28:12] template profiling is enabled again
[18:28:14] I had it hacked out
[18:28:19] someone decided it is good idea to enable it again
[18:28:20] yep
[18:28:21] in 1.19 branch
[18:28:24] I was looking at that
[18:29:11] we need fair sampling for that stuff
[18:29:16] full profile won't work
[18:33:48] bonvenon qchris :)
[18:34:55] http://live.wikimedia.in/
[18:35:15] maplebed: I believe you don't need to give grub a partition when installing to gpt, just the device
[18:35:23] but not 100% sure... but I think it may be failing for some other reason
[18:35:33] perhaps trying the install command manually in the installer will clear it up
[18:35:47] yeah, I thought so too.
[18:35:53] domas: any process you can nuke?
[18:36:09] maplebed: perhaps I'll try it when I'll install "my" node
[18:36:15] if you don't beat me to it
[18:36:21] aaronschulz: what do you mean?
[18:36:28] I won't; I'm going to build one by hand for now.
[18:36:31] ok
[18:36:35] domas: to make report.py responsive
[18:36:56] aaronschulz: I already said what is wrong
[18:37:04] template profiling is back on
[18:37:14] which is sending profiling keys like...
[18:37:15] not as of three minutes ago
[18:37:30] 1.19:-:Parser::braceSubstitution-title-First_year_of_the_Czech_Municipalities_Photographs_grant/layR*
[18:37:31] ah
[18:37:37] just clear-profile then!
[18:37:39] I think it needs a kick though
[18:37:42] already tried, twice
[18:37:53] hm
[18:37:58] maplebed: thanks for helping out. i'll keep watch over the rt ticket to track progress.
[18:38:17] tfinc: I was wrong; there is no reminder for the moderator address.
[18:38:19] that was what I did 10 min ago, which made me think maybe it wasn't templates...but it actually just has no effect
[18:38:21] just a reset.
[18:38:31]
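
Editor's note on the 18:29 remarks ("we need fair sampling for that stuff", "full profile won't work"): the log does not show how sampling was or would be implemented. The following is only a minimal sketch of the general idea, assuming a hypothetical SampledProfiler class that keeps roughly one profiling event in N and scales the counts back up when reporting. None of these names come from MediaWiki or from report.py.

```python
import random


class SampledProfiler:
    """Hypothetical helper: keep roughly one profiling event in `sample_rate`."""

    def __init__(self, sample_rate, rng=None):
        if sample_rate < 1:
            raise ValueError("sample_rate must be >= 1")
        self.sample_rate = sample_rate
        # An injectable RNG makes the behaviour reproducible in tests.
        self.rng = rng or random.Random()
        self.totals = {}  # key -> [sampled event count, summed seconds]

    def should_sample(self):
        # random() is in [0.0, 1.0), so a rate of 1 keeps every event.
        return self.rng.random() < 1.0 / self.sample_rate

    def record(self, key, seconds):
        """Store one timing if this event is picked; return whether it was."""
        if not self.should_sample():
            return False
        entry = self.totals.setdefault(key, [0, 0.0])
        entry[0] += 1
        entry[1] += seconds
        return True

    def estimated_calls(self, key):
        """Scale the sampled count back up to an estimate of real calls."""
        count = self.totals.get(key, [0, 0.0])[0]
        return count * self.sample_rate
```

With a rate of 100, a store like the 556M stats.db mentioned at 18:27 would receive about 1% of the events, at the cost of noisier numbers for rarely hit keys.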