[02:22:19] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1767s [02:25:29] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1957s [02:32:41] !log LocalisationUpdate completed (1.18) at Mon Feb 6 02:32:41 UTC 2012 [02:32:44] Logged the message, Master [02:40:55] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 4s [02:46:35] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 1s [05:10:38] zzz =_= [06:24:01] New patchset: Asher; "adding db1017 as an eqiad db collector" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2292 [06:24:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2292 [06:24:45] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2292 [06:24:46] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2292 [06:27:19] PROBLEM - MySQL Slave Delay on db1033 is CRITICAL: CRIT replication delay 36092 seconds [06:34:49] PROBLEM - MySQL Slave Delay on db1017 is CRITICAL: CRIT replication delay 25891 seconds [06:38:09] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 24180 seconds [06:55:29] PROBLEM - MySQL Slave Delay on db42 is CRITICAL: CRIT replication delay 32488 seconds [07:24:19] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 0 seconds [07:26:59] RECOVERY - MySQL Slave Delay on db1017 is OK: OK replication delay 0 seconds [07:32:46] PROBLEM - MySQL Slave Running on db1005 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Unknown table __old_revision on query. 
Default database: d [07:41:06] PROBLEM - MySQL Slave Running on db1021 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Unknown table __old_revision on query. Default database: d [07:48:59] thanks hexmode for bug 28026 [08:17:16] RECOVERY - MySQL Slave Delay on db1033 is OK: OK replication delay 0 seconds [08:26:56] RECOVERY - MySQL Replication Heartbeat on db42 is OK: OK replication delay 0 seconds [08:36:46] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Puppet has not run in the last 10 hours [08:36:46] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours [08:37:06] RECOVERY - MySQL Slave Delay on db42 is OK: OK replication delay 0 seconds [09:00:08] New review: Dzahn; "(no comment)" [analytics/udp-filters] (master) C: 1; - https://gerrit.wikimedia.org/r/2235 [09:00:45] New review: Dzahn; "(no comment)" [analytics/udp-filters] (master) C: 1; - https://gerrit.wikimedia.org/r/2234 [09:11:45] helllo [09:19:26] does anyone know why the sidebar on Commons changes the url-names with changing the language, and why I can't get that done on be.wikimedia.org ? [09:48:58] New patchset: Dzahn; "add class for stat server per ezachte (RT 2162)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2293 [09:49:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2293 [09:51:25] New patchset: Dzahn; "add class for stat server per ezachte (RT 2162), apply on stat1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2293 [09:51:42] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2293 [09:57:19] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 154 MB (2% inode=64%): /var/lib/ureadahead/debugfs 154 MB (2% inode=64%): [10:08:49] RECOVERY - Disk space on srv223 is OK: DISK OK [11:04:35] !log nikerabbit synchronized php-1.18/extensions/Translate/ 'i18ndeploy r110738 Translate fixes' [11:04:37] Logged the message, Master [11:26:40] anybody here with shell access to perform massive deletion? [11:27:10] omg [11:28:50] !log demon synchronized wmf-config/InitialiseSettings.php 'Fixing vepwiki logo - bug 34222' [11:28:51] Logged the message, Master [11:32:04] !log demon synchronized wmf-config/InitialiseSettings.php 'Fixing vepwiki logo - bug 34222' [11:32:06] Logged the message, Master [11:34:36] Nemo_bis: what happened? [11:34:59] Danny_B|backup, dunno, you're the one asking mass shell deletion [11:35:07] :p [11:36:36] @replag [11:38:32] Nemo_bis: not shell deletion ;-) just perform a query to mass delete some stuff we don't want anymore [11:38:38] hm, Krinkle, Last seen : Jan 19 20:06:55 2012 (2 weeks, 3 days, 15:30:45 ago) [11:38:46] ? [11:38:50] ah, that's reassuring :p [11:38:55] Krinkle, where is dbbot [11:39:04] well, it [11:39:07] well, it's not here [11:39:15] so that leaves only one place [11:39:22] it reached a happier places? [11:39:51] Nope, willow ate it [11:39:56] it's back [11:39:59] * Nemo_bis hugs Krinkle [11:40:01] ascended into a higher plane of existant [11:40:05] @replag [11:40:07] Nemo_bis: No replag currently. See also "replag all". 
[11:40:13] @replag all [11:40:14] Nemo_bis: [s1] db36: 0s, db12: 0s, db32: 0s, db38: 0s, db52: 0s, db53: 0s; [s2] db30: 0s, db13: 0s, db24: 0s, db54: 0s; [s3] db34: 0s, db39: 0s, db25: 0s, db11: 0s [11:40:15] Nemo_bis: [s4] db31: 0s, db22: 0s, db33: 0s, db51: 0s; [s5] db45: 0s, db44: 0s, db55: 0s; [s6] db47: 0s, db43: 0s, db46: 0s, db50: 0s; [s7] db16: 0s, db37: 0s, db18: 0s, db26: 0s [11:40:25] Nemo_bis: Actually, I'm going to shut it down for just a second and refresh it's static copy of noc.wikimedia.org [11:40:33] it's cluster info is outdated [11:40:34] ok [11:40:38] @exit [11:40:41] @quit [11:40:46] tyvm [11:44:01] yay [11:46:39] ok, done [11:47:24] @replag [11:47:25] Nemo_bis: No replag currently. See also "replag all". [11:47:51] no replag? that seems odd [11:48:12] disbeliever [11:48:12] No replay, so no lag higher than 1s [11:48:25] @replag filters upto 1.5s last I checked [11:48:25] Krinkle: Unknown identifier [11:48:29] yeah yeah [11:49:05] Nemo_bis: In case you're interested, dbbot-wm runs on "Kribo" with a small bridge-plugin that implements the wmfDbBot shell tool that I wrote. [11:49:19] wmfDbBot is actually a shell tool, and the bridge makes it available through the its bot [11:49:21] irc bot [11:49:21] yes, yes, very interesting [11:49:26] * Nemo_bis pretends to understand [11:49:34] https://svn.toolserver.org/svnroot/krinkle/trunk/Kribo/ [11:49:45] https://svn.toolserver.org/svnroot/krinkle/trunk/Kribo%20(plugins)/wmfDbBot_KriboBridge/ [11:50:21] * Nemo_bis hums pretentiously [11:50:52] I just hope people on Wikimedia Labs with their bots are not reinventing the wheel [11:51:09] I already dislike this "we're replacing the Toolserver" rhetoric enough [11:51:33] Well, it's so much more than that. [11:51:53] What Toolserver is currently doing/offering would be a very small subset of what Wikimedia Labs is [11:52:57] But there is no intention of taking over or luring away anyone from Toolserver. 
Wikimedia Labs is just getting started and Toolserver will and should be kept up and running [11:53:16] Krinkle, this is not what's being suggested [11:53:21] (by the WMF) [11:54:36] I don't know where you get the info from, but from speaking with the engineers actually building the thing I know that Toolserver is not going to be "taken over" from this side. Any toolserver users interested in using Labs are free to migrate whenever they want. There is no pressure or rush. Just let users do it the way they want, there is no problem. [11:54:49] «The first target of Tool Labs is to be a *replacement* for Toolserver.» (bold added) [11:54:53] https://www.mediawiki.org/wiki/Wikimedia_Labs [11:55:02] and from the very first announcement Eloquence made at WMCON last year [11:55:04] "Tool Labs" is not "Wikimedia Labs" [11:55:21] «Wikimedia Labs is a two-part project aimed at improving the volunteer involvement in operations and software development. The first part of this project is Test/Dev Labs, and the second part is Tool Labs.» [11:55:23] "Tool Labs" is _a_ proposed project in Wikimedia Labs, and hasn't even started yet. [11:55:34] exactly [11:56:00] if B is a subset of A and B does something, A does something [11:56:00] Nemo is right though; it is definitely part of the plan, from early on, to replace toolserver with the labs [11:56:48] or more explicitly, to kill the Toolserver [11:56:56] Sure, but there is intention of hijacking the toolserver user base and forcing them to use Labs as soon as possible. [11:57:09] "is no intention"? [11:57:10] There are still lots of things that Toolserver has that Labs doesn't offer yet [11:57:26] It will supersede it in the long run, that's the plan. [11:57:44] it will be better, and afaik both parties know about this and that's fine. 
[11:57:48] Yes, it would have been interesting to know WMDE's opinion on this [11:58:18] I mean, /before/ deciding it [11:59:03] afaik Toolserver was originally a project to be done by WMF, but for some reason that couldn't happen and WMDE did a good job at Toolserver instead, and now there's gonna be a V2 of that plan, and it's called Labs [11:59:48] well, yes, this is an imaginative reconstruction :) [12:00:06] i highly doubt labs will allow the same as what toolserver allows [12:00:25] Daniel_WMDE: It's quite the reverse actually [12:00:48] Daniel_WMDE: Do you have your own virtual environment where you have root access and can basically do whatever your project needs ? [12:00:56] not in Toolserver. [12:01:09] can i run a 26-hour sql query on labs? [12:01:15] probably, yes. [12:01:26] store 2 TB of data? [12:01:30] lol [12:01:30] You'll basically have your own instance to do whatever you want. [12:01:55] as long as it's within the policies and the project the instance belongs to, obviously [12:01:57] but i assume still no access to some db tables [12:02:01] (i.e. not hosting a torrent site) [12:02:20] btw, I'd like someone to do a du -hcs on hemlock [12:02:21] Danny_B|backup: Well, what do you expect ? Read access to mw_user.user_password ? [12:02:34] nope [12:02:36] Replication will be the same [12:02:45] but better access to prefs [12:02:48] no more, no less. [12:02:53] That's trivial. [12:02:59] ts has limited view on prefs [12:03:30] I would love it if everyone that copies dumps to their local instance on a lab does it to some central location shared among all instances [12:03:30] basically non matching with users. so it's unable to perform statistics on it [12:03:38] Setting up extra views or better indexes is like a very very minor detail that won't be a problem or point in migration. [12:04:17] Krinkle: that's all true except: storage space is limited. 
and mechanisms for live replication of wiki content are still very unclear [12:04:36] right, texts are not available on ts [12:04:41] Krinkle: it seems unlikely that replication will be per-vm. there will probably be a central replicated copy. [12:04:48] Daniel_WMDE: Of course [12:04:52] which will have all the problems and limitations we have on the TS [12:04:55] There are global projects that any project can connect to [12:05:03] such as user storage and replication [12:05:10] yea. [12:05:11] as well as authentication [12:05:22] Danny_B|backup: that is unlikely to change, those restrictions are mostly due to the privacy policy as is and legal stuff [12:05:26] which means that there will be pretty much the same limitations as on the TS [12:05:38] Of course, Toolserver isn't setting these limits for fun. [12:05:41] There is a reason for it :) [12:06:32] i can't see any reason for that. dumps are publicly available and contain texts. both of actual and of the entire history [12:06:50] so what is the reason to not have it available directly [12:07:08] Danny_B|backup: They are stored differently on WMF's side [12:07:16] They're not regular MySQL tables like the rest of the database [12:07:36] they are stored in separate database clusters, compressed and grouped together by page id [12:08:04] there *should* be some way to access texts comfortably in db and not via dumps [12:08:13] etherpad acting up [12:08:45] which means that in order to get any given page's text revisions you must first go to the revision table, get the revision ids, then go to the text table, get the cluster pointer, then go to the cluster db, request the texts [12:08:59] Danny_B|backup: Let's focus first on making sure WMFLabs/Tool will do a good job of providing at least the same services as Toolserver, and then we could add additional features. [12:09:34] Krinkle, actually, it would have been better to do the opposite [12:09:45] i.e. 
not decide "let's replace the Toolserver" [12:09:59] but "let's see what's missing, what needs to be developed first and how" [12:10:20] Nemo_bis: I didn't mean actually building it first but writing it as a specification [12:10:33] it's the same [12:10:49] the decision to replace the Toolserver came before any justification for it [12:10:53] so write up what we got, what we need, propose it for users, give it some time and then implement it [12:10:55] not to mention discussion with stakeholders [12:11:18] not "do whatever TS is doing and then add some stuff" [12:13:14] Danny_B|backup: we can't replicate the text in the database because the text isn't in the database. or more precisely: it's in different databases, with several revisions compressed together in blobs [12:13:20] they are essentially useless [12:13:37] we could maintain our own full text database, for full text searching, etc [12:13:45] but replication would have to be done differently [12:13:49] and it would need a lot of space [12:14:13] this is actually something that has been on the todo list for a long time - but we never found a way to make it work with the resources we have [12:15:31] Krinkle: my impression is that people behind Labs/Tools haven't even started to really look at what's required for the level of database access we have on the TS. [12:15:33] Daniel_WMDE: i know, i remember our talks on this topic. that's why i pointed it out, that it should be one of the first things labs would provide, as ts does not [12:16:01] issues like "replicate commons to every server" or "have a meta-database with all the namespace names" or "allow user databases on the same box as the wiki databases" [12:16:26] yea. full text and a good search engine. and full replication [12:16:29] would be very nice. [12:16:32] it's not trivial at all [12:16:35] <^demon> Holy scrollback batman. [12:16:42] <^demon> I step out to the store for 5 minutes... 
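
The three-step text lookup described at 12:08:45 and 12:13:14 (revision table, then text table, then a blob on a separate cluster holding several revisions compressed together) can be sketched roughly as follows. This is only an illustration: the dictionaries, the pointer layout, the `pack_blob` and `page_texts` helpers, and the JSON-plus-zlib blob format are all assumptions made for the example, not the actual MediaWiki schema or storage code.

```python
import json
import zlib

# Hypothetical stand-in for the revision table: rev_id -> (page_id, text_id).
REVISIONS = {101: (7, 501), 102: (7, 502), 103: (8, 503)}

# Hypothetical stand-in for the text table: text_id -> (cluster, blob_id, index).
TEXT_POINTERS = {
    501: ("cluster1", 9001, 0),
    502: ("cluster1", 9001, 1),
    503: ("cluster2", 9002, 0),
}


def pack_blob(texts):
    """Compress several revision texts together into one blob (assumed format)."""
    return zlib.compress(json.dumps(texts).encode("utf-8"))


# Hypothetical stand-in for the separate cluster databases.
CLUSTERS = {
    "cluster1": {9001: pack_blob(["first draft", "second draft"])},
    "cluster2": {9002: pack_blob(["another page"])},
}


def page_texts(page_id):
    """Return the text of every revision of a page, lowest revision id first."""
    texts = []
    for rev_id in sorted(REVISIONS):
        rev_page, text_id = REVISIONS[rev_id]              # step 1: revision table
        if rev_page != page_id:
            continue
        cluster, blob_id, index = TEXT_POINTERS[text_id]   # step 2: text table pointer
        raw = zlib.decompress(CLUSTERS[cluster][blob_id])  # step 3: fetch and unpack blob
        texts.append(json.loads(raw)[index])
    return texts


if __name__ == "__main__":
    print(page_texts(7))
```

Because each blob has to be fetched and decompressed before any single revision can be read, a plain row-by-row replica of the wiki tables would not contain usable text, which is the limitation raised in the chat.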
[12:16:42] :P [12:16:58] * Daniel_WMDE just noticed this is #wikimedia-tech, not #wikimedia-toolserver [12:17:56] ^demon: next time, ask for delivery instead [12:18:04] Daniel_WMDE: There have been several meetings with toolserver users (and ts-admins iirc) to basically discuss what they like and dislike about Toolserver. And I brought up most of how Toolserver works with db infrastructure (toolserver.wiki/namespacenames etc., as well as s4 on all servers, user databases, offering svn repos, and MMT etc.) [12:18:04] :-D [12:18:19] However that's not been documented much yet, because TestDev Labs is going to be done first [12:18:27] ToolLabs is not up for another few months [12:18:56] When the time comes there will be more communication and getting insight into how things work at TS. [12:18:58] <^demon> Who needs labs when you can run things directly on the cluster? ;-) [12:19:06] Krinkle: perhaps invite me to one of those meetings :) [12:19:07] something like that [12:19:16] I have used spare snapshot4 cycles quite a number of times... [12:19:22] Daniel_WMDE: I have no problem with that. I'll be happy to bring that up. [12:19:25] but all having to do with the dumps, so I guess it's allowed [12:19:33] Daniel_WMDE: Especially if you propose it yourself, will do :) [12:20:19] i also am not aware of any of those meetings and would appreciate the possibility to join [12:21:23] <^demon> apergos: We could just shelve labs. Shell accounts for everyone :p [12:21:42] * apergos could just shelve ops and go to the bahamas. [12:21:52] oh wait, I'm already in a vacation destination... :-P :-P [12:22:18] you know what we say if you break the site, right? [12:22:30] run like the wind? [12:22:38] "you break it, you own it" [12:22:40] :-P :-D [12:23:08] (well we also say "I broke the site doing X and all I got was this lousy t-shirt" but that's only for ops really) [12:23:18] <^demon> I broke nothing. [12:23:22] <^demon> It's all hearsay! 
[12:23:23] I think I would prefer, You break it, You do a few of the bar rounds... [12:23:52] ... for *everyone* that didn't have access. [12:31:29] last year in berlin hackathon, some people broke wikipedia ;-) [12:32:10] <^demon> It's not a hackathon unless the site comes down ;-) [12:32:19] * apergos bans all hackathons [12:32:49] <^demon> spoilsport. [12:33:00] yup! [12:33:11] lets all break apergos then! [12:34:48] I like my beauty sleep (plus chronic stress is bad for life expectancy) [14:58:23] New patchset: ArielGlenn; "configuration for rolling rsyncs from dataset1001 to dataset2 and v.v." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2295 [14:59:08] !log creating Translate db tables for incubator [14:59:10] Logged the message, Master [14:59:11] New review: ArielGlenn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2295 [14:59:12] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2295 [15:02:01] hi all [15:02:22] New patchset: Mark Bergsma; "updated list of server for nrpe to include nagios server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2296 [15:03:28] New patchset: Mark Bergsma; "updated list of server for nrpe to include nagios server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2296 [15:04:00] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2296 [15:04:01] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2296 [15:05:01] anyone there? [15:08:43] subha, ask your question [15:08:49] don't just say "anyone there?" [15:08:53] :) [15:09:31] alright, can we have open source formats like .odt and .ods uploading feature added in commons? 
[15:11:56] New patchset: ArielGlenn; "and actually add the rsync module for rolling dump host rsyncs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2297 [15:13:02] New review: ArielGlenn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2297 [15:13:03] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2297 [15:14:39] !log reedy synchronized wmf-config/InitialiseSettings.php 'Bug 34219 - Disable Babel autocreation of categories at the Portuguese Wiktionary' [15:14:40] Logged the message, Master [15:15:03] hi Reedy [15:18:27] !log nikerabbit synchronized wmf-config/InitialiseSettings.php [15:18:29] Logged the message, Master [15:19:05] !log that was Bug 34213 - Enabling of Extension:Translate on the Wikimedia Incubator [15:19:07] Logged the message, Master [15:25:27] Hi Nikerabbit [15:26:29] subha, no [15:26:59] why? [15:27:33] subha, https://bugzilla.wikimedia.org/show_bug.cgi?id=16241 [15:27:55] in short, it's enabled only on some private wikis due to security problems [15:28:07] although IMHo it should be enabled at least on wikimania wiki too [15:29:01] and https://bugzilla.wikimedia.org/show_bug.cgi?id=24230 is supposedly fixed now [15:29:49] Alright Nemo, thanks a lot Nemo_bis, good day! [15:32:37] subha, so I guess the right bu to comment is now [15:32:38] https://bugzilla.wikimedia.org/show_bug.cgi?id=2089 [15:32:40] argh [15:32:51] well, he'll find it [15:52:50] Reedy: are you working on bugs right now? [15:53:11] Yup [15:53:36] nearly done with this one [15:55:03] Reedy: would you like to add unset( $wgGroupPermissions['translate-proofr'] ); in commonsettings under wmgUseTranslate? 
[15:55:14] !log reedy synchronized wmf-config/InitialiseSettings.php 'Bug 34124 - Create a Project namespace on Russian Wikipedia' [15:55:16] Logged the message, Master [15:56:05] New review: Mark Bergsma; "If we're redoing it anyway, why wouldn't we rewrite everything to use the "sudo_user" definitions, i..." [operations/puppet] (production); V: 0 C: -2; - https://gerrit.wikimedia.org/r/2283 [15:57:38] !log reedy synchronized wmf-config/CommonSettings.php 'unset( $wgGroupPermissions[translate-proof] )' [15:57:39] Logged the message, Master [15:57:55] bah [15:58:34] !log reedy synchronized wmf-config/CommonSettings.php 'add r' [15:58:36] Logged the message, Master [16:00:34] aww crap [16:01:02] New patchset: Mark Bergsma; "Move misc::torrus to a separate file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2315 [16:01:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2315 [16:01:25] also need to unset $wgAddGroups['translate-proofr'] to have the group not appear there [16:02:03] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2315 [16:02:04] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2315 [16:07:53] !log nikerabbit synchronized wmf-config/CommonSettings.php 'Do not add translation reviewers group' [16:07:55] Logged the message, Master [16:22:10] !log Updated user_former_groups.ufg_group to 32 characters [16:22:11] Logged the message, Master [16:46:36] !log reedy synchronized wmf-config/InitialiseSettings.php 'Bug 34009 - Translation of project namespace and site-name for ta.wikiquote' [16:46:38] Logged the message, Master [16:49:55] !log reedy synchronized wmf-config/InitialiseSettings.php 'Bug 34043 - Change logo at Indonesian Wikibooks' [16:49:56] Logged the message, Master [17:14:21] New patchset: Mark Bergsma; "Experimental attempt of restyling misc web 
server setup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2316 [17:23:09] New patchset: Mark Bergsma; "Experimental attempt of restyling misc web server setup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2316 [17:24:47] New patchset: Mark Bergsma; "Experimental attempt of restyling misc web server setup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2316 [17:27:45] New patchset: Mark Bergsma; "Experimental attempt of restyling misc web server setup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2316 [17:33:30] New patchset: Mark Bergsma; "Experimental attempt of restyling misc web server setup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2316 [17:35:40] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2316 [17:35:40] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2316 [17:47:27] why does etherpad redirect me to https now? that doesn't work [17:48:14] Nemo_bis, using httpsanywhere? 
[17:48:17] Nemo_bis: WFM [17:48:19] Even /with/ HE [17:48:25] (Although I haven't upgraded to FF 10 yet) [17:48:30] I did [17:48:37] but you didn't remove it from exclusions I see [17:49:34] New patchset: Bhartshorne; "changing swift's backend to ms5 in prep for deploying it to production" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2317 [17:50:14] RoanKattouw_away, yes it's HE [17:50:29] New patchset: Mark Bergsma; "Disable misc::torrus on streber for now, add on manutius" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2318 [17:50:53] 2.0development.5 [17:51:55] hm https://www.eff.org/files/Changelog.txt [17:54:02] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2318 [17:54:17] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2318 [17:55:14] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2317 [17:55:27] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2317 [18:08:27] New patchset: Mark Bergsma; "Remove inclusion of nonexistant class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2319 [18:18:54] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2319 [18:18:54] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2319 [18:18:54] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2319 [18:19:49] New patchset: Mark Bergsma; "Fix syntax error" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2320 [18:21:00] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2320 [18:21:01] Change merged: Mark Bergsma; [operations/puppet] 
(production) - https://gerrit.wikimedia.org/r/2320 [18:48:09] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Puppet has not run in the last 10 hours [18:48:09] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours [18:58:01] Is there a problem with file thumbnails? [18:58:11] er [18:58:17] yeah temporarily [18:58:35] ok, thanks [18:59:04] New patchset: Asher; "adding db1043 to s1 eqiad" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2321 [19:00:23] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2321 [19:00:23] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2321 [19:05:09] site down! [19:05:14] [19:05:18] :O [19:05:48] ya the site is going real slow [19:05:53] It's not down [19:06:07] I wouldn't even call it slow [19:06:28] When it takes over 30 seconds to load any image on the page it's slow [19:06:30] (india) [19:06:37] (canada) [19:06:43] now it's working [19:06:53] first it showed : the website is experiencing technical difficulties [19:06:55] then database error [19:06:56] now back [19:07:21] The issues are known and being actively worked on. [19:08:03] thanks robh! [19:08:36] welcome, plus we have a full house on ops (as much as possible at least) since right now is technically our ops meeting ;] [19:08:55] guess it's a work meeting :-P [19:09:06] getting SQL errors [19:09:25] 1637: Too many active concurrent transactions [19:09:27] well, the issue is the caching servers that arent online rightnow are going to start toppling over other shit. [19:09:56] That's the parser cache going lolno [19:11:35] heh [19:11:38] what the heck happened to ganglia [19:13:03] did someone purge all parser cache or what? [19:13:11] image squids are down [19:13:23] some squids rebuilding their cache right now [19:13:38] oh fun [19:13:38] text squids too? 
[19:14:04] RECOVERY - Host db1018 is UP: PING OK - Packet loss = 0%, RTA = 30.86 ms [19:14:14] apparently some of them were restarted too [19:14:24] PROBLEM - Backend Squid HTTP on sq52 is CRITICAL: Connection refused [19:14:24] PROBLEM - Backend Squid HTTP on sq80 is CRITICAL: Connection refused [19:14:24] PROBLEM - Backend Squid HTTP on sq86 is CRITICAL: Connection refused [19:14:35] PROBLEM - Backend Squid HTTP on sq42 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:14:55] I wonder why there're increased pcache writes [19:15:04] PROBLEM - NTP on db1018 is CRITICAL: NTP CRITICAL: Offset unknown [19:15:25] so far it's just been about upload squids [19:16:28] so why application squid cpu went up? [19:16:33] and traffic? [19:17:04] so it was the result of a swift deployment that went awry [19:17:20] issue with the squid confs prolly [19:17:38] Oh swift is getting deployed today? [19:17:47] well [19:17:51] there were lots of pcache invalidations [19:18:03] domas: actually in memcached? [19:18:04] PROBLEM - Backend Squid HTTP on sq57 is CRITICAL: Connection refused [19:18:11] that would be something else [19:18:23] a tiny but of traffic is scheduled to be moved,yes [19:18:27] AaronSchulz: pretty every graph I see shows app cpu hike and lots of replaces into pcache [19:18:43] johnduhart: it's like a rolling upgrade, first 1/256th of all thumbs, then 1/128th.. and so on.. afaik [19:18:44] PROBLEM - mysqld processes on db1018 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [19:18:56] mutante: ah [19:19:36] AaronSchulz: similar thing happened recently [19:19:42] SqlBagOStuff? wth? [19:19:42] AaronSchulz: could've been template invalidation too [19:19:58] vvv: lol [19:20:05] Is that parser-cache-in-DB? 
[19:20:12] yes [19:20:17] yeah [19:20:53] root@db40:/sys/block/sda/queue# echo 512 > nr_requests [19:20:55] \o/ [19:21:27] ow http://ganglia.wikimedia.org/latest/?c=Miscellaneous%20pmtpa&h=ms5.pmtpa.wmnet&m=load_one&r=hour&s=by%20name&hc=4&mc=2 [19:24:21] anyway, surprise surprise, if you send 10x workload to a system, it may be slower [19:24:24] domas: I wish we had a graph of profile calls to easily spot spikes in particular functions...I guess one could diff from a previous profiling table [19:24:37] AaronSchulz: Asher built something for that [19:24:42] he is graphing individual profiling sections [19:24:48] I was hoping it covered this [19:24:58] hopefully I actually will have access to it sometime [19:25:06] :-) [19:25:26] domas: why aren't you in security? we needs lulz like that surprise surprise in there [19:25:28] * AaronSchulz discovered a thumbnail exploit today [19:25:48] binasher: btw I can't get into security/staff...fyi :) [19:25:50] binasher: someone put stupid perms on the channel, so my client doesn't always autojoin [19:25:54] RECOVERY - Backend Squid HTTP on sq52 is OK: HTTP OK HTTP/1.0 200 OK - 459 bytes in 0.023 seconds [19:25:54] RECOVERY - Backend Squid HTTP on sq80 is OK: HTTP OK HTTP/1.0 200 OK - 459 bytes in 0.002 seconds [19:26:05] grr [19:26:19] binasher: and I forget things, like joining channels manually [19:26:24] RECOVERY - Backend Squid HTTP on sq42 is OK: HTTP OK HTTP/1.0 200 OK - 460 bytes in 0.017 seconds [19:26:27] AaronSchulz: you need to get a labs account for the profiler graphing [19:26:39] where is the profiler url? 
[19:27:05] RECOVERY - NTP on db1018 is OK: NTP OK: Offset 0.01375472546 secs [19:27:59] New patchset: Andre Engels; "* Creating a new separate variable.py, which amongst other things is a repository for variables, so we don't have to re-define them each time * Moved the actual determination of what a variable does with a log line to this new variable.py, so that the" [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2322 [19:28:01] domas: https://upload.wikimedia.org/wikipedia/commons/thumb/x/xx/Little_kitten_.jpg/799px-Little_kittenajsdhfa_.jpg [19:28:10] hehe, file deletion won't purge that I bet :D [19:28:51] it sends the purge URLs based on the actual relative path, not that fake one I posted with fake hash dirs [19:29:08] :-) [19:29:34] RECOVERY - Backend Squid HTTP on sq57 is OK: HTTP OK HTTP/1.0 200 OK - 459 bytes in 0.042 seconds [19:29:45] one could upload pr0n and hotlink to thumbs for days without them going away even if the source file was deleted [19:30:19] * AaronSchulz should prolly just patch thumb.php [19:30:29] * AaronSchulz is lazy [19:30:59] AaronSchulz: Yeah we should probably patch thumb.php so those URLs don't work any more [19:31:08] or 302 to the canonical one [19:31:19] and you should review r99317 [19:31:30] !r 99317 [19:31:35] bah [19:31:38]