[00:01:44] yeah, based on what siebrand just posted, looks like I was wrong anyway [00:02:59] I'm going to fix this bug [00:03:34] Also see http://lists.wikimedia.org/pipermail/wikitech-l/2011-September/055034.html (first in thread in which 7 or so participated) [00:04:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.608 seconds [00:05:04] if you change an interface then you have to survey the existing code to see what uses that interface, and update all those callers [00:05:40] if Nikerabbit didn't have time to do that, then we should have backed out the code to a development branch until he did [00:06:15] TimStarling: there are hundreds of callers. [00:06:24] so? [00:06:31] sounds like a big job [00:06:34] TimStarling: srsly? [00:06:44] you want us to do it after deployment when all those callers are broken? [00:06:47] yeah, I'm with Tim on this one [00:06:53] instead of doing it before deployment? [00:07:04] ok, so what is going on here? [00:07:20] IIRC the new logging classes should be BC with the old ones. [00:07:39] Do I understand correctly, that old and new are not working well together? [00:07:39] we're at where we're at now, though....we can rehash what we should have done later [00:08:00] siebrand: yeah, that's what we're piecing together [00:08:06] we've been using this since September at twn. [00:08:18] siebrand: well non-interfaces can't be b/c, like log row formats, so things that were not using wrapper interfaces broke and were not updated [00:08:19] I know it's just a small wiki, but very big things usually surface there. [00:08:29] it didn't come up. [00:08:44] with a second review this could have been much less broke, though granted it would still be difficult and stuff would break [00:09:06] changing all logs in our repo is weeks of work. [00:09:27] we'd hoped some volunteers would have jumped on it, but it's probably too boring... 
[00:09:38] logging/rc was some of the worst code in MW...and that says a lot [00:10:19] I'd be prepared to put some resources on this starting next week. Can I get some help, so we can get this over with, robla ? [00:10:57] we're deploying to the rest of wikipedia this wednesday [00:11:18] siebrand: are you saying we stop deployment? [00:11:25] robla: should we? [00:11:39] (I'm not saying anything short term; it's not clear to me what the blocking issue is) [00:11:46] s/is/would be [00:11:55] aside from logeventslist and irc (which is limping along), what else is broke? [00:12:38] what's logeventslist? (the API that came up earlier/bug 34653?) [00:13:04] yes [00:13:05] IRC is a tools issue that's been resolved now, isn't it? [00:13:33] it's limping along, so it's sort of resolved [00:13:36] somewhat...still slightly broken [00:13:41] !b 34508 [00:13:41] https://bugzilla.wikimedia.org/show_bug.cgi?id=34508 [00:13:55] hashar is going to do some work on it tomorrow [00:14:22] well, the definition of broken here is debatable, IMO. But I'll skip over that. [00:14:53] There was a breaking change, like there are in the API now and then too, that wasn't announced well enough. [00:15:12] because of that tools were not ready for the new input. [00:15:13] someone want to review r112546 before I backport it? [00:15:25] it's the fix for the API exception [00:15:37] I reproduced the bug and tested the fix [00:15:55] !r 112546 [00:15:55] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/112546 [00:16:17] Looking [00:16:31] Looks harmless enough [00:16:41] And probably appropriate [00:16:48] AaronSchulz already marked it ok [00:17:21] does this supersede Roan's live hack? 
[00:17:34] yes [00:17:43] so robla, where I offered to jointly put resources to it, I meant in converting whatever remains of old logging uses (and there are a lot, sample at https://www.mediawiki.org/wiki/Pune_Hackathon_Feb_2012/Topics#Genderizing_logs_for_core_and_extensions -- only core mentioned), and getting that over with. That allows us to properly and timely inform everyone of the logging changes (including those for IRC output) in MediaWiki 1.2 [00:18:02] 1.2? :) [00:18:27] siebrand: my team's time is pretty booked up [00:18:52] robla: yeah, so is mine. So I guess it shouldn't get any priority, then? [00:19:35] well, I'm not going to agree to a big project next week in the middle of a deployment [00:20:24] TimStarling: mw.loader::execute> Exception thrown by mediawiki.action.edit: isReady is not defined [00:20:25] robla: I thought you just said you'd be deploying the rest this Wed? [00:20:27] !log tstarling synchronized php-1.19/includes/api/ApiQueryRecentChanges.php [00:20:30] Logged the message, Master [00:20:31] TimStarling: Seeing that on nl.wiki [00:20:42] a real live exception message? [00:20:54] that patch of mine isn't deployed is it? [00:21:03] siebrand: are you talking about going above and beyond what is absolutely essential not to break things? [00:21:04] Dunno [00:21:13] TimStarling: I'm getting also a log of the exception object [00:21:41] !log tstarling synchronized php-1.19/includes/api/ApiQueryLogEvents.php 'revert live hack' [00:21:43] Logged the message, Master [00:22:17] TimStarling: Refresh fixed it though, can't reproduce anymore [00:22:20] srv192 out of disk space, fixing that [00:23:28] RECOVERY - Disk space on srv192 is OK: DISK OK [00:23:28] robla: I don't know. It may be that my English is not well enough to understand the nuances here, which is why I'm trying to be as clear and unambiguous as possible. Tim Starling wrote a few mins ago "So? [..] you want us to do it after deployment when all those callers are broken?" 
[00:23:54] Feb 27 23:47:29 srv192 apache2[12769]: PHP Warning: filemtime() [function.filemtime]: stat failed for /usr/local/apache/common-local/php-1.17/extensions/WikiEditor/modules/./images/toolbar/loading.gif in /usr/local/apache/common-local/php-1.18/includes/resourceloader/ResourceLoaderFileModule.php on line 380 [00:23:56] robla: From this I understand, that he is of the opinion that no single instance should have been left in the old logging system. [00:24:15] I'm going to suppress this warning, it seems to have used a lot of disk space on srv192 flooding the logs [00:24:17] siebrand: My understanding is that right now, we want to do the minimum work required to make things not break, as opposed to embarking on a huge project Right Now [00:24:20] "php-1.17" ? [00:24:25] robla: I'm trying to agree on interpretation here first. Did you understand it differently? [00:24:27] Yeah that's a known issue [00:24:33] 1.19 doesn't throw this warning [00:24:38] what about 1.18 ? [00:24:40] RECOVERY - MySQL Replication Heartbeat on db45 is OK: OK replication delay 0 seconds [00:24:42] It does I think [00:24:50] Note that the path to the PHP file is 1.18 [00:24:53] or is there no php-1.18 ? [00:25:07] RECOVERY - MySQL Slave Delay on db45 is OK: OK replication delay 0 seconds [00:25:10] TimStarling: Feel free to just live hack an @ in there, 1.18 is going away anyway [00:25:54] !log tstarling synchronized php-1.18/includes/resourceloader/ResourceLoaderFileModule.php [00:25:55] RoanKattouw: sure. That's *right now*. But if no coordinated effort can be discussed, we'll have the same issue for 1.20 (or 1.2 or whatever version will be next), and the temporary IRC hacks for logging will become permanent, for example. [00:25:57] Logged the message, Master [00:26:26] Krinkle: There is, read the error message carefully [00:28:00] RoanKattouw: Aha [00:28:20] !log removed 2GB of syslogs on srv192 [00:28:20] So this is an old cached absolute path? 
[00:28:23] Logged the message, Master [00:29:39] TimStarling: call_user_func_array() expects parameter 1 to be a valid callback, class 'languages' does not have a method 'getMessage' in /usr/local/apache/common-local/php-1.19/includes/StubObject.php on line 58 [00:29:40] yes, probably because of MessageBlobStore::clear() being commented out [00:29:49] I'm still trying to figure that one [00:29:51] No, that's not it [00:29:57] There's tons of stuff in the module_deps table [00:30:05] AaronSchulz: got a backtrace? [00:30:07] Lots of bogus skin names with XSS attempts in it [00:30:09] *in them [00:30:17] TimStarling: no :/ [00:30:19] Once 1.19 is deployed I'd like to just clear all the RL cache tables completely [00:30:27] robla: I'll send you a mail. [00:30:37] (after I verify that won't cause problems) [00:30:37] great, thanks [00:30:51] maybe add trigger_error() there to generate a fatal [00:31:52] PROBLEM - MySQL Slave Delay on db47 is CRITICAL: CRIT replication delay 290 seconds [00:32:17] TimStarling: will wmerrors give a backtrace or something? [00:32:28] yes [00:32:34] and the user will get a pretty-looking error page [00:32:46] I saw it when I used the same method on a previous bug [00:33:31] PROBLEM - MySQL Replication Heartbeat on db47 is CRITICAL: CRIT replication delay 386 seconds [00:33:54] trigger_error( "StubObject cannot find function", E_USER_ERROR ); [00:35:28] !log aaron synchronized php-1.19/includes/StubObject.php 'trigger errors for debugging bad callbacks' [00:35:31] Logged the message, Master [00:39:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:41:07] TimStarling: AaronSchulz: Reedy: anything blocking us from deploying to plwiki at this point? [00:41:17] no [00:41:25] nope [00:42:01] who wants to pull the trigger? 
[00:42:23] Not it [00:42:25] I will [00:43:12] !log tstarling rebuilt wikiversions.cdb and synchronized wikiversions files: plwiki to 1.19 [00:43:15] Logged the message, Master [00:45:04] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.129 seconds [00:47:58] I'm feeling bad about all those OOMs in zh.wikipedia.org [00:48:10] LanguageConverter? [00:48:47] maybe LanguageConverter is leaking memory or something, the exceptions come from various places [00:49:13] most often Preprocessor_DOM [00:55:46] gn8 folks [01:03:04] brion: on bug 34539, looks like the block rows are there but have user ID zero [01:03:24] I don't see how this ever worked after the block refactoring [01:04:30] urk [01:04:42] coulda been broken by that already [01:04:57] it ignores the ID in the constructor and gets it via the User (which is via the name) [01:05:19] New patchset: Lcarr; "Installing ssl module and certificate" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2815 [01:05:42] madness :D [01:05:46] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2815 [01:05:49] but it does User::newFromName( IP::sanitizeIP( $target ), false ) on metawiki [01:06:01] brion: so while that sounds better at first, it doesn't work ;) [01:08:26] hehe [01:09:40] * AaronSchulz needs to hack it to tack in the local wiki ID again [01:12:24] PROBLEM - MySQL Replication Heartbeat on db1006 is CRITICAL: CRIT replication delay 2866 seconds [01:12:51] RECOVERY - MySQL Slave Delay on db1006 is OK: OK replication delay NULL seconds [01:16:34] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2815 [01:16:35] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2815 [01:16:45] PROBLEM - MySQL Slave Running on db1006 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Duplicate key name page_redirect_namespace_len on query. De [01:22:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:23:04] robla: it's r100506 that removed - $wgOut->addHTML( $this->sk->makeKnownLinkObj( $this->getTitle(), wfMsgHtml( 'checkuser-log-return' ) ) ); [01:23:20] !r 100506 [01:23:20] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/100506 [01:23:50] it's gone, the message is left only in i18n [01:24:54] johnduhart: you around? [01:25:00] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.580 seconds [01:26:57] saper: are you comfortable filing bugs in Bugzilla? [01:28:19] robla: I can commit a fix for you to trunk if you like [01:28:43] PHP Fatal error: Call to a member function getNamespace() on a non-object in /usr/local/apache/common-local/php-1.19/includes/specials/SpecialMovepage.php on line 59 [01:28:55] saper: well that's another way to go :) [01:29:57] I guess that was a pretty silly question given you narrowed down the rev, huh? 
[01:30:33] robla: I am just quickly installing trunk w/CheckUser ext to test this one-liner [01:30:37] brion: http://pastebin.com/r168a75X ? [01:30:40] ugh [01:31:09] PROBLEM - Disk space on srv191 is CRITICAL: DISK CRITICAL - free space: / 284 MB (3% inode=63%): /var/lib/ureadahead/debugfs 284 MB (3% inode=63%): [01:33:46] New patchset: Lcarr; "Commenting out duplicate check definition" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2817 [01:34:10] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2817 [01:34:10] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2817 [01:34:16] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2817 [01:34:17] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2817 [01:55:00] RECOVERY - Disk space on srv191 is OK: DISK OK [01:55:08] New patchset: Lcarr; "ensure sample config file removed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2819 [01:55:58] !log reduced disk space usage on srv191 by running logrotate manually [01:56:02] Logged the message, Master [01:56:33] New patchset: Lcarr; "ensure sample config file removed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2819 [01:56:56] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2819 [01:57:04] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2819 [01:57:05] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2819 [01:58:23] robla: !r 112562 [01:58:33] !r 112562 [01:58:33] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/112562 [02:00:49] robla: somebody should commit this to 1.19wmf1 [02:00:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:01:23] saper: thanks! [02:01:33] AaronSchulz is looking [02:02:06] goodie [02:02:36] checkuser log was my bookmarked entry point to the function, I always check the log first and *then* go to the query form [02:08:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.019 seconds [02:12:33] RECOVERY - MySQL Slave Running on db1006 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error: [02:14:30] RECOVERY - MySQL Replication Heartbeat on db1006 is OK: OK replication delay 0 seconds [02:16:00] RECOVERY - MySQL Replication Heartbeat on db47 is OK: OK replication delay 0 seconds [02:16:01] RECOVERY - MySQL Slave Delay on db47 is OK: OK replication delay 0 seconds [02:17:09] !log LocalisationUpdate completed (1.18) at Tue Feb 28 02:17:09 UTC 2012 [02:17:12] Logged the message, Master [02:23:57] PROBLEM - MySQL Replication Heartbeat on db47 is CRITICAL: CRIT replication delay 388 seconds [02:23:57] PROBLEM - MySQL Slave Delay on db47 is CRITICAL: CRIT replication delay 388 seconds [02:33:30] !log LocalisationUpdate completed (1.19) at Tue Feb 28 02:33:30 UTC 2012 [02:33:33] Logged the message, Master [02:43:01] !log aaron synchronized php-1.19/resources/mediawiki/mediawiki.js 'deployed r112564' [02:43:04] Logged the message, Master [02:43:47] !log aaron synchronized 
php-1.19/includes/resourceloader/ResourceLoaderContext.php 'deployed r112564' [02:43:50] Logged the message, Master [02:44:14] !log aaron synchronized php-1.19/includes/Block.php 'deployed r112564' [02:44:16] Logged the message, Master [02:44:18] !r r112564 [02:44:19] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/r112564 [02:58:48] !log catrope synchronized php-1.19/includes/logging 'r112570' [02:58:51] Logged the message, Master [02:59:14] !log catrope synchronized php-1.19/extensions/CheckUser 'r112570' [02:59:16] Logged the message, Master [02:59:39] !log catrope synchronized php-1.19/resources 'r112570' [02:59:42] Logged the message, Master [03:01:48] PROBLEM - Puppet freshness on hooper is CRITICAL: Puppet has not run in the last 10 hours [03:02:51] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [03:07:41] !log tstarling synchronized php-1.19/includes/specials/SpecialMovepage.php 'r112572' [03:07:44] Logged the message, Master [03:10:13] RECOVERY - MySQL Slave Delay on db47 is OK: OK replication delay 0 seconds [03:12:01] RECOVERY - MySQL Replication Heartbeat on db47 is OK: OK replication delay 0 seconds [03:17:52] PROBLEM - MySQL Replication Heartbeat on db47 is CRITICAL: CRIT replication delay 303 seconds [03:18:10] PROBLEM - MySQL Slave Delay on db47 is CRITICAL: CRIT replication delay 320 seconds [03:54:24] !log tstarling synchronized all.dblist 'removed chwikimedia' [03:54:27] Logged the message, Master [03:54:42] !log tstarling synchronized deleted.dblist 'removed chwikimedia' [03:54:45] Logged the message, Master [03:55:02] * jeremyb spies a tstarling ;) [03:55:47] TimStarling: FYI, the google guy i was mailing with has mailed their "sitelinks" team. i think we're still waiting to hear back from that team. (or just it's night and people aren't working) [03:58:46] or maybe they fixed it and didn't bother to tell you [04:01:39] i doubt that. 
maybe they fixed it and the message just hasn't gotten to me yet [04:01:53] (i'm not talking to the sitelinks ppl. i'm talking to someone who's talking to them) [04:02:01] * jeremyb goes to check if it's fixed [04:02:17] TimStarling: it's still broken... [04:02:54] sleeeeeepy time [04:03:48] RECOVERY - Puppet freshness on lvs1002 is OK: puppet ran at Tue Feb 28 04:03:20 UTC 2012 [04:10:42] RECOVERY - MySQL Replication Heartbeat on db47 is OK: OK replication delay 0 seconds [04:10:51] RECOVERY - MySQL Slave Delay on db47 is OK: OK replication delay 0 seconds [04:11:08] I can't load the wiki [04:11:13] Request: GET http://commons.wikimedia.org/, from 10.64.0.130 via cp1014.eqiad.wmnet (squid/2.7.STABLE9) to () [04:11:13] Error: ERR_CANNOT_FORWARD, errno (11) Resource temporarily unavailable at Tue, 28 Feb 2012 04:11:05 GMT [04:11:18] RECOVERY - ps1-d2-pmtpa-infeed-load-tower-A-phase-Y on ps1-d2-pmtpa is OK: ps1-d2-pmtpa-infeed-load-tower-A-phase-Y OK - 1150 [04:11:20] "Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes." ? [04:11:39] me too [04:12:16] It's loading now [04:13:18] Eloquence, let me guess, the servers crashed [04:13:34] * jeremyb imagines Eloquence doesn't know? 
[04:13:37] but maybe [04:13:39] * TimStarling is here [04:13:48] yeah, that too ;) [04:14:32] TimStarling: jorm also mentioned intermittent 500 [04:15:20] got Request: GET http://en.wikipedia.org/wiki/Main_Page, from 10.64.0.140 via cp1015.eqiad.wmnet (squid/2.7.STABLE9) to () [04:15:21] Error: ERR_CANNOT_FORWARD, errno (11) Resource temporarily unavailable at Tue, 28 Feb 2012 04:11:44 GMT [04:15:52] It's loading now [04:15:59] Eloquence: /me visited this painting in person yesterday ;) https://commons.wikimedia.org/w/index.php?title=Special:Log&limit=1&offset=20050519184230&type=upload&user=File+Upload+Bot+%28Eloquence%29 [04:16:14] yeah, seems fine now [04:17:45] PROBLEM - MySQL Replication Heartbeat on db36 is CRITICAL: CRIT replication delay 302 seconds [04:17:46] PROBLEM - MySQL Slave Delay on db36 is CRITICAL: CRIT replication delay 302 seconds [04:18:34] @info db36 [04:18:34] jeremyb: [db36: s1] 10.0.6.46 [04:18:44] it's doing schema changes [04:18:45] dbbot-wm is soooo much faster now [04:19:25] i wonder how it learns about servers moving from cluster to cluster? [04:19:30] the schema change script causes false lag alerts [04:19:51] (that seemed too fast to be doing an API call) [04:30:13] !log torrus down, reporting "PANIC: fatal region error detected; run recovery" about its DB files, will stop apache to investigate [04:30:18] Logged the message, Master [04:36:21] PROBLEM - Puppet freshness on mw1010 is CRITICAL: Puppet has not run in the last 10 hours [04:42:30] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours [04:42:30] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours [04:48:29] !log on manutius: rebuilt torrus config DB, moved old one out to /var/lib/torrus/db.broken. Restarted. 
[04:48:31] Logged the message, Master [05:26:30] RECOVERY - MySQL Replication Heartbeat on db36 is OK: OK replication delay 0 seconds [05:26:48] RECOVERY - MySQL Slave Delay on db36 is OK: OK replication delay 0 seconds [06:14:38] PROBLEM - MySQL Replication Heartbeat on db34 is CRITICAL: CRIT replication delay 21203 seconds [06:24:14] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [06:24:59] RECOVERY - MySQL Replication Heartbeat on db34 is OK: OK replication delay 0 seconds [06:25:08] RECOVERY - MySQL Slave Delay on db34 is OK: OK replication delay 0 seconds [06:33:14] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [06:33:14] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [06:46:41] PROBLEM - NTP on srv278 is CRITICAL: NTP CRITICAL: Offset unknown [06:47:17] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [06:50:35] RECOVERY - NTP on srv278 is OK: NTP OK: Offset 0.007147789001 secs [07:09:11] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.052 second response time [08:48:22] !log nikerabbit synchronized php-1.18/extensions/Narayam/Narayam.php 'Narayam gu mapping out of beta' [08:48:25] Logged the message, Master [08:49:34] !log nikerabbit synchronized php-1.19/extensions/Narayam/Narayam.php 'Narayam gu mapping out of beta' [08:49:37] Logged the message, Master [09:56:55] New patchset: Hashar; "redirect some missing Swift syslog messages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2820 [09:57:22] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2820 [11:20:43] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[11:22:30] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [11:31:45] New patchset: Hashar; "allow hashar on formey host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2821 [11:32:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2821 [13:02:59] PROBLEM - Puppet freshness on hooper is CRITICAL: Puppet has not run in the last 10 hours [13:04:02] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [13:19:11] mark, maybe you can help me update https://www.mediawiki.org/wiki/SwiftMedia -- I'm not sure to what extent Swift is now deployed or what proportion of our media storage needs it serves [13:23:43] Is wikitech supposed to have a self-signed SSL cert rather than the wikimedia wildcard? [13:23:43] ah sumanah, it's serving 100% of thumbs. [13:24:02] Damianz: it can't have the wildcard cert, because it's stored on an external server which we don't control [13:24:11] Damianz: there's a bug open about that. The ops team is delaying fixing it because they may merge that with another wiki, IIUC [13:24:27] sumanah: also I added my name as a gsoc mentor possibility and plugged the signup page to ops [13:24:50] Ah [13:24:51] yes, the idea is to move it to labs in the end with an external copy off site that's only needed if everything dies [13:24:53] Thank you apergos [13:24:58] That's ok then, thought I'd ask :D [13:25:03] thanks for pointing it out sumanah [13:25:09] sure thing Damianz [13:25:19] Damianz: https://bugzilla.wikimedia.org/show_bug.cgi?id=27291 [13:26:31] but what are you watching? [14:15:48] working on a traceroute [14:15:50] mark: prod? [14:15:54] hi werdna [14:16:00] hey Daniel_WMDE [14:16:13] The requested URL http://en.wikipedia.org/wiki/ was not found on this server. 
[14:16:16] :/ [14:16:57] restored [14:16:58] that sounds interesting :) [14:17:13] for reference it broke here [14:17:14] 17 ae1d0.mcr1.tampa-fl.us.xo.net (216.156.1.109) 232.827 ms 243.969 ms 271.850 ms [14:17:17] 18 * * * [14:17:31] omg what a ping [14:17:35] are you in some kind of australia?!!!? [14:17:44] but that error message isn't consistent with a broken route... you *did* get a response. just apparently from the wrong box... [14:18:12] yeah, it was up and down [14:18:19] I had a broken route for a bit [14:18:25] ohh [14:18:29] my ISP has a transparent proxy [14:18:33] I believe [14:18:43] werdna: TPG? [14:19:02] Telstra 3G [14:19:21] domas: I'm on the 58th floor of a building and I'm using 3G internet from Australia [14:19:28] it's a miracle that it gets to tampa at all [14:20:25] werdna: yeah, they do iirc, proxies out into melb iirc [14:21:29] werdna: don't you have direct visibility from that high?! [14:21:38] oh how I don't miss using telstra 3g [14:21:47] Telstra 3G is pretty good all things considered [14:37:30] PROBLEM - Puppet freshness on mw1010 is CRITICAL: Puppet has not run in the last 10 hours [14:39:14] https://pl.wikipedia.org/wiki/Special:Version says plwiki is running r112130, however I see that r112570 is already deployed (my Special:CheckUserLog problem is no longer there). What's changed? [14:44:33] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours [14:44:33] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours [14:46:08] New review: Demon; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2821 [15:27:58] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.0594110526 (gt 8.0) [15:30:36] join the better side of the world: #wikimedia-tech-nolog Same without the nasty public logging. 
[15:30:39] Change abandoned: Mark Bergsma; "Old" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/887 [15:37:17] lol [15:38:50] Change abandoned: Mark Bergsma; "This doesn't work as it causes duplicate Puppet definitions." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1726 [15:44:27] Change abandoned: Mark Bergsma; "This does not look like it can go into production as it is now, sorry. Since it's an old change I wi..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2254 [15:45:46] New review: Mark Bergsma; "This breaks the mail gateway I'm afraid" [operations/puppet] (production); V: 0 C: -2; - https://gerrit.wikimedia.org/r/2446 [15:47:36] New review: Mark Bergsma; "What is the status of this change?" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2495 [15:47:46] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:49:43] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [15:52:07] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.81116736842 (gt 8.0) [15:59:03] domas: were you laughing about my "nolog" advert? :-P Not nice of you! 
;) [15:59:12] I was [16:02:01] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.60184761062 [16:26:16] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [16:35:16] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [16:35:16] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [16:37:22] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 11.5292930702 (gt 8.0) [17:07:28] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 11.6548744737 (gt 8.0) [17:20:47] !log aaron synchronized php-1.19/extensions/PagedTiffHandler/PagedTiffHandler_body.php 'deployed r112614' [17:20:50] Logged the message, Master [17:21:24] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.59682 [17:23:53] hmm, still no good [17:25:45] if ( is_file( $dstPath ) ) { [17:25:47] return new ThumbnailImage( $image, $dstUrl, $width, $height, $dstPath, $page ); [17:25:49] * AaronSchulz sighs [17:28:08] !log aaron synchronized php-1.19/extensions/PagedTiffHandler/PagedTiffHandler_body.php [17:28:11] Logged the message, Master [17:31:18] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.0625042105 (gt 8.0) [17:33:02] !log aaron synchronized php-1.19/extensions/PagedTiffHandler/PagedTiffHandler_body.php [17:33:05] Logged the message, Master [17:47:21] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.23885684211 [18:00:29] Hi all! Here's Tom from Wikimedia Brasil and Open Knowledge Foundation Brasil. Can someone recommend some multilanguage extension for mediawiki? 
I'm writing a text on possibilities for the wiki.okfn.org site and I'd like to know what we use on wikipedia [18:01:43] I am analysing what is better, to have the language set by the browser default and redirect to the proper page (as commons) or to have a bar for all available languages in a specific article [18:01:59] New patchset: Mark Bergsma; "Add strontium and palladium as bits servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2834 [18:03:01] New patchset: Mark Bergsma; "Add strontium and palladium as bits servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2834 [18:04:04] New patchset: Mark Bergsma; "Add strontium and palladium as bits servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2834 [18:04:56] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2834 [18:04:57] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2834 [18:08:48] New patchset: Ryan Lane; "Moving operations/software commits to operations channels." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/2836 [18:22:05] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.4970663158 (gt 8.0) [18:33:39] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2836 [18:33:39] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2836 [18:41:58] RoanKattouw, what's the status of the video transcode? :) [18:42:11] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.64073380531 [18:42:23] I started it over the weekend [18:42:28] It got most of the way there, then the box died [18:42:36] heh I noticed [18:42:48] Rob brought it back up, but I haven't had time to go in and investigate which files are complete and which ones aren't [18:43:07] If I could get Ryan to set up the proxy, I guess a volunteer could do that for me maybe [18:43:14] (odder has been really great helping out with this) [18:44:06] I see [18:44:13] RoanKattouw: bring apache up on it, and I'll see how hard it'll be to proxy through an existing proxy [18:44:17] OK [18:44:30] I may just use singer [18:45:09] !log Installing Apache on cadmium [18:45:12] Logged the message, Mr. Obvious [18:49:31] AaronSchulz: is there a change log or release notes for flagged revs? [18:51:41] Ryan_Lane: Apache running and set up [18:51:50] ok [18:52:03] there's a couple things I *must* do today [18:52:08] like move the gerrit server [18:52:12]