[00:01:44] yeah, basic on what siebrand just posted, looks like I was wrong anyway [00:02:59] I'm going to fix this bug [00:03:34] Also see http://lists.wikimedia.org/pipermail/wikitech-l/2011-September/055034.html (first in thread in which 7 or so participated) [00:04:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.608 seconds [00:05:04] if you change an interface then you have to survey the existing code to see what uses that interface, and update all those callers [00:05:40] if Nikerabbit didn't have time to do that, then we should have backed out the code to a development branch until he did [00:06:15] TimStarling: there are hundreds of callers. [00:06:24] so? [00:06:31] sounds like a big job [00:06:34] TimStarling: srsly? [00:06:44] you want us to do it after deployment when all those callers are broken? [00:06:47] yeah, I'm with Tim on this one [00:06:53] instead of doing it before deployment? [00:07:04] ok, so what is going here? [00:07:20] IIRC the new logging classes should be BC with the old ones. [00:07:39] Do I understand correctly, that old and new are not working well together? [00:07:39] we're at where we're at now, though....we can rehash what we should have done later [00:08:00] siebrand: yeah, that's what we're piecing together [00:08:06] we've been using this since September at twn. [00:08:18] siebrand: well non-interfaces can't be b/c, like log row formats, so things that were not using wrapper interfaces broke and where not updated [00:08:19] I know it's just a small wiki, but very big things usually surface there. [00:08:29] it didn't come up. [00:08:44] with a second review this could have been much less broke, though granted it would still be difficult and stuff would break [00:09:06] changing all logs in our repo is weeks of work. [00:09:27] we'd hoped some volunteers would have jumped on it, but it's probably too boring... 
[00:09:38] logging/rc was some of the worst code in MW...and that says a lot [00:10:19] I'd be prepared to put some resources on this starting next week. Can I get some help, so we can get this over with, robla ? [00:10:57] we're deploying to the rest of wikipedia this wednesday [00:11:18] siebrand: are you saying we stop deployment? [00:11:25] robla: should we? [00:11:39] (I'm not saying anything short term; it's not clear to me what the blocking issue is) [00:11:46] s/is/would be [00:11:55] aside from logeventslist and irc (which is limping along), what else is broke? [00:12:38] what's logeventslist? (the API that came up earlier/bug 34653?) [00:13:04] yes [00:13:05] IRC is a tools issue that's been resolved now, isn't it? [00:13:33] it's limping along, so it's sort of resolved [00:13:36] somewhat...still slightly broken [00:13:41] !b 34508 [00:13:41] https://bugzilla.wikimedia.org/show_bug.cgi?id=34508 [00:13:55] hashar is going to do some work on it tomorrow [00:14:22] well, the definition of broken here is debatable, IMO. But I'll skip over that. [00:14:53] There was a breaking change, like there are in the API now and then too, that wasn't announced well enough. [00:15:12] because of that tools were not ready for the new input. [00:15:13] someone want to review r112546 before I backport it? [00:15:25] it's the fix for the API exception [00:15:37] I reproduced the bug and tested the fix [00:15:55] !r 112546 [00:15:55] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/112546 [00:16:17] Looking [00:16:31] Looks harmless enough [00:16:41] And probably appropriate [00:16:48] AaronSchulz already marked it ok [00:17:21] does this supersede Roan's live hack? 
[00:17:34] yes [00:17:43] so robla, where I offered to jointly put resources to it, I meant in converting whatever remains of old logging uses (and there are a lot, sample at https://www.mediawiki.org/wiki/Pune_Hackathon_Feb_2012/Topics#Genderizing_logs_for_core_and_extensions -- only core mentioned), and getting that over with. That allows us to properly and timely inform everyone of the logging changes (including those for IRC output) in MediaWiki 1.2 [00:18:02] 1.2? :) [00:18:27] siebrand: my team's time is pretty booked up [00:18:52] robla: yeah, so is mine. So I guess it shouldn't get any priority, then? [00:19:35] well, I'm not going to agree to a big project next week in the middle of a deployment [00:20:24] TimStarling: mw.loader::execute> Exception thrown by mediawiki.action.edit: isReady is not defined [00:20:25] robla: I thought you just said you'd be deploying the rest this Wed? [00:20:27] !log tstarling synchronized php-1.19/includes/api/ApiQueryRecentChanges.php [00:20:30] Logged the message, Master [00:20:31] TimStarling: Seeing that on nl.wiki [00:20:42] a real live exception message? [00:20:54] that patch of mine isn't deployed is it? [00:21:03] siebrand: are you talking about going above and beyond what is absolutely essential not to break things? [00:21:04] Dunno [00:21:13] TimStarling: I'm getting also a log of the exception object [00:21:41] !log tstarling synchronized php-1.19/includes/api/ApiQueryLogEvents.php 'revert live hack' [00:21:43] Logged the message, Master [00:22:17] TimStarling: Refresh fixed it though, can't reproduce anymore [00:22:20] srv192 out of disk space, fixing that [00:23:28] RECOVERY - Disk space on srv192 is OK: DISK OK [00:23:28] robla: I don't know. It may be that my English is not well enough to understand the nuances here, which is why I'm trying to be as clear and unambiguous as possible. Tim Starling wrote a few mins ago "So? [..] you want us to do it after deployment when all those callers are broken?" 
[00:23:54] Feb 27 23:47:29 srv192 apache2[12769]: PHP Warning: filemtime() [function.filemtime]: stat failed for /usr/local/apache/common-local/php-1.17/extensions/WikiEditor/modules/./images/toolbar/loading.gif in /usr/local/apache/common-local/php-1.18/includes/resourceloader/ResourceLoaderFileModule.php on line 380 [00:23:56] robla: From this I understand, that he is of the opinion that no single instance should have been left in the old logging system. [00:24:15] I'm going to suppress this warning, it seems to have used a lot of disk space on srv192 flooding the logs [00:24:17] siebrand: My understanding is that right now, we want to do the minimum work required to make things not break, as opposed to embarking on a huge project Right Now [00:24:20] "php-1.17" ? [00:24:25] robla: I'm trying to agree on interpretation here first. Did you understand it differently? [00:24:27] Yeah that's a known issue [00:24:33] 1.19 doesn't throw this warning [00:24:38] what about 1.18 ? [00:24:40] RECOVERY - MySQL Replication Heartbeat on db45 is OK: OK replication delay 0 seconds [00:24:42] It does I think [00:24:50] Note that the path to the PHP file is 1.18 [00:24:53] or is there no php-1.18 ? [00:25:07] RECOVERY - MySQL Slave Delay on db45 is OK: OK replication delay 0 seconds [00:25:10] TimStarling: Feel free to just live hack an @ in there, 1.18 is going away anyway [00:25:54] !log tstarling synchronized php-1.18/includes/resourceloader/ResourceLoaderFileModule.php [00:25:55] RoanKattouw: sure. That's *right now*. But if no coordinated effort can be discussed, we'll have the same issue for 1.20 (or 1.2 or whatever version will be next), and the temporary IRC hacks for logging will become permanent, for example. [00:25:57] Logged the message, Master [00:26:26] Krinkle: There is, read the error message carefully [00:28:00] RoanKattouw: Aha [00:28:20] !log removed 2GB of syslogs on srv192 [00:28:20] So this is an old cached absolute path? 
[00:28:23] Logged the message, Master [00:29:39] TimStarling: call_user_func_array() expects parameter 1 to be a valid callback, class 'languages' does not have a method 'getMessage' in /usr/local/apache/common-local/php-1.19/includes/StubObject.php on line 58 [00:29:40] yes, probably because of MessageBlobStore::clear() being commented out [00:29:49] I'm still trying to figure that one [00:29:51] No, that's not it [00:29:57] There's tons of stuff in the module_deps table [00:30:05] AaronSchulz: got a backtrace? [00:30:07] Lots of bogus skin names with XSS attempts in it [00:30:09] *in them [00:30:17] TimStarling: no :/ [00:30:19] Once 1.19 is deployed I'd like to just clear all the RL cache tables completely [00:30:27] robla: I'll send you a mail. [00:30:37] (after I verify that won't cause problems) [00:30:37] great, thanks [00:30:51] maybe add trigger_error() there to generate a fatal [00:31:52] PROBLEM - MySQL Slave Delay on db47 is CRITICAL: CRIT replication delay 290 seconds [00:32:17] TimStarling: will wmerrors give a backtrace or something? [00:32:28] yes [00:32:34] and the user will get a pretty-looking error page [00:32:46] I saw it when I used the same method on a previous bug [00:33:31] PROBLEM - MySQL Replication Heartbeat on db47 is CRITICAL: CRIT replication delay 386 seconds [00:33:54] trigger_error( "StubObject cannot find function", E_USER_ERROR ); [00:35:28] !log aaron synchronized php-1.19/includes/StubObject.php 'trigger errors for debugging bad callbacks' [00:35:31] Logged the message, Master [00:39:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:41:07] TimStarling: AaronSchulz: Reedy: anything blocking us from deploying to plwiki at this point? [00:41:17] no [00:41:25] nope [00:42:01] who wants to pull the trigger? 
[00:42:23] Not it [00:42:25] I will [00:43:12] !log tstarling rebuilt wikiversions.cdb and synchronized wikiversions files: plwiki to 1.19 [00:43:15] Logged the message, Master [00:45:04] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.129 seconds [00:47:58] I'm feeling bad about all those OOMs in zh.wikipedia.org [00:48:10] LanguageConverter? [00:48:47] maybe LanguageConverter is leaking memory or something, the exceptions come from various places [00:49:13] most often Preprocessor_DOM [00:55:46] gn8 folks [01:03:04] brion: on bug 34539, looks like the block rows are there but have user ID zero [01:03:24] I don't see how this ever worked after the block refactoring [01:04:30] urk [01:04:42] coulda been broken by that already [01:04:57] it ignores the ID in the constructor and gets it via the User (which is via the name) [01:05:19] New patchset: Lcarr; "Installing ssl module and certificate" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2815 [01:05:42] madness :D [01:05:46] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2815 [01:05:49] but it does User::newFromName( IP::sanitizeIP( $target ), false ) on metawiki [01:06:01] brion: so while that sounds better at first, it doesn't work ;) [01:08:26] hehe [01:09:40] * AaronSchulz needs to hack it to tack in the local wiki ID again [01:12:24] PROBLEM - MySQL Replication Heartbeat on db1006 is CRITICAL: CRIT replication delay 2866 seconds [01:12:51] RECOVERY - MySQL Slave Delay on db1006 is OK: OK replication delay NULL seconds [01:16:34] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2815 [01:16:35] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2815 [01:16:45] PROBLEM - MySQL Slave Running on db1006 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Duplicate key name page_redirect_namespace_len on query. De [01:22:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:23:04] robla: it's r100506 that removed - $wgOut->addHTML( $this->sk->makeKnownLinkObj( $this->getTitle(), wfMsgHtml( 'checkuser-log-return' ) ) ); [01:23:20] !r 100506 [01:23:20] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/100506 [01:23:50] it's gone, the message is left only in i18n [01:24:54] johnduhart: you around? [01:25:00] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.580 seconds [01:26:57] saper: are you comfortable filing bugs in Bugzilla? [01:28:19] robla: I can commit a fix for you to trunk if you like [01:28:43] PHP Fatal error: Call to a member function getNamespace() on a non-object in /usr/local/apache/common-local/php-1.19/includes/specials/SpecialMovepage.php on line 59 [01:28:55] saper: well that's another way to go :) [01:29:57] I guess that was a pretty silly question given you narrowed down the rev, huh? 
[01:30:33] robla: I am just quickly installing trunk w/CheckUser ext to test this one-liner [01:30:37] brion: http://pastebin.com/r168a75X ? [01:30:40] ugh [01:31:09] PROBLEM - Disk space on srv191 is CRITICAL: DISK CRITICAL - free space: / 284 MB (3% inode=63%): /var/lib/ureadahead/debugfs 284 MB (3% inode=63%): [01:33:46] New patchset: Lcarr; "Commenting out duplicate check definition" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2817 [01:34:10] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2817 [01:34:10] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2817 [01:34:16] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2817 [01:34:17] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2817 [01:55:00] RECOVERY - Disk space on srv191 is OK: DISK OK [01:55:08] New patchset: Lcarr; "ensure sample config file removed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2819 [01:55:58] !log reduced disk space usage on srv191 by running logrotate manually [01:56:02] Logged the message, Master [01:56:33] New patchset: Lcarr; "ensure sample config file removed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2819 [01:56:56] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2819 [01:57:04] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2819 [01:57:05] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2819 [01:58:23] robla: !r 112562 [01:58:33] !r 112562 [01:58:33] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/112562 [02:00:49] robla: somebody should commit this to 1.19wmf1 [02:00:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:01:23] saper: thanks! [02:01:33] AaronSchulz is looking [02:02:06] goodie [02:02:36] checkuser log was my bookmarked entry point to the function, I always check the log first and *then* go to the query form [02:08:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.019 seconds [02:12:33] RECOVERY - MySQL Slave Running on db1006 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error: [02:14:30] RECOVERY - MySQL Replication Heartbeat on db1006 is OK: OK replication delay 0 seconds [02:16:00] RECOVERY - MySQL Replication Heartbeat on db47 is OK: OK replication delay 0 seconds [02:16:01] RECOVERY - MySQL Slave Delay on db47 is OK: OK replication delay 0 seconds [02:17:09] !log LocalisationUpdate completed (1.18) at Tue Feb 28 02:17:09 UTC 2012 [02:17:12] Logged the message, Master [02:23:57] PROBLEM - MySQL Replication Heartbeat on db47 is CRITICAL: CRIT replication delay 388 seconds [02:23:57] PROBLEM - MySQL Slave Delay on db47 is CRITICAL: CRIT replication delay 388 seconds [02:33:30] !log LocalisationUpdate completed (1.19) at Tue Feb 28 02:33:30 UTC 2012 [02:33:33] Logged the message, Master [02:43:01] !log aaron synchronized php-1.19/resources/mediawiki/mediawiki.js 'deployed r112564' [02:43:04] Logged the message, Master [02:43:47] !log aaron synchronized 
php-1.19/includes/resourceloader/ResourceLoaderContext.php 'deployed r112564' [02:43:50] Logged the message, Master [02:44:14] !log aaron synchronized php-1.19/includes/Block.php 'deployed r112564' [02:44:16] Logged the message, Master [02:44:18] !r r112564 [02:44:19] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/r112564 [02:58:48] !log catrope synchronized php-1.19/includes/logging 'r112570' [02:58:51] Logged the message, Master [02:59:14] !log catrope synchronized php-1.19/extensions/CheckUser 'r112570' [02:59:16] Logged the message, Master [02:59:39] !log catrope synchronized php-1.19/resources 'r112570' [02:59:42] Logged the message, Master [03:01:48] PROBLEM - Puppet freshness on hooper is CRITICAL: Puppet has not run in the last 10 hours [03:02:51] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [03:07:41] !log tstarling synchronized php-1.19/includes/specials/SpecialMovepage.php 'r112572' [03:07:44] Logged the message, Master [03:10:13] RECOVERY - MySQL Slave Delay on db47 is OK: OK replication delay 0 seconds [03:12:01] RECOVERY - MySQL Replication Heartbeat on db47 is OK: OK replication delay 0 seconds [03:17:52] PROBLEM - MySQL Replication Heartbeat on db47 is CRITICAL: CRIT replication delay 303 seconds [03:18:10] PROBLEM - MySQL Slave Delay on db47 is CRITICAL: CRIT replication delay 320 seconds [03:54:24] !log tstarling synchronized all.dblist 'removed chwikimedia' [03:54:27] Logged the message, Master [03:54:42] !log tstarling synchronized deleted.dblist 'removed chwikimedia' [03:54:45] Logged the message, Master [03:55:02] * jeremyb spies a tstarling ;) [03:55:47] TimStarling: FYI, the google guy i was mailing with has mailed their "sitelinks" team. i think we're still waiting to hear back from that team. (or just it's night and people aren't working) [03:58:46] or maybe they fixed it and didn't bother to tell you [04:01:39] i doubt that. 
maybe they fixed it and the message just hasn't gotten to me yet [04:01:53] (i'm not talking to the sitelinks ppl. i'm talking to someone who's talking to them) [04:02:01] * jeremyb goes to check if it's fixed [04:02:17] TimStarling: it's still broken... [04:02:54] sleeeeeepy time [04:03:48] RECOVERY - Puppet freshness on lvs1002 is OK: puppet ran at Tue Feb 28 04:03:20 UTC 2012 [04:10:42] RECOVERY - MySQL Replication Heartbeat on db47 is OK: OK replication delay 0 seconds [04:10:51] RECOVERY - MySQL Slave Delay on db47 is OK: OK replication delay 0 seconds [04:11:08] I can't load the wiki [04:11:13] Request: GET http://commons.wikimedia.org/, from 10.64.0.130 via cp1014.eqiad.wmnet (squid/2.7.STABLE9) to () [04:11:13] Error: ERR_CANNOT_FORWARD, errno (11) Resource temporarily unavailable at Tue, 28 Feb 2012 04:11:05 GMT [04:11:18] RECOVERY - ps1-d2-pmtpa-infeed-load-tower-A-phase-Y on ps1-d2-pmtpa is OK: ps1-d2-pmtpa-infeed-load-tower-A-phase-Y OK - 1150 [04:11:20] "Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes." ? [04:11:39] me too [04:12:16] It's loading now [04:13:18] Eloquence, let me guess, the servers crashed [04:13:34] * jeremyb imagines Eloquence doesn't know? 
[04:13:37] but maybe [04:13:39] * TimStarling is here [04:13:48] yeah, that too ;) [04:14:32] TimStarling: jorm also mentioned intermittent 500 [04:15:20] got Request: GET http://en.wikipedia.org/wiki/Main_Page, from 10.64.0.140 via cp1015.eqiad.wmnet (squid/2.7.STABLE9) to () [04:15:21] Error: ERR_CANNOT_FORWARD, errno (11) Resource temporarily unavailable at Tue, 28 Feb 2012 04:11:44 GMT [04:15:52] It's loading now [04:15:59] Eloquence: /me visited this painting in person yesterday ;) https://commons.wikimedia.org/w/index.php?title=Special:Log&limit=1&offset=20050519184230&type=upload&user=File+Upload+Bot+%28Eloquence%29 [04:16:14] yeah, seems fine now [04:17:45] PROBLEM - MySQL Replication Heartbeat on db36 is CRITICAL: CRIT replication delay 302 seconds [04:17:46] PROBLEM - MySQL Slave Delay on db36 is CRITICAL: CRIT replication delay 302 seconds [04:18:34] @info db36 [04:18:34] jeremyb: [db36: s1] 10.0.6.46 [04:18:44] it's doing schema changes [04:18:45] dbbot-wm is soooo much faster now [04:19:25] i wonder how it learns about servers moving from cluster to cluster? [04:19:30] the schema change script causes false lag alerts [04:19:51] (that seemed too fast to be doing an API call) [04:30:13] !log torrus down, reporting "PANIC: fatal region error detected; run recovery" about its DB files, will stop apache to investigate [04:30:18] Logged the message, Master [04:36:21] PROBLEM - Puppet freshness on mw1010 is CRITICAL: Puppet has not run in the last 10 hours [04:42:30] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours [04:42:30] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours [04:48:29] !log on manutius: rebuilt torrus config DB, moved old one out to /var/lib/torrus/db.broken. Restarted. 
[04:48:31] Logged the message, Master [05:26:30] RECOVERY - MySQL Replication Heartbeat on db36 is OK: OK replication delay 0 seconds [05:26:48] RECOVERY - MySQL Slave Delay on db36 is OK: OK replication delay 0 seconds [06:14:38] PROBLEM - MySQL Replication Heartbeat on db34 is CRITICAL: CRIT replication delay 21203 seconds [06:24:14] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [06:24:59] RECOVERY - MySQL Replication Heartbeat on db34 is OK: OK replication delay 0 seconds [06:25:08] RECOVERY - MySQL Slave Delay on db34 is OK: OK replication delay 0 seconds [06:33:14] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [06:33:14] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [06:46:41] PROBLEM - NTP on srv278 is CRITICAL: NTP CRITICAL: Offset unknown [06:47:17] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [06:50:35] RECOVERY - NTP on srv278 is OK: NTP OK: Offset 0.007147789001 secs [07:09:11] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.052 second response time [08:48:22] !log nikerabbit synchronized php-1.18/extensions/Narayam/Narayam.php 'Narayam gu mapping out of beta' [08:48:25] Logged the message, Master [08:49:34] !log nikerabbit synchronized php-1.19/extensions/Narayam/Narayam.php 'Narayam gu mapping out of beta' [08:49:37] Logged the message, Master [09:56:55] New patchset: Hashar; "redirect some missing Swift syslog messages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2820 [09:57:22] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2820 [11:20:43] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[11:22:30] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [11:31:45] New patchset: Hashar; "allow hashar on formey host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2821 [11:32:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2821 [13:02:59] PROBLEM - Puppet freshness on hooper is CRITICAL: Puppet has not run in the last 10 hours [13:04:02] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [13:19:11] mark, maybe you can help me update https://www.mediawiki.org/wiki/SwiftMedia -- I'm not sure to what extent Swift is now deployed or what proportion of our media storage needs it serves [13:23:43] Is wikitech suppose to have a self signed ssl rather than the wikimedia wildcard? [13:23:43] ah sumanah, it's serving 100% of thumbs. [13:24:02] Damianz: it can't have the wildcard cert, because it's stored on an external server which we don't control [13:24:11] Damianz: there's a bug open about that. The ops team is delaying fixing it because they may merge that with another wiki, IIUC [13:24:27] sumanah: also I added my name as a gsoc mentor possibility and plugged the signup page to ops [13:24:50] Ah [13:24:51] yes, the idea is to move it to labs in the end with an external copy off site that's only needed if everything dies [13:24:53] Thank you apergos [13:24:58] That's ok then, thought I'd ask :D [13:25:03] thanks for pointing it out sumanah [13:25:09] sure thing Damianz [13:25:19] Damianz: https://bugzilla.wikimedia.org/show_bug.cgi?id=27291 [13:26:31] but what are you watching? [14:15:48] working on a traceroute [14:15:50] mark: prod? [14:15:54] hi werdna [14:16:00] hey Daniel_WMDE [14:16:13] The requested URL http://en.wikipedia.org/wiki/ was not found on this server. 
[14:16:16] :/ [14:16:57] restored [14:16:58] that sounds interesting :) [14:17:13] for reference it broke here [14:17:14] 17 ae1d0.mcr1.tampa-fl.us.xo.net (216.156.1.109) 232.827 ms 243.969 ms 271.850 ms [14:17:17] 18 * * * [14:17:31] omg what a ping [14:17:35] are you in some kind of australia?!!!? [14:17:44] but that error message isn't consistent with a broken route... you *did* get a response. just apparently from the wrong box... [14:18:12] yeah, it was up and down [14:18:19] I had a broken route for a bit [14:18:25] ohh [14:18:29] my ISP has a transparent proxy [14:18:33] I believe [14:18:43] werdna: TPG? [14:19:02] Telstra 3G [14:19:21] domas: I'm on the 58th floor of a building and I'm using 3G internet from Australia [14:19:28] it's a miracle that it gets to tampa at all [14:20:25] werdna: yeah, they do iirc, proxies out into melb iirc [14:21:29] werdna: don't you have direct visibility from that high?! [14:21:38] oh how I don't miss using telstra 3g [14:21:47] Telstra 3G is pretty good all things considered [14:37:30] PROBLEM - Puppet freshness on mw1010 is CRITICAL: Puppet has not run in the last 10 hours [14:39:14] https://pl.wikipedia.org/wiki/Special:Version says plwiki is running r112130, however I see that r112570 is already deployed (my Special:CheckUserLog problem is no longer there). What's changed? [14:44:33] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours [14:44:33] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours [14:46:08] New review: Demon; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2821 [15:27:58] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.0594110526 (gt 8.0) [15:30:36] join the better side of the world: #wikimedia-tech-nolog Same without the nasty public logging. 
[15:30:39] Change abandoned: Mark Bergsma; "Old" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/887 [15:37:17] lol [15:38:50] Change abandoned: Mark Bergsma; "This doesn't work as it causes duplicate Puppet definitions." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1726 [15:44:27] Change abandoned: Mark Bergsma; "This does not look like it can go into production as it is now, sorry. Since it's an old change I wi..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2254 [15:45:46] New review: Mark Bergsma; "This breaks the mail gateway I'm afraid" [operations/puppet] (production); V: 0 C: -2; - https://gerrit.wikimedia.org/r/2446 [15:47:36] New review: Mark Bergsma; "What is the status of this change?" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2495 [15:47:46] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:49:43] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [15:52:07] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.81116736842 (gt 8.0) [15:59:03] domas: were you laughing about my "nolog" advert? :-P Not nice of you! 
;) [15:59:12] I was [16:02:01] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.60184761062 [16:26:16] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [16:35:16] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [16:35:16] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [16:37:22] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 11.5292930702 (gt 8.0) [17:07:28] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 11.6548744737 (gt 8.0) [17:20:47] !log aaron synchronized php-1.19/extensions/PagedTiffHandler/PagedTiffHandler_body.php 'deployed r112614' [17:20:50] Logged the message, Master [17:21:24] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.59682 [17:23:53] hmm, still no good [17:25:45] if ( is_file( $dstPath ) ) { [17:25:47] return new ThumbnailImage( $image, $dstUrl, $width, $height, $dstPath, $page ); [17:25:49] * AaronSchulz sighs [17:28:08] !log aaron synchronized php-1.19/extensions/PagedTiffHandler/PagedTiffHandler_body.php [17:28:11] Logged the message, Master [17:31:18] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.0625042105 (gt 8.0) [17:33:02] !log aaron synchronized php-1.19/extensions/PagedTiffHandler/PagedTiffHandler_body.php [17:33:05] Logged the message, Master [17:47:21] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.23885684211 [18:00:29] Hi all! Here's Tom from Wikimedia Brasil and Open Knowledge Foundation Brasil. Can someone recommend some multilanguage extension for mediawiki? 
I'm writing a text on possibilities for wiki.okfn.org site and I'd like to know what we use on Wikipedia [18:01:43] I am analysing what is better, to have the language set by the browser default and redirect to the proper page (as commons) or to have a bar for all available languages in a specific article [18:01:59] New patchset: Mark Bergsma; "Add strontium and palladium as bits servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2834 [18:03:01] New patchset: Mark Bergsma; "Add strontium and palladium as bits servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2834 [18:04:04] New patchset: Mark Bergsma; "Add strontium and palladium as bits servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2834 [18:04:56] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2834 [18:04:57] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2834 [18:08:48] New patchset: Ryan Lane; "Moving operations/software commits to operations channels." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/2836 [18:22:05] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.4970663158 (gt 8.0) [18:33:39] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2836 [18:33:39] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2836 [18:41:58] RoanKattouw, what's the status of the videos transcode' :) [18:42:11] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.64073380531 [18:42:23] I started it over the weekend [18:42:28] It got most of the way there, then the box died [18:42:36] heh I noticed [18:42:48] Rob brought it back up, but I haven't had time to go in and investigate which files are complete and which ones aren't [18:43:07] If I could get Ryan to set up the proxy, I guess a volunteer could do that for me maybe [18:43:14] (odder has been really great helping out with this) [18:44:06] I see [18:44:13] RoanKattouw: bring apache up on it, and I'll see how hard it'll be to proxy through an existing proxy [18:44:17] OK [18:44:30] I may just use singer [18:45:09] !log Installing Apache on cadmium [18:45:12] Logged the message, Mr. Obvious [18:49:31] AaronSchulz: is there a change log or release notes for flagged revs? [18:51:41] Ryan_Lane: Apache running and set up [18:51:50] ok [18:52:03] there's a couple things I *must* do today [18:52:08] like move the gerrit server [18:52:12] Sure [18:52:15] This is not high-prio [18:52:22] when do you need it by? [18:52:29] I'll make sure to do it by then [18:52:42] I'll even stick it into my calendar :D [18:52:44] I don't really have any sort of deadline [18:52:52] ok. by next monday? 
[18:52:58] But the Wikimania video work will be stalled [18:53:00] * Ryan_Lane needs a deadline ;) [18:53:00] Thta's fine [18:53:37] cool [18:59:46] New patchset: Mark Bergsma; "Make strontium and palladium internal rightaway, and prepare for converting the others" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2837 [19:00:27] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2837 [19:00:28] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2837 [19:05:07] robla: AaronSchulz (abandoning #wikimedia-dev to not crash their meeting) any other comments? [19:05:46] not right this instant. heading into 1:1 now [19:05:52] maplebed: ok [19:06:14] AaronSchulz: for example, 3/37/BigPinkHeart.jpg/225px-BigPinkHeart.jpg&nmid=81691698 is an object in swift that my script would delete. [19:06:38] yeah, I think that's fine [19:06:44] how about 5/52/Fire_1000_fps.ogg/seek=17-Fire_1000_fps.ogg.jpg [19:07:02] that should not be deleted (judging from the name alone) [19:07:37] of course if it's corrupt its ok to nuke ;) [19:08:02] corrupt meaning filesize != size on ms5? [19:08:41] ok, so in addition to /\d+px-, I have /seek=[\d.]+- that's ok. [19:09:15] maplebed: yeah [19:09:19] eg 5/55/Snowboard_half-pipe.ogg/seek=17.6-Snowboard_half-pipe.ogg.jpg [19:10:02] !log reedy synchronizing Wikimedia installation... : Updating message stuffs [19:10:05] Logged the message, Master [19:11:23] AaronSchulz: any other file formats you can think of? [19:12:01] the tifs have a different format. [19:12:12] eg 4/4d/"Don't_be_Afraid_to_Ask,_Who_is_That_Guy^_-_NARA_-_514130.tif/lossy-page1-390px-"Don't_be_Afraid_to_Ask,_Who_is_That_Guy^_-_NARA_-_514130.tif.jpg [19:12:36] I would avoid any script that has to enumerate all thumbnail param types [19:13:04] I just want to find cruft I can delete, because there's a lot of stuff in there that is clearly broken. 
[19:14:41] you have page-,page\d-,seek,mid,lossy,px-, and who knows what else :) [19:15:07] I guess checking for truncated names and URL params is OK to identify bad stuff [19:15:25] (without false positives) [19:15:27] how about name/stuff-name as a valid match, [19:15:39] but name/stuff-namestuff as broken? [19:15:56] so far each of the 'good' bits seems to come before the thumb name. [19:16:11] the params always come first, yes [19:16:36] or "stuff" as you call it [19:16:45] ok, so the only exception is that it might be name/stuff-name.jpg (when name is .ogv or .ogg or .tif or .svg etc.) [19:16:56] is there always a hyphen between stuff and name? [19:17:50] 4/4d/000_0331.JPG/720px-000_0331.JPG&crop&fallback=hub_city&prefix=q is a good example of the type of thing I want to delete. [19:17:55] svgs are rendered to pngs [19:18:03] so you have name/stuff-name.png [19:18:18] ok, I'll except stuff-name.png and stuff-name.jpg [19:18:20] and yes the hyphen is always there [19:24:17] maplebed: anyway, the rest of the pastebin logic looked fine [19:24:38] thanks. [19:29:25] sync done. [19:30:23] Beau_: no changelog aside from SVN I'm afraid. There were few user-facing changes and only some refactoring/fixes this release. 
[19:30:34] 19 minutes [19:30:35] yum [19:30:43] Reedy: I was about to say "damn" :) [19:31:27] Reedy: you could have heated up and finished a bowl of soup in that time [19:35:50] New patchset: Lcarr; "Ensuring conflicting files absent Also fixing some indenting" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2838 [19:37:21] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2838 [19:37:22] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2838 [19:56:52] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2629 [19:56:53] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2629 [19:57:26] New patchset: Lcarr; "switching to exec whenever (as does not trigger if nagios is not running already)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2839 [19:57:49] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2781 [19:57:49] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2781 [19:58:02] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2839 [19:58:02] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2839 [19:58:10] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2590 [19:58:11] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2590 [19:58:30] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2578 [19:58:31] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2578 [20:00:22] New review: Ryan Lane; "(no comment)" 
[operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2514 [20:02:56] New patchset: Lcarr; "amending neon search path" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2840 [20:03:20] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2840 [20:03:23] New patchset: Hashar; "pyc files are now ignored" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2514 [20:03:47] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2840 [20:15:11] New patchset: Mark Bergsma; "Reformat" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2841 [20:15:36] New patchset: Mark Bergsma; "Let's try raid1-lvm.cfg for bits servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2842 [20:16:07] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2841 [20:16:07] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2841 [20:16:09] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2842 [20:16:10] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2842 [20:28:12] Ryan_Lane: I have rebased https://gerrit.wikimedia.org/r/#change,2514 [20:37:09] RECOVERY - Host db1004 is UP: PING WARNING - Packet loss = 93%, RTA = 1924.98 ms [20:37:46] I found a case when a user was renamed but its contributions were moved only partially [20:38:08] And it was moved many days ago, so that's not job queue lag apparently [20:41:03] PROBLEM - mysqld processes on db1004 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [20:45:39] New patchset: Mark Bergsma; "Fix syntax, add missing 'echo'" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2843 [20:47:02] New review: Mark Bergsma; "lint 
check can kiss my..." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2843 [20:47:10] New review: Mark Bergsma; "lint check can kiss my..." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2843 [20:47:11] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2843 [20:48:05] !log reedy synchronized php-1.19/resources/mediawiki/mediawiki.util.js 'r112632' [20:48:07] Logged the message, Master [20:49:24] !log reedy synchronized php-1.19/extensions/CentralAuth/specials/SpecialCentralAuth.php 'r112634' [20:49:27] Logged the message, Master [20:49:39] CAN WE HAZ BUZILLA TROLL HAMER ? [20:50:18] !log reedy synchronized php-1.19/extensions/Collection/Collection.body.php 'r112634' [20:50:21] Logged the message, Master [21:00:05] vvv: that's normal for mw, take a look at https://bugzilla.wikimedia.org/show_bug.cgi?id=22907 [21:00:49] Beau_: does anyone bother to fix this? [21:02:05] vvv: Ariel fixed some of contribs, but not all; I think there's more on other wikis... I don't think there is a restart mechanism for renames if it was not completed [21:04:04] New patchset: Mark Bergsma; "Really fix syntax this time. And coffee." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2844 [21:04:57] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2844 [21:04:58] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2844 [21:10:41] does anyone in particular maintain the http://test.wikipedia.org/ machine? [21:12:42] Ops does [21:12:46] It's part of our cluster [21:12:57] Do you have a specific issue with it? [21:13:21] RoanKattouw: I'd like to put it to better use in the near future I think. [21:13:47] What's "better use"? 
[21:13:51] test.wp.o has a very specific use [21:13:56] It's a last-minute staging area [21:14:08] It's almost impossible to stage things there without risking them accidentally getting deployed to the site [21:14:40] RoanKattouw: hmm, I thought test2 was the "staging" machine [21:14:55] So the workflow is 1) be confident something's good 2) merge it into the deployment branch 3) update the master checkout on fenari 4) this puts it on test.wp.o automatically, test it there 5) once good, run sync script and it goes to all machines [21:15:10] No, test.wp.o just has access to the newly deployed code right before all other machines do [21:15:22] Like, if you run svn up but don't run a sync script, the code already runs on test [21:15:50] is it possible that something in the cluster uses ForeignAPIRepo or something? CU's observed 127.0.0.1 in da logs [21:17:14] AaronSchulz: ---^^ [21:18:48] saper: checkusers? [21:20:47] AaronSchulz: yes, on commons [21:20:51] New patchset: Mark Bergsma; "Readd the mountpoint, accidently removed by the reformat" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2845 [21:21:24] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2845 [21:21:25] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2845 [21:21:41] saper: why do you suspect that? It's read-only repo, it wouldn't cause loggable items? [21:21:41] AaronSchulz: my current theory is Http::get, Commons sounds suspicious and points to $useInstantCommons = true; [21:22:39] right [21:22:42] no idea then [21:22:53] new accounts are also logged [21:23:15] AaronSchulz: you can check in the db if you can (I can't) [21:25:18] saper: so you are saying GETs to index.php from a wiki are triggering centralauth account creation? 
[21:27:39] * AaronSchulz doesn't see anything in the config with api repo [21:28:09] AaronSchulz: this is the regex I've come up with to match thumbnails: "(temp|archive)?/?./../(\d+!)?(?P<media>[^/]*)/(?P.*)-(?P=media)(.jpg|.png)?(?P.*)$" It seems to work. [21:28:24] AaronSchulz: I have _no_ idea. We only have reports from CUs seeing this. Might not even be Http::get. [21:29:23] saper: I think commons descriptions pages use ::get [21:29:29] though that's not foreign api repo [21:31:35] maplebed: I'd change (temp|archive) to (temp/|archive/) and kill the /? [21:32:15] k. [21:32:17] still works. [21:32:49] and can be empty? :) [21:33:09] shouldn't be! [21:33:11] * maplebed changes * to + [21:33:30] also, files can have '-' in the name [21:33:38] New review: Reedy; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2821 [21:33:52] I think it's ok. [21:33:53] "(temp/|archive/)?./../(\d+!)?(?P<media>[^/]+)/(?P.*)-(?P=media)(.jpg|.png)?(?P.*)$" [21:34:05] the media match doesn't say no hyphens, [21:34:17] so in order to match media the second time, it has to start at the right place. [21:34:21] New patchset: Lcarr; "Adding in proper cgi.cfg for nagios3 and moving all nagios3 specific files to their own folder" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2846 [21:35:02] maplebed: nvm, it's ok with (?P=media) [21:35:16] AaronSchulz: a/a2/Face-espiegle.svg/40px-Face-espiegle.svg.png was not triggered as a 'bad image' [21:35:21] so I think it works as intended. [21:35:33] New patchset: Lcarr; "Adding in proper cgi.cfg for nagios3 and moving all nagios3 specific files to their own folder" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2846 [21:35:47] (but a/a2/Face-espiegle.svg/30px-Face-esp was, so yay!) 
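For readers following the thumbnail regex discussion above, here is a minimal Python sketch of how such a pattern could be used to flag suspect object names. It is an illustration under assumptions, not maplebed's actual script: the only group name confirmed by the log is `media` (through the `(?P=media)` backreference), so the names `params` and `junk` are hypothetical, as is the `is_suspect` helper and its rule that a name is suspect when it fails to match or has trailing text after the thumbnail name.

```python
import re

# Pattern as pasted at 21:33:53. Only the group name "media" is confirmed by
# the log; "params" and "junk" are hypothetical names chosen for this sketch.
# The dots in (.jpg|.png) are left unescaped, as in the log.
THUMB_RE = re.compile(
    r"(temp/|archive/)?./../(\d+!)?(?P<media>[^/]+)/"
    r"(?P<params>.*)-(?P=media)(.jpg|.png)?(?P<junk>.*)$"
)


def is_suspect(path: str) -> bool:
    """Return True if a thumbnail object name looks broken.

    Assumed rule (not stated in the log): a name is suspect when it does not
    match the pattern at all (e.g. a truncated name), or when anything is
    left over after the thumbnail name (e.g. stray URL parameters).
    """
    match = THUMB_RE.match(path)
    if match is None:
        return True
    return match.group("junk") != ""


if __name__ == "__main__":
    samples = [
        "a/a2/Face-espiegle.svg/40px-Face-espiegle.svg.png",
        "a/a2/Face-espiegle.svg/30px-Face-esp",
        "5/55/Snowboard_half-pipe.ogg/seek=17.6-Snowboard_half-pipe.ogg.jpg",
        "4/4d/000_0331.JPG/720px-000_0331.JPG&crop&fallback=hub_city&prefix=q",
    ]
    for name in samples:
        print(is_suspect(name), name)
```

The backreference is what lets media names containing hyphens work: the parameter part can only end where a hyphen is followed by an exact second copy of the media name, which matches the reasoning given at 21:34:05 and 21:34:17.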
[21:39:42] New patchset: Lcarr; "Adding in proper cgi.cfg for nagios3 and moving all nagios3 specific files to their own folder" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2846 [21:39:54] maplebed: I forgot what (?P=media) meant for a second :) [21:40:06] gotcha. [21:44:52] PROBLEM - MySQL Idle Transactions on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:45:10] PROBLEM - MySQL Recent Restart on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:45:28] PROBLEM - SSH on db24 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:45:56] PROBLEM - MySQL Replication Heartbeat on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:04] PROBLEM - DPKG on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:04] PROBLEM - mysqld processes on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:22] PROBLEM - MySQL Slave Delay on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:22] PROBLEM - MySQL Slave Running on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:31] PROBLEM - Disk space on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:31] PROBLEM - MySQL disk space on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:41] PROBLEM - RAID on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:41] PROBLEM - Full LVS Snapshot on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:47:14] úps [21:48:07] New review: Lcarr; "gerrit borked" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2846 [21:48:09] saper: please put the plug back in ;) [21:48:11] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2846 [21:49:32] ookay ... lemme check... the red one? [21:49:36] New patchset: Ryan Lane; "Revert "rm ansi sequences when validating puppet changes". Unfortunately broke the lint check." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/2858 [21:49:59] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2858 [21:49:59] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2858 [21:55:12] Hi. I think we have a vandal at bugzilla again. [21:55:29] RoanKattouw: I have a question about https://www.mediawiki.org/wiki/Special:Code/MediaWiki/103701, right now Special:NewPages shows current page title after page move, however if I change namespace of the page, the filter for example main namespace shows me pages moved to other namespaces... is this a bug or a feature? [21:56:05] He's changing bugs as "resolved" while they're not. [21:56:32] Hmm [21:56:45] mafk_: Link(s)? [21:56:51] john.next@gmx.com [21:57:04] Example: https://bugzilla.wikimedia.org/show_bug.cgi?id=18526 [21:57:05] Yeah that was about an hour ago [21:57:12] We blocked him and we're cleaning it up I think [21:57:15] Just received the email now. [21:57:24] Beau_: It's sort of a bug, exposed by this fix [21:57:36] Sorry for double-work [21:57:47] Beau_: The namespace filter is still based on the old original name, and we can't really change that [22:00:18] RoanKattouw: if there is old name stored in that table, maybe it will be a good idea to show a message: 2012-02-28T22:56:01 ‎Dyskusja:Reineckeia (hist) ‎[6,323 bytes] ‎MastiBot (Talk | contribs | block) moved from Previous title (→Robot zgłasza niedostępny link zewnętrzny) [22:00:23] it will not confuse the users [22:00:39] That could be done [22:00:45] Could you file a bug for that? 
[22:00:53] okay [22:01:23] I probably won't have time for it, but it should be an easy bug for someone to fix some time [22:04:37] New patchset: Hashar; "rm ansi sequences when validating puppet changes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2863 [22:09:41] PROBLEM - NTP on db24 is CRITICAL: NTP CRITICAL: No response from NTP server [22:16:29] New patchset: Lcarr; "Fixing file paths" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2864 [22:17:18] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2864 [22:17:19] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2864 [22:22:08] New patchset: Lcarr; "removing default site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2865 [22:22:32] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2865 [22:23:28] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2865 [22:23:29] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2865 [22:30:32] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:32:29] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:33:18] !log on db40: deleting mysqld data dir and recreating from schema files in /a/dump [22:33:20] Logged the message, Master [22:34:09] !log aaron synchronized php-1.19/extensions/FlaggedRevs/business/RevisionReviewForm.php 'debug logging' [22:34:12] Logged the message, Master [22:34:26] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:36:23] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:38:02] RECOVERY - Disk space on db40 is OK: DISK OK [22:38:20] PROBLEM - Puppet freshness on db1040 
is CRITICAL: Puppet has not run in the last 10 hours [22:39:05] RECOVERY - MySQL disk space on db40 is OK: DISK OK [22:40:17] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:41:04] !log aaron synchronized php-1.19/extensions/FlaggedRevs/backend/FlaggedRevs.hooks.php 'rc_patrolled bug fix' [22:41:07] Logged the message, Master [22:42:14] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:44:06] "The certificate expired on 21.01.2011 16:35. The current time is 28.02.2012 23:43." (wikitech) [22:44:11] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:46:17] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:48:14] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:49:26] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Tue Feb 28 22:49:16 UTC 2012 [22:50:11] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:52:08] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:53:29] PROBLEM - RAID on db40 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:54:05] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:54:19] Why are both test.wikipedia and test2.wikipedia on 1.19 ? [22:54:31] I have no idea [22:54:47] is that revertible ? [22:54:58] (one of them) [22:55:33] Why? [22:55:34] Krinkle: is it a problem? 
[22:55:41] we need a test1 ;) [22:55:48] I thought that was test [22:56:02] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:56:50] I figured it would be helpful for emergency testing to have one set to 'default' and one to 'next', so that during all stages there is a working environment from before [22:57:02] until the last stage flips it and leaves only 'default' [22:57:10] anyway, something for next time [22:57:23] "This [test2 wiki] was upgraded to MediaWiki 1.19 on Monday, February 13, 23:00-01:00 UTC. [22:57:25] hm.. [22:57:53] domas: guess what wiki is OOMing on parse? [22:57:58] so that one was scheduled [22:57:59] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:58:00] https://www.mediawiki.org/wiki/MediaWiki_1.19/Roadmap#Deployment_schedule [22:58:07] deployment of 1.19 to test.wiki was not scheduled [22:58:08] aaronschulz: l'oreal! [22:58:15] aaronschulz: no wait, head&shoulders! [22:58:30] aaronschulz: hmmm.... ran out of shampoo brands! [22:58:34] Server: srv286 [22:58:36] Method: GET [22:58:38] URL: http://oc.wikipedia.org/wiki/Prada_de_Conflent [22:58:43] flooding /var/log/mw/fatal.log [23:00:05] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:00:28] domas: seriously, click that link ;) [23:00:52] why would I [23:00:57] I cannot do anything with ocwiki [23:01:08] they want to take my root access away [23:01:25] so you're letting them vandalize the site with OOMs? :p [23:02:02] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:03:59] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:05:02] PROBLEM - Puppet freshness on hooper is CRITICAL: Puppet has not run in the last 10 hours [23:05:17] aaronschulz: why would I care? 
[23:05:26] I was explicitly asked not to do anything to upset the community [23:05:35] or my root access will be taken away [23:05:56] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [23:05:56] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:06:07] domas: you can't even investigate that wiki? [23:06:37] ah well [23:06:44] I guess thats already against their sovereignty [23:06:58] * AaronSchulz leaves it broken [23:07:17] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Tue Feb 28 23:06:55 UTC 2012 [23:07:35] domas: may I ask why they are so angry about you? [23:07:53] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:08:06] dabpunkt: dunno, they got upset [23:08:22] http://oc.wikipedia.org/wiki/Discussion_Utilizaire:Midom [23:09:50] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:11:47] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:13:53] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:15:50] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:17:41] New patchset: Ryan Lane; "Adding manganese as a new gerrit server." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/2866 [23:17:47] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:18:23] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2866 [23:18:24] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2866 [23:19:44] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:20:02] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Tue Feb 28 23:19:50 UTC 2012 [23:21:21] New patchset: Ryan Lane; "Removing sumanah from manganese, for now." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2867 [23:21:41] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:22:35] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2867 [23:22:36] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2867 [23:23:38] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:25:53] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:27:50] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:29:47] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:30:05] New patchset: Ryan Lane; "Adding an apache server to the gerrit proxy" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2868 [23:30:50] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Tue Feb 28 23:30:42 UTC 2012 [23:31:29] !log asher synchronized wmf-config/db.php 'pulling db24' [23:31:32] Logged the message, Master [23:31:44] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:32:10] New review: Ryan 
Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2868 [23:32:10] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2868 [23:32:32] !log midom synchronized wmf-config/db.php 'putting db34 back ha ha thanks for reminding' [23:32:35] Logged the message, Master [23:34:31] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:35:26] "RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Tue Feb 28 23:30:42 UTC 2012" vs. "PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours" [23:35:32] 10 hours isn't what it used to be [23:36:23] * AaronSchulz watches asher distort time-space [23:36:37] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:37:56] RobH: Ryan_Lane: ever tried to power cycle one of the old suns from the ilom and get "Performing hard reset on /SYS failed reset: Internal error" ? 
[23:38:34] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:38:48] New patchset: Ryan Lane; "Fix requirement chain" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2869 [23:38:48] o.O [23:38:52] binasher: never had that issue [23:39:03] I haven't dealt with them too much, though [23:40:31] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:42:28] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:44:11] New patchset: Ryan Lane; "Prefer RC4 ciphers to combat BEAST" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2870 [23:44:25] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:44:34] RECOVERY - SSH on db24 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [23:44:34] RECOVERY - MySQL Idle Transactions on db24 is OK: OK longest blocking idle transaction sleeps for seconds [23:44:52] RECOVERY - Disk space on db24 is OK: DISK OK [23:45:01] RECOVERY - MySQL Recent Restart on db24 is OK: OK seconds since restart [23:45:01] RECOVERY - RAID on db24 is OK: OK: 1 logical device(s) checked [23:45:10] RECOVERY - MySQL Replication Heartbeat on db24 is OK: OK replication delay seconds [23:45:37] RECOVERY - MySQL Slave Delay on db24 is OK: OK replication delay seconds [23:45:46] RECOVERY - DPKG on db24 is OK: All packages OK [23:45:52] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2869 [23:45:53] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2869 [23:46:03] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2870 [23:46:04] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2870 [23:46:04] RECOVERY - MySQL Slave Running on db24 is OK: OK 
replication [23:46:22] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:47:45] New patchset: Ryan Lane; "Fix stupid requirement copy/paste" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2871 [23:48:19] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:48:55] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2871 [23:48:55] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2871 [23:50:16] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:51:10] PROBLEM - MySQL Replication Heartbeat on db24 is CRITICAL: CRIT replication delay 7830 seconds [23:51:55] PROBLEM - MySQL Slave Running on db24 is CRITICAL: CRIT replication Slave_IO_Running: No Slave_SQL_Running: No Last_Error: Rollback done for prepared transaction because its XID was not in the [23:52:22] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:54:19] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:56:16] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:58:13] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours