[00:07:05] New patchset: Lcarr; "Adding in gangliaweb class and putting on nickel" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1775
[00:07:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1775
[00:11:52] New review: Petrb; "looks good, but someone else must approve :|" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/1775
[00:14:21] neilk_, you're not in the middle of doing svn update on fenari are you?
[00:14:39] Reedy: as a matter of fact yes, but I was just about to abort.
[00:14:47] Reedy: please give me a minute or two to clean up.
[00:14:55] Alright
[00:15:03] zzz
[00:27:22] Reedy: things should be okay now
[00:28:16] cheers
[00:28:57] !log reedy synchronized php-1.18/extensions/ 'r107970'
[00:28:58] Logged the message, Master
[00:30:29] !log reedy synchronized php-1.18/extensions/Nuke/ 'r107974'
[00:30:30] Logged the message, Master
[00:32:20] !log tstarling synchronizing Wikimedia installation... :
[00:32:21] Logged the message, Master
[00:33:40] hehe
[00:34:00] sync done.
[00:41:17] !log reedy synchronized php-1.18/includes/specials/ 'r107975'
[00:41:18] Logged the message, Master
[00:42:12] !log running purgeParserCache.php on hume, deleting objects older than 3 months
[00:42:13] Logged the message, Master
[00:48:57] !log reedy synchronized php-1.18/extensions 'r107977, r107976'
[00:48:58] Logged the message, Master
[00:56:30] Do you guys know that Special:LinkSearch is broken?
[00:56:35] At least in ru.wp
[00:58:12] Reedy: ping
[00:58:19] Bug with r107956
[00:58:42] Blah
[00:58:51] PHP fatal error in /usr/local/apache/common-local/php-1.18/includes/SpecialPage.php line 686:
[00:58:52] Call to undefined method RequestContext::getLanguage()
[01:00:52] Oh
[01:00:54] Feck
[01:05:17] !log reedy synchronized php-1.18/includes/ 'r107978'
[01:05:17] Logged the message, Master
[01:26:40] New patchset: Lcarr; "Creating ganglia frontend class, gmetad.conf added" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1776
[01:26:55] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1776
[01:32:00] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1776
[01:34:14] New review: Lcarr; "adding in because accidentally put this in requirements" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1775
[01:34:15] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1776
[01:34:15] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1775
[01:46:12] hmm, might be worth it to send an update to microblogging groups watchers, I see that even #wikipedia status has not been updated
[01:46:16] !log Last week slowness: job queue backlog now cleared on !Wikimedia Commons and (almost) English !Wikipedia http://ur1.ca/77q9b
[01:46:18] Logged the message, Master
[02:04:57] !log LocalisationUpdate completed (1.18) at Wed Jan 4 02:04:57 UTC 2012
[02:04:58] Logged the message, Master
[02:25:44] !log removing manually added 10.2.1.13 address from lvs4
[02:25:46] Logged the message, Master
[02:28:07] !log on spence: setting up logrotate for nagios.log and removing nagios-bloated-log.log
[02:28:08] Logged the message, Master
[02:30:54] !log I should clarify that I removed 10.2.1.13 from /etc/network/interfaces, it's still properly bound to lo
[02:30:55] Logged the message, Master
[02:56:21] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[03:04:51] RECOVERY - Puppet freshness on ms1002 is OK: puppet ran at Wed Jan 4 03:04:41 UTC 2012
[03:20:30] !log on spence in nagios.cfg, reduced service_reaper_frequency from 10 to 1, to avoid having a massive process count spike every 10 seconds as checks are started. Locally only as a test.
[03:20:32] Logged the message, Master
[03:28:50] !log experimentally raised max_concurrent_checks to 128
[03:28:51] Logged the message, Master
[04:05:55] TimStarling: urgently look at https://bugzilla.wikimedia.org/show_bug.cgi?id=32432 - the problem is just actually happening!
[04:06:12] and good night...
[04:23:09] RECOVERY - MySQL disk space on es1004 is OK: DISK OK
[04:23:39] RECOVERY - Disk space on es1004 is OK: DISK OK
[04:37:44] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[04:42:24] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2638*
[05:01:54] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2350
[06:03:33] tstarling cleared profiling data
[07:20:21] New patchset: tstarling; "Updated IP address for upload-lb on ms6.esams" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1777
[07:20:35] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1777
[07:20:51] New review: tstarling; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1777
[07:20:52] Change merged: tstarling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1777
[07:35:56] !log on ms6.esams: fixed proxy IP address and stopped puppet while I figure out how to fix it
[07:35:58] Logged the message, Master
[07:43:19] !log fixed puppet by re-running the post-merge hook with key forwarding enabled, and then started puppet on ms6
[07:43:20] Logged the message, Master
[08:09:48] https://de.wikipedia.org/wiki/Spezial:MediaWiki-Systemnachrichten
[08:09:52] PHP fatal error in /usr/local/apache/common-local/php-1.18/includes/specials/SpecialAllmessages.php line 118:
[08:09:56] Call to undefined method AllmessagesTablePager::getRequest()
[08:11:14] I think r107975 is the culprit
[08:41:54] Raymond_: good catch
[08:43:41] Nikerabbit: thanks. can you revert r107975 for SpecialAllmessages.php only? and deploy?
[08:43:48] or fix it :)
[08:43:50] Raymond_: I'll fix it
[08:43:55] perfect
[08:48:08] !log nikerabbit synchronized php-1.18/includes/specials/SpecialAllmessages.php 'r107998'
[08:48:09] Logged the message, Master
[08:54:11] Raymond_: ?
[08:55:11] Nikerabbit: works. Thanks a lot
[09:43:36] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours
[09:57:56] PROBLEM - Disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 441484 MB (3% inode=99%):
[09:59:46] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 435564 MB (3% inode=99%):
[10:08:36] RECOVERY - MySQL slave status on es1004 is OK: OK:
[11:34:41] !log catrope synchronized php-1.18/extensions/ClickTracking/ClickTracking.hooks.php 'r108017'
[11:34:42] Logged the message, Master
[11:49:12] New patchset: Catrope; "Logrotate doesn't work with a missing olddir." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1691
[11:50:28] New review: ArielGlenn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1691
[11:50:35] New review: ArielGlenn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1691
[11:50:36] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1691
[11:50:55] Thanks
[13:42:59] zz
[16:24:07] wow lag just jumped to almost 3 minutes
[16:24:51] it takes at least 3 minutes to do that
[16:31:43] RoanKattouw, have you been monitoring the decommissioned workers? Have new ones appeared today?
[16:33:08] Let me check
[16:33:16] The ones that were running were stuck anyway, I was being paranoid :)
[16:33:42] ok
[16:33:56] Nope, they're all clean
[16:34:52] great
[16:35:27] today, i'm cleaning all the bad entries found in the 01/01/2012 dump by null-editing the pages
[16:35:33] let's see what happens in the next dump
[18:03:28] !log catrope synchronized php-1.18/resources/mediawiki/mediawiki.user.js 'Live hack for tracking a percentage of bucketing events'
[18:03:29] Logged the message, Master
[18:04:17] ACKNOWLEDGEMENT - Host knsq11 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn RT #2206 - hardware failure
[18:15:08] RoanKattouw, bad news
[18:15:38] http://pt.wiktionary.org/w/api.php?action=query&prop=links&format=xml&pllimit=1000&pageids=967
[18:16:00] there is a new bad link [[Imagem:....], which was not present in the latest dump
[18:16:17] When was the last dump made?
[18:16:22] 01/01
[18:16:45] page touched this morning
[18:18:10] actually, a lot of new pages appeared since last dump
[18:19:09] looking at the list at http://toolserver.org/~betacommand/ptwiktionary_p.txt , it seems most if not all templates with flags and [[Imagem:...]] created this problem
[18:19:57] ACKNOWLEDGEMENT - Host srv191 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn RT #2193 - failed hdd
[18:20:34] this morning i edited a dependent template which would cause all these other templates to refresh through the job queue
[18:22:10] Right
[18:22:15] Well, complain to Tim on the bug then
[18:25:15] !log catrope synchronized wmf-config/CommonSettings.php 'Enable tracking for AFTv5 bucketing'
[18:25:16] Logged the message, Master
[18:27:43] !log catrope synchronized wmf-config/CommonSettings.php 'and bump the version number too'
[18:27:44] Logged the message, Master
[18:32:01] !log catrope synchronized php-1.18/resources/mediawiki/mediawiki.user.js 'Revert live hack'
[18:32:02] Logged the message, Master
[18:33:21] !log catrope synchronized wmf-config/CommonSettings.php 'Actually bump version number'
[18:33:21] Logged the message, Master
[18:34:01] RoanKattouw, ok. when I see him around (he's always away at my time). on a different subject, is it normal to have pagelinks entries with a pl_from that does not exist in page (in page_id)?
[18:34:17] Not really, but it happens sometimes
[18:34:25] ok, i found 3
[18:37:11] !log catrope synchronized php-1.18/resources/startup.js 'touch'
[18:37:12] Logged the message, Master
[18:46:32] !log catrope synchronized wmf-config/CommonSettings.php 'Disable AFTv5 bucketing tracking again'
[18:46:33] Logged the message, Master
[18:58:26] !log catrope synchronized php-1.18/extensions/ArticleFeedbackv5/modules/jquery.articleFeedbackv5/jquery.articleFeedbackv5.js 'r108064'
[18:58:26] Logged the message, Master
[19:13:33] New patchset: Jgreen; "puppetizing impression log collection scripts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1778
[19:15:05] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1778
[19:15:05] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1778
[19:15:24] !log reedy synchronized php-1.18/extensions/CentralAuth/specials/ 'r107070'
[19:15:25] Logged the message, Master
[19:15:27] !log r108070 even
[19:15:27] Logged the message, Master
[19:23:31] New patchset: Lcarr; "Fixing ganglia web installation on nickel" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1779
[19:25:48] ACKNOWLEDGEMENT - Host srv199 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn RT #2209 NIC failure
[19:27:12] New review: RobH; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1779
[19:27:35] New review: Dzahn; "instead of require "generic::webserver::php5-mysql" and $ssl=true, you can also:" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1779
[19:32:08] New patchset: Lcarr; "Fixing ganglia web installation on nickel" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1779
[19:39:50] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 1; - https://gerrit.wikimedia.org/r/1779
[19:40:46] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1779
[19:40:47] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1779
[19:52:38] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours
[19:53:04] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1774
[19:53:05] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1774
[20:04:46] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1768
[20:04:47] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1768
[20:05:24] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1769
[20:05:25] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1769
[20:13:27] New review: Dzahn; "URLs changed on gallium and work fine" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1774
[20:14:26] New review: Dzahn; "installed maven on gallium without problems" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1769
[20:15:22] mutante: thanks 8)
[20:15:42] New patchset: Lcarr; "Adding warning + nickel into netboot.cfg" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1780
[20:15:56] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1780
[20:16:02] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1780
[20:16:11] New review: Dzahn; "no problems, pulled in these:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1768
[20:16:26] hashar: yw
[20:49:26] New patchset: Ryan Lane; "Adding in a pre-login banner for labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1781
[20:50:31] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1781
[20:50:32] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1781
[20:52:05] PROBLEM - HTTP on nickel is CRITICAL: Connection refused
[20:56:05] PROBLEM - SSH on nickel is CRITICAL: Connection refused
[20:56:54] New patchset: Jgreen; "refactoring banner impression log handling scripts for easier maintenance" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1783
[20:57:39] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1783
[20:57:39] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1783
[21:16:34] !log Applying schema changes to moodbar_feedback_response on all wikis (drop index, create index, add column)
[21:16:35] Logged the message, Mr. Obvious
[21:24:11] !log deploying LdapAuthentication 2.0a and OpenStackmanager 1.3 to virt1
[21:24:11] Logged the message, Master
[21:25:51] RoanKattouw: Is a master-change in the air for that?
[21:26:23] No
[21:26:36] It was a <1000 row table
[21:26:49] ok :)
[21:29:02] yay. update worked fine
[22:07:42] RECOVERY - SSH on nickel is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[22:28:18] New patchset: tstarling; "Updated collector location" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1785
[22:28:50] New review: tstarling; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1785
[22:28:50] Change merged: tstarling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1785
[22:34:01] tstarling cleared profiling data
[22:41:02] New patchset: Asher; "install percona-nagios-checks in the right place, add nrpe template" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1786
[22:42:43] !log taking srv280 for action=purge slowness investigation
[22:42:44] Logged the message, Master
[22:45:30] New patchset: Asher; "install percona-nagios-checks in the right place, add nrpe template" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1786
[22:46:15] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1786
[22:46:15] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1786
[22:58:33] Heads up: we'll be starting an office hours in #wikimedia-office in just a few moments, topic is MediaWiki features development and it's with the features team at the WMF
[23:00:50] PROBLEM - NTP on nickel is CRITICAL: NTP CRITICAL: No response from NTP server
[23:16:51] PROBLEM - mobile traffic loggers on cp1042 is CRITICAL: PROCS CRITICAL: 1 process with command name varnishncsa
[23:30:37] !log catrope synchronizing Wikimedia installation... : Deploying MoodBar and MarkAsHelpful changes
[23:30:38] Logged the message, Master
[23:31:36] PROBLEM - mobile traffic loggers on cp1042 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa
[23:32:24] sync done.
[23:36:26] RECOVERY - HTTP on nickel is OK: HTTP OK HTTP/1.1 200 OK - 455 bytes in 0.053 seconds
[23:44:57] gn8 folks
[23:49:16] RECOVERY - NTP on nickel is OK: NTP OK: Offset -0.09375977516 secs
[23:51:16] RECOVERY - mobile traffic loggers on cp1042 is OK: PROCS OK: 2 processes with command name varnishncsa
[23:58:20] i wanted to know if anyone would recommend a cool extension for wikimedia that shows who just edited what -- really big -- on the mainpage