[00:10:17] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory
[02:05:17] RECOVERY Free ram is now: OK on bots-sql2 i-000000af output: OK: 22% free memory
[03:36:17] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 16% free memory
[03:36:17] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 15% free memory
[03:48:17] PROBLEM Disk Space is now: WARNING on deployment-transcoding i-00000105 output: DISK WARNING - free space: / 78 MB (5% inode=52%):
[03:51:17] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory
[03:51:17] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 5% free memory
[03:53:47] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 11% free memory
[03:56:17] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 14% free memory
[04:01:17] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 97% free memory
[04:01:17] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory
[04:13:47] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 4% free memory
[04:16:18] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 3% free memory
[04:18:47] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory
[04:21:17] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 94% free memory
[04:26:17] PROBLEM Free ram is now: WARNING on test3 i-00000093 output: Warning: 13% free memory
[04:31:17] PROBLEM Free ram is now: CRITICAL on test3 i-00000093 output: Critical: 2% free memory
[04:36:17] RECOVERY Free ram is now: OK on test3 i-00000093 output: OK: 96% free memory
[05:33:17] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory
[06:28:17] RECOVERY Disk Space is now: OK on deployment-transcoding i-00000105 output: DISK OK
[07:00:12] PROBLEM Puppet freshness is now: CRITICAL on aggregator2 i-000002c0 output: Puppet has not run in last 20 hours
[07:11:22] PROBLEM Disk Space is now: WARNING on deployment-transcoding i-00000105 output: DISK WARNING - free space: / 78 MB (5% inode=52%):
[09:32:20] ACKNOWLEDGEMENT Puppet freshness is now: CRITICAL on aggregator2 i-000002c0 output: Puppet has not run in last 20 hours
[09:38:32] hashar: hey
[09:51:13] hello :)
[09:51:18] sorry heading to get some sleep
[10:38:52] hi
[10:39:02] hi hundfred
[11:20:12] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds.
[11:25:02] PROBLEM Free ram is now: WARNING on incubator-bot1 i-00000251 output: Warning: 13% free memory
[15:12:10] RECOVERY Disk Space is now: OK on nova-production1 i-0000007b output: DISK OK
[16:32:17] !log deployment-prep Disabled CheckUser extension again
[16:32:19] ..
[16:36:09] bot missing again?
[16:37:03] there we go
[16:37:34] !log deployment-prep Disabled CheckUser extension again
[16:37:36] Logged the message, Master
[16:37:40] Ryan_Lane: thanks :D
[16:37:44] yw
[16:37:58] I keep forgetting from where to restart it, though Petan gave the command on some bug report I did
[16:38:26] https://bugzilla.wikimedia.org/show_bug.cgi?id=37527 (marked resolved)
[16:38:34] Ryan_Lane: are you still working from Europe ?
[16:38:40] yep
[16:39:48] so I will poke you tomorrow morning :-]
[16:42:23] !log deployment-prep squid: depooling apache 20 - 24, pooling apache 30 & 31
[16:42:24] Logged the message, Master
[16:43:21] \O/
[16:43:29] beta now running from Precise boxes!
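The Free ram alerts above flip between OK, WARNING and CRITICAL as the free-memory percentage crosses two thresholds. The check's real thresholds are not shown in this log; judging only from the alerts (20% is OK, 19% and 11% are WARNING, 5% and below are CRITICAL), a minimal sketch might assume WARNING below 20% and CRITICAL at or below 5%. Both threshold values and the function name are assumptions for illustration. The exit-code mapping follows the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Hypothetical thresholds inferred from the alerts in this log, not the
# actual check's configuration.
WARN_BELOW_PCT = 20
CRIT_AT_OR_BELOW_PCT = 5

# Standard Nagios plugin exit codes.
NAGIOS_EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def classify_free_ram(free_pct: int) -> str:
    """Map a free-memory percentage to a Nagios-style state name."""
    if free_pct <= CRIT_AT_OR_BELOW_PCT:
        return "CRITICAL"
    if free_pct < WARN_BELOW_PCT:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Percentages taken from the alerts in this log.
    for pct in (22, 20, 19, 11, 5, 2, 97):
        state = classify_free_ram(pct)
        print(f"{pct}% free -> {state} (exit {NAGIOS_EXIT_CODES[state]})")
```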
[16:44:15] hashar: :O
[16:44:16] nice
[16:44:50] RECOVERY Free ram is now: OK on deployment-squid i-000000dc output: OK: 69% free memory
[16:44:50] commons still missing some thumbs though ;-]
[16:45:26] hashar: are you reloading or restarting squid
[16:45:30] reloading
[16:45:32] ok
[16:46:01] sudo service squid reload
[16:46:30] !log deployment-prep deleting the 4 old apaches
[16:47:08] well hmm labs is dead
[16:47:50] virt0 changing IP
[16:48:34] o.o
[16:48:39] how dead
[16:48:50] !ping
[16:48:50] pong
[16:51:01] well
[16:51:11] I will delete the apaches later tonight
[16:51:13] or tomorrow
[16:51:16] for now, daughter duty!!
[17:13:20] RECOVERY Disk Space is now: OK on ipv6test1 i-00000282 output: DISK OK
[17:21:20] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 68 MB (5% inode=57%):
[17:36:56] Ops folks, why u no review one line change? https://gerrit.wikimedia.org/r/#/c/11188/1 :)
[17:38:50] RECOVERY Free ram is now: OK on bots-sql2 i-000000af output: OK: 20% free memory
[17:41:19] why does everyone only add a single ops person to the reviewers list?
[17:41:20] PROBLEM SSH is now: UNKNOWN on aggregator2 i-000002c0 output: Usage:check_ssh [-46] [-t timeout] [-r remote version] [-p port] host
[17:41:29] you're also asking in the wrong channel
[17:46:10] PROBLEM host: aggregator2 is DOWN address: i-000002c0 check_ping: Invalid hostname/address - i-000002c0
[17:46:50] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory
[17:47:44] Ryan_Lane: should i add 5 ops ppl to review list at once then?
[17:47:55] ideally we'd add all of them
[17:48:00] anyway, is ldap still broke?
[17:48:03] no
[17:48:12] i can't get into bastion
[17:48:16] no?
[17:48:36] 3rd try going now...
[17:48:41] hm
[17:48:43] can't resolve
[17:49:02] that's not a good sign
[17:54:00] Ryan_Lane: new problem now?
[17:54:09] Permission denied (publickey).
[17:54:12] Ryan_Lane: is there an ops group I can add?
[17:54:43] no, it's an ldap group, and there's a bug in gerrit that doesn't let ldap groups be added as reviewers :(
[17:54:50] * jeremyb *still* wants a way to request from the wind without having to single out people........
[17:55:05] JeroenDeDauw: anyway, you should bug people in #wikimedia-operations, not here
[17:55:28] jeremyb: you can't get in?
[17:55:32] it's working for me now
[17:55:39] no worky
[17:55:46] hm
[17:55:52] well my statements apply to all of gerrit i think not just ops
[17:55:57] but i shall go bug ops too!
[17:55:59] dns looks broken inside of labs
[17:55:59] sec
[17:56:42] Reedy: look up for the history ;)
[17:57:01] Ryan_Lane: MWF needs to hire some people that have a deep seated java fetish to make gerrit magic happen ;)
[17:57:12] WMF*
[17:57:19] rotfl
[17:57:30] too bad we have no room for bugzilla quips
[17:57:58] Meh either way, some parts of MW would be better in java :P WMF/MWF are totally interchangeable in my head
[18:03:03] fixing it
[18:03:10] recursor was hitting the old ip
[18:05:08] jeremyb: fixed
[18:06:00] works! even to a non-public node! (not just to bastion)
[18:06:02] thanks!
[18:06:50] RECOVERY host: aggregator2 is UP address: i-000002c0 PING OK - Packet loss = 0%, RTA = 0.38 ms
[18:09:50] PROBLEM Disk Space is now: CRITICAL on aggregator2 i-000002c0 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:09:50] PROBLEM Current Users is now: CRITICAL on aggregator2 i-000002c0 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:09:50] PROBLEM Free ram is now: CRITICAL on aggregator2 i-000002c0 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:09:50] PROBLEM Total Processes is now: CRITICAL on aggregator2 i-000002c0 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:11:20] PROBLEM dpkg-check is now: CRITICAL on aggregator2 i-000002c0 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:12:50] PROBLEM Current Load is now: CRITICAL on aggregator2 i-000002c0 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:14:10] PROBLEM SSH is now: CRITICAL on aggregator2 i-000002c0 output: Server answer:
[18:30:10] is http://commons.wikimedia.beta.wmflabs.org/wiki/ coming up for anyone else?
[18:30:21] timed out for me
[18:30:29] oh worked that time :P
[18:38:01] ... so http://commons.wikimedia.beta.wmflabs.org/wiki/ is looking for all extension assets on //bits.wikimedia.org/static-trunk/extensions/ ... .. which of course does not have all these assets.
[18:41:08] mdale: hashar might have left beta labs in a not-so-good state, I saw something about in passing a few hours ago
[18:41:16] about that
[18:42:00] thanks chrismcmahon ... if hashar comes around, remind me to bother him ;)
[19:12:33] quick re
[19:13:35] !log deployment-prep deleting apache20-23 instances (the Lucid ones)
[19:15:35] !log deployment-prep deleting apache20-23 instances (the Lucid ones)
[19:17:34] well no luck reloading the bot :D
[19:19:36] hashar mdale was here earlier saying " http://commons.wikimedia.beta.wmflabs.org/wiki/ is looking for all extension assets on //bits.wikimedia.org/static-trunk/extensions/ ... .. which of course does not have all these assets. " was that you?
[19:19:50] ohh
[19:19:55] yeah we have no bits yet :-(
[19:21:01] hashar: while I have ya ... bug 37058 ..
[19:21:41] just added a comment... why can't it just set the temporary path as part of configuration for those transcode boxes?
[19:22:21] I think the issue was that TempFSFile::factory() gives you a file path in /tmp
[19:22:28] I started implementation on a branch but then .. to do it correct is to really create a new TempStore class that extends the base temp file class so that we get all the maintenance benefits for long expired failed transcodes etc ...
[19:22:32] .. but you can override that
[19:22:33] which is mounted on the root partition which in turn might not have enough space
[19:23:04] i.e just set the env var?
[19:23:14] hmm
[19:23:20] that might work ;-D
[19:24:34] if that will work for you ... let's update the bug once you test that.
[19:26:57] hexmode made a patch to wfTempFile() let me find it
[19:27:14] https://gerrit.wikimedia.org/r/#/c/8996/
[19:27:42] mdale: do you know if there is a way to resubmit a transcoding job ?
[19:27:59] just remove it
[19:28:05] and reload the associated page
[19:29:53] you need to give yourself the 'transcode-reset' permission or be part of the sysop group
[19:30:12] then "reset transcode" option should show up on any page
[19:30:12] http://commons.wikimedia.beta.wmflabs.org/wiki/File:Youth_media_2004_Prescott_Circus.ogv
[19:30:18] there is also api call to do that
[19:30:52] ahhh I was missing that right !
[19:30:53] thanks :-]
[19:32:11] I should give myself sysop privileges on commons somehow.
[19:32:16] on labs I mean
[19:36:46] I am updating the labs cluster
[19:36:50] well mediawiki
[19:42:35] mdale: I have lost my account on beta :-/ Will recover it and make you a sysop there
[19:45:02] password recovery for dummies : http://dpaste.org/KbtKv/
[19:53:15] @help
[19:53:15] Type @commands for list of commands. This bot is running http://meta.wikimedia.org/wiki/WM-Bot version wikimedia bot v. 1.5.8 source code licensed under GPL and located at https://github.com/benapetr/wikimedia-bot
[19:53:19] Thehelpfulone: this?
[19:54:25] !log deployment-prep Made myself a bureaucrat on commons
[19:54:43] !log updated MediaWiki core to 99fdc6e
[19:54:45] pff
[19:54:46] updated is not a valid project.
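The "just set the env var?" suggestion relies on a common convention: temp-file helpers consult TMPDIR before falling back to /tmp. The PHP side is not shown in this log, so as a language-neutral illustration, Python's `tempfile` module follows the same convention; pointing TMPDIR at a directory on a larger partition moves temp files off the root filesystem. The directory used below is a throwaway stand-in for such a partition, and `tempdir_for` is a hypothetical helper name.

```python
import os
import tempfile


def tempdir_for(tmpdir: str) -> str:
    """Return the directory tempfile would use if TMPDIR were set to tmpdir."""
    old_env = os.environ.get("TMPDIR")
    old_cached = tempfile.tempdir
    try:
        os.environ["TMPDIR"] = tmpdir
        # gettempdir() caches its result; clear it so TMPDIR is re-read.
        tempfile.tempdir = None
        return tempfile.gettempdir()
    finally:
        # Restore the previous environment and cached value.
        if old_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = old_env
        tempfile.tempdir = old_cached


if __name__ == "__main__":
    # Hypothetical stand-in for a directory on a roomier partition.
    big_disk = tempfile.mkdtemp()
    print(tempdir_for(big_disk))
```

TMPDIR is only honored if the directory exists and is writable; otherwise the lookup falls through to the next candidate.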
[19:54:57] !log deployment-prep MediaWiki core to 99fdc6e
[19:55:09] petan: looks like labs-morebots no more notify us :-(
[19:55:34] yes, there is a bug for that
[19:55:39] I created it like 2 months ago
[19:55:48] we need to move it to another server
[19:55:57] bots-2 is least stable box we have
[19:55:59] well I am pretty sure it worked earlier
[19:56:03] anything changed in the code?
[19:56:07] ohh
[19:56:09] no, just the box sucks
[19:56:16] it's overloaded most of time
[19:56:23] most cpu expensive bot runs there
[19:56:25] we might want to redeploy that bot to a dedicated one so :-]
[19:57:07] yes
[19:57:14] but only guy with code is Ryan
[19:57:44] mdale: what is your username on commons ? mdale ? :-]
[20:01:38] Is labs down?
[20:02:16] !ping
[20:02:16] pong
[20:02:21] labs are fine
[20:02:32] Hrm
[20:02:42] labsconsole seems down?
[20:02:50] hm...
[20:02:53] And my instance is, too, from what I can tell
[20:03:00] Ryan_Lane: is it?
[20:03:07] it is just slow on connect
[20:03:13] I am on labsconsole right now
[20:03:26] <^demon> Quick, everyone refresh
[20:04:03] Surely 30 people refreshing the page will fix the server load problems?
[20:04:08] definitely
[20:04:27] <^demon> marktraceur: If not, we'll all know individually without having to ask :)
[20:04:55] Well, I still can't get to it, but if I'm the only one, I guess I can be patient
[20:05:00] hashar: are we at a point where we might deploy extensions out of gerrit to http://en.wikipedia.beta.wmflabs.org/?
[20:10:57] chrismcmahon: aren't we supposed to deploy extensions in a dedicated wiki ?
[20:11:46] hashar: not as far as I know. One extension I'm interested in is already one enwiki, the other is not.
[20:11:46] dschoon: I just love your maven mail :-]
[20:11:55] already on
[20:12:48] chrismcmahon: which eats ?
[20:12:50] err
[20:12:52] which extensions?
[20:13:49] hashar AFTv5 and Editor Engagement. let me see if I can find links.
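The bot's reply "updated is not a valid project." shows that `!log` treats the first word after the command as a project name, which is why the retry with `deployment-prep` in front worked. A minimal sketch of that parsing, assuming a hypothetical set of known projects (the bot's real project list and source code are not part of this log):

```python
# Hypothetical project list; the real bot's list is not shown in this log.
KNOWN_PROJECTS = {"deployment-prep", "bots"}


def parse_log_command(line: str) -> tuple[str, str]:
    """Split '!log <project> <message>' and validate the project name."""
    prefix = "!log "
    if not line.startswith(prefix):
        raise ValueError("not a !log command")
    # The first whitespace-separated word is the project; the rest is the message.
    project, _, message = line[len(prefix):].strip().partition(" ")
    if project not in KNOWN_PROJECTS:
        raise ValueError(f"{project} is not a valid project.")
    return project, message.strip()


if __name__ == "__main__":
    for cmd in ("!log updated MediaWiki core to 99fdc6e",
                "!log deployment-prep MediaWiki core to 99fdc6e"):
        try:
            print(parse_log_command(cmd))
        except ValueError as err:
            print(err)
```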
[20:14:42] yeah I know those
[20:14:44] ;)
[20:15:19] hashar: <3
[20:16:03] so it looks like the Editor Engagement has been set up on ee_prototypewiki
[20:16:07] let me find the URL
[20:16:30] http://ee-prototype.wikipedia.beta.wmflabs.org/ <--- blank page ahah
[20:16:36] hashar: yes
[20:16:59] * hashar launch tcpdump to get syslog traces
[20:17:18] hashar: yes my user name is mdale
[20:17:48] mdale: I have made you a sysop on the beta commons
[20:17:53] chrismcmahon: ..8.....X..<189>Jun 13 20:17:25 i-000002d3 apache2[2860]: PHP Warning: require_once(/usr/local/apache/common-local/php-1.19/extensions/Interwiki/Interwiki.php) [function.require-once]: failed to open stream: No such file or directory in /usr/local/apache/common-local/wmf-config/CommonSettings.php on line 2417
[20:17:56] thx
[20:18:03] chrismcmahon: looks like the Editor Engagement is using MediaWiki 1.19 somehow
[20:18:18] mdale: Roan has bureaucrat status there IIRC
[20:18:28] mdale: if you need any more rights while I am sleeping :-]
[20:18:49] oky
[20:19:34] hashar: a recent AFTv5 deployment had a bug I think we might have caught if we'd tested it on an enwiki-like environment instead of on prototype
[20:20:59] hashar: yes, and EE people are looking for a better test env. :)
[20:21:05] chrismcmahon: yeah most probably. I am just afraid of having too many extensions on the beta enwiki, that will make it vastly different from the production wiki :/
[20:21:28] hashar: my thinking is that they could be rolled back if they caused trouble
[20:22:25] hashar: I'm also thinking that having just those two extensions in the test env would not affect the production-like nature of the beta enwiki
[20:24:03] http://ee-prototype.wikipedia.beta.wmflabs.org/ <-- no more blank page
[20:24:11] I have migrated it from 1.19 to master
[20:25:06] thanks. (I hope it doesn't break anything) :-)
[20:25:08] !log deployment-prep Migrated ee_prototypewiki from 1.19 to master (ran update.php too)
[20:25:16] well it is broken, missing logo and such :-]
[20:26:09] oh there is just no content http://ee-prototype.wikipedia.beta.wmflabs.org/wiki/Special:AllPages
[20:26:33] <^demon> ee? As in...ee.pl?
[20:26:39] editor engagement
[20:27:18] <^demon> Ah. I'm too old school...I was thinking "external editor" :)
[20:27:35] <^demon> For the 4-5 people who ever used it :p
[20:28:01] * YuviPanda wonders if people *really* edit javascript in *that* editor
[20:28:17] <^demon> Nobody edits anything in that editor, that's part of the joke :)
[20:37:39] hashar: http://ee-prototype.wikipedia.beta.wmflabs.org/ isn't returning for me
[20:37:51] <^demon> wfm.
[20:38:11] chrismcmahon: :-/ Maybe a DNS issue?
[20:38:36] don't see how that could be, but maybe
[20:39:32] it should give you a 'Main Page' article with no content
[20:40:31] <^demon> That's what I see.
[20:41:26] hmm, it's something with DNS from my end then
[20:41:46] thanks ^demon
[20:41:57] <^demon> Glad I could help? :p
[20:47:53] weird hosts on beta.wmflabs.org don't seem to be resolving for me
[20:51:38] chrismcmahon: do you have linux / mac ?
[20:51:46] you can do some debugging with the 'dig' command
[20:51:55] ok
[20:52:22] example output : http://dpaste.org/AxV9P/
[20:52:31] lines starting with a semicolon are comments
[20:52:37] the line that matters is ee-prototype.wikipedia.beta.wmflabs.org. 3600 IN A 208.80.153.219
[20:54:07] ohhh
[20:54:11] that might be another issue
[20:55:41] chrismcmahon: you might want to talk about it with ops in #wikimedia-operations
[20:55:49] they have changed a server IP some hours ago
[20:55:52] that might be the reason
[20:56:29] hashar: what do you get for ee-prototype.wmflabs.org
[20:56:50] chrismcmahon: http://dpaste.org/AxV9P/
[20:58:21] chrismcmahon: sorry I am going to bed :/
[20:58:32] #wikimedia-operations will be able to help though
[21:00:05] have a good night!
[21:19:34] all of labs down right now?
[21:21:39] ping ssmollett Ryan_Lane andrewbogott paravoid
[21:22:52] !nagios
[21:22:52] http://nagios.wmflabs.org/nagios3
[21:22:57] Eloquence: We're discussing it on wikimedia-operations. Everything works for me, but there seem to be some selective networking problems.
[21:23:17] ok
[22:03:26] PROBLEM Disk Space is now: CRITICAL on deployment-transcoding i-00000105 output: CHECK_NRPE: Socket timeout after 10 seconds.
[22:03:26] PROBLEM Disk Space is now: CRITICAL on ipv6test1 i-00000282 output: CHECK_NRPE: Socket timeout after 10 seconds.
[22:06:47] PROBLEM Disk Space is now: WARNING on deployment-transcoding i-00000105 output: DISK WARNING - free space: / 76 MB (5% inode=52%):
[22:06:47] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 68 MB (5% inode=57%):
[22:39:59] the domain name wmflabs.org isn't resolving for me
[22:40:24] is something broken?
[22:43:05] Ryan_Lane: ^
[22:43:58] kaldari: bastion.wmflabs.org does, right?
[22:44:06] no
[22:44:17] ssh: Could not resolve hostname bastion.wmflabs.org: nodename nor servname provided, or not known
[22:45:07] well, nevermind, they're all working now
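The dig advice above comes down to reading the answer section: lines beginning with a semicolon are comments, and each answer record lists the owner name, TTL in seconds, class, record type and data, as in the `ee-prototype.wikipedia.beta.wmflabs.org. 3600 IN A 208.80.153.219` line quoted above. A small sketch that pulls records out of dig-style output; the helper names are illustrative, and only the quoted answer line comes from this log. The TTL is worth noticing after a server IP change, since a resolver may keep serving a cached answer for up to that many seconds.

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    ttl: int
    rclass: str
    rtype: str
    data: str


def parse_dig_answers(output: str) -> list[Record]:
    """Parse dig-style output, skipping blank lines and ';' comment lines."""
    records = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        # name, TTL, class, type, then the rest of the line as record data
        fields = line.split(None, 4)
        if len(fields) != 5 or not fields[1].isdigit():
            continue  # not a resource record line
        name, ttl, rclass, rtype, data = fields
        records.append(Record(name, int(ttl), rclass, rtype, data))
    return records


# Minimal sample in dig's output format, built around the line quoted above.
SAMPLE = """\
;; ANSWER SECTION:
ee-prototype.wikipedia.beta.wmflabs.org. 3600 IN A 208.80.153.219
"""

if __name__ == "__main__":
    for rec in parse_dig_answers(SAMPLE):
        if rec.rtype == "A":
            print(rec.name, "->", rec.data, f"(may be cached for up to {rec.ttl}s)")
```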