[00:19:26] PROBLEM host: asher1 is DOWN address: asher1 CRITICAL - Host Unreachable (asher1) [00:19:46] PROBLEM host: hugglewa-w1 is DOWN address: hugglewa-w1 CRITICAL - Host Unreachable (hugglewa-w1) [00:19:46] PROBLEM host: puppet-lucid is DOWN address: puppet-lucid CRITICAL - Host Unreachable (puppet-lucid) [00:19:56] PROBLEM host: turnkey-1 is DOWN address: turnkey-1 CRITICAL - Host Unreachable (turnkey-1) [00:20:46] PROBLEM host: pad2 is DOWN address: pad2 CRITICAL - Host Unreachable (pad2) [00:20:46] PROBLEM host: webserver-lcarr is DOWN address: webserver-lcarr CRITICAL - Host Unreachable (webserver-lcarr) [00:23:46] PROBLEM host: dumpster01 is DOWN address: dumpster01 CRITICAL - Host Unreachable (dumpster01) [00:23:46] PROBLEM host: dumps-4 is DOWN address: dumps-4 CRITICAL - Host Unreachable (dumps-4) [00:49:26] PROBLEM host: asher1 is DOWN address: asher1 CRITICAL - Host Unreachable (asher1) [00:49:46] PROBLEM host: hugglewa-w1 is DOWN address: hugglewa-w1 CRITICAL - Host Unreachable (hugglewa-w1) [00:49:46] PROBLEM host: puppet-lucid is DOWN address: puppet-lucid CRITICAL - Host Unreachable (puppet-lucid) [00:49:56] PROBLEM host: turnkey-1 is DOWN address: turnkey-1 CRITICAL - Host Unreachable (turnkey-1) [00:50:46] PROBLEM host: pad2 is DOWN address: pad2 CRITICAL - Host Unreachable (pad2) [00:50:46] PROBLEM host: webserver-lcarr is DOWN address: webserver-lcarr CRITICAL - Host Unreachable (webserver-lcarr) [00:53:46] PROBLEM host: dumpster01 is DOWN address: dumpster01 CRITICAL - Host Unreachable (dumpster01) [00:53:46] PROBLEM host: dumps-4 is DOWN address: dumps-4 CRITICAL - Host Unreachable (dumps-4) [01:19:26] PROBLEM host: asher1 is DOWN address: asher1 CRITICAL - Host Unreachable (asher1) [01:19:46] PROBLEM host: hugglewa-w1 is DOWN address: hugglewa-w1 CRITICAL - Host Unreachable (hugglewa-w1) [01:19:46] PROBLEM host: puppet-lucid is DOWN address: puppet-lucid CRITICAL - Host Unreachable (puppet-lucid) [01:19:56] PROBLEM host: turnkey-1 is DOWN 
address: turnkey-1 CRITICAL - Host Unreachable (turnkey-1) [01:20:46] PROBLEM host: pad2 is DOWN address: pad2 CRITICAL - Host Unreachable (pad2) [01:20:46] PROBLEM host: webserver-lcarr is DOWN address: webserver-lcarr CRITICAL - Host Unreachable (webserver-lcarr) [01:23:46] PROBLEM host: dumps-4 is DOWN address: dumps-4 CRITICAL - Host Unreachable (dumps-4) [01:23:46] PROBLEM host: dumpster01 is DOWN address: dumpster01 CRITICAL - Host Unreachable (dumpster01) [01:49:26] PROBLEM host: asher1 is DOWN address: asher1 CRITICAL - Host Unreachable (asher1) [01:49:46] PROBLEM host: hugglewa-w1 is DOWN address: hugglewa-w1 CRITICAL - Host Unreachable (hugglewa-w1) [01:49:46] PROBLEM host: puppet-lucid is DOWN address: puppet-lucid CRITICAL - Host Unreachable (puppet-lucid) [01:49:56] PROBLEM host: turnkey-1 is DOWN address: turnkey-1 CRITICAL - Host Unreachable (turnkey-1) [01:50:46] PROBLEM host: webserver-lcarr is DOWN address: webserver-lcarr CRITICAL - Host Unreachable (webserver-lcarr) [01:50:46] PROBLEM host: pad2 is DOWN address: pad2 CRITICAL - Host Unreachable (pad2) [01:53:46] PROBLEM host: dumps-4 is DOWN address: dumps-4 CRITICAL - Host Unreachable (dumps-4) [01:53:46] PROBLEM host: dumpster01 is DOWN address: dumpster01 CRITICAL - Host Unreachable (dumpster01) [02:05:24] petan|wk: hi, still need reboots? 
[02:19:26] PROBLEM host: asher1 is DOWN address: asher1 CRITICAL - Host Unreachable (asher1) [02:19:46] PROBLEM host: hugglewa-w1 is DOWN address: hugglewa-w1 CRITICAL - Host Unreachable (hugglewa-w1) [02:19:46] PROBLEM host: puppet-lucid is DOWN address: puppet-lucid CRITICAL - Host Unreachable (puppet-lucid) [02:19:56] PROBLEM host: turnkey-1 is DOWN address: turnkey-1 CRITICAL - Host Unreachable (turnkey-1) [02:20:46] PROBLEM host: pad2 is DOWN address: pad2 CRITICAL - Host Unreachable (pad2) [02:20:46] PROBLEM host: webserver-lcarr is DOWN address: webserver-lcarr CRITICAL - Host Unreachable (webserver-lcarr) [02:23:46] PROBLEM host: dumpster01 is DOWN address: dumpster01 CRITICAL - Host Unreachable (dumpster01) [02:23:46] PROBLEM host: dumps-4 is DOWN address: dumps-4 CRITICAL - Host Unreachable (dumps-4) [02:36:09] !log testlabs - rebooting asher1, webserver-lcarr [02:36:26] RECOVERY host: webserver-lcarr is UP address: webserver-lcarr PING OK - Packet loss = 0%, RTA = 0.87 ms [02:36:27] oh [02:36:28] those? [02:36:33] those are all on virt3 [02:36:37] I left them down on purpose [02:36:46] RECOVERY host: asher1 is UP address: asher1 PING OK - Packet loss = 0%, RTA = 0.81 ms [02:36:47] though virt3 doesn't look overly overloaded [02:37:07] arr, well < petan|wk> mutante, ssmollett can you reboot all these instances reported by nagios? [02:37:17] I'd say let's mute them in nagios [02:37:46] yeah, thats why i hesitated, i figured some might be down on purpose [02:37:48] it's possible we'll run the system out of memory again [02:38:06] yeah ok, let me shut them down again [02:38:10] since we're two dimms down [02:39:20] oh..hm. shutting down not just a click away .P hrmm [02:39:34] nope. 
there's no command for doing so [02:39:44] either need to ssh into it and shutdown, or do it from the system its on [02:39:52] using virsh destroy [02:40:01] destroy sounds evil :) [02:40:04] yeah [02:40:13] it isn't a graceful shutdown [02:40:27] neither is reboot, though [02:42:26] RECOVERY Total Processes is now: OK on essex-9 essex-9 output: PROCS OK: 99 processes [02:43:50] Ryan_Lane: whats the correct domain name? trying instance name, hostname or fqdn, but getting "failed to get domain" [02:43:56] RECOVERY Current Load is now: OK on essex-9 essex-9 output: OK - load average: 0.87, 1.09, 0.63 [02:44:07] what are you trying to do? [02:44:10] ssh into it? [02:44:13] destroy instances [02:44:19] virsh list [02:44:21] in virsh shell [02:44:29] then, list [02:44:36] RECOVERY Current Users is now: OK on essex-9 essex-9 output: USERS OK - 1 users currently logged in [02:44:36] then look for instance- [02:44:44] it'll be the same as i- [02:45:06] then destroy instance- [02:45:16] RECOVERY Disk Space is now: OK on essex-9 essex-9 output: DISK OK [02:45:30] ah, gotta prefix "instance-". instance-0000003a . i tried "I-0000003a" like in the Nova Resource page [02:45:42] yeah. nova expands that in the background [02:46:06] RECOVERY Free ram is now: OK on essex-9 essex-9 output: OK: 78% free memory [02:47:21] !log testlabs - "destroyed" I-0000003a (asher1) and I-00000134 (webserver-lcarr) again to prevent OOM [02:47:39] heh [02:47:47] the stupid bot isn't in the room [02:47:55] :p [02:47:59] gimme a sec [02:48:10] it needs to be fixed [02:48:16] RECOVERY dpkg-check is now: OK on essex-9 essex-9 output: All packages OK [02:48:51] it's going to die in a sec [02:48:53] !log test [02:48:53] Message missing. Nothing logged. [02:49:01] !log test test [02:49:27] ok. 
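Editor's sketch of the naming mismatch sorted out above: the Nova Resource page shows an ID such as I-0000003a, while `virsh list` and `virsh destroy` expect instance-0000003a. The helper name `libvirt_domain_name` and the assumption that the suffix is always eight hex digits are illustrative only; the log confirms just the two IDs quoted.

```python
import re

# Assumption: IDs look like "I-" followed by eight hex digits, as in the
# two examples from the log (I-0000003a, I-00000134).
NOVA_ID_RE = re.compile(r"[iI]-([0-9a-fA-F]{8})")


def libvirt_domain_name(nova_id):
    """Map a labsconsole-style ID to the domain name virsh expects."""
    match = NOVA_ID_RE.fullmatch(nova_id)
    if match is None:
        raise ValueError("not a recognised instance ID: %r" % nova_id)
    return "instance-" + match.group(1).lower()


if __name__ == "__main__":
    # The result is what would be passed to `virsh destroy` on the host.
    print(libvirt_domain_name("I-0000003a"))
```

Rejecting anything that does not match the pattern keeps a mistyped ID from being handed to virsh at all.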
it'll be good now [02:49:35] re-try the log :) [02:49:46] PROBLEM host: hugglewa-w1 is DOWN address: hugglewa-w1 CRITICAL - Host Unreachable (hugglewa-w1) [02:49:46] PROBLEM host: puppet-lucid is DOWN address: puppet-lucid CRITICAL - Host Unreachable (puppet-lucid) [02:49:56] PROBLEM host: turnkey-1 is DOWN address: turnkey-1 CRITICAL - Host Unreachable (turnkey-1) [02:50:04] !log testlabs - rebooted asher1, webserver-lcarr [02:50:05] Logged the message, Master [02:50:14] !log testlabs - "destroyed" I-0000003a (asher1) and I-00000134 (webserver-lcarr) again to prevent OOM [02:50:15] Logged the message, Master [02:50:36] PROBLEM host: asher1 is DOWN address: asher1 CRITICAL - Host Unreachable (asher1) [02:50:46] PROBLEM host: pad2 is DOWN address: pad2 CRITICAL - Host Unreachable (pad2) [02:52:06] PROBLEM host: webserver-lcarr is DOWN address: webserver-lcarr CRITICAL - Host Unreachable (webserver-lcarr) [02:53:46] PROBLEM host: dumps-4 is DOWN address: dumps-4 CRITICAL - Host Unreachable (dumps-4) [02:53:46] PROBLEM host: dumpster01 is DOWN address: dumpster01 CRITICAL - Host Unreachable (dumpster01) [03:13:14] I'd like to have permission to execute host commands (notifications, scheduled downtime) on Nagios [03:13:33] and/or feature request to do those via IRC bot ;) [03:18:44] host command is just a Nagios permission issue, while downtime would be accepted but fails due to file permissions (Error: Could not open command file '/var/lib/nagios3/rw/nagios.cmd' for update! .. The permissions on the external command file and/or directory may be incorrect. " .. 
lemme check puppet files [03:19:46] PROBLEM host: hugglewa-w1 is DOWN address: hugglewa-w1 CRITICAL - Host Unreachable (hugglewa-w1) [03:19:46] PROBLEM host: puppet-lucid is DOWN address: puppet-lucid CRITICAL - Host Unreachable (puppet-lucid) [03:19:56] PROBLEM host: turnkey-1 is DOWN address: turnkey-1 CRITICAL - Host Unreachable (turnkey-1) [03:20:46] PROBLEM host: pad2 is DOWN address: pad2 CRITICAL - Host Unreachable (pad2) [03:21:36] PROBLEM host: asher1 is DOWN address: asher1 CRITICAL - Host Unreachable (asher1) [03:22:46] PROBLEM host: webserver-lcarr is DOWN address: webserver-lcarr CRITICAL - Host Unreachable (webserver-lcarr) [03:23:46] PROBLEM host: dumps-4 is DOWN address: dumps-4 CRITICAL - Host Unreachable (dumps-4) [03:23:46] PROBLEM host: dumpster01 is DOWN address: dumpster01 CRITICAL - Host Unreachable (dumpster01) [03:26:49] hmm. not puppetized yet it seems [03:27:33] !log nagios puppet broken due to "Could not find class misc::apache2" [03:27:34] Logged the message, Master [03:36:30] !log nagios even though listed in all authorized_for_* commands in cgi.cfg i get denied to execute any by web ui. 
guess related to the Apache LDAP auth / auto-login [03:36:31] Logged the message, Master [03:37:37] :Q [03:49:46] PROBLEM host: hugglewa-w1 is DOWN address: hugglewa-w1 CRITICAL - Host Unreachable (hugglewa-w1) [03:49:46] PROBLEM host: puppet-lucid is DOWN address: puppet-lucid CRITICAL - Host Unreachable (puppet-lucid) [03:49:56] PROBLEM host: turnkey-1 is DOWN address: turnkey-1 CRITICAL - Host Unreachable (turnkey-1) [03:50:46] PROBLEM host: pad2 is DOWN address: pad2 CRITICAL - Host Unreachable (pad2) [03:51:46] PROBLEM host: asher1 is DOWN address: asher1 CRITICAL - Host Unreachable (asher1) [03:52:46] PROBLEM host: webserver-lcarr is DOWN address: webserver-lcarr CRITICAL - Host Unreachable (webserver-lcarr) [03:53:46] PROBLEM host: dumps-4 is DOWN address: dumps-4 CRITICAL - Host Unreachable (dumps-4) [03:53:46] PROBLEM host: dumpster01 is DOWN address: dumpster01 CRITICAL - Host Unreachable (dumpster01) [03:57:46] mutante: yeah, not too sure how to handle that [03:57:52] authentication scares me a little [03:58:06] since non-ops can access it as root [03:58:12] which means they can steal our passwords [03:58:33] mutante: also, nagios config isn't puppetized in labs because it isn't puppetized in production [03:58:39] so, it's scripted in labs [03:58:42] using SMW queries [03:59:27] we really need an SSO server [03:59:52] then everything could send people off to the SSO server to authenticate, and would get a token, rather than a password [04:00:41] Ryan_Lane: while i still get the "Your account does not have permissions to execute commands." in the web ui, i did fix the "permissions on external command file" thing just now.. well manually [04:00:49] ah [04:00:54] I dunno how to set that up [04:00:56] following instructions from Nagios faq [04:01:08] isn't that really unsafe? 
[04:01:11] so now i could use the "Downtime" link, and schedule a downtime for host asher1 [04:01:51] I thought there were vulnerabilities with allowing execution of commands [04:01:54] it is like: create a group "nagiocmd" which was missing, add the user nagios and the webserver user (www-data) to that group [04:02:49] then give the group rwx and g+s on that named pipe called "rw" [04:03:26] hmm, yeah, basically just wanted to see if that works, when doing what it says in the faq [04:05:18] http://nagios.manubulon.com/traduction/docs14en/commandfile.html [04:06:54] you're right about this: "If you've installed Nagios on a public/multi-user machine, I would suggest setting more restrictive permissions on the external command file and using something like CGIWrap to run the CGIs as a specific user. Failing to do so may allow normal users to control Nagios through the external command file! " [04:11:05] Change on 12mediawiki a page Wikimedia Labs/Agreement to disclosure of personally identifiable information was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513110 edit summary: [04:11:57] restricted it again for now.. we'll have to take a closer look [04:13:29] Change on 12mediawiki a page Wikimedia Labs was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513112 edit summary: /* Documents */ [04:14:45] heh nice, i didnt see the bot doing that yet:) [04:17:32] Change on 12mediawiki a page Wikimedia Labs/Agreement to disclosure of personally identifiable information was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513114 edit summary: [04:18:06] heh [04:18:13] !log nagios - temp. 
changed permissions on external command file per Nagios FAQ, added group "nagiocmd" to see if that allows me to schedule downtimes, it does (independently from the host command perms), but took permissions back due to security concerns [04:18:14] Logged the message, Master [04:18:22] it only works on mediawiki.org. We need to make it work for labsconsole too [04:19:01] Change on mediawiki a page Wikimedia Labs/Terms of use was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513115 edit summary: [04:19:21] Change on mediawiki a page Wikimedia Labs/Terms of use/exception policy was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513116 edit summary: [04:19:46] PROBLEM host: hugglewa-w1 is DOWN address: hugglewa-w1 CRITICAL - Host Unreachable (hugglewa-w1) [04:19:46] PROBLEM host: puppet-lucid is DOWN address: puppet-lucid CRITICAL - Host Unreachable (puppet-lucid) [04:19:56] PROBLEM host: turnkey-1 is DOWN address: turnkey-1 CRITICAL - Host Unreachable (turnkey-1) [04:20:06] Ryan_Lane: thinking along the line of "let humans execute host commands / downtimes" but ONLY via the ircbot, not using the web ui for it at all ... 
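Editor's sketch of the bot-only idea floated above: the bot accepts just a hostname and a duration, validates both, and builds one hardcoded Nagios external command. The checks mirror those raised in the channel (the host must appear in a known host list, the duration is a number plus one of s/m/h/d). The names `build_downtime_command`, `BOT_AUTHOR`, the hostname pattern and the comment text are hypothetical; the `SCHEDULE_HOST_DOWNTIME` field order is the standard Nagios external command format, and the resulting line would be written to the command file quoted earlier (/var/lib/nagios3/rw/nagios.cmd).

```python
import re
import time

# Hypothetical constants for illustration; not taken from the log.
BOT_AUTHOR = "ircbot"
HOSTNAME_RE = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")
DURATION_RE = re.compile(r"([1-9][0-9]{0,3})([smhd])")
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Turn '90m' or '3d' into seconds; reject anything else."""
    match = DURATION_RE.fullmatch(text)
    if match is None:
        raise ValueError("duration must look like 30m, 2h or 3d")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def build_downtime_command(host, duration, known_hosts, now=None):
    """Build one SCHEDULE_HOST_DOWNTIME line from validated input."""
    if HOSTNAME_RE.fullmatch(host) is None:
        raise ValueError("invalid hostname")
    if host not in known_hosts:
        raise ValueError("unknown host")
    seconds = parse_duration(duration)
    start = int(time.time()) if now is None else int(now)
    end = start + seconds
    # Fields: host;start;end;fixed;trigger_id;duration;author;comment
    return "[%d] SCHEDULE_HOST_DOWNTIME;%s;%d;%d;1;0;%d;%s;scheduled via bot\n" % (
        start, host, start, end, seconds, BOT_AUTHOR)


if __name__ == "__main__":
    hosts = {"asher1", "pad2", "dumps-4"}
    print(build_downtime_command("asher1", "3d", hosts), end="")
```

Because the author and comment are fixed and the hostname must match an existing entry, no free-form user text reaches the command file.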
[04:20:46] PROBLEM host: pad2 is DOWN address: pad2 CRITICAL - Host Unreachable (pad2) [04:21:02] Change on mediawiki a page Wikimedia Labs/Account creation text was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513117 edit summary: [04:21:18] the ircbot is even more dangerous :) [04:21:32] irc has no authentication at all, really [04:21:47] we'd have to trust cloaks [04:21:50] accounts can be hijacked during netsplits fairly easily [04:21:58] true..sigh [04:22:02] it's also easy to steal credentials [04:22:13] if this was jabber I'd be ok with it :) [04:22:46] PROBLEM host: webserver-lcarr is DOWN address: webserver-lcarr CRITICAL - Host Unreachable (webserver-lcarr) [04:22:58] Change on mediawiki a page Wikimedia Labs/Agreement to disclosure of personally identifiable information was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513118 edit summary: [04:23:46] if just the bot user was in the nagiocmd group, and no users could get on the host, at least it could just execute hardcoded commands like scheduled downtime, but not arbitrary commands [04:23:46] PROBLEM host: dumps-4 is DOWN address: dumps-4 CRITICAL - Host Unreachable (dumps-4) [04:23:46] PROBLEM host: dumpster01 is DOWN address: dumpster01 CRITICAL - Host Unreachable (dumpster01) [04:23:55] ah. true [04:23:58] that would be better [04:24:31] I'd be ok with that [04:32:47] since we'd still have to pass args (hostname, duration of downtime) we'd really have to sanitize user input though to avoid injections .. [04:41:48] well [04:41:57] yeah. 
that's true [04:42:05] ascii only [04:42:08] for hostnames [04:42:20] well, same restrictions as on creation of hostnames [04:42:29] also, the bot should check to ensure the host actually exists [04:42:45] by querying for a list of hosts, then checking the string against it [04:42:59] +1 [04:43:09] then the duration should be [smhd] [04:43:37] that way we never use the strings to make queries or to run commands [04:43:40] several scripts to do it via shell or cron http://exchange.nagios.org/directory/Addons/Scheduled-Downtime [04:43:55] yep [04:49:46] PROBLEM host: hugglewa-w1 is DOWN address: hugglewa-w1 CRITICAL - Host Unreachable (hugglewa-w1) [04:49:46] PROBLEM host: puppet-lucid is DOWN address: puppet-lucid CRITICAL - Host Unreachable (puppet-lucid) [04:49:56] PROBLEM host: turnkey-1 is DOWN address: turnkey-1 CRITICAL - Host Unreachable (turnkey-1) [04:50:46] PROBLEM host: pad2 is DOWN address: pad2 CRITICAL - Host Unreachable (pad2) [04:52:46] PROBLEM host: webserver-lcarr is DOWN address: webserver-lcarr CRITICAL - Host Unreachable (webserver-lcarr) [04:53:46] PROBLEM host: dumpster01 is DOWN address: dumpster01 CRITICAL - Host Unreachable (dumpster01) [04:53:46] PROBLEM host: dumps-4 is DOWN address: dumps-4 CRITICAL - Host Unreachable (dumps-4) [05:18:32] !log nagios - put all the hosts currently down into scheduled downtimes for the next 3 days with manual bash commands [05:18:32] Logged the message, Master [05:19:01] ^ the bot notifications should stop flooding the channel now :) [05:19:46] PROBLEM host: hugglewa-w1 is DOWN address: hugglewa-w1 CRITICAL - Host Unreachable (hugglewa-w1) [05:21:27] well, besides this one because i misspelled the hostname and it has a different reason as well.. adding it anyways [05:21:59] http://wikitech.wikimedia.org/view/OpenStack#Mounting_an_instance.27s_disk [05:22:02] that's fun documentation :) [05:22:10] ah. cool. 
thanks [05:22:39] that's how to mount any of the kinds of disks we have in use :) [05:23:17] it's cool that qemu has commands for doing that [05:23:49] * werdna waves Ryan_Lane  [05:23:52] that said, I still have no clue what's wrong with hugglewa-w1 [05:23:55] werdna: howdy [05:24:10] ahaa! @ mounting disks [05:24:30] it's doing a dhcp lookup and getting a return [05:24:32] hugglewa-w1 i added comment "networking config, FIXME?" or so [05:24:39] I have no clue why it isn't getting an IP [05:24:54] or setting one, that is [05:26:18] what are you doing working at this time, Ryan_Lane? [05:26:22] It's almost time for ME to go home :) [05:26:34] heh [05:26:44] I needed to write an email [05:26:50] then I wanted to troubleshoot an issue [05:27:08] then I realized I should write documentation on what I was doing, so other people would know how to do it [05:28:26] !log hugglewa I can't seem to get hugglewa-w1 to boot. It gets an IP via DHCP, but seems to fail its networking somehow. [05:28:27] Logged the message, Master [05:29:00] !log hugglewa it may be good to delete/recreate. Let me know if you need to rescue any data off of it, I can do so before deletion [05:29:01] Logged the message, Master [05:29:19] mutante: it's possible to save an instance's data by mounting its disks :) [05:29:35] or to reconfigure it, if someone massively fucked it up [05:31:47] gotcha. nice! [05:34:07] http://wikitech.wikimedia.org/view/Nagios#Scheduling_downtimes_with_a_shell_command [05:34:15] bbiaw, really need to get some food [12:46:02] petan|wk: the hugglewa-w1 instance sleeps... [14:11:34] IWorld|mobile: what [14:12:44] petan|wk: do you know if its possible to enable something like $wgDebugLogFile on commons.wikimedia.beta.wmflabs.org [14:12:51] j^: yes [14:13:00] we have an issue with video uploads and i want to find out whats causing the error [14:13:09] if ( $wgDBname == "commonswiki" ) { $wgDe... 
[14:13:23] j^: you can insert that to InitialiseSettings.php [14:14:16] j^: you need the transcoding instance running for that [14:14:18] because it's down [14:14:46] petan|wk: ah transcoding is the next step, right now api / upload throws a 500 error [14:15:01] some way to find out what web server the requests is going to? [14:15:07] ah it's up [14:15:20] InitialiseSettings.php has some udp logging is that also available on labs? [14:15:29] web server can be found when you open source of page you attempt to open [14:16:00] I don't know if udp logging is possible since there is nothing listening for them [14:16:24] but if you want I can enable some kind of logger for that [14:16:46] if thats possible would be great since otherwise i have to check all web instances for the error each time [14:16:57] ok [14:17:19] since afaik api requests might not end up at one instance but could be distributed over all of them [14:17:32] sure [14:24:38] try usr/local/apache/common-local/errors [14:24:40] errors.log [14:24:53] it's a shared log file [14:24:59] it's already getting filled up [14:25:00] :D [14:25:14] ah great, thanks [14:52:55] re [14:53:55] PROBLEM Current Load is now: CRITICAL on pediapress-ocg3 pediapress-ocg3 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:54:35] PROBLEM Current Users is now: CRITICAL on pediapress-ocg3 pediapress-ocg3 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:54:48] petan|wk: i get a lot of Unable to load Tor exit node list: cold load disabled on page-views. [14:55:15] PROBLEM Disk Space is now: CRITICAL on pediapress-ocg3 pediapress-ocg3 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:55:21] not sure thats a known problem [14:56:05] PROBLEM Free ram is now: CRITICAL on pediapress-ocg3 pediapress-ocg3 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[14:57:08] j^: no idea what is that [14:57:25] PROBLEM Total Processes is now: CRITICAL on pediapress-ocg3 pediapress-ocg3 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:57:30] probably some error from extension [14:58:15] PROBLEM dpkg-check is now: CRITICAL on pediapress-ocg3 pediapress-ocg3 output: CHECK_NRPE: Error - Could not complete SSL handshake. [14:58:55] RECOVERY Current Load is now: OK on pediapress-ocg3 pediapress-ocg3 output: OK - load average: 1.42, 1.78, 1.10 [14:59:35] RECOVERY Current Users is now: OK on pediapress-ocg3 pediapress-ocg3 output: USERS OK - 1 users currently logged in [15:00:15] RECOVERY Disk Space is now: OK on pediapress-ocg3 pediapress-ocg3 output: DISK OK [15:01:05] RECOVERY Free ram is now: OK on pediapress-ocg3 pediapress-ocg3 output: OK: 88% free memory [15:01:56] petan|wk: is it possible to log exceptions somewhere or show them on the web wgShowExceptionDetails=True [15:02:10] not sure all requests go in the log right now [15:02:25] RECOVERY Total Processes is now: OK on pediapress-ocg3 pediapress-ocg3 output: PROCS OK: 85 processes [15:03:15] RECOVERY dpkg-check is now: OK on pediapress-ocg3 pediapress-ocg3 output: All packages OK [15:10:49] j^: for all wikis or common wiki only? [15:11:15] petan|wk: just commons.wikimedia.beta.wmflabs.org [15:13:27] ok [15:14:45] hi petan|wk [15:14:49] hey [15:14:51] on http://labs.wikimedia.beta.wmflabs.org/wiki/Global_Requests 'd like to test rev:113591 for passing parameters to the UploadWizard (something that's needed mainly for Wiki Loves Monuments, probably more testing people will appear here soon ;-) I'd need to create UploadWizard campaigns, thank you! --Elya (talk) 21:48, 19 March 2012 (UTC) [15:14:59] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/113591 [15:15:04] can you put that on labs? 
[15:15:36] ok [15:15:47] thanks [15:16:56] hmm petan, for https://bugzilla.wikimedia.org/show_bug.cgi?id=28633 werdna has created a patch at https://www.mediawiki.org/wiki/Special:Code/MediaWiki/111217, but there's been no-one able to test it yet, do you know anyone that has a bit of abuse filter experience to be able to test and possibly code review it? [15:17:20] code review is problem [15:17:28] process is broken [15:17:52] btw ticket is flagged as done [15:17:57] is it still open? [15:18:08] should I reopen? [15:18:47] hmm [15:18:54] well the bug patch has been created [15:19:04] but I don't know if it's been implemented [15:19:13] if u think https://www.mediawiki.org/wiki/Special:Code/MediaWiki/111217 [15:19:13] AFAIK it hasn't because the code review still says "new" [15:19:21] yeah [15:19:24] ah [15:19:30] I will check it [15:21:24] it's big diff [15:21:32] I flagged it as tested since it works on labs [15:21:39] it's definitely not on production [15:21:50] it works on labs? [15:21:52] it won't be likely until deployment on .20 [15:21:57] yes it does [15:22:04] did you try it or did it just not break the whole wiki? [15:22:10] hmm when is .20? [15:22:20] I wanted it to be out in 1.19 :( [15:22:21] hm... I don't know [15:22:55] ... and labs went back down again [15:23:46] it's pretty new patch [15:23:52] it couldn' [15:23:57] couldn't be in 19 [15:24:56] seems to work to me [15:24:59] what's down [15:25:49] http://labs.wikimedia.beta.wmflabs.org/wiki/Special:RecentChanges [15:25:52] HTTP Error 500 (Internal Server Error): An unexpected condition was encountered while the server was attempting to fulfill the request. [15:26:19] it opens to me :) [15:26:21] and then Error 139 (net::ERR_TEMPORARILY_THROTTLED): Requests to the server have been temporarily throttled. 
[15:26:30] :o [15:26:33] refresh [15:26:47] age User:Vmcherriekkrebae .(Spam: content was: "[http://kolejlegenda.edu.my/ooi-chong-seong/ Ooi Chong Seon [15:26:50] I am refreshing [15:26:56] this is what the bots want you to do [15:27:02] you shouldn't leave the url in comment ;)O [15:27:24] bah I usually don't [15:27:26] Thehelpfulone (Talk | contribs) deleted page User:Vmcherriekkrebae [15:27:29] ok [15:27:31] :D [15:28:28] oh that's a lie, http://labs.wikimedia.beta.wmflabs.org/wiki/Special:Log/delete :P [15:28:40] I won't* [15:28:43] hi petan|wk. Our instance is off. [15:28:46] whats lie [15:29:24] anyways, so can I test the abuse filter patch petan|wk, have you applied it? [15:29:47] yes [15:29:49] it's there [15:30:28] petan|wk: can you setup a new instance? [15:32:50] for? [15:33:13] Hugglewa [15:33:22] I am trying to fix current one [15:33:27] ok [15:33:55] PROBLEM Current Load is now: CRITICAL on diablo-lucid diablo-lucid output: Connection refused by host [15:34:35] PROBLEM Current Users is now: CRITICAL on diablo-lucid diablo-lucid output: Connection refused by host [15:35:15] PROBLEM Disk Space is now: CRITICAL on diablo-lucid diablo-lucid output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:36:05] PROBLEM Free ram is now: CRITICAL on diablo-lucid diablo-lucid output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:37:25] PROBLEM Total Processes is now: CRITICAL on diablo-lucid diablo-lucid output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:38:15] PROBLEM dpkg-check is now: CRITICAL on diablo-lucid diablo-lucid output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:39:28] Change on 12mediawiki a page Wikimedia Labs was modified, changed by Sumanah link https://www.mediawiki.org/w/index.php?diff=513338 edit summary: Project:Labsconsole_accounts to ask for an account [15:40:11] petan|wk: available for a PM? 
[15:41:00] petan|wk: see my query ;) [15:41:29] !log deployment-prep j: install php-pear on deployment-web3/4/5 required by TMH [15:41:30] Logged the message, Master [15:41:34] !log deployment-prep petrb: getting sql server down I found a bunch of corrupted db's, rollback is necessary [15:41:35] Logged the message, Master [15:41:58] petan|wk: ok, think i found the problem, you can disable the debug output and logs again [15:42:11] j^: ok [15:42:21] but there is another problem, I guess you need the test site right now? [15:42:29] I found that most of db's are broken [15:42:46] since we moved to gluster and there was outage on labs, databases got corrupted a lot [15:42:57] Change on 12mediawiki a page Wikimedia Labs was modified, changed by IWorld link https://www.mediawiki.org/w/index.php?diff=513344 edit summary: better [15:43:00] I wanted to fix it now, but if you want I can do it later [15:43:59] petan|wk: you can fix it now, am done with fixing and testing will happen a later [15:44:04] ok [16:09:37] --> https://www.mediawiki.org/wiki/Project:Labsconsole_accounts [16:12:00] !log deployment-prep petrb: mysql is back up [16:12:01] Logged the message, Master [16:16:05] !log deployment-prep petrb: it seems that corruption of db is worse than I expected, need to restore backup old few months [16:16:06] Logged the message, Master [16:16:23] Damianz: here? [16:16:40] Sortof [16:16:50] db is totaly broken, around ~400 db's are corrupted [16:17:08] :( [16:17:12] it's pretty weird, because it seems that it was corrupted even in backup [16:17:24] I just recovered all of them and it still throw same errors [16:17:30] I assume the backup is just sqldumps not a copy of the innodb files? 
[16:17:43] d error "1033: Incorrect information in file: './bswiktionary/#sql-66c8_68.frm' [16:17:57] it's both [16:18:02] either dump and copy [16:18:11] dump is older though [16:18:22] I am about to recover from dump now [16:19:07] I don't even know if data are corrupted or schemes only [16:19:19] I could try to copy just the scheme now [16:19:25] and restore the original data files [16:19:31] Copying just the frm files might work [16:19:38] damn I hate to do that :D [16:19:38] It's rather weird it's broken though [16:19:42] yes it is [16:19:50] I think it's gluster's fail [16:20:01] how does it deal with quotas? [16:20:15] I hope it doesn't redirect data to /dev/null [16:20:18] when you exceed it [16:20:30] that would explain few things [16:21:20] Hmm not sure tbh, probably refuses writes [16:21:25] hm [16:21:28] that suck [16:22:01] it turns to all updates Ryan does actually makes it all worse than better :D [16:22:16] I am kind of scared of dedicated sql server [16:22:34] it's going to be inaccessible for us so such operations will not be possible in future [16:22:56] when you corrupt db, instead of recovering it yourself you will need to wait for someone to do that [16:23:36] Dedicated over virtual should be better but hmm good point [16:23:51] I will need to discuss this with him, at some point I like this we have [16:23:52] now [16:24:00] because we can do all maintenace etc [16:24:13] create new users, apply patches, import large data files [16:24:39] he even wanted to set up some limits on that shared server [16:24:51] that would definitely turn it to something worse than we have now [16:25:56] but who knows when this shared db is going to work :) [16:54:35] RECOVERY Free ram is now: OK on deployment-transcoding deployment-transcoding output: OK: 62% free memory [16:55:45] RECOVERY Total Processes is now: OK on deployment-transcoding deployment-transcoding output: PROCS OK: 107 processes [16:56:35] RECOVERY Current Load is now: OK on 
deployment-transcoding deployment-transcoding output: OK - load average: 1.36, 0.62, 0.23 [16:56:35] RECOVERY dpkg-check is now: OK on deployment-transcoding deployment-transcoding output: All packages OK [16:57:12] petan|wk: are you here? [16:57:15] RECOVERY SSH is now: OK on deployment-transcoding deployment-transcoding output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [16:57:55] RECOVERY Current Users is now: OK on deployment-transcoding deployment-transcoding output: USERS OK - 1 users currently logged in [16:58:45] RECOVERY Disk Space is now: OK on deployment-transcoding deployment-transcoding output: DISK OK [17:03:55] PROBLEM Current Load is now: CRITICAL on diablo-2 diablo-2 output: CHECK_NRPE: Error - Could not complete SSL handshake. [17:03:55] PROBLEM Current Load is now: CRITICAL on salt salt output: CHECK_NRPE: Error - Could not complete SSL handshake. [17:04:45] PROBLEM Current Users is now: CRITICAL on diablo-2 diablo-2 output: CHECK_NRPE: Error - Could not complete SSL handshake. [17:04:45] PROBLEM Current Users is now: CRITICAL on salt salt output: CHECK_NRPE: Error - Could not complete SSL handshake. [17:05:15] PROBLEM Disk Space is now: CRITICAL on diablo-2 diablo-2 output: CHECK_NRPE: Error - Could not complete SSL handshake. [17:05:15] PROBLEM Disk Space is now: CRITICAL on salt salt output: CHECK_NRPE: Error - Could not complete SSL handshake. [17:06:05] PROBLEM Free ram is now: CRITICAL on diablo-2 diablo-2 output: CHECK_NRPE: Error - Could not complete SSL handshake. [17:06:05] PROBLEM Free ram is now: CRITICAL on salt salt output: CHECK_NRPE: Error - Could not complete SSL handshake. [17:07:25] PROBLEM Total Processes is now: CRITICAL on diablo-2 diablo-2 output: CHECK_NRPE: Error - Could not complete SSL handshake. [17:07:30] PROBLEM Total Processes is now: CRITICAL on salt salt output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[17:08:15] PROBLEM dpkg-check is now: CRITICAL on diablo-2 diablo-2 output: CHECK_NRPE: Error - Could not complete SSL handshake. [17:08:15] PROBLEM dpkg-check is now: CRITICAL on salt salt output: CHECK_NRPE: Error - Could not complete SSL handshake. [17:21:57] hi ^demon [17:22:02] <^demon> Hi [18:04:35] Change on mediawiki a page Wikimedia Labs/Account creation text was modified, changed by Cooper53 link https://www.mediawiki.org/w/index.php?diff=513397 edit summary: reworded language to make more express [18:05:48] RECOVERY Disk Space is now: OK on diablo-2 diablo-2 output: DISK OK [18:05:58] RECOVERY Free ram is now: OK on diablo-2 diablo-2 output: OK: 66% free memory [18:06:35] Change on mediawiki a page Wikimedia Labs/Agreement to disclosure of personally identifiable information was modified, changed by Cooper53 link https://www.mediawiki.org/w/index.php?diff=513398 edit summary: [18:07:28] RECOVERY Total Processes is now: OK on diablo-2 diablo-2 output: PROCS OK: 114 processes [18:08:48] RECOVERY Current Load is now: OK on diablo-2 diablo-2 output: OK - load average: 1.50, 2.06, 1.42 [18:09:38] RECOVERY Current Users is now: OK on diablo-2 diablo-2 output: USERS OK - 1 users currently logged in [18:13:08] RECOVERY dpkg-check is now: OK on diablo-2 diablo-2 output: All packages OK [18:19:40] Change on mediawiki a page Wikimedia Labs/Account creation text was modified, changed by Cooper53 link https://www.mediawiki.org/w/index.php?diff=513409 edit summary: [18:32:31] Ryan_Lane: Can you suggest a way for me to install python-mwclient on oneiric? The default oneiric apt repos don't have it. [18:34:05] I mean, I can just add the lucid repo, I guess... [18:36:00] oh [18:36:03] let me add it to the repo [18:36:07] it's just missing [18:36:09] sec [18:37:34] andrewbogott: apt-get update && apt-get install python-mwclient [18:37:45] I copied it to the oneiric repo from the lucid one [18:38:04] cool, thanks. [18:38:08] yw [18:44:17] Hm...
can you do the same for wikimedia-raid-utils? [18:55:06] Change on mediawiki a page Wikimedia Labs/Terms of use was modified, changed by Cooper53 link https://www.mediawiki.org/w/index.php?diff=513425 edit summary: added provisions [19:03:51] Change on mediawiki a page Wikimedia Labs/Terms of use was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513427 edit summary: [19:05:43] Change on mediawiki a page Wikimedia Labs/Terms of use was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513429 edit summary: [19:06:00] andrewbogott: sure. gimme a sec [19:06:18] andrewbogott: done [19:12:22] New patchset: Sara; "First iteration of adding ganglia for labs." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/2157 [19:12:33] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/2157 [19:28:19] Ryan_Lane: Does OSM make use of nova-ajax-console-proxy? It's explicitly set up by puppet, but seems to be deprecated in essex. [19:28:30] Not clear if it's in puppet due to necessity or completeness. [19:32:14] well, I tried using it [19:32:19] and it didn't work well [19:32:21] we can remove it [19:32:25] ok [19:32:28] in essex there's some new, better, option [19:32:32] novnc, I think [19:32:47] I honestly don't know if consoles are worthwhile [19:32:58] Yeah, although I'm not sure I've ever seen vnc work either. [19:33:01] since people don't have the root password [19:33:20] maybe it would be good for us, to troubleshoot? [19:33:49] It could be useful for rescuing an otherwise ruined instance. [19:36:19] New patchset: Andrew Bogott; "Change the set of nova services required for essex." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3296 [19:36:31] New review: gerrit2; "Lint check passed."
[operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3296 [19:36:55] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3296 [19:36:58] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3296 [19:42:08] Nice profile, Ryan_Lane :) [19:42:14] thanks :) [19:44:34] New patchset: Andrew Bogott; "Typotastic!" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3297 [19:44:45] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3297 [19:44:51] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3297 [19:44:53] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3297 [19:46:28] New patchset: Sara; "First iteration of adding ganglia for labs." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/2157 [19:46:39] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/2157 [19:52:14] petan or petan|wk: are you here? [20:15:08] New patchset: Andrew Bogott; "Switch a bunch of ls -s commands to ls -sf." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3298 [20:15:22] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3298 [20:15:54] Ryan_Lane: Does this change look dangerous to you? https://gerrit.wikimedia.org/r/#change,3298 [20:16:11] Arguably it conceals mistakes where those ln commands are called a second time... [20:16:22] But there might also be cases where we /want/ the old links to get clobbered. [20:58:26] New patchset: Andrew Bogott; "Turn off nova-volume for now." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3299 [20:58:37] New patchset: Andrew Bogott; "Ensure that /etc/puppetd exists before sticking a file there." 
[operations/puppet] (test) - https://gerrit.wikimedia.org/r/3300 [20:58:49] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3299 [20:58:49] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/3300 [20:59:12] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3300 [20:59:36] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3299 [21:00:41] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3298 [21:00:44] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3300 [21:00:45] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3299 [21:00:46] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3298 [21:21:36] Reedy: is it possible to totally remove checkuser ability in beta? [21:21:42] do people need that access? ever? [21:21:46] even for testing? [21:22:40] Just disabling the extension should be enough for checkuser I think [21:22:45] aw [21:22:47] err [21:22:49] awesome [21:23:26] In theory it could be there for testing, but it's not like it's under any active maintenance [21:23:37] greay [21:23:41] *great [21:24:57] Make sure you also drop the cu_log and cu_changes tables [21:26:05] ok [21:38:26] Reedy: so... [21:38:33] Reedy: who's managing beta right now? [21:38:44] No idea, I'm not [21:38:53] robla: ^^? [21:39:07] because legal needs us to make some changes [21:39:14] petan: ^^ [21:44:30] Ryan_Lane: Don't you love volunteer-managed projects? 
;) [21:44:35] :D [21:44:41] it's a problem, legally :) [21:51:49] Change on mediawiki a page Wikimedia Labs/Terms of use was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513489 edit summary: [21:52:17] Change on mediawiki a page Wikimedia Labs/Terms of use was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513490 edit summary: [21:54:42] Change on mediawiki a page Wikimedia Labs/Terms of use was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513492 edit summary: [22:02:02] Change on mediawiki a page Wikimedia Labs/Terms of use was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513498 edit summary: [22:04:27] Change on mediawiki a page Wikimedia Labs/Terms of use was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513500 edit summary: [22:06:49] Change on mediawiki a page Wikimedia Labs was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513501 edit summary: /* Documents */ [22:17:09] Change on mediawiki a page Wikimedia Labs/Things to fix in beta was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513509 edit summary: [22:31:41] Change on mediawiki a page Wikimedia Labs/Things to fix in beta was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=513513 edit summary: [22:43:56] PROBLEM Current Load is now: CRITICAL on diablo-3 diablo-3 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:43:56] PROBLEM Current Load is now: CRITICAL on essex-1 essex-1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:44:46] PROBLEM Current Users is now: CRITICAL on diablo-3 diablo-3 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:44:46] PROBLEM Current Users is now: CRITICAL on essex-1 essex-1 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:45:16] PROBLEM Disk Space is now: CRITICAL on diablo-3 diablo-3 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:45:16] PROBLEM Disk Space is now: CRITICAL on essex-1 essex-1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:46:06] PROBLEM Free ram is now: CRITICAL on diablo-3 diablo-3 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:46:06] PROBLEM Free ram is now: CRITICAL on essex-1 essex-1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:47:26] PROBLEM Total Processes is now: CRITICAL on diablo-3 diablo-3 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:47:31] PROBLEM Total Processes is now: CRITICAL on essex-1 essex-1 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:48:16] PROBLEM dpkg-check is now: CRITICAL on diablo-3 diablo-3 output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:48:16] PROBLEM dpkg-check is now: CRITICAL on essex-1 essex-1 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[22:49:46] RECOVERY Current Users is now: OK on diablo-3 diablo-3 output: USERS OK - 1 users currently logged in [22:51:06] RECOVERY Free ram is now: OK on diablo-3 diablo-3 output: OK: 75% free memory [22:52:26] RECOVERY Total Processes is now: OK on diablo-3 diablo-3 output: PROCS OK: 92 processes [22:52:31] RECOVERY Total Processes is now: OK on essex-1 essex-1 output: PROCS OK: 112 processes [22:53:16] RECOVERY dpkg-check is now: OK on essex-1 essex-1 output: All packages OK [22:53:56] RECOVERY Current Load is now: OK on diablo-3 diablo-3 output: OK - load average: 0.01, 0.40, 0.49 [22:53:56] RECOVERY Current Load is now: OK on essex-1 essex-1 output: OK - load average: 1.00, 1.11, 0.78 [22:54:47] RECOVERY Current Users is now: OK on essex-1 essex-1 output: USERS OK - 2 users currently logged in [22:56:06] RECOVERY Free ram is now: OK on essex-1 essex-1 output: OK: 39% free memory [23:04:39] hey Ryan_Lane - do you know what project / install / whatever uses the wikimedia-en-labs-local-thumb bucket on the production swift cluster? [23:04:54] and should it be using it? [23:05:02] something is using it? [23:05:11] I'd imagine something is using instantcommons [23:05:11] I think my nightly cleaner script is probably making trouble for whomever is using it. [23:05:50] nah, it should be fine [23:07:10] the cleaner is likely deleting all the content in the bucket every night. [23:07:44] hm. maybe not all. [23:08:13] but definitely most. [23:08:19] well, instantcommons always hits the server again [23:08:28] well, it may cache for a short period of time [23:08:44] I don't know what instantcommons is. [23:08:59] hm [23:09:00] so... [23:09:09] if I configured a wiki to use instant commons [23:09:21] (if it's complicated, that's ok... I probably don't actually need to know...) 
[23:09:34] and then I went to mywiki.org/wiki/File:Lakeside_of_Mono_Lake.jpg [23:09:46] it would show the image and text for http://commons.wikimedia.org/wiki/File:Lakeside_of_Mono_Lake.jpg [23:09:50] with attribution [23:10:06] as if it was in my wiki [23:10:13] it pulls from our servers [23:10:14] huh. cute. [23:10:23] so its legit that it's hitting production [23:10:24] if you want to override the image, you can upload locally [23:10:27] yes [23:10:34] rather than a misconfiguration clawing its ways into prod. [23:10:38] yep [23:10:43] excellent. [23:10:55] though... that wouldn't create the labs bucket. [23:11:06] (cuz the prod part would be hitting the regular commons buckets) [23:11:34] well, local uploads would hit the labs buckets [23:11:49] or they'd just turn off instant commons, when we have local swift [23:11:58] and those are the thumbnails (and maybe others?) that I'm nuking. [23:12:07] hmm. [23:13:48] I'm going to tell the swift cleaner to ignore all labs buckets. [23:13:55] PROBLEM Current Load is now: CRITICAL on essex-2 essex-2 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:14:35] PROBLEM Current Users is now: CRITICAL on essex-2 essex-2 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:15:15] PROBLEM Disk Space is now: CRITICAL on essex-2 essex-2 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:16:05] PROBLEM Free ram is now: CRITICAL on essex-2 essex-2 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:17:25] PROBLEM Total Processes is now: CRITICAL on essex-2 essex-2 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:18:15] PROBLEM dpkg-check is now: CRITICAL on essex-2 essex-2 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:21:06] RECOVERY Free ram is now: OK on essex-2 essex-2 output: OK: 81% free memory [23:21:43] maplebed: wait… what are labs buckets? 
[23:21:48] I'm confused by that [23:22:23] Ryan_Lane: there are containers in our production swift cluster that have the word 'labs' in their title. [23:22:26] RECOVERY Total Processes is now: OK on essex-2 essex-2 output: PROCS OK: 102 processes [23:22:34] I'm very confused by that [23:22:36] eg 'wikimedia-en-labs-local-thumb' [23:22:48] that makes no sense [23:22:55] and there's content in those buckets. [23:23:01] burn it? with fire? [23:23:03] who made it? [23:23:08] oh [23:23:09] wait [23:23:15] maybe...... [23:23:39] welcome to the hell that is the fact we can't delete wikis: http://en.labs.wikimedia.org/wiki/Main_Page [23:23:56] RECOVERY Current Load is now: OK on essex-2 essex-2 output: OK - load average: 1.08, 1.19, 0.87 [23:24:14] I'd *love* to delete that wiki [23:24:21] and every other wiki with labs in the name [23:24:36] RECOVERY Current Users is now: OK on essex-2 essex-2 output: USERS OK - 1 users currently logged in [23:25:16] RECOVERY Disk Space is now: OK on essex-2 essex-2 output: DISK OK [23:25:28] why can't we delete wikis? or rather, why can't we delete old no-longer-used development wikis? [23:25:41] we have no way of deleting wikis [23:25:48] that's on the production cluster [23:33:06] RECOVERY dpkg-check is now: OK on essex-2 essex-2 output: All packages OK [23:44:16] robla: around today? [23:44:29] budget meetings until 5pm [23:44:33] ah [23:44:56] robla: just wanted to point you to this: https://www.mediawiki.org/wiki/Wikimedia_Labs/Things_to_fix_in_beta [23:45:03] I'm not sure who's handling beta now [23:45:10] legal needs us to make some changes [23:45:52] I doubt it's just a beta issue [23:46:24] it's more than just a beta issue [23:46:36] but we're getting major community complaints about beta [23:47:13] it's the most visible labs project, so they'd like us to start there [23:48:02] we've been working out the rest of the legal stuff over the past two weeks, but complaints started rolling in this week :(