[00:16:25] PROBLEM - MySQL Slave Running on db12 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:16:43] PROBLEM - MySQL Recent Restart on db12 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:17:10] PROBLEM - MySQL Idle Transactions on db12 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:17:46] RECOVERY - MySQL Slave Running on db12 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error:
[00:17:55] RECOVERY - MySQL Recent Restart on db12 is OK: OK 5796904 seconds since restart
[00:18:31] RECOVERY - MySQL Idle Transactions on db12 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[00:30:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:32:46] RECOVERY - MySQL Slave Delay on db12 is OK: OK replication delay 0 seconds
[00:33:04] RECOVERY - MySQL Replication Heartbeat on db12 is OK: OK replication delay 0 seconds
[00:33:31] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours
[00:40:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.332 seconds
[00:45:22] PROBLEM - MySQL Slave Delay on db12 is CRITICAL: CRIT replication delay 201 seconds
[00:45:49] PROBLEM - MySQL Replication Heartbeat on db12 is CRITICAL: CRIT replication delay 225 seconds
[00:52:09] RECOVERY - MySQL Replication Heartbeat on db12 is OK: OK replication delay 30 seconds
[00:52:36] RECOVERY - MySQL Slave Delay on db12 is OK: OK replication delay 0 seconds
[00:56:27] New patchset: Tim Starling; "Disabled AFT due to db12 overload from ArticleFeedbackv5Hooks::contributionsData() queries" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16939
[00:57:19] New review: Tim Starling; "Already live." [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/16939
[00:57:20] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16939
[01:00:41] New patchset: Alex Monk; "(bug 38806) Enable FlaggedRevs on eswikibooks." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16941
[01:13:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:18:15] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[01:25:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.508 seconds
[01:42:15] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[01:42:24] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 278 seconds
[01:42:33] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 281 seconds
[01:49:09] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 677s
[01:52:27] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 5 seconds
[01:53:21] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 3s
[01:53:48] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 12 seconds
[01:59:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:10:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.779 seconds
[04:13:26] PROBLEM - Puppet freshness on db63 is CRITICAL: Puppet has not run in the last 10 hours
[04:53:13] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours
[05:59:01] any ops around?
[05:59:08] * jeremyb needs help on labsconsole
[06:03:12] https://www.mediawiki.org/wiki/Developer_access#User:Madman is a straightforward request, no prior SVN account. but needs ops to create to edit through titleblacklist
[06:03:23] https://labsconsole.wikimedia.org/wiki/MediaWiki:Titleblacklist
[06:03:28] * jeremyb sleeps
[06:19:05] I was around but not watching here, sorry...
[06:19:20] and now I have the same question: any ops around that want to review a dns change?
[06:19:37] /tmp/dns.diff-atg on sockpuppet, adding ms10
[06:24:28] PROBLEM - udp2log log age for aft on emery is CRITICAL: CRITICAL: log files /var/log/aft/clicktracking.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days.
[06:27:10] RECOVERY - udp2log log age for aft on emery is OK: OK: all log files active
[07:40:28] !log gallium/jenkins: updating Android SDKs
[07:40:38] Logged the message, Master
[07:55:33] New patchset: Hashar; "structure for WLMMobile nightly builds" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16948
[07:56:10] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16948
[07:57:23] apergos: would you mind submitting the easy https://gerrit.wikimedia.org/r/16948 please ? :-D
[07:57:32] some simple files copy around for the gallium host
[07:57:51] PROBLEM - Puppet freshness on mw60 is CRITICAL: Puppet has not run in the last 10 hours
[07:57:54] would be nice to have it merged in sockpuppet, but you can skip the manual puppet run on the gallium host (where the class is applied)
[07:59:54] why are there directories with 644 permissions?
[08:00:57] hashar:
[08:01:20] ohh there is the recurse => true flag
[08:01:27] so all files would be 0644
[08:01:37] and puppet automatically adds the +x flag on directories
[08:01:50] so that is merely to avoid having +x on regular files
[08:02:00] ok
[08:02:35] it automatically adds +x for owner/group/all?
[08:02:44] I think so yes
[08:02:51] ok then
[08:03:07] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16948
[08:04:23] apergos: I can't find the reference in puppet doc :/
[08:04:34] ok well you'll fix it later if it's wrong :-P
[08:04:48] well I just copy pasted from the previous block
[08:04:51] you should do your test run on gallium now
[08:04:52] so that must be correct :-]
[08:04:58] yeah I figured you copy pasted it
[08:05:09] however the next step of your logic.. meh :-D
[08:05:25] thanks! will wait for puppet to kick in :-)
[08:05:30] ok
[08:05:41] thanks!
[08:05:47] yw
[08:06:59] hashar, did you see the duplicate /mnt/thumbs puppet rule in labs?
[08:07:10] Platonides: yup
[08:07:21] Platonides: would fix that today hopefully
[08:07:40] I have just seen the mail, not really read it nor started to investigate it
[08:10:42] Platonides: I am more worried about the wikimedia-task-appserver attempting to delete /mnt/upload6 ;-)
[08:11:19] !log gallium/jenkins: deployed job for the WLMMobile nightly builds
[08:11:27] Logged the message, Master
[08:11:28] not just attempting, it was gone
[08:11:52] I guess that it owns /mnt/upload6/common
[08:12:11] then on removing, it deleted that file with its parent now-empty "folder"
[08:12:57] I don't know why it would consider wikimedia-task-server as unneeded
[08:13:11] although with a corrupted disk, who knows
[08:13:18] lets move to -labs :-]
[08:13:27] I was told that it happened continuously on production, though !
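A minimal sketch of the permission rule discussed in the 08:01 to 08:02 exchange above (recurse => true with mode 0644). Puppet's file type documents that, for directories, a numeric mode gains the search (x) bit wherever the read bit is set. The Python helper below only models that arithmetic; `directory_mode` is a hypothetical name chosen for illustration and is not part of Puppet.

```python
def directory_mode(file_mode: int) -> int:
    """Model the rule 'add x wherever r is set' for a directory.

    Illustrative only: this mimics the documented behaviour, it does not
    call into Puppet.
    """
    mode = file_mode
    # (read bit, execute bit) pairs for owner, group and other.
    for read_bit, exec_bit in ((0o400, 0o100), (0o040, 0o010), (0o004, 0o001)):
        if file_mode & read_bit:
            mode |= exec_bit
    return mode


if __name__ == "__main__":
    for m in (0o644, 0o640, 0o600):
        print(f"{m:04o} -> {directory_mode(m):04o}")
```

Under this model a recursive 0644 leaves regular files at 0644 while directories end up as 0755, which matches the answer given in the channel.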
[09:33:32] New patchset: Hashar; "captcha generation packages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16952
[09:34:12] New patchset: Hashar; "document bastion has captcha related tools" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16953
[09:34:48] New patchset: Hashar; "new role::bastion::*" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16954
[09:35:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16952
[09:35:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16953
[09:35:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16954
[10:19:51] New patchset: Hashar; "`rake` on contint host gallium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16957
[10:20:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16957
[10:34:11] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours
[10:51:44] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (23548)
[10:53:50] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (23560)
[11:18:44] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[11:42:44] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[11:55:57] it's quiet today
[11:56:13] because everything died on weekend
[11:56:16] :D
[11:59:52] nah, two minor incidents
[11:59:57] that's fine :)
[12:02:38] I think labs died a bit because of puppet
[12:02:53] see !logs from weekend :P
[12:03:16] actually new instances didn't work
[12:04:51] ldap died I think
[12:04:52] that's about it
[12:29:22] paravoid: care to double check a small dns change? (just cause someone should always double check 'em), see /tmp/dns.diff-atg on sockpuppet if you do
[12:29:48] sec
[12:30:13] it's fine, go ahead
[12:30:17] trivial enough :)
[12:31:18] yup
[12:31:19] thanks
[12:42:21] New patchset: ArielGlenn; "add ms10 to dhcp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16962
[12:42:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16962
[13:05:01] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16962
[13:17:36] mark: around?
[13:17:58] he's having some computer trouble :)
[13:18:00] can I help?
[13:19:14] i guess not, since you are not visiting datacenter, are you?
[13:19:37] I am not
[13:21:06] not having trouble
[13:21:12] just installing new mountain lion
[13:22:11] ah, I thought your reinstall was related with the trouble you were facing the other day
[13:22:25] it has been a bit unstable lately yeah
[13:22:28] perhaps this install will fix that
[13:22:34] mark: do you have any idea when is your next visit to datacenter?
[13:22:40] why do you care?
[13:23:21] been told some restart is needed on which some reinstall i am impatiently looking forward is depending
[13:24:10] wolfsbane?
[13:25:41] fuck you libvirt.
[13:26:46] or s/libvirt/openstack/
[13:28:22] mark: yup mgmt console
[13:30:24] might be this week, might be next week
[13:31:59] awesome
[13:32:26] paravoid: http://mysql-dba-journey.blogspot.com/2012/07/vmware-vm-default-platform-for-mysql.html ;-)
[13:32:49] wtf?
[13:33:17] "A key best practice for MySQL is putting it in a VMware VM. Every new MySQL database should be created in a VMware VM."
[13:33:20] srsly?
[13:33:34] "A VMware VM will take MySQL to new levels of stability, agility, availability and enterprise functionality."
[13:33:57] * mark checks the date
[13:34:00] people selling off their souls
[13:34:23] mark: can I get two machines to install VMware and move all of our dbs there?
[13:34:41] we'll have new levels of stability agility availability
[13:34:46] AND WE'LL BE ENTERPRISE!
[13:36:00] tbh, I'd love to have live migrations with only 2% impact
[13:41:05] but will we be webscale?
[13:53:37] so what's this "you need 1mb reserved for the boot loader" stuff?
[13:53:46] I don't recall seeing this from ubuntu before
[13:53:52] (I'm in the partitioner)
[13:56:14] I'm not sure what exactly you are referring to, but the initial 1mb reserved is fairly common nowadays
[13:56:36] well I am seeing a big fat warning from the precise installer
[13:56:38] it's for alignment with 4K sector disks
[13:56:44] do I need to do something about it or can I ignore it?
[13:57:04] what does it say?
[13:57:25] The partition table format in use on your disks normally requires you to create a separate partition for boot loader code. This partition should be marked for use as a "Reserved BIOS boot area" and should be at least 1 MB in size. Note that this is not the same as a partition mounted on /boot.
[13:57:43] If you do not go back to the partitioning menu and correct this error, boot loader installation may fail later, although it may still be possible to install the boot loader to a partition
[13:57:45] oh crap
[13:57:47] that's EFI
[13:57:54] which is what I figured
[13:57:59] has no one done a precise install yet?
[13:58:05] I thought there had been some
[13:58:08] I have and got no such warning
[13:58:13] "Go back to the menu and correct this problem?"
[13:58:17] so... ?
[13:58:26] maybe this is on a newer machine that has EFI while the others didn't?
[13:58:29] I have no idea
[13:58:44] r510, 3 md1200s and an internal perc h700 with 12 3t disks as well
[13:58:44] I have only once messed with EFI and then ran in horror
[13:59:08] I have it on my mac, well I did until I wiped it to put macos on it, stupid juniper :-P
[13:59:22] it was painful enough that I dded a copy of the image so I will never have to do it again
[13:59:42] I suppose if it fails I can just reinstall, whatever
[14:01:02] bah, I guess I didn't set up the logical volumes. oh well that can be later
[14:01:10] if it boots :-P
[14:12:26] New patchset: Matthias Mullie; "Re-enabled AFT after fixing the db overload issue - https://gerrit.wikimedia.org/r/#/c/16966/" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16967
[14:14:30] PROBLEM - Puppet freshness on db63 is CRITICAL: Puppet has not run in the last 10 hours
[14:15:14] New patchset: Alex Monk; "(bug 38245) Change upload link to Wikipedia:Upload on aswiki." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16968
[14:20:22] New review: Matthias Mullie; "Do not submit before https://gerrit.wikimedia.org/r/#/c/16966/ is approved" [operations/mediawiki-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/16967
[14:42:00] hmm, https:// on wikitech.wikimedia.org is br0ken, "The site's security certificate is not trusted!
[14:42:00] " - is it because it's self-signed?
[14:42:12] Yeah
[14:42:23] Needs its own ssl certificate, can't use the *.wikimedia cert there
[14:42:56] I don't think it really matters there
[14:43:11] self signed is fine, or not? :P
[14:43:18] indeed
[14:43:26] It's not meant for the general public ;)
[14:43:39] yeah I only noticed it because I started using https everywhere
[14:43:52] and hey, hopefully we'll be killing it off soon anyways :P
[14:44:18] I get it via http with httpseverywhere enabled
[14:46:26] <^demon> wikitech is still going to exist as a r/o mirror of labsconsole--offsite matters :)
[14:50:25] good morning east coast
[14:50:40] * jeremyb still needs ops
[14:50:53] apergos: want another DNS change?
[14:51:26] and, also unrelated: 30 05:59:08 * jeremyb needs help on labsconsole
[14:51:29] 30 06:03:12 < jeremyb> https://www.mediawiki.org/wiki/Developer_access#User:Madman is a straightforward request, no prior SVN account. but needs ops to create to edit through titleblacklist
[14:51:33] 30 06:03:23 < jeremyb> https://labsconsole.wikimedia.org/wiki/MediaWiki:Titleblacklist
[14:54:33] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours
[14:55:59] New patchset: Hashar; "new role::bastion::*" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16954
[14:56:36] New review: Hashar; "Patchset 2 fix issues reported by Nikerabbit" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/16954
[14:56:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16954
[14:58:46] jeremyb: trolololol
[14:58:46] The user name "Madman" has been banned from creation. It matches the following blacklist entry: .*man.*
[14:59:22] Reedy enwiki?
[14:59:26] labsconsole
[14:59:29] aha
[14:59:34] why? :D
[15:05:39] hah
[15:05:45] they should ban .*a.*
[15:06:22] Reedy: i said labs ;)
[15:06:31] errr
[15:06:37] Reedy: i said *ops* ;)
[15:06:46] Why does ops need to do it?
[15:06:54] I've got create user rights
[15:06:56] Reedy: because they are the only sysops on that wiki
[15:07:06] Reedy: so they can edit through blacklist
[15:07:10] Oh
[15:07:24] you want ops to do what exactly?
[15:07:26] unless someone else can that i don't know about
[15:07:35] paravoid: create a labs account please
[15:08:16] why do we have that blacklist?
[15:08:32] so users don't get created with local system usernames
[15:09:03] ('man' did exist as a user on the instance I checked)
[15:09:37] that's "man", not ".*man.*" though?
[15:09:56] idk how titleblacklist works.
[15:10:07] me neither :)
[15:10:37] anyway, there's a fair number of existing, legit accounts for real humans (not services) that end in man. so it should be fine
[15:10:48] New patchset: Hashar; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484
[15:10:58] (just `getent passwd` on labs and search for man)
[15:11:24] New review: Hashar; "PS6: cleaned few whitespaces, added some comments. Will rebase next." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/13484
[15:11:24] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13484
[15:12:33] * jeremyb also has an RT waiting for DNS. not urgent but also trivial ;)
[15:13:42] good morning rob
[15:17:44] jeremyb: which one?
[15:18:19] paravoid: erm? RT?
[15:18:33] would be nice if there were some method for me to find out the #
[15:22:30] New patchset: Hashar; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484
[15:22:33] RT: > subject: create wikimania2013.m.wikimedia.org as a CNAME for m.wikimedia.org.
[15:23:09] New review: Hashar; "Restore the Gerrit manifest and rebase to latest production branch." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/13484
[15:23:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13484
[15:24:02] jeremyb: There's quite a lot more of those needed
[15:24:13] multichill logged a bug about it last night
[15:24:18] Reedy: this is the one the localteam complained about
[15:24:29] https://bugzilla.wikimedia.org/show_bug.cgi?id=38799
[15:24:34] *click*
[15:24:37] A lot of the *.wikimedia wikis are missing it
[15:28:52] damnit, i forgot the interwiki prefix ;(
[15:28:57] * jeremyb is half asleep ?
[15:29:10] (on that bug)
[15:33:55] jeremyb: I presume you meant #3343 re: wikimania2013.m; check your mail :)
[15:35:12] paravoid: great. i don't know what the appropriate response is in greek ;-(
[15:35:15] New patchset: Hashar; "cron to clear gerrit logs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16971
[15:35:48] ^demon: can you review https://gerrit.wikimedia.org/r/16971 ? I extracted your log cleanup cronjob from the role class. I believe we want it applied right now :-)
[15:35:53] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16971
[15:36:23] * jeremyb remembers the time apergos and others spent a while reading https://en.wikipedia.org/wiki/Greek_to_me#In_other_languages
[15:36:31] paravoid: and the labs account?
[15:37:34] let me ask Ryan about the .*man.* blacklist, I propose we should adjust it but should probably ask the person who put it there in the first place :)
[15:38:36] paravoid: right, but in the meantime i thought we could just fulfill the request
[15:38:51] not sure how…
[15:39:03] it should let sysops edit through the blacklist
[15:39:08] idk the details exactly
[15:39:16] https://labsconsole.wikimedia.org/wiki/Special:CreateAccount
[15:39:28] New patchset: Hashar; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484
[15:40:06] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13484
[15:40:10] New review: Hashar; "Cleaned up the role class from dupe entries already in gerrit.pp." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/13484
[15:40:22] well, I meant how to override the blacklist
[15:45:25] hrmmm, can has sysop on testwiki ?
[15:45:39] Thehelpfulone: ^
[15:45:45] i have a few other places but none of those seem to have active blacklists. testwiki does
[15:45:57] the real one or the wmflabs one?
[15:46:24] test.wikipedia
[15:46:28] not labs
[15:46:40] done
[15:46:51] although i guess i could have tried labs
[15:46:59] yes, danke
[15:47:13] np :)
[15:49:16] so for edits it just works, doesn't even let you know that it was blacklisted
[15:52:07] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16968
[15:52:41] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16941
[15:53:48] !log Created FlaggedRevs tables on eswiki
[15:53:57] Logged the message, Master
[15:54:59] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16921
[15:56:27] Change merged: Faidon; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/16742
[15:56:48] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16675
[15:58:46] paravoid: for me there's 2 extra checkboxes as sysop on testwiki. idk if you get them on labsconsole
[15:58:51] > Ignore the blacklist
[15:58:56] > Ignore spoofing checks
[15:59:43] New review: Reedy; "Why?" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/16935
[16:01:20] Reedy, wasn't that supposed to be eswikibooks..?
[16:02:00] I'm about to run sync-apache for the first time
[16:02:07] Blah
[16:02:10] I wonder if I'll finally break the site
[16:02:12] !log Created FlaggedRevs tables on eswikiboooks
[16:02:19] Logged the message, Master
[16:02:22] boooks :D
[16:04:28] paravoid: this is shorturl?
[16:05:26] yes
[16:05:50] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.037 second response time
[16:05:58] !log sync-apache/apache-graceful-all for RT #2121
[16:06:06] Logged the message, Master
[16:06:29] Reedy: so, en.wp.org/s/foo redirects to /wiki/Main_Page
[16:06:32] is that the intended behavior?
[16:06:39] test2.wp.org works though
[16:07:13] Only testwiki/test2wiki have shorturl enabled
[16:07:16] I know
[16:07:36] the question is: for the non-enabled wikis, is redirect to Main_Page the intended behavior?
[16:07:46] Yeah
[16:07:51] I think we have some weird catch all type things for this
[16:09:10] Ok, so I can enable it on all the requested wikis
[16:09:18] The question is then later if we enable it "everywhere"
[16:09:43] But then we'll have people complaining about the urls under the title..
[16:11:20] New patchset: ArielGlenn; "settings for high network usage for ms10" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16972
[16:11:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16972
[16:12:26] I can't reach http://education.wmflabs.org/wiki/ for some reason. Can SSH into the box though, and apache appears to be running. Reboot did not fix the issue. Anyone an idea what might be going on?
[16:15:17] New review: Hashar; "I have notified Ryan about this change so he can get a peak at it." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/16971
[16:15:30] off
[16:15:33] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16972
[16:15:35] will be back later tonight
[16:21:47] paravoid: I wonder if we should pre-populate the table for shorturl
[16:22:01] I seem to recall someone saying not to, for some reason, but no idea who or why..
[16:22:01] what table? :)
[16:22:44] I only very recently became involved with that ticket, so I don't know much I'm afraid
[16:23:10] Which RT # is it? I don't know if it's on there
[16:24:46] it's not
[16:24:55] it was RT #2121
[16:24:57] meh, won't bother then ;)
[16:25:48] let's install it on a few places then
[16:25:55] I'd be happy to help
[16:26:49] !log Add ShortUrl table to hiwiki, orwiki and tawiki
[16:26:57] Logged the message, Master
[16:28:02] tbh, I'm not sure I see the whole point of ShortUrl
[16:28:16] Give me a minute, and I'll show you :p
[16:28:27] since it's per-wiki and you have to write the long hostname anyway
[16:28:52] https://hi.wikipedia.org/wiki/%E0%A4%A1%E0%A5%8B%E0%A4%B2%E0%A4%BE,_%E0%A4%A8%E0%A5%88%E0%A4%A8%E0%A5%80%E0%A4%A4%E0%A4%BE%E0%A4%B2_%E0%A4%A4%E0%A4%B9%E0%A4%B8%E0%A5%80%E0%A4%B2
[16:28:53] vs
[16:28:58] https://hi.wikipedia.org/s/s
[16:29:12] heh
[16:29:19] url encoding nightmares
[16:29:21] Now do you see? :D
[16:29:39] so for folks who might see the efi warning, I have rebooted a few times and it's been fine (without some fancy 1mb extra partition etc)
[16:29:52] apergos: add it to preseed?
[16:30:06] paravoid: well, my blacklist may be stupid, for all I know
[16:30:15] it obviously is blocking legitimate users :)
[16:30:15] Reedy: to be fair, el.wikipedia.org/wiki/Τρελαντώνης will work too
[16:30:25] Ryan_Lane: good moorning :)
[16:30:28] Sure
[16:30:32] sometimes it will work
[16:30:38] https://labsconsole.wikimedia.org/wiki/MediaWiki:Titleblacklist
[16:30:45] I guess I've seen people in irc where their client doesn't like the unicode links
[16:30:52] (though they can read and write unicode itself just fine)
[16:31:24] all of those with ^$ should likely work
[16:31:24] paravoid: I dunno which hosts are going to have that issue, not sure how to set it up
[16:31:39] !log Created ShortUrl table on numerous tawiki* wikis
[16:31:47] Logged the message, Master
[16:31:49] apergos: set it up on the global config maybe?
[16:31:53] the force yes
[16:32:26] the "force yes" :)
[16:32:28] good morning Ryan_Lane!
[16:32:50] Ryan_Lane: does it need a User: prefix though ?
[16:33:04] the docs for this extension suck
[16:33:14] but, not really
[16:33:22] I'm mostly trying to block for shellaccountname
[16:33:32] my brief testing on testwiki using exist rules (not editing the list at all) makes me think it needs User:
[16:33:51] right and shell name can be different anyway
[16:34:06] though it would be nice if it blocked for both
[16:34:12] can you just give acct creators rights to edit through blacklist?
[16:34:20] no
[16:34:33] just in case they make a poor decision
[16:34:44] there's so few of us!
[16:34:47] ;P
[16:34:53] anyway, what do you want to do
[16:34:54] ?
[16:35:07] I changed it to this: https://labsconsole.wikimedia.org/wiki/MediaWiki:Titleblacklist
[16:35:37] Ryan_Lane: 30 16:33:32 < jeremyb> my brief testing on testwiki using exist rules (not editing the list at all) makes me think it needs User:
[16:35:42] existing*
[16:36:15] so ^User:nagios$
[16:39:19] ok. I'll double up the rules, then
[16:39:47] because it can't for shell account name
[16:40:28] (User:)? <— would that work?
[16:40:58] should
[16:41:17] speaking of which i owe you an apache test ;)
[16:42:20] heh
[16:42:30] wow, wi.ki is being sold in a bid
[16:42:38] min. bid $50k
[16:43:02] en.wi.ki would probably blow the bank then
[16:43:12] username blacklist should be more sane, now
[16:43:43] i guess that means i should try to make him again
[16:43:52] Ryan_Lane: you forgot the last 4 entries )
[16:43:58] er, 5
[16:44:00] bah
[16:44:13] well, they're on User: list above
[16:44:14] i figured that was intentional but didn't know why exactly
[16:44:19] just delete those?
[16:45:15] yeah, they are repeats
[16:46:03] A randomly generated password for Madman has been sent to ...
[16:46:07] ;)
[16:46:12] yay :)
[16:46:26] JeroenDeDauw: what's the name of the instance?
[16:47:32] wep
[16:47:50] Reedy: wpa.
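A rough illustration of why the original `.*man.*` entry caught "Madman" while an anchored entry such as `^man$` or `(User:)?man` would not. This uses Python's `re` module purely to show the regex behaviour; how the TitleBlacklist extension itself anchors, prefixes, or case-folds its entries is not confirmed by the log, so whole-string matching and case-insensitivity are assumptions here, and `is_blocked` is a hypothetical helper.

```python
import re


def is_blocked(pattern: str, name: str) -> bool:
    """True if the whole name matches the pattern (case-insensitive).

    Assumption: entries are matched against the full name. This mirrors
    the ^...$ style discussed in the channel, not the extension's code.
    """
    return re.fullmatch(pattern, name, flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    names = ["man", "Madman", "User:man", "User:Madman", "nagios"]
    for pattern in (r".*man.*", r"man", r"(User:)?man"):
        hits = [n for n in names if is_blocked(pattern, n)]
        print(f"{pattern!r:16} blocks {hits}")
```

Under these assumptions the broad pattern blocks every name containing "man", while the anchored forms block only the system account name itself, with or without the User: prefix.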
[16:55:32] oh it's not efi after all
[16:55:34] partman-partitioning/no_bootable_gpt_biosgrub
[16:55:51] there's a different warning for efi, named partman-partitioning/no_bootable_gpt_efi
[17:04:16] Reedy, how do you make a short URL then?
[17:09:57] jeremyb: what Reedy said
[17:18:45] anyone available for a quick review/merge? https://gerrit.wikimedia.org/r/#/c/16858
[17:19:42] notpeter: can you approve https://gerrit.wikimedia.org/r/#/c/16858/1/manifests/site.pp
[17:20:42] looking.
[17:21:03] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16858
[17:21:28] done and live in production
[17:21:34] paravoid: thanks!
[17:22:12] let me force run puppet on wlm too
[17:22:19] thanks paravoid, you just read my mind.
[17:22:35] New patchset: Cmjohnson; "removing transcode1 and adding wtp1 to dhcpd file." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16983
[17:23:08] hm, I can't seem to be able to login there
[17:23:12] has it properly provisioned?
[17:23:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16983
[17:23:38] RobH: per https://rt.wikimedia.org/Ticket/Display.html?id=3221 we need to force-run puppet on wlm.wikimedia.org. paravoid is trying but apparently cannot log in there
[17:24:41] puppet has not been run
[17:24:49] that's the issue, it's still in the provisioning state
[17:24:57] but I can continue that
[17:25:48] ha hm
[17:26:52] New patchset: Platonides; "Remove the skipcaptcha right from autoconfirmed users in labs, for easy captcha testing." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16866
[17:26:53] New patchset: Faidon; "wlm.wm.org is a CNAME, use yttrium instead" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16984
[17:27:24] anyone know ldap tricks for labs? how do i list or query properties of hosts?
[17:27:29] ldaplist doesn't seem to work
[17:27:31] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16984
[17:27:36] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16984
[17:27:37] or i'm using it wrong
[17:29:21] New patchset: Bhartshorne; "amending swift's rewrite.py to allow it to talk directly to mediawiki image scalers instead of ms5" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16821
[17:29:25] awjr: eh, there's no misc::wlm!
[17:30:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16821
[17:30:46] paravoid !
[17:31:28] awjr: 2585 » » misc::wlm
[17:31:50] but where is it defined? :)
[17:32:06] should be in misc-servers.pp
[17:32:25] it's not
[17:32:36] it got merged in https://gerrit.wikimedia.org/r/#/c/16755/
[17:33:02] and reverted by mark
[17:33:03] it says it was reverted
[17:33:07] https://gerrit.wikimedia.org/r/#/c/16812/
[17:33:21] Mark Bergsma Jul 27
[17:33:22] Patch Set 3: Reverted
[17:33:23] This patchset was reverted in change: Ib71c6cd4fe64b3052c4132e692f51723be0445a7
[17:33:39] Revert "Adds WLM api host config in misc-servers"
[17:33:39] Not merged on sockpuppet, not properly in misc/ dir, missing cron job file.
[17:33:40] This reverts commit 8d2dedf85003bb44114a57e9143e13aa2fa0a20a
[17:33:54] awjr: ^^
[17:34:01] thanks preilly
[17:34:22] why was that reverted ?
[17:34:33] LeslieCarr: see above
[17:34:35] oh that's why
[17:34:37] *ahem*, post commit review anyone? ;)
[17:34:52] so, i can understand the cron job file but 'not properly in misc/ dir'?
[17:35:11] we initially had a separate file in the misc dir but were then told to put the manifest in misc-servers.pp
[17:35:18] hehehe
[17:35:28] awjr: back to separate file in misc dir ;)
[17:35:40] yea we are moving away from using the misc-servers.pp
[17:35:45] was my understanding
[17:35:57] i see
[17:36:16] lesliecarr: can you merge my change https://gerrit.wikimedia.org/r/16983
[17:36:18] grmbl, my initial patchset had that separate file
[17:36:38] MaxSem: well it's a simple copy and paste to bring it back
[17:36:50] cmjohnson1: was removing transcode1 on purpose ?
[17:36:53] MaxSem: you just need to add a cron job and it's all good
[17:37:13] New patchset: Bhartshorne; "amending swift's rewrite.py to allow it to talk directly to mediawiki image scalers instead of ms5" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16821
[17:37:14] yes...it is not wtp1
[17:37:21] same machine
[17:37:25] preilly, I thought we will have that .sh file executed by cronjob in the WLM repo
[17:37:28] now* ?
[17:37:53] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16821
[17:38:00] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16983
[17:38:02] MaxSem can you recommit that stuff for review? and perhaps until the cronjob script is done, just remove it from the manifest. we can add it back once we have the script ready
[17:38:06] parsoid. took a while to remember cmjohnson's abbreviation :)
[17:38:23] awjr, okie
[17:38:28] thanks MaxSem
[17:39:14] but what does "Not merged on sockpuppet" mean?
[17:39:28] MaxSem: that it was merged in gerrit but not on the puppetmaster
[17:39:49] MaxSem: which is a Bad Thing
[17:41:18] oh those Puppet crufts!
[17:41:51] should I just revert https://gerrit.wikimedia.org/r/#/c/16812/ and then amend, or start a new patchset?
[17:42:25] new i think [17:42:49] !log re-adding srv194 to pmtpa apaches pool for basic testing of precise [17:42:57] Logged the message, notpeter [17:45:44] * jeremyb wonders if LeslieCarr wants to look into the problem with wep.pmtpa.wmflabs ? i would dig more if i could figure out how to query more [17:46:46] * LeslieCarr hides [17:47:19] * preilly throws cloak over LeslieCarr  [17:47:30] $ for ip in 208.80.153.213 10.4.0.38; do curl -sv -H 'Host: education.wmflabs.org' http://"$ip"/ 2>&1 | fgrep 'Trying '; done [17:47:33] * Trying 208.80.153.213... Connection refused [17:47:35] * Trying 10.4.0.38... connected [17:47:46] * jeremyb runs away [17:48:18] notpeter: oh, I was going to look at it too [17:48:26] * preilly forgot to use the invisibility cloak and instead used the _____________ one [17:48:42] jeremyb: ok, so .213 is associated with the proper labs instance ? [17:48:42] preilly: which is that? [17:48:48] LeslieCarr: no idea! [17:48:53] jeremyb: no idea [17:48:58] paravoid: look at what? [17:49:01] notpeter: there are two issues I can immediately see: a) wikidiff, b) php5 is a lower version than what Ubuntu ships and puppet complains about the downgrade [17:49:06] precise appservers [17:49:08] LeslieCarr: i can't get labsconsole or ldap to talk to me [17:49:13] ah [17:49:16] LeslieCarr: proabably helps to be a sysop ;) [17:49:17] yeah, it needs pinning [17:49:23] did that by hand to test [17:49:25] it is pinned [17:49:28] jeremyb: try taking labsconsole out to dinner first [17:49:28] that's not the issue [17:49:35] sorry, it=php5 [17:49:52] I don't think it'll make a difference. [17:49:52] LeslieCarr: whoa, how? [17:49:59] why is that? [17:50:05] it's already pinned via the repo pin [17:50:07] LeslieCarr: we could print a unicorn and stick it in a chair? [17:50:12] jeremyb: yes!!! [17:50:15] also, where is it complaining? [17:50:25] LeslieCarr: i'm not optimistic that will help [17:50:29] notpeter: also, the 7GB / is not enough in general, but esp. 
with precise it's almost immediately 100% full [17:50:43] notpeter: I'm going to look at it as well, see the ticket about srv281 [17:50:50] jeremyb: so, is labsconsole not showing any of the instances on an instance list for you ? [17:50:51] notpeter: #3336 [17:50:58] notpeter: unless you want to (I don't mind either way) [17:51:01] jeremyb: also, which project is it in ? [17:51:21] LeslieCarr: it is. i just can't figure out how to get a hostname or public IP listing for projects I'm not a member of [17:51:31] LeslieCarr: https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-000000c2 [17:52:22] paravoid: look at preseed.cfg I used a different disk setup for srv194 which i was testing on [17:52:28] jeremyb: hrm, i think you need to be a sysadmin of everything for that ... [17:52:32] could probably just swap that out for all apaches at this point [17:52:57] LeslieCarr: i was thinking they should just be normal SMW vars like everything else [17:53:16] paravoid: did you reinstall srv281? [17:53:37] notpeter: I did once, still with the old partioning [17:53:38] (it was probably down when I pulled a switcharoo with all the apache disks [17:53:42] ah, gotcha [17:53:50] well, I saw your script [17:53:54] yes [17:53:57] but I want to fix preseeding instead [17:53:58] Thehelpfulone: you don't, mostly. Visit a wiki with it enabled, it'll appear under the title [17:54:00] yes [17:54:10] we can just use the new preseed conf for all apaches at this point [17:54:28] which one? [17:55:02] i was using mw.cfg [17:55:12] it's mostly / and some /tmp [17:55:38] unless you think a different partion would be better [17:55:39] oh great [17:55:43] jeremyb: hrm… so checking , i think it might be firewall related since it's in the "Default" security group [17:56:06] ?? [17:56:19] notpeter: we discussed a new partioning scheme with mark (which I was going to work at) and actually that's exactly what we agreed on! [17:56:26] yay! 
[17:56:27] a big /, a big /tmp and swap [17:56:28] LeslieCarr: well it can't be security group i think because it worked at one time and security group changes don't effect existing instances (only new ones) [17:56:36] I couldn't tell if that was since oh great or not ;) [17:56:37] LeslieCarr: and it did once work [17:56:45] it was! [17:56:49] excellent [17:56:59] jeremyb: really ? why did it have to work once before… grrr [17:57:04] it's also great since that means I don't have to do anything! :) [17:57:11] hehheheh indeed [17:57:23] ok, I'll set all apaches to use that partitioning scheme [17:57:29] yay [17:57:33] now, what's wrong with the packages? I didn't quite get it [17:57:35] LeslieCarr: i'm pretty sure... i've seen the url advertised before. don't know 100% if i tried it myself and it worked. maybe reedy remembers [17:57:43] (I'm kinda sick, so I'm kinda dumb today...) [17:57:47] notpeter: fyi with apaches, srv281 is on an old partitioning scheme and failing [17:57:57] yep [17:58:02] yes, that's what we were just discussing :) [17:58:18] oh [17:58:19] wait [17:58:21] yes you were [17:58:25] * LeslieCarr hides again [17:58:52] PROBLEM - Puppet freshness on mw60 is CRITICAL: Puppet has not run in the last 10 hours [17:59:30] hahaha [17:59:34] heyyyyy guys [18:00:27] jeremyb: so i added myself as a sysadmin person but still have no access to the box via the bastion -- do you have an idea why that would happen ? [18:00:39] oh meeting time, bbiab [18:00:48] meeting time eh? [18:01:10] this is my first meeting! [18:01:17] lemme know how to call in [18:01:31] do you have your office account set up? [18:02:08] think so [18:02:17] LeslieCarr: huh. you're using bastion-restricted, right? [18:02:25] LeslieCarr: anyway, can chat later but will be off+on [18:02:36] ottomata: call 2002. [18:02:45] ohhh [18:02:51] then no, thought you meant office.wm.org [18:03:54] argh [18:03:56] sec [18:03:58] jeremyb: hrm, yeah [18:04:09] ottomata: ops meeting ? are you joining ops? 
;) [18:04:16] yes! [18:04:24] really ? cuz that would be awesome [18:04:26] if I can figure out how to join the meeting [18:04:33] that might be a prereq [18:04:35] Can someone run this as root please (on fenari)? rm -rf /home/wikipedia/common/php-1.20wmf4 [18:04:44] ottomata: call the office line [18:04:46] needs secret handshake ;) [18:04:48] then extension 2002 [18:05:10] what's the office line…searching [18:05:20] wikimediafoundation.org should have it [18:05:25] ottomata: pm'ed you the office phone nmber [18:05:34] and many people's footers ;) [18:05:34] New patchset: Pyoungmeister; "mark all apaches to use mw.cfg" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16986 [18:05:45] Reedy: done [18:05:52] thanks [18:06:13] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16986 [18:06:52] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16986 [18:07:03] thanks LeslieCarr, I'm in! [18:09:19] btw, i've been getting some intermittent white screens of death on labsconsole [18:10:53] New patchset: Alex Monk; "(bug 38690) Finish removal of editor and reviewer groups." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16987 [18:15:58] jeremyb: :( [18:16:05] jeremyb: likely due to the same damn bug [18:16:12] Ryan_Lane: which? [18:16:17] let me see if I can get a script for purging old instances [18:16:26] scaling issues with the version of nova we're using [18:16:40] hah, we're not even too big! 
[18:17:11] there's probably deployments with 10k times as many instances [18:25:02] jeremyb: we're using a broken version [18:25:20] also, we're one of the few deployments using diablo, rather than essex [18:25:26] which has the scaling issues fixes [18:25:27] !log removing srv194 from apaches pool due to logging issues [18:25:31] *fixed [18:25:35] Logged the message, notpeter [18:25:44] also, they apparently purge deleted instances [18:26:00] paravoid: have you looked at https://github.com/varnish/libvmod-example#readme [18:26:35] preilly: you mean if I've looked at the readme or at vmods? [18:26:56] (in ops meeting, might lag a little) [18:27:50] * jeremyb runs away again [18:28:42] paravoid: I meant have you looked at that example vmod [18:29:06] no, but I looked at the openddr one [18:29:17] paravoid: okay the weather channel one [18:29:49] yes [18:30:44] it's fairly straightforward from what I saw [18:31:03] paravoid: yeah it sure looks to be [18:31:44] I'm not sure if it's worth it for the 5 lines of inline C that you have for mobile [18:31:48] although it looks cleaner [18:32:01] it certainly would make sense IMO for our geoip stuff [18:34:27] paravoid: yeah I also wonder if it works with the varnish workspace better [18:34:46] paravoid: e.g., if it's a bit more performant [18:36:07] I don't think it'll make a difference performance wise [18:36:13] they're both compiled, so... [18:37:46] New patchset: Pyoungmeister; "adding mc.cfg for mc hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16989 [18:38:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16989 [18:40:52] New patchset: Hashar; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [18:41:26] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." 
[operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/13484 [18:49:31] New patchset: MaxSem; "Wiki Loves Monuments API server, RT#3221" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16990 [18:49:56] preilly, awjr ^^ [18:50:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16990 [18:50:12] * preilly ** clicking ** [18:51:10] thanks MaxSem [18:53:12] paravoid: maplebed https://minus.com/mbXZMC8PO/ [18:53:56] Reedy: did you enable shorturl on tawiki? [18:54:01] also, in meeting atm. [18:54:19] https://ta.wikipedia.org/s/fc and http://ta.wikipedia.org/s/fc WFM [18:54:48] paravoid, notpeter, LeslieCarr: can one of you take a look at https://gerrit.wikimedia.org/r/#/c/16990/1 ? [18:54:57] hi Reedy [18:55:19] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16989 [18:55:22] yuvipanda: try it again now.. [18:55:53] Reedy: works :) [18:55:58] what changed? [18:56:05] I purged it [18:56:14] ah [18:56:26] i guess someone might've tested with 'fc' before it went live? [18:56:30] i had to do that some with the czech redirect [18:56:35] possibly [18:57:19] tamil [18:57:36] Reedy: thanks :D [18:58:47] LeslieCarr: can you down the gig-e on mc1 first when you get a chance? want to test image it [19:00:19] wtf, who chose the domain name for london olympics?! [19:00:33] london2012.com [19:00:54] jeremyb: i guess london will shut down after the olympics are done for the year :) [19:00:57] * jeremyb wonders what's wrong with 2012.olympic.org [19:01:53] ops still meeting? let us know when it's done? 
[19:04:54] yes it still is [19:25:52] New review: Platonides; "So that http://test.wikipedia.beta.wmflabs.org/ works" [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/16935 [19:28:00] New patchset: Platonides; "Fatal error at http://test.wikipedia.beta.wmflabs.org/ There's no extensions/E3Experiments/Experiments.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16875 [19:30:00] paravoid: when you have a spare cycle or two can you take a look at https://gerrit.wikimedia.org/r/#/c/16990/1 for MaxSem [19:30:47] mark: this addresses a revert of yours, want to look at it or should I just merge it? [19:31:11] mark: could you also review this change [19:31:32] can I do these tomorrow? [19:31:46] i need to get my laptop back up [19:32:54] mark: I know that we wanted to get this change reviewed as soon as possible but I think it's okay to wait until tomorrow [19:33:03] New patchset: MaxSem; "Wiki Loves Monuments API server, RT#3221" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16990 [19:33:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16990 [19:34:02] notpeter: so, apache/precise; want me to fix packages and stuff or are you handling it? [19:42:08] hey opsies, q that I thought about in the meeting [19:42:16] when you guys were talking about a new bastion host [19:42:32] why not VPNs instead of bastion hosts? [19:46:15] * jeremyb tends to prefer bastion i think [19:46:23] ottomata: do you use proxycommand? :) [19:46:59] in .ssh/config? [19:47:06] yes [19:47:08] yeah [19:47:10] and it is fine [19:47:18] but key forwarding has never worked for me [19:47:31] which means I can't copy files between servers internally [19:47:41] or ssh between them at all [19:47:44] which is super annoying [19:47:45] * jeremyb has never even *tried* key forwarding. perfectly happy without it. 
but i guess you need it for deploys and dsh [19:49:29] otoh, salt may fix that? [19:49:39] salt? [19:49:41] * jeremyb wonders what the salt forecast is like [19:49:51] ottomata: the new deployment system [19:50:01] ottomata: saltstack.org [19:51:07] kinda like capistrano? [19:51:11] i think what it does I can do with puppet [19:51:35] yea we ahve looked at it as a puppet replacement [19:51:40] as its not ruby. [19:51:43] orly [19:51:51] i thought it was primarily for deployment [19:51:51] its still an ongoing discussion [19:52:00] we're not looking it as a puppet replacement as far as I know [19:52:01] iirc it can push configs and the like [19:52:07] ahh it does more, that looks cool though [19:52:22] but rather use it for the deploymnet system, because it has push capabilities [19:52:23] paravoid: well, in that we prolly wont replace puppet anytime soon [19:52:24] paravoid: it's definitely been discussed but i didn't think it was being seriously considered [19:52:31] unless there's a Ryan secret plan behind that :) [19:52:36] it reminds me of capistrano and fabric [19:52:41] which can be used like puppet [19:52:52] but the big diff is that they are command based, rather than declartive, right? [19:52:58] puppet says "this is how the system should look" [19:53:10] these types of things say "Here is how to make the system look like something" [19:53:17] so its more like scripting [19:53:23] I'm not sure if Salt is like that [19:53:29] oh maybe not [19:53:31] I think it's both, although I've never used it/played with it [19:53:34] i've looked at it for about 1.5 minutes now [19:53:39] that would be cool [19:53:44] mabye like puppet + mcollective? [19:53:47] all in one? [19:54:09] or just deployment+mcollective and still use pupppet for pupppet [19:54:14] puppet* [19:54:21] the problem with any puppet replacement is our ever growing and maturing puppet repo ;] [19:54:30] (one of the problems) [19:55:18] what's the problem with puppet? 
(I wasn't aware that there was one.) performance? [19:55:26] thats one of them [19:55:35] plus i dont think anyone likes ruby. [19:55:49] because they don't like the syntax? [19:55:57] i don't hate ruby. i hate the way ruby evolves [19:55:57] or because of performance? [19:56:08] ottomata: we certainly have performance issues [19:56:15] because they make breaking API changes in minor releases [19:56:29] though it may be a combination of puppet itself, and how we use it. [19:56:51] aye but that's not a ruby problem, right? [19:56:56] but puppet has handled that pretty well i think [19:57:27] jeremyb's ruby objection is valid, and performance is a valid concern, but what's wrong with poor little ruby? [19:58:30] ops has no ruby programmers. [19:58:35] aaaanyway, um, wait, my q was, and maybe RobH knows [19:58:52] yeah but it is just a language, it isn't hard [19:58:55] q was [19:58:56] heh [19:59:00] why bastion hosts over a VPN? [19:59:10] or VPNs, plural? [19:59:15] ? [19:59:18] we dont run any vpns [19:59:24] i know, wondering why not [19:59:29] ottomata: maybe you don't know about this part: ops dealt with ruby and hated it and moved off. mobile was entirely ruby i think for a while and then the ruby was chucked overboard [19:59:35] bastion hosts + ssh proxying seems a little annoying [19:59:46] jeremyb makes a valid point, one that i have purposefully blocked from memory [19:59:48] ottomata: (as in *.m.wikimedia.org) [19:59:48] jeremyb: yeah i don't have the historical context [19:59:50] the old mobile gateway ;] [19:59:53] RobH: haha [20:00:04] it was rails? [20:00:13] idk if it was rails. it was ruby [20:00:18] was it a website? [20:00:24] it parsed the desktop version and rewrote on the fly [20:00:32] and was a spof [20:00:38] and not that well documented [20:00:44] and not stable. [20:00:48] not really... there were ~3-4 mobile hosts? [20:00:53] (re spof) [20:00:53] ha, hm, is that ruby's fault? [20:00:57] or just the app's fault? 
[20:00:57] well, we grew to that yes [20:01:03] but it started with less [20:01:15] i don't remember that early i guess [20:01:15] or i guess i may be confusing with older wap [20:01:21] it was a long time ago =P [20:01:25] ekrem? ;) [20:01:37] ottomata: so we have just always used simply ssh tunneling into cluster for 'vpn' [20:01:37] wub wub wub, ruby schmooby [20:01:45] aye [20:01:49] mostly because it works, ssh is already running, and no one has to do any extra work for it [20:01:50] do your ssh keys get forwarded? [20:01:58] you have to add extra boxes though [20:02:06] just to get new groups of people access [20:02:21] until recently that just simply did not happen [20:02:34] we had a very limited number of shell users [20:02:51] until a year ago the only bastion host was fenari [20:03:03] everyone simply came in through there [20:03:39] ottomata: So the point you are raising is with the number of groups being added that all require a bastion host, why isn't VPN being used as an alternative [20:03:53] I suppose with a single bastion and rulesets for the vpn to forward out to the proper network? [20:04:04] cuz something has to be the vpn server [20:05:03] and the differing bastion groups are usually due to them being in segregated vlans, so i guess that vpn server would need physical runs into those networks [20:05:25] (am i on a wrong track?) [20:05:31] what's the new bastion for? FR? [20:05:40] analytics i think [20:05:51] oh, huh [20:06:10] we have segregated networks for fundrasing, labs, and analytics [20:06:19] outside the normal cluster clans [20:06:21] vlans [20:06:24] right [20:06:43] yes, that's kinda what I'm thinking, and yes, hm, i guess if the networks are all separate, that is a problem [20:06:49] but are they all that separate? 
[20:06:51] right now analytics uses a single host they have in tampa as their bastion is my understanding, so they need an eqiad one, and one specifically in their vlan [20:07:02] ottomata: cuz folks within those networks want root and such [20:07:19] and we have to wall them off as they may have root or advanced rights that they are not allowed to have on the main cluster [20:07:34] segregating them off is one of the easier ways to prevent them from escalation of rights into other systems [20:07:35] right…but root is just per machine? [20:07:50] how would root on one machine help them get access to another? [20:08:11] it was explained to me once, though i honestly dont recall atm [20:08:17] hm, ok [20:08:23] im working on 3 hours sleep [20:08:31] i think ssh as root is enabled on machines [20:08:35] which is pretty nasty imo [20:08:36] and all mornign on airplanes, trains, and metro [20:09:18] aye, welp welp welp, just curious [20:09:30] i think i wouldn't care either if key forwarding worked for me [20:09:30] no worries, wish i had more answers =] [20:09:31] ahhh well [20:09:32] thank you! [20:09:36] get some sleep! [20:09:47] still eating my delivery food [20:09:50] then naptime =] [20:14:09] ottomata: /me is happy to come troubleshoot key forwarding with you some time [20:15:21] paravoid: I think that to make sure that the boxes get the right php5 packages, can just Pin: release o=Wikimedia similar to wikidiff [20:15:24] will that work? [20:15:27] is there anything else that's needed? [20:15:46] (sorry, was walking home. am sick. don't want to get the rest of the office sick) [20:16:12] notpeter: ugh [20:16:31] * jeremyb serializes some OJ for notpeter [20:16:38] hehehe thank you! [20:27:27] notpeter: Hey that cron spam of mine, did that start this weekend by any chance? 
If so, it's not really my fault, it's you guys' fault for disabling the wrong extension to avert a DB meltdown [20:28:42] I think it started soetime within the last 24 hours [20:29:55] I deleted them, so I'm not fully sure [20:35:14] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours [20:37:02] notpeter: I think pinning will not work, but try it nevertheless [20:37:09] notpeter: also, the wikidiff issue is totally separate [20:37:47] hmm, there's a user group that was recently (?) added to Commons, Upload Wizard campaign editors [20:37:49] https://commons.wikimedia.org/wiki/Commons:Upload_Wizard_campaign_editors [20:38:07] Thehelpfulone: and? [20:38:08] ;) [20:38:16] I'm getting to it jeremyb! [20:38:18] are we planning to roll out the upload wizard to places other than commons? [20:39:32] Thehelpfulone: seems like totally not an ops question [20:39:55] binasher: hey, slight bump on the OTRS RT i sent [20:40:11] Well I was going to wait for the answer before I asked my follow up, but Upload Wizard campaign editors [20:40:11] (list of members) [20:40:11] Configure Upload Wizard campaigns (upwizcampaigns) is on foundation wiki now [20:40:28] oh [20:40:30] huh [20:40:32] binasher: I wonder, have we settled on igbinary and whatnot for the memcache deployment we were discussing? [20:41:29] LeslieCarr: so, what's the deal with that instance's public IP? any ideas? should I try someone else? [20:41:51] paravoid: I think that pinning will work. that's what I did on srv194 and it worked. let me try [20:41:56] what is the wikidiff issue? [20:42:44] package ships /usr/lib/php5/2009.../wikidiff2.so while php.ini references /usr.../php_wikidiff2.so [20:43:04] I think (but I'm not sure) this is a result of my packaging changes that landed with the precise packages [20:43:05] arg. damnit [20:43:19] I can fix that if you prefer :) [20:43:24] wha tis your recommendation for fixing? 
[20:43:36] I haven't investigated yet, so I don't know [20:43:58] I remember being annoyed by this sometime last year.... [20:44:08] as I said before, I had "apache/precise" on my TODO, unaware that you were going to work at it [20:44:15] so it may not be something that I did [20:44:24] New review: Ryan Lane; "Inline comments." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/16990 [20:45:04] paravoid: I can just change it in php.ini, I believe [20:46:17] hhmmm... maybe not [20:51:36] New patchset: Pyoungmeister; "pinning php5-common to wikimedia version" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17002 [20:51:46] !log adding enwiki.aft_article_feedback af_user_id_ip index [20:51:54] Logged the message, Master [20:52:17] New patchset: Hashar; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [20:52:49] jeremyb: hey [20:52:49] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17002 [20:52:50] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/13484 [20:52:55] paravoid: so I htink that this will work to get us the correct php packages: https://gerrit.wikimedia.org/r/#/c/17002/ [20:52:56] LeslieCarr: yah [20:53:38] jeremyb: checking now - i previously hadn't been able to get into the instance [20:53:46] notpeter: that's a noop. 
[20:54:00] actually we can get rid of that line completely [20:54:21] we already pin with 1001 all packages that are release o=Wikimedia [20:54:32] hrm, ok [20:54:40] jeremyb: weird, I can't look at iptables [20:54:40] root@srv192:~# cat /etc/apt/preferences.d/wikimedia.pref [20:54:40] Explanation: Prefer Wikimedia APT repository packages in all cases [20:54:41] Package: * [20:54:41] Pin: release o=Wikimedia [20:54:41] Pin-Priority: 1001 [20:54:43] root@srv192:~# cat /etc/apt/preferences.d/php-wikidiff2 [20:54:44] paravoid: i believe so, tim found a 4x performance improvement using igbinary over php serialization and supports using it [20:54:45] Package: php-wikidiff2 [20:54:47] LeslieCarr: i didn't try getting into the instance but i also didn't find it terribly relevant. unless there was maybe some local iptables [20:54:48] Pin: release o=Wikimedia [20:54:50] LeslieCarr: huh? [20:54:50] Pin-Priority: 1001 [20:54:55] see how the php-wikidiff2 is a noop? [20:54:58] LeslieCarr: are you sysadmin on the project? [20:55:03] root@i-000000c2:/var/log# iptables -nL [20:55:03] and the pin works, that's why apt-get -f install tries to downgrade [20:55:03] -bash: /sbin/iptables: cannot execute binary file [20:55:27] hrmmm [20:55:27] it seemed to make a difference when I added it by hand on srv194, but I believe you that that might have been coincidence [20:55:32] hah [20:55:36] LeslieCarr: let me try [20:56:08] wtf [20:56:09] thanks [20:56:10] paravoid: so is the answer to just up the version numbers? [20:56:18] maybe it was corrupted on migrate or something? [20:56:23] jeremyb@wep:~$ sudo su - [20:56:23] su: error while loading shared libraries: /lib/libpam_misc.so.0: invalid ELF header [20:56:38] notpeter: well, that's a possible solution [20:56:41] time for a dpkg sanity check? [20:56:48] jeremyb: OUCH. [20:56:51] yes, that's corrupt [20:56:58] which VM is that? 
[20:57:08] paravoid: wep.pmtpa.wmflabs [20:57:10] Ryan_Lane: was it one of the VMs you attempted to block migrate? [20:57:41] and now the VM is not responding? [20:57:43] yep [20:57:50] it's corrupted [20:58:07] no console output either [20:58:10] does any data need to be rescured from it? [20:58:10] is it one of the ones you attempted to block migrate or are we looking at something else here? [20:58:15] block migrate [20:58:17] corrupted [20:58:27] it was in the list I sent out [20:58:56] yup, i see it on the list in the email [20:58:58] ah no wonder it's so messed up ... [21:02:40] paravoid: are you willing to fix up the packaging for precise apaches and I'll keep going with the puppet manifests? [21:02:52] I am [21:02:53] cmjohnson1: is mc1 in the 0/0 spot on asw2-d3 ? [21:02:58] cool! thank you :) [21:03:02] to clarify, you mean wikidiff too, right? [21:03:17] yes [21:03:23] lesliecarr ^^ [21:03:28] Change abandoned: Pyoungmeister; "per paravoid's explanation, this is a noop" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17002 [21:03:40] correct [21:10:10] paravoid: sooooo, are you going to want me to put all of our apache-related stuff into a puppet module? [21:10:42] hold that for the moment [21:10:54] heh, ok [21:11:06] I'm kinda pro more rapid rollout [21:11:08] I'm still waiting for mark to approve my two modules and I think he's holding those off due to whitespace considerations :P [21:11:16] heh, gotcha [21:11:48] also, I think he's reviewing your changes, so you better discuss it with him [21:12:02] I'm not sure if he'd appreciate seeing everything being moved elsewhere mid-review :) [21:12:34] makes sense [21:17:53] binasher: iirc the packages I built were for lucid; are you planning on using lucid or precise for this? [21:18:02] New patchset: Hashar; "new role::bastion::*" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16954 [21:18:45] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16954 [21:19:11] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [21:19:23] paravoid: client side, whatever the apaches are in perhaps 1-2 weeks. precise for the memcached servers [21:19:42] I think it's going to be precise, so we need new packages [21:19:43] New review: Hashar; "fixed in PS3" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16954 [21:19:47] and the igbinary ones were not trivial [21:20:03] the whole igbinary/libmemcached/php5-memcached [21:20:12] good old packaging! [21:20:33] Ubuntu didn't have what you wanted [21:20:38] and I had to create packages of our own [21:21:09] aye, thank you :) [21:21:27] and the igbinary stuff is not very well thought in upstream [21:21:55] a) it's not a runtime dependency; if you build with igbinary you're forced to use it [21:22:12] (that's going to be an interesting challenge for us for the apt repo) [21:22:24] um [21:22:25] what? [21:22:35] what? [21:22:53] I don't think you can /choose/ to use igbinary or not at runtime [21:22:53] what do you mean forced to use it? [21:24:34] that's not true, the pecl memcached client provides runtime serializer selection via the memcached.serializer ini [21:24:57] it just defaults to igbinary if its compiled with it [21:25:47] * paravoid checks [21:25:55] not that I don't believe you [21:26:06] I just wonder why I thought that [21:27:20] you're completely right [21:27:23] apologies. 
[21:27:41] I still think it's silly to default to something that it's backwards incompatible [21:27:47] but it's not that bad [21:28:06] it's no that bad if you're aware of it and know what you're doing [21:28:08] it could be bad [21:28:11] heh [21:28:32] yeah, but if you know what you're doing, you don't need the default [21:28:49] you'll just set it to whatever you like [21:28:52] anyway [21:29:00] most people won't care [21:29:06] the other thing that's "worrying" [21:29:21] is that they have this igbinary.h that php5-memcached uses if it finds it [21:29:25] this is all written for a very select audience [21:29:57] so basically you have a dependency from the memcached .so to the igbinary .so [21:30:17] but I'm not sure if they make gurantees that the ABI won't change [21:30:30] (i.e. if you need a SONAME or not) [21:32:13] development is pretty static and limited to bug fixes now, which may not change. patrick and terry have both worked with andrei zmievski and there was some recent messaging [21:33:11] okay :) [21:33:14] good to know [21:33:31] anyway. I guess you'd like me to build precise packages too? [21:33:37] hah, memcached 1.4.14 was released 20 minutes ago and i was just all happy that precise has 1.4.13 [21:33:53] hahahaha [21:34:01] that would be great [21:34:12] I'm still happy, I don't really trust software that was released NN minutes ago :) [21:35:17] call me a conservative if you like :) [21:35:37] depends on the changes [21:42:21] conservative! [21:42:30] haha [21:43:25] binasher: btw, did you see Twemcache? [21:43:50] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [21:44:19] paravoid: yeah, not going to use it though [21:44:38] yay, I was hoping for some comments on that [21:44:40] Has 1.20wmf8 been deployed to enwiki? 
I think it has but just want a confirmation [21:45:11] Thehelpfulone: according to http://en.wikipedia.org/wiki/Special:Version it has [21:45:41] heh I should have checked there first, thanks [21:46:07] paravoid: this is a good read - https://github.com/twitter/twemcache/issues/2 [21:46:25] thanks a lot [21:46:44] and I hope I'm not being too distracting [21:46:54] not at all :) [21:48:40] interesting [21:48:53] you sort of are because now i'm reading about memcache ;) [21:49:22] is it just me (my memcache knowledge is not the best) or is manjuraj not realy answering the questions as straightforwardly as dormando is asking them ? [21:49:31] hehe [21:50:44] heh, same feeling I get [21:50:49] binasher: how is that db hardware coming along? [21:51:10] ok, good to know my reading isn't encessarily different [21:51:11] also it's funny to see these large posts getting a two-line reply [21:51:14] LeslieCarr: yeah, i think twitter would likely be better off dumping their fork at this point and would see perf gains going to a 2012 release of dormando's / mainline memcached but ego may be holding them back [21:51:41] i would be curious to see actual performance test comparisons [21:51:59] that may be a perf/stability tradeoff though [21:52:01] amazingly, fast.ly is going to get dormando full time soon [21:52:14] oooo [21:54:23] LeslieCarr: i saw benchmarks in another long discussion thread where domando was getting much better performance out of mainline memcached than twm, but twitter's fork is better than where things were two years ago [21:54:27] AaronSchulz: which db hw? [21:56:52] for sharding [22:02:12] AaronSchulz: ooh, yeah. i think RobH is putting together the quote request from dell today [22:03:28] * AaronSchulz chuckles at "slab_rebal_torture" [22:04:25] it's ok to go with 20 minute old releases if they've survived torture! 
[22:32:55] hey ^demon [22:33:04] I noticed that you have a labs instance running gerrit, would it be okay if we would open up that instance for gerrit-stats development by future volunteers? [22:33:33] <^demon> That instance is very unstable and being replaced. [22:33:55] <^demon> Once the manifests are fixed, there's no reason we couldn't spin up a gerrit install in labs for stats stuff. [22:34:03] ok, cool, [22:34:07] <^demon> (The main purpose of my instance is to test upgrades) [23:40:44] Ryan_Lane, re https://gerrit.wikimedia.org/r/#/c/16990/2/manifests/misc/wlm.pp : where in /srv would you like to see this? is there any convention? [23:45:57] MaxSem: wlm.wikimedia.org would equal /srv/org/wikimedia/wlm [23:46:54] but the complaint was not to make those directly accessible in the docroot [23:47:23] although to be fair, if it is a full install, i think the appropriate place would be in /opt/ [23:48:04] !log dropping enwiki.aft_article_feedback af_user_id_user_ip index [23:48:12] Logged the message, Master [23:49:33] !log adding revised enwiki.aft_article_feedback af_user_id_user_ip index via osc [23:49:41] Logged the message, Master
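Editor's sketch of the /srv convention mentioned at 23:45:57 (wlm.wikimedia.org would equal /srv/org/wikimedia/wlm): the hostname labels appear to be reversed and joined under /srv. The function name docroot_for, the lowercasing, and the trailing-dot handling are assumptions made for illustration only; the log gives just the one example, so treat this as a guess at the rule rather than the actual puppet logic.

```python
from pathlib import PurePosixPath


def docroot_for(hostname: str, base: str = "/srv") -> str:
    """Map a hostname to a reversed-label path under `base`.

    Hypothetical helper: the only confirmed data point from the log is
    wlm.wikimedia.org -> /srv/org/wikimedia/wlm.
    """
    # Normalise case and drop a trailing dot (assumption, not from the log).
    labels = [label for label in hostname.lower().strip(".").split(".") if label]
    if not labels:
        raise ValueError("hostname must contain at least one label")
    # Reverse the labels so the TLD comes first, then join under the base dir.
    return str(PurePosixPath(base, *reversed(labels)))


if __name__ == "__main__":
    print(docroot_for("wlm.wikimedia.org"))  # /srv/org/wikimedia/wlm
```

Per the 23:46:54 and 23:47:23 messages, this only names the directory; whether the content should be directly reachable in the docroot, or live under /opt/ for a full install, is a separate question the log leaves open.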