[00:05:50] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5.pmtpa.wmflabs output: Warning: 16% free memory [00:29:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [00:59:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [01:29:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [01:45:12] RECOVERY Disk Space is now: OK on conventionextension-trial i-000003bf.pmtpa.wmflabs output: DISK OK [01:50:43] RECOVERY Free ram is now: OK on bots-3 i-000000e5.pmtpa.wmflabs output: OK: 21% free memory [01:53:13] PROBLEM Disk Space is now: WARNING on conventionextension-trial i-000003bf.pmtpa.wmflabs output: DISK WARNING - free space: / 77 MB (5% inode=51%): [01:59:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [02:29:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [02:40:43] RECOVERY Free ram is now: OK on bots-sql2 i-000000af.pmtpa.wmflabs output: OK: 20% free memory [02:53:42] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af.pmtpa.wmflabs output: Warning: 14% free memory [02:59:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [03:29:35] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [03:53:52] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f.pmtpa.wmflabs output: Warning: 15% free memory [03:59:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [04:18:57] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f.pmtpa.wmflabs output: Critical: 5% free memory [04:28:53] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f.pmtpa.wmflabs output: OK: 95% free memory [04:29:35] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [04:59:35] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [05:29:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [05:58:52] RECOVERY dpkg-check is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: All packages OK [05:59:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [05:59:32] RECOVERY Total processes is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: PROCS OK: 229 processes [05:59:42] PROBLEM Free ram is now: WARNING on aggregator2 i-000002c0.pmtpa.wmflabs output: Warning: 8% free memory [06:00:55] RECOVERY Disk Space is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: DISK OK [06:01:32] RECOVERY Current Users is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: USERS OK - 0 users currently logged in [06:02:05] RECOVERY SSH is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [06:02:22] RECOVERY 
Current Load is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: OK - load average: 0.12, 0.24, 0.41 [06:29:36] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [06:32:25] PROBLEM Total processes is now: WARNING on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS WARNING: 154 processes [06:38:52] PROBLEM Disk Space is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. [06:39:32] PROBLEM Current Users is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. [06:39:42] PROBLEM Free ram is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. [06:40:25] PROBLEM Current Load is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. [06:41:52] PROBLEM dpkg-check is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. [06:42:32] PROBLEM Total processes is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. [06:45:12] PROBLEM SSH is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: Server answer: [06:47:22] RECOVERY Total processes is now: OK on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS OK: 150 processes [06:52:38] Change on 12mediawiki a page Developer access was modified, changed by Devsundar link https://www.mediawiki.org/w/index.php?diff=596196 edit summary: [06:57:45] PROBLEM Disk Space is now: WARNING on mw1-21beta-lucid i-00000416.pmtpa.wmflabs output: DISK WARNING - free space: / 78 MB (5% inode=51%): [06:59:37] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [07:12:42] RECOVERY dpkg-check is now: OK on wikidata-dev-2 i-00000259.pmtpa.wmflabs output: All packages OK [07:29:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [07:36:31] hello [07:36:36] hi [07:36:41] !ping [07:36:41] pong [07:40:25] PROBLEM Total processes is now: WARNING on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS WARNING: 151 processes [07:43:22] hi [07:44:52] When you install mediawiki on a labs instance do you use the puppet thing called "role::mediawiki-install::labs"?? [07:45:21] * Silke_WMDE is determined to understand this puppet thing now [07:46:05] I'm trying to write a puppet module for wikidata. [07:47:57] For which I obviously need a fresh mediawiki. For which there is probably some existing puppet stuff already there... [07:59:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [08:00:44] Silke_WMDE: I think there is a mediawiki_new puppet class [08:01:23] yes. It looks... short. [08:05:20] ah! it comes from a special apt source. I see. 
[08:18:17] Silke_WMDE: so that is a module in puppet modules/mediawiki_new/ but you probably figured that out :-] [08:18:30] yep [08:18:32] I have no experience with it myself [08:18:37] might just be for production usage [08:18:44] since it uses sync-common [08:19:39] ohh Silke_WMDE there is also a role::labs-mediawiki-install [08:19:55] in manifests/role/labsmw.pp [08:20:18] and a role::mediawiki-install::labs in manifests/role/labsmediawiki.pp [08:20:23] ah, I didn't find out where that one came from, thx! [08:20:40] not sure which one should be used. The later seems to get more content [08:21:17] yeah role::labs-mediawiki-install is marked about how it should not be used [08:21:27] so role::mediawiki-install::labs is most probably what you want [08:21:50] errr [08:22:12] role::mediawiki-install::labs is the one that is integrated into the clicky interface [08:22:33] seems to be the correct one [08:23:03] you apparently need to add a password in /srv/mediawiki/orig/adminpass [08:23:16] Andrew most probably wrote some documentation about that class somewhere [08:23:46] which timezone is Andrew in? [08:25:37] And you, is it early, late or something in between for you? [08:29:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [08:34:20] Silke_WMDE: I think Andrew is in SF [08:36:10] Silke_WMDE: seems so according to his linked in profile [08:36:24] I never had any interaction with him :-( [08:39:18] hashar: hey! have you checked out the vagrant thing i posted to wikitech-l the other week by any chance? [08:39:31] ori-l: not at all :-] [08:39:45] happily delegating that to the feature team and volunteers to test it out :-] [08:39:56] definitely something I need to look at though [08:40:15] https://github.com/wikimedia/wmf-vagrant <- spins up ubuntu vm and provisions working mediawiki instance using puppet [08:41:30] ori-l: so the vagrant VM boot with those puppet files ? [08:42:41] it starts with a base box, configures it as vm for virtualbox, will mount the puppet modules and manifest as a shared folder inside the vm, and run puppet to provision the machine [08:43:11] nice [08:43:17] how long does it take to get the VM ready? [08:43:35] port 8080 on the host (i.e., your machine) is mapped to port 80 in the guest, so if you browse to 127.0.0.1 you get mediawiki served by the vm [08:43:49] depends if you have the ubuntu base image or not. you need to get that once. that's about 300mb iirc [08:43:54] after that it's about 2-3 minutes [08:44:08] ahh [08:44:11] only because i include an "apt-get update" run [08:44:23] would it be possible to have a VM which already has been bootstrapped ? [08:44:26] that might speed up thing [08:44:47] then whenever a change is made to the wikimedia/wmf-vagrant repo, we could recreate a new VM file [08:44:58] that might make it faster to boot a new instance [08:44:59] (pure supposition) [08:45:03] yes, just provision it once and then "vagrant package" [08:45:33] supposedly, that prebuilt vm will be faster to boot up :) [08:45:56] can you somehow pass parameters to "vagrant up" ? [08:45:57] maybe i'm misunderstanding. once the vm is provisioned you can suspend / boot it in seconds [08:46:21] you can, but most of the behavior is configured in the Vagrantfile (see repo root) [08:46:35] yeah fast boot is what we will want for continuous integration. We will want to simply copy an existing file then resume it and start tests. 
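A minimal sketch of the wmf-vagrant workflow ori-l describes above, including the "vagrant package" step suggested for producing a prebuilt box. The clone URL is the one linked above; everything else is stock Vagrant usage:

  # First run fetches the ~300 MB Ubuntu base box once, then provisions MediaWiki with puppet (about 2-3 minutes).
  git clone https://github.com/wikimedia/wmf-vagrant.git
  cd wmf-vagrant
  vagrant up
  # Host port 8080 is forwarded to guest port 80, so the wiki is served at http://127.0.0.1:8080/
  # A provisioned VM suspends and resumes in seconds:
  vagrant suspend
  vagrant resume
  # Export the provisioned VM as a reusable box so later machines can skip the slow provisioning step:
  vagrant package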
[08:46:39] brb [08:54:22] ori-l: sorry daughter is sick :-] [08:54:35] ori-l: will follow up by email, I don't want to get to bed too late :-] [08:54:49] me neither. sorry to hear about your daughter [08:54:54] hope she feels better soon [08:54:58] ttyl [08:55:04] she is fine :-] [08:55:12] just has a bit of fever [08:55:13] (aka she is hot) [08:56:24] ah, good. get some rest then & ttyl [08:59:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [09:16:20] Change on 12mediawiki a page Developer access was modified, changed by Devsundar link https://www.mediawiki.org/w/index.php?diff=596215 edit summary: /* User:Devsundar */ [09:16:49] Change on 12mediawiki a page Developer access was modified, changed by Devsundar link https://www.mediawiki.org/w/index.php?diff=596216 edit summary: /* User:Devsundar */ [09:29:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [09:59:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [10:00:48] Change on 12mediawiki a page Developer access was modified, changed by Kota Tolikara Papua link https://www.mediawiki.org/w/index.php?diff=596224 edit summary: /* Developer Tolikara */ new section [10:02:43] RECOVERY Disk Space is now: OK on mw1-21beta-lucid i-00000416.pmtpa.wmflabs output: DISK OK [10:10:43] PROBLEM Disk Space is now: WARNING on mw1-21beta-lucid i-00000416.pmtpa.wmflabs output: DISK WARNING - free space: / 78 MB (5% inode=51%): [10:25:53] PROBLEM Current Load is now: WARNING on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: WARNING - load average: 5.22, 5.33, 5.09 [10:29:43] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [10:35:54] RECOVERY Current Load is now: OK on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: OK - load average: 4.16, 4.52, 4.83 [10:53:38] Change on 12mediawiki a page Developer access was modified, changed by Tegel link https://www.mediawiki.org/w/index.php?diff=596235 edit summary: Reverted edits by [[Special:Contributions/Kota Tolikara Papua|Kota Tolikara Papua]] ([[User talk:Kota Tolikara Papua|talk]]) to last revision by [[User:Devsundar|Devsundar]] [11:02:05] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [11:32:02] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [12:02:05] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [12:03:51] Hey puppet people! I get an error message referring to /etc/puppet/manifests/generic-definitions.pp - a timeout. Any idea what could be wrong? [12:04:27] It happens when trying to make puppet pull from git. [12:04:42] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5.pmtpa.wmflabs output: Warning: 15% free memory [12:04:55] PROBLEM Free ram is now: WARNING on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Warning: 19% free memory [12:19:58] Pulling an extension works while pulling core doesn't. 
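A small diagnostic sketch for Silke_WMDE's timeout: run the same clone by hand on the instance, which separates a puppet problem from a problem with the git transfer itself. The URL is gerrit's anonymous-HTTP path for core (quoted later in this log); the target directory is illustrative.

  # If this also stalls or dies partway through, the timeout is on the git/gerrit side
  # (see the Apache proxy timeout discussion further down), not in the puppet manifest.
  time git clone https://gerrit.wikimedia.org/r/p/mediawiki/core.git /tmp/core-clone-test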
[12:34:13] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [13:04:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [13:05:34] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 196 processes [13:10:32] RECOVERY Total processes is now: OK on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS OK: 103 processes [13:15:17] @seen Ryan_Lane [13:15:17] petan: Last time I saw Ryan_Lane they were quiting the network N/A at 10/22/2012 7:19:18 AM (05:55:58.9526000 ago) [13:29:18] andrewbogott_afk: [13:29:24] Oops [13:30:19] andrewbogott_afk: I have got a question concerning the puppet file labsmediawiki.pp which you might have written. [13:30:53] Would be cool if we could chat for a second. [13:34:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [13:54:42] RECOVERY Free ram is now: OK on bots-3 i-000000e5.pmtpa.wmflabs output: OK: 21% free memory [14:04:25] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [14:34:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [14:46:52] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: WARNING - load average: 6.70, 6.92, 5.87 [14:56:53] RECOVERY Current Load is now: OK on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: OK - load average: 4.00, 4.20, 4.88 [15:04:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [15:04:32] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 196 processes [15:26:00] Change on 12mediawiki a page Developer access was modified, changed by Jimmy xu wrk link https://www.mediawiki.org/w/index.php?diff=596282 edit summary: [15:28:57] PROBLEM Current Load is now: CRITICAL on mwreview-param i-000004da.pmtpa.wmflabs output: Connection refused by host [15:29:39] RECOVERY Total processes is now: OK on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS OK: 103 processes [15:29:39] PROBLEM Current Users is now: CRITICAL on mwreview-param i-000004da.pmtpa.wmflabs output: Connection refused by host [15:30:22] PROBLEM Disk Space is now: CRITICAL on mwreview-param i-000004da.pmtpa.wmflabs output: Connection refused by host [15:30:52] PROBLEM Free ram is now: CRITICAL on mwreview-param i-000004da.pmtpa.wmflabs output: Connection refused by host [15:32:32] PROBLEM Total processes is now: CRITICAL on mwreview-param i-000004da.pmtpa.wmflabs output: Connection refused by host [15:33:14] PROBLEM dpkg-check is now: CRITICAL on mwreview-param i-000004da.pmtpa.wmflabs output: Connection refused by host [15:34:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [15:34:42] RECOVERY Current Users is now: OK on mwreview-param i-000004da.pmtpa.wmflabs output: USERS OK - 1 users currently logged in [15:35:22] RECOVERY Disk Space is now: OK on mwreview-param i-000004da.pmtpa.wmflabs output: DISK OK [15:35:52] RECOVERY Free ram is now: OK on mwreview-param i-000004da.pmtpa.wmflabs output: OK: 931% free 
memory [15:37:32] RECOVERY Total processes is now: OK on mwreview-param i-000004da.pmtpa.wmflabs output: PROCS OK: 93 processes [15:38:53] RECOVERY Current Load is now: OK on mwreview-param i-000004da.pmtpa.wmflabs output: OK - load average: 1.43, 1.03, 0.68 [15:43:12] RECOVERY dpkg-check is now: OK on mwreview-param i-000004da.pmtpa.wmflabs output: All packages OK [16:04:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [16:34:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [17:04:43] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [17:07:43] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5.pmtpa.wmflabs output: Warning: 19% free memory [17:16:18] Change on 12mediawiki a page Developer access was modified, changed by Sharihareswara (WMF) link https://www.mediawiki.org/w/index.php?diff=596304 edit summary: /* User:Devsundar */ [17:32:42] RECOVERY Free ram is now: OK on bots-3 i-000000e5.pmtpa.wmflabs output: OK: 21% free memory [17:35:14] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [17:40:29] !log deployment-prep csteipp: cherry picked in LBFactory_Multi config [18:06:15] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [18:28:32] PROBLEM Current Users is now: WARNING on parsoid-spof i-000004d6.pmtpa.wmflabs output: USERS WARNING - 6 users currently logged in [18:35:22] PROBLEM Total processes is now: WARNING on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS WARNING: 153 processes [18:37:02] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [18:41:52] RECOVERY Current Load is now: OK on parsoid-roundtrip4 i-000004d7.pmtpa.wmflabs output: OK - load average: 0.92, 3.52, 4.89 [18:57:53] csteipp: hey :-] [18:58:06] csteipp: so yeah I guess voyage on beta will need a wgLBFactoryConf configuration like in production [18:58:16] csteipp: wmf-config/db-wmflabs.php [18:58:20] Ryan_Lane, hello sir [18:58:25] howdy [18:58:25] csteipp: that is a bit of work though :-/ [18:58:29] I think so... I cherrypicked in the change, and it's working [19:07:02] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [19:07:32] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 196 processes [19:08:45] * Damianz streatches [19:12:00] Jan_Luca: sorry that I keep making this same mistake… is this you, or the other Jan? https://gerrit.wikimedia.org/r/#/c/28355/1 [19:13:12] There are 2 Jan's? [19:14:09] hmm reminds me I should merge in my puppet stuff eventhough it's a bit shite [19:14:12] Jan Gerber, Jan Luca. [19:15:25] csteipp: oh you already submitted a change for wikivoyage LBFactoryConf. I reviewed it https://gerrit.wikimedia.org/r/#/c/29344/ :-] Seems almost good to me [19:22:43] Ryan_Lane: Can you tell me (again) your thoughts about how to avoid the Apache timeout when cloning MW? You thought we should try to host a repo on shared storage? 
[19:23:12] andrewbogott: ^demon's idea was to enable gerrit replication to labs [19:23:21] then you could clone from a global read-only share [19:23:26] which would be way faster [19:23:41] As long as our storage isn't crappy that is :) [19:23:48] *hehem nfs* [19:23:56] it would be nfs [19:24:01] but nfs shared from gluster [19:24:09] not a central instance [19:24:17] andrewbogott: That's me [19:24:23] Damianz: and yes, I need to take care of that before I leave for vacation [19:24:23] dumb question… do we even have a labs-wide sharedfs at the moment? [19:24:25] yes [19:24:31] public data [19:24:37] Jan_Luca: Ok! Just wanted to let you know that I have the refactor in process; I'm testing now. [19:25:01] Ryan_Lane: Hm, thought that was project-specific for some reason. I guess I haven't been paying attention :) [19:25:05] Ryan_Lane: :) [19:25:08] ok but I go sleep now [19:25:20] andrewbogott: I think git clone as a way to only clone a single branch (aka master), that might save up some time when cloning. [19:25:23] andrewbogott: oh and Hi :-] [19:25:26] Jan_Luca: Sure; I'll attach my updated patch to your patchset. I just wanted to avoid you duplicating effort. [19:25:43] andrewbogott: it has the public dumps and datasets [19:25:56] andrewbogott: also I have been asked by the wikidata folks if you have any documentation about the mediawiki puppet class. Maybe on labsconsole ? [19:25:57] it needs to be global and read-only because it comes from another source [19:26:04] I still wonder how well beta pulling from git is going to work long term with migrations/cache puring etc =/ [19:26:38] Ryan_Lane: OK, I will read about gerrit replication. [19:26:53] Damianz: we are going to use Gerrit build-in replication system instead of the lame "poll every 3 minutes" loop [19:27:08] hashare: Was that Silke? [19:27:14] andrewbogott: yeah Silke [19:27:23] hashar: Well yeah that would help, I was thinking more maintaince that needs running to stop updates borking the site though. [19:27:39] andrewbogott: I have pointed her to one of the class that seemed to be up to date. Something like role::labs-install::mediawiki [19:27:58] OK -- we exchanged emails this morning, hopefully He/She (?) is caught up. The documentation such as it is is here: https://labsconsole.wikimedia.org/wiki/Help:InstanceConfigMediawiki [19:27:58] andrewbogott: talk with ^demon|lunch about it [19:28:11] Ryan_Lane: ok! [19:28:21] <^demon|lunch> What are we replicating? [19:28:27] ^demon|lunch: gerrit to labs [19:28:35] ^demon|lunch, shouldn't you be eating? [19:28:41] <^demon|lunch> I stopped eating like 2.5h ago :p [19:29:03] lunch is a state of mind [19:29:18] I guess it should be ^demon|teatime where you are [19:29:46] We use to have snacks in the channel, Ryan would be food for a few hours a day :D [19:30:16] ^demon|lunch Right now I pretty much can't clone mediawiki into a labs instance because Apache times out. I guess Gerrit can push an up-to-date repo into labs storage? [19:30:18] * Ryan_Lane Ryan|flamebait [19:30:20] err [19:31:15] <^demon|lunch> andrewbogott: Should be feasible. All gerrit needs is a receiving box that has SSH + Git. [19:31:24] PROBLEM Disk Space is now: WARNING on labs-nfs1 i-0000005d.pmtpa.wmflabs output: DISK WARNING - free space: /export 1006 MB (5% inode=59%): /home/SAVE 1006 MB (5% inode=59%): [19:31:33] labs-home-wm: Blame Ryan :D [19:31:41] heh [19:31:53] <^demon|lunch> If https://gerrit.wikimedia.org/r/#/c/25508/ went in, this stuff would be easier. 
[19:32:07] <^demon|lunch> I made a new role that installs a gerritslave user + ssh key. [19:32:30] you have a −1 on it;) [19:32:41] <^demon|lunch> Yes I know. [19:32:44] <^demon|lunch> It's not perfect yet [19:32:50] ^demon|lunch: If you want me to just ask you about this again in two weeks that's perfectly acceptable. [19:33:04] No sense in solving the problem twice. [19:33:22] <^demon|lunch> Well most of the problems have been solved. Just need to clean up the rough edges on Ie91b0077. [19:33:33] <^demon|lunch> Then doing replication will be easy to in-cluster things. [19:34:10] Ryan_Lane: When the time comes, what host would you like us to use as the conduit to nfs? [19:34:16] hm [19:34:36] well, gluster would be configured for this [19:34:44] like publicdata is [19:34:57] yeah but gerrit won't have the volume mounted will it? [19:34:57] we can use the same project for that [19:35:17] ok [19:35:20] Hmm, as the automounts for gluster are in ldap do you have a global as well as per project ones? [19:35:40] Damianz: per-project ones are automount in a non-ghosted way [19:35:49] public datasets are keyed automounts [19:36:01] ah [19:36:02] so they are ghosted [19:36:12] can't ghost wildcard mounts [19:37:06] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [19:40:13] <^demon|lunch> There is a Github api client written in ActionScript. [19:40:18] <^demon|lunch> I was unaware anyone used ActionScript [19:40:33] ^demon|lunch: OK, I'm adding myself as a watcher on that patch and will bug you about replication once it's settled. [19:41:00] ^demon|lunch: o.0 [19:41:26] <^demon|lunch> https://github.com/cbrammer/api-github-as3 [19:41:48] meanwhile… hashar, do you know how I tell git to clone only a single branch vs. a whole rep? [19:42:40] <^demon|lunch> Clone is always going to grab all branches. You can explicitly tell it which branch you want to initially checkout though. [19:42:49] <^demon|lunch> `git clone -b ` [19:43:23] That's what I feared… I'm doing that already. [19:43:28] apparenetly --single-branch [19:43:33] need to try out [19:43:37] ooh! [19:43:38] * andrewbogott tries [19:43:41] Why would you want to only clone 1 branch? [19:43:53] another way, if you don't care about updating it, is to use "git snapshot" [19:44:02] which takes a copy at a specific revision [19:44:20] something like git snapshot REL1_19 [19:44:24] or v1.19.2 [19:44:26] Damianz: bandwidth. [19:44:50] Downloading the entire linux kernel repo doesn't take /that/ long. [19:44:58] so some puppet parameterized class could be written to accept a branch / tag as a parameter [19:45:15] Damianz: The problem is when puppet clones it's non-interactive, which makes Apache on the server impatient. [19:45:19] And it hangs up halfway through. [19:45:27] <^demon|lunch> We upped the timeout :\ [19:45:29] Puppet needs a life [19:45:46] ~demon|lunch: The timeout on the server? [19:45:49] <^demon|lunch> Yes. [19:45:55] ^demon|lunch: Is it /over 9000/? [19:46:03] Well, shit, why's it still happening then? Maybe I've misdiagnosed the problem. [19:46:12] <^demon|lunch> Dzahn and I raised the timeout to like 5 minutes awhile back. [19:46:29] andrewbogott: Did you forget to feed the poor hamsters again? [19:46:31] <^demon|lunch> `git clone --quiet ` to your local system can replicate it, most likely. Doesn't require puppet. [19:46:38] Are there maybe two timeouts, one for git server and one for apache? 
[19:46:43] <^demon|lunch> In any case, cloning core shouldn't take *nearly* so long. It's not that big a repo. [19:47:20] ^demon|lunch: I did have multiple timeouts when cloning from gallium (in eqiad iirc) using the CLI [19:47:23] though that was a few months ago [19:47:27] Tbf gerrit is fucking slow sometimes [19:47:48] <^demon|lunch> hashar: Yeah this was known. It was why we upped the Apache timeout like ~2mo ago [19:48:06] <^demon|lunch> Damianz: I'm actually pretty sure it's not gerrit's fault entirely. At least with MediaWiki core. [19:48:11] <^demon|lunch> Other repos clone fast (as they should). [19:48:24] <^demon|lunch> I think we've got a really crappy history that's making jgit misbehave. [19:48:26] * ^demon|lunch shrugs [19:48:32] Not entirely, but I don't think it helps at all. [19:48:45] I'm running a quiet clone right now, will see what happens. [19:49:46] <^demon|lunch> Damianz: Cloning the same repo from github takes forever too. It's the repo, not gerrit. [19:49:57] andrewbogott: git snapshot does not exist :-] It is "git archive". Sorry for the confusion. You do something like: git archive --remote= master | (cd /srv/mediawiki && tar xf - ) [19:50:03] Also java sucks for doing random shit :( I spent the entire week debugging an AD issue that turn out to be 'java's implimentation is stoopid'. [19:50:22] Maybe we just need to run git-gc on the master repo? [19:50:24] git archive is basically svn export [19:50:30] <^demon|lunch> NO [19:50:44] That erases history, I take it? [19:50:51] andrewbogott: They see me trolling [19:50:57] I'm not trolling, just ignorant! [19:51:09] <^demon|lunch> git gc can make things inexplicably explode. [19:51:13] * andrewbogott thought that git-gc archived/compressed. [19:51:13] ^demon|lunch totally wants a week of fixing the repo again :D [19:51:19] <^demon|lunch> 2 weeks. [19:51:21] <^demon|lunch> And no, I don't. [19:51:24] ok, noted! [19:51:34] So it isn't /supposed/ to erase history, it just does. [19:51:37] andrewbogott: that also deleted unclaimed objects. Seems to be a problem with gerrit :/ [19:51:40] andrewbogott: You didn't notice when ^demon|lunch imploded both copies of the repos for a week? [19:51:42] or at least with our gerrit setup [19:52:17] http://www.youtube.com/watch?v=03yZK18PX-Y [19:52:17] anyway, running "git gc", although dismissed, is a good idea :-) [19:52:46] andrewbogott: that video is eligible for the international caps lock day (which is today) http://capslockday.com [19:53:56] * andrewbogott HAD NO IDEA [19:54:11] Change on 12mediawiki a page Developer access was modified, changed by Lockal link https://www.mediawiki.org/w/index.php?diff=596336 edit summary: [19:55:46] <^demon|lunch> Actually, there might be a safer way to run git-gc. [19:55:50] <^demon|lunch> https://gerrit.googlesource.com/gerrit/+/master/contrib/git-exproll.sh [19:55:50] fun fact: Back when I said I was doing a quiet clone, I had already started the clone. And I am still waiting. [19:55:57] So that's seven minutes and counting... [19:56:05] <^demon|lunch> I haven't tested it yet, so could very well Do Bad Things. [19:56:33] ^demon|lunch: Isn't it really a feature that should be rolled into gerrit as a scheduled job. [19:57:02] <^demon|lunch> Not something gerrit should do. [19:57:03] <^demon|lunch> Jgit itself should do it. 
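The checkout-trimming options mentioned above, collected as a rough sketch. URLs and target paths are illustrative, and the last form only works if the remote server permits git-upload-archive:

  # Check out only one branch initially (all refs are still fetched):
  git clone -b master https://gerrit.wikimedia.org/r/p/mediawiki/core.git
  # Fetch only that branch's history (needs git 1.7.10 or newer):
  git clone --single-branch -b master https://gerrit.wikimedia.org/r/p/mediawiki/core.git
  # Export one revision with no history at all, roughly an "svn export":
  git archive --remote=ssh://gerrit.wikimedia.org:29418/mediawiki/core.git master | (cd /srv/mediawiki && tar xf -)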
[19:57:17] "And every night at 2AM, it automatically breaks everything" [19:57:46] andrewbogott: I know a setup where you can time network outages down to the second when backups start :) [19:58:21] ^demon|lunch: OK, I finally got myself a failure from git clone --quiet. RPC failed; result=22, HTTP code = 502 [19:58:39] So I want to say… make the timeout 12 minutes? [19:58:50] <^demon|lunch> Let's ask Ryan_Lane [19:58:58] * andrewbogott subscribes to the 'get a bigger hammer' school of troubleshooting [19:59:05] <^demon|lunch> Ryan_Lane: Can we raise the apache proxy timeout to like 12m instead of 10m? [19:59:15] <^demon|lunch> Or whatever I set it to last. [19:59:36] Problem with bumping the time out is you risk getting to the point when you're either dropping connections right down and can handle shit or you risk ooming the box holding sessions. [19:59:36] probably, yeah [19:59:44] that's true [19:59:47] <^demon|lunch> Oh, 5m. [19:59:56] but likely not a huge deal [19:59:56] <^demon|lunch> So yeah, even 10m would be nice. [20:00:00] go for it [20:00:10] It's never a big deal until it explodes in your face [20:00:11] heh [20:00:17] if that happens we'll lower it [20:00:52] <^demon|lunch> andrewbogott: Ok, it's in the puppet repo. templates/apache/sites/gerrit.wikimedia.org.erb (look for TimeOut) [20:01:49] ^demon|lunch: do you mean you just changed it, or are you suggesting that I change it? [20:01:53] you know what? I want to kill all our apahe stuff with a hammer and actually puppetize apache not the half assed lets include static files all over the place effort :( [20:01:54] <^demon|lunch> Oh, I'll change it now. [20:01:57] <^demon|lunch> Sorry for the confusion. [20:02:11] I'm happy to, just couldn't tell which [20:02:33] PROBLEM Total processes is now: CRITICAL on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS CRITICAL: 289 processes [20:03:45] <^demon|lunch> https://gerrit.wikimedia.org/r/29432 [20:04:02] <^demon|lunch> Ah crap, I forgot to grab a fresh branch. [20:04:03] <^demon|lunch> Second. [20:04:42] ^ this is why I hate the not 'branch, push, pull request' workflow [20:04:51] <^demon|lunch> Ok, rebased. [20:06:29] * andrewbogott will merge [20:07:04] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [20:07:56] * andrewbogott predicts that 7 won't be enough [20:08:02] <^demon|lunch> I miscounted. We were already at 10. [20:08:05] <^demon|lunch> It was 600s. [20:08:20] <^demon|lunch> So I bumped it to 12m = 720s [20:08:26] Oh wait, you're right. [20:08:32] I forgot how many seconds are in a minute, apparently. [20:08:40] well, as did you :) [20:08:56] <^demon|lunch> The problem is really apache giving up since it thinks nothing is happening. [20:09:31] So the delay isn't data transfer, it's gerrit psyching itself up? [20:09:51] hm. maybe I shouldn't have scheduled all instances to update salt exactly a minute from now [20:09:51] <^demon|lunch> No, it's actually transferring, but --quiet means nothing over stdout. [20:10:04] <^demon|lunch> So apache is being stupid and thinks nothing is being proxied. [20:10:13] oh well [20:10:19] too late :) [20:10:33] RECOVERY Disk Space is now: OK on testing-arky i-0000033b.pmtpa.wmflabs output: DISK OK [20:11:07] <^demon|lunch> Ryan_Lane: When do you think would be a good time to upgrade manganese & formey to precise? 
[20:11:30] some time this week, I guess [20:11:34] my schedule is so screwed up [20:12:03] I <3 salt's event system [20:12:12] <^demon|lunch> I've already tested precise. We've only got one minor config tweak to do to make gerrit work (jre moved paths :\) [20:12:26] I had all instances schedule something via at, then made them fire an event when they were done [20:13:52] Ryan_Lane: Sounds like an easy way to trash every instance at once :) [20:14:05] Damianz: yep [20:15:44] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5.pmtpa.wmflabs output: Warning: 19% free memory [20:16:16] well, seems that upgrade didn't go so well [20:16:25] none of the minions are connecting [20:16:40] ah. needed to restart the master [20:17:32] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 196 processes [20:17:57] I really like the idea of salt from the point of being able to mass execute stuff in a replicatable fasion and build an interface for auditing/access control purposes around it. It also scares the shit out of me and I'd rather use pdsh with key auth... though tehnically it's key auth with zeromq should be sound. [20:18:12] yeah [20:18:41] seems to be good software so far [20:18:42] PROBLEM Disk Space is now: WARNING on testing-arky i-0000033b.pmtpa.wmflabs output: DISK WARNING - free space: / 77 MB (5% inode=51%): [20:19:10] hm. most of the minions didn't upgrade [20:19:31] I'm thinking about rolling a simple web interface in django ontop of the master for our management stuff. When we build out our cdn it would provide a clean way to mass purge data from the edge nodes that are distributed around the world. [20:22:14] Damianz: a rest api is also being added right now [20:22:38] From what I've seen of the rest api code it's not really in the right direction atm... [20:22:50] no? [20:22:52] why not? [20:23:12] RECOVERY Disk Space is now: OK on conventionextension-trial i-000003bf.pmtpa.wmflabs output: DISK OK [20:23:34] It needs some form of authorization/authentication system that's modular, so you can support say ldap and give groups access to run certain commands. The code I looked at last seemed very basic and more of a POC of how to do a rest interface against the main codebase rather than a suitable to deploy system. [20:23:42] RECOVERY Disk Space is now: OK on testing-arky i-0000033b.pmtpa.wmflabs output: DISK OK [20:24:01] Damianz: yeah, it's very basic right now [20:24:02] Also needs lots of logging that you can hook into for auditing purposes adding for enterprise use. [20:24:05] what you are suggesting is being worked on [20:24:16] we're going to add keystone auth at some point [20:24:38] the groups/roles stuff will be in salt core [20:24:45] I think it could be freaking awesome... eventually, was kinda hoping to get some time to look at working on it but I'm kinda busy atm. [20:25:00] * Ryan_Lane nods [20:25:42] RECOVERY Disk Space is now: OK on mw1-21beta-lucid i-00000416.pmtpa.wmflabs output: DISK OK [20:25:59] hm. some of these minions just don't want to upgrade [20:26:15] I think roles should be extended in the core some more also in some cases. Like grains are useful, but sometimes you can't trust the end machine. If you could control the grains code centrally then use that data for roles (think puppet/facter) it would be awesome. [20:26:15] ah. 
they may be down [20:26:26] connect to host i-000001cc.pmtpa.wmflabs port 22: Connection refused [20:26:31] or they may have bad security groups [20:27:25] Nagios says 1 down, though about 10 are ignored. [20:27:43] * Damianz likes how it works, people ignore him, he makes nagios ignore their instances :) [20:28:01] hm [20:28:09] it shouldn't refuse my connection... [20:28:52] weird [20:29:07] I wonder if its a new instance and its security groups are screwed up [20:29:09] Tbh I kinda dislike security groups, they're awesome for public stuff but really we should use iptables and keep setups standard against prod or we split away every time and make sadface. [20:29:39] we hardly use iptables in production [20:30:05] and security is more of an issue in labs [20:30:12] since people tend to run random shit [20:30:32] Mostly on stuff that isn't behind lbs afaik, but yeah [20:30:41] Sometimes I just want to stab people [20:30:47] Actually I mostly just want to stab people, but that's another story [20:31:10] Though I think in some cases we should try to replicate stuff as close as possible and it's a shame we can't vlan off projects. [20:31:13] ah. this instance is running a dpkg command [20:31:24] I need to change puppet to install a specific version of salt [20:31:59] Damianz: it's possible to disable security groups per project [20:32:08] Damianz: make a rule that allows all in [20:32:26] then you can manage the groups in the instances [20:32:35] err [20:32:35] the rules [20:32:41] I'd argue that a filter allowing doesn't equal no filtering but that's semantics of the kernel over practicality [20:32:52] right [20:33:19] we can't *currently* vlan off projects [20:33:24] when we can start using quantum, we could [20:33:37] quantum is way more powerful than nova-network [20:34:32] It would be sort of nice to support vlans in labs, make some of the network nodes act as routers and peer with the real wmf routers for labs public ip space. Though it's probably just me that thinks labs sits too close to prod and is restircted in places currently. [20:34:52] PROBLEM Current Load is now: WARNING on parsoid-roundtrip4 i-000004d7.pmtpa.wmflabs output: WARNING - load average: 5.94, 6.39, 5.44 [20:34:52] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: WARNING - load average: 3.44, 4.93, 5.07 [20:35:12] Damianz: In theory I'm on the hook for adding keystone to salt-api, but I'm waiting for the api development to settle down a bit. [20:35:43] andrewbogott: yeah. likely a good idea [20:35:50] andrewbogott: Assuming they have an awesome auth setup like the django system it should be easy as pie... mmm pie [20:36:10] we need it for creating a database service [20:36:24] I guess we could shell out [20:36:29] * Ryan_Lane really hates shelling out [20:36:42] You could just do a mysql connection as root remotly *shudder* [20:36:53] no thanks :) [20:37:07] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [20:37:45] If we actually impliment salt we could do some cool stuff with bots etc for a start/stop/status page hookable by authors etc... oh yeah did I mention we need oauth/openid still? [20:37:50] * Damianz looks at csteipp [20:38:18] Damianz: yeah. we need to make salt multi-tenant before we can do that, though [20:38:24] <^demon|lunch> Speaking of OpenID, the OpenID extension is in git now. 
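A sketch of the "allow everything at the security-group layer, filter inside the instance" approach Ryan_Lane describes above. The novaclient commands assume CLI access to the project (labs normally manages groups through the web interface), and the iptables rules are purely illustrative:

  # Open the project's default security group completely:
  nova secgroup-add-rule default tcp 1 65535 0.0.0.0/0
  nova secgroup-add-rule default udp 1 65535 0.0.0.0/0
  nova secgroup-add-rule default icmp -1 -1 0.0.0.0/0
  # ...then do the real filtering on each instance, e.g. allow loopback, established traffic and SSH, drop the rest:
  iptables -A INPUT -i lo -j ACCEPT
  iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
  iptables -A INPUT -p tcp --dport 22 -j ACCEPT
  iptables -A INPUT -j DROP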
[20:38:32] \o/ [20:38:32] <^demon|lunch> (As of today) [20:38:36] :o [20:38:47] Damianz: technicaly, we can do it without making it multi-tenant [20:38:57] if we use grains and runners [20:39:18] See I like your idea behind grain based access control, but I also hate it from a security point of view [20:39:32] well, it's fine, depending on how you implement it [20:39:33] Being able to hook a middleware that pulls our project members and assigns roles to instances based on that == ftw [20:40:13] so, if you are sending sensitive information, it's a fail [20:40:43] if you use a runner, which is configured to only use a specific grain [20:41:00] and that grain is on the systems you want to control [20:41:05] and the runner only takes very specific commands [20:41:28] it should be ok [20:41:35] any instance that adds the grain will pick up the call [20:41:51] I can see how it could be ok, but personally that still yells security risk too much. Trusting editable content is generally bad, but yes only if you're sending private data (say mysql root passwords ;)) [20:41:56] but if you don't pass sensitive info, it doesn't matter [20:42:11] that's why all of this would be code reviewed ;) [20:42:19] I'd rather just load up instances, map to projects, map to users. Bang central control users can't touch. [20:42:26] to make sure people aren't doing things that would be insecure [20:42:43] Code review is boring when you can't get people to review your code :P [20:42:45] Damianz: well, that's not how zeromq works [20:43:05] when you send data out, it gets sent to all minions [20:43:11] the minions subscribe [20:43:38] Oh, I was thinking a level higher than that. [20:43:44] ? [20:43:44] IE pre-authing random ass end user to run command x on y. [20:43:51] yeah [20:43:53] that's the end-goal [20:43:59] with adding keystone auth [20:44:15] there's already an ACL system [20:44:28] it isn't fine grained enough, though [20:44:35] you can say user x can run command y [20:44:39] but not on system z [20:44:47] hmm [20:44:49] I have bug reports in for this [20:44:52] RECOVERY Current Load is now: OK on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: OK - load average: 6.57, 4.75, 4.95 [20:45:19] See with a concept of groups and user x can run y on z where z is or and y is or , say like how sudo works. That would be fucking awesome. [20:45:30] yep [20:45:35] that's what I'd like as well [20:45:58] I'd like to be able to assign a set of commands to a role [20:46:03] then be able to assign that role to people [20:46:06] within a tenant [20:47:09] You know this would be awesome for stuff like labs-home-wm where we're doing crazy ssh stuff (though he can diaf instead), but you know what I mean. [20:47:18] ssh stuff? [20:47:24] what's doing ssh stuff? 
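On "there's already an ACL system": a sketch of the per-user allowlist in the salt master config. The option is named client_acl in Salt releases of this era; the user and function names are made-up examples, and as noted above it cannot restrict which minions a user may target:

  # Append an allowlist to the master config (YAML) and restart the master:
  printf '%s\n' \
    'client_acl:' \
    '  damianz:' \
    '    - test.ping' \
    '    - status.*' >> /etc/salt/master
  service salt-master restart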
[20:47:38] glusterfs management code that creates shares [20:47:38] ah [20:47:38] that [20:47:44] that's not the same as labs-home-wm [20:47:53] and we're fixing that another way [20:47:55] though salt could replace ssh there, yes [20:48:08] We have too many bots :P [20:48:43] if events didn't need to be watched by a root process, I'd replace most of the bots with salt, too [20:49:15] hmm [20:49:24] I could technically run a secondary salt-master as non-root, I guess [20:49:27] I suppose doing a sudo tail over salt and streaming the data back would be crazy [20:49:39] http://docs.saltstack.org/en/latest/topics/event/index.html [20:49:52] you can raise events to the master [20:49:55] with tags [20:50:05] That's pretty cool [20:50:09] hmm I wonder [20:50:17] so, ircecho could raise an event to the master for log messages [20:50:31] Could we use logstash to grab the logs, trigger a call to saltmaster from that instance based on tags and do it that way? [20:50:32] then the master could spit it out to irc [20:50:44] Could also use salt to trigger nagios alerts for log stuff that way too [20:50:58] yeah, but nagios already has mechanisms for doing so [20:51:03] no need to reinvent that wheel ;) [20:51:16] :P [20:51:26] I wouldn't mind replacing the smptrap with a salt trigger [20:51:33] now, that said.... [20:51:34] That pile of crap dies too often [20:51:54] you could do auto-scaling with salt [20:51:57] and nagios and ganglia [20:52:08] I think ganglia is a little broken atm [20:52:09] have nagios send alerts on ganglia thresholds [20:52:15] But yes, esp for bots when we have a scheduler and prod instances [20:52:37] salt could create an instance, add it to a pool [20:52:52] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: WARNING - load average: 8.60, 7.86, 6.45 [20:52:54] if the threshold isn't met, create another and add to the pool [20:53:15] You'd just have to be careful downscaling in steps to not loadspike everything. [20:53:20] can do the same for the opposite direction [20:53:20] yeah [20:53:40] Draining traffic off backends to cleanly take them out of service is nice when possible. [20:53:40] but events and salt-cloud could handle that [20:53:57] you could use an event to say when traffic is drained, too ;) [20:54:46] it's definitely possible to do some awesome things with events [20:55:16] the first thing I want to do is to notify labsconsole when an instance is actually fully built [20:55:51] I'd actually like to take a hammer to labsconsole [20:55:58] I want to switch to using salt to install puppet, then do a rn, etc [20:55:58] *run [20:56:02] heh [20:56:06] wish it wasn't mw tbh [20:56:14] We could make the ux 200000000000x better [20:56:18] we *could* switch to horizon [20:56:20] like live stepping though build stages etc [20:56:22] PROBLEM Disk Space is now: CRITICAL on labs-nfs1 i-0000005d.pmtpa.wmflabs output: DISK CRITICAL - free space: /export 504 MB (2% inode=59%): /home/SAVE 504 MB (2% inode=59%): [20:56:35] we could do that in mw too [20:56:35] Because for devs/noobs it's a horrid interface that's confusing and overcomplicated [20:56:44] just need to make an api and use js [20:57:22] horizon's interface isn't much better IMO [20:57:29] neither is amazon's web console [20:57:34] or rackspace's [20:57:45] horizon is bland, amazon's console is confusing and a mash of services [20:57:57] hp cloud's is embarrasingly bad [20:58:55] Totally not a designer but seriously, keep it simple, keep a nice workflow, keep it clean. 
We can do better than this (yes, the same line I use in relation to Captchas) [20:59:52] * Damianz thinks interfaces should be accessible from idiot to expert with minimal changes and encourage a workflow or learning, seperating docs with interfaces with lots of buttons makes kittens die [20:59:55] the current workflow for many tasks is somewhat simple [20:59:55] * Damianz also stops ranting [21:00:11] for managing project membership it is *terrible* [21:00:15] It's little things that annoy me [21:00:35] creating instances is way easier than it was before :) [21:00:42] liek create an instance -> oh need a security group -> WHY DO I HAVE TO GO TO ANOTHER PAGE, WHY DO I HAVE TO REFRESH LOOSING MY INFO ARGH, KITTENS DIED [21:00:50] * gwicke wished performance on glusterfs was better [21:00:56] gwicke: is it currently very slow? [21:01:04] little js, dropdown, add inline, ajaxy validate, happy face [21:01:05] distributed filesystems are slow :( [21:01:07] Damianz: yep [21:01:12] I am a fussy bastard though [21:01:14] Ryan_Lane: last time I measured it was 300kB/s [21:01:21] gwicke: when did you measure? [21:01:33] Saturday [21:01:33] we had a problem with it over the last two weeks [21:01:34] gimme a sec [21:01:55] ok. memory usage seems reasonable [21:02:53] yeah. it's unfortunate that gluster is so slow [21:03:01] restarting a sqlite-backed server takes a looong time [21:03:01] ceph likely won't be much better [21:03:16] I guess the cache is thrown away [21:03:24] I recommend using local storage, rather than gluster storage for things that need to be high-speed [21:03:31] sqlite doesn't really say 'performance' tbf [21:03:32] ok [21:03:38] Use a real relation database [21:03:51] Damianz: it is purely IO-bound [21:04:07] you'll be surprised, but SQLite is relational and database [21:04:27] Still sqlite relies on file locking, it doens't scale, it's slow, formats are dodgy, it doesn't do caching. [21:04:32] * Damianz hugs his postgre [21:04:56] * MaxSem burns that heretic [21:05:11] * gwicke moves the db to local storage [21:05:14] oh yeah [21:05:17] anyone else wanting to say anything about unholy PG? [21:05:25] Just because you mw dudes dislike real database and kiss oracle's bum :D [21:05:25] RECOVERY Total processes is now: OK on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS OK: 149 processes [21:05:25] for definite, you should use local storage for databases [21:05:39] using gluster for DB access is bad :) [21:05:55] I should likely write a performance guide for labs [21:05:56] Ryan_Lane: MySQL agrees with you :P [21:06:11] labs has performance? 
I remember when we where worried about stability [21:06:11] :D [21:06:17] heh [21:06:24] well, it does if you use local storage [21:06:28] it does not if you use gluster [21:06:28] ahhh, much faster now [21:06:36] local storage could be better still tbf [21:06:40] Damianz: how so> [21:06:45] Like lvms vs images on fs [21:06:52] yeah [21:06:56] it's supported in openstack [21:07:11] but you lose features when you do that [21:07:20] or you know, buy us some ssds and use ceph :D totally need like 50tb of ssds [21:07:26] heh [21:07:38] we'd lose some space if we switched to SSDs [21:07:46] yeah :( [21:07:53] Give it another 10years and we'll have 4tb ssds [21:07:54] and on virt6, for instance, we're at 60% used space [21:09:13] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [21:10:41] Hmm I seriously have strange interests =\ Need to get a day job outside of my general interest or I spend all day thinking of crazy stuff to build. [21:11:17] heh [21:11:44] Ryan_Lane, so in which folder should databases be stored? [21:11:52] Platonides: /mnt [21:12:10] we'll have a database service eventually (hopefully in the next 6 months) [21:12:26] I might get around to re-installing the bots sql servers before then [21:12:28] not much good if it's read-only [21:12:35] it should be r/w [21:12:49] Platonides: no, a user database service [21:13:22] PROBLEM Total processes is now: WARNING on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS WARNING: 155 processes [21:13:34] Ryan_Lane: Btw I was thinking the other day with people complaining, what do you think about using federated tables to pull the replicas onto the userdb servers and let people do w/e the hell they want and let mysql 'proxy' the access? [21:13:48] that's a possibility [21:13:50] need to talk to asher about that, though [21:13:56] he's handling this part [21:14:16] I guess asher just wanted to use "normal" db servers [21:14:22] * Damianz leaves binasher cookies on the table for when he comes around again [21:14:26] that would explain the requisite of no user tables [21:14:39] user tables complicated having a clean replica [21:14:46] s/ed// [21:15:13] not really [21:15:20] as they would be in a different db [21:16:13] Imo it's messy and also breaks away from the standard replica setup which means master changes etc are more complicated [21:16:59] as far as you don't try to make it a master, it should be fine [21:17:36] it would be cool to direct some rr connections to normal slaves if they have little load [21:17:51] but who knows what could reach there :P [21:18:16] Loadbalancing over real slaves would be win, sadly the risk of impacting production probably makes that a straight no [21:18:35] It's bad enough when people do template changes and take down wikipedia [21:18:52] PROBLEM Current Load is now: WARNING on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: WARNING - load average: 5.56, 5.54, 5.17 [21:19:19] Seriously hate binlog replication sometimes for all its bad points [21:19:19] oooooo [21:19:21] "The pillar communication has been updated to add some extra levels of verification that the intended minion is the only one allowed to gather the data." [21:19:25] yes, even having some limited labs-shared slaves [21:19:37] win [21:19:39] the wait might stop, waiting for those highly lagged slaves [21:20:01] *the master might stop, waiting for those highly lagged slaves [21:20:23] (well, actually mw logic...) 
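A quick way to put numbers on the gluster-versus-local-disk difference discussed above (gwicke measured about 300 kB/s on the shared storage). /data/project is the shared project storage and /mnt the instance-local disk recommended above for databases; the test file names are illustrative:

  # Write 64 MB with a final fdatasync, once against shared storage and once against local disk:
  dd if=/dev/zero of=/data/project/ddtest bs=1M count=64 conv=fdatasync
  dd if=/dev/zero of=/mnt/ddtest bs=1M count=64 conv=fdatasync
  rm -f /data/project/ddtest /mnt/ddtest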
[21:20:23] doing queries against production databases is not gonna happen [21:20:37] I think having another teir of slaves that labs uses primarly and prod can fall back onto could be worthwhile in some cases (for huge read load spikes, which is usually what impacts mysql). [21:20:45] if we don't have enough capacity in Labs, we'll need to buy more database servers [21:20:53] Since real hardware with valid data might as well be used [21:21:37] Ryan_Lane: I think you should just give us root access to the prod db as long as we promise not to break it =D [21:21:38] they would be trivial to connect [21:21:43] Totally how it use to work [21:21:47] heh [21:22:00] labs is purposely disconnected from production [21:22:11] he should allow you to break it, but just once [21:22:14] :D [21:22:31] that's a rite of passage for wmf roots, breaking wikipedia :) [21:22:34] true [21:22:40] You can't say you work in ops until you take down the production cluster at least once :) [21:22:44] you're not officially a member of the team until you've broken the site [21:23:14] that said, I support the petition :D [21:23:51] You get extra points if you break the side by pushing bad code just for faling in testing, staging and qa =D [21:25:47] bleh. events are broken in 0.10.3 :( [21:26:06] thankfully 0.10.4 is going to be released like today or tomorrow [21:26:06] Ryan just took the red and blue pill [21:28:38] good night [21:28:51] nn plat [21:35:29] https://github.com/saltstack/salt/issues/2139 [21:35:31] external_auth [21:36:22] PROBLEM Disk Space is now: WARNING on labs-nfs1 i-0000005d.pmtpa.wmflabs output: DISK WARNING - free space: /export 521 MB (3% inode=59%): /home/SAVE 521 MB (3% inode=59%): [21:39:16] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [21:39:44] Change on 12mediawiki a page Developer access was modified, changed by Irate link https://www.mediawiki.org/w/index.php?diff=596367 edit summary: [21:41:23] :o [21:41:23] PROBLEM Disk Space is now: CRITICAL on labs-nfs1 i-0000005d.pmtpa.wmflabs output: DISK CRITICAL - free space: /export 498 MB (2% inode=59%): /home/SAVE 498 MB (2% inode=59%): [21:41:39] smexy [21:42:32] Change on 12mediawiki a page Developer access was modified, changed by Kitchen Knife link https://www.mediawiki.org/w/index.php?diff=596368 edit summary: /* User:Kitchen Knife */ [21:48:52] RECOVERY Current Load is now: OK on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: OK - load average: 3.34, 3.78, 4.73 [21:55:31] I'm still getting Error: invalid magic word 'ISONLINE' from every page on http://en.wikipedia.beta.wmflabs.org/wiki after I attempted a login [21:55:59] * Damianz pokes hashar [21:56:23] he's probably sleeping though it's a hashar thing [21:57:50] Damianz: awake still [21:57:50] though not for long [21:57:53] :o [21:58:23] PROBLEM Disk Space is now: WARNING on maps-test-puppetosm2 i-000004d9.pmtpa.wmflabs output: DISK WARNING - free space: / 519 MB (5% inode=91%): [21:58:47] Hi, anybody in charge / aware of the labs/* projects in Gerrit? [21:59:03] aware yes, in charge what do you mean? 
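The external_auth feature linked above (saltstack issue 2139, expected with the 0.10.4 release mentioned earlier) is also configured in the master config; a rough sketch, with the backend, user and function names as made-up examples:

  # Authenticate against PAM and allow one user to run a couple of functions:
  printf '%s\n' \
    'external_auth:' \
    '  pam:' \
    '    damianz:' \
    '      - test.ping' \
    '      - status.uptime' >> /etc/salt/master
  service salt-master restart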
[21:59:09] I just need to know whether all / any / none of the projects under labs/* at Gerrit should be featured at Ohloh.net [21:59:11] spagewmf: ahh that is related to the l10n cache not being updated [21:59:15] they are just general projects [21:59:22] spagewmf: that ISONLINE magic word probably comes from a recent change [21:59:35] * Damianz shurg [21:59:39] Damianz, "just general projects", meaning? [21:59:58] spagewmf: the l10n cache is supposed to update automatically though :(( [22:00:06] vaugly labs related stuff, for example the nagios config building code is under labs/ [22:00:31] maybe it runs out of memor [22:00:43] The main thing about labs/* is it spams in here, which is sorta good and sorta rubbish [22:00:50] Damianz, yes, I'm seeing the list of projects in Gerrit but I can't asses which ones are relevant to be featured in Ohloh [22:01:10] Matter of opinion imo [22:01:23] Reedy: poke [22:01:26] they're all open and contributed too, worthyness depends on your application [22:01:27] I like your recursive sentence :) [22:01:41] * Damianz has lots of opinions, you'll get use to it [22:01:42] Reedy: the nice http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page gives a stacktrace about an invalid magic word ISONLINE [22:01:52] Damianz, I can easily create projects for each of them [22:01:52] PROBLEM Current Load is now: WARNING on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: WARNING - load average: 7.65, 6.89, 5.78 [22:01:56] Reedy: don't we just have to rebuild the l10n cache? to fix that ? [22:02:34] PROBLEM Total processes is now: CRITICAL on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS CRITICAL: 289 processes [22:02:43] It's kinda a LIST ALL THE THINGS vs LIST DECENT THINGS, Ohloh has a lot of crap so I guess either is fine *shrug* [22:03:17] * Damianz delegates the answer to Ryan or ^demon because they're like knowledgeable dudes [22:03:17] hashar: should do [22:03:26] might not be running for some reason [22:03:31] labs/centralauth and labs/incubator have at least a description [22:03:39] !log deployment-prep refreshing l10ncache manually [22:04:10] labs/maps labs/nagios-builder have none. At least this is an objective observation (that can be useless, yes) [22:04:49] See now I'd argue wth is the point of a description in gerrit, we have readme files for that.. the browsing just sucks and the overview page sucks [22:05:00] Hate duplicating info, it just gets out of date quickly [22:05:07] ok Damianz thanks, and I will wait for Ryan_Lane or ^demon|lunch opinions :) [22:05:24] Improve all the things! [22:05:44] I don't see why all wouldn't be added [22:06:16] Quality of code to pull in contributors [22:06:27] main and only reason really [22:06:32] for pure stats that's mute [22:07:25] <^demon|lunch> qgil: labs/* stuff is really random. labs/private should *not* be anywhere, for sure though. [22:07:41] We're a random bunch of people, what do you expect. [22:08:08] ok, forgetting about labs/* for Ohloh by now, then. 
[22:08:22] Change on mediawiki a page Developer access was modified, changed by Sharihareswara (WMF) link https://www.mediawiki.org/w/index.php?diff=596375 edit summary: /* User:Jimmy xu wrk */
[22:08:39] !log deployment-prep Fixed l10n cache permissions: chown l10nupdate:l10nupdate /home/wikipedia/common/php-master/cache/l10n/*
[22:09:00] Change on mediawiki a page Developer access was modified, changed by Sharihareswara (WMF) link https://www.mediawiki.org/w/index.php?diff=596376 edit summary: /* User:Lockal */ done
[22:09:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs)
[22:09:42] Ryan_Lane: Oh btw did you pull my c0d3 today? I only ask cause if you did I'll resolve the bug before going to bed.
[22:09:48] bahh
[22:10:02] Damianz: I haven't yet
[22:10:08] Change on mediawiki a page Developer access was modified, changed by Sharihareswara (WMF) link https://www.mediawiki.org/w/index.php?diff=596377 edit summary: /* User:Kitchen Knife */
[22:10:12] we'd like to create another 8-core VM, but get a failure while doing so
[22:10:17] <^demon|lunch> Ryan_Lane: How does thursday sound for precise on gerrit?
[22:10:21] probably quoat related, ask Ryan
[22:10:25] project visualeditor
[22:10:31] s/quoat/quota/
[22:10:31] Ryan_Lane: ^^
[22:10:34] likely a quota, yes
[22:10:41] ^demon|lunch: that's likely fine
[22:10:59] <^demon|lunch> I'll go ahead and put us down for Thursday. We can shift it if need be.
[22:11:03] ok
[22:11:13] <^demon|lunch> Since it'll involve actual gerrit downtime, I'll e-mail engineering as well to see if anyone has objections.
[22:11:43] Because gerrit doesn't even randomly have downtime due to restarts :)
[22:12:26] <^demon|lunch> Yeah, but an actual upgrade is going to take a bit longer than a random restart.
[22:12:26] Damianz: aren't you the one that tweaked the nagios configurations on labs?
[22:12:27] ^demon|lunch: precise upgrade for gerrit precedes precise upgrade for gallium, yes?
[22:12:32] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 196 processes
[22:12:41] <^demon|lunch> chrismcmahon: Unrelated.
[22:12:57] Change on mediawiki a page Developer access was modified, changed by Sharihareswara (WMF) link https://www.mediawiki.org/w/index.php?diff=596378 edit summary: took away ones that are done
[22:12:58] ^demon|lunch: thanks! I thought they were related.
[22:12:59] chrismcmahon: for gerrit we needed a patch to be backported on our installation. That has been deployed by Chad
[22:13:03] If by tweaked you mean un-broke, bitched about, broke, un-broke, wrote something not grepping json, trying to puppetize/expand upon. Maybe.
[22:13:23] chrismcmahon: zuul is python software that requires various dependencies which are only in Precise :/
[22:13:47] <^demon|away> chrismcmahon: One does not block the other. So gallium can be scheduled independently of formey/manganese :)
[22:13:56] Damianz: yeah well hmm. Maybe you could help anyway. The hostnames are filled using the actual hostname, I was wondering if we could display the instance name instead. Ex: http://nagios.wmflabs.org/cgi-bin/nagios3/status.cgi?hostgroup=deployment-prep&style=detail
[22:14:21] hashar: As soon as Ryan_Lane renames everything.
[22:14:27] hashar, andrewbogott, did you try `git clone --depth=1`? should greatly reduce the amount of stuff to transfer
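(A minimal sketch of the shallow-clone suggestion, using the anonymous gerrit URL quoted further down; the repository and depth are illustrative.)

```
# Clone only the most recent commit instead of the full history,
# which keeps the initial transfer to a labs instance small.
git clone --depth=1 https://gerrit.wikimedia.org/r/p/mediawiki/core.git
cd core
# Subsequent updates are ordinary pulls; the history simply stays truncated.
git pull
```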
[22:14:36] Basically we had to switch to fqdns to support multiple regions
[22:15:01] So new instances are ..wmflabs, old ones are ..wmflabs (which sucks and will go away eventually).
[22:15:12] The naming changed in the openstack upgrade
[22:15:21] spagewmf: nice one. Never heard of the --depth option to "git clone" before. That is probably what andrewbogott should use :-]
[22:15:34] * andrewbogott tries it
[22:15:52] andrewbogott: probably better than my lame "git archive"
[22:16:03] Though I wonder if I can show the alias on that form instead since we don't really have duplicates *yet*.. would need to look at the C stuff.
[22:16:09] cause apparently one will be able to simply "git pull" to receive new updates.
[22:16:37] Damianz: ah that makes sense
[22:17:18] Damianz: (and we should probably upgrade/switch to Icinga, which is a nagios community fork)
[22:17:33] hashar, andrewbogott Also `git clone ssh://USER@gerrit.wikimedia.org:29418/mediawiki/core.git ` is meant to be lighter-weight than `git clone https://gerrit.wikimedia.org/r/p/mediawiki/core.git`
[22:17:36] yes
[22:17:56] I'd like to get a base puppet class role sorted as it's very hacked up from prod atm :(
[22:18:33] Ryan_Lane: can you ping me when it would make sense to retry the creation?
[22:23:04] aawiki-5f8a0015: 0.6235 15.8M SQL ERROR: Table 'aawikibooks.job' doesn't exist (10.4.0.53)
[22:23:07] poooor labs
[22:24:13] o.0
[22:25:14] !log deployment-prep test
[22:25:54] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: WARNING - load average: 4.95, 5.53, 5.97
[22:27:24] Reedy: csteipp: an interesting issue. Whenever wikiversions.dat is the one from production, we end up with the jobrunner trying to connect to aawikibooks.job which does not exist on beta :-]
[22:27:33] Reedy: csteipp: end result: ton of spam in dberror.log :-]
[22:27:47] mmmmm spam
[22:28:12] Oh, is that what was happening?
[22:28:37] I was trying to tail the file, and couldn't see anything on account of the errors flying by
[22:29:20] and logs are in /home/wikipedia/logs so that also fills the tiny (and shared) /home partition
[22:29:21] :-(
[22:29:49] Mon Oct 22 22:29:32 UTC 2012 deployment-video05 aawiki nextJobDB::getPendingDbs 10.4.0.53 1146 Table 'aawikibooks.job' doesn't exist (10.4.0.53)
[22:29:49] :-(
[22:31:22] RECOVERY Disk Space is now: OK on labs-nfs1 i-0000005d.pmtpa.wmflabs output: DISK OK
[22:34:39] hashar: make it a link to somewhere on /mnt or /data/project?
[22:36:07] mount bind all the things
[22:36:12] Ryan_Lane: yeah got a bug for it. Need to write the puppet stuff for it
[22:36:25] why does that need to be in puppet?
[22:38:15] Ryan_Lane: puppet ensures that /home/wikipedia/log is a directory
[22:38:17] so it will happily remove the link
[22:38:20] ugh
[22:39:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs)
[22:39:32] imo /home/wikipedia is a stupid place and it should be configurable in puppet
[22:43:22] PROBLEM Disk Space is now: CRITICAL on maps-test-puppetosm2 i-000004d9.pmtpa.wmflabs output: DISK CRITICAL - free space: / 273 MB (2% inode=91%):
[22:44:39] found the issue
[22:47:32] RECOVERY Total processes is now: OK on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS OK: 102 processes
[23:00:10] spagewmf: --depth=1 seems to help. Do you know of any hazards involved in using such a repo? Do I need to post warning signs?
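(Sketching one conservative answer to that question, under the assumption of a reasonably modern git: detect whether the clone is shallow and fetch the missing history before trying to push anything from it.)

```
# A shallow clone is marked by a .git/shallow file; deepen it before pushing
# so the remote never has to deal with truncated history.
if [ -f "$(git rev-parse --git-dir)/shallow" ]; then
    git fetch --unshallow   # newer git; older versions can re-fetch with a very large --depth
fi
```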
[23:00:37] Forums seem to disagree about whether it's safe and/or possible to push from such a repo, I don't know what gerrit will think of it.
[23:01:31] <^demon|away> --depth *should* be ok since the upgrade to jgit. At least for cloning.
[23:01:43] <^demon|away> I have zero clue about pushing back though--never tried
[23:02:29] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Unprioritized - normal) [Bug 41285] foreachwiki on beta use all.dblist instead of all-wmflabs.dblist - https://bugzilla.wikimedia.org/show_bug.cgi?id=41285
[23:02:57] We don't really use 'push' anyway...
[23:04:52] RECOVERY Current Load is now: OK on parsoid-roundtrip4 i-000004d7.pmtpa.wmflabs output: OK - load average: 1.90, 3.52, 4.99
[23:04:55] Ryan_Lane: ping ;)
[23:05:07] pong
[23:05:37] <^demon|away> andrewbogott: Yes we do. Every `git review` is a push :p
[23:05:42] Ryan_Lane: we still get the error on instance creation
[23:05:43] <^demon|away> `git push origin HEAD:refs/for/$branch`
[23:06:06] Hm, guess I thought that 'git review' did something more abstract.
[23:06:06] "Failed to create instance. "
[23:06:07] Well, I will get to test this soon enough.
[23:06:20] gwicke: try now
[23:06:39] Ryan_Lane: ahhhh, thanks!!
[23:06:45] yw
[23:08:04] <^demon|away> andrewbogott: It tries to be more friendly and do other "magic"
[23:08:07] <^demon|away> It mostly just annoys me.
[23:08:47] <^demon|away> I kind of like http://engblog.nextdoor.com/post/27136956002/introducing-git-change, but haven't had the chance to really play with it much yet.
[23:09:23] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs)
[23:13:29] !log deployment-prep log
[23:13:35] Logged the message, Master
[23:13:52] PROBLEM Current Load is now: CRITICAL on parsoid-roundtrip5-8core i-000004db.pmtpa.wmflabs output: Connection refused by host
[23:13:52] PROBLEM Current Load is now: CRITICAL on mwreview-param2 i-000004dc.pmtpa.wmflabs output: Connection refused by host
[23:13:59] !log deployment-prep Fixed the jobrunner spamming dberror.log (the all-wmflabs.dblist contained databases from production)
[23:14:05] Logged the message, Master
[23:14:23] !log deployment-prep Started a manual l10n rebuild in a screen on -dbdump
[23:14:25] Logged the message, Master
[23:14:33] PROBLEM Current Users is now: CRITICAL on parsoid-roundtrip5-8core i-000004db.pmtpa.wmflabs output: Connection refused by host
[23:14:33] PROBLEM Current Users is now: CRITICAL on mwreview-param2 i-000004dc.pmtpa.wmflabs output: Connection refused by host
[23:14:37] !log deployment-prep Applying database updates to all wiki (in a screen on -dbdump)
[23:14:39] Logged the message, Master
[23:14:49] !log deployment-prep getting to bed :-]
[23:14:49] Logged the message, Master
[23:14:57] we will see how beta is tomorrow morning :-]
[23:15:12] PROBLEM Disk Space is now: CRITICAL on parsoid-roundtrip5-8core i-000004db.pmtpa.wmflabs output: Connection refused by host
[23:15:12] PROBLEM Disk Space is now: CRITICAL on mwreview-param2 i-000004dc.pmtpa.wmflabs output: Connection refused by host
[23:16:02] PROBLEM Free ram is now: CRITICAL on parsoid-roundtrip5-8core i-000004db.pmtpa.wmflabs output: Connection refused by host
[23:16:02] PROBLEM Free ram is now: CRITICAL on mwreview-param2 i-000004dc.pmtpa.wmflabs output: Connection refused by host
[23:17:22] PROBLEM Total processes is now: CRITICAL on mwreview-param2 i-000004dc.pmtpa.wmflabs output: Connection refused by host
[23:18:12] PROBLEM dpkg-check is now: CRITICAL on mwreview-param2 i-000004dc.pmtpa.wmflabs output: Connection refused by host
[23:18:52] RECOVERY Current Load is now: OK on parsoid-roundtrip5-8core i-000004db.pmtpa.wmflabs output: OK - load average: 0.38, 0.73, 0.48
[23:19:32] RECOVERY Current Users is now: OK on parsoid-roundtrip5-8core i-000004db.pmtpa.wmflabs output: USERS OK - 0 users currently logged in
[23:19:32] RECOVERY Current Users is now: OK on mwreview-param2 i-000004dc.pmtpa.wmflabs output: USERS OK - 0 users currently logged in
[23:20:15] RECOVERY Disk Space is now: OK on parsoid-roundtrip5-8core i-000004db.pmtpa.wmflabs output: DISK OK
[23:20:15] RECOVERY Disk Space is now: OK on mwreview-param2 i-000004dc.pmtpa.wmflabs output: DISK OK
[23:21:03] RECOVERY Free ram is now: OK on parsoid-roundtrip5-8core i-000004db.pmtpa.wmflabs output: OK: 4923% free memory
[23:21:03] RECOVERY Free ram is now: OK on mwreview-param2 i-000004dc.pmtpa.wmflabs output: OK: 1121% free memory
[23:22:24] RECOVERY Total processes is now: OK on mwreview-param2 i-000004dc.pmtpa.wmflabs output: PROCS OK: 84 processes
[23:23:03] RECOVERY dpkg-check is now: OK on mwreview-param2 i-000004dc.pmtpa.wmflabs output: All packages OK
[23:23:53] RECOVERY Current Load is now: OK on mwreview-param2 i-000004dc.pmtpa.wmflabs output: OK - load average: 0.02, 0.49, 0.52
[23:28:24] PROBLEM Total processes is now: CRITICAL on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS CRITICAL: 210 processes
[23:39:34] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs)
[23:53:10] andrewbogott, pushing from a shallow git can fail according to http://stackoverflow.com/questions/6941889 , I don't know if the failure is cryptic or whether it's easy to dig a deeper git repo.
[23:53:39] spagewmf: Yeah, there are other reports of it working, so maybe the issue is that it 'can' fail, as you say...
[23:53:57] I'm not sure that people will be trying to push from this application anyway, since the tree will be root-owned… hm.
[23:54:08] andrewbogott the stackoverflow questions suggest git clone --orphan is better, I've never tried that.
[23:54:42] Yeah, I must've read that same page. I don't see evidence that git clone --orphan actually exists, though, it certainly doesn't work with my current git install.
[23:55:05] I'm amused that a tool as young as git has so many mysterious, unplumbed depths.
[23:57:16] E3 team never pushes from our labs test instances, we only git fetch to them and paste those refs/changes/43/29343/3 lines from gerrit to try out patches. I don't even know that two people can git push from the same directory, the default permissions fight it.
[23:59:43] That's pretty much what I'm thinking. I always have a parallel private repo that I use to actually push from.
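(The fetch-a-patchset workflow described above, together with the push that `git review` wraps, sketched with the change ref quoted in the log; the target branch is an assumption.)

```
# Try a patchset out on a test instance: fetch the change ref straight from
# gerrit and check it out detached, without ever pushing from this tree.
git fetch https://gerrit.wikimedia.org/r/p/mediawiki/core.git refs/changes/43/29343/3
git checkout FETCH_HEAD

# From a separate full (non-shallow) clone, uploading a new patchset is the
# push `git review` performs under the hood:
git push origin HEAD:refs/for/master   # 'master' assumed; substitute the real target branch
```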