[00:07:05] When would a jsub -continuous stop running?
[00:09:14] Wikimedia Labs / tools: Random 503 Service Temporarily Unavailable errors from tools-webproxy - https://bugzilla.wikimedia.org/65179#c12 (Tim Landscheidt) Yuvi currently has no power for his laptop, but he commented on IRC: | mutante: and then I looked at the logs and the problem was that...
[00:10:22] When the program terminates successfully (exit 0) or OOMs.
[00:11:20] (Or of course jstop/qdel.)
[00:18:59] Wikimedia Labs / tools: Set up a tileserver for OSM in Labs - https://bugzilla.wikimedia.org/60819#c4 (Matthew Flaschen) As a 'see also', bug 33980 is for doing this in production.
[00:19:29] Wikimedia Labs / tools: Set up a tileserver for OSM in Labs - https://bugzilla.wikimedia.org/60819 (Matthew Flaschen)
[00:21:37] where does morebots log to?
[00:22:14] Wikimedia Labs / (other): [tracking] OSM on Labs - https://bugzilla.wikimedia.org/58797 (Matthew Flaschen)
[00:23:02] nvm found it
[00:23:26] !log labs 503's related to bug 65179
[00:23:26] labs is not a valid project.
[00:23:33] !log tools 503's related to bug 65179
[00:23:34] Logged the message, Master
[08:04:06] hello
[09:02:37] (PS1) Giuseppe Lavagetto: Adding a missing entry [labs/private] - https://gerrit.wikimedia.org/r/133214
[09:22:34] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Adding a missing entry [labs/private] - https://gerrit.wikimedia.org/r/133214 (owner: Giuseppe Lavagetto)
[10:19:05] Does "./x >> file" only update file when x terminates?
[12:50:32] a930913: That depends on if/how ./x buffers its output. For example, in Perl you can use "$| = 1;" to force a flush after every print.
[12:52:07] scfc_de: Yeah, didn't find how to do it in python, so just changed prints to a function that logged.
[12:52:28] scfc_de: Also, I gave up on trying to use the shared pywiki.
[12:55:07] a930913: http://stackoverflow.com/questions/107705/python-output-buffering seems to be the compendium on that (untested).
[13:01:14] scfc_de: Derp, I didn't even think of flushing it :p
[13:03:37] We have a recurring vandal on wikitech, and I'm starting to get tired of blocking his various incarnations. How do other wikis handle this?
[13:05:41] scfc_de: abuse filter, checkuser range blocks
[13:06:22] Whitelist? :p
[13:06:44] I'd prefer a whitelist :-).
[13:06:49] *argl*
[13:07:09] scfc_de: what type of vandalism is it?
[13:08:19] https://wikitech.wikimedia.org/wiki/Special:Contributions/Unbeatsr, https://wikitech.wikimedia.org/wiki/Special:Contributions/Suckerf, https://wikitech.wikimedia.org/wiki/Special:Contributions/Sockpuppetofnoob, etc.
[13:08:31] scfc_de: have you had a CU look at it?
[13:08:46] Odds are it's from the same IP
[13:09:18] Or it
[13:09:23] is a dial-up.
[13:09:50] even then you can just softblock the range
[13:10:58] CUs are Coren and Ryan, so it'll have to wait anyhow :-).
[13:11:47] scfc_de: ask a steward
[13:15:11] Hmm, how many abductions can the labs afford before it would fail or become unusable?
[13:16:14] No admins => no data center moves, no crontab juggling, etc. :-).
[13:17:10] scfc_de: Reply to your message from yesterday: yes, that is a whois web search service. We currently use the Toolserver tool for that in [[w:ja:MediaWiki:Sp-contributions-footer-anon]] and other projects.
[13:17:46] scfc_de: Yes, but how many?
[13:22:13] rxy: scfc_de: FWIW, OverlordQ, who maintained the older TS tools for investigating IP addresses, said they would migrate everything to WMF Labs. https://en.wikipedia.org/wiki/User_talk:OverlordQ#Toolserver_migration_needed_.28TorCheck.29
[13:23:24] rxy: Ah, okay. For the legal side of that, I think it's probably best to get approval from WMF (that applies to OverlordQ as well, of course).
[13:28:38] scfc_de: I see, thanks. whym: Thanks for letting me know.
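(Editor's note: the buffering exchange above boils down to stdout buffering: when stdout is redirected to a file, Python block-buffers it, so "./x >> file" only updates the file when the buffer fills or the process exits. A small sketch of the usual remedies, written for modern Python 3; in the Python 2 of this log's era you would use sys.stdout.flush() or "python -u" since print() had no flush keyword yet.)

```python
import sys

# Explicit flush after each print (the Python analogue of Perl's "$| = 1")
print("progress line", flush=True)   # Python 3.3+
sys.stdout.flush()                   # equivalent manual flush

# When writing a log file yourself, flush the handle so readers see data at once
with open("progress.log", "w") as log:
    log.write("step 1 done\n")
    log.flush()                      # data reaches the file now, not only at close

# Alternatively, run the whole program with unbuffered stdout:
#   python -u x.py >> file
```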
[15:22:00] Hi everyone: I have a question: how can I test if mod_rewrite is working on tools.wmflabs?
[15:38:59] mutante: around to +2 things?
[15:39:00] :D
[15:39:24] hey andrewbogott!
[15:39:32] andrewbogott: found the source of the 503s, and it wasn't with nginx but with redis
[15:39:48] andrewbogott: https://gerrit.wikimedia.org/r/#/c/133171/ and https://gerrit.wikimedia.org/r/#/c/133172/
[15:40:34] andrewbogott: because connection pooling wasn't set up properly. I tested these in the live setup
[15:40:46] YuviPanda: great!
[15:40:58] I still probably don't want you to re-update nginx until I have time to pay attention though :)
[15:41:01] andrewbogott: too many connections, essentially.
[15:41:03] well, me or Coren
[15:41:04] andrewbogott: oh yeah, completely :)
[15:41:17] andrewbogott: but we should merge these patches.
[15:41:24] ok, I'll read...
[15:41:28] andrewbogott: ty!
[15:44:29] YuviPanda: feeling confident about those? I'm happy to merge but I have to go in a few minutes...
[15:45:04] andrewbogott: yeah, they were live for about 5m before puppet killed it
[15:45:10] 'k
[15:48:26] YuviPanda: ok, I'm forcing a puppet apply on the main proxy… what's the name of the box that the tools proxy runs on?
[15:48:37] andrewbogott: tools-webproxy. I'm on it
[15:48:48] ok. do I need to restart redis as well?
[15:48:53] andrewbogott: nope
[15:48:58] 'k
[15:49:33] YuviPanda: um… my proxy test page just went down
[15:49:45] tools is fine... http://tools.wmflabs.org/
[15:50:16] http://eqiadproxytest.wmflabs.org
[15:50:51] yeah, I see that. logging in
[15:51:27] andrewbogott: restarted nginx. all looks fine now
[15:51:31] andrewbogott: not sure why that was needed
[15:51:44] ah, ok. Yeah, seems better
[15:52:03] petan: are you there?
[15:52:46] andrewbogott: we need to set up some monitoring for it, and have it ping me whenever something happens. can figure out when you're back tho :)
[15:53:45] That should be part of the bigger Icinga picture, though.
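(Editor's note: the fix being merged above is connection reuse: the proxy was opening a fresh Redis connection per request ("too many connections, essentially"). This is not the actual nginx/lua patch from the Gerrit changes, just a minimal self-contained illustration of the pooling idea; ConnectionPool and the factory callable are hypothetical names for this sketch.)

```python
import queue

class ConnectionPool:
    """Minimal pool: hand out idle connections, create at most max_size."""
    def __init__(self, factory, max_size=8):
        self.factory = factory            # callable that opens a new connection
        self.idle = queue.Queue(max_size)
        self.created = 0
        self.max_size = max_size

    def get(self):
        try:
            return self.idle.get_nowait() # reuse an idle connection if available
        except queue.Empty:
            if self.created < self.max_size:
                self.created += 1
                return self.factory()     # open a new one, under the cap
            return self.idle.get()        # otherwise wait for a returned one

    def put(self, conn):
        self.idle.put(conn)               # hand the connection back for reuse

# With pooling, 100 "requests" reuse a single connection instead of opening 100:
opened = []
pool = ConnectionPool(lambda: opened.append(object()) or opened[-1], max_size=2)
for _ in range(100):
    c = pool.get()
    pool.put(c)
print(len(opened))                        # → 1
```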
[15:54:20] true
[15:54:29] but I should at least set up something that pings me when tools.wmflabs.org is down
[15:56:32] Some monitoring is always better than no monitoring at all :-). But we should watch out that we don't end up with a different monitoring script for each http host.
[15:56:42] oh totally
[15:56:56] the connection pooling should also make the proxy faster in general
[15:57:19] andrewbogott: taking the laptop with you?
[15:57:39] YuviPanda: Can you see the difference in the number of connections nginx <-> redis to see if the patch works?
[15:58:00] Danny_B: Not to dinner :)
[15:58:04] scfc_de: so I tested it by adding instrumentation code yesterday that reported how many times the current connection was being reused.
[15:58:17] scfc_de: but I can check again. moment
[15:59:09] scfc_de: only 18 connections now.
[15:59:21] YuviPanda: Compared to ... ?
[15:59:32] I don't know :P didn't check (yes, was an idiot)
[15:59:43] In 30 minutes we'll know :-).
[15:59:44] scfc_de: hmm, I can comment that part of the code out and see how it goes
[15:59:52] scfc_de: already forced a puppet run. this is the new data
[16:00:11] I was more thinking about "netstat -t" | grep something.
[16:00:34] scfc_de: redis INFO command or MONITOR should work
[16:01:00] But if merged and only 18 connections, perfect.
[16:01:34] YuviPanda: BTW, have you decided whether you need the Android SDK on Tools? (https://gerrit.wikimedia.org/r/#/c/125241/)
[16:03:35] scfc_de: thanks for the wikitech anti-vandalism. I hope that doesn't become a tradition
[16:04:50] I think the current one is a keeper. Doesn't seem to lose interest too easily.
[16:14:11] * andrewbogott leaves while the proxies are still working
[16:31:39] !log tools tools-webproxy: "iptables -A INPUT -p tcp \! --source 127/8 --dport 6379 -j REJECT" to block connections from other Tools instances to Redis
[16:31:41] Logged the message, Master
[16:33:56] I'm trying to clone my svn from the toolserver to a local git in a project but git-svn is not installed
[16:37:41] phe: please create a bug for that, so Coren can install it
[16:37:43] phe: Just installed it on tools-dev and tools-login (I'll submit a Puppet patch later).
[16:37:51] oh, that was quick :-p
[16:38:04] * valhallasw is now completely confused about who can do what on tools-*
[16:38:18] heh
[16:38:49] scfc_de: I'll push that patch out
[16:39:11] YuviPanda: git-svn or bind?
[16:39:27] scfc_de: git-svn
[16:39:54] valhallasw: It's not trivial. For example, I don't have sudo on tools-submit :-).
[16:40:08] scfc_de, thanks, cloning works now
[16:40:22] what's tools-submit? :)
[16:41:09] The thing that has a crontab installed that sends out a mail each day at midnight :-).
[17:55:46] Where is the page to manage tool accounts these days? I can't for the life of me find it on wikitech
[17:56:28] Damianz: https://tools.wmflabs.org ?
[17:56:54] there's a link to create a new tool / add/remove maintainers (although that link is broken)
[17:57:31] Ah, it's under Special:NovaServiceGroup... I had it filtered to another project (which is insane, since no other project really uses that page).
[17:57:34] Thanks... found it!
[18:26:45] Hm.. what's the progress on Ganglia?
[18:26:52] Being able to debug things would be easier with it up.
[18:26:54] http://ganglia.wmflabs.org/latest/?c=integration&h=integration-slave1001
[18:29:04] And icinga seems flaky as well
[18:29:04] NRPE: Command 'root_disk_space' not defined
[18:29:08] http://icinga.wmflabs.org/cgi-bin/icinga/status.cgi?hostgroup=integration&style=detail
[18:29:40] Welcome to the world of not everything running puppet properly/recently.
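(Editor's note: the "something that pings me when tools.wmflabs.org is down" idea from earlier in the log can be sketched as a tiny HTTP check. This is not the eventual Icinga setup, just an illustration; the fetcher is injected so the logic is testable without the network, and send_alert is a hypothetical notification hook.)

```python
from urllib.request import urlopen
from urllib.error import URLError

def is_up(url, fetch=urlopen, timeout=10):
    """Return True if the URL answers with an HTTP status below 400,
    False on any HTTP error, network error, or timeout."""
    try:
        with fetch(url, timeout=timeout) as resp:
            return resp.status < 400
    except (URLError, OSError, ValueError):
        return False

# A cron job could then alert on failure:
# if not is_up("http://tools.wmflabs.org/"):
#     send_alert("tools.wmflabs.org is down")   # hypothetical hook
```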
[18:30:10] also firewall rules are still broken for that
[18:30:24] * Damianz can't really fix those things properly without access everywhere
[18:36:17] Damianz: Is there an open ticket? Someone we can bug in ops?
[18:36:23] This can't be snafu, this is unworkable without notifications of any kind.
[18:37:06] both 'deployment-prep' and 'integration' projects are used by our daily workflow and we need to get these issues caught before they become critical. Costing more man-hours every time something goes wrong.
[18:37:25] What do you need?
[18:37:40] I need to check the puppet status time on wikitech for all the instances and then bug ops to kick them/fix fw rules/poke people etc, but it's one of those annoying tasks that will require lots of back and forth, and doing it yourself would be a lot easier
[18:37:50] Tbf deployment-prep IIRC I have access to, so I can check that
[18:40:43] Or I do, but not root heh
[18:40:53] I have root there I think.
[18:41:05] In both of the mentioned projects.
[18:41:29] What is the fix? Is it something that new instances get already and it's just existing instances?
[18:41:38] I'll do whatever upgrade/fixup you need.
[18:43:06] I've found a new instance I have that's broken, though it should be working. Checking if the puppet config is right or not; if not I'll look at getting a patch in... I thought I'd fixed /this/ specific issue before
[18:44:47] (PS1) DamianZaremba: Change root check [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133289
[18:45:42] (PS1) DamianZaremba: Meh file [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133290
[18:45:58] (CR) DamianZaremba: [C: 2 V: 2] Change root check [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133289 (owner: DamianZaremba)
[18:46:09] (CR) DamianZaremba: [C: 2 V: 2] Meh file [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133290 (owner: DamianZaremba)
[18:49:36] (PS1) DamianZaremba: We not have instances with no ip -.- [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133291
[18:49:46] (CR) DamianZaremba: [C: 2 V: 2] We not have instances with no ip -.- [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133291 (owner: DamianZaremba)
[18:50:10] I believe that should fix the root disk usage check
[18:50:39] Puppet check/nrpe timeouts I'll look into exactly why, and bug someone for access if required - I think for deployment-prep/integration/most actively used things those bits work
[18:50:46] well maybe not puppet... it's an evil beast
[18:59:23] (PS1) DamianZaremba: Remove free ram check - someone removed the script, yet the nrpe config entry is still there. [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133294
[18:59:38] (CR) DamianZaremba: [C: 2 V: 2] Remove free ram check - someone removed the script, yet the nrpe config entry is still there. [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133294 (owner: DamianZaremba)
[19:01:31] (PS1) DamianZaremba: Ignore hosts that will never see the light of day again [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133295
[19:01:40] (CR) DamianZaremba: [C: 2 V: 2] Ignore hosts that will never see the light of day again [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133295 (owner: DamianZaremba)
[19:03:34] (PS1) DamianZaremba: Fix README, remove pmtpa nodes [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133296
[19:03:47] (CR) DamianZaremba: [C: 2 V: 2] Fix README, remove pmtpa nodes [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133296 (owner: DamianZaremba)
[19:08:42] (PS1) DamianZaremba: Fix loading ignored hosts from cmdline [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133299
[19:08:59] (CR) DamianZaremba: [C: 2 V: 2] Fix loading ignored hosts from cmdline [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133299 (owner: DamianZaremba)
[19:10:09] Krinkle: Aside from puppet status, does that look better for you? Also, do you care about free_ram? Seems the check got broken when moving to modules (can add the script back if needed)
[19:15:24] (PS1) DamianZaremba: Add puppet run callback script, gets run by snmptt to properly format messages to submit to icinga for passive checks [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133302
[19:15:39] (CR) DamianZaremba: [C: 2 V: 2] Add puppet run callback script, gets run by snmptt to properly format messages to submit to icinga for passive checks [labs/nagios-builder] - https://gerrit.wikimedia.org/r/133302 (owner: DamianZaremba)
[19:35:26] Damianz: I'm mostly concerned about memory usage in ganglia and disk size check in icinga
[19:36:28] Should in theory work then - I think production uses a ganglia check in ops view for ram... could look at getting that in labs maybe.
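(Editor's note: the metrics discussion here moves from Ganglia toward "diamond" shipping to Graphite. Graphite's plaintext protocol is just one line per metric, "path value unix_timestamp", sent to Carbon, conventionally on TCP 2003. A minimal hedged sketch; the metric path and host/port defaults are illustrative assumptions, not this setup's actual config.)

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol: 'path value timestamp'."""
    ts = int(timestamp if timestamp is not None else time.time())
    return "%s %s %d\n" % (path, value, ts)

def send_metric(path, value, host="localhost", port=2003):
    """Ship one metric to a Carbon plaintext listener (port 2003 by convention)."""
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(graphite_line(path, value).encode("ascii"))

# Hypothetical metric path, echoing the "only 18 connections" reading above:
print(graphite_line("tools.webproxy.connections", 18, 1400000000), end="")
# → tools.webproxy.connections 18 1400000000
```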
[19:36:54] Ganglia has a memory graph and that was working very recently
[19:37:18] icinga in labs I've never used, but its default checks (such as free disk space) would be useful
[19:37:32] there is also the new "diamond" stuff
[19:37:45] "It is capable of collecting cpu, memory, network, i/o, load and disk metrics."
[19:38:06] and sends it to graphite
[19:44:06] ganglia is going away "someday" - not saying avoid it, just saying good to know
[20:34:11] Wikimedia Labs / tools: tools-submit has no database aliases/NAT - https://bugzilla.wikimedia.org/65308 (Tim Landscheidt) NEW p: Unprio s: normal a: Marc A. Pelletier Cf. .
[20:34:24] Wikimedia Labs / tools: tools-submit has no database aliases/NAT - https://bugzilla.wikimedia.org/65308 (Tim Landscheidt)
[20:34:25] Wikimedia Labs / Infrastructure: Move LabsDB aliases and NAT to DNS and LabsDB servers - https://bugzilla.wikimedia.org/61897 (Tim Landscheidt)
[22:38:33] YuviPanda: Redis is dead?
[22:38:42] a930913: what? is it?
[22:39:05] a930913: checking
[22:39:11] Erm, looks like it for the last hour.
[22:39:55] a930913: looks up from here
[22:40:39] YuviPanda: Up or working?
[22:40:45] a930913: up and working
[22:40:55] redis 127.0.0.1:6379> set hi bo
[22:40:55] OK
[22:40:55] redis 127.0.0.1:6379> get hi
[22:40:57] "bo"
[22:42:09] a930913: yeah, wfm
[22:42:42] YuviPanda: python -c "import redis; r=redis.Redis(\"labs-redis\"); ps=r.pubsub(); ps.subscribe([\"testkey\"]);"
[22:43:02] a930913: tools-redis is what you're looking for
[22:43:07] there's no such thing as labs-redis
[22:44:44] Woops. Stuff is still broken though.
[22:44:58] That was just my mistake now.
[22:46:19] a930913: sorry, still wfm
[22:46:26] >>> r = redis.Redis("tools-redis")
[22:46:39] >>> r.set("hi", "bo")
[22:46:39] True
[22:46:39] >>> r.get("hi")
[22:46:41] Might just be ClueBot. Lemme check.
[22:46:41] 'bo'
[22:46:43] from tools-login
[22:46:45] yeah, probably :)
[22:46:49] I've to go to sleep now, tho
[22:47:01] YuviPanda: Same, g'night.
[22:51:05] Damianz: Redis relay is down.
[22:51:40] Yeah, it got broken when Coren broke all the crons. Not got to fixing it yet
[23:10:41] Damianz: I have a BRfA relying on it atm :/ It was working until ~1.5 hours ago.
[23:13:25] It was also running 3 times up until 1.5 hours ago ;)
[23:13:32] working again now possibly
[23:26:29] Damianz: <3
[23:33:50] !log deployment-prep Added irc input to logstash via I409fec9
[23:33:52] Logged the message, Master
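(Editor's note: the set/get round trip YuviPanda ran above is a handy liveness probe. Here it is wrapped as a function in the redis-py style shown in the log; FakeRedis is a hypothetical stand-in so the sketch runs without a server, and with redis-py you would pass redis.Redis("tools-redis") instead.)

```python
def redis_alive(client):
    """Round-trip a set/get, as in the log: True if the value survives."""
    try:
        client.set("healthcheck", "ok")
        # redis-py may return bytes, so accept either form
        return client.get("healthcheck") in ("ok", b"ok")
    except Exception:
        return False

class FakeRedis:
    """Dict-backed stand-in for a Redis client, for running this sketch offline."""
    def __init__(self):
        self.store = {}
    def set(self, key, value):
        self.store[key] = value
    def get(self, key):
        return self.store.get(key)

print(redis_alive(FakeRedis()))   # → True
```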