Geospatial application: MySQL vs CouchDB vs others - mysql

I am developing an application on Google Maps and am evaluating options for storing and retrieving spatial information within a bounding box.
Initially I thought MySQL was not a good option, but after checking http://dev.mysql.com/doc/refman/5.6/en/spatial-analysis-functions.html and http://code.google.com/apis/maps/articles/phpsqlsearch.html, it looks like I can use MySQL and that it supports my use cases.
I was also evaluating Node.js and CouchDB with GeoCouch. With modules like socket.io, geo, etc., this also looks like a good choice. Check out the book "Getting Started with GEO, CouchDB, and Node.js". My application would be a single-page application, and I do not foresee needing an RDBMS anytime in the future.
I have also seen this: http://nodeguide.com/convincing_the_boss.html, and it makes me a little apprehensive about whether to go with Node.js and GeoCouch:
If the architecture for your next apps reads like the cookbook of
NoSQL ingredients, please pause for a second and read this.
Yes, Redis, CouchDB, MongoDB, Riak, Casandra, etc. all look really
tempting, but so did that red apple Eve couldn't resist. If you're
already taking a technological risk with using node.js, you shouldn't
multiply it with more technology you probably don't fully understand
yet.
Sure, there are legitimate use cases for choosing a document oriented
database. But if you are trying to build a business on top of your
software, sticking to conservative database technology (like postgres
or mysql) might just outweigh the benefits of satisfying your inner
nerd and impressing your friends.
What is your opinion?

GeoCouch sounds like a good solution in your case. If you want an easy installation, you can have a look at Couchbase Single Server, which is basically CouchDB with GeoCouch included (check out the Developer Preview for 2.0).
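Whichever store is chosen, the core query in the question is "give me everything inside the visible map rectangle". The sketch below shows that test in plain Python, only to make the logic concrete; the `Place` type and function names are made up for illustration, and a real database would answer the same question with a spatial index rather than a linear scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    """Hypothetical record type: a named point in decimal degrees."""
    name: str
    lat: float
    lng: float


def in_bounding_box(place, south, west, north, east):
    """Return True if the place lies inside the box given by its SW and NE corners."""
    if not (south <= place.lat <= north):
        return False
    if west <= east:
        return west <= place.lng <= east
    # The box crosses the 180th meridian (e.g. west=170, east=-170).
    return place.lng >= west or place.lng <= east


def places_in_view(places, south, west, north, east):
    """Linear scan; stands in for an indexed bounding-box query."""
    return [p for p in places if in_bounding_box(p, south, west, north, east)]


places = [
    Place("Berlin", 52.52, 13.405),
    Place("Paris", 48.8566, 2.3522),
    Place("Suva", -18.1416, 178.4419),
]
print([p.name for p in places_in_view(places, 45, 0, 55, 20)])
```

The corner ordering (south, west, north, east) is an assumption of this sketch; map APIs and databases each have their own convention, so check the one you use.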

Related

MySQL: which API to use?

I'm just getting started with interfacing to MySQL from a C++ app. The app is pretty simple: it's a Linux web server, and the C++ code retrieves JavaScript from a local database to return to the client via Apache and Ajax. The database will contain no more than a few thousand short JavaScript programs.
Question: any advice on which API I should use? I'm just reading through the docs on dev.mysql.com, and there doesn't seem to be any good reason to choose one or other of libmysql, Connector/C, Connector/C++, MySQL++, or Connector/ODBC. Thanks.
With no more than a few thousand rows, chances are you should pick your API based on your language preferences, not the other way round, so go ahead and choose whatever fits your mood.
If your app's performance stands or falls with the performance differences between the MySQL connectors, you should be quite busy fixing your design elsewhere.
I personally prefer portability, so I tend to use ODBC a lot and accept the small performance hit, but others might think differently. If you never ever want to use a different RDBMS, stay away from ODBC: without the portability benefit it's quite ugly.
I would just use the raw C API. Seems to be the simplest way with the least overhead.

Is RavenDB just a frontend for Access?

I've started using Raven for my last project. When my boss learned about it, he mentioned it's based on Access and he had very bad experience with multiple users and Access. Now I have to either switch or prove to him he is wrong.
No, it isn't. The confusion is because RavenDB can use ESENT for data storage and ESENT used to be called Jet Blue. It was called Jet Blue because it was originally developed to replace the Jet Red engine which was/is used in Access. The Wikipedia entry is quite accurate about the history and differences.
Laurion's answer is correct, but I also wanted to point out that in Raven you can swap out the ESENT storage engine for another that Oren developed called Munin.
From Ayende's blog post about Munin.
Raven.Munin is the actual implementation of a low level managed storage for RavenDB. I split it out of the RavenDB project because I intend to make use of it in additional projects.
At its core, Munin provides high performance transactional, non relational, data store written completely in managed code. The main point in writing it was to support the managed storage in RavenDB, but it is going to be used for Raven MQ as well, and probably a bunch of other stuff as well. I’ll post about Raven MQ in the future, so don’t bother asking about it.
Munin is a low level api, not something that you are likely to use directly. And it was explicitly modeled to give me an interface similar in capability to what Esent gives me, but in purely managed code.

Best open source, extendable crawler to use for image crawling

We are in the starting phase of a project, and we are currently wondering
which crawler is the best choice for us.
Our project:
Basically, we're going to set up Hadoop and crawl the web for images.
We will then run our own indexing software on the images stored in HDFS
based on the Map/Reduce facility in Hadoop. We will not use other indexing
than our own.
Some particular questions:
Which crawler will handle crawling for images best?
Which crawler will best adapt to a distributed crawling system, in which we
use many servers conducting crawling together?
Right now these look like the 3 best options-
Nutch: Known to scale. Doesn't look like the best option because it seems to be tied closely to its text-searching software.
Heritrix: Also scales. This one currently looks like the best option.
Scrapy: Has not been used on a large scale (not sure though). I don't know if it has the basic stuff like URL canonicalization. I would like to use this one because it is a Python framework (I like Python more than Java), but I don't know if it implements the advanced features of a web crawler.
Summary:
We need to get as many images as possible from the web. Which existing crawling framework is both scalable and efficient, but also the easiest to modify to fetch only images?
Thanks!
http://lucene.apache.org/nutch/
I would think going with something with the broadest use and support (community support) would be the better approach.
Nutch may be a good option because you want to end up on HDFS. It may be useful to look into the HBase integration that is currently in the works (NUTCH-650).
You may be able to get the data you need by skipping the index step at the end and instead look at the segments themselves.
However for flexibility another option may be Droids: http://incubator.apache.org/droids/. It's still in the incubator phase at apache, but worth looking at.
You may get some ideas by looking at the SimpleRuntime example in the org.apache.droids.examples. Perhaps by replacing the Sysout handler with one that stores the images onto HDFS that may give you what you want.
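The question lists URL canonicalization as one of the "basic" crawler features. For readers unsure what that involves, here is a minimal sketch using only Python's standard library. It is not how Nutch, Heritrix, or Scrapy implement it; the `canonicalize` function and its rule set (lowercase host, drop default ports, sort query parameters, drop fragments) are illustrative assumptions, and real crawlers apply more rules than this.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url):
    """Map equivalent spellings of a URL to one form so a crawler fetches it once."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # Sort query parameters so ?b=2&a=1 and ?a=1&b=2 compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Fragments never reach the server, so drop them.
    return urlunsplit((scheme, netloc, path, query, ""))


seen = set()
for link in ["HTTP://Example.COM:80/a/b.jpg?b=2&a=1#top",
             "http://example.com/a/b.jpg?a=1&b=2"]:
    seen.add(canonicalize(link))
print(seen)
```

This sketch ignores user-info, IPv6 literals, and path normalization such as `..` segments; it is meant only to show why a crawl frontier keyed on the canonical form avoids fetching the same image twice.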

What is a Well Documented, Stable, Secure, and Scalable Web Application Framework?

We are building a RESTful API for our company, which will provide XML, JSON, and potentially other content types.
My team is looking to find a framework which is (In order of priority):
Well Documented
Ideally with good tutorials, and a thriving community and knowledgebase
Follows rational design patterns
Mostly we want consistency in the framework. Naming conventions that don't change based upon which method call you're calling.
Secure
Focused on forcing the developer to perform some form of validation of the GET, POST, PUT and DELETE Variables
Stable
Part of this is maturity, in the sense that the framework isn't changing too often
The other part is a well documented bug list which isn't scarily huge
Scalable/Performance Oriented
We have over 50K users around the world who require high availability. If our app goes down, people do not have internet in their homes. So it's a highly critical environment.
Ideally we could launch the same codebase on 10 servers and just keep adding loadbalancers. We don't want to have to define which server is on which methods....
Integrates well with a Linux/MySQL Environment
We don't have a single MS server. We're not changing that. Sorry .Net fans :-D
I realize this is a nebulous goal. There will not be any one framework that meets all of these needs; in fact, there will probably be many that meet them in varying ways, shapes and forms.
This is language independent. We already have experience in PHP, but we also have developers who have never written a web application in their life, so learning Python or Ruby or Java is acceptable.
I'll go out on a limb here and suggest Ruby with Sinatra.
Why?
Sinatra isn't "well documented" but is "documented well". Since it is much simpler than other frameworks, it doesn't need as much documentation, and since it is built on Rack it shares some common documentation with that. What you need to know is on the website, and it's well written and contains no errors that I've found (i.e., it's all up to date).
Most of what you need to know is in the Sinatra Book, the Readme, and the FAQ. Despite the work-in-progress nature of the book its contents are very much accurate and useful. And, if you are still stuck with questions, drop by the IRC chat room freenode.net#sinatra.
Sinatra is capable of being used in a functional/route-based logic method, or by overriding the Sinatra::Application object. You can use either, split your logic and methods into various files, or keep it all in one. It's all up to you.
Sinatra is, of itself, secure. You MUST validate all variables sent by the user, because aside from parsing them and passing them to you, Sinatra doesn't care how valid it is. Therefore, you either enforce validity of your variables or you regret it. ;-)
Sinatra hasn't changed a bunch in the last four months, but it certainly has had maintenance and minor updates. In addition, I've not found the bug list to be large or threatening. It's got virtually everything I need already to build my apps with.
Sinatra doesn't have to be deployed with Passenger, but can easily be custom tailored to be fast. If you use things like Enterprise Ruby and Thin, you could run it behind either Nginx or lighttpd as a proxy. If you took two servers, you could make one the primary (with the proxy and a number of threads) and the second the database server (with MySQL and a number of threads) and let them loose. This way the tasks are spread across the servers. It'll give you more control than I think Passenger would. (Not to mention better performance.)
I find Passenger (on Dreamhost) to give relatively poor performance when compared against running threads by either Rack, Mongrel, or Thin. That said, once loaded the applications are responsive even in that environment. If I were to predict it, you'd not have a problem with scaling the application as you'd simply have to redeploy your code and restart the threads–nothing that can't be put into Capistrano.
Ruby on Linux is fast and isn't a problem to implement. MySQL with Ruby is easy enough, and there are several really good ORM packages available like ActiveRecord and Sequel. Sinatra won't make you choose one that you hate.
In addition to the answers to your questions, I have a few more reasons.
Sinatra has an easy learning curve, and is very easy to pick up. The biggest problem I had was getting it onto my Dreamhost server since Rack was an older version, but with a vendored version of Rack the problem vanished. If I could, I'd rewrite my latest Rails project in Sinatra with ActiveRecord so as to make maintenance easy upon myself; too much effort was spent in it already.
Thanks to its ease of use and ease of learning, I have found myself more productive in Sinatra without code generators than in Rails with all the code generators. And that's saying something.
Sinatra supports middleware for Rack, and is therefore very flexible in what you can do with it.
If I were to average out the helpfulness of the Sinatra community, on IRC, I'd say that they're more knowledgeable about the framework than the average Rails user–just as a cursory comparison. The reason being that Rails is more accessible to newbies and people who just have no business programming.
Sinatra will support Ruby 1.9. I'm still not entirely certain just how much support for 1.9 there is currently in Sinatra, but I do know they were initially waiting on Rack. As of April 25 this is no longer an issue, so presumably Sinatra is already prepared for 1.9; I know for a fact 1.9 support is in the pipeline for mid 2009, but I don't know how long that will be.
Assuming you can get Sinatra working with Ruby 1.9 with a little effort (version 0.9.2 already supports Rack 1.0, and by proxy 1.9 in Rack's code), before the public 1.0 with support for 1.9, your performance on the Ruby side would be stellar. Even if you cannot, then Enterprise Ruby would help the speed.
Both Django and Rails come pretty close to fitting most of your criteria, except I think that Django's documentation is way better than that of Ruby on Rails'; the documentation for Django is nothing short of amazing (and I'm not being hyperbolic here).
I don't know about the scalability of Django, though. I know Rails scales pretty well (up to a point), but I don't know if the same can be said of Django. (I'm not saying it can't; I'm just saying that I honestly don't know, as I've never written a large application using Django.)
Django also has a pony, in case you secretly desire that, too.
Well, scalability is not easy to get. For Google-like response times, you need something like MapReduce. Don't kid yourself: super-scalability is not for beginners.
As for all the other points, Seaside is clearly best. As for security, check out seaside.st to see why it is inherently more secure than all the other frameworks I am aware of (including Rails and Seam, e.g.). Seaside is reasonably well-documented, but also looking at the internals of seaside is so easy and convenient, that hardly a question remains open for the community to answer, which it does fast usually. Seaside has been stable for many years now, so I think you'll be fine with that.
As for Performance oriented: Run the commercial Seaside, GLASS, and you will get stunning performance compared to a LAMP-like setup, due to the much faster database solution that is integrated, and the framework which trades memory for speed and gets a lot of speed.
Seaside is architectured so well that many people find writing Seaside apps easier than desktop applications. Try it out, you'll love it.
PS: For the record, Seaside is not RESTful.
You can have a look at Django, a Python framework.
It's a very well documented framework, it has an automatic CRUD administrator interface on database and it has also a free book online, that of course you can buy for real :)
Try them all to find out the correct answer!
Well, the people who will be suggesting 'one framework to rule them all' won't have tried them all either!
I guess if there were such a framework, it would be one and only.
For PHP, I've loved the Zend Framework (though, to me, it is not really a framework). One of its best features is that each component is independent of the others, so if there is some part of it you don't like, just don't use it. Also, you mention JSON: Zend fully supports JSON in both directions.
Ruby on Rails is vastly documented with loads of plugins and has been tested in scalability already ( see BaseCamp and other solutions made in rails)
Looking at your list of priorities, it's hard to say that any one route is the "right" way to go. On the PHP side I've spent a significant amount of time with CakePHP, which accomplishes much of what you are looking for. But being a guy who hates PHP, I would suggest steering clear of anything in that realm.
It's all about style and experience. I've used Ruby On Rails, which isn't the most elegant of languages but it does the job exceptionally well. It hadn't matured as much as using a Spring/Hibernate stack on Java or using .Net which handles almost everything straight out of the box, but it does the job exceptionally well. I prefer the Java/.Net based projects because it fits much better with the way I like to program.
There is no "right" answer, just lots of good ones. ASP.Net MVC for example is a good choice. Forever ago I used Spring on Java which was also fairly effective at accomplishing the job. Even PHP is not a wrong choice. Ruby On Rails, which I have only done two projects with, is very easy to pick up and it makes some rather complicated tasks in other languages fairly simple.
I think that for sheer volume of documentation you can't beat J2EE. It's also believed to be insanely scalable and stable.
Now, from there to really being desirable....
If you're considering Java, I would recommend Jersey. It works great, and I think it meets all five of your goals.
If Java is in your toolkit, look at Stripes.
Rock stable, with an enthusiastic, though not spectacularly large, community. Good docs, some out of date granted, but the system is so stable that even the "old stuff" is relevant. A really nice, recent (late last year) book. Stripes is small enough that the book can, and does, "cover everything".
It's an action framework, doesn't do much in the presentation area (save for forms, mostly, and it has a completely optional templating/layout facility). You can use JSP or FreeMarker, or, really, anything else. It can also do web services (though not as well as something like Jersey).
It is back end agnostic, but there is a JPA integration project for it.
Finally, you can leverage, if you like, all of the other Java/Java EE kit if you want. Since Stripes doesn't consume the entire stack, you have a lot of flexibility to pick and choose the parts you want. Full boat Java EE, Transactions, Session Beans, JMS. Works with Spring (it is "conscious" of Spring and has good integration) JPA, iBatis, Hibernate, raw JDBC, Lucene, JSR-170 Content Repository, whatever.
It's a great piece of kit.
For a 2014 answer, I would recommend Laravel/Slim Framework (PHP), Ruby on Rails/Sinatra (Ruby), Django/Flask (Python), Grails (Groovy, JVM-based language), Play! Framework (Java/Scala) or Sails.js/Kraken.js (Javascript).
Where the first framework mentioned is a bit larger and the second is a bit smaller for the languages where I mention 2 frameworks with the use of a "/".
I hope this helps people that have similar questions 5 years later.
Try CppCMS.
It is a high-performance web development framework.

Is there a business proven cloud store / Key=>Value Database? (Open Source) [closed]

I have been looking for cloud computing / storage solutions for a long time (inspired by Google Bigtable). But I can't find an easy-to-use, business-ready solution.
I'm searching for a simple, fault-tolerant, distributed Key=>Value DB like SimpleDB from Amazon.
I've seen things like:
The CouchDB Project : Simple and distributed, fault-tolerant Database. But it understands only JSON. No XML connectors etc.
Eucalyptus : Nice Amazon EC2 interfaces. Open Standards & XML. But less distributed and less fault-tolerant? There are also a lot of open tickets with XEN/VMWare issues.
Cloudstore / Kosmosfs : Nice distributed, fault tolerant fs. But it's hard to configure. Are there any java connectors?
Apache Hadoop : Nice system which does much more than just store data. Uses its own Hadoop Distributed File System and has been tested on clusters with 2000 nodes.
Amazon SimpleDB : Can't find an open-source alternative! It's a nice but expensive system for huge amounts of data. And you're dependent on Amazon.
Are there other, better solutions out there? Which one is the best to choose? Which one has the fewest SPOFs (Single Points of Failure)?
How about memcached?
The High Scalability blog covers this issue; if there's an open source solution for what you're after, it'll surely be there.
Other projects include:
Project Voldemort
Lightcloud - Key-Value Database
Ringo - Distributed key-value storage for immutable data
Another good list: Anti-RDBMS: A list of distributed key-value stores
MongoDB is another option which is very similar to CouchDB, but uses a query language very similar to SQL instead of map/reduce in JavaScript. It also supports indexes, query profiling, replication and storage of binary data.
It has a huge amount of documentation, which might be overwhelming at first, so I would suggest starting with the Developer's tour.
Wikipedia says that Yahoo both contributes to Hadoop and uses it in production (article linked from Wikipedia). So I'd say it counts as business-proven, although I'm not sure whether it counts as a K/V database.
Not on your list is the Friendfeed system of using MySQL as a simple schema-less key/value store.
It's hard for me to understand your priorities. CouchDB is simple, fault-tolerant, and distributed, but somehow you exclude it because it doesn't have XML. Are XML and Java connectors an unstated requirement?
(Anyway, CouchDB should in fact be excluded because it's young, its API isn't stable, and it's not a key-value store.)
I use Google's Google Base api, it's Xml, free, documented, cloud based, and has connectors for many languages. I think it will fill your bill if you want free hosting too.
Now if you want to host your own servers, Tokyo Cabinet is your answer. It's key=>value based, uses flat files, and is the fastest database out there right now (very barebones compared to, say, Oracle, but incredibly good at storing and accessing data: about 1 million records per second, with about 10 bytes of overhead, depending on the storage engine). As for business readiness, Tokyo Cabinet is the heart of a service called Mixi, which is the equivalent of Japan's Facebook+MyPage, with several million heavy users, so it's actually very battle proven.
If you want something like Bigtable, you can't go past HBase or Hypertable - they're both open-source Bigtable clones. One thing to consider, though, is if your requirements really are 'big enough' for Bigtable. It scales up to thousands of tablet servers, and as such, has quite a bit of infrastructure under it to enable that (for example, handling the expectation of regular node failures).
If you don't anticipate growing to, at the very least, tens of tablet servers, you might want to consider one of the proposed alternatives: you can't beat Berkeley DB for simplicity, or MySQL for ubiquity. If all you need is a key/value datastore, you can put a simple 'dict' wrapper around your database interface, and switch out your backend if you outgrow one.
You might want to look at Hypertable, which is modeled after Google's Bigtable.
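To make the 'dict' wrapper idea concrete, here is a minimal sketch in Python. It uses the standard-library `sqlite3` module purely as a stand-in backend (the answer mentions Berkeley DB and MySQL); the `SqliteStore` class and its `kv` table are hypothetical names chosen for this example.

```python
import sqlite3
from collections.abc import MutableMapping


class SqliteStore(MutableMapping):
    """Dict-like key/value store; callers only ever see the mapping interface."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def __getitem__(self, key):
        row = self._db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        self._db.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )
        self._db.commit()

    def __delitem__(self, key):
        cur = self._db.execute("DELETE FROM kv WHERE k = ?", (key,))
        self._db.commit()
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        rows = self._db.execute("SELECT k FROM kv ORDER BY k").fetchall()
        return iter([k for (k,) in rows])

    def __len__(self):
        return self._db.execute("SELECT COUNT(*) FROM kv").fetchone()[0]


store = SqliteStore()
store["user:1"] = "alice"
print(store["user:1"], len(store), "user:2" in store)
```

Because application code depends only on the mapping interface (`store[key]`, `in`, `len`, `.get`), a later move to another backend would mean writing one more class with the same five methods rather than touching the callers.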
Use CouchDB.
What's wrong with JSON?
Converting JSON to XML is trivial.
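As a rough illustration of how small such a conversion can be, here is a sketch using only Python's standard library. The mapping it uses (object keys become element names, array entries become `<item>` elements) is just one possible convention chosen for this example, not a standard, and it assumes every JSON key is a valid XML element name.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(value, tag="root"):
    """Recursively convert a decoded JSON value into an XML element."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(child, key))
    elif isinstance(value, list):
        for child in value:
            elem.append(to_xml(child, "item"))
    elif value is not None:
        # Booleans are written the JSON way (true/false), not Python's True/False.
        elem.text = json.dumps(value) if isinstance(value, bool) else str(value)
    return elem


def json_to_xml(text, root_tag="root"):
    """Parse a JSON document and serialize it as an XML string."""
    return ET.tostring(to_xml(json.loads(text), root_tag), encoding="unicode")


print(json_to_xml('{"name": "doc", "tags": ["a", "b"], "ok": true}', "doc"))
```

Going the other way, or preserving types and attributes faithfully, needs more conventions than this, which is where "trivial" stops being entirely true.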
You might want to take a look at this (using MySQL as key-value store):
http://bret.appspot.com/entry/how-friendfeed-uses-mysql
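The general idea behind using a relational database as a schema-less store can be sketched as follows: keep each entity as an opaque serialized blob in one table, and maintain small separate index tables for the fields you need to query. This is only loosely modeled on that idea; the table names, the JSON encoding, and the use of `sqlite3` instead of MySQL are all assumptions of this sketch, and the real system may differ in its details.

```python
import json
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
# One table holds every entity as an opaque JSON body.
db.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
# A separate table indexes the one field we want to query by.
db.execute(
    "CREATE TABLE index_user_id ("
    " user_id TEXT NOT NULL, entity_id TEXT NOT NULL,"
    " PRIMARY KEY (user_id, entity_id))"
)


def save(entity):
    """Insert or overwrite an entity and refresh its index row."""
    entity = dict(entity)
    entity.setdefault("id", uuid.uuid4().hex)
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO entities VALUES (?, ?)",
            (entity["id"], json.dumps(entity)),
        )
        db.execute("DELETE FROM index_user_id WHERE entity_id = ?", (entity["id"],))
        if "user_id" in entity:
            db.execute(
                "INSERT INTO index_user_id VALUES (?, ?)",
                (entity["user_id"], entity["id"]),
            )
    return entity["id"]


def find_by_user(user_id):
    """Look up entities through the index table, then decode their bodies."""
    rows = db.execute(
        "SELECT e.body FROM index_user_id i"
        " JOIN entities e ON e.id = i.entity_id"
        " WHERE i.user_id = ? ORDER BY e.id",
        (user_id,),
    ).fetchall()
    return [json.loads(body) for (body,) in rows]


first_id = save({"user_id": "u1", "title": "first"})
print(find_by_user("u1"))
```

Adding a new field needs no schema change because the body is opaque; only a field you want to search by needs its own index table.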
Cloudera is a company that commercializes Apache Hadoop, with some value-add of course, like productization, configuration, training & support services.
Instead of looking for something inspired by Google's Bigtable, why not just use Bigtable directly? You could write a front-end on Google App Engine.
A good compilation of storage tools relevant to your question:
http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/
Tokyo Cabinet has also received some attention, as it supports table schemas, key-value pairs and hash tables. It uses Lua as an embedded scripting platform and HTTP as its communication protocol. Here is a great demonstration.