MySQL: which API to use?

I'm just getting started with interfacing to MySQL from a C++ app. The app is pretty simple: it's a Linux web server, and the C++ code retrieves JavaScript from a local database to return to the client via Apache and Ajax. The database will contain no more than a few thousand short JavaScript programs.
Question: any advice on which API I should use? I'm just reading through the docs on dev.mysql.com, and there doesn't seem to be any compelling reason to choose one over another of libmysql, Connector/C, Connector/C++, MySQL++, or Connector/ODBC. Thanks.

With no more than a few thousand rows, chances are you should pick your API based on your language preferences, not the other way round - so go ahead and choose whatever fits your mood.
If your app's performance stands or falls with the performance differences between the MySQL connectors, you should be quite busy fixing your design elsewhere.
I personally prefer portability, so I tend to use ODBC a lot, accepting the small performance hit, but others might think differently. If you never ever want to use a different RDBMS, stay away from ODBC - without the portability benefit it's quite ugly.

I would just use the raw C API. Seems to be the simplest way with the least overhead.
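For reference, a minimal sketch of the C API flow (untested; the credentials, database name, and the scripts table are made-up placeholders, and real code should escape or parameterize anything user-supplied):

    #include <mysql/mysql.h>
    #include <cstdio>

    int main() {
        MYSQL *conn = mysql_init(nullptr);
        // Placeholder host, credentials, and database name.
        if (!mysql_real_connect(conn, "localhost", "webuser", "secret",
                                "jsstore", 0, nullptr, 0)) {
            std::fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
            return 1;
        }
        // Hypothetical table holding the stored JavaScript programs.
        if (mysql_query(conn, "SELECT body FROM scripts WHERE id = 42") == 0) {
            MYSQL_RES *result = mysql_store_result(conn);
            while (MYSQL_ROW row = mysql_fetch_row(result))
                std::printf("%s\n", row[0]);
            mysql_free_result(result);
        }
        mysql_close(conn);
        return 0;
    }

The other options (Connector/C++, MySQL++, the ODBC driver) are largely wrappers around this handful of calls.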

Related

geo spatial application: mySql vs CouchDB vs others

I am developing an application on Google Maps and checking out various options to store and retrieve spatial information within a bounding box.
Initially I thought MySQL was not a good option, but after checking http://dev.mysql.com/doc/refman/5.6/en/spatial-analysis-functions.html and http://code.google.com/apis/maps/articles/phpsqlsearch.html, it looks like I can use MySQL and it does support my use cases.
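For example, a bounding-box lookup against a SPATIAL-indexed POINT column looks roughly like this (sketched as a C++ string constant; the places table and loc column are invented, and note that in MySQL 5.6 a SPATIAL index still requires MyISAM):

    // MBRContains() is a cheap minimum-bounding-rectangle test that can
    // use the spatial index; the polygon below is a made-up NYC-ish box.
    const char *bbox_sql =
        "SELECT id, name FROM places "
        "WHERE MBRContains("
        "  ST_GeomFromText('POLYGON((-74.1 40.6, -73.7 40.6, -73.7 40.9, "
        "                            -74.1 40.9, -74.1 40.6))'), loc)";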
I was also evaluating node.js and CouchDB with GeoCouch. With modules like socket.io, geo, etc., this also looks like a good choice; check out the book "Getting Started with GEO, CouchDB, and Node.js". My application would be a one-page application, and I do not foresee needing an RDBMS anytime in the future.
I have also seen this - http://nodeguide.com/convincing_the_boss.html - and it makes me a little apprehensive about whether to go with node.js and GeoCouch:
If the architecture for your next apps reads like the cookbook of NoSQL ingredients, please pause for a second and read this.
Yes, Redis, CouchDB, MongoDB, Riak, Cassandra, etc. all look really tempting, but so did that red apple Eve couldn't resist. If you're already taking a technological risk with using node.js, you shouldn't multiply it with more technology you probably don't fully understand yet.
Sure, there are legitimate use cases for choosing a document oriented database. But if you are trying to build a business on top of your software, sticking to conservative database technology (like postgres or mysql) might just outweigh the benefits of satisfying your inner nerd and impressing your friends.
What is your opinion?
GeoCouch sounds like a good solution in your case. If you want an easy installation, have a look at Couchbase Single Server, which is basically CouchDB with GeoCouch included (check out the Developer Preview for 2.0).

Is it possible to do asynchronous / parallel database query in a Django application?

I have web pages that take 10-20 database queries to get all the required data.
Normally, after a query is sent out, the Django thread/process blocks waiting for the results to come back, and only then resumes execution until it reaches the next query.
Is there any way to issue all the queries asynchronously so that they can be processed by the database server(s) in parallel?
I'm using MySQL but would like to hear about solutions for other databases too. For example, I've heard that PostgreSQL has an async client library - how would I use it in this case?
This very recent blog entry seems to imply that it isn't built into either the Django or Rails frameworks. I think it covers the issue well and is well worth a read, along with the comments.
http://www.eflorenzano.com/blog/post/how-do-we-kick-our-synchronous-addiction/ (broken link)
I think I remember Cal Henderson mentioning this deficiency somewhere in his excellent speech http://www.youtube.com/watch?v=i6Fr65PFqfk
My naive guess is that you might be able to hack something together with separate Python libraries, but you would lose a lot of the ORM/template lazy-evaluation machinery Django gives you, to the point where you might as well be using another stack. Then again, if you are only optimizing a few views in a large Django project, it might be fine.
I had a similar problem and I solved it with JavaScript/Ajax.
Just load the template with basic markup and then make several Ajax requests to execute the queries and load the data. You can even show a loading animation. The user will get a web 2.0 feel instead of just gloomy page loading. Of course, this means several more HTTP requests per page, but it's up to you to decide.
Here is how my example looks: http://artiox.lv/en/search?query=test&where_to_search=all (broken link)
Try Celery. There's a bit of overhead in having to run an AMQP server, but it might do what you want. Not sure about the concurrency of the DB, though. Also, if you want speed for your DB, I'd recommend MongoDB (but you'll need django-nonrel for that).

Best open source, extendable crawler to use for image crawling

We are in the starting phase of a project, and we are currently wondering which crawler is the best choice for us.
Our project:
Basically, we're going to set up Hadoop and crawl the web for images. We will then run our own indexing software on the images stored in HDFS, based on the Map/Reduce facility in Hadoop. We will not use any indexing other than our own.
Some particular questions:
Which crawler will handle crawling for images best?
Which crawler will best adapt to a distributed crawling system, in which we use many servers conducting crawling together?
Right now these look like the three best options:
Nutch: Known to scale. Doesn't look like the best option because it seems to be tied closely to their text-search software.
Heritrix: Also scales. This one currently looks like the best option.
Scrapy: Has not been used on a large scale (not sure, though). I don't know if it has the basic stuff like URL canonicalization. I would like to use this one because it is a Python framework (I like Python more than Java), but I don't know if they have implemented the advanced features of a web crawler.
Summary:
We need to get as many images as possible from the web. Which existing crawling framework is both scalable and efficient, and also the easiest to modify to fetch only images?
Thanks!
http://lucene.apache.org/nutch/
I would think going with something with the broadest use and support (community support) would be the better approach.
Nutch may be a good option because you want to end up on HDFS. It may be useful to look into the HBase integration that is currently in the works (NUTCH-650).
You may be able to get the data you need by skipping the index step at the end and instead looking at the segments themselves.
However, for flexibility another option may be Droids: http://incubator.apache.org/droids/. It's still in the incubator phase at Apache, but worth looking at.
You may get some ideas by looking at the SimpleRuntime example in org.apache.droids.examples. Perhaps replacing the Sysout handler with one that stores the images in HDFS would give you what you want.

What is a Well Documented, Stable, Secure, and Scalable Web Application Framework?

We are building a RESTful API for our company, which will provide XML, JSON, and potentially other content types.
My team is looking for a framework which is (in order of priority):
Well Documented
Ideally with good tutorials, and a thriving community and knowledge base
Follows rational design patterns
Mostly we want consistency in the framework: naming conventions that don't change based upon which method you're calling.
Secure
Focused on forcing the developer to perform some form of validation of the GET, POST, PUT, and DELETE variables
Stable
Part of this is maturity, in the sense that the framework isn't changing too often
The other part is a well-documented bug list which isn't scarily huge
Scalable/Performance Oriented
We have over 50K users who require high availability all around the world. If our app goes down, people do not have internet in their homes, so it's a highly critical environment.
Ideally we could launch the same codebase on 10 servers and just keep adding load balancers. We don't want to have to define which server handles which methods.
Integrates well with a Linux/MySQL Environment
We don't have a single MS server. We're not changing that. Sorry .Net fans :-D
I realize this is a nebulous goal. There will not be any one framework that meets all of these needs; in fact, there will probably be many that meet them in varying ways, shapes, and forms.
This is language independent. We already have experience in PHP, but we also have developers who have never written a web application in their life, so learning Python or Ruby or Java is acceptable.
I'll go out on a limb here and suggest Ruby with Sinatra.
Why?
Sinatra isn't "well documented" but is "documented well". Considering that it is much simpler than other frameworks, there needn't be quite so much documentation, and since it is built on Rack as a webserver it shares some common documentation with that. But what you need to know is on the website, and it's well written and contains no errors that I've found (i.e., it's all up to date).
Most of what you need to know is in the Sinatra Book, the Readme, and the FAQ. Despite the work-in-progress nature of the book its contents are very much accurate and useful. And, if you are still stuck with questions, drop by the IRC chat room freenode.net#sinatra.
Sinatra is capable of being used in a functional/route-based logic method, or by overriding the Sinatra::Application object. You can use either, split your logic and methods into various files, or keep it all in one. It's all up to you.
Sinatra is, of itself, secure. You MUST validate all variables sent by the user, because aside from parsing them and passing them to you, Sinatra doesn't care how valid they are. Therefore, you either enforce the validity of your variables or you regret it. ;-)
Sinatra hasn't changed a bunch in the last four months, but it certainly has had maintenance and minor updates. In addition, I've not found the bug list to be large or threatening. It's got virtually everything I need already to build my apps with.
Sinatra doesn't have to be deployed with Passenger, and can easily be tailored to be fast. If you use things like Enterprise Ruby and Thin you could proxy through either Nginx or lighttpd. If you took two servers you could make one the primary (with the proxy and a number of threads) and the second the database server (with MySQL and a number of threads) and let them loose. This way the tasks are spread across the servers. It'll give you more control than I think Passenger would. (Not to mention better performance.)
I find Passenger (on Dreamhost) to give relatively poor performance when compared against running threads by either Rack, Mongrel, or Thin. That said, once loaded the applications are responsive even in that environment. If I were to predict it, you'd not have a problem with scaling the application as you'd simply have to redeploy your code and restart the threads–nothing that can't be put into Capistrano.
Ruby on Linux is fast and isn't a problem to implement. MySQL with Ruby is easy enough, and there are several really good ORM packages available like ActiveRecord and Sequel. Sinatra won't make you choose one that you hate.
In addition to the answers to your questions, I have a few more reasons.
Sinatra has an easy learning curve, and is very easy to pick up. The biggest problem I had was getting it onto my Dreamhost server since Rack was an older version, but with a vendored version of Rack the problem vanished. If I could, I'd rewrite my latest Rails project in Sinatra with ActiveRecord so as to make maintenance easy upon myself; too much effort was spent in it already.
Thanks to its ease of use and ease of learning, I have found myself more productive in Sinatra without code generators than in Rails with all the code generators. And that's saying something.
Sinatra supports middleware for Rack, and is therefore very flexible in what you can do with it.
If I were to average out the helpfulness of the Sinatra community, on IRC, I'd say that they're more knowledgeable about the framework than the average Rails user–just as a cursory comparison. The reason being that Rails is more accessible to newbies and people who just have no business programming.
Sinatra will support Ruby 1.9. I'm still not entirely certain just how much support for 1.9 there is currently in Sinatra, but I do know they were initially waiting on Rack. As of April 25 this is no longer an issue, so presumably Sinatra is already prepared for 1.9; I know for a fact 1.9 support is in the pipeline for mid-2009, but I don't know how long that will take.
Assuming you can get Sinatra working with Ruby 1.9 with a little effort (version 0.9.2 already supports Rack 1.0, and by proxy 1.9 in Rack's code), before the public 1.0 with support for 1.9, your performance on the Ruby side would be stellar. Even if you cannot, then Enterprise Ruby would help the speed.
Both Django and Rails come pretty close to fitting most of your criteria, except I think that Django's documentation is way better than that of Ruby on Rails'; the documentation for Django is nothing short of amazing (and I'm not being hyperbolic here).
I don't know about the scalability of Django, though. I know Rails scales pretty well (up to a point), but I don't know if the same can be said of Django. (I'm not saying it can't; I'm just saying that I honestly don't know, as I've never written a large application using Django.)
Django also has a pony, in case you secretly desire that, too.
Well, scalability is not easy to get. For Google-like response times, you need something like MapReduce. OK, don't kid yourself: super-scalability is not for beginners.
As for all the other points, Seaside is clearly best. As for security, check out seaside.st to see why it is inherently more secure than all the other frameworks I am aware of (including Rails and Seam, for example). Seaside is reasonably well documented, and looking at the internals of Seaside is so easy and convenient that hardly a question remains open for the community to answer, which it usually does quickly anyway. Seaside has been stable for many years now, so I think you'll be fine with that.
As for being performance-oriented: run the commercial Seaside, GLASS, and you will get stunning performance compared to a LAMP-like setup, due to the much faster integrated database solution and a framework which trades memory for speed, and gains a lot of it.
Seaside is architected so well that many people find writing Seaside apps easier than desktop applications. Try it out, you'll love it.
PS: For the record, Seaside is not RESTful.
You can have a look at Django, a Python framework.
It's a very well-documented framework; it has an automatic CRUD administration interface for the database, and it also has a free book online, which of course you can buy in print :)
Try them all to find out the correct answer!
Well, the people who will be suggesting 'one framework to rule them all' won't have tried them all either!
I guess if there were such a framework, it would be the one and only.
For PHP, I've loved the Zend Framework (though, to me, it is not really a framework). One of its best features is that each component is independent of the others, so if there is some part of it you don't like, just don't use it. Also, you mention JSON: Zend fully supports JSON in both directions.
Ruby on Rails is extensively documented, with loads of plugins, and has already been tested for scalability (see Basecamp and other solutions made in Rails).
Looking at your list of priorities, it's hard to say that any one route is the "right" way to go. On the PHP side I've spent a significant amount of time with CakePHP, which accomplishes much of what you're looking for. But being a guy who hates PHP, I would suggest steering clear of anything in that realm.
It's all about style and experience. I've used Ruby on Rails, which isn't the most elegant of frameworks, but it does the job exceptionally well. It hasn't matured as much as a Spring/Hibernate stack on Java, or .Net, which handles almost everything straight out of the box. I prefer the Java/.Net-based projects because they fit much better with the way I like to program.
There is no "right" answer, just lots of good ones. ASP.Net MVC, for example, is a good choice. Forever ago I used Spring on Java, which was also fairly effective at accomplishing the job. Even PHP is not a wrong choice. Ruby on Rails, which I have only done two projects with, is very easy to pick up, and it makes some tasks that are rather complicated in other languages fairly simple.
I think that for sheer volume of documentation you can't beat J2EE. It's also believed to be insanely scalable and stable.
Now, from there to really being desirable....
If you're considering Java, I would recommend Jersey; it works great and I think it reaches all your 5 goals...
If Java is in your toolkit, look at Stripes.
Rock stable, with an enthusiastic, though not spectacularly large, community. Good docs; some are out of date, granted, but the system is so stable that even the "old stuff" is relevant. A really nice, recent (late last year) book. Stripes is small enough that the book can, and does, "cover everything".
It's an action framework, doesn't do much in the presentation area (save for forms, mostly, and it has a completely optional templating/layout facility). You can use JSP or FreeMarker, or, really, anything else. It can also do web services (though not as well as something like Jersey).
It is back end agnostic, but there is a JPA integration project for it.
Finally, you can leverage all of the other Java/Java EE kit if you want. Since Stripes doesn't consume the entire stack, you have a lot of flexibility to pick and choose the parts you want: full-boat Java EE, transactions, session beans, JMS. It works with Spring (it is "conscious" of Spring and has good integration), JPA, iBatis, Hibernate, raw JDBC, Lucene, the JSR-170 Content Repository, whatever.
It's a great piece of kit.
For a 2014 answer, I would recommend Laravel/Slim Framework (PHP), Ruby on Rails/Sinatra (Ruby), Django/Flask (Python), Grails (Groovy, a JVM-based language), Play! Framework (Java/Scala), or Sails.js/Kraken.js (JavaScript).
For the languages where I mention two frameworks separated by a "/", the first is a bit larger and the second a bit smaller.
I hope this helps people that have similar questions 5 years later.
Try CppCMS; it is a high-performance web development framework.

Benefits of cross-platform development?

Are there benefits to developing an application on two or more different platforms? Does using a different compiler on even the same platform have benefits?
Yes, especially if you plan to distribute your code for multiple platforms.
But even if you don't, cross-platform development is a form of future-proofing: if it runs on multiple (diverse) platforms today, it's more likely to run on future platforms than something that was tuned, tweaked, and specialized to work on a version 7.8.3 clean install of vendor X's Q-series boxes (patch level 1452) and nothing else.
There seems to be a benefit in finding, and simply preventing, bugs by building with a different compiler and a different OS. Different CPUs can pin down endianness issues early. There is some pain at the GUI level if you want to stay native there.
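For instance, any code that writes raw integers to disk or the network silently bakes in the host byte order; a probe like this (illustrative only) makes the assumption visible the first time you build on a big-endian box:

    #include <cstdint>
    #include <cstring>

    // True on little-endian hosts. Code that memcpy()s integers straight
    // onto the wire breaks the first time this differs between machines.
    bool host_is_little_endian() {
        const std::uint32_t probe = 1;
        unsigned char first_byte;
        std::memcpy(&first_byte, &probe, 1);
        return first_byte == 1;
    }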
Short answer: Yes.
Short of cloning a disk, it is almost impossible to make two systems exactly alike, so you are going to end up running on "different platforms" whether you meant to or not. By specifically confronting and solving the "what if system A doesn't do things like B?" problem head on you are much more likely to find those key assumptions your code makes.
That said, I would say you should get a good chunk of your base code working on system A, and then take a day (or a week or ...) and get it running on system B. It can be very educational.
My education came back in the 80's when I ported a source level C debugger to over 100 flavors of U*NX. Gack!
Are there benefits to developing an application on two or more different platforms?
If this is production software, the obvious reason is the lure of a larger client base. Your product's appeal is magnified the moment the client hears that you support multiple platforms. Remember, most enterprises do not use a single OS, or even a single version of the OS. It is fairly typical to find one section using Windows, another using Mac, and a smaller section using some flavor of Linux.
It is also often far more tedious to customize a product for a single platform than to have it run on multiple platforms. The law of diminishing returns kicks in before you know it.
Of course, all of this makes little sense if you are doing customization work on an existing product for the client's proprietary hardware. But even then, keep an eye out for the entire range of hardware your client has in his repertoire -- you never know when he might ask for it.
Does using a different compiler on even the same platform have benefits?
Yes, again. Different compilers implement different extensions. See to it that you are not dependent on a particular version of a particular compiler.
Further, there may be a bug or two in the compiler itself. Using multiple compilers helps sort these out.
I have further seen bits of a (cross-platform) product use two different compilers: one was used in those modules where floating-point manipulation required a very high level of accuracy. (It's been a while since I've heard of anyone else doing that, but...)
I've ported a large C++ program, originally Win32, to Linux. It wasn't very difficult: mostly dealing with compiler incompatibilities, because the MS C++ compiler at the time was non-compliant in various ways (I expect that problem has mostly gone away now, until C++0x features start gradually appearing), and writing a simple platform abstraction library to centralize the platform-specific code in one place. It depends to what extent you depend on services from the OS that would be hard to mimic on a new platform.
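A sketch of what such an abstraction layer can look like (names invented; each target gets its own implementation behind one shared header):

    // platform.h -- the one place that knows about OS differences.
    #pragma once

    namespace platform {
        // Monotonic clock in milliseconds, however the OS provides it.
        unsigned long long monotonic_ms();
    }

    // platform.cpp
    #include "platform.h"
    #ifdef _WIN32
      #include <windows.h>
      unsigned long long platform::monotonic_ms() {
          return GetTickCount64();
      }
    #else
      #include <time.h>
      unsigned long long platform::monotonic_ms() {
          timespec ts;
          clock_gettime(CLOCK_MONOTONIC, &ts);
          return ts.tv_sec * 1000ULL + ts.tv_nsec / 1000000;
      }
    #endif

The rest of the codebase calls platform::monotonic_ms() and never mentions windows.h or clock_gettime() again.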
You don't have to build portability in from the ground up. That's why "porting" is often described as an activity you can perform in one shot after an initial release on your most important platform. You don't have to do it continuously from the very start. Purely for economic reasons, if you can avoid doing work that may never pay off, obviously you should. The cost of porting later on, when really necessary, turns out to be not that bad.
Usually there is an existing platform that the application was written for (custom software). But you reach more developers (on both platforms) if you make it platform-independent.
Also, products (standard software) for SMEs sell better if they run on different platforms: you gain access to both markets, Windows and Linux (and Mac OS X, and so on...).
Big companies mostly buy only hardware that is supported/certified by the product vendor, just to deploy the specified product.
If you develop on multiple platforms at the same time, you get the advantage of being able to use different tools. For example, I once had a memory overwrite (I still swear I didn't need the +1 for the null byte!) that caused free to crash. I brought the code up on Windows and found the overwrite in about 1 minute with Rational Purify... it had taken me a week of chasing it under Linux (Valgrind might have found it, but I didn't know about it at the time).
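For the record, the classic shape of that off-by-one (this is the bug, not the fix):

    #include <cstring>
    #include <cstdlib>

    char *copy_string(const char *s) {
        char *buf = (char *)malloc(strlen(s)); // oops: no room for the '\0'
        strcpy(buf, s);                        // writes one byte past the block
        return buf;  // heap metadata is now corrupted; some later free() crashes
    }

Valgrind and Purify both exist to catch exactly this class of error.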
Different compilers on the same or different platforms is, to me, a must as each compiler will report different things, and sometimes the report from one compiler about an error will be gibberish but the other compiler makes it very clear.
Using multiple databases while developing means you are much less likely to tie yourself to a particular database, which means you can swap out the database if there is a reason to do so. If you want to integrate something that uses Oracle into an existing infrastructure that uses SQL Server, for example, it can really suck - much better if the Oracle or SQL Server pieces can be moved to the other system (I know of some places that have 3 different databases for their financial systems... ick).
In general, always developing for two or three targets means that the odds of finding mistakes are better, and the odds of the system being more flexible are better.
On the other hand all of that can take time and effort that, at the immediate time, is seen as an unneeded expense.
Some platforms have really dreadful development tools. I once worked at an investment bank where, rather than use Sun's ghastly toolset, people developed code in VC++ and then ported it to Solaris.