Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I've read the whole SQL vs NoSQL debate out there on the Internet (I spent a few days on it, so I have the right to call it that :) ) and still feel I'm far from being able to decide which platform our products should go with.
We're about to start designing a new set of products that mostly fit the CRM/CMS categories: several B2B, B2C, B2E and e-commerce apps, as well as other financial and banking apps. So it's going to be a complex system with dozens of databases solving different tasks. Let's concentrate on the DB area. I found this article particularly interesting for DB systems in the enterprise world. So the actual problem is:
Is it better to stay with a good old RDBMS such as MySQL (yes, it has to be open source, that's the only requirement) or start off with NoSQL such as MongoDB/CouchDB? (I guess Cassandra is more scalable than a CRM needs; it's not going to be a very distributed and heavily clustered system. Up to 4 strong guys will do the job perfectly.)
As an additional detail, a lot of media and documents will be stored in the system; this is a must for stores, markets and HR systems. The consumers of the storage will mainly be web apps.
Would it be better to split the DB back-end into two parts: RDBMS serving relational data and NoSQL for the media storage?
What do you think? If you have examples or similar experience, any help would go a long way toward avoiding future problems. Thank you in advance!
There are NoSQL (NewSQL) databases that are fully ACID compliant that you could consider. I would use one of those to handle the transactional CRM data. There are simply too many benefits to using these compared with traditional relational databases:
Much better performance
Schemaless
Some let you remove the ORM completely and use the created objects directly
Some have an integrated web server with REST/JSON support, which would be nice for you since your end users will work with web apps.
The ACID part is very important if you are building a CRM. I once built a CRM system that used a NoSQL database, and the performance made it possible to add features we never would have considered if we had used a traditional RDBMS.
I like the idea that you should put the media and documents into a CDN and then refer to them from your database.
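To make the "refer to them from your database" part concrete, here is a minimal sketch of how I picture it. It uses Python's built-in sqlite3 only so that it runs anywhere; the table name, the columns and the CDN base URL are all made up for illustration and are not from any particular product.

```python
import sqlite3

# Hypothetical CDN base URL, purely an assumption for this sketch.
CDN_BASE = "https://cdn.example.com"

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product_media (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL,
        object_key TEXT NOT NULL,    -- path of the file on the CDN / object store
        content_type TEXT NOT NULL
    )
    """
)


def attach_media(product_id, object_key, content_type):
    """Record where a file lives; the bytes themselves never enter the database."""
    cur = conn.execute(
        "INSERT INTO product_media (product_id, object_key, content_type) "
        "VALUES (?, ?, ?)",
        (product_id, object_key, content_type),
    )
    conn.commit()
    return cur.lastrowid


def media_urls(product_id):
    """Build public URLs from the stored keys at read time."""
    rows = conn.execute(
        "SELECT object_key FROM product_media WHERE product_id = ? ORDER BY id",
        (product_id,),
    ).fetchall()
    return [f"{CDN_BASE}/{key}" for (key,) in rows]
```

Storing only the key (not the full URL) means you can move to another CDN later by changing one constant instead of rewriting rows.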
Your open source requirement could be a bit of a showstopper though.
I wrote an article on the subject that might give you some advice on selecting a database:
http://www.ulitzer.com/node/2636237
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have these scenarios:
1. Migrating a MySQL database to NoSQL without code changes (no ORMs are used).
2. Using no foreign keys and indexes in MySQL (because they want to migrate to a different database in the future).
3. Doing all of this with very few code changes.
These questions were asked by my team lead, and I don't have a proper answer for him. It seems very unlikely to me that you would use MySQL with no indexes and foreign keys, and if they don't mean to keep using MySQL, why did they choose it in the first place?
I want to know whether people in the software industry often work like this, or whether they choose a database that fits their needs.
They say that foreign key validation is done at the API level, not at the MySQL level.
I don't understand their reasoning because I have little experience, so I can't explain why they say this. Please give me some insight into whether this is good practice or not.
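For readers unsure what "foreign key validation at the API level" would look like, here is a rough sketch. It uses Python's built-in sqlite3 instead of MySQL so it can run as-is, and the customers/orders tables and the create_order function are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No FOREIGN KEY clause: the database itself will accept any customer_id.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, total REAL NOT NULL)"
)


def create_order(customer_id, total):
    """API-level check that stands in for a database foreign key."""
    exists = conn.execute(
        "SELECT 1 FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if exists is None:
        raise ValueError(f"unknown customer {customer_id}")
    cur = conn.execute(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        (customer_id, total),
    )
    conn.commit()
    return cur.lastrowid
```

The check only protects writes that go through create_order. Any script or tool that writes to the orders table directly can still insert an orphan row, and two concurrent requests can race between the SELECT and the INSERT, which is the trade-off compared with a constraint enforced by the database.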
I don't think it will be possible without adding code: you need to implement, in some way, how your data is managed by your NoSQL DB engine. If the project is coded with a clear separation of business logic and database code, it's a simple matter of using the new database implementation instead of the old one. If that is not the case and your DB implementation has leaked into your business logic, then it will not be possible to switch without changing code. Depending on the size of the code base, it will most likely be too expensive.
If you want to see an example of a clean separation of DB logic from business logic, have a look at this repository: https://github.com/fathersson/money-transfer
(this is not my repository, I just stumbled upon it today)
If you want to learn and understand the principles driving that design, start by looking for "clean architecture" and/or "Domain Driven Design" - the first one is easier to understand in my opinion and there are some talks on YouTube by Robert C. Martin that you can have a look at before buying some books.
Edit: The project I'm working on at the moment did change from PostgreSQL running on RDS to DynamoDB by using a different repository implementation, without changing any existing business logic. It saves a lot of money that way. So yes, changing the DB backend does happen and is driven by requirements.
In addition to that, when I start working on a new feature set/microservice/bounded context, I usually start with a simple in-memory repository implementation that uses a map. After I'm done with the initial set of use cases, I know more about the DB requirements and choose the DB engine based on these and on the general requirement to limit the number of different technologies in use.
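A minimal sketch of that idea, assuming a made-up Account entity and transfer use case (this is not code from the linked repository or from my project). The business logic only knows the AccountRepository interface; the map-backed version and the SQLite-backed version can be swapped without touching it.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Account:
    id: str
    balance: int  # in the smallest currency unit


class AccountRepository(Protocol):
    """The only thing the business logic is allowed to know about storage."""

    def get(self, account_id: str) -> Optional[Account]: ...
    def save(self, account: Account) -> None: ...


class InMemoryAccountRepository:
    """Map-backed implementation, handy while the use cases are still moving."""

    def __init__(self):
        self._accounts = {}

    def get(self, account_id):
        found = self._accounts.get(account_id)
        return None if found is None else Account(found.id, found.balance)

    def save(self, account):
        self._accounts[account.id] = Account(account.id, account.balance)


class SqliteAccountRepository:
    """Same interface backed by a real database."""

    def __init__(self, conn):
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS accounts "
            "(id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
        )

    def get(self, account_id):
        row = self._conn.execute(
            "SELECT id, balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        return None if row is None else Account(*row)

    def save(self, account):
        self._conn.execute(
            "INSERT OR REPLACE INTO accounts (id, balance) VALUES (?, ?)",
            (account.id, account.balance),
        )
        self._conn.commit()


def transfer(repo: AccountRepository, source_id, target_id, amount):
    """Business logic: no SQL and no knowledge of which backend is in use."""
    source, target = repo.get(source_id), repo.get(target_id)
    if source is None or target is None:
        raise KeyError("unknown account")
    if source.balance < amount:
        raise ValueError("insufficient funds")
    source.balance -= amount
    target.balance += amount
    repo.save(source)
    repo.save(target)
```

Calling transfer(InMemoryAccountRepository(), ...) and transfer(SqliteAccountRepository(conn), ...) exercises exactly the same business code. A real implementation would also need the two saves to happen atomically, which this sketch leaves out.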
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I'm doing research before I create my social network database and I've found a lot of questions/resources pertaining to graph and key-value databases for social networks. I understand there are a TON of different options and ways to implement the DB. I also understand that what the big companies do is complex and way above what I currently need (1b+ users). I also know each of the big companies has revamped its databases to account for the insane scaling they go through.
I don't know how the network will grow, and I don't believe I can accurately create a model that will scale to 1m users (due to unknowns such as how people will use it, how often people post, comment, etc). But I can at least try to create a database that will be easiest to scale when (if) the need arises.
Do most companies create a database to handle up to 1k users, then once they grow, they revamp it for 10k users, then 100k, etc? If they do, at each of these arbitrary numbers (because of the unknowns listed above), do companies typically change a few tables/nodes/etc, or do they completely recreate the database to take advantage of new technologies (such as moving from SQL to graph)?
I want to pick the best solution, but I'm finding the decision between graph, key-value, SQL and others very difficult, especially with no data to tell me which relationships and data are most important. I believe I can create a solid system using a graph that can support up to 10k users, but I'm worried about potentially having to completely recreate the database as the system grows. Is this a "worry now to avoid issues" problem, or an "implement now and adapt later" one?
Going further, if I do need to plan on complete DB restructures, does it typically make sense to use a Multi-Model NoSQL DBMS (such as OrientDB or ArangoDB)?
I personally think you are asking premature questions.
Seriously, even with a bad model, a database can handle 10k users.
You think about scaling, but the hardest problem is not scaling; it is getting to the point where you need to scale.
I'm sure everybody wants 1bn users, but then you are already dreaming about having a social network with 200 times more users than GitHub itself (GitHub has ~5 million users).
Also, even if you plan ahead, you will definitely refactor again and again over the years, and you will end up with more than one persistence layer, be sure of it.
Code, and code well. Stay lean, remain able to change quickly, deploy, show to users, refactor, test, deploy and show to users in the same day. These are the things you need to do now, rather than asking questions about a problem you don't have yet. You definitely have a lot of other problems to solve now ;-)
UPDATE
Based on your comment, you might need to accept that there are questions we simply cannot answer, because we don't know your exact requirements.
I have a simple app which uses 4 persistence layers, and this app is not yet online. I'll give you my "why" for each one and its use case:
Neo4j
It is the core of the application data. I use it because I love it and I know it very well (it is my job), and because the concept of the app is quite new and can evolve rapidly, so having a schemaless DB removes a lot of the refactoring work. I also get a lot of new use cases while building the app, which makes Neo4j a good choice when you need to add features without breaking what has already been done.
MySQL
I use it for user accounts and profiles. Why? Because the framework I use already has a lot of bundles integrating this kind of stuff in a couple of lines of code, and the bundles are well maintained. If I used Neo4j for it right now, I would have to reinvent the wheel. Also, all the modules I use evolve in stability and compatibility with the framework.
Of course the MySQL data is coupled (minimally) with the Neo4j data. But I know that this kind of data will not evolve that much, so MySQL is a good choice, and in case I have to refactor some points, it will not be a huge pain.
Redis
I use Redis for storing analytics data. Redis is quite flexible, and I can easily create new keys and add data on top of it.
RabbitMQ
I use a lot of message queues. Why? For testing refactoring. I can easily process the same messages with multiple consumers, which lets me test refactorings, multiple database layers, changes and new features while the app is running.
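Roughly what I mean by multiple consumers, as a sketch only. To keep it runnable without a broker, the FanoutExchange class below is an in-process stand-in that I made up; it is not the RabbitMQ client API. The two stores are hypothetical: one models the current persistence layout and one models a refactored layout being tried out.

```python
from collections import defaultdict


class FanoutExchange:
    """In-process stand-in for a broker fanout: every consumer gets every message."""

    def __init__(self):
        self._consumers = []

    def bind(self, consumer):
        self._consumers.append(consumer)

    def publish(self, message):
        for consumer in self._consumers:
            consumer(message)


class CurrentStore:
    """The persistence layout the app uses today (hypothetical)."""

    def __init__(self):
        self.rows = []

    def handle(self, message):
        self.rows.append((message["user"], message["action"]))


class CandidateStore:
    """A refactored layout under test, fed with the same messages (hypothetical)."""

    def __init__(self):
        self.by_user = defaultdict(list)

    def handle(self, message):
        self.by_user[message["user"]].append(message["action"])


def same_content(current, candidate):
    """Compare both stores to see whether the refactored one lost or changed data."""
    flattened = sorted(
        (user, action)
        for user, actions in candidate.by_user.items()
        for action in actions
    )
    return sorted(current.rows) == flattened
```

You bind both handle methods to the exchange, publish normal application traffic, and call same_content from time to time. With a real broker, each consumer would have its own queue bound to the same exchange.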
You will refactor ! Just try to keep it as simple as possible.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
We currently use SQL CE databases on the client machines, which then synchronise their data to a central server using the merge replication/RDA functionality of MS SQL. The amount of data involved is small, and the central server will often be idle for ~95% of the time. It's only really active when data is incoming, and data is typically synchronised on a daily/weekly basis.
The SQL Standard licensing costs for this are large relative to the SQL Server workload and the amount of data we're talking about (in the order of 100s of MBs maximum). What I'd like to know is whether there's an open source alternative (MySQL or similar) which we could use as the backend data storage for our .NET application. My background is Windows Server admin, so I'm relatively new to Linux, but I'm happy to give it a go and learn some new skills, as long as it won't be prohibitively difficult. If there are any other alternatives, that would be great too.
Well, this is quite an open-ended question, so I am going to give you some guidelines on what you can start researching.
Client-side embedded databases.
MySQL can be embedded, but from my understanding MySQL as an embedded server might be overkill for a client. There are, however, plenty of alternatives. One such option would be the Berkeley DB system. There are other alternatives as well. Keep in mind you don't want a FULL SQL server on the client side; you are looking for something lightweight. You can read about Berkeley DB here: http://en.wikipedia.org/wiki/Berkeley_DB and about alternatives here: Single-file, persistent, sorted key-value store for Java (alternative to Berkeley DB). They mention SQLite, which might be right up your alley. In short, there is a whole stack of open source tools you can use here.
Back-end databases.
MySQL will do the job very well, and so will PostgreSQL. PostgreSQL seemed to support more enterprise features the last time I looked; however, that might have changed. These two are your main players in the SQL server market as far as open source is concerned. Either one will do fine in your scenario. Both PostgreSQL and MySQL run on Windows as well, so you don't have to install Linux, though I would suggest investing the time in Linux as I have; it is well worth the effort and the peace of mind you get is good.
There is one major sticking point for you if you switch over to MySQL/PostgreSQL: the RDA/replication technology you currently have will not be supported by these databases, and you will need to look at how to implement it yourself, probably from scratch. So while the back-end and even the front-end DBs can be replaced, the replication of the data will be a little more problematic, but NOT impossible.
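To give a feel for what "implementing it from scratch" could involve, here is a deliberately small sketch of a one-way push from a client database to a central one. Everything in it is an assumption made for the example: the readings table, the synced flag column, and the use of Python's sqlite3 on both sides (in your case the central side would be MySQL or PostgreSQL and the client code would be .NET).

```python
import sqlite3


def make_db():
    """Create a database with the hypothetical table used on both sides."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings ("
        "id TEXT PRIMARY KEY, value REAL NOT NULL, "
        "synced INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def push_changes(client, server):
    """Copy unsynced client rows to the server, then mark them as synced."""
    pending = client.execute(
        "SELECT id, value FROM readings WHERE synced = 0"
    ).fetchall()
    for row_id, value in pending:
        # INSERT OR REPLACE keeps the push idempotent if a row is sent twice.
        server.execute(
            "INSERT OR REPLACE INTO readings (id, value, synced) VALUES (?, ?, 1)",
            (row_id, value),
        )
    server.commit()
    # Mark rows only after the server commit, so a crash in between
    # causes a harmless re-send rather than lost data.
    client.executemany(
        "UPDATE readings SET synced = 1 WHERE id = ?",
        [(row_id,) for row_id, _ in pending],
    )
    client.commit()
    return len(pending)
```

This covers only the easy part. Merge replication also handles changes flowing from the server back to clients, deletes, and conflicting edits to the same row, and those are the parts you would have to design for yourself.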
Go play with these technologies and do some tests; then you will need to decide how you will replace that replication.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I am in the middle of designing a highly scalable application which must store a lot of data. For example, it will store lots of data about users, plus things like their messages, comments, etc. I have always used MySQL before, but now I am minded to try something new like CouchDB or similar which is not SQL.
Does anyone have any thoughts or guidance on this?
Here's a quote from a recent blog post from Dare Obasanjo.
SQL databases are like automatic transmission and NoSQL databases are like manual transmission. Once you switch to NoSQL, you become responsible for a lot of work that the system takes care of automatically in a relational database system. Similar to what happens when you pick manual over automatic transmission. Secondly, NoSQL allows you to eke more performance out of the system by eliminating a lot of integrity checks done by relational databases from the database tier. Again, this is similar to how you can get more performance out of your car by driving a manual transmission versus an automatic transmission vehicle.
However the most notable similarity is that just like most of us can’t really take advantage of the benefits of a manual transmission vehicle because the majority of our driving is sitting in traffic on the way to and from work, there is a similar harsh reality in that most sites aren’t at Google or Facebook’s scale and thus have no need for a Bigtable or Cassandra.
To which I can add only that switching from MySQL, where you have at least some experience, to CouchDB, where you have no experience, means you will have to deal with a whole new set of problems and learn different concepts and best practices. While by itself this is wonderful (I am playing at home with MongoDB and like it a lot), it will be a cost that you need to calculate when estimating the work for that project, and brings unknown risks while promising unknown benefits. It will be very hard to judge if you can do the project on time and with the quality you want/need to be successful, if it's based on a technology you don't know.
Now, if you have on the team an expert in the NoSQL field, then by all means take a good look at it. But without any expertise on the team, don't jump on NoSQL for a new commercial project.
Update: Just to throw some gasoline on the open fire you started, here are two interesting articles from people in the SQL camp. :-)
I Can't Wait for NoSQL to Die (original article is gone, here's a copy)
Fighting The NoSQL Mindset, Though This Isn't an anti-NoSQL Piece
Update: Well here is an interesting article about NoSQL
Making Sense of NoSQL
It seems the only real solutions today revolve around scaling out, or sharding. Modern databases (NoSQL as well as NewSQL) generally support horizontal scaling right out of the box, at the database layer, without the need for the application to contain any sharding code.
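For anyone who hasn't seen it, this is roughly what "sharding code in the application" means. It is a bare-bones sketch with in-memory dicts standing in for database nodes; the ShardedStore class and its methods are invented for illustration and are not any product's API.

```python
import hashlib


class ShardedStore:
    """Routes each key to one of several shards by hashing the key."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate database node.
        self._shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash (not Python's randomized hash()) so the same key
        # always lands on the same shard across processes and restarts.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)

    def shard_sizes(self):
        return [len(shard) for shard in self._shards]
```

Even this toy version shows the costs: with plain modulo routing, changing the number of shards moves most keys to a different node, and any query that spans several keys has to be fanned out and merged by the application.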
Unfortunately, for the trusted good old MySQL, sharding is not provided "out of the box". ScaleBase (disclaimer: I work there) is a maker of a complete scale-out solution, an "automatic sharding machine" if you like. ScaleBase analyzes your data and SQL stream, splits the data across DB nodes, and aggregates at runtime, so you won’t have to!
And it's a free download.
Don't get me wrong, NoSQLs are great, they're new, new is more choice and choice is always good!! But choosing NoSQL comes with a price, make sure you can pay it...
You can see here some more data about MySQL, NoSQL...: http://www.scalebase.com/extreme-scalability-with-mongodb-and-mysql-part-1-auto-sharding
Hope that helped.
One of the best options is to go for MongoDB (a NoSQL DB), which supports scalability. It stores large amounts of data (big data) in the form of documents, unlike the rows and tables in SQL. It is fast and supports sharding of the data. It uses replica sets, which keep multiple servers around a primary DB server, to protect the data. It is language independent.
It is flexible to use.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
What's the limitation?
Is there a specific volume of data each can handle regardless of disk space?
When to use what assuming licensing is not a problem?
This is a very nuanced question that really cannot be easily answered, as each situation comes with many pluses and minuses. Also, with MySQL now being owned by Oracle and several branches having split off from the main code base, MySQL != MySQL anymore.
If you are looking at really, really big data sets, then you will likely have to break with RDBMSs and start to look at things like MapReduce and other large data set processing technologies.
I have personally worked with all three over the past decade or so from the application perspective. They all have their advantages: MSSQL works well with other Microsoft technologies like LINQ, MySQL has large open community support, and Oracle is the workhorse of the commercial sector with lots of ability to embed application logic right into the database.
Again, it really depends on the application, the situation, the skills of the people who will maintain it after it is developed, commercial considerations, hardware and platform considerations, etc etc etc.
It depends what you are trying to do and obviously it has to do with cost.
MySQL and Postgres are very widely used by a huge number of startups because they are open source and there is a lot of support out there for people using them.
MSSQL is good if you are using MS programming languages because it is easy to connect to and use.
I have never used Oracle, but I know people use it a lot for data warehousing, so it can't have that bad a name.
All of these will suffer from similar issues when scaling because they are all RDBMSs. They do also have decent ways to get around it, and with a decent ORM in your code it shouldn't matter much which one you use.
Pick the one that all the developers are comfortable with
I'd say if you want to compare apples to apples, then it is MySQL vs SQL Express, vs Oracle Express.
Or if you have $, then it is the MySQL support license, MS-SQL Standard, vs whatever Oracle's cheapest offering is.
In my experience, once you choose a language, you've chosen your DB: PHP goes best with MySQL, Java goes well with Oracle, and C# goes well with MSSQL.
Similarly, if you choose your OS: Unix flavors run MySQL or Oracle, while MSSQL has historically been Windows only (SQL Server 2017 and later also run on Linux). MySQL and Oracle work on both Unix and Windows, of course.
If you need to buy many machines, then not having to pay OS licenses for the server helps in scaling.
As to skaffman's point, you may want to have a look at Postgres if MySQL isn't scaling for you. It is more mature and robust than MySQL and is open source. The time to make the switch is highly dependent on your application environment; however, if you need clustering and replication to work properly 100% of the time, then Postgres will not let you down (as MySQL has for me in the past).
It would help narrow things a great deal if you'd provide details like whether or not you intend to distribute the database along with your software; whether your system will be hosted; how much data you expect; etc.
Don't assume anything with regard to licensing. Get a lawyer, maybe even one who specializes in open source law.
"...regardless of disk space..." - capacity always depends on this. Where do you think the data goes? Better to think about things like sharding your data, RAID, clustering, replication, etc.
I would worry about any system whose developer had to come to a forum like this to ask that kind of question. You should have people on staff with sufficient skill and knowledge to have a strong opinion on this sort of thing.
Perhaps one variable which people overlook in these cases is the availability of expert support. Okay, so currently there's an oversupply of people who can help you with DB issues, efficiency issues, disaster recovery, etc. However, this may not always be the case in the future, and it's the applications you use it for that may be the defining issue. Are there people in your organization who have experience in one or more of the relevant databases? (As it happens, I believe that someone who's become proficient in, say, Oracle can become fairly competent in SQL Server or MySQL in a fairly short space of time.) You state it's going to be used for your financial systems, so perhaps you really need input from a consultant who's worked on implementing and/or supporting financial systems. For example, I understand that Sybase is popular in City-type firms. Or perhaps there's an off-the-shelf package that utilises a preferred database? Try to define exactly what your system(s) need to do first.
Is the application buy or build?
If buy, does it support all three? Talk to the app vendor about the differences.
If build, is it an in-house build or contracted out? If contracting out, put out your requirements and let the suppliers put their arguments.
If it is an in-house build, then first look at why you are not contracting out. Normally it is because you already have an in-house capability, so look at that expertise.
You want some sizing information first.
Are you talking data volumes in megabytes, gigabytes or terabytes ?
What are your uptime requirements, backup (recovery time / recovery point) ?
How much concurrent activity ? Is that peak ?
Generally any database system is fine for data storage and retrieval. High-end analysis, load balancing, replication, management, backup/recovery, auditability, security are all areas you may need to consider.