Do document-oriented databases have integrity? - relational-database

I'm coming from a MySQL background, and I'm interested in document-oriented databases, specifically CouchDB. One of the things I'm interested in is data integrity. How do document-oriented databases handle this? For instance, in RDBMSes, there are ways to prevent duplication of records, or to guarantee that if you have one piece of information, you will have another, or else none at all.
I guess more broadly, my question is: what types of problems are RDBMSes cut out for, compared to the problems that DODBes are used for? I looked at some of the other Stack Overflow questions for an explanation, but didn't find any good ones.
Also, with my databases at work, I do a lot of reporting, with summing and averaging values, and historical trending. Is this something appropriate for document-oriented databases?

Most document databases support only very limited integrity checks, or none at all. They rely on the application to ensure that the data is correct. I can tell you how it is in CouchDB.
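To make "the application ensures that the data is correct" a bit more concrete, here is a minimal sketch in Python. It is not CouchDB code: the plain dict stands in for a document store, and `save_user`, `REQUIRED_FIELDS` and the rule that an email must be unique are all assumptions invented for this illustration.

```python
class IntegrityError(Exception):
    """Raised when a document breaks a rule that the application enforces."""


# Hypothetical schema for this sketch; a real application defines its own rules.
REQUIRED_FIELDS = {"type", "email", "name"}


def save_user(store: dict, doc: dict) -> str:
    """Validate a user document, then store it under an id derived from its email."""
    missing = REQUIRED_FIELDS - doc.keys()
    if missing:
        raise IntegrityError(f"missing fields: {sorted(missing)}")
    # Deriving the id from the email turns "no duplicate emails" into
    # "no duplicate ids", which a key-based store can check cheaply.
    doc_id = "user:" + doc["email"].lower()
    if doc_id in store:
        raise IntegrityError(f"duplicate user: {doc['email']}")
    store[doc_id] = dict(doc)
    return doc_id
```

The point of the sketch is only that both checks (required fields, no duplicates) live in application code rather than in a schema, so every code path that writes documents has to go through them.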
To the second part: I think RDBMSes do very well at reporting and analyzing data. The fact that you can run complex queries on the data with joins, aggregations, functions, etc. makes an RDBMS a very powerful reporting tool.
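As a small illustration of the summing, averaging and trending mentioned in the question, here is a sketch using Python's built-in `sqlite3` module in place of MySQL. The `sales` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2024-01", "north", 100.0),
        ("2024-01", "south", 50.0),
        ("2024-02", "north", 120.0),
        ("2024-02", "south", 80.0),
    ],
)

# One declarative query gives a per-month total and average, ordered over time.
monthly = conn.execute(
    "SELECT month, SUM(amount), AVG(amount) "
    "FROM sales GROUP BY month ORDER BY month"
).fetchall()
print(monthly)  # [('2024-01', 150.0, 75.0), ('2024-02', 200.0, 100.0)]
```

Changing the report (say, grouping by region instead of month) only means changing the query text, not the stored data.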
Document databases do really well at storing the 'live' application data. It is very easy to store and retrieve an object graph in a document database. The schema-free design makes it easy to extend the model for new application features. However, this only works if you can split your application data into nice documents. Otherwise you lose a lot of the elegance.
If you mostly want to do reporting, I would prefer an RDBMS. When you store lots of flat, simple records, it is very easy to do reporting on them, and the tooling is very well suited to it. However, when you want to do reporting on complex structured data, you are probably still better off with a database design other than an RDBMS.
However, this doesn't mean you need to limit yourself to an RDBMS. You could combine the two technologies. Imagine blog software: you store the 'live' application data, like blog posts and comments, in the document database, while data for reporting, like click and login statistics, is stored in an RDBMS. See also Rob Conery's post.

Related

Why do NoSql databases scale better than relational databases? How should I choose between them?

By NoSQL databases I mean something like MongoDB or DynamoDB.
I've been trying to find out why NoSQL databases are usually better at horizontal scaling than relational databases, and how to choose between them.
I have looked at many videos and posts on "SQL vs NoSQL". Most of them end up talking about "Normalization vs Denormalization".
Here are some questions I am still confused about.
1. Many people say that relational databases have to follow ACID, so they are bad at horizontal scaling. But ACID is about transactions, and we can always choose not to use any transactions, right? I know not many people do this, but if we denormalized tables enough, would it be like NoSQL databases, where we use almost no transactions? And many NoSQL databases now have transactions too.
2. I know denormalization is probably good for horizontal scaling, because if data is spread across many nodes (machines), it will be hard to do table joins (or transactions). But as with transactions, we can choose not to use any table joins.
The only thing I can think of is that NoSQL databases are schema-free, so it is easier to add new fields (columns) than in a relational database.
What I am trying to ask is:
Why is a "Denormalized NoSQL db" better than a "Denormalized relational db"?
Why is a "Normalized NoSQL db" worse than a "Normalized relational db"?
What is the real thing that prevents a relational database from being denormalized?
I've read this post
https://softwareengineering.stackexchange.com/questions/194340/why-are-nosql-databases-more-scalable-than-sql
It says:
"The SQL API lacks a mechanism to describe queries where ACID's requirements are relaxed. This is why the BASE databases are all NoSQL."
Could anyone give me an example of this?
Sorry for not being specific.
By NoSQL databases I mean something like MongoDB.
A blog like https://neo4j.com/blog/acid-vs-base-consistency-models-explained/ explains BASE this way:
Basic Availability
The database appears to work most of the time.
Soft-state
Stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
Eventual consistency
Stores exhibit consistency at some later point (e.g., lazily at read time).
This level of equivocation doesn't sound very reliable, does it? They trade off availability and consistency to gain performance and scalability.
This is fine if you're running a service that is tolerant of mismatched data or stale data, or which is okay with some minor amount of data loss once in a while. If those issues are an uncommon occurrence, but you get superior performance nearly all the time, it's very attractive. And more importantly, it demos well.
But if you have to run a service with strict requirements for data integrity, it's no good. If losing even one record of data gets you in trouble with auditors, or if you can't reliably read data you just committed a moment before because that commit takes time to propagate to all nodes of your cluster, it could be a deal-breaker.
So which data store to choose depends on the requirements of your app. Only you can judge if the relaxed availability and consistency of a BASE data store is sufficient for the needs of your app.
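The "can't reliably read data you just committed" case above can be shown with a toy model. This is a deliberately simplified sketch, not how any particular product replicates data: the class name, the two-replica layout and the explicit `sync()` step are all assumptions for illustration.

```python
class EventuallyConsistentStore:
    """Toy store: writes land on the primary and reach the secondary only on sync()."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = []  # writes accepted by the primary but not yet propagated

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_secondary(self, key):
        # A client routed to the secondary sees whatever has been propagated so far.
        return self.secondary.get(key)

    def sync(self):
        # Propagate the queued writes in order, then clear the queue.
        for key, value in self.pending:
            self.secondary[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("balance", 100)
stale = store.read_from_secondary("balance")   # None: the write has not arrived yet
store.sync()
fresh = store.read_from_secondary("balance")   # 100: the replicas have converged
```

Whether the window between `write` and `sync` is acceptable is exactly the requirements question raised above.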
NoSQL is a term that covers many types of storage/query engines, e.g. document stores, graph databases, etc.: basically anything that looks something like a database but doesn’t use the standard tables/rows/columns structure that a SQL database does.
NoSQL databases were developed to support use cases that relational databases don’t handle well, so while you might be able to use either a SQL or a NoSQL database in any given scenario, the choice between the two is normally a no-brainer; they would very rarely both be viable options.
Just to clarify, your questions about types of DB being better or worse are meaningless without context. Without knowing precisely what your requirements are, it’s impossible to say whether a NoSQL DB is better or worse than a SQL one - and that’s before you start looking at specific products in each category.
Also, that post you reference is about 8 years old and much of the information is out of date, as one of the contributors acknowledges in an update made in 2019.

Performance Analysis of CouchDB

I am developing a discussion forum for my university, and I am using CouchDB as the database to store and manipulate the data.
I am having difficulty designing the structure of my database so as to maximize its performance.
I want to discuss what good practice is when designing a document database.
Should we create only one database, as we would in SQL, and put 'n' documents in it?
Or should we create several databases in order to flatten the structure? This would also reduce the number of documents to be created.
The question you need to ask is simply this: "How do you want to get data out of your database?"
Database design hinges around the queries to be made, not what is available to be stored.
This is especially important for document DBs like Couch, since, while it does have a flexible schema, it does not have flexible indexing. By that I mean that because of the granularity of the data, it's quite likely that later on, when you need to ask a question that the database was not designed to answer, answering that question may well be very expensive. It's much, much cheaper to design your views and other constructs early, when there is little data in the database, rather than later, after you have thousands or millions of rows.
RDBMSs, since they tend to have a finer granularity of data, tend to be more nimble with new queries and such later in life. Document DBs, not so much.
So think through your use cases up front, design around them, and do it early on; it's much less painful now than later.
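To show why the questions drive the design, here is a rough Python emulation of a map-style view. It is not CouchDB's API: `build_view`, `query_view` and the sample forum documents are hypothetical. Each view is a sorted list of (key, document id) rows, and every new kind of question needs its own view built by passing over all the documents.

```python
from bisect import bisect_left, bisect_right


def build_view(docs, map_fn):
    """Run map_fn over every document and return (key, doc_id) rows sorted by key."""
    rows = []
    for doc in docs:  # a full pass over the data: cheap now, costly with millions of docs
        for key in map_fn(doc):
            rows.append((key, doc["_id"]))
    rows.sort()
    return rows


def query_view(view, key):
    """Return the ids of all rows in the sorted view whose key equals `key`."""
    keys = [k for k, _ in view]
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return [doc_id for _, doc_id in view[lo:hi]]


docs = [
    {"_id": "p1", "type": "post", "author": "ann", "topic": "exams"},
    {"_id": "p2", "type": "post", "author": "bob", "topic": "housing"},
    {"_id": "p3", "type": "post", "author": "ann", "topic": "housing"},
]

# A view designed for "posts by author"; it cannot answer "posts by topic".
by_author = build_view(docs, lambda d: [d["author"]] if d["type"] == "post" else [])
print(query_view(by_author, "ann"))  # ['p1', 'p3']

# The new question needs a second view and a second pass over every document.
by_topic = build_view(docs, lambda d: [d["topic"]] if d["type"] == "post" else [])
print(query_view(by_topic, "housing"))  # ['p2', 'p3']
```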
It's hard to tell the right way to approach modeling your data since you don't give much information. Generally though you want to keep as much data as possible in one database as this allows you to index it together (indexes cannot span more than one database).
Also, since there is no schema enforcement in the database, you can create different types of records in each database. For example, there is nothing wrong with having both user information and forum entries in the same database.
Last, you will most likely want to keep messages and their replies in different records. This is an old but still relevant discussion on this topic: http://www.cmlenz.net/archives/2007/10/couchdb-joins
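One way to read messages and their replies back together when they are stored as separate records is to sort everything by a compound key. The following is a plain-Python sketch of that idea rather than an actual CouchDB view; the `type` and `post` fields and the function names are assumptions for this example.

```python
docs = [
    {"_id": "c2", "type": "reply", "post": "p1", "text": "Second reply"},
    {"_id": "p1", "type": "post", "title": "Exam schedule"},
    {"_id": "c1", "type": "reply", "post": "p1", "text": "First reply"},
    {"_id": "p2", "type": "post", "title": "Housing"},
]


def thread_key(doc):
    """Compound key: the owning post id first, then 0 for the post or 1 for a reply."""
    if doc["type"] == "post":
        return (doc["_id"], 0, doc["_id"])
    return (doc["post"], 1, doc["_id"])


def thread_view(docs):
    """Return document ids ordered so each post is followed by its own replies."""
    return [doc["_id"] for doc in sorted(docs, key=thread_key)]


print(thread_view(docs))  # ['p1', 'c1', 'c2', 'p2']
```

Because replies are separate records, adding one never requires rewriting the post document itself.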
Cheers.

XML or MySQL: which should be used for storing connected data?

I am writing code for a friend list and messaging system for my college website. I need to store interconnected data and search it. There are about 3500 records. So which should I go with, MySQL or XML? Which is faster, which is better, and why?
I'm going to use one of my professor's favorite answers here: "it depends."
XML and MySQL have very different applications. If you need to run lots of simultaneous queries for all sorts of sophisticated things, MySQL is your clear winner. MySQL can sometimes be hard to use because you must first create a database schema to fit your data. It sounds, though, like you have many records with the same structure, so it would be easy enough to put them into a database. With a SQL-based database engine like MySQL, you can also construct queries using standard SQL. Database optimizations can help increase the performance of these queries; for example, you can use indexes and keys. If your data needs to be updated regularly, then MySQL will likely provide better performance, as it will not have to rewrite the XML file. If you need your application to scale to many simultaneous connections running sophisticated queries, you are definitely going to want some sort of SQL solution.
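For a sense of what "indexes and keys" look like for a friend list, here is a small sketch. It uses Python's built-in `sqlite3` module rather than MySQL so that it runs without a server, and the `student` and `friendship` tables, their columns and the sample rows are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE friendship (
    student_id INTEGER NOT NULL REFERENCES student(id),
    friend_id  INTEGER NOT NULL REFERENCES student(id),
    PRIMARY KEY (student_id, friend_id)   -- key: no duplicate pairs
);
CREATE INDEX friendship_by_friend ON friendship (friend_id);  -- index for reverse lookups
""")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ben"), (3, "Chen")])
conn.executemany("INSERT INTO friendship VALUES (?, ?)", [(1, 2), (1, 3), (2, 3)])


def friends_of(conn, student_id):
    """Return the names of a student's friends, sorted alphabetically."""
    rows = conn.execute(
        "SELECT s.name FROM friendship f JOIN student s ON s.id = f.friend_id "
        "WHERE f.student_id = ? ORDER BY s.name",
        (student_id,),
    )
    return [name for (name,) in rows]


print(friends_of(conn, 1))  # ['Ben', 'Chen']
```

With an XML file, the same lookup would mean parsing and scanning the document in application code on every request.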
Depending upon your application, though, sometimes there are other ways to store and access your data. I, for one, once needed to create a persistent data structure on disk which could be accessed very quickly but never updated. For that, I used cdb. There are also other database systems out there, like the Berkeley database, and some NoSQL solutions such as CouchDB and MongoDB. I posed a somewhat interesting question here on Stack Overflow on the use of NoSQL solutions a little while back, which you may find interesting as well.
This is really just a sampling of different considerations you may want to make when you are choosing how you want to store your data. Think about questions like: How frequently will things be queried? or updated? What will your queries look like? What kinds of applications do you need to access your information from? etc.

What kinds of data queries are too hard to do on CouchDB (as opposed to SQL)? Seeking concrete examples

I think CouchDB is really cool and want to use it more. But I'd also like to know ahead of time whether there are any types of data query that are done easily on MySQL but are impossible or very awkward to accomplish in CouchDB.
Please answer with concrete answers or examples instead of just saying that "CouchDB is for documents and MySQL is for relational data." I don't really know what that statement means, since it seems that you can do things functionally equivalent to relational MySQL joins with CouchDB views.
For example, I've read that paginating through a data set is a bit awkward in CouchDB. This is the sort of answer I'm looking for.
A problem I'm having at the moment is displaying an AJAX grid with contents from a CouchDB database. The equivalent SQL request would be:
SELECT * FROM the_table
WHERE {filter_col} = {filter_value} [ AND ... ]
ORDER BY {order_col}
LIMIT {n} OFFSET {m}
It's a pretty simple request to run on a traditional SQL database, but having to perform filtering, ordering and paging all together at the same time is beyond what CouchDB indexing can manage - at least, without creating an insane number of different views.
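For comparison, the SQL side of this really is short. Below is a runnable sketch of the template query using Python's built-in `sqlite3` module; the `status` and `name` columns, the sample rows and the `fetch_page` helper are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER PRIMARY KEY, status TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO the_table (status, name) VALUES (?, ?)",
    [("open", "delta"), ("closed", "alpha"), ("open", "bravo"),
     ("open", "charlie"), ("open", "alpha")],
)


def fetch_page(conn, status, page_size, offset):
    """Filter, order and page in a single query."""
    # Values are bound as parameters. The ORDER BY column is written into the
    # SQL text because column names cannot be bound as parameters.
    return conn.execute(
        "SELECT id, name FROM the_table "
        "WHERE status = ? ORDER BY name LIMIT ? OFFSET ?",
        (status, page_size, offset),
    ).fetchall()


print(fetch_page(conn, "open", 2, 0))  # [(5, 'alpha'), (3, 'bravo')]
print(fetch_page(conn, "open", 2, 2))  # [(4, 'charlie'), (1, 'delta')]
```

Any combination of filter column and sort column is just a different query string here, whereas a view-based store needs an index prepared for each combination.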
CouchDB has a hard time with full-text search (unless external software is used). MySQL isn't particularly good at that either, but Couch is even worse.
CouchDB isn't going to do a good job when your data model implies multiple, complex relations between objects. After all, it's a document-based system, not a relational DBMS.
Other than that, IMO couch rules.
EDIT: Particularly when you need to relax, of course! :)
It all depends on the motivation behind changing data stores. What problem or architectural challenge are you trying to overcome with MySQL that CouchDB can solve? If at the end of the day there is no difference in functionality or performance then the refactoring to change database platforms cannot be justified.
Have a look at some ORM frameworks, which if implemented correctly can let you swap out the back end databases easily.

Database structure - is MySQL the right choice?

We are currently planning the database structure of a quite complex e-commerce web app that has flexibility as its main cornerstone.
Our app features a large amount of data (products) and we have run into a slight headache trying to keep performance high without compromising normalization rules in the database, or leaving our highly beloved flexibility concept behind when integrating product options (also widely known as product attributes or parameters).
Based on various references and sources available, we have made up lists on pros and cons of all major and well known database patterns to solve this. After comparing these, we have come up with two final alternatives:
EAV (Entity-attribute-value model) :
Pros: Database is used for all sorting.
Cons: All related queries will include a number of joins between multiple tables in order to complete the collection of data.
SLOB (Serialized LOB, also known as Facade?) :
Pros: Very flexible. Keeps the number of necessary joins low compared to an EAV design pattern. Easy to update/add/remove data from each product, but hard to keep data integrity without additional tables.
Cons: All sorting will be done by the application instead of the database. Will use a lot of resources (memory?) when big datasets are processed by a large number of users.
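The trade-off between the two alternatives can be seen in a few lines. This is a sketch only, using Python's built-in `sqlite3` module instead of MySQL; the table names, the `weight_g` attribute and the three sample products are invented for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_attr (product_id INTEGER, attr TEXT, value TEXT);   -- EAV
CREATE TABLE product_slob (id INTEGER PRIMARY KEY, name TEXT, attrs TEXT);  -- SLOB
""")
products = [
    (1, "Mug", {"color": "red", "weight_g": "300"}),
    (2, "Lamp", {"color": "blue", "weight_g": "1200"}),
    (3, "Pen", {"color": "green", "weight_g": "20"}),
]
for pid, name, attrs in products:
    conn.execute("INSERT INTO product VALUES (?, ?)", (pid, name))
    conn.executemany(
        "INSERT INTO product_attr VALUES (?, ?, ?)",
        [(pid, key, value) for key, value in attrs.items()],
    )
    conn.execute(
        "INSERT INTO product_slob VALUES (?, ?, ?)", (pid, name, json.dumps(attrs))
    )

# EAV: the database does the sorting, at the cost of one join per attribute used.
eav_sorted = [row[0] for row in conn.execute("""
    SELECT p.name
    FROM product p
    JOIN product_attr w ON w.product_id = p.id AND w.attr = 'weight_g'
    ORDER BY CAST(w.value AS INTEGER)
""")]

# SLOB: no joins, but the application decodes every row and sorts in memory.
rows = conn.execute("SELECT name, attrs FROM product_slob").fetchall()
slob_sorted = [
    name
    for name, _ in sorted(rows, key=lambda r: int(json.loads(r[1])["weight_g"]))
]

print(eav_sorted)   # ['Pen', 'Mug', 'Lamp']
print(slob_sorted)  # ['Pen', 'Mug', 'Lamp']
```

Both give the same order on three rows; the difference shows up in where the work happens once the product table is large.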
Our main questions:
Which pattern/structure would you use, or maybe even a different solution?
Are there better databases than MySQL available nowadays to accomplish what we want?
Thanks a lot!
Reference: How to design a product table for many kinds of product where each product has many parameters
Why limit yourself to one model? It's very possible that you'll be better off with two different models where each one meets a specific goal very well.
Assuming, as is often the case, that the two don't have to be absolutely and instantaneously in sync, you might easily end up with much better overall performance. What kind of hard requirements would you have on synchronization? Milliseconds up to a minute?
Udi Dahan has some good information on command query responsibility separation (CQRS) that's relevant. See also a couple of other articles. InfoQ also has a very relevant video of Greg Young from QCon08.
EDIT: Here's another video (by Udi Dahan) that discusses, among other things, the benefits of multiple models.
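As a very rough sketch of the two-model idea, assuming the read side is allowed to lag behind the write side: the class below keeps one structure that takes updates and a separate, pre-sorted structure that serves listing queries. The names (`ProductCatalog`, `set_price`, `refresh_read_model`, `cheapest`) are hypothetical, and this is an illustration of the general shape rather than an implementation taken from the linked material.

```python
class ProductCatalog:
    def __init__(self):
        self.products = {}   # write model: source of truth, keyed by product id
        self.by_price = []   # read model: pre-sorted rows for listing pages
        self.dirty = False   # True when the read model is behind the write model

    def set_price(self, product_id, price):
        """Command: update the write model only."""
        self.products[product_id] = price
        self.dirty = True

    def refresh_read_model(self):
        """Rebuild the read model; in practice this might run on a timer or queue."""
        self.by_price = sorted(self.products.items(), key=lambda item: item[1])
        self.dirty = False

    def cheapest(self, n):
        """Query: answered from the read model, which may be slightly stale."""
        return self.by_price[:n]


catalog = ProductCatalog()
catalog.set_price("lamp", 30)
catalog.set_price("pen", 2)
before = catalog.cheapest(1)   # []: the read model has not been refreshed yet
catalog.refresh_read_model()
after = catalog.cheapest(1)    # [('pen', 2)]
```

How long the gap between a command and the refresh may be is the synchronization requirement asked about above.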
MySQL performs very well even for very large datasets. I use it at a financial services SaaS company and it has always worked well. I have also used SQL Server and Oracle for very large applications, and MySQL performs no better or worse on the whole. My focus is more the business layer, though, and you may get more detailed opinions from people closer to the DB.
When selecting a pattern, keep in mind that it's much more straightforward to scale the application tier than the data tier (easy and cheap to add application servers). Performing many joins for common operations can cause a real performance bottleneck.
I would suggest you prototype both approaches so that you can both get more familiar with each of them, and benchmark their performance in your specific environment.
Additionally, you may want to look into alternatives to SQL that attempt to achieve a pattern similar to the ones you outline. A friend at a very large, well-known Internet company is starting to use Project Voldemort. He prefers it over similar efforts mostly due to the very active community.
From your solution, it seems you don't want to use a relational model, so perhaps it's better not to use a relational database. Take a look at the alternatives at http://nosql-database.org/ instead. By the way, SQL Server has nice SLOB features in the form of XML fields (which can be indexed and queried through XQuery).