For a many-to-many relationship, is it better to use a relational database or NoSQL?

Let's assume you have a bunch of users, and each user can have friends drawn from the same users table. So it's essentially a many-to-many relationship from the table to itself. Modeling a many-to-many relationship in a relational database requires a third table. Assuming the users table is huge, say millions of people, this third table would be gigantic if each person has more than 10 friends. Wouldn't it be more efficient (and just more intuitive overall) to store each user's friends as a JSON list in a NoSQL database, as shown below?
{"user1": {"friendslist": ["user2", "user3", "user4"]}}
{"user2": {"friendslist": ["user1", "user3", "user4"]}}
{"user3": {"friendslist": ["user1", "user2", "user4"]}}
{"user4": {"friendslist": ["user1", "user2", "user3"]}}
This is also a data structures question, so it would be B-tree vs. hash table, if I'm not mistaken.

It does seem more intuitive to the untrained. That's why the network data model is still so prevalent even though the relational model has been around for decades.
"Better" depends on how you want to use it, and "more efficient" depends on the database engine, indexes and various other factors. I prefer the relational model since I can formulate any reasonable question that can be logically derived from the data and get a correct answer. For example, if I wanted to find friends of friends, I could join a relational many-to-many table with itself. I could find cycles and cliques of any particular size. I could easily declare a unique constraint on pairs of friends.
It's possible to do these things without a relational database but I doubt it would be as easy or concise.
The particular data structure used by the database engine has nothing to do with the relational concept, though it is relevant to efficiency. For more info on which data structure would be used, you'll need to look at particular database management systems and their storage engines.
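To make the self-join and the unique constraint concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (users, friends), the sample rows, and the helper friends_of_friends are assumptions made for illustration, not anything from the question; each friendship is stored as two directed rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Junction table: one row per directed friendship (hypothetical schema).
    CREATE TABLE friends (
        user_id   INTEGER NOT NULL REFERENCES users(id),
        friend_id INTEGER NOT NULL REFERENCES users(id),
        PRIMARY KEY (user_id, friend_id)  -- unique constraint on pairs
    );
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "ann"), (2, "bob"), (3, "cat"), (4, "dan")])
conn.executemany("INSERT INTO friends VALUES (?, ?)",
                 [(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3)])

def friends_of_friends(conn, user_id):
    """Join the friends table with itself; skip the user and direct friends."""
    rows = conn.execute("""
        SELECT DISTINCT f2.friend_id
        FROM friends AS f1
        JOIN friends AS f2 ON f2.user_id = f1.friend_id
        WHERE f1.user_id = ?
          AND f2.friend_id <> f1.user_id
          AND f2.friend_id NOT IN
              (SELECT friend_id FROM friends WHERE user_id = ?)
        ORDER BY f2.friend_id
    """, (user_id, user_id)).fetchall()
    return [row[0] for row in rows]

print(friends_of_friends(conn, 1))
```

Because the pair (user_id, friend_id) is the primary key, inserting the same pair twice raises sqlite3.IntegrityError rather than silently storing a duplicate.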

Why would a relational implementation be "gigantic"? Why would your structure be "more efficient"? You are making a lot of unfounded assumptions that it would be good for you to think about. (Learn some relational basics. And the relational take on relational vs NoSQL.)
Re "intuitive", the obvious relational organization for when U friended F is a table Friended holding rows where... "U friended F". Friended(U,F) for short. If you want Us where U friended x, that's the rows where Friended(U,x), ie the rows in PROJECT U RESTRICT F x Friended, ie the rows in PROJECT U (Friended WHERE F=x), depending on whether you want to think in logic, relations or a mix. What's your query for that? Using a relational interface in terms of predicates and tables does not require or preclude any particular implementations. The entire NoSQL movement is a sad consequence of lack of understanding by users and vendors of the relational model as interface to data, not as storage structure. A DBMS for a NoSQL use case needs only to be a relational DBMS better supporting arbitrary types in querying and implementation.
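As a rough illustration of PROJECT U (Friended WHERE F=x), the sketch below treats the relation Friended(U,F) as a Python set of tuples. The function names restrict, project_u and who_friended and the sample names are hypothetical; this says nothing about how any DBMS would implement the query.

```python
# Friended(U, F): each tuple (u, f) means "u friended f" (sample data).
friended = {("ann", "bob"), ("cat", "bob"), ("bob", "dan")}

def restrict(relation, predicate):
    """RESTRICT / WHERE: keep only the tuples that satisfy the predicate."""
    return {row for row in relation if predicate(row)}

def project_u(relation):
    """PROJECT U: keep only the U attribute; the set removes duplicates."""
    return {u for (u, _f) in relation}

def who_friended(relation, x):
    """PROJECT U (Friended WHERE F = x)."""
    return project_u(restrict(relation, lambda row: row[1] == x))

print(who_friended(friended, "bob"))
```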
From my answer to Adjustable, versioned graph database:
There is an obvious 1:1 correspondence between your states at a given time and a relational database with a given schema. So there is an obvious 1:1 correspondence between your set of states over time and a changing-schema database, ie a variable whose value is a database plus metadata, manipulated by both DDL and DML update commands. So there is no evidence that you shouldn't just use a relational DBMS.
Relational DBMSs allow generic querying with automated implementation at a certain computational complexity with certain opportunities for optimization. Any application can have specialized queries that make a specialized data structure and operators a better choice. But you must design your application and know about such special aspects to justify this. As it is, with the obvious correspondences between your states and relational states, this has not been justified.
Just because you can draw a picture of your application state as of some time using a graph does not mean that you need a graph database. What matters is what specialized queries/expressions you will be evaluating. You should understand what these are in terms of your problem domain, which is probably most easily expressible per some specialized data structure and operators and relationally. Then you can compare the expressive and computational demands to a specialized data structure, a relational representation, and the models of particular graph databases.
Of course there are specialized applications where we use optimized special operators and storage. But that merits justification, and from a relational perspective should be supported by an extensible relational DBMS.

Related

How to design and store message history in a database for an IM (instant messaging) system?

By storing historical messages in persistent storage, we can achieve multi-device synchronization and message roaming.
But how should I design the table schema and divide the tables?
My first thought is that every chat group should have its own table, with the messages sent in that chat group or channel appended to it.
This way we would have lots of tables, such as table group_123, table group_345, table group_${gid}. My only concern with this method is whether having so many tables is a bad idea.
I have searched for answers, and most of them store everything in one big table, where $gid is just a field of the table.
Besides, the difference between MySQL and MongoDB in this scenario also puzzles me. I can't figure out which one is better: why use or not use MySQL, and why use or not use MongoDB?
Be very wary of any design that starts, "I'll create a table per X," because whatever X is, it's likely to become too numerous, and soon you'll have thousands of tables, and discover that just managing the metadata becomes a burden.
In general, the way to approach relational table design is to follow rules of database normalization. Your table to store messages is a set of similar objects. Normalization does not make a distinction between sets that are modest in size versus large in size. If they are the same type of thing, they go in the same table. At least that's what normalization would guide us to do.
There are practical limits of any implementation, though, and you may find the need to bend the rules of normalization, by using partitioning or sharding of various forms. Even defining indexes is not called for by normalization, but it is a good idea to help optimize queries.
That's the key: any optimization strategy must be chosen in the context of specific queries that you need to run in your application. Optimization means to improve efficiency of one type of query, at the expense of other types of queries. You cannot choose which optimization strategy is best for your application without knowing the queries.
This is also the way to choose between relational and non-relational types of databases. Non-relational databases optimize for certain query types, so you need to know which queries are most important in your application before choosing any non-relational technology, or choosing which data model once you have chosen that technology.
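As a sketch of the single-table design discussed above, with gid as an ordinary column and an index chosen for one specific query (fetching one group's history in time order), here is a small example using Python's sqlite3 module. The column names, the index name and the history helper are assumptions for illustration; a real system would pick its indexes from its own query mix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table for all groups; gid is just a column (hypothetical schema).
    CREATE TABLE messages (
        id      INTEGER PRIMARY KEY,
        gid     INTEGER NOT NULL,
        sender  TEXT    NOT NULL,
        sent_at INTEGER NOT NULL,
        body    TEXT    NOT NULL
    );
    -- Index chosen for the query "history of one group, in time order".
    CREATE INDEX idx_messages_gid_sent_at ON messages (gid, sent_at);
""")
conn.executemany(
    "INSERT INTO messages (gid, sender, sent_at, body) VALUES (?, ?, ?, ?)",
    [(123, "ann", 100, "hi"), (345, "bob", 101, "hello"),
     (123, "cat", 102, "hey"), (123, "ann", 103, "bye")])

def history(conn, gid, after=0, limit=50):
    """Return (sender, body) pairs for one group, oldest first."""
    return conn.execute(
        "SELECT sender, body FROM messages "
        "WHERE gid = ? AND sent_at > ? ORDER BY sent_at LIMIT ?",
        (gid, after, limit)).fetchall()

print(history(conn, 123))
```

Passing the last seen sent_at as after gives a simple way to fetch only the messages a device has not yet synchronized.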

Theoretical basis for CRUD Operations on data

I know that RDBMSs are based on the Relational Model, supported by Relational Algebra.
Various theoretical Relational Algebra concepts like Selection, Projection, and Joins are implemented in query languages like SQL. But these operations are primarily the R (Read) of CRUD (Create, Read, Update, Delete).
CRUD is the holy grail of programming, especially in the enterprise world.
I wanted to know on which programming-language-independent theoretical foundation (mathematical or not) INSERTs, UPDATEs, and DELETEs are modeled. Does such a theory even exist?
If it exists, it could probably explain things like constraints on databases, amongst other things.
E.g.:
You cannot update a single row (tuple) without specifying a unique column (a WHERE clause).
Or,
If a one to many relation is deleted, the entity on the many side gets deleted (the table in which the other table's primary key is housed).
For the sake of simplicity let us assume all CRUD is operated on Relational Models only.
The reason I am asking is because I need to do a deep R&D for a product that hopes to automate CRUD. I know I know people have tried and failed, but I'd still like to be pointed to some theoretical foundation please!
EDIT This will also help in the design of ORMs which can produce all CRUD operations independent of the underlying DB Model
EDIT I just found this link -> https://cs.stackexchange.com/questions/43672/a-relational-algebra-extended-to-model-the-full-dml-crud-domain This is similar to what I have to ask unfortunately the OP's question circles into a specific implementation!
In relational terms CREATE, UPDATE and DELETE operations are all assignments. E.g. inserting I into T can be accomplished by:
T = T UNION I;
Any practical relational language ought to have syntax shortcuts for these operations. See Tutorial D for example.
CRUD can be reduced to relations, relational algebra, variables and (optionally) type theory. A database is seen as a set of relation variables, similar to variables in any imperative programming language except that they hold relations rather than scalar values. Queries apply a sequence of relational algebra operators to the values stored in relation variables. Read queries return the result to the caller. Create, Update and Delete queries assign the result back to the original relation variable.
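The idea that create, update and delete are all assignments to a relation variable can be sketched with plain Python sets standing in for relations. The variable names and sample tuples are assumptions for illustration; this is not Tutorial D syntax.

```python
# A relation variable T holding a set of (id, name) tuples.
T = {(1, "ann"), (2, "bob")}

# INSERT: T := T UNION I
inserted = {(3, "cat")}
T = T | inserted

# DELETE: T := T MINUS (T WHERE id = 2)
T = T - {row for row in T if row[0] == 2}

# UPDATE: T := (T MINUS old) UNION new, where new is old with name changed
old = {row for row in T if row[0] == 1}
new = {(row_id, "anna") for (row_id, _name) in old}
T = (T - old) | new

print(sorted(T))
```

A read is the same kind of expression over T whose result is returned to the caller instead of being assigned back.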
One problem with ORMs is that they confuse rows for entities, tables for entity sets and columns for attributes. Chen's original paper stated that entities are represented by values and attributes are one-to-one relations represented by pairs of values. Another problem is trying to manipulate a row at a time when the underlying system works with sets. Another is trying to abstract over a very high-level declarative data sublanguage.
I don't want ORMs, I want my objects to talk in SQL with each other, but that's a different topic.
This is too long for a comment.
"Relational" databases only loosely implement relational algebra. The "relational" in relational algebra, for instance, refers (among other things) to the relationship between "attributes" (columns) and their values within a "tuple" (rows in a table). In most SQL databases, all rows in a table ("tuples") have the same columns. That is not a requirement for relational algebra. Another example is duplicates within tables. Relational algebra deals with sets of "tuples", where duplicates are not allowed. Yet relational databases allow duplicates in tables unless a primary key is explicitly defined.
The semantics around CRUD are driven more by the ACID properties of databases (atomicity, consistency, isolation, and durability). These properties drive the transactional semantics of relational databases.
In my experience, successful practical applications usually differ from theoretical underpinnings.

Relational Database & MyISAM

Just coming out of University, I have been taught the 'right' way of designing databases. e.g. database normalisation, how to structure tables etc.
Now I am faced with something which they didn't teach me at University...
It appears that I have a choice of 2 database engines - MyISAM or InnoDB.
I know that I can build a relational database with InnoDB storage engine, however as far as I can see, I cannot build a relational database with the MyISAM storage engine as I cannot link the tables.
So - my question - And please tell me if I am just being dumb or just missing a trick...
If I can't build a relational database with MyISAM, then what is it good for?
How do I ensure database integrity with MyISAM?
Do most people use MyISAM or InnoDB?
How do I enforce constraints between two MyISAM tables?
E.g. if I am building a small online store, I will have one table for products and one table for categories. A product must belong to 1 category. How would I build this using MyISAM?
It is true that the current version of MySQL will not enforce foreign key constraints that are defined on MyISAM tables, but that does not mean one cannot create relations between such tables (which are, after all, just a matter of holding in one table data that identifies a related record in another table): one must just be more careful to manage them properly.
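What "being more careful" might look like in application code is sketched below. It uses Python's sqlite3 module with tables that declare no foreign key at all, as a stand-in for an engine that does not enforce them; it is not MyISAM itself, and the helper names add_product and orphaned_products are hypothetical. The check-then-insert shown here is also not atomic on its own, which is part of what an enforcing engine gives you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- No FOREIGN KEY clause: the database will not check category_id.
    CREATE TABLE products (
        id          INTEGER PRIMARY KEY,
        name        TEXT    NOT NULL,
        category_id INTEGER NOT NULL
    );
""")
conn.execute("INSERT INTO categories VALUES (1, 'dairy')")

def add_product(conn, name, category_id):
    """Application-level check that the category exists before inserting."""
    found = conn.execute(
        "SELECT 1 FROM categories WHERE id = ?", (category_id,)).fetchone()
    if found is None:
        raise ValueError(f"unknown category {category_id}")
    conn.execute("INSERT INTO products (name, category_id) VALUES (?, ?)",
                 (name, category_id))

def orphaned_products(conn):
    """Audit query: products whose category_id matches no category."""
    rows = conn.execute("""
        SELECT p.name FROM products AS p
        LEFT JOIN categories AS c ON c.id = p.category_id
        WHERE c.id IS NULL
        ORDER BY p.name
    """).fetchall()
    return [row[0] for row in rows]

add_product(conn, "Cheese", 1)
# Code that bypasses add_product can still store a dangling reference.
conn.execute("INSERT INTO products (name, category_id) VALUES ('Mystery', 99)")
print(orphaned_products(conn))
```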
If enforced ACID compliance is important to you, then InnoDB is the way to go; if you can sacrifice such compliance in return for improved performance in certain situations, then MyISAM may be worth a look. You can even mix and match both storage engines within the same database to achieve a balance, if required.
There are a lot of resources discussing the pros and cons of MyISAM vs InnoDB—just search on Google (or this site) and you will find!
The relational model was developed in 1969-1970 in order to help clarify the case for building databases that conformed to that model. There is no particular reason why the relational model should be used to model data that is eventually going to go into a hierarchical database. However, it might be useful to use the relational model to help describe the data as it comes out of a hierarchical database, and before it is delivered to the DBMS client.
It's important to realize that the relational model of data is a design tool, and not really an analysis tool. The ER model of data was invented precisely for data analysis without tilting the analysis towards one particular implementation, such as a relational or SQL implementation. In the ER model as such, there are no foreign keys. It's important to understand that foreign keys are a feature of the solution, not a feature of the problem.
Maybe, in order to build a decent database using MyISAM, you need to relearn the correct way to build a database. What you learned in university may have been how to build a relational database, presuming that no students would ever have to build a hierarchical database.
Caveat: most hierarchical DBMSes have had a "relational layer" plastered on top of them, so that people who think in relational terms can use the tool without having to leave the relational model to one side. And relational DBMSes are "better" than non-relational ones, in at least some ways. The arguments made in 1969-1970 are still largely valid.

The concept of implementing key/value stores with relational database languages

I want to get my feet wet with the concept of implementing key/value stores with relational database languages (like MySQL and SQL Server).
However this is one of the times when Google isn't good enough.
Does anyone know of any good info / good links regarding the concept of implementing key/value stores with relational database languages?
Wiki for EAV http://en.wikipedia.org/wiki/Entity-attribute-value_model. SO answer with link to whitepaper called "Best Practices for Semantic Data Modeling for Performance and Scalability" EAV over SQL Server
The primary reason to do a key value schema implementation in relational is to have the flexibility of a small sub-schema with key value aspects and the main schema being relational or vice versa. This could give one extreme flexibility to address key value lookups for some portion of the application and others a traditional relational option without having to keep multiple databases.
In fact we have implemented such cases for some of our customers, where the customers are either a specific relational DB shop or for the same above mentioned reasons. You can always create a key value store in a relational database but not the other way.
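A key/value sub-schema inside a relational database can be as small as one two-column table. The sketch below uses Python's sqlite3 module; the class name KVStore, the table name kv and the column names are assumptions for illustration, and INSERT OR REPLACE is SQLite syntax (other systems spell upsert differently).

```python
import sqlite3

class KVStore:
    """A tiny key/value store kept in one relational table (a sketch)."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")

    def put(self, key, value):
        # The primary key on k makes this an overwrite for existing keys.
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return default if row is None else row[0]

    def delete(self, key):
        self.conn.execute("DELETE FROM kv WHERE k = ?", (key,))

store = KVStore(sqlite3.connect(":memory:"))
store.put("color", "red")
store.put("color", "blue")   # overwrites the earlier value
print(store.get("color"))
```

The same connection can hold ordinary relational tables next to kv, which is the mixed arrangement described above.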

What is the difference between a Relational and Non-Relational Database?

MySQL, PostgreSQL and MS SQL Server are relational database systems, and NoSQL, MongoDB, etc. are non-relational DBMSs.
What are the differences between the two types of system?
Hmm, not quite sure what your question is.
In the title you ask about Databases (DB), whereas in the body of your text you ask about Database Management Systems (DBMS). The two are completely different and require different answers.
A DBMS is a tool that allows you to access a DB.
Other than the data itself, a DB is the concept of how that data is structured.
So just as you can program in an object-oriented style with a compiler that has no OO support, or vice versa, you can set up a relational database without an RDBMS or use an RDBMS to store non-relational data.
I'll focus on what Relational Database (RDB) means and leave the discussion about what systems do to others.
A relational database (the concept) is a data structure that allows you to link information from different 'tables', or different types of data buckets. A data bucket must contain what is called a key or index (that allows to uniquely identify any atomic chunk of data within the bucket). Other data buckets may refer to that key so as to create a link between their data atoms and the atom pointed to by the key.
A non-relational database just stores data without explicit and structured mechanisms to link data from different buckets to one another.
As to implementing such a scheme, if you have a paper file with an index and in a different paper file you refer to the index to get at the relevant information, then you have implemented a relational database, albeit quite a simple one. So you see that you do not even need a computer (of course it can become tedious very quickly without one to help), similarly you do not need an RDBMS, though arguably an RDBMS is the right tool for the job. That said there are variations as to what the different tools out there can do so choosing the right tool for the job may not be all that straightforward.
I hope this is in plain enough terms and is helpful to your understanding.
Relational databases have a mathematical basis (set theory, relational theory), which is distilled into SQL (Structured Query Language).
NoSQL's many forms (e.g. document-based, graph-based, object-based, key-value store, etc.) may or may not be based on a single underpinning mathematical theory. As S. Lott has correctly pointed out, hierarchical data stores do indeed have a mathematical basis. The same might be said for graph databases.
I'm not aware of a universal query language for NoSQL databases.
Most of what you "know" is wrong.
First of all, as a few of the relational gurus routinely (and sometimes stridently) point out, SQL doesn't really fit nearly as closely with relational theory as many people think. Second, most of the differences in "NoSQL" stuff has relatively little to do with whether it's relational or not. Finally, it's pretty difficult to say how "NoSQL" differs from SQL because both represent a pretty wide range of possibilities.
The one major difference that you can count on is that almost anything that supports SQL supports things like triggers in the database itself -- i.e. you can design rules into the database proper that are intended to ensure that the data is always internally consistent. For example, you can set things up so your database asserts that a person must have an address. If you do so, anytime you add a person, it will basically force you to associate that person with some address. You might add a new address or you might associate them with some existing address, but one way or another, the person must have an address. Likewise, if you delete an address, it'll force you to either remove all the people currently at that address, or associate each with some other address. You can do the same for other relationships, such as saying every person must have a mother, every office must have a phone number, etc.
Note that these sorts of things are also guaranteed to happen atomically, so if somebody else looks at the database as you're adding the person, they'll either not see the person at all, or else they'll see the person with the address (or the mother, etc.)
Most of the NoSQL databases do not attempt to provide this kind of enforcement in the database proper. It's up to you, in the code that uses the database, to enforce any relationships necessary for your data. In most cases, it's also possible to see data that's only partially correct, so even if you have a family tree where every person is supposed to be associated with parents, there can be times that whatever constraints you've imposed won't really be enforced. Some will let you do that at will. Others guarantee that it only happens temporarily, though exactly how long it can/will last can be open to question.
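The person-must-have-an-address rule can be sketched with declarative constraints in SQLite through Python's sqlite3 module. The schema and the attempt helper are assumptions for illustration, and the example relies on SQLite's foreign key enforcement being switched on with PRAGMA foreign_keys (it is off by default in most builds).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is opt-in in SQLite
conn.executescript("""
    CREATE TABLE addresses (id INTEGER PRIMARY KEY, street TEXT NOT NULL);
    CREATE TABLE people (
        id         INTEGER PRIMARY KEY,
        name       TEXT    NOT NULL,
        -- Every person must point at an existing address.
        address_id INTEGER NOT NULL REFERENCES addresses(id)
    );
""")

def attempt(conn, sql):
    """Run one statement and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "add address": attempt(conn, "INSERT INTO addresses VALUES (1, '1 Main St')"),
    "person with address": attempt(conn, "INSERT INTO people VALUES (1, 'ann', 1)"),
    "person without address": attempt(conn, "INSERT INTO people VALUES (2, 'bob', NULL)"),
    "person at unknown address": attempt(conn, "INSERT INTO people VALUES (3, 'cat', 99)"),
    "delete address in use": attempt(conn, "DELETE FROM addresses WHERE id = 1"),
}
for label, accepted in results.items():
    print(label, "->", "accepted" if accepted else "rejected")
```

The three rejected statements are refused by the database itself, so no application code path can leave a person without a valid address.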
The relational database uses a formal system of predicates to address data. The underlying physical implementation is of no substance and can vary to optimize for certain operations, but it must always assume the relational model. In layman's terms, that's just saying I know exactly how many values (attributes) each row (tuple) in my table (relation) has, and now I want to exploit that fact accordingly, thoroughly and to its extreme. That's the true nature of the beast.
Since we're obviously the generation that has had a relational upbringing, if you look at NoSQL database models from the perspective of the relational model, again in layman's terms, the first obvious difference is that no assumption is ever made about the number of values a row can contain. This is really oversimplifying the matter and does not cleanly apply to the intricacies of the physical models of every NoSQL database, but it's the pinnacle of the relational model and the first assumption we have to leave behind or, if you'd rather, the biggest leap we have to make.
We can agree to two things that are true for every DBMS: it can store any kind of data and has enough mathematical underpinnings to make it possible to manage the data in any way imaginable. The reality is that you'll never want to make the mistake of putting any of the two points to the test, but rather just stick with what the actual DBMS was really made for. In layman's terms: respect the beast within!
(Please note that I've avoided comparing the (obviously) well founded standards revolving around the relational model against the many flavors provided by NoSQL databases. If you'd like, consider NoSQL databases as an umbrella term for any DBMS that does not completely assume the relational model, in exclusion to everything else. The differences are too many, but that's the principal difference and the one I think would be of most use to you to understand the two.)
I'll try to answer this question at a slightly more technical level.
Take MongoDB and traditional SQL for comparison, and imagine the scenario of posting a tweet on Twitter. This tweet contains 9 pictures. How do you store this tweet and its corresponding pictures?
In traditional relational SQL, you can store the tweets and pictures in separate tables, and represent the connection between them with a third table.
Alternatively, you can define a binary (BLOB) column, zip the 9 pictures into a single binary file, and store it in that column.
Using MongoDB, you could build a document like this (a document is roughly analogous to a row in a relational table):
{
  "id": "XXX",
  "user": "XXX",
  "date": "xxxx-xx-xx",
  "content": {
    "text": "XXXX",
    "picture": ["p1.png", "p2.png", "p3.png"]
  }
}
Therefore, in my opinion, the main difference is in how you store the data and at which level of storage the relationships between the data are represented.
In this example, the data is the tweet and the pictures. The different mechanisms for storing the relationship between them play an important role in the difference between the two approaches.
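For comparison with the document above, here is a rough sketch of the three-table relational version (tweets, pictures, and a linking table) using Python's sqlite3 module, plus a function that reassembles the same document shape with a join. The table and column names, the position column and the load_tweet helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweets (
        id INTEGER PRIMARY KEY, author TEXT, posted_on TEXT, body TEXT);
    CREATE TABLE pictures (id INTEGER PRIMARY KEY, filename TEXT NOT NULL);
    -- Linking table: which pictures belong to which tweet, and in what order.
    CREATE TABLE tweet_pictures (
        tweet_id   INTEGER NOT NULL REFERENCES tweets(id),
        picture_id INTEGER NOT NULL REFERENCES pictures(id),
        position   INTEGER NOT NULL,
        PRIMARY KEY (tweet_id, picture_id)
    );
""")
conn.execute("INSERT INTO tweets VALUES (1, 'ann', '2015-01-01', 'hello')")
conn.executemany("INSERT INTO pictures VALUES (?, ?)",
                 [(1, "p1.png"), (2, "p2.png"), (3, "p3.png")])
conn.executemany("INSERT INTO tweet_pictures VALUES (?, ?, ?)",
                 [(1, 1, 0), (1, 2, 1), (1, 3, 2)])

def load_tweet(conn, tweet_id):
    """Rebuild the document-shaped view of one tweet from the three tables."""
    author, posted_on, body = conn.execute(
        "SELECT author, posted_on, body FROM tweets WHERE id = ?",
        (tweet_id,)).fetchone()
    pictures = [row[0] for row in conn.execute("""
        SELECT p.filename
        FROM tweet_pictures AS tp
        JOIN pictures AS p ON p.id = tp.picture_id
        WHERE tp.tweet_id = ?
        ORDER BY tp.position
    """, (tweet_id,))]
    return {"id": tweet_id, "user": author, "date": posted_on,
            "content": {"text": body, "picture": pictures}}

print(load_tweet(conn, 1))
```

In the document model the picture list travels with the tweet; in the relational model it is derived at query time from the linking table.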
I hope this small example helps show the difference between SQL and NoSQL (ACID and BASE).
Here's a link of picture about the goals of NoSQL from the Internet:
http://icamchuwordpress-wordpress.stor.sinaapp.com/uploads/2015/01/dbc795f6f262e9d01fa0ab9b323b2dd1_b.png
The difference between relational and non-relational is exactly that. The relational database architecture provides constraint objects such as primary keys, foreign keys, etc. that allow one to tie two or more tables together in a relation. This is good because when we normalize our tables, which is to say split the information the database represents into many different tables, one can still keep the integrity of the data.
For example, say you have a series of tables that house information about an employee. You could not delete a record from one table without deleting all the records that pertain to that record from the other tables. In this way you implement data integrity. A non-relational database doesn't provide these constraint constructs that allow you to implement data integrity.
Unless you implement these constraints in the front-end application that is used to populate the database's tables, you are implementing a mess that can be compared with the wild west.
First up, let me start by saying why we need a database.
We need a database to help organise information in such a manner that we can retrieve the stored data in an efficient manner.
Examples of relational database management systems(SQL):
1)Oracle Database
2)SQLite
3)PostgreSQL
4)MySQL
5)Microsoft SQL Server
6)IBM DB2
Examples of non relational database management systems(NoSQL)
1)MongoDB
2)Cassandra
3)Redis
4)Couchbase
5)HBase
6)DocumentDB
7)Neo4j
Relational databases hold normalized data: information is stored in tables as rows and columns, and normalization helps reduce data redundancy. The data in different tables is usually related, so we can query it with join statements and retrieve exactly what we need. This suits workloads with more writes than reads and moderate amounts of data, and updating data is relatively easier than in non-relational databases. Horizontal scaling is generally difficult; vertical scaling is possible to some extent. Relational systems are usually discussed in terms of CAP (Consistency, Availability, Partition tolerance) and ACID (Atomicity, Consistency, Isolation, Durability) compliance.
Let me show entering data to a relational database using PostgreSQL as an example.
First create a product table as follows:
CREATE TABLE products (
product_no integer,
name text,
price numeric
);
then insert the data
INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', 9.99);
Let's look at another different example:
In a relational database, we can link the student table and the subject table through a foreign key (the subject ID). In a non-relational database there is no need for two documents, because no relationships are defined: we store all the subject details and student details in one document, say a student document. The data then gets duplicated, which makes updating records troublesome.
In non-relational databases there is no fixed schema and the data is not normalized. No relationships between data are created; all the data is mostly put in one document. They are well suited to handling and transferring large amounts of data at once, and work best with many reads and few writes or updates. Querying can be a bit difficult, as there is no fixed schema. Horizontal and vertical scaling are both possible. These systems are usually discussed in terms of CAP (Consistency, Availability, Partition tolerance) and BASE (Basically Available, Soft state, Eventually consistent).
Let me show an example of entering data into a non-relational database using MongoDB:
db.users.insertOne({name: 'Mary', age: 28, occupation: 'writer'})
db.users.insertOne({name: 'Ben', age: 21})
Here db refers to the current database, users is a collection in it, and insertOne is the method that adds a document to that collection. There is no fixed schema: our first record has 3 attributes and the second has only 2. This is no problem in non-relational databases, but it cannot be done in the same way in relational databases, which have a fixed schema.
Let's look at another example:
({Studname: 'Ash', Subname: 'Mathematics', LecturerName: 'Mr. Oak'})
Here we can see that in a non-relational database we can put both student details and subject details into one document, since no relationships are defined. However, this can lead to data duplication, and hence to errors when updating.
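The update problem caused by duplication can be sketched with plain Python dictionaries standing in for documents and rows. The second student ("Misty"), the new lecturer name, the subject_id field and the helper names are hypothetical additions for illustration.

```python
# Denormalized: every student document repeats the subject details.
student_docs = [
    {"Studname": "Ash", "Subname": "Mathematics", "LecturerName": "Mr. Oak"},
    {"Studname": "Misty", "Subname": "Mathematics", "LecturerName": "Mr. Oak"},
]

def rename_lecturer_in_docs(docs, subname, new_name):
    """Must touch every document that embeds the subject; returns the count."""
    changed = 0
    for doc in docs:
        if doc["Subname"] == subname:
            doc["LecturerName"] = new_name
            changed += 1
    return changed

# Normalized: subject details live in one place; students refer to them by id.
subjects = {1: {"Subname": "Mathematics", "LecturerName": "Mr. Oak"}}
students = [{"Studname": "Ash", "subject_id": 1},
            {"Studname": "Misty", "subject_id": 1}]

def lecturer_of(student):
    """Follow the reference, the in-memory analogue of a join."""
    return subjects[student["subject_id"]]["LecturerName"]

docs_changed = rename_lecturer_in_docs(student_docs, "Mathematics", "Ms. Elm")
subjects[1]["LecturerName"] = "Ms. Elm"   # a single update in the normalized form
print(docs_changed, [lecturer_of(s) for s in students])
```

If the denormalized update misses one document, the copies disagree; the normalized form has only one copy to change.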
Hope this explains everything
In layman's terms it's strongly structured vs. unstructured, which implies that you have different degrees of adaptability for your DB.
Differences arise particularly in indexing, as you need to ensure that a certain reference index can link to another item; this is a relation. The stricter structure of a relational DB comes from this requirement.
Note that NosDB apparently provides both relational and non-relational DBs and a way to query both: http://www.alachisoft.com/nosdb/sql-cheat-sheet.html