Relational Database & MyISAM - mysql

Just coming out of University, I have been taught the 'right' way of designing databases. e.g. database normalisation, how to structure tables etc.
Now I am faced with something which they didn't teach me at University...
It appears that I have a choice of 2 database engines - MyISAM or InnoDB.
I know that I can build a relational database with InnoDB storage engine, however as far as I can see, I cannot build a relational database with the MyISAM storage engine as I cannot link the tables.
So - my question - And please tell me if I am just being dumb or just missing a trick...
If I can't build a relational database with MyISAM, then what is it good for?
How do I ensure database integrity with MyISAM?
Do most people use MyISAM or INNODB?
How do I enforce constraints between two MyISAM tables?
E.g. if I am building a small online store, I will have one table for products and one table for categories. A product must belong to exactly one category. How would I build this using MyISAM?

It is true that the current version of MySQL will not enforce foreign key constraints that are defined on MyISAM tables, but that does not mean one cannot create relations between such tables (which are, after all, just a matter of holding in one table data that identifies a related record in another table): one must just be more careful to manage them properly.
If enforced ACID compliance is important to you, then InnoDB is the way to go; if you can sacrifice such compliance in return for improved performance in certain situations, then MyISAM may be worth a look. You can even mix and match both storage engines within the same database to achieve a balance, if required.
There are a lot of resources discussing the pros and cons of MyISAM vs InnoDB—just search on Google (or this site) and you will find!
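To make "one must just be more careful to manage them properly" concrete, here is a minimal sketch of an application-side check for the products/categories case from the question. It uses Python's built-in sqlite3 module purely as a stand-in (with foreign key enforcement switched off it stores the relation without enforcing it, much as MyISAM does); the table names, column names and the `add_product` helper are assumptions made for illustration.

```python
import sqlite3

# Stand-in for MyISAM: the relation is stored but the engine does not enforce it.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript("""
    -- Hypothetical schema for illustration only.
    CREATE TABLE categories (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE products (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        category_id INTEGER NOT NULL   -- the relation, held as plain data
    );
""")


def add_product(conn, name, category_id):
    """Insert a product only if its category exists (application-side check)."""
    found = conn.execute(
        "SELECT 1 FROM categories WHERE id = ?", (category_id,)
    ).fetchone()
    if found is None:
        raise ValueError(f"unknown category {category_id}")
    cur = conn.execute(
        "INSERT INTO products (name, category_id) VALUES (?, ?)",
        (name, category_id),
    )
    return cur.lastrowid


conn.execute("INSERT INTO categories (id, name) VALUES (1, 'Books')")
book_id = add_product(conn, "SQL primer", 1)
```

Note that the check and the insert are two separate statements, so without transactions another client could delete the category in between. That gap is exactly what engine-enforced foreign keys close.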

The relational model was developed in 1969-1970 in order to help clarify the case for building databases that conformed to that model. There is no particular reason why the relational model should be used to model data that is eventually going to go into a hierarchical database. However, it might be useful to use the relational model to help describe the data as it comes out of a hierarchical database, and before it is delivered to the DBMS client.
It's important to realize that the relational model of data is a design tool, and not really an analysis tool. The ER model of data was invented precisely for data analysis without tilting the analysis towards one particular implementation, such as a relational or SQL implementation. In the ER model as such, there are no foreign keys. It's important to understand that foreign keys are a feature of the solution, not a feature of the problem.
Maybe, in order to build a decent database using MyISAM, you need to relearn the correct way to build a database. What you learned in university may have been how to build a relational database, presuming that no students would ever have to build a hierarchical database.
Caveat: most hierarchical DBMSes have had a "relational layer" plastered on top of them, so that people who think in relational terms can use the tool without having to leave the relational model to one side. And relational DBMSes are "better" than non-relational ones, in at least some ways. The arguments made in 1969-1970 are still largely valid.

Related

How to design and store history message in database in IM(instant message system)?

By storing historical messages in persistent storage, we can achieve multi-device synchronization and message roaming.
But how should I design the table schema and divide the tables?
My first thought is that every chat group should have its own table, and the messages sent in that chat group or channel are appended to it.
In this way, we will have lots of tables, like table group_123, table group_345, table group_${gid}. The only question with this method is whether it is bad to split the data into so many tables.
I have searched some answers before, and they are mostly stored in one big table, where $gid is just a field of the table.
Besides, the difference in this scene between mysql and mongodb also puzzles me. I can't figure out which one is better, like why use mysql or why not use mysql or why use mongodb or why not use mongodb.
Be very wary of any design that starts, "I'll create a table per X," because whatever X is, it's likely to become too numerous, and soon you'll have thousands of tables, and discover that just managing the metadata becomes a burden.
In general, the way to approach relational table design is to follow rules of database normalization. Your table to store messages is a set of similar objects. Normalization does not make a distinction between sets that are modest in size versus large in size. If they are the same type of thing, they go in the same table. At least that's what normalization would guide us to do.
There are practical limits of any implementation, though, and you may find the need to bend the rules of normalization, by using partitioning or sharding of various forms. Even defining indexes is not called for by normalization, but it is a good idea to help optimize queries.
That's the key: any optimization strategy must be chosen in the context of specific queries that you need to run in your application. Optimization means to improve efficiency of one type of query, at the expense of other types of queries. You cannot choose which optimization strategy is best for your application without knowing the queries.
This is also the way to choose between relational and non-relational types of databases. Non-relational databases optimize for certain query types, so you need to know which queries are most important in your application before choosing any non-relational technology, or choosing which data model once you have chosen that technology.
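As a rough sketch of the "one big table" layout, where the group id is just a column, the following uses Python's sqlite3 module as a stand-in for whichever SQL database you pick. The `messages` table, its columns, the index and the two helper functions are all assumptions for illustration; the index is chosen for one specific query ("latest messages in a group"), which is the point made above about optimizing for known queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: all groups share one table.
    CREATE TABLE messages (
        id       INTEGER PRIMARY KEY,   -- assigned automatically on insert
        group_id INTEGER NOT NULL,
        sender   TEXT    NOT NULL,
        body     TEXT    NOT NULL
    );
    -- Index chosen for the query "latest messages of one group".
    CREATE INDEX idx_messages_group ON messages (group_id, id);
""")


def post(conn, group_id, sender, body):
    """Append one message to a group's history."""
    conn.execute(
        "INSERT INTO messages (group_id, sender, body) VALUES (?, ?, ?)",
        (group_id, sender, body),
    )


def recent(conn, group_id, limit=50):
    """Return the newest messages of one group, newest first."""
    rows = conn.execute(
        "SELECT sender, body FROM messages "
        "WHERE group_id = ? ORDER BY id DESC LIMIT ?",
        (group_id, limit),
    )
    return rows.fetchall()


post(conn, 123, "alice", "hi")
post(conn, 345, "carol", "other group")
post(conn, 123, "bob", "hello")
latest = recent(conn, 123)
```

If a different query dominates (say, full history per user across groups), a different index or partitioning scheme would be the better choice.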

For a many to many relationship is it better to use relational database or nosql?

Let's assume you have a bunch of users, and each user can have friends that are from the same users table. So it's essentially a many-to-many relationship of the table to itself. A many-to-many relationship in a relational database requires a third table. Now I was wondering: assuming this users table is huge, with millions of people in it, this third table would be gigantic if, say, each person has more than 10 friends. Wouldn't it be more efficient (and just overall more intuitive) for friends to be stored as a JSON list in a NoSQL store, as shown below?
{"user1": {"friendslist": ["user2","user3","user4"]}}
{"user2": {"friendslist": ["user1","user3","user4"]}}
{"user3": {"friendslist": ["user1","user2","user4"]}}
{"user4": {"friendslist": ["user1","user2","user3"]}}
So this is also a data structures question: it would be B-tree vs hash table, if I'm not mistaken.
It does seem more intuitive to the untrained. That's why the network data model is still so prevalent even though the relational model has been around for decades.
"Better" depends on how you want to use it, and "more efficient" depends on the database engine, indexes and various other factors. I prefer the relational model since I can formulate any reasonable question that can be logically derived from the data and get a correct answer. For example, if I wanted to find friends of friends, I could join a relational many-to-many table with itself. I could find cycles and cliques of any particular size. I could easily declare a unique constraint on pairs of friends.
It's possible to do these things without a relational database but I doubt it would be as easy or concise.
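For instance, the friends-of-friends self-join and the unique constraint on pairs might look roughly like this. It is a sketch using Python's sqlite3 module; the `friended` table, its column names and the sample rows are made up for illustration, and pairs are stored as directed (u friended f), which is an assumption about how you model friendship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical many-to-many table of users with themselves.
    CREATE TABLE friended (
        u TEXT NOT NULL,
        f TEXT NOT NULL,
        PRIMARY KEY (u, f)   -- each pair can be recorded only once
    );
""")
pairs = [("user1", "user2"), ("user2", "user3"), ("user2", "user4")]
conn.executemany("INSERT INTO friended (u, f) VALUES (?, ?)", pairs)


def friends_of_friends(conn, user):
    """Join the table with itself: friends of the user's friends."""
    rows = conn.execute(
        "SELECT DISTINCT b.f FROM friended AS a "
        "JOIN friended AS b ON b.u = a.f "
        "WHERE a.u = ? AND b.f <> a.u "
        "ORDER BY b.f",
        (user,),
    )
    return [r[0] for r in rows]


fof = friends_of_friends(conn, "user1")
```

With the JSON-list layout the same question needs a fetch per friend (or a denormalized copy), and nothing stops the same friend appearing twice in a list.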
The particular data structure used by the database engine has nothing to do with the relational concept, though it is relevant to efficiency. For more info on which data structure would be used, you'll need to look at particular database management systems and their storage engines.
Why would a relational implementation be "gigantic"? Why would your structure be "more efficient"? You are making a lot of unfounded assumptions that it would be good for you to think about. (Learn some relational basics. And the relational take on relational vs NoSQL.)
Re "intuitive", the obvious relational organization for when U friended F is a table Friended holding rows where... "U friended F". Friended(U,F) for short. If you want Us where U friended x, that's the rows where Friended(U,x), ie the rows in PROJECT U RESTRICT F x Friended, ie the rows in PROJECT U (Friended WHERE F=x), depending on whether you want to think in logic, relations or a mix. What's your query for that? Using a relational interface in terms of predicates and tables does not require or preclude any particular implementations. The entire NoSQL movement is a sad consequence of lack of understanding by users and vendors of the relational model as interface to data, not as storage structure. A DBMS for a NoSQL use case needs only to be a relational DBMS better supporting arbitrary types in querying and implementation.
From my answer to Adjustable, versioned graph database:
There is an obvious 1:1 correspondence between your states at a given time and a relational database with a given schema. So there is an obvious 1:1 correspondence between your set of states over time and a changing-schema database, ie a variable whose value is a database plus metadata, manipulated by both DDL and DML update commands. So there is no evidence that you shouldn't just use a relational DBMS.
Relational DBMSs allow generic querying with automated implementation at a certain computational complexity with certain opportunities for optimization. Any application can have specialized queries that make a specialized data structure and operators a better choice. But you must design your application and know about such special aspects to justify this. As it is, with the obvious correspondences between your states and relational states, this has not been justified.
Just because you can draw a picture of your application state as of some time using a graph does not mean that you need a graph database. What matters is what specialized queries/expressions you will be evaluating. You should understand what these are in terms of your problem domain, which is probably most easily expressible per some specialized data structure and operators and relationally. Then you can compare the expressive and computational demands to a specialized data structure, a relational representation, and the models of particular graph databases.
Of course there are specialized applications where we use optimized special operators and storage. But that merits justification, and from a relational perspective should be supported by an extensible relational DBMS.

The concept of implementing key/value stores with relational database languages

I want to get my feet wet with the concept of implementing key/value stores with relational database systems (like MySQL and SQL Server).
However this is one of the times when Google isn't good enough.
Does anyone know of any good info / good links regarding the concept of implementing key/value stores with relational database languages?
Wiki for EAV http://en.wikipedia.org/wiki/Entity-attribute-value_model. SO answer with link to whitepaper called "Best Practices for Semantic Data Modeling for Performance and Scalability" EAV over SQL Server
The primary reason to do a key value schema implementation in relational is to have the flexibility of a small sub-schema with key value aspects and the main schema being relational or vice versa. This could give one extreme flexibility to address key value lookups for some portion of the application and others a traditional relational option without having to keep multiple databases.
In fact we have implemented such cases for some of our customers, where the customers are either a specific relational DB shop or for the same above mentioned reasons. You can always create a key value store in a relational database but not the other way.
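As a minimal sketch of the idea, a key/value sub-schema can be a single two-column table that lives next to your ordinary relational tables. The example below uses Python's sqlite3 module as a stand-in; the `kv` table and the `put`/`get` helpers are hypothetical names, not part of any product mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical key/value table; the primary key gives one row per key.
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")


def put(conn, key, value):
    """Store a value, overwriting any earlier value for the same key."""
    # INSERT OR REPLACE is SQLite syntax; MySQL offers REPLACE INTO or
    # INSERT ... ON DUPLICATE KEY UPDATE for the same purpose.
    conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))


def get(conn, key, default=None):
    """Look a value up by key, returning default when the key is absent."""
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else default


put(conn, "theme", "dark")
put(conn, "theme", "light")   # overwrites the earlier value
```

A full EAV design adds an entity column so that many entities can each carry their own set of attributes, at the cost of harder queries and weaker typing.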

What is the difference between a Relational and Non-Relational Database?

MySQL, PostgreSQL and MS SQL Server are relational database systems, and NoSQL, MongoDB, etc. are non-relational DBMSs.
What are the differences between the two types of system?
Hmm, not quite sure what your question is.
In the title you ask about Databases (DB), whereas in the body of your text you ask about Database Management Systems (DBMS). The two are completely different and require different answers.
A DBMS is a tool that allows you to access a DB.
Other than the data itself, a DB is the concept of how that data is structured.
So just like you can program with an object-oriented methodology using a compiler without OO support, or vice versa, you can set up a relational database without an RDBMS, or use an RDBMS to store non-relational data.
I'll focus on what Relational Database (RDB) means and leave the discussion about what systems do to others.
A relational database (the concept) is a data structure that allows you to link information from different 'tables', or different types of data buckets. A data bucket must contain what is called a key or index (that allows to uniquely identify any atomic chunk of data within the bucket). Other data buckets may refer to that key so as to create a link between their data atoms and the atom pointed to by the key.
A non-relational database just stores data without explicit and structured mechanisms to link data from different buckets to one another.
As to implementing such a scheme, if you have a paper file with an index and in a different paper file you refer to the index to get at the relevant information, then you have implemented a relational database, albeit quite a simple one. So you see that you do not even need a computer (of course it can become tedious very quickly without one to help), similarly you do not need an RDBMS, though arguably an RDBMS is the right tool for the job. That said there are variations as to what the different tools out there can do so choosing the right tool for the job may not be all that straightforward.
I hope this is layman terms enough and is helpful to your understanding.
Relational databases have a mathematical basis (set theory, relational theory), which are distilled into SQL == Structured Query Language.
NoSQL's many forms (e.g. document-based, graph-based, object-based, key-value store, etc.) may or may not be based on a single underpinning mathematical theory. As S. Lott has correctly pointed out, hierarchical data stores do indeed have a mathematical basis. The same might be said for graph databases.
I'm not aware of a universal query language for NoSQL databases.
Most of what you "know" is wrong.
First of all, as a few of the relational gurus routinely (and sometimes stridently) point out, SQL doesn't really fit nearly as closely with relational theory as many people think. Second, most of the differences in "NoSQL" stuff has relatively little to do with whether it's relational or not. Finally, it's pretty difficult to say how "NoSQL" differs from SQL because both represent a pretty wide range of possibilities.
The one major difference that you can count on is that almost anything that supports SQL supports things like triggers in the database itself -- i.e. you can design rules into the database proper that are intended to ensure that the data is always internally consistent. For example, you can set things up so your database asserts that a person must have an address. If you do so, anytime you add a person, it will basically force you to associate that person with some address. You might add a new address or you might associate them with some existing address, but one way or another, the person must have an address. Likewise, if you delete an address, it'll force you to either remove all the people currently at that address, or associate each with some other address. You can do the same for other relationships, such as saying every person must have a mother, every office must have a phone number, etc.
Note that these sorts of things are also guaranteed to happen atomically, so if somebody else looks at the database as you're adding the person, they'll either not see the person at all, or else they'll see the person with the address (or the mother, etc.)
Most of the NoSQL databases do not attempt to provide this kind of enforcement in the database proper. It's up to you, in the code that uses the database, to enforce any relationships necessary for your data. In most cases, it's also possible to see data that's only partially correct, so even if you have a family tree where every person is supposed to be associated with parents, there can be times that whatever constraints you've imposed won't really be enforced. Some will let you do that at will. Others guarantee that it only happens temporarily, though exactly how long it can/will last can be open to question.
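Here is a small sketch of the "a person must have an address" rule enforced by the database itself. Note that it uses a declarative foreign key plus NOT NULL rather than a trigger, and Python's sqlite3 module as a stand-in for a full SQL server; the table names, columns and the `rejected` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked
conn.executescript("""
    -- Hypothetical schema for illustration only.
    CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT NOT NULL);
    CREATE TABLE person (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        address_id INTEGER NOT NULL REFERENCES address (id)
    );
    INSERT INTO address (id, street) VALUES (1, '1 Main St');
    INSERT INTO person (name, address_id) VALUES ('Ann', 1);
""")


def rejected(conn, sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A person pointing at an address that does not exist is refused.
no_address = rejected(conn, "INSERT INTO person (name, address_id) VALUES ('Bob', 2)")
# An address that still has people living at it cannot be deleted.
in_use = rejected(conn, "DELETE FROM address WHERE id = 1")
```

Both statements are refused by the engine, so no application code path can leave a person without an address.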
The relational database uses a formal system of predicates to address data. The underlying physical implementation is of no substance and can vary to optimize for certain operations, but it must always assume the relational model. In layman's terms, that's just saying I know exactly how many values (attributes) each row (tuple) in my table (relation) has and now I want to exploit the fact accordingly, thoroughly and to its extreme. That's the true nature of the beast.
Since we're obviously the generation that has had a relational upbringing, if you look at NoSQL database models from the perspective of the relational model, again in layman's terms, the first obvious difference is that no assumptions about the number of values a row can contain is ever made. This is really oversimplifying the matter and does not cleanly apply to the intricacies of the physical models of every NoSQL database, but it's the pinnacle of the relational model and the first assumption we have to leave behind or, if you'd rather, the biggest leap we have to make.
We can agree to two things that are true for every DBMS: it can store any kind of data and has enough mathematical underpinnings to make it possible to manage the data in any way imaginable. The reality is that you'll never want to make the mistake of putting any of the two points to the test, but rather just stick with what the actual DBMS was really made for. In layman's terms: respect the beast within!
(Please note that I've avoided comparing the (obviously) well founded standards revolving around the relational model against the many flavors provided by NoSQL databases. If you'd like, consider NoSQL databases as an umbrella term for any DBMS that does not completely assume the relational model, in exclusion to everything else. The differences are too many, but that's the principal difference and the one I think would be of most use to you to understand the two.)
Let me try to explain this at a slightly more technical level.
Take MongoDB and traditional SQL for comparison, and imagine the scenario of posting a tweet on Twitter. This tweet contains 9 pictures. How do you store this tweet and its corresponding pictures?
In a traditional relational SQL database, you can store the tweets and pictures in separate tables, and represent the connection between them with a third table.
Alternatively, you can define a field of an image (binary) type, zip the 9 pictures into one binary blob and store it in this field.
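The separate-tables option could be sketched like this, using Python's sqlite3 module as a stand-in for any SQL database. The table and column names are assumptions for illustration, and only three pictures are inserted to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: tweets, pictures, and a table connecting them.
    CREATE TABLE tweets   (id INTEGER PRIMARY KEY, author TEXT, body TEXT);
    CREATE TABLE pictures (id INTEGER PRIMARY KEY, filename TEXT NOT NULL);
    CREATE TABLE tweet_pictures (
        tweet_id   INTEGER NOT NULL REFERENCES tweets (id),
        picture_id INTEGER NOT NULL REFERENCES pictures (id),
        PRIMARY KEY (tweet_id, picture_id)
    );
""")
conn.execute("INSERT INTO tweets (id, author, body) VALUES (1, 'XXX', 'XXXX')")
for pic_id, name in enumerate(["p1.png", "p2.png", "p3.png"], start=1):
    conn.execute("INSERT INTO pictures (id, filename) VALUES (?, ?)", (pic_id, name))
    conn.execute(
        "INSERT INTO tweet_pictures (tweet_id, picture_id) VALUES (1, ?)", (pic_id,)
    )

# Reassemble the tweet's pictures with a join through the connecting table.
files = [row[0] for row in conn.execute(
    "SELECT p.filename FROM tweet_pictures AS tp "
    "JOIN pictures AS p ON p.id = tp.picture_id "
    "WHERE tp.tweet_id = ? ORDER BY p.id", (1,))]
```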
Using MongoDB, you could build a document like this (a document is roughly comparable to a row in a relational table, and a collection to the table itself):
{
  "id":"XXX",
  "user":"XXX",
  "date":"xxxx-xx-xx",
  "content":{
    "text":"XXXX",
    "picture":["p1.png","p2.png","p3.png"]
  }
}
Therefore, in my opinion, the main difference is in how you store the data and at what level of storage the relationships between the data are kept.
In this example, the data is the tweet and the pictures. The different mechanisms for storing the relationship between them also play an important role in the difference between the two.
I hope this small example helps show the difference between SQL and NoSQL (ACID and BASE).
Here's a link of picture about the goals of NoSQL from the Internet:
http://icamchuwordpress-wordpress.stor.sinaapp.com/uploads/2015/01/dbc795f6f262e9d01fa0ab9b323b2dd1_b.png
The difference between relational and non-relational is exactly that. The relational database architecture provides constraint objects such as primary keys, foreign keys, etc. that allow one to tie two or more tables together in a relation. This is good because when we normalize our tables, which is to say split information about what the database represents into many different tables, one can still keep the integrity of the data.
For example, say you have a series of tables that house information about an employee. You could not delete a record from one table without deleting all the records that pertain to that record from the other tables. In this way you implement data integrity. A non-relational database doesn't provide these constraint constructs that allow you to implement data integrity.
Unless you implement these constraints in the front-end application that is used to populate the database's tables, you are implementing a mess that can be compared with the wild west.
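One way to picture the employee example is a foreign key declared with ON DELETE CASCADE, so that deleting an employee removes the dependent rows in the same operation. This is only a sketch using Python's sqlite3 module; the `employee` and `payslip` tables and their data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # needed for SQLite to enforce FKs
conn.executescript("""
    -- Hypothetical schema for illustration only.
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE payslip (
        id          INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL
                    REFERENCES employee (id) ON DELETE CASCADE,
        amount      INTEGER NOT NULL
    );
    INSERT INTO employee (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO payslip (employee_id, amount) VALUES (1, 100), (1, 120), (2, 90);
""")

# Deleting the employee also deletes that employee's payslips.
conn.execute("DELETE FROM employee WHERE id = 1")
remaining = conn.execute("SELECT employee_id, amount FROM payslip").fetchall()
```

Leaving out ON DELETE CASCADE gives the stricter behaviour instead: the delete is refused while dependent rows exist.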
First up let me start by saying why we need a database.
We need a database to help organise information in such a manner that we can retrieve the stored data in an efficient manner.
Examples of relational database management systems(SQL):
1)Oracle Database
2)SQLite
3)PostgreSQL
4)MySQL
5)Microsoft SQL Server
6)IBM DB2
Examples of non relational database management systems(NoSQL)
1)MongoDB
2)Cassandra
3)Redis
4)Couchbase
5)HBase
6)DocumentDB
7)Neo4j
Relational databases hold normalized data: information is stored in tables as rows and columns, and normalization helps to reduce data redundancy. The data in the tables is normally related, so when we want to retrieve it we can query it using join statements as needed. This is suited to workloads with more writes and fewer reads and not much data involved; it is also relatively easy to update data in tables compared with non-relational databases. Horizontal scaling is difficult; vertical scaling is possible to some extent. Relevant concepts: CAP (Consistency, Availability, Partition tolerance) and ACID (Atomicity, Consistency, Isolation, Durability) compliance.
Let me show entering data to a relational database using PostgreSQL as an example.
First create a product table as follows:
CREATE TABLE products (
product_no integer,
name text,
price numeric
);
then insert the data
INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', 9.99);
Let's look at another different example:
In a relational database, we can link the student table and the subject table through a relationship, via a foreign key on the subject ID. In a non-relational database there is no need to have two documents, as there are no relationships, so we store all the subject details and student details in one document, say a student document. The data then gets duplicated, which makes updating records troublesome.
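As a rough sketch of the relational side of this example (using Python's sqlite3 module as a stand-in; the table names, column names and sample values are made up for illustration), the lecturer is stored once in the subject table, so a single UPDATE is seen by every student taking that subject:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data for illustration only.
    CREATE TABLE subject (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL, lecturer TEXT NOT NULL
    );
    CREATE TABLE student (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL,
        subject_id INTEGER NOT NULL REFERENCES subject (id)
    );
    INSERT INTO subject VALUES (1, 'Mathematics', 'Mr. Oak');
    INSERT INTO student VALUES (1, 'Ash', 1), (2, 'Misty', 1);
""")

# The lecturer is stored once, so one UPDATE fixes it for every student.
conn.execute("UPDATE subject SET lecturer = 'Ms. Elm' WHERE id = 1")
rows = conn.execute(
    "SELECT st.name, su.name, su.lecturer FROM student AS st "
    "JOIN subject AS su ON su.id = st.subject_id ORDER BY st.id"
).fetchall()
```

With one document per student that embeds the subject details, the same change would have to be applied to every student document separately.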
In non-relational databases there is no fixed schema and data is not normalized. No relationships between data are created; all data is mostly put in one document. They are well suited to handling lots of data and can transfer lots of data at once. They work best where there are many reads and fewer writes and updates. Data is a bit difficult to query, as there is no fixed schema. Horizontal and vertical scaling are both possible. Relevant concepts: CAP (Consistency, Availability, Partition tolerance) and BASE (Basically Available, Soft state, Eventually consistent).
Let me show an example to enter data to a non relational database using Mongodb
db.users.insertOne({name: 'Mary', age: 28, occupation: 'writer'})
db.users.insertOne({name: 'Ben', age: 21})
Here db refers to the current database, users is a collection in it, and insertOne is the method that adds a document to that collection. There is no fixed schema: our first record has 3 attributes and the second record has only 2. This is no problem in non-relational databases, but it cannot be done this way in relational databases, as they have a fixed schema.
Let's look at another different example
({Studname: 'Ash', Subname: 'Mathematics', LecturerName: 'Mr. Oak'})
Hence we can see that in a non-relational database we can enter both student details and subject details into one document, as no relationships are defined. But this approach can lead to data duplication, and hence to errors when updating.
Hope this explains everything
In layman terms it's strongly structured vs unstructured, which implies that you have different degrees of adaptability for your DB.
Differences arise particularly in indexing, as you need to ensure that a certain reference index can link to another item: this is a relation. The stricter structure of a relational DB comes from this requirement.
Note that NosDB apparently provides both relational and non-relational DBs and a way to query both: http://www.alachisoft.com/nosdb/sql-cheat-sheet.html

Database schema design for MySQL, suggestions?

So, I have the chance to build my first "real" database, and I'm thinking about what I should consider... any suggestions?
Select DB (using MySQL)
Select DB engine based on needs (using MySQL, know MyISAM vs InnoDB)
CREATE TABLE documentation for the DB being used
Column definition (select right datatype)
Model data to be normalized
Referential Integrity: Primary Key (Natural or Surrogate); Composite Keys; Foreign Key
Which leads me to my main question: how do I know if the database is a success, or a good enough fit?
As you are listing general questions that need to be answered in order to go from idea to working system I'll propose that it is very important to organize them into
1) logical design (get a clean model that represents the problem space you are trying to model)
2) physical design (which RDBMS and storage engine, exact data types, other practical and performance related decisions)
You are mixing the two too much. Once you have a clean logical model and know the relationships between the entities you are modelling, the physical modelling will not be hard.
EDIT:
There are many books that deal with the steps of logical data design, but normally you would try to:
define use cases and business requirements (things are pretty soft still, check the requirements for contradictions; this is done interviewing people who know business process well, which can degenerate to a discussion with yourself)
get a list of all the attributes and entities used across the system and define them (data dictionary)
determine the domain of the attributes (which, later at the physical level, can be implemented as a data type, a check constraint or a reference to a 'helper' table, but don't worry about this yet, just make sure that you define domains well)
draw ER/UML diagrams defining relationships - define tables in terms of primary keys, foreign keys and all other attributes (this time aim for completeness); this step can be done using CAM and decent diagramming tools will spit out CREATE DATABASE scripts from diagrams
examine the model in search for denormalized data (should be normalized already, however when translating problem space into logical model it is possible to make mistakes and discover that you have redundancy or other anomalies)
A few of these steps need to go back and forth as you consider different ways of accomplishing certain tasks. For example, including new attributes might make you go and analyze a new use case. Or the discovery of a contradicting requirement might lead to the discovery of a whole new entity. Or discovering a redundancy might lead you to an undocumented process that exists (and justifies, or rather explains, the perceived redundancy by redefining a seemingly duplicate attribute). Etc...
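To illustrate how an attribute's domain from the logical design can later be expressed at the physical level, here is a sketch using check constraints. It uses Python's sqlite3 module as a stand-in; the `orders` table, its columns and the allowed status values are assumptions for illustration. As far as I know, MySQL only enforces CHECK constraints from version 8.0.16 onwards, so on older versions a 'helper' table with a foreign key is the safer choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    -- Hypothetical table: each CHECK encodes the domain of one attribute.
    CREATE TABLE orders (
        id       INTEGER PRIMARY KEY,
        quantity INTEGER NOT NULL CHECK (quantity > 0),
        status   TEXT    NOT NULL
                 CHECK (status IN ('new', 'paid', 'shipped'))
    )
""")


def try_insert(conn, quantity, status):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO orders (quantity, status) VALUES (?, ?)",
            (quantity, status),
        )
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_insert(conn, 2, "new")
bad_quantity = try_insert(conn, 0, "new")     # outside the quantity domain
bad_status = try_insert(conn, 1, "lost")      # outside the status domain
```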
Model your data and normalise it before defining your columns. This needs to be your first task, even before selecting a database and table types, as it will give you clarity about the task you are modelling.
Select DB engine based on needs (using MySQL, know MyISAM vs InnoDB)
The trade off is "query performance" v "transactions". In most cases innoDB performance is good enough and the benefits of transactions outweigh any downside.
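To show what the "benefits of transactions" buy you in practice, here is a small sketch in which an order and the matching stock update either both happen or both get undone. It uses Python's sqlite3 module as a stand-in for a transactional engine such as InnoDB; the tables, the CHECK on stock and the `place_order` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema for illustration only.
    CREATE TABLE stock (
        product TEXT PRIMARY KEY,
        qty     INTEGER NOT NULL CHECK (qty >= 0)
    );
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        product TEXT    NOT NULL,
        qty     INTEGER NOT NULL
    );
    INSERT INTO stock VALUES ('cheese', 3);
""")


def place_order(conn, product, qty):
    """Record the order and reduce stock as one unit; undo both on failure."""
    try:
        with conn:   # commits on success, rolls back on an exception
            conn.execute(
                "INSERT INTO orders (product, qty) VALUES (?, ?)", (product, qty)
            )
            conn.execute(
                "UPDATE stock SET qty = qty - ? WHERE product = ?", (qty, product)
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = place_order(conn, "cheese", 2)    # stock goes from 3 to 1
second = place_order(conn, "cheese", 5)   # would go negative: rolled back
```

On a non-transactional engine the second call would leave an order row behind with no matching stock change, and cleaning that up becomes the application's job.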
Create table documentation is available on the MySQL website.
As Unreason says "When you get a clean logical model and know the relationships between the entities you are modelling then the physical modelling will not be hard".
Success can be measured in various ways. Money in your pocket, good performance on low priced hardware. Lots of happy comments from users ... Like Stackoverflow:)
1. Selecting an RDBMS is largely a matter of preference. You seem to be leaning towards MySQL already. That's okay, because MySQL is cheap and popular. However, before MySQL 5.6 you are left without an engine that can do transactions and full-text search at the same time (between MyISAM and InnoDB); InnoDB gained FULLTEXT indexes in 5.6. Fulltext Search with InnoDB
2. (and 4) MyISAM vs InnoDB and datatypes
MyISAM for: full-text search and table level locking
InnoDB for: transactions, FKs, and row level locking (but no full-text search before MySQL 5.6)
Also, InnoDB will probably perform better with a large number of rows because of row level locking versus table level locking
3. CREATE TABLE? I prefer to use a database IDE, like Toad for MySQL
5. (and 6) Review of DB normalization/PKs/FKs (You'll need to use InnoDB for FKs.)
7. You forgot indexes! Very important factor in a database.
What is an index in MySQL?
MySQL indexes - what are the best practises?
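As a quick illustration of why indexes matter, the sketch below asks the database how it would run two queries, one on an indexed column and one on an unindexed column. It uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in (MySQL has its own EXPLAIN with different output); the table, index name and `plan` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table with one secondary index.
    CREATE TABLE products (
        id          INTEGER PRIMARY KEY,
        name        TEXT    NOT NULL,
        category_id INTEGER NOT NULL
    );
    CREATE INDEX idx_products_category ON products (category_id);
""")


def plan(conn, sql):
    """Return SQLite's human-readable query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Filtering on the indexed column can use the index...
by_category = plan(conn, "SELECT name FROM products WHERE category_id = 7")
# ...while filtering on an unindexed column has to scan the table.
by_name = plan(conn, "SELECT name FROM products WHERE name = 'Cheese'")
```

Printing the two plans shows the index name in the first and a table scan in the second; the exact wording differs between SQLite versions.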
Yes MySQL is a good fit if you have the above requirements.
However, as I said, with MySQL/MyISAM/InnoDB before version 5.6, you don't have an engine that can do full-text search AND transactions/FKs. A simple option is to have a 2nd copy (in MyISAM) of the InnoDB tables that need full-text search capability. You can do this because you can mix the 2 engines in the same database. Or, maybe you don't even need full-text search because LIKE is sufficient for your application.
On the other hand, with SQL Server, you can have all the features, including full-text, transactions, and FKs all in one engine.
Yet another option, is to use a separate technology for indexed full-text searches. There's a plugin for MySQL:
Sphinx Search Server
Reference manual
Example:
Using the Sphinx Search Engine with MySQL