Creating relationship between table column and another database - mysql

Say I have an application that adds companies. I have a table like this:
create table companies(
id int primary key,
name varchar(255)
);
For each company, I have to store loads of business-type information. Since there is so much of it, I've decided to create one database per company to avoid collisions, very slow performance and complicated queries. The database name would be the company name in my companies table.
My problem is I would like to give the company name and the database a one-to-one relationship so they would be in sync with each other. Is that possible? If not, is there a better approach besides creating a database per company?

This is an elaboration on my comment.
Databases are designed to handle tables with millions, even billions of rows. I am guessing that your data is not that big.
Why do you want to store all the data for a single entity in a single table? Here are some reasons:
You can readily run queries across different companies.
You can define foreign key relationships between entities.
If you change the data structure, you can do it in one place.
It can be much more efficient in terms of space. Databases generally store data on data pages, and partially filled pages will eat up lots of space.
You have a single database for backup and recovery purposes.
A where clause to select a single company's data is not particularly "complicated".
(Note: This is referring to "entities", a database term. Data for the companies can still be spread across multiple tables.)
For performance, you can then adjust the data model, add indexes, and partition tables. This is sufficient for the vast majority of applications that run on databases.
There are a handful of situations where a separate database per company/client is needed. Here are some I've encountered:
You are told this is what you have to do.
Each client really is customized, so there is little common data structure among them.
Security requirements specify that data must be in different databases or even on different servers (this is a "good reason" for the first item in this list).
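To make the single-database approach concrete, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL so it can be run as-is. The `invoices` table, its columns, and the sample company names are hypothetical; only the `companies` table comes from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; everything except `companies`
# is a hypothetical name chosen for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE companies (
        id   INTEGER PRIMARY KEY,
        name VARCHAR(255) NOT NULL UNIQUE
    );
    -- One shared table for every company, keyed by company_id.
    CREATE TABLE invoices (
        id         INTEGER PRIMARY KEY,
        company_id INTEGER NOT NULL REFERENCES companies(id),
        amount     INTEGER NOT NULL
    );
    -- The index lets a per-company lookup skip other companies' rows.
    CREATE INDEX idx_invoices_company ON invoices(company_id);
""")
conn.executemany("INSERT INTO companies (id, name) VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO invoices (company_id, amount) VALUES (?, ?)",
                 [(1, 100), (1, 250), (2, 75)])


def invoices_for(company_name):
    """Return the invoice amounts that belong to one company."""
    rows = conn.execute(
        """SELECT i.amount
           FROM invoices AS i
           JOIN companies AS c ON c.id = i.company_id
           WHERE c.name = ?
           ORDER BY i.id""",
        (company_name,),
    ).fetchall()
    return [amount for (amount,) in rows]


acme_invoices = invoices_for("Acme")
```

The foreign key is what keeps the company row and its data in sync: the relationship is between rows, so no database has to be named after a company.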

Related

database, a table for each users or a big table?

I have just started learning about databases. In designing a database, I notice that a lot of recommendations, such as in this thread, suggest NOT using one table per user, but keeping all data in one big table and querying it when needed. But I still do NOT understand, because it seems that in a lot of situations one table per user is much more efficient.
Suppose I have a database for 10,000 customers to track their orders. Each customer will have very few orders, around 10. With a single table, every time a customer logs in you have to go through a big table to fetch the data for this customer; however, if you keep one table per user, you can directly get what the customer needs.
Another example: a restaurant information system tracks all restaurants' menus (say, as [foodname, price] pairs). Since each restaurant has a different number of dishes, you can't really put each menu in one row; you can only make a huge table with [foodname, price, restaurant] rows. But there are a lot of restaurants, so when a user needs the menu of a certain restaurant, you'll need to go through the data of all restaurants, which is obviously inefficient.
For both of these examples, I can't think of a good way to design a database if I don't want to create one table per user. So my question is this:
If we want to avoid a table-per-user design, how should we design a database for these kinds of situations?
SQL databases are designed exactly for the types of scenarios you are suggesting. They can handle millions or billions of rows extremely efficiently. The complications of trying to partition every customer into a separate table are vast.
The only thing you need to worry about is that you have indexes on your table so that you do not have to scan through those billion records to find the ones applicable to your customer.
Once the indexes are in place then all of your example scenarios become simple and efficient queries.
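As a rough illustration of the point about indexes, the sketch below builds the 10,000-customer orders table from the question in an in-memory SQLite database and asks the engine how it would run a per-customer lookup. The table, column, and index names are assumptions, and the exact plan text varies between SQLite versions; other engines expose a similar `EXPLAIN` facility.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        item        TEXT NOT NULL
    );
    CREATE INDEX idx_orders_customer ON orders(customer_id);
""")

# 10,000 customers with 10 orders each, as in the question.
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(c, f"item-{n}") for c in range(10_000) for n in range(10)],
)

query = "SELECT item FROM orders WHERE customer_id = ?"
customer_orders = conn.execute(query, (4242,)).fetchall()

# Ask the engine for its query plan; the last column holds the description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (4242,))]
for step in plan:
    print(step)
```

If the plan mentions `idx_orders_customer`, the lookup goes through the index rather than scanning all 100,000 rows.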
Databases are designed to do exactly the kinds of lookups you're describing efficiently, even if all users are in a single table. As long as you create an index by user ID (or have the user ID as part of the primary key), the database maintains a structure ordered by user ID (typically a B-tree), so it can find any particular user with a logarithmic-time search, much like a binary search, instead of scanning the whole table.
"Tables" don't mean exactly what you think they mean either. Tables are meant to be used to logically group data in ways that are useful for the programmer. In theory, any database you use could just consist of one big table, but it's generally easier to reason about a database if you know that rows of the User table look like this, while rows of the Message table (or whatever) look like that. In fact, many databases only actually have one big underlying "table" in which all the data lives. So, whether two users are in the "same table" or "different tables" often doesn't matter at all from an efficiency standpoint.
Database management software is written based on the assumption that you'll have a relatively small number of tables (dozens, maybe hundreds in extreme cases). So go with whatever your database's documentation recommends.

With MySQL, in a complex system, is it better to have a database with 1000 tables or multiple databases to split up the tables?

My complex system has to do with collectables. I've got all kinds... movies, books, music, action figures, hot wheels, legos, video games, etc. Each collectable type has multiple tables associated with it, with many different references between each other. This leads to 100's, possibly 1000's of tables, if this database continues to grow with new collectable types.
There are a few tables they share in common. One is the barcode table, which has a key barcode that they all use. Another is a user_collection table, which stores all of a user's collection and has a collectable_id as a key. But that's about it (I might be missing a few but you get the point).
My question is, from a performance perspective, is it better to split these up into multiple databases (movies, books, comics, etc) or keep them in one database with all of the tables in it? Or does it even matter?
And if I do the split, how would I enforce the relationships I listed above?
It is always better to have a single database. Modern database systems are designed to handle large numbers of tables. It is more difficult to query across multiple databases.
Also, you need to think about recovery. If you have tables split across multiple databases, what happens when one of the databases gets destroyed? Will you have a synchronised set of database backups - because when you restore one database you will have to restore the other database(s) to the same point.
I would definitely suggest using a single database, however, I cannot really see why you would need 1000s of tables.
For instance, you could have a table for movies and a table for books, but depending on the design, you probably won't need a separate table for each category.
More likely you would put all the items in one or two tables, and then have a categories table linked to them with a foreign key.
Even if it does grow to 100's of tables with 1000's of items, it shouldn't be a problem with a well designed database.
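A minimal sketch of the items-plus-categories layout suggested above, again using sqlite3 as a stand-in for MySQL. All table and column names are hypothetical, and the barcode is simplified to a column here rather than the shared barcode table from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE categories (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- Every collectable lives in one table and points at its category.
    CREATE TABLE items (
        id          INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES categories(id),
        title       TEXT NOT NULL,
        barcode     TEXT UNIQUE
    );
    CREATE INDEX idx_items_category ON items(category_id);
""")
conn.executemany("INSERT INTO categories (id, name) VALUES (?, ?)",
                 [(1, "movies"), (2, "books")])
conn.executemany(
    "INSERT INTO items (category_id, title, barcode) VALUES (?, ?, ?)",
    [(1, "Alien", "0001"), (1, "Blade Runner", "0002"), (2, "Dune", "0003")],
)

# A new collectable type is a new row, not a new set of tables.
conn.execute("INSERT INTO categories (id, name) VALUES (3, 'comics')")
conn.execute(
    "INSERT INTO items (category_id, title, barcode) VALUES (3, 'Watchmen', '0004')"
)


def titles_in(category_name):
    """Return the item titles in one category, in alphabetical order."""
    rows = conn.execute(
        """SELECT i.title
           FROM items AS i
           JOIN categories AS c ON c.id = i.category_id
           WHERE c.name = ?
           ORDER BY i.title""",
        (category_name,),
    ).fetchall()
    return [title for (title,) in rows]


movie_titles = titles_in("movies")
```

Attributes that really are specific to one type (say, a movie's running time) could still go in a small per-type table keyed by the item id, which keeps the table count far below one set per collectable type.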

dimensions, foreign keys, relational data

what is the difference between a relation of a relational database and a dimension as represented in a star diagram?
As part of an assignment I have a relational data warehouse design, where most of the tables have been normalised using many-to-many, one-to-one and one-to-many relationships (I think this is the right terminology? please correct me if I'm wrong). The next step is to draw a star diagram that could be used in a data-mining environment, which I guess means a fact table that draws from different dimensions...
I'm a little confused here because 1. any data analysis I could think of could be taken from the relational database, so what's the point of restructuring it? and 2. if some of the tables that you want to draw data from contain foreign keys, how do you split that into dimensions?
for example:
I have these relations:
Courses {course_id, description}
Modules {module_id, description}
Course_modules {course_id, module_id}
Students {student_id, address, enrollment_option, enrollment_date, name, surname, nationality, home_language, gender ...}
Module_grades {student_id, module_id, assignment_1, assignment_1_sub_date, assignment_2, assignment_2_sub_date, exam, exam_date, overall_result}
and I'd like to know how course results relate to module grades. With a relational database I would write a query joining the table containing student information with the module grades table. What would be the equivalent with dimensions and reports? Especially as I'm using multiple columns as my primary key in the grades relation.
An operational database is highly normalized, which improves write performance, and minimizes write anomalies. It is designed to facilitate transaction processing.
An analytic database (data warehouse) is highly denormalized, which improves read performance, and makes it easier for non DBAs to understand. It is designed to facilitate analysis.
what is the difference between a relation of a relational database and a dimension
A data warehouse can be in a relational database, and can use its relations (tables), so there is no difference.
any data-analysis I could think of could be taken from the relational database, so whats the point of re-structuring it?
A data warehouse often includes data from many sources, not just your operational database. Examples: emails, website scraping.
If you tell your boss to join ten tables to do a simple analysis, you will get fired.
If some of the tables that you want to draw data from contain foreign keys, how do you split that into dimensions.
This depends entirely on what you are trying to analyze, but in general you denormalize and copy the data to dimension tables.
Dimensional Design
You need to start with a process or event that you want to analyze.
Use Excel. Add all the columns that are pertinent to your analysis. For example, if you were analyzing the process of people visiting your website, each row in Excel would represent a site visit, and columns might be start_time, # pages visited, first page, last page, etc.
Now do ONE level of normalization. Find categorical columns that you can group together (like info about the user's web browser). These would go in a browsers dimension table. Find (true) numerical values that you cannot normalize out. These are measures; for example, the number of pages visited.
The measures, and keys that refer to your dimension tables, are your fact table.
Now go read this book.
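The site-visit example from the steps above could be sketched like this, with sqlite3 as a stand-in for whatever warehouse you use. The table names (`dim_browser`, `fact_site_visit`), the columns, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: the categorical columns about the browser, grouped together.
    CREATE TABLE dim_browser (
        browser_key INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        version     TEXT NOT NULL
    );
    -- Fact: one row per site visit, holding dimension keys and measures.
    CREATE TABLE fact_site_visit (
        visit_id      INTEGER PRIMARY KEY,
        browser_key   INTEGER NOT NULL REFERENCES dim_browser(browser_key),
        start_time    TEXT NOT NULL,
        pages_visited INTEGER NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO dim_browser (browser_key, name, version) VALUES (?, ?, ?)",
    [(1, "Firefox", "115"), (2, "Chrome", "120")],
)
conn.executemany(
    "INSERT INTO fact_site_visit (browser_key, start_time, pages_visited) "
    "VALUES (?, ?, ?)",
    [(1, "2024-01-01 09:00", 3),
     (1, "2024-01-01 10:30", 5),
     (2, "2024-01-02 14:15", 2)],
)

# A typical analytic query: a single join from the fact table to a dimension.
pages_per_browser = conn.execute(
    """SELECT b.name, SUM(f.pages_visited)
       FROM fact_site_visit AS f
       JOIN dim_browser AS b ON b.browser_key = f.browser_key
       GROUP BY b.name
       ORDER BY b.name"""
).fetchall()
```

Every analysis of visits is then one hop from the fact table to whichever dimension you want to group by, rather than a chain of joins through normalized tables.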

New Table vs. New Schema

Suppose I have a schema with many related tables: users, cities, items, purchases, etc. I now want a table in my database that contains solely event logging data for my internal support. The rows of information in the logging table are self-contained, not at all relational, and unrelated to my other tables. Is it better to create a new table in my existing schema, or to create an entirely new schema? Is one method preferred over the other? Is there a cost associated with one over the other?
In my opinion it all depends on the size of your database. If you are managing dozens of tables with millions of rows of data then you will probably have an easier time isolating and managing these logging tables in their own schema/database. If you are just managing a small app then don't worry, put everything into one database/schema. If your database is large or you anticipate it becoming large, then break them out. Once they are broken into separate entities you can easily manage the communication between the multiple databases/schemas using all kinds of great available tools.
In my opinion, if the data is unrelated, it belongs in a different schema. There's likely to be a very small overhead associated with creating a new schema, as opposed to having everything in a single schema, but I wouldn't have thought it was worth worrying about.

What is the difference between a Relational and Non-Relational Database?

MySQL, PostgreSQL and MS SQL Server are relational database systems, and NoSQL, MongoDB, etc. are non-relational DBMSs.
What are the differences between the two types of system?
Hmm, not quite sure what your question is.
In the title you ask about Databases (DB), whereas in the body of your text you ask about Database Management Systems (DBMS). The two are completely different and require different answers.
A DBMS is a tool that allows you to access a DB.
Other than the data itself, a DB is the concept of how that data is structured.
So just as you can program with an object-oriented methodology using a non-OO compiler, or vice versa, you can set up a relational database without an RDBMS or use an RDBMS to store non-relational data.
I'll focus on what Relational Database (RDB) means and leave the discussion about what systems do to others.
A relational database (the concept) is a data structure that allows you to link information from different 'tables', or different types of data buckets. A data bucket must contain what is called a key or index (which allows you to uniquely identify any atomic chunk of data within the bucket). Other data buckets may refer to that key so as to create a link between their data atoms and the atom pointed to by the key.
A non-relational database just stores data without explicit and structured mechanisms to link data from different buckets to one another.
As to implementing such a scheme, if you have a paper file with an index and in a different paper file you refer to the index to get at the relevant information, then you have implemented a relational database, albeit quite a simple one. So you see that you do not even need a computer (of course it can become tedious very quickly without one to help), similarly you do not need an RDBMS, though arguably an RDBMS is the right tool for the job. That said there are variations as to what the different tools out there can do so choosing the right tool for the job may not be all that straightforward.
I hope this is layman terms enough and is helpful to your understanding.
Relational databases have a mathematical basis (set theory, relational theory), which are distilled into SQL == Structured Query Language.
NoSQL's many forms (e.g. document-based, graph-based, object-based, key-value store, etc.) may or may not be based on a single underpinning mathematical theory. As S. Lott has correctly pointed out, hierarchical data stores do indeed have a mathematical basis. The same might be said for graph databases.
I'm not aware of a universal query language for NoSQL databases.
Most of what you "know" is wrong.
First of all, as a few of the relational gurus routinely (and sometimes stridently) point out, SQL doesn't really fit nearly as closely with relational theory as many people think. Second, most of the differences in "NoSQL" stuff have relatively little to do with whether it's relational or not. Finally, it's pretty difficult to say how "NoSQL" differs from SQL because both represent a pretty wide range of possibilities.
The one major difference that you can count on is that almost anything that supports SQL supports things like triggers in the database itself -- i.e. you can design rules into the database proper that are intended to ensure that the data is always internally consistent. For example, you can set things up so your database asserts that a person must have an address. If you do so, anytime you add a person, it will basically force you to associate that person with some address. You might add a new address or you might associate them with some existing address, but one way or another, the person must have an address. Likewise, if you delete an address, it'll force you to either remove all the people currently at that address, or associate each with some other address. You can do the same for other relationships, such as saying every person must have a mother, every office must have a phone number, etc.
Note that these sorts of things are also guaranteed to happen atomically, so if somebody else looks at the database as you're adding the person, they'll either not see the person at all, or else they'll see the person with the address (or the mother, etc.)
Most of the NoSQL databases do not attempt to provide this kind of enforcement in the database proper. It's up to you, in the code that uses the database, to enforce any relationships necessary for your data. In most cases, it's also possible to see data that's only partially correct, so even if you have a family tree where every person is supposed to be associated with parents, there can be times that whatever constraints you've imposed won't really be enforced. Some will let you do that at will. Others guarantee that it only happens temporarily, though exactly how long it can/will last can be open to question.
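The person-must-have-an-address rule can be sketched with a NOT NULL foreign key, one common way to express it in SQL (the answer mentions triggers; this sketch uses a declarative constraint instead). It runs on sqlite3, where foreign key enforcement has to be switched on explicitly. The table names and the `rejected` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE addresses (
        id     INTEGER PRIMARY KEY,
        street TEXT NOT NULL
    );
    CREATE TABLE people (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        -- NOT NULL + REFERENCES: every person must point at a real address.
        address_id INTEGER NOT NULL REFERENCES addresses(id)
    );
""")
conn.execute("INSERT INTO addresses (id, street) VALUES (1, '1 Main St')")
conn.execute("INSERT INTO people (name, address_id) VALUES ('Ann', 1)")


def rejected(sql):
    """Return True if the database refuses to run the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A person with no address at all.
no_address = rejected(
    "INSERT INTO people (name, address_id) VALUES ('Bob', NULL)")
# A person pointing at an address that does not exist.
unknown_address = rejected(
    "INSERT INTO people (name, address_id) VALUES ('Cy', 99)")
# Deleting an address while somebody still lives there.
delete_in_use = rejected("DELETE FROM addresses WHERE id = 1")
```

All three statements are refused by the database itself, with no checking code in the application. With most NoSQL stores, the equivalent checks would have to live in the code that uses the database.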
The relational database uses a formal system of predicates to address data. The underlying physical implementation is of no substance and can vary to optimize for certain operations, but it must always assume the relational model. In layman's terms, that's just saying I know exactly how many values (attributes) each row (tuple) in my table (relation) has and now I want to exploit the fact accordingly, thoroughly and to its extreme. That's the true nature of the beast.
Since we're obviously the generation that has had a relational upbringing, if you look at NoSQL database models from the perspective of the relational model, again in layman's terms, the first obvious difference is that no assumptions about the number of values a row can contain are ever made. This is really oversimplifying the matter and does not cleanly apply to the intricacies of the physical models of every NoSQL database, but it's the pinnacle of the relational model and the first assumption we have to leave behind or, if you'd rather, the biggest leap we have to make.
We can agree to two things that are true for every DBMS: it can store any kind of data and has enough mathematical underpinnings to make it possible to manage the data in any way imaginable. The reality is that you'll never want to make the mistake of putting any of the two points to the test, but rather just stick with what the actual DBMS was really made for. In layman's terms: respect the beast within!
(Please note that I've avoided comparing the (obviously) well founded standards revolving around the relational model against the many flavors provided by NoSQL databases. If you'd like, consider NoSQL databases as an umbrella term for any DBMS that does not completely assume the relational model, in exclusion to everything else. The differences are too many, but that's the principal difference and the one I think would be of most use to you to understand the two.)
Let me try to explain this at a slightly more technical level.
Take MongoDB and traditional SQL for comparison, and imagine the scenario of posting a tweet on Twitter. This tweet contains 9 pictures. How do you store this tweet and its corresponding pictures?
In a traditional relational SQL database, you can store the tweets and pictures in separate tables, and represent the connection between them with a third table.
Alternatively, you can add a field of an image (binary) type, zip the 9 pictures into a single binary file, and store it in this field.
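The separate-tables option might look like the sketch below, run on sqlite3 for convenience. The table and column names (`tweets`, `pictures`, `tweet_pictures`, `position`) are assumptions made for this example, not anything Twitter actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE tweets (
        id     INTEGER PRIMARY KEY,
        author TEXT NOT NULL,
        body   TEXT NOT NULL
    );
    CREATE TABLE pictures (
        id       INTEGER PRIMARY KEY,
        filename TEXT NOT NULL
    );
    -- The third table records which picture belongs to which tweet.
    CREATE TABLE tweet_pictures (
        tweet_id   INTEGER NOT NULL REFERENCES tweets(id),
        picture_id INTEGER NOT NULL REFERENCES pictures(id),
        position   INTEGER NOT NULL,
        PRIMARY KEY (tweet_id, picture_id)
    );
""")
conn.execute("INSERT INTO tweets (id, author, body) VALUES (1, 'XXX', 'XXXX')")
for n in range(1, 10):  # the 9 pictures from the scenario
    conn.execute("INSERT INTO pictures (id, filename) VALUES (?, ?)",
                 (n, f"p{n}.png"))
    conn.execute(
        "INSERT INTO tweet_pictures (tweet_id, picture_id, position) "
        "VALUES (1, ?, ?)",
        (n, n),
    )

# Reassembling the tweet's pictures takes a join through the linking table.
filenames = [
    name for (name,) in conn.execute(
        """SELECT p.filename
           FROM tweet_pictures AS tp
           JOIN pictures AS p ON p.id = tp.picture_id
           WHERE tp.tweet_id = ?
           ORDER BY tp.position""",
        (1,),
    )
]
```

Here the relationship lives in its own table and is resolved at query time with a join, which is the part that the document model folds into the tweet itself.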
Using MongoDB, you could build a document like this (a document is roughly analogous to a row in a relational table, and a collection to the table itself):
{
  "id":"XXX",
  "user":"XXX",
  "date":"xxxx-xx-xx",
  "content":{
    "text":"XXXX",
    "picture":["p1.png","p2.png","p3.png"]
  }
}
Therefore, in my opinion, the main difference is in how you store the data and at what level the relationships between the data are stored.
In this example, the data is the tweet and the pictures. The different mechanisms for storing the relationship between them also play an important role in the difference between the two.
I hope this small example helps show the difference between SQL and NoSQL (ACID and BASE).
Here's a link to a picture from the Internet about the goals of NoSQL:
http://icamchuwordpress-wordpress.stor.sinaapp.com/uploads/2015/01/dbc795f6f262e9d01fa0ab9b323b2dd1_b.png
The difference between relational and non-relational is exactly that. The relational database architecture provides constraint objects such as primary keys, foreign keys, etc. that allow one to tie two or more tables together in a relation. This is good because when we normalize our tables, which is to say split the information the database represents into many different tables, one can still keep the integrity of the data.
For example, say you have a series of tables that house information about an employee. You could not delete a record from one table without also deleting all the records that pertain to it in the other tables. In this way you implement data integrity. A non-relational database doesn't provide these constraint constructs that allow you to implement data integrity.
Unless you implement these constraints in the front-end application that is used to populate the database's tables, you end up with a mess comparable to the Wild West.
First up let me start by saying why we need a database.
We need a database to help organise information in such a manner that we can retrieve the stored data efficiently.
Examples of relational database management systems(SQL):
1)Oracle Database
2)SQLite
3)PostgreSQL
4)MySQL
5)Microsoft SQL Server
6)IBM DB2
Examples of non relational database management systems(NoSQL)
1)MongoDB
2)Cassandra
3)Redis
4)Couchbase
5)HBase
6)DocumentDB
7)Neo4j
Relational databases have normalized data: information is stored in tables in the form of rows and columns. Normalized data helps to reduce redundancy, and because the data in the tables is normally related, we can query it with join statements and retrieve exactly what we need. This is suited to workloads with more writes than reads and not much data involved; it is also relatively easy to update data in tables compared with non-relational databases. Horizontal scaling is difficult; vertical scaling is possible to some extent. CAP (Consistency, Availability, Partition Tolerance) and ACID (Atomicity, Consistency, Isolation, Durability) compliance.
Let me show entering data to a relational database using PostgreSQL as an example.
First create a product table as follows:
CREATE TABLE products (
product_no integer,
name text,
price numeric
);
then insert the data
INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', 9.99);
Let's look at another different example:
Here, in a relational database, we can link the student table and the subject table using a relationship via a foreign key, the subject ID. In a non-relational database there is no need for two documents, as there are no relationships, so we store all the subject details and student details in one document, say a student document. The data then gets duplicated, which makes updating records troublesome.
In non-relational databases, there is no fixed schema and data is not normalized. No relationships between data are created; all data is mostly put in one document. They are well suited to handling lots of data and transferring lots of data at once, and work best where there are many reads and few writes and updates. It is a bit difficult to query data, as there is no fixed schema. Horizontal and vertical scaling are possible. CAP (Consistency, Availability, Partition Tolerance) and BASE (Basically Available, Soft state, Eventually consistent) compliance.
Let me show an example of entering data into a non-relational database using MongoDB:
db.users.insertOne({name: 'Mary', age: 28, occupation: 'writer'})
db.users.insertOne({name: 'Ben', age: 21})
Here db refers to the current database, users is a collection in it, and insertOne is the method that adds a document to that collection. There is no fixed schema: our first record has 3 attributes and the second has only 2. This is no problem in non-relational databases, but it cannot be done in relational databases, as they have a fixed schema.
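To see the fixed-schema side of this contrast in runnable form, here is a small sketch. A plain Python list of dicts stands in for the MongoDB collection (an assumption made so the example needs no server), and sqlite3 plays the relational database with a two-column `users` table.

```python
import sqlite3

# Stand-in for a MongoDB collection: each document is just a dict,
# so documents are free to carry different sets of keys.
users_collection = []
users_collection.append({"name": "Mary", "age": 28, "occupation": "writer"})
users_collection.append({"name": "Ben", "age": 21})

# The relational table declares its columns up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.execute("INSERT INTO users (name, age) VALUES (?, ?)", ("Ben", 21))

try:
    # The table has no `occupation` column, so this insert is refused.
    conn.execute(
        "INSERT INTO users (name, age, occupation) VALUES (?, ?, ?)",
        ("Mary", 28, "writer"),
    )
    schema_error = None
except sqlite3.OperationalError as exc:
    schema_error = str(exc)

print(schema_error)
```

To store Mary's occupation relationally you would first change the schema, for example with `ALTER TABLE users ADD COLUMN occupation TEXT`, after which every row has that column (NULL for Ben).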
Let's look at another different example
({Studname: 'Ash', Subname: 'Mathematics', LecturerName: 'Mr. Oak'})
Hence we can see that in a non-relational database we can enter both student details and subject details into one document, as no relationships are defined. But this can lead to data duplication, and hence to errors when updating.
Hope this explains everything
In layman's terms it's strongly structured vs unstructured, which implies that you have different degrees of adaptability for your DB.
Differences arise in indexing in particular, as you need to ensure that a certain reference index can link to another item; this is a relation. The stricter structure of a relational DB comes from this requirement.
Note that NosDB apparently provides both relational and non-relational DBs and a way to query both: http://www.alachisoft.com/nosdb/sql-cheat-sheet.html