How do databases physically store data on a filesystem? - mysql

I need to know how databases store their data on a filesystem. I am sure that different databases store data in different ways, but I want to know what the general rule is (if there is one) and what can be changed in the settings of a particular DB.
How is the whole database stored? In one big file, or one file per table?
What if a table is enormous? Would it be split into a few files?
What is the typical file size in that case?

The answer to this question is both database dependent and implementation dependent. Here are some examples of how data can be stored:
As a single file per database. (This is the default for SQL Server.)
Through a separate storage manager, which could be the operating system's file system. (MySQL offers several storage engines, such as InnoDB and MyISAM.)
Using separate files for each table. (If we consider Access a database.)
As multiple physical files, spread across multiple file systems, but represented as a single "file". (Hive, for instance, uses a distributed file system to store the data.)
However, these are the default configurations. Real databases typically let you split the data among multiple physical devices. SQL Server and MySQL call these partitions; Oracle calls them tablespaces. They are typically set up by knowledgeable DBAs who understand the performance requirements of the system.
The final questions are easy to answer, though. Most databases give you the option of either growing the databases as space is needed or giving the database a fixed (or fixed maximum) size. I have not encountered a database engine that will split the underlying data into multiple files automatically, although it is possible that newer column oriented databases (such as Vertica) do something similar.
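To make the "single file per database" layout concrete, here is a minimal sketch using SQLite, which ships with Python and keeps a whole database in one file. It is only an illustration of that one layout, not of how MySQL or SQL Server behave; the table names and the helper function are made up for the example.

```python
import os
import sqlite3
import tempfile

def files_after_creating_tables(table_names):
    """Create a SQLite database with the given tables and list the files on disk."""
    with tempfile.TemporaryDirectory() as workdir:
        db_path = os.path.join(workdir, "example.db")
        conn = sqlite3.connect(db_path)
        for name in table_names:
            # Table names are trusted, hard-coded values in this sketch.
            conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.commit()
        conn.close()
        # Every table lives inside the same database file.
        return sorted(os.listdir(workdir))

print(files_after_creating_tables(["users", "orders", "invoices"]))
```

However many tables you create, the directory listing still shows one database file, which is the first layout in the list above.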

Related

mySQL performance one huge database vs small many

I am developing a site that has many subdomains in it.
It has blogging module, management system, and many more. I have shared this question in various sites but couldn't get a proper reply.
The question is: should I use one database for all the modules, which means my database would have nearly 100 tables? Is this approach appropriate, or should I create a separate database for every module?
Well, it does not really matter.
If you use InnoDB with a single data file (the innodb_file_per_table setting is not enabled), then all data will be stored in a single file anyway.
With InnoDB in file-per-table mode, or with the MyISAM table engine, the only difference between one database and several is really the directory where the database files are stored. Unless the directories (databases) are located on different storage devices with different speeds, their performance will be the same.
There can be 2 reasons to keep some tables in a different database:
Security: MySQL versions before 8.0 do not support role-based access control. Therefore, if there is a group of tables that should be accessible to a certain group of users only, access control is more manageable if those tables are in a different database.
If some of the modules you mentioned happen to use the same table name, then you will have to move them to a separate database or you need to modify the code and table names to avoid errors.
There is no right or wrong way to design a system. Just advantages and disadvantages to the various techniques. I normally work in Oracle and SQL Server so I had to look up some terms for MySQL. According to my research, in MySQL a database is synonymous with a schema which changes things. I'd consider these things when planning the physical design for any vendor:
Security - Do all subdomains need read/write to each other? How are the users secured? Choosing one or many schemas can impact how easy schema and user security is to manage and control.
Growth - Do some subdomains grow at a faster rate than others? If yes, I'd consider separating them to allow for the different growth rates.
Organization - Is it easier to identify the different subdomains in practice if they're separated? If you don't separate them, use a strong naming convention so you can easily identify objects within one subdomain.
Linking - How easy is it to access one schema/database from another?
Hope this helps.

MySQL performance across databases vs within the same database

In our software, we share information across installations.
Currently we use staging tables within the same database to facilitate this. We use stored procedures to pull certain data from the live tables into the staging tables, and then dump them. This dump then gets loaded into the staging tables of the target database, and stored procedures merge in the data.
This works fine, but for a few reasons*, I'm considering moving this from staging tables to a separate staging database. I'm just concerned about whether or not this will have any performance implications.
Having very quickly tried this (just as a thought exercise) on a couple of differently configured systems, I've come up with differing results. One (with not much data, and running MySQL 5.6) showed no real difference, possibly even slightly faster. The other (with much more data, running MySQL 5.5) showed it to be about 1.5 times slower.
I'm well aware that there are likely configuration options that affect this, but I'm no DBA, so any pointers would be much appreciated.
TL;DR
What performance implications might there be in inserting data into tables in a different database (on the same server), compared to within the same database. Will it depend on MySQL version, or configuration settings?
* If you're interested in 'reasons', I can let you know in the comments
Let me start by saying that you cannot conduct tests the way you did.
Databases, and MySQL among them, rely on hardware. There are a number of MySQL variables that can be tuned to turn it from a snail into a Formula 1 car.
Moreover, you tested on separate systems, each of which either runs on different hardware or contains different data. By default, MySQL uses the InnoDB storage engine. In MySQL 5.5, InnoDB by default stores all data in a single shared file; from MySQL 5.6.6 onward, innodb_file_per_table is enabled by default and each table gets its own file. Either way, from a "down to the core" perspective, MySQL does not really care whether a table sits in the same database or a different one: the database only determines the directory a table's file lives in, or makes no difference at all when everything is in the one shared file. From there on we get to whether the file is fragmented (if on a mechanical disk) and to many other interesting details that can't be covered in a single answer.
That brings us to next issue - databases exist to store structured data.
Databases are not glorified text files. They exist so we can ask them to cross-examine data and give us meaningful results to questions.
That means we design databases and tables in ways that correspond to certain logic. If it makes sense to store those staging tables into database A, then store it. If it makes sense to do some other thing - do the other thing.
From a performance point of view, it depends hugely on how your servers are configured, from the OS to the MySQL variables to the hardware (especially the HDD) you have. Without knowing what's going on there down to the last detail, no one can tell you "Yes, it is faster", "No, it is slower" or "It is the same". If they do, they're lying. We basically lack the information needed to tell you with certainty which approach is better.

BLOB vs FileSystem

Although this question has appeared in previous posts, different scenarios and considerations determine which option is best.
I need to implement a system that can handle 200GB - 400GB of images yearly (approximately < 1MB per image). They are P&C images that only authorised personnel are allowed to access, and to VIEW only. I am planning to use an application-based system to INSERT into the MySQL database and a PHP web-based application for VIEW only.
I am thinking of using the FILESYSTEM because it is easy to back up and restore the images, and there is no need to worry about the size of the MySQL database.
I am using MySQL + Apache + PHP running in Windows Server.
Your advice and input is very much appreciated.
Thank you.
Regards,
Desmond
Also worth reading:
Best Practice in File Storage while Building Applications - Database (Blob Storage) Vs File System
BLOB Storage as the Best Solution
For better scalability. Although file systems are designed to handle a large number of objects of varying sizes, say files and folders, actually they are not optimized for a huge number (tens of millions) of small files. Database systems are optimized for such scenarios.
For better availability. Database servers have availability features that extend beyond those provided by the file system. Database replication is a set of solutions that allow you to copy, distribute, and potentially modify data in a distributed environment whereas Log shipping provides a way of keeping a stand-by copy of a database in case the primary system fails.
For central repository of data with controlled growth. DBA has the privilege to control and monitor the growth of database and split the database as and when needed.
For full-text index and search operations. You can index and search certain types of data stored in BLOB columns. When a database designer decides that a table will contain a BLOB column and the column will participate in a full-text index, the designer must create, in the same table, a separate character-based data column that will hold the file extension of the file in the corresponding BLOB field. During the full-text indexing operation, the full-text service looks at the extensions listed in the character-based column (.txt, .doc, .xls, etc.), applies the corresponding filter to interpret the binary data, and extracts the textual information needed for indexing and querying.
File System Storage as the Best Solution
For applications in which the images require streaming performance, such as real-time video playback.
For applications such as Microsoft PhotoDraw® or Adobe PhotoShop, which only know how to access files.
If you want to use some specific feature in the NTFS file system such as Remote Storage.
Objects smaller than 256K are best stored in a database, while objects larger than 1M are best stored in the filesystem. Between 256K and 1M, the read:write ratio and rate of object overwrite or replacement are important factors.
Source:
http://research.microsoft.com/apps/pubs/default.aspx?id=64525
Edit: It is MS SQL, so MAYBE same as Mysql :)
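The size thresholds quoted above can be written down as a small rule of thumb. This is only a sketch of that quoted guidance (which, as noted, was measured on MS SQL, so treat it as a starting point for MySQL); the function name and the return strings are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

def suggest_storage(size_bytes):
    """Map an object size to the rule of thumb quoted above (256K / 1M thresholds)."""
    if size_bytes < 256 * KIB:
        return "database"
    if size_bytes > 1 * MIB:
        return "filesystem"
    # Between the two thresholds the quoted guidance gives no fixed answer.
    return "depends on read:write ratio and overwrite rate"

for size in (100 * KIB, 512 * KIB, 5 * MIB):
    print(size, "->", suggest_storage(size))
```

Images of just under 1MB, as in the question, land in the in-between band, so the access pattern matters more than the size alone.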

How does MySQL store data

I looked around Google but didn't find any good answers. Does it store the data in one big file? What methods does it use to make data access quicker than just reading and writing to a regular file?
This question is a bit old, but I decided to answer it anyway since I have been doing some digging on the same topic. My answer is based on the Linux file system. Basically, MySQL stores data in files on your hard disk. It stores the files in a specific directory given by the system variable "datadir". Opening a MySQL console and running the following command will tell you exactly where the folder is located.
mysql> SHOW VARIABLES LIKE 'datadir';
+---------------+-----------------+
| Variable_name | Value |
+---------------+-----------------+
| datadir | /var/lib/mysql/ |
+---------------+-----------------+
1 row in set (0.01 sec)
As you can see from the output above, my "datadir" is /var/lib/mysql/. The location of the "datadir" may vary between systems. The directory contains folders and some configuration files. Each folder represents a MySQL database and contains the files holding the data for that specific database.
Each database folder contains files that represent the tables in that database. There are two files for each table, one with a .frm extension and the other with a .ibd extension. (MySQL 8.0 removed .frm files in favor of a transactional data dictionary, so the .frm part applies to earlier versions.)
The .frm table file stores the table's format. Details: MySQL .frm File Format
The .ibd file stores the table's data. Details: InnoDB File-Per-Table Tablespaces
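As a rough sketch of the layout just described, the following helper builds the file paths you would expect for a table. It assumes file-per-table InnoDB, a pre-8.0 server (for the .frm file), and plain table names without special characters; the function name and the "shop"/"orders" names are hypothetical.

```python
from pathlib import PurePosixPath

def expected_table_files(datadir, database, table):
    """Return the paths where a table's files would be expected under datadir."""
    base = PurePosixPath(datadir) / database   # one folder per database
    return {
        "definition": str(base / f"{table}.frm"),  # table format (pre-8.0 only)
        "data": str(base / f"{table}.ibd"),        # table data and indexes
    }

print(expected_table_files("/var/lib/mysql/", "shop", "orders"))
```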
That’s it folks! I hope I helped someone.
Does it store the data in one big file?
Some DBMSes store the whole database in a single file, some split tables, indexes and other object kinds to separate files, some split files not by object kind but by some storage/size criteria, some can even entirely bypass the file system, etc etc...
I don't know which one of these strategies MySQL uses (it probably depends on whether you use MyISAM vs. InnoDB etc.), but fortunately, it doesn't matter: from the client perspective, this is a DBMS implementation detail the client should rarely worry about.
What methods does it use to make data access quicker than just reading and writing to a regular file?
First of all, DBMSes are not just about performance:
They are even more about the safety of your data - they have to ensure there is no data corruption even in the face of a power cut or a network failure.1
DBMSes are also about concurrency - they have to arbitrate between multiple clients accessing and potentially modifying the same data.2
As for your specific question of performance, relational data is very "susceptible" to indexing and clustering, which is richly exploited by DBMSes to achieve performance. On top of that, the set-based nature of SQL lets the DBMS choose the optimal way to retrieve the data (in theory at least, some DBMSes are better at that than the others). For more about DBMS performance, I warmly recommend: Use The Index, Luke!
Also, you probably noticed that most DBMSes are rather old products. Like decades old, which is really eons in our industry's terms. One consequence of that is that people had plenty of time to optimize the heck out of the DBMS code base.
You could, in theory, achieve all these things through files, but I suspect you'd end-up with something that looks awfully close to a DBMS (even if you had the time and resources to actually do it). So, why reinvent the wheel (unless you didn't want the wheel in the first place ;) )?
1 Usually through some kind of "journaling" or "transaction log" mechanism. Furthermore, to minimize the probability of "logical" corruption (due to application bugs) and promote code reuse, most DBMSes support declarative constraints (domain, key and referential), triggers and stored procedures.
2 By isolating transactions and even by allowing clients to explicitly lock specific portions of the database.
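The safety points in the footnotes can be seen in a small experiment. The sketch below uses SQLite (bundled with Python) rather than MySQL, purely because it runs anywhere; the accounts table, the names and the helper functions are invented for the example. A declarative CHECK constraint rejects a bad update, and the transaction rolls back the half of the work that had already succeeded.

```python
import sqlite3

def make_accounts_db():
    """Create an in-memory database with a constraint that balances stay >= 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as one transaction; undo both on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint fired; the credit was rolled back too

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = make_accounts_db()
print(transfer(conn, "alice", "bob", 500), balances(conn))
print(transfer(conn, "alice", "bob", 30), balances(conn))
```

Getting the same all-or-nothing behaviour with plain files would mean writing your own journal, which is the "reinventing the wheel" point made above.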
Technically everything is a "file", including folders; your entire hard drive is one giant file. Having said that, yes, relational databases, MySQL included, store data in a data file on the hard drive. Comparing a database with writing to and reading from a file is comparing apples and oranges. Databases provide a structured way to store, search and retrieve data that you could never replicate by just reading and writing to a file, unless you wrote your own DB, of course.
hope that helps.
When you store data in a flat file, it is compact and efficient to read sequentially, but there is no fast way to access it randomly. This is especially true of variable-length data such as documents, names or strings. To allow for fast random access, most databases store information in a single file using a data structure called a B-Tree. This structure makes insertion, deletion and search fast, but it can use up to 50% more space than the original file. Typically, however, this is not an issue, as disk space is cheap and plentiful, while the primary tasks usually require fast access.
For more information:
http://en.wikipedia.org/wiki/B-tree
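To illustrate the random-access problem with variable-length records, here is a minimal sketch. It is not a B-Tree: a sorted list of (key, byte offset) pairs searched with binary search stands in for one, and an in-memory BytesIO object stands in for the flat file. The record format and all names are assumptions made for the example.

```python
import bisect
import io

def build_offset_index(flat_file):
    """Scan the file once, recording (key, byte offset) for each 'key<TAB>value' line."""
    index = []
    flat_file.seek(0)
    while True:
        offset = flat_file.tell()
        line = flat_file.readline()
        if not line:
            break
        key = line.split(b"\t", 1)[0]
        index.append((key, offset))
    index.sort()  # ordered by key so it can be binary-searched
    return index

def lookup(flat_file, index, key):
    """Find a record by key: binary search in the index, then one seek into the file."""
    pos = bisect.bisect_left(index, (key, 0))
    if pos == len(index) or index[pos][0] != key:
        return None
    flat_file.seek(index[pos][1])
    line = flat_file.readline()
    return line.rstrip(b"\n").split(b"\t", 1)[1]

data = io.BytesIO(
    b"carol\tLondon\n"
    b"alice\tA much longer, variable-length value\n"
    b"bob\tx\n"
)
index = build_offset_index(data)
print(lookup(data, index, b"bob"))
print(lookup(data, index, b"dave"))
```

Without the index, finding "bob" means reading every record before it, because variable-length records give you no way to compute where a given record starts.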
Looking carefully into the MySQL docs, we find that indices may be optionally set to "BTREE" or "HASH" type. Inside a single MySQL file, multiple indices are stored which may use either data structure.
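The practical difference between the two index types is that a hash structure answers equality lookups only, while an ordered structure also answers range queries. A rough sketch of that idea, with a Python dict standing in for a HASH index and a sorted list with binary search standing in for a BTREE index (the rows and names are made up):

```python
import bisect

rows = {17: "q", 3: "c", 42: "z", 8: "h", 25: "y"}  # hypothetical key -> row

hash_index = dict(rows)      # equality lookups only
sorted_keys = sorted(rows)   # ordered keys, a stand-in for a B-tree

def range_scan(low, high):
    """Return the keys in [low, high] using binary search on the ordered keys."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:end]

print(hash_index[42])      # equality lookup
print(range_scan(5, 30))   # range query needs the ordering
```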
Although safety and concurrency are important, they are not WHY databases exist, but added features. The very first databases existed because it is not possible to randomly access a sequential file containing variable-length data.
