MYSQL data on multiple drives - mysql

I have a MySQL database on my SDA. It's mostly all one schema with "popular" tables in it. I want to store the less "popular" tables of the schema (which take up another 1TB or so) on my SDB partition.
What is the right way to do this? Do I need another MySQL server running on that drive? Or can I simply set something like DATA_DIRECTORY=? This is Ubuntu and MySQL 5.7.38. Thank you for any help, it's much appreciated.

As of MySQL 8.0.21, the ability to specify the data directory per table has finally improved.
CREATE TABLE t1 (c1 INT PRIMARY KEY) DATA DIRECTORY = '/external/directory';
Read https://dev.mysql.com/doc/refman/8.0/en/innodb-create-table-external.html#innodb-create-table-external-data-directory for details.
In earlier versions of MySQL, you could use symbolic links. That is, the link still has to reside under the default data directory, but the link can point to a file on another physical device.
It was unreliable to use symbolic links for individual tables in this way, because OPTIMIZE TABLE or many forms of ALTER TABLE would recreate the file without the symbolic link, effectively moving it back to the primary storage device. To solve this, it was recommended to use a symbolic link for the schema subdirectory instead of individual tables.
To be honest, I've never found a case where I needed to use either of these techniques. Just keep it simple: one data directory on one filesystem, and don't put the data directory on the same device as the root filesystem. Make sure the data storage volume is large enough for all your data. Use software RAID if you need to use multiple devices to make one larger filesystem.

Related

Best way to architect a system to check if file already exists, and get it

I have a backend Express server that needs to check whether an image is already stored on the server. The end user can upload any type of image to check if it's on the server.
The images on the server live in one folder that contains lots of other folders, each of which contains images. There could be lots of images, perhaps 1000 in each folder, so I don't want to go through them and check each one manually.
I was thinking that I would initially get the hash of each image and store this along with the unique image name in a database table.
This way, when the user uploads an image I can hash that image, then query the database table and see if it exists there. Would there be an issue with speed here for a table that could contain 200k entries, for example? If it was MySQL, what would be the best way to see if it contains the hash and get back the location?
Does this seem like the right approach?
Thanks
I think that hashing the image would work. You could store that mapping in a relational DB table and create an index on the image hash column to improve the lookup time.
I would also consider a NoSQL approach here, as it seems to be a better fit for that particular use case. Redis would handle 200K rows just fine, I think.
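The hash-and-lookup idea discussed above can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the table name image_index, the helper names and the sample path are all made up for the example.

```python
import hashlib
import sqlite3


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the image bytes."""
    return hashlib.sha256(data).hexdigest()


def build_index(conn: sqlite3.Connection) -> None:
    # The PRIMARY KEY on the hash column gives an indexed lookup, so the
    # table is searched through the index rather than scanned row by row.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_index ("
        "hash TEXT PRIMARY KEY, path TEXT NOT NULL)"
    )


def register(conn: sqlite3.Connection, data: bytes, path: str) -> None:
    # Record where an image lives, keyed by the hash of its content.
    conn.execute(
        "INSERT OR IGNORE INTO image_index (hash, path) VALUES (?, ?)",
        (file_digest(data), path),
    )


def find(conn: sqlite3.Connection, data: bytes):
    # Hash the uploaded bytes and return the stored path, or None.
    row = conn.execute(
        "SELECT path FROM image_index WHERE hash = ?", (file_digest(data),)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")  # hypothetical in-memory stand-in database
build_index(conn)
register(conn, b"fake image bytes", "cats/0001.png")
print(find(conn, b"fake image bytes"))
print(find(conn, b"other bytes"))
```

Note that a cryptographic hash only matches byte-identical files; the same picture re-encoded or resized would produce a different hash.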

How does MySQL store data

I looked around Google but didn't find any good answers. Does it store the data in one big file? What methods does it use to make data access quicker than just reading and writing to a regular file?
This question is a bit old but I decided to answer it anyway since I have been doing some digging on the same. My answer is based on the Linux file system. Basically, MySQL stores data in files on your hard disk. It stores the files in a specific directory whose path is held in the system variable "datadir". Opening a mysql console and running the following command will tell you exactly where the folder is located.
mysql> SHOW VARIABLES LIKE 'datadir';
+---------------+-----------------+
| Variable_name | Value           |
+---------------+-----------------+
| datadir       | /var/lib/mysql/ |
+---------------+-----------------+
1 row in set (0.01 sec)
As you can see from the above command, my "datadir" was located in /var/lib/mysql/. The location of the "datadir" may vary between systems. The directory contains folders and some configuration files.
Each folder in the directory represents a MySQL database. Each database folder contains files that represent the tables in that database. For an InnoDB table using file-per-table tablespaces there are two files, one with a .frm extension and the other with a .ibd extension. (MySQL 8.0 and later no longer create .frm files; table definitions live in the data dictionary instead.)
The .frm table file stores the table's format. Details: MySQL .frm File Format
The .ibd file stores the table's data. Details: InnoDB File-Per-Table Tablespaces
That’s it folks! I hope I helped someone.
Does it store the data in one big file?
Some DBMSes store the whole database in a single file, some split tables, indexes and other object kinds to separate files, some split files not by object kind but by some storage/size criteria, some can even entirely bypass the file system, etc etc...
I don't know which one of these strategies MySQL uses (it probably depends on whether you use MyISAM vs. InnoDB etc.), but fortunately, it doesn't matter: from the client perspective, this is a DBMS implementation detail the client should rarely worry about.
What methods does it use to make data access quicker than just reading and writing to a regular file?
First of all, DBMSes are not just about performance:
They are even more about safety of your data - they have to ensure there is no data corruption even in the face of a power cut or a network failure. [1]
DBMSes are also about concurrency - they have to arbitrate between multiple clients accessing and potentially modifying the same data. [2]
As for your specific question of performance, relational data is very "susceptible" to indexing and clustering, which is richly exploited by DBMSes to achieve performance. On top of that, the set-based nature of SQL lets the DBMS choose the optimal way to retrieve the data (in theory at least, some DBMSes are better at that than the others). For more about DBMS performance, I warmly recommend: Use The Index, Luke!
Also, you probably noticed that most DBMSes are rather old products. Like decades old, which is really eons in our industry's terms. One consequence of that is that people had plenty of time to optimize the heck out of the DBMS code base.
You could, in theory, achieve all these things through files, but I suspect you'd end-up with something that looks awfully close to a DBMS (even if you had the time and resources to actually do it). So, why reinvent the wheel (unless you didn't want the wheel in the first place ;) )?
[1] Usually through some kind of "journaling" or "transaction log" mechanism. Furthermore, to minimize the probability of "logical" corruption (due to application bugs) and promote code reuse, most DBMSes support declarative constraints (domain, key and referential), triggers and stored procedures.
[2] By isolating transactions and even by allowing clients to explicitly lock specific portions of the database.
Technically everything is a "file", including folders; your entire hard drive is one giant file. Having said that, yes, relational databases, MySQL included, store data in data files on the hard drive. Comparing a database with reading and writing a plain file is comparing apples and oranges. Databases provide a structured way to store, search and retrieve data that you could never replicate by just reading and writing to a file, unless you wrote your own DB, of course.
hope that helps.
When you store data in a flat file, it is compact and efficient to read sequentially, but there is no fast way to access it randomly. This is especially true of variable-length data such as documents, names or strings. To allow for fast random access, many databases organize stored information using a data structure called a B-tree. This structure makes insertion, deletion, and search fast, but it can use up to 50% more space than the original file. Typically, however, this is not an issue, as disk space is cheap and plentiful, while the primary tasks usually require fast access.
For more information:
http://en.wikipedia.org/wiki/B-tree
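To make the random-access point concrete, here is a small sketch. It is not a B-tree: a sorted key list searched with binary search stands in for one, and the record format (key:value) and names are invented for the example. The idea is the same, though: an ordered index maps each key to a byte offset, so a lookup jumps straight to the record instead of scanning the variable-length data from the start.

```python
import bisect

# Variable-length records stored back to back, as in a flat file.
records = [b"carol:PM", b"alice:Engineer", b"bob:Designer in Residence"]
flat_file = b"".join(records)

# Build the index: (key, byte offset, length) for every record, sorted by key.
entries = []
pos = 0
for rec in records:
    key = rec.split(b":", 1)[0]
    entries.append((key, pos, len(rec)))
    pos += len(rec)
entries.sort()
keys = [e[0] for e in entries]


def lookup(key: bytes):
    """Binary-search the index, then slice the record out of the file."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        _, start, length = entries[i]
        return flat_file[start:start + length]
    return None


print(lookup(b"bob"))
print(lookup(b"dave"))
```

A real B-tree keeps the same ordered-lookup property while also staying cheap to update on disk, which a plain sorted array does not.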
Looking carefully into the MySQL docs, we find that indices may be optionally set to "BTREE" or "HASH" type. Inside a single MySQL file, multiple indices are stored which may use either data structure.
Although safety and concurrency are important, these are not WHY databases exist, but added features. The very first databases existed because it is not possible to randomly access a sequential file containing variable-length data.

Can I move MySQL table to a second drive?

I am having I/O related performance problems that would be solved if a few relatively small tables were running on a SSD. I can't move the entire DB to SSD because it is much too large.
I thought this was possible (map specific tables to different drives) but a tech at my managed hosting company says that the entire DB needs to be in a single directory. Is this correct? If he's wrong, can someone point me somewhere with basic instructions on how this is done? Or even provide the instructions here?
When you create a MySQL table you can specify the data directory and index directory.
Have a look at http://dev.mysql.com/doc/refman/5.1/en/create-table.html
So, to answer your question, you could create a new table in the different directory and migrate your data there.

Should I use MySQL blob field type?

I am struggling to decide if I should be using the MySQL blob field type in an upcoming project I have.
My basic requirements are, there will be certain database records that can be viewed and have multiple files uploaded and "attached" to those records. Seeing said records can be limited to certain people on a case by case basis. Any type of file can be uploaded with virtually no restriction.
So looking at it one way, if I go the MySQL route, I don't have to worry about viruses creeping up or random PHP files getting uploaded and somehow executed. I also have a much easier path for permissioning and keeping data tied close to a record.
The other obvious route is storing the data in a specific folder structure outside of the webroot. In this case I'd have to come up with a special naming convention for folders/files to keep track of what they reference inside the database.
Is there a performance hit with using MySQL blob field type? I'm concerned about choosing a solution that will hinder future growth of the website as well as choosing a solution that won't be easy to maintain.
Is there a performance hit with using MySQL blob field type?
Not inherently, but if you have big BLOBs clogging up your tables and memory cache that will certainly result in a performance hit.
The other obvious route is storing the data in a specific folder structure outside of the webroot. In this case I'd have to come up with a special naming convention for folders/files to keep track of what they reference inside the database.
Yes, this is a common approach. You'd usually do something like have folders named after each table they're associated with, containing filenames based only on the primary key (ideally an integer; certainly never anything user-submitted).
Is this a better idea? It depends. There are deployment-simplicity advantages to having only a single data store, and not having to worry about giving the web user write access to anything. Also, if there might be multiple copies of the app running (e.g. active-active load balancing) then you need to synchronise the storage, which is much easier with a database than it is with a filesystem.
If you do use the filesystem rather than a blob, the question is then, do you get the web server to serve it by pointing an Alias at the folder?
+ is super fast
+ caches well
- extra server config: virtual directory; needs appropriate file extension to return desired Content-Type
- extra server config: need to add Content-Disposition: attachment/X-Content-Type-Options headers to stop IE sniffing for HTML as part of anti-XSS measures
or do you serve the file manually by having a server-side script spit it out, as you would have to serving from a MySQL blob?
- is potentially slow
- needs a fair bit of manual If-Modified-Since and ETag handling to cache properly
+ can use application's own access control methods
+ easy to add correct Content-Type and Content-Disposition headers from the serving script
This is a trade-off there's not one globally-accepted answer for.
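For the "serve it from a script" option, the manual cache handling and headers mentioned above might look something like this sketch. It is framework-independent and simplified: the function names are made up, only an exact single-value If-None-Match match is handled, and the filename is assumed to have been sanitised already.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # A strong ETag derived from the content; HTTP requires the quotes.
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(body: bytes, content_type: str, filename: str, if_none_match=None):
    """Return (status, headers, payload) for a GET of an uploaded file."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None and if_none_match == etag:
        # The client already holds this exact content: send no body.
        return 304, headers, b""
    headers.update({
        "Content-Type": content_type,
        # Force a download and stop the browser from sniffing for HTML.
        "Content-Disposition": f'attachment; filename="{filename}"',
        "X-Content-Type-Options": "nosniff",
    })
    return 200, headers, body


status, headers, payload = respond(b"hello", "text/plain", "note.txt")
print(status, headers["ETag"])
status_again, _, payload_again = respond(
    b"hello", "text/plain", "note.txt", if_none_match=headers["ETag"]
)
print(status_again, payload_again)
```

The application's own access-control check would run before respond() is called, which is the main attraction of this route.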
If your web server will be serving these uploaded files over the web, the performance will almost certainly be better if they are stored on the filesystem. The web server will then be able to apply HTTP caching hints such as Last-Modified and ETag which will help performance for users accessing the same file multiple times. Additionally, the web server will automatically set the correct Content-Type for the file when serving. If you store blobs in the database, you'll end up implementing the above mentioned features and more when you should be getting them for free from your web server.
Additionally, pulling large blob data out of your database may end up being a performance bottleneck on your database. Also, your database backups will probably be slower because they'll be backing up more data. If you're doing ad-hoc queries during development, it'll be inconvenient seeing large blobs in result sets for select statements. If you want to simply inspect an uploaded file, it will be inconvenient and roundabout to do so because it'll be awkwardly stored in a database column.
I would stick with the common practice of storing the files on the filesystem and the path to the file in the database.
In my experience storing a BLOB in MySQL is OK, as long as you store only the blob in one table, while other fields are in another (joined) table. Conversely, searching in the fields of a table with a few standard fields and one blob field with 100 MB of data can slow queries dramatically.
I had to change the data layer of a mailing app for this issue where emails were stored with content in the same table as date sent, email addresses, etc. It was taking 9 secs to search 10000 emails. Now it takes what it should take ;-)
Data should be stored in one consistent place: the database.
This performance and Content-Type thing is not an issue at all, because there is nothing stopping you from caching those BLOB fields to the local web server and serving it from there as it is requested for the first time. You do not need to access that table on every page view.
This file system cache can be emptied out at any moment, which will only impact performance temporarily as it is being refilled automagically. It will also enable you to use one database and many web servers as your application grows, they will simply all have a local cache on the file system.
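The local cache described above could be sketched like this. The function names, the .bin naming scheme and the fake_fetch stand-in for the real database query are all assumptions for illustration.

```python
import os
import tempfile


def cached_blob(cache_dir: str, blob_id: int, fetch_from_db) -> bytes:
    """Serve a blob from the local cache, filling it from the DB on a miss."""
    path = os.path.join(cache_dir, f"{blob_id}.bin")
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        pass  # cache miss: fall through to the database
    data = fetch_from_db(blob_id)
    # Write to a temp file and rename, so readers never see a partial file.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)
    return data


calls = []


def fake_fetch(blob_id: int) -> bytes:
    # Hypothetical stand-in for "SELECT the BLOB column WHERE id = ...".
    calls.append(blob_id)
    return f"blob-{blob_id}".encode()


with tempfile.TemporaryDirectory() as cache_dir:
    first = cached_blob(cache_dir, 7, fake_fetch)
    second = cached_blob(cache_dir, 7, fake_fetch)

print(first, second, calls)
```

Deleting the cache directory at any time is safe in this scheme; the next request simply goes back to the database. Invalidation when a blob changes is not handled here.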
Many people recommend against storing file attachments (usually this applies to images) in blobs in the database. Instead they prefer to store a pathname as a string in the database, and store the file somewhere safe on the filesystem. There are some merits to this:
Database and database backups are smaller.
It's easier to edit files on the filesystem if you need to work with them ad hoc.
Filesystems are good at storing files. Databases are good at storing tuples. Let each one do what it's good at.
There are counter-arguments too, that support putting attachments in a blob:
Deleting a row in a database automatically deletes the associated attachment.
Rollback and transaction isolation work as expected when data is in a row, but not when some part of the data is on the filesystem.
Backups are simpler if all data is in the database. No need to worry about making consistent backups of data that's changing concurrently during the backup procedure.
So the best solution depends on how you're going to be using the data in your application. There's no one-size-fits-all answer.
I know you tagged your question with MySQL, but if folks reading this question use other brands of RDBMS, they might want to look into BFILE when using Oracle, or FILESTREAM when using Microsoft SQL Server 2008. These give you the ability to store files outside the database but access them like they're part of a row in a database table (more or less).
Large volumes of data will eventually take their toll on performance. MS SQL 2008 has a specialized way of storing binary data in the file system:
http://msdn.microsoft.com/en-us/library/cc949109.aspx
I would employ a similar approach for your project too.
You can create a FILES table that keeps information about the files, such as their original names. To store files safely on disk, rename them using, for example, GUIDs. Store the new file names in your FILES table, and when a user needs to download a file you can easily locate it on disk and stream it to the user.
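A rough sketch of that idea, with a plain dict standing in for the FILES table and made-up function names:

```python
import os
import tempfile
import uuid


def store_upload(storage_dir: str, original_name: str, data: bytes,
                 files_table: dict) -> str:
    """Save the upload under a random name and remember the original name."""
    stored_name = uuid.uuid4().hex  # never derived from user input
    with open(os.path.join(storage_dir, stored_name), "wb") as f:
        f.write(data)
    files_table[stored_name] = original_name
    return stored_name


def read_upload(storage_dir: str, stored_name: str, files_table: dict):
    """Return (original name, content) for a previously stored upload."""
    with open(os.path.join(storage_dir, stored_name), "rb") as f:
        return files_table[stored_name], f.read()


files_table = {}  # hypothetical stand-in for the FILES table
with tempfile.TemporaryDirectory() as storage_dir:
    stored = store_upload(storage_dir, "report.pdf", b"%PDF-demo", files_table)
    name, content = read_upload(storage_dir, stored, files_table)

print(name, content)
```

Because the on-disk name is random, an uploaded evil.php never ends up on disk under that name.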
In my opinion storing files in the database is a bad idea. What you can store there is the id, name, type, possibly an md5 hash of the file, and the date inserted. Files can be uploaded into a folder outside the public location. You should also be aware that it is not advised to keep more than 1000 files in one folder, so you have to create a new folder each time the file id increases by 1000.
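The folder-per-1000-ids scheme can be expressed in a couple of lines. The root folder name "uploads" and the function name are just placeholders, and the 1000 limit is the rule of thumb from the answer above rather than a hard filesystem limit.

```python
FILES_PER_FOLDER = 1000  # rule of thumb from the answer, not a hard limit


def folder_for(file_id: int, root: str = "uploads") -> str:
    """Map a numeric file id to the folder that should hold it."""
    bucket = file_id // FILES_PER_FOLDER
    return f"{root}/{bucket}"


print(folder_for(999))
print(folder_for(1000))
print(folder_for(123456))
```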