MySQL DB normalization - mysql

I've got a single table DB with 100K rows. There are about 30 columns and 28 of them are varchars / tiny text and one of them is an int primary key and one of them is a blob.
My question is: in terms of performance, would it be better to separate the blob from the rest of the table and store it in its own table with a foreign key constraint to the primary id?
The table will eventually be turned into a sqlite persistent store for iOS core data and a lot of the searching / filtering will be done based on the NSPredicate for the lighter varchar columns.
Sorry if this is too subjective, but I'm thinking there is a recommended way.
Thanks!

If you do SELECT * FROM table (which you shouldn't if you don't actually need the BLOB field) then yes, with the BLOB moved out the query will be faster, because in that case the pages holding the BLOB won't be touched.
If you do frequent SELECT f1, f2, f3 FROM table (all fields are non-BLOBs) then yes, storing BLOBs in a separate table will make the query faster for the same reason: MySQL will have to read fewer pages.
If however the BLOB is selected frequently then it makes no sense to keep it separately.
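A minimal sketch of the split being discussed, using Python's built-in sqlite3 module as a stand-in since the table is headed for SQLite anyway. The table and column names (item, item_blob, name, payload) are made up for illustration, and MySQL's syntax and storage behavior will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Hypothetical schema: light columns in one table, the blob in another.
conn.executescript("""
    CREATE TABLE item (
        id   INTEGER PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE item_blob (
        item_id INTEGER PRIMARY KEY REFERENCES item(id),
        payload BLOB
    );
""")

conn.execute("INSERT INTO item (id, name) VALUES (?, ?)", (1, "first"))
conn.execute(
    "INSERT INTO item_blob (item_id, payload) VALUES (?, ?)",
    (1, b"\x00\x01\x02"),
)

# Filtering on the light columns never references item_blob.
light_rows = conn.execute(
    "SELECT id, name FROM item WHERE name = ?", ("first",)
).fetchall()

# The blob is fetched only when needed, via a join on the key.
blob_row = conn.execute(
    "SELECT i.name, b.payload FROM item AS i "
    "JOIN item_blob AS b ON b.item_id = i.id WHERE i.id = ?",
    (1,),
).fetchone()
```

Whether this is actually faster than a single table depends on how often the blob is selected, so it is worth measuring with real data.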

This totally depends on data usage.
If you need the data every time you query the table, there is no difference in having a separate table for it (as long as blob data is unique in each row - that is, "as long as the database is normalized").
If you don't need the blob data but only metadata from other columns, there may be a speed bonus when querying if the blob has its own table. Querying the blob data is slower though, as you need to query both tables.
The USUAL way is not to store any blob data inside the database (at least not huge data), but to store the binary data in files and keep the file path in the database instead. This is recommended because binary data most likely doesn't benefit from being inside a DBMS (it's not indexable, sortable, groupable, ...), so there is no drawback to storing it in files, while the database isn't optimized for binary data ('cause, again, it can't do much with it anyway).
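A rough sketch of the file-path approach, again with Python's sqlite3 module standing in for the real database. The files table, its columns, and the helper names save_binary and load_binary are assumptions made for this example only.

```python
import os
import sqlite3
import tempfile


def save_binary(conn, storage_dir, name, data):
    """Write data to a file and record only its path in the database."""
    path = os.path.join(storage_dir, name)
    with open(path, "wb") as fh:
        fh.write(data)
    cur = conn.execute(
        "INSERT INTO files (name, path) VALUES (?, ?)", (name, path)
    )
    return cur.lastrowid


def load_binary(conn, file_id):
    """Look up the stored path, then read the bytes from disk."""
    row = conn.execute(
        "SELECT path FROM files WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None:
        return None
    with open(row[0], "rb") as fh:
        return fh.read()


conn = sqlite3.connect(":memory:")
# Hypothetical table: metadata and a path, no blob column at all.
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, path TEXT)")

storage_dir = tempfile.mkdtemp()
file_id = save_binary(conn, storage_dir, "preview.bin", b"\x89PNG")
```

One trade-off of this design is that the files and the rows can drift apart (a row pointing at a deleted file, or the reverse), so the application has to keep them in sync itself.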

In MySQL, blobs are stored on disk; only the pointer to the storage is kept in memory. Moving the blob to another table with a foreign key will not noticeably help your performance. I don't know if this is the case for SQLite.

Related

Blob column needs to be moved to a different table for speed?

I have 30 columns in a table which will be frequently accessed. Out of the 30 columns, 3 are of type BLOB; however, they will be very rarely used.
Do we need to split the columns to different tables especially the blob?
We have some slowness but not sure whether it is due to having all blob columns in the same table.
Please advise.
Thanks
Thanks, Cascader
Most sources suggest a separate table for blobs.
Reference:
Speed of mysql query on tables containing blob depends on filesystem cache
containing-blob-depends-on-filesystem-cache
Many people recommend keeping the blob in a separate table with only a primary key, and storing the blob's metadata in another table with a foreign key to the blob table. With this layout the performance will be considerably higher.

Is it correct to have a BLOB field directly in the main table?

Which one is better: having a BLOB field in the same table or having a 1-TO-1 reference to it in another table?
I'm making a MySQL database whose main table is called item(ID, Description). This table is queried by a program I'm developing in VB.NET, which lets the user double-click a specific item returned by a query. Once the item's dedicated form is open, I would like to show an image stored in the BLOB field, a sort of item preview. The problem is I don't know where it is better to create this BLOB field.
Assuming a table like this: Item(ID, Description, BLOB), will the BLOB field affect the database performance on queries like:
SELECT ID, Description FROM Item;
If yes, what do you think about this solution:
Item(ID, Description)
Images(Item, File)
Where Images.Item references Item.ID, and File is the BLOB field.
You can add the BLOB field directly to your main table, as BLOB fields are not stored in-row and require a separate look-up to retrieve their contents. Your dependent table is needless.
BUT another and preferred way is to store in your database table only a pointer (the path to the file on the server) to your image file. In this way you can retrieve the path and access the file from your VB.NET application.
To quote the documentation about blobs:
Each BLOB or TEXT value is represented internally by a separately allocated object. This is in contrast to all other data types, for which storage is allocated once per column when the table is opened.
In simpler terms, the blob's data isn't stored inside the table's row, only a pointer is - which is pretty similar to what you're trying to achieve with the secondary table. To make a long story short - there's no need for another table, MySQL already does the same thing internally.
Most of what has been said in the other Answers is correct. I'll start from scratch, adding some caveats.
The two-table, 1-1, design is usually better for MyISAM, but not for InnoDB. The rest of my Answer applies only to InnoDB.
"Off-record" storage may happen to BLOB, TEXT, and 'large' VARCHAR and VARBINARY, almost equally.
"Large" columns are usually stored "off-record", thereby providing something very similar to your 1-1 design. However, having InnoDB do the work usually leads to better performance.
The ROW_FORMAT and the size of the column make a difference.
A "small" BLOB may be stored on-record. Pro: no need for the extra fetch when you include the blob in the SELECT list. Con: clutter.
Some ROW_FORMATs cut off at 767 bytes.
Some ROW_FORMATs store 20 bytes on-record; this is just a 'pointer'; the entire blob is off-record.
etc, etc.
Off-record is beneficial when you need to filter out a bunch of rows, then fetch only a few. Also, when you don't need the column.
As a side note, TINYTEXT is possibly useless. There are situations where the 'equivalent' VARCHAR(255) performs better.
Storing an image in the table (on- or off-record) is arguably unwise if that image will be used in an HTML page. HTML is quite happy to request the <img src=...> from your server or even some other server. In this case, a smallish VARCHAR containing a url is the 'correct' design.

Does text or blob fields slow down the access to the table

I have a table that contains text and blob fields and some others. I'm wondering if using these data types would slow down access to the table.
CREATE TABLE post
(
id INT(11),
person_id INT(11),
title VARCHAR(120),
date DATETIME,
content TEXT,
image BLOB
);
Let's say I have more than 100,000 posts and I want to run a query like
SELECT * FROM post WHERE post.date >= ? AND post.person_id = ?
Would the query be faster if the table did not contain TEXT and BLOB fields?
Yes or no.
If you don't fetch the text/blob fields, they don't slow down SELECTs. If you do, then they slow things down in either or both of these ways:
In InnoDB, TEXT and BLOB data, if large enough, is stored in a separate area from the rest of the columns. This may necessitate an extra disk hit. (Or may not, if it is already cached.)
In complex queries (more complex than yours), the Optimizer may need to make a temporary table. Typical situations: GROUP BY, ORDER BY and subqueries. If you are fetching a text or blob, the temp table cannot be MEMORY, but must be the slower MyISAM.
But the real slowdown is that you probably do not have this composite index: INDEX(person_id, date). Without it, the query might choose to gather up the text/blob (buried in the *) and haul it around, only to later discard it.
Action items:
Make sure you have that composite index.
If you don't need content for this query, don't use *.
If you need a TEXT or BLOB, use it; the alternatives tend to be no better. Using "vertical partitioning" ("splitting the table", as mentioned by #changepicture) is no better in the case of InnoDB. (It was a useful trick with MyISAM.) InnoDB is effectively "doing the split for you".
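The first two action items can be sketched as follows. This uses Python's sqlite3 module rather than MySQL so it runs anywhere; the index name idx_post_person_date and the sample rows are made up, and SQLite's EXPLAIN QUERY PLAN stands in for MySQL's EXPLAIN, whose output looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (
        id        INTEGER PRIMARY KEY,
        person_id INTEGER,
        title     TEXT,
        date      TEXT,
        content   TEXT,
        image     BLOB
    );
    -- The composite index: equality column first, range column second.
    CREATE INDEX idx_post_person_date ON post (person_id, date);
""")

conn.executemany(
    "INSERT INTO post (person_id, title, date, content, image) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "old", "2020-01-01 00:00:00", "text a", b"\x00"),
        (1, "new", "2021-06-01 00:00:00", "text b", b"\x01"),
        (2, "other", "2021-06-01 00:00:00", "text c", b"\x02"),
    ],
)

# Name only the columns that are needed instead of using *.
query = "SELECT id, title, date FROM post WHERE person_id = ? AND date >= ?"
params = (1, "2021-01-01 00:00:00")

# Ask the planner how it would run the query; the last field of each
# plan row is a human-readable description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
plan_text = " ".join(row[-1] for row in plan)

rows = conn.execute(query, params).fetchall()
```

If plan_text mentions the index name, the filter is being resolved through the index rather than by scanning every row.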
In my opinion, the short answer is yes. But there's more to it of course.
If you have good indexes then MySQL will locate the data very fast, but because the data is big it will take longer to send it.
In general, smaller tables and numeric column types provide better performance.
And never do "SELECT *"; it's just bad practice, and in your case it's even worse. What if you only need the title and date? Instead of transferring a little data, you transfer it all.
Consider splitting the table, meta data in one table and content and image in another table. This way going through the first table is very fast and only when you need the data from the second table will you access it. You will have a one-to-one relationship using this table structure.

indexing varchars without duplicating the data

I have a huge data set (~1 billion records) in the following format
|KEY(varchar(300),UNIQUE,PK)|DATA1(int)|DATA2(bool)|DATA4(varchar(10))|
Currently the data is stored in a MyISAM MySQL table, but the problem is that the key data (10G out of the 12G table size) is stored twice - once in the table and once in the index. (The data is append-only; there won't ever be an UPDATE query on the table.)
There are two major actions that run against the data-set :
contains - Simple check if a key is found
count - Aggregation (mostly) functions according to the data fields
Is there a way to store the key data only once?
One idea I had is to drop the DB all together and simply create 2-5 char folder structure.
This way the data assigned to the key "thesimon_wrote_this" would be stored in the fs as
~/data/the/sim/on_/wro/te_/thi/s.data
This way the data set will function much as btree and the "contains" and data retrieval functions will run in almost O(1) (with the obvious HDD limitations).
This makes the backups pretty easy (backing up only files with the A attribute), but the aggregating functions become almost useless, as I would need to grep 1 billion files every time. The allocation unit size is irrelevant, as I can adjust the file structure so that only 5% of the disk space is taken without use.
I'm pretty sure that there is another - much more elegant way to do that, I can't Google it out :).
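For reference, the folder layout described in the question can be sketched in a few lines of Python. The function names key_to_path and contains, the 3-character chunk size, and the .data suffix follow the example path above but are otherwise assumptions; keys containing path separators would need escaping, which this sketch does not handle.

```python
import os


def key_to_path(key, chunk=3, suffix=".data"):
    """Split key into fixed-size chunks; all but the last become directories."""
    if not key:
        raise ValueError("key must be non-empty")
    parts = [key[i:i + chunk] for i in range(0, len(key), chunk)]
    # The final chunk names the file itself.
    return "/".join(parts[:-1] + [parts[-1] + suffix])


def contains(root, key):
    """The 'contains' check becomes a single filesystem lookup."""
    return os.path.isfile(os.path.join(root, key_to_path(key)))
```

This mirrors the "contains" action only; the aggregation problem raised in the question remains, since every count still has to visit the files.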
It would seem like a very good idea to consider having a fixed-width, integral key, like a 64-bit integer. Storing and searching a varchar key is very slow by comparison! You can still add an additional index on the KEY column for fast lookup, but it shouldn't be your primary key.
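A minimal sketch of that layout, using Python's sqlite3 module as a stand-in for MySQL. The table name records, the column key_str, and the index name are invented for the example. Note that the varchar key is still present both in the table and in its secondary index, so this addresses lookup and join speed rather than the duplication itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (
        id      INTEGER PRIMARY KEY,   -- fixed-width integer key
        key_str TEXT NOT NULL,         -- the original varchar key
        data1   INTEGER,
        data2   INTEGER,
        data4   TEXT
    );
    -- Additional index on the string key for fast lookups.
    CREATE UNIQUE INDEX idx_records_key ON records (key_str);
""")

conn.execute(
    "INSERT INTO records (key_str, data1, data2, data4) VALUES (?, ?, ?, ?)",
    ("thesimon_wrote_this", 7, 1, "abc"),
)


def contains(conn, key):
    """Return True if the string key exists, using the unique index."""
    row = conn.execute(
        "SELECT 1 FROM records WHERE key_str = ? LIMIT 1", (key,)
    ).fetchone()
    return row is not None
```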

Maximum Data Length in a MySQL field

I'm designing a new forum for my company and I wanted to confirm that saving the forum posts in a MySQL database would be scalable and perform well.
The posts may have around 400 characters (maybe I will limit them to 400 chars). If I save 400 chars of text in a MySQL field and the table has 10 million rows, will it affect performance?
My main constraint is performance. Can someone please shed light on this?
There are two data types to consider: VARCHAR or TEXT.
Which data type you choose depends on:
How frequently you display it
Total number of characters you store
TEXT and BLOB are stored off the table, with the table just holding a pointer to the location of the actual storage.
VARCHAR is stored inline with the table. VARCHAR is faster when the size is reasonable; which one is faster overall depends on your data and your hardware, so you'd want to benchmark a real-world scenario with your data.
VARCHAR (stored inline) is usually faster IF the data is frequently retrieved (included by most queries). However, for a large volume of data that is not normally retrieved (that is, not referenced by any query), then it may be better to not have the data stored inline. There is an upper limit on the row size, for data stored inline.
When a table has TEXT or BLOB columns, the table can't be stored in memory. This means every query (which doesn't hit cache) has to access the file system - which is orders of magnitude slower than the memory.
If your post content is large, use a TEXT field, but store the TEXT field in a separate table which is only accessed when you actually need it. This way the original table can be stored in memory and will be much faster.
Think of it as separating the data into one "memory table" and one "file table". The reason for doing this is to avoid accessing the filesystem except when necessary (i.e. only when you need the text).
You can try (posts, post_text) or (post_details, posts) or something like that.