There are 210 columns in my table, with around 10,000 rows. Each row is unique and there is a primary key on the table. The thing is, we always have to run a SELECT * query on the table to get the data for all the sites.
Currently, the problem is that it takes too much time, and the data returned is around 10 MB; it will only grow larger in the future.
The table has VARCHAR, TEXT, and DATE columns in it.
Is there any way I can modify the structure, or something else, to make retrieval faster? More indexing, or breaking down the table? (Although I thought denormalized data was good for retrieval.)
Update: "why do wider tables slow down the query performance?"
Thanks..!
Why do wider tables slow down query performance?
InnoDB stores "wide" tables in a different way. Instead of having all the columns together in a single string (plus overhead, such as lengths, etc), it does the following:
If the total of all the columns for a given row exceeds about 8KB, it will move some of the data to another ("off-record") storage area.
Which columns are moved off-record depends on the sizes of the columns, etc.
The details depend on the ROW_FORMAT chosen.
"Off-record" is another 16KB block (or blocks).
Later, when doing SELECT * (or at least fetching the off-record column(s)), it must do another disk fetch.
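If you want to see which row format a table uses, and get a rough feel for how wide its rows are, information_schema reports it. A small sketch, with placeholder schema and table names:

-- Placeholder names: replace my_db / my_table with your own.
-- AVG_ROW_LENGTH and DATA_LENGTH give a rough idea of how close rows get to the in-page limit.
SELECT table_name, row_format, avg_row_length, data_length
FROM information_schema.tables
WHERE table_schema = 'my_db'
  AND table_name = 'my_table';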
What to do?
Rethink having so many columns.
Consider "vertical partitioning", wherein you have another table(s) that contains selected TEXT columns. Suggest picking groups of columns based on access patterns in your app.
For columns that are usually quite long, consider compressing them in the client and storing them into a BLOB instead of a TEXT. Most "text" shrinks 3:1. BLOBs are stored off-record the same way as TEXTs; however, the compressed blobs would be smaller, hence less likely to spill off-record.
Do more processing in SQL -- to avoid returning all the rows, or to avoid returning the full text, etc. When blindly shoveling lots of text to a client, the network and the client become a significant factor in the elapsed time, not just the SELECT itself.
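A minimal sketch of the vertical-partitioning idea above, with entirely hypothetical table and column names: the long, rarely-read TEXT columns move to a side table that shares the primary key and is only joined in when needed.

-- Main table keeps the columns that most queries actually need.
CREATE TABLE site_main (
    site_id INT UNSIGNED NOT NULL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    updated DATE NOT NULL
    -- ... the other frequently used columns ...
) ENGINE=InnoDB;

-- Side table holds a group of long TEXT columns that are read together, and only when needed.
CREATE TABLE site_notes (
    site_id INT UNSIGNED NOT NULL PRIMARY KEY,
    description TEXT,
    comments TEXT,
    FOREIGN KEY (site_id) REFERENCES site_main (site_id)
) ENGINE=InnoDB;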
I already have 300 GB of data in a table.
I want to add an index on a particular column, but when I add the index, the disk usage increases (the size of the ibdata file grows).
I applied the same process to another table, but its disk usage did not increase.
Below is the query used to add the index:
CREATE INDEX index_name ON table_name (column_name);
In general, the amount of space needed for the index varies between about 50% and 200% of the original text volume.
Generally, the larger the total amount of text, the smaller the overhead; but many small records will use more overhead than fewer large records. Also, clean data (such as published text) will require less overhead than dirty data such as emails or discussion notes, since dirty data is likely to include many unique words from spelling mistakes and abbreviations.
So, the data type and content of the column also matter.
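To see how much space an index actually adds, you can compare the table's data and index sizes before and after the CREATE INDEX; information_schema reports both. Schema and table names below are placeholders:

-- Run before and after adding the index and compare index_mb.
SELECT table_name,
       ROUND(data_length / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.tables
WHERE table_schema = 'my_db'
  AND table_name = 'my_table';

Also note that with innodb_file_per_table enabled, new table and index data goes into per-table .ibd files rather than the shared ibdata file, which makes the growth easier to attribute and to reclaim.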
Hope this will help you a bit!
I have a frequently accessed table containing 3 columns of blobs, and 4 columns of extra data that is not used in the queries but is just sent as a result to PHP. There are 6 small columns (BIGINT, SMALLINT, TINYINT, MEDIUMINT, MEDIUMINT, MEDIUMINT) that are used in the queries in the WHERE/ORDER BY/GROUP BY clauses.
The server has very low memory, around 1 GB, so the cache is not enough to improve performance on the large table. I've indexed the last 6 small columns, but it doesn't seem to be helping.
Would it be a good solution to split this large table into two?
One table containing the last 6 columns, and the other containing the blobs and extra data, linked to the first table with a foreign key in a one-to-one relationship?
I'd then run the queries on the small table, and join the few rows remaining after filtering to the table with the blobs and extra data, to return them to PHP.
Please note, I've already done this, and I managed to decrease the query time from 1.2-1.4 seconds to 0.1-0.2 seconds. However, I'm not sure if the solution I've tried is considered good practice, or is even advisable at all.
What you have implemented is sometimes called "vertical partitioning". If you take it to the extreme, then it is the basis for columnar databases, such as Vertica.
As you have observed, such partitioning can dramatically increase query performance. One reason is that less data needs to be read for processing a row of data.
The downside is for updates, inserts, and deletes. With all the data in a single row, these operations are basically atomic -- that is, the operation only affects one row in a data page. (This is not strictly true with blobs, because these are split among multiple pages.)
When you split the data among multiple tables, then you need to coordinate these operations among the tables, so you don't end up with "partial" rows of data.
For a database being used with bulk inserts and lots of querying, this is not a particularly important consideration. Your splitting of separate columns of the data into separate tables is a reasonable approach for improving performance.
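As a sketch of the query pattern the asker describes (table and column names are hypothetical), the filtering, ordering, and grouping all run against the narrow indexed table, and only the surviving rows are joined to the wide table holding the blobs:

-- Hypothetical split: item_stats holds the six small indexed columns,
-- item_content holds the blobs and extra data (one-to-one on item_id).
SELECT c.*
FROM item_stats s
JOIN item_content c ON c.item_id = s.item_id
WHERE s.category_id = 7
ORDER BY s.score DESC
LIMIT 20;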
I am using a MySQL database.
My website is divided into different elements (PRJ_12 for project 12, TSK_14 for task 14, DOC_18 for document 18, etc.). We currently store the references to these elements in our database as VARCHAR. The reference columns are indexed, so selects are faster.
We are thinking of splitting these columns into 2 columns (one column "element_type" containing PRJ and one "element_id" containing 12). We are considering this solution because we run a lot of requests containing LIKE ...% (for example, retrieving all the tasks of one user, no matter the id of the task).
However, splitting these columns in 2 will increase the number of indexed columns.
So, I have two questions:
Is a LIKE ...% request on an indexed column really slower than a simple WHERE query (without LIKE)? I know that if the column is not indexed, it is not advisable to do WHERE ... LIKE % requests, but I don't really know how indexes work.
The fact that we split the reference column in two will double the number of indexed columns. Is that a problem?
Thanks,
1) A LIKE is always more costly than a full comparison (with =); however, it all comes down to the field data types and the number of records (unless we're talking about a huge table, you shouldn't have issues).
2) Multi-column indexes are not a problem; yes, they make the index bigger, but so what? Data types and the total amount of rows matter, but that's what indexes are for.
So go for it
There are a number of factors involved, but in general, adding one more index on a table that has only one index already is unlikely to be a big problem. Some things to consider.
If the table is mostly read-only, then it is almost certainly not a problem. If updates are rare, then the indexes won't need to be modified often, meaning there will be very little extra cost (aside from the additional disk space).
If updates to existing records do not change either of those key values, then no index modification should be needed and so again there would be no additional runtime cost.
DELETES and INSERTS will need to update both indexes. So if that is the majority of the operations (and far exceeding reads), then an additional index might incur measurable performance degradation (but it might not be a lot and not noticeable from a human perspective).
The like operator as you describe the usage should be fully optimized. In other words, the clause WHERE combinedfield LIKE 'PRJ%' should perform essentially the same as WHERE element_type = 'PRJ' if there is an index existing in both situations. The more expensive situation is if you use the wild card at the beginning (e.g., LIKE '%abc%'). You can think of a LIKE search as being equivalent to looking up a word in a dictionary. The search for 'overf%' is basically the same as a search for 'overflow'. You can do a "manual" binary search in the dictionary and quickly find the first word beginning with 'overf'. Searching for '%low', though is much more expensive. You have to scan the entire dictionary in order to find all the words that end with "low".
Having two separate fields to represent two separate values is almost always better in the long run since you can construct more efficient queries, easily perform joins, etc.
So based on the given information, I would recommend splitting it into two fields and index both fields.
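A minimal sketch of that recommendation, with an assumed schema; the column names and the composite primary key are illustrative only:

CREATE TABLE element_ref (
    user_id INT NOT NULL,
    element_type CHAR(3) NOT NULL,   -- 'PRJ', 'TSK', 'DOC', ...
    element_id INT NOT NULL,
    PRIMARY KEY (user_id, element_type, element_id)
) ENGINE=InnoDB;

-- "All tasks of one user" becomes a plain equality lookup:
SELECT element_id
FROM element_ref
WHERE user_id = 42
  AND element_type = 'TSK';

-- For comparison, the prefix form on the old combined column can also use an
-- index, because the wildcard is only at the end (the underscore must be
-- escaped since it is itself a LIKE wildcard):
-- WHERE reference LIKE 'TSK\_%'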
I'm designing a new forum for my company, and I wanted to confirm that saving the forum posts in a MySQL database would be scalable and would have good performance.
The posts may have around 400 characters (maybe I will limit them to 400 characters). If I save 400 characters of text in a MySQL field, and the table has 10 million rows, will it affect performance?
My main constraint is performance. Can someone please shed some light on this?
There are two data types to consider: VARCHAR or TEXT.
Which data type you decide on depends on:
How frequently you display it
The total number of characters you store
TEXT and BLOB are stored off the table, with the table just holding a pointer to the location of the actual storage.
VARCHAR is stored inline with the table. VARCHAR is faster when the size is reasonable; which one would be faster depends upon your data and your hardware, so you'd want to benchmark a real-world scenario with your data.
VARCHAR (stored inline) is usually faster IF the data is frequently retrieved (included in most queries). However, for a large volume of data that is not normally retrieved (that is, not referenced by any query), it may be better not to have the data stored inline. There is an upper limit on the row size for data stored inline.
When a table has TEXT or BLOB columns, the table can't be stored in memory. This means every query (which doesn't hit cache) has to access the file system - which is orders of magnitude slower than the memory.
If your post content is large, use a TEXT field, but store the TEXT field in a separate table that is only accessed when you actually need it. That way the original table can be stored in memory and will be much faster.
Think of it as separating the data into one "memory table" and one "file table". The reason for doing this is to avoid accessing the filesystem except when necessary (i.e. only when you need the text).
You can try (posts, post_text) or (post_details, posts) or something like that.
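A minimal sketch of the (posts, post_text) split, with placeholder names and sizes:

-- Narrow table: metadata used for listing and filtering.
CREATE TABLE posts (
    post_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id INT UNSIGNED NOT NULL,
    created_at DATETIME NOT NULL,
    KEY idx_user_created (user_id, created_at)
) ENGINE=InnoDB;

-- Wide table: the body text, fetched only when a post is actually displayed.
CREATE TABLE post_text (
    post_id INT UNSIGNED NOT NULL PRIMARY KEY,
    body TEXT NOT NULL,
    FOREIGN KEY (post_id) REFERENCES posts (post_id)
) ENGINE=InnoDB;

-- Listing touches only the narrow table; the body is joined in on demand:
SELECT p.post_id, t.body
FROM posts p
JOIN post_text t ON t.post_id = p.post_id
WHERE p.post_id = 101;

Given the stated 400-character limit, a VARCHAR(400) column kept inline in posts is also a reasonable alternative; benchmarking both with real data is the safest way to decide.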
With MyISAM, having variable-length columns (VARCHAR, BLOB) in the table really slowed down queries, so much so that I came across advice on the net to move the VARCHAR columns into a separate table.
Is that still an issue with InnoDB? I don't mean cases where inserting many VARCHAR rows into the table causes page splits. I just mean: should you consider, for example, moving post_text (a single BLOB field in the table) into another table, speaking performance-wise about InnoDB?
As far as I know BLOBs (and TEXTs) are actually stored outside of the table, VARCHARs are stored in the table.
VARCHARs are bad for read performance because each record can be of variable length and that makes it more costly to find fields in a record.
BLOBs are slow because the value has to be fetched separately and it will very likely require another read from disk or cache.
To my knowledge InnoDB doesn't do anything differently in this respect so I would assume the performance characteristics hold.
I don't think moving BLOB values really helps - other than reducing overall table size which has a positive influence on performance regardless.
VARCHARs are a different story. You will definitely benefit here. If all your columns are of defined length (and I guess that means you can't use BLOBs either?) the field lookup will be faster.
If you're just 'reading' the VARCHAR and BLOB fields, I'd say this is worth a shot. But if your select query needs to compare a value from a VARCHAR or a BLOB, you're pretty much out of luck.
So yes you can definitely gain performance here but make sure you test that you're actually gaining performance and that the increase is worth the aggressive denormalization.
PS.
Another way of 'optimizing' VARCHAR read performance is to simply replace them with CHAR fields (of fixed length). This could benefit read performance, so long as the increase in disk space is acceptable.
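As a short sketch of that trade-off (the column name and length are hypothetical), converting a VARCHAR column that always holds short values into a fixed-length CHAR:

-- CHAR(8) makes the field fixed-length at the cost of space-padding shorter values.
ALTER TABLE my_table
    MODIFY code CHAR(8) NOT NULL;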
InnoDB stores data completely differently than MyISAM.
In MyISAM, all indexes -- primary or otherwise -- are stored in the MYI file and contain a pointer to the data stored in the MYD file. Variable-length rows shouldn't directly affect query speed, but the MYD file does tend to get more fragmented with variable-length rows, because the hole left behind when you delete a row can't necessarily be filled in by the row you insert next. If you update a variable-length value to make it longer, it might have to be moved somewhere else, which means the data will tend to get out of order with respect to the indexes over time, making range queries slower (if you're running on a spinning disk where seek times matter).
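The usual remedy for that MyISAM fragmentation is to rebuild the table from time to time; the table name below is a placeholder:

-- Rebuilds the table and its indexes, reclaiming the holes left by deletes and updates.
OPTIMIZE TABLE my_table;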
InnoDB stores data clustered in pages in a B-tree on the primary key. So long as the data will fit in a page it is stored in the page whether you're using a BLOB or VARCHAR. As long as you aren't trying to insert inordinately long values on a regular basis it shouldn't matter whether your rows are fixed-length or variable-length.