I have a table called data in a MySQL database. The table is quite large: it has about 500k records, and this number will grow to 1 million. Each record consists of about 50 columns, most of them varchars.
The data table is used very frequently; actually, most queries access it. Data is read from and written to it by ~50 users simultaneously. The system is highly loaded, with users uploading and checking their data, so it can be stopped for an hour or two at most.
After some research, I found out that almost all the SELECT queries with a WHERE clause use one of four fields in the table. Those fields are isActive, country, state, and city - all of them ints. The WHERE clause can be either
where isActive = {0|1}
or
where isActive = {0|1} and {country|state|city} = {someIntValue}
or
where {country|state|city} = {someIntValue}
And the last thing: the table does not have any indexes except the primary one on id.
After the table grew to its current size, I started facing performance issues.
So, my question is: if I create indexes on the columns isActive, country, state, and city, will performance improve?
UPD: I've just created an index on one of those fields and WOW! The queries execute immediately now. Thank you, guys.
I don't think it's a good idea to index the isActive field: it adds indexing overhead on every insert/update/delete, but on reads it only splits the data into two chunks (1 and 0), so it won't really help.
Edit: found this to explain the point above:
Is there any performance gain in indexing a boolean field?
For the other three columns, I recommend you run a benchmark when most users are offline (at night, or during lunch) and see how it affects performance, but I think it will really help, without many downsides.
Edit: ypercube has pointed out some interesting use cases where my advice about indexing a boolean field doesn't hold; check the comments.
Yes, creating an index on each of these columns will help you.
Consider and underline the word each.
A separate index for each column is what I suggest, the reason being that the queries combine the columns in different ways.
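For example, a sketch of what that could look like (index names are just illustrative):
ALTER TABLE data
    ADD INDEX idx_isActive (isActive),
    ADD INDEX idx_country (country),
    ADD INDEX idx_state (state),
    ADD INDEX idx_city (city);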
Yes, definitely.
You may see even better results if you include selected additional fields in each index too. Just pay careful attention to the column order...
But before all else, make sure you don't use the MyISAM engine for a big table with many writes! Switch to InnoDB, for example.
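To illustrate the column-order point, a hedged sketch (the index name and column choice are just an example). MySQL uses the leftmost prefix of a compound index, so:
ALTER TABLE data ADD INDEX idx_country_active (country, isActive);
-- can serve:    WHERE country = 5
-- can serve:    WHERE country = 5 AND isActive = 1
-- cannot serve: WHERE isActive = 1 alone (leftmost column missing)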
Related
I would like to ask a question about the principles of indexing and optimization in databases.
I am using MySQL, and the engine is MyISAM. For one query, EXPLAIN showed 8000+ rows examined on a table that had been well indexed. Then my colleague ran OPTIMIZE TABLE on that table, and afterwards EXPLAIN showed 2 rows, which looked correct. The result is good, but neither of us really understands what happened, or why.
I am new to this area, so can anyone explain how the EXPLAIN output and the index could change so significantly after optimization? I thought the index should already have been good enough before we optimized the table.
Many thanks!
You can read the manual on OPTIMIZE TABLE here: https://dev.mysql.com/doc/refman/5.7/en/optimize-table.html
For MyISAM tables, OPTIMIZE TABLE works as follows:
If the table has deleted or split rows, repair the table.
If the index pages are not sorted, sort them.
If the table's statistics are not up to date (and the repair could not be accomplished by sorting the index), update them.
It's the last step that is most useful in your case. This is the same work that is performed by ANALYZE TABLE. Read more about what that does here: https://dev.mysql.com/doc/refman/5.7/en/analyze-table.html
OPTIMIZE TABLE and ANALYZE TABLE each do completely different things when using InnoDB. Read the docs to learn more.
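For reference, both are single statements (the table name is a placeholder):
OPTIMIZE TABLE mytable;  -- MyISAM: repair, sort index pages, update statistics
ANALYZE TABLE mytable;   -- update index statistics only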
It's all about the "distribution of data" in the indexes. As time passes and records are added, one index might become better suited than another. You obviously need an example:
Let's say you have a table with last_name and city fields and an index on each. If you have a search with BOTH fields, like WHERE last_name='jones' AND city='here', then either of the indexes might be used; they look equal. Once one is chosen, a slow scan is done to filter on the second field.
Now, with time, city might start to show a lot less variability than last_name. So for a search on both, city might yield too many records to filter in a second pass, whereas last_name might yield a smaller set, and so be faster.
OPTIMIZE TABLE will detect this distribution and, as data accumulates over time, hint the optimizer to use last_name in preference to city.
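A minimal sketch of how you could watch this happen yourself (the table, data, and names are hypothetical):
CREATE TABLE people (
    id INT AUTO_INCREMENT PRIMARY KEY,
    last_name VARCHAR(50),
    city VARCHAR(50),
    KEY idx_last_name (last_name),
    KEY idx_city (city)
) ENGINE=MyISAM;

-- compare the chosen key and the estimated rows before and after:
EXPLAIN SELECT * FROM people WHERE last_name = 'jones' AND city = 'here';
OPTIMIZE TABLE people;  -- refreshes the index statistics, among other things
EXPLAIN SELECT * FROM people WHERE last_name = 'jones' AND city = 'here';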
Hope this was clear ...
There are 10 tables, all with a session_id column, and a single session table. The goal is to join them all on the session table. I get the feeling that this is a major code smell. Is this good/bad practice?
What problems could occur?
Whether this is a good design or not depends deeply on what you are trying to represent with it. So it might be OK, or it might not be... there's no way to tell from your question in its current form.
That being said, there are a couple of ways to speed up a join (a short sketch follows this list):
Use indexes.
Use covering indexes.
Under the right DBMS, you could use a materialized view to store pre-joined rows. You should be able to simulate that under MySQL by maintaining a special table via triggers (or even manually).
Don't join a table unless you actually need its fields. List only the fields you need in the SELECT list (instead of blindly using *). The fastest operation is the one you don't have to do!
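A hedged sketch of points 1, 2, and 4 (the table and column names are made up for illustration):
-- 1. plain index on the join column:
ALTER TABLE cart_items ADD INDEX idx_session (session_id);
-- 2. covering index: the query below can be answered from the index alone:
ALTER TABLE cart_items ADD INDEX idx_session_qty (session_id, quantity);
-- 4. join only the tables you need, and list only the fields you need:
SELECT s.id, ci.quantity
FROM sessions s
JOIN cart_items ci ON ci.session_id = s.id;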
And above all, measure on representative amounts of data! Possible results:
It's lightning fast. Yay!
It's slow, but it doesn't matter that it's slow (i.e. rarely used / not important).
It's slow and it matters that it's slow. Strap in, you have work to do!
We need the query with 11 joins and its EXPLAIN posted in the original question when it is available, please. And be kind to your community: for every table involved, also post SHOW CREATE TABLE tblname and SHOW INDEX FROM tblname to avoid additional requests for these 11 tables. That way we will know the scope of the data and the cardinality involved for each indexed column.
Of course, more joins kill performance.
But it depends! If your data model is like that, then you can't help yourself here unless a complete data-model redesign happens.
1) Is it an online (real-time transaction) DB or an offline DB (a data warehouse)?
If online, then it's better to maintain a single table: keep the data in one table and let the column count grow.
If offline, it's better to maintain separate tables, because you are not going to need all the columns all the time.
Reading this, I now understand when to use indexes and when not to use them. But I have a question: would using an index on a column with a limited number of possible values help speed up queries (SELECT-ing)? Consider the following:
Table "companies": id, district_id, name
Table "districts": id, name
The number of districts will never pass 5 entries. Should I use an index on companies.district_id then, or not? I read somewhere (can't find the link :( ) that it won't help since there aren't that many distinct values, and that it would actually slow down the query in many cases.
PS: both tables are MyISAM
Almost never is an INDEX on a low-cardinality column used by the optimizer.
On the other hand, a "compound index" may be useful. For example, does INDEX(district_id, name) have any use?
Having INDEX(district_id) will slow down INSERTs because the index is added to whenever a row is inserted. It will not slow down SELECTs, other than the minor amount of time for the Optimizer to notice the index and reject it.
(My statements apply to both MyISAM and InnoDB.)
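A sketch of what such a compound index could buy you (using the question's tables; the query is illustrative):
ALTER TABLE companies ADD INDEX idx_district_name (district_id, name);
-- covered query: answered from the index alone, already in order, no filesort:
SELECT name FROM companies WHERE district_id = 3 ORDER BY name;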
More discussion of this answer:
MySQL: Building the best INDEX for a given SELECT: Flags and Low Cardinality
I've got a table in a MySQL db with about 25000 records. Each record has about 200 fields, many of which are TEXT. There's nothing I can do about the structure - this is a migration from an old flat-file db which has 16 years of records, and many fields are "note" type free-text entries.
Users can be viewing any number of fields, ordering by any single field, with any number of qualifiers. There's a big slowdown in the sort, which generally takes several seconds, sometimes as much as 7-10 seconds.
An example statement might look like this:
select a, b, c from table where b=1 and c=2 or a=0 order by a desc limit 25
There's never a star-select, and there's always a limit, so I don't think the statement itself can really be optimized much.
I'm aware that indexes can help speed this up, but since there's no way of knowing which fields are going to be sorted on, I'd have to index all 200 columns - what I've read about this doesn't seem to be consistent. I understand there'd be a slowdown when inserting or updating records, but assuming that's acceptable, is it advisable to add an index to each column?
I've read about sort_buffer_size but it seems like everything I read conflicts with the last thing I read - is it advisable to increase this value, or any of the other similar values (read_buffer_size, etc)?
Also, the primary identifier is a crazy pattern they came up with in the nineties. This is the PK and so should be indexed by virtue of being the PK (right?). The records are (and have been) submitted to the state, and to their clients, and I can't change the format. This column needs to sort based on the logic that's in place, which involves a stored procedure with string concatenation and substring matching. This particular sort is especially slow, and doesn't seem to cache, even though this one field is indexed, so I wonder if there's anything I can do to speed up the sorting on this particular field (which is the default order by).
TYIA.
I'd have to index all 200 columns
That's not really a good idea. Because of the way MySQL uses indexes, most of them would probably never be used while still generating quite a large overhead (see chapter 7.3 in the link below for details). What you could do, however, is try to identify which columns appear most often in WHERE clauses, and index those.
In the long run, however, you will probably need to find a way to rework your data structure into something more manageable, because as it is now, it has the smell of a 'spreadsheet turned into a database', which is not a nice smell.
I've read about sort_buffer_size but it seems like everything I read conflicts with the last thing I read - is it advisable to increase this value, or any of the other similar values (read_buffer_size, etc)?
In general the answer is yes. However, the actual details depend on your hardware, OS, and which storage engine you use. See chapter 7.11 (especially 7.11.4) in the link below.
Also, the primary identifier is a crazy pattern they came up with in the nineties. [...] I wonder if there's anything I can do to speed up the sorting on this particular field (which is the default order by).
Perhaps you could add a primarySortOrder column to your table, into which you could store numeric values that map the PK order (precalculated from the stored procedure you're using).
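A minimal sketch of that idea (the table name and query are assumptions; the population step would come from your existing stored procedure):
ALTER TABLE records
    ADD COLUMN primarySortOrder INT,
    ADD INDEX idx_sort_order (primarySortOrder);
-- after populating primarySortOrder via the stored procedure:
SELECT a, b, c FROM records ORDER BY primarySortOrder DESC LIMIT 25;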
And the link you've been waiting for: Chapter 7 of the MySQL manual: Optimization
Add an index to all the columns that have a large number of distinct values, say 100 or even 1000 or more. Tune this number as you go.
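To find such columns, you could check cardinalities directly (the names are placeholders):
SELECT COUNT(DISTINCT some_column) FROM some_table;  -- exact distinct count
SHOW INDEX FROM some_table;  -- the Cardinality column estimates this for indexed columns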
Recently I've learned the wonder of indexes, and performance has improved dramatically. However, with all I've learned, I can't seem to find the answer to this question.
Indexes are great, but why couldn't someone just index all fields to make the table incredibly fast? I'm sure there's a good reason not to do this, but how about three fields in a thirty-field table? Ten in a thirty-field table? Where should one draw the line, and why?
Indexes take up space in memory (RAM); with too many or too-large indexes, the DB will have to swap them to and from disk. They also increase insert and delete times (each index must be updated for every piece of data inserted/deleted/updated).
You don't have infinite memory. Making it so all indexes fit in RAM = good.
You don't have infinite time. Indexing only the columns you need indexed minimizes the insert/delete/update performance hit.
Keep in mind that every index must be updated any time a row is updated, inserted, or deleted. So the more indexes you have, the slower performance you'll have for write operations.
Also, every index takes up further disk space and memory space (when called), so it could potentially slow read operations as well (for large tables).
Check this out
You have to balance CRUD needs. Writing to tables becomes slow. As for where to draw the line, that depends on how the data is being accessed (sorting, filtering, etc.).
Indexing takes up more allocated space, both on disk and in RAM, but it can also improve performance a lot. Unfortunately, when the indexes outgrow the available memory, the system falls back to disk and performance is at risk. Practically, you shouldn't index any field that you think won't be involved in any kind of data traversal, whether inserting or searching (WHERE clause), but you should index the ones that will. By default, I would index all fields, and then consider unindexing the fields whose queries are used only by a moderator, unless those queries need the speed too.
It is not a good idea to index all the columns in a table. While this will make the table very fast to read from, it also becomes much slower to write to. Writing to a table that has every column indexed means inserting the new record into the table and then inserting each column's value into its own index structure.
This answer is my personal opinion; I am using my own mathematical logic to answer.
The second question was about the border: where to stop. First, let's do some mathematical calculation. Suppose we have N rows with L fields in a table. If we index all the fields, we get L new index structures, where every structure sorts the data of its indexed field in a meaningful way. At first glance, if your table weighs W, it will become roughly W*2 (1 terabyte will become 2 terabytes). If you have 100 big tables (I already worked on a project where the table count was around 1800), you will waste 100 times this space (100 terabytes), which is far from wise.
If we apply indexes to all fields, we also have to think about index updates, where one update triggers an update of every index; in time, this is the equivalent of an unordered select over everything.
From this I conclude that, in this scenario, if you have to lose time somewhere, it is preferable to lose it in a select rather than in an update, because a select on an unindexed field does not trigger extra work on all the other unindexed fields.
What to index?
Foreign keys: a must.
Primary key: I am not yet sure about it; maybe someone who reads this can help with this case.
Other fields: the first natural answer is half of the remaining fields. Why? If you should have indexed more, you are not far from the best answer; if you should have indexed less, you are also not far, because we know that no indexes at all is bad and everything indexed is also bad.
From these 3 points I conclude that, if we have L fields of which K are keys, the limit should be somewhere near ((L-K)/2)+K, plus or minus L/10.
This answer is based on my own logic and personal practices.
First of all, at least in SAP ABAP and its underlying database tables, we can create one index table for all required index fields, holding only their addresses. So other SQL-based database systems could also use one table for all the fields to be indexed.
Secondly, what writing performance are we talking about? A company records, say, 50 sales orders in a day. Assume there is a table VBAK, the sales order header table, with 30 fields, each of length CHAR(20).
I can write to the real table in seconds, and the index table can be maintained in the background. If a report is run at the same time, then while the index table is being searched there can be logic (in the database programming) to wait for an in-progress index write to finish (say 5 sales orders are being recorded at the same time, taking maybe 5 seconds). So a running report might wait 5 seconds and then run for 5 seconds: 10 seconds total.
Without the index, a running report does not wait those 5 seconds for writes, but it might run for 40 seconds.
So what is the real meaning of writing performance? No one writes thousands of records at the same time, but everyone reads them.
And reading a second table means the fields there are already sorted. If I have 3 fields selected, I can find which sorted sets I need to search for this data, then fetch it. What RAM, what memory? It is just a copied index table with only one piece of data for each field: the address. What memory?
I think this is one of the secrets software companies hide from customers, so as not to wake them up; otherwise the customers would not need another expensive system in the future.