I'm using a MySQL database. I have sales data in one table, and I've created an index on the date column, i.e. OrderDate.
The data is retrieved fast when I use a query like:
SELECT CustID
,CustName
,CustPhone
FROM SalesData
WHERE OrderDate BETWEEN '2012-01-08' AND '2012-05-08';
But when fetching the details for a particular quarter, it gets slow and scans the whole table.
The query is like:
SELECT CustID
,CustName
,CustPhone
FROM SalesData
WHERE Quarter(OrderDate) = Quarter('2012-01-08')
AND Year(OrderDate) = Year('2012-01-08');
Is there any way to index the quarter function, or any other way to speed up data retrieval for a quarter?
Explain statement:
For the 1st query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE SalesData range csrv csrv 4 138 Using where
For the 2nd query (scanning 104785 rows):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE SalesData ALL 104785 Using where
You found the solution yourself; why don't you use it?
Just calculate the boundary dates for the quarter in question and use them with BETWEEN:
SELECT CustID
,CustName
,CustPhone
FROM SalesData
WHERE OrderDate BETWEEN '2012-01-01' AND '2012-03-31';
You can calculate the boundary dates in your application, or in MySQL itself, as shown in this article:
http://use-the-index-luke.com/sql/where-clause/obfuscation/dates?db_type=mysql
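If you want MySQL itself to compute the boundaries, here is a minimal sketch, assuming OrderDate is a DATE/DATETIME column; the @ref and @qstart variable names are only illustrative:
SET @ref := '2012-01-08';
SET @qstart := MAKEDATE(YEAR(@ref), 1) + INTERVAL (QUARTER(@ref) - 1) QUARTER;

SELECT CustID
      ,CustName
      ,CustPhone
FROM SalesData
WHERE OrderDate >= @qstart
  AND OrderDate <  @qstart + INTERVAL 1 QUARTER;
The half-open range (>= and <) also behaves correctly if OrderDate has a time part, and it can still use the index on OrderDate.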
There is no way to index anyFunction(OrderDate) in MySQL, unless you store it separately.
I think you can handle it more efficiently in two ways:
1- Migrate from MySQL to MariaDB. It has computed/virtual columns, so you can create virtual fields such as OrderDateQuarter and OrderDateYear and index them without much overhead.
(MariaDB is a community-developed fork of MySQL; I think it is much better than native MySQL.)
2- You can store OrderDateQuarter and OrderDateYear in separate columns and index them; you can make this easy by writing a trigger that populates the two columns from OrderDate, as sketched below.
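For option 2, a minimal sketch in plain MySQL; the column, index, and trigger names are illustrative, and existing rows would need a one-time backfill:
ALTER TABLE SalesData
  ADD COLUMN OrderDateYear SMALLINT UNSIGNED,
  ADD COLUMN OrderDateQuarter TINYINT UNSIGNED,
  ADD INDEX idx_year_quarter (OrderDateYear, OrderDateQuarter);

UPDATE SalesData
SET OrderDateYear = YEAR(OrderDate),
    OrderDateQuarter = QUARTER(OrderDate);

DELIMITER //
CREATE TRIGGER trg_salesdata_bi BEFORE INSERT ON SalesData
FOR EACH ROW
BEGIN
  -- keep the derived columns in sync with OrderDate
  SET NEW.OrderDateYear = YEAR(NEW.OrderDate);
  SET NEW.OrderDateQuarter = QUARTER(NEW.OrderDate);
END//
DELIMITER ;

The quarter query can then hit the new index directly:
SELECT CustID, CustName, CustPhone
FROM SalesData
WHERE OrderDateYear = 2012 AND OrderDateQuarter = 1;
(A similar BEFORE UPDATE trigger would be needed if OrderDate can change.)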
Related
I need to fetch the last 24 hours of data frequently, and this query runs frequently.
Since it scans many rows, running it frequently affects database performance.
MySQL's execution plan picks the index on created_at, which returns approximately 100,000 rows; those rows are then scanned one by one to filter customer_id = 10, and my final result has about 20,000 rows.
How can I optimize this query?
explain SELECT *
FROM `order`
WHERE customer_id = 10
and `created_at` >= NOW() - INTERVAL 1 DAY;
id : 1
select_type : SIMPLE
table : order
partitions : NULL
type : range
possible_keys : idx_customer_id, idx_order_created_at
key : idx_order_created_at
key_len : 5
ref : NULL
rows : 103357
filtered : 1.22
Extra : Using index condition; Using where
The first optimization I would do is on the access to the table:
create index ix1 on `order` (customer_id, created_at);
Then, if the query is still slow I would try appending the columns you are selecting to the index. If, for example, you are selecting the columns order_id, amount, and status:
create index ix1 on `order` (customer_id, created_at,
order_id, amount, status);
This second strategy could be beneficial, but you'll need to test it to find out what performance improvement it produces in your particular case.
The big improvement of this second strategy is that it walks the secondary index only, avoiding walking back to the table's primary clustered index (which can be time consuming).
Instead of two single-column indexes on the ID and the created date, create a single composite index on (customer_id, created_at). This way the index engine can use BOTH parts of the WHERE clause instead of just one. It jumps right to the customer ID, then directly to the desired date, and then returns the results. It SHOULD be very fast.
Additional Follow-up.
I hear your comment about having multiple indexes, but add those columns to the main composite index instead, just after the first ones, such as
( customer_id, created_at, updated_at, completion_time )
Then your queries could always include some help for the index in the WHERE clause. For example (I don't know your specific data): a record is created at some given point, and the updated and completion times will always be AFTER that. How long does it take (worst-case scenario) from creation to completion... 2 days, 10 days, 90 days?
where
    customer_id = ?
    AND created_at >= NOW() - INTERVAL 10 DAY
    AND updated_at >= NOW() - INTERVAL 1 DAY
Again, just an example, but if a customer has thousands of orders and a relatively quick turnaround time, you could jump to the most recent ones and then find those updated within the time period. Again, just an option: a single index versus 3, 4, or more indexes.
It seems you are dealing with a very fast-growing table; I would consider moving this frequent query to a cold table or a replica.
One more point: did you consider partitioning by customer_id? I don't quite understand the business logic behind querying customer_id = 10, but if it's a multi-tenancy application, try partitioning.
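For example, a minimal sketch of hash partitioning by customer_id (the partition count is illustrative, and note that MySQL requires the partitioning column to be part of every unique key, including the primary key):
ALTER TABLE `order`
  PARTITION BY HASH (customer_id)
  PARTITIONS 16;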
For this query:
SELECT o.*
FROM `order` o
WHERE o.customer_id = 10 AND
created_at >= NOW() - INTERVAL 1 DAY;
My first inclination would be a composite index on (customer_id, created_at) -- as others have suggested.
But, you appear to have a lot of data and many inserts per day. That suggests partitioning plus an index. The appropriate partitioning would be on created_at, probably on a daily basis, along with an index on customer_id.
A typical query would access the two most recent partitions. Because your queries are focused on recent data, this also reduces the memory occupied by the index, which might be an overall benefit.
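A minimal sketch of that layout, assuming created_at is a DATETIME that is (or can be made) part of the primary key, as MySQL partitioning requires; the partition names and dates are illustrative:
ALTER TABLE `order`
  PARTITION BY RANGE COLUMNS (created_at) (
    PARTITION p20230601 VALUES LESS THAN ('2023-06-02'),
    PARTITION p20230602 VALUES LESS THAN ('2023-06-03'),
    PARTITION pmax VALUES LESS THAN (MAXVALUE)
  );
The existing idx_customer_id index then only has to filter within the one or two most recent partitions. New daily partitions would have to be added on a schedule (for example with ALTER TABLE ... REORGANIZE PARTITION on pmax).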
This technique should be better than all the other answers, though perhaps by only a small amount:
Instead of orders being indexed thus:
PRIMARY KEY(order_id) -- AUTO_INCREMENT
INDEX(customer_id, ...) -- created_at, and possibly others
do this to "cluster" the rows together:
PRIMARY KEY(customer_id, order_id)
INDEX (order_id) -- to keep AUTO_INCREMENT happy
Then you can optionally have more indexes starting with customer_id as needed. Or not.
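Put together, a minimal sketch of that layout (the non-key columns are just placeholders):
CREATE TABLE `order` (
  order_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  customer_id INT UNSIGNED NOT NULL,
  created_at DATETIME NOT NULL,
  -- ... other columns ...
  PRIMARY KEY (customer_id, order_id),  -- clusters each customer's rows together
  INDEX (order_id)                      -- satisfies InnoDB's AUTO_INCREMENT index requirement
) ENGINE=InnoDB;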
Another issue -- What will you do with 20K rows? That is a lot to feed to a client, especially of the human type. If you then munch on it, can't you make a more complex query that does more work, and returns fewer rows? That will probably be faster.
MySQL table structure: it confuses me that the query range seems to determine whether MySQL uses the index.
That is what happens. And it is actually an optimization.
When using a secondary key (such as INDEX(teacher_id)), the processing goes like this:
Reach into the index, which is a B+Tree. In such a structure, it is quite efficient to find a particular value (such as 1) and then scan forward (until 5000 or 10000).
For each entry, reach over into the data to fetch the row (SELECT *). This uses the PRIMARY KEY, a copy of which is in the secondary key. The PK and the data are clustered together; each lookup by one PK value is efficient (again, a BTree), but you need to do 5000 or 10000 of them. So the cost (time taken) adds up.
A "table scan" (ie, not using any INDEX) goes like this:
Start at the beginning of the table, walk through the B+Tree for the table (in PK order) until the end.
For each row, check the WHERE clause (a range on teacher_id).
If more than something like 20% of the table needs to be looked at, a table scan is actually faster than bouncing back and forth between the secondary index and the data.
So, "large" is somewhere around 20%. The actual value depends on table statistics, etc.
Bottom line: Just let the Optimizer do its thing; most of the time it knows best.
In brief, I use a MySQL database. When I execute:
EXPLAIN
SELECT * FROM t_teacher_course_info
WHERE teacher_id >1 and teacher_id < 5000
it will use the index `idx_teacher_id_last_update_time` (`teacher_id`, `last_update_time`).
But if I change the range:
EXPLAIN
SELECT * FROM t_teacher_course_info
WHERE teacher_id >1 and teacher_id < 10000
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t_teacher_course_info ALL idx_teacher_update_time 671082 Using where
It scans the whole table and does not use the index. Is there any MySQL configuration for this?
Maybe the estimated row count determines whether the index is used?
I am running a query (taking 3 minutes to execute)-
SELECT c.eveDate,c.hour,SUM(c.dataVolumeDownLink)+SUM(c.dataVolumeUpLink)
FROM cdr c
WHERE c.evedate>='2013-10-19'
GROUP BY c.hour;
with explain plan -
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE c ALL evedate_index,eve_hour_index 31200000 Using where; Using temporary; Using filesort
I am using a MyISAM table with:
Primary key (id, evedate),
8 weekly partitions keyed on evedate,
an index on evedate,
a composite index on (evedate, hour).
I have changed the MySQL tuning parameters in my.ini as follows (4 GB RAM):
tmp_table_size=200M
key_buffer_size=400M
read_rnd_buffer_size=2M
But it is still using a temporary table and filesort. Please let me know what I should do to eliminate this.
After adding a new composite index (evedate, msisdn),
I have found some changes: a few queries were no longer using a temporary table, and even the above query, if I omit the GROUP BY clause, does not use a temporary table.
You cannot do anything. MySQL is not able to optimize this query and avoid the temporary table.
According to this link: http://dev.mysql.com/doc/refman/5.7/en/group-by-optimization.html
there are two methods MySQL uses to optimize GROUP BY.
The first method - Loose Index Scan - cannot be used for your query because this condition is not met:
The only aggregate functions used in the select list (if any) are MIN() and MAX() ....
Your query contains SUM, therefore MySQL cannot use this optimization method.
The second method - Tight Index Scan - cannot be used for your query, because this condition is not met:
For this method to work, it is sufficient that there is a constant equality condition for all columns in a query referring to parts of the key coming before or in between parts of the GROUP BY key.
Your query uses only a range operator: WHERE c.evedate >= '2013-10-19'. There is no equality condition in the WHERE clause, therefore this method cannot be used to optimize the query.
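For illustration only, the shape of query that Tight Index Scan could handle with the existing (evedate, hour) index is one with an equality on evedate, which is not equivalent to your range query:
SELECT c.hour, SUM(c.dataVolumeDownLink) + SUM(c.dataVolumeUpLink)
FROM cdr c
WHERE c.evedate = '2013-10-19'   -- constant equality on the column before the GROUP BY key
GROUP BY c.hour;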
I have a MySQL query:
SELECT date(FROM_UNIXTIME(time)) as date,
       count(view) as views
FROM `table_1`
WHERE `address` = 1
GROUP BY date(FROM_UNIXTIME(time))
where
view : auto increment and primary key, (int(11))
address : index , (int(11))
time : index, (int(11))
The total number of rows in the table is 270k.
This query executes slowly; in mysql-slow.log I got:
Query_time: 1.839096
Lock_time: 0.000042
Rows_sent: 155
Rows_examined: 286435
The EXPLAIN output looks like this:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table_1 ref address address 5 const 139138 Using where; Using temporary; Using filesort
How can I improve this query to speed up execution? Maybe it would be better to handle the date in PHP? But I think taking the timestamp in PHP, converting it to a human-readable date, and grouping there would take more time than one query in MySQL. Does anybody know how to make this query faster?
When you apply the functions date() and FROM_UNIXTIME() to the time column in the GROUP BY, you kill any indexing benefit you may have on that field.
Adding a date column would be the only way I can see to speed this up if you need it grouped by day. Without it, you'll need to decrease the overall set you are trying to group. You could maybe add start/end dates to limit the date range; that would decrease the number of dates being transformed and grouped.
You should consider adding an additional DATE column to your table and indexing it.
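A minimal sketch of that approach; the view_date column and index name are illustrative, and existing rows need a one-time backfill:
ALTER TABLE `table_1`
  ADD COLUMN view_date DATE,
  ADD INDEX idx_address_view_date (`address`, view_date);

UPDATE `table_1` SET view_date = DATE(FROM_UNIXTIME(`time`));

SELECT view_date AS `date`,
       COUNT(`view`) AS views
FROM `table_1`
WHERE `address` = 1
GROUP BY view_date;
New rows would need to set view_date on insert (in the application or via a trigger).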
I have been trying to debug my MySQL search speeds, and they are horrible (a couple of seconds to 2 minutes).
This is example code for the search. The search can become really complicated depending on user requirements.
SELECT Device,Input,`W(m)`,`L(m)`,VDD,`Temp(C)`,Param,Value
FROM `TABLE_NAME`
WHERE (`Temp(C)`='110' OR `Temp(C)`='125' )
AND (Device='ngear' )
AND (Input='a' OR Input='b' OR Input='a' OR Input='b' OR Input='c' OR Input='b' )
AND (Param='speed' OR Param='leakage' )
Please note this table has no indexes and no primary key. The data isn't really relational, as it contains statistical simulation data stored in MySQL. The table has about 1 million rows of data.
Should I start indexing every column? Any thoughts would be appreciated.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE TABLE_NAME ALL NULL NULL NULL NULL 12278640 Using where
You need a composite index, and probably some redefinition of your data types.
Here is some general advice:
alter table MQUAL__TSMC_20G_v0d1_Robert_02122011
add index ( `Temp(C)`, Device, Input, Param);
And your query's WHERE clause can be changed to:
where `Temp(C)` in(110, 125)
AND Device='ngear'
AND Input in('a','b','c')
AND Param in('speed', 'leakage');
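Putting that together with the question's SELECT list (using the `TABLE_NAME` placeholder from the question), the full statement would look roughly like:
SELECT Device, Input, `W(m)`, `L(m)`, VDD, `Temp(C)`, Param, Value
FROM `TABLE_NAME`
WHERE `Temp(C)` IN (110, 125)
  AND Device = 'ngear'
  AND Input IN ('a', 'b', 'c')
  AND Param IN ('speed', 'leakage');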
Due to the lack of information about your schema, the data type should be:
`Temp(C)` <-- int
My suggestion is to build the index around the WHERE condition that narrows the data the most. As a guess, temperature should be the first column in the index. You want a compound index as well. The general rule is that the column order should start with the column that reduces the result set the most and end with the most common value.