I have the following table structure.
id (INT) index
date (TIMESTAMP) index
companyId (INT) index
This is the problem I am facing:
companyId 111: has a total of 100000 rows in a 1 year time period.
companyId 222: has a total of 8000 rows in a 1 year time period.
If companyId 111 has 100 rows between '2020-09-01 00:00:00' AND '2020-09-06 23:59:59' and companyId 222 has 2000 rows in the same date range, the query for companyId 111 is much slower than for 222, even though it has fewer rows in the selected date range.
Shouldn't MySQL ignore all the rows outside the date range so the query becomes faster?
This is a query example I am using:
SELECT columns FROM table WHERE date BETWEEN '2020-09-01 00:00:00' AND '2020-09-06 23:59:59' AND companyId = 111;
Thank you
I would suggest a composite index here:
CREATE INDEX idx ON yourTable (companyId, date);
The problem with your premise is that, while you have an index on each column, you don't have any index that completely covers the WHERE clause of your example query. As a result, MySQL might even choose not to use any of your indices. You can also try reversing the order of the index above to compare performance:
CREATE INDEX idx ON yourTable (date, companyId);
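To see which version the optimizer actually prefers, compare the plans with EXPLAIN. This is a minimal sketch assuming the names from the question (yourTable and columns are placeholders, as above):
EXPLAIN SELECT columns FROM yourTable
WHERE date BETWEEN '2020-09-01 00:00:00' AND '2020-09-06 23:59:59'
  AND companyId = 111;
With the (companyId, date) index in place, the key column of the EXPLAIN output should show idx and the rows estimate should drop to roughly the number of rows companyId 111 has inside that date range, instead of all of its rows for the year.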
I'm using MySQL 5.7.
There is a table called transactions with over 3 million records. The table schema is as follows:
id - INT (autoincrements)
deleted_at (DATETIME, NULL allowed)
record_status (TINYINT, DEFAULT value is 1)
Other columns pertaining to this table...
The record_status column is an integer counterpart of deleted_at: when a record is deleted, its value is set to 0. An index was also created on this column.
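For reference, a minimal sketch of the columns and index described so far (the index name is illustrative, and the real table has many more columns):
CREATE TABLE transactions (
  id INT AUTO_INCREMENT PRIMARY KEY,
  deleted_at DATETIME NULL,
  record_status TINYINT NOT NULL DEFAULT 1
  -- other columns pertaining to this table omitted
);
CREATE INDEX idx_record_status ON transactions (record_status);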
The NULL-based DATETIME query takes 740 ms to execute:
select transactions.id from transactions where transactions.deleted_at is null
The TINYINT-based query takes 15.1 s to execute:
select transactions.id from transactions where transactions.record_status = 1
Isn't the check on the TINYINT column (with index) supposed to be faster? Why is this happening?
[EDIT]
Added information about the table's performance
To take the experiment further, all unnecessary columns were removed from the table. Only the following persist.
id - INT (autoincrements)
deleted_at (DATETIME, NULL allowed)
record_status (TINYINT, DEFAULT value is 1)
transaction_time (DATETIME)
Query 1: Takes 2.3ms
select transactions.id from transactions
where transactions.record_status = 1 limit 1000000;
Query 2: Takes 2.1ms
select transactions.id from transactions
where transactions.deleted_at is null limit 1000000;
Query 3: Takes 20 seconds
select transactions.id from transactions
where transactions.record_status = 1
and transaction_time > '2020-04-01' limit 1000;
Query 4: Takes 500ms
select transactions.id from transactions
where transactions.deleted_at is null
and transaction_time > '2020-04-01' limit 1000;
Query 5: 394ms
select transactions.id from transactions
where transaction_time > '2020-04-01' limit 1000000;
I'm unable to figure out why Query 3 is taking this long.
The issue was addressed by adding a composite index.
Both the following now result in fast performance.
Composite key on transaction_time and deleted_at.
Composite key on transaction_time and record_status.
Thanks to @Gordon Linoff and @jarlh; their suggestions led to this finding.
select transactions.id from transactions
where transactions.record_status = 1
and transaction_time > '2020-04-01' limit 1000;
cannot be optimized without a composite index like this -- in this order:
INDEX(record_status, transaction_time)
(No, two separate single-column indexes will not work as well.)
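As a sketch, adding that index to the transactions table from the question could look like this (the index name is illustrative):
ALTER TABLE transactions
  ADD INDEX idx_status_time (record_status, transaction_time);
With equality on record_status first and the range on transaction_time second, the optimizer can seek directly to the record_status = 1 entries and scan only the matching time range, which is exactly what the LIMIT 1000 query needs.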
I have a table that has 1.6M rows. Whenever I use the query below, I get an average of 7.5 seconds.
select * from table
where pid = 170
and cdate between '2017-01-01 0:00:00' and '2017-12-31 23:59:59';
I tried adding LIMIT 1000 or 10000, and changing the date filter to a single month, but it still averages about 7.5 s. I also tried adding a composite index on pid and cdate, but that made it about 1 second slower.
Here is the INDEX list
https://gist.github.com/primerg/3e2470fcd9b21a748af84746554309bc
Can I still make it faster? Is this an acceptable performance considering the amount of data?
Looks like the index is missing. Create this index and see if it helps:
CREATE INDEX cid_date_index ON table_name (pid, cdate);
Also modify your query as below:
select * from table
where pid = 170
and cdate between CAST('2017-01-01 0:00:00' AS DATETIME) and CAST('2017-12-31 23:59:59' AS DATETIME);
Please provide SHOW CREATE TABLE clicks.
How many rows are returned? If it is 100K rows, the effort to shovel that many rows is significant. And what will you do with that many rows? If you then summarize them, consider summarizing in SQL!
Do have cdate as DATETIME.
Do you use id for anything? Perhaps this would be better:
PRIMARY KEY (pid, cdate, id) -- to get benefit from clustering
INDEX(id) -- if still needed (and to keep AUTO_INCREMENT happy)
This smells like Data Warehousing. DW benefits significantly from building and maintaining Summary table(s), such as one that has the daily click count (etc), from which you could very rapidly sum up 365 counts to get the answer.
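As a hedged sketch of that idea, a daily summary table might look like the following; the names clicks_daily, cday and click_count are illustrative, and the detail table is assumed to be the clicks table mentioned above:
CREATE TABLE clicks_daily (
  pid INT NOT NULL,
  cday DATE NOT NULL,
  click_count INT UNSIGNED NOT NULL,
  PRIMARY KEY (pid, cday)
);

-- refresh one day's counts from the detail table
INSERT INTO clicks_daily (pid, cday, click_count)
SELECT pid, DATE(cdate), COUNT(*)
FROM clicks
WHERE cdate >= '2017-01-01' AND cdate < '2017-01-02'
GROUP BY pid, DATE(cdate)
ON DUPLICATE KEY UPDATE click_count = VALUES(click_count);

-- a year's total then sums at most 365 small rows per pid
SELECT SUM(click_count)
FROM clicks_daily
WHERE pid = 170
  AND cday >= '2017-01-01' AND cday < '2018-01-01';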
CAST is unnecessary. Furthermore 0:00:00 is optional -- it can be included or excluded for either DATE or DATETIME. I prefer
cdate >= '2017-01-01'
AND cdate < '2017-01-01' + INTERVAL 1 YEAR
to avoid leap year, midnight, date arithmetic, etc.
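Applied to the query from the question, that pattern would look like this (a sketch assuming the table is the clicks table referenced above and that the composite (pid, cdate) index exists):
SELECT *
FROM clicks
WHERE pid = 170
  AND cdate >= '2017-01-01'
  AND cdate <  '2017-01-01' + INTERVAL 1 YEAR;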
I have seen several questions on SO and, based on those, I have already improved my SQL query.
However, it sometimes takes 12 seconds and sometimes 3 seconds to execute, so the best case is about 3 seconds. The query looks like this:
SELECT ANALYSIS.DEPARTMENT_ID
,SCORE.ID
,SCORE.KPI_SCORE
,SCORE.R_SCORE
,SCORE.FACTOR_SCORE
,SCORE.FACTOR_SCORE
,SCORE.FACTOR_SCORE
,SCORE.CREATED_DATE
,SCORE.UPDATED_DATE
FROM SCORE_INDICATOR SCORE
,AG_SENTIMENT ANALYSIS
WHERE SCORE.TAG_ID = ANALYSIS.ID
AND ANALYSIS.ORGANIZATION_ID = 1
AND ANALYSIS.DEPARTMENT_ID IN (1,2,3,4,5)
AND DATE (ANALYSIS.REVIEW_DATE) BETWEEN DATE ('2016-05-02') AND DATE ('2017-05-02')
ORDER BY ANALYSIS.DEPARTMENT_ID
The SCORE_INDICATOR table has 19345116 rows and AG_SENTIMENT has 19057025 rows in total. I added indexes on ORGANIZATION_ID and DEPARTMENT_ID, and another on the combination of ORGANIZATION_ID and DEPARTMENT_ID. Is there any other way to improve this, or is this the best I can achieve with this amount of data?
Here is a checklist:
1) Make sure the logs table (ANALYSIS) uses the MyISAM engine (it's fast for OLAP queries).
2) Make sure that you've indexed the ANALYSIS.REVIEW_DATE field.
3) Make sure that ANALYSIS.REVIEW_DATE is of type DATE (not CHAR or VARCHAR).
4) Change the query (rearrange the query plan):
SELECT
ANALYSIS.DEPARTMENT_ID
,SCORE.ID
,SCORE.KPI_SCORE
,SCORE.R_SCORE
,SCORE.FACTOR_SCORE
,SCORE.FACTOR_SCORE
,SCORE.FACTOR_SCORE
,SCORE.CREATED_DATE
,SCORE.UPDATED_DATE
FROM SCORE_INDICATOR SCORE
,AG_SENTIMENT ANALYSIS
WHERE
SCORE.TAG_ID = ANALYSIS.ID
AND
ANALYSIS.REVIEW_DATE >= '2016-05-02' AND ANALYSIS.REVIEW_DATE < '2017-05-03'
AND
ANALYSIS.ORGANIZATION_ID = 1
AND
ANALYSIS.DEPARTMENT_ID IN (1,2,3,4,5)
ORDER BY ANALYSIS.DEPARTMENT_ID;
I have changed the order and style to JOIN syntax. The Score table appears to be the child of the Analysis table, which carries the primary criteria; all your criteria are based on qualifying Analysis records. Now, the indexing: wrapping a column in a DATE() function call does not help the optimizer. So, to cover all possible date/time components, I changed the BETWEEN to >= the first date and LESS THAN one day beyond the end. In your example, DATE('2017-05-02') is equivalent to LESS THAN '2017-05-03', which includes 2017-05-02 up to 23:59:59, and the index on the raw column can then be used.
Now for the index: a compound index based on the fields used for the join and the ORDER BY might help.
AG_SENTIMENT table... index ON (ORGANIZATION_ID, DEPARTMENT_ID, REVIEW_DATE, ID)
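Written out as a statement, that suggestion is roughly (the index name is illustrative):
CREATE INDEX idx_org_dept_date_id
    ON AG_SENTIMENT (ORGANIZATION_ID, DEPARTMENT_ID, REVIEW_DATE, ID);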
SELECT
ANALYSIS.DEPARTMENT_ID,
SCORE.ID,
SCORE.KPI_SCORE,
SCORE.R_SCORE,
SCORE.FACTOR_SCORE,
SCORE.FACTOR_SCORE,
SCORE.FACTOR_SCORE,
SCORE.CREATED_DATE,
SCORE.UPDATED_DATE
FROM
AG_SENTIMENT ANALYSIS
JOIN SCORE_INDICATOR SCORE
ON ANALYSIS.ID = SCORE.TAG_ID
where
ANALYSIS.ORGANIZATION_ID = 1
AND ANALYSIS.DEPARTMENT_ID IN (1,2,3,4,5)
AND ANALYSIS.REVIEW_DATE >= '2016-05-02'
AND ANALYSIS.REVIEW_DATE < '2017-05-03'
ORDER BY
ANALYSIS.DEPARTMENT_ID
Not sure how this would work. I have a BETWEEN query, but how would I run a query to list results that match each and every day? For example, entries that exist on 2011-06-17, 2011-06-18, 2011-06-19 and 2011-06-20.
SELECT lookup, `loc`, `octect1` ,`octect2` ,`octect3` ,`octect4`, date, time, count(`lookup`) as count FROM `index`
WHERE date between '2011-06-17' AND '2011-06-20'
GROUP BY lookup
ORDER BY count DESC
Thanks
Instead of BETWEEN, use comparison operators:
SELECT lookup, `loc`, `octect1` ,`octect2` ,`octect3` ,`octect4`,
date, time, count(`lookup`) as count
FROM `index`
WHERE date > '2011-06-17' AND date < '2011-06-20'
GROUP BY lookup
ORDER BY count DESC
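If the intent is to list only lookup values that appear on every one of those days, one approach (a sketch, not part of the original answer, assuming date is a DATE column) is to group and require one distinct date per day in the range:
SELECT lookup, count(`lookup`) as count
FROM `index`
WHERE date >= '2011-06-17' AND date < '2011-06-21'
GROUP BY lookup
HAVING COUNT(DISTINCT date) = 4   -- 2011-06-17 through 2011-06-20 is 4 days
ORDER BY count DESC;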
Why MySQL search all rows when I switch to a 1 year range?
--Table dates
id (int)
date (timestamp)
value (varchar)
PRIMARY(id), date_index(date)
1750 rows
Executing
EXPLAIN SELECT * FROM dates WHERE date BETWEEN '2011-04-27' AND '2011-04-28'
The rows column displays 18 rows.
If I increase or decrease the BETWEEN range - to 1 year, for example - the rows column displays 1750 rows.
EXPLAIN SELECT * FROM dates WHERE date BETWEEN '2011-04-27' AND '2012-04-28'
EXPLAIN SELECT * FROM dates WHERE date BETWEEN '2010-04-27' AND '2011-04-28'
The optimizer builds the query plan depending on several things, including the amount and distribution of the data. My best guess would be that you don't have much more than a year's worth of data, or that using the index for the year's worth of data wouldn't examine many fewer rows than the total table size.
If that doesn't sound right, can you post the output of:
SELECT MIN(date), MAX(date) FROM dates;
SELECT COUNT(*) FROM dates WHERE date BETWEEN '2011-04-27' AND '2012-04-28';
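You could also compare the plan when the index is forced, to see whether the optimizer is right to skip it for the wide range (a sketch using the date_index name from the table definition):
EXPLAIN SELECT * FROM dates FORCE INDEX (date_index)
WHERE date BETWEEN '2011-04-27' AND '2012-04-28';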
This article I wrote shows some examples of how the optimizer works too: What makes a good MySQL index? Part 2: Cardinality