I have a very simple select like this:
SELECT * FROM table
WHERE column1 IN (5, 20, 30);
There is an index on column1, and EXPLAIN shows that the index is used, so everything looks fine.
But if there are more than three values in the list, like this:
SELECT * FROM table
WHERE column1 IN (5, 20, 30, 40);
the index is not used and the select runs through all records. Am I doing something wrong? Thanks.
How many rows does MySQL think there are in the table?
MySQL often (usually correctly!) assumes it will be quicker to do a sequential scan of the rows than to mess around with the more complex access via an index.
It varies from DBMS to DBMS, but the tradeoff point is somewhere around 30% of the rows.
I.e., if the optimiser expects more than about 30% of the rows to be selected, it will sequentially scan the whole table, as this is usually faster than doing lots of direct accesses via the index.
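One way to see which side of that tradeoff the optimizer picked is to compare the plan with and without an index hint. This is only a sketch: `my_table` and `idx_column1` are placeholder names (the question does not give the real ones), and refreshing the statistics may or may not change the plan.

```sql
-- Placeholder names: my_table and idx_column1 are assumptions, not from the question.
-- Refresh the index statistics that the optimizer bases its row estimates on.
ANALYZE TABLE my_table;

-- Plan chosen by the optimizer on its own (look at the key and rows columns).
EXPLAIN SELECT * FROM my_table
WHERE column1 IN (5, 20, 30, 40);

-- Same query with the index forced, to compare against the plan above.
EXPLAIN SELECT * FROM my_table FORCE INDEX (idx_column1)
WHERE column1 IN (5, 20, 30, 40);
```

If the forced version is not actually faster on the real data, the full scan was probably the right choice.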
Related
I want to perform a MySQL query on a large table (500+ million rows). A particular column has thousands of possible values. I want to return all the rows where this column is one of just 100 of these values.
Let's say column1 can have these values: A1, A2, A3, .., A12309. I want to do a query like so:
SELECT column2 FROM tbl WHERE column1='A1' OR column1='A2'... OR column1='A100'
Having 100 different OR operators is highly inefficient.
Is there a better way to structure this query?
Is MySQL suited for this or is there a better DBMS / data analysis tool?
SELECT column2 FROM tbl
WHERE column1 IN ('A1', 'A2', ... 'A100')
This will work reasonably well for 100 values.
This should help, too:
INDEX(column1, column2)
Which version of MySQL? (Some versions have a cutoff: above some number of values in the IN list you get a less efficient execution plan, especially for 500M rows.)
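Put together, the suggestions above might look like the following sketch. The index name `idx_column1_column2` is made up for illustration, and the three-value IN list only stands in for the full list of 100 values.

```sql
-- Which MySQL version is this? (Answers the question above.)
SELECT VERSION();

-- Composite index so the query can be answered from the index alone.
-- The index name is a placeholder.
ALTER TABLE tbl ADD INDEX idx_column1_column2 (column1, column2);

-- Check the plan; a shortened IN list stands in for the real 100 values.
EXPLAIN SELECT column2 FROM tbl
WHERE column1 IN ('A1', 'A2', 'A3');
```

If the Extra column of the EXPLAIN output shows "Using index", the query is being served from the index without touching the table rows.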
I am working with MySQL databases. There are two columns in a particular table (column1 and column2) and 10000000+ rows. I want to get all entries where column1 is one of a list of 50000 numbers. I am currently using this query:
Select * from db.table where column1 in (list of 50000 numbers)
Is there a faster query than this?
I cannot speak for MySQL, only SQL Server, but the same principle may apply.
On SQL Server an IN list has a serious problem: there are no statistics for it. This means that with a non-trivial number of values, the query plan is a table scan.
It is better to make a temporary table, load the IDs into it (and put a unique index on it, which gives it statistics), and then JOIN the two tables. That gives the query analyzer more to work with.
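In MySQL syntax, the temporary-table approach could be sketched roughly as below. The names `wanted_ids` and `big_table` are placeholders, the INT column type is an assumption, and whether this beats a long IN list on MySQL would need to be measured.

```sql
-- Placeholder names and types: wanted_ids, big_table and INT are assumptions.
CREATE TEMPORARY TABLE wanted_ids (
    id INT NOT NULL,
    PRIMARY KEY (id)          -- unique index on the lookup values
);

-- Load the list of values (only three shown here; the real list has 50000).
INSERT INTO wanted_ids (id) VALUES (5), (20), (30);

-- Join instead of using a long IN list.
SELECT t.column1, t.column2
FROM big_table AS t
JOIN wanted_ids AS w ON w.id = t.column1;
```

For a list of this size, loading the temporary table in a few multi-row INSERT statements is usually more practical than one statement per value.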
INDEX(column1)
Are there only 2 columns in the table? If not, then don't use SELECT *, but spell out the column names.
Please provide EXPLAIN SELECT ...
The user will select a date, e.g. 06-MAR-2017, and I need to retrieve hundreds of thousands of records with a date earlier than 06-MAR-2017 (the number could vary depending on the user's selection).
For the above case, I am using this query:
SELECT col from table_a where DATE_FORMAT(mydate,'%Y%m%d') < '20170306'
I feel that retrieving the records is kind of slow. Is there a faster way to get date results like this?
With 100,000 records to read, the DBMS may decide to read the table record by record (full table scan) and there wouldn't be much you could do.
If on the other hand the table contains billions of records, so 100,000 would just be a small part, then the DBMS may decide to use an index instead.
In any case you should at least give the DBMS the opportunity to select via an index. This means: create an index first (if one doesn't exist yet).
You can create an index on the date column alone:
create index idx on table_a (mydate);
or even provide a covering index that contains the other columns used in the query, too:
create index idx on table_a (mydate, col);
Then write your query such that the date column is accessed directly. You have no index on DATE_FORMAT(mydate,'%Y%m%d'), so the indexes above don't help with your original query. You'd need a query that looks up the date itself:
select col from table_a where mydate < date '2017-03-06';
Whether the DBMS then uses the index or not is still up to the DBMS. It will try to use the fastest approach, which very well can still be the full table scan.
If you wrap a column in a function call in a comparison, MySQL cannot use an ordinary index on that column for the lookup and will typically read every row.
The fastest method would be to have an index on mydate, and to make the right side ('20170306') the same datatype as the column (and the index).
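To check this on your own data, the two forms of the query can be compared with EXPLAIN. This is a sketch that assumes one of the indexes on mydate suggested above already exists; the actual plan still depends on the table size and the optimizer.

```sql
-- Function applied to the column: an index on mydate cannot be used for a range lookup.
EXPLAIN SELECT col FROM table_a
WHERE DATE_FORMAT(mydate, '%Y%m%d') < '20170306';

-- Bare column compared with a DATE literal: a range scan on the index is possible.
EXPLAIN SELECT col FROM table_a
WHERE mydate < DATE '2017-03-06';
```

Compare the type, key and rows columns of the two outputs to see whether the index is actually chosen.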
What is the complexity of the "group by" statement in MySQL?
I am managing very big tables and I would also like to know if there is any method to calculate how much time a query is going to take.
This question is impossible to answer without knowing what the entire query looks like. Some group bys can be prohibitively expensive while others are very cheap. It all depends on how the indexes in the database are set up, whether the value you group by can be cached, etc.
For example, this is a very cheap group by:
CREATE TABLE t (a INT, KEY(a));
SELECT * FROM t WHERE 1 GROUP BY a;
since a is indexed.
But for something like this, it's very expensive since it would require a table scan.
CREATE TABLE t (a INT);
SELECT * FROM t WHERE 1 GROUP BY a;
Generally, if a key is not available, the database will create a temporary table in memory for group by clauses, go through all the values, insert each value into the temporary table with an index to the corresponding row in the result set, then select from the temporary table, pick the first row from each group and send that back as the result. Depending on whether you use the "extra" rows per group (i.e. using MAX(), GROUP_CONCAT() or similar), it will need to fetch all rows again.
You can use EXPLAIN to figure out what strategy MySQL will use. The 'Extra' column will contain (in ascending order of cost to execute) 'Using index' if an index can be used, 'Using filesort' if the rows have to be sorted in an extra pass, and 'Using temporary' if a temporary table will be required.
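As a sketch, the two cases can be compared side by side. The table names `t_indexed` and `t_plain` are made up here so that both variants can exist at once; the exact contents of the Extra column vary with the MySQL version and the data.

```sql
-- Hypothetical table names so both variants can coexist.
CREATE TABLE t_indexed (a INT, KEY (a));
CREATE TABLE t_plain   (a INT);

-- With a key on a, the grouping can be resolved from the index.
EXPLAIN SELECT a FROM t_indexed GROUP BY a;

-- Without a key, expect a full scan plus extra work such as a temporary table.
EXPLAIN SELECT a FROM t_plain GROUP BY a;
```

EXPLAIN only shows the strategy and a row estimate; it does not predict the run time, so timing the query on representative data is still necessary.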
I run the following SQL Query on a MySQL platform.
Table A is a table which has a single column (primary key) and 25K rows.
Table B has several columns and 75K rows.
It takes 20 minutes to execute the following query. I would be glad if you could help.
INSERT INTO sometable
SELECT A.PrimaryKeyColumn as keyword, 'SomeText', B.*
FROM A, B
WHERE B.PrimaryKeyColumn = CONCAT(A.PrimaryKeyColumn, B.NotUniqueButIndexedColumn);
Run the SELECT without the INSERT to see if the problem is with the SELECT or not.
If it is with the SELECT, follow the MySQL documentation explaining how to optimize queries using EXPLAIN.
If the SELECT runs fine but the INSERT takes forever, make sure you don't have a lot of unnecessary indexes on sometable. Beyond that, you may need to do some MySQL tuning and/or OS tuning (e.g., memory or disk performance) to get a measurable performance boost with the INSERT.
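The first two steps might look like this, using the query from the question unchanged. Running a COUNT(*) first is an extra suggestion here, not something the question mentions; it shows how many rows the INSERT would have to write.

```sql
-- Plan for the SELECT on its own, without the INSERT.
EXPLAIN
SELECT A.PrimaryKeyColumn AS keyword, 'SomeText', B.*
FROM A, B
WHERE B.PrimaryKeyColumn = CONCAT(A.PrimaryKeyColumn, B.NotUniqueButIndexedColumn);

-- How many rows would actually be inserted?
SELECT COUNT(*)
FROM A, B
WHERE B.PrimaryKeyColumn = CONCAT(A.PrimaryKeyColumn, B.NotUniqueButIndexedColumn);
```

If the COUNT(*) alone already takes most of the 20 minutes, the join is the bottleneck rather than the INSERT.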
If I get it right, you are roughly trying to insert 1.875 billion records (25K x 75K), which does not match the WHERE clause.
For that, 20 minutes doesn't sound too bad.