I am working with MySQL databases. A particular table has two columns (column1 and column2) and 10,000,000+ rows. I want to get all entries where column1 is one of a list of 50,000 numbers. I am currently using this query:
Select * from db.table where column1 in (list of 50000 no.s)
Is there a faster query than this?
I cannot talk about MySQL, only SQL Server, but the same principle may apply.
On SQL Server, a large IN list has a serious problem: no statistics. That means that with a non-trivial number of values, the query plan is a table scan.
It is better to make a temporary table, load the IDs into it (AND put a unique index on it, which provides statistics), and then JOIN the two tables. That gives the query analyzer more to work with.
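A minimal sketch of that approach in MySQL syntax, assuming the table and column names from the question (db.table, column1) and a throwaway wanted_ids temporary table:
CREATE TEMPORARY TABLE wanted_ids (id BIGINT NOT NULL, UNIQUE KEY (id));
-- load the 50,000 numbers, e.g. with multi-row INSERTs or LOAD DATA INFILE
INSERT INTO wanted_ids (id) VALUES (1), (2), (3); -- placeholder values
-- join instead of a huge IN list
SELECT t.*
FROM db.table AS t
JOIN wanted_ids AS w ON w.id = t.column1;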
INDEX(column1)
Are there only 2 columns in the table? If not, then don't use SELECT *, but spell out the column names.
Please provide EXPLAIN SELECT ...
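Putting those suggestions together, a rough sketch (table and column names are taken from the question; the IN list is a placeholder for the 50,000 numbers):
ALTER TABLE db.table ADD INDEX (column1);
EXPLAIN
SELECT column1, column2
FROM db.table
WHERE column1 IN (1, 2, 3 /* ... */);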
I know that you can use
SELECT *
FROM table
WHERE id IN (ids)
In my case, I have 100,000 ids.
I'm wondering if MySQL has a limit for IN clause. If anyone knows a more efficient way to do this, that would be great!
Thanks!
Just this week I had to kill -9 a MySQL 5.7 Server where one of the developers had run a query like you describe, with a list of hundreds of thousands of id's in an IN( ) predicate. It caused the thread running the query to hang, and it wouldn't even respond to a KILL command. I had to shut down the MySQL Server instance forcibly.
(Fortunately it was just a test server.)
So I recommend you don't do that. Instead, consider one of the following options:
Split your list of 100,000 ids into batches of at most 1,000, and run the query on each batch. Then use application code to merge the results.
Create a temporary table with an integer primary key.
CREATE TEMPORARY TABLE mylistofids (id INT PRIMARY KEY);
INSERT the 100,000 ids into it. Then run a JOIN query for example:
SELECT t.* FROM mytable AS t JOIN mylistofids USING (id)
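The INSERT step can be done with multi-row statements instead of 100,000 single-row inserts; a rough sketch with placeholder values:
INSERT INTO mylistofids (id) VALUES (1), (2), (3), (4); -- a few thousand values per statement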
Bill Karwin's suggestions are good.
The number of values in an IN clause is limited only by max_allowed_packet in my.ini.
MariaDB creates a temporary table when the IN clause exceeds 1000 values.
Another problem with such a number of IDs is transferring the data from the PHP script (for example) to the MySQL server: it will be a very long query string. You can also create a stored procedure containing that SELECT and just call it from your script. It will be more efficient in terms of passing data from your script to MySQL.
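To check or raise the packet limit mentioned above, something like the following (a sketch; 64 MB is just an example value, SET GLOBAL needs the appropriate privilege, and it only affects new connections, so put the value in my.ini to make it permanent):
SHOW VARIABLES LIKE 'max_allowed_packet';
SET GLOBAL max_allowed_packet = 67108864; -- 64 MB, example value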
I have a table with two fields: a,b
Both fields are indexed separately -- no compound index.
While trying to run a select query with both fields:
select * from table where a=<sth> and b=<sth>
It took over 400 ms, while
select * from table where a=<sth>
took only 30 ms.
Do I need to set a compound index for (a, b)?
Reasonably, if I have indexes on both a and b, queries on a AND b like the one above should be fast, right?
For this query:
select *
from table
where a = <sth> and b = <sth>;
The best index is on table(a, b). It can also be used for your second query.
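A sketch of creating that compound index (mytable, a, and b stand in for your real names):
ALTER TABLE mytable ADD INDEX idx_a_b (a, b);
-- or equivalently: CREATE INDEX idx_a_b ON mytable (a, b);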
Usually (but not always).
In your case, the number of distinct values in a (and b) and the number of columns you use in your SELECT can change the way the DB decides whether to use an index or scan the table.
For example, if your table has, say, 100,000 records and 80,000 of them have the same value for a, then when you query:
SELECT * FROM table WHERE a=<your value>
the DB engine could decide to scan the table directly without using the index, while if you query
SELECT a, b FROM table WHERE a=<your value>
and you added column b to the index too (directly as a key column, or with INCLUDE where the DBMS supports it), it's quite probable that the DB engine will use the index.
Have a look around the internet for indexing tips, and also have a look at How can I index these queries?
The SQLite documentation explains how index lookups work.
Once the database has used an index to look up some rows, the other index is no longer efficient to use (there is no easy method to filter the results of the first lookup because the other index refers to rows in the original table, not to entries in the first index). See Multiple AND-Connected WHERE-Clause Terms.
To make index lookups on two columns as fast as possible, you need Multi-Column Indices.
I have a database table with 10,000 rows in it and I'd like to select a few thousand items using something like the following:
SELECT id FROM models WHERE category_id = 2
EDIT: The id column is the primary index of the table. Also, the table has another index on category_id.
My question is what would be the impact on performance? Would the query run slow? Should I consider splitting my models table into separate tables (one table for each category)?
This is what database engines are designed for. Just make sure you have an index on the column that you're using in the WHERE clause.
You can try this to get the first 100 records:
SELECT id FROM models WHERE category_id = 2 LIMIT 100
Also, you can create an index on that column for fast retrieval of the results:
ALTER TABLE `table` ADD INDEX `category_id` (`category_id`)
EDIT:
If you have an index created on your columns then you don't have to worry about the performance; database engines are smart enough to take care of it.
My question is what would be the impact on performance? Would the query run slow? Should I consider splitting my models table into separate tables?
No, you don't have to split your tables, as that would not help you gain performance.
I'm fairly new to SQL; however, I would first index the column.
I agree with R.T.'s solution. In addition, I can recommend the link below:
https://indexanalysis.codeplex.com/
Download the SQL code. It's a stored procedure that helps me a lot when I want to analyze the impact of indexes or check their status in my database.
Please check.
I run the following SQL Query on a MySQL platform.
Table A has a single column (its primary key) and 25K rows.
Table B has several columns and 75K rows.
It takes 20 minutes to execute the following query. I would be glad if you could help.
INSERT INTO sometable
SELECT A.PrimaryKeyColumn as keyword, 'SomeText', B.*
FROM A, B
WHERE B.PrimaryKeyColumn = CONCAT(A.PrimaryKeyColumn, B.NotUniqueButIndexedColumn);
Run the SELECT without the INSERT to see if the problem is with the SELECT or not.
If it is with the SELECT, follow the MySQL documentation explaining how to optimize queries using EXPLAIN.
If the SELECT runs fine but the INSERT takes forever, make sure you don't have a lot of unnecessary indexes on sometable. Beyond that, you may need to do some MySQL tuning and/or OS tuning (e.g., memory or disk performance) to get a measurable performance boost with the INSERT.
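For example, something along these lines (the query is the one from the question; EXPLAIN shows the join order and which indexes, if any, are used):
-- run the SELECT on its own first to see whether it or the INSERT is the slow part,
-- then inspect the plan:
EXPLAIN
SELECT A.PrimaryKeyColumn AS keyword, 'SomeText', B.*
FROM A, B
WHERE B.PrimaryKeyColumn = CONCAT(A.PrimaryKeyColumn, B.NotUniqueButIndexedColumn);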
If I get it right, you are asking the server to consider roughly 1.875 billion row combinations (25K × 75K), minus whatever does not match the WHERE clause.
For that, 20 minutes doesn't sound too bad...
I have a simple table with 1,000,000 rows.
This table has a datetime field that I am always querying on in the WHERE clause.
SELECT * from my_table WHERE date_time = 'blabla';
Is it recommended to put an index on it for that reason only (the WHERE clause)?
Definitely yes. Without it, MySQL will have to do a full table scan (going row by row, comparing date_time to the provided value).
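A sketch of adding that index (my_table and date_time are the names from the question):
ALTER TABLE my_table ADD INDEX idx_date_time (date_time);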
Yes, put an index on fields in the where clause.
Just take a look at EXPLAIN EXTENDED (put it in front of your query) and see what the query actually touches.
Also note: do not use SELECT *; always list only the fields you actually use.