Writing faster select queries on large databases - mysql

I have two tables, one with about 1,000 rows and one with 700,000 rows; table1 and table2 respectively. I wrote a simple select query:
SELECT DISTINCT name1
FROM table1, table2
WHERE table1.name1 = table2.name2;
The query got me exactly what I want but it took 91 seconds! I tried this sub query out:
SELECT DISTINCT name1
FROM table1
WHERE table1.name1 IN(SELECT DISTINCT name2 FROM table2);
That query took a consistent 37 seconds. So there's some performance boost in the way you write select queries. I wrote a third query:
CREATE TEMPORARY TABLE IF NOT EXISTS t1qry
(SELECT DISTINCT table1.name1 FROM table1);
CREATE TEMPORARY TABLE IF NOT EXISTS t2qry
(SELECT DISTINCT table2.name2 FROM table2);
SELECT name2 FROM t2qry JOIN t1qry ON name1 = name2;
DROP TABLE t1qry, t2qry;
This last query took 0.4 seconds to run and produced identical results to the other two.
I knew that each 'select distinct' query took less than a second to run, so I was trying to craft one that would find the common distinct values between the tables. My question is: why does what I wrote work? How do I write a faster select query like this without creating temporary tables?
I've been using MySQL and MariaDB but I'll take any SQL related help here. I'm new to SQL and have been trying to learn as much as I can, so I'll take any pointers or info about this.

If your latter version (including creating the temporary tables) goes so fast, then you probably have indexes on nameX in both tables.
I would suggest using EXISTS:
SELECT DISTINCT name1
FROM table1
WHERE EXISTS (SELECT 1 FROM table2 WHERE table1.name1 = table2.name2);
For these queries, you do want indexes on table1(name1) and table2(name2).
And, if table1 has no duplicates, then leave out the DISTINCT.
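As a rough sketch of the suggestions above, assuming neither column is indexed yet (the index names idx_table1_name1 and idx_table2_name2 are made up for illustration), the indexes could be created as follows. The second statement folds the temporary-table approach from the question into a single query using derived tables:

```sql
-- Hypothetical index names; the columns are the ones from the question.
CREATE INDEX idx_table1_name1 ON table1 (name1);
CREATE INDEX idx_table2_name2 ON table2 (name2);

-- Same idea as the temporary tables: de-duplicate each side first,
-- then join the two small result sets.
SELECT t1.name1
FROM (SELECT DISTINCT name1 FROM table1) AS t1
JOIN (SELECT DISTINCT name2 FROM table2) AS t2
  ON t1.name1 = t2.name2;
```

Whether the derived-table form comes close to the timing of the temporary tables depends on the server version and optimizer, so it is worth running EXPLAIN on each variant before settling on one.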

Related

Are SQL Subquery more efficient

Trying to understand this, but code efficiency increased more than 10x when I stopped using subquery. Table2 has about 5000 rows, while table1 is pretty huge, a few hundred thousand.
Original Statement
SELECT *
FROM table1
WHERE indexedCol IN (
SELECT indexedCol
FROM table2
WHERE iCol2 = "somevalue"
)
So somehow this is way more efficient.
SELECT *
FROM table1
WHERE indexedCol IN
(*comma separated result of SELECT FROM table2*)
Is there something I am missing here? Or is a subquery never a good idea?
The real question is whether the sub-query is correlated. What do I mean by that? Whether the sub-query references table1. If it doesn't, then the answer is simple: if you have two queries
SELECT *
FROM table1
and
SELECT indexedCol
FROM table2
WHERE iCol2 = "somevalue"
The time it takes to run one of them is less than the time it takes to run both of them. This could be even worse (as suggested in the comments) if one of them is run for every row.
This query could be rewritten to use a join like this:
SELECT *
FROM TABLE1
JOIN TABLE2 ON TABLE1.indexedCol = TABLE2.indexedCol AND TABLE2.iCol2 = 'somevalue'
Which will probably solve your problem.
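One caveat with the join form: SELECT * now returns the columns of both tables, and a table1 row appears once for every matching table2 row. A minimal sketch that keeps the shape of the original IN query (only table1 columns, each row at most once) is an EXISTS test. The aliases t1 and t2 are introduced here for illustration:

```sql
-- Returns each table1 row at most once, with table1's columns only.
SELECT t1.*
FROM table1 AS t1
WHERE EXISTS (
    SELECT 1
    FROM table2 AS t2
    WHERE t2.indexedCol = t1.indexedCol
      AND t2.iCol2 = 'somevalue'
);
```

If table2.indexedCol is unique among the rows where iCol2 = 'somevalue', the join and the EXISTS form return the same table1 rows.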

Which of the two queries do I use best for performance?

I have two queries that do the same job.
Query 1:
SELECT * FROM table1 where id = 1
UNION ALL
SELECT * FROM table2 where id = 5
UNION ALL
SELECT * FROM table1 where id = 70
UNION ALL
SELECT * FROM table2 where id = 3
UNION ALL
SELECT * FROM table1 where id = 90
and Query 2:
SELECT * FROM table1 where id IN (1,70,90)
UNION ALL
SELECT * FROM table2 where id IN (5,3)
Which of these two queries is faster?
If the answer is the second query: I've used Query 1 in many places in the project. Is the difference large enough that I should replace it everywhere with the second query?
The second version is more concise, and should be faster, because it only requires actually executing two queries, as opposed to the first version, which does a separate query for each id value.
Assuming id is the primary key in both tables, MySQL might also be able to use the clustered index for faster lookup of matching records.
What are the typical counts? Total of 5 rows? 2 tables? I would predict the performance difference to be a factor of rows/tables in favor of the 2nd (shorter) formulation. In experimenting, I got about 2x.
So, if you have 100 rows from 2 tables, the second formulation will be significantly faster; enough faster to be worth the effort.
Why?
For such simple queries, parsing and optimizing dominates the time.
For newer versions of MySQL, both queries will touch the same number of rows.
For MySQL 5.7.3 and later, no temp table will be needed for either UNION ALL.
Does it matter that the output rows are likely to be in a different order?
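If the row order does matter, one option is to make it explicit instead of relying on the order in which the branches happen to run. The sketch below assumes, as the original queries already do, that table1 and table2 have the same columns. The src column is an addition for illustration, not something from the question:

```sql
-- Tag each row with its source table, then sort the combined result.
(SELECT 'table1' AS src, t.* FROM table1 AS t WHERE id IN (1, 70, 90))
UNION ALL
(SELECT 'table2' AS src, t.* FROM table2 AS t WHERE id IN (5, 3))
ORDER BY src, id;
```

Note that a global ORDER BY means the result has to be collected before it can be sorted, so the no-temp-table behaviour mentioned above for plain UNION ALL should not be expected to apply here.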

Any way to make UNION ALL run faster?

I have a lot of tables with exactly the same structure: TableA, TableB, TableC, TableD, etc., which I want to create views from.
Doing select * from TableA takes 20ms, doing select * from tableB takes 20ms, but doing
(select * from TableA) union all (select * from TableB) takes over 20 minutes.
Those tables have exactly the same columns. Are there any settings in my.cnf that I need to change, or is there a way to create a view that would run faster? All tables have 1.5m to about 10m rows.
Results of explain:
select_type   table       type  rows      Extra
PRIMARY       TableA      ALL   28808685
UNION         TableB      ALL   15316215
UNION RESULT  <union1,2>  ALL             Using temporary
Table structure:
10 varchar(20)'s, 5 unsigned INTs.
My guess is that select * from TableA does not take 20 ms. It takes 20 ms to start returning results.
Although I am going to answer your question, you should revisit your data structure. Having multiple tables with the same layout is usually a really bad idea. Instead, you should have a single table with all the rows.
But, you don't seem to have that.
Try running the union all without parentheses:
select * from TableA union all
select * from TableB;
MySQL has a habit of materializing subqueries. I'm not sure if it does this with union all subqueries, but given your description of the problem, that seems likely.
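For the view the question asks about, a minimal sketch using the unparenthesized form could look like this. The view name all_rows is hypothetical, and it assumes TableA and TableB really do have identical column lists:

```sql
-- Hypothetical view name; branches written without parentheses.
CREATE VIEW all_rows AS
SELECT * FROM TableA
UNION ALL
SELECT * FROM TableB;

-- Check whether "Using temporary" still shows up in the plan.
EXPLAIN SELECT * FROM all_rows;
```

Keep in mind that MySQL cannot merge a view containing UNION into the outer query, so a filter applied to the view may still be evaluated after all rows from every table have been collected.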

Better way to get 15 tables results at a time in MySql

I have about 20 tables. These tables have only id (primary key) and description (varchar). There is quite a lot of data, up to about 400 rows per table.
Right now I have to get data from at least 15 tables at a time.
Right now I am calling them one by one, which means that in one session I am making 15 calls. This is making my process slow.
Can anyone suggest a better way to get the results from the database?
I am using a MySQL database with Java Spring on the server side. Would making a view that combines them all help me?
The application is becoming slow because of this issue and I need a solution that will make my process faster.
It sounds like your schema isn't so great. 20 tables of id/varchar sounds like a broken EAV, which is generally considered broken to begin with. Just the same, I think a UNION query will help out. This would be the "View" to create in the database so you can just SELECT * FROM thisviewyoumade and let it worry about hitting all the tables.
A UNION query works by having multiple SELECT statements "stacked" on top of one another. It's important that each SELECT statement has the same number, order, and types of fields so that when it stacks the results, everything matches up.
In your case, it makes sense to manufacture an extra field so you know which table each row came from. Something like the following:
SELECT 'table1' as tablename, id, col2 FROM table1
UNION ALL
SELECT 'table2', id, col2 FROM table2
UNION ALL
SELECT 'table3', id, col2 FROM table3
... and on and on
The names or aliases of the fields in the first SELECT statement are the field names that are used in the result set that is returned, so no worries about doing a bunch of AS blahblahblah in subsequent SELECT statements.
The real question is whether this union query will perform faster than 15 individual calls on such a tiny tiny tiny amount of data. I think the better option would be to change your schema so this stuff is already stored in one table just like this UNION query outputs. Then you would need a single select statement against a single table. And 400x20=8000 is still a dinky little table to query.
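A rough sketch of that single-table layout follows. The table name lookup, the column sizes, and the composite primary key are all assumptions made for illustration. The description column comes from the question:

```sql
-- One table replacing the ~20 id/description tables (hypothetical name).
CREATE TABLE lookup (
    tablename   VARCHAR(64)  NOT NULL,  -- which original table the row came from
    id          INT          NOT NULL,
    description VARCHAR(255) NOT NULL,
    PRIMARY KEY (tablename, id)
);

-- Copy one of the existing tables in; repeat per source table.
INSERT INTO lookup (tablename, id, description)
SELECT 'table1', id, description FROM table1;

-- A single round trip then fetches whichever groups are needed.
SELECT tablename, id, description
FROM lookup
WHERE tablename IN ('table1', 'table2', 'table3');
```

With the primary key leading on tablename, the final SELECT can be answered from the index rather than by scanning every row.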
To get a row of all descriptions into app code in a single round trip, send a query like this:
select t1.description, ... t15.description
from t -- this should contain all needed ids
join table1 t1 on t1.id = t.t1id
...
join table15 t15 on t15.id = t.t15id
I cannot give you exactly what you need, but here is a way to merge all those tables' values into a single table:
CREATE TABLE table_name AS (
SELECT *
FROM table1 t1
LEFT JOIN table2 t2 ON t1.ID=t2.ID
...
LEFT JOIN tableN tN ON tN-1.ID=tN.ID
)

Nested SELECT SQL Queries Workbench

Hi, I have this query, but it's giving me the error "Operand should contain 1 column(s)" and I'm not sure why.
Select *,
(Select *
FROM InstrumentModel
WHERE InstrumentModel.InstrumentModelID=Instrument.InstrumentModelID)
FROM Instrument
According to your query, you want to get data from both the Instrument and InstrumentModel tables. A subquery placed in the select list cannot return several columns, which is why the inner Select * fails there. To fetch results from both tables by matching rows, you can use a join, or you can select particular fields as tableName.fieldName and put your matching condition in the WHERE clause.
Like this:
select Instrument.x,InstrumentModel.y
from instrument,instrumentModel
where instrument.x=instrumentModel.y
You can use a join to select from 2 connected tables
select *
from Instrument i
join InstrumentModel m on m.InstrumentModelID = i.InstrumentModelID
When you use subqueries in the column list, they need to return exactly one value. You can read more in the documentation.
As a user commented in the documentation, using subqueries like this can ruin your performance:
when the same subquery is used several times, mysql does not use this fact to optimize the query, so be careful not to run into performance problems.
example:
SELECT
col0,
(SELECT col1 FROM table1 WHERE table1.id = table0.id),
(SELECT col2 FROM table1 WHERE table1.id = table0.id)
FROM
table0
WHERE ...
the join of table0 with table1 is executed once for EACH subquery, leading to very bad performance for this kind of query.
Therefore you should rather join the tables, as described by the other answer.
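For the two-subquery example above, a join-based equivalent might look like the sketch below. The aliases t0 and t1 are introduced for illustration, and the elided WHERE clause is left out. A LEFT JOIN is used so that table0 rows without a match in table1 still appear with NULLs, as they would with the scalar subqueries:

```sql
-- table1 is joined once, and both columns are read from that one match.
SELECT t0.col0, t1.col1, t1.col2
FROM table0 AS t0
LEFT JOIN table1 AS t1 ON t1.id = t0.id;
```

This assumes table1.id matches at most one row per table0 row. If it can match several, the join returns one output row per match, whereas the scalar subqueries would raise an error.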