I am trying to write a query using the IN keyword.
Table A
attrId, attrName
Table B
key, attrId, attrVal
Based on key provided, I want to return the all attrName, attrVal combinations. The result will contain of columns from both tables. I don't want to use join using attrId as I trying to practice the usage of IN keyword.
Below is the query that I have attempted:
Select a.attrName, b2.attrVal
from table_A AS a, table_B AS b2
where a.attrId in (Select b1.attrId from Table_B b1 where key = <someKey>)
However I am not getting any result for the query. Also are queries that use IN keyword slow and should be avoided. I have approx 500 entries in table_A and 500k entries in table_B. The other alternative for me is to fetch all attrId from table_B and then fire multiple jdbc queries for each attrId retrieved to get corresponding attrName.
Can you please help out?
Thanks
Your query is performing a CROSS JOIN operation, every row returned from a is being "matched" with every row from b.
Your query is equivalent to:
SELECT a.attrName
, b2.attrVal
FROM table_A a
CROSS
JOIN table_B b2
WHERE a.attrId IN ( <some_list> )
The only way this query doesn't return any rows are 1) no rows in a satisfy the predicate in the WHERE clause, 2) b2 contains no rows, or 3) execution of the query is generating so many rows that it exceeds some available resource (e.g. temporary space) and returns an error, or 4) the client has timed out or cancelled the query before it completes.
I understand you are attempting to write a query that uses the use the IN operator, but the set returned by the query you posted really doesn't make much sense.
Q: are queries that use IN keyword slow and should be avoided.
A: The IN operator itself does not necessarily make a query slow.
For example:
SELECT t.id FROM mytable t WHERE t.id = 2 OR t.id = 3 OR t.id = 5
Could be rewritten using the IN operator as:
SELECT t.id FROM mytable t WHERE t.id IN (2,3,5)
On the other hand, a query using the IN operator with a correlated subquery can be "slow" if either 1) the subquery is slow and/or 2) there's a lot of rows that the subquery has to be evaluated for.
If you want to return rows from b that meet some condition, and then match those to rows in a, you should avoid a CROSS JOIN operation, by supplying some condition for the match (a join predicate in the ON clause)
For example:
SELECT a.attrName
, b.attrVal
FROM table_A a
JOIN table_B b
ON a.attrId = b.attrId
WHERE b.key = '<someKey>'
Related
I don't know if this is possible, but can mysql do a sub select and retrieve multiple records?
Here is my simplified query:
SELECT table1.*,
(
SELECT table2.*
FROM Table2 table2
WHERE table2.key_id = table1.key_id
)
FROM Table1 table1
Basically, Table2 has X amount of records that I need to pull back in the query and I don't want to have to run a secondary query (for instance get the results from Table1 and then loop over those results and then get all the results from Table2).
Thanks.
No. The subquery in the SELECT clause is called a scalar subquery. A scalar subquery has two important properties:
It can only retrieve one column.
It can only retrieve zero or one rows.
A scalar subquery -- as its name implies -- substitutes for a scalar value in an expression. If the subquery returns no rows, the value used in the expression is NULL.
In your case, you can use a LEFT JOIN instead:
SELECT t1.*, t2.*
FROM Table1 t1 LEFT JOIN
Table2 t2
ON t2.key_id = t1.keyid;
Note that table aliases are a good thing. However, they should make the query simpler, so repeating the table name is not a big win.
MySQL can do a subquery that returns multiple rows or multiple columns, but it's not valid to do that in a scalar context.
You're putting a subquery in a scalar context. In other words, in the select-list, a subquery must return one column and one row (or zero rows), because it will be used for one item on the respective row as it uses the select-list to build a result.
I have a MYSQL query of this form:
SELECT
employee.name,
totalpayments.totalpaid
FROM
employee
JOIN (
SELECT
paychecks.employee_id,
SUM(paychecks.amount) totalpaid
FROM
paychecks
GROUP BY
paychecks.employee_id
) totalpayments on totalpayments.employee_id = employee.id
I've recently found that this returns MUCH faster in this form:
SELECT
employee.name,
(
SELECT
SUM(paychecks.amount)
FROM
paychecks
WHERE
paychecks.employee_id = employee.id
) totalpaid
FROM
employee
It surprises me that there would be a difference in speed, and that the lower query would be faster. I prefer the upper form for development, because I can run the subquery independently.
Is there a way to get the "best of both worlds": speedy results return AND being able to run the subquery in isolation?
Likely, the correlated subquery is able to make effective use of an index, which is why it's fast, even though that subquery has to be executed multiple times.
For the first query with the inline view, that causing MySQL to create a derived table, and for large sets, that's effectively a MyISAM table.
In MySQL 5.6.x and later, the optimizer may choose to add an index on the derived table, if that would allow a ref operation and the estimated cost of the ref operation is lower than the nested loops scan.
I recommend you try using EXPLAIN to see the access plan. (Based on your report of performance, I suspect you are running on MySQL version 5.5 or earlier.)
The two statements are not entirely equivalent, in the case where there are rows in employees for which there are no matching rows in paychecks.
An equivalent result could be obtained entirely avoiding a subquery:
SELECT e.name
, SUM(p.amount) AS total_paid
FROM employee e
JOIN paychecks p
ON p.employee_id = e.id
GROUP BY e.id
(Use an inner join to get a result equivalent to the first query, use a LEFT outer join to be equivalent to the second query. Wrap the SUM() aggregate in an IFNULL function if you want to return a zero rather than a NULL value when no matching row with a non-null value of amount is found in paychecks.)
Join is basically Cartesian product that means all the records of table A will be combined with all the records of table B. The output will be
number of records of table A * number of records of table b =rows in the new table
10 * 10 = 100
and out of those 100 records, the ones that match the filters will be returned in the query.
In the nested queries, there is a sample inner query and whatever is the total size of records of the inner query will be the input to the outter query that is why nested queries are faster than joins.
I am using INNER JOIN and WHERE with LEFT function to match records by its first 8 chars.
INSERT INTO result SELECT id FROM tableA a
INNER JOIN tableB b ON a.zip=b.zip
WHERE LEFT(a.street,8)=LEFT(b.street,8)
Both a.street and b.street are indexed (partial index 8).
The query didn't finish in 24+ hours. I am wondering is there a problem with indexes or is there a more efficient way to perform this task
Mysql won't use indexes for columns that have a function applied.
Other databases do allow function based indexes.
You could create a column with just the first 8 chars of a.street and b.street and index those and things will be quicker.
This is your query:
INSERT INTO result
SELECT id
FROM tableA a INNER JOIN
tableB b ON a.zip=b.zip
WHERE LEFT(a.street,8)=LEFT(b.street,8);
MySQL is not smart enough to use a prefix index with this comparison. It will use a prefix index for like and direct string comparisons. If I assume that id is combine from tableA, then the following may perform better:
INSERT INTO result(id)
SELECT id
FROM tableA a
WHERE exists (select 1
from tableB b
where a.zip = b.zip and
b.street like concat(left(a.street, 8), '%')
);
The index that you want is tableB(zip, street(8)) or tableB(zip, street). This may use both components of the index. In any case, it might get better performance even if it cannot use both sides of the index.
Hi i have this query but its giving me an error of Operand should contain 1 column(s) not sure why?
Select *,
(Select *
FROM InstrumentModel
WHERE InstrumentModel.InstrumentModelID=Instrument.InstrumentModelID)
FROM Instrument
according to your query you wanted to get data from instrument and instrumentModel table and in your case its expecting "from table name " after your select * .when the subselect query runs to get its result its not finding table instrument.InstrumentModelId inorder to fetch result from both the table by matching you can use join .or you can also select perticuler fields by tableName.fieldName and in where condition use your condition.
like :
select Instrument.x,InstrumentModel.y
from instrument,instrumentModel
where instrument.x=instrumentModel.y
You can use a join to select from 2 connected tables
select *
from Instrument i
join InstrumentModel m on m.InstrumentModelID = i.InstrumentModelID
When you use subqueries in the column list, they need to return exactly one value. You can read more in the documentation
as a user commented in the documentation, using subqueries like this can ruin your performance:
when the same subquery is used several times, mysql does not use this fact to optimize the query, so be careful not to run into performance problems.
example:
SELECT
col0,
(SELECT col1 FROM table1 WHERE table1.id = table0.id),
(SELECT col2 FROM table1 WHERE table1.id = table0.id)
FROM
table0
WHERE ...
the join of table0 with table1 is executed once for EACH subquery, leading to very bad performance for this kind of query.
Therefore you should rather join the tables, as described by the other answer.
Which query will execute faster and which is perfect query ?
SELECT
COUNT(*) AS count
FROM
students
WHERE
status = 1
AND
classes_id IN(
SELECT
id
FROM
classes
WHERE
departments_id = 1
);
Or
SELECT
COUNT(*) AS count
FROM
students s
LEFT JOIN
classes c
ON
c.id = s.classes_id
WHERE
status = 1
AND
c.departments_id = 1
I have placed two queries both will output same result. Now I want to know which method will execute faster and which method is correct way ?
You should always use EXPLAIN to determine how your query will run.
Unfortunately, MySQL will execute your subquery as a DEPENDENT QUERY, which means that the subquery will be ran for each row in the outer query. You'd think MySQL would be smart enough to detect that the subquery isn't a correlated subquery and would run it just once, alas, it's not yet that smart.
So, MySQL will scan through all of the rows in students, running the subquery for each row, and not utilizing any indexes on the outer query whatsoever.
Writing the query as a JOIN would allow MySQL to utilize indexes, and the following query would be the optimum way to write it:
SELECT COUNT(*) AS count
FROMstudents s
JOIN classes c
ON c.id = s.classes_id
AND c.departments_id = 1
WHERE s.status = 1
This would utilize the following indexes:
students(`status`)
classes(`id`, `departements_id`) : multi-column index
From a design and clarity standpoint I'd avoid inner selects like the first one. It is true that to be 100% sure on if or how each query will be optimized and which will run 'better' requires seeing how the SQL server you're using will interperet it and its plan. In Mysql, use "Explain".
However.... Even without seeing this, my money is still on the Join only version... The inner select version has to perform the inner select in it's entirety before determining the values to use inside the "IN" clause--I know this to be true when you wrap stuff in functions, and pretty sure it's true when sticking a select in as IN arguements. I also know that that's a good way to totally neutralize any benefit you might have with indexes on the tables inside the inner select.
I'm generally of the opinion that Inner selects are only really needed for very rare query situations. Usually, those who use them often are thinking like traditional iterative flow programmers not really thinking in relational DB result set terms...
EXPLAIN Both the queries individually
The difference between both queries is of Sub-Queries vs Joins
Mostly Joins are faster than sub-queries. Join creates execution plan and predict what data is going to process, hence it saves time. On the other hand sub-queries run all the queries until all the data is loaded. Most developer use Sub-queries because these are more readable than JOINS, but where the performance is matter, JOIN is better solution.
The best way to find out is to measure it:
Without index
Query 1: 0.9s
Query 2: 0.9s
With index
Query 1: 0.4s
Query 2: 0.2s
The conclusion is:
If you don't have indexes then it makes no difference which query you use.
The join is faster if you have the right index.
The effect of adding the correct index is greater than the effect of choosing the right query. If performance matters, make sure you have the correct indexes.
Of course, your results may vary depending on the MySQL version and the distribution of data you have.
Here's how I tested it:
1,000,000 students (25% with status 1).
50,000 courses.
10 departments.
Here's the SQL I used to create the test data:
CREATE TABLE students
(id INT PRIMARY KEY AUTO_INCREMENT,
status int NOT NULL,
classes_id int NOT NULL);
CREATE TABLE classes
(id INT PRIMARY KEY AUTO_INCREMENT,
departments_id INT NOT NULL);
CREATE TABLE numbers(id INT PRIMARY KEY AUTO_INCREMENT);
INSERT INTO numbers VALUES (),(),(),(),(),(),(),(),(),();
INSERT INTO numbers
SELECT NULL
FROM numbers AS n1
CROSS JOIN numbers AS n2
CROSS JOIN numbers AS n3
CROSS JOIN numbers AS n4
CROSS JOIN numbers AS n5
CROSS JOIN numbers AS n6;
INSERT INTO classes (departments_id)
SELECT id % 10 FROM numbers WHERE id <= 50000;
INSERT INTO students (status, classes_id)
SELECT id % 4 = 0, id % 50000 + 1 FROM numbers WHERE id <= 1000000;
SELECT COUNT(*) AS count
FROM students
WHERE status = 1
AND classes_id IN (SELECT id FROM classes WHERE departments_id = 1);
SELECT COUNT(*) AS count
FROM students s
LEFT JOIN classes c
ON c.id = s.classes_id
WHERE status = 1
AND c.departments_id = 1;
CREATE INDEX ix_students ON students(status, classes_id);
The two queries won't produce the same results:
SELECT
COUNT(*) AS count
FROM
students
WHERE
status = 1
AND
classes_id IN(
SELECT
id
FROM
classes
WHERE
departments_id = 1
);
...will return the number of rows in the students table that have a classes_id field that is also in the classes table with a departments_id of 1.
SELECT
COUNT(*) AS count
FROM
students s
LEFT JOIN
classes c
ON
c.id = s.classes_id
WHERE
status = 1
AND
c.departments_id = 1
...will return the total number of rows in the students table where the status field is 1 and possibly more than that depending on how your data is organised.
If you want the queries to return the same thing, you need to change the LEFT JOIN to an INNER JOIN so it will match only the rows that suit both conditions.
Run EXPLAIN SELECT ... on both queries and check which one does what ;)