Let's say I have a query:
select product_id, price, price_day
from products
where price>10
and I want to join the result of this query with itself (for example, to get a product's price and its previous-day price in the same row).
I can do this:
select * from
(
select product_id, price, price_day
from products
where price>10
) as r1
join
(
select product_id, price, price_day
from products
where price>10
) as r2
on r1.product_id=r2.product_id and r1.price_day=r2.price_day-1
But as you can see, I am copying the original query and giving it a different alias just to join its result with itself.
Another option is to create a temp table, but then I have to remember to drop it.
Is there a more elegant way to join the result of a query with itself?
A self join will do this:
select a.product_id, a.price, a.price_day,
       b.price as prevdayprice,
       b.price_day as prevday
from products a
inner join products b
  on a.product_id = b.product_id and a.price_day = b.price_day + 1
where a.price > 10
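A minimal sketch of the self join, run against an in-memory SQLite database from Python so it can be tried without a MySQL server. The sample rows are made up, price_day is assumed to be an integer day number (as the question's price_day-1 suggests), and the price filter is applied to both aliases to match the question's original query rather than only to a.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_id INTEGER, price REAL, price_day INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, 12.0, 1), (1, 15.0, 2), (1, 8.0, 3), (2, 20.0, 1), (2, 9.0, 2)],
)

# Self join: each row (a) is paired with the same product's previous-day row (b).
# Filtering both aliases mirrors the question, where both subqueries use price > 10.
self_join_rows = conn.execute(
    """
    SELECT a.product_id, a.price_day, a.price, b.price AS prev_day_price
    FROM products a
    JOIN products b
      ON a.product_id = b.product_id
     AND a.price_day = b.price_day + 1
    WHERE a.price > 10 AND b.price > 10
    ORDER BY a.product_id, a.price_day
    """
).fetchall()
print(self_join_rows)
```

Dropping the b.price > 10 condition gives the behaviour of the answer as written, where only the current-day price is filtered.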
You could do a bunch of things. A few options:
1. Just let MySQL handle the optimization.
   This will likely work fine until you hit many rows.
2. Make a view for your base query and use that.
   This could increase performance, but mostly it increases readability (if done right).
3. Use a (non-temporary) table and insert your initial rows into it. (Unfortunately, in MySQL you cannot refer to a temporary table more than once in the same query.)
   This will likely be more expensive performance-wise until a certain number of rows is reached.
Which choice is "best" depends on how important performance is in your situation and how many rows you need to work with.
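The view option can be sketched as follows, again using Python's sqlite3 module as a stand-in for MySQL. The view name expensive_products and the sample rows are made up for illustration; the point is only that the base query is written once and then joined to itself under two aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_id INTEGER, price REAL, price_day INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, 12.0, 1), (1, 15.0, 2), (1, 8.0, 3), (2, 20.0, 1), (2, 9.0, 2)],
)

# The base query is defined once (expensive_products is a hypothetical name).
conn.execute(
    """
    CREATE VIEW expensive_products AS
    SELECT product_id, price, price_day
    FROM products
    WHERE price > 10
    """
)

# Same join condition as in the question: r1 is the earlier day, r2 the next day.
view_rows = conn.execute(
    """
    SELECT r1.product_id, r1.price_day, r1.price, r2.price AS next_day_price
    FROM expensive_products r1
    JOIN expensive_products r2
      ON r1.product_id = r2.product_id
     AND r1.price_day = r2.price_day - 1
    ORDER BY r1.product_id, r1.price_day
    """
).fetchall()
print(view_rows)
```

Unlike a temp table, the view holds no data of its own, so there is nothing to clean up after each run, although the view definition itself stays in the schema until it is dropped.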
Just to get duplicates in the same row?
select product_id as id1, price as price1, price_day as priceday1,
       product_id as id2, price as price2, price_day as priceday2
from products
where price>10
I have 3 tables:
1. products(product_id,name)
2. orders(id,order_id,product_id)
3. factors(id,order_id,date)
I want to retrieve the product names (products.name) whose orders share an order_id with a row in factors for a given date.
I use this query for this purpose:
select products.name
from products
WHERE products.product_id IN
(
    SELECT distinct orders.product_id
    FROM orders
    WHERE order_id IN (select order_id
                       FROM factors
                       WHERE factors.datex = '2017-04-29')
    GROUP BY product_id
)
But I get no result. Where is my mistake, and how can I resolve it? Thanks.
Your query should be fine. I am rewriting it to make a few changes to the structure, but not the logic (this makes it easier for me to understand the query):
select p.name
from products p
where p.product_id in (select o.product_id
                       from orders o
                       where o.order_id in (select f.order_id
                                            from factors f
                                            where f.datex = '2017-04-29'
                                           )
                      );
Notes on the changes:
When using multiple tables in a query, always qualify the column names.
Use table aliases. They make queries easier to write and to read.
SELECT DISTINCT and GROUP BY are unnecessary in IN subqueries. The logic of IN already handles (i.e. ignores) duplicates. And by explicitly including the operations, you run the risk of a less efficient query plan.
Why might your query not work?
factors.datex has a time component. If so, comparing on the date part will work: date(f.datex) = '2017-04-29'.
There are no factors on that date.
There are no orders that match factors on that date.
There are no products in the orders that match the factors on that date.
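The first possibility, a time component in datex, can be reproduced with a small sketch. It uses Python's sqlite3 module with made-up rows; SQLite stores the timestamp as text, so this only imitates how a MySQL DATETIME column behaves, but the effect on the comparison is the same: a bare date does not equal a timestamp later that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE factors (id INTEGER, order_id INTEGER, datex TEXT)")
conn.executemany(
    "INSERT INTO factors VALUES (?, ?, ?)",
    [(1, 100, "2017-04-29 10:15:00"), (2, 101, "2017-04-30 09:00:00")],
)

# Comparing the full timestamp with a bare date finds nothing.
exact_match = conn.execute(
    "SELECT order_id FROM factors WHERE datex = '2017-04-29'"
).fetchall()

# Comparing only the date part finds the row from that day.
date_match = conn.execute(
    "SELECT order_id FROM factors WHERE date(datex) = '2017-04-29'"
).fetchall()

print(exact_match, date_match)
```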
In the factors table the column is named date, so the condition should be:
factors.date = '2017-04-29'
You have written:
factors.datex = '2017-04-29'
Considering the following query:
SELECT COUNT(table1.someField), COUNT(table2.someField)
FROM table1
INNER JOIN table2 ON table2.id = table1.id
GROUP BY table1.id
I am trying to understand what the difference is (if any) between grouping by table1.id and grouping by table2.id. In short: when inner joining two tables on X=Y, what is the difference between grouping by X and grouping by Y? That's it.
The real world example - pretty straightforward: a table transaction holds transactions information (paid amount, dates etc), and a table transaction_product holds information regarding which products were included in which transaction.
So for example, transaction number 1 could have included products number 1, 2 and 3, and so forth (so the table relation is obviously one-to-many).
The problem: I need to know for each transaction, how much was paid for how many products. This is the query, including both GROUP BY alternatives:
SELECT
`transaction`.id,
SUM(`transaction`.transaction_amount) AS total_amount,
COUNT(`transaction_product`.product_id) AS number_of_products
FROM `transaction`
INNER JOIN `transaction_product` ON `transaction_product`.transaction_id = `transaction`.id
GROUP BY [`transaction`.id [OR] `transaction_product`.transaction_id]
I need to know if there is a difference between the two GROUP BY alternatives. I couldn't find relevant information regarding the GROUP BY behavior in this case in the documentation, therefore any help on clarifying the matter would be much appreciated.
The result of the inner join will be a set of rows with matching transaction IDs, so the set of values that column can have will be the same on both transaction and transaction_product tables.
The group by will return a single row for each distinct value of the grouped column(s), and all the rows that share the same value will be aggregated with the aggregate functions you use.
Result: there won't be any difference between the two options you have, because the same rows will be grouped by the exact same criteria, the set of values being the same on both sides.
TL;DR
There is no difference at all.
There is no difference whatsoever which id you choose to include in your GROUP BY clause. The total number of rows for each transaction id will be the number of products for that transaction. This query should get what you need:
SELECT
    `transaction`.id,
    SUM(`transaction`.transaction_amount) AS total_amount,
    COUNT(1) AS number_of_products
FROM `transaction`
INNER JOIN `transaction_product` ON `transaction_product`.transaction_id = `transaction`.id
GROUP BY `transaction`.id
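To check the claim that both GROUP BY columns give the same result, here is a small sketch using Python's sqlite3 module with made-up data (SQLite quotes the table name with double quotes instead of backticks, and t and tp are aliases added for brevity). It also makes one side effect visible: the join repeats each transaction row once per product, so SUM(transaction_amount) adds the amount once per product. If transaction_amount is already the total for the whole transaction, that sum is inflated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "transaction" (id INTEGER PRIMARY KEY, transaction_amount REAL);
    CREATE TABLE transaction_product (transaction_id INTEGER, product_id INTEGER);
    INSERT INTO "transaction" VALUES (1, 30.0), (2, 50.0);
    INSERT INTO transaction_product VALUES (1, 1), (1, 2), (1, 3), (2, 4);
    """
)

# Same query, with only the GROUP BY column swapped in.
query = """
    SELECT t.id, SUM(t.transaction_amount), COUNT(tp.product_id)
    FROM "transaction" t
    JOIN transaction_product tp ON tp.transaction_id = t.id
    GROUP BY {group_col}
    ORDER BY t.id
"""
by_transaction = conn.execute(query.format(group_col="t.id")).fetchall()
by_product_side = conn.execute(query.format(group_col="tp.transaction_id")).fetchall()

# Identical results; note transaction 1 sums to 90.0 although its amount is 30.0.
print(by_transaction)
print(by_product_side)
```

If the per-transaction amount is what is wanted, selecting MAX(t.transaction_amount) or counting the products in a separate aggregated subquery avoids the repeated addition.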
I have a table of inventory items (holds description, details of item etc.), a table of stock (physical items that we have - items of inventory), and a suppliers table (who supply the stock, but may differ from time to time).
Suppliers -- Stock -- Inventory
Inventory has many stock. Suppliers have many stock. Stock has one supplier, and one inventory
I'm trying to run a query to get all data from inventory, and count how many suppliers it has through a sub query. However, I need to use SELECT *
What I have at the moment:
SELECT
( SELECT COUNT(DISTINCT SupplierID)
FROM Stock
WHERE Stock.InventoryID = Inventory.ID
) AS Suppliers
, *
FROM `Inventory`;
I've tried variations on this, swapping the field order (seen this elsewhere on this site), changing the sub-query etc.
However, it tells me there's an error near '* FROM'. Can anyone suggest a way to do this query please?
Use table aliases:
SELECT (SELECT COUNT(DISTINCT s.SupplierID)
FROM Stock s
WHERE s.InventoryID = i.ID
) AS Suppliers, i.*
FROM `Inventory` i;
The need for a qualification on * is described in the documentation:
Use of an unqualified * with other items in the select list may
produce a parse error. To avoid this problem, use a qualified
tbl_name.* reference
SELECT AVG(score), t1.* FROM t1 ...
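A runnable sketch of the qualified form, using Python's sqlite3 module and made-up rows. The Description column is an assumption, since the question does not list the Inventory columns. SQLite happens to accept an unqualified * after other select items, so this does not reproduce the MySQL parse error; it only shows that the i.* version returns the supplier count followed by every Inventory column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Inventory (ID INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE Stock (ID INTEGER PRIMARY KEY, InventoryID INTEGER, SupplierID INTEGER);
    INSERT INTO Inventory VALUES (1, 'widget'), (2, 'gadget');
    -- Item 1 is stocked twice from supplier 10 and once from supplier 11.
    INSERT INTO Stock VALUES (1, 1, 10), (2, 1, 10), (3, 1, 11), (4, 2, 12);
    """
)

supplier_counts = conn.execute(
    """
    SELECT (SELECT COUNT(DISTINCT s.SupplierID)
            FROM Stock s
            WHERE s.InventoryID = i.ID) AS Suppliers,
           i.*
    FROM Inventory i
    ORDER BY i.ID
    """
).fetchall()
print(supplier_counts)
```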
I was reading some tutorials about the GROUP BY clause and came across the following problem. I don't understand why it was solved this way. The table is as follows:
The requirement is to select the most expensive product in each category, and the following query was the answer:
SELECT
categoryID, productID, productName, MAX(unitprice)
FROM
products A
WHERE
unitprice = (
SELECT
MAX(unitprice)
FROM
products B
WHERE
B.categoryId = A.categoryID)
GROUP BY categoryID;
I don't know why the above query is the answer. Why isn't it just:
SELECT
categoryID, productID, productName, MAX(unitprice)
FROM
products
GROUP BY categoryID;
Also, if the first query is the right one, why does MAX appear in both the outer and the inner query? Isn't it enough to have it in the inner query?
Thanks.
The second query is invalid in standard SQL, and MySQL rejects it when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7.5), because the select list may not contain columns that are neither listed in the GROUP BY clause nor inside an aggregate function.
Therefore you first need to find the highest unit price in each category and then find which product has that unit price. You can accomplish this in many ways; the first query is one of them.
From your picture it looks, as others have mentioned, like you are using MySQL. The MySQL optimiser has historically handled subqueries poorly, and a correlated subquery like this one can be very slow over lots of data, so a good habit is to use joins where possible (if you look at query plans in Postgres, Oracle or MSSQL, they often rewrite subqueries as joins).
The second query will run in MySQL when ONLY_FULL_GROUP_BY is disabled (the default before 5.7.5), but the productID and productName it returns are taken from an arbitrary row in each group, not necessarily from the row with the maximum price.
Below is an example:
SELECT
    A.categoryID, A.productID, A.productName, B.max_unitprice
FROM products A
JOIN (
    SELECT
        MAX(unitprice) AS max_unitprice,
        categoryID
    FROM products
    GROUP BY categoryID) B
ON B.categoryID = A.categoryID
AND B.max_unitprice = A.unitprice   -- keep only the top-priced product(s) in each category
SELECT p.*
FROM products p
WHERE NOT EXISTS ( SELECT 'p2'
                   FROM products p2
                   WHERE p2.categoryId = p.categoryId
                   AND p2.unitPrice > p.unitPrice
                 )
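A small sketch comparing two of the approaches on made-up data, using Python's sqlite3 module. The product rows are hypothetical, and the join version here matches on both the category and the maximum price so that only the top-priced rows remain. If two products tie for the highest price in a category, both forms return both of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (
        productID INTEGER PRIMARY KEY,
        productName TEXT,
        categoryID INTEGER,
        unitPrice REAL
    );
    INSERT INTO products VALUES
        (1, 'tea', 1, 4.0), (2, 'coffee', 1, 9.0),
        (3, 'bread', 2, 3.0), (4, 'cake', 2, 7.5);
    """
)

# Join against the per-category maximum, matching on category AND price.
join_rows = conn.execute(
    """
    SELECT a.categoryID, a.productID, a.productName, a.unitPrice
    FROM products a
    JOIN (SELECT categoryID, MAX(unitPrice) AS max_unitprice
          FROM products
          GROUP BY categoryID) b
      ON b.categoryID = a.categoryID
     AND b.max_unitprice = a.unitPrice
    ORDER BY a.categoryID
    """
).fetchall()

# Keep a product only if no product in its category is more expensive.
not_exists_rows = conn.execute(
    """
    SELECT p.categoryID, p.productID, p.productName, p.unitPrice
    FROM products p
    WHERE NOT EXISTS (SELECT 1
                      FROM products p2
                      WHERE p2.categoryID = p.categoryID
                        AND p2.unitPrice > p.unitPrice)
    ORDER BY p.categoryID
    """
).fetchall()

print(join_rows)
print(not_exists_rows)
```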
I have an SQL query that needs to perform multiple inner joins, as follows:
SELECT DISTINCT adv.Email, adv.Credit, c.credit_id AS creditId, c.creditName AS creditName, a.Ad_id AS adId, a.adName
FROM placementlist pl
INNER JOIN
(SELECT Ad_id, List_id FROM placements) AS p
ON pl.List_id = p.List_id
INNER JOIN
(SELECT Ad_id, Name AS adName, credit_id FROM ad) AS a
ON ...
(few more inner joins)
My question is the following: How can I optimize this query? I was under the impression that, even though the way I currently query the database creates small temporary tables (inner SELECT statements), it would still be advantageous to performing an inner join on the unaltered tables as they could have about 10,000 - 100,000 entries (not millions). However, I was told that this is not the best way to go about it but did not have the opportunity to ask what the recommended approach would be.
What would be the best approach here?
Using derived tables such as
INNER JOIN (SELECT Ad_id, List_id FROM placements) AS p
is not recommended. Let the DBMS work out for itself which values it needs from
INNER JOIN placements AS p
instead of telling it (again) by more or less forcing it to create a view on the table with only those two columns. (And a plain FROM tablename is much more readable, too.)
With SQL you mainly say what you want to see, not how this is going to be achieved. (Well, of course this is just a rule of thumb.) So if no other columns except Ad_id and List_id are used from table placements, the dbms will find its best way to handle this. Don't try to make it use your way.
The same is true of the IN clause, by the way, where you often see WHERE col IN (SELECT DISTINCT colx FROM ...) instead of simply WHERE col IN (SELECT colx FROM ...). This does exactly the same, but with DISTINCT you tell the dbms "make your subquery's rows distinct before looking for col". But why would you want to force it to do so? Why not have it use just the method the dbms finds most appropriate?
Back to derived tables: Use them when they really do something, especially aggregations, or when they make your query more readable.
Moreover,
SELECT DISTINCT adv.Email, adv.Credit, ...
doesn't look too good either. Yes, sometimes you need SELECT DISTINCT, but usually you don't. Most often it is just a sign that you haven't thought your query through.
An example: you want to select clients that bought product X. In SQL you would say: where a purchase of X EXISTS for the client. Or: where the client is IN the set of the X purchasers.
select * from clients c where exists
(select * from purchases p where p.clientid = c.clientid and product = 'X');
Or
select * from clients where clientid in
(select clientid from purchases where product = 'X');
You don't say: Give me all combinations of clients and X purchases and then boil that down so I just get each client once.
select distinct c.*
from clients c
join purchases p on p.clientid = c.clientid and product = 'X';
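The three formulations can be compared on a tiny made-up data set, sketched here with Python's sqlite3 module (the name column and the sample rows are assumptions). A client who bought X twice shows up twice in the plain join, which is exactly the duplication that DISTINCT is then used to hide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE clients (clientid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (clientid INTEGER, product TEXT);
    INSERT INTO clients VALUES (1, 'Ann'), (2, 'Bob');
    -- Ann bought X twice; Bob never bought X.
    INSERT INTO purchases VALUES (1, 'X'), (1, 'X'), (2, 'Y');
    """
)

exists_rows = conn.execute(
    """
    SELECT * FROM clients c
    WHERE EXISTS (SELECT * FROM purchases p
                  WHERE p.clientid = c.clientid AND p.product = 'X')
    """
).fetchall()

in_rows = conn.execute(
    """
    SELECT * FROM clients
    WHERE clientid IN (SELECT clientid FROM purchases WHERE product = 'X')
    """
).fetchall()

# Without DISTINCT, the join yields one row per matching purchase.
join_rows = conn.execute(
    """
    SELECT c.* FROM clients c
    JOIN purchases p ON p.clientid = c.clientid AND p.product = 'X'
    """
).fetchall()

print(exists_rows, in_rows, join_rows)
```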
Yes, it is very easy to just join all tables needed and then just list the columns to select and then just put DISTINCT in front. But it makes the query kind of blurry, because you don't write the query as you would word the task. And it can make things difficult when it comes to aggregations. The following query is wrong, because you multiply money earned with the number of money-spent records and vice versa.
select
sum(money_spent.value),
sum(money_earned.value)
from user
join money_spent on money_spent.userid = user.userid
join money_earned on money_earned.userid = user.userid;
And the following may look correct, but is still incorrect (it only works when the values happen to be unique):
select
sum(distinct money_spent.value),
sum(distinct money_earned.value)
from user
join money_spent on money_spent.userid = user.userid
join money_earned on money_earned.userid = user.userid;
Again: You would not say: "I want to combine each purchase with each earning and then ...". You would say: "I want the sum of money spent and the sum of money earned per user". So you are not dealing with single purchases or earnings, but with their sums. As in
select
  (select sum(value) from money_spent where money_spent.userid = user.userid) as total_spent,
  (select sum(value) from money_earned where money_earned.userid = user.userid) as total_earned
from user;
Or:
select
spent.total,
earned.total
from user
join (select userid, sum(value) as total from money_spent group by userid) spent
on spent.userid = user.userid
join (select userid, sum(value) as total from money_earned group by userid) earned
on earned.userid = user.userid;
So you see, this is where derived tables come into play.
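The difference between the three approaches can be seen on a few made-up rows, sketched here with Python's sqlite3 module (u, s and e are aliases added for brevity). One user has two spendings of 10 and three earnings of 100, 50 and 50, so the true totals are 20 and 200.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user (userid INTEGER PRIMARY KEY);
    CREATE TABLE money_spent (userid INTEGER, value REAL);
    CREATE TABLE money_earned (userid INTEGER, value REAL);
    INSERT INTO user VALUES (1);
    INSERT INTO money_spent VALUES (1, 10.0), (1, 10.0);
    INSERT INTO money_earned VALUES (1, 100.0), (1, 50.0), (1, 50.0);
    """
)

joins = """
    FROM user u
    JOIN money_spent s ON s.userid = u.userid
    JOIN money_earned e ON e.userid = u.userid
"""

# Plain join: 2 x 3 = 6 combined rows, so each sum is multiplied by the other side's row count.
naive = conn.execute("SELECT SUM(s.value), SUM(e.value)" + joins).fetchone()

# DISTINCT removes the multiplication but also collapses equal values that are genuinely separate.
with_distinct = conn.execute(
    "SELECT SUM(DISTINCT s.value), SUM(DISTINCT e.value)" + joins
).fetchone()

# Aggregate each table first in a derived table, then join the totals.
pre_aggregated = conn.execute(
    """
    SELECT spent.total, earned.total
    FROM user u
    JOIN (SELECT userid, SUM(value) AS total FROM money_spent GROUP BY userid) spent
      ON spent.userid = u.userid
    JOIN (SELECT userid, SUM(value) AS total FROM money_earned GROUP BY userid) earned
      ON earned.userid = u.userid
    """
).fetchone()

print(naive, with_distinct, pre_aggregated)
```

Only the pre-aggregated version returns the true totals; the other two are wrong in opposite directions on this data.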