SQL subquery going haywire - mysql

I am trying to fetch a combined result from two separate tables.
The transaction_fact table has around 3.6 million rows and translation_table has around 300,000 rows.
I want the sum of amount for all transactions, grouped by location and by product within that location. The fact table only holds the location ID and product ID, and I would like the names in the result, so I am using subqueries.
My query is as follows:
SELECT
( SELECT translation
FROM translation_table
WHERE dim_name LIKE 'location_dim'
AND lang_id LIKE 'es'
AND dim_id LIKE CAST(o.loc_id AS CHAR(50))
AND field_name LIKE 'city') AS Location
, ( SELECT product_name
FROM prod_dim
WHERE prod_id = o.prod_id) AS Product
, SUM(amount)
FROM transaction_fact o
GROUP
BY loc_id
, prod_id
ORDER
BY loc_id
, prod_id;
But this query does not return anything; it just keeps processing.
I waited for about an hour and a half and still got no result.
Please tell me what might be going wrong.

Joining the tables should eliminate the need for subqueries and give some performance boost. If not, you may need to provide more details on the table structure before we can help. Something like this should get you started:
SELECT t.translation AS Location, p.product_name AS Product, SUM(o.amount) AS Total
FROM transaction_fact o
INNER JOIN translation_table t ON CAST(o.loc_id AS char(50)) = t.dim_id
INNER JOIN prod_dim p ON p.prod_id = o.prod_id
WHERE t.dim_name = 'location_dim'
AND t.lang_id = 'es'
AND t.field_name = 'city'
-- Group by the IDs as well, so that ORDER BY only refers to grouped columns
GROUP BY o.loc_id, o.prod_id, t.translation, p.product_name
ORDER BY o.loc_id, o.prod_id;
Notes: I've changed the LIKEs to =, as LIKE is for when you want to match on a pattern that includes wildcards.
The CAST that is used in the join to translation_table is not ideal. If you could do away with that you'd get better performance.
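One further variation that may be worth trying on a fact table of this size is to aggregate first and look up the names afterwards, so the translation and product lookups run once per (loc_id, prod_id) group rather than once per transaction row. The following is only a sketch under stated assumptions: the question does not show the table definitions, so the column types and the sample rows here are made up for illustration.

```sql
-- Hypothetical, cut-down versions of the three tables; all types are assumptions.
CREATE TABLE transaction_fact (loc_id INT, prod_id INT, amount DECIMAL(12,2));
CREATE TABLE prod_dim (prod_id INT PRIMARY KEY, product_name VARCHAR(100));
CREATE TABLE translation_table (
    dim_name    VARCHAR(50),
    lang_id     VARCHAR(5),
    dim_id      VARCHAR(50),
    field_name  VARCHAR(50),
    translation VARCHAR(200)
);

-- Made-up sample rows.
INSERT INTO transaction_fact VALUES (1, 10, 5.00), (1, 10, 7.50), (2, 10, 1.25);
INSERT INTO prod_dim VALUES (10, 'Widget');
INSERT INTO translation_table VALUES
    ('location_dim', 'es', '1', 'city', 'Madrid'),
    ('location_dim', 'es', '2', 'city', 'Sevilla');

-- Aggregate the large table first, then join the small grouped result
-- to the name tables.
SELECT t.translation AS Location, p.product_name AS Product, s.Total
FROM (
    SELECT loc_id, prod_id, SUM(amount) AS Total
    FROM transaction_fact
    GROUP BY loc_id, prod_id
) s
INNER JOIN translation_table t
        ON t.dim_id = CAST(s.loc_id AS CHAR(50))
       AND t.dim_name = 'location_dim'
       AND t.lang_id = 'es'
       AND t.field_name = 'city'
INNER JOIN prod_dim p ON p.prod_id = s.prod_id
ORDER BY s.loc_id, s.prod_id;
```

With the sample rows this returns Madrid / Widget / 12.50 and Sevilla / Widget / 1.25. Whether it actually beats the plain join on the real data depends on the indexes available, so comparing the EXPLAIN output of both versions is the safer way to decide.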

Related

MySQL: Optimizing Sub-queries

I have this query I need to optimize further since it requires too much cpu time and I can't seem to find any other way to write it more efficiently. Is there another way to write this without altering the tables?
SELECT category, b.fruit_name, u.name
, r.count_vote, r.text_c
FROM Fruits b, Customers u
, Categories c
, (SELECT * FROM
(SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r
WHERE b.fruit_id = r.fruit_id
AND u.customer_id = r.customer_id
AND category = "Fruits";
This is your query re-written with explicit joins:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN
(
SELECT * FROM
(
SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r on r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
CROSS JOIN Categories c
WHERE c.category = 'Fruits';
(I am guessing here that the category column belongs to the categories table.)
There are some parts that look suspicious:
Why do you cross join the Categories table, when you don't even display a column of the table?
What is ORDER BY fruit_id, count_vote DESC, r_id supposed to do? Subquery results are considered unordered sets, so an ORDER BY is superfluous and can be ignored by the DBMS. What do you want to achieve here?
SELECT * FROM [ reviews ] GROUP BY fruit_id is invalid. If you group by fruit_id, what count_vote and what r.text_c do you expect to get for the ID? You don't tell the DBMS (which would be something like MAX(count_vote) and MIN(r.text_c), for instance). MySQL should throw an error, but silently replaces count_vote, r.text_c by ANY_VALUE(count_vote), ANY_VALUE(r.text_c) instead. This means you get arbitrarily picked values for a fruit.
The answer hence to your question is: Don't try to speed it up, but fix it instead. (Maybe you want to place a new request showing the query and explaining what it is supposed to do, so people can help you with that.)
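If the intention behind the ORDER BY plus GROUP BY trick was "one review per fruit, the one with the most votes, lowest r_id on ties", that can be stated explicitly. This is a guess at the intent, not something the question confirms, and the table definition and sample rows below are assumptions made only so the sketch runs on its own.

```sql
-- Hypothetical Reviews table; column types are assumptions.
CREATE TABLE Reviews (
    r_id        INT PRIMARY KEY,
    fruit_id    INT,
    customer_id INT,
    count_vote  INT,
    text_c      VARCHAR(200)
);

-- Made-up sample rows: fruit 100 has a tie on count_vote between r_id 2 and 3.
INSERT INTO Reviews VALUES
    (1, 100, 7, 3, 'ok'),
    (2, 100, 8, 9, 'great'),
    (3, 100, 9, 9, 'also great'),
    (4, 200, 7, 1, 'meh');

-- Keep a review only if no other review of the same fruit beats it:
-- more votes wins, and on equal votes the lower r_id wins.
SELECT r.fruit_id, r.customer_id, r.count_vote, r.text_c
FROM Reviews r
WHERE NOT EXISTS (
    SELECT 1
    FROM Reviews better
    WHERE better.fruit_id = r.fruit_id
      AND (better.count_vote > r.count_vote
           OR (better.count_vote = r.count_vote AND better.r_id < r.r_id))
)
ORDER BY r.fruit_id;
```

For the sample data this keeps r_id 2 for fruit 100 and r_id 4 for fruit 200. The result is deterministic, unlike the GROUP BY version, and can be joined to Fruits and Customers in the usual way.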
Your Categories table does not seem to be joined/related to the others; this produces a Cartesian product between all the rows.
If you want a distinct result, don't use GROUP BY but DISTINCT, so you can avoid an unnecessary subquery,
and you don't need an ORDER BY on a subquery.
SELECT category
, b.fruit_name
, u.name
, r.count_vote
, r.text_c
FROM Fruits b
INNER JOIN (
SELECT distinct fruit_id, count_vote, text_c, customer_id
FROM Reviews
) r ON b.fruit_id = r.fruit_id
INNER JOIN Customers u ON u.customer_id = r.customer_id
INNER JOIN Categories c ON ?????? /* Your Categories table seems not joined/related to the others */
WHERE category = "Fruits";
For readability, you should use explicit JOIN syntax and avoid the old join syntax based on comma-separated table names and WHERE conditions.
The next time you want help optimizing a query, please include the table/index structure, an indication of the cardinality of the indexes and the EXPLAIN plan for the query.
There appears to be absolutely no reason for a single sub-query here, let alone 2. Using sub-queries mostly prevents the DBMS optimizer from doing its job. So your biggest win will come from eliminating these sub-queries.
The CROSS JOIN creates a deliberate cartesian join. It's also unclear whether any attributes from this table are actually required for the result, whether it is there to produce multiples of the same row in the output, or whether it is just an error.
The attribute category in the last line of your query is not attributed to any of the tables (but I suspect it comes from the categories table).
Further, your code uses a GROUP BY clause with no aggregation function. This will produce non-deterministic results and is a bug. Assuming that you are not exploiting a side-effect of that, the query can be re-written as:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN Reviews r
ON r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
ORDER BY r.fruit_id, count_vote DESC, r_id;
Since there are no predicates other than joins in your query, there is no scope for further optimization beyond ensuring there are indexes on the join predicates.
As all too frequently, the biggest benefit may come from simply asking the question of why you need to retrieve every single row in the tables in a single query.

SQL Fetch multiple result counts from same set of data using same query

I am working on product data, part of which has the below structure (let's call it product_serials):
The table is a collection of product serial numbers. The snapped field determines whether a specific product has been purchased or not via its serial number. I am trying to query the table to get a count of both all serials and all unpurchased serials of the same product_id, using a single SQL query. So far, using COUNT(ps1.id) AND COUNT(ps2.id) ... WHERE ps2.snapped = FALSE does not seem to work: it still returns the same value for both all serials and unpurchased serials, and even exaggerates the count, so I am definitely doing something wrong.
What could I be missing?
My SQL query as requested:
SELECT pd.id AS product_id, pd.description,
COUNT(pds.id) AS total, COUNT(pds2.id) AS available
FROM products pd
LEFT JOIN product_serials pds ON pds.product_id = pd.id
LEFT JOIN product_serials pds2 ON pds2.product_id = pd.id
WHERE pds2.snapped = FALSE
GROUP BY pd.id
ORDER BY pd.date_added DESC
Here you join the serials table twice (which multiplies the rows) and then apply the WHERE condition to the whole joined result, so it affects both counts.
I suggest something like the following:
SELECT product_id, COUNT(serial) AS total, COUNT(unpurchased) AS available
FROM (SELECT product_id, serial,
CASE WHEN snapped THEN NULL ELSE 1 END AS unpurchased
FROM product_serials) AS s
GROUP BY product_id
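The same idea can be written against the products table with a single join, which also keeps products that have no serials at all. This is a minimal sketch: the table structure was not included in the question, so the column types, the serial_no column name and the sample rows are assumptions.

```sql
-- Hypothetical table definitions; only the column names used in the
-- question's query (id, description, date_added, product_id, snapped) come from the source.
CREATE TABLE products (id INT PRIMARY KEY, description VARCHAR(100), date_added DATE);
CREATE TABLE product_serials (
    id         INT PRIMARY KEY,
    product_id INT,
    serial_no  VARCHAR(50),   -- assumed name for the serial number column
    snapped    BOOLEAN
);

-- Made-up sample rows: Phone has 3 serials (2 unpurchased), Tablet has none.
INSERT INTO products VALUES (1, 'Phone', '2020-01-02'), (2, 'Tablet', '2020-01-01');
INSERT INTO product_serials VALUES
    (1, 1, 'A1', TRUE),
    (2, 1, 'A2', FALSE),
    (3, 1, 'A3', FALSE);

-- One join, two counts: COUNT ignores NULLs, so the CASE expression
-- only counts the rows where snapped is FALSE.
SELECT pd.id AS product_id, pd.description,
       COUNT(pds.id) AS total,
       COUNT(CASE WHEN pds.snapped = FALSE THEN 1 END) AS available
FROM products pd
LEFT JOIN product_serials pds ON pds.product_id = pd.id
GROUP BY pd.id, pd.description, pd.date_added
ORDER BY pd.date_added DESC;
```

With these rows the result is Phone with total 3 and available 2, then Tablet with 0 and 0. The query from the question would instead report 6 and 6 for Phone, because each of the 3 serials is paired with each of the 2 unpurchased ones, and it would drop Tablet entirely because the WHERE clause filters out the NULL row produced by the LEFT JOIN.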

sql SELECT query for 3 tables

I have 3 tables:
1. products(product_id,name)
2. orders(id,order_id,product_id)
3. factors(id,order_id,date)
I want to retrieve the product names (products.name) for orders whose order_id appears in both of the last two tables on a given date.
I use this query for this purpose:
select products.name
from products
WHERE products.product_id ~IN
(
SELECT distinct orders.product_id FROM orders WHERE
order_id IN (select order_id FROM factors WHERE
factors.datex ='2017-04-29') GROUP BY product_id
)
But I get no result. Where is my mistake? How can I resolve this? Thanks.
Your query should be fine. I am rewriting it to make a few changes to the structure, but not the logic (this makes it easier for me to understand the query):
select p.name
from products p
where p.product_id in (select o.product_id
from orders o
where o.order_id in (select f.order_id
from factors f
where f.datex = '2017-04-29'
)
) ;
Notes on the changes:
When using multiple tables in a query, always qualify the column names.
Use table aliases. They make queries easier to write and to read.
SELECT DISTINCT and GROUP BY are unnecessary in IN subqueries. The logic of IN already handles (i.e. ignores) duplicates. And by explicitly including the operations, you run the risk of a less efficient query plan.
Why might your query not work?
factors.datex has a time component. If so, then this will work: date(f.datex) = '2017-04-29'.
There are no factors on that date.
There are no orders that match factors on that date.
There are no products in the orders that match the factors on that date.
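The first possibility in the list above is easy to check in isolation. The sketch below assumes the column really is called datex and is a DATETIME (the question's table listing calls it date, so adjust the name as needed); the rows are made up. It also uses a half-open range instead of date(), which generally leaves an ordinary index on the column usable.

```sql
-- Hypothetical factors table; the DATETIME type is an assumption.
CREATE TABLE factors (id INT PRIMARY KEY, order_id INT, datex DATETIME);

-- Made-up sample rows: one during 29 April, one at midnight on 30 April.
INSERT INTO factors VALUES
    (1, 500, '2017-04-29 09:15:00'),
    (2, 501, '2017-04-30 00:00:00');

-- Equality against a bare date compares with midnight of that day,
-- so the 09:15 row is not matched.
SELECT COUNT(*) AS exact_match
FROM factors
WHERE datex = '2017-04-29';

-- A half-open range covers the whole day without wrapping the column
-- in a function.
SELECT COUNT(*) AS whole_day
FROM factors
WHERE datex >= '2017-04-29' AND datex < '2017-04-30';
```

Here exact_match is 0 and whole_day is 1. If the first count is also 0 on the real table while the second is not, the time component is the cause of the empty result.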
In the factors table the column name is date, so it should be:
factors.date ='2017-04-29'
You have written:
factors.datex ='2017-04-29'

MySQL Group two column with where clause on both two group

What I have:
I have two tables: the first, user_faktorha, stores invoice data and the second, u_payment, stores payment data.
What I want:
I want to group the data from these two tables and get a single result with the sums from both tables.
My two tables, with sample queries, are on SQLFiddle: http://sqlfiddle.com/#!2/b9f9e/4
What's the problem:
I have tried to solve this, but I get a wrong result each time. For example (as can be seen on SQLFiddle), the user/tell named habib gets a wrong sum(price) result.
habib's faktorhaprice = -508261 and habib's paymentprice = 648000, but the sums in the main query are wrong: -7115654 and 13000000.
What's the solution?
(Updated) One way:
SELECT tell,SUM(FAKTORHAPRICE) FAKTORHAPRICE, SUM(PaymentPrice) PaymentPrice
FROM (SELECT tell, price as FAKTORHAPRICE, null PaymentPrice
from user_faktorha
union all
SELECT Username as tell, null as FAKTORHAPRICE, Price as PaymentPrice
FROM `u_payment` WHERE Active='1') sq
GROUP BY tell ORDER BY FAKTORHAPRICE ASC;
SQLFiddle here.
The essence of your problem here is that you are trying to relate two unrelated tables. Sure, they have common data in the user name, but there is not a clean relation between them like an invoice id that can be used to relate the items together such that the OUTER JOIN wouldn't duplicate records in your result set. My suggestion would be to do the aggregation on each table individually and then join the results like this:
SELECT f.tell, f.faktorhaprice, p.paymentprice
FROM
(SELECT tell, SUM(price) AS faktorhaprice FROM user_faktorha GROUP BY tell) AS f
INNER JOIN
(SELECT username, SUM(price) AS paymentprice FROM u_payment GROUP BY username) AS p
ON f.tell = p.username
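To see why joining before aggregating inflates the sums, it helps to run the join on a handful of rows. The sketch below is illustrative only: the column names follow the queries above, but the column types and the sample values are assumptions and are not the SQLFiddle data.

```sql
-- Hypothetical definitions; types are assumptions.
CREATE TABLE user_faktorha (id INT PRIMARY KEY, tell VARCHAR(50), price INT);
CREATE TABLE u_payment (id INT PRIMARY KEY, Username VARCHAR(50), Price INT, Active CHAR(1));

-- Made-up rows: habib has 2 invoices (sum -300) and 3 payments (sum 180).
INSERT INTO user_faktorha VALUES (1, 'habib', -100), (2, 'habib', -200);
INSERT INTO u_payment VALUES
    (1, 'habib', 50, '1'),
    (2, 'habib', 60, '1'),
    (3, 'habib', 70, '1');

-- Joining first pairs every invoice with every payment (2 x 3 = 6 rows),
-- so each invoice is summed 3 times and each payment twice.
SELECT f.tell, SUM(f.price) AS faktorhaprice, SUM(p.Price) AS paymentprice
FROM user_faktorha f
INNER JOIN u_payment p ON p.Username = f.tell
GROUP BY f.tell;
```

This returns -900 and 360 for habib instead of the true totals of -300 and 180, which is the same kind of inflation reported in the question. Aggregating each table on its own before joining, as suggested above, gives -300 and 180 for the same rows.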

How do I limit the result of a subquery in MySQL?

Is there a way of limiting the result of a subquery? The sort of thing I'm trying to achieve can be explained by the query below:
SELECT *
FROM product p
JOIN (
SELECT price
FROM supplierPrices sp
ORDER BY price ASC
LIMIT 1
) ON (p.product_id = sp.product_id)
The idea would be to get only the lowest price for a particular product from a table that had all the price data in it. LIMIT 1 is limiting the entire result set, whereas excluding it would result in a row being returned for each price, with duplicated product data. I tried GROUP BY price as well to no avail.
Once the limit is working I need to apply IFNULL as well, so that if there is no price found at all for any supplier it can return a supplied string, such as "n/a" rather than NULL. I assume that would just mean modifying the SELECT as below, and changing the JOIN to a LEFT JOIN?
SELECT *, IFNULL(price,'n/a')
Just to expand on Wolfy's answer slightly, and bearing in mind this is untested:
SELECT *
FROM product p
LEFT JOIN (
SELECT product_id, MIN(price) AS price
FROM supplierPrices sp
GROUP BY product_id
) x ON (p.product_id = x.product_id)
And, as you say, it should just be a matter of doing an IFNULL on that column to replace it with something sensible.
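Putting the two pieces together might look like the sketch below. The table definitions, the name and supplier_id columns and the sample rows are assumptions added so the example is self-contained. The price is cast to a string before IFNULL so that both arguments have the same type, which means the resulting column is text rather than a number.

```sql
-- Hypothetical definitions; only product_id and price come from the question.
CREATE TABLE product (product_id INT PRIMARY KEY, name VARCHAR(100));
CREATE TABLE supplierPrices (supplier_id INT, product_id INT, price DECIMAL(10,2));

-- Made-up rows: Bolt has two supplier prices, Nut has none.
INSERT INTO product VALUES (1, 'Bolt'), (2, 'Nut');
INSERT INTO supplierPrices VALUES (7, 1, 0.30), (8, 1, 0.25);

-- The derived table yields at most one row per product (its lowest price);
-- the LEFT JOIN keeps products without any price, and IFNULL fills the gap.
SELECT p.product_id, p.name,
       IFNULL(CAST(x.price AS CHAR), 'n/a') AS lowest_price
FROM product p
LEFT JOIN (
    SELECT product_id, MIN(price) AS price
    FROM supplierPrices
    GROUP BY product_id
) x ON p.product_id = x.product_id
ORDER BY p.product_id;
```

For the sample rows this gives 0.25 for Bolt and n/a for Nut. If the price needs to stay numeric for sorting or arithmetic, it may be simpler to leave the NULL in the query and substitute the placeholder text in the application.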