creating a custom column from joining two tables - mysql

I am terrible with subqueries, if that is what I need here. First, let me show you a preview of my tables and what I'm trying to do.
This is the result I want at the end:
business.name
reviews_count (total count of reviews matching the current row's business_id)
where b.industry_id equals 7
This is what I'm trying, but I feel stuck and don't know how to match the total count. Let me explain:
select
b.name,
reviews_count as (select count(*) as count from reviews where business_id = b.business_id),
from business as b
left join reviews as r
on r.business_id = b.id
where b.industry_id = 7
The subquery's business_id needs to match the id of the business currently being processed. I hope that makes sense. (reviews_count doesn't exist; I just made it up to use when I output the results.)

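This looks like a job for GROUP BY:
SELECT
b.name,
COUNT(DISTINCT r.id) AS reviews_count
FROM
business b
LEFT JOIN reviews r ON r.business_id = b.id
WHERE b.industry_id = 7
GROUP BY b.id, b.name
That way you can avoid the subquery altogether. The LEFT JOIN keeps businesses that have no reviews, which are reported with a count of 0.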
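As a quick sanity check of the GROUP BY approach, here is a minimal sketch that runs the same kind of query through Python's built-in sqlite3 module, used here only as a stand-in for MySQL. The sample rows are made up, and the column layout (business.id, reviews.business_id) is assumed from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT, industry_id INTEGER);
    CREATE TABLE reviews  (id INTEGER PRIMARY KEY, business_id INTEGER);
    INSERT INTO business VALUES (1, 'Acme', 7), (2, 'Bolt', 7), (3, 'Cog', 9);
    INSERT INTO reviews  VALUES (1, 1), (2, 1), (3, 3);
""")

# One output row per business in industry 7; COUNT(r.id) ignores the NULLs
# produced by the LEFT JOIN, so a business without reviews gets 0.
rows = conn.execute("""
    SELECT b.name, COUNT(r.id) AS reviews_count
    FROM business AS b
    LEFT JOIN reviews AS r ON r.business_id = b.id
    WHERE b.industry_id = 7
    GROUP BY b.id, b.name
    ORDER BY b.name
""").fetchall()

print(rows)  # [('Acme', 2), ('Bolt', 0)]
```

With a plain (inner) JOIN, 'Bolt' would drop out of the result entirely, which may or may not be what you want.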

Related

MySQL: Optimizing Sub-queries

I have this query I need to optimize further since it requires too much cpu time and I can't seem to find any other way to write it more efficiently. Is there another way to write this without altering the tables?
SELECT category, b.fruit_name, u.name
, r.count_vote, r.text_c
FROM Fruits b, Customers u
, Categories c
, (SELECT * FROM
(SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r
WHERE b.fruit_id = r.fruit_id
AND u.customer_id = r.customer_id
AND category = "Fruits";
This is your query re-written with explicit joins:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN
(
SELECT * FROM
(
SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r on r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
CROSS JOIN Categories c
WHERE c.category = 'Fruits';
(I am guessing here that the category column belongs to the categories table.)
There are some parts that look suspicious:
Why do you cross join the Categories table, when you don't even display a column of the table?
What is ORDER BY fruit_id, count_vote DESC, r_id supposed to do? Sub query results are considered unordered sets, so an ORDER BY is superfluous and can be ignored by the DBMS. What do you want to achieve here?
SELECT * FROM [ reviews ] GROUP BY fruit_id is invalid. If you group by fruit_id, what count_vote and what r.text_c do you expect to get for the ID? You don't tell the DBMS (which would be something like MAX(count_vote) and MIN(r.text_c), for instance). MySQL should throw an error, but when ONLY_FULL_GROUP_BY is disabled it silently replaces count_vote, r.text_c by ANY_VALUE(count_vote), ANY_VALUE(r.text_c) instead. This means you get arbitrarily picked values for a fruit.
The answer hence to your question is: Don't try to speed it up, but fix it instead. (Maybe you want to place a new request showing the query and explaining what it is supposed to do, so people can help you with that.)
Your Categories table does not seem to be joined or related to the others, so this produces a Cartesian product between all the rows.
If you want distinct results, don't use GROUP BY but DISTINCT, so you can avoid an unnecessary subquery.
And you don't need an ORDER BY in a subquery.
SELECT category
, b.fruit_name
, u.name
, r.count_vote
, r.text_c
FROM Fruits b
INNER JOIN (
SELECT DISTINCT fruit_id, count_vote, text_c, customer_id
FROM Reviews
) r ON b.fruit_id = r.fruit_id
INNER JOIN Customers u ON u.customer_id = r.customer_id
INNER JOIN Categories c ON ?????? /* your Categories table seems not joined/related to the others */
WHERE category = "Fruits";
For better readability, you should use explicit join syntax and avoid the old join syntax based on comma-separated table names and WHERE conditions.
The next time you want help optimizing a query, please include the table/index structure, an indication of the cardinality of the indexes and the EXPLAIN plan for the query.
There appears to be absolutely no reason for a single sub-query here, let alone 2. Using sub-queries mostly prevents the DBMS optimizer from doing its job. So your biggest win will come from eliminating these sub-queries.
The CROSS JOIN creates a deliberate Cartesian join. It's also unclear whether any attributes from this table are actually required for the result, whether it is there to produce multiples of the same row in the output, or whether it is just an error.
The attribute category in the last line of your query is not attributed to any of the tables (but I suspect it comes from the categories table).
Further, your code uses a GROUP BY clause with no aggregation function. This will produce non-deterministic results and is a bug. Assuming that you are not exploiting a side-effect of that, the query can be re-written as:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN Reviews r
ON r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
ORDER BY r.fruit_id, count_vote DESC, r_id;
Since there are no predicates other than joins in your query, there is no scope for further optimization beyond ensuring there are indexes on the join predicates.
As all too frequently, the biggest benefit may come from simply asking the question of why you need to retrieve every single row in the tables in a single query.

MySQL View in place of subquery does not return the same result

The query below is grabbing some information about a category of toys and showing the most recent sale price for three levels of condition (e.g., Brand New, Used, Refurbished). The price for each sale is almost always different. One other thing: the sales table row ids are not necessarily in chronological order (e.g., a toy with a sale id of 5 could have been sold later than a toy with a sale id of 10).
This query works but is not performant. It runs in a manageable amount of time, usually about 1s. However, I need to add yet another left join to include some more data, which causes the query time to balloon up to about 9s, no bueno.
Here is the working but nonperformant query:
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN (
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
) AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
But like I said it's slow. The sales table has about 200k rows.
What I tried to do was create the subquery as a view, e.g.,
CREATE VIEW sales_view AS
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
Then replace the subquery with the view, like
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN sales_view AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
Unfortunately, this change causes the query to no longer grab the most recent sale, and the sales price it returns is no longer the most recent.
Why is it that the table view doesn't return the same result as the same select as a subquery?
After reading just about every top-n-per-group stackoverflow question and blog article I could find, getting a query that actually worked was fantastic. But now that I need to extend the query one more step I'm running into performance issues. If anybody wants to sidestep the above question and offer some ways to optimize the original query, I'm all ears!
Thanks for any and all help.
The solution to the subquery performance issue was to use the answer provided here: Groupwise maximum
I thought that this approach could only be used when querying a single table, but indeed it works even when you've joined many other tables. You just have to left join the same table twice using the s.date_sold < s2.date_sold join condition and make sure the where clause looks for the null value in the second table's id column.
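To make the self-join idea concrete, here is a minimal sketch run through Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are invented, the sales table is cut down to the columns that matter, and the key column name sale_id is an assumption (the real table's id column may be named differently).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,   -- hypothetical name for the id column
        catalog_product_id INTEGER,
        condition_id INTEGER,
        date_sold TEXT,
        sold_price REAL
    );
    -- Ids are deliberately not in chronological order.
    INSERT INTO sales VALUES
        (5,  100, 1, '2020-03-01', 30.0),
        (10, 100, 1, '2020-01-15', 25.0),
        (11, 100, 2, '2020-02-01', 12.0),
        (12, 100, 2, '2020-02-20', 14.0);
""")

# s2 matches any later sale of the same product and condition.
# A row of s with no such s2 (s2.sale_id IS NULL) is the most recent one.
rows = conn.execute("""
    SELECT s.catalog_product_id, s.condition_id, s.date_sold, s.sold_price
    FROM sales AS s
    LEFT JOIN sales AS s2
           ON s2.catalog_product_id = s.catalog_product_id
          AND s2.condition_id = s.condition_id
          AND s.date_sold < s2.date_sold
    WHERE s2.sale_id IS NULL
    ORDER BY s.condition_id
""").fetchall()

print(rows)  # [(100, 1, '2020-03-01', 30.0), (100, 2, '2020-02-20', 14.0)]
```

Note that if two sales of the same product and condition share the exact same date_sold, both survive the filter, so a tie-breaker condition may be needed on real data.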

How is joining with a subquery different from joining without a subquery? Looking for difference between two similar queries

I want to see which user created floor equipment for which customer -- both of these queries do what I want. The second query, however, results with 700 more rows than the first. Could you please explain the difference?
I ran another query that found the difference between the two sets, and sure enough, this query yielded 700 rows. Therefore, the data output is the same, but somehow the second query catches more results. I tried looking at the additional 700 rows, but they all seemed normal and similar to the other results. I can't find the difference by looking at the code, which is what I'm hoping someone can help me with.
First query
SELECT customer.name, user.name, floor_equipment.id
FROM customer, user, floor_equipment, floor, building, site
WHERE (floor_equipment.floorID = floor.ID AND floor.buildingID = building.id AND
building.siteID = site.id AND floor_equipment.created_by = user.id)
Second Query
SELECT newTable.custName, newTable.userName, newTable.equipID
FROM (SELECT customer.name as "custName", user.name as "userName",
floor_equipment.id as "equipID", floor_equipment.created_by as "creatorID"
FROM customer, floor_equipment, floor, building, site
WHERE (floor_equipment.floorID = floor.ID AND floor.buildingID = building.id AND
building.siteID = site.id AND site.customerID = customer.ID)) as newTable, user
WHERE user.id = newTable.creatorID
I would expect both of these queries to have the same result, however the second query yields 700 more rows than the first. Aside from the extra rows, both queries result in the same data. The 700 additional rows seem to be normal and similar to the other rows.
NOTE: There is a seemingly pointless subquery in the second query. The purpose of this was for optimization. I am running these queries within Domo, a business intelligence webapp. I wrote the subquery in hopes that it would run faster. Because of the way Domo works, the former took 2 hours whereas the latter took 45 seconds.
Ignoring (or perhaps rectifying) the syntax errors, your first query can be written as follows:
SELECT c.name
, u.name
, fe.id
FROM customer c
CROSS
JOIN user u
JOIN floor_equipment fe
ON fe.created_by = u.id
JOIN floor f
ON f.ID = fe.floorID
JOIN building b
ON b.id = f.buildingID
JOIN site s
ON s.id = b.siteID
Likewise, written a little more coherently, your second query is as follows:
SELECT x.custName
, x.userName
, x.equipID
FROM
( SELECT c.name custName
, u.name userName
, fe.id equipID
, fe.created_by creatorID
FROM customer c
JOIN site s
ON s.customerID = c.ID
JOIN building b
ON b.siteID = s.id
JOIN floor f
ON f.buildingID = b.id
JOIN floor_equipment fe
ON fe.floorID = f.ID
) x
JOIN user u
ON u.id = x.creatorID
Again, we can omit the subquery and write it thus...
SELECT c.name custName
, u.name userName
, fe.id equipID
, fe.created_by creatorID
FROM customer c
JOIN site s
ON s.customerID = c.ID
JOIN building b
ON b.siteID = s.id
JOIN floor f
ON f.buildingID = b.id
JOIN floor_equipment fe
ON fe.floorID = f.ID
JOIN user u
ON u.id = fe.created_by
...so we can see that the first query had a cartesian product (CROSS JOIN), whereas the second query does not.
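The effect of a table that is listed in FROM but never linked to the others can be reproduced on a tiny scale. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in, a simplified schema (floor and building are left out, so floor_equipment points straight at site, which is not how the real tables are laid out), and made-up rows.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE site (id INTEGER PRIMARY KEY, customerID INTEGER);
    -- Simplified: equipment references a site directly (hypothetical column).
    CREATE TABLE floor_equipment (id INTEGER PRIMARY KEY, siteID INTEGER);
    INSERT INTO customer VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO site VALUES (10, 1), (20, 2);
    INSERT INTO floor_equipment VALUES (100, 10), (101, 10), (102, 20);
""")

# customer appears in FROM but nothing ties it to site:
# every equipment row is paired with every customer.
unlinked = conn.execute("""
    SELECT c.name, fe.id
    FROM customer c, floor_equipment fe, site s
    WHERE fe.siteID = s.id
""").fetchall()

# Same tables, but customer is joined through site.customerID.
linked = conn.execute("""
    SELECT c.name, fe.id
    FROM customer c
    JOIN site s ON s.customerID = c.id
    JOIN floor_equipment fe ON fe.siteID = s.id
""").fetchall()

print(len(unlinked), len(linked))  # 6 3
```

The correctly linked rows are a subset of the unlinked ones; the remaining rows pair each piece of equipment with a customer it does not belong to.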
Your code is a Cartesian product between the tables:
customer, user, floor_equipment, floor, building, site
and your WHERE condition does not act as a join; it is just a tuple of Boolean values:
floor_equipment.floorID = floor.ID,
floor.buildingID = building.id,
building.siteID = site.id,
floor_equipment.created_by = user.id
( boolean, boolean, boolean, boolean)
each boolean is the result for the corresponding match eg:
floor_equipment.floorID = floor.ID
so in practice it returns all the rows, because they have no matching counterpart.
In the second query, the first Cartesian product is expanded by the join between that first result and the rows where user.id matches newTable.creatorID. Looking at your code, it seems you need explicit join syntax with proper ON conditions.

Merging 3 Tables, Limiting 1 Table With Multiple Fields Needed

Been looking into this for awhile. Hoping someone might be able to provide some insight. I have 3 tables. All of which I'm grabbing multiple columns, but the 3rd I need to limit the output to just the most recent timestamp entry, BUT still display multiple columns.
If I have the following data [ Please see SQL Fiddle ]:
http://sqlfiddle.com/#!2/84b91/6
The fiddle is a list of (names) in Table1(users), (job_name,years) in Table2(job), and then (score, timestamp) in Table3(job_details). All linked together by the users id.
I am definitely not great at MySQL. I know I'm missing something, possibly a series of JOINs. I have been able to get Table 1, Table 2 and one column of Table 3 by doing this:
select a.id, a.name, b.job_name, b.years,
(select c.timestamp
from job_details as c
where c.user_id = a.id
order by c.timestamp desc limit 1) score
from users a, job as b where a.id = b.user_id;
At this point, I can get multiple column data on the first two columns, limit the 3rd to one value and sort that value on the last timestamp...
My question is: How does one go about adding a second column to the limit? In the example in the fiddle, I'd like to add the score as well as the timestamp to the output.
I'd like the output to be:
NAME, JOB, YEARS, SCORE, TIMESTAMP. The last two columns would only be the last entry in job_details sorted by the most recent TIMESTAMP.
Please let me know if more information is required! Thank you for your time!
Try this:
select a.id, a.name, b.job_name, b.years, c.timestamp, c.score
from users a
INNER JOIN job as b ON a.id = b.user_id
INNER JOIN (SELECT jd.user_id, jd.timestamp, jd.score
FROM job_details as jd
INNER JOIN (select user_id, MAX(timestamp) as tstamp
from job_details
GROUP BY user_id) as max_ts ON jd.user_id = max_ts.user_id
AND jd.timestamp = max_ts.tstamp
) as c ON a.id = c.user_id
;
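For anyone who wants to try the "join against MAX(timestamp) per user" idea without the fiddle, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The rows are invented, and the joins are written flat rather than nested, which should be equivalent for inner joins.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE job (user_id INTEGER, job_name TEXT, years INTEGER);
    CREATE TABLE job_details (user_id INTEGER, score INTEGER, timestamp TEXT);
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO job VALUES (1, 'Welder', 4), (2, 'Painter', 2);
    INSERT INTO job_details VALUES
        (1, 70, '2013-01-01 09:00:00'),
        (1, 85, '2013-06-01 09:00:00'),
        (2, 60, '2013-03-01 09:00:00');
""")

# max_ts holds the latest timestamp per user; joining job_details back to it
# keeps the whole latest row, so score and timestamp come from the same entry.
rows = conn.execute("""
    SELECT a.name, b.job_name, b.years, c.score, c.timestamp
    FROM users AS a
    INNER JOIN job AS b ON b.user_id = a.id
    INNER JOIN job_details AS c ON c.user_id = a.id
    INNER JOIN (
        SELECT user_id, MAX(timestamp) AS tstamp
        FROM job_details
        GROUP BY user_id
    ) AS max_ts ON max_ts.user_id = c.user_id AND max_ts.tstamp = c.timestamp
    ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
# ('Ann', 'Welder', 4, 85, '2013-06-01 09:00:00')
# ('Bob', 'Painter', 2, 60, '2013-03-01 09:00:00')
```

Because these are inner joins, a user without any job_details rows is left out; a LEFT JOIN on the last two joins would keep them with NULL score and timestamp.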

SQL - subqueries for top result without order by

(Sorry about the title, couldn't think of how to explain it)
So I have an Olympic database. The basic layout is that there's a competitors table with competitornum, givenname, and familyname (other columns aren't necessary for this). There's also a results table with competitornum and place (between 1 and 8).
I'm trying to get the givenname and familyname and total number of gold, silver, and bronze medals (place = 1, 2 or 3)
It also needs to only display the results with the top number of medals, and all of this without using the Order By clause...
I asked this question before but realised I forgot to say some things, but the previous answer before the bold part was added was:
SELECT c.Givenname, c.Familyname, COUNT(r.places) AS TotalPlaces
FROM Competitors c INNER JOIN Results r
ON r.Competitornum = c.Competitornum
WHERE r.place IN (1,2,3)
GROUP BY c.Givenname, c.Familyname
I'm thinking it needs another subquery like
AND TotalPlaces = (SELECT MAX(TotalPlaces))
but I'm not sure how to use an alias in a subquery when it's above the subquery...
All help is appreciated, thanks!
EDIT: The official question on my assignment (I can't figure out the answer, I've really tried, that's why I'm here):
Which competitor(s) got the largest number of medals (counting gold, silver and bronze all together)? List their given and family names and the total number of their medals (only).
Warning: your solution must not assume that competitor names are always different
Do not use an ORDER BY clause, in any part of this query.
You need to have another subquery for this,
SELECT c.Givenname, c.Familyname, COUNT(r.place) AS TotalPlaces
FROM Competitors c
INNER JOIN Results r ON r.Competitornum = c.Competitornum
WHERE r.place IN (1,2,3)
GROUP BY c.Givenname, c.Familyname
HAVING COUNT(r.place) =
(
SELECT MAX(TotalPlaces)
FROM
(
SELECT COUNT(g.place) AS TotalPlaces
FROM Competitors f
INNER JOIN Results g ON f.Competitornum = g.Competitornum
WHERE g.place IN (1,2,3)
GROUP BY f.Givenname, f.Familyname
) s
)
The final answer (thanks to John Woo and lc.)
Pasted this here for anyone that comes across this question in the future:
SELECT c.Givenname, c.Familyname, COUNT(r.place) AS TotalPlaces
FROM Competitors c
INNER JOIN Results r ON r.Competitornum = c.Competitornum
WHERE r.place IN (1,2,3)
GROUP BY c.competitornum, c.Givenname, c.Familyname
HAVING COUNT(r.place) =
(
SELECT MAX(TotalPlaces)
FROM
(
SELECT COUNT(r.place) AS TotalPlaces
FROM Competitors c
INNER JOIN Results r ON r.Competitornum = c.Competitornum
WHERE r.place IN (1,2,3)
GROUP BY c.competitornum, c.Givenname, c.Familyname
) medal_totals
)
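To see why grouping by competitornum matters, here is a small sketch that runs the HAVING query twice through Python's built-in sqlite3 module (a stand-in for whatever DBMS the assignment uses). The data is made up, with two different competitors who happen to share a name, and the helper top_medalists is hypothetical. The derived table is given an alias (medal_totals), which several DBMSs require. Python's sorted() is used only to make the printed output stable; the SQL itself contains no ORDER BY.

```python
import sqlite3

# In-memory SQLite database standing in for the assignment's DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Competitors (Competitornum INTEGER PRIMARY KEY,
                              Givenname TEXT, Familyname TEXT);
    CREATE TABLE Results (Competitornum INTEGER, place INTEGER);
    -- Competitors 1 and 2 deliberately share the same name.
    INSERT INTO Competitors VALUES (1, 'Sam', 'Lee'), (2, 'Sam', 'Lee'), (3, 'Kim', 'Park');
    INSERT INTO Results VALUES (1, 1), (1, 3), (2, 2), (2, 5), (3, 1), (3, 2), (3, 8);
""")

def top_medalists(group_cols):
    """Run the HAVING = MAX(count) query, grouping by the given columns."""
    return conn.execute(f"""
        SELECT c.Givenname, c.Familyname, COUNT(r.place) AS TotalPlaces
        FROM Competitors c
        INNER JOIN Results r ON r.Competitornum = c.Competitornum
        WHERE r.place IN (1, 2, 3)
        GROUP BY {group_cols}
        HAVING COUNT(r.place) = (
            SELECT MAX(TotalPlaces)
            FROM (
                SELECT COUNT(r.place) AS TotalPlaces
                FROM Competitors c
                INNER JOIN Results r ON r.Competitornum = c.Competitornum
                WHERE r.place IN (1, 2, 3)
                GROUP BY {group_cols}
            ) AS medal_totals
        )
    """).fetchall()

# Grouping by the key keeps the two 'Sam Lee' competitors apart.
by_id = sorted(top_medalists("c.Competitornum, c.Givenname, c.Familyname"))
# Grouping by name only merges them into one person with 3 medals.
by_name = sorted(top_medalists("c.Givenname, c.Familyname"))

print(by_id)    # [('Kim', 'Park', 2), ('Sam', 'Lee', 2)]
print(by_name)  # [('Sam', 'Lee', 3)]
```

The name-only grouping reports a single 'Sam Lee' with 3 medals, which is exactly the mistake the assignment's warning is about.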