MySQL Outer Join Giving Max Join Size Error - mysql

I thought I knew how to do a simple outer join, but it appears that I am wrong. I am new to MySQL, but I do have Oracle experience.
I have two tables that I want to query. The first table is a members table. The second table is called purchases. Purchases contains a row for each item a member purchases.
The members table contains a little more than 2700 rows. The purchases table contains a little less than 130,000 rows.
I eventually want to get a list of all members with a count of their unique item purchases. Here is my query:
select mem.member_id
,mem.name
,count(distinct pur.item_id)
from members mem
left outer join purchases pur on mem.member_id = pur.member_id
I get the following error when I execute the query:
1104 - The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET SQL_MAX_JOIN_SIZE=# if the SELECT is okay
The MAX Join Size is currently set to 7 million.
What am I not understanding here?

Your query looks fine, but if it is failing anyway, you might try the following:
select
m.member_id,
m.`name`,
coalesce( cnts.UniqItems, 0 ) as UniqItems
from
members m
left join ( select p.member_id, count( distinct p.item_id ) as UniqItems
from purchases p
group by p.member_id ) cnts
on m.member_id = cnts.member_id
After writing this, I think the problem may be the column name `name`, which is a MySQL keyword; wrapping it in backticks distinguishes the column from the keyword.
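To check what the derived-table version returns, here is a minimal sketch that runs it against a tiny in-memory SQLite database through Python's sqlite3 module. SQLite only stands in for MySQL here, and the sample rows are invented for illustration; the sketch shows the shape of the result (members without purchases get 0), not the MAX_JOIN_SIZE behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (member_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (member_id INTEGER, item_id INTEGER);
    -- Invented sample data: Cy has no purchases, Ann bought item 10 twice.
    INSERT INTO members VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO purchases VALUES (1, 10), (1, 10), (1, 11), (2, 10);
""")

# Aggregate purchases per member first, then outer-join the small result.
rows = conn.execute("""
    SELECT m.member_id, m.name, COALESCE(cnts.UniqItems, 0) AS UniqItems
    FROM members m
    LEFT JOIN (SELECT p.member_id, COUNT(DISTINCT p.item_id) AS UniqItems
               FROM purchases p
               GROUP BY p.member_id) cnts
           ON m.member_id = cnts.member_id
    ORDER BY m.member_id
""").fetchall()

for row in rows:
    print(row)
```

Because the subquery collapses purchases to one row per member before the join, the outer join only has to match each member against at most one row.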

Related

MySQL - Join 2 tables and count number of entries

I'm trying to join 2 tables and count the number of entries for unique variables in one of the columns. In this case I'm trying to join 2 tables, patients and trials (patients has a FK to trials), and count the number of patients that show up in each trial. This is the code I have so far:
SELECT patients.trial_id, trials.title
FROM trials
JOIN(SELECT patients, COUNT(id) AS Num_Enrolled
FROM patients
GROUP BY trials) AS Trial_Name;
The outcome I'm trying to achieve is:
Trial_Name Num_Patients
Bushtucker 5
Tribulations 7
I'm completely new to sql and have been struggling with the syntax compared to scripting languages.
The column names are not 100% clear from your question, but you are after a basic aggregation. Adjust the column names if necessary:
select t.title Trial_Name, Count(*) Num_Patients
from Trials t
join Patients p on p.Trial_Id = t.Id
group by t.title;
Building on Stu-'s answer, I think your column naming is the problem. You can write the query along these lines:
-- assumes patients.trial_id references trials.id
SELECT trials.title AS Trial_Name, COUNT(p.id) AS Num_Patients
FROM trials
INNER JOIN patients AS p
ON p.trial_id = trials.id
GROUP BY trials.title;
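As a quick check of the join-then-group approach, the sketch below runs the aggregation on an in-memory SQLite database via Python's sqlite3 module. The column names trials.id and patients.trial_id are assumptions (the question does not state them), SQLite stands in for MySQL, and the rows are invented to match the outcome the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trials (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE patients (
        id INTEGER PRIMARY KEY,
        trial_id INTEGER REFERENCES trials(id)  -- assumed FK column name
    );
    INSERT INTO trials VALUES (1, 'Bushtucker'), (2, 'Tribulations');
""")
# Invented sample data: 5 patients in trial 1 and 7 in trial 2.
conn.executemany("INSERT INTO patients (trial_id) VALUES (?)",
                 [(1,)] * 5 + [(2,)] * 7)

# One output row per trial title, counting the joined patient rows.
rows = conn.execute("""
    SELECT t.title AS Trial_Name, COUNT(*) AS Num_Patients
    FROM trials t
    JOIN patients p ON p.trial_id = t.id
    GROUP BY t.title
    ORDER BY t.title
""").fetchall()

for row in rows:
    print(row)
```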

MySQL: Optimizing Sub-queries

I have this query I need to optimize further since it requires too much cpu time and I can't seem to find any other way to write it more efficiently. Is there another way to write this without altering the tables?
SELECT category, b.fruit_name, u.name
, r.count_vote, r.text_c
FROM Fruits b, Customers u
, Categories c
, (SELECT * FROM
(SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r
WHERE b.fruit_id = r.fruit_id
AND u.customer_id = r.customer_id
AND category = "Fruits";
This is your query re-written with explicit joins:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN
(
SELECT * FROM
(
SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r on r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
CROSS JOIN Categories c
WHERE c.category = 'Fruits';
(I am guessing here that the category column belongs to the categories table.)
There are some parts that look suspicious:
Why do you cross join the Categories table, when you don't even display a column of the table?
What is ORDER BY fruit_id, count_vote DESC, r_id supposed to do? Sub query results are considered unordered sets, so an ORDER BY is superfluous and can be ignored by the DBMS. What do you want to achieve here?
SELECT * FROM [ reviews ] GROUP BY fruit_id is invalid. If you group by fruit_id, what count_vote and what r.text_c do you expect to get for the ID? You don't tell the DBMS (which would be something like MAX(count_vote) and MIN(r.text_c), for instance). MySQL should throw an error, but (when ONLY_FULL_GROUP_BY is disabled) silently replaces count_vote, r.text_c by ANY_VALUE(count_vote), ANY_VALUE(r.text_c) instead. This means you get arbitrarily picked values for a fruit.
The answer hence to your question is: Don't try to speed it up, but fix it instead. (Maybe you want to place a new request showing the query and explaining what it is supposed to do, so people can help you with that.)
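If the intent of the ORDER BY plus GROUP BY trick was "the highest-voted review per fruit, ties broken by the lowest r_id" (an assumption, since the question never states it), one deterministic way to express that is a NOT EXISTS filter. The sketch below is only an illustration on an in-memory SQLite database through Python's sqlite3 module; the Reviews columns follow the question, the rows are invented, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Reviews (
        r_id INTEGER PRIMARY KEY,
        fruit_id INTEGER,
        customer_id INTEGER,
        count_vote INTEGER,
        text_c TEXT
    );
    -- Invented sample data: fruit 1 has a tie on count_vote between r_id 2 and 3.
    INSERT INTO Reviews VALUES
        (1, 1, 100, 5, 'ok'),
        (2, 1, 101, 9, 'great'),
        (3, 1, 102, 9, 'tie, later id'),
        (4, 2, 100, 3, 'fine');
""")

# Keep a review only if no other review of the same fruit ranks ahead of it
# (more votes, or equal votes and a smaller r_id).
rows = conn.execute("""
    SELECT r.fruit_id, r.count_vote, r.text_c
    FROM Reviews r
    WHERE NOT EXISTS (
        SELECT 1
        FROM Reviews better
        WHERE better.fruit_id = r.fruit_id
          AND (better.count_vote > r.count_vote
               OR (better.count_vote = r.count_vote AND better.r_id < r.r_id))
    )
    ORDER BY r.fruit_id
""").fetchall()

for row in rows:
    print(row)
```

Unlike a GROUP BY without aggregates, this states explicitly which row wins for each fruit, so the result does not depend on how the engine happens to pick values.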
Your Categories table does not seem to be joined or related to the others, which produces a Cartesian product between all the rows.
If you want distinct results, don't use GROUP BY but DISTINCT, so you can avoid an unnecessary subquery,
and you don't need an ORDER BY in a subquery.
SELECT category
, b.fruit_name
, u.name
, r.count_vote
, r.text_c
FROM Fruits b
INNER JOIN (
SELECT DISTINCT fruit_id, count_vote, text_c, customer_id
FROM Reviews
) r ON b.fruit_id = r.fruit_id
INNER JOIN Customers u ON u.customer_id = r.customer_id
INNER JOIN Categories c ON ?????? /* Your Categories table seems not joined/related to the others */
WHERE category = "Fruits";
For better readability, use explicit join syntax and avoid the old join syntax based on comma-separated table names and WHERE conditions.
The next time you want help optimizing a query, please include the table/index structure, an indication of the cardinality of the indexes and the EXPLAIN plan for the query.
There appears to be absolutely no reason for a single sub-query here, let alone 2. Using sub-queries mostly prevents the DBMS optimizer from doing its job. So your biggest win will come from eliminating these sub-queries.
The CROSS JOIN creates a deliberate Cartesian join. It's also unclear whether any attributes from this table are actually required for the result, whether it is there to produce multiples of the same row in the output, or whether it is just an error.
The attribute category in the last line of your query is not attributed to any of the tables (but I suspect it comes from the categories table).
Further, your code uses a GROUP BY clause with no aggregation function. This will produce non-deterministic results and is a bug. Assuming that you are not exploiting a side-effect of that, the query can be re-written as:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN Reviews r
ON r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
ORDER BY r.fruit_id, count_vote DESC, r_id;
Since there are no predicates other than joins in your query, there is no scope for further optimization beyond ensuring there are indexes on the join predicates.
As all too frequently, the biggest benefit may come from simply asking the question of why you need to retrieve every single row in the tables in a single query.

SQL Inner Join 2 Tables

Hoping to get some help with this. I have made a few attempts at an inner join that shows all 'Product' information from the product table for any product that has sold more than 10 units.
PRODUCT TABLE (Columns)
P_CODE, P_DESCRIPT, P_INDATE, P_QOH, P_MIN, P_PRICE, P_DISCOUNT, V_CODE
LINE TABLE (Columns); this table shows the lines/information for each invoice
INV_NUMBER, LINE NUMBER, P_CODE, LINE_UNITS, LINE_PRICE, LINE_TOTAL
I understand that I have to make the join using the common key attribute (p_code) but I cannot figure out how to do the sum within the inner join.
Here is my most recent attempt:
SELECT * PRODUCT FROM PRODUCT
INNER JOIN line
ON product.p_code = line.p_code
WHERE sum(line_units) >=10
AND line.p_code = product.p_code;
Error: near "product"; syntax error
Any help would be appreciated,
Thank you.
It looks like you have the table name PRODUCT inside the SELECT list. The sum() needs to go in the SELECT list, with a GROUP BY on the product and a HAVING clause at the end to filter on the total.
SELECT product.*, SUM(line.line_units) AS line_units_sum
FROM product
INNER JOIN line ON product.p_code = line.p_code
GROUP BY product.p_code
HAVING line_units_sum > 10;
The requirement
Show all product information from the product table for any product that has sold more than 10 units.
The solution
Because you only want to build the projection from the product table, and you don't need any column from the line table, you can also use a correlated subquery like the following one:
SELECT *
FROM product
WHERE 10 < (
SELECT SUM(line.line_units)
FROM line
WHERE line.p_code = product.p_code
)
The database optimizer might choose to use a JOIN internally if the cost of the JOIN is lower than other alternatives. So, it does not mean that the query will do row-by-row processing for the outer table records. Only the execution plan can tell how the query is executed by the database engine.
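To compare the two formulations, here is a minimal sketch that runs a join with GROUP BY/HAVING and a correlated subquery side by side on an in-memory SQLite database through Python's sqlite3 module. Both use SUM(line_units), since the requirement is about units sold; COUNT(*) would count invoice lines instead. The tables are trimmed to a few columns and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (p_code TEXT PRIMARY KEY, p_descript TEXT);
    CREATE TABLE line (inv_number INTEGER, line_number INTEGER,
                       p_code TEXT, line_units INTEGER);
    -- Invented sample data: A sold 11 units over two lines, B sold exactly 10,
    -- C was never sold.
    INSERT INTO product VALUES ('A', 'Hammer'), ('B', 'Saw'), ('C', 'Drill');
    INSERT INTO line VALUES (1001, 1, 'A', 6), (1002, 1, 'A', 5), (1001, 2, 'B', 10);
""")

# Variant 1: join, group per product, filter on the summed units.
join_rows = conn.execute("""
    SELECT product.p_code, product.p_descript
    FROM product
    INNER JOIN line ON line.p_code = product.p_code
    GROUP BY product.p_code, product.p_descript
    HAVING SUM(line.line_units) > 10
    ORDER BY product.p_code
""").fetchall()

# Variant 2: correlated subquery; products without lines yield NULL and drop out.
subquery_rows = conn.execute("""
    SELECT p_code, p_descript
    FROM product
    WHERE 10 < (SELECT SUM(line.line_units)
                FROM line
                WHERE line.p_code = product.p_code)
    ORDER BY p_code
""").fetchall()

print(join_rows)
print(subquery_rows)
```

With this data both variants return only product A: B sold exactly 10 units, which is not "more than 10", and C has no lines at all.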

MySQL avg() and count() in one statement with group by

Today I'm fighting with MySQL: I've got two tables that contain records like this (actually there are more columns, but I don't think they're relevant):
Table Metering:
id, value
1000, 0.117
1000, 0.689
1001, 0.050
...
Table Res (there is no more than one record per id in this table):
id, number_residents
1001, 2
...
I try to get results in the following format:
number_residents, avg, count(id)
2, 0.1234, 456
3, 0.5678, 567
...
In words: I try to find out the average of the value-fields with the same number_residents. The id-field is the connection between the two tables. The count(id)-column should show how many ids have been found with that number_residents. The query I could come up with was the following:
select number_residents,count(distinct Metering.id),avg(value)
from Metering, Res
where Metering.id = Res.id
group by number_residents;
The results look like what I was after, but when I tried to validate them I became unsure. I tried it without the distinct at first, but that leads to values that are too high in the count column of the results.
Is my statement right to get what I want? I thought it might have something to do with the order of execution like asked here, but I actually can't find any official documentation on that...
Thanks for helping!
Judging by the table names, Res is the "parent" table and Metering is the "child" table; that is, there are 0-n meterings for each residence.
You have used "old school" joins (and I mean old: the join syntax has been around for 25 years now), which are inner joins, meaning residences without meterings won't participate in the results.
Use an outer join:
select
number_residents,
count(distinct r.id) residences_count,
avg(value) average_value
from Res r
left join Metering m on m.id = r.id
group by number_residents
Although meterings.id = res.id, with a left join counting them may produce different results: I've changed the count to count residences, which for a left join means residences that don't have meterings still count.
Now, nulls (which are what you get from a left-joined table that doesn't have a matching row) don't participate in avg(), either in the numerator or the denominator. If you want residences without any meterings to count when calculating the average (as if they had a single zero metering for the purposes of dividing the total value), use this query:
select
number_residents,
count(distinct r.id) residences_count,
sum(value) / count(r.id) average_value
from Res r
left join Metering m on m.id = r.id
group by number_residents
Because res.id is never null, count(r.id) counts the number of meterings plus 1 for every residence without any meterings.
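The difference between the two averages is easiest to see with numbers. The sketch below runs both expressions in one query on an in-memory SQLite database through Python's sqlite3 module; SQLite stands in for MySQL, and the rows are invented so that one residence (1002) has no meterings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Res (id INTEGER PRIMARY KEY, number_residents INTEGER);
    CREATE TABLE Metering (id INTEGER, value REAL);
    -- Invented sample data: residence 1002 has no meterings.
    INSERT INTO Res VALUES (1000, 2), (1001, 2), (1002, 2), (1003, 3);
    INSERT INTO Metering VALUES (1000, 1.0), (1000, 2.0), (1001, 3.0), (1003, 4.0);
""")

rows = conn.execute("""
    SELECT r.number_residents,
           COUNT(DISTINCT r.id)          AS residences_count,
           AVG(m.value)                  AS avg_ignoring_nulls,
           SUM(m.value) / COUNT(r.id)    AS avg_counting_empty
    FROM Res r
    LEFT JOIN Metering m ON m.id = r.id
    GROUP BY r.number_residents
    ORDER BY r.number_residents
""").fetchall()

for row in rows:
    print(row)
# (2, 3, 2.0, 1.5)
# (3, 1, 4.0, 4.0)
```

For the two-resident group the joined rows carry the values 1.0, 2.0, 3.0 and one NULL: avg() divides 6.0 by the 3 non-null values, while sum(value) / count(r.id) divides 6.0 by all 4 joined rows.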

SQL if number of hires leaves 1 left (Use column alias in where clause)

I am having a small issue with a simple SQL statement.
I need to find out how many copies of a movie are left: if I have 7 copies of a movie in the movie table and 6 people have rented it out, I need to see that I only have 1 copy left (and I need to do this all through the SQL query). Normally I would do it in PHP and just take away the number hired from the total number of copies, but sadly my college wants me to do it the other way.
SELECT *,
COUNT(distinct hire.movie_id) AS num_orders
FROM `movie`
INNER JOIN hire ON hire.movie_id = movie.id
WHERE num_orders < movie.no_copies;
When I run this I get the following issue #1054 - Unknown column 'num_orders' in 'where clause'.
You can't use a column alias in a WHERE predicate. You either need to repeat the expression, wrap the query in a derived table before filtering on the alias, or use HAVING as per below (in MySQL, at least).
I don't see the need for DISTINCT h.movie_id (since you want to count the rentals? - possibly DISTINCT h.hireid ?), and it seems you will need to group by the movies to count the number of rentals.
How about:
SELECT m.id, m.no_copies, COUNT(h.movie_id) AS num_rentals
FROM `movie` m
INNER JOIN hire h ON h.movie_id = m.id
GROUP BY m.id, m.no_copies
HAVING num_rentals < m.no_copies;
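As a rough check of the HAVING approach, the sketch below runs the grouped query on an in-memory SQLite database through Python's sqlite3 module, with an extra copies_left column added for illustration (it is not part of the answer's query). SQLite stands in for MySQL, the title column and all rows are invented, and the hire table is reduced to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, no_copies INTEGER);
    CREATE TABLE hire (id INTEGER PRIMARY KEY, movie_id INTEGER);
    INSERT INTO movie VALUES (1, 'Alien', 7), (2, 'Brazil', 2);
""")
# Invented sample data: 6 rentals of movie 1, and both copies of movie 2 are out.
conn.executemany("INSERT INTO hire (movie_id) VALUES (?)",
                 [(1,)] * 6 + [(2,)] * 2)

# The alias num_rentals is filtered in HAVING, after grouping, not in WHERE.
rows = conn.execute("""
    SELECT m.id, m.no_copies,
           COUNT(h.movie_id) AS num_rentals,
           m.no_copies - COUNT(h.movie_id) AS copies_left
    FROM movie m
    INNER JOIN hire h ON h.movie_id = m.id
    GROUP BY m.id, m.no_copies
    HAVING num_rentals < m.no_copies
""").fetchall()

for row in rows:
    print(row)
# (1, 7, 6, 1)
```

Note that the inner join drops movies that have never been hired; a left join from movie to hire would be needed to list those as well.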