mySQL group by function showing lack of data - mysql

What I'm after is to see what is the fastest lap time for particular races, which will be identified by using race name and race date.
SELECT lapName AS Name, lapDate AS Date, T
FROM Lap
INNER JOIN Race ON Lap.lapName = Race.Name
AND Lap.lapDate = Race.Date
GROUP BY Date;
It currently only displays 3 different race names, with 4 different dates, meaning I've got 4 combinations total, when there are in fact 9 unique race name, race date combinations.
Unique race data is stored in the Race table. Laptimes are stored in the LapInfo table.
I'm also getting a warning about my group statement saying it is ambiguous though it still runs.

You don't seem to need a join for this:
SELECT l.lapRaceName, l.lapRaceDate,
MIN(l.lapTime)
FROM LapInfo l
GROUP BY l.lapRaceName, l.lapRaceDate;
If you don't need a JOIN, it is superfluous to put one in the query.

First of all, your query is actually invalid SQL. You need to use the MIN function to get the fastest lapTime. Also, you have to GROUP BY lapRaceName, raceDate instead of just lapRaceName. Unfortunately, in this case, mysql is lax enough to execute it without error.
Also, you JOIN LapInfo with Race, and return jthe joined columns from LapInfo that you alias as names that can be found in Race. That's OK from SQL point of view, but that's also usulessly complicated : return directly the columns from the Race table, as they have the names that you are looking for.
Finally, it would be far better to indicate which table each column belongs to. Here, column lapTime belongs to table LapInfo, so let's make it explicit.
Query :
SELECT
Race.raceName,
Race.raceDate,
MIN(LapInfo.lapTime)
FROM
Race
INNER JOIN LapInfo
ON LapInfo.lapRaceName = Race.raceName
AND LapInfo.lapRaceDate = Race.raceDate
GROUP BY
Race.raceName,
Race.raceDate
;

Related

Join Performances When Searching For NULL Value

I need to find a value that exists in LoyaltyTransactionBasketItemStores table but not in DimProductConsolidate table. I need the item code and its corresponding company. This is my query
SELECT
A.ProductReference, A.CompanyCode
FROM
(SELECT ProductReference, CompanyCode FROM dwhdb.LoyaltyTransactionsBasketItemsStores GROUP BY ProductReference) A
LEFT JOIN
(SELECT LoyaltyVariantArticleCode FROM dwhdb.DimProductConsolidate) B ON B.LoyaltyVariantArticleCode = A.ProductReference
WHERE
B.LoyaltyVariantArticleCode IS NULL
It is a pretty straight forward query. But when I run it, it's taking 1 hour and still not finish. Then I use EXPLAIN and this is the result
But when I remove the CompanyCode from my query, its performance is increasing a lot. This is the EXPLAIN result
I want to know why is this happening and is there any way to get ProductReference and its company with a lot more better performance?
Your current query is rife with syntax and structural errors. I would use exists logic here:
SELECT a.ProductReference, a.CompanyCode
FROM dwhdb.LoyaltyTransactionsBasketItemsStores a
WHERE NOT EXISTS (SELECT 1 FROM dwhdb.DimProductConsolidate b
WHERE b.LoyaltyVariantArticleCode = a.ProductReference);
Your current query is doing a GROUP BY in the first subquery, but you never select aggregates, but rather other non aggregate columns. On most other databases, and even on MySQL in strict mode, this syntax is not allowed. Also, there is no need to have 2 subqueries here. Rather, just select from the basket table and then assert that matching records do not exist in the other table.

Not selecting duplicates in join / where query

I've been trying to learn MySQL, and I'm having some trouble creating a join query to not select duplicates.
Basically, here's where I'm at :
SELECT atable.phonenumber, btable.date
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
However, in my database, there is the possibility of having duplicate rows in column atable.phonenumber.
For example (added asterisks for clarity)
phonenumber | date
-------------|-----------
*555-681-2105 | 2015-08-12
555-425-5161 | 2015-08-15
331-484-7784 | 2015-08-17
*555-681-2105 | 2015-08-25
.. and so on.
I tried using SELECT DISTINCT but that doesn't work. I also was looking through other solutions which recommended GROUP BY, but that threw an error, most likely because of my WHERE clause and condition. Not really sure how I can easily accomplish this.
DISTINCT applies to the whole row being returned, essentially saying "I want only unique rows" - any row value may participate in making the row unique
You are getting phone numbers duplicated because you're only looking at the column in isolation. The database is looking at phone number and also date. The rows you posted have different dates, and these hence cause the rows to be different
I suggest you do as the commenter recommended and decide what you want to do with the dates. If you want the latest date for a phone number, do this:
SELECT atable.phonenumber, max(btable.date)
FROM battle
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
When you write a query that uses grouping, you will get a set of rows where there is only one set of value combinations for anything that is in the group by list. In this case, only unique phone numbers. But, because you want other values as well (I.e. Date) you MUST use what's called an aggregate function, to specify what you want to do with all the various values that aren't part of the unique set. Sometimes it will be MAX or MIN, sometimes it will be SUM, COUNT, AVG and so on.
if you're familiar with hash tables or dictionaries from elsewhere in programming, this is what a group by is: it maps a set of values (a key) to a list of rows that have those key values, and then the aggregating function is applied to any of the values in the list associated with the key
The simple rule when using group by (and one that MySQL will do implicitly for you) is to write queries thus:
SELECT
List,
of,
columns,
you,
want,
in,
unique,
combination,
FN(List),
FN(of),
FN(columns),
FN(you),
FN(want),
FN(aggregating)
FROM table
GROUP BY
List,
of,
columns,
you,
want,
in,
unique,
combination
i.e. You can copy paste from your select list to your group list. MySQL does this implicitly for you if you don't do it (i.e. If you use one or more aggregate functions like max in your select list, but forget or omit the group by clause- it will take everything that isn't in an agggregate function and run the grouping as if you'd written it). Whether group by is hence largely redundant is often debated, but there do exist other things you can do with a group by, such as rollup, cube and grouping sets. Also you can group on a column, if that column is used in a deterministic function, without having to group on the result of he deterministic function. Whether there is any point to doing so is a debate for another time :)
You should add GROUP BY, and an aggregate to the date field, something like this:
SELECT atable.phonenumber, MAX(btable.date)
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
This will return the maximum date, hat is the latest date...

mySQL bringing back result it should not

I have a table filled with tasting notes written by users, and another table that holds ratings that other users give to each tasting note.
The query that brings up all notes that are written by other users that you have not yet rated looks like this:
SELECT tastingNotes.userID, tastingNotes.beerID, tastingNotes.noteID, tastingNotes.note, COALESCE(sum(tasteNoteRate.Score), 0) as count,
CASE
WHEN tasteNoteRate.userVoting = 1162 THEN 1
ELSE 0
END AS userScored
FROM tastingNotes
left join tasteNoteRate on tastingNotes.noteID = tasteNoteRate.noteID
WHERE tastingNotes.userID != 1162
Group BY tastingNotes.noteID
HAVING userScored < 1
ORDER BY count, userScored
User 1162 has written a note for note 113. In the tasteNoteRate table it shows up as:
noteID | userVoting | score
113 1162 0
but it is still returned each time the above query is run....
MySQL allows you to use group by in a rather special way without complaining, see the documentation:
If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. [...] In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want.
This behaviour was the default behaviour prior to MySQL 5.7.
In your case that means, if there is more than one row in tasteNoteRate for a specific noteID, so if anyone else has already voted for that note, userScored, which is using tasteNoteRate.userVoting without an aggregate function, will be based on a random row - likely the wrong one.
You can fix that by using an aggregate:
select ...,
max(CASE
WHEN tasteNoteRate.userVoting = 1162 THEN 1
ELSE 0
END) AS userScored
from ...
or, because the result of a comparison (to something other than null) is either 1 or 0, you can also use a shorter version:
select ...,
coalesce(max(tasteNoteRate.userVoting = 1162),0) AS userScored
from ...
To be prepared for an upgrade to MySQL 5.7 (and enabled ONLY_FULL_GROUP_BY), you should also already group by all non-aggregate columns in your select-list: group by tastingNotes.userID, tastingNotes.beerID, tastingNotes.noteID, tastingNotes.note.
A different way of writing your query (amongst others) would be to do the grouping of tastingNoteRates in a subquery, so you don't have to group by all the columns of tastingNotes:
select tastingNotes.*,
coalesce(rates.count, 0) as count,
coalesce(rates.userScored,0) as userScored
from tastingNotes
left join (
select tasteNoteRate.noteID,
sum(tasteNoteRate.Score) as count,
max(tasteNoteRate.userVoting = 1162) as userScored
from tasteNoteRate
group by tasteNoteRate.noteID
) rates
on tastingNotes.noteID = rates.noteID and rates.userScored = 0
where tastingNotes.userID != 1162
order by count;
This also allows you to get the notes the user voted on by changing rates.userScored = 0 in the on-clause to = 1 (or remove it to get both).
Change to an inner join.
The tasteNoteRate table is being left joined to the tastingNotes, which means that the full tastingNotes table (matching the where) is returned, and then expanded by the matching fields in the tasteNoteRate table. If tasteNoteRate is not satisfied, it doesn't prevent tastingNotes from returning the matched fields. The inner join will take the intersection.
See here for more explanation of the types of joins:
What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?
Make sure to create an index on noteID in both tables or this query and use case will quickly explode.
Note: Based on what you've written as the use case, I'm still not 100% certain that you want to join on noteID. As it is, it will try to give you a joined table on all the notes joined with all the ratings for all users ever. I think the CASE...END is just going to interfere with the query optimizer and turn it into a full scan + join. Why not just add another clause to the where..."and tasteNoteRate.userVoting = 1162"?
If these tables are not 1-1, as it looks like (given the sum() and "group by"), then you will be faced with an exploding problem with the current query. If every note can have 10 different ratings, and there are 10 notes, then there are 100 candidate result rows. If it grows to 1000 and 1000, you will run out of memory fast. Eliminating a few rows that the userID hasn't voted on will remove like what 10 rows from eventually 1,000,000+, and then sum and group them?
The other way you can do it is to reverse the left join:
select ...,sum()... from tasteNoteRate ... left join tastingNotes using (noteID) where userID != xxx group by noteID, that way you only get tastingNotes information for other users' notes.
Maybe that helps, maybe not, but yeah, SCHEMA and specific use cases/example data would be helpful.
With this kind of "ratings of ratings", sometimes its better to maintain a summary table of the vote totals and just track which the user has already voted on. e.g. Don't sum them all up in the select query. Instead, sum it up in the insert...on duplicate key update (total = total + 1); At least thats how I handle the problem in some user ranking tables. They just grow so big so fast.

Error Code: 1242 Subquery returns more than 1 row mysql

I am trying select the cost based on destination(DEST). If the cost not exist for country, then i have to find for continent, if the cost not exist for continent also, then i have select based on others(destination).
SELECT SUM(CD.COST+CW.COST) AS TOTAL_COST
FROM SHIP_COST_BY_DEST CD
INNER JOIN SHIP_COST_BY_WEIGHT CW
ON CD.PROD_CODE = CW.PROD_CODE
WHERE IF((SELECT UPPER(CD.DEST) FROM SHIP_COST_BY_DEST),
(UPPER(CD.DEST) = UPPER('Others')),
(IF((SELECT UPPER(CD.DEST) FROM SHIP_COST_BY_DEST
WHERE UPPER(CD.DEST)=UPPER('India')),
UPPER(CD.DEST) = UPPER('India'),'Asia')))
Could any one help me
That's an awfully complex WHERE clause - it will be a nightmare to maintain. But the problem is that one of your subqueries (The selects inside the parentheses) is returning multiple values and logic doesn't permit that. The easiest thing for you to do (since I don't have your data) is to run each of those subqueries separately and see which returns multiple values, and see if it logically makes sense at that place.
For example, you can't say
Name = (SELECT Name FROM AllCustomers)
because the name can't EQUAL all those names at once. You can, however, say
Name IN (SELECT Name FROM AllCustomers)
because that gives you a list of possibilities for it to be IN. Also, you need to make sure that not only are all of your subqueries only returning one answer if appropriate NOW, but that they can ONLY return one answer, or you'll end up with a problem in production when the data is different from what you tested with.

How do I join one table onto another where userid = userid but only for that date?

I'm looking to take the total time a user worked on each batch at his workstation, the total estimated work that was completed, the amount the user was paid, and how many failures the user has had for each day this year. If I can join all of this into one query then I can use it in excel and format things nicely in pivot tables and such.
EDIT: I've realized that is only possible to do this in multiple queries so I have narrowed my scope down to this:
SELECT batch_log.userid,
batches.operation_id,
SUM(TIME_TO_SEC(ramses.batch_log.time_elapsed)),
SUM(ramses.tasks.estimated_nonrecurring + ramses.tasks.estimated_recurring),
DATE(start_time)
FROM batch_log
JOIN batches ON batch_log.batch_id=batches.id
JOIN ramses.tasks ON ramses.batch_log.batch_id=ramses.tasks.batch_id
JOIN protocase.tblusers on ramses.batch_log.userid = protocase.tblusers.userid
WHERE DATE(ramses.batch_log.start_time) > "2011-01-01"
AND protocase.tblusers.active = 1
GROUP BY userid, batches.operation_id, start_time
ORDER BY start_time, userid ASC
The cross join was causing the problem.
No, in general a Having clause is used to filter the results of your Group by - for example, only reporting those who were paid for more than 24 hours in a day (HAVING SUM(ramses.timesheet_detail.paidTime) > 24). Unless you need to perform filtering of aggregate results, you shouldn't need a having clause at all.
Most of those conditions should be moved into a where clause, or as part of the joins, for two reasons - 1) Filtering should in general be done as soon as possible, to limit the work the query needs to perform. 2) If the filtering is already done, restating it may cause the query to perform additional, unneeded work.
From what I've seen so far, it appears that you're trying to roll things up by the day - try changing the last column in the group by clause to date(ramses.batch_log.start_time), or you're grouping by (what I assume is) a timestamp.
EDIT:
About schema names - yes, you can name them in the from and join sections. Often, too, the query may be able to resolve the needed schemas based on some default search list (how or if this is set up depends on your database).
Here is how I would have reformatted the query:
SELECT tblusers.userid, operations.name AS name,
SUM(TIME_TO_SEC(batch_log.time_elapsed)) AS time_elapsed,
SUM(tasks.estimated_nonrecurring + tasks.estimated_recurring) AS total_estimated,
SUM(timesheet_detail.paidTime) as hours_paid,
DATE(start_time) as date_paid
FROM tblusers
JOIN batch_log
ON tblusers.userid = batch_log.userid
AND DATE(batch_log.start_time) >= "2011-01-01"
JOIN batches
ON batch_log.batch_id = batches.id
JOIN operations
ON operations.id = batches.operation_id
JOIN tasks
ON batches.id = tasks.batch_id
JOIN timesheet_detail
ON tblusers.userid = timesheet_detail.userid
AND batch_log.start_time = timesheet_detail.for_day
AND DATE(timesheet_detail.for_day) = DATE(start_time)
WHERE tblusers.departmentid = 8
GROUP BY tblusers.userid, name, DATE(batch_log.start_time)
ORDER BY date_paid ASC
Of particular concern is the batch_log.start_time = timesheet_detail.for_day line, which is comparing (what are implied to be) timestamps. Are these really equal? I expect that one or both of these should be wrapped in a date() function.
As for why you may be getting unexpected data - you appear to have eliminated some of your join conditions. Without knowing the exact setup and use of your database, I cannot give the exact reason for your results (or even able to say they are wrong), but I think the fact that you join to the operations table without any join condition is probably to blame - if there are 2 records in that table, it will double all of your previous results, and it looks like there may be 12. You also removed operations.name from the group by clause, which may or may not give you the results you want. I would look into the rest of your table relationships, and see if there are any further restrictions that need to be made.