SELECT TOP 100 with JOIN using LLBLGEN - sql-server-2008

I am using LLBLgen as ORM and want to achieve the following:
Table1:
SessionId
Table2:
SessionId
Timestamp
SELECT TOP 100 * FROM Table1
INNER JOIN Table2 ON Table1.SessionId = Table2.SessionId
ORDER BY Table2.Timestamp DESC
This query runs fine when executed directly on SQL Server 2008 R2, returning exactly 100 rows from Table1 if available, but somehow I am unable to achieve the same result with LLBLGen. I'm currently still using 2.6, but updating is an option if needed.
Is there a possibility to achieve this behavior in LLBLGen?
This is the result if I use normal mechanisms in LLBLGen
SELECT * FROM Table1
INNER JOIN Table2 ON Table1.SessionId = Table2.SessionId
ORDER BY Table2.Timestamp DESC
BTW: I read that LLBLGen takes the TOP 100 results from the reader and then kills the connection. Nonetheless, the query takes A LOT longer using LLBLGen than just executing the SQL directly (to my surprise, this also applies to the latter query!)

It doesn't add TOP because that could return duplicate rows: you have a join, and something in your query (you didn't post the real query) prevents DISTINCT, such as fields in the projection whose types can't be used with DISTINCT.
In general, when fetching entities, LLBLGen Pro will add both TOP and DISTINCT in your case. If it can't add DISTINCT, because your query returns fields of type image, ntext, or text, or because you sort on a field which isn't in the projection (so DISTINCT can't be applied, otherwise SQL Server will throw an error), it won't add TOP either. TOP without DISTINCT could leave duplicate rows in the limited set, and those are filtered out, as entities are always unique.
Example:
Fetching Customers based on a filter on Order (so using a join) will create a Customers INNER JOIN Orders on Northwind, but as this is a 1:n relationship, it will create duplicates. If Customers contains a text, image or ntext field, DISTINCT can't be applied, so if we then specified TOP, you'd get duplicate rows. As LLBLGen Pro never materializes duplicate rows into entities, you'd get fewer entities back than the number you asked for.
So instead it switches, in THIS particular case, to client-side limiting: it kills the connection once it has read the number of entities (not rows!) you asked for. So if you ask for 10 entities and you have 10000 duplicate rows in the first 10010 rows, at least 10000 rows will be fetched.
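To make the difference between limiting rows and limiting entities concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not LLBLGen code: the helper `fetch_entities`, the sample data and the use of SQLite (with LIMIT standing in for TOP) are all assumptions made for illustration.

```python
import sqlite3

def fetch_entities(cursor, limit):
    """Read rows until `limit` distinct keys were seen, then stop reading."""
    seen = []
    rows_read = 0
    for (customer_id,) in cursor:
        rows_read += 1
        if customer_id not in seen:
            seen.append(customer_id)
            if len(seen) == limit:
                break  # client-side limit: stop consuming the reader
    return seen, rows_read

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY);
    CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY,
                         CustomerId INTEGER, OrderDate TEXT);
""")
conn.executemany("INSERT INTO Customers VALUES (?)", [(1,), (2,), (3,)])
# Hypothetical data: customer 1 has the five most recent orders.
orders = [(1, f"2024-02-0{d}") for d in range(1, 6)]
orders += [(2, "2024-01-02"), (3, "2024-01-01")]
conn.executemany(
    "INSERT INTO Orders (CustomerId, OrderDate) VALUES (?, ?)", orders)

query = """
    SELECT C.CustomerId
    FROM Customers C INNER JOIN Orders O ON C.CustomerId = O.CustomerId
    ORDER BY O.OrderDate DESC
"""

# Limiting rows in the database: both rows belong to the same customer.
top_rows = conn.execute(query + " LIMIT 2").fetchall()
distinct_from_top = {row[0] for row in top_rows}

# Limiting entities on the client: more rows are read, but 2 entities result.
entities, rows_read = fetch_entities(conn.execute(query), 2)
```

With this data, the row limit yields a single distinct customer, while the client-side loop has to read six rows before it has two customers. That extra reading is the cost described above.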
So my guess is that the sort on Table2 is the issue, as that prevents DISTINCT from being emitted. This is an illegal query on SQL Server:
SELECT DISTINCT C.CompanyName FROM Customers C INNER JOIN Orders O on c.CustomerId = o.CustomerId
ORDER BY o.OrderDate DESC;
The reason is that ORDER BY appends a hidden column for each sort field which isn't in the projection, which ruins the DISTINCT. This is common across RDBMSs.
So TL;DR: it's a feature :)

Related

MySQL - Duplicate column name Error when counting results of a query with joins

I am trying to count the number of results generated by each query that runs on an application that we are building, these queries are dynamic and are generated by the application, therefore we do not have control over the syntax.
These queries turn out to be very heavy sometimes and we need to add limits before executing them, but before adding those limits we need to know the final count.
We found a very nice way of obtaining the row count of each query without having to fetch the results:
Select count(*) from ($query) as q_count;
Where $query is any given dynamic query generated by the application. This works great until the query contains a join and the two joined tables both have a column with the same name.
For instance, let's imagine the table CLIENTS with CLID (Client ID) and the table INVOICES also with CLID (as a foreign key).
The following query returns all the invoice numbers and client names:
SELECT * FROM CLIENTS LEFT JOIN INVOICES on CLIENTS.CLID = INVOICES.CLID
But the moment I try to count them using Select count(*) from ($query) as q_count where $query is the query above I get the following error:
select count(*) from (SELECT * FROM CLIENTS LEFT JOIN INVOICES on CLIENTS.CLID = INVOICES.CLID) as q_count
Error:
Duplicate column name 'CLID'
I understand why this is happening: in the example above, I guess CLID becomes ambiguous and count(*) is unable to differentiate between the two columns. My question is whether there is a solution or a way around it. Remember that I don't have control over the queries ($query).
There must be a way because phpMyAdmin is able to calculate the number of results without fetching all the data regardless of the query.
Any thoughts?

mySQL group by function showing lack of data

What I'm after is to see what is the fastest lap time for particular races, which will be identified by using race name and race date.
SELECT lapName AS Name, lapDate AS Date, T
FROM Lap
INNER JOIN Race ON Lap.lapName = Race.Name
AND Lap.lapDate = Race.Date
GROUP BY Date;
It currently only displays 3 different race names, with 4 different dates, meaning I've got 4 combinations total, when there are in fact 9 unique race name, race date combinations.
Unique race data is stored in the Race table. Laptimes are stored in the LapInfo table.
I'm also getting a warning about my group statement saying it is ambiguous though it still runs.
You don't seem to need a join for this:
SELECT l.lapRaceName, l.lapRaceDate,
MIN(l.lapTime)
FROM LapInfo l
GROUP BY l.lapRaceName, l.lapRaceDate;
If you don't need a JOIN, it is superfluous to put one in the query.
First of all, your query is actually invalid SQL. You need to use the MIN function to get the fastest lapTime. Also, you have to GROUP BY lapRaceName, raceDate instead of just lapRaceName. Unfortunately, in this case, MySQL is lax enough to execute it without error.
Also, you JOIN LapInfo with Race, and return the joined columns from LapInfo, aliased to names that can be found in Race. That's OK from an SQL point of view, but it's also uselessly complicated: return the columns directly from the Race table, as they have the names that you are looking for.
Finally, it would be far better to indicate which table each column belongs to. Here, column lapTime belongs to table LapInfo, so let's make it explicit.
Query :
SELECT
Race.raceName,
Race.raceDate,
MIN(LapInfo.lapTime)
FROM
Race
INNER JOIN LapInfo
ON LapInfo.lapRaceName = Race.raceName
AND LapInfo.lapRaceDate = Race.raceDate
GROUP BY
Race.raceName,
Race.raceDate
;
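As a quick check of the grouping logic, here is a small sketch in Python with the standard sqlite3 module. The race names, dates and lap times are made-up sample data, and SQLite stands in for MySQL. It shows that grouping by both name and date gives one fastest lap per race, while grouping by date alone collapses races that share a date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LapInfo (lapRaceName TEXT, lapRaceDate TEXT, lapTime REAL)")
conn.executemany("INSERT INTO LapInfo VALUES (?, ?, ?)", [
    ("Monza", "2015-05-01", 92.4),
    ("Monza", "2015-05-01", 90.1),
    ("Monza", "2015-06-01", 91.7),
    ("Spa", "2015-05-01", 105.3),
    ("Spa", "2015-05-01", 104.8),
])

# One row per (race name, race date) with the fastest lap time.
fastest = conn.execute("""
    SELECT l.lapRaceName, l.lapRaceDate, MIN(l.lapTime)
    FROM LapInfo l
    GROUP BY l.lapRaceName, l.lapRaceDate
    ORDER BY l.lapRaceName, l.lapRaceDate
""").fetchall()

# Grouping by date only merges Monza and Spa on the shared date.
groups_by_date_only = conn.execute("""
    SELECT COUNT(*) FROM (SELECT 1 FROM LapInfo GROUP BY lapRaceDate)
""").fetchone()[0]
```

Here `fastest` has three rows (one per race), whereas grouping by date alone produces only two groups, which mirrors the missing combinations described in the question.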

mySQL bringing back result it should not

I have a table filled with tasting notes written by users, and another table that holds ratings that other users give to each tasting note.
The query that brings up all notes that are written by other users that you have not yet rated looks like this:
SELECT tastingNotes.userID, tastingNotes.beerID, tastingNotes.noteID, tastingNotes.note, COALESCE(sum(tasteNoteRate.Score), 0) as count,
CASE
WHEN tasteNoteRate.userVoting = 1162 THEN 1
ELSE 0
END AS userScored
FROM tastingNotes
left join tasteNoteRate on tastingNotes.noteID = tasteNoteRate.noteID
WHERE tastingNotes.userID != 1162
Group BY tastingNotes.noteID
HAVING userScored < 1
ORDER BY count, userScored
User 1162 has written a note for note 113. In the tasteNoteRate table it shows up as:
noteID | userVoting | score
113 1162 0
but it is still returned each time the above query is run....
MySQL allows you to use group by in a rather special way without complaining, see the documentation:
If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. [...] In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want.
This behaviour was the default behaviour prior to MySQL 5.7.
In your case that means: if there is more than one row in tasteNoteRate for a specific noteID (that is, if anyone else has already voted for that note), then userScored, which uses tasteNoteRate.userVoting without an aggregate function, will be based on an arbitrary row of the group, likely the wrong one.
You can fix that by using an aggregate:
select ...,
max(CASE
WHEN tasteNoteRate.userVoting = 1162 THEN 1
ELSE 0
END) AS userScored
from ...
or, because the result of a comparison (to something other than null) is either 1 or 0, you can also use a shorter version:
select ...,
coalesce(max(tasteNoteRate.userVoting = 1162),0) AS userScored
from ...
To be prepared for an upgrade to MySQL 5.7 (and enabled ONLY_FULL_GROUP_BY), you should also already group by all non-aggregate columns in your select-list: group by tastingNotes.userID, tastingNotes.beerID, tastingNotes.noteID, tastingNotes.note.
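The effect of the aggregate can be checked with a small sketch in Python using the standard sqlite3 module. SQLite stands in for MySQL here, and the note authors (7 and 8), the second voter (42) and the note texts are made-up sample data. Only note 113 having a vote from user 1162 is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tastingNotes (noteID INTEGER PRIMARY KEY,
                               userID INTEGER, note TEXT);
    CREATE TABLE tasteNoteRate (noteID INTEGER, userVoting INTEGER,
                                Score INTEGER);
    INSERT INTO tastingNotes VALUES (113, 7, 'hoppy'), (114, 8, 'malty');
    -- Note 113 was rated by user 42 and by user 1162; note 114 only by 42.
    INSERT INTO tasteNoteRate VALUES (113, 42, 1), (113, 1162, 0), (114, 42, 1);
""")

rows = conn.execute("""
    SELECT n.noteID,
           COALESCE(SUM(r.Score), 0) AS total,
           MAX(CASE WHEN r.userVoting = 1162 THEN 1 ELSE 0 END) AS userScored
    FROM tastingNotes n
    LEFT JOIN tasteNoteRate r ON n.noteID = r.noteID
    WHERE n.userID != 1162
    GROUP BY n.noteID
    ORDER BY n.noteID
""").fetchall()

# Notes that user 1162 has not rated yet.
unrated = [note_id for note_id, _total, scored in rows if scored == 0]
```

Because MAX looks at every row in the group, userScored is 1 for note 113 no matter which other votes exist, so only note 114 remains in `unrated`.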
A different way of writing your query (amongst others) would be to do the grouping of tasteNoteRate in a subquery, so you don't have to group by all the columns of tastingNotes:
select tastingNotes.*,
coalesce(rates.count, 0) as count,
coalesce(rates.userScored,0) as userScored
from tastingNotes
left join (
select tasteNoteRate.noteID,
sum(tasteNoteRate.Score) as count,
max(tasteNoteRate.userVoting = 1162) as userScored
from tasteNoteRate
group by tasteNoteRate.noteID
) rates
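on tastingNotes.noteID = rates.noteID
where tastingNotes.userID != 1162
and coalesce(rates.userScored, 0) = 0
order by count;
The filter on userScored has to go in the where-clause: in the on-clause of a left join it would only drop the matching rates row, and the note itself would still be returned. This also allows you to get the notes the user voted on by changing coalesce(rates.userScored, 0) = 0 to = 1 (or remove it to get both).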
Change to an inner join.
The tasteNoteRate table is being left joined to the tastingNotes, which means that the full tastingNotes table (matching the where) is returned, and then expanded by the matching fields in the tasteNoteRate table. If tasteNoteRate is not satisfied, it doesn't prevent tastingNotes from returning the matched fields. The inner join will take the intersection.
See here for more explanation of the types of joins:
What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?
Make sure to create an index on noteID in both tables or this query and use case will quickly explode.
Note: Based on what you've written as the use case, I'm still not 100% certain that you want to join on noteID. As it is, it will try to give you a joined table on all the notes joined with all the ratings for all users ever. I think the CASE...END is just going to interfere with the query optimizer and turn it into a full scan + join. Why not just add another clause to the where..."and tasteNoteRate.userVoting = 1162"?
If these tables are not 1-1, as it looks like (given the sum() and "group by"), then you will be faced with an exploding problem with the current query. If every note can have 10 different ratings, and there are 10 notes, then there are 100 candidate result rows. If it grows to 1000 and 1000, you will run out of memory fast. Eliminating a few rows that the userID hasn't voted on will remove like what 10 rows from eventually 1,000,000+, and then sum and group them?
The other way you can do it is to reverse the left join:
select ...,sum()... from tasteNoteRate ... left join tastingNotes using (noteID) where userID != xxx group by noteID, that way you only get tastingNotes information for other users' notes.
Maybe that helps, maybe not, but yeah, SCHEMA and specific use cases/example data would be helpful.
With this kind of "ratings of ratings", sometimes it's better to maintain a summary table of the vote totals and just track which notes the user has already voted on, i.e. don't sum them all up in the select query. Instead, sum it up in the insert...on duplicate key update (total = total + 1). At least that's how I handle the problem in some user ranking tables. They just grow so big so fast.
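A minimal sketch of that summary-table idea, in Python with the standard sqlite3 module. The table names `noteTotals` and `noteVotes` and the helper `record_vote` are hypothetical, and SQLite's `ON CONFLICT ... DO UPDATE` (available from SQLite 3.24) is used as a stand-in for MySQL's `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: one running total per note, one row per vote.
    CREATE TABLE noteTotals (noteID INTEGER PRIMARY KEY,
                             total INTEGER NOT NULL);
    CREATE TABLE noteVotes (noteID INTEGER, userVoting INTEGER,
                            PRIMARY KEY (noteID, userVoting));
""")

def record_vote(conn, note_id, user_id, score):
    """Store who voted and bump the running total in one transaction."""
    with conn:  # commits on success, rolls back if an insert fails
        conn.execute("INSERT INTO noteVotes VALUES (?, ?)",
                     (note_id, user_id))
        conn.execute("""
            INSERT INTO noteTotals (noteID, total) VALUES (?, ?)
            ON CONFLICT(noteID) DO UPDATE SET total = total + excluded.total
        """, (note_id, score))

record_vote(conn, 113, 42, 1)
record_vote(conn, 113, 43, 1)
total = conn.execute(
    "SELECT total FROM noteTotals WHERE noteID = 113").fetchone()[0]
```

Reading the total is then a single primary-key lookup instead of a join plus SUM over all votes, and the primary key on `noteVotes` rejects a second vote by the same user for the same note.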

Basics: Query results not returning as expected

I have less than basic knowledge of MS Access, as I only need to use it to pull down information irregularly before using R to do the manipulation. As a result, I have no SQL coding knowledge - I just use the Access GUI.
My problem: When I create a query that includes multiple tables Access seems to exclude the results that don't have values in all of the tables.
Solution: I'm looking for a simple way, through the GUI, to tell Access to include all the IDs in the parent table, irrespective of whether they have values in any of the child tables. Those IDs that have no values in the child tables should just return with blanks in those columns.
I know this is probably SQL 101 but my searching hasn't returned anything useful.
You should use LEFT JOIN or RIGHT JOIN, the direction meaning the table from which you want to get all rows. See the select below:
SELECT * FROM TABLE_A a LEFT JOIN TABLE_B b ON a.id=b.id
This will return all rows from TABLE_A linked to the corresponding rows from TABLE_B. When there is no match the TABLE_B columns will return NULL.
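For readers who want to see the difference on actual data, here is a small sketch in Python with the standard sqlite3 module. The column names `name` and `value` and the sample rows are made up, and SQLite is used instead of Access purely so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_A (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE TABLE_B (id INTEGER, value TEXT);
    INSERT INTO TABLE_A VALUES (1, 'parent-1'), (2, 'parent-2');
    -- Only parent 1 has a child row.
    INSERT INTO TABLE_B VALUES (1, 'child-of-1');
""")

# LEFT JOIN keeps every parent row; missing child columns come back as NULL.
left_rows = conn.execute("""
    SELECT a.id, b.value
    FROM TABLE_A a LEFT JOIN TABLE_B b ON a.id = b.id
    ORDER BY a.id
""").fetchall()

# INNER JOIN drops parents without a matching child row.
inner_rows = conn.execute("""
    SELECT a.id, b.value
    FROM TABLE_A a INNER JOIN TABLE_B b ON a.id = b.id
    ORDER BY a.id
""").fetchall()
```

The left join returns both parents, with `None` (NULL) for the child column of parent 2, while the inner join returns only parent 1.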

INNER JOIN with condition on a column - Efficient way

I have 2 tables:
Service_BD:
LOB:
I have a requirement now to drop the redundant columns in LOB table like industryId etc. and use Service_BD table to fetch the LOBs for industryId and then get the details of the particular LOB using LOB table.
I am trying to get a single SQL query using Inner Joins but the results are odd.
When I run a simple SQL query like this:
SELECT industryId, LobId
FROM Service_BD
WHERE industryId = 'I01'
GROUP BY lobId
The results are 9 rows:
Now, I would like to join rest of the LOB columns (minus the dropped ones of course) to get the LOB details out of it. So I use the below query:
SELECT *
FROM LOB
INNER JOIN Service_BD ON Service_BD.lobId = LOB.lobId
WHERE Service_BD.industryId = 'I01'
GROUP BY Service_BD.lobID
I am getting the desired results, but I am not sure whether this is the most efficient way. Both the Service_BD and LOB tables hold a huge amount of data, and I have a feeling that performing GROUP BY Service_BD.lobID first would reduce the cost of evaluating the WHERE condition.
Just wanted to know if this is the right way to write the query or are there any better ways to do the same.
You haven't mentioned which DB engine you are using so I guess you are using MySQL. In most cases the GROUP BY will be done only on the rows meeting the WHERE condition. So the GROUP BY is performed only on the fetched result of both the INNER JOIN and the WHERE clause.
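The logical order (WHERE first, then GROUP BY) can be observed with a small sketch in Python using the standard sqlite3 module. The sample rows are made up and SQLite stands in for MySQL. It shows the logical result only and says nothing about the execution plan a particular engine picks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Service_BD (industryId TEXT, lobId TEXT);
    INSERT INTO Service_BD VALUES
        ('I01', 'L1'), ('I01', 'L1'), ('I01', 'L2'),
        ('I02', 'L3'), ('I02', 'L1');
""")

# Groups are built only from the rows that pass the WHERE condition.
groups = conn.execute("""
    SELECT lobId, COUNT(*)
    FROM Service_BD
    WHERE industryId = 'I01'
    GROUP BY lobId
    ORDER BY lobId
""").fetchall()
```

L1 is counted twice rather than three times and L3 never appears, because the rows for industry I02 were filtered out before grouping.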
I don't think
SELECT *
FROM LOB
INNER JOIN Service_BD ON Service_BD.lobId = LOB.lobId
WHERE Service_BD.industryId = 'I01'
GROUP BY Service_BD.lobID
improves the performance of your query, but it certainly eliminates duplicate lobID values from your result. Also, I don't see a better way to eliminate duplicates other than introducing a HAVING clause, and I don't think that would improve the performance of your query either.