Column scope with joined subqueries - mysql

I have to select a counted column, grouped by date, from two sources, and I'm joining the result set as a subquery. However, the result is bogus. As far as I can see, the problem is related to the JOIN .. ON clause. This query works fine:
SELECT id
FROM pu a
LEFT JOIN (
SELECT
COUNT(pd.id) AS c_id1,
NULL AS c_id2,
LEFT(pd.start_date, 10) AS date,
pd.pid
FROM
p_d pd
WHERE pd.pid = 111
GROUP BY date
UNION
SELECT
NULL AS c_id1,
COUNT(pd.id) AS c_id2,
LEFT(pd.inactivation_date, 10) AS date,
pd.pid
FROM
p_d pd
WHERE pd.pid = 111
GROUP BY date
) x
ON x.pid = a.id;
But this one (without the WHERE clause) returns a bad result set:
SELECT id
FROM pu a
LEFT JOIN (
SELECT
COUNT(pd.id) AS c_id1,
NULL AS c_id2,
LEFT(pd.start_date, 10) AS date,
pd.pid
FROM
p_d pd
GROUP BY date
UNION
SELECT
NULL AS c_id1,
COUNT(pd.id) AS c_id2,
LEFT(pd.inactivation_date, 10) AS date,
pd.pid
FROM
p_d pd
GROUP BY date
) x
ON x.pid = a.id;
Is it possible to use a.id in the joined subquery somehow? Right now it is an "unknown column".

In your subquery you select columns such as pd.pid that are neither part of the GROUP BY nor aggregated. Such columns are called hidden columns. Standard SQL rejects such a query with an error, but MySQL permits it (unless the ONLY_FULL_GROUP_BY SQL mode is enabled) and is free to choose the value from any row in each group.
If you restrict the set with WHERE pd.pid = 111, all values of pd.pid within a group are the same, so it doesn't matter which row supplies the value. Without the WHERE clause, however, the value of pd.pid is indeterminate (MySQL will probably pick whichever row it can fetch fastest). You also use that indeterminate pid in the JOIN condition, so you are bound to get wrong results.
http://dev.mysql.com/doc/refman/5.6/en/group-by-hidden-columns.html
It's hard to say, however, how you should rewrite your query, as you don't provide enough information about the table schema, what you are trying to achieve, or what your table and column names mean.
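As a rough sketch only, and assuming the goal is a count per pid and per day, the effect of the hidden column can be reproduced on a toy table. The example uses Python's built-in sqlite3 module rather than MySQL, the sample rows are invented, and substr stands in for MySQL's LEFT; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical data: only the columns named in the question are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE p_d (id INTEGER PRIMARY KEY, pid INTEGER, start_date TEXT);
INSERT INTO p_d (pid, start_date) VALUES
  (111, '2013-01-01 08:00:00'),
  (222, '2013-01-01 09:00:00'),
  (222, '2013-01-01 10:00:00'),
  (111, '2013-01-02 08:00:00');
""")

# Grouping by day alone merges rows of different pids into one group,
# so any pid reported next to this count would be arbitrary.
by_day = conn.execute("""
    SELECT substr(start_date, 1, 10) AS day, COUNT(id) AS c_id1
    FROM p_d
    GROUP BY day
    ORDER BY day
""").fetchall()

# Adding pid to GROUP BY makes every selected column well defined,
# which is what a join on pid needs.
by_pid_day = conn.execute("""
    SELECT pid, substr(start_date, 1, 10) AS day, COUNT(id) AS c_id1
    FROM p_d
    GROUP BY pid, day
    ORDER BY pid, day
""").fetchall()

print(by_day)
print(by_pid_day)
```

With pid in the GROUP BY, the outer ON x.pid = a.id condition compares against a value that is the same for every row of the group, so the WHERE pd.pid = 111 filter is no longer needed to get consistent results.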

Related

Return column value without aggregation for grouped query

OK, the question sounds very confusing; I just can't come up with a better title.
Here is my query:
SELECT TS.LocationKey, TA.TrailerKey, MAX(TS.ArrivedOnLocal) MaxArrivedOnLocal
FROM dbo.DSPTripStop TS
INNER JOIN dbo.DSPTripAssignment TA ON TS.TripStopKey = TA.ToTripStopKey AND TA.TrailerKey IS NOT NULL
GROUP BY TS.LocationKey, TA.TrailerKey
The query returns a list of trailers with locations and the last time each was dropped at that location. This is what I need: the MAX(time) per location is the goal.
But I'd also like to know which DSPTripStop.TripStopKey this MAX() time occurred on.
I can't group by this value. I understand that it is not uniquely defined (there can be multiple values for the same time). For my purposes ANY of them will do. But I can't find a better way than joining a second time on MaxArrivedOnLocal to get what I need.
SQL Server already "sees" this data when it aggregates MAX(); is there any way to pull it into this query?
I think this is what you want. Rather than grouping, you partition instead, number the rows, and then take the first row of each partition:
WITH cte AS
(
SELECT TS.LocationKey,
TA.TrailerKey,
TS.ArrivedOnLocal,
TS.TripStopKey,
ROW_NUMBER() OVER (PARTITION BY TS.LocationKey, TA.TrailerKey ORDER BY ArrivedOnLocal DESC) rn
FROM dbo.DSPTripStop TS
INNER JOIN dbo.DSPTripAssignment TA ON TS.TripStopKey = TA.ToTripStopKey AND TA.TrailerKey IS NOT NULL
)
SELECT LocationKey,
TrailerKey,
ArrivedOnLocal,
TripStopKey
FROM cte
WHERE rn = 1
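If any value of DSPTripStop.TripStopKey from the group will do, you can apply MAX to it as well. Note that this returns the largest TripStopKey in the group, which matches the latest stop only if keys increase with arrival time.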
SELECT
TS.LocationKey,
TA.TrailerKey,
MAX(TS.ArrivedOnLocal) MaxArrivedOnLocal,
MAX(TS.TripStopKey)
FROM dbo.DSPTripStop TS
INNER JOIN dbo.DSPTripAssignment TA
ON TS.TripStopKey = TA.ToTripStopKey
AND TA.TrailerKey IS NOT NULL
GROUP BY TS.LocationKey, TA.TrailerKey
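The difference between the two answers can be seen on a tiny example. This is only a sketch: it runs on Python's built-in sqlite3 module (a reasonably recent SQLite is assumed for window function support) instead of SQL Server, it collapses the two tables into one hypothetical stops table, and the rows are invented so that the latest arrival has the smaller key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stops (TripStopKey INTEGER, LocationKey INTEGER,
                    TrailerKey INTEGER, ArrivedOnLocal TEXT);
INSERT INTO stops VALUES
  (10, 1, 7, '2020-01-05 12:00'),  -- latest arrival, smaller key
  (20, 1, 7, '2020-01-01 12:00');  -- earlier arrival, larger key
""")

# ROW_NUMBER approach: the key comes from the same row as the latest arrival.
ranked = conn.execute("""
    SELECT LocationKey, TrailerKey, ArrivedOnLocal, TripStopKey
    FROM (
        SELECT *, ROW_NUMBER() OVER (
                   PARTITION BY LocationKey, TrailerKey
                   ORDER BY ArrivedOnLocal DESC) AS rn
        FROM stops
    )
    WHERE rn = 1
""").fetchall()

# Two independent MAX() calls: each column is maximised on its own,
# so the key may belong to a different row than the arrival time.
two_maxes = conn.execute("""
    SELECT LocationKey, TrailerKey, MAX(ArrivedOnLocal), MAX(TripStopKey)
    FROM stops
    GROUP BY LocationKey, TrailerKey
""").fetchall()

print(ranked)
print(two_maxes)
```

Here the ROW_NUMBER query reports key 10, the stop that actually has the latest arrival, while the double MAX reports key 20.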

Incorrect group by and order by merge

I have a couple of tables joined in MySQL in a one-to-many relationship.
I'm trying to select items from one, ordered by the minimum values from the other table.
Without grouping it looks like this:
Code:
select `catalog_products`.id
, `catalog_products`.alias
, `tmpKits`.`minPrice`
from `catalog_products`
left join `product_kits` on `product_kits`.`product_id` = `catalog_products`.`id`
left join (
SELECT MIN(new_price) AS minPrice, id FROM product_kits GROUP BY id
) AS tmpKits on `tmpKits`.`id` = `product_kits`.`id`
where `category_id` in ('62')
order by product_kits.new_price ASC
Result:
But when I add group by, I get this:
Code:
select `catalog_products`.id
, `catalog_products`.alias
, `tmpKits`.`minPrice`
from `catalog_products`
left join `product_kits` on `product_kits`.`product_id` = `catalog_products`.`id`
left join (
SELECT MIN(new_price) AS minPrice, id FROM product_kits GROUP BY id
) AS tmpKits on `tmpKits`.`id` = `product_kits`.`id`
where `category_id` in ('62')
group by `catalog_products`.`id`
order by product_kits.new_price ASC
Result:
And this sorting is incorrect!
Somehow, when I group these results, I get id 280 before 281!
But I need to get:
281|1600.00
280|2340.00
So, grouping breaks existing ordering!
For one, when you apply GROUP BY to only one column, there is no guarantee that the values in the other columns will be consistently correct. Unfortunately, MySQL allows this type of SELECT/GROUP BY, which other products don't. Two, using an ORDER BY in a subquery, while allowed in MySQL, is not allowed in other database products, including SQL Server. You should use a solution that returns the proper result every time it is executed.
So the query will be:
select CP.`id`, CP.`alias`, MIN(TK.`minPrice`) AS `minPrice`
from catalog_products CP
left join `product_kits` PK on PK.`product_id` = CP.`id`
left join (
SELECT MIN(`new_price`) AS `minPrice`, `id` FROM product_kits GROUP BY `id`
) AS TK on TK.`id` = PK.`id`
where CP.`category_id` IN ('62')
group by CP.`id`, CP.`alias`
order by `minPrice` ASC
The thing is that GROUP BY in MySQL does not respect the ORDER BY of the ungrouped rows.
Actually, what I was doing is really bad practice.
In this case you should use DISTINCT on catalog_products.* instead.
In my opinion, GROUP BY is really useful when you need to group the results of aggregate functions.
Otherwise you should not use it to get unique values.
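A minimal sketch of sorting grouped products by their cheapest kit: aggregate the price with MIN and order by that aggregate, so the sort key is defined for every group. It runs on Python's built-in sqlite3 module rather than MySQL; the two lowest prices are the ones from the question, while the other rows and the alias values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE catalog_products (id INTEGER PRIMARY KEY, alias TEXT,
                               category_id INTEGER);
CREATE TABLE product_kits (id INTEGER PRIMARY KEY, product_id INTEGER,
                           new_price REAL);
INSERT INTO catalog_products VALUES (280, 'a', 62), (281, 'b', 62);
INSERT INTO product_kits (product_id, new_price) VALUES
  (280, 2340.0), (280, 3000.0), (281, 1600.0), (281, 5000.0);
""")

# One row per product; the ORDER BY uses the aggregated value,
# not a column taken from an arbitrary row of the group.
rows = conn.execute("""
    SELECT cp.id, MIN(pk.new_price) AS minPrice
    FROM catalog_products cp
    LEFT JOIN product_kits pk ON pk.product_id = cp.id
    WHERE cp.category_id = 62
    GROUP BY cp.id
    ORDER BY minPrice ASC
""").fetchall()

print(rows)
```

This yields 281 before 280, which is the order the question asks for.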

MySQL MAX(datetime) not working

I am trying to retrieve the max(date_entered) for a group of computer_ids.
The first query won't return accurate results. The second query gives me accurate results but essentially hangs unless I filter by a specific computer_id.
I'd rather use this first query
SELECT *, max(reports.date_entered)
FROM reports, hardware_reports
WHERE reports.report_id=hardware_reports.report_id
GROUP BY computer_id;
than this second query
SELECT *
FROM reports a
JOIN hardware_reports
ON a.report_id=hardware_reports.report_id
AND a.date_entered = (
SELECT MAX(date_entered)
FROM reports AS b
WHERE a.computer_id = b.computer_id)
and computer_id = 1648;
I need to either optimize the second query or get MAX to work in the first.
Alternatively, you can join against a subquery that gets the latest record for every computer_ID.
SELECT a.*, c.*
FROM reports a
INNER JOIN
(
SELECT computer_ID, MAX(date_entered) date_entered
FROM reports
GROUP BY computer_ID
) b ON a.computer_ID = b.computer_ID
AND a.date_entered = b.date_entered
INNER JOIN hardware_reports c
ON a.report_id = c.report_id
To make it more efficient, provide an index on columns:
ALTER TABLE reports ADD INDEX idx_report_compDate (computer_ID, date_entered);
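The join-on-latest-date pattern can be tried out on a small data set. This is a sketch under assumptions: it uses Python's built-in sqlite3 module instead of MySQL (so the index is created with CREATE INDEX), the cpu column is a hypothetical stand-in for whatever hardware_reports holds, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reports (report_id INTEGER PRIMARY KEY, computer_id INTEGER,
                      date_entered TEXT);
CREATE TABLE hardware_reports (report_id INTEGER, cpu TEXT);
-- SQLite spelling of the composite index suggested above.
CREATE INDEX idx_report_compDate ON reports (computer_id, date_entered);
INSERT INTO reports VALUES
  (1, 1648, '2013-01-01'), (2, 1648, '2013-02-01'), (3, 1700, '2013-01-15');
INSERT INTO hardware_reports VALUES (1, 'old'), (2, 'new'), (3, 'only');
""")

# b holds one (computer_id, latest date) pair per computer;
# joining back to reports picks the full row for that date.
latest = conn.execute("""
    SELECT a.computer_id, a.date_entered, c.cpu
    FROM reports a
    JOIN (SELECT computer_id, MAX(date_entered) AS date_entered
          FROM reports
          GROUP BY computer_id) b
      ON a.computer_id = b.computer_id
     AND a.date_entered = b.date_entered
    JOIN hardware_reports c ON a.report_id = c.report_id
    ORDER BY a.computer_id
""").fetchall()

print(latest)
```

The grouped subquery is evaluated once for the whole table, unlike the correlated subquery in the question, which is evaluated per row of reports.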

MySQL Inner Join with where clause sorting and limit, subquery?

Everything in the following query results in one line for each invBlueprintTypes row with the correct information. But I'm trying to add something to it. See below the codeblock.
Select
blueprintType.typeID,
blueprintType.typeName Blueprint,
productType.typeID,
productType.typeName Item,
productType.portionSize,
blueprintType.basePrice * 0.9 As bpoPrice,
productGroup.groupName ItemGroup,
productCategory.categoryName ItemCategory,
blueprints.productionTime,
blueprints.techLevel,
blueprints.researchProductivityTime,
blueprints.researchMaterialTime,
blueprints.researchCopyTime,
blueprints.researchTechTime,
blueprints.productivityModifier,
blueprints.materialModifier,
blueprints.wasteFactor,
blueprints.maxProductionLimit,
blueprints.blueprintTypeID
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
What I need to bring in is the following table, with the columns listed below it, so that I can use the timestamp value and sort the entire result by profitHour:
tablename: invBlueprintTypesPrices
columns: blueprintTypeID, timestamp, profitHour
I need this information with the following SELECT in mind. I'm using a SELECT only to show what the JOIN, in-query SELECT, or whatever else can do this should achieve.
SELECT * FROM invBlueprintTypesPrices
WHERE blueprintTypeID = blueprintType.typeID
ORDER BY timestamp DESC LIMIT 1
And I need the main row from table invBlueprintTypes to still show even if there is no result from invBlueprintTypesPrices. The LIMIT 1 is because I want the newest row possible, but deleting the older data is not an option since the history is needed.
If I've understood correctly, I think I need a subquery SELECT, but how do I do that? I've tried adding the exact query above with an AS blueprintPrices after the query's closing ), but it did not work. The error focused on the
WHERE blueprintTypeID = blueprintType.typeID
part. I have no idea why. Can anyone solve this?
You'll need to use a LEFT JOIN to check for NULL values in invBlueprintTypesPrices. To mimic the LIMIT 1 per blueprintTypeID, you can use MAX() or, to truly make sure you only return a single record, a row number -- this depends on whether you can have multiple max timestamps for each type id. Assuming not, this should be close:
Select
...
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Left Join (
SELECT MAX(timestamp) MaxTime, blueprintTypeID
FROM invBlueprintTypesPrices
GROUP BY blueprintTypeID
) blueprintTypePrice On blueprints.blueprintTypeID = blueprintTypePrice.blueprintTypeID
Left Join invBlueprintTypesPrices blueprintTypePrices On
blueprintTypePrice.blueprintTypeID = blueprintTypePrices.blueprintTypeID AND
blueprintTypePrice.MaxTime = blueprintTypePrices.timestamp
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
Order By
blueprintTypePrices.profitHour
Assuming you might have the same max time stamp with 2 different records, replace the 2 left joins above with something similar to this getting the row number:
Left Join (
SELECT @rn:=IF(@prevTypeId=blueprintTypeID,@rn+1,1) rn,
timestamp,
blueprintTypeID,
profitHour,
@prevTypeId:=blueprintTypeID
FROM (SELECT *
FROM invBlueprintTypesPrices
ORDER BY blueprintTypeID, timestamp DESC) t
JOIN (SELECT @rn:=0) t2
) blueprintTypePrices On blueprints.blueprintTypeID = blueprintTypePrices.blueprintTypeID AND blueprintTypePrices.rn=1
You don't say where you are putting the subquery. If in the select clause, then you have a problem because you are returning more than one value.
You can't put this into the from clause directly, because you have a correlated subquery (not allowed).
Instead, you can put it in like this:
from . . .
(select *
from invBlueprintTypesPrices ibptp
where ibptp.timestamp = (select ibptp2.timestamp
from invBlueprintTypesPrices ibptp2
where ibptp.blueprintTypeId = ibptp2.blueprintTypeId
order by ibptp2.timestamp desc
limit 1
)
) ibptp
on ibptp.blueprintTypeId = blueprintType.TypeID
This identifies the most recent records for all the blueprintTypeids in the subquery. It then joins in the one that matches.
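To check that rows without any price history survive, the latest-row LEFT JOIN can be run on a reduced schema. This is a sketch, not the full query: it uses Python's built-in sqlite3 module instead of MySQL, keeps only the key column of invBlueprintTypes, and the rows are invented. The column names of invBlueprintTypesPrices are the ones listed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invBlueprintTypes (blueprintTypeID INTEGER PRIMARY KEY);
CREATE TABLE invBlueprintTypesPrices (blueprintTypeID INTEGER,
                                      timestamp TEXT, profitHour REAL);
INSERT INTO invBlueprintTypes VALUES (1), (2), (3);
INSERT INTO invBlueprintTypesPrices VALUES
  (1, '2013-01-01', 5.0),
  (1, '2013-01-02', 9.0),
  (2, '2013-01-01', 7.0);
""")

# m finds the newest timestamp per blueprint; p fetches that price row.
# Both joins are LEFT JOINs, so blueprint 3 (no prices) is kept with NULLs.
rows = conn.execute("""
    SELECT b.blueprintTypeID, p.timestamp, p.profitHour
    FROM invBlueprintTypes b
    LEFT JOIN (SELECT blueprintTypeID, MAX(timestamp) AS maxTime
               FROM invBlueprintTypesPrices
               GROUP BY blueprintTypeID) m
           ON m.blueprintTypeID = b.blueprintTypeID
    LEFT JOIN invBlueprintTypesPrices p
           ON p.blueprintTypeID = m.blueprintTypeID
          AND p.timestamp = m.maxTime
    ORDER BY b.blueprintTypeID
""").fetchall()

print(rows)
```

If two price rows could share the same newest timestamp for one blueprint, this variant would return both, which is the case the row-number approach is meant to handle.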

Getting the number of rows with a GROUP BY query within a subquery

I'm building a MySQL query with subqueries. The query requires, as described in Getting the number of rows with a GROUP BY query, the number of records returned by a group-by query, because I want the number of days with records in the database. So I'm using the following:
SELECT
COUNT(*)
FROM
(
SELECT
cvdbs2.dateDone
FROM
cvdbStatistics cvdbs2
WHERE
cvdbs2.mediatorId = 123
GROUP BY
DATE_FORMAT( cvdbs2.dateDone, "%Y-%d-%m" )
) AS activityTempTable
Now, I want this as a subquery, because I need some more data with different WHERE statements. So my query becomes:
SELECT
x,
y,
z,
(
SELECT
COUNT(*)
FROM
(
SELECT
cvdbs2.dateDone
FROM
cvdbStatistics cvdbs2
WHERE
cvdbs2.mediatorId = mediators.id
GROUP BY
DATE_FORMAT( cvdbs2.dateDone, "%Y-%d-%m" )
) AS activityTempTable
) AS activeDays
FROM
mediators
LEFT JOIN
cvdbStatistics
ON
mediators.id = cvdbStatistics.mediatorId
WHERE
mediators.recruiterId = 409
GROUP BY
mediators.email
(I left out some irrelevant WHERE-statements from my queries. 409 is just an example id, this is inserted by PHP).
Now, I'm getting the following error:
#1054 - Unknown column 'mediators.id' in 'where clause'
MySQL forgot about the mediators.id in the deepest subquery. How can I build a query, with the number of results of a GROUP-BY query, which requires a value from the main query, as one of the results? Why isn't the deepest query aware of 'mediators.id'?
Try the following:
SELECT
x,
y,
z,
(
SELECT
COUNT(distinct DATE_FORMAT( cvdbs2.dateDone, "%Y-%d-%m" ))
FROM
cvdbStatistics cvdbs2
WHERE
cvdbs2.mediatorId = mediators.id
) AS activeDays
FROM
mediators
LEFT JOIN
cvdbStatistics
ON
mediators.id = cvdbStatistics.mediatorId
WHERE
mediators.recruiterId = 409
GROUP BY
mediators.email
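The COUNT(DISTINCT ...) form keeps the correlation one level deep, directly in the scalar subquery, so the reference to the outer mediators row resolves. A small sketch on Python's built-in sqlite3 module (not MySQL): SQLite's date() function stands in for DATE_FORMAT, the x, y, z columns and the extra LEFT JOIN are left out, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mediators (id INTEGER PRIMARY KEY, email TEXT,
                        recruiterId INTEGER);
CREATE TABLE cvdbStatistics (mediatorId INTEGER, dateDone TEXT);
INSERT INTO mediators VALUES
  (1, 'a@example.com', 409), (2, 'b@example.com', 409);
INSERT INTO cvdbStatistics VALUES
  (1, '2013-01-01 08:00:00'),
  (1, '2013-01-01 17:00:00'),
  (1, '2013-01-02 09:00:00');
""")

# The scalar subquery references m.id directly; there is no derived
# table between it and the outer query.
rows = conn.execute("""
    SELECT m.email,
           (SELECT COUNT(DISTINCT date(s.dateDone))
            FROM cvdbStatistics s
            WHERE s.mediatorId = m.id) AS activeDays
    FROM mediators m
    WHERE m.recruiterId = 409
    ORDER BY m.email
""").fetchall()

print(rows)
```

The first mediator has three records on two distinct days, and the second has none, so the counts are 2 and 0.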
Did you also try putting the "mediators" table in the FROM of the deepest subquery? They are two different queries, and the tables of the outer one are not visible in the subquery. I'm not sure about this, but I think the only relation between the query and the subquery is the result returned by the subquery.