I have a couple of tables joined in MySQL, where one has many of the other.
I am trying to select items from one, ordered by the minimum values from the other table.
Without grouping it looks like this:
Code:
select `catalog_products`.id
, `catalog_products`.alias
, `tmpKits`.`minPrice`
from `catalog_products`
left join `product_kits` on `product_kits`.`product_id` = `catalog_products`.`id`
left join (
SELECT MIN(new_price) AS minPrice, id FROM product_kits GROUP BY id
) AS tmpKits on `tmpKits`.`id` = `product_kits`.`id`
where `category_id` in ('62')
order by product_kits.new_price ASC
Result:
But when I add group by, I get this:
Code:
select `catalog_products`.id
, `catalog_products`.alias
, `tmpKits`.`minPrice`
from `catalog_products`
left join `product_kits` on `product_kits`.`product_id` = `catalog_products`.`id`
left join (
SELECT MIN(new_price) AS minPrice, id FROM product_kits GROUP BY id
) AS tmpKits on `tmpKits`.`id` = `product_kits`.`id`
where `category_id` in ('62')
group by `catalog_products`.`id`
order by product_kits.new_price ASC
Result:
And this is incorrect sorting!
Somehow when I group this results, I get id 280 before 281!
But I need to get:
281|1600.00
280|2340.00
So, grouping breaks existing ordering!
For one, when you apply GROUP BY to only one column, there is no guarantee that the values in the other, non-aggregated columns will be consistently correct. Unfortunately, MySQL allows this kind of SELECT/GROUP BY; other products don't. Two, using an ORDER BY in a subquery, while allowed in MySQL, is not allowed in other database products, including SQL Server. You should use a solution that returns the proper result each time it is executed.
So the query will be:
select CP.`id`, CP.`alias`, MIN(TK.`minPrice`) AS `minPrice`
from catalog_products CP
left join `product_kits` PK on PK.`product_id` = CP.`id`
left join (
SELECT MIN(`new_price`) AS `minPrice`, `id` FROM product_kits GROUP BY `id`
) AS TK on TK.`id` = PK.`id`
where CP.`category_id` IN ('62')
group by CP.`id`, CP.`alias`
order by `minPrice` ASC
The thing is that GROUP BY does not respect ORDER BY in MySQL: grouping happens first, and the row kept for each group is not chosen by the sort order.
Actually, what I was doing is really bad practice.
In this case you should use DISTINCT on catalog_products.* instead.
In my opinion, GROUP BY is really useful when you need to group the results of aggregate functions.
Otherwise you should not use it just to get unique values.
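For what it's worth, here is a rough sketch of the "aggregate first, then sort" idea for the query above. It assumes product_kits.product_id references catalog_products.id (as the original joins suggest) and that category_id lives in catalog_products; the alias names cp and tk are just for illustration.

```sql
-- One row per product, with the cheapest kit price attached.
-- The derived table is grouped by product_id, so no outer GROUP BY is needed.
SELECT cp.`id`, cp.`alias`, tk.`minPrice`
FROM `catalog_products` cp
LEFT JOIN (
    SELECT `product_id`, MIN(`new_price`) AS `minPrice`
    FROM `product_kits`
    GROUP BY `product_id`
) AS tk ON tk.`product_id` = cp.`id`
WHERE cp.`category_id` IN ('62')
ORDER BY tk.`minPrice` ASC;
```

Products without any kits come back with a NULL minPrice, and MySQL sorts NULLs first in ascending order, so switch to an INNER JOIN if those rows should be dropped.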
I am trying to combine a query I have run on one table with some columns on another table. The query I ran calculates the total of something for me by this:
SELECT security.Loan_id
, SUM(security.SecMktValue) AS TotalSecMktValue
FROM security
GROUP
BY Loan_id
ORDER
BY loan_id ASC;
I then tried to join this query with columns from another table by:
SELECT loans.Loan_id, loans.TotalLoanAmt
FROM loans
JOIN(SELECT SUM(security.SecMktValue) AS TotalSecMktValue,security.Loan_id
FROM security
GROUP BY Loan_id ASC)
ON loans.Loan_id = security.Loan_id;
However, this won't run: it says there is an error in my SQL syntax, even though nothing is underlined in red. Does anyone know why that is?
You're missing an alias:
SELECT loans.Loan_id, loans.TotalLoanAmt
FROM loans
JOIN(
SELECT SUM(security.SecMktValue) AS TotalSecMktValue,security.Loan_id
FROM security
GROUP BY Loan_id ASC
) security -- <-- here
ON loans.Loan_id = security.Loan_id;
MySQL requires an alias to be assigned to the derived table (i.e. a correlation name associated with the inline view). The qualifier security is out of scope outside the inline view, i.e. it's not a valid reference in the outer query.
Here's an example, assigning the alias t to the derived table. Notice that in the outer query, references to columns from the inline view are qualified with t.
SELECT l.loan_id
, l.totalloanamt
, t.totalsecmktvalue
FROM loans l
LEFT
JOIN (
SELECT s.loan_id
, SUM(s.secmktvalue) AS totalsecmktvalue
FROM security s
GROUP
BY s.loan_id
) t
ON t.loan_id = l.loan_id
ORDER
BY l.loan_id
If I got your question right, then this should work :)
SELECT security.Loan_id, SUM(security.SecMktValue) AS TotalSecMktValue
FROM security
JOIN loans
ON loans.Loan_id = security.Loan_id
GROUP BY security.Loan_id
ORDER BY security.Loan_id ASC
I have SQL query with LEFT JOIN:
SELECT COUNT(stn.stocksId) AS count_stocks
FROM MedicalFacilities AS a
LEFT JOIN stocks stn ON
(stn.stocksIdMF = ( SELECT b.MedicalFacilitiesIdUser
FROM medicalfacilities AS b
WHERE b.MedicalFacilitiesIdUser = a.MedicalFacilitiesIdUser
ORDER BY stn.stocksId DESC LIMIT 1)
AND stn.stocksEndDate >= UNIX_TIMESTAMP() AND stn.stocksStartDate <= UNIX_TIMESTAMP())
With this query I want to select one row from the stocks table that matches the conditions and whose field equals a.MedicalFacilitiesIdUser.
I always get count_stocks = 0 in the result, but I need to get 1.
The count(...) aggregate doesn't count null, so its argument matters:
COUNT(stn.stocksId)
Since stn is your right hand table, this will not count anything if the left join misses. You could use:
COUNT(*)
which counts every row, even if all its columns are null. Or a column from the left hand table (a) that is never null:
COUNT(a.ID)
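To see the difference side by side, here is a tiny self-contained sketch. The tables demo_facility and demo_stock are made up for the demonstration and are not part of your schema.

```sql
-- Hypothetical demo tables (not from the question)
CREATE TABLE demo_facility (id INT PRIMARY KEY);
CREATE TABLE demo_stock (id INT PRIMARY KEY, facility_id INT);

INSERT INTO demo_facility VALUES (1), (2);
INSERT INTO demo_stock VALUES (10, 1);   -- facility 2 has no stock row

SELECT f.id,
       COUNT(s.id) AS matched_stocks,  -- NULLs produced by a missed join are skipped
       COUNT(*)    AS joined_rows      -- every joined row is counted
FROM demo_facility f
LEFT JOIN demo_stock s ON s.facility_id = f.id
GROUP BY f.id
ORDER BY f.id;
-- facility 1: matched_stocks = 1, joined_rows = 1
-- facility 2: matched_stocks = 0, joined_rows = 1
```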
Your subquery in the on looks very strange to me:
on stn.stocksIdMF = ( SELECT b.MedicalFacilitiesIdUser
FROM medicalfacilities AS b
WHERE b.MedicalFacilitiesIdUser = a.MedicalFacilitiesIdUser
ORDER BY stn.stocksId DESC LIMIT 1)
This is comparing MedicalFacilitiesIdUser to stocksIdMF. Admittedly, you have no sample data or data layouts, but the naming of the columns suggests that these are not the same thing. Perhaps you intend:
on stn.stocksIdMF = ( SELECT b.stocksId
-----------------------------^
FROM medicalfacilities AS b
WHERE b.MedicalFacilitiesIdUser = a.MedicalFacilitiesIdUser
ORDER BY b.stocksId DESC
LIMIT 1)
Also, ordering by stn.stocksid wouldn't do anything useful, because that would be coming from outside the subquery.
Your subquery seems redundant, and the main query is hard to read, as the date conditions in the join could be placed in the WHERE clause. Additionally, the original query might have a performance issue.
Recall that WHERE is an implicit join and JOIN is an explicit join. For inner joins, query optimizers
make no distinction between the two if they use the same expressions, but readability and maintainability are another thing to consider. With a LEFT JOIN, note that conditions on stn placed in WHERE also drop facilities that have no matching stock.
Consider the revised version (notice I added a GROUP BY):
SELECT COUNT(stn.stocksId) AS count_stocks
FROM MedicalFacilities AS a
LEFT JOIN stocks stn ON stn.stocksIdMF = a.MedicalFacilitiesIdUser
WHERE stn.stocksEndDate >= UNIX_TIMESTAMP()
AND stn.stocksStartDate <= UNIX_TIMESTAMP()
GROUP BY stn.stocksId
ORDER BY stn.stocksId DESC
LIMIT 1
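If facilities without a currently valid stock should still show up with a count of 0, one option is to keep the date conditions in the ON clause instead. This is only a sketch, using the table and column names from the question and assuming one count per facility is what is wanted:

```sql
-- Conditions on the right-hand table stay in ON, so unmatched
-- facilities are kept and simply get count_stocks = 0.
SELECT a.MedicalFacilitiesIdUser,
       COUNT(stn.stocksId) AS count_stocks
FROM MedicalFacilities AS a
LEFT JOIN stocks stn
       ON stn.stocksIdMF = a.MedicalFacilitiesIdUser
      AND stn.stocksEndDate   >= UNIX_TIMESTAMP()
      AND stn.stocksStartDate <= UNIX_TIMESTAMP()
GROUP BY a.MedicalFacilitiesIdUser;
```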
This is puzzling me and no amount of Googling is helping, so I'm hoping someone can point me in the right direction.
Please note that I have omitted some fields from the tables that don't relate to the question just to simplify things.
contacts: contact_id, name, email
contact_uuids: uuid, contact_id
visitor_activity: uuid, event
contact_communications: comm_id, contact_id, employee_id
Query
SELECT
`c`.*,
(SELECT COUNT(`log_id`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) as `num_comms`,
(SELECT MAX(`date`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) as `latest_date`,
(SELECT MIN(`date`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) as `first_date`,
(SELECT COUNT(`vaid`) FROM `visitor_activity` `va` WHERE `va`.`uuid` = `cu`.`uuid`) as `num_act`
FROM `contacts` `c`
LEFT JOIN `contact_uuids` `cu` ON `c`.`contact_id` = `cu`.`contact_id`
GROUP BY `c`.`contact_id`
ORDER BY `c`.`name` ASC
Some contacts have multiple UUIDs (upwards of 20 or 30).
When I perform the query WITHOUT the GROUP BY statement, I get the results I expect - a row returned for each UUID that exists for that contact, with the correct "num_comms" and "num_act" numbers.
However, when I add the GROUP BY statement, the "num_comms" is a lot smaller than expected, and the "num_act" returns only the value from the first row of the result without the GROUP BY statement.
I tried doing a "WHERE NOT IN" in the subquery, however that simply crashed the server as it was far too intense.
So - how do I get this to add up all the COUNT values from the LEFT JOIN and not just return the first value?
Also if anyone can help me optimize this that would be great.
Two problems:
GROUP BY c.contact_id does not include all the non-aggregate columns. This is a MySQL extension. What you get is arbitrary values for the columns other than contact_id.
The JOIN adds confusion. Your only use for visitor_activity is the COUNT(*) on it. But that does not make sense, since it is limited to one UUID, whereas the row is limited to one contact_id. Rethink the purpose of that.
Remove this line:
(SELECT COUNT(`vaid`) FROM `visitor_activity` `va` WHERE `va`.`uuid` = `cu`.`uuid`) as `num_act`
and the rest may work ok.
I will continue with the assumption that you want the COUNT of all rows in visitor_activity for all the uuids associated with the one contact_id.
See if this:
( SELECT COUNT(*)
FROM `contact_uuids` cu2
JOIN `visitor_activity` va USING(uuid)
WHERE cu2.contact_id = c.contact_id ) AS num_act
will work for the last subquery. At the same time, remove the JOIN:
LEFT JOIN `contact_uuids` `cu` ON `c`.`contact_id` = `cu`.`contact_id`
Now, back to the other problem (the non-standard usage of GROUP BY). Assuming that contact_id is the PRIMARY KEY, then simply remove the
GROUP BY `c`.`contact_id`
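Putting those pieces together, the whole query might end up looking roughly like this. It is a sketch only: it assumes contact_id is the PRIMARY KEY of contacts, that contact_communications has the `date` column used in the question, and that counting rows with COUNT(*) is acceptable in place of counting a specific column.

```sql
SELECT c.*,
       -- per-contact figures from contact_communications
       (SELECT COUNT(*)       FROM `contact_communications` cc
         WHERE cc.`contact_id` = c.`contact_id`) AS `num_comms`,
       (SELECT MAX(cc.`date`) FROM `contact_communications` cc
         WHERE cc.`contact_id` = c.`contact_id`) AS `latest_date`,
       (SELECT MIN(cc.`date`) FROM `contact_communications` cc
         WHERE cc.`contact_id` = c.`contact_id`) AS `first_date`,
       -- activity summed over every uuid that belongs to the contact
       (SELECT COUNT(*)
          FROM `contact_uuids` cu
          JOIN `visitor_activity` va ON va.`uuid` = cu.`uuid`
         WHERE cu.`contact_id` = c.`contact_id`) AS `num_act`
FROM `contacts` c
ORDER BY c.`name` ASC;
```

Since every row of the outer query is already a single contact, there is no JOIN to multiply rows and no GROUP BY to collapse them.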
I need to build a query that fetches sites from one table, ordered by the date of their newest article (articles are stored in a separate table).
I built the following query:
SELECT *
FROM `sites`
INNER JOIN `articles` ON `articles`.`site_id` = `sites`.`id`
ORDER BY `articles`.`date` DESC
GROUP BY `sites`.`id`
I supposed that SELECT and INNER JOIN would fetch all posts and associate a site with each one, then ORDER BY would order the result by descending post date, then GROUP BY would take the very first post for each site, and I would get the needed result.
But I'm receiving MySQL error #1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'GROUP BY `sites`.`id` LIMIT 0, 30' at line 7
If I place GROUP BY before the ORDER BY statement, the query works, but it does not give me the newest post for each site. Instead the result is sorted after the grouping, which is not what I need (actually I would prefer to order in another way after grouping).
I read several pretty similar questions, but they all relate to data stored in a single table, making it possible to use the MAX and MIN functions.
What should I do to implement what I need?
You can use either a subquery / derived-table / inline-view or a self-exclusion join, e.g.:
SELECT s.*, a1.*
FROM `sites` s
INNER JOIN `articles` a1 ON a1.`site_id` = s.`id`
LEFT OUTER JOIN `articles` a2 ON a2.`site_id` = a1.`site_id`
AND a2.`date` > a1.`date`
WHERE
a2.`site_id` IS NULL
ORDER BY
a1.`date` DESC
The principle is that you keep only the article a1 for which no other article a2 of the same site has a later date, i.e. the newest article of each site.
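The derived-table variant mentioned at the start could look something like this. It is a sketch under the same assumptions as the query above (articles.site_id references sites.id); the names latest and last_date are invented for the example.

```sql
-- Find the newest article date per site first, then join back
-- to articles to pick up the rest of that article's columns.
SELECT s.*, a.*
FROM `sites` s
INNER JOIN (
    SELECT `site_id`, MAX(`date`) AS `last_date`
    FROM `articles`
    GROUP BY `site_id`
) latest ON latest.`site_id` = s.`id`
INNER JOIN `articles` a
        ON a.`site_id` = latest.`site_id`
       AND a.`date`    = latest.`last_date`
ORDER BY a.`date` DESC;
```

Like the self-exclusion join, this returns more than one row for a site if two of its articles share the same newest date.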
Rewrite the SQL with the following syntax:
SELECT `articles`.`article_name`, `sites`.`id`, `articles`.`site_id`
FROM `sites`, `articles`
WHERE `articles`.`site_id` = `sites`.`id`
ORDER BY `sites`.`id`, `articles`.`date` DESC;
Do something like this in the SELECT statement. GROUP BY demands that all selected fields be grouped, hence using * is not possible. (ROW_NUMBER() requires MySQL 8.0 or later.)
SELECT * FROM ( SELECT S.<col1>, S.<col2>, A.<col1>, A.<col2>,
ROW_NUMBER ()
OVER (PARTITION BY S.`ID`
ORDER BY A.`DATE` DESC)
RID
FROM `SITES` S
INNER JOIN `ARTICLES` A ON A.`SITE_ID` = S.`ID`
) T
WHERE RID = 1;
Can you try this?
Finally I came to the solution.
First of all, I changed the main query from querying the sites table to querying articles. Next, I added the MAX(date) column to the result.
So the resulting query implementing the thing I need is the following:
SELECT `sites`.`url`,MAX(`articles`.`date`) AS `last_article_date`
FROM `articles`
INNER JOIN `sites` ON `sites`.`id` = `articles`.`site_id`
GROUP BY `site_id`
ORDER BY `last_article_date` ASC
Thanks to all of you for giving me hints and right search directions!
I have a MySQL query that works fine when I use a WHERE clause, but when I do not use
the WHERE clause it never returns any output and finally times out.
I used the EXPLAIN command to check the performance of the query, and in both cases EXPLAIN gives the same number of rows used in joining.
I have attached the image of the output from the EXPLAIN command.
Below is the query.
I couldn't figure out what the problem is here.
Any help is highly appreciated.
Thanks.
SELECT
MCI.CLIENT_ID AS CLIENT_ID, MCI.NAME AS CLIENT_NAME, MCI.PRIMARY_CONTACT AS CLIENT_PRIMARY_CONTACT,
MCI.ADDED_BY AS SP_ID, CONCAT(MUD_SP.FIRST_NAME, ' ', MUD_SP.LAST_NAME) AS SP_NAME,
MCI.FK_PROSPECT_ID AS PROSPECT_ID, MCI.DATE_ADDED AS ADDED_ON,
(SELECT GROUP_CONCAT(LT.TAG_TEXT SEPARATOR ', ')
FROM LK_TAG LT
INNER JOIN M_OBJECT_TAG_MAPPING MOTM
ON LT.PK_ID = MOTM.FK_TAG_ID
WHERE MOTM.FK_OBJECT_ID = MCI.FK_PROSPECT_ID
AND MOTM.OBJECT_TYPE = 1
AND MOTM.IS_ACTIVE = 1
) AS TAGS,
IFNULL(SUM(GET_DIGITS(MMR.RCP_AMOUNT)), 0) AS REVENUE_SO_FAR,
IFNULL(SUM(GET_DIGITS(MMR.RCP_RUPEES)), 0) AS REVENUE_INR,
COUNT(DISTINCT PMI_MONTHLY.PROJECT_ID) AS MONTHLY,
COUNT(DISTINCT PMI_FIXED.PROJECT_ID) AS FIXED,
COUNT(DISTINCT PMI_HOURLY.PROJECT_ID) AS HOURLY,
COUNT(DISTINCT PMI_ANNUAL.PROJECT_ID) AS ANNUAL,
COUNT(DISTINCT PMI_CURRENTLY_RUNNING.PROJECT_ID) AS CURRENTLY_RUNNING_PROJECTS,
COUNT(DISTINCT PMI_YET_TO_START.PROJECT_ID) AS YET_TO_START_PROJECTS,
COUNT(DISTINCT PMI_TECH_SALES_CLOSED.PROJECT_ID) AS TECH_SALES_CLOSED_PROJECTS
FROM
M_CLIENT_INFO MCI
INNER JOIN M_USER_DETAILS MUD_SP
ON MCI.ADDED_BY = MUD_SP.PK_ID
LEFT OUTER JOIN M_MONTH_RECEIPT MMR
ON MMR.CLIENT_ID = MCI.CLIENT_ID
LEFT OUTER JOIN M_PROJECT_INFO PMI_FIXED
ON PMI_FIXED.CLIENT_ID = MCI.CLIENT_ID AND PMI_FIXED.PROJECT_TYPE = 1
LEFT OUTER JOIN M_PROJECT_INFO PMI_MONTHLY
ON PMI_MONTHLY.CLIENT_ID = MCI.CLIENT_ID AND PMI_MONTHLY.PROJECT_TYPE = 2
LEFT OUTER JOIN M_PROJECT_INFO PMI_HOURLY
ON PMI_HOURLY.CLIENT_ID = MCI.CLIENT_ID AND PMI_HOURLY.PROJECT_TYPE = 3
LEFT OUTER JOIN M_PROJECT_INFO PMI_ANNUAL
ON PMI_ANNUAL.CLIENT_ID = MCI.CLIENT_ID AND PMI_ANNUAL.PROJECT_TYPE = 4
LEFT OUTER JOIN M_PROJECT_INFO PMI_CURRENTLY_RUNNING
ON PMI_CURRENTLY_RUNNING.CLIENT_ID = MCI.CLIENT_ID AND PMI_CURRENTLY_RUNNING.STATUS = 4
LEFT OUTER JOIN M_PROJECT_INFO PMI_YET_TO_START
ON PMI_YET_TO_START.CLIENT_ID = MCI.CLIENT_ID AND PMI_YET_TO_START.STATUS < 4
LEFT OUTER JOIN M_PROJECT_INFO PMI_TECH_SALES_CLOSED
ON PMI_TECH_SALES_CLOSED.CLIENT_ID = MCI.CLIENT_ID AND PMI_TECH_SALES_CLOSED.STATUS > 4
WHERE YEAR(MCI.DATE_ADDED) = '2012'
GROUP BY MCI.CLIENT_ID ORDER BY CLIENT_NAME ASC
Yes, as many people have said, the key is that when you have the WHERE clause, the MySQL engine filters the table M_CLIENT_INFO, probably dramatically.
A similar result to removing the WHERE clause is to add this WHERE clause:
where 1 = 1
You will see that performance also degrades, because MySQL will try to get all the data.
Remove the where clause and all columns from select and add a count to see how many records you get. If it is reasonable, say up to 10k, then do the following,
put back the select columns related to M_CLIENT_INFO
do not include the nested one "TAGS"
remove all your joins
run your query without where clause and gradually include the joins
this way you'll find out when the timeout is caused.
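Roughly, those steps could be run like this. It is only a sketch built from the table and column names in the question; which join to add back first is an arbitrary choice.

```sql
-- Step 1: how many client rows are there without the filter?
SELECT COUNT(*) FROM M_CLIENT_INFO;

-- Step 2: only the M_CLIENT_INFO columns, no joins, no TAGS subquery
SELECT MCI.CLIENT_ID, MCI.NAME, MCI.PRIMARY_CONTACT,
       MCI.ADDED_BY, MCI.FK_PROSPECT_ID, MCI.DATE_ADDED
FROM M_CLIENT_INFO MCI;

-- Step 3: add the joins back one at a time and watch the row count;
-- a sudden jump points at the join that multiplies rows
SELECT COUNT(*)
FROM M_CLIENT_INFO MCI
INNER JOIN M_USER_DETAILS MUD_SP
        ON MCI.ADDED_BY = MUD_SP.PK_ID
LEFT OUTER JOIN M_MONTH_RECEIPT MMR
        ON MMR.CLIENT_ID = MCI.CLIENT_ID;
```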
I would try the following. First, MySQL has a keyword "STRAIGHT_JOIN" which tells the optimizer to do the query in the table order you've specified. Since all your left joins are child-related (like a lookup table), you don't want MySQL to try and interpret one of those as a primary basis of the query.
SELECT STRAIGHT_JOIN ... rest of query.
Next, your M_PROJECT_INFO table. I don't know how many columns of data are out there, but you appear to be concentrating on just a few columns in your DISTINCT aggregates. I would make sure you have a covering index on these elements to help the query via an index on
( Client_ID, Project_Type, Status, Project_ID )
This way the engine can apply the criteria and get the distinct all out of the index instead of having to go back to the raw data pages for the query.
Third, your M_CLIENT_INFO table. Ensure that has an index on both your criteria, group by AND your Order By, and change your order by from the aliased "CLIENT_NAME" to the actual column of the SQL table so it matches the index
( Date_Added, Client_ID, `Name` )
I have "name" in ticks as it is also a reserved word and helps clarify the column, not the keyword.
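As statements, the two indexes described above might be created like this. The index names are made up, and the sketch assumes the columns are short enough to be indexed in full (a long VARCHAR or TEXT NAME column would need a prefix length).

```sql
-- Covering index for the DISTINCT project counts (hypothetical index name)
CREATE INDEX idx_project_client_type_status
    ON M_PROJECT_INFO (CLIENT_ID, PROJECT_TYPE, STATUS, PROJECT_ID);

-- Index matching the filter, the GROUP BY and the ORDER BY (hypothetical index name)
CREATE INDEX idx_client_added_id_name
    ON M_CLIENT_INFO (DATE_ADDED, CLIENT_ID, `NAME`);
```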
Next, the WHERE clause. Whenever you apply a function to an indexed column, the index usually cannot be used for that condition, especially on date/time fields. You might want to change your WHERE clause to
WHERE MCI.Date_Added between '2012-01-01' and '2012-12-31 23:59:59'
so the BETWEEN range is showing the entire year and the index can better be utilized.
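A half-open range is another way to write the same filter. This sketch assumes DATE_ADDED is a DATE, DATETIME or TIMESTAMP column:

```sql
-- Everything from the start of 2012 up to, but not including, the start of 2013.
-- No function is applied to the column, so an index on DATE_ADDED stays usable.
SELECT MCI.CLIENT_ID, MCI.NAME, MCI.DATE_ADDED
FROM M_CLIENT_INFO MCI
WHERE MCI.DATE_ADDED >= '2012-01-01'
  AND MCI.DATE_ADDED <  '2013-01-01';
```

The upper bound written this way also covers values with fractional seconds after 23:59:59 on the last day of the year.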
Finally, if the above does not help, I would consider splitting your query some. The GROUP_CONCAT inline select for the TAGS might be a bit of a killer for you. You might want to get all the distinct elements first for the grouping per client, THEN get those details. Something like
select
PQ.*,
group_concat(...) tags
from
( the entire primary part of the query ) as PQ
Left join yourGroupConcatTableBasis on key columns