Mysql DISTINCT and ORDER BY not working - mysql

I have this query:
SELECT
DISTINCT (entries.guid),
locales.*,
locales.guid as locale_guid,
entries.*,
node.category_guid,
node.title as node_title,
nodelocs.caption as node_caption,
nodelocs.uri as node_uri,
nodelocs.url as node_url
FROM entries AS entries
JOIN entry_locales AS locales ON (locales.entry_guid=entries.guid)
LEFT JOIN nodes AS node ON (node.guid=entries.node_guid)
LEFT JOIN node_locales AS nodelocs ON (nodelocs.node_guid=entries.node_guid AND nodelocs.lang_guid=locales.lang_guid)
LEFT JOIN metrics as metrics_default ON (entries.guid = metrics_default.entry_guid)
WHERE
metrics_default.customer_guid='453b894967968204'
AND metrics_default.metrics='entryview'
AND locales.lang_guid='4b5747548a5bd457'
AND locales.state='1' AND entries.node_guid IN (SELECT child_guid FROM node_children WHERE guid IN ('4b084f280caffbef') ) ORDER BY metrics_default._updated DESC LIMIT 0,10
So I want to get items with unique guids and want to order it by '_updated' field from 'metricts_default' table. But nothing happens.
If I delete DISTINCT() - result is fine, but with duplicates.

you misunderstand the distinct
keyword distinct here does not mean you will get distinct entries.guid, but different row
what this mean is the row with same guid but different In other fields will not be eliminated by MySQL.

Related

Speeding up mysql query

I have a mysql query to join four tables and I thought that it was just best to join tables but now that mysql data is getting bigger the query seems to cause the application to stop execution.
SELECT
`purchase_order`.`id`,
`purchase_order`.`po_date` AS po_date,
`purchase_order`.`po_number`,
`purchase_order`.`customer_id` AS customer_id ,
`customer`.`name` AS customer_name,
`purchase_order`.`status` AS po_status,
`purchase_order_items`.`product_id`,
`purchase_order_items`.`po_item_name`,
`product`.`weight` as product_weight,
`product`.`pending` as product_pending,
`product`.`company_owner` as company_owner,
`purchase_order_items`.`uom`,
`purchase_order_items`.`po_item_type`,
`purchase_order_items`.`order_sequence`,
`purchase_order_items`.`pending_balance`,
`purchase_order_items`.`quantity`,
`purchase_order_items`.`notes`,
`purchase_order_items`.`status` AS po_item_status,
`purchase_order_items`.`id` AS po_item_id
FROM `purchase_order`
INNER JOIN customer ON `customer`.`id` = `purchase_order`.`customer_id`
INNER JOIN purchase_order_items ON `purchase_order_items`.`po_id` = `purchase_order`.`id`
INNER JOIN product ON `purchase_order_items`.`product_id` = `product`.`id`
GROUP BY id ORDER BY `purchase_order`.`po_date` DESC LIMIT 0, 20
my problem really is the query that takes a lot of time to finish. Is there a way to speed this query or to change this query for faster retrieval of the data?
heres the EXPLAIN EXTENED as requested in the comments.
Thanks in advance, I really hope this is the right channel for me to ask. If not please let me know.
Will this give you the correct list of ids?
SELECT id
FROM purchase_order
ORDER BY`po_date` DESC
LIMIT 0, 20
If so, then start with that before launching into the JOIN. You can also (I think) get rid of the GROUP BY that is causing an "explode-implode" of rows.
SELECT ...
FROM ( SELECT id ... (as above) ...) AS ids
JOIN purchase_order po ON po.id = ids.id
JOIN ... (the other tables)
GROUP BY ... -- (this may be problematic, especially with the LIMIT)
ORDER BY po.po_date DESC -- yes, this needs repeating
-- no LIMIT
Something like this
SELECT
`purchase_order`.`id`,
`purchase_order`.`po_date` AS po_date,
`purchase_order`.`po_number`,
`purchase_order`.`customer_id` AS customer_id ,
`customer`.`name` AS customer_name,
`purchase_order`.`status` AS po_status,
`purchase_order_items`.`product_id`,
`purchase_order_items`.`po_item_name`,
`product`.`weight` as product_weight,
`product`.`pending` as product_pending,
`product`.`company_owner` as company_owner,
`purchase_order_items`.`uom`,
`purchase_order_items`.`po_item_type`,
`purchase_order_items`.`order_sequence`,
`purchase_order_items`.`pending_balance`,
`purchase_order_items`.`quantity`,
`purchase_order_items`.`notes`,
`purchase_order_items`.`status` AS po_item_status,
`purchase_order_items`.`id` AS po_item_id
FROM (SELECT id, po_date, po_number, customer_id, status
FROM purchase_order
ORDER BY `po_date` DESC
LIMIT 0, 5) as purchase_order
INNER JOIN customer ON `customer`.`id` = `purchase_order`.`customer_id`
INNER JOIN purchase_order_items
ON `purchase_order_items`.`po_id` = `purchase_order`.`id`
INNER JOIN product ON `purchase_order_items`.`product_id` = `product`.`id`
GROUP BY purchase_order.id DESC
LIMIT 0, 5
You need to be sure that purchase_order.po_date and all id column are indexed. You can check it with below query.
SHOW INDEX FROM yourtable;
Since you mentioned that data is getting bigger. I would suggest doing sharding and then you can parallelize multiple queries. Please refer to the following article
Parallel Query for MySQL with Shard-Query
First, I cleaned up readability a bit. You don't need tick marks around every table.column reference. Also, for short-hand, using aliases works well. Ex: "po" instead of "purchase_order", "poi" instead of "purchase_order_items". The only time I would use tick marks is around reserved words that might cause a problem.
Second, you don't have any aggregations (sum, min, max, count, avg, etc.) in your query so you should be able to strip the GROUP BY clause.
As for indexes, I would have to assume you have an index on your reference tables on their respective "id" key columns.
For your Purchase Order table, I would have an index on that based on the "po_date" in the first index field position in case you already had an index using it. Since your Order by is on that, let the engine jump directly to those dated records first and you have your descending order resolved.
SELECT
po.id,
po.po_date,
po.po_number,
po.customer_id,
c.`name` AS customer_name,
po.`status` AS po_status,
poi.product_id,
poi.po_item_name,
p.weight as product_weight,
p.pending as product_pending,
p.company_owner,
poi.uom,
poi.po_item_type,
poi.order_sequence,
poi.pending_balance,
poi.quantity,
poi.notes,
poi.`status` AS po_item_status,
poi.id AS po_item_id
FROM
purchase_order po
INNER JOIN customer c
ON po.customer_id = c.id
INNER JOIN purchase_order_items poi
ON po.id = poi.po_id
INNER JOIN product p
ON poi.product_id = p.id
ORDER BY
po.po_date DESC
LIMIT
0, 20

Count, Group By, Subquery, Left Join not working as expected

This is puzzling me and no amount of the Google is helping me, hoping someone can point me in the right direction.
Please note that I have omitted some fields from the tables that don't relate to the question just to simplify things.
contacts
contact_id
name
email
contact_uuids
uuid
contact_id
visitor_activity
uuid
event
contact_communications
comm_id
contact_id
employee_id
Query
SELECT
`c`.*,
(SELECT COUNT(`log_id`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) as `num_comms`,
(SELECT MAX(`date`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) as `latest_date`,
(SELECT MIN(`date`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) as `first_date`,
(SELECT COUNT(`vaid`) FROM `visitor_activity` `va` WHERE `va`.`uuid` = `cu`.`uuid`) as `num_act`
FROM `contacts` `c`
LEFT JOIN `contact_uuids` `cu` ON `c`.`contact_id` = `cu`.`contact_id`
GROUP BY `c`.`contact_id`
ORDER BY `c`.`name` ASC
Some contacts have multiple UUIDs (upwards of 20 or 30).
When I perform the query WITHOUT the GROUP BY statement, I get the results I expect - a row returned for each UUID that exists for that contact, with the correct "num_comms" and "num_act" numbers.
However when I add the GROUP BY statement, the "num_comms" is a lot smaller then expected and the "num_act" returns only the value from the first row without the GROUP BY statement.
I tried doing a "WHERE NOT IN" in the subquery, however that simply crashed the server as it was far too intense.
So - how do I get this to add up all the COUNT values from the LEFT JOIN and not just return the first value?
Also if anyone can help me optimize this that would be great.
Two problems:
GROUP BY c.contact_id does not include all the non-aggregate columns. This is a MySQL extension. What you get is random values for the rows other than contact_id
The JOIN adds confusion. Your only use for visitor_activity is the COUNT(*) one it. But that does not make sense since it is limited to one UUID, whereas the row is limited to one contact_id. Rethink the purpose of that.
Remove this line:
(SELECT COUNT(`vaid`) FROM `visitor_activity` `va` WHERE `va`.`uuid` = `cu`.`uuid`) as `num_act`
and the rest may work ok.
I will continue with the assumption that you want the COUNT of all rows in visitor_activity for all the uuids associated with the one contact_id.
See if this:
( SELECT COUNT(*)
FROM `contacts` c2
JOIN `visitor_activity` USING(uuid)
WHERE c2.contact_id = c.contact_id as `num_act` ) AS num_act
will work for the last subquery. At the same time, remove the JOIN:
LEFT JOIN `contact_uuids` `cu` ON `c`.`contact_id` = `cu`.`contact_id`
Now, back to the other problem (the non-standard usage of GROUP BY). Assuming that contact_id is the PRIMARY KEY, then simply remove the
GROUP BY `c`.`contact_id`

Creating a subquery in Access

I am attempting to create a subquery in Access but I am receiving an error stating that one record can be returned by this subquery. I am wanting to find the top 10 companies that have the most pets then I want to know the name of those pets. I have never created a subquery before so I am not sure where I am going wrong. Here is what I have:
SELECT TOP 10 dbo_tGovenrnmentRegulatoryAgency.GovernmentRegulatoryAgency
(SELECT dbo_tPet.Pet
FROM dbo_tPet)
FROM dbo_tPet INNER JOIN dbo_tGovenrnmentRegulatoryAgency ON
dbo_tPet.GovernmentRegulatoryAgencyID =
dbo_tGovenrnmentRegulatoryAgency.GovernmentRegulatoryAgencyID
GROUP BY dbo_tGovenrnmentRegulatoryAgency.GovernmentRegulatoryAgency
ORDER BY Count(dbo_tPet.PetID) DESC;
Consider this solution, requiring a subquery in the WHERE IN () clause:
SELECT t1.GovernmentRegulatoryAgency, dbo_tPet.Pet,
FROM dbo_tPet
INNER JOIN dbo_tGovenrnmentRegulatoryAgency t1 ON
dbo_tPet.GovernmentRegulatoryAgencyID = t1.GovernmentRegulatoryAgencyID
WHERE t1.GovernmentRegulatoryAgency IN
(SELECT TOP 10 t2.GovernmentRegulatoryAgency
FROM dbo_tPet
INNER JOIN dbo_tGovenrnmentRegulatoryAgency t2 ON
dbo_tPet.GovernmentRegulatoryAgencyID = t2.GovernmentRegulatoryAgencyID
GROUP BY t2.GovernmentRegulatoryAgency
ORDER BY Count(dbo_tPet.Pet) DESC);
Table aliases are not needed but I include them for demonstration.
This should hopefully do it:
SELECT a.GovernmentRegulatoryAgency, t.NumOfPets
FROM dbo_tGovenrnmentRegulatoryAgency a
INNER JOIN (
SELECT TOP 10 p.GovernmentRegulatoryAgencyID, COUNT(p.PetID) AS NumOfPets
FROM dbo_tPet p
GROUP BY p.GovernmentRegulatoryAgencyID
ORDER BY COUNT(p.PetID) DESC
) t
ON a.GovernmentRegulatoryAgencyID = t.GovernmentRegulatoryAgencyID
In a nutshell, first get the nested query sorted, identifying what the relevant agencies are, then inner join back to the agency table to get the detail of the agencies so picked.

Incorrect group by and order by merge

I have couple tables joined in MySQL - one has many others.
And try to select items from one, ordered by min values from another table.
Without grouping in seems to be like this:
Code:
select `catalog_products`.id
, `catalog_products`.alias
, `tmpKits`.`minPrice`
from `catalog_products`
left join `product_kits` on `product_kits`.`product_id` = `catalog_products`.`id`
left join (
SELECT MIN(new_price) AS minPrice, id FROM product_kits GROUP BY id
) AS tmpKits on `tmpKits`.`id` = `product_kits`.`id`
where `category_id` in ('62')
order by product_kits.new_price ASC
Result:
But when I add group by, I get this:
Code:
select `catalog_products`.id
, `catalog_products`.alias
, `tmpKits`.`minPrice`
from `catalog_products`
left join `product_kits` on `product_kits`.`product_id` = `catalog_products`.`id`
left join (
SELECT MIN(new_price) AS minPrice, id FROM product_kits GROUP BY id
) AS tmpKits on `tmpKits`.`id` = `product_kits`.`id`
where `category_id` in ('62')
group by `catalog_products`.`id`
order by product_kits.new_price ASC
Result:
And this is incorrect sorting!
Somehow when I group this results, I get id 280 before 281!
But I need to get:
281|1600.00
280|2340.00
So, grouping breaks existing ordering!
For one, when you apply the GROUP BY to only one column, there is no guarantee that the values in the other columns will be consistently correct. Unfortunately, MySQL allows this type of SELECT/GROUPing to happen other products don't. Two, the syntax of using an ORDER BY in a subquery while allowed in MySQL is not allowed in other database products including SQL Server. You should use a solution that will return the proper result each time it is executed.
So the query will be:
For one, when you apply the GROUP BY to only one column, there is no guarantee that the values in the other columns will be consistently correct. Unfortunately, MySQL allows this type of SELECT/GROUPing to happen other products don't. Two, the syntax of using an ORDER BY in a subquery while allowed in MySQL is not allowed in other database products including SQL Server. You should use a solution that will return the proper result each time it is executed.
So the query will be:
select CP.`id`, CP.`alias`, TK.`minPrice`
from catalog_products CP
left join `product_kits` PK on PK.`product_id` = CP.`id`
left join (
SELECT MIN(`new_price`) AS "minPrice", `id` FROM product_kits GROUP BY `id`
) AS TK on TK.`id` = PK.`id`
where CP.`category_id` IN ('62')
order by PK.`new_price` ASC
group by CP.`id`
The thing is that group by does not recognize order by in MySQL.
Actually, what I was doing is really bad practice.
In this case you should use distinct and by catalog_products.*
In my opinion, group by is really useful when you need group result of agregated functions.
Otherwise you should not use it to get unique values.

Getting last element from Group By

I have this query...
$sQuery = "
SELECT SQL_CALC_FOUND_ROWS ".str_replace(" , ", " ", implode(", ", $aColumns))."
FROM dominios left join datas on dominios.id_dominio=datas.id_dominio
left join dnss on dominios.id_dominio=dnss.id_dominio
left join entidades_gestoras on dominios.id_dominio=entidades_gestoras.id_dominio
left join estados on dominios.id_dominio=estados.id_dominio
left join ips on dominios.id_dominio=ips.id_dominio
left join quantidade_dnss on dominios.id_dominio=quantidade_dnss.id_dominio
left join responsaveis_tecnicos on dominios.id_dominio=responsaveis_tecnicos.id_dominio
left join titulares on dominios.id_dominio=titulares.id_dominio
WHERE dominios.estado not like 2 and dominios.estado not like 0 AND data_expiracao > '".date("Ymd")."' $sWhere $where
GROUP BY dominio
$sOrder
$sLimit
";
It returns me the results I 'need'...
But the Group By, it show me the first result that appear on the database, and I needed the last...
How can I do this? :s
Edited
This is the final query, without those variables
SELECT SQL_CALC_FOUND_ROWS `datas`.`data_insercao`, `datas`.`data_expiracao`, `datas`.`data_registo`,
`dominios`.`dominio`,
`titulares`.`nome`, `titulares`.`morada`, `titulares`.`email`, `titulares`.`localidade`, `titulares`.`cod_postal`,
`entidades_gestoras`.`nome`, `entidades_gestoras`.`email`,
`responsaveis_tecnicos`.`nome`, `responsaveis_tecnicos`.`email`,
`ips`.`ip`, `dominios`.`id_dominio` FROM dominios left join datas on dominios.id_dominio=datas.id_dominio
left join dnss on dominios.id_dominio=dnss.id_dominio
left join entidades_gestoras on dominios.id_dominio=entidades_gestoras.id_dominio
left join estados on dominios.id_dominio=estados.id_dominio
left join ips on dominios.id_dominio=ips.id_dominio
left join quantidade_dnss on dominios.id_dominio=quantidade_dnss.id_dominio
left join responsaveis_tecnicos on dominios.id_dominio=responsaveis_tecnicos.id_dominio left join titulares on dominios.id_dominio=titulares.id_dominio WHERE dominios.estado not like 2 and dominios.estado not like 0 AND data_expiracao > '20120730' GROUP BY dominio ORDER BY `datas`.`data_insercao` asc LIMIT 0, 10
General considerations
I'm not sure what columsn you have in aColumns, or what table that dominio column comes from. When you group a number of rows using GROUP BY, then the columns you select for your result should either have the same value for all rows of the group (i.e. be functionally dependent), or should be some aggregate function combining the values of all the rows in the group.
Some SQL dialects enforce this. MySQL doesn't, but if you select an unaggregated column which has different values within the group, there are no guarantees as to what value will actually be returned to you. It might come from any row within the group. So there is no way to get the “last” of these rows, as there isn't any inherent order. In simple cases you can use MIN or MAX to select the value you need. In more complicated cases, you'll most likely have to use subqueries to do the selection from within the groups.
For example, this answer computes for every Name (which corresponds to your dominio grouping column) the last value of Action based on an ordering by ascending Time. Or rather the first value using a descending ordering, which is the same.
Your application
As your comment below indicates that you want the maximal id_dominio for each dominio in dominios, I suggest the following:
SELECT …
FROM (SELECT MAX(id_dominio) AS id_dominio
FROM dominios
GROUP BY dominio
WHERE estado <> 2
AND estado <> 0
) domIds
LEFT JOIN datas ON domIds.id_dominio=datas.id_dominio
…
So there will be one subquery to compute the maximal id_dominio for each dominio group, and all subsequent joins can use the IDs from that subquery instead of the full dominio table. If you need other columns from the dominio table as well, you might have to include that in the join again, so that you can get all the values from those row3s whose IDs you selected in the subquery.
By default MySQL sorts records in ascending order, to get last records first you need to sort the records in DESCNDING ORDER:
$sOrder DESC