Getting duplicates from SQL query? - mysql

I'm using this sql query, but I get duplicates. How can I make sure that only one post per inst_id is fetched? I guess it's my group by main_posts.c_p_id that makes it happen, but I get an error if I don't have it since I'm sorting by it. Basically I'm trying to sort and put posts that have a match in main_posts.c_p_id first.
SELECT main_posts.c_p_id, insts.inst_id, insts.inst_title
FROM insts
LEFT JOIN inst_posts
ON inst_posts.instp_inst_id = insts.inst_id
LEFT JOIN main_posts
ON main_posts.c_id = insts.instp_c_id
GROUP BY insts.inst_id, main_posts.c_p_id
ORDER BY main_posts.c_p_id DESC, insts.inst_title ASC

How can I make sure that only one post per inst_id is fetched?
You haven't provided any sample data, but since you mention that you are getting duplicate rows, you could use DISTINCT to return only distinct rows:
SELECT DISTINCT main_posts.c_p_id, insts.inst_id, insts.inst_title
FROM insts
LEFT JOIN inst_posts
ON inst_posts.instp_inst_id = insts.inst_id
LEFT JOIN main_posts
ON main_posts.c_id = insts.instp_c_id
GROUP BY insts.inst_id, main_posts.c_p_id
ORDER BY main_posts.c_p_id DESC, insts.inst_title ASC
I guess it's my group by main_posts.c_p_id that makes it happen,
Probably.
but I get an error if I don't have it since I'm sorting by it.
No, you are getting the error because you have listed the main_posts.c_p_id column in GROUP BY. ORDER BY only orders the rows; it doesn't matter whether you select this column or not.
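To make "one row per inst_id" concrete, here is a minimal sketch that groups by the inst columns only and collapses c_p_id with MAX(). It is not the answerer's query: it runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, the sample rows are invented, and it assumes that instp_c_id lives in inst_posts (the question's join reads insts.instp_c_id).

```python
import sqlite3

# SQLite stands in for MySQL; the table contents are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE insts (inst_id INTEGER PRIMARY KEY, inst_title TEXT);
CREATE TABLE inst_posts (instp_inst_id INTEGER, instp_c_id INTEGER);
CREATE TABLE main_posts (c_id INTEGER, c_p_id INTEGER);
INSERT INTO insts VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO inst_posts VALUES (1, 10), (1, 11), (2, 12);
INSERT INTO main_posts VALUES (10, 100), (11, 101);
""")

# Group by the inst columns only; MAX() reduces several c_p_id values to one,
# so each inst_id appears exactly once. Rows without a match get NULL.
rows = conn.execute("""
SELECT MAX(main_posts.c_p_id) AS matched_c_p_id, insts.inst_id, insts.inst_title
FROM insts
LEFT JOIN inst_posts ON inst_posts.instp_inst_id = insts.inst_id
LEFT JOIN main_posts ON main_posts.c_id = inst_posts.instp_c_id  -- assumed column owner
GROUP BY insts.inst_id, insts.inst_title
ORDER BY matched_c_p_id DESC, insts.inst_title ASC
""").fetchall()

for row in rows:
    print(row)
```

With a descending sort, NULLs come last in both SQLite and MySQL, so the insts that have a match are listed first.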

Related

How to improve performance when getting the 5 most recent records to display in a list

I'm making a sample "recent" screen that displays a list of records, with id as the primary key.
My query returns the expected result, but on a table with a large amount of data it can be slow.
This is the sample query:
SELECT distinct H.id, -- (Primary Key)
H.partnerid as PartnerId,
H.partnername AS partner, H.accountname AS accountName,
H.accountid as AccountNo
FROM myschema.mytransactionstable H
INNER JOIN (
SELECT S.accountid, S.partnerid, S.accountname,
max(S.transdate) AS maxDate
from myschema.mytransactionstable S
group by S.accountid, S.partnerid, S.accountname
) ms ON H.accountid = ms.accountid
AND H.partnerid = ms.partnerid
AND H.accountname =ms.accountname
AND H.transdate = maxDate
WHERE H.accountid = ms.accountid
AND H.partnerid = ms.partnerid
AND H.accountname = ms.accountname
AND H.transdate = maxDate
GROUP BY H.partnerid,H.accountid, H.accountname
ORDER BY H.id DESC
LIMIT 5
In my case, some rows have the same values in the selected columns and differ only in their ids.
Below is a link to an image of the data before the query above is run; it shows all the records, not yet filtered.
Sample result query click here
I only want the 5 most recent rows by id, but the other columns (accountname, accountid, partnerid) can contain repeated values.
The query already returns the correct result, but I want to improve its performance. Any suggestions?
You can try using row_number() (window functions are available in MySQL 8.0 and later):
select * from
(
select *,row_number() over(order by transdate desc) as rn
from myschema.mytransactionstable
)A where rn<=5
Don't repeat ON and WHERE clauses. Use ON to say how the tables (or subqueries) are "related"; use WHERE for filtering (that is, which rows to keep). Probably in your case, all the WHERE should be removed.
Please provide SHOW CREATE TABLE.
This 'composite' index would probably help with both the subquery and the JOIN:
INDEX(partnerid, accountid, accountname, transdate)
That would also avoid a separate sort for the GROUP BY.
But then the ORDER BY is different, so it cannot avoid a sort.
This might avoid the sort without changing the result set ordering: ORDER BY partnerid, accountid, accountname, transdate DESC
Please provide EXPLAIN SELECT ... and EXPLAIN FORMAT=JSON SELECT ... if you have further questions.
If we cannot get an index to handle the WHERE, GROUP BY, and ORDER BY, the query will generate all the rows before seeing the LIMIT 5. If the index does work, then the outer query will stop after 5 -- potentially a big savings.
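The point about not repeating the ON conditions in WHERE can be checked with a small sketch. It uses SQLite via Python's sqlite3 as a stand-in for MySQL, drops the myschema prefix, and runs on invented rows; the index name idx_latest is hypothetical. Whether the index actually removes a sort in MySQL is something only EXPLAIN on the real table can show.

```python
import sqlite3

# SQLite stands in for MySQL; rows and the index name are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytransactionstable (
    id INTEGER PRIMARY KEY, partnerid INTEGER, partnername TEXT,
    accountid INTEGER, accountname TEXT, transdate TEXT);
CREATE INDEX idx_latest
    ON mytransactionstable (partnerid, accountid, accountname, transdate);
INSERT INTO mytransactionstable VALUES
    (1, 1, 'P1', 100, 'Acme', '2020-01-01'),
    (2, 1, 'P1', 100, 'Acme', '2020-02-01'),
    (3, 2, 'P2', 200, 'Bolt', '2020-01-15'),
    (4, 2, 'P2', 200, 'Bolt', '2020-03-01'),
    (5, 3, 'P3', 300, 'Core', '2020-01-20');
""")

# The join conditions live in ON; {where} is filled in below.
LATEST = """
SELECT H.id, H.partnerid, H.accountid, H.accountname
FROM mytransactionstable H
INNER JOIN (
    SELECT S.accountid, S.partnerid, S.accountname, MAX(S.transdate) AS maxDate
    FROM mytransactionstable S
    GROUP BY S.accountid, S.partnerid, S.accountname
) ms ON H.accountid = ms.accountid
    AND H.partnerid = ms.partnerid
    AND H.accountname = ms.accountname
    AND H.transdate = ms.maxDate
{where}
ORDER BY H.id DESC
LIMIT 5
"""

# The same conditions again, as in the question's WHERE clause.
REPEATED_WHERE = """
WHERE H.accountid = ms.accountid
  AND H.partnerid = ms.partnerid
  AND H.accountname = ms.accountname
  AND H.transdate = ms.maxDate
"""

on_only = conn.execute(LATEST.format(where="")).fetchall()
with_where = conn.execute(LATEST.format(where=REPEATED_WHERE)).fetchall()
print(on_only == with_where)
```

Both variants return the latest row per account, so the repeated WHERE adds nothing to the result.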

Having both GROUP BY and ORDER BY in one query

I've been told that I can't have GROUP BY and ORDER BY in one MySQL Query. Here is an abbreviated version of the query -
SELECT n.colorName, n.colorComp, n.colorID, SUM(n.gallons) AS TotalGallons
FROM netTran n, Store m, Product p
WHERE ((n.store = m.store) and m.state = "FL")
AND ((n.salesNbr = p.salesNbr) AND (p.intExt = "EXTERIOR" OR p.intExt = "INT/EXT"))
AND ((n.clrnt1 = "L1") AND (n.clrnt1 = "R3"))
GROUP BY n.colorComp, n.colorID
ORDER BY TotalGallons DESC;
I've been told that having the ORDER BY with the GROUP BY will give me different results and that the only way the ORDER BY would work is if the main query were nested in
SELECT * FROM
(query)
ORDER BY TotalGallons DESC;
Is that correct?
Use the query as follows:
SELECT n.colorName, n.colorComp, n.colorID, SUM(n.gallons) AS TotalGallons
FROM netTran n, Store m, Product p
WHERE ((n.store = m.store) and m.state = "FL")
AND ((n.salesNbr = p.salesNbr) AND (p.intExt = "EXTERIOR" OR p.intExt = "INT/EXT"))
AND ((n.clrnt1 = "L1") AND (n.clrnt1 = "R3"))
GROUP BY n.colorName, n.colorComp,n.colorID
ORDER BY TotalGallons DESC;
You can have GROUP BY and ORDER BY in a single query, but when you aggregate a column you need to list every non-aggregated column in GROUP BY.
GROUP BY changes the results; ORDER BY just presents the data in order.
Having the ORDER BY with the GROUP BY won't give you different results
Yes, that's true. In the MySQL reference manual you can read this:
If you use GROUP BY, output rows are sorted according to the GROUP BY columns as if you had an ORDER BY for the same columns. To avoid the overhead of sorting that GROUP BY produces, add ORDER BY NULL:
I suppose that this means that ORDER BY has no effect at all.
Curious... I always thought that order by worked...
GROUP BY and ORDER BY are two different things. It is plain wrong that you cannot use them together.
GROUP BY is used to tell the DBMS per which group to aggregate the data. In your example you sum gallons per colorComp and colorID.
ORDER BY is used to tell the DBMS in which order you want the data shown. In your query by the sum of gallons descending.
In standard SQL you don't usually use GROUP BY without ORDER BY, because in spite of the grouping, the data may be shown unordered. MySQL however decided to guarantee that GROUP BY performs an ORDER BY. So in MySQL it was not necessary to use ORDER BY after GROUP BY, as long as you didn't want another order as in your example. This non-standard behavior is now deprecated. See here:
https://dev.mysql.com/doc/refman/5.6/en/group-by-optimization.html
However, relying on implicit GROUP BY sorting is deprecated.
So you should have an ORDER BY clause now whenever you want data sorted. With no exception.
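As a quick check that GROUP BY and ORDER BY work together, the sketch below compares the direct form with the nested SELECT * FROM (query) form from the question. It runs on SQLite through Python's sqlite3 as a stand-in for MySQL, keeps only the netTran table with invented rows, and leaves out the question's joins and filters.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are invented and the joins are omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE netTran (colorName TEXT, colorComp TEXT, colorID INTEGER, gallons REAL);
INSERT INTO netTran VALUES
    ('Sky',  'A', 1, 2.0), ('Sky',  'A', 1, 3.0),
    ('Sand', 'B', 2, 10.0),
    ('Moss', 'C', 3, 1.0), ('Moss', 'C', 3, 0.5);
""")

# GROUP BY decides which rows are summed together.
GROUPED = """
SELECT colorName, colorComp, colorID, SUM(gallons) AS TotalGallons
FROM netTran
GROUP BY colorName, colorComp, colorID
"""

# ORDER BY applied directly to the grouped query ...
direct = conn.execute(GROUPED + " ORDER BY TotalGallons DESC").fetchall()

# ... and applied to the grouped query wrapped in an outer SELECT.
nested = conn.execute(
    "SELECT * FROM (" + GROUPED + ") AS q ORDER BY TotalGallons DESC"
).fetchall()

print(direct == nested)
```

On this data both forms return the same rows in the same order, so the wrapper is not needed to sort by the aggregate.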

Order by Date not working as expected in MySql

I have a mysql query
select count(*) as TotalCount,
pd.Product_Modified_Date,
psc.Product_Subcategory_Name,
pd.Product_Image_URL
from product_subcategory psc
inner join product_details pd on psc.Product_Subcategory_ID = pd.Product_Subcategory_Reference_ID
where pd.Product_Status = 0 and
psc.Product_Subcategory_Status = 0
group by psc.Product_Subcategory_Name
order by pd.Product_Modified_Date desc
My product_details table has new image URLs, but I cannot get them with the above query.
How can I do it?
You are grouping by one column, Product_Subcategory_Name, but you have other columns Product_Image_URL and Product_Modified_Date in your select-list.
If you have cases where the group has multiple rows (which you do, since the count is 14 or more in each group), MySQL can only present one value for the Product_Image_URL. So it picks some row in the group, and uses the value in that row. The URL value for all other rows in the group is ignored.
To fix this, you must group by all columns in your select-list that are not part of an aggregate function. Any column you don't want to use to form a new group must go into an aggregate function.
Roland Bouman wrote an excellent blog detailing how to use GROUP BY properly: http://rpbouman.blogspot.com/2007/05/debunking-group-by-myths.html
Combining GROUP BY and ORDER BY is problematic and your problem is most likely covered in another question on Stack Exchange : MySQL wrong results with GROUP BY and ORDER BY
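The rule "group by it or aggregate it" could look like the sketch below, which wraps the date in MAX() so the sort uses the newest modification per subcategory. It runs on SQLite via Python's sqlite3 as a stand-in for MySQL, with invented rows; the alias LastModified is hypothetical. The image URL is left out on purpose, since picking the URL of the newest row needs an extra join on that date.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_subcategory (
    Product_Subcategory_ID INTEGER PRIMARY KEY,
    Product_Subcategory_Name TEXT,
    Product_Subcategory_Status INTEGER);
CREATE TABLE product_details (
    Product_Subcategory_Reference_ID INTEGER,
    Product_Modified_Date TEXT,
    Product_Image_URL TEXT,
    Product_Status INTEGER);
INSERT INTO product_subcategory VALUES (1, 'Shoes', 0), (2, 'Hats', 0);
INSERT INTO product_details VALUES
    (1, '2020-01-01', 'shoe_old.png', 0),
    (1, '2020-05-01', 'shoe_new.png', 0),
    (2, '2020-03-01', 'hat.png', 0);
""")

# Every selected column is either in GROUP BY or inside an aggregate,
# so the result no longer depends on which row the engine happens to pick.
rows = conn.execute("""
SELECT COUNT(*) AS TotalCount,
       MAX(pd.Product_Modified_Date) AS LastModified,
       psc.Product_Subcategory_Name
FROM product_subcategory psc
INNER JOIN product_details pd
        ON psc.Product_Subcategory_ID = pd.Product_Subcategory_Reference_ID
WHERE pd.Product_Status = 0
  AND psc.Product_Subcategory_Status = 0
GROUP BY psc.Product_Subcategory_Name
ORDER BY LastModified DESC
""").fetchall()

for row in rows:
    print(row)
```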

Conditional Distinct in MYSQL with respect to another column

I have a query as follows:
SELECT * FROM content_type_product cp
JOIN content_field_product_manufacturer cf ON cf.nid = cp.nid group by cp.nid
ORDER
BY field(cf.field_product_manufacturer_value,'12') DESC,
cp.field_product_price_value DESC
This works perfectly except for one small flaw: there are two records with the same id (one with cf.field_product_manufacturer_value = '12' and the other with cf.field_product_manufacturer_value = '57'), and I eliminated the duplicate with the GROUP BY clause. The problem is that I want the record with the greater "field_product_price_value", but the query gives me the one with the lesser value. If I query for '57', I get the record with the greater field_product_price_value, but when I query for '12', I get the one with the lesser "field_product_price_value". Is there any way to specify that the record with the greater "field_product_price_value" should be picked?
You should use max(field_product_price_value) combined with appropriate GROUP BY-clause.
In general, you should use GROUP BY-clause only when you select both normal columns and aggregate functions (MIN, MAX, COUNT, AVG) in the query.
Your query is using a (mis)feature of MySQL called Hidden Columns. This is only advised when all the unaggregated columns in the SELECT and not in the GROUP BY have the same value. This is not the case, so you need to select the correct records yourself:
SELECT cp.*, cf.*
FROM content_type_product cp JOIN
content_field_product_manufacturer cf
ON cf.nid = cp.nid join
(select nid, max(field_product_price_value) as maxprice
from content_type_product
group by nid
) cfmax
on cp.nid = cfmax.nid and cp.field_product_price_value = cfmax.maxprice
ORDER BY field(cf.field_product_manufacturer_value,'12') DESC,
cp.field_product_price_value DESC
Unless you really know what you are doing, when you use a GROUP BY, be sure all unaggregated columns in the SELECT are in the GROUP BY.
'2' > '12'
if we are talking about varchars. I believe you should convert your field to a numeric type, and your sort will then work fine. Read this article for more information.
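The string-versus-number point is easy to reproduce. This sketch uses SQLite through Python's sqlite3 as a stand-in for MySQL; the table t and column v are hypothetical. In MySQL the cast is spelled CAST(v AS SIGNED) or CAST(v AS UNSIGNED) rather than CAST(v AS INTEGER).

```python
import sqlite3

# SQLite stands in for MySQL; table t and column v are hypothetical.
conn = sqlite3.connect(":memory:")

# Compared as text, '2' sorts after '12' because '2' > '1' character by character.
as_text = conn.execute("SELECT '2' > '12'").fetchone()[0]

# Compared as numbers, 2 is smaller than 12.
as_number = conn.execute(
    "SELECT CAST('2' AS INTEGER) > CAST('12' AS INTEGER)"
).fetchone()[0]

conn.executescript("""
CREATE TABLE t (v TEXT);
INSERT INTO t VALUES ('12'), ('2'), ('57');
""")

# Sorting the text column directly versus sorting on its numeric value.
text_order = [r[0] for r in conn.execute("SELECT v FROM t ORDER BY v DESC")]
numeric_order = [
    r[0] for r in conn.execute("SELECT v FROM t ORDER BY CAST(v AS INTEGER) DESC")
]

print(as_text, as_number, text_order, numeric_order)
```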

MySQL Joins, Group By, and Ordering the Group By Choice

Is it possible to order the GROUP BY chosen results of a MySQL query w/out using a subquery? I'm finding that, with my large dataset, the subquery adds a significant amount of load time to my query.
Here is a similar situation: how to sort order of LEFT JOIN in SQL query?
This is my code that works, but it takes way too long to load:
SELECT tags.contact_id, n.last
FROM tags
LEFT JOIN ( SELECT * FROM names ORDER BY timestamp DESC ) n
ON (n.contact_id=tags.contact_id)
WHERE tags.tag='$tag'
GROUP BY tags.contact_id
ORDER BY n.last ASC;
I can get a fast result doing a simple join w/ a table name, but the "group by" command gives me the first row of the joined table, not the last row.
I'm not really sure what you're trying to do. Here are some of the problems with your query:
selecting n.last, although it is neither in the group by clause, nor an aggregate value. Although MySQL allows this, it's really not a good idea to take advantage of.
needlessly sorting a table before joining, instead of just joining
the subquery isn't really doing anything
I would suggest carefully writing down the desired query results, i.e. "I want the contact id and latest date for each tag" or something similar. It's possible that will lead to a natural, easy-to-write and semantically correct query that is also more efficient than what you showed in the OP.
To answer the question "is it possible to order a GROUP BY query": yes, it's quite easy, and here's an example:
select a, b, sum(c) as `c sum`
from <table_name>
group by a,b
order by `c sum`
You are doing a LEFT JOIN on contact ID, which implies you want all tag contacts REGARDLESS of whether a match is found in the names table. Is that really the case, or will every contact in the tags table ALWAYS have a record in the names table? Additionally, what is your column "n.last": the person's last name, or the last time something was done (which I would assume is actually the timestamp)?
So, that being said, I would just do a simple direct join:
SELECT DISTINCT
t.contact_id,
n.last
FROM
tags t
JOIN names n
ON t.contact_id = n.contact_id
WHERE
t.tag = '$tag'
ORDER BY
n.last ASC
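If the goal is the most recent names row per contact (an assumption; the question does not state it precisely), one option is to join against the maximum timestamp per contact instead of relying on which row GROUP BY happens to keep. The sketch below runs on SQLite via Python's sqlite3 as a stand-in for MySQL, with invented rows, and binds the tag as a parameter rather than interpolating '$tag' into the SQL string.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (contact_id INTEGER, tag TEXT);
CREATE TABLE names (contact_id INTEGER, last TEXT, timestamp TEXT);
INSERT INTO tags VALUES (1, 'vip'), (2, 'vip'), (3, 'other');
INSERT INTO names VALUES
    (1, 'Young', '2019-01-01'), (1, 'Adams', '2020-01-01'),
    (2, 'Baker', '2018-06-01');
""")

# m holds the newest timestamp per contact; joining names on it keeps
# only the most recent names row for each tagged contact.
rows = conn.execute("""
SELECT t.contact_id, n.last
FROM tags t
JOIN names n ON n.contact_id = t.contact_id
JOIN (
    SELECT contact_id, MAX(timestamp) AS latest
    FROM names
    GROUP BY contact_id
) m ON m.contact_id = n.contact_id AND m.latest = n.timestamp
WHERE t.tag = ?
ORDER BY n.last ASC
""", ("vip",)).fetchall()

for row in rows:
    print(row)
```

This still contains a subquery, but one that only aggregates two columns; whether it is faster on a large table depends on the indexes available, for example one on names (contact_id, timestamp).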