GROUP BY in query - mysql

I'm struggling a bit with this query: the GROUP BY is not working as expected. Without the webshop columns in the GROUP BY it works, but then multiple webshops show up. Does anyone have an idea? Thanks!
SELECT
default_deals.id,
default_deals.title,
default_webshops.name,
COUNT(default_deals_categories.categories_id) as category_count,
default_webshops.popular,
default_webshops.popular_order
FROM
default_deals
JOIN default_deals_categories ON default_deals.id = default_deals_categories.row_id
JOIN default_webshops ON default_deals.webshop = default_webshops.id
WHERE
categories_id IN (1, 2)
GROUP BY
default_deals.id,
default_deals.title,
default_webshops.name,
default_webshops.popular,
default_webshops.popular_order
ORDER BY
category_count DESC,
default_webshops.popular DESC,
default_webshops.popular_order DESC

Try this:
SELECT
default_deals.id,
title,
name,
COUNT(categories_id) as category_count,
popular,
popular_order
FROM
default_deals
JOIN default_deals_categories ON default_deals.id = default_deals_categories.row_id
JOIN default_webshops ON default_deals.webshop = default_webshops.id
WHERE
categories_id IN (1, 2)
GROUP BY
default_deals.id,
title,
name,
popular,
popular_order
ORDER BY
category_count DESC,
popular DESC,
popular_order DESC
Okay, I guess your results list each deal multiple times, once for each webshop it is linked to, with the same category count but possibly different values for popular and popular_order? What you need to do is decide how you want to aggregate these fields. For example, you could do this:
SELECT
dd.id,
dd.title,
MAX(dw.name) AS webshop_name,
COUNT(dc.categories_id) AS category_count,
MAX(dw.popular) AS popular,
SUM(dw.popular_order) AS popular_order
FROM
default_deals dd
JOIN default_deals_categories dc ON dd.id = dc.row_id
JOIN default_webshops dw ON dd.webshop = dw.id
WHERE
dc.categories_id IN (1, 2)
GROUP BY
dd.id,
dd.title
ORDER BY
COUNT(dc.categories_id) DESC,
MAX(dw.popular) DESC,
SUM(dw.popular_order) DESC;
But I am going to bet that is NOT what you actually want to see. Where a deal has multiple webshops, what do you want to see? My example just takes the alphabetically highest value, but you could make this into a list, e.g. "Webshop1, Webshop2", etc.
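The list idea can be sketched with GROUP_CONCAT. This minimal example uses Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for MySQL. Because default_deals.webshop holds a single id, the sketch assumes a hypothetical deal_webshops link table, filled with made-up rows, so that one deal can have two webshops. GROUP_CONCAT exists in both systems, but MySQL sets the separator with GROUP_CONCAT(dw.name SEPARATOR ', ') rather than SQLite's second argument. The order of names inside the list is also not guaranteed unless you add an ORDER BY inside GROUP_CONCAT, which MySQL supports.

```python
import sqlite3


def webshop_lists():
    """Return one (deal_id, title, webshop_names) row per deal."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE default_deals (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE default_webshops (id INTEGER PRIMARY KEY, name TEXT);
        -- Hypothetical link table so that a deal can belong to several webshops.
        CREATE TABLE deal_webshops (deal_id INTEGER, webshop_id INTEGER);
        INSERT INTO default_deals VALUES (1, 'Deal A'), (2, 'Deal B');
        INSERT INTO default_webshops VALUES (10, 'Webshop1'), (20, 'Webshop2');
        INSERT INTO deal_webshops VALUES (1, 10), (1, 20), (2, 20);
    """)
    rows = conn.execute("""
        SELECT dd.id, dd.title,
               GROUP_CONCAT(dw.name, ', ') AS webshop_names  -- one string per deal
        FROM default_deals dd
        JOIN deal_webshops l ON l.deal_id = dd.id
        JOIN default_webshops dw ON dw.id = l.webshop_id
        GROUP BY dd.id, dd.title
        ORDER BY dd.id
    """).fetchall()
    conn.close()
    return rows


for deal_id, title, names in webshop_lists():
    print(deal_id, title, names)
```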
You also need to decide how to aggregate the other fields in the webshop table. My query takes the maximum for popular and the sum of popular_order but this is just an example.
Finally, this will double up, triple up, etc. the category count depending on how many webshops a deal is linked to... and I am going to bet that this is also not what you want?
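You can reproduce the doubling, and avoid it, with COUNT(DISTINCT ...). This sketch again uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. It assumes a hypothetical deal_webshops link table and made-up rows: one deal with two categories and two webshops. Counting distinct category ids only works if a deal never legitimately lists the same category twice.

```python
import sqlite3


def category_counts():
    """Compare COUNT with COUNT(DISTINCT) when a second join multiplies rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE default_deals (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE default_deals_categories (row_id INTEGER, categories_id INTEGER);
        -- Hypothetical link table giving deal 1 two webshops.
        CREATE TABLE deal_webshops (deal_id INTEGER, webshop_id INTEGER);
        INSERT INTO default_deals VALUES (1, 'Deal A');
        INSERT INTO default_deals_categories VALUES (1, 1), (1, 2);
        INSERT INTO deal_webshops VALUES (1, 10), (1, 20);
    """)
    row = conn.execute("""
        SELECT dd.id,
               COUNT(dc.categories_id)          AS naive_count,    -- 2 categories x 2 webshops
               COUNT(DISTINCT dc.categories_id) AS distinct_count  -- each category once
        FROM default_deals dd
        JOIN default_deals_categories dc ON dd.id = dc.row_id
        JOIN deal_webshops l ON l.deal_id = dd.id
        WHERE dc.categories_id IN (1, 2)
        GROUP BY dd.id
    """).fetchone()
    conn.close()
    return row


print(category_counts())  # → (1, 4, 2)
```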

Related

Speeding up mysql query

I have a MySQL query that joins four tables. I thought joining was the best approach, but now that the data is getting bigger, the query seems to make the application stop executing.
SELECT
`purchase_order`.`id`,
`purchase_order`.`po_date` AS po_date,
`purchase_order`.`po_number`,
`purchase_order`.`customer_id` AS customer_id ,
`customer`.`name` AS customer_name,
`purchase_order`.`status` AS po_status,
`purchase_order_items`.`product_id`,
`purchase_order_items`.`po_item_name`,
`product`.`weight` as product_weight,
`product`.`pending` as product_pending,
`product`.`company_owner` as company_owner,
`purchase_order_items`.`uom`,
`purchase_order_items`.`po_item_type`,
`purchase_order_items`.`order_sequence`,
`purchase_order_items`.`pending_balance`,
`purchase_order_items`.`quantity`,
`purchase_order_items`.`notes`,
`purchase_order_items`.`status` AS po_item_status,
`purchase_order_items`.`id` AS po_item_id
FROM `purchase_order`
INNER JOIN customer ON `customer`.`id` = `purchase_order`.`customer_id`
INNER JOIN purchase_order_items ON `purchase_order_items`.`po_id` = `purchase_order`.`id`
INNER JOIN product ON `purchase_order_items`.`product_id` = `product`.`id`
GROUP BY id ORDER BY `purchase_order`.`po_date` DESC LIMIT 0, 20
My real problem is that the query takes a long time to finish. Is there a way to speed it up, or to change it so that the data is retrieved faster?
Here's the EXPLAIN EXTENDED output, as requested in the comments.
Thanks in advance. I really hope this is the right place to ask; if not, please let me know.
Will this give you the correct list of ids?
SELECT id
FROM purchase_order
ORDER BY `po_date` DESC
LIMIT 0, 20
If so, then start with that before launching into the JOIN. You can also (I think) get rid of the GROUP BY, which is causing an "explode-implode" of rows: the JOINs expand each order into one row per item, and the GROUP BY then collapses them again.
SELECT ...
FROM ( SELECT id ... (as above) ...) AS ids
JOIN purchase_order po ON po.id = ids.id
JOIN ... (the other tables)
GROUP BY ... -- (this may be problematic, especially with the LIMIT)
ORDER BY po.po_date DESC -- yes, this needs repeating
-- no LIMIT
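Here is a small end-to-end sketch of the outline above. Python's sqlite3 module with an in-memory SQLite database stands in for MySQL. The tables are trimmed-down, hypothetical versions of the question's schema, and the index names and data (five orders with two items each) are made up. The derived table picks the newest order ids first, so the LIMIT counts orders rather than item rows. The outer ORDER BY is repeated because the derived table's order is not guaranteed to survive the joins. Whether this is actually faster on your data is something to confirm with EXPLAIN.

```python
import sqlite3


def latest_orders_with_items(n_orders):
    """Return item rows for the n_orders newest purchase orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE product (id INTEGER PRIMARY KEY, weight REAL);
        CREATE TABLE purchase_order (id INTEGER PRIMARY KEY, po_date TEXT, customer_id INTEGER);
        CREATE TABLE purchase_order_items (id INTEGER PRIMARY KEY, po_id INTEGER, product_id INTEGER);
        CREATE INDEX idx_po_date ON purchase_order (po_date);
        CREATE INDEX idx_poi_po_id ON purchase_order_items (po_id);
        INSERT INTO customer VALUES (1, 'ACME');
        INSERT INTO product VALUES (1, 1.5), (2, 2.0);
    """)
    # Made-up data: five orders on consecutive days, two items each.
    for i in range(1, 6):
        conn.execute("INSERT INTO purchase_order VALUES (?, ?, 1)", (i, f"2024-01-0{i}"))
        conn.executemany("INSERT INTO purchase_order_items VALUES (?, ?, ?)",
                         [(2 * i - 1, i, 1), (2 * i, i, 2)])
    rows = conn.execute("""
        SELECT po.id, po.po_date, c.name, poi.id AS po_item_id, p.weight
        FROM (SELECT id FROM purchase_order
              ORDER BY po_date DESC LIMIT ?) AS ids        -- step 1: newest order ids only
        JOIN purchase_order po        ON po.id = ids.id    -- step 2: join everything else
        JOIN customer c               ON c.id = po.customer_id
        JOIN purchase_order_items poi ON poi.po_id = po.id
        JOIN product p                ON p.id = poi.product_id
        ORDER BY po.po_date DESC, poi.id                   -- repeat the ordering outside
    """, (n_orders,)).fetchall()
    conn.close()
    return rows


for row in latest_orders_with_items(2):
    print(row)
```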
Something like this:
SELECT
`purchase_order`.`id`,
`purchase_order`.`po_date` AS po_date,
`purchase_order`.`po_number`,
`purchase_order`.`customer_id` AS customer_id ,
`customer`.`name` AS customer_name,
`purchase_order`.`status` AS po_status,
`purchase_order_items`.`product_id`,
`purchase_order_items`.`po_item_name`,
`product`.`weight` as product_weight,
`product`.`pending` as product_pending,
`product`.`company_owner` as company_owner,
`purchase_order_items`.`uom`,
`purchase_order_items`.`po_item_type`,
`purchase_order_items`.`order_sequence`,
`purchase_order_items`.`pending_balance`,
`purchase_order_items`.`quantity`,
`purchase_order_items`.`notes`,
`purchase_order_items`.`status` AS po_item_status,
`purchase_order_items`.`id` AS po_item_id
FROM (SELECT id, po_date, po_number, customer_id, status
FROM purchase_order
ORDER BY `po_date` DESC
LIMIT 0, 5) as purchase_order
INNER JOIN customer ON `customer`.`id` = `purchase_order`.`customer_id`
INNER JOIN purchase_order_items
ON `purchase_order_items`.`po_id` = `purchase_order`.`id`
INNER JOIN product ON `purchase_order_items`.`product_id` = `product`.`id`
GROUP BY purchase_order.id
ORDER BY purchase_order.po_date DESC -- explicit sort; ASC/DESC on GROUP BY was removed in MySQL 8.0.13
LIMIT 0, 5
You need to make sure that purchase_order.po_date and all the id columns are indexed. You can check this with the query below:
SHOW INDEX FROM yourtable;
Since you mentioned that the data is getting bigger, I would suggest sharding it so that you can run multiple queries in parallel. Please refer to the following article:
Parallel Query for MySQL with Shard-Query
First, I cleaned up the readability a bit. You don't need backticks around every table.column reference, and aliases work well as shorthand, e.g. "po" instead of "purchase_order" and "poi" instead of "purchase_order_items". The only time I would use backticks is around reserved words that might cause a problem.
Second, you don't have any aggregations (SUM, MIN, MAX, COUNT, AVG, etc.) in your query, so you should be able to drop the GROUP BY clause. Note that without it you get one row per order item rather than one row per order.
As for indexes, I would assume you have an index on the "id" key column of each of your reference tables.
For your purchase_order table, I would have an index with po_date in the first position (or reuse an existing index that already starts with it). Since your ORDER BY is on that column, the engine can jump directly to the most recent records, and your descending order is already resolved.
SELECT
po.id,
po.po_date,
po.po_number,
po.customer_id,
c.`name` AS customer_name,
po.`status` AS po_status,
poi.product_id,
poi.po_item_name,
p.weight as product_weight,
p.pending as product_pending,
p.company_owner,
poi.uom,
poi.po_item_type,
poi.order_sequence,
poi.pending_balance,
poi.quantity,
poi.notes,
poi.`status` AS po_item_status,
poi.id AS po_item_id
FROM
purchase_order po
INNER JOIN customer c
ON po.customer_id = c.id
INNER JOIN purchase_order_items poi
ON po.id = poi.po_id
INNER JOIN product p
ON poi.product_id = p.id
ORDER BY
po.po_date DESC
LIMIT
0, 20
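To check the po_date index suggestion, this sketch uses Python's sqlite3 module to compare SQLite's EXPLAIN QUERY PLAN output with and without the index. The index name idx_po_date is made up. MySQL's EXPLAIN output looks different; there, watch for "Using filesort" disappearing from the Extra column. The exact plan wording also varies between SQLite versions, so the sketch only looks for the index name and for SQLite's "USE TEMP B-TREE FOR ORDER BY" sort step.

```python
import sqlite3


def order_by_plan(with_index):
    """Return SQLite's query plan for the newest-20 lookup as one string."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchase_order (id INTEGER PRIMARY KEY, po_date TEXT, po_number TEXT)"
    )
    if with_index:
        # Hypothetical index name; po_date is the first (here, only) indexed column.
        conn.execute("CREATE INDEX idx_po_date ON purchase_order (po_date)")
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id, po_date FROM purchase_order ORDER BY po_date DESC LIMIT 20"
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(str(row[-1]) for row in plan)


print("without index:", order_by_plan(False))
print("with index:   ", order_by_plan(True))
```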

How to show the last (max) row of each group in MySQL

I have a query like the one below:
SELECT kd.id_karir, kd.nama, kd.kelamin,
(YEAR(NOW())-YEAR(tanggal)) usia, MAX(pf.jenis), pf.jenis,
pf.nama AS pendidikan, pf.jurusan, kd.alamat, kd.telepon,
kd.handphone, kd.email, kd.tempat AS tempat_lahir,
kd.tanggal AS tanggal_lahir
FROM keadaan_diri AS kd
LEFT OUTER JOIN pendidikan_formal AS pf ON (kd.id_karir = pf.id_karir)
WHERE kd.id_karir = 'P1409047'
GROUP BY kd.id_karir
ORDER BY kd.nama ASC, pf.jenis DESC
I want to return the latest row from the pendidikan_formal table using MAX and GROUP BY, but the query doesn't work.
First of all, you can (or should, depending on the MySQL configuration) only select and order by columns that are part of your GROUP BY clause. For all other columns, you have to specify an aggregate function. For example, say you have two human records with the same name but different ages. When you group by name, you have to choose one of the two age values (max, min, average, ...). If you don't care which, you could turn off the ONLY_FULL_GROUP_BY SQL mode. I wouldn't suggest that, however.
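The two-humans example can be sketched with Python's sqlite3 module and an in-memory SQLite database. The humans table and its rows are made up for illustration. Unlike MySQL with ONLY_FULL_GROUP_BY enabled, SQLite does not reject bare non-aggregated columns, so the sketch makes the rule explicit: every selected column is either grouped or aggregated.

```python
import sqlite3


def ages_by_name():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE humans (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO humans VALUES (?, ?)",
                     [("Ann", 30), ("Ann", 40), ("Bob", 25)])
    # name is grouped; age appears only inside aggregate functions.
    rows = conn.execute("""
        SELECT name, MAX(age), MIN(age), AVG(age)
        FROM humans
        GROUP BY name
        ORDER BY name
    """).fetchall()
    conn.close()
    return rows


print(ages_by_name())  # → [('Ann', 40, 30, 35.0), ('Bob', 25, 25, 25.0)]
```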
However, to get the one record with the maximum value, GROUP BY is not the right approach. Take a look at these examples:
Subselect:
SELECT name, age, ...
FROM humans
WHERE age=(SELECT MAX(age) FROM humans);
Order by and limit:
SELECT name, age, ...
FROM humans
ORDER BY age DESC
LIMIT 1;
Left join:
SELECT name, age, ...
FROM humans h1
LEFT JOIN humans h2 ON h1.age < h2.age
WHERE h2.age IS NULL;
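This sketch compares the three patterns side by side, using Python's sqlite3 module with an in-memory SQLite database. The humans rows are made up and include a tie on purpose. The subselect and the left join return every row that shares the maximum. ORDER BY ... LIMIT 1 returns just one of them, and which one is undefined without a tie-breaking column.

```python
import sqlite3

QUERIES = {
    "subselect": "SELECT name, age FROM humans WHERE age = (SELECT MAX(age) FROM humans)",
    "order_limit": "SELECT name, age FROM humans ORDER BY age DESC LIMIT 1",
    # Keep h1 rows for which no h2 row is older.
    "left_join": """SELECT h1.name, h1.age
                    FROM humans h1
                    LEFT JOIN humans h2 ON h1.age < h2.age
                    WHERE h2.age IS NULL""",
}


def oldest(method):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE humans (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO humans VALUES (?, ?)",
                     [("Ann", 30), ("Bob", 45), ("Cid", 38), ("Dee", 45)])
    rows = sorted(conn.execute(QUERIES[method]).fetchall())
    conn.close()
    return rows


for method in QUERIES:
    print(method, oldest(method))
```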
If you want the maximum row for each group, check the answers under the greatest-n-per-group tag.
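One common greatest-n-per-group pattern joins the table to a derived table of per-group maxima. Here is a sketch with Python's sqlite3 module and an in-memory SQLite database. The team column and all rows are hypothetical. If two rows tie for the maximum within a group, that group returns both.

```python
import sqlite3


def oldest_per_team():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE humans (team TEXT, name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO humans VALUES (?, ?, ?)", [
        ("red", "Ann", 30), ("red", "Bob", 45),
        ("blue", "Cid", 38), ("blue", "Dee", 20),
    ])
    rows = conn.execute("""
        SELECT h.team, h.name, h.age
        FROM humans h
        JOIN (SELECT team, MAX(age) AS max_age      -- one maximum per group
              FROM humans
              GROUP BY team) m
          ON m.team = h.team AND m.max_age = h.age  -- keep the row(s) holding it
        ORDER BY h.team
    """).fetchall()
    conn.close()
    return rows


print(oldest_per_team())  # → [('blue', 'Cid', 38), ('red', 'Bob', 45)]
```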
You can use a correlated subquery. Your question is a bit vague; I assume that id_karir is the group and tanggal is the date.
If I understand correctly, this would apply to your query as:
SELECT kd.id_karir, kd.nama, kd.kelamin,
(YEAR(NOW())-YEAR(kd.tanggal)) usia, pf.jenis,
pf.nama AS pendidikan, pf.jurusan, kd.alamat, kd.telepon,
kd.handphone, kd.email, kd.tempat AS tempat_lahir,
kd.tanggal AS tanggal_lahir
FROM keadaan_diri kd LEFT OUTER JOIN
pendidikan_formal pf
ON kd.id_karir = pf.id_karir AND
pf.tanggal = (SELECT MAX(pf2.tanggal) FROM pendidikan_formal pf2 WHERE pf2.id_karir = pf.id_karir)
This is not an aggregation query. This is a filtering query.
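Here is a runnable sketch of this correlated-subquery join, using Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. It follows the answer's assumption that pendidikan_formal has a tanggal date column. The trimmed column lists and the rows (ids, names, dates) are made up. Because the condition sits in the ON clause of a LEFT JOIN, a person without any education rows still appears, with NULLs.

```python
import sqlite3


def latest_education():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE keadaan_diri (id_karir TEXT, nama TEXT);
        CREATE TABLE pendidikan_formal (id_karir TEXT, nama TEXT, tanggal TEXT);
        INSERT INTO keadaan_diri VALUES ('P1', 'Budi'), ('P2', 'Sari');
        INSERT INTO pendidikan_formal VALUES
            ('P1', 'SMA', '2010-06-01'),
            ('P1', 'S1',  '2014-09-01');
    """)
    rows = conn.execute("""
        SELECT kd.id_karir, kd.nama, pf.nama AS pendidikan
        FROM keadaan_diri kd
        LEFT OUTER JOIN pendidikan_formal pf
          ON kd.id_karir = pf.id_karir
         AND pf.tanggal = (SELECT MAX(pf2.tanggal)          -- latest row per person
                           FROM pendidikan_formal pf2
                           WHERE pf2.id_karir = pf.id_karir)
        ORDER BY kd.id_karir
    """).fetchall()
    conn.close()
    return rows


print(latest_education())  # → [('P1', 'Budi', 'S1'), ('P2', 'Sari', None)]
```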

SQL query: HAVING number = MAX(number) doesn't work

I have two tables, Writer and Book. A writer can produce many books. I want to get all the writers who have produced the maximum number of books.
My first SQL query looks like this:
SELECT Name FROM(
SELECT Writer.Name,COUNT(Book.ID) AS NUMBER FROM Writer,Book
WHERE
Writer.ID=Book.ID
GROUP BY Writer.Name
)
WHERE NUMBER=(SELECT MAX(NUMBER) FROM
(SELECT Writer.Name,COUNT(Book.ID) AS NUMBER FROM Writer,Book
WHERE Writer.ID=Book.ID
GROUP BY Writer.Name
)
)
It works. However, I think this query is too long and contains some duplication, so I want to make it shorter. I tried another query like this:
SELECT Name FROM(
SELECT Writer.Name,COUNT(Book.ID) AS NUMBER FROM Writer,Book
WHERE
Writer.ID=Book.ID
GROUP BY Writer.Name
HAVING NUMBER = MAX(NUMBER)
)
However, this HAVING clause doesn't work, and SQLite reports an error.
I don't know why. Can anyone explain it to me? Thank you!
The HAVING clause filters the final set (typically after a GROUP BY) and does not provide any additional grouping functionality. Think of it as a WHERE clause that is applied after the GROUP BY.
Your query with the HAVING NUMBER = MAX(NUMBER) implies grouping of the set of NUMBER values across all records and doesn't make sense in this example (even though we all get what you want it to do).
Each query gives you one level of aggregation, so you cannot apply MAX to COUNT in the same query. You need a subquery, as in your first query.
However, your first query can be simplified on MySQL to:
SELECT Writer.Name
FROM Writer, Book
WHERE Writer.ID = Book.ID
GROUP BY Writer.Name
HAVING COUNT(Book.ID) = (SELECT COUNT(Book.ID) AS n
FROM Writer, Book
WHERE Writer.ID = Book.ID
GROUP BY Writer.Name
ORDER BY n DESC
LIMIT 1)
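This sketch checks the simplified query with Python's sqlite3 module; the HAVING-with-subquery form also runs on SQLite. One assumption: Book gets a hypothetical WriterID column as the link, because joining on Writer.ID = Book.ID would pair each writer with at most one book when ID is Book's primary key. The rows are made up and include a deliberate tie, to show that every writer with the top count is returned.

```python
import sqlite3


def top_writers():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Writer (ID INTEGER PRIMARY KEY, Name TEXT);
        -- Hypothetical WriterID column links each book to its writer.
        CREATE TABLE Book (ID INTEGER PRIMARY KEY, WriterID INTEGER);
        INSERT INTO Writer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
        INSERT INTO Book (WriterID) VALUES (1), (1), (1), (2), (2), (2), (3);
    """)
    rows = conn.execute("""
        SELECT w.Name
        FROM Writer w JOIN Book b ON b.WriterID = w.ID
        GROUP BY w.Name
        HAVING COUNT(b.ID) = (SELECT COUNT(b2.ID) AS n    -- highest per-writer count
                              FROM Writer w2 JOIN Book b2 ON b2.WriterID = w2.ID
                              GROUP BY w2.Name
                              ORDER BY n DESC
                              LIMIT 1)
        ORDER BY w.Name
    """).fetchall()
    conn.close()
    return [name for (name,) in rows]


print(top_writers())  # → ['Ann', 'Bob']
```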
In MySQL (but not SQLite), you can use variables to reduce the amount of work and make a simpler query. However, there are nuances there, because variables with group by require an extra level of subqueries:
SELECT name
FROM (SELECT t.*, (@m := if(@m = 0, NUMBER, @m)) as maxn
FROM (SELECT w.Name, COUNT(b.ID) AS NUMBER
FROM Writer w JOIN
Book b
ON w.ID = b.ID
GROUP BY w.Name
) t CROSS JOIN
(SELECT @m := 0) params
ORDER BY NUMBER desc
) t
WHERE maxn = number;
It looks like you are nesting aggregate functions, which is not allowed.
HAVING NUMBER = MAX(NUMBER) is like HAVING COUNT(Book.ID) = MAX(COUNT(Book.ID)).
Nesting COUNT inside MAX seems to be the issue here.
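This sketch puts the nesting problem and the two-level fix side by side, using Python's sqlite3 module. The tables reuse a hypothetical Book.WriterID link column and made-up rows. SQLite rejects the single-level HAVING NUMBER = MAX(NUMBER) form. Moving the COUNT into a subquery gives MAX a plain column to work on. The sketch only checks that an error is raised, because the exact message text depends on the SQLite version.

```python
import sqlite3

SETUP = """
    CREATE TABLE Writer (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Book (ID INTEGER PRIMARY KEY, WriterID INTEGER);  -- hypothetical link column
    INSERT INTO Writer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Book (WriterID) VALUES (1), (1), (1), (2);
"""

COUNTS = ("SELECT w.Name, COUNT(b.ID) AS NUMBER "
          "FROM Writer w JOIN Book b ON b.WriterID = w.ID GROUP BY w.Name")

# One level: MAX would have to aggregate an aggregate, which SQLite rejects.
NESTED = COUNTS + " HAVING NUMBER = MAX(NUMBER)"
# Two levels: the inner query finishes COUNT before MAX runs.
TWO_LEVEL = "SELECT MAX(NUMBER) FROM (" + COUNTS + ")"


def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return "error: " + str(exc)
    finally:
        conn.close()


print(run(NESTED))
print(run(TWO_LEVEL))  # → [(3,)]
```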

Improving a MySQL related-articles query

For a related-topics list I use a query based on tags. It displays a list of 5 articles that have one or more tags in common with the viewed article and are older than it.
Is it possible to write a query that produces more relevant results by giving more weight to articles that have 2, 3, 4... tags in common?
I saw this topic on more or less the same subject:
MySQL Find Related Articles
but it returns 0 results when there are fewer than 3 tags in common.
The query I use now:
SELECT DISTINCT
AAmessage.message_id, AAmessage.title, AAmessage.date
FROM
AAmessage
LEFT JOIN
AAmessagetagtable
AS child ON child.message_id = AAmessage.message_id
JOIN AAmessagetagtable
AS parent ON parent.tag_id = child.tag_id
AND
parent.message_id = '$message_id'
AND AAmessage.date < '$row[date]'
ORDER BY
AAmessage.date DESC LIMIT 0,5
using tables:
AAmessage (message_id, title, date...)
AAmessagetagtable (key, message_id, tag_id)
AAtag (tag_id, tag.... not used in this query but needed to store names of tags)
First of all, please excuse me for renaming the tables to message and message_tag for readability.
Second, I didn't test this, so use it as a pointer rather than a definitive answer.
The query uses two subqueries, which might not be very efficient; there is probably room for improvement. First, the innermost query looks up the tags of the current message. Then, the middle query looks for messages that share at least one of those tags; the grouping yields each message_id once, together with its number of common tags, which is what the results are ranked by. Finally, the JOIN loads the additional details, and the date condition keeps only messages older than the current one.
You may notice that I used question marks instead of '$xyz'. These are prepared-statement placeholders, so you don't have to worry about escaping the variable contents.
SELECT message.message_id, title, date
FROM message
RIGHT JOIN (SELECT message_id, COUNT(*) AS common_tags
FROM message_tag
WHERE tag_id IN
(SELECT MT.tag_id FROM message_tag MT WHERE MT.message_id = ?)
GROUP BY message_id) RELATED_MESSAGES
ON message.message_id = RELATED_MESSAGES.message_id
WHERE date < ?
ORDER BY RELATED_MESSAGES.common_tags DESC, date DESC -- sort here; an ORDER BY inside a derived table is not guaranteed to be kept
LIMIT 5
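Here is a sketch of the same ranking idea with Python's sqlite3 module and an in-memory SQLite database. It uses the renamed message and message_tag tables with made-up rows and ? placeholders. It uses a plain JOIN instead of RIGHT JOIN: the date condition discards unmatched rows anyway, and SQLite only added RIGHT JOIN in version 3.39. It also sorts by the common-tag count in the outer query.

```python
import sqlite3

RELATED_SQL = """
    SELECT m.message_id, m.title, r.common_tags
    FROM message m
    JOIN (SELECT message_id, COUNT(*) AS common_tags    -- tags shared with the current message
          FROM message_tag
          WHERE tag_id IN (SELECT tag_id FROM message_tag WHERE message_id = ?)
          GROUP BY message_id) r
      ON r.message_id = m.message_id
    WHERE m.date < ?                                    -- only older messages
    ORDER BY r.common_tags DESC, m.date DESC
    LIMIT 5
"""


def related(current_id, current_date):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE message (message_id INTEGER PRIMARY KEY, title TEXT, date TEXT);
        CREATE TABLE message_tag (message_id INTEGER, tag_id INTEGER);
        INSERT INTO message VALUES
            (1, 'Current', '2024-03-01'), (2, 'Msg 2', '2024-02-01'),
            (3, 'Msg 3', '2024-01-15'),   (4, 'Msg 4', '2024-01-20'),
            (5, 'Msg 5', '2024-02-15'),   (6, 'Newer', '2024-04-01');
        INSERT INTO message_tag VALUES
            (1, 10), (1, 20), (1, 30),
            (2, 10), (2, 20), (2, 30),
            (3, 10),
            (4, 10), (4, 20),
            (5, 99),
            (6, 10), (6, 20), (6, 30);
    """)
    rows = conn.execute(RELATED_SQL, (current_id, current_date)).fetchall()
    conn.close()
    return rows


print(related(1, "2024-03-01"))
# → [(2, 'Msg 2', 3), (4, 'Msg 4', 2), (3, 'Msg 3', 1)]
```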
I use HAVING for situations like this.
SELECT m.message_id, m.title, m.date
FROM AAmessage AS `m`
LEFT JOIN AAmessagetagtable AS `mt` ON mt.message_id = m.message_id
WHERE m.message_id = '$message_id'
AND m.date < '$row[date]'
GROUP BY m.message_id, m.title, m.date
HAVING COUNT(mt.key) >= 1
ORDER BY m.date DESC
LIMIT 5

MySQL SELECT from multiple columns from tables

I am trying to retrieve information from a mysql database.
I have the following tables:
Qualifications(qualificationid, qualificationname, personid, status)
Address(addressid, addressline1, city, province, areacode, personid)
Score(scoreid, score, choices, personid, jobid)
I used the following MySQL statement to retrieve the data:
SELECT score.personid, qualifications.qualificationname, score.score
FROM
Qualifications, Score, Address
WHERE
score.jobid=58
AND
qualifications.qualificationName ='Human Resource Management'
AND
address.province ='Western Cape'
ORDER BY score.score
LIMIT 0,20;
This seems to work for everything else, but it doesn't restrict the province to Western Cape.
Why don't you use joins? Like so:
SELECT s.personid, q.qualificationname, s.score
FROM Score s
INNER JOIN Qualifications q ON q.personid = s.personid AND q.qualificationName ='Human Resource Management'
INNER JOIN Address a ON a.personid = s.personid AND a.province ='Western Cape'
WHERE s.jobid = 58
ORDER BY s.score DESC
LIMIT 0,20;
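This sketch reproduces the difference between the comma join without join conditions and the explicit JOINs, using Python's sqlite3 module. The trimmed-down tables and the two made-up people are assumptions for illustration: one lives in Western Cape and one in Gauteng, and both have the qualification and a score for job 58.

```python
import sqlite3

SETUP = """
    CREATE TABLE Score (scoreid INTEGER, score INTEGER, personid INTEGER, jobid INTEGER);
    CREATE TABLE Qualifications (qualificationid INTEGER, qualificationname TEXT, personid INTEGER);
    CREATE TABLE Address (addressid INTEGER, province TEXT, personid INTEGER);
    INSERT INTO Score VALUES (1, 70, 1, 58), (2, 80, 2, 58);
    INSERT INTO Qualifications VALUES (1, 'Human Resource Management', 1),
                                      (2, 'Human Resource Management', 2);
    INSERT INTO Address VALUES (1, 'Western Cape', 1), (2, 'Gauteng', 2);
"""

# No condition ties the tables together, so every row is paired with every other row.
UNRELATED = """
    SELECT Score.personid, Qualifications.qualificationname, Score.score
    FROM Qualifications, Score, Address
    WHERE Score.jobid = 58
      AND Qualifications.qualificationname = 'Human Resource Management'
      AND Address.province = 'Western Cape'
"""

# Each join is tied to the same person.
JOINED = """
    SELECT s.personid, q.qualificationname, s.score
    FROM Score s
    INNER JOIN Qualifications q ON q.personid = s.personid
        AND q.qualificationname = 'Human Resource Management'
    INNER JOIN Address a ON a.personid = s.personid AND a.province = 'Western Cape'
    WHERE s.jobid = 58
    ORDER BY s.score DESC
"""


def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


print(run(UNRELATED))  # person 2 (Gauteng) still shows up
print(run(JOINED))     # → [(1, 'Human Resource Management', 70)]
```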
You need to define the relationships. As written, the database has no idea how Address rows relate to Score or Qualifications rows, so it combines every row of each table with every row of the others. Adding AND score.personid = address.personid AND score.personid = qualifications.personid (and a GROUP BY score.personid) might fix your problem.
Also, using JOINs is probably more efficient, and it does basically the same thing.