Counting rows in referencing parents and children - mysql

I have a table that looks like this:
Categories:
cId | Name | Parent
----+-------------------------+-------
1 | Parent One | NULL
2 | Child of 1st Parent | 1
3 | Parent Two | NULL
4 | Child of 1st Parent | 1
5 | Child of 2nd Parent | 2
The table does not represent a hierarchy: every item is either a child or a parent, but not both.
And one table like this:
Posts:
pId | Name | cID
----+-------------------------+-------
1 | Post 1 | 1
2 | Post 2 | 2
3 | Post 3 | 2
4 | Post 4 | 3
I'd like to run a query on it that returns this:
cId | Count
---+---------
1 | 3
2 | 2
3 | 1
4 | 0
5 | 0
Count is the number of posts connected to the category.
All categories should be returned.
Parent categories should show their own post count plus the sum of their child categories' counts (this is one of the things I'm having a problem with).
Child categories should show only their own post count.
How should I do this?

From your expected results, it looks like you don't care about grandchildren and lower, in which case this should work. To get the correct parent count, I check for Parent IS NULL or COUNT(children) > 0, and add 1 when either holds:
SELECT c.cId,
       CASE WHEN c.Parent IS NULL OR COUNT(c2.cId) > 0 THEN 1 ELSE 0 END
       + COUNT(c2.cId) AS TotalCount
FROM Categories c
LEFT JOIN Categories c2 ON c.cId = c2.Parent
GROUP BY c.cId
Here is some sample fiddle: http://www.sqlfiddle.com/#!2/b899f/1
And the results:
CID TOTALCOUNT
1 3
2 2
3 1
4 0
5 0
---EDIT---
From reading your comments, it looks like you want something like this:
SELECT c.cId,
       COUNT(DISTINCT p.pId) + COUNT(DISTINCT p2.pId) AS TotalCount
FROM Categories c
LEFT JOIN Posts p ON c.cId = p.cID
LEFT JOIN Categories c2 ON c.cId = c2.Parent
LEFT JOIN Posts p2 ON c2.cId = p2.cID
GROUP BY c.cId
http://www.sqlfiddle.com/#!2/eb0d2/3
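As a quick check of why the DISTINCT matters, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database (SQLite via Python's sqlite3 is my stand-in for MySQL here, not something from the original thread) and runs the query with and without DISTINCT. The joins repeat a parent's own post once per child post, so the plain COUNT over-counts category 1.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Categories (cId INTEGER PRIMARY KEY, Name TEXT, Parent INTEGER);
    CREATE TABLE Posts (pId INTEGER PRIMARY KEY, Name TEXT, cID INTEGER);
    INSERT INTO Categories VALUES
        (1, 'Parent One', NULL),
        (2, 'Child of 1st Parent', 1),
        (3, 'Parent Two', NULL),
        (4, 'Child of 1st Parent', 1),
        (5, 'Child of 2nd Parent', 2);
    INSERT INTO Posts VALUES
        (1, 'Post 1', 1), (2, 'Post 2', 2), (3, 'Post 3', 2), (4, 'Post 4', 3);
""")

# Same joins as the query above: own posts (p) plus posts of direct children (p2).
joins = """
    FROM Categories c
    LEFT JOIN Posts p       ON c.cId = p.cID
    LEFT JOIN Categories c2 ON c.cId = c2.Parent
    LEFT JOIN Posts p2      ON c2.cId = p2.cID
    GROUP BY c.cId
    ORDER BY c.cId
"""

with_distinct = conn.execute(
    "SELECT c.cId, COUNT(DISTINCT p.pId) + COUNT(DISTINCT p2.pId)" + joins
).fetchall()

without_distinct = conn.execute(
    "SELECT c.cId, COUNT(p.pId) + COUNT(p2.pId)" + joins
).fetchall()

print(with_distinct)     # [(1, 3), (2, 2), (3, 1), (4, 0), (5, 0)]
print(without_distinct)  # category 1 is over-counted: (1, 5)
```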

This is a general hint. I do not know whether analytic functions and partitioning are available in your MySQL version (window functions were added in MySQL 8.0), but if they are, you can partition your output by category and then count and sum within each category. Do some research on analytic functions and the PARTITION BY clause. Here is a general example of what I mean: the output is partitioned by deptno and ordered, and the max hiredate is determined within each partition. In your case, replace MAX with COUNT, SUM, etc.:
SELECT * FROM
(
SELECT deptno
, empno
, ename
, sal
, RANK() OVER (PARTITION BY deptno ORDER BY sal desc) rnk
, ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal desc) rno
, MAX(hiredate) OVER (PARTITION BY deptno ORDER BY deptno) max_hire_date
FROM emp_test
ORDER BY deptno
)
--WHERE rnk = 1
ORDER BY deptno, sal desc
/
DEPTNO EMPNO ENAME SAL RNK RNO MAX_HIRE_DATE
--------------------------------------------------------------------
10 7839 KING 5000 1 1 1/23/1982
10 7782 CLARK 2450 2 2 1/23/1982
10 7934 MILLER 1300 3 3 1/23/1982
20 7788 SCOTT 3000 1 1 1/28/2013
20 7902 FORD 3000 1 2 1/28/2013
20 7566 JONES 2975 3 3 1/28/2013
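To connect the hint back to the Posts table, here is a small sketch of a COUNT(*) window partitioned by category. It runs on SQLite through Python's sqlite3 (an assumption made for runnability; it needs SQLite 3.25 or newer), and the column alias posts_in_category is my own name, not one from the thread.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption); needs SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Posts (pId INTEGER PRIMARY KEY, Name TEXT, cID INTEGER);
    INSERT INTO Posts VALUES
        (1, 'Post 1', 1), (2, 'Post 2', 2), (3, 'Post 3', 2), (4, 'Post 4', 3);
""")

# Every post row keeps its identity and also carries the size of its category.
rows = conn.execute("""
    SELECT pId, cID, COUNT(*) OVER (PARTITION BY cID) AS posts_in_category
    FROM Posts
    ORDER BY pId
""").fetchall()

print(rows)  # [(1, 1, 1), (2, 2, 2), (3, 2, 2), (4, 3, 1)]
```

Unlike GROUP BY, the window keeps one output row per post, so categories without posts still need the LEFT JOIN from Categories shown in the other answer.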

Related

Group all rows after nth row together

I have the current table:
+----------+-------+
| salesman | sales |
+----------+-------+
| 1 | 142 |
| 2 | 120 |
| 3 | 176 |
| 4 | 140 |
| 5 | 113 |
| 6 | 137 |
| 7 | 152 |
+----------+-------+
I would like to write a query that retrieves the top 3 salesmen, plus an "Others" row holding the sum of everyone else. The expected output would be:
+----------+-------+
| salesman | sales |
+----------+-------+
| 3 | 176 |
| 7 | 152 |
| 1 | 142 |
| Others | 510 |
+----------+-------+
I am using MySQL, and I am experienced with it, but I can't think of a way to do this kind of GROUP BY.
I tried a UNION of two SELECTs, one for the top 3 salesmen and another for the "Others", but I couldn't figure out a way to exclude the top 3 from the second SELECT.

You can do this by LEFT JOINing your table to a list of the top 3 salesmen, and then grouping on the COALESCEd salesman number from the top 3 table (which will be NULL if the salesman is not in the top 3).
SELECT COALESCE(top.sman, 'Others') AS saleman,
SUM(sales) AS sales
FROM test
LEFT JOIN (SELECT salesman AS sman
FROM test
ORDER BY sales DESC
LIMIT 3) top ON top.sman = test.salesman
GROUP BY saleman
ORDER BY saleman = 'Others', sales DESC
Output:
saleman sales
3 176
7 152
1 142
Others 510
Demo on dbfiddle
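For anyone who wants to try the LEFT JOIN plus COALESCE idea without a MySQL server, here is a minimal sketch on an in-memory SQLite database through Python's sqlite3 (the stand-in engine and the aliases top3, label and total are my assumptions, not the answer's names).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (salesman INTEGER PRIMARY KEY, sales INTEGER);
    INSERT INTO test VALUES
        (1, 142), (2, 120), (3, 176), (4, 140), (5, 113), (6, 137), (7, 152);
""")

# Salesmen outside the top 3 get NULL from the join, which COALESCE turns
# into the shared label 'Others' so that they collapse into one group.
rows = conn.execute("""
    SELECT COALESCE(top3.sman, 'Others') AS label,
           SUM(test.sales)               AS total
    FROM test
    LEFT JOIN (SELECT salesman AS sman
               FROM test
               ORDER BY sales DESC
               LIMIT 3) AS top3 ON top3.sman = test.salesman
    GROUP BY label
    ORDER BY label = 'Others', total DESC
""").fetchall()

print(rows)  # [(3, 176), (7, 152), (1, 142), ('Others', 510)]
```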

Using UNION ALL, ORDER BY, LIMIT and OFFSET should do the trick. MySQL requires parentheses around each SELECT in a UNION that carries its own ORDER BY ... LIMIT:
(SELECT salesman, sales
FROM t
ORDER BY sales DESC LIMIT 3)
UNION ALL
(SELECT 'Others', SUM(sales)
FROM (SELECT salesman, sales
FROM t
ORDER BY sales DESC LIMIT 3, 18446744073709551615) AS tt);
The big number at the end is the usual way to say "skip the first 3 rows and take everything up to the end of the table"; the MySQL manual suggests the same trick.

This is a pain in MySQL:
(select salesman, sum(sales) as total
from t
group by salesman
order by total desc, salesman
limit 3
) union all
(select 'Others', sum(t.sales)
from t left join
(select salesman, sum(sales) as total
from t
group by salesman
order by total desc, salesman
limit 3
) t3
on t3.salesman = t.salesman
where t3.salesman is null
);

This should be the fastest one if appropriate indexes are present:
(
SELECT salesman, sales
FROM t
ORDER BY sales DESC
LIMIT 3
)
UNION ALL
(
SELECT 'Other', SUM(sales) - (
SELECT SUM(sales)
FROM (
SELECT sales
FROM t
ORDER BY sales DESC
LIMIT 3
) AS top3
)
FROM t
)
ORDER BY CASE WHEN salesman = 'Other' THEN NULL ELSE sales END DESC

This will work if the IDs of the top three salesmen are already known:
select salesman, sales from tablename a where a.salesman in (3,7,1)
union all
select 'others' as others, sum(a.sales) as sum_of_others from tablename a where
a.salesman not in (3,7,1) group by others;
Check https://www.db-fiddle.com/f/73GjFXL3KsZsYnN26g3rS2/0

retrieve value of maximum occurrence in a table

I have a complicated problem. Let me first explain what I am doing right now:
I have a table named feedback in which I store grades against course IDs. The table looks like this:
+-------+-------+-------+-------+-----------+--------------
| id | cid | grade |g_point| workload | easiness
+-------+-------+-------+-------+-----------+--------------
| 1 | 10 | A+ | 1 | 5 | 4
| 2 | 10 | A+ | 1 | 2 | 4
| 3 | 10 | B | 3 | 3 | 3
| 4 | 11 | B+ | 2 | 2 | 3
| 5 | 11 | A+ | 1 | 5 | 4
| 6 | 12 | B | 3 | 3 | 3
| 7 | 11 | B+ | 2 | 7 | 8
| 8 | 11 | A+ | 1 | 1 | 2
g_point holds a fixed numeric value for each grade, so I can use it to show the user courses sorted by grade.
My first task is to print the grade of each course. The grade is the one that occurs most often for the course. For example, in this table the result for cid = 10 is A+, because it appears twice. This part is simple, and I have already implemented a query for it, which I include at the end.
The main problem is a course like cid = 11, where two different grades occur equally often. In that situation the client wants me to average the workload and easiness of each tied grade and show whichever grade has the greater average. The average is computed like this:
(average of all workload values of the grade for the course
+ average of all easiness values of the grade for the course)
/ 2
In this example cid = 11 has four entries, with an equal number of entries for each of its two grades.
B+ grade average
avgworkload = (2 + 7)/2 = 4.5
avgeasiness = (3 + 8)/2 = 5.5
answer = (4.5 + 5.5)/2 = 5
A+ grade average
avgworkload = (5 + 1)/2 = 3
avgeasiness = (4 + 2)/2 = 3
answer = (3 + 3)/2 = 3
so the grade should be B+.
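The tie-break arithmetic can be checked with a few lines of Python. This is only a sketch under the reading that the score is (average workload + average easiness) / 2; the helper name grade_score is my own.

```python
from statistics import mean

# (grade, workload, easiness) rows for cid = 11, copied from the table above.
rows_cid_11 = [("B+", 2, 3), ("A+", 5, 4), ("B+", 7, 8), ("A+", 1, 2)]

def grade_score(rows, grade):
    """Return (average workload + average easiness) / 2 for one grade."""
    workloads = [w for g, w, _ in rows if g == grade]
    easiness = [e for g, _, e in rows if g == grade]
    return (mean(workloads) + mean(easiness)) / 2

scores = {grade: grade_score(rows_cid_11, grade) for grade in ("B+", "A+")}
winner = max(scores, key=scores.get)

print(scores)  # {'B+': 5.0, 'A+': 3.0}
print(winner)  # B+
```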
This is the query which I am running to get the max occurrence grade
SELECT
f3.coursecodeID cid,
f3.grade_point p,
f3.grade g
FROM (
SELECT
coursecodeID,
MAX(mode_qty) mode_qty
FROM (
SELECT
coursecodeID,
COUNT(grade_point) mode_qty
FROM feedback
GROUP BY
coursecodeID, grade_point
) f1
GROUP BY coursecodeID
) f2
INNER JOIN (
SELECT
coursecodeID,
grade_point,
grade,
COUNT(grade_point) mode_qty
FROM feedback
GROUP BY
coursecodeID, grade_point
) f3
ON
f2.coursecodeID = f3.coursecodeID AND
f2.mode_qty = f3.mode_qty
GROUP BY f3.coursecodeID
ORDER BY f3.grade_point

Here is SQL Fiddle.
I added a table Courses with the list of all course IDs, to make the main idea of the query easier to see. Most likely you have it in the real database. If not, you can generate it on the fly from feedback by grouping by cid.
For each cid we need to find the grade. Group feedback by cid, grade to get a list of all grades for the cid. We need to pick only one grade per cid, so we use LIMIT 1. To determine which grade to pick, we order them: first by occurrence (a simple COUNT), second by the average score. Finally, if several grades have the same occurrence and the same average score, pick the grade with the smallest g_point. You can adjust the rules by tweaking the ORDER BY clause.
SELECT
courses.cid
,(
SELECT feedback.grade
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS CourseGrade
FROM courses
ORDER BY courses.cid
result set
cid CourseGrade
10 A+
11 B+
12 B
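The ordering rule can also be cross-checked outside the database. The sketch below re-implements it in plain Python on the question's rows; the variable names (groups, candidates, course_grade) are mine, and the tuple sort key is just one way to mirror the ORDER BY.

```python
from collections import defaultdict
from statistics import mean

# (cid, grade, g_point, workload, easiness), copied from the feedback table.
feedback = [
    (10, "A+", 1, 5, 4), (10, "A+", 1, 2, 4), (10, "B", 3, 3, 3),
    (11, "B+", 2, 2, 3), (11, "A+", 1, 5, 4), (12, "B", 3, 3, 3),
    (11, "B+", 2, 7, 8), (11, "A+", 1, 1, 2),
]

# Equivalent of GROUP BY cid, grade (g_point is fixed per grade).
groups = defaultdict(list)
for cid, grade, g_point, workload, easiness in feedback:
    groups[(cid, grade, g_point)].append((workload, easiness))

candidates = defaultdict(list)
for (cid, grade, g_point), rows in groups.items():
    score = (mean([w for w, _ in rows]) + mean([e for _, e in rows])) / 2
    # Key mirrors: COUNT(*) DESC, average score DESC, g_point ASC.
    candidates[cid].append((-len(rows), -score, g_point, grade))

# min() on the key tuples plays the role of ORDER BY ... LIMIT 1.
course_grade = {cid: min(keys)[3] for cid, keys in sorted(candidates.items())}

print(course_grade)  # {10: 'A+', 11: 'B+', 12: 'B'}
```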
UPDATE
MySQL doesn't have lateral joins (they were only added in MySQL 8.0.14), so one possible way to get the second column g_point is to repeat the correlated sub-query. SQL Fiddle
SELECT
courses.cid
,(
SELECT feedback.grade
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS CourseGrade
,(
SELECT feedback.g_point
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS CourseGPoint
FROM courses
ORDER BY CourseGPoint
result set
cid CourseGrade CourseGPoint
10 A+ 1
11 B+ 2
12 B 3
Update 2 Added average score into ORDER BY SQL Fiddle
SELECT
courses.cid
,(
SELECT feedback.grade
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS CourseGrade
,(
SELECT feedback.g_point
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS CourseGPoint
,(
SELECT (AVG(workload) + AVG(easiness))/2
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS AvgScore
FROM courses
ORDER BY CourseGPoint, AvgScore DESC
result
cid CourseGrade CourseGPoint AvgScore
10 A+ 1 3.75
11 B+ 2 5
12 B 3 3

If I understood correctly, you need an inner select to compute the averages and an outer select to find the maximum of those averages:
select cid, grade, max(average)/2 from (
select cid, grade, avg(workload + easiness) as average
from feedback
group by cid, grade
) x group by cid, grade
This solution has been tested on your data using SQL Fiddle at this link
If you change the previous query to
select cid, max(average)/2 from (
select cid, grade, avg(workload + easiness) as average
from feedback
group by cid, grade
) x group by cid
You will find the max average for each cid.
As mentioned in the comments, you have to choose which strategy to use when more than one grade meets the max average. For example, if you have
+-------+-------+-------+-------+-----------+--------------
| id | cid | grade |g_point| workload | easiness
+-------+-------+-------+-------+-----------+--------------
| 1 | 10 | A+ | 1 | 5 | 4
| 2 | 10 | A+ | 1 | 2 | 4
| 3 | 10 | B | 3 | 3 | 3
| 4 | 11 | B+ | 2 | 2 | 3
| 5 | 11 | A+ | 1 | 5 | 4
| 9 | 11 | C | 1 | 3 | 6
You will have grades A+ and C both satisfying the maximum average of 4.5 for cid = 11.

Retrieve data from a complex table

When searching a date range, with a start date and an end date applied to date_reg, the MySQL result should contain, for each main_table row, the latest return, received, and balance values of each product.
E.g.: between 10-01-2014 and 10-05-2014, the query should retrieve the values of each product within those dates:
Client Id | Return | Received | Balance
| prod 1 prod 2 | prod 1 prod 2 | prod 1 prod 2
--------------------------------------------------------------
1 | 2 [3] 2 [7] | 5 5 | 8 5
2 | 1 [5] 0 [8] | 5 5 | 9 3
3 | 0 [6] 1 [10]| 5 5 | 7 6
[id], where id is the primary key of sub_table
I have tried this MySQL query:
SELECT p.product_name, ipd.id as ipd_id, i.id as i_id, ipd.*, i.*
FROM main_table i
LEFT JOIN sub_table ipd ON ipd.main_table_id=i.id AND ipd.product_id IN (1,2)
LEFT JOIN product p ON ipd.product_id=p.id
WHERE ipd.date_reg IN (SELECT MAX(ipd1.date_reg)
FROM sub_table ipd1
WHERE ipd1.main_table_id=i.id AND
date_reg BETWEEN '10-01-2014' AND '10-05-2014')
ORDER BY cl.id ASC LIMIT 0, 20
It returns only a single product's return, received, and balance values for each client.

When you use the subquery WHERE ipd.date_reg IN (SELECT MAX...), you're only going to get 1 entry based on your data: 10-04-2014. That is working correctly.
Try using GROUP BY in the subquery.
GROUP_CONCAT(expr) also helps with many-to-many information: it concatenates the column values of a group into a single string.

I got the output. Thanks, everyone, for the help.
I used GROUP_CONCAT to concatenate the results into one comma-separated string:
SELECT p.product_name, ipd.id as ipd_id, i.id as i_id, ipd.*, i.*,
GROUP_CONCAT(product_id SEPARATOR ',') as group_product_id,
GROUP_CONCAT(ipd.return SEPARATOR ',') as group_return,
GROUP_CONCAT(ipd.received SEPARATOR ',') as group_received,
GROUP_CONCAT(ipd.balance SEPARATOR ',') as group_balance
FROM main_table i
LEFT JOIN sub_table ipd ON ipd.main_table_id=i.id AND ipd.product_id IN (1,2)
LEFT JOIN product p ON ipd.product_id=p.id
WHERE ipd.date_reg IN (SELECT MAX(ipd1.date_reg)
FROM sub_table ipd1
WHERE ipd1.main_table_id=i.id AND
date_reg BETWEEN '10-01-2014' AND '10-05-2014'
GROUP BY ipd1.product_id)
ORDER BY cl.id ASC LIMIT 0, 20
The Result
Client Id | group_product_id | group_return | group_received | group_balance
--------------------------------------------------------------------------
1 | 1, 2 | 2, 2 | 5,5 | 8,5
2 | 1, 2 | 1, 0 | 5,5 | 9,3
3 | 1, 2 | 0, 1 | 5,5 | 7,6
Then the strings can be exploded into an array.
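A minimal sketch of that last step in Python, assuming every GROUP_CONCAT column lists its values in the same product order (adding the same ORDER BY inside each GROUP_CONCAT call would make that explicit). The row tuple is copied from the first result line; per_product is a hypothetical name.

```python
# One result row: (client_id, group_product_id, group_return,
#                  group_received, group_balance)
row = (1, "1,2", "2,2", "5,5", "8,5")

client_id, *groups = row

# Split each comma-separated string into a list of integers.
product_ids, returns, received, balance = (
    [int(value) for value in group.split(",")] for group in groups
)

# Re-assemble the parallel lists into one record per product.
per_product = {
    pid: {"return": ret, "received": rec, "balance": bal}
    for pid, ret, rec, bal in zip(product_ids, returns, received, balance)
}

print(per_product)
# {1: {'return': 2, 'received': 5, 'balance': 8},
#  2: {'return': 2, 'received': 5, 'balance': 5}}
```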

Query to find the duplicates between the name and number in table

SELECT count(*), lower(name), number
FROM tbl
GROUP BY lower(name), number
HAVING count(*) > 1;
input tb1
slno name number
1 aaa 111
2 Aaa 111
3 abb 221
4 Abb 121
5 cca 131
6 cca 141
7 abc 222
8 cse 222
This query only finds rows where both the name and the number are the same, so it won't find the duplicate name in the 3rd and 4th rows.
SELECT count(*), lower(name)
FROM tbl
GROUP BY lower(name)
HAVING count(lower(name)) > 1
This query finds all the duplicates in name; it works perfectly.
SELECT count(*), number
FROM tbl
GROUP BY number
HAVING count(number) > 1
This query finds all the duplicates in number; it works perfectly.
I want a query that finds all the duplicates in both name and number, regardless of whether the name is in lower case or upper case.
output
count number name
2 111 aaa
2 --- abb
2 --- cca
2 222 ---

Updated question
"Get duplicate on both number and name" ... "name and number as different column"
Rows can be counted twice here!
SELECT lower(name), NULL AS number, count(*) AS ct
FROM tbl
GROUP BY lower(name)
HAVING count(*) > 1
UNION ALL
SELECT NULL, number, count(*) AS ct
FROM tbl
GROUP BY number
HAVING count(*) > 1;
-> sqlfiddle
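Here is a small runnable sketch of the UNION ALL approach against the question's sample rows, using an in-memory SQLite database through Python's sqlite3 as a stand-in for MySQL (my assumption). The aliases dup_name and dup_number are hypothetical names chosen to avoid clashing with the table's own columns.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (slno INTEGER PRIMARY KEY, name TEXT, number INTEGER);
    INSERT INTO tbl VALUES
        (1, 'aaa', 111), (2, 'Aaa', 111), (3, 'abb', 221), (4, 'Abb', 121),
        (5, 'cca', 131), (6, 'cca', 141), (7, 'abc', 222), (8, 'cse', 222);
""")

# First branch: names duplicated ignoring case.
# Second branch: numbers duplicated.
rows = conn.execute("""
    SELECT lower(name) AS dup_name, NULL AS dup_number, count(*) AS ct
    FROM tbl
    GROUP BY lower(name)
    HAVING count(*) > 1
    UNION ALL
    SELECT NULL, number, count(*)
    FROM tbl
    GROUP BY number
    HAVING count(*) > 1
""").fetchall()

for row in rows:
    print(row)
```

Rows 1 and 2 contribute to both branches (name aaa and number 111), which is what "rows can be counted twice" refers to.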
Original question
The problem is that the query groups by
GROUP BY lower(name), number
As row 3 and 4 have a different number, they are not the same for this query.
If you want to ignore different numbers for this query, try something like:
SELECT lower(name)
, count(*) AS ct
FROM tbl
GROUP BY lower(name)
HAVING count(*) > 1;

With a little work we can show counts for both name and number in one column:
select NameOrNumber, count(*) as Count
from (
select name as NameOrNumber from tb1
union all
select number from tb1
) a
group by NameOrNumber
having count(NameOrNumber) > 1
SQL Fiddle Example #1
Output #1:
| NAMEORNUMBER | COUNT |
------------------------
| 111 | 2 |
| aaa | 2 |
| abb | 2 |
| cca | 2 |
If you want the output in separate columns, you can do something like this:
select distinct if(t1.name = t2.name, t1.name, null) as DUPLICATE_Name,
if(t1.number = t2.number, t1.number, null) as DUPLICATE_Number
from tb1 t1
inner join tb1 t2 on (t1.name = t2.name or t1.number = t2.number)
and t1.slno <> t2.slno
SQL Fiddle Example #2
Output #2:
| DUPLICATE_NAME | DUPLICATE_NUMBER |
-------------------------------------
| Aaa | 111 |
| Abb | (null) |
| cca | (null) |

Finding the MIN value that appears for each unique value in either of two other columns

Given the following (simplified) tables:
People p
id name registered
-----------------------------------
1 Geoff 2011-03-29 12:09:08
2 Phil 2011-04-29 09:03:54
3 Tony 2011-05-29 21:22:23
4 Gary 2011-06-21 22:56:08
...
Items i
date p1id p2id
----------------------------------------
2011-06-29 20:09:44 1 2
2011-06-26 10:45:00 1 3
2011-06-23 12:22:43 2 3
2011-06-22 13:07:12 2 4
...
I'd like:
The earliest single i.date that each p.id appears in either column p1id or p2id; or p.registered if they feature in neither.
So far, I've tried:
CREATE TEMPORARY TABLE temp (id INT);
INSERT INTO temp (id)
SELECT DISTINCT u FROM (
SELECT p1id AS u FROM Items UNION ALL
SELECT p2id AS u FROM Items
)tt;
SELECT registered,id FROM People
WHERE id NOT IN (SELECT id FROM temp);
Which gets me as far as the second part, albeit in a fairly clumsy way; and I'm stuck on the first part beyond some sort of external, scripted iteration through all the values of p.id (ugh).
Can anyone help?
I'm on MySQL 5.1 and there's ~20k people and ~100k items.

One more solution:
SELECT id, name, IF(min_date1 IS NULL AND min_date2 IS NULL, registered, LEAST(COALESCE(min_date1, min_date2), COALESCE(min_date2, min_date1))) date FROM (
SELECT p.id, p.name, p.registered, MIN(i1.date) min_date1, MIN(i2.date) min_date2 FROM people p
LEFT JOIN items i1
ON p.id = i1.p1id
LEFT JOIN items i2
ON p.id = i2.p2id
GROUP BY id
) t;
OR this:
SELECT p.id, p.name, COALESCE(MIN(i.date), p.registered) FROM people p
LEFT JOIN (
SELECT p1id id, date FROM items
UNION ALL
SELECT p2id id, date FROM items
) i
ON p.id = i.id
GROUP BY p.id;
Result:
+------+-------+---------------------+
| id | name | date |
+------+-------+---------------------+
| 1 | Geoff | 2011-06-26 10:45:00 |
| 2 | Phil | 2011-06-22 13:07:12 |
| 3 | Tony | 2011-06-23 12:22:43 |
| 4 | Gary | 2011-06-22 13:07:12 |
+------+-------+---------------------+
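To see the fallback to registered in action, here is a sketch of the UNION ALL variant on an in-memory SQLite database through Python's sqlite3 (a stand-in for MySQL, by assumption). Person 5, "Eve", is a hypothetical row I added because everyone in the sample data appears in at least one item; first_seen is also my own alias.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, registered TEXT);
    CREATE TABLE items (date TEXT, p1id INTEGER, p2id INTEGER);
    INSERT INTO people VALUES
        (1, 'Geoff', '2011-03-29 12:09:08'),
        (2, 'Phil',  '2011-04-29 09:03:54'),
        (3, 'Tony',  '2011-05-29 21:22:23'),
        (4, 'Gary',  '2011-06-21 22:56:08'),
        (5, 'Eve',   '2011-07-01 08:00:00');  -- hypothetical: appears in no item
    INSERT INTO items VALUES
        ('2011-06-29 20:09:44', 1, 2),
        ('2011-06-26 10:45:00', 1, 3),
        ('2011-06-23 12:22:43', 2, 3),
        ('2011-06-22 13:07:12', 2, 4);
""")

# Stack p1id and p2id into one id column, then take the earliest date per
# person; people with no items get NULL from MIN and fall back to registered.
rows = conn.execute("""
    SELECT p.id, p.name, COALESCE(MIN(i.date), p.registered) AS first_seen
    FROM people p
    LEFT JOIN (
        SELECT p1id AS id, date FROM items
        UNION ALL
        SELECT p2id AS id, date FROM items
    ) i ON p.id = i.id
    GROUP BY p.id, p.name, p.registered
    ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```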
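
This is tested in Postgres, but I think it ought to work in MySQL with few or no changes. The outer join is a LEFT JOIN so that people who appear in no item are kept and fall back to p.registered:
SELECT p.id,COALESCE(MIN(x.date),p.registered) AS date
FROM p
LEFT JOIN (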
SELECT p.id,MIN(i.date) AS date
FROM p
JOIN i ON (p.id=i.p1id)
GROUP BY p.id
UNION
SELECT p.id,MIN(i.date) AS date
FROM p
JOIN i ON (p.id=i.p2id)
GROUP BY p.id
) AS x ON x.id = p.id
GROUP BY p.id,p.registered;
Output (given your sample data):
id | date
----+---------------------
3 | 2011-06-23 12:22:43
1 | 2011-06-26 10:45:00
2 | 2011-06-22 13:07:12
4 | 2011-06-22 13:07:12
(4 rows)