segmentation and count using group by - mysql

I have a table like
uid programid
1 3
1 4
2 5
2 6
3 3
...
but imagine that on one million line, what I would like to get is something like
Is this possible doing that using mysql ? Percentage is not that important but I would really appreciate to get the 'cluster' part.
Thanks for your help :)

You can pre-compute data using table expressions as shown below. If using MySQL 8.x you can use CTEs (that are friendlier to use). For example:
select
favorites,
users,
case when users = 0 then 0 else mod(users - 1, 5) end as cluster
from (
select
favorites,
count(*) as users
from (
select uid, count(*) as favorites
from t
group by uid
) x
group by favorites
) y
order by favorites

Related

Possible to count number of occurrences in a "group" in MySQL?

Sorry if the title is misleading, I don't really know the terminology for what I want to accomplish. But let's consider this table:
CREATE TABLE entries (
id INT NOT NULL,
number INT NOT NULL
);
Let's say it contains four numbers associated with each id, like this:
id number
1 0
1 9
1 17
1 11
2 5
2 8
2 9
2 0
.
.
.
Is it possible, with a SQL-query only, to count the numbers of matches for any two given numbers (tuples) associated with a id?
Let's say I want to count the number of occurrences of number 0 and 9 that is associated with a unique id. In the sample data above 0 and 9 does occur two times (one time where id=1 and one time where id=2). I can't think of how to write a SQL-query that solves this. Is it possible? Maybe my table structure is wrong, but that's how my data is organized right now.
I have tried sub-queries, unions, joins and everything else, but haven't found a way yet.
You can use GROUP BY and HAVING clauses:
SELECT COUNT(s.id)
FROM(
SELECT t.id
FROM YourTable t
WHERE t.number in(0,9)
GROUP BY t.id
HAVING COUNT(distinct t.number) = 2) s
Or with EXISTS():
SELECT COUNT(distinct t.id)
FROM YourTable t
WHERE EXISTS(SELECT 1 FROM YourTable s
WHERE t.id = s.id and s.id IN(0,9)
HAVING COUNT(distinct s.number) = 2)

Select redundant rows only, not the original

So I'm tasked with cleaning up a system that has generated redundant orders.
Data example of the problem
ORDER ID, SERIAL, ...
1 1
2 1
3 2
4 2
5 3
6 3
7 3
The above data shows that 2 orders were generated with serial 1, 2 orders with serial 2, and 3 orders with serial 3. This is not allowed, and there should be only one order per serial.
So I need a query that can identify the REDUNDANT orders ONLY. I'd like the query to exclude the original order.
So the output from the above data should be:
REDUNDANT ORDER IDS
2
4
6
7
I can easily identify which orders have duplicates using GROUP BY and HAVING COUNT(*) > 1 but the tricky part comes with removing the original.
Is it even possible?
Any help is greatly appreciated.
As posted in the comments, here's one way to achieve this:
SELECT T1.ORDER_ID as redundant
FROM thetable T1
LEFT JOIN
(
SELECT SERIAL, MIN(ORDER_ID) AS firstorder
FROM thetable
GROUP BY SERIAL
HAVING COUNT(*) > 1
) T2 ON T1.ORDER_ID=T2.firstorder
WHERE T2.firstorder IS NULL
SQL Fiddle

Adding Row Values when there are no results - MySQL

Problem Statement: I need my result set to include records that would not naturally return because they are NULL.
I'm going to put some simplified code here since my code seems to be too long.
Table Scores has Company_type, Company, Score, Project_ID
Select Score, Count(Project_ID)
FROM Scores
WHERE company_type= :company_type
GROUP BY Score
Results in the following:
Score Projects
5 95
4 94
3 215
2 51
1 155
Everything is working fine until I apply a condition to company_type that does not include results in one of the 5 score categories. When this happens, I don't have 5 rows in my result set any more.
It displays like this:
Score Projects
5 5
3 6
1 3
I'd like it to display like this:
Score Projects
5 5
4 0
3 6
2 0
1 3
I need the results to always display 5 rows. (Scores = 1-5)
I tried one of the approaches below by Spencer7593. My simplified query now looks like this:
SELECT i.score AS Score, IFNULL(count(*), 0) AS Projects
FROM (SELECT 5 AS score
UNION ALL
SELECT 4
UNION ALL
SELECT 3
UNION ALL
SELECT 2
UNION ALL
SELECT 1) i
LEFT JOIN Scores ON Scores.score = i.score
GROUP BY Score
ORDER BY i.score DESC
And gives the following results, which is accurate except that the rows with 1 in Projects should actually be 0 because they are derived by the "i". There are no projects with a score of 5 or 2.
Score Projects
5 1
4 5
3 6
2 1
1 3
Solved! I just needed to adjust my count to specifically look at the project count - count(project) rather than count(*). This returned the expected results.
If you always want your query to return 5 rows, with Score values of 5,4,3,2,1... you'll need a rowsource that supplies those Score values.
One approach would be to use a simple query to return those fixed values, e.g.
SELECT 5 AS score
UNION ALL SELECT 4
UNION ALL SELECT 3
UNION ALL SELECT 2
UNION ALL SELECT 1
Then use that query as inline view, and do an outer join operation to the results from your current query
SELECT i.score AS `Score`
, IFNULL(q.projects,0) AS `Projects`
FROM ( SELECT 5 AS score
UNION ALL SELECT 4
UNION ALL SELECT 3
UNION ALL SELECT 2
UNION ALL SELECT 1
) i
LEFT
JOIN (
-- the current query with "missing" Score rows goes here
-- for completeness of this example, without the query
-- we emulate that result with a different query
SELECT 5 AS score, 95 AS projects
UNION ALL SELECT 3, 215
UNION ALL SELECT 1, 155
) q
ON q.score = i.score
ORDER BY i.score DESC
It doesn't have to be the view query in this example. But there does need to be a rowsource that the rows can be returned from. You could, for example, have a simple table that contains those five rows, with those five score values.
This is just an example approach for the general approach. It might be possible to modify your existing query to return the rows you want. But without seeing the query, the schema, and example data, we can't tell.
FOLLOWUP
Based on the edit to the question, showing an example of the current query.
If we are guaranteed that the five values of Score will always appear in the Scores table, we could do conditional aggregation, writing a query like this:
SELECT s.score
, COUNT(IF(s.company_type = :company_type,s.project_id,NULL)) AS projects
FROM Scores s
GROUP BY s.score
ORDER BY s.score DESC
Note that this will require a scan of all the rows, so it may not perform as well. The "trick" is the IF function, which returns a NULL value in place of project_id, when the row would have been excluded by the WHERE clause.)
If we are guaranteed that project_id is non-NULL, we could use a more terse MySQL shorthand expression to achieve an equivalent result...
, IFNULL(SUM(s.company_type = :company_type),0) AS projects
This works because MySQL returns 1 when the comparison is TRUE, and otherwisee returns 0 or NULL.
Try something like this:
select distinct score
from (
select distinct score from scores
) s
left outer join (
Select Score, Count(Project_ID) cnt
FROM Scores
WHERE company_type= :company_type
) x
on s.score = x.score
Your posted query would not work without a group by statement. However, even there, if you don't have those particular scores for that company type, it wouldn't work either.
One option is to use an outer join. That would require a little more work though.
Here's another option using conditional aggregation:
select Score, sum(company_type=:company_type)
from Scores
group by Score

Count 2 columns on different conditions

I have this dataset:
id uid follows_uid status
1 1 2 ACTIVE
2 1 3 ACTIVE
3 3 1 ACTIVE
4 4 1 ACTIVE
5 2 1 ACTIVE
on giving uid I want to calculate how many users are following, and how many are followed by (the given user).
Result set will be:
following followers
2 3
and here is the query which does the work:
SELECT COUNT(*) as following,
(SELECT COUNT(*) FROM user_followers where follows_uid = 1 ) as followers
FROM user_followers
WHERE uid = 1 and `status` = 'ACTIVE'
Now the question is, In't there any other way to get this done? Or is it the best way to achieve this?
If you have separate indexes on uid and follows_uid, then I believe using subqueries as you did is the fastest way to retrieve the separate counts because each query will take advantage of an index to retrieve the count.
Here's another way of achieving it.
select following.*, followers.* from
(select count(uid) from user_followers where uid = 1) following,
(select count(follows_uid) from user_followers where follows_uid = 1) followers;
And, to answer your question, your subquery approach is, in fact, the best way to achieve it. As pointed out by #FuzzyTree, you could use indexes to optimise your performance.
SELECT
IFNULL(SUM(IF(uid = 1, 1, 0)), 0) as following,
IFNULL(SUM(IF(follows_uid = 1, 1, 0)), 0) as followers
FROM user_followers
WHERE (uid = 1 OR follows_uid = 1)
AND `status` = 'ACTIVE';
Click here to see SQL Fiddle

Elegant mysql to select, group, combine multiple rows from one table

Here is a simplified version of my table:
group price spec
a 1 .
a 2 ..
b 1 ...
b 2
c .
. .
. .
I'd like to produce a result like this: (I'll refer to this as result_table)
price_a |spec_a |price_b |spec_b |price_c ...|total_cost
1 |. |1 |.. |... |
(min) (min) =1+1+...
Basically I want to:
select the rows containing the min price within each group
combine columns into a single row
I know this can be done using several queries and/or combined with some non-sql processing on the results, but I suspect that there maybe better solutions.
The reason that I want to do task 2 (combine columns into a single row)
is because I want to do something like the following with the result_table:
select *,
(result_table.total_cost + table1.price + table.2.price) as total_combined_cost
from result_table
right join table1
right join table2
This may be too much to ask for, so here is some other thoughts on the problem:
Instead of trying to combine multiple rows(task 2), store them in a temporary table
(which would be easier to calculate the total_cost using sum)
Feel free to drop any thoughts, don't have to be complete answer, I feel it's brilliant enough if you have an elegant way to do task 1 !
==Edited/Added 6 Feb 2012==
The goal of my program is to identify best combinations of items with minimal cost (and preferably possess higher utilitarian value at the same time).
Consider #ypercube's comment about large number of groups, temporary table seems to be the only feasible solution. And it is also pointed out there is no pivoting function in MySQL (although it can be implemented, it's not necessary to perform such operation).
Okay, after study #Johan's answer, I'm thinking about something like this for task 1:
select * from
(
select * from
result_table
order by price asc
) as ordered_table
group by group
;
Although looks dodgy, it seems to work.
==Edited/Added 7 Feb 2012==
Since there could be more than one combination may produce the same min value, I have modified my answer :
select result_table.* from
(
select * from
(
select * from
result_table
order by price asc
) as ordered_table
group by group
) as single_min_table
inner join result_table
on result_table.group = single_min_table.group
and result_table.price = single_min_table.price
;
However, I have just realised that there is another problem I need to deal with:
I can not ignore all the spec, since there is a provider property, items from different providers may or may not be able to be assembled together, so to be safe (and to simplify my problem) I decide to combine items from the same provider only, so the problem becomes:
For example if I have an initial table like this(with only 2 groups and 2 providers):
id group price spec provider
1 a 1 . x
2 a 2 .. y
3 a 3 ... y
4 b 1 ... y
5 b 2 x
6 b 3 z
I need to combine
id group price spec provider
1 a 1 . x
5 b 2 x
and
2 a 2 .. y
4 b 1 ... y
record (id 6) can be eliminated from the choices since it dose not have all the groups available.
So it's not necessarily to select only the min of each group, rather it's to select one from each group so that for each provider I have a minimal combined cost.
You cannot pivot in MySQL, but you can group results together.
The GROUP_CONCAT function will give you a result like this:
column A column B column c column d
groups specs prices sum(price)
a,b,c some,list,xyz 1,5,7 13
Here's a sample query:
(The query assumes you have a primary (or unique) key called id defined on the target table).
SELECT
GROUP_CONCAT(a.`group`) as groups
,GROUP_CONCAT(a.spec) as specs
,GROUP_CONCAT(a.min_price) as prices
,SUM(a.min_prices) as total_of_min_prices
FROM
( SELECT price, spec, `group` FROM table1
WHERE id IN
(SELECT MIN(id) as id FROM table1 GROUP BY `group` HAVING price = MIN(price))
) AS a
See: http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
Producing the total_cost only:
SELECT SUM(min_price) AS total_cost
FROM
( SELECT MIN(price) AS min_price
FROM TableX
GROUP BY `group`
) AS grp
If a result set with the minimum prices returned in row (not in column) per group is fine, then your problem is of the gretaest-n-per-group type. There are various methods to solve it. Here's one:
SELECT tg.grp
tm.price AS min_price
tm.spec
FROM
( SELECT DISTINCT `group` AS grp
FROM TableX
) AS tg
JOIN
TableX AS tm
ON
tm.PK = --- the Primary Key of the table
( SELECT tmin.PK
FROM TableX AS tmin
WHERE tmin.`group` = tg.grp
ORDER BY tmin.price ASC
LIMIT 1
)