Count number of distinct rows for multiple values - mysql

Let's consider this table specifying how many times a person bought a property.
+--------+----------+
| user   | property |
+--------+----------+
| john   | car      |
| john   | car      |
| john   | house    |
| peter  | car      |
| peter  | car      |
| amanda | house    |
| amanda | house    |
+--------+----------+
I need to know how many times a car was bought once, how many times a house was bought once, etc. Something like this:
+----------+---+---+
| property | 1 | 2 |
+----------+---+---+
| car      | 4 | 2 |
| house    | 3 | 1 |
+----------+---+---+
How many times a car was bought? Four, two for peter and two for john.
How many times a car was bought twice? Two, for the same guys.
How many times a house was bought? Three, two for amanda and once for john.
How many times a house was bought twice? Only once, for amanda
Is it possible to do this using only SQL queries?
I don't care about performance or hackish ways.
There are more than two frequencies.
A person can buy a property only a fixed number of times (5), so it's not a problem to specify the columns manually in the query. I mean there's no problem doing something like:
SELECT /* ... */ AS 1, /* ... */ AS 2, /* ... */ AS 3 /* ... */

SELECT DISTINCT @pr := prop,
(SELECT COUNT(1) FROM tbl WHERE prop = @pr LIMIT 1),
(SELECT COUNT(1) FROM
(SELECT *, COUNT(*) cnt
FROM tbl
GROUP BY usr, prop
HAVING cnt = 2) as tmp
WHERE `tmp`.prop = @pr LIMIT 1)
FROM tbl;
Yes, it is not the best method; but hey, you get the answers as desired.
Also, it'll generate the results for any kind of property in your table.
P.S.: 60 tries O_O

I am here since you posted the question. Good one...
Here is a way to do it exactly as you asked for, with just groups and counts.
The trick is that I concatenate the user and property columns to produce a unique "id" for each, if we could call it that. It should work independently of the count of purchases.
SELECT C.`property`, COUNT(C.`property`), D.`pcount` from `purchases` C
LEFT JOIN(
SELECT A.`property`, B.`pcount` FROM `purchases` A
LEFT JOIN (
SELECT `property`,
CONCAT(`user`, `property`) as conc,
COUNT(CONCAT(`user`, `property`)) as pcount
FROM `purchases` GROUP BY CONCAT(`user`, `property`)
) B
ON A.`property` = B.`property`
GROUP BY B.pcount
) D
ON C.`property` = D.`property`
GROUP BY C.`property`

SQL Fiddle
MySQL 5.5.30 Schema Setup:
CREATE TABLE Table1
(`user` varchar(6), `property` varchar(5))
;
INSERT INTO Table1
(`user`, `property`)
VALUES
('john', 'car'),
('john', 'car'),
('john', 'house'),
('peter', 'car'),
('peter', 'car'),
('amanda', 'house'),
('amanda', 'house')
;
Query 1:
select t.property, t.total, c1.cnt as c1, c2.cnt as c2, c3.cnt as c3
from
(select
t.property ,
count(t.property) as total
from Table1 t
group by t.property
) as t
left join (
select property, count(*) as cnt
from (
select
property, user, count(*) as cnt
from Table1
group by property, user
having count(*) = 1
) as i1
group by property
) as c1 on t.property = c1.property
left join (
select property, count(*) as cnt
from (
select
property, user, count(*) as cnt
from Table1
group by property, user
having count(*) = 2
) as i2
group by property
) as c2 on t.property = c2.property
left join (
select property, count(*) as cnt
from (
select
property, user, count(*) as cnt
from Table1
group by property, user
having count(*) = 3
) as i3
group by property
) as c3 on t.property = c3.property
Results:
| PROPERTY | TOTAL | C1 | C2 | C3 |
-------------------------------------------
| car | 4 | (null) | 2 | (null) |
| house | 3 | 1 | 1 | (null) |

You may try the following.
SELECT COUNT(TABLE1.PROPERTY) AS COUNT, PROPERTY.USER FROM TABLE1
INNER JOIN (SELECT DISTINCT PROPERTY, USER FROM TABLE1) AS PROPERTY
ON PROPERTY.PROPERTY = TABLE1.PROPERTY
AND PROPERTY.USER = TABLE1.USER
GROUP BY TABLE1.USER, PROPERTY.PROPERTY
I tested a similar query in MySQL.

Try this:
SELECT property , count(property) as bought_total , count(distinct(user)) bought_per_user
FROM Table1
GROUP BY property
The output will look like this:
PROPERTY | BOUGHT_TOTAL | BOUGHT_PER_USER
---------+--------------+----------------
car      | 4            | 2
house    | 3            | 2

You should be able to do this with sub-selects.
SELECT property, user, COUNT(*) FROM purchases GROUP BY property, user;
will return you the full set of grouped data that you want. You then need to look at the different frequencies:
SELECT property, freq, COUNT(*) FROM (SELECT property, user, COUNT(*) freq FROM purchases GROUP BY property, user) AS foo GROUP BY property, freq;
It's not quite in the format that you illustrated but it returns the data
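Since the number of frequencies is fixed, the grouped result above can be pivoted into columns with conditional aggregation. Here is a minimal sketch of that idea; it runs the SQL through Python's sqlite3 module purely so it is self-contained (SQLite standing in for MySQL is my assumption, as are the table name purchases and the column aliases). The SELECT itself uses only standard CASE and SUM, so it should carry over to MySQL unchanged.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user TEXT, property TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("john", "car"), ("john", "car"), ("john", "house"),
     ("peter", "car"), ("peter", "car"),
     ("amanda", "house"), ("amanda", "house")],
)

# Inner query: purchases per (property, user).
# Outer query: one column per frequency, counting the users with that frequency.
PIVOT_SQL = """
SELECT property,
       SUM(freq) AS total,
       SUM(CASE WHEN freq = 1 THEN 1 ELSE 0 END) AS bought_once,
       SUM(CASE WHEN freq = 2 THEN 1 ELSE 0 END) AS bought_twice,
       SUM(CASE WHEN freq = 3 THEN 1 ELSE 0 END) AS bought_3_times
FROM (SELECT property, user, COUNT(*) AS freq
      FROM purchases
      GROUP BY property, user) AS per_user
GROUP BY property
ORDER BY property
"""

rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)
```

Adding columns for frequencies 4 and 5 is a matter of repeating the CASE line with a different constant.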

I hope this helps. Let us create a table first:
create table prop(user varchar(50), property varchar(50));
insert into prop values ('john','car'), ('john','car'),
('john','house'), ('peter','car'),
('peter','car'), ('amanda','house'),
('amanda','house');
1) How many times was a car bought?
ANS: select count(property) from prop where property = 'car'
(4)
2) How many times was a car bought twice?
ANS: select user,COUNT(property) from prop where property = 'car' group by user
having COUNT(property) = 2
2-john
2-peter
3) How many times was a house bought?
ANS: select COUNT(property) from prop where property = 'house'
(3)
4) How many times was a house bought twice?
ANS: select user,COUNT(property) from prop where property='house' group by user
having COUNT(property) <= 2
2-amanda
1-john

Related

Calculate percentage in mySQL where SUM is already present in the table

I have a table (which I have no control over) like this:
As you can see, it already has the total calculated in a separate row.
I have to calculate a percentage, which should look something like this:
The issue is how to pass the total in a subquery like
SELECT Marks from <TABLE> WHERE Topic = 'Total';
so that I only get a single row.
Thanks
You can do something along the lines of
SELECT m1.*, ROUND(m1.marks / m2.marks * 100, 2) percentage
FROM marks m1 join marks m2
ON m1.name = m2.name AND m2.topic = 'Total'
ORDER BY name, topic
Output:
| Name | Topic | Marks | percentage |
|------|---------|-------|------------|
| Joe | Chem | 43 | 26.38 |
| Joe | Maths | 75 | 46.01 |
| Joe | Physics | 45 | 27.61 |
| Joe | Total | 163 | 100 |
...
The total SHOULD NOT be in the table. Given that you cannot modify it, I would just ignore that value and calculate the total and then calculate the percentage.
SELECT
m.Name,
Topic,
Marks,
Marks / t.Total * 100 AS Percentage
FROM
marks AS m
JOIN (
SELECT
Name,
SUM(Marks) AS Total
FROM
marks
WHERE
Topic != 'Total'
GROUP BY
Name) AS t ON t.Name = m.Name
In a subquery select the row with the same name and the topic 'Total'.
SELECT t1.name,
t1.topic,
t1.marks,
t1.marks
/ (SELECT t2.marks
FROM elbat t2
WHERE t2.name = t1.name
AND t2.topic = 'Total')
* 100 percentage
FROM elbat t1;
Another option is using a join.
SELECT t1.name,
t1.topic,
t1.marks,
t1.marks
/ t2.marks
* 100 percentage
FROM elbat t1
LEFT JOIN elbat t2
ON t2.name = t1.name
AND t2.topic = 'Total';
name is required to be unique and there must only be one row with 'Total' per name. Otherwise the subquery will throw an error about returning more than one row. With the join there's no such error but nonsense/ambiguous results.
You might also think about the case when there's a total of 0: in a MySQL SELECT the division then yields NULL, while some other databases raise a division by zero error.
Alas, the table design is bad. Tables represent relations, not spreadsheets. The rows with the total have no business being in there. Look up relational normalization.
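To make the zero-total case explicit rather than relying on engine defaults, the divisor can be wrapped in NULLIF. Below is a small sketch of the join variant with that guard. It runs on SQLite through Python's sqlite3 module only so it is self-contained (an assumption on my part); the table name marks follows the first answer, and the rows for "Ann" are made-up data added to exercise the zero total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL (assumption)
conn.execute("CREATE TABLE marks (name TEXT, topic TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO marks VALUES (?, ?, ?)",
    [("Joe", "Chem", 43), ("Joe", "Maths", 75),
     ("Joe", "Physics", 45), ("Joe", "Total", 163),
     # Hypothetical rows, not from the question: a student whose total is 0.
     ("Ann", "Chem", 0), ("Ann", "Total", 0)],
)

# NULLIF(x, 0) turns a zero total into NULL, so the percentage becomes NULL
# instead of depending on how the engine treats division by zero.
PERCENT_SQL = """
SELECT m1.name, m1.topic,
       ROUND(m1.marks * 100.0 / NULLIF(m2.marks, 0), 2) AS percentage
FROM marks AS m1
JOIN marks AS m2
  ON m2.name = m1.name AND m2.topic = 'Total'
ORDER BY m1.name, m1.topic
"""

percentages = {(name, topic): pct
               for name, topic, pct in conn.execute(PERCENT_SQL)}
for key, pct in percentages.items():
    print(key, pct)
```

Multiplying by 100.0 before dividing also avoids integer division on engines that would otherwise truncate.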

mysql "and" logic within result set

Say I have a data set like the following:
table foo
id | employeeType | employeeID
-------------------------
1 | Developer | 1
2 | Developer | 2
3 | Developer | 3
4 | Manager | 1
5 | Manager | 4
6 | Manager | 5
7 | CEO | 1
8 | CEO | 6
and I wanted to run a query that would return all the employeeIDs (along with the employeeTypes) where there is a common employeeID between all employeeTypes. That's the 'and' logic: only employeeIDs that have all employeeTypes are returned (employeeType = Developer and employeeType = Manager and employeeType = CEO). For the data above the example output would be
result table
id | employeeType | employeeID
-------------------------
1 | Developer | 1
4 | Manager | 1
7 | CEO | 1
I was able to do this when I had only TWO employeeTypes by self-joining the table like this.
select * from foo as fooOne
join foo as fooTwo
on fooOne.employeeID = fooTwo.employeeID
AND
fooOne.employeeType <> fooTwo.employeeType
that query returns a result set with values from fooTwo when the 'and' logic matches, but again, only for two types of employees. My real use case scenario dictates that I need to be able to handle a variable number of employeeTypes (3, 4, 5, etc...)
Any thoughts on this would be greatly appreciated.
This should return the rows that you want:
SELECT foo.*
FROM
foo
WHERE
employeeID IN (
SELECT employeeID
FROM foo
GROUP BY employeeID
HAVING COUNT(DISTINCT employeeType) =
(SELECT COUNT(DISTINCT employeeType)
FROM foo)
)
The inner query will return the number of distinct employee types:
(SELECT COUNT(DISTINCT employeeType) FROM foo)
The middle query will return all the employee IDs that have the maximum number of employee types:
SELECT employeeID
FROM foo
GROUP BY employeeID
HAVING COUNT(DISTINCT employeeType) =
(SELECT COUNT(DISTINCT employeeType) FROM foo)
and the outer query will return the whole rows.
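For anyone who wants to try the three layers on the sample data, here is a small self-contained check. It uses Python's sqlite3 module with an in-memory database in place of MySQL (my assumption, just to keep it runnable anywhere); the query text is the same as above with an ORDER BY added for a stable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL (assumption)
conn.execute(
    "CREATE TABLE foo (id INTEGER, employeeType TEXT, employeeID INTEGER)"
)
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(1, "Developer", 1), (2, "Developer", 2), (3, "Developer", 3),
     (4, "Manager", 1), (5, "Manager", 4), (6, "Manager", 5),
     (7, "CEO", 1), (8, "CEO", 6)],
)

# Innermost: number of distinct types overall.
# Middle: employee IDs that have that many distinct types.
# Outer: the full rows for those employee IDs.
ALL_TYPES_SQL = """
SELECT foo.*
FROM foo
WHERE employeeID IN (
    SELECT employeeID
    FROM foo
    GROUP BY employeeID
    HAVING COUNT(DISTINCT employeeType) =
           (SELECT COUNT(DISTINCT employeeType) FROM foo)
)
ORDER BY id
"""

matching_rows = conn.execute(ALL_TYPES_SQL).fetchall()
for row in matching_rows:
    print(row)
```

Because the type count is computed from the table itself, adding a fourth or fifth employeeType needs no change to the query.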
You can try a subquery to make it dynamic
SELECT employeeID, employeeType
FROM foo
WHERE employeeID IN (
SELECT employeeID
FROM foo
GROUP BY employeeID
HAVING COUNT(DISTINCT employeeType) = (SELECT COUNT(DISTINCT employeeType) FROM foo)
)
I agree that this might be looked down as a very inefficient/hacky way of doing things, but this should still get the job done. And frankly, I can't see any other way out of this.
SELECT * FROM (
SELECT EMPLOYEE_ID, GROUP_CONCAT(DISTINCT EmployeeType ORDER BY EmployeeType) AS Roles
FROM EMPLOYEES GROUP BY EMPLOYEE_ID
) EMPLOYEE_ROLES
WHERE EMPLOYEE_ROLES.Roles = 'CEO,Developer,Manager';
Note that the comma-separated list of roles at the end must be in alphabetical order, to match the ORDER BY inside GROUP_CONCAT.

Converting IN to EXISTS for counting users who have taken multiple kinds of actions

I have a table full of users, timestamps, and different types of actions. Let's call them type A, B, and C:
| User ID | Date | ActionType |
--------------------------------------
| 1 | 10/2/14 | A |
| 2 | 10/12/14 | A |
| 3 | 11/1/14 | B |
| 1 | 11/15/14 | B |
| 2 | 12/2/14 | C |
I'm trying to get counts of the number of users who have taken combinations of different action types within a time period -- for example, the number of users who have done both action A and action B between October and December.
This code works (for one combination of actions at a time), but takes forever to run:
SELECT
COUNT(DISTINCT `cm0`.`User ID`) AS `Users`
FROM `mytable` AS `cm0`
WHERE
(`cm0`.`User ID` IN (SELECT `cm1`.`User ID` FROM `mytable` AS `cm1` WHERE
(`cm1`.`ActionType` = 'A' AND (`cm1`.`Date` BETWEEN dateA AND
dateB)))
AND (`cm0`.`ActionType` = 'B')
AND (`cm0`.`Date` BETWEEN dateA AND dateB))
I researched ways to do this using common table expressions, and then realized I couldn't do those in MySQL. Now I'm trying to figure out how to optimize with EXISTS instead of IN, but I'm having trouble fitting examples into what I need. Any help would be much appreciated!
Try this:
SELECT COUNT(*) AS Users
FROM (SELECT cm0.User_ID
FROM mytable AS cm0
WHERE cm0.ActionType IN ('A', 'B') AND cm0.Date BETWEEN dateA AND dateB
GROUP BY cm0.User_ID
HAVING COUNT(DISTINCT cm0.ActionType) = 2) AS both_actions;
The above query will return the number of users who have done both action A and action B between October and December, but if you still want to use EXISTS then check below query:
SELECT COUNT(DISTINCT cm0.User_ID) AS Users
FROM mytable AS cm0
WHERE EXISTS (SELECT 1 FROM mytable AS cm1
WHERE cm0.User_ID = cm1.User_ID AND cm1.ActionType = 'A' AND cm1.Date BETWEEN dateA AND dateB
) AND cm0.ActionType = 'B' AND cm0.Date BETWEEN dateA AND dateB
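Here is a quick sanity check that the grouped approach and the EXISTS approach agree on the sample data. It is only a sketch: it runs on SQLite via Python's sqlite3 module, and the column names user_id, date and action_type, the ISO date strings, and the concrete date range are my assumptions. The grouped query is wrapped in an outer COUNT(*) so that it returns one number rather than one row per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL (assumption)
conn.execute("CREATE TABLE mytable (user_id INTEGER, date TEXT, action_type TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "2014-10-02", "A"), (2, "2014-10-12", "A"), (3, "2014-11-01", "B"),
     (1, "2014-11-15", "B"), (2, "2014-12-02", "C")],
)
date_a, date_b = "2014-10-01", "2014-12-31"  # assumed reporting window

# Grouped variant: users with both action types, counted by an outer query.
GROUPED_SQL = """
SELECT COUNT(*) FROM (
    SELECT user_id
    FROM mytable
    WHERE action_type IN ('A', 'B') AND date BETWEEN ? AND ?
    GROUP BY user_id
    HAVING COUNT(DISTINCT action_type) = 2
) AS both_actions
"""

# EXISTS variant: users with a 'B' row for whom an 'A' row also exists.
EXISTS_SQL = """
SELECT COUNT(DISTINCT cm0.user_id)
FROM mytable AS cm0
WHERE cm0.action_type = 'B' AND cm0.date BETWEEN ? AND ?
  AND EXISTS (SELECT 1 FROM mytable AS cm1
              WHERE cm1.user_id = cm0.user_id
                AND cm1.action_type = 'A'
                AND cm1.date BETWEEN ? AND ?)
"""

grouped_count = conn.execute(GROUPED_SQL, (date_a, date_b)).fetchone()[0]
exists_count = conn.execute(
    EXISTS_SQL, (date_a, date_b, date_a, date_b)
).fetchone()[0]
print(grouped_count, exists_count)
```

Whichever form is used, an index covering the user, action type and date columns is likely to matter more for speed than the choice between IN and EXISTS, though that is worth measuring on the real table.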

SQL - select rows that have the same value in two columns

The solution to the topic is evading me.
I have a table looking like (beyond other fields that have nothing to do with my question):
NAME,CARDNUMBER,MEMBERTYPE
Now, I want a view that shows rows where the cardnumber AND membertype is identical. Both of these fields are integers. Name is VARCHAR. Name is not unique, and duplicate cardnumber, membertype should show for the same name, as well.
I.e. if the following was the table:
JOHN | 324 | 2
PETER | 642 | 1
MARK | 324 | 2
DIANNA | 753 | 2
SPIDERMAN | 642 | 1
JAMIE FOXX | 235 | 6
I would want:
JOHN | 324 | 2
MARK | 324 | 2
PETER | 642 | 1
SPIDERMAN | 642 | 1
this could just be sorted by cardnumber to make it useful to humans.
What's the most efficient way of doing this?
I believe a JOIN will be more efficient than EXISTS
SELECT t1.* FROM myTable t1
JOIN (
SELECT cardnumber, membertype
FROM myTable
GROUP BY cardnumber, membertype
HAVING COUNT(*) > 1
) t2 ON t1.cardnumber = t2.cardnumber AND t1.membertype = t2.membertype
Query plan: http://www.sqlfiddle.com/#!2/0abe3/1
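To see the join in action on the question's sample rows, here is a small self-contained run. It uses Python's sqlite3 module with an in-memory database rather than MySQL (an assumption made only for portability), and adds an ORDER BY on cardnumber as the question suggested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL (assumption)
conn.execute(
    "CREATE TABLE myTable (name TEXT, cardnumber INTEGER, membertype INTEGER)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [("JOHN", 324, 2), ("PETER", 642, 1), ("MARK", 324, 2),
     ("DIANNA", 753, 2), ("SPIDERMAN", 642, 1), ("JAMIE FOXX", 235, 6)],
)

# The derived table finds (cardnumber, membertype) pairs used more than once;
# joining back to the base table returns every row that shares such a pair.
DUPES_SQL = """
SELECT t1.*
FROM myTable t1
JOIN (
    SELECT cardnumber, membertype
    FROM myTable
    GROUP BY cardnumber, membertype
    HAVING COUNT(*) > 1
) t2 ON t1.cardnumber = t2.cardnumber AND t1.membertype = t2.membertype
ORDER BY t1.cardnumber, t1.name
"""

dupes = conn.execute(DUPES_SQL).fetchall()
for row in dupes:
    print(row)
```

Because the grouping ignores name, two rows with the same name and the same card pair would both be returned as well.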
You can use exists for this:
select *
from yourtable y
where exists (
select 1
from yourtable y2
where y.name <> y2.name
and y.cardnumber = y2.cardnumber
and y.membertype = y2.membertype)
Since you mentioned that names can be duplicated, and that a duplicate name still means a different person who should show up in the result set, we need to use GROUP BY with HAVING COUNT(*) > 1 in order to truly detect dupes. Then join this back to the main table to get your full result list.
Also since from your comments, it sounds like you are wrapping this into a view, you'll need to separate out the subquery.
CREATE VIEW DUP_CARDS
AS
SELECT CARDNUMBER, MEMBERTYPE
FROM mytable t2
GROUP BY CARDNUMBER, MEMBERTYPE
HAVING COUNT(*) > 1;
CREATE VIEW DUP_ROWS
AS
SELECT t1.*
FROM mytable AS t1
INNER JOIN DUP_CARDS AS DUP
ON (T1.CARDNUMBER = DUP.CARDNUMBER AND T1.MEMBERTYPE = DUP.MEMBERTYPE );
If you just need to know the value combinations of the 3 fields that are not unique, then you could simply do:
SELECT concat(NAME, "|", CARDNUMBER, "|", MEMBERTYPE) AS myIdentifier,
COUNT(*) AS count
FROM myTable
GROUP BY myIdentifier
HAVING count > 1
This will give you all the different combinations of NAME, CARDNUMBER and MEMBERTYPE that are used more than once, with a count (how many times each occurs). This doesn't give you back the entries themselves; you would have to fetch those in a second step.

Select from one table but filtering other two

Let's say I've got this database:
book
| idBook | name   |
|--------|--------|
| 1      | Book#1 |
category
| idCateg | category   |
|---------|------------|
| 1       | Adventures |
| 2       | Science F. |
book_categ
| id | idBook | idCateg | DATA   |
|----|--------|---------|--------|
| 1  | 1      | 1       | (null) |
| 2  | 1      | 2       | (null) |
I'm trying to select only the books that are in category 1 AND category 2, something like this:
SELECT book.* FROM book,book_categ
WHERE book_categ.idCateg = 1 AND book_categ.idCateg = 2
Obviously, this gives 0 results because each row has only one idCateg. It does work with OR, but the results are not what I need. I've also tried to use a join, but I just can't get the results I expect.
Here is the SQLFiddle of my current project, with my current DB; the data at the beginning is just a sample.
Any help will be really appreciated.
Solution using EXISTS:
select *
from book b
where exists (select 'x'
from book_categ x
where x.idbook = b.idbook
and x.idcateg = 1)
and exists (select 'x'
from book_categ x
where x.idbook = b.idbook
and x.idcateg = 2)
Solution using join with an inline view:
select *
from book b
join (select idbook
from book_categ
where idcateg in (1, 2)
group by idbook
having count(*) = 2) x
on b.idbook = x.idbook
You could try using ALL instead of IN (if you only want values that match all criteria to be returned):
SELECT book.*
FROM book, book_categ
WHERE book_categ.idCateg = ALL(1 , 2)
One way to get the result is to join to the book_categ table twice, something like
SELECT b.*
FROM book b
JOIN book_categ c1
ON c1.idBook = b.idBook
AND c1.idCateg = 1
JOIN book_categ c2
ON c2.idBook = b.idBook
AND c2.idCateg = 2
This assumes that (idBook, idCateg) is constrained to be unique in the book_categ table. If it isn't unique, then this query can return duplicate rows. Adding a GROUP BY clause or the DISTINCT keyword will eliminate any generated duplicates.
There are several other queries that can generate the same result.
For example, another approach to finding idBook values that are in two categories is to get all the rows with idCateg values of 1 or 2, and then GROUP BY idBook and get a count of DISTINCT values...
SELECT b.*
FROM book b
JOIN ( SELECT d.idBook
FROM book_categ d
WHERE d.idCateg IN (1,2)
GROUP BY d.idBook
HAVING COUNT(DISTINCT d.idCateg) = 2
) c
ON c.idBook = b.idBook
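As a rough check that the double join and the GROUP BY variant return the same books, here is a self-contained sketch written against the question's column names (idBook, idCateg). It runs on SQLite through Python's sqlite3 module instead of MySQL (my assumption), and Book#2, which sits in category 1 only, is a made-up row added so there is something to filter out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL (assumption)
conn.execute("CREATE TABLE book (idBook INTEGER, name TEXT)")
conn.execute("CREATE TABLE book_categ (id INTEGER, idBook INTEGER, idCateg INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [(1, "Book#1"),
     (2, "Book#2")],  # hypothetical extra book, not in the question
)
conn.executemany(
    "INSERT INTO book_categ VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2),
     (3, 2, 1)],  # hypothetical: Book#2 is in category 1 only
)

# Variant 1: join book_categ once per required category.
DOUBLE_JOIN_SQL = """
SELECT b.*
FROM book b
JOIN book_categ c1 ON c1.idBook = b.idBook AND c1.idCateg = 1
JOIN book_categ c2 ON c2.idBook = b.idBook AND c2.idCateg = 2
ORDER BY b.idBook
"""

# Variant 2: keep books whose rows cover both categories.
GROUPED_SQL = """
SELECT b.*
FROM book b
JOIN (SELECT d.idBook
      FROM book_categ d
      WHERE d.idCateg IN (1, 2)
      GROUP BY d.idBook
      HAVING COUNT(DISTINCT d.idCateg) = 2) c
  ON c.idBook = b.idBook
ORDER BY b.idBook
"""

double_join_books = conn.execute(DOUBLE_JOIN_SQL).fetchall()
grouped_books = conn.execute(GROUPED_SQL).fetchall()
print(double_join_books, grouped_books)
```

The grouped form scales more easily to a longer category list: extend the IN list and change the constant in HAVING to match its length.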