count of individual column with group by on multiple columns - mysql

I have two columns, account_number and customer_id. A single customer can have multiple accounts, but a single account can't have multiple customers.
I have loaded a file containing each account_num and its corresponding customer_id into the database with the LOAD DATA INFILE command. Now I am trying to check with a query whether any account that appears multiple times in the file has the same customer_id or different customer_ids across its rows.
REQUIREMENT: I want to return those accounts that appear multiple times with different customer ids.
I tried GROUP BY, but didn't get the desired result.
This is my query, which does not give the desired result:
SELECT ACCOUNT_NUM,UNIQUE_CUSTOMER_ID,COUNT(UNIQUE_CUSTOMER_ID)
FROM LINKAGE_FILE
GROUP BY ACCOUNT_NUM, UNIQUE_CUSTOMER_ID
HAVING COUNT(ACCOUNT_NUM) > 1 AND COUNT(UNIQUE_CUSTOMER_ID) = 1;
Hope I am clear.

You can simply get the count of unique customer ids for every account_num using COUNT(DISTINCT ...), and keep only the accounts where that count is more than 1 with a HAVING clause:
SELECT
ACCOUNT_NUM,
COUNT(DISTINCT UNIQUE_CUSTOMER_ID) AS unique_customer_count
FROM LINKAGE_FILE
GROUP BY ACCOUNT_NUM
HAVING unique_customer_count > 1;
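The COUNT(DISTINCT ...) filter can be tried on a few rows before running it against the real table. The following is a minimal sketch that uses Python's sqlite3 module as a stand-in for MySQL; the sample rows are hypothetical, and the HAVING clause repeats the aggregate instead of using the alias so that it is portable.

```python
import sqlite3

# Hypothetical sample rows: (ACCOUNT_NUM, UNIQUE_CUSTOMER_ID).
rows = [(1, 1), (1, 2), (1, 1), (2, 3), (3, 4), (3, 4)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LINKAGE_FILE (ACCOUNT_NUM INT, UNIQUE_CUSTOMER_ID INT)")
conn.executemany("INSERT INTO LINKAGE_FILE VALUES (?, ?)", rows)

# One output row per account that maps to more than one distinct customer.
conflicting_accounts = conn.execute(
    """
    SELECT ACCOUNT_NUM, COUNT(DISTINCT UNIQUE_CUSTOMER_ID) AS unique_customer_count
    FROM LINKAGE_FILE
    GROUP BY ACCOUNT_NUM
    HAVING COUNT(DISTINCT UNIQUE_CUSTOMER_ID) > 1
    ORDER BY ACCOUNT_NUM
    """
).fetchall()

print(conflicting_accounts)
```

Account 3 appears twice but always with the same customer, so it is not reported; only account 1 is.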

Put the customer check into a subquery and join to it, like so:
DROP TABLE if exists t;
create table t(accountid int,cid int);
insert into t values
(1,1),(1,2),(1,1),(2,3),(3,4),(3,4);
select distinct t.accountid,t.cid
from t
join
(
select accountid,count(distinct cid) cids
from t
group by accountid having cids > 1
) s on s.accountid = t.accountid;
+-----------+------+
| accountid | cid  |
+-----------+------+
|         1 |    1 |
|         1 |    2 |
+-----------+------+
2 rows in set (0.00 sec)

You can use EXISTS :
SELECT lf.*
FROM LINKAGE_FILE lf
WHERE EXISTS (SELECT 1 FROM LINKAGE_FILE lf1 WHERE lf1.ACCOUNT_NUM = lf.ACCOUNT_NUM AND lf1.UNIQUE_CUSTOMER_ID <> lf.UNIQUE_CUSTOMER_ID);
However, you can also use aggregation with your query:
SELECT ACCOUNT_NUM, COUNT(DISTINCT UNIQUE_CUSTOMER_ID)
FROM LINKAGE_FILE
GROUP BY ACCOUNT_NUM
HAVING COUNT(DISTINCT UNIQUE_CUSTOMER_ID) > 1;
This way, you get only the ACCOUNT_NUMs that have two or more distinct UNIQUE_CUSTOMER_IDs.
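The EXISTS form differs from the aggregate form in what it returns: every offending row rather than one row per account. A small sketch, again assuming Python's sqlite3 module as a stand-in for MySQL and hypothetical sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LINKAGE_FILE (ACCOUNT_NUM INT, UNIQUE_CUSTOMER_ID INT)")
# Hypothetical rows: account 10 has two customers, account 20 repeats one customer.
conn.executemany(
    "INSERT INTO LINKAGE_FILE VALUES (?, ?)",
    [(10, 100), (10, 101), (20, 200), (20, 200), (30, 300)],
)

# A row qualifies when another row shares its account but not its customer.
offending_rows = conn.execute(
    """
    SELECT lf.ACCOUNT_NUM, lf.UNIQUE_CUSTOMER_ID
    FROM LINKAGE_FILE lf
    WHERE EXISTS (
        SELECT 1
        FROM LINKAGE_FILE lf1
        WHERE lf1.ACCOUNT_NUM = lf.ACCOUNT_NUM
          AND lf1.UNIQUE_CUSTOMER_ID <> lf.UNIQUE_CUSTOMER_ID
    )
    ORDER BY lf.ACCOUNT_NUM, lf.UNIQUE_CUSTOMER_ID
    """
).fetchall()

print(offending_rows)
```

This is handy when you want to see which customer ids collide, not just which accounts are affected.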

Related

SQL Formula for mysql Table

Hello – I have a DB table (MySQL ver 5.6.41-84.1-log) that has about 92,000 entries, with columns for:
id (incremental unique ID)
post_type (not important)
post_id (not important, but shows relation to another table)
user_id (not important)
vote (not important)
ip (IP Address, ie. 123.123.123.123)
voted (Datestamp in GMT, ie. 2018-12-03 04:50:05)
I recently ran a contest and we had a rule that no single IP could vote more than 60 times per day. So now I need to run a custom SQL formula that applies the following rule:
For each IP address, for each day, if there are > 60 rows, delete those additional rows.
Thank you for your help!
This is a complicated one, and I think it is hard to give a 100% sure answer without the actual table and data to play with.
However, let me try to describe the logic and build the query step by step, so you can play around with it and possibly fix lurking errors.
1) We start by selecting all IP addresses that posted more than 60 votes on a given day. For this we use a GROUP BY on the voting day and on the IP address, combined with a HAVING clause:
select date(voted), ip_adress
from table
group by date(voted), ip_adress
having count(*) > 60
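2) From there, we go back to the table and select the first 60 ids corresponding to each voting day / IP address pair. id is an auto-incremented field, so we just sort on this field and then use the MySQL LIMIT clause. Note that LIMIT caps the whole result set rather than each pair, so as written this step is only correct when a single voting day / IP address pair is over the limit: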
select id, ip_adress, date(voted) as day_voted
from table
where (date(voted), ip_adress) in (
select date(voted), ip_adress
from table
group by date(voted), ip_adress
having count(*) > 60
)
order by id
limit 60
3) Finally, we go back to the table once more and search for all ids whose IP address and day of vote belong to the above list, but whose id is greater than the max id of the list. This is achieved with a join and requires a GROUP BY clause.
select t1.id
from
table t1
join (
select id, ip_adress, date(voted) as day_voted
from table
where (date(voted), ip_adress) in (
select date(voted), ip_adress
from table
group by date(voted), ip_adress
having count(*) > 60
)
order by id
limit 60
) t2
on t1.ip_adress = t2.ip_adress
and date(t1.voted) = t2.day_voted
group by t1.id
having t1.id > max(t2.id)
That should return the list of all ids that we need to delete. Test it before you go further.
4) The very last step is to delete those ids. MySQL has limitations that make a DELETE with a subquery condition on the same table awkward to achieve. See the following SO question for more information on the technical background. You can either use a temporary table to store the selected ids, or try to outsmart MySQL by wrapping the subquery in a derived table and aliasing it. Let us try the second option:
delete t.* from table t where id in ( select id from (
select t1.id
from
table t1
join (
select id, ip_adress, date(voted) as day_voted
from table
where (date(voted), ip_adress) in (
select date(voted), ip_adress
from table
group by date(voted), ip_adress
having count(*) > 60
)
order by id
limit 60
) t2
on t1.ip_adress = t2.ip_adress
and date(t1.voted) = t2.day_voted
group by t1.id
having t1.id > max(t2.id)
) x );
Hope this helps !
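Step 1 of the walk-through above, and the list of ids that exceed the cap, can be checked on a handful of rows. This is only a sketch under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than on MySQL, the table name votes and the sample rows are hypothetical, the cap is lowered from 60 to 2, and the excess ids are found with a correlated count (a different technique from the join in step 3).

```python
import sqlite3

LIMIT_PER_DAY = 2  # the contest rule uses 60; 2 keeps the sample small

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (id INTEGER PRIMARY KEY, ip TEXT, voted TEXT)")
conn.executemany(
    "INSERT INTO votes (ip, voted) VALUES (?, ?)",
    [
        ("1.1.1.1", "2018-12-03 04:50:05"),
        ("1.1.1.1", "2018-12-03 09:00:00"),
        ("1.1.1.1", "2018-12-03 21:15:00"),
        ("1.1.1.1", "2018-12-04 08:00:00"),
        ("2.2.2.2", "2018-12-03 10:00:00"),
    ],
)

# Step 1: day / ip pairs with more votes than the cap.
over_limit = conn.execute(
    """
    SELECT date(voted) AS day_voted, ip, COUNT(*) AS votes
    FROM votes
    GROUP BY date(voted), ip
    HAVING COUNT(*) > ?
    """,
    (LIMIT_PER_DAY,),
).fetchall()

# Excess rows: a vote is over the cap when more than LIMIT_PER_DAY votes
# with the same ip and day have an id less than or equal to its own.
excess_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT v.id
        FROM votes v
        WHERE (
            SELECT COUNT(*)
            FROM votes earlier
            WHERE earlier.ip = v.ip
              AND date(earlier.voted) = date(v.voted)
              AND earlier.id <= v.id
        ) > ?
        ORDER BY v.id
        """,
        (LIMIT_PER_DAY,),
    )
]

print(over_limit, excess_ids)
```

Only the third vote from 1.1.1.1 on 2018-12-03 is flagged; the vote on the following day starts a new count.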
You could approach this by vastly simplifying your sample data and using row-number simulation for MySQL versions prior to 8.0, or a window function for versions 8.0 or above. I assume you are not on version 8 or above in the following example:
drop table if exists t;
create table t(id int auto_increment primary key,ip varchar(2));
insert into t (ip) values
(1),(1),(3),(3),
(2),
(3),(3),(1),(2);
delete t1 from t t1 join
(
select id,rownumber from
(
select t.*,
if(ip <> @p,@r:=1,@r:=@r+1) rownumber,
@p:=ip p
from t
cross join (select @r:=0,@p:=0) r
order by ip,id
)s
where rownumber > 2
) a on a.id = t1.id;
Working from the inside out: sub query s allocates a row number per ip, sub query a then picks the rows with row numbers > 2, and the outer multi-table delete deletes from t joined to a, to give:
+----+------+
| id | ip   |
+----+------+
|  1 | 1    |
|  2 | 1    |
|  3 | 3    |
|  4 | 3    |
|  5 | 2    |
|  9 | 2    |
+----+------+
6 rows in set (0.00 sec)
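The answer above mentions window functions for version 8.0 or above without showing one. A minimal sketch of that variant follows, assuming ROW_NUMBER() partitioned by ip; it runs on SQLite (3.25 or later) through Python's sqlite3 module as a stand-in for MySQL 8.0, and reuses the simplified table t and the cap of 2 rows per ip from the example.

```python
import sqlite3

MAX_ROWS_PER_IP = 2  # same cap as the simplified example above

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, ip TEXT)")
conn.executemany(
    "INSERT INTO t (ip) VALUES (?)",
    [(ip,) for ip in ["1", "1", "3", "3", "2", "3", "3", "1", "2"]],
)

# Number the rows of each ip by id, then delete everything past the cap.
conn.execute(
    """
    DELETE FROM t
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY ip ORDER BY id) AS rownumber
            FROM t
        ) AS ranked
        WHERE rownumber > ?
    )
    """,
    (MAX_ROWS_PER_IP,),
)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM t ORDER BY id")]
print(remaining_ids)
```

To cap per ip and per day instead, the partition would become PARTITION BY ip, date(voted).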
I had someone help me write the following query, which addressed my question.
SET SQL_SAFE_UPDATES = 0;
create table temp( SELECT id, ip, voted
FROM
(SELECT id, ip, voted,
@ip_rank := IF(@current_ip = ip, @ip_rank + 1, 1) AS ip_rank,
@current_ip := ip
FROM `table_name` where ip in (SELECT ip from `table_name` group by date(voted),ip having count(*) >60)
ORDER BY ip, voted desc
) ranked
WHERE ip_rank <= 2);
DELETE FROM `table_name`
WHERE id not in (select id from temp) and ip in (select ip from temp);
drop table temp;

How to get counts using the `IN` operator

I am trying to use the IN operator to get the count of certain fields in the table.
This is my query:
SELECT order_id, COUNT(*)
FROM remake_error_type
WHERE order_id IN (1, 2, 100)
GROUP BY order_id;
My current output:
| order_id | COUNT(*) |
+----------+----------+
| 1 | 8 |
| 2 | 8 |
My expected output:
| order_id | COUNT(*) |
+----------+----------+
| 1 | 8 |
| 2 | 8 |
| 100 | 0 |
You can write your query this way:
SELECT t.id, COUNT(remake_error_type.order_id)
FROM
(SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 100) as t
LEFT JOIN remake_error_type
ON t.id = remake_error_type.order_id
GROUP BY
t.id
A LEFT JOIN returns all rows from the subquery on the left, and COUNT(remake_error_type.order_id) counts only the rows where the join succeeds, so an order_id with no matches gets 0.
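The difference between counting the joined column and counting rows matters here, because an unmatched left row still produces one output row filled with NULLs. A short sketch, assuming Python's sqlite3 module as a stand-in for MySQL and sample data that mirrors the question (8 rows each for orders 1 and 2, none for 100):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE remake_error_type (order_id INT)")
conn.executemany(
    "INSERT INTO remake_error_type VALUES (?)",
    [(1,)] * 8 + [(2,)] * 8,
)

# matched counts non-NULL joined values; joined_rows counts output rows.
counts = conn.execute(
    """
    SELECT t.id,
           COUNT(remake_error_type.order_id) AS matched,
           COUNT(*) AS joined_rows
    FROM (SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 100) AS t
    LEFT JOIN remake_error_type ON t.id = remake_error_type.order_id
    GROUP BY t.id
    ORDER BY t.id
    """
).fetchall()

print(counts)
```

For order 100 the matched count is 0 while COUNT(*) is 1, which is why the query counts the joined column.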
You can create a temporary table, insert as many order_ids as required, and perform the left join to remake_error_type. For a small number of orders the other answers are sufficient, but if you were doing this for a lot of orders, UNION ALL and sub-queries are inefficient, both to type up and to execute on the server.
Additionally, this is a very dynamic approach, because you can easily control the values in your temp table by modifying the insert statement.
However, this will only work if the database user has sufficient privileges: at least select, create temporary and drop table.
DROP TABLE IF EXISTS myTempOrders;
CREATE TEMPORARY TABLE myTempOrders (order_id INTEGER, PRIMARY KEY(order_id));
INSERT INTO myTempOrders (order_id) VALUES (1), (2), (100);
SELECT temp.order_id, COUNT(remake_error_type.order_id)
FROM myTempOrders temp
LEFT JOIN remake_error_type ON temp.order_id = remake_error_type.order_id
GROUP BY 1
If the order_id values exist in some other table, then it is possible to get the desired result without creating a temporary table and inserting values into it.
To qualify, the table must
have an auto-increment primary key with a number of rows greater than the maximum sought order_id value
have a starting increment value less than the minimum sought order_id value
have no missing values in the primary key (i.e. no records have been deleted)
If a qualified table exists, then you can run the following query, replacing my_qualified_table with the qualified table's name and surrogate_id with its auto-incrementing primary key:
SELECT surrogate.surrogate_id, COUNT(remake_error_type.order_id)
FROM my_qualified_table surrogate
LEFT JOIN remake_error_type ON surrogate.surrogate_id = remake_error_type.order_id
WHERE surrogate.surrogate_id IN (1, 2, 100)
GROUP BY 1
You could use a union for this. No, this does not use the IN operator, but it is an alternative that will give you your expected results. One option is to hardcode the order_id and use conditional aggregation to get the SUM() of rows with that id:
SELECT 1 AS order_id, SUM(order_id = 1) AS numOrders FROM myTable
UNION ALL
SELECT 2 AS order_id, SUM(order_id = 2) AS numOrders FROM myTable
UNION ALL
SELECT 100 AS order_id, SUM(order_id = 100) AS numOrders FROM myTable;

Find which values are not in table

Simple question, but I'm drawing a blank. Any help is appreciated.
I have a table of ids:
-------
| ids |
-------
| 1 |
| 5 |
| 7 |
-------
Except the actual table is thousands of entries long.
I have a list (x), not a table, of other ids, say 2, 6, 7. I need to see which ids from x are not in the ids table.
I need to get back (2,6).
I tried something like this:
SELECT id FROM ids WHERE id IN (2,6,7) GROUP BY id HAVING COUNT(*) = 0;
However, COUNT(*) only counts retrieved rows, so it never returns 0 for ids that are absent from the table.
Any suggestions?
Create a temporary table, insert the IDs that you need into it, and run a join, like this:
CREATE TEMPORARY TABLE temp_wanted (id BIGINT);
INSERT INTO temp_wanted(id) VALUES (2),(6),(7);
SELECT t.id
FROM temp_wanted t
LEFT OUTER JOIN ids i ON i.id=t.id
WHERE i.id IS NULL
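The temporary-table anti-join above can be exercised with the ids from the question. This is a sketch that assumes Python's sqlite3 module as a stand-in for MySQL; the column is qualified as t.id because both tables have an id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ids (id INTEGER)")
conn.executemany("INSERT INTO ids VALUES (?)", [(1,), (5,), (7,)])

# The list x from the question, loaded into a temporary table.
wanted = [2, 6, 7]
conn.execute("CREATE TEMPORARY TABLE temp_wanted (id INTEGER)")
conn.executemany("INSERT INTO temp_wanted VALUES (?)", [(w,) for w in wanted])

# Keep only the wanted ids that found no partner in ids.
missing = [
    row[0]
    for row in conn.execute(
        """
        SELECT t.id
        FROM temp_wanted t
        LEFT JOIN ids i ON i.id = t.id
        WHERE i.id IS NULL
        ORDER BY t.id
        """
    )
]

print(missing)
```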
Try something with a NOT IN clause:
select * from
(SELECT 2 as id
UNION ALL
SELECT 6 as id
UNION ALL
SELECT 7 as id) mytable
WHERE ID not in (SELECT id FROM ids)

Database combine two queries into one

I have a table that has 2 fields, userId and ebayitemId. The following is the table from the database:
userId | ebayitemId
12 | 1
12 | 2
12 | 3
12 | 4
In my situation, the client makes a request with an ebayitemId to see what other items are listed by the same user (ebayitemId is unique). So far I am using two queries to select all items listed by the user. The first query is
SELECT userId WHERE ebayitemId = '1'
This query gets me the userId for that ebayitemId.
The second query is
SELECT ebayitemId WHERE userId = '$userid'
This gives me ebayitemId 1,2,3 and 4.
My question: Is there a way to combine these two queries into one query to get above result since only one table is involved?
The query :
SELECT iu.ebayitemId
FROM t_items AS iu
INNER JOIN t_items AS ii ON (iu.userId=ii.userId)
WHERE (ii.ebayitemId= $item )
and if you don't want the first item to be selected :
SELECT iu.ebayitemId
FROM t_items AS iu
INNER JOIN t_items AS ii ON (iu.userId=ii.userId)
WHERE (ii.ebayitemId= $item )
AND (iu.ebayitemId<>ii.ebayitemId)
Note: an IN subquery may be less well optimized, particularly on older MySQL versions.
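The self-join can be tried with the rows from the question plus one item from another user. This is a sketch assuming Python's sqlite3 module as a stand-in for MySQL; the helper name items_of_same_user is hypothetical, and the item id is passed as a bound parameter rather than interpolated like $item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_items (userId INT, ebayitemId INT)")
conn.executemany(
    "INSERT INTO t_items VALUES (?, ?)",
    [(12, 1), (12, 2), (12, 3), (12, 4), (13, 5)],
)


def items_of_same_user(item_id, include_self=True):
    """Return all item ids listed by the user who listed item_id."""
    sql = """
        SELECT iu.ebayitemId
        FROM t_items AS iu
        INNER JOIN t_items AS ii ON iu.userId = ii.userId
        WHERE ii.ebayitemId = ?
    """
    if not include_self:
        # Second variant from the answer: leave out the requested item.
        sql += " AND iu.ebayitemId <> ii.ebayitemId"
    sql += " ORDER BY iu.ebayitemId"
    return [row[0] for row in conn.execute(sql, (item_id,))]


print(items_of_same_user(1))
print(items_of_same_user(1, include_self=False))
```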
It can be done with
SELECT ebayitemId from table WHERE userId in (SELECT userId from table WHERE ebayitemId = '1')
Naively:
SELECT ebayitemId
FROM yourtable
WHERE userId IN (
SELECT userId
FROM yourtable
WHERE ebayitemId = '1'
)
Note there are other ways to skin this cat with joins etc.

Retrieve Unique Values and Counts For Each

Is there a simple way to retrieve a list of all unique values in a column, along with how many times that value appeared?
Example dataset:
A
A
A
B
B
C
... Would return:
A | 3
B | 2
C | 1
Use GROUP BY:
select value, count(*) from table group by value
Use HAVING to further reduce the results, e.g. only values that occur more than 3 times:
select value, count(*) from table group by value having count(*) > 3
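Both queries can be checked against the example dataset from the question. A minimal sketch, assuming Python's sqlite3 module as a stand-in for the database; the table name samples is hypothetical, and the HAVING threshold is lowered to 1 so that the small dataset still returns rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (value TEXT)")
# The example dataset: A, A, A, B, B, C.
conn.executemany("INSERT INTO samples VALUES (?)", [(v,) for v in "AAABBC"])

# Every distinct value with its number of occurrences.
counts = conn.execute(
    "SELECT value, COUNT(*) FROM samples GROUP BY value ORDER BY value"
).fetchall()

# Only values that occur more than once.
repeated = conn.execute(
    "SELECT value, COUNT(*) FROM samples "
    "GROUP BY value HAVING COUNT(*) > 1 ORDER BY value"
).fetchall()

print(counts)
print(repeated)
```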
SELECT id,COUNT(*) FROM file GROUP BY id