Get the duplicate entries from table mysql - mysql

I have the table structure as shown below. The database is MariaDB.
+-----------+----------+--------------+-----------------+
| id_object | name | value_double | value_timestamp |
+-----------+----------+--------------+-----------------+
| 1 | price | 1589 | null |
| 1 | payment | 1590 | null |
| 1 | date | null | 2012-04-17 |
| 2 | price | 1589 | null |
| 2 | payment | 1590 | null |
| 2 | date | null | 2012-04-17 |
| 3 | price | 1589 | null |
| 3 | payment | 1590 | null |
| 3 | date | null | 2012-09-25 |
| ... | ... | ... | .. |
+-----------+----------+--------------+-----------------+
1) I need to find duplicates across three entries: price, payment and date.
For example: the record with id_object=2 is a duplicate because its price, payment and date are the same as those of the record with id_object=1. The record with id_object=3 is not a duplicate because the date is different (2012-09-25 != 2012-04-17).
2) I need to remove the duplicates, keeping one copy of each.
I thought of doing three select operations and joining them on id_object. I can get the duplicates by one entry (price | payment | date), but I ran into problems doing the joins:
SELECT `id_object`,`name`,{P.`value_double` | P.`value_timestamp`}
FROM record P
INNER JOIN(
SELECT {value_double | value_timestamp}
FROM record
WHERE name = {required_entry}
GROUP BY {value_double | value_timestamp}
HAVING COUNT(id_object) > 1
)temp ON {P.value_double = temp.value_double | P.value_timestamp = temp.value_timestamp}
WHERE name = {required_entry}
Can someone help and show a cleaner (better) solution?

Though less efficient than certain alternatives, I find an approach along these lines easier to read...
SELECT MIN(id_object) id_object
, price
, payment
, date
FROM
( SELECT id_object
, MAX(CASE WHEN name = 'price' THEN value_double END) price
, MAX(CASE WHEN name = 'payment' THEN value_double END) payment
, MAX(CASE WHEN name = 'date' THEN value_timestamp END) date
FROM eav
GROUP
BY id_object
) x
GROUP
BY price
, payment
, date;
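A minimal sketch of the pivot-then-group idea above, run against an in-memory SQLite database from Python so it can be tried without a MariaDB server. The table name eav follows the answer; the column types and the follow-up DELETE step are assumptions of this sketch, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE eav (id_object INTEGER, name TEXT, "
    "value_double REAL, value_timestamp TEXT)"
)
# Sample rows from the question: objects 1 and 2 are identical, 3 differs by date.
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?, ?)",
    [
        (1, "price", 1589, None), (1, "payment", 1590, None), (1, "date", None, "2012-04-17"),
        (2, "price", 1589, None), (2, "payment", 1590, None), (2, "date", None, "2012-04-17"),
        (3, "price", 1589, None), (3, "payment", 1590, None), (3, "date", None, "2012-09-25"),
    ],
)

# Pivot each object into one row, then keep the smallest id per distinct triple.
survivors = conn.execute(
    """
    SELECT MIN(id_object) AS id_object, price, payment, date
    FROM (
        SELECT id_object,
               MAX(CASE WHEN name = 'price'   THEN value_double    END) AS price,
               MAX(CASE WHEN name = 'payment' THEN value_double    END) AS payment,
               MAX(CASE WHEN name = 'date'    THEN value_timestamp END) AS date
        FROM eav
        GROUP BY id_object
    ) x
    GROUP BY price, payment, date
    ORDER BY 1
    """
).fetchall()

# Assumed follow-up step: delete every object that is not a survivor.
keep_ids = [row[0] for row in survivors]
placeholders = ",".join("?" * len(keep_ids))
conn.execute(f"DELETE FROM eav WHERE id_object NOT IN ({placeholders})", keep_ids)
remaining = [r[0] for r in conn.execute(
    "SELECT DISTINCT id_object FROM eav ORDER BY id_object")]
print(keep_ids, remaining)
```

Here object 2 is removed and objects 1 and 3 stay, which matches the example in the question.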

I would just group_concat() the values together and do the test that way:
select t.*
from t join
(select min(id_object) id_object
from (select id_object,
group_concat(name, ':', coalesce(value_double, ''), ':', coalesce(value_timestamp, '') order by name) pairs
from t
where name in ('price', 'payment', 'date')
group by id_object
) tt
group by pairs
) tt
on t.id_object = tt.id_object;
To actually delete the ones that are not the minimum id for each group of related values:
delete t
from t left join
(select min(id_object) as id_object
from (select id_object, group_concat(name, ':', coalesce(value_double, ''), ':', coalesce(value_timestamp, '') order by name) as pairs
from t
where name in ('price', 'payment', 'date')
group by id_object
) tt
group by pairs
) tt
on t.id_object = tt.id_object
where tt.id_object is null;

Related

Joining 2 SQL SELECT result into one query

I wanted to know if there's a way to join two or more result sets into one.
I have the following two queries.
First query:
SELECT
CONCAT(day(db.prod_id.created_on),"-",month(db.prod_id.created_on),"-",year(db.prod_id.created_on)) as day_month_year,
db.country.country ,
count(concat(day(db.prod_id.created_on),"-",month(db.prod_id.created_on),"-",year(db.prod_id.created_on))) as count ,
COUNT(DISTINCT db.prod_id.email) AS MAIL
from db.prod_id
left join db.country on db.prod_id.branch_id = db.country.id
where db.prod_id.created_on > '2020-11-17' and (db.country.type = 1 or db.country.type = 2)
group by
concat(day(db.prod_id.created_on),"-",month(db.prod_id.created_on),"-",year(db.prod_id.created_on)),
db.country.country
order by db.prod_id.created_on
The second query:
select
CONCAT(day(db.prod_id.created_on),"-",month(db.prod_id.created_on),"-",year(db.prod_id.created_on)) as day_month_year,
db.country.country,
count(CONCAT(day(db.prod_id.created_on),"-",month(db.prod_id.created_on),"-",year(db.prod_id.created_on))) as count_BUY
from db.prod_id
left join db.prod_evaluations on db.prod_id.id = db.prod_evaluations.id
left join db.country on db.prod_id.branch_id = db.country.id
left join (Select prod_properties.prod_id, prod_properties.value From prod_properties Where prod_properties.property_id = 5) as db3 on db3.prod_id = db.prod_id.id
where db.prod_id.created_on > '2020-11-17'
and db3.value = 'online-buy' and db.prod_id.status_id <> 25
group by
concat(day(db.prod_id.created_on),"-",month(db.prod_id.created_on),"-",year(db.prod_id.created_on)),
db.country.country
order by db.prod_id.created_on
The first query gives the following result:
+------------+---------+-------+------+
| day | Country | Count | Mail |
+------------+---------+-------+------+
| 17-11-2020 | IT | 200 | 100 |
| 17-11-2020 | US | 250 | 100 |
| 18-11-2020 | IT | 350 | 300 |
| 18-11-2020 | US | 200 | 100 |
+------------+---------+-------+------+
The second query gives:
+------------+---------+-----------+
| day | Country | Count_BUY |
+------------+---------+-----------+
| 17-11-2020 | IT | 50 |
| 17-11-2020 | US | 70 |
| 18-11-2020 | IT | 200 |
| 18-11-2020 | US | 50 |
+------------+---------+-----------+
Now I want to merge these two results into one:
+------------+---------+-------+------+-----------+
| day | Country | Count | Mail | Count_BUY |
+------------+---------+-------+------+-----------+
| 17-11-2020 | IT | 200 | 100 | 50 |
| 17-11-2020 | US | 250 | 100 | 70 |
| 18-11-2020 | IT | 350 | 300 | 200 |
| 18-11-2020 | US | 200 | 100 | 50 |
+------------+---------+-------+------+-----------+
How can I perform this query?
I'm using MySQL.
Thanks
The simple way: You can join queries.
select *
from ( <your first query here> ) first_query
join ( <your second query here> ) second_query using (day_month_year, country)
order by day_month_year, country;
This is an inner join. You can also outer join of course. MySQL doesn't support full outer joins, though. If you want that, you'll have to look up how to emulate a full outer join in MySQL.
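As a rough illustration of joining two aggregated result sets, the sketch below runs in SQLite from Python. The table prod and its is_buy flag are hypothetical simplifications of the schema in the question. It uses a LEFT JOIN so that a day/country pair without any purchases is still listed, with NULL as its count_buy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified table standing in for db.prod_id.
    CREATE TABLE prod (created_on TEXT, country TEXT, email TEXT, is_buy INTEGER);
    INSERT INTO prod VALUES
        ('2020-11-17', 'IT', 'a@example.com', 1),
        ('2020-11-17', 'IT', 'a@example.com', 0),
        ('2020-11-17', 'US', 'b@example.com', 1),
        ('2020-11-18', 'IT', 'c@example.com', 0);
    """
)

joined = conn.execute(
    """
    SELECT day_month_year, country, cnt, mail, count_buy
    FROM (
        SELECT created_on AS day_month_year, country,
               COUNT(*) AS cnt, COUNT(DISTINCT email) AS mail
        FROM prod
        GROUP BY created_on, country
    ) first_query
    LEFT JOIN (
        SELECT created_on AS day_month_year, country, COUNT(*) AS count_buy
        FROM prod
        WHERE is_buy = 1
        GROUP BY created_on, country
    ) second_query USING (day_month_year, country)
    ORDER BY day_month_year, country
    """
).fetchall()

for row in joined:
    print(row)
```

With an inner join the ('2020-11-18', 'IT') row would disappear, because the second result set has no matching row for it.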
The hard way ;-) Merge the queries.
If I am not mistaken, your two queries can be reduced to
select
date(created_on),
branch_id as country,
count(*) as count_products,
count(distinct email) as count_emails
from db.prod_id
where created_on >= date '2020-11-17'
and branch_id in (select id from db.country where type in (1, 2))
group by date(created_on), branch_id
order by date(created_on), branch_id;
and
select
date(created_on),
branch_id as country,
count(*) as count_buy
from db.prod_id
where created_on >= date '2020-11-17'
and status_id <> 25
and id in (select prod_id from prod_properties where property_id = 5 and value = 'online-buy')
group by date(created_on), branch_id
order by date(created_on), branch_id;
The two combined should be
select
date(created_on),
branch_id as country,
sum(branch_id in (select id from db.country where type in (1, 2))) as count_products,
count(distinct case when branch_id in (select id from db.country where type in (1, 2)) then email end) as count_emails,
sum(status_id <> 25 and id in (select prod_id from prod_properties where property_id = 5 and value = 'online-buy')) as count_buy
from db.prod_id
where created_on >= date '2020-11-17'
group by date(created_on), branch_id
order by date(created_on), branch_id;
You see, the conditions the queries have in common remain in the where clause and the other conditions go inside the aggregation functions.
sum(boolean) is short for sum(case when boolean then 1 else 0 end), i.e. this counts the rows where the condition is met in MySQL.
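To make the sum(boolean) shorthand concrete, here is a small sketch using SQLite from Python. SQLite, like MySQL, evaluates a comparison to 1 or 0. The table prod and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical rows; status_id 25 marks the products to exclude.
    CREATE TABLE prod (created_on TEXT, branch_id INTEGER, status_id INTEGER);
    INSERT INTO prod VALUES
        ('2020-11-17', 1, 10),
        ('2020-11-17', 1, 25),
        ('2020-11-17', 2, 10),
        ('2020-11-18', 1, 25);
    """
)

rows = conn.execute(
    """
    SELECT created_on,
           branch_id,
           COUNT(*)                                         AS count_products,
           SUM(status_id <> 25)                             AS count_buy,
           SUM(CASE WHEN status_id <> 25 THEN 1 ELSE 0 END) AS count_buy_case
    FROM prod
    GROUP BY created_on, branch_id
    ORDER BY created_on, branch_id
    """
).fetchall()

for row in rows:
    print(row)
```

The last two columns agree on every row, while COUNT(*) still counts all rows in the group because the condition sits inside the aggregate and not in the WHERE clause.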

Big Query: Join single latest row from second table

I have two tables. One is a list of Orders, and one is a list of Events.
For each Order, I want to join the single last Event that happened (using clicked_at) before the created_at of the Order.
I have tried numerous ways to get this to work and tried several other answers on Stack Overflow but I am struggling to return the correct data.
The pseudo logic for the subquery in my mind is something like:
SELECT campaign, user_id, clicked_at
FROM `Events`
WHERE order.user_id = user_id AND clicked_at < order.created_at
ORDER BY clicked_at DESC
LIMIT 1
Please see the example data below:
# Orders
| order_id | user_id | created_at |
-----------------------------------
| 123 | abc | 2020-07-04 |
| 456 | abc | 2020-05-01 |
# Events
| campaign | keyword | user_id | clicked_at |
----------------------------------------------
| facebook | shoes | abc | 2020-07-03 |
| google | hair | abc | 2020-07-01 |
My desired result
# Orders with campaign attribution
| order_id | user_id | created_at | campaign | keyword |
---------------------------------------------------------
| 123 | abc | 2020-07-04 | facebook | shoes |
| 456 | abc | 2020-05-01 | null | null |
Thanks!
Alex
Below is for BigQuery Standard SQL
#standardSQL
SELECT a.*, campaign, keyword
FROM `project.dataset.orders` a
LEFT JOIN (
SELECT
ANY_VALUE(o).*,
ARRAY_AGG(STRUCT(campaign, keyword) ORDER BY clicked_at DESC LIMIT 1)[OFFSET(0)].*
FROM `project.dataset.orders` o
JOIN `project.dataset.events` e
ON o.user_id = e.user_id
AND clicked_at < created_at
GROUP BY FORMAT('%t', o)
)
USING(order_id)
If applied to the sample data from your question, the result is
Row order_id user_id created_at campaign keyword
1 123 abc 2020-07-04 facebook shoes
2 456 abc 2020-05-01 null null
with orders as (
select 123 as order_id, 'abc' as user_id, cast('2020-07-04' as date) as created_at union all
select 456, 'abc', '2020-05-01'
),
events as (
select 'facebook' as campaign, 'shoes' as keyword, 'abc' as user_id, cast('2020-07-03' as date) as clicked_at union all
select 'google', 'hair', 'abc', '2020-07-01'
),
logic as (
select
orders.order_id,
orders.user_id,
orders.created_at,
events.clicked_at,
events.campaign,
events.keyword,
row_number() over (partition by orders.order_id order by events.clicked_at desc) as rn
from orders
left join events
on orders.user_id = events.user_id and events.clicked_at < orders.created_at
)
select * except(rn)
from logic
where rn = 1
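The ROW_NUMBER approach is not specific to BigQuery. As a sketch, the same logic can be tried locally with SQLite from Python, assuming the bundled SQLite is version 3.25 or newer (window function support) and that dates are stored as ISO-8601 text so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, user_id TEXT, created_at TEXT);
    CREATE TABLE events (campaign TEXT, keyword TEXT, user_id TEXT, clicked_at TEXT);
    INSERT INTO orders VALUES (123, 'abc', '2020-07-04'), (456, 'abc', '2020-05-01');
    INSERT INTO events VALUES
        ('facebook', 'shoes', 'abc', '2020-07-03'),
        ('google',   'hair',  'abc', '2020-07-01');
    """
)

attributed = conn.execute(
    """
    WITH ranked AS (
        SELECT o.order_id, o.user_id, o.created_at, e.campaign, e.keyword,
               -- Rank each order's earlier events, latest click first.
               ROW_NUMBER() OVER (
                   PARTITION BY o.order_id
                   ORDER BY e.clicked_at DESC
               ) AS rn
        FROM orders o
        LEFT JOIN events e
               ON e.user_id = o.user_id
              AND e.clicked_at < o.created_at
    )
    SELECT order_id, user_id, created_at, campaign, keyword
    FROM ranked
    WHERE rn = 1
    ORDER BY order_id
    """
).fetchall()

for row in attributed:
    print(row)
```

Because of the LEFT JOIN, order 456 has no earlier event and still gets one row with rn = 1, carrying NULL for campaign and keyword.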

Customizing ROLLUP row mySQL

I want to summarize the sales data and show the grand total in the last row. I'm using "GROUP BY" and "WITH ROLLUP", but the results are:
+--------+--------------------+------------+--------+-----------+
| id | name | date | amount | total |
+--------+--------------------+------------+--------+-----------+
| Z00015 | Mebel Harmonis | 2019-05-09 | 2 | 10000000 |
| Z00016 | Mebel Harmonis | 2019-05-09 | 10 | 45000000 |
| Z00017 | Mebel Tunggal Jaya | 2019-05-10 | 3 | 12000000 |
| (null) | Mebel Tunggal Jaya | 2019-05-10 | 29 | 131000000 |
+--------+--------------------+------------+--------+-----------+
The last row that I want:
+--------+--------+--------+----+-----------+
| (null) | (null) | (null) | 29 | 131000000 |
+--------+--------+--------+----+-----------+
This is my query:
SELECT
order2.id_order AS id,
customer.name_customer AS name,
DATE( order2.date_order ) AS date ,
Count( order_detail.id_detail ) AS amount,
SUM( harga ) AS total
FROM
order_detail
INNER JOIN order2 ON order2.id_order = order_detail.id_order
INNER JOIN customer ON order2.id_customer = customer.id_customer
INNER JOIN produk ON produk.id_produk = order_detail.id_produk
INNER JOIN sofa ON sofa.id_sofa = produk.id_sofa
WHERE
date( date_order ) >= '2019-05-01'
AND date( date_order ) <= '2019-05-31'
GROUP BY
order2.id_order WITH ROLLUP;
You need to specify all the columns that can be combined together for the grand total in your GROUP BY clause:
GROUP BY id, name, date WITH ROLLUP
However, this will also create intermediate subtotal rows for each id and for each id, name pair. Those subtotal rows still carry a non-NULL id, so test date instead to filter them out:
HAVING date IS NOT NULL OR (id IS NULL AND name IS NULL AND date IS NULL)

Finding duplicates from two columns, but show all rows MySQL

I have a table like this
| user_id | company_id | employee_id |
|---------|------------|-------------|
| 1 | 2 | 123 |
| 2 | 2 | 123 |
| 3 | 5 | 432 |
| 4 | 5 | 432 |
| 5 | 7 | 432 |
I have a query that looks like this
SELECT COUNT(*) AS Repeated, employee_id, GROUP_CONCAT(user_id) as user_ids, GROUP_CONCAT(username)
FROM user_company
INNER JOIN user ON user.id = user_company.user_id
WHERE employee_id IS NOT NULL
AND user_company.deleted_at IS NULL
GROUP BY employee_id, company_id
HAVING Repeated >1;
The results I am getting look like this
| Repeated | employee_id | user_ids |
|---------|--------------|------------|
| 2 | 123 | 2,3 |
| 2 | 432 | 7,8 |
I need results that look like this
| user_id |
|---------|
| 2 |
| 3 |
| 7 |
| 8 |
I realize my query is selecting more than that, but that's just to make sure I'm getting the correct data. Now I need a single-column result with each user_id in its own row, so that I can update by user_id in another query. I've tried selecting only the user_id, but I only get two rows; I need all four rows of duplicates.
Any ideas on how to modify my query?
Here is the query to get all of your user_ids:
SELECT user_id
FROM user_company uc
INNER JOIN
(
SELECT employee_id, company_id
FROM user_company
WHERE employee_id IS NOT NULL
AND deleted_at IS NULL
GROUP BY employee_id, company_id
HAVING COUNT(employee_id) >1
) AS `emps`
ON emps.employee_id = uc.`employee_id`
AND emps.company_id = uc.`company_id`
WHERE uc.deleted_at IS NULL;
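A runnable sketch of this join-back-to-the-duplicate-groups pattern, using SQLite from Python and the sample table from the question. The deleted_at values and the extra soft-deleted row (user_id 6) are made up for illustration. The sketch also filters soft-deleted rows in the outer query, so that a deleted row sharing a duplicated pair is not returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_company (user_id INTEGER, company_id INTEGER, "
    "employee_id INTEGER, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO user_company VALUES (?, ?, ?, ?)",
    [
        (1, 2, 123, None),
        (2, 2, 123, None),
        (3, 5, 432, None),
        (4, 5, 432, None),
        (5, 7, 432, None),           # same employee_id, different company: not a duplicate
        (6, 2, 123, "2020-01-01"),   # hypothetical soft-deleted row
    ],
)

dup_user_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT uc.user_id
        FROM user_company uc
        INNER JOIN (
            -- One row per (employee_id, company_id) pair that occurs more than once.
            SELECT employee_id, company_id
            FROM user_company
            WHERE employee_id IS NOT NULL
              AND deleted_at IS NULL
            GROUP BY employee_id, company_id
            HAVING COUNT(*) > 1
        ) emps
          ON emps.employee_id = uc.employee_id
         AND emps.company_id = uc.company_id
        WHERE uc.deleted_at IS NULL
        ORDER BY uc.user_id
        """
    )
]
print(dup_user_ids)
```

The grouped subquery finds the duplicated pairs, and the join expands each pair back into one row per user.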
The query below generates an UPDATE statement for those user_ids:
SELECT CONCAT('UPDATE user_company SET employee_id = null WHERE user_id IN (', GROUP_CONCAT(user_id SEPARATOR ', '),')') AS user_sql
FROM user_company uc
INNER JOIN
(SELECT employee_id, company_id
FROM user_company
WHERE employee_id IS NOT NULL
AND deleted_at IS NULL
GROUP BY employee_id, company_id
HAVING COUNT(employee_id) >1) AS `emps`
ON emps.employee_id = uc.`employee_id`
AND emps.company_id = uc.`company_id`
WHERE uc.deleted_at IS NULL;

Get all rows from a table for a particular user along with sum

I have a table called real_estate. Its structure and data are as follows:
| id | user_id | details | location | worth
| 1 | 1 | Null | Null | 10000000
| 2 | 1 | Null | Null | 20000000
| 3 | 2 | Null | Null | 10000000
My query is the following:
SELECT * , SUM( worth ) as sum
FROM real_estate
WHERE user_id = '1'
The result which I get from this query is
| id | user_id | details | location | worth | sum
| 1 | 1 | Null | Null | 10000000 | 30000000
I want the result to be like
| id | user_id | details | location | worth | sum
| 1 | 1 | Null | Null | 10000000 | 30000000
| 2 | 1 | Null | Null | 20000000 | 30000000
Is there any way to get the result the way I want or should I write 2 different queries?
1)To get the sum of worth
2)To get all the rows for that user
You need to use a subquery that calculates the sum for every user, and then JOIN the result of the subquery with your table:
SELECT real_estate.*, s.user_sum
FROM
real_estate INNER JOIN (SELECT user_id, SUM(worth) AS user_sum
FROM real_estate
GROUP BY user_id) s
ON real_estate.user_id = s.user_id
WHERE
real_estate.user_id = '1'
but if you just need to return records for a single user, you could use this:
SELECT
real_estate.*,
(SELECT SUM(worth) FROM real_estate WHERE user_id='1') AS user_sum
FROM
real_estate
WHERE
user_id='1'
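As a quick check of the join-with-aggregated-subquery approach, here is a sketch using SQLite from Python with the rows from the question. The column types are assumptions, and the outer table is given the alias r so that user_id in the WHERE clause is unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE real_estate (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "details TEXT, location TEXT, worth INTEGER)"
)
conn.executemany(
    "INSERT INTO real_estate VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, None, None, 10000000),
        (2, 1, None, None, 20000000),
        (3, 2, None, None, 10000000),
    ],
)

rows = conn.execute(
    """
    SELECT r.id, r.user_id, r.worth, s.user_sum
    FROM real_estate r
    INNER JOIN (
        -- One total per user, repeated on every row of that user by the join.
        SELECT user_id, SUM(worth) AS user_sum
        FROM real_estate
        GROUP BY user_id
    ) s ON r.user_id = s.user_id
    WHERE r.user_id = ?
    ORDER BY r.id
    """,
    (1,),
).fetchall()

for row in rows:
    print(row)
```

Both rows of user 1 come back, each carrying the same total of 30000000.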
You can do your sum in a subquery like this
SELECT * , (select SUM(worth) from real_estate WHERE user_id = '1' ) as sum
FROM real_estate WHERE user_id = '1'
Alternatively, group by id (note that sum then equals each row's own worth, not the user's total):
SELECT * , SUM( worth ) as sum FROM real_estate WHERE user_id = '1' group by id