Remove duplicates from one column keeping whole rows - mysql

id | userid | total_points_spent
1 | 1 | 10
2 | 2 | 15
3 | 2 | 50
4 | 3 | 5
5 | 1 | 15
With the above table, I would first like to remove duplicates of userid keeping the rows with the largest total_points_spent, like so:
id | userid | total_points_spent
3 | 2 | 50
4 | 3 | 5
5 | 1 | 15
And then I would like to sum the values of total_points_spent, which would be the easy part, resulting in 70.

I am not sure whether by "remove" you mean deleting the rows or just selecting them. Here is a query that selects only the row with the largest total_points_spent for each userid (tblA stands in for your table name). If a user has several rows tied for the maximum, all of them are returned.
SELECT tblA.*
FROM ( SELECT userid, MAX(total_points_spent) AS maxtotal
FROM tblA
GROUP BY userid ) AS dt
INNER JOIN tblA
ON tblA.userid = dt.userid
AND tblA.total_points_spent = dt.maxtotal
ORDER BY tblA.userid
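The second half of the question (summing the kept values) is not covered above. The following is a minimal sketch of both steps, assuming an in-memory SQLite database as a stand-in for MySQL and the placeholder table name tblA; only standard SQL is used, so the queries should carry over, but that has not been checked against a MySQL server here.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption); tblA is a placeholder name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblA (id INTEGER PRIMARY KEY, userid INTEGER, total_points_spent INTEGER)"
)
conn.executemany(
    "INSERT INTO tblA VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 2, 15), (3, 2, 50), (4, 3, 5), (5, 1, 15)],
)

# Step 1: keep only the row with the largest total_points_spent per userid.
top_rows = conn.execute("""
    SELECT tblA.id, tblA.userid, tblA.total_points_spent
    FROM (SELECT userid, MAX(total_points_spent) AS maxtotal
          FROM tblA
          GROUP BY userid) AS dt
    INNER JOIN tblA
            ON tblA.userid = dt.userid
           AND tblA.total_points_spent = dt.maxtotal
    ORDER BY tblA.id
""").fetchall()

# Step 2: the sum of the per-user maxima needs only the grouped subquery.
total = conn.execute("""
    SELECT SUM(maxtotal)
    FROM (SELECT MAX(total_points_spent) AS maxtotal
          FROM tblA
          GROUP BY userid) AS dt
""").fetchone()[0]

print(top_rows)  # [(3, 2, 50), (4, 3, 5), (5, 1, 15)]
print(total)     # 70
```

Summing over the grouped subquery rather than over the joined rows avoids double counting when a user has tied maximum rows.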

Sum of Counted records that calculated using "group by" with condition and "group by"

Sorry for the fuzzy title of this question.
I have two tables in my database. I want to count the records of first_table grouped by the foreign key fk_id, for the ids that appear in the fk_ids column of second_table (which stores them as a comma-separated list such as "1,2,3,4,5").
first_table:
id | name | fk_id
1 | john | 1
2 | mike | 1
3 | jane | 2
4 | tailor | 1
5 | jane | 3
6 | tailor | 5
7 | jane | 4
8 | tailor | 5
9 | jane | 5
10 | tailor | 5
second_table:
id | name | fk_ids | s_fk_id
1 | xxx | 1,5,6 | 1
2 | yyy | 2,3 | 1
3 | zzz | 9 | 1
4 | www | 7,8 | 1
I wrote the following query, but it does not work properly and displays wrong numbers.
I want to:
1- Count the records in first_table grouped by fk_id
2- Sum the counts whose fk_id exists in fk_ids
3- Display the result (the sum of the related counts) grouped by id.
select sum(if(FIND_IN_SET(`fk_id`, `fk_ids`)>0, `count`, 0)) `sum`, `count`, `from`.`fk_id`, `second_table`.* FROM `second_table`
LEFT JOIN
(
SELECT `fk_id`, count(*) `count`
FROM `first_table`
group BY `fk_id`
) AS `from`
ON FIND_IN_SET(`fk_id`, `fk_ids`)>0
WHERE `second_table`.`s_fk_id`=1
GROUP BY `id`
ORDER by `count` DESC
The table holds a lot of data and we have no plans to change its structure.
Edit:
Desired output:
id | name | sum
1 | xxx | 7 (3+4+0)
2 | yyy | 2 (1+1)
3 | zzz | 0 (0)
4 | www | 0 (0+0)
After two days off I came back to work and found that FIND_IN_SET does not match properly when the list string contains spaces.
The problem was that I had overlooked the spaces in fk_ids.
Finally, this query worked:
select sum(`count`) `sum`, `count`, `from`.`fk_id`, `second_table`.* FROM `second_table`
LEFT JOIN
(
SELECT `fk_id`, count(*) `count`
FROM `first_table`
group BY `fk_id`
) AS `from`
ON FIND_IN_SET(`fk_id`, replace(`fk_ids`,' ',''))>0
WHERE `second_table`.`s_fk_id`=1
GROUP BY `id`
ORDER by `count` DESC
The magic is replace(`fk_ids`,' ',''), which strips the spaces from the list before FIND_IN_SET compares its items.
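To see why the spaces matter, here is a rough Python stand-in for FIND_IN_SET. It is an illustrative sketch, not MySQL's implementation: the helper find_in_set and the spaced values such as "1, 5, 6" are assumptions (the sample table above shows no spaces), and the per-fk_id counts are taken from the sample first_table.

```python
def find_in_set(needle: str, strlist: str) -> int:
    """Approximate FIND_IN_SET: 1-based position of an exact match among
    the comma-separated items, or 0 when there is no exact match."""
    items = strlist.split(",")
    return items.index(needle) + 1 if needle in items else 0


# Hypothetical stored value with a space after each comma.
fk_ids = "1, 5, 6"

print(find_in_set("5", fk_ids))                   # 0, because the item is " 5"
print(find_in_set("5", fk_ids.replace(" ", "")))  # 2

# Counts per fk_id in the sample first_table.
counts = {1: 3, 2: 1, 3: 1, 4: 1, 5: 4}
# second_table rows (id -> fk_ids), written with spaces on purpose.
second_table = {1: "1, 5, 6", 2: "2, 3", 3: "9", 4: "7, 8"}

# Sum the counts of every fk_id found in the space-stripped list.
sums = {
    row_id: sum(
        n
        for fk_id, n in counts.items()
        if find_in_set(str(fk_id), ids.replace(" ", "")) > 0
    )
    for row_id, ids in second_table.items()
}
print(sums)  # {1: 7, 2: 2, 3: 0, 4: 0}
```

The sums agree with the desired output listed in the question.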

SQL order by match to specific row

I have an example table below. I am trying to write a SQL query that gets all user_ids except the current user's and then orders them by the number of columns that match the current user's row.
For example, if the current user has a user_id of 1, I want to get the user_ids of all the other rows and order them from most matches with the current user's row to fewest.
Let's say current_user = 1.
Something like this (pseudocode):
SELECT user_id
FROM assets
WHERE user_id <> current_user
ORDER BY most matches to current_user
The output should be 7,8,3,9,2
I would appreciate anyone's input on how I can effectively achieve this.
Table assets
+----------+---------+-------+--------+-------+
| id | user_id | cars | houses | boats |
+----------+---------+-------+--------+-------+
| 1 | 1 | 3 | 2 | 3 |
| 2 | 8 | 3 | 2 | 5 |
| 3 | 3 | 3 | 2 | 2 |
| 4 | 2 | 5 | 1 | 5 |
| 5 | 9 | 5 | 7 | 3 |
| 8 | 7 | 3 | 2 | 3 |
+----------+---------+-------+--------+-------+
I think you can just do this:
select a.*
from assets a cross join
assets a1
where a1.user_id = 1 and a.user_id <> a1.user_id
order by ( (a.cars = a1.cars) + (a.houses = a1.houses) + (a.boats = a1.boats) ) desc;
In MySQL, a boolean expression is treated as an integer in a numeric context, with 1 for true and 0 for false.
If you want to be fancier, you could order by the total difference:
order by ( abs(a.cars - a1.cars) + abs(a.houses - a1.houses) + abs(a.boats - a1.boats) );
This is called Manhattan distance, and you would be implementing a version of a nearest neighbor model.
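Both orderings can be tried on the sample data. This is a sketch that assumes SQLite as a stand-in for MySQL (SQLite also evaluates a comparison to 1 or 0); the a.id tie-breaker is an addition so that rows with equal scores come out in a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assets (id INTEGER, user_id INTEGER, cars INTEGER, houses INTEGER, boats INTEGER)"
)
conn.executemany(
    "INSERT INTO assets VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 3, 2, 3), (2, 8, 3, 2, 5), (3, 3, 3, 2, 2),
     (4, 2, 5, 1, 5), (5, 9, 5, 7, 3), (8, 7, 3, 2, 3)],
)

# a1 is the current user's row; a ranges over everyone else.
base = """
    SELECT a.user_id
    FROM assets a CROSS JOIN assets a1
    WHERE a1.user_id = ? AND a.user_id <> a1.user_id
"""

# Most matching columns first; a.id breaks ties (added assumption).
by_matches = [r[0] for r in conn.execute(base + """
    ORDER BY ((a.cars = a1.cars) + (a.houses = a1.houses) + (a.boats = a1.boats)) DESC, a.id
""", (1,))]

# Smallest Manhattan distance first.
by_distance = [r[0] for r in conn.execute(base + """
    ORDER BY (abs(a.cars - a1.cars) + abs(a.houses - a1.houses) + abs(a.boats - a1.boats)), a.id
""", (1,))]

print(by_matches)   # [7, 8, 3, 9, 2]
print(by_distance)  # [7, 3, 8, 2, 9]
```

The two orderings differ: user 3 is off by one boat and user 8 by two, so the distance version ranks 3 ahead of 8, while the match count treats them as equal.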

Check multiple columns for duplicate and list all records

I have a table with columns ID, Content and Day. I am trying to find all rows that have duplicate Content and Day values and display all rows
SELECT ID,Content, `Day`, Count(*)
FROM table
GROUP BY Content,`Day`
HAVING COUNT(*) > 1
The current code returns one row per duplicated Content and Day combination. For instance, this data:
ID|Content|Day
1 | a | 1
2 | a | 1
3 | a | 1
4 | b | 2
5 | b | 2
6 | c | 3
7 | c | 4
Will result in:
ID|Content|Day|Count
1 | a | 1 | 3
4 | b | 2 | 2
But I want to display every row in each duplicate group, each with its own ID:
ID|Content|Day
1 | a | 1
2 | a | 1
3 | a | 1
4 | b | 2
5 | b | 2
Just use a subquery that returns the duplicated (Content, Day) pairs:
select *
from table
where (Content, `Day`) in
(
SELECT Content, `Day`
FROM table
GROUP BY Content,`Day`
HAVING COUNT(*) > 1
)
Use that query as a subquery and join it against the table again:
SELECT table.ID, table.Content, table.`Day`
FROM table
INNER JOIN
(
SELECT Content, `Day`, Count(*)
FROM table
GROUP BY Content,`Day`
HAVING COUNT(*) > 1
) sub0
ON sub0.Content = table.Content
AND sub0.`Day` = table.`Day`
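The join-back approach can be checked against the sample data. This is a minimal sketch assuming SQLite in place of MySQL; the table name records is hypothetical, chosen because table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "records" is a hypothetical table name standing in for the asker's table.
conn.execute("CREATE TABLE records (ID INTEGER PRIMARY KEY, Content TEXT, Day INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(1, "a", 1), (2, "a", 1), (3, "a", 1), (4, "b", 2),
     (5, "b", 2), (6, "c", 3), (7, "c", 4)],
)

# Find the duplicated (Content, Day) pairs, then join back to list every row.
rows = conn.execute("""
    SELECT r.ID, r.Content, r.Day
    FROM records r
    INNER JOIN (SELECT Content, Day
                FROM records
                GROUP BY Content, Day
                HAVING COUNT(*) > 1) sub0
            ON sub0.Content = r.Content
           AND sub0.Day = r.Day
    ORDER BY r.ID
""").fetchall()

print(rows)
# [(1, 'a', 1), (2, 'a', 1), (3, 'a', 1), (4, 'b', 2), (5, 'b', 2)]
```

Rows 6 and 7 share Content but not Day, so they are correctly left out.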

MySQL/MariaDB GROUP BY, ORDER BY returns same result twice

Assume I have the following table
+----+--------+--------+
| id | result | person |
+----+--------+--------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 2 | 2 |
| 4 | 4 | 3 |
| 5 | 4 | 1 |
| 6 | 1 | 2 |
+----+--------+--------+
Now I want to get the best result for each person, ordered from high to low, where the best result is the highest value in the result column. So basically I want to GROUP BY person and ORDER BY result. If a person has the same best result more than once, I want only one of those rows returned. So the result I want is this:
+----+--------+--------+
| id | result | person |
+----+--------+--------+
| 4 | 4 | 3 |
| 5 | 4 | 1 |
| 2 | 2 | 2 |
+----+--------+--------+
The following query almost gets me there:
SELECT id, groupbytest.result, groupbytest.person
FROM groupbytest
JOIN (
SELECT MAX(result) as res, person
FROM groupbytest
GROUP BY person
) AS tmp
ON groupbytest.result = tmp.res
AND groupbytest.person = tmp.person
ORDER BY groupbytest.result DESC;
but it returns two rows for the same person if that person achieved the same best result twice, so what I get back is
+----+--------+--------+
| id | result | person |
+----+--------+--------+
| 4 | 4 | 3 |
| 5 | 4 | 1 |
| 2 | 2 | 2 |
| 3 | 2 | 2 |
+----+--------+--------+
If two best results for the same person are identical, only the one with the lowest id should be returned, so instead of the rows with ids 2 and 3, only the row with id 2 should be returned.
Any ideas how to implement this?
Try this (ttable stands for your table; MIN(id) keeps the lowest id among tied best results):
SELECT ttable.* from ttable
inner join
(
SELECT min(ttable.id) as minid FROM `ttable`
inner join (SELECT max(`result`) as res, `person` FROM `ttable` group by person) t
on
ttable.result = t.res
and
ttable.person = t.person
group by ttable.person ) tt
on
ttable.id = tt.minid
order by ttable.result desc
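Since the question asks for the lowest id among tied best results, the tie-break can be checked with MIN(id) on the sample data. This is a sketch assuming SQLite in place of MySQL/MariaDB and the asker's table name groupbytest; the aliases g, g2, t and tt are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE groupbytest (id INTEGER PRIMARY KEY, result INTEGER, person INTEGER)")
conn.executemany(
    "INSERT INTO groupbytest VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 2, 2), (3, 2, 2), (4, 4, 3), (5, 4, 1), (6, 1, 2)],
)

best = conn.execute("""
    SELECT g.id, g.result, g.person
    FROM groupbytest g
    JOIN (
        -- lowest id among each person's rows that hold their best result
        SELECT MIN(g2.id) AS minid
        FROM groupbytest g2
        JOIN (SELECT person, MAX(result) AS res
              FROM groupbytest
              GROUP BY person) t
          ON g2.person = t.person AND g2.result = t.res
        GROUP BY g2.person
    ) tt ON g.id = tt.minid
    ORDER BY g.result DESC, g.id
""").fetchall()

print(best)  # [(4, 4, 3), (5, 4, 1), (2, 2, 2)]
```

Person 2 has two rows with result 2 (ids 2 and 3), and only id 2 survives.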
Check whether tmp on its own gives the correct result; I think tmp groups correctly. The join is what adds the extra rows, because the tied rows have different id values.
Rows with different ids are treated as different rows, no matter whether the other columns are equal. You will not get duplicate rows as long as there are no duplicate ids. Try removing id from the SELECT and adding DISTINCT. Then you should get the result you wanted, but without the id.
Example: imagine rooms numbered with the ids above. Let result be the number of tables in the room and person the number of people. Just because rooms 2 and 3 happen to have the same number of tables and people does not mean they are the same room.

MySQL - Get row with the maximum HISTORY ID for COMPONENT IDs in non-existing months

I have a table INVENTORY which consists of inventory items. I have the following table structure:
INSTALLATION_ID
COMPONENT_ID
HISTORY_ID
ON_STOCK
LAST_CHANGE
I need to obtain the row with the max HISTORY_ID for each component that has no record in the specified LAST_CHANGE month.
Each COMPONENT_ID and INSTALLATION_ID can occur multiple times; the rows are distinguished by their HISTORY_ID.
Example:
I have the following records
COMPONENT_ID | INSTALLATION_ID | HISTORY_ID | LAST_CHANGE
1 | 100 | 1 | 2013-01-02
1 | 100 | 2 | 2013-02-01
1 | 100 | 3 | 2013-04-09
2 | 100 | 1 | 2013-02-22
2 | 100 | 2 | 2013-03-12
2 | 100 | 3 | 2013-07-07
2 | 100 | 4 | 2013-08-11
2 | 100 | 5 | 2013-09-15
2 | 100 | 6 | 2013-09-29
3 | 100 | 1 | 2013-02-14
3 | 100 | 2 | 2013-09-23
4 | 100 | 1 | 2013-04-17
I am now trying to retrieve the rows with the max HISTORY_ID for each component, but ONLY for COMPONENT_IDs that have no record in the specified month.
I have tried the following:
SELECT
INVENTORY.COMPONENT_ID,
INVENTORY.HISTORY_ID
FROM INVENTORY
WHERE INVENTORY.HISTORY_ID = (SELECT
MAX(t2.HISTORY_ID)
FROM INVENTORY t2
WHERE NOT EXISTS
(
SELECT *
FROM INVENTORY t3
WHERE MONTH(t3.LAST_CHANGE) = 9
AND YEAR(t3.LAST_CHANGE)= 2013
AND t3.HISTORY_ID = t2.HISTORY_ID
)
)
AND INVENTORY.INSTALLATION_ID = 200
AND YEAR(INVENTORY.LAST_CHANGE) = 2013
The query seems to have correct syntax, but it times out.
In this particular case, I would like to retrieve the maximum HISTORY_ID for all components except those that have records in September.
Because I need to exclude a component entirely based on the month, I cannot use NOT IN: that would only suppress the September records, and the same component could still show up with another month.
Could anybody give me some pointers? Thanks a lot.
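If I understand correctly what you want, you can do it like this. The HAVING clause keeps a component only if none of its rows falls in September: MONTH(last_change) = 9 evaluates to 1 or 0 for each row, so its maximum over the group is 0 only when there is no September row.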
SELECT component_id, MAX(history_id) history_id
FROM inventory
WHERE last_change BETWEEN '2013-01-01' AND '2013-12-31'
AND installation_id = 100
GROUP BY component_id
HAVING MAX(MONTH(last_change) = 9) = 0
Output:
| COMPONENT_ID | HISTORY_ID |
|--------------|------------|
| 1 | 3 |
| 4 | 1 |
If you always filter by installation_id and the year of last_change, make sure you have a compound index on (installation_id, last_change):
ALTER TABLE inventory ADD INDEX (installation_id, last_change);
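The HAVING trick can be reproduced on the sample rows. This is a sketch that assumes SQLite in place of MySQL; SQLite has no MONTH() function, so strftime('%m', ...) = '09' is swapped in for MONTH(last_change) = 9, and the dates are stored as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (component_id INTEGER, installation_id INTEGER, "
    "history_id INTEGER, last_change TEXT)"
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?)",
    [
        (1, 100, 1, "2013-01-02"), (1, 100, 2, "2013-02-01"), (1, 100, 3, "2013-04-09"),
        (2, 100, 1, "2013-02-22"), (2, 100, 2, "2013-03-12"), (2, 100, 3, "2013-07-07"),
        (2, 100, 4, "2013-08-11"), (2, 100, 5, "2013-09-15"), (2, 100, 6, "2013-09-29"),
        (3, 100, 1, "2013-02-14"), (3, 100, 2, "2013-09-23"),
        (4, 100, 1, "2013-04-17"),
    ],
)

# The comparison yields 1 or 0 per row, so MAX(...) = 0 means
# "no row of this component falls in September".
result = conn.execute("""
    SELECT component_id, MAX(history_id) AS history_id
    FROM inventory
    WHERE last_change BETWEEN '2013-01-01' AND '2013-12-31'
      AND installation_id = 100
    GROUP BY component_id
    HAVING MAX(strftime('%m', last_change) = '09') = 0
    ORDER BY component_id
""").fetchall()

print(result)  # [(1, 3), (4, 1)]
```

Components 2 and 3 each have a September row and drop out as a whole, which is the behaviour the question asks for.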