I have an example table below. I am trying to write a SQL query that returns every user_id except that of the current user, ordered by the number of column values that match the current user's row.
For example, if the current user has a user_id of 1, I want the user_ids from all of the other rows (ids 2-8), ordered from the most matches with the current user's row to the fewest.
Let's say var current_user = 1
Something like this:
SELECT user_id
FROM assets
WHERE user_id <> `current_user`
ORDER BY most matches to `current_user`
The output should be 7, 8, 3, 9, 2.
I would appreciate anyone's input on how I can effectively achieve this.
Table assets
+----------+---------+-------+--------+-------+
| id | user_id | cars | houses | boats |
+----------+---------+-------+--------+-------+
| 1 | 1 | 3 | 2 | 3 |
| 2 | 8 | 3 | 2 | 5 |
| 3 | 3 | 3 | 2 | 2 |
| 4 | 2 | 5 | 1 | 5 |
| 5 | 9 | 5 | 7 | 3 |
| 8 | 7 | 3 | 2 | 3 |
+----------+---------+-------+--------+-------+
I think you can just do this:
select a.*
from assets a cross join
assets a1
where a1.user_id = 1 and a.user_id <> a1.user_id
order by ( (a.cars = a1.cars) + (a.houses = a1.houses) + (a.boats = a1.boats) ) desc;
In MySQL, a boolean expression is treated as an integer in a numeric context, with 1 for true and 0 for false.
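As a quick sanity check, the sketch below loads the question's sample rows into an in-memory SQLite database and runs the same ordering. SQLite is an assumption made only to keep the example self-contained (it also evaluates comparisons to 0 or 1). The `matches` alias and the `a.id` tie-breaker are additions for illustration and are not part of the answer above.

```python
import sqlite3

# In-memory copy of the assets table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assets (id INTEGER, user_id INTEGER, "
    "cars INTEGER, houses INTEGER, boats INTEGER)"
)
conn.executemany(
    "INSERT INTO assets VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 3, 2, 3), (2, 8, 3, 2, 5), (3, 3, 3, 2, 2),
        (4, 2, 5, 1, 5), (5, 9, 5, 7, 3), (8, 7, 3, 2, 3),
    ],
)

current_user = 1  # passed as a bound parameter, not pasted into the SQL

match_rows = conn.execute(
    """
    SELECT a.user_id,
           (a.cars = a1.cars) + (a.houses = a1.houses) + (a.boats = a1.boats) AS matches
    FROM assets a CROSS JOIN assets a1
    WHERE a1.user_id = ? AND a.user_id <> a1.user_id
    ORDER BY matches DESC, a.id  -- a.id only makes the order of ties deterministic
    """,
    (current_user,),
).fetchall()

print(match_rows)  # (user_id, matches) pairs, best match first
```

Users 8 and 3 both match on two columns, so without a tie-breaker their relative order is not guaranteed.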
If you want to be fancier, you could order by the total difference:
order by ( abs(a.cars - a1.cars) + abs(a.houses - a1.houses) + abs(a.boats - a1.boats) );
This is called Manhattan distance, and you would be implementing a version of a nearest neighbor model.
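The distance ordering can be checked the same way. This is a minimal sketch that assumes an in-memory SQLite copy of the sample table; the `distance` alias and the `a.id` tie-breaker are illustrative additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assets (id INTEGER, user_id INTEGER, "
    "cars INTEGER, houses INTEGER, boats INTEGER)"
)
conn.executemany(
    "INSERT INTO assets VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 3, 2, 3), (2, 8, 3, 2, 5), (3, 3, 3, 2, 2),
        (4, 2, 5, 1, 5), (5, 9, 5, 7, 3), (8, 7, 3, 2, 3),
    ],
)

distance_rows = conn.execute(
    """
    SELECT a.user_id,
           abs(a.cars - a1.cars) + abs(a.houses - a1.houses)
             + abs(a.boats - a1.boats) AS distance
    FROM assets a CROSS JOIN assets a1
    WHERE a1.user_id = ? AND a.user_id <> a1.user_id
    ORDER BY distance, a.id  -- smallest distance (closest row) first
    """,
    (1,),
).fetchall()

print(distance_rows)  # (user_id, distance) pairs, nearest first
```

On this data the distance ordering is not the same as the match-count ordering: user 3 (distance 1) comes before user 8 (distance 2), although both match on two columns.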
Related
I have a table that has an auto-incremented numeric primary key. I'm trying to get a count of rows that match a condition, grouped by increments of their primary key. Given the data:
| id | value |
|----|-------|
| 1 | a |
| 2 | b |
| 3 | a |
| 4 | a |
| 5 | b |
| 6 | a |
| 7 | b |
| 8 | a |
| 9 | b |
| 10 | b |
| 11 | a |
| 12 | b |
If I wanted to know how many rows matched value = 'a' for every five rows, the result should be:
| count(0) |
|----------|
| 3 |
| 2 |
| 1 |
I can nest a series of subqueries in the SELECT statement, like so:
SELECT (SELECT count(0)
FROM table
WHERE value = 'a'
AND id > 0
AND id <= 5) AS `1-5`,
(SELECT count(0)
FROM table
WHERE value = 'a'
AND id > 5
AND id <=10) AS `6-10`,
...
But is there a way to do this with a GROUP BY statement or something similar where I don't have to manually write out the increments? If not, is there a more time efficient method than a series of subqueries in the SELECT statement as in the above example?
You could divide the ID by 5 and then ceil the result:
SELECT CONCAT((CEIL(id / 5.0) - 1) * 5 + 1, '-', CEIL(id / 5.0) * 5), COUNT(*)
FROM mytable
WHERE value = 'a'
GROUP BY CEIL(id / 5.0)
The following aggregate query should do the trick:
SELECT CEIL(id/5), COUNT(*)
FROM `table` -- TABLE is a reserved word in MySQL, so the name needs backticks
WHERE value = 'a'
GROUP BY CEIL(id/5)
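To check the bucketing against the twelve sample rows, here is a small sketch using an in-memory SQLite database (an assumption for self-containment). Because not every SQLite build ships CEIL, the sketch swaps in integer division, (id + 4) / 5, which gives the same bucket number as CEIL(id / 5) for positive integer ids. The `lo`, `hi` and `n` aliases are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, value TEXT)")
# Values for ids 1..12, in order, as listed in the question.
conn.executemany(
    "INSERT INTO mytable (id, value) VALUES (?, ?)",
    list(enumerate("abaabababbab", start=1)),
)

bucket_rows = conn.execute(
    """
    SELECT ((id + 4) / 5 - 1) * 5 + 1 AS lo,  -- first id of the bucket
           ((id + 4) / 5) * 5         AS hi,  -- last id of the bucket
           COUNT(*)                   AS n
    FROM mytable
    WHERE value = 'a'
    GROUP BY (id + 4) / 5
    ORDER BY lo
    """
).fetchall()

print(bucket_rows)  # (lo, hi, n) per bucket of five ids
```

A bucket that contains no matching rows does not appear in the result at all, since the WHERE clause removes its rows before grouping.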
+------+---------+--------+---------+---------+---------+
| id | user_id | obj_id | created | applied | content |
+------+---------+--------+---------+---------+---------+
| 1 | 1 | 1 | 1 | 1 | ... |
| 2 | 1 | 2 | 1 | 1 | ... |
| 3 | 1 | 1 | 1 | 2 | ... |
| 4 | 1 | 2 | 2 | 2 | ... |
| 5 | 2 | 1 | 1 | 1 | ... |
| 6 | 2 | 2 | 1 | 1 | ... |
+------+---------+--------+---------+---------+---------+
I have a table similar to the one above. id, user_id and obj_id are foreign keys; created and applied are timestamps stored as integers. I need to get the entire row, grouped by user_id and obj_id, with the maximum value of applied. If two rows have the same applied value, I need to favour the maximum value of created. So for the above data, my desired output is:
+------+---------+--------+---------+---------+---------+
| id | user_id | obj_id | created | applied | content |
+------+---------+--------+---------+---------+---------+
| 3 | 1 | 1 | 1 | 2 | ... |
| 4 | 1 | 2 | 2 | 2 | ... |
| 5 | 2 | 1 | 1 | 1 | ... |
| 6 | 2 | 2 | 1 | 1 | ... |
+------+---------+--------+---------+---------+---------+
My current solution is to get everything ordered by applied then created:
select * from data order by applied desc, created desc;
and sort things out in the code, but this table gets pretty big and I'd like an SQL solution that just gets the data I need.
select *
from my_table
where id in (
/* inner subquery B: one id per (user_id, obj_id) pair */
select max(id)
from my_table where
(user_id, obj_id, applied, created) in (
/* inner subquery A */
select user_id, obj_id, max(applied), max(created)
from my_table
group by user_id, obj_id
)
group by user_id, obj_id
);
Inner subquery A returns one row per (user_id, obj_id) pair, holding user_id, obj_id, max(applied) and max(created). Subquery B uses these with the IN clause to retrieve a single id for each pair, namely the id of a row whose applied and created values equal those maxima. This gives you the collection of valid ids for your result. Note that max(applied) and max(created) are computed independently, so a pair is matched only when its most recently applied row also carries the pair's highest created value.
The main select uses these ids to fetch the rows you need.
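The behaviour of the tuple comparison can be tried out on a small scale. The sketch below assumes an in-memory SQLite database (row-value IN needs SQLite 3.15 or later) and groups the inner id subquery by user_id, obj_id so that it yields one id per pair. The `mixed` rows are hypothetical data, chosen so that the highest applied and the highest created values sit on different rows.

```python
import sqlite3

TUPLE_IN_QUERY = """
    SELECT id FROM my_table
    WHERE id IN (
        SELECT MAX(id) FROM my_table
        WHERE (user_id, obj_id, applied, created) IN (
            SELECT user_id, obj_id, MAX(applied), MAX(created)
            FROM my_table
            GROUP BY user_id, obj_id
        )
        GROUP BY user_id, obj_id
    )
    ORDER BY id
"""

def selected_ids(rows):
    """Load (id, user_id, obj_id, created, applied) rows and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER, user_id INTEGER, obj_id INTEGER, "
        "created INTEGER, applied INTEGER)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?)", rows)
    return [row[0] for row in conn.execute(TUPLE_IN_QUERY)]

# Sample rows from the question (content column omitted).
sample = [
    (1, 1, 1, 1, 1), (2, 1, 2, 1, 1), (3, 1, 1, 1, 2),
    (4, 1, 2, 2, 2), (5, 2, 1, 1, 1), (6, 2, 2, 1, 1),
]
# Hypothetical pair: row 1 has the highest created, row 2 the highest applied.
mixed = [(1, 1, 1, 5, 1), (2, 1, 1, 1, 2)]

sample_ids = selected_ids(sample)
mixed_ids = selected_ids(mixed)
print(sample_ids, mixed_ids)
```

For the `mixed` rows the query returns nothing, because no single row holds both maxima; under the stated rule (highest applied first) row 2 would be the expected result.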
Thanks to Mark Heintz in the comments, this answer got me to where I need to be.
SELECT
data.id,
data.user_id,
data.obj_id,
data.created,
data.applied,
data.content
FROM data
LEFT JOIN data next_max_applied ON
next_max_applied.user_id = data.user_id AND
next_max_applied.obj_id = data.obj_id AND (
next_max_applied.applied > data.applied OR (
next_max_applied.applied = data.applied AND
next_max_applied.created > data.created
)
)
WHERE next_max_applied.applied IS NULL
GROUP BY data.user_id, data.obj_id;
See the linked answer for details on how it works. In short, the left join tries to find a more recently applied row for the same user and object. If there isn't one, it looks for a row applied at the same time but created more recently.
The above means that any row without a more recent row to replace it will have a next_max_applied.applied value of null. These rows are filtered for by the IS NULL clause.
Finally, the group by clause handles any rows that have identical user, object, applied and created columns.
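The three steps described above (left join, IS NULL filter, final GROUP BY) can be exercised with a short sketch. It assumes an in-memory SQLite database, which tolerates non-aggregated columns next to GROUP BY much as MySQL does without ONLY_FULL_GROUP_BY. The `mixed` and `dupes` rows are hypothetical test data, and only the id column is selected to keep the output short.

```python
import sqlite3

ANTI_JOIN_QUERY = """
    SELECT data.id
    FROM data
    LEFT JOIN data AS next_max_applied ON
        next_max_applied.user_id = data.user_id AND
        next_max_applied.obj_id = data.obj_id AND (
            next_max_applied.applied > data.applied OR (
                next_max_applied.applied = data.applied AND
                next_max_applied.created > data.created
            )
        )
    WHERE next_max_applied.applied IS NULL  -- no better row exists
    GROUP BY data.user_id, data.obj_id      -- collapses exact ties
    ORDER BY data.id
"""

def latest_ids(rows):
    """Load (id, user_id, obj_id, created, applied) rows and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (id INTEGER, user_id INTEGER, obj_id INTEGER, "
        "created INTEGER, applied INTEGER)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", rows)
    return [row[0] for row in conn.execute(ANTI_JOIN_QUERY)]

sample = [
    (1, 1, 1, 1, 1), (2, 1, 2, 1, 1), (3, 1, 1, 1, 2),
    (4, 1, 2, 2, 2), (5, 2, 1, 1, 1), (6, 2, 2, 1, 1),
]
mixed = [(1, 1, 1, 5, 1), (2, 1, 1, 1, 2)]  # applied wins over created
dupes = [(1, 1, 1, 1, 1), (2, 1, 1, 1, 1)]  # identical applied and created

sample_ids = latest_ids(sample)
mixed_ids = latest_ids(mixed)
dupe_ids = latest_ids(dupes)
print(sample_ids, mixed_ids, dupe_ids)
```

For the `dupes` rows both candidates survive the IS NULL filter and the GROUP BY keeps one of them; which id is reported is left to the database.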
I'm trying to get a distinct list of results, distinct based on user, where the selected result would be based on a set of parameters. To break it down, I have users, logs, and files. Each user can be on multiple logs and can have multiple files. Files CAN be associated with logs or not, and can also have a 'billing' flag set to true. What I'm trying to do when someone selects a log is bring up the list of files most closely associated with both the 'billing' flag and the log.
If the user has a file that is associated with the log AND has the 'billing' flag set to true, that is the result for that user.
If that is not available, the next choice is a file that only has the 'billing' flag set to true (associated with the highest log, or with none).
If that is not available, the file with the highest log number.
Here is the generalization of the tables:
Test Table:
+----+------+-----+
| ID | user | log |
+----+------+-----+
| 1 | 1 | 2 |
| 2 | 1 | 2 |
| 3 | 2 | 2 |
| 4 | 3 | 2 |
| 5 | 3 | 2 |
| 6 | 4 | 2 |
+----+------+-----+
File Table:
+----+-------+-----+---------+------+
| ID | file | log | billing | user |
+----+-------+-----+---------+------+
| 1 | a.pdf | 2 | 0 | 1 |
| 2 | b.pdf | 3 | 1 | 1 |
| 3 | c.pdf | 1 | 0 | 2 |
| 4 | d.pdf | 2 | 1 | 2 |
| 5 | e.pdf | 1 | 0 | 3 |
| 6 | f.pdf | 3 | 0 | 3 |
| 7 | g.pdf | 0 | 1 | 4 |
| 8 | h.pdf | 1 | 0 | 4 |
| 9 | i.pdf | 2 | 1 | 4 |
| 10 | j.pdf | 3 | 0 | 4 |
+----+-------+-----+---------+------+
In this case I would want to get:
+------+-------+-----+---------+
| user | file | log | billing |
+------+-------+-----+---------+
| 1 | b.pdf | 3 | 1 |
| 2 | d.pdf | 2 | 1 |
| 3 | f.pdf | 3 | 0 |
| 4 | i.pdf | 2 | 1 |
+------+-------+-----+---------+
My simplified query so far returns all files for the users but I'm having trouble grouping based on the above parameters.
SELECT
user,
file,
log,
billing
FROM
files
WHERE
user IN (
SELECT
DISTINCT(user)
FROM
tests
WHERE
log = 2
)
ORDER BY
CASE
WHEN log = 2 AND billing = 1 THEN 1
WHEN billing = 1 THEN 2
ELSE -1
END
Any help would be greatly appreciated.
You can use a separate query to get the results based on each of the 3 criteria specified in the OP, then UNION the results from these queries and fetch result from first query if available, otherwise from second query, otherwise from third query:
SELECT user, file, log, billing
FROM (
SELECT @row_number:=CASE WHEN @user=user THEN @row_number+1
ELSE 1
END AS row_number,
@user:=user AS user,
file, log, billing
FROM (
-- 1st query: has biggest priority
SELECT 1 AS pri, t.user, f.file, f.log, f.billing
FROM (SELECT DISTINCT user, log
FROM tests
WHERE log = 2) AS t
INNER JOIN files AS f
ON (t.user = f.user AND t.log = f.log AND f.billing = 1)
UNION ALL
-- 2nd query: priority = 2
SELECT 2 AS pri, t.user, f.file, f.log, f.billing
FROM (SELECT DISTINCT user, log
FROM tests
WHERE log = 2) AS t
INNER JOIN files AS f
ON (t.user = f.user AND f.billing = 1)
WHERE f.log > t.log OR f.log = 0
UNION ALL
-- 3rd query: priority = 3
SELECT 3 AS pri, t.user, f.file, f.log, f.billing
FROM (SELECT DISTINCT user, log
FROM tests
WHERE log = 2) AS t
INNER JOIN files AS f ON (t.user = f.user)
ORDER BY user, pri, log DESC ) s ) r
WHERE r.row_number = 1
ORDER BY user
The pri column is used to distinguish and prioritize the results of the three separate queries. The @row_number and @user variables are used to simulate the ROW_NUMBER() OVER (PARTITION BY user ORDER BY pri) window function. Using row_number in the outermost query, we can select the required record, i.e. the record with the highest priority within each 'user' partition.
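Where window functions are available (MySQL 8.0 and later, or SQLite 3.25 and later), ROW_NUMBER() can be used directly instead of being simulated. The sketch below is one possible formulation, not the answer's query: it assumes an in-memory SQLite database, ranks files with the CASE priorities from the question (with ELSE 3 so that the fallback sorts last), and breaks ties by the highest log. The `rn` alias, the `ranked` subquery name and the `selected_log` variable are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tests (id INTEGER, user INTEGER, log INTEGER);
    CREATE TABLE files (id INTEGER, file TEXT, log INTEGER,
                        billing INTEGER, user INTEGER);
    """
)
conn.executemany(
    "INSERT INTO tests VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 1, 2), (3, 2, 2), (4, 3, 2), (5, 3, 2), (6, 4, 2)],
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a.pdf", 2, 0, 1), (2, "b.pdf", 3, 1, 1),
        (3, "c.pdf", 1, 0, 2), (4, "d.pdf", 2, 1, 2),
        (5, "e.pdf", 1, 0, 3), (6, "f.pdf", 3, 0, 3),
        (7, "g.pdf", 0, 1, 4), (8, "h.pdf", 1, 0, 4),
        (9, "i.pdf", 2, 1, 4), (10, "j.pdf", 3, 0, 4),
    ],
)

selected_log = 2

best_files = conn.execute(
    """
    SELECT user, file, log, billing
    FROM (
        SELECT f.user, f.file, f.log, f.billing,
               ROW_NUMBER() OVER (
                   PARTITION BY f.user
                   ORDER BY CASE
                                WHEN f.log = :log AND f.billing = 1 THEN 1
                                WHEN f.billing = 1 THEN 2
                                ELSE 3
                            END,
                            f.log DESC
               ) AS rn
        FROM files AS f
        WHERE f.user IN (SELECT user FROM tests WHERE log = :log)
    ) AS ranked
    WHERE rn = 1      -- keep only the best-ranked file per user
    ORDER BY user
    """,
    {"log": selected_log},
).fetchall()

print(best_files)
```

On the sample tables this picks b.pdf, d.pdf, f.pdf and i.pdf for users 1 to 4, which is the result the question asks for.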
Assume I have the following table
+----+--------+--------+
| id | result | person |
+----+--------+--------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 2 | 2 |
| 4 | 4 | 3 |
| 5 | 4 | 1 |
| 6 | 1 | 2 |
+----+--------+--------+
Now I want to get the best result for each person, ordered from high to low, where the best result means the highest value in the result column. So basically I want to GROUP BY person and ORDER BY result. Also, if a person has the same result more than once, I only want to return one of those results. So the result I want is this:
+----+--------+--------+
| id | result | person |
+----+--------+--------+
| 4 | 4 | 3 |
| 5 | 4 | 1 |
| 2 | 2 | 2 |
+----+--------+--------+
The following query almost gets me there:
SELECT id, groupbytest.result, groupbytest.person
FROM groupbytest
JOIN (
SELECT MAX(result) as res, person
FROM groupbytest
GROUP BY person
) AS tmp
ON groupbytest.result = tmp.res
AND groupbytest.person = tmp.person
ORDER BY groupbytest.result DESC;
but returns two rows for the same person, if this person has made the same best result twice, so what I get back is
+----+--------+--------+
| id | result | person |
+----+--------+--------+
| 4 | 4 | 3 |
| 5 | 4 | 1 |
| 2 | 2 | 2 |
| 3 | 2 | 2 |
+----+--------+--------+
If two results for the same person are equal, only the one with the lowest id should be returned, so instead of returning the rows with ids 2 and 3, only the row with id 2 should be returned.
Any ideas how to implement this?
Try this:
SELECT ttable.* from ttable
inner join
(
SELECT min(ttable.id) as minid FROM `ttable`
inner join (SELECT max(`result`) as res, `person` FROM `ttable` group by person) t
on
ttable.result = t.res
and
ttable.person = t.person
group by ttable.person ) tt
on
ttable.id = tt.minid
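To confirm which rows come back when the lowest id is kept for each person's best result, here is a minimal sketch against an in-memory SQLite database (an assumption for self-containment). It takes the lowest id, as the question requests, and the final ORDER BY and the short aliases (`t`, `x`, `b`, `m`) are additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ttable (id INTEGER, result INTEGER, person INTEGER)")
conn.executemany(
    "INSERT INTO ttable VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 2, 2), (3, 2, 2), (4, 4, 3), (5, 4, 1), (6, 1, 2)],
)

best_rows = conn.execute(
    """
    SELECT t.id, t.result, t.person
    FROM ttable AS t
    JOIN (
        -- lowest id among each person's best-result rows
        SELECT MIN(x.id) AS minid
        FROM ttable AS x
        JOIN (
            SELECT person, MAX(result) AS res
            FROM ttable
            GROUP BY person
        ) AS b ON x.person = b.person AND x.result = b.res
        GROUP BY x.person
    ) AS m ON t.id = m.minid
    ORDER BY t.result DESC, t.id
    """
).fetchall()

print(best_rows)  # (id, result, person), best result first
```

Person 2 reaches the best result 2 in rows 2 and 3, and only row 2 is kept.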
Check whether tmp produces the correct table on its own. I think tmp groups correctly. The join adds new rows because you have different values of "id".
Rows with different ids are treated as different rows, no matter whether the other columns are equal. You will not get duplicate results as long as there is no duplicate id. Try removing the id from the SELECT. Then you should get the result you wanted, but without the id.
Example: Imagine rooms numbered with your ids from above. Let result be the number of tables in the room and person the number of people. Just because rooms 2 and 3 happen to have the same number of tables and people, it doesn't mean that they are the same room.
I want to filter the users who have selected certain entities. For example, if I filter on the entities with ids 1 and 3, I expect to get only the users who have both of these entities.
The number of selected entities can vary.
The query I am using is
SELECT user_id from user_entities where entity_id IN(1,3)
but, for obvious reasons, it matches these rows:
+----+-----------+---------+--------+
| ID | ENTITY_ID | USER_ID | STATUS |
+----+-----------+---------+--------+
| 1 | 1 | 3 | 1 |
| 2 | 3 | 3 | 1 |
| 7 | 1 | 2 | 1 |
| 29 | 3 | 1 | 1 |
+----+-----------+---------+--------+
If I apply DISTINCT to it, I get user ids 1, 2 and 3, but I only want user 3, as this is the only user that has both entities.
What can I change to get exactly that result?
You could join the table to itself specifying both IDs as part of the join condition:
SELECT e1.user_id
FROM user_entities e1
INNER JOIN user_entities e2
ON e1.user_id = e2.user_id AND
e1.entity_id = 1 AND
e2.entity_id = 3;
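The self-join handles exactly two entities. Since the question says the number of selected entities can vary, a different technique may fit better: filter with IN, group by user, and keep the groups whose count of distinct entities equals the number requested. The sketch below illustrates that alternative under an in-memory SQLite database; `users_with_all` is a hypothetical helper name, and the same SQL pattern should also work in MySQL.

```python
import sqlite3

def users_with_all(conn, entity_ids):
    """Return the user_ids that have every entity in entity_ids."""
    wanted = sorted(set(entity_ids))  # duplicates would break the count check
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT user_id
        FROM user_entities
        WHERE entity_id IN ({placeholders})
        GROUP BY user_id
        HAVING COUNT(DISTINCT entity_id) = ?
        ORDER BY user_id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_entities (id INTEGER, entity_id INTEGER, "
    "user_id INTEGER, status INTEGER)"
)
# The four rows shown in the question.
conn.executemany(
    "INSERT INTO user_entities VALUES (?, ?, ?, ?)",
    [(1, 1, 3, 1), (2, 3, 3, 1), (7, 1, 2, 1), (29, 3, 1, 1)],
)

both = users_with_all(conn, [1, 3])
only_first = users_with_all(conn, [1])
print(both, only_first)
```

Only the placeholders are generated dynamically; the entity ids themselves are always passed as bound parameters.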