I have a relation built from two integers, photo_id and user_id, and a string, info (this is the tag).
The primary key is (user_id, photo_id, info).
photo_id | user_id | info
---------------------------
5 | 3 | aa
7 | 6 | aa
2 | 2 | bb
1 | 2 | cc
1 | 9 | aa
2 | 8 | cc
1 | 4 | cc
9 | 9 | cc
I'm trying to find the k most common tags in my relation.
(secondary sort is by tags).
In this example I would like to get:
k=2 : aa , cc
k=1 : cc
By using this sql query :
SELECT info,tagCount
FROM (SELECT info, COUNT(photo_id) as tagCount
FROM Tags
GROUP BY info
ORDER BY tagCount DESC, info ASC) T
WHERE (SELECT count(info) FROM T T1
WHERE ((T1.tagCount > T.tagCount) OR
(T1.tagCount = T.tagCount AND T1.info < T.info))) < 'k';
But I get the error:
SQL error:
ERROR: relation "t" does not exist
Where is my mistake?
While I remain unclear on what you are trying to achieve, and assuming the query is for MySQL (not "sql server"), the following may help. Please note that the cause of the error message is that alias T refers to a result set, but you cannot reuse that entire result set in the WHERE clause (the subquery T1 assumes that you can reuse T). Regrettably, MySQL (at the time of writing) does not support common table expressions, which would allow referencing T like this:
/* T as a common table expression (CTE) */
with T as (
SELECT info, COUNT(photo_id) as tagCount
FROM Tags
GROUP BY info
)
SELECT info,tagCount
, (SELECT count(info) FROM T T1
WHERE (T1.tagCount > T.tagCount) OR
(T1.tagCount = T.tagCount AND T1.info < T.info)
) as k
FROM T
ORDER BY tagCount DESC, info ASC
;
So, in the absence of a CTE capability, you have to repeat the initial subquery, like this:
SELECT
info
, tagCount
, (
SELECT
COUNT(info)
FROM (
SELECT
info
, COUNT(photo_id) AS tagCount
FROM Tags
GROUP BY
info
) T1
WHERE (T1.tagCount > T.tagCount)
OR (T1.tagCount = T.tagCount
AND T1.info < T.info)
)
AS k
FROM (
SELECT
info
, COUNT(photo_id) AS tagCount
FROM Tags
GROUP BY
info
) T
ORDER BY
tagCount DESC
, info ASC
;
and the result of that query (from the sample data) is as follows:
| info | tagCount | k |
|------|----------|---|
| cc | 4 | 0 |
| aa | 3 | 1 |
| bb | 1 | 2 |
Now, exactly how you derive the "expected result" shown in the question (where tag "bb" is not included) I remain unclear.
By the way, another issue in your original query is that the WHERE clause predicate compares an integer to 'k':
where (select count(info) ....) < 'k'
count(info) is an integer, 'k' is a string, so it will fail.
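To make the ranking idea concrete, here is a minimal sketch that runs the CTE form against the sample data and filters on an integer k. It uses Python's built-in sqlite3 module purely as a convenient test bed (an assumption, since the question targets a different database), and the helper name top_tags is hypothetical.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tags (photo_id INTEGER, user_id INTEGER, info TEXT)")
conn.executemany(
    "INSERT INTO Tags VALUES (?, ?, ?)",
    [(5, 3, "aa"), (7, 6, "aa"), (2, 2, "bb"), (1, 2, "cc"),
     (1, 9, "aa"), (2, 8, "cc"), (1, 4, "cc"), (9, 9, "cc")],
)


def top_tags(k):
    """Return the k most common tags; ties are broken alphabetically."""
    rows = conn.execute(
        """
        WITH T AS (
            SELECT info, COUNT(photo_id) AS tagCount
            FROM Tags
            GROUP BY info
        )
        SELECT info
        FROM T
        -- Keep a tag only if fewer than k tags rank ahead of it.
        WHERE (SELECT COUNT(*)
               FROM T AS T1
               WHERE T1.tagCount > T.tagCount
                  OR (T1.tagCount = T.tagCount AND T1.info < T.info)) < ?
        ORDER BY tagCount DESC, info ASC
        """,
        (k,),  # k is bound as an integer, not the string 'k'
    ).fetchall()
    return [info for (info,) in rows]


print(top_tags(1))
print(top_tags(2))
```

On a database that supports it, ORDER BY tagCount DESC, info ASC with LIMIT k should give the same rows with less work; the counting subquery is kept here only because it mirrors the original attempt.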
This may only be a step toward your solution as I don't completely understand the question. I think you need to count(distinct column) then use a much simpler where clause.
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE Tags
(`photo_id` int, `user_id` int, `info` varchar(2))
;
INSERT INTO Tags
(`photo_id`, `user_id`, `info`)
VALUES
(5, 3, 'aa'),
(7, 6, 'aa'),
(2, 2, 'bb'),
(1, 2, 'cc'),
(1, 9, 'aa'),
(2, 8, 'cc'),
(1, 4, 'cc'),
(9, 9, 'cc')
;
Query 1:
SELECT
info
, COUNT(distinct photo_id) AS photoCount
, COUNT(distinct user_id) AS userCount
FROM Tags
GROUP BY
info
ORDER BY
photoCount DESC
, userCount DESC
, info ASC
Results:
| info | photoCount | userCount |
|------|------------|-----------|
| cc | 3 | 4 |
| aa | 3 | 3 |
| bb | 1 | 1 |
Following query...
SELECT event_id, user_id FROM EventUser WHERE user_id IN (1, 2)
...gives me the following result:
+----------+---------+
| event_id | user_id |
+----------+---------+
| 3 | 1 |
| 2 | 1 |
| 1 | 1 |
| 5 | 1 |
| 4 | 1 |
| 6 | 1 |
| 4 | 2 |
| 2 | 2 |
| 1 | 2 |
| 5 | 2 |
+----------+---------+
Now, I want to modify the above query so that I only get for example two rows for each user_id, eg:
+----------+---------+
| event_id | user_id |
+----------+---------+
| 3 | 1 |
| 2 | 1 |
| 4 | 2 |
| 5 | 2 |
+----------+---------+
I am thinking about something like this, which of course does not work:
SELECT event_id, user_id FROM EventUser WHERE user_id IN (1, 2) LIMIT 2 by user_id
Ideally, this should work with offsets as well because I want to use it for paginations.
For performance reasons it is essential to use the WHERE user_id IN (1, 2) part of the query.
One method -- assuming you have at least two rows for each user -- would be:
(select min(event_id) as event_id, user_id
from t
where user_id in (1, 2)
group by user_id
) union all
(select max(event_id) as event_id, user_id
from t
where user_id in (1, 2)
group by user_id
);
Admittedly, this is not a "general" solution, but it might be the simplest solution for what you want.
If you want the two biggest or smallest, then an alternative also works:
select t.*
from t
where t.user_id in (1, 2) and
t.event_id >= (select t2.event_id
from t t2
where t2.user_id = t.user_id
order by t2.event_id desc
limit 1, 1
);
Here is a dynamic example for such problems. Please note that this example works in SQL Server; I could not try it on MySQL for now. Please let me know how it works.
CREATE TABLE mytable
(
number INT,
score INT
)
INSERT INTO mytable VALUES ( 1, 100)
INSERT INTO mytable VALUES ( 2, 100)
INSERT INTO mytable VALUES ( 2, 120)
INSERT INTO mytable VALUES ( 2, 110)
INSERT INTO mytable VALUES ( 3, 120)
INSERT INTO mytable VALUES ( 3, 150)
SELECT *
FROM mytable m
WHERE
(
SELECT COUNT(*)
FROM mytable m2
WHERE m2.number = m.number AND
m2.score >= m.score
) <= 2
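The same counting trick can be sketched against the EventUser data from the question. This is only an illustration under a few assumptions: it runs on Python's built-in sqlite3 rather than MySQL or SQL Server, it treats (user_id, event_id) as unique, and it picks the rows with the highest event_id per user, which the question does not require. The function name top_events_per_user is made up for the example.

```python
import sqlite3

# In-memory copy of the EventUser rows shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EventUser (event_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO EventUser VALUES (?, ?)",
    [(3, 1), (2, 1), (1, 1), (5, 1), (4, 1), (6, 1),
     (4, 2), (2, 2), (1, 2), (5, 2)],
)


def top_events_per_user(n):
    """Return at most n rows per user, highest event_id first."""
    return conn.execute(
        """
        SELECT e.event_id, e.user_id
        FROM EventUser AS e
        WHERE e.user_id IN (1, 2)
          -- Count this user's rows at or above the current event_id;
          -- the row is in the top n when that count is at most n.
          AND (SELECT COUNT(*)
               FROM EventUser AS e2
               WHERE e2.user_id = e.user_id
                 AND e2.event_id >= e.event_id) <= ?
        ORDER BY e.user_id, e.event_id DESC
        """,
        (n,),
    ).fetchall()


print(top_events_per_user(2))
```

For pagination, the same count can be compared against a range (for example, greater than the offset and at most offset plus page size) instead of a single upper bound.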
How about this?
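SELECT event_id, user_id
FROM (
SELECT event_id, user_id, row_number() OVER (PARTITION BY user_id) AS row_num
FROM EventUser WHERE user_id in (1,2)) AS ranked
WHERE row_num <= n;
Here n can be whatever limit you want. Note that the derived table needs an alias (ranked here); add an ORDER BY inside OVER (...) if you need a deterministic choice of rows.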
A late answer, but it may help: this uses a derived table and a cross join.
For the example in this post the query will be this:
SELECT
@row_number:=CASE
WHEN @user_no = user_id
THEN
@row_number + 1
ELSE
1
END AS num,
@user_no:=user_id userid, event_id
FROM
EventUser,
(SELECT @user_no:=0,@row_number:=0) as t
group by user_id,event_id
having num < 3;
More information in this link.
Assuming that I have the following database table:
ID | Name | Type | Value|
--- --------- ---------- ------
1 | First | A | 10 |
2 | First | B | 20 |
3 | First | C | 30 |
4 | First | D | 40 |
5 | Second | A | 10 |
6 | Second | B | 20 |
and a previous query returned:
ID | Name | Type | Value|
--- --------- ---------- ------
1 | Third | A | 10 |
2 | Third | B | 20 |
3 | Third | C | 30 |
My question is:
What is the best way to query the first table and get all records that have at least all the type returned in the previous query?
In the above example the name "Third" has types A B C. Using these as a list, I would like to retrieve only the "First" records (as "First" has A B C D) but not "Second" (as "Second" has only A B - missing C).
The IN statement matches everything, and I want the query to match at least all items in my "type" list. The list does not necessarily come from an SQL statement; it can also be provided directly.
EDIT: I'm working with MySQL
Query
Included are two variations of the same query for either database.
MySQL
DBFiddle
SELECT main.*
FROM main
LEFT JOIN (
SELECT name, json_arrayagg(type) as type
FROM main
GROUP BY name
) AS main_agg USING(name)
WHERE EXISTS (
SELECT 1
FROM (
select json_arrayagg(type) as type
from query
group by name
) AS query_agg
WHERE JSON_CONTAINS(main_agg.type, query_agg.type)
)
groups types by name
uses the JSON_CONTAINS function to compare the table to the query
Postgres
SQLFiddle
WITH main_agg AS
(
SELECT name, array_agg(type) "type"
FROM main
GROUP BY name
)
SELECT main.*
FROM main
JOIN main_agg USING(name)
WHERE EXISTS (
SELECT 1
FROM (select array_agg(type) "type" from query group by name) query_agg
WHERE main_agg."type" @> query_agg."type"
)
groups types by name
utilizes the Array @> (contains operator) to compare to the query
Setup
(Works for MySQL or PostgreSQL)
CREATE TABLE main
(ID int, Name varchar(6), Type varchar(1), Value int)
;
INSERT INTO main
(ID, Name, Type, Value)
VALUES
(1, 'First', 'A', 10),
(2, 'First', 'B', 20),
(3, 'First', 'C', 30),
(4, 'First', 'D', 40),
(5, 'Second', 'A', 10),
(6, 'Second', 'B', 20)
;
CREATE TABLE query
(ID int, Name varchar(5), Type varchar(1), Value int)
;
INSERT INTO query
(ID, Name, Type, Value)
VALUES
(1, 'Third', 'A', 10),
(2, 'Third', 'B', 20),
(3, 'Third', 'C', 30)
;
In standard SQL, you can do this as a join and group by with some filtering. The following assumes that the types are unique for each name in each table:
with prevq as (
. . .
)
select t.name
from t join
prevq
on t.type = prevq.type
group by t.name
having count(*) = (select count(*) from prevq);
EDIT:
MySQL does not support CTEs (before 8.0). This is easy enough to do without:
select t.name
from t join
(<your query here>) prevq
on t.type = prevq.type
group by t.name
having count(*) = (select count(*) from (<your query here>) prevq);
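Since the question notes that the type list may be supplied directly rather than come from a query, here is a small sketch of the same group-and-count idea driven by a Python list. It is an illustration only: it runs on Python's built-in sqlite3 instead of MySQL, the function name names_with_all_types is hypothetical, and COUNT(DISTINCT ...) is used so that it does not depend on types being unique per name.

```python
import sqlite3

# In-memory copy of the "main" table from the setup above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main (ID INTEGER, Name TEXT, Type TEXT, Value INTEGER)")
conn.executemany(
    "INSERT INTO main VALUES (?, ?, ?, ?)",
    [(1, "First", "A", 10), (2, "First", "B", 20), (3, "First", "C", 30),
     (4, "First", "D", 40), (5, "Second", "A", 10), (6, "Second", "B", 20)],
)


def names_with_all_types(required_types):
    """Return names that have at least every type in required_types."""
    wanted = sorted(set(required_types))  # drop duplicates in the input list
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT Name
        FROM main
        WHERE Type IN ({placeholders})
        GROUP BY Name
        -- A name qualifies when it matched every distinct required type.
        HAVING COUNT(DISTINCT Type) = ?
        ORDER BY Name
    """
    rows = conn.execute(sql, (*wanted, len(wanted))).fetchall()
    return [name for (name,) in rows]


print(names_with_all_types(["A", "B", "C"]))
```

Only the placeholder list is interpolated into the SQL text; the type values themselves are still passed as bound parameters.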
I have the following table test:
+----+-------+
| id | value |
+----+-------+
| 1 | -3 |
| 2 | -5 |
| 3 | 10 |
| 4 | -1 |
+----+-------+
For MIN(value) I get -5, for MAX(value) I get 10, and for SUM(value) I get 1. However, I would like to get the minimum and maximum value when progressing through the table step by step.
Example 1: SELECT AWESOME_FUNCTION_SUM_MIN(value) FROM test ORDER BY id ASC
This should return -8 (first row is -3, plus the second row -5 results in the lowest value over the course of all values).
Example 2: SELECT AWESOME_FUNCTION_SUM_MAX(value) FROM test ORDER BY id ASC
This should return 2 (first row -3, second -5, and third row +10 leads to the highest value over the course of all values).
Obviously, ORDER BY does not really make sense, since it is used for ordering the results of a query, but I used it here anyways for demonstration purposes. To me, this is such a basic functionality, so I was surprised to find nothing about it. I potentially am using the wrong keywords. Can somebody help me out? Or do I have to extract all values and do the analysis externally (=not with MySQL)?
Create table/insert data.
CREATE TABLE test
(`id` INT, `value` INT)
;
INSERT INTO test
(`id`, `value`)
VALUES
(1, -3),
(2, -5),
(3, 10),
(4, -1)
;
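MySQL doesn't have those functions, but for this data you can simulate them using a self join. Note that both queries rely on the first positive value marking the turning point, so they are not a general running minimum/maximum.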
Query SUM_MIN
SELECT
SUM(test.value)
FROM
test
INNER JOIN (
SELECT
id
FROM
test
WHERE
test.value > 0
ORDER BY
id ASC
LIMIT 1
)
AS
positive_number
ON
test.id < positive_number.id
ORDER BY
test.id
Result
sum(test.value)
-----------------
-8
Query SUM_MAX
SELECT
SUM(test.value)
FROM
test
INNER JOIN (
SELECT
id
FROM
test
WHERE
test.value > 0
ORDER BY
id ASC
LIMIT 1
)
AS
positive_number
ON
test.id <= positive_number.id
ORDER BY
test.id
Result
sum(test.value)
-----------------
2
Here's one way:
SELECT x.*
, @least:=LEAST(@least,value) least
, @greatest:=GREATEST(@greatest,value) greatest
, @i:=@i+value running
FROM my_table x
, (SELECT @least:=1000,@greatest:=-1000,@i:=0) vars
ORDER
BY id;
+----+-------+-------+----------+---------+
| id | value | least | greatest | running |
+----+-------+-------+----------+---------+
| 1 | -3 | -3 | -3 | -3 |
| 2 | -5 | -5 | -3 | -8 |
| 3 | 10 | -5 | 10 | 2 |
| 4 | -1 | -5 | 10 | 1 |
+----+-------+-------+----------+---------+
To get a cumulative sum, you can join a table to itself.
select min(val)
from (select sum(a.value) as val from test a join test b
on a.id<=b.id group by b.id) t1;
/* answer: -8 */
select max(val)
from (select sum(a.value) as val from test a join test b
on a.id<=b.id group by b.id) t1;
/* answer: 2 */
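To see why the self join works, it can help to look at the intermediate running totals before taking the minimum and maximum. The sketch below does that with Python's built-in sqlite3 module (an assumption made only so the example is runnable without a MySQL server); the variable names are illustrative.

```python
import sqlite3

# In-memory copy of the "test" table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, -3), (2, -5), (3, 10), (4, -1)],
)

# For each row b, sum every row a at or before it: a running total per id.
running_totals = conn.execute(
    """
    SELECT b.id, SUM(a.value) AS running
    FROM test AS a
    JOIN test AS b ON a.id <= b.id
    GROUP BY b.id
    ORDER BY b.id
    """
).fetchall()

lowest = min(total for _, total in running_totals)
highest = max(total for _, total in running_totals)

print(running_totals)
print(lowest, highest)
```

The join compares every row with every earlier row, so the work grows quadratically with the table size; on databases with window functions, SUM(value) OVER (ORDER BY id) is the usual alternative.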
I have a table as so...
----------------------------------------
| id | name | group | number |
----------------------------------------
| 1 | joey | 1 | 2 |
| 2 | keidy | 1 | 3 |
| 3 | james | 2 | 2 |
| 4 | steven | 2 | 5 |
| 5 | jason | 3 | 2 |
| 6 | shane | 3 | 3 |
----------------------------------------
I'm running a select like so:
SELECT * FROM table WHERE number IN (2,3);
The problem I'm trying to solve is that I only want results from groups that have 1 or more rows of each number. For instance, the above query returns ids 1-2-3-5-6, when I'd like the results to exclude id 3: group 2 has a row for the number 2 but no row for the number 3, so I'd like it not to select id 3 at all.
Any help would be great.
Try it this way
SELECT *
FROM table1 t
WHERE number IN(2, 3)
AND EXISTS
(
SELECT *
FROM table1
WHERE number IN(2, 3)
AND `group` = t.`group`
GROUP BY `group`
HAVING MAX(number = 2) > 0
AND MAX(number = 3) > 0
)
or
SELECT *
FROM table1 t JOIN
(
SELECT `group`
FROM table1
WHERE number IN(2, 3)
GROUP BY `group`
HAVING MAX(number = 2) > 0
AND MAX(number = 3) > 0
) q
ON t.`group` = q.`group`;
or
SELECT *
FROM table1
WHERE `group` IN
(
SELECT `group`
FROM table1
WHERE number IN(2, 3)
GROUP BY `group`
HAVING MAX(number = 2) > 0
AND MAX(number = 3) > 0
);
Sample output (for all three queries):
| ID | NAME | GROUP | NUMBER |
|----|-------|-------|--------|
| 1 | joey | 1 | 2 |
| 2 | keidy | 1 | 3 |
| 5 | jason | 3 | 2 |
| 6 | shane | 3 | 3 |
Here is SQLFiddle demo
On this, you can approach from a fun way with multiple joins for what you WANT qualified, OR, apply a prequery to get all qualified groups as others have suggested, but readability is a bit off for me..
Anyhow, here's an approach going through the table once, but with joins
select DISTINCT
T.id,
T.Name,
T.`Group`,
T.Number
from
YourTable T
Join YourTable T2
on T.`Group` = T2.`Group` AND T2.Number = 2
Join YourTable T3
on T.`Group` = T3.`Group` AND T3.Number = 3
where
T.Number IN ( 2, 3 )
So each record is joined by its own group to the T2 instance, where T2's number is specifically a 2... Then again to the T3 instance, where T3's number is a 3.
If it can't complete the join to either of the T2 or T3 instances, the record is dropped from consideration. Since indexes work great for joins like this, make sure you have one index on your NUMBER criteria and another index on (GROUP, NUMBER) for those comparisons and the next query sample...
If you need to match more than these two numbers, prequery the qualified groups, then join to that
select
YT2.*
from
( select YT1.`group`
from YourTable YT1
where YT1.Number in (2, 3)
group by YT1.`group`
having count( DISTINCT YT1.Number ) = 2 ) PreQualified
JOIN YourTable YT2
on PreQualified.`group` = YT2.`group`
AND YT2.Number in (2,3)
Maybe this, if I understand you:
SELECT id FROM table WHERE `group` IN
(SELECT `group` FROM table WHERE number IN (2,3)
GROUP BY `group`
HAVING COUNT(DISTINCT number)=2)
SQL Fiddle
This will return all ids in groups where BOTH numbers exist. Without DISTINCT, a group that has two rows with the same number would also match.
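As a quick check of the COUNT(DISTINCT number) approach, here is a minimal sketch against the sample table. It uses Python's built-in sqlite3 module instead of MySQL (an assumption for runnability), names the table table1 as in the earlier answer, quotes the reserved word group with double quotes, and adds a number IN (2, 3) filter on the outer query so that only the matching rows are returned.

```python
import sqlite3

# In-memory copy of the sample table; "group" is quoted because it is reserved.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE table1 (id INTEGER, name TEXT, "group" INTEGER, number INTEGER)'
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(1, "joey", 1, 2), (2, "keidy", 1, 3), (3, "james", 2, 2),
     (4, "steven", 2, 5), (5, "jason", 3, 2), (6, "shane", 3, 3)],
)

rows = conn.execute(
    """
    SELECT id
    FROM table1
    WHERE number IN (2, 3)
      AND "group" IN (
          SELECT "group"
          FROM table1
          WHERE number IN (2, 3)
          GROUP BY "group"
          -- Two distinct values out of (2, 3) means both are present.
          HAVING COUNT(DISTINCT number) = 2
      )
    ORDER BY id
    """
).fetchall()
matching_ids = [row_id for (row_id,) in rows]

print(matching_ids)
```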
I am trying to implement a message system quite similar to Facebook. The message table is:
+--------+----------+--------+-----+----------+
| msg_id | msg_from | msg_to | msg | msg_time |
+--------+----------+--------+-----+----------+
Here msg_from and msg_to contain user ids and msg_time contains the timestamp of the message. A user's id can appear in both the to and from columns, and multiple times for another user. How should I write a SQL query which selects the most recent message sent between two users? (The message can come from either one: 1 to 2 or 2 to 1.)
Since John Woo clarified that it is not directional, here's my new answer:
select *
from msgsList
where (least(msg_from, msg_to), greatest(msg_from, msg_to), msg_time)
in
(
select
least(msg_from, msg_to) as x, greatest(msg_from, msg_to) as y,
max(msg_time) as msg_time
from msgsList
group by x, y
);
Output:
| MSG_ID | MSG_FROM | MSG_TO | MSG | MSG_TIME |
------------------------------------------------------------------------
| 1 | 1 | 2 | hello | January, 23 2010 17:00:00-0800 |
| 5 | 1 | 3 | me too | January, 23 2012 00:15:00-0800 |
| 6 | 3 | 2 | hello | January, 23 2012 01:12:12-0800 |
For this input:
create table msgsList
(
msg_id int,
msg_from int,
msg_to int,
msg varchar(10),
msg_time datetime
);
insert into msgslist VALUES
(1, 1, 2, 'hello', '2010-01-23 17:00:00'), -- shown
(2, 2, 1, 'world', '2010-01-23 16:00:00'),
(3, 3, 1, 'i am alive', '2011-01-23 00:00:00'),
(4, 3, 1, 'really', '2011-01-22 23:15:00'),
(5, 1, 3, 'me too', '2012-01-23 00:15:00'), -- shown
(6, 3, 2, 'hello', '2012-01-23 01:12:12'); -- shown
SQLFiddle Demo
If ANSI SQL is your cup of tea, here's the way to do it: http://sqlfiddle.com/#!2/0a575/19
select *
from msgsList z
where exists
(
select null
from msgsList
where
least(z.msg_from, z.msg_to) = least(msg_from, msg_to)
and greatest(z.msg_from, z.msg_to) = greatest(msg_from, msg_to)
group by least(msg_from, msg_to), greatest(msg_from, msg_to)
having max(msg_time) = z.msg_time
) ;
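The pair-normalisation idea behind both queries (treat 1 to 2 and 2 to 1 as the same conversation) can be sketched as follows. This is an illustration on Python's built-in sqlite3 module, not MySQL: SQLite has no LEAST/GREATEST, so the two-argument scalar forms of min() and max() stand in for them, and a join replaces the row-value IN. The aliases lo, hi and latest are made up for the example.

```python
import sqlite3

# In-memory copy of the msgsList sample data shown above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE msgsList (msg_id INTEGER, msg_from INTEGER, msg_to INTEGER,"
    " msg TEXT, msg_time TEXT)"
)
conn.executemany(
    "INSERT INTO msgsList VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 2, "hello", "2010-01-23 17:00:00"),
     (2, 2, 1, "world", "2010-01-23 16:00:00"),
     (3, 3, 1, "i am alive", "2011-01-23 00:00:00"),
     (4, 3, 1, "really", "2011-01-22 23:15:00"),
     (5, 1, 3, "me too", "2012-01-23 00:15:00"),
     (6, 3, 2, "hello", "2012-01-23 01:12:12")],
)

rows = conn.execute(
    """
    SELECT m.msg_id
    FROM msgsList AS m
    JOIN (
        -- One row per unordered user pair with its latest timestamp.
        SELECT min(msg_from, msg_to) AS lo,
               max(msg_from, msg_to) AS hi,
               max(msg_time)         AS latest
        FROM msgsList
        GROUP BY lo, hi
    ) AS p
      ON min(m.msg_from, m.msg_to) = p.lo
     AND max(m.msg_from, m.msg_to) = p.hi
     AND m.msg_time = p.latest
    ORDER BY m.msg_id
    """
).fetchall()
latest_ids = [msg_id for (msg_id,) in rows]

print(latest_ids)
```

Timestamps are stored as ISO-formatted text here, which sorts correctly as strings; if two messages in a conversation share the latest timestamp, both are returned.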
Could it be this simple? http://www.sqlfiddle.com/#!2/50f9f/1
set @User1 := 'John';
set @User2 := 'Paul';
select *
from
(
select *
from messages
where msg_from = @User1 and msg_to = @User2
order by msg_time desc
limit 1
) as x
union
select *
from
(
select *
from messages
where msg_from = @User2 and msg_to = @User1
order by msg_time desc
limit 1
) as x
order by msg_time desc
Output:
| MSG_ID | MSG_FROM | MSG_TO | MSG | MSG_TIME |
----------------------------------------------------------------------------
| 2 | Paul | John | Hey Johnny! | August, 20 2012 00:00:00-0700 |
| 1 | John | Paul | Hey Paulie! | August, 19 2012 00:00:00-0700 |
Could be a lot simpler if only MySQL supported windowing functions: http://www.sqlfiddle.com/#!1/e4781/8
with recent_message as
(
select *, rank() over(partition by msg_from, msg_to order by msg_time desc) as r
from messages
)
select *
from recent_message
where r = 1
and
(
(msg_from = 'John' and msg_to = 'Paul')
or
(msg_from = 'Paul' and msg_to = 'John')
)
order by msg_time desc;
For any complex query like this, use TDQD — Test-Driven Query Design. Devise the answer step-by-step, with the size of the steps controlled by your experience and how well you understand the problem.
Step 1 — Find the time of the most recent message between the given users
Throughout this, I assume that the user IDs are integers; I'm using the values 1000 and 2000.
SELECT MAX(msg_time) AS msg_time
FROM message
WHERE ((msg_to = 1000 AND msg_from = 2000) OR
(msg_to = 2000 AND msg_from = 1000)
)
Step 2 — Find the record corresponding to the most recent message
SELECT m.*
FROM message AS m
JOIN (SELECT MAX(msg_time) AS msg_time
FROM message
WHERE ((msg_to = 1000 AND msg_from = 2000) OR
(msg_to = 2000 AND msg_from = 1000)
)
) AS t
ON t.msg_time = m.msg_time
WHERE ((m.msg_to = 1000 AND m.msg_from = 2000) OR
(m.msg_to = 2000 AND m.msg_from = 1000)
)
If there happen to be two (or more) messages between these characters with the same latest timestamp, then they'll all be selected; there is at present no basis for choosing between the collisions. If you think that's a problem, you can arrange to find the MAX(msg_id) using the query above (as a sub-query):
SELECT m2.*
FROM message AS m2
JOIN (SELECT MAX(m.msg_id) AS msg_id
FROM message AS m
JOIN (SELECT MAX(msg_time) AS msg_time
FROM message
WHERE ((msg_to = 1000 AND msg_from = 2000) OR
(msg_to = 2000 AND msg_from = 1000)
)
) AS t
ON t.msg_time = m.msg_time
WHERE ((m.msg_to = 1000 AND m.msg_from = 2000) OR
(m.msg_to = 2000 AND m.msg_from = 1000)
)
) AS i
ON i.msg_id = m2.msg_id
Warning: Code not formally tested with any DBMS.
After giving it some thought, I came up with this:
SELECT LEAST(msg_from, msg_to) AS min_user, GREATEST(msg_from, msg_to) AS max_user,
MAX(msg_time) FROM msg GROUP BY min_user, max_user
I'm still not quite sure how to get the additional data from the message, but I'll give it some thought.