SQL including count and multiple where clauses - mysql

I have 2 tables.
Table text_to_annotate:
CREATE TABLE text_to_annotate (
ID varchar(3),
text varchar(100));
INSERT INTO text_to_annotate (ID, text)
VALUES
('1', 'test1'),
('2', 'test2'),
('3', 'test3');
Table annotation_data:
CREATE TABLE annotation_data (
text_ID varchar(3),
annotation_ID varchar(3),
IP varchar(15));
INSERT INTO annotation_data (text_ID, annotation_ID, IP)
VALUES
('1', '0', 'IP_1'),
('2', '1', 'IP_1'),
('3', '2', 'IP_1'),
('1', '1', 'IP_2'),
('2', '2', 'IP_2'),
('3', '3', 'IP_2'),
('3', '0', 'IP_3'),
('3', '0', 'IP_4'),
('3', '2', 'IP_5');
I want to show each annotator a text they have not seen yet and that has been annotated fewer than 5 times. For example, a new annotator with IP = IP_6 cannot annotate text_ID = 3 (it already has 5 annotations), only text_ID = 1 and text_ID = 2. An annotator can annotate each text_ID only once.
Here's my code, but something isn't quite correct:
SELECT text_to_annotate.ID, text_to_annotate.text
FROM text_to_annotate
WHERE text_to_annotate.ID NOT IN (
SELECT text_ID, count(*)
FROM annotation_data
WHERE IP = '{$ip}'
AND GROUP BY text_ID
HAVING count(*) > 1;
)
ORDER BY RAND()

Here's my answer:
SELECT text_to_annotate.ID, text_to_annotate.text
FROM text_to_annotate
WHERE text_to_annotate.ID NOT IN (
SELECT text_ID -- a NOT IN subquery must return exactly one column
FROM annotation_data
WHERE IP = '{$ip}'
)
AND text_to_annotate.ID IN (
SELECT text_ID FROM annotation_data -- texts with no annotations yet never appear here
GROUP BY text_ID
HAVING COUNT(*) < 5
)
ORDER BY RAND()
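To sanity-check the two filters (not yet annotated by this IP, and fewer than 5 annotations overall) against the sample data, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module as a stand-in for MySQL, binds the IP as a parameter instead of interpolating '{$ip}', and sorts by ID rather than randomly so the output is repeatable. The helper names build_db and unseen_texts are made up for this illustration.

```python
import sqlite3

def build_db():
    # In-memory SQLite database loaded with the sample rows from the question.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE text_to_annotate (ID varchar(3), text varchar(100));
        INSERT INTO text_to_annotate VALUES
          ('1', 'test1'), ('2', 'test2'), ('3', 'test3');
        CREATE TABLE annotation_data (
          text_ID varchar(3), annotation_ID varchar(3), IP varchar(15));
        INSERT INTO annotation_data VALUES
          ('1', '0', 'IP_1'), ('2', '1', 'IP_1'), ('3', '2', 'IP_1'),
          ('1', '1', 'IP_2'), ('2', '2', 'IP_2'), ('3', '3', 'IP_2'),
          ('3', '0', 'IP_3'), ('3', '0', 'IP_4'), ('3', '2', 'IP_5');
    """)
    return conn

def unseen_texts(conn, ip, max_annotations=5):
    # IDs of texts that this IP has not annotated and that have
    # fewer than max_annotations annotations in total.
    sql = """
        SELECT t.ID
        FROM text_to_annotate AS t
        WHERE t.ID NOT IN (SELECT text_ID FROM annotation_data WHERE IP = ?)
          AND t.ID IN (SELECT text_ID FROM annotation_data
                       GROUP BY text_ID
                       HAVING COUNT(*) < ?)
        ORDER BY t.ID
    """
    return [row[0] for row in conn.execute(sql, (ip, max_annotations))]

conn = build_db()
print(unseen_texts(conn, "IP_6"))  # ['1', '2']
```

With the sample rows, text 3 already has 5 annotations, so a new annotator only gets texts 1 and 2, and IP_1 (who has annotated everything) gets nothing.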

Related

Check if a pair of records belong to multiple group IDs

I have a table that contains 2 IDs - UserID and GroupID. I need to pull a list of all pairs of UserIDs that "share" the same GroupID at least 4 times.
So, based on the following data set:
CREATE TABLE IF NOT EXISTS `tableA` (
`UserID` int(11) unsigned NOT NULL,
`GroupID` int(11) unsigned NOT NULL
) DEFAULT CHARSET=utf8;
INSERT INTO `tableA` (`UserID`, `GroupID`) VALUES
(1, 1),
(2, 1),
(3, 1),
(4, 1),
(1, 2),
(2, 2),
(3, 2),
(1, 3),
(2, 3),
(3, 3),
(1, 4),
(2, 4),
(3, 4),
(1, 5),
(3, 5);
I'm trying to generate the following result:
UserID A | UserID B | NumberOfOccurrences
1 | 2 | 4
2 | 3 | 4
1 | 3 | 5
I've created an SQLFiddle for it. I've tried to achieve this via JOINs and sub-queries, but I'm not entirely sure how to properly proceed with something like this.
Do a self join, GROUP BY the pair of users, and use HAVING to keep only the pairs with at least 4 GroupIDs in common.
select a1.userid, a2.userid, count(*) as NumberOfOccurrences
from tablea a1
join tablea a2
on a1.GroupID = a2.GroupID and a1.userid < a2.userid
group by a1.userid, a2.userid
having count(*) >= 4
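The self join can be checked against the sample rows with a small sketch. It assumes SQLite via Python's sqlite3 module in place of MySQL, and the function name shared_group_pairs is hypothetical. The condition a1.UserID < a2.UserID keeps each pair once and stops a user from pairing with themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (UserID int NOT NULL, GroupID int NOT NULL);
    INSERT INTO tableA (UserID, GroupID) VALUES
      (1, 1), (2, 1), (3, 1), (4, 1),
      (1, 2), (2, 2), (3, 2),
      (1, 3), (2, 3), (3, 3),
      (1, 4), (2, 4), (3, 4),
      (1, 5), (3, 5);
""")

def shared_group_pairs(conn, min_shared=4):
    # Each joined row is one group shared by the pair (a1, a2),
    # so COUNT(*) per pair is the number of shared groups.
    sql = """
        SELECT a1.UserID, a2.UserID, COUNT(*) AS NumberOfOccurrences
        FROM tableA AS a1
        JOIN tableA AS a2
          ON a1.GroupID = a2.GroupID AND a1.UserID < a2.UserID
        GROUP BY a1.UserID, a2.UserID
        HAVING COUNT(*) >= ?
        ORDER BY a1.UserID, a2.UserID
    """
    return conn.execute(sql, (min_shared,)).fetchall()

print(shared_group_pairs(conn))  # [(1, 2, 4), (1, 3, 5), (2, 3, 4)]
```

User 4 only appears in group 1, so every pair involving user 4 shares a single group and is filtered out by the HAVING clause.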

How do I get a limited number of rows for each value in a field? (DB is MySQL but could be changed if easier)

I found this question difficult to search for. I have a table of sports fixtures (tbl_fixture) and a table of sports participants (tbl_participant) which have a many-to-many relationship via a linking table (tbl_fixture_participant).
I need to return the most recent 3 fixtures (i.e. latest tbl_fixture.start_datetime) of multiple participants and whether they won each of the fixtures (e.g. the most recent 3 fixtures of participant 1, the most recent 3 fixtures of participant 2, and the most recent 3 fixtures of participant 3, with each record returning the fixture_id, participant_id, start_datetime and is_winner fields).
The number of participants that I need to get the data for could be between 1 and 100.
If there's a better way to structure my data, or a better database for this type of query (graph DB?), then I'm happy to look into those.
Here's a sample schema:
CREATE TABLE tbl_fixture (
fixture_id INT AUTO_INCREMENT PRIMARY KEY,
start_datetime DATETIME NOT NULL
);
CREATE TABLE tbl_participant (
participant_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL
);
CREATE TABLE tbl_fixture_participant (
fixture_id INT NOT NULL,
participant_id INT NOT NULL,
is_winner TINYINT NOT NULL,
FOREIGN KEY (fixture_id)
REFERENCES tbl_fixture (fixture_id)
ON UPDATE RESTRICT ON DELETE CASCADE,
FOREIGN KEY (participant_id)
REFERENCES tbl_participant (participant_id)
ON UPDATE RESTRICT ON DELETE CASCADE
);
INSERT INTO tbl_fixture (fixture_id, start_datetime)
VALUES (1, "2021-01-14 15:00:00"),
(2, "2021-01-13 16:00:00"),
(3, "2021-01-12 17:00:00"),
(4, "2021-01-11 15:00:00"),
(5, "2021-01-19 16:00:00"),
(6, "2021-01-18 17:00:00"),
(7, "2021-01-05 15:00:00"),
(8, "2021-01-03 16:00:00"),
(9, "2021-01-03 17:00:00"),
(10, "2021-01-11 15:00:00"),
(11, "2021-01-12 16:00:00"),
(12, "2021-01-13 17:00:00"),
(13, "2021-01-14 15:00:00"),
(14, "2021-01-19 16:00:00");
INSERT INTO tbl_participant (participant_id, name) VALUES
( 1,"Team 1"),
( 2,"Team 2"),
( 3,"Team 3");
INSERT INTO tbl_fixture_participant (fixture_id, participant_id, is_winner)
VALUES (1, 1, 0)
,(2, 1, 1)
,(2, 2, 0)
,(3, 1, 1)
,(12, 2, 0)
,(4, 3, 1)
,(4, 2, 0)
,(6, 3, 1)
,(1, 2, 1)
,(10, 1, 1)
,(5, 2, 0)
,(6, 1, 0)
,(11, 1, 1)
,(14, 1, 0)
,(7, 2, 0)
,(7, 3, 1)
,(3, 3, 0)
,(8, 1, 0)
,(5, 3, 1)
,(13, 2, 0)
,(8, 3, 1)
,(13, 3, 1)
,(9, 1, 0)
,(9, 2, 1)
,(10, 2, 0)
,(11, 3, 0)
,(12, 3, 1)
,(14, 3, 1);
And SQL Fiddle of same.
I would like the data to come back like:
fixture_id | start_datetime | participant_id | is_winner
14 | 2021-01-19T16:00:00Z | 1 | 0
6 | 2021-01-18T17:00:00Z | 1 | 0
1 | 2021-01-14T15:00:00Z | 1 | 0
5 | 2021-01-19T16:00:00Z | 2 | 0
13 | 2021-01-14T15:00:00Z | 2 | 0
1 | 2021-01-14T15:00:00Z | 2 | 1
EDITED TO REFLECT FACT THAT DATES ARE NOT NECESSARILY SEQUENTIAL...
E.g. (for older versions of MySQL)...
SELECT x.*
, fx.start_datetime
FROM tbl_fixture_participant x
JOIN tbl_fixture fx
ON fx.fixture_id = x.fixture_id
JOIN tbl_fixture_participant y
ON y.participant_id = x.participant_id
JOIN tbl_fixture fy
ON fy.fixture_id = y.fixture_id
AND fy.start_datetime >= fx.start_datetime -- >= so that each fixture also counts itself
GROUP
BY x.fixture_id
, x.participant_id
, x.is_winner
, fx.start_datetime
HAVING COUNT(x.fixture_id) <= 3 -- at most 3 fixtures at or after this one
ORDER
BY x.participant_id, fx.start_datetime DESC;
...or something like that.
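On MySQL 8.0 or later, window functions offer another route to "top 3 per participant". The sketch below is only an illustration of that idea, not the answer's method: it assumes SQLite (3.25 or newer, for ROW_NUMBER) through Python's sqlite3 module, breaks ties on start_datetime by the higher fixture_id (an arbitrary choice made here), and uses a hypothetical helper named latest_fixtures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_fixture (fixture_id INTEGER PRIMARY KEY, start_datetime TEXT NOT NULL);
    CREATE TABLE tbl_fixture_participant (
      fixture_id INT NOT NULL, participant_id INT NOT NULL, is_winner INT NOT NULL);
    INSERT INTO tbl_fixture VALUES
      (1, '2021-01-14 15:00:00'), (2, '2021-01-13 16:00:00'), (3, '2021-01-12 17:00:00'),
      (4, '2021-01-11 15:00:00'), (5, '2021-01-19 16:00:00'), (6, '2021-01-18 17:00:00'),
      (7, '2021-01-05 15:00:00'), (8, '2021-01-03 16:00:00'), (9, '2021-01-03 17:00:00'),
      (10, '2021-01-11 15:00:00'), (11, '2021-01-12 16:00:00'), (12, '2021-01-13 17:00:00'),
      (13, '2021-01-14 15:00:00'), (14, '2021-01-19 16:00:00');
    INSERT INTO tbl_fixture_participant VALUES
      (1, 1, 0), (2, 1, 1), (2, 2, 0), (3, 1, 1), (12, 2, 0), (4, 3, 1), (4, 2, 0),
      (6, 3, 1), (1, 2, 1), (10, 1, 1), (5, 2, 0), (6, 1, 0), (11, 1, 1), (14, 1, 0),
      (7, 2, 0), (7, 3, 1), (3, 3, 0), (8, 1, 0), (5, 3, 1), (13, 2, 0), (8, 3, 1),
      (13, 3, 1), (9, 1, 0), (9, 2, 1), (10, 2, 0), (11, 3, 0), (12, 3, 1), (14, 3, 1);
""")

def latest_fixtures(conn, participant_ids, n=3):
    # Number each participant's fixtures from newest to oldest,
    # then keep only the first n rows of every participant.
    placeholders = ",".join("?" for _ in participant_ids)
    sql = f"""
        SELECT fixture_id, start_datetime, participant_id, is_winner
        FROM (
          SELECT fp.fixture_id, f.start_datetime, fp.participant_id, fp.is_winner,
                 ROW_NUMBER() OVER (
                   PARTITION BY fp.participant_id
                   ORDER BY f.start_datetime DESC, fp.fixture_id DESC) AS rn
          FROM tbl_fixture_participant AS fp
          JOIN tbl_fixture AS f ON f.fixture_id = fp.fixture_id
          WHERE fp.participant_id IN ({placeholders})
        ) AS ranked
        WHERE rn <= ?
        ORDER BY participant_id, rn
    """
    return conn.execute(sql, (*participant_ids, n)).fetchall()

for row in latest_fixtures(conn, [1, 2]):
    print(row)
```

Because ROW_NUMBER assigns distinct numbers even to tied datetimes, every participant gets exactly n rows whenever they have at least n fixtures, which is not guaranteed by a counting self join when ties fall on the cut-off.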

Weighted average in MySQL

We have a little simulator of a tour-operator DB (MYSQL) and we are asked to get a Query that gives us the weighted avg of duration of the tours that we have.
https://en.wikipedia.org/wiki/Weighted_arithmetic_mean
Using subquery I got to this point where I have the days that each tour lasts and the weight of each tour from the total of tours, but I am stuck and don't know how to get the weighted avg from here. I know I have to use another select from the result I already got but I would appreciate some help.
SQLfiddle down here:
http://sqlfiddle.com/#!9/53d80/2
Tables and data
CREATE TABLE STAGE
(
ID INT AUTO_INCREMENT NOT NULL,
TOUR INT NOT NULL,
TYPE INT NOT NULL,
CITY INT NOT NULL,
DAYS INT NOT NULL,
PRIMARY KEY (ID)
);
CREATE TABLE TOUR
(
ID INT AUTO_INCREMENT NOT NULL,
DESCRIPTION VARCHAR(255) CHARACTER SET UTF8 COLLATE UTF8_UNICODE_CI
NOT NULL,
STARTED_ON DATE NOT NULL,
TYPE INT NOT NULL,
PRIMARY KEY (ID)
);
INSERT INTO TOUR (DESCRIPTION, STARTED_ON, TYPE) VALUES
('Mediterranian Cruise','2018-01-01',3),
('Trip to Nepal','2017-12-01',1),
('Tour in Nova York','2015-04-24',5),
('A week at the Amazones','2014-09-11',2),
('Visiting the Machu Picchu','2013-02-19',4);
INSERT INTO STAGE (TOUR, TYPE, CITY, DAYS) VALUES
(1, 1, 38254, 1),
(1, 2, 22460, 3),
(1, 2, 47940, 3),
(1, 2, 42600, 4),
(1, 3, 38254, 1),
(2, 1, 13097, 1),
(2, 2, 29785, 5),
(2, 3, 13097, 1),
(3, 1, 788, 2),
(3, 2, 48019, 6),
(3, 3, 788, 1),
(4, 1, 38254, 2),
(4, 2, 8703, 3),
(4, 3, 38254, 4),
(5, 1, 10453, 1),
(5, 2, 32045, 5),
(5, 3, 10453, 2);
Query:
SELECT
AVG(TD.TOUR_DAYS) AS AVERAGE_DAYS,
COUNT(TD.TOUR_ID) AS WEIGHT
FROM
(
SELECT
TOUR.ID AS TOUR_ID,
SUM(DAYS) AS TOUR_DAYS,
COUNT(STAGE.ID) AS STAGE_DAYS
FROM
TOUR
INNER JOIN
STAGE
ON
TOUR.ID = STAGE.TOUR
GROUP BY
TOUR.ID
) AS TD
GROUP BY
TD.TOUR_DAYS
The weighted average would be:
(1×7+1×8+2×9+1×12) / (1+1+2+1) = 9
A weighted average can be calculated with SUM(value * weight) / SUM(weight). In your case:
SELECT SUM(AVERAGE_DAYS * WEIGHT) / SUM(WEIGHT)
FROM (
SELECT
AVG(TD.TOUR_DAYS) AS AVERAGE_DAYS,
COUNT(TD.TOUR_ID) AS WEIGHT
FROM
(
SELECT
TOUR.ID AS TOUR_ID,
SUM(DAYS) AS TOUR_DAYS,
COUNT(STAGE.ID) AS STAGE_DAYS
FROM
TOUR
INNER JOIN
STAGE
ON
TOUR.ID = STAGE.TOUR
GROUP BY
TOUR.ID
) AS TD
GROUP BY
TD.TOUR_DAYS
) sub
http://sqlfiddle.com/#!9/53d80/4
I'm not 100% sure, but it looks like the following query is doing exactly the same:
SELECT AVG(TOUR_DAYS)
FROM (
SELECT TOUR, SUM(DAYS) AS TOUR_DAYS
FROM STAGE
GROUP BY TOUR
) sub;
Or even without any subqueries:
SELECT SUM(DAYS) / COUNT(DISTINCT TOUR)
FROM STAGE;
That would mean, the requirement should be simplified to "Get average number of days per tour".
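The equivalence is easy to confirm on the sample data: when the weight of each distinct duration is the number of tours with that duration, SUM(value * weight) adds up every tour's days exactly once, and SUM(weight) is the number of tours. A minimal sketch, assuming SQLite via Python's sqlite3 module instead of MySQL (the variable names weighted and plain are just for this example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STAGE (ID INTEGER PRIMARY KEY, TOUR INT, TYPE INT, CITY INT, DAYS INT);
    INSERT INTO STAGE (TOUR, TYPE, CITY, DAYS) VALUES
      (1, 1, 38254, 1), (1, 2, 22460, 3), (1, 2, 47940, 3), (1, 2, 42600, 4), (1, 3, 38254, 1),
      (2, 1, 13097, 1), (2, 2, 29785, 5), (2, 3, 13097, 1),
      (3, 1, 788, 2), (3, 2, 48019, 6), (3, 3, 788, 1),
      (4, 1, 38254, 2), (4, 2, 8703, 3), (4, 3, 38254, 4),
      (5, 1, 10453, 1), (5, 2, 32045, 5), (5, 3, 10453, 2);
""")

# Weighted form: one row per distinct tour length, weighted by how many tours have it.
weighted = conn.execute("""
    SELECT SUM(TOUR_DAYS * WEIGHT) * 1.0 / SUM(WEIGHT)
    FROM (
      SELECT TOUR_DAYS, COUNT(*) AS WEIGHT
      FROM (SELECT TOUR, SUM(DAYS) AS TOUR_DAYS FROM STAGE GROUP BY TOUR) AS per_tour
      GROUP BY TOUR_DAYS
    ) AS grouped
""").fetchone()[0]

# Plain form: average of the per-tour totals.
plain = conn.execute("""
    SELECT AVG(TOUR_DAYS)
    FROM (SELECT TOUR, SUM(DAYS) AS TOUR_DAYS FROM STAGE GROUP BY TOUR) AS per_tour
""").fetchone()[0]

print(weighted, plain)  # 9.0 9.0
```

The tour totals are 12, 7, 9, 9 and 8 days, so both forms reduce to 45 / 5.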

Stored Procedure timing out

Trying to collect some hierarchical data to send out to a third party, and was directed to this post.
After attempting to tweak it to my use case on SQL Fiddle, the stored procedure keeps timing out.
So I tried it locally twice (via phpMyAdmin).
When I try to reload PMA in the browser after calling the stored procedure, I just get an eternal "waiting for response" spinner (more than 10 or 20 minutes).
I'm presuming there's something wrong with my SP code ???
CREATE TABLE foo
(`id` int, `name` varchar(100), `parentId` int, `path` varchar(100))
//
INSERT INTO foo (`id`, `name`, `parentId`, `path`)
VALUES (1, 'discrete', 0, NULL),
(2, 'res', 1, NULL),
(3, 'smt', 2, NULL),
(4, 'cap', 1, NULL),
(5, 'ind', 1, NULL),
(6, 'smt', 4, NULL),
(7, 'tant', 6, NULL),
(8, 'cer', 6, NULL)
//
CREATE PROCEDURE updatePath()
BEGIN
DECLARE cnt, n int;
SELECT COUNT(*) INTO n FROM foo WHERE parentId = 0;
UPDATE foo a, foo b SET a.path = b.name WHERE b.parentId IS NULL AND a.parentId = b.id;
SELECT count(*) INTO cnt FROM foo WHERE path IS NULL;
while cnt > n do
UPDATE foo a, foo b SET a.path = concat(b.path, '|', b.id) WHERE b.path IS NOT NULL AND a.parentId = b.id;
SELECT count(*) INTO cnt FROM foo WHERE path IS NULL;
end while;
END//
EDIT
Expected results:
VALUES (1, 'discrete', 0, '1'),
(2, 'res', 1, '1|2'),
(3, 'smt', 2, '1|2|3'),
(4, 'cap', 1, '1|4'),
(5, 'ind', 1, '1|5'),
(6, 'smt', 4, '1|4|6'),
(7, 'tant', 6, '1|4|6|7'),
(8, 'cer', 6, '1|4|6|8');
After a good night's sleep, I took @Drew's lead and went through it one piece at a time.
Got it working. Here's where I'm leaving it:
CREATE TABLE foo
(`id` int, `name` varchar(100), `parentId` int, `path` varchar(100))
//
INSERT INTO foo
(`id`, `name`, `parentId`, `path`)
VALUES
(1, 'dscr', 0, NULL),
(2, 'res', 1, NULL),
(3, 'smt', 2, NULL),
(4, 'cap', 1, NULL),
(5, 'ind', 1, NULL),
(6, 'chp', 4, NULL),
(7, 'tant', 6, NULL),
(8, 'cer', 6, NULL)
//
CREATE PROCEDURE updatePath()
BEGIN
DECLARE cnt, n int;
SELECT COUNT(*) INTO n FROM foo WHERE parentId = 0; -- n is now 1
SELECT COUNT(*) INTO cnt FROM foo WHERE path IS NULL; -- cnt is now 8
UPDATE foo child, foo parent -- each child now has its parent and own ID's in the path
SET child.path = CONCAT(parent.id, '|', child.id)
WHERE parent.parentId = 0
AND child.parentId = parent.id;
WHILE cnt > n DO
UPDATE foo child, foo parent -- concat parent's path and own ID into each empty child's path
SET child.path = concat( parent.path,'|',child.id )
WHERE parent.path IS NOT NULL
AND child.parentId = parent.id;
SELECT COUNT(*) INTO cnt -- recount the rows that still have no path
FROM foo
WHERE path IS NULL;
END WHILE;
UPDATE foo -- set path for any top-level categories
SET path = id
WHERE path IS NULL;
END//
call updatePath()//
Feel free to critique.
Hope this helps someone else some time.
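As a cross-check for the expected paths, the same hierarchy can also be walked with a recursive common table expression, which MySQL supports from 8.0 onward. This is an alternative sketch, not the procedure above: it assumes SQLite via Python's sqlite3 module (so it concatenates with || where MySQL would use CONCAT), treats parentId = 0 as the root marker as in the sample data, and the function name build_paths is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id int, name varchar(100), parentId int, path varchar(100));
    INSERT INTO foo (id, name, parentId, path) VALUES
      (1, 'dscr', 0, NULL), (2, 'res', 1, NULL), (3, 'smt', 2, NULL),
      (4, 'cap', 1, NULL), (5, 'ind', 1, NULL), (6, 'chp', 4, NULL),
      (7, 'tant', 6, NULL), (8, 'cer', 6, NULL);
""")

def build_paths(conn):
    # Start from the top-level rows, then repeatedly attach children,
    # extending the parent's path with the child's own id.
    sql = """
        WITH RECURSIVE tree(id, path) AS (
          SELECT id, CAST(id AS TEXT) FROM foo WHERE parentId = 0
          UNION ALL
          SELECT c.id, t.path || '|' || c.id
          FROM foo AS c
          JOIN tree AS t ON c.parentId = t.id
        )
        SELECT id, path FROM tree ORDER BY id
    """
    return dict(conn.execute(sql).fetchall())

for node_id, path in build_paths(conn).items():
    print(node_id, path)
```

The recursion stops by itself once no more children can be attached, so there is no loop counter to get wrong. Rows whose parent chain never reaches a root are simply absent from the result.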
Are you trying to do a self-referencing join to create the hierarchy?
select
a.name,
b.name as parentName
from
foo a
left outer join foo b on
(
a.parentId = b.id
)

Returning records which only have one specific many to many relation

Given this structure
CREATE TABLE locations
(`id` int, `Name` varchar(128))
;
INSERT INTO locations
(`id`, `Name`)
VALUES
(1, 'Location 1'),
(2, 'Location 2'),
(3, 'Location 3')
;
CREATE TABLE locations_publications
(`id` int, `publication_id` int, `location_id` int)
;
INSERT INTO locations_publications
(`id`, `publication_id`, `location_id`)
VALUES
(1, 1, 1),
(2, 2, 1),
(3, 2, 2),
(4, 1, 3)
;
I would like to find only Location 2, because its only relation is with publication_id = 2.
It should not return Location 1, because that location has two relation rows.
This is roughly what I'm looking for, but of course it doesn't work, because the WHERE clause limits the rows to publication_id = 2 before they are counted.
select * from locations
join locations_publications on locations_publications.location_id = locations.id
where locations_publications.publication_id = 2
group by (locations.location_id)
having count(*) = 1
You can do this with aggregation:
select location_id
from locations_publications
group by location_id
having count(*) = 1
If a location might have multiple records with the same publication, change the HAVING criterion to count(distinct publication_id) = 1.
Given your edits, you can use conditional aggregation for that:
select location_id
from locations_publications
group by location_id
having count(*) = sum(case when publication_id = 2 then 1 else 0 end)
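The conditional aggregation compares the total number of rows per location with the number of rows for the wanted publication, so a location passes only when every one of its rows belongs to that publication. A small sketch to try it on the sample data, assuming SQLite via Python's sqlite3 module rather than MySQL (the function name locations_with_only is made up here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations_publications (id int, publication_id int, location_id int);
    INSERT INTO locations_publications (id, publication_id, location_id) VALUES
      (1, 1, 1), (2, 2, 1), (3, 2, 2), (4, 1, 3);
""")

def locations_with_only(conn, publication_id):
    # Keep locations whose row count equals the count of rows
    # for the given publication, i.e. no other publication is linked.
    sql = """
        SELECT location_id
        FROM locations_publications
        GROUP BY location_id
        HAVING COUNT(*) = SUM(CASE WHEN publication_id = ? THEN 1 ELSE 0 END)
        ORDER BY location_id
    """
    return [row[0] for row in conn.execute(sql, (publication_id,))]

print(locations_with_only(conn, 2))  # [2]
```

Location 1 has two rows but only one for publication 2, and location 3 has no row for publication 2, so only location 2 is returned.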