SELECT multiple values and order by relevance - mysql

I have the following structure:
CREATE TABLE IF NOT EXISTS `user_subjects` (
`user_subject_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`subject_id` int(11) NOT NULL,
PRIMARY KEY (`user_subject_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
and I want to get the user_ids that have subject_id 1 or 2, for example, ordered by relevance. (Edit: by relevance I mean how many of those values each user matches.)
I've tried this, but it doesn't count the relevance; it always returns relevance 1.
http://sqlfiddle.com/#!2/944ab8/2
For that example I want to get
user_id relevance
1 1
2 2
and, if possible, the subject_ids that matched.
Thanks

Something like the following should, I think, do what you want, although I'm not entirely clear what you mean by "relevance"; it simply counts, for each user, the number of rows that have subject_id 1 or 2:
SELECT
user_id,
COUNT(subject_id) AS relevance
FROM
`user_subjects`
WHERE
subject_id IN (1, 2)
GROUP BY
user_id
ORDER BY
relevance DESC
(Referring to the relevance alias in ORDER BY is standard SQL, so this should not be MySQL-specific.)
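To make the counting concrete, here is a minimal sketch that runs the same query from Python against an in-memory SQLite database standing in for MySQL. The sample rows are hypothetical (the fiddle's data is not shown here), and the GROUP_CONCAT column is an addition for the "subject_id matched" part of the question; GROUP_CONCAT exists in both MySQL and SQLite.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_subjects ("
    " user_subject_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_id INTEGER NOT NULL,"
    " subject_id INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO user_subjects (user_id, subject_id) VALUES (?, ?)",
    [(1, 1), (1, 3), (2, 1), (2, 2), (3, 4)],
)

# Count the matching rows per user and list which subject_ids matched.
relevance_rows = conn.execute(
    """
    SELECT user_id,
           COUNT(subject_id)        AS relevance,
           GROUP_CONCAT(subject_id) AS matched_subjects
    FROM user_subjects
    WHERE subject_id IN (1, 2)
    GROUP BY user_id
    ORDER BY relevance DESC, user_id
    """
).fetchall()

for user_id, relevance, matched in relevance_rows:
    print(user_id, relevance, matched)
```

User 3 has no matching subject and so does not appear at all. The order of values inside the GROUP_CONCAT string is not guaranteed.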

Not sure I follow. Do you mean something like this:
SELECT user_id
, SUM(CASE WHEN subject_id IN(1,2) THEN 1 ELSE 0 END)ttl
FROM user_subjects
GROUP
BY user_id;
?

Well, you'll never get a relevance of 2, because each row holds a single subject_id, so (subject_id LIKE 1) + (subject_id LIKE 2) will return 1 at most for any given row.
Maybe you should rethink your query.
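A small sketch of that point, assuming made-up rows and using = instead of LIKE (SQLite, like MySQL, evaluates a comparison to 0 or 1). The per-row expression never exceeds 1; wrapping it in SUM with GROUP BY is one way to get a per-user total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_subjects (user_id INTEGER, subject_id INTEGER)")
conn.executemany(
    "INSERT INTO user_subjects VALUES (?, ?)",
    [(1, 1), (2, 1), (2, 2)],  # hypothetical sample rows
)

# Per row: a row has one subject_id, so at most one comparison is true.
per_row = conn.execute(
    "SELECT user_id, (subject_id = 1) + (subject_id = 2) AS relevance"
    " FROM user_subjects ORDER BY user_id, subject_id"
).fetchall()

# Per user: SUM adds the per-row values within each group.
per_user = conn.execute(
    "SELECT user_id, SUM((subject_id = 1) + (subject_id = 2)) AS relevance"
    " FROM user_subjects GROUP BY user_id ORDER BY relevance DESC"
).fetchall()

print(per_row)
print(per_user)
```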

Related

Quickly Select Random Rows With Where Condition

Is it possible to quickly select random rows from a table, while also using a where condition?
Example:
SELECT * FROM geo WHERE placeRef = 1 ORDER BY RAND() LIMIT 1
This can take 10+ seconds.
I found this, which is sometimes quick, sometimes very slow:
(SELECT *
FROM geo
INNER JOIN ( SELECT RAND() * ( SELECT MAX( nameRef ) FROM geo ) AS ID ) AS t ON geo.nameRef >= t.ID
WHERE geo.placeRef = 1
ORDER BY geo.nameRef
LIMIT 1)
This gives a quick result only if there is no extra WHERE condition.
This is the create table:
CREATE TABLE `geo` (
`nameRef` int(8) DEFAULT NULL,
`placeRef` mediumint(7) unsigned DEFAULT NULL,
`category` enum('continent','country','region','subregion') COLLATE utf8_bin DEFAULT NULL,
`parentRef` mediumint(7) DEFAULT NULL,
`incidence` int(9) unsigned NOT NULL,
`percent` decimal(11,9) unsigned DEFAULT NULL,
`ratio` int(11) NOT NULL,
`rank` mediumint(7) unsigned DEFAULT NULL,
KEY `placeRef_rank` (`placeRef`,`rank`),
KEY `nameRef_category` (`nameRef`,`category`),
KEY `nameRef_parentRef` (`nameRef`,`parentRef`),
KEY `nameRef_placeRef` (`nameRef`,`placeRef`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin
N.B. this table has around 550 million rows.
Desired query: query the table where placeRef = x; and then quickly return one row.
Issue: a query like SELECT * FROM geo WHERE placeRef = 1 can provide up to about 15 million results. So selecting a single random row is slow.
The speed of that technique varies because it depends on where the matching rows happen to lie in the table.
The quick fix may be to add this index, assuming that nameRef is the PRIMARY KEY for the table:
INDEX(placeRef, nameRef)
Let's discuss this further after you
1) provide SHOW CREATE TABLE geo
2) read http://mysql.rjweb.org/doc.php/random
There are (currently) 3 indexes that make this subquery very fast (because of the leading nameRef):
( SELECT MAX( nameRef ) FROM geo )
After that, my suggestion of (placeRef, nameRef) will kick in for these:
WHERE geo.placeRef = 1
geo.nameRef >= t.ID
I think the resulting query should be consistently fast.
This is pulling a result in 1/100th of a second:
SELECT * FROM geo where placeRef = 1 AND nameRef >= CEIL( RAND() * ( SELECT MAX( nameRef ) FROM geo ) ) LIMIT 1
This works well if you have an index on both of the columns you are querying. However, you may need to build a new table that is randomly ordered. In my table the nameRefs tend to be grouped by country, so the random picks came from a handful of rows, because most of the matching rows are clustered around the same Ids. I created a new table, filled using ORDER BY RAND(), in which each row has a unique Id. Now I search this much smaller summary table with:
SELECT * FROM geoSummary where placeRef = 1 AND nameRef >= CEIL( RAND() * ( SELECT MAX( id ) FROM geoSummary ) ) LIMIT 1
To avoid running that SELECT MAX query every time, I store the maximum Id in the server-side code, generate the random number there, and run:
SELECT * FROM geoSummary where placeRef = 1 AND nameRef >= :random_number LIMIT 1
This provides truly random results.
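As a rough sketch of the server-side approach described above, here it is in Python with SQLite standing in for MySQL. Several things are assumptions rather than the poster's setup: the table contents are invented, the random threshold is compared against the id column (the queries above compare nameRef), and the ORDER BY id plus the wrap-around fallback are additions so that a threshold above the last matching row still returns something.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE geoSummary ("
    " id INTEGER PRIMARY KEY, placeRef INTEGER, nameRef INTEGER)"
)
# Hypothetical data: 1000 rows spread over 5 places.
conn.executemany(
    "INSERT INTO geoSummary (id, placeRef, nameRef) VALUES (?, ?, ?)",
    [(i, i % 5, 10_000 + i) for i in range(1, 1001)],
)
# Composite index so the placeRef filter and the id range use one lookup.
conn.execute("CREATE INDEX placeRef_id ON geoSummary (placeRef, id)")

# Fetched once and cached, instead of running SELECT MAX on every request.
max_id = conn.execute("SELECT MAX(id) FROM geoSummary").fetchone()[0]


def random_row(place_ref):
    """Return one pseudo-random row for place_ref, or None if it has no rows."""
    threshold = random.randint(1, max_id)
    row = conn.execute(
        "SELECT id, placeRef, nameRef FROM geoSummary"
        " WHERE placeRef = ? AND id >= ? ORDER BY id LIMIT 1",
        (place_ref, threshold),
    ).fetchone()
    if row is None:
        # Nothing at or above the threshold: wrap around to the lowest id.
        row = conn.execute(
            "SELECT id, placeRef, nameRef FROM geoSummary"
            " WHERE placeRef = ? ORDER BY id LIMIT 1",
            (place_ref,),
        ).fetchone()
    return row


picked = random_row(1)
print(picked)
```

Note that rows sitting after a large gap in the ids are picked more often than others, which is why the even, random ordering of the summary table matters.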

mysql query to select all objectives of a goal and those objective which are done

goal_id | total occurrences of goal_id | occurrences where status is 1
1       | 5                            | 3
This is the schema of the table:
CREATE TABLE `goal_objectives` (
`objective_id` int(11) NOT NULL ,
`objective_name` varchar(255) NOT NULL,
`objective_description` tinytext NOT NULL,
`goal_id` int(11) NOT NULL,
`objective_status` tinyint(4) NOT NULL
);
select goal_id, count(objective_status)as objective_done
from goal_objectives
where objective_status='1' group by goal_id;
select goal_id,count(goal_id) as total_current_goals
from goal_objectives
group by goal_id
order by goal_id DESC ;
I just want to show the combined result of these two queries.
Individually they return the required result, but when I try to merge them it does not work.
See the output in the link below:
https://i.imgur.com/6Rnac89.png
Use conditional aggregation:
select goal_id, count(*) as total_current_goals,
sum( objective_status = 1 ) as objective_done
from goal_objectives
group by goal_id
order by goal_id desc ;
Note that objective_status is a number. The comparison value should be a number, not a string.
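For illustration, a minimal sketch of the conditional aggregation run from Python against in-memory SQLite (which, like MySQL, evaluates objective_status = 1 to 0 or 1). The rows are invented so that goal 1 matches the 5 total / 3 done example from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE goal_objectives ("
    " objective_id INTEGER NOT NULL,"
    " objective_name TEXT NOT NULL,"
    " objective_description TEXT NOT NULL,"
    " goal_id INTEGER NOT NULL,"
    " objective_status INTEGER NOT NULL)"
)
# Hypothetical rows: goal 1 has 5 objectives (3 done), goal 2 has 2 (1 done).
conn.executemany(
    "INSERT INTO goal_objectives VALUES (?, ?, '', ?, ?)",
    [
        (1, "a", 1, 1), (2, "b", 1, 1), (3, "c", 1, 1),
        (4, "d", 1, 0), (5, "e", 1, 0),
        (6, "f", 2, 0), (7, "g", 2, 1),
    ],
)

# One pass over the table: COUNT(*) gives the total per goal,
# SUM of the 0/1 comparison gives the number with status 1.
goal_rows = conn.execute(
    "SELECT goal_id, COUNT(*) AS total_current_goals,"
    "       SUM(objective_status = 1) AS objective_done"
    " FROM goal_objectives"
    " GROUP BY goal_id"
    " ORDER BY goal_id DESC"
).fetchall()

print(goal_rows)
```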

MySQL query optimization with group by clause

I want to calculate total and unique clickouts based on country, partner and retailer.
I have achieved the desired result, but I think it's not an optimal solution, and for larger data sets it will take longer. How can I improve this query?
Here is my test table, the query I designed, and the expected output:
"country_id","partner","retailer","id_customer","id_clickout"
"1","A","B","100","XX"
"1","A","B","100","XX"
"2","A","B","100","XX"
"2","A","B","100","GG"
"2","A","B","100","XX"
"2","A","B","101","XX"
DROP TABLE IF EXISTS x;
CREATE TEMPORARY TABLE x AS
SELECT test1.country_id, test1.partner,test1.retailer, test1.id_customer,
SUM(CASE WHEN test1.id_clickout IS NULL THEN 0 ELSE 1 END) AS clicks,
CASE WHEN test1.id_clickout IS NULL THEN 0 ELSE 1 END AS unique_clicks
FROM test1
GROUP BY 1,2,3,4
;
SELECT country_id,partner,retailer, SUM(clicks), SUM(unique_clicks)
FROM x
GROUP BY 1,2,3
Output:
"country_id","partner","retailer","SUM(clicks)","SUM(unique_clicks)"
"1","A","B","2","1"
"2","A","B","4","2"
And here is DDL and input data:
CREATE TABLE test (
country_id INT(11) DEFAULT NULL,
partner VARCHAR(256) CHARACTER SET utf8 DEFAULT NULL,
retailer VARCHAR(256) CHARACTER SET utf8 DEFAULT NULL,
id_customer BIGINT(20) DEFAULT NULL,
id_clickout VARCHAR(256) CHARACTER SET utf8 DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO test VALUES(1,'A','B','100','XX'),(1,'A','B','100','XX'),
(2,'A','B','100','XX'),(2,'A','B','100','GG'),
(2,'A','B','100','XX'),(2,'A','B','101','xx')
SELECT
country_id,
partner,
retailer,
COUNT(id_clickout) AS clicks,
COUNT(DISTINCT CASE WHEN id_clickout IS NOT NULL THEN id_customer END) AS unique_clicks
FROM
test1
GROUP BY
1,2,3
;
COUNT(a_field) won't count any NULL values.
So, COUNT(id_clickout) will only count the number of times that it is NOT NULL.
Equally, the CASE WHEN statement in the unique_clicks only returns the id_customer for records where they clicked, otherwise it returns NULL. This means that the COUNT(DISTINCT CASE) only counts distinct customers, and only when they clicked.
EDIT :
I just realised, it's potentially even simpler than that...
SELECT
country_id,
partner,
retailer,
COUNT(*) AS clicks,
COUNT(DISTINCT id_customer) AS unique_clicks
FROM
test1
WHERE
id_clickout IS NOT NULL
GROUP BY
1,2,3
;
The only material difference in the results will be that any country_id, partner, retailer combination that previously showed up with 0 clicks will now not appear in the results at all.
With an INDEX on (country_id, partner, retailer, id_clickout, id_customer) or (country_id, partner, retailer, id_customer, id_clickout), however, this query should be significantly faster.
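As a quick check, here is a sketch that loads the rows from the question's INSERT into in-memory SQLite and runs both variants. The table is named test1 to match the queries (the posted DDL calls it test), and explicit column names are used in GROUP BY instead of positions; both are choices made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test1 ("
    " country_id INTEGER, partner TEXT, retailer TEXT,"
    " id_customer INTEGER, id_clickout TEXT)"
)
conn.executemany(
    "INSERT INTO test1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A", "B", 100, "XX"), (1, "A", "B", 100, "XX"),
        (2, "A", "B", 100, "XX"), (2, "A", "B", 100, "GG"),
        (2, "A", "B", 100, "XX"), (2, "A", "B", 101, "xx"),
    ],
)

# Variant 1: COUNT skips NULLs, and the CASE yields NULL for non-clicks.
rows_case = conn.execute(
    "SELECT country_id, partner, retailer,"
    "       COUNT(id_clickout) AS clicks,"
    "       COUNT(DISTINCT CASE WHEN id_clickout IS NOT NULL"
    "                           THEN id_customer END) AS unique_clicks"
    " FROM test1"
    " GROUP BY country_id, partner, retailer"
    " ORDER BY country_id"
).fetchall()

# Variant 2: filter out the NULLs first, then count plainly.
rows_where = conn.execute(
    "SELECT country_id, partner, retailer,"
    "       COUNT(*) AS clicks,"
    "       COUNT(DISTINCT id_customer) AS unique_clicks"
    " FROM test1"
    " WHERE id_clickout IS NOT NULL"
    " GROUP BY country_id, partner, retailer"
    " ORDER BY country_id"
).fetchall()

print(rows_case)
print(rows_where)
```

The sample data has no NULL id_clickout values, so the two variants agree here; they would differ only for groups with no clicks at all.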
I think this is what you are after:
SELECT country_id,partner,retailer,COUNT(retailer) as `sum(clicks)`,count(distinct id_clickout) as `SUM(unique_clicks)`
FROM test1
GROUP BY country_id,partner,retailer
Result:
COUNTRY_ID PARTNER RETAILER SUM(CLICKS) SUM(UNIQUE_CLICKS)
1 A B 2 1
2 A B 4 2
See result in SQL Fiddle.

Select characters and group together with joins?

I have the following table setup in mysql:
CREATE TABLE `games_characters` (
`game_id` int(11) DEFAULT NULL,
`player_id` int(11) DEFAULT NULL,
`character_id` int(11) DEFAULT NULL,
KEY `game_id_key` (`game_id`),
KEY `character_id_key` (`character_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
My objective is to get a game_id where a list of character_ids are all present in this game_id.
An example set of data:
1, 1
1, 2
1, 3
2, 1
2, 2
3, 1
3, 4
Let's say I want to get the game_id whose character_ids include 1, 2, and 3. How would I go about writing an efficient query? The best idea I have had so far is joining the table to itself multiple times, but I assume there has to be a better way to do this.
Thanks
EDIT: for anyone curious, this was the final solution I used, as it gave the best query time:
SELECT game_ID
FROM (
SELECT DISTINCT character_ID, game_ID
FROM games_Characters
) AS T
WHERE character_ID
IN ( 1, 2, 3 )
GROUP BY game_ID
HAVING COUNT( * ) =3
Select game_ID from games_Characters
where character_ID in (1,2,3)
group by game_ID
having count(*) = 3
The above makes two assumptions:
1) you know the characters you're looking for
2) each combination of game_ID and character_ID is unique
I don't merely assume you can get the number 3 for the count; I know you can, since you know the list of characters you're looking for.
This ought to do it.
select game_id
from games_characters
where character_id in (1,2,3)
group by game_id
having count(*) = 3
If that's not dynamic enough for you you'll need to add a few more steps.
create temporary table character_ids(id int primary key);
insert into character_ids values (1),(2),(3);
select @count := count(*)
from character_ids;
select gc.game_id
from games_characters as gc
join character_ids as c
on (gc.character_id = c.id)
group by gc.game_id
having count(*) = @count;
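Another way to make the list dynamic is to build the IN list and the expected count in application code. A minimal sketch in Python with in-memory SQLite, using the example data from the question; the helper name games_with_all is made up, and COUNT(DISTINCT character_id) is used so that duplicate (game_id, character_id) rows do not inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games_characters ("
    " game_id INTEGER, player_id INTEGER, character_id INTEGER)"
)
# Example data from the question: (game_id, character_id) pairs.
conn.executemany(
    "INSERT INTO games_characters (game_id, character_id) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (3, 4)],
)


def games_with_all(character_ids):
    """Return the game_ids that contain every id in character_ids."""
    wanted = sorted(set(character_ids))
    if not wanted:
        return []
    # One "?" placeholder per wanted id; values are bound, not interpolated.
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        "SELECT game_id FROM games_characters"
        f" WHERE character_id IN ({placeholders})"
        " GROUP BY game_id"
        " HAVING COUNT(DISTINCT character_id) = ?"
        " ORDER BY game_id"
    )
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


matching_games = games_with_all([1, 2, 3])
print(matching_games)
```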

Counting the number of first place scores with mysql

Ok, so I have the following database:
CREATE TABLE IF NOT EXISTS `highscores` (
`lid` int(11) NOT NULL,
`username` varchar(15) NOT NULL,
`score` int(16) NOT NULL,
PRIMARY KEY (`lid`,`username`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
lid is the level id.
Let's say I have the following values in the table:
lid, username,score
1,sam,15
1,joe,12
1,sue,6
1,josh,9
2,sam,8
2,joe,16
2,sue,4
3,sam,65
4,josh,87
4,sue,43
5,sam,12
5,sue,28
5,joe,29
and so on.
How would I create a query (or, if required, a set of queries) to get the following:
sam has 3 high scores
joe has 2 high scores
josh has 1 high score
Thanks in advance.
I have not tested it, but try the following query:
select
concat(h.username ," has ", count(h.username)," high scores ")
from
highscores h inner join
(select lid, max(score) as maxscore
from highscores group by lid) t on h.lid = t.lid and h.score = t.maxscore
group by h.username
From what you've described, this query will produce what you need:
SELECT username,COUNT(*) as num_highscores FROM (
SELECT lid,username
FROM highscores h1
WHERE score=(
SELECT MAX(score)
FROM highscores h2
WHERE h2.lid=h1.lid
)
) AS high_scores
GROUP BY username
ORDER BY num_highscores DESC
Although the results I get on your sample data are different (sam has the top score only on levels 1 and 3):
sam 2
joe 2
josh 1
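To double-check those counts, here is a short sketch that loads the sample rows from the question into in-memory SQLite and joins each row against the per-level maximum. The secondary sort on username is an addition, only there to make the order of ties predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE highscores ("
    " lid INTEGER NOT NULL, username TEXT NOT NULL, score INTEGER NOT NULL,"
    " PRIMARY KEY (lid, username))"
)
# Sample values from the question.
conn.executemany(
    "INSERT INTO highscores VALUES (?, ?, ?)",
    [
        (1, "sam", 15), (1, "joe", 12), (1, "sue", 6), (1, "josh", 9),
        (2, "sam", 8), (2, "joe", 16), (2, "sue", 4),
        (3, "sam", 65),
        (4, "josh", 87), (4, "sue", 43),
        (5, "sam", 12), (5, "sue", 28), (5, "joe", 29),
    ],
)

# Join each score to the best score of its level, then count per user.
first_places = conn.execute(
    """
    SELECT h.username, COUNT(*) AS num_highscores
    FROM highscores h
    JOIN (SELECT lid, MAX(score) AS maxscore
          FROM highscores
          GROUP BY lid) t
      ON h.lid = t.lid AND h.score = t.maxscore
    GROUP BY h.username
    ORDER BY num_highscores DESC, h.username
    """
).fetchall()

for username, num in first_places:
    print(username, num)
```

If two users tie for the top score on a level, both are counted for that level.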