MySQL Query to select subset of top 50 rows based on frequency

I have a table called data:
create table data
(
ID int primary key,
val varchar(50),
forID int,
constraint fk_forID foreign key (forID) references otherTable(forID)
)
I have a view called dataFrequencies:
create view dataFrequencies (val, freq)
as select val, COUNT(*) as freq
from data
group by val
order by freq desc
What I want is the subset of rows from table data where val is in the top fifty rows of dataFrequencies.
My current solution is somewhat roundabout. I create a table topFifty that contains the top 50 rows of dataFrequencies. Then I create a view topFiftyVals which selects all from data but inner joins on table topFifty:
create table topFifty
(
val varchar(50) primary key
)
insert into topFifty select val from dataFrequencies order by freq desc limit 50;
create view topFiftyVals (ID, val, forID)
as select d.*
from data d
inner join topFifty tf on d.val = tf.val
I am sure there is some kind of direct querying method that will do this! Thanks for all the help!

Yes, there is a direct way. It's the code in your topFiftyVals view, slightly altered:
select d.*, tf.freq
from data d
inner join ( select val, COUNT(*) AS freq
from data
group by val
order by freq desc
limit 50
) tf
on d.val = tf.val ;
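To sanity-check the derived-table join, you can run it against a tiny in-memory database. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite accepts this query unchanged). The sample rows and the `top_n_rows` helper are hypothetical, and the limit is a parameter so that a cutoff of 2 can be tested on three distinct values.

```python
import sqlite3

def top_n_rows(conn, n):
    """Rows of `data` whose val is among the n most frequent values,
    plus that value's frequency. Ties at the cutoff are broken arbitrarily."""
    sql = """
        SELECT d.ID, d.val, d.forID, tf.freq
        FROM data d
        INNER JOIN (SELECT val, COUNT(*) AS freq
                    FROM data
                    GROUP BY val
                    ORDER BY freq DESC
                    LIMIT ?) tf
            ON d.val = tf.val
        ORDER BY d.ID
    """
    return conn.execute(sql, (n,)).fetchall()

conn = sqlite3.connect(":memory:")
# forID is kept as a plain column; the foreign key to otherTable is omitted here.
conn.execute("CREATE TABLE data (ID INTEGER PRIMARY KEY, val VARCHAR(50), forID INT)")
# Hypothetical sample rows: 'a' occurs three times, 'b' twice, 'c' once.
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [
    (1, "a", 10), (2, "b", 10), (3, "a", 11),
    (4, "c", 12), (5, "b", 12), (6, "a", 13),
])

rows = top_n_rows(conn, 2)  # 2 instead of 50 so the cutoff is visible
for row in rows:
    print(row)
```

If two values tie for the last place, LIMIT keeps one of them without any guarantee which. Add a secondary sort key (for example `ORDER BY freq DESC, val`) if the choice must be stable.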

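Couldn't you just do the following? MySQL does not accept LIMIT directly inside an IN (...) subquery, so the limited subquery has to be wrapped in a derived table:
SELECT *
FROM data
WHERE val IN (SELECT val
FROM (SELECT val
FROM dataFrequencies
ORDER BY freq DESC
LIMIT 50) AS t);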
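Because MySQL rejects LIMIT directly inside an IN subquery, the usual workaround is to put the limited subquery in a derived table, as in the sketch below. It uses Python's sqlite3 module as a stand-in. SQLite would also accept the unwrapped form, so this only checks that the wrapped form selects the intended rows; it does not reproduce MySQL's restriction. The sample rows and the `rows_in_top` helper are hypothetical, and the view aliases COUNT(*) as freq.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (ID INTEGER PRIMARY KEY, val VARCHAR(50), forID INT);
    -- The view aliases COUNT(*) so it can be referenced as freq.
    CREATE VIEW dataFrequencies (val, freq) AS
        SELECT val, COUNT(*) AS freq FROM data GROUP BY val;
""")
# Hypothetical sample rows: 'y' occurs four times, 'x' twice, 'z' once.
conn.executemany("INSERT INTO data (ID, val) VALUES (?, ?)", [
    (1, "x"), (2, "y"), (3, "y"), (4, "z"), (5, "y"), (6, "x"), (7, "y"),
])

def rows_in_top(conn, n):
    """IDs of data rows whose val is among the n most frequent values.
    The LIMIT sits inside a derived table (t), the form MySQL accepts."""
    sql = """
        SELECT ID FROM data
        WHERE val IN (SELECT val
                      FROM (SELECT val FROM dataFrequencies
                            ORDER BY freq DESC LIMIT ?) AS t)
        ORDER BY ID
    """
    return [row[0] for row in conn.execute(sql, (n,))]

print(rows_in_top(conn, 1))
print(rows_in_top(conn, 2))
```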

Related

MySQL Row Position Using ORDER BY

I'm trying to get this to work. When I run the SELECT on the whole dataset, I know that the record with cust_number 43246 shows up in position 6 (when using ORDER BY), but this code returns position 37327, which is its position without the ORDER BY.
SELECT
x.position,
x.cust_number,
x.company,
x.surname,
x.first_name,
x.title
FROM
(SELECT
@rownum:=@rownum + 1 AS position,
c.cust_number,
company,
surname,
first_name,
title
FROM
1_customer_records c
LEFT JOIN addresses a ON c.fk_addresses_id = a.id
JOIN (SELECT @rownum:=0) r
ORDER BY a.company , c.surname , c.first_name , c.title) x
WHERE
x.cust_number = 43246;

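Here is another approach, using a temporary table. The rows are inserted in the required order, so each row's AUTO_INCREMENT id is its position; this relies on MySQL assigning the ids in the ORDER BY order of the INSERT ... SELECT: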
CREATE TEMPORARY TABLE row_calc (id INT AUTO_INCREMENT, fk INT NULL, PRIMARY KEY (id)) ENGINE=MEMORY;
INSERT INTO row_calc(fk)
SELECT
cust_number
FROM
1_customer_records c
LEFT JOIN
addresses a ON c.fk_addresses_id = a.id
ORDER BY company,surname,first_name,title;
SELECT
id
FROM
row_calc
WHERE
fk = 43246 LIMIT 1;
DROP TABLE row_calc;
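On a server that supports window functions (MySQL 8.0 or later), ROW_NUMBER() can number the rows in the ORDER BY order without user variables or a temporary table. This is a swapped-in technique, not one of the answers above. The sketch uses Python's sqlite3 module as a stand-in for MySQL, which assumes a bundled SQLite of version 3.25 or later. The table is renamed to a hypothetical customer_records (SQLite would need an identifier starting with a digit to be quoted), and the rows and the `position_of` helper are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (id INT PRIMARY KEY, company VARCHAR(100));
    CREATE TABLE customer_records (
        cust_number INT PRIMARY KEY,
        surname VARCHAR(50), first_name VARCHAR(50), title VARCHAR(10),
        fk_addresses_id INT
    );
    INSERT INTO addresses VALUES (1, 'Acme'), (2, 'Beta');
    INSERT INTO customer_records VALUES
        (100,   'Smith', 'Ann', 'Ms', 2),
        (43246, 'Jones', 'Bob', 'Mr', 1),
        (200,   'Adams', 'Cy',  'Mr', 1),
        (300,   'Zed',   'Di',  'Dr', NULL);
""")

def position_of(conn, cust_number):
    """1-based position of a customer in the ORDER BY order, or None if absent."""
    sql = """
        SELECT x.position FROM (
            SELECT c.cust_number,
                   ROW_NUMBER() OVER (ORDER BY a.company, c.surname,
                                               c.first_name, c.title) AS position
            FROM customer_records c
            LEFT JOIN addresses a ON c.fk_addresses_id = a.id
        ) x
        WHERE x.cust_number = ?
    """
    row = conn.execute(sql, (cust_number,)).fetchone()
    return row[0] if row else None

# A NULL company (no address) sorts first in ascending order, in SQLite and MySQL alike.
print(position_of(conn, 43246))
```

The numbering is computed over the whole ordered set before the outer WHERE filters it, which is exactly what the user-variable version fails to guarantee.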

How to update by Ignoring null values in mysql

I have my data like this:
Heading
and I want output like this:
Name Gender Salary
Sam M 3.45
Priya F 4.02
Please help me out.
Thank you.

This is not a practical scenario. At the very least you need an ordering column or a sequence number to manage your data set, but try something like the following. Be aware that the ORDER BY will most probably cause you many issues when you work with the real data set.
CREATE TEMPORARY TABLE t_names
SELECT ROW_NUMBER() OVER ( ORDER BY NAME) rowid, NAME FROM Heading
WHERE NAME IS NOT NULL;
CREATE TEMPORARY TABLE t_gender
SELECT ROW_NUMBER() OVER ( ORDER BY NAME) rowid, gender FROM Heading
WHERE gender IS NOT NULL;
CREATE TEMPORARY TABLE t_salary
SELECT ROW_NUMBER() OVER ( ORDER BY NAME) rowid, salary FROM Heading
WHERE salary IS NOT NULL;
SELECT nm.name, tg.gender, sl.salary FROM t_names nm
INNER JOIN t_gender tg ON tg.rowid = nm.rowid
INNER JOIN t_salary sl ON sl.rowid = nm.rowid;
What I have given you is a suggestion for your scenario. But if this is a real-world scenario, it is better to discuss it with your team and design a proper structure for the table, with a primary key and a sequence number.
At the very least, add a new column and make it an identity column (AUTO_INCREMENT in MySQL); that gives your table a sequence (named seq in the queries below).
If you create a sequence number as I explained, you are far more likely to retrieve exactly the result set you expect:
CREATE TEMPORARY TABLE t_names
SELECT ROW_NUMBER() OVER ( ORDER BY seq) rowid, NAME FROM Heading
WHERE NAME IS NOT NULL;
CREATE TEMPORARY TABLE t_gender
SELECT ROW_NUMBER() OVER ( ORDER BY seq) rowid, gender FROM Heading
WHERE gender IS NOT NULL;
CREATE TEMPORARY TABLE t_salary
SELECT ROW_NUMBER() OVER ( ORDER BY seq) rowid, salary FROM Heading
WHERE salary IS NOT NULL;
SELECT nm.name, tg.gender, sl.salary FROM t_names nm
INNER JOIN t_gender tg ON tg.rowid = nm.rowid
INNER JOIN t_salary sl ON sl.rowid = nm.rowid;
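The seq-based version can be tried out as a single statement, with CTEs in place of the three temporary tables. The original data sample is missing, so the sketch assumes a hypothetical layout: each row of Heading carries exactly one non-null value, and names, genders and salaries appear in matching seq order. It uses Python's sqlite3 module as a stand-in for MySQL (window functions need SQLite 3.25+ or MySQL 8.0+). The row-number column is renamed rn because rowid has a special meaning in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Heading (seq INTEGER PRIMARY KEY, name TEXT, gender TEXT, salary REAL)")
# Hypothetical layout: one non-null value per row, in matching order per column.
conn.executemany("INSERT INTO Heading VALUES (?, ?, ?, ?)", [
    (1, "Sam", None, None), (2, "Priya", None, None),
    (3, None, "M", None), (4, None, "F", None),
    (5, None, None, 3.45), (6, None, None, 4.02),
])

QUERY = """
WITH t_names AS (
    SELECT ROW_NUMBER() OVER (ORDER BY seq) AS rn, name
    FROM Heading WHERE name IS NOT NULL),
t_gender AS (
    SELECT ROW_NUMBER() OVER (ORDER BY seq) AS rn, gender
    FROM Heading WHERE gender IS NOT NULL),
t_salary AS (
    SELECT ROW_NUMBER() OVER (ORDER BY seq) AS rn, salary
    FROM Heading WHERE salary IS NOT NULL)
SELECT nm.name, tg.gender, sl.salary
FROM t_names nm
INNER JOIN t_gender tg ON tg.rn = nm.rn
INNER JOIN t_salary sl ON sl.rn = nm.rn
ORDER BY nm.rn
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```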

Is there a simpler way to perform this query

Here is that query (MySQL syntax):
select
id_image
from
(
select
id_image
, count(id_image) as nb
from
data
group by
id_image
) temp_table
where
nb = (select count(distinct id_group) from data)
data is a table of 3 columns: int id_user, int id_group and int id_image
A row (x, y, z) means that:
image z is in image group y
group y was created by user x
And we want to list all the images that are present in every image group. Thanks.

You are selecting all Image IDs that occur in data as often as there are distinct group IDs in that table? That seems strange.
Anyhow, the query can be re-written as:
select id_image
from data
group by id_image
having count(*) = (select count(distinct id_group) from data);
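The sketch below checks that the HAVING rewrite returns the same images as the original query. It also shows why counting rows is fragile. If the same image is listed twice in one group, it is counted twice and can be reported even though it is missing from another group. Counting distinct groups per image (COUNT(DISTINCT id_group)) is a swapped-in variant of this "relational division" query, not one of the answers. The sketch uses Python's sqlite3 module as a stand-in for MySQL, with hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id_user INT, id_group INT, id_image INT)")
# Hypothetical rows: image 10 is in groups 1 and 2, image 20 only in group 1,
# and image 30 is listed twice in group 1 (a duplicate row).
conn.executemany("INSERT INTO data VALUES (?, ?, ?)",
                 [(1, 1, 10), (2, 2, 10), (1, 1, 20), (1, 1, 30), (1, 1, 30)])

ORIGINAL = """
SELECT id_image FROM (
    SELECT id_image, COUNT(id_image) AS nb FROM data GROUP BY id_image
) temp_table
WHERE nb = (SELECT COUNT(DISTINCT id_group) FROM data)
ORDER BY id_image
"""
HAVING_REWRITE = """
SELECT id_image FROM data GROUP BY id_image
HAVING COUNT(*) = (SELECT COUNT(DISTINCT id_group) FROM data)
ORDER BY id_image
"""
DISTINCT_GROUPS = """
SELECT id_image FROM data GROUP BY id_image
HAVING COUNT(DISTINCT id_group) = (SELECT COUNT(DISTINCT id_group) FROM data)
ORDER BY id_image
"""

def image_ids(sql):
    return [row[0] for row in conn.execute(sql)]

original = image_ids(ORIGINAL)
rewrite = image_ids(HAVING_REWRITE)
distinct_groups = image_ids(DISTINCT_GROUPS)
print(original, rewrite, distinct_groups)
```

Both row-counting queries report image 30, which is only in group 1. The distinct-group version reports image 10 alone.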

How about trying this:
select id_group, id_image from data where id_group in (select distinct id_group from data);

How do I write this kind of query (returning the latest available data for each row)

I have a table defined like this:
CREATE TABLE mytable (id INT NOT NULL AUTO_INCREMENT, PRIMARY KEY(id),
user_id INT REFERENCES user(id) ON UPDATE CASCADE ON DELETE RESTRICT,
amount REAL NOT NULL CHECK (amount > 0),
record_date DATE NOT NULL
);
CREATE UNIQUE INDEX idxu_mybl_key ON mytable (user_id, amount, record_date);
I want to write a query that will have two columns:
user_id
amount
There should be only ONE entry in the returned result set for a given user. Furthermore, the amount returned should be the last recorded amount for the user (i.e. the one at MAX(record_date)).
The complication arises because amounts are recorded on different dates for different users, so there is no single LAST record_date for all users.
How can I write a query (preferably ANSI SQL) that returns the columns mentioned above, while ensuring that only the last recorded amount for each user is returned?
As an aside, it is probably a good idea to return the record_date column as well, so that it is easier to verify that the query works as required.
I am using MySQL as my backend db, but ideally the query should be db agnostic (i.e. ANSI SQL) if possible.

First you need the last record_date for each user:
select user_id, max(record_date) as last_record_date
from mytable
group by user_id
Now you can join the previous query with mytable itself to get the amount for this record_date:
select
t1.user_id, last_record_date, amount
from
mytable t1
inner join
( select user_id, max(record_date) as last_record_date
from mytable
group by user_id
) t2
on t1.user_id = t2.user_id
and t1.record_date = t2.last_record_date
A problem appears because a user can have several rows for the same last_record_date (with different amounts). In that case you have to pick one of them; for example, by taking the max of the different amounts:
select
t1.user_id, t1.record_date as last_record_date, max(t1.amount)
from
mytable t1
inner join
( select user_id, max(record_date) as last_record_date
from mytable
group by user_id
) t2
on t1.user_id = t2.user_id
and t1.record_date = t2.last_record_date
group by t1.user_id, t1.record_date
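The final join-plus-GROUP-BY query can be run against a small in-memory table, as sketched below. The sketch uses Python's sqlite3 module as a stand-in for MySQL. The rows are hypothetical, and user 2 deliberately has two amounts on its latest date. The foreign key to user(id) is left out, and dates are stored as ISO strings, which compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE mytable (
    id INTEGER PRIMARY KEY,
    user_id INT,
    amount REAL NOT NULL CHECK (amount > 0),
    record_date DATE NOT NULL)""")
conn.execute("CREATE UNIQUE INDEX idxu_mybl_key ON mytable (user_id, amount, record_date)")
# Hypothetical rows; user 2 has two different amounts on its latest date.
conn.executemany("INSERT INTO mytable (user_id, amount, record_date) VALUES (?, ?, ?)", [
    (1, 5.0, "2024-01-01"), (1, 6.0, "2024-02-01"),
    (2, 1.0, "2023-12-01"), (2, 3.0, "2024-01-15"), (2, 4.0, "2024-01-15"),
])

LATEST = """
SELECT t1.user_id, t1.record_date AS last_record_date, MAX(t1.amount) AS amount
FROM mytable t1
INNER JOIN (SELECT user_id, MAX(record_date) AS last_record_date
            FROM mytable
            GROUP BY user_id) t2
    ON t1.user_id = t2.user_id
   AND t1.record_date = t2.last_record_date
GROUP BY t1.user_id, t1.record_date
ORDER BY t1.user_id
"""
rows = conn.execute(LATEST).fetchall()
print(rows)
```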

I do not know about MySQL, but in general SQL you need a subquery for this. You must join the query that calculates the greatest record_date with the original table to get the corresponding amount. Roughly like this:
SELECT B.*
FROM
(select user_id, max(record_date) max_date from mytable group by user_id) A
join
mytable B
on A.user_id = B.user_id and A.max_date = B.record_date

SELECT datatable.* FROM
mytable AS datatable
INNER JOIN (
SELECT user_id,max(record_date) AS max_record_date FROM mytable GROUP BY user_id
) AS selectortable ON
selectortable.user_id=datatable.user_id
AND
selectortable.max_record_date=datatable.record_date
In some SQL dialects you might need
SELECT MAX(user_id), ...
in the selectortable subquery instead of simply SELECT user_id, ...

The definition of maximum: there is no larger (or: "more recent") value than this one. This naturally leads to a NOT EXISTS query, which should be available in any DBMS.
SELECT user_id, amount
FROM mytable mt
WHERE mt.user_id = $user
AND NOT EXISTS ( SELECT *
FROM mytable nx
WHERE nx.user_id = mt.user_id
AND nx.record_date > mt.record_date
)
;
BTW: your table definition allows more than one record to exist for a given {user_id, record_date}, but with different amounts. This query will return them all.
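The sketch below runs the NOT EXISTS query with and without the per-user filter, using Python's sqlite3 module as a stand-in for MySQL. The `$user` placeholder becomes a bound parameter in a hypothetical `latest_rows` helper, and leaving it out returns the latest row(s) for every user. The sample rows are made up; user 2 has two amounts on its latest date, and both come back, as the answer notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE mytable (
    id INTEGER PRIMARY KEY, user_id INT,
    amount REAL NOT NULL CHECK (amount > 0), record_date DATE NOT NULL)""")
# Hypothetical rows; dates are ISO strings so they compare correctly as text.
conn.executemany("INSERT INTO mytable (user_id, amount, record_date) VALUES (?, ?, ?)", [
    (1, 5.0, "2024-01-01"), (1, 6.0, "2024-02-01"),
    (2, 1.0, "2023-12-01"), (2, 3.0, "2024-01-15"), (2, 4.0, "2024-01-15"),
])

BASE = """
SELECT mt.user_id, mt.amount, mt.record_date
FROM mytable mt
WHERE NOT EXISTS (SELECT *
                  FROM mytable nx
                  WHERE nx.user_id = mt.user_id
                    AND nx.record_date > mt.record_date)
"""

def latest_rows(user_id=None):
    """Rows with no more recent record for the same user; optionally one user only."""
    sql, params = BASE, ()
    if user_id is not None:
        sql += " AND mt.user_id = ?"  # plays the role of $user
        params = (user_id,)
    return conn.execute(sql + " ORDER BY mt.user_id, mt.amount", params).fetchall()

print(latest_rows())
print(latest_rows(1))
```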

How to prevent GROUP_CONCAT from creating a result when no input data is present?

Given the following MySQL query:
SELECT
`show`.`id`
, GROUP_CONCAT( `showClips`.`clipId` ORDER BY `position` ASC ) AS 'playlist'
FROM
`show`
INNER JOIN
`showClips`
ON
( `show`.`id` = `showClips`.`showId` )
;
I want to retrieve a list of all "shows" from the database, including the ids of contained "clips".
This works fine, as long as there are entries in the show table. For this problem, let's assume all tables are completely empty.
GROUP_CONCAT will return NULL and thus force a row into the result (which contains only NULL values).
My application will then think that one show/result exists. But that result will be invalid. This can of course be checked, but I feel like this could (and should) be prevented in the query already.

You should simply add a GROUP BY at the end.
Test case:
CREATE TABLE `show` (id int);
CREATE TABLE `showClips` (clipId int, showId int, position int);
SELECT
`show`.`id`,
GROUP_CONCAT( `showClips`.`clipId` ORDER BY `position` ASC ) AS 'playlist'
FROM `show`
INNER JOIN `showClips` ON ( `show`.`id` = `showClips`.`showId` )
GROUP BY `show`.`id`;
Empty set (0.00 sec)

Add group by show.id; then the result will be correct for empty tables:
create table emptyt(id int, name varchar(20));
select id, group_concat(name) from emptyt;
result:
NULL, NULL
query with group by:
select id, group_concat(name) from emptyt
group by id;
result:
empty dataset
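The reason GROUP BY helps is that an aggregate query without GROUP BY always produces exactly one row, even over zero input rows, while a grouped query produces one row per group and therefore none for empty input. The sketch below shows this with Python's sqlite3 module as a stand-in for MySQL, using the show and showClips tables and hypothetical rows. It drops `ORDER BY position` inside GROUP_CONCAT because SQLite only accepts an ORDER BY inside aggregate calls from version 3.44, so the playlist order here is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "show" (id INT);
    CREATE TABLE showClips (clipId INT, showId INT, position INT);
""")

BASE = """
    SELECT "show".id, GROUP_CONCAT(showClips.clipId) AS playlist
    FROM "show"
    INNER JOIN showClips ON "show".id = showClips.showId
"""
WITHOUT_GROUP_BY = BASE
WITH_GROUP_BY = BASE + ' GROUP BY "show".id'

# Both tables are still empty here.
empty_without = conn.execute(WITHOUT_GROUP_BY).fetchall()
empty_with = conn.execute(WITH_GROUP_BY).fetchall()
print(empty_without)
print(empty_with)

# Hypothetical data: show 1 has two clips, show 2 has none (dropped by the INNER JOIN).
conn.executescript("""
    INSERT INTO "show" VALUES (1), (2);
    INSERT INTO showClips VALUES (7, 1, 1), (8, 1, 2);
""")
filled_with = conn.execute(WITH_GROUP_BY).fetchall()
print(filled_with)
```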

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008