How to find the most frequent string in MySQL

I would like to get the most frequent car type (varchar) among owners who are 25 or older. I wrote a query, but it counts all the types together.
How can I extend this MySQL query so that it counts each car type separately?
SELECT type, COUNT(type)
FROM `car`
INNER JOIN owner
ON car.owner = owner.id
WHERE 2018 - owner.birth_date >= 25

You have to group by type, sort the result by count in descending order, and take only the top row with LIMIT 1.
SELECT type, COUNT(type) AS counter
FROM `car`
INNER JOIN owner
ON car.owner = owner.id
WHERE 2018 - owner.birth_date >= 25
GROUP BY type
ORDER BY COUNT(type) DESC
LIMIT 1
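As a rough check of the GROUP BY / ORDER BY / LIMIT 1 pattern, here is a minimal sketch that runs the same query shape from Python. It uses SQLite in place of MySQL so that it is self-contained, and it assumes birth_date holds a plain birth year (which is what the arithmetic in the question implies). The sample rows and the helper name most_frequent_type are made up for illustration.

```python
import sqlite3

def most_frequent_type(conn, min_age, current_year):
    """Return (type, count) for the most common car type among owners aged min_age or more."""
    sql = """
        SELECT car.type, COUNT(*) AS counter
        FROM car
        INNER JOIN owner ON car.owner = owner.id
        WHERE ? - owner.birth_date >= ?
        GROUP BY car.type                 -- one row per car type
        ORDER BY counter DESC, car.type   -- highest count first, name breaks ties
        LIMIT 1
    """
    return conn.execute(sql, (current_year, min_age)).fetchone()

# Hypothetical sample data; birth_date is assumed to be a year.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner (id INTEGER PRIMARY KEY, birth_date INTEGER);
    CREATE TABLE car (id INTEGER PRIMARY KEY, type TEXT, owner INTEGER);
    INSERT INTO owner VALUES (1, 1980), (2, 1990), (3, 2000);
    INSERT INTO car (type, owner) VALUES
        ('sedan', 1), ('sedan', 2), ('suv', 1),
        ('coupe', 3), ('coupe', 3), ('coupe', 3);
""")
result = most_frequent_type(conn, 25, 2018)
print(result)
```

The coupes belong to an owner who is too young, so they are filtered out before grouping, and the remaining counts are compared per type.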

If you want the single most frequent type, you could use the following (TOP 1 is SQL Server syntax; in MySQL, drop it and append LIMIT 1 instead)
SELECT top 1 type, COUNT(type)
FROM car
INNER JOIN owner
ON car.owner = owner.id
WHERE 2018 - owner.birth_date >= 25
GROUP BY type
order by count(type) desc, type
if you are sure there are no ties.
Also, your way of determining age isn't exact. You could use something like this (SQL Server syntax as well):
WHERE
case
when DATEPART(DY, owner.birth_date) > DATEPART(DY, GETDATE())
then DATEDIFF(YYYY, owner.birth_date, GETDATE()) - 1
else DATEDIFF(YYYY, owner.birth_date, GETDATE())
end >= 25

You could start by doing the date arithmetic correctly. After that, you probably just want GROUP BY and LIMIT:
SELECT c.type, COUNT(*) AS counter
FROM car c INNER JOIN
owner o
ON c.owner = o.id
WHERE o.birth_date <= curdate() - interval 25 year
GROUP BY c.type
ORDER BY counter DESC
LIMIT 1
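To illustrate why subtracting years alone is not exact, here is a small sketch in plain Python. The function names and the sample dates are hypothetical; the point is only that someone whose birthday has not yet occurred this year is one year younger than the year difference suggests.

```python
from datetime import date

def naive_age(birth, today):
    """Year difference only, as in 2018 - birth_year."""
    return today.year - birth.year

def exact_age(birth, today):
    """Year difference, minus one if this year's birthday is still ahead."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)

# Hypothetical owner born in December, checked in June.
birth = date(1993, 12, 1)
today = date(2018, 6, 15)
print(naive_age(birth, today), exact_age(birth, today))
```

Comparing the birth date against a cutoff date 25 years in the past, as the last query does, avoids the problem without computing an age at all.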

Related

sql optimization: count all rows through subquery or own query / other improvements

I'm trying to improve my mysql query. At first I'm trying to optimize that simple one:
SELECT * ,
(
SELECT COUNT(id)
FROM animal
WHERE type = :type AND timestampadopt > 0 AND (date BETWEEN DATE_FORMAT(CURDATE() , '%Y-%m-%d') - INTERVAL 1 YEAR AND DATE_FORMAT(CURDATE(),'%Y-%m-%d'))
) AS countanimals
FROM animal
WHERE type = :type AND timestampadopt > 0 AND (date BETWEEN DATE_FORMAT(CURDATE() , '%Y-%m-%d') - INTERVAL 1 YEAR AND DATE_FORMAT(CURDATE(),'%Y-%m-%d'))
ORDER BY timestamp DESC
LIMIT 1, 20;
COLUMNS:
id | timestampadd | timestampadopt | dateborn | animaltype | gender | chipped | smalldescger | smalldesceng | imagepath
On the affected page I list all animals with pagination: you see 20 animals, and for the next 20 you have to use the Next button.
For the pagination I need to know how many pages have to be displayed, so I have to count how many animals there are in total. That is what the subquery does.
I profiled the query and got the following times:
0.0047s for the total query,
0.0023s for the subquery
The table contains only 5 rows!
The page offers some filters, such as age +/- 1 year and whether the animal has already been adopted. That is why I need the WHERE clause in both queries, which probably costs the most performance, followed by the ORDER BY clause, which is necessary to display the newest animals first.
P.S. I need all columns from the table. I did some tests, and SELECT * had the same runtime as selecting all 10 columns explicitly, as some people recommend.
EDIT:
Would it be worth moving the smalltext (varchar 250) and imagepath (varchar 50) columns into their own table and inner joining them? I will probably need the other columns for later filters, but type, gender and chipped are tinyints.
Any improvement tips for me?
Should I run the subquery as its own query, outside of the main one?
Edit: 31.07
SELECT a.* , c.cnt AS countanimals
FROM animal a
JOIN (
Select a1.date AS date1, a1.tmstmpadopt AS tmstmpadopt1, a1.type AS type1, COUNT(a1.id) as cnt
FROM animal a1
GROUP BY date1, tmstmpadopt1, type1
) c on (a.date = c.date1 AND a.tmstmpadopt = c.tmstmpadopt1 AND a.type = c.type1)
WHERE a.type = 1 AND tmstmpadopt = 0 AND (date BETWEEN DATE_FORMAT(CURDATE() , '%Y-%m-%d') - INTERVAL 100 YEAR AND DATE_FORMAT(CURDATE(),'%Y-%m-%d')- INTERVAL 1 YEAR)
ORDER BY a.timestamp DESC
LIMIT 1, 20;
An inline view may help. Try this:
SELECT a.*,c.cnt AS countanimals
FROM animal a
join (Select a1.dateborn, a1.timestampadopt, count(a1.id) as cnt
from animal a1
Where a1.timestampadopt > 0
and a1.type = :type
group by a1.dateborn, a1.timestampadopt) c on (a.dateborn = c.dateborn and a.timestampadopt = c.timestampadopt)
WHERE a.type = :type
AND a.timestampadopt > 0
AND a.dateborn BETWEEN DATE_FORMAT(CURDATE(),'%Y-%m-%d')-INTERVAL 1 YEAR AND DATE_FORMAT(CURDATE(),'%Y-%m-%d')
ORDER BY a.timestamp DESC
LIMIT 1, 20;
Why don't you do the count in the script? As you process the rows, you can count them.
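One way to take the count out of the main query is to run it as a separate COUNT(*) with the same filter and compute the number of pages in the script. Counting rows while looping only sees the current page, because LIMIT has already cut the result down. The sketch below uses SQLite instead of MySQL to stay self-contained; the reduced table layout, the fetch_page helper and the sample rows are assumptions for illustration only.

```python
import math
import sqlite3

PAGE_SIZE = 20
FILTER = "type = ? AND timestampadopt > 0"  # same filter for both queries

def fetch_page(conn, animal_type, page):
    """Return (rows, total, page_count) for a 1-based page number."""
    total = conn.execute(
        f"SELECT COUNT(*) FROM animal WHERE {FILTER}", (animal_type,)
    ).fetchone()[0]
    rows = conn.execute(
        f"SELECT * FROM animal WHERE {FILTER} "
        "ORDER BY timestampadd DESC LIMIT ? OFFSET ?",
        (animal_type, PAGE_SIZE, (page - 1) * PAGE_SIZE),
    ).fetchall()
    return rows, total, math.ceil(total / PAGE_SIZE)

# Hypothetical data: 45 adopted animals of type 1, 5 of type 2.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE animal (id INTEGER PRIMARY KEY, timestampadd INTEGER, "
    "timestampadopt INTEGER, type INTEGER)"
)
conn.executemany(
    "INSERT INTO animal (timestampadd, timestampadopt, type) VALUES (?, ?, ?)",
    [(i, 1, 1) for i in range(1, 46)] + [(i, 1, 2) for i in range(46, 51)],
)
rows, total, page_count = fetch_page(conn, 1, 3)
print(len(rows), total, page_count)
```

With this split the count is computed once per request instead of being attached to every returned row.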

Comparing Dates in a MYSQL Subquery

I have two tables
class
-------------
id name
-------------
1 Knives
2 Pastries
class_date
-------------
get_id start_date
-------------
1 2017-10-09
1 2017-11-15
1 2017-12-03
2 2017-10-30
The class 'Knives' is a series with multiple dates. The class 'Pastries' is only offered on one date.
I want my result to be based on Oct 10, 2017 (or current date). In my search I only want results based on the first date - in this case the date of Oct 9, 2017 for 'Knives' should disqualify it from showing up in the results. 'Pastries' should show up.
I am not sure if I should do a LEFT OUTER JOIN or a Subquery. I've tried both but neither works - but I'm probably not doing it correctly.
This is what I tried:
SELECT *
FROM class, class_date WHERE
class_date.get_id = class.id &&
(SELECT DATE(start_date)
FROM class, class_date WHERE
class_date.get_id = classes.id
ORDER BY class_date.start_date ASC
LIMIT 1
) > CURDATE()
ORDER BY class_date.start_date ASC
and
SELECT *
FROM class
LEFT OUTER JOIN
class_date ON
class_date.get_id = classes.id
WHERE
class_date.start_date > CURDATE()
GROUP BY classes.class_id
ORDER BY class_dates.start_date ASC
I have a feeling that the subquery is the way to go but I get no results. If I use < instead of > I get too many results. Any help would be appreciated.
Here is one method to get the most recent record as of a particular date. This allows you to get all the rows (and you can join in class to get rows there):
select cd.*
from class_date cd
where cd.start_date = (select max(cd2.start_date)
from class_date cd2
where cd2.get_id = cd.get_id and
cd2.start_date <= '2017-10-09'
);
If you just want the maximum date for a given class:
select cd.get_id, max(cd.start_date)
from class_date cd
where cd.start_date <= '2017-10-09'
group by cd.get_id;
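The question itself asks for classes whose first date lies after a reference date. One possible way to express that, offered here as a sketch and not as the method from the answer above, is to group by class and filter on MIN(start_date) in a HAVING clause. The example uses SQLite in place of MySQL, the tables and rows from the question, and a hypothetical helper name classes_starting_after.

```python
import sqlite3

def classes_starting_after(conn, ref_date):
    """Return classes whose earliest start_date is strictly after ref_date."""
    sql = """
        SELECT c.id, c.name, MIN(cd.start_date) AS first_date
        FROM class c
        INNER JOIN class_date cd ON cd.get_id = c.id
        GROUP BY c.id, c.name
        HAVING MIN(cd.start_date) > ?   -- filter on the first date only
        ORDER BY first_date
    """
    return conn.execute(sql, (ref_date,)).fetchall()

# Rows taken from the question; dates are stored as ISO strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class_date (get_id INTEGER, start_date TEXT);
    INSERT INTO class VALUES (1, 'Knives'), (2, 'Pastries');
    INSERT INTO class_date VALUES
        (1, '2017-10-09'), (1, '2017-11-15'), (1, '2017-12-03'),
        (2, '2017-10-30');
""")
result = classes_starting_after(conn, "2017-10-10")
print(result)
```

Knives is dropped because its series began on 2017-10-09, even though it has later dates; Pastries remains.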

mysql : Get latest value and sum of values from previous hour

I would like to return a product together with its latest value and values from last hour.
I have a product-table :
id, name, type (and so on)...
I have a values-table :
id_prod, timestamp, value
Something like :
12:00:00 = 10
12:15:00 = 10
12:30:00 = 10
12:45:00 = 10
13:00:00 = 10
13:15:00 = 10
13:30:00 = 10
I would like a query that returns the latest value (13:30:00) together with the sum of values one hour back. This should return:
time = 13:30:00
latestread = 10
lasthour = 40
What I almost got working was:
SELECT *,
(SELECT value FROM values S WHERE id_prod=P.id
ORDER BY timestamp DESC LIMIT 1) as latestread,
(SELECT sum(value) FROM values WHERE id_prod=D.id and
date_created>SUBTIME(S.date_created,'01:00:00')) as trendread
FROM prod P ORDER BY name
But this fails with "Unknown column 'S.date_created' in 'where clause'"
Any suggestions?
If I understand correctly what you're trying to do, then you would have something like this (values is a reserved word in MySQL, so the table name is quoted with backticks):
SELECT p.id, max(v.date_created), sum(v.value), mv.max_value
FROM product p
JOIN `values` v on p.id = v.product_id
JOIN (SELECT v2.product_id, v2.value as max_value
FROM `values` v2
WHERE v2.date_created = (SELECT max(v3.date_created) FROM `values` v3 WHERE v3.product_id = v2.product_id)) mv on mv.product_id = p.id
WHERE v.date_created between DATE_SUB(now(), INTERVAL 1 HOUR) and now()
GROUP BY p.id, mv.max_value
ORDER BY p.id
Aleks G and mhasan gave solutions, but not the reason why this fails. It fails because the alias S is defined inside the first subquery and is not visible in the second one. A subquery can refer to tables of the query that encloses it (as both subqueries do with P), but not to aliases declared inside a sibling subquery.
You forgot to give the values table an alias in the second subquery:
SELECT *,
(SELECT value FROM values S WHERE id_prod=P.id
ORDER BY timestamp DESC LIMIT 1) as latestread,
(SELECT sum(value) FROM values S WHERE id_prod=P.id and
date_created>SUBTIME(S.date_created,'01:00:00')) as trendread
FROM prod P ORDER BY name
I think this is the query that you are trying to write:
SELECT p.*,
(SELECT v.value
FROM values v
WHERE v.id_prod = p.id
ORDER BY v.timestamp DESC
LIMIT 1
) as latestread,
(SELECT sum(v.value)
FROM values v
WHERE v.id_prod = p.id and
v.timestamp > SUBTIME(now(), '01:00:00')
) as trendread
FROM prod p
ORDER BY p.name;
This changes all the aliases to be abbreviations for the table name. It also fixes the expression for the last hour by using now() and gets rid of date_created which doesn't seem to be in either table based on the question. The query conveniently assumes that timestamp is a datetime. If it is a unix timestamp, then somewhat different time logic is necessary.
This should be reasonably efficient with an index on values(id_prod, timestamp, value).
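To check the two correlated subqueries against the numbers in the question, here is a minimal sketch using SQLite instead of MySQL. Several things are assumptions made so that it runs standalone: the values table is renamed readings (VALUES is a reserved word), the timestamps are ISO strings on an invented date, and the current time is passed in as a parameter in place of now(). SQLite's datetime(x, '-1 hour') plays the role of SUBTIME.

```python
import sqlite3

def latest_and_last_hour(conn, now):
    """Return (name, latest value, sum of values in the hour before now) per product."""
    sql = """
        SELECT p.name,
               (SELECT r.value FROM readings r
                 WHERE r.id_prod = p.id
                 ORDER BY r.timestamp DESC LIMIT 1) AS latestread,
               (SELECT SUM(r.value) FROM readings r
                 WHERE r.id_prod = p.id
                   AND r.timestamp > datetime(?, '-1 hour')) AS trendread
        FROM prod p
        ORDER BY p.name
    """
    return conn.execute(sql, (now,)).fetchall()

# Readings from the question, placed on a hypothetical date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prod (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE readings (id_prod INTEGER, timestamp TEXT, value INTEGER);
    INSERT INTO prod VALUES (1, 'widget');
""")
times = ["12:00", "12:15", "12:30", "12:45", "13:00", "13:15", "13:30"]
conn.executemany(
    "INSERT INTO readings VALUES (1, ?, 10)",
    [(f"2024-01-01 {t}:00",) for t in times],
)
result = latest_and_last_hour(conn, "2024-01-01 13:30:00")
print(result)
```

Because the comparison is strict, the reading at 12:30 is excluded and the four readings from 12:45 to 13:30 are summed, matching the lasthour = 40 the question expects.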

Get date when two things appear at the same time (mysql query)

Is there a sql query that can generate the date when 2 things appear together?
For example, say I have a table containing a bus schedule, with buses A and B. Bus A operates on 22 May, 24 May and 25 May, while bus B operates on 22 May, 24 May and 26 May. I want to get the most recent date on which both buses operate, which is 24 May.
To see the dates that both buses share:
SELECT t.date
FROM YOUR_TABLE t
WHERE t.bus IN ('A', 'B')
GROUP BY t.date
HAVING COUNT(DISTINCT t.bus) = 2
To see the most recent date that both buses share:
SELECT t.date
FROM YOUR_TABLE t
WHERE t.bus IN ('A', 'B')
GROUP BY t.date
HAVING COUNT(DISTINCT t.bus) = 2
ORDER BY t.date DESC
LIMIT 1
Assuming you have a table named bus_schedule that contains a bus_name and bus_date field, something like this should work:
select bus_schedule_a.bus_date
from bus_schedule bus_schedule_a
inner join bus_schedule bus_schedule_b
on bus_schedule_a.bus_date = bus_schedule_b.bus_date
and bus_schedule_a.bus_name <> bus_schedule_b.bus_name
order by bus_schedule_a.bus_date desc
limit 1
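The grouping approach (GROUP BY date, HAVING COUNT(DISTINCT bus) = 2) can be tried against the schedule from the question with a short sketch. It uses SQLite instead of MySQL, the bus_schedule layout assumed in the second answer, and an invented year for the dates; the helper name latest_shared_date is hypothetical.

```python
import sqlite3

def latest_shared_date(conn, bus_a, bus_b):
    """Return the most recent date on which both buses run, or None."""
    sql = """
        SELECT bus_date
        FROM bus_schedule
        WHERE bus_name IN (?, ?)
        GROUP BY bus_date
        HAVING COUNT(DISTINCT bus_name) = 2   -- both buses present on that date
        ORDER BY bus_date DESC
        LIMIT 1
    """
    row = conn.execute(sql, (bus_a, bus_b)).fetchone()
    return row[0] if row else None

# Schedule from the question; the year is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bus_schedule (bus_name TEXT, bus_date TEXT)")
conn.executemany(
    "INSERT INTO bus_schedule VALUES (?, ?)",
    [
        ("A", "2024-05-22"), ("A", "2024-05-24"), ("A", "2024-05-25"),
        ("B", "2024-05-22"), ("B", "2024-05-24"), ("B", "2024-05-26"),
    ],
)
shared = latest_shared_date(conn, "A", "B")
print(shared)
```

Unlike the self-join, this version names the two buses explicitly, so other buses in the table do not affect the result.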

How to select the most recent set of dated records from a mysql table

I am storing the response to various rpc calls in a mysql table with the following fields:
Table: rpc_responses
timestamp (date)
method (varchar)
id (varchar)
response (mediumtext)
PRIMARY KEY(timestamp,method,id)
What is the best method of selecting the most recent responses for all existing combinations of method and id?
For each date there can only be one response for a given method/id.
Not all call combinations are necessarily present for a given date.
There are dozens of methods, thousands of ids and at least 365 different dates
Sample data:
timestamp method id response
2009-01-10 getThud 16 "....."
2009-01-10 getFoo 12 "....."
2009-01-10 getBar 12 "....."
2009-01-11 getFoo 12 "....."
2009-01-11 getBar 16 "....."
Desired result:
2009-01-10 getThud 16 "....."
2009-01-10 getBar 12 "....."
2009-01-11 getFoo 12 "....."
2009-01-11 getBar 16 "....."
(I don't think this is the same question - it won't give me the most recent response)
This solution was updated recently.
Comments below may be outdated
This query may perform well, because there are no joins.
SELECT * FROM (
SELECT *,if(@last_method=method,0,1) as new_method_group,@last_method:=method
FROM rpc_responses
ORDER BY method,timestamp DESC
) as t1
WHERE new_method_group=1;
Given that you want one resulting row per method this solution should work, using mysql variables to avoid a JOIN.
FYI, PostgreSQL has a way of doing this built into the language:
SELECT DISTINCT ON (method) timestamp, method, id, response
FROM rpc_responses
WHERE true -- some where clause here
ORDER BY method, timestamp DESC
Self answered, but I'm not sure that it will be an efficient enough solution as the table grows:
SELECT timestamp,method,id,response FROM rpc_responses
INNER JOIN
(SELECT max(timestamp) as timestamp,method,id FROM rpc_responses GROUP BY method,id) latest
USING (timestamp,method,id);
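The join against a per-group MAX(timestamp) can be checked against the sample data from the question. The sketch below uses SQLite instead of MySQL and spells the join condition out with ON, which is equivalent to the USING form here; the helper name latest_responses is made up, and the response column is left out of the output for brevity.

```python
import sqlite3

def latest_responses(conn):
    """Return (timestamp, method, id) of the newest row per method/id pair."""
    sql = """
        SELECT r.timestamp, r.method, r.id
        FROM rpc_responses r
        INNER JOIN (
            SELECT method, id, MAX(timestamp) AS timestamp
            FROM rpc_responses
            GROUP BY method, id               -- newest date per combination
        ) latest
          ON r.method = latest.method
         AND r.id = latest.id
         AND r.timestamp = latest.timestamp
        ORDER BY r.timestamp, r.method, r.id
    """
    return conn.execute(sql).fetchall()

# Sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rpc_responses (timestamp TEXT, method TEXT, id TEXT, "
    "response TEXT, PRIMARY KEY (timestamp, method, id))"
)
conn.executemany(
    "INSERT INTO rpc_responses VALUES (?, ?, ?, '.....')",
    [
        ("2009-01-10", "getThud", "16"),
        ("2009-01-10", "getFoo", "12"),
        ("2009-01-10", "getBar", "12"),
        ("2009-01-11", "getFoo", "12"),
        ("2009-01-11", "getBar", "16"),
    ],
)
latest = latest_responses(conn)
print(latest)
```

The older getFoo/12 row from 2009-01-10 is the only one dropped, which matches the desired result in the question.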
Try this...
SELECT o1.id, o1.timestamp, o1.method, o1.response
FROM rpc_responses o1
WHERE o1.timestamp = ( SELECT max(o2.timestamp)
FROM rpc_responses o2
WHERE o1.id = o2.id AND o1.method = o2.method )
ORDER BY o1.timestamp, o1.method, o1.response
...it even works in Access!
I used this and it worked for me:
select max(timestamp),method,id from tables where 1 group by method,id order by timestamp desc
Subqueries become very expensive as the data set grows.
Try this:
SELECT t1.*
FROM rpc_responses AS t1
INNER JOIN rpc_responses AS t2
ON t1.method = t2.method AND t1.id = t2.id
GROUP BY t1.method, t1.id, t1.timestamp
HAVING t1.timestamp=MAX(t2.timestamp)
ORDER BY t1.timestamp, t1.method, t1.response;
Checking the three main answers against a different use case shows that the most-voted answer is also by far the fastest. Swarm intelligence works here:
# Answer 1: https://stackoverflow.com/a/12625667/11154841
# 165ms
SELECT
COUNT(0)
FROM
(
SELECT
mtn.my_primary_key,
mtn.my_info_col,
IF(@last_my_primary_key = my_primary_key,
0,
1) AS new_my_primary_key_group,
@last_my_primary_key := my_primary_key
FROM
my_db_schema.my_table_name mtn
WHERE
mtn.date_time_col > now() - INTERVAL 1 MONTH
ORDER BY
my_primary_key,
mtn.date_time_col DESC
) AS t1
WHERE
new_my_primary_key_group = 1
AND t1.my_info_col = 'delete';
# Answer 2: https://stackoverflow.com/a/435709/11154841
# 757ms
SELECT
count(0)
FROM
my_db_schema.my_table_name mtn
JOIN
(
SELECT
my_primary_key,
max(date_time_col) AS date_time_col
FROM
my_db_schema.my_table_name mtn
WHERE
mtn.date_time_col > now() - INTERVAL 1 MONTH
GROUP BY
mtn.my_primary_key) latest
USING (my_primary_key,
date_time_col)
WHERE
mtn.my_info_col = 'delete';
# Answer 3: https://stackoverflow.com/a/3185644/11154841
# 1.310s
SELECT
count(0)
FROM
my_db_schema.my_table_name mtn
WHERE
mtn.date_time_col = (
SELECT
max(mtn2.date_time_col)
FROM
my_db_schema.my_table_name mtn2
WHERE
mtn2.my_primary_key = mtn.my_primary_key
AND mtn2.date_time_col > now() - INTERVAL 1 MONTH
)
AND mtn.date_time_col > now() - INTERVAL 1 MONTH
AND mtn.my_info_col = 'delete';
The concept of "most recent" is fairly vague. If you mean something like the 100 most recent rows, then you can just add ORDER BY timestamp DESC LIMIT 100 to your query.
If you mean only the rows that carry the single most recent date, then you can just do
SELECT timestamp,method,id,response
FROM rpc_responses
WHERE timestamp = (SELECT max(timestamp) FROM rpc_responses)
This is more than a year later, but it might help someone.
To select all the rows, starting from the latest:
SELECT *
FROM rpc_responses
ORDER BY timestamp DESC