What I have is a table statistieken with an IP, a hash of the browser info, the URL visited, and the last visit date as a timestamp.
What I could compile from different sources led to the query below. The only problem is that it takes forever (9 minutes) to complete on a table with about 15000 rows, so it is very inefficient.
I think I'm going about this the wrong way, but I can't find a decent post or tutorial on how to use the results of a SELECT as the basis for getting the results I want.
What I simply want is an overview of every entry in the table that matches the hash of visitors who have visited more than 25 pages in the last 12 hours.
CREATE TABLE IF NOT EXISTS `statsitieken` (
`hash` varchar(35) NOT NULL,
`ip` varchar(24) NOT NULL,
`visits` int(11) NOT NULL,
`lastvisit` int(11) NOT NULL,
`browserinfo` text NOT NULL,
`urls` text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
This is the query I have tried to compile so far.
SELECT *
FROM `database`.`statsitieken`
WHERE hash IN (SELECT hash
               FROM `database`.`statsitieken`
               WHERE `lastvisit` > UNIX_TIMESTAMP(DATE_SUB(NOW(), INTERVAL 12 HOUR))
               GROUP BY hash
               HAVING COUNT(urls) > 25
               ORDER BY urls)
I need this to run in a decent time, like < 1 second, which should be possible in my opinion...
I suggest trying this modified query. The subquery is now computed only once instead of being run for each record returned:
SELECT s.*
FROM `database`.`statsitieken` s, (SELECT hash
FROM `database`.`statsitieken`
WHERE `lastvisit` > UNIX_TIMESTAMP(DATE_SUB(NOW(),INTERVAL 12 HOUR))
GROUP BY hash
HAVING COUNT(urls)>25) tmp
WHERE s.`hash`=tmp.`hash`
ORDER BY s.urls
Be sure you have indexes on the following fields:
hash to speed up the GROUP BY and WHERE
urls to speed up the ORDER BY
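Note that urls is a TEXT column, so MySQL needs a prefix length to index it. A sketch of the DDL (the index name and the 64-character prefix are assumptions):
ALTER TABLE statsitieken ADD KEY ix_urls (urls(64)); -- TEXT columns require an index prefix length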
A derived table with an INNER JOIN is faster than an IN subquery. Try this optimized query:
SELECT a.*
FROM statsitieken a
INNER JOIN (SELECT hash
            FROM statsitieken
            WHERE lastvisit > UNIX_TIMESTAMP(DATE_SUB(NOW(), INTERVAL 12 HOUR))
            GROUP BY hash
            HAVING COUNT(urls) > 25) b
        ON a.hash = b.hash
ORDER BY a.urls;
For better performance of this SELECT query you should add indexes:
ALTER TABLE statsitieken ADD KEY ix_hash(hash);
ALTER TABLE statsitieken ADD KEY ix_lastvisit(lastvisit);
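A covering composite index is also worth trying, so the derived table can be resolved from the index alone (a sketch; the index name is an assumption):
ALTER TABLE statsitieken ADD KEY ix_lastvisit_hash (lastvisit, hash);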
You are subquerying the same table here:
WHERE hash IN (SELECT hash FROM `database`.`statsitieken`
               WHERE `lastvisit` > UNIX_TIMESTAMP(DATE_SUB(NOW(), INTERVAL 12 HOUR)))
Why not apply the filter directly?
WHERE `lastvisit` > UNIX_TIMESTAMP(DATE_SUB(NOW(), INTERVAL 12 HOUR))
I have a MySQL table that looks (very simplified) like this:
CREATE TABLE `logging` (
`id` bigint(20) NOT NULL,
`time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`level` smallint(3) NOT NULL,
`message` longtext CHARACTER SET utf8 COLLATE utf8_general_mysql500_ci NOT NULL
);
I would like to delete all rows of a specific level, except the most recent one (the one with the latest time).
Is there a way to select all rows with level set to a specific value and then delete all rows except the latest one in one single SQL query? How would I start solving this problem?
(As I said, this is a very simplified table, so please don't try to discuss possible design problems of this table. I removed some columns. It is designed per the PSR-3 logging standard and I don't think there is an easy way to change that. What I want to solve is how I can select from a table and then delete all but some rows of the same table. I have only intermediate knowledge of MySQL.)
Thank you for pushing me in the right direction :)
Edit:
The Database version is /usr/sbin/mysqld Ver 8.0.18-0ubuntu0.19.10.1 for Linux on x86_64 ((Ubuntu))
You can use the ROW_NUMBER() analytic function (since your DB version is 8+):
DELETE lg FROM `logging` AS lg
WHERE lg.`id` IN
( SELECT t.`id`
FROM
(
SELECT t.*,
ROW_NUMBER() OVER (ORDER BY `time` DESC) as rn
FROM `logging` t
-- WHERE `level` = @lvl -- optionally add this line to restrict to a specific value of `level`
) t
WHERE t.rn > 1
)
to delete all of the rows except the last inserted one (assuming id is your primary key column).
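If you instead want to keep the newest row for every level in one statement, a variant (sketched here) partitions the numbering by level rather than filtering:
DELETE lg FROM `logging` AS lg
WHERE lg.`id` IN
( SELECT t.`id`
  FROM
  (
    SELECT t.`id`,
           -- number rows per level, newest first, so rn = 1 is the row to keep
           ROW_NUMBER() OVER (PARTITION BY `level` ORDER BY `time` DESC) as rn
    FROM `logging` t
  ) t
  WHERE t.rn > 1
)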
You can do this:
SELECT COUNT(time) FROM logging WHERE level=some_level INTO @TIME_COUNT;
SET @TIME_COUNT = @TIME_COUNT - 1;
PREPARE STMT FROM 'DELETE FROM logging WHERE level=some_level ORDER BY time ASC LIMIT ?;';
EXECUTE STMT USING @TIME_COUNT;
If you have an AUTO_INCREMENT id column, I would use it to determine the most recent entry. Here is one way of doing that:
delete l
from (
select l1.level, max(id) as id
from logging l1
where l1.level = @level
) m
join logging l
on l.level = m.level
and l.id < m.id
An index on (level) should give you good performance and will support the MAX() subquery as well as the JOIN.
View on DB Fiddle
If you really need to use the time column, you can modify the query as follows:
delete l
from (
select l1.level, l1.id
from logging l1
where l1.level = @level
order by l1.time desc, l1.id desc
limit 1
) m
join logging l
on l.level = m.level
and l.id <> m.id
View on DB Fiddle
Here you would want to have an index on (level, time).
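A sketch of that index DDL (the name is an assumption):
ALTER TABLE logging ADD KEY ix_level_time (level, time);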
I have a MySQL table with 10 million rows and about 2 GB of data.
Selecting data in LIFO (newest first) order is slow.
Table engine: InnoDB.
The table has one primary key and one unique key.
SELECT * FROM link LIMIT 999999 , 50;
How can I improve the performance of this query?
table structure:
Field    Type          Null  Key  Default  Extra
id       int(11)       NO    PRI  NULL     auto_increment
url      varchar(255)  NO    UNI  NULL
website  varchar(100)  NO         NULL
state    varchar(10)   NO         NULL
type     varchar(100)  NO         NULL
prio     varchar(100)  YES        NULL
change   varchar(100)  YES        NULL
last     varchar(100)  YES        NULL
NOTE:
SELECT * FROM link LIMIT 1 , 50; takes 0.9 ms, but the current SQL takes 1000 ms, roughly 1000 times longer.
This is most likely due to "early row lookup".
MySQL can be forced to do a "late row lookup". Try the query below:
SELECT l.*
FROM (
SELECT id
FROM link
ORDER BY
id
LIMIT 999999 , 50
) q
JOIN link l
ON l.id = q.id
Check this article:
MySQL limit clause and late row lookups
For the Next and Prev buttons you can use a WHERE clause instead of OFFSET.
Example (using LIMIT 10 - Sample data explained below): You are on some page which shows you 10 rows with the ids [2522,2520,2514,2513,2509,2508,2506,2504,2497,2496]. This in my case is created with
select *
from link l
order by l.id desc
limit 10
offset 999000
For the next page you would use
limit 10
offset 999010
getting rows with ids [2495,2494,2493,2492,2491,2487,2483,2481,2479,2475].
For the previous page you would use
limit 10
offset 998990
getting rows with ids [2542,2541,2540,2538,2535,2533,2530,2527,2525,2524].
All above queries execute in 500 msec. Using the "trick" suggested by Sanj it still takes 250 msec.
Now with the given page with minId=2496 and maxId=2522 we can create queries for the Next and Last buttons using the WHERE clause.
Next button:
select *
from link l
where l.id < :minId -- =2496
order by l.id desc
limit 10
Resulting ids: [2495,2494,2493,2492,2491,2487,2483,2481,2479,2475].
Prev button:
select *
from link l
where l.id > :maxId -- =2522
order by l.id asc
limit 10
Resulting ids: [2524,2525,2527,2530,2533,2535,2538,2540,2541,2542].
To reverse the order you can use the query in a subselect:
select *
from (
select *
from link l
where l.id > 2522
order by l.id asc
limit 10
) sub
order by id desc
Resulting ids: [2542,2541,2540,2538,2535,2533,2530,2527,2525,2524].
These queries execute in "no time" (less than 1 msec) and provide the same result.
You cannot use this solution to create page numbers, but I don't think you are going to output 200K page numbers.
Test data:
Data used for the example and benchmarks has been created with
CREATE TABLE `link` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`url` VARCHAR(255) NOT NULL,
`website` VARCHAR(100) NULL DEFAULT NULL,
`state` VARCHAR(10) NULL DEFAULT NULL,
`type` VARCHAR(100) NULL DEFAULT NULL,
`prio` VARCHAR(100) NULL DEFAULT NULL,
`change` VARCHAR(100) NULL DEFAULT NULL,
`last` VARCHAR(100) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `url` (`url`)
) COLLATE='utf8_general_ci' ENGINE=InnoDB;
insert into link
select i.id
, concat(id, '-', rand()) url
, rand() website
, rand() state
, rand() `type`
, rand() prio
, rand() `change`
, rand() `last`
from test._dummy_indexes_2p23 i
where i.id <= 2000000
and rand() < 0.5
where test._dummy_indexes_2p23 is a table containing 2^23 ids (about 8M). So the data contains about 1M rows, with roughly every second id randomly missing. Table size: 228 MB
Due to the large amount of data, there are a few tips to improve the query response time:
Change the storage engine from InnoDB to MyISAM.
Create table partitioning (https://dev.mysql.com/doc/refman/5.7/en/partitioning-management.html) - see the sketch after this list.
MySQL Cluster (http://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-overview.html)
Increase hardware capacity.
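A sketch of RANGE partitioning on the auto-increment id (partition names and boundaries are assumptions). Note that MySQL requires every unique key to include the partitioning column, so the UNIQUE index on url has to be extended first:
ALTER TABLE link DROP INDEX url, ADD UNIQUE INDEX url (url, id); -- unique keys must include the partition column
ALTER TABLE link PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (2500000),
    PARTITION p1 VALUES LESS THAN (5000000),
    PARTITION p2 VALUES LESS THAN (7500000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);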
Thanks
First of all, running a query on your table without any order doesn't guarantee it will return the same data if run twice.
It's better to add an ORDER BY clause. id is a good candidate, as it's your primary key and unique (it's an auto_increment value).
You could use this as your base:
SELECT * FROM link ORDER BY id LIMIT 50;
This will give you the first 50 rows in your table.
Now for the next 50 rows, instead of using OFFSET, we could save our last location in the query.
You would save the id from the last row of the previous query and use it in the next query:
SELECT * FROM link WHERE id > last_id ORDER BY id LIMIT 50;
This will give you the next 50 rows after the last id.
The reason your query runs slowly for high values of OFFSET is that MySQL has to scan all the rows up to the given OFFSET and return only the last LIMIT number of rows. This means that the bigger OFFSET is, the slower the query will run.
The solution shown above doesn't depend on OFFSET, so the query runs at the same speed regardless of the current page.
See also this useful article that explains a few other options you can choose from: http://www.iheavy.com/2013/06/19/3-ways-to-optimize-for-paging-in-mysql/
I have updated my SQL query to this, and it is taking less time:
SELECT * FROM link ORDER BY id LIMIT 999999 , 50 ;
I have a query that I'm testing on my database, but for some weird reason it randomly returns a different set of results. Interestingly, out of thousands of rows there are only two distinct result sets that it returns; the query will randomly return one or the other, but nothing else.
Is there a reason the query only returns one of two datasets? Query and schema below.
My goal is to select the fastest laps for a given track, in a given time period, but only the fastest lap for each user (so there are always 10 different users in the top 10).
Most of the time the correct results are returned, but randomly, a totally different result set is returned.
SELECT `lap`.`ID`, `lap`.`qualificationTime`, `lap`.`userId`
FROM `lap`
WHERE (lap.trackID =4)
AND (lap.raceDateTime >= "2013-07-25 10:00:00")
AND (lap.raceDateTime < "2013-08-04 23:59:59")
AND (isTestLap =0)
GROUP BY `userId`
ORDER BY `qualificationTime` ASC
LIMIT 10
Schema:
CREATE TABLE IF NOT EXISTS `lap` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`userId` int(11) DEFAULT NULL,
`trackId` int(11) DEFAULT NULL,
`raceDateTime` datetime NOT NULL,
`qualificationTime` decimal(7,4) DEFAULT '0.0000',
`isTestLap` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`)
);
(DB create script trimmed of unneeded columns)
You are using a (mis)feature of MySQL called hidden columns. As others have pointed out, you are allowed to put columns in the select statement that are not in the group by. But, the returned values are arbitrary, and not even guaranteed to be the same from one run to the next.
The solution is to find the minimum (fastest) qualification time for each user. Then join this information back to get the other fields. Here is one way:
select l.*
from (SELECT userId, min(qualificationtime) as minqf
FROM lap
WHERE (lap.trackID =4)
AND (lap.raceDateTime >= "2013-07-25 10:00:00")
AND (lap.raceDateTime < "2013-08-04 23:59:59")
AND (isTestLap =0)
GROUP BY `userId`
) lu join
lap l
on lu.userId = l.userId and lu.minqf = l.qualificationtime
ORDER BY l.`qualificationTime` ASC
LIMIT 10
You are selecting lap.ID, lap.qualificationTime and lap.userId, but you are not GROUPing BY them. You can only select fields you group by, or else aggregate functions on the other fields (MIN, MAX, AVG, etc.). Otherwise, the results are undefined.
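On MySQL 5.7.5+ you can make the server reject such queries outright instead of returning arbitrary rows. A sketch (appending preserves the existing modes, assuming sql_mode is non-empty, as it is by default):
-- ONLY_FULL_GROUP_BY makes selecting non-aggregated, non-grouped columns an error
SET SESSION sql_mode = CONCAT(@@sql_mode, ',ONLY_FULL_GROUP_BY');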
I think you mean that sometimes the values for lap.ID and lap.qualificationTime differ, and that is expected behaviour for MySQL: because you group by userId, you don't know which values will be returned for the other fields. MySQL may pick different values depending on the order in which the rows are read.
I would check something like this:
SELECT `l1`.`qualificationTime`, `l1`.`userId`,
(SELECT l2.ID FROM `lap` AS l2 WHERE l2.`userId` = l1.userId AND
l2.qualificationTime = min(l1.`qualificationTime`))
FROM `lap` AS `l1`
WHERE (l1.trackID =4)
AND (l1.raceDateTime >= "2013-07-25 10:00:00")
AND (l1.raceDateTime < "2013-08-04 23:59:59")
AND (isTestLap =0)
GROUP BY `userId`
ORDER BY `qualificationTime` ASC
LIMIT 10
It's likely to be your ORDER BY on a DECIMAL column, and how the DB stores this and then retrieves it.
I have a table with posts and I want to generate a graph that shows how many posts were made in the last 30 minutes, in the 30 minutes before that, and so on. The posts are selected by their post_handler and post_status.
The table structure looks like this.
CREATE TABLE IF NOT EXISTS `posts` (
`post_title` varchar(255) NOT NULL,
`post_content` text NOT NULL,
`post_date_added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`post_handler` varchar(255) NOT NULL,
`post_status` tinyint(4) NOT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
KEY `post_status` (`post_status`),
KEY `post_status_2` (`post_status`,`id`),
KEY `post_handler` (`post_handler`),
KEY `post_date_added` (`post_date_added`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=2300131 ;
These are the results I'd like to receive, sorted by post_date_added:
period_start period_end posts
2011-12-06 19:23:44 2011-12-06 19:53:44 10
2011-12-06 19:53:44 2011-12-06 20:23:44 39
2011-12-06 20:23:44 2011-12-06 20:53:44 40
Right now I use a solution where I have to run this query many times over and then insert the data into another table from the PHP script.
SELECT COUNT(*) FROM posts WHERE post_handler = 'test' AND post_status = 1 AND post_date_added BETWEEN '2011-12-06 19:23:44' AND '2011-12-06 19:53:44'
Do you know any other solution? Is there any way to run a query that also inserts results into the database, all in one query?
It's fairly easy to group by distinct time units, like hour, minute, day or whatever. If you want to group by hour, a possible query might look like this:
SELECT DATE_FORMAT(post_date_added,"%Y-%m-%d %H") AS "_Date",
COUNT(*)
FROM posts
WHERE post_handler = 'test'
AND post_status = 1
GROUP BY _Date;
(run this with a mysql query tool of your choice to see the output).
However, if you want 30 minutes as the base of your grouping, the SQL part gets more tricky. For this special purpose, since you only have to divide each hour into two subsets, maybe work with this approach:
SELECT DATE_FORMAT(post_date_added,"%Y-%m-%d %H") AS "_Date",
"00" AS "semihour",
COUNT(*)
FROM posts
WHERE post_handler = 'test'
AND DATE_FORMAT(post_date_added,"%i") < 30
AND post_status = 1
GROUP BY _Date
UNION
SELECT DATE_FORMAT(post_date_added,"%Y-%m-%d %H") AS "_Date",
"30" AS "semihour",
COUNT(*)
FROM posts
WHERE post_handler = 'test'
AND DATE_FORMAT(post_date_added,"%i") >= 30
AND post_status = 1
GROUP BY _Date;
Again, run this with a MySQL query tool of your choice to see the output. You could add finer distinctions there too, working with CASE or IF and such, but personally I'd group by either hour or minute just to keep the SQL part easier.
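Alternatively, the UNION can be avoided by grouping on a computed 30-minute bucket. A sketch (the bucket arithmetic, 1800 seconds = 30 minutes, is the only assumption):
SELECT FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(post_date_added) / 1800) * 1800) AS period_start,
       COUNT(*) AS posts
FROM posts
WHERE post_handler = 'test'
  AND post_status = 1
GROUP BY period_start   -- MySQL allows grouping and ordering by the alias
ORDER BY period_start;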
To directly add those numbers into your graph database, use this syntax:
INSERT INTO yourtable (yourfields)
SELECT ...
More details about this can be found here in the MySQL documentation.
In (very) brief: yes, you can insert the results of a query into another table. Take a look at INSERT ... SELECT here: http://dev.mysql.com/doc/refman/5.1/en/insert-select.html
Essentially, you'd just change what you have to something like
INSERT INTO post_statistics_table (period_start, period_end, posts)
SELECT ?, ?, COUNT(*) FROM posts
WHERE post_handler = 'test'
AND post_status = 1
AND post_date_added BETWEEN ? AND ?
and then fill in the four ?s with the same two DATETIMEs, repeated. ($from, $to, $from, $to)
I have a table like this.
CREATE TABLE `accounthistory` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date` datetime DEFAULT NULL,
`change_ammount` float DEFAULT NULL,
`account_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
)
It's a list of daily account charges. If I need the balance of the account I use
SELECT sum(change_ammount) FROM accounthistory WHERE account_id = ?;
It's quite fast because I added an index on the account_id column.
But now I need to find the time when the account balance went negative (the date when the running SUM(change_ammount) < 0).
I use this query:
SELECT main.date as date from accounthistory as main
WHERE main.account_id=484368430
AND (SELECT sum(change_ammount) FROM accounthistory as sub
WHERE sub.account_id=484368430 AND
sub.date < main.date)<0
ORDER BY main.date DESC
LIMIT 1;
But it is very slow. Can you propose a better solution?
Maybe I need some indexes (not only on account_id)?
The way to make your query faster is to use denormalization: store the running account balance on every record. To achieve this, you'll have to do three things; then we'll look at how the query would look:
a) Add a column to your table:
ALTER TABLE accounthistory ADD balance float;
b) Populate the new column. MySQL does not allow an UPDATE subquery to read from the table being updated, so a self-join is used here instead:
UPDATE accounthistory main
JOIN (
    SELECT a.id, SUM(b.change_ammount) AS balance
    FROM accounthistory a
    JOIN accounthistory b
      ON b.account_id = a.account_id
     AND b.date <= a.date
    GROUP BY a.id
) t ON t.id = main.id
SET main.balance = t.balance;
c) To populate new rows, either a) use a trigger (sketched below), b) use application logic, or c) re-run the above UPDATE statement for the row just added, i.e. restricted with WHERE main.id = ?
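A minimal sketch of the trigger option, assuming rows are inserted in chronological order per account (the trigger name is an assumption):
CREATE TRIGGER trg_accounthistory_balance
BEFORE INSERT ON accounthistory
FOR EACH ROW
-- running balance = previous total for this account + this change
SET NEW.balance = NEW.change_ammount + COALESCE(
    (SELECT SUM(change_ammount)
     FROM accounthistory
     WHERE account_id = NEW.account_id), 0);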
Now the query to find the dates on which the account went negative, which will be very fast, becomes:
SELECT date
from accounthistory
where balance < 0
and balance - change_ammount > 0
and account_id = ?;
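An index to support this query might look like this (the name is an assumption):
ALTER TABLE accounthistory ADD KEY ix_account_balance (account_id, balance);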
SELECT MAX(main.date) as date
from accounthistory as main
WHERE main.account_id=484368430
AND EXISTS (SELECT 1 FROM accounthistory as sub
WHERE sub.account_id=main.account_id AND
sub.date < main.date HAVING SUM(sub.change_ammount) < 0)