Increase speed of a MySQL query - mysql

I have a table like this.
CREATE TABLE `accounthistory` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date` datetime DEFAULT NULL,
`change_ammount` float DEFAULT NULL,
`account_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
);
It's a list of daily account charges. If I need the balance of the account I use:
SELECT sum(change_ammount) FROM accounthistory WHERE account_id=484368430;
It's quite fast because I added an index on the account_id column.
But now I need to find the time when the account went negative (the date when SUM(change_ammount) < 0).
I use this query:
SELECT main.date as date from accounthistory as main
WHERE main.account_id=484368430
AND (SELECT sum(change_ammount) FROM accounthistory as sub
WHERE sub.account_id=484368430 AND
sub.date < main.date)<0
ORDER BY main.date DESC
LIMIT 1;
But it works very slowly. Can you propose a better solution?
Maybe I need some additional indexes (not only on account_id)?

The way to make your query faster is to use denormalization: store the current account balance on every record. To achieve this, you'll have to do three things; then we'll look at how the query would look.
a) Add a column to your table:
ALTER TABLE accounthistory ADD balance float;
b) Populate the new column
UPDATE accounthistory main SET
balance = (
SELECT SUM(change_ammount)
FROM accounthistory
where account_id = main.account_id
and date <= main.date
);
c) To populate new rows, either use a trigger, use application logic, or run the above UPDATE statement for the newly added row right after adding it, i.e. UPDATE ... WHERE id = ?
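For the trigger option, a minimal sketch could look like the one below. The trigger name is made up, and it assumes rows are inserted one at a time with the newest date for that account; treat it as a starting point rather than a drop-in solution.
CREATE TRIGGER accounthistory_set_balance
BEFORE INSERT ON accounthistory
FOR EACH ROW
  -- running balance = sum of this account's earlier changes + the new change
  SET NEW.balance = NEW.change_ammount + COALESCE(
    (SELECT SUM(change_ammount)
     FROM accounthistory
     WHERE account_id = NEW.account_id
       AND `date` <= NEW.`date`), 0);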
Now the query to find the dates on which the account went negative, which will be very fast, becomes:
SELECT date
from accounthistory
where balance < 0
and balance - change_ammount > 0
and account_id = ?;

SELECT MAX(main.date) as date
from accounthistory as main
WHERE main.account_id=484368430
AND EXISTS (SELECT 1 FROM accounthistory as sub
WHERE sub.account_id=main.account_id AND
sub.date < main.date HAVING SUM(sub.change_ammount) < 0)
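If you'd rather not denormalize, a composite index that covers both the account filter and the date comparison should at least help the correlated-subquery approaches above (the index name is just an example):
ALTER TABLE accounthistory ADD KEY ix_account_date (account_id, `date`);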


Delete all items in a database except the last date

I have a MySQL table that looks (very simplified) like this:
CREATE TABLE `logging` (
`id` bigint(20) NOT NULL,
`time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`level` smallint(3) NOT NULL,
`message` longtext CHARACTER SET utf8 COLLATE utf8_general_mysql500_ci NOT NULL
);
I would like to delete all rows of a specific level, except the last one (time is most recent).
Is there a way to select all rows with level set to a specific value and then delete all rows except the latest one in one single SQL query? How would I start solving this problem?
(As I said, this is a very simplified table, so please don't try to discuss possible design problems of this table. I removed some columns. It is designed per PSR-3 logging standard and I don't think there is an easy way to change that. What I want to solve is how I can select from a table and then delete all but some rows of the same table. I have only intermediate knowledge of MySQL.)
Thank you for pushing me in the right direction :)
Edit:
The Database version is /usr/sbin/mysqld Ver 8.0.18-0ubuntu0.19.10.1 for Linux on x86_64 ((Ubuntu))
You can use the ROW_NUMBER() analytic function (since you are on DB version 8+):
DELETE lg FROM `logging` AS lg
WHERE lg.`id` IN
( SELECT t.`id`
FROM
(
SELECT t.*,
ROW_NUMBER() OVER (ORDER BY `time` DESC) as rn
FROM `logging` t
-- WHERE `level` = @lvl -- optionally add this line to restrict to a specific value of `level`
) t
WHERE t.rn > 1
)
to delete all of the rows except the most recent one by `time` (this relies on `id` being your primary key column).
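If you want to check first which rows would be removed, the same derived table can be run as a plain SELECT (shown here with the optional level filter commented out, @lvl being a placeholder):
SELECT t.*
FROM (
SELECT t.*,
ROW_NUMBER() OVER (ORDER BY `time` DESC) as rn
FROM `logging` t
-- WHERE `level` = @lvl
) t
WHERE t.rn > 1;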
You can do this:
SELECT COUNT(time) FROM logging WHERE level=some_level INTO @TIME_COUNT;
SET @TIME_COUNT = @TIME_COUNT-1;
PREPARE STMT FROM 'DELETE FROM logging WHERE level=some_level ORDER BY time ASC LIMIT ?;';
EXECUTE STMT USING @TIME_COUNT;
If you have an AUTO_INCREMENT id column, I would use it to determine the most recent entry. Here is one way of doing that:
delete l
from (
select l1.level, max(id) as id
from logging l1
where l1.level = @level
) m
join logging l
on l.level = m.level
and l.id < m.id
An index on (level) should give you good performance and will support the MAX() subquery as well as the JOIN.
If you really need to use the time column, you can modify the query as follows:
delete l
from (
select l1.level, l1.id
from logging l1
where l1.level = @level
order by l1.time desc, l1.id desc
limit 1
) m
join logging l
on l.level = m.level
and l.id <> m.id
Here you would want to have an index on (level, time).
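For reference, the suggested indexes could be created like this (the names are placeholders; the composite (level, time) index also covers lookups on level alone):
ALTER TABLE `logging` ADD KEY ix_level (`level`);
ALTER TABLE `logging` ADD KEY ix_level_time (`level`, `time`);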

Update data from same table, with nested date

I am trying to create a table that I can use to compare a set of 8 GPS co-ordinates. Eventually I want to check that these co-ordinates are no more than 20m apart. I am currently having trouble populating this table, as I keep getting the following error:
Error Code: 1093. You can't specify target table 'GPS1' for update in FROM clause
I have tried changing my query a few times, with no luck.
Currently this is what I have:
UPDATE ots_outlet_gps AS GPS1
LEFT JOIN
(SELECT *
FROM
(SELECT
TMP.store_code
, TMP.gps
, TMP.action_date
FROM tmp_outlet_gps TMP
JOIN
(SELECT *
FROM
ots_outlet_gps JOI
WHERE
action_date1 > (SELECT action_date1 FROM ots_outlet_gps AS AA WHERE store_code = JOI.store_code GROUP BY store_code)
) INN
ON
TMP.store_code = INN.store_code
WHERE
action_date >= '2019-01-01'
AND action_date <= '2019-01-06'
) PRNK
) SRC
ON
GPS1.store_code = SRC.store_code
SET
GPS1.gps2 = SRC.gps
, GPS1.action_date2 = SRC.action_date
WHERE
GPS1.gps2 IS NULL
AND GPS1.action_date2 IS NULL
;
TABLE STRUCTURE (ots_outlet_gps):
id int(6)
store_code bigint(12)
action_date1 date
gps1 varchar(20)
variance1 decimal(8,2)
action_date2 date
gps2 varchar(20)
variance2 decimal(8,2)
etc
TABLE STRUCTURE (tmp_outlet_gps):
store_code int(10)
gps varchar(20)
action_date date
Any help would be appreciated. I'm also not sure if I am using the correct approach for the desired end result, and would also be open to alternative suggestions.
Thanks.

Strange query results from MySQL

I have a query that I'm testing on my database, but for some weird reason, and randomly, it returns a different set of results. Interestingly, there are only two distinct result-sets that it returns, from thousands of rows, and the query will randomly return one or the other, but nothing else.
Is there a reason the query only returns one of two datasets? Query and schema below.
My goal is to select the fastest laps for a given track, in a given time period, but only the fastest lap for each user (so there are always 10 different users in the top 10).
Most of the time the correct results are returned, but randomly, a totally different result set is returned.
SELECT `lap`.`ID`, `lap`.`qualificationTime`, `lap`.`userId`
FROM `lap`
WHERE (lap.trackID =4)
AND (lap.raceDateTime >= "2013-07-25 10:00:00")
AND (lap.raceDateTime < "2013-08-04 23:59:59")
AND (isTestLap =0)
GROUP BY `userId`
ORDER BY `qualificationTime` ASC
LIMIT 10
Schema:
CREATE TABLE IF NOT EXISTS `lap` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`userId` int(11) DEFAULT NULL,
`trackId` int(11) DEFAULT NULL,
`raceDateTime` datetime NOT NULL,
`qualificationTime` decimal(7,4) DEFAULT '0.0000',
`isTestLap` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`)
);
(DB create script trimmed of un-needed columns)
You are using a (mis)feature of MySQL called hidden columns. As others have pointed out, you are allowed to put columns in the select statement that are not in the group by. But, the returned values are arbitrary, and not even guaranteed to be the same from one run to the next.
The solution is to find the minimum qualification time (the fastest lap) for each user, then join this information back to get the other fields. Here is one way:
select l.*
from (SELECT userId, min(qualificationtime) as minqf
FROM lap
WHERE (lap.trackID =4)
AND (lap.raceDateTime >= "2013-07-25 10:00:00")
AND (lap.raceDateTime < "2013-08-04 23:59:59")
AND (isTestLap =0)
GROUP BY `userId`
) lu join
lap l
on lu.userId = l.userId and lu.minqf = l.qualificationtime
ORDER BY l.`qualificationTime` ASC
LIMIT 10
You are selecting lap.ID and lap.qualificationTime, but you are not GROUPing BY them. You can only select fields you group by, or aggregate functions (MIN, MAX, AVG, etc.) of the other fields. Otherwise, the results are undefined.
I think you mean that sometimes the values for lap.ID and lap.qualificationTime are different. That is expected behaviour for MySQL: because you group by userId only, you don't know which values will be returned for the other fields. MySQL may return different values depending on which rows it happens to read first or last.
I would check something like this:
SELECT `l1`.`qualificationTime`, `l1`.`userId`,
(SELECT l2.ID FROM `lap` AS l2 WHERE l2.`userId` = l1.userId AND
l2.qualificationTime = min(l1.`qualificationTime`))
FROM `lap` AS `l1`
WHERE (l1.trackID =4)
AND (l1.raceDateTime >= "2013-07-25 10:00:00")
AND (l1.raceDateTime < "2013-08-04 23:59:59")
AND (isTestLap =0)
GROUP BY `userId`
ORDER BY `qualificationTime` ASC
LIMIT 10
It's likely to be your ORDER BY on a decimal entity, and how the DB stores this and then retrieves it.

How to format this MySQL query

What I have is a table statistieken with an ip, hash of browser info, url visited and last visited date in timestamp.
What I could compile from different sources led to this query; the only problem is that it takes forever (9 minutes) to complete on a table with about 15000 rows, so this query is very inefficient.
I think I'm going about this the wrong way, but I can't find a decent post or tutorial on how to use the results of one select as the basis for getting the results I want.
What I simply want is an overview of every entry in the table whose hash matches one of the visitors that have visited more than 25 pages in the last 12 hours.
CREATE TABLE IF NOT EXISTS `statsitieken` (
`hash` varchar(35) NOT NULL,
`ip` varchar(24) NOT NULL,
`visits` int(11) NOT NULL,
`lastvisit` int(11) NOT NULL,
`browserinfo` text NOT NULL,
`urls` text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
This is the query I have tried to compile so far.
SELECT * FROM `database`.`statsitieken` WHERE hash in (SELECT hash FROM `database`.`statsitieken`
where `lastvisit` > unix_timestamp(DATE_SUB(
NOW(),INTERVAL 12 hour
)
)
group by hash
having count(urls) > 25
order by urls)
I need this to run in a decent time, like < 1 second, which should be possible in my opinion...
I suggest trying this modified query. The subquery is now computed only once instead of being run for each record returned:
SELECT s.*
FROM `database`.`statsitieken` s, (SELECT *
FROM `database`.`statsitieken`
WHERE `lastvisit` > UNIX_TIMESTAMP(DATE_SUB(NOW(),INTERVAL 12 HOUR))
GROUP BY hash
HAVING COUNT(urls)>25) tmp
WHERE s.`hash`=tmp.`hash`
ORDER BY s.urls
Be sure you have indexes on the following fields:
hash to speed up the GROUP BY and WHERE
urls to speed up the ORDER BY
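If you do add the suggested index on urls, note that urls is a TEXT column, so MySQL requires a prefix length for the index; something along these lines (the 64-character prefix is an arbitrary example, and a prefix index may not fully remove the filesort for the ORDER BY):
ALTER TABLE statsitieken ADD KEY ix_urls (urls(64));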
A derived table with an INNER JOIN is usually faster than an IN subquery (especially on older MySQL versions). Try this optimized query:
SELECT a.*
FROM statsitieken a
INNER JOIN (SELECT hash
            FROM statsitieken
            WHERE lastvisit > unix_timestamp(DATE_SUB(NOW(), INTERVAL 12 HOUR))
            GROUP BY hash
            HAVING COUNT(urls) > 25) b
ON a.hash = b.hash
ORDER BY a.urls;
For better performance of this select query you should add indexes as:
ALTER TABLE statsitieken ADD KEY ix_hash(hash);
ALTER TABLE statsitieken ADD KEY ix_lastvisit(lastvisit);
WHERE hash in (SELECT hash FROM `database`.`statsitieken`
where `lastvisit` > unix_timestamp(DATE_SUB(
NOW(),INTERVAL 12 hour
)
)
You are "subquerying" (i don't know if exists that word :P, 'doing a subquery') in the same table, why not to:
where `lastvisit` > unix_timestamp(DATE_SUB(
NOW(),INTERVAL 12 hour
)
do it directly?

Doing some calculations in mysql, numbers off when using GROUP BY

I'm running the following query to get the stats for a user, based on which I pay them.
SELECT hit_paylevel, sum(hit_uniques) as day_unique_hits
, (sum(hit_uniques)/1000)*hit_paylevel as day_earnings
, hit_date
FROM daily_hits
WHERE hit_user = 'xxx' AND hit_date >= '2011-05-01' AND hit_date < '2011-06-01'
GROUP BY hit_user
The table in question looks like this:
CREATE TABLE IF NOT EXISTS `daily_hits` (
`hit_itemid` varchar(255) NOT NULL,
`hit_mainid` int(11) NOT NULL,
`hit_user` int(11) NOT NULL,
`hit_date` date NOT NULL,
`hit_hits` int(11) NOT NULL DEFAULT '0',
`hit_uniques` int(11) NOT NULL,
`hit_embed` int(11) NOT NULL,
`hit_paylevel` int(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`hit_itemid`,`hit_date`),
KEY `hit_user` (`hit_user`),
KEY `hit_mainid` (`hit_mainid`,`hit_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The problem in the calculation has to do with the hit_paylevel which acts as a multiplier. Default is one, the other option is 2 or 3, which essentially doubles or triples the earnings for that day.
If I loop through the days, the daily day_earnings is correct; it's just that when I group them, it calculates everything as pay level 1. This happens if the user was pay level 1 in the beginning and was later upgraded to a higher level. If the user is pay level 2 from the start, it calculates everything correctly.
Shouldn't this be sum(hit_uniques * hit_paylevel) / 1000?
Like @Denis said:
Change the query to
SELECT hit_paylevel, sum(hit_uniques) as day_unique_hits
, sum(hit_uniques * hit_paylevel) / 1000 as day_earnings
, hit_date
FROM daily_hits
WHERE hit_user = 'xxx' AND hit_date >= '2011-05-01' AND hit_date < '2011-06-01'
GROUP BY hit_user;
Why this fixes the problem
Multiplying by hit_paylevel outside the sum first adds up all the hit_uniques and then multiplies the total by one arbitrarily picked hit_paylevel.
Not what you want. If you put both columns inside the sum, MySQL pairs up the correct hit_uniques and hit_paylevel on each row.
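A quick illustration with made-up numbers: say a user has 1000 uniques on one day at pay level 1 and 1000 uniques on a later day at pay level 2. The correct earnings are (1000*1 + 1000*2)/1000 = 3. The original query instead computes (1000+1000)/1000 multiplied by whichever hit_paylevel MySQL happens to pick, giving 2 or 4, but never 3.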
The dangers of group by
This is an important thing to remember about MySQL.
The GROUP BY clause works differently than in other databases.
On MSSQL (or Oracle or PostgreSQL) you would have gotten an error like
non-aggregate expression must appear in group by clause
Or words to that effect.
In your original query hit_paylevel is not in an aggregate (sum) and it's also not in the group by clause, so MySQL just picks a value at random.
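If you would rather have MySQL raise an error in these cases, like those other databases do, you can enable the ONLY_FULL_GROUP_BY SQL mode if it is not already set (it became part of the default sql_mode in MySQL 5.7.5); a session-level sketch:
SET SESSION sql_mode = CONCAT(@@SESSION.sql_mode, ',ONLY_FULL_GROUP_BY');
Note that the corrected query above still selects hit_paylevel and hit_date without grouping or aggregating them, so under this mode it would be rejected as well until those columns are aggregated or added to the GROUP BY.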