Update data from same table, with nested date - mysql

I am trying to create a table that I can use to compare a set of 8 GPS co-ordinates. Eventually I want to check that these co-ordinates are no more than 20m apart. I am currently having trouble populating this table, as I keep getting the following error:
Error Code: 1093. You can't specify target table 'GPS1' for update in FROM clause
I have tried changing my query a few times, with no luck.
Currently this is what I have:
UPDATE ots_outlet_gps AS GPS1
LEFT JOIN
(SELECT *
FROM
(SELECT
TMP.store_code
, TMP.gps
, TMP.action_date
FROM tmp_outlet_gps TMP
JOIN
(SELECT *
FROM
ots_outlet_gps JOI
WHERE
action_date1 > (SELECT action_date1 FROM ots_outlet_gps AS AA WHERE store_code = JOI.store_code GROUP BY store_code)
) INN
ON
TMP.store_code = INN.store_code
WHERE
action_date >= '2019-01-01'
AND action_date <= '2019-01-06'
) PRNK
) SRC
ON
GPS1.store_code = SRC.store_code
SET
GPS1.gps2 = SRC.gps
, GPS1.action_date2 = SRC.action_date
WHERE
GPS1.gps2 IS NULL
AND GPS1.action_date2 IS NULL
;
TABLE STRUCTURE (ots_outlet_gps):
id int(6)
store_code bigint(12)
action_date1 date
gps1 varchar(20)
variance1 decimal(8,2)
action_date2 date
gps2 varchar(20)
variance2 decimal(8,2)
etc
TABLE STRUCTURE (tmp_outlet_gps):
store_code int(10)
gps varchar(20)
action_date date
Any help would be appreciated. I'm also not sure if I am using the correct approach for the desired end result, and would also be open to alternative suggestions.
Thanks.

Related

Delete all items in a database except the last date

I have a MySQL table that looks (very simplified) like this:
CREATE TABLE `logging` (
`id` bigint(20) NOT NULL,
`time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`level` smallint(3) NOT NULL,
`message` longtext CHARACTER SET utf8 COLLATE utf8_general_mysql500_ci NOT NULL
);
I would like to delete all rows of a specific level, except the last one (time is most recent).
Is there a way to select all rows with level set to a specific value and then delete all rows except the latest one in one single SQL query? How would I start solving this problem?
(As I said, this is a very simplified table, so please don't try to discuss possible design problems of this table. I removed some columns. It is designed per PSR-3 logging standard and I don't think there is an easy way to change that. What I want to solve is how I can select from a table and then delete all but some rows of the same table. I have only intermediate knowledge of MySQL.)
Thank you for pushing me in the right direction :)
Edit:
The Database version is /usr/sbin/mysqld Ver 8.0.18-0ubuntu0.19.10.1 for Linux on x86_64 ((Ubuntu))
You can use the ROW_NUMBER() window function (available because you are on MySQL 8+):
DELETE lg FROM `logging` AS lg
WHERE lg.`id` IN
( SELECT t.`id`
FROM
(
SELECT t.*,
ROW_NUMBER() OVER (ORDER BY `time` DESC) as rn
FROM `logging` t
-- WHERE `level` = @lvl -- optionally add this line to restrict to a specific value of `level`
) t
WHERE t.rn > 1
)
to delete all of the rows except the most recent one (assuming id is your primary key column).
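As a rough, self-contained illustration of the ROW_NUMBER() idea, the sketch below runs it against an in-memory SQLite database through Python's sqlite3 module. SQLite only stands in for MySQL here (an assumption made to keep the example runnable; window functions need SQLite 3.25 or later), the sample rows are invented, and the helper name keep_latest_per_level is hypothetical.

```python
import sqlite3


def keep_latest_per_level(conn, level):
    """Delete every row of the given level except the most recent one."""
    conn.execute(
        """
        DELETE FROM logging
        WHERE id IN (
            SELECT id FROM (
                SELECT id,
                       ROW_NUMBER() OVER (ORDER BY time DESC, id DESC) AS rn
                FROM logging
                WHERE level = ?
            ) AS ranked
            WHERE rn > 1
        )
        """,
        (level,),
    )


# Invented sample data: three rows of level 3 and one row of level 5.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logging (id INTEGER PRIMARY KEY, time TEXT, level INTEGER, message TEXT)"
)
conn.executemany(
    "INSERT INTO logging VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01 10:00:00", 3, "old error"),
        (2, "2020-01-02 10:00:00", 3, "newer error"),
        (3, "2020-01-03 10:00:00", 3, "latest error"),
        (4, "2020-01-01 12:00:00", 5, "unrelated notice"),
    ],
)

keep_latest_per_level(conn, 3)
remaining = conn.execute("SELECT id, level FROM logging ORDER BY id").fetchall()
print(remaining)
```

Because the level filter sits inside the ranked subquery, rows of other levels never get a row number and are left untouched.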
You can do this:
SELECT COUNT(time) FROM logging WHERE level=some_level INTO @TIME_COUNT;
SET @TIME_COUNT = @TIME_COUNT-1;
PREPARE STMT FROM 'DELETE FROM logging WHERE level=some_level ORDER BY time ASC LIMIT ?;';
EXECUTE STMT USING @TIME_COUNT;
If you have an AUTO_INCREMENT id column - I would use it to determine the most recent entry. Here is one way doing that:
delete l
from (
select l1.level, max(id) as id
from logging l1
where l1.level = @level
) m
join logging l
on l.level = m.level
and l.id < m.id
An index on (level) should give you good performance and will support the MAX() subquery as well as the JOIN.
If you really need to use the time column, you can modify the query as follows:
delete l
from (
select l1.level, l1.id
from logging l1
where l1.level = @level
order by l1.time desc, l1.id desc
limit 1
) m
join logging l
on l.level = m.level
and l.id <> m.id
Here you would want to have an index on (level, time).

SQL alternative to sub-query in FROM

I have a table containing user to user messages. A conversation has all messages between two users. I am trying to get a list of all the different conversations and display only the last message sent in the listing.
I am able to do this with a SQL sub-query in FROM.
CREATE TABLE `messages` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`from_user_id` bigint(20) DEFAULT NULL,
`to_user_id` bigint(20) DEFAULT NULL,
`type` smallint(6) NOT NULL,
`is_read` tinyint(1) NOT NULL,
`is_deleted` tinyint(1) NOT NULL,
`text` longtext COLLATE utf8_unicode_ci NOT NULL,
`heading` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`created_at_utc` datetime DEFAULT NULL,
`read_at_utc` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
);
SELECT * FROM
(SELECT * FROM `messages` WHERE TYPE = 1 AND
(from_user_id = 22 OR to_user_id = 22)
ORDER BY created_at_utc DESC
) tb
GROUP BY from_user_id, to_user_id;
SQL Fiddle:
http://www.sqlfiddle.com/#!2/845275/2
Is there a way to do this without a sub-query?
(writing a DQL which supports sub-queries only in 'IN')
You seem to be trying to get the last contents of messages to or from user 22 with type = 1. Your method is explicitly not guaranteed to work, because the extra columns (not in the group by) can come from arbitrary rows. As explained in the documentation:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Sorting of the result set
occurs after values have been chosen, and ORDER BY does not affect
which values within each group the server chooses.
The query that you want is more along the lines of this (assuming that you have an auto-incrementing id column for messages):
select m.*
from (select m.from_user_id, m.to_user_id, max(m.id) as max_id
from messages m
where m.type = 1 and (m.from_user_id = 22 or m.to_user_id = 22)
group by m.from_user_id, m.to_user_id
) lm join
messages m
on lm.max_id = m.id;
Or this:
select m.*
from messages m
where m.type = 1 and (m.from_user_id = 22 or m.to_user_id = 22) and
not exists (select 1
from messages m2
where m2.type = m.type and m2.from_user_id = m.from_user_id and
m2.to_user_id = m.to_user_id and
m2.created_at_utc > m.created_at_utc
);
For this latter query, an index on messages(type, from_user_id, to_user_id, created_at_utc) would help performance.
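To make the "latest row per conversation" pattern concrete, here is a minimal sketch that runs the MAX(id) join against an in-memory SQLite database via Python's sqlite3 module. SQLite is used only as a stand-in for MySQL, the table is cut down to the columns the query needs, the sample rows are invented, and last_messages is a hypothetical helper name. Like the GROUP BY in the question, it treats each (from_user_id, to_user_id) direction as its own group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        from_user_id INTEGER,
        to_user_id INTEGER,
        type INTEGER,
        text TEXT
    )
    """
)
# Invented sample data.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
    [
        (1, 22, 7, 1, "hi"),
        (2, 22, 7, 1, "are you there?"),
        (3, 9, 22, 1, "hello 22"),
        (4, 22, 7, 2, "different type"),
        (5, 9, 22, 1, "ping"),
        (6, 3, 4, 1, "not user 22"),
    ],
)


def last_messages(conn, user_id, msg_type=1):
    """Return the newest message (highest id) of each sender/recipient pair."""
    return conn.execute(
        """
        SELECT m.id, m.from_user_id, m.to_user_id, m.text
        FROM (
            SELECT MAX(id) AS max_id
            FROM messages
            WHERE type = ? AND (from_user_id = ? OR to_user_id = ?)
            GROUP BY from_user_id, to_user_id
        ) AS lm
        JOIN messages AS m ON m.id = lm.max_id
        ORDER BY m.id
        """,
        (msg_type, user_id, user_id),
    ).fetchall()


print(last_messages(conn, 22))
```

The derived table only carries the winning id of each group, so every other column is read from exactly that row instead of from an arbitrary member of the group.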
Since this is a rather specific type of data query which goes outside common ORM use cases, DQL isn't really fit for this - it's optimized for walking well-defined relationships.
For your case however Doctrine fully supports native SQL with result set mapping. Using a NativeQuery with ResultSetMapping like this you can easily use the subquery this problem requires, and still map the results on native Doctrine entities, allowing you to still profit from all caching, usability and performance advantages.
Samples can be found in the Doctrine ORM documentation on native SQL.
If you mean to get all conversations and all their last messages, then a subquery is necessary.
SELECT a.* FROM messages a
INNER JOIN (
SELECT
MAX(created_at_utc) as max_created,
from_user_id,
to_user_id
FROM messages
GROUP BY from_user_id, to_user_id
) b ON a.created_at_utc = b.max_created
AND a.from_user_id = b.from_user_id
AND a.to_user_id = b.to_user_id
And you could append the where condition as you like.
I don't think your original query was even doing this correctly. Not sure what the GROUP BY was being used for other than maybe try to only return a single (unpredictable) result.
Just add a limit clause:
SELECT * FROM `messages`
WHERE `type` = 1 AND
(`from_user_id` = 22 OR `to_user_id` = 22)
ORDER BY `created_at_utc` DESC
LIMIT 1
For optimum query performance you need indexes on the following fields:
type
from_user_id
to_user_id
created_at_utc

Using max function, group by and join

I have 3 tables:
CREATE TABLE IF NOT EXISTS `disksinfo` (
`idx` int(10) NOT NULL AUTO_INCREMENT,
`hostinfo_idx` int(10) DEFAULT NULL,
`id` char(30) DEFAULT NULL,
`name` char(30) DEFAULT NULL,
`size` bigint(20) DEFAULT NULL,
`freespace` bigint(20) DEFAULT NULL,
PRIMARY KEY (`idx`)
)
CREATE TABLE IF NOT EXISTS `hostinfo` (
`idx` int(10) NOT NULL AUTO_INCREMENT,
`host_idx` int(11) DEFAULT NULL,
`probetime` datetime DEFAULT NULL,
`processor_load` tinyint(4) DEFAULT NULL,
`memory_total` bigint(20) DEFAULT NULL,
`memory_free` bigint(20) DEFAULT NULL,
PRIMARY KEY (`idx`)
)
CREATE TABLE IF NOT EXISTS `hosts` (
`idx` int(10) NOT NULL AUTO_INCREMENT,
`name` char(30) DEFAULT '0',
PRIMARY KEY (`idx`)
)
Basically, hosts is just a fixed list of hostnames used in the hostinfo table (hostinfo.host_idx = hosts.idx).
hostinfo is a table which is filled every few minutes with data from all hosts. In addition, for each hostinfo row at least one disksinfo row is created. Each disksinfo row contains information about at least one disk (so, for some hosts there are 3-4 rows of disksinfo). disksinfo.hostinfo_idx = hostinfo.idx.
hostinfo.probetime is simply the time at which the data snapshot was created.
What I want to do now is select the last hostinfo (by .probetime) for each distinct host (hostinfo.host_idx), while joining in information about disks (disksinfo table) and host names (hosts table).
I came up with this:
SELECT hinfo.idx,
hinfo.host_idx,
hinfo.processor_load,
hinfo.memory_total,
hinfo.memory_free,
hnames.idx,
hnames.name,
disks.hostinfo_idx,
disks.id,
disks.name,
disks.size,
disks.freespace,
Max(hinfo.probetime)
FROM systeminfo.hostinfo AS hinfo
INNER JOIN systeminfo.hosts AS hnames
ON hnames.idx = hinfo.host_idx
INNER JOIN systeminfo.disksinfo AS disks
ON disks.hostinfo_idx = hinfo.idx
GROUP BY disks.id,
hnames.name
ORDER BY hnames.name,
disks.id
It seems to work! But, is it 100% correct? Is it optimal? Thanks for any tip!
It's not 100% correct, no.
Suppose you have this table:
x | y | z
-----------------
a b 1
a c 2
d e 1
d f 2
Now when you only group by x, the rows collapse and MySQL picks an arbitrary row from each collapsed group. So you might get
x | y | z
-----------------
a b 2
d e 2
or this
x | y | z
-----------------
a c 2
d f 2
Or another combination, this is not determined. Each time you fire your query you might get a different result. The 2 in column z is always there, because of the MAX() function, but you won't necessarily get the corresponding row to it.
Other RDBMSs would actually do the same, but most forbid this by default (it can be forbidden in MySQL, too). You have two possibilities to fix this (actually there are more, but I'll restrict myself to two).
Either you put all columns you have in your SELECT clause which are not used in an aggregate function like SUM() or MAX() or whatever into the GROUP BY clause as well, like this:
SELECT hinfo.idx,
hinfo.host_idx,
hinfo.processor_load,
hinfo.memory_total,
hinfo.memory_free,
hnames.idx,
hnames.name,
disks.hostinfo_idx,
disks.id,
disks.name,
disks.size,
disks.freespace,
Max(hinfo.probetime)
FROM systeminfo.hostinfo AS hinfo
INNER JOIN systeminfo.hosts AS hnames
ON hnames.idx = hinfo.host_idx
INNER JOIN systeminfo.disksinfo AS disks
ON disks.hostinfo_idx = hinfo.idx
GROUP BY
hinfo.idx,
hinfo.host_idx,
hinfo.processor_load,
hinfo.memory_total,
hinfo.memory_free,
hnames.idx,
hnames.name,
disks.hostinfo_idx,
disks.id,
disks.name,
disks.size,
disks.freespace
ORDER BY hnames.name,
disks.id
Note that this query might get you a different result! I'm just focusing on the problem that you might get the wrong data in the row you think holds the MAX(hinfo.probetime).
Or you solve it like this (and this will get you what you want):
SELECT hinfo.idx,
hinfo.host_idx,
hinfo.processor_load,
hinfo.memory_total,
hinfo.memory_free,
hnames.idx,
hnames.name,
disks.hostinfo_idx,
disks.id,
disks.name,
disks.size,
disks.freespace,
hinfo.probetime
FROM systeminfo.hostinfo AS hinfo
INNER JOIN systeminfo.hosts AS hnames
ON hnames.idx = hinfo.host_idx
INNER JOIN systeminfo.disksinfo AS disks
ON disks.hostinfo_idx = hinfo.idx
WHERE hinfo.probetime = (SELECT MAX(hi.probetime) FROM systeminfo.hostinfo AS hi
INNER JOIN systeminfo.hosts AS hn
ON hn.idx = hi.host_idx
INNER JOIN systeminfo.disksinfo AS d
ON d.hostinfo_idx = hi.idx
WHERE d.id = disks.id AND hn.name = hnames.name)
GROUP BY disks.id,
hnames.name
ORDER BY hnames.name,
disks.id
There's also a nice example in the manual about this: The Rows Holding the Group-wise Maximum of a Certain Column
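As a small worked example of the group-wise maximum pattern, the sketch below loads the four-row x | y | z table from above into an in-memory SQLite database (via Python's sqlite3 module, used here only as a convenient stand-in for MySQL) and joins each row to the per-group MAX(z). The table name t is an assumption made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT, y TEXT, z INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", "b", 1), ("a", "c", 2), ("d", "e", 1), ("d", "f", 2)],
)

# The derived table finds MAX(z) per x; the join then keeps only the rows
# that actually hold that maximum, so y comes from the matching row.
groupwise_max = conn.execute(
    """
    SELECT t.x, t.y, t.z
    FROM t
    JOIN (SELECT x, MAX(z) AS max_z FROM t GROUP BY x) AS m
      ON m.x = t.x AND m.max_z = t.z
    ORDER BY t.x
    """
).fetchall()
print(groupwise_max)
```

Unlike the plain GROUP BY x version, this returns the same rows on every run. If two rows in a group tie for the maximum, both are returned.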

How to format this mysql Query

What I have is a table statistieken with an IP, a hash of the browser info, the URL visited and the last-visited date as a timestamp.
What I could compile from different sources led to this query. The only problem is that it takes forever (9 minutes) to complete on a table with about 15000 rows, so it is very inefficient.
I think I'm going about this the wrong way, but I can't find a decent post or tutorial on how to use the results of a select as the basis for getting the results I want.
What I simply want is an overview of every entry in the table whose hash belongs to a visitor that has visited more than 25 pages in the last 12 hours.
CREATE TABLE IF NOT EXISTS `statsitieken` (
`hash` varchar(35) NOT NULL,
`ip` varchar(24) NOT NULL,
`visits` int(11) NOT NULL,
`lastvisit` int(11) NOT NULL,
`browserinfo` text NOT NULL,
`urls` text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
This is the query I have tried to compile so far.
SELECT * FROM `database`.`statsitieken` WHERE hash in (SELECT hash FROM `database`.`statsitieken`
where `lastvisit` > unix_timestamp(DATE_SUB(
NOW(),INTERVAL 12 hour
)
)
group by hash
having count(urls) > 25
order by urls)
I need this to compile in a decent time, like < 1 second which should be possible in my opinion...
I suggest trying this modified query. The subquery is now computed only once instead of being run for each record returned:
SELECT s.*
FROM `database`.`statsitieken` s, (SELECT hash
FROM `database`.`statsitieken`
WHERE `lastvisit` > UNIX_TIMESTAMP(DATE_SUB(NOW(),INTERVAL 12 HOUR))
GROUP BY hash
HAVING COUNT(urls)>25) tmp
WHERE s.`hash`=tmp.`hash`
ORDER BY s.urls
Be sure you have indexes on the following fields:
hash to speed up the GROUP BY and WHERE
urls to speed up the ORDER BY
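A derived table with an INNER JOIN is often faster than an IN subquery, particularly on older MySQL versions. Try this query, which does the grouping and counting inside the derived table so that the outer query still returns every matching row:
SELECT a.*
FROM statsitieken a
INNER JOIN (SELECT hash
FROM statsitieken
WHERE lastvisit > UNIX_TIMESTAMP(DATE_SUB(NOW(), INTERVAL 12 HOUR))
GROUP BY hash
HAVING COUNT(urls) > 25
) b
ON a.hash = b.hash
ORDER BY a.urls;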
For better performance of this select query you should add indexes as:
ALTER TABLE statsitieken ADD KEY ix_hash(hash);
ALTER TABLE statsitieken ADD KEY ix_lastvisit(lastvisit);
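The derived-table approach can be tried out on a tiny data set. The sketch below uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL; the cutoff timestamp and the row threshold are passed in as parameters (instead of NOW() and 25) so that the result is reproducible. The sample rows and the helper name busy_visitors are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statsitieken (hash TEXT, lastvisit INTEGER, urls TEXT)")
conn.execute("CREATE INDEX ix_hash ON statsitieken (hash)")
conn.execute("CREATE INDEX ix_lastvisit ON statsitieken (lastvisit)")
# Invented sample data: h1 has three recent rows and one old row, h2 has one.
conn.executemany(
    "INSERT INTO statsitieken VALUES (?, ?, ?)",
    [
        ("h1", 1000, "/a"),
        ("h1", 1001, "/b"),
        ("h1", 1002, "/c"),
        ("h1", 10, "/old"),
        ("h2", 1003, "/x"),
    ],
)


def busy_visitors(conn, since_ts, min_rows):
    """All rows whose hash has more than min_rows entries newer than since_ts."""
    return conn.execute(
        """
        SELECT s.hash, s.lastvisit, s.urls
        FROM statsitieken AS s
        JOIN (
            SELECT hash
            FROM statsitieken
            WHERE lastvisit > ?
            GROUP BY hash
            HAVING COUNT(urls) > ?
        ) AS b ON b.hash = s.hash
        ORDER BY s.urls
        """,
        (since_ts, min_rows),
    ).fetchall()


print(busy_visitors(conn, 500, 2))
```

The derived table is evaluated once and yields one row per qualifying hash; the join then pulls in every row for those hashes, including ones older than the cutoff.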
WHERE hash in (SELECT hash FROM `database`.`statsitieken`
where `lastvisit` > unix_timestamp(DATE_SUB(
NOW(),INTERVAL 12 hour
)
)
You are running a subquery against the same table. Why not apply the condition directly, like this?
where `lastvisit` > unix_timestamp(DATE_SUB(
NOW(),INTERVAL 12 hour
)
)

Increase speed of a mySQL query

I have a table like this.
CREATE TABLE `accounthistory` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date` datetime DEFAULT NULL,
`change_ammount` float DEFAULT NULL,
`account_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
)
It's a list of daily account charges. If I need the balance of the account I use
SELECT sum(change_ammount) FROM accounthistory WHERE account_id=;
It's quite fast because I added an index on the account_id column.
But now I need to find the time when the account went negative (the date when SUM(change_ammount) < 0)
I use this query:
SELECT main.date as date from accounthistory as main
WHERE main.account_id=484368430
AND (SELECT sum(change_ammount) FROM accounthistory as sub
WHERE sub.account_id=484368430 AND
sub.date < main.date)<0
ORDER BY main.date DESC
LIMIT 1;
But it runs very slowly. Can you propose a better solution?
Maybe I need some indexes (not only on account_id)?
One way to make your query faster is to denormalize: store the running account balance on every record. To achieve this, you'll have to do three things; then we'll look at the resulting query:
a) Add a column to your table:
ALTER TABLE accounthistory ADD balance float;
b) Populate the new column
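-- MySQL does not let a subquery in SET read the table being updated
-- (error 1093), so compute the running totals in a derived table and join to it.
UPDATE accounthistory main
JOIN (
SELECT a.id, SUM(b.change_ammount) AS balance
FROM accounthistory a
JOIN accounthistory b
ON b.account_id = a.account_id
AND b.date <= a.date
GROUP BY a.id
) t ON t.id = main.id
SET main.balance = t.balance;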
c) To populate new rows, either use a trigger, use application logic, or run the above UPDATE statement for each newly added row, i.e. UPDATE ... WHERE id = ?
Now the query to find the dates on which the account went negative, which will be very fast, becomes:
SELECT date
from accounthistory
where balance < 0
and balance - change_ammount >= 0
and account_id = ?;
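The stored-balance idea can be checked end to end on a few rows. The following sketch uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. Two things are assumptions made for the illustration: amounts are stored as integers to avoid float rounding, and the balance column is filled with a correlated UPDATE, which SQLite accepts but MySQL may reject when the subquery reads the table being updated. The sample rows and the helper name negative_crossings are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE accounthistory (
        id INTEGER PRIMARY KEY,
        date TEXT,
        change_ammount INTEGER,
        account_id INTEGER,
        balance INTEGER
    )
    """
)
# Invented sample data; running balance of account 1: 100, 70, -10, 40, -20.
conn.executemany(
    "INSERT INTO accounthistory (id, date, change_ammount, account_id) VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", 100, 1),
        (2, "2020-01-02", -30, 1),
        (3, "2020-01-03", -80, 1),
        (4, "2020-01-04", 50, 1),
        (5, "2020-01-05", -60, 1),
        (6, "2020-01-01", 10, 2),
    ],
)

# Fill the denormalized column: balance = sum of all changes up to this date.
conn.execute(
    """
    UPDATE accounthistory
    SET balance = (
        SELECT SUM(h.change_ammount)
        FROM accounthistory AS h
        WHERE h.account_id = accounthistory.account_id
          AND h.date <= accounthistory.date
    )
    """
)


def negative_crossings(conn, account_id):
    """Dates on which the balance dropped below zero from a non-negative value."""
    rows = conn.execute(
        """
        SELECT date
        FROM accounthistory
        WHERE account_id = ?
          AND balance < 0
          AND balance - change_ammount >= 0
        ORDER BY date
        """,
        (account_id,),
    ).fetchall()
    return [date for (date,) in rows]


print(negative_crossings(conn, 1))
```

Once the balance is stored, the lookup only touches the rows of a single account and needs no per-row summing, which is where the speed-up comes from.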
SELECT MAX(main.date) as date
from accounthistory as main
WHERE main.account_id=484368430
AND EXISTS (SELECT 1 FROM accounthistory as sub
WHERE sub.account_id=main.account_id AND
sub.date < main.date HAVING SUM(sub.change_ammount) < 0)