MySQL Query Fixing/Optimisation for my configuration table - mysql

I have a MySQL table that holds the configuration of my project. Each configuration change creates a new entry, so that I have a history of all changes and who made them.
CREATE TABLE `configurations` (
`name` varchar(255) NOT NULL,
`value` text NOT NULL,
`lastChange` datetime NOT NULL,
`changedBy` bigint(32) NOT NULL,
KEY `lastChange` (`lastChange`),
KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `configurations` (`name`, `value`, `lastChange`, `changedBy`) VALUES
('activePageLimit', 'activePageLimit-old-value', '2016-01-06 12:25:05', 1096775260340178),
('activePageLimit', 'activePageLimit-new-value', '2016-01-06 12:27:57', 1096775260340178),
('customerLogo', 'customerLogo-old-value', '2016-01-06 00:00:00', 1096775260340178),
('customerLogo', 'customerLogo-new-value', '2016-01-07 00:00:00', 1096775260340178);
Right now I have a problem with my select query, which should return all names and their latest value (ordered by lastChange).
| name | value | lastChange |
|-----------------|---------------------------|---------------------------|
| customerLogo | customerLogo-new-value | January, 07 2016 00:00:00 |
| activePageLimit | activePageLimit-new-value | January, 06 2016 12:27:57 |
My current Query is:
SELECT `name`, `value`, `lastChange`
FROM (
SELECT `name`, `value`, `lastChange`
FROM `configurations`
ORDER BY `lastChange` ASC
) AS `c`
GROUP BY `name` DESC
But unfortunately this does not always return the right values, and I don't like to use a subquery; there has to be a cleaner and faster way to do this.
I also created a SQL-Fiddle for you as a playground: http://sqlfiddle.com/#!9/f1dc9/1/0
Is there any other clever solution I missed?

Your method is documented to return indeterminate results (because you have columns in the select that are not in the group by).
Here are three alternatives. The first is standard SQL, using an explicit aggregation to get the most recent change.
SELECT c.*
FROM configurations c JOIN
(SELECT `name`, MAX(`lastChange`) as maxlc
FROM `configurations`
GROUP BY name
) mc
ON c.name = mc.name and c.lastChange = mc.maxlc ;
The second is also standard SQL, using not exists:
select c.*
from configurations c
where not exists (select 1
from configurations c2
where c2.name = c.name and c2.lastchange > c.lastchange
);
The third uses a hack which is available in MySQL (and it assumes that the value does not have any commas in this version and is not too long):
select name, max(lastchange),
substring_index(group_concat(value order by lastchange desc), ',', 1) as value
from configurations
group by name
order by name;
Use this version carefully, because it is prone to error (for instance, the intermediate group_concat() result could exceed a MySQL parameter, which would then have to be re-set).
There are other methods -- such as using variables. But these three should be sufficient for you to consider your options.
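One more option, sketched here on the assumption that the server is MySQL 8.0 or later (window functions are not available before that): number the rows per name from newest to oldest and keep the first one. The alias `ranked` and the column `rn` are names made up for this example.

```sql
-- Assumes MySQL 8.0+; `ranked` and `rn` are illustrative names.
SELECT name, value, lastChange
FROM (
    SELECT name, value, lastChange,
           -- 1 = most recent row for each name
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY lastChange DESC) AS rn
    FROM configurations
) AS ranked
WHERE rn = 1
ORDER BY lastChange DESC;
```

If two rows for the same name share the same lastChange, this returns only one of them, and which one is not defined unless you add a tie-breaker to the ORDER BY inside OVER().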

If we want to avoid SUBQUERY the only other option is JOIN
SELECT cc.name, cc.value, cc.lastChange FROM configurations cc
JOIN (
SELECT name, value, lastChange
FROM configurations
ORDER BY lastChange ASC
) c on c.value = cc.value
GROUP BY cc.name DESC

You have two requirements: a historical log, and a "state". Keep them in two different tables, in spite of that providing redundant information.
That is, have one table that faithfully records who changed what when.
Have another table that faithfully specifies the current state for the configuration.
Plan A: INSERT into the Log and UPDATE the State whenever anything happens.
Plan B: UPDATE the State and use a TRIGGER to write to the Log.
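A minimal sketch of Plan B, assuming the existing `configurations` table keeps its role as the Log. The table `configuration_state` and both trigger names are hypothetical, invented for this example.

```sql
-- Hypothetical State table: exactly one row per setting.
CREATE TABLE configuration_state (
    name       VARCHAR(255) NOT NULL,
    value      TEXT         NOT NULL,
    lastChange DATETIME     NOT NULL,
    changedBy  BIGINT       NOT NULL,
    PRIMARY KEY (name)
);

-- Copy every new setting into the Log.
CREATE TRIGGER configuration_state_ai
AFTER INSERT ON configuration_state
FOR EACH ROW
    INSERT INTO configurations (name, value, lastChange, changedBy)
    VALUES (NEW.name, NEW.value, NEW.lastChange, NEW.changedBy);

-- Copy every changed setting into the Log.
CREATE TRIGGER configuration_state_au
AFTER UPDATE ON configuration_state
FOR EACH ROW
    INSERT INTO configurations (name, value, lastChange, changedBy)
    VALUES (NEW.name, NEW.value, NEW.lastChange, NEW.changedBy);
```

With this in place the "current configuration" query becomes a plain `SELECT name, value, lastChange FROM configuration_state`, and the history stays available in `configurations`.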

Related

MySQL optimized performance for two large tables with same index

I have two tables with huge amount of data in them (~1.8mil in the main one, ~1.2mil in the secondary one), as follows:
subscriber_table (id, name, email, country, account_status, ...)
subscriber_payment_table (id, subscriber_id, payment_type, payment_credential)
My end goal is having a table, containing all the users and their payment tables (null if non existing), up to yesterday, and with account_status = 1 (active)
Not all subscribers have a corresponding subscriber_payment, so an INNER JOIN isn't a viable option, and with a LEFT JOIN the query times out after 2 hours of processing.
SELECT
`subscribers`.`id` AS `id`,
`subscribers`.`email` AS `email`,
`subscribers`.`name` AS `name`,
`subscribers`.`geoloc_country` AS `country`,
`subscribers_payment`.`payment_type` AS `paymentType`,
`subscribers_payment`.`payment_credential` AS `paymentCredential`,
`subscribers`.`create_datetime` AS `createdAt`
FROM
`subscribers`
LEFT JOIN
`subscribers_payment` ON (`subscribers_payment`.`subscriberId` = `subscribers`.`id`)
WHERE
`subscribers`.`account_status` = 1
AND DATE_FORMAT(CAST(`subscribers`.`create_datetime` AS DATE), '%Y-%m-%d') < curdate()
As mentioned, this query takes too much time and ends up timing out and not working.
I've also considered having a UNION, between "All the Subscribers" and "Subscribers with Payment".
(
SELECT
`subscribers`.`id` AS `id`,
`subscribers`.`email` AS `email`,
`subscribers`.`name` AS `name`,
`subscribers`.`geoloc_country` AS `country`,
null AS `paymentType`,
null AS `paymentCredential`,
`subscribers`.`create_datetime` AS `createdAt`
FROM
`subscribers`
WHERE
`subscribers`.`account_status` = 1
AND DATE_FORMAT(CAST(`subscribers`.`create_datetime` AS DATE), '%Y-%m-%d') < curdate())
UNION
(
SELECT
`subscribers`.`id` AS `id`,
`subscribers`.`email` AS `email`,
`subscribers`.`name` AS `name`,
`subscribers`.`geoloc_country` AS `country`,
`subscribers_payment`.`payment_type` AS `paymentType`,
`subscribers_payment`.`payment_credential` AS `paymentCredential`,
`subscribers`.`create_datetime` AS `createdAt`
FROM
`subscribers`
INNER JOIN
`subscribers_payment` ON (`subscribers_payment`.`subscriberId` = `subscribers`.`id`)
WHERE
`subscribers`.`account_status` = 1
AND DATE_FORMAT(CAST(`subscribers`.`create_datetime` AS DATE), '%Y-%m-%d') < curdate())
The problem with the current implementation is that I'm getting duplicate records. UNION does not collapse them, because a subscriber with a payment appears once with null in the paymentType and paymentCredential columns and once with the real values, so the two rows are not identical.
This query runs in about 2 minutes, so it is more feasible for me. I just need to eliminate the duplicate records, unless there's a wiser option here.
Disclaimer: we're using MyISAM tables, so having foreign keys to speed up the queries is a no-go.
For this query:
SELECT . . .
FROM subscribers s LEFT JOIN
subscribers_payment sp
ON sp.subscriberId = s.id
WHERE s.account_status = 1 AND
s.create_datetime < curdate();
you want an index on subscribers(account_status, create_datetime, id) and on subscribers_payment(subscriberId).
I am guessing that the index on subscribers_payment is missing, which would explain the performance problems.
Notes:
Use table aliases -- they make the query easier to write and read.
There should be no need to convert a datetime to a string for comparison purposes.
There is no need to use backticks for all identifiers. They just make the query harder to write and read.
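The two indexes suggested above could be created roughly like this. The index names are invented for the example, and the column names are taken from the queries in the question rather than from the actual schema.

```sql
-- Index names are illustrative; column names follow the question's queries.
CREATE INDEX idx_subscribers_status_created
    ON subscribers (account_status, create_datetime, id);

CREATE INDEX idx_subscribers_payment_subscriber
    ON subscribers_payment (subscriberId);

-- Check the plan afterwards: the row for sp should show the new index
-- under "key" rather than a full table scan.
EXPLAIN
SELECT s.id, s.email, sp.payment_type
FROM subscribers s LEFT JOIN
     subscribers_payment sp
     ON sp.subscriberId = s.id
WHERE s.account_status = 1 AND
      s.create_datetime < curdate();
```

Building an index on a MyISAM table of this size locks the table while it runs, so it is probably best done outside peak hours.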

Delete all items in a database except the last date

I have a MySQL table that looks (very simplified) like this:
CREATE TABLE `logging` (
`id` bigint(20) NOT NULL,
`time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`level` smallint(3) NOT NULL,
`message` longtext CHARACTER SET utf8 COLLATE utf8_general_mysql500_ci NOT NULL
);
I would like to delete all rows of a specific level, except the last one (time is most recent).
Is there a way to select all rows with level set to a specific value and then delete all rows except the latest one in one single SQL query? How would I start solving this problem?
(As I said, this is a very simplified table, so please don't try to discuss possible design problems of this table. I removed some columns. It is designed per PSR-3 logging standard and I don't think there is an easy way to change that. What I want to solve is how I can select from a table and then delete all but some rows of the same table. I have only intermediate knowledge of MySQL.)
Thank you for pushing me in the right direction :)
Edit:
The Database version is /usr/sbin/mysqld Ver 8.0.18-0ubuntu0.19.10.1 for Linux on x86_64 ((Ubuntu))
You can use the ROW_NUMBER() analytic function (since you are on version 8+):
DELETE lg FROM `logging` AS lg
WHERE lg.`id` IN
( SELECT t.`id`
FROM
(
SELECT t.*,
ROW_NUMBER() OVER (ORDER BY `time` DESC) as rn
FROM `logging` t
-- WHERE `level` = @lvl -- optionally add this line to restrict to a specific value of `level`
) t
WHERE t.rn > 1
)
This deletes all of the rows except the most recent one (assuming id is your primary key column).
You can do this:
SELECT COUNT(time) FROM logging WHERE level=some_level INTO @TIME_COUNT;
SET @TIME_COUNT = @TIME_COUNT-1;
PREPARE STMT FROM 'DELETE FROM logging WHERE level=some_level ORDER BY time ASC LIMIT ?;';
EXECUTE STMT USING @TIME_COUNT;
If you have an AUTO_INCREMENT id column - I would use it to determine the most recent entry. Here is one way doing that:
delete l
from (
select l1.level, max(id) as id
from logging l1
where l1.level = @level
) m
join logging l
on l.level = m.level
and l.id < m.id
An index on (level) should give you good performance and will support the MAX() subquery as well as the JOIN.
If you really need to use the time column, you can modify the query as follows:
delete l
from (
select l1.level, l1.id
from logging l1
where l1.level = @level
order by l1.time desc, l1.id desc
limit 1
) m
join logging l
on l.level = m.level
and l.id <> m.id
Here you would want to have an index on (level, time).

Mysql deduplicate records in single query

I have the following table:
CREATE TABLE `relations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`relationcode` varchar(25) DEFAULT NULL,
`email_address` varchar(100) DEFAULT NULL,
`firstname` varchar(100) DEFAULT NULL,
`latname` varchar(100) DEFAULT NULL,
`last_contact_date` varchar(25) DEFAULT NULL,
PRIMARY KEY (`id`)
)
This table contains duplicates: relations with exactly the same relationcode and email_address. They can be in there twice or even 10 times.
I need a query that selects the ids of all records but returns each duplicated relation only once. Of those duplicates, I would like to select only the record with the most recent last_contact_date.
I'm more into Oracle than MySQL. In Oracle I would be able to do it this way:
select * from (
select row_number () over (partition by relationcode order by to_date(last_contact_date,'dd-mm-yyyy')) rank,
id,
relationcode,
email_address ,
last_contact_date
from RELATIONS)
where rank = 1
But I can't figure out how to modify this query to work in MySQL. I'm not even sure it's possible to do the same thing in a single query in MySQL.
Any ideas?
Normal way to do this is a sub query to get the latest record and then join that against the table:-
SELECT RELATIONS.id, RELATIONS.relationcode, RELATIONS.email_address, RELATIONS.firstname, RELATIONS.latname, RELATIONS.last_contact_date
FROM RELATIONS
INNER JOIN
(
SELECT relationcode, email_address, MAX(last_contact_date) AS latest_contact_date
FROM RELATIONS
GROUP BY relationcode, email_address
) Sub1
ON RELATIONS.relationcode = Sub1.relationcode
AND RELATIONS.email_address = Sub1.email_address
AND RELATIONS.last_contact_date = Sub1.latest_contact_date
It is possible to manually generate the kind of rank that your Oracle query uses using variables. Bit messy though!
SELECT id, relationcode, email_address, firstname, latname, last_contact_date
FROM
(
SELECT id, relationcode, email_address, firstname, latname, last_contact_date, @seq:=IF(@relationcode = relationcode AND @email_address = email_address, @seq + 1, 1) AS seq, @relationcode := relationcode, @email_address := email_address
FROM
(
SELECT id, relationcode, email_address, firstname, latname, last_contact_date
FROM RELATIONS
CROSS JOIN (SELECT @seq:=0, @relationcode := '', @email_address :='') Sub1
ORDER BY relationcode, email_address, last_contact_date DESC
) Sub2
) Sub3
WHERE seq = 1
This uses a sub query to initialise the variables. The sequence number is incremented if the relation code and email address are the same as on the previous row; if not, it is reset to 1. Either way it is stored in a field. The outer select then checks the sequence number (as a field, not as the variable name) and returns only the records where it is 1.
Note that I have done this as multiple sub queries, partly to make it clearer to you, but also to try to force the order in which MySQL executes it. The way MySQL says it may order the evaluation of these expressions could cause problems. It never has for me, but with sub queries I would hope to force the order.
Here is a method that will work in both MySQL and Oracle. It rephrases the question as: Get me all rows from relations where the relationcode has no larger last_contact_date.
It works something like this:
select r.*
from relations r
where not exists (select 1
from relations r2
where r2.relationcode = r.relationcode and
r2.last_contact_date > r.last_contact_date
);
With the appropriate indexes, this should be pretty efficient in both databases.
Note: This assumes that last_contact_date is stored as a date, not as a string (as in your table example). Storing dates as strings is just a really bad idea and you should fix your data structure.
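On MySQL 8.0 or later, the Oracle query from the question carries over almost unchanged. The sketch below assumes that version, and assumes last_contact_date holds strings in the 'dd-mm-yyyy' format implied by the to_date() call in the question. The alias `ranked` and the column `rn` are names made up for the example.

```sql
-- Assumes MySQL 8.0+ and 'dd-mm-yyyy' strings in last_contact_date.
SELECT id, relationcode, email_address, last_contact_date
FROM (
    SELECT id, relationcode, email_address, last_contact_date,
           ROW_NUMBER() OVER (
               PARTITION BY relationcode, email_address
               -- newest contact first; id breaks ties between equal dates
               ORDER BY STR_TO_DATE(last_contact_date, '%d-%m-%Y') DESC, id DESC
           ) AS rn
    FROM relations
) AS ranked
WHERE rn = 1;
```

STR_TO_DATE() returns NULL for strings that do not match the format, and NULL sorts last under DESC in MySQL, so a malformed date will only be picked when every row in its group is malformed.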

SQL alternative to sub-query in FROM

I have a table containing user to user messages. A conversation has all messages between two users. I am trying to get a list of all the different conversations and display only the last message sent in the listing.
I am able to do this with a SQL sub-query in FROM.
CREATE TABLE `messages` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`from_user_id` bigint(20) DEFAULT NULL,
`to_user_id` bigint(20) DEFAULT NULL,
`type` smallint(6) NOT NULL,
`is_read` tinyint(1) NOT NULL,
`is_deleted` tinyint(1) NOT NULL,
`text` longtext COLLATE utf8_unicode_ci NOT NULL,
`heading` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`created_at_utc` datetime DEFAULT NULL,
`read_at_utc` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
);
SELECT * FROM
(SELECT * FROM `messages` WHERE TYPE = 1 AND
(from_user_id = 22 OR to_user_id = 22)
ORDER BY created_at_utc DESC
) tb
GROUP BY from_user_id, to_user_id;
SQL Fiddle:
http://www.sqlfiddle.com/#!2/845275/2
Is there a way to do this without a sub-query?
(writing a DQL which supports sub-queries only in 'IN')
You seem to be trying to get the last contents of messages to or from user 22 with type = 1. Your method is explicitly not guaranteed to work, because the extra columns (not in the group by) can come from arbitrary rows. As explained in the documentation:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Sorting of the result set
occurs after values have been chosen, and ORDER BY does not affect
which values within each group the server chooses.
The query that you want is more along the lines of this (assuming that you have an auto-incrementing id column for messages):
select m.*
from (select m.from_user_id, m.to_user_id, max(m.id) as max_id
from messages m
where m.type = 1 and (m.from_user_id = 22 or m.to_user_id = 22)
group by m.from_user_id, m.to_user_id
) lm join
messages m
on lm.max_id = m.id;
Or this:
select m.*
from messages m
where m.type = 1 and (m.from_user_id = 22 or m.to_user_id = 22) and
not exists (select 1
from messages m2
where m2.type = m.type and m2.from_user_id = m.from_user_id and
m2.to_user_id = m.to_user_id and
m2.created_at_utc > m.created_at_utc
);
For this latter query, an index on messages(type, from_user_id, to_user_id, created_at_utc) would help performance.
Since this is a rather specific type of data query which goes outside common ORM use cases, DQL isn't really fit for this - it's optimized for walking well-defined relationships.
For your case however Doctrine fully supports native SQL with result set mapping. Using a NativeQuery with ResultSetMapping like this you can easily use the subquery this problem requires, and still map the results on native Doctrine entities, allowing you to still profit from all caching, usability and performance advantages.
Samples found here.
If you mean to get all conversations and all their last messages, then a subquery is necessary.
SELECT a.* FROM messages a
INNER JOIN (
SELECT
MAX(created_at_utc) as max_created,
from_user_id,
to_user_id
FROM messages
GROUP BY from_user_id, to_user_id
) b ON a.created_at_utc = b.max_created
AND a.from_user_id = b.from_user_id
AND a.to_user_id = b.to_user_id
And you could append the where condition as you like.
I don't think your original query was even doing this correctly. Not sure what the GROUP BY was being used for other than maybe try to only return a single (unpredictable) result.
Just add a limit clause:
SELECT * FROM `messages`
WHERE `type` = 1 AND
(`from_user_id` = 22 OR `to_user_id` = 22)
ORDER BY `created_at_utc` DESC
LIMIT 1
For optimum query performance you need indexes on the following fields:
type
from_user_id
to_user_id
created_at_utc

Generate statistics in MySQL

I have a table with posts and I want to generate a graph that shows how many posts were made in the last 30 minutes, in the 30 minutes before that, and so on. The posts are selected by their post_handler and post_status.
The table structure looks like this.
CREATE TABLE IF NOT EXISTS `posts` (
`post_title` varchar(255) NOT NULL,
`post_content` text NOT NULL,
`post_date_added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`post_handler` varchar(255) NOT NULL,
`post_status` tinyint(4) NOT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
KEY `post_status` (`post_status`),
KEY `post_status_2` (`post_status`,`id`),
KEY `post_handler` (`post_handler`),
KEY `post_date_added` (`post_date_added`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=2300131 ;
The results I'd like to receive, sorted by post_date_added:
period_start period_end posts
2011-12-06 19:23:44 2011-12-06 19:53:44 10
2011-12-06 19:53:44 2011-12-06 20:23:44 39
2011-12-06 20:23:44 2011-12-06 20:53:44 40
Right now I use a solution where I have to run this query many times over, and then insert the data into another table from the PHP script.
SELECT COUNT(*) FROM posts WHERE post_handler = 'test' AND post_status = 1 AND post_date_added BETWEEN '2011-12-06 19:23:44' AND '2011-12-06 19:53:44'
Do you know any other solution? Is there any way to run a query that also inserts results into the database, all in one query?
It's fairly easy to group by distinct time units, like hour, minute, or day. If you want to group by hour, a possible query might look like this:
SELECT DATE_FORMAT(post_date_added,"%Y-%m-%d %H") AS "_Date",
COUNT(*)
FROM posts
WHERE post_handler = 'test'
AND post_status = 1
GROUP BY _Date;
(run this with a mysql query tool of your choice to see the output).
However, if you want 30 minutes as the basis of your grouping, the SQL gets trickier. For this special case, since you only have to split each hour into two subsets, you could work with this approach:
SELECT DATE_FORMAT(post_date_added,"%Y-%m-%d %H") AS "_Date",
"00" AS "semihour",
COUNT(*)
FROM posts
WHERE post_handler = 'test'
AND DATE_FORMAT(post_date_added,"%i") < 30
AND post_status = 1
GROUP BY _Date
UNION
SELECT DATE_FORMAT(post_date_added,"%Y-%m-%d %H") AS "_Date",
"30" AS "semihour",
COUNT(*)
FROM posts
WHERE post_handler = 'test'
AND DATE_FORMAT(post_date_added,"%i") >= 30
AND post_status = 1
GROUP BY _Date;
Again, run this with a mysql query tool of your choice to see the output. You could also compute the buckets arithmetically, using CASE or IF and the like, but personally I'd group by either hour or minute just to keep the SQL simpler.
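As a sketch of the arithmetic variant: rounding the Unix timestamp down to a multiple of 1800 seconds gives one bucket per half hour in a single query, without the UNION. The column aliases follow the result layout in the question. Note that the buckets are aligned to the clock (:00 and :30) in the session time zone, not to an arbitrary start such as 19:23:44, and that half hours with no posts do not appear at all.

```sql
-- 1800 seconds = 30 minutes; buckets start at :00 and :30.
SELECT FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(post_date_added) / 1800) * 1800)        AS period_start,
       FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(post_date_added) / 1800) * 1800 + 1800) AS period_end,
       COUNT(*) AS posts
FROM posts
WHERE post_handler = 'test'
  AND post_status = 1
GROUP BY period_start, period_end
ORDER BY period_start;
```

Changing 1800 to another number of seconds (900 for 15 minutes, 3600 for an hour) changes the bucket width without any other change to the query.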
To directly add those numbers into your graph database, use this syntax:
INSERT INTO yourtable (yourfields)
SELECT ...
More details about this can be found here in the MySQL documentation.
In (very) brief: yes, you can insert the results of a query into another table. Take a look at INSERT ... SELECT here: http://dev.mysql.com/doc/refman/5.1/en/insert-select.html
Essentially, you'd just change what you have to something like
INSERT INTO post_statistics_table (period_start, period_end, posts)
SELECT ?, ?, COUNT(*) FROM posts
WHERE post_handler = 'test'
AND post_status = 1
AND post_date_added BETWEEN ? AND ?
and then fill in the four ?s with the same two DATETIMEs, repeated. ($from, $to, $from, $to)