if i have the following tables:
create table rar (
    rar_id int(11) not null auto_increment primary key,
    rar_name varchar(20));

create table data_link (
    id int(11) not null auto_increment primary key,
    rar_id int(11) not null,
    foreign key (rar_id) references rar(rar_id));

create table consumption (
    id int(11) not null,
    consumption int(11) not null,
    total_consumption int(11) not null,
    date_time datetime not null,
    foreign key (id) references data_link(id));
I want total_consumption to be a running sum of all the consumption values. Is there a way to accomplish this with triggers, or do I need to read all the existing values plus the latest one, sum them up, and then update the table each time? Is there a better way to do this?
--------------------------------------------------
+----+-------------+-------------------+------------+
| id | consumption | total_consumption | date_time  |
+----+-------------+-------------------+------------+
|  1 |           5 |                 5 | 09/09/2013 |
|  2 |           5 |                10 | 10/09/2013 |
|  3 |           7 |                17 | 11/09/2013 |
|  4 |           3 |                20 | 11/09/2013 |
+----+-------------+-------------------+------------+
Just wondering if there is a cleaner, faster way of getting the total each time a new entry is added?
Or perhaps this is bad design? Would it be better to have something like:
SELECT SUM(consumption) FROM consumption WHERE date_time BETWEEN '2013-09-09' AND '2013-09-11' in order to get this type of information? Would that be the best option? The only problem I see with this is that the same query would be re-run multiple times, and because the result is never stored it has to be recomputed on every request. That could be inefficient when you are re-generating the same report several times over for viewing purposes, whereas if the total is already calculated, all you have to do is read the data rather than compute it again and again. Thoughts?
Any help would be appreciated.
If you've got an index on total_consumption, a nested select of MAX(total_consumption) as part of the insert won't noticeably slow it down, since the maximum value can be read straight off the index.
eg.
-- MySQL doesn't allow a subquery on the target table inside VALUES, and the new
-- consumption value can't be referenced there either, so use INSERT ... SELECT:
INSERT INTO `consumption` (consumption, total_consumption)
SELECT 8, 8 + COALESCE(MAX(total_consumption), 0)
FROM consumption;
I'm not sure how you're using the id column but you can easily add criteria to the nested select to control this.
If you do need to put a WHERE on the nested select, make sure you have an index across the fields you use and then the total_consumption column. For example, if you make it ... WHERE id = x, you'll need an index on (id, total_consumption) for it to work efficiently.
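For example, a covering index for that pattern might look like the statement below; the index name idx_id_total is only illustrative:
-- lets MAX(total_consumption) for a given id be read from the index alone
CREATE INDEX idx_id_total ON consumption (id, total_consumption);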
If you must have a trigger, it should look like this:
DELIMITER $$
CREATE TRIGGER `chg_consumption` BEFORE INSERT ON `consumption`
FOR EACH ROW BEGIN
    -- COALESCE covers the very first row, when MAX() returns NULL
    SET NEW.total_consumption = COALESCE((SELECT MAX(total_consumption)
                                          FROM consumption), 0) + NEW.consumption;
END;
$$
DELIMITER ;
p.s. Without the COALESCE, the first insert into the empty table would fail because MAX() returns NULL; alternatively, make total_consumption nullable or give it a default of 0.
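For reference, the column change suggested in the p.s. might look like one of these (only needed if you drop the COALESCE above):
-- allow NULL in total_consumption
ALTER TABLE consumption MODIFY total_consumption int(11) NULL;
-- or keep NOT NULL but give it a default of 0
ALTER TABLE consumption MODIFY total_consumption int(11) NOT NULL DEFAULT 0;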
EDIT: changed SUM(total_consumption) to MAX(total_consumption), as per calcinai's suggestion.
Related
I have a MySQL table structured like this:
CREATE TABLE `messages` (
`id` int NOT NULL AUTO_INCREMENT,
`author` varchar(250) COLLATE utf8mb4_unicode_ci NOT NULL,
`message` varchar(2000) COLLATE utf8mb4_unicode_ci NOT NULL,
`serverid` varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`guildname` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=27769461 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
I need to query this table for various statistics using date ranges for Grafana graphs; however, all of those queries are extremely slow, despite the table being indexed with a composite key of id and date.
"id" is auto-incrementing and date is also always increasing.
The queries generated by Grafana look like this:
SELECT
UNIX_TIMESTAMP(date) DIV 120 * 120 AS "time",
count(DISTINCT(serverid)) AS "servercount"
FROM messages
WHERE
date BETWEEN FROM_UNIXTIME(1615930154) AND FROM_UNIXTIME(1616016554)
GROUP BY 1
ORDER BY UNIX_TIMESTAMP(date) DIV 120 * 120
This query takes over 30 seconds to complete with 27 million records in the table.
Explaining the query results in this output:
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
| 1 | SIMPLE | messages | NULL | ALL | PRIMARY | NULL | NULL | NULL | 26952821 | 11.11 | Using where; Using filesort |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
This shows that MySQL considers the composite primary key I created (it appears under possible_keys, but key is NULL) and still has to scan almost the entire table, which I do not understand. How can I optimize this table for date range queries?
Plan A:
PRIMARY KEY(date, id), -- to cluster by date
INDEX(id) -- needed to keep AUTO_INCREMENT happy
Assuming the table is quite big, having date at the beginning of the PK puts the rows in the given date range all next to each other. This minimizes (somewhat) the I/O.
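A sketch of how Plan A might be applied; rebuilding the primary key on a ~27-million-row InnoDB table rewrites the whole table, so expect it to take a while:
ALTER TABLE messages
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (date, id),
    ADD INDEX (id);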
Plan B:
PRIMARY KEY(id),
INDEX(date, serverid)
Now the secondary index is exactly what is needed for the one query you have provided. It is optimized for searching by date, and it is smaller than the whole table, hence even faster (I/O-wise) than Plan A.
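As DDL, Plan B's secondary index might look like this (the index name is only illustrative):
ALTER TABLE messages ADD INDEX idx_date_serverid (date, serverid);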
But, if you have a lot of different queries like this, adding a lot more indexes gets impractical.
Plan C: There may be a still better way:
PRIMARY KEY(id),
INDEX(serverid, date)
In theory, it can hop through that secondary index, checking each serverid. But I am not sure that such an optimization exists.
Plan D: Do you need id for anything other than providing a unique PRIMARY KEY? If not, there may be other options.
The index on (id, date) doesn't help because its first column is id, not date.
You can either
(a) drop the current index and index (date, id) instead -- when date is in the first place this can be used to filter for date regardless of the following columns -- or
(b) just create an additional index only on (date) to support the query.
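As DDL, option (b) is the smaller change and might look like this (again, the index name is only illustrative):
ALTER TABLE messages ADD INDEX idx_date (date);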
In MySQL, I have two innodb tables, a small mission critical table, that needs to be readily available at all times for reads/writes. Call this mission_critical. I have a larger table (>10s of millions of rows), called big_table. I need to update big_table, for instance:
update mission_critical c, big_table b
set
b.something = c.something_else
where b.refID=c.id
This query could take more than an hour, but this creates a write-lock on the mission_critical table. Is there a way I can tell mysql, "I don't want a lock on mission_critical" so that that table can be written to?
I understand that this is not ideal from a transactional point of view. The only workaround I can think of right now is to make a copy of the small mission_critical table and do the update from that (which I don't care gets locked), but I'd rather not do that if there's a way to make MySQL natively deal with this more gracefully.
It is not the table that is locked but all of the records in mission_critical, since they are basically all scanned by the update. I am not assuming this; the symptom is that when a user logs in to an online system, it tries to update a datetime column in mission_critical with the last time they logged in. These queries die with a "Lock wait timeout exceeded" error while the query above is running. If I kill the query above, all pending queries run immediately.
mission_critical.id and big_table.refID are both indexed.
The pertinent portions of the creation statements for each table are:
mission_critical:
CREATE TABLE `mission_critical` (
`intID` int(11) NOT NULL AUTO_INCREMENT,
`id` smallint(6) DEFAULT NULL,
`something_else` varchar(50) NOT NULL,
`lastLoginDate` datetime DEFAULT NULL,
PRIMARY KEY (`intID`),
UNIQUE KEY `id` (`id`),
UNIQUE KEY `something_else` (`something_else`)
) ENGINE=InnoDB AUTO_INCREMENT=1432 DEFAULT CHARSET=latin1
big_table:
CREATE TABLE `big_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`postDate` date DEFAULT NULL,
`postTime` int(11) DEFAULT NULL,
`refID` smallint(6) DEFAULT NULL,
`something` varchar(50) NOT NULL,
`change` decimal(20,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `refID` (`refID`),
KEY `postDate` (`postDate`)
) ENGINE=InnoDB AUTO_INCREMENT=138139125 DEFAULT CHARSET=latin1
The EXPLAIN output for the query is:
+----+-------------+------------------+------------+------+---------------+-------+---------+------------------------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+------+---------------+-------+---------+------------------------------------+------+----------+-------------+
| 1 | SIMPLE | mission_critical | | ALL | id | | | | 774 | 100 | Using where |
| 1 | UPDATE | big_table | | ref | refID | refID | 3 | db.mission_critical.something_else | 7475 | 100 | |
+----+-------------+------------------+------------+------+---------------+-------+---------+------------------------------------+------+----------+-------------+
I first suggested a workaround with a subquery, to create a copy in an internal temporary table. But in my test the small table was still locked for writes. So I guess your best bet is to make a copy manually.
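A hedged sketch of that manual-copy workaround; the copy's name is purely illustrative:
-- snapshot just the columns the update needs
CREATE TEMPORARY TABLE mission_critical_copy AS
    SELECT id, something_else FROM mission_critical;
-- run the long update against the copy, so mission_critical itself is not read-locked
UPDATE mission_critical_copy c, big_table b
SET b.something = c.something_else
WHERE b.refID = c.id;
DROP TEMPORARY TABLE mission_critical_copy;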
The reason for the lock is described in this bug report: https://bugs.mysql.com/bug.php?id=72005
This is what Sinisa Milivojevic wrote in an answer:
update table t1,t2 ....
any UPDATE with a join is considered a multiple-table update. In that
case, a referenced table has to be read-locked, because rows must not
be changed in the referenced table during UPDATE until it has
finished. There can not be concurrent changes of the rows, nor DELETE
of the rows, nor, much less, exempli gratia any DDL on the referenced
table. The goal is simple, which is to have all tables with consistent
contents when UPDATE finishes, particularly since multiple-table
UPDATE can be executed with several passes.
In short, this behavior is for a good reason.
Consider writing INSERT and UPDATE triggers, which will update the big_table on the fly. That would delay writes on the mission_critical table. But it might be fast enough for you, and wouldn't need the mass-update-query any more.
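A minimal sketch of that idea, assuming the column names from the DDL above; only the UPDATE trigger on mission_critical is shown, and an INSERT trigger on big_table that looks up something_else at insert time would be the analogous other half:
DELIMITER $$
CREATE TRIGGER mission_critical_au AFTER UPDATE ON mission_critical
FOR EACH ROW BEGIN
    -- only touch big_table when the copied value actually changed
    IF NOT (NEW.something_else <=> OLD.something_else) THEN
        UPDATE big_table SET something = NEW.something_else WHERE refID = NEW.id;
    END IF;
END$$
DELIMITER ;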
Also check whether it would be better to use char(50) instead of varchar(50). I'm not sure, but it's possible that this will improve the update performance because the row size wouldn't need to change. In a test I was able to improve the update performance by about 50%.
UPDATE will lock the rows that it needs to change. It may also lock the "gaps" after those rows.
You may run the UPDATE in a loop of small transactions, touching only about 100 rows at a time:
BEGIN;
SELECT ... FOR UPDATE; -- arrange to have this select include the 100 rows
UPDATE ...; -- update the 100 rows
COMMIT;
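A hedged sketch of one shape that loop could take, bounding each batch by big_table's primary key and advancing the range in application code between iterations:
UPDATE big_table b
JOIN mission_critical c ON b.refID = c.id
SET b.something = c.something_else
WHERE b.id BETWEEN 1 AND 1000;  -- next batch: 1001 to 2000, and so on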
It may be worth trying a correlated subquery to see if the optimiser comes up with a different plan, but performance may be worse.
update big_table b
set b.something = (select c.something_else from mission_critical c where b.refID = c.id)
-- note: big_table rows with no matching mission_critical row get NULL from the subquery,
-- which the NOT NULL column will reject in strict mode
Forgive me for asking what should be a simple question, but I am totally new to Sphinx.
I am using Sphinx with a MySQL data store. The table looks like the one below, with the title and content fields indexed by Sphinx.
CREATE TABLE `documents` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`group_id` int(11) NOT NULL,
`group_id2` int(11) NOT NULL,
`date_added` datetime NOT NULL,
`title` varchar(255) NOT NULL,
`content` text NOT NULL,
`url` varchar(255) NOT NULL,
`links` int(11) NOT NULL,
`hosts` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `url` (`url`)
) ENGINE=InnoDB AUTO_INCREMENT=439043 DEFAULT CHARSET=latin1
Now, if I connect through Sphinx with
mysql -h0 -P9306
I can run a simple query like...
SELECT * FROM test1 WHERE MATCH('test document');
And I will get back a result set like...
+--------+----------+------------+
| id | group_id | date_added |
+--------+----------+------------+
| 360625 | 1 | 1499727792 |
| 362257 | 1 | 1499727807 |
| 362777 | 1 | 1499727811 |
| 159717 | 1 | 1499717614 |
| 160557 | 1 | 1499717621 |
+--------+----------+------------+
What I actually want is for it to return a result set with column values from the documents table (like the url, title, links, hosts, etc. columns) and, if at all possible, sort these by the relevancy of the Sphinx match.
Can that be accomplished in a single query? What might it look like?
Thanks in advance!
Two (main) options
Take the ids from the SphinxQL result and run a MySQL query to get the full details; see http://sphinxsearch.com/info/faq/#row-storage
eg SELECT * FROM documents WHERE id IN (3,5,7) ORDER BY FIELD(id,3,5,7)
This MySQL query should be VERY quick, because it's a PK lookup retrieving only a few rows (i.e. one page of results); the heavy lifting of searching the whole table has already been done in the first Sphinx query.
Duplicate all the columns you want to retrieve in the result set as attributes. You've already made group_id and date_added attributes; you would need to make more.
sql_field_string is a very convenient shortcut that makes BOTH a field and a string attribute from one column. It's not available for other column types, but that matters less, as numeric columns are not typically needed as fields anyway.
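A hedged sketch of what the relevant lines of the source definition in sphinx.conf might look like for option 2; treat it as illustrative rather than a drop-in config:
source documents
{
    # sql_host / sql_user / sql_pass / sql_db omitted
    sql_query           = SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, \
                          title, content, url, links, hosts FROM documents
    sql_attr_uint       = group_id
    sql_attr_timestamp  = date_added
    sql_attr_uint       = links
    sql_attr_uint       = hosts
    sql_field_string    = title
    sql_field_string    = url
}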
Option 1 is good in that it avoids duplicating the data and saves memory (Sphinx typically wants to hold attributes in memory), and it may be the most practical on big datasets.
Option 2, on the other hand, is good in that it avoids a second query for each result. But because you have a copy of the data, it may mean additional complication keeping things in sync.
This doesn't look relevant in your case, but say you had a 'clicks' column that you want to increment often (when users click!) and need in the result set, but don't really need inside Sphinx for query purposes: the first option would let you increment it only in the database, and the MySQL query would always get the live value, whereas the second option means having to keep the Sphinx index in sync at all times.
I have this table, which contains around 80,000,000 rows.
CREATE TABLE `mytable` (
`date` date NOT NULL,
`parameters` mediumint(8) unsigned NOT NULL,
`num` tinyint(3) unsigned NOT NULL,
`val1` int(11) NOT NULL,
`val2` int(10) NOT NULL,
`active` tinyint(3) unsigned NOT NULL,
`ref` int(10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`ref`) USING BTREE,
KEY `parameters` (`parameters`)
) ENGINE=MyISAM AUTO_INCREMENT=79092001 DEFAULT CHARSET=latin1
It's articulated around two main columns: "parameters" and "date".
There are around 67,000 possible values for "parameters".
For each "parameters" value there are around 1,200 rows, each with a different date,
so for each date there are 67,000 rows.
1,200 * 67,000 = 80,400,000.
The table size appears as 1.5GB, the index size as 1.4GB.
Now, I want to query the table to retrieve all rows for one "parameters" value
(actually I want to do it for each parameter, but this is a good start):
SELECT val1 FROM mytable WHERE parameters=1;
The first run gives me results in 8 seconds.
Subsequent runs for different but close values of parameters (2, 3, 4...) are instantaneous.
A run for a "far away" value (parameters=1000) gives me results in 8 seconds again.
I did tests running the same query without the index and got results in 20 seconds, so I guess the index is kicking in, as shown by EXPLAIN, but it is not giving a drastic jump in performance:
+----+-------------+----------+------+---------------+------------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------------+---------+-------+------+-------+
| 1 | SIMPLE | mytable | ref | parameters | parameters | 3 | const | 1097 | |
+----+-------------+----------+------+---------------+------------+---------+-------+------+-------+
But I'm still baffled by the time taken for such an easy request (no join, directly on the index).
The server is a two-year-old machine with two quad-core 2.6GHz CPUs running Ubuntu, with 4GB of RAM.
I've raised the key_buffer parameter to 1G and restarted MySQL, but noticed no change whatsoever.
Should I consider this normal, or is there something I'm doing wrong? I get the feeling that with the right config the request should be almost immediate.
Try using a covering index, i.e. create an index that includes both of the columns you need. It won't need the second disk I/O to fetch the values from the main table since the data's right there in the index.
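For the query shown, that covering index might look like this (the index name is only illustrative, and building it on an 80-million-row table will take a while):
ALTER TABLE mytable ADD INDEX idx_parameters_val1 (parameters, val1);
Once it exists, SELECT val1 FROM mytable WHERE parameters=1 can be answered from the index alone, without touching the data file.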
I am trying to get a rough idea of a server's past load. I would like to use a log table to estimate concurrent access, but I'm getting stuck building the query. The table is as follows:
FYI: I didn't design/create this table; I took over an ugly database :(
CREATE TABLE `user_access_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`oem_id` varchar(50) NOT NULL,
`add_date` datetime NOT NULL,
`username` varchar(36) NOT NULL,
`site_id` int(5) NOT NULL,
`card_id` int(6) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1
Table data:
+-----------+---------------------+-----------------+---------+---------+
| id        | add_date            | username        | site_id | card_id |
+-----------+---------------------+-----------------+---------+---------+
| 185818901 | 2012-12-25 03:01:04 | 1944E9745A9CE91 |       4 |   27273 |
| 185818900 | 2012-12-25 03:01:04 | EA5902C8115C9FF |       1 |   27238 |
+-----------+---------------------+-----------------+---------+---------+
This may be far-fetched, or perhaps illogical, but I am trying to count unique usernames for each second of add_date (keeping in mind the data spans several months).
Any thoughts?
Thanks,
Kate
Sounds like you just need to use COUNT and DATE_FORMAT:
SELECT DATE_FORMAT(add_date, '%Y-%m-%d %H:%i:%s'),
       COUNT(DISTINCT username) AS usercount
FROM user_access_log
GROUP BY DATE_FORMAT(add_date, '%Y-%m-%d %H:%i:%s')
SQL Fiddle Demo
Depending on how your data is stored, you may not need DATE_FORMAT at all. Either way, this returns a count of distinct usernames grouped by date, hour, minute and second, which I assume is what you're trying to accomplish. If you are trying to figure out which second or hour has the most traffic, you can use SECOND(add_date), for example. It depends on your exact needs (the Fiddle includes both examples).
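Since add_date is a DATETIME and already has one-second granularity, a hedged simpler variant is to group on the raw column; the second query sketches the SECOND(add_date) idea for finding the busiest second of the minute:
SELECT add_date, COUNT(DISTINCT username) AS usercount
FROM user_access_log
GROUP BY add_date;

SELECT SECOND(add_date) AS s, COUNT(DISTINCT username) AS usercount
FROM user_access_log
GROUP BY SECOND(add_date)
ORDER BY usercount DESC;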