I have a table in MySQL like this:
startIp  | endIp    | city        | countryCode | latitude | longitude
16777216 | 16777471 | Los Angeles | US          | 34.0522  | -118.244
16777472 | 16778239 | Fuzhou      | CN          | 26.0614  | 119.306
16778240 | 16779263 | Melbourne   | AU          | -37.814  | 144.963
and 2.7 million more entries.
Now I have a converted IP address like 16777566. This should return "Fuzhou, CN, 26.0614, 119.306".
Right now I use this query:
SELECT * FROM kombiniert WHERE startIp < 16777566 AND endIp > 16777566
It works really well, but it's too slow. Performance:
Without LIMIT: SELECT * FROM kombiniert WHERE startIp < 2264918979 AND endIp > 2264918979;
avg (2300ms)
With LIMIT: SELECT * FROM kombiniert WHERE startIp < 2264918979 AND endIp > 2264918979 LIMIT 1;
avg (1500ms)
Indexed Without LIMIT: SELECT * FROM kombiniert WHERE startIp < 2264918979 AND endIp > 2264918979;
avg (5300ms)
Indexed With LIMIT: SELECT * FROM kombiniert WHERE startIp < 2264918979 AND endIp > 2264918979 LIMIT 1;
avg (5500ms)
Now I want to speed up this query! What should I do?
Thanks so much!
EDIT: I forgot to mention: the fields startIp and endIp are BIGINT!
EDIT2: table creation SQL:
SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
START TRANSACTION;
SET time_zone = "+00:00";
CREATE TABLE `kombiniert` (
  `id` int(11) NOT NULL,
  `startIp` bigint(20) NOT NULL,
  `endIp` bigint(20) NOT NULL,
  `city` text NOT NULL,
  `countryCode` varchar(4) NOT NULL,
  `latitude` float NOT NULL,
  `longitude` float NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

ALTER TABLE `kombiniert`
  ADD PRIMARY KEY (`id`),
  ADD KEY `startIp` (`startIp`),
  ADD KEY `endIp` (`endIp`);

ALTER TABLE `kombiniert`
  MODIFY `id` int(11) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=2747683;
COMMIT;
Searching for IP addresses (or any other metric that is split into buckets) is not efficient, at least not with the obvious query. The best average performance you can get that way is a scan of about one-quarter of the table, which is O(N).
You can get O(1) performance for most lookups, but it takes a restructuring of the table and the query. See http://mysql.rjweb.org/doc.php/ipranges
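The core idea behind that link, sketched very roughly here under the assumption that the ranges in kombiniert are non-overlapping and cover the address space: key the lookup on startIp alone and fetch a single row.
SELECT city, countryCode, latitude, longitude
FROM kombiniert
WHERE startIp <= 16777566      -- the converted IP you are looking up
ORDER BY startIp DESC
LIMIT 1;
-- If the ranges can have gaps, the hit is only valid when the returned row's endIp is still >= 16777566.
With an index on startIp this seeks to one index entry instead of scanning a quarter of the table.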
Related
I'm working with a 3rd-party MySQL database over which I have no control, except that I can read from it. It contains 51 tables with identical column structure but slightly different names; each holds daily summaries for a different data source. Example table:
CREATE TABLE `archive_day_?????` (
`dateTime` int(11) NOT NULL,
`min` double DEFAULT NULL,
`mintime` int(11) DEFAULT NULL,
`max` double DEFAULT NULL,
`maxtime` int(11) DEFAULT NULL,
`sum` double DEFAULT NULL,
`count` int(11) DEFAULT NULL,
`wsum` double DEFAULT NULL,
`sumtime` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
where ????? changes to indicate the type of data held.
The dateTime field is mirrored across all tables being midnight of every day since the system has been running.
I want to produce a single data set across all tables using an inner join on the dateTime. But to avoid writing
SELECT ad1.maxtime as ad1_maxtime, ad2.maxtime as ad2_maxtime...
51 times for 9 fields, is there a way I can bulk-create aliases, e.g.
ad1.* as ad_*, ad2.* as ad_* and so on.
I have looked at Create Aliases In Bulk? but this doesn't seem to work for MySQL. Ultimately the data is being used by a Django ORM.
EDIT: Unfortunately UNION doesn't uniquely identify the fields or group them together, e.g.
SELECT * FROM `archive_day_ET` UNION ALL SELECT * FROM `archive_day_inTemp`
results in columns that cannot be distinguished or grouped together.
To generate a string with all the field names from those tables, you could query information_schema.columns
For example:
SELECT
GROUP_CONCAT(CONCAT(TABLE_NAME,'.`',column_name,'` AS `',column_name,'_',replace(TABLE_NAME,'archive_day_',''),'`') SEPARATOR ',\r\n')
FROM information_schema.columns
WHERE TABLE_NAME like 'archive_day_%'
A test on db<>fiddle here
And to generate the JOINs you could use information_schema.tables
For example:
SELECT CONCAT('FROM (\r\n ',GROUP_CONCAT(CONCAT('SELECT `dateTime` FROM ',TABLE_NAME) SEPARATOR '\r\n UNION\r\n '),'\r\n) AS dt \r\nLEFT JOIN ',
GROUP_CONCAT(CONCAT(TABLE_NAME,' ON ',
TABLE_NAME,'.`dateTime` = dt.`dateTime`') SEPARATOR '\r\nLEFT JOIN ')) as SqlJoins
FROM information_schema.tables
WHERE TABLE_NAME like 'archive_day_%'
A test on db<>fiddle here
For the two example tables, these queries generate:
archive_day_ET.`dateTime` AS `dateTime_ET`,
archive_day_ET.`min` AS `min_ET`,
archive_day_ET.`mintime` AS `mintime_ET`,
archive_day_ET.`max` AS `max_ET`,
archive_day_ET.`maxtime` AS `maxtime_ET`,
archive_day_ET.`sum` AS `sum_ET`,
archive_day_ET.`count` AS `count_ET`,
archive_day_ET.`wsum` AS `wsum_ET`,
archive_day_ET.`sumtime` AS `sumtime_ET`,
archive_day_inTemp.`dateTime` AS `dateTime_inTemp`,
archive_day_inTemp.`min` AS `min_inTemp`,
archive_day_inTemp.`mintime` AS `mintime_inTemp`,
archive_day_inTemp.`max` AS `max_inTemp`,
archive_day_inTemp.`maxtime` AS `maxtime_inTemp`,
archive_day_inTemp.`sum` AS `sum_inTemp`,
archive_day_inTemp.`count` AS `count_inTemp`,
archive_day_inTemp.`wsum` AS `wsum_inTemp`,
archive_day_inTemp.`sumtime` AS `sumtime_inTemp`
And
FROM (
SELECT `dateTime` FROM archive_day_ET
UNION
SELECT `dateTime` FROM archive_day_inTemp
) AS dt
LEFT JOIN archive_day_ET ON archive_day_ET.`dateTime` = dt.`dateTime`
LEFT JOIN archive_day_inTemp ON archive_day_inTemp.`dateTime` = dt.`dateTime`
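Pasting the two generated pieces together (abbreviated to a few columns here, purely as an illustration) gives a final query along these lines:
SELECT
  dt.`dateTime`,
  archive_day_ET.`min` AS `min_ET`,
  archive_day_ET.`max` AS `max_ET`,
  archive_day_inTemp.`min` AS `min_inTemp`,
  archive_day_inTemp.`max` AS `max_inTemp`
FROM (
  SELECT `dateTime` FROM archive_day_ET
  UNION
  SELECT `dateTime` FROM archive_day_inTemp
) AS dt
LEFT JOIN archive_day_ET ON archive_day_ET.`dateTime` = dt.`dateTime`
LEFT JOIN archive_day_inTemp ON archive_day_inTemp.`dateTime` = dt.`dateTime`;
The full column list produced by the first generator simply replaces the abbreviated one.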
I have a table with some data. Many of the rows have the name ICA Supermarket, each with a different sum. If I use the following SQL query, it also shows rows with a sum under 100. The same happens if I change >= '100' to a higher number, for example 200.
SELECT *
FROM transactions
WHERE data_name LIKE '%ica%'
AND REPLACE(data_sum, '-', '') >= '100'
If I change >= to <=, no data is shown at all. Here's what the table looks like:
CREATE TABLE IF NOT EXISTS `transactions` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `data_name` tinytext NOT NULL,
  `data_sum` decimal(10,2) NOT NULL,
  UNIQUE KEY `id` (`id`)
)
Is it because data_sum is a DECIMAL? How can I prevent this from happening? I want to keep using DECIMAL for sums :)
Note: data_sum can also contain negative values.
REPLACE(data_sum, '-', '') returns a string, and '100' is also a string, so a string comparison is used. You should use the ABS() function instead:
SELECT *
FROM transactions
WHERE data_name LIKE '%ica%'
AND ABS(data_sum) >= 100
Are you looking for values >= 100 and <= -100? Or just values <= -100?
If the latter, then
... AND data_sum <= -100
This applies to DECIMAL, INT, FLOAT, etc.
Every table 'needs' a PRIMARY KEY. Promote that UNIQUE to PRIMARY.
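A minimal sketch of that promotion (assuming nothing else depends on the current unique key):
ALTER TABLE `transactions`
  DROP INDEX `id`,
  ADD PRIMARY KEY (`id`);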
I have a table with 15 million records containing names, email addresses and IPs. I need to update another column in the same table with the country code derived from the IP address. I downloaded a small database (IP2Location LITE - https://lite.ip2location.com/) containing all IP ranges and their associated countries. The ip2location table has the following structure:
CREATE TABLE `ip2location_db1` (
`ip_from` int(10) unsigned DEFAULT NULL,
`ip_to` int(10) unsigned DEFAULT NULL,
`country_code` char(2) COLLATE utf8_bin DEFAULT NULL,
`country_name` varchar(64) COLLATE utf8_bin DEFAULT NULL,
KEY `idx_ip_from` (`ip_from`),
KEY `idx_ip_to` (`ip_to`),
KEY `idx_ip_from_to` (`ip_from`,`ip_to`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin
I'm using the following function to retrieve the country code from an IP address:
DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `get_country_code`(
ipAddress varchar(30)
) RETURNS VARCHAR(2)
DETERMINISTIC
BEGIN
DECLARE ipNumber INT UNSIGNED;
DECLARE countryCode varchar(2);
SET ipNumber = SUBSTRING_INDEX(ipAddress, '.', 1) * 16777216;
SET ipNumber = ipNumber + (SUBSTRING_INDEX(SUBSTRING_INDEX(ipAddress, '.', 2 ),'.',-1) * 65536);
SET ipNumber = ipNumber + (SUBSTRING_INDEX(SUBSTRING_INDEX(ipAddress, '.', -2 ),'.',1) * 256);
SET ipNumber = ipNumber + SUBSTRING_INDEX(ipAddress, '.', -1 );
SET countryCode =
(SELECT country_code
FROM ip2location.ip2location_db1
USE INDEX (idx_ip_from_to)
WHERE ipNumber >= ip2location.ip2location_db1.ip_from AND ipNumber <= ip2location.ip2location_db1.ip_to
LIMIT 1);
RETURN countryCode;
END$$
DELIMITER ;
I've run an EXPLAIN statement and this is the output:
id: 1, select_type: SIMPLE, table: ip2location_db1, partitions: NULL, type: range, possible_keys: idx_ip_from_to, key: idx_ip_from_to, key_len: 5, ref: NULL, rows: 1, filtered: 33.33, Extra: Using index condition
My problem is that the query on 1,000 records takes ~15 s to execute, which means running it against the whole table would take more than 2 days to complete. Is there a way to improve this query?
PS - If I remove the USE INDEX (idx_ip_from_to) the query takes twice as long. Can you explain why?
Also I'm not a database expert so bear with me :)
This can be quite tricky. I think the issue is that only the ip_from part of the condition can actually be used by the index. See if this gets the performance you want:
SET countryCode =
(SELECT country_code
FROM ip2location.ip2location_db1 l
WHERE ipNumber >= l.ip_from
ORDER BY ip_from DESC
LIMIT 1
);
I know I'm leaving off the ip_to check. If this works, then you can do the full check in two parts: first get the ip_from using a similar query, then use an equality query on ip_from to get the rest of the information in the row.
The reason USE INDEX helps is that MySQL wasn't planning to use that index; its optimizer chose a different one, and it guessed wrong. Sometimes this happens.
Also, I'm not sure whether this will affect performance much, but you should just use INET_ATON() to convert the IP address string into an integer. You don't need all that SUBSTRING_INDEX business, and it may be slower.
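A minimal sketch of that simplification inside the function body (INET_ATON() is a built-in MySQL function; everything else stays as in the question):
-- replaces the four SUBSTRING_INDEX assignments
SET ipNumber = INET_ATON(ipAddress);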
What I would do here is measure the maximum distance between from and to:
SELECT MAX(ip_to - ip_from) AS distance
FROM ip2location_db1;
Assuming this is not a silly number, you will then be able to use the ip_from index properly. The check becomes:
WHERE ipNumber BETWEEN ip_from AND ip_from + distance
AND ipNumber <= ip_to
The goal here is to make all of the information to find a narrow set of rows come from a limited range of one column's value: ip_from. Then ip_to is just an accuracy check.
The reason you want to do this is that the ip_to value (the second column of the index) can't help narrow the scan until a specific ip_from value has been found, so without an upper bound the scan still has to cover most of the index entries with low ip_from values.
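A sketch of the lookup with that trick applied inside the function (65536 is only a placeholder for the measured MAX(ip_to - ip_from); the condition is written as a closed range directly on the indexed ip_from column, which is the point of the trick):
SET countryCode =
  (SELECT country_code
   FROM ip2location.ip2location_db1
   WHERE ip_from BETWEEN CAST(ipNumber AS SIGNED) - 65536 AND ipNumber
     AND ipNumber <= ip_to   -- accuracy check
   LIMIT 1);
The CAST avoids an unsigned-underflow error for very small ipNumber values.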
Otherwise, you might consider measuring how unique the IP addresses in your 15 million records are. For example, if there are only 5 million distinct IPs, it could be better to extract a distinct list, map those to country codes, and then use that mapping (either at runtime, or to update the original table). It depends.
If the values are largely unique, but potentially clustered, you could try removing irrelevant rows from ip2location_db1, or even horizontal partitioning to improve the range checks. I'm not sure this would win anything, but if you can use some index on the original table to consult only specific partitions, you might gain some performance.
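A rough sketch of the distinct-list approach (the users table and its ip_address / country_code columns are assumed names, since the question doesn't show the 15-million-row table):
-- 1. Collect each distinct IP once.
CREATE TABLE ip_country AS
  SELECT DISTINCT ip_address FROM users;
ALTER TABLE ip_country ADD COLUMN country_code CHAR(2), ADD INDEX (ip_address);
-- 2. Run the expensive lookup once per distinct IP.
UPDATE ip_country SET country_code = get_country_code(ip_address);
-- 3. Copy the result back with a join instead of 15 million function calls.
UPDATE users u
JOIN ip_country ic ON ic.ip_address = u.ip_address
SET u.country_code = ic.country_code;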
I have a table named messages where users of my local hub store their messages (kind of like a web forum). Currently, a majority of users are participating and I get roughly 30 to 50 new entries in the table every day.
Since this has been going on for the past few years, we now have nearly 100,000 rows of data in the table. The table structure is roughly as shown below, where fid is the PRIMARY KEY and ip and id (nickname) are just indexed.
Up until now I was using the query below, iterating over the result set in LuaSQL as shown in this link. As far as I can tell, this consumes a lot of time and space (in buffers).
`msg` VARCHAR(280) NOT NULL,
`id` VARCHAR(30) NOT NULL,
`ctg` VARCHAR(10) NOT NULL,
`date` DATE NOT NULL COMMENT 'date_format( %m/%d/%y )',
`time` TIME NOT NULL COMMENT 'date_format( %H:%i:%s )',
`fid` BIGINT(20) NOT NULL AUTO_INCREMENT,
`ip` CHAR(39) NOT NULL DEFAULT '127.0.0.1'
My problem is that nowadays we've switched to the new API of PtokaX and the number of read and write requests has increased dramatically. Since I recently read about MySQL stored procedures, I was wondering whether procedures are a faster or safer way of dealing with this situation.
SELECT *
FROM ( SELECT *
FROM `messages`
ORDER BY `fid` DESC
LIMIT 50 ) AS `temp`
ORDER BY `fid` ASC;
P.S.
We get around one request to read one message every 7 to 10 seconds on average. On weekends, it rises to around one every 3 seconds.
Please let me know if anything more is required.
TO SUM UP
Is there a way I can call a stored procedure and get the final result in less time? The current query (and method) takes nearly 3 seconds to fetch and organize the data.
A few things regarding your query:
SELECT *
FROM ( SELECT *
FROM `messages`
ORDER BY `fid` DESC
LIMIT 50 ) AS `temp`
ORDER BY `fid` ASC;
Never SELECT * (all columns); always specify a column list (only what you need)
Subqueries typically cost more (for sorting and storage)
If you are trying to fetch the bottom 50 rows, try using a BETWEEN clause instead (see the sketch after the suggested query below)
You can always see what your query is doing by using EXPLAIN. I would try the following query:
SELECT `msg`, `id`, `ctg`, `date`, `time`, `fid`, `ip` FROM `messages`
WHERE `fid` > (SELECT MAX(`fid`)-50 FROM `messages`)
ORDER BY `fid`
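For reference, the BETWEEN variant mentioned above would look roughly like this (a sketch; with AUTO_INCREMENT gaps it can return fewer than 50 rows):
SELECT `msg`, `id`, `ctg`, `date`, `time`, `fid`, `ip`
FROM `messages`
WHERE `fid` BETWEEN (SELECT MAX(`fid`) - 49 FROM `messages`)
                AND (SELECT MAX(`fid`) FROM `messages`)
ORDER BY `fid`;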
I have one MySQL table:
CREATE TABLE IF NOT EXISTS `test` (
`Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`SenderId` int(10) unsigned NOT NULL,
`ReceiverId` int(10) unsigned NOT NULL,
`DateSent` datetime NOT NULL,
`Notified` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`Id`),
KEY `ReceiverId_SenderId` (`ReceiverId`,`SenderId`),
KEY `SenderId` (`SenderId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
The table is populated with 10,000 random rows for testing using the following procedure:
DELIMITER //
CREATE DEFINER=`root`@`localhost` PROCEDURE `FillTest`(IN `cnt` INT)
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE intSenderId INT;
DECLARE intReceiverId INT;
DECLARE dtDateSent DATE;
DECLARE blnNotified INT;
WHILE (i<=cnt) DO
SET intSenderId = FLOOR(1 + (RAND() * 50));
SET intReceiverId = FLOOR(51 + (RAND() * 50));
SET dtDateSent = str_to_date(concat(floor(1 + rand() * (12-1)),'-',floor(1 + rand() * (28 -1)),'-','2008'),'%m-%d-%Y');
SET blnNotified = FLOOR(1 + (RAND() * 2))-1;
INSERT INTO test (SenderId, ReceiverId, DateSent, Notified)
VALUES(intSenderId,intReceiverId,dtDateSent, blnNotified);
SET i=i+1;
END WHILE;
END//
DELIMITER ;
CALL `FillTest`(10000);
The problem:
I need to write a query that groups by SenderId, ReceiverId and returns the highest Id of each group, limited to the first 100 such rows ordered by Id in ascending order.
I played with GROUP BY, ORDER BY and MAX(Id), but the query was too slow, so I came up with this one:
SELECT SQL_NO_CACHE t1.*
FROM test t1
LEFT JOIN test t2 ON (t1.ReceiverId = t2.ReceiverId AND t1.SenderId = t2.SenderId AND t1.Id < t2.Id)
WHERE t2.Id IS NULL
ORDER BY t1.Id ASC
LIMIT 100;
The above query returns the correct data, but it becomes too slow when the test table has more than 150,000 rows. With 150,000 rows the query needs 7 seconds to complete. I expect the test table to grow to between 500,000 and 1M rows, and the query needs to return the correct data in less than 3 seconds. If that's not possible, then I need it to fetch the data using the fastest query possible.
So, how can the above query be optimized so that it runs faster?
Reasons why this query may be slow:
It's a lot of data, and a lot of it may be returned: the query returns the last record for each SenderId/ReceiverId combination.
The distribution of the data matters (many Sender/Receiver combinations, or relatively few of them but each with multiple 'versions').
The whole result set must be sorted by MySQL, because you need the first 100 records sorted by Id.
These make it hard to optimize this query without restructuring the data. A few suggestions to try:
- You could try using NOT EXISTS, although I doubt it would help.
SELECT SQL_NO_CACHE t1.*
FROM test t1
WHERE NOT EXISTS
(SELECT 'x'
FROM test t2
WHERE t1.ReceiverId = t2.ReceiverId AND t1.SenderId = t2.SenderId AND t1.Id < t2.Id)
ORDER BY t1.Id ASC
LIMIT 100;
- You could try adding proper indexes on ReceiverId, SenderId and Id. Experiment with creating a combined index on the three columns; try two versions, one with Id as the first column and one with Id as the last (see the sketch after this list).
With slight database modifications:
- You could store each SenderId/ReceiverId combination in a separate table with a LastId pointing to the record you want.
- You could store a 'PreviousId' with each record, keeping it NULL for the last record per Sender/Receiver. Then you only need to query the records where PreviousId is NULL.
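A minimal sketch of the two index experiments suggested above (index names are arbitrary):
ALTER TABLE test ADD INDEX idx_id_first (Id, SenderId, ReceiverId);
ALTER TABLE test ADD INDEX idx_id_last (SenderId, ReceiverId, Id);
-- Compare the plans with EXPLAIN on your real data volume and drop whichever index the optimizer doesn't use.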