Greater than and less than MySQL query failing with decimal field - mysql

I have a database which has 3 columns:
user_id | lat | lon
1 | -1.403976 | 53.428692
2 | -1.353276 | 55.224692
etc etc
Both lat and lon are set as decimal fields. I'm running a query similar to this but it isn't filtering based on being greater than and less than the given lat/lon numbers:
SELECT * FROM `table` WHERE `lat` < '-1.399999' AND 'lat' > '-1.300000'
AND 'lon' < '55.555555' AND > '53.000000'
This query just returns every row in the table and I don't know why. Is it something to do with the fields being set as decimals?
I hope someone can help - I know it's probably a simple answer if you know it.
As per comment - here's the create table:
CREATE TABLE IF NOT EXISTS `members` (
`user_id` int(10) NOT NULL AUTO_INCREMENT,
`lat` decimal(8,6) NOT NULL,
`lon` decimal(8,6) NOT NULL,
UNIQUE KEY `user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=136 ;

The problem is that you are wrapping column names in single quotes, forcing the decimal values to be compared against string literals. Column names, like table names, are identifiers, not string literals, so they should not be wrapped in single quotes:
AND `lat` > '-1.300000'
AND `lon` BETWEEN '53.000000' AND '55.555555' -- use BETWEEN here (low value first)

As @JW suggests, you're mixing up backticks (`) and single quotes ('). Use backticks around database table and column names, but quotes around data values.
Also, the second half of your query doesn't match the first half. You have
`lat` < '-1.399999' AND 'lat' > '-1.300000'
'lon' < '55.555555' AND > '53.000000'
But it should be
`lat` < '-1.399999' AND `lat` > '-1.300000'
`lon` < '55.555555' AND `lon` > '53.000000'
So you're missing the column name in the fourth condition there.
Or, as @JW says, use BETWEEN, which makes it easier to read too!
SELECT * FROM `table` WHERE `lat` BETWEEN '-1.399999' AND '-1.300000' AND `lon` BETWEEN '53.000000' AND '55.555555';
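For completeness, here is the same filter written against the `members` table from the question, using unquoted numeric literals so no string-to-decimal conversion is involved at all (the bounds are just the question's values):
SELECT *
FROM `members`
WHERE `lat` BETWEEN -1.399999 AND -1.300000
AND `lon` BETWEEN 53.000000 AND 55.555555;
MySQL will coerce the quoted numbers for the comparison anyway, so both forms behave the same once the identifiers are quoted correctly.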

Example:
Query the sum of Northern Latitudes (LAT_N) from STATION having values greater than 38.7880 and less than 137.2345. Truncate your answer to 4 decimal places.
SELECT TRUNCATE(SUM(LAT_N),4)
FROM STATION
WHERE LAT_N > 38.7880 and LAT_N < 137.2345;

The question is pretty straightforward, and your query needs to use the attribute LAT_N.
The lowest and highest values for LAT_N are given in the question. The BETWEEN operator will get you the values between them (note that BETWEEN is inclusive of both endpoints), and the ROUND function will round the result to the desired number of decimals (4 in this case).
My query will be as follows:
SELECT ROUND(SUM(LAT_N),4)
FROM STATION
WHERE LAT_N BETWEEN 38.7880 AND 137.2345;
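One thing to keep in mind: the prompt asks you to truncate, and ROUND and TRUNCATE can disagree in the last decimal place. A quick check with an arbitrary sample value:
SELECT ROUND(38.78886, 4), TRUNCATE(38.78886, 4);
-- 38.7889, 38.7888
If the grader expects a truncated value, prefer TRUNCATE as in the first query above.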

Related

MySQL compare query performance

I have a table in MySQL like this:
startIp | endIp | city | countryCode | latitude | longitude
16777216 | 16777471 | Los Angeles | US | 34.0522 | -118.244
16777472 | 16778239 | Fuzhou | CN | 26.0614 | 119.306
16778240 | 16779263 | Melbourne | AU | -37.814 | 144.963
and 2.7 million more entries.
Now I have a converted IP address like 16777566. This should return "Fuzhou, CN, 26.0614, 119.306".
Right now I use this query:
SELECT * FROM kombiniert WHERE startIp < 16777566 AND endIp > 16777566
It works really well but it's too slow. Performance:
Without LIMIT: SELECT * FROM kombiniert WHERE startIp < 2264918979 AND endIp > 2264918979;
avg (2300ms)
With LIMIT: SELECT * FROM kombiniert WHERE startIp < 2264918979 AND endIp > 2264918979 LIMIT 1;
avg (1500ms)
Indexed Without LIMIT: SELECT * FROM kombiniert WHERE startIp < 2264918979 AND endIp > 2264918979;
avg (5300ms)
Indexed With LIMIT: SELECT * FROM kombiniert WHERE startIp < 2264918979 AND endIp > 2264918979 LIMIT 1;
avg (5500ms)
Now I want to speed up this query! What should I do?
Thanks so much!
EDIT: I forgot to mention: the fields startIp and endIp are bigint!
EDIT2: table creation SQL:
SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
START TRANSACTION;
SET time_zone = "+00:00";
CREATE TABLE `kombiniert` (
`id` int(11) NOT NULL,
`startIp` bigint(20) NOT NULL,
`endIp` bigint(20) NOT NULL,
`city` text NOT NULL,
`countryCode` varchar(4) NOT NULL,
`latitude` float NOT NULL,
`longitude` float NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

ALTER TABLE `kombiniert`
ADD PRIMARY KEY (`id`),
ADD KEY `startIp` (`startIp`),
ADD KEY `endIp` (`endIp`);

ALTER TABLE `kombiniert`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=2747683;
COMMIT;
Searching for IP addresses (or any other metric that is split into buckets) is not efficient. Or at least not efficient with the obvious code. The best average performance you can get is scanning one-quarter of the table for what you are looking for. That is "Order(N)".
You can get "Order(1)" performance for most operations, but it takes a restructuring of the table and the query. See http://mysql.rjweb.org/doc.php/ipranges

If using sum>100, sums under 100 will still show

I have a table with some data. Many of the rows have the name ICA Supermarket, each with a different sum. If I use the following SQL query, it also shows rows with a sum under 100. This also applies if I change >= '100' to a higher number, for example 200.
SELECT *
FROM transactions
WHERE data_name LIKE '%ica%'
AND REPLACE(data_sum, '-', '') >= '100'
If I change >= to <= no data will show at all. Here's what the table looks like:
CREATE TABLE IF NOT EXISTS `transactions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`data_name` tinytext NOT NULL,
`data_sum` decimal(10,2) NOT NULL,
UNIQUE KEY `id` (`id`)
)
Is it because data_sum is a DECIMAL? How can I prevent this from happening? I want to use DECIMAL for sums :)
Note: data_sum will also contain sums that are negative.
REPLACE(data_sum, '-', '') returns a string, and '100' is also a string, so a string comparison will be used. You should use the ABS function instead:
SELECT *
FROM transactions
WHERE data_name LIKE '%ica%'
AND ABS(data_sum) >= 100
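To see why the original filter matched rows under 100: both sides were strings, and string comparison works character by character from the left. For example:
SELECT '50.00' >= '100';  -- 1 (true): '5' sorts after '1'
SELECT 50.00 >= 100;      -- 0 (false): numeric comparison
So REPLACE(data_sum, '-', '') turned every sum into a string, and almost everything sorts after '1'. ABS keeps the comparison numeric.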
Are you looking for values >= 100 and <= -100? Or just values <= -100?
If the latter, then
... AND data_sum <= -100
This applies to DECIMAL, INT, FLOAT, etc.
Every table 'needs' a PRIMARY KEY. Promote that UNIQUE to PRIMARY.

MySQL CSV column check for exclude

I need to find the records that don't have a specific value in a CSV column. Below is the table structure:
CREATE TABLE `employee` (
`id` int NOT NULL AUTO_INCREMENT,
`first_name` varchar(100) NOT NULL,
`last_name` varchar(100) NOT NULL,
`keywords` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Sample record1: 100, Sam, Thompson, "50,51,52,53"
Sample record2: 100, Wan, Thompson, "50,52,53"
Sample record3: 100, Kan, Thompson, "53,52,50"
50 = sports
51 = cricket
52 = soccer
53 = baseball
I need to find the names of the employees who have the tags "sports,soccer,baseball" but not cricket,
so the result should return only the 2nd and 3rd records in this example, as they don't have 51 (cricket) but do have the other three, though in a different order.
My query is below, but I couldn't get it to work:
SELECT t.first_name FROM `User` `t` WHERE (keywords LIKE '50,52,53') LIMIT 10
Is there anything like an "unlike" option? I am confused about how to get this to work.
You could use FIND_IN_SET:
SELECT t.first_name
FROM `User` `t`
WHERE FIND_IN_SET('50', `keywords`) > 0
AND FIND_IN_SET('52', `keywords`) > 0
AND FIND_IN_SET('53', `keywords`) > 0
AND FIND_IN_SET('51', `keywords`) = 0;
Keep in mind it could be slow. The correct way is to normalize your table structure.
FIND_IN_SET will do the job for you, but it does not use indexes. This is not a bug, it's a feature.
SUBSTRING_INDEX can use an index and return the data as you wish. You don't have an index on the column at the moment, but the catch here is that TEXT fields cannot be fully indexed, and what you have is a TEXT field.
Normalize!
This is what you really should be doing. It's not a good idea to store comma-separated values in a database. You should have a separate keywords table, and since the keywords will be short, you can use a narrow char or varchar column which can be fully indexed.
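A minimal sketch of that normalization, with a hypothetical junction table (employee_keyword is not in your schema; adjust the names to taste):
CREATE TABLE `employee_keyword` (
`employee_id` int NOT NULL,
`keyword_id` int NOT NULL,
PRIMARY KEY (`employee_id`, `keyword_id`)
) ENGINE=InnoDB;

SELECT e.first_name
FROM `employee` e
WHERE EXISTS (SELECT 1 FROM `employee_keyword` k WHERE k.employee_id = e.id AND k.keyword_id = 50)
AND EXISTS (SELECT 1 FROM `employee_keyword` k WHERE k.employee_id = e.id AND k.keyword_id = 52)
AND EXISTS (SELECT 1 FROM `employee_keyword` k WHERE k.employee_id = e.id AND k.keyword_id = 53)
AND NOT EXISTS (SELECT 1 FROM `employee_keyword` k WHERE k.employee_id = e.id AND k.keyword_id = 51);
Each EXISTS is a primary-key lookup, so this stays fast as the table grows, unlike FIND_IN_SET over a TEXT column.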

How to make a text column's default comparison binary (case sensitive and no trimming)

Sorry if this is a duplicate, but I didn't know how to search for this question.
Hi, this is my table:
CREATE TABLE `log_Valor` (
`idLog_Valor` int(11) NOT NULL AUTO_INCREMENT,
`Valor` text binary NOT NULL,
PRIMARY KEY (`idLog_Valor`)
)
ENGINE=InnoDB;
INSERT INTO `log_Valor` (Valor) VALUES ('teste');
INSERT INTO `log_Valor` (Valor) VALUES ('teste ');
I have 2 rows:
1 | 'teste'
2 | 'teste '
When I run:
SELECT * FROM log_Valor where valor = 'teste'
It returns the two rows.
How do I make the default comparison case sensitive and not ignore trailing spaces, without having to specify BINARY in the query?
Use LIKE instead of =.
SELECT * FROM log_Valor WHERE valor LIKE 'teste';
From the documentation:
In particular, trailing spaces are significant, which is not true for CHAR or VARCHAR comparisons performed with the = operator
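With the two rows from the question, the expected behaviour is (results assumed from the documented trailing-space rule):
SELECT * FROM log_Valor WHERE Valor = 'teste';    -- returns rows 1 and 2
SELECT * FROM log_Valor WHERE Valor LIKE 'teste'; -- returns only row 1
Note that LIKE treats % and _ as wildcards, so escape them if your values can contain those characters.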

How to optimize this range query

I have a table with 15 million records containing names, email addresses and IPs. I need to update another column in the same table with the country code derived from the IP address. I downloaded a small database (ip2location lite - https://lite.ip2location.com/) containing all the IP ranges and associated countries. The ip2location table has the following structure:
CREATE TABLE `ip2location_db1` (
`ip_from` int(10) unsigned DEFAULT NULL,
`ip_to` int(10) unsigned DEFAULT NULL,
`country_code` char(2) COLLATE utf8_bin DEFAULT NULL,
`country_name` varchar(64) COLLATE utf8_bin DEFAULT NULL,
KEY `idx_ip_from` (`ip_from`),
KEY `idx_ip_to` (`ip_to`),
KEY `idx_ip_from_to` (`ip_from`,`ip_to`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin
I'm using the following function to retrieve the country code from an IP address:
DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `get_country_code`(
ipAddress varchar(30)
) RETURNS VARCHAR(2)
DETERMINISTIC
BEGIN
DECLARE ipNumber INT UNSIGNED;
DECLARE countryCode varchar(2);
SET ipNumber = SUBSTRING_INDEX(ipAddress, '.', 1) * 16777216;
SET ipNumber = ipNumber + (SUBSTRING_INDEX(SUBSTRING_INDEX(ipAddress, '.', 2 ),'.',-1) * 65536);
SET ipNumber = ipNumber + (SUBSTRING_INDEX(SUBSTRING_INDEX(ipAddress, '.', -2 ),'.',1) * 256);
SET ipNumber = ipNumber + SUBSTRING_INDEX(ipAddress, '.', -1 );
SET countryCode =
(SELECT country_code
FROM ip2location.ip2location_db1
USE INDEX (idx_ip_from_to)
WHERE ipNumber >= ip2location.ip2location_db1.ip_from AND ipNumber <= ip2location.ip2location_db1.ip_to
LIMIT 1);
RETURN countryCode;
END$$
DELIMITER ;
I've run an EXPLAIN statement and this is the output:
'1', 'SIMPLE', 'ip2location_db1', NULL, 'range', 'idx_ip_from_to', 'idx_ip_from_to', '5', NULL, '1', '33.33', 'Using index condition'
My problem is that the query on 1000 records takes ~15s to execute, which means running it on the whole database would take more than 2 days to complete. Is there a way to improve this query?
PS - If I remove the USE INDEX (idx_ip_from_to) the query takes twice as long. Can you explain why?
Also I'm not a database expert so bear with me :)
This can be quite tricky. I think the issue is that only the ip_from part of the condition can be used. See if this gets the performance you want:
SET countryCode =
(SELECT country_code
FROM ip2location.ip2location_db1 l
WHERE ipNumber >= l.ip_from
ORDER BY ip_from DESC
LIMIT 1
);
I know I'm leaving off the ip_to. If this works, then you can do the full check in two parts. First get the ip_from using a similar query. Then use an equality query to get the rest of the information in the row.
The reason USE INDEX helps is because MySQL wasn't planning to use that index. Its optimizer chose a different one, but it guessed wrong. Sometimes this happens.
Also, I'm not sure if this will affect performance a ton, but you should just use INET_ATON to change the IP address string into an integer. You don't need that SUBSTRING_INDEX business, and it may be slower.
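For example, the four SET statements in the function body collapse to one (INET_ATON is a built-in; the sample address is arbitrary):
SET ipNumber = INET_ATON(ipAddress);
-- e.g. SELECT INET_ATON('192.0.2.1');  -- 3221225985
INET_ATON also returns NULL for a malformed address, which is easier to handle than the manual arithmetic silently producing garbage.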
What I would do here is measure the maximum distance between from and to:
SELECT MAX(ip_to - ip_from) AS distance
FROM ip2location_db1;
Assuming this is not a silly number, you will then be able to use the ip_from index properly. The check becomes:
WHERE ipNumber BETWEEN ip_from AND ip_from + distance
AND ipNumber <= ip_to
The goal here is to make all of the information to find a narrow set of rows come from a limited range of one column's value: ip_from. Then ip_to is just an accuracy check.
The reason you want to do this is because the ip_to value (second part of the index) can't be used until the corresponding ip_from value is found. So it still has to scan most of the index records for low values of ip_from without an upper bound.
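Putting those pieces together as a standalone lookup (a sketch; @distance and @ip are session variables introduced here for illustration):
SET @distance = (SELECT MAX(ip_to - ip_from) FROM ip2location.ip2location_db1);
SET @ip = INET_ATON('192.0.2.1');

SELECT country_code
FROM ip2location.ip2location_db1
WHERE @ip BETWEEN ip_from AND ip_from + @distance
AND @ip <= ip_to
LIMIT 1;
Now the range on ip_from is bounded on both sides, so the index only has to examine rows within @distance of the target instead of everything below it.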
Otherwise, you might consider measuring how unique the IP addresses are in your 15 million records. For example, if there are only 5 million unique IPs, it could be better to extract a unique list, map those to country codes, and then use that mapping (either at runtime, or to update the original table.) Depends.
If the values are very unique, but potentially in localized clusters, you could try removing the irrelevant rows from ip2location_db1, or even horizontal partitioning to improve the range checks. I'm not sure this would win anything, but if you can use some index on the original table to consult specific partitions only, you might be able to win some performance.