SELECT name WHERE point in square - MySQL

I know that this is more of a math problem, but I'm not certain how to perform math correctly in a query anyway. What I'm trying to do is get a column from a database where a point (x, y) is inside the region saved in another column of that row, spanning (x, y) to (x + 16, y + 16).
The database looks something like:
+-------+----------+
| Name  | Position |
+-------+----------+
| Area1 | 16:32    |
| Area2 | -32:16   |
| Area3 | 128:64   |
+-------+----------+
An area is the saved coordinates (X:Y) plus 16 in each direction. It's basically a grid of 16x16 areas.
I'm trying to get the name of the area by the position of a point (x, y), which can be anywhere in that area.
If it would make things easier, it is possible to split the "Position" column into 2 different ones, something like:
+-------+------------+------------+
| Name  | Position_x | Position_Y |
+-------+------------+------------+
| Area1 | 16         | 32         |
| Area2 | -32        | 16         |
+-------+------------+------------+
Thanks in advance!

If I understand you correctly, you just need to check that the x value is between Position_x and Position_x+15 (inclusive) and the same for y:
SELECT *
FROM areas
WHERE 20 BETWEEN Position_x AND Position_x + 15
AND 40 BETWEEN Position_y AND Position_y + 15
Output (for your sample data):
Name   Position_x  Position_Y
Area1  16          32
Demo on dbfiddle
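If you keep the single X:Y Position column instead, here is a sketch of the same check (my own adaptation, not part of the answer above): split the string with SUBSTRING_INDEX and cast the parts to numbers so that values like -32 compare correctly:
SELECT Name
FROM areas
WHERE 20 BETWEEN CAST(SUBSTRING_INDEX(`Position`, ':', 1) AS SIGNED)
             AND CAST(SUBSTRING_INDEX(`Position`, ':', 1) AS SIGNED) + 15
  AND 40 BETWEEN CAST(SUBSTRING_INDEX(`Position`, ':', -1) AS SIGNED)
             AND CAST(SUBSTRING_INDEX(`Position`, ':', -1) AS SIGNED) + 15;
Note that this cannot use an index on Position, so the two-column layout is the better choice if the table grows.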

Related

use st_within to get the locations in a radius/circle

I have a table that looks like this
| id | name    | latitude          | langitude          | costLat          | sinLat            | cosLng            | sinLng           |
| 1  | place 1 | 2.942743912327621 | 101.79377630352974 | 0.99868133582304 | 0.051337992546461 | -0.20438972214917 | 0.97888959616485 |
Referring to this article, it seems like a good idea to use st_within in order to search for locations within a 5 km radius of a given latitude and longitude in my table above. But I have absolutely no idea how to do that.
The table is MyISAM, MySQL version 5.6
Sorry for not being clear on what I tried. The documentation mentions that
ST_Within(g1,g2)
Returns 1 or 0 to indicate whether g1 is spatially within g2.
So my understanding is that we need to pass 2 params to ST_Within. Sounds simple enough, but when I looked at the sample query in the linked article, it does this (note: I changed shape to CIRCLE in the query, as my assumption is that my shape is a CIRCLE because I'm searching by radius):
set @lat = 37.615223;
set @lon = -122.389979;
set @dist = 10;
set @rlon1 = @lon - @dist / abs(cos(radians(@lat)) * 69);
set @rlon2 = @lon + @dist / abs(cos(radians(@lat)) * 69);
set @rlat1 = @lat - (@dist / 69);
set @rlat2 = @lat + (@dist / 69);
SELECT ASTEXT("CIRCLE"), NAME FROM location_final
WHERE st_within("CIRCLE", ENVELOPE(LINESTRING(POINT(@rlon1, @rlat1), POINT(@rlon2, @rlat2))))
ORDER BY st_distance(POINT(@lon, @lat), "CIRCLE") LIMIT 10;
So looking at the query above, my confusion is: where does the comparison against the latitude and langitude columns happen? Where in the query should I mention my latitude and langitude columns?
Looking at the output at the given link, it displays something like
+--------------------------------+----------------------+
| astext(shape)                  | name                 |
+--------------------------------+----------------------+
| POINT(-122.3890954 37.6145378) | Tram stop:Terminal A |
| POINT(-122.3899 37.6165902)    | Tram stop:Terminal G |
Where does the POINT come from?
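For what it's worth, here is my own hedged sketch, not part of the original question: the POINT(...) values in that output come from a geometry column, called shape in the linked article, which stores each location as a POINT. Since the table here only has plain latitude and langitude columns, the same bounding-box pre-filter could be built directly from them, something like:
set @lat = 2.942743912327621;
set @lon = 101.79377630352974;
set @dist = 5 / 1.609344; -- the 69 below is miles per degree, so convert 5 km to miles
set @rlon1 = @lon - @dist / abs(cos(radians(@lat)) * 69);
set @rlon2 = @lon + @dist / abs(cos(radians(@lat)) * 69);
set @rlat1 = @lat - (@dist / 69);
set @rlat2 = @lat + (@dist / 69);
SELECT name
FROM location_final
WHERE ST_Within(POINT(langitude, latitude),
                ENVELOPE(LINESTRING(POINT(@rlon1, @rlat1), POINT(@rlon2, @rlat2))))
ORDER BY ST_Distance(POINT(@lon, @lat), POINT(langitude, latitude))
LIMIT 10;
This is only the rectangular pre-filter, not an exact 5 km circle, and building the POINT per row cannot use a spatial index; for that you would add an indexed POINT column (like the article's shape) filled from langitude and latitude.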

How to use substr(...) for BIT(...) data type columns?

I have this table:
// numbers
+---------+------------+
| id      | numb       |
+---------+------------+
| int(11) | bit(10)    |
+---------+------------+
| 1       | 1001100111 |
| 2       | 0111000101 |
| 3       | 0001101010 |
| 4       | 1111111011 |
+---------+------------+
Now I'm trying to get the third digit (left to right) from those numbers. Example:
1001100111
^ // I want to get 0
So this is the expected result:
+--------------------+
| substr(numb, 3, 1) |
+--------------------+
| 0                  |
| 1                  |
| 0                  |
| 1                  |
+--------------------+
Here is my query:
SELECT SUBSTR(numb, 3, 1) FROM numbers
But it doesn't work, because bit(10) isn't a string and SUBSTR() cannot parse it. Is there any workaround?
You could convert the BIT value to its string representation (in MySQL, BIN() plus LPAD() gives the zero-padded bit string) and then use SUBSTR on it:
SELECT SUBSTR(LPAD(BIN(numb), 10, '0'), 3, 1)
FROM numbers
Or using LEFT and RIGHT:
SELECT LEFT(RIGHT(LPAD(BIN(numb), 10, '0'), 8), 1)
FROM numbers
Although you could use SUBSTR after converting to a string, a simpler approach for the BIT(...) data type is to use bit operators.
Since, according to your comment, it is OK to extract the 8th bit from the right rather than the third bit from the left, this will produce the expected result:
select id, (x>>7)&1
from test
Demo.
Is it possible to update just one of its digits? I mean, I want to update the seventh digit (right to left) of 1001011101 and make it 0.
You can set a single bit to zero like this:
UPDATE test SET x = x & b'1110111111' WHERE id=3
Position of 0 indicates the bit you are setting to zero.
If you want to set it to 1, use
UPDATE test SET x = x | b'0001000000' WHERE id=3
You can have more than one zero in the first example if you would like to set multiple bits to zero. Similarly, you can have more than one 1 in the second example if you need to set multiple bits to 1.
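If you prefer not to spell the mask out by hand, here is a sketch of the same updates with the bit position as a number (my own generalization of the masks above; @n is just an illustrative session variable, counting from the right):
SET @n = 7;
-- clear bit @n (same effect as the b'1110111111' mask above)
UPDATE test SET x = x & ~(1 << (@n - 1)) WHERE id = 3;
-- set bit @n to 1 (same effect as the b'0001000000' mask above)
UPDATE test SET x = x | (1 << (@n - 1)) WHERE id = 3;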
If you have a bit column, then use bit operations.
These are documented here.
One method is:
select ( (numb & b'0010000000') > 0)
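Putting the two bit-operator answers together, here is a sketch against the numbers table from the question (the 10 is the declared width of the BIT column, so 10 - 3 shifts the third digit from the left down to the lowest bit):
SELECT id, (numb >> (10 - 3)) & 1 AS third_digit
FROM numbers;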

Query XY array pair for y value at arbitrary x in SQL

I'd like to make a database of products. Each product has characteristics described as an array of x values and corresponding y values.
And I'd like to query products for certain characteristics.
Example product data:
ProductA_x = [10, 20, 30, 40, 50]
ProductA_y = [2, 10, 30, 43, 49]
ProductB_x = [11, 22, 33, 44, 55, 66]
ProductB_y = [13, 20, 42, 35, 28, 21]
Now I'd like to get a list of products where y < 35 at x = 31.
In the example data case, I should get ProductA.
If I use MySQL, what would be a good way to define table(s) to achieve this query at the SQL level?
Would it become easier if I could use PostgreSQL (using the array or JSON type)?
One way I was advised was to make a table that specifies an xy pair per x range: the first row covers x[0] to x[1], the next row covers x[1] to x[2], and so on. Something like this:
| ProductID | x1 | x2 | y1 | y2 |
| --------- | -- | -- | -- | -- |
| 1         | 10 | 20 | 2  | 10 |
| 1         | 20 | 30 | 10 | 30 |
| 1         | 30 | 40 | 30 | 43 |
| 1         | 40 | 50 | 43 | 49 |
| 2         | 11 | 22 | 13 | 20 |
| 2         | 22 | 33 | 20 | 42 |
| 2         | 33 | 44 | 42 | 35 |
| 2         | 44 | 55 | 35 | 28 |
| 2         | 55 | 66 | 28 | 21 |
Then I could query for (x1 <= 31 AND 31 < x2) AND (y1 < 35 OR y2 < 35).
This solution is not too bad but I wonder if there is cleverer approach.
Please note that the x array is guaranteed to be increasing, but different products would have different starting x values, step sizes and numbers of points. Also, the x value being searched for may not exist as an exact value in the x array.
The length of the real x and y arrays would be about 2,000. I expect I'd have about 10,000 products.
It would be best if the corresponding y value could be interpolated, but searching for the y value at the nearest x value is acceptable.
Since every x corresponds to exactly one y, the sane table definition on a classic relational database would be:
CREATE TABLE product (id serial not null unique, sku text primary key, ....);
CREATE TABLE product_xy (product_id int not null references product(id),
                         x int not null,
                         y int not null,
                         primary key(product_id, x));
That would make your query manageable in all cases.
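As a sketch of how the "y < 35 at x = 31" lookup from the question could then be written (my own illustration, using the nearest stored x per product, which the question said is acceptable):
SELECT p.id, p.sku
FROM product p
JOIN product_xy xy
  ON xy.product_id = p.id
WHERE xy.x = (SELECT x
              FROM product_xy
              WHERE product_id = p.id
              ORDER BY ABS(x - 31)
              LIMIT 1)
  AND xy.y < 35;
With the sample data this returns only ProductA: its nearest x to 31 is 30 with y = 30, while ProductB's nearest x is 33 with y = 42. Interpolation could be layered on top by joining the two rows that bracket x = 31 instead.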
On PostgreSQL 9.3 you could use a LATERAL subquery to effectively use arrays but I don't think it would be easier than just going with a relational design to start with. The only case where you would want to store the info in an array in PostgreSQL is if ordinality mattered on the x array. Then the design becomes slightly more complex because the following array combinations are not semantically the same:
array[1, 2, 3] x
array[4, 5, 6] y
and
array[2, 1, 3] x
array[5, 4, 6] y
If those need to be distinct, then go with an array-based solution in PostgreSQL (note that in both cases the same x value corresponds to the same y value, but the ordering of the pairs differs); otherwise go with a standard relational design. If you do have to go with arrays, then your better option is a 2-dimensional xy array, something like:
array[
  array[1, 2, 3],
  array[4, 5, 6]
] xy
You could then have functions which could process these pairs on the array as a whole, but the point is that in this case the xy represents a single atomic value in a specific domain, where ordinality matters in both dimensions and therefore the value can be processed at once. In other words, if ordinality matters on both dimensions, then you have a single value in your domain and so this does not violate first normal form. If ordinality along either dimension does not matter, then it does violate first normal form.

Is it possible to automatically have in a row a subtraction between two MySQL columns

I would like to have something like this :
+----------+------+-----+--------+
| image_id | good | bad | result |
+----------+------+-----+--------+
| 1        | 10   | 2   | x      |
| 2        | 4    | 1   | y      |
+----------+------+-----+--------+
Where x and y are calculated automatically to be 10 - 2 and 4 - 1 respectively (good - bad), avoiding negative numbers if possible.
I would like this value to change whenever the values (good or bad) change as well.
I can do this in PHP, but is there a way to do this directly with MySQL?
Calculate the result and return no less than zero, so avoiding negative numbers:
SELECT image_id, good, bad, GREATEST(good-bad, 0) AS result from `table`;
use this query:
select image_id, good, bad, GREATEST(good-bad, 0) as 'result' from tbl
This will calculate the difference for each row and return the result (or 0 if the difference is negative) in another column named result.
As a general rule, try to avoid storing in columns the results of calculations based entirely on other columns of the same table, especially if the calculations are as trivial as a simple difference.
You can simply write:
select image_id, good, bad, (good-bad) as result from mytable
What you could do is have this schema:
CREATE TABLE tbl (image_id INTEGER PRIMARY KEY, good INTEGER, bad INTEGER);
CREATE VIEW tbl_result AS SELECT image_id, good, bad, CAST(good AS SIGNED) - bad AS result FROM tbl;
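On MySQL 5.7.6 or later, a generated column is another way to get the same automatic behaviour without a view; here is a sketch, reusing the tbl layout above and the GREATEST trick from the other answers:
ALTER TABLE tbl
  ADD COLUMN result INT AS (GREATEST(good - bad, 0)) VIRTUAL;
-- result now recalculates itself whenever good or bad changes
SELECT image_id, good, bad, result FROM tbl;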

How can I optimize this stored procedure?

I need some help optimizing this procedure:
DELIMITER $$
CREATE DEFINER=`ryan`@`%` PROCEDURE `GetCitiesInRadius`(
    cityID numeric (15),
    `range` numeric (15)
)
BEGIN
    DECLARE lat1 decimal (5,2);
    DECLARE long1 decimal (5,2);
    DECLARE rangeFactor decimal (7,6);
    SET rangeFactor = 0.014457;

    SELECT `latitude`, `longitude` INTO lat1, long1
    FROM world_cities as wc WHERE city_id = cityID;

    SELECT
        wc.city_id,
        wc.accent_city as city,
        s.state_name as state,
        c.short_name as country,
        GetDistance(lat1, long1, wc.`latitude`, wc.`longitude`) as dist
    FROM world_cities as wc
    LEFT JOIN states s ON wc.state_id = s.state_id
    LEFT JOIN countries c ON wc.country_id = c.country_id
    WHERE
        wc.`latitude` BETWEEN lat1 - (`range` * rangeFactor) AND lat1 + (`range` * rangeFactor)
        AND wc.`longitude` BETWEEN long1 - (`range` * rangeFactor) AND long1 + (`range` * rangeFactor)
        AND GetDistance(lat1, long1, wc.`latitude`, wc.`longitude`) <= `range`
    ORDER BY dist LIMIT 6;
END
Here is my explain on the main portion of the query:
+----+-------------+-------+--------+---------------+--------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+--------------+---------+--------------------------+------+----------------------------------------------+
| 1 | SIMPLE | B | range | idx_lat_long | idx_lat_long | 12 | NULL | 7619 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | s | eq_ref | PRIMARY | PRIMARY | 4 | civilipedia.B.state_id | 1 | |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 1 | civilipedia.B.country_id | 1 | Using where |
+----+-------------+-------+--------+---------------+--------------+---------+--------------------------+------+----------------------------------------------+
3 rows in set (0.00 sec)
Here are the indexes:
mysql> show indexes from world_cities;
+--------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| world_cities | 0 | PRIMARY | 1 | city_id | A | 3173958 | NULL | NULL | | BTREE | |
| world_cities | 1 | country_id | 1 | country_id | A | 23510 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | city | 1 | city | A | 3173958 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | accent_city | 1 | accent_city | A | 3173958 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | idx_pop | 1 | population | A | 28854 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | idx_lat_long | 1 | latitude | A | 1057986 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | idx_lat_long | 2 | longitude | A | 3173958 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | accent_city_2 | 1 | accent_city | NULL | 1586979 | NULL | NULL | YES | FULLTEXT | |
+--------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
8 rows in set (0.01 sec)
I wouldn't think the function you see in the query would cause the slowdown, but here is the function:
CREATE DEFINER=`ryan`@`%` FUNCTION `GetDistance`(lat1 numeric (9,6),
                                                 lon1 numeric (9,6),
                                                 lat2 numeric (9,6),
                                                 lon2 numeric (9,6)) RETURNS decimal(10,5)
BEGIN
    DECLARE x decimal (20,10);
    DECLARE pi decimal (21,20);
    SET pi = 3.14159265358979323846;
    SET x = sin( lat1 * pi/180 ) * sin( lat2 * pi/180 )
          + cos( lat1 * pi/180 ) * cos( lat2 * pi/180 ) * cos( (lon2 * pi/180) - (lon1 * pi/180) );
    SET x = atan( ( sqrt( 1 - power( x, 2 ) ) ) / x );
    RETURN ( 1.852 * 60.0 * ((x/pi)*180) ) / 1.609344;
END
As far as I can tell there is nothing directly wrong with your logic that would make this slow, so the problem ends up being that you can't use any indexes with this query.
MySQL needs to do a full table scan and apply the functions of your WHERE clause to each row to determine whether it passes the conditions. Currently there's 1 index used: idx_lat_long.
It's a bit of a bad index: the longitude portion will never be used, because the latitude portion is a float. At the very least you managed to effectively filter out all rows that are outside the latitude range, but that likely still leaves a lot of rows.
You'd actually get slightly better results on the longitude, because humans only really live in the middle 30% of the earth. We're very much spread out horizontally, but not really vertically.
Regardless, the best way to narrow this down further is to filter out as many records as possible up front. Right now the filter covers a full vertical strip of the earth; try to make it a bounding box.
You could naively dice up the earth into, say, 10x10 segments. In the best case this would make sure the query is limited to 10% of the earth ;).
But as soon as your bounding box spans two separate segments, only the first coordinate (lat or lng) can be used in the index and you end up with the same problem.
So when I ran into this problem I started thinking about it differently. Instead, I divided the earth up into 4 segments (let's say north-east, north-west, south-east and south-west on a map). This gives me coordinates like:
0,0
0,1
1,0
1,1
Instead of putting the x and y values in 2 separate fields, I used a bit field and stored both at once.
Then I divided each of the 4 boxes up again, which gives us 2 sets of coordinates: the outer and the inner coordinates. I'm still encoding this in the same field, which means we now use 4 bits for our 4x4 coordinate system.
How far can we go? If we assume a 64 bit integer field, it means that 32bit can be used for each of the 2 coordinates. This gives us a grid system of 4294967295 x 4294967295 all encoded into one database field.
The beauty of this field is that you can index it. This is sometimes called (I believe) a quad-tree. If you need to select a big area in your database, you just calculate the 64-bit top-left coordinate (in the 4294967295 x 4294967295 grid system) and the bottom-right one, and it's guaranteed that anything that lies in that box will also lie between those two numbers.
How do you get to those numbers? Let's be lazy and assume that both our x and y coordinates range from -180 to 180 degrees. (The y coordinate of course is half that, but we're lazy.)
First we make it positive:
// assuming x and y are our long and lat
x += 180;
y += 180;
So the max for those is 360 now, and 4294967295 / 360 is around 11930464.
So to convert to our new grid system, we just do:
x *= 11930464;
y *= 11930464;
Now we have two distinct numbers, and we need to turn them into 1 number by interleaving the bits: first bit 1 of x, then bit 1 of y, then bit 2 of x, bit 2 of y, etc.
// The 'morton number'
morton = 0
// The current bit we're interleaving
bit = 1
// The position of the bit we're interleaving
position = 0

while (bit <= latitude or bit <= longitude) {
    if (bit & latitude) morton = morton | 1 << (2*position + 1)
    if (bit & longitude) morton = morton | 1 << (2*position)
    position += 1
    bit = 1 << position
}
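To make the interleaving concrete: with latitude = 2 (binary 10) and longitude = 3 (binary 11), the loop sets bit 0 (longitude bit 0), then bit 3 (latitude bit 1) and bit 2 (longitude bit 1), giving morton = binary 1101 = 13.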
I'm calling the final variable 'morton', after the guy who came up with the idea in 1966.
So this leaves us finally with the following:
For each row in your database, calculate the morton number and store it.
Whenever you do a query, first determine the bounding box, calculate the morton numbers of its two corners, and filter on that range.
This will greatly reduce the number of records you need to check.
Here's a stored function I wrote that will do the calculation for you:
CREATE FUNCTION getGeoMorton(lat DOUBLE, lng DOUBLE) RETURNS BIGINT UNSIGNED DETERMINISTIC
BEGIN
    -- 11930464 is round(maximum value of a 32bit integer / 360 degrees)
    DECLARE bit, morton, pos BIGINT UNSIGNED DEFAULT 0;
    SET @lat = CAST((lat + 90) * 11930464 AS UNSIGNED);
    SET @lng = CAST((lng + 180) * 11930464 AS UNSIGNED);
    SET bit = 1;

    WHILE bit <= @lat || bit <= @lng DO
        IF(bit & @lat) THEN SET morton = morton | ( 1 << (2 * pos + 1)); END IF;
        IF(bit & @lng) THEN SET morton = morton | ( 1 << (2 * pos)); END IF;
        SET pos = pos + 1;
        SET bit = 1 << pos;
    END WHILE;

    RETURN morton;
END;
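To give an idea of how you would then use it, here is a sketch with my own assumptions: mortoncode is a hypothetical indexed BIGINT UNSIGNED column on world_cities, pre-filled with getGeoMorton(latitude, longitude) for every row:
SET @latMin = 37.5, @latMax = 37.7, @lngMin = -122.5, @lngMax = -122.3;

SELECT accent_city, latitude, longitude
FROM world_cities
WHERE mortoncode BETWEEN getGeoMorton(@latMin, @lngMin) AND getGeoMorton(@latMax, @lngMax)
  -- the morton range is only the rough pre-filter, so repeat the exact box check
  AND latitude  BETWEEN @latMin AND @latMax
  AND longitude BETWEEN @lngMin AND @lngMax;
The corner with both coordinates at their minimum gives the smallest morton number and the opposite corner the largest, so everything inside the box falls in that range (plus some rows outside it, which the exact checks then remove).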
A few caveats:
The absolute worst-case scenario will still scan 50% of your entire table. The chance of that is extremely low though, and I've seen very significant performance increases for most real-world queries.
The bounding box in this case assumes a Euclidean space, meaning a flat surface. In reality your bounding boxes are not exact squares, and they warp heavily when getting closer to the poles. By just making the boxes a bit larger (depending on how exact you want to be) you can get quite far. Most real-world data is also not close to the poles ;). Remember that this filter is just a 'rough filter' to get most of the likely unwanted rows out.
This is based on a so-called Z-order curve. To get even better performance, if you're feeling adventurous, you could go for the Hilbert curve instead. That curve rotates at each level of subdivision, which ensures that in a worst-case scenario you will only scan about 25% of the table. Magic! In general it will also filter out many more unwanted rows.
Source for all this: I wrote 3 blog posts about this topic when I ran into the same problems and tried to creatively get to a solution. I got much better performance with this compared to MySQL's GEO indexes.
http://www.rooftopsolutions.nl/blog/229
http://www.rooftopsolutions.nl/blog/230
http://www.rooftopsolutions.nl/blog/231