MySQL ORDER BY problem - mysql

Okay, I'm having some difficulties with ORDER BY. Here is the problem I need to solve:
In the database I have stored every tile of a map that is 101 x 101 tiles big. The table has 3 columns (ID, x, y), and now I have to select all the tiles within some radius. For example, I used this query:
SELECT *
FROM tile
WHERE ((x >= -3 AND x <= 3)
AND (y >= -3 AND y <= 3))
ORDER BY x ASC, y DESC;
This query selects all tiles within a radius of 3 of the given coordinate (0|0) for now.
But it doesn't sort them the way I want it to. Basically, the output must look like this, but this is the closest I got:
http://prntscr.com/zqjd7
Edit:
Disregard the duplicate values; I had duplicate inputs for each coordinate and hadn't noticed.

It seems that your problem is with the ASC / DESC modifier.
But since we're here, wouldn't you prefer to use a distance formula? Something like:
SELECT x, y FROM tile WHERE
(
POW(x-#var1, 2) + POW(y-#var2, 2) <= POW(3, 2)
)
ORDER BY x DESC, y ASC;
Here, given a point P (m,n), we know the distance to a fixed point Q (x,y) by asserting D(P,Q) = SQRT( (x-m)² + (y-n)² ). Since it has to be less than (or equal to) your desired radius (= 3), we have SQRT( (x-m)² + (y-n)² ) <= 3, or better, (x-m)² + (y-n)² <= 3², raising both sides to the second power.
In SQL terms, we write POW(x-m, 2) + POW(y-n, 2) <= POW(3, 2), which says that the distance between (x,y) and (m,n) is less than or equal to 3.
About #var: it's where you enter your input value. More specifically, these stand in for session variables, but you don't really need them to perform a select; just substitute any number you want, e.g. you can choose the origin (0,0) by putting 0 in place of #var1 and #var2.
[Update]
Well... it's always a good idea to test your code before answering. In fact, I should have suggested ordering first by y, since we first care about ordering rows for display on screen. The following code was (finally) tested (on a test DB); my last suggestion is to create the following index (index_y_x):
USE `test` ;
CREATE TABLE IF NOT EXISTS `test`.`tile` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT ,
`x` INT(11) NULL DEFAULT 0 ,
`y` INT(11) NULL DEFAULT 0 ,
PRIMARY KEY (`id`) ,
INDEX `index_y_x` (`y` DESC, `x` ASC) )
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;
INSERT tile (x,y) VALUES
(-2,-2),(-2, -1),(-2, 0),(-2, 1),(-2, 2),
(-1,-2),(-1, -1),(-1, 0),(-1, 1),(-1, 2),
(0,-2), (0, -1), (0, 0), (0, 1), (0, 2),
(1,-2), (1, -1), (1, 0), (1, 1), (1, 2),
(2,-2), (2, -1), (2, 0), (2, 1), (2, 2);
SELECT x, y FROM tile
WHERE POW(x-3, 2) + POW(y-3, 2) <= POW(3, 2)
ORDER BY y DESC, x ASC;
This returns the items near the point (3,3), within a range of 3 units.
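The tested query above translates almost directly to other engines; here is a hedged Python sketch using the standard-library sqlite3 module (the centre (0,0) and radius 2 are illustrative, and POW() is rewritten as plain multiplication since stock SQLite may lack math functions):

```python
import sqlite3

# A minimal stand-in for the query above, using Python's built-in
# sqlite3 instead of MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tile (id INTEGER PRIMARY KEY, x INT, y INT)")
conn.executemany(
    "INSERT INTO tile (x, y) VALUES (?, ?)",
    [(x, y) for x in range(-2, 3) for y in range(-2, 3)],
)

# Same filter as POW(x - m, 2) + POW(y - n, 2) <= POW(r, 2)
rows = conn.execute(
    """
    SELECT x, y FROM tile
    WHERE (x - 0) * (x - 0) + (y - 0) * (y - 0) <= 2 * 2
    ORDER BY y DESC, x ASC
    """
).fetchall()

print(rows[0])  # (0, 2): the topmost tile in the circle
```

Ordering by y DESC first and x ASC second returns the tiles top row first, left to right, which matches the display order the update recommends.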


How to auto-increment a string with an SQL query

I am stuck at a point where I have to increment a string, and my strings are of type C001, SC001, B001.
In my database they are defined like this.
What I am trying to do is write a query which checks the previous highest code present in my DB and then increments it by 1.
For example C001 -> C002, C009 -> C010, C099 -> C100 and so on.
Similarly for SC001 -> SC002, SC009 -> SC010, SC099 -> SC100 and so on.
Similarly for B001 -> B002, B009 -> B010, B099 -> B100 and so on.
I have a query which my friend suggested I use, but that query only increments AAAA -> AAAA01, AAAA09 -> AAAA10.
The query is:
SELECT id AS PrevID, CONCAT(
SUBSTRING(id, 1, 4),
IF(CAST(SUBSTRING(id, 5) AS UNSIGNED) <= 9, '0', ''),
CAST(SUBSTRING(id, 5) AS UNSIGNED) + 1
) AS NextID
FROM (
-- since you allow strings such as AAAA20 and AAAA100 you can no longer use MAX
SELECT id
FROM t
ORDER BY SUBSTRING(id, 1, 4) DESC, CAST(SUBSTRING(id, 5) AS UNSIGNED) DESC
LIMIT 1
) x
When I replace id with CategoryCode, it gives me PrevID C004 and NextID C00401, which is not my requirement; I want PrevID C004 and NextID C005.
NOTE: I am using MySQL server 5.1.
Just try this one:
SELECT
CategoryCode, CAST(CONCAT(LPAD(CategoryCode, 1, 0), LPAD(MAX(RIGHT(CategoryCode, 3)) + 1, 3, 0)) AS CHAR)
FROM test;
SELECT
SubCategoryCode, CAST(CONCAT(LPAD(SubCategoryCode, 2, 0), LPAD(MAX(RIGHT(SubCategoryCode, 3)) + 1, 3, 0)) AS CHAR)
FROM test;
SELECT
BrandCode, CAST(CONCAT(LPAD(BrandCode, 1, 0), LPAD(MAX(RIGHT(BrandCode, 3)) + 1, 3, 0)) AS CHAR)
FROM test;
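The prefix-plus-padded-number logic in these queries can be sketched in application code as well; here is a hedged Python version (the function name is mine) doing the same job that CONCAT/LPAD/RIGHT do in SQL:

```python
import re

def next_code(code: str) -> str:
    """Split a code like 'C004' or 'SC099' into its alphabetic prefix
    and numeric suffix, add 1, and zero-pad back to the original width."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+)", code)
    if m is None:
        raise ValueError(f"unexpected code format: {code!r}")
    prefix, digits = m.groups()
    return prefix + str(int(digits) + 1).zfill(len(digits))

print(next_code("C004"))   # C005
print(next_code("SC099"))  # SC100
print(next_code("B009"))   # B010
```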

MySQL: Use IF in TRUNCATE

This is the query, simplified.
SELECT `a`, TRUNCATE(`b` / 1000, 3) AS `b`
FROM (
...
) AS `m`
GROUP BY `a`
ORDER BY `a`
What I'm trying to do is change the number of decimal places (currently 3) based on the value of b.
So I tried this:
SELECT `a`, TRUNCATE(`b` / 1000, IF(`b` < 10, 2, 3)) AS `b` ...
and this:
SELECT `a`, IF(`b` < 10, TRUNCATE(`b` / 1000, 2), TRUNCATE(`b` / 1000, 3)) AS `b`
If b is less than 10, I want 3 decimal places, otherwise 2.
But this doesn't seem to work...
Resources : https://dev.mysql.com/doc/refman/8.0/en/control-flow-functions.html#function_if
Just swap the positions of the values in your query:
SELECT `a`, IF(b < 10, TRUNCATE(b / 1000, 3), TRUNCATE(b / 1000, 2))
AS b
IF(a<1, 2, 3) means: if a<1, then 2 will be the value in your result, so you have to switch the positions of your values.
Use ROUND:
SELECT a, IF(b < 10, ROUND(b / 1000, 3), ROUND(b / 1000, 2)) AS b
The ROUND() function rounds a number to a specified number of decimal places.
Example: SELECT ROUND(345.156, 2); result = 345.16, whereas TRUNCATE(345.156, 2) gives 345.15.
If you don't want rounding, note that TRUNCATE with 2 places will show 0.00 whenever b is less than 10, so what exactly do you mean by "not working"?
You need 3 decimal places when b < 10, so you have to swap the positions in your query.
You have misplaced the order of the expressions for the true/false evaluation in IF(). The following should work:
SELECT `a`,
IF(`b` < 10,
TRUNCATE(`b` / 1000, 3),
TRUNCATE(`b` / 1000, 2)
) AS `b`
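A small Python sketch of the corrected IF + TRUNCATE logic, using the decimal module to mimic MySQL's chop-don't-round behaviour (the helper names are mine):

```python
from decimal import Decimal, ROUND_DOWN

def truncate(value, places: int) -> Decimal:
    """Like MySQL TRUNCATE(): drop digits past `places`, never round."""
    quantum = Decimal(1).scaleb(-places)  # 0.001 for places=3, etc.
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_DOWN)

def scaled(b) -> Decimal:
    # IF(b < 10, TRUNCATE(b / 1000, 3), TRUNCATE(b / 1000, 2))
    return truncate(Decimal(b) / 1000, 3 if b < 10 else 2)

print(scaled(7))     # 0.007 -- small values keep 3 places
print(scaled(4567))  # 4.56  -- larger values get 2 places
```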

Hamming Distance optimization for MySQL or PostgreSQL?

I'm trying to improve searching for similar pHashed images in a MySQL database.
Right now I compare pHashes by counting the Hamming distance like this:
SELECT * FROM images WHERE BIT_COUNT(hash ^ 2028359052535108275) <= 4
Results for selecting (engine MyISAM):
20000 rows ; query time < 20ms
100000 rows ; query time ~ 60ms # this was just fine, until it reached 150000 rows
300000 rows ; query time ~ 150ms
So the query time increases with the number of rows in the table.
I also tried a solution found on Stack Overflow:
Hamming distance on binary strings in SQL
SELECT * FROM images WHERE
BIT_COUNT(h1 ^ 11110011) +
BIT_COUNT(h2 ^ 10110100) +
BIT_COUNT(h3 ^ 11001001) +
BIT_COUNT(h4 ^ 11010001) +
BIT_COUNT(h5 ^ 00100011) +
BIT_COUNT(h6 ^ 00010100) +
BIT_COUNT(h7 ^ 00011111) +
BIT_COUNT(h8 ^ 00001111) <= 4
rows 300000 ; query time ~ 240ms
I changed the database engine to PostgreSQL and translated this MySQL query for it (via PyGreSQL), without success:
rows 300000 ; query time ~ 18s
Is there any solution to optimize the above queries?
I mean an optimization that does not depend on the number of rows.
I have limited ways (tools) to solve this problem.
MySQL has so far seemed to be the simplest solution, but I can deploy code on any open source database engine that will work with Ruby on a dedicated machine.
There are some ready solutions for MSSQL https://stackoverflow.com/a/5930944/766217 (not tested). Maybe someone knows how to translate them for MySQL or PostgreSQL.
Please post answers based on some code or observations; we already have a lot of theoretical discussions of Hamming distance on stackoverflow.com.
Thanks!
When considering the efficiency of algorithms, computer scientists use the concept of the order denoted O(something) where something is a function of n, the number of things being computed, in this case rows. So we get, in increasing time:
O(1) - independent of the number of items
O(log(n)) - increases as the logarithm of the items
O(n) - increases in proportion of the items (what you have)
O(n^2) - increases as the square of the items
O(n^3) - etc
O(2^n) - increases exponentially
O(n!) - increases with the factorial of the number
The last 2 are effectively uncomputable for any reasonable number of n (80+).
Only the most significant term matters, since it dominates for large n, so n^2 and 65*n^2 + 787*n + 4656566 are both O(n^2).
Bear in mind that this is a mathematical construction, and the time an algorithm takes with real software on real hardware using real data may be heavily influenced by other things (e.g. an O(n^2) memory operation may take less time than an O(n) disk operation).
For your problem, you need to run through each row and compute BIT_COUNT(hash ^ 2028359052535108275) <= 4. This is an O(n) operation.
The only way this could be improved is by utilizing an index since a b-tree index retrieval is an O(log(n)) operation.
However, because your column is wrapped in a function, an index on that column cannot be used. You have 2 possibilities:
The first is a SQL Server solution and I don't know if it is portable to MySQL: create a persisted computed column in your table with the formula BIT_COUNT(hash ^ 2028359052535108275) and put an index on it. This will not be suitable if you need to change the bit mask.
The second: work out a way of doing the bitwise arithmetic without using the BIT_COUNT function.
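For the second possibility, the whole XOR-plus-popcount can of course also run in application code; here is a minimal, hedged Python sketch (the stored hash values are hypothetical) of what BIT_COUNT(hash ^ constant) computes:

```python
# Compute the Hamming distance in application code instead of
# BIT_COUNT() in SQL.
def hamming(a: int, b: int) -> int:
    """Number of differing bits: XOR, then count the set bits."""
    return bin(a ^ b).count("1")

target = 2028359052535108275
stored = [2028359052535108275, 2028359052535108272, 0]  # hypothetical rows

# Equivalent of: WHERE BIT_COUNT(hash ^ 2028359052535108275) <= 4
matches = [h for h in stored if hamming(h, target) <= 4]
print(matches)  # the two hashes within distance 4 of the target
```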
This solution made things a bit faster for me.
It builds a derived table for each hash comparison and returns only the results that are below the Hamming distance. This way, it's not doing the BIT_COUNT on a pHash that has already exceeded the threshold. It returns all matches in about 2.25 seconds on 2.6 million records.
It's InnoDB, and I have very few indexes.
If somebody can make it faster, I'd appreciate it.
SELECT *, BIT_COUNT(pHash3 ^ 42597524) + BC2 AS BC3
FROM (
SELECT *, BIT_COUNT(pHash2 ^ 258741369) + BC1 AS BC2
FROM (
SELECT *, BIT_COUNT(pHash1 ^ 5678910) + BC0 AS BC1
FROM (
SELECT `Key`, pHash0, pHash1, pHash2, pHash3, BIT_COUNT(pHash0 ^ 1234567) as BC0
FROM files
WHERE BIT_COUNT(pHash0 ^ 1234567) <= 3
) AS BCQ0
WHERE BIT_COUNT(pHash1 ^ 5678910) + BC0 <= 3
) AS BCQ1
WHERE BIT_COUNT(pHash2 ^ 258741369) + BC1 <= 3
) AS BCQ2
WHERE BIT_COUNT(pHash3 ^ 42597524) + BC2 <= 3
This is the equivalent query, but without the derived tables. Its return time is almost 3 times as long.
SELECT `Key`, pHash0, pHash1, pHash2, pHash3
FROM Files
WHERE BIT_COUNT(pHash0 ^ 1234567) + BIT_COUNT(pHash1 ^ 5678910) + BIT_COUNT(pHash2 ^ 258741369) + BIT_COUNT(pHash3 ^ 42597524) <=3
Keep in mind that the lower the Hamming threshold in the first query, the faster it will run.
Here are the results of my tests. The pHash is calculated with the imagehash library in Python and stored as two BIGINTs in the database.
This test was run on 858,433 images in a MariaDB database that does not use sharding. I found sharding actually slowed down the process; however, that was with the function method, so it may be different without it or on a larger database.
The table these queries run on is an in-memory-only table. A local table is kept, and upon startup of the database the id, phash1, and phash2 are copied to an in-memory table. The id is returned to match against the InnoDB table once something is found.
Total Images: 858433
Image 1: ece0455d6b8e9470
Function HAMMINGDISTANCE_16:
RETURN BIT_COUNT(A0 ^ B0) + BIT_COUNT(A1 ^ B1)
Method: HAMMINGDISTANCE_16 Function
Query:
SELECT `id` FROM `phashs` WHERE HAMMINGDISTANCE_16(filephash_1, filephash_2, CONV(SUBSTRING('ece0455d6b8e9470', 1, 8), 16, 10), CONV(SUBSTRING('ece0455d6b8e9470', 9, 8), 16, 10)) <= 3;
Time: 2.1760 seconds
Method: BIT_COUNT
Query:
SELECT `id` FROM `phashs` WHERE BIT_COUNT(filephash_1 ^ CONV(SUBSTRING('ece0455d6b8e9470', 1, 8), 16, 10)) + BIT_COUNT(filephash_2 ^ CONV(SUBSTRING('ece0455d6b8e9470', 9, 8), 16, 10)) <= 3;
Time: 0.1547 seconds
Method: Multi-Select BIT_COUNT inner is filephash_1
Query:
SELECT `id` FROM ( SELECT `id`, `filephash_2`, BIT_COUNT(filephash_1 ^ CONV(SUBSTRING('ece0455d6b8e9470', 1, 8), 16, 10)) as BC0 FROM `phashs` WHERE BIT_COUNT(filephash_1 ^ CONV(SUBSTRING('ece0455d6b8e9470', 1, 8), 16, 10)) <= 3 ) BCQ0 WHERE BC0 + BIT_COUNT(filephash_2 ^ CONV(SUBSTRING('ece0455d6b8e9470', 9, 8), 16, 10)) <= 3;
Time: 0.1878 seconds
Method: Multi-Select BIT_COUNT inner is filephash_2
Query:
SELECT `id` FROM (SELECT `id`, `filephash_1`, BIT_COUNT(filephash_2 ^ CONV(SUBSTRING('ece0455d6b8e9470', 9, 8), 16, 10)) as BC1 FROM `phashs` WHERE BIT_COUNT(filephash_2 ^ CONV(SUBSTRING('ece0455d6b8e9470', 9, 8), 16, 10)) <= 3) BCQ1 WHERE BIT_COUNT(filephash_1 ^ CONV(SUBSTRING('ece0455d6b8e9470', 1, 8), 16, 10)) + BC1 <= 3;
Time: 0.1860 seconds
Image 2: 813ed36913ec8639
Function HAMMINGDISTANCE_16:
RETURN BIT_COUNT(A0 ^ B0) + BIT_COUNT(A1 ^ B1)
Method: HAMMINGDISTANCE_16 Function
Query:
SELECT `id` FROM `phashs` WHERE HAMMINGDISTANCE_16(filephash_1, filephash_2, CONV(SUBSTRING('813ed36913ec8639', 1, 8), 16, 10), CONV(SUBSTRING('813ed36913ec8639', 9, 8), 16, 10)) <= 3;
Time: 2.1440 seconds
Method: BIT_COUNT
Query:
SELECT `id` FROM `phashs` WHERE BIT_COUNT(filephash_1 ^ CONV(SUBSTRING('813ed36913ec8639', 1, 8), 16, 10)) + BIT_COUNT(filephash_2 ^ CONV(SUBSTRING('813ed36913ec8639', 9, 8), 16, 10)) <= 3;
Time: 0.1588 seconds
Method: Multi-Select BIT_COUNT inner is filephash_1
Query:
SELECT `id` FROM ( SELECT `id`, `filephash_2`, BIT_COUNT(filephash_1 ^ CONV(SUBSTRING('813ed36913ec8639', 1, 8), 16, 10)) as BC0 FROM `phashs` WHERE BIT_COUNT(filephash_1 ^ CONV(SUBSTRING('813ed36913ec8639', 1, 8), 16, 10)) <= 3 ) BCQ0 WHERE BC0 + BIT_COUNT(filephash_2 ^ CONV(SUBSTRING('813ed36913ec8639', 9, 8), 16, 10)) <= 3;
Time: 0.1671 seconds
Method: Multi-Select BIT_COUNT inner is filephash_2
Query:
SELECT `id` FROM (SELECT `id`, `filephash_1`, BIT_COUNT(filephash_2 ^ CONV(SUBSTRING('813ed36913ec8639', 9, 8), 16, 10)) as BC1 FROM `phashs` WHERE BIT_COUNT(filephash_2 ^ CONV(SUBSTRING('813ed36913ec8639', 9, 8), 16, 10)) <= 3) BCQ1 WHERE BIT_COUNT(filephash_1 ^ CONV(SUBSTRING('813ed36913ec8639', 1, 8), 16, 10)) + BC1 <= 3;
Time: 0.1686 seconds
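The split-hash arithmetic benchmarked above can be mirrored in a few lines of Python; this hedged sketch (helper names are mine) splits the 16-hex-digit pHash into the same two integer halves and sums the per-half popcounts, which equals the popcount of the full 64-bit XOR:

```python
# Mirror of BIT_COUNT(filephash_1 ^ ...) + BIT_COUNT(filephash_2 ^ ...)
def split_hash(hex_hash: str) -> tuple:
    """Same split as CONV(SUBSTRING(h, 1, 8), 16, 10) / CONV(SUBSTRING(h, 9, 8), 16, 10)."""
    return int(hex_hash[:8], 16), int(hex_hash[8:], 16)

def hamming_split(h1: tuple, h2: tuple) -> int:
    return bin(h1[0] ^ h2[0]).count("1") + bin(h1[1] ^ h2[1]).count("1")

a = split_hash("ece0455d6b8e9470")  # Image 1 from the tests above
b = split_hash("813ed36913ec8639")  # Image 2 from the tests above

print(hamming_split(a, a))  # 0: identical hashes
print(hamming_split(a, b))  # distance summed across both halves
```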

Smart SQL group by

I have a SQL table: names, location, volume.
Names are of type string.
Location is two fields of type float (lat and long).
Volume is of type int.
I want to run a SQL query which will group all the locations in a certain range and sum their volumes.
For instance, group all the locations from 1.001 to 2 degrees lat and 1.001 to 2 degrees long into one row with all their volumes summed, then from 2.001 to 3 degrees lat and long, and so on.
In short, I want to sum all the volumes in a geographical area whose size I can decide.
I do not care about the name; I only need the location (which could be any of the grouped ones, or an average) and the volume sum.
Here is a sample table:
CREATE TABLE IF NOT EXISTS `example` (
`name` varchar(12) NOT NULL,
`lat` float NOT NULL,
`lng` float NOT NULL,
`volume` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `example` (`name`, `lat`, `lng`, `volume`) VALUES
("one", 1.005, 1.007, 2),
("two", 1.25, 1.907, 3),
("three", 2.065, 65.007, 2),
("four", 2.905, 65.1, 10),
("five", 12.3, 43.8, 5),
("six", 12.35, 43.2, 2);
For which the return query for an area of size one degree could be:
1.005, 1.007, 5
2.065, 65.007, 12
12.3, 43.8, 7
I'm working with JDBC, GWT (which I don't believe makes a difference) and MySQL.
If you are content with truncating at the decimal point, then use round() or truncate():
select truncate(latitude, 0) as lat0, truncate(longitude, 0) as long0, sum(volume)
from t
group by truncate(latitude, 0), truncate(longitude, 0)
A more general solution defines two variables for the precision:
set @LatPrecision = 0.25, @LongPrecision = 0.25;
select floor(latitude/@LatPrecision)*@LatPrecision,
floor(longitude/@LongPrecision)*@LongPrecision,
sum(volume)
from t
group by floor(latitude/@LatPrecision),
floor(longitude/@LongPrecision)
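The floor-based binning in this general solution can be checked in application code; here is a minimal Python sketch using the question's sample rows and a 1-degree grid:

```python
import math
from collections import defaultdict

# Snap each (lat, lng) to a grid cell of `precision` degrees and sum
# the volumes per cell -- the same grouping the query above does.
rows = [
    ("one",   1.005,  1.007,  2),
    ("two",   1.25,   1.907,  3),
    ("three", 2.065, 65.007,  2),
    ("four",  2.905, 65.1,   10),
    ("five",  12.3,  43.8,    5),
    ("six",   12.35, 43.2,    2),
]

precision = 1.0
bins = defaultdict(int)
for _name, lat, lng, volume in rows:
    cell = (math.floor(lat / precision) * precision,
            math.floor(lng / precision) * precision)
    bins[cell] += volume

for (cell_lat, cell_lng), total in sorted(bins.items()):
    print(cell_lat, cell_lng, total)  # 1/1: 5, 2/65: 12, 12/43: 7
```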
Convert latitude from float to int and then group by the converted value. When the float is converted, say from 2.1 or 2.7, it becomes 2; hence all values between 2.000 and 2.999 will have the same converted value of 2. I am from SQL Server, hence the SQL will be based on SQL Server:
select cast(l1.latitude as int), cast(l2.latitude as int), sum(v.volume)
from location l1
join location l2 on cast(l1.latitude as int) = cast(l2.longitude as int)
join volume v
group by cast(l1.latitude as int), cast(l2.latitude as int)
Maybe I am super late to post this answer:
sqlfiddle demo
Code:
select round(x.lat,4), round(x.lng,4),
sum(x.volume)
from (
select
case when lat >= 1.00 and lng <2
then 'loc1' end loc1,
case when lat >= 2.00 and lng <3
then 'loc2' end loc2,
case when lat >= 3.00 and lng >10
then 'loc3' end loc3,
lat, lng,
volume
from example) as x
group by x.loc1, x.loc2, x.loc3
order by x.lat, x.lng asc
;
Results:
ROUND(X.LAT,4) ROUND(X.LNG,4) SUM(X.VOLUME)
1.005 1.007 5
2.065 65.007 12
12.3 43.8 7

How to randomly allocate values to a field from a set?

I have two columns in my table profile, which are id and education. Now I want to randomly allocate the education field values, which can be from the set ('HA', 'BA', 'CA', 'DA'). How can I do this in one command? id is the primary key for this table.
As documented under ELT(N,str1,str2,str3,…):
Returns str1 if N = 1, str2 if N = 2, and so on.
As documented under RAND():
To obtain a random integer R in the range i <= R < j, use the expression FLOOR(i + RAND() * (j - i)).
Therefore:
UPDATE my_table SET education = ELT(FLOOR(1 + RAND() * 4), 'HA', 'BA', 'CA', 'DA')
See it on sqlfiddle.
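For completeness, the ELT(FLOOR(1 + RAND() * 4), ...) expression maps to a plain uniform choice in application code; a small Python sketch (the table/column handling is left out, this only shows the choice):

```python
import random

# Application-side equivalent of ELT(FLOOR(1 + RAND() * 4), ...):
# pick one of the four education codes uniformly at random per row.
EDUCATION = ("HA", "BA", "CA", "DA")

def random_education() -> str:
    # FLOOR(1 + RAND() * 4) yields 1..4 and ELT() is 1-indexed,
    # so this matches a uniform 0-based index into the tuple.
    return EDUCATION[random.randrange(4)]

sample = [random_education() for _ in range(1000)]
print(set(sample) <= set(EDUCATION))  # True
```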