Convert IP address (IPv4) into an integer in R - MySQL

I was looking for a way to write a function in R which converts an IP address into an integer.
My dataframe looks like this:
total IP
626 189.14.153.147
510 67.201.11.8
509 64.22.53.140
483 180.9.85.10
403 98.8.136.126
391 64.06.187.68
I export this data from a MySQL database. I have a query that converts an IP address into an integer in MySQL:
mysql> SELECT
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX('75.19.168.155', '.', 1), '.', -1) << 24 AS UNSIGNED) +
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX('75.19.168.155', '.', 2), '.', -1) << 16 AS UNSIGNED) +
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX('75.19.168.155', '.', 3), '.', -1) << 8 AS UNSIGNED) +
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX('75.19.168.155', '.', 4), '.', -1) AS UNSIGNED) FINAL;
But I want to do this conversion in R. Any help would be awesome.
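For reference, the arithmetic that the MySQL query performs can be reproduced outside the database. This is a minimal Python sketch (not R, and not part of the original question); substring_index is a hypothetical helper that mimics MySQL's SUBSTRING_INDEX(str, delim, count) for the positive and negative counts the query uses. The result should equal what MySQL's built-in INET_ATON() returns for the same address.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Mimic MySQL SUBSTRING_INDEX(): everything before the count-th delimiter
    counting from the left (count > 0), or after it counting from the right (count < 0)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


def ip_to_int(ip: str) -> int:
    """Repeat the query's steps: isolate octet n, then shift it into position."""
    total = 0
    for n, shift in zip(range(1, 5), (24, 16, 8, 0)):
        octet = substring_index(substring_index(ip, ".", n), ".", -1)
        total += int(octet) << shift
    return total


print(ip_to_int("75.19.168.155"))  # 1259579547
```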

You were not entirely specific about which conversion you wanted, so I multiplied the decimal values by what I thought might be appropriate, treating the four numbers as the digits of a base-256 number and then displaying the result in base 10. The first solution below treats the first octet as the least significant digit. If you want it to be the most significant instead, as the MySQL query in the question does with << 24, reverse the order of the weights applied to the columns of 'vals'.
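convIP <- function(IP) {
  # Split each "a.b.c.d" string into four numeric columns V1..V4
  vals <- read.table(text = as.character(IP), sep = ".")
  # First octet weighted 1, last octet weighted 256^3 (first octet least significant)
  return(vals[1] + 256*vals[2] + 256^2*vals[3] + 256^3*vals[4])
}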
> convIP(dat$IP)
V1
1 2476281533
2 134990147
3 2352289344
4 173345204
5 2122844258
6 1153107520
(It's usually better practice to state what you think the correct answer is, so that testing can be done. Bertelson's comment above would be faster, and implicitly uses 1000, 1000^2 and 1000^3 as the factors.)
I am taking a crack at simplifying the code, but fear that needing Reduce("+", ...) may make it more complex. You cannot use sum here, because it collapses every value into a single total instead of adding row by row.
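convIP <- function(IP) {
  vals <- read.table(text = as.character(IP), sep = ".")
  # Map() pairs each column with its own weight (256^3 for the first octet, ..., 1 for the last).
  # Multiplying the data frame by 256^(3:0) directly would recycle the weights down
  # each column instead of across the columns, giving wrong (too large) results.
  return(Reduce("+", Map("*", vals, 256^(3:0))))
}
> convIP(dat$IP)
[1] 3171850643 1137249032 1075197324 3020510474 1644726398 1074182980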
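To see how much the weighting order matters, here is a small Python cross-check (an illustration outside R; the helper name weighted and the constants LITTLE and BIG are hypothetical). LITTLE reproduces the first value printed by the first R function, while BIG uses the first-octet-most-significant order of the MySQL query in the question.

```python
def weighted(ip: str, weights) -> int:
    # Multiply each octet by its weight and add them up, like the R functions.
    return sum(int(octet) * w for octet, w in zip(ip.split("."), weights))


LITTLE = (1, 256, 256**2, 256**3)  # first octet least significant (first R version)
BIG = (256**3, 256**2, 256, 1)     # first octet most significant (MySQL << 24 order)

print(weighted("189.14.153.147", LITTLE))  # 2476281533
print(weighted("189.14.153.147", BIG))     # 3171850643
```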

Related

MySQL function: Hexadecimal conversion to Float

I have a bit of a challenge in converting a hexadecimal string into a float.
Here is an example:
Hex:
3F62 0C3C
Binary: 00111111011000100000110000111100
Conversion result (float big endian):
0.8829992
Can this conversion be achieved with MySQL functions and, if so, how?
Thank you for your help.
So, I assume you have your hexadecimal string in some sort of TEXT type:
SELECT @hexstring := '3F62 0C3C';
The first thing you need to do is convert it to a decimal. You can do this with the CONV function, after removing the whitespace from the string with the REPLACE function. Depending on the target base you pass to CONV, you get, for example, a binary or a decimal representation:
SELECT @binvalue := CONV(REPLACE(@hexstring, ' ', ''), 16, 2); -- 111111011000100000110000111100
SELECT @decvalue := CONV(REPLACE(@hexstring, ' ', ''), 16, 10); -- 1063390268
From the decimal representation, you can calculate your float (based on this SO answer). Because CONV returns an unsigned value, SIGN() can never return -1, so the sign is taken from bit 31 instead. Zero, subnormals, infinities and NaN are not handled:
SELECT IF(@decvalue & 0x80000000, -1, 1)
* (1.0 + (@decvalue & 0x007FFFFF) * POWER(2.0, -23))
* POWER(2.0, (@decvalue & 0x7f800000) / 0x00800000 - 127)
Result:
0.8829991817474365
Or all in one:
SELECT IF(CONV(REPLACE(@hexstring, ' ', ''), 16, 10) & 0x80000000, -1, 1)
* (1.0 + (CONV(REPLACE(@hexstring, ' ', ''), 16, 10) & 0x007FFFFF)
* POWER(2.0, -23))
* POWER(2.0, (CONV(REPLACE(@hexstring, ' ', ''), 16, 10) & 0x7f800000) / 0x00800000 - 127)
Test all of it together in this db<>fiddle.
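As an independent check of the value, here is a Python sketch (outside MySQL; the function names are hypothetical). hex_to_float_be lets the struct module decode the four bytes as a big-endian IEEE 754 single, and hex_to_float_manual decodes sign, exponent and mantissa by hand, following the same formula as the SQL (normal numbers only).

```python
import struct


def hex_to_float_be(hexstring: str) -> float:
    # Drop the space, then read the 4 bytes as a big-endian 32-bit float.
    raw = bytes.fromhex(hexstring.replace(" ", ""))
    return struct.unpack(">f", raw)[0]


def hex_to_float_manual(hexstring: str) -> float:
    # Field-by-field decode mirroring the SQL; zero, subnormals, inf and NaN not handled.
    bits = int(hexstring.replace(" ", ""), 16)
    sign = -1.0 if bits & 0x80000000 else 1.0
    exponent = (bits & 0x7F800000) >> 23
    mantissa = bits & 0x007FFFFF
    return sign * (1.0 + mantissa * 2.0**-23) * 2.0 ** (exponent - 127)


print(hex_to_float_be("3F62 0C3C"))      # approximately 0.8829992
print(hex_to_float_manual("3F62 0C3C"))  # same value
```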

MySQL Multiple IF CASE Statements

I have a temporary table "productsTmp", in which I have a column "footprintSize".
The column contains strings in the following formats:
a) 12 x 34
b) v 11
c) 12 x 34 (v 12)
I want to extract the numbers only, in order to obtain something like:
a) v1 = 12 ; v2 = 34
b) v3 = 11
c) v1 = 12 ; v2 = 34 ; v3 = 12
Note: the values describe a rectangular prism: v1 = width, v2 = length, v3 = height. The height always comes after the character "v" (which in my case is ø).
To extract these values, I thought of using a subquery loop, but I've only come up with the following idea:
IF footprintSize LIKE '%x%'
-- Example: 24 x 24
SELECT SUBSTRING_INDEX(footprintSize, 'x', 1) AS lval;
SELECT SUBSTRING_INDEX(footprintSize, 'x', -1) AS rval;
-- Example: 8 x 8 (Ø 10)
IF rval LIKE '%)%'
SELECT SUBSTRING_INDEX(SELECT SUBSTRING_INDEX(SELECT SUBSTRING_INDEX(footprintSize, '(', -1), ' ', -1), ')', 1) AS dval;
ELSE
-- Example: ø 11
SELECT SUBSTRING_INDEX(footprintSize, ' ', -1) AS dval;
END IF;
However, I've been told that the IF statement only works in stored procedures, which I want to avoid. So I tried the following:
SELECT
CASE TRUE
WHEN footprintSize LIKE "%x%" THEN (SUBSTRING_INDEX(footprintSize,'x', 1))
WHEN footprintSize LIKE "%)%" THEN (SUBSTRING_INDEX(footprintSize,')', -1))
END as "footprintSize"
FROM productsTmp
But this is not close to what I want.
In the end, I want something like:
footprintSize  | lval | rval | dval
24 x 24        | 24   | 24   |
8 x 8 (Ø 10)   | 8    | 8    | 10
ø 11           |      |      | 11
For the empty cells I can have NULL or even a zero; I'm more concerned with how to split this data into three columns.
Thank you.
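The desired split can at least be pinned down precisely outside SQL. This is a Python sketch of the target logic only, not a MySQL answer; it assumes that a "width x length" pair, when present, always comes first, and that the height always follows "v", "ø" or "Ø". The names DIMS, HEIGHT and split_footprint are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: width x length comes first; the height marker is "v", "ø" or "Ø".
DIMS = re.compile(r"^\s*(\d+)\s*x\s*(\d+)")
HEIGHT = re.compile(r"[vøØ]\s*(\d+)")


def split_footprint(s: str) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Return (lval, rval, dval); missing parts are None (NULL in SQL terms)."""
    dims = DIMS.search(s)
    height = HEIGHT.search(s)
    lval, rval = (int(dims.group(1)), int(dims.group(2))) if dims else (None, None)
    dval = int(height.group(1)) if height else None
    return lval, rval, dval


for value in ["24 x 24", "8 x 8 (Ø 10)", "ø 11"]:
    print(value, split_footprint(value))
```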

Cast number of bytes from blob field to number

I have a table with one blob field named bindata. bindata always contains 7 bytes. The first four of them are an integer (unsigned, I think; the db is not mine).
My question is: how can I select only the first four bytes from bindata and convert them to a number?
I am new to MySQL, but from the documentation I see that I may have to use the CONV function, doing something like this:
SELECT CONV(<Hex String of first 4 bytes of bindata>,16,10) as myNumber
But I have no idea how to select only the first four bytes of the blob field. I am really stuck here.
Thanks
You can use string functions to get individual bytes of the blob. For example (here the blob column is named `data`, and the first byte is treated as the most significant):
SELECT id,
((ORD(SUBSTR(`data`, 1, 1)) << 24) +
(ORD(SUBSTR(`data`, 2, 1)) << 16) +
(ORD(SUBSTR(`data`, 3, 1)) << 8) +
ORD(SUBSTR(`data`, 4, 1))) AS num
FROM test;
Here is a demo in SQLFiddle.
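The same byte arithmetic in Python, as a sketch for checking values outside the database (the function name and the sample bytes are hypothetical). Whether the table stores the integer big-endian, as the ORD/shift query assumes, or little-endian is an assumption you would have to verify. In MySQL itself, the question's CONV idea should also work with HEX(SUBSTRING(bindata, 1, 4)) as the hex string.

```python
import struct


def first_four_bytes_to_uint(blob: bytes, big_endian: bool = True) -> int:
    # With big_endian=True this equals
    # (ORD(b1) << 24) + (ORD(b2) << 16) + (ORD(b3) << 8) + ORD(b4).
    return int.from_bytes(blob[:4], "big" if big_endian else "little")


bindata = bytes([0x00, 0x01, 0x02, 0x03, 0xAA, 0xBB, 0xCC])  # hypothetical 7-byte value
print(first_four_bytes_to_uint(bindata))     # 66051
print(struct.unpack(">I", bindata[:4])[0])   # 66051, same result via struct
```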

MySQL: an efficient binary value comparison

My table has 8 VARCHAR fields, each holding a 64-bit binary string. My goal is to get the Hamming distance for each row. I was doing it with the following query:
SELECT
BIT_COUNT(CONV(fp.bin_str0, 2, 10 ) ^ CONV('0000000001101111000000000101011100000000001010100000000001111101', 2, 10 )) +
BIT_COUNT(CONV(fp.bin_str1, 2, 10 ) ^ CONV('0000000010110001000000001000000000000000011000010000000011110100', 2, 10 )) +
BIT_COUNT(CONV(fp.bin_str2, 2, 10 ) ^ CONV('0000000010010100000000000010101100000000110001000000000011100100', 2, 10 )) +
BIT_COUNT(CONV(fp.bin_str3, 2, 10 ) ^ CONV('0000000011101011000000000001110000000000101100010000000000011001', 2, 10 )) +
BIT_COUNT(CONV(fp.bin_str4, 2, 10 ) ^ CONV('0000000000010000000000000011010100000000111011100000000001001101', 2, 10 )) +
BIT_COUNT(CONV(fp.bin_str5, 2, 10 ) ^ CONV('0000000000101111000000000110101000000000000010100000000000101101', 2, 10 )) +
BIT_COUNT(CONV(fp.bin_str6, 2, 10 ) ^ CONV('0000000000011000000000000101011000000000001010000000000000001011', 2, 10 )) +
BIT_COUNT(CONV(fp.bin_str7, 2, 10 ) ^ CONV('0000000000101011000000000011100100000000000100000000000000111010', 2, 10 )) from mytable fp
This query is extremely slow, for a couple of reasons: mytable has 3M rows, and the fp.bin_strN fields are of VARCHAR type.
As MySQL has a BINARY type, can I execute the same query with the fp.bin_strN fields as BINARY? And how?
I'm confused because, when I changed the fp.bin_strN fields to BINARY, their data appeared as a BLOB, and now I don't know what the query should look like. Should it use CONV?
A 64-bit binary string is the same size as MySQL's BIGINT type (the standard size on modern hardware of a double-precision float or a long integer). Use a BIGINT UNSIGNED to store each field; then you can compare against other bit fields using the b'1010...' syntax instead of CONV():
BIT_COUNT(fp.bin_strN ^ b'0000000001101111000000000101011100000000001010100000000001111101')
This should be much faster, since no string conversion is needed and the hardware is designed to do bit operations on 64-bit values.
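The distance itself is just "XOR, then count the set bits". A Python sketch of the same computation (the function names are hypothetical): each bit string is parsed to an integer once, which is the analogue of storing BIGINT UNSIGNED instead of calling CONV() on every row.

```python
def hamming64(a: str, b: str) -> int:
    # Parse both bit strings, XOR them, count differing bits (like BIT_COUNT(x ^ y)).
    return bin(int(a, 2) ^ int(b, 2)).count("1")


def total_distance(row, query) -> int:
    # Sum over the 8 fields, like the eight BIT_COUNT terms in the SQL.
    return sum(hamming64(x, y) for x, y in zip(row, query))


print(hamming64("0101", "0110"))  # 2
```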

Is there way to match IP with IP+CIDR straight from SELECT query?

Something like
SELECT COUNT(*) AS c FROM BANS WHERE typeid=6 AND (SELECT ipaddr,cidr FROM BANS) MATCH AGAINST 'this_ip';
That way you don't have to fetch all records from the DB first and then match them one by one.
If c > 0, there was a match.
BANS table:
id int auto incr PK
typeid TINYINT (1=hostname, 4=ipv4, 6=ipv6)
ipaddr BINARY(128)
cidr INT
host VARCHAR(255)
DB: MySQL 5
IP and IPv type (4 or 6) is known when querying.
IP is for example ::1 in binary format
BANNED IP is for example ::1/64
Remember that IP addresses are not text, but numeric identifiers. I have a similar situation (we're doing geo-IP lookups), and if you store all your IP addresses as integers (for example, my IP address is 192.115.22.33, so it is stored as 3228767777), then you can look up IPs easily using right-shift operators.
The downside of all these types of lookups is that you can't benefit from indexes and you have to do a full table scan whenever you do a lookup. The above scheme can be improved by storing both the network IP address of the CIDR network (the beginning of the range) and the broadcast address (the end of the range), so for example to store 192.168.1.0/24 you can store two columns:
network broadcast
3232235776, 3232236031
Then, to match an address, you simply do:
SELECT count(*) FROM bans WHERE 3232235876 >= network AND 3232235876 <= broadcast
This would let you store CIDR networks in the database and match them against IP addresses quickly and efficiently by taking advantage of quick numeric indexes.
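Computing the two columns from a CIDR string is a small bit-mask exercise. A Python sketch (cidr_to_range, bans and probe are hypothetical names) that derives the network and broadcast values for 192.168.1.0/24 and then performs the same range test as the SQL above; Python's standard ipaddress module is used only to parse dotted quads.

```python
import ipaddress


def cidr_to_range(cidr: str) -> tuple:
    # Network = address with host bits cleared; broadcast = network with host bits set.
    addr, prefix = cidr.split("/")
    ip = int(ipaddress.IPv4Address(addr))
    host_bits = 32 - int(prefix)
    mask = (0xFFFFFFFF << host_bits) & 0xFFFFFFFF
    network = ip & mask
    return network, network | (~mask & 0xFFFFFFFF)


bans = [cidr_to_range("192.168.1.0/24")]             # [(3232235776, 3232236031)]
probe = int(ipaddress.IPv4Address("192.168.1.100"))  # 3232235876
print(any(lo <= probe <= hi for lo, hi in bans))     # True
```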
Note from discussion below:
MySQL 5.0 includes a range query optimization called "index merge intersect", which can speed up such queries (and avoid full table scans), as long as:
There is a multi-column index that matches exactly the columns in the query, in order. For the query above, the index would need to be (network, broadcast).
All the data can be retrieved from the index. This is true for COUNT(*), but is not true for SELECT * ... LIMIT 1.
MySQL 5.6 includes an optimization called MRR which would also speed up full row retrieval, but that is out of scope of this answer.
For IPv4, you can use:
SET @length = 4;
SELECT INET_NTOA(ipaddr), INET_NTOA(searchaddr), INET_NTOA(mask)
FROM (
  SELECT
    (1 << (@length * 8)) - 1 & ~((1 << (@length * 8 - cidr)) - 1) AS mask,
    CAST(CONV(SUBSTR(HEX(ipaddr), 1, @length * 2), 16, 10) AS DECIMAL(20)) AS ipaddr,
    CAST(CONV(SUBSTR(HEX(@myaddr), 1, @length * 2), 16, 10) AS DECIMAL(20)) AS searchaddr
  FROM ip
) ipo
WHERE ipaddr & mask = searchaddr & mask;
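The mask expression in this query can be checked in Python (prefix_mask and in_block are hypothetical names). The sketch evaluates the same formula, all ones over length * 8 bits with the low (length * 8 - cidr) bits cleared, and then compares the masked addresses as the WHERE clause does. Python integers are unbounded, so unlike MySQL's 64-bit integer arithmetic the same function also works with length = 16 (IPv6).

```python
def prefix_mask(cidr: int, length: int = 4) -> int:
    # Same expression as the SQL: ((1 << bits) - 1) & ~((1 << (bits - cidr)) - 1)
    bits = length * 8
    return ((1 << bits) - 1) & ~((1 << (bits - cidr)) - 1)


def in_block(ipaddr: int, cidr: int, searchaddr: int, length: int = 4) -> bool:
    # WHERE ipaddr & mask = searchaddr & mask
    mask = prefix_mask(cidr, length)
    return (ipaddr & mask) == (searchaddr & mask)


print(hex(prefix_mask(24)))                        # 0xffffff00
print(in_block(0xC0A80100, 24, 0xC0A801FE))        # True
```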
IPv4 addresses, network addresses and netmasks are all UINT32 numbers, presented in human-readable form as "dotted quads". The routing-table code in the kernel performs a very fast bitwise AND when checking whether an address is in a given network (network/netmask). The trick is to store the IP addresses, network addresses and netmasks in your tables as UINT32 (INT UNSIGNED in MySQL), and then perform the same 32-bit bitwise AND for your matching. For example:
SET @test_addr = inet_aton('1.2.3.4');
SET @network_one = inet_aton('1.2.3.0');
SET @network_two = inet_aton('4.5.6.0');
SET @network_netmask = inet_aton('255.255.255.0');
SELECT (@test_addr & @network_netmask) = @network_one AS IS_MATCHED;
+------------+
| IS_MATCHED |
+------------+
| 1 |
+------------+
SELECT (@test_addr & @network_netmask) = @network_two AS IS_NOT_MATCHED;
+----------------+
| IS_NOT_MATCHED |
+----------------+
| 0 |
+----------------+
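The two results above can be reproduced in Python as a sanity check. Here inet_aton is a hypothetical helper that roughly stands in for MySQL's INET_ATON(), built from the standard socket.inet_aton and struct.unpack in network byte order.

```python
import socket
import struct


def inet_aton(ip: str) -> int:
    # Rough stand-in for MySQL INET_ATON(): dotted quad -> unsigned 32-bit integer.
    return struct.unpack("!I", socket.inet_aton(ip))[0]


test_addr = inet_aton("1.2.3.4")
network_one = inet_aton("1.2.3.0")
network_two = inet_aton("4.5.6.0")
network_netmask = inet_aton("255.255.255.0")

print(int((test_addr & network_netmask) == network_one))  # 1
print(int((test_addr & network_netmask) == network_two))  # 0
```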
Generating IP Address Ranges as Integers
If your database doesn't support fancy bitwise operations, you can use a simpler integer-based approach.
The following example is using PostgreSQL:
select (cast(split_part(split_part('4.0.0.0/8', '/', 1), '.', 1) as bigint) * (256 * 256 * 256) +
cast(split_part(split_part('4.0.0.0/8', '/', 1), '.', 2) as bigint) * (256 * 256 ) +
cast(split_part(split_part('4.0.0.0/8', '/', 1), '.', 3) as bigint) * (256 ) +
cast(split_part(split_part('4.0.0.0/8', '/', 1), '.', 4) as bigint))
as network,
(cast(split_part(split_part('4.0.0.0/8', '/', 1), '.', 1) as bigint) * (256 * 256 * 256) +
cast(split_part(split_part('4.0.0.0/8', '/', 1), '.', 2) as bigint) * (256 * 256 ) +
cast(split_part(split_part('4.0.0.0/8', '/', 1), '.', 3) as bigint) * (256 ) +
cast(split_part(split_part('4.0.0.0/8', '/', 1), '.', 4) as bigint)) + cast(
pow(2, 32 - cast(split_part('4.0.0.0/8', '/', 2) as bigint)) - 1 as bigint
) as broadcast;
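The same computation in Python, using only multiplication and addition as the PostgreSQL query does (cidr_bounds is a hypothetical name). It assumes the address part is already the network address, and computes the block size as 2^(32 - prefix), which holds for any prefix length, not only multiples of 8.

```python
def cidr_bounds(cidr: str) -> tuple:
    # No bitwise operators: network = a*256^3 + b*256^2 + c*256 + d.
    addr, prefix = cidr.split("/")
    a, b, c, d = (int(part) for part in addr.split("."))
    network = a * 256**3 + b * 256**2 + c * 256 + d
    # Block size is 2^(32 - prefix); assumes addr has no host bits set.
    broadcast = network + 2 ** (32 - int(prefix)) - 1
    return network, broadcast


print(cidr_bounds("4.0.0.0/8"))  # (67108864, 83886079)
```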
Hmmm. You could build a table of the CIDR masks, join it, and then compare the IP ANDed with the mask (& in MySQL) against the ban block's ipaddr. Would that do what you want?
If you don't want to build a mask table, you could compute the mask as -1 << (x - cidr), with x = 64 or 32 depending on the address type.