I would like to create a MySQL query to find the longest match (of a given ip address in quad-dotted format) that is present in a table of subnets.
Ultimately, I'd like to create a LEFT JOIN that will display every quad-dotted ip address in one table joined with its longest match in another table. I don't want to create any temporary tables or structure it as a nested query.
I'm somewhat of a MySQL newbie, but what I'm thinking is something like this:
SELECT `ip_address`
LEFT JOIN ON
SELECT `subnet_id`
FROM `subnets_table`
WHERE (`maximum_ip_value` - `minimum_ip_value`) =
LEAST(<list of subnet intervals>)
WHERE INET_ATON(<given ip address>) > `minimum_ip_value`
AND INET_ATON(<given ip address>) < `maximum_ip_value`;
Here, minimum_ip_value and maximum_ip_value are the lowest and highest ip addresses possible in a given subnet, expressed as decimal integers. For example, for the subnet 172.16.0.0/16:
minimum_ip_value = 2886729728 (or 172.16.0.0)
maximum_ip_value = 2886795263 (or 172.16.255.255)
<list of subnet intervals> contains all intervals in subnets_table where <given ip address> is between minimum_ip_value and maximum_ip_value.
If more than one interval contains <given ip address>, then the smallest interval (i.e., the smallest subnet, or the most specific, "longest" match) is the one joined.
Ultimately, all I really want is the subnet_id value that corresponds with that interval.
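To make the minimum_ip_value / maximum_ip_value definitions concrete, here is a small Python sketch that derives both values from CIDR notation. It relies on the fact that a /p subnet holds 2^(32 - p) addresses, so the minimum is the address with its host bits cleared and the maximum is that minimum plus the subnet size minus one. The function name subnet_bounds is hypothetical; the integer it works with is the same value MySQL's INET_ATON() returns for a dotted quad.

```python
import ipaddress

def subnet_bounds(cidr):
    """Return (minimum_ip_value, maximum_ip_value) for an IPv4 CIDR subnet."""
    address, prefix = cidr.split("/")
    size = 2 ** (32 - int(prefix))      # number of addresses in the subnet
    # Integer form of the dotted quad, the same value INET_ATON() gives in MySQL
    value = int(ipaddress.IPv4Address(address))
    minimum = value - (value % size)    # clear the host bits
    return minimum, minimum + size - 1

print(subnet_bounds("172.16.0.0/16"))  # (2886729728, 2886795263)
```

The printed pair matches the 172.16.0.0/16 example above.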
So my questions are:
1) Can I use the LEAST() function with an arbitrary number of parameters? I'd like to compare every row of subnets_table, or more specifically, every row's interval between minimum_ip_value and maximum_ip_value, and select the smallest interval.
2) Can I perform all of this computation within a LEFT JOIN query? I'm fine with any suggestions that will be fast, encapsulated, and avoid repetitive queries of the same data.
I'm wondering if this is even possible to perform in a single query (i.e., without querying the subnets table for each ip address), but I don't know enough to rule it out. Please advise if this looks like it won't work, so I can try another angle.
Thanks.
After some research and trial & error, I see that there are a few issues with the prototype query above:
The LEAST() function compares only the fixed list of arguments written into the query, so it can't compare a value across every row of a table. As per my original question, I want a function that works over an arbitrary number of values, i.e. every row in a table. In MySQL that is a different function: the aggregate MIN().
Aggregate functions such as MIN() are evaluated after the FROM/JOIN and WHERE clauses of any given query. Therefore, I can't JOIN on the MIN() of a set of values, because the MIN() doesn't exist yet at the time the JOIN is performed.
The only way I could see to solve this issue was to perform two separate queries: one with the MIN(), performed first, and another with the JOIN, performed on the results of the first query. For a table with n rows, this meant the work grew roughly quadratically (on the order of n^2) instead of linearly. That wasn't acceptable.
To work around the issue, I wrote a new script that modifies the database before any of these queries are ever performed. Each subnet is given its own "bucket" of ip values, and all values in that range map to that subnet. If a more specific (i.e., smaller) subnet overlaps a less specific (i.e., larger) subnet, then the more specific range is mapped only to the smaller subnet, and the larger subnet retains only the values from the less specific range. Now any given ip address falls into only one "bucket", and maps to only one subnet, which is its most specific match. I can JOIN on this match and never have to worry about the MIN() function.
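For comparison with the bucket approach above, a common single-query pattern for "most specific match" is a self anti-join. Each address is joined to every subnet that contains it (s1). A second copy of the subnets table (s2) is then LEFT JOINed to look for a strictly smaller subnet that also contains the address, and only rows where no such s2 exists are kept. This needs no temporary tables and no nested SELECT. The sketch below runs the query in SQLite through Python's sqlite3 module so it is self-contained. ip_table and ip_value are hypothetical names; in MySQL you would compute ip_value with INET_ATON(ip_address). If two subnets of identical size both contain an address, both rows are returned. Performance on large tables is not measured here.

```python
import ipaddress
import sqlite3

# s1: any subnet containing the address; s2: a strictly smaller one.
# Keeping rows where s2 is NULL leaves only the smallest (longest) match,
# and addresses with no match at all still appear, with subnet_id NULL.
LONGEST_MATCH_SQL = """
SELECT i.ip_address, s1.subnet_id
FROM ip_table AS i
LEFT JOIN subnets_table AS s1
       ON i.ip_value BETWEEN s1.minimum_ip_value AND s1.maximum_ip_value
LEFT JOIN subnets_table AS s2
       ON i.ip_value BETWEEN s2.minimum_ip_value AND s2.maximum_ip_value
      AND (s2.maximum_ip_value - s2.minimum_ip_value)
          < (s1.maximum_ip_value - s1.minimum_ip_value)
WHERE s2.subnet_id IS NULL
ORDER BY i.ip_value
"""

def longest_matches(subnets, addresses):
    """subnets: (subnet_id, cidr) pairs; addresses: dotted-quad strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subnets_table (subnet_id INTEGER PRIMARY KEY, "
                 "minimum_ip_value INTEGER, maximum_ip_value INTEGER)")
    # Hypothetical address table; ip_value stands in for INET_ATON(ip_address)
    conn.execute("CREATE TABLE ip_table (ip_address TEXT, ip_value INTEGER)")
    for subnet_id, cidr in subnets:
        net = ipaddress.IPv4Network(cidr)
        conn.execute("INSERT INTO subnets_table VALUES (?, ?, ?)",
                     (subnet_id, int(net.network_address), int(net.broadcast_address)))
    for addr in addresses:
        conn.execute("INSERT INTO ip_table VALUES (?, ?)",
                     (addr, int(ipaddress.IPv4Address(addr))))
    rows = conn.execute(LONGEST_MATCH_SQL).fetchall()
    conn.close()
    return rows

print(longest_matches([(1, "172.16.0.0/16"), (2, "172.16.5.0/24")],
                      ["172.16.5.9", "172.16.9.1"]))
# [('172.16.5.9', 2), ('172.16.9.1', 1)]
```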
I'm working with RNA probe data. For each probe, a set of RNA sequences is supplied, typically 20-40 sequences, and each sequence within a set is about 30 characters long.
When populating the database with NEW probes, a new set of sequences is supplied and associated with the new probe.
We need to make sure that the new set of sequences is not the same as one that already exists in the database.
The first test would be the number of sequences (the example set had 20). That is a simple test.
If the set sizes are equal, then we need to check each item within a set. However, the order of the items within each set is irrelevant.
Question is, does MySQL have an inbuilt command to check for equality between two sets, where the order of each item, in each set, is irrelevant?
The short answer to your question is "no": there's no built-in command in MySQL to check two sets for equality. In some other databases, there is an INTERSECT or MINUS/EXCEPT operator that would do pretty much what you are asking for (MySQL itself only added INTERSECT and EXCEPT in version 8.0.31). I've assumed here that sequences within a probe are unique. The SQL below can probably be adapted to do the job; I've prepared and tested a DBFiddle sample. Basically, it joins all the sequences in the new probe to all the sequences in the existing probes, then checks whether the number of records returned from the join is the same as the total count of records in the existing probe. If the counts match, and the two sets are the same size (the first test from the question), then the new probe is a duplicate. The query returns the id of the existing duplicate probe. HTH.
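SELECT x.probe,
       COUNT(*) AS newrecs,
       c.proberecs
FROM (SELECT a.probe,
             a.rnaseq
      FROM rnaprobes a
      JOIN newprobe b
        ON a.rnaseq = b.rnaseq) x
JOIN (SELECT probe,
             COUNT(*) AS proberecs
      FROM rnaprobes
      GROUP BY probe) c
  ON x.probe = c.probe
GROUP BY x.probe, c.proberecs
-- every sequence of the existing probe was matched...
HAVING COUNT(*) = c.proberecs
-- ...and the new probe has no extra sequences
   AND c.proberecs = (SELECT COUNT(*) FROM newprobe)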
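To try the query without a DBFiddle, here is a Python sketch that runs it in SQLite through the sqlite3 module. The sample sequences are short made-up strings rather than real ~30-character probes, and the function name find_duplicate_probes is hypothetical. It includes one condition beyond the count match: the new probe must have as many sequences as the existing one. Without it, a new set containing all of an existing probe's sequences plus some extras would also be reported as a duplicate.

```python
import sqlite3

DUPLICATE_PROBE_SQL = """
SELECT x.probe, COUNT(*) AS newrecs, c.proberecs
FROM (SELECT a.probe, a.rnaseq
      FROM rnaprobes a
      JOIN newprobe b ON a.rnaseq = b.rnaseq) x
JOIN (SELECT probe, COUNT(*) AS proberecs
      FROM rnaprobes
      GROUP BY probe) c ON x.probe = c.probe
GROUP BY x.probe, c.proberecs
HAVING COUNT(*) = c.proberecs                      -- all existing sequences matched
   AND c.proberecs = (SELECT COUNT(*) FROM newprobe)  -- and the sets are the same size
"""

def find_duplicate_probes(existing, new_seqs):
    """existing: {probe_id: [sequences]}; new_seqs: sequences of the new probe.
    Returns the ids of existing probes with exactly the same set, in any order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rnaprobes (probe INTEGER, rnaseq TEXT)")
    conn.execute("CREATE TABLE newprobe (rnaseq TEXT)")
    conn.executemany("INSERT INTO rnaprobes VALUES (?, ?)",
                     [(probe, seq) for probe, seqs in existing.items() for seq in seqs])
    conn.executemany("INSERT INTO newprobe VALUES (?)", [(seq,) for seq in new_seqs])
    rows = conn.execute(DUPLICATE_PROBE_SQL).fetchall()
    conn.close()
    return sorted(row[0] for row in rows)

existing = {1: ["ACGU", "GGCA", "UUAG"], 2: ["ACGU", "GGCA"]}
print(find_duplicate_probes(existing, ["UUAG", "ACGU", "GGCA"]))  # [1]
```

Probe 2 is not reported even though both of its sequences appear in the new set, because the new set has three sequences.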
SQL SELECT query for multiple values and ranges in row data of a given column.
Problem description:
Server: MySQL
Database: Customer
Table: Lan
Column: Allowed VLAN (in the range 1-4096)
One row has data as below in the column Allowed VLAN:
180,181,200,250-499,550-811,826-mismatched
I need a SELECT statement that returns the row when the Allowed VLAN column includes a given number, for instance 600. The number counts as included if it is one of the comma-separated values, falls within one of the ranges such as "250-499" or "550-811", or is the starting number of the "826-mismatched" entry.
SELECT * WHERE `Allowed VLAN`='600' OR `Allowed VLAN` LIKE '%600%' OR (`Allowed VLAN` BETWEEN '1-1' AND '1-4096');
I could not figure out how to handle the ranges in a WHERE clause. I have solved the problem in PHP using explode() and similar string-splitting functions, but I think there must be a pure SQL SELECT solution.
I would appreciate any help.
I would highly recommend normalizing your data. Storing a comma-separated list of items in a single row is rarely a good idea.
Assuming you can make such a change, then something like this should work for you (although you could consider storing your ranges in different columns to make it even easier):
create table lan (allowedvan varchar(100));
insert into lan values
  ('180'),('181'),('200'),('250-499'),('550-811'),('826-mismatched');
select *
from lan
where allowedvan = '600'
   or
   (instr(allowedvan,'-') > 0 and
    -- cast the bounds so they compare as numbers, not strings ('1000' < '250' as strings)
    600 >= cast(left(allowedvan,instr(allowedvan,'-')-1) as unsigned) and
    600 <= cast(right(allowedvan,length(allowedvan)-instr(allowedvan,'-')) as unsigned)
   );
This uses INSTR to determine whether the value contains a range (a hyphen) and then uses LEFT and RIGHT to extract the two ends of the range. Do not use LIKE, because that could return inaccurate results (for example, '1600' LIKE '%600%' is true).
If you are unable to alter your database, then look into using a split function (there are several posts about it on SO) and then apply something similar to the method above.
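Following the suggestion to normalize and store the range bounds in separate columns, here is a Python sketch using SQLite: each comma-separated item becomes one row with numeric vlan_start and vlan_end columns, so the lookup is a plain BETWEEN with no string parsing at query time. The table lan_vlan, its columns, and the lan_id values are hypothetical, and treating "826-mismatched" as the single VLAN 826 is an assumption about what that entry means.

```python
import sqlite3

def parse_allowed_vlans(text):
    """Split e.g. '180,250-499,826-mismatched' into (start, end) pairs.
    Assumption: a non-numeric upper bound such as 'mismatched' is treated
    as the single VLAN given by the start value."""
    ranges = []
    for item in text.split(","):
        start, _, end = item.strip().partition("-")
        low = int(start)
        high = int(end) if end.isdigit() else low
        ranges.append((low, high))
    return ranges

def build_lan_db(lans):
    """lans: iterable of (lan_id, allowed_vlan_text) pairs."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical normalized table: one row per single value or range
    conn.execute("CREATE TABLE lan_vlan (lan_id INTEGER, vlan_start INTEGER, vlan_end INTEGER)")
    for lan_id, allowed in lans:
        conn.executemany("INSERT INTO lan_vlan VALUES (?, ?, ?)",
                         [(lan_id, low, high) for low, high in parse_allowed_vlans(allowed)])
    return conn

def lans_allowing(conn, vlan):
    """Return the lan_ids whose allowed VLANs include the given number."""
    cur = conn.execute("SELECT DISTINCT lan_id FROM lan_vlan "
                       "WHERE ? BETWEEN vlan_start AND vlan_end ORDER BY lan_id", (vlan,))
    return [row[0] for row in cur]

conn = build_lan_db([(1, "180,181,200,250-499,550-811,826-mismatched")])
print(lans_allowing(conn, 600))  # [1]
```

Because the bounds are stored as integers, 1600 can never match the way it would with LIKE '%600%'.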
So I'm building a bit of an API where users can query my database with read-only access. However, I want to block certain fields, specifically IP addresses. I'm currently using preg_replace in PHP to match and switch out IPs, but I feel like someone could get around that with some clever string-splitting MySQL functions.
Is there a way I can block/replace/obfuscate this particular field for this read-only MySQL user?
The record would be at (table.field):
`TrafficIp`.`Value`
An example query they might use would be:
SELECT COUNT(*) Hits, Value IpAddress
FROM TrafficIp
INNER JOIN Traffic
ON Traffic.IpId = TrafficIp.Id
GROUP BY Value
ORDER BY Hits DESC
How would I bait and switch?
You could create a view of your table that omits the field with the IP address, and let API users query that view, but not the underlying table.
Really, instead of trying to do "damage control" on the back end of the query, your API should filter queries before they ever reach the database. It is highly inadvisable to pass raw SQL queries from the outside world straight through to your database.
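A minimal sketch of the view idea, run in SQLite through Python's sqlite3 module so it is self-contained. The view name TrafficIpPublic and the sample rows are hypothetical. SQLite has no user accounts, so the permission half is not shown: in MySQL you would create the view and grant the API user SELECT on the view only, with no privileges on TrafficIp itself. The example query from the question still works against the view, grouped by the IP's Id instead of its Value.

```python
import sqlite3

def build_traffic_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TrafficIp (Id INTEGER PRIMARY KEY, Value TEXT);
        CREATE TABLE Traffic (Id INTEGER PRIMARY KEY, IpId INTEGER);
        -- Hypothetical public view: exposes the Id but not the address in Value
        CREATE VIEW TrafficIpPublic AS SELECT Id FROM TrafficIp;
        INSERT INTO TrafficIp VALUES (1, '203.0.113.7'), (2, '198.51.100.20');
        INSERT INTO Traffic (IpId) VALUES (1), (1), (2);
    """)
    return conn

def hits_per_ip(conn):
    """The question's hit-count query, rewritten against the view."""
    return conn.execute("""
        SELECT COUNT(*) AS Hits, TrafficIpPublic.Id AS IpId
        FROM TrafficIpPublic
        INNER JOIN Traffic ON Traffic.IpId = TrafficIpPublic.Id
        GROUP BY TrafficIpPublic.Id
        ORDER BY Hits DESC
    """).fetchall()

conn = build_traffic_db()
print(hits_per_ip(conn))  # [(2, 1), (1, 2)]
```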
I have a table with columns latitude and longitude. In most records the values have many decimal places, e.g. -81.7770051972473, but on rare occasions a record has only two, e.g. -81.77.
How do I find duplicates and remove one of the duplicates for only the records that extend beyond two decimal places?
Using some creative substring, float, and charindex logic, I came up with this:
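delete l1
from latlong l1
inner join latlong l2
  on l1.id <> l2.id
  -- l2 truncated to two decimals (e.g. -81.7770051972473 -> -81.77) must equal l1
  and l1.latitude = cast(substring(cast(l2.latitude as char), 1, instr(cast(l2.latitude as char), '.') + 2) as decimal(10,2))
  and l1.longitude = cast(substring(cast(l2.longitude as char), 1, instr(cast(l2.longitude as char), '.') + 2) as decimal(10,2))
  -- skip exact copies so both rows of an identical pair are not deleted
  and (l1.latitude <> l2.latitude or l1.longitude <> l2.longitude)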
Before running, try select * in lieu of delete l1 first to make sure you're deleting the right rows.
I should note that this worked on SQL Server using functions I know exist in MySQL, but I wasn't able to test it against a MySQL instance, so it may need a little tweaking. For example, in SQL Server I used CHARINDEX instead of INSTR; they do the same job, but the argument order is reversed: CHARINDEX(substring, string) versus INSTR(string, substring).
Not sure how to do that purely in SQL.
I have used scripting languages like PHP or CFML for similar needs: build a query to pull the records, loop over the record set, and perform some comparison. If it matches, then VERY CAREFULLY call another function, passing in the record ID, to delete the record. I would probably even leave the record in the table and instead set another column, such as isDeleted.
If you are more ambitious than I am, these threads look close to what you want:
Deleting Duplicates in MySQL
finding multi column duplicates mysql
Using an external programming language (Perl, PHP, Java, Assembly...):
Select * from database
For each row, select * from database where newLat >= round(oldLat,2) and newLat < round(oldLat,2) + .01 and //same criteria for longitude
Keep one of them based on whatever criteria you choose. If lowest primary key, sort by that and skip the first result.
Delete everything else.
Repeat for the next row, skipping any records you have already deleted.
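The steps above might look like this in Python, as one example of an external language. This is a sketch under assumptions: the table is named latlong with columns id, latitude and longitude (names borrowed from the first answer), two records count as duplicates when both coordinates agree after truncating to two decimal places, and the lowest id is kept. As a design choice it replaces the per-row second query with a single pass that groups rows in a dictionary keyed by the truncated pair. Truncation uses Decimal on the float's repr, so -81.7770051972473 becomes -81.77 rather than rounding to -81.78.

```python
import sqlite3
from decimal import Decimal, ROUND_DOWN

def bucket(value):
    """Truncate toward zero at two decimal places: -81.7770051972473 -> -81.77."""
    return Decimal(repr(value)).quantize(Decimal("0.01"), rounding=ROUND_DOWN)

def delete_near_duplicates(conn):
    """Keep the lowest id in each two-decimal (latitude, longitude) bucket,
    delete the other rows, and return the deleted ids."""
    kept = {}
    to_delete = []
    rows = conn.execute("SELECT id, latitude, longitude FROM latlong ORDER BY id")
    for row_id, lat, lon in rows.fetchall():
        key = (bucket(lat), bucket(lon))
        if key in kept:
            to_delete.append(row_id)
        else:
            kept[key] = row_id
    conn.executemany("DELETE FROM latlong WHERE id = ?", [(i,) for i in to_delete])
    conn.commit()
    return to_delete
```

Following the advice above, it is worth printing to_delete (or marking an isDeleted column) before letting the DELETE run on real data.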
If for some reason you want to identify every record with more than two decimal places:
select * from database where lat != round(lat,2) or `long` != round(`long`,2)
I have the following query:
SELECT Flights.flightno,
Flights.timestamp,
Flights.route
FROM Flights
WHERE Flights.adshex = '400662'
ORDER BY Flights.timestamp DESC
This returns every flight for that aircraft, newest first.
However, I cannot use a simple GROUP BY because, for example, BCS6515 appears again much later in the list, and I only want to "condense" identical rows that are next to each other in this list.
In the output I want, BCS6515 would still appear twice, because its two groups of rows are not adjacent in the first query's results.
That is why a GROUP BY flightno will not work.
I don't think there's a good way to do so in SQL without a column to help you. At best, I'm thinking it would require a subquery that would be ugly and inefficient. You have two options that would probably end up with better performance.
One would be to code the logic yourself to prune the results. This can also be done with a PROCEDURE clause on a SELECT statement if you want to handle it on the database server side, although that clause was deprecated in MySQL 5.7 and removed in MySQL 8.0.
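Pruning the results yourself amounts to collapsing runs of adjacent rows that share a flight number. A minimal Python sketch, assuming the rows arrive in the query's ORDER BY timestamp DESC order as (flightno, timestamp, route) tuples: itertools.groupby groups only consecutive equal keys, which is exactly the "adjacent rows" behaviour wanted, so a flight number that reappears later starts a new group. The function name and sample rows are made up.

```python
from itertools import groupby

def condense_adjacent(rows):
    """Collapse each run of adjacent rows with the same flightno into its
    first row (the newest one, given ORDER BY timestamp DESC)."""
    return [next(run) for _, run in groupby(rows, key=lambda row: row[0])]

rows = [
    ("BCS6515", "2013-05-02 10:00:00", "A-B"),
    ("BCS6515", "2013-05-02 09:30:00", "A-B"),
    ("BCS6516", "2013-05-01 20:00:00", "B-A"),
    ("BCS6515", "2013-05-01 10:00:00", "A-B"),
]
for row in condense_adjacent(rows):
    print(row)  # BCS6515 appears twice: once per non-adjacent run
```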
Another would be to either use other information already in the table or add new information to the table for this purpose. Do you currently have a column whose value differs between the separate runs of BCS6515 rows?
If not, and if I'm making correct assumptions about the data in your table, there will be only one flight with a given number per day, though the flight number is reused on other days for a flight with the same start/end points and times (e.g. the 10 a.m. from NRT to DTW has the same flight number every day). If all of a flight's timestamps always fell on the same day, then you could use DATE(timestamp) in the GROUP BY. However, that doesn't allow for overnight flights. Thus, you'll probably need something such as a departure date to group by, so that all the rows belonging to the same physical flight are identified as such.
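Under the answer's assumption that all of a flight's rows fall on one day, the per-day grouping could look like this sketch. It runs in SQLite through Python's sqlite3 module and assumes timestamp is stored as 'YYYY-MM-DD HH:MM:SS' text; in MySQL, DATE() works the same way on a DATETIME column, while an integer Unix time would first need FROM_UNIXTIME(). The sample rows and routes are made up. As the answer notes, an overnight flight would be split across two dates.

```python
import sqlite3

def flights_per_day(rows):
    """rows: (flightno, timestamp, route, adshex) tuples.
    Returns one row per flight number, date and route, newest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Flights (flightno TEXT, timestamp TEXT, route TEXT, adshex TEXT)")
    conn.executemany("INSERT INTO Flights VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT flightno, DATE(timestamp) AS flight_date,
               MAX(timestamp) AS last_seen, route
        FROM Flights
        WHERE adshex = '400662'
        GROUP BY flightno, DATE(timestamp), route
        ORDER BY last_seen DESC
    """).fetchall()
    conn.close()
    return result

sample = [
    ("BCS6515", "2013-05-02 10:00:00", "A-B", "400662"),
    ("BCS6515", "2013-05-02 09:30:00", "A-B", "400662"),
    ("BCS6516", "2013-05-01 20:00:00", "B-A", "400662"),
    ("BCS6515", "2013-05-01 10:00:00", "A-B", "400662"),
]
for row in flights_per_day(sample):
    print(row)
```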
GROUP BY does not work because the timestamp value is different for the two BCS6515 records.
It will work only if you leave the timestamp column out:
SELECT Flights.flightno,
       Flights.route
FROM Flights
WHERE Flights.adshex = '400662'
GROUP BY Flights.flightno, Flights.route