Extract Data Between Parenthesis That's Always Different Lengths - mysql

I have run into a problem and I cannot seem to find the correct solution anywhere. I'd like to extract data from a column that is never going to be the same length but, will always be in parenthesis. I've tried different SUBSTR and LOCATE statements to no avail.
Table: FiguresLog
|UpdateDate| |Description|
|2014-01-01| |(10.0.600.1) Various descriptions follow|
|2014-01-02| |(192.168.10.100) Various descriptions follow|
I need to be able to extract (create a new table/field) containing the IP Addresses within the parenthesis and as I stated, they will always be a different length.

You can do this with regular expressions. But, if there is only one parenthetical expression in the string, this should work for you:
select substring_index(substring_index(description, ')', 1), '(', -1) as IpAddress

You can do the whole thing using LOCATE and SUBSTR. Because of how SUBSTR takes position and length, the math gets a little funky. Hopefully this example makes it clear:
SELECT
SUBSTR(text, ip_start, ip_len) AS ip_addr
FROM
(
SELECT text,
(LOCATE('(', text) + 1) AS ip_start,
(LOCATE(')', text) - (LOCATE('(', text) + 1)) AS ip_len
FROM test
) temp;
Notice that (LOCATE('(', text) + 1) gets repeated. The + 1 is so we don't include the parenthesis in the substring.
The actual calculation for ip_len is ip_len = end_paren_pos - ip_start but we cannot create and select from ip_start in the same query.
Example in action: http://sqlfiddle.com/#!2/2845e/3

Related

Mysql: extract a string from field between delimiters (backwards)

I have a Column 'ACCOUNT_NUMBER' from a table 'BankingActivity' which contains data as follow :
example:
ManualBanking-BankDeposit-350-1006590343--INTERNAL_A
or
MyPayCard-MyPayDeposit-620-989228234--TL
I need to extract the number '1006590343' or '989228234'
Initially i execute the following query:
select substr( `BankingActivity`.`ACCOUNT_NUMBER`,(
locate( '--', `BankingActivity`.`ACCOUNT_NUMBER` ) - 9 ),9 ) * 1
from BankingActivity
Which works fine if the length of the string does not exceed 9 digits. Over 9 digits, I obviously have issues and can not get the full string.
How can i look backwards for the delimiter '--' and then extract the value between the '--' delimiter and the previous '-' delimiter?
I tried with some Regex but I am not familiar enough with it to get a correct result.
Try
SELECT regexp_substr(
regexp_substr(acct, '-\\d+--'), '\\d+')
FROM (
SELECT 'ManualBanking-BankDeposit-350-1006590343--INTERNAL_A' as acct
UNION
SELECT 'MyPayCard-MyPayDeposit-620-989228234--TL'
) accounts;
The inner regexp_substr extracts a substring that begins with a dash followed by 1 or more digits and ends with two dashes. That would be e. g. '-1006590343--'. From this, the outer regexp_substr extracts all consecutive digits, that is '1006590343'.
More detailed information about regular expressions in MySQL can be found in the documentation.
If I have understood your question correctly then you can try something like this -
select SUBSTRING_INDEX(SUBSTRING_INDEX('ManualBanking-BankDeposit-350-1006590343--INTERNAL_A', '-' ,-3), '--', 1);
select SUBSTRING_INDEX(SUBSTRING_INDEX('MyPayCard-MyPayDeposit-620-989228234--TL', '-' ,-3), '--', 1);
This is probably a job for SUBSTRING_INDEX().
Check it out. Fiddle here.
SET #s = 'ManualBanking-BankDeposit-350-1006590343--INTERNAL_A';
SELECT SUBSTRING_INDEX(#s, '-', -3);
This splits your string on '-'. It takes everything after the third '-' delimiter from the end, and gives you back 1006590343--INTERNAL_A.
Then we use SUBSTRING_INDEX() again on that.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(#s, '-', -3), '-', 1);
Lo and behold, this gets us 1006590343.
But. This is a brittle way to do it. MySQL's string processing isn't easy to program in detailed ways. This solution doesn't take into account things like missing dashes at the end of the string. Garbage in, garbage out. Use a host language like C# / php / nodejs / Java etc to do this kind of string analysis if you want it to be super-robust for real world data.

Use of REGEXP_SUBSTR to get date values from string

I'm looking for the REGEXP_SUBSTR code that gets dates like format '06-11-2014 - 05-12-2014' or format '01/11/2019 - 30/11/2019' from a string. The first date being the startdate and the second date being the enddate. It would be extremely helpful to understand how the REGEXP_SUBSTR works in this case and also why. I want to get the string with the two dates, but then I want both dates to be in their own column.
A record look likes this:
Medium - nl (06-11-2014 - 05-12-2014) ruimte: Standaard (5.000 MB).
Although text can be shorter or longer the two dates between brackets are always there.
The code below gets the first one, but only if it's with '-'. I want both '-' and '/' variants displayed.
REGEXP_SUBSTR(description, '[0-9][0-9][-[0-9][0-9]-[0-9][0-9][0-9][0-9]')
Thanks a lot for any and all help.
Since you are using MySQL 8+, it means you also have access to the REGEXP_REPLACE function, which is suitable for isolating the portion of the string which contains the two dates. In the CTE below, I isolate the date string, then in a subquery on that CTE, I fish out the two dates in separate columns using SUBSTRING_INDEX.
WITH cte AS (
SELECT
text,
REGEXP_REPLACE(text, '^.*\(([0-9]{2}-[0-9]{2}-[0-9]{4} - [0-9]{2}-[0-9]{2}-[0-9]{4})\).*$', '$1') AS dates
FROM yourTable
)
SELECT
text,
SUBSTRING_INDEX(dates, ' - ', 1) AS first_date,
SUBSTRING_INDEX(dates, ' - ', -1) AS second_date
FROM cte;
Demo
Here is an explanation of the regex pattern used:
^ from the start of the string
.* match any content, until hitting
\( '(' which is followed by
( (capture what follows)
[0-9]{2}-[0-9]{2}-[0-9]{4} a single date
- -
[0-9]{2}-[0-9]{2}-[0-9]{4} another single date
) (stop capture)
\) ')'
.* match the remainder of the content
$ end of the string
Note that we include a pattern which matches the entire input, which is a requirement since we want to use a capture group. Also, note that REGEXP_SUBSTR might have been viable here, but it could run the risk that you get false positives, in the event that a date could appear elsewhere besides the terms in parentheses.

how to handle white spaces in sql

I want to write an SQL query that will fetch all the students who live in a specific Post Code. Following is my query.
SELECT * FROM `students` AS ss WHERE ss.`postcode` LIKE 'SE4 1NA';
Now the issue is that in database some records are saved without the white space is postcode, like SE41NA and some may also be in lowercase, like se41na or se4 1na.
The query gives me different results based on how the record is saved. Is there any way in which I can handle this?
Using regexp is one way to do it. This performs a case insensitive match by default.
SELECT * FROM students AS ss
WHERE ss.postcode REGEXP '^SE4[[:space:]]?1NA$';
[[:space:]]? matches an optional space character.
REGEXP documentation MySQL
Whether case matters depends on the collation of the string/column/database/server. But, you can get around it by doing:
WHERE UPPER(ss.postcode) LIKE 'SE4%1NA'
The % will match any number of characters, including none. It is a bit too general for what you might really need -- but it should work fine in practice.
The more important issue is that your database does not validate the data being put into it. You should fix the application so the postal codes are correct and follow a standard format.
Use a combination of UPPER and REPLACE.
SELECT *
FROM students s
WHERE UPPER(REPLACE(s.postcode, ' ', '')) LIKE '%SE41NA%'
SELECT * FROM students AS ss
WHERE UPPER(REPLACE(ss.postcode, ' ', '')) = 'SE41NA' ;
SELECT *
FROM students AS ss
WHERE UPPER(ss.postcode) LIKE SELECT REPLACE(UPPER('SE4 1NA'), ' ', '%'); ;
I propose using the spaces replaced with the'%' placeholder. Also transform the case to upper for both sides of the LIKE operator

Finding number of occurence of a specific string in MYSQL

Consider the string "55,33,255,66,55"
I am finding ways to count number of occurence of a specific characters ("55" in this case) in this string using mysql select query.
Currently i am using the below logic to count
select CAST((LENGTH("55,33,255,66,55") - LENGTH(REPLACE("55,33,255,66,55", "55", ""))) / LENGTH("55") AS UNSIGNED)
But the issue with this one is, it counts all occurence of 55 and the result is = 3,
but the desired output is = 2.
Is there any way i can make this work correct? please suggest.
NOTE : "55" is the input we are giving and consider the value "55,33,255,66,55" is from a database field.
Regards,
Balan
You want to match on ',55,', but there's the first and last position to worry about. You can use the trick of adding commas to the frot and back of the input to get around that:
select LENGTH('55,33,255,66,55') + 2 -
LENGTH(REPLACE(CONCAT(',', '55,33,255,66,55', ','), ',55,', 'xxx'))
Returns 2
I've used CONCAT to pre- and post-pend the commas (rather than adding a literal into the text) because I assume you'll be using this on a column not a literal.
Note also these improvements:
Removal of the cast - it is already numeric
By replacing with a string one less in length (ie ',55,' length 4 to 'xxx' length 3), the result doesn't need to be divided - it's already the correct result
2 is added to the length because of the two commas added front and back (no need to use CONCAT to calculate the pre-replace length)
Try this:
select CAST((LENGTH("55,33,255,66,55") + 2 - LENGTH(REPLACE(concat(",","55,33,255,66,55",","), ",55,", ",,"))) / LENGTH("55") AS UNSIGNED)
I would do an sub select in this sub select I would replace every 255 with some other unique signs and them count the new signs and the standing 55's.
If(row = '255') then '1337'
for example.

Converting a upper case database to proper case

I am new to SQL and I have several large database with upper case first and last names that I need to convert to proper case in SQL sever 2008.
I am using the following to do this:
update database
Set FirstNames = upper(substring(FirstNames, 1, 1))
+ lower(substring(FirstNames, 2, (len(FirstNames) - 1) ))
I was wondering if there was any way to adapt this so that a field with two first names is also updated (currently I make the change and then go through and manually change the second name).
I have looked over the other answers in this field and they all seem quit long, compared to the query above.
Also is there any way to assist with converting the Mc suranmes ( I will manually change the others)? MCDONALD to McDonald, again I am just using the about query but replacing the FirstNames with LastName.
This is probably best done outside of SQL. However, if there is a requirement to do it on the server or if speed isn't an issue (because it will be an issue so you need to figure out if you care), the way you are going about it is probably the best way of doing so. If you want, you could create a UDF that puts all of the logic in one area.
Here is some code I came across (with attribution and more information below it):
CREATE FUNCTION dbo.fCapFirst(#input NVARCHAR(4000)) RETURNS NVARCHAR(4000)
AS
BEGIN
DECLARE #position INT
WHILE IsNull(#position,Len(#input)) > 1
SELECT #input = Stuff(#input,IsNull(#position,1),1,upper(substring(#input,IsNull(#position,1),1))),
#position = charindex(' ',#input,IsNull(#position,1)) + 1
RETURN (#input)
END
--Call it like so
select dbo.fCapFirst(Lower(Column)) From MyTable
I got this code from http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=37760 There is more information and other suggestions in this forum as well.
As for dealing with cases like the McDonald, I would suggest one of two ways to handle this. One would be to put a search in the above UDF for key names ('McDonald', 'McGrew', etc.) or for patterns (the first two letters are Mc then make the next one capital, etc.) The second way would be to put these cases (the full names) in a table and have their replacement value in a second column. Then simply do a replace. Most likely, however, it will be easiest to identify rules like Mc then capitalize instead of trying to list every last-name possibility.
Don't forget you may want to modify the above UDF to include dashes, not just spaces.
Maybe this is too long but it is very easy and can be adapted for -, ', etc:
UPDATE tbl SET LastName = Case when (CharIndex(' ',lastname,1)<>0) then (Upper(Substring(lastname,1,1))+Lower(Substring(lastname,2,CharIndex(' ',lastname,1)-1)))+
(Upper(Substring(lastname,CharIndex(' ',lastname,1)+1,1))+
Lower(Substring(lastname,CharIndex(' ',lastname,1)+2,Len(lastname)-(CharIndex(' ',lastname,1)-1))))
else (Upper(Substring(lastname,1,1))+Lower(Substring(lastname,2,Len(lastname)-1))) end,
FirstName = Case when (CharIndex(' ',firstname,1)<>0) then (Upper(Substring(firstname,1,1))+Lower(Substring(firstname,2,CharIndex(' ',firstname,1)-1)))+
(Upper(Substring(firstname,CharIndex(' ',firstname,1)+1,1))+
Lower(Substring(firstname,CharIndex(' ',firstname,1)+2,Len(firstname)-(CharIndex(' ',firstname,1)-1))))
else (Upper(Substring(firstname,1,1))+Lower(Substring(firstname,2,Len(firstname)-1))) end;
Tony Rogerson has code that deals with:
double barrelled names eg Arthur Bentley-Smythe
Control characters
I haven't used it myself though...