How do I remove unknown characters in a string? - sql-server-2008

I would like to delete parts of a string.
We have a table, Locations:
mk-MK=New York; sq-AL=Nej York; en-US=New York
mk-MK=London; sq-AL=London; en-US=London
mk-MK=Paris; sq-AL=Paris; en-US=Paris
I want to remove everything else and keep only sq-AL=LocationName.
I want the result to be:
sq-AL=Nej York;
sq-AL=London;

This is yet another example of the importance of normalized databases.
In a normalized database you would have a table with two columns, one for the culture (sq-AL, en-US, etc.) and one for the value. I would go a step further and keep the cultures in a lookup table.
However, since this is not the case, you have to use string manipulation to get the value of a specific culture. You can use SUBSTRING and CHARINDEX to find the specific pattern you want.
This will work in any of the cases represented by the sample data I've listed.
-- Create the table and insert sample data
CREATE TABLE Location ([Name] varchar(100))
INSERT INTO Location ([Name]) VALUES
('en-US=Huston; mk-MK=Huston; sq-AL=Huston;'), -- end of the row, with the ending ';'
('en-US=New York; mk-MK=New York; sq-AL=Nej York'), -- end of the row, without the ending ';'
('mk-MK=London; sq-AL=London; en-US=London'), -- middle of the row
('sq-AL=Paris; en-US=Paris; mk-MK=Paris') -- beginning of the row
SELECT SUBSTRING(Name,
           CHARINDEX('sq-AL=', Name), -- index of 'sq-AL='
           CASE WHEN CHARINDEX(';', Name, CHARINDEX('sq-AL=', Name)) > 0 THEN -- there is a ';' after 'sq-AL='
               CHARINDEX(';', Name, CHARINDEX('sq-AL=', Name)) -- index of the first ';' after 'sq-AL='
               - CHARINDEX('sq-AL=', Name) -- minus the index of 'sq-AL=' gives the length of 'sq-AL=Nej York'
           ELSE
               LEN(Name) -- no ';' follows, so take the rest of the string
           END
       ) + ';'
FROM Location
-- Cleanup
DROP TABLE Location
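To make the index arithmetic easier to follow, here is a minimal Python sketch of the same SUBSTRING/CHARINDEX logic. The helper names `charindex` and `extract_culture` are made up for this illustration, and returning an empty string when the key is missing is an assumption of the sketch, not something the query above does.

```python
def charindex(needle: str, haystack: str, start: int = 1) -> int:
    """1-based position of needle at or after start, 0 if absent (mimics T-SQL CHARINDEX)."""
    return haystack.find(needle, max(start, 1) - 1) + 1


def extract_culture(name: str, key: str = "sq-AL=") -> str:
    begin = charindex(key, name)
    if begin == 0:
        return ""  # assumption of this sketch: no key means nothing to extract
    end = charindex(";", name, begin)  # first ';' after the key
    if end == 0:
        end = len(name) + 1  # no ';' follows, so take the rest of the string
    return name[begin - 1:end - 1] + ";"


rows = [
    "en-US=Huston; mk-MK=Huston; sq-AL=Huston;",
    "en-US=New York; mk-MK=New York; sq-AL=Nej York",
    "mk-MK=London; sq-AL=London; en-US=London",
    "sq-AL=Paris; en-US=Paris; mk-MK=Paris",
]
for row in rows:
    print(extract_culture(row))
```

All four sample rows produce the sq-AL entry followed by a single ';', regardless of where the entry sits in the string.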

You can use the CHARINDEX function. I tried it with a variable:
declare @locations varchar(100) = 'mk-MK=New York; sq-AL=Nej York; en-US=New York'
select LEFT(
RIGHT(
@locations, LEN(@locations)-CHARINDEX(';',@locations)
--output here : sq-AL=Nej York; en-US=New York
)
,CHARINDEX(';',@locations)
) + ';'
--Final Output : sq-AL=Nej York;
--(this relies on sq-AL being the second entry and being as long as the first one)
In your case, the query would be:
select LEFT(
RIGHT(
Name, LEN(Name)-CHARINDEX(';',Name)
--output here : sq-AL=Nej York; en-US=New York
)
,CHARINDEX(';',Name)
) + ';'
FROM Locations

Related

using a regex to remove invalid values from image

I'm working on code where we store images, but some image names end with odd characters
such as a comma, %2C, or -X1 to -X10 (sometimes more), although they always end with .jpg.
How can I use a regex to turn the image name into a valid name?
Here is an example of what I have:
PCpaste_10_g,-X1,-X2,-X3
SNBar_NEW,-X1
The suffixes can go up to -X10,
so I want a regex that removes the comma and everything after it.
I tried using REPLACE, but that only works for one item at a time.
If your data is consistent, and the part you need is always the string before the first comma, you can try SUBSTRING_INDEX.
Let's use this as your sample table, with your sample data:
CREATE TABLE mytable (
val VARCHAR(255));
INSERT INTO mytable VALUES
('PCpaste_10_g,-X1,-X2,-X3.jpg'),
('SNBar_NEW,-X1.jpg');
val
PCpaste_10_g,-X1,-X2,-X3.jpg
SNBar_NEW,-X1.jpg
Then first you extract the first string before comma occurrence:
SELECT SUBSTRING_INDEX(val,',',1) extracted
FROM mytable
returns
extracted
PCpaste_10_g
SNBar_NEW
Then to add back .jpg:
SELECT CONCAT(SUBSTRING_INDEX(val,',',1),'.jpg') extracted_combined
FROM mytable
If your image extension is not consistently .jpg, you can use another SUBSTRING_INDEX() to get the extension and then CONCAT() the two parts:
SELECT CONCAT(SUBSTRING_INDEX(val,',',1) ,'.',
SUBSTRING_INDEX(val,'.',-1)) Extracted_combined
FROM mytable;
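As a rough cross-check of what the two SUBSTRING_INDEX calls return, here is a small Python sketch. The function name `clean_image_name` is hypothetical, and leaving names without a comma untouched is an assumption of this sketch (the SQL above would append the extension a second time to such rows, so it needs a filter like `WHERE val LIKE '%,%'`).

```python
def clean_image_name(val: str) -> str:
    if "," not in val:
        return val  # assumption: names without a comma are already valid
    stem = val.split(",", 1)[0]   # like SUBSTRING_INDEX(val, ',', 1)
    ext = val.rsplit(".", 1)[-1]  # like SUBSTRING_INDEX(val, '.', -1)
    return f"{stem}.{ext}"


for name in ["PCpaste_10_g,-X1,-X2,-X3.jpg", "SNBar_NEW,-X1.jpg"]:
    print(clean_image_name(name))
```

For the two sample rows this prints PCpaste_10_g.jpg and SNBar_NEW.jpg.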
You can use LOCATE to find the first occurrence of "," in the field and LEFT to grab everything up to the first "," -
SET @value := 'PCpaste_10_g,-X1,-X2,-X3';
SELECT CONCAT(LEFT(@value, LOCATE(',', @value) - 1), '.jpg');
or for your update -
UPDATE <table>
SET image_name = CONCAT(LEFT(image_name, LOCATE(',', image_name) - 1), '.jpg')
WHERE image_name LIKE '%,%';
or to handle your %2C at the same time -
UPDATE <table>
SET image_name = CASE
WHEN image_name LIKE '%,%'
THEN CONCAT(LEFT(image_name, LOCATE(',', image_name) - 1), '.jpg')
WHEN image_name LIKE '%\%2C%'
THEN CONCAT(LEFT(image_name, LOCATE('%2C', image_name) - 1), '.jpg')
END
WHERE image_name LIKE '%,%'
OR image_name LIKE '%\%2C%';

Teradata Masking - Retain all characters at position 1,4,8,12,16 .... in a string and mask remaining characters with 'X'

I have a requirement to mask, with 'X', every character of a variable-length string except those at positions 1, 4, 8, 12, 16, and so on.
For example:
Input string - 'John Doe'
Output String - 'JXXn XXe'
The space between the two words must be retained.
Kindly help, or reach out if more details are required.
I think maybe an external function would be best here, but if that's too much to bite off, you can get crafty with strtok_split_to_table, xml_agg and regexp_replace to rip the string apart, replace out characters using your criteria, and stitch it back together:
WITH cte AS (SELECT REGEXP_REPLACE('this is a test of this functionality', '(.)', '\1,') AS fullname FROM Sys_Calendar.calendar WHERE calendar_date = CURRENT_DATE)
SELECT
REGEXP_REPLACE(REGEXP_REPLACE((XMLAGG(tokenout ORDER BY tokennum) (VARCHAR(200))), '(.) (.)', '\1\2') , '(.) (.)', '\1\2')
FROM
(
SELECT
tokennum,
outkey,
CASE WHEN tokennum = 1 OR tokennum mod 4 = 0 OR token = ' ' THEN token ELSE 'X' END AS tokenout
FROM TABLE (strtok_split_to_table(cte.fullname, cte.fullname, ',')
RETURNS (outkey VARCHAR(200), tokennum integer, token VARCHAR(200) CHARACTER SET UNICODE)) AS d
) stringshred
GROUP BY outkey
This won't be fast on a large data set, but it might suffice depending on how much data you have to process.
Breaking this down:
WITH cte AS (SELECT REGEXP_REPLACE('this is a test of this functionality', '(.)', '\1,') AS fullname FROM Sys_Calendar.calendar WHERE calendar_date = CURRENT_DATE)
This CTE is just adding a comma between every character of our incoming string using that regexp_replace function. Your name will come out like J,o,h,n, ,D,o,e. You can ignore the sys_calendar part, I just put that in so it would spit out exactly 1 record for testing.
SELECT
tokennum,
outkey,
CASE WHEN tokennum = 1 OR tokennum mod 4 = 0 OR token = ' ' THEN token ELSE 'X' END AS tokenout
FROM TABLE (strtok_split_to_table(cte.fullname, cte.fullname, ',')
RETURNS (outkey VARCHAR(200), tokennum integer, token VARCHAR(200) CHARACTER SET UNICODE)) AS d
This subquery is the important bit. Here we create a record for every character in your incoming name. strtok_split_to_table is doing the work here splitting that incoming name by comma (which we added in the CTE)
The CASE expression applies your criteria: it keeps the character in record 1, in every record that is a multiple of 4, and any space, and swaps everything else for 'X'.
SELECT
REGEXP_REPLACE(REGEXP_REPLACE((XMLAGG(tokenout ORDER BY tokennum) (VARCHAR(200))), '(.) (.)', '\1\2') , '(.) (.)', '\1\2')
Finally we use XMLAGG to combine the many records back into one string in a single record. Because XMLAGG adds a space in between each character we have to hit it a couple of times with regexp_replace to flip those spaces back to nothing.
So... it's ugly, but it does the job.
The code above spits out:
tXXs XX X XeXX oX XhXX fXXXtXXXaXXXy
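The rule that the CASE expression implements can be checked outside the database with a few lines of Python. This is only a sketch of the masking rule itself (the function name `mask` is made up), not of the Teradata split-and-aggregate mechanics.

```python
def mask(text: str, fill: str = "X") -> str:
    out = []
    for pos, ch in enumerate(text, start=1):  # 1-based positions, like tokennum
        keep = pos == 1 or pos % 4 == 0 or ch == " "
        out.append(ch if keep else fill)
    return "".join(out)


print(mask("this is a test of this functionality"))
print(mask("John Doe"))
```

The first call reproduces the output shown above, and the second gives JXXn XXe for the name in the question.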
I couldn't think of a solution, but then @JNevill inspired me with his idea to add a comma to each character :-)
SELECT
RegExp_Replace(
RegExp_Replace(
RegExp_Replace(inputString, '(.)(.)?(.)?(.)?', '(\1(\2[\3(\4', 2)
,'(\([^ ])', 'X')
,'(\(|\[)')
,'this is a test of this functionality' AS inputString
tXXs XX X XeXX oX XhXX fXXXtXXXaXXXy
The 1st RegExp_Replace starts at the 2nd character (keep the 1st character as-is) and processes groups of (up to) 4 characters adding either a ( (characters #1,#2,#4, to be replaced by X unless it's a space) or [ (character #3, no replacement), which results in :
t(h(i[s( (i(s[ (a( (t[e(s(t( [o(f( (t[h(i(s( [f(u(n(c[t(i(o(n[a(l(i(t[y(
Of course this assumes that neither of these two characters exists in your input data; otherwise you have to choose different ones.
The 2nd RegExp_Replace replaces the ( and the following character with X unless it's a space, which results in:
tXX[s( XX[ X( X[eXX( [oX( X[hXX( [fXXX[tXXX[aXXX[y(
Now there are some ( and [ left, which are removed by the 3rd RegExp_Replace.
As I still consider myself a beginner in Regular Expressions, there will be better solutions :-)
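For readers who want to step through the three passes without a Teradata system, here is a hedged Python port using the `re` module. The function name `mask_three_pass` is hypothetical, slicing off the first character stands in for the start position 2, and the third pattern is written as a character class. Behaviour in Teradata itself is not verified here.

```python
import re


def mask_three_pass(text: str) -> str:
    if not text:
        return text
    head, rest = text[0], text[1:]  # keep the 1st character as-is
    # Pass 1: mark characters #1, #2, #4 of each group with '(' and #3 with '['.
    marked = head + re.sub(r"(.)(.)?(.)?(.)?", r"(\1(\2[\3(\4", rest)
    # Pass 2: '(' plus a non-space character becomes 'X'.
    masked = re.sub(r"\([^ ]", "X", marked)
    # Pass 3: drop the leftover markers.
    return re.sub(r"[(\[]", "", masked)


print(mask_three_pass("this is a test of this functionality"))
print(mask_three_pass("John Doe"))
```

In this Python port, an input whose last group holds a single character (for example 'ab') picks up one extra X, because the empty optional groups leave '([' behind for the second pass. It may be worth testing such lengths before relying on the approach.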
Edit:
In older Teradata versions not all parameters were optional, so you might have to supply values for them:
RegExp_Replace(
RegExp_Replace(
RegExp_Replace(inputString, '(.)(.)?(.)?(.)?', '(\1(\2[\3(\4', 2, 0, 'c')
,'(\([^ ])', 'X', 1, 0, 'c')
,'(\(|\[)', '', 1, 0, 'c')

Concatenation with case statement in mysql

I want to generate a .sql file from the output of an SQL query, and I am building the statements with CONCAT. Some of the queries use a CASE expression, and that is where I run into problems.
select concat('insert into x values(',CASE a when B then 'Book' else 'NONE' end , ') on duplicate key update B = values(B)') from author;
select 'insert into x values('+CASE a when B then 'Book' else 'NONE' end +') on duplicate key update B = values(B)' from author;
The second version does not work either, because in MySQL + only adds numbers and does not concatenate strings.
Is there any way to do this?
The problem with the first version is the quoting of values inside the string. For instance, you want your string to contain "'Book'", quotes included:
select concat('insert into x values(',
(CASE a when 'B' then '''Book''' else '''NONE''' end) ,
') on duplicate key update B = values(B)'
)
from author;
I think this quotes all the strings as they should be. I'm guessing column a is a character column that should be compared to the literal 'B' and not to column B.
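The quote doubling is easier to see when the statement is assembled in a general-purpose language. The following Python sketch builds the same text. The helper names `sql_literal` and `build_insert` are invented for the example, and it follows the question's original `values(B)` in the update clause.

```python
def sql_literal(value: str) -> str:
    # Wrap in single quotes and double any embedded quote, as SQL string literals require.
    return "'" + value.replace("'", "''") + "'"


def build_insert(a: str) -> str:
    label = "Book" if a == "B" else "NONE"  # mirrors CASE a WHEN 'B' THEN ... ELSE ... END
    return ("insert into x values(" + sql_literal(label)
            + ") on duplicate key update B = values(B)")


print(build_insert("B"))
print(build_insert("Z"))
```

Each generated line carries its value as a quoted literal, which is what the tripled quotes in the CONCAT version achieve.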

MySql: updating a column with the column's content plus something else

I don't have a lot of knowledge of MySQL (or SQL in general), so sorry for the beginner question.
I'm trying to update a bunch of String entries this way:
Lets say we have this:
commands.firm.pm.Stuff
Well I want to convert that into:
commands.firm.pm.print.Stuff
Meaning, Add the .print after pm, before "Stuff" (where Stuff can be any Alphanumerical String).
How would I do this with a MySql Query? I'm sure REGEXP has to be used, but I'm not sure how to go about it.
Thanks
Try something like this. It finds the last period and inserts your string there:
select insert(s, length(s) - instr(reverse(s), '.') + 1, 0, '.print')
from (
select 'commands.firm.pm.Stuff' as s
) a
To update:
update MyTable
set MyColumn = insert(MyColumn, length(MyColumn) - instr(reverse(MyColumn), '.') + 1, 0, '.print')
where MyColumn like 'commands.firm.pm.%'
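The position arithmetic (length minus the position of the first dot in the reversed string, plus one) simply locates the last dot. A short Python sketch of the same idea, with a made-up function name, shows the intended result:

```python
def insert_before_last_dot(s: str, extra: str = ".print") -> str:
    pos = s.rfind(".")  # 0-based index of the last dot, -1 if there is none
    if pos == -1:
        return s        # assumption of this sketch: leave dot-free strings alone
    return s[:pos] + extra + s[pos:]


print(insert_before_last_dot("commands.firm.pm.Stuff"))
```

This prints commands.firm.pm.print.Stuff, the value the question asks for.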
Perhaps use str_replace (in PHP) to replace commands.firm.pm with commands.firm.pm.print:
$original_str = "commands.firm.pm.15hhkl15k0fak1";
str_replace("commands.firm.pm", "commands.firm.pm.print", $original_str);
should output: commands.firm.pm.print.15hhkl15k0fak1
Then update your table with the new value. I do not know how to do it all in one query (get the column value and do the update). All I can think of is getting the column value in one query, doing the replacement above, and then updating the column with the new value in a second query.
To update rows that end in '.Stuff' only:
UPDATE TableX
SET Column = CONCAT( LEFT( Column, CHAR_LENGTH(Column) - CHAR_LENGTH('.Stuff') )
, '.print'
, '.Stuff'
)
WHERE Column LIKE '%.Stuff'
To update all rows, by inserting print. right after the last dot:
UPDATE TableX
SET Column = CONCAT( LEFT( Column, CHAR_LENGTH(Column)
- CHAR_LENGTH(SUBSTRING_INDEX(Column, '.', -1))
)
, 'print.'
, SUBSTRING_INDEX(Column, '.', -1)
)
WHERE Column LIKE '%.%'

sql parse full name field into first, middle, last, and suffix

I'm trying to split full names into last, first, middle, and suffix. I searched but couldn't find exactly the same format as mine. I have the following code, but I'm getting this error when running the full select:
Msg 537, Level 16, State 3, Line 1
Invalid length parameter passed to the LEFT or SUBSTRING function.
SpaceComma table gets the correct indexes.
This is the format of the names I have:
CREATE TABLE #myfullnames (fullName VARCHAR(50))
GO
INSERT #myfullnames VALUES ('BROOK SR, JAMES P.')
INSERT #myfullnames VALUES ('BLOCK JR., BILL V.')
INSERT #myfullnames VALUES ('MOOR, CLODE M.')
INSERT #myfullnames VALUES ('SOUDER III, Laurence R.')
INSERT #myfullnames VALUES ('SOUDER, WILL' )
INSERT #myfullnames VALUES ('KOLIV, Kevin E.')
INSERT #myfullnames VALUES ('Simk, JR. Thomas Todd')
INSERT #myfullnames VALUES ('Polio, Gary R.')
I would appreciate your help. Thanks.
select SplitNames.LastName, SplitNames.FirstName,
SplitNames.MiddleName, SplitNames.Title
from (
select [fullName]
, substring([fullName], 1, SpceTitle-1) as LastName
, substring([fullName], SpceMid,(SpceMid - SpceFirstName - 1)) as FirstName
, substring([fullName], SpaceComma.SpceTitle, (SpaceComma.SpceFirstName -
SpaceComma.SpceTitle)) as Title
, nullif(substring([fullName],SpaceComma.SpceMid+1,100),'') as
MiddleName
from (
select [fullName],
charindex(',',[fullName]) as Comma,
charindex(' ',[fullName]+space(1),charindex(',',[fullName])) as
SpceFirstName,
(len([fullName]) + 1 - charindex(' ',reverse([fullName]), 0)) as
SpceMid,
charindex(' ',[fullName], charindex (' ',reverse([fullName]))) as SpceTitle
from #myfullnames
) SpaceComma
) SplitNames
DROP TABLE #myfullnames
The data in your example does not follow any fixed set of rules, so there will not be a perfect solution for parsing the names. An example of a rule violation is the difference between "Simk" and "BLOCK": the "JR" sits before the comma in one and after it in the other. The only solution to such violations is to correct the offending rows manually.
We can parse the name using the PARSENAME function in SQL Server. SQL Server uses PARSENAME to separate SERVERNAME.DATABASE.SCHEMA.TABLE, and it is limited to four parts.
Here is a query which parses the names:
select fullname
, REPLACE(fullname,'.','') AS [1]
, REPLACE(REPLACE(fullname,'.',''),', ','.') AS [2]
, ParseName(REPLACE(REPLACE(fullname,'.',''),', ','.'),2) AS [3]
, REPLACE(ParseName(REPLACE(REPLACE(fullname,'.',''),', ','.'),1),' ','.') AS [4]
, PARSENAME(REPLACE(ParseName(REPLACE(REPLACE(fullname,'.',''),', ','.'),1),' ','.'),1) AS [5]
, PARSENAME(REPLACE(ParseName(REPLACE(REPLACE(fullname,'.',''),', ','.'),1),' ','.'),2) AS [6]
from #myfullnames
The six output columns demonstrate the use of replacing characters with the "." then using PARSENAME to extract a portion of the string.
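To see which value lands in which column, here is a simplified Python emulation of the same steps. The `parsename` helper below only approximates the SQL Server function (parts are counted from the right, and it returns None for a missing part or for more than four parts). The dictionary keys `col3`, `col5`, and `col6` are labels invented for this sketch to match output columns [3], [5], and [6].

```python
from typing import Optional


def parsename(obj: str, piece: int) -> Optional[str]:
    parts = obj.split(".")
    if len(parts) > 4 or not 1 <= piece <= len(parts):
        return None  # stands in for NULL
    return parts[-piece]  # piece 1 is the rightmost part


def split_name(full_name: str) -> dict:
    dotted = full_name.replace(".", "").replace(", ", ".")     # column [2]
    rest = (parsename(dotted, 1) or "").replace(" ", ".")      # column [4]
    return {
        "col3": parsename(dotted, 2),  # text before the comma
        "col5": parsename(rest, 1),    # last word after the comma
        "col6": parsename(rest, 2),    # word before that, if any
    }


print(split_name("BROOK SR, JAMES P."))
print(split_name("SOUDER, WILL"))
```

For 'BROOK SR, JAMES P.' this gives 'BROOK SR', 'P', and 'JAMES'. For 'SOUDER, WILL' the first name ends up in col5 and col6 is None, so rows without a middle initial need separate handling.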