Inserting a delimiter - mysql

MySql has a function CONCAT_WS that I use to export multiple fields with a delimiter into a single field. Works great!
There are multiple fields being stored in a database I query off of that has data that I need to extract each field individually but within each field the data need to include a delimiter. I can most certainly do a concatenate but that does take awhile to set-up if my data requires up to 100 unique values. Below is an example of what I am talking about
Stored Data 01020304050607
End Result 01,02,03,04,05,06,07
Stored Data 01101213
End Result 01,10,12,13
Is there a function in MySQL that does the above?

I am not that familiar with mysql but I have seen questions like this come up before where a regular expression function would be useful. There are user-defined functions available that allow Oracle-like regular expression functions to be used as their support is weak in mysql. See here: https://github.com/hholzgra/mysql-udf-regexp
So you could do something like this:
select trim(TRAILING ',' FROM regexp_replace(your_column, '(.{2})', '\1,') )
from your_table;
This adds a comma every 2 character then chops off the last one. Maybe this will give you some ideas.

Related

MYSQL REGEXP with JSON array

I have an JSON string stored in the database and I need to SQL COUNT based on the WHERE condition that is in the JSON string. I need it to work on the MYSQL 5.5.
The only solution that I found and could work is to use the REGEXP function in the SQL query.
Here is my JSON string stored in the custom_data column:
{"language_display":["1","2","3"],"quantity":1500,"meta_display:":["1","2","3"]}
https://regex101.com/r/G8gfzj/1
I now need to create a SQL sentence:
SELECT COUNT(..) WHERE custom_data REGEXP '[HELP_HERE]'
The condition that I look for is that the language_display has to be either 1, 2 or 3... or whatever value I will define when I create the SQL sentence.
So far I came here with the REGEX expression, but it does not work:
(?:\"language_display\":\[(?:"1")\])
Where 1 is replaced with the value that I look for. I could in general look also for "1" (with quotes), but it will also be found in the meta_display array, that will have different values.
I am not good with REGEX! Any suggestions?
I used the following regex to get matches on your test string
\"language_display\":\[(:?\"[0-9]\"\,)*?\"3\"(:?\,\"[0-9]\")*?\]
https://regex101.com/ is a free online regex tester, it seems to work great. Start small and work big.
Sorry it doesn't work for you. It must be failing on the non greedy '*?' perhaps try without the '?'
Have a look at how to serialize this data, with an eye to serializing the language display fields.
How to store a list in a column of a database table
Even if you were to get your idea working it will be slow as fvck. Better off to process through each row once and generate something more easily searched via sql. Even a field containing the comma separated list would be better.

MySql : query to format a specific column in database table

I have one column name phone_number in the database table.Right now the numbers stored in the table are format like ex.+91-852-9689568.I want to format it and just want only digits.
How can i do it in MySql ? I have tried it with using functions like REGEXP but it displays error like function does not exist.And i don't want to use multiple REPLACE.
One of the options is to use mySql substring. (As long as the format doesn't change)
SELECT concat(SUBSTRING(pNo,2,2), SUBSTRING(pNo,5,3), SUBSTRING(pNo,9,7));
if you want to format via projection only, use SELECT, you will only need to use replace twice and no problem with that.
SELECT REPLACE(REPLACE(columnNAme, '-', ''), '+', '')
FROM tableName
otherwise, if you want to update the value permanently, use UPDATE
UPDATE tableName
SET columnName = REPLACE(REPLACE(columnNAme, '-', ''), '+', '')
MySQL does not have a builtin function for pattern-matching and replace.
You'll be better off fetching the whole string back to your application, and then using a more flexible string-manipulation function on it. For instance, preg_replace() in PHP.
Try the following and comment please.
Select dbo.Regex('\d+',pNo);
Select dbo.Regex('[0-9]+',pNo);
Reference on RUBLAR.
So MYSQL is not like Oracle, hence you may just use a USer defined Function to get numbers. This could get you going.

Why does SSIS TOKEN function fail to count adjacent column delimiters?

I ran into a problem with SQL Server Integration Services 2012's new string function in the Expression Editor called TOKEN().
This is supposed to help you parse a delimited record. If the record comes out of a flat file, you can do this with the Flat File Source. In this case, I am dealing with old delimited import records that were stored as strings in a database VARCHAR field. Now they need to be extracted, massaged, and re-exported as delimited strings. For example:
1^Apple^0001^01/01/2010^Anteater^A1
2^Banana^0002^03/15/2010^Bear^B2
3^Cranberry^0003^4/15/2010^Crow^C3
If these strings are in a column called OldImportRecord, the delimiter is a caret (as shown), and we wish to put the fifth field into a Derived Column, we would use an expression like:
TOKEN(OldImportRecord,"^",5)
This returns Anteater, Bear, Crow, etc. In fact, we can create Derived Columns for each of the fields in this record (note that the index is one-based), change them as needed, and then build another delimited record for export.
Here's the problem. What if some of our data includes some empty strings (or Nulls rendered as empty strings)?
4^^0004^6/15/2010^Duck^D4
The TOKEN() fails to count the adjacent column delimiters, which throws off the column count. Now it only sees five columns instead of six columns. Our TOKEN(OldImportRecord,"^",5) returns "D4" instead of the intended "Duck". When we extract the fourth column, we wind up trying to put "Duck" into a Date column, and all sorts of fun ensues.
Here's a partial workaround:
TOKEN(REPLACE(OldImportRecord,"^^","^ ^"),"^",5)
Notice this misses every second delimiter pair, so it will fail for a string like "5^^^^Emu^E5", which looks like"5^ ^^ ^Emu^E5" after the REPLACE(). The column count is still wrong.
So here's my full workaround. This includes two nested REPLACE statements(), an RTRIM() to remove the superfluous spaces, and a DT_STR cast because I would like to keep the result in VARCHAR:
(DT_STR,255,1252)RTRIM(TOKEN(REPLACE(REPLACE(OldImportRecord,"^^","^ ^"),"^^","^ ^"),"^",5))
I am posting this for information, since others may also run into this problem.
Does anyone have a better workaround, or even a real solution?
Reason for the issue:
TOKEN method in SSIS uses the implementation of strtok function in C++. I gathered this information while reading the book Microsoft® SQL Server® 2012 Integration Services. It is mentioned as note on page 113 (I like this book! Lots of nice information.).
I searched for the implementation of strtok function and I found the following links.
INFO: strtok(): C Function -- Documentation Supplement - The code sample in this link shows that the function does ignore consecutive delimiter characters.
The answers to the following SO questions point out that strtok function is designed to ignore consecutive delimiters.
Need to know when no data appears between two token separators using strtok()
strtok_s behaviour with consecutive delimiters
I think that the TOKEN and TOKENCOUNT functions are working as per design but whether that is how SSIS should behave might be a question for the Microsoft SSIS team.
Original Post - Above section is an update:
I created a simple package in SSIS 2012 based on your data inputs. As you had described in your question, the TOKEN function does not behave as intended. I agree with you that the function doesn't seem to work. This post is not an answer to your original issue.
Here is an alternative way to write the expression in a relatively simpler fashion. This will only work if the last segment in your input record will always have a value (say A1, B2, C3 etc.).
Expression can be rewritten as:
This statement will take the input record as the parameter, the delimiter caret (^) as the second parameter. The third parameter calculates the total number segments in the records when split by the delimiter. If you have data in the last segment, you are guaranteed to have two segments. You can then subtract 1 to fetch the penultimate segment.
(DT_STR,50,1252)TOKEN(OldImportRecord,"^",TOKENCOUNT(OldImportRecord,"^") - 1)
I created a simple package with data flow task. OLE DB source retrieves the data and the derived transformation parses and splits the data as per the screenshot below. The output is then inserted into the destination table. You can see the source and destination tables in the last screenshot. Destination table has two columns. The first column stores the penultimate segment data and the segments count based on the delimiter (which again isn't correct). You can notice that the last record didn't fetch the correct results. If the last record didn't have the value 8, then the above expression will fail because the expression will evaluate to zero index.
Hope that helps to simplify your expression.
If you don't hear from anyone else, I would recommend logging this issue in Microsoft Connect website.
Create table and populate scripts:
CREATE TABLE [dbo].[SourceTable](
[OldImportRecord] [varchar](50) NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[DestinationTable](
[NewImportRecord] [varchar](50) NOT NULL,
[CaretCount] [int] NOT NULL
) ON [PRIMARY]
GO
INSERT INTO dbo.SourceTable (OldImportRecord) VALUES
('1^Apple^0001^01/01/2010^Anteater^A1'),
('2^Banana^0002^03/15/2010^Bear^B2'),
('3^Cranberry^0003^4/15/2010^Crow^C3'),
('4^^0004^6/15/2010^Duck^D4'),
('5^^^^Emu^E5'),
('6^^^^Geese^F6'),
('^^^^Pheasant^G7'),
('8^^^^Sparrow^');
GO
Derived column transformation inside data flow task:
Data in source and destination tables:
Not only does TOKEN skip adjacent delimiters, it also skips leading and trailing delimiters as well. So, using your example, if you had a field "good" field that looks like this:
1^Apple^0001^01/01/2010^Anteater^A1
Followed by one with adjacent and leading delimiters like this:
^^^0004^6/15/2010^Duck^
TOKENCOUNT would only find two delimiters and you'd end up with 0004 assigned to Token1, 6/15/2010 for Token2, and Duck for Token3.
I used a different kind of replace. Rather than placing spaces between adjacent delimiters, which wouldn't help with leading or training, I used replace to surround the delimiters with characters I absolutely wouldn't find in my text. The following Expression works well for me. It's wordy, but it is what it is.
(DT_STR,255,1252)REPLACE(TOKEN(REPLACE(OldImportRecord,"^","~^~"),"^",1),"~","")
Of course, you'd replace the number 1 with whatever Token you wanted and adjust the cast according to your needs. Hope that helps.

Creating variables and reusing within a mysql update query? possible?

I am struggling with this query and want to know if I am wasting my time and need to write a php script or is something like the following actually possible?
UPDATE my_table
SET #userid = user_id
AND SET filename('http://pathto/newfilename_'#userid'.jpg')
FROM my_table
WHERE filename
LIKE '%_%' AND filename
LIKE '%jpg'AND filename
NOT LIKE 'http%';
Basically I have 700 odd files that need renaming in the database as they do not match the filenames as I am changing system, they are called in the database.
The format is 2_gfhgfhf.jpg which translates to userid_randomjumble.jpg
But not all files in the database are in this format only about 700 out of thousands. So I want to identify names that contain _ but don't contain http (thats the correct format that I don't want to touch).
I can do that fine but now comes the tricky bit!!
I want to replace that file name userid_randomjumble.jpg with http://pathto/filename_userid.jpg So I want to set the column user_id in that row to a variable and insert it into my new filename.
The above doesn't work for obvious reasons but I am not sure if there is a way round what I'm trying to do. I have no idea if it's possible? Am I wasting my time with this and should I turn to PHP with mysql and stop being lazy? Or is there a way to get this to work?
Yes it is possible without the php. Here is a simple example
SET #a:=0;
SELECT * FROM table WHERE field_name = #a;
Yes you can do it using straightforward SQL:
UPDATE my_table
SET filename = CONCAT('http://pathto/newfilename_', userid, '.jpg')
WHERE filename LIKE '%\_%jpg'
AND filename NOT LIKE 'http%';
Notes:
No need for variables. Any columns of rows being updated may be referenced
In mysql, use CONCAT() to add text values together
With LIKE, an underscore (_) has a special meaning - it means "any single character". If you want to match a literal underscore, you must escape it with a backslash (\)
Your two LIKE predicates may be safely merged into one for a simpler query

mysql: replace \ (backslash) in strings

I am having the following problem:
I have a table T which has a column Name with names. The names have the following structure:
A\\B\C
You can create on yourself like this:
create table T ( Name varchar(10));
insert into T values ('A\\\\B\\C');
select * from T;
Now if I do this:
select Name from T where Name = 'A\\B\C';
That doesn't work, I need to escape the \ (backslash):
select Name from T where Name = 'A\\\\B\\C';
Fine.
But how do I do this automatically to a string Name?
Something like the following won't do it:
select replace('A\\B\C', '\\', '\\\\');
I get: A\\\BC
Any suggestions?
Many thanks in advance.
You have to use "verbatim string".After using that string your Replace function will
look like this
Replace(#"\", #"\\")
I hope it will help for you.
The literal A\\B\C must be coded as A\\\\A\\C, and the parameters of replace() need escaping too:
select 'A\\\\B\\C', replace('A\\\\B\\C', '\\', '\\\\');
output (see this running on SQLFiddle):
A\\B\C A\\\\B\\C
So there is little point in using replace. These two statements are equivalent:
select Name from T where Name = replace('A\\\\B\\C', '\\', '\\\\');
select Name from T where Name = 'A\\\\B\\C';
Usage of regular expression will solve your problem.
This below query will solve the given example.
1) S\\D\B
select * from T where Name REGEXP '[A-Z]\\\\\\\\[A-Z]\\\\[A-Z]$';
if incase the given example might have more then one char
2) D\\B\ACCC
select * from T where Name REGEXP '[A-Z]{1,5}\\\\\\\\[A-Z]{1,5}\\\\[A-Z]{1,5}$';
note: i have used 5 as the max occurrence of char considering the field size is 10 as its mentioned in the create table query.
We can still generalize it.If this still has not met your expectation feel free to ask for my help.
You're confusing what's IN the database with how you represent that data in SQL statements. When a string in the database contains a special character like \, you have to type \\ to represent that character, because \ is a special character in SQL syntax. You have to do this in INSERT statements, but you also have to do it in the parameters to the REPLACE function. There are never actually any double slashes in the data, they're just part of the UI.
Why do you think you need to double the slashes in the SQL expression? If you're typing queries, you should just double the slashes in your command line. If you're generating the query in a programming language, the best solution is to use prepared statements; the API will take care of proper encoding (prepared statements usually use a binary interface, which deals with the raw data). If, for some reason, you need to perform queries by constructing strings, the language should hopefully provide a function to escape the string. For instance, in PHP you would use mysqli_real_escape_string.
But you can't do it by SQL itself -- if you try to feed the non-escaped string to SQL, data is lost and it can't reconstruct it.
You could use LIKE:
SELECT NAME FROM T WHERE NAME LIKE '%\\\\%';
Not exactly sure by what you mean but, this should work.
select replace('A\\B\C', '\', '\\');
It's basically going to replace \ whereever encountered with \\ :)
Is this what you wanted?