Mysql - how to strip certain characters from the end of the string - mysql

I have strings like this :
column:
----------
word[1]
word[2]
word
word[2]
word
word[3]
Where word is a variable length random characters string.
How would I remove square brackets with numbers in them from the end of these strings in mysql table?
Does mysql allow regexes?

update test
set name = SUBSTRING_INDEX(name,'[',1)
where name like '%[%]'

You could use the following select:
IF(RIGHT(myColumn, 1) = ']', LEFT(myColumn, CHAR_LENGTH(myColumn) - 3), myColumn)
RIGHT(myColumn, 1) = ']' checks whether your entry ends with a closing bracket.
LEFT(myColumn, CHAR_LENGTH(myColumn) - 3) returns the string without its last three characters, i.e. without the bracketed number, assuming the number is a single digit.
myColumn returns the full string if there is no bracket.
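On the regex question: MySQL 8.0 and later provide REGEXP_REPLACE, so a pattern-based strip is possible there. The pattern itself can be checked outside the database first. The following is a minimal Python sketch, not MySQL code; the function name strip_bracket_suffix is made up for illustration. It removes a trailing [digits] group of any length and leaves every other string untouched.

```python
import re

# Matches "[", one or more digits, and "]" anchored at the very end of the string.
_SUFFIX = re.compile(r'\[[0-9]+\]\Z')

def strip_bracket_suffix(value):
    """Return value without a trailing "[number]" group, if it has one."""
    return _SUFFIX.sub('', value)

for sample in ['word[1]', 'word', 'word[12]', 'wo[3]rd']:
    print(sample, '->', strip_bracket_suffix(sample))
```

On MySQL 8.0+ the corresponding call would presumably look like REGEXP_REPLACE(name, '\\[[0-9]+\\]$', ''), but try it against your own server version before running an update.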

Related

Use regexp in presto to remove the last slash only when it's preceded by a character

Is there a way in Presto or SQL to remove the last slash only when it is preceded by a character and keep it otherwise?
I am using regexp_replace in Presto. So for example, if x = '/' the expression should return '/'
and if x = 'beta/alpha/' it should return 'beta/alpha'
I am using select regexp_replace ([expression], '[\/]$', '').
This returns an empty string when there is only a slash, and removes the slash from the end of the string if the expression has some characters before the slash.
You can use
regexp_replace([expression], '([^/])/$', '$1')
-- or
regexp_replace([expression], '(?<=[^/])/$', '')
Details
([^/])/$ - matches and captures any char but / into Group 1 (with the ([^/]) pattern, the $1 in the replacement pattern is a replacement backreference that refers to the Group 1 value), then matches a / at the end of string ($)
(?<=[^/])/$ matches a / at the end of a string only when the char immediately on the left is not a / char (and not the start of a string).
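Both patterns can be tried out quickly outside Presto. This is a small Python sketch under the assumption that the two regex engines agree on these constructs; note that Python writes the backreference as \1 rather than $1 and that \Z is used here as the strict end-of-string anchor. The function names are illustrative only.

```python
import re

def strip_trailing_slash(value):
    """Drop a final "/" only when a non-slash character comes right before it."""
    # \Z anchors at the very end; \1 puts the captured character back.
    return re.sub(r'([^/])/\Z', r'\1', value)

def strip_trailing_slash_lookbehind(value):
    """Same result using a lookbehind, so nothing needs to be put back."""
    return re.sub(r'(?<=[^/])/\Z', '', value)

for sample in ['/', 'beta/alpha/', 'beta//']:
    print(repr(sample), '->', repr(strip_trailing_slash(sample)))
```

A lone '/' is left alone by both versions because there is no preceding non-slash character to satisfy the pattern.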

how to get rid of newline space while using substring_index

I have a field with a value like the one below:
utf8: "\xE2\x9C\x93"
id: "805265"
plan: initial
acc: "123456"
last: "1234"
doc: "1281468479"
validation: field
commit: Accept
I used the query below to extract the acc value:
select SUBSTRING_INDEX(SUBSTRING_INDEX(columnname, 'acc: "', -1),'last',1) as acc from table_name;
I am able to retrieve the acc value, but when I export the result to a CSV file, the field includes the newline that comes before last. How do I get rid of that newline?
I would expect you would want to strip out the end quote as well. But to answer your specific question, you can just update your SUBSTRING_INDEX delimiter to include the newline, i.e. select SUBSTRING_INDEX(SUBSTRING_INDEX(columnname, 'acc: "', -1),'\nlast',1) as acc from table_name;.
Or, if you prefer, you can use the REPLACE function to strip out any unwanted characters.
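To see why the delimiter choice matters, here is a rough Python stand-in for SUBSTRING_INDEX. It is a sketch for experimenting with delimiters, not a replacement for the SQL; the helper name substring_index and the shortened sample value are assumptions made for the example.

```python
def substring_index(text, delim, count):
    """Rough Python stand-in for MySQL's SUBSTRING_INDEX(text, delim, count)."""
    if not delim or count == 0:
        return ''
    parts = text.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter from the left.
        return delim.join(parts[:count])
    # Everything to the right of the count-th delimiter from the right.
    return delim.join(parts[count:])

# Shortened version of the field from the question.
field = 'plan: initial\nacc: "123456"\nlast: "1234"\ndoc: "1281468479"'

after_acc = substring_index(field, 'acc: "', -1)
print(repr(substring_index(after_acc, 'last', 1)))     # quote and newline remain
print(repr(substring_index(after_acc, '\nlast', 1)))   # only the quote remains
print(repr(substring_index(after_acc, '"\nlast', 1)))  # bare value
```

Including the closing quote in the delimiter, as in the last call, removes both the quote and the newline in one step.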

Escape string for use in MySQL fulltext search

I am using Laravel 4 and have set up the following query:
if(Input::get('keyword')) {
$keyword = Input::get('keyword');
$search = DB::connection()->getPdo()->quote($keyword);
$query->whereRaw("MATCH(resources.name, resources.description, resources.website, resources.additional_info) AGAINST(? IN BOOLEAN MODE)",
array($search)
);
}
This query runs fine under normal use; however, if the user enters a string such as ++, an error is thrown. Looking at the MySQL docs, there are some operators, such as + and -, which have specific purposes. Is there a function which will escape these types of special characters in a string so it can be used in a fulltext search like the one above without throwing any errors?
Here is an example of an error which is thrown:
{"error":{"type":"Illuminate\\Database\\QueryException","message":"SQLSTATE[42000]: Syntax error or access violation: 1064 syntax error, unexpected '+' (SQL: select * from `resources` where `duplicate` = 0 and MATCH(resources.name, resources.description, resources.website, resources.additional_info) AGAINST('c++' IN BOOLEAN MODE))","file":"\/var\/www\/html\/[...]\/vendor\/laravel\/framework\/src\/Illuminate\/Database\/Connection.php","line":555}}
Solutions I've tried:
$search = str_ireplace(['+', '-'], ' ', $keyword);
$search = filter_var($keyword, FILTER_SANITIZE_STRING);
$search = DB::connection()->getPdo()->quote($keyword);
I'm assuming I will need to use regex. What's the best approach here?
Only the words and operators have meaning in Boolean search mode. Operators are: +, -, >, <, ( ), ~, *, ", @distance. After some research I found what word characters are: upper case and lower case letters, numerals (digits) and _. I think you can use one of two approaches:
Replace all non word characters with spaces (I prefer this approach). This can be accomplished with regex:
$search = preg_replace('/[^\p{L}\p{N}_]+/u', ' ', $keyword);
Replace characters-operators with spaces:
$search = preg_replace('/[+\-><\(\)~*\"@]+/', ' ', $keyword);
Only words are indexed by the full-text search engine and can be searched. Non-word characters aren't indexed, so it does not make sense to leave them in the search string.
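The difference between the two approaches is easier to see side by side. The following Python sketch mirrors them under a few assumptions: Python's \w is used as a rough equivalent of letters, digits and underscore; @ is included because of the @distance operator; and the final strip() is an addition of this sketch. The function names are illustrative.

```python
import re

def words_only(keyword):
    """Approach 1: replace every run of non-word characters with one space."""
    # In Python 3, \w on str patterns covers Unicode letters, digits and "_".
    return re.sub(r'[^\w]+', ' ', keyword).strip()

def drop_operators(keyword):
    """Approach 2: replace only runs of boolean-mode operator characters."""
    return re.sub(r'[+\-><()~*"@]+', ' ', keyword).strip()

for sample in ['c++', 'foo-bar +baz', 'r&d']:
    print(repr(sample), '->', repr(words_only(sample)), '|', repr(drop_operators(sample)))
```

Approach 1 also removes characters such as & that are not operators, while approach 2 leaves them in place.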
References:
Boolean Full-Text Searches
Fine-Tuning MySQL Full-Text Search (see: "Character Set Modifications")
PHP: preg_replace
PHP: Unicode character properties
PHP: Possible modifiers in regex patterns
While the answer from Rimas is technically correct, it will suit you only if you do not want users to use the MATCH operators, because it will strip them all completely. For example, I do want to allow use of all of them except @distance in search forms on my site, thus I've come up with this:
#Trim first
$newValue = preg_replace('/^\p{Z}+|\p{Z}+$/u', '', $string);
#Remove all symbols except allowed operators and space. @distance is not included, since it's unlikely a human will be using it through UI form
$newValue = preg_replace('/[^\p{L}\p{N}_+\-<>~()"* ]/u', '', $newValue);
#Remove all operators, that can only precede a text and that are not preceded by either beginning of string or space
$newValue = preg_replace('/(?<!^| )[+\-<>~]/u', '', $newValue);
#Remove all double quotes and asterisks, that are not preceded by either beginning of string, letter, number or space
$newValue = preg_replace('/(?<![\p{L}\p{N}_ ]|^)[*"]/u', '', $newValue);
#Remove all double quotes and asterisks, that are inside text (keep the surrounding characters)
$newValue = preg_replace('/([\p{L}\p{N}_])([*"])([\p{L}\p{N}_])/u', '$1$3', $newValue);
#Remove all opening parenthesis which are not preceded by beginning of string or space
$newValue = preg_replace('/(?<!^| )\(/u', '', $newValue);
#Remove all closing parentheses which are not preceded by a letter, number or underscore, or are not followed by end of string or space
$newValue = preg_replace('/(?<![\p{L}\p{N}_])\)|\)(?! |$)/u', '', $newValue);
#Remove all double quotes if the count is not even
if (substr_count($newValue, '"') % 2 !== 0) {
$newValue = preg_replace('/"/u', '', $newValue);
}
#Remove all parenthesis if count of closing does not match count of opening ones
if (substr_count($newValue, '(') !== substr_count($newValue, ')')) {
$newValue = preg_replace('/[()]/u', '', $newValue);
}
Unfortunately I was not able to figure out a way to do this in 1 regex, thus doing multiple runs. It's also possible, that I am missing some edge cases, as well. Any additions or corrections are appreciated: either here or create an issue for https://github.com/Simbiat/database where I implement this.

Remove last char if it's a specific character

I need to update values in a table by removing their last char if they end with a +
Example:
John+Doe and John+Doe+ should both become John+Doe.
What's the best way to achieve this?
UPDATE table
SET field = SUBSTRING(field, 1, CHAR_LENGTH(field) - 1)
WHERE field LIKE '%+'
If you are trying to display the field instead of update the table, then you can use a CASE statement:
select
case
when right(yourfield,1) = '+' then left(yourfield,length(yourfield)-1)
else yourfield end
from yourtable
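The rule behind both the UPDATE and the CASE expression is the same: drop exactly one character, and only when that character is a +. A minimal Python sketch of that rule, with an illustrative function name, makes it easy to check the two sample values from the question.

```python
def drop_trailing_plus(value):
    """Remove the last character only when it is a plus sign."""
    return value[:-1] if value.endswith('+') else value

for sample in ['John+Doe', 'John+Doe+']:
    print(sample, '->', drop_trailing_plus(sample))
```

Only one trailing + is removed, matching the SQL above; a value ending in ++ would keep one of them.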
You didn't explain the situation exactly.
But if you are searching for names in a text, I would remove all non-letter characters (anything not a-z or A-Z), including spaces, and then compare.
If you want just the last char, try the SUBSTRING_INDEX function.
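If you are passing the value to the DB as a string from PHP, you can strip the trailing + there with substr (str_replace would replace every +, not only the last one):
<?php
$str = "John+Doe+";
// Drop the last character only when it is a plus sign
if (substr($str, -1) === '+') {
    $str = substr($str, 0, -1);
}
echo $str;
?>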

MySQL LOAD DATA INFILE non consistent field

I have a file with four fields per line that looks like this:
<uri> <uri> <uri> <uri> .
:_non-spaced-alphanumeric <uri> "25"^^<uri:integer> <uri> .
:_non-spaced-alphanumeric <uri> "Hello"@en <uri> .
:_non-spaced-alphanumeric <uri> "just text in quotes" <uri> .
...
and this sql script:
LOAD DATA LOCAL INFILE 'data-0.nq'
IGNORE
INTO TABLE btc.btc_2012
FIELDS
TERMINATED BY ' ' OPTIONALLY ENCLOSED BY '"'
LINES
TERMINATED BY '.\n'
(subject,predicate,object,provenance);
The third field in the examples can be of any of the formats seen above. I don't really care about the 3rd value unless it's a uri, which is parsed fine by the script anyway. But if it's not then the fourth field consists of the part of the third after the quotation plus the fourth itself.
Is there a way I can get it working without manipulating the file, which by the way is 17GB?
Yes, there's a way to work with this. Have the data fields loaded into MySQL user variables, and then assign expressions to the actual columns.
For example, in place of:
(subject,predicate,object,provenance)
do something like this:
(subject, predicate, @field3, @field4)
SET object = CASE WHEN @field3 LIKE '"%"_%' THEN ... ELSE @field3 END
, provenance = CONCAT(CASE WHEN @field3 LIKE '"%"_%' THEN ... ELSE '' END,@field4)
That's just an outline. Obviously, those ... need to be replaced with appropriate expressions that return the portions of the field values you want assigned to the columns. (That will be some combination of SUBSTRING, SUBSTRING_INDEX, INSTR, LOCATE, REPLACE, et al. string functions, and you may need additional WHEN constructs to handle variations.)
(I'm not exactly clear on what conditions you need to check.)
If this is running on Unix or Linux, another option would be to make use of a named pipe and an external program that reads the file, performs the required manipulation, and writes to the named pipe, running in the background.
e.g.
$ mkfifo /tmp/mydata.pipe
$ myprogram <myfile >/tmp/mydata.pipe 2>/tmp/mydata.err &
mysql> LOAD DATA LOCAL INFILE '/tmp/mydata.pipe' ...
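What the external program does is up to you; myprogram above is only a placeholder. As one possible shape for it, here is a Python sketch that splits each line into the four fields and emits tab-separated records. The function names split_quad and to_tsv are hypothetical, and the sketch assumes single-space separators, no escaped double quotes inside literals, and no tabs inside values.

```python
def split_quad(line):
    """Split one line into (subject, predicate, object, provenance)."""
    body = line.rstrip()
    if body.endswith('.'):
        # Drop the terminating " ." of the statement.
        body = body[:-1].rstrip()
    subject, predicate, rest = body.split(' ', 2)
    if rest.startswith('"'):
        # Quoted literal: keep the text between the first two double quotes
        # and discard any suffix such as a language tag or datatype.
        literal, _, tail = rest[1:].partition('"')
        return subject, predicate, literal, tail.split(' ')[-1]
    obj, provenance = rest.split(' ', 1)
    return subject, predicate, obj, provenance

def to_tsv(lines):
    """Yield one tab-separated record per non-empty input line."""
    for line in lines:
        if line.strip():
            yield '\t'.join(split_quad(line))

sample = [
    '<s> <p> <o> <g> .\n',
    ':_x <p> "25"^^<uri:integer> <g> .\n',
    ':_x <p> "just text in quotes" <g> .\n',
]
for record in to_tsv(sample):
    print(record)
```

A real filter would iterate over sys.stdin instead of the sample list and print each record, and the LOAD DATA statement would then use FIELDS TERMINATED BY '\t' with the default line terminator.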
FOLLOWUP
With an input line like this:
abc def "Hello"@en klm .
given FIELDS TERMINATED BY ' ' OPTIONALLY ENCLOSED BY '"'
field1 = 'abc'
field2 = 'def'
field3 = '"Hello"@en'
field4 = 'klm'
To test for the case when field3 contains double quotes, with the first double quote as the first character in the string, we could use something like this:
LIKE '"%"%'
That says the first character has to be a double quote, followed by zero, one or more characters, followed by another double quote, followed again by zero, one or more characters.
To get the portion of the field3 before the second double quote:
SUBSTRING_INDEX(@field3,'"',2)
To get rid of the leading double quote from that, i.e. to return what's between the double quotes in field3, you could do something like this:
SUBSTRING_INDEX(SUBSTRING_INDEX(@field3,'"',2),'"',-1)
To get the portion of field3 following the last double quote:
SUBSTRING_INDEX(@field3,'"',-1)
(These expressions assume that there are at most two double quotes in field3.)
To get the value for the third column:
CASE
-- when field starts with a double quote and is followed by another double quote
WHEN @field3 LIKE '"%"%'
-- return what's between the double quotes in field3
THEN SUBSTRING_INDEX(SUBSTRING_INDEX(@field3,'"',2),'"',-1)
-- otherwise return the entirety of field3
ELSE @field3
END
To get the value to be prepended to the fourth column, when field3 contains two double quotes:
CASE
-- when field starts with a double quote and is followed by another double quote
WHEN @field3 LIKE '"%"%'
-- return what's after the last double quote in field3
THEN SUBSTRING_INDEX(@field3,'"',-1)
-- otherwise return an empty string
ELSE ''
END
To prepend that to field4, use the CONCAT function with the CASE expression above and field4.
And these are the values we'd expect to have inserted into the table:
column1 = 'abc'
column2 = 'def'
column3 = 'Hello'
column4 = '@enklm'
ANOTHER FOLLOWUP
If the LOAD DATA isn't recognizing the line delimiter because it's not recognizing the field delimiters, then you'd have to ditch the field delimiters, and do the parsing yourself. Load the whole line into a user variable, and parse away.
e.g.
LINES TERMINATED BY '.\n'
(@line)
SET subject
= SUBSTRING_INDEX(@line,' ',1)
, predicate
= SUBSTRING_INDEX(SUBSTRING_INDEX(@line,' ',2),' ',-1)
, object
= CASE
WHEN SUBSTRING_INDEX(SUBSTRING_INDEX(@line,' ',3),' ',-1) LIKE '"%'
THEN SUBSTRING_INDEX(SUBSTRING_INDEX(@line,'"',2),'"',-1)
ELSE SUBSTRING_INDEX(SUBSTRING_INDEX(@line,' ',3),' ',-1)
END
, provenance
= CASE
WHEN SUBSTRING_INDEX(SUBSTRING_INDEX(@line,' ',3),' ',-1) LIKE '"%'
THEN SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(@line,'"',-1),' ',2),' ',-1)
ELSE SUBSTRING_INDEX(SUBSTRING_INDEX(@line,' ',4),' ',-1)
END
This will work for all the lines in your example data, with fields delimited by a single space, with the exception of matching double quotes in the third field.
NOTE: The functions available in SQL for string manipulation lead to clumsy and awkward syntax; SQL wasn't specifically designed for easy string manipulation.