Split string with space in MySQL

I have data like:
Name
-----------------
Ram Mohan
Ram Lal Mohan
Ram K Lal Mohan
...
I am using
select SUBSTRING_INDEX(Name,' ',1) from contact
to get the first name, and
select SUBSTRING_INDEX(Name,' ',-1) from contact
to get the last name.
I am getting data like this:
first name | last name
-----------|----------
Ram | Mohan
Ram | Mohan
Ram | Mohan
but the data I should get is:
first name | last name
-----------|----------
Ram | Mohan
Ram Lal | Mohan
Ram K Lal | Mohan
Only the last word after a space should go into the last name; the rest should go into the first name.
Can someone help me find a way to achieve this?

You could use a regex replacement here, assuming you are on MySQL 8+:
SELECT
REGEXP_REPLACE(Name, '\\s+\\S+$', '') AS first,
SUBSTRING_INDEX(Name,' ', -1) AS last
FROM contact;
For earlier versions of MySQL, and assuming that the last name would never appear anywhere else in the name, you could use SUBSTRING_INDEX along with REPLACE:
SELECT
REPLACE(Name, CONCAT(' ', SUBSTRING_INDEX(Name,' ', -1)), '') AS first,
SUBSTRING_INDEX(Name,' ', -1) AS last
FROM contact;
This second approach just deletes the last name (plus its leading space), which you were already finding correctly with SUBSTRING_INDEX. What is left behind should be the first, middle, etc. components you want.
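To check the REPLACE-based approach offline, here is a rough Python model of the two expressions. The `substring_index` helper is a hypothetical stand-in for MySQL's SUBSTRING_INDEX written for this sketch (it assumes a non-empty delimiter). Python's `str.replace`, like MySQL's `REPLACE()`, removes every occurrence, which is exactly why the "last name never appears elsewhere" assumption matters.

```python
from typing import Tuple


def substring_index(s: str, delim: str, count: int) -> str:
    """Hypothetical model of MySQL SUBSTRING_INDEX(s, delim, count).

    Assumes a non-empty delimiter. A positive count keeps everything left of
    the count-th delimiter; a negative count keeps everything right of the
    count-th delimiter counted from the end.
    """
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def split_with_replace(name: str) -> Tuple[str, str]:
    """Model of the pre-8.0 query built from REPLACE and SUBSTRING_INDEX."""
    last = substring_index(name, " ", -1)
    # REPLACE(Name, CONCAT(' ', last), ''): removes *every* " <last>" occurrence
    first = name.replace(" " + last, "")
    return first, last


for name in ["Ram Mohan", "Ram Lal Mohan", "Ram K Lal Mohan", "Ram Mohan Lal Mohan"]:
    print(repr(name), "->", split_with_replace(name))
```

The last input shows the caveat: because "Mohan" appears twice, both copies are removed and the first name comes out as "Ram Lal".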

Since you can get the last name, you can remove that many characters from the end of the string to get the first name.
select
trim(left(Name,char_length(Name)-char_length(substring_index(Name,' ',-1)))) first_name,
substring_index(Name,' ',-1) last_name
from contact
Do note that last names can have spaces in them (e.g. "Walter de la Mare").
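Because this version cuts by length instead of searching for the last name, a repeated surname is not a problem. A Python sketch of the same expression (again with a hypothetical `substring_index` helper standing in for the MySQL function) makes the difference visible, and the "Walter de la Mare" input shows the limitation mentioned above.

```python
from typing import Tuple


def substring_index(s: str, delim: str, count: int) -> str:
    """Hypothetical model of MySQL SUBSTRING_INDEX (assumes a non-empty delimiter)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count]) if count > 0 else delim.join(parts[count:])


def split_with_left(name: str) -> Tuple[str, str]:
    """Model of TRIM(LEFT(Name, CHAR_LENGTH(Name) - CHAR_LENGTH(last)))."""
    last = substring_index(name, " ", -1)
    # Python's len() counts characters (code points), like CHAR_LENGTH();
    # MySQL's TRIM() strips spaces by default, hence strip(" ").
    first = name[: len(name) - len(last)].strip(" ")
    return first, last


for name in ["Ram K Lal Mohan", "Ram Mohan Lal Mohan", "Walter de la Mare", "Ram"]:
    print(repr(name), "->", split_with_left(name))
```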

Related

In a SQL request, how to find all the records that have digits in the last part of the string (after last whitespace)

Thanks if someone can help.
In my table, I have a street column, and a number column. But for some records, the number of the house is at the end of the street name in the street column, separated by a whitespace.
Using a query in phpMyAdmin, I would like to remove the last block of the street column (after the last whitespace) if this block contains any digit, and put this block in the number column.
I entered this query in phpMyAdmin just to find those records:
SELECT `street`,`number`
FROM `map`
WHERE `street` REGEXP '[\r\n\t\f\v ][0-9]+ ^[\r\n\t\f\v ]'
but the query is incomplete: it doesn't isolate only the last block, and it doesn't remove the substring or put it into the number column.
Examples of how it should work (street column, number column):
('Rue van Malder 47B', '-1') becomes ('Rue van Malder', '47B')
('Rue des 2 Arbres 511B', '-1') becomes ('Rue des 2 Arbres', '511B') -> only the last block, with one or more digits, moves from the street column to the number column
('place du 4 Août', '1') stays ('place du 4 Août', '1') because the digit '4' is not in the last block
('751 2nd St', '-1') stays ('751 2nd St', '-1') for the same reason as above
Gordon Linoff, thanks; your answer was already part of the solution, but I can't completely adapt your suggestion to update my fields. This query almost worked for filling the number column:
UPDATE map
SET number = substring_index(street, ' ', -1)
WHERE street IN (SELECT street REGEXP '[0-9]+$')
but something is missing, because a field like this:
('Bd de la 2e armée Britannique', '-1') becomes ('Bd de la 2e armée Britannique', 'Britannique')
and this element should not be affected because the digit is not in the last block of the string
Also, how could I remove this last block from the street column with another UPDATE query, to finally obtain a truncated string in the street field:
('Rue van Malder 47B', '-1') becomes ('Rue van Malder', '47B')
Thanks
I found exactly the query I needed, with your help Gordon Linoff and some searches on Google, and I'll explain it here in case it helps someone in the future:
UPDATE map
SET number = substring_index(street, ' ', -1),
street = LEFT(street, CHAR_LENGTH(street) - CHAR_LENGTH(substring_index(street, ' ', -1)) - 1)
WHERE street REGEXP ' [0-9]+[a-zA-Z]*$'
So first, I update 2 columns in the 'map' table, number and street:
-number is the last part of the string, after the last space
-street is replaced by its left part: the whole 'street' string minus the length of the number part (the part we just found) and minus 1 for the space. CHAR_LENGTH is used rather than LENGTH because LENGTH counts bytes while LEFT counts characters, and street names such as 'Août' contain multibyte characters.
-the REGEXP means:
$ : the match has to touch the end of the string
' ' : exactly one space (no + or * sign after it)
[0-9] : a digit (the + sign means 1 or more digits)
[a-zA-Z] : a letter (the * sign means 0 or more letters)
So this REGEXP always targets the last block after the last space, because it searches for a space, 1 or more digits, 0 or more letters, and $ anchors it at the end. A plain space is used instead of [\r\n\t\f\v ] because SUBSTRING_INDEX only splits on a space, and because with the default SQL mode MySQL string literals turn the unknown escapes \f and \v into the plain letters f and v.
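The UPDATE can be dry-run in Python against the examples from the question before touching the table. This is a sketch under stated assumptions: `substring_index` is a hypothetical model of the MySQL function, Python's `re` stands in for MySQL's REGEXP (the whitespace before the number is modelled as a single space, matching what SUBSTRING_INDEX splits on), and `len()` plays the role of CHAR_LENGTH().

```python
import re
from typing import Tuple


def substring_index(s: str, delim: str, count: int) -> str:
    """Hypothetical model of MySQL SUBSTRING_INDEX (assumes a non-empty delimiter)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count]) if count > 0 else delim.join(parts[count:])


# WHERE clause: one space, 1+ digits, 0+ letters, end of string
LAST_BLOCK = re.compile(r" [0-9]+[a-zA-Z]*$")


def apply_update(street: str, number: str) -> Tuple[str, str]:
    """Return the (street, number) pair the UPDATE would leave behind."""
    if not LAST_BLOCK.search(street):
        return street, number  # row not matched by WHERE: left unchanged
    new_number = substring_index(street, " ", -1)
    # LEFT(street, CHAR_LENGTH(street) - CHAR_LENGTH(new_number) - 1); -1 drops the space
    new_street = street[: len(street) - len(new_number) - 1]
    return new_street, new_number


rows = [
    ("Rue van Malder 47B", "-1"),
    ("Rue des 2 Arbres 511B", "-1"),
    ("place du 4 Août", "1"),
    ("751 2nd St", "-1"),
    ("Bd de la 2e armée Britannique", "-1"),
]
for street, number in rows:
    print((street, number), "->", apply_update(street, number))
```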
You appear to be using MySQL. If so, this comes very close:
select (case when substring_index(street, ' ', -1) + 0 > 0
then substring_index(street, ' ', -1)
end)
from map
If the street ends in 34xyz, the condition is still true ('34xyz' + 0 evaluates to 34), so this would return 34xyz. For your version with just a number:
select (case when street regexp ' [0-9]+$'
then substring_index(street, ' ', -1)
end)
from map
The update would look like this:
update map
set number = substring_index(street, ' ', -1) + 0
where street regexp ' [0-9]+$';
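The difference between the two conditions comes down to how MySQL converts a string to a number. This Python sketch uses a simplified, hypothetical `plus_zero` helper (leading digits only; signs, decimals and exponents are ignored) to mimic `'34xyz' + 0 = 34`, alongside the stricter regex test. `substring_index` is likewise a hand-written model, not a real API.

```python
import re
from typing import Optional


def substring_index(s: str, delim: str, count: int) -> str:
    """Hypothetical model of MySQL SUBSTRING_INDEX (assumes a non-empty delimiter)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count]) if count > 0 else delim.join(parts[count:])


def plus_zero(s: str) -> int:
    """Simplified model of MySQL's `s + 0`: leading digits, or 0 if none."""
    m = re.match(r"\s*(\d+)", s)
    return int(m.group(1)) if m else 0


def via_plus_zero(street: str) -> Optional[str]:
    # case when substring_index(street, ' ', -1) + 0 > 0 then ... end
    last = substring_index(street, " ", -1)
    return last if plus_zero(last) > 0 else None


def via_regexp(street: str) -> Optional[str]:
    # case when street regexp ' [0-9]+$' then ... end
    return substring_index(street, " ", -1) if re.search(r" [0-9]+$", street) else None


for s in ["Main St 34xyz", "Rue van Malder 47", "Rue van Malder 47B", "751 2nd St"]:
    print(repr(s), via_plus_zero(s), via_regexp(s))
```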
Aside from the UPDATE, etc., I think you need 2 steps. First, determine whether the last word has digits:
WHERE street REGEXP "[[:space:]][^[:space:]0-9]+$"
This should be TRUE when the last word does not contain a digit. Note: I am checking for leading, trailing, or embedded digit(s). The statement of the problem, together with the examples, was ambiguous in this area.
After that, you can use something like this to extract the last word:
SUBSTRING_INDEX(street, ' ', -1)
but that only works for "space", not for general "white space" as in [[:space:]]. You really need to do the task in a language that has full regexp support. (Note: MariaDB is better than MySQL in this area, but still may fall short.)
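The two steps can be sketched in Python, with `\s` standing in for `[[:space:]]` and a hypothetical `substring_index` helper modelling the MySQL function (so, as warned above, step 2 splits on a plain space only). This is an illustration of the logic, not a replacement for the SQL.

```python
import re
from typing import Optional


def substring_index(s: str, delim: str, count: int) -> str:
    """Hypothetical model of MySQL SUBSTRING_INDEX (assumes a non-empty delimiter)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count]) if count > 0 else delim.join(parts[count:])


# Step 1: "[[:space:]][^[:space:]0-9]+$" is TRUE when the last word has no digit.
NO_DIGIT_LAST_WORD = re.compile(r"\s[^\s0-9]+$")


def last_word_if_it_has_digit(street: str) -> Optional[str]:
    if NO_DIGIT_LAST_WORD.search(street):
        return None  # last word is digit-free: leave the row alone
    # Step 2: extract the last word (splits on ' ' only, like SUBSTRING_INDEX)
    return substring_index(street, " ", -1)


for s in ["Rue van Malder 47B", "place du 4 Août", "Rue A1B", "Main"]:
    print(repr(s), "->", last_word_if_it_has_digit(s))
```

One edge case carries over from the SQL: a single-word value such as "Main" has no whitespace, so step 1 does not match and step 2 returns the word even though it has no digit.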

Removing Punctuation from a set of results

I have the following query:
select ContactName,Address, concat(City,' ', StateOrRegion,' ',PostalCode) as 'Region Info'
from Customers
with these results:
ContactName | Address | Region Info
------------|---------|------------
Maria Anders | Obere Str. 57 | Berlin 12209
Ana Trujillo | Avda. de la Constitución 2222 | México D.F. 05021
Antonio Moreno | Mataderos 2312 | México D.F. 05023
Thomas Hardy | 120 Hanover Sq. | London WA1 1DP
Christina Berglund | Berguvsvägen 8 | Luleå S-958 22
Hanna Moos | Forsterstr. 57 | Mannheim 68306
Frédérique Citeaux | 24, place Kléber | Strasbourg 67000
Martín Sommer | C/ Araquil, 67 | Madrid 28023
Laurence Lebihan | 12, rue des Bouchers | Marseille 13008
Elizabeth Lincoln | 23 Tsawassen Blvd. | Tsawassen BC T2F 8M4
My question is: can I remove the punctuation in the Address field without creating a table, and if so, what is the best way to go about it? Would working with LTRIM and/or RTRIM be a possibility here?
If you have a limited set of items you want to remove, you can simply use REPLACE(x, y, z) to replace the characters you want to remove with a zero-length string. x is the string to be searched, y is the string to find, and z is the string to replace y with.
An example:
DECLARE @a VARCHAR(50);
SET @a = 'This, is a test.';
SELECT REPLACE(REPLACE(@a, '.', ''), ',', '');
This will remove both the comma and the period from the string. Depending on the scale of your problem, this may work well.
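When the list of characters grows, writing the nested REPLACE calls by hand gets tedious. The hypothetical Python helper `nested_replace_sql` below generates the nesting for a given set of characters, and `strip_chars` shows what the resulting expression does to a value. Both are illustrations written for this sketch, not part of SQL Server.

```python
def nested_replace_sql(expr: str, chars: str) -> str:
    """Build REPLACE(REPLACE(expr, 'c1', ''), 'c2', '') ... one level per character."""
    for ch in chars:
        literal = ch.replace("'", "''")  # escape single quotes inside the SQL literal
        expr = f"REPLACE({expr}, '{literal}', '')"
    return expr


def strip_chars(text: str, chars: str) -> str:
    """What the nested REPLACE calls do to a value: remove each character in turn."""
    for ch in chars:
        text = text.replace(ch, "")
    return text


print(nested_replace_sql("c.Address", ".,/"))
print(strip_chars("C/ Araquil, 67", ".,/"))
```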
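Instead of using CONCAT(), why not simply use + to concatenate the values? (Be aware that + returns NULL if any operand is NULL, whereas CONCAT() treats NULL as an empty string.)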
I'd rewrite your query as:
SELECT c.ContactName
, c.Address
, [Region Info] = c.City + ' ' + c.StateOrRegion + ' ' + c.PostalCode
FROM dbo.Customers c;
You may notice I've capitalized the keywords in my query; this provides a great way to easily recognize keywords separately from column and table names, etc.
Also, you want to explicitly specify the schema; normally this is dbo. This will make your code less susceptible to problems in future if someone creates a new schema that happens to contain a table with the same name as the ones in your FROM clause.
You should also get in the habit of specifying an alias for items in the FROM clause, and use that alias in the other parts of your query. This makes debugging a lot simpler down the road.

Count the frequency of each word

I've been trawling the internet and realize that MySQL is not the best tool for this, but I'm asking anyway. What query, function or stored procedure have you seen or used that gets the frequency of each word across a text column?
Example:
ID | comment
---|-------------------
1 | I love this burger
2 | I hate this burger
Expected result:
word | count
-------|-------
burger | 2
I | 2
this | 2
love | 1
hate | 1
This solution seems to do the job (stolen almost verbatim from this page). It requires an auxiliary table filled with sequential numbers from 1 to at least the maximum number of words in any single comment. It is important to check that the auxiliary table is large enough; otherwise the results will be wrong without any error being raised.
SELECT
SUBSTRING_INDEX(SUBSTRING_INDEX(maintable.comment, ' ', auxiliary.id), ' ', -1) AS word,
COUNT(*) AS frequency
FROM maintable
JOIN auxiliary ON
LENGTH(comment)>0 AND SUBSTRING_INDEX(SUBSTRING_INDEX(comment, ' ', auxiliary.id), ' ', -1)
<> SUBSTRING_INDEX(SUBSTRING_INDEX(comment, ' ', auxiliary.id-1), ' ', -1)
GROUP BY word
HAVING word <> ''
ORDER BY frequency DESC;
This approach is as inefficient as one can be, because it cannot use any index.
As an alternative, I would use a statistics table kept up to date with triggers, perhaps initialised with the query above.
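To see why the size of the auxiliary table matters, here is a Python model of the same join. `aux_size` stands for the number of rows in the `auxiliary` table and `substring_index` is a hand-written model of the MySQL function; both are assumptions of this sketch.

```python
from collections import Counter


def substring_index(s: str, delim: str, count: int) -> str:
    """Hypothetical model of MySQL SUBSTRING_INDEX (assumes a non-empty delimiter)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count]) if count > 0 else delim.join(parts[count:])


def word_frequency(comments, aux_size):
    """Model of joining each comment to auxiliary ids 1..aux_size."""
    counts = Counter()
    for comment in comments:
        if len(comment) == 0:  # LENGTH(comment) > 0
            continue
        for i in range(1, aux_size + 1):
            word = substring_index(substring_index(comment, " ", i), " ", -1)
            previous = substring_index(substring_index(comment, " ", i - 1), " ", -1)
            # Past the last word, SUBSTRING_INDEX returns the whole comment again,
            # so word == previous and the join condition filters the row out.
            if word != previous:
                counts[word] += 1
    counts.pop("", None)  # drop empty "words" produced by repeated spaces
    return counts


comments = ["I love this burger", "I hate this burger"]
print(word_frequency(comments, 10))
print(word_frequency(comments, 2))  # auxiliary table too small: "this" and "burger" vanish
```

The model also exposes a quieter quirk of the join condition: a word repeated back to back, as in "very very good", is counted only once.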
Something like this should work for counting one given word per row. Just make sure you don't pass in a zero-length string.
SET @searchString = 'burger';
SELECT
ID,
(LENGTH(comment) - LENGTH(REPLACE(comment, @searchString, ''))) / LENGTH(@searchString) AS count
FROM MyTable;
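The parentheses around the subtraction matter, because division binds tighter than subtraction. This Python sketch models the expression with `len()` in place of LENGTH() (the ratio is the same whether bytes or characters are counted) and shows what the unparenthesized form computes instead. The function names are illustrative only.

```python
def count_occurrences(comment: str, search: str) -> int:
    """(LENGTH(comment) - LENGTH(REPLACE(comment, s, ''))) / LENGTH(s)"""
    if not search:
        # In MySQL a zero-length search string makes this a division by zero (NULL).
        raise ValueError("search string must not be empty")
    removed = len(comment) - len(comment.replace(search, ""))
    return removed // len(search)


def without_parentheses(comment: str, search: str) -> float:
    """LENGTH(comment) - LENGTH(REPLACE(...)) / LENGTH(s): only the second term is divided."""
    return len(comment) - len(comment.replace(search, "")) / len(search)


print(count_occurrences("I love this burger", "burger"))
print(without_parentheses("I love this burger", "burger"))
print(count_occurrences("burgers and burger", "burger"))  # substrings count too
```

Note that REPLACE() matches substrings, so "burgers" also counts as an occurrence of "burger".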

Substring from last index

ABC:123 UVW XYZ NN-000
What is the best method to get the value after the last space using substr()? In this case I want to get NN-000, but I also want to be able to get that last value when it is longer or shorter than 6 characters.
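In Oracle, use the SUBSTR and INSTR functions (INSTR with a position of -1 searches backwards for the last space; add 1 to start just after it):
SELECT SUBSTR('ABC:123 UVW XYZ NN-000', INSTR('ABC:123 UVW XYZ NN-000', ' ', -1) + 1)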
AS LASTOCCUR
FROM DUAL
RESULT:
| LASTOCCUR |
-------------
| NN-000 |
In MySQL you could use reverse and substring_index:
select data,
rv,
reverse(substring_index(rv, ' ', 1)) yd
from
(
select data,
reverse(data) rv
from yt
) d;
In Oracle you could use reverse, substr and instr:
select data,
reverse(substr(rv, 1, instr(rv, ' ') - 1)) rv
from
(
select data, reverse(data) rv
from yt
) d
Combine the powers of RIGHT(), REVERSE() and LOCATE():
SELECT RIGHT('ABC:123 UVW XYZ NN-000',LOCATE(' ',REVERSE('ABC:123 UVW XYZ NN-000'))-1)
EDIT: in MySQL the function is LOCATE, not CHARINDEX.
REVERSE() reverses the string, so that the 'first' space it finds is really the last one. You could use SUBSTRING instead of RIGHT, but if what you're after is at the end of the string, you might as well use RIGHT.
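The RIGHT/REVERSE/LOCATE expression and `SUBSTRING_INDEX(s, ' ', -1)` agree whenever the string contains a space; they differ when it does not. A small Python model (with hand-written stand-ins for the MySQL functions, so treat it as a sketch) shows both cases.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Hypothetical model of MySQL SUBSTRING_INDEX (assumes a non-empty delimiter)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count]) if count > 0 else delim.join(parts[count:])


def right_reverse_locate(s: str) -> str:
    """Model of RIGHT(s, LOCATE(' ', REVERSE(s)) - 1)."""
    locate = s[::-1].find(" ") + 1  # LOCATE is 1-based and returns 0 when not found
    n = locate - 1
    return s[len(s) - n:] if n > 0 else ""  # RIGHT(s, 0) and RIGHT(s, -1) give ''


for s in ["ABC:123 UVW XYZ NN-000", "A B CDEFGHIJ", "NN-000"]:
    print(repr(s), repr(right_reverse_locate(s)), repr(substring_index(s, " ", -1)))
```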

MySQL ordering problem

Consider this situation: I have a table named "test"
-------
content (varchar(30))
-------
1
abc
2
bcd
-------
If I use ORDER BY
Select * from test order by content asc
I get a result like
--------
content
--------
1
2
abc
bcd
---------
but is there any way I could get the following result with a query?
--------
content
--------
abc
bcd
1
2
---------
To get around the collation, you can test the first character. It appears you want anything starting with a digit to sort after anything alphabetic, something like the ISNUMERIC() approach suggested by Ted, but my quick check doesn't show such a function in MySQL. So here is an alternative, which works because digits come before "A" (char 65) in ASCII:
Select *
from test
order by
case when left( content, 1 ) < "A" then 2 else 1 end,
content
I've seen various CONVERT() calls, but I don't have MySQL available to confirm them. However, in addition to the CASE/WHEN above, you can add a SECOND CASE/WHEN that calls some UDF() or other conversion function on the "content" value. If the string starts with a letter, it should return zero, so the first CASE/WHEN keeps those rows at the top of the list; since none of them converts to a number, they all get zero and the second sort key has no impact, leaving the content itself to keep them in alphabetical order.
HOWEVER, if your second CASE/WHEN / conversion function DOES return a numeric value, then the rows will be properly sorted within the numeric group, and that order takes precedence over the content. If the content was something like
100 smith rd
100 main st
they will sort in the same numeric "100" category, then alphabetically by content:
100 main st
100 smith rd
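The two-level sort in the first query can be modelled as a tuple key: the CASE result first, then the value itself. Python compares by code point while MySQL compares using the column's collation, so this sketch only matches MySQL for simple ASCII data like the sample; the function name is illustrative.

```python
from typing import Tuple


def first_char_key(content: str) -> Tuple[int, str]:
    """ORDER BY CASE WHEN LEFT(content, 1) < 'A' THEN 2 ELSE 1 END, content"""
    group = 2 if content[:1] < "A" else 1  # digits (and some punctuation) sort before 'A'
    return group, content


print(sorted(["1", "abc", "2", "bcd"], key=first_char_key))
print(sorted(["100 smith rd", "100 main st", "abc"], key=first_char_key))
```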
This will do it:
SELECT *
FROM test
ORDER BY CAST(content AS UNSIGNED), content ASC
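CAST(... AS UNSIGNED) turns a non-numeric string into 0 and a string with a leading number into that number, so letters sort first and numbers follow in numeric order. The Python model below uses a simplified, hypothetical `cast_unsigned` helper (leading digits only); one side effect it makes easy to see is that a value of '0' lands among the alphabetic rows.

```python
import re
from typing import Tuple


def cast_unsigned(s: str) -> int:
    """Simplified model of CAST(s AS UNSIGNED): leading digits, or 0 if none."""
    m = re.match(r"\s*(\d+)", s)
    return int(m.group(1)) if m else 0


def cast_key(content: str) -> Tuple[int, str]:
    """ORDER BY CAST(content AS UNSIGNED), content ASC"""
    return cast_unsigned(content), content


print(sorted(["1", "abc", "10", "bcd", "9"], key=cast_key))
print(sorted(["abc", "0", "1"], key=cast_key))  # '0' casts to 0, so it sorts with the letters
```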
select * from test order by left(content, 1) between '0' and '9', content
Not sure about MySQL, but on SQL Server you can do this:
SELECT * FROM test
ORDER BY IsNumeric(content), content
The order of results is defined by the collation used, so if you can find the right collation, then yes.
http://dev.mysql.com/doc/refman/5.0/en/charset-collate.html
Edit:
This is tricky. I've done some research and it seems that no currently available collation can do that. However there's also possibility to add new collation to MySQL. Here's how.