Extracting second word from each row in a column - mysql

I have a vendors table in my database that I am experimenting with, as shown below.
When I run the SQL command below
SELECT vendor_name
FROM vendors
ORDER BY vendor_name
LIMIT 10
I get the output below
My issue is that I am trying to extract the second word from each vendor_name, and when there is no second word the query should return a blank cell.
Below is the SQL query I have written to do that:
SELECT vendor_name,
SUBSTRING(
SUBSTRING( vendor_name, LOCATE(' ', vendor_name) + 1),
1,
LOCATE( ' ', SUBSTRING( vendor_name, LOCATE(' ', vendor_name) + 1) ) - 1
) AS second_word
FROM vendors
ORDER BY vendor_name
LIMIT 10
Here is the output of that query:
As you can see from the output above, when vendor_name has more than two words the query returns the second word correctly, and when vendor_name is a single word it returns a blank cell, as expected.
The problem is when vendor_name contains exactly two words: instead of returning the second word, it returns a blank cell, for example for American Express and ASC Signs.
How can I improve my query so that it also returns the second word when vendor_name contains exactly two words?
Thank you.

That's because there is no space after the second word: when the text ends there, LOCATE() has no space to find and returns 0, so the length passed to SUBSTRING() becomes -1 and the result is empty.
Quick hack: Add a space at the end.
LOCATE( ' ', CONCAT(SUBSTRING( vendor_name, LOCATE(' ', vendor_name) + 1), ' ') ) - 1

SELECT vendor_name,
CASE WHEN INSTR(vendor_name, ' ') = 0 THEN ''
ELSE SUBSTR(vendor_name, INSTR(vendor_name, ' ') + 1,
CASE WHEN LOCATE(' ', vendor_name, INSTR(vendor_name, ' ') + 1) > 0
THEN LOCATE(' ', vendor_name, INSTR(vendor_name, ' ') + 1) - INSTR(vendor_name, ' ') - 1
ELSE CHAR_LENGTH(vendor_name) END)
END AS second_word
FROM vendors;

I took tips from both @stick bit and @kiran gadhe and came up with this SQL query, which is working just fine:
SELECT vendor_name,
CASE
WHEN INSTR( vendor_name, ' ' ) = 0
THEN
''
ELSE
SUBSTRING(
SUBSTRING( vendor_name, LOCATE(' ', vendor_name) + 1),
1,
LOCATE( ' ', CONCAT(SUBSTRING( vendor_name, LOCATE(' ', vendor_name) + 1), ' ') ) - 1
)
END AS second_word
FROM vendors
ORDER BY vendor_name
LIMIT 10
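The three variants discussed above (the original query, the trailing-space fix, and the final query with the INSTR() guard) can be reproduced outside the database. Below is a minimal Python sketch that mirrors their string logic. `locate` and `substring` are hypothetical helpers that assume MySQL's documented behaviour for LOCATE()/INSTR() (1-based position, 0 when not found) and SUBSTRING() (a length below 1 gives an empty string). They only handle positive positions. "Abbey Office Furnishings" and "Zylka" are made-up sample names.

```python
from typing import Optional


def locate(needle: str, haystack: str) -> int:
    """Mimic MySQL LOCATE()/INSTR(): 1-based position, 0 if not found."""
    return haystack.find(needle) + 1


def substring(s: str, pos: int, length: Optional[int] = None) -> str:
    """Mimic MySQL SUBSTRING() for pos >= 1; a length below 1 yields ''."""
    start = pos - 1
    if length is None:
        return s[start:]
    if length < 1:
        return ""
    return s[start:start + length]


def second_word(name: str, pad: bool = False, guard: bool = False) -> str:
    # guard=True mirrors CASE WHEN INSTR(vendor_name, ' ') = 0 THEN ''.
    if guard and locate(" ", name) == 0:
        return ""
    # Everything after the first space (the whole name when there is none).
    rest = substring(name, locate(" ", name) + 1)
    # pad=True mirrors CONCAT(..., ' '): LOCATE() then always finds a space.
    probe = rest + " " if pad else rest
    return substring(rest, 1, locate(" ", probe) - 1)


for name in ["American Express", "ASC Signs", "Abbey Office Furnishings", "Zylka"]:
    print(f"{name!r}: original={second_word(name)!r}, "
          f"padded={second_word(name, pad=True)!r}, "
          f"final={second_word(name, pad=True, guard=True)!r}")
```

With only the padding, a one-word name such as "Zylka" is returned as its own second word. That is why the final query keeps the INSTR() = 0 check.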

Related

How do I split every data or string that has a space on it using MySQL?

I wrote this query:
SELECT mrt_name as MRT ,
operation_alpha_numeric_codes as Original,
SUBSTRING_INDEX(operation_alpha_numeric_codes,' ', 1) as First_code,
SUBSTRING_INDEX(operation_alpha_numeric_codes,' ', -1) as Second_Code,
SUBSTRING_INDEX(operation_alpha_numeric_codes,' ', -2) as Third_Code
FROM scraping.xp_pn_mrt;
I got results like this:
As you can see, Second_Code copies the value of Original (or First_code) when the value has no space and therefore no second code. Also, Third_Code picks up the second code in the records that do have a third code. How do I prevent the data from being copied, or set the column to blank when there is no corresponding code, and how can I get the third code without also getting the second one? Can someone help me with my query and explain what's wrong with it? Thanks a lot.
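Both symptoms follow from how SUBSTRING_INDEX() treats its count argument. A positive count keeps everything to the left of the count-th delimiter. A negative count keeps everything to the right of the count-th delimiter counted from the end. When there are fewer delimiters than that, the whole string comes back. Here is a small Python sketch of those documented semantics. `substring_index` is a hypothetical helper for illustration that assumes a non-empty delimiter. The sample values are taken from the output table in the answer below.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Mimic MySQL SUBSTRING_INDEX() for a non-empty delimiter."""
    if count == 0:
        return ""
    parts = s.split(delim)
    # Positive count: first `count` pieces; negative count: last `-count`
    # pieces. Too few delimiters means every piece, i.e. the whole string.
    return delim.join(parts[:count] if count > 0 else parts[count:])


for code in ["NS23", "NS25 EW13", "NS24 NE6 CC1"]:
    print(code, [substring_index(code, " ", n) for n in (1, -1, -2)])
```

"NS23" comes back unchanged for -1 and -2, and -2 returns two codes. It also shows that -1 picks the last code (CC1) rather than the second one when there are three. The nested form SUBSTRING_INDEX(SUBSTRING_INDEX(x, ' ', 2), ' ', -1) is what isolates the second code.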
Since you're using MariaDB, you can use REGEXP_REPLACE to extract the parts of the code that you want:
SELECT
operation_alpha_numeric_codes as Original,
REGEXP_REPLACE(operation_alpha_numeric_codes, '^([^ ]+)(?: ([^ ]+))?(?: ([^ ]+))?$', '\\1') as First_code,
REGEXP_REPLACE(operation_alpha_numeric_codes, '^([^ ]+)(?: ([^ ]+))?(?: ([^ ]+))?$', '\\2') as Second_code,
REGEXP_REPLACE(operation_alpha_numeric_codes, '^([^ ]+)(?: ([^ ]+))?(?: ([^ ]+))?$', '\\3') as Third_code
FROM data
Output for (part of) your sample data:
Original        First_code  Second_code  Third_code
NS23            NS23
NS24 NE6 CC1    NS24        NE6          CC1
NS25 EW13       NS25        EW13
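Python's re module accepts the same pattern, so the MariaDB answer can be sanity-checked outside the database. This sketch assumes Python 3.5 or later. From that version on, re.sub() substitutes an empty string for a group that did not take part in the match, which is the behaviour that leaves Second_code and Third_code blank. `split_codes` is a hypothetical helper name.

```python
import re
from typing import Tuple

# Same pattern as the REGEXP_REPLACE() calls: one required code, then up to
# two optional space-separated codes captured in groups 2 and 3.
PATTERN = re.compile(r"^([^ ]+)(?: ([^ ]+))?(?: ([^ ]+))?$")


def split_codes(original: str) -> Tuple[str, ...]:
    # Like REGEXP_REPLACE(original, pattern, '\\N') for N = 1, 2, 3.
    return tuple(PATTERN.sub(template, original) for template in (r"\1", r"\2", r"\3"))


for value in ["NS23", "NS24 NE6 CC1", "NS25 EW13"]:
    print(repr(value), split_codes(value))
```

As with REGEXP_REPLACE(), input that does not match the pattern (for example four codes) is returned unchanged in all three columns.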
Here's a version that will also work on MySQL 5.7, using RLIKE to check if the input matches given patterns:
SELECT
operation_alpha_numeric_codes as Original,
SUBSTRING_INDEX(operation_alpha_numeric_codes, ' ', 1) AS First_code,
CASE WHEN operation_alpha_numeric_codes RLIKE '^([^ ]+)$' THEN ''
WHEN operation_alpha_numeric_codes RLIKE '^([^ ]+)( ([^ ]+))?$' THEN SUBSTRING_INDEX(operation_alpha_numeric_codes, ' ', -1)
ELSE SUBSTRING_INDEX(SUBSTRING_INDEX(operation_alpha_numeric_codes, ' ', 2), ' ', -1)
END AS Second_code,
CASE WHEN operation_alpha_numeric_codes RLIKE '^([^ ]+)( ([^ ]+)){2}$' THEN SUBSTRING_INDEX(operation_alpha_numeric_codes, ' ', -1)
ELSE ''
END AS Third_code
FROM data
You can try the following:
SELECT
mrt_name as MRT ,
operation_alpha_numeric_codes as Original,
SUBSTRING_INDEX(SUBSTRING_INDEX(operation_alpha_numeric_codes , ' ', 1), ' ', -1) AS First_code,
If( length(operation_alpha_numeric_codes ) - length(replace(operation_alpha_numeric_codes , ' ', ''))>=1,
SUBSTRING_INDEX(SUBSTRING_INDEX(operation_alpha_numeric_codes , ' ', 2), ' ', -1) , '')
as Second_code,
If( length(operation_alpha_numeric_codes ) - length(replace(operation_alpha_numeric_codes , ' ', ''))>=2,
SUBSTRING_INDEX(SUBSTRING_INDEX(operation_alpha_numeric_codes , ' ', 3), ' ', -1), '')
AS Third_code
FROM scraping.xp_pn_mrt;
[EDIT]
If there are two spaces between each value, this will work:
SELECT
mrt_name as MRT ,
operation_alpha_numeric_codes as Original,
If( length(operation_alpha_numeric_codes ) - length(replace(operation_alpha_numeric_codes , ' ', ''))>=2,
SUBSTRING_INDEX(SUBSTRING_INDEX(operation_alpha_numeric_codes , ' ', 3), ' ', -1) , '')
as Second_code,
If( length(operation_alpha_numeric_codes ) - length(replace(operation_alpha_numeric_codes , ' ', ''))>=4,
SUBSTRING_INDEX(SUBSTRING_INDEX(operation_alpha_numeric_codes , ' ', 5), ' ', -1), '') AS Third_code
FROM scraping.xp_pn_mrt;
I have used a CASE WHEN clause with the LENGTH function.
I used LENGTH to count the occurrences of your separator ' ' in the string: when there is ONE occurrence there are TWO codes, and when there are TWO occurrences there are THREE codes.
This returns the correct results for your two problematic rows:
SELECT mrt_name as MRT
, operation_alpha_numeric_codes as Original
, SUBSTRING_INDEX(operation_alpha_numeric_codes,' ', 1) as First_code
, CASE WHEN LENGTH(operation_alpha_numeric_codes) - LENGTH(REPLACE(operation_alpha_numeric_codes, ' ', '')) = 1
THEN SUBSTRING_INDEX(operation_alpha_numeric_codes,' ', -1)
WHEN LENGTH(operation_alpha_numeric_codes) - LENGTH(REPLACE(operation_alpha_numeric_codes, ' ', '')) = 2
THEN SUBSTRING_INDEX(SUBSTRING_INDEX(operation_alpha_numeric_codes,' ', 2), ' ', -1)
ELSE NULL
END Second_Code
, CASE WHEN LENGTH(operation_alpha_numeric_codes) - LENGTH(REPLACE(operation_alpha_numeric_codes, ' ', '')) = 2
THEN SUBSTRING_INDEX(SUBSTRING_INDEX(operation_alpha_numeric_codes,' ', -2), ' ', -1)
ELSE NULL
END Third_Code
FROM scraping.xp_pn_mrt;
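The space-counting trick can be checked in a few lines of Python. This sketch mirrors the CASE logic of the query above, with None standing in for NULL. `code_columns` is a hypothetical name. In MySQL, LENGTH() counts bytes rather than characters. In character sets where a space is a single byte (such as utf8mb4 or latin1), the difference still equals the number of spaces.

```python
from typing import Optional, Tuple


def code_columns(original: str) -> Tuple[str, Optional[str], Optional[str]]:
    # LENGTH(x) - LENGTH(REPLACE(x, ' ', '')): one character removed per space.
    spaces = len(original) - len(original.replace(" ", ""))
    parts = original.split(" ")
    first = parts[0]                                 # SUBSTRING_INDEX(x, ' ', 1)
    second = parts[1] if spaces in (1, 2) else None  # ELSE NULL otherwise
    third = parts[2] if spaces == 2 else None
    return first, second, third


for value in ["NS23", "NS25 EW13", "NS24 NE6 CC1"]:
    print(repr(value), code_columns(value))
```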

Insert white space before the last three characters

I have a table of UK postcodes. They are in different formats: some are capitalized and contain a space, some are not. I want to format them so they follow the UK postcode standard, for instance AB1 2BB.
I used this query, which does work, but some postcodes have a longer or shorter first part, so it does not succeed for all of them.
SELECT UPPER(INSERT((REPLACE(postcode , ' ', '')) , 4, 0, ' ')) AS postcode
However, if I try to do it from the other end:
SELECT UPPER(INSERT((REPLACE(postcode , ' ', '')) , -4, 0, ' ')) AS postcode
it does not work and returns all the postcodes glued together, e.g. AB12BB.
What I want is to put a space before the last 3 characters.
I think you want:
select concat_ws(' ',
left(replace(postcode, ' ', ''), char_length(replace(postcode, ' ', '')) - 3),
right(replace(postcode, ' ', ''), 3)
) as standardized_postcode
Remove all the spaces, then insert a space before the 3rd character from the end of the string:
SELECT UPPER(INSERT(REPLACE(postcode , ' ', ''), LENGTH(REPLACE(postcode , ' ', '')) - 2, 0, ' ')) AS postcode
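MySQL's INSERT(str, pos, len, newstr) returns the original string unchanged when pos is not within the length of the string. That is why the -4 attempt in the question leaves AB12BB untouched, while LENGTH(...) - 2 lands on the first of the last three characters. Here is a minimal Python sketch of that behaviour, assuming len >= 0. `mysql_insert` and `format_postcode` are hypothetical helpers, and UPPER() is applied as in the query.

```python
def mysql_insert(s: str, pos: int, length: int, new: str) -> str:
    """Mimic MySQL INSERT(str, pos, len, newstr) for len >= 0."""
    if not 1 <= pos <= len(s):
        # Out-of-range position: MySQL returns the original string.
        return s
    start = pos - 1
    return s[:start] + new + s[start + length:]


def format_postcode(raw: str) -> str:
    compact = raw.replace(" ", "")
    # LENGTH(...) - 2 is the position of the first of the last three characters.
    return mysql_insert(compact, len(compact) - 2, 0, " ").upper()


print(mysql_insert("AB12BB", -4, 0, " "))  # the -4 attempt from the question
for raw in ["ab12bb", "LS10 1DH", "ls63dr"]:
    print(repr(raw), "->", repr(format_postcode(raw)))
```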
It sounds like you're going to be dealing with postcodes like this:
LS10 1DH
LS101DH
LS63DR
etc. We should start by removing all of the spaces:
REPLACE(postcode,' ','') -- LS10 1DH becomes LS101DH
taking the last 3 characters:
RIGHT(REPLACE(postcode,' ',''), 3) -- 1DH
and all of the characters up to the 3rd from last:
LEFT(REPLACE(postcode,' ',''), LENGTH(REPLACE(postcode,' ','')) - 3)
Then use CONCAT to bring it all together:
SET @postcode = 'LS101DH';
SELECT CONCAT(LEFT(REPLACE(@postcode,' ',''), LENGTH(REPLACE(@postcode,' ','')) - 3),
' ', -- add a space in
RIGHT(REPLACE(@postcode,' ',''), 3));
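postcodes = ['LS10 1DH', 'LS101DH', 'ls63dr']  # example values
for pc in (p.replace(' ', '').upper() for p in postcodes):
    print('{} {}'.format(pc[:-3], pc[-3:]))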
Seems to work for a given list of UK postcodes for me.

return the Nth word from database

I want to get the Nth word from a column. I'm using the query below and it works, but there is an issue. For example:
First line is: "the Nth word from database"
Second line is: "return the Nth word from database and more words"
When I ask for the 6th word, 'database' is returned for both the first and the second line, but I don't want the first line because it has only 5 words.
Thank you all.
My query:
SELECT *,
SUBSTRING_INDEX(SUBSTRING_INDEX(`Text`, ' ', 6), ' ', -1) as Nth
FROM `tbl_name`
Having six words in your sentence means there must be at least five spaces, so adding a simple condition resolves the problem:
select *,
case when length(`text`) - length(replace(`text`, ' ', '')) >= 5 then
substring_index(substring(`text`, char_length(substring_index(`text`, ' ', 5)) + 2), ' ', 1)
else null end Nth
from `tbl_name`
I also changed how the word is extracted: the query drops the first five words and takes the first word of what remains, so it works whether or not there is a 6th space (i.e. also for sentences of exactly six words).
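Or, more concisely (this still returns a non-NULL value for rows with fewer than five spaces, so keep the condition above if you need to exclude them):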
select *,
substring_index(substring_index(`text`, ' ', 5 - (length(`text`) - length(replace(`text`, ' ', ''))) - 1), ' ', 1) as Nth
from `tbl_name`
You should add a WHERE clause to your query that counts the number of words (via the number of spaces):
SELECT *, SUBSTRING_INDEX(SUBSTRING_INDEX(`Text`, ' ', 6), ' ', -1) as Nth
FROM `tbl_name`
WHERE LENGTH(`Text`) - LENGTH(REPLACE(`Text`, ' ', '')) >= 5
You need to count the spaces (or whichever delimiter string you want to use) and then apply a HAVING clause to that count.
SELECT
* ,
SUBSTRING_INDEX( SUBSTRING_INDEX( `text` , ' ', 6 ) , ' ', -1 ) AS Nth,
ROUND( ( LENGTH( `text` ) - LENGTH( REPLACE( `text` , " ", "" ) ) ) / LENGTH( " " ) ) AS countq
FROM `xp_test`
HAVING
countq >= 5
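All of these answers combine the same two steps: count the spaces, and only extract the Nth word when there are at least N - 1 of them. Below is a Python sketch of that idea. It uses a hypothetical `substring_index` helper that follows MySQL's documented SUBSTRING_INDEX() semantics for a non-empty delimiter, and it returns None where SQL would give NULL.

```python
from typing import Optional


def substring_index(s: str, delim: str, count: int) -> str:
    """Mimic MySQL SUBSTRING_INDEX() for a non-empty delimiter."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def nth_word(text: str, n: int) -> Optional[str]:
    spaces = len(text) - len(text.replace(" ", ""))
    if spaces < n - 1:
        return None  # fewer than n words, like the CASE ... ELSE NULL branch
    # SUBSTRING_INDEX(SUBSTRING_INDEX(text, ' ', n), ' ', -1)
    return substring_index(substring_index(text, " ", n), " ", -1)


first = "the Nth word from database"
second = "return the Nth word from database and more words"
# Without the check, both rows yield 'database' for n = 6:
print(substring_index(substring_index(first, " ", 6), " ", -1))
print(nth_word(first, 6), nth_word(second, 6))
```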

Splitting a single column (name) into two (forename, surname) in SQL

Currently I'm working on a database redesign project. A large bulk of this project is pulling data from the old database and importing it into the new one.
One of the columns in a table from the old database is called 'name'. It contains a forename and a surname all in one field (ugh). The new table has two columns; forenames and surname. I need to come up with a clean, efficient way to split this single column into two.
For now I'd like to do everything in the same table and then I can easily transfer it across.
3 columns:
Name (the forename and surname)
Forename (currently empty, first half of name should go here)
Surname (currently empty, second half of name should go here)
What I need to do: Split name in half and place into forename and surname
If anyone could shed some light on how to do this kind of thing I would really appreciate it as I haven't done anything like this in SQL before.
Database engine: MySQL
Storage engine: InnoDB
A quick solution is to use SUBSTRING_INDEX to get everything at the left of the first space, and everything past the first space:
UPDATE tablename
SET
Forename = SUBSTRING_INDEX(Name, ' ', 1),
Surname = SUBSTRING_INDEX(Name, ' ', -1)
It is not perfect, as a name can contain multiple spaces, but it is a good query to start with, and it works for most names.
Try this:
insert into new_table (forename, lastName, ...)
select
substring_index(name, ' ', 1),
substring(name from instr(name, ' ') + 1),
...
from old_table
This assumes the first word is the forename and the rest is the last name, which correctly handles multi-word last names like "John De Lacey".
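A quick way to see what this split does with different inputs is to mirror it in Python. The sketch below is an illustration only. `split_first_rest` is a hypothetical helper, and it follows MySQL's INSTR() convention (0 when there is no space), so a single-word name ends up in both columns. The next answer handles that case with IF().

```python
from typing import Tuple


def split_first_rest(name: str) -> Tuple[str, str]:
    instr = name.find(" ") + 1        # INSTR(name, ' '): 1-based, 0 if no space
    forename = name.split(" ", 1)[0]  # SUBSTRING_INDEX(name, ' ', 1)
    surname = name[instr:]            # SUBSTRING(name FROM INSTR(name, ' ') + 1)
    return forename, surname


for full in ["John De Lacey", "Jane Doe", "John"]:
    print(repr(full), split_first_rest(full))
```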
For people who want to handle a full name without a space (John -> firstname: John, lastname: null):
SELECT
if( INSTR(`name`, ' ')=0,
TRIM(SUBSTRING(`name`, INSTR(`name`, ' ')+1)),
TRIM(SUBSTRING(`name`, 1, INSTR(`name`, ' ')-1)) ) first_name,
if( INSTR(`name`, ' ')=0,
null,
TRIM(SUBSTRING(`name`, INSTR(`name`, ' ')+1)) ) last_name
A plain SUBSTRING split works fine with "John Doe". However, if the user fills in just "John" with no last name, SUBSTRING(name, INSTR(name, ' ')+1) as lastname returns "John" instead of NULL, and SUBSTRING(name, 1, INSTR(name, ' ')-1) as firstname returns an empty string.
In my case I added an IF check to determine the last name correctly, and TRIM to strip the extra spaces left over when there are several spaces between the names.
This improves on the answer given. Consider an entry like "Jack Smith Smithson": if you need just a first and last name, and you want the first name to be "Jack Smith" and the last name "Smithson", you need a query like this:
-- MySQL
SELECT
SUBSTR(full_name, 1, CHAR_LENGTH(full_name) - CHAR_LENGTH(SUBSTRING_INDEX(full_name, ' ', -1)) - 1) as first_name,
SUBSTRING_INDEX(full_name, ' ', -1) as last_name
FROM yourtable
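Here is the same arithmetic in Python, as a sketch (`split_rest_last` is a hypothetical helper). The explicit check for a length below 1 stands in for MySQL's rule that SUBSTR() with such a length returns an empty string. A plain Python slice with a negative end would behave differently.

```python
from typing import Tuple


def split_rest_last(full_name: str) -> Tuple[str, str]:
    last = full_name.split(" ")[-1]  # SUBSTRING_INDEX(full_name, ' ', -1)
    # Keep everything before the last word and the space preceding it.
    keep = len(full_name) - len(last) - 1
    first = full_name[:keep] if keep >= 1 else ""
    return first, last


for full in ["Jack Smith Smithson", "Jane Doe", "John"]:
    print(repr(full), split_rest_last(full))
```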
Just wanted to share my solution. It also works with middle names. The middle name will be added to the first name.
SELECT
TRIM(SUBSTRING(name, 1, CHAR_LENGTH(name) - CHAR_LENGTH(SUBSTRING_INDEX(name, ' ', -1)))) AS firstname,
SUBSTRING_INDEX(name, ' ', -1) AS lastname
I had a similar problem, but with names containing several middle names, e.g. "FirstName MiddleNames LastName" (note "MiddleNames", plural, rather than "MiddleName").
So I used a combo of substring() and reverse() to solve my problem:
select
SystemUser.Email,
SystemUser.Name,
Substring(SystemUser.Name, 1, instr(SystemUser.Name, ' ') - 1) as 'First Name',
reverse(Substring(reverse(SystemUser.Name), 1, instr(reverse(SystemUser.Name), ' ') - 1)) as 'Last Name'
from SystemUser
I do not need the "MiddleNames" part and maybe this is not the most efficient way to solve it, but it works for me.
Got here from Google and came up with a slightly different solution that handles names with more than two parts (up to six name parts, i.e. five spaces). It sets the name_last column to everything to the right of the first name (the first space), and it also sets full_name to the first name part. Back up your DB before running this :-), but it worked for me:
UPDATE users SET
name_last =
CASE
WHEN LENGTH(SUBSTRING_INDEX(full_name, ' ', 1)) = LENGTH(full_name) THEN ''
WHEN LENGTH(SUBSTRING_INDEX(full_name, ' ', 2)) = LENGTH(full_name) THEN SUBSTRING_INDEX(full_name, ' ', -1)
WHEN LENGTH(SUBSTRING_INDEX(full_name, ' ', 3)) = LENGTH(full_name) THEN SUBSTRING_INDEX(full_name, ' ', -2)
WHEN LENGTH(SUBSTRING_INDEX(full_name, ' ', 4)) = LENGTH(full_name) THEN SUBSTRING_INDEX(full_name, ' ', -3)
WHEN LENGTH(SUBSTRING_INDEX(full_name, ' ', 5)) = LENGTH(full_name) THEN SUBSTRING_INDEX(full_name, ' ', -4)
WHEN LENGTH(SUBSTRING_INDEX(full_name, ' ', 6)) = LENGTH(full_name) THEN SUBSTRING_INDEX(full_name, ' ', -5)
ELSE ''
END,
full_name = SUBSTRING_INDEX(full_name, ' ', 1)
WHERE LENGTH(name_last) = 0 or LENGTH(name_last) is null or name_last = ''
SUBSTRING_INDEX is a MySQL function and didn't work for me in SQL 2018, so I used this T-SQL instead:
declare @fullName varchar(50) = 'First Last1 Last2'
declare @first varchar(50)
declare @last varchar(50)
select @last = right(@fullName, len(@fullName)-charindex(' ',@fullName, 1)), @first = left(@fullName, (charindex(' ', @fullName, 1))-1);
Yields @first = 'First', @last = 'Last1 Last2'

Invalid length parameter passed to the LEFT or SUBSTRING function

I've seen a few of these questions asked but haven't spotted one that helped! I'm trying to select only the first part of a postcode, essentially ignoring anything after the space. The code I am using is:
SUBSTRING(PostCode, 1 , CHARINDEX(' ', PostCode ) -1)
However, I am getting:
Invalid length parameter passed to the LEFT or SUBSTRING function
There are no NULLs or blanks, but some postcodes only have the first part. Is this what's causing the error, and if so, what's the workaround?
That would only happen if PostCode is missing a space.
You could add a condition so that the whole PostCode is returned when no space is found, as follows:
select SUBSTRING(PostCode, 1 ,
case when CHARINDEX(' ', PostCode ) = 0 then LEN(PostCode)
else CHARINDEX(' ', PostCode) -1 end)
CHARINDEX returns 0 if there are no spaces in the string, and you then ask for a substring of length -1.
You can tack a trailing space on to the end of the string to ensure there is always at least one space and avoid this problem.
SELECT SUBSTRING(PostCode, 1 , CHARINDEX(' ', PostCode + ' ' ) -1)
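The trailing-space trick is easy to check in Python. In this sketch, `charindex` is a hypothetical helper that assumes T-SQL's CHARINDEX() convention (1-based position, 0 when not found), and Python slicing stands in for SUBSTRING(PostCode, 1, n).

```python
def charindex(needle: str, haystack: str) -> int:
    """Mimic T-SQL CHARINDEX(): 1-based position, 0 when not found."""
    return haystack.find(needle) + 1


def first_part(postcode: str) -> str:
    # SUBSTRING(PostCode, 1, CHARINDEX(' ', PostCode + ' ') - 1)
    length = charindex(" ", postcode + " ") - 1
    return postcode[:length]


for pc in ["AB1 2BB", "LS10 1DH", "AB1"]:
    # Without the padding, "AB1" would give a length of -1 (the error case).
    print(repr(pc), charindex(" ", pc) - 1, repr(first_part(pc)))
```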
This is because CHARINDEX(' ', PostCode) - 1 returns a negative value when the look-up for ' ' (space) returns 0. The simplest way to avoid the negative length would be to use
ABS(CHARINDEX(' ', PostCode) - 1)
which always returns a positive length, even when CHARINDEX(' ', PostCode) - 1 is negative. Note, however, that for a postcode with no space the length then becomes 1, so only the first character is returned. Correct me if I'm wrong!
One of the selected values is NULL or empty.
Something else you can use is ISNULL, together with NULLIF so that a missing space produces NULL instead of a negative length:
isnull(SUBSTRING(PostCode, 1, NULLIF(CHARINDEX(' ', PostCode), 0) - 1), PostCode)