Removing Punctuation from a set of results - sql-server-2014

I have a query as below
select ContactName,Address, concat(City,' ', StateOrRegion,' ',PostalCode) as 'Region Info'
from Customers
with the results here
Maria Anders Obere Str. 57 Berlin 12209
Ana Trujillo Avda. de la Constitución 2222 México D.F. 05021
Antonio Moreno Mataderos 2312 México D.F. 05023
Thomas Hardy 120 Hanover Sq. London WA1 1DP
Christina Berglund Berguvsvägen 8 Luleå S-958 22
Hanna Moos Forsterstr. 57 Mannheim 68306
Frédérique Citeaux 24, place Kléber Strasbourg 67000
Martín Sommer C/ Araquil, 67 Madrid 28023
Laurence Lebihan 12, rue des Bouchers Marseille 13008
Elizabeth Lincoln 23 Tsawassen Blvd. Tsawassen BC T2F 8M4
My question is: can I remove the punctuation from the Address field without creating a table, and if so, what is the best way to go about it? Would working with LTRIM and/or RTRIM be a possibility here?

If you have a limited set of items you want to remove, you can simply use REPLACE(x, y, z) to replace the characters you want to remove with a zero-length string. x is the string to be searched, y is the string to find, and z is the string to replace y with.
An example:
DECLARE @a VARCHAR(50);
SET @a = 'This, is a test.';
SELECT REPLACE(REPLACE(@a, '.', ''), ',', '');
This will remove both the comma and the period from the string. Depending on the scale of your problem, this may work well.
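When the set of characters to strip grows, nesting REPLACE calls becomes tedious; outside SQL the same logic is just a loop. A minimal Python sketch of the nested-REPLACE idea (the function name and default character set are my own, for illustration only):

```python
def strip_chars(text, chars=",."):
    """Remove every character in `chars` from `text`,
    mirroring a chain of nested REPLACE(x, y, '') calls."""
    for ch in chars:
        text = text.replace(ch, "")
    return text

print(strip_chars("This, is a test."))  # -> "This is a test"
```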
Instead of using CONCAT(), why not simply use + to concatenate the values?
I'd rewrite your query as:
SELECT c.ContactName
, c.Address
, [Region Info] = c.City + ' ' + c.StateOrRegion + ' ' + c.PostalCode
FROM dbo.Customers c;
You may notice I've capitalized the keywords in my query; this provides a great way to easily recognize keywords separately from column and table names, etc.
Also, you want to explicitly specify the schema; normally this is dbo. This will make your code less susceptible to problems in the future if someone creates a new schema that happens to contain a table with the same name as one in your FROM clause.
You should also get in the habit of specifying an alias for items in the FROM clause, and use that alias in the other parts of your query. This makes debugging a lot simpler down the road.

Related

Split string with space in MySQL

I have data like
Name
-----------------
Ram Mohan
Ram Lal Mohan
Ram K Lal Mohan
...
I am using:
select SUBSTRING_INDEX(Name,' ',1) from contact
to get the first name
select SUBSTRING_INDEX(Name,' ',-1) from contact
to get the last name
I am getting data like
first name last name
------------------------
Ram Mohan
Ram Mohan
Ram Mohan
but the data I should get is like
first name last name
------------------------
Ram Mohan
Ram Lal Mohan
Ram K Lal Mohan
Only the last word after the last space should go into the last name; the rest should go into the first name.
Can someone help me find a way to achieve this?
You could use a regex replacement here, assuming you are on MySQL 8+:
SELECT
REGEXP_REPLACE(Name, '\\s+\\S+$', '') AS first,
SUBSTRING_INDEX(Name,' ', -1) AS last
FROM contact;
For earlier versions of MySQL, and assuming that the last name would never appear anywhere else in the name, you could use SUBSTRING_INDEX along with REPLACE:
SELECT
REPLACE(Name, CONCAT(' ', SUBSTRING_INDEX(Name,' ', -1)), '') AS first,
SUBSTRING_INDEX(Name,' ', -1) AS last
FROM contact;
This second approach simply deletes the last name (plus its leading space), which you were already correctly finding using SUBSTRING_INDEX. What is left behind should be the first, middle, etc. components you want.
Since you can get the last name, you can remove that number of characters to get the first name.
select
trim(left(Name,char_length(Name)-char_length(substring_index(Name,' ',-1)))) first_name,
substring_index(Name,' ',-1) last_name
from contact
Do note that last names can have spaces in them (e.g. "Walter de la Mare").
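All three SQL approaches share the same intent, splitting on the last space only, which is easy to check outside the database. A small Python sketch of that logic (sample names taken from the question; the function name is my own):

```python
def split_name(full_name):
    """Split on the LAST space: everything before it is the
    first/middle part, the final word is the last name."""
    first, _, last = full_name.rpartition(" ")
    return first, last

for name in ["Ram Mohan", "Ram Lal Mohan", "Ram K Lal Mohan"]:
    print(split_name(name))
# -> ('Ram', 'Mohan'), ('Ram Lal', 'Mohan'), ('Ram K Lal', 'Mohan')
```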

How do I use the BETWEEN operator for text searches in a MySQL database?

I have a SQL table in which I use the BETWEEN operator.
The BETWEEN operator selects values within a range. The values can be numbers, text, or dates.
stu_id name city pin
1 Raj Ranchi 123456
2 sonu Delhi 652345
3 ANU KOLKATA 879845
4 K.K's Company Delhi 345546
5 J.K's Company Delhi 123456
I have a query like this:
SELECT * FROM student WHERE stu_id BETWEEN 2 AND 4 //including 2 & 4
SELECT * FROM `student` WHERE name between 'A' and 'K' //including A & not K
My question here is: why are names starting with 'K' (like K.K's Company) not included?
I want them in the search results as well.
Don't use between -- until you really understand it. That is just general advice. BETWEEN is inclusive, so your second query is equivalent to:
WHERE name >= 'A' AND
name <= 'K'
Because of the equality, 'K' is included in the result set. However, names longer than one character and starting with 'K' are not -- "Ka" for instance.
Instead, be explicit:
WHERE name >= 'A' AND
name < 'L'
Of course, BETWEEN can be useful. However, it is useful for discrete values, such as integers. It is a bit dangerous with numbers with decimals, strings, and date/time values. That is why I encourage you to express the logic as inequalities.
In supplement to Gordon's answer, one way to get what you're expecting is to turn your name into a discrete set of values:
SELECT * FROM `student` WHERE LEFT(name, 1) between 'A' and 'K'
You need to appreciate that 'K.K's Company' sorts alphabetically AFTER the letter 'K' on its own, so it is not BETWEEN, in the same way that 4.1 is not BETWEEN 2 and 4.
By stripping the value down to just the first character of the string it will work as you expect, but take cautionary note: you should generally avoid running functions on column values, because if you had a million names, that's a million strings MySQL has to cut down to their first letter, and it may no longer be able to use an index on name, battering the performance.
Instead, you could :
SELECT * FROM `student` WHERE name >= 'A' and name < 'L'
which is more likely to permit the use of an index as you aren't manipulating the stored values before comparing them
This works because it asks for everything up to, but not including, 'L', which includes all of your names starting with 'K', even 'Kzzzzzzzz'. Numerically it is equivalent to saying number >= 2 and number < 5, which gives you all the numbers starting with 2, 3 or 4 (like the 4.1 from before) but not 5.
Remember that BETWEEN is inclusive at both ends. When you want to specify a range that captures all possible values, always revert to a half-open pattern such as a >= b and a < c.
Comparing in lexicographical order, 'K.K's Company' > 'K'.
We could convert the string to an integer. You can try this MySQL script with CAST and SUBSTRING; I've updated your script here. It will include the last record as well.
SELECT * FROM student WHERE CAST(SUBSTRING(username FROM 1) AS UNSIGNED)
BETWEEN 'A' AND 'K';
The script should work. I hope it helps.
I've attached my test sample here.

In a SQL query, how to find all the records that have digits in the last part of the string (after the last whitespace)

Thanks if someone can help.
In my table, I have a street column and a number column. But for some records, the house number is at the end of the street name in the street column, separated by a whitespace.
From a query in phpMyAdmin, I would like to remove the last block of the street column (after the last whitespace) if this block contains any digit, and put this block in the number column.
I entered this query in phpMyAdmin just to find those records:
SELECT `street`,`number`
FROM `map`
WHERE `street` REGEXP '[\r\n\t\f\v ][0-9]+ ^[\r\n\t\f\v ]'
but the query is not complete, because it doesn't match only the last block, and because it also doesn't remove the substring and put it into the number column.
Examples for how it should work: (street column, number column) :
('Rue van Malder 47B', '-1') becomes ('Rue van Malder', '47B')
('Rue des 2 Arbres 511B', '-1') becomes ('Rue des 2 Arbres', '511B') -> only the last block with one or more digits moves from street to number column
('place du 4 Août', '1') stays ('place du 4 Août', '1') because the digit '4' is not in the last block
('751 2nd St', '-1') stays ('751 2nd St', '-1') for the same reason as just above
Gordon Linoff, thanks, your answer was already a good starting point, but I can't completely adapt your proposal to update my fields. This query almost worked for filling the number column:
UPDATE map
SET number = substring_index(street, ' ', -1)
WHERE street IN (SELECT street REGEXP '[0-9]+$')
but something is missing, because a row like this:
('Bd de la 2e armée Britannique', '-1') becomes ('Bd de la 2e armée Britannique', 'Britannique')
and this row should not be affected, because the digit is not in the last block of the string.
Also, how could I remove this last block from the street column with another UPDATE query, to finally obtain a truncated string in the street field:
('Rue van Malder 47B', '-1') becomes ('Rue van Malder', '47B')
Thanks
With your help, Gordon Linoff, and some searching on Google, I found exactly the query I needed, and I'll explain it here for possible future reference:
UPDATE map
SET number = substring_index(street, ' ', -1),
street = LEFT(street, LENGTH(street) - LENGTH(substring_index(street, ' ', -1))-1)
WHERE street REGEXP '[\r\n\t\f\v ][0-9]+[a-zA-Z]*$'
So first, I update 2 columns in the 'map' table, number and street:
-number is the last part of the string, after the last space
-street is found by taking the left part of the whole 'street' string, minus the length of the number part (the part we just found before)
-the REGEXP means:
$ : we have to match the end of the string
[\r\n\t\f\v ] : a whitespace character (no + or * sign means we search for exactly one)
[0-9] : a digit (the + sign means we search for 1 to unlimited digits)
[a-zA-Z] : a letter (the * sign means we search for 0 to unlimited letters)
So this REGEXP will always match the last block after the last whitespace, because we are searching for one whitespace, 1 or more digits and 0 or more letters, with $ anchoring the match at the end of the string.
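The same pattern is easy to prototype outside MySQL before running the UPDATE. This Python sketch (addresses taken from the examples above; the function name is my own) applies the equivalent regex, a whitespace, one or more digits, zero or more letters, anchored at the end:

```python
import re

# whitespace, then digits, then optional letters, at end of string
PATTERN = re.compile(r"\s([0-9]+[A-Za-z]*)$")

def split_street(street, number):
    """Move a trailing 'digits+letters' block from street to number,
    mirroring the UPDATE ... WHERE street REGEXP query."""
    m = PATTERN.search(street)
    if m:
        return street[:m.start()], m.group(1)
    return street, number

print(split_street("Rue van Malder 47B", "-1"))             # ('Rue van Malder', '47B')
print(split_street("place du 4 Août", "1"))                 # unchanged
print(split_street("Bd de la 2e armée Britannique", "-1"))  # unchanged
```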
You appear to be using MySQL. If so, this comes very close:
select (case when substring_index(street, ' ', -1) + 0 > 0
then substring_index(street, ' ', -1)
end)
If the street ends in 34xyz, then this would put in 34. For your version with just a number:
select (case when street regexp ' [0-9]+$'
then substring_index(street, ' ', -1)
end)
The update would look like this:
update t
set num = substring_index(street, ' ', -1) + 0
where street regexp ' [0-9]+$';
Aside from the UPDATE, etc, I think you need 2 steps to determine that the last word has digits:
WHERE street REGEXP "[[:space:]][^[:space:]0-9]+$"
Should be TRUE when the last word does not contain a digit. Note: I am checking for leading, trailing, or embedded digit(s). The statement of the problem, together with the examples, was ambiguous in this area.
After that, you can use something like this to extract the last word:
SUBSTRING_INDEX(street, ' ', -1)
but that only works for "space", not for general "white space" as in [[:space:]]. You really need to do the task in a language that has full regexp support. (Note: MariaDB is better than MySQL in this area, but still may fall short.)

how can I replace different characters in a column without using cursor?

Sample items in table1
table1.productname
Moshi Monsters 7-in-1 Accessory Pack - Poppet
Mario vs. Donkey Kong Mini-Land Mayhem!
I would like to remove the characters '-', '.' and '!' from all the product names, but using
select case
when CHARINDEX ('-',[productname])>0 then REPLACE (ProductName ,'-',' ')
when CHARINDEX ('!',[productname])>0 then REPLACE (ProductName ,'!','')
when CHARINDEX ('.',[productname])>0 then REPLACE (ProductName ,'.','')
else productname
end as productname
from table1
it seems to replace only the '-'
output
Moshi Monsters 7 in 1 Accessory Pack Poppet
Mario vs. Donkey Kong Mini-Land Mayhem
expected output
Moshi Monsters 7 in 1 Accessory Pack Poppet
Mario vs Donkey Kong MiniLand Mayhem
How shall I approach a solution to this? I have multiple characters to replace in productname, such as in the example and more, and the column is around 5k rows big.
Actually, I wanted to update table1 with the changed names, but first wanted to see which ones change, and how, before I update. It seems the full requirement is not fulfilled with this kind of REPLACE statement.
It seems it could be done with multiple iterations in an UPDATE, but I do not know how to use iteration in an UPDATE. How shall I proceed?
A CASE expression returns the result of only the first WHEN branch that matches, and every branch runs its REPLACE against the SAME source: the original ProductName field, which does not change. So only one replacement is ever applied. You need to chain the replacements:
REPLACE(REPLACE(REPLACE(ProductName, '.', ''), '!', ''), '-', '')
which gets hideously ugly very fast. You'd be better off doing this in your client.
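In the client, most languages can handle an arbitrary character set in a single pass rather than a chain. A Python sketch (product names from the question; note the question's expected output actually turns '-' into a space in the first name but deletes it in "Mini-Land", so a plain delete of all three characters is shown here):

```python
# Translation table that deletes '-', '.' and '!' in one pass.
DROP = str.maketrans("", "", "-.!")

names = [
    "Moshi Monsters 7-in-1 Accessory Pack - Poppet",
    "Mario vs. Donkey Kong Mini-Land Mayhem!",
]
for n in names:
    print(n.translate(DROP))
```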

sql server 2008 filter on big list passed in

I am having a performance issue. I am trying to select from a table based on a very long list of parameters.
Currently I am using this stored procedure:
CREATE PROC [dbo].[GetFileContentsFromTitles]
@MyTitles varchar(max)
AS
SELECT [Title], [Sequence] From [dbo].[MasterSequence]
WHERE charindex(',' + Title + ',', ',' + @MyTitles + ',') > 0;
Where @MyTitles can be very long (currently a string with 4000 entries separated by commas). Any suggestions? Thanks
OK, if you want performance for something like this, then you need to use the best stuff out there. First, create this function for splitting strings (which I got from Jeff Moden about two weeks ago):
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[DelimitedSplit8K]
/**********************************************************************************************************************
Purpose:
Split a given string at a given delimiter and return a list of the split elements (items).
Notes:
1. Leading and trailing delimiters are treated as if an empty string element were present.
2. Consecutive delimiters are treated as if an empty string element were present between them.
3. Except when spaces are used as a delimiter, all spaces present in each element are preserved.
Returns:
iTVF containing the following:
ItemNumber = Element position of Item as a BIGINT (not converted to INT to eliminate a CAST)
Item = Element value as a VARCHAR(8000)
Statistics on this function may be found at the following URL:
http://www.sqlservercentral.com/Forums/Topic1101315-203-4.aspx
CROSS APPLY Usage Examples and Tests:
--=====================================================================================================================
-- TEST 1:
-- This tests for various possible conditions in a string using a comma as the delimiter. The expected results are
-- laid out in the comments
--=====================================================================================================================
--===== Conditionally drop the test tables to make reruns easier for testing.
-- (this is NOT a part of the solution)
IF OBJECT_ID('tempdb..#JBMTest') IS NOT NULL DROP TABLE #JBMTest
;
--===== Create and populate a test table on the fly (this is NOT a part of the solution).
-- In the following comments, "b" is a blank and "E" is an element in the left to right order.
-- Double Quotes are used to encapsulate the output of "Item" so that you can see that all blanks
-- are preserved no matter where they may appear.
SELECT *
INTO #JBMTest
FROM ( --# & type of Return Row(s)
SELECT 0, NULL UNION ALL --1 NULL
SELECT 1, SPACE(0) UNION ALL --1 b (Empty String)
SELECT 2, SPACE(1) UNION ALL --1 b (1 space)
SELECT 3, SPACE(5) UNION ALL --1 b (5 spaces)
SELECT 4, ',' UNION ALL --2 b b (both are empty strings)
SELECT 5, '55555' UNION ALL --1 E
SELECT 6, ',55555' UNION ALL --2 b E
SELECT 7, ',55555,' UNION ALL --3 b E b
SELECT 8, '55555,' UNION ALL --2 b B
SELECT 9, '55555,1' UNION ALL --2 E E
SELECT 10, '1,55555' UNION ALL --2 E E
SELECT 11, '55555,4444,333,22,1' UNION ALL --5 E E E E E
SELECT 12, '55555,4444,,333,22,1' UNION ALL --6 E E b E E E
SELECT 13, ',55555,4444,,333,22,1,' UNION ALL --8 b E E b E E E b
SELECT 14, ',55555,4444,,,333,22,1,' UNION ALL --9 b E E b b E E E b
SELECT 15, ' 4444,55555 ' UNION ALL --2 E (w/Leading Space) E (w/Trailing Space)
SELECT 16, 'This,is,a,test.' --E E E E
) d (SomeID, SomeValue)
;
--===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)
SELECT test.SomeID, test.SomeValue, split.ItemNumber, Item = QUOTENAME(split.Item,'"')
FROM #JBMTest test
CROSS APPLY dbo.DelimitedSplit8K(test.SomeValue,',') split
;
--=====================================================================================================================
-- TEST 2:
-- This tests for various "alpha" splits and COLLATION using all ASCII characters from 0 to 255 as a delimiter against
-- a given string. Note that not all of the delimiters will be visible and some will show up as tiny squares because
-- they are "control" characters. More specifically, this test will show you what happens to various non-accented
-- letters for your given collation depending on the delimiter you chose.
--=====================================================================================================================
WITH
cteBuildAllCharacters (String,Delimiter) AS
(
SELECT TOP 256
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
CHAR(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1)
FROM master.sys.all_columns
)
SELECT ASCII_Value = ASCII(c.Delimiter), c.Delimiter, split.ItemNumber, Item = QUOTENAME(split.Item,'"')
FROM cteBuildAllCharacters c
CROSS APPLY dbo.DelimitedSplit8K(c.String,c.Delimiter) split
ORDER BY ASCII_Value, split.ItemNumber
;
-----------------------------------------------------------------------------------------------------------------------
Other Notes:
1. Optimized for VARCHAR(8000) or less. No testing or error reporting for truncation at 8000 characters is done.
2. Optimized for single character delimiter. Multi-character delimiters should be resolved externally from this
function.
3. Optimized for use with CROSS APPLY.
4. Does not "trim" elements just in case leading or trailing blanks are intended.
5. If you don't know how a Tally table can be used to replace loops, please see the following...
http://www.sqlservercentral.com/articles/T-SQL/62867/
6. Changing this function to use NVARCHAR(MAX) will cause it to run twice as slow. It's just the nature of
VARCHAR(MAX) whether it fits in-row or not.
7. Multi-machine testing for the method of using UNPIVOT instead of 10 SELECT/UNION ALLs shows that the UNPIVOT method
is quite machine dependent and can slow things down quite a bit.
-----------------------------------------------------------------------------------------------------------------------
Credits:
This code is the product of many people's efforts including but not limited to the following:
cteTally concept originally by Itzik Ben-Gan and "decimalized" by Lynn Pettis (and others) for a bit of extra speed
and finally redacted by Jeff Moden for a different slant on readability and compactness. Hat's off to Paul White for
his simple explanations of CROSS APPLY and for his detailed testing efforts. Last but not least, thanks to
Ron "BitBucket" McCullough and Wayne Sheffield for their extreme performance testing across multiple machines and
versions of SQL Server. The latest improvement brought an additional 15-20% improvement over Rev 05. Special thanks
to "Nadrek" and "peter-757102" (aka Peter de Heer) for bringing such improvements to light. Nadrek's original
improvement brought about a 10% performance gain and Peter followed that up with the content of Rev 07.
I also thank whoever wrote the first article I ever saw on "numbers tables" which is located at the following URL
and to Adam Machanic for leading me to it many years ago.
http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html
-----------------------------------------------------------------------------------------------------------------------
Revision History:
Rev 00 - 20 Jan 2010 - Concept for inline cteTally: Lynn Pettis and others.
Redaction/Implementation: Jeff Moden
- Base 10 redaction and reduction for CTE. (Total rewrite)
Rev 01 - 13 Mar 2010 - Jeff Moden
- Removed one additional concatenation and one subtraction from the SUBSTRING in the SELECT List for that tiny
bit of extra speed.
Rev 02 - 14 Apr 2010 - Jeff Moden
- No code changes. Added CROSS APPLY usage example to the header, some additional credits, and extra
documentation.
Rev 03 - 18 Apr 2010 - Jeff Moden
- No code changes. Added notes 7, 8, and 9 about certain "optimizations" that don't actually work for this
type of function.
Rev 04 - 29 Jun 2010 - Jeff Moden
- Added WITH SCHEMABINDING thanks to a note by Paul White. This prevents an unnecessary "Table Spool" when the
function is used in an UPDATE statement even though the function makes no external references.
Rev 05 - 02 Apr 2011 - Jeff Moden
- Rewritten for extreme performance improvement especially for larger strings approaching the 8K boundary and
for strings that have wider elements. The redaction of this code involved removing ALL concatenation of
delimiters, optimization of the maximum "N" value by using TOP instead of including it in the WHERE clause,
and the reduction of all previous calculations (thanks to the switch to a "zero based" cteTally) to just one
instance of one add and one instance of a subtract. The length calculation for the final element (not
followed by a delimiter) in the string to be split has been greatly simplified by using the ISNULL/NULLIF
combination to determine when the CHARINDEX returned a 0 which indicates there are no more delimiters to be
had or to start with. Depending on the width of the elements, this code is between 4 and 8 times faster on a
single CPU box than the original code especially near the 8K boundary.
- Modified comments to include more sanity checks on the usage example, etc.
- Removed "other" notes 8 and 9 as they were no longer applicable.
Rev 06 - 12 Apr 2011 - Jeff Moden
- Based on a suggestion by Ron "Bitbucket" McCullough, additional test rows were added to the sample code and
the code was changed to encapsulate the output in pipes so that spaces and empty strings could be perceived
in the output. The first "Notes" section was added. Finally, an extra test was added to the comments above.
Rev 07 - 06 May 2011 - Peter de Heer, a further 15-20% performance enhancement has been discovered and incorporated
into this code which also eliminated the need for a "zero" position in the cteTally table.
**********************************************************************************************************************/
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
Yes, it's long, but that's mostly comments explaining it and its history. Don't worry, it's the fastest thing available in T-SQL (AFAIK only SQLCLR is faster, and that's not T-SQL).
Note that it only supports up to VARCHAR(8000). If you really need VARCHAR(MAX), then it can easily be changed to that, but VARCHAR(MAX)s are about twice as slow.
Now you can implement your procedure like this:
CREATE PROC [dbo].[GetFileContentsFromTitles]
@MyTitles varchar(max)
AS
SELECT *
INTO #tmpTitles
FROM dbo.DelimitedSplit8K(@MyTitles, ',')
SELECT [Title], [Sequence] From [dbo].[MasterSequence]
WHERE Title IN (SELECT item FROM #tmpTitles)
I cannot test this for you without your DDL and some data, but it should be much faster. If not, then we may need to throw an index onto the [item] column in the temp table.
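The shape of the final procedure, split the CSV once into a table, then join or filter against it, can be sketched outside SQL Server. Here is a hedged SQLite-in-Python sketch that uses a recursive CTE as a stand-in for DelimitedSplit8K (table contents and sample titles invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MasterSequence (Title TEXT, Sequence INTEGER)")
conn.executemany("INSERT INTO MasterSequence VALUES (?, ?)",
                 [("alpha", 1), ("beta", 2), ("gamma", 3)])

my_titles = "alpha,gamma,delta"  # plays the role of the @MyTitles CSV parameter

# Recursive CTE splitter: a stand-in for dbo.DelimitedSplit8K.
rows = conn.execute("""
WITH RECURSIVE split(item, rest) AS (
    SELECT '', ? || ','
    UNION ALL
    SELECT substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split WHERE rest <> ''
)
SELECT Title, Sequence
FROM MasterSequence
WHERE Title IN (SELECT item FROM split WHERE item <> '')
ORDER BY Sequence
""", (my_titles,)).fetchall()

print(rows)  # only the titles present in both the CSV and the table
```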
Here's another version of the split function that uses VARCHAR(MAX):
CREATE FUNCTION [dbo].[DelimitedSplitMax]
/**********************************************************************************************************************
Purpose:
Split a given string at a given delimiter and return a list of the split elements (items).
Notes:
1. Leading and trailing delimiters are treated as if an empty string element were present.
2. Consecutive delimiters are treated as if an empty string element were present between them.
3. Except when spaces are used as a delimiter, all spaces present in each element are preserved.
Returns:
iTVF containing the following:
ItemNumber = Element position of Item as a BIGINT (not converted to INT to eliminate a CAST)
Item = Element value as a VARCHAR(MAX)
Statistics on this function may be found at the following URL:
http://www.sqlservercentral.com/Forums/Topic1101315-203-4.aspx
-----------------------------------------------------------------------------------------------------------------------
Other Notes:
1. Optimized for VARCHAR(8000) or less. No testing or error reporting for truncation at 8000 characters is done.
2. Optimized for single character delimiter. Multi-character delimiters should be resolved externally from this
function.
3. Optimized for use with CROSS APPLY.
4. Does not "trim" elements just in case leading or trailing blanks are intended.
5. If you don't know how a Tally table can be used to replace loops, please see the following...
http://www.sqlservercentral.com/articles/T-SQL/62867/
6. Changing this function to use NVARCHAR(MAX) will cause it to run twice as slow. It's just the nature of
VARCHAR(MAX) whether it fits in-row or not.
7. Multi-machine testing for the method of using UNPIVOT instead of 10 SELECT/UNION ALLs shows that the UNPIVOT method
is quite machine dependent and can slow things down quite a bit.
-----------------------------------------------------------------------------------------------------------------------
Credits:
This code is the product of many people's efforts including but not limited to the following:
cteTally concept originally by Itzik Ben-Gan and "decimalized" by Lynn Pettis (and others) for a bit of extra speed
and finally redacted by Jeff Moden for a different slant on readability and compactness. Hat's off to Paul White for
his simple explanations of CROSS APPLY and for his detailed testing efforts. Last but not least, thanks to
Ron "BitBucket" McCullough and Wayne Sheffield for their extreme performance testing across multiple machines and
versions of SQL Server. The latest improvement brought an additional 15-20% improvement over Rev 05. Special thanks
to "Nadrek" and "peter-757102" (aka Peter de Heer) for bringing such improvements to light. Nadrek's original
improvement brought about a 10% performance gain and Peter followed that up with the content of Rev 07.
I also thank whoever wrote the first article I ever saw on "numbers tables" which is located at the following URL
and to Adam Machanic for leading me to it many years ago.
http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html
-----------------------------------------------------------------------------------------------------------------------
Revision History:
Rev 00 - 20 Jan 2010 - Concept for inline cteTally: Lynn Pettis and others.
Redaction/Implementation: Jeff Moden
- Base 10 redaction and reduction for CTE. (Total rewrite)
Rev 01 - 13 Mar 2010 - Jeff Moden
- Removed one additional concatenation and one subtraction from the SUBSTRING in the SELECT List for that tiny
bit of extra speed.
Rev 02 - 14 Apr 2010 - Jeff Moden
- No code changes. Added CROSS APPLY usage example to the header, some additional credits, and extra
documentation.
Rev 03 - 18 Apr 2010 - Jeff Moden
- No code changes. Added notes 7, 8, and 9 about certain "optimizations" that don't actually work for this
type of function.
Rev 04 - 29 Jun 2010 - Jeff Moden
- Added WITH SCHEMABINDING thanks to a note by Paul White. This prevents an unnecessary "Table Spool" when the
function is used in an UPDATE statement even though the function makes no external references.
Rev 05 - 02 Apr 2011 - Jeff Moden
- Rewritten for extreme performance improvement especially for larger strings approaching the 8K boundary and
for strings that have wider elements. The redaction of this code involved removing ALL concatenation of
delimiters, optimization of the maximum "N" value by using TOP instead of including it in the WHERE clause,
and the reduction of all previous calculations (thanks to the switch to a "zero based" cteTally) to just one
instance of one add and one instance of a subtract. The length calculation for the final element (not
followed by a delimiter) in the string to be split has been greatly simplified by using the ISNULL/NULLIF
combination to determine when the CHARINDEX returned a 0 which indicates there are no more delimiters to be
had or to start with. Depending on the width of the elements, this code is between 4 and 8 times faster on a
single CPU box than the original code especially near the 8K boundary.
- Modified comments to include more sanity checks on the usage example, etc.
- Removed "other" notes 8 and 9 as they were no longer applicable.
Rev 06 - 12 Apr 2011 - Jeff Moden
- Based on a suggestion by Ron "Bitbucket" McCullough, additional test rows were added to the sample code and
the code was changed to encapsulate the output in pipes so that spaces and empty strings could be perceived
in the output. The first "Notes" section was added. Finally, an extra test was added to the comments above.
Rev 07 - 06 May 2011 - Peter de Heer, a further 15-20% performance enhancement has been discovered and incorporated
into this code which also eliminated the need for a "zero" position in the cteTally table.
Rev 07a- 18 Oct 2012 - RBarryYoung, Varchar(MAX), because it's needed, even though it's slower...
**********************************************************************************************************************/
--===== Define I/O parameters
(@pString VARCHAR(MAX), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 100,000,000...
-- hopefully enough to cover most VARCHAR(MAX)'s
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
E8(N) AS (SELECT 1 FROM E4 a, E4 b), --10E+8 or 100,000,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,999999999)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
Be forewarned, however, that I only set it up to count up to 100,000,000 characters. Also, I have not had a chance to test it yet; you should be sure to test it yourself.