Using MS SQL Server 2008, I need to remove email addresses contained within a string of text, e.g.:
"This is a sample line of text with multiple email addresses like fred#example.com or I could also have bert#home.co.uk or even someone#somewhere.pl to mix things up".
The desired result is:
"This is a sample line of text with multiple email addresses like or I could also have or even to mix things up"
or even:
"This is a sample line of text with multiple email addresses like fred#***** or I could also have bert#***** or even someone#***** to mix things up".
There are many examples of removing everything to the left or right of a certain character, but none that remove the text on either side of a fixed character only up to the nearest space. Any help is appreciated.
Assuming the string of text is within a table column, you can do this in SQL Server 2008 with the help of a string splitting function and string concatenation via FOR XML. In newer versions of SQL Server (which you should really try to migrate towards), this is made easier with functions like STRING_AGG (SQL Server 2017 and later) instead of the STUFF( ... FOR XML ) combo.
This implementation is greedy in that it will mask any combination of characters between two spaces (or the start/end of the string) that includes at least one # character. So fullsentences-like^this'one%will£be$included,so\long)as/there>is.a=#*character&somewhere will be masked, as will strings made up of nothing but one or more # characters.
I'll leave how to deal with strings that aren't email addresses up to you.
Query
with t as
(
select *
from(values('This is a sample line of text with multiple email addresses like fred#example.com or I could also have bert#home.co.uk or even someone#somewhere.pl to mix things up')
,('This is another sample line of text with multiple email addresses like fred2#example2.com or I could also have bert2#home2.co.uk or even someone2#somewhere2.pl to mix things up')
,('fullsentences-like^this''one%will£be$included,so\long)as/there>is.a=#*character&somewhere')
,('#')
,('a ##### b')
,('# # # # #')
,('Let''s meet # the beach')
) as s(s)
)
,s as
(
select t.s
,s.rn
,case when charindex('#',s.item) > 0 then '***' else s.item end as item
from t
cross apply dbo.fn_StringSplit4k(t.s,' ',null) as s
)
select t.s
,stuff((select ' ' + s.item
from s
where t.s = s.s
order by s.rn
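for xml path(''), type
).value('.','nvarchar(max)') -- TYPE + value() keeps characters such as & < > from being returned entity-encoded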
,1,1,''
) as s
from t;
Output
s | s
--|--
This is a sample line of text with multiple email addresses like fred#example.com or I could also have bert#home.co.uk or even someone#somewhere.pl to mix things up | This is a sample line of text with multiple email addresses like *** or I could also have *** or even *** to mix things up
This is another sample line of text with multiple email addresses like fred2#example2.com or I could also have bert2#home2.co.uk or even someone2#somewhere2.pl to mix things up | This is another sample line of text with multiple email addresses like *** or I could also have *** or even *** to mix things up
fullsentences-like^this'one%will£be$included,so\long)as/there>is.a=#*character&somewhere | ***
# | ***
a ##### b | a *** b
# # # # # | *** *** *** *** ***
Let's meet # the beach | Let's meet *** the beach
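The same split, mask and rejoin logic is easy to check outside the database. Below is a minimal Python sketch of it (mask_tokens and keep_local are hypothetical names, not part of the T-SQL above). Like the query, it splits on single spaces and masks any token containing the marker character; the marker defaults to # to match the sample data. The keep_local option produces the question's alternative fred#***** form.

```python
def mask_tokens(text, marker="#", keep_local=False):
    """Split on single spaces, mask every token containing `marker`, rejoin in order.

    keep_local=False mirrors the T-SQL query (the whole token becomes '***');
    keep_local=True gives the question's alternative form, e.g. 'fred#*****'.
    """
    out = []
    for token in text.split(" "):
        if marker not in token:
            out.append(token)
        elif keep_local:
            # Keep everything before the first marker, then the marker and a fixed mask.
            out.append(token.split(marker, 1)[0] + marker + "*****")
        else:
            out.append("***")
    return " ".join(out)
```

In the T-SQL query, the keep_local variant would presumably be written as something like left(s.item, charindex('#', s.item)) + '*****' in place of '***' in the case expression; treat that as an untested suggestion.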
String Splitting Function
create function [dbo].[fn_StringSplit4k]
(
@str nvarchar(4000) = ' ' -- String to split.
,@delimiter as nvarchar(1) = ',' -- Delimiting value to split on.
,@num as int = null -- Which value to return.
)
returns table
as
return
-- Start tally table with 10 rows.
with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
-- Select the same number of rows as characters in @str as incremental row numbers.
-- Cross joins increase exponentially to a max possible 10,000 rows to cover largest @str length.
,t(t) as (select top (select len(isnull(@str,'')) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)
-- Return the position of every value that follows the specified delimiter.
,s(s) as (select 1 union all select t+1 from t where substring(isnull(@str,''),t,1) = @delimiter)
-- Return the start and length of every value, to use in the SUBSTRING function.
-- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
,l(s,l) as (select s,isnull(nullif(charindex(@delimiter,isnull(@str,''),s),0)-s,4000) from s)
select rn
,item
from(select row_number() over(order by s) as rn
,substring(@str,s,l) as item
from l
) a
where rn = @num
or @num is null;
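To see what the tally-table arithmetic works out to, here is a Python sketch that reproduces the function's start/length logic on an ordinary string. string_split_4k is a hypothetical name; the sketch imitates T-SQL's LEN() ignoring trailing spaces, but does not model NULL input or the nvarchar(4000) limit.

```python
def string_split_4k(s, delimiter=",", num=None):
    """Python sketch of fn_StringSplit4k: returns a list of (rn, item) tuples."""
    s = s or ""
    # The tally table only runs to LEN(@str), and T-SQL LEN() ignores trailing spaces.
    n = len(s.rstrip(" "))
    # Item start positions (1-based): position 1, plus every position just after a delimiter.
    starts = [1] + [t + 1 for t in range(1, n + 1) if s[t - 1] == delimiter]
    rows = []
    for rn, start in enumerate(starts, 1):
        # Like CHARINDEX(@delimiter, @str, start); Python's -1 plays the role of T-SQL's 0.
        nxt = s.find(delimiter, start - 1)
        item = s[start - 1:] if nxt == -1 else s[start - 1:nxt]
        rows.append((rn, item))
    # WHERE rn = @num OR @num IS NULL
    return [row for row in rows if num is None or row[0] == num]
```

Consecutive delimiters produce empty items, as in the T-SQL version.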
Related
I would like to perform a MySQL search and replace with random characters taken from a list. I cannot use regex, since my version is well before 8.0.
Instead of the fixed replacement below,
I would like to replace, for instance, the letter u with one of (a,e,i,f,k), chosen at random.
UPDATE products
SET
productDescription = REPLACE(productDescription,
'abuot',
'about');
Is there a MySQL command for this task?
Actually, my goal is to fill the lastnames column with new names that are not exactly like the real ones, so that one could work on "anonymous" data.
I would like to update all rows in a certain column. Say that in table products, column description, we have data like:
abcud
ieruie
kjuklkllu
uiervfd
With the replace function, we would not want something like "replace e with i",
but rather "replace e with one of (a,e,i,f,k)".
Example of the desired output:
abced
ierfie
kjiklkllk
aiervfd
Like I said, we plan to use this on last names: we plan to replace many characters with random ones from a list, in an effort to create anonymous data in the column that contains last names.
As a next step, I would like to do the same to make anonymous telephone numbers.
Example:
726456273
827364878
347823472
Replace 3 with one of 0-9:
Output:
726456279
827664878
547821472
SELECT REPLACE('product abuot Description',
SUBSTRING('product abuot Description', LOCATE('abuot', 'product abuot Description') ,5) , 'about')
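DELIMITER //
CREATE FUNCTION smart_replace ( argument TEXT,
search_for CHAR(1),
replace_with TEXT )
RETURNS TEXT
NO SQL
BEGIN
-- Mark every occurrence of the search character with a NUL placeholder.
SET argument = REPLACE(argument, search_for, CHAR(0));
-- Fill the placeholders one at a time, each with its own random pick.
-- WHILE (rather than REPEAT) avoids appending a character when nothing matched.
WHILE LOCATE(CHAR(0), argument) DO
SET argument = CONCAT( SUBSTRING_INDEX(argument, CHAR(0), 1),
SUBSTRING(replace_with FROM FLOOR(1 + RAND() * CHAR_LENGTH(replace_with)) FOR 1),
SUBSTRING(argument FROM 2 + CHAR_LENGTH(SUBSTRING_INDEX(argument, CHAR(0), 1))));
END WHILE;
RETURN argument;
END //
DELIMITER ;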
To replace e with one of (a,e,i,f,k):
SELECT smart_replace(table.column, 'e', 'aeifk')
To replace 3 with one of 0-9:
SELECT smart_replace(table.phone, '3', '0123456789')
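The function's approach is to mark every match with a placeholder and then fill the placeholders one at a time, so each gets its own random pick. Here is a Python sketch of the intended behaviour, handy for checking expectations. It is not MySQL code, and the rng parameter is an assumption added so results can be made reproducible.

```python
import random


def smart_replace(argument, search_for, replace_with, rng=random):
    """Replace every occurrence of the single character `search_for` with a
    character picked at random (independently each time) from `replace_with`."""
    if argument is None:
        return None  # like the SQL version, NULL in gives NULL out
    return "".join(rng.choice(replace_with) if ch == search_for else ch
                   for ch in argument)
```

Against the question's table, the MySQL function would then presumably be applied with something like UPDATE products SET description = smart_replace(description, 'u', 'aeifk');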
Let's say we have the following table:
UserId | Message
-------|-------------
1 | Hi, have a nice day
2 | Hi, I had a nice day
I need to get each of the words { Hi, have, a, nice, day, I, had } separately.
Is there any way to do that? What if I want to extract the words from all tables in the database?
Similar results would also be good.
Try this (SQL Server 2005 or above):
create table yourtable(RowID int, Layout varchar(200))
INSERT yourtable VALUES (1,'hello,world,welcome,to,tsql')
INSERT yourtable VALUES (2,'welcome,to,stackoverflow')
;WITH SplitSting AS
(
SELECT
RowID,LEFT(Layout,CHARINDEX(',',Layout)-1) AS Part
,RIGHT(Layout,LEN(Layout)-CHARINDEX(',',Layout)) AS Remainder
FROM YourTable
WHERE Layout IS NOT NULL AND CHARINDEX(',',Layout)>0
UNION ALL
SELECT
RowID,LEFT(Remainder,CHARINDEX(',',Remainder)-1)
,RIGHT(Remainder,LEN(Remainder)-CHARINDEX(',',Remainder))
FROM SplitSting
WHERE Remainder IS NOT NULL AND CHARINDEX(',',Remainder)>0
UNION ALL
SELECT
RowID,Remainder,null
FROM SplitSting
WHERE Remainder IS NOT NULL AND CHARINDEX(',',Remainder)=0
)
SELECT part FROM SplitSting ORDER BY RowID
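For a quick offline check of what the recursive CTE returns for one Layout value, here is a Python sketch that follows the same steps (split_like_cte is a hypothetical name). Following the anchor's CHARINDEX(',',Layout)>0 filter, a value without any comma yields no rows at all, which is worth keeping in mind for single-word messages.

```python
def split_like_cte(layout, sep=","):
    """Return the Part values the SplitSting CTE would produce for one Layout."""
    # Anchor filter: only values containing at least one separator are processed.
    if layout is None or sep not in layout:
        return []
    parts = []
    remainder = layout
    # Anchor + first recursive member: Part = text before the first separator,
    # Remainder = everything after it.
    while sep in remainder:
        head, remainder = remainder.split(sep, 1)
        parts.append(head)
    # Last recursive member: a Remainder with no separator left becomes the final Part.
    parts.append(remainder)
    return parts
```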
Well, ok, here it goes.
In SQL Server you can use this...
SELECT word = d.value('.', 'nvarchar(max)')
FROM
(SELECT xmlWords = CAST(
'<a><i>' + replace([Message], ' ', '</i><i>') + '</i></a>' AS xml)
FROM MyMessageTbl) T(c)
CROSS APPLY c.nodes('/a/i') U(d)
And I hope that in MySQL you can do the same thing using its XML support (ExtractValue() etc.).
EDIT: explanation
- replace([Message], ' ', '</i><i>') replaces e.g. 'my word' with 'my</i><i>word'
- then I add the opening and closing tags, giving '<a><i>my</i><i>word</i></a>', which is valid XML, and cast it to the xml type so I can query it
- I select from that XML and shred the '/a/i' nodes into rows using CROSS APPLY c.nodes('/a/i'), aliasing the rows as U(d), so each 'i' node maps to a row in column d (e.g. 'my')
- d.value('.', 'nvarchar(max)') extracts the node content and casts it to a character type
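The same wrap-in-tags-and-shred trick can be sketched in Python with the standard library's xml.etree.ElementTree (shred_words is a hypothetical name). One difference: this sketch XML-escapes the message first, whereas the T-SQL CAST(... AS xml) above would fail with a parsing error on a message containing & or <.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape


def shred_words(message):
    """Split a message on spaces via the '<a><i>...</i></a>' trick."""
    # escape() turns & < > into entities so the wrapped string is always valid XML.
    xml = "<a><i>" + escape(message).replace(" ", "</i><i>") + "</i></a>"
    # Each <i> child of the root holds one word; an empty <i/> has text None.
    return [node.text or "" for node in ET.fromstring(xml).findall("i")]
```

To get the distinct words across all messages, something like sorted({w for m in messages for w in shred_words(m)}) would do.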
I have a MySQL table named "content" containing (among others) the fields "_date" and "text", for example:
_date text
---------------------------------------------------------
2011-02-18 I'm afraid my car won't start tomorrow
2011-02-18 I hope I'm going to pass my exams
2011-02-18 Exams coming up - I'm not afraid :P
2011-02-19 Not a single f was given this day
2011-02-20 I still hope I passed, but I'm afraid I didn't
2011-02-20 On my way to school :)
I'm looking for a query to count the number of times the words "hope" and "afraid" are being used per day. In other words, the output would have to be something like:
_date word count
-----------------------
2011-02-18 hope 1
2011-02-18 afraid 2
2011-02-19 hope 0
2011-02-19 afraid 0
2011-02-20 hope 1
2011-02-20 afraid 1
Is there an easy way to do this, or should I just write a different query per term? I now have this, but I don't know what to put instead of "?":
SELECT COUNT(?) FROM content WHERE text LIKE '%hope' GROUP BY _date
Can somebody help me with the correct query for this?
I think the easiest and most readable way is to use one subquery per word:
Select
_date, 'hope' as word,
sum( case when `text` like '%hope%' then 1 else 0 end) as n
from content
group by _date
UNION
Select
_date, 'afraid' as word,
sum( case when `text` like '%afraid%' then 1 else 0 end) as n
from content
group by _date
This approach does not have the best performance. If you need performance, you should group by day in a subquery; also, this LIKE condition with a leading wildcard is a performance killer, because it cannot use an index. This is a solution if you only run the query occasionally in batch mode. Explain your performance requirements for a more accurate solution.
EDITED TO MATCH THE OP'S LATEST REQUIREMENT
Your query is almost correct:
SELECT _date, 'hope' AS word, COUNT(*) as count
FROM content WHERE text LIKE '%hope%' GROUP BY _date
Use %hope% to match the word anywhere (not only at the end of the string). COUNT(*) should do what you want.
To count several words in a single query, use UNION ALL.
Another approach is to create a sequence of words on the fly and use it as the second table in a join:
SELECT _date, words.word, COUNT(*) as count
FROM (
SELECT 'hope' AS word
UNION
SELECT 'afraid' AS word
) AS words
CROSS JOIN content
WHERE text LIKE CONCAT('%', words.word, '%')
GROUP BY _date, words.word
Note that it will only count a single occurrence of each word per row. So »I hope there is still hope« will only give you 1, not 2.
To get 0 when there are no matches, join the previous result with the dates again:
SELECT dates._date, COALESCE(result.word, 'no match'), COALESCE(result.count, 0)
-- One row per date, so a date is not repeated for every content row it has.
FROM (SELECT DISTINCT _date FROM content) AS dates
LEFT JOIN (
SELECT _date, words.word, COUNT(*) as count
FROM (
SELECT 'hope' AS word
UNION
SELECT 'afraid' AS word
) AS words
CROSS JOIN content
WHERE text LIKE CONCAT('%', words.word, '%')
GROUP BY _date, words.word ) AS result
ON dates._date = result._date
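For reference, here is one way to get exactly the per-word zero rows the question asks for (hope 0 and afraid 0 on 2011-02-19): cross join the distinct dates with the word list, then LEFT JOIN content and count only the matching rows. The sketch runs that query through Python's built-in sqlite3 module against the question's sample data. SQLite uses || for concatenation, so in MySQL the LIKE pattern would be written CONCAT('%', w.word, '%') instead.

```python
import sqlite3

# The question's sample rows.
ROWS = [
    ("2011-02-18", "I'm afraid my car won't start tomorrow"),
    ("2011-02-18", "I hope I'm going to pass my exams"),
    ("2011-02-18", "Exams coming up - I'm not afraid :P"),
    ("2011-02-19", "Not a single f was given this day"),
    ("2011-02-20", "I still hope I passed, but I'm afraid I didn't"),
    ("2011-02-20", "On my way to school :)"),
]

# Every (date, word) pair exists via the CROSS JOIN; COUNT(c._date) counts only
# the content rows that matched in the LEFT JOIN, so missing pairs get 0.
QUERY = """
SELECT d._date, w.word, COUNT(c._date) AS n
FROM (SELECT DISTINCT _date FROM content) AS d
CROSS JOIN (SELECT 'hope' AS word UNION ALL SELECT 'afraid') AS w
LEFT JOIN content AS c
       ON c._date = d._date
      AND c.text LIKE '%' || w.word || '%'
GROUP BY d._date, w.word
ORDER BY d._date, w.word DESC
"""


def word_counts(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE content (_date TEXT, text TEXT)")
        con.executemany("INSERT INTO content VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```

As with the other LIKE-based queries here, a word is counted at most once per row.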
Assuming you want to count all words and find the most used ones (rather than counting a few specific words), you might want to try something like the following stored procedure (string splitting courtesy of this blog post):
DROP PROCEDURE IF EXISTS wordsUsed;
DELIMITER //
CREATE PROCEDURE wordsUsed ()
BEGIN
DROP TEMPORARY TABLE IF EXISTS wordTmp;
CREATE TEMPORARY TABLE wordTmp (word VARCHAR(255));
SET @wordCt = 0;
SET @tokenCt = 1;
contentLoop: LOOP
-- Pass N (N = @tokenCt): insert the Nth space-separated word of every row that has
-- at least N words. Comparing the first N - 1 words with the whole text (rather than
-- the first N) keeps each row's last word; CHAR_LENGTH counts characters, as SUBSTRING does.
SET @stmt = 'INSERT INTO wordTmp SELECT REPLACE(SUBSTRING(SUBSTRING_INDEX(`text`, " ", ?),
CHAR_LENGTH(SUBSTRING_INDEX(`text`, " ", ? - 1)) + 1),
" ", "") word
FROM content
WHERE LENGTH(SUBSTRING_INDEX(`text`, " ", ? - 1)) != LENGTH(`text`)';
PREPARE cmd FROM @stmt;
EXECUTE cmd USING @tokenCt, @tokenCt, @tokenCt;
SELECT ROW_COUNT() INTO @wordCt;
DEALLOCATE PREPARE cmd;
IF (@wordCt = 0) THEN
LEAVE contentLoop;
ELSE
SET @tokenCt = @tokenCt + 1;
END IF;
END LOOP;
SELECT word, count(*) usageCount FROM wordTmp GROUP BY word ORDER BY usageCount DESC;
END //
DELIMITER ;
CALL wordsUsed();
You might want to write another query (or procedure) or add some nested "REPLACE" statements to further remove punctuation from the resulting temp table of words, but this should be a good start.
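The idea of the loop can be sketched in Python to check which words should come out: on pass N, take the Nth space-separated word from every row that has at least N words, stop when a pass finds nothing, then count. words_used here is a hypothetical Python stand-in for the procedure, not a translation of its SQL.

```python
from collections import Counter


def words_used(texts):
    """Sketch of the wordsUsed idea: pass N collects the Nth space-separated
    token of every row that has at least N tokens; stop on an empty pass."""
    rows = [t.split(" ") for t in texts if t]  # empty/NULL rows contribute nothing
    words = []
    n = 1
    while True:
        batch = [tokens[n - 1] for tokens in rows if len(tokens) >= n]
        if not batch:  # like ROW_COUNT() = 0 ending the LOOP
            break
        words.extend(batch)
        n += 1
    # Equivalent of GROUP BY word ORDER BY usageCount DESC
    return Counter(words).most_common()
```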
I have read quite a few SELECT + UPDATE questions here but cannot understand how to do it, so I will have to ask from the beginning.
I would like to update a table based on data in another table. Setup is like this:
- TABLE a ( int ; string )
ID WORD
1 banana
2 orange
3 apple
- TABLE b ( "comma separated" string ; string )
WORDS TEXTAREA
0 banana -> 0,1
0 orange apple apple -> BEST:0,2,3 ELSE 0,2,3,3
0 banana orange apple -> 0,1,2,3
Now, for each word in TABLE a, I would like to append ",a.ID" to b.WORDS, like:
SELECT id, word FROM a
(for each) -> UPDATE b SET words = CONCAT(words, ',', a.id) WHERE b.textarea like %a.word%
Or even better: replace the word found in b.textarea with ",a.id", so that b.textarea itself ends up being a comma-separated string of IDs... But I do not know if that is possible.
I tried this, but it does not work. I think I am getting closer, though:
UPDATE a, b
SET b.textarea =
replace(b.textarea,a.word,CONCAT(',',a.id))
WHERE a.word IN (b.textarea)
ORDER BY length(a.word) DESC
I ended up using a workaround. I exported all the a.word values to Excel and created an UPDATE for each row, like this:
UPDATE `tx_ogarktiskdocarchive_loebe` SET `temp_dictionay` = replace(lower(temp_dictionay) , lower('Drygalski’s Grønlandsekspedition'), CONCAT(',',191));
Then I pasted the approximately 1000 rows into an SQL file and executed it. Done.
I had to do "a cleaner double post" of this one to get the answer.
A solution can be put together based on this manual:
http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_group-concat
GROUP_CONCAT builds a comma-separated string from the fields it concatenates. Perfect. And for the preferred solution with no duplicates in the result, the manual has this example, which filters out duplicates using DISTINCT inside the GROUP_CONCAT:
mysql> SELECT student_name,
-> GROUP_CONCAT(DISTINCT test_score
-> ORDER BY test_score DESC SEPARATOR ' ')
-> FROM student
-> GROUP BY student_name;
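The result the question is after can be sketched in Python to pin down the expected output for the sample tables. ids_for_text is a hypothetical name, and matching whole space-separated words is an assumption (a LIKE '%word%' match would also hit substrings). The leading 0 corresponds to the existing value in b.WORDS.

```python
def ids_for_text(text, words, keep_duplicates=False):
    """Return '0,' followed by the IDs (from table a) of the words found in `text`.

    Assumption: a word matches only as a whole space-separated token.
    """
    word_to_id = {w: i for i, w in words.items()}
    ids = [word_to_id[tok] for tok in text.split() if tok in word_to_id]
    if not keep_duplicates:
        ids = sorted(set(ids))  # like GROUP_CONCAT(DISTINCT ... ORDER BY ...)
    return ",".join(["0"] + [str(i) for i in ids])
```

In MySQL, the DISTINCT variant would presumably come from joining b to a on the word match and grouping by b.textarea with GROUP_CONCAT(DISTINCT a.id ORDER BY a.id), in the spirit of the manual example above.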
SELECT *
FROM `thread`
WHERE forumid NOT IN (1,2,3) AND IF( LEFT( title, 1) = '#', 1, 0)
ORDER BY title ASC
I have this query, which selects rows whose title starts with a #. What I want is this: if # is given as the value, it should look for titles starting with numbers and special characters, or anything that is not a normal letter.
How would I do this?
If you want to select all the rows whose "title" does not begin with a letter, use REGEXP:
SELECT *
FROM thread
WHERE forumid NOT IN (1,2,3)
AND title NOT REGEXP '^[[:alpha:]]'
ORDER BY title ASC
NOT means "not" (obviously ;))
^ means "starts with"
[[:alpha:]] matches any alphabetic character
Find more about REGEXP in MySQL's manual.
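The filter's predicate can be sketched in Python for testing sample titles (not_starting_with_letter is a hypothetical name). Python's str.isalpha() is Unicode-aware, so it may classify non-ASCII first letters differently from MySQL's pre-8.0 REGEXP, which works byte-wise.

```python
def not_starting_with_letter(titles):
    """Keep titles that do not begin with a letter, like
    title NOT REGEXP '^[[:alpha:]]' (NULL titles are dropped, as in SQL)."""
    return [t for t in titles if t is not None and not t[:1].isalpha()]
```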
It's possible you could try to cast it as a char:
CAST('#' AS CHAR)
but I don't know if this will work for the octothorpe (aka pound symbol :) ), because that's the symbol for starting a comment in MySQL (inside a quoted string literal such as '#', though, it is treated as an ordinary character).
SELECT t.*
FROM `thread` t
WHERE t.forumid NOT IN (1,2,3)
AND INSTR(t.title, '#') = 0
ORDER BY t.title
INSTR returns the 1-based position of the first occurrence of a given string, or 0 if it is not found. So INSTR(t.title, '#') = 0 keeps titles that contain no # at all, while INSTR(t.title, '#') = 1 matches titles that start with #.