SQL query - Replace/Move some parts of content - mysql

I need to update about 2000 records in MySQL
I have a column 'my_content' in table 'my_table' with the following value:
Title: some title here<br />Author: John Smith<br />Size: 2MB<br />
I have created 3 new columns (my_title, my_author and my_size), and now I need to separate the content of 'my_content' like this:
'my_title': some title here
'my_author': John Smith
'my_size': 2MB
As you can imagine the title, author and size are always different for each row.
What I'm thinking is to query the following, but I'm not great at SQL queries and I'm not sure what the actual query would look like.
This is what I'm trying to do:
Within 'my_content' find everything that starts with "title:..." and ends with "...<br />au" and move it to 'my_title'
Within 'my_content' find everything that starts with "thor:..." and ends with "...<br />s" and move it to 'my_author'
Within 'my_content' find everything that starts with "ize:..." and ends with "...<br />" and move it to 'my_size'
I just don't know how to write a query to do this.
Once all the content is in the new columns, I can just find and delete the content that's no longer needed, for example 'thor:', etc.

You can use INSTR to find the index of your delimiters and SUBSTRING to select out the part you want. So, for instance, the author would be
SUBSTR(my_content,
INSTR(my_content, "Author: ") + 8,
INSTR(my_content, "Size: ") - INSTR(my_content, "Author: ") - 8)
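That still leaves the trailing <br /> in the result. You'd need a bit more work (for example, subtracting its 6 characters from the length) to trim it and any surrounding whitespace.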
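To check the INSTR()/SUBSTR() arithmetic above, here is a minimal Python sketch on the sample row from the question. The helpers `instr` and `substr` are hypothetical stand-ins that mimic MySQL's 1-based functions; they are not a MySQL API. The sketch shows that the expression returns `John Smith<br />`, and that taking 6 more characters (the length of `<br />`) off the length trims the tag.

```python
def instr(s: str, sub: str) -> int:
    """Mirror MySQL INSTR(): 1-based position of sub in s, or 0 if absent."""
    return s.find(sub) + 1


def substr(s: str, pos: int, length: int) -> str:
    """Mirror MySQL SUBSTR(str, pos, len) for positive pos and len."""
    return s[pos - 1:pos - 1 + length]


row = "Title: some title here<br />Author: John Smith<br />Size: 2MB<br />"

# The expression from the answer: start right after "Author: " (8 chars)
# and stop right before "Size: ".
author_raw = substr(row,
                    instr(row, "Author: ") + 8,
                    instr(row, "Size: ") - instr(row, "Author: ") - 8)

# Subtracting len("<br />") == 6 more from the length drops the trailing tag.
author = substr(row,
                instr(row, "Author: ") + 8,
                instr(row, "Size: ") - instr(row, "Author: ") - 8 - 6)

print(repr(author_raw))  # 'John Smith<br />'
print(repr(author))      # 'John Smith'
```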

Please try the below:
SELECT SUBSTRING(SUBSTRING_INDEX(my_content, '<br />', 1), LOCATE('Title: ', my_content) + 7) AS my_title,
       SUBSTRING(SUBSTRING_INDEX(my_content, '<br />', 2), LOCATE('Author: ', my_content) + 8) AS my_author,
       SUBSTRING(SUBSTRING_INDEX(my_content, '<br />', 3), LOCATE('Size: ', my_content) + 6) AS my_size
FROM my_table;
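The SELECT above relies on how SUBSTRING_INDEX() counts delimiters. The following minimal Python sketch mirrors that logic on the sample row. It assumes every row holds Title, Author and Size in that order, each followed by `<br />`. `substring_index`, `locate` and `substring` are hypothetical helpers written to match the documented MySQL behaviour.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Mirror MySQL SUBSTRING_INDEX(str, delim, count) for a non-empty delim."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # left of the count-th delim from the left
    if count < 0:
        return delim.join(parts[count:])   # right of the count-th delim from the right
    return ""


def locate(sub: str, s: str) -> int:
    """Mirror MySQL LOCATE(substr, str): 1-based position, 0 if absent."""
    return s.find(sub) + 1


def substring(s: str, pos: int) -> str:
    """Mirror MySQL SUBSTRING(str, pos) for a positive 1-based pos."""
    return s[pos - 1:]


def split_content(my_content: str) -> dict:
    # The same three expressions as the SELECT.
    return {
        "my_title": substring(substring_index(my_content, "<br />", 1),
                              locate("Title: ", my_content) + 7),
        "my_author": substring(substring_index(my_content, "<br />", 2),
                               locate("Author: ", my_content) + 8),
        "my_size": substring(substring_index(my_content, "<br />", 3),
                             locate("Size: ", my_content) + 6),
    }


row = "Title: some title here<br />Author: John Smith<br />Size: 2MB<br />"
result = split_content(row)
print(result)
# {'my_title': 'some title here', 'my_author': 'John Smith', 'my_size': '2MB'}
```

To actually fill the new columns in MySQL, the same three expressions can serve as the values in `UPDATE my_table SET my_title = <expr>, my_author = <expr>, my_size = <expr>`. Trying it on a copy of the table first is the cautious route.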

Related

How to duplicate a column to a list of comma delimited items in SQL

I am trying to figure out how to append the id to each item in the tags list. For example, I have an item id of 01 and its corresponding tags are Recycled, leather, case, holder, iPad, snap, kindle. I am trying to figure out how to output the data so that it can be exported from MySQL line by line:
01;Recycled
01;leather
01;case
01;.....
02;agrarian
02;urban
id | tags
01 | Recycled, leather, case, holder, iPad, snap, kindle
02 | agrarian, urban, planter, eco, ...
I have tried pulling the id into a table, I have tried using substring to parse the data, but I just can't figure out how to get the data the way I am looking for.
Thank you for your help
I am guessing that you want group_concat():
select id, group_concat(tag separator ', ') as tags
from t
group by id;
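The output the question describes is one `id;tag` line per tag. That is the reverse of what GROUP_CONCAT() produces. Since the goal is only an export, here is a minimal Python sketch of that split. It assumes the rows have already been fetched as (id, tags) pairs, with tags separated by commas plus optional spaces. `explode_tags` is a hypothetical helper.

```python
def explode_tags(rows):
    """Turn (id, "tag, tag, ...") pairs into one "id;tag" line per tag."""
    lines = []
    for item_id, tags in rows:
        for tag in tags.split(","):
            tag = tag.strip()
            if tag:  # skip empty pieces, e.g. from a trailing comma
                lines.append(f"{item_id};{tag}")
    return lines


# Sample rows taken from the question (the second tag list is cut short there).
rows = [
    ("01", "Recycled, leather, case, holder, iPad, snap, kindle"),
    ("02", "agrarian, urban, planter, eco"),
]
output = explode_tags(rows)
print("\n".join(output))
```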

Extract text from column in select of MySql query

I have a table named sentEmails where the body column contains the body text of an email.
In the body text, there is a substring like:
some link: <a href="https://somelink#somesite.com/somePage.php?someVar=someVal&sentby=agent">Random link text
Using MySql, I need to extract the url from this column like https://somelink#somesite.com/somePage.php?someVar=someVal&sentby=agent
I was thinking something like the below would work by finding the starting location and returning the next 150 chars. Of course, it actually returns everything from the start of the body up to 150 chars past that location.
SELECT LEFT(body, LOCATE('some link: <a href="', body)+150) AS link
FROM sentEmails
WHERE sent between date_sub(now(),INTERVAL 1 WEEK) and now()
AND body like '%some link:%'
AND toEmail = 'email#gmail.com'
Additional info:
the link will always be preceded by the text some link:
Random link text at the end will change
I can live with getting a bit more of the text than needed if I have to; for example, getting https://somelink#somesite.com/somePage.php">Random link text would be acceptable
the text shown above is a substring of the full body column which contains much more text
This isn't something I'm going to be doing often. I'm researching an issue and I need the links from 40-50 of these rows; I'm just hoping to avoid having to pull the link manually from each row.
I can only use MySQL Query Browser to access this DB; if I could connect with PHP, this would be trivial.
The url in question, can have 6-25 parameters in it
The url in question will always end with this parameter &sentby=agent
If you had two unique delimiters around the URL, then you could just use SUBSTRING_INDEX() to isolate it. One approach would be to replace the two sides of the URL in the anchor tag with a delimiter:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(
REPLACE(REPLACE(body, '<a href="', '~'), '&sentby=agent">', '&sentby=agent~'), '~', -2),
'~', 1)
FROM sentEmails
WHERE sent BETWEEN DATE_SUB(NOW(), INTERVAL 1 WEEK) AND NOW() AND
body LIKE '%some link:%' AND
toEmail = 'email#gmail.com'
I replaced <a href=" with ~, and the "> that follows &sentby=agent with ~. This should work if ~ does not occur anywhere else in the body column and the body contains only one <a href=" anchor.
If the body column is just a big chunk of HTML, then you should consider using XPath and handling this in your app layer.
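Below is a minimal Python sketch of the marker trick in the previous answer. It uses a hypothetical `substring_index` helper that mirrors MySQL's SUBSTRING_INDEX(). The sample body is invented around the anchor from the question. The sketch assumes, as the answer does, that `~` appears nowhere else and that the body holds a single anchor.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Mirror MySQL SUBSTRING_INDEX(str, delim, count) for a non-empty delim."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


# Hypothetical body: the anchor from the question wrapped in some extra text.
body = ('Hello, some link: <a href="https://somelink#somesite.com/somePage.php'
        '?someVar=someVal&sentby=agent">Random link text</a> Thanks')

# Step 1: the nested REPLACE() calls turn both ends of the URL into "~".
marked = body.replace('<a href="', "~").replace('&sentby=agent">', "&sentby=agent~")
# Step 2: keep what follows the second-to-last "~", then cut at the next "~".
url = substring_index(substring_index(marked, "~", -2), "~", 1)
print(url)
```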
If you're just trying to extract the link, you can use the INSTR() and MID() functions, something like this:
select mid(body, instr(body, '="') + 2, instr(body, '">') - instr(body, '="') - 2) from email...
instr(body, '="') is the position of the =" just before the link, so the link itself starts 2 characters later. instr(body, '">') is the position of the quote right after the link ends.
The MID function takes (str, pos, len), and len = end position - starting position.
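The position arithmetic for the INSTR()/MID() approach is easy to get off by one. This minimal Python sketch mirrors MySQL's 1-based INSTR() and MID() with hypothetical helpers `instr` and `mid`. It assumes the first `="` and the first `">` in the body belong to the link.

```python
def instr(s: str, sub: str) -> int:
    """Mirror MySQL INSTR(): 1-based position of sub, or 0 if absent."""
    return s.find(sub) + 1


def mid(s: str, pos: int, length: int) -> str:
    """Mirror MySQL MID(str, pos, len) for positive pos and len."""
    return s[pos - 1:pos - 1 + length]


body = ('some link: <a href="https://somelink#somesite.com/somePage.php'
        '?someVar=someVal&sentby=agent">Random link text')

start = instr(body, '="') + 2   # the URL begins 2 characters after the "=" of '="'
end = instr(body, '">')         # position of the quote that closes the URL
url = mid(body, start, end - start)
print(url)
```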
Thanks to Tim's help, I was able to get this working with the below query:
SELECT SUBSTRING_INDEX( SUBSTRING_INDEX(body, 'some link: <a href="', -1) , '">', 1) AS link
FROM sentEmails
where sent between date_sub(now(),INTERVAL 1 WEEK) and now()
AND body like '%some link:%'
AND toEmail = 'email#gmail.com'
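The nested SUBSTRING_INDEX() calls can be traced with a minimal Python sketch. It uses a hypothetical `substring_index` helper that mirrors the MySQL function and a made-up body around the question's anchor. The choice of closing delimiter matters. Cutting at `sentby=agent">` leaves a trailing `&` and drops the `sentby=agent` parameter. Cutting at `">` keeps the whole URL, provided the URL itself never contains `">`.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Mirror MySQL SUBSTRING_INDEX(str, delim, count) for a non-empty delim."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


# Hypothetical body built around the anchor from the question.
body = ('Hi, some link: <a href="https://somelink#somesite.com/somePage.php'
        '?someVar=someVal&sentby=agent">Random link text</a>')

# Inner call: everything after the last 'some link: <a href="'.
after_href = substring_index(body, 'some link: <a href="', -1)

cut_at_param = substring_index(after_href, 'sentby=agent">', 1)
cut_at_quote = substring_index(after_href, '">', 1)
print(cut_at_param)  # ends with "someVal&"
print(cut_at_quote)  # the full URL, including &sentby=agent
```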
Doing this kind of search is not efficient. As the table with emails grows, the query will get slower and slower. The body LIKE '%some link:%' condition cannot use an index, because the pattern starts with %.
If this is a new application you're building, you're better off keeping a separate table with the list of URLs used in each sent email. You'd write the URLs to the DB as you send the emails.
The reasoning is that the app will search the DB more often than it sends emails. By doing a little extra work when sending emails, you help a lot with the most expensive use of the feature, which is the search.
If you still decide to keep the current approach, you'll want to have an index containing the columns (toEmail, sent) in this order.
Other than that, your approach makes sense and will work. Did you actually try it? Does it work for you?
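The separate-URL-table design suggested above can be sketched with Python's built-in sqlite3 module. An in-memory SQLite database stands in for MySQL here, so some details (e.g. auto-increment syntax) differ from MySQL. The table `sentEmailUrls`, the helper `send_email`, and the sample date are hypothetical. The composite index follows the (toEmail, sent) order the answer recommends.

```python
import sqlite3

# In-memory SQLite stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sentEmails (
        id INTEGER PRIMARY KEY,
        toEmail TEXT,
        sent TEXT,
        body TEXT
    );
    -- Hypothetical side table: one row per URL, filled in at send time.
    CREATE TABLE sentEmailUrls (
        emailId INTEGER REFERENCES sentEmails(id),
        url TEXT
    );
    -- Composite index with the columns in the order the answer suggests.
    CREATE INDEX idx_toEmail_sent ON sentEmails (toEmail, sent);
""")


def send_email(to_email, sent, body, urls):
    """Store an email together with the URLs it contains (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO sentEmails (toEmail, sent, body) VALUES (?, ?, ?)",
        (to_email, sent, body),
    )
    conn.executemany(
        "INSERT INTO sentEmailUrls (emailId, url) VALUES (?, ?)",
        [(cur.lastrowid, url) for url in urls],
    )


url = "https://somelink#somesite.com/somePage.php?someVar=someVal&sentby=agent"
send_email("email#gmail.com", "2024-01-02 10:00:00", "(email body)", [url])

# The search is now a plain join; no LIKE '%...%' scan over the body text.
links = [row[0] for row in conn.execute(
    """SELECT u.url
         FROM sentEmails AS e
         JOIN sentEmailUrls AS u ON u.emailId = e.id
        WHERE e.toEmail = ? AND e.sent BETWEEN ? AND ?""",
    ("email#gmail.com", "2024-01-01", "2024-01-08"),
)]
print(links)
```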

Full Text Search always returns empty result set

I have a table named 'fact' with a title column that should have a full-text index.
Firstly, I add a full text index:
ALTER TABLE fact ADD FULLTEXT title_fts (title)
Then I insert a row:
INSERT INTO fact (id, title) VALUES ('1', 'red blue yellow ok green grey ten first wise form');
And then I perform a search:
select * from fact f where contains (f.title, '"red" or "blue"')
When I perform this query, or any other query with a 'contains' statement, I get an empty result set.
I have to use this statement, not against-match or like.
Does anyone have an idea why this is happening?
Thank you.
There are two really important concepts when using full-text search. The first is the minimum word length (the ft_min_word_len server variable for MyISAM tables). By default, the value is 4, meaning that words shorter than this are ignored. Bye-bye "red", "ok", "ten" and other short words.
The second important concept is the stopword list. This would also get rid of "ok" and "first".
Your text doesn't have "blue" and "red" is ignored, so your query doesn't return anything.
You will need to re-build your index after you decide on the words that you really need to include.

Comparing two url-slugs to find count of same words

I want to find similar posts on my website depending on the url slug.
Say I have the following five slugs:
i-am-a-slug /*the slug i want to compare*/
i-am-another-slug /* 3 same words */
i-am-an-ant /* 2 same words */
the-slug-life /* 1 same word */
foo-bar /* 0 same words */
At the moment I am using the following code to find out whether there are any similar words in the compared slug:
SELECT *
FROM News
WHERE News.slug != "i-am-a-slug"
ORDER BY CASE
WHEN News.slug REGEXP "i|am|a|slug" THEN 1
ELSE 2
END;
It doesn't even work very well. Short words like the "a" in the example give hits in nearly every slug in my database; in the example, even the slug foo-bar would be returned.
I can't figure out how to select a value same_words_count that counts all the shared words in each tested slug (see the comments in the first code block for the results I would like to get), so that I could do:
SELECT *
FROM News
WHERE News.slug != "i-am-a-slug"
ORDER BY CASE
WHEN same_words_count > 2 THEN 1
WHEN same_words_count = 2 THEN 2
WHEN same_words_count = 1 THEN 3
ELSE 4
END;
Or is there an even better way to do this?
Thank you very much in advance; sorry, my MySQL is a bit rusty lately...
What you are looking for is called inverted index: http://en.wikipedia.org/wiki/Inverted_index
Depending on the amount of data and what you are actually going to do with the result (do you need to keep it up to date, do you need to show it on the site, etc.), you might want to solve the problem using a programming language of your choice or use a full-blown full-text search solution. Plain SQL just isn't the right tool for this.
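The inverted index the answer names can be sketched in a few lines of Python. Here is a minimal version over the slugs from the question, assuming words are separated by `-`. `same_words_count` is a hypothetical function name borrowed from the question. Matching whole words through the index avoids the REGEXP problem where "a" matched inside "bar".

```python
from collections import defaultdict

# Slugs from the question; query is the slug to compare against.
query = "i-am-a-slug"
slugs = ["i-am-another-slug", "i-am-an-ant", "the-slug-life", "foo-bar"]

# Inverted index: each word maps to the set of slugs that contain it.
index = defaultdict(set)
for slug in slugs:
    for word in slug.split("-"):
        index[word].add(slug)


def same_words_count(query_slug):
    """Count, for every slug, how many distinct words it shares with query_slug."""
    counts = {slug: 0 for slug in slugs}
    for word in set(query_slug.split("-")):
        # Whole-word lookup, so "a" no longer matches inside "bar".
        for slug in index.get(word, ()):
            counts[slug] += 1
    return counts


counts = same_words_count(query)
# Most shared words first, like the CASE-based ORDER BY in the question.
ranked = sorted(slugs, key=lambda s: counts[s], reverse=True)
print(counts)
print(ranked)
```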

Working Around SQL Replace Wildcards

I know that I cannot use a wildcard in a MySQL replace query through phpMyAdmin. But, I need some kind of workaround. I'm very open to ideas. Here's the skinny:
I have about 2,000 pages in a MySQL database that need to have image URLs updated. Some are local, some are hotlinked. Each one is different, the URL lengths vary, the image on the page and the new image are unique per page id number, and each one occurs at a different spot in the page.
I basically need to do the following:
UPDATE pages SET body = replace(body, 'src=\"%\"', 'src=\"http://newdomain/newimage.jpg\"') WHERE id="{page_number}"
But I know that the 'src=\"%\"' component doesn't work.
So I fall at the feet of your collective knowledge to come up with some way to take the src="%" and replace it with a set URL for a set page id number. Thanks in advance.
If there's only one image per page, a quick solution would be like this:
UPDATE pages
SET
body = CONCAT(
SUBSTRING_INDEX(body, 'src="', 1),
'src=\"http://newdomain/newimage.jpg\"',
SUBSTRING(
SUBSTRING_INDEX(body, 'src="', -1)
FROM LOCATE('"', SUBSTRING_INDEX(body, 'src="', -1))+1)
)
WHERE
id="{page_number}" AND
body NOT LIKE '%<img%<img%';
The first SUBSTRING_INDEX extracts the part of the body to the left of src=". The SUBSTRING(... FROM LOCATE(...) + 1) expression, built from the other two SUBSTRING_INDEX(body, 'src="', -1) calls, extracts the part of the body to the right of the closing " that follows src=".
The last condition is a very crude check to make sure that only one image is present in the string. It could fail under some circumstances, but it might help.
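The CONCAT() rebuild in the previous answer can be traced with the following minimal Python sketch. It uses hypothetical helpers `substring_index` and `locate` that mirror the MySQL functions, plus an invented sample page. Like the SQL, it assumes the body contains a `src="` attribute. Its `<img` count is only a rough, case-sensitive analogue of `NOT LIKE '%<img%<img%'`.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Mirror MySQL SUBSTRING_INDEX(str, delim, count) for a non-empty delim."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


def locate(sub: str, s: str) -> int:
    """Mirror MySQL LOCATE(substr, str): 1-based position, 0 if absent."""
    return s.find(sub) + 1


def replace_single_src(body: str, new_url: str) -> str:
    """Rebuild body with its only src="..." value swapped for new_url."""
    if body.count("<img") > 1:   # rough stand-in for body NOT LIKE '%<img%<img%'
        return body
    left = substring_index(body, 'src="', 1)     # text before src="
    after = substring_index(body, 'src="', -1)   # text after src="
    # SUBSTRING(after FROM LOCATE('"', after) + 1): text after the closing quote.
    right = after[locate('"', after):]
    return left + 'src="' + new_url + '"' + right


page = '<p>Intro</p><img src="http://old/pic.png" alt="x"><p>End</p>'
print(replace_single_src(page, "http://newdomain/newimage.jpg"))
```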
My suggestion would be to build a table with your replace strings that would look like this:
page_id | replace
1 | src="..."
Then you can update across a JOIN like this
UPDATE pages AS p
INNER JOIN `replace` AS r
ON p.page_id = r.page_id
SET p.body = REPLACE(p.body, CONCAT('src="', SUBSTRING_INDEX(SUBSTRING_INDEX(p.body, 'src="', -1), '"', 1), '"'), r.`replace`);
This finds the last src="..." value in each body and replaces that exact string (every occurrence of it) with the new value from the lookup table, so it works for all records that contain a single src value.
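The lookup-table approach can be traced with a minimal Python sketch, where a dict stands in for the hypothetical replacement table and the page bodies are invented. It uses a hypothetical `substring_index` helper that mirrors the MySQL function. It rebuilds the last `src="..."` attribute the same way the CONCAT() does. It then swaps every occurrence, as MySQL's REPLACE() would.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Mirror MySQL SUBSTRING_INDEX(str, delim, count) for a non-empty delim."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


# Hypothetical lookup table (page_id -> replacement) and page bodies.
replacements = {1: 'src="http://newdomain/newimage.jpg"'}
pages = {
    1: '<p>One</p><img src="http://old/a.png">',
    2: '<p>Two</p><img src="http://old/b.png">',
}

# INNER JOIN: only pages that have a row in the lookup table are touched.
for page_id, new_src in replacements.items():
    if page_id not in pages:
        continue
    body = pages[page_id]
    # CONCAT('src="', <last src value>, '"') rebuilds the old attribute text.
    old_src = 'src="' + substring_index(substring_index(body, 'src="', -1), '"', 1) + '"'
    # Like MySQL REPLACE(), str.replace swaps every occurrence of old_src.
    pages[page_id] = body.replace(old_src, new_src)

print(pages)
```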