Retrieving special characters using regular expressions - html

I am trying to retrieve some information from a website using a regular expression. I ended up with output containing an HTML entity in place of a special character.
For example, instead of Côté I am getting Côt&eacute;.
Please help in retrieving the actual string. TIA.

HtmlDecode should work for you:
http://msdn.microsoft.com/en-us/library/7c5fyk1k.aspx
s = HttpUtility.HtmlDecode(s)
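For anyone doing the scraping in Python rather than .NET, the standard library's html.unescape plays the same role as HtmlDecode. A minimal sketch; the sample string is only an illustration of the kind of output described in the question:

```python
import html

# Scraped text that still contains HTML entities.
raw = "C&ocirc;t&eacute;"

# html.unescape converts named and numeric entities to the characters they stand for.
decoded = html.unescape(raw)
print(decoded)  # Côté
```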

Related

How to remove brackets, quotes from table?

I have a column uuid that looks as shown in the picture. It is of JSON type. I want to remove the square brackets from each row and then the quotes (which I can remove using JSON_UNQUOTE). I tried JSON_EXTRACT(uuid, '$[0]'), but this only selects one value at a time, e.g. "5f5616fd88b3484bb636e6dbf5a702b6", not all the values inside the square brackets at once.
Once this is done, I want to remove the quotes from each value and then add the brackets back. After that I want to export it as CSV and use it to build a network graph with the NetworkX Python library.
I am very open to suggestions, if my idea is wrong. Thank you!
You can't do that with JSON functions, because what you are trying to produce is not valid JSON.
However, you can process the JSON value with string functions. If you just want to remove the embedded double quotes, you can do:
replace(uuid, '"', '')
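Since the end goal is a CSV for NetworkX, the same reshaping could also be done on the Python side after reading the column. A minimal sketch, assuming each cell holds a JSON array of strings; the helper name strip_quotes and the second sample value are made up for illustration:

```python
import json

def strip_quotes(cell: str) -> str:
    """Turn '["a", "b"]' into '[a, b]'. The result is no longer valid JSON."""
    values = json.loads(cell)  # parse the JSON array into a list of strings
    return "[" + ", ".join(values) + "]"

# The second value below is a made-up placeholder, not data from the question.
cell = '["5f5616fd88b3484bb636e6dbf5a702b6", "0123456789abcdef0123456789abcdef"]'
print(strip_quotes(cell))
```

Parsing the array first avoids accidentally removing a quote character that is part of a value, which a blanket replace would do.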

MYSQL REGEXP with JSON array

I have a JSON string stored in the database and I need to run a SQL COUNT with a WHERE condition on a value inside that JSON string. It has to work on MySQL 5.5.
The only solution I have found that could work is to use the REGEXP operator in the SQL query.
Here is my JSON string stored in the custom_data column:
{"language_display":["1","2","3"],"quantity":1500,"meta_display:":["1","2","3"]}
https://regex101.com/r/G8gfzj/1
I now need to create a SQL statement:
SELECT COUNT(..) WHERE custom_data REGEXP '[HELP_HERE]'
The condition I am looking for is that language_display contains either 1, 2 or 3, or whatever value I specify when I build the SQL statement.
So far I have come up with this regular expression, but it does not work:
(?:\"language_display\":\[(?:"1")\])
Here 1 is replaced with the value I am looking for. I could also search for just "1" (with quotes), but that would also match in the meta_display array, which will have different values.
I am not good with regular expressions! Any suggestions?
I used the following regex to get matches on your test string:
\"language_display\":\[(:?\"[0-9]\"\,)*?\"3\"(:?\,\"[0-9]\")*?\]
https://regex101.com/ is a free online regex tester, and it seems to work great. Start small and work your way up.
Sorry it doesn't work for you. It must be failing on the non-greedy '*?'. Perhaps try it without the '?'.
Have a look at how to serialize this data, with an eye to serializing the language_display fields:
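The pattern can be tried outside the database first. Below is a rough Python sketch of the same idea, written without lazy quantifiers or non-capturing groups. As far as I know, MySQL 5.5's regex engine is POSIX-style and accepts neither (?:...) nor *?. The function name build_pattern and the second sample row are hypothetical, and the sketch assumes the stored JSON has no whitespace between tokens, as in the sample.

```python
import re

def build_pattern(value: str) -> str:
    """Regex that matches only when `value` is an element of language_display."""
    item = r'"[0-9]+"'  # one quoted numeric element
    return (r'"language_display":\[(' + item + r',)*"'
            + re.escape(value)
            + r'"(,' + item + r')*\]')

row = '{"language_display":["1","2","3"],"quantity":1500,"meta_display:":["1","2","3"]}'
# Hypothetical row where "3" appears only in the other array.
other = '{"language_display":["1","2"],"quantity":10,"meta_display:":["3"]}'

print(bool(re.search(build_pattern("3"), row)))    # value is in language_display
print(bool(re.search(build_pattern("3"), other)))  # value is only in meta_display
```

When the pattern is placed in a MySQL string literal, each backslash has to be doubled (\\[ and \\]), because the string parser consumes one level of escaping before the regex engine sees it.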
How to store a list in a column of a database table
Even if you get your idea working, it will be very slow. You would be better off processing each row once and generating something that is easier to search with SQL. Even a field containing a comma-separated list would be better.

MySQL find/replace with a unique string inside

Not sure how far I'm going to get with this, but I'm going through a database removing certain bits and pieces in preparation for a conversion to different software.
I'm struggling with the image tags, as on the site they currently look like
[img:<string>]<image url>[/img:<string>]
Those strings are stored in another field called bbcode_uid.
The query I'm running to make the changes so far is
UPDATE phpbb_posts SET post_text = REPLACE(post_text, '[img:]', '');
So my actual question is: is there any way of pulling in each string from bbcode_uid inside that SQL query, so that I don't have to run the same command 10,000+ times, changing the unique string every time?
Alternatively, could I include something inside [img:] to also match the next 8 characters, whatever they may be? That is the length of the string that is used.
Hoping to save time with this, otherwise I might have to think of another way of doing it.
As requested.
The text I wish to replace would be
[img:1nynnywx]http://i.imgur.com/Tgfrd3x.jpg[/img:1nynnywx]
I want to end up with just
http://i.imgur.com/Tgfrd3x.jpg
Just removing the code around the URL, however each post_text has a different string which is contained inside bbcode_uid.
Method 1
LIB_MYSQLUDF_PREG
If you want more regular expression power in your database, you can consider using LIB_MYSQLUDF_PREG. This is an open source library of MySQL user functions that imports the PCRE library. LIB_MYSQLUDF_PREG is delivered in source code form only. To use it, you'll need to be able to compile it and install it into your MySQL server. Installing this library does not change MySQL's built-in regex support in any way. It merely makes the following additional functions available:
PREG_CAPTURE extracts a regex match from a string. PREG_POSITION returns the position at which a regular expression matches a string. PREG_REPLACE performs a search-and-replace on a string. PREG_RLIKE tests whether a regex matches a string.
All these functions take a regular expression as their first parameter. This regular expression must be formatted like a Perl regular expression operator. E.g. to test if regex matches the subject case insensitively, you'd use the MySQL code PREG_RLIKE('/regex/i', subject). This is similar to PHP's preg functions, which also require the extra // delimiters for regular expressions inside the PHP string.
You can refer to this link: github.com/hholzgra/mysql-udf-regexp
Method 2
Use a PHP program: fetch the records one by one and apply PHP's preg_replace to each.
Refer to: www.php.net/preg_replace
Reference: http://www.online-ebooks.info/article/MySql_Regular_Expression_Replace.html
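The per-record approach in Method 2 does not have to be PHP. Here is a sketch of the same idea in Python, assuming the uid is always 8 word characters as described in the question. A backreference lets one pattern handle every post without looking up bbcode_uid. The function name strip_img_tags is made up for this example, and the database fetch/update loop is left out.

```python
import re

# \1 requires the closing tag to carry the same uid as the opening tag.
IMG_TAG = re.compile(r"\[img:(\w{8})\](.*?)\[/img:\1\]", re.DOTALL)

def strip_img_tags(post_text: str) -> str:
    """Replace each [img:uid]URL[/img:uid] with just URL."""
    return IMG_TAG.sub(r"\2", post_text)

sample = "[img:1nynnywx]http://i.imgur.com/Tgfrd3x.jpg[/img:1nynnywx]"
print(strip_img_tags(sample))  # http://i.imgur.com/Tgfrd3x.jpg
```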
You might be able to do this with substring_index().
The following will work on your example:
select substring_index(substring_index(post_text, '[/img:', 1), ']', -1)
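To see why the nested call works, here is a rough Python emulation of SUBSTRING_INDEX() traced on the sample from the question. The helper is my own approximation for illustration only (it ignores edge cases such as overlapping delimiters) and is not how MySQL implements the function.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Approximate MySQL's SUBSTRING_INDEX(s, delim, count)."""
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter from the left.
        return delim.join(parts[:count])
    # Everything to the right of the |count|-th delimiter from the right.
    return delim.join(parts[count:])

post_text = "[img:1nynnywx]http://i.imgur.com/Tgfrd3x.jpg[/img:1nynnywx]"

inner = substring_index(post_text, "[/img:", 1)  # drops the closing tag
outer = substring_index(inner, "]", -1)          # drops the opening tag
print(inner)
print(outer)
```

Note that this only yields the URL when the post contains a single image tag and nothing else; surrounding text or a second tag would need a different approach.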

MySQL: Find and Replace Between Certain Characters

In field post_content I have a string like this in nearly 800 rows:
http://somesite.com/">This is some site</a>
I need to remove everything from "> onwards so that it leaves just the URL. I can't do a straight find and replace because the text is unique.
Any clues? This is really my first foray into MySQL database modifications but I did do an extensive search before posting here.
Thanks,
~Kyle~
From this site: http://www.regular-expressions.info/mysql.html
LIB_MYSQLUDF_PREG
If you want more regular expression power in your database, you can consider using LIB_MYSQLUDF_PREG. This is an open source library of MySQL user functions that imports the PCRE library. LIB_MYSQLUDF_PREG is delivered in source code form only. To use it, you'll need to be able to compile it and install it into your MySQL server. Installing this library does not change MySQL's built-in regex support in any way. It merely makes the following additional functions available:
Here it comes...
PREG_CAPTURE extracts a regex match from a string. PREG_POSITION returns the position at which a regular expression matches a string. PREG_REPLACE performs a search-and-replace on a string. PREG_RLIKE tests whether a regex matches a string.
Sounds exactly what you're looking for.
All these functions take a regular expression as their first parameter. This regular expression must be formatted like a Perl regular expression operator. E.g. to test if regex matches the subject case insensitively, you'd use the MySQL code PREG_RLIKE('/regex/i', subject). This is similar to PHP's preg functions, which also require the extra // delimiters for regular expressions inside the PHP string.
See this post: How to do a regular expression replace in MySQL?
Either that or you could just write a script in any language that goes through each record, does a regex replacement and then updates the field. For more info on regex, see here: http://www.regular-expressions.info/reference.html
There are a number of options. One might be to use SUBSTRING_INDEX():
UPDATE
table
SET field = SUBSTRING_INDEX( field, '">', 1 )
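The effect of SUBSTRING_INDEX(field, '">', 1) is to keep everything before the first "> in the value. If the cleanup is done in a script instead, a comparable operation in Python is str.partition. A small sketch using the sample from the question; the function name keep_url is hypothetical:

```python
def keep_url(post_content: str) -> str:
    """Keep only the text before the first '">' (the whole string if it is absent)."""
    before, _sep, _after = post_content.partition('">')
    return before

sample = 'http://somesite.com/">This is some site</a>'
print(keep_url(sample))  # http://somesite.com/
```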
It's possible - there is a syntax for User Defined Functions which would let you pass in a regular expression pattern that matches the link and strips everything else.
However, this is quite complicated for somebody new to MySQL, and from your question, this sounds like a one-off. In which case - why not just use Excel and then reimport the data?
Great stuff!
All seems doable with a little bit of time and self education.
In the end, I exported that table as a CSV in Sequel Pro and did some nifty find and replace work in Coda. Not as sophisticated as your suggestions, but it worked.
Thanks again,
~Kyle~

MySQL: Formatting a string

My database contains a string pattern that is used to allow for easy user editing via a JS-script.
The string is basically formatted like so:
aaa[bbb#ccc]ddd[eee#fff]ggg
the result I am looking for is
aaacccdddfffggg
I'd like to do this when selecting the string from the database. I'm guessing a regex should do the trick, but my knowledge of regexes is rather limited. A regex is not a requirement, though, if a more elegant solution to the problem exists.
Unfortunately, MySQL's REGEXP operator only tests whether a value matches a pattern (it returns 1 or 0). You can't use it to transform strings.
You'll either need to do it client-side or work with the other string functions. MID() would do the trick if the lengths and positions of the substrings are fixed. If not, use POSITION() (or LOCATE()) to find the special characters [, ] and #.
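For the client-side route, the transformation is a one-line regex substitution. A minimal sketch in Python, assuming each bracketed group contains exactly one # and no nested brackets; the function name flatten is made up for the example:

```python
import re

# Match "[left#right]" and capture only the part after the '#'.
BRACKET = re.compile(r"\[[^#\]]*#([^\]]*)\]")

def flatten(pattern: str) -> str:
    """Replace every [xxx#yyy] group with yyy."""
    return BRACKET.sub(r"\1", pattern)

print(flatten("aaa[bbb#ccc]ddd[eee#fff]ggg"))  # aaacccdddfffggg
```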