I have a list of a million or so URLs in a MySQL table.
I need to cleanse the data (extract the domains) so that I can be confident in DISTINCT-type queries.
The data comes in several different forms:
www.domain.tld
domain.tld
http://domain.tld
https://vhost.domain.tld
domain.tld/
There are also invalid domains and empty values.
Ideally I'd like to do something along the lines of:
UPDATE table1 SET domain = website REGEXP '^(https?://)?[a-zA-Z0-9\\\\.\\\\-]+(/|$|\\\\?)'
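where domain is a new, empty column and website holds the original URL.
You can't use regex like that in MySQL as is: REGEXP only returns 1 or 0 (match or no match), not the matched text (MySQL 8.0 later added REGEXP_SUBSTR() for that). Apparently, though, you can use some UDFs that implement it. See: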
How to do a regular expression replace in MySQL?
https://launchpad.net/mysql-udf-regexp
http://www.mysqludf.org/lib_mysqludf_preg/
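If installing a UDF isn't an option, one workaround is to do the extraction outside MySQL and write the result back into the new domain column. Below is a minimal Python sketch of the extraction step only, under these assumptions: hosts contain only letters, digits, dots and hyphens; a leading www. is dropped so that www.domain.tld and domain.tld collapse to one value for DISTINCT; and values without a dot count as invalid. The function name extract_domain is hypothetical.

```python
import re
from typing import Optional

# Optional http:// or https:// prefix, then the host, which must be followed by
# "/", "?", ":" (a port) or the end of the string.
# Assumption: hosts only contain letters, digits, dots and hyphens.
HOST_RE = re.compile(r'^(?:https?://)?([a-z0-9.-]+)(?:[/?:]|$)', re.IGNORECASE)


def extract_domain(url: Optional[str]) -> Optional[str]:
    """Return a normalised host for url, or None for empty or invalid values."""
    if not url or not url.strip():
        return None
    match = HOST_RE.match(url.strip())
    if match is None:
        return None
    host = match.group(1).lower().strip('.')
    # Assumption: "www." is not significant for the DISTINCT queries.
    if host.startswith('www.'):
        host = host[len('www.'):]
    # Treat values without a dot (e.g. "localhost", "n/a") as invalid.
    if '.' not in host:
        return None
    return host
```

Rows for which this returns None can be left with an empty domain and reviewed by hand; the rest can be written back with an ordinary parameterised UPDATE.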
I'm using MySQLAdmin. I moved the data from a Windows server and am trying to change the case of part of the URLs, but the replace doesn't find any matches. I need the slashes because I only want to replace text inside the URLs (in the post table), not anywhere else. I think the %20s are somehow the problem?
UPDATE table_name SET field = replace(field, '/user%20name/', '/User%20Name/')
The actual string is more like:
https://www.example.com/forum/uploads/user%20name/GFCI%20Stds%20Rev%202006%20.pdf
If you are using MariaDB, you have the REGEXP_REPLACE() function.
But the best approach is to dump the table to a file, open it in Notepad++,
and run a regex replace with the following settings:
Pattern is: (https:[\/\w\s\.]+uploads/)(\w+)\%20(\w+)((\/.*)+)
Replace with: $1\u$2\%20\u$3$4
Then import the table again.
Hope this helps.
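Notepad++'s \u in the replacement upper-cases the next character; Python's re.sub has no direct equivalent, but a replacement function does the same job. The sketch below applies the same pattern to a dump line by line, with the unnecessary escapes removed and ((\/.*)+) simplified to (/.*), which matches the same text. The names capitalize_first and fix_upload_path are hypothetical.

```python
import re

# Same structure as the Notepad++ pattern: everything up to "uploads/",
# two words joined by %20, then the rest of the path.
UPLOAD_RE = re.compile(r'(https:[/\w\s.]+uploads/)(\w+)%20(\w+)(/.*)')


def capitalize_first(word: str) -> str:
    # Like \u in the Notepad++ replacement: upper-case only the first character
    # and leave the rest unchanged (unlike str.capitalize()).
    return word[:1].upper() + word[1:]


def fix_upload_path(line: str) -> str:
    """Upper-case the first letter of both words in .../uploads/word%20word/..."""
    return UPLOAD_RE.sub(
        lambda m: (m.group(1) + capitalize_first(m.group(2)) + '%20'
                   + capitalize_first(m.group(3)) + m.group(4)),
        line,
    )
```

Processing the dump one line at a time keeps the \s in the character class from reaching across line breaks.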
If it's MariaDB, you can do the following:
UPDATE table_name SET field = REGEXP_REPLACE(field, '\/user%20name\/', '\/User%20Name\/');
First, please check what is actually stored in the database: %20 is the URL (percent) encoding of a space character, not an HTML entity. Often the value is decoded before it is stored, so the database contains an actual space rather than %20; in that case your REPLACE doesn't match the actual data.
The second possibility, depending on what you want to do: you saw the URL containing %20, so you created the records you want to fetch with that literal %20 in them. When you now query based on the actual URL, the %20 is converted into a real space before your query runs, so it doesn't match your stored data.
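To cover both cases, you can generate the percent-encoded and the decoded spelling of the search and replacement strings and run the REPLACE once for each. A small Python sketch using the standard library's urllib.parse.quote and unquote; the helper name replace_variants is hypothetical.

```python
from urllib.parse import quote, unquote


def replace_variants(old: str, new: str) -> list[tuple[str, str]]:
    """Return (search, replacement) pairs for the %20-encoded and decoded forms."""
    # quote() leaves "/" unescaped by default and encodes a space as %20.
    encoded = (quote(unquote(old)), quote(unquote(new)))
    # unquote() turns %20 back into a literal space.
    decoded = (unquote(old), unquote(new))
    return [encoded, decoded]
```

Each pair maps onto one UPDATE table_name SET field = REPLACE(field, search, replacement) statement.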
Is it possible to search and replace with a regex expression in MySQL?
I have a thousand values in a column containing JSON strings; somewhere inside each JSON document are several occurrences of a string that I have to change.
I've already written a PHP script that does the job, but it is a little slow.
Is there a nicer way to do that using only MySQL?
Something like:
UPDATE mytable SET value = "disabled" WHERE data REGEXP '{"field": "(.+)"}'
MariaDB has REGEXP_REPLACE(), which might provide the tool you need.
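A regex replace over the raw JSON text can also change keys or parts of unrelated values. If that is a risk, a JSON-aware replacement done outside the database is an alternative to the PHP script. This minimal Python sketch (replace_in_json is a hypothetical helper) rewrites occurrences of a substring only inside string values and leaves keys untouched.

```python
import json
from typing import Any


def _replace(node: Any, old: str, new: str) -> Any:
    # Recurse through objects and arrays; only string values are rewritten.
    if isinstance(node, dict):
        return {key: _replace(value, old, new) for key, value in node.items()}
    if isinstance(node, list):
        return [_replace(item, old, new) for item in node]
    if isinstance(node, str):
        return node.replace(old, new)
    return node


def replace_in_json(text: str, old: str, new: str) -> str:
    """Replace old with new in every string value of the JSON document text."""
    # ensure_ascii=False keeps non-ASCII characters as they are; whitespace
    # between tokens is normalised by json.dumps.
    return json.dumps(_replace(json.loads(text), old, new), ensure_ascii=False)
```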
My URLs look like:
'/api/comments/languages/124/component/segment_translation/2'
I know which parts of the URL are static and which are dynamic, and I have a structure that tells me this.
I have example requests and responses (where the dynamic parts won't match) that I'm trying to look up in MySQL, so I can very easily generate a query like:
select url from qb_log_full_requests
where
URL REGEXP 'api/comments/languages/[^f.*]/component/[^f.*]/[^f.*]'
That would be great, except it doesn't work.
Is there a way to ask MySQL to match
/exact_string/[wild card]/exact_string/[wild card]
and so on?
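You could try the following regexp. In your pattern, [^f.*] is a bracket expression that matches exactly one character other than f, . or *, so it can never match a multi-character segment such as 124; [^/]+ instead matches one or more non-slash characters, i.e. one whole path segment: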
/api/comments/languages/[^/]+/component/segment_translation/[^/]+
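Since you already know which segments are static and which are dynamic, the pattern can be generated rather than written by hand. A Python sketch, assuming a hypothetical template syntax in which * marks a dynamic segment (template_to_regex is also hypothetical). The result is anchored with ^ and $ so longer URLs don't match; drop them to get the unanchored behaviour of the original query.

```python
import re


def template_to_regex(template: str, wildcard: str = '*') -> str:
    """Turn '/a/*/b' into '^/a/[^/]+/b$': each wildcard matches one path segment."""
    segments = template.split('/')
    # Static segments are escaped with Python's re.escape; wildcards become [^/]+.
    parts = ['[^/]+' if seg == wildcard else re.escape(seg) for seg in segments]
    return '^' + '/'.join(parts) + '$'
```

For purely alphanumeric or underscore segments the output is valid in MySQL as well. If a static segment contains regex metacharacters, check that the escaping suits MySQL's engine, and remember that backslashes must be doubled inside a MySQL string literal.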
I am trying to transfer a list of domain names from an old system to a newer one.
The problem is that the data in the old database was used as a reference and contains additional information, but the new system will integrate with cPanel, so the domain has to be correct.
I am trying to automate the import of the old data that does conform to my requirements and set aside the ~4% that does not for manual import.
I have used a regular expression to achieve this, but for some reason it is not working as I expect.
This is the condition I use:
`domain` REGEXP '^[\.A-Za-z0-9\-]+\\.[a-zA-Z]{2,4}$' = 1
It correctly identifies the following as not being valid:
https://test-1.example.com:8443/login_up.php3
118.18.187.15
But it fails for the following:
the-example.com mchannel
example.com NEW
I know regex decently well but I can't figure out why in this case it does not work.
Fiddle URL: http://sqlfiddle.com/#!2/a9d70/5
Example what should validate: http://www.regexr.com/39f4v
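This regex should match everything the first regex matches, plus the IP addresses. ([[:<:]] and [[:>:]] are word-boundary markers in MySQL before 8.0; MySQL 8.0's ICU-based regex engine uses \b instead.)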
[[:<:]][\.A-Za-z0-9\-]+\.[a-zA-Z]{2,4}[[:>:]]|[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+
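One way to narrow the problem down is to run the same pattern over the sample rows outside MySQL. With the default SQL mode, \. and \- inside a MySQL string literal lose their backslash and \\. becomes \., so the regex MySQL actually receives is ^[.A-Za-z0-9-]+\.[a-zA-Z]{2,4}$. The Python sketch below applies that pattern as a whole-string match and splits rows into importable and manual lists (split_domains is a hypothetical helper). It rejects both problem rows, so if MySQL accepts them, the stored values (for example, hidden characters) or the way the condition is combined in the query may be worth checking.

```python
import re

# The pattern as MySQL receives it after string-literal unescaping
# (assumption: default SQL mode, i.e. NO_BACKSLASH_ESCAPES is off).
DOMAIN_RE = re.compile(r'[.A-Za-z0-9-]+\.[a-zA-Z]{2,4}')


def split_domains(rows):
    """Split rows into (importable, manual) using a whole-string match."""
    importable, manual = [], []
    for value in rows:
        # fullmatch plays the role of the ^...$ anchors.
        if value and DOMAIN_RE.fullmatch(value):
            importable.append(value)
        else:
            manual.append(value)
    return importable, manual
```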
Closed 11 years ago.
Possible Duplicate:
Simulating regex capture groups in mysql
Good day,
I have many rows of data stored in a MySQL table. A typical value could look something like this:
::image-gallery::
::gallery-entry::images/01.jpg::/gallery-entry::
::/image-gallery::
Is there a way, by means of a regular expression, that I can a) extract the term image-gallery from the first line (it could be any phrase, not just image-gallery) and b) extract the middle line as two separate values, like this:
gallery-entry and then images/01.jpg
There could be many lines of ::gallery-entry:: values, and they could be called anything as well. A more complete example would be:
::image-gallery::
::title::MY GALLERY::/title::
::date::2011-05-20::/date::
::gallery-entry::images/01.jpg::/gallery-entry::
::/image-gallery::
In essence I want this information: the content type (image-gallery in the above case), taken from the first and last lines. Then I need the title as a key-value pair, so title as the key and MY GALLERY as the value. After that, I need all the subsequent rows of fields (gallery-entry) as key-value pairs too.
This is for a migration script where data from an old system will be migrated over to a new system with different syntax.
If MySQL SELECT statements won't work, would it be easier to extract the data by parsing the results with a PHP script?
Any and all help is always appreciated.
Kind regards,
Simon
Try this regex:
::image-gallery::\s+::title::(.*?)::/title::.*?::gallery-entry::(.*?)::/gallery-entry::\s+::/image-gallery::
Use single-line mode (/pattern/s) so the .*? chews up newlines.
Your key-value pairs will be:
title: $1 (matching group 1)
gallery-entry: $2 (matching group 2)
Judging from Simulating regex capture groups in mysql, there does not seem to be an easy way to capture groups with a regex in MySQL, because MySQL does not natively support capture groups in a regex. If you want that functionality, you can use a server-side extension like lib_mysqludf_preg to add that capability to MySQL.
The easiest way is to extract the whole column with SQL and then do the text matching in another language (such as php).
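Following that suggestion, here is a minimal Python sketch of the parsing side, assuming the format is exactly as in the examples: a ::type:: ... ::/type:: wrapper and one ::key::value::/key:: field per line. Fields come back as an ordered list of pairs because keys such as gallery-entry can repeat. parse_block is a hypothetical name.

```python
import re
from typing import List, Optional, Tuple

# Outer wrapper ::type:: ... ::/type::; the closing tag must repeat the name (\1).
BLOCK_RE = re.compile(r'::([\w-]+)::(.*?)::/\1::', re.DOTALL)
# One field ::key::value::/key:: on a single line (no DOTALL).
FIELD_RE = re.compile(r'::([\w-]+)::(.*?)::/\1::')


def parse_block(text: str) -> Optional[Tuple[str, List[Tuple[str, str]]]]:
    """Return (content_type, [(key, value), ...]) or None if no block is found."""
    block = BLOCK_RE.search(text)
    if block is None:
        return None
    content_type, body = block.group(1), block.group(2)
    # With two groups, findall returns (key, value) tuples in document order.
    return content_type, FIELD_RE.findall(body)
```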
In my tests kenbritton's regex didn't work, but building off of it the following regex worked on your test data:
::image-gallery::\s+::title::(.*?)::\/title::\s+(?:.*\s+)*::gallery-entry::(.*?)::\/gallery-entry::\s+::\/image-gallery::