Find non-consecutive repeating text in a MySQL column

I have a database with a large set of email addresses.
Because of a bug in a script, the database is full of wrong email addresses. These addresses have a known pattern.
They consist of a correct email address prefixed with an extra string.
This string is itself a part of the email address.
Example:
The correct email should be:
john.doe#example.com
Instead I have:
doejohn.doe#example.com
Or also:
johndoejohn.doe#example.com
How can I identify these addresses?
I thought about writing a regexp that finds repeating text inside a string, but I couldn't figure out how to do it.
Any ideas?

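You can use the query below to handle the LASTNAMEfirstname.lastname#something.com pattern. It first extracts the last name (the text between the first dot and the #), removes it from the part of the address up to the first dot by replacing it with an empty string, and then appends everything that follows that first dot.
SELECT email,
       CONCAT(
         REPLACE(SUBSTR(email, 1, LOCATE('.', email)),
                 SUBSTR(email, LOCATE('.', email) + 1, LOCATE('#', email) - LOCATE('.', email) - 1),
                 ''),
         SUBSTR(email, LOCATE('.', email) + 1, LENGTH(email))
       ) AS fixed_email
FROM emails;  -- the table name emails is a placeholder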
See the SQL Fiddle example here:
http://sqlfiddle.com/#!9/24fba/2
However, this does not handle the FIRSTNAMElastnameFIRSTNAME.lastname#example.com pattern.
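To also cover the FIRSTNAMElastnameFIRSTNAME case, one option is to test every possible split point and accept it only when the injected prefix is built entirely from copies of the first and last name. Below is a minimal Python sketch under that assumption (the function names are illustrative, not from the thread). The at parameter defaults to '#' because that is how the addresses are written here; pass at="@" for real data.

```python
from typing import Optional, Sequence


def _is_concat_of(s: str, parts: Sequence[str]) -> bool:
    """Return True if s is a non-empty concatenation of strings from parts."""
    if not s:
        return False
    # reachable[i] is True when s[:i] can be built from the given parts.
    reachable = [False] * (len(s) + 1)
    reachable[0] = True
    for i in range(len(s)):
        if not reachable[i]:
            continue
        for p in parts:
            if p and s.startswith(p, i):
                reachable[i + len(p)] = True
    return reachable[len(s)]


def repair_email(email: str, at: str = "#") -> Optional[str]:
    """If email looks like PREFIX + first.last<at>domain, where PREFIX is made
    only of copies of first and last, return first.last<at>domain; else None."""
    local, sep, domain = email.partition(at)
    if not sep:
        return None
    dot = local.rfind(".")
    last = local[dot + 1:]
    if dot <= 0 or not last:
        return None
    # Try the shortest candidate prefix first, so the longest plausible
    # first name is kept.
    for k in range(1, dot):
        first = local[k:dot]
        if _is_concat_of(local[:k], (first, last)):
            return f"{first}.{last}{at}{domain}"
    return None
```

For example, repair_email("johndoejohn.doe#example.com") gives "john.doe#example.com", and an already correct address such as john.doe#example.com gives None. A genuine address could happen to have this shape, so it is worth reviewing the matches before running an UPDATE.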

Can't test right now, but this might work:
^([^#]{5,})[^#]{1,}\.\1#[^#]+$
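In that regex, \1 requires the address to begin with a copy of whatever sits between the dot and the #, so it targets addresses that start with the last name (the LASTNAMEfirstname shape). With {5,}, a last name shorter than five characters, such as doe in the question, is not caught. Python's re module supports the same \1 back-reference, so the pattern can be checked there; the helper name and its min_len parameter are illustrative additions.

```python
import re


def repeated_prefix_regex(min_len: int = 5) -> "re.Pattern[str]":
    # Same pattern as the answer, with the minimum length of the repeated
    # group ([^#]{5,} in the answer) made configurable.
    return re.compile(r"^([^#]{%d,})[^#]{1,}\.\1#[^#]+$" % min_len)


as_written = repeated_prefix_regex(5)
relaxed = repeated_prefix_regex(3)
```

With min_len=3, doejohn.doe#example.com matches, while johndoejohn.doe#example.com still does not, because it starts with the first name rather than the last name.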

Related

Update Column from another column in the same table

We need to update the email addresses in a table because we have a new domain part. The local-part remains unchanged.
The table has columns for Name, Surname and Mail (plus other columns that are not relevant here).
We want it to look like this in the end:
Name   Surname  Mail
Test   Name     Test.Name#newdomain.com
Test2  Name2    Test2.Name2#newdomain.com
But while trying to do so we broke it, and now the mail column contains only the new domain. We used the following code:
update table
set mail = Replace('olddomain.com','newdomain.com')
where mail LIKE '%olddomain.com'
So now we need to restore the mail column and add the new domain-part. Any help?
I'm surprised this works. Normally, replace() takes three arguments:
set mail = Replace(mail, 'olddomain.com', 'newdomain.com')
I would suggest including the # in the logic as well.
REPLACE() takes three arguments:
update table
set mail = Replace(mail,'#olddomain.com','#newdomain.com')
where mail LIKE '%#olddomain.com'
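The answers fix the REPLACE() call but not the column that was already overwritten. Assuming every address follows the Name.Surname pattern shown in the target table, the local part can be rebuilt from the other two columns; in MySQL that would be something like UPDATE your_table SET mail = CONCAT(Name, '.', Surname, '#newdomain.com'), where your_table is a placeholder. The sketch below runs the same idea, plus the corrected three-argument REPLACE, against an in-memory SQLite table (the name people is hypothetical) so it can be tried without a server. SQLite concatenates with ||; do not copy that form into MySQL, where || means logical OR by default.

```python
import sqlite3


def restore_mail(conn: sqlite3.Connection, domain: str) -> None:
    # Rebuild Mail from Name and Surname. Assumes every row follows the
    # Name.Surname local-part pattern shown in the question; '#' stands in for
    # the at-sign as in the question's data.
    conn.execute(
        "UPDATE people SET Mail = Name || '.' || Surname || '#' || ?",
        (domain,),
    )


def migrate_domain(conn: sqlite3.Connection, old: str, new: str) -> None:
    # Three-argument REPLACE with the '#' included, so only the domain part
    # of matching rows is rewritten.
    conn.execute(
        "UPDATE people SET Mail = REPLACE(Mail, ?, ?) WHERE Mail LIKE ?",
        ("#" + old, "#" + new, "%#" + old),
    )


def make_table(rows):
    # In-memory table for trying the statements out; "people" is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (Name TEXT, Surname TEXT, Mail TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)
    return conn
```

Running restore_mail on rows whose Mail is just newdomain.com yields Test.Name#newdomain.com and Test2.Name2#newdomain.com. Rows whose local part does not follow Name.Surname would need to be restored from a backup instead.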

Sort numbers by string

I am using an API from Postcode Anywhere (PAW) to add a company by searching by postcode and selecting the address. This is pretty standard, and the code works fine.
Some background: PAW works in two stages. In stage 1, the postcode search criteria are sent to their service, which returns an array of possible addresses, and you select the address you want. In stage 2, the full PAF record for that ID is returned and stored in the table.
The problem is that the array they send has an Address field that combines the house number and street address, which makes it difficult to sort the results in a sensible (numeric) order.
This is the sample data I have in my table:
and this is how it looks in my application:
As you can see, it is not ideal, and I have no control over how they send the data.
Does anyone have ideas on how I can sort these strings by their leading numbers, which are not zero-padded (1, 11, 2 rather than 01, 02, 03, etc.), or at the very least split this field into two? Also note that in most cases the postcode search returns business/property names as well as house numbers, as seen in this example.
Any thoughts would be greatly appreciated.
Have you considered using a different API provider for the data? Allies Computing (who I work for) has a single-step API in which the initial postcode search returns all fields in the response. It also orders the results by premise number/name.
Give it a try here - https://developers.alliescomputing.com/postcoder-web-api/address-lookup/premise
There are also other providers of PAF data that do it this way such as Crafty Clicks and Ideal Postcodes.
It might also be worth checking the PAF license with your provider to ensure you comply with that too.
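If switching providers is not an option, the Address field could be split after it arrives. Below is a minimal Python sketch, assuming a house number (when present) sits at the very start and is followed by whitespace, optionally with a suffix such as the A in 12A. The function names, the sample addresses and the choice to list named premises after numbered ones are illustrative.

```python
import re
from typing import List, Optional, Tuple

# A leading house number, an optional suffix such as "A" in "12A",
# then whitespace and the rest of the address.
_PREMISE = re.compile(r"^(\d+)(\S*)\s+(.*)$")


def split_address(address: str) -> Tuple[Optional[str], str]:
    """Split "12A High Street" into ("12A", "High Street").
    Addresses without a leading number are returned as (None, address)."""
    m = _PREMISE.match(address)
    if m is None:
        return None, address
    return m.group(1) + m.group(2), m.group(3)


def address_sort_key(address: str):
    m = _PREMISE.match(address)
    if m is None:
        # Named premises (businesses, buildings) sort after numbered ones.
        return (1, 0, "", address.lower())
    # Compare the number as an integer, so 2 sorts before 11.
    return (0, int(m.group(1)), m.group(2).lower(), m.group(3).lower())


def sort_addresses(addresses: List[str]) -> List[str]:
    return sorted(addresses, key=address_sort_key)
```

Sorting 11, 2 and 1 High Street together with a named building yields 1, 2, 11 and then the named premise. Addresses that begin with something like "Flat 3, 5 High Street" fall into the named group, so the pattern may need adjusting for real data.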

Find column values that are a prefix of a given string.

I have a database table that contains URLs in a column. I want to show certain data depending on which page the user is on, falling back to a 'parent' page if there is no direct match. How can I find the rows whose column value is a prefix of the submitted URL?
E.g., given www.example.com/foo/bar/baz/here.html, I would expect to see (after sorting by the length of the column value, longest first):
www.example.com/foo/bar/baz/here.html
www.example.com/foo/bar/baz
www.example.com/foo/bar
www.example.com/foo
www.example.com
assuming all those URLs are in the table, of course.
Is there a built-in function, or would I need to create a procedure? Googling kept pointing me to LIKE and REGEXP, which is not what I need. I figured that a single query would be much more efficient than chopping up the URL and running multiple queries (the URLs could potentially contain many path components).
Simply turn the LIKE operator around:
SELECT * FROM urls WHERE 'www.example.com/foo/bar/baz/here.html' LIKE CONCAT(url, '%') ORDER BY LENGTH(url) DESC;
http://sqlfiddle.com/#!2/ef6ee/1
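One caveat with LIKE CONCAT(url, '%'): a % or _ stored in url acts as a wildcard. Comparing the first N characters of the target directly avoids that; in MySQL this could be written roughly as WHERE LEFT('www.example.com/foo/bar/baz/here.html', CHAR_LENGTH(url)) = url ORDER BY CHAR_LENGTH(url) DESC. The sketch below tries the same idea in an in-memory SQLite database, reusing the table name urls from the answer. Both approaches compare character by character, so a stored www.example.com/fo would also count as a prefix; add a check for a following / if that matters.

```python
import sqlite3
from typing import List


def matching_prefixes(conn: sqlite3.Connection, target: str) -> List[str]:
    # Keep rows whose url equals the first length(url) characters of the
    # target, longest match first. Unlike LIKE CONCAT(url, '%'), a '%' or '_'
    # stored in url is compared literally rather than as a wildcard.
    rows = conn.execute(
        "SELECT url FROM urls "
        "WHERE substr(?, 1, length(url)) = url "
        "ORDER BY length(url) DESC",
        (target,),
    )
    return [row[0] for row in rows]
```

Loaded with the five URLs from the question plus unrelated rows, the function returns exactly those five, longest first.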

What can I do with an inconsistent column delimited text file?

I have a text file that looks something like...
firstname:middle:lastname
firstname:middle:lastname
firstname:lastname
firstname:middle:lastname
firstname:lastname
I would like to eventually use this information in a MySQL database, but since the number of columns is not consistent, I am not sure what to do. Is there any way to resolve this?
If the data only has the variations above, you can make these assumptions:
First part is the firstname
Last part is the lastname
So if you are using PHP, for example, you could use explode() to split each line on the delimiter, which in this case is :.
When looping through the rows, assume the first part is the first name, the last part is the last name, and any part in between is the middle name.
You can use count() on the exploded array to find out how many parts the current row has, which tells you which one is the last part.
If the file is that simple, the solution is trivial:
firstname:middle:lastname
firstname:lastname
if(there are only two columns) { that means we have first and last name }
else { we have first, middle and last name }
If there are more columns, you may still be able to map the data to the right columns if you can build a priority list (i.e., the order in which fields tend to be missing, for example 'last name > first name > middle name') and/or combine it with data-type matching (string/int/double/date). Either way, you need to gather all your domain knowledge and see whether it is sufficient.
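The two-or-three-field rule above could look like the following Python sketch, where str.split plays the role of PHP's explode() and len() plays the role of count(). The PersonName type and function names are illustrative. A missing middle name comes back as None, which most Python database drivers store as NULL in a nullable column.

```python
from typing import List, NamedTuple, Optional


class PersonName(NamedTuple):
    first: str
    middle: Optional[str]  # None when the line has only two fields
    last: str


def parse_line(line: str, delimiter: str = ":") -> PersonName:
    parts = line.strip().split(delimiter)
    if len(parts) == 2:
        return PersonName(parts[0], None, parts[1])
    if len(parts) == 3:
        return PersonName(parts[0], parts[1], parts[2])
    # Any other shape needs the extra domain knowledge described above.
    raise ValueError(f"unexpected field count {len(parts)} in {line!r}")


def parse_file(text: str) -> List[PersonName]:
    # Skip blank lines; every other line must have two or three fields.
    return [parse_line(line) for line in text.splitlines() if line.strip()]
```

Raising on any other field count, rather than guessing, keeps unexpected lines from silently landing in the wrong columns.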

Separating values in a URL, not with an &

Each parameter in a URL can have multiple values. How can I separate them? Here's an example:
http://www.example.com/search?queries=cars,phones
So I want to search for 2 different things: cars and phones (this is just a contrived example). The problem is the separator, a comma. A user could enter a comma in the search form as part of their query and then this would get screwed up. I could have 2 separate URL parameters:
http://www.example.com/login?name1=harry&name2=bob
There's no real problem with that; in fact, I think this is how URLs were designed to handle this situation. But I can't use it in my particular case (explaining why would take a separate long post). I simply need to separate the values.
My question is basically: is there a URL-encodable character or value that can't possibly be entered in a form (textarea or input) that I could use as a separator? Like a null character? Or a non-visible character?
UPDATE: Thank you all for the very quick responses. I should have listed the same-parameter-name example too, but order matters in my case, so that wasn't an option either. We solved this by using the URL-encoded NUL character %00 (\u0000) as a value separator.
The standard approach to this is to use the same key name twice.
http://www.example.com/search?queries=cars&queries=phones
Most form libraries let you access the values as an array automatically. (If you are using PHP and reading $_GET/$_POST rather than reinventing the wheel, you will need to change the name to queries[].)
You can give them each the same parameter name.
http://www.example.com/search?query=cars&query=phones
Most server-side HTTP APIs can retrieve them as an array. Judging by your question history, you're using JSP/Servlet, so you can use HttpServletRequest#getParameterValues() for this.
String[] queries = request.getParameterValues("query");
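For comparison, here is a Python sketch of the repeated-key approach using the standard urllib.parse module. It also shows that, on the Python side at least, parse_qs returns the repeated values in the order they appear in the URL, which bears on the ordering concern in the question. The function names are illustrative; the URL and the queries parameter come from the thread.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(values: List[str]) -> str:
    # doseq=True emits one "queries=..." pair per list item, in list order.
    return "http://www.example.com/search?" + urlencode({"queries": values}, doseq=True)


def read_queries(url: str) -> List[str]:
    # parse_qs collects repeated keys into a list in the order they appear.
    return parse_qs(urlsplit(url).query).get("queries", [])
```

A comma typed by the user is encoded as %2C inside its own value, so it never collides with anything.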
Just URL-encode the user input so that their commas become %2C.
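Taken literally, this needs some care. If the values' commas become %2C but the URL is otherwise built normally, the server's standard decoding turns %2C back into a comma before your code splits the parameter. One way that works is to encode each value on its own, join the results with literal commas, and let the usual URL encoding apply on top, so the receiving side splits before undoing the inner layer. A Python sketch with illustrative function names:

```python
from typing import List
from urllib.parse import parse_qs, quote, unquote, urlencode


def encode_values(values: List[str]) -> str:
    # Encode each value on its own first (',' -> '%2C'), then join with a
    # literal comma; urlencode() then encodes the whole string once more.
    joined = ",".join(quote(v, safe="") for v in values)
    return urlencode({"queries": joined})


def decode_values(query: str) -> List[str]:
    # parse_qs removes the outer layer of encoding; the only literal commas
    # left are separators, and unquote() restores each value.
    # Assumes the joined string is non-empty (parse_qs drops blank values).
    joined = parse_qs(query)["queries"][0]
    return [unquote(piece) for piece in joined.split(",")]
```

The cost is a double-encoded query string (% becomes %25), which is harmless but less readable than repeating the parameter name.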
Come up with your own separator that is unlikely to be entered in a query, for example two underscores '__'.
Why not just do something like "||"? Anyone who types that into a search area probably fell asleep on their keyboard :} Then just explode it on the backend.
The easiest thing to do would be to use a custom separator like [!!ValSep!!].
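Whichever separator is chosen (__, ||, [!!ValSep!!], or the %00 from the question's update), the scheme relies on no value ever containing it. The Python sketch below makes that assumption explicit by rejecting such values instead of splitting them in the wrong place later. The function names are illustrative; queries is the parameter name from the question.

```python
from typing import List
from urllib.parse import parse_qs, urlencode


def join_values(values: List[str], sep: str) -> str:
    # Reject values that contain the separator instead of silently
    # splitting them in the wrong place later.
    for v in values:
        if sep in v:
            raise ValueError(f"value {v!r} contains the separator {sep!r}")
    return urlencode({"queries": sep.join(values)})


def split_values(query: str, sep: str) -> List[str]:
    # parse_qs decodes %00, %7C, etc. back to the raw separator first.
    return parse_qs(query)["queries"][0].split(sep)
```

With sep="\x00", ["cars", "phones"] becomes queries=cars%00phones, which matches the %00 approach from the update.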