Example table: Store
store_id            Employee_id
0020,0345,0002345   0234
0034                0943
I tried REPLACE(LTRIM(REPLACE(store_id,'0',' ')),' ','0'), but it trims the leading zeros of the first value only.
How can I get all store IDs of an employee without leading zeros in an SQL query?
Is it possible?
If this is a table in an RDBMS, it violates 1NF, and you should avoid doing this. If possible, use junction tables or reference tables. Otherwise, you could keep the same schema and insert one row per store ID for each Employee_id.
Now, to solve this in a generic manner, you need a user-defined function (UDF); see your database's documentation on user-defined functions.
Alternatively, move this kind of processing to the client.
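As a sketch of the client-side option, the following Python function (the name strip_leading_zeros is made up for illustration) splits a comma-separated store_id value, strips the leading zeros from every element, and joins the parts back together. It assumes the elements are plain digit strings separated by bare commas; an all-zero element such as 0000 is kept as 0 rather than becoming empty.

```python
def strip_leading_zeros(store_ids: str) -> str:
    """Remove leading zeros from each element of a comma-separated string.

    Hypothetical helper: assumes digits-only elements separated by bare commas.
    An element made only of zeros is kept as a single "0".
    """
    return ",".join(part.lstrip("0") or "0" for part in store_ids.split(","))


print(strip_leading_zeros("0020,0345,0002345"))  # 20,345,2345
print(strip_leading_zeros("0034"))               # 34
```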
If you just want to remove the leading zeros -- and there aren't too many -- you can use replace():
select substr(replace(replace(replace(concat(',', store_id),
                                      ',0', ','),
                              ',0', ','),
                      ',0', ','),
              2, length(store_id))
from Store;
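This replaces each comma followed by a zero with a comma. A comma is prepended so that the first value also starts with one, and substr(..., 2) strips it off again. Each replace() removes one leading zero per value, so you need as many nested calls as the longest run of leading zeros (three here, for 0002345). Different databases have slightly different names for substr() and length(), but the functionality is generally there.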
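To check the nested replace() idea, here is a small Python script that runs it against an in-memory SQLite database holding the two example rows. It is a port to SQLite, not MySQL: SQLite concatenates with || instead of concat(), and substr() is called without the length argument, which returns the rest of the string.

```python
import sqlite3

# In-memory SQLite database holding the two example rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Store (store_id TEXT, Employee_id TEXT)")
conn.executemany(
    "INSERT INTO Store VALUES (?, ?)",
    [("0020,0345,0002345", "0234"), ("0034", "0943")],
)

# Prepend a comma, strip one leading zero per value on each replace() pass,
# then drop the helper comma with substr(..., 2).
rows = conn.execute("""
    SELECT Employee_id,
           substr(replace(replace(replace(',' || store_id,
                                          ',0', ','),
                                  ',0', ','),
                          ',0', ','),
                  2)
    FROM Store
    ORDER BY Employee_id
""").fetchall()

print(rows)  # [('0234', '20,345,2345'), ('0943', '34')]
```

Note that a value made only of zeros (for example 000) would come out empty, and a value with more leading zeros than there are replace() levels keeps the extra ones.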
Related
If I have a value in column WorkoutID that looks like 100,10,7
and I want to remove the value 10 from this, I can use the following SQL script:
UPDATE
UserDB.Programs
SET
WorkoutID = REPLACE(WorkoutID, ',10','')
WHERE
ProgramID = '7172';
which would correctly output 100,7.
The expected outcome ALWAYS needs to be a clean list such as
"number,number" (or a single "number")
and NOT "number,,number", "number," or ",number".
This makes it tricky: in the replace statement I need to search for the value together with a comma, but how can I know which side the comma is on, i.e. replace(WorkoutID, ',10', '') or replace(WorkoutID, '10,', '')?
As others pointed out in the comments, you should not store comma-separated values in a single column.
But in case you have no control over the data, here is a solution.
The problem with your solution is that you may unintentionally remove parts of other values, e.g. if you have something like 100,10,7,1000 and remove ,10, you will get 100,700...
One solution would be to add a leading and trailing comma to the original string, then replace the value enclosed with commas (i.e. ,10,), then remove the added leading and trailing commas.
Example :
CREATE TABLE program (ProgramID INT, WorkoutID TEXT);
INSERT INTO program VALUES (1, '100,12,4,55,120,212,45');
SELECT TRIM(BOTH ',' FROM REPLACE(CONCAT(',', WorkoutID, ','),',12,',','))
FROM program;
Result :
100,4,55,120,212,45
There may be other solutions using JSON paths etc. but I think this one is pretty fast and understandable.
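The wrap-replace-trim trick can be tried outside MySQL with the Python sketch below. It uses SQLite instead, so || replaces CONCAT() and TRIM(x, ',') replaces TRIM(BOTH ',' FROM x); the helper name remove_value and the extra rows are made up to show that 100 and 1000 survive when 10 is removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE program (ProgramID INTEGER, WorkoutID TEXT)")
conn.executemany(
    "INSERT INTO program VALUES (?, ?)",
    [(1, "100,12,4,55,120,212,45"), (2, "100,10,7,1000"), (3, "10")],
)


def remove_value(value: str):
    """Return (ProgramID, WorkoutID) pairs with `value` removed from each list.

    Wraps the list in commas, replaces ',value,' with ',', then trims the
    helper commas again (SQLite's TRIM(x, chars) form).
    """
    sql = """
        SELECT ProgramID,
               TRIM(REPLACE(',' || WorkoutID || ',', ?, ','), ',')
        FROM program
        ORDER BY ProgramID
    """
    return conn.execute(sql, ("," + value + ",",)).fetchall()


print(remove_value("12"))
print(remove_value("10"))
```

The wrapping also covers the first and last elements, which is why row 3 becomes an empty string when 10 is removed. One caveat: because REPLACE() scans left to right without overlapping matches, two consecutive copies of the value (for example 12,12) lose only one copy per pass.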
I am taking the MySQL class by Duke on Coursera. Week two refers to messy data, and I figured I would ask my question here. My question is how to compare an entry in a table row with another entry that is the same except that it was entered with a hyphen, e.g. "Golden Retriever Mix" is the same as "Golden Retriever- Mix". When I run a SELECT DISTINCT statement, I do not want it to return both. The catch is that we cannot just remove all hyphens from the column because we still want them in entries such as "Golden Retriever-Airedale Terrier Mix". What would such a query look like? The example code that returns both "Golden Retriever Mix" and "Golden Retriever- Mix" is below.
SELECT DISTINCT breed,
TRIM(LEADING '-' FROM breed)
FROM dogs
ORDER BY TRIM(LEADING '-' FROM breed) LIMIT 1000, 1000;
I am thinking I need an IF/THEN statement that says
IF(REPLACE(breed,'-','') = breed)
THEN DELETE breed;
Obviously this is not correct syntax, which is what I am looking for.
I think what you are looking for is the Levenshtein distance (https://en.wikipedia.org/wiki/Levenshtein_distance).
It measures the difference between two strings as the number of single-character edits (insertions, deletions or substitutions) needed to turn one into the other; e.g. comparing "Test" and "Test1" gives 1 because there is one extra letter.
You could use the procedures suggested in "How to add levenshtein function in mysql?" or "Levenshtein: MySQL + PHP".
This will bring up not only the entries that differ by a hyphen but also the ones with misspellings. You can then filter your results by the calculated distance.
If you do not want this approach because of performance issues, you can still use TRIM or REPLACE to strip the symbol and compare the result with the other string.
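To make the distance concrete, here is a Python sketch of the classic dynamic-programming Levenshtein algorithm. It is a client-side port of the idea, not the MySQL stored functions referenced above, and the threshold of 1 in the usage lines is an arbitrary example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b` (classic dynamic programming,
    keeping only the previous row of the table)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


breeds = ["Golden Retriever Mix", "Golden Retriever- Mix",
          "Golden Retriever-Airedale Terrier Mix"]
# Keep entries within one edit of the reference spelling.
close = [b for b in breeds if levenshtein(b, "Golden Retriever Mix") <= 1]
print(close)  # ['Golden Retriever Mix', 'Golden Retriever- Mix']
```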
You're almost there: all you need to do is get rid of the plain breed column in your select clause and replace TRIM() with REPLACE().
SELECT DISTINCT REPLACE(breed, '-', ' ')
FROM dogs
TRIM(LEADING...) would remove the hyphens at the beginning of the string, but what you want to show is the distinct values of breed considering hyphens as spaces.
Edit
I was assuming the two strings were "Golden Retriever Mix" and "Golden Retriever-Mix", but if there's actually a space after the hyphen ("Golden Retriever- Mix"), you can use REPLACE(breed, '-', '') instead
Edit 2
After the clarification in your comment, I think what you need is a GROUP BY clause
SELECT MIN(breed)
FROM dogs
GROUP BY REPLACE(breed, '-', ' ')
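Any string with a hyphen will be considered higher in value than the same string with a space instead, so when both exist this query returns the one with the space. If only one of them exists, it is returned as is. If the hyphen is followed by a space, as in "Golden Retriever- Mix", group by REPLACE(breed, '-', '') instead, as in the first edit.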
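The GROUP BY/MIN() approach can be sanity-checked with the Python sketch below, which runs the query in SQLite using the breed names from the question. It groups by REPLACE(breed, '-', '') because the sample hyphen is followed by a space. SQLite's default binary collation sorts the space before the hyphen; a MySQL collation is assumed, not guaranteed, to behave the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (breed TEXT)")
conn.executemany(
    "INSERT INTO dogs VALUES (?)",
    [("Golden Retriever Mix",),
     ("Golden Retriever- Mix",),
     ("Golden Retriever-Airedale Terrier Mix",)],
)

# Rows that differ only by a hyphen share a group key; MIN() keeps the
# spelling without the hyphen, and single-member groups come back unchanged.
rows = conn.execute("""
    SELECT MIN(breed)
    FROM dogs
    GROUP BY REPLACE(breed, '-', '')
    ORDER BY 1
""").fetchall()
print(rows)
```

The hyphen in "Golden Retriever-Airedale Terrier Mix" survives because only the grouping key drops it; the selected value is the original column.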
I have a table called media with a column called accounts_used in which the rows appear in the following format
68146, 67342, 60577, 61506, 67194, 67034, 63484, 49113, 61518, 66971, 67511,
67351, 63621, 67725, 63638, 68141, 66114, 67262, 67537, 67537, 61765, 63701,
67087, 62641, 61294, 67063, 67049, 67038, 67170, 67147, 67289, 61264, 67091,
63690, 63505, 63505, 49172, 52313, 67070, 66945, 67234, 62265, 61368, 67870,
67211, 67586, 49240, 67538, 67538, 67809, 67183, 67164, 62712, 67519, 66895,
67693, 60266, 60266, 67593, 67031, 67137, 62570, 60682, 61195, 67569, 67569,
67069, 62082, 67345, 61748, 61553, 52029, 66877, 62630, 67196, 67196, 67196,
67196, 67196, 67196, 66873, 63677, 68174, 67127, 63594, 67107, 60419, 66601,
68156, 67203, 68161, 60233, 66586, 52654, 63570, 66887, 67191, 60877, 52108,
67131, 61784, 67566, 67162, 67073, 67092, 67064, 60133, 66907, 67559, 66846,
60490, 60347, 66558, 48737, 61539, 67236, 68135, 67238 , 63656, 67585, 67512
If the row has a comma at the end I want to remove this, so for example if the row looks like the following
1,2,3,4,5,6,
I want to replace it to just this
1,2,3,4,5,6
Is this possible to do using just a simple query?
It is a bad idea to store lists of ids in rows. But you are doing it, so you can fix this with:
update media
set accounts_used = left(accounts_used, length(accounts_used) - 1)
where accounts_used like '%,';
Instead, you should have a MediaAccounts table, with one row per media/account pair.
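A rough sketch of what moving to a MediaAccounts table could look like, using SQLite from Python: the column names media.id, media_id and account_id are assumptions, and the sample lists are shortened from the question. The split happens client-side, empty pieces left by a trailing comma are skipped, and the composite primary key with INSERT OR IGNORE drops duplicate ids such as the repeated 67537.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, accounts_used TEXT)")
conn.execute(
    "CREATE TABLE MediaAccounts ("
    " media_id INTEGER REFERENCES media(id),"
    " account_id INTEGER,"
    " PRIMARY KEY (media_id, account_id))"
)
conn.executemany(
    "INSERT INTO media VALUES (?, ?)",
    [(1, "67537, 67537, 61765 , 63701,"), (2, "1,2,3,4,5,6,")],
)

# Split each list on commas, ignore empty pieces left by a trailing comma,
# and let the primary key drop duplicate account ids (INSERT OR IGNORE).
for media_id, accounts in conn.execute("SELECT id, accounts_used FROM media").fetchall():
    for piece in accounts.split(","):
        piece = piece.strip()
        if piece:
            conn.execute(
                "INSERT OR IGNORE INTO MediaAccounts VALUES (?, ?)",
                (media_id, int(piece)),
            )

pairs = conn.execute(
    "SELECT media_id, account_id FROM MediaAccounts ORDER BY media_id, account_id"
).fetchall()
print(pairs)
```

With one row per pair, a trailing comma can no longer exist, and queries such as "which media use account 67537" become a plain WHERE clause.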
EDIT:
Possibly, the row ends with a ', ' rather than just a comma:
update media
set accounts_used = left(accounts_used, length(accounts_used) - 2)
where accounts_used like '%, ';
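Both updates can be exercised with the Python sketch below, which runs them in SQLite. SQLite has no LEFT(), so substr(x, 1, n) stands in for it; SQLite's length() counts characters for text, like MySQL's CHAR_LENGTH(). The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (id INTEGER, accounts_used TEXT)")
conn.executemany(
    "INSERT INTO media VALUES (?, ?)",
    [(1, "1,2,3,4,5,6,"), (2, "1,2,3"), (3, "7, 8, ")],
)

# Value ends with a bare comma: drop the last character.
conn.execute("""
    UPDATE media
    SET accounts_used = substr(accounts_used, 1, length(accounts_used) - 1)
    WHERE accounts_used LIKE '%,'
""")
# Value ends with ', ': drop the last two characters.
conn.execute("""
    UPDATE media
    SET accounts_used = substr(accounts_used, 1, length(accounts_used) - 2)
    WHERE accounts_used LIKE '%, '
""")

rows = conn.execute("SELECT id, accounts_used FROM media ORDER BY id").fetchall()
print(rows)  # [(1, '1,2,3,4,5,6'), (2, '1,2,3'), (3, '7, 8')]
```

The LIKE predicate matters: with = the pattern is compared literally, so no row would ever match and nothing would be updated.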
We faced a similar string-replacement issue with a large dataset of bibliographic entries, where we also needed to trim extraneous punctuation from a large number of strings that had been imported verbatim from another system. Many of the records in our dataset also contained Unicode characters, so we needed an SQL query that would find the records that needed to be updated and then update them in a way that was Unicode (multibyte character) compatible under MySQL.
In testing with our dataset, I found that searching for the records we needed to update with MySQL's LEFT() and RIGHT() substring functions performed better than a LIKE pattern-match query. Additionally, MySQL's LENGTH() function returns the number of bytes in a string rather than the number of characters. The distinction is important for string fields that may contain multibyte character sequences, because MySQL's substring functions count characters, not bytes. Thus LENGTH() did not work in our case, where many of the strings under test contained multibyte characters. These requirements resulted in an UPDATE query of the form presented below:
UPDATE media
SET accounts_used = LEFT(accounts_used, CHAR_LENGTH(accounts_used) - 1)
WHERE RIGHT(accounts_used, 1) = ',';
The query selects records in the media table whose accounts_used column ends with a comma (the WHERE RIGHT(accounts_used, 1) = ',' clause does the filtering; RIGHT() returns a substring of the specified length taken from the right end of the string/column), and then uses LEFT(accounts_used, CHAR_LENGTH(accounts_used) - 1) to trim the last character from the accounts_used value (LEFT() returns a substring of the specified length taken from the left end of the string/column).
Here the use of the multibyte-aware CHAR_LENGTH() function, rather than the basic LENGTH() function, was important in our case because of the countless records in our dataset that contained multibyte characters. If you are only dealing with ASCII or another single-byte character set, then LENGTH() would work perfectly; in that case CHAR_LENGTH() and LENGTH() return the same count and could even be used interchangeably. When dealing with data that could contain multibyte characters, or if in doubt, use CHAR_LENGTH() instead, as it returns an accurate character count in either case.
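The byte/character mismatch can be reproduced in a few lines of Python. The helpers below are hypothetical stand-ins that only mimic MySQL's behaviour on a utf8mb4 column (LENGTH() counts UTF-8 bytes, CHAR_LENGTH() and LEFT() count characters), and the sample value is invented.

```python
def left(s: str, n: int) -> str:
    """Mimic MySQL LEFT(): the first n *characters* of s."""
    return s[:max(n, 0)]


def byte_length(s: str) -> int:
    """Mimic MySQL LENGTH() on a utf8mb4 column: size in bytes."""
    return len(s.encode("utf-8"))


def char_length(s: str) -> int:
    """Mimic MySQL CHAR_LENGTH(): number of characters."""
    return len(s)


value = "Müller, Zoë,"  # hypothetical entry with two 2-byte characters

# 14 bytes but only 12 characters, so LENGTH() - 1 asks LEFT() for 13
# characters and the trailing comma survives.
print(left(value, byte_length(value) - 1))  # Müller, Zoë,
# CHAR_LENGTH() - 1 asks for 11 characters and removes the comma.
print(left(value, char_length(value) - 1))  # Müller, Zoë
```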
Please note that the column and field names used in the example query above match those noted in the original question, and should be modified as needed to suit your own dataset needs.
I have two databases, both containing phone numbers. I need to find all instances of duplicate phone numbers, but the formats of database 1 vary wildly from the format of database 2.
I'd like to strip out all non-digit characters and just compare the two 10-digit strings to determine if it's a duplicate, something like:
SELECT b.phone as barPhone, sp.phone as SPPhone FROM bars b JOIN single_platform_bars sp ON sp.phone.REGEX = b.phone.REGEX
Is such a thing even possible in a mysql query? If so, how do I go about accomplishing this?
EDIT: Looks like it is, in fact, a thing you can do! Hooray! The following query returned exactly what I needed:
SELECT b.phone, b.id, sp.phone, sp.id
FROM bars b
JOIN single_platform_bars sp
  ON REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(b.phone,' ',''),'-',''),'(',''),')',''),'.','')
   = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(sp.phone,' ',''),'-',''),'(',''),')',''),'.','')
Before version 8.0, MySQL doesn't support returning the "match" of a regular expression: the REGEXP operator returns 1 or 0, depending on whether an expression matches the regular expression or not. (MySQL 8.0 added REGEXP_REPLACE(), so on 8.0 and later REGEXP_REPLACE(sp.phone, '[^0-9]', '') strips the non-digits directly.)
You can use the REPLACE function to replace a specific character, and you can nest those calls, but that would be unwieldy for all "non-digit" characters. For example, to remove spaces, dashes, and opening and closing parentheses:
REPLACE(REPLACE(REPLACE(REPLACE(sp.phone,' ',''),'-',''),'(',''),')','')
One approach is to create a user-defined function that returns just the digits from a string. But if you don't want to create a user-defined function...
This can be done in native MySQL. This approach is a bit unwieldy, but it is workable for strings of "reasonable" length.
SELECT CONCAT(IF(SUBSTR(sp.phone,1,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,1,1),'')
,IF(SUBSTR(sp.phone,2,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,2,1),'')
,IF(SUBSTR(sp.phone,3,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,3,1),'')
,IF(SUBSTR(sp.phone,4,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,4,1),'')
,IF(SUBSTR(sp.phone,5,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,5,1),'')
) AS phone_digits
FROM single_platform_bars sp
To unpack that a bit... we extract a single character from the first position in the string, check if it's a digit, if it is a digit, we return the character, otherwise we return an empty string. We repeat this for the second, third, etc. characters in the string. We concatenate all of the returned characters and empty strings back into a single string.
Obviously, the expression above checks only the first five characters of the string; you would need to extend it, basically adding a line for each position you want to check...
And unwieldy expressions like this can be included in a predicate (in a WHERE clause). (I've just shown it in the SELECT list for convenience.)
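Since extending the expression by hand is error-prone, a small Python helper (the name build_digits_expr is made up) can generate it for any number of positions. It only builds the SQL text in the same CONCAT(IF(SUBSTR(...)...)) shape as above; for example, 14 positions covers a value formatted like (555) 123-4567.

```python
def build_digits_expr(column: str, positions: int) -> str:
    """Build the CONCAT(IF(SUBSTR(...)...)) expression for the first
    `positions` characters of `column`, one IF(...) per position."""
    checks = [
        f"IF(SUBSTR({column},{i},1) REGEXP '^[0-9]$',SUBSTR({column},{i},1),'')"
        for i in range(1, positions + 1)
    ]
    return "CONCAT(" + "\n      ,".join(checks) + "\n      )"


print(build_digits_expr("sp.phone", 3))
```

The generated text can then be pasted into the SELECT list or into a WHERE/ON predicate.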
MySQL versions before 8.0 don't support such string operations natively. You will either need to use a UDF, or else create a stored function that iterates over a string parameter, appending to its return value every digit that it encounters.
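As a sketch of the user-defined-function route, the Python script below uses SQLite rather than MySQL: sqlite3's Connection.create_function() registers a Python function (the name digits_only is made up) that keeps only the ASCII digits, and the join from the question then compares the normalized numbers. In MySQL the equivalent would be a loadable UDF or a stored function; the phone numbers here are invented sample data.

```python
import re
import sqlite3


def digits_only(value):
    """Return only the ASCII digits 0-9 from value; NULL (None) stays NULL."""
    if value is None:
        return None
    return re.sub(r"[^0-9]", "", value)


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call digits_only(text).
conn.create_function("digits_only", 1, digits_only)

conn.execute("CREATE TABLE bars (id INTEGER, phone TEXT)")
conn.execute("CREATE TABLE single_platform_bars (id INTEGER, phone TEXT)")
# Made-up sample numbers in different formats.
conn.executemany("INSERT INTO bars VALUES (?, ?)",
                 [(1, "(555) 123-4567"), (2, "555.987.6543"), (3, "555 000 1111")])
conn.executemany("INSERT INTO single_platform_bars VALUES (?, ?)",
                 [(10, "555-123-4567"), (11, "5559876543"), (12, "555-222-3333")])

matches = conn.execute("""
    SELECT b.id, sp.id
    FROM bars b
    JOIN single_platform_bars sp
      ON digits_only(b.phone) = digits_only(sp.phone)
    ORDER BY b.id
""").fetchall()
print(matches)  # [(1, 10), (2, 11)]
```

Calling a function on both sides of the join prevents index use, so on large tables it may be worth storing the normalized digits in their own column.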
I have a table "locales" with a column named "name". The values in name always begin with a number of characters followed by an underscore (e.g. "foo_", "bar_"...). A value can have more than one underscore, and the pattern before the underscore may be repeated (e.g. "foo_bar_", "foo_foo_").
How, with a simple query, can I get rid of everything before the first underscore including the first underscore itself?
I know how to do this in PHP, but I cannot understand how to do it in MySQL.
SELECT LOCATE('_', 'foo_bar_') ... will give you the location of the first underscore and SUBSTR('foo_bar_', LOCATE('_', 'foo_bar_')) will give you the substring starting from the first underscore. If you want to get rid of that one, too, increment the locate-value by one.
If you now want to replace the values in the table itself, you can do this with an update statement like UPDATE locales SET name = SUBSTR(name, LOCATE('_', name) + 1).
select substring('foo_bar_text' from locate('_','foo_bar_text') + 1)
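Before MySQL 8.0 (which added REGEXP_REPLACE()), MySQL regular expressions can only match data; they can't do replacements. You'd need to do the replacing client-side in your PHP script, or use standard string operations in MySQL to do the changes.
UPDATE sometable SET somefield = RIGHT(somefield, CHAR_LENGTH(somefield) - LOCATE('_', somefield));
CHAR_LENGTH(somefield) - LOCATE('_', somefield) is exactly the number of characters after the first underscore, so RIGHT() keeps everything after it. That's the basic way of going about it.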
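For the client-side variant, here is a Python sketch (the function name is made up) that mirrors SUBSTR(name, LOCATE('_', name) + 1) step by step: str.find() is 0-based and returns -1 when the underscore is missing, so adding 1 gives the same value MySQL's 1-based LOCATE() returns, and that value is also the right 0-based slice start.

```python
def strip_through_first_underscore(name: str) -> str:
    """Drop everything up to and including the first '_'.

    Mirrors SUBSTR(name, LOCATE('_', name) + 1): LOCATE is 1-based and
    returns 0 when '_' is absent, in which case the whole string is kept.
    """
    locate = name.find("_") + 1  # same value MySQL's LOCATE('_', name) returns
    return name[locate:]         # SUBSTR(name, locate + 1) in 0-based slicing


for sample in ["foo_", "foo_bar_", "foo_foo_", "nounderscore"]:
    print(repr(sample), "->", repr(strip_through_first_underscore(sample)))
```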