Generate unique username from first and last name? - mysql

I've got a bunch of users in my database and I want to reset all their usernames to the first letter of their first name, plus their full last name. As you can imagine, there are some dupes. In this scenario, I'd like to add a "2" or "3" or something to the end of the username. How would I write a query to generate a unique username like this?
UPDATE user
SET username=lower(concat(substring(first_name,1,1), last_name, UNIQUETHINGHERE))

CREATE TABLE bar LIKE foo;
INSERT INTO bar (id,user,first,last)
(SELECT f.id,CONCAT(SUBSTRING(f.first,1,1),f.last,
(SELECT COUNT(*) FROM foo f2
WHERE SUBSTRING(f2.first,1,1) = SUBSTRING(f.first,1,1)
AND f2.last = f.last AND f2.id <= f.id
)),f.first,f.last from foo f);
DROP TABLE foo;
RENAME TABLE bar TO foo;
This relies on a primary key id. For each record inserted into bar, we only count duplicates in foo whose id is less than or equal to that record's id. As a result, the first occurrence of a name gets 1, the second gets 2, and so on.
Given foo:
select * from foo;
+----+------+--------+--------+
| id | user | first | last |
+----+------+--------+--------+
| 1 | aaa | Roger | Hill |
| 2 | bbb | Sally | Road |
| 3 | ccc | Fred | Mount |
| 4 | ddd | Darren | Meadow |
| 5 | eee | Sharon | Road |
+----+------+--------+--------+
The above INSERTs into bar, resulting in:
select * from bar;
+----+----------+--------+--------+
| id | user | first | last |
+----+----------+--------+--------+
| 1 | RHill1 | Roger | Hill |
| 2 | SRoad1 | Sally | Road |
| 3 | FMount1 | Fred | Mount |
| 4 | DMeadow1 | Darren | Meadow |
| 5 | SRoad2 | Sharon | Road |
+----+----------+--------+--------+
To drop the "1" from the end of user names, so that only the second and later duplicates get a numeric suffix, use this INSERT instead:
INSERT INTO bar (id,user,first,last)
(SELECT f3.id,
CONCAT(
SUBSTRING(f3.first,1,1),
f3.last,
CASE f3.cnt WHEN 1 THEN '' ELSE f3.cnt END),
f3.first,
f3.last
FROM (
SELECT
f.id,
f.first,
f.last,
(
SELECT COUNT(*)
FROM foo f2
WHERE SUBSTRING(f2.first,1,1) = SUBSTRING(f.first,1,1)
AND f2.last = f.last AND f2.id <= f.id
) as cnt
FROM foo f) f3)
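You can try the correlated-count idea without a MySQL server. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. One assumption to flag: SQLite writes CONCAT as || and SUBSTRING as substr, but the query shape matches the second INSERT above. The sketch only computes the new usernames instead of copying rows into bar. `generate_usernames` is a hypothetical helper name.

```python
import sqlite3


def generate_usernames(rows):
    """Return {id: username} for (id, first, last) rows.

    Mirrors the correlated-count answer: count rows with the same first
    initial and last name whose id is <= the current id. The first
    occurrence gets no suffix; later ones get 2, 3, ...
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, first TEXT, last TEXT)")
    conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", rows)
    query = """
        SELECT f3.id,
               substr(f3.first, 1, 1) || f3.last ||
               CASE f3.cnt WHEN 1 THEN '' ELSE f3.cnt END
        FROM (
            SELECT f.id, f.first, f.last,
                   (SELECT COUNT(*) FROM foo f2
                    WHERE substr(f2.first, 1, 1) = substr(f.first, 1, 1)
                      AND f2.last = f.last
                      AND f2.id <= f.id) AS cnt
            FROM foo f
        ) f3
    """
    usernames = dict(conn.execute(query).fetchall())
    conn.close()
    return usernames


rows = [(1, "Roger", "Hill"), (2, "Sally", "Road"), (3, "Fred", "Mount"),
        (4, "Darren", "Meadow"), (5, "Sharon", "Road")]
print(generate_usernames(rows))
```

On the sample data from the answer, only Sharon Road receives a suffix ("SRoad2"), because Sally Road has the lower id.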

As a two-parter:
SELECT max(username)
FROM user
WHERE username LIKE concat(lower(concat(substring(first_name,1,1),last_name)), '%')
to retrieve the "highest" username for that name combo. Extract the numeric suffix, increment it, then insert back into the database for your new user.
This is racy, of course. Two users with the same first/last names might stomp on each other's usernames, depending on how things work out. You'd definitely want to sprinkle some transaction/locking onto the queries to make sure you don't have any users conflicting.
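Here is a Python sketch of the "extract the numeric suffix, increment it" step. `next_username` is a hypothetical helper and is not part of the answer. It compares suffixes as integers because MAX() on a string column compares lexicographically, so 'jsmith9' sorts above 'jsmith10'. It also skips names like 'jsmithson', which the LIKE 'jsmith%' pattern would match. It does nothing about the race described above; it only fixes the arithmetic.

```python
import re


def next_username(base, existing):
    """Return the next free username for `base` (hypothetical helper).

    `existing` is the list of usernames a query like
    WHERE username LIKE 'base%' would return. Only names that are exactly
    `base`, or `base` followed by digits, are counted.
    """
    pattern = re.compile(re.escape(base) + r"(\d*)")
    highest = 0  # 0 means `base` itself is still free
    for name in existing:
        match = pattern.fullmatch(name)
        if match is None:
            continue  # e.g. 'jsmithson' shares the prefix but is a different name
        suffix = match.group(1)
        # A bare `base` counts as occurrence number 1.
        highest = max(highest, int(suffix) if suffix else 1)
    return base if highest == 0 else f"{base}{highest + 1}"


print(next_username("jsmith", ["jsmith", "jsmith9", "jsmith10", "jsmithson"]))
```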

Never mind... I just found the dupes:
select LOWER(CONCAT(SUBSTRING(first_name,1,1),last_name)) as new_login, count(*) as cnt from wx_user group by new_login having count(*) > 1;
Then I set those manually. There were only a handful.

Inspired by unutbu's answer: there is no need to create an extra table or to run several queries:
UPDATE USER a
LEFT JOIN (
SELECT USR_ID,
REPLACE(
CONCAT(
SUBSTRING(f.`USR_FIRSTNAME`,1,1),
f.`USR_LASTNAME`,
(
(SELECT IF(COUNT(*) > 1, COUNT(*), '')
FROM USER f2
WHERE SUBSTRING(f2.`USR_FIRSTNAME`,1,1) =
SUBSTRING(f.`USR_FIRSTNAME`,1,1)
AND f2.`USR_LASTNAME` = f.`USR_LASTNAME`
AND f2.`USR_ID` <= f.`USR_ID`)
)
),
' ',
'') as login
FROM USER f) b
ON a.USR_ID = b.USR_ID
SET a.USR_NICKNAME = b.login


Removing duplicates based on one column, keeping the row that has a value in a different column or, if there isn't one, the row with the lowest ID

Using MySQL 5.7 on Google Cloud, I'm trying to deduplicate MySQL data based on an "EmailAddress" column. Some of the rows have a value in the "FullName" column and some of them don't. I want to keep the ones that have a value in the FullName column. If none of the rows with that EmailAddress value have a FullName value, I just want to keep the duplicate with the lowest ID number (the first column, which is the primary key).
I've finally broken it down into two separate queries, one to first remove the rows with no value in the FullName column IF there's another duplicate row that does have a value in the FullName column:
DELETE
FROM customer_info
WHERE id IN
(
SELECT *
FROM
(
SELECT c1.id
FROM customer_info c1
INNER JOIN customer_info c2 on c1.EmailAddress=c2.EmailAddress and c1.id!=c2.id
WHERE
(trim(c1.FullName)='' or c1.FullName is NULL)
and c2.FullName is not NULL
and length(trim(c2.FullName))!=0
) t
)
and another query to remove the rows with the bigger IDs where no value was found in the FullName column:
DELETE
FROM customer_info
WHERE id IN
(
SELECT *
FROM
(
SELECT c1.id
FROM customer_info c1
INNER JOIN customer_info c2 on c1.EmailAddress=c2.EmailAddress and c1.id>c2.id
) t
)
This "works", but not really. It worked one time when I left it running overnight for a smaller segment of the data, and when I woke up there was an error, but I looked at the data and it was complete.
Am I missing something in my query that makes it highly inefficient? Or is this just par for the course for this type of query, with no optimization possible in my code that would make a tangible improvement? I've maxed out a Google Cloud SQL instance to their db-n1-highmem-32 size, with 32 GB of memory and 1000 GB of storage space. It still chokes and returns a 2013 error (Lost connection to MySQL server during query) after running for an hour. I need to do this for a little over 3 million rows in total.
For example, this:
id | FullName    | EmailAddress             |
---+-------------+--------------------------+
1  | John Doe    | john.doe@email.com       |
2  | null        | janedoe@box.com          |
3  | null        | billybob@bobby.com       |
4  | null        | john.doe@email.com       |
5  | John Lennon | jlennon@yoohoo.com       |
6  | null        | james.smith@coolmail.com |
7  | null        | billybob@bobby.com       |
8  | Jane Doe    | janedoe@box.com          |
would result in this:
id | FullName    | EmailAddress             |
---+-------------+--------------------------+
1  | John Doe    | john.doe@email.com       |
3  | null        | billybob@bobby.com       |
5  | John Lennon | jlennon@yoohoo.com       |
6  | null        | james.smith@coolmail.com |
8  | Jane Doe    | janedoe@box.com          |
Using EXISTS might be simpler in this situation:
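-- Note: MySQL may reject a DELETE whose subquery reads the same table
-- (error 1093); the question's derived-table wrapper is the usual workaround.
delete
from customer_info c
where (trim(c.FullName)='' or c.FullName is null)
and exists (
select 1
from customer_info i
where i.EmailAddress = c.EmailAddress
and trim(i.FullName)>''
);
delete
from customer_info c
where exists (
select 1
from customer_info i
where i.EmailAddress = c.EmailAddress
and i.id < c.id
);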
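Below is a minimal sketch of the two EXISTS deletes, run against the sample data with Python's sqlite3 module as a stand-in for MySQL. One assumption to flag: SQLite accepts a DELETE whose subquery reads the same table, which MySQL may not, so the sketch checks the logic rather than MySQL's restrictions. `dedupe` is a hypothetical helper name.

```python
import sqlite3


def dedupe(rows):
    """Apply the two EXISTS-based deletes and return the surviving rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_info "
                 "(id INTEGER PRIMARY KEY, FullName TEXT, EmailAddress TEXT)")
    conn.executemany("INSERT INTO customer_info VALUES (?, ?, ?)", rows)
    # Step 1: drop blank-name rows when a named row shares the same email.
    conn.execute("""
        DELETE FROM customer_info
        WHERE (trim(FullName) = '' OR FullName IS NULL)
          AND EXISTS (SELECT 1 FROM customer_info i
                      WHERE i.EmailAddress = customer_info.EmailAddress
                        AND trim(i.FullName) > '')
    """)
    # Step 2: among the remaining duplicates, keep only the lowest id.
    conn.execute("""
        DELETE FROM customer_info
        WHERE EXISTS (SELECT 1 FROM customer_info i
                      WHERE i.EmailAddress = customer_info.EmailAddress
                        AND i.id < customer_info.id)
    """)
    survivors = conn.execute(
        "SELECT id, FullName, EmailAddress FROM customer_info ORDER BY id"
    ).fetchall()
    conn.close()
    return survivors


rows = [
    (1, "John Doe", "john.doe@email.com"),
    (2, None, "janedoe@box.com"),
    (3, None, "billybob@bobby.com"),
    (4, None, "john.doe@email.com"),
    (5, "John Lennon", "jlennon@yoohoo.com"),
    (6, None, "james.smith@coolmail.com"),
    (7, None, "billybob@bobby.com"),
    (8, "Jane Doe", "janedoe@box.com"),
]
for row in dedupe(rows):
    print(row)
```

On the real 3-million-row table, an index on EmailAddress is probably what matters most, since every row's EXISTS subquery looks up other rows by email.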

mysql - find the number of occurrences from two columns

Example data:
source | target
apple | dog
dog | cat
door | cat
dog | apple
cat | dog
Expected result:
apple dog 2
dog cat 2
door cat 1
Here is my question, using the example:
I am trying to count how often apple and dog occur together in source and target. The count is 2: apple dog and dog apple.
In the same way, dog cat and cat dog occur 2 times.
How can I do this with MySQL?
The real data will be very large; this is just a simple example.
Assuming Source and Target are joined with an ID I would do this as:
SELECT
FirstValue,
SecondValue,
COUNT(*) As MyCount
FROM
(SELECT
SourceTable.Value FirstValue,
TargetTable.Value SecondValue
FROM
SourceTable
INNER JOIN TargetTable ON SourceTable.IDValue = TargetTable.IDValue
UNION ALL
SELECT
TargetTable.Value FirstValue,
SourceTable.Value SecondValue
FROM
TargetTable
INNER JOIN SourceTable ON TargetTable.IDValue = SourceTable.IDValue) AS Pairs
GROUP BY
FirstValue,
SecondValue
Reading the question again, I'm unsure whether these are two columns in the same table. If they are, the query can be simplified to:
SELECT
FirstValue,
SecondValue,
COUNT(*) As MyCount
FROM
(SELECT
SourceColumn FirstValue,
TargetColumn SecondValue
FROM
MyTable
UNION ALL
SELECT
TargetColumn FirstValue,
SourceColumn SecondValue
FROM
MyTable) AS Pairs
GROUP BY
FirstValue,
SecondValue
As I see it, your issue is counting values independently of their order in your columns, so the pair <'foo', 'bar'> should be counted together with <'bar', 'foo'>. For that you may use:
SELECT
*,
COUNT(*)
FROM
test
GROUP BY
LEAST(source, target),
GREATEST(source, target)
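Note that:
Mixing non-grouped columns with a group function works only in MySQL, and only when the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default as of MySQL 5.7.5). It's an extension, so the server is free to choose the values from any row in the group.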
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(source VARCHAR(12) NOT NULL
,target VARCHAR(12) NOT NULL
,PRIMARY KEY(source,target)
);
INSERT INTO my_table VALUES
('apple','dog'),
('dog','cat'),
('door','cat'),
('dog','apple'),
('cat','dog');
SELECT * FROM my_table;
+--------+--------+
| source | target |
+--------+--------+
| apple | dog |
| cat | dog |
| dog | apple |
| dog | cat |
| door | cat |
+--------+--------+
SELECT GREATEST(source,target),LEAST(source,target),COUNT(*) FROM my_table GROUP BY GREATEST(source,target),LEAST(source,target);
+-------------------------+----------------------+----------+
| GREATEST(source,target) | LEAST(source,target) | COUNT(*) |
+-------------------------+----------------------+----------+
| dog | apple | 2 |
| dog | cat | 2 |
| door | cat | 1 |
+-------------------------+----------------------+----------+
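The normalisation trick can be checked against the example data with a small sketch that uses Python's sqlite3 module as a stand-in. SQLite has no LEAST or GREATEST, so this sketch relies instead on its multi-argument scalar min() and max(), which return the smaller and larger of their arguments. `count_unordered_pairs` is a hypothetical name.

```python
import sqlite3


def count_unordered_pairs(pairs):
    """Count (source, target) pairs regardless of their order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (source TEXT NOT NULL, target TEXT NOT NULL)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", pairs)
    # min()/max() with two arguments act like LEAST()/GREATEST() here,
    # so ('dog', 'apple') and ('apple', 'dog') land in the same group.
    rows = conn.execute("""
        SELECT min(source, target) AS a, max(source, target) AS b, COUNT(*)
        FROM my_table
        GROUP BY min(source, target), max(source, target)
        ORDER BY a, b
    """).fetchall()
    conn.close()
    return rows


pairs = [("apple", "dog"), ("dog", "cat"), ("door", "cat"),
         ("dog", "apple"), ("cat", "dog")]
print(count_unordered_pairs(pairs))
```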

How to query a table (which has multiple rows pertaining to a single entity) and return GROUPED result but only where all conditionals have been met?

Firstly, pardon the incredibly vague/long question; I'm really not sure how to summarise my query without the full explanation.
OK, I have a single MySQL table with the following format:
some_table: user_id | some_key | some_value
If you imagine that, for each user, there are multiple rows, for example:
1 | skill | html
1 | skill | php
1 | foo | bar
2 | skill | html
3 | skill | php
4 | foo | bar
If I want to find all the users who have listed HTML as a skill I can simply do:
SELECT user_id
FROM some_table
WHERE some_key = 'skill' AND some_value='html'
GROUP BY user_id
Easy enough. This would give me user IDs 1 and 2.
If I want to find all users who have listed HTML or PHP as a skill then I can do:
SELECT user_id
FROM some_table
WHERE (some_key = 'skill' AND some_value='html') OR (some_key = 'skill' AND some_value='php')
GROUP BY user_id
This would give me user IDs 1, 2 and 3.
Now, what I'm struggling to work out is how I can query the same table but this time say "give me all the users who have listed both HTML and PHP as a skill", i.e: just user ID 1.
Any advice, guidance or links to docs massively appreciated.
Thanks.
Here's one way:
SELECT DISTINCT user_id
FROM some_table
WHERE user_id IN (SELECT user_id FROM some_table where (some_key = 'skill' AND some_value='html'))
AND user_id IN (SELECT user_id FROM some_table where (some_key = 'skill' AND some_value='php'))
you need to use a nested query (or a self join, which is different)
I set up the following table.
+-------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| type | char(10) | YES | | NULL | |
| value | char(10) | YES | | NULL | |
+-------+----------+------+-----+---------+-------+
inserted the following values
+------+-------+-------+
| id | type | value |
+------+-------+-------+
| 1 | skill | html |
| 1 | skill | php |
| 2 | skill | html |
| 3 | skill | php |
| 2 | skill | php |
+------+-------+-------+
ran this query
select id
from test
where type = 'skill'
and value = 'html'
and id in (
select id
from test
where type = 'skill'
and value = 'php');
and got
+------+
| id |
+------+
| 1 |
| 2 |
+------+
a self join would be as follows
select e1.id
from test e1, test e2
where e1.id = e2.id
and e2.type = 'skill'
and e2.value = 'html'
and e1.type = 'skill'
and e1.value = 'php'
;
and produces the same result.
so there you have two ways to try it in your code.
I don't know if this is valid for MySQL, but it should be (it works for other DB engines):
SELECT php.user_id
FROM some_table php, some_table html
WHERE php.user_id = html.user_id
AND php.some_key = 'skill'
AND html.some_key = 'skill'
AND php.some_value = 'php'
AND html.some_value = 'html';
An alternative, using a HAVING clause:
SELECT user_id, count(*)
FROM some_table
WHERE some_key = 'skill'
AND some_value in ('php','html')
GROUP BY user_id
HAVING count(DISTINCT some_value) = 2;
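The HAVING approach is a form of relational division. Below is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. It counts DISTINCT values so that a user who lists the same skill twice is not mistaken for one who has both. `users_with_all_skills` is a hypothetical helper.

```python
import sqlite3


def users_with_all_skills(rows, skills):
    """Return the user_ids that list every skill in `skills`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE some_table (user_id INTEGER, some_key TEXT, some_value TEXT)")
    conn.executemany("INSERT INTO some_table VALUES (?, ?, ?)", rows)
    wanted = sorted(set(skills))
    placeholders = ", ".join("?" for _ in wanted)
    # Keep users whose number of distinct matching skills equals the
    # number of skills asked for.
    query = f"""
        SELECT user_id
        FROM some_table
        WHERE some_key = 'skill' AND some_value IN ({placeholders})
        GROUP BY user_id
        HAVING COUNT(DISTINCT some_value) = ?
        ORDER BY user_id
    """
    result = [r[0] for r in conn.execute(query, [*wanted, len(wanted)])]
    conn.close()
    return result


rows = [(1, "skill", "html"), (1, "skill", "php"), (1, "foo", "bar"),
        (2, "skill", "html"), (3, "skill", "php"), (4, "foo", "bar")]
print(users_with_all_skills(rows, ["html", "php"]))
```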
A third option is to use nested inner selects, a slight variation on David's approach:
SELECT user_id FROM some_table
WHERE
some_key = 'skill' AND
some_value = 'html' AND
user_id IN (
SELECT user_id FROM some_table
WHERE
some_key = 'skill' AND
some_value = 'php' AND
user_id IN (
SELECT user_id FROM some_table
WHERE
some_key = 'skill' AND
some_value = 'js' -- AND user_id IN ... for next level, etc.
)
);
The idea is that you can "pipe" the inner selects: for each additional property, add a new inner select inside the innermost one.

SELECT N rows before and after the row matching the condition?

The behaviour I want to replicate is like grep with the -A and -B flags.
For example, grep -A 2 -B 2 "hello" myfile.txt gives me all the lines that contain "hello", plus the 2 lines before and the 2 lines after each of them.
Let's assume this table:
+--------+-------------------------+
| id | message |
+--------+-------------------------+
| 1 | One tow three |
| 2 | No error in this |
| 3 | My testing message |
| 4 | php module test |
| 5 | hello world |
| 6 | team spirit |
| 7 | puzzle game |
| 8 | social game |
| 9 | stackoverflow |
|10 | stackexchange |
+--------+-------------------------+
Now a query like:
Select * from theTable where message like '%hello%'
will result in:
5 | hello world
How can I put another parameter "N" which selects N rows before, and N rows after the matched record i.e. for N = 2, the result should be :
| 3 | My testing message |
| 4 | php module test |
| 5 | hello world |
| 6 | team spirit |
| 7 | puzzle game |
For simplicity, assume LIKE '%TERM%' matches only 1 row.
The result should be sorted on the auto-increment id field.
Right, this works for me:
SELECT child.*
FROM stack as child,
(SELECT idstack FROM stack WHERE message LIKE '%hello%') as parent
WHERE child.idstack BETWEEN parent.idstack-2 AND parent.idstack+2;
I don't know if this is entirely valid MySQL, but how about:
-- the matching row plus the 2 rows before it
(SELECT t.*
FROM theTable t
INNER JOIN (
SELECT id FROM theTable where message like '%hello%'
) m ON t.id <= m.id
ORDER BY
t.id DESC
LIMIT 3)
UNION ALL
-- the 2 rows after the matching row
(SELECT t.*
FROM theTable t
INNER JOIN (
SELECT id FROM theTable where message like '%hello%'
) m ON t.id > m.id
ORDER BY
t.id
LIMIT 2)
ORDER BY id
Try this simple one:
CREATE TABLE messages(
id INT(11) DEFAULT NULL,
message VARCHAR(255) DEFAULT NULL
);
INSERT INTO messages VALUES
(1, 'One tow three'),
(2, 'No error in this'),
(3, 'My testing message'),
(4, 'php module test'),
(5, 'hello world'),
(6, 'team spirit'),
(7, 'puzzle game'),
(8, 'social game'),
(9, 'stackoverflow'),
(10, 'stackexchange');
SET @text = 'hello world';
SELECT id, message FROM (
SELECT m.*, @n1:=@n1 + 1 num, @n2:=IF(message = @text, @n1, @n2) pos
FROM messages m, (SELECT @n1:=0, @n2:=0) n ORDER BY m.id
) t
WHERE @n2 >= num - 2 AND @n2 <= num + 2;
+------+--------------------+
| id | message |
+------+--------------------+
| 3 | My testing message |
| 4 | php module test |
| 5 | hello world |
| 6 | team spirit |
| 7 | puzzle game |
+------+--------------------+
N can be specified as a user variable; currently it is 2.
This query works with row numbers, and this guarantees that the nearest records will be returned.
Try (shown for N = 2):
Select * from theTable
Where id >=
(Select id - 2 from theTable where message like '%hello%')
Order by id
Limit 5 -- 2 * N + 1; MySQL's LIMIT takes constants, not expressions
(MS SQL Server only)
The most reliable way is to use the ROW_NUMBER function, so that gaps in the id don't matter. This also works if there are multiple occurrences of the search term, and it properly returns the two rows above and below each match.
WITH
srt AS (
SELECT ROW_NUMBER() OVER (ORDER BY id) AS int_row, [id]
FROM theTable
),
result AS (
SELECT int_row - 2 AS int_bottom, int_row + 2 AS int_top
FROM theTable
INNER JOIN srt
ON theTable.id = srt.id
WHERE ([message] like '%hello%')
)
SELECT theTable.[id], theTable.[message]
FROM theTable
INNER JOIN srt
ON theTable.id = srt.id
INNER JOIN result
ON srt.int_row >= result.int_bottom
AND srt.int_row <= result.int_top
ORDER BY srt.int_row
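You can also try the row-number idea without a server. Below is a minimal sketch using Python's sqlite3 module as a stand-in. Instead of ROW_NUMBER(), it computes each row's position with a correlated COUNT. That is quadratic, but it runs even on SQLite builds without window functions; MySQL 8.0 and SQL Server both support ROW_NUMBER() OVER directly. The sample data leaves out id 4 on purpose, to show that gaps in id do not matter. `context_rows` is a hypothetical name.

```python
import sqlite3


def context_rows(messages, pattern, n):
    """Like grep -B n -A n: return rows within n positions of any match.

    Positions are row numbers in id order, computed with a correlated
    COUNT as a stand-in for ROW_NUMBER(), so gaps in id are harmless.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE theTable (id INTEGER PRIMARY KEY, message TEXT)")
    conn.executemany("INSERT INTO theTable VALUES (?, ?)", messages)
    query = """
        WITH ranked AS (
            SELECT t.id, t.message,
                   (SELECT COUNT(*) FROM theTable t2 WHERE t2.id <= t.id) AS rn
            FROM theTable t
        )
        SELECT DISTINCT r.id, r.message      -- DISTINCT merges overlapping windows
        FROM ranked r
        JOIN ranked hit ON hit.message LIKE ?
        WHERE r.rn BETWEEN hit.rn - ? AND hit.rn + ?
        ORDER BY r.id
    """
    rows = conn.execute(query, (pattern, n, n)).fetchall()
    conn.close()
    return rows


msgs = [(1, "One tow three"), (2, "No error in this"), (3, "My testing message"),
        (5, "hello world"), (6, "team spirit"), (7, "puzzle game"),
        (8, "social game"), (9, "stackoverflow"), (10, "stackexchange")]
for row in context_rows(msgs, "%hello%", 2):
    print(row)
```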
Adding an answer that uses a date instead of an id.
The use case here is an on-call rotation table with one record per week.
Because of edits, the ids might be out of order for the intended purpose.
Any use case with several records per week, per date, or similar will of course need adjusting.
+----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| startdate| datetime | NO | | NULL | |
| person | int(11) | YES | MUL | NULL | |
+----------+--------------+------+-----+---------+----------------+
The query:
SELECT child.*
FROM `rota-table` as child,
(SELECT startdate
FROM `rota-table`
WHERE YEARWEEK(startdate, 3) = YEARWEEK(now(), 3) ) as parent
WHERE
YEARWEEK(child.startdate, 3) >= YEARWEEK(NOW() - INTERVAL 25 WEEK, 3)
AND YEARWEEK(child.startdate, 3) <= YEARWEEK(NOW() + INTERVAL 25 WEEK, 3)
ORDER BY child.startdate

MySQL Select all differences between 2 tables?

I have 3 tables, 'old', 'new' and 'result' (from a phonebook database). They have the same structure and nearly the same entries.
old:
ID | name | number | email | ...
----+--------------------+--------+-------+-----
1 | foo | 123 | ...
2 | bar | 456 |
3 | entrry with typo | 012345 |
4 | John Doe | 123345 |
new:
ID | name | number | email | ...
----+--------------------+--------+-------+-----
1 | foo | 123 | ...
2 | bar | 456 |
3 | entry without typo | 012345 |
4 | John Doe | 12345 |
5 | newly added entry | 09876 |
From this 'new' table I would like to select all rows that are different from the 'old' table, so the result would be:
result:
ID | name | number | email | ...
----+--------------------+--------+-------+-----
3 | entry without typo | 012345 | ...
4 | John Doe | 12345 |
5 | newly added entry | 09876 |
That is, all entries that have changed data plus all entries that don't appear in the 'old' table.
Just to make it more complicated, there are about 10 columns in those tables (including ID, name, number, email and several flags and other info).
Is there a performant solution for this, or will I have to compare each column with a separate query?
You'll have to check the old records for correctness, but I think this is the most straightforward solution.
Update: I was a little confused by "including all entries that have changed data plus all entries that don't appear in 'old' table", so I added the WHERE clause and modified the join clause.
insert into result (id, name, number, email, ...)
select new.id, new.name, new.number, new.email, ...
from new
LEFT JOIN old
ON new.ID = old.id
WHERE
old.ID is null
OR
( new.name <> old.name
or
new.number <> old.number
or
new.email <> old.email
...)
SELECT new.*
FROM new
JOIN old ON new.id = old.id
WHERE (CONCAT(new.ID,new.name,new.number,etc...) <> CONCAT(old.ID,old.name,old.number,etc...))
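That should pull up any records in the new table where at least one of its fields differs from the equivalent record in the old table. Two caveats: the inner join skips rows that exist only in new, and CONCAT() returns NULL if any argument is NULL, so rows with NULL fields are not compared.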
Assuming the IDs must match up in order to make the comparisons legitimate:
select n.*
from new n
left join old o on o.id = n.id
where o.id is null
or not (
o.name <=> n.name
and o.number <=> n.number
and o.email <=> n.email
and ...)
Note, this solution handles the case where some of the fields can be NULL. MySQL's NULL-safe equality operator <=> treats two NULLs as equal and a NULL versus a non-NULL as different. With (o.name <> n.name) or not (o.name = n.name), the comparison evaluates to NULL whenever either side is NULL, so such changes would be missed.
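Below is a minimal sketch of the LEFT JOIN diff using Python's sqlite3 module as a stand-in for MySQL. SQLite's IS NOT operator is NULL-safe, playing the role of NOT (a <=> b) in MySQL. Row 6 is hypothetical. It is added so that only its email changes, from NULL to a value, which a plain <> comparison would miss. `changed_rows` is a hypothetical name, and the table names are double-quoted to be safe.

```python
import sqlite3


def changed_rows(old_rows, new_rows):
    """Return rows of "new" that are absent from "old" or differ in any column.

    IS NOT is NULL-safe in SQLite: NULL IS NOT 'x' is true and
    NULL IS NOT NULL is false, whereas <> yields NULL for both.
    """
    conn = sqlite3.connect(":memory:")
    for table in ('"old"', '"new"'):
        conn.execute(f"CREATE TABLE {table} "
                     "(id INTEGER PRIMARY KEY, name TEXT, number TEXT, email TEXT)")
    conn.executemany('INSERT INTO "old" VALUES (?, ?, ?, ?)', old_rows)
    conn.executemany('INSERT INTO "new" VALUES (?, ?, ?, ?)', new_rows)
    rows = conn.execute("""
        SELECT n.*
        FROM "new" n
        LEFT JOIN "old" o ON o.id = n.id
        WHERE o.id IS NULL
           OR o.name IS NOT n.name
           OR o.number IS NOT n.number
           OR o.email IS NOT n.email
        ORDER BY n.id
    """).fetchall()
    conn.close()
    return rows


old_rows = [(1, "foo", "123", None), (2, "bar", "456", None),
            (3, "entrry with typo", "012345", None), (4, "John Doe", "123345", None),
            (6, "baz", "789", None)]  # row 6 is hypothetical
new_rows = [(1, "foo", "123", None), (2, "bar", "456", None),
            (3, "entry without typo", "012345", None), (4, "John Doe", "12345", None),
            (5, "newly added entry", "09876", None),
            (6, "baz", "789", "baz@example.com")]  # only the email changed
for row in changed_rows(old_rows, new_rows):
    print(row)
```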