Duplicating rows in sql query - mysql

I have a table containing several fields. The primary key is userId. Currently the user id column contains values '1,2,3,4...etc' like so:
+------+
|userId|
+------+
| 1 |
| 2 |
| 3 |
| 4 |
...etc
I now want to add new rows ending in a,b,c, like so:
+------+
|userId|
+------+
| 1 |
| 1a |
| 1b |
| 1c |
| 2 |
| 2a |
| 2b |
| 2c |
...etc
The new rows should be identical to their parent row, except for the userId. (i.e. 1a,1b & 1c should match 1)
Also I can't guarantee that there won't already be a few 'a', 'b' or 'c's in userid column.
Is there a way to write an sql query to do this quickly and easily?

DON'T DO IT! You will run into more problems than the one you are trying to solve.
Add a new column to store the letter, and make the primary key cover the original UserId and this new column.
If you store the letter inside userId and ever want just the numeric userId, you will need to split the letter portion off, which will be expensive for your queries and a real pain.

I agree with KM. I'm not sure why you're creating these duplicate/composite IDs, but it feels like an uncomfortable direction to take.
That said, there is really only one obstacle to overcome: apparently you can't select from and insert into the same table in MySQL.
So, you need to insert into a Temporary Table first, then insert into the real table...
CREATE TEMPORARY TABLE MyNewUserIDs (
UserID VARCHAR(32)
);
INSERT INTO
MyNewUserIDs
SELECT
CONCAT(myTable.UserID, suffix.val)
FROM
myTable
INNER JOIN
(SELECT 'A' AS val UNION ALL SELECT 'B' UNION ALL SELECT 'C' UNION ALL SELECT 'D') AS suffix
ON RIGHT(myTable.UserID, 1) <> suffix.val
WHERE
NOT EXISTS (SELECT * FROM myTable AS lookup WHERE lookup.UserID = CONCAT(myTable.UserID, suffix.val));
INSERT INTO
myTable (UserID)
SELECT
UserID
FROM
MyNewUserIDs;
Depending on your environment, you may want to look into locking the tables, so that changes are not made between creating the list of IDs and inserting them into your table.

Generating the extra rows is quite simple from a SQL perspective, so I'll do that here.
@KM's answer tells you how to store it as two distinct values, which I've assumed here. Feel free to concatenate userid and suffix if you prefer.
INSERT myTable (userid, suffix, col1, col2, ..., coln)
SELECT M.userid, X.suffix, M.col1, M.col2, ..., M.coln
FROM
myTable M
CROSS JOIN
(SELECT 'a' AS Suffix UNION ALL SELECT 'b' UNION ALL SELECT 'c') X
WHERE
NOT EXISTS (SELECT *
FROM
myTable M2
WHERE
M2.userid = M.userid AND M2.suffix = X.suffix)
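The CROSS JOIN plus NOT EXISTS pattern above can be tried end to end with a small script. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the users table, its name column, and the convention that parent rows carry an empty suffix are all hypothetical choices made for the illustration.

```python
import sqlite3


def add_suffix_rows(conn, suffixes=("a", "b", "c")):
    """Copy every parent row once per suffix, skipping copies that already exist."""
    # Builds "SELECT ? AS suffix UNION ALL SELECT ? AS suffix ..." for the suffix list.
    suffix_rows = " UNION ALL ".join("SELECT ? AS suffix" for _ in suffixes)
    conn.execute(
        f"""
        INSERT INTO users (user_id, suffix, name)
        SELECT m.user_id, x.suffix, m.name
        FROM users AS m
        CROSS JOIN ({suffix_rows}) AS x
        WHERE m.suffix = ''              -- assumption: parent rows have an empty suffix
          AND NOT EXISTS (
              SELECT 1 FROM users AS m2
              WHERE m2.user_id = m.user_id AND m2.suffix = x.suffix
          )
        """,
        tuple(suffixes),
    )


def build_demo_db():
    """Hypothetical table with a composite primary key (user_id, suffix)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " user_id INTEGER NOT NULL,"
        " suffix TEXT NOT NULL DEFAULT '',"
        " name TEXT,"
        " PRIMARY KEY (user_id, suffix))"
    )
    # User 1 already has a 'b' row, mirroring the "can't guarantee" caveat.
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "", "ann"), (1, "b", "ann"), (2, "", "bob")],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    add_suffix_rows(demo)
    for row in demo.execute("SELECT * FROM users ORDER BY user_id, suffix"):
        print(row)
```

Filtering on the empty suffix keeps already-suffixed rows from being expanded again, so running the function a second time should insert nothing.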

Related

Select only those rows that are in multiple result sets

I have multiple SELECT statements that all return the same columns but may return different resultsets. Is there any way to select all rows that are in all resultsets on database level?
E.g.
|---------------------|------------------|---------|
| ID | Name | Age |
|---------------------|------------------|---------|
| 1 | Paul | 50 |
| 2 | Peter | 40 |
| 3 | Frank | 20 |
| 4 | Pascal | 60 |
|---------------------|------------------|---------|
SELECT 1
SELECT name FROM table WHERE age > 40
Result: Paul, Pascal
SELECT 2
SELECT name FROM table where name like 'P%'
Result: Paul, Peter, Pascal
SELECT 3
SELECT name FROM table where id > 3
Result: Pascal
EDIT: This is a very simplified example of my problem. The statements can get very complex (joins over multiple tables), so a simple AND in the WHERE part is not the final solution.
The result should be Pascal. What I am looking for is something like a "reverse UNION".
Alternatively, it would be possible to achieve that programmatically (NodeJS), but I would like to avoid iterating over all the resultsets, because they might be quite huge.
Thanks in advance!
Is there any way to select all rows that are in all resultsets?
You seem to want and:
select name
from table
where age > 40 and name like 'P%' and id > 3
If using AND between the WHERE conditions is not possible, you could use multiple IN expressions on subqueries using your initial queries.
SELECT name
FROM table
WHERE id IN (SELECT id FROM table WHERE age > 40)
AND id IN (SELECT id FROM table where name like 'P%')
AND id IN (SELECT id FROM table where id > 3)
If you have different result sets and you want to see the intersection, you can use join:
select q1.id
from (<query 1>) q1 join
(<query 2>) q2
on q1.id = q2.id join
(<query 3>) q3
on q1.id = q3.id;
That said, I think GMB has the most concise answer to the question that you actually asked.
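For reference, standard SQL also has a set operator that behaves like the "reverse UNION" the question asks about: INTERSECT. SQLite supports it, and recent MySQL 8.0 releases do as well, but older MySQL versions do not, so treat the following as a sketch rather than a drop-in solution. It uses Python's sqlite3 module, and the people table name and the names_in_all helper are hypothetical.

```python
import sqlite3


def names_in_all(conn, queries):
    """Return the values that every query in `queries` produces."""
    # INTERSECT keeps only rows present in all of the combined result sets.
    combined = " INTERSECT ".join(queries) + " ORDER BY 1"
    return [row[0] for row in conn.execute(combined)]


def build_people_db():
    """The example table from the question, under the assumed name `people`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?)",
        [(1, "Paul", 50), (2, "Peter", 40), (3, "Frank", 20), (4, "Pascal", 60)],
    )
    return conn


QUERIES = [
    "SELECT name FROM people WHERE age > 40",
    "SELECT name FROM people WHERE name LIKE 'P%'",
    "SELECT name FROM people WHERE id > 3",
]

if __name__ == "__main__":
    print(names_in_all(build_people_db(), QUERIES))
```

Each inner query can be arbitrarily complex as long as all of them return the same columns.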
If your statements are complex, what you could do is to use a procedure where each of the statements put the matching id's into a temp table. Then select those rows where id's match the number of statements. This will also most likely be more efficient than one huge query with all complex statements combined into one.
create procedure sp_match_all()
begin
drop temporary table if exists match_tmp;
create temporary table match_tmp (
id int
);
insert into match_tmp
SELECT id FROM table WHERE age > 40;
insert into match_tmp
SELECT id FROM table where name like 'P%';
insert into match_tmp
SELECT id FROM table where id > 3;
select t.name
from table t
join (
select id
from match_tmp
group by id
having count(*)=3
) q on q.id=t.id;
drop temporary table match_tmp;
end
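The same collect-and-count idea can be sketched outside a stored procedure. The version below is an illustration only: it uses Python's sqlite3 module instead of MySQL, the people table and the ids_matching_all helper are hypothetical names, and it wraps each statement in SELECT DISTINCT so that a statement returning the same id twice cannot inflate the count.

```python
import sqlite3


def ids_matching_all(conn, id_queries):
    """Return names of rows whose id is produced by every query in `id_queries`."""
    conn.execute("DROP TABLE IF EXISTS match_tmp")
    conn.execute("CREATE TEMP TABLE match_tmp (id INTEGER)")
    for query in id_queries:
        # DISTINCT guarantees each statement contributes an id at most once.
        conn.execute(f"INSERT INTO match_tmp SELECT DISTINCT id FROM ({query})")
    rows = conn.execute(
        """
        SELECT p.name
        FROM people AS p
        JOIN (
            SELECT id FROM match_tmp
            GROUP BY id
            HAVING COUNT(*) = ?      -- id appeared in every statement
        ) AS q ON q.id = p.id
        ORDER BY p.id
        """,
        (len(id_queries),),
    ).fetchall()
    conn.execute("DROP TABLE match_tmp")
    return [row[0] for row in rows]


def build_people_db():
    """The example table from the question, under the assumed name `people`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?)",
        [(1, "Paul", 50), (2, "Peter", 40), (3, "Frank", 20), (4, "Pascal", 60)],
    )
    return conn


ID_QUERIES = [
    "SELECT id FROM people WHERE age > 40",
    "SELECT id FROM people WHERE name LIKE 'P%'",
    "SELECT id FROM people WHERE id > 3",
]

if __name__ == "__main__":
    print(ids_matching_all(build_people_db(), ID_QUERIES))
```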

MySQL JOIN two tables by one value and take other results (without that value too)

I have a problem with a MySQL query.
The table A is this:
A.id
A.value1
A.user
Table B is:
B.id
B.user
I need to get value_that_i_need from a query by searching on B.user.
But I don't need only the rows matching A.user; I need all rows from table A that share an A.id with a row matching B.user.
So I need all distinct ids (where B.user = A.user) and then to look those up in table A by A.id.
I want to avoid two separate queries! I have already tried different JOINs, but nothing works for me.
EDIT
OK, I will try to explain the problem in a simpler way.
I have this table:
+---------+------------+
| id_user | another_id |
+---------+------------+
id_user -> unique id for each user
another_id -> an id related to something like a group
another_id can be shared by several users, but I only need the users who are in the same groups as me.
So I have to look up my groups (by searching for my id_user) and then find all users with the same another_id.
The problem is that I am querying something like this:
SELECT * FROM table0 AS t0, something_like_groups AS slg
JOIN user_inside_group as uig ON slg.id_group=uig.group_id AND slg.id_user='my_user_id'
WHERE slg.id='id_group' AND t0.user_id=uig.user_id
Actually I have to join 3 tables, but the problem is that I need to find the groups I am in and get all the information about all users in those same groups (without an additional query).
Perhaps you just want to find the min id based on B's user and then get all the rows from A which match. For example:
drop table if exists t,t1;
create table t( id int,user varchar(10));
create table t1( id int,user varchar(10));
insert into t values
(1,'aaa'),(1,'bbb'),(2,'ccc');
insert into t1 values
(1,'bbb'),(2,'ccc')
;
select t.id,t.user
from t
join
(
select t1.user,min(t.id) minid
from t1
join t on t.user = t1.user
group by t1.user
) s
on t.id = s.minid;
+------+------+
| id | user |
+------+------+
| 1 | aaa |
| 1 | bbb |
| 2 | ccc |
+------+------+
3 rows in set (0.00 sec)
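If the goal is the one described in the question's EDIT (find every user who shares a group with me, in one query), a self-join on the membership table is one way to read it. The sketch below is a guess at that intent, not a confirmed solution: it uses Python's sqlite3 module, and the membership table name and users_sharing_groups helper are hypothetical; only the id_user and another_id columns come from the question.

```python
import sqlite3


def users_sharing_groups(conn, my_user_id):
    """Return (id_user, another_id) for every member of every group I belong to."""
    return conn.execute(
        """
        SELECT DISTINCT other.id_user, other.another_id
        FROM membership AS mine                 -- my own rows: which groups am I in?
        JOIN membership AS other                -- everyone in those same groups
          ON other.another_id = mine.another_id
        WHERE mine.id_user = ?
        ORDER BY other.another_id, other.id_user
        """,
        (my_user_id,),
    ).fetchall()


def build_membership_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE membership (id_user INTEGER, another_id INTEGER)")
    conn.executemany(
        "INSERT INTO membership VALUES (?, ?)",
        [(1, 10), (2, 10), (3, 20), (1, 30), (4, 30), (5, 40)],
    )
    return conn


if __name__ == "__main__":
    for row in users_sharing_groups(build_membership_db(), 1):
        print(row)
```

Further tables (user details, group details) could then be joined onto the `other` alias without needing a second query.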

How is COUNT(*) different to COUNT(id)? [duplicate]

I have the following query:
select column_name, count(column_name)
from table
group by column_name
having count(column_name) > 1;
What would be the difference if I replaced all calls to count(column_name) to count(*)?
This question was inspired by How do I find duplicate values in a table in Oracle?.
To clarify the accepted answer (and maybe my question), replacing count(column_name) with count(*) would return an extra row in the result that contains a null and the count of null values in the column.
count(*) counts NULLs and count(column) does not
[edit] added this code so that people can run it
create table #bla(id int,id2 int)
insert #bla values(null,null)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,null)
select count(*),count(id),count(id2)
from #bla
results
7 3 2
Another minor difference, between using * and a specific column, is that in the column case you can add the keyword DISTINCT, and restrict the count to distinct values:
select column_a, count(distinct column_b)
from table
group by column_a
having count(distinct column_b) > 1;
A further and perhaps subtle difference is that in some database implementations the count(*) is computed by looking at the indexes on the table in question rather than the actual data rows. Since no specific column is specified, there is no need to bother with the actual rows and their values (as there would be if you counted a specific column). Allowing the database to use the index data can be significantly faster than making it count "real" rows.
The explanation in the docs, helps to explain this:
COUNT(*) returns the number of items in a group, including NULL values and duplicates.
COUNT(expression) evaluates expression for each row in a group and returns the number of nonnull values.
So count(*) includes nulls, the other method doesn't.
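The effect on the duplicate-finding query from the question can be checked with a small script. This is a sketch using Python's sqlite3 module rather than Oracle or SQL Server, and the samples table and find_duplicates helper are hypothetical; the NULL-counting behaviour it relies on is standard SQL.

```python
import sqlite3


def find_duplicates(conn, count_expr):
    """Run the question's duplicate query with either COUNT(val) or COUNT(*)."""
    if count_expr not in ("COUNT(val)", "COUNT(*)"):
        raise ValueError("unsupported count expression")
    return conn.execute(
        f"SELECT val, {count_expr} FROM samples "
        f"GROUP BY val HAVING {count_expr} > 1 ORDER BY val"
    ).fetchall()


def build_samples_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (val TEXT)")
    # Two 'x', one 'y', and two NULLs.
    conn.executemany(
        "INSERT INTO samples VALUES (?)",
        [("x",), ("x",), ("y",), (None,), (None,)],
    )
    return conn


if __name__ == "__main__":
    db = build_samples_db()
    print(find_duplicates(db, "COUNT(val)"))  # NULL group counts as 0 and is filtered out
    print(find_duplicates(db, "COUNT(*)"))    # NULL group is counted like any other
```

With COUNT(val) the NULL group has a count of 0 and never passes the HAVING filter; with COUNT(*) it shows up as an extra row.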
We can use the Stack Exchange Data Explorer to illustrate the difference with a simple query. The Users table in Stack Overflow's database has columns that are often left blank, like the user's Website URL.
-- count(column_name) vs. count(*)
-- Illustrates the difference between counting a column
-- that can hold null values, a 'not null' column, and count(*)
select count(WebsiteUrl), count(Id), count(*) from Users
If you run the query above in the Data Explorer, you'll see that the count is the same for count(Id) and count(*) because the Id column doesn't allow null values. The WebsiteUrl count is much lower, though, because that column allows null.
COUNT(*) tells SQL Server to count all the rows in a table, including rows containing NULLs.
COUNT(column_name) counts only the rows that have a non-null value in that column.
See the following test code, run on SQL Server 2008:
-- Table variable
DECLARE @Table TABLE
(
CustomerId int NULL
, Name nvarchar(50) NULL
)
-- Insert some records for tests
INSERT INTO @Table VALUES( NULL, 'Pedro')
INSERT INTO @Table VALUES( 1, 'Juan')
INSERT INTO @Table VALUES( 2, 'Pablo')
INSERT INTO @Table VALUES( 3, 'Marcelo')
INSERT INTO @Table VALUES( NULL, 'Leonardo')
INSERT INTO @Table VALUES( 4, 'Ignacio')
-- Count all the rows by indicating *
SELECT COUNT(*) AS 'AllRowsCount'
FROM @Table
-- Count only non-NULL values (exclude NULLs)
SELECT COUNT(CustomerId) AS 'OnlyNotNullCounts'
FROM @Table
COUNT(*) – Returns the total number of records in a table (Including NULL valued records).
COUNT(Column Name) – Returns the total number of Non-NULL records. It means that, it ignores counting NULL valued records in that particular column.
Basically, COUNT(*) counts all the rows in a table whereas COUNT(column_name) does not: it excludes null values, as everyone else here has already answered.
But the most interesting part is query optimization: it is better to use COUNT(*) rather than COUNT(column_name) unless you are doing multiple counts or a complex query. Otherwise, it can lower your DB performance when dealing with a huge amount of data.
Further elaborating on the answers given by @SQLMenace and @Brannon, here is an example using the GROUP BY clause, which the OP mentioned but which is not present in the answer by @SQLMenace:
CREATE TABLE table1 (
id INT
);
INSERT INTO table1 VALUES
(1),
(2),
(NULL),
(2),
(NULL),
(3),
(1),
(4),
(NULL),
(2);
SELECT * FROM table1;
+------+
| id |
+------+
| 1 |
| 2 |
| NULL |
| 2 |
| NULL |
| 3 |
| 1 |
| 4 |
| NULL |
| 2 |
+------+
10 rows in set (0.00 sec)
SELECT id, COUNT(*) FROM table1 GROUP BY id;
+------+----------+
| id | COUNT(*) |
+------+----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 3 |
| 3 | 1 |
| 4 | 1 |
+------+----------+
5 rows in set (0.00 sec)
Here, COUNT(*) counts the number of occurrences of each type of id including NULL
SELECT id, COUNT(id) FROM table1 GROUP BY id;
+------+-----------+
| id | COUNT(id) |
+------+-----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 0 |
| 3 | 1 |
| 4 | 1 |
+------+-----------+
5 rows in set (0.00 sec)
Here, COUNT(id) counts the number of occurrences of each type of id but does not count the number of occurrences of NULL
SELECT id, COUNT(DISTINCT id) FROM table1 GROUP BY id;
+------+--------------------+
| id | COUNT(DISTINCT id) |
+------+--------------------+
| NULL | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
+------+--------------------+
5 rows in set (0.00 sec)
Here, COUNT(DISTINCT id) counts the number of occurrences of each type of id only once (does not count duplicates) and also does not count the number of occurrences of NULL
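Some recommend using
Count(1) in place of a column name or *
to count the number of rows in a table, on the grounds that it never has to check whether a column name exists in the table. In practice most engines treat Count(1) and Count(*) the same way (the MySQL manual says InnoDB handles both identically), so do not expect a measurable speedup.
If you only need the number of rows, it makes no difference which one you use; if you want to count the non-null values of particular columns, you have to name each of those columns in its own Count(...).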
Thanks,
As mentioned in the previous answers, Count(*) counts even the NULL columns, whereas count(Columnname) counts only if the column has values.
It's always best practice to avoid * (Select *, count *, …)

SQL search to filter a list of id's

I'm curious if there is a crafty way to accomplish the following through SQL only. I have a list of id's from one database and I want to filter this list against another database/table. The requirements are:
1. Search through the table for matching id's; if there is a match and that record meets another constraint (where field2 is null), then remove it from the initial list.
2. Return the results from (1) as well as any id in the initial list which was not found in the second table.
For example, if my list contains id's [1,2,3,4], and my_table that I wish to filter on looks like:
+-------+--------+
| my_id | Field2 |
+-------+--------+
| 1 | true |
| 2 | |
| 3 | true |
+-------+--------+
Then I expect the final result to be [1,3,4]. The record with id=2 is filtered out because Field2 is null, and 4 remains because it's not in the table at all.
So far, all I've come up with is below, which meets requirement (1), but not (2):
select distinct my_id
from my_table where my_id IN (1,2,3) --csv list of id's
and not exists
(select my_id
from my_table
where my_id IN (1,2,3) and field2 is null)
Is there a possibility of using MINUS somehow by creating a temp record set from my initial list of id's?
You can use WHERE <field> IN <list> to find all values matching a list:
SELECT DISTINCT t1.my_id
FROM my_table t1
LEFT JOIN my_table2 t2
ON t1.my_id = t2.field2
WHERE t1.my_id IN (1,2,3)
AND t2.field2 IS NULL
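To cover requirement (2) as well, one option is to turn the id list itself into a row source and LEFT JOIN the table onto it, keeping ids that either have no match or have a non-null field2. The following is a sketch under assumptions, not a tested answer for the asker's database: it uses Python's sqlite3 module and a VALUES-based common table expression, and the filter_ids helper and the wanted CTE name are hypothetical.

```python
import sqlite3


def filter_ids(conn, ids):
    """Keep ids that are absent from my_table or whose field2 is not null."""
    if not ids:
        return []
    placeholders = ", ".join("(?)" for _ in ids)
    rows = conn.execute(
        f"""
        WITH wanted(my_id) AS (VALUES {placeholders})
        SELECT DISTINCT w.my_id
        FROM wanted AS w
        LEFT JOIN my_table AS t ON t.my_id = w.my_id
        WHERE t.my_id IS NULL          -- id not found in the table at all
           OR t.field2 IS NOT NULL     -- id found and passes the constraint
        ORDER BY w.my_id
        """,
        list(ids),
    ).fetchall()
    return [row[0] for row in rows]


def build_filter_db():
    """The example table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (my_id INTEGER, field2 TEXT)")
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?)",
        [(1, "true"), (2, None), (3, "true")],
    )
    return conn


if __name__ == "__main__":
    print(filter_ids(build_filter_db(), [1, 2, 3, 4]))
```

Driving the query from the list rather than from my_table is what lets id 4 survive even though it has no row in the table.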

SELECT that returns list of values not occurring in any row

Query:
select id from users where id in (1,2,3,4,5)
If the users table contains ids 1, 2, 3, this would return 1, 2, and 3. I want a query that would return 4 and 5. In other words, I don't want the query to return any rows that exist in the table, I want to give it a list of numbers and get the values from that list that don't appear in the table.
(updated to clarify question following several inapplicable answers)
If you don't want to (explicitly) use temporary tables, this will work:
SELECT id FROM (
(SELECT 1 AS id) UNION ALL
(SELECT 2 AS id) UNION ALL
(SELECT 3 AS id) UNION ALL
(SELECT 4 AS id) UNION ALL
(SELECT 5 AS id)
) AS list
LEFT JOIN users USING (id)
WHERE users.id IS NULL
However, it is quite ugly, quite long, and I am dubious about how it would perform if the list of IDs is long.
Had the same need and built on the answer by BugFinder using a temporary table in session. This way it will automatically be destroyed after I'm done with the query, so I don't have to deal with house cleaning as I will run this type of query often.
Create the temporary table:
CREATE TEMPORARY TABLE tmp_table (id INT UNSIGNED);
Populate tmp_table with the values you will check:
INSERT INTO tmp_table (id) values (1),(2),(3),(4),(5);
With the table created and populated, run the query as with any regular table:
SELECT tmp_table.id
FROM tmp_table
LEFT JOIN users u
ON tmp_table.id = u.id
WHERE u.id IS NULL;
This info on MySQL Temporary Tables was also useful
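When the list of ids comes from application code, the temporary-table steps above can be driven from the client. The sketch below uses Python's sqlite3 module as a stand-in for a MySQL connection; the missing_ids helper is a hypothetical name, and the tmp_table and users names are taken from the answer.

```python
import sqlite3


def missing_ids(conn, candidate_ids):
    """Return the candidate ids that have no matching row in users."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS tmp_table (id INTEGER)")
    conn.execute("DELETE FROM tmp_table")  # start from an empty list on each call
    conn.executemany(
        "INSERT INTO tmp_table (id) VALUES (?)",
        [(candidate,) for candidate in candidate_ids],
    )
    rows = conn.execute(
        """
        SELECT tmp_table.id
        FROM tmp_table
        LEFT JOIN users AS u ON tmp_table.id = u.id
        WHERE u.id IS NULL               -- no matching user: the id is missing
        ORDER BY tmp_table.id
        """
    ).fetchall()
    return [row[0] for row in rows]


def build_users_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,), (3,)])
    return conn


if __name__ == "__main__":
    print(missing_ids(build_users_db(), [1, 2, 3, 4, 5]))
```

Binding the ids as parameters through executemany avoids building a long SQL string by hand, which matters more as the list grows.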
Given that the numbers are a fixed list, the quickest way I can think of is to have a test table populated with those numbers and do the following.
This select statement is untested, but you will follow the principle.
select test.number
from test
left join
users
on
test.number = users.id
where users.id is null
Then you'll get back all the numbers that don't have a matching users.id, and so can fill in the holes.
A different option is to use another table containing all possible ids and then do a select from there:
mysql> describe ids;
+-------+-----------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------------+------+-----+---------+-------+
| id | int(5) unsigned | NO | | 0 | |
+-------+-----------------+------+-----+---------+-------+
1 row in set (0.05 sec)
mysql> select * from ids;
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+----+
5 rows in set (0.00 sec)
mysql> select * from users;
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
+----+
3 rows in set (0.00 sec)
mysql> select id from ids where id not in (select id from users);
+----+
| id |
+----+
| 4 |
| 5 |
+----+
2 rows in set (0.04 sec)
Added side effect - allows you to expand the result list by inserting into the ids table
select missing.id
from
(select ELT(@indx, 1,2,3,4,5) as id, @indx:=@indx+1
from (select @indx:=1) init,
users
where ELT(@indx, 1,2,3,4,5) is not null
) missing
left join users u using(id)
where u.id is null;
What you have in here is:
ELT together with the variable @indx allows you to 'transpose' the list into a column.
(select @indx:=1) is needed to initialize @indx to 1.
The users table in the inner select is needed so that MySQL has something to iterate on (so the size of your list cannot exceed the number of rows in the users table; if it does, you can use any other table that is big enough instead of the inner users. Again, the table itself does not matter; it is just something to iterate on, so only its size matters).
The condition ELT(@indx, 1,2,3,4,5) is not null in the nested select stops the iteration once the index exceeds your list size.
The rest is simple - left join and check for null.
Sorry, I cannot add comments, @Austin.
You have my +1.
Anyway, I'm not sure if this works on MySQL, but you could replace all those individual selects joined by unions with a value set, so that you have something like this:
SELECT id FROM (
VALUES (1), (2), (3), (4), (5)
) AS list(id)
LEFT JOIN users USING (id)
WHERE users.id IS NULL