MySQL: Duplicating records but changing a column's value

I'm trying to write an SQL statement that duplicates all rows WHERE employee = 16 (as an example), but where the new rows have a different employee value.
Table before INSERT:
| employee | property_name | property_value |
|:--------:|:--------------|:---------------|
| 16 | Salary | 28,000 |
| 16 | Department | 12 |
| 17 | Salary | 38,000 |
| 17 | Department | 8 |
Desired outcome after INSERT:
| employee | property_name | property_value |
|:--------:|:--------------|:---------------|
| 16 | Salary | 28,000 |
| 16 | Department | 12 |
| 17 | Salary | 38,000 |
| 17 | Department | 8 |
| 18 | Salary | 28,000 |
| 18 | Department | 12 |
I've seen some threads that use variables. Could I set and reference a variable somehow that would replace values in an INSERT/SELECT?
The answer to this thread looks like it would work, but I'd rather not create and drop tables like that.

insert into YourTable (employee, property_name, property_value)
select 18, property_name, property_value
from YourTable
where employee = 16;
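Since the question asks about variables: a session variable can indeed stand in for the hard-coded value. A minimal sketch (@new_employee is just an assumed name):

SET @new_employee := 18;
insert into YourTable (employee, property_name, property_value)
select @new_employee, property_name, property_value
from YourTable
where employee = 16;

This keeps the new employee number in one place if you run the statement for several employees.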

mysql - WHERE records do not have records in another table

I have some tables from which I need to get data.
Here is my structure:
employees
| id | name    |
+----+---------+
|  1 | Michael |
|  2 | Sarah   |
reports
| id | employee_id | month | year | value | group_id |
+----+-------------+-------+------+-------+----------+
|  1 |           1 |    01 | 2018 |    35 |        1 |
|  2 |           1 |    02 | 2018 |    12 |        1 |
|  3 |           2 |    02 | 2018 |     2 |        2 |
groups
| id | name | employee_id |
+----+------+-------------+
|  1 | G11  |           1 |
|  2 | Z15  |           2 |
Now I need to get the groups (together with their employees) that DON'T HAVE a REPORT for a given month and year, e.g.:
When I look for 01.2018, it should return only Z15, but when I look for 04.2018 it should return both Z15 and G11.
How can I do this? At the moment I have something like this:
SELECT
    groups.*,
    employees.*
    -- all fields from reports
FROM groups
INNER JOIN employees
    ON employees.id = groups.employee_id
My column names are slightly different from yours. That's deliberate...
SELECT g.*
FROM groups g
LEFT JOIN reports r
    ON r.group_id = g.group_id
   AND r.yearmonth = 201801
WHERE r.report_id IS NULL;
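For reference, here is the same anti-join adapted to the column names actually shown in the question (separate month and year columns, id as each table's key); a sketch, not tested against your data:

SELECT g.*, e.*
FROM groups g
INNER JOIN employees e
    ON e.id = g.employee_id
LEFT JOIN reports r
    ON r.group_id = g.id
   AND r.month = '01'
   AND r.year = 2018
WHERE r.id IS NULL;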

Compare different rows and bring out result

I have a table which requires me to pair certain rows together using a unique value that both the rows share.
For instance in the below table;
+--------+----------+-----------+-----------+----------------+-------------+
| id     | type     | member    | code      | description    | matching    |
+--------+----------+-----------+-----------+----------------+-------------+
| 1000   | transfer | 552123    | SC120314  | From Gold      |             |
| 1001   | transfer | 552123    | SC120314  | To Platinum    |             |
| 1002   | transfer | 833612    | SC120314  | From silver    |             |
| 1003   | transfer | 833612    | SC120314  | To basic       |             |
| 1004   | transfer | 457114    | SC150314  | From Platinum  |             |
| 1005   | transfer | 457114    | SC150314  | To silver      |             |
| 1006   | transfer | 933276    | SC180314  | From Gold      |             |
| 1007   | transfer | 933276    | SC180314  | From To basic  |             |
+--------+----------+-----------+-----------+----------------+-------------+
Basically, what I need the query/routine to do is find the rows where the values in the 'member' column match, then check whether the values in the 'code' column for those same rows also match.
If both columns match for both rows, assign a value to the 'matching' column of both rows. This value should be the same for both rows and unique to them alone.
The unique code can be absolutely anything, so long as it's exclusive to the matching rows. Is there any query/routine capable of carrying this out?
I'm not sure I understand the question correctly, but if you'd like to pick out and update rows where the code and member columns match, and set matching to some unique value for each pair of related rows, I believe this would work:
UPDATE <table> A
INNER JOIN (SELECT * FROM <table>) B
    ON B.member = A.member AND B.code = A.code AND A.id <> B.id
SET A.matching = (A.id + B.id);
The matching value will be set to the sum of the id columns for both rows. Notice that updating the matching field this way will not work if there are more than two rows that can match.
Running the above query against your example table would yield:
+------+----------+--------+----------+---------------+----------+
| id   | type     | member | code     | description   | matching |
+------+----------+--------+----------+---------------+----------+
| 1000 | transfer | 552123 | SC120314 | From Gold     | 2001     |
| 1001 | transfer | 552123 | SC120314 | To Platinum   | 2001     |
| 1002 | transfer | 833612 | SC120314 | From silver   | 2005     |
| 1003 | transfer | 833612 | SC120314 | To basic      | 2005     |
| 1004 | transfer | 457114 | SC150314 | From Platinum | 2009     |
| 1005 | transfer | 457114 | SC150314 | To silver     | 2009     |
| 1006 | transfer | 933276 | SC180314 | From Gold     | 2013     |
| 1007 | transfer | 933276 | SC180314 | From To basic | 2013     |
+------+----------+--------+----------+---------------+----------+
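One caveat worth noting: two different pairs can produce the same sum (for instance 1000 + 1003 = 1001 + 1002), so the sum is not guaranteed unique across pairs. If that matters, a collision-proof variant builds the key from both ids. A sketch, assuming matching is a character column:

UPDATE <table> A
INNER JOIN (SELECT * FROM <table>) B
    ON B.member = A.member AND B.code = A.code AND A.id <> B.id
SET A.matching = CONCAT(LEAST(A.id, B.id), '-', GREATEST(A.id, B.id));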
I can give you a simple query that can do what you need.
tst is the name of the table.
SELECT t.*, COUNT(t2.id) AS matching
FROM tst t
LEFT JOIN tst t2 ON t2.member = t.member
GROUP BY t.id;

Remove duplicates SQL while ignoring key and selecting max of specified column

I have the following sample data:
| key_id | name  | name_id | data_id |
+--------+-------+---------+---------+
|      1 | jim   |      23 |     098 |
|      2 | joe   |      24 |     098 |
|      3 | john  |      25 |     098 |
|      4 | jack  |      26 |     098 |
|      5 | jim   |      23 |     091 |
|      6 | jim   |      23 |     090 |
I have tried this query:
INSERT INTO temp_table
SELECT DISTINCT
    #key_id,
    name,
    name_id,
    #data_id
FROM table1;
I am trying to dedupe a table by all fields in a row.
My desired output:
| key_id | name  | name_id | data_id |
+--------+-------+---------+---------+
|      1 | jim   |      23 |     098 |
|      2 | joe   |      24 |     098 |
|      3 | john  |      25 |     098 |
|      4 | jack  |      26 |     098 |
What I'm actually getting:
| key_id | name  | name_id | data_id |
+--------+-------+---------+---------+
|      1 | jim   |      23 | NULL    |
|      2 | joe   |      24 | NULL    |
|      3 | john  |      25 | NULL    |
|      4 | jack  |      26 | NULL    |
I am able to dedupe the table, but I am setting the 'data_id' value to NULL by attempting to comment the field out with '#'.
Is there any way to select distinct on all fields while keeping the value for 'data_id'? I'll take the highest (MAX) data_id if possible.
If you only want one row returned for a specific value (in this case, name), one option you have is to group by that value. This seems like a good approach because you also said you wanted the largest data_id for each name, so I would suggest grouping and using the MAX() aggregate function like this:
SELECT name, name_id, MAX(data_id) AS data_id
FROM myTable
GROUP BY name, name_id;
The only thing you should be aware of is the possibility that a name occurs multiple times under different name_ids. If that is possible in your table, you could group by the name_id too, which is what I did.
Since you stated you're not interested in the key_id but only the name, I just excluded it from the query altogether to get this:
| name  | name_id | data_id |
+-------+---------+---------+
| jim   |      23 |     098 |
| joe   |      24 |     098 |
| john  |      25 |     098 |
| jack  |      26 |     098 |
Here is the SQL Fiddle example.
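If you'd rather dedupe the original table in place instead of inserting into a new one, a self-join DELETE is another option. A sketch, assuming data_id values are distinct within each (name, name_id) group:

DELETE t1
FROM table1 t1
INNER JOIN table1 t2
    ON t2.name = t1.name
   AND t2.name_id = t1.name_id
   AND t2.data_id > t1.data_id;

This deletes every row for which a duplicate with a higher data_id exists, leaving only the MAX(data_id) row per group.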
RENAME TABLE myTable TO Old_myTable,
             myTable2 TO myTable;
INSERT INTO myTable
SELECT *
FROM Old_myTable
GROUP BY name, name_id;
This groups my table by the values I want to dedupe while still keeping the structure and ignoring the 'data_id' column. (Note that SELECT * with GROUP BY relies on MySQL's non-strict mode: with ONLY_FULL_GROUP_BY enabled the statement is rejected, and which data_id value survives per group is otherwise undefined.)

Group column values together in one cell

I need to write an SQL select statement that groups together values from one column into one cell.
e.g.
table name: Customer_Hobbies
+------------+-----+----------+
| CustomerId | Age | Hobby    |
+------------+-----+----------+
| 123        | 17  | Golf     |
| 123        | 17  | Football |
| 324        | 14  | Rugby    |
| 627        | 28  | Football |
+------------+-----+----------+
should return...
+------------+-----+---------------+
| CustomerId | Age | Hobbies       |
+------------+-----+---------------+
| 123        | 17  | Golf,Football |
| 324        | 14  | Rugby         |
| 627        | 28  | Football      |
+------------+-----+---------------+
Is this possible?
N.B. I know the data's not laid out in a particularly sensible way, but I can't change that.
You want group_concat():
select customerId, age, group_concat(hobby) as hobbies
from Customer_Hobbies
group by customerId, age;
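If the order or separator inside the grouped list matters, GROUP_CONCAT accepts both. For example, to sort each customer's hobbies alphabetically and separate them with a comma and a space:

select customerId, age,
       group_concat(hobby order by hobby separator ', ') as hobbies
from Customer_Hobbies
group by customerId, age;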

60 million entries, select entries from a certain month. How to optimize database?

I have a database with 60 million entries.
Every entry contains:
ID
DataSourceID
Some Data
DateTime
I need to select entries from a certain month. Each month contains approximately 2 million entries.
select *
from Entries
where time between '2010-04-01 00:00:00' and '2010-05-01 00:00:00'
(query takes approximately 1.5 minutes)
I'd also like to select data from a certain month for a given DataSourceID.
(takes approximately 20 seconds)
There are about 50-100 different DataSourceIDs.
Is there a way to make this faster? What are my options?
How to optimize this database/query?
EDIT: There are approx. 60-100 inserts PER second!
To get entries for a particular month of a particular year faster, you will need to index the time column:
CREATE INDEX idx_time ON ENTRIES(time) USING BTREE;
Additionally, use:
SELECT e.*
FROM ENTRIES e
WHERE e.time BETWEEN '2010-04-01' AND DATE_SUB('2010-05-01', INTERVAL 1 SECOND)
...because BETWEEN is inclusive, so you'd get anything dated "2010-05-01 00:00:00" with the query you posted.
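Equivalently, you can sidestep the boundary arithmetic with a half-open range, which the time index serves just as well:

SELECT e.*
FROM ENTRIES e
WHERE e.time >= '2010-04-01'
  AND e.time < '2010-05-01'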
I'd also like to select data from certain month from a given DataSourceID
You can either add a separate index for the datasourceid column:
CREATE INDEX idx_datasourceid ON ENTRIES(datasourceid) USING BTREE;
...or set up a composite index that covers both columns:
CREATE INDEX idx_time_datasourceid ON ENTRIES(time, datasourceid) USING BTREE;
A composite index requires that the leftmost column(s) appear in the query for the index to be usable. In this example, having time first works for both situations you mentioned -- datasourceid doesn't have to be referenced for the index to be of use. But you have to test your queries by viewing the EXPLAIN output to really know what works best for your data and the queries being performed on that data.
That said, indexes will slow down INSERT, UPDATE and DELETE statements. And an index doesn't provide a lot of value if the column has few distinct values -- e.g. a boolean column is a bad choice to index, because its cardinality is low.
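As a concrete example of that check, prefixing the query with EXPLAIN shows which index (if any) is chosen. A sketch against the ENTRIES table above, with 42 standing in for a real DataSourceID:

EXPLAIN
SELECT e.*
FROM ENTRIES e
WHERE e.time >= '2010-04-01'
  AND e.time < '2010-05-01'
  AND e.datasourceid = 42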
Take advantage of InnoDB clustered primary key indexes.
http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
This will be extremely performant:
create table datasources
(
year_id smallint unsigned not null,
month_id tinyint unsigned not null,
datasource_id tinyint unsigned not null,
id int unsigned not null, -- needed for uniqueness
data int unsigned not null default 0,
primary key (year_id, month_id, datasource_id, id)
)
engine=innodb;
select * from datasources where year_id = 2011 and month_id between 1 and 3;
select * from datasources where year_id = 2011 and month_id = 4 and datasource_id = 100;
-- etc..
EDIT 2
Forgot I was running the first test script with 3 months of data. Here are the results for a single month: 0.34 and 0.69 seconds.
select d.* from datasources d where d.year_id = 2010 and d.month_id = 3 and datasource_id = 100 order by d.id desc limit 10;
+---------+----------+---------------+---------+-------+
| year_id | month_id | datasource_id | id      | data  |
+---------+----------+---------------+---------+-------+
|    2010 |        3 |           100 | 3290330 | 38434 |
|    2010 |        3 |           100 | 3290329 |  9988 |
|    2010 |        3 |           100 | 3290328 | 25680 |
|    2010 |        3 |           100 | 3290327 | 17627 |
|    2010 |        3 |           100 | 3290326 | 64508 |
|    2010 |        3 |           100 | 3290325 | 14257 |
|    2010 |        3 |           100 | 3290324 | 45950 |
|    2010 |        3 |           100 | 3290323 | 49986 |
|    2010 |        3 |           100 | 3290322 |  2459 |
|    2010 |        3 |           100 | 3290321 | 52971 |
+---------+----------+---------------+---------+-------+
10 rows in set (0.34 sec)
select d.* from datasources d where d.year_id = 2010 and d.month_id = 3 order by d.id desc limit 10;
+---------+----------+---------------+---------+-------+
| year_id | month_id | datasource_id | id      | data  |
+---------+----------+---------------+---------+-------+
|    2010 |        3 |           116 | 3450346 | 42455 |
|    2010 |        3 |           116 | 3450345 | 64039 |
|    2010 |        3 |           116 | 3450344 | 27046 |
|    2010 |        3 |           116 | 3450343 | 23730 |
|    2010 |        3 |           116 | 3450342 | 52380 |
|    2010 |        3 |           116 | 3450341 | 35700 |
|    2010 |        3 |           116 | 3450340 | 20195 |
|    2010 |        3 |           116 | 3450339 | 21758 |
|    2010 |        3 |           116 | 3450338 | 51378 |
|    2010 |        3 |           116 | 3450337 | 34687 |
+---------+----------+---------------+---------+-------+
10 rows in set (0.69 sec)
EDIT 1
Decided to test the above schema with approx. 60 million rows spread over 3 years. Each query is run cold, i.e. each is run separately, with MySQL restarted in between to clear any buffers, and with no query caching.
The full test script can be found here: http://pastie.org/1723506
As you can see it's a pretty performant schema even on my humble desktop :)
select count(*) from datasources;
+----------+
| count(*) |
+----------+
| 60306030 |
+----------+
select count(*) from datasources where year_id = 2010;
+----------+
| count(*) |
+----------+
| 16691669 |
+----------+
select
year_id, month_id, count(*) as counter
from
datasources
where
year_id = 2010
group by
year_id, month_id;
+---------+----------+---------+
| year_id | month_id | counter |
+---------+----------+---------+
|    2010 |        1 | 1080108 |
|    2010 |        2 | 1210121 |
|    2010 |        3 | 1160116 |
|    2010 |        4 | 1300130 |
|    2010 |        5 | 1860186 |
|    2010 |        6 | 1220122 |
|    2010 |        7 | 1250125 |
|    2010 |        8 | 1460146 |
|    2010 |        9 | 1730173 |
|    2010 |       10 | 1490149 |
|    2010 |       11 | 1570157 |
|    2010 |       12 | 1360136 |
+---------+----------+---------+
12 rows in set (5.92 sec)
select
count(*) as counter
from
datasources d
where
d.year_id = 2010 and d.month_id between 1 and 3 and datasource_id = 100;
+---------+
| counter |
+---------+
|   30003 |
+---------+
1 row in set (1.04 sec)
explain
select
d.*
from
datasources d
where
d.year_id = 2010 and d.month_id between 1 and 3 and datasource_id = 100
order by
d.id desc limit 10;
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows    | Extra                       |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
|  1 | SIMPLE      | d     | range | PRIMARY       | PRIMARY | 4       | NULL | 4451372 | Using where; Using filesort |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
1 row in set (0.00 sec)
select
d.*
from
datasources d
where
d.year_id = 2010 and d.month_id between 1 and 3 and datasource_id = 100
order by
d.id desc limit 10;
+---------+----------+---------------+---------+-------+
| year_id | month_id | datasource_id | id      | data  |
+---------+----------+---------------+---------+-------+
|    2010 |        3 |           100 | 3290330 | 38434 |
|    2010 |        3 |           100 | 3290329 |  9988 |
|    2010 |        3 |           100 | 3290328 | 25680 |
|    2010 |        3 |           100 | 3290327 | 17627 |
|    2010 |        3 |           100 | 3290326 | 64508 |
|    2010 |        3 |           100 | 3290325 | 14257 |
|    2010 |        3 |           100 | 3290324 | 45950 |
|    2010 |        3 |           100 | 3290323 | 49986 |
|    2010 |        3 |           100 | 3290322 |  2459 |
|    2010 |        3 |           100 | 3290321 | 52971 |
+---------+----------+---------------+---------+-------+
10 rows in set (0.98 sec)
select
count(*) as counter
from
datasources d
where
d.year_id = 2010 and d.month_id between 1 and 3;
+---------+
| counter |
+---------+
| 3450345 |
+---------+
1 row in set (1.64 sec)
explain
select
d.*
from
datasources d
where
d.year_id = 2010 and d.month_id between 1 and 3
order by
d.id desc limit 10;
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows    | Extra                       |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
|  1 | SIMPLE      | d     | range | PRIMARY       | PRIMARY | 3       | NULL | 6566916 | Using where; Using filesort |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
1 row in set (0.00 sec)
select
d.*
from
datasources d
where
d.year_id = 2010 and d.month_id between 1 and 3
order by
d.id desc limit 10;
+---------+----------+---------------+---------+-------+
| year_id | month_id | datasource_id | id      | data  |
+---------+----------+---------------+---------+-------+
|    2010 |        3 |           116 | 3450346 | 42455 |
|    2010 |        3 |           116 | 3450345 | 64039 |
|    2010 |        3 |           116 | 3450344 | 27046 |
|    2010 |        3 |           116 | 3450343 | 23730 |
|    2010 |        3 |           116 | 3450342 | 52380 |
|    2010 |        3 |           116 | 3450341 | 35700 |
|    2010 |        3 |           116 | 3450340 | 20195 |
|    2010 |        3 |           116 | 3450339 | 21758 |
|    2010 |        3 |           116 | 3450338 | 51378 |
|    2010 |        3 |           116 | 3450337 | 34687 |
+---------+----------+---------------+---------+-------+
10 rows in set (1.98 sec)
Hope this helps :)
You could use an index to trade disk usage for query speed. An index that starts with the time column can speed up queries that ask for a particular month:
create index IX_YourTable_Date on YourTable (time, DataSourceID, ID, SomeData)
Because the index starts with the time field, MySQL can do a key range scan on the index. That should be as fast as it gets. The index should include all columns in the query, or MySQL would have to look from the index to the table data for each row. Since you're asking for 2 million rows, MySQL will likely ignore an index that is not covering. (Covering index = an index that includes all columns used in the query.)
If you never query on ID, you can redefine the table to use (time, DataSourceID, ID) as the primary key (dropping the existing primary key first, if the table already has one):
alter table YourTable drop primary key, add primary key (time, DataSourceID, ID);
This will speed up searches on time at no cost in disk space, but searches on ID alone will be very slow.
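If you do still need occasional lookups by ID, one compromise is a secondary index on it (note this gives back some of the disk space and insert speed the bare primary-key change saved):

CREATE INDEX idx_id ON YourTable (ID);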
I would try putting an index on the time field, if you haven't already.
For DataSourceID, you could try using ENUM instead of VARCHAR/INT.
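For what the ENUM suggestion might look like in practice, a sketch (the source names here are invented placeholders; you would list your actual 50-100 sources, and migrating an existing numeric column would need a data conversion step first):

ALTER TABLE Entries
    MODIFY DataSourceID ENUM('source_a', 'source_b', 'source_c') NOT NULL;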