Comparing two tables in MySQL

I have a master table and a temp table that look something like:
things_temp
+----+--------+--------------+
| id | number | current_time |
+----+--------+--------------+
| 1 | 456 | 9/16/2013 |
| 2 | 123 | 9/16/2013 |
+----+--------+--------------+
things_master
+----+--------+--------------+-----+
| id | number | last_updated | old |
+----+--------+--------------+-----+
| 1 | 456 | 9/15/2013 | 0 |
| 2 | 234 | 9/15/2013 | 0 |
| 3 | 888 | 8/14/2012 | 1 |
+----+--------+--------------+-----+
I need to iterate through the things_temp table and, if there exists the same number in things_master AND old == 0, update the last_updated to the current_time.
Otherwise, if both conditions above are not satisfied, simply add the record from things_temp to things_master with last_updated as current_time and old = 0.
Now, I could easily get the count of things_temp and check each one individually. But there are something like 40,000 records in each table so I think that may be a bad idea.
I've been looking around and there are things like UNION ALL, LEFT JOIN, INNER JOIN that all seem like they may be a part of the solution, but I'm a bit lost.
Is there a better way to accomplish my task without iterating through each record of things_temp and searching through things_master?

You might be able to do this in one statement by abusing replace into, but it's probably clearer to do it in two steps. Other databases support merge which is designed for this sort of thing.
start transaction;
-- update any matching numbers with the data from things_temp
update
    things_master m
        inner join
    things_temp t
        on m.number = t.number
set
    m.last_updated = t.`current_time`
where
    m.old = 0;
-- add any missing numbers
insert into
    things_master (number, last_updated, old)
select
    number, `current_time`, 0
from
    things_temp t
where
    not exists (
        select
            'x'
        from
            things_master m
        where
            t.number = m.number and
            m.old = 0
    );
commit;
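For anyone who wants to try the two-step update-then-insert idea above without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. It is an assumption-laden illustration, not the exact statements above: SQLite has no multi-table UPDATE ... JOIN of the MySQL kind, so the update is rewritten with correlated subqueries, and the function names sync_things and demo are made up for this example. The sample rows are the ones from the question.

```python
import sqlite3


def sync_things(conn):
    """Update matching active rows, then insert the numbers that are missing."""
    with conn:  # one transaction: commit on success, roll back on error
        # Step 1: refresh last_updated for active rows whose number is in things_temp.
        # (SQLite rewrite of the MySQL UPDATE ... INNER JOIN shown above.)
        conn.execute("""
            UPDATE things_master
               SET last_updated = (SELECT t."current_time"
                                     FROM things_temp t
                                    WHERE t.number = things_master.number)
             WHERE old = 0
               AND EXISTS (SELECT 1
                             FROM things_temp t
                            WHERE t.number = things_master.number)
        """)
        # Step 2: insert temp rows that have no active (old = 0) match.
        conn.execute("""
            INSERT INTO things_master (number, last_updated, old)
            SELECT t.number, t."current_time", 0
              FROM things_temp t
             WHERE NOT EXISTS (SELECT 1
                                 FROM things_master m
                                WHERE m.number = t.number
                                  AND m.old = 0)
        """)


def demo():
    """Load the sample rows from the question and run the sync once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE things_temp (
            id INTEGER PRIMARY KEY, number INTEGER, "current_time" TEXT);
        CREATE TABLE things_master (
            id INTEGER PRIMARY KEY, number INTEGER, last_updated TEXT, old INTEGER);
        INSERT INTO things_temp VALUES (1, 456, '9/16/2013'), (2, 123, '9/16/2013');
        INSERT INTO things_master VALUES
            (1, 456, '9/15/2013', 0),
            (2, 234, '9/15/2013', 0),
            (3, 888, '8/14/2012', 1);
    """)
    sync_things(conn)
    return conn.execute(
        "SELECT number, last_updated, old FROM things_master ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

With the sample data, 456 gets the new date, 234 and 888 are left alone, and 123 is appended as a new row with old = 0. Both steps are set-based, so nothing iterates over the 40,000 rows one at a time.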

Related

In mysql how can I get only rows from one table which do not link to any rows in another table with a specific ID

I have two tables with the following structures (unnecessary columns trimmed out)
----------------- ---------------------
| mod_personnel | | mod_skills |
| | | |
| - prs_id | | - prs_id |
| - name | | - skl_id |
----------------- | |
---------------------
There may be 0 to many rows in the skills table for each prs_id
What I want is all the personnel records which do NOT have an associated skill record with skill_id 1.
In plain English "I want all the people who do not have the skill x".
Currently, I have only been able to do it with the following nested select. But I am hoping to find a faster way.
SELECT * FROM `mod_personnel` WHERE `prs_id` NOT IN (
SELECT `prs_id` FROM `mod_skills` WHERE `skl_id` = 1 )
This may be faster:
SELECT `mod_personnel`.*
FROM `mod_personnel`
left outer join `mod_skills`
on `mod_skills`.`prs_id` = `mod_personnel`.`prs_id`
and `mod_skills`.`skl_id` = 1
WHERE `mod_skills`.`prs_id` is null;
Using a NOT EXISTS might be faster.
SELECT *
FROM `mod_personnel` p
WHERE NOT EXISTS (SELECT *
FROM `mod_skills` s
WHERE s.`prs_id` = p.`prs_id`
AND s.`skl_id` = 1 );
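The three formulations above (NOT IN, LEFT JOIN ... IS NULL, NOT EXISTS) should return the same people as long as mod_skills.prs_id is never NULL. A small sketch to check that, using Python's built-in sqlite3 module as a stand-in for MySQL; the three people and their skills are invented sample data, and this says nothing about which form MySQL will run fastest on your tables.

```python
import sqlite3

# The three anti-join formulations from the answers above, reduced to prs_id.
QUERIES = {
    "not_in": """
        SELECT prs_id FROM mod_personnel
         WHERE prs_id NOT IN (SELECT prs_id FROM mod_skills WHERE skl_id = 1)
         ORDER BY prs_id
    """,
    "left_join": """
        SELECT p.prs_id
          FROM mod_personnel p
          LEFT OUTER JOIN mod_skills s
            ON s.prs_id = p.prs_id AND s.skl_id = 1
         WHERE s.prs_id IS NULL
         ORDER BY p.prs_id
    """,
    "not_exists": """
        SELECT p.prs_id
          FROM mod_personnel p
         WHERE NOT EXISTS (SELECT 1 FROM mod_skills s
                            WHERE s.prs_id = p.prs_id AND s.skl_id = 1)
         ORDER BY p.prs_id
    """,
}


def people_without_skill_1():
    """Run all three queries on hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mod_personnel (prs_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE mod_skills (prs_id INTEGER NOT NULL, skl_id INTEGER NOT NULL);
        -- Hypothetical data: Ann has skills 1 and 2, Bob has skill 2, Cy has none.
        INSERT INTO mod_personnel VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO mod_skills VALUES (1, 1), (1, 2), (2, 2);
    """)
    return {name: [row[0] for row in conn.execute(sql)]
            for name, sql in QUERIES.items()}


if __name__ == "__main__":
    print(people_without_skill_1())
```

One caveat worth knowing: if the subquery in the NOT IN version can return a NULL prs_id, NOT IN matches no rows at all, while the other two forms are unaffected. To compare speed on real data, run EXPLAIN on each variant.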

MySQL Inner Join with No Rows

I have a MySQL database that I normalized and the idea is to allow for a business to select zero or more marketing sequences, but the kicker is that a handful of marketing sequences are required (right now I have 4, but the list can grow). So what I've done is structured my tables as such:
sequence
+-------------+------------------+-------+-------------+----------+
| sequence_id | customer_type_id | title | description | required |
| 1 | 1 | ... | ... | true |
| 2 | 1 | ... | ... | true |
| 3 | 1 | ... | ... | false |
| 4 | 2 | ... | ... | true |
| 5 | 3 | ... | ... | true |
| 6 | 4 | ... | ... | false |
+-------------+------------------+-------+-------------+----------+
business_sequence
+----------------------+-------------+-------------+
| business_sequence_id | business_id | sequence_id |
+----------------------+-------------+-------------+
customer_type_id and business_id are foreign key fields that link to tables that describe the type of customer (customer, former customer, etc.) and the business's information (name, address, etc.) respectively.
The reason why I have the required column in my sequence table is so that if a business decides not to allow for any of the non-required sequences, then that business would not need a row. After all, there's no need to have duplicate rows in the business_sequence table if the only piece of data that is different is the business_id field.
Now what I'm trying to do is get all the rows and all the fields from the sequence table where the business_id in the business_sequence table matches a parameterized value (say 1 for the example that I'm going to show in a second). The query that I tried to use is:
SELECT
s.*
FROM
`sequence` AS s
INNER JOIN `business_sequence` AS b ON b.`sequence_id` = s.`sequence_id`
WHERE
b.`business_id` = 1 AND
s.`required` = true;
But this returned no results if the business had no rows in the business_sequence table. What I expected was the 0 rows matching b.business_id = 1 plus the 4 "required" rows (ids: 1, 2, 4, and 5) matching s.required = true.
When I took out the INNER JOIN and the business_id portion of the WHERE clause, it did in fact return the 4 "required" rows. This leads me to believe that my original query returns nothing because there are no rows for that particular business_id in the business_sequence table.
With all of this being said, how do I accomplish retrieving the zero or more rows when the business_id field matches the parameterized value and retrieve all of the rows when the required field is true?
How about using an OR condition instead of AND?
SELECT
s.*
FROM
`sequence` AS s
INNER JOIN `business_sequence` AS b ON b.`sequence_id` = s.`sequence_id`
WHERE
b.`business_id` = 1 OR
s.`required` = true;
I was able to resolve my problem by performing a UNION as such:
SELECT * FROM `sequence` WHERE `required` = true
UNION
SELECT
s.*
FROM
`sequence` AS s
INNER JOIN `business_sequence` AS b ON b.`sequence_id` = s.`sequence_id`
WHERE
b.`business_id` = 1
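A minimal sketch of why the UNION works where the original INNER JOIN query did not, using Python's built-in sqlite3 module as a stand-in for MySQL. The sequence rows follow the table in the question (with required stored as 1/0); the single business_sequence row and the function names are assumptions made up for the example.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sequence (
            sequence_id INTEGER PRIMARY KEY,
            customer_type_id INTEGER,
            required INTEGER);
        CREATE TABLE business_sequence (
            business_sequence_id INTEGER PRIMARY KEY,
            business_id INTEGER,
            sequence_id INTEGER);
        INSERT INTO sequence VALUES
            (1, 1, 1), (2, 1, 1), (3, 1, 0), (4, 2, 1), (5, 3, 1), (6, 4, 0);
        -- Hypothetical: business 1 opted into optional sequence 3;
        -- business 2 has no rows at all.
        INSERT INTO business_sequence VALUES (1, 1, 3);
    """)
    return conn


def inner_join_only(conn, business_id):
    """The original query: the join discards sequences with no business row."""
    sql = """
        SELECT s.sequence_id
          FROM sequence AS s
         INNER JOIN business_sequence AS b ON b.sequence_id = s.sequence_id
         WHERE b.business_id = ? AND s.required = 1
    """
    return sorted(row[0] for row in conn.execute(sql, (business_id,)))


def with_union(conn, business_id):
    """Required sequences plus whatever the business selected."""
    sql = """
        SELECT sequence_id FROM sequence WHERE required = 1
        UNION
        SELECT s.sequence_id
          FROM sequence AS s
         INNER JOIN business_sequence AS b ON b.sequence_id = s.sequence_id
         WHERE b.business_id = ?
    """
    return sorted(row[0] for row in conn.execute(sql, (business_id,)))


if __name__ == "__main__":
    conn = make_db()
    print(inner_join_only(conn, 2), with_union(conn, 2), with_union(conn, 1))
```

The inner join can only produce rows that exist in business_sequence, so required sequences that were never stored there are lost before the WHERE clause is even evaluated. The UNION collects the two sets independently and removes duplicates.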

MySQL Intermediate-Level Table Relationship

Each row in Table_1 needs to have a relationship with one or more rows that might come from any number of other tables in the database (Table_X). So I set up an intermediate table (Table_2) where each row contains an id from Table_1, and the id from Table_X. It also has its own auto increment id since none of the relationships will be exclusive and therefore both the other ids will not be unique in the table.
My problem now is that when I retrieve the row from Table_1 and would like to see the information from each related row from Table_X, I don't know how to get it. At first I thought I could create a column for the exact name of Table_X for each row in Table_2 and have a second SELECT statement using that information, but I've been seeing inklings about things such as foreign keys and join statements that I think I need to get into. I'm just having trouble sorting it all out. Do I even need Table_2?
This probably isn't overly complicated, but I'm just getting into MySQL and this is the first real challenge I've encountered.
Edit to include requested information: If I understand correctly, I think I'm dealing with a many to many relationship. Table_3 has games; Table_1 has articles. An article can be about multiple games, and a game can also have multiple articles written about it. The only other possibly pertinent information I can see is that when a new article is made, every game that will be related to it is decided all at once. But the list of articles related to a given game can grow over time as more articles are written. That's probably not especially important, however.
If I understood correctly, you are talking about a one-to-many relationship (for example, one person can have multiple phone numbers). You can store the data in two separate tables, persons and phones.
Persons:
|person_id|person_name |person_age |
| 1 | Bodan Kustan| 28 |
Phones:
|phone_id |person_id |phone_number|
| 1 | 1 | 31337 |
| 2 | 1 | 370 |
Then you can execute a query with a join:
SELECT * FROM `persons`
LEFT JOIN `phones` ON `persons`.`person_id` = `phones`.`person_id`
WHERE `persons`.`person_id` = 1;
And it will return the person once for each phone number:
|person_id|person_name |person_age |phone_id |person_id |phone_number|
| 1 | Bodan Kustan| 28 | 1 | 1 | 31337 |
| 1 | Bodan Kustan| 28 | 2 | 1 | 370 |
Another possibility is a many-to-many relationship (for example, any person can love pizza, and pizza is not unique to that person). Then you need a third table, person_food, to join the tables together:
Persons:
|person_id|person_name |person_age |
| 1 | Bodan Kustan| 28 |
Food:
|food_id |food_name |
| 1 | meat |
| 2 | pizza |
Person_Food
|person_id |food_id |
| 1 | 2 |
Then you can execute a query with a join:
SELECT * FROM `persons`
LEFT JOIN `person_food` ON `persons`.`person_id` = `person_food`.`person_id`
LEFT JOIN `food` ON `food`.`food_id` = `person_food`.`food_id`
WHERE `persons`.`person_id` = 1;
And it will return data from all tables:
|person_id|person_name |person_age |person_id |food_id |food_name |
| 1 | Bodan Kustan| 28 | 1 | 2 | pizza |
However, sometimes you need to join an arbitrary number of tables; then you could use a separate table to hold information about the relation. My approach (I don't think it's the best) would be to store the table name next to the relation (for example, after splitting mobile phones and home phones into two separate tables):
Persons:
|person_id|person_name |person_age |
| 1 | Bodan Kustan| 28 |
Mobile_Phone:
|mobile_phone_id |mobile_phone_number |
| 1 | 31337 |
Home_Phone:
|home_phone_id |home_phone_number |
| 1 | 370 |
Person_Phone:
|person_id |related_id |related_column |related_table |
| 1 | 1 | mobile_phone_id | mobile_phone |
| 1 | 1 | home_phone_id | home_phone |
Then query the middle table to get all relations:
SELECT * FROM person_phone WHERE person_id = 1;
Then build a dynamic query (pseudocode, not tested -- might not work):
joins = ""
foreach (results as result)
    joins += " LEFT JOIN `{result.related_table}`
        ON `{result.related_table}`.`{result.related_column}` = `person_phone`.`related_id`
        AND `person_phone`.`related_table` = '{result.related_table}'"
final_sql = "SELECT * FROM `persons`
    LEFT JOIN `person_phone` ON `person_phone`.`person_id` = `persons`.`person_id`"
    + joins +
    " WHERE `persons`.`person_id` = 1"
So your final SQL would be:
SELECT * FROM `persons`
LEFT JOIN `person_phone` ON `person_phone`.`person_id` = `persons`.`person_id`
LEFT JOIN `mobile_phone` ON `mobile_phone`.`mobile_phone_id` = `person_phone`.`related_id` AND `person_phone`.`related_table` = 'mobile_phone'
LEFT JOIN `home_phone` ON `home_phone`.`home_phone_id` = `person_phone`.`related_id` AND `person_phone`.`related_table` = 'home_phone'
WHERE `persons`.`person_id` = 1;
You only need Table2 if entries in Table_x can be related to multiple rows in Table1 - otherwise a simple key for Table1 will suffice.
Look into joins - very powerful, flexible and fast.
select * from Table1 left join Table2 on Table1_id = Table2_table_1_id
left join Table_X on Tablex_id = Table2_table_x_id
Look at the output and you'll see that it returns all table_x rows with copies of the Table1 and Table2 fields.
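To tie this back to the articles and games from the question's edit, here is a minimal sketch of the intermediate-table (many-to-many) layout, using Python's built-in sqlite3 module as a stand-in for MySQL. The table names articles, games and article_game, the column names, and the sample titles are all hypothetical. The sketch also assumes a composite primary key on the link table instead of a separate auto-increment id, which is one common option rather than a requirement.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE games (game_id INTEGER PRIMARY KEY, title TEXT);
        -- Link table: one row per (article, game) pair.
        CREATE TABLE article_game (
            article_id INTEGER NOT NULL REFERENCES articles(article_id),
            game_id INTEGER NOT NULL REFERENCES games(game_id),
            PRIMARY KEY (article_id, game_id));
        -- Hypothetical sample data.
        INSERT INTO articles VALUES (1, 'Racing roundup'), (2, 'Chess review');
        INSERT INTO games VALUES (1, 'Kart Dash'), (2, 'Chess Pro');
        INSERT INTO article_game VALUES (1, 1), (1, 2), (2, 2);
    """)
    return conn


def games_for_article(conn, article_id):
    """All games linked to one article, via the link table."""
    sql = """
        SELECT g.title
          FROM article_game ag
          JOIN games g ON g.game_id = ag.game_id
         WHERE ag.article_id = ?
         ORDER BY g.title
    """
    return [row[0] for row in conn.execute(sql, (article_id,))]


def articles_for_game(conn, game_id):
    """The reverse direction: all articles written about one game."""
    sql = """
        SELECT a.title
          FROM article_game ag
          JOIN articles a ON a.article_id = ag.article_id
         WHERE ag.game_id = ?
         ORDER BY a.title
    """
    return [row[0] for row in conn.execute(sql, (game_id,))]


if __name__ == "__main__":
    conn = make_db()
    print(games_for_article(conn, 1))
    print(articles_for_game(conn, 2))
```

The same link table answers both questions, so new articles about an existing game only add rows to article_game and never change the games table.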

SELECT from Union x 3 using filter of another table

Background
I have a web application which must remove entries from other tables, filtered through a selection of 'tielists' from table 1 -> item_table 1, table 2, table 3.... now basically my result set is going to be filthy big unless I use a filter statement from another table, using a user_id... so can someone please help me structure my statement as needed? TY!
Tables
cars_belonging_to_user
-----------------------------
ID | user_id | make | model
----------------------------
1 | 1 | Toyota | Camry
2 | 1 |Infinity| Q55
3 | 1 | DMC | DeLorean
4 | 2 | Acura | RSX
Okay, Now the three 'tielists'
name:tielist_one
----------------------------
id | id_of_car | id_x | id_y|
1 | 1 | 12 | 22 |
2 | 2 | 23 | 32 |
-----------------------------
name:tielist_two
-------------------------------
id | id_of_car | id_x | id_z|
1 | 3 | 32 | 22 |
-----------------------------
name: tielist_three
id | id_of_car | id_x | id_a|
1 | 4 | 45 | 2 |
------------------------------
Result Set and Code
echo name_of_tielist_table
// I can structure if statements to echo result sets based upon the name
// Future Methodology: if car_id is in tielist_one, delete id_x from x_table, delete id_y from y_table...
// My output should be a double select base:
--SELECT * tielists from WHERE car_id is 1... output name of tielist... then
--SELECT * from specific_tielist where car_id is 1.....delete x_table, delete y_table...
Considering the list will be massive, and the tielist equally long, I must filter the results where car_id(id) = $variable && user_id = $id....
Side Notes
Only one car id will appear once in any single tielist..
This select statement MUST be filtered with user_id = $variable... (and remember, i'm looking for which car id too)
I MUST HAVE THE NAME of the tielist it comes from able to be echo'd into a variable...
I will only be looking for one single id_of_car at any given time, because this select will be contained in a foreach loop.
I was thinking a union all items would do the trick to select the row, but how can I get the name of the tielist the row is in, and how can the filter be used from the user_id row
If you want performance, I would suggest left outer join instead of union all. This will allow the query to make efficient use of indexes for your purpose.
Based on what you say, a car is in exactly one of the lists. This is important for this method to work. Here is the SQL:
select cu.*,
       coalesce(tl1.id_x, tl2.id_x, tl3.id_x) as id_x,
       tl1.id_y, tl2.id_z, tl3.id_a,
       (case when tl1.id is not null then 'One'
             when tl2.id is not null then 'Two'
             when tl3.id is not null then 'Three'
        end) as TieList
from cars_belonging_to_user cu left outer join
     tielist_one tl1
     on cu.id = tl1.id_of_car left outer join
     tielist_two tl2
     on cu.id = tl2.id_of_car left outer join
     tielist_three tl3
     on cu.id = tl3.id_of_car;
You can then add a where clause to filter as you need.
If you have an index on id_of_car for each tielist table, then the performance should be quite good. If the where clause uses an index on the first table, then the joins and where should all be using indexes, and the query will be quite fast.
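A minimal sketch of the left-outer-join lookup with the user_id and car id filter added, using Python's built-in sqlite3 module as a stand-in for MySQL. The rows are the sample data from the question. The function name find_tielist is made up, and returning the actual table name in the tielist column (rather than 'One', 'Two', 'Three') is an assumption about what is most convenient to echo into a variable.

```python
import sqlite3

LOOKUP_SQL = """
    SELECT cu.id,
           COALESCE(tl1.id_x, tl2.id_x, tl3.id_x) AS id_x,
           CASE WHEN tl1.id IS NOT NULL THEN 'tielist_one'
                WHEN tl2.id IS NOT NULL THEN 'tielist_two'
                WHEN tl3.id IS NOT NULL THEN 'tielist_three'
           END AS tielist
      FROM cars_belonging_to_user cu
      LEFT OUTER JOIN tielist_one   tl1 ON tl1.id_of_car = cu.id
      LEFT OUTER JOIN tielist_two   tl2 ON tl2.id_of_car = cu.id
      LEFT OUTER JOIN tielist_three tl3 ON tl3.id_of_car = cu.id
     WHERE cu.user_id = ? AND cu.id = ?
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cars_belonging_to_user (
            id INTEGER PRIMARY KEY, user_id INTEGER, make TEXT, model TEXT);
        CREATE TABLE tielist_one (
            id INTEGER PRIMARY KEY, id_of_car INTEGER, id_x INTEGER, id_y INTEGER);
        CREATE TABLE tielist_two (
            id INTEGER PRIMARY KEY, id_of_car INTEGER, id_x INTEGER, id_z INTEGER);
        CREATE TABLE tielist_three (
            id INTEGER PRIMARY KEY, id_of_car INTEGER, id_x INTEGER, id_a INTEGER);
        INSERT INTO cars_belonging_to_user VALUES
            (1, 1, 'Toyota', 'Camry'), (2, 1, 'Infinity', 'Q55'),
            (3, 1, 'DMC', 'DeLorean'), (4, 2, 'Acura', 'RSX');
        INSERT INTO tielist_one VALUES (1, 1, 12, 22), (2, 2, 23, 32);
        INSERT INTO tielist_two VALUES (1, 3, 32, 22);
        INSERT INTO tielist_three VALUES (1, 4, 45, 2);
    """)
    return conn


def find_tielist(conn, user_id, car_id):
    """Return (car id, id_x, tielist name), or None if the car is not this user's."""
    return conn.execute(LOOKUP_SQL, (user_id, car_id)).fetchone()


if __name__ == "__main__":
    conn = make_db()
    print(find_tielist(conn, 1, 3))
    print(find_tielist(conn, 1, 4))
```

Because the user_id test sits in the WHERE clause on the cars table, a car id that belongs to a different user yields no row at all, so the foreach loop never sees tielist data for someone else's car.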

Mysql - deleting duplicates

I have a table with a barcode column with a unique index. The data has been loaded with additional chars (-xx) at the end of each barcode to prevent duplicates, but there will be lots of duplicates once I strip off the suffix. Here is a sample of the data:
itemnumber barcode
17912 2-14
18082 2-1
21870 2-10
29219 2-8
Then I created two temporary tables, marty and manny, both with the itemnumber and the stripped-down barcodes. So, both tables would contain
itemnumber barcode
17912 2
18082 2
21870 2
29219 2
etc
And then I tried to delete all but the first entry with barcode '2' in the marty table (and likewise for every other barcode). I hoped then to update the original table with the correct first entry, and the users could fix up the duplicates themselves in time in the application.
So, this was my query to delete all but the first entry in the marty table for each barcode
DELETE FROM marty
WHERE itemnumber NOT IN
(SELECT MIN(itemnumber) FROM manny GROUP BY barcode)
There are 130,000 rows in marty and manny. The query took over 24 hours and then didn't finish properly. The connection to the server crashed and the query did not do all the updates.
Is there a better way to approach this that would not use the subquery, which I think is causing the delay? The GROUP BY is probably slowing things down too with so many records.
Thanks
One more variant: this variant works without any temporary tables for deleting duplicates:
DELETE m1
FROM marty m1
JOIN marty m2
    ON m1.barcode = m2.barcode
    AND m1.itemnumber > m2.itemnumber;
Here is a two-stage approach that avoids use of NOT IN. It also does not use the temporary table "manny". First, join "marty" to itself to pick out rows for which itemnumber != min(itemnumber). Use UPDATE to set barcode for these rows to NULL. A second pass with DELETE then removes all rows that were flagged in the first phase.
For this example, I split the barcode column of "marty" into two columns; it could be done with the table in its original format with some modification (need to split the column values on the fly).
select * from marty;
+------------+---------+---------+
| itemnumber | barcode | subcode |
+------------+---------+---------+
| 17912 | 2 | 14 |
| 18082 | 2 | 1 |
| 21870 | 2 | 10 |
| 29219 | 2 | 8 |
| 30133 | 3 | 5 |
| 30134 | 3 | 7 |
| 30139 | 3 | 9 |
| 30142 | 3 | 12 |
+------------+---------+---------+
8 rows in set (0.00 sec)
UPDATE
(marty m1
JOIN
(SELECT barcode,
MIN(itemnumber) AS itemnumber
FROM marty
GROUP BY barcode) m2
USING(barcode))
SET m1.barcode = NULL WHERE m1.itemnumber != m2.itemnumber;
mysql> select * from marty;
+------------+---------+---------+
| itemnumber | barcode | subcode |
+------------+---------+---------+
| 17912 | 2 | 14 |
| 18082 | NULL | 1 |
| 21870 | NULL | 10 |
| 29219 | NULL | 8 |
| 30133 | 3 | 5 |
| 30134 | NULL | 7 |
| 30139 | NULL | 9 |
| 30142 | NULL | 12 |
+------------+---------+---------+
8 rows in set (0.00 sec)
DELETE FROM marty WHERE barcode IS NULL;
MySQL is notoriously slow when using IN with very large sets. A scripted alternative:
Use a script to construct a long itemnumber = X OR itemnumber = y OR itemnumber = z clause (chunks size ~1000) and INSERT the matched rows (i.e. the ones that would not have been DELETEd in your previous query) into a new table, TRUNCATE the existing and load the contents of the new table back into the old with INSERT INTO marty SELECT * FROM marty_tmp.
You may want to lock the table or run in a transaction for the final TRUNCATE, INSERT.
edit:
Query SELECT MIN(itemnumber) FROM manny GROUP BY barcode from a script, store results in desiredItemNumbers array
Take batches of 1000 desiredItemNumbers and construct this query: INSERT INTO marty_tmp SELECT * FROM marty WHERE itemnumber = desiredItemNumbers[0] OR itemnumber = desiredItemNumbers[1] .... Rerun this query until you've exhausted the desiredItemNumbers array (n.b. the last query will probably have fewer than 1000 desiredItemNumbers).
You now have a table with the results that you would have been left with had you DELETEd the rest, so swap the contents of the marty and marty_tmp tables.
TRUNCATE marty;
INSERT INTO marty SELECT * FROM marty_tmp;
If you are creating temp tables anyway, how about building your table with an "INSERT INTO ... SELECT ..." or "CREATE TABLE ... AS ..." based on:
SELECT MIN(itemnumber) AS itemnumber, barcode
FROM marty
GROUP BY barcode
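A minimal sketch of that last suggestion, building the de-duplicated table directly from the GROUP BY instead of deleting rows, using Python's built-in sqlite3 module as a stand-in for MySQL. The target table name marty_dedup and the function names are made up; the rows are the eight sample rows shown in the two-stage answer above, with the suffix already stripped.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE marty (itemnumber INTEGER PRIMARY KEY, barcode TEXT);
        INSERT INTO marty VALUES
            (17912, '2'), (18082, '2'), (21870, '2'), (29219, '2'),
            (30133, '3'), (30134, '3'), (30139, '3'), (30142, '3');
    """)
    return conn


def build_deduped(conn):
    """Keep only the lowest itemnumber for each barcode in a new table."""
    conn.execute("""
        CREATE TABLE marty_dedup AS
        SELECT MIN(itemnumber) AS itemnumber, barcode
          FROM marty
         GROUP BY barcode
    """)
    return conn.execute(
        "SELECT itemnumber, barcode FROM marty_dedup ORDER BY itemnumber"
    ).fetchall()


if __name__ == "__main__":
    print(build_deduped(make_db()))
```

This reads marty once and never runs a per-row subquery, which is the part that made the NOT IN delete so slow. An index on barcode should help the GROUP BY on the full 130,000 rows, though that is worth confirming with EXPLAIN on the real table.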