Merge data from 2 tables, use only unique rows - mysql

I have two tables in my database.
Table "primary":
primary_id
primary_date
primary_measuredData
Table "temporary":
temporary_id
temporary_date
temporary_measuredData
The tables have other columns as well, but these are the important ones.
What I want is the following.
The "primary" table contains verified measuredData. If data is available there, the output should come from primary; if it is not available in primary, it should come from temporary.
In about 99.99% of cases all old data is in primary, and only the last day comes from the temporary table.
Example:
primary table:
2013-02-05; 345
2013-02-07; 123
2013-02-08; 3425
2013-02-09; 334
temporary table:
2013-02-06; 567
2013-02-07; 1345
2013-02-10; 31
2013-02-12; 33
I am looking for the SQL query that outputs:
2013-02-05; 345 (from primary)
2013-02-06; 567 (from temporary, no value available from prim)
2013-02-07; 123 (from primary, both prim & temp have this date so primary is used)
2013-02-08; 3425 (primary)
2013-02-09; 334 (primary)
2013-02-10; 31 (temp)
2013-02-12; 33 (temp)
As you can see, there are no duplicate dates, and if data is available in the primary table then that row is used.
I have no idea how to solve this, so I can't give you any "this is what I've done so far :D"
Thanks!
EDIT:
The value of "measuredData" can differ between temp and primary. This is because temp stores a temporary value; later, when the data is verified, it goes into the primary table.
EDIT 2:
I changed the primary table and added a new column, "temporary", so that I store all the data in the same table. When the primary data is updated, it overwrites the temporary data with the new numbers. This way I don't need to merge two tables into one.

You should start with a UNION query like this:
SELECT p.primary_date AS dt, p.primary_measuredData AS measured
FROM `primary` p
UNION ALL
SELECT t.temporary_date, t.temporary_measuredData
FROM `temporary` t
LEFT JOIN `primary` p ON p.primary_date = t.temporary_date
WHERE p.primary_date IS NULL
A LEFT JOIN where there is no match (p.primary_date IS NULL) returns all rows from the temporary table that are not present in the primary table, and the UNION ALL adds every row from the primary table. You might want to add an ORDER BY clause to the whole query.
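As a sanity check, here is the same pattern run end-to-end in SQLite standing in for MySQL (the syntax of this particular query is the same in both), using the sample data from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE "primary"   (primary_date TEXT, primary_measuredData INTEGER);
    CREATE TABLE "temporary" (temporary_date TEXT, temporary_measuredData INTEGER);
    INSERT INTO "primary" VALUES
        ('2013-02-05', 345), ('2013-02-07', 123),
        ('2013-02-08', 3425), ('2013-02-09', 334);
    INSERT INTO "temporary" VALUES
        ('2013-02-06', 567), ('2013-02-07', 1345),
        ('2013-02-10', 31), ('2013-02-12', 33);
""")
# All primary rows, plus only those temporary rows whose date has no
# match in primary (the anti-join via LEFT JOIN ... IS NULL).
rows = con.execute("""
    SELECT p.primary_date AS dt, p.primary_measuredData AS measured
    FROM "primary" p
    UNION ALL
    SELECT t.temporary_date, t.temporary_measuredData
    FROM "temporary" t
    LEFT JOIN "primary" p ON p.primary_date = t.temporary_date
    WHERE p.primary_date IS NULL
    ORDER BY dt
""").fetchall()
for dt, measured in rows:
    print(dt, measured)
```

Note that 2013-02-07 comes out as 123 (the primary value), exactly as the question asks.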


create a table with avg() from other table that updates automatically like a foreign key

I have a "rating" table with multiple columns, like the one below:
user_id | subject_id | subject_rating
--------+------------+---------------
   1    |     25     |       9
   2    |     20     |       6
   1    |     20     |       8
   3    |     25     |      10
I want to create another table, "av_rating", with a column that stores the average rating from the subject_rating column of the "rating" table above. I have tried the following:
CREATE TABLE av_rating AS SELECT AVG(subject_rating) FROM rating WHERE subject_id = 25
The problem with this code is that it does not update the value in the new table after the initial value is stored, even if the values in the original "rating" table change. I tried using a FOREIGN KEY REFERENCE with AVG() but couldn't get the syntax right.
Thank you, and I am sorry if it's unclear.
Depending on the scale of data you are dealing with, you will probably want to create either a view or a materialized view (which don't exist natively in MySQL but can be approximated). For a small dataset (let's say fewer than 100k rows), or if the performance of the query isn't a major concern, a plain view should suffice.
Typically one wouldn't include such a specific WHERE clause in the definition of a view or materialized view. Instead, you probably want to compute the average subject_rating within each subject and then join to those values on subject_id.
CREATE OR REPLACE VIEW av_rating AS
SELECT subject_id, AVG(subject_rating) AS avg
FROM rating
GROUP BY subject_id;
Which in the example case would generate the following:
SELECT *
FROM av_rating
;
---------------------------------
subject_id | avg
------------+--------------------
25 | 9.5000000000000000
20 | 7.0000000000000000
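To show the key property (the view is re-evaluated on every read, so it never goes stale), here is the same approach run in SQLite standing in for MySQL; the alias avg_rating is my own naming:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE rating (user_id INTEGER, subject_id INTEGER, subject_rating INTEGER);
    INSERT INTO rating VALUES (1, 25, 9), (2, 20, 6), (1, 20, 8), (3, 25, 10);
    CREATE VIEW av_rating AS
        SELECT subject_id, AVG(subject_rating) AS avg_rating
        FROM rating
        GROUP BY subject_id;
""")
before = dict(con.execute("SELECT * FROM av_rating"))  # subject 20 -> 7.0, subject 25 -> 9.5
# Unlike CREATE TABLE ... AS SELECT, a view is recomputed on every read,
# so inserting a new rating immediately changes the result:
con.execute("INSERT INTO rating VALUES (4, 20, 10)")
after = dict(con.execute("SELECT * FROM av_rating"))   # subject 20 -> 8.0, subject 25 -> 9.5
```

This is exactly what the CREATE TABLE ... AS SELECT attempt could not do: there, the average was frozen at creation time.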

Many-to-many indexing issue, MySQL

I have a many-to-many table that is going to have millions of rows. Let me describe my confusion with an example.
Example:
table: car_dealer_rel
Option 1:
columns: car_id: int unsigned, dealer_id: int unsigned
index on: car_id, dealer_id
car_id | dealer_id
-------+----------
   1   |    1
   1   |    2
...
Sub-option 1a: an index on each of the two columns.
Sub-option 1b: one combined index on both columns.
Option 2:
A one-column table:
col: car_id_dealer_id: varchar(21)
index on: primary key on this single column.
The idea here is to store values as car_id.dealer_id and search with LIKE patterns such as '%.xxx' or 'xxx.%':
car_id_dealer_id
----------------
1.1
1.2
1.15
2.10
...
After millions of records, which will be faster for:
reads
adds/updates/deletes?
I am a novice at MySQL; all help is appreciated.
With the first option,
car_id | dealer_id
-------+----------
   1   |    1
   1   |    2
you can easily create a composite index for each direction:
create index ind1 on car_dealer_rel (car_id, dealer_id);
create index ind2 on car_dealer_rel (dealer_id, car_id);
These work very fast, and you can easily filter in either direction:
where car_id = your_value
or
where dealer_id = another_value
or using both.
With the second option you can't do this easily (you frequently need string manipulation, which prevents the use of the index), and some conditions can't be expressed in SQL at all.
For update, insert and delete, the performance remains practically the same either way.
It depends on the actual queries that you run, and I suggest running EXPLAIN first, with a good amount of dummy data, to understand how MySQL is going to execute your query.
But if you are going to find records by car_id alone, or by car_id and dealer_id together, you can use the composite index (car_id, dealer_id).
If you also want to find rows by dealer_id alone, you can add an additional index on the dealer_id column.
Your one-column-table option is not a good choice because:
You cannot find rows by dealer_id quickly (a leading-wildcard LIKE cannot use the index).
The table schema is not normalized.
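The "run EXPLAIN first" advice is easy to try out. As a sketch using SQLite's EXPLAIN QUERY PLAN (MySQL's EXPLAIN output format differs, but the conclusion about the two composite indexes is the same): with both indexes in place, a lookup by either column alone is served by an index rather than a table scan:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE car_dealer_rel (car_id INTEGER, dealer_id INTEGER);
    CREATE INDEX ind1 ON car_dealer_rel (car_id, dealer_id);
    CREATE INDEX ind2 ON car_dealer_rel (dealer_id, car_id);
""")

def plan(sql):
    # Concatenate the "detail" column of the query plan into one string.
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Each filter direction is answered by one of the two composite indexes
# (each index's leading column matches one of the WHERE clauses).
p1 = plan("SELECT * FROM car_dealer_rel WHERE car_id = 1")
p2 = plan("SELECT * FROM car_dealer_rel WHERE dealer_id = 2")
print(p1)
print(p2)
```

Both plans report an index search; dropping ind2 and re-running the second query would show it degrade to a full scan.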

(MS Access) How to return a field of a different record in a query?

Let's say I have a table with 10 records labeled 1 through 10, and each record contains two fields. I want to create a query that shows me Field 1 of record N with Field 2 of record N+1. For example, the query would show Field 1 of record 3 with Field 2 of record 4. Is this possible?
It is possible, and not particularly complex.
Given a table tblFoo with FooId as Primary Key and the two additional fields FooText and BarText, the SQL to get the desired results would look like this:
SELECT f1.FooText, f2.BarText
FROM tblFoo AS f1
LEFT JOIN tblFoo AS f2
ON f1.FooId + 1 = f2.FooId
While it is simple to implement, performance will not be ideal for large tables, because the expression FooId + 1 prevents the query engine from using the primary key index while retrieving the results.
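For the curious, the same self-join runs anywhere, not just in Access; here it is in SQLite with hypothetical sample data. Note that the last record pairs with NULL, since there is no record N+1 for it:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tblFoo (FooId INTEGER PRIMARY KEY, FooText TEXT, BarText TEXT);
    INSERT INTO tblFoo VALUES
        (1, 'foo1', 'bar1'), (2, 'foo2', 'bar2'),
        (3, 'foo3', 'bar3'), (4, 'foo4', 'bar4');
""")
# Pair Field 1 of record N with Field 2 of record N+1 via a self-join.
rows = con.execute("""
    SELECT f1.FooText, f2.BarText
    FROM tblFoo AS f1
    LEFT JOIN tblFoo AS f2 ON f1.FooId + 1 = f2.FooId
    ORDER BY f1.FooId
""").fetchall()
```

One caveat worth keeping in mind: the FooId + 1 trick assumes the IDs are contiguous; a gap in the numbering produces a NULL pair in the middle as well.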

MySQL relational database: duplicates with different keys

I've been trying to correct a relational DB for a month, but I can't find an efficient solution.
Here is my problem:
I have a relational DB with about 534 M rows and lots of foreign keys (30).
I can handle ordinary duplicates while inserting, with UNION ... GROUP BY ... HAVING COUNT(*) = 1, but there are also duplicates that have different keys.
Example:
table1
id | key1 | value
---+------+------
 1 |  11  |  a1
 2 |  22  |  a1
table2
key1 | value
-----+------
 11  |  a2
 22  |  a2
Foreign key: table1(key1) references table2(key1)
I'm trying to find and remove the duplicates, and correct the parent references.
I have tried three different approaches.
1: PHP script, arrays
Export the tables (dump) --> array_unique, find duplicates, correct the parent arrays --> import the tables.
It's pretty fast, but it needs 80 GB of memory, which could be a problem in the future.
2: PHP script, SQL queries
Export the tables (dump) --> find duplicates --> send queries to the parent table.
No memory needed, but the tables are really big and 5 queries take 1 second; 50 M duplicates would take days, months, years.
3: ON DUPLICATE KEY UPDATE: I added one column, "duplicate", to store duplicate keys, defined all columns except the key as a unique key, and then ran
INSERT ... ON DUPLICATE KEY UPDATE duplicate = CONCAT(duplicate, ';', VALUES(key)).
But some tables have more than one key, sometimes I would have to define 24 columns as a unique index, and then there are memory problems again.
I hope I could explain my problem. Do you have any ideas?
Why don't you simply create a unique key on the column? Use the IGNORE keyword and it will remove the duplicate records, so your query will be something like:
ALTER IGNORE TABLE testdb.table1
ADD UNIQUE INDEX column1 (column1 ASC);
(Note that ALTER IGNORE TABLE was removed in MySQL 5.7, so this only works on 5.6 and earlier. It also does not fix the child rows that still reference the deleted duplicates.)
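If you stay with plain SQL, the two-step fix implied by the question (first repoint children at one surviving parent, then delete the orphaned duplicate parents) can be sketched like this; SQLite stands in for MySQL here, and the table layout follows the question's example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table2 (key1 INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE table1 (id INTEGER PRIMARY KEY,
                         key1 INTEGER REFERENCES table2(key1),
                         value TEXT);
    INSERT INTO table2 VALUES (11, 'a2'), (22, 'a2');   -- duplicates: same value, different keys
    INSERT INTO table1 VALUES (1, 11, 'a1'), (2, 22, 'a1');
""")
# Step 1: repoint every child at the smallest key1 among parents
# sharing the same value (the "survivor" of each duplicate group).
con.execute("""
    UPDATE table1
    SET key1 = (SELECT MIN(t.key1) FROM table2 t
                WHERE t.value = (SELECT value FROM table2
                                 WHERE key1 = table1.key1))
""")
# Step 2: delete every parent that is not the survivor of its group.
con.execute("""
    DELETE FROM table2
    WHERE key1 NOT IN (SELECT MIN(key1) FROM table2 GROUP BY value)
""")
child_keys = [r[0] for r in con.execute("SELECT key1 FROM table1 ORDER BY id")]
parent_keys = [r[0] for r in con.execute("SELECT key1 FROM table2")]
```

At 534 M rows the correlated subquery in step 1 would be better written as a JOIN-based UPDATE against a precomputed survivor mapping, but the two-step order (children first, parents second) is the part that matters for the foreign keys.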

Moving records around from tables

I have a question regarding the design of two tables.
Table 1: the main table, called Batch. Values are added here by parsing files.
Table 2: this table works like a log table; every row that is deleted from Table 1 goes here.
Example
Table 1
ID | text
---+-------
 1 | 'bla1'
 2 | 'bla2'
 3 | 'bla3'
Delete the rows where the ID is 2 or 3:
Table 2
ID | text
---+-------
 2 | 'bla2'
 3 | 'bla3'
Problem:
What if I insert IDs 2 and 3 into Table 1 again and then delete them? Table 2 would contain the same data twice. How can I fix this? Should I make ID an identity column? Then, when I add two records, it would look like this (additional question: how do I keep the counter going if I delete everything in Table 1?):
Table 1
ID | text
---+-------
 4 | 'Bla3'
 5 | 'Bla4'
Just have a unique identifier for Table 1. This identifier should be unique to this table, not to the data you load. You can then load id 100 from your source file as many times as you want; each copy gets its own unique identifier in Table 1.
An identity column seems to fit your requirements for this. I'd look into storing more audit data as well: perhaps which file each row came from, when it was loaded, who loaded it, etc.
As for filling the log table, you can just attach a trigger to Table 1 that fills Table 2 with the deleted rows; that should be pretty straightforward.
It seems that in your design Table 1 uses a surrogate key. In that case you should also define a natural key for your purpose. Then Table 2 will contain the natural key and the values of the erased Table 1 data.
Because the same data can be erased several times, you should add a timestamp field to your Table 2:
create table table1 (
    id int identity primary key,
    [text] varchar(50) not null unique,
    ... other data ...
)
create table table2 (
    [text] varchar(50) not null,
    erased datetime not null,
    ... other data ...
    constraint table2_pk
        primary key ( [text], erased )
)
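The trigger suggestion from the first answer can be sketched as follows; SQLite syntax here stands in for SQL Server (an AFTER DELETE trigger, with AUTOINCREMENT in place of IDENTITY, which also keeps the counter going after rows are deleted, answering the additional question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY AUTOINCREMENT, text TEXT);
    CREATE TABLE table2 (text TEXT, erased TEXT,
                         PRIMARY KEY (text, erased));
    -- Every deleted row is copied to the log with a deletion timestamp
    -- (in SQL Server the trigger would read from the "deleted" pseudo-table).
    CREATE TRIGGER log_delete AFTER DELETE ON table1
    BEGIN
        INSERT INTO table2 VALUES (OLD.text, datetime('now'));
    END;
    INSERT INTO table1(text) VALUES ('bla1'), ('bla2'), ('bla3');
    DELETE FROM table1 WHERE id IN (2, 3);
""")
logged = [r[0] for r in con.execute("SELECT text FROM table2 ORDER BY text")]
# AUTOINCREMENT remembers the highest id ever used, so the next insert
# gets id 4 even though rows 2 and 3 are gone.
next_id = con.execute(
    "SELECT seq + 1 FROM sqlite_sequence WHERE name = 'table1'"
).fetchone()[0]
```

The (text, erased) primary key on the log is what allows the same text to be deleted more than once without a collision, as the second answer points out.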