Possibly a duplicate; I've been searching for the specific answer I need but couldn't find it. I have two simple tables:
Source:
| Id | companyName | adress |
|----|-------------|---------|
| 1 | aquatics | street1 |
| 2 | rivers | street2 |
Target:
| Id | nameCompany | companyAdress |
|----|-------------|----------------|
| 1 | aquatics | street1 |
| 2 | rivers | |
I simplified the matter: I have two sets of data. The source table is external data, and I, as a dev, want to update my own table with that external data.
So we see that in the source everything is filled in.
In my target I'm missing some info; in this case I'm missing the address.
How can I run a query that checks: this row is incomplete, let me update this target row with the source data?
The only problem is that the external data uses different names for the same columns I have.
I've only been working with MySQL for a couple of days, so please explain it noob-friendly. I tried some things but couldn't figure it out.
The query below updates companyAdress in the target table when the adress column in the Source table is not equal to companyAdress in the target table.
If you only need to fill in the empty values of target.companyAdress, change the condition where s.adress <> t.companyAdress to where t.companyAdress = '' (a variant is sketched below the demo link).
update target t
inner join `Source` s
on t.nameCompany=s.companyName
set t.companyAdress=s.adress
where s.adress <> t.companyAdress ;
Demo: https://www.db-fiddle.com/f/pB6b5xrgPKCivFWcpQHsyE/0
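A minimal sketch of that empty-only variant; the extra IS NULL check is an assumption, in case missing addresses are stored as NULL rather than '':

update target t
inner join `Source` s
  on t.nameCompany = s.companyName
set t.companyAdress = s.adress
where t.companyAdress = '' or t.companyAdress is null;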
I have the following table "texts"
+---------+-------------+-------------+-----------+--------------------+
| txt_id | txt_lang_id | txt_section | txt_code | txt_value |
+---------+-------------+-------------+-----------+--------------------+
| 1 | 1 | home | txt_title | Home |
| 2 | 1 | home | txt_btn | I'm a button |
| 3 | 1 | home | txt_welc | Welcome to home |
etc...
I have multiple databases, one for each company, plus a master database where the texts are created. In addition, in each company the administrator can customize their texts.
My idea is to create a query that inserts the new texts into each database and, if a text already exists, updates its value.
Is it possible for the query INSERT INTO ... ON DUPLICATE KEY UPDATE to have several conditions, like txt_lang_id = 1 AND txt_section = 'home' AND txt_code = 'txt_title' SET txt_value = 'New home'?
I'd like it to work that way because the same function would be reused for other tables, such as configuration, which starts empty and is populated as the company administrator changes the default options, so the auto id is not always in the same order for all companies.
Is it possible to do something like this, or should I instead look for a way to keep the rows always in the same order? Thanks.
You can use ON DUPLICATE KEY UPDATE with a CASE expression, e.g.:
INSERT INTO ... ON DUPLICATE KEY UPDATE txt_value =
  (CASE WHEN txt_lang_id = 1 AND txt_section = 'home'
        AND txt_code = 'txt_title' THEN 'New home' ELSE txt_value END);
Here's the SQL Fiddle.
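As a fuller sketch, assuming the texts table has a UNIQUE key over (txt_lang_id, txt_section, txt_code) and that the column list below matches your table, the full statement could look like:

INSERT INTO texts (txt_lang_id, txt_section, txt_code, txt_value)
VALUES (1, 'home', 'txt_title', 'New home')
ON DUPLICATE KEY UPDATE
  txt_value = (CASE WHEN txt_lang_id = 1 AND txt_section = 'home'
                    AND txt_code = 'txt_title' THEN 'New home'
                    ELSE txt_value END);

Note that the duplicate is detected by the unique key, not by the CASE conditions; the CASE only decides which value ends up in txt_value.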
When a row is duplicated in our system, it inserts the new row with a reference back to the ID of the one it was duplicated from.
If that new row is then duplicated, it has a reference back to the row it was duplicated from and so on.
What I can't work out is how to follow this trail in a SELECT.
Take the following data...
+------+-------------------+
| ID | duplicated_from |
+------+-------------------+
| 1 | NULL |
| 2 | 1 |
| 3 | 2 |
| 4 | NULL |
+------+-------------------+
So, given ID 1, how would you look up all the slides in the chain that have been duplicated off it?
Or is this something that will have to be done at an application level?
It seems that you are after a recursive query. I found this solution that may help you: How to create a MySQL hierarchical recursive query
HTH
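If you are on MySQL 8.0 or later, a minimal sketch using a recursive CTE (assuming the table is called slides):

WITH RECURSIVE chain AS (
  SELECT id, duplicated_from
  FROM slides
  WHERE id = 1                          -- the starting row
  UNION ALL
  SELECT s.id, s.duplicated_from
  FROM slides s
  JOIN chain c ON s.duplicated_from = c.id
)
SELECT * FROM chain;

On older versions you would need one of the workarounds from the linked question (such as session variables or a fixed number of self-joins), or walk the chain at application level.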
So far we have been storing change information as follows.
Imagine a changeset table for something that gets changed, called an object. The object is connected to, say, a foreign element by a foreign key. The object gets created like this:
changesetId (Timestamp) | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52 | 2 | 123 | none | none
Now we change the name; after the name change the table will look like this:
changesetId (Timestamp) | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52 | 2 | 123 | none | none
2015-04-29 23:30:01 | 2 | null | foo | null
This structure is the exact minimum: it contains only the change we made. But to create the current version of the object, we have to add up the changes to actually get the final version. E.g.:
changesetId (Timestamp) | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52 | 2 | 123 | none | none
2015-04-29 23:30:01 | 2 | null | foo | null
*2015-04-29 23:30:01 | 2 | 123 | foo | none
The * marks the final version, which does not exist in the DB.
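To show what that adding up means in SQL, here is a rough sketch (the table name changes is an assumption) that picks the latest non-null value per column for object 2:

SELECT
  c.objectId,
  MAX(c.changesetId) AS changesetId,
  (SELECT c2.foreignKey FROM changes c2
   WHERE c2.objectId = c.objectId AND c2.foreignKey IS NOT NULL
   ORDER BY c2.changesetId DESC LIMIT 1) AS foreignKey,
  (SELECT c2.name FROM changes c2
   WHERE c2.objectId = c.objectId AND c2.name IS NOT NULL
   ORDER BY c2.changesetId DESC LIMIT 1) AS name,
  (SELECT c2.description FROM changes c2
   WHERE c2.objectId = c.objectId AND c2.description IS NOT NULL
   ORDER BY c2.changesetId DESC LIMIT 1) AS description
FROM changes c
WHERE c.objectId = 2
GROUP BY c.objectId;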
So if we only store exactly the changes, we have more work to do, especially when coming from a foreign object f. If I have a number of objects f and I want to get all changes to the objects from our table, I have to write a bit of ugly SQL. This obviously gets worse the more foreign objects you have.
Basically I have to do:
Select all F that I want and
Select all objects WHERE foreignKey = foreignId
OR Select all objects that have objectId in (Select all objects that have foreignKey = foreignId)
E.g. I have to select the objects that have foreignKey 123, or rows that have foreignKey null but for which an entry with the same objectId and foreignKey 123 exists (something like the sketch below).
The more dependencies there are, the uglier this SQL obviously gets.
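Roughly, using the same assumed changes table as above, the sketch looks like:

SELECT o.*
FROM changes o
WHERE o.foreignKey = 123
   OR o.objectId IN (SELECT c.objectId
                     FROM changes c
                     WHERE c.foreignKey = 123);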
Did I make myself clear?
Wouldn't it be much easier to always keep all fields in all versions?
E.g. a simple name change becomes:
changesetId (Timestamp) | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52 | 2 | 123 | none | none
2015-04-29 23:30:01 | 2 | 123 | foo | none
Now, to create a diff, I have to compare two versions, but I don't have to do the extra work of selecting the right elements or of calculating the final version at a given timestamp.
What do you consider the proven best solution?
How does SVN do it?
For your use case, the method you suggest seems better. Key-value stores based on LSM trees do exactly the same: they just write a newer version of the object without deleting the older one. If at any point you need the change that was made, you can just diff two adjacent versions.
The second method might use more space if you have a lot of variable-length text fields, but that's the trade-off you make for speed and maintainability.
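With the full-row approach, getting the current version of every object is straightforward; a minimal sketch, assuming the table is called object_versions and is keyed by (objectId, changesetId):

SELECT v.*
FROM object_versions v
JOIN (SELECT objectId, MAX(changesetId) AS latestChangeset
      FROM object_versions
      GROUP BY objectId) m
  ON v.objectId = m.objectId
 AND v.changesetId = m.latestChangeset;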
I'm currently developing quite a big application that will manipulate a lot of data.
I'm designing the data model and I wonder how to tune it for large amounts of data. (My DBMS is MySQL.)
I have a table that will contain objects called "values". There are 6 columns:
id
type_bool
type_float
type_date
type_text
type_int
Depending on the type of the value (which is stored elsewhere), one of these columns holds the data and the others are NULL.
This table is expected to contain millions of rows (growing very fast). It's also going to be read very often.
My design is going to produce a lot of rows with little data in each. I wonder if it would be better to make 5 different tables, each containing only one type of data. With that solution there would be many more joins.
Can you give me some advice?
Thank you very much!
EDIT: Description of my tables
TABLE ELEMENT: In the application there are elements that contain attributes.
There will be a LOT of rows.
There are a lot of reads/writes, few updates/deletes.
TABLE ATTRIBUTEDEFINITION: Each attribute is described (at design time) in the attributeDefinition table, which tells what the type of the attribute is.
There will not be a lot of rows.
There are few writes at the beginning but a LOT of reads.
TABLE ATTRIBUTEVALUE: Another table, attributeValue, contains the actual data of each attributeDefinition for each element.
There will be a LOT of rows ([nb of elements] x [nb of attributes]).
There are a LOT of reads/writes/updates.
TABLE LISTVALUE: Some types are complex, like the list type. The set of values available for this type is in another table called listValue. The attributeValue table then contains an id that is a key into the listValue table.
Here are the CREATE statements:
CREATE TABLE `element` (
  `id` int(11),
  `group` int(11), ...

CREATE TABLE `attributeDefinition` (
  `id` int(11),
  `name` varchar(100),
  `typeChamps` varchar(45)

CREATE TABLE `attributeValue` (
  `id` int(11),
  `elementId` int(11),              -- references table element
  `attributeDefinitionId` int(11),  -- references table attributeDefinition
  `type_bool` tinyint(1),
  `type_float` decimal(9,8),
  `type_int` int(11),
  `type_text` varchar(1000),
  `type_date` date,
  `type_list` int(11),              -- references table listValue

CREATE TABLE `listValue` (
  `id` int(11),
  `name` varchar(100), ...
And here is a SELECT example that retrieves all elements of the group whose id is 66:
SELECT elementId,
attributeValue.id as idAttribute,
attributeDefinition.name as attributeName,
attributeDefinition.typeChamps as attributeType,
listValue.name as valeurDeListe,
attributeValue.type_bool,
attributeValue.type_int,
DATE_FORMAT(attributeValue.type_date, '%d/%m/%Y') as type_date,
attributeValue.type_float,
attributeValue.type_text
FROM element
JOIN attributeValue ON attributeValue.elementId = element.id
JOIN attributeDefinition ON attributeValue.attributeDefinitionId = attributeDefinition.id
LEFT JOIN listValue ON attributeValue.type_list = listValue.id
WHERE element.`group` = '66'
In my application, for each row, I print the value that corresponds to the type of the attribute.
As you are only inserting into a single column each time, create a different table for each data type; if you are inserting large quantities of data, you will be wasting a lot of space with this design.
Having fewer rows in each table will increase index lookup speed.
Your column names should describe the data in them, not the column type.
Read up on Database Normalisation.
Writing will not be an issue here; reading will be.
You have to ask yourself:
how often are you going to query this?
is old data modified, or is it append-only?
==> If the answers are "frequently" / "append-only" (or only minor modifications of old data), a cache may solve your read issues, as you won't hit the database so often.
There will be a lot of null fields in each row. If the table is not big, that's fine, but as you said there will be millions of rows, so you are wasting space and the queries will take longer to execute. Do something like this:
table1
id | type
table2
type | other fields
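One way to read that suggestion, as a rough sketch (all names here are assumptions, not the actual schema):

CREATE TABLE attribute_value (
  id   INT PRIMARY KEY,
  type VARCHAR(10)           -- 'bool', 'int', 'float', 'date', 'text'
);

CREATE TABLE attribute_value_int (
  value_id INT PRIMARY KEY,  -- same id as attribute_value.id
  value    INT
);

-- ...and one value table per remaining type (bool, float, date, text)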
Advice I have, although it might not be the kind you want :-)
This looks like an entity-attribute-value schema; using this kind of schema leads to all kinds of maintenance/performance nightmares:
complicated queries to get all values for a master record (essentially, you'll have to left join your result table N times with itself to obtain N attributes for a master record; see the sketch after this list)
no referential integrity (I'm assuming you'll have lookup values with separate master data tables; you cannot use foreign key constraints for this)
waste of disk space (since your table will be sparsely filled)
For a more complete list of reasons to avoid this kind of schema, I'd recommend getting a copy of SQL Antipatterns
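For illustration, the N self-joins mentioned in the first point would look roughly like this with your schema (the attributeDefinition ids 1 and 2 are made-up examples):

SELECT e.id,
       v_name.type_text   AS name,
       v_price.type_float AS price
FROM element e
LEFT JOIN attributeValue v_name
       ON v_name.elementId = e.id AND v_name.attributeDefinitionId = 1
LEFT JOIN attributeValue v_price
       ON v_price.elementId = e.id AND v_price.attributeDefinitionId = 2;
-- ...plus one more LEFT JOIN for every additional attribute you want as a column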
Finally, I tried implementing both solutions and then benchmarked them.
For both solutions, there was an element table and an attributeDefinition table, as follows:
[attributeDefinition]
| id | group | name                         | type       |
| 12 | 51    | 'The Bool attribute'         | type_bool  |
| 13 | 51    | 'The Int attribute'          | type_int   |
| 14 | 51    | 'The first Float attribute'  | type_float |
| 15 | 51    | 'The second Float attribute' | type_float |
[element]
| id | group | name
| 42 | 51 | 'An element in the group 51'
First Solution (Best one)
One big table with one column per type and many empty cells, holding each value of each attribute of each element.
[attributeValue]
| id | element | attributeDefinition | type_int | type_bool | type_float | ...
| 1 | 42 | 12 | NULL | TRUE | NULL | NULL...
| 2 | 42 | 13 | 5421 | NULL | NULL | NULL...
| 3 | 42 | 14 | NULL | NULL | 23.5 | NULL...
| 4 | 42 | 15 | NULL | NULL | 56.8 | NULL...
One attributeDefinition table describes each attribute of every element in a group.
Second Solution (Worse one)
8 tables, one for each type:
[type_float]
| id | group | element | value |
| 3 | 51 | 42 | 23.5 |
| 4 | 51 | 42 | 56.8 |
[type_bool]
| id | group | element | value |
| 1 | 51 | 42 | TRUE |
[type_int]
| id | group | element | value |
| 2 | 51 | 42 | 5421 |
Conclusion
My benchmark first looked at the database size. I had 1,500,000 rows in the big table, which means approximately 150,000 rows in each small table if there are 10 data types.
Looking in phpMyAdmin, the sizes are almost exactly the same.
First conclusion: Empty cells don't take up space.
After that, my second benchmark was a performance test: getting all values of all attributes of all elements in one group. There are 15 groups in the database. Each group has:
400 elements
30 attributes per element
So that is 12,000 rows in [attributeValue], or 1,200 rows in each [type_*] table.
The first SELECT only does one join between [attributeValue] and [element] to apply a WHERE on the group.
The second SELECT uses a UNION over 10 SELECTs, one per [type_*] table.
That second SELECT takes 10 times longer!
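Simplified sketches of the two kinds of query (column lists trimmed):

-- First solution: one join, filtered on the group
SELECT av.*
FROM attributeValue av
JOIN element e ON av.elementId = e.id
WHERE e.`group` = 66;

-- Second solution: one SELECT per type_* table, combined with UNION ALL
SELECT id, element, value FROM type_bool  WHERE `group` = 66
UNION ALL
SELECT id, element, value FROM type_int   WHERE `group` = 66
UNION ALL
SELECT id, element, value FROM type_float WHERE `group` = 66;
-- ...and so on for the remaining type_* tables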
Second conclusion: One table is better than many.