MySQL: Finding which values exist between a database table and an array

I'd like to know how I can write a single query to find which values exist and which do not. Let me explain.
What I have
I've got a database table with a structure as follows:
+----+--------+-----------+-----------+
| id | action | button_id | type      |
+----+--------+-----------+-----------+
|  1 |      1 |         1 | button    |
|  2 |      2 |         4 | button    |
|  3 |      1 |         2 | attribute |
+----+--------+-----------+-----------+
As you can see, an action can have multiple button_id values. For your information, a button_id can be assigned to multiple actions too, but a button_id can only have one type within a given action.
So button_id 1 can also be present in action 4 with the type "attribute" set on it, but it cannot appear again in the same action with another type.
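That rule maps to a composite unique key. A minimal sketch, with a hypothetical table name since the question doesn't give one:
-- Hypothetical table/constraint names; enforces one row (and thus one type)
-- per (action, button_id) pair:
ALTER TABLE action_button
    ADD UNIQUE KEY uq_action_button (action, button_id);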
The problem
The problem comes when I want to update the buttons of an action. I receive an action object (in PHP) with an array of the buttons it has, with the structure below (written here as JSON):
"buttons":
[
{
"id":"1",
"type":"button"
},
{
"id":"3",
"type":"attribute"
}
]
As you can see, the button with ID 1 remains the same, but I've got a new button to deal with (the button with ID 3) and the button with ID 2 is not present anymore.
What I'd want
I'd like a single MySQL query that returns which of the values I receive exist and which do not, as well as which values are present in the database but not in the array.
To sum up: I want to know the differences between the buttons in the array received and those present in the database.
So, as an example with the received data described before and the database as we have it right now, I expect to receive something like this:
+--------+-----------+--------+------------+
| action | button_id | exists | is_present |
+--------+-----------+--------+------------+
|      1 |         1 |      1 |          1 |
|      1 |         2 |      1 |          0 |
|      1 |         3 |      0 |          1 |
+--------+-----------+--------+------------+
With this information, I'd be able to know that the button with ID 2 should no longer exist (because it's not present in the new array) and that the button with ID 3 is a new button, because it did not exist previously but is present in the new array.
What I've tried
I've tried a few approaches, but none of them gives me what I need, and not all of them were pure MySQL queries.
For example, I've tried checking the existence of each button I receive, but that wouldn't let me detect buttons that were deleted (i.e., no longer present in the received array).
Checking in the other direction, taking the buttons in the database as the reference, has the opposite problem: I can find which buttons were updated or deleted, but it skips those that are new and not yet in the database.
I've also tried writing queries with COUNT and GROUP BY button_id, and so on, but no luck either.
(I won't include those queries because none of them gave the expected results, so they wouldn't be of any help to you.)
Any application-level combination of the approaches above will, I think, be much slower than doing it purely with database queries, and that's why I'm asking.
The question
Is there a query that returns something like the result shown in the "What I'd want" section above, so that only one call to the MySQL server is needed?
Thank you all for your time, your responses and your patience with any information I may have left out.
Of course, if you have any doubts or questions, or need more information, leave a comment and I'll try to explain it better or add it.
Kind regards.

Doing that in a single query would be very cumbersome. Here is a solution that is not exactly what you are looking for but should do the job.
Let's say your table looks like this:
CREATE TABLE htmlComponent
(
    id         int auto_increment primary key,
    action     int,
    button_id  int not null,
    type       varchar(20),
    dtInserted datetime,
    dtUpdated  datetime
);

CREATE UNIQUE INDEX buttonType ON htmlComponent(button_id, type);
Now we need to update the table according to the buttons / attributes you have for a specific action.
-- Reset dtInserted and dtUpdated for action 1
UPDATE htmlComponent SET dtInserted = null, dtUpdated = null WHERE action = 1;

-- INSERT or UPDATE according to the data inside the JSON structure
INSERT INTO htmlComponent (action, button_id, type, dtInserted)
VALUES
    (1, 1, 'button', NOW()),
    (1, 3, 'attribute', NOW())
ON DUPLICATE KEY UPDATE
    button_id = VALUES(button_id),
    type = VALUES(type),
    dtInserted = null,
    dtUpdated = NOW();

-- Getting the result
SELECT * FROM htmlComponent WHERE action = 1;
You should end up with this result, which makes it easy to figure out what doesn't exist anymore, what is new and what was updated.
+----+--------+-----------+-----------+----------------------------+----------------------------+
| ID | ACTION | BUTTON_ID | TYPE      | DTINSERTED                 | DTUPDATED                  |
+----+--------+-----------+-----------+----------------------------+----------------------------+
|  1 |      1 |         1 | button    | (null)                     | February, 09 2015 16:21:49 |
|  3 |      1 |         2 | attribute | (null)                     | (null)                     |
|  4 |      1 |         3 | attribute | February, 09 2015 16:21:49 | (null)                     |
+----+--------+-----------+-----------+----------------------------+----------------------------+
Here is a fiddle. Please note I had to put the UPDATE and the INSERT in the left panel because DML statements are not allowed in the query panel.
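For reference, the single query the question asks for is also possible by emulating a FULL OUTER JOIN (which MySQL lacks) with a UNION of a LEFT and a RIGHT JOIN. A minimal sketch, assuming the htmlComponent table above and the received IDs (1 and 3) inlined as a derived table:
-- Rows in the DB for action 1, joined against the received IDs
SELECT 1 AS action,
       COALESCE(db.button_id, rec.button_id) AS button_id,
       db.button_id IS NOT NULL AS `exists`,
       rec.button_id IS NOT NULL AS is_present
FROM (SELECT button_id FROM htmlComponent WHERE action = 1) AS db
LEFT JOIN (SELECT 1 AS button_id UNION ALL SELECT 3) AS rec
       ON rec.button_id = db.button_id
UNION
-- Received IDs joined against the DB, to also catch the new ones
SELECT 1,
       COALESCE(db.button_id, rec.button_id),
       db.button_id IS NOT NULL,
       rec.button_id IS NOT NULL
FROM (SELECT button_id FROM htmlComponent WHERE action = 1) AS db
RIGHT JOIN (SELECT 1 AS button_id UNION ALL SELECT 3) AS rec
       ON rec.button_id = db.button_id;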

Related

update target table based on data source table if empty

Possibly a duplicate; I've been searching for the specific answer I need but couldn't find it. I have two simple tables.
Source:
| Id | companyName | adress  |
|----|-------------|---------|
| 1  | aquatics    | street1 |
| 2  | rivers      | street2 |

target:
| Id | nameCompany | companyAdress |
|----|-------------|---------------|
| 1  | aquatics    | street1       |
| 2  | rivers      |               |
I've simplified the matter: I have two sets of data. The source table is external data, and I, as a dev, want to update my table with that external data.
So we can see that in the source everything is filled in.
In my table I'm missing some info; in this case I'm missing the adress.
How can I run a query that checks: "your row is incomplete, let me update this target row with the source data"?
The only problem is that the external data uses different names for the same columns that I have.
I've only been working with MySQL for a couple of days, so please try to explain it noob-friendly. I tried some things but couldn't figure it out.
The query below updates companyAdress on the target table if the adress column on the Source table is not equal to companyAdress on the target table.
If you only need to update the empty values of target.companyAdress, change the condition where s.adress <> t.companyAdress to where t.companyAdress = ''.
update target t
inner join `Source` s on t.nameCompany = s.companyName
set t.companyAdress = s.adress
where s.adress <> t.companyAdress;
Demo: https://www.db-fiddle.com/f/pB6b5xrgPKCivFWcpQHsyE/0
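For the empty-value case mentioned above, the modified query would look like this (a sketch; the IS NULL check is an extra assumption in case blanks are stored as NULL):
update target t
inner join `Source` s on t.nameCompany = s.companyName
set t.companyAdress = s.adress
where t.companyAdress = '' or t.companyAdress is null;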

MySQL insert on duplicate update multiple conditions

I have the following table "texts"
+--------+-------------+-------------+-----------+-----------------+
| txt_id | txt_lang_id | txt_section | txt_code  | txt_value       |
+--------+-------------+-------------+-----------+-----------------+
| 1      | 1           | home        | txt_title | Home            |
| 2      | 1           | home        | txt_btn   | I'm a button    |
| 3      | 1           | home        | txt_welc  | Welcome to home |
etc...
I have multiple databases, one for each company, and a master database where the texts are created. Additionally, in each company the administrator can customize its texts.
My idea is to create a query that inserts the new texts into each database and, if a text already exists, updates its value.
Is it possible for an INSERT INTO ... ON DUPLICATE KEY UPDATE query to have several conditions, like txt_lang_id = 1 AND txt_section = 'home' AND txt_code = 'txt_title' SET txt_value = 'New home'?
I'd like it to work that way because the same function would be used for other tables, such as configuration, which starts empty and is populated as the company administrator changes the default options, so the auto-increment id is not always in the same order for all companies.
Is it possible to do something like this, or should I instead look for a way to keep the rows always in the same order? Thanks.
You can use UPDATE with CASE, e.g.:
INSERT INTO ... ON DUPLICATE KEY UPDATE txt_value =
(CASE WHEN txt_lang_id = 1 AND txt_section = 'home'
AND txt_code = 'txt_title' THEN 'New home' else txt_value end);
Here's the SQL Fiddle.
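A fuller sketch of the same idea, assuming the texts table has a unique key on (txt_lang_id, txt_section, txt_code) so that ON DUPLICATE KEY can fire:
-- Assumed unique key: (txt_lang_id, txt_section, txt_code)
INSERT INTO texts (txt_lang_id, txt_section, txt_code, txt_value)
VALUES (1, 'home', 'txt_title', 'New home')
ON DUPLICATE KEY UPDATE
    txt_value = (CASE WHEN txt_lang_id = 1 AND txt_section = 'home'
                       AND txt_code = 'txt_title'
                      THEN 'New home' ELSE txt_value END);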

MySQL - How to follow a trail of rows that have been duplicated?

When a row is duplicated in our system, it inserts the new row with a reference back to the ID of the one it was duplicated from.
If that new row is then duplicated, it has a reference back to the row it was duplicated from and so on.
What I can't work out is how to follow this trail in a SELECT.
Take the following data...
+------+-----------------+
| ID   | duplicated_from |
+------+-----------------+
| 1    | NULL            |
| 2    | 1               |
| 3    | 2               |
| 4    | NULL            |
+------+-----------------+
So, given ID 1, how would you look up all the slides in the chain that have been duplicated off it?
Or is this something that will have to be done at an application level?
It seems you are after a recursive query. I found this solution that may help you: How to create a MySQL hierarchical recursive query
HTH
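On MySQL 8.0+, a recursive CTE (one of the techniques in the linked answer) can walk the chain directly. A minimal sketch, assuming the table is named slides:
-- Start from ID 1 and repeatedly pull in rows duplicated from the chain
WITH RECURSIVE chain AS (
    SELECT id, duplicated_from FROM slides WHERE id = 1
    UNION ALL
    SELECT s.id, s.duplicated_from
    FROM slides s
    JOIN chain c ON s.duplicated_from = c.id
)
SELECT id FROM chain WHERE id <> 1;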

Data structure for a set of changes similar to SVN?

So far we have been storing change information as follows.
Imagine a changeset table for something that gets changed, called an object. The object is connected to, say, a foreign element by a foreign key. The object gets created like this:
changesetId (Timestamp) | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52     | 2        | 123        | none          | none
Now we change the name; after the name change the table will look like this:
changesetId (Timestamp) | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52     | 2        | 123        | none          | none
2015-04-29 23:30:01     | 2        | null       | foo           | null
This structure is exactly the minimum: it contains exactly the change we made. But to get the current version of the object, we have to add up the changes to produce the final version, e.g.
changesetId (Timestamp)  | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52      | 2        | 123        | none          | none
2015-04-29 23:30:01      | 2        | null       | foo           | null
*2015-04-29 23:30:01     | 2        | 123        | foo           | none
with the * marking the final version, which does not exist in the DB.
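"Adding up" the changes in SQL means taking, per column, the most recent non-null value. A minimal sketch of that fold for object 2, assuming the table is named changeset:
SELECT
    (SELECT foreignKey FROM changeset
     WHERE objectId = 2 AND foreignKey IS NOT NULL
     ORDER BY changesetId DESC LIMIT 1) AS foreignKey,
    (SELECT name FROM changeset
     WHERE objectId = 2 AND name IS NOT NULL
     ORDER BY changesetId DESC LIMIT 1) AS name,
    (SELECT description FROM changeset
     WHERE objectId = 2 AND description IS NOT NULL
     ORDER BY changesetId DESC LIMIT 1) AS description;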
So if we only store exactly the changes, we have more work to do, especially when coming from a foreign object f. If I have a number of objects f and I want to get all changes to the related objects from our table, I have to write a bit of ugly SQL. This obviously gets worse the more foreign objects you have.
Basically I have to do:
Select all F that I want, and
Select all objects WHERE foreignKey = foreignId,
OR select all objects whose objectId is in (Select all objects that have foreignKey = foreignId).
E.g. I have to select the objects that have foreignKey 123, or rows that have foreignKey null but where an entry exists with the same objectId and foreignKey 123 (see the sketch below).
The more dependencies, the uglier this SQL gets obviously.
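The selection just described would look roughly like this (a sketch; the table name changeset is assumed):
SELECT * FROM changeset
WHERE foreignKey = 123
   OR objectId IN (SELECT objectId FROM changeset WHERE foreignKey = 123);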
Did I make myself clear?
Wouldn't it be much easier to always keep all fields in all versions?
E.g. a simple name change becomes:
changesetId (Timestamp) | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52     | 2        | 123        | none          | none
2015-04-29 23:30:01     | 2        | 123        | foo           | none
Now, to create a diff I have to compare two versions, but I don't have to do the extra work of selecting the right rows or of computing the final version at a given timestamp.
What do you consider the proven best solution?
How does SVN do it?
For your use case the method you suggest seems to be better. Key-value stores based on LSM trees do exactly the same: they just write a newer version of the object without deleting the older version. If, at any point in time, you need the change that was made, I think you can just diff two adjacent versions.
The second method might use more space if you have a lot of variable length text fields, but that's a trade-off you get for speed and maintainability.
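With the second method, retrieving the current version of each object is a plain "newest row per object" query; a sketch, again assuming the table is named changeset:
SELECT c.*
FROM changeset c
JOIN (SELECT objectId, MAX(changesetId) AS latest
      FROM changeset
      GROUP BY objectId) m
  ON m.objectId = c.objectId AND m.latest = c.changesetId;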

SQL big tables optimization

I'm currently developing quite a big application that will manipulate a lot of data.
I'm designing the data model and I wonder how to tune this model for large amounts of data. (My DBMS is MySQL.)
I have a table that will contain objects called "values". It has 6 columns:
id
type_bool
type_float
type_date
type_text
type_int
Depending on the type of the value (which is stored elsewhere), one of these columns holds the data and the others are NULL.
This table is expected to contain millions of rows (and to grow very fast). It's also going to be read very often.
My design will produce a lot of rows with little data in each. I wonder if it would be better to create 5 different tables, each containing only one type of data, although that solution would require many more joins.
Can you give me a piece of advice?
Thank you very much!
EDIT: Description of my tables
TABLE ELEMENT: In the application there are elements that contain attributes.
There will be a LOT of rows.
There are a lot of reads/writes, few updates/deletes.
TABLE ATTRIBUTEDEFINITION: Each attribute is described (at design time) in the table attributeDefinition, which tells the type of the attribute.
There will not be a lot of rows.
There are few writes at the beginning but a LOT of reads.
TABLE ATTRIBUTEVALUE: Another table, attributeValue, contains the actual data of each attributeDefinition for each element.
There will be a LOT of rows ([nb of elements] x [nb of attributes]).
There are a LOT of reads/writes/updates.
TABLE LISTVALUE: Some types are complex, like the list type. The set of values available for such a type is in another table called listValue. The attributeValue table then contains an id that is a key of the listValue table.
Here are the CREATE statements:
CREATE TABLE `element` (
  `id` int(11),
  `group` int(11), ...

CREATE TABLE `attributeDefinition` (
  `id` int(11),
  `name` varchar(100),
  `typeChamps` varchar(45)

CREATE TABLE `attributeValue` (
  `id` int(11),
  `elementId` int(11),              -- references table element
  `attributeDefinitionId` int(11),  -- references table attributeDefinition
  `type_bool` tinyint(1),
  `type_float` decimal(9,8),
  `type_int` int(11),
  `type_text` varchar(1000),
  `type_date` date,
  `type_list` int(11),              -- references table listValue

CREATE TABLE `listValue` (
  `id` int(11),
  `name` varchar(100), ...
And here is a SELECT example that retrieves all elements of the group whose id is 66:
SELECT attributeValue.elementId,
       attributeValue.id AS idAttribute,
       attributeDefinition.name AS attributeName,
       attributeDefinition.typeChamps AS attributeType,
       listValue.name AS valeurDeListe,
       attributeValue.type_bool,
       attributeValue.type_int,
       DATE_FORMAT(attributeValue.type_date, '%d/%m/%Y') AS type_date,
       attributeValue.type_float,
       attributeValue.type_text
FROM element
JOIN attributeValue ON attributeValue.elementId = element.id
JOIN attributeDefinition ON attributeValue.attributeDefinitionId = attributeDefinition.id
LEFT JOIN listValue ON attributeValue.type_list = listValue.id
WHERE element.`group` = '66'
In my application, for each row, I print the value that corresponds to the type of the attribute.
As you are only inserting into a single column each time, create a different table for each data type; if you are inserting large quantities of data you will be wasting a lot of space with this design.
Having fewer rows in each table will increase index lookup speed.
Your column names should describe the data in them, not the column type.
Read up on Database Normalisation.
Writing will not be an issue here; reading will.
You have to ask yourself:
How often are you going to query this?
Is old data modified, or is it append-only?
If the answers are "frequently" and "append-only" (or only minor modifications of old data), a cache may solve your read issues, as you won't query the database so often.
There will be a lot of NULL fields in each row. If the table were small that would be OK, but as you said there will be millions of rows, so you are wasting space and the queries will take longer to execute. Do something like this:
table1
id | type

table2
type | other fields
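In DDL terms, the suggestion might look like this (a sketch; all names are hypothetical since the answer only gives a rough outline):
-- One narrow main table...
CREATE TABLE value_main (
    id   int PRIMARY KEY,
    type varchar(10)   -- 'bool', 'float', 'date', 'text' or 'int'
);
-- ...and one table per type, sharing the same id
CREATE TABLE value_int (
    id    int PRIMARY KEY,   -- same id as value_main
    value int NOT NULL
);
-- ...and so on for the other types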
Advice I have, although it might not be the kind you want :-)
This looks like an entity-attribute-value (EAV) schema; using this kind of schema leads to all kinds of maintenance / performance nightmares:
complicated queries to get all values for a master record (essentially, you'll have to left join your value table with itself N times to obtain N attributes for a master record; see the sketch after this list)
no referential integrity (I'm assuming you'll have lookup values with separate master data tables; you cannot use foreign key constraints for this)
waste of disk space (since your table will be sparsely filled)
For a more complete list of reasons to avoid this kind of schema, I'd recommend getting a copy of SQL Antipatterns
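The repeated self-join from the first bullet above would look roughly like this with the asker's tables (a sketch; the attributeDefinition ids are hypothetical):
SELECT e.id,
       v1.type_text AS name_attribute,
       v2.type_int  AS count_attribute
FROM element e
LEFT JOIN attributeValue v1
       ON v1.elementId = e.id AND v1.attributeDefinitionId = 1
LEFT JOIN attributeValue v2
       ON v2.elementId = e.id AND v2.attributeDefinitionId = 2;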
Finally, I tried to implement both solutions and then benchmarked them.
For both solutions, there was an element table and an attributeDefinition table, as follows:
[attributeDefinition]
| id | group | name                         | type       |
| 12 | 51    | 'The Bool attribute'         | type_bool  |
| 13 | 51    | 'The Int attribute'          | type_int   |
| 14 | 51    | 'The first Float attribute'  | type_float |
| 15 | 51    | 'The second Float attribute' | type_float |
[element]
| id | group | name                         |
| 42 | 51    | 'An element in the group 51' |
First Solution (Best one)
One big table with one column per type and many empty cells, holding each value of each attribute of each element.
[attributeValue]
| id | element | attributeDefinition | type_int | type_bool | type_float | ...
| 1  | 42      | 12                  | NULL     | TRUE      | NULL       | NULL...
| 2  | 42      | 13                  | 5421     | NULL      | NULL       | NULL...
| 3  | 42      | 14                  | NULL     | NULL      | 23.5       | NULL...
| 4  | 42      | 15                  | NULL     | NULL      | 56.8       | NULL...
Plus the attributeDefinition table that describes each attribute of every element in a group.
Second Solution (Worse one)
8 tables, one for each type:
[type_float]
| id | group | element | value |
| 3  | 51    | 42      | 23.5  |
| 4  | 51    | 42      | 56.8  |

[type_bool]
| id | group | element | value |
| 1  | 51    | 42      | TRUE  |

[type_int]
| id | group | element | value |
| 2  | 51    | 42      | 5421  |
Conclusion
My benchmark first looked at the database size. I had 1,500,000 rows in the big table, which means approximately 150,000 rows in each small table if there are 10 data types.
Looking in phpMyAdmin, sizes are nearly exactly the same.
First conclusion: empty cells take up no space.
After that, my second benchmark was a performance test: getting all values of all attributes of all elements in one group. There are 15 groups in the database. Each group has:
400 elements
30 attributes per element
So that is 12 000 rows in [attributeValue] or 1200 rows in each table [type_*].
The first SELECT does only one join, between [attributeValue] and [element], to put a WHERE on the group.
The second SELECT uses a UNION of 10 SELECTs, one per [type_*] table, as sketched below.
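Roughly, the two benchmarked queries looked like this (sketches; names follow the examples above):
-- First solution: one join on the big table
SELECT av.*
FROM attributeValue av
JOIN element e ON e.id = av.element
WHERE e.`group` = 51;

-- Second solution: one SELECT per type table, stitched together
SELECT 'bool' AS type, element, CAST(value AS CHAR) AS value
  FROM type_bool  WHERE `group` = 51
UNION ALL
SELECT 'int', element, CAST(value AS CHAR)
  FROM type_int   WHERE `group` = 51
UNION ALL
SELECT 'float', element, CAST(value AS CHAR)
  FROM type_float WHERE `group` = 51;
-- ...and so on for the remaining type tables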
That second SELECT takes 10 times longer!
Second conclusion: one table is better than many.