That might sound weird, I know. This is an explanation:
1- I have the following table - items (which gets updated because users can update the amount of items as well as the content inside of them):
| id | content | item_id | Order (Unique Index) |
|:---|--------:|:-------:|:--------------------:|
| 1 | This | 1 | 1 |
| 2 | is | 1 | 2 |
| 3 | content | 1 | 3 |
| 4 | Some | 2 | 1 |
| 5 | More | 2 | 2 |
| 6 | More! | 3 | 1 |
2- On my server, I am running a query that iterates through my POSTed content and checks every item by its item_id, as well as whether the order in that row is set. If the order is set, it updates the content; otherwise it inserts new content.
Let's say that I am POSTing 4 items with item_id = 1. Ideally, this is what I would want it to do:
| id | content | item_id | Order (Unique Index) |
|:---|--------:|:-------:|:--------------------:|
| 1 | This | 1 | 1 |
| 2 | updated | 1 | 2 |
| 3 | content | 1 | 3 |
| 4 | Some | 2 | 1 |
| 5 | More | 2 | 2 |
| 6 | More! | 3 | 1 |
| 7 | added | 1 | 4 |
Note what happened: it added a new row because my POSTed content had four items in it. It iterated through every one and checked whether the order existed; if it did, it updated the value, otherwise it created a new row and inserted the value along with the order (key). The order is pretty much the key. That's how I am setting it in there.
It doesn't work when I do this:
// Start loop - for (key in content) {
INSERT INTO items (item_id, content, content_order) VALUES (?, content[key], ?) WHERE item_id = ? ON DUPLICATE KEY UPDATE content = ?
// End loop
The loop iterates through every POSTed content entry and inserts it into the database; if there is a duplicate key (in this case, the Unique Index is the Order column), it only updates the content of that row.
The problem is that this only works on the first three items. Why? Because the first three items are the first ones with those unique indexes. If I were to update the item whose item_id is 2, it would give me an error, because I cannot update items that have the same unique key. I cannot even INSERT anything because it violates the Unique Index constraint!
So how can I do this?
Is there a way to make the Unique Index relative to the query, meaning that it will only consider the Unique Indexes for the query's specified item_id? (Doubt it)
How can I make it so that it checks if the order is set and update the content or insert a new row without using unique keys?
Is there an alternate way to write this?
If elaboration is needed, please let me know. Thanks.
A straightforward design for your needs would probably have no problems, although your question is unclear, especially about new content.
Order is not a key of items, because column order is not unique. The key you want is (item_id,order).
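To make the composite-key idea concrete, here is a minimal sketch that is not from the original answer. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, so the upsert is spelled `ON CONFLICT ... DO UPDATE` (SQLite 3.24 or newer); in MySQL the same composite unique index would be used with `ON DUPLICATE KEY UPDATE`. The column name `ord` and the helper `save_content` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "ord" stands in for the Order column (ORDER is a reserved word in SQL).
conn.execute("""
    CREATE TABLE items (
        id      INTEGER PRIMARY KEY,
        content TEXT NOT NULL,
        item_id INTEGER NOT NULL,
        ord     INTEGER NOT NULL,
        UNIQUE (item_id, ord)  -- unique per item, not across the whole table
    )
""")
conn.executemany(
    "INSERT INTO items (content, item_id, ord) VALUES (?, ?, ?)",
    [("This", 1, 1), ("is", 1, 2), ("content", 1, 3),
     ("Some", 2, 1), ("More", 2, 2), ("More!", 3, 1)],
)

def save_content(conn, item_id, contents):
    """Hypothetical helper: update existing (item_id, ord) rows, insert the rest."""
    conn.executemany(
        """
        INSERT INTO items (item_id, ord, content) VALUES (?, ?, ?)
        ON CONFLICT (item_id, ord) DO UPDATE SET content = excluded.content
        """,
        [(item_id, ord_, text) for ord_, text in enumerate(contents, start=1)],
    )
    conn.commit()

# POST four entries for item 1: three already exist, the fourth is new.
save_content(conn, 1, ["This", "updated", "content", "added"])
rows = conn.execute(
    "SELECT id, content, item_id, ord FROM items ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

Because the uniqueness is on the pair, rows for item 2 and item 3 can reuse order 1 and 2 without any conflict.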
Do you need items id? I'm going to ignore it. I'm going to treat new content as if it were in a table. You probably should build a constant subquery from it. Do all new content item_ids appear in items?
1. No NULLs.
A simple design is to have a version of items called content that holds the rows that make the following fill-in-the-blanks statement true. I'll assume items order is consecutive within item_id.
// item [item_id]'s [order]th content is [content]
content(item_id,order,content)
primary key (item_id,order)
I will guess at your new content format and effect. I'll assume order is consecutive from 1 within item_id. I'll replace all content info for a new content item_id.
// item [item_id]'s [order]th content is [content]
more_content(item_id,order,content)
primary key (item_id,order)
delete from content
where item_id in (select item_id from more_content);
insert into content
select * from more_content;
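The delete-then-insert replacement can be tried end to end with the sketch below. It assumes Python's sqlite3 module rather than MySQL, uses `ord` in place of the reserved word `order`, and fills both tables with sample rows based on the question's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (
        item_id INTEGER NOT NULL,
        ord     INTEGER NOT NULL,   -- stands in for "order"
        content TEXT NOT NULL,
        PRIMARY KEY (item_id, ord)
    );
    CREATE TABLE more_content (
        item_id INTEGER NOT NULL,
        ord     INTEGER NOT NULL,
        content TEXT NOT NULL,
        PRIMARY KEY (item_id, ord)
    );
    INSERT INTO content VALUES
        (1, 1, 'This'), (1, 2, 'is'), (1, 3, 'content'),
        (2, 1, 'Some'), (2, 2, 'More'), (3, 1, 'More!');
    INSERT INTO more_content VALUES
        (1, 1, 'This'), (1, 2, 'updated'), (1, 3, 'content'), (1, 4, 'added');
""")

# One transaction, so the delete and the insert succeed or fail together.
with conn:
    conn.execute(
        "DELETE FROM content WHERE item_id IN (SELECT item_id FROM more_content)"
    )
    conn.execute("INSERT INTO content SELECT * FROM more_content")

result = conn.execute(
    "SELECT item_id, ord, content FROM content ORDER BY item_id, ord"
).fetchall()
for row in result:
    print(row)
```

Only item 1 appears in more_content, so the rows for items 2 and 3 are left alone.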
2. NULLs
If NULL order indicates that there is no content then you can instead have content NULL and order=1. (You can also have another table and no NULLs.) If NULL order indicates that there is an unchanged default then just have another table:
// item [item_id]'s content is a default
has_default(item_id)
primary key (item_id)
delete from has_default
where item_id in (select item_id from more_content);
delete from content
where item_id in (select item_id from more_content);
insert into content
select * from more_content;
If you want items readable the way it is, make a view:
// [order] is null AND item [item_id]'s default content is [content]
// OR [order] is not null AND item [item_id]'s [order]th content is [content]
// (ord is the order column; ORDER is a reserved word in MySQL)
create view items as (
select c.item_id,c.content,if(d.item_id is null,c.ord,NULL) as ord
from content c left join has_default d on c.item_id=d.item_id
);
It's hard to make out much more about your design.
It may be difficult to implement constraints for any design for your needs in SQL. But you should start with a straightforward design.
I want to get one record at a time from joined tables, but I don't want the tables to be joined as a whole.
The actual tables are as follows.
table contents -- stores content information.
+----+-----------+--------+----------+---------------------+
| id | name      | status | priority | last_registered_day |
+----+-----------+--------+----------+---------------------+
| 1  | content_1 | 0      | 1        | 2020/10/10 11:20:20 |
| 2  | content_2 | 2      | 1        | 2020/10/10 11:21:20 |
| 3  | content_3 | 2      | 2        | 2020/10/10 11:22:20 |
+----+-----------+--------+----------+---------------------+
table clusters -- stores cluster information
+----+-----------+
| id | name      |
+----+-----------+
| 1  | cluster_1 |
| 2  | cluster_2 |
+----+-----------+
table content_cluster -- each record indicates that one content is on one cluster
+------------+------------+---------------------+
| content_id | cluster_id | last_update_date    |
+------------+------------+---------------------+
| 1          | 1          | 2020-10-01T11:30:00 |
| 2          | 2          | 2020-10-01T11:30:00 |
| 3          | 1          | 2020-10-01T10:30:00 |
| 3          | 2          | 2020-10-01T10:30:00 |
+------------+------------+---------------------+
By specifying a cluster_id, I want to get one content name at a time where contents.status=2 and the (content id, cluster_id) pair is in content_cluster. The SQL query is something like the following.
SELECT contents.name
FROM contents
JOIN content_cluster
ON contents.id = content_cluster.content_id
WHERE contents.status = 2
AND content_cluster.cluster_id = <cluster_id>
ORDER
BY contents.priority
, contents.last_registered_day
, contents.name
LIMIT 1;
However, I don't want the tables to be joined as a whole every time as I have to do it frequently and the tables are large. Is there any efficient way to do this? I can add some indices to the tables. What should I do?
I would try writing the query like this:
SELECT c.name
FROM contents c
WHERE EXISTS (SELECT 1
FROM content_cluster cc
WHERE cc.content_id = c.id AND
cc.cluster_id = <cluster_id>
) AND
c.status = 2
ORDER BY c.priority, c.last_registered_day, c.name
LIMIT 1;
Then create the following indexes:
contents(status, priority, last_registered_day, name, id)
content_cluster(content_id, cluster_id)
The goal is for the execution plan to scan the index on contents and, for each row, look up whether there is a match in content_cluster. The query stops at the first match.
I can't guarantee that this will generate that plan (avoiding the sort), but it is worth a try.
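A small sketch of the EXISTS form follows, using Python's sqlite3 module in place of MySQL (an assumption, so the query planner will not behave the same way). It loads the sample rows from the question, joins on the `id` column shown in the contents table, and uses index names and a `next_content` helper invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contents (
        id INTEGER PRIMARY KEY, name TEXT, status INTEGER,
        priority INTEGER, last_registered_day TEXT
    );
    CREATE TABLE content_cluster (
        content_id INTEGER, cluster_id INTEGER, last_update_date TEXT
    );
    INSERT INTO contents VALUES
        (1, 'content_1', 0, 1, '2020/10/10 11:20:20'),
        (2, 'content_2', 2, 1, '2020/10/10 11:21:20'),
        (3, 'content_3', 2, 2, '2020/10/10 11:22:20');
    INSERT INTO content_cluster VALUES
        (1, 1, '2020-10-01T11:30:00'), (2, 2, '2020-10-01T11:30:00'),
        (3, 1, '2020-10-01T10:30:00'), (3, 2, '2020-10-01T10:30:00');
    -- Index names are made up for this sketch.
    CREATE INDEX idx_contents_pick
        ON contents (status, priority, last_registered_day, name);
    CREATE INDEX idx_cc_lookup ON content_cluster (content_id, cluster_id);
""")

def next_content(conn, cluster_id):
    """Hypothetical helper: first matching content name, or None."""
    row = conn.execute(
        """
        SELECT c.name FROM contents c
        WHERE c.status = 2
          AND EXISTS (SELECT 1 FROM content_cluster cc
                      WHERE cc.content_id = c.id AND cc.cluster_id = ?)
        ORDER BY c.priority, c.last_registered_day, c.name
        LIMIT 1
        """,
        (cluster_id,),
    ).fetchone()
    return row[0] if row else None

print(next_content(conn, 1))
print(next_content(conn, 2))
```

On MySQL, running EXPLAIN on the real tables is the way to check whether the sort is actually avoided.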
This query can easily be optimized by applying the correct indexes. Apply the ALTER statements below, and let me know whether the performance has improved considerably:
alter table contents
add index idx_1 (id),
add index idx_2(status);
alter table content_cluster
add index idx_1 (content_id),
add index idx_2(cluster_id);
If a content can be in multiple clusters and the number of clusters can change, I think that doing a join like this is the best solution.
You could try splitting your contents table into different tables each containing the contents of a specific cluster, but it would need to be updated frequently.
I have a table with pairs (and sometimes triples) of ids, which act as sort of links in a chain
+------+-----+
| from | to |
+------+-----+
| id1 | id2 |
| id2 | id3 |
| id4 | id5 |
+------+-----+
I want to create a new table where all the links are clustered into chains/families:
+-----+----------+
| id | familyid |
+-----+----------+
| id1 | 1 |
| id2 | 1 |
| id3 | 1 |
| id4 | 2 |
| id5 | 2 |
+-----+----------+
That is, combine all the links of a chain into a single family and give it an id.
In the example above, the first 2 rows of the first table create one family, and the last row creates another family.
Solution
I will use node.js to query big batches of rows (a few thousand per batch), process them, and insert them into my own table with a family id.
The issue
The problem is that I have a few tens of thousands of id pairs. I will also need to add new ids over time, after the initial creation of the families table, and I will need to add ids to existing families.
Are there good algorithms for clustering pairs of data into families/clusters, keeping my issue in mind?
Not sure if this is an answer so much as some ideas...
I created two tables similar to the ones you have, the first one I populated with the same data as you have.
Table Base, fromID, toID
Table chain, fromID, chainID (numeric, null allowed)
I then inserted all unique values from Base into chain with a null value for chainID. The idea being these are the rows as yet unprocessed.
It was then a case of repeatedly running a couple of statements...
update chain c
set chainID = n
where chainid is null and exists ( select 1 from base b where b.fromID = c.fromID )
order by fromID
limit 1
This would allocate the next chain ID to the first row without one (n needs to be generated from somewhere and incremented each time you run this)
Then the one that relates all of the records...
update chain c
join base b on b.toID = c.fromID
join chain c1 on b.fromID = c1.fromID
set c.chainID = c1.chainID
where c.chainID is null and c1.chainID is not null
This is run repeatedly until it affects 0 rows (i.e. there is nothing more to do).
Then run the first update to create the next chain etc. Again if you run the first update till it affects 0 rows, this shows that they are all linked.
Would be interested if you want to try this and see if it stands up with more complex scenarios.
This looks a lot like clustering over a graph dataset, where 'familyid' is the cluster center number.
Here is a question I think is relevant.
Here is the algorithm description. You will need to implement it under the conditions you described.
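One common way to group linked pairs into families is a union-find (disjoint-set) structure, which also copes with links that arrive later. The sketch below is an illustration in Python rather than the algorithm the answers point to; the `Families` class and its method names are made up, and a triple can be fed in as two pairs.

```python
class Families:
    """Union-find over ids; each connected chain of links is one family."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen ids as their own root.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path straight at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def add_link(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra  # merge the two families

    def family_ids(self):
        """Map every id to a family number, numbered by first appearance."""
        numbers = {}
        result = {}
        for x in self.parent:
            root = self.find(x)
            numbers.setdefault(root, len(numbers) + 1)
            result[x] = numbers[root]
        return result


families = Families()
for a, b in [("id1", "id2"), ("id2", "id3"), ("id4", "id5")]:
    families.add_link(a, b)
initial = families.family_ids()
print(initial)

# Later batches: a new id joins a family, then a link merges two families.
families.add_link("id5", "id6")
families.add_link("id3", "id4")
merged = families.family_ids()
print(merged)
```

Family numbers are recomputed on each call, so a merge can renumber families; storing the root id instead of a running number is one way to keep identifiers stable in a database table.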
So my database's primary key is just a column called 'id'. When I add a new item to the database, I want it in a specific position without having to go into the DB and change every value after it to +1 of whatever it was before.
Example:
ID
| 1 | item1 |
| 2 | item2 |
| 3 | item3 |
| 4 | item4 |
Say I want to add an item in between item2 and item3. To do that I would need to change item3's id to 4 and item4's id to 5, but currently I have to go into the database and do it manually.
How would I make it increment automatically when I INSERT a new item?
You should consider leaving IDs unaltered and adding a secondary column to sort by, e.g. sort_order. Needing to alter all the IDs in the parent and related tables for every insert can't be a good idea, esp. if you don't have properly crafted foreign keys.
If you do so, it should be fairly easy to accomplish:
-- Untested
START TRANSACTION;
UPDATE foo SET sort_order=sort_order+1 WHERE sort_order>=4;
INSERT INTO foo (name, sort_order) VALUES ('item', 4);
COMMIT;
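The same shift-then-insert idea can be tried with the sketch below. It assumes Python's sqlite3 module instead of MySQL, reuses the `foo` table and `sort_order` column from the snippet above, and adds a hypothetical `insert_at` helper; the example places the new row between item2 and item3, so the target position is 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL, sort_order INTEGER NOT NULL
    );
    INSERT INTO foo (name, sort_order) VALUES
        ('item1', 1), ('item2', 2), ('item3', 3), ('item4', 4);
""")

def insert_at(conn, name, position):
    """Hypothetical helper: make room at `position`, then insert there."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE foo SET sort_order = sort_order + 1 WHERE sort_order >= ?",
            (position,),
        )
        conn.execute(
            "INSERT INTO foo (name, sort_order) VALUES (?, ?)", (name, position)
        )

insert_at(conn, "new item", 3)
ordered = conn.execute("SELECT id, name FROM foo ORDER BY sort_order").fetchall()
for row in ordered:
    print(row)
```

The ids stay as they were; only sort_order changes, and reads use ORDER BY sort_order.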
Is it possible to add a database constraint to limit a row to have a single value in one of two columns, never more and never less? Let me illustrate:
Sales Order Table
---------------------------------
id | person_id | company_id |
Rows for this would look like:
id | person_id | company_id |
---|-----------|------------|
1 | 1 | null |
2 | 2 | null |
3 | null | 1 |
4 | null | 2 |
In this illustration, the source of the sales order is either a person or a company. It is one or the other, no more or less. My question is: is there a way to constrain the database so that 1) both fields can't be null and 2) both fields can't be not-null? i.e., one has to be null and one has to be not-null...
I know the initial reaction from some may be to combine the two tables (person, company) into one customer table. But, the example I'm giving is just a very simple example. In my application the two fields I'm working with cannot be combined into one.
The DBMS I'm working with is MySQL.
I hope the question makes sense. Thank you in advance for your help!
This may come as a shock...
MySQL before 8.0.16 doesn't enforce CHECK constraints. It allows you to define them, but it totally ignores them.
They are allowed in the syntax only to provide compatibility with other databases' syntax.
You could use a trigger on update/insert, and use SIGNAL to raise an exception.
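Here is a minimal sketch of the trigger approach. It assumes Python's sqlite3 module, where `RAISE(ABORT, ...)` plays the role that `SIGNAL` would play in a MySQL trigger; the table name `sales_order`, the trigger names, and the `try_insert` helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_order (
        id INTEGER PRIMARY KEY, person_id INTEGER, company_id INTEGER
    );
    -- Reject rows where both columns are NULL or both are set.
    CREATE TRIGGER sales_order_one_source_ins
    BEFORE INSERT ON sales_order
    WHEN (NEW.person_id IS NULL) = (NEW.company_id IS NULL)
    BEGIN
        SELECT RAISE(ABORT, 'exactly one of person_id, company_id must be set');
    END;
    CREATE TRIGGER sales_order_one_source_upd
    BEFORE UPDATE ON sales_order
    WHEN (NEW.person_id IS NULL) = (NEW.company_id IS NULL)
    BEGIN
        SELECT RAISE(ABORT, 'exactly one of person_id, company_id must be set');
    END;
""")

def try_insert(conn, person_id, company_id):
    """Hypothetical helper: True if the row was accepted, False if rejected."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO sales_order (person_id, company_id) VALUES (?, ?)",
                (person_id, company_id),
            )
        return True
    except sqlite3.DatabaseError:
        return False

print(try_insert(conn, 1, None))     # person only
print(try_insert(conn, None, 1))     # company only
print(try_insert(conn, None, None))  # neither
print(try_insert(conn, 1, 1))        # both
```

The condition compares the two IS NULL results, so it fires exactly when the columns are both empty or both filled.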
Lookup table - unique row identity
The other lookup-table examples I have seen just do not make sense to me: they give a row an ID, put that ID in another table which also has an ID, add these IDs to yet more tables which may reference them, and still create lookup tables with more IDs (this is how all the examples I can find seem to work). What I have done is this:
product_item - table
------------------------------------------
id | title | supplier | price
1 | title11 | supplier1 | price1
etc.
it then goes on to include more items (sure you get it)
product_feature - table
--------------------------
id | title | iskeyfeature
1 | feature1 | true
feature_desc - table
-----------------------------
id | title | desc
1 | desc1 | text description
product_lookup - table
item_id | feature_id | feature_desc
1 | 1 | 1
1 | 2 | 2
1 | 3 | 3
1 |64 | 15
(as these only need to be referenced in the lookup the id's can be multiples per item or multiple items per feature)
What I want to do without adding item_id to every feature row or description row is retrieve only the columns from the multiple tables where their id is referenced in the same row of the lookup table. I want to know if it is possible to select all the referenced columns from the lookup row if I only know the item_id eg. Item_id = 1 return all rows where item_id = 1 with the columns referenced in the same row. Every item can have multiple features and also every feature could be attached to multiple items , this will not matter if I can just get the pattern right in how to construct this query from a single known value.
Any assistance or just some direction will be greatly appreciated. I'm using phpMyAdmin, and I'm sure this would be easier with some PHP voodoo, but I am learning MySQL from tutorials etc. and would like to know how to do it with SQL directly.
Having a NULL value in a column is not the major concern that would lead to this design - it's the problem with adding new attribute columns in the future, at which MySQL is disgracefully bad.
If you want to make a query that returns everything about an item in one row, you need to LEFT OUTER JOIN back to the product_lookup table for each feature_id. This is about every 10th mysql question on Stack Overflow, so you should be able to find tons of examples.
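As a starting point, the sketch below joins from product_lookup out to the two referenced tables and returns one row per lookup entry for a known item_id. This is simpler than the one-row-per-item pivot described above, and it makes several assumptions: Python's sqlite3 module stands in for MySQL, the `desc` column is renamed `description` because DESC is a reserved word, and the sample rows and the `features_for` helper are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_item (
        id INTEGER PRIMARY KEY, title TEXT, supplier TEXT, price TEXT
    );
    CREATE TABLE product_feature (
        id INTEGER PRIMARY KEY, title TEXT, iskeyfeature INTEGER
    );
    CREATE TABLE feature_desc (
        id INTEGER PRIMARY KEY, title TEXT, description TEXT
    );
    CREATE TABLE product_lookup (
        item_id INTEGER, feature_id INTEGER, feature_desc INTEGER
    );
    -- Invented sample rows.
    INSERT INTO product_item VALUES
        (1, 'title11', 'supplier1', 'price1'),
        (2, 'title22', 'supplier2', 'price2');
    INSERT INTO product_feature VALUES (1, 'feature1', 1), (2, 'feature2', 0);
    INSERT INTO feature_desc VALUES
        (1, 'desc1', 'text description'), (2, 'desc2', 'another description');
    INSERT INTO product_lookup VALUES (1, 1, 1), (1, 2, 2), (2, 1, 2);
""")

def features_for(conn, item_id):
    """Hypothetical helper: one row per lookup entry of the given item."""
    return conn.execute(
        """
        SELECT i.title, f.title, f.iskeyfeature, d.title, d.description
        FROM product_lookup l
        JOIN product_item i ON i.id = l.item_id
        LEFT JOIN product_feature f ON f.id = l.feature_id
        LEFT JOIN feature_desc d ON d.id = l.feature_desc
        WHERE l.item_id = ?
        ORDER BY l.feature_id
        """,
        (item_id,),
    ).fetchall()

for row in features_for(conn, 1):
    print(row)
```

The LEFT JOINs keep a lookup row even when its feature or description id has no match, in which case those columns come back as NULL.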