SQL schema / query design for horizontally scaled clients iterating through a list - mysql

I've got two tables (simplified)
List_To_Action
----------------------------------------------------
| Item_Name | Date_Time_Actioned | Action_Success? |
----------------------------------------------------
And
Action_Records (created by software)
------------------------------------------------
| ID | Item_Name | Date | ... | Action_Success?|
------------------------------------------------
I'm writing a python script that tells a client to iterate through List_To_Action for items that have not yet been actioned.
This would be simple if I had just one client accessing the table:
SELECT * FROM List_To_Action WHERE Date_Time_Actioned IS NULL
However, I will be scaling horizontally, and multiple clients will be hitting this list to search for items that have not been actioned. What I want to know is:
How would I set up the schema / query for this list so that when multiple clients are hitting List_To_Action, they're not pulling out the same items? Pseudo code / pointers are fine.

DB structure - form with dynamic number of options

I've been reading similar questions, but I think my case is a bit more complicated.
I have a form that registers items. These items may have options with sub-options (checkboxes and radio buttons).
The number of checkboxes and radio buttons may increase or decrease, but the real pain in designing a good structure is the checkboxes, as (at least I think so) each one must have its own fixed, named column.
The radio buttons are easier, as I just assign an id to each one (and save the names in a different table).
My current DB structure is simple (table/column names are in parentheses):
The items table (item) has columns of type integer (to save the id of the selected radio button).
Another table for the checkboxes (item_option), with columns of type integer (1 if checked, 0 if unchecked) and one PK column (item_id) that points to the PK column (id) of the items table.
And tables (again item_option) for the names of the radio buttons, with a PK column (id) that the option columns of the items table refer to.
I think a separate table containing the sub-options is better than putting all the columns in the main table, right?
So, the radio buttons are stored in the main table (1 column per option) and the checkboxes in a separate table (1 table per option):
Items table:
+-----+----------+----------+
| id | Option_1 | Option_2 |
+-----+----------+----------+
| 123 | 3 | 1 |
+-----+----------+----------+
| 456 | 2 | 3 |
+-----+----------+----------+
| 789 | 1 | 2 |
+-----+----------+----------+
item_option_3 table (this would be needed to know which ones are checked):
+--------------+--------------+--------------+---------+
| Sub_Option_1 | Sub_Option_2 | Sub_Option_3 | item_id |
+--------------+--------------+--------------+---------+
| 1 | 0 | 1 | 123 |
+--------------+--------------+--------------+---------+
| 1 | 1 | 0 | 456 |
+--------------+--------------+--------------+---------+
| 0 | 1 | 1 | 789 |
+--------------+--------------+--------------+---------+
item_option_1-2 table (this would be used to print the names):
+-----------+--------------+--------------+
| option_id | name | name_es |
+-----------+--------------+--------------+
| 1 | Sub_Option_1 | Sub_Opción_1 |
+-----------+--------------+--------------+
| 2 | Sub_Option_2 | Sub_Opción_2 |
+-----------+--------------+--------------+
| 3 | Sub_Option_3 | Sub_Opción_3 |
+-----------+--------------+--------------+
What kind of structure do I need to spawn these sub options (checkboxes) dynamically?
What about something like this?
Your model has option keys as columns and values as rows. Why not have both keys and values be rows? If you don't need complex type-based validation, it should suffice to have a single options table with a one-to-optionally-many relationship to itself to account for suboptions. To enumerate all options and values, just retrieve all rows from the table. If ParentOptionId is null, the row is a base-level option; otherwise it is a suboption.
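A minimal sketch of that single self-referencing table, using SQLite so it runs on its own. ParentOptionId comes from the description above; the table name Options, the other column names and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Options ("
    " OptionId INTEGER PRIMARY KEY,"
    # NULL parent = base-level option, otherwise a suboption of that row
    " ParentOptionId INTEGER REFERENCES Options(OptionId),"
    " Name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO Options (OptionId, ParentOptionId, Name) VALUES (?, ?, ?)",
    [
        (1, None, "Option_1"),
        (2, None, "Option_3"),
        (3, 2, "Sub_Option_1"),
        (4, 2, "Sub_Option_2"),
        (5, 2, "Sub_Option_3"),
    ],
)


def base_options(conn):
    """Names of all options that have no parent."""
    rows = conn.execute(
        "SELECT Name FROM Options WHERE ParentOptionId IS NULL ORDER BY OptionId"
    )
    return [r[0] for r in rows]


def suboptions(conn, parent_id):
    """Names of the suboptions attached to one parent option."""
    rows = conn.execute(
        "SELECT Name FROM Options WHERE ParentOptionId = ? ORDER BY OptionId",
        (parent_id,),
    )
    return [r[0] for r in rows]
```

With this layout, adding a new checkbox is an INSERT of one row rather than a new column.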
UML & ER version below.
EDIT: After reading through your question and comments again, I've come up with a more complicated but more robust design for you to consider:
It works like this:
Every user input is an Option. Every Option consists of a display text (OptionText), a tooltip/subtext/etc. (Description), a value that starts as a default and is then supplied by the user (Value), and a value type (ValueType: boolean, text, date, etc.). It also has a DisplayOrder so you know where to place it in relation to the other Options in its group. Options can also have a parent/child relationship with other Options. You can do the same for the other entities if you want, but I did not model that.
Every Option is contained within an OptionGroup with 0 or more sibling Options. OptionGroups are just a collection of one or more related Options. The GroupType field dictates how your form builder needs to treat that group. The most obvious example would be for your radio button groups; each of those would be an OptionGroup and each radio button would be a boolean Option within the OptionGroup. An OptionGroup could just as easily handle a multiple selection checkbox group or just some related text inputs that need a common header text (like a street address).
For further dynamic design OptionGroups are contained within GroupSections, even if there is just one default GroupSection in a form.
Finally, a Form models your final actual UI form and consists of one or more GroupSections.
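The hierarchy described above (Form, GroupSection, OptionGroup, Option) could be sketched roughly as follows, again in SQLite so it runs stand-alone. The field names OptionText, Description, Value, ValueType, DisplayOrder and GroupType come from the description; the key columns, the table name FormOption (chosen to avoid the word OPTION, which is reserved in MySQL) and the sample rows are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Form (FormId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE GroupSection (
    GroupSectionId INTEGER PRIMARY KEY,
    FormId INTEGER NOT NULL REFERENCES Form(FormId));
CREATE TABLE OptionGroup (
    OptionGroupId INTEGER PRIMARY KEY,
    GroupSectionId INTEGER NOT NULL REFERENCES GroupSection(GroupSectionId),
    GroupType TEXT NOT NULL);
CREATE TABLE FormOption (
    OptionId INTEGER PRIMARY KEY,
    OptionGroupId INTEGER NOT NULL REFERENCES OptionGroup(OptionGroupId),
    ParentOptionId INTEGER REFERENCES FormOption(OptionId),
    OptionText TEXT NOT NULL,
    Description TEXT,
    Value TEXT,
    ValueType TEXT NOT NULL,
    DisplayOrder INTEGER NOT NULL);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Form VALUES (1, 'Register item')")
conn.execute("INSERT INTO GroupSection VALUES (1, 1)")
conn.executemany(
    "INSERT INTO OptionGroup VALUES (?, ?, ?)",
    [(1, 1, "radio"), (2, 1, "checkbox")],
)
# Rows are inserted out of order on purpose; DisplayOrder decides the layout.
conn.executemany(
    "INSERT INTO FormOption (OptionGroupId, OptionText, ValueType, DisplayOrder)"
    " VALUES (?, ?, ?, ?)",
    [
        (1, "Sub_Option_2", "boolean", 2),
        (1, "Sub_Option_1", "boolean", 1),
        (2, "Sub_Option_3", "boolean", 1),
    ],
)


def options_for_form(conn, form_id):
    """Return (GroupType, OptionText) pairs in the order a form builder would render them."""
    return conn.execute(
        "SELECT g.GroupType, o.OptionText FROM FormOption o "
        "JOIN OptionGroup g ON g.OptionGroupId = o.OptionGroupId "
        "JOIN GroupSection s ON s.GroupSectionId = g.GroupSectionId "
        "WHERE s.FormId = ? ORDER BY g.OptionGroupId, o.DisplayOrder",
        (form_id,),
    ).fetchall()
```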
This should be flexible enough for you to tweak to your liking. What do you think?
Final note: if you are looking into dynamically building your forms in JavaScript, check out a few frameworks like X-editable or formly. They take JSON or configuration objects and build out the entire form, with validation etc., while giving you some hooks for event handling. Chances are you don't need to completely reinvent the wheel unless you want to keep your implementation as simple and specific as possible.

How do I resolve or avoid need for MySQL with multiple AUTO INCREMENT columns?

I have put a lot of effort into my database design, but I think I am
now realizing I made a major mistake.
Background: (Skip to 'Problem' if you don't need background.)
The DB supports a custom CMS layer for a website template. Users of the
template are limited to turning pages on and off, but not creating
their own 'new' pages. Further, many elements are non editable.
Therefore, if a page has a piece of text I want them to be able to edit,
I would have 'manually' assigned a static ID to it:
<h2><%= CMS.getDataItemByID(123456) %></h2>
Note: The scripting language is not relevant to this question, but the design forces
each table to have unique column names. Hence the convention of 'TableNameSingular_id'
for the primary key etc.
The scripting language would do a lookup on these tables to find the string.
mysql> SELECT * FROM CMSData WHERE CMSData_data_id = 123456;
+------------+-----------------+-----------------------------+
| CMSData_id | CMSData_data_id | CMSData_CMSDataType_type_id |
+------------+-----------------+-----------------------------+
| 1 | 123456 | 1 |
+------------+-----------------+-----------------------------+
mysql> SELECT * FROM CMSDataTypes WHERE CMSDataType_type_id = 1;
+----------------+---------------------+-----------------------+------------------------+
| CMSDataType_id | CMSDataType_type_id | CMSDataType_type_name | CMSDataType_table_name |
+----------------+---------------------+-----------------------+------------------------+
| 1 | 1 | String | CMSStrings |
+----------------+---------------------+-----------------------+------------------------+
mysql> SELECT * FROM CMSStrings WHERE CMSString_CMSData_data_id=123456;
+--------------+---------------------------+----------------------------------+
| CMSString_id | CMSString_CMSData_data_id | CMSString_string |
+--------------+---------------------------+----------------------------------+
| 1 | 123456 | The answer to the universe is 42. |
+--------------+---------------------------+----------------------------------+
The rendered text would then be:
<h2>The answer to the universe is 42.</h2>
This works great for 'static' elements, such as the example above. I used the exact same
method for other data types such as file specifications, EMail Addresses, Dates, etc.
However, it fails for when I want to allow the User to dynamically generate content.
For example, there is an 'Events' page and they will be dynamically created by the
User by clicking 'Add Event' or 'Delete Event'.
An Event table will use keys to reference other tables with the following data items:
Data Item: Table:
--------------------------------------------------
Date CMSDates
Title CMSStrings (As shown above)
Description CMSTexts (MySQL TEXT data type.)
--------------------------------------------------
Problem:
That means, each time an Event is created, I need to create the
following rows in the CMSData table;
+------------+-----------------+-----------------------------+
| CMSData_id | CMSData_data_id | CMSData_CMSDataType_type_id |
+------------+-----------------+-----------------------------+
| x | y | 6 | (Event)
| x+1 | y+1 | 5 | (Date)
| x+2 | y+2 | 1 | (Title)
| x+3 | y+3 | 3 | (Description)
+------------+-----------------+-----------------------------+
But there is the problem: in MySQL, a table can have only one AUTO_INCREMENT column.
If I query for the highest value of CMSData_data_id and just add 1 to it, there
is a chance there is a race condition, and someone else grabs it first.
How is this issue typically resolved - or avoided in the first place?
Thanks,
Eric
The id should be meaningless, except to be unique. Your design should work no matter if the block of 4 ids is contiguous or not.
Redesign your implementation to add the parts separately, not as a block of 4. Doing so should simplify things overall, and improve your scalability.
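A rough sketch of adding the parts separately, assuming the single generated key can double as the lookup id so that no second counter is needed. It uses SQLite's `cursor.lastrowid` so it runs stand-alone; with MySQL the equivalent would be `LAST_INSERT_ID()` or the driver's lastrowid attribute. The helper names are hypothetical, and the type ids are the ones listed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CMSData ("
    " CMSData_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " CMSData_CMSDataType_type_id INTEGER NOT NULL)"
)


def add_data_item(conn, type_id):
    """Insert one CMSData row and return the id the database generated for it."""
    cur = conn.execute(
        "INSERT INTO CMSData (CMSData_CMSDataType_type_id) VALUES (?)",
        (type_id,),
    )
    return cur.lastrowid


def add_event(conn):
    """Create the four parts of an Event one at a time, inside one transaction."""
    parts = (("event", 6), ("date", 5), ("title", 1), ("description", 3))
    with conn:  # commits on success, rolls back if any insert fails
        return {name: add_data_item(conn, type_id) for name, type_id in parts}
```

Nothing here relies on the four ids being contiguous; each part just records whatever id it was given.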
What about locking the table before writing to it? That way, when you insert a row into the CMSData table, you can safely read the last id.
Another suggestion would be to not use an incremented id at all, but a generated unique one, such as a GUID.
See LOCK TABLES in the MySQL manual.
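The generated-id suggestion could look something like this. It is only a sketch: the TEXT key column and the new_data_id helper are assumptions, and SQLite stands in for MySQL so that the example runs on its own.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CMSData ("
    " CMSData_data_id TEXT PRIMARY KEY,"   # assumption: GUID stored as text
    " CMSData_CMSDataType_type_id INTEGER NOT NULL)"
)


def new_data_id():
    """Generate an id on the client; no round trip to the database is needed."""
    return str(uuid.uuid4())


def add_data_item(conn, type_id):
    """Insert one row under a freshly generated GUID and return that GUID."""
    data_id = new_data_id()
    with conn:
        conn.execute("INSERT INTO CMSData VALUES (?, ?)", (data_id, type_id))
    return data_id
```

Since the id is created before the INSERT, there is no "read the highest value and add 1" step left to race on.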

Database design and query optimization/general efficiency when joining 6 tables in mySQL

I have 6 tables. These are simplified for this example.
user_items
ID | user_id | item_name | version
-------------------------------------
1 | 123 | test | 1
data
ID | name | version | info
----------------------------
1 | test | 1 | info
data_emails
ID | name | version | email_id
------------------------
1 | test | 1 | 1
2 | test | 1 | 2
emails
ID | email
-------------------
1 | email#address.com
2 | second#email.com
data_ips
ID | name | version | ip_id
----------------------------
1 | test | 1 | 1
2 | test | 1 | 2
ips
ID | ip
--------
1 | 1.2.3.4
2 | 2.3.4.5
What I am looking to achieve is the following.
The user (123) has the item with name 'test'. This is the basic information we need for a given entry.
There is data in our 'data' table and the current version is 1, so the version in our user_items table is also 1. The two tables are linked by the name and version. The setup is like this because a user could have an item for which we don't have data; likewise, there could be an item for which we have data but that no user owns.
For each item there are also 0 or more emails and ips associated. These can be the same for many items so rather than duplicate the actual email varchar over and over we have the data_emails and data_ips tables which link to the emails and ips table respectively based on the email_id/ip_id and the respective ID columns.
The emails and ips are associated with the data version again through the item name and version number.
My first question: is this a good, well-optimized database setup?
My next and main question is about joining this complex data structure.
What I had was:
PHP
- get all the user items
- loop through them and get the most recent data entry (if any)
- if there is one get the respective emails
- get the respective ips
Does that count as 3 queries or essentially infinite depending on the number of user items?
I was made to believe that the above was inefficient and as such I wanted to condense my setup into using one query to get the same data.
I have achieved that with the following code
SELECT user_items.name,GROUP_CONCAT( emails.email SEPARATOR ',' ) as emails, x.ip
FROM user_items
JOIN data AS data ON (data.name = user_items.name AND data.version = user_items.version)
LEFT JOIN data_emails AS data_emails ON (data_emails.name = user_items.name AND data_emails.version = user_items.version)
LEFT JOIN emails AS emails ON (data_emails.email_id = emails.ID)
LEFT JOIN
(SELECT name,version,GROUP_CONCAT( the_ips.ip SEPARATOR ',' ) as ip FROM data_ips
LEFT JOIN ips as the_ips ON data_ips.ip_id = the_ips.ID )
x ON (x.name = data.name AND x.version = user_items.version)
I have done loads of reading to get to this point and worked tirelessly to get here.
This works as I require - this question seeks to clarify what are the benefits of using this instead?
I have had to use a subquery (I believe?) to get the ips as previously it was multiplying results (I believe based on the complex joins). How this subquery works I suppose is my main confusion.
Summary of questions.
-Is my database well set up for my usage? Any improvements would be appreciated, and any useful resources to help me expand my knowledge would be great.
-How does the subquery in my sql actually work - what is the query doing?
-Am I correct to keep using left joins? I want to return the user item, with null values on the right where applicable.
-Am I essentially replacing a potentially infinite number of queries with 2? Does this make a REAL difference? Can the above be improved?
-Given that when I update the version of an item in my data table I now have to update the version in the user_items table, I have a few more update queries to do. Is the tradeoff of this setup in practice worthwhile?
Thanks to anyone who contributes to helping me get a better grasp of this !!
Given your data layout and your objective, the query is correct. If you've only got a small amount of data it shouldn't be a performance problem, but that will change quickly as the amount of data grows. However, when you have a large amount of data there are very few circumstances where you should ever see all your data in one go, implying that the results will be filtered in some way. Exactly how they are filtered has a huge impact on the structure of the query.
How does the subquery in my sql actually work
Currently it doesn't work properly - there is no GROUP BY
Is the tradeoff of this setup in practice worthwhile?
No - it implies that your schema is too normalized.
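To illustrate the GROUP BY point, here is one way the aggregation could be written, with the emails and the ips each grouped by name and version in their own subquery before being joined back. It is a sketch against SQLite using the sample rows from the question, so it uses SQLite's `GROUP_CONCAT(x, ',')` form; in MySQL that would be `GROUP_CONCAT(x SEPARATOR ',')`. The items_for_user helper and the table aliases are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_items (ID INTEGER PRIMARY KEY, user_id INTEGER, item_name TEXT, version INTEGER);
CREATE TABLE data (ID INTEGER PRIMARY KEY, name TEXT, version INTEGER, info TEXT);
CREATE TABLE data_emails (ID INTEGER PRIMARY KEY, name TEXT, version INTEGER, email_id INTEGER);
CREATE TABLE emails (ID INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE data_ips (ID INTEGER PRIMARY KEY, name TEXT, version INTEGER, ip_id INTEGER);
CREATE TABLE ips (ID INTEGER PRIMARY KEY, ip TEXT);
INSERT INTO user_items VALUES (1, 123, 'test', 1);
INSERT INTO data VALUES (1, 'test', 1, 'info');
INSERT INTO data_emails VALUES (1, 'test', 1, 1), (2, 'test', 1, 2);
INSERT INTO emails VALUES (1, 'email#address.com'), (2, 'second#email.com');
INSERT INTO data_ips VALUES (1, 'test', 1, 1), (2, 'test', 1, 2);
INSERT INTO ips VALUES (1, '1.2.3.4'), (2, '2.3.4.5');
""")

QUERY = """
SELECT ui.item_name, e.emails, i.ips
FROM user_items AS ui
JOIN data AS d ON d.name = ui.item_name AND d.version = ui.version
LEFT JOIN (
    -- one row per (name, version), so joining it cannot multiply rows
    SELECT de.name, de.version, GROUP_CONCAT(em.email, ',') AS emails
    FROM data_emails AS de
    JOIN emails AS em ON em.ID = de.email_id
    GROUP BY de.name, de.version
) AS e ON e.name = d.name AND e.version = d.version
LEFT JOIN (
    SELECT di.name, di.version, GROUP_CONCAT(ip.ip, ',') AS ips
    FROM data_ips AS di
    JOIN ips AS ip ON ip.ID = di.ip_id
    GROUP BY di.name, di.version
) AS i ON i.name = d.name AND i.version = d.version
WHERE ui.user_id = ?
"""


def items_for_user(conn, user_id):
    """Return one (item_name, emails, ips) row per item owned by the user."""
    return conn.execute(QUERY, (user_id,)).fetchall()
```

Without the GROUP BY, the aggregate collapses every row of data_ips into a single group regardless of item, which only looks right while the table holds one item.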

How do I use mysql to match against multiple possibilities from a second table?

I'm not entirely sure how to ask this question, so I'll lead by providing an example table and an example output and then follow up with a more thorough explanation of what I'm attempting to accomplish.
Imagine that I have two tables. In the first is a list of companies. Some of these companies have duplicate entries due to being imported and continuously updated from different sources. For example, the company table may look something like this:
| rawName | strippedName |
| Kohl's | kohls |
| kohls.com | kohls |
| kohls Corporation | kohls |
So in this situation, we have information that has come in from three different sources. In an attempt to allow my program to understand that all three sources refer to the same store, I created the stripped name column (which I also use for creating URLs and whatnot).
In the second table, we have information about deals, coupons, shipping offers, etc. However, since these come in from their various sources, they end up with the three different rawNames that we identified above. For example, the second table might look something like this:
| merchantName | dealInformation |
| kohls.com | 10% off everything... |
| kohl's | Free shipping on... |
| kohls corporation | 1 Day Flash Sale! |
| kohls.com | Buy one get one... |
So here we have four entries that are all from the same company. However, when a user on the site visits the listing for Kohls, I want it to display all the entries from each source.
Here is what I currently have, but it doesn't seem to be doing the trick. This seems to only work if I set the LIMIT in that sub-query to 1 so that it only brings back one of the rawNames. I need it to match against all of the rawNames.
SELECT * FROM table2
WHERE merchantName = (SELECT rawName FROM table1 WHERE strippedName = '".$strippedName."')
The quickest fix is to replace your merchantName = with merchantName IN
SELECT * FROM table2
WHERE merchantName IN (SELECT rawName FROM table1 WHERE strippedName = '".$strippedName."')
The = operator needs to have exactly one value on each side - the IN keyword matches a value against multiple values.
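A small runnable sketch of the IN version, using the sample rows from the question. It passes the stripped name as a bound parameter instead of concatenating it into the SQL string. SQLite stands in for MySQL here, and the columns are declared COLLATE NOCASE as an assumption, to mimic the case-insensitive comparison MySQL's default collations give you; the deals_for helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (rawName TEXT COLLATE NOCASE, strippedName TEXT);
CREATE TABLE table2 (merchantName TEXT COLLATE NOCASE, dealInformation TEXT);
INSERT INTO table1 VALUES
    ('Kohl''s', 'kohls'),
    ('kohls.com', 'kohls'),
    ('kohls Corporation', 'kohls');
INSERT INTO table2 VALUES
    ('kohls.com', '10% off everything...'),
    ('kohl''s', 'Free shipping on...'),
    ('kohls corporation', '1 Day Flash Sale!'),
    ('kohls.com', 'Buy one get one...');
""")


def deals_for(conn, stripped_name):
    """Return every deal whose merchantName matches any rawName of the company."""
    return conn.execute(
        "SELECT merchantName, dealInformation FROM table2 "
        "WHERE merchantName IN ("
        "  SELECT rawName FROM table1 WHERE strippedName = ?)",
        (stripped_name,),
    ).fetchall()
```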

mysql lookup table

Lookup table - unique row identity
The other lookup-table examples I have found just do not make sense to me: they give a row an ID, put that ID in another table which also has an ID, add these IDs to more tables which may reference them, and still create lookup tables with more IDs (this is how all the examples I can find seem to work). What I have done is this:
product_item - table
------------------------------------------
id | title | supplier | price
1 | title11 | supplier1 | price1
etc.
it then goes on to include more items (sure you get it)
product_feature - table
--------------------------
id | title | iskeyfeature
1 | feature1 | true
feature_desc - table
-----------------------------
id | title | desc
1 | desc1 | text description
product_lookup - table
item_id | feature_id | feature_desc
1 | 1 | 1
1 | 2 | 2
1 | 3 | 3
1 |64 | 15
(As these only need to be referenced in the lookup, there can be multiple ids per item or multiple items per feature.)
What I want to do, without adding item_id to every feature row or description row, is retrieve only the columns from the multiple tables whose ids are referenced in the same row of the lookup table. I want to know whether it is possible to select all the referenced columns from the lookup rows if I only know the item_id, e.g. for item_id = 1, return all rows where item_id = 1 along with the columns referenced in each row. Every item can have multiple features, and every feature can be attached to multiple items; this will not matter if I can just get the pattern right for constructing this query from a single known value.
Any assistance or just some direction will be greatly appreciated. I'm using phpMyAdmin, and I'm sure this would be easier with some PHP voodoo, but I am learning MySQL from tutorials etc. and would like to know how to do it with SQL directly.
Having a NULL value in a column is not the major concern that would lead to this design - it's the problem with adding new attribute columns in the future, at which MySQL is disgracefully bad.
If you want to make a query that returns everything about an item in one row, you need to LEFT OUTER JOIN back to the product_lookup table for each feature_id. This is about every 10th mysql question on Stack Overflow, so you should be able to find tons of examples.
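For the simpler shape, one result row per feature of a known item, a join through product_lookup could be sketched like this. SQLite stands in for MySQL so the example runs on its own. The desc column is renamed description here because DESC is a reserved word, and the second feature, second description and second item are made-up sample rows; the features_for_item helper is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_item (id INTEGER PRIMARY KEY, title TEXT, supplier TEXT, price TEXT);
CREATE TABLE product_feature (id INTEGER PRIMARY KEY, title TEXT, iskeyfeature INTEGER);
CREATE TABLE feature_desc (id INTEGER PRIMARY KEY, title TEXT, description TEXT);
CREATE TABLE product_lookup (item_id INTEGER, feature_id INTEGER, feature_desc INTEGER);
INSERT INTO product_item VALUES (1, 'title1', 'supplier1', 'price1'), (2, 'title2', 'supplier2', 'price2');
INSERT INTO product_feature VALUES (1, 'feature1', 1), (2, 'feature2', 0);
INSERT INTO feature_desc VALUES (1, 'desc1', 'text description'), (2, 'desc2', 'another description');
INSERT INTO product_lookup VALUES (1, 1, 1), (1, 2, 2), (2, 1, 1);
""")


def features_for_item(conn, item_id):
    """Return (item title, feature title, description) for each lookup row of the item."""
    return conn.execute(
        "SELECT pi.title, pf.title, fd.description "
        "FROM product_lookup AS pl "
        "JOIN product_item AS pi ON pi.id = pl.item_id "
        # LEFT JOIN keeps the lookup row even if a feature or description is missing
        "LEFT JOIN product_feature AS pf ON pf.id = pl.feature_id "
        "LEFT JOIN feature_desc AS fd ON fd.id = pl.feature_desc "
        "WHERE pl.item_id = ? ORDER BY pl.feature_id",
        (item_id,),
    ).fetchall()
```

Only the item_id is needed as input; the lookup row supplies the other two ids.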