Mysql DB Is this the most efficient design? - mysql

I have an existing mysql DB that manages regulations for 50 states. The current setup is relational - three tables for EACH of the 50 states:
state_table contains the chapter/sub-chapter headings
item_table contains the end records
department_table contains the ID's to relate the two.
all combined it handles around 620,000 records
I'm not a DB design expert and have always utilized this as-is and gotten-by however, the nature of tables for all 50 states limits searching across all states etc. and I'm wondering if there is a better approach.
I'm wondering if I should consider combining this into either a single set of 3 relational tables for the entire nation or even a single table to handle everything.
I've asked this on other forums and have been told to read various volumes of DB schema and structures etc. so if there is someone who can just suggest the direction to go in and the pro's and con's of what I have vs the alternative that would be great!
thanks!
Here's the way it is, X 50
alabama
ID
Name
State
Parent
Description
alabama_department
Department - ID's from "alabama"
Item - ID's from "alabama_item"
alabama_item
ID
Name
Description
Keywords
Doc_ID
Effective_date
...
...
The Queries: I step through the heirarchy of chapter/sub-chapter/end-record via links this works fine but I'm starting to focus more on search capability and also thinking what I have is overkill and it sounds like a couple of you think so (overkill)

If I am correct in thinking you have 150 tables (3 * 50 states) Then:
You should have a 'states' table which includes a stateID and stateName. Then use ONE table for chapter/subchapters, ONE for departments, and ONE for end records and use the stateID to relate different records to a state.
You should not have 3 tables for each state, you can use one of each and just relate to a state table. This brings you to four tables instead of 150.

Related

How best to implent 3-dimensional SQL relationship?

This question is an extension of another question I asked regarding many-to-many relationships in MySQL.
I currently have 3 tables that I need to link with a 4th intermediary table:
Stores, Products, and States
My intermediary table, _stores_products_states, combines the id from the other three tables to determine which product is sold by which store and in which state.
Now, as I understand it, I would need to create an entry in _stores_products_states for every possible combination of the three, correct? This would lead to thousands of duplicated values in 1-2 of the columns (though never all 3).
For example, if Best Guy sells both GI Bros and Darbies in all 50 states, that would be 100 entries just for those two products. If those products are sold by another store, they too would have 100 entries.
Is this the correct way to implement this kind of relationship?
EDIT:
This whole setup is basically just to determine the availability of a particular product. A user will search for a product and receive a list of stores that sell that product in their state.
The 4th table is the way!
So if I got it right, your '_stores_products_states' table could even be called sale
You do not need to create a record for all possible combinations of product, state, and store. You only need to create a record for existing combinations, that is, availability of a product in a store in a state (maybe with things like local price and quantity bolted on).
You will have to store this information one way or another; a 3-relation link table, especially stored as a clustered MySQL index, would be a pretty standard solution, with good performance characteristics.
One thing I wonder about is why you have stores separate from states. I'd expect a store to be associated with a state. With the 3-relation link table, you'd be able to associate the same store with a product in several different states. Is this what your business domain supposes?

MYSQL Selection Value Table Structure

I am creating a website where users can post a listing of their home. I have checkboxes where users can check the characteristics their home contains such as a pool, fireplace, attached/detached garage etc.
I had to designs in mind but I was wondering which is more correct:
Create a column in the home listing table for each characteristics and give it a type of enum('0','1') where 0 stands for not checked and 1 stands for checked
Create a table which holds all the characteristics a property can have like: garage, pool, fireplace etc.. and then create a second table that pulls the characteristic id and pairs it with a home listing id
For eg: home_1 has a pool so a row will be created like this:
| home_1 | 1 |
where home_1 is the listing id and 1 is the id of pool in the characteristics table
Which option should I go with?
Option 1 seems good, because if you go with 2nd option then there will be joins while querying the database. And join are expensive and time taking in MySQL.
more can be found here https://www.percona.com/blog/2013/07/19/what-kind-of-queries-are-bad-for-mysql/
If you want to query the data like "count all detached houses"
Enum with seperate columns will work faster and easier to handle db operations.
If you are willing to query houses ONLY ON addresses, price and such NOT those features. 2nd method is easier to develop and maintain.
In short, use 2nd method if u are not going to query those house characteristics
individually.
It all depends on your method of using the data after you save them. But the basic idea should be to consider mappings in these ways:
Go with the second option when:
If the two entities are many-many (many homes, many characteristics) you should go with the second option (even if it adds little cost of using joins in future).
Since your full db mapping is not known, I am proposing one more option IF the characteristics are independent of property. Meaning, if you are planning to use characteristics to reference some other entities of other tables, then it will be best again to go with your second option.
Go with the first option when
If it is just one-many relationship (one home, many characteristics), your first option works good because not only it would reduce cost while fetching but also will update/remove the dependent characteristics of home when your home record gets updated/deleted.
Lastly, Its only up to you to decide the mapping type and dependencies of data models.

Access query is duplicating unique records / Linked table issues

I hope someone can help me with this:
I have a simple query combining a list of names and basic details with another table containing more specific information. Some names will necessarily appear more than once and arbitrary distinctions like "John Smith 1" and "John Smith 2" are not an option, so I have been using an autonumber to keep the records distinct.
The problem is that my query is creating two records for each name that appears more than once. For example, there are two clients named 'Sophoan', each with a different id number, and the query has picked up each one twice resulting in four records (in total there are 122 records when there should only be 102). 'Unique values' is set to 'yes'.
I've researched as much as I can and am completely stuck. I've tried to tinker with sql but it always comes back with errors, I presume because there are too many fields in the query.
What am I missing? Or is a query the wrong approach and I need to find another way to combine my tables?
Project in detail: I'm building a database for a charity which has two main activities: social work and training. The database is to record their client information and the results of their interactions with clients (issues they asked for help with, results of training workshops etc.). Some clients will cross over between activities which the organisation wants to track, hence all registered clients go into one list and individual tables spin of that to collect data for each specific activity the client takes part in. This query is supposed to be my solution for combining these tables for data entry by the user.
At present I have the following tables:
AllList (master list of client names and basic contact info; 'Social Work Register' and 'Participant Register' join to this table by
'Name')
Social Work Register (list of social work clients with full details
of each case)
Social Work Follow-up Table (used when staff call social work clients
to see how their issue is progressing; the register has too many
columns to hold this as well; joined to Register by 'Client Name')
Participants Register (list of clients for training and details of
which workshops they were attended and why they were absent if they
missed a session)
Individual workshop tables x14 (each workshop includes a test and
these tables records the clients answers and their score for each
individual test; there will be more than 20 of these when the
database is finished; all joined to the 'Participants Register' by
'Participant Name')
Queries:
Participant Overview Query (links the attendance data from the 'Register' with the grading data from each Workshop to present a read-only
overview; this one seems to work perfectly)
Social Work Query (non-functional; intended to link the 'Client
Register' to the 'AllList' for data entry so that when a new client
is registered it creates a new record in both tables, with the
records matched together)
Participant Query (not yet attempted; as above, intended to link the
'Participant Register' to the 'AllList' for data entry)
BUT I realised that queries can't be used for data entry, so this approach seems to be a dead end. I have had some success with using subforms for data entry but I'm not sure if it's the best way.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
[N.B. There are more tables that store secondary information but aren't relevant to the issue as they are not and will not be linked to any other tables.]
I realised that queries can't be used for data entry
Actually, non-complex queries are usually editable as long as the table whose data you want to edit remains 'at the core' of the query. Access applies a number of factors to determine if a query is editable or not.
Most of the time, it's fairly easy to figure out why a query has become non-editable.
Ask yourself the question: if I edit that data, how will Access ensure that exactly that data will be updated, without ambiguity?
If your tables have defined primary keys and these are part of your query, and if there are no grouping, calculated fields (fields that use some function to change or test the value of that field), or complex joins, then the query should remain editable.
You can read more about that here:
How to troubleshoot errors that may occur when you update data in Access queries and in Access forms
Dealing with Non-Updateable Microsoft Access Queries and the Use of Temporary Tables.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
This remark actually proves that you have design issues in your database.
A basic tenet of Database Design is to remove redundancy as much as possible. One of the reasons is actually to avoid having to update the same data in multiple places.
Another remark: you are using the Client's name as a Natural Key. Frankly, it is not a very good idea. Generally, you want to make sure that what constitutes a Primary key for a table is reliably unique over time.
Using people's names is generally the wrong choice because:
people change name, for instance in many cultures, women change their family name after they get married.
There could also have been a typo when entering the name and now it can be hard to correct it if that data is used as a Foreign Key all in different tables.
as your database grows, you are likely to end up with some people having the same name, creating conflicts, or forcing the user to make changes to that name so it doesn't create a duplicate.
The best way to enforce uniqueness of records in a table is to use the default AutoNumber ID field proposed by Access when you create a new table. This is called a Surrogate key.
It's not mean to be edited, changed or even displayed to the user. It's sole purpose is to allow the primary key of a table to be unique and non-changing over time, so it can reliably be used as a way to reference a record from one table to another (if a table needs to refer to a particular record, it will contain a field that will hold that ID. That field is called a Foreign Key).
The names you have for your tables are not precise enough: think of each table as an Entity holding related data.
The fact that you have a table called AllList means that its purpose isn't that well-thought of; it sounds like a catch-all rather than a carefully crafted entity.
Instead, if this is your list of clients, then simply call it Client. Each record of that table holds the information for a single client (whether to use plural or singular is up to you, just stick to your choice though, being consistent is hugely important).
Instead of using the client's name as a key, create an ID field, an Autonumber, and set it as Primary Key.
Let's also rename the "Social Work Register", which holds the Client's cases, simply as ClientCase. That relationship seems clear from your description of the table but it's not clear in the table name itself (by the way, I know Access allows spaces in table and field names, but it's a really bad idea to use them if you care at least a little bit about the future of your work).
In that, create a ClientID Number field (a Foreign Key) that will hold the related Client's ID in the ClientCase table.
You don't talk about the relationship between a Client and its Cases. This is another area where you must be clear: how many cases can a single Client have?
At most 1 Case ? (0 or 1 Case)
exactly 1 Case?
at least one Case? (1 or more Cases)
any number of Cases? (0 or more Cases)
Knowing this is important for selecting the right type of JOIN in your queries. It's a crucial part of the design assumptions when building your database.
For instance, in the most general case, assuming that a Client can have 0 or more cases, you could have a report that displays the Client's Name and the number of cases related to them like this:
SELECT Client.Name,
Count(ClientCase.ID) AS CountOfCases
FROM Client
LEFT JOIN ClientCase
ON Client.ID = ClienCase.ClientID
GROUP BY Client.Name
You've described your basic design a bit more, but that's not enough. Show us the actual table structures and the SQL of the queries you tried. From the description you give, it's hard to really understand the actual details of the design and to tell you why it fails and how to make it work.

MySQL table design, one row or more pr user?

Using MySQL I have table of users, a table of matches (Updated with the actual result) and a table called users_picks (at first it's always going to be 10 football matches pr. gameweek pr. league because there's only one league as of now, but more leagues will come along eventually, and some of them only have 8 matches pr. gameweek).
In the users_picks table should i store each 'pick' (by pick I mean both 'hometeam score' and 'awayteam score') in a different row, or have all 10 picks in one single row? Both with a FK for user and gameweek. All picks in one row would mean I had columns with appended numbers like this:
Option 1: [pick_id, user_id, league_id, gameweek_id, match1_hometeam_score, match1_awayteam_score, match2_hometeam_score, match2_awayteam_score ... etc]
and that option doesn't quite fill me with joy, and looks a bit stupid. Especially since there's going to be lots of potential NULLs in the db. The second option would mean eventually millions of rows. But would look like this:
Option 2: [pick_id, user_id, league_id, gameweek_id, match_id, hometeam_score, awayteam_score]
What's the best practice? And would it be a PITA to do all sorts of statistics using the second option? eg. Calculating how many matches a user has hit correctly in a specific round, how many alltime correct hits etc.
If I'm not making much sense, I'll try to elaborate anything. I just wan't my table design to be good from the start, so I won't have a huge headache in a couple of months.
Thanks in advance.
The second choice is much better than the first. This is called database normalisation and makes querying easier, not harder. I would suggest reading the linked article, and the related descriptions of the various "normal forms", and aiming for a 3rd Normal Form data structure as a minimum.
To see the flaw in your first option, imagine if there were to be included later a new league with 11 matches. Or 400.
You should read up about database normalization.
When you have a 1:n relation, like in your case one team having many matches, you would create two tables. One table "teams" and a second table "matches" where each row includes the ID of the team which played the match.
In the same manner you should also have separate tables for users, picks and leagues.
Option two is better, provided you INDEX your table properly, since (as you indicate) it will grow quite large. The pick_id is the primary key, but also create an INDEX on the user_id field, as likely the most common query will be
SELECT * FROM `users_pics` WHERE `user_id`=?;
to get all the picks for a given user.

How to store hierarchical information into a database?

I have the following information that should be retrieved by using several dependent select fields on a web form:
Users will be able to add new categories.
Food
- Fruits
- Tropical
- Pineapples
- Pineapples - Brazil
- Pineapples - Hawaii
- Coconuts
- Continental
- Orange
- Fish
....
This data should come from a database.
I realize that creating a table for each category here presented is not a good schema perhaps, so I would to ask, if is there any standard way to deal with this?
I'm also aware of this schema example:
Managing Hierarchical Data in MySQL
Is there any other (perhaps more intuitive way) to store this type of information ?
The link you provided describes the two standard ways for storing this type of information:
Adjacency List
Nested Sets
One issue your question didn't raise is whether all fruits have the same attributes or not.
If all fruits have the same attributes, then the answer that tells you to look at the link you provided and read about adjacency lists and nested sets is correct.
If new fruits can have new attributes, then a user that can add a new fruit can also add a new attribute. This can turn into a mess, real easily. If two users invent the same attribute, but give it a different name, that might be a problem. If two users invent different attributes, but give them the same name, that's another problem.
You might just as well say that, conceptually, each user has their own database, and no meaningful queries can be made that combine data from different users. Problem is, the mission of the database almost always includes, sooner or later, bringing together all the data from the different users.
That's where you face a nearly impossible data management issue.
Kawu gave you the answer.... a recursive relation (the table will be be related to itself) aka Pig's Ear relation.
You example shows a parent with several children, but you didn't say if an item can belong to more that one parent. Can an orange be in 'Tropical' and in 'Citrus'?
Each row has an id and a parent_id with the parent_id pointing to the id of another row.
id=1 name='Fruits' parent_id=0
id=2 name='Citrus' parent_id=1
id=3 name='Bitter Lemon' parent_id=2
id=4 name='Pink Grapefruit' parent_id=2
Here are some examples of schemas using this type of relation to provide unlimited parent-child relations:
Data model for product categories
Data model for organizations and people