Most Efficient Method of Storing a List in MySQL - mysql

I'm relatively new to databases and MySQL, but I'm using it to connect a database to a program I've made in VB.NET. Along with many programming languages, I understand SQL, but I have very little experience with databases. Also, I'm using MySQL Workbench (if it helps to know).
I am creating a program which retrieves information from the database. This program in particular is a guide for cooking.
The Database
My database consists of one table named "recipes". Within the table are four columns, each named (in order): ID, Recipe Name, Origin, Ingredients.
My only problem is I plan on storing around 80 or so recipes within the database; however, this will not be a difficult task because I'm simply copy-and-pasting from a Wikia page.
The Problem
The Wikia page I'm copying my ingredients from presents them as a numbered list, therefore I cannot simply copy the nearly ten entries and paste them into my ingredients column because it will not let me (typing them would take ages as well). This is an issue because I need to retrieve all the ingredients as a list, and I thought it would be inefficient to create over ten different columns.
Conclusion
Is there a more efficient way to store a list of items rather than creating multiple columns? How can I combat this issue?

Have multiple tables. Have a table of recipes with columns ID, Recipe Name and Origin, and another table of ingredients which contains ID, Recipe ID and Ingredient (i.e., one row per recipe per ingredient).
Your initial ideas (i.e., either all ingredients in one column, or many columns, one for each ingredient) would be inefficient and also difficult to query. For example, finding which recipes contain a particular ingredient would be difficult.
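The two-table layout described above could be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database instead of MySQL, and the sample recipes, the column spellings and the helper names ingredients_for and recipes_containing are assumptions made for the example, not part of the original question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption); the layout is the point.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipes (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        origin TEXT
    );
    CREATE TABLE ingredients (
        id         INTEGER PRIMARY KEY,
        recipe_id  INTEGER NOT NULL REFERENCES recipes(id),
        ingredient TEXT NOT NULL
    );
""")

# Hypothetical sample data: one row per recipe per ingredient.
conn.executemany(
    "INSERT INTO recipes (id, name, origin) VALUES (?, ?, ?)",
    [(1, "Pancakes", "USA"), (2, "Omelette", "France")],
)
conn.executemany(
    "INSERT INTO ingredients (recipe_id, ingredient) VALUES (?, ?)",
    [(1, "flour"), (1, "egg"), (1, "milk"), (2, "egg"), (2, "butter")],
)


def ingredients_for(recipe_id):
    """Return the ingredient list of one recipe, in insertion order."""
    rows = conn.execute(
        "SELECT ingredient FROM ingredients WHERE recipe_id = ? ORDER BY id",
        (recipe_id,),
    )
    return [row[0] for row in rows]


def recipes_containing(ingredient):
    """Return the names of all recipes that use the given ingredient."""
    rows = conn.execute(
        "SELECT r.name FROM recipes r "
        "JOIN ingredients i ON i.recipe_id = r.id "
        "WHERE i.ingredient = ? ORDER BY r.name",
        (ingredient,),
    )
    return [row[0] for row in rows]
```

With this layout a recipe can have any number of ingredients without changing the table definition, and the "which recipes contain X" question becomes a single join.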

Related

Structuring and Designing SQL Database for List of Items Containing More Lists of Items

I have collected a bunch of crafts and DIY instructionals over the years. Each has a list of instructions and a list of materials needed.
I want to store these in a database, but as I have limited experience I'm questioning the correct/proper way to structure and design the database for SQL.
Initially I was going to put everything in one table, so each record would represent a project, and then there would be one field for the instructions and one field for the materials. Each set of instructions would go into one value, perhaps separated by some character or symbol.
However, I'm wondering if it's better to separate the instructions and materials into their own tables (a table for each instruction set for each project). But with this method, there would have to be hundreds of tables if I create a table for each set of instructions for each project.
The third option would be to put ALL the instructions for each project into one table, then give them an id for the corresponding project, and an id for the order they should appear. But this would mean possible duplicates, especially in the materials table (hammer, for instance would appear multiple times, but would have separate ids referring to separate projects). I also worry about the table being too big with thousands of entries.
The fourth option would be to have one table, but add an excessive number of fields for instructions and materials, and only use the ones needed for each project (for example 100 "materials" fields, but perhaps one project only uses 10 and the other 99), but this does not seem proper, and could lead to trouble later on.
Is there a standard way to go about solving this problem, structuring a database for a list of things, which each reference another list of things?

sql query to check many interests are matched

So I am building a swingers site. The users can search other users by their interests. This is only part of a number of parameters used to search a user. The thing is there are like 100 different interests. When searching another user they can select all the interests the user must share. While I can think of ways to do this, I know it is important the search be as efficient as possible.
The backend uses jdbc to connect to a mysql database. Java is the backend programming language.
I have debated using multiple columns for interests, but the thing is that the generated SQL query need not check them all if those columns are not addressed in the JSON object sent to the server telling it the search criteria. Also, I worry I may have to make painful modifications to the table at a later point if I add new columns.
Another thing I thought about was having some kind of long byte array, or number (used like a byte array) stored in a single column. I could & this with another number corresponding to the interests the user is searching for but I read somewhere this is actually quite inefficient despite it making good sense to my mind :/
And all of this has to be part of one big sql query with multiple tables joined into it.
One of the issues with me using multiple columns would be the computing power used to run statement.setBoolean on what could be 40 columns.
I thought about generating an xml string in the client then processing that in the sql query.
Any suggestions?
I think the correct term is a Bitmask. I could maybe have one table for the bitmask that maps the users id to the bitmask for querying users interests, and another with multiple entries for each interest per user id for looking up which user has which interests efficiently if I later require this?
Basically, it would be great to have a separate table with all the interests, 2 columns: id and interest.
Then, have a table that links the user to the interests: user_interests which would have the following columns: id,user_id,interest_id. Here some knowledge about many-to-many relations would help a lot.
Hope it helps!
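A rough sketch of how the interests and user_interests tables could answer "which users share all of the selected interests" is shown below. It uses Python's sqlite3 module as a stand-in for MySQL over JDBC; the sample rows, the UNIQUE constraint and the helper name users_with_all are assumptions for illustration. The GROUP BY / HAVING COUNT pattern is one common way to express "must match all", not necessarily the fastest for every data set.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interests (
        id       INTEGER PRIMARY KEY,
        interest TEXT NOT NULL UNIQUE
    );
    CREATE TABLE user_interests (
        id          INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL,
        interest_id INTEGER NOT NULL REFERENCES interests(id),
        UNIQUE (user_id, interest_id)
    );
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO interests (id, interest) VALUES (?, ?)",
    [(1, "hiking"), (2, "cooking"), (3, "chess")],
)
conn.executemany(
    "INSERT INTO user_interests (user_id, interest_id) VALUES (?, ?)",
    [(10, 1), (10, 2), (10, 3), (11, 1), (12, 1), (12, 2)],
)


def users_with_all(interest_ids):
    """Return ids of users who have every interest in interest_ids."""
    wanted = sorted(set(interest_ids))
    if not wanted:
        return []
    # Only the selected interests appear in the query, so adding a new
    # interest later means adding a row, not altering the table.
    placeholders = ",".join("?" * len(wanted))
    sql = (
        "SELECT user_id FROM user_interests "
        f"WHERE interest_id IN ({placeholders}) "
        "GROUP BY user_id "
        "HAVING COUNT(DISTINCT interest_id) = ? "
        "ORDER BY user_id"
    )
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]
```

The same subquery can be joined into a larger search query on user_id alongside the other search parameters.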

mySQL: one entry multiple strings vs multiple entries

At the moment I'm working on a complex database. I've got one table where I'd like to insert data every day from dozens of users.
Example:
There are 200 recipes for menus (each column a recipe) and 200+ users. Every user uses a recipe between 1 and 3 times a day. In addition to that, I want to track the ingredients and their amounts. There is even more data on top of that, like an evaluation of how difficult the cooking was, how nice it tasted, and so on.
First idea was to make one entry per usage:
[user id, timestamp, recipe#1, recipe#2, ... , recipe#200] // daily up to 3 entries per user
Details of the recipe would be in an array. I was wondering if I could make that easier. I want to synchronise the user's app and the database once per week. So could it be easier to make one entry for the week and differentiate the recipe usage with a timestamp in that array?
Second idea:
[user id, recipe#1,...,recipe#200]
=>'1','"details","timestamp"','"details","timestamp"','"details","timestamp"'
// weekly one entry per user
If I want to show charts with stats about the recipes, idea 1 would be easy, but depending on the users and their entries my database would grow almost exponentially. Could it be better to go with idea 2, reducing it to one entry per week and differentiating with timestamps inside those arrays?
I also don't like the idea of maintaining a structure like this. Adding more recipes wouldn't be very dynamic. Basically, users are growing, recipes are growing, details are dynamic, and timestamps keep getting inserted without end.
At the end of the day I want to display stats, behaviour depending on user, on time, be able to be sorted by every category possible - which gives me a headache :D
Always go with multiple entries if you need to run queries on it for comparing and calculating.
Could you please share your whole database structure? I would go with more related tables. Preferably, do not put arrays into the database; think of the database itself as the array that collects your data in a better way.
I would go with something like cooking_id, user_id, recipe_id, difficulty, taste, timestamp, and create a new row each time a recipe is cooked. Then in the recipe table you have something like: recipe_id, name, details (maybe ingredients). It depends on how you want to measure things.
If you need to measure ingredients, you could make ingredients a separate table and create a related table linking them to recipes. For ingredients: ingredients_id, name. And for the related table: ingredients_id, recipe_id, grams (here you can make ingredients_id and recipe_id together the primary key).
It has been some time since I last worked on or studied database structure in depth, but I hope this gives you some useful advice at least :)
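The tables suggested in this answer might look something like the sketch below. It again uses Python's sqlite3 module in place of MySQL; the table name cookings, the sample rows and the helper recipe_stats are assumptions for illustration, while the column names follow the answer.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipes (
        recipe_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE ingredients (
        ingredients_id INTEGER PRIMARY KEY,
        name           TEXT NOT NULL
    );
    -- Link table: the two ids together form the primary key.
    CREATE TABLE recipe_ingredients (
        ingredients_id INTEGER NOT NULL REFERENCES ingredients(ingredients_id),
        recipe_id      INTEGER NOT NULL REFERENCES recipes(recipe_id),
        grams          REAL NOT NULL,
        PRIMARY KEY (ingredients_id, recipe_id)
    );
    -- One row per time a user cooks a recipe ("cookings" is a made-up name).
    CREATE TABLE cookings (
        cooking_id INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        recipe_id  INTEGER NOT NULL REFERENCES recipes(recipe_id),
        difficulty INTEGER,
        taste      INTEGER,
        timestamp  TEXT NOT NULL
    );
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO recipes (recipe_id, name) VALUES (?, ?)",
    [(1, "Pancakes"), (2, "Omelette")],
)
conn.execute("INSERT INTO ingredients (ingredients_id, name) VALUES (1, 'flour')")
conn.execute(
    "INSERT INTO recipe_ingredients (ingredients_id, recipe_id, grams) "
    "VALUES (1, 1, 200.0)"
)
conn.executemany(
    "INSERT INTO cookings (user_id, recipe_id, difficulty, taste, timestamp) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 2, 4, "2024-01-01 08:00:00"),
        (2, 1, 4, 5, "2024-01-01 12:00:00"),
        (1, 2, 1, 3, "2024-01-02 08:00:00"),
    ],
)


def recipe_stats():
    """Per recipe: times cooked, average difficulty, average taste."""
    rows = conn.execute(
        "SELECT r.name, COUNT(*), AVG(c.difficulty), AVG(c.taste) "
        "FROM cookings c JOIN recipes r ON r.recipe_id = c.recipe_id "
        "GROUP BY r.recipe_id, r.name ORDER BY r.name"
    )
    return [tuple(row) for row in rows]
```

Because every usage is its own row, the same table can be grouped by user, by day or by recipe without parsing anything out of an array, and adding a recipe is an insert rather than a new column.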

Relationship database design - object specific many to many, do I solve with self join table or new table

Being new to relational database design, I am trying to clarify one piece of information to properly design this database. Although I am using Filemaker as the platform, I believe this is a universal question.
I am working from the logic of ideally having only one-to-many relationships, and using separate tables or join tables to resolve anything else.
I have a database with multiple products, made by multiple brands, in multiple product categories. I also want this to be as scalable as possible when it comes to reporting, being able to slice and dice the data in as many ways as possible since the needs of the users are constantly changing.
So when I ask the question "Does each Brand have multiple products" I get a yes, and "Does each product have multiple brands" the answer is no. So this is a one to many relationship, but it also seems that a self-join table might give me everything that I need.
This methodology also seems to go down a rabbit hole for other "product related" information such as product category, each product is tied to one product category, but only one product category is related to a product.
So I see 2 possibilities, make three tables and join them with primary and foreign keys, one for Brand, one for Product Category, and one for Products.
Or the second possibility is to create one table that has the brand and product category and product info all in one table (since they are all product related) and simply do self-joins and other query based tables to give me the future reporting requirements that will be changing over time.
I am looking for input from experiences that might point me in the right direction.
Thanks in advance!
Could you ever want to store additional information about a brand (company URL, phone number, etc.) or about a product category (description, etc.)?
If the answer is yes, you definitely want to use three tables. If you don't, you'll be repeating all that information for every single item that belongs to the same brand or same category.
If the answer is no, there is still an advantage to using three tables - it will prevent typos or other spelling inconsistencies from getting into your database. For example, it would prevent you from writing a brand as "Coca Cola" for some items and as "Coca-Cola" for other items. These inconsistencies get harder and harder to find and correct as your database grows. By having each brand listed only once in its own table, it will always be written the same way.
The disadvantage of multiple tables is the SQL for your queries is more complicated. There's definitely a tradeoff, but when in doubt, normalize into multiple tables. You'll learn when it's better to de-normalize with more experience.
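As a rough illustration of the three-table approach, the sketch below uses Python's sqlite3 module rather than Filemaker or MySQL. The table and column names, the sample products, the placeholder URL and the helpers product_rows, products_per_brand and rename_brand are all assumptions made for the example.

```python
import sqlite3

# In-memory SQLite used purely for illustration (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE brands (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE,
        url  TEXT
    );
    CREATE TABLE categories (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE products (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        brand_id    INTEGER NOT NULL REFERENCES brands(id),
        category_id INTEGER NOT NULL REFERENCES categories(id)
    );
""")

# Hypothetical sample data; each brand name is stored exactly once.
conn.executemany(
    "INSERT INTO brands (id, name, url) VALUES (?, ?, ?)",
    [(1, "Coca-Cola", "https://example.com/brand-1"), (2, "Acme", None)],
)
conn.executemany(
    "INSERT INTO categories (id, name) VALUES (?, ?)",
    [(1, "Soft drinks"), (2, "Snacks")],
)
conn.executemany(
    "INSERT INTO products (id, name, brand_id, category_id) VALUES (?, ?, ?, ?)",
    [(1, "Classic Cola", 1, 1), (2, "Diet Cola", 1, 1), (3, "Acme Chips", 2, 2)],
)


def product_rows():
    """Return (product, brand, category) for every product."""
    rows = conn.execute(
        "SELECT p.name, b.name, c.name FROM products p "
        "JOIN brands b ON b.id = p.brand_id "
        "JOIN categories c ON c.id = p.category_id "
        "ORDER BY p.id"
    )
    return [tuple(row) for row in rows]


def products_per_brand():
    """Return (brand, number of products), a typical reporting query."""
    rows = conn.execute(
        "SELECT b.name, COUNT(p.id) FROM brands b "
        "LEFT JOIN products p ON p.brand_id = b.id "
        "GROUP BY b.id, b.name ORDER BY b.name"
    )
    return [tuple(row) for row in rows]


def rename_brand(brand_id, new_name):
    """Change a brand name in one place; every product picks it up."""
    conn.execute("UPDATE brands SET name = ? WHERE id = ?", (new_name, brand_id))
```

The queries need joins, which is the extra complexity mentioned above, but a brand rename touches a single row and a misspelled duplicate brand cannot creep into the products table.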
I am not sure where you see room for a self-join here. It seems to me you are saying: I have a table of products; each product has one brand and one (?) category. If that's the case then you need either three tables:
Brands -< Products >- Categories
or - in Filemaker only - you can replace either or both the Brands and the Categories tables with a value list (assuming you won't be renaming brands/categories and at the expense of some reporting capabilities). So really it depends on what type of information you want to get out in the end.
If you truly want your solution to be scalable you need to parse and partition your data now. Otherwise you will be faced with the re-structuring of the solution down the road when the solution grows in size. You will also be faced with parsing and relocating the data to new tables. Since you've also included the SQL and MySQL tags if you plan on connecting Filemaker to an external data source then you will definitely need to up your game structurally.
Building everything in one table is essentially using Filemaker to do Excel work and it won't cut it if you are connecting to SQL, MySQL, etc.
Self join tables are a great tool. However, they should really only be used for calculating small data points and should not be used as pivot points or foundations for your reporting features. It can grow out of control as time goes on and you need to keep your backend clean.
Use summary and sub-summary reporting features to slice product based data.
For retail and general product management solutions, whether it's Filemaker, SQL, or whatever, the "Brand" or "Vendor" is its own table. Then you would have a "Products" table (the match key being the "Brand ID").
The "Product Category" field should be a field in the "Products" table. You can manage the category values by building a standard value list or building a value list based on a "Product Category" table. The second scenario is better for long term administration.

More rows or more tables in a db design?

I'm pretty sure I already know the answer, but would like some confirmation...
We received 220 text files of providers. Each file is a different category of provider. In total there are 3.2 million records.
My inclination is to create a category table and a provider table that links to category by an ID, then index any other columns that may be searched on like state, or even last name.
The other option is to have one table per category, but I think other than the smaller row size there are a lot of disadvantages to this approach.
It's a PHP/MySQL implementation.
Anyone think the separate table option is better for any reason?
Thanks,
D.
Go with the two-table approach: categories and providers.
This will enable you to:
easily add new categories
easily reverse-search categories based on a column such as the provider's state.
It makes sense from a data-structure point of view as well: one type of data in one table.
I agree with your original thought, and with Nishant's answer. In addition to his points, it also normalizes the data, and allows easy updates if a category changes names for some reason.
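A minimal sketch of the category/provider layout with indexes on the searchable columns is given below. It uses Python's sqlite3 module instead of PHP/MySQL; the column names, the sample providers and the helpers load_file and categories_in_state are assumptions for illustration only.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE providers (
        id          INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES categories(id),
        last_name   TEXT NOT NULL,
        state       TEXT NOT NULL
    );
    -- Index the link column and the columns likely to be searched on.
    CREATE INDEX idx_providers_category  ON providers(category_id);
    CREATE INDEX idx_providers_state     ON providers(state);
    CREATE INDEX idx_providers_last_name ON providers(last_name);
""")


def load_file(category_name, rows):
    """Load one source file's (last_name, state) rows under one category."""
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    conn.execute(
        "INSERT OR IGNORE INTO categories (name) VALUES (?)", (category_name,)
    )
    (category_id,) = conn.execute(
        "SELECT id FROM categories WHERE name = ?", (category_name,)
    ).fetchone()
    conn.executemany(
        "INSERT INTO providers (category_id, last_name, state) VALUES (?, ?, ?)",
        [(category_id, last_name, state) for last_name, state in rows],
    )


def categories_in_state(state):
    """Reverse search: which categories have at least one provider in a state."""
    rows = conn.execute(
        "SELECT DISTINCT c.name FROM categories c "
        "JOIN providers p ON p.category_id = c.id "
        "WHERE p.state = ? ORDER BY c.name",
        (state,),
    )
    return [row[0] for row in rows]


# Hypothetical sample data: each call represents one of the text files.
load_file("Dentist", [("Smith", "TX"), ("Jones", "CA")])
load_file("Chiropractor", [("Lee", "TX")])
```

A new category is just another call to load_file, whereas the one-table-per-category option would need a new table and a change to every query that searches across categories.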