Inserting JSON data line by line into an SQL table - mysql

I have a JSON file that stores the information about a bunch of recipes, like cuisine, time, the ingredients, instructions, etc. I am supposed to transfer all the data to a MySQL table with the relevant headings.
The "ingredients" and the "instructions" are stored like this:
The instructions and ingredients each have several "lines", stored as a list.
How can we store the ingredients and instructions in a MySQL table, in a line by line format?
something like:
instructions
inst1
inst2
..
The JSON file was created using a python program using the beautiful soup module.
PS: I am very new to both SQL and JSON, so I unfortunately don't have anything to show under "what I tried"... Any help will be appreciated.

Rather than give you the exact answer, I'll give you the process I use to determine a database structure. You're using a relational database, so that's what I'll talk about. It's also good to have a naming convention; I've used CamelCase here but you can do whatever you want.
You mentioned you were using python but this answer is language agnostic.
You've chosen quite a complex example, but I'll assume you understand how to create a table, and use primary keys and foreign keys. If not, maybe you should do something simpler.
Step 1 - Figure out what the entities are
These are the real-life entities which need to be represented as database tables. In this case, I'm seeing 4 entities:
Recipe
Keyword
Ingredient
Instruction
Each of these can have a table in MySQL. Give them a primary key which follows a naming convention.
Step 2 - figure out the relationships
It looks like keywords are shared between multiple recipes, so you'll need a many-to-many relationship - this means there's going to be an extra table,
RecipeKeyword
This is just a link between Recipe and Keyword to avoid redundancy. It has two foreign keys, RecipeId and KeywordId. At the moment it's just a dumb object. In other situations like this, it's common for an application to need information about the join itself, for example who linked the two things together (consider users, permissions, and a join table with information on who granted the permission).
The other entities are one to many - each will need a foreign key, RecipeId
Step 3 - design each table
As well as having several lists, your Recipe object has some properties. These can go in its table. Most of them are strings in your data; there are better ways to store some of them, but we can keep this simple.
The other entities just have a text field; from your screenshot, only the Recipe has properties.
For this system, you'll first need to insert all Recipe and Keyword objects. There is a common pattern in relational databases where you insert a record and get its ID, so you can insert more records which reference it.
Step 4 - find a python mysql library
I don't know of one but google will help you find it. The documentation should include the basics of querying.
Step 5 - Insert your data
Here is some pseudocode:
FOR EACH recipe
    INSERT the recipe, and get its ID
    FOR EACH keyword
        IF the keyword does not exist already
            INSERT the new keyword and get its ID
        ELSE
            look up the existing keyword's ID
        INSERT a record into RecipeKeyword with RecipeId and KeywordId
    FOR EACH ingredient
        INSERT the ingredient, give it RecipeId as a foreign key
    FOR EACH instruction
        INSERT the instruction, give it RecipeId as a foreign key
That's it. From here you can select with joins. To form what we're seeing above, you might need to do 3 separate queries and merge them together into a record object on the python side to reproduce the original structure.
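The insert loop above can be sketched in Python roughly as follows. This is only a sketch under assumptions: it uses the standard-library sqlite3 module as a stand-in for MySQL so it runs without a server, the sample data and its field names are made up, and the Body and StepNo columns are my own choice (StepNo keeps the instructions in their original order). With a MySQL driver such as mysql-connector-python or PyMySQL the placeholders would be %s rather than ?, and cursor.lastrowid is used the same way.

```python
import sqlite3

# Hypothetical sample shaped like the question's JSON; field names are assumptions.
recipes = [
    {"name": "Dal", "keywords": ["vegan", "quick"],
     "ingredients": ["lentils", "water"],
     "instructions": ["Rinse lentils", "Simmer 20 min"]},
    {"name": "Salad", "keywords": ["vegan"],
     "ingredients": ["lettuce"],
     "instructions": ["Chop", "Toss"]},
]

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Recipe (RecipeId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Keyword (KeywordId INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE RecipeKeyword (
    RecipeId INTEGER NOT NULL REFERENCES Recipe(RecipeId),
    KeywordId INTEGER NOT NULL REFERENCES Keyword(KeywordId),
    PRIMARY KEY (RecipeId, KeywordId));
CREATE TABLE Ingredient (
    IngredientId INTEGER PRIMARY KEY,
    RecipeId INTEGER NOT NULL REFERENCES Recipe(RecipeId),
    Body TEXT NOT NULL);
CREATE TABLE Instruction (
    InstructionId INTEGER PRIMARY KEY,
    RecipeId INTEGER NOT NULL REFERENCES Recipe(RecipeId),
    StepNo INTEGER NOT NULL,
    Body TEXT NOT NULL);
""")

for recipe in recipes:
    # Insert the parent row first and remember its generated ID.
    cur.execute("INSERT INTO Recipe (Name) VALUES (?)", (recipe["name"],))
    recipe_id = cur.lastrowid

    for kw in recipe["keywords"]:
        # Reuse the keyword if it already exists, otherwise create it.
        row = cur.execute(
            "SELECT KeywordId FROM Keyword WHERE Name = ?", (kw,)).fetchone()
        if row is None:
            cur.execute("INSERT INTO Keyword (Name) VALUES (?)", (kw,))
            keyword_id = cur.lastrowid
        else:
            keyword_id = row[0]
        cur.execute("INSERT INTO RecipeKeyword (RecipeId, KeywordId) VALUES (?, ?)",
                    (recipe_id, keyword_id))

    # One row per list element, each carrying the recipe's ID as a foreign key.
    for body in recipe["ingredients"]:
        cur.execute("INSERT INTO Ingredient (RecipeId, Body) VALUES (?, ?)",
                    (recipe_id, body))
    for step_no, body in enumerate(recipe["instructions"], start=1):
        cur.execute("INSERT INTO Instruction (RecipeId, StepNo, Body) VALUES (?, ?, ?)",
                    (recipe_id, step_no, body))

conn.commit()

# Read one recipe's instructions back, line by line and in order.
steps = [r[0] for r in cur.execute(
    "SELECT Body FROM Instruction WHERE RecipeId = 1 ORDER BY StepNo")]
```

Selecting from Instruction with ORDER BY StepNo gives the "line by line" layout the question asks for, one row per instruction.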

Related

MySQL, how to restructure optional multiple foreign keys

For this example, I'm trying to build a system that will allow output from multiple sources, but these sources are not yet built. The output "module" will be one component, and each source will be its own component to be built and expanded upon later.
Here's an example I designed in MySQLWorkbench:
The goal is to make my output module display data from the output table while being easily expanded upon later as more sources are built. I also want to minimize schema updates when adding new sources. Currently, I will have to add a new table per source, then add a foreign key to the output table.
Is there a better way to do this? I don't know how I feel about these NULL-able foreign keys, because the JOIN query will contain IFNULLs and will get unruly quickly.
Thoughts?
EDIT 1: Clarification
I will be displaying a grid using data in the output table. The output table will contain general data for all items in the grid and will basically act as an aggregator for the output_source_X tables:
output(id, when_added, is_approved, when_approved, sort_order, ...)
The output_source_X tables will contain additional data specific to a source. For example, let's say one of the output source tables is for Facebook posts, so this table will contain columns specific to the Facebook API:
output_source_facebook(id, from, message, place, updated_time, ...)
Another may be Twitter, so the columns are specific for Twitter:
output_source_twitter(id, coordinates, favorited, truncated, text, ...)
A third output source table could be Instagram, so the output_source_instagram table will contain columns specific to Instagram.
There will be a one-to-one foreign key relationship with the output table and ONLY ONE of the output_source_X tables, depending on if the output item is a Facebook, Twitter, Instagram, etc... post, hence the NULL-able foreign keys.
output table
------------
foreign key (source_id_facebook) references output_source_facebook(id)
foreign key (source_id_twitter) references output_source_twitter(id)
foreign key (source_id_instagram) references output_source_instagram(id)
I guess my concern is that this is not as modular as I'd like it to be because I'd like to add other sources as well without having to update the schema much. Currently, this requires me to join output_source_X on the output table using whatever foreign key is not null.
This design is almost certainly bad in a few ways.
It's not that clear what your design is representing but a straightforward one would be something like:
// source [id] has ...
source(id,message,...)
// output [id] is approved when [approved]=1 and ...
output(id,approved,...)
// output [output_id] has [source_id] as a source
output_source(output_id,source_id)
foreign key (output_id) references output(id)
foreign key (source_id) references source(id)
Maybe you have different subtypes of outputs and/or sources? Based on sources and/or outputs? Maybe each source is restricted to feeding particular outputs? Are "outputs" and "sources" actually kinds of outputs and sources, and this is info not on how outputs are sourced but info on what kinds of output-source pairings are permitted?
Please give us statements parameterized by column names for the basic statements you want to make about your application. Ie for the application relationships you are interested in. (Eg like the code comments above.) (You could do it for the diagrammed design but probably that would be overly complicated and not really reflecting what you are trying to model.)
Re your EDIT:
There will be a one-to-one foreign key relationship with the output
table and ONLY ONE of the output_source_X tables, depending on if the
output item is a Facebook, Twitter, Instagram, etc... post, hence the
NULL-able foreign keys.
You have a case of multiple disjoint subtypes of a supertype.
Your situation is a lot like that of this question except that where they have a subtype discriminator/tag column indicating which subtype table you have a set of columns where the non-empty one indicates which subtype table. See Erwin Smout's & my answers. Also this answer.
Please give us statements parameterized by column names for the basic
statements you want to make about your application
and you will find straightforward statements (as above). And if you give the statements for your current design you will find them complex. See also this.
I guess my concern is that this is not as modular as I'd like it to be
because I'd like to add other sources as well without having to update
the schema much.
Your structure is not reducing schema changes compared to proper subtype designs.
Anyway, DDL is there for that. You can genericize subtypes to avoid DDL only by loss of the DBMS managing integrity. That would only be relevant or reasonable based on evaluating DDL vs DML performance tradeoffs. Search re (usually, anti-pattern) EAV.
(Only after you have shown that creating & deleting new tables is infeasible, and that the corresponding horrible integrity-&-concurrency-challenged mega-joining table-and-metadata-encoded-in-table EAV information-equivalent design is feasible, should you consider using EAV.)
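One way to picture the subtype design with a discriminator column is the sketch below. It is an illustration only: it runs on Python's built-in sqlite3 rather than MySQL, the source_type column and the output_id key in each subtype table are names I made up, and tweet_text stands in for the question's text column. Each subtype table points back at output, so adding a source means adding one table and leaving output untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # stand-in for MySQL
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE output (
    id          INTEGER PRIMARY KEY,
    source_type TEXT NOT NULL,            -- hypothetical discriminator column
    is_approved INTEGER NOT NULL DEFAULT 0
);
-- Each subtype row shares its key with the supertype row it extends.
CREATE TABLE output_source_facebook (
    output_id INTEGER PRIMARY KEY REFERENCES output(id),
    message   TEXT
);
CREATE TABLE output_source_twitter (
    output_id  INTEGER PRIMARY KEY REFERENCES output(id),
    tweet_text TEXT
);
""")

conn.execute("INSERT INTO output (id, source_type) VALUES (1, 'facebook'), (2, 'twitter')")
conn.execute("INSERT INTO output_source_facebook VALUES (1, 'hello from fb')")
conn.execute("INSERT INTO output_source_twitter VALUES (2, 'hello from tw')")

# The grid query: general columns from output, details from whichever subtype matches.
grid = conn.execute("""
    SELECT o.id, o.source_type, COALESCE(f.message, t.tweet_text)
    FROM output o
    LEFT JOIN output_source_facebook f ON f.output_id = o.id
    LEFT JOIN output_source_twitter  t ON t.output_id = o.id
    ORDER BY o.id
""").fetchall()

# A subtype row without a matching output row is rejected by the foreign key.
try:
    conn.execute("INSERT INTO output_source_twitter VALUES (99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The output table no longer carries one nullable foreign key per source, and the DBMS still checks that every source row belongs to an existing output row.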

A more efficient way to store data in MySQL using more than one table

I had one single table that had lots of problems. I was saving data separated by commas in some fields, and afterwards I wasn't able to search them. Then, after searching the web and finding a lot of solutions, I decided to split it into several tables.
That one table I had, became 5 tables.
The first table is called agendamentos_diarios; this is the table where I'm going to store the schedules.
The second table is called tecnicos, and it stores the technicians' names. It has two fields, id (primary key) and the name (varchar).
The third table is called agendamento_tecnico. This is the (link) table where I'm going to store the ids from the first and the second table. That's because some schedules are going to be attended by one or more technicians.
The fourth table is called veiculos (vehicles). It holds the id and the name of the vehicle (two fields).
The fifth table is the link between the first table and the vehicles table. Same thing: I'm going to store the schedule id and the vehicle id.
I have an image that explains this better than I can in words.
Am I doing it correctly? Is there a better way of storing data to MySQL?
I agree with @Strawberry about the ids, but normally it is the Hibernate mapping type that does this. If you are not using Hibernate to design your tables, you should take the ID out of agendamento_tecnico and agendamento_veiculos. That way you guarantee uniqueness. If you don't want to do that, create a unique key on the FK fields of those tables.
I notice that you separated the vehicles table from your technicians. In your model the same vehicle can be in two different schedules at the same time (which doesn't make sense). It would be better if the vehicle were linked in the agendamento_tecnico table, which would then become agendamento_tecnico_veiculo.
Looking at your table I note (I'm Brazilian) that you have a column called "servico", which means service. Your schedule table is designed for only one service. What if the same schedule has more than one service? To solve this you can create a services table and an m-n relationship with schedule. That makes it easier to create reports and keeps the services well separated in your database.
There is also a nome_cliente field, which holds the client for that schedule. It would be better to have a cliente (client) table and link the schedule to it with an FK.
As said before, there is no right answer. You have to think about your problem and how it might grow. Modelling a database properly will avoid a lot of headaches later.
Better is subjective, there's no right answer.
My natural instinct would be to break that schedule table up even more.
Looks like data about the technician and the client is duplicated.
Then again, you might have made a decision to de-normalise for perfectly valid reasons.
Doubt you'll find anyone on here who disagrees with you not having comma separated fields though.
Where you call a halt to the changes depends on your circumstances now. Comma separated fields caused you an issue, and you got rid of them. So which part of where you are now is causing you an issue?
Looks OK, especially for a first try.
One comment: I would give the PK/FK (id) columns the same name in all tables rather than just 'id' (additionally, we use '#' or '_' as the last character of primary/foreign key names; for example, technicos.technico_, and agendamento_tecnico has the fields agend_tech_ and technico_). This is not a common convention, though. It makes queries a bit more complex (because you must fully qualify the fields), but it makes the database schema more readable (you can see at a glance which PK belongs to which FK).
Other comment: the two associative (I never wrote that word before!) tables, such as agendamento_tecnico, have their own ID field, but they do not need it. The two (primary/unique) keys of the tables they join are unique themselves, so together they can serve as the PK of these tables, like:
CREATE TABLE agendamento_tecnico (
    technico_ int not null,
    agend_tech_ int not null,
    primary key (technico_, agend_tech_)
);
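To see what the composite primary key buys you, here is a small sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption made only so the snippet runs anywhere) and made-up id values. The same technician can only be attached to the same schedule once, without any extra id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("""
    CREATE TABLE agendamento_tecnico (
        technico_   INTEGER NOT NULL,
        agend_tech_ INTEGER NOT NULL,
        PRIMARY KEY (technico_, agend_tech_)
    )
""")

# Two different technicians on schedule 10: both rows are accepted.
conn.execute("INSERT INTO agendamento_tecnico VALUES (1, 10)")
conn.execute("INSERT INTO agendamento_tecnico VALUES (2, 10)")

# Attaching technician 1 to schedule 10 a second time violates the composite key.
try:
    conn.execute("INSERT INTO agendamento_tecnico VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM agendamento_tecnico").fetchone()[0]
```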

How to synchronise Core Data relationships?

I'm creating an app that pulls data from a web server (MySQL), parses it and stores it in a SQLite database using Core Data.
The MySQL database has a 'words' table. Each word can be in a 'category'. So the words table has a field for 'category_id' to join the tables.
I'm having some trouble getting my head around how to replicate this locally in my app. I currently have entities matching the structure of the MySQL database, but no relationships. It seems like in my 'words' entity I shouldn't need the 'category_id' field (I should instead have a one-to-one 'category' relation set-up).
I'm confused as to how to keep this Core Data relationship in sync with the web server?
Assuming you have an Entity for Word and Category, you will need to make a relationship (naming may be a bit hazy). Also assuming a Category can have many words and each Word has one Category:
// Word Entity
Relationship    Destination    Inverse
category        Category       words
// Category Entity
Relationship    Destination    Inverse
words           Word           category    // To-Many relationship
You are correct you would not need the category_id field as all relationships are managed through the object graph that Core Data maintains. You will still need a primary key like server_id (or similar) in each entity or you will have trouble updating/finding already saved objects.
This is how I deal with syncing data from an external database (I use RESTful interfaces with JSON but that does not really matter)
Grab the feed sorted by server_id
Get the primary keys (server_id) of all the objects in the feed
Perform a fetch using a predicate like ... @"(serverId IN %@)", primaryKeys
which is sorted by the primary key.
Step through each array. If the fetch result has my record then I update it. If it does not then I insert a new one.
You would need to do this for both Word and Category
Next fetch all objects that form part of a relationship
Use the appropriate methods generated by Core Data for adding objects, e.g. something like `[myArticle addWords:[NSSet setWithObjects:word1, word2, word3, nil]];`
It's hard for me to test but this should give you a starting point?
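The update-or-insert walk described in the steps above is independent of Core Data, so here is a rough sketch of just that logic in Python. Everything in it is hypothetical: a plain dict keyed by server_id stands in for the local store, the records are dicts, and the function name is made up. It assumes server_id values in the feed are unique.

```python
def sync_records(feed, store):
    """Merge server records into a local store keyed by server_id.

    feed  -- list of dicts, each with a "server_id" key (from the server)
    store -- dict mapping server_id -> local record (stand-in for Core Data)
    Returns (inserted_ids, updated_ids).
    """
    # Step 1: the feed, sorted by primary key.
    feed = sorted(feed, key=lambda r: r["server_id"])
    # Step 2: collect the primary keys present in the feed.
    keys = [r["server_id"] for r in feed]
    # Step 3: "fetch" the local records whose key is IN that list, sorted the same way.
    existing = sorted((store[k] for k in keys if k in store),
                      key=lambda r: r["server_id"])

    inserted, updated = [], []
    i = 0  # cursor into the sorted list of existing records
    # Step 4: walk both sorted lists together.
    for item in feed:
        if i < len(existing) and existing[i]["server_id"] == item["server_id"]:
            existing[i].update(item)         # already saved locally: update it
            updated.append(item["server_id"])
            i += 1
        else:
            store[item["server_id"]] = dict(item)  # not saved yet: insert it
            inserted.append(item["server_id"])
    return inserted, updated


store = {2: {"server_id": 2, "text": "old"}}
feed = [
    {"server_id": 3, "text": "c"},
    {"server_id": 1, "text": "a"},
    {"server_id": 2, "text": "new"},
]
inserted, updated = sync_records(feed, store)
```

Because both lists are sorted by the same key, a single pass is enough to decide between update and insert for every record.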
Good to see a fellow Shiny course attendee using stack overflow - it's not just me

How to handle many-to-many relationships with more than 2 tables?

Here is where I am at right now. I have four tables: task, project, opportunity, and task_xref. The project and opportunity tables each have a one-to-many relationship with task. I am storing those relationships in task_xref. The schema looks something like this for each table (simplified):
task
----
id(pk)
name
project
-------
id(pk)
name
...
opportunity
-----------
id(pk)
name
...
task_xref
---------
task_id(task id)
fkey(other table id)
Assume that the keys in project and opportunity will not be the same (GUID) so an opportunity can't fetch tasks for a project and so on. This works well on the surface, one xref table to maintain the relationships between task and project, opportunity (or any other tables that might need a task relationship in the future).
My current dilemma is of bi-directionality. If I'm looking to get all the tasks for an individual project or opportunity it's no problem. If I'm pulling back a task and want to know the name of the related project or opportunity I can't. I have no way of knowing whether the related fkey is a project or opportunity. In the future I will probably have other tables with task relationships; while I have 2 tables now, there could be many more in the future.
Here are the possible solutions I've thought of so far:
1) separate xref table for each pair (e.g. task_project_xref, task_opportunity_xref...)
cons: I have to run a query for each xref table looking for relationships for a task
2) a third column in task_xref to point to the parent table
cons: this seems like a kludge to me
3) store the primary keys in project, opportunity in an identifiable way (e.g. proj1, proj2, proj3, opp1, opp2, opp3) so I can tell which table a task relates to by looking at the fkey
cons: this feels like I'm making the primary keys in projects and opportunities magical, imbuing them with more meaning than just being an identifier to a single record (maybe I'm over-thinking it)
My question then is this: Are there other solutions I am overlooking? Which solution is better/worse than the others?
I am trying to keep joins limited if possible and performance as good as possible. I am also not opposed to joining data in code if that will help simplify things.
I am using PHP and MySQL (MyISAM tables presently, but will use INNODB if there is a reason to do so).
I do prefer to have a separate xref table for the different concepts. This way, the relationship from a concept to its tasks is easier to maintain and isolated from the other concept's relationships to tasks.
I'd maybe favor a single xref for all related concepts if any table in your design could have tasks, and then I'd just have an extra column to indicate which kind of object is the parent of the tasks.
Your SQL joins will be neater if task_xref has separate project_id and opportunity_id fields. It's a good way of doing it, because queries are straightforward. Also, it's simple to tell what type of task it is, by simply seeing which of the foreign keys is not null. It's also efficient. When joining with tasks, two queries are needed, since project and opportunities will no doubt have different structures, so the results can't be union'd anyway.
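A sketch of the separate-columns layout, to show how a task finds its parent in one query. Assumptions: Python's built-in sqlite3 stands in for MySQL, integer keys are used instead of GUIDs for brevity, the sample rows are made up, and the CHECK constraint (exactly one parent per row) is my own addition rather than something the answer asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.executescript("""
CREATE TABLE task        (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE opportunity (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE task_xref (
    task_id        INTEGER NOT NULL REFERENCES task(id),
    project_id     INTEGER REFERENCES project(id),
    opportunity_id INTEGER REFERENCES opportunity(id),
    -- assumption: every row links a task to exactly one kind of parent
    CHECK ((project_id IS NULL) <> (opportunity_id IS NULL))
);
INSERT INTO task        VALUES (1, 'Write spec'), (2, 'Call client');
INSERT INTO project     VALUES (1, 'Website');
INSERT INTO opportunity VALUES (1, 'Big deal');
INSERT INTO task_xref   VALUES (1, 1, NULL), (2, NULL, 1);
""")

# From a task to its parent: the non-null column says which kind of parent it is.
parents = conn.execute("""
    SELECT t.name,
           CASE WHEN x.project_id IS NOT NULL THEN 'project' ELSE 'opportunity' END,
           COALESCE(p.name, o.name)
    FROM task t
    JOIN task_xref x        ON x.task_id = t.id
    LEFT JOIN project p     ON p.id = x.project_id
    LEFT JOIN opportunity o ON o.id = x.opportunity_id
    ORDER BY t.id
""").fetchall()

# A row with no parent at all is refused by the CHECK constraint.
try:
    conn.execute("INSERT INTO task_xref VALUES (1, NULL, NULL)")
    parentless_rejected = False
except sqlite3.IntegrityError:
    parentless_rejected = True
```

The trade-off is the one raised in the question: each new kind of parent adds a column to task_xref, in exchange for real foreign keys and a straightforward lookup from task to parent.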

MySQL Database - Related Results from same table / Many to Many database design problem

I am designing a relational database of products where there are products that are copies/bootlegs of each other, and I'd like to be able to show that through the system.
So initially, during my first draft, I had a field in products called "isacopyof", thinking I would just store a comma delimited list of productIDs that are copies of the current product.
Obviously once I started implementing, that wasn't going to work out.
So far, most many-to-many relationship solutions revolve around an associative table listing related id from table A and related id from table B. That works, but my situation involves related items from the SAME table of products...
How can I build a solution around that? Or maybe I am thinking in the wrong direction?
You're overthinking.
If you have a products table with a productid key, you can have a clones table with productid1 and productid2 fields mapping from products to products and a multi-key on both fields. No issue, and it's still 3NF.
Because something is a copy, that means you have a parent and child relationship... Hierarchical data.
You're on the right track for the data you want to model. Rather than have a separate table to hold the relationship, you can add a column to the existing table to hold the parent_id value--the primary key value indicating the parent to the current record. This is an excellent read about handling hierarchical data in MySQL...
Sadly, MySQL before 8.0 doesn't have hierarchical query syntax (8.0 added recursive common table expressions), so for things like this I highly recommend looking at databases that do:
PostgreSQL (free)
SQL Server (Express is free)
Oracle (Express is also free)
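Where recursive common table expressions are available (MySQL 8.0 and later support WITH RECURSIVE, as does SQLite), the parent_id column can be walked in a single query. The sketch below is an illustration under assumptions: it runs on Python's built-in sqlite3 rather than MySQL, and the products rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL 8.0+
conn.executescript("""
CREATE TABLE products (
    productid INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES products(productid)  -- NULL for an original
);
INSERT INTO products VALUES
    (1, 'Original',          NULL),
    (2, 'Bootleg A',         1),
    (3, 'Bootleg B',         1),
    (4, 'Copy of Bootleg A', 2),
    (5, 'Unrelated',         NULL);
""")

# Start at one product and repeatedly follow parent_id links downwards.
descendants = conn.execute("""
    WITH RECURSIVE copies(productid, name, depth) AS (
        SELECT productid, name, 0 FROM products WHERE productid = ?
        UNION ALL
        SELECT p.productid, p.name, c.depth + 1
        FROM products p
        JOIN copies c ON p.parent_id = c.productid
    )
    SELECT productid, name, depth FROM copies
    WHERE depth > 0
    ORDER BY productid
""", (1,)).fetchall()
```

Here depth tells you whether something is a direct copy (1) or a copy of a copy (2 and up).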
There's no reason you can't have links to the same product table in your 'links' table.
There are a few ways to do this, but a basic design might simply be 2 columns:
ProductID1, ProductID2
Where both these columns link back to ProductID in your product table. If you know which is the 'real' product and which is the copy, you might have logic/constraints which place the 'real' productID in ProductID1 and the 'copy' productID in ProductID2.
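A minimal sketch of such a self-referencing link table, assuming the convention that the 'real' product goes in the first column and the copy in the second. It uses Python's built-in sqlite3 as a stand-in for MySQL; the clones table name follows the earlier answer, while the sample rows, the CHECK constraint and the related() helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.executescript("""
CREATE TABLE products (productid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE clones (
    productid1 INTEGER NOT NULL REFERENCES products(productid),  -- the 'real' product
    productid2 INTEGER NOT NULL REFERENCES products(productid),  -- the copy
    PRIMARY KEY (productid1, productid2),
    CHECK (productid1 <> productid2)  -- assumption: a product is not its own copy
);
INSERT INTO products VALUES (1, 'Original'), (2, 'Bootleg A'), (3, 'Bootleg B');
INSERT INTO clones   VALUES (1, 2), (1, 3);
""")


def related(product_id):
    """Names of products linked to product_id, whichever column it sits in."""
    rows = conn.execute("""
        SELECT p.name
        FROM clones c
        JOIN products p
          ON p.productid = CASE WHEN c.productid1 = ? THEN c.productid2
                                ELSE c.productid1 END
        WHERE ? IN (c.productid1, c.productid2)
        ORDER BY p.productid
    """, (product_id, product_id)).fetchall()
    return [name for (name,) in rows]


copies_of_original = related(1)
source_of_bootleg = related(2)
```

Both columns reference the same products table, so the usual associative-table approach works unchanged for related items from a single table.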