How to retrieve hierarchical parent-child data related by multiple tables? - mysql

I have a website that contains guitar lessons and exercises, broken down by category. So you could have category scales. Then a lesson scales lesson1, which could contain exercise1_1, exercise1_2. Likewise for other categories and lessons with exercises.
Lessons and exercises are considered nodes (it is a Drupal site). So there is a node table that has node ids, node type (lesson or exercise) and titles.
Other fields for these nodes (lesson/exercise text, etc.) are stored in a separate table per field. For instance, there is a drupal_field_data_field_description table that contains the description for each lesson and exercise.
Categories are stored in a taxonomy term table.
Relations among categories are handled via a taxonomy index table that establishes child-parent relation (so you could have scales, scales->major scales, etc). For my question, I am just considering one depth of category.
Categories of lessons and exercises are stored in a table drupal_field_data_field_category, which maps lessons and exercises to the category they are a part of.
Exercise-Lesson child-parent relations are stored in a table drupal_field_data_field_lesson that maps exercises to lessons.
Here is example data:
The categories (drupal_taxonomy_term_data):
tid vid name
1 2 Scales
2 2 Arpeggios
The lessons and exercises (drupal_node):
nid type title
1 lesson Lesson1
2 lesson Lesson2
3 exercise Ex1_1
4 exercise Ex1_2
5 exercise Ex2_1
6 exercise Ex2_2
The description field for the lessons and exercises (drupal_field_data_field_description):
entity_type bundle entity_id field_description_value
node lesson 1 Lesson1Summary
node lesson 2 Lesson2Summary
node exercise 3 Ex1_1Summary
node exercise 4 Ex1_2Summary
node exercise 5 Ex2_1Summary
node exercise 6 Ex2_2Summary
The mapping of lessons and exercises to the taxonomy (drupal_taxonomy_index):
nid tid
1 1
2 1
3 1
4 1
5 1
6 1
The mapping of lessons and exercises to the category (drupal_field_data_field_category) (this one almost seems unnecessary because of the taxonomy index):
entity_type bundle entity_id field_category_tid
node lesson 1 1
node lesson 2 1
node exercise 3 1
node exercise 4 1
node exercise 5 1
node exercise 6 1
The mapping of exercises to lessons (drupal_field_data_field_lesson):
entity_type bundle entity_id field_lesson_target_id
node exercise 3 1
node exercise 4 1
node exercise 5 2
node exercise 6 2
So... with this structure, I can't figure out how to build a query that will return a result of the form
Lesson1 Lesson1Summary
Ex1_1 Ex1_1Summary
Ex1_2 Ex1_2Summary
Lesson2 Lesson2Summary
Ex2_1 Ex2_1Summary
Ex2_2 Ex2_2Summary
Note that Lesson1 and Lesson2 are in the same category.
I need to return such data, because for a category page (that has no subcategories), I need to display a table for each lesson that shows the exercises in the lesson.
I could do all this in multiple queries, but I am really trying to better understand SQL joins and grouping. Also, I am not dead set on a result set as shown above. I am open to whatever result set will let me readily display the data (which I will do via PHP) in the fashion as I described.
The SQL fiddle is here
How would you recommend building such a query to extract a lesson and its exercises grouped in a logical way (e.g. how I show above)?
Seems getting lesson and exercises in this way would amount to a self join, with a variety of inner joins on the other tables but I just can't piece it all together...

Well, after much reading, I think I figured it out:
SELECT n.title, d.field_description_value, n.nid, l.field_lesson_target_id
FROM drupal_node n
JOIN drupal_field_data_field_description AS d ON d.entity_id = n.nid
JOIN drupal_taxonomy_index AS t ON t.nid = n.nid
LEFT JOIN drupal_field_data_field_lesson AS l ON l.entity_id = n.nid
ORDER BY COALESCE(l.field_lesson_target_id, n.nid), l.field_lesson_target_id, n.nid
I based above on this post
My sqlfiddle is here.
This is definitely new territory for me, and while the query above works, I wish I understood the nuances of ORDER BY and GROUP BY well enough to know where and how to use them.
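To see why the ORDER BY in the query above is enough, here is a minimal sketch that runs it against the example data. It uses Python's sqlite3 module rather than MySQL (an assumption made so the example is self-contained), keeps only the columns the query needs, and drops the taxonomy join because nothing filters on category here. The second sort key is written as an explicit IS NOT NULL test instead of the raw column, so the "lesson first" rule does not depend on how the database orders NULLs. No GROUP BY is involved because nothing is aggregated; the rows are only sorted.

```python
import sqlite3

# In-memory stand-in for the Drupal tables, reduced to the columns the query uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drupal_node (nid INTEGER PRIMARY KEY, type TEXT, title TEXT);
CREATE TABLE drupal_field_data_field_description (
    entity_id INTEGER, field_description_value TEXT);
CREATE TABLE drupal_field_data_field_lesson (
    entity_id INTEGER, field_lesson_target_id INTEGER);
INSERT INTO drupal_node VALUES
    (1, 'lesson', 'Lesson1'), (2, 'lesson', 'Lesson2'),
    (3, 'exercise', 'Ex1_1'), (4, 'exercise', 'Ex1_2'),
    (5, 'exercise', 'Ex2_1'), (6, 'exercise', 'Ex2_2');
INSERT INTO drupal_field_data_field_description VALUES
    (1, 'Lesson1Summary'), (2, 'Lesson2Summary'),
    (3, 'Ex1_1Summary'), (4, 'Ex1_2Summary'),
    (5, 'Ex2_1Summary'), (6, 'Ex2_2Summary');
INSERT INTO drupal_field_data_field_lesson VALUES (3, 1), (4, 1), (5, 2), (6, 2);
""")

rows = conn.execute("""
SELECT n.title, d.field_description_value
FROM drupal_node n
JOIN drupal_field_data_field_description d ON d.entity_id = n.nid
LEFT JOIN drupal_field_data_field_lesson l ON l.entity_id = n.nid
ORDER BY COALESCE(l.field_lesson_target_id, n.nid),  -- key 1: nid of the owning lesson
         l.field_lesson_target_id IS NOT NULL,       -- key 2: lesson (0) before its exercises (1)
         n.nid                                       -- key 3: fixed order among exercises
""").fetchall()

for title, summary in rows:
    print(title, summary)
```

A lesson has no row in drupal_field_data_field_lesson, so the LEFT JOIN gives it a NULL target and COALESCE falls back to its own nid; an exercise gets its lesson's nid. Both therefore share the same first sort key, which is what keeps each lesson next to its exercises.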

Related

Joins on database engine vs on client-side [duplicate]

I am creating an API with express.js for a food app and I can't figure out what is the best (most efficient) way to query the data and send it to the client-side. Any help is much appreciated.
I have a table called Restaurants
id | name | img_url | address
1 | restaurant 1 | link to img 1 | address 1
2 | restaurant 2 | link to img 2 | address 2
Another table called categories
id | name
1 | Pizza
2 | Pasta
And a restaurant can be in one or more categories, so I have another table for the many-to-many relationship
restaurant_id | category_id
1 | 1
1 | 2
2 | 1
2 | 2
Now on the home page of the app, I need to send a get request to the server and get in return all the restaurants and the categories so that I can display them all in one scroll view with the category name and below it all the restaurants that belong to it.
The first approach that came to mind is to join all three tables:
SELECT *
FROM Restaurants r
INNER JOIN RestaurantCategory rc ON r.id=rc.restaurant_id
INNER JOIN Categories c ON c.id=rc.category_id;
which will give me a result similar to this
id | name | img_url | address | category
1 | restaurant 1 | link to img 1 | address 1 | Pizza
1 | restaurant 1 | link to img 1 | address 1 | Pasta
2 | restaurant 2 | link to img 2 | address 2 | Pizza
2 | restaurant 2 | link to img 2 | address 2 | Pasta
And then I would somehow either on the client-side or the server-side loop on the result and make a list for each category and put in it any restaurant that is in that category.
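The looping step described above could look roughly like the following sketch. It is written in Python for illustration only (the question's server uses express.js), and the row tuples and the group_by_category helper are hypothetical stand-ins for whatever the database driver returns.

```python
from collections import defaultdict

# Hypothetical rows shaped like the joined result:
# (id, name, img_url, address, category)
rows = [
    (1, "restaurant 1", "link to img 1", "address 1", "Pizza"),
    (1, "restaurant 1", "link to img 1", "address 1", "Pasta"),
    (2, "restaurant 2", "link to img 2", "address 2", "Pizza"),
    (2, "restaurant 2", "link to img 2", "address 2", "Pasta"),
]


def group_by_category(rows):
    """Collect the restaurants of each category into one list per category."""
    grouped = defaultdict(list)
    for rid, name, img_url, address, category in rows:
        grouped[category].append(
            {"id": rid, "name": name, "img_url": img_url, "address": address}
        )
    return dict(grouped)


by_category = group_by_category(rows)
for category, restaurants in by_category.items():
    print(category, [r["name"] for r in restaurants])
```

Doing this on the server means the repeated restaurant columns only travel between the database and the API, and the client receives one list per category.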
Instead of doing the lists myself I also thought about first selecting the categories and then doing a for loop on the server-side to get the restaurants for each category, but I am not sure if in this case having multiple select statements is better than one.
I didn't like this approach because of how the table returned is having all the information about the same restaurant repeated more than once for each category. If I have 100 restaurants (with more columns than in the example above) for example and each is in maybe 3 categories, I will get 300 records, which I think will be a big amount of data being sent from the server to the client and it is all repeated.
The second approach is to Select each table alone, then do the join on the client-side myself.
I know that I should let the database engines do the joins for me because they are more powerful but I was thinking maybe if the users have a bad internet connection or something it will be worse to have the amount of data doubled or tripled?
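Use a GROUP BY clause together with an aggregate, so that each category comes back as one row. (SELECT * with GROUP BY c.id is rejected under MySQL's ONLY_FULL_GROUP_BY mode, and otherwise returns a single arbitrary restaurant per category.)
SELECT c.id, c.name, GROUP_CONCAT(r.name) AS restaurants
FROM Restaurants r
INNER JOIN RestaurantCategory rc ON r.id=rc.restaurant_id
INNER JOIN Categories c ON c.id=rc.category_id
GROUP BY c.id, c.name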
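As a rough check of what grouping by category can return, the sketch below runs an aggregate version of the query with Python's sqlite3 module (an assumption; the question uses MySQL, where GROUP_CONCAT plays the same role as SQLite's group_concat). The '|' separator is an arbitrary choice for this example, and the order of names inside each group is not guaranteed, so the code sorts them after splitting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Restaurants (id INTEGER PRIMARY KEY, name TEXT, img_url TEXT, address TEXT);
CREATE TABLE Categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE RestaurantCategory (restaurant_id INTEGER, category_id INTEGER);
INSERT INTO Restaurants VALUES
    (1, 'restaurant 1', 'link to img 1', 'address 1'),
    (2, 'restaurant 2', 'link to img 2', 'address 2');
INSERT INTO Categories VALUES (1, 'Pizza'), (2, 'Pasta');
INSERT INTO RestaurantCategory VALUES (1, 1), (1, 2), (2, 1), (2, 2);
""")

# One row per category; the restaurant names are collapsed into a single string.
rows = conn.execute("""
SELECT c.name, group_concat(r.name, '|') AS restaurants
FROM Restaurants r
INNER JOIN RestaurantCategory rc ON r.id = rc.restaurant_id
INNER JOIN Categories c ON c.id = rc.category_id
GROUP BY c.id, c.name
ORDER BY c.id
""").fetchall()

# Split the concatenated string back into a list on the application side.
result = {category: sorted(names.split("|")) for category, names in rows}
print(result)
```

This only carries the restaurant names; if the page needs every restaurant column, fetching the plain join and grouping in application code may be simpler.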

Database Design for Time Table Generation

I am doing a project using J2EE(servlet) for Time Table Generation of College.
There are Six Slots(6 Hours) in a Day
4 x 1 HR Lectures
1 x 2 HR Lab
There Are three batches ( 3IT, 5IT, 7IT)
2 Classroom
1 LAB
Each slot in the time table will have
(Subject,Faculty)
For Lab I will duplicate the slot.
The Tables
Subject(SubjectID INT, SubjectName VARCHAR);
Faculty(FacultyID INT,FacultyName VARCHAR,NumOfSub INT,Subjects XYZ);
Here I am not able to decide the data type for Subjects. What should I use, given that a faculty member can teach multiple subjects? Also, how do I link it with the Subject table?
P.S. Using MySQL Database
You don't want to actually store either NumOfSub (number of subjects) OR Subjects in Faculty. Storing subjects that way is a violation of First Normal Form, and dealing with it would cause major headaches.
Instead, what you want is another table:
FacultySubject
----------------
FacultyId -- fk for Faculty.FacultyId
SubjectId -- fk for Subject.SubjectId
From this, you can easily get the count of subjects, or a set of rows listing the subjects (I believe MySQL also has functions to return a list of values, but I have no experience with those):
This query will retrieve the count of Subjects taught by a particular teacher:
SELECT Faculty.FacultyId, COUNT(*)
FROM Faculty
JOIN FacultySubject
ON FacultySubject.FacultyId = Faculty.FacultyId
WHERE Faculty.FacultyName = 'Really Cool Professor'
GROUP BY Faculty.FacultyId
... and this query will get all the subjects (named) that they teach:
SELECT Subject.SubjectId, Subject.SubjectName
FROM Faculty
JOIN FacultySubject
ON FacultySubject.FacultyId = Faculty.FacultyId
JOIN Subject
ON Subject.SubjectId = FacultySubject.SubjectId
WHERE Faculty.FacultyName = 'Really Cool Professor'
(note that this last returns the subjects as a set of rows ie:
SubjectId SubjectName
=========================
1 Tree Houses
2 Annoying Younger Sisters
3 Swimming Holes
4 Fishing
)
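The junction-table design can be tried end to end with the sketch below. It uses Python's sqlite3 module instead of MySQL (an assumption, so the example runs without a server); the composite primary key on FacultySubject and the second faculty row are additions for illustration and not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Subject (SubjectId INTEGER PRIMARY KEY, SubjectName TEXT);
CREATE TABLE Faculty (FacultyId INTEGER PRIMARY KEY, FacultyName TEXT);
-- One row per (faculty, subject) pair; this replaces NumOfSub and Subjects.
CREATE TABLE FacultySubject (
    FacultyId INTEGER REFERENCES Faculty(FacultyId),
    SubjectId INTEGER REFERENCES Subject(SubjectId),
    PRIMARY KEY (FacultyId, SubjectId)
);
INSERT INTO Subject VALUES
    (1, 'Tree Houses'), (2, 'Annoying Younger Sisters'),
    (3, 'Swimming Holes'), (4, 'Fishing');
INSERT INTO Faculty VALUES (1, 'Really Cool Professor'), (2, 'Another Professor');
INSERT INTO FacultySubject VALUES (1, 1), (1, 2), (1, 3), (1, 4), (2, 4);
""")

name = "Really Cool Professor"

# Number of subjects taught, derived from the junction table instead of stored.
subject_count = conn.execute("""
SELECT COUNT(*)
FROM Faculty
JOIN FacultySubject ON FacultySubject.FacultyId = Faculty.FacultyId
WHERE Faculty.FacultyName = ?
""", (name,)).fetchone()[0]

# Names of the subjects taught, one row per subject.
subjects = [row[0] for row in conn.execute("""
SELECT Subject.SubjectName
FROM Faculty
JOIN FacultySubject ON FacultySubject.FacultyId = Faculty.FacultyId
JOIN Subject ON Subject.SubjectId = FacultySubject.SubjectId
WHERE Faculty.FacultyName = ?
ORDER BY Subject.SubjectId
""", (name,))]

print(subject_count, subjects)
```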

Efficient MySQL query method for multiple joins

I am asking this question in the hope that there is a more efficient (faster) way to pull and insert data in the tables I am working with.
The basic structure of the data table is
ID Doc_ID Field Value
1 10 Title abc
2 10 Abstract xyz
3 10 Author Bob
4 11 Publisher Bookworms
5 11 Title zzz
6 11 Abstract bbb
7 12 Title aaa
8 12 Sale No
In other words the data tables are row based, each row contain a document id and the corresponding field value. Not all documents have the same number of fields defined. Indeed books may differ radically from magazines.
The data table has 10,000,000 rows; a document typically has 100 fields associated with it.
So the performance problem I am finding is pulling a report with reference to 50+ different fields, for example if I have a query list in an order_table the query could be like
select ord.number as 'Order ID', d1.value as 'Title', d2.value as 'Author' .......
from order_table ord
LEFT JOIN data_table as d1 on d1.Doc_ID=ord.Doc_ID and d1.Field='Title'
LEFT JOIN data_table as d2 on d2.Doc_ID=ord.Doc_ID and d2.Field='Author'
........
LEFT JOIN data_table as d50 on d50.Doc_ID=ord.Doc_ID and d50.Field='Qty'
Using LEFT JOINS as there is no guarantee that the field is defined for that document.
Given that there may be some WHERE parameters to limit the list of items (in stock, for example, or below a price), it is a slow query. Indexes don't really help much.
Without being able to change the data model, what is the best way to pull volumes of information out?
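One alternative that is sometimes tried for this kind of row-per-field table, and which is not part of the question itself, is conditional aggregation: join the data table once and pick each field out with MAX(CASE ...), instead of one LEFT JOIN per field. The sketch below shows only the shape of that query, using Python's sqlite3 module; the order_table rows are hypothetical, and it assumes at most one row per (Doc_ID, Field). Whether it is actually faster on a 10,000,000-row table would have to be measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_table (ID INTEGER PRIMARY KEY, Doc_ID INTEGER, Field TEXT, Value TEXT);
CREATE TABLE order_table (number INTEGER PRIMARY KEY, Doc_ID INTEGER);
INSERT INTO data_table VALUES
    (1, 10, 'Title', 'abc'), (2, 10, 'Abstract', 'xyz'), (3, 10, 'Author', 'Bob'),
    (4, 11, 'Publisher', 'Bookworms'), (5, 11, 'Title', 'zzz'), (6, 11, 'Abstract', 'bbb'),
    (7, 12, 'Title', 'aaa'), (8, 12, 'Sale', 'No');
-- Hypothetical orders; document 13 has no fields defined at all.
INSERT INTO order_table VALUES (100, 10), (101, 11), (102, 13);
""")

# One join to data_table; each output column selects its own Field value.
# A field that is missing for a document comes back as NULL, as with LEFT JOINs.
rows = conn.execute("""
SELECT ord.number AS order_id,
       MAX(CASE WHEN d.Field = 'Title'  THEN d.Value END) AS title,
       MAX(CASE WHEN d.Field = 'Author' THEN d.Value END) AS author
FROM order_table ord
LEFT JOIN data_table d
       ON d.Doc_ID = ord.Doc_ID
      AND d.Field IN ('Title', 'Author')
GROUP BY ord.number
ORDER BY ord.number
""").fetchall()

for row in rows:
    print(row)
```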

Storing Hierarchical Data (MySQL) for Referral Marketing

I need to have a 5 levels hierarchy for the users registered to a website. Every user is invited by another, and I need to know all descendants for a user. And also ancestors for a user.
I have two solutions in mind.
Keeping a table with relationships this way. A closure table:
ancestor_id descendant_id distance
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
2 3 1
Having this table for relationships. Keeping in a table 5 levels ancestors. A "ancestors" table:
user_id ancestor_level1_id ancestor_level2_id ancestor_level3_id ancestor_level4_id ancestor_level5_id
10 9 7 4 3 2
9 7 4 3 2 1
Are these good ideas?
I know about "the adjacency list model" and "the modified preorder tree traversal algorithm", but are these good solutions for a "referral" system?
The queries that I need to perform on this tree are:
frequently adding new users
when a user buys something, their referrers get a percentage commission
every user should be able to find out how many people they've referred (and how many people were referred by people who they referred....) at each level
Closure Table
ancestor_id descendant_id distance
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
2 3 1
To add user 10, referred by user 3. (I don't think you need to lock the table between these two insertions):
insert into ancestor_table
select ancestor_id, 10, distance+1
from ancestor_table
where descendant_id=3;
insert into ancestor_table values (10,10,0);
To find all users referred by user 3 (the distance > 0 filter skips the row that links user 3 to itself):
select descendant_id from ancestor_table where ancestor_id=3 and distance>0;
To count those users by depth:
select distance, count(*) from ancestor_table where ancestor_id=3 and distance>0 group by distance;
To find the ancestors of user 10:
select ancestor_id, distance from ancestor_table where descendant_id=10 and distance>0;
The drawback to this method is the amount of storage space this table will take.
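The closure-table operations can be exercised together in a short sketch. It uses Python's sqlite3 module rather than MySQL (an assumption for runnability), the add_user helper is a hypothetical name, and it wraps both inserts in one transaction as a cautious default even though the answer suggests locking may not be needed. Rows with distance 0 are each user's self-link, so the queries filter them out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE ancestor_table (
    ancestor_id INTEGER, descendant_id INTEGER, distance INTEGER,
    PRIMARY KEY (ancestor_id, descendant_id)
)""")
# Seed data from the table above: six self-links plus "2 referred 3".
conn.executemany(
    "INSERT INTO ancestor_table VALUES (?, ?, ?)",
    [(i, i, 0) for i in range(1, 7)] + [(2, 3, 1)],
)


def add_user(conn, new_id, referrer_id):
    """Link new_id to the referrer and to every ancestor of the referrer."""
    with conn:  # commit both inserts together, or neither
        conn.execute(
            "INSERT INTO ancestor_table "
            "SELECT ancestor_id, ?, distance + 1 "
            "FROM ancestor_table WHERE descendant_id = ?",
            (new_id, referrer_id),
        )
        conn.execute("INSERT INTO ancestor_table VALUES (?, ?, 0)", (new_id, new_id))


add_user(conn, 10, 3)

# Ancestors of user 10, nearest first.
ancestors = conn.execute(
    "SELECT ancestor_id, distance FROM ancestor_table "
    "WHERE descendant_id = 10 AND distance > 0 ORDER BY distance"
).fetchall()

# Number of users below user 2 at each level.
per_level = conn.execute(
    "SELECT distance, COUNT(*) FROM ancestor_table "
    "WHERE ancestor_id = 2 AND distance > 0 GROUP BY distance ORDER BY distance"
).fetchall()

print(ancestors, per_level)
```

For the commission requirement, the ancestors query with distance <= 5 would return exactly the referrers who are owed a share.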
Use the OQGRAPH storage engine.
You probably want to keep track of an arbitrary number of levels, rather than just 5 levels. Get one of the MySQL forks that supports the OQGRAPH engine (such as MariaDB or OurDelta), and use that to store your tree. It implements the adjacency list model, but by using a special column called latch to send a command to the storage engine, telling it what kind of query to perform, you get all of the advantages of a closure table without needing to do the bookkeeping work each time someone registers for your site.
Here are the queries you'd use in OQGRAPH. See the documentation at
http://openquery.com/graph-computation-engine-documentation
We're going to use origid as the referrer, and destid as the referred user.
To add user 11, referred by user 10
insert into ancestors_table (origid,destid) values (10,11)
To find all users referred by user 3.
SELECT linkid FROM ancestors_table WHERE latch = 2 AND origid = 3;
To find the ancestors of user 10.
SELECT linkid FROM ancestors_table WHERE latch = 2 AND destid = 10;
To find the number of users at each level, referred by user 3:
SELECT count(linkid), weight
FROM ancestors_table
WHERE latch = 2 AND origid = 3
GROUP BY weight;
Managing Hierarchical Data in MySQL
In general, I like the "nested set", esp. in MySQL which doesn't really have language support for hierarchical data.
It's fast, but you'll need to make sure your developers read that article if ease of maintenance is a big deal. It's very flexible - which doesn't seem to matter much in your case.
It seems a good fit for your problem: in the referral model, you need to find the tree of referrers, which is fast in the nested set model; you also need to know who the children of a given user are, and the depth of their relationship; this is also fast.
Delimited String of Ancestors
If you're strongly considering the 5-level relationship table, it may simplify things to use a delimited string of ancestors instead of 5 separate columns.
user_id depth ancestors
10 7 9,7,4,3,2,1
9 6 7,4,3,2,1
...
2 2 1
1 1 (empty string)
Here are some SQL commands you'd use with this model:
To add user 11, referred by user 10
insert into ancestors_table (user_id, depth, ancestors)
select 11, depth+1, concat(10,',',ancestors)
from ancestors_table
where user_id=10;
To find all users referred by user 3. (Note that this query can't use an index. The final equality test catches a user whose only ancestor is 3.)
select user_id
from ancestors_table
where ancestors like '%,3,%' or ancestors like '3,%' or ancestors like '%,3' or ancestors = '3';
To find the ancestors of user 10. You need to break up the string in your client program. In Ruby, the code would be ancestorscolumn.split(",").map{|x| x.to_i}. There's no good way to break up the string in SQL.
select ancestors from ancestors_table where user_id=10;
To find the number of users at each level, referred by user 3:
select
depth-(select depth from ancestors_table where user_id=3),
count(*)
from ancestors_table
where ancestors like '%,3,%' or ancestors like '3,%' or ancestors like '%,3' or ancestors = '3'
group by depth;
You can avoid SQL injection attacks in the like '%,3,%' parts of these queries by using like concat('%,', ?, ',%') instead and binding an integer for the user number to the placeholder.
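A sketch of the delimited-string model with bound parameters follows. It is written in Python with sqlite3 (an assumption; SQLite builds the pattern with || where MySQL would use concat), the parse_ancestors and descendants helpers are hypothetical names, and the rows for users 3, 4 and 7 are filled in for illustration so the chain is complete. The query also tests for an exact match, which covers a user whose ancestor list is a single id, such as user 2 with ancestors '1'.

```python
import sqlite3


def parse_ancestors(value):
    """Split '9,7,4' into [9, 7, 4]; an empty string means no ancestors."""
    return [int(part) for part in value.split(",") if part]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ancestors_table (user_id INTEGER PRIMARY KEY, depth INTEGER, ancestors TEXT)"
)
conn.executemany(
    "INSERT INTO ancestors_table VALUES (?, ?, ?)",
    [
        (1, 1, ""), (2, 2, "1"),
        # Rows for 3, 4 and 7 are illustrative fill-ins for the chain 1 <- 2 <- 3 <- 4 <- 7.
        (3, 3, "2,1"), (4, 4, "3,2,1"), (7, 5, "4,3,2,1"),
        (9, 6, "7,4,3,2,1"), (10, 7, "9,7,4,3,2,1"),
    ],
)


def descendants(conn, user_id):
    """Return ids of all users whose ancestor list contains user_id."""
    uid = str(int(user_id))  # force an integer so only digits reach the pattern
    sql = """
        SELECT user_id FROM ancestors_table
        WHERE ancestors = :uid                    -- the only ancestor
           OR ancestors LIKE :uid || ',%'         -- first in the list
           OR ancestors LIKE '%,' || :uid || ',%' -- in the middle
           OR ancestors LIKE '%,' || :uid         -- last in the list
        ORDER BY user_id
    """
    return [row[0] for row in conn.execute(sql, {"uid": uid})]


row = conn.execute("SELECT ancestors FROM ancestors_table WHERE user_id = 10").fetchone()
print(parse_ancestors(row[0]))
print(descendants(conn, 3))
print(descendants(conn, 1))
```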

MySQL: How to pull information from multiple tables based on information in other tables?

Ok, I have 5 tables which I need to pull information from based on one variable.
gameinfo
id | name | platforminfoid
gamerinfo
id | name | contact | tag
platforminfo
id | name | abbreviation
rosterinfo
id | name | gameinfoid
rosters
id | gamerinfoid | rosterinfoid
The 1 variable would be gamerinfo.id, which would then pull all relevant data from gamerinfo, which would pull all relevant data from rosters, which would pull all relevant data from rosterinfo, which would pull all relevant data from gameinfo, which would then pull all relevant data from platforminfo.
Basically it breaks down like this:
gamerinfo contains the gamer's basic information.
rosterinfo contains basic information about the rosters (i.e. name and the game the roster is aimed towards).
rosters contains the actual link from the gamer to the different rosters (gamers can be on multiple rosters).
gameinfo contains basic information about the games (i.e. name and platform).
platforminfo contains information about the different platforms the games are played on (it is possible for a game to be played on multiple platforms).
I am pretty new to SQL queries involving JOINs and UNIONs and such, usually I would just break it up into multiple queries but I thought there has to be a better way, so after looking around the net, I couldn't find (or maybe I just couldn't understand what I was looking at) what I was looking for. If anyone can point me in the right direction I would be most grateful.
There is nothing wrong with querying the required data step by step. If you use JOINs in your SQL over 5 tables, be sure to have useful indexes on all important columns. Also, this could create a lot of duplicate data:
Imagine this: you need 1 record from gamerinfo, maybe 3 from gameinfo, 4 from rosters, and 3 from each of the remaining two tables. This would give you a result of 1*3*4*3*3 = 108 records, which will look like this:
ID Col2 Col3
1 1 1
1 1 2
1 1 3
1 2 1
... ... ...
You can see that you would fetch the ID 108 times, even if you only need it once. So my advice would be to stick with mostly single, simple queries to get the data you need.
There is no need for UNION; multiple JOINs should do the work:
SELECT gameinfo.id AS g_id, gameinfo.name AS g_name, platforminfo.name AS p_name, platforminfo.abbreviation AS p_abb, rosterinfo.name AS r_name
FROM gameinfo
LEFT JOIN platforminfo ON gameinfo.platforminfoid = platforminfo.id
LEFT JOIN rosterinfo ON rosterinfo.gameinfoid = gameinfo.id
LEFT JOIN rosters ON rosters.rosterinfoid = rosterinfo.id
WHERE gameinfo.id = XXXX
This should pull all info about a game based on the game id.
Indexing the id columns used in the joins (gameinfoid, platforminfoid, rosterinfoid) will help performance.
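The question itself starts from gamerinfo.id rather than a game id. A sketch of that chain (gamerinfo, then rosters, rosterinfo, gameinfo and platforminfo) is shown below, using Python's sqlite3 module instead of MySQL so that it runs on its own. All sample rows are made up for illustration; the column names come from the table listing in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE platforminfo (id INTEGER PRIMARY KEY, name TEXT, abbreviation TEXT);
CREATE TABLE gameinfo (id INTEGER PRIMARY KEY, name TEXT, platforminfoid INTEGER);
CREATE TABLE gamerinfo (id INTEGER PRIMARY KEY, name TEXT, contact TEXT, tag TEXT);
CREATE TABLE rosterinfo (id INTEGER PRIMARY KEY, name TEXT, gameinfoid INTEGER);
CREATE TABLE rosters (id INTEGER PRIMARY KEY, gamerinfoid INTEGER, rosterinfoid INTEGER);
-- Made-up sample data.
INSERT INTO platforminfo VALUES (1, 'Platform A', 'PA'), (2, 'Platform B', 'PB');
INSERT INTO gameinfo VALUES (1, 'Game One', 1), (2, 'Game Two', 2);
INSERT INTO gamerinfo VALUES (1, 'Alice', 'alice@example.com', 'AL1');
INSERT INTO rosterinfo VALUES (1, 'Roster Alpha', 1), (2, 'Roster Beta', 2);
INSERT INTO rosters VALUES (1, 1, 1), (2, 1, 2);
""")

gamer_id = 1

# Follow the foreign keys outward from the gamer; LEFT JOINs keep the gamer
# row even if the gamer is not on any roster yet.
rows = conn.execute("""
SELECT g.name, ri.name, gi.name, p.abbreviation
FROM gamerinfo g
LEFT JOIN rosters r       ON r.gamerinfoid = g.id
LEFT JOIN rosterinfo ri   ON ri.id = r.rosterinfoid
LEFT JOIN gameinfo gi     ON gi.id = ri.gameinfoid
LEFT JOIN platforminfo p  ON p.id = gi.platforminfoid
WHERE g.id = ?
ORDER BY ri.id
""", (gamer_id,)).fetchall()

for row in rows:
    print(row)
```

Each roster the gamer belongs to yields one row here, with the gamer's own columns repeated, which is the duplication the earlier answer warns about.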