While trying to learn SQL I came across "Learn SQL The Hard Way" and started reading it.
Everything was going fine, and then, as a way to practice, I decided to build something like the example in the book (the example consists of 3 tables, pet, person and person_pet, where the person_pet table 'links' pets to their owners).
I made this:
report table
+----+--------------+
| id | content      |
+----+--------------+
| 1  | bank robbery |
| 2  | invalid      |
| 3  | cat on tree  |
+----+--------------+
note table
+-----------+---------------+
| report_id | content       |
+-----------+---------------+
| 1         | they had guns |
| 3         | cat was saved |
+-----------+---------------+
wanted result
+-----------+----------------+---------------+
| report_id | report_content | report_notes  |
+-----------+----------------+---------------+
| 1         | bank robbery   | they had guns |
| 2         | invalid        | null or ''    |
| 3         | cat on tree    | cat was saved |
+-----------+----------------+---------------+
I tried a few combinations, without success.
My first thought was
SELECT report.id,report.content AS report_content,note.content AS note_content
FROM report,note
WHERE report.id = note.report_id
but this only returns the reports that have a matching note (it does not return the invalid report).
After this I tried adding IF conditions, but I just made it worse.
My question is: is this something I will figure out after getting past basic SQL,
or can it be done in a simple way?
Anyway, I would appreciate any help; I am pretty much lost with this.
Thank you.
EDIT: I have looked into related questions but haven't yet found one that solves my problem.
I probably need to look into other statements, such as JOIN, to sort this out.
You need to get to the chapter on OUTER JOINs, specifically a LEFT JOIN:
SELECT report.id,report.content AS report_content,note.content AS note_content
FROM report
LEFT JOIN note ON report.id = note.report_id
Note the ANSI-92 JOIN syntax, as opposed to joining with WHERE x = y.
(You can probably do it with an older, vendor-specific outer-join operator, something like WHERE report.id *= note.report_id if I recall the old syntax correctly, but I'd recommend the syntax above instead.)
You are doing a join. The kind of join you have is an inner join, but you want an outer join:
SELECT report.id,report.content AS report_content,note.content AS note_content
FROM report
LEFT JOIN note on report.id = note.report_id
Note that the LEFT table (report) is the one whose rows are all kept; where the right table (note) has no matching row, its columns come back as NULL.
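To see the LEFT JOIN behaviour on the sample data from the question, here is a minimal sketch. It runs the same query through Python's built-in sqlite3 module; using SQLite instead of MySQL, and the in-memory database, are assumptions made only so the example runs without a server.

```python
import sqlite3

# In-memory SQLite database (an assumption; the question does not name an engine).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report (id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE note (report_id INTEGER, content TEXT);
    INSERT INTO report VALUES (1, 'bank robbery'), (2, 'invalid'), (3, 'cat on tree');
    INSERT INTO note VALUES (1, 'they had guns'), (3, 'cat was saved');
""")

# Every report row is kept; reports without a note get NULL (None in Python).
left_join_rows = conn.execute("""
    SELECT report.id, report.content AS report_content, note.content AS note_content
    FROM report
    LEFT JOIN note ON report.id = note.report_id
    ORDER BY report.id
""").fetchall()

for row in left_join_rows:
    print(row)
```

Report 2 appears with None in the note column, which is the "null or ''" row asked for in the wanted result.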
I hope that Stack Overflow is the correct place to ask this; I am a bit on the fence, but I didn't find that it fit better on another Stack Exchange site.
The question is about "best practice" or design in MySQL. I don't see this done a lot in tutorials and resources, which is why I am a bit afraid that it is not a good way to do it, so I thought I'd try to get some feedback.
I tried to make a layout as an example (thanks for commenting):
https://www.db-fiddle.com/f/rBRUhX3DYiTgGyBPSgQfCm/2
I have a layout similar to this:
table: player
+----+------+------+
| id | name | data |
+----+------+------+
| 1 | foo | bar |
| 2 | test | test |
+----+------+------+
Then I have tables to pick specific information
table: user_external_name
+----+----------+
| id | nickname |
+----+----------+
| 1 | baz |
| 2 | qux |
+----+----------+
And I have a third table containing matches between players, something like:
table: matches
+---------+--------+--------+
| matchid | homeid | awayid |
+---------+--------+--------+
| 0 | 1 | 2 |
+---------+--------+--------+
And then I might do queries like this on matches:
SELECT
(SELECT nickname FROM user_external_name WHERE id = matches.homeid) AS home,
(SELECT nickname FROM user_external_name WHERE id = matches.awayid) AS away
FROM matches;
I also realized that I can use joins for the query and get rid of the multiple selects that way. I am still not sure why the design is dumb, but I figured out that what I need to read about is pretty much relational databases. I will leave my original above for reference in case someone else comes stumbling down this road.
SELECT
h.nickname home,
a.nickname away
FROM `matches` as m
join user_external_name as h on h.id = m.homeid
join user_external_name as a on a.id = m.awayid;
resulting in:
+------+------+
| home | away |
+------+------+
| baz | qux |
+------+------+
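As a quick check that the subquery form and the join form agree, here is a small sketch using Python's sqlite3 module (SQLite rather than MySQL is an assumption made for portability). The column names homeid and awayid follow the matches table shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_external_name (id INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE matches (matchid INTEGER PRIMARY KEY, homeid INTEGER, awayid INTEGER);
    INSERT INTO user_external_name VALUES (1, 'baz'), (2, 'qux');
    INSERT INTO matches VALUES (0, 1, 2);
""")

# Version 1: one correlated subquery per nickname.
subquery_rows = conn.execute("""
    SELECT
        (SELECT nickname FROM user_external_name WHERE id = matches.homeid) AS home,
        (SELECT nickname FROM user_external_name WHERE id = matches.awayid) AS away
    FROM matches
""").fetchall()

# Version 2: join the same lookup table twice under two aliases.
join_rows = conn.execute("""
    SELECT h.nickname AS home, a.nickname AS away
    FROM matches AS m
    JOIN user_external_name AS h ON h.id = m.homeid
    JOIN user_external_name AS a ON a.id = m.awayid
""").fetchall()

print(subquery_rows, join_rows)
```

One difference worth knowing: if an id had no nickname row, the subquery version would still return the match with NULL in that column, while the inner-join version would drop the match entirely.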
So the actual question
Is this a reasonable way of doing it, or is it dumb in some way? One of my main arguments is that this way I can reuse the id to get the specific information by id in other tables (i.e. I never have to copy the actual name). Could you point me to a better way of doing this, or to some resources or suggestions on how to think in this situation?
Thanks for taking the time to read through and hopefully I can learn something good. :)
I wanted to ask what the best approach would be for structuring my MySQL database in the following case.
I've got a table of items, which does not need describing, as the only important field here is the ID.
Now, I'd like to be able to assign some attributes to each item, by its ID of course. But I don't know exactly how to do it, as I'd like to keep it dynamic (so that I do not have to modify the table structure if I want to add a new attribute type).
What I think
I think I can make a table items_attributes with the following structure (in fact, this is the structure I have right now):
+----+---------+----------------+-----------------+
| id | item_id | attribute_name | attribute_value |
+----+---------+----------------+-----------------+
| 1 | 1 | place | Barcelona |
| 2 | 2 | author_name | Matt |
| 3 | 1 | author_name | Kate |
| 4 | 1 | pages | 200 |
| 5 | 1 | author_name | John |
+----+---------+----------------+-----------------+
I put in example data so you can see that attributes can be repeated (it's not a 1-to-1 relation).
The problem with this approach
I need to run some queries, some of them for statistical purposes, and if I have a lot of attributes for a lot of items, this can be a bit slow.
Furthermore, maybe because I'm not an expert on MySQL, every time I want to search for "those items that have 'place' = 'Barcelona' AND 'author_name' = 'John'", I end up having to make a separate JOIN for every condition.
Repeating the example above, my query would end up like:
SELECT *
FROM items its
JOIN items_attributes attr
ON its.id = attr.item_id
AND attr.attribute_name = 'place'
AND attr.attribute_value = 'Barcelona'
AND attr.attribute_name = 'author_name'
AND attr.attribute_value = 'John';
As you can see, this returns nothing, as attribute_name cannot have two values at once in the same row, and an OR condition is not what I'm looking for, as the items MUST have both attribute values as stated.
So the only possibility is to JOIN the same table once for every search condition, which I think is very slow when there are a lot of terms to search for.
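For illustration, the "one JOIN per condition" search can be sketched as follows, together with one commonly used alternative that matches either condition and then keeps only the items that satisfied both (GROUP BY with HAVING). This runs on SQLite through Python's sqlite3 module, which is an assumption for portability; the alternative query is a suggestion, not something taken from the question, and no claim is made here about which form is faster on a given MySQL setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY);
    CREATE TABLE items_attributes (
        id INTEGER PRIMARY KEY,
        item_id INTEGER,
        attribute_name TEXT,
        attribute_value TEXT
    );
    INSERT INTO items VALUES (1), (2);
    INSERT INTO items_attributes VALUES
        (1, 1, 'place', 'Barcelona'),
        (2, 2, 'author_name', 'Matt'),
        (3, 1, 'author_name', 'Kate'),
        (4, 1, 'pages', '200'),
        (5, 1, 'author_name', 'John');
""")

# Approach 1: join the attribute table once per search condition.
self_join_ids = conn.execute("""
    SELECT its.id
    FROM items AS its
    JOIN items_attributes AS p
      ON p.item_id = its.id
     AND p.attribute_name = 'place' AND p.attribute_value = 'Barcelona'
    JOIN items_attributes AS a
      ON a.item_id = its.id
     AND a.attribute_name = 'author_name' AND a.attribute_value = 'John'
""").fetchall()

# Approach 2: match either condition, then keep items that matched both names.
group_by_ids = conn.execute("""
    SELECT item_id
    FROM items_attributes
    WHERE (attribute_name = 'place' AND attribute_value = 'Barcelona')
       OR (attribute_name = 'author_name' AND attribute_value = 'John')
    GROUP BY item_id
    HAVING COUNT(DISTINCT attribute_name) = 2
""").fetchall()

print(self_join_ids, group_by_ids)
```

With the second form, adding a search term means adding one OR branch and raising the count in HAVING, instead of adding another JOIN.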
What I'd like
As I said, I'd like to keep the attribute types dynamic, so that adding a new value in 'attribute_name' is enough, without having to add a new column to a table. Also, as this is a 1-N relationship, the attributes cannot be put in the 'items' table as new columns.
If this structure is, in your opinion, the only one that can achieve what I want, it would also be great if you could share some ideas so that the search queries are not a ton of JOINs.
I don't know if this is hard to solve; I've been racking my brain until now and haven't come up with a solution. I hope you can help me with that!
In any case, thank you for your time and attention!
Kind regards.
You're thinking in the right direction, the direction of normalization. The normal form you would like to have in your database is the fifth normal form (or even the sixth). Stack Overflow has more on this matter.
Table Attribute:
+----+----------------+
| id | attribute_name |
+----+----------------+
| 1 | place |
| 2 | author name |
| 3 | pages |
+----+----------------+
Table ItemAttribute
+--------+----------------+
| item_id| attribute_id |
+--------+----------------+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
+--------+----------------+
So for each property of an object (item in this case) you create a new table and name it accordingly. It requires lots of joins, but your database will be highly flexible and organized. Good luck!
In my opinion it should be something like this. I know there are a lot of tables, but it actually normalizes your DB.
Maybe that is because I can't understand where you get your att_value column from, and what this column should contain.
I have a table which looks like this but much longer...
| CategoryID | Category | ParentCategoryID |
+------------+----------+------------------+
| 23 | Screws | 3 |
| 3 | Packs | 0 |
I am aiming to retrieve one column from this which in this instance would give me the following...
| Category |
+--------------+
| Packs/Screws |
Please excuse me for not knowing exactly how to word this. So far I can only think of splitting the whole table into multiple tables and using LEFT JOIN; this seems like a very good learning opportunity, however.
I realise that CONCAT() will come into play when combining the two retrieved Category names but beyond that I am stumped.
SELECT CONCAT(x.category,'/',y.category) Category
FROM my_table x
JOIN my_table y
ON y.parentcategoryid = x.categoryid
[WHERE x.parentcategoryid = 0]
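A minimal sketch of a parent/child self-join on the two sample rows, run through Python's sqlite3 module. Two assumptions: SQLite stands in for MySQL, and the || operator replaces CONCAT() because CONCAT() is not available in older SQLite versions. Here x is the parent row and y is the child row, so the parent name comes first in the path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (
        categoryid INTEGER PRIMARY KEY,
        category TEXT,
        parentcategoryid INTEGER
    );
    INSERT INTO my_table VALUES (23, 'Screws', 3), (3, 'Packs', 0);
""")

# x = parent category, y = child category whose parentcategoryid points at x.
# The WHERE clause restricts x to top-level categories (parent id 0).
category_paths = conn.execute("""
    SELECT x.category || '/' || y.category AS Category
    FROM my_table AS x
    JOIN my_table AS y ON y.parentcategoryid = x.categoryid
    WHERE x.parentcategoryid = 0
""").fetchall()

print(category_paths)
```

This only builds two-level paths; deeper hierarchies would need one extra self-join per level or a recursive query.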
Bear with me, this needs a lot of up-front info to explain what I am trying to do. I have tried to make it as generic as possible to make things clearer. In a single query I am hoping to pull out a list of pages that match tags linked in another table, and these tags are in groups. I am hoping to use the textual representation of each item instead of its id, but if nothing else I could run 2 up-front queries to get the tag_id and taggroup_id; I am just hoping not to have to do that.
DB Schema:
+-----------------------------------+
| taggroups |
+------------------+----------------+
| taggroup_id | group_name |
+------------------+----------------+
| 1 | fruits |
+------------------+----------------+
+-----------------------------------------------+
| tags |
+-------------+-----------------+---------------+
| tag_id | taggroup_id | tag_name |
+-------------+-----------------+---------------+
| 1 | 1 | apple |
| 2 | 1 | orange |
| 3 | 1 | grape |
+-------------+-----------------+---------------+
+--------------------------------------+
| pages |
+------------------+-------------------+
| page_id | title |
+------------------+-------------------+
| 99 | Doctor a day |
+------------------+-------------------+
+--------------------------------------------------+
| tags_to_pages |
+------------+----------+---------------+----------+
| join_id | tag_id | taggroup_id | page_id |
+------------+----------+---------------+----------+
| 1 | 1 | 1 | 99 |
| 2 | 2 | 1 | 99 |
+------------+----------+---------------+----------+
Test Query:
Got this far and can't seem to get it to work.
SELECT
pages.*, tags.tag_name, taggroups.group_name
FROM
tags_to_pages
INNER JOIN taggroups as grp ON (
grp.group_name = 'fruits'
AND
tags_to_pages.taggroup_id = grp.taggroup_id
)
INNER JOIN tags as val ON (val.tag_name = 'apple' AND tags_to_pages.tag_id = val.tag_id)
LEFT JOIN pages ON (tags_to_pages.page_id = pages.page_id)
Additionally, which tables should have indexes, and what should the indexes be for best optimization?
I'd do it this way:
SELECT
pages.*, tags.tag_name, taggroups.group_name
FROM
tags_to_pages
JOIN taggroups AS grp ON grp.taggroup_id = tags_to_pages.taggroup_id
JOIN tags AS val ON val.taggroup_id = grp.taggroup_id
JOIN pages ON tags_to_pages.page_id = pages.page_id
WHERE
grp.group_name='fruits'
AND val.tag_name = 'apple'
This isn't that different to what you have, but I'm putting the join criteria in the JOIN clauses and the selection criteria in the WHERE clauses, which seems tidier to me.
While re-typing this query I spotted that you were using a tag_id somewhere I thought should have been a taggroup_id, so I changed it, but I regret I can't see it again now.
I'd also worry about the selection criteria: what if an apple doesn't happen to be a fruit? Obviously it is in this case, and indeed in reality :-), but I think you should only be specifying the fruit name in your query, not both the fruit and the group name, and leave the database to sort it out for itself.
Also, why use an INNER JOIN for tags, tags_to_pages and taggroups and an OUTER JOIN for pages? Surely if there are no pages, you're better off getting no rows returned rather than one row half full of NULLs?
I'd index the id columns only.
Just my 2p worth, really.
EDIT
I have set up a demo of this on www.sqlfiddle.com. Your original query worked fine once I changed the aliases in the SELECT list. My attempt above didn't work terribly well :-(. I had the same problems with the aliases and once I fixed them the query returned the same row twice.
I re-wrote it again from scratch as
SELECT pages.*, tags.tag_name, taggroups.group_name
FROM pages
JOIN tags_to_pages AS ttp ON ttp.page_id = pages.page_id
JOIN tags ON tags.tag_id = ttp.tag_id
JOIN taggroups ON taggroups.taggroup_id = ttp.taggroup_id
WHERE taggroups.group_name = 'fruits' AND tags.tag_name='apple';
and it works fine.
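The difference between the two attempts can be reproduced on the sample schema with a short sketch using Python's sqlite3 module (SQLite in place of MySQL is an assumption). The first query is the earlier attempt with its aliases fixed; it joins tags on taggroup_id only, so both tags_to_pages rows pair up with 'apple'. The second is the from-scratch rewrite, which joins tags on tag_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taggroups (taggroup_id INTEGER PRIMARY KEY, group_name TEXT);
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, taggroup_id INTEGER, tag_name TEXT);
    CREATE TABLE pages (page_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags_to_pages (join_id INTEGER PRIMARY KEY, tag_id INTEGER,
                                taggroup_id INTEGER, page_id INTEGER);
    INSERT INTO taggroups VALUES (1, 'fruits');
    INSERT INTO tags VALUES (1, 1, 'apple'), (2, 1, 'orange'), (3, 1, 'grape');
    INSERT INTO pages VALUES (99, 'Doctor a day');
    INSERT INTO tags_to_pages VALUES (1, 1, 1, 99), (2, 2, 1, 99);
""")

# Earlier attempt (aliases fixed): tags joined by group only, so rows duplicate.
first_attempt = conn.execute("""
    SELECT pages.*, val.tag_name, grp.group_name
    FROM tags_to_pages
    JOIN taggroups AS grp ON grp.taggroup_id = tags_to_pages.taggroup_id
    JOIN tags AS val ON val.taggroup_id = grp.taggroup_id
    JOIN pages ON tags_to_pages.page_id = pages.page_id
    WHERE grp.group_name = 'fruits' AND val.tag_name = 'apple'
""").fetchall()

# Rewrite: tags joined by tag_id, so each link row matches exactly one tag.
rewrite = conn.execute("""
    SELECT pages.*, tags.tag_name, taggroups.group_name
    FROM pages
    JOIN tags_to_pages AS ttp ON ttp.page_id = pages.page_id
    JOIN tags ON tags.tag_id = ttp.tag_id
    JOIN taggroups ON taggroups.taggroup_id = ttp.taggroup_id
    WHERE taggroups.group_name = 'fruits' AND tags.tag_name = 'apple'
""").fetchall()

print(len(first_attempt), len(rewrite))
```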
Setting up this demo, it struck me to wonder why you were saving the taggroup_id in the tags_to_pages table. I'm "self-taught" in SQL and databases (translate that to: "I make it up as I go along, rely on doing things that 'work' and trust to intuition to find out what's 'right'") but doesn't this break the idea of normalisation? Shouldn't the connection between tags and taggroups be defined only via the taggroup_id column in the tags table? Perhaps someone who really understands databases will come along and put me right.
Finally, I've no idea why PHPMyAdmin just hung up when you tried your query. Good luck!
I am a beginner at using MySQL and I am trying to learn the best practices. I have set up a structure similar to the one below.
(main table that contains all unique entries) TABLE = 'main_content'
+------------+---------------+------------------------------+-----------+
| content_id | (deleted) | title | member_id |
+------------+---------------+------------------------------+-----------+
| 6 | | This is a very spe?cal t|_st | 1 |
+------------+---------------+------------------------------+-----------+
(Provides the total of each difficulty and joins id --> actual name) TABLE = 'difficulty'
+---------------+-------------------+------------------+
| difficulty_id | difficulty_name | difficulty_total |
+---------------+-------------------+------------------+
| 1 | Absolute Beginner | 1 |
| 2 | Beginner | 1 |
| 3 | Intermediate | 0 |
| 4 | Advanced | 0 |
| 5 | Expert | 0 |
+---------------+-------------------+------------------+
(This table ensures that multiple values can be inserted for each entry. For example,
this specific entry indicates that there are 2 difficulties associated with the submission)
TABLE = 'lookup_difficulty'
+------------+---------------+
| content_id | difficulty_id |
+------------+---------------+
| 6 | 1 |
| 6 | 2 |
+------------+---------------+
I am joining all of this into a readable query:
SELECT group_concat(difficulty.difficulty_name) as difficulty, member.member_name
FROM main_content
INNER JOIN difficulty ON difficulty.difficulty_id
IN (SELECT difficulty_id FROM main_content, lookup_difficulty WHERE lookup_difficulty.content_id = main_content.content_id )
INNER JOIN member ON member.member_id = main_content.member_id
The above works fine, but I am wondering if this is good practice. I practically followed the structure laid out in Wikipedia's Database Normalization example.
When I run the above query using EXPLAIN, it says 'Using where; Using join buffer' and also that I am using 2 DEPENDENT SUBQUERY entries. I don't see any way to NOT use subqueries to achieve the same effect, but then again I'm a noob, so perhaps there is a better way.
The DB design looks fine - regarding your query, you could rewrite it exclusively with joins like:
SELECT group_concat(difficulty.difficulty_name) as difficulty, member.member_name
FROM main_content
INNER JOIN lookup_difficulty ON main_content.content_id = lookup_difficulty.content_id
INNER JOIN difficulty ON difficulty.difficulty_id = lookup_difficulty.difficulty_id
INNER JOIN member ON member.member_id = main_content.member_id
If lookup_difficulty provides the link between content and difficulty, I would suggest you take the difficulty_id column out of your main_content table. Since you can have multiple lookups for each content_id, you would need some extra business logic to determine which difficulty_id to put in your main_content table, for example the biggest value, the smallest value or a random value (or multiple entries in the main_content table for each difficulty_id, but that goes against normalization practices). In either case, it does not make much sense.
Other than that the table looks fine.
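A join-only version of this query can be tried on the sample rows with the sketch below, which uses Python's sqlite3 module (SQLite instead of MySQL is an assumption). The member table is not shown in the question, so its layout and the name 'alice' are hypothetical, as is the placeholder title. The GROUP BY is an addition, so that each content row gets its own comma-separated list once there is more than one entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_content (content_id INTEGER PRIMARY KEY, title TEXT, member_id INTEGER);
    CREATE TABLE difficulty (difficulty_id INTEGER PRIMARY KEY, difficulty_name TEXT,
                             difficulty_total INTEGER);
    CREATE TABLE lookup_difficulty (content_id INTEGER, difficulty_id INTEGER);
    -- Hypothetical member table; the question does not show it.
    CREATE TABLE member (member_id INTEGER PRIMARY KEY, member_name TEXT);

    INSERT INTO main_content VALUES (6, 'sample title', 1);
    INSERT INTO difficulty VALUES
        (1, 'Absolute Beginner', 1), (2, 'Beginner', 1), (3, 'Intermediate', 0),
        (4, 'Advanced', 0), (5, 'Expert', 0);
    INSERT INTO lookup_difficulty VALUES (6, 1), (6, 2);
    INSERT INTO member VALUES (1, 'alice');
""")

# Walk content -> lookup -> difficulty with plain joins, no subquery.
difficulty_rows = conn.execute("""
    SELECT group_concat(difficulty.difficulty_name) AS difficulty, member.member_name
    FROM main_content
    INNER JOIN lookup_difficulty ON main_content.content_id = lookup_difficulty.content_id
    INNER JOIN difficulty ON difficulty.difficulty_id = lookup_difficulty.difficulty_id
    INNER JOIN member ON member.member_id = main_content.member_id
    GROUP BY main_content.content_id, member.member_name
""").fetchall()

for names, member_name in difficulty_rows:
    # The order of names inside group_concat is not guaranteed, so sort for display.
    print(sorted(names.split(",")), member_name)
```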
Update
Saw you updated the table :)
Just as a side note: using IN can slow down your query (IN can cause a table scan). At least it used to be that way, but I'm sure that these days the SQL optimizer handles it pretty well.