I am using MySQLi to pull blog post information from a database onto my website. My database currently has 3 tables with the following relevant columns:
blog_posts: id,
blog_post_tags: blog_post_id, tag_id,
tags: id, name
I am trying to select all blog posts by tag name and this is the query that I am using:
SELECT blog_posts.*
FROM blog_post_tags
LEFT JOIN (blog_posts)
ON (blog_post_tags.blog_post_id = blog_posts.id)
WHERE blog_post_tags.tag_id
IN (
SELECT id FROM tags
WHERE name=$in_tag_name
)
where $in_tag_name is my PHP variable representing the name of the selected tag. However, I am not very experienced with SQL so I'm not sure if there is a more performant approach here than using a nested select.
To remove the nested select, I've considered instead of having 3 tables, just having 2 tables with the following relevant columns:
blog_posts: id,
blog_post_tags: blog_post_id, tag_name,
Then I considered the query:
SELECT blog_posts.*
FROM blog_post_tags
LEFT JOIN (blog_posts)
ON (blog_post_tags.blog_post_id = blog_posts.id)
WHERE blog_post_tags.tag_name=$in_tag_name
This approach gets rid of the nested query but feels less intuitive by not separating the tags table information fully from the blog posts table information.
I'm wondering if there is a best approach here of the two database structures, or else a better query than the nested select in the first database structure. I'm not sure if this is a silly question or if I'm overcomplicating things so if anyone can nudge me in the right direction here, that would be much appreciated!
The first schema is better, because you might want to add more information to tags, and this shouldn't be repeated for each post that uses tags. For instance, look at how Stack Exchange uses tags.
You don't need to use LEFT JOIN. Just join the three tables:
SELECT p.*
FROM blog_posts AS p
JOIN blog_post_tags AS bpt ON p.id = bpt.blog_post_id
JOIN tags AS t ON t.id = bpt.tag_id
WHERE t.name = :in_tag_name
Performance should be good if you declare all the foreign keys, which will automatically index them.
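As a rough, runnable illustration of the three-table join above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The title column, the sample rows, and the posts_by_tag helper are assumptions made up for the demo; only the table and key names come from the question. The tag name is passed as a bound parameter rather than interpolated into the SQL string (with MySQLi prepared statements the placeholder would likewise be ?).

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema from the question.
# The "title" column and all sample rows are invented for this demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog_posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE blog_post_tags (
        blog_post_id INTEGER REFERENCES blog_posts(id),
        tag_id INTEGER REFERENCES tags(id),
        PRIMARY KEY (blog_post_id, tag_id)
    );
    INSERT INTO blog_posts VALUES
        (1, 'Intro to joins'), (2, 'Indexing basics'), (3, 'Untagged post');
    INSERT INTO tags VALUES (1, 'sql'), (2, 'performance');
    INSERT INTO blog_post_tags VALUES (1, 1), (2, 1), (2, 2);
""")


def posts_by_tag(tag_name):
    """Return (id, title) for every post carrying the given tag name."""
    rows = conn.execute(
        """
        SELECT p.id, p.title
        FROM blog_posts AS p
        JOIN blog_post_tags AS bpt ON p.id = bpt.blog_post_id
        JOIN tags AS t ON t.id = bpt.tag_id
        WHERE t.name = ?
        ORDER BY p.id
        """,
        (tag_name,),  # bound parameter, never string concatenation
    )
    return rows.fetchall()


print(posts_by_tag("sql"))
```

Because both joins are inner joins, a post without any tags (post 3 here) never shows up, which is the behaviour wanted when filtering by tag.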
I have 2 tables: Articles and Comments;
"Comments.articleID" is a foreign key.
I want to query the database to compose a website that shows the article text of a certain article (given an articleID) and all the article's comments.
I can think of 2 ways to query the data:
Use 2 separate queries:
SELECT articles.text FROM articles where id = givenArticleID
SELECT comments.* FROM comments where comments.articleID = givenArticleID
Use an Inner join:
SELECT articles.text, comments.*
FROM articles
INNER JOIN comments on articles.id = comments.articleID
WHERE articles.id = givenArticleID
The first option only returns the data I am interested in - that is good.
The second option returns all data I am interested in, but much more data than necessary. Every row in the result set contains the article.text column, that could be a lot of (unnecessary) data.
I think that the join would be better for certain queries, that do not require a WHERE condition (thus containing different articles).
Which way would you generally prefer in the situation above?
Or is there an even better alternative...?
Option 2 is probably better, because it is only one client-server round trip.
Also don't forget that each query has to be parsed by the database server.
I'd recommend that you benchmark both versions and see which one performs better.
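To make the trade-off concrete, here is a small sketch of both options, using Python's sqlite3 module in place of MySQL. The body column, the sample data, and the function names are assumptions for illustration. It shows that the join repeats the article text on every row, and also that an INNER JOIN returns nothing at all for an article that has no comments yet, which may matter when choosing between the two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample schema; the "body" column and the rows are invented for the demo.
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        articleID INTEGER REFERENCES articles(id),
        body TEXT
    );
    INSERT INTO articles VALUES
        (1, 'Long article text'), (2, 'Article without comments');
    INSERT INTO comments VALUES (10, 1, 'First'), (11, 1, 'Second');
""")


def two_queries(article_id):
    """Option 1: fetch the article and its comments separately."""
    article = conn.execute(
        "SELECT text FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    comments = conn.execute(
        "SELECT body FROM comments WHERE articleID = ? ORDER BY id",
        (article_id,),
    ).fetchall()
    return article, comments


def one_join(article_id):
    """Option 2: a single INNER JOIN; the article text repeats per comment."""
    return conn.execute(
        """
        SELECT articles.text, comments.body
        FROM articles
        INNER JOIN comments ON articles.id = comments.articleID
        WHERE articles.id = ?
        ORDER BY comments.id
        """,
        (article_id,),
    ).fetchall()


print(two_queries(1))
print(one_join(1))
print(one_join(2))  # empty: the inner join drops articles with no comments
```

If the single-query route is chosen, a LEFT JOIN from articles to comments would keep the article row even when there are no comments.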
I have three tables 'users', 'friends', 'friendsrequests'.
'Users' contains an id, a firstname and a lastname, 'friends' and 'friendsrequests' both contain users_id_a and users_id_b.
When I search for new friends, I select id's where firstname is LIKE :whatever or lastname LIKE :whatever. However, I want to exclude those id's which are present in the other two tables.
I know how to solve this via application logic, but I also know I shouldn't do this. I know I shouldn't chain the SELECT statements and that I should use joins.
You've answered your own question in that you know you can use joins. There are plenty of examples available here on how to do a join in MySQL.
There are several join types, but the one you need here is probably a LEFT OUTER JOIN. You could join the other two tables and then filter on a field from those tables using IS NULL. This joins the additional tables regardless of whether they contain matching data, and the WHERE ... IS NULL condition then filters out the users that are present in them.
Rather than using joins you could take a WHERE NOT EXISTS approach. This logic might be more up your street if you're not familiar with SQL joins.
An example might be:
SELECT * FROM FRIENDS f
WHERE NOT EXISTS (SELECT 1 FROM friendsrequests fr WHERE f.user_id = fr.user_id)
Some examples can be found here:
SELECT * WHERE NOT EXISTS
Another approach is to use the IN operator, specifically WHERE NOT IN (SELECT ...).
Hopefully this will guide you. If you're still stuck, post your exact SQL schema and the requirement on a site like http://sqlfiddle.com/ and you're more likely to get a more specific response.
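For the schema described in the question, the NOT EXISTS idea could look roughly like the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL, and it makes two assumptions that the question does not state: the search is done on behalf of one user (called me here), and a friendship or request may be stored in either direction. The sample data and the search_new_friends name are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE friends (users_id_a INTEGER, users_id_b INTEGER);
    CREATE TABLE friendsrequests (users_id_a INTEGER, users_id_b INTEGER);
    INSERT INTO users VALUES
        (1, 'Ann', 'Smith'), (2, 'Bob', 'Smith'), (3, 'Cara', 'Smith'),
        (4, 'Dan', 'Smithers'), (5, 'Eve', 'Jones');
    INSERT INTO friends VALUES (1, 2);          -- Ann and Bob are friends
    INSERT INTO friendsrequests VALUES (3, 1);  -- Cara asked Ann
""")


def search_new_friends(me, pattern):
    """Users matching the pattern who are neither friends of `me`
    nor involved in a pending request with `me` (either direction)."""
    sql = """
        SELECT u.id, u.firstname, u.lastname
        FROM users AS u
        WHERE u.id <> :me
          AND (u.firstname LIKE :pattern OR u.lastname LIKE :pattern)
          AND NOT EXISTS (
              SELECT 1 FROM friends AS f
              WHERE (f.users_id_a = :me AND f.users_id_b = u.id)
                 OR (f.users_id_b = :me AND f.users_id_a = u.id))
          AND NOT EXISTS (
              SELECT 1 FROM friendsrequests AS fr
              WHERE (fr.users_id_a = :me AND fr.users_id_b = u.id)
                 OR (fr.users_id_b = :me AND fr.users_id_a = u.id))
        ORDER BY u.id
    """
    return conn.execute(sql, {"me": me, "pattern": pattern}).fetchall()


print(search_new_friends(1, "Smith%"))  # [(4, 'Dan', 'Smithers')]
```

The same filter can be written with two LEFT JOINs and IS NULL checks; NOT EXISTS is used here only because it reads closer to the stated requirement.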
I have the following tables.
Articles table
a_id INT primary unique
name VARCHAR
Description VARCHAR
c_id INT
Category table
id INT
cat_name VARCHAR
For now I simply use
SELECT a_id,name,Description,cat_name FROM Articles LEFT JOIN Category ON Articles.c_id=Category.id WHERE c_id={$id}
This gives me all articles which belong to a certain category along with category name.
Each article is having only one category.
I also use a sub-category in a similar way (I have another table named sub_cat). But not every article necessarily has a sub-category; it may belong to multiple categories instead.
I am now thinking of tagging an article with more than one category, just like questions on Stack Overflow are tagged (e.g. with multiple tags like PHP, MySQL, SQL). Later I have to display (filter) all articles with certain tags (e.g. tagged with PHP, or with PHP + MySQL), and I also have to display the tags along with the article name and Description.
Can anyone help me redesign the database? (I am using PHP + MySQL at the back end.)
Create a new table:
CREATE TABLE ArticleCategories(
A_ID INT,
C_ID INT,
Constraint PK_ArticleCategories Primary Key (A_ID, C_ID)
)
(this is the SQL server syntax, may be slightly different for MySQL)
This is called a "Junction Table" or a "Mapping Table" and it is how you express Many-to-Many relationships in SQL. So, whenever you want to add a Category to an Article, just INSERT a row into this table with the IDs of the Article and the Category.
For instance, you can initialize it like this:
INSERT Into ArticleCategories(A_ID,C_ID)
SELECT A_ID,C_ID From Articles
Now you can remove c_id from your Articles table.
To get back all of the Categories for a single Article, you would use a query like this:
SELECT Articles.a_id,name,Description,cat_name
FROM Articles
LEFT JOIN ArticleCategories ON Articles.a_id=ArticleCategories.a_id
INNER JOIN Category ON ArticleCategories.c_id=Category.id
WHERE Articles.a_id={$a_id}
Alternatively, to return all articles that have a category LIKE a certain string:
SELECT a_id,name,Description
FROM Articles
WHERE EXISTS( Select *
From ArticleCategories
INNER JOIN Category ON ArticleCategories.c_id=Category.id
WHERE Articles.a_id=ArticleCategories.a_id
AND Category.cat_name LIKE '%'+{$match}+'%'
)
(You may have to adjust the last line, as I am not sure how string parameters are passed in MySQL+PHP. Note that MySQL concatenates strings with CONCAT() rather than +.)
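Since the question also asks about filtering by several tags at once (e.g. php + MySQL), here is one possible sketch on top of the junction table, using Python's sqlite3 module in place of MySQL. It is not part of the answer above: the GROUP BY / HAVING COUNT technique, the sample rows, and the articles_with_all helper are my own illustration, and category names are passed as bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema follows the answer's junction-table design; rows are invented.
conn.executescript("""
    CREATE TABLE Articles (a_id INTEGER PRIMARY KEY, name TEXT, Description TEXT);
    CREATE TABLE Category (id INTEGER PRIMARY KEY, cat_name TEXT);
    CREATE TABLE ArticleCategories (
        a_id INTEGER REFERENCES Articles(a_id),
        c_id INTEGER REFERENCES Category(id),
        PRIMARY KEY (a_id, c_id)
    );
    INSERT INTO Articles VALUES
        (1, 'PDO basics', 'Intro'), (2, 'Join tuning', 'Indexes'),
        (3, 'CSS grid', 'Layout');
    INSERT INTO Category VALUES (1, 'php'), (2, 'mysql'), (3, 'css');
    INSERT INTO ArticleCategories VALUES (1, 1), (1, 2), (2, 2), (3, 3);
""")


def articles_with_all(cat_names):
    """Articles that carry every one of the given category names."""
    placeholders = ", ".join("?" for _ in cat_names)
    sql = f"""
        SELECT a.a_id, a.name
        FROM Articles AS a
        JOIN ArticleCategories AS ac ON ac.a_id = a.a_id
        JOIN Category AS c ON c.id = ac.c_id
        WHERE c.cat_name IN ({placeholders})
        GROUP BY a.a_id, a.name
        HAVING COUNT(DISTINCT c.id) = ?   -- must match all requested names
        ORDER BY a.a_id
    """
    return conn.execute(sql, [*cat_names, len(cat_names)]).fetchall()


print(articles_with_all(["php", "mysql"]))  # [(1, 'PDO basics')]
print(articles_with_all(["mysql"]))
```

Only the placeholders are built dynamically; the category names themselves never end up inside the SQL text.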
OK RBarryYoung, you asked me for a reference/analysis, so here is one.
This reference/analysis is based on the documentation and on a source code analysis of the MySQL server.
INSERT Into ArticleCategories(A_ID,C_ID)
SELECT A_ID,C_ID From Articles
On a large Articles table with many rows, this copy will push one CPU core to 100% load and will create a disk-based temporary table, which will slow down overall MySQL performance because the disk will be stressed by the copy.
If this is a one-time process this is not that bad, but do the math if you run this every time.
SELECT a_id,name,Description
FROM Articles
WHERE EXISTS( Select *
From ArticleCategories
INNER JOIN Category ON ArticleCategories.c_id=Category.id
WHERE Articles.a_id=ArticleCategories.a_id
AND Category.cat_name LIKE '%'+{$match}+'%'
)
Note: don't take the execution times on sqlfiddle as real. It's a busy server and the times vary too much to make a good statement, but look at what View Execution Plan has to say.
See http://sqlfiddle.com/#!2/48817/21 for a demo.
Both queries always trigger a complete table scan on the Articles table and two DEPENDENT SUBQUERYs, which is not good if you have a large Articles table with many records.
This means the performance depends on the number of Articles rows, even when you only want the articles that are in the category.
Select *
From ArticleCategories
INNER JOIN Category ON ArticleCategories.c_id=Category.id
WHERE Articles.a_id=ArticleCategories.a_id
AND Category.cat_name LIKE '%'+{$match}+'%'
This query is the inner subquery, but when you try to run it on its own, MySQL can't run it because it depends on a value from the Articles table. So this is a correlated subquery: a subquery type that is evaluated once for each row processed by the outer query. Not good indeed.
There are more ways of rewriting RBarryYoung's query; I will show one.
The INNER JOIN way is much more efficient, even with the LIKE operator.
Note: I've made a habit of starting with the table with the lowest number of records and working my way up. If you start with the Articles table, the execution will be the same if the MySQL optimizer chooses the right plan.
SELECT
Articles.a_id
, Articles.name
, Articles.description
FROM
Category
INNER JOIN
ArticleCategories
ON
Category.id = ArticleCategories.c_id
INNER JOIN
Articles
ON
ArticleCategories.a_id = Articles.a_id
WHERE
cat_name LIKE '%php%';
See http://sqlfiddle.com/#!2/43451/23 for a demo. Note that this looks worse at first, because it looks like more rows need to be checked.
Note that if the Articles table has a low number of records, RBarryYoung's EXISTS way and the INNER JOIN way will perform more or less the same in terms of execution time. As further proof, the INNER JOIN way scales better when the record count becomes larger:
http://sqlfiddle.com/#!2/c11f3/1 EXISTS: oops, more Articles records need to be checked now (even when they are not linked through the ArticleCategories table), so the query is less efficient now.
http://sqlfiddle.com/#!2/7aa74/8 INNER JOIN: same explain plan as the first demo.
Extra notes about scaling: it becomes even worse when you also want to ORDER BY or GROUP BY. The EXISTS way has a bigger chance of creating a disk-based temporary table, which will kill MySQL performance.
Let's also analyse LIKE '%php%' vs = 'php' for the EXISTS way and the INNER JOIN way.
The EXISTS way
http://sqlfiddle.com/#!2/48817/21 / http://sqlfiddle.com/#!2/c11f3/1 (more Articles): the explain tells me both patterns are more or less the same, but = 'php' should be a little faster because of the const type vs ref in the type column, and LIKE '%php%' will use more CPU because a string comparison algorithm needs to run.
The INNER JOIN way
http://sqlfiddle.com/#!2/43451/23 / http://sqlfiddle.com/#!2/7aa74/8 (more Articles): the explain tells me LIKE '%php%' should be slower because 3 more rows need to be analysed, but not shockingly slower in this case (you can see the index is not really used in the best way).
RBarryYoung's way works but doesn't keep its performance, at least not on a MySQL server.
See http://sqlfiddle.com/#!2/b2bd9/1 or http://sqlfiddle.com/#!2/34ea7/1 for examples that will scale on large tables with lots of records; this is what the topic starter needs.
I have a table called faq. This table consists of the fields faq_id,faq_subject.
I have another table called article which consists of article_id,ticket_id,a_body and which stores articles in a specific ticket. Naturally there is also a table "ticket" with fields ticket_id,ticket_number.
I want to retrieve a result table in format:
ticket_number,faq_id,faq_subject.
In order to do this I need to search for faq_id in the article.a_body field using a LIKE '%...%' pattern.
My question is, how can I do this dynamically such that I return with SQL one result table, which is in format ticket_number,faq_id,faq_subject.
I tried multiple configurations of UNION ALL, LEFT JOIN, LEFT OUTER JOIN statements, but they all return either too many rows, or have different problems.
Is this even possible with MySQL, and is it possible to write an SQL statement which includes #variables and can take care of this?
First off, that kind of a design is problematic. You have certain data embedded within another column, which is going to cause logic as well as performance problems (since you can't index the a_body in such a way that it will help the JOIN). If this is a one-time thing then that's one issue, but otherwise you're going to have problems with this design.
Second, consider this example: You're searching for faq_id #123. You have an article that includes faq_id 4123. You're going to end up with a false match there. You can embed the faq_id values in the text with some sort of mark-up (for example, [faq_id:123]), but at that point you might as well be saving them off in another table as well.
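The false-match problem described above is easy to reproduce. The following sketch uses Python's sqlite3 module as a stand-in for MySQL (so string concatenation is written with || instead of CONCAT), with invented sample rows and the [faq_id:N] mark-up from the paragraph above. The naive pattern matches FAQ 123 inside the text 4123, while the marked-up pattern does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table and column names follow the question; the rows are invented.
conn.executescript("""
    CREATE TABLE faq (faq_id INTEGER PRIMARY KEY, faq_subject TEXT);
    CREATE TABLE article (
        article_id INTEGER PRIMARY KEY, ticket_id INTEGER, a_body TEXT
    );
    INSERT INTO faq VALUES (123, 'Reset password'), (4123, 'Change email');
    INSERT INTO article VALUES (1, 7, 'See [faq_id:4123] for details');
""")

# Naive substring match: any body containing the digits of faq_id matches.
NAIVE_SQL = """
    SELECT f.faq_id
    FROM article AS a
    JOIN faq AS f ON a.a_body LIKE '%' || f.faq_id || '%'
    ORDER BY f.faq_id
"""

# Match only the delimited marker. Note that "_" is itself a LIKE wildcard
# (any single character), which is harmless here but worth knowing.
MARKED_SQL = """
    SELECT f.faq_id
    FROM article AS a
    JOIN faq AS f ON a.a_body LIKE '%[faq_id:' || f.faq_id || ']%'
    ORDER BY f.faq_id
"""


def faq_ids(sql):
    """Run one of the queries above and return the matched faq_id values."""
    return [row[0] for row in conn.execute(sql)]


naive_matches = faq_ids(NAIVE_SQL)
marked_matches = faq_ids(MARKED_SQL)
print(naive_matches)   # [123, 4123]  (123 is a false match)
print(marked_matches)  # [4123]
```

Either way the join cannot use an index on a_body, so storing the ticket-to-FAQ links in their own table remains the more robust fix.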
The following query should work (I think that MySQL supports CAST, if not then you might need to adjust that).
SELECT
T.ticket_number,
F.faq_id,
F.faq_subject
FROM
article A
INNER JOIN faq F ON
A.a_body LIKE CONCAT('%', F.faq_id, '%')
INNER JOIN ticket T ON
T.ticket_id = A.ticket_id
EDIT: Corrected to use CONCAT
SELECT DISTINCT t.ticket_number, f.faq_id, f.faq_subject
FROM faq f
INNER JOIN article a ON (a.a_body RLIKE CONCAT('faq_id: ',faq_id))
INNER JOIN ticket t ON (t.ticket_id = a.ticket_id)
WHERE somecriteria
MySQL setup: step by step.
programs -> linked to --> speakers (by program_id)
At this point, it's easy for me to query all the data:
SELECT *
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
Nice and easy.
The trick for me is this. My speakers table is also linked to a third table, "books." So in the "speakers" table, I have "book_id" and in the "books" table, the book_id is linked to a name.
I've tried this (including a WHERE you'll notice):
SELECT *
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
JOIN books on speakers.book_id = books.book_id
WHERE programs.category_id = 1
LIMIT 5
No results.
My questions:
What am I doing wrong?
What's the most efficient way to make this query?
Basically, I want to get back all the programs data and the books data, but instead of the book_id, I need it to come back as the book name (from the 3rd table).
Thanks in advance for your help.
UPDATE:
(rather than opening a brand new question)
The left join worked for me. However, I have a new problem. Multiple books can be assigned to a single speaker.
Using the left join returns two rows! What do I need to add to return only a single row, but still keep the two books separate?
Is there any chance that the books table doesn't have any matching rows for speakers.book_id?
Try using a left join which will still return the program/speaker combinations, even if there are no matches in books.
SELECT *
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
LEFT JOIN books on speakers.book_id = books.book_id
WHERE programs.category_id = 1
LIMIT 5
Btw, could you post the table schemas for all tables involved, and exactly what output (or reasonable representation) you'd expect to get?
Edit: Response to op author comment
you can use group by and group_concat to put all the books on one row.
e.g.
SELECT speakers.speaker_id,
speakers.speaker_name,
programs.program_id,
programs.program_name,
group_concat(books.book_name)
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
LEFT JOIN books on speakers.book_id = books.book_id
WHERE programs.category_id = 1
GROUP BY speakers.speaker_id
LIMIT 5
Note: since I don't know the exact column names, these may be off
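Here is a small runnable sketch of the group_concat idea, using Python's sqlite3 module as a stand-in for MySQL. Since the real schema was never posted, the layout is an assumption: speakers is taken to hold one row per speaker/book pair, and all column names and rows are invented. SQLite spells the separator as a second argument, whereas MySQL uses GROUP_CONCAT(col SEPARATOR ', ').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one speakers row per (speaker, book) pair.
conn.executescript("""
    CREATE TABLE programs (
        program_id INTEGER PRIMARY KEY, program_name TEXT, category_id INTEGER
    );
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, book_name TEXT);
    CREATE TABLE speakers (
        speaker_id INTEGER, speaker_name TEXT,
        program_id INTEGER, book_id INTEGER
    );
    INSERT INTO programs VALUES (1, 'Morning Show', 1);
    INSERT INTO books VALUES (1, 'Book A'), (2, 'Book B');
    INSERT INTO speakers VALUES
        (1, 'Alice', 1, 1), (1, 'Alice', 1, 2), (2, 'Bob', 1, NULL);
""")

rows = conn.execute("""
    SELECT s.speaker_id,
           s.speaker_name,
           p.program_name,
           group_concat(b.book_name, ', ') AS book_names
    FROM programs AS p
    JOIN speakers AS s ON p.program_id = s.program_id
    LEFT JOIN books AS b ON s.book_id = b.book_id
    WHERE p.category_id = 1
    GROUP BY s.speaker_id, s.speaker_name, p.program_name
    ORDER BY s.speaker_id
""").fetchall()

# One row per speaker; Alice's two books are merged into one string,
# and Bob (no book) gets NULL/None thanks to the LEFT JOIN.
for row in rows:
    print(row)
```

The order of the names inside the concatenated string is not guaranteed unless an ordering is requested explicitly.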
That's typically efficient. There is some kind of assumption you are making that isn't true. Do your speakers have books assigned? If they don't that last JOIN should be a LEFT JOIN.
This kind of query is typically pretty efficient, since you almost certainly have primary keys as indexes. The main issue would be whether your indexes are covering (which is more likely to occur if you don't use SELECT *, but instead select only the columns you need).