MySQL storing and searching string - mysql

I’m currently developing my own blogging system. When you create a new post, you get the option to file it under categories of your own choice.
At the moment I’m storing the categories as a VARCHAR value in a MySQL database. For example, the field will contain 2,4,8 if the user has chosen the categories with IDs 2, 4 and 8.
To retrieve the blog posts for the category with ID 4 I then use:
SELECT col FROM table WHERE LOCATE(',4,', CONCAT(',',col,','))
I’ve been told that storing comma-separated values in one column is a no-go (very bad) when it comes to good database structure!
Can anyone suggest a better technique for storing and querying this efficiently?
Thanks in advance

A flexible & robust setup, as posted so many times on SO:
POSTS
id
name
text
CATEGORIES
id
name
POST_CATEGORIES
post_id
category_id
Where the current query would be:
SELECT p.id, p.name, p.text
FROM posts p
JOIN post_categories pc
ON pc.post_id = p.id
AND pc.category_id = 4;
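As a rough, runnable sketch of the three-table layout above: the snippet below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption made only so the example is self-contained). The sample rows and the helper name posts_in_category are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, name TEXT, text TEXT);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    -- One row per (post, category) pair replaces the '2,4,8' string.
    CREATE TABLE post_categories (
        post_id     INTEGER NOT NULL REFERENCES posts(id),
        category_id INTEGER NOT NULL REFERENCES categories(id),
        PRIMARY KEY (post_id, category_id)
    );
    -- Hypothetical sample data.
    INSERT INTO posts VALUES (1, 'First', 'a'), (2, 'Second', 'b'), (3, 'Third', 'c');
    INSERT INTO categories VALUES (2, 'news'), (4, 'sql'), (8, 'misc'), (14, 'php');
    INSERT INTO post_categories VALUES (1, 2), (1, 4), (1, 8), (2, 14), (3, 4);
""")


def posts_in_category(category_id):
    """Return (id, name) of every post filed under the given category."""
    return conn.execute(
        """
        SELECT p.id, p.name
        FROM posts p
        JOIN post_categories pc ON pc.post_id = p.id
        WHERE pc.category_id = ?
        ORDER BY p.id
        """,
        (category_id,),
    ).fetchall()


print(posts_in_category(4))
```

Because the comparison is on an integer column, category 14 can never be mistaken for category 4, and the primary key on post_categories doubles as an index for the lookup.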

Look into relational database normalization. For your specific case consider creating 2 additional tables, Categories and BlogCategories in addition to your Blog content table. Categories contain the definition of all tags/categories and nothing else. The BlogCategories table is a many-to-many cross reference table that probably in your case just contains the foreign key reference to the Blog table and the foreign key reference to the Categories table. This allows 1 Blog entry to be associated with multiple categories and 1 Category to be associated with multiple Blog entries.
Getting the data out won't be any more difficult than a 3-table join at worst, and you'll be out of the substring business to figure out your business logic.

Related

How to select comma-separated values from a field in one table joined to another table with a specific where condition?

I'm working on a mysql database select and cannot find a solution for this tricky problem.
There's one table "words" with id and names of objects (in this case possible objects in a picture).
words
ID object
house
tree
car
…
In the other table "pictures" all the information about a picture is saved. Besides information on resolution, etc., there is in particular information on the objects in the picture. They are saved in the column objectsinpicture as ids from the table words, like 1,5,122,345, etc.
The table pictures also has a column "location", which holds the id of the place where I took the picture.
pictures
location objectsinpicture ...
1 - 1,2,3,4
2 - 1,5,122,34
1 - 50,122,345
1 - 91,35,122,345
2 - 1,14,32
1 - 1,5,122,345
To tag new pictures of a particular place I want to get suggestions from already saved information, so that I can create buttons in PHP to update the database instead of using a dropdown with multiple select.
What I have tried so far is the following:
SELECT words.id, words.object
FROM words, pictures
WHERE location = 2 AND FIND_IN_SET(words.id, pictures.objectsinpicture)
GROUP BY words.id
ORDER BY words.id
This nearly shows the expected values. But some information is missing. It doesn't show all the possible objects and I cannot find any reason for this.
What I want is, for example, all ids for location 2 joined to the table words, with duplicate entries of objectsinpicture grouped together:
1,5,122,34
1,14,32
1,5,14,32,34,122
house
...
...
...
...
...
Maybe I need to use group_concat with a comma separator, but this doesn't work either. The problem seems to be the where condition on the location.
I hope that someone has an idea for solving this.
Thanks in advance for any support!!!
This is a classic problem of denormalization causing problems.
What you need to do is store each object/picture association separately, in another table:
create table objectsinpicture (
picture_id int,
object_id int,
primary key (picture_id, object_id)
);
Instead of storing a comma-separated list, you would store one association per row in this table. It will grow to a large number of rows of course, but each row is just a pair of ids, so the total size won't be too great.
Then you can query:
SELECT w.id, w.object
FROM pictures AS p
JOIN objectsinpicture AS o ON o.picture_id = p.id
JOIN words AS w ON o.object_id = w.id
WHERE p.location = 2;
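A minimal sketch of how existing comma-separated data could be moved into the association table, again using Python's sqlite3 with an in-memory database in place of MySQL (an assumption). The ids given to the words, the pictures.id column, the objects_csv column name, and the sample rows are all hypothetical. DISTINCT is added to the query here because the question asked for duplicates to be collapsed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE words (id INTEGER PRIMARY KEY, object TEXT);
    CREATE TABLE pictures (id INTEGER PRIMARY KEY, location INTEGER, objects_csv TEXT);
    CREATE TABLE objectsinpicture (
        picture_id INTEGER,
        object_id  INTEGER,
        PRIMARY KEY (picture_id, object_id)
    );
    -- Hypothetical sample data; objects_csv is the old comma-separated column.
    INSERT INTO words VALUES (1, 'house'), (5, 'tree'), (14, 'car'), (50, 'dog');
    INSERT INTO pictures VALUES (1, 1, '1,50'), (2, 2, '1,5'), (3, 2, '1,14');
""")

# One-off migration: split each list and store one (picture, object) pair per row.
pairs = [
    (picture_id, int(token))
    for picture_id, csv in conn.execute("SELECT id, objects_csv FROM pictures").fetchall()
    for token in csv.split(",")
    if token.strip()
]
conn.executemany("INSERT OR IGNORE INTO objectsinpicture VALUES (?, ?)", pairs)

# Every distinct object already tagged at location 2.
suggestions = conn.execute("""
    SELECT DISTINCT w.id, w.object
    FROM pictures AS p
    JOIN objectsinpicture AS o ON o.picture_id = p.id
    JOIN words AS w ON w.id = o.object_id
    WHERE p.location = 2
    ORDER BY w.id
""").fetchall()
print(suggestions)
```

Once the pairs are migrated, the old comma-separated column can be dropped so that there is only one source of truth.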

MySQL: Taxonomy - Get ID that belongs to multiple categories

I was hoping someone could help me come up with a query for what I'm looking to do.
I have a website that lists game servers and I'm trying to improve my search system a bit.
There are three tables of interest: servers, version_taxonomy and category_taxonomy. The taxonomy tables contain two columns, one for a server ID and one for a version/category ID, where associations between a server and its supported versions and categories can be made.
Up till now, I've been joining both taxonomy tables to the server table and looking up servers for one version and one category, and it's been working fine. However, I'm looking to allow searching for a server that has multiple categories at the same time.
I've made an image to try and illustrate what I'm looking to do:
Say I'm looking for a server that has both categories 5 and 12 - Based on the table on the left that would be servers 1 and 3. But how would that be in a query? And how would I use that query to later get and work with the rest of the server data (JOIN like I'd normally do?)
Hopefully that makes sense! Looking forward to your responses.
Assuming I understand the question:
Join the two tables, then count the distinct values of category ID while limiting by them. DISTINCT is not needed if you can guarantee the uniqueness of (serverID, categoryID) in table A and a 1:1 relationship to server taxonomy, which would be true if you always limit by one and only one version...
SELECT A.ServerID, count(A.CategoryID) CatCnt
FROM A
INNER JOIN B
on A.ServerID = B.ServerID
WHERE A.CATEGORYID in (5,12)
and B.Version= 1.16
GROUP BY A.ServerID
HAVING count(distinct A.CategoryID) = 2
The category IDs could be passed in as a parameter, as could the distinct count, since you know both values.
This could be used as a CTE or as an inline derived table as a source, then joined to get the additional data; or left join in the desired data, assuming it's a 1:1 relationship.
If you want a working example: post DDL for table and SQL to create sample data and I'll put something in https://rextester.com/.
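In the meantime, here is a small sketch of the derived-table variant, run against an in-memory SQLite database through Python's sqlite3 module rather than MySQL (an assumption). The column names server_id and category_id and the sample rows are hypothetical, and the version filter is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE servers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE category_taxonomy (
        server_id   INTEGER,
        category_id INTEGER,
        PRIMARY KEY (server_id, category_id)
    );
    -- Hypothetical sample data: servers 1 and 3 have both categories 5 and 12.
    INSERT INTO servers VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
    INSERT INTO category_taxonomy VALUES
        (1, 5), (1, 12), (2, 5), (2, 7), (3, 5), (3, 12), (3, 20);
""")

# The derived table keeps only servers that match every requested category;
# joining it back to servers gives access to the rest of the server data.
matches = conn.execute("""
    SELECT s.id, s.name
    FROM servers AS s
    JOIN (
        SELECT server_id
        FROM category_taxonomy
        WHERE category_id IN (5, 12)
        GROUP BY server_id
        HAVING COUNT(DISTINCT category_id) = 2
    ) AS hit ON hit.server_id = s.id
    ORDER BY s.id
""").fetchall()
print(matches)
```

The number in the HAVING clause must equal the number of ids in the IN list; if they differ, the query returns either nothing or servers that match only some of the categories.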

How do I join all rows with the same name in one table using MYSQL?

Suppose I have a database called clubmembership that has a column for names, a column for clubs, and a column for the role they play in that club. The name Margrit would be in the column name many times, or as many times as she is in a club. If I want to see which people are members of the sewing club my query might look something like this:
SELECT DISTINCT NAME FROM CLUBMEMBERSHIP
WHERE CLUB='SEWING'
AND ROLE='MEMBER';
My problem is that I can't figure out a query for who is not in the sewing club. Of course the simple 'not in' clause isn't working because there are plenty of rows which sewing does not appear in. In this database if someone is not in the sewing club, sewing does not appear under club so I imagine there is a way to join the different rows with the same name under 'name' and then potentially use the 'not in' clause
I hope this was a good explanation of this question. I have been struggling with this problem for a while now.
Thanks for your help!
Nicolle
This is not something that can be solved by just changing the existing code, it is to do with the database design.
Database normalisation is the process of sorting out your database into sensible tables.
If you’re adding a person many times, then you should create a table called members instead. And if there is a list of clubs, then you should create a clubs table.
Then, you can create a table to join them together.
Here are your three tables:
members
-------
id (int)
name (varchar)
clubs
-------
id (int)
name (varchar)
memberships
-------
member_id (int)
club_id (int)
Then you can use joins in MySQL to return the information you need.
Stack Overflow doesn’t like external links as the answer should be here, but this is a huge topic that won’t fit in a single reply, so I would briefly read about database normalization, and then read about ‘joining’ tables.
If I understand you correctly, you want to list all names that are not members of SEWING. The inner query gets all names that are members of SEWING, and the NOT EXISTS operator keeps only the names that are not found by the inner query.
SELECT DISTINCT C.NAME
FROM CLUBMEMBERSHIP C
WHERE C.ROLE = 'MEMBER'
AND NOT EXISTS
(
SELECT NULL
FROM CLUBMEMBERSHIP D
WHERE D.CLUB='SEWING'
AND D.ROLE='MEMBER'
AND C.NAME = D.NAME
)
Here's a Demo.
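A small runnable sketch of the same anti-join idea, using Python's sqlite3 with an in-memory database in place of MySQL (an assumption). The sample rows are hypothetical. The naive query is included to show why filtering on CLUB <> 'SEWING' alone is not enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE clubmembership (name TEXT, club TEXT, role TEXT);
    -- Hypothetical sample data: only Margrit is in the sewing club.
    INSERT INTO clubmembership VALUES
        ('Margrit', 'SEWING', 'MEMBER'),
        ('Margrit', 'CHESS',  'MEMBER'),
        ('Nicolle', 'CHESS',  'MEMBER'),
        ('Otto',    'HIKING', 'MEMBER'),
        ('Otto',    'CHESS',  'MEMBER');
""")

# Naive attempt: still returns Margrit because of her CHESS row.
naive = [row[0] for row in conn.execute("""
    SELECT DISTINCT name FROM clubmembership
    WHERE club <> 'SEWING'
    ORDER BY name
""")]

# Anti-join: keep a name only if no SEWING membership row exists for it.
not_sewing = [row[0] for row in conn.execute("""
    SELECT DISTINCT c.name
    FROM clubmembership AS c
    WHERE c.role = 'MEMBER'
      AND NOT EXISTS (
          SELECT 1
          FROM clubmembership AS d
          WHERE d.name = c.name
            AND d.club = 'SEWING'
            AND d.role = 'MEMBER'
      )
    ORDER BY c.name
""")]
print(naive, not_sewing)
```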

What's the best way to implement a database with multivalued attributes?

I am trying to implement a database which has multi-valued attributes and to create a filter-based search. For example, I want my people_table to contain id, name, address, hobbies, interests (hobbies and interests are multi-valued). The user will be able to check many attributes and SQL will return only those people who have all of them.
I did some research and found some ways to implement this, but I can't decide which one is the best.
The first one is to have one table with the basic info of people (id, name, address), two more for the multi-valued attributes and one more which contains only the keys of the other tables (I understand how to create these tables, but I don't know yet how to implement the search).
The second one is to have one table with the basic info and then one for each attribute. So I will have 20 or more tables (football, paint, golf, music, hiking etc.) which only contain the ids of the people. Then when the user checks the hobbies and the activities I am going to get the desired results using JOIN (I am not sure about the complexity, so I don't know how fast it is going to be if the user checks many boxes).
The last one is an implementation that I didn't find on the internet (and I know there is a reason :) ) but in my mind it is the easiest to implement and the fastest in terms of complexity. Use only one table which will have the basic info as normal and also all the attributes as boolean columns. So if I have 1000 people in my table there are going to be only 1000 row checks, which I imagine will be fast enough with the use of AND conditions.
So my question is: can I use the third implementation, or is there a big disadvantage that I don't get? And which one of the first two ways do you suggest I use?
That is a typical n to m relation. It works like this
persons table
------------
id
name
address
interests table
---------------
id
name
person_interests table
----------------------
person_id
interest_id
person_interests contains a record for each interest of a person. To get the interests of a person do:
select i.name
from interests i
join person_interests pi on pi.interest_id = i.id
join persons p on pi.person_id = p.id
where p.name = 'peter'
You could also create tables for hobbies. To get the hobbies, do the same in a separate query. To get both in one query you can do something like this:
select p.id, p.name,
i.name as interest,
h.name as hobby
from persons p
left join person_interests pi on pi.person_id = p.id
left join interests i on pi.interest_id = i.id
left join person_hobbies ph on ph.person_id = p.id
left join hobbies h on ph.hobby_id = h.id
where p.name = 'peter'
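The question also asked how to return only the people who have all of the checked attributes. One possible sketch, assuming the persons / interests / person_interests layout above and using Python's sqlite3 with an in-memory database instead of MySQL (an assumption): filter on the checked names, group by person, and require the match count to equal the number of boxes checked. The helper name persons_with_all and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE interests (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE person_interests (
        person_id   INTEGER REFERENCES persons(id),
        interest_id INTEGER REFERENCES interests(id),
        PRIMARY KEY (person_id, interest_id)
    );
    -- Hypothetical sample data.
    INSERT INTO persons VALUES (1, 'peter', 'a'), (2, 'anna', 'b'), (3, 'maria', 'c');
    INSERT INTO interests VALUES (1, 'football'), (2, 'golf'), (3, 'music');
    INSERT INTO person_interests VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 2), (3, 3);
""")


def persons_with_all(interest_names):
    """Names of persons who have every one of the given interests."""
    wanted = sorted(set(interest_names))
    if not wanted:
        return []
    # One '?' placeholder per checked box; values are bound, never interpolated.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT p.name
        FROM persons p
        JOIN person_interests pi ON pi.person_id = p.id
        JOIN interests i ON i.id = pi.interest_id
        WHERE i.name IN ({placeholders})
        GROUP BY p.id, p.name
        HAVING COUNT(DISTINCT i.id) = ?
        ORDER BY p.name
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


print(persons_with_all(["football", "golf"]))
```

The query text stays the same no matter how many boxes the user checks; only the number of placeholders changes, so there is no need for one table or one join per attribute.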
The basic way to deal with this is with a many-to-many join table. Each user can have many hobbies. Each hobby can have many users. That's basic stuff you can find information about anywhere, and @juergend already covered that.
The harder part is tracking different information about various hobbies and interests. Like if their hobby is "baseball" you might want to track what position they play, but if their hobby is "travel" you might want to track their favorite countries. Doing this with typical SQL relationships will lead to a rapid proliferation of tables and columns.
A hybrid approach is to use the new JSON data type to store some unstructured data. To expand on @juergend's example, you might add a field to Person_Interests which can store some of those details about that person's interest.
create table Person_Interests (
InterestID integer references Interests(ID),
PersonID integer references Persons(ID),
Details JSON
);
And now you could add that Person 45 has Interest 12 (travel), their favorite country is Djibouti, and they've been to 45 countries.
insert into person_interests
(InterestID, PersonID, Details)
values (12, 45, '{"favorite_country": "Djibouti", "countries_visited": 45}');
And you can use JSON search functions to find, for example, everyone whose favorite country is Djibouti.
select p.id, p.name
from person_interests pi
join persons p on p.id = pi.personid
where pi.details->"$.favorite_country" = "Djibouti"
The advantage here is flexibility: interests and their attributes aren't limited by your database schema.
The disadvantage is performance. The JSON data type isn't the most efficient, and indexing a JSON column in MySQL is complicated. Good indexing is critical to good SQL performance. So as you figure out common patterns you might want to turn commonly used attributes into real columns in real tables.
The other option would be to use table inheritance. This is a feature of Postgres, not MySQL, and I'd recommend considering switching. Postgres also has better and more mature JSON support and JSON columns are easier to index.
With table inheritance, rather than having to write a completely new table for every different interest, you can make specific tables which inherit from a more generic one.
create table person_interests_travel (
FavoriteCountry text,
CountriesVisited text[]
) inherits(person_interests);
This still has InterestID, PersonID, and Details, but it's added some specific columns for tracking their favorite country and countries they've visited.
Note the text[]. PostgreSQL also supports arrays, so you can store real lists without having to create another join table. You can also do this in MySQL with a JSON field, but arrays offer type constraints that JSON does not.

'Likes' system database

I am developing a web application where I have to implement a 'Likes' system like the one Facebook has. The application will have a few categories of products that a customer can 'like'. I have started to create the database, but I'm stuck on one obstacle. As I understand it, there are two ways of doing this:
First. Create one database table with the fields "id, user_id, item_category, item_id". When a user clicks the 'like' button, the information will be saved in this table for the various categories of products (item_category).
Second. Create several tables, one per category of item. For instance, "tbl_item_category_1, tbl_item_category_2, tbl_item_category_3" with the fields "user_id, item_id".
It would be great to get more insight into best practices for this kind of database structure. Which works faster, and which is more logical/practical? I will use only several categories of items.
I would go with the first version with a table structure similar to this:
User Table: PK id
id
username
Category Table: PK id
id
categoryname
Like Table: PK both user_id and category_id
user_id
category_id
Here is a SQL Fiddle with demo of table structure with two sample queries to give the Total Likes by user and Total Likes by category
The second one - creating multiple tables is a terrible idea. If you have 50-100 categories trying to query those tables would be horrible. It would become completely unmanageable.
If you have multiple tables, trying to get the total likes would be:
Select count(*)
from category_1
JOIN category_2
ON userid = userid
join category_3
ON userid = userid
join .....
Use one table, no question.
The first method is the correct one. Never make multiple tables for item categories, it makes maintaining your code a nightmare, and makes queries ugly.
In fact, the general rule is that anything that is dynamic (i.e. it changes) should not be stored as a set of static objects (e.g. tables). If you think you might add a new type of 'something' later on, then you need a 'something' types table.
For example, imagine trying to get a count of how many items a user has liked. With the first method, you can just do SELECT COUNT(*) FROM likes WHERE user_id = 123, but in the second method you'd need to do a JOIN or UNION, which is bad for performance and bad for maintainability.
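A rough sketch of the single-table approach, using the user_id, item_category and item_id fields from the question and Python's sqlite3 with an in-memory database in place of MySQL (an assumption). The composite primary key and the sample rows are hypothetical choices, not something the question specified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE likes (
        user_id       INTEGER NOT NULL,
        item_category INTEGER NOT NULL,
        item_id       INTEGER NOT NULL,
        -- A user can like a given item only once.
        PRIMARY KEY (user_id, item_category, item_id)
    );
    -- Hypothetical sample data.
    INSERT INTO likes VALUES (123, 1, 10), (123, 2, 10), (123, 2, 11), (456, 1, 10);
""")

# Total likes by one user, across every category, in a single query.
total_for_user = conn.execute(
    "SELECT COUNT(*) FROM likes WHERE user_id = ?", (123,)
).fetchone()[0]

# Likes per category; a new category needs no schema change, only new rows.
per_category = conn.execute("""
    SELECT item_category, COUNT(*)
    FROM likes
    GROUP BY item_category
    ORDER BY item_category
""").fetchall()
print(total_for_user, per_category)
```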
The first method is the correct one, because you don't know how many categories you will have, and with separate tables it is very difficult to get the data out.