How to design MySQL Table Structure for search optimization [duplicate] - mysql

This question already has answers here:
Is storing a delimited list in a database column really that bad?
(10 answers)
Closed 2 years ago.
Hello to Folks out there,
I need some suggestion on How to design MySQL Table Structure for search optimization.
I am creating a Real estate website. In that I have property table and all its associated tables.
I can design my table for saving these records in two ways.
I have amenity master table
id property_name
----------------
1 Property A
2 Property B
3 Property C
Approach 1
Property Table
id property_name
----------------
1 Property A
2 Property B
3 Property C
Property_Amenities table
id p_id amenity_id
------------------------
1 1 1
2 1 2
3 1 3
4 1 4
5 2 1
6 2 1
Approach 2
Property Table
id property_name amenity_id
----------------------------
1 Property A 1,2,3,4,5
2 Property B 1,4,7,9,12
3 Property C 3,4,7,8,9,10
Approach 1 Query : I can join tables and get the all the amenities name for a particular property. Add the required index for optimization. For A property there will be 20 amenities on averages. Suppose I have 100K property records then to get amenities for particular property, MySQL query will search in 2 millions records of Property_Amenities table.
Approach 2 Query : I can search using FIND_IN_SET or IN MySQL operator. But I was going through this search topic and it seems that this approach will be much resource intensive and cost more for same amount of data i.e 100k property records.
Any suggestion will be appropriated. What your thought on this scenario or any other approach I should follow.

Use the first approach. Period. The second is wrong, wrong, wrong. Here are some reasons:
A SQL string should not be used to store multiple values.
Integers should be stored using the proper type.
Foreign key relationships should be declared.
SQL has very poor string handling methods.
SQL has a great way to store lists; it is called a "table" not a "string".

Related

MYSQL DB Best method to store keywords and URL index

Which of these methods would be the most efficient way of storing, retrieving, processing and searching a large (millions of records) index of stored URLs along with there keywords.
Example 1: (Using one table)
TABLE_URLs-----------------------------------------------
ID DOMAIN KEYWORDS
1 mysite.com videos,photos,images
2 yoursite.com videos,games
3 hissite.com games,images
4 hersite.com photos,pictures
---------------------------------------------------------
Example 2: (one-to-one Relationship from one table to another)
TABLE_URLs-----------------------------------------------
ID DOMAIN KEYWORDS
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_KEYWORDS---------------------------------------------
ID DOMAIN_ID KEYWORDS
1 1 videos,photos,images
2 2 videos,games
3 3 games,images
4 4 photos,pictures
---------------------------------------------------------
Example 3: (one-to-one Relationship from one table to another (Using a reference table))
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_TO_KEYWORDS------------------------------------
ID DOMAIN_ID KEYWORDS_ID
1 1 1
2 2 2
3 3 3
4 4 4
---------------------------------------------------------
TABLE_KEYWORDS-------------------------------------------
ID KEYWORDS
1 videos,photos,images
2 videos,games
3 games,images
4 photos,pictures
---------------------------------------------------------
Example 4: (many-to-many Relationship from url to keyword ID (using reference table))
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_TO_KEYWORDS------------------------------------
ID DOMAIN_ID KEYWORDS_ID
1 1 1
2 1 2
3 1 3
4 2 1
5 2 4
6 3 4
7 3 3
8 4 2
9 4 5
---------------------------------------------------------
TABLE_KEYWORDS-------------------------------------------
ID KEYWORDS
1 videos
2 photos
3 images
4 games
5 pictures
---------------------------------------------------------
My understanding is that Example 1 would take the largest amount of storage space however searching through this data would be quick (Repeat keywords saved multiple times, however keywords are sat next to the relevant domain)
wWhereas Example 4 would save a tons on storage space but searching through would take longer. (Not having to store duplicate keywords, however referencing multiple keywords for each domain would take longer)
Could anyone give me any insight or thoughts on which the best method would be to utilise when designing a database that can handle huge amounts of data? With the foresight that you may want to display a URL with its assosicated keywords OR search for one or more keywords and bring up the most relevant URLs
You do have a many-to-many relationship between url and keywords. The canonical way to represent this in a relational database is to use a bridge table, which corresponds to example 4 in your question.
Using the proper data structure, you will find out that the queries will be much easier to write, and as efficient as it gets.
I don't know what drives you to think that searchin in a structure like the first one will be faster. This requires you to do pattern matching when searching for each single keyword, which is notably slow. On the other hand, using a junction table lets you search for exact matches, which can take advantage of indexes.
Finally, maintaining such a structure is also much easier; adding or removing keywords can be done with insert and delete statements, while other structures require you do do string manipulation in delimited list, which again is tedious, error-prone and inefficient.
None of the above.
Simply have a table with 2 string columns:
CREATE TABLE domain_keywords (
domain VARCHAR(..) NOT NULL,
keyword VARCHAR(..) NOT NULL,
PRIMARY KEY(domain, keyword),
INDEX(keyword, domain)
) ENGINE=InnoDB
Notes:
It will be faster.
It will be easier to write code.
Having a plain id is very much a waste.
Normalizing the domain and keyword buys little space savings, but at a big loss in efficiency.
"Huse database"? I predict that this table will be smaller than your Domains table. That is, this table is not your main concern for "huge".

processing MySQL data when there are field values inserted with commas

I have some columns in mysql table with field vaues are seperated with commas. fields like IP address and running_port_ids, dns_range or subnet etc. running a cron to check every hour whether the ports are used or not on the appliance. if ports are used against each appliance running_port_ids(like 2,3,7) are inserted with comma seperated values.
How to process the data so that i can get a reports which ports are less used (i have a static list of port ids) in ascending order like below by grouping of address, running_port_ids and insert date for a date range of one month.
address port usage%
10.2.1.3 3 1
10.3.21.22 2 20
there are thousands of record now in the table with comma seperated running_port_ids. is there any methods available in MySql to do this?
Any help much appreciated.
If you can convert your data model to a n:m relation (or "link table"), i.e. normalize your data model, this is pretty easy using grouping (or "aggregate") functions. So I'd advise to revise your data model and introduce a table containing one row for each of the ports, in stead of storing this de-normalized in a text column.
A typical example would be: "student has many classes", and a property of this relation is "attendance":
Student
id name
1 John
2 Jane
Course
id name
1 Engineering
2 Databases
Class
id courseid date room
1 1 2015-08-05 10:00:00 301
2 1 2015-08-13 10:00:00 301
3 1 2015-09-03 10:00:00 301
StudentClass
studentid classid attendance
1 1 TRUE
1 2 FALSE
1 3 NULL
2 1 TRUE
2 2 TRUE
2 3 NULL
In this case, you can see the relation between student and class is normalized, i.e. every other value is stored vertically in stead of horizontally. This way, you can easily query things like "How many classes did John miss?" or "How many students did not miss any class". NULL in the example shows that we can not yet tell anything about the attendance (as the date is in the future), but we do know that they should attend.
This is the way you should keep track of properties of a relation between two things. I can't really make out what you're trying to build, but I'm pretty sure you need a similar model.
Hope this helps.

Advice on linking product codes to a product in a MySQL database

I need some advice of how to setup my tables I currently have a product table and a product codes table.
In the codes table I have an id and a title such as:
1 567902
2 345789
3 345678
there can be many items in this table.
In my product table I have the usual product id,title, etc but also a code id column that I'm currently storing a comma separate list of ids for any codes the product needs to reference.
in that column I could end up with ids like: 2,5,6,9
I'm going to need to be able to search the products table looking for code ids for a specific set this is where I've come into problems trying to use id IN ($var) or FIND_IN_SET is proving problematic I've been advised to restructure it I'm happy to do just wondering what the best method would be.
Sounds like you have two choices. If this is a 1 to many relationship, then you need to have the foreign key in the code table, not the product table.
i.e.
codeId code productId
1 567902 2
2 345789 6
3 345678 9
4 345690 9
The other option is to have another table which contains productId and codeId (both as foreign keys), this is a many-to-many relationship. This is what you should go for if a code can be assigned to multiple products (I assume not). It will look something like this:
codeId productId
1 2
1 10
2 6
3 9
4 9
I think the first option is what you need.

MS Access 2007 Rows to columns in recordset

I have a table which is like a questionnaire type ..
My original table contains 450 columns and 212 rows.
Slno is the person's id who answer the questionaire .
SlNo Q1a Q1b Q2a Q2b Q2c Q2d Q2e Q2f .... Q37c <450 columns>
1 1
2 1 1
3 1
4 1 1
5 1
I have to do analysis for this data , eg Number of persons who is male (Q1a) and who owns a boat (Q2b) i.e ( select * from Questionnaire where Q1a=1 and Q2b=1 ).. etc .. many more combinations are there ..
I have designed in MS access all the design worked perfectly except for a major problem ( Number of table columns is restricted to 255 ).
To be able to enter this into access table i have inserted in as 450 rows and 212 columns (now am able to enter this into access db). Now while fetching the records i want the record set to transpose the results into the form that i wanted so that i do not have to change my algorithm or logic .... How to achieve this with the minimum changes ? This is my first time working with Access Database
You might be able to use a crosstab query to generate what you are expecting. You could also build a transpose function.
Either way, I think you'll stil run into the 255 column limit and MS Access is using temporary table, etc.
However, I think you'll have far less work and better results if you change the structure of your table.
I assume that this like a fill-in-the-bubble questionnaire, and it's mostly multiple choice. In which case instead of recording the result, I would record the answer for the question
SlNo Q1 Q2
1 B
2 B
3 A
4 A C
5 A
Then you have far fewer columns to work with. And you query for where Q1='A' instead of Q1a=1.
The alternative is break the table up into sections (personal, career, etc.) and then do a join, and only show the column you need (so as not to exceed that 255 column limit).
An way to do this that handles more questions is have a table for the person, a table for the question, and a table for the response
Person
SlNo PostalCode
1 90210
2 H0H 0H0
3
Questions
QID, QTitle, QDesc
1 Q1a Gender Male
2 Q1b Gender Female
3 Q2a Boat
4 Q2b Car
Answers
SlNo QID Result
1 2 True
1 3 True
1 4 True
2 1 True
2 3 False
2 4 True
You can then find the question takers by selecting Persons from a list of Answers
select * from Person
where SlNo in (
select SlNo from Answers, Questions
where
questions.qid = answers=qid
and
qtitle = 'Q1a'
and
answers.result='True')
and SlNo in (
select SlNo from Answers, Questions
where
questions.qid = answers=qid
and
qtitle = 'Q2a'
and
answers.result='True')
I finally got the solutions
I created two table one having 225 columns and the other having 225 column
(total 450 columns)
I created a SQL statement
select count(*) from T1,T2 WHERE T1.SlNo=T2.SlNo
and added the conditions what i want
It is coming correct after this ..
The database was entered wrongly by the other staff in the beginning but just to throw away one week of work was not good , so had to stick to this design ... and the deadly is next week .. now it's working :) :)

Relational Database Design (MySQL)

I have a table User that stores user information - such as name, date of birth, locations, etc.
I have also created a link table called User_Options - for the purpose of storing multi-value attributes - this basically stores the checkbox selections.
I have a front-end form for the user to fill in and create their user profile. Here are the tables I have created to generate the checkbox options:
Table User_Attributes
=====================
id attribute_name
---------------------
1 Hobbies
2 Music
Table User_Attribute_Options
======================================
id user_attribute_id option_name
--------------------------------------
1 1 Reading
2 1 Sports
3 1 Travelling
4 2 Rock
5 2 Pop
6 2 Dance
So, on the front-end form there are two sets of checkbox options - one set for Hobbies and one set for Music.
And here are the User tables:
Table User
========================
id name age
------------------------
1 John 25
2 Mark 32
Table User_Options
==================================================
id user_id user_attribute_id value
--------------------------------------------------
1 1 1 1
2 1 1 2
3 1 2 4
4 1 2 5
5 2 1 2
6 2 2 4
(in the above table 'user_attribute_id' is the ID of the parent attribute and 'value' is the ID of the attribute option).
So I'm not sure that I've done all this correctly, or efficiently. I know there is a method of storing hierarchical data in the same table but I prefer to keep things separate.
My main concern is with the User_Options table - the idea behind this is that there only needs to be one link table that stores multi-value attributes, rather than have a table for each and every multi-value attribute.
The only thing I can see that I'd change is that in the association table, User_Options, you have an id that doesn't seem to serve a purpose. The primary key for that table would be all three columns, and I don't think you'd be referring to the options a user has by an id--you'd be getting them by user_id/user_attribute_id. For example, give me all the user options where user is 1 and user attribute id is 2. Having those records uniquely keyed with an additional field seems extraneous.
I think otherwise the general shape of the tables and their relationships looks right to me.
There's nothing wrong with how you've done it.
It's possible to make things more extensible at the price of more linked table references (and in the composition of your queries). It's also possible to make things flatter, and less extensible and flexible, but your queries will be faster.
But, as is usually the case, there's more than one way to do it.