How to design this many-to-many database?

I want a database design that stores a list of words in one language (e.g. English) and their translations in another language (e.g. Spanish). Initially I have a table English (id, word, isTranslated):
1 i false
2 love false
3 you false
4 hate false
5 him false
...
"isTranslated" is a boolean indicating whether this word has been translated yet.
In the web front end, a number of words from the English table are displayed on an HTML page each time. For example:
2 love
3 you
4 hate
A user then clicks each of these words and submits the translations in a web form, and they are stored in the Spanish table. Unlike the English table, the Spanish table starts with zero records. It gets populated gradually through the HTML form.
I am thinking the Spanish table should have the same structure as the English one:
Spanish (id,word,isTranslated)
That way I can have another associative table, EnglishSpanish(English_ID, Spanish_ID), which stores the translated pairs across the two tables. It's a many-to-many relationship. The purpose of this table is to make it easy to retrieve the counterparts of a word in the other language.
Does this make sense? The trouble I am having is how to populate the other two tables, Spanish and EnglishSpanish, gradually as users submit their translations over time. The first table, English, is pre-loaded.
Thank you for your insights and help.
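Here is a minimal sketch of the submission step. It assumes the asker's three-table design and uses Python's built-in sqlite3 module as a stand-in for MySQL.

Several details are illustrative assumptions:
- the helper name submit_translation;
- the UNIQUE constraint on Spanish.word;
- the sample translations;
- the Spanish table omits isTranslated, because a Spanish row only exists once someone has submitted it.

Each submission does three things in one transaction: it inserts the Spanish word if it is new, links the pair, and flags the English word as translated.

```python
import sqlite3

# SQLite stands in for MySQL. Table/column names follow the question;
# the UNIQUE constraint on Spanish.word is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE English (
    id           INTEGER PRIMARY KEY,
    word         TEXT NOT NULL UNIQUE,
    isTranslated INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE Spanish (
    id   INTEGER PRIMARY KEY,
    word TEXT NOT NULL UNIQUE
);
CREATE TABLE EnglishSpanish (
    English_ID INTEGER NOT NULL REFERENCES English(id),
    Spanish_ID INTEGER NOT NULL REFERENCES Spanish(id),
    PRIMARY KEY (English_ID, Spanish_ID)
);
""")
# English is pre-loaded, as in the question.
conn.executemany("INSERT INTO English (id, word) VALUES (?, ?)",
                 [(1, "i"), (2, "love"), (3, "you"), (4, "hate"), (5, "him")])


def submit_translation(conn, english_id, spanish_word):
    """Hypothetical handler for one submitted form field."""
    with conn:  # commit all three statements together, or roll back
        # Reuse the Spanish row if someone already submitted this word.
        conn.execute("INSERT OR IGNORE INTO Spanish (word) VALUES (?)",
                     (spanish_word,))
        (spanish_id,) = conn.execute(
            "SELECT id FROM Spanish WHERE word = ?", (spanish_word,)).fetchone()
        # The composite primary key makes repeated submissions harmless.
        conn.execute("INSERT OR IGNORE INTO EnglishSpanish VALUES (?, ?)",
                     (english_id, spanish_id))
        conn.execute("UPDATE English SET isTranslated = 1 WHERE id = ?",
                     (english_id,))


submit_translation(conn, 2, "amor")
submit_translation(conn, 2, "querer")
submit_translation(conn, 3, "tú")

# Words still waiting for a translation (what the front end would show next).
untranslated = [w for (w,) in conn.execute(
    "SELECT word FROM English WHERE isTranslated = 0 ORDER BY id")]
# Spanish counterparts of "love" via the associative table.
love = [w for (w,) in conn.execute("""
    SELECT s.word
    FROM EnglishSpanish es
    JOIN Spanish s ON s.id = es.Spanish_ID
    WHERE es.English_ID = 2
    ORDER BY s.word""")]
```

In MySQL the same idea would use INSERT IGNORE (or INSERT ... ON DUPLICATE KEY UPDATE) in place of SQLite's INSERT OR IGNORE.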

I once designed a dictionary with a one-to-one relationship, since in the real world you will pick only the single best translation for each English phrase:
id, term, lang, english_id
1 I En 1
2 Love En 2
3 You En 3
4 我 Cn 1
5 爱 Cn 2
6 你 Cn 3
7 ich De 1
...
The above design also works for a one-to-many relationship.
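The next sketch shows the single-table design above, again with sqlite3 standing in for MySQL. The table name dictionary and the UNIQUE (lang, english_id) constraint are assumptions. The constraint is what keeps the design one-to-one: one term per language per English entry. Rows that share an english_id are translations of each other, so a lookup is a self-join.

```python
import sqlite3

# SQLite stands in for MySQL; "dictionary" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dictionary (
    id         INTEGER PRIMARY KEY,
    term       TEXT NOT NULL,
    lang       TEXT NOT NULL,
    english_id INTEGER NOT NULL,
    UNIQUE (lang, english_id)  -- at most one term per language per entry
);
""")
conn.executemany("INSERT INTO dictionary VALUES (?, ?, ?, ?)", [
    (1, "I", "En", 1), (2, "Love", "En", 2), (3, "You", "En", 3),
    (4, "我", "Cn", 1), (5, "爱", "Cn", 2), (6, "你", "Cn", 3)])


def translate(conn, english_term, target_lang):
    # Self-join: the English row and its translation share english_id.
    row = conn.execute("""
        SELECT t.term
        FROM dictionary e
        JOIN dictionary t ON t.english_id = e.english_id
        WHERE e.lang = 'En' AND e.term = ? AND t.lang = ?""",
        (english_term, target_lang)).fetchone()
    return row[0] if row else None
```

Dropping the UNIQUE constraint lets several terms in one language share an english_id, which is the one-to-many case.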
If you want a many-to-many relationship, you need two tables:
id, term, lang
1 xyz en
2 ...
id1, id2
1 2
1 3
2 5
...
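This sketch covers the two-table many-to-many design, with sqlite3 standing in for MySQL. The answer does not name the tables, so term and translation are hypothetical names, and the sample rows are made up. Each pair is stored only once, so a lookup has to check both columns of the link table. That is why the sketch indexes each direction.

```python
import sqlite3

# SQLite stands in for MySQL; table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (
    id   INTEGER PRIMARY KEY,
    term TEXT NOT NULL,
    lang TEXT NOT NULL
);
CREATE TABLE translation (
    id1 INTEGER NOT NULL REFERENCES term(id),
    id2 INTEGER NOT NULL REFERENCES term(id),
    PRIMARY KEY (id1, id2)
);
CREATE INDEX translation_id2 ON translation (id2, id1);
""")
conn.executemany("INSERT INTO term VALUES (?, ?, ?)", [
    (1, "love", "en"), (2, "amor", "es"), (3, "querer", "es"), (4, "amour", "fr")])
conn.executemany("INSERT INTO translation VALUES (?, ?)", [(1, 2), (1, 3), (1, 4)])


def translations(conn, term_id, lang):
    # A pair is stored once, so follow the link in both directions.
    rows = conn.execute("""
        SELECT t.term FROM translation x JOIN term t ON t.id = x.id2
        WHERE x.id1 = ? AND t.lang = ?
        UNION
        SELECT t.term FROM translation x JOIN term t ON t.id = x.id1
        WHERE x.id2 = ? AND t.lang = ?
        ORDER BY 1""", (term_id, lang, term_id, lang))
    return [r[0] for r in rows]
```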

Related

MYSQL DB Best method to store keywords and URL index

Which of these methods would be the most efficient way of storing, retrieving, processing and searching a large (millions of records) index of stored URLs along with their keywords?
Example 1: (Using one table)
TABLE_URLs-----------------------------------------------
ID DOMAIN KEYWORDS
1 mysite.com videos,photos,images
2 yoursite.com videos,games
3 hissite.com games,images
4 hersite.com photos,pictures
---------------------------------------------------------
Example 2: (one-to-one Relationship from one table to another)
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_KEYWORDS---------------------------------------------
ID DOMAIN_ID KEYWORDS
1 1 videos,photos,images
2 2 videos,games
3 3 games,images
4 4 photos,pictures
---------------------------------------------------------
Example 3: (one-to-one Relationship from one table to another (Using a reference table))
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_TO_KEYWORDS------------------------------------
ID DOMAIN_ID KEYWORDS_ID
1 1 1
2 2 2
3 3 3
4 4 4
---------------------------------------------------------
TABLE_KEYWORDS-------------------------------------------
ID KEYWORDS
1 videos,photos,images
2 videos,games
3 games,images
4 photos,pictures
---------------------------------------------------------
Example 4: (many-to-many Relationship from url to keyword ID (using reference table))
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_TO_KEYWORDS------------------------------------
ID DOMAIN_ID KEYWORDS_ID
1 1 1
2 1 2
3 1 3
4 2 1
5 2 4
6 3 4
7 3 3
8 4 2
9 4 5
---------------------------------------------------------
TABLE_KEYWORDS-------------------------------------------
ID KEYWORDS
1 videos
2 photos
3 images
4 games
5 pictures
---------------------------------------------------------
My understanding is that Example 1 would take the largest amount of storage space, but searching through this data would be quick (repeated keywords are saved multiple times, but the keywords sit right next to the relevant domain).
Whereas Example 4 would save a ton of storage space, but searching through it would take longer (duplicate keywords are not stored, but referencing multiple keywords for each domain would take longer).
Could anyone give me any insight or thoughts on which method would be best when designing a database that can handle huge amounts of data? Bear in mind that you may want to display a URL with its associated keywords OR search for one or more keywords and bring up the most relevant URLs.

You do have a many-to-many relationship between URLs and keywords. The canonical way to represent this in a relational database is to use a bridge table, which corresponds to Example 4 in your question.
Using the proper data structure, you will find that the queries are much easier to write, and as efficient as it gets.
I don't know what drives you to think that searching in a structure like the first one will be faster. It requires pattern matching for every single keyword you search for, which is notably slow. On the other hand, a junction table lets you search for exact matches, which can take advantage of indexes.
Finally, maintaining such a structure is also much easier: adding or removing keywords can be done with INSERT and DELETE statements, while the other structures require you to do string manipulation on delimited lists, which again is tedious, error-prone and inefficient.
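Here is a sketch of Example 4 as a bridge table, using sqlite3 in place of MySQL and the data from the question. Table and column names are simplified. Ranking by the number of matched keywords is only one possible (assumed) notion of "most relevant". Both query directions from the question hit an index:
- a URL's keywords use the composite primary key;
- a keyword's URLs use the reverse index.

```python
import sqlite3

# SQLite stands in for MySQL; names are simplified from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE url (id INTEGER PRIMARY KEY, domain TEXT NOT NULL UNIQUE);
CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE);
CREATE TABLE url_keyword (
    url_id     INTEGER NOT NULL REFERENCES url(id),
    keyword_id INTEGER NOT NULL REFERENCES keyword(id),
    PRIMARY KEY (url_id, keyword_id)
);
CREATE INDEX url_keyword_rev ON url_keyword (keyword_id, url_id);
""")
conn.executemany("INSERT INTO url VALUES (?, ?)", [
    (1, "mysite.com"), (2, "yoursite.com"), (3, "hissite.com"), (4, "hersite.com")])
conn.executemany("INSERT INTO keyword VALUES (?, ?)", [
    (1, "videos"), (2, "photos"), (3, "images"), (4, "games"), (5, "pictures")])
conn.executemany("INSERT INTO url_keyword VALUES (?, ?)", [
    (1, 1), (1, 2), (1, 3), (2, 1), (2, 4), (3, 4), (3, 3), (4, 2), (4, 5)])


def keywords_for(conn, domain):
    rows = conn.execute("""
        SELECT k.keyword
        FROM url u
        JOIN url_keyword uk ON uk.url_id = u.id
        JOIN keyword k ON k.id = uk.keyword_id
        WHERE u.domain = ?
        ORDER BY k.keyword""", (domain,))
    return [r[0] for r in rows]


def search(conn, words):
    # Exact matches on keyword use its UNIQUE index; "relevance" is simply
    # how many of the requested keywords a URL has (an assumption).
    placeholders = ",".join("?" * len(words))  # only "?" marks are interpolated
    rows = conn.execute(f"""
        SELECT u.domain, COUNT(*) AS hits
        FROM keyword k
        JOIN url_keyword uk ON uk.keyword_id = k.id
        JOIN url u ON u.id = uk.url_id
        WHERE k.keyword IN ({placeholders})
        GROUP BY u.id, u.domain
        ORDER BY hits DESC, u.domain""", words)
    return rows.fetchall()
```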

None of the above.
Simply have a table with 2 string columns:
CREATE TABLE domain_keywords (
domain VARCHAR(..) NOT NULL,
keyword VARCHAR(..) NOT NULL,
PRIMARY KEY(domain, keyword),
INDEX(keyword, domain)
) ENGINE=InnoDB
Notes:
It will be faster.
It will be easier to write code.
Having a plain id column is very much a waste.
Normalizing the domain and keyword buys little space savings, but at a big loss in efficiency.
"Huge database"? I predict that this table will be smaller than your Domains table. That is, this table is not your main concern for "huge".

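A sketch of the two-column domain_keywords table follows, using sqlite3 in place of MySQL. The answer leaves the VARCHAR lengths unspecified, so TEXT is used here. WITHOUT ROWID makes SQLite store rows in primary-key order. This is used as a rough analogue of InnoDB's clustered primary key, which is an approximation and not the same engine behavior. Each lookup direction is served by one index, with no join.

```python
import sqlite3

# SQLite stands in for MySQL/InnoDB; sample data comes from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domain_keywords (
    domain  TEXT NOT NULL,
    keyword TEXT NOT NULL,
    PRIMARY KEY (domain, keyword)
) WITHOUT ROWID;
CREATE INDEX keyword_domain ON domain_keywords (keyword, domain);
""")
conn.executemany("INSERT INTO domain_keywords VALUES (?, ?)", [
    ("mysite.com", "videos"), ("mysite.com", "photos"), ("mysite.com", "images"),
    ("yoursite.com", "videos"), ("yoursite.com", "games"),
    ("hissite.com", "games"), ("hissite.com", "images"),
    ("hersite.com", "photos"), ("hersite.com", "pictures")])

# Keywords of one domain: served by the primary key.
by_domain = [k for (k,) in conn.execute(
    "SELECT keyword FROM domain_keywords WHERE domain = ? ORDER BY keyword",
    ("yoursite.com",))]
# Domains having one keyword: served by the (keyword, domain) index.
by_keyword = [d for (d,) in conn.execute(
    "SELECT domain FROM domain_keywords WHERE keyword = ? ORDER BY domain",
    ("images",))]
```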
Database design - "Separate Tables Vs One table" for Select Queries

I have a MySQL table like following
Books Table
book-id category author author_place book_name book_price --------other 50 columns directly related to book-id
1 adventure tom USA skydiving 300
2 spiritual rom Germany what you are 500
3 adventure som India woo woo 700
4 education kom Italy boring 900
5 adventure lom Pak yo yo 90
.
.
4000 spiritual tom USA you are 10
As you can see, there are around 4000 rows and around 55 columns. I use this table mostly for SELECT queries, and may add or update a book every 2-3 weeks.
I have doubts about the category and author columns.
Now, if I need to select from the table by category or author, I can simply do:
SELECT * FROM books WHERE author = 'tom';
SELECT * FROM books WHERE category = 'education';
It works fine, but according to standard database design I think I should move the category and author columns into separate tables (especially authors) and use their primary keys as foreign keys in the books table.
Something like this
Books Table
book-id categ_id author_id book_name book_price --------other 50 columns directly related to book-id
1 1 1 skydiving 300
2 2 2 what you are 500
3 1 3 woo woo 700
4 3 4 boring 900
5 1 5 yo yo 90
.
.
4000 3 1 you are 10
Category Table
categ_id category_name
1 adventure
2 spiritual
3 education
. .
. .
30 something
Authors Table
author_id author country
1 tom USA
2 rom Germany
3 som India
4 kom Italy
5 lom Pak
But then I have to join the tables each time I make a SELECT query by author or category, which I think will be inefficient. Something like this:
SELECT * FROM books LEFT JOIN authors ON authors.author_id = books.author_id WHERE books.author_id = 1;
SELECT * FROM books LEFT JOIN categories ON categories.categ_id = books.categ_id WHERE books.categ_id = 1;
So should I split the first table into separate tables, or is the first table design better in this case?

This question was answered by Edgar F. Codd himself, the inventor of the relational model upon which all RDBMSs are built.
Shortly after publishing the relational model papers, he and his team followed up with papers on the so-called normal forms. There are several of them, but at least the first three should generally be considered mandatory:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
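When you read them, you'll see that your initial design violates the normal forms (author_place depends on author rather than on book-id, a transitive dependency that 3NF rules out), and that you have come up with a solution that more or less respects them. Go ahead with the NF-compliant design without any doubts.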
To elaborate a bit on your concerns about join performance: it is not an issue as long as the following criteria are met:
your database schema is well designed (2NF compliant at least)
you use foreign keys to link the tables (see MySQL's docs)
you join the tables by their FKs
you have the hardware resources necessary to run your data efficiently
For example, on MySQL with InnoDB, on a 2NF-compliant schema using foreign keys, join performance on the FK will be among the last things you'd ever need to worry about (InnoDB requires an index on foreign key columns and creates one automatically if it is missing).
Historically, MySQL's default storage engine was MyISAM, which does not support foreign key constraints. That is perhaps the main source of complaints about poor join performance (along with poor schema design, of course).
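Here is a sketch of the normalized books/authors/categories schema from the question, using sqlite3 in place of MySQL/InnoDB and a few of the question's rows. One difference from InnoDB matters here: SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and it never indexes FK columns automatically. The FK indexes are therefore created explicitly, so each join goes from a primary key to an indexed foreign key column.

```python
import sqlite3

# SQLite stands in for MySQL/InnoDB; FK indexes are created by hand here.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE categories (
    categ_id      INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL UNIQUE
);
CREATE TABLE authors (
    author_id INTEGER PRIMARY KEY,
    author    TEXT NOT NULL,
    country   TEXT
);
CREATE TABLE books (
    book_id    INTEGER PRIMARY KEY,
    categ_id   INTEGER NOT NULL REFERENCES categories(categ_id),
    author_id  INTEGER NOT NULL REFERENCES authors(author_id),
    book_name  TEXT NOT NULL,
    book_price INTEGER
    -- the other ~50 book-specific columns would go here
);
CREATE INDEX books_categ_id ON books (categ_id);
CREATE INDEX books_author_id ON books (author_id);
""")
conn.executemany("INSERT INTO categories VALUES (?, ?)",
                 [(1, "adventure"), (2, "spiritual"), (3, "education")])
conn.executemany("INSERT INTO authors VALUES (?, ?, ?)", [
    (1, "tom", "USA"), (2, "rom", "Germany"), (3, "som", "India"), (4, "kom", "Italy")])
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, "skydiving", 300), (2, 2, 2, "what you are", 500),
    (3, 1, 3, "woo woo", 700), (4, 3, 4, "boring", 900)])

# Filter by author name, pulling category and country through the FKs.
tom_books = conn.execute("""
    SELECT b.book_name, c.category_name, a.country
    FROM authors a
    JOIN books b ON b.author_id = a.author_id
    JOIN categories c ON c.categ_id = b.categ_id
    WHERE a.author = ?
    ORDER BY b.book_id""", ("tom",)).fetchall()

# Filter by category name.
adventure = [n for (n,) in conn.execute("""
    SELECT b.book_name
    FROM categories c
    JOIN books b ON b.categ_id = c.categ_id
    WHERE c.category_name = ?
    ORDER BY b.book_id""", ("adventure",))]
```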

Designing database for Multilanguage dictionary

I want to make a database for an English to Myanmar multi-ethnic dictionary.
Currently, I have data for four English-to-Myanmar ethnic-language dictionaries (English to Myanmar/Burma languages).
Example:
English#WordType#Translation#Language#Eg_Sentences#Synonyms
Love #(n)# translation # Language 1 #Eg_Sentences#Synonyms
Love #(v)# translation # Language 2 #Eg_Sentences#Synonyms
Love #(v)# translation # Language 3 #Eg_Sentences#Synonyms
Love #(n)# translation # Language 4 #Eg_Sentences#Synonyms
"WordType" can be "verb", "noun" , "adverb", etc...
Here are some example data entries:
http://i.imgur.com/hbE60Vm.png
I am not good at database design. I am thinking about creating 6 tables:
English_Word
PartOfSpeech
Translation
Language
Synonyms
Example_Sentences
In the near future, I want to add more Myanmar/Burma ethnic languages.
One English word can have many word types (noun, verb, adverb, ...) and can be translated (just strings) into many languages.
But what are the relationships between them? Here is my first database schema.
Could you give me feedback on my draft database table design?
Many thanks in advance.
regards,

id should be an auto-increment field, and we can treat all words (in all languages) as words:
WORD (ID,NAME,LANGUAGE_ID,TYPE_ID) //TYPE_ID AND LANGUAGE_ID ARE FOREIGN KEYS
TYPE (ID,NAME) // PART OF SPEECH
LANGUAGE (ID,NAME)
SENTENCE (ID,EXAMPLE)
SYNONYMS (ID,WORD_ID,SYNONYM_ID) //WORD_ID AND SYNONYM_ID ARE FOREIGN KEYS TO THE WORD TABLE (A SYNONYM IS ALSO A WORD)
DICTIONARY (WORD_ID,TRANSLATION_ID,SENTENCE_ID) // A TRANSLATION IS ALSO A WORD
LANGUAGE
id name
1 english
2 tamil
3 gggggg
TYPE
ID NAME
1 NOUN
2 MMMMM
WORD
ID NAME LANGUAGE_ID TYPE_ID
1 LOVE 1 1
2 $#%#$% 2 1
3 PASSION 1 1
4 HHJHJHJ 1 2
SENTENCE
ID EXAMPLE
1 HAFKAKJDHAKJFHAHLFALFLASLDAL
SYNONYMS
ID WORD_ID SYNONYM_ID
1 1 3 //LOVE => SYNONYM PASSION
2 1 4
DICTIONARY
WORD_ID TRANSLATION_ID SENTENCE_ID
1 2 1 //LOVE $#%#$% HAFKAKJDHAKJFHAHLFALFLASLDAL
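The answer's schema can be sketched as follows, with sqlite3 standing in for MySQL, table spellings normalized to sentence and synonyms, and the answer's placeholder rows reused as-is. The two helper functions are hypothetical. They show that a translation lookup and a synonym lookup are both plain joins back to the word table, because translations and synonyms are themselves words.

```python
import sqlite3

# SQLite stands in for MySQL; INTEGER PRIMARY KEY auto-increments in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE language (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE type (id INTEGER PRIMARY KEY, name TEXT NOT NULL);  -- part of speech
CREATE TABLE word (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    language_id INTEGER NOT NULL REFERENCES language(id),
    type_id     INTEGER NOT NULL REFERENCES type(id)
);
CREATE TABLE sentence (id INTEGER PRIMARY KEY, example TEXT NOT NULL);
CREATE TABLE synonyms (
    id         INTEGER PRIMARY KEY,
    word_id    INTEGER NOT NULL REFERENCES word(id),
    synonym_id INTEGER NOT NULL REFERENCES word(id)
);
CREATE TABLE dictionary (
    word_id        INTEGER NOT NULL REFERENCES word(id),
    translation_id INTEGER NOT NULL REFERENCES word(id),
    sentence_id    INTEGER REFERENCES sentence(id)
);
INSERT INTO language VALUES (1, 'english'), (2, 'tamil'), (3, 'gggggg');
INSERT INTO type VALUES (1, 'NOUN'), (2, 'MMMMM');
INSERT INTO word VALUES
    (1, 'LOVE', 1, 1), (2, '$#%#$%', 2, 1), (3, 'PASSION', 1, 1), (4, 'HHJHJHJ', 1, 2);
INSERT INTO sentence VALUES (1, 'HAFKAKJDHAKJFHAHLFALFLASLDAL');
INSERT INTO synonyms VALUES (1, 1, 3), (2, 1, 4);
INSERT INTO dictionary VALUES (1, 2, 1);
""")


def lookup(conn, english_word, target_language):
    # Translation, its language, its part of speech and an example sentence.
    return conn.execute("""
        SELECT t.name, l.name, ty.name, s.example
        FROM word w
        JOIN dictionary d ON d.word_id = w.id
        JOIN word t ON t.id = d.translation_id
        JOIN language l ON l.id = t.language_id
        JOIN type ty ON ty.id = t.type_id
        LEFT JOIN sentence s ON s.id = d.sentence_id
        WHERE w.name = ? AND l.name = ?""",
        (english_word, target_language)).fetchall()


def synonyms_of(conn, word_name):
    return [r[0] for r in conn.execute("""
        SELECT s.name
        FROM word w
        JOIN synonyms x ON x.word_id = w.id
        JOIN word s ON s.id = x.synonym_id
        WHERE w.name = ?
        ORDER BY s.name""", (word_name,))]
```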

When is it better to flatten out data using comma separated values to improve search query performance?

My question is about SEARCH query performance.
I've flattened out data into a read-only Person table (MySQL) that exists purely for search. The table has about 20 columns of data (mostly limited text values, dates and booleans and a few columns containing unlimited text).
Person
=============================================================
id First Last DOB etc (20+ columns)...
1 John Doe 05/02/1969
2 Sara Jones 04/02/1982
3 Dave Moore 10/11/1984
Two other tables support the relationship between Person and Activity.
Activity
===================================
id activity
1 hiking
2 skiing
3 snowboarding
4 bird watching
5 etc...
PersonActivity
===================================
id PersonId ActivityId
1 2 1
2 2 3
3 2 10
4 2 16
5 2 34
6 2 37
7 2 38
8 etc…
Search considerations:
Person table has potentially 200-300k+ rows
Each person potentially has 50+ activities
Search may include Activity filter (e.g., select persons with one and/or more activities)
Returned results are displayed with person details and activities as bulleted list
If the Person table is used only for search, I'm wondering if I should add the activities as comma separated values to the Person table instead of joining to the Activity and PersonActivity tables:
Person
===========================================================================
id First Last DOB Activity
2 Sara Jones 04/02/1982 hiking, snowboarding, golf, etc.
Given the search considerations above, would this help or hurt search performance?
Thanks for the input.

Horrible idea. You will lose the ability to use indexes in querying. Do not under any circumstances store data in a comma-delimited list if you ever want to search on that column. Relational databases are designed to perform well with tables joined together. Your database is relatively small and should have no performance issues at all if you index properly.
You may still want to display the results in a comma-delimited fashion. MySQL has a function called GROUP_CONCAT for that.
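Here is a sketch of the search with a join plus GROUP_CONCAT, using sqlite3 in place of MySQL. The composite primary key on PersonActivity (instead of a separate id column), the reverse index, the first_name/last_name column names, and the "must have all requested activities" semantics are all assumptions. The subquery filters through indexed exact matches on the activity name. The outer query then lists each matching person's full set of activities as one comma-separated string.

```python
import sqlite3

# SQLite stands in for MySQL; the data is a trimmed version of the question's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE Activity (id INTEGER PRIMARY KEY, activity TEXT NOT NULL UNIQUE);
CREATE TABLE PersonActivity (
    PersonId   INTEGER NOT NULL REFERENCES Person(id),
    ActivityId INTEGER NOT NULL REFERENCES Activity(id),
    PRIMARY KEY (PersonId, ActivityId)
);
CREATE INDEX activity_person ON PersonActivity (ActivityId, PersonId);
INSERT INTO Person VALUES (1, 'John', 'Doe'), (2, 'Sara', 'Jones'), (3, 'Dave', 'Moore');
INSERT INTO Activity VALUES
    (1, 'hiking'), (2, 'skiing'), (3, 'snowboarding'), (4, 'bird watching');
INSERT INTO PersonActivity VALUES (1, 1), (1, 2), (2, 1), (2, 3), (2, 4), (3, 3);
""")


def search_by_activities(conn, wanted):
    """Persons having ALL of `wanted` (assumed distinct), with every activity listed."""
    marks = ",".join("?" * len(wanted))  # only "?" marks are interpolated
    return conn.execute(f"""
        SELECT p.id, p.first_name, p.last_name,
               group_concat(a.activity, ', ') AS activities
        FROM Person p
        JOIN PersonActivity pa ON pa.PersonId = p.id
        JOIN Activity a ON a.id = pa.ActivityId
        WHERE p.id IN (
            SELECT pa2.PersonId
            FROM PersonActivity pa2
            JOIN Activity a2 ON a2.id = pa2.ActivityId
            WHERE a2.activity IN ({marks})
            GROUP BY pa2.PersonId
            HAVING COUNT(*) = ?)
        GROUP BY p.id, p.first_name, p.last_name
        ORDER BY p.id""", (*wanted, len(wanted))).fetchall()
```

SQLite does not guarantee the order of items inside group_concat. In MySQL you could write GROUP_CONCAT(a.activity ORDER BY a.activity SEPARATOR ', '). Note that MySQL truncates the result at group_concat_max_len, which is 1024 bytes by default and may matter with 50+ activities per person.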

saving tree data in database (family tree)

I am trying to store a family tree.
Here is the platform I am using: Zend Framework, MySQL, Ajax.
I searched Stack Overflow and came across this post, which is very helpful for handling the data in terms of objects:
"Family Tree" Data Structure
I'll Illustrate my use case in brief.
A user can create family members or friends based on a few relations defined in the database. I have a model for relations too. Users can create family members such as a divorced spouse, or friends. At most, we assume the tree goes as deep as the grandchildren's kids, but it can also expand in width: brothers/sisters and their families.
I am looking for an efficient database design with low query time. If I use the data structures described in the above post, where should I keep them, given that they necessarily have to be a model?
For representation I am planning to use Visualization: Organizational Chart from
http://code.google.com/apis/chart/interactive/docs/gallery/orgchart.html#Example
I'll summarize what I need
Database design
Placing of controllers (ajax) & models
The people that the user will create they will not be any other users. just some another data
Yeah, that's it! I'll post a complete solution in this thread when I complete the project, with the help of your expertise, of course.
Thanks in advance
EDIT: I'll add more detail about my situation.
I have a user table, a relation table, and lastly a family (family tree) table.
The Family table must have a structure similar to the following:
ID userid relation id Name
1 34 3 // for son ABC
2 34 4 // for Wife XYZ
3 34 3 // for Mom PQR
4 34 3 // for DAd THE
5 34 3 // for Daughter GHI
6 34 3 // for Brother KLM
The drawback of this approach is generating relations to other nodes such as a daughter-in-law, or a wife's brother and his family.
Ideally, for a user we can add parents, siblings and children, and any other relation must be derived from those family-member relations; e.g. a brother-in-law is derived as a sister's husband or a wife's brother.
This is what I can think of for now. I just need implementation guidelines.
I hope this helps you provide a better solution.

I guess that from the database point of view it would be best to implement it like this:
id | name | parent_male | parent_female
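This is a sketch of the parent_male/parent_female layout, using sqlite3 in place of MySQL with a made-up sample family; the table name person and the helper functions are hypothetical. The sketch answers three questions:
- children: one lookup on the two indexed parent columns;
- grandparents: two self-joins;
- ancestors at any depth: a recursive CTE (MySQL supports WITH RECURSIVE from 8.0).

Derived relations such as brother-in-law would also need a spouse/partner relation, which this sketch does not model.

```python
import sqlite3

# SQLite stands in for MySQL; names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    parent_male   INTEGER REFERENCES person(id),
    parent_female INTEGER REFERENCES person(id)
);
CREATE INDEX person_parent_male ON person (parent_male);
CREATE INDEX person_parent_female ON person (parent_female);
-- Sample family: Joe + Mary -> Jack, Jack + Eve -> Mark.
INSERT INTO person VALUES
    (1, 'Joe',  NULL, NULL),
    (2, 'Mary', NULL, NULL),
    (3, 'Jack', 1, 2),
    (4, 'Eve',  NULL, NULL),
    (5, 'Mark', 3, 4);
""")


def children(conn, person_id):
    return [n for (n,) in conn.execute("""
        SELECT name FROM person
        WHERE parent_male = ? OR parent_female = ?
        ORDER BY name""", (person_id, person_id))]


def grandparents(conn, person_id):
    # Two self-joins: person -> parents -> their parents.
    return [n for (n,) in conn.execute("""
        SELECT gp.name
        FROM person c
        JOIN person p  ON p.id  IN (c.parent_male, c.parent_female)
        JOIN person gp ON gp.id IN (p.parent_male, p.parent_female)
        WHERE c.id = ?
        ORDER BY gp.name""", (person_id,))]


def ancestors(conn, person_id):
    # Walk up any number of generations with a recursive CTE.
    return [n for (n,) in conn.execute("""
        WITH RECURSIVE anc(id) AS (
            SELECT ?
            UNION
            SELECT p.id
            FROM anc
            JOIN person c ON c.id = anc.id
            JOIN person p ON p.id IN (c.parent_male, c.parent_female)
        )
        SELECT name FROM person
        WHERE id IN (SELECT id FROM anc) AND id <> ?
        ORDER BY name""", (person_id, person_id))]
```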
Another option would be string prefixing:
id | name | prefix
1 | Joe | 0001
2 | Jack | 00010001 // i.e. Joe's son
3 | Mary | 0001 // i.e. Jack's mother
4 | Eve | 0002 // new family tree
5 | Adam | 00020001 // i.e. Eve's son
6 | Mark | 000200010001 // i.e. Adam's son
Other (more effective) algorithms such as MPTT assume that the data is a tree, which is not the case here (it contains cycles).
To show that it would work, here is how to select Mark's grandparents:
-- Mark
SELECT prefix FROM family_tree WHERE id = 6;
-- take a substring: trim N 4-character groups from the end, where N is the number of generations up (2 for grandparents) ==> '0002'
-- grandparents
SELECT * FROM family_tree WHERE prefix = '0002';
-- same for the other side of the family
-- cousins on one side of the family
SELECT * FROM family_tree WHERE prefix LIKE '0002%' AND LENGTH(prefix) = 12;
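The steps above can be sketched in Python, with sqlite3 standing in for MySQL. The trimming is done in application code via a hypothetical ancestor_prefix helper, assuming fixed 4-character groups as in the answer. Ann and Lily are hypothetical rows added so that Mark has a cousin. The cousin query also excludes rows under Mark's own parent prefix, so siblings are not counted as cousins.

```python
import sqlite3

GROUP = 4  # characters per generation in the prefix, as in the answer


def ancestor_prefix(prefix, generations):
    """Trim `generations` trailing groups: 1 gives the parent, 2 the grandparent."""
    cut = len(prefix) - generations * GROUP
    if cut < GROUP:
        raise ValueError("prefix is too short for that many generations")
    return prefix[:cut]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE family_tree (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    prefix TEXT NOT NULL
);
CREATE INDEX family_tree_prefix ON family_tree (prefix);
INSERT INTO family_tree VALUES
    (4, 'Eve',  '0002'),
    (5, 'Adam', '00020001'),
    (6, 'Mark', '000200010001'),
    (7, 'Ann',  '00020002'),      -- hypothetical: Eve's daughter
    (8, 'Lily', '000200020001');  -- hypothetical: Ann's daughter
""")


def prefix_of(conn, person_id):
    (prefix,) = conn.execute(
        "SELECT prefix FROM family_tree WHERE id = ?", (person_id,)).fetchone()
    return prefix


def grandparents(conn, person_id):
    gp = ancestor_prefix(prefix_of(conn, person_id), 2)
    return [n for (n,) in conn.execute(
        "SELECT name FROM family_tree WHERE prefix = ? ORDER BY name", (gp,))]


def cousins(conn, person_id):
    """Same grandparent and generation, but a different parent."""
    prefix = prefix_of(conn, person_id)
    gp = ancestor_prefix(prefix, 2)
    parent = ancestor_prefix(prefix, 1)
    return [n for (n,) in conn.execute("""
        SELECT name FROM family_tree
        WHERE prefix LIKE ? AND LENGTH(prefix) = ? AND prefix NOT LIKE ?
        ORDER BY name""", (gp + "%", len(prefix), parent + "%"))]
```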
