Best way to implement many-to-many relationship in MySQL - mysql

I have 2 tables, users(~1000 users) and country(~50 countries). A user can support many countries so I am planning to create a mapping table, user_country. However, since I have 1000 users and 50 countries, I will have a maximum of 50000 entries for this table. Is this the best way to implement this or is there a more appropriate method for this?
If this is the best way, how can I add a user supporting many countries to this table using only one SQL statement?
For ex:
INSERT INTO user_country(userid, countrycode)
VALUES ('user001','US'),
('user001','PH'),
('user001','KR'),
('user001','JP')
The above SQL statement will be too long if a user supports all 50 countries, and I would have to do this for 1000 users. Does anyone have ideas on the most efficient way to implement this?

From the point of view of database design, a junction table like your user_country is the only sensible way to go. 50,000 records are a breeze for MySQL, and storing them with the appropriate indexes will keep every future use of the data open.
As far as I can see, this is unrelated to the problem of many large SQL insert statements. No matter how you represent the data in the database, you will have to write statements containing, for each user, a list of countries.
This is a one-time action, right? So it doesn't need to be a masterpiece of software engineering. What I sometimes do is load the raw data into Excel, line by line, then write a formula that "calculates" the appropriate SQL statement for the first line and copy that formula down for all lines. Then throw all those statements at the database. Even if there are tens of thousands of them, it's not much effort.
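The same statement-generation idea can be scripted instead of done in Excel. A minimal sketch in Python, assuming the data is already available as a user-to-countries mapping (the table and column names follow the question; in real code you would normally use parameterized queries rather than string building, which is shown here only to mirror the formula approach):

```python
# Sketch: generate the multi-row INSERT for all users at once,
# instead of assembling it by hand in Excel.
user_countries = {
    "user001": ["US", "PH", "KR", "JP"],
    "user002": ["US", "FR"],
}

# One ('userid','countrycode') tuple per supported country.
rows = [
    f"('{user}','{code}')"
    for user, codes in user_countries.items()
    for code in codes
]
sql = (
    "INSERT INTO user_country (userid, countrycode)\nVALUES "
    + ",\n".join(rows)
    + ";"
)
print(sql)
```

The resulting statement inserts every user/country pair in a single round trip; for very large batches you could chunk `rows` into groups of a few thousand.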

Personally I'd do the insert based on a select:
INSERT INTO user_country (userid, countrycode)
SELECT 'user001', countrycode FROM countries WHERE countrycode IN ('US', 'PH', 'KR', 'JP');
You will need to adapt this to your own table and column names.
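As a runnable sketch of the INSERT ... SELECT approach, here it is against an in-memory SQLite database (the thread targets MySQL, but the statement is the same; the countries table is assumed to hold one row per country code):

```python
import sqlite3

# Build the two tables from the question in memory.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE countries (countrycode TEXT PRIMARY KEY);
    CREATE TABLE user_country (userid TEXT, countrycode TEXT,
                               PRIMARY KEY (userid, countrycode));
    INSERT INTO countries VALUES ('US'), ('PH'), ('KR'), ('JP'), ('FR');
""")

# One statement inserts all of user001's supported countries.
con.execute("""
    INSERT INTO user_country (userid, countrycode)
    SELECT 'user001', countrycode FROM countries
    WHERE countrycode IN ('US', 'PH', 'KR', 'JP')
""")
n = con.execute("SELECT COUNT(*) FROM user_country").fetchone()[0]
print(n)  # 4
```

A side benefit of this form is that a typo like 'XX' in the IN list simply inserts nothing for that code, since it never matches a row in countries.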
The alternative of storing the list of countries in a single column, say usercountries varchar(255) holding values like US,FR,KR, would work as well, but you would largely lose the ability to select users based on the countries they support. Strictly speaking you don't lose it, but
SELECT * FROM users WHERE usercountries LIKE '%KR%';
is a poor query in terms of index usage. With only 1000 users, though, a table scan will be mighty quick as well.

Related

Why are arrays not saveable in SQL

I know that SQL can't save arrays (correct me if I'm wrong). Why?
I know this is a stupid question, but arrays are just structured data. Why can't SQL save that?
Can I rewrite my MySQL database or download an add-on for SQL so I can save arrays?
Thanks in advance
Relational database management systems (RDBMS), such as MySQL, SQL Server, Oracle and PostgreSQL usually store data in tables. This is a very good way to store related data.
Let's say there are three entities: customers, orders, and products, and the orders contain multiple products. Four tables hence:
customers(customer_no, name)
products(product_no, name, price)
orders(order_no, customer_no, date)
order_details(order_no, product_no, amount)
We would provide indexes (i.e. search trees) to easily find orders of a customer or products in an order. Now let's say, we want to know how many orders have been made for product 123:
select count(distinct order_no)
from order_details
where product_no = 123;
The DBMS will quickly find the order_details records for the product, because looking up an index is like searching by last name in a telephone book (binary search). Then it's mere counting. Only a few records get read, and the whole query is really fast.
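A minimal sketch of this four-table design, run against in-memory SQLite (the schema and the sample data are illustrative; the count query is the one from the answer):

```python
import sqlite3

# The four tables from the answer, plus an index on the junction table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (customer_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products  (product_no  INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE orders    (order_no INTEGER PRIMARY KEY,
                            customer_no INTEGER REFERENCES customers,
                            date TEXT);
    CREATE TABLE order_details (order_no   INTEGER REFERENCES orders,
                                product_no INTEGER REFERENCES products,
                                amount INTEGER);
    CREATE INDEX idx_details_product ON order_details (product_no);

    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO products  VALUES (123, 'Widget', 9.99), (456, 'Gadget', 19.99);
    INSERT INTO orders    VALUES (10, 1, '2024-01-01'), (11, 2, '2024-01-02');
    INSERT INTO order_details VALUES (10, 123, 2), (10, 456, 1), (11, 123, 5);
""")

# How many orders contain product 123? The index narrows the scan
# to the matching order_details rows.
count = con.execute("""
    SELECT COUNT(DISTINCT order_no) FROM order_details WHERE product_no = 123
""").fetchone()[0]
print(count)  # 2
```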
Now the same with arrays. Something like:
products(product_no, name, price)
customers
(
customer_no,
name,
array of orders
(
order_no,
date,
array of products
(
product_no,
amount
)
)
)
Well, the order details are now hidden inside an order element which itself is inside a customer object. To get the number of orders for product 123, the only approach seems to be to read all customer records, loop through all orders and see whether they contain the product. This can take awfully long. Moreover without foreign key constraints for the relations between the entities, the arrays may contain product numbers that don't even exist.
Well, there may be ways to kind of index array data and there may be ways to guarantee data consistency for them, but the relational approach with tables has proven to solve these things extremely well. So we would avoid arrays and rather build our relations with tables instead. This is what a relational database is made for.
(Having said this, arrays may come in handy every now and then, e.g. in a recursive query where you want to remember which records have already been visited, but these occasions are rare.)
To answer my own question: first, thank you for the comments!
Back to the question: ordinary SQL can't save arrays, and doesn't want to, because of normalization issues.
You can store array-like data another way: a SQL table is itself like an array. Link a new table to act as the array, creating it manually or, if the array can change, from code. There is usually no need for arrays in SQL.
If you have to, or want to, you can use NoSQL, PostgreSQL's array types, or store the data as JSON or XML (e.g. in Oracle).

SQL only get rows that matches full number split by a comma

I'm working on something that shows shops under a specific category. However, I have an issue: I store a shop's categories in one record as a comma-separated list of category ids, like "1,5,12". The problem is that if I want to show shops with category 2, the query "mistakes" the 12 for a match on 2. This is the SQL right now:
SELECT * FROM shops WHERE shop_cats LIKE '%".$sqlid."%' LIMIT 8
Is there a way to split the shop_cats record by commas in SQL, so it checks the full number? The only way I can think of is to fetch all the shops and do it in PHP, but I don't like that as it will take too many resources.
This is a really, really bad way to store categories, for many reasons:
You are storing numbers as strings.
You cannot declare proper foreign key relationships.
A (normal) column in a table should have only one value.
SQL has poor string functions.
The resulting queries cannot take advantage of indexes.
The proper way to store this information in a database is using a junction table, with one row per shop and per category.
Sometimes, we are stuck with other people's really bad design decisions. If this is your case, then you can use FIND_IN_SET():
WHERE FIND_IN_SET($sqlid, shop_cats) > 0
But you should really fix the data structure.
If you can, the correct solution is to normalize the table, i.e. have a separate row per category instead of a comma-separated list.
If you can't, this should do the work:
SELECT * FROM shops WHERE CONCAT(',' , shop_cats , ',') LIKE '%,".$sqlid.",%' LIMIT 8
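A runnable sketch of this delimiter-wrapping trick, using in-memory SQLite (SQLite's || plays the role of MySQL's CONCAT(); the sample rows are illustrative). By surrounding both the stored list and the search value with commas, '2' no longer matches the '12' inside '1,5,12':

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE shops (id INTEGER PRIMARY KEY, shop_cats TEXT);
    INSERT INTO shops VALUES (1, '1,5,12'), (2, '2,7'), (3, '12');
""")

cat = 2  # the category we are searching for
rows = con.execute(
    # ',1,5,12,' LIKE '%,2,%' is false, while ',2,7,' matches.
    "SELECT id FROM shops WHERE ',' || shop_cats || ',' LIKE ?",
    (f"%,{cat},%",),
).fetchall()
print(rows)  # only the shop that really has category 2
```

Note the search value is passed as a bound parameter rather than interpolated into the string, which also avoids the SQL-injection risk in the original PHP snippet.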
The shops table does not follow 1NF (first normal form), i.e. every column should contain exactly one value. To fix that, you would create another table, a junction table, which relates the two entities. But to answer your question directly, the SQL query below should do the trick.
SELECT * FROM shops WHERE concat(',',shop_cats,',') LIKE '%,".$sqlid.",%' LIMIT 8

sql query to check many interests are matched

So I am building a swingers site. The users can search other users by their interests. This is only part of a number of parameters used to search a user. The thing is there are like 100 different interests. When searching another user they can select all the interests the user must share. While I can think of ways to do this, I know it is important the search be as efficient as possible.
The backend uses jdbc to connect to a mysql database. Java is the backend programming language.
I have debated using multiple columns for interests; the thing is, the SQL query need not check them all if those columns are not addressed in the JSON object sent to the server with the search criteria. But I worry I may have to make painful modifications to the table at a later point if I add new columns.
Another thing I thought about was having some kind of long byte array, or number (used like a byte array) stored in a single column. I could & this with another number corresponding to the interests the user is searching for but I read somewhere this is actually quite inefficient despite it making good sense to my mind :/
And all of this has to be part of one big sql query with multiple tables joined into it.
One of the issues with using multiple columns would be the computing power used to run statement.setBoolean on what could be 40 columns.
I thought about generating an xml string in the client then processing that in the sql query.
Any suggestions?
I think the correct term is a bitmask. I could maybe have one table that maps a user's id to a bitmask for querying users' interests, and another with one entry per interest per user id, for looking up which user has which interests efficiently if I later need that?
Basically, it would be great to have a separate table with all the interests, with two columns: id and interest.
Then have a table that links users to interests, user_interests, with the columns id, user_id, interest_id. Some knowledge about many-to-many relations helps a lot here.
Hope it helps!

When dealing with databases, is adding a separate table, when we could use a simple hash, a good thing?

For example, here's the problem I faced... I have three tables. Products, Districtrates, Deliverycharges. In my app, a product's delivery will be calculated through a pre-defined rate defined in the Districtrates table. If we want, we can also add a custom rate overriding the pre-defined rate. Each product can have all 25 districts or only some. So here's my solution :
Create three tables as I mentioned above. Districtrates table will only have 25 records for all the 25 districts in my country. For each product, I will add 25 records to the Deliverycharges table with the productID, deliveryrateID and a custom rate value if available. Some products might have less than 25 districts (Only the ones available for that product).
I could even store this as a simple hash in one cell of the products table, like this: {district1: nil, district2: 234, district4: 543} (Ruby syntax). Here, if a value is nil, we can take the default value from the district rates table. In this approach too, the hash would hold all 25 districts. The table-based method above is easier to work with; the only problem is that it adds nearly 25 records per product.
So my question is: is this a good thing? This is only one scenario; there are more where we could use a simple array or hash in a cell rather than creating a table. Creating a table is easy to maintain, but is it the right way?
One of the main points of using a relational database is the ability to query (and update) the data in it using SQL.
That only works if you put the data in a form that the database actually understands. Traditionally, this means defining a table schema.
There are now extensions to let the database work with "semi-structured" data (such as XML/JSON/JSONB), but you should only need to go there when the data really does not fit into the relational model, otherwise you are giving up on a lot of features/performance.
If you put a Ruby string into a text column, you will not have any way to use it from SQL. So no proper searching, indexing, or efficient updates of these delivery rates.
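As a sketch of what the table-based design buys you: the per-district default and the optional per-product override combine in one query with COALESCE, and you only need to store rows for the overrides, not 25 rows per product. All names here are illustrative, and SQLite stands in for the actual database:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE districtrates (district_id INTEGER PRIMARY KEY, rate REAL);
    CREATE TABLE deliverycharges (product_id INTEGER, district_id INTEGER,
                                  custom_rate REAL);
    INSERT INTO districtrates VALUES (1, 100.0), (2, 150.0);
    -- Only one override stored: product 7 in district 2.
    INSERT INTO deliverycharges VALUES (7, 2, 234.0);
""")

# Rate for product 7 in district 1: no override row exists,
# so COALESCE falls back to the district default.
rate = con.execute("""
    SELECT COALESCE(dc.custom_rate, dr.rate)
    FROM districtrates dr
    LEFT JOIN deliverycharges dc
           ON dc.district_id = dr.district_id AND dc.product_id = ?
    WHERE dr.district_id = ?
""", (7, 1)).fetchone()[0]
print(rate)
```

This is exactly the "nil means use the default" logic of the hash, but expressed in SQL where the database can index and enforce it.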

Automated normalization of mySQL database - how to do it?

I have a mySQL database filled with one huge table of 80 columns and 10 million rows. The data may have inconsistencies.
I would like to normalize the database in an automated and efficient way.
I could do it using java/c++/..., but I would like to do as much as possible inside the database. I guess that any work outside the database will slow down things very much.
Suggestions on how to do it? What are good resources/tutorials to start with?
I am not looking for any hints on what normalization is (found plenty of this stuff using google)!
You need to study the columns to identify 'like' entities and break them out into separate tables. At best, an automated tool might identify groups of rows with identical values for some of the columns, but a person who understands the data would have to decide whether those truly belong in a separate entity.
Here's a contrived example - suppose your columns were first name, last name, address, city, state, zip. An automated tool might identify rows of people who were members of the same family with the same last name, address, city, state, and zip and incorrectly conclude that those five columns represented an entity. It might then split the tables up:
First Name, ReferenceID
and another table
ID, Last Name, Address, City, State, Zip
See what I mean?
I can't think of any way you can automate it. You would have to create the tables that you want, and then go through and replace each piece of data with manual queries.
e.g.,
INSERT INTO contact
SELECT DISTINCT first_name, last_name, phone
FROM massive_table;
Then you could drop those columns from the massive table and replace them with a contact_id column.
You would have a similar process when pulling out rows that go into a one-to-many table.
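The whole extract-then-link step can be sketched end to end: pull the distinct contacts into their own table, then back-fill a contact_id on the big table by matching on the original columns. All names are illustrative, and SQLite stands in for MySQL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE massive_table (first_name TEXT, last_name TEXT, phone TEXT,
                                order_total REAL, contact_id INTEGER);
    INSERT INTO massive_table (first_name, last_name, phone, order_total)
    VALUES ('Ann', 'Lee', '555-1', 10.0),
           ('Ann', 'Lee', '555-1', 20.0),
           ('Bob', 'Kim', '555-2', 30.0);

    -- Step 1: extract the distinct contacts into their own table.
    CREATE TABLE contact (contact_id INTEGER PRIMARY KEY,
                          first_name TEXT, last_name TEXT, phone TEXT);
    INSERT INTO contact (first_name, last_name, phone)
    SELECT DISTINCT first_name, last_name, phone FROM massive_table;

    -- Step 2: back-fill the foreign key by matching on the old columns.
    UPDATE massive_table
    SET contact_id = (SELECT c.contact_id FROM contact c
                      WHERE c.first_name = massive_table.first_name
                        AND c.last_name  = massive_table.last_name
                        AND c.phone      = massive_table.phone);
""")

contacts = con.execute("SELECT COUNT(*) FROM contact").fetchone()[0]
linked = con.execute(
    "SELECT COUNT(*) FROM massive_table WHERE contact_id IS NOT NULL"
).fetchone()[0]
print(contacts, linked)
```

Once every row is linked, the duplicated name and phone columns can be dropped from the massive table.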
When cleaning up messy data, I like to create user-defined MySQL functions to do typical data-scrubbing tasks; that way you can reuse them later. Approaching it this way also lets you check whether existing UDFs have already been written that you can use (with or without modification), for example at mysqludf.org.