MySQL: Join 1 master table to 1000 sub tables

I have a problem.
I have one master table of URLs, where UrlId is the primary key, starting from 10000. This table holds all the URL addresses I have found. Then I have 1000 tables in a different database (so I cannot use foreign keys). They are all identical to each other, named urldata.100 up to urldata.999. (Here I store metadata, etc.)
Suppose I want to use the LEFT() function in a join to map each master record to its table, for instance:
Id 10000 - joins to table urldata.100
Id 11034 - joins to table urldata.110
So I want to take the first 3 digits of each UrlId from the master table in one massive join.
Can this be achieved?

This is a REALLY REALLY bad design, as it restricts both the scalability and the performance of the database.
If you really want to store metadata for different UrlIds, you only need to create one table. Call it UrlMeta, or something similar.
The structure would be (UrlId, MetaName, MetaValue). This structure can store as much metadata as you want, as (MetaName, MetaValue) pairs.
Once you make this change, your queries become extremely simple, as you only need to join 2 tables instead of 1001!
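A minimal sketch of that schema and the resulting join (the urls table name and the column types are assumptions, not from the question):

CREATE TABLE UrlMeta (
    UrlId     INT          NOT NULL,
    MetaName  VARCHAR(64)  NOT NULL,
    MetaValue VARCHAR(255),
    PRIMARY KEY (UrlId, MetaName)
);

-- One URL and all of its metadata in a single two-table join:
SELECT u.UrlId, m.MetaName, m.MetaValue
FROM   urls AS u
JOIN   UrlMeta AS m ON m.UrlId = u.UrlId
WHERE  u.UrlId = 10000;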

Related

Modelling ownership in MySQL

I have a table Things and I want to add ownership relations to a table Users. I need to be able to quickly query the owners of a thing and the things a user owns. If I know that there will be at most 50 owners, and that the probability distribution of the number of owners will probably look like this, should I rather
add 50 columns to the Things table, like CoOwner1Id, CoOwner2Id, …, CoOwner50Id, or
should I model this with an Ownerships table which has UserId and ThingId columns, or
would it be better to create a table for each thing, for example Thing8321Owners, with a row for each owner, or
perhaps a combination of these?
The second choice is the correct one; you should create an intermediate table between the table Things and the table Owners (which contains the details of each owner).
This table should have thing_id and owner_id together as the primary key.
So finally, you will have 3 tables:
Things (the things' details and data)
Owners (the owners' details and data)
Ownerships (the assignment of each thing_id to an owner_id)
This is because in a relational DB you should not have any redundant data.
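A minimal sketch of that junction table, assuming Things and Owners each have an integer id primary key:

CREATE TABLE Ownerships (
    thing_id INT NOT NULL,
    owner_id INT NOT NULL,
    PRIMARY KEY (thing_id, owner_id),
    FOREIGN KEY (thing_id) REFERENCES Things(id),
    FOREIGN KEY (owner_id) REFERENCES Owners(id)
);

The composite primary key enforces that each (thing, owner) pair is stored only once and doubles as an index for the "owners of a thing" lookup; an extra index on (owner_id, thing_id) would cover the reverse "things a user owns" query.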
You should definitely go with option 2, because what you are trying to model is a many-to-many relationship. (Many owners can relate to a thing; many things can relate to an owner.) This is commonly accomplished using what I call a bridging table (which is exactly what option 2 is). It is a standard technique in a normalized database.
The other two options are going to give you nightmares when you try to query or maintain them.
With option 1 you'll need to join the User table to the Thing table on 50 columns to get all of your results. And what happens when you have a really popular thing that 51 people want to own?
Option 3 is even worse. The only way to easily query the data is to use dynamic SQL or write a new query each time, because you don't know which Thing*Owners table to join on until you know the ID value of the thing you're looking for. Or you're going to need to join the User table to every single Thing*Owners table. Adding a new thing means creating a whole new table. But at least a thing wouldn't have a limit on the number of owners it could possibly have.
Now isn't this:
SELECT Users.Name, Things.Name
FROM Users
INNER JOIN Ownership ON Users.UserId=Ownership.UserId
INNER JOIN Things ON Things.ThingId=Ownership.ThingId
much easier than any of those other scenarios?

What is the right way of building user favourites table (Performance)

I guess the title isn't very descriptive, so I will explain! I have a table called users_favs that stores all the info about which posts a user has liked, which posts he has favourited, and the same for comments. The info there is stored as a serialized array / or JSON, who cares.
Question: what is better? Staying like this, or making 4 tables, one for each of the fields, and storing rows like user_id => post_id instead of the serialized version???
What worries me about the second option is that after some time these tables will be GIGANTIC. Also, I will need to make 4 queries (or use JOINs) to pull all of the info from these tables.
Keeping it in 1 table means that you'll only need 1 table access and 0 joins to get all the data, while with 4 tables you'll need at least 1 table access and n-1 joins when you need n fields of information. Your result set at the end of the query will probably be the same, so the amount of data sent over the network is independent of your table structure.
I presume a scenario where you have data for fav_categories while the other columns are null, and similarly for the columns fav_posts, liked_posts, and liked_comments. So there is a high probability that in each row only three columns will hold data most of the time (id, user_id, and any one of the rest). If my assumptions about the use cases are right, then I would definitely go for four tables.
To add to the above: you can always choose whether you want to make the design read-friendly or write-friendly.
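A minimal sketch of one of those four tables and the lookup it enables (table and column names are assumptions):

CREATE TABLE liked_posts (
    user_id INT NOT NULL,
    post_id INT NOT NULL,
    PRIMARY KEY (user_id, post_id)
);

-- All posts a given user has liked, resolved via the primary key index:
SELECT post_id FROM liked_posts WHERE user_id = 42;

The same shape repeats for fav_posts, fav_categories, and liked_comments; each row can be inserted or deleted individually, which is what makes this layout write-friendly compared to rewriting a serialized blob on every change.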

Adding extra column to reduce complex queries

Let's look at two tables, for example the tables Post and User.
In the Post table, the column user_id is a foreign key.
Table Post
-id
-user_id
-post
Table User
-userID
-username
If we want to get a poster's username, we have to use a join query and get it from the User table.
Wouldn't it be easier to add an extra column to the Post table for storing posters' usernames, in order to simplify SQL queries? In this case, the Post table would have both user_id and username columns (the username column is redundant), but that would eliminate join queries for fetching posters' usernames.
What is the best choice: to store usernames in the Post table or not?
Creating a view is an option: just query that view. However, views in MySQL don't offer as much benefit in terms of performance as they once did.
It might be that just running the join is fine. Depending on the amount of data you will eventually store, you may need to consider other means to improve performance, such as partitioning tables, etc. - but that is a concern for down the line.
Adding redundant data to the DB is not a good choice.
If you really have to simplify your SELECT queries, then create a view for that case. That way the data is in one place and you don't need to write the join every time. But a simple join is actually not that big a deal.
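A minimal sketch of that view, using the tables from the question (the view name is an assumption):

CREATE VIEW post_with_username AS
SELECT p.id, p.user_id, u.username, p.post
FROM Post AS p
JOIN User AS u ON u.userID = p.user_id;

-- The join is now hidden behind the view:
SELECT username, post FROM post_with_username WHERE id = 1;

The username is still stored only once, in the User table, so there is no redundant data to keep in sync.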

Adding data to interrelated tables: easier way?

I am a bit rusty with MySQL and trying to jump in again. So sorry if this is too easy a question.
I basically created a data model that has a table called "Master" with required fields of a name and an IDcode, and then a "Details" table with a foreign key of IDcode.
Now here's where it's getting tricky. I am entering:
INSERT INTO Details (Name, UpdateDate) VALUES (name, updateDate)
I get an error saying IDcode on Details doesn't have a default value; so I add one, and then it complains that Field 'Master_IDcode' doesn't have a default value.
It all makes sense, but I'm wondering if there's an easy way to do what I am trying to do. I want to add data into Details, and if no IDcode exists, I want to add an entry into the Master table. The problem is I have to first add the name to Master, wait for a unique ID to be generated (for IDcode), then figure that out and add it to my query when I enter the detail data. As you can imagine, the queries are probably going to get quite long, since I have many tables.
Is there an easier way, where every time I add something it searches by name to see whether a foreign key exists, and if not, adds an entry to all the tables it is linked to? Is there a standard way people do this? I can't imagine, with all the complex databases out there, that people have not figured out an easier way.
Sorry if this question doesn't make sense. I can add more information if needed.
P.S. This may be a different question, but I have heard of Django for Python and that it helps create queries.. would it help my situation?
Thanks so much in advance :-)
(decided to expand on the comments above and put it into an answer)
I suggest creating a set of staging tables in your database (one for each data set/file).
Then use LOAD DATA INFILE (or insert the rows in batches) into those staging tables.
Make sure you drop indexes before the load, and re-create what you need after the data is loaded.
You can then make a single pass over the staging table to create the missing master records. For example, let's say that one of your staging tables contains a country code that should be used as a masterID. You could add the missing master records by doing something along the lines of:
-- insert the country codes that don't already exist in the master table
insert
into master_table(country_code)
select distinct s.country_code
from staging_table s
left join master_table m on(s.country_code = m.country_code)
where m.country_code is null;
Then you can proceed and insert the rows into the "real" tables, knowing that all detail rows reference a valid master record.
If you need to pull in reference information along with the data (such as translating some code), you can do this with a simple join. Also, if you want to filter rows by some other table, that is now very easy too:
insert
into real_table_x(
     `key`  -- `key` is a reserved word in MySQL, hence the backticks
    ,colA
    ,colB
    ,colC
    ,computed_column_not_present_in_staging_table
    ,understandableCode
)
select x.`key`
      ,x.colA
      ,x.colB
      ,x.colC
      ,(x.colA + x.colB) / x.colC
      ,c.understandableCode
from staging_table_x x
join code_translation c on(x.strange_code = c.strange_code);
This approach is a very efficient one and it scales very nicely. Variations of the above are commonly used in the ETL part of data warehouses to load massive amounts of data.
One caveat with MySQL (prior to 8.0.18) is that it doesn't support hash joins, a join mechanism very well suited to fully joining two tables. MySQL uses nested loops instead, which means that you need to index the join columns very carefully.
InnoDB tables, with their clustering on the primary key, can help make this a bit more efficient.
One last point. When you have the staging data inside the database, it is easy to add some analysis of the data and to put aside "bad" rows in a separate table. You can then inspect the data using SQL instead of wading through csv files in your editor.
I don't think there's a one-step way to do this.
What I do is issue an
INSERT IGNORE INTO master (..) VALUES (..)
which will either create the row if it doesn't exist, or do nothing, and then issue a
SELECT id FROM master WHERE someUniqueAttribute = ..
The other option would be stored procedures/triggers, but they are still pretty new in MySQL and I doubt whether this would help performance.
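A short sketch of that two-step pattern against the Master/Details tables from the question, assuming Master has an auto-increment IDcode and a unique index on Name (the sample value 'example' is made up):

-- Step 1: create the master row only if the name isn't there yet
-- (INSERT IGNORE relies on the UNIQUE index on Name).
INSERT IGNORE INTO Master (Name) VALUES ('example');
-- Step 2: fetch the IDcode, whether it was just created or already existed.
SELECT IDcode FROM Master WHERE Name = 'example';
-- Steps 2 and 3 can also be folded together, so the detail insert
-- picks up the IDcode without a client-side round trip:
INSERT INTO Details (Master_IDcode, Name, UpdateDate)
SELECT IDcode, 'example', NOW()
FROM Master WHERE Name = 'example';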

How to store data in MySQL to get the fastest performance?

I'm wondering which of the following two setups would give me the fastest performance for a user messaging module on my site.
The first one I thought about is a multi-table setup with a connection table and a main table. The connection table holds the connections between accounts and the messaging table.
In this case a query to get some data about the author and the messages he has sent would look like the following:
SELECT m.*, a.username
FROM messages AS m
LEFT JOIN connection_table AS c
ON (c.message_id = m.id)
LEFT JOIN accounts AS a
ON (c.account_id = a.id)
WHERE m.id = '32341'
Inserting into it is a little bit more "complicated".
My other idea, and in my view the better solution to this problem, is to store the data I would keep in the connection table in the same table where I store the mail data. It sounds like I would get lots of duplicated entries, but no, because I have a field of type TEXT which holds user ids like this: *24*32*249*
If I want to query them, I use MySQL's LIKE operator. Deleting is another problem, but for this I have one more field where I store who has deleted the post.
Sadly, I don't know how to join on this.
So what would you recommend? Are there other ways?
Sounds like you are using an n:m relation. If yes, don't put a list of ids in a single column, but create a mapping table containing two columns: the primary key of table1 and the primary key of table2. Then selecting, inserting and deleting will all be easy and still cheap.
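A minimal sketch of such a mapping table for this messaging case, reusing the messages and accounts tables from the question (the mapping table name is an assumption):

CREATE TABLE message_recipients (
    message_id INT NOT NULL,
    account_id INT NOT NULL,
    PRIMARY KEY (message_id, account_id)
);

-- All recipients of one message:
SELECT a.username
FROM message_recipients AS r
JOIN accounts AS a ON a.id = r.account_id
WHERE r.message_id = 32341;

Deleting a message for a single recipient is then a one-row DELETE on message_recipients, instead of rewriting a *24*32*249* style string.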
I wonder how many messages will be sent to multiple recipients? It might just be easier to have it all in one table - MessageID, SentFrom, SentTo, Message - and duplicate the row for multiple recipients. This obviously makes it extremely easy to query.
Definitely avoid storing multiple IDs in one field and using LIKE - that'll be a performance killer. Go with ThiefMaster's suggestion if you want something like that.