Add column to 50M+ records table? Most efficient way? - mysql

I have a table products that has over 50M records. I want to track who uploaded a given product in my system, but simply adding uploaded_by_id to such a huge table isn't the solution I'm looking for. What, other than a join table, can I create to be able to query for products uploaded by a given id in a given time range?
Product.where(uploaded_by_id: #user.id, created_at: time_range) is what I need to do, but I need a more efficient way.

You might want to look into tools like
SoundCloud's Large Hadron Migrator or
Percona's pt-online-schema-change.
Both tools allow altering tables without locking them.

Instead of touching the main table, add another table (Vertical Partitioning). The new table would have the same PRIMARY KEY, but not AUTO_INCREMENT. The new column(s) would go into this table.
Create rows in the new table only when the new column(s) have a value.
When you don't need the new column(s), continue to read only from the old table.
When you also need the new column(s), use LEFT JOIN.
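A minimal sketch of that layout, assuming products has an INT primary key id (the side-table name product_uploads and its columns are illustrative):

CREATE TABLE product_uploads (
    product_id INT NOT NULL PRIMARY KEY,   -- same PK as products, but no AUTO_INCREMENT
    uploaded_by_id INT NOT NULL,
    created_at DATETIME NOT NULL,
    INDEX (uploaded_by_id, created_at),    -- supports the user + time-range query
    FOREIGN KEY (product_id) REFERENCES products(id)
);

-- Only when the new column is needed:
SELECT p.*
FROM product_uploads pu
JOIN products p ON p.id = pu.product_id
WHERE pu.uploaded_by_id = 42
  AND pu.created_at BETWEEN '2024-01-01' AND '2024-06-30';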

Related

Add new columns without using Alter

Is it possible to add new columns to an existing table without using an ALTER statement?
Other people are answering unequivocally "no, it is not possible." This is the answer to your literal question. But I'm wondering why you ask the question.
One of the biggest pain points of MySQL is that using ALTER TABLE locks the table while you're making a change like adding a column, and the more data in your table, the longer this lasts while it restructures the table. I'm guessing this is the issue you have, and you're trying to get an alternative that doesn't block access to the table while you're adding a new column.
(In the future, it would help folks give you the best answers if you explain more about what you're trying to do.)
The answer to this question is yes, there is a solution: pt-online-schema-change is a free tool that accomplishes this.
You use it just like you would use ALTER TABLE, but you use it at the command-line instead of in an SQL query.
pt-online-schema-change --alter "ADD COLUMN c1 INT" D=sakila,t=actor
In this example, the database name is sakila and the table name is actor. The script does a lot of work behind the scenes:
1. Create a table like the original table, but empty of rows.
2. ALTER TABLE to add the column or whatever other alteration you told it. You can do anything you would normally do with ALTER TABLE. In fact, it's doing ALTER TABLE for you, against the empty copy table.
3. Create triggers to capture any changes made to the original table while the bulk of the data is being copied.
4. Copy rows from the original table to the new table in the background.
5. Swap the names of the new table (with the extra column) and the original table, once all data has been copied.
6. Drop the original table.
This has a few caveats, like the original table must have a primary key, and must not have existing triggers.
It tends to take longer than doing a traditional ALTER TABLE, but since it's not blocking access to the original table, it's still more convenient.
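Note that the tool makes no changes unless you explicitly tell it to; a typical workflow (using the same sakila.actor example as above) is to simulate first, then execute:
pt-online-schema-change --alter "ADD COLUMN c1 INT" --dry-run D=sakila,t=actor
pt-online-schema-change --alter "ADD COLUMN c1 INT" --execute D=sakila,t=actor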
Does this help?
Is it possible to add new columns to an existing table without using the alter statement?
No.
Is it possible to add new columns to an existing table without using an ALTER statement?
I don't think it's impossible.
However, I'm not sure what you want to do.
Let's say you have a table:
select * from Store
and you just want to export the data, or perhaps do something with it like a selection, but you don't want to STORE the data in your database.
You can just fill in a value and give it a name:
select
    'Test' as name,
    Store.*
from Store
This will populate the new column with the value you entered.

Insert a new column in SQL

I have a DB table consisting of 4 fields. My application will retrieve data from that table. I have one primary key (the id). I also want, depending on the id, to provide other data that will be organized in a new table. What is better: create a new table and search it again, or, given that I have already found the row by its id, create a new element that will be a table? For example, can I create a new element named info and make it something like an array, as I want 11 rows and 2 columns for the info? My SQL code so far is this:
CREATE TABLE people (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    sex BOOL NOT NULL DEFAULT '1',
    birthyear INT NOT NULL
);
What changes do I need to make? This table is already created.
If each row in the existing table now also needs associating with an 11x2 set of data, you're best off creating another table.
Don't try to stuff 22 items of data into a single field, it's a really bad idea.
If, however, it's always the same (22 items), you could just add 22 fields. It depends on how that data is going to be used, searched, joined on, etc.
Exactly how to do that depends on your RDBMS and your interface to it. It may be easier to create a whole new table and copy the old data across. Or the environment you have may allow you to add the columns and do the legwork for you.
I think it would be best to create a separate new table to contain the additional data, primarily because you will have more than one record per ID of the original table.
The records in the new table would have a foreign key peopleID field linking them to the people table.
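A minimal sketch of that table, assuming an illustrative name people_info and a label/value pair per row (giving the 11 rows of 2 columns per person):

CREATE TABLE people_info (
    info_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    peopleID INT NOT NULL,                 -- foreign key to people.id
    label VARCHAR(100) NOT NULL,
    info_value VARCHAR(100) NOT NULL,
    FOREIGN KEY (peopleID) REFERENCES people(id)
);

-- All info rows for one person:
SELECT label, info_value FROM people_info WHERE peopleID = 1;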
I believe you are hinting at embedding tables, which isn't really what MySQL is meant to do. Instead, you should do the following: create a table like that in your example. Then create a new table that will have a column for an ID (the same as in the people table) and the other various columns. You can then do an inner join to join the two together. Additionally, if you want to reference different tables for different rows, you may want to add a column for what 'type' it is.
Alternatively, you could use a 'No-SQL' solution like Mongo. This lets you add things dynamically. But I wouldn't suggest doing this until you have a decent grasp of relational databases.

MySQL find record through many tables

In my case I have many tables in my database.
My goal is to create a search engine where the user can build any logical search he wants.
So I need to find a solution to generate all the joins based on the user's search criteria.
In some cases tables have (1:n) links, in other cases (n:1).
One solution is to map out every link and create every join, but I think that's a poor solution.
So if you have an idea, I'll be very happy to read it.
Thanks a lot.
You can manage it like this; I don't know whether it is good or bad, but it is a solution.
Create a new table containing all the searchable fields from the various tables; a reference to the source record's id should also be stored in this table.
Insert a new record into this table whenever a new record is inserted into those tables.
Search in this single table containing data from all the other tables.
OR
consider using a VIEW.
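A minimal sketch of the VIEW idea, assuming two illustrative tables products and articles that each expose some searchable text:

CREATE VIEW searchable AS
    SELECT 'product' AS source, id AS record_id, name AS search_text FROM products
    UNION ALL
    SELECT 'article' AS source, id AS record_id, title AS search_text FROM articles;

-- One query now searches across both tables:
SELECT * FROM searchable WHERE search_text LIKE '%widget%';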

How to normalize live database

I need to normalize a data structure. I have one table with lots of redundant data (42 columns).
A few examples:
files_shit (id, filename String, upload_user, user_name, tags text, ....)
and I want to create 3 tables: files, users and tags.
I have almost 30 000 records.
What is the best way to copy data from files_shit to files, users and tags and create the references? (Between tags and files there will be another table, file_tags.)
First, you cannot convert this table in place. You will have to use new ones. A simple way is to use this table as a staging table: create the new tables, then select from this one and insert into those.
You will have to identify the primary key for each table. Then fill up the tables (you may have to decide which table to fill first for reasons of referential integrity, etc.).
Pseudo-code, e.g.: INSERT INTO files (columns...) SELECT <files columns> FROM files_shit GROUP BY primary_column;
(Note: this means you will use the primary column(s) as the primary key. If you want to use auto-generated integers (optimal), you will have to perform lookups...)
A lot depends on the new schema and relations (which you haven't defined clearly here). Hope this helps.
EDIT - Lookups
You will have an INT id field for each table, e.g. file_id. These will be system-generated (mostly AUTO_INCREMENT). In simple words, this info is not in your current table. So, when you add a file to the files table and it gets a file_id, you will have to 'look up' the id for this file to add to the user table to satisfy your foreign key relationships (based on how they exist).
SIMPLE EXAMPLE -
Try adding additional file_id/tag_id columns to your main table.
Fill the tag table first (basically the tables that don't refer to any other).
Fill the main table's tag_id for each row by joining the tag table (lookup):
UPDATE <mainTable> mT
JOIN tag_table tT ON mT.tag_pk_column = tT.tag_pk_column
SET mT.tag_id = tT.tag_id;
Now INSERT INTO files ... SELECT file_pk_col, tag_id ... GROUP BY file_pk_col.
This is an example lookup for the tag table.
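A minimal end-to-end sketch of that flow against the question's files_shit table (the tags table layout and the tag_id column are illustrative assumptions):

-- Illustrative tag table: auto-generated id plus a unique name.
CREATE TABLE tags (
    tag_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL UNIQUE
);

-- 1. Add a lookup column to the staging table.
ALTER TABLE files_shit ADD COLUMN tag_id INT;

-- 2. Fill the tag table from the distinct values in the staging table.
INSERT INTO tags (name)
    SELECT DISTINCT tags FROM files_shit;

-- 3. Look the generated ids back up into the staging table.
UPDATE files_shit fs
JOIN tags t ON fs.tags = t.name
SET fs.tag_id = t.tag_id;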
The simplest way is to take the database offline, create the new tables, including all the required constraints, and use INSERT INTO ... SELECT column_list FROM old_table to populate the new tables. Some data probably won't satisfy the constraints in the new tables; you'll have to fix that.
It gets more complicated if you can't take the database offline, or if you have to make the changes transparent to application programs. Triggers, rules, and updatable views will help with that.
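For instance, a hedged sketch of the offline route for the users table (the users layout below is illustrative, not the asker's actual schema):

CREATE TABLE users (
    user_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_name VARCHAR(100) NOT NULL UNIQUE
);

-- Populate from the old wide table, de-duplicating as we go.
INSERT INTO users (user_name)
    SELECT DISTINCT user_name FROM files_shit;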

Different database tables joining on single table

So imagine you have multiple tables in your database, each with its own structure and its own PRIMARY KEY.
Now you want to have a Favorites table so that users can add items as favorites. Since there are multiple tables, the first thing that comes to mind is to create one Favorites table per table:
Say you have a table called Posts with PRIMARY KEY (post_id) and you create a Post_Favorites with PRIMARY KEY (user_id, post_id)
This would probably be the simplest solution, but could it be possible to have one Favorites table joining across multiple tables?
I've thought of the following as a possible solution:
Create a new table called Master with primary key (master_id). Add insert triggers on all tables in your database to generate a new master_id and write it along with the row in your table. Also, let's say we record in the Master table where each master_id has been used (in which table).
Now you can have one Favorites table with PRIMARY KEY (user_id, master_id).
You can select from the Favorites table and join with each individual table on the master_id to get the favorites per table. But would it be possible to get all the favorites with one query (maybe not a single query, but a stored procedure)?
Do you think that this is a stupid approach? Since you will perform one query per table, what are you gaining by having a single table?
What are your thoughts on the matter?
One way would be to sub-type all possible tables to a generic super-type (Entity) and then link user preferences to that super-type. For example:
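A minimal SQL sketch of that shape (the table names entities and posts are illustrative):

CREATE TABLE entities (
    entity_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    entity_type VARCHAR(20) NOT NULL       -- 'post', 'photo', ...
);

CREATE TABLE posts (
    entity_id INT NOT NULL PRIMARY KEY,    -- sub-type shares the super-type's key
    title VARCHAR(200) NOT NULL,
    FOREIGN KEY (entity_id) REFERENCES entities(entity_id)
);

CREATE TABLE favorites (
    user_id INT NOT NULL,
    entity_id INT NOT NULL,
    PRIMARY KEY (user_id, entity_id),
    FOREIGN KEY (entity_id) REFERENCES entities(entity_id)
);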
I think you're on the right track, but a table-based inheritance approach would be great here:
Create a table master_ids, with just one column: an int-identity primary key field called master_id.
On your other tables, (users as an example), change the user_id column from being an int-identity primary key to being just an int primary key. Next, make user_id a foreign key to master_ids.master_id.
This largely preserves data integrity. The only place you can trip up is if you have a master_id = 1 with both a user_id = 1 and a post_id = 1. For a given master_id, you should have only one entry across all tables. In this scenario you have no way of knowing whether master_id 1 refers to the user or to the post. A way to make sure this doesn't happen is to add a second column to the master_ids table, a type_id column. Type_id 1 can refer to users, type_id 2 can refer to posts, etc. Then you are pretty much good.
Code "gymnastics" may be a bit necessary for inserts. If you're using a good ORM, it shouldn't be a problem. If not, stored procs for inserts are the way to go. But you're having your cake and eating it too.
I'm not sure I really understand the alternative you propose.
But in general, when given the choice of 1) "more tables" or 2) "a mega-table supported by a bunch of fancy code work", your interests are best served by more tables without the code gymnastics.
A red flag was "Add triggers on all tables in your database": each trigger fire is a performance hit of its own.
The database designers have built in all kinds of technology to optimize tables/indexes, much of it behind the scenes without you knowing it. Just sit back and enjoy the ride.
Try these for inspiration: Database Answers (no affiliation to me).
An alternative to your approach might be to have the favorites table as user_id, object_id, object_type. When inserting into the favorites table, just insert the type of the favorite. However, I don't see a simple query being able to work with your approach or mine. One way to go about it might be to use UNION and get one combined result set, then identify what type of record each row is based on the type. Another thing you can do is turn the UNION query into a MySQL VIEW and simply query that VIEW.
The benefit of using a single table for favorites is simplicity, which some might consider to be against database normalization rules. But on the upside, you don't have to create so many favorites tables, and you can add anything to favorites easily by just coming up with a new object_type identifier.
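A minimal sketch of that idea, assuming illustrative posts and photos tables:

CREATE TABLE favorites (
    user_id INT NOT NULL,
    object_id INT NOT NULL,
    object_type VARCHAR(20) NOT NULL,      -- 'post', 'photo', ...
    PRIMARY KEY (user_id, object_type, object_id)
);

-- The UNION query, wrapped in a VIEW as suggested above:
CREATE VIEW favorite_items AS
    SELECT f.user_id, f.object_type, p.post_id AS object_id, p.title AS name
    FROM favorites f
    JOIN posts p ON f.object_type = 'post' AND f.object_id = p.post_id
    UNION ALL
    SELECT f.user_id, f.object_type, ph.photo_id, ph.caption
    FROM favorites f
    JOIN photos ph ON f.object_type = 'photo' AND f.object_id = ph.photo_id;

-- All of one user's favorites, regardless of type:
SELECT * FROM favorite_items WHERE user_id = 42;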
It sounds like you have an is-a relationship that needs to be modeled: all of the items that can be favourited are a type of "item". You are on the right track, but I wouldn't use triggers.
If I have understood correctly, the right answer is to pull all the common fields into a single table called items (master is a poor name; master of what?). This should include all the common data needed when you fetch a user's favourite items: fields like item_id (primary key), item_type and human_readable_name, and maybe some metadata about when the item was created, modified, etc. Each specific item type would have its own table containing data specific to that type, with an item_id field that has a foreign key relationship to the items table. Then you'd wrap each item type in its own insertion, update and selection SPs (i.e. InsertItemCheese, UpdateItemMonkey, SelectItemCarKeys).
The favourites table would then work as you describe, but you only need to select from the items table. If your app needs the specific data for each item type, it would have to be queried per item (caching is your friend here).
If MySQL supports SPs with multiple result sets, you could write one that outputs all the items as one result set, then a result set per item type, if you need all the specific item data in one go. In most cases I would not expect you to need all the data all the time.
Keep in mind that not EVERY use of a PK column needs a constraint. For example, a logging table: even though it has a copy of the PK column from the table being logged, you can't build a constraint.
What would be the worst possible case? You insert a record for Oprah's TV show into the favorites table, and then next year you delete the Oprah Show from the list of TV shows but don't delete that ID from the Favorites table. Will that break anything? Probably not. When you join favorites to TV shows, that record will simply fall out of the result set.
There are a couple of ways to share values for PKs. Oracle has the advantage of sequences; if you don't have those, you can add a "step" to your autonumber fields. There's always a risk, though.
Say you think you'll never have more than 10 tables of "things which could be favored". Then start your PKs at 0 for the first table and increment by 10, at 1 for the second table and increment by 10, at 2 for the third... and so on. That will guarantee that all the values are unique across those 10 tables. The risk is that a future requirement will add table 11. You can always 'pad' your guesstimate.
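MySQL's closest counterpart to this "step" trick is a pair of real server/session variables, though they apply per connection or server-wide rather than per table, so treat this as a sketch of the idea only:

SET SESSION auto_increment_increment = 10;  -- step size shared by all 10 tables
SET SESSION auto_increment_offset = 1;      -- this table's slot: ids 1, 11, 21, ...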
Say you think you'll never have more than 10 tables of "things which could be favored" Then start your PK's at 0 for the first table increment by 10, 1 for the second table increment by 10, 2 for the third... and so on. That will guarantee that all the values will be unique across those 10 tables. The risk is that a future requirement will add table 11. You can always 'pad' your guestimate