combine data - keep unique key - mysql

I have several large tables (~100 million rows in total) which all share a similar schema: they log certain settings of an object (u_id) at a point in time:
u_id | x | y | time
---------------------------
1 | 2 | 3 | [timestamp]
1 | 1 | 3 | [timestamp]
2 | 1 | 2 | [timestamp]
2 | 2 | 5 | [timestamp]
3 | 3 | 2 | [timestamp]
I now want to combine these tables into one large table holding ALL the data. However, I want to keep the u_ids unique. Obviously, each source table contains, e.g., a u_id 1. When the data is combined in the result table, those entries should still be distinguishable (though I do not need to associate them back to their original values). This only has to be done once, so performance does not matter.
My first idea was to add a prefix (like a_, b_, etc.) to each u_id before writing it to the destination, but this would obviously introduce overhead. I'd prefer the destination table to use an auto-increment (AI) value for minimum overhead, but I don't know how to achieve that, as each source u_id can have multiple (several thousand) entries.

I think you should add a Type column to your destination table. Type will represent the different source tables. Then you can combine u_id and Type as the primary key. That will solve your problem.
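If you would rather end up with a single auto-increment key, another way is to build a temporary mapping table whose AUTO_INCREMENT key becomes the new u_id, so that the several thousand rows of one source u_id all receive the same new id. A sketch, assuming source tables named table_a and table_b and a destination table dest (all names here are illustrative):

CREATE TABLE id_map (
    new_id  INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    source  CHAR(1) NOT NULL,   -- 'a', 'b', ... identifies the source table
    old_uid INT NOT NULL,
    UNIQUE KEY (source, old_uid)
);

-- one new_id per (source table, old u_id) pair
INSERT INTO id_map (source, old_uid) SELECT DISTINCT 'a', u_id FROM table_a;
INSERT INTO id_map (source, old_uid) SELECT DISTINCT 'b', u_id FROM table_b;

-- copy the rows, translating each old u_id to its new id
INSERT INTO dest (u_id, x, y, time)
SELECT m.new_id, s.x, s.y, s.time
FROM table_a AS s
JOIN id_map AS m ON m.source = 'a' AND m.old_uid = s.u_id;
-- repeat for table_b, table_c, ...

The id_map table can be dropped afterwards, since the association back to the source values is not needed.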


Insert data into multiple tables or create one table with more columns and store at once

Suppose I have several user tables, i.e. user_table1, user_table2, user_table3, user_table..... . I created these because a single user table would need approx 150 columns, so I split it into several tables with fewer columns. In user_table1, user_id is set as the primary key, and in the rest of the tables I set user_id as a foreign key.
user_table1
-------------------------------------------
| user_id | column1 | column2 | column.....|
-------------------------------------------
| 1 | value 1 | value 2 | value .....|
-------------------------------------------
user_table2
-----------------------------------------------
| user_id(fk) | column1 | column2 | column.....|
-----------------------------------------------
| 1 | value 1 | value 2 | value .....|
-----------------------------------------------
user_table3
-----------------------------------------------
| user_id(fk) | column1 | column2 | column.....|
-----------------------------------------------
------ ----- --------- ----- ------- -----
The first table generally stores login details and some other values from when the user registers. So my question is: after registration, when users edit their profile (the profile detail values are stored in the other tables, i.e. user_table2, user_table3), how do I insert into another table? Is this method OK, or should I create one table with 150 columns?
I can't think of a case where multiple tables that all have a 1:1 relationship to each other do anything other than add complexity to working with them, since simple look-ups become joins, etc. If you really don't have duplicated information, one large table is probably easier to work with. If you have data repeated across multiple users, then you should probably reevaluate your schema and set up tables that represent the different types of information.
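If you do keep the split tables, the insert the question asks about just reuses the user_id generated at registration. A minimal sketch (the column names are made up for illustration):

INSERT INTO user_table1 (username, email) VALUES ('jsmith', 'jsmith@example.com');
SET @uid = LAST_INSERT_ID();   -- the auto-increment user_id just generated

-- profile rows in the other tables reuse that id as their foreign key
INSERT INTO user_table2 (user_id, column1, column2) VALUES (@uid, 'value 1', 'value 2');
INSERT INTO user_table3 (user_id, column1, column2) VALUES (@uid, 'value 1', 'value 2');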

Avoid Duplicate Records with BeforeChange Table Event

I have a situation in an MS Access database where I must prevent duplicate records based on a combination of three attributes:
StudentNumber
ColleagueID
TypeOfAttending
So, for one combination (StudentNumber & ColleagueID) I have three types of attending: A, B and C.
Here is an example:
+---------------+-------------+---------------+
| StudentNumber | ColleagueID | AttendingType |
+---------------+-------------+---------------+
| 100 | 10 | A |
| 100 | 10 | B |
| 100 | 10 | C |
| 100 | 11 | A |
| 100 | 11 | B |
| 100 | 11 | C |
| 100 | 11 | C |
+---------------+-------------+---------------+
So last row would not be acceptable.
Does anyone have any idea?
As noted, you could make all 3 columns the PK. Or you could create a unique index on all 3 columns. These two ideas are thus code free.
Last but not least, you could use a Before Change macro and do a search (lookup) in the table to check whether the record already exists. Given your information, a unique index is likely the least effort, and it does not require you to change the PK to all 3 columns (which, as noted, is another solution).
So, you could consider a before change macro and use this:
Lookup a Record in MyTable
Where Condition = [z].[Field1]=[MyTable].[Field1] And
[z].[Field2]=[MyTable].[Field2] And
[z].[ID]<>[MyTable].[ID]
Alias Z
RaiseError -123
Error Description: There are other rows with this data
So, you can use a data macro, i.e. the before change table macro. Make sure you have the raise error code indented "inside" of the lookup code. And note how we use an alias for the lookup: since the table name (MyTable) is already in context as the current row of data, we look up using "z" as an alias to distinguish between the current row and the lookup record.
So, from a learning point of view, the above table macro can be used, but it is likely less work and effort to simply set up a unique index on all 3 columns.
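For completeness, the code-free unique index can be created with a one-line DDL query run from Access (the table name Attendance is an assumption; the column names are the ones from the question):

CREATE UNIQUE INDEX NoDuplicateAttending
ON Attendance (StudentNumber, ColleagueID, AttendingType);

With this in place, the engine itself rejects the duplicate last row from the example, with no macro needed.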

Over 2500 tables in mysql

My application stores login information of over 2500 employees in a table named "emp_login".
Now I have to store the activities of every employee on a daily basis. For this purpose I have created a separate table for every employee, e.g. emp00001, emp00002, ... Each table will have about 50 columns.
After digging around a lot on Stack Overflow I'm kind of confused. Many of the experts say that a database having more than 200-300 tables in MySQL is considered to be poorly designed.
My question is whether it is a good idea to have such a bulk of tables. Is my database poorly designed? Should I choose another database like MSSQL? Or is there some alternative way to handle the database for such an application?
Do -not- do it that way. Every employee should be in 1 table and have a primary key index ID, i.e.:
1: Tom
2: Pete
You then assign the actions with a column that references the employee's ID number:
Action, EmployeeID
You should always group identical entities in a table with index ids and then link properties / actions to those entities by Id. Imagine what you would have to do to search a database that consisted of a different table for every employee. Would defeat the whole point of using SQL.
Event table could look like:
Punchin, 1, 2018/01/01 00:00
That would tell you Tom punched in at 2018/01/01 00:00. This is a very simple example, and you probably wouldn't want to structure an event table exactly that way, but it should get you on the right track.
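A sketch of those two tables in MySQL (the table and column names are illustrative, not from the question):

CREATE TABLE employees (
    id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE employee_events (
    id          INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    action      VARCHAR(50) NOT NULL,            -- e.g. 'punch_in'
    employee_id INT NOT NULL,
    occurred_at DATETIME NOT NULL,
    FOREIGN KEY (employee_id) REFERENCES employees (id)
);

-- "Tom punched in at 2018-01-01 00:00"
INSERT INTO employee_events (action, employee_id, occurred_at)
VALUES ('punch_in', 1, '2018-01-01 00:00:00');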
This has nothing to do with MySQL; it is your design that is flawed. You should have one table for all your employees. This contains information unique to the employees, such as first name, last name and email address.
|ID | "John" | "Smith" | "john.smith#gmail.com" |
|1 | "James" | "Smith" | "james.smith#gmail.com" |
|2 | "jane" | "Jones" | "jane.jones.smith#yahoo.com" |
|3 | "Joanne" | "DiMaggio" | "jdimaggio#outlook.com" |
Note the ID column. Typically this would be an integer with AUTO_INCREMENT set, and you would make it the primary key. Then you get a new unique number every time you add a new user.
Now you have separate tables for every piece of RELATED data. E.g. the city they live in or their login time (which I'm guessing you want from the table name).
If it's a one to many relationship (i.e. each user has many login times), you create a single extra table which REFERENCES your first table. This is a DEPENDENT table. Like so:
| UserId | LoginTime |
| 1 | "10:00:04 13-09-2018" |
| 2 | "11:00:00 13-09-2018" |
| 3 | "11:29:07 14-09-2018" |
| 1 | "09:00:00 15-09-2018" |
| 2 | "10:00:00 15-09-2018" |
Now when you query your database, you do a JOIN on the UserId field to connect the two tables. If it were only their LAST login time, then you could put it in the user table, because it would be a single piece of data. But because they will have many login times, login times need to be their own table.
(N.b. I haven't put an ID column on this table but it's a good idea.)
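The JOIN described above might look like this (assuming the tables are called employees and login_times; both names are made up for the sketch):

SELECT e.id, e.firstname, l.login_time
FROM employees AS e
JOIN login_times AS l ON l.user_id = e.id
ORDER BY l.login_time;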
If it's data that ISN'T unique to each user, i.e. it's a MANY-to-MANY relationship, such as the city they live in, then you need two tables: one contains the cities, and the other is an INTERMEDIARY table that joins the two. So as follows:
(city table)
| ID | City |
| 1 | "London" |
| 2 | "Paris" |
| 3 | "New York" |
(city-user table)
| UserID | CityID |
| 1 | 1 |
| 2 | 1 |
| 3 | 3 |
Then you would do two JOINs to connect all three tables and get the city each employee lives in. Again, I haven't added an ID field and PRIMARY KEY to the intermediary table because it isn't strictly necessary (you could create a unique composite key instead, which is a different discussion), but it would be a good idea.
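As a sketch, the two JOINs could look like this (the table names employees, city_user and cities are assumptions matching the tables above):

SELECT e.id, e.firstname, c.city
FROM employees AS e
JOIN city_user AS cu ON cu.user_id = e.id
JOIN cities    AS c  ON c.id = cu.city_id;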
That's the basic thing you need to know. Always divide your data up by function. Do NOT divide it up by the data itself (i.e. a table per user). The thing you want to look up right now is called "Database Normalization". Stick that into a search engine and read a good overview. It won't take long and will help you enormously.

Mysql insertion order [duplicate]

This question already has answers here:
Return rows in the exact order they were inserted
(4 answers)
I don't know whether this has already been answered; I haven't found any answers. In MySQL tables, the rows are arranged in the order of the primary key. For example:
+----+--------+
| id | name |
+----+--------+
| 1 | john |
| 2 | Bryan |
| 3 | Princy |
| 5 | Danny |
+----+--------+
If I insert another row with insert into demo_table values(4,"Michael"), the table will be like:
+----+---------+
| id | name |
+----+---------+
| 1 | john |
| 2 | Bryan |
| 3 | Princy |
| 4 | Michael |
| 5 | Danny |
+----+---------+
But I need the table to be like
+----+---------+
| id | name |
+----+---------+
| 1 | john |
| 2 | Bryan |
| 3 | Princy |
| 5 | Danny |
| 4 | Michael |
+----+---------+
I want the new row to be appended to the table, i.e.,
the rows of the table should be in the order of insertion. Can anybody suggest a query to achieve this? Thank you in advance for any answer.
There is in general no internal order to the records in a MySQL table. The only order which exists is the one you impose at the time you query. You typically impose that order using an ORDER BY clause. But there is a bigger design problem here. If you want to order the records by the time when they were inserted, then you should either add a dedicated column to your table which contains a timestamp, or perhaps make the id column auto increment.
If you want to go with the latter option, here is how you would do that:
ALTER TABLE demo_table MODIFY COLUMN id INT auto_increment;
Then, do your insertions like this:
INSERT INTO demo_table (name) VALUES ('Michael');
The database will choose an id value for the Michael record, and in general it would be greater than any already existing id value. If you need absolute control, then adding a timestamp column might make more sense.
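With the auto-increment in place, the order is then imposed at query time, as the first paragraph says:

SELECT id, name FROM demo_table ORDER BY id;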
Just add another column Created (TIMESTAMP) in your table to store the time of insertion.
Then use this command for insertion:
insert into demo_table (id, name, created) values (4, "Michael", NOW());
The NOW() function returns the current date and time.
Since you are recording the timestamp, it can also be used for future reference.
It's not clear why you want to control the "order" in which the data is stored in your table. The relational model does not support this; unless you specify an order by clause, the order in which records are returned is not deterministic. Even if it looks like data is stored in a particular sequence, the underlying database engine can change its mind at any point in time without breaking the standards or documented behaviours.
The fact that you observe a particular order when executing a select query without order by is a side effect. Side effects are usually harmless, right up to the point where the main feature changes and the side effect's behaviour changes too.
What's more, it's generally a bad idea to rely on the primary key to have "meaning". I assume your id column represents a primary key; you should really not rely on any business meaning in primary keys - this is why most people use surrogate keys. Depending on the keys to indicate the order in which a record was created is probably harmless, but it still seems like a side effect to me. In this, I don't support @TimBiegeleisen's otherwise excellent answer.
If you care about the order in which records were entered, make this explicit in the schema by adding a timestamp column, and write your select statement to order by that timestamp. This is the least sensitive to bugs or changes in the underlying logic/database engine.
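A sketch of that approach (the column name created is an assumption):

ALTER TABLE demo_table
    ADD COLUMN created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP;
-- note: pre-existing rows all receive the time of the ALTER as their created value

-- with the DEFAULT in place, plain inserts are stamped automatically
INSERT INTO demo_table (id, name) VALUES (4, "Michael");

SELECT id, name FROM demo_table ORDER BY created;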

1 very large table or 3 large tables? MySQL Performance

Assume a very large database. A table with 900 million records.
Method A:
Table: Posts
+----------+-------------- +------------------+----------------+
| id (int) | item_id (int) | post_type (ENUM) | Content (TEXT) |
+----------+---------------+------------------+----------------+
| 1 | 1 | user | some text ... |
+----------+---------------+------------------+----------------+
| 2 | 1 | page | some text ... |
+----------+---------------+------------------+----------------+
| 3 | 1 | group | some text ... |
// row 1 : User with ID 1 has a post with ID #1
// row 2 : Page with ID 1 has a post with ID #2
// row 3 : Group with ID 1 has a post with ID #3
The goal is displaying 20 records from all 3 post_types in a page.
SELECT * FROM posts LIMIT 20
But I am worried about the number of records with this method.
Method B:
Split the 900 million records into 3 tables of 300 million each.
Table: User Posts
+----------+-------------- +----------------+
| id (int) | user_id (int) | Content (TEXT) |
+----------+---------------+----------------+
| 1 | 1 | some text ... |
+----------+---------------+----------------+
| 2 | 2 | some text ... |
+----------+---------------+----------------+
| 3 | 3 | some text ... |
Table: Page Posts
+----------+-------------- +----------------+
| id (int) | page_id (int) | Content (TEXT) |
+----------+---------------+----------------+
| 1 | 1 | some text ... |
+----------+---------------+----------------+
| 2 | 2 | some text ... |
+----------+---------------+----------------+
| 3 | 3 | some text ... |
Table: Group Posts
+----------+----------------+----------------+
| id (int) | group_id (int) | Content (TEXT) |
+----------+----------------+----------------+
| 1 | 1 | some text ... |
+----------+----------------+----------------+
| 2 | 2 | some text ... |
+----------+----------------+----------------+
| 3 | 3 | some text ... |
Now, to get a list of 20 posts to display:
SELECT * FROM User_Posts LIMIT 10
SELECT * FROM Page_Posts LIMIT 10
SELECT * FROM group_posts LIMIT 10
// and make an array or object of the result, and display it in the output.
With this method, I would have to sort them into an array in PHP and then send them to the page.
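(The merge could also be done on the database side in a single round trip; a sketch using the Method B table names above - note MySQL requires the parentheses when each SELECT has its own LIMIT:)

(SELECT 'user' AS post_type, id, content FROM User_Posts LIMIT 10)
UNION ALL
(SELECT 'page' AS post_type, id, content FROM Page_Posts LIMIT 10)
UNION ALL
(SELECT 'group' AS post_type, id, content FROM Group_Posts LIMIT 10);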
Which method is preferred?
Will separating a 900-million-record table into three tables affect the speed of reading and writing in MySQL?
This is actually a discussion of Single Table Inheritance vs. Table Per Class Inheritance, leaving out Joined Inheritance. The former corresponds to Method A, the second to your Method B, and a Method C would be to keep the IDs of all your posts in one table and defer the attributes specific to group or user posts into different tables.
While a big table always has its negative impacts related to table full scans, the approach of splitting tables has its own, too. It depends on how often your application needs to access the whole list of posts vs. retrieving only certain post types.
Another consideration you should take into account is data partitioning, which can be done with MySQL or Oracle Database, for example. Partitioning is a way of organizing your data within tables that opens up opportunities for information lifecycle management (which data is accessed when and how often; can part of it be moved and compressed, reducing database size and increasing the speed of access to the remaining part of the data in the table). It basically comes in three major techniques:
range based partitioning, list based partitioning and hash based partitioning.
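As an illustration, list partitioning by post type would keep Method A's single logical table while physically splitting it roughly along Method B's lines. A sketch (post_type is VARCHAR here because MySQL does not allow ENUM as a LIST COLUMNS partition key):

CREATE TABLE posts (
    id        INT NOT NULL AUTO_INCREMENT,
    item_id   INT NOT NULL,
    post_type VARCHAR(5) NOT NULL,
    content   TEXT,
    PRIMARY KEY (id, post_type)   -- the partition key must be part of every unique key
)
PARTITION BY LIST COLUMNS (post_type) (
    PARTITION p_user  VALUES IN ('user'),
    PARTITION p_page  VALUES IN ('page'),
    PARTITION p_group VALUES IN ('group')
);

Queries that filter on post_type then touch only one partition, while a query over all posts still works against the one table.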
Other, less commonly supported features related to reducing table sizes are inserts with a timestamp that invalidates the inserted data automatically after a certain time period has expired.
What is indeed a major application design decision, and can boost performance, is distinguishing between read and write accesses to the database at the application level.
Consider a MySQL backend: because write accesses are obviously more critical to database performance than read accesses, you could set up one MySQL instance for writing to the database and another as a replica of it for the read accesses. This is also debatable, mainly when it comes to RDT (real-time decisions), where absolute consistency of the data at any given time is a must.
Using object pools as a layer between your application and the database is also a technique to improve application performance, though I don't know of existing solutions in the PHP world yet. Oracle Hot Cache is a pretty sophisticated example of this.
You could build your own, though, implemented on top of an in-memory database or using memcache.