Is there an easy way to back up and restore partial data from a MySQL database while maintaining the FK constraints?
Say I have two tables:
| CustomerId | CustomerName |
------------------------------
| 12         | Bon Jovi     |
| 13         | Seal         |
and
| AddressId | CustomerId | City   |
-----------------------------------
| 1         | 12         | London |
| 2         | 13         | Paris  |
The backup would only take customer 12 and address 1.
My goal is to take a large database from a production server and replicate it locally, but with partial data.
Due to the fairly complicated schema, a custom query is not an option. Also, I can't rely on the existence of a main table from which one would get the related rows.
Thanks
You could replicate specific customers manually and, with an FK constraint on the address table, replication will simply fail to insert/update address records whose customer is missing.
To replicate only specific tables in the db, see http://dev.mysql.com/doc/refman/5.1/en/replication-options-slave.html#option_mysqld_replicate-do-table.
Use slave_skip_errors to silently skip errors during replication: http://dev.mysql.com/doc/refman/5.1/en/replication-options-slave.html#sysvar_slave_skip_errors.
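For illustration, a rough sketch of what those two options could look like in the replica's my.cnf; the mydb schema name is a placeholder, and 1452 is the usual "foreign key constraint fails" error code on InnoDB:
[mysqld]
replicate-do-table = mydb.Customer
replicate-do-table = mydb.Address
slave-skip-errors  = 1452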
Related
I'm working on a project that synchronizes content across multiple clusters. I want to use two programs for this task: one program periodically updates the existence status of each content item on each cluster, and the other program copies a content item when it is not present on all clusters.
The database tables are designed in the following way.
table contents -- stores content information.
+----+-----------+
| id | name      |
+----+-----------+
| 1  | content_1 |
| 2  | content_2 |
| 3  | content_3 |
+----+-----------+
table clusters -- stores cluster information
+----+-----------+
| id | name      |
+----+-----------+
| 1  | cluster_1 |
| 2  | cluster_2 |
+----+-----------+
table content_cluster -- each record indicates that one content is on one cluster
+------------+------------+---------------------+
| content_id | cluster_id | last_update_date    |
+------------+------------+---------------------+
| 1          | 1          | 2020-10-01T11:30:00 |
| 2          | 2          | 2020-10-01T11:30:00 |
| 3          | 1          | 2020-10-01T10:30:00 |
| 3          | 2          | 2020-10-01T10:30:00 |
+------------+------------+---------------------+
The first program updates these tables periodically (only a small part may change; most of the table stays the same). The second program iteratively fetches one content record that is not yet on all clusters. The select query is something like the following.
SELECT content_id
FROM content_cluster
GROUP BY content_id
HAVING COUNT(cluster_id) < <cluster_number>
LIMIT 1;
This does not seem very efficient, as I have to group the whole table on every query.
As I am not familiar with databases, I'm wondering if this is a good way to design them. Do I have to add some indexes? How should I write the select query to make it efficient?
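For reference, a minimal sketch of an index that this kind of GROUP BY query can typically scan instead of the whole table (if (content_id, cluster_id) is already the primary key, that key serves the same purpose); the index name is made up, and whether it actually helps should be verified with EXPLAIN on your own data:
ALTER TABLE content_cluster
    ADD INDEX idx_content_cluster (content_id, cluster_id);
EXPLAIN
SELECT content_id
FROM content_cluster
GROUP BY content_id
HAVING COUNT(cluster_id) < 2   -- your <cluster_number>
LIMIT 1;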
I am trying to build one giant schema that makes it easier for data users to query. In order to achieve that, streaming events have to be joined with User Metadata by USER_ID and ID. In data engineering, this operation is called "Data Enrichment", right? The tables below are an example.
# `Event` (Stream)
+---------+--------------+---------------------+
| USER_ID | EVENT        | TIMESTAMP           |
+---------+--------------+---------------------+
| 1       | page_view    | 2020-04-10T12:00:11 |
| 2       | button_click | 2020-04-10T12:01:23 |
| 3       | page_view    | 2020-04-10T12:01:44 |
+---------+--------------+---------------------+
# `User Metadata` (Static)
+----+-------+--------+
| ID | NAME  | GENDER |
+----+-------+--------+
| 1  | Matt  | MALE   |
| 2  | John  | MALE   |
| 3  | Alice | FEMALE |
+----+-------+--------+
==> # Result
+---------+--------------+---------------------+-------+--------+
| USER_ID | EVENT        | TIMESTAMP           | NAME  | GENDER |
+---------+--------------+---------------------+-------+--------+
| 1       | page_view    | 2020-04-10T12:00:11 | Matt  | MALE   |
| 2       | button_click | 2020-04-10T12:01:23 | John  | MALE   |
| 3       | page_view    | 2020-04-10T12:01:44 | Alice | FEMALE |
+---------+--------------+---------------------+-------+--------+
I was developing this using Spark, and the User Metadata is stored in MySQL. Then I realized it would waste Spark's parallelism if the Spark code joined directly against the MySQL tables, right?
I guess the bottleneck will be MySQL if traffic increases.
Should I store those tables in a key-value store and update them periodically?
Can you give me some ideas to tackle this problem? How do you usually handle this type of operation?
Solution 1:
As you suggested, you can keep a local cache copy as key-value pairs and update the cache at a regular interval.
Solution 2:
You can use a MySQL-to-Kafka connector, as below:
https://debezium.io/documentation/reference/1.1/connectors/mysql.html
For every DML or table-alter operation on your User Metadata table, a corresponding event will be fired to a Kafka topic (e.g. db_events). You can run a thread in parallel in your Spark streaming job that polls db_events and updates your local key-value cache.
This solution would make your application a near-real-time application in the true sense.
One overhead I can see is that you will need to run a Kafka Connect service with the MySQL connector (i.e. Debezium) as a plugin.
I am creating a mobile application. In this app, I have created a Login and Register activity. I have also created an online database using AWS (Amazon Web Services) to store all the login information of the user upon registering.
In my database, I have a table called 'users'. This table holds the following fields: "fname", "lname", "username", "password". This part works and successfully stores data from my phone to the database.
For example,
| fname | lname | username | password |
| ----- | ----- | -------- | -------- |
| john  | doe   | jhon123  | 1234     |
Inside the app, I have an option where the user may click on "Start Log", which will record start and end values from a seekBar.
How can I create a table under a user who is logged in?
(Essentially, I want to be able to create multiple tables under a user.)
For example, this table should be under the user "john123":
| servo | Start | End |
| ----- | ----- | --- |
| 1     | 21    | 30  |
| 2     | 30    | 11  |
| 3     | 50    | 41  |
| 4     | 0     | 15  |
I know it's a confusing question, but I am essentially just trying to have multiple tables linked to a user.
As to:
How to create a table for a user in a Database
Here are some GUI tools you might find useful:
MySQL - MySQL Workbench
PostgreSQL - pgAdmin
As for creating a separate table for each user, refer to #Polymath's answer. There is no benefit in creating separate tables for each user (you might as well use a json file).
What you should do is create a logs table that has a user_id attribute referencing the id in the users table.
-------------------------------------------------------
| id | fname | lname | username | password            |
| -- | ----- | ----- | -------- | ------------------- |
| 1  | john  | doe   | jhon123  | encrypted(password) |
-------------------------------------------------------
   |______
          |
          V
-----------------------------------------
| id | user_id | servo_id | start | end |
| -- | ------- | -------- | ----- | --- |
| 1  | 1       | 1        | 21    | 30  |
| 2  | 1       | 2        | 30    | 11  |
-----------------------------------------
You should also look into database normalization, as your "john123" table is not in 3NF. The servo should be decomposed out of the logs table if it will be logged by multiple users or multiple times (which I'm guessing is the case for you).
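For what it's worth, a minimal sketch of that structure in MySQL; the table and column names follow the diagram above, and the exact types are assumptions:
CREATE TABLE users (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    fname    VARCHAR(50),
    lname    VARCHAR(50),
    username VARCHAR(50) UNIQUE,
    password VARCHAR(255)          -- store a hash, not the plain password
);
CREATE TABLE logs (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    user_id  INT NOT NULL,
    servo_id INT NOT NULL,
    `start`  INT,
    `end`    INT,
    FOREIGN KEY (user_id) REFERENCES users (id)
);
-- all of one user's log rows, instead of one table per user
SELECT servo_id, `start`, `end` FROM logs WHERE user_id = 1;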
In reading this, I wonder if your design is right. It sounds like you are trying to create a table for each user. I also wonder how scalable it is to have a unique table per user. If you scale to millions of users, you will have millions of tables to manage and will probably need a separate index table to find the right table for each user. Why a table for each? Why not a single table with the UserID as a key value? You can extract the data just by filtering on the UserID:
SELECT * FROM UsersData WHERE UserID = ? ORDER BY DateTime;
However I will leave that for you to ponder.
You mentioned that this is a mobile app. I think what you need to do is look at AWS federated access and Cognito, which will allow you to identify a user using federated identities: pass the user's unique ID plus temporary (one-use) credentials linked to an access role. Combined this way, you can scale to millions of users with full authentication without managing millions of accounts.
RL
We have been developing the system at my place of work for some time now, and I feel the database design is getting somewhat out of hand.
For example, we have a table widgets (I'm spoofing these somewhat):
+----+----------+-------+
| Widget                |
+----+----------+-------+
| Id | Name     | Price |
| 1  | Sprocket | 100   |
| 2  | Dynamo   | 50    |
+----+----------+-------+
*There are already about 40+ columns on this table
We want to add a property to each widget for packaging information. We need to know whether it has packaging information, whether it doesn't, or whether we simply don't know either way. We then also need to store the type of packaging details (assuming it has any; maybe it doesn't and that info is now redundant).
We already have another table which stores the details information (I personally think this table should be divided up, but that's another issue).
PD = PackageDetails
+----+------+---------------+
| System Properties         |
+----+------+---------------+
| Id | Type | Value         |
| 28 | PD   | Boxed         |
| 29 | PD   | Vacuum Sealed |
+----+------+---------------+
*There are thousands of rows in this table for all system-wide properties
Instinctively I would create a number of mapping tables to capture this information. I have however been instructed to just add another column onto each table to avoid doing a join.
My solution:
Create tables:
+----+-----------+--------------+-------------------+
| widgets_packaging                                  |
+----+-----------+--------------+-------------------+
| Id | widget_id | packing_info | packing_detail_id |
| 1  | 27        | PACKAGED     | 2                 |
| 2  | 28        | UNKNOWN      | NULL              |
+----+-----------+--------------+-------------------+
+----+---------------+
| packaging          |
+----+---------------+
| Id |               |
| 1  | Boxed         |
| 2  | Vacuum Sealed |
+----+---------------+
If I want to know what packaging a widget has, I join through to widgets_packaging, and join again to packaging if I want to know the exact details. Therefore, no more columns on the widgets table.
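As a rough sketch, assuming the table and column names above (the unnamed description column in packaging is just selected with p.*):
-- does widget 27 have packaging info?
SELECT wp.packing_info
FROM widgets_packaging wp
WHERE wp.widget_id = 27;
-- the exact packaging details (LEFT JOIN because UNKNOWN rows have a NULL detail id)
SELECT w.Name, wp.packing_info, p.*
FROM Widget w
JOIN widgets_packaging wp ON wp.widget_id = w.Id
LEFT JOIN packaging p ON p.Id = wp.packing_detail_id
WHERE w.Id = 27;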
I have however been told to ignore this and instead store an int value for the packaging information, plus another column as a foreign key to the System Properties table to find the packaging details. That means adding another two columns to the widgets table and creating yet more rows in the System Properties table to store package details.
+----+----------+-------+---------------+-------------------+
| Widget                                                     |
+----+----------+-------+---------------+-------------------+
| Id | Name     | Price | has_packaging | packaging_details |
| 1  | Sprocket | 100   | 1             | 28                |
| 2  | Dynamo   | 50    | 0             | 29                |
+----+----------+-------+---------------+-------------------+
The reason for this is that it's simpler and doesn't involve a join if you only want to know whether the widget has packaging (there are lots of widgets). They are concerned that more joins will slow things down.
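For comparison, a sketch of the lookups the extra columns are meant to enable; the System Properties table name is taken from the heading above and is presumably spoofed as well:
-- no join needed just to check the flag
SELECT Id, Name
FROM Widget
WHERE has_packaging = 1;
-- the details still need a lookup into System Properties
SELECT w.Name, sp.Value
FROM Widget w
JOIN `System Properties` sp ON sp.Id = w.packaging_details
WHERE w.Id = 1;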
Which is the more correct solution here, and are their concerns about speed legitimate? My gut instinct is that we can't just keep adding columns onto the widgets table, as it is growing and growing with flags for properties at present.
The answer to this really depends on whether the application(s) using this database are read or write intensive. If it's read intensive, the de-normalized structure is a better approach because you can make use of indexes. Selects are faster with fewer joins, too.
However, if your application is write intensive, normalization is a better approach (the structure you're suggesting is a more normalized approach). Tables tend to be smaller, which means they have a better chance of fitting into the buffer. Also, normalization tends to lead to less duplication of data, which means updates and inserts only need to be done in one place.
To sum it up:
Write Intensive --> normalization
smaller tables have a better chance of fitting into the buffer
less duplicated data, which means quicker updates / inserts
Read Intensive --> de-normalization
better structure for indexes
fewer joins means better performance
If your application is not heavily weighted toward reads over writes, then a more mixed approach would be better.
I am a little confused about whether I should go this way.
I have tables like:
| account   |      | Acc_registration_Info |
| AccID_PK  |      | AccRegInfo_PK         |
|           |      |                       |
|           |      |                       |
Should I connect them by both primary keys? Also, how do I protect against mismatching IDs?
I am trying to follow the AdventureWorks DB structure, but it is a little hard to understand; some of the AW DB tables are split up heavily (like users and their passwords in different tables).
I don't really feel confident about making so many tables and relating them one-to-one by PKs... My other hard decision is whether to connect the Shop table with a detailed shop information table by PK, etc.
On the other hand, adding too many non-primary columns just to connect other tables doesn't look great either.
I think you have to make the primary key of one table a foreign key in the other; that's how it works:
| account        |      | Acc_registration_Info |
| AccID_PK       |      | AccRegInfo_PK         |
| #AccRegInfo_FK |      |                       |
|                |      |                       |
That way, if you want to know the reg info for an account, you just take the #AccRegInfo_FK of that account (in the account table) and compare it to AccRegInfo_PK (in the reg info table), and you'll get what you want; in relational databases that is, of course, what's called a join.
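For what it's worth, a minimal sketch of that in MySQL; the column types are assumptions, and the FK constraint also covers the question about mismatching IDs, since the database will reject an account row whose AccRegInfo_FK doesn't exist:
CREATE TABLE Acc_registration_Info (
    AccRegInfo_PK INT PRIMARY KEY
    -- ... other registration columns
);
CREATE TABLE account (
    AccID_PK      INT PRIMARY KEY,
    AccRegInfo_FK INT NOT NULL,
    FOREIGN KEY (AccRegInfo_FK) REFERENCES Acc_registration_Info (AccRegInfo_PK)
    -- ... other account columns
);
-- the join described above
SELECT a.AccID_PK, r.*
FROM account a
JOIN Acc_registration_Info r ON r.AccRegInfo_PK = a.AccRegInfo_FK;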