I need to onboard some of my merchants onto a third-party platform, which requires sending merchant details along with a request reference number (RRN) that is unique for every onboarding request. After onboarding, checking whether the merchant was actually onboarded is done through a separate verify API, which also requires the RRN.
The question is whether I should add a new column to my already existing merchants table or create a separate table mapping merchant_id to the RRN, given that only 5% of merchants will be onboarded on this third party.
As long as the merchants table is small (less than 1 million rows), you can just add a column to the table and index it.
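A minimal sketch of that option, assuming the table is named merchants and the new column holds the RRN (the names and types here are illustrative, not taken from the question):

    -- Sketch only: table, column and index names are assumptions.
    ALTER TABLE merchants
        ADD COLUMN onboarding_rrn VARCHAR(32) NULL;

    -- Index the RRN so the verify API result can be matched back to a merchant quickly.
    CREATE UNIQUE INDEX idx_merchants_onboarding_rrn ON merchants (onboarding_rrn);

A UNIQUE index on a nullable column in MySQL allows multiple NULLs, so the roughly 95% of merchants that are never onboarded don't conflict with each other.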
I would suggest that you store this information in a separate table. You have described a simple scenario, but the actual scenario might be more complicated:
There is a time lag between starting the onboarding and it succeeding.
Perhaps the onboarding is not successful.
If a merchant can be "onboarded", they can be "offboarded".
And then onboarded again.
In other words, if you are managing the process, you should keep track of each of the possibilities.
If you are not managing the process and the merchant simply tells you that they are onboarded, then storing the information as a flag is fine.
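If you do need to track the process, a separate table might look roughly like this (a sketch only; the table name, column names, status values, and the merchants.id key it references are all assumptions):

    -- One row per onboarding attempt, so retries, failures and later
    -- offboarding can all be recorded without touching the merchants table.
    CREATE TABLE merchant_onboarding_requests (
        id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        merchant_id  BIGINT UNSIGNED NOT NULL,
        rrn          VARCHAR(32)     NOT NULL,
        status       VARCHAR(16)     NOT NULL DEFAULT 'PENDING',  -- e.g. PENDING / SUCCESS / FAILED / OFFBOARDED
        requested_at DATETIME        NOT NULL DEFAULT CURRENT_TIMESTAMP,
        verified_at  DATETIME        NULL,
        UNIQUE KEY uq_rrn (rrn),
        KEY idx_merchant (merchant_id),
        FOREIGN KEY (merchant_id) REFERENCES merchants (id)
    );

The verify API result can then be recorded by updating status and verified_at for the matching rrn, and only the merchants that actually go through onboarding ever get rows here.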
Related
What is the best-practice for maintaining the integrity of linked data entities on update?
My scenario
1. I have two entities, "Client" and "Invoice" (Client is a definition entity and Invoice is a transaction entity).
2. After issuing many invoices to a client, it happens that the client information needs to change, e.g. the billing address/location or the business name. It is normal that users must be able to update the client information to keep the data in the system accurate.
3. In the Invoice (transaction) entity I don't store just the client ID but also all the client information related to the invoice, such as client name, address and contact; that's a well-known approach for storing data in transaction entities.
4. If the user creates a new invoice, the new client information will be stored in the invoice record along with the same client ID (very obvious!).
My Questions
1. Is it okay to bind the "client" data entity from different locations for the insert and the update? [Explanation: if I follow the approach from steps 1-4, I have to bind the client entity from the client table when creating a new invoice, but when updating/printing the invoice I have to bind the client entity from the invoice table, otherwise the data won't be consistent. So how can I keep data integrity without creating spaghetti code in the DAL to handle these custom data-binding requirements?]
2. I once used a system that saved all previous versions of an entity's data before each update ("keeping a history of all versions"). If I want to use the same method to avoid the custom binding, how can I do this in terms of database design, using MySQL? [Explanation: some invoices are created with version 1.0 of the client, then the client info is updated and its version becomes 1.1, and new invoices are created with the latest version. So is it good to follow this methodology, and how should I design my entities/tables to fulfil the requirements of entity versioning and binding?]
Please provide any book or reference that can point me in the right direction.
Thanks,
What you need to do is leave the table the way it is. You are correct: you should be storing the customer information in the invoice to preserve the history of where the items were shipped. When it changes, you should NOT update this information except on invoices that have not yet been shipped. To maintain this type of information, you need a trigger on the customer table that looks for invoices that have not been shipped and updates those addresses automatically.
If you want to save historical versions of the client information, the correct process is to create an audit table and populate it through a trigger.
Data integrity in this case comes simply from a foreign key to the customer ID. The ID itself should never change or be allowed to be changed by the user, and should be a surrogate number such as an integer. Because you should not be changing the address information in the actual invoice (unless it has not been shipped, in which case you had better change it or the product will be shipped to the wrong place), this is sufficient to maintain data integrity. It also allows you to see where the goods were actually shipped while still looking up the current client information through the foreign key.
If you have clients that change (companies bought by other companies), you can either run a process on the server to update the customer ID of old records or create a table structure that shows which client IDs belong to a current parent ID. The first is easier to do if you aren't talking about changing millions of records.
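A rough MySQL sketch of the two triggers described above; the table and column names (customer, invoice, a shipped flag, customer_audit) are assumptions, not taken from the question:

    DELIMITER //
    CREATE TRIGGER trg_customer_after_update
    AFTER UPDATE ON customer
    FOR EACH ROW
    BEGIN
        -- Keep the address current only on invoices that have not shipped yet.
        UPDATE invoice
           SET client_name    = NEW.name,
               client_address = NEW.address,
               client_contact = NEW.contact
         WHERE customer_id = NEW.customer_id
           AND shipped = 0;

        -- Audit trail: save the previous version of the customer row.
        INSERT INTO customer_audit (customer_id, name, address, contact, changed_at)
        VALUES (OLD.customer_id, OLD.name, OLD.address, OLD.contact, NOW());
    END //
    DELIMITER ;

The audit insert is one way to get the version history asked about in question 2 without changing how invoices are bound.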
"This is a business case where data mnust be denormalized to preserve historical records of what was shipped where. His design is not incorrect."
Sorry for adding this as a new response, but the "add comment" button still doesn't show.
"His design" is indeed not incorrect ... because it is normalized !!!
It is normalized because it is not at all times true that the address corresponding to an invoice functionally depends on the customer ID exclusively.
So: normalization, yes, I do think so. Not that normalization is the only issue involved here.
I'm not completely clear on what you are getting at, but I think you want to read up on normalization, available in many books on relational databases and SQL. I think what you will end up with is two tables connected by a foreign key, but perhaps some soul-searching per previous sentence will help you clarify your thoughts.
I have a table as below:
Account no.   Login Name   Numbering
1234          rty234       1
1234          bhoin1       1
3456          rty234       2
3456          0hudp        2
9876          cfrdk        3
From the table above, you can see that rty234 and bhoin1 registered the same account no. of 1234, so I know that rty234 and bhoin1 are related, and I numbered them as 1. The Numbering field was assigned based on the account no.
Then I found that rty234 also registered another account no. of 3456, and the same account no. was registered by 0hudp as well. Thus, I concluded that rty234, bhoin1 and 0hudp are related. Therefore, I want to renumber the third and fourth rows to 1. If rows are not further related, the numbering just stays the same. How can I achieve that using MySQL?
The expected output will be as follows:
Account no.   Login Name   Numbering   New_Numbering
1234          rty234       1           1
1234          bhoin1       1           1
3456          rty234       2           1
3456          0hudp        2           1
9876          cfrdk        3           3
You need to understand how to design a relational database.
These groupings that you want to make with the New_Numbering field should be done at the time the accounts are registered. I see two pieces of arbitrary information that need to be tracked: account number and login name. It seems the people registering an account can effectively type whatever they want here (perhaps account numbers must be numeric, but that detail doesn't matter).
What you want here is one account which can have multiple account numbers associated with it, and multiple logins. I would also assume that future development may add more to this, for example - why do people need multiple logins? Maybe different people are using them, or different applications. Presumably, we could collect additional information about the login names that stores additional details about each login. The same could be said about account numbers - certainly they contain more detail than just an account number.
First, you need one main login table.
You describe rty234 and bhoin1 as if they are unique people. So make this a login_name column which is a unique index in a login table. This table should have an auto-increment login_id as the primary key. Probably this table also has a password field and additional information about that person.
Second, create an account table.
After creating their login, make them register an account with that login. Make this a two-step process. When they offer a new account number, create a record for it in the account table with additional identifying information that only the account-holder would know. Somehow you have to validate that this is actually their account in order to create this record, I would think. This table would also contain an auto-incremented primary key called account_id in addition to account_no and probably other details about the account.
Third, create a login_account table.
Once you validate that a login actually should have access to an account, create a record here. This should contain a login_id and an account_id which connects these two tables. Additionally, it might be good to include the information provided which shows that this login should have access to this account.
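Putting those three tables together, a minimal MySQL sketch might look like this (any columns beyond the ones mentioned above are assumptions):

    CREATE TABLE login (
        login_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        login_name VARCHAR(64)  NOT NULL,
        password   VARCHAR(255) NOT NULL,
        UNIQUE KEY uq_login_name (login_name)
    );

    CREATE TABLE account (
        account_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        account_no VARCHAR(32)  NOT NULL,
        UNIQUE KEY uq_account_no (account_no)
    );

    -- One row per validated login/account pair (a many-to-many link).
    CREATE TABLE login_account (
        login_id   INT UNSIGNED NOT NULL,
        account_id INT UNSIGNED NOT NULL,
        PRIMARY KEY (login_id, account_id),
        FOREIGN KEY (login_id)   REFERENCES login (login_id),
        FOREIGN KEY (account_id) REFERENCES account (account_id)
    );

The login_account table is what lets one login hold several accounts and one account be shared by several logins, which is exactly the relationship in your sample data.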
Now, when you want to query this data, you can find groups of data that have the same login_id or account_id, or even that share either a login or an account with a specific registration. Beyond that, it gets hairy to do in an SQL query. So if you really want to be able to go through the data and see who is in the same organization or something, because they share either a login or an account with the same group, you have to have some sort of script.
Create an organization table.
This table should contain an organization_id so you can track it, but probably once you identify the group you'll want to add a name or additional notes, or link it to additional functionality. You can then also add this organization_id field to the login or account tables, so you can fill them once you know the organization. You have to think about if it's possible for two organizations to share accounts, and maybe there's a more complicated design necessary. But I'm going to keep it simple here.
Your script should load up all of the login_id and account_id values and cache them somewhere. Then go through them all and if they have an organization_id, put their login_id or account_id in a hashmap with the value as the organization_id. Then load up all of the login_account records. If either the login_id or account_id has an organization_id in its hashmap, then add the other to its hashmap with the same organization_id. (if there's already one there, it would violate the simple organization uniqueness assumption I made, but this is where you would handle complexity - so I would just throw an exception and see if it happens when I run the script)
Hopefully this is enough example to get you started. When you properly design a database like this, you allow the information to connect naturally. This makes column additions and future updates much easier. Good luck!
What is the "proper" (most normalized?) way to store requests in the database? For example, a user submits an article. This article must be reviewed and approved before it is posted to the site.
Which is the more proper way:
A) Store it in the Articles table with an "Approved" field which is either 0, 1, or 2 (denied, approved, pending)
OR
B) Have an ArticleRequests table which has the same fields as Articles, and upon approval, move the row data from ArticleRequests to Articles.
Thanks!
Since every article is going to have an approval status, and each time an article is requested you're very likely going to need to know that status - keep it inline with the table.
Do consider calling the field ApprovalStatus, though. You may want to add a related table to contain each of the statuses unless they aren't going to change very often (or ever).
EDIT: Reasons to keep fields in related tables are:
If the related field is not always applicable, or may frequently be null.
If the related field is only needed in rare scenarios and is better described by using a foreign key into a related table of associated attributes.
In your case those above reasons don't apply.
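A minimal sketch of option A with a status lookup table; the exact names and types here are assumptions:

    CREATE TABLE ApprovalStatus (
        ApprovalStatusId TINYINT UNSIGNED NOT NULL PRIMARY KEY,
        Name             VARCHAR(20)      NOT NULL
    );

    INSERT INTO ApprovalStatus (ApprovalStatusId, Name)
    VALUES (0, 'Denied'), (1, 'Approved'), (2, 'Pending');

    CREATE TABLE Articles (
        ArticleId        INT UNSIGNED     NOT NULL AUTO_INCREMENT PRIMARY KEY,
        Title            VARCHAR(255)     NOT NULL,
        Body             MEDIUMTEXT       NOT NULL,
        ApprovalStatusId TINYINT UNSIGNED NOT NULL DEFAULT 2,  -- new articles start as pending
        FOREIGN KEY (ApprovalStatusId) REFERENCES ApprovalStatus (ApprovalStatusId)
    );

    -- The public site only ever selects approved articles.
    SELECT ArticleId, Title FROM Articles WHERE ApprovalStatusId = 1;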
Definitely do 'A'.
If you do B, you'll be creating a new table with the same fields as the other one and that means you're doing something wrong. You're repeating yourself.
I think it's better to store the data in the main table with a specific status, because then it isn't necessary to move data between tables when an article is approved, and the article appears on the site at the same moment. If you don't want to keep disapproved articles, you can create a cron script that removes the unnecessary data or moves it to an archive table. That way you put less load on your database, because you can pick a suitable time to remove old articles, for example at night.
Regarding the concern about filtering on approval status in every query: if you are planning to run a very popular site with heavy search or listing load, you will use a standalone search server such as Sphinx or Solr (MySQL is not a good solution for these purposes), and you will only index data with status = 'Approved'. Delta indexing helps you keep that data up to date.
Assuming a system similar to Netflix where members create a wish list of movies and, based on their type of plan, one, two, or more of those movies in their list turn into orders, which one of the following schemas makes more sense?
A controls table storing the following columns:
controls(memberid, currentMoviesAtHome, moviesAtHomeLimit, currentMonthlyMovies, monthlyMoviesLimit)
The user does not actually decide when the order is created as that depends on their account controls. A daily function will go through the customers and their controls and choose ones where currentMoviesAtHome < moviesAtHomeLimit AND currentMonthlyMovies < monthlyMoviesLimit ...
A separate accounts table linked to a plans table:
accounts(memberid, planid, currentMoviesAtHome, currentMonthlyMovies)
plans(planid, moviesAtHomeLimit, monthlyMoviesLimit)
The second option, having the ACCOUNTS and PLANS tables, is normalized so it would be my recommendation.
Additionally, these tables:

MOVIES

WISHLIST
    movie_id (primary key, foreign key to MOVIES.movie_id)
    account_id (primary key, foreign key to ACCOUNTS.account_id)
    is_onsite
The is_onsite column would be a boolean indicating whether the movie has been sent to the client; if it has, set the value to 1. Sum it to know whether the account is at or under its plan limit. When videos are returned, only delete the rows that have is_onsite set to 1.
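A sketch of how those tables could fit together in MySQL; the names follow the question and answer, everything else (types, extra columns) is an assumption:

    CREATE TABLE plans (
        planid             INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        moviesAtHomeLimit  INT UNSIGNED NOT NULL,
        monthlyMoviesLimit INT UNSIGNED NOT NULL
    );

    CREATE TABLE accounts (
        memberid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        planid   INT UNSIGNED NOT NULL,
        FOREIGN KEY (planid) REFERENCES plans (planid)
    );

    CREATE TABLE movies (
        movie_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        title    VARCHAR(255) NOT NULL
    );

    CREATE TABLE wishlist (
        movie_id   INT UNSIGNED NOT NULL,
        account_id INT UNSIGNED NOT NULL,
        is_onsite  TINYINT(1)   NOT NULL DEFAULT 0,
        PRIMARY KEY (movie_id, account_id),
        FOREIGN KEY (movie_id)   REFERENCES movies (movie_id),
        FOREIGN KEY (account_id) REFERENCES accounts (memberid)
    );

    -- Movies currently at home per account, compared with the plan limit.
    SELECT a.memberid,
           COALESCE(SUM(w.is_onsite), 0) AS currentMoviesAtHome,
           p.moviesAtHomeLimit
      FROM accounts a
      JOIN plans p         ON p.planid = a.planid
      LEFT JOIN wishlist w ON w.account_id = a.memberid
     GROUP BY a.memberid, p.moviesAtHomeLimit;

Deriving the current counts from wishlist rather than storing them in accounts avoids keeping two copies of the same fact in sync.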
"A daily function will go through the customers and their controls and choose ..."
This doesn't answer your question but I thought I'd mention that your design is suboptimal. Rather than polling, as you describe above, you're much better off deciding what to do on-demand; that is, there will obviously be a time in your application's use where the limit values will be updated. What you should do is fire some kind of event at that time and consume the event that will decide whether or not to send out another movie.
Polling on a daily basis will not scale.
Firing and handling an event will not only be faster but it will be easier to maintain in the long run. Good luck.