I'm puzzled and looking for a way out here. I will appreciate any help:
I am sending notifications from a server to Android devices using GCM. In MySQL, I have a User Table (UT) with user ID, user data and GCM registration ID. I also have a User Notifications Table (UNT) in which I store the notification types that each user is registered for. This table includes the user ID and the notification type ID.
When sending the notification, I need to go through the UNT and build an array of all user IDs that are registered for this type of notification. Then I need to go through the UT, get the GCM registration ID for each user, and send the notification.
DB design-wise, I believe that this is the right way to do it. However, in notification sending, speed is a major issue if I want a million users to get the notification a few seconds after sending it. Going through 2 tables significantly increases the processing time (I measured 47 seconds for 1 million users when going through both tables compared to 17 seconds when going through 1 table).
The question is: would it be right to also store the GCM registration ID in the UNT so I won't have to go through the UT? Again, DB design-wise it is incorrect, but GCM-wise it might be the best solution.
If you know of additional methods to solve this issue, I'll be happy to hear about it.
Thank you
You can always decide to hold data redundantly. Yes, this means denormalizing data, but it is something that is often done when you need quick access to a lot of data, in data warehouses for instance.
The DBMS even supports this with ON UPDATE CASCADE. But the GCM registration ID must be unique in the table the foreign key references.
So either it is unique in UT: then just add the field to UNT, fill it, and create the foreign key with the cascade option.
Or it is not unique in UT: then you need a GCM table (which you should have anyhow in that case) and the foreign key goes from UNT to GCM. (But in this case you would have to think about whether you really need a user notification table, a GCM notification table, or both.)
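The first case (registration ID unique in UT) can be sketched as follows. This is only an illustration under assumptions: SQLite (via Python's sqlite3) stands in for MySQL, and the table and column names (ut, unt, gcm_reg_id, notification_type_id) are made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL here; all table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE ut (
    user_id    INTEGER PRIMARY KEY,
    gcm_reg_id TEXT NOT NULL UNIQUE          -- must be unique to be referenced
);
CREATE TABLE unt (
    user_id              INTEGER NOT NULL REFERENCES ut(user_id),
    notification_type_id INTEGER NOT NULL,
    gcm_reg_id           TEXT NOT NULL       -- redundant copy of ut.gcm_reg_id
        REFERENCES ut(gcm_reg_id) ON UPDATE CASCADE,
    PRIMARY KEY (user_id, notification_type_id)
);
INSERT INTO ut VALUES (1, 'reg-aaa'), (2, 'reg-bbb');
INSERT INTO unt VALUES (1, 7, 'reg-aaa'), (2, 7, 'reg-bbb'), (2, 9, 'reg-bbb');
""")


def recipients(conn, notification_type_id):
    """Return the registration IDs for one notification type, reading UNT only."""
    rows = conn.execute(
        "SELECT gcm_reg_id FROM unt "
        "WHERE notification_type_id = ? ORDER BY user_id",
        (notification_type_id,),
    )
    return [reg_id for (reg_id,) in rows]


# A device re-registers: only UT is updated, the cascade refreshes the copy in UNT.
conn.execute("UPDATE ut SET gcm_reg_id = 'reg-ccc' WHERE user_id = 2")
print(recipients(conn, 7))
```

The send path now touches a single table, while the cascade keeps the redundant column in step with UT.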
I am working on a little package using PHP and MySQL to handle entries for events. After completing an entry form the user will see all his details on a page called something like website.com/entrycomplete.php?entry_id=15 where the entry_id is a sequential number. Obviously it will be laughably easy for a nosey person to change the entry_id number and look at other people's entries.
Is there a simple way of camouflaging the entry_id? Obviously I'm not looking to secure the Bank of England so something simple and easy will do the job. I thought of using MD5 but that produces quite a long string so perhaps there is something better.
Security through obscurity is no security at all.
Even if the id's are random, that doesn't prevent a user from requesting a few thousand random id's until they find one that matches an entry that exists in your database.
Instead, you need to secure the access privileges of users, and disallow them from viewing data they shouldn't be allowed to view.
Then it won't matter if the id's are sequential.
If the users do have some form of authentication/login, use that to determine if they are allowed to see a particular entry id.
If not, instead of using a url parameter for the id, store it in and read it from a cookie. And be aware that this is still not secure. An additional step you could take (short of requiring user authentication) is to cryptographically sign the cookie.
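Signing the cookie could look roughly like the sketch below. It is written in Python rather than PHP purely for illustration, and SECRET_KEY, sign_entry_id and read_entry_id are hypothetical names. The idea is that the server appends an HMAC of the entry id, so a visitor who edits the id cannot produce a matching signature.

```python
import hashlib
import hmac

# Hypothetical server-side secret; it must never be sent to the client.
SECRET_KEY = b"replace-with-a-long-random-server-side-secret"


def _mac(value: str) -> str:
    """HMAC-SHA256 of the value, hex encoded."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()


def sign_entry_id(entry_id: int) -> str:
    """Build the cookie value '<entry_id>.<signature>'."""
    value = str(entry_id)
    return f"{value}.{_mac(value)}"


def read_entry_id(cookie: str):
    """Return the entry id if the signature matches, otherwise None."""
    value, sep, mac = cookie.partition(".")
    if not sep:
        return None  # no signature part at all
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(mac.encode(), _mac(value).encode()):
        return None
    return int(value)


cookie = sign_entry_id(15)
print(read_entry_id(cookie))                        # the original id
print(read_entry_id("16." + cookie.split(".")[1]))  # tampered id is rejected
```

Note that this only stops tampering; anyone who copies the whole cookie can still replay it, which is why proper authentication remains the better option.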
A better way to implement this is to show only the records that belong to that user. Say id is the unique identifier for each user. Store both entry_id and id in your table (say the table name is entries).
Then, when the user requests a record, add another condition to the MySQL query like this:
select * from entries where entry_id=5 and id=30;
So if entry_id 5 does not belong to this user, it will not have any result at all.
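A minimal sketch of that ownership check, assuming an entries table shaped as described above (the details column and the fetch_entry helper are invented for the example, and SQLite stands in for MySQL). The user id must come from the server-side session, not from the URL, and both values are passed as bound parameters rather than concatenated into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (
    entry_id INTEGER PRIMARY KEY,
    id       INTEGER NOT NULL,   -- the owning user's id
    details  TEXT
);
INSERT INTO entries VALUES (5, 30, 'entry of user 30'), (6, 31, 'entry of user 31');
""")


def fetch_entry(conn, entry_id, user_id):
    """Return the entry only if it belongs to user_id, otherwise None."""
    return conn.execute(
        "SELECT entry_id, details FROM entries WHERE entry_id = ? AND id = ?",
        (entry_id, user_id),
    ).fetchone()


print(fetch_entry(conn, 5, 30))  # own entry is returned
print(fetch_entry(conn, 6, 30))  # someone else's entry yields no row
```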
As for preventing the user from changing his own id, you can use JWT tokens. Issue a token on login and send it with every call. The back end then verifies the token's signature and reads the user's actual id from it.
What is the best-practice for maintaining the integrity of linked data entities on update?
My scenario
1. I have two entities, "Client" and "Invoice" (Client is a definition and Invoice is a transaction).
2. After issuing many invoices to the client, it happens that the client information needs to be changed, e.g. his billing address/location or business name changed. It's normal that the users must be able to update the client information to keep the integrity of the data in the system.
3. In the invoice (the transaction entity) I don't store just the client id but also all the client information related to the invoice, like client name, address and contact, and that's a well-known approach for storing data in transaction entities.
4. If the user creates a new invoice, the new client information will be stored in the invoice record along with the same client id (very obvious!).
My Questions
1. Is it okay to bind the "client" data entity from different locations for the insert and the update? [Explanation: if I follow the approach from steps 1-4, I have to bind the client entity from the client table when creating a new invoice, but when updating/printing the invoice I have to bind the client entity from the invoice table, otherwise the data won't be consistent. So how can I keep the data integrity without creating spaghetti code in the DAL to handle these custom data-binding requirements?]
2. I came across a system that saved all previous versions of an entity's data before each update ("keeping a history of all versions"). If I want to use the same method to avoid the custom binding, how can I do this in terms of database design using MySQL? [Explanation: some invoices were created with version 1.0 of the client, then the client info was updated and its version became 1.1, and new invoices were created with the latest version. So is it good to follow this methodology? And how should I design my entities/tables to fulfil the requirements of entity versioning and binding?]
3. Can you recommend any book or reference that can point me in the right direction?
Thanks,
What you need to do is leave the table the way it is. You are correct: you should be storing the customer information in the invoice as a history of where the items were shipped. When the customer information changes, you should NOT update it in the invoices, except for any invoices which have not yet been shipped. To maintain this, you need a trigger on the customer table that looks for invoices that have not been shipped and updates those addresses automatically.
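Such a trigger might look like the sketch below. It uses SQLite through Python so that it runs anywhere; MySQL trigger syntax differs slightly (for example it needs FOR EACH ROW), and the customer/invoice columns, including the shipped flag, are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    address     TEXT NOT NULL
);
CREATE TABLE invoice (
    invoice_id   INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
    ship_address TEXT NOT NULL,              -- copy taken when invoicing
    shipped      INTEGER NOT NULL DEFAULT 0  -- hypothetical status flag
);

-- Propagate an address change only to invoices that have not shipped yet.
CREATE TRIGGER customer_address_changed
AFTER UPDATE OF address ON customer
BEGIN
    UPDATE invoice
       SET ship_address = NEW.address
     WHERE customer_id = NEW.customer_id AND shipped = 0;
END;

INSERT INTO customer VALUES (1, 'Old Street 1');
INSERT INTO invoice VALUES (100, 1, 'Old Street 1', 1),  -- already shipped
                           (101, 1, 'Old Street 1', 0);  -- still open
UPDATE customer SET address = 'New Road 2' WHERE customer_id = 1;
""")

rows = conn.execute(
    "SELECT invoice_id, ship_address FROM invoice ORDER BY invoice_id"
).fetchall()
print(rows)
```

The shipped invoice keeps the address it was actually sent to, while the open one follows the customer record.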
If you want to save historical versions of the client information, the correct process is to create an audit table and populate it through a trigger.
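As a rough illustration of the audit-table idea (again with SQLite standing in for MySQL, and with client, client_audit and their columns being assumed names), a trigger can copy the old row into the audit table every time the client record is updated:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (
    client_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    address   TEXT NOT NULL
);
CREATE TABLE client_audit (
    audit_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    client_id  INTEGER NOT NULL,
    name       TEXT NOT NULL,
    address    TEXT NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Keep the previous version of the row whenever a client is updated.
CREATE TRIGGER client_keep_history
AFTER UPDATE ON client
BEGIN
    INSERT INTO client_audit (client_id, name, address)
    VALUES (OLD.client_id, OLD.name, OLD.address);
END;

INSERT INTO client VALUES (1, 'Acme', 'Old Street 1');
UPDATE client SET address = 'New Road 2' WHERE client_id = 1;
UPDATE client SET name = 'Acme Ltd' WHERE client_id = 1;
""")

history = conn.execute(
    "SELECT name, address FROM client_audit ORDER BY audit_id"
).fetchall()
current = conn.execute("SELECT name, address FROM client").fetchone()
print(history, current)
```

The application code never has to remember to write history; every update path goes through the trigger.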
Data integrity in this case is simply maintained through a foreign key to the customer id. The id itself should never change or be allowed to be changed by the user, and it should be a surrogate number such as an integer. Because you should not be changing the address information in the actual invoice (unless it has not been shipped, in which case you had better change it or the product will be shipped to the wrong place), this is sufficient to maintain data integrity. This also allows you to see where the stuff was actually shipped but still look up the current info about the client through the foreign key.
If you have clients that change (companies bought by other companies), you can either run a process on the server to update the customer id of old records or create a table structure that shows which client ids belong to a current parent id. The first is easier to do if you aren't talking about changing millions of records.
"This is a business case where data must be denormalized to preserve historical records of what was shipped where. His design is not incorrect."
Sorry for adding this as a new response, but the "add comment" button still doesn't show.
"His design" is indeed not incorrect ... because it is normalized !!!
It is normalized because it is not at all times true that the address corresponding to an invoice functionally depends on the customer ID exclusively.
So : normalization, yes I do think so. Not that normalization is the only issue involved here.
I'm not completely clear on what you are getting at, but I think you want to read up on normalization, available in many books on relational databases and SQL. I think what you will end up with is two tables connected by a foreign key, but perhaps some soul-searching per previous sentence will help you clarify your thoughts.
I am developing an app with PhoneGap and have been storing the user id and user level in local storage, for example:
window.localStorage["userid"] = "20";
This is populated once the user has logged in to the app. It is then used in ajax requests to pull in their information and things related to their account (some of it quite private). The app is also being used in a web browser, as I am using the exact same code for the web. Is there a way this can be manipulated? For example, could a user change the value in order to get back info that isn't theirs?
If, for example, another app in their browser stores the same key "userid", it will overwrite mine, and then they will get someone else's data back in my app.
How can this be prevented?
Before going further into attack vectors: storing this kind of sensitive data on the client side is not a good idea. Use a token instead, because every piece of data stored on the client side can be spoofed by attackers.
Your concerns are right. A possible attack vector here is Insecure Direct Object Reference. Let me show an example.
You are storing the user ID on the client side, which means you cannot trust that data anymore.
window.localStorage["userid"] = "20";
Attackers can change that value to anything they want. They will probably change it to a value lower than 20, because 20 most likely comes from a column configured as auto-increment, which means there should be valid users with userid 19, 18, or lower.
Let me assume that your application has a module for getting products by userid. The backend query would then be similar to the following one.
SELECT * FROM products WHERE owner_id = 20
When attackers change that value to something else, they will manage to get data that belongs to someone else. They may also be able to remove or update data that belongs to someone else.
The possible attack vectors really depend on your application and its features. As I said before, you need to figure this out and not expose sensitive data like the userID.
Using a token instead of the userID will stop such attempts. The only thing you need to do is create one more column named "token" and use it instead of userid. (Don't forget to generate long and unpredictable token values.)
SELECT * FROM products WHERE owner_id = 'iZB87RVLeWhNYNv7RV213LeWxuwiX7RVLeW12'
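One possible shape for the token approach, sketched in Python with SQLite standing in for the real backend. The users/products schema and the issue_token and products_for_token helpers are assumptions for illustration; the point is that the token is generated with a cryptographically secure source and that the server resolves it to a user, so the client never supplies a guessable id.

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    token   TEXT UNIQUE              -- hypothetical extra column
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    owner_id   INTEGER NOT NULL REFERENCES users(user_id),
    name       TEXT NOT NULL
);
INSERT INTO users (user_id) VALUES (19), (20);
INSERT INTO products VALUES (1, 19, 'gadget'), (2, 20, 'widget');
""")


def issue_token(conn, user_id):
    """Generate a long, unpredictable token at login and store it for the user."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
    conn.execute("UPDATE users SET token = ? WHERE user_id = ?", (token, user_id))
    return token


def products_for_token(conn, token):
    """Resolve the token to its user on the server and return that user's products."""
    rows = conn.execute(
        "SELECT p.name FROM products p "
        "JOIN users u ON u.user_id = p.owner_id "
        "WHERE u.token = ? ORDER BY p.product_id",
        (token,),
    )
    return [name for (name,) in rows]


token = issue_token(conn, 20)
print(products_for_token(conn, token))  # user 20's products
print(products_for_token(conn, "19"))   # guessing a small number finds nothing
```

In practice the token would also need an expiry and a way to revoke it on logout.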
When you want to alert a user of something once (one time notes about new features, upcoming events, special offers, etc.), what's the best way to do it?
I'm mainly concerned with the data representation, but if there are more issues to think about please point them out. This is my first time approaching this particular problem.
So my thoughts so far ...
You could have a users, a messages, and a seen/acknowledged messages table. When the user acknowledges the messages, we have a new entry in the seen table with a user id & message id pair.
However, the seen table will grow rapidly with the number of users and messages. At some point, this would become unwieldy (any insight when that would be for a single mysql db on a single server?).
Would it be better to just create 1 seen table per message and maybe end up with 20-30 such additional tables to start? Not really a problem. It just comes with the added nuisance of having to create a new table every time there is a new message (of course, that would be automated in the code - still a little more coding).
This is for a project that has 2-3K current users, but the hopes are to grow that to 10K over the next year, and of course, we're looking beyond that, too ...
Edit:
I'm not enthusiastic about the currently top-voted method at all. The proposal seems to be to prepopulate a messages table and delete messages as they are seen. This seems to be a lot more work. You not only have to add your entire user list each time you add a new message, you also have to add all the messages for a new user each time you add a new user, which is separate logic.
On top of that, the record of a message being "seen" is actually the absence of a record. That does not seem right. Plus, if you later decide to track when messages were seen with a simple timestamp, you'd have to rewrite a lot of code, and other code becomes unusable.
Lastly, could someone tell me why it's so absolutely horrible to add new tables to the database? Doesn't this happen all the time when a new feature is added? Take any CMS: Joomla or Wordpress for example. When you add new plugins, you are creating tables dynamically. So it has to be more nuanced and contextual than "don't do it". What are the pitfalls and what are the circumstances under which you don't do it or it's okay to do?
I can see that you might say: be careful about creating new tables on a production server and make sure it's been well tested. But ultimately, you're just adding an empty table.
This may require an extended answer, so if anyone knows of any articles, please post them.
Edit: Gabriel Sosa gave a nice fleshed-out example of his messages table, and I'll simply create a seen table similar to what I originally posted, although with a timestamp column too. Thanks!
You could have the unseen messages listed in a table, and once the message is displayed you delete that row from the table. And you could also delete rows after X weeks, perhaps, whether or not the users see those messages. That would keep the table from growing unbounded. I'm imagining tables like this:
messages
--------
type PRIMARY KEY
text TEXT
unseenMessages
--------------
id PRIMARY KEY
messageType FOREIGN KEY
user FOREIGN KEY
expirationDate DATE
This unseenMessages table would hold all of the messages in your system, once per message per user. When a user loads a page you check if they have any entries in this table. If so you display those messages and then delete them from the table. Think of it like a message "inbox".
Also, I would not do anything that involves dynamic table creation. You should never, ever* create tables on the fly. Ever.
* Except temporary tables, of course.
All of your messages should be stored in one table, or at least a fixed number of predefined tables. It is a cardinal database sin to create tables on the fly. The same goes for adding and removing columns on the fly. You just don't change the database schema dynamically. You don't do that in polite society. If you think you have to, you haven't designed the database correctly.
The programming analogue is the eval() function: it's just one of those things that's almost never a good idea. And in all fairness, eval() is okay in certain situations. Creating tables on the fly never is.
Your volume is not intimidating for a modern RDBMS. Keep in mind that there are many 100's of MILLIONS of records sitting in Twitter's MySQL database and other SQL Server and Oracle databases.
I can see two ways to actually solve this
Set a distinct cookie for the particular message, which is good if these messages are a rarity
Create a couple tables to hold the message details
Messages - would hold the message definition including the message text (or HTML in your case?), as well a message status of active/inactive.
User Messages - a cross-reference table that includes a row if a user has viewed the message. When the user sees and acknowledges a message, you could insert a row into this table.
To determine whether a user should see the message or not, you would query this table and the active messages with the user's ID. If a result is returned, then you should bypass the message; otherwise display it.
I think this provides you an opportunity to scale well into the future, since the "User Messages" table would only hold integer association keys between the User/Account table and the Message table. You could also log the user's disposition in the User Messages table (acknowledged, viewed, bypassed, etc.).
Let me know if this isn't clear, and I can try to explain better or provide a diagram. I'm sure there are some other patterns for doing this as well. Bank of America flashes these things after I log into my online accounts about once every month or so.
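The two-table layout described above could be queried roughly as follows. This is a sketch with SQLite standing in for MySQL; the table names, the active flag, the seen_at column and both helper functions are assumptions for the example. A single anti-join returns the active messages for which the user has no row yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    message_id INTEGER PRIMARY KEY,
    text       TEXT NOT NULL,
    active     INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE user_messages (
    user_id    INTEGER NOT NULL,
    message_id INTEGER NOT NULL REFERENCES messages(message_id),
    seen_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, message_id)
);
INSERT INTO messages VALUES (1, 'New feature', 1),
                            (2, 'Old offer', 0),
                            (3, 'Upcoming event', 1);
""")


def pending_messages(conn, user_id):
    """Active messages that this user has not acknowledged yet."""
    return conn.execute(
        "SELECT m.message_id, m.text FROM messages m "
        "LEFT JOIN user_messages um "
        "  ON um.message_id = m.message_id AND um.user_id = ? "
        "WHERE m.active = 1 AND um.message_id IS NULL "
        "ORDER BY m.message_id",
        (user_id,),
    ).fetchall()


def acknowledge(conn, user_id, message_id):
    """Record that the user has seen the message; repeating it is harmless."""
    conn.execute(
        "INSERT OR IGNORE INTO user_messages (user_id, message_id) VALUES (?, ?)",
        (user_id, message_id),
    )


acknowledge(conn, 1, 1)
print(pending_messages(conn, 1))  # only the message user 1 has not seen
print(pending_messages(conn, 2))  # user 2 still sees both active messages
```

New users and new messages need no extra bookkeeping here, because "unseen" is simply the absence of a row.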
This was my approach:
CREATE TABLE IF NOT EXISTS `system_user_messages` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`section` enum('home','account','all') NOT NULL DEFAULT 'home',
`message` varchar(250) NOT NULL,
`message_type` varchar(25) NOT NULL,
`show` tinyint(4) NOT NULL DEFAULT '1',
`allow_dismiss` tinyint(4) NOT NULL DEFAULT '1',
`created_on` datetime NOT NULL,
`dismissed_on` datetime DEFAULT NULL,
`show_order` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx-user_id` (`user_id`),
KEY `idx-section` (`section`),
KEY `message_type` (`message_type`)
);
I added allow_dismiss because you may not want to allow the user to dismiss a given message. In my case, when a user's CC is about to expire we don't allow dismissal, and the system removes the message once the user updates the CC information. On the other hand, you may also want to show the message only in certain areas of your site.
I posted the SQL because I think it is clearer this way. I know there are a lot of improvements to make to this schema, but maybe it gives you an idea.
"You could have a users, a messages, and a seen/acknowledged messages table. When the user acknowledges the messages, we have a new entry in the seen table with a user id & message id pair."
That seems pretty reasonable.
"However, the seen table will grow rapidly with the number of users and messages. At some point, this would become unwieldy (any insight when that would be for a single mysql db on a single server?)."
Based on the number of users you're talking about, I wouldn't bet on that being a concern unless you send a lot of messages often. If it makes sense to retire messages, you could have a disabled flag on messages and a background job to remove rows from the seen table (warehouse it!) that correspond to disabled messages.
Performance-wise, a lot of it is riding on your server's specs and what else it is doing. You can get a lot of performance out of a well-indexed, simple table even with lots (100s of thousands+) of rows and some configuration tweaks.
John Kugelman's solution (1 entry per user per message) makes more sense if you want to control who sees messages on a person/group level.
The techniques aren't mutually exclusive either.
Update:
Agree with no dynamic table creation. :)
How important are the messages? If they are truly just a view-once "hey, here's what's new", does it matter whether you know that everyone sees them? (But maybe you want to measure effectiveness?) Why not just display them for a week or a month, or whatever timeframe your average user logs in within? Any key site updates can be put on a separate change log page for those really interested. If there are special offers, then it's just time-limited, and why wouldn't users want to be reminded of it each time they log in? Or you could have part of the home page (or a link) dedicated to what's new on the site and record in a cookie whether the user has clicked it since it was last updated, and if not, highlight it for that user. Sorry for the rambling response.