Storing Socket.io open sockets' IDs in a MySQL db

** Problem:
For a social media app (think of Facebook), it's necessary to have a real-time notification system. While researching this, I came across many options, including how Facebook itself handles it, which is by using the old long-polling hack. I would like instead to use Socket.io, but the simple implementations I found on the internet involve systems where you broadcast to all users, or to users in, say, some chat rooms.
** Suggested solution:
For my case, I thought of handling it in the following manner:
The user connects to the app; a new socket/connection is established and the corresponding socket id is stored in a MySQL (user_id, socket_id) 'open_sockets' table.
When a user, say, likes a post, they are automatically registered as a subscriber in a MySQL (post_id, user_id) 'subscriptions' table.
Now, when the post gets updated by someone else replying to it, liking it, etc., I query all the subscribers' ids from the aforementioned 'subscriptions' table, and then query their corresponding socket_id values from the 'open_sockets' table.
Finally, I broadcast to the clients with the retrieved socket ids.
This system could get complicated (in nature and requirements), since it involves a round trip to the database to retrieve the right socket ids on every update.
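To make the idea concrete, here is a minimal sketch of that flow in Node.js. The 'open_sockets' and 'subscriptions' table names come from the steps above; the mysql2 client, the handshake auth field and the event names are assumptions of mine, not part of the question.

```
// Sketch only: socket.io + MySQL mapping of users to open sockets.
const http = require('http');
const { Server } = require('socket.io');
const mysql = require('mysql2/promise');

const pool = mysql.createPool({ host: 'localhost', user: 'app', database: 'app' });
const httpServer = http.createServer();
const io = new Server(httpServer);

io.on('connection', async (socket) => {
  // Assumption: the client sends its user id in the handshake auth payload.
  const userId = socket.handshake.auth.userId;

  // Remember which socket belongs to which user.
  await pool.execute(
    'INSERT INTO open_sockets (user_id, socket_id) VALUES (?, ?)',
    [userId, socket.id]
  );

  socket.on('disconnect', async () => {
    await pool.execute('DELETE FROM open_sockets WHERE socket_id = ?', [socket.id]);
  });
});

// When a post changes, look up its subscribers' open sockets and notify them.
async function notifySubscribers(postId, payload) {
  const [rows] = await pool.execute(
    `SELECT os.socket_id
       FROM subscriptions s
       JOIN open_sockets os ON os.user_id = s.user_id
      WHERE s.post_id = ?`,
    [postId]
  );
  for (const { socket_id } of rows) {
    io.to(socket_id).emit('post:updated', payload);
  }
}

httpServer.listen(3000);
```

One thing to keep in mind with this shape: rows in 'open_sockets' go stale if the process dies before the disconnect handler runs, so some periodic cleanup would be needed.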
** Questions:
What do you think about this solution?
What would be the best way to handle such a scenario (for example, in a Facebook-like platform)?

Related

How to store socket.id across multiple servers nodejs and socket.io

What is the best way to store users' socket.id across multiple servers? Take, for example, a simple chat app: if two users on different servers are sending messages to each other, the two servers must store each user's socket id somewhere so they can pass the message from one user to the other.
Currently I am using a Redis hash to store each user's socket id (if they are online), but this doesn't work if a user has two connections (for example, if they have two tabs of the chat app open). Is the best approach to continue using Redis but restructure the data in a way that works when a user is connected twice, or would it be better to move to something like MongoDB or MySQL?
I would also like a way to expire data; for example, if a socket id has been stored for more than 24h it should be automatically deleted. I have looked into doing this with Redis, but it doesn't seem possible to expire only one field inside a hash. Is expiring data something that can be done in MySQL or MongoDB?
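One possible way to stay on Redis while covering both points (multiple connections per user and per-entry expiry) is to give every connection its own key, which can carry its own TTL, and keep a per-user set of socket ids alongside it. This is just a sketch under my own assumptions (ioredis client, invented key names), not something taken from the answer below:

```
// Sketch: one Redis key per connection so each can expire independently,
// plus a per-user set of socket ids to support multiple tabs/devices.
const Redis = require('ioredis');
const redis = new Redis();

const DAY = 24 * 60 * 60; // 24h in seconds

async function registerSocket(userId, socketId) {
  await redis.set(`socket_owner:${socketId}`, userId, 'EX', DAY); // auto-expires after 24h
  await redis.sadd(`user_sockets:${userId}`, socketId);
}

async function unregisterSocket(userId, socketId) {
  await redis.del(`socket_owner:${socketId}`);
  await redis.srem(`user_sockets:${userId}`, socketId);
}

// Return the socket ids whose per-socket key has not expired yet,
// pruning stale entries from the set as we go.
async function liveSockets(userId) {
  const ids = await redis.smembers(`user_sockets:${userId}`);
  const live = [];
  for (const id of ids) {
    if (await redis.exists(`socket_owner:${id}`)) {
      live.push(id);
    } else {
      await redis.srem(`user_sockets:${userId}`, id); // expired: clean up the set entry
    }
  }
  return live;
}
```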
Did you try socket rooms?
Check this link for rooms and namespaces.
For example, if a user has multiple connections, join all of them into a room with a unique name (maybe the userId or something), as in the sketch below.
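A minimal sketch of that suggestion, assuming socket.io v4 and that the client supplies its user id in the handshake (both assumptions, not stated in the answer):

```
const http = require('http');
const { Server } = require('socket.io');

const httpServer = http.createServer();
const io = new Server(httpServer);

io.on('connection', (socket) => {
  const userId = socket.handshake.auth.userId; // assumption: client supplies this

  // All of this user's tabs/devices end up in the same room.
  socket.join(`user:${userId}`);
});

// To reach every open connection of that user at once (across servers too,
// if a socket.io adapter such as the Redis adapter is configured):
function sendToUser(userId, message) {
  io.to(`user:${userId}`).emit('chat:message', message);
}

httpServer.listen(3000);
```

With rooms plus a cross-server adapter, the adapter takes care of routing the emit to whichever server actually holds the connection, so you no longer need to track socket ids yourself.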

If my users are stored in another database, should I duplicate them in my service that uses a SQL database?

If my users are stored in some other database, but I am building posts in my SQL database, should I create another users table?
If I did, I would be duplicating all of my users and would have to make sure this stays in sync with the other database; on the other hand, my posts tables could save space by referring to a foreign key instead of the full id string each time.
What is the recommendation? Create another users table, or just pass in the user ids to query with?
If you have a service that stores and provides information about users then other services that need that information should communicate with the User service to get it. That is, presumably, the reason the User service exists in the first place.
Depending on the volatility of the users list and requirements for changes there to be respected in the Posts service you might consider some short-term caching in the Posts service, but I certainly wouldn't persist another copy of the user list there.
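As an illustration of that short-term caching idea (the TTL, the cache shape and the fetch function are all made up for the example):

```
// Sketch: a tiny in-memory cache with a short TTL in front of the User service.
const TTL_MS = 60 * 1000;  // keep user records for one minute
const cache = new Map();   // userId -> { user, expiresAt }

// fetchUser is whatever call the Posts service makes to the User service.
async function getUser(userId, fetchUser) {
  const hit = cache.get(userId);
  if (hit && hit.expiresAt > Date.now()) return hit.user;

  const user = await fetchUser(userId);
  cache.set(userId, { user, expiresAt: Date.now() + TTL_MS });
  return user;
}
```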
There are 3 obvious solutions.
The simplest, cleanest and fastest is to use foreign keys and joins between your "posts" database and your "users" database. In this case, when you show a list of posts, you can get both the post and user data in a single query, and there's no need to keep things up to date.
The next option is to store a copy of the user data alongside your posts. This leads to entertaining failure modes - data in the user database may get out of sync. However, this is a fairly common strategy when using 3rd party authentication systems (e.g. logging on with your Google/Facebook/Github/Stack Exchange credentials). The way to make this work is to minimize the amount of data you duplicate, and have it be safe if it's out of date. For instance, a user's display name is probably okay; current bank account balance is probably not.
The final option is to store the primary key for users in your posts database, and to retrieve the user data at run time. This is less likely to lead to bugs with data getting out of sync, but it can cause performance problems - retrieving user details for 1000 posts one by one is obviously much slower than retrieving everything through a joined query.
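With the third option, the usual way around the one-by-one lookups is to collect the user ids for a page of posts and fetch them in a single batched query. A rough sketch, assuming a Node.js service with the mysql2 client (the table and column names are invented for the example):

```
const mysql = require('mysql2/promise');
const pool = mysql.createPool({ host: 'localhost', user: 'app', database: 'app' });

// Fetch the authors of a whole page of posts in one query instead of one per post.
async function attachAuthors(posts) {
  const userIds = [...new Set(posts.map((p) => p.user_id))];
  if (userIds.length === 0) return posts;

  // With pool.query (not execute), mysql2 expands an array bound to IN (?) into a value list.
  const [users] = await pool.query(
    'SELECT id, display_name FROM users WHERE id IN (?)',
    [userIds]
  );

  const byId = new Map(users.map((u) => [u.id, u]));
  return posts.map((p) => ({ ...p, author: byId.get(p.user_id) || null }));
}
```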
The choice then is "do I have a service which combines post and user data and my UI retrieves everything from that service, or do I let the UI retrieve posts, and then users for each post". That's mostly down to the application usage, and whether you can use asynchronous calls to retrieve user information. If at all possible (assuming you're building a web application), the simplest option might be to return the posts and user IDs and use Ajax requests to retrieve the user data as needed.
The CQRS approach (common to microservice architectures) provides some structure for this.

CakePHP response slow

I have a set of APIs written in CakePHP which we want to migrate to Amazon AWS.
Following is the current situation:
The website is hosted on GoDaddy on shared hosting, with a domain such as: democompany.com
The backend database is MySQL, which we access via phpMyAdmin. It has several tables, e.g. users, plans, purchases, etc.
All the APIs are written in CakePHP and we access them via the base URL:
democompany.com/cake
For example, for adding an entry in the users table, we create a JSON payload and send it via the REST API.
Now, since our user base is growing, our API response time has slowed. Sending a POST or GET request takes a long time to return a response.
We were thinking of migrating our APIs and database to Amazon AWS or any other solution. I am not very familiar with AWS, so I don't know which product would be best.
Which would be the best solution that offers fast responses and is cost-effective?
A slow MySQL database with a PHP backend can have many causes. Try these:
One of the most important things is to think about your indexes. You probably have a primary index on id with auto_increment. But if you query a lot on another column (like SELECT * FROM users WHERE email LIKE '%john%'), it is important to also set an index on the email column. Understanding how indexes work is vital if you want high-performing databases. See this post for a start: How do MySQL indexes work?
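As a concrete illustration (not from the answer): you can add the index and then use EXPLAIN to check whether a given query can actually use it. Note that a pattern with a leading wildcard, like the LIKE '%john%' example above, cannot use an ordinary B-tree index at all, whereas a prefix search like LIKE 'john%' can. The connection details below are placeholders.

```
// Sketch: add an index on the column you filter by and verify with EXPLAIN
// whether a given query is able to use it. Assumes the mysql2 library.
const mysql = require('mysql2/promise');

async function main() {
  const conn = await mysql.createConnection({ host: 'localhost', user: 'app', database: 'app' });

  await conn.query('ALTER TABLE users ADD INDEX idx_users_email (email)');

  // A prefix search can use the new index...
  const [prefixPlan] = await conn.query("EXPLAIN SELECT * FROM users WHERE email LIKE 'john%'");
  console.log(prefixPlan);

  // ...but a leading wildcard still forces a full scan, because a B-tree
  // index cannot match a pattern that starts with '%'.
  const [wildcardPlan] = await conn.query("EXPLAIN SELECT * FROM users WHERE email LIKE '%john%'");
  console.log(wildcardPlan);

  await conn.end();
}

main().catch(console.error);
```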
Another thing is the amount and complexity of your queries. Do you use many queries in one page load or only a few? Try to get as much information as possible out of one query.
Sorting data can be extremely expensive as well. Does removing the ORDER BY clause speed things up a lot? Check this out: MySQL, very slow order by
If you have looked at all of this and are sure that all your queries are running smoothly, you can look at persistent connections (re-using connections within one page load, for example), bigger servers, etc.

How did Facebook or Twitter implement their subscribe system

I'm working on an SNS-like mobile app project, where users upload their content and can see updates from their subscribed topics or friends on their homepage.
I store user content in MySQL, and build the user-specific homepage data by first querying who and what the user has subscribed to, and then querying the content table filtered with a 'WHERE userid IN (....) OR topic IN (....)' clause.
I suspect this will become quite slow when the content table piles up or when a user subscribes to tons of users or topics. Our newly released app is already gaining thousands of new users each week, and more over time. Scalability must be a concern for us right now.
So I wonder how Facebook or Twitter handle this subscription problem with their amazing number of users. Do they maintain a list for each user? I tried to search, but all I got was how to interact with Facebook or Twitter, rather than how they actually implement this feature.
I noticed that when using Facebook you see only updates, rather than history, in your feed, which means that subscribing to a new user won't dump lots of outdated content into your feed the way my current method would.
How does Facebook design its database, and how does it dispatch new content to subscribed users?
My backend is currently PHP+MySQL, and I don't mind introducing other backend technologies such as Redis or JMS and stuff if that's the way it should be done.
Sounds like you guys are still at a pretty early stage. There are any number of ways to solve this, all depending on which level of DAUs you think you'll hit in the near term, how much money you have to spend on hardware, the time you have to build it, etc.
You can try an interim table that queues up the newly introduced items, along with meta-data on what each entails (which topic, friend user_id list, etc.). Then use a queue-consumer system like RabbitMQ/Gearman to manage the consumption of this growing list and figure out who should process each item. Build the queue-consumer program in Scala or a J2EE stack like Maven/Tomcat, something that can persist. If you really want to stick with PHP, build a PHP REST API that lives in php5-fpm's memory, is managed by the FastCGI process manager, sits behind a proxy like nginx, and is triggered by curl calls at an appropriate interval from a cron-executed script.
[EDIT] - It's probably better not to use a DB for a queueing system; use a cache server like Redis instead. It outperforms a DB in many ways and it can persist to disk (look up RDB and AOF). It's not very fault tolerant in case a job fails all of a sudden, so you might lose a job record, but most likely you won't care about these crash edge cases. Also look up php-resque!
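A rough illustration of the Redis-backed queue idea (the ioredis client, the key name and the job shape are my own assumptions):

```
// Sketch: a minimal Redis list used as a work queue for fan-out jobs.
// A producer pushes a job per new post; a worker pops jobs and notifies subscribers.
const Redis = require('ioredis');

const producer = new Redis();
const worker = new Redis(); // blocking pops need their own connection

async function enqueueNewContent(postId, topicId) {
  await producer.rpush('feed_jobs', JSON.stringify({ postId, topicId }));
}

async function runWorker(handleJob) {
  for (;;) {
    // BRPOP blocks until a job is available (timeout 0 = wait forever)
    const [, raw] = await worker.brpop('feed_jobs', 0);
    await handleJob(JSON.parse(raw)); // e.g. look up subscribers and push notifications
  }
}
```

A production setup would add retries and acknowledgements on top of this; that is roughly the ground that php-resque covers on the PHP side.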
To prep for the SNS to go out efficiently, I'm assuming you're already de-normalizing the tables. I'd imagine a "user_topic" table with the topic mapped to users who subscribed to them. Create another table "notification_metadata" describing where users prefer receiving notifications (SMS/push/email/in-app notification), and the meta-data needed to push to those channels (mobile client approval keys for APNS/GCM, email addresses, user auth-tokens). Use JSON blobs for the two fields in notification_metadata, so each user will have a single row. This saves I/O hits on the DB.
Use user_id as your primary key for "notification_meta" and user_id + topic_id as the PK for "user_topic". DO NOT add an auto-increment "id" field to either; it's pretty useless in this use case (it takes up space, CPU, index memory, etc.). If both fields are in the PK, queries on user_topic will be served entirely from memory, and the only disk hit is on "notification_meta" during the JOIN.
So if a user subscribes to 2 topics, there'll be two entries in "user_topic", and each user will always have a single row in "notification_meta".
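One possible shape for those two tables, following the description above; the column types and the contents of the JSON blobs are guesses on my part:

```
// Sketch of the two tables described above; exact column types are assumptions.
const ddl = [
  `CREATE TABLE user_topic (
     user_id  BIGINT UNSIGNED NOT NULL,
     topic_id BIGINT UNSIGNED NOT NULL,
     PRIMARY KEY (user_id, topic_id)      -- composite PK, no auto-increment id
   )`,
  `CREATE TABLE notification_meta (
     user_id  BIGINT UNSIGNED NOT NULL PRIMARY KEY,  -- one row per user
     channels JSON NOT NULL,  -- e.g. {"sms": true, "push": true, "email": false}
     creds    JSON NOT NULL   -- e.g. APNS/GCM keys, email address, auth token
   )`,
];

// Apply with any MySQL client, e.g. mysql2:
//   for (const stmt of ddl) await conn.query(stmt);
```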
There are more ways to scale, like dynamically creating a new table for each new topic, sharding to different MySQL instances based on user_id, partitioning, etc. There are N ways to scale, especially with MySQL. Good luck!

How to design a secure database for mobile recharge codes

Suppose I want to keep millions of recharge codes in a separate database (named A) that has a single table of codes. I want to design another database (named B) which will be used by a web application.
I want to keep database A separate and as secure as it can be, preferably not exposed to the network, so that nobody can get access to or hack the huge amount of sensitive data.
But I also have to populate one table of database B with codes from the table in database A, as needed or as requested by the web application.
I am using MySQL and Apache Tomcat as the web server.
Can you please suggest the best and most secure way of designing the database, keeping in mind that:
1) The safety of the codes in database A is the priority.
2) The tables will contain millions of rows, so quick response is also a requirement.
I'm adding this as an answer because it is too long for a comment.
I think this is more about app design and layering than about the database design as such. In terms of DB design, you just need the tables to have indexes on all the keys you will query by. The DB access will be sub-second.
In terms of app design, I suppose your app will know when to look at table-B and when it has to retrieve from table-A.
So, the key issue is: how to access A. The simplest way would be for the app to connect to A, and read it via SQL. The problem with this is that a hacker who is on your app server could then see your connection details. You could try to obscure the connection details from app-server to A. This would be "security through obscurity" and would be something, but would not stop a good hacker.
If you're serious about control, you could have an app running on A. You can block all ways for apps from outside A to access the database on A, leaving the app on A as the sole point of access.
By its very uniqueness, the app could provide another level of obscurity. For instance, the app could insist on knowing the customer-id for whom the code is being requested, and could check this against some info (even on B). But there are better reasons to use one...
The app on A could (see the sketch after this list):
impose controls: e.g. only 1000 codes given out per hour
send alerts: e.g. email an operator if more than 500 codes have been requested in the hour
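A toy sketch of what those controls might look like in the gatekeeper app (the two thresholds come from the bullets above; the issued_codes table, the mysql2 client and everything else are invented for illustration):

```
// Sketch: hourly quota plus operator alert in the app that guards database A.
const mysql = require('mysql2/promise');
const pool = mysql.createPool({ host: 'localhost', user: 'code_gatekeeper', database: 'A' });

const HOURLY_LIMIT = 1000;   // stop handing out codes past this point
const ALERT_THRESHOLD = 500; // email an operator past this point

// sendAlert is whatever the operator-alerting mechanism ends up being (email, etc.).
async function checkQuota(sendAlert) {
  const hour = new Date().toISOString().slice(0, 13); // e.g. "2024-05-01T14"

  // issued_codes is a hypothetical audit table recording every code handed to B.
  const [[{ issued }]] = await pool.query(
    'SELECT COUNT(*) AS issued FROM issued_codes WHERE issued_hour = ?',
    [hour]
  );

  if (issued >= HOURLY_LIMIT) return { allowed: false, issued };
  if (issued >= ALERT_THRESHOLD) {
    await sendAlert(`${issued} codes already requested in hour ${hour}`);
  }
  return { allowed: true, issued };
}
```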