How to paginate roster fetch call on ejabberd - ejabberd

I am running my chat service on ejabberd. After 4-5 months with no downtime, I have hit a case where fetching the roster takes a long time for users whose roster is very large. Many sources say that ejabberd has no pagination for this, but is there any way to optimise it?

To my knowledge, there is no XMPP specification that defines roster pagination, and ejabberd does not do anything special in that regard.
What you can look into is XMPP roster versioning (https://xmpp.org/extensions/xep-0237.html), but this is different from pagination.
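
As a rough illustration of roster versioning: the client sends the roster version it has cached, and a server that supports the feature can answer with an empty result plus only the changes, instead of the full roster. The sketch below only builds the request stanza; the function name, the request id and the "ver7" value are made up for the example, and a real client library would normally do this for you.

```python
import xml.etree.ElementTree as ET

ROSTER_NS = "jabber:iq:roster"


def build_roster_get(request_id: str, cached_ver: str) -> str:
    """Build a roster request that carries the client's cached roster version.

    cached_ver is whatever version string the server sent last time;
    an empty string means the client has no cached roster yet.
    """
    iq = ET.Element("iq", {"type": "get", "id": request_id})
    ET.SubElement(iq, "query", {"xmlns": ROSTER_NS, "ver": cached_ver})
    return ET.tostring(iq, encoding="unicode")


# Hypothetical values, for illustration only.
print(build_roster_get("roster-1", "ver7"))
```

This does not shrink the first fetch of a very large roster; it only avoids repeating it on every login when nothing has changed.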

Related

storing Socket.io open sockets' ids in a MySQL db

** Problem:
A social media app (think of Facebook) needs a real-time notification system. I came across many options, including how Facebook itself handles it, which is by using the old long-polling hack. I would like to use Socket.io instead, but the simple implementations I found on the internet only broadcast to all users or to users in, say, a chat room.
** Suggested solution:
For my case I thought of handling it in the following manner:
The user connects to the app, a new socket/connection is established, and the corresponding socket id is stored in a MySQL (user_id, socket_id) 'open_sockets' table.
When a user, say, likes a post, that automatically registers them as a subscriber in a MySQL (post_id, user_id) 'subscriptions' table.
Now, when the post gets updated by someone else replying to it, liking it, etc., I query all the subscribers' ids from the aforementioned 'subscriptions' table, and then query their corresponding socket_id from the 'open_sockets' table.
I broadcast to the clients with the retrieved socket ids.
This system could get complicated, in both nature and requirements, because it has to communicate with the database to retrieve the right socket ids.
** Questions:
What do you think about this solution?
What could be the best way to handle such a scenario (for example: in a Facebook like platform)?
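
To make the lookup in the suggested solution concrete, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for MySQL. The table and column names come from the question; the function name, the sample rows, the single JOIN (instead of two separate queries) and the choice to skip the user who triggered the update are assumptions made for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the MySQL database
db.executescript("""
    CREATE TABLE open_sockets  (user_id INTEGER, socket_id TEXT,
                                PRIMARY KEY (user_id, socket_id));
    CREATE TABLE subscriptions (post_id INTEGER, user_id INTEGER,
                                PRIMARY KEY (post_id, user_id));
    -- Sample data: users 1, 2 and 3 follow post 10; user 3 is offline.
    INSERT INTO subscriptions VALUES (10, 1), (10, 2), (10, 3);
    INSERT INTO open_sockets  VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")


def sockets_to_notify(post_id: int, actor_id: int) -> list[str]:
    """Return the open socket ids of a post's subscribers, except the actor's."""
    rows = db.execute(
        """
        SELECT s.socket_id
        FROM subscriptions AS sub
        JOIN open_sockets AS s ON s.user_id = sub.user_id
        WHERE sub.post_id = ? AND sub.user_id != ?
        ORDER BY s.socket_id
        """,
        (post_id, actor_id),
    ).fetchall()
    return [row[0] for row in rows]


# User 2 replies to post 10: only user 1's two open sockets are returned.
print(sockets_to_notify(10, 2))
```

Rows in 'open_sockets' would also have to be deleted on disconnect, otherwise the table fills up with stale socket ids.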

Move information-resource stored in the database tables with two step using 'reservation'

I need to architect a database and a service. I have resources that I need to deliver to users, and the delivery takes some time or requires the user to do some more work.
These are the tables I store information into.
Table - Description
_______________________
R - to store resources
RESERVE - to reserve requested resources
HACK - to track requests that couldn't have been made with my client application (statistics)
FAIL - to track requests that can't be resolved, but where the user isn't at fault (statistics)
SUCCESS - to track successful deliveries (statistics)
The first step, when a user requests a resource:
IF (condition1 is true - the user has the right to request the resource) THEN
    IF (I've successfully RESERVE-d the resource and committed the transaction) THEN
        nothing more to do
    ELSE
        save request into FAIL
ELSE
    save request into HACK
Then the second step:
IF (condition2 is true - the user has done his job and requests the reserved resource) THEN
    IF (the resource was delivered successfully) THEN
        save request into SUCCESS
    ELSE
        save request into FAIL
        depending on application logic, move the resource from RESERVE to R or not
ELSE
    save request into HACK, contact the user,
    if this is really a hacker, move the resource from RESERVE to R
This is how I plan to implement the system. I've put the transactions into stored procedures, but the main application logic, where I decide which procedure to call, lives in the application/service layer.
Am I on the right track? Is such a division of code between the DB and the service layers normal? Your experienced opinions are very important.
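
The first step above can be sketched roughly as follows, with Python's built-in sqlite3 standing in for the real database and a plain function standing in for the stored procedure. The column layouts, the function name and the 'allowed' flag (the result of condition1) are assumptions made for the example, not part of the original design.

```python
import sqlite3

# Autocommit mode, so the transaction boundaries below are explicit.
db = sqlite3.connect(":memory:", isolation_level=None)
db.executescript("""
    CREATE TABLE r       (resource_id INTEGER PRIMARY KEY);
    CREATE TABLE reserve (resource_id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE fail    (user_id INTEGER, resource_id INTEGER);
    CREATE TABLE hack    (user_id INTEGER, resource_id INTEGER, ip TEXT);
    INSERT INTO r VALUES (1);
""")


def request_resource(user_id, resource_id, allowed, ip="0.0.0.0"):
    """Step one: move a resource from R to RESERVE, or log to FAIL / HACK."""
    if not allowed:  # condition1 is false
        db.execute("INSERT INTO hack VALUES (?, ?, ?)", (user_id, resource_id, ip))
        return "hack"
    try:
        db.execute("BEGIN IMMEDIATE")
        cur = db.execute("DELETE FROM r WHERE resource_id = ?", (resource_id,))
        if cur.rowcount != 1:
            raise LookupError("resource not available")
        db.execute("INSERT INTO reserve VALUES (?, ?)", (resource_id, user_id))
        db.execute("COMMIT")
        return "reserved"
    except (LookupError, sqlite3.Error):
        if db.in_transaction:
            db.execute("ROLLBACK")
        # The failure is logged outside the rolled-back transaction.
        db.execute("INSERT INTO fail VALUES (?, ?)", (user_id, resource_id))
        return "fail"
```

With the single seeded resource, a first allowed request for it returns "reserved", a second allowed request for the same resource returns "fail", and a request with allowed=False returns "hack". The decision about which branch to take sits in the service layer, while the move between tables is one transaction, which matches the division described above.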
Clarifying and answering RecentCoin's questions:
1. The difference between the HACK and FAIL tables is that I store more information in the HACK table, like the user's IP and XFF. I'm not going to penalize every user who appears in that table. There can be two reasons that a user (request) is tracked as a hack. The first is that I have a bug (mainly in the client app), and this will help me fix it. The second is that someone makes requests manually and tries to bypass the rules. If he tries 'harder', I'll be able to take some precautions.
2. The separation of the reserve and success tables has these reasons:
2.1. I use the reserve table in some transactions and queries without using the success table, so I can lock them separately.
2.2. The data stored in success will not slow down my queries while I'm querying the reserve table.
2.3. The success table is a kind of log for statistics that I can delete or move to another database for future analysis.
2.4. I delete the rows from reserve after I move them to the success table. So I can estimate the maximum row count in that table, because I have a maximum limit on reservations for each user.
Points 2.3 and 2.4 could also be achieved by keeping everything in one table.
So are reasons 2.1 and 2.2 good enough to keep the data separate?
3. The resource being "delivered successfully" means that the admin and the service have done everything they could do successfully; if they couldn't, the reservation fails.
4 and 6. The restrictions and rights are simple; they are like city and country restrictions. The users are 'flat' and don't have any roles or hierarchy.
5. I have some tables to store users and their information. I don't have LDAP or AD.
You're going in the right direction, but there are some other things that need to be more clearly thought out.
1. You're going to have to define what constitutes a "hack" vs a "fail". Especially with new systems, users get confused and it's pretty easy for them to make honest mistakes. This seems like something you want to penalize them for in some fashion, so I'd be extremely careful with this.
2. You will want to consider having "reserve" and "success" be equivalent. Why store the same record twice? You should have a really compelling reason to do that.
3. You will need to define "delivered successfully", since that could be anything from an entry in a calendar to getting more pens and post-it notes.
4. You will want to define your resources as well as which user(s) have rights to them. For example, you may have a conference room that only managers are allowed to book, but you might want to include the managers' administrative assistants in that list since they would be booking the room for the manager(s).
5. Do you have a database of users? LDAP or Active Directory, or will you need to create all of that yourself? If you do have LDAP or AD, can you use something like SAML?
6. You are going to want to consider how you want to assign those rights. Will they be group based, where group membership confers the rights to reserve, request, or use a given thing? For example, you may only want architects printing to the large format printer.

How did Facebook or Twitter implement their subscribe system

I'm working on an SNS-like mobile app project, where users upload their content and can see updates from their subscribed topics or friends on their homepage.
I store user content in MySQL, and build the user-specific homepage by first querying who and what the user has subscribed to, and then querying the content table, filtering with a 'where userid IN (....) or topic IN (....)' clause.
I suspect this will become quite slow when the content table piles up or when a user subscribes to tons of users or topics. Our newly released app is already getting thousands of new users each week, and more over time. Scalability has to be a concern for us right now.
So I wonder how Facebook or Twitter handle this subscription problem with their amazing number of users. Do they maintain a list for each user? I tried to search, but all I found is how to interact with Facebook or Twitter rather than how they actually implement this feature.
I noticed that on Facebook you see only updates rather than history in your feed. That means that subscribing to a new user won't dump lots of outdated content into your feed, as it would with my current method.
How does Facebook design its database, and how does it dispatch new content to subscribed users?
My backend is currently PHP+MySQL, and I don't mind introducing other backend technologies such as Redis or JMS and stuff if that's the way it should be done.
Sounds like you guys are still in a pretty early stage. There are N-number of ways to solve this, all depending on which stage of DAUs you think you'll hit in the near term, how much money you have to spend on hardware, time in your hands to build it, etc.
You can try an interim table that queues up newly introduced items along with metadata on what each one entails (which topic, the friend user_id list, etc.). Then use a queue-consumer system like RabbitMQ or Gearman to manage the consumption of this growing list and figure out who should process each item. Build the queue-consumer program in Scala or Java (for example, a Maven-built service running on Tomcat), something that can stay running. If you really want to stick with PHP, build a PHP REST API that runs under php5-fpm, managed by the FastCGI process manager and served through a proxy like nginx, and call it with curl at an appropriate interval from a cron-executed script.
[EDIT] - It's probably better not to use a DB as a queueing system. Use a cache server like Redis instead: it outperforms a DB in many ways and it can persist to disk (look up RDB and AOF). It's not very fault tolerant, though: if a job fails all of a sudden, you might lose the job record. Most likely you won't care about these crash edge cases. Also look up php-resque!
To prep for the SNS to go out efficiently, I'm assuming you're already de-normalizing the tables. I'd imagine a "user_topic" table that maps each topic to the users who subscribed to it. Create another table, "notification_meta", describing where users prefer to receive notifications (SMS/push/email/in-app notification), and the metadata needed to push to those channels (mobile client approval keys for APNS/GCM, email addresses, user auth-tokens). Use JSON blobs for the two fields in "notification_meta", so each user will have a single row. This saves I/O hits on the DB.
Use user_id as your primary key for "notification_meta" and user_id + topic_id as the PK for "user_topic". DO NOT add an auto-increment "id" field to either; it's pretty useless in this use case (it takes up space, CPU, index memory, etc). If both fields are in the PK, queries on user_topic will all be served from memory, and the only disk hit is on "notification_meta" during the JOIN.
So if a user subscribes to 2 topics, there'll be two entries in "user_topic", and each user will always have a single row in "notification_meta"
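
A minimal sketch of those two tables, using Python's built-in sqlite3 as a stand-in for MySQL. The table names and primary keys follow the description above; the column names for the two JSON fields, the extra index that leads with topic_id (useful when looking up subscribers by topic) and the sample rows are assumptions made for the example.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for MySQL
db.executescript("""
    CREATE TABLE user_topic (
        user_id  INTEGER NOT NULL,
        topic_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, topic_id)          -- no auto-increment id
    );
    -- Assumption: a second index so "who follows topic X" is an index lookup.
    CREATE INDEX user_topic_by_topic ON user_topic (topic_id, user_id);

    CREATE TABLE notification_meta (
        user_id       INTEGER PRIMARY KEY,       -- one row per user
        channel_prefs TEXT NOT NULL,             -- JSON blob
        channel_data  TEXT NOT NULL              -- JSON blob
    );
""")

# Hypothetical sample data: user 1 follows two topics, user 2 follows one.
db.executemany("INSERT INTO user_topic VALUES (?, ?)", [(1, 100), (1, 200), (2, 100)])
db.executemany(
    "INSERT INTO notification_meta VALUES (?, ?, ?)",
    [
        (1, json.dumps(["push", "email"]), json.dumps({"email": "u1@example.com"})),
        (2, json.dumps(["in-app"]), json.dumps({})),
    ],
)


def recipients(topic_id):
    """Return (user_id, preferred channels, channel data) for a topic's subscribers."""
    rows = db.execute(
        """
        SELECT m.user_id, m.channel_prefs, m.channel_data
        FROM user_topic AS ut
        JOIN notification_meta AS m ON m.user_id = ut.user_id
        WHERE ut.topic_id = ?
        ORDER BY m.user_id
        """,
        (topic_id,),
    ).fetchall()
    return [(uid, json.loads(prefs), json.loads(data)) for uid, prefs, data in rows]


print(recipients(100))
```

A queue consumer would call something like recipients() once per new item and push to each listed channel.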
There are more ways to scale, like dynamically creating a new table for each new topic, sharding to different MySQL instances based on user_id, partitioning, etc. There's N-ways to scale, especially in MySQL. Good luck!

Allowing multiple logins from one account with ejabberd

I have just started getting my hands dirty building IM applications with the ejabberd XMPP server. I have a requirement to allow one user account to log in simultaneously from multiple devices and follow conversations on all of their logged-in devices, much like in Skype or FB.
Is this possible with ejabberd out of the box, or are there further customizations one has to do?
Any pointers would be helpful. The body of knowledge out there is quite huge, and knowing where to start looking has been quite daunting.
Yes, connecting from multiple devices at once is part of the XMPP standard. In a JID, the "resource" portion (i.e. the part after the slash in jome@stackoverflow.com/desktop) is unique to a single connection, and users may have many resources. So the resource could be your MAC address or some unique device ID.
Vanilla XMPP allows users to specify a priority for each resource, and messages are routed to the highest-priority resource present. To follow a conversation across all resources at once, you need to enable XEP-0280 (Message Carbons).
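
As a rough illustration of the two ideas above, the sketch below splits a full JID into its parts and picks the highest-priority resource from a set of connected ones. The function names, the example JID and the priorities are made up, and the routing rule is simplified: resources with a negative priority are skipped, and ties are broken arbitrarily, whereas a real server applies its own policy.

```python
from typing import NamedTuple, Optional


class Jid(NamedTuple):
    local: Optional[str]
    domain: str
    resource: Optional[str]


def parse_jid(jid: str) -> Jid:
    """Split 'local@domain/resource'; the local part and resource are optional."""
    # Split off the resource first: it may itself contain '@' or '/'.
    bare, slash, resource = jid.partition("/")
    local, at, domain = bare.partition("@")
    if not at:  # no '@': the whole bare part is the domain
        local, domain = None, bare
    return Jid(local, domain, resource if slash else None)


def pick_resource(priorities: dict[str, int]) -> Optional[str]:
    """Pick the connected resource with the highest non-negative priority."""
    eligible = {res: prio for res, prio in priorities.items() if prio >= 0}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


# Hypothetical account with three connected devices.
print(parse_jid("user@example.com/desktop"))
print(pick_resource({"desktop": 5, "phone": 10, "bot": -1}))
```

With Message Carbons enabled, the other connected resources receive copies of the conversation as well, so the choice of a single target matters less.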

The best way to store clicks and views statistics?

I have some items and I want to store statistics such as daily clicks and the traffic source of the page, and I want to store these statistics for hundreds of items. Based on the stats I save, I want to be able to generate charts with the clicks for each day (like on Google Analytics) and to show the number of clicks from each traffic source.
I'm not a specialist, but I'm thinking of storing the statistics for a single day in a MySQL table and then writing them out to multiple .xml files. I have a slow, cheap server and I'm looking for the best method. Please help me!
These "items" are embedded in other websites. I control these items using php
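
For reference, the kind of aggregated table the question describes could look roughly like the sketch below, which keeps one counter row per item, day and traffic source instead of one row per click. It uses Python's built-in sqlite3 as a stand-in for MySQL; the table name, column names and function names are assumptions made for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for MySQL
db.execute("""
    CREATE TABLE daily_clicks (
        item_id INTEGER NOT NULL,
        day     TEXT    NOT NULL,   -- ISO date, e.g. '2024-01-31'
        source  TEXT    NOT NULL,   -- traffic source, e.g. referring host
        clicks  INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (item_id, day, source)
    )
""")


def record_click(item_id, day, source):
    """Increment the counter for (item, day, source), creating the row if needed."""
    key = (item_id, day, source)
    with db:  # one transaction per click
        db.execute(
            "INSERT OR IGNORE INTO daily_clicks (item_id, day, source) VALUES (?, ?, ?)",
            key,
        )
        db.execute(
            "UPDATE daily_clicks SET clicks = clicks + 1 "
            "WHERE item_id = ? AND day = ? AND source = ?",
            key,
        )


def clicks_per_day(item_id):
    """Return [(day, total clicks)] for one item, ready to feed a chart."""
    return db.execute(
        "SELECT day, SUM(clicks) FROM daily_clicks "
        "WHERE item_id = ? GROUP BY day ORDER BY day",
        (item_id,),
    ).fetchall()
```

The table grows with the number of items, days and sources rather than with the number of clicks, which keeps it small on a modest server, but every click still costs a database write.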
Since these items are embedded in other websites, storing this information per request is a no-go.
It means you would either need to install and set up MySQL on those other websites, which is unlikely,
or connect to a remote MySQL server, which is quite expensive for each request,
especially since you say yourself that you only have a "cheap" server.
Additionally, you risk bringing down the websites with the embedded items when your MySQL server fails.
It is better to use Google Analytics to track the visited pages correctly instead of developing your own tracker.
It will show you daily and country-wise visitors.