Data Model for Profile Page Views - MySQL

Say I have a site with user profiles that have publicly accessible pages (each profile has several pages). I'd like to show users their page view statistics (e.g. per page, for a certain time period, etc.). What's a good way to store page views?
Here's what I was thinking:
Table Page Views
================
- Id (PK)
- Profile Id (FK)
- Page Id (FK)
- Timestamp
I'm afraid this solution won't scale. Suggestions?
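For concreteness, that proposed table as MySQL DDL might look something like this (a minimal sketch; the column types and the secondary index are assumptions):

CREATE TABLE page_views (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  profile_id INT UNSIGNED NOT NULL,
  page_id    INT UNSIGNED NOT NULL,
  viewed_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  KEY idx_page_time (page_id, viewed_at)  -- supports per-page, per-period reports
) ENGINE=InnoDB;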

Your intuition is correct, writing to a database doesn't scale particularly well. You want to avoid a database transaction for each page request.
That noted, is scaling really your concern? If so, and assuming an Internet-facing site (as opposed to an intranet), skip rolling your own and collect the hit data with Google Analytics or something similar. Then take that data and process it to generate totals per profile.
However, if you're really hellbent on doing it yourself, consider log parsing instead. If you can enumerate the URLs per profile, use that information, together with your web server logs, to generate hit totals. Tools such as Microsoft's Log Parser, which can process a lot of different formats, or *nix command-line tools like sed and grep, are your friends here.
If enumeration isn't possible, change your code to log the information you need and process that log file instead.
With logs in place, generate results using a batch process and insert those results into a database using MySQL's LOAD DATA.
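As a rough sketch of that batch step (the file name, layout, and target table are assumptions), a nightly job could bulk-load pre-aggregated counts along these lines:

-- daily_page_counts.csv: page_id,view_date,views (produced by the log-parsing step)
LOAD DATA INFILE '/var/stats/daily_page_counts.csv'
INTO TABLE daily_page_counts
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(page_id, view_date, views);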
Final note on the roll-your-own approach I've recommended: in a clustered environment this will scale far better than a database transaction per request.

It depends on what kind of reports you want to make available.
If you want to be able to say "this is the list of people that viewed your page between these two dates", then you must store all the data you proposed.
If you only need to be able to say "your page was viewed X times between these two dates", then you only need a table with a page ID, date, and counter. Update the counter column on each page view with a single UPDATE query.
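For example, one row per page per day with a single upsert per view would do it (table and column names, and the ID/date values, are placeholders):

CREATE TABLE page_view_counts (
  page_id   INT UNSIGNED NOT NULL,
  view_date DATE NOT NULL,
  views     INT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (page_id, view_date)
) ENGINE=InnoDB;

-- one statement per page view; creates the row on the first view of the day
INSERT INTO page_view_counts (page_id, view_date, views)
VALUES (42, CURRENT_DATE, 1)
ON DUPLICATE KEY UPDATE views = views + 1;

-- "your page was viewed X times between these two dates"
SELECT SUM(views) FROM page_view_counts
WHERE page_id = 42 AND view_date BETWEEN '2023-01-01' AND '2023-01-31';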

I suppose you can have:
tblPerson
- personid (PK)
- activeProfileID (FK) -- the active profile to use
- timestamp
tblPage
- pageid (PK)
- data
tblPersonProfile
- profileID (PK)
- timestamp
tblProfilePages
- profilePageID (PK)
- profileid (FK)
- pageid (FK)
- isActive
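If it helps, the join table from that sketch might be declared like this (a guess at types; the composite unique key is an assumption):

CREATE TABLE tblProfilePages (
  profilePageID INT UNSIGNED NOT NULL AUTO_INCREMENT,
  profileid     INT UNSIGNED NOT NULL,
  pageid        INT UNSIGNED NOT NULL,
  isActive      TINYINT(1) NOT NULL DEFAULT 1,
  PRIMARY KEY (profilePageID),
  UNIQUE KEY uq_profile_page (profileid, pageid),  -- one row per profile/page pair
  FOREIGN KEY (profileid) REFERENCES tblPersonProfile (profileID),
  FOREIGN KEY (pageid) REFERENCES tblPage (pageid)
) ENGINE=InnoDB;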

Related

Is storing frequently used data in JSON in MySQL worth it?

In a Vue.js app, the main focus is working with prospects. Prospects have many related things like contacts, listings, and half a dozen other objects/tables.
They also have interactions, of which there could be 30 or more per prospect, while most things like emails or phones would have 1-3 results. I load 50 prospects at a time into the front end.
I'm trying to decide whether loading it all into the front end to work 50 prospects at a time is a good idea, or whether I should have a JSON column with interactions as part of the prospects table that I would update each time an interaction is saved, with minimal info like date, type, subject...
It seems like an extra step (and duplicate data; how important is that?) to update the JSON column with each interaction, but it also seems like it would save looking up and loading data all the time.
I'm not a programmer, but I have been teaching myself how to do something I need done for my business with tutorials and YouTube. Any opinions from those who deal with this professionally would be appreciated.
Also, if anyone wants to tell me how to ask this question in a better-formatted way, I'm all ears.
Thanks.
Imagine you have 1000 records but are only sending 50 of them, and your user filters by price. Will you filter against only the 50 or all 1000?
That depends on whether you want to expose all 1000 records to the front end. It's a choice between that and calling your server API every time.
If you are calling the server, consider using a cache like Redis to store your results.
Pseudocode:
Request received
- Check the Redis cache - Redis.get('key')
- If the key exists - return the cached result.
- Else -
  - query MySQL for the latest results
  - Redis.set('key', latest results)
Create request received
- Write to MySQL
- Redis.delete('key') // the next read request will build a new cache with fresh data
Your key can be anything, e.g. your URL ('/my/url')
https://laravel.com/docs/8.x/redis
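On the JSON-column part of the question: rather than maintaining a duplicated summary column on every save, one option is to build the summary at read time. A sketch, assuming MySQL 5.7.22+ for JSON_ARRAYAGG and hypothetical table/column names:

SELECT p.id,
       p.name,
       JSON_ARRAYAGG(JSON_OBJECT('date', i.created_at,
                                 'type', i.type,
                                 'subject', i.subject)) AS interactions
FROM prospects p
LEFT JOIN interactions i ON i.prospect_id = p.id
GROUP BY p.id, p.name
LIMIT 50;

This avoids storing the same data twice while still handing the front end one row per prospect with its interaction summary attached.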

Create MySQL tables dynamically based on criteria in Ruby on Rails 5

I am new to Rails and I am trying to create a web app where you scrape some HTML from a page and store it in a database in order to compare it to a later version, e.g. when the price of a product has changed. The way I want to make it work is to create a new table every time you scrape something from a domain that's new.
So basically every domain has its own table for changes. I know how to create tables with migrations, but how do you dynamically create a table when a new domain is added?
The recommended "relational database" way to do this is to have a single table and relate that table to the source. For page snapshots you can often hash the content to test for duplicated data, and a UNIQUE index on your content hash can automatically prevent those sorts of inserts.
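Roughly, the single-table approach with a hash guard could look like this (table and column names are assumptions):

CREATE TABLE page_snapshots (
  id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  domain_id    INT UNSIGNED NOT NULL,
  url          VARCHAR(2048) NOT NULL,
  content_hash CHAR(64) NOT NULL,  -- e.g. SHA-256 hex of the (pre-processed) HTML
  content      MEDIUMTEXT NOT NULL,
  scraped_at   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  UNIQUE KEY uq_domain_hash (domain_id, content_hash)
) ENGINE=InnoDB;

-- INSERT IGNORE skips rows whose hash already exists, so unchanged pages are not stored twice
INSERT IGNORE INTO page_snapshots (domain_id, url, content_hash, content)
VALUES (1, 'https://example.com/product/123', SHA2('<page html>', 256), '<page html>');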
If portions of the page update but you're not interested in them, like advertising blocks, you can use a tool like Nokogiri to pre-process and strip out that content before hashing and saving.
Now if this is just part of a pipeline where you're capturing pages with the express intent of extracting price information later, you may not need a database at all for that part of the process. You could funnel the raw page data into a queue like RabbitMQ and have workers process it, boiling it down to the price data, which is all you insert in the database.
If you need to preserve the page snapshots for diagnostic or historical reasons then a table will work. To save on size you can explore using an ARCHIVE type table. These are append-only, you can't edit them, but they are compact and perform well.
You could periodically TRUNCATE a table of that sort to clear out old data so you're not keeping junk around forever.
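If you do go the ARCHIVE route for old snapshots, a rough sketch (names assumed; note that ARCHIVE tables only support INSERT and SELECT):

CREATE TABLE page_snapshots_archive (
  id         BIGINT UNSIGNED NOT NULL,
  domain_id  INT UNSIGNED NOT NULL,
  url        VARCHAR(2048) NOT NULL,
  content    MEDIUMTEXT NOT NULL,
  scraped_at TIMESTAMP NOT NULL
) ENGINE=ARCHIVE;

-- move rows older than, say, 90 days out of the working table
INSERT INTO page_snapshots_archive
  SELECT id, domain_id, url, content, scraped_at
  FROM page_snapshots
  WHERE scraped_at < NOW() - INTERVAL 90 DAY;
DELETE FROM page_snapshots
  WHERE scraped_at < NOW() - INTERVAL 90 DAY;

-- and, once the archived data is no longer needed at all:
TRUNCATE TABLE page_snapshots_archive;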

Concurrent inserts in MySQL - calling the same insert stored procedure before the first set of inserts has completed

I am working on a social networking site, which includes the creation of media content and also records users' interactions with the created content.
Background of issue - approach used currently
There is a page called news feed, which displays content, and the activity on that content, from the users they are following on the site.
The display order of the content changes as user interactions accumulate (e.g. a post with more comments is likely to be shown above one with fewer comments; the number of comments is just one of the attributes used to rank a post).
I am using a MySQL (InnoDB) database to store the data as follows:
activity_master: activities allowed to be part of the news feed (post, comment, etc.)
activity_set: for aggregation of activities on the same object
activity_feed: details of the actual activity
Detailed ER Diagram is at the end of question
Scenario
A user (with 1000 followers) posts something, which initiates an async call to the procedure to insert the relevant entries (1000 rows for 1000 followers) into the above-mentioned tables for all followers.
Some followers start commenting (an activity allowed to be part of the news feed) before the above call has completed, which initiates further calls to the same procedure to insert entries for this activity (one per follower of the commenter) for their particular set of followers (e.g. user B commented on this post).
All of these insert requests (which seem far too many) have to be processed in a queue by the InnoDB engine.
Questions
Is there a better, more efficient way to do this? (I definitely think there would be.)
How many insert requests can InnoDB handle in its default configuration?
How can I avoid deadlocks (or resource contention at the database end) in this case?
Or is there another type of database better suited to this case?
Thanks for showing your interest by reading the description, any sort of help in this regard is much appreciated and let me know if any further details are required, thanks in advance!
ER diagram of the tables (not enough reputation to embed the image directly :( )
A rule of thumb: "Don't queue it, just do it".
Inserting 1000 rows is likely to be untenable. Tomorrow, it will be 10000.
Can't you do the processing on the select side instead of the insert side?
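A sketch of that select-side approach, with table and column names guessed from the description (followers and actor_user_id are assumptions): store each activity once and build the feed when it is requested, instead of inserting one feed row per follower.

SELECT a.*
FROM activity_feed a
JOIN followers f ON f.followed_user_id = a.actor_user_id
WHERE f.follower_user_id = ?   -- the user whose news feed is being rendered
ORDER BY a.created_at DESC
LIMIT 50;

With an index on followers (follower_user_id, followed_user_id) and on activity_feed (actor_user_id, created_at), a post costs one insert regardless of follower count, and the ranking attributes can be applied in the ORDER BY at read time.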

How to store / retrieve large amounts of data sets within XPages?

Currently we're working on a solution where we want to track (for analysis) the articles a user clicks on/opens and 'likes' from a given list of articles. Subsequently, the user needs to be able to see and re-click/open the article (searching is not needed) in a section on his/her personal user profile. Somewhere around 100 new articles are posted every day. The increasing(!) number of daily visitors (users) is around 2000 a day. The articles are currently stored and maintained in a MySQL DB.
We could create a new record in the MySQL DB for every article read / 'liked'. Worst case, this would create (2500 * 100 =) 250,000 records a day. That won't hold for long of course... So how would you store (and process) this within XPages, given the scenario?
My thoughts after reading "the article" :) about MIME/beans: what about keeping 'read articleObjects' in a scope and (periodically) storing/saving them as MIME on the user profile document? This only creates 100 articleObjects a day (or 36,500 a year). In addition, one could come up with a mechanism where articleObjects are shifted from one field to another as time passes, so the active scope would only contain the 'read articleObjects' from the last month or so.
I would say that this is exactly what a relational database is for. My first approach would be to have a managed bean (session scope) to read/access the user's data in MySQL (JDBC). If you want, you can build an internal cache inside the bean.
For the presented use case, I would not bother with the JDBC datasources in ExtLib. Perhaps even the #Jdbc functions would suffice.
Also, you did not say how you are doing the analysis. If you store the information in Domino, you'll probably have to write an export tool.
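On the MySQL side, the per-user tracking table this implies could be as simple as the following (a sketch; names and types are assumptions). With a composite primary key, re-opening an article just touches the existing row, so the table is bounded by users times articles rather than by raw clicks.

CREATE TABLE article_events (
  user_id    INT UNSIGNED NOT NULL,
  article_id INT UNSIGNED NOT NULL,
  liked      TINYINT(1) NOT NULL DEFAULT 0,
  read_at    TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (user_id, article_id)
) ENGINE=InnoDB;

-- one upsert per click/open
INSERT INTO article_events (user_id, article_id)
VALUES (?, ?)
ON DUPLICATE KEY UPDATE read_at = CURRENT_TIMESTAMP;

-- record a 'like' on an article the user has opened
UPDATE article_events SET liked = 1 WHERE user_id = ? AND article_id = ?;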

Using Redis as a Key/Value store for activity stream

I am in the process of creating a simple activity stream for my app.
The current technology layer and logic is as follows:
All data relating to an activity is stored in MySQL, and an array of all activity IDs is kept in Redis for every user.
A user performs an action; the activity is stored directly in an 'activities' table in MySQL and a unique 'activity_id' is returned.
An array of this user's 'followers' is retrieved from the database, and for each follower I push this new activity_id onto their list in Redis.
When a user views their stream, I retrieve the array of activity IDs from Redis based on their user ID. I then perform a simple MySQL WHERE IN ($ids) query to get the actual activity data for all of these activity IDs.
This kind of setup should, I believe, be quite scalable, as the queries will always be very simple IN queries. However, it presents several problems.
Removing a follower - If a user stops following someone, we need to remove all activity_ids that correspond to that user from their Redis list. This requires looping through all IDs in the Redis list and removing the ones that correspond to the removed user. This strikes me as quite inelegant; is there a better way of managing this?
Archiving - I would like to cap the Redis lists at a maximum of, say, 1000 activity_ids, and also frequently prune old data from the MySQL activities table to prevent it from growing to an unmanageable size. Obviously this can be achieved by removing old IDs from the user's stream list when we add a new one. However, I am unsure how to go about archiving this data so that users can view very old activity data should they choose to. What would be the best way to do this? Or am I simply better off enforcing this limit completely and preventing users from viewing very old activity data?
To summarise: what I would really like to know is if my current setup/logic is a good/bad idea. Do I need a total rethink? If so what are your recommended models? If you feel all is okay, how should I go about addressing the two issues above? I realise this question is quite broad and all answers will be opinion based, but that is exactly what I am looking for. Well formed opinions.
Many thanks in advance.
1. doesn't seem so difficult to perform (no looping):
DELETE Redis FROM Redis
JOIN activities
  ON Redis.activity_id = activities.id
  AND activities.user_id = 2
  AND Redis.user_id = 1;
2. I'm not really sure about archiving. You could create archive tables per period and move old activities from the main table to an archive table periodically. A single properly normalized activity table ought to be able to get pretty big, though (make sure any "large" activity stores its payload in a separate table; the main activity table should stay "narrow" since it's expected to have a lot of rows).
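For point 2, the periodic move could be a simple batch like this (table and column names are assumptions; the archive table is created with the same columns as activities):

-- run periodically, e.g. monthly
INSERT INTO activities_archive
SELECT * FROM activities
WHERE created_at < NOW() - INTERVAL 6 MONTH;

DELETE FROM activities
WHERE created_at < NOW() - INTERVAL 6 MONTH;

Very old items a user explicitly asks for could then be served from activities_archive with the same WHERE IN ($ids) query used for the live table.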