How can website hit statistics be helpful in improving usability?

Have you noticed that almost every link on Facebook has a ref query string?
I believe that, with that ref, Facebook somehow tracks and studies user behaviour. This could be their secret recipe for better usability.
So I am trying out the same thing: change http://a.com/b.aspx to http://a.com/b.aspx?ref=c and log every hit into a table.
userid | page         | ref      | response_time (ms) | dtmTime
-------|--------------|----------|--------------------|--------------------
54321  | profile.aspx | birthday | 123                | 2009-12-23 11:05:00
12345  | compose.aspx | search   | 456                | 2009-12-23 11:05:02
54321  | payment.aspx | gift     | 234                | 2009-12-23 11:05:01
12345  | chat.aspx    | search   | 567                | 2009-12-23 11:05:03
...    | ...          | ...      | ...                | ...
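A minimal sketch of how such a log table might be defined in MySQL (the hit_log name and the index choices below are only illustrative, not part of the original setup):

    -- Illustrative DDL for the hit log sketched above (MySQL).
    CREATE TABLE hit_log (
        userid        INT          NOT NULL,
        page          VARCHAR(255) NOT NULL,
        ref           VARCHAR(64)  NULL,      -- value of the ?ref= query string
        response_time INT          NOT NULL,  -- response time, presumably in milliseconds
        dtmTime       DATETIME     NOT NULL,
        INDEX idx_page_time (page, dtmTime),  -- speeds up per-page reports
        INDEX idx_user_time (userid, dtmTime) -- speeds up per-user session reconstruction
    );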
I think it's a good start. I just don't know what to do with this information.
Is there an appropriate methodology for processing it?

Research has shown that fast responses improve not only the usability of a website but also conversion rates and site usage in general.
Tests at Amazon revealed that every 100 ms increase in load time of Amazon.com decreased sales by 1%
Experiments at Microsoft on Live Search showed that when search results pages were slowed by 1 second: a) Queries per user declined by 1.0%, and b) Ad clicks per user declined by 1.5%
People simply don't want to wait. Therefore, we track response-time percentiles for our sites. Additionally, a nice visualization of this data helps with measuring performance optimization efforts and monitoring server health.
Here is an example generated using Google Charts:
That looks bad! Response times of > 4000 ms certainly indicate performance problems that have a considerable impact on usability. At times the share of requests served within 800 ms (which we consider a good indicator for our apps) was as low as 77%; we typically aim to have 95% of requests served within 800 ms. So it looks like there's some serious work ahead ... but the image is nice, isn't it? ;)
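If the hits are logged in a table like the one in the question (called hit_log here, a name assumed for the sketch), the 800 ms figure could be computed along these lines in MySQL:

    -- Share of hits answered within 800 ms, per day (a sketch, not tuned).
    SELECT
        DATE(dtmTime)                                        AS day,
        COUNT(*)                                             AS hits,
        ROUND(100 * SUM(response_time <= 800) / COUNT(*), 1) AS pct_within_800ms
    FROM hit_log
    GROUP BY DATE(dtmTime)
    ORDER BY day;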

Here's a second answer as the former was only about response time statistics.
The ref query string allows you to identify traffic sources, especially of people entering a conversion funnel. So you might make statements like "N $ of revenue comes from users clicking link X on page Y". Now you could try modifying link X to X1 and see whether it increases revenue from this page. That would be your first step into A/B testing and multivariate analysis. Google Website Optimizer is a tool built exactly for this purpose.
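As a rough, hedged sketch of how the logged ref values could feed such an analysis (again assuming the hit_log table from the question, and treating payment.aspx from the sample data as the conversion page):

    -- Distinct users who reached the payment page, broken down by the ref
    -- value that brought them there (sketch; substitute your real funnel page).
    SELECT
        ref,
        COUNT(DISTINCT userid) AS converting_users,
        COUNT(*)               AS payment_page_hits
    FROM hit_log
    WHERE page = 'payment.aspx'
    GROUP BY ref
    ORDER BY converting_users DESC;

Comparing these counts before and after changing link X to X1 would give you a crude version of the A/B comparison described above.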

Well, Facebook uses them (I believe) to observe how the user interface is used, so they can see where people click more (the logo or the profile link) and consider changing the UI accordingly to make interaction better.
You might also be able to use it to see common patterns in usage. For instance, if people follow a certain chain profile -> birthday -> present -> send, you might consider adding a "send present" feature on a person's profile when it's that person's birthday. Just a thought.

To make the best use of your website statistics you need to think about what your users are trying to achieve and what you want them to achieve. These are your site's goals.
For an ecommerce site this is fairly easy. Typical goals might be:
Search for a product and find information about it.
Buy a product.
Contact someone for help.
You can then use your stats to see if people are completing the site's goals. To do this you need to collect a visitor's information together so you can see all the pages they have been to.
Once you can look at all the pages a user has visited and the sequence they visited them in, you can see what they have been doing. You can look for drop-out points where they were about to buy something and then didn't. You can identify product searches that were unsuccessful. You can do all sorts of things. You can then try to fix these issues and watch the stats to see if that has helped.
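As an illustration of that kind of sequence analysis, here is a rough sketch against a hit table like the one in the question (it assumes MySQL 8+ for the window function; the hit_log name is only illustrative):

    -- Most common page-to-page transitions; a NULL next_page marks the last
    -- page of a user's recorded activity, i.e. a potential drop-out point.
    WITH ordered_hits AS (
        SELECT
            userid,
            page,
            LEAD(page) OVER (PARTITION BY userid ORDER BY dtmTime) AS next_page
        FROM hit_log
    )
    SELECT page, next_page, COUNT(*) AS transitions
    FROM ordered_hits
    WHERE next_page IS NOT NULL
    GROUP BY page, next_page
    ORDER BY transitions DESC
    LIMIT 20;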
The stats you're collecting are a good start, but collecting good stats and collating them is complicated. I'd suggest using an existing stats package; I personally use Google Analytics, but there are others available.

Should inbox messages be stored in a database, or are there more efficient alternatives?

I am working on a chat website where users can create rooms, invite others and chat together. I have a lot of the core infrastructure for the website in place, including most of the server and half of the website itself, but I have come across a decision I have to make.
Each user can receive messages to their inbox; for example, they received an invite request to join another user's room, or a more general alert relating to their account.
Question: should I store these inbox messages in a database, and what are the alternatives?
Inbox messages typically last for a couple of days, so they are quite ephemeral pieces of data. If I were to store them in a database, this is a rough idea of how the entity would look:
| accountId | message | type |
|-----------|---------------------------------------------------------|-----------------|
| 59 | user3 requested you to join 'hangouts' | invite_request |
| 24 | dialto accepted your request to join 'study group' | invite_response |
| 67 | please confirm your email address to claim your account | account_alert |
On the website, I would create an interface where they can view their inbox messages, and they can discard them if they want. If they discard an inbox message, then it is deleted in the database.
Is this the best solution for this problem in terms of efficiency? Are there alternatives?
I don't know if this will help but here is my tech stack for this application:
Database: MySQL
Backend: NodeJS | Graphql
Frontend: React | Graphql
Thanks a bunch.
For persisting structured data, there are no good alternatives to a database.
There are many different databases optimised for different purposes; no one size fits all.
As a rule of thumb, when you start a project, keep it simple and avoid complexity at all costs.
When you get some real scale, you can then start doing optimizations: looking at access patterns, horizontal scaling/partitioning (distributed systems), in-memory stores, etc.
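Applied to the inbox-message case above, a simple MySQL sketch might look like this (names and the retention window are only illustrative; the type values are taken from the question's example rows):

    -- One row per inbox message; deleted when the user discards it.
    CREATE TABLE inbox_message (
        id         BIGINT AUTO_INCREMENT PRIMARY KEY,
        account_id INT          NOT NULL,
        message    VARCHAR(512) NOT NULL,
        type       ENUM('invite_request', 'invite_response', 'account_alert') NOT NULL,
        created_at DATETIME     NOT NULL DEFAULT CURRENT_TIMESTAMP,
        INDEX idx_account_created (account_id, created_at)
    );

    -- Since messages are ephemeral, a periodic job can purge anything old.
    DELETE FROM inbox_message
    WHERE created_at < NOW() - INTERVAL 3 DAY;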

How to filter DB results with permissions stored in another DB

I'm currently searching for a good approach to filtering DB results based on permissions which are stored in another service's DB.
Let me first show the current state:
There's one Document-Service with 2 tables (permission, document) in its MySQL DB. When documents for a user are requested, a paginated result should be returned. For brevity let's ignore the pagination for now.
Permission table:

user_id | document_id
--------|------------
1       | A
2       | A
2       | B
2       | C

Document table:

document_id | more columns
------------|-------------
A           | ...
B           | ...
C           | ...
The following request "GET /documents/{userId}" will result in the following query against the DB:
SELECT d.* FROM document d JOIN permission p ON p.document_id = d.document_id WHERE p.user_id = '{userId}';
That's the current implementation, and now I am asked to move the permission table into its own service. I know, one would say that's not a good idea, but this question is just a broken-down example and in the real scenario it's a more meaningful change than it looks. So let's take it as a "must-do".
Now my problem: after I move the table into another DB, I cannot use it in the SQL query of the Document-Service anymore to filter results.
I also cannot query everything and filter in code, because there will be too much data AND I must use pagination, which is currently implemented by LIMIT/OFFSET in the query (ignored in this example for brevity).
I am not allowed to access a DB from any application except its own service.
My question is: is there any best practice or suggested approach for this kind of situation?
I already had 2 ideas which I would like to list here, even though I'm not really happy with either of them:
Query all document_ids of a user from the new Permission-Service and change the SQL to "SELECT * FROM document WHERE document_id IN {doc_id_array_from_permission_service}". The array could get pretty big and the statement slow; I'm not happy about that.
Replicate the permission table into the Document-Service DB on startup and keep the query as it is. But then I need to implement logic/an endpoint to update the table in the Document-Service whenever it changes in the Permission-Service, otherwise it gets out of sync. This feels like I'm duplicating a lot of logic in both services.
For the sake of this answer, I'm going to assume that it is logical for Permissions to exist completely independently of Documents. That is to say - if the ONLY place a Permission is relevant is with respect to a DocumentID, it probably does not make sense to split them up.
That being the case, either of the two options you laid out could work okay; both have their caveats.
Option 1: Request Documents with ID Array
This could work, and in your simplified example you could handle pagination prior to making the request to the Documents service. But, this requires a coordinating service (or an API gateway) that understands the logic of the intended actions here. It's doable, but it's not terribly portable and might be tough to make performant. It also leaves you the challenge of now maintaining a full, current list of DocumentIDs in your Permissions service which feels upside-down. Not to mention the fact that if you have Permissions related to other entities, those all have to get propagated as well. Suddenly your Permissions service is dabbling in lots of areas not specifically related to permissions.
Option 2: Eventual Consistency
This is the approach I would take. Are you using a Messaging Plane in your Microservices architecture? If so, this is where it shines! If not, you should look into it.
So, the way this would work is any time you make a change to Permissions, your Permissions Service generates a permissionUpdatedForDocument event containing the relevant new/changed Permissions info. Your Documents service (and any other service that cares about permissions) subscribes to these events and stores its own local copy of relevant information. This lets you keep your join, pagination, and well-bounded functionality within the Documents service.
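To make that concrete, here is a hedged sketch of what the Documents service's event-fed local copy and the original query could look like (MySQL; table and column names are only illustrative):

    -- Local copy of just the permission facts the Documents service needs,
    -- kept up to date by the permissionUpdatedForDocument events.
    CREATE TABLE document_permission (
        user_id     INT         NOT NULL,
        document_id VARCHAR(36) NOT NULL,
        PRIMARY KEY (user_id, document_id)
    );

    -- The join (and the LIMIT/OFFSET pagination) stays inside the Documents service.
    SELECT d.*
    FROM document d
    JOIN document_permission p ON p.document_id = d.document_id
    WHERE p.user_id = ?
    ORDER BY d.document_id
    LIMIT 20 OFFSET 0;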
There are still some challenges. I'd try to keep your Permissions service away from holding a list of all the DocumentID values. That may or may not be possible. Are your permissions Role or Group-based? Or, are they document-specific? What Permissions does the Documents service understand?
If permissions are indeed tied explicitly to individual documents, and especially if there are different levels of permission (instead of just binary yes/no access), then you'll have to rethink the structure in your Permissions service a bit. Something like this:
Permission table:
user_id| entity_type| entity_id | permission_type
-------|------------|-----------|----------------
1 | document | A | rwcd
2 | document | A | r
2 | document | B | rw
2 | document | C | rw
1 | other | R | rw
Then, you'll need to publish serviceXPermissionUpdate events from any Service that understands permissions for its entities whenever those permissions change. Your Permissions service will subscribe to those and update its own data. When it does, it will generate its own event and your Documents service will see confirmation that its change has been processed and accepted.
This sounds like a lot of complication, but it's easy to implement, performant, and does a nice job of keeping each service pretty well contained. The Messaging plane is where they interact with each other, and only via well-defined contracts (message names, formats, etc.).
Good luck!

Database Design: How should I store user's news preferences in MySQL database?

I'm trying to implement a personalized news aggregator. I have to save a user's preferred news sources in the DB. Should I store all the news sources liked by a user as a JSON string and then decode it after retrieval?
feeds | user
Or should I have an individual column for each news source (the number of news sources would be around 200)?
feed_name1 | feed_name2 | ..... | user
Sounds like a many-to-many situation. A solution that uses a person table, a newsfeed table, and a person_newsfeed table would be appropriate. There are lots of articles online about this, and any good database theory book (or something like Oracle: The Complete Reference) should cover it in detail.
This article is a pretty good (but very short) summary:
http://www.databaseprimer.com/pages/relationship_xtox/
Oracle8 The Complete Reference covers this around page 35 with the worker, skill, workerskills example during their discussion of third normal form.
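A minimal sketch of that junction-table layout in MySQL (all names are illustrative):

    CREATE TABLE person (
        person_id INT AUTO_INCREMENT PRIMARY KEY,
        name      VARCHAR(100) NOT NULL
    );

    CREATE TABLE newsfeed (
        feed_id INT AUTO_INCREMENT PRIMARY KEY,
        name    VARCHAR(100) NOT NULL
    );

    -- One row per (person, feed) preference: no JSON blob, no 200 columns.
    CREATE TABLE person_newsfeed (
        person_id INT NOT NULL,
        feed_id   INT NOT NULL,
        PRIMARY KEY (person_id, feed_id),
        FOREIGN KEY (person_id) REFERENCES person (person_id),
        FOREIGN KEY (feed_id)   REFERENCES newsfeed (feed_id)
    );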

SQL: count equivalent values in each column

I'm working on a URL shortener project with PHP & MySQL which tracks visits to each URL. I've provided a table for visits which mainly consists of these properties:
time_in_second | country | referrer | os | browser | device | url_id
#####################################################################
1348128639 | US | direct | win | chrome | mobile | 3404
1348128654 | US | google | linux | chrome | desktop| 3404
1348124567 | UK | twitter| mac | mozila | desktop| 3404
1348127653 | IND | direct | win | IE | desktop| 3465
Now I want to run queries on this table. For example, I want to get visit data for the URL with url_id=3404. Because I need to provide statistics and draw graphs for this URL, I need data such as:
Number of visits for each kind of OS for this URL, for example 20 Windows, 15 Linux, ...
Number of visits in each desired period of time, for example every 10 minutes over the past 24 hours
Number of visits for each country
...
As you can see, some columns like country can take lots of different values.
One idea I can imagine is a query which outputs the number of occurrences of each unique value in each column; for example, in the country case for the data given above, one column for num_US, one for num_UK, and one for num_IND.
Now the question is: how do I implement such a query with good performance in SQL (MySQL)?
Also, if you think such a query cannot perform well, what would you suggest instead?
Any help will be appreciated deeply.
UPDATE: Take a look at this question: SQL; Only count the values specified in each column. I think that question is similar to mine, but the difference is in the variety of possible values for each column (lots of values are possible for the country property), which makes the query more complex.
It looks like you need more than one query. You could probably write one query with different parameters, but that would make it complex and hard to maintain. I would approach it as multiple small queries, so for each requirement I'd write a query and call them separately. For example, for the country query you mentioned, you could do the following:
SELECT country, COUNT(*) FROM <TABLE_NAME> WHERE url_id = 3404 GROUP BY country;
By the way, I have not tested this query, so it may be inaccurate, but it should give you an idea. I hope this helps.
Another suggestion is to look into Google Analytics; it already provides a lot of what you are implementing, so maybe that helps as well.
Cheers.
Each of these graphs you want to draw represents a separate relation, so my off-the-cuff response is that you can't build a single query that gives you exactly the data you need for every graph you want to draw.
From this point, your choices are:
Use different queries for different graphs
Send a bunch of data to the client and let it do the required post-processing to create the exact sets of data it needs for different graphs
farm it all out to Google Analytics (a la #wahab-mirjan)
If you go with option 2 you can minimize the amount of data you send by counting hits per (10-minute bucket, country, os, browser, device, url_id) tuple. This essentially removes all duplicate rows and gives you a count. The client software would take these numbers and further reduce them by country (or whatever) to get the numbers it needs for a graph. To be honest, though, I think you're buying yourself extra complexity for not very much gain.
If you insist on doing this yourself (instead of using a service), then go with a different query for each kind of graph. Start with a couple of reasonable indexes (url_id and time_in_second are obvious starting points). Use the EXPLAIN statement (or whatever your database provides) to understand how each query is executed.
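For the "visits every 10 minutes over the past 24 hours" requirement from the question, a hedged sketch of one such bucketed query (MySQL; the visits table name is assumed, the columns follow the question):

    -- Visit counts per 10-minute bucket over the last 24 hours for one URL.
    SELECT
        FROM_UNIXTIME(FLOOR(time_in_second / 600) * 600) AS bucket_start,
        COUNT(*)                                         AS visits
    FROM visits
    WHERE url_id = 3404
      AND time_in_second >= UNIX_TIMESTAMP() - 24 * 60 * 60
    GROUP BY bucket_start
    ORDER BY bucket_start;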
Sorry, I am new to Stack Overflow and was having a problem with comment formatting. Here is my answer again; hopefully it works now:
I'm not sure how it is poor in performance. The way I see it, you will end up with a table that looks like this:
country | count
#################
US      | 304
UK      | 123
IND     | 23
So when you group by country and count, it will be one query. I think this will get you going in the right direction. In any case, it is just an opinion, so if you find another approach, I am interested in hearing it as well.
Apologies about the comment mess-up up there.
Cheers

System for tracking changes in whois records

What's the best storage mechanism (in terms of the database to be used and the system for storing all the records) for a system built to track whois record changes? The program will be run once a day, and a record should be kept of what the previous value was and what the new value is.
I'd appreciate suggestions on the database and thoughts on how to store the different records/fields so that data is not redundant/duplicated.
(Added) My thoughts on one mechanism to store data
Example case showing the sale of one domain, "sample.com", from PersonA to PersonB on 1/1/2010:
Table_DomainNames

DomainId | DomainName
---------|------------
1        | example.com
2        | sample.com

Table_ChangeTrack

DomainId | DateTime | RegistrarId | RegistrantId | (others)
---------|----------|-------------|--------------|---------
2        | 1/1/2009 | 1           | 1            |
2        | 1/1/2010 | 2           | 2            |

Table_Registrars

RegistrarId | RegistrarName
------------|--------------
1           | GoDaddy
2           | 1&1

Table_Registrants

RegistrantId | RegistrantName
-------------|---------------
1            | PersonA
2            | PersonB
All tables are "append-only". Does this model make sense? Table_ChangeTrack should be "added to" only when there is a change in ANY of the monitored fields.
Is there any way of making this more efficient / tighter from a size point of view?
The primary data is the existence or changes to the whois records. This suggests that your primary table be:
<id, domain, effective_date, detail_id>
where the detail_id points to actual whois data, likely normalized itself:
<detail_id, registrar_id, admin_id, tech_id, ...>
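In MySQL terms, a rough sketch of that shape might be (names are only illustrative, extended from the tables in the question):

    -- One row per observed change; the whois details are normalized separately.
    CREATE TABLE whois_change (
        id             BIGINT AUTO_INCREMENT PRIMARY KEY,
        domain_id      INT    NOT NULL,
        effective_date DATE   NOT NULL,
        detail_id      BIGINT NOT NULL,
        UNIQUE KEY uq_domain_date (domain_id, effective_date)
    );

    CREATE TABLE whois_detail (
        detail_id    BIGINT AUTO_INCREMENT PRIMARY KEY,
        registrar_id INT NULL,
        admin_id     INT NULL,
        tech_id      INT NULL
        -- ...further normalized whois fields
    );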
But do note that most registrars consider the information their property (whether it is or not) and have warnings like:
TERMS OF USE: You are not authorized to access or query our Whois database through the use of electronic processes that are high-volume and automated except as reasonably necessary to register domain names or modify existing registrations...
From which you can expect that they'll cut you off if you read their databases too much.
You could
store the checksum of a normalized form of the whois record data fields for comparison.
store the original and current version of the data (possibly in compressed form), if required.
store diffs of each detected change (possibly in compressed form), if required.
It is much like how incremental backup systems work. Maybe you can get further inspiration from there.
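As an illustration of the checksum idea, a small MySQL sketch (the whois_checksum table is hypothetical; MD5 is used purely for change detection, not security):

    -- One stored checksum per domain for the most recently seen record.
    CREATE TABLE whois_checksum (
        domain_id       INT      NOT NULL PRIMARY KEY,
        record_checksum CHAR(32) NOT NULL,   -- MD5 of the normalized whois text
        updated_at      DATETIME NOT NULL
    );

    -- During the daily run, compare this stored value against the MD5 of the
    -- freshly fetched, normalized record in application code; only store a new
    -- version (or a diff) and update this row when they differ.
    SELECT record_checksum
    FROM whois_checksum
    WHERE domain_id = 2;   -- sample.com in the question's example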
You can write VBScript in an Excel file to go out and query a web page (in this case, the particular whois URL for a specific site) and then store the results back to a worksheet in Excel.