I want to keep track of a user counter through time and be able to generate stats about how the counter changes over time.
I'm pretty set on the two main tables (although if there are better ways I would like to hear about them): user and counter_change, which would look pretty much like this:
user:
+----+----------+
| id | username |
+----+----------+
| 1  | foo      |
| 2  | bar      |
+----+----------+
counter_change:
+---------+--------------------+------------+
| user_id | counter_change_val | epoch_time |
+---------+--------------------+------------+
| 1       | 10                 | 1513242884 |
| 1       | -1                 | 1513242889 |
+---------+--------------------+------------+
I want to be able to show the current counter value (with a base value of 0) on the frontend, as well as some stats through time (e.g. yesterday your net counter change was +10 or -2, etc.).
I've thought about some possible solutions, but none of them seems perfect.
Add a counter to the user table (or to some new counters table):
This solution seems to be the most resource-efficient: at the time of inserting a counter_change, update the counter in user with the counter_change_val.
Pros:
Getting the current counter value would consume virtually no resources.
Cons:
The sum of counter_change_val could diverge from the counter in user if a bug occurs.
Couldn't really be used for stats fields, as it would require an additional query, and at that point a trigger would be handier.
Add a calculated counter to the user table (or to some new counters table) on insert/update:
This solution would consist of an SQL trigger, or some sort of function at the ORM level, that would update the value on an insert to the counter_change table with the sum of the counter_change_val.
This would also be used for calculated fields that imply grouping by date, for example getting the average daily change over the last 30 days.
Pros:
Getting the current counter value would consume virtually no resources.
Cons:
On every insert, an aggregation of all of the current user's counter_change rows would be needed.
Add a view or select with the sum of the counter changes:
This solution would consist of creating a view or select that aggregates the counter_change_val values when needed.
Pros:
Adds no fields to the tables.
Cons:
As it is calculated at runtime, it would add to the request response time.
Every time the counter is consulted, an aggregation of the counter_change values would be needed.
Actually, I am not sure that I have understood what you are trying to do. Nevertheless, I would suggest Option 1 or Option 2:
Option 1 is efficient, and it is sufficiently safe against errors if it is done right. For example, you could wrap inserting the counter_change and computing the new counter_value in a transaction; this will prevent any inconsistencies. You could do that either in the back-end software or in a trigger (e.g. upon inserting a counter_change).
Regarding Option 2, it is not clear to me why an aggregation over all the counter_change rows of the current user would be needed. You can adjust the counter_value in the user table from within an insert trigger as with Option 1, and you can use transactions to make it safe.
IMHO, adjusting the current counter_value upon every insert of a counter_change is the most efficient solution. You can do it either in the back-end software or from within a trigger. In both cases, use transactions.
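As an illustration only (the counter_value column and the trigger name are assumptions, not part of your schema), the trigger variant in MySQL could look roughly like this:
-- sketch: keep a denormalized counter_value on user in sync with the change log
CREATE TRIGGER trg_counter_change_insert
AFTER INSERT ON counter_change
FOR EACH ROW
  UPDATE user
  SET counter_value = counter_value + NEW.counter_change_val
  WHERE id = NEW.user_id;
With such a trigger, the application only ever inserts into counter_change, and the cached value is updated as part of the same transaction as the insert.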
Option 3 should not be used because it will put a lot of load onto the system (assume you have 1000 counter_changes per user ...).
Regarding the statistics: This is a different problem from storing the data in the first place. You probably will need some sort of aggregation for any statistical data. To speed this up, you could think about caching results and things like that.
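For example, the "average daily changes of the last 30 days" statistic mentioned in the question would boil down to an aggregation roughly like this sketch (assuming epoch_time is a Unix timestamp); since past days never change, the per-day results are good candidates for caching:
SELECT DATE(FROM_UNIXTIME(epoch_time)) AS day,
       SUM(counter_change_val)         AS net_change
FROM counter_change
WHERE user_id = 1
  AND epoch_time >= UNIX_TIMESTAMP(CURDATE() - INTERVAL 30 DAY)
GROUP BY day
ORDER BY day;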
I have an application that lets devices communicate over MQTT.
When two (or more) devices are paired, they are in a session (with a session-id).
The topics are for example:
session/<session-id>/<sender-id>/phase
with a payload like
{'phase': 'start', 'othervars': 'examplevar'}
Every session is logged to a MySQL database in the following format:
| id | session-id | sender | topic (example: phase) | payload | entry-time | ...
Now, when I just want to get a whole session I can just query by session-id.
Another view I want to achieve looks like this:
| session-id (distinct) | begin time | end time | duration | success |
Success is a boolean: true when the session contains an entry whose payload has 'phase':'success'; otherwise the session is not successful.
Now I have the problem that this query is very slow. Every time I want to access it, it has to calculate, for each session, whether it was successful, along with the time calculations.
Should I make a script that runs at the end of a session to calculate this information and put it in another table? The problem I have with this solution is that I would have duplicate data.
Can I make this faster with indexes? Or did I just make a huge design mistake from the beginning?
Thanks in advance
Indexes? Yes. YES!
If session-id is unique, get rid of id and use PRIMARY KEY(session_id).
success could be TINYINT NOT NULL with values 0 or 1 for fail or success.
If the "payload" is coming in as JSON, then I suggest storing the entire string in a column for future reference, plus pull out any columns that you need to search on and index them. In later versions of MySQL, there is a JSON datatype, which could be useful.
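As a sketch only (the table name session_log is an assumption, and this needs MySQL 5.7+ with the payload stored as valid JSON), pulling the phase out into an indexed generated column could look like this:
ALTER TABLE session_log
  ADD COLUMN phase VARCHAR(32)
    AS (JSON_UNQUOTE(JSON_EXTRACT(payload, '$.phase'))) STORED,
  ADD INDEX idx_phase (phase);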
Please provide some SELECTs so we can further advise.
Oh, did I mention how important indexes are to databases?
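For reference, the per-session summary described in the question could be computed along these lines (untested; session_log, entry_time, and the phase column from the sketch above are assumptions, with entry_time taken to be a DATETIME):
SELECT session_id,
       MIN(entry_time) AS begin_time,
       MAX(entry_time) AS end_time,
       TIMESTAMPDIFF(SECOND, MIN(entry_time), MAX(entry_time)) AS duration_seconds,
       MAX(phase = 'success') AS success
FROM session_log
GROUP BY session_id;
An index on (session_id, entry_time) should help this grouping considerably.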
If the probability of losing a value (a field) in one update (or simply over some period of time) is greater than 0, then it is certain that at some point in the future the data will be lost.
But how relevant is this theoretical conclusion in practice?
I need a database that stores the "user likes" of a certain thing, with a table design of ID | THING | LIKES (int). In addition to that, there will be a table storing every single like of a user, with the design ID | USER | THING.
When the number of likes of a certain THING has to be displayed, it would be too slow to count every row of the second table WHERE THING = $value, so I would just look up LIKES in the first table, and when a user likes a thing I would just increase the number of LIKES by 1 (as in the theoretical question above).
I wouldn't worry about writing data from a "false values" point of view. Most databases I know of guarantee the ACID properties.
Counting is of course slower than already having the count available via a keyed lookup.
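A minimal sketch of that write path, assuming InnoDB tables (MyISAM has no transactions) and illustrative table/column names based on your two designs:
START TRANSACTION;
-- record the individual like (design: ID | USER | THING)
INSERT INTO user_likes (user_id, thing_id) VALUES (42, 7);
-- bump the cached counter (design: ID | THING | LIKES)
UPDATE thing_likes SET likes = likes + 1 WHERE thing_id = 7;
COMMIT;
Because both statements commit together, the cached LIKES value cannot silently drift away from the per-user rows due to a partial write.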
I have a database table for devices, and the columns are like this:
DeviceID | DeviceParameter1 | DeviceParameter2
At this stage I need only these parameters, but a few months down the line I may need to support a few more devices which have more parameters, so I'll have to add DeviceParameter3 etc. as columns.
A friend suggested that I keep the parameters as rows in another table (ParamCol) like this:
Column | ColumnNumber
---------------------------------
DeviceParameter1 | 1
DeviceParameter2 | 2
DeviceParameter3 | 3
and then refer to the columns like this:
DeviceID | ColumnNumber <- this is from the ParamCol table
---------------------------------------------------
switchA | 1
switchA | 2
routerB | 1
routerB | 2
routerC | 3
He says that for 3NF, when we expect a table whose columns may increase dynamically, it's better to keep the columns as rows. I don't believe him.
In your opinion, is this really the best way to handle a situation where the columns may increase or is there a better way to design a database for such a situation?
This is a "generic data model" question - if you google the term you'll find quite a bit of material on the net.
Here is my view: if and only if the parameters are NOT qualitatively different from the application perspective, then go with the dynamic row solution (i.e. a generic data model). What does qualitatively different mean? It means that within your application you don't treat Parameter3 any differently from Parameter17.
You should never ever generate new columns on-the-fly, that's a very bad idea. If the columns are qualitatively different and you want to be able to cater for new ones, then you could have a different Device Parameter table for each different category of parameters. The idea is to avoid dynamic SQL as much as possible as it brings a set of its own problems.
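Either way, the generic parameter table itself could be sketched like this (names are purely illustrative):
CREATE TABLE device_parameter (
  device_id   VARCHAR(64)  NOT NULL,  -- e.g. 'switchA', 'routerB'
  param_name  VARCHAR(64)  NOT NULL,  -- e.g. 'DeviceParameter1'
  param_value VARCHAR(255) NOT NULL,
  PRIMARY KEY (device_id, param_name)
);
New parameters then become new rows rather than new columns, so the schema never has to change.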
Adding dynamic columns is a bad idea; actually, it's a bad design. I would agree with your second option: adding rows is OK.
If you dynamically grow the columns, then you have to provide them with a default value, you will not be able to use them as 'UNIQUE' values, and you will find it really hard to update the tables. So it is better to stick with the 'add rows' plan.
I'm working on a URL shortener project with PHP & MySQL which tracks visits to each URL. I've created a table for visits which mainly consists of these properties:
time_in_second | country | referrer | os    | browser | device  | url_id
########################################################################
1348128639     | US      | direct   | win   | chrome  | mobile  | 3404
1348128654     | US      | google   | linux | chrome  | desktop | 3404
1348124567     | UK      | twitter  | mac   | mozilla | desktop | 3404
1348127653     | IND     | direct   | win   | IE      | desktop | 3465
Now I want to query this table. For example, I want to get visit data for the URL with url_id=3404. Because I need to provide statistics and draw graphs for this URL, I need data such as:
Number of visits from each kind of OS for this URL, for example 20 Windows, 15 Linux, ...
Number of visits in each desired period of time, for example every 10 minutes over the past 24 hours
Number of visits for each country
...
As you can see, some columns like country may take lots of different values.
One idea I can imagine is to make a query which outputs the number of each unique value in each column; for example, in the country case for the data given above, one column for num_US, one for num_UK, and one for num_IND.
Now the question is how to implement such a query with high performance in SQL (MySQL)?
Also, if you think this is not an efficient approach performance-wise, what is your suggestion?
Any help will be appreciated deeply.
UPDATE: look at this question: SQL; Only count the values specified in each column. I think that question is similar to mine, but the difference is in the variety of possible values for each column (as lots of values are possible for the country property), which makes the query more complex.
It looks like you need to do more than one query. You probably could write one query with different parameters, but that would make it complex and hard to maintain. I would approach it as multiple small queries: for each requirement, make a query and call them separately. For example, for the country stats you mentioned, you could do the following:
SELECT country, COUNT(*) FROM <TABLE_NAME> WHERE url_id = 3404 GROUP BY country
By the way, I have not tested this query, so it may be inaccurate, but this is just to give you an idea. I hope this helps.
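Along the same lines, the "10-minute buckets over the past 24 hours" figure could be sketched like this (again untested; the table name visits is a placeholder, and time_in_second is assumed to be a Unix timestamp):
SELECT FROM_UNIXTIME(FLOOR(time_in_second / 600) * 600) AS bucket_start,
       COUNT(*) AS visit_count
FROM visits
WHERE url_id = 3404
  AND time_in_second >= UNIX_TIMESTAMP(NOW() - INTERVAL 24 HOUR)
GROUP BY bucket_start
ORDER BY bucket_start;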
Also, another suggestion is to look into Google Analytics; it already provides a lot of what you are implementing, so maybe that helps as well.
Cheers.
Each of these graphs you want to draw represents a separate relation, so my off-the-cuff response is that you can't build a single query that gives you exactly the data you need for every graph you want to draw.
From this point, your choices are:
Use different queries for different graphs
Send a bunch of data to the client and let it do the required post-processing to create the exact sets of data it needs for different graphs
Farm it all out to Google Analytics (à la #wahab-mirjan)
If you go with option 2, you can minimize the amount of data you send by counting hits per (10-minute, os, browser, device, url_id) tuple. This essentially removes all duplicate rows and gives you a count. The client software would take these numbers and further reduce them by country (or whatever) to get the numbers it needs for a graph. To be honest though, I think you're buying yourself extra complexity for not very much gain.
If you insist on doing this yourself (instead of using a service), then go with a different query for each kind of graph. Start with a couple of reasonable indexes (url_id and time_in_second are obvious starting points). Use the EXPLAIN statement (or whatever your database provides) to understand how each query is executed.
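For instance (just a sketch; the table name visits is an assumption), a composite index covering the per-URL lookups, plus an EXPLAIN check, could look like:
CREATE INDEX idx_visits_url_time ON visits (url_id, time_in_second);

-- verify that the optimizer actually uses the index
EXPLAIN
SELECT country, COUNT(*)
FROM visits
WHERE url_id = 3404
GROUP BY country;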
Sorry, I am new to Stack Overflow and having a problem with comment formatting. Here is my answer again, hopefully it works now:
I'm not sure how it is poor in performance. The way I am thinking, you will end up with a table that looks like this:
country | count
#################
US      |   304
UK      |   123
IND     |    23
So when you group by country and count, it will be one query. I think this will get you going in the right direction. In any case, it is just an opinion, so if you find another approach, I am interested in knowing it as well.
Apologies about the comment mess-up up there.
Cheers
In my case I want to maintain a table to store some kind of data, and after some period remove the data from the first table and store it in another table.
I want to clarify what the best practice is in this kind of scenario.
I am using a MySQL database in a Java-based application.
Generally, I follow this procedure. In case I want to delete a row, I have a TINYINT column called deleted, and I mark this column for that row as true.
That indicates that the row has been marked as deleted, so I don't pick it up.
Later (maybe once a day), I run a script which in a single shot either deletes the rows entirely or migrates them to another table, etc.
This is useful because every time you delete a row (even if it's 1 row), MySQL has to update its indexes. This might require significant system resources depending on your data size or number of indexes. You might not want to incur this overhead every time...
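A sketch of that pattern, with assumed table names (my_table and an identically structured my_table_archive):
-- "delete": just flag the row
UPDATE my_table SET deleted = 1 WHERE id = 123;

-- daily cleanup script: move flagged rows to the archive, then remove them in one shot
START TRANSACTION;
INSERT INTO my_table_archive SELECT * FROM my_table WHERE deleted = 1;
DELETE FROM my_table WHERE deleted = 1;
COMMIT;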
You did not provide enough information, but I think if both tables have the same data structure then you can avoid using two tables. Just add another column to the first table and set a status/type for the records that would have gone into the second table.
For Example:
id | Name | BirthDate | type
------------------------------------
1 | ABC | 01-10-2001 | firsttable
2 | XYZ | 01-01-2000 | secondtable
You can pick records like this:
select * from tablename where type='firsttable'
OR
select * from tablename where type='secondtable'
If you are archiving old data, there should be a way to set up a scheduled job in MySQL. I know there is in SQL Server, and it's the kind of function that most databases require, so I imagine it can be done in MySQL. Schedule the job to run in the low-usage hours. Have it select all records more than a year old (or however long you want to keep records active), move them to an archive table, and then delete them. Depending on the number of records you would be moving, it might be best to do this once a week or daily. You don't want the number of records expiring to be so large that it affects performance greatly or makes the job take too long.
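MySQL does in fact have a built-in scheduler (events). This is only a rough sketch with assumed names (a records table with an entry_time column and an identically structured records_archive), and it requires the event scheduler to be enabled (SET GLOBAL event_scheduler = ON):
DELIMITER //
CREATE EVENT archive_old_records
ON SCHEDULE EVERY 1 WEEK
  STARTS CURRENT_TIMESTAMP + INTERVAL 1 DAY   -- adjust the start to a low-usage hour
DO
BEGIN
  -- copy records older than a year to the archive table, then remove them
  INSERT INTO records_archive
    SELECT * FROM records WHERE entry_time < NOW() - INTERVAL 1 YEAR;
  DELETE FROM records WHERE entry_time < NOW() - INTERVAL 1 YEAR;
END//
DELIMITER ;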
In archiving, the critical piece is to make sure you keep all the records that will be needed frequently, and don't forget to consider reporting (many reports need to have a year's or two years' worth of data; do not archive records those reports will need). Then you also need to set up a way for users to access the archived records on the rare occasions they may need to see them.