I'm working on a URL shortener project with PHP & MySQL that tracks visits to each URL. I have a table for visits which mainly consists of these columns:
time_in_second | country | referrer | os    | browser | device  | url_id
########################################################################
1348128639     | US      | direct   | win   | chrome  | mobile  | 3404
1348128654     | US      | google   | linux | chrome  | desktop | 3404
1348124567     | UK      | twitter  | mac   | mozilla | desktop | 3404
1348127653     | IND     | direct   | win   | IE      | desktop | 3465
Now I want to query this table. For example, I want to get the visit data for the URL with url_id=3404. Because I have to provide statistics and draw graphs for this URL, I need the following data:
Number of visits per OS for this URL, for example 20 Windows, 15 Linux, ...
Number of visits in each desired period of time, for example every 10 minutes over the past 24 hours
Number of visits for each country
...
As you can see, some columns like country can take many different values.
One idea I can imagine is a query that outputs the count of each unique value in each column; for the country column in the data above, that would mean one column for num_US, one for num_UK, and one for num_IND.
Now the question is: how do I implement such a query efficiently in SQL (MySQL)?
Also, if you think this kind of query can't perform well, what would you suggest instead?
Any help will be deeply appreciated.
UPDATE: Look at this question: SQL; Only count the values specified in each column. It is similar to mine, but the difference is the variety of possible values per column (many values are possible for the country property), which makes the query more complex.
It looks like you need more than one query. You could probably write a single query with different parameters, but that would make it complex and hard to maintain. I would approach it as multiple small queries: one query per requirement, each called separately. For example, for the per-country counts you mentioned, you could do the following:
SELECT country, COUNT(*) FROM <TABLE_NAME> WHERE url_id = 3404 GROUP BY country
By the way, I have not tested this query, so it may be inaccurate, but this is just to give you an idea. I hope this helps.
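The other requirements would follow the same pattern, one small query each. Here is a rough, untested sketch for the OS breakdown and for 10-minute buckets over the past 24 hours, assuming the table is called visits and time_in_second is a Unix timestamp (adjust the names to your schema):

-- visits per OS for this URL
SELECT os, COUNT(*) AS num_visits
FROM visits
WHERE url_id = 3404
GROUP BY os;

-- visits per 10-minute bucket (600 s) in the past 24 hours (86400 s)
SELECT FLOOR(time_in_second / 600) * 600 AS bucket_start,
       COUNT(*) AS num_visits
FROM visits
WHERE url_id = 3404
  AND time_in_second >= UNIX_TIMESTAMP() - 86400
GROUP BY bucket_start
ORDER BY bucket_start;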
Another suggestion is to look into Google Analytics; it already provides a lot of what you are implementing, so that may help as well.
Cheers.
Each of these graphs you want to draw represents a separate relation, so my off-the-cuff response is that you can't build a single query that gives you exactly the data you need for every graph you want to draw.
From this point, your choices are:
Use different queries for different graphs
Send a bunch of data to the client and let it do the required post-processing to create the exact sets of data it needs for different graphs
Farm it all out to Google Analytics (à la @wahab-mirjan)
If you go with option 2 you can minimize the amount of data you send by counting hits per (10-minute bucket, os, browser, device, url_id) tuple. This essentially removes all duplicate rows and gives you a count. The client software would take these numbers and further reduce them by country (or whatever) to get the numbers it needs for a graph. To be honest, though, I think you're buying yourself extra complexity for not very much gain.
If you insist on doing this yourself (instead of using a service), then go with a different query for each kind of graph. Start with a couple of reasonable indexes (url_id and time_in_second are obvious starting points). Use the EXPLAIN statement (or whatever your database provides) to understand how each query is executed.
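For example (a sketch only, assuming the table is called visits):

-- composite index so per-URL, per-time-range queries can be resolved efficiently
ALTER TABLE visits ADD INDEX idx_url_time (url_id, time_in_second);

EXPLAIN
SELECT country, COUNT(*) AS num_visits
FROM visits
WHERE url_id = 3404
GROUP BY country;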
Sorry, I am new to Stack Overflow and was having a problem with comment formatting. Here is my answer again; hopefully it works now:
I am not sure how this would be poor in performance. The way I am thinking, you will end up with a result that looks like this:
country | count
#################
US      | 304
UK      | 123
IND     | 23
So when you group by country and count, it is one query. I think this will get you going in the right direction. In any case, it is just an opinion, so if you find another approach, I am interested in hearing about it as well.
Apologies for the comment mess-up up there.
Cheers
I want to keep track of a per-user counter through time and be able to generate stats about the changes in the counter over time.
I'm pretty set on the two main tables (although if there are better ways I would like to hear about them): user and counter_change, which would look pretty much like this:
user:
+-----------+------------+
| id | username |
+-----------+------------+
| 1 | foo |
| 2 | bar |
+-----------+------------+
counter_change:
+-----------+--------------------+------------+
| user_id | counter_change_val | epoch_time |
+-----------+--------------------+------------+
| 1 | 10 | 1513242884 |
| 1 | -1 | 1513242889 |
+-----------+--------------------+------------+
I want to be able to show the current counter value (with a base value of 0) on the frontend, as well as some stats through time (e.g. yesterday your net counter change was +10 or -2, etc.).
I've thought about some possible solutions but none of them seem to be the perfect solution.
Option 1: Add a counter to the user table (or to some new counters table):
This solution seems to be the most resource-effective: at the time of inserting a counter_change, update the counter in user with the counter_change_val.
Pros:
Getting the counter's current value would consume virtually no resources.
Cons:
The sum of counter_change_val could diverge from the counter in user if a bug occurs.
It couldn't really be used for stats fields, as that would require an additional query, and at that point a trigger would be more handy.
Option 2: Add a calculated counter to the user table (or to some new counters table) on insert/update:
This solution would consist of a SQL trigger, or some sort of function at the ORM level, that updates the value on each insert into the counter_change table with the sum of the counter_change_val.
This would also be used for calculated fields that imply grouping by date, for example getting the average daily change over the last 30 days.
Pros:
Getting the counter's current value would consume virtually no resources.
Cons:
On every insert, an aggregation of all of the current user's counter_change rows would be needed.
Option 3: Add a view or select with the sum of the counter changes:
This solution would consist of creating a view or select that sums counter_change_val when needed (a sketch follows this list).
Pros:
Adds no fields to the tables.
Cons:
As it is calculated at runtime, it would add to the request response time.
Every time the counter is consulted, an aggregation of the counter_change values would be needed.
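Something like this (just a sketch; exact naming aside):

CREATE VIEW user_counter AS
SELECT u.id AS user_id,
       COALESCE(SUM(c.counter_change_val), 0) AS counter_value  -- LEFT JOIN so users with no changes still show the base value 0
FROM user u
LEFT JOIN counter_change c ON c.user_id = u.id
GROUP BY u.id;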
Actually, I am not sure that I have understood what you are trying to do. Nevertheless, I would suggest Option 1 or Option 2:
Option 1 is efficient, and it is sufficiently safe against errors if it is done right. For example, you could wrap inserting the counter_change and computing the new counter_value in a transaction; this will prevent any inconsistencies. You could do that either in the back-end software or in a trigger (e.g. upon inserting a counter_change).
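For instance, the back-end variant could be sketched like this (it assumes a counter_value column added to user; the names are illustrative):

ALTER TABLE user ADD COLUMN counter_value INT NOT NULL DEFAULT 0;

START TRANSACTION;
-- record the change and adjust the stored counter atomically
INSERT INTO counter_change (user_id, counter_change_val, epoch_time)
VALUES (1, 10, UNIX_TIMESTAMP());
UPDATE user SET counter_value = counter_value + 10 WHERE id = 1;
COMMIT;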
Regarding Option 2, it is not clear to me why an aggregation over all the counter_change rows of the current user would be needed. You can adjust the counter_value in the user table from within an insert trigger, as with Option 1, and you can use transactions to make it safe.
IMHO, adjusting the current counter_value upon every insert of a counter_change is the most efficient solution. You can do it either in the back-end software or from within a trigger. In both cases, use transactions.
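A minimal sketch of the trigger variant, using the same counter_value column as above (untested):

-- keep the stored counter in sync with every inserted change
CREATE TRIGGER counter_change_after_insert
AFTER INSERT ON counter_change
FOR EACH ROW
UPDATE user
   SET counter_value = counter_value + NEW.counter_change_val
 WHERE id = NEW.user_id;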
Option 3 should not be used because it will put a lot of load onto the system (imagine you have 1000 counter_change rows per user ...).
Regarding the statistics: This is a different problem from storing the data in the first place. You probably will need some sort of aggregation for any statistical data. To speed this up, you could think about caching results and things like that.
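For example, the net change per day over the last 30 days would be a single grouped query over counter_change, whose result could then be cached (a sketch using the tables from the question):

SELECT DATE(FROM_UNIXTIME(epoch_time)) AS day,
       SUM(counter_change_val) AS net_change
FROM counter_change
WHERE user_id = 1
  AND epoch_time >= UNIX_TIMESTAMP() - 30 * 86400
GROUP BY day
ORDER BY day;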
I have an application that lets devices communicate over MQTT.
When two (or more) devices are paired, they are in a session (with a session-id).
The topics are for example:
session/<session-id>/<sender-id>/phase
with a payload like
{'phase': 'start', 'othervars': 'examplevar'}
Every session is logged to a MySQL database in the following format:
| id | session-id | sender | topic (example: phase) | payload | entry-time | ...
Now, when I want to get a whole session, I can just query by session-id.
Another view I want to achieve looks like this:
| session-id (distinct) | begin time | end time | duration | success |
Success is a boolean: it is true when the session contains an entry whose payload has 'phase': 'success'; otherwise the session is not successful.
Now I have the problem that this query is very slow. Every time I want to access it, it has to work out for each session whether it was successful, along with the time calculations.
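The query I have in mind looks roughly like this (a simplified sketch; my real table and column names differ):

SELECT session_id,
       MIN(entry_time) AS begin_time,
       MAX(entry_time) AS end_time,
       TIMESTAMPDIFF(SECOND, MIN(entry_time), MAX(entry_time)) AS duration,
       MAX(payload LIKE '%success%') AS success  -- crude check for a 'phase': 'success' entry
FROM session_log
GROUP BY session_id;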
Should I run a script at the end of each session to calculate this information and put it in another table? The problem I have with that solution is that I will have duplicate data.
Can I make this faster with indexes? Or did I just make a huge design mistake from the beginning?
Thanks in advance
Indexes? Yes. YES!
If session-id is unique, get rid of id and use PRIMARY KEY(session_id).
success could be TINYINT NOT NULL with values 0 or 1 for fail or success.
If the "payload" is coming in as JSON, then I suggest storing the entire string in a column for future reference, plus pull out any columns that you need to search on and index them. In later versions of MySQL, there is a JSON datatype, which could be useful.
Please provide some SELECTs so we can further advise.
Oh, did I mention how important indexes are to databases?
I have a database for a device and the columns are like this:
DeviceID | DeviceParameter1 | DeviceParameter2
At this stage I need only these parameters, but a few months down the line I may need a few more devices that have more parameters, so I'll have to add DeviceParameter3, etc., as columns.
A friend suggested that I keep the parameters as rows in another table (ParamCol) like this:
Column | ColumnNumber
---------------------------------
DeviceParameter1 | 1
DeviceParameter2 | 2
DeviceParameter3 | 3
and then refer to the columns like this:
DeviceID | ColumnNumber <- this is from the ParamCol table
---------------------------------------------------
switchA | 1
switchA | 2
routerB | 1
routerB | 2
routerC | 3
He says that for 3NF, when we expect a table whose columns may increase dynamically, it's better to keep the columns as rows. I don't believe him.
In your opinion, is this really the best way to handle a situation where the columns may increase or is there a better way to design a database for such a situation?
This is a "generic data model" question - if you google the term you'll find quite a bit of material on the net.
Here is my view: if and only if the parameters are NOT qualitatively different from the application's perspective, then go with the dynamic-row solution (i.e. a generic data model). What does qualitatively different mean? It means that within your application you don't treat Parameter3 any differently from Parameter17.
You should never ever generate new columns on-the-fly, that's a very bad idea. If the columns are qualitatively different and you want to be able to cater for new ones, then you could have a different Device Parameter table for each different category of parameters. The idea is to avoid dynamic SQL as much as possible as it brings a set of its own problems.
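A minimal sketch of what such a generic (dynamic-row) model could look like, with assumed names:

CREATE TABLE parameter (
  parameter_id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(64) NOT NULL UNIQUE
);

CREATE TABLE device_parameter (
  device_id VARCHAR(64) NOT NULL,
  parameter_id INT NOT NULL,
  value VARCHAR(255),
  PRIMARY KEY (device_id, parameter_id),
  FOREIGN KEY (parameter_id) REFERENCES parameter (parameter_id)
);

A new kind of parameter is then a new row in parameter, not a new column, and no dynamic SQL is needed.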
Adding columns dynamically is a bad idea; in fact, it's a bad design. I would agree with your second option: adding rows is fine.
If you grow the columns dynamically, you have to provide a default value for each one, you won't be able to use them as UNIQUE values, and you will find it really hard to update the tables. So it's better to stick with the plan of adding rows.
After searching many forums, I think my problem is how to phrase the question properly, because I can't seem to find an answer remotely close to what I need, yet by the looks of it this is Excel-to-MySQL 101.
I have an Excel sheet with dozens of types of blinds (for windows). There is a top row with the widths and a left column with the heights. Cross-referencing a width and a height (say 24 x 36) gives a price value.
         |  24 |  30 |  32 |  36   (width)
------------------------------------------
      24 | $50 | $55 | etc
      30 | $60 | etc | etc          (price)
      32 | $70
(height)
I can't for the life of me figure out where or how I am supposed to import this into MySQL when my database table looks like this:
itemname_id (my primary key) | width | height | price
-------------------------------------------------------------------
Am I doomed to manually typing thousands of combinations, or is this a common problem? How do I find the correct terms to search for a solution? Evidently I'm not speaking the right lingo.
Thank you so much for any guidance. I've looked forever and I keep hitting a wall.
It probably would have helped to know that the layout of your Excel data is commonly referred to as a pivot table. It is possible to "unpivot" the data in Excel to get the data in the format that you want to import to your database.
This brief article shows how to create a pivot table and then unpivot it. Basically, that entails creating a "sum of values" pivot table and then double-clicking on the single value that is the result. It's counter-intuitive, but pretty simple to do.
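If you would rather do the unpivoting on the MySQL side instead, one option (a sketch; the staging table and column names are made up, and it assumes itemname_id is auto-generated) is to import the grid as-is into a staging table with one column per width, then flatten it with a UNION ALL insert:

CREATE TABLE blind_prices_staging (
  height INT,
  w24 DECIMAL(8,2),  -- one column per width in the spreadsheet
  w30 DECIMAL(8,2),
  w32 DECIMAL(8,2),
  w36 DECIMAL(8,2)
);

INSERT INTO blind_prices (width, height, price)
SELECT 24, height, w24 FROM blind_prices_staging WHERE w24 IS NOT NULL
UNION ALL
SELECT 30, height, w30 FROM blind_prices_staging WHERE w30 IS NOT NULL
UNION ALL
SELECT 32, height, w32 FROM blind_prices_staging WHERE w32 IS NOT NULL
UNION ALL
SELECT 36, height, w36 FROM blind_prices_staging WHERE w36 IS NOT NULL;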
Have you noticed that almost every link on Facebook has a ref query string?
I believe that, with that ref, Facebook somehow tracks and studies their users' behaviour. This could be their secret recipe for better usability.
So, I am trying out the same thing, change http://a.com/b.aspx
to
http://a.com/b.aspx?ref=c and log every hit into a table.
========================================================================
userid | page | ref | response_time | dtmTime
========================================================================
54321 | profile.aspx | birthday | 123 | 2009-12-23 11:05:00
12345 | compose.aspx | search | 456 | 2009-12-23 11:05:02
54321 | payment.aspx | gift | 234 | 2009-12-23 11:05:01
12345 | chat.aspx | search | 567 | 2009-12-23 11:05:03
..... | ............ | ........ | ... | ...................
I think it's a good start. I just don't know what to do with this information.
Is there an appropriate methodology for processing this information?
Research has shown that fast responses improve not only the usability of a website, but also conversion rates and site usage in general.
Tests at Amazon revealed that every 100 ms increase in load time of Amazon.com decreased sales by 1%
Experiments at Microsoft on Live Search showed that when search results pages were slowed by 1 second: a) Queries per user declined by 1.0%, and b) Ad clicks per user declined by 1.5%
People simply don't want to wait. Therefore, we track response time percentiles for our sites. Additionally, nice visualization of this data helps with measuring performance optimization efforts and monitoring server health.
Here is an example generated using Google Charts:
That looks bad! Response times of > 4000 ms certainly indicate performance problems that have a considerable impact on usability. At times the 800 ms percentile (which we consider a good indicator for our apps) was as low as 77%. We typically try to get the 800 ms percentile at 95%. So this looks like there's some serious work ahead ... but the image is nice, isn't it? ;)
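The numbers behind such a chart can come straight from a log table like the one in the question. A sketch, assuming the table is called hits and response_time is in milliseconds:

SELECT DATE_FORMAT(dtmTime, '%Y-%m-%d %H:00') AS hour,
       COUNT(*) AS requests,
       ROUND(100 * SUM(response_time <= 800) / COUNT(*), 1) AS pct_within_800ms  -- share of requests answered within 800 ms
FROM hits
GROUP BY hour
ORDER BY hour;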
Here's a second answer as the former was only about response time statistics.
The ref query string allows you to identify traffic sources, especially for people entering a conversion funnel. So you can make statements like "$N of revenue comes from users clicking link X on page Y". You could then try modifying link X to X1 and see if it increases revenue from this page. That would be your first step into A/B testing and multivariate analysis. Google Website Optimizer is a tool built exactly for this purpose.
Facebook uses them (I believe) to observe how the user interface is used: they see where people click more (the logo or the profile link, say) and consider changing the UI accordingly to make interaction better.
You might also be able to use it to spot common usage patterns. For instance, if people follow a certain chain like profile -> birthday -> present -> send, you might consider adding a "send present" feature on a profile when it's that person's birthday. Just a thought.
To make the best use of your website statistics, you need to think about what your users are trying to achieve and what you want them to achieve. These are your site's goals.
For an ecommerce site this is fairly easy. Typical goals might be:
Search for a product and find information about it.
Buy a product.
Contact someone for help.
You can then use your stats to see whether people are completing the site's goals. To do this you need to collate each visitor's information so you can see all the pages they have been to.
Once you can look at all the pages a user has visited, and the sequence they visited them in, you can see what they have been doing. You can look for drop-out points where they were about to buy something and then didn't. You can identify product searches that were unsuccessful. You can do all sorts of things. You can then try to fix these issues and watch the stats to see if it has helped.
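With the table you already have, the simplest way to see one visitor's path is to order their hits by time (a sketch, assuming the table is called hits):

SELECT page, ref, response_time, dtmTime
FROM hits
WHERE userid = 54321
ORDER BY dtmTime;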
The stats you're collecting are a good start, but collecting good stats and collating them is complicated. I'd suggest using an existing stats package; I personally use Google Analytics, but there are others available.