Average a difference in timestamps across all rows - mysql

I have a table that looks like so:
id | username | jointime | parttime
------------------------------------------
1 | foo | 1391806818 | 1391814383
2 | bar | 1391406218 | 1392714270
3 | baz | 1391327818 | 1393197383
4 | qux | 1391815603 | 1391818320
I would like to find the overall average time that's being spent on the site (parttime - jointime).
I tried a query like the one below, but it just returned the average time spent by one single user.
SELECT AVG(parttime - jointime) as time FROM foo_table
Any ideas as to how I can get the overall average difference?
Thanks

You are already getting what you want.
Using your example data, result is:
796974.75
http://sqlfiddle.com/#!2/d94977/1/0
That is the average of the 4 differences in your example data:
7565
1308052
1869565
2717
http://sqlfiddle.com/#!2/d94977/2/0
It is not the average for one particular user, but the average across all sessions stored in the table, which I believe is what you want.
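If you want to double-check the arithmetic locally, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (AVG behaves the same way for this query); the table and column names are the ones from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo_table ("
    "id INTEGER PRIMARY KEY, username TEXT, jointime INTEGER, parttime INTEGER)"
)
conn.executemany(
    "INSERT INTO foo_table VALUES (?, ?, ?, ?)",
    [
        (1, "foo", 1391806818, 1391814383),
        (2, "bar", 1391406218, 1392714270),
        (3, "baz", 1391327818, 1393197383),
        (4, "qux", 1391815603, 1391818320),
    ],
)

# Per-row differences, then the average over all rows.
diffs = [r[0] for r in conn.execute(
    "SELECT parttime - jointime FROM foo_table ORDER BY id")]
avg_time = conn.execute(
    "SELECT AVG(parttime - jointime) AS time FROM foo_table").fetchone()[0]

print(diffs)     # the four differences listed above
print(avg_time)  # 796974.75
```

The aggregate runs over every row because there is no GROUP BY, so a single value for the whole table is the expected result.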

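Wrapping that in a SUM will not work: MySQL rejects nested aggregates such as SUM(AVG(...)) with an "Invalid use of group function" error. If you want the total time rather than the average, use SUM on its own:
SELECT SUM(parttime - jointime) as time FROM foo_table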

Sql query with mixed selected rows order

I have a table that looks like below:
| id | group_id | title |
-------------------------
| 1 | 1 | Hello |
| 2 | 1 | World |
| 3 | 2 | Foo |
| 4 | 2 | Bar |
My query may look like below to return the results above:
SELECT * FROM my_table ORDER BY id
Question
How can I order this table so that the group ids appear to be random, but the order is still the same every time the query is executed?
Possible result example
This result looks to be in a random order. If I run the same query a week later, I want to see the exact same order which means it's not really random.
| id | group_id | title |
-------------------------
| 2 | 1 | World |
| 4 | 2 | Bar |
| 1 | 1 | Hello |
| 3 | 2 | Foo |
Appears to be random from a group_id perspective. It's no longer ordered by group_id like 1 1 2 2, but 1 2 1 2. It could also have been 2 1 1 2 or something that does not increase.
Should return the same results every time, not random each time.
I could order by title but if a title should change that row will be reordered. So the order needs to be made with the id I guess.
I want to avoid file or database caching if possible.
Is it possible?
How about using the modulo function to your advantage?
SELECT * FROM my_table ORDER BY id % 3,id
Define a value to use with the modulo function (in my example 3) and order your table by the modulo of the id.
This should return the same order every time you run the query, and the order looks pseudo-random.
Since the modulo function can return the same value for different ids, you also need to order by the original id to get a defined, reproducible order.
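Here is a minimal sketch of the modulo ordering on the sample data. It assumes Python's sqlite3 module as a stand-in for MySQL; the % operator and ORDER BY behave the same way for this query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, group_id INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 1, "Hello"), (2, 1, "World"), (3, 2, "Foo"), (4, 2, "Bar")],
)

query = "SELECT id, group_id, title FROM my_table ORDER BY id % 3, id"

# Running the query twice gives the same order, because it depends only on id.
first_run = conn.execute(query).fetchall()
second_run = conn.execute(query).fetchall()

for row in first_run:
    print(row)
# (3, 2, 'Foo')
# (1, 1, 'Hello')
# (4, 2, 'Bar')
# (2, 1, 'World')
```

With a divisor of 3 the group ids come out as 2 1 2 1 for this data. A different divisor gives a different, but equally stable, order.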
"order this table so that the group ids appears to be random"
Only ORDER BY RAND() may provide really random ordering.
"but still the same every time the query is executed"
Create a separate static ordering table, fill it randomly with the source table's ids, join to it, and order by it.
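A rough sketch of the static ordering table idea, assuming Python's sqlite3 module as a stand-in for MySQL. The helper table my_table_order and its position column are hypothetical names chosen for illustration.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, group_id INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 1, "Hello"), (2, 1, "World"), (3, 2, "Foo"), (4, 2, "Bar")],
)

# Hypothetical helper table: one row per source id, with a fixed position.
conn.execute(
    "CREATE TABLE my_table_order ("
    "id INTEGER PRIMARY KEY, position INTEGER NOT NULL UNIQUE)"
)

# Shuffle the ids once, when the ordering table is filled.
ids = [r[0] for r in conn.execute("SELECT id FROM my_table ORDER BY id")]
random.shuffle(ids)
conn.executemany(
    "INSERT INTO my_table_order (id, position) VALUES (?, ?)",
    [(id_, pos) for pos, id_ in enumerate(ids)],
)

query = (
    "SELECT t.id, t.group_id, t.title "
    "FROM my_table t JOIN my_table_order o ON o.id = t.id "
    "ORDER BY o.position"
)

# The stored positions, not a fresh random draw, decide the order.
first_run = conn.execute(query).fetchall()
second_run = conn.execute(query).fetchall()
print(first_run == second_run)  # True
```

The randomness is spent once at fill time, so the order survives title changes. New rows in my_table would need a matching row in the ordering table.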
I did not solve the problem with the solution from #Kylro, but I found another way which works great.
SELECT * FROM my_table ORDER BY COS(id), id
COS returns a positive value for some ids and a negative value for others, which looks almost random. It works perfectly for this problem.

SQL group by dates, simple select statement

INTRO: Given a table with a column 'time' of unique dates (or datetimes) and another column with some random integer called 'users'.
I usually do a call as such:
select table.dates, count(table.dates)
from table
group by year(table.dates), month(table.dates)
order by table.dates desc
which will return the number of users per month, albeit in an unformatted way. (I know it's not the standard way, but I checked my values and this seems to work.)
Here is my problem:
DATA: a table with non-unique year/month dates, and a corresponding user count on that row.
PROBLEM: I wish to sum the user counts for identical dates, and again show a user count for every month.
EDIT: Perhaps you can ignore the INTRO, and here is an example of the data I need to work with:
| Date |user count |
------------------------
|2015-01 | 9 |
|2014-09 | 5 |
|2014-09 | 2 |
|2014-08 | 5 |
|2014-09 | 7 |
|2014-08 | 2 |
|2014-07 | 3 |
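One possible way to get a single summed row per month is GROUP BY with SUM. This is only a sketch: it assumes Python's sqlite3 module as a stand-in for MySQL, and the table name monthly_users and the column names month and user_count are hypothetical, since the question does not give them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; the sample rows are from the question.
conn.execute("CREATE TABLE monthly_users (month TEXT, user_count INTEGER)")
conn.executemany(
    "INSERT INTO monthly_users VALUES (?, ?)",
    [
        ("2015-01", 9),
        ("2014-09", 5),
        ("2014-09", 2),
        ("2014-08", 5),
        ("2014-09", 7),
        ("2014-08", 2),
        ("2014-07", 3),
    ],
)

# One row per month, with the counts of identical months added together.
totals = conn.execute(
    "SELECT month, SUM(user_count) AS users "
    "FROM monthly_users "
    "GROUP BY month "
    "ORDER BY month DESC"
).fetchall()

for row in totals:
    print(row)
# ('2015-01', 9)
# ('2014-09', 14)
# ('2014-08', 7)
# ('2014-07', 3)
```

If the column holds full dates rather than year-month strings, the grouping expression would have to reduce each date to its year and month first.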

Access a parent field from sub query in mysql

I'm trying to access a field being called from the parent query within a nested one and here is my table
TABLE: reminders.
Columns: id:PK, rid:VARCHAR, title:VARCHAR, remind:Integer, start_day:DATE
SELECT id, remind, rid, title
FROM reminders
WHERE DATEDIFF(start_day, NOW()) <= (SELECT LEAST(3, remind))
Basically, the second "remind" column in the LEAST() call is supposed to reference the "remind" value of the current row, but for reasons I can't work out I keep getting unexpected results.
EDIT
In response to Sir Gordon's request that I provide more detailed info: I really do not know how to present table data here, but I'll try my best.
So basically i'm trying to SELECT all items from the reminders table WHERE the DIFFERENCE between the SET DAY (start_day) and TODAY doesn't exceed one of TWO values, those are either 3 or the value set in the remind column of the current row. Basically if the value set there is less than 3 then it should be used instead, but if it exceeds 3, 3 should be chosen. Here's a visual of the table.
+---+-----------------+--------------------+-----------------+-------------+
|id | rid | title | start_day | remind |
+---+-----------------+--------------------+-----------------+-------------+
|1 | ER456GH | This is real deep | 2014-01-01 | 10 |
|2 | OUBYV90 | This is also deep | 2014-01-13 | 10 |
|3 | UI90POL | This is deeper | 2014-01-13 | 60 |
|4 | TWEET90 | This is just deep | 2014-01-14 | 0 |
+---+-----------------+--------------------+-----------------+-------------+
So in editing this I realized that there was a bad table entry under remind on the 4th row that was causing it to be excluded (i.e., where remind = 0). Sigh. Some serious short-sightedness on my part, or lack of sleep, I guess. The query does work. Thanks again.
You don't need a subquery here. Does this work?
SELECT id, remind, rid, title
FROM reminders
WHERE DATEDIFF(start_day, NOW()) <= LEAST(3, remind);
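To see how the condition treats each sample row, here is a small sketch. It assumes Python's sqlite3 module as a stand-in for MySQL, so two substitutions are needed: SQLite's two-argument MIN() plays the role of LEAST(), and a julianday() difference plays the role of DATEDIFF(start_day, NOW()). The fixed "today" of 2014-01-12 is also an assumption, chosen so the output is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reminders ("
    "id INTEGER PRIMARY KEY, rid TEXT, title TEXT, remind INTEGER, start_day TEXT)"
)
conn.executemany(
    "INSERT INTO reminders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "ER456GH", "This is real deep", 10, "2014-01-01"),
        (2, "OUBYV90", "This is also deep", 10, "2014-01-13"),
        (3, "UI90POL", "This is deeper", 60, "2014-01-13"),
        (4, "TWEET90", "This is just deep", 0, "2014-01-14"),
    ],
)

today = "2014-01-12"  # assumed fixed date instead of NOW()

# MIN(3, remind) is SQLite's scalar minimum, standing in for LEAST(3, remind).
# The julianday difference is start_day minus today in whole days.
rows = conn.execute(
    "SELECT id, remind, rid, title "
    "FROM reminders "
    "WHERE CAST(julianday(start_day) - julianday(?) AS INTEGER) <= MIN(3, remind) "
    "ORDER BY id",
    (today,),
).fetchall()

matched_ids = [r[0] for r in rows]
print(matched_ids)  # [1, 2, 3]
```

Row 4 is two days away, but its remind value of 0 makes the limit 0, so it is left out. That matches the behaviour described in the edit above.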

MySQL - how to optimize query to count votes

Just after some opinions on the best way to achieve the following outcome:
I would like to store in my MySQL database products which can be voted on by users (each vote is worth +1). I also want to be able to see how many times in total a user has voted.
To my simple mind, the following table structure would be ideal:
table: product table: user table: user_product_vote
+----+-------------+ +----+-------------+ +----+------------+---------+
| id | product | | id | username | | id | product_id | user_id |
+----+-------------+ +----+-------------+ +----+------------+---------+
| 1 | bananas | | 1 | matthew | | 1 | 1 | 2 |
| 2 | apples | | 2 | mark | | 2 | 2 | 2 |
| .. | .. | | .. | .. | | .. | .. | .. |
This way I can do a COUNT of the user_product_vote table for each product or user.
For example, when I want to look up bananas and the number of votes to show on a web page I could perform the following query:
SELECT p.product AS product, COUNT( v.id ) as votes
FROM product p
LEFT JOIN user_product_vote v ON p.id = v.product_id
WHERE p.id =1
If my site became hugely successful (we can all dream) and I had thousands of users voting on thousands of products, I fear that performing such a COUNT with every page view would be highly inefficient in terms of server resources.
A simpler approach would be to have a 'votes' column in the product table that is incremented each time a vote is added.
table: product
+----+-------------+-------+
| id | product | votes |
+----+-------------+-------+
| 1 | bananas | 2 |
| 2 | apples | 5 |
| .. | .. | .. |
While this is more resource friendly - I lose data (eg. I can no longer prevent a person from voting twice as there is no record of their voting activity).
My questions are:
i) am I being overly worried about server resources and should just stick with the three table option? (ie. do I need to have more faith in the ability of the database to handle large queries)
ii) is there a more efficient way of achieving the outcome without losing information?
You can never be too worried about resources. When you first start building an application, you should always keep resources, space, speed, etc. in mind; if your site's traffic grows dramatically and you never planned for it, you will start running into problems.
As for the vote system, personally I would keep the votes like so:
table: product table: user table: user_product_vote
+----+-------------+ +----+-------------+ +----+------------+---------+
| id | product | | id | username | | id | product_id | user_id |
+----+-------------+ +----+-------------+ +----+------------+---------+
| 1 | bananas | | 1 | matthew | | 1 | 1 | 2 |
| 2 | apples | | 2 | mark | | 2 | 2 | 2 |
| .. | .. | | .. | .. | | .. | .. | .. |
Reasons:
Firstly, user_product_vote does not contain text, blobs, etc.; it is purely integer columns, so it takes up fewer resources anyway.
Secondly, it opens the door to new features within your application, such as total votes in the last 24 hours, the highest-rated product over the past 24 hours, etc.
Take this example for instance:
table: user_product_vote
+----+------------+---------+-----------+------+
| id | product_id | user_id | vote_type | time |
+----+------------+---------+-----------+------+
| 1 | 1 | 2 | product |224.. |
| 2 | 2 | 2 | page |218.. |
| .. | .. | .. | .. | .. |
And a simple query:
SELECT COUNT(id) as total FROM user_product_vote WHERE vote_type = 'product' AND time BETWEEN(....) ORDER BY time DESC LIMIT 20
Another thing: if a user voted at 1AM and then tried to vote again at 2PM, you can easily check when they last voted and whether they should be allowed to vote again.
There are so many opportunities that you will be missing if you stick with your incremental example.
In regards to your count(), no matter how much you optimize your queries it would not really make a difference on a large scale.
With an extremely large user base, your resource usage will be looked at from a different perspective, such as load balancers, server settings, Apache, caching, etc.; there's only so much you can do with your queries.
"If my site became hugely successful (we can all dream) and I had thousands of users voting on thousands of products, I fear that performing such a COUNT with every page view would be highly inefficient in terms of server resources."
Don't waste your time solving imaginary problems. MySQL is perfectly able to process thousands of records in fractions of a second; this is what databases are for. A clean and simple database and code structure is far more important than the mythical "optimization" that no one needs.
Why not mix and match both? Simply keep the final counts in the product and user tables, so that you don't have to count every time, and keep the votes table, so that there is no double voting.
Edit:
To explain it a bit further, the product and user tables will each have a column called "votes". Every time the insert into user_product_vote is successful, increment the relevant user and product records. This would avoid duplicate votes, and you won't have to run the complex count query every time either.
Edit:
Also, I am assuming that you have created a unique index on product_id and user_id. In that case any duplicate attempt will automatically fail, and you won't have to check the table before inserting. You will just need to make sure the insert query ran and that you got a valid value for the "id" in the form of insert_id.
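A minimal sketch of that flow, assuming Python's sqlite3 module as a stand-in for MySQL. The function name cast_vote is hypothetical, and wrapping the insert and both increments in a single transaction is my own assumption about how to keep the counters consistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, product TEXT,
                      votes INTEGER NOT NULL DEFAULT 0);
CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT,
                   votes INTEGER NOT NULL DEFAULT 0);
CREATE TABLE user_product_vote (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    UNIQUE (product_id, user_id)  -- one vote per user per product
);
INSERT INTO product (id, product) VALUES (1, 'bananas'), (2, 'apples');
INSERT INTO user (id, username) VALUES (1, 'matthew'), (2, 'mark');
""")


def cast_vote(conn, product_id, user_id):
    """Record a vote and bump both counters; return False on a duplicate."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute(
                "INSERT INTO user_product_vote (product_id, user_id) VALUES (?, ?)",
                (product_id, user_id),
            )
            conn.execute(
                "UPDATE product SET votes = votes + 1 WHERE id = ?", (product_id,))
            conn.execute(
                "UPDATE user SET votes = votes + 1 WHERE id = ?", (user_id,))
        return True
    except sqlite3.IntegrityError:
        # The unique index rejected a second vote by the same user.
        return False


results = [cast_vote(conn, 1, 2), cast_vote(conn, 2, 2), cast_vote(conn, 1, 2)]
product_votes = conn.execute("SELECT id, votes FROM product ORDER BY id").fetchall()
user_votes = conn.execute("SELECT id, votes FROM user ORDER BY id").fetchall()
print(results)        # [True, True, False]
print(product_votes)  # [(1, 1), (2, 1)]
print(user_votes)     # [(1, 0), (2, 2)]
```

The third call is a repeat vote, so the unique index rejects it and neither counter changes.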
You have to balance the desire for your site to perform quickly (for which the second schema would be best) against the ability to count votes for specific users and prevent double voting (for which I would choose the first schema). Because you are only using integer columns for the user_product_vote table, I don't see how performance could suffer too much. Many-to-many relationships, like the one you have implemented with user_product_vote, are common. If you do want to count votes for specific users and prevent double voting, a user_product_vote table is the only clean way I can think of to implement it, as any other approach could result in sparse records, duplicate records, and all kinds of bad things.
You don't want to update the product table directly with an aggregate every time someone votes - this will lock product rows which will then affect other queries which are using products.
Assuming that not all product queries need to include the votes column, you could keep a separate productvotes table which would retain the running totals, and keep your user_product_vote table as a means to enforce your per-product user voting rules and for auditing.

Combine count rows in MySQL

I've got a table in MySQL that looks roughly like:
value | count
-------------
Fred | 7
FRED | 1
Roger | 3
roger | 1
That is, it was created with string ops outside of MySQL, so the values are case- and trailing-whitespace-sensitive.
I want it to look like:
value | count
-------------
Fred | 8
Roger | 4
That is, managed by MySQL, with value a primary key. It's not important which one (of "Fred" or "FRED") is kept.
I know how to do this in code. I also know how to generate a list of problem values (with a self-join). But I'd like to come up with a SQL update/delete to migrate my table, and I can't think of anything.
If I knew that no pair of records had variants of one value, with the same count (like ("Fred",4) and ("FRED",4)), then I think I can do it with a self-join to copy the counts, and then an update to remove the zeros. But I have no such guarantee.
Is there something simple I'm missing, or is this one of those cases where you just write a short function outside of the database?
Thanks!
As an example of how to obtain the results you are looking for with a SQL query alone:
SELECT UPPER(value) AS name, SUM(count) AS qty FROM table GROUP BY name;
If you make a new table to hold the correct values, you can use the above query in an INSERT ... SELECT to populate it, like so:
INSERT INTO newtable (SELECT UPPER(value) AS name, SUM(count) AS qty FROM table GROUP BY name);
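Here is a sketch of the same migration that also strips whitespace, since the question mentions trailing-whitespace variants. It assumes Python's sqlite3 module as a stand-in for MySQL. The table names counts and newtable are placeholders, the TRIM() call is my addition, and the trailing space on 'roger ' is added to the sample data for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Placeholder table names; "count" is quoted because it is also a function name.
conn.execute('CREATE TABLE counts (value TEXT, "count" INTEGER)')
conn.executemany(
    "INSERT INTO counts VALUES (?, ?)",
    # 'roger ' carries a trailing space, an assumed example of the whitespace issue.
    [("Fred", 7), ("FRED", 1), ("Roger", 3), ("roger ", 1)],
)

# New table keyed on the canonical (trimmed, upper-cased) value.
conn.execute("CREATE TABLE newtable (name TEXT PRIMARY KEY, qty INTEGER)")
conn.execute(
    "INSERT INTO newtable (name, qty) "
    'SELECT UPPER(TRIM(value)) AS name, SUM("count") AS qty '
    "FROM counts GROUP BY name"
)

merged = conn.execute("SELECT name, qty FROM newtable ORDER BY name").fetchall()
print(merged)  # [('FRED', 8), ('ROGER', 4)]
```

Once the new table looks right, it can replace the old one. The primary key on name then keeps case and whitespace variants from creeping back in as separate rows, as long as writers normalise values the same way.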
Strangely, MySQL seems to do this for you. I just tested this in MySQL 5.1.47:
create table c (value varchar(10), count int);
insert into c values ('Fred',7), ('FRED',1), ('Roger',3), ('roger',1);
select * from c;
+-------+-------+
| value | count |
+-------+-------+
| Fred | 7 |
| FRED | 1 |
| Roger | 3 |
| roger | 1 |
+-------+-------+
select value, sum(count) from c group by value;
+-------+------------+
| value | sum(count) |
+-------+------------+
| Fred | 8 |
| Roger | 4 |
+-------+------------+
I was surprised to see MySQL transform the strings like that, and I'm not sure I can explain why it did that. I was expecting to have to get four distinct rows, and to have to use some string functions to map the values to a canonical form.
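A likely explanation is the column's collation. MySQL's default collations are case-insensitive (the _ci suffix, as in latin1_swedish_ci), so GROUP BY treats 'Fred' and 'FRED' as the same value. The sketch below imitates this, assuming Python's sqlite3 module as a stand-in, with COLLATE NOCASE playing the case-insensitive role and COLLATE BINARY the case-sensitive one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE c (value TEXT, "count" INTEGER)')
conn.executemany(
    "INSERT INTO c VALUES (?, ?)",
    [("Fred", 7), ("FRED", 1), ("Roger", 3), ("roger", 1)],
)

# Case-insensitive comparison: 'Fred' and 'FRED' fall into one group.
ci_sums = [r[0] for r in conn.execute(
    'SELECT SUM("count") FROM c GROUP BY value COLLATE NOCASE')]

# Case-sensitive comparison: every spelling is its own group.
cs_rows = conn.execute(
    'SELECT value, SUM("count") FROM c GROUP BY value COLLATE BINARY').fetchall()

print(sorted(ci_sums))  # [4, 8]
print(len(cs_rows))     # 4
```

In MySQL, declaring the column with a binary or case-sensitive collation (for example one ending in _bin) should give the four distinct rows that were expected.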