Querying a database of statistics to get counts of different events - mysql

I'm making a database of a soccer league that has these tables:
+---------------------+
| Tables_in_league484 |
+---------------------+
| player |
| statevent |
+---------------------+
18 rows in set (0.09 sec)
and the player table in question looks like this:
mysql> desc player;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| pid | int(11) | NO | PRI | NULL | auto_increment |
| lastname | varchar(55) | YES | | NULL | |
| firstname | varchar(85) | YES | | NULL | |
| dob | date | YES | | NULL | |
| posid | int(11) | YES | MUL | NULL | |
| tid | int(11) | YES | MUL | NULL | |
| shirtnum | int(11) | YES | | NULL | |
| email | varchar(85) | YES | | NULL | |
+-----------+-------------+------+-----+---------+----------------+
8 rows in set (0.09 sec)
posid is a foreign key to the position table;
tid is a foreign key to the team table;
mysql> desc statevent;
+--------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+----------------+
| eid | int(11) | NO | PRI | NULL | auto_increment |
| gid | int(11) | YES | MUL | NULL | |
| pid | int(11) | YES | MUL | NULL | |
| minute | int(11) | YES | | NULL | |
| typeid | int(11) | YES | | NULL | |
+--------+-------------+------+-----+---------+----------------+
5 rows in set (0.09 sec)
where the typeids are:
1 for shot
2 for save
3 for goal
4 for assist
How can I structure a MySQL query that gives me a result that looks like this
+------+------+-------+-------+-------+---------+
| Name | Team | Shots | Saves | Goals | Assists |
+------+------+-------+-------+-------+---------+
| Nick |    1 |     8 |     0 |     4 |       1 |
| Jeff |    4 |     5 |     0 |     5 |       6 |
| Jim  |    7 |     7 |     0 |     6 |       3 |
+------+------+-------+-------+-------+---------+
that ends after the 10th result? (limit 10)
I've been trying for hours and I'm knackered thinking about it. What do I count? What do I group by? Can I order by aliases?
EDIT
I failed to mention in my first edit that, while there are 18 potentially helpful tables in this database, the ones relating to the stat events are all empty (and thus entirely useless).
They would have been wonderfully helpful.
However, I have to structure my query on this one statevent table using only typeid. Is this possible?

Essentially, you're just trying to construct a simple PIVOT TABLE query. Personally I'd advocate just returning a GROUPed result set and handling the data display at the application level, but if you must do the pivoting in MySQL then it might look something like this (I've changed some column/table names to get you thinking a bit):
SELECT p.firstname
     , p.team_id
     , COUNT(CASE WHEN event_type_id = 1 THEN 'foo' END) Shots
     , COUNT(CASE WHEN event_type_id = 2 THEN 'foo' END) Saves
     , COUNT(CASE WHEN event_type_id = 3 THEN 'foo' END) Goals
     , COUNT(CASE WHEN event_type_id = 4 THEN 'foo' END) Assists
  FROM player p
  JOIN stat_event e
    ON e.player_id = p.player_id
 GROUP BY p.player_id;
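For completeness, here's a sketch of the same pivot written against the column names you actually described (statevent.typeid, player.pid, player.tid), with the LIMIT 10 you asked for. The ORDER BY column is just an assumption; and yes, MySQL lets you order by a select-list alias:
SELECT p.firstname AS Name
     , p.tid       AS Team
     , COUNT(CASE WHEN e.typeid = 1 THEN 1 END) AS Shots
     , COUNT(CASE WHEN e.typeid = 2 THEN 1 END) AS Saves
     , COUNT(CASE WHEN e.typeid = 3 THEN 1 END) AS Goals
     , COUNT(CASE WHEN e.typeid = 4 THEN 1 END) AS Assists
  FROM player p
  JOIN statevent e ON e.pid = p.pid
 GROUP BY p.pid
 ORDER BY Goals DESC
 LIMIT 10;
Switch JOIN to LEFT JOIN if players with no stat events at all should still appear with zero counts.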

You would have to join the player table with the other tables you need counts from (shots, saves, goals etc.).
Once you have the join in place, you would need to aggregate on player id, player name and team with the help of a GROUP BY clause.
Your final query will look something like this:
SELECT p.firstname, t.team, COUNT(sh.shots), COUNT(sa.saves), COUNT(g.goals), COUNT(a.assists)
FROM player p
INNER JOIN team t
ON p.tid = t.tid
....
GROUP BY p.pid, p.firstname, t.team
LIMIT 10
EDIT:
I am not a DB expert. I have one SUBOPTIMAL way of achieving this.
I would create a temporary table containing information of the form (it would have to contain pid and tid information too):
...
Nick Goals 13
Matt Saves 4
Nick Saves 11
...
This should be simple to achieve.
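A rough sketch of what building that temporary table could look like against the statevent table described above (the name player_stats is made up for illustration):
CREATE TEMPORARY TABLE player_stats AS
SELECT p.pid, p.firstname, p.tid, e.typeid, COUNT(*) AS cnt
  FROM player p
  JOIN statevent e ON e.pid = p.pid
 GROUP BY p.pid, e.typeid;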
I would then use an SQL cursor (inside a stored procedure) to iterate over all distinct player ids and recover statistics from the temporary table constructed above.

Related

SQL query performance varies even though the queries are the same

There are two tables, and their structures are as below:
mysql> desc product;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| brand | varchar(20) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
2 rows in set (0.02 sec)
mysql> desc sales;
+-------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| yearofsales | varchar(10) | YES | | NULL | |
| price | int(11) | YES | | NULL | |
+-------------+-------------+------+-----+---------+-------+
3 rows in set (0.01 sec)
Here id is the foreign key.
And the queries are as follows:
1.
mysql> select brand,sum(price),yearofsales
from product p, sales s
where p.id=s.id
group by s.id,yearofsales;
+-------+------------+-------------+
| brand | sum(price) | yearofsales |
+-------+------------+-------------+
| Nike | 917504000 | 2012 |
| FF | 328990720 | 2010 |
| FF | 328990720 | 2011 |
| FF | 723517440 | 2012 |
+-------+------------+-------------+
4 rows in set (1.91 sec)
2.
mysql> select brand,tmp.yearofsales,tmp.sum
from product p
join (
select id,yearofsales,sum(price) as sum
from sales
group by yearofsales,id
) tmp on p.id=tmp.id ;
+-------+-------------+-----------+
| brand | yearofsales | sum |
+-------+-------------+-----------+
| Nike | 2012 | 917504000 |
| FF | 2011 | 328990720 |
| FF | 2012 | 723517440 |
| FF | 2010 | 328990720 |
+-------+-------------+-----------+
4 rows in set (1.59 sec)
The question is: why does the second query take less time than the first one? I have executed both multiple times, in different orders as well.
You can check the execution plan for the two queries and the indexes on the two tables to see why one query takes longer than the other. Also, you cannot run one simple test and trust the results; there are many factors that can impact query execution, such as the server being busy with something else while executing one query, making it run slower. You'll have to run both queries a large number of times and then compare the averages.
However, it is highly recommended to use explicit joins instead of implicit joins:
SELECT brand, SUM(price), yearofsales
FROM product p
INNER JOIN sales s ON p.id = s.id
GROUP BY s.id, yearofsales;
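To actually compare the plans, prefix each statement with EXPLAIN; the output (rows examined, join type, whether a derived table is materialized) is what explains the timing difference, and it will vary with your MySQL version and indexes:
EXPLAIN SELECT brand, SUM(price), yearofsales
FROM product p
INNER JOIN sales s ON p.id = s.id
GROUP BY s.id, yearofsales;

EXPLAIN SELECT brand, tmp.yearofsales, tmp.sum
FROM product p
JOIN (SELECT id, yearofsales, SUM(price) AS sum
      FROM sales
      GROUP BY yearofsales, id) tmp ON p.id = tmp.id;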

MYSQL - output extra column based on a certain condition

At first, I want to apologize for providing such a weak title; I couldn't describe it in a better way.
Consider the following: we have three tables, one for users, one for records and one for ratings. The tables are quite self-explanatory, but the schema for the database is as follows:
+---------------------+
| Tables_in_relations |
+---------------------+
| records |
| ratings |
| users |
+---------------------+
The schema for the records table is as follows:
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| title | varchar(256) | NO | | NULL | |
| year | int(4) | NO | | NULL | |
+----------+----------------------+------+-----+---------+----------------+
The schema for the users table is as follows:
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| email | varchar(256) | NO | | NULL | |
| name | varchar(256) | NO | | NULL | |
| password | varchar(256) | NO | | NULL | |
+----------+----------------------+------+-----+---------+----------------+
The ratings table is, obviously, where the ratings are stored, along with the record_id and user_id, and works as a relation table.
Its schema is as follows:
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| record_id| smallint(5) unsigned | NO | MUL | NULL | |
| user_id | smallint(5) unsigned | NO | MUL | NULL | |
| rating | int(1) | NO | | NULL | |
+----------+----------------------+------+-----+---------+----------------+
Now, in my application, I have a search function that fetches records based on a certain keyword. The output should also include the average rating of a certain record and the total number of ratings per record. This can be accomplished by the following query:
SELECT re.id, re.title, re.year, ROUND(avg(ra.rating)) as avg_rate,
COUNT(ra.record_id) as total_times_rated
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
GROUP BY re.id;
which will give me the following output:
+----+------------------------+------+----------+-------------------+
| id | title | year | avg_rate | total_times_rated |
+----+------------------------+------+----------+-------------------+
| 1 | Test Record 1 | 2008 | 3 | 4 |
| 2 | Test Record 2 | 2012 | 2 | 4 |
| 3 | Test Record 3 | 2003 | 3 | 4 |
| 4 | Test Record 4 | 2012 | 3 | 3 |
| 5 | Test Record 5 | 2003 | 2 | 3 |
| 6 | Test Record 6 | 2006 | 2 | 3 |
+----+------------------------+------+----------+-------------------+
Question:
Now, here comes the tricky part, at least for me. Within my app, you can search records whether signed in or not, and if signed in, I'd also like to include the user's own rating value in the above query.
I know that I can run a conditional to check whether user is signed in or not by reading the session value and execute a corresponding query based on that. I just don't know how to include that individual rating value of a certain user to the above query.
You can add the user's rating to the result by adding a correlated subquery to the column list:
SELECT re.id, re.title, re.year, ROUND(avg(ra.rating)) as avg_rate,
COUNT(ra.record_id) as total_times_rated,
(SELECT rating FROM ratings WHERE user_id = ? AND record_id = re.id) as user_rating
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
GROUP BY re.id;
We can get the user_id from the session and pass it to this query in order to generate the user_rating column in the result.
This assumes a user can rate a record only once; if multiple ratings per user are possible, the subquery would need to be wrapped in an aggregate such as SUM or AVG.
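A sketch of that multi-ratings variant, should you need it (only the subquery changes):
SELECT re.id, re.title, re.year, ROUND(avg(ra.rating)) as avg_rate,
       COUNT(ra.record_id) as total_times_rated,
       (SELECT SUM(rating) FROM ratings
         WHERE user_id = ? AND record_id = re.id) as user_rating
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
GROUP BY re.id;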
Update
If you don't want GROUP BY to consider that value, then you can wrap the existing query in another query and add a column to it, e.g.:
SELECT a.id, a.title, a.year, a.avg_rate, a.total_times_rated,
(SELECT rating FROM ratings WHERE user_id = ? AND record_id = a.id) as user_rating
FROM (SELECT re.id as id, re.title as title, re.year as year, ROUND(avg(ra.rating)) as avg_rate,
COUNT(ra.record_id) as total_times_rated
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
GROUP BY re.id) a;
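If you'd rather avoid the correlated subquery entirely, a rough alternative (assuming at most one rating per user per record, so the extra join cannot multiply rows) is to join ratings a second time, filtered to the signed-in user:
SELECT re.id, re.title, re.year, ROUND(avg(ra.rating)) as avg_rate,
       COUNT(ra.record_id) as total_times_rated,
       MAX(ur.rating) as user_rating
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
LEFT JOIN ratings ur ON ur.record_id = re.id AND ur.user_id = ?
GROUP BY re.id;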

Why does this query return an intermediate record?

I ran a somewhat nonsense query on MySQL, but because its output is the same each time, I'm wondering if someone can help me understand the underlying algorithm.
Here's the table Orders on which we'll execute the query (database taken from here, just in case someone's interested):
+----------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| orderNumber | int(11) | NO | PRI | NULL | |
| orderDate | date | NO | | NULL | |
| requiredDate | date | NO | | NULL | |
| shippedDate | date | YES | | NULL | |
| status | varchar(15) | NO | | NULL | |
| comments | text | YES | | NULL | |
| customerNumber | int(11) | NO | MUL | NULL | |
+----------------+-------------+------+-----+---------+-------+
There are 326 records for now, with the largest orderNumber being 10425.
Now here's the query I ran (basically removed GROUP BY from a sensible query):
mysql> select count(1), orderNumber, status from orders;
+----------+-------------+---------+
| count(1) | orderNumber | status |
+----------+-------------+---------+
| 326 | 10100 | Shipped |
+----------+-------------+---------+
1 row in set (0.00 sec)
So I'm asking for the total number of rows, along with status and orderNumber, which can be just about anything under the given circumstances. But the query always returns orderNumber 10100, even if I log out and run it again.
Is there a predictable answer for this?
There's no predictable answer that you should rely on in your design. In general, the DB will return the values from the first row that happens to match the query. If you want predictability, you should apply an aggregate to every column (e.g. use MIN or MAX to always get the smallest/largest value).
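For example, a deterministic version of the query wraps every non-grouped column in an aggregate (and note that with ONLY_FULL_GROUP_BY enabled, the default since MySQL 5.7, the original mixed query is rejected instead of silently picking a row):
SELECT COUNT(1) AS total_orders,
       MIN(orderNumber) AS lowest_orderNumber,
       MAX(orderNumber) AS highest_orderNumber,
       MIN(status) AS first_status_alphabetically
FROM orders;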

How can I optimize this mysql query to find maximum simultaneous calls?

I'm trying to calculate maximum simultaneous calls. My query, which I believe to be accurate, takes way too long given ~250,000 rows. The cdrs table looks like this:
+---------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-----------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| CallType | varchar(32) | NO | | NULL | |
| StartTime | datetime | NO | MUL | NULL | |
| StopTime | datetime | NO | | NULL | |
| CallDuration | float(10,5) | NO | | NULL | |
| BillDuration | mediumint(8) unsigned | NO | | NULL | |
| CallMinimum | tinyint(3) unsigned | NO | | NULL | |
| CallIncrement | tinyint(3) unsigned | NO | | NULL | |
| BasePrice | float(12,9) | NO | | NULL | |
| CallPrice | float(12,9) | NO | | NULL | |
| TransactionId | varchar(20) | NO | | NULL | |
| CustomerIP | varchar(15) | NO | | NULL | |
| ANI | varchar(20) | NO | | NULL | |
| ANIState | varchar(10) | NO | | NULL | |
| DNIS | varchar(20) | NO | | NULL | |
| LRN | varchar(20) | NO | | NULL | |
| DNISState | varchar(10) | NO | | NULL | |
| DNISLATA | varchar(10) | NO | | NULL | |
| DNISOCN | varchar(10) | NO | | NULL | |
| OrigTier | varchar(10) | NO | | NULL | |
| TermRateDeck | varchar(20) | NO | | NULL | |
+---------------+-----------------------+------+-----+---------+----------------+
I have the following indexes:
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| cdrs | 0 | PRIMARY | 1 | id | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | id | 1 | id | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | call_time_index | 1 | StartTime | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | call_time_index | 2 | StopTime | A | 269622 | NULL | NULL | | BTREE | | |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The query I am running is this:
SELECT MAX(cnt) AS max_channels FROM
(SELECT cl1.StartTime, COUNT(*) AS cnt
FROM cdrs cl1
INNER JOIN cdrs cl2
ON cl1.StartTime
BETWEEN cl2.StartTime AND cl2.StopTime
GROUP BY cl1.id)
AS counts;
It seems like I might have to chunk this data for each day and store the results in a separate table like simultaneous_calls.
I'm sure you want to know not only the maximum simultaneous calls, but when that happened.
I would create a table containing the timestamp of every individual minute
CREATE TABLE times (ts DATETIME NOT NULL PRIMARY KEY);
INSERT INTO times (ts) VALUES ('2014-05-14 00:00:00');
. . . until 1440 rows, one for each minute . . .
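One way to fill in those 1440 rows without typing 1440 INSERTs is to keep doubling the table, shifting each copy forward in time (a sketch, assuming the day being analysed is 2014-05-14):
INSERT INTO times (ts) SELECT ts + INTERVAL 1 MINUTE FROM times;   -- 2 rows
INSERT INTO times (ts) SELECT ts + INTERVAL 2 MINUTE FROM times;   -- 4 rows
INSERT INTO times (ts) SELECT ts + INTERVAL 4 MINUTE FROM times;   -- 8 rows
-- ... keep doubling (8, 16, ..., 512 minutes) ...
INSERT INTO times (ts) SELECT ts + INTERVAL 512 MINUTE FROM times; -- 1024 rows
INSERT INTO times (ts)
SELECT ts + INTERVAL 1024 MINUTE FROM times
WHERE ts + INTERVAL 1024 MINUTE < '2014-05-15 00:00:00';           -- 1440 rows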
Then join that to the calls.
SELECT ts, COUNT(*) AS count FROM times
JOIN cdrs ON times.ts BETWEEN cdrs.starttime AND cdrs.stoptime
GROUP BY ts ORDER BY count DESC LIMIT 1;
Here's the result in my test (MySQL 5.6.17 on a Linux VM running on a Macbook Pro):
+---------------------+----------+
| ts | count(*) |
+---------------------+----------+
| 2014-05-14 10:59:00 | 1001 |
+---------------------+----------+
1 row in set (1 min 3.90 sec)
This achieves several goals:
Reduces the number of rows examined by two orders of magnitude.
Reduces the execution time from 3 hours+ to about 1 minute.
Also returns the actual timestamp when the highest count was found.
Here's the EXPLAIN for my query:
explain select ts, count(*) from times join cdrs on times.ts between cdrs.starttime and cdrs.stoptime group by ts order by count(*) desc limit 1;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
| 1 | SIMPLE | times | index | PRIMARY | PRIMARY | 5 | NULL | 1440 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | cdrs | ALL | starttime | NULL | NULL | NULL | 260727 | Range checked for each record (index map: 0x4) |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
Notice the figures in the rows column, and compare to the EXPLAIN of your original query. You can estimate the total number of rows examined by multiplying these together (but that gets more complicated if your query is anything other than SIMPLE).
The inline view isn't strictly necessary. (You're right that the EXPLAIN on the query with the inline view takes a long time to run: EXPLAIN will materialize the inline view, i.e. run the inline view query and populate the derived table, and only then produce a plan for the outer query.)
Note that this query will return an equivalent result:
SELECT COUNT(*) AS max_channels
FROM cdrs cl1
JOIN cdrs cl2
ON cl1.StartTime BETWEEN cl2.StartTime AND cl2.StopTime
GROUP BY cl1.id
ORDER BY max_channels DESC
LIMIT 1
It still has to do all the work, and probably doesn't perform any better, but its EXPLAIN should run a lot faster. (We expect to see "Using temporary; Using filesort" in the Extra column.)
The number of rows in the resultset is going to be the number of rows in the table (~250,000 rows), and those are going to need to be sorted, so that's going to be some time there. The bigger issue (my gut is telling me) is that join operation.
I'm wondering if the EXPLAIN (or performance) would be any different if you swapped the cl1 and cl2 in the predicate, i.e.
ON cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime
I'm thinking that only because I'd next be tempted to try a correlated subquery. That's ~250,000 executions of the subquery, so it's not likely to be any faster...
SELECT ( SELECT COUNT(*)
FROM cdrs cl2
WHERE cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime
) AS max_channels
, cl1.StartTime
FROM cdrs cl1
ORDER BY max_channels DESC
LIMIT 11
You could run an EXPLAIN on that, we're still going to see a "Using temporary; Using filesort", and it will also show the "dependent subquery"...
Obviously, adding a predicate on the cl1 table to cut down the number of rows returned (for example, checking only the past 15 days) should speed things up, but it doesn't get you the answer you want.
WHERE cl1.StartTime > NOW() - INTERVAL 15 DAY
(None of my musings here are sure-fire answers to your question, or solutions to the performance issue; they're just musings.)

mysql - select distinct mutually exclusive (based on another column's value) rows

First off, I would like to say that if, after reading the question, anyone has a suggestion for a more informative title, please tell me, as I think mine is somewhat lacking. Now, on to business...
Given this table structure:
+---------+-------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------------------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| account | varchar(20) | YES | UNI | NULL | |
| domain | varchar(100) | YES | | NULL | |
| status | enum('FAILED','PENDING','COMPLETE') | YES | | NULL | |
+---------+-------------------------------------+------+-----+---------+----------------+
And this data:
+----+---------+------------------+----------+
| id | account | domain | status |
+----+---------+------------------+----------+
| 1 | jim | somedomain.com | COMPLETE |
| 2 | bob | somedomain.com | COMPLETE |
| 3 | joe | somedomain.com | COMPLETE |
| 4 | frank | otherdomain.com | COMPLETE |
| 5 | betty | otherdomain.com | PENDING |
| 6 | shirley | otherdomain.com | FAILED |
| 7 | tom | thirddomain.com | FAILED |
| 8 | lou | fourthdomain.com | COMPLETE |
+----+---------+------------------+----------+
I would like to select all domains which have a 'COMPLETE' status for all accounts (rows).
Any domain which has a row containing any value other than 'COMPLETE' for the status must not be returned.
So in the above example, My expected result would be:
+------------------+
| domain |
+------------------+
| somedomain.com |
| fourthdomain.com |
+------------------+
Obviously, I can achieve this by using a sub-query such as:
mysql> select distinct domain from test_table where status = 'complete' and domain not in (select distinct domain from test_table where status != 'complete');
+------------------+
| domain |
+------------------+
| somedomain.com |
| fourthdomain.com |
+------------------+
2 rows in set (0.00 sec)
This will work fine on our little mock-up test table, but in the real situation, the tables in question will be tens (or even hundreds) of thousands of rows, and I'm curious if there is some more efficient way to do this, as the sub-query is slow and intensive.
How about this:
select domain
from test_table
group by domain
having sum(case when status = 'COMPLETE' then 0 else 1 end) = 0
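Since MySQL treats boolean expressions as 0/1, the same check can also be written a little more compactly (equivalent as long as status is never NULL):
select domain
from test_table
group by domain
having sum(status <> 'COMPLETE') = 0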
I think this will work. It effectively just joins two basic queries together, then compares their counts.
select main.domain
from your_table main
inner join (
    select domain, count(id) as cnt
    from your_table
    where status = 'complete'
    group by domain
) complete on complete.domain = main.domain
group by main.domain
having count(main.id) = complete.cnt
You should also ensure you have an index on domain as this relies on a join on that column.
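For example (the index name is just a suggestion):
ALTER TABLE your_table ADD INDEX idx_domain (domain);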