Calculate average of values between 2 columns sql

I have a table called validation_errors that looks like this:
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| link | varchar(200) | NO | MUL | NULL | |
| message | varchar(500) | NO | | | |
| explanation | mediumtext | NO | | NULL | |
| type | varchar(50) | NO | | | |
| subtype | varchar(50) | NO | | | |
| message_id | varchar(50) | NO | | | |
+-------------+--------------+------+-----+---------+----------------+
The links table looks like this:
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| link | varchar(200) | NO | PRI | NULL | |
| visited | tinyint(1) | NO | | 0 | |
| validated | tinyint(1) | NO | | 0 | |
+-----------+--------------+------+-----+---------+-------+
I want to calculate the average number of validation errors per page for each top-level domain.
I have a query that fetches the number of pages per top-level domain:
SELECT substr(link, - instr(reverse(link), '.')) as domain , count(*) as count
FROM links
GROUP BY domain
ORDER BY count desc
limit 30;
I also have a query that fetches the number of validation errors per top-level domain:
SELECT substr(link, - instr(reverse(link), '.')) as domain ,count(*) as count
FROM validation_errors
GROUP BY domain
ORDER BY count desc
limit 30;
What I now need to do is combine them into one query and divide the values in one column by the values in the other, and I can't figure out how to do it.
Any help would be greatly appreciated.

First, use substring_index() rather than your substr()/reverse() construct. Here is the query to combine them:
select domain, sum(numviews) as numviews, sum(numerrors) as numerrors,
sum(numerrors) / nullif(sum(numviews), 0) as error_rate
from ((SELECT substring_index(link, '.', -1) as domain , count(*) as numviews, 0 as numerrors
FROM links
GROUP BY domain
) UNION ALL
(SELECT substring_index(link, '.', -1) as domain , 0, count(*)
FROM validation_errors
GROUP BY domain
)
) d
GROUP BY domain;
Because there are two counts, I don't know which top 30 you want, so I haven't included an ORDER BY or LIMIT.
Note that this doesn't use a join; it uses UNION ALL with aggregation. This ensures that you get every domain, including domains with no pages and domains with no errors.
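To check the UNION ALL approach without a MySQL server, here is a minimal sketch that runs it with Python's built-in sqlite3 module. It assumes SQLite as a stand-in for MySQL, which forces three adjustments: SQLite has no SUBSTRING_INDEX(), so a small Python function emulating it is registered with create_function(); the UNION ALL branches lose their parentheses (SQLite does not accept them); and the division is multiplied by 1.0 because SQLite, unlike MySQL, truncates integer division. The table contents and the helper names (build_demo_db, error_rates) are made up for illustration. Note that substring_index(link, '.', -1) keeps whatever follows the last dot, so it only yields the top-level domain for links without a path such as /page.html.

```python
import sqlite3


def substring_index(s, delim, count):
    """Python stand-in for MySQL's SUBSTRING_INDEX(str, delim, count)."""
    if s is None:
        return None
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything left of the count-th delimiter
    return delim.join(parts[count:])       # everything right of the |count|-th delimiter from the end


# Same shape as the answer's query, adapted to SQLite:
# - the UNION ALL branches are not wrapped in parentheses (SQLite rejects that);
# - 1.0 * forces real division (SQLite's "/" on integers truncates, MySQL's does not).
ERROR_RATE_SQL = """
SELECT domain,
       SUM(numviews)  AS numviews,
       SUM(numerrors) AS numerrors,
       1.0 * SUM(numerrors) / NULLIF(SUM(numviews), 0) AS error_rate
FROM (
    SELECT substring_index(link, '.', -1) AS domain, COUNT(*) AS numviews, 0 AS numerrors
    FROM links
    GROUP BY domain
    UNION ALL
    SELECT substring_index(link, '.', -1) AS domain, 0, COUNT(*)
    FROM validation_errors
    GROUP BY domain
) AS d
GROUP BY domain
ORDER BY domain
"""


def build_demo_db():
    """In-memory database with trimmed-down, made-up versions of both tables."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("substring_index", 3, substring_index)
    conn.executescript("""
        CREATE TABLE links (link TEXT PRIMARY KEY,
                            visited INTEGER NOT NULL DEFAULT 0,
                            validated INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE validation_errors (id INTEGER PRIMARY KEY, link TEXT NOT NULL);
        INSERT INTO links (link) VALUES
            ('http://a.com'), ('http://b.com'), ('http://c.org'), ('http://d.net');
        INSERT INTO validation_errors (link) VALUES
            ('http://a.com'), ('http://a.com'), ('http://a.com'),
            ('http://b.com'), ('http://c.org'), ('http://c.org'),
            ('http://e.io');
    """)
    return conn


def error_rates(conn):
    """Return (domain, pages, errors, errors per page) rows, ordered by domain."""
    return conn.execute(ERROR_RATE_SQL).fetchall()


for row in error_rates(build_demo_db()):
    print(row)
```

The io domain shows why UNION ALL is used: it has errors but no pages, so it still appears, and NULLIF makes its error rate NULL explicitly instead of relying on division-by-zero behavior.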

Related

MYSQL - output extra column based on a certain condition

First, I want to apologize for the weak title; I couldn't describe the problem any better.
Consider the following: we have three tables, one for users, one for records and one for ratings. The tables are fairly self-explanatory, but the database schema is as follows:
+---------------------+
| Tables_in_relations |
+---------------------+
| records |
| ratings |
| users |
+---------------------+
The schema for the records table is as follows:
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| title | varchar(256) | NO | | NULL | |
| year | int(4) | NO | | NULL | |
+----------+----------------------+------+-----+---------+----------------+
The schema for the users table is as follows:
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| email | varchar(256) | NO | | NULL | |
| name | varchar(256) | NO | | NULL | |
| password | varchar(256) | NO | | NULL | |
+----------+----------------------+------+-----+---------+----------------+
The ratings table is, obviously, where the ratings are stored along with the record_id and user_id; it works as a relation table.
Its schema is as follows:
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| record_id| smallint(5) unsigned | NO | MUL | NULL | |
| user_id | smallint(5) unsigned | NO | MUL | NULL | |
| rating | int(1) | NO | | NULL | |
+----------+----------------------+------+-----+---------+----------------+
In my application, I have a search function that fetches records based on a keyword. The output should also include the average rating of each record and the total number of ratings per record. This can be done with the following query:
SELECT re.id, re.title, re.year, ROUND(avg(ra.rating)) as avg_rate,
COUNT(ra.record_id) as total_times_rated
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
GROUP BY re.id;
which will give me the following output:
+----+------------------------+------+----------+-------------------+
| id | title | year | avg_rate | total_times_rated |
+----+------------------------+------+----------+-------------------+
| 1 | Test Record 1 | 2008 | 3 | 4 |
| 2 | Test Record 2 | 2012 | 2 | 4 |
| 3 | Test Record 3 | 2003 | 3 | 4 |
| 4 | Test Record 4 | 2012 | 3 | 3 |
| 5 | Test Record 5 | 2003 | 2 | 3 |
| 6 | Test Record 6 | 2006 | 2 | 3 |
+----+------------------------+------+----------+-------------------+
Question:
Now here comes the tricky part, at least for me. Within my app, you can search records whether or not you are signed in, and if you are signed in, I'd also like to include your own rating in the above query.
I know I can check whether the user is signed in by reading the session value and then execute the corresponding query. I just don't know how to include that user's individual rating in the above query.

You can add the user's rating to the result by adding a subquery to the column list:
SELECT re.id, re.title, re.year, ROUND(avg(ra.rating)) as avg_rate,
COUNT(ra.record_id) as total_times_rated,
(SELECT rating FROM ratings WHERE user_id = ? AND record_id = re.id) as user_rating
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
GROUP BY re.id;
Take the user_id from the session and bind it to the ? placeholder to produce the user_rating column.
This assumes a user rates each record at most once. If a user can rate a record several times, the subquery returns more than one row and MySQL raises an error, so wrap rating in an aggregate such as SUM(rating) or MAX(rating).
Update
If you don't want the GROUP BY to involve that value, you can wrap the existing query in an outer query and add the column there, e.g.:
SELECT a.id, a.title, a.year, a.avg_rate, a.total_times_rated,
(SELECT rating FROM ratings WHERE user_id = ? AND record_id = a.id) as user_rating
FROM (SELECT re.id as id, re.title as title, re.year as year, ROUND(avg(ra.rating)) as avg_rate,
COUNT(ra.record_id) as total_times_rated
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
GROUP BY re.id) a;
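Here is a minimal sketch of the correlated-subquery version, run against an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL; the sample rows and the helper names (build_demo_db, search) are hypothetical. Binding None for a signed-out visitor makes user_rating NULL, so one query can serve both cases. It assumes each user rates a record at most once: with several ratings per user, MySQL raises a "Subquery returns more than 1 row" error, whereas SQLite quietly uses the first row.

```python
import sqlite3

# The answer's correlated subquery, run on SQLite as a stand-in for MySQL.
# The "?" placeholder receives the signed-in user's id (or None when signed out).
SEARCH_SQL = """
SELECT re.id, re.title, re.year,
       ROUND(AVG(ra.rating)) AS avg_rate,
       COUNT(ra.record_id)   AS total_times_rated,
       (SELECT r2.rating
          FROM ratings r2
         WHERE r2.user_id = ? AND r2.record_id = re.id) AS user_rating
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
GROUP BY re.id
ORDER BY re.id
"""


def build_demo_db():
    """Made-up sample rows; record 3 has no ratings at all."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER NOT NULL);
        CREATE TABLE ratings (id INTEGER PRIMARY KEY,
                              record_id INTEGER NOT NULL,
                              user_id INTEGER NOT NULL,
                              rating INTEGER NOT NULL);
        INSERT INTO records (id, title, year) VALUES
            (1, 'Test Record 1', 2008), (2, 'Test Record 2', 2012), (3, 'Test Record 3', 2003);
        INSERT INTO ratings (record_id, user_id, rating) VALUES
            (1, 10, 4), (1, 11, 2), (2, 10, 1), (2, 11, 3);
    """)
    return conn


def search(conn, user_id=None):
    # With user_id = None the comparison "r2.user_id = NULL" is never true,
    # so user_rating comes back as NULL (None) for signed-out visitors.
    return conn.execute(SEARCH_SQL, (user_id,)).fetchall()


conn = build_demo_db()
print(search(conn, user_id=10))
print(search(conn))
```

Record 3 also shows the LEFT JOIN at work: it is still listed, with a NULL average and a count of 0.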

Counting and ordering and joining

An answer to my previous question told me that I could take this table
mysql> describe taps;
+------------+-----------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+-----------+------+-----+-------------------+-------+
| tag | int(11) | NO | | NULL | |
| station | int(11) | NO | | NULL | |
| time_Stamp | timestamp | NO | | CURRENT_TIMESTAMP | |
+------------+-----------+------+-----+-------------------+-------+
3 rows in set (0.00 sec)
and use the query
SELECT tag
, COUNT(DISTINCT station) as `visit_count`
FROM taps
GROUP
BY tag
ORDER
BY COUNT(DISTINCT station) DESC
to get the visitors ordered by the number of stations they have visited.
Now I want to add
mysql> describe visitors;
+--------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| tag_id | int(11) | NO | | NULL | |
| name | text | NO | | NULL | |
| email | text | NO | | NULL | |
| phone | text | NO | | NULL | |
+--------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)
And, instead of getting the visitor's tag_id, I want to get their name, email and phone. I know that it involves a JOIN, but I just can't figure it out :-(
[Update] Just to be clear, I want to output an HTML table, ordered by whoever visited the most stations, showing name, email & phone

SELECT v.name
, v.email
, v.phone
, COUNT(DISTINCT t.station) AS `visit_count`
FROM taps AS t
JOIN visitors AS v ON t.tag = v.tag_id
GROUP
BY v.tag_id, v.name, v.email, v.phone
ORDER
BY `visit_count` DESC;
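A minimal sketch of the join, run with Python's sqlite3 module instead of MySQL. It groups by every selected visitor column so that name, email and phone can be returned (MySQL's ONLY_FULL_GROUP_BY mode rejects selecting columns that are neither grouped nor functionally dependent on the grouping), and renders the rows as the HTML table mentioned in the update, escaping each cell with html.escape(). The sample data and the names build_demo_db and to_html_table are made up for illustration.

```python
import html
import sqlite3

# Join taps to visitors and group by the visitor, so name/email/phone can be
# selected alongside the distinct-station count (SQLite stand-in for MySQL).
VISITS_SQL = """
SELECT v.name, v.email, v.phone, COUNT(DISTINCT t.station) AS visit_count
FROM taps AS t
JOIN visitors AS v ON t.tag = v.tag_id
GROUP BY v.tag_id, v.name, v.email, v.phone
ORDER BY visit_count DESC
"""


def build_demo_db():
    """Made-up visitors and taps; Bob visits 3 stations, Ann 2, Cy 1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE taps (tag INTEGER NOT NULL,
                           station INTEGER NOT NULL,
                           time_Stamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP);
        CREATE TABLE visitors (tag_id INTEGER NOT NULL, name TEXT NOT NULL,
                               email TEXT NOT NULL, phone TEXT NOT NULL);
        INSERT INTO visitors VALUES
            (1, 'Ann', 'ann@example.com', '555-0101'),
            (2, 'Bob', 'bob@example.com', '555-0102'),
            (3, 'Cy',  'cy@example.com',  '555-0103');
        INSERT INTO taps (tag, station) VALUES
            (1, 1), (1, 2), (1, 2),
            (2, 1), (2, 2), (2, 3),
            (3, 1);
    """)
    return conn


def to_html_table(rows):
    """Render (name, email, phone, visit_count) rows as an HTML table, escaping each cell."""
    lines = ["<table>", "<tr><th>Name</th><th>Email</th><th>Phone</th><th>Stations</th></tr>"]
    for row in rows:
        cells = "".join(f"<td>{html.escape(str(value))}</td>" for value in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


rows = build_demo_db().execute(VISITS_SQL).fetchall()
print(to_html_table(rows))
```

Ann's repeated tap at station 2 counts once, which is what COUNT(DISTINCT ...) is for.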

HQL/MySQL for listing distincts and duplicates

I have a list of 20,000+ objects (tipps). These objects have a foreign key to a table called title. Two tipps are considered duplicates if they are linked to the same title and belong to the same package (tipp_pkg_fk, which is a parameter).
I need a list of all objects, with the duplicates listed together. For example:
tippA.title.name = "One"
tippB.title.name = "Two"
tippC.title.name = "Two"
Ideally from the above I will get a list result like this: [[tippA],[tippB,tippC]]
I am not sure how to do this. I have made an attempt (first in MySQL so I can test it; then I'll change it to HQL):
select tipp.tipp_id, 1 as sortOrder
from (select distinct a.tipp_id as id
from title_instance_package_platform a, title_instance_package_platform b
where a.tipp_pkg_fk= 1 and b.tipp_pkg_fk = 1 and a.tipp_ti_fk = b.tipp_ti_fk) duplicates,
title_instance_package_platform tipp
where tipp.tipp_id != duplicates.id
union all
select duplicates.id, 2 as sortOrder
from (select distinct a.tipp_id as id
from title_instance_package_platform a , title_instance_package_platform b
where a.tipp_pkg_fk = 1 and b.tipp_pkg_fk=1 and a.tipp_ti_fk = b.tipp_ti_fk) duplicates
order by sortOrder, id;
This ran for 330 seconds, then MySQL Workbench showed the "fetching" message, and at that point the computer became unresponsive. The idea is that first I select all the IDs that are not duplicates, then I select all the IDs that are duplicates, and then I merge and order them so that the duplicates appear together. I am looking for the most efficient way to do this, as I will be executing this query several times during an overnight job.
For my TIPP model, the following are part of the mapping:
static mapping = {
pkg column:'tipp_pkg_fk', index: 'tipp_idx'
title column:'tipp_ti_fk', index: 'tipp_idx'
}
The title_instance_package_platform table:
+-----------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------+--------------+------+-----+---------+----------------+
| tipp_id | bigint(20) | NO | PRI | NULL | auto_increment |
| tipp_version | bigint(20) | NO | | NULL | |
| tipp_pkg_fk | bigint(20) | NO | MUL | NULL | |
| tipp_plat_fk | bigint(20) | NO | MUL | NULL | |
| tipp_ti_fk | bigint(20) | NO | MUL | NULL | |
| date_created | datetime | NO | | NULL | |
| last_updated | datetime | NO | | NULL | |
+-----------------------------+--------------+------+-----+---------+----------------+
The title table:
+-----------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+----------------+
| ti_id | bigint(20) | NO | PRI | NULL | auto_increment |
| ti_version | bigint(20) | NO | | NULL | |
| date_created | datetime | NO | | NULL | |
| ti_imp_id | varchar(255) | NO | MUL | NULL | |
| last_updated | datetime | NO | | NULL | |
| ti_title | varchar(1024) | YES | | NULL | |
| ti_key_title | varchar(1024) | YES | | NULL | |
| ti_norm_title | varchar(1024) | YES | | NULL | |
| sort_title | varchar(1024) | YES | | NULL | |
+-----------------+---------------+------+-----+---------+----------------+
Update
After some changes it is working:
select tipp.tipp_id as id, 1 as sortOrder
from
title_instance_package_platform tipp
where tipp.tipp_id not in (select distinct a.tipp_id as id
from title_instance_package_platform a, title_instance_package_platform b
where a.tipp_pkg_fk= 1 and b.tipp_pkg_fk = 1 and a.tipp_ti_fk = b.tipp_ti_fk)
union all
select duplicates.id as id, 2 as sortOrder
from (select distinct a.tipp_id as id
from title_instance_package_platform a , title_instance_package_platform b
where a.tipp_pkg_fk = 1 and b.tipp_pkg_fk=1 and a.tipp_ti_fk = b.tipp_ti_fk) duplicates
order by sortOrder, id;
I still haven't got the duplicates grouped together, though; everything comes back as one flat list, which means I still need to group them.

Can you do your select from the other side?
Select all titles and packages, list all tipps belonging to each one (only where a tipp exists, i.e. count > 0), and bundle these together to get the array you showed?

Seems like you could compute both the dups and the non-dups at the same time. Something like
SELECT ( a.tipp_ti_fk = b.tipp_ti_fk ) AS sortOrder,
a.tipp_id as id
from title_instance_package_platform a ,
title_instance_package_platform b
where a.tipp_pkg_fk = 1
and b.tipp_pkg_fk = 1
You might need a DISTINCT.
This composite index would help:
INDEX(tipp_pkg_fk, tipp_ti_fk, tipp_id)
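The "select from the other side" suggestion can be sketched as grouping one package's tipps by title and concatenating their ids, which yields the [[tippA],[tippB,tippC]] shape directly. This is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL, with a trimmed table and made-up rows; GROUPED_SQL, build_demo_db and tipp_bundles are hypothetical names. In MySQL, GROUP_CONCAT(tipp_id ORDER BY tipp_id) would fix the order inside each bundle, but its output is truncated at group_concat_max_len (1024 by default), which only matters if a title has very many tipps.

```python
import sqlite3

# Group the tipps of one package by title, so each title yields one bundle of
# tipp ids. Bundles with more than one id are the duplicates.
GROUPED_SQL = """
SELECT tipp_ti_fk, GROUP_CONCAT(tipp_id) AS tipp_ids
FROM title_instance_package_platform
WHERE tipp_pkg_fk = ?
GROUP BY tipp_ti_fk
ORDER BY COUNT(*), tipp_ti_fk
"""


def build_demo_db():
    """Made-up rows: tippA (1) has title 100; tippB (2) and tippC (3) share title 200."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE title_instance_package_platform (
            tipp_id INTEGER PRIMARY KEY,
            tipp_pkg_fk INTEGER NOT NULL,
            tipp_ti_fk INTEGER NOT NULL);
        -- Composite index from the second answer; it holds every column this query reads.
        CREATE INDEX tipp_pkg_ti_id
            ON title_instance_package_platform (tipp_pkg_fk, tipp_ti_fk, tipp_id);
        INSERT INTO title_instance_package_platform VALUES
            (1, 1, 100), (2, 1, 200), (3, 1, 200),
            (4, 2, 200);  -- same title, different package: not a duplicate of 2 or 3
    """)
    return conn


def tipp_bundles(conn, pkg_id):
    """Return [[ids...], ...]: singletons first, then duplicate groups (sorted ids)."""
    rows = conn.execute(GROUPED_SQL, (pkg_id,)).fetchall()
    # GROUP_CONCAT order is not guaranteed in SQLite, so sort each bundle in Python.
    return [sorted(int(x) for x in ids.split(",")) for _, ids in rows]


print(tipp_bundles(build_demo_db(), 1))
```

Ordering by COUNT(*) plays the role of the sortOrder column in the question: single tipps come first, duplicate groups after them, and each group arrives already bundled, with no self-join.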

Conditional logic in SQL query

I have a table that looks like the following:
mysql> desc mlb_lineups;
+----------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| player_id | int(11) | NO | MUL | NULL | |
| team_id | int(11) | NO | | NULL | |
| game_id | int(11) | NO | MUL | NULL | |
| gamedate | date | NO | | NULL | |
| pos | int(11) | NO | | NULL | |
| is_home | int(11) | NO | | 0 | |
| is_pitcher | int(11) | YES | MUL | 0 | |
| opponent_team_id | int(11) | NO | MUL | NULL | |
| first_name | varchar(255) | YES | | NULL | |
| last_name | varchar(255) | YES | | NULL | |
| position | varchar(20) | YES | | NULL | |
| hand_throws_with | varchar(1) | YES | | NULL | |
+----------------------+--------------+------+-----+---------+----------------+
To retrieve the lineup a team used last (say, the team with team_id 31), I'd run the following query:
select * from mlb_lineups
where team_id = 31
and pos > -1
order by gamedate DESC,
pos ASC LIMIT 9;
That works fine and dandy. What I'm trying to do, though, is a bit tricky, and I can't piece together how the inner query and/or conditional logic would work here. I want to run a query that basically says: retrieve the lineup that a team used last in a game where the opponent_team_id had a player with is_pitcher equal to 1 and hand_throws_with equal to L. Whenever a team has a lefty throwing on the mound, the mlb_lineups table will contain at least one row for that game where the player's is_pitcher is 1 and hand_throws_with is L.
Essentially, to find the last lineup a team_id used when their opposing pitcher had a hand_throws_with equal to L, I'd have to run a query that figures out the last opponent_team_id they faced with that particular handedness and then retrieve their lineup for that game_id. Does this schema provide enough information to do that in a single query? Did I provide enough information to make my problem understandable?

Presumably you'll need to JOIN the table back to itself using the opponent_team_id and game_id fields. There are a couple of ways to do this.
Here is one method using EXISTS:
select *
from mlb_lineups ml1
where team_id = 31
and pos > -1
and exists (
select 1
from mlb_lineups ml2
where ml1.opponent_team_id = ml2.team_id
and ml2.is_pitcher = 1
and ml2.hand_throws_with = 'L'
and ml1.game_id = ml2.game_id
)
order by gamedate desc, pos
limit 9;
This method uses a standard JOIN, but it may require DISTINCT, depending on the data (for example, if a game has more than one matching left-handed pitcher row):
select ml1.*
from mlb_lineups ml1
inner join mlb_lineups ml2 on ml1.game_id = ml2.game_id
and ml1.team_id = ml2.opponent_team_id
and ml2.is_pitcher = 1
and ml2.hand_throws_with = 'L'
where ml1.team_id = 31
and ml1.pos > -1
order by ml1.gamedate desc, ml1.pos
limit 9;
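A minimal sketch of the EXISTS query, using team_id (the schema has no tsn_team_id column) and Python's sqlite3 module in place of MySQL. The table is trimmed to the columns the query needs, and the games, players and helper names (build_demo_db, last_lineup_vs_lefty) are invented. Team 31's most recent game was against a right-hander, so the query skips it and returns the lineup from the earlier game against a left-hander. One caveat: LIMIT 9 counts rows, not games, so if the latest qualifying game has fewer than nine rows with pos > -1, rows from older qualifying games fill the remainder.

```python
import sqlite3

# EXISTS version of the answer, on SQLite as a stand-in for MySQL.
LEFTY_LINEUP_SQL = """
SELECT ml1.player_id
FROM mlb_lineups ml1
WHERE ml1.team_id = ?
  AND ml1.pos > -1
  AND EXISTS (
      SELECT 1
      FROM mlb_lineups ml2
      WHERE ml2.game_id = ml1.game_id
        AND ml2.team_id = ml1.opponent_team_id
        AND ml2.is_pitcher = 1
        AND ml2.hand_throws_with = 'L'
  )
ORDER BY ml1.gamedate DESC, ml1.pos
LIMIT 9
"""


def build_demo_db():
    """Made-up games for team 31: game 1 vs a lefty (team 5), game 2 vs a righty (team 6)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mlb_lineups (
            id INTEGER PRIMARY KEY, player_id INTEGER NOT NULL, team_id INTEGER NOT NULL,
            game_id INTEGER NOT NULL, gamedate TEXT NOT NULL, pos INTEGER NOT NULL,
            is_pitcher INTEGER DEFAULT 0, opponent_team_id INTEGER NOT NULL,
            hand_throws_with TEXT);
        INSERT INTO mlb_lineups
            (player_id, team_id, game_id, gamedate, pos, is_pitcher, opponent_team_id, hand_throws_with)
        VALUES
            (101, 31, 1, '2015-04-01',  1, 0,  5, 'R'),
            (102, 31, 1, '2015-04-01',  2, 0,  5, 'R'),
            (501,  5, 1, '2015-04-01', -1, 1, 31, 'L'),
            (103, 31, 2, '2015-04-02',  1, 0,  6, 'R'),
            (101, 31, 2, '2015-04-02',  2, 0,  6, 'R'),
            (601,  6, 2, '2015-04-02', -1, 1, 31, 'R');
    """)
    return conn


def last_lineup_vs_lefty(conn, team_id):
    """Player ids (in batting position order) from the latest game(s) vs a left-handed pitcher."""
    return [row[0] for row in conn.execute(LEFTY_LINEUP_SQL, (team_id,))]


print(last_lineup_vs_lefty(build_demo_db(), 31))
```

gamedate is stored as ISO text here, so ordering it as a string matches chronological order.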

MySQL merge results into table from count of 2 other tables, matching ids

I've got 3 tables: model, model_views, and model_views2. To hold the aggregated views in a single column per row, I've run a migration so that the model table looks something like this, with a new views column:
+---------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | | NULL | |
| [...] | | | | | |
| views | int(20) | YES | | 0 | |
+---------------+---------------+------+-----+---------+----------------+
This is what the columns for model_views and model_views2 look like:
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | smallint(5) | NO | MUL | NULL | |
| model_id | smallint(5) | NO | MUL | NULL | |
| time | int(10) unsigned | NO | | NULL | |
| ip_address | varchar(16) | NO | MUL | NULL | |
+------------+------------------+------+-----+---------+----------------+
model_views and model_views2 are gargantuan, each totalling tens of millions of rows. Each row represents one view, and this is terrible for performance. So far, I've got this MySQL query to count the rows (single views) in both tables and add the counts together per model_id:
SELECT model_id, SUM(c) FROM (
SELECT model_views.model_id, COUNT(*) AS c FROM model_views
GROUP BY model_views.model_id
UNION ALL
SELECT model_views2.model_id, COUNT(*) AS c FROM model_views2
GROUP BY model_views2.model_id)
AS foo GROUP BY model_id
So that I get a nice big table with the following:
+----------+--------+
| model_id | SUM(c) |
+----------+--------+
| 1 | 1451 |
| [...] | |
+----------+--------+
What would be the safest way to merge the SUM(c) values into the model.views column, matching model.id to the model_ids returned by the above query? I only want to fill the rows for models that still exist; there are probably model_views rows referring to models that have been deleted.

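You can just use UPDATE with a JOIN on your subquery. Sum the two per-table counts per model_id first: otherwise the UNION ALL returns up to two rows per model, and because a multiple-table UPDATE changes each model row only once, views would end up holding the count from just one of the tables:
UPDATE model
JOIN (
SELECT model_id, SUM(c) AS c
FROM (
SELECT model_views.model_id, COUNT(*) AS c
FROM model_views
GROUP BY model_views.model_id
UNION ALL
SELECT model_views2.model_id, COUNT(*) AS c
FROM model_views2
GROUP BY model_views2.model_id) AS foo
GROUP BY model_id) toupdate ON model.id = toupdate.model_id
SET model.views = toupdate.c;
Because this is an inner join, only models that still exist in the model table are updated; counts for deleted models are ignored, and models with no views keep their current value (0 by default).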
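Here is a minimal sketch of the sum-then-update idea, using Python's sqlite3 module as a stand-in for MySQL with invented rows. SQLite has no UPDATE ... JOIN, so this variant (not the answer's exact statement) stages the totals in a temporary table and updates through a correlated subquery; view_totals, build_demo_db and refresh_view_counts are hypothetical names. The WHERE id IN (...) clause leaves models without views at their default of 0, and model 99, which no longer exists in model, is never touched.

```python
import sqlite3

# Sum the per-table counts first, then copy the totals onto models that still exist.
# SQLite lacks MySQL's UPDATE ... JOIN, so the totals are staged in a temp table
# and applied with a correlated subquery.
STAGE_SQL = """
CREATE TEMP TABLE view_totals AS
SELECT model_id, SUM(c) AS total
FROM (
    SELECT model_id, COUNT(*) AS c FROM model_views GROUP BY model_id
    UNION ALL
    SELECT model_id, COUNT(*) AS c FROM model_views2 GROUP BY model_id
) AS foo
GROUP BY model_id
"""

UPDATE_SQL = """
UPDATE model
SET views = (SELECT total FROM view_totals WHERE view_totals.model_id = model.id)
WHERE id IN (SELECT model_id FROM view_totals)
"""


def build_demo_db():
    """Made-up data: model 99 was deleted but still has views; model 3 has none."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE model (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,
                            views INTEGER DEFAULT 0);
        CREATE TABLE model_views (id INTEGER PRIMARY KEY, model_id INTEGER NOT NULL);
        CREATE TABLE model_views2 (id INTEGER PRIMARY KEY, model_id INTEGER NOT NULL);
        INSERT INTO model (id, user_id) VALUES (1, 7), (2, 7), (3, 8);
        INSERT INTO model_views (model_id) VALUES (1), (1), (2), (99);
        INSERT INTO model_views2 (model_id) VALUES (1), (99), (99);
    """)
    return conn


def refresh_view_counts(conn):
    with conn:  # commits on success; rolls back the pending transaction on an exception
        conn.execute(STAGE_SQL)
        conn.execute(UPDATE_SQL)
        conn.execute("DROP TABLE view_totals")  # so the refresh can run again later


conn = build_demo_db()
refresh_view_counts(conn)
print(conn.execute("SELECT id, views FROM model ORDER BY id").fetchall())
```

Staging the totals once keeps the expensive counting over the large view tables to a single pass, rather than repeating it for every model row.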
