Variance Dissimilarity in SQL with three table - mysql

I have three table:
rtgitems
rtgusers
POI
(the tables aren't complete for reasons of space).
I want to resolve this form:
where r_i,x is the value of column "voto" for the user "rater" i for the "item" x and avg_x is the average (the division from "totalrate" and "nrrates" -> totalrate/nrrates). |G| is given and isn't a trouble.
I want this table result:
Nome (from POI) | VD_x(G)
Tour Eiffel | 23
Arc | 18
...
I tried this for the firsts two table for to take the value for calculate the average (the third table I don't know how matching with the others):
SELECT totalrate, nrrates, voto FROM rtgitems INNER JOIN rtgusers ON rtgitems.item=rtgusers.item GROUP BY rater
but don't work.
Thanks for the help.

Just focus on the rtgusers table. If you want to bring in the names, that's fine. You can do it after the variance calculation (you seem to know what a join is). The first table seems superfluous to the problem.
You can calculate the variance by pre-calculating the summary values and then applying the formula. I think this is the basic logic that you want:
SELECT ru.item, (1.0 / max(rus.n)) * sum(power(ru.voto - avg_voto), 2)
FROM rtgusers ru join
(select ru.item, avg(voto * 1.0) as avg_voto, count(*) as n
from rtgusers ru
group by ru.item
) rus
on ru.item = rus.item
group by ru.item;

Related

Moving average query MS Access

I am trying to calculate the moving average of my data. I have googled and found many examples on this site and others but am still stumped. I need to calculate the average of the previous 5 flow for the record selected for the specific product.
My Table looks like the following:
TMDT Prod Flow
8/21/2017 12:01:00 AM A 100
8/20/2017 11:30:45 PM A 150
8/20/2017 10:00:15 PM A 200
8/19/2017 5:00:00 AM B 600
8/17/2017 12:00:00 AM A 300
8/16/2017 11:00:00 AM A 200
8/15/2017 10:00:31 AM A 50
I have been trying the following query:
SELECT b.TMDT, b.Flow, (SELECT AVG(Flow) as MovingAVG
FROM(SELECT TOP 5 *
FROM [mytable] a
WHERE Prod="A" AND [a.TMDT]< b.TMDT
ORDER BY a.TMDT DESC))
FROM mytable AS b;
When I try to run this query I get an input prompt for b.TMDT. Why is b.TMDT not being pulled from mytable?
Should I be using a different method altogether to calculate my moving averages?
I would like to add that I started with another method that works but is extremely slow. It runs fast enough for tables with 100 records or less. However, if the table has more than 100 records it feels like the query comes to a screeching halt.
Original method below.
I created two queries for each product code (There are 15 products): Q_ProdA_Rank and Q_ProdA_MovAvg
Q_ProdA_RanK (T_ProdA is a table with Product A's information):
SELECT a.TMDT, a.Flow, (Select count(*) from [T_ProdA]
where TMDT<=a.TMDT) AS Rank
FROM [T_ProdA] AS a
ORDER BY a.TMDT DESC;
Q_ProdA_MovAvg
SELECT b.TMDT, b.Flow, Round((Select sum(Flow) from [Q_PRodA_Rank] where
Rank between b.Rank-1 and (b.Rank-5))/IIf([Rank]<5,Rank-1,5),0) AS
MovingAvg
FROM [Q_ProdA_Rank] AS b;
The problem is that you're using a nested subquery, and as far as I know (can't find the right site for the documentation at the moment), variable scope in subqueries is limited to the direct parent of the subquery. This means that for your nested query, b.TMDT is outside of the variable scope.
Edit: As this is an interesting problem, and a properly-asked question, here is the full SQL answer. It's somewhat more complex than your try, but should run more efficiently
It contains a nested subquery that first lists the 5 previous flows for per TMDT and prod, then averages that, and then joins that in with the actual query.
SELECT A.TMDT, A.Prod, B.MovingAverage
FROM MyTable AS A LEFT JOIN (
SELECT JoinKeys.TMDT, JoinKeys.Prod, Avg(Top5.Flow) As MovingAverage
FROM (
SELECT JoinKeys.TMDT, JoinKeys.Prod, Top5.Flow
FROM MyTable As JoinKeys INNER JOIN MyTable AS Top5 ON JoinKeys.Prod = Top5.Prod
WHERE Top5.TMDT In (
SELECT TOP 5 A.TMDT FROM MyTable As A WHERE JoinKeys.Prod = A.Prod AND A.TMDT < JoinKeys.TMDT ORDER BY A.TMDT
)
)
GROUP BY JoinKeys.TMDT, JoinKeys.Prod
) AS B
ON A.Prod = B.JoinKeys.Prod AND A.TMDT = B.JoinKeys.TMDT
While in my previous version I advocated a VBA approach, this is probably more efficient, only more difficult to write and adjust.

SQL Validate a column with the same column

I have the following situation. I have a table with all info of article. I will like to compare the same column with it self. because I have multiple type of article. Single product and Master product. the only way that I have to differences it, is by SKU. for example.
ID | SKU
1 | 11111
2 | 11112
3 | 11113
4 | 11113-5
5 | 11113-8
6 | 11114
7 | 11115
8 | 11115-1-W
9 | 11115-2
10 | 11116
I only want to list or / and count only the sku that are full unique. follow th example the sku that are unique and no have variant are (ID = 1, 2, 6 and 10) I will want to create a query where if 11113 are again on the column not cout it. so in total I will be 4 unique sku and not "6 (on total)". Please let me know. if this are possible.
Assuming the length of master SKUs are 5 characters, try this:
select a.*
from mytable a
left join mytable b on b.sku like concat(a.sku, '%')
where length(a.sku) = 5
and b.sku is null
This query joins master SKUs to child ones, but filters out successful joins - leaving only solitary master SKUs.
You can do this by grouping and counting the unique rows.
First, we will need to take your table and add a new column, MasterSKU. This will be the first five characters of the SKU column. Once we have the MasterSKU, we can then GROUP BY it. This will bundle together all of the rows having the same MasterSKU. Once we are grouping we get access to aggregate functions like COUNT(). We will use that function to count the number of rows for each MasterSKU. Then, we will filter out any rows that have a COUNT() over 1. That will leave you with only the unique rows remaining.
Take that unique list and LEFT JOIN it back into your original table to grab the IDs.
SELECT ID, A.MasterSKU
FROM (
SELECT
MasterSKU = SUBSTRING(SKU,1,5),
MasterSKUCount = COUNT(*)
FROM MyTable
GROUP BY SUBSTRING(SKU,1,5)
HAVING COUNT(*) = 1
) AS A
LEFT JOIN (
SELECT
ID,
MasterSKU = SUBSTRING(SKU,1,5)
FROM MyTable
) AS B
ON A.MasterSKU = B.MasterSKU
Now one thing I noticed from you example. The original SKU column really looks like three columns in one. We have multiple values being joined with hypens.
11115-1-W
There may be a reason for it, but most likely this violates first normal form and will make the database hard to query. It's part of the reason why such a complicated query is needed. If the SKU column really represents multiple things then we may want to consider breaking it out into MasterSKU, Version, and Color or whatever each hyphen represents.

Is it possible to write a query to compare rows to other rows in same table?

I have a table with the following structure. I need to return all rows where the district of the record immediately preceding and immediately following the row are different than the district for that row. Is this possible? I was thinking of a join on the table itself but not sure how to do it.
id | zip_code | district
__________________________
20063 10169 12
20064 10169 9
20065 10169 12
Assuming that "preceding" and "following" are in the sense of the ID column, you can do:
select *
from zip_codes z1
inner join zip_codes z2 on z1.id=z2.id + 1
inner join zip_codes z3 on z1.id=z3.id - 1
where z1.district <> z2.district and z1.district <> z3.district
This will automatically filter out the first and last rows, because of the inner joins, if you need those to count, change it to left outer join.
Also, this checks if it's different from both. To find if it's different from either (as is implied in the comment), change the and in the where clause to an or. But note, that then, all three rows in your example fit that criteria, even if there are long rows of twelves above and below these rows.

GROUP BY does not remove duplicates

I have a watchlist system that I've coded, in the overview of the users' watchlist, they would see a list of records, however the list shows duplicates when in the database it only shows the exact, correct number.
I've tried GROUP BY watch.watch_id, GROUP BY rec.record_id, none of any types of group I've tried seems to remove duplicates. I'm not sure what I'm doing wrong.
SELECT watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
FROM
(
watchlist watch
LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
LEFT OUTER JOIN members usr ON rec.user_id = usr.user_id
)
WHERE watch.user_id = 1
GROUP BY watch.watch_id
LIMIT 0, 25
The watchlist table looks like this:
+----------+---------+-----------+------------+
| watch_id | user_id | record_id | watch_date |
+----------+---------+-----------+------------+
| 13 | 1 | 22 | 1314038274 |
| 14 | 1 | 25 | 1314038995 |
+----------+---------+-----------+------------+
GROUP BY does not "remove duplicates". GROUP BY allows for aggregation. If all you want is to combine duplicated rows, use SELECT DISTINCT.
If you need to combine rows that are duplicate in some columns, use GROUP BY but you need to to specify what to do with the other columns. You can either omit them (by not listing them in the SELECT clause) or aggregate them (using functions like SUM, MIN, and AVG). For example:
SELECT watch.watch_id, COUNT(rec.street_number), MAX(watch.watch_date)
... GROUP by watch.watch_id
EDIT
The OP asked for some clarification.
Consider the "view" -- all the data put together by the FROMs and JOINs and the WHEREs -- call that V. There are two things you might want to do.
First, you might have completely duplicate rows that you wish to combine:
a b c
- - -
1 2 3
1 2 3
3 4 5
Then simply use DISTINCT
SELECT DISTINCT * FROM V;
a b c
- - -
1 2 3
3 4 5
Or, you might have partially duplicate rows that you wish to combine:
a b c
- - -
1 2 3
1 2 6
3 4 5
Those first two rows are "the same" in some sense, but clearly different in another sense (in particular, they would not be combined by SELECT DISTINCT). You have to decide how to combine them. You could discard column c as unimportant:
SELECT DISTINCT a,b FROM V;
a b
- -
1 2
3 4
Or you could perform some kind of aggregation on them. You could add them up:
SELECT a,b, SUM(c) "tot" FROM V GROUP BY a,b;
a b tot
- - ---
1 2 9
3 4 5
You could add pick the smallest value:
SELECT a,b, MIN(c) "first" FROM V GROUP BY a,b;
a b first
- - -----
1 2 3
3 4 5
Or you could take the mean (AVG), the standard deviation (STD), and any of a bunch of other functions that take a bunch of values for c and combine them into one.
What isn't really an option is just doing nothing. If you just list the ungrouped columns, the DBMS will either throw an error (Oracle does that -- the right choice, imo) or pick one value more or less at random (MySQL). But as Dr. Peart said, "When you choose not to decide, you still have made a choice."
While SELECT DISTINCT may indeed work in your case, it's important to note why what you have is not working.
You're selecting fields that are outside of the GROUP BY. Although MySQL allows this, the exact rows it returns for the non-GROUP BY fields is undefined.
If you wanted to do this with a GROUP BY try something more like the following:
SELECT watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
FROM
(
watchlist watch
LEFT OUTER JOIN est8_records rec ON rec.record_id = watch.record_id
LEFT OUTER JOIN est8_members usr ON rec.user_id = usr.user_id
)
WHERE watch.watch_id IN (
SELECT watch_id FROM watch WHERE user_id = 1
GROUP BY watch.watch_id)
LIMIT 0, 25
I Would never recommend using SELECT DISTINCT, it's really slow on big datasets.
Try using things like EXISTS.
You are grouping by watch.watch_id and you have two results, which have different watch IDs, so naturally they would not be grouped.
Also, from the results displayed they have different records. That looks like a perfectly valid expected results. If you are trying to only select distinct values, then you don't want ot GROUP, but you want to select by distinct values.
SELECT DISTINCT()...
If you say your watchlist table is unique, then one (or both) of the other tables either (a) has duplicates, or (b) is not unique by the key you are using.
To suppress duplicates in your results, either use DISTINCT as #Laykes says, or try
GROUP BY watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
It sort of sounds like you expect all 3 tables to be unique by their keys, though. If that is the case, you are simply masking some other problem with your SQL by trying to retrieve distinct values.

SQL GROUP BY - Multiple results in one column?

I am trying to perform a SELECT query using a GROUP BY clause, however I also need to access data from multiple rows and somehow concatenate it into a single column.
Here's what I have so far:
SELECT
COUNT(v.id) AS quantity,
vt.name AS name,
vt.cost AS cost,
vt.postage_cost AS postage_cost
FROM vouchers v
INNER JOIN voucher_types vt
ON v.type_id = vt.id
WHERE
v.order_id = 1 AND
v.sold = 1
GROUP BY vt.id
Which gives me the first four columns I need in the following format.
quantity | name | cost | postage_cost
2 X 5 1
2 Y 6 1
However, I would also like a fifth column to be displayed, showing all of the codes associated with each line of the order like this:
code
ABCD, EFGH
IJKL, MNOP
Where the comma separated values are pulled from the voucher table.
Is this possible?
Any advice would be appreciated.
Thanks
This is what GROUP_CONCAT does.
Assuming the column is called code you would just add ,GROUP_CONCAT(v.code) As Codes to your select list.