Data Analytics with python - data-analysis

The mean marks obtained by male and female students of school ABCD in the first unit test are shown below.
                     Male  Female
Sample Size            64      36
Sample Mean Marks      44      41
Population Variance   128      72
The standard error for the difference between the two means is
4.0
7.46
4.24
2.0
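The arithmetic behind the options can be checked with a short sketch. It assumes the two samples are independent and the population variances are known, so the standard error of the difference is sqrt(var1/n1 + var2/n2). The function name is illustrative, not from the original question.

```python
import math

def standard_error_of_difference(var1, n1, var2, n2):
    """Standard error of (mean1 - mean2) for two independent samples
    with known population variances."""
    return math.sqrt(var1 / n1 + var2 / n2)

# Male: variance 128, n = 64; Female: variance 72, n = 36
se = standard_error_of_difference(128, 64, 72, 36)
print(se)  # 2.0, since 128/64 + 72/36 = 2 + 2 = 4
```

Under these assumptions the value matches the last option, 2.0.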

Related

MySQL - Connect three different tables in transposed fashion

I have three tables:
business:
id  name
1   Charlie's Bakery
2   Mark's Pizza
3   Rob's Market
balanco_manual:
id   business_id  year  unit
012  1            2015  ones
123  1            2014  tens
364  2            2014  cents
conta_balanco:
id   conta  balanco_id  valor
412  12.3   012         12324
344  12.5   012         54632
414  14.1   364         344122
789  12     364         2312415
646  12     123         342
I need to combine them all in the business table and make them look like this:
business:
id  name              12.3-2015  12.5-2015  11.56-2015  12-2014  2015-unit  2014-unit
1   Charlie's Bakery  12324      54632      NaN         342      ones       tens
2   Mark's Pizza      NaN        NaN        NaN         2312415  NaN        cents
3   Rob's Market      NaN        NaN        NaN         NaN      NaN        NaN
Explaining a little bit further: the business table has basic registries about the businesses, balanco_manual has yearly information of each one of those businesses and conta_balanco has details of the yearly information in balanco_manual.
Trying to put that last table into words:
- First, I need to join business with balanco_manual, matching the "id" column in business to the "business_id" column in balanco_manual. Note that I combine unit and year into a single column named "[year]-unit". Let's call the result "new_business" to make it easier to follow.
- Next, I need to combine "new_business" with conta_balanco in a similar way to what was done with the "unit" column. Each "conta" should be combined with the year and become a column named "[conta]-[year]".
I'm quite a beginner with SQL and I'm having a hard time with this. Could someone help me figure it out?
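The two steps described above can be sketched outside the database, for example in Python after fetching the three tables. This is a minimal illustration, not a SQL answer: the function name `build_wide_rows` and the tuple layouts are assumptions, and the output columns are derived from whatever rows exist. With the sample data that produces a "14.1-2014" column and no "11.56-2015" column, unlike the hand-written target table.

```python
# Sample rows copied from the question; ids are kept as strings
# where a leading zero matters.
business = [(1, "Charlie's Bakery"), (2, "Mark's Pizza"), (3, "Rob's Market")]
# (id, business_id, year, unit)
balanco_manual = [("012", 1, 2015, "ones"), ("123", 1, 2014, "tens"),
                  ("364", 2, 2014, "cents")]
# (id, conta, balanco_id, valor)
conta_balanco = [(412, "12.3", "012", 12324), (344, "12.5", "012", 54632),
                 (414, "14.1", "364", 344122), (789, "12", "364", 2312415),
                 (646, "12", "123", 342)]

def build_wide_rows(business, balanco_manual, conta_balanco):
    """Return one dict per business with '[year]-unit' and '[conta]-[year]' keys."""
    # Map each balanco_manual id to its business and year.
    balanco_by_id = {b_id: (biz_id, year)
                     for b_id, biz_id, year, _unit in balanco_manual}
    wide = {biz_id: {"id": biz_id, "name": name} for biz_id, name in business}
    # Step 1: business joined with balanco_manual gives the "[year]-unit" columns.
    for _b_id, biz_id, year, unit in balanco_manual:
        wide[biz_id][f"{year}-unit"] = unit
    # Step 2: joined with conta_balanco gives the "[conta]-[year]" columns.
    for _c_id, conta, balanco_id, valor in conta_balanco:
        biz_id, year = balanco_by_id[balanco_id]
        wide[biz_id][f"{conta}-{year}"] = valor
    # Give every row the same columns; missing cells become None.
    columns = sorted({k for row in wide.values() for k in row} - {"id", "name"})
    return [{"id": r["id"], "name": r["name"], **{c: r.get(c) for c in columns}}
            for r in wide.values()]

rows = build_wide_rows(business, balanco_manual, conta_balanco)
for row in rows:
    print(row)
```

Because the set of columns depends on the data, a pure SQL version generally needs either a fixed list of conta/year pairs or dynamically generated SQL.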

5 point average in SSRS

I am trying to put a 5-point average in my chart. I added a trendline, but it looks like this:
Then I created a new series to calculate the average there, and it looks like this:
But I would like to show this as a 5-point average. How can I do this?
This answer is based on my experience with Excel, not reporting services, but it is probably the same problem.
Your chart is probably a scatter plot rather than a line chart (note: this is Excel terminology). A scatter plot does not have an intrinsic ordering in the data. A line chart does.
The solution (for a scatter plot) is simply to sort the data by the x-values. The same will probably work for you. If you are pulling the data from a database, an ORDER BY clause in the query can accomplish this. Otherwise, you can sort the data in the application.
Using this post as a starting point you can see that it is possible to calculate a moving average for a chart using the SQL query that pulls the data from the database.
For example, using this table in my database called mySalesTable
myDate sales myDate sales myDate sales
---------- ------ ---------- ------ ---------- ------
01/01/2015 456 16/01/2015 546 31/01/2015 658
02/01/2015 487 17/01/2015 12 01/02/2015 121
03/01/2015 245 18/01/2015 62 02/02/2015 654
04/01/2015 812 19/01/2015 516 03/02/2015 261
05/01/2015 333 20/01/2015 1 04/02/2015 892
06/01/2015 449 21/01/2015 65 05/02/2015 982
07/01/2015 827 22/01/2015 15 06/02/2015 218
08/01/2015 569 23/01/2015 656 07/02/2015 212
09/01/2015 538 24/01/2015 25 08/02/2015 312
10/01/2015 455 25/01/2015 549 09/02/2015 21
11/01/2015 458 26/01/2015 261
12/01/2015 542 27/01/2015 21
13/01/2015 549 28/01/2015 21
14/01/2015 432 29/01/2015 61
15/01/2015 685 30/01/2015 321
You can pull out this data and create a moving average over the current date and the 4 preceding days by using the following query for your dataset:
SELECT mst.myDate, mst.sales, AVG(mst_past.sales) AS moving_average
FROM mySalesTable AS mst
JOIN mySalesTable AS mst_past
    ON mst_past.myDate BETWEEN DATEADD(D, -4, mst.myDate) AND mst.myDate
GROUP BY mst.myDate, mst.sales
ORDER BY mst.myDate ASC
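This effectively joins each row to a sub-table consisting of the current date and the 4 preceding calendar days, then averages the sales over those dates and outputs the result as the column moving_average. Note that the first few rows average fewer than 5 values, because fewer earlier dates exist.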
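The same trailing-window logic can be sketched in Python to sanity-check the numbers the query should produce. This is only an illustration under the assumption of one row per date; the function name `trailing_average` is hypothetical and the data is the first six rows of mySalesTable.

```python
from datetime import date, timedelta

def trailing_average(sales_by_date, window_days=5):
    """Average each day's sales with those of the preceding window_days - 1
    calendar days, mirroring the BETWEEN DATEADD(D, -4, ...) self-join."""
    result = {}
    for day in sorted(sales_by_date):
        start = day - timedelta(days=window_days - 1)
        window = [v for d, v in sales_by_date.items() if start <= d <= day]
        result[day] = sum(window) / len(window)
    return result

sales = {
    date(2015, 1, 1): 456,
    date(2015, 1, 2): 487,
    date(2015, 1, 3): 245,
    date(2015, 1, 4): 812,
    date(2015, 1, 5): 333,
    date(2015, 1, 6): 449,
}
averages = trailing_average(sales)
print(averages[date(2015, 1, 5)])  # 466.6 = (456 + 487 + 245 + 812 + 333) / 5
print(averages[date(2015, 1, 6)])  # 465.2 = (487 + 245 + 812 + 333 + 449) / 5
```

One difference to keep in mind: if sales is an integer column, SQL Server's AVG returns an integer, so the query may show truncated values unless the column is cast to a decimal type first.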
You can then chart both these fields as normal, to give the following output (with the data table so you can see the actual calculated moving average):
Hopefully this will help you. Please let me know if you require further assistance

Problems with node names in Igraph with R

I have a list of data.frames (dfList) from which I'd like to generate directed weighted networks in Igraph in R.
The edge-weight variable is "ValueUSD", while vertices are identified by numbers with the columns "Reporter" and "Partner".
Here is a function I prepared, called "write.graphs", to be used with lapply on my "dfList".
write.graphs <- function(filename) {
  # Build a directed graph from the Reporter -> Partner edge list
  d <- graph.data.frame(filename[c("Reporter", "Partner")], directed = TRUE)
  # Use ValueUSD as the edge weight
  d <- set.edge.attribute(d, "weight", value = filename$ValueUSD)
  d
}
graphs <- lapply(dfList, write.graphs)
Everything works perfectly. If I check for vertex names I get:
graphs[1]$names
NULL
But trouble starts when I want to use names instead of numbers to identify my vertices, using the corresponding columns "ReporterN" and "PartnerN" in each of the data.frames in dfList.
Here you can see what my data.frames look like:
dfList[1]
$`Aug 2014`
Reporter YearPeriod Year Period Commodity Partner NetWeightKg ValueUSD Price PartnerN ReporterN
1 76 201408 2014 8 150910 0 4472917 22028271 4.924811 World Brazil
2 76 201408 2014 8 150910 32 380891 1533948 4.027262 Argentina Brazil
3 76 201408 2014 8 150910 152 239776 1336057 5.572105 Chile Brazil
4 76 201408 2014 8 150910 251 289 2164 7.487889 France Brazil
5 76 201408 2014 8 150910 300 27592 170658 6.185054 Greece Brazil
6
This is the message I get:
> grafi<-lapply(dfList, scrivi.grafi)
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In graph.data.frame(filename[c("ReporterN", "PartnerN")], ... :
In `d' `NA' elements were replaced with string "NA"
2: In graph.data.frame(filename[c("ReporterN", "PartnerN")], ... :
In `d' `NA' elements were replaced with string "NA"
3: In graph.data.frame(filename[c("ReporterN", "PartnerN")], ... :
If it helps, I checked:
class(dfList[1]$PartnerN)
[1] "NULL"
Any suggestions? Can anyone explain what is happening?
Thanks a lot, Umberto.

SQL queries to get (elo)rating history (for graph, highest points etc)

I'm running a site with a user ranking list based on Elo rating.
I want to provide more statistics to users. I have most of them covered, but I can't figure out how to write queries for these two:
Player's highest ranking points
Player's ranking points history (for a graph)
MySQL db has two tables for statistics: ranking_statistics which holds overall statistics:
id, ranking, wins, losses, draws, total6m, total8m, total10m
and ranking_matches which holds statistics for matches played:
id, home_id, away_id, home_ranking, away_ranking, home6m, away6m, home8m, away8m, home10m, away10m, datetime
Here is some sample data from ranking_matches:
46 442 456 30 -30 6 6 5 3 3 4 2013-10-14 21:22:58
54 456 480 34.0391 -34.0391 6 4 6 4 2 1 2013-10-16 17:33:37
55 473 475 30 -30 9 9 7 8 6 4 2013-10-17 03:06:41
and from ranking_statistics:
442 1029.97 7 2 6 120 89 55
456 1003.93 6 2 5 99 84 65
I want to retrieve a player's highest ranking points in history (ranking_statistics.ranking holds only the current points). That could be derived from ranking_matches by querying all matches with the player's id as home or away, then accumulating the ranking changes while remembering the highest score (the starting value is 1000). The same query could also be used to draw a graph of the points history.
I have tried to understand how this is done but could not work it out myself, and there don't seem to be any similar questions posted (or at least I did not find any).
The results could also be calculated in PHP, because all the data is output with it.
Sample output:
Player id: 442
Current rating: 1029.97
Highest rating: 1054.32 (on 10-23-2013)
For the history graph, two values need to be retrieved to draw a line: the date and the ranking points.
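The accumulation described in the question can be sketched as follows. This is a minimal illustration in Python rather than SQL or PHP, and it assumes that home_ranking and away_ranking hold the per-match rating change for each side, as the sample rows suggest. The function names and the dict layout are hypothetical.

```python
from datetime import datetime

def rating_history(matches, player_id, start=1000.0):
    """Return [(played_at, rating_after_match), ...] in chronological order."""
    history = []
    rating = start
    for m in sorted(matches, key=lambda m: m["datetime"]):
        if m["home_id"] == player_id:
            rating += m["home_ranking"]   # assumed: change applied to the home player
        elif m["away_id"] == player_id:
            rating += m["away_ranking"]   # assumed: change applied to the away player
        else:
            continue                      # match does not involve this player
        history.append((m["datetime"], rating))
    return history

def peak(history, start=1000.0):
    """Highest rating reached and when; (None, start) if never above start."""
    best_when, best = None, start
    for when, rating in history:
        if rating > best:
            best_when, best = when, rating
    return best_when, best

# The three sample rows from ranking_matches (only the columns used here).
matches = [
    {"home_id": 442, "away_id": 456, "home_ranking": 30.0, "away_ranking": -30.0,
     "datetime": datetime(2013, 10, 14, 21, 22, 58)},
    {"home_id": 456, "away_id": 480, "home_ranking": 34.0391, "away_ranking": -34.0391,
     "datetime": datetime(2013, 10, 16, 17, 33, 37)},
    {"home_id": 473, "away_id": 475, "home_ranking": 30.0, "away_ranking": -30.0,
     "datetime": datetime(2013, 10, 17, 3, 6, 41)},
]

history_456 = rating_history(matches, 456)   # points for the history graph
peak_when, peak_rating = peak(history_456)   # highest rating and its date
print(history_456)
print(peak_when, peak_rating)
```

Each (date, rating) pair in the history is one point on the graph. With only three sample matches the totals will not match ranking_statistics, which reflects the full match list.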

Conditional Sorting according to one value and then alphabetically

I have this group in a table, where I need to display one value at the top and the rest in alphabetical order.
Table
Column1 Value#1 Value#2
Alpha 12 26
Beta 65 745
Gamma 987 87
Pie 7 2
Non-Beta 132 426
Zeta 112 266
I want to sort it like this (can anyone also tell me whether this has any real use other than for viewing purposes?):
Table
Column1 Value#1 Value#2
Non-Beta 132 426
Alpha 12 26
Beta 65 745
Gamma 987 87
Pie 7 2
Zeta 112 266
So the Non-Beta has to be displayed on the top and rest according to alphabetical order.
Edit
Thank you very much for the reply below, Chris. I really appreciate it, and yes, it works.
I have one more question about the table format above. How can I display it in the format below?
Table
Column1 Value#1 Value#2
Non-Beta 132 426
Alpha 12 26
Pie 7 2
Zeta 112 266
Total 263 720
Beta 65 745
Gamma 987 87
Total 1052 832
Thank you
Select the table, right-click the table handle, select Tablix Properties and select the Sorting tab. Press the Add button then click the fx button to open the expression editor. Enter the following expression:
=IIF(Fields!Column1.Value = "Non-Beta", "A" + Fields!Column1.Value, "B" + Fields!Column1.Value)
All we are doing is prefixing the special value with "A" and every other value with "B", so the special value sorts before the rest while the others keep their alphabetical order.
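The same idea, outside of SSRS, can be sketched in Python with a two-part sort key instead of a string prefix. The function name `pinned_first` is illustrative only.

```python
def pinned_first(names, pinned="Non-Beta"):
    """Sort names alphabetically, but place the pinned value first.

    The key is a tuple: False (0) for the pinned value and True (1) for
    everything else, followed by the name itself for alphabetical order.
    """
    return sorted(names, key=lambda name: (name != pinned, name))

names = ["Alpha", "Beta", "Gamma", "Pie", "Non-Beta", "Zeta"]
print(pinned_first(names))
# ['Non-Beta', 'Alpha', 'Beta', 'Gamma', 'Pie', 'Zeta']
```

The tuple key plays the role of the "A"/"B" prefix in the expression above: the first element decides the group, the second orders values within it.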