Python Seaborn Pandas bar chart problem - same years for different states

I have this DataFrame:
[image: DataFrame]
I am trying to draw a bar chart of the data so that, for each state, I can see the number of incidents per year, for each of the years in the df.
So far I have this:
[image: graph]
The problem is that every state covers a different time frame, so each state's chart should show different years (e.g. Arizona for 89-95 and Maryland for 87-93), but I get the same years for all states.
What should I do?
My code is:
g = sns.FacetGrid(dfagg, col="State")
g.map(sns.barplot, "Year", "Incident")
where dfagg is the df mentioned earlier.
Thanks

Pass sharex=False to the FacetGrid function:
g = sns.FacetGrid(dfagg, col="State", sharex=False)
g.map(sns.barplot, "Year", "Incident")

You can set the axis by:
ax.set(ylim=(<min year>, <max year>))

Related

Non Scaled SSRS Line Chart with multiple series

I am trying to present time series from multiple sensors on a single SSRS (v14) line chart.
I need to plot N series, each plotted independently in the space provided by the chart (each with its own independent vertical axis).
More about the data
There can be anywhere from ~1-10 series
The challenge is that they are different orders of magnitude.
One might be degrees F (~0-212)
One might be Carbon ppm (~1-16)
One might be Ftlbs Thrust (~10k-100k)
The point is, they have no relation to each other and can be very different.
The exact value is not important. I can hide the vertical axis.
More about what I am trying to do
The idea is to show the multiple time series plotted together against time for the 4 hours before and after 'an event'. It's not necessarily the exact value that is important. The subject matter expert would be looking for something odd (temperature falls, thrust spikes, etc.).
Things I have tried
If there were just 2 series, I could easily use the 2nd axis available in the SSRS chart. That's exactly the idea I am chasing. But in this case, I want each of the N series to plot against its own axis.
I have tried stacking N transparent graphs on top of each other. This would be a really ugly solution, and SSRS won't even let you do it. It unstacks them for you.
I have experimented with the Allow Scale Breaks property on the vertical axis. This would solve the problem, but we don't like the 'double jagged line'.
Turning on the logarithmic scale is a possibility. It does do a better job of displaying all the data, but it's not really what we want. It is going to change the shape of data that ranges over a couple of orders of magnitude.
I tried the sparkline component and am having the same problem.

This approach is essentially the same as Greg's answer above. I've had to do this same process in the past when comparing trends in data even though the units were dissimilar.
I took a very simple approach: I added an additional column to the query that shows each value as a fraction of the maximum value in its series.
As an example (just 2 series here for clarity), I started with data like this in myTable:
Series  Month  myValue
A       Jan    4
A       Feb    8
A       Mar    16
B       Jan    200
B       Feb    300
B       Mar    400
My dataset query would be something like this (the 1.0 * avoids integer division if myValue is an integer column):
SELECT *, 1.0 * myValue / MAX(myValue) OVER (PARTITION BY Series) AS myPlotValue FROM myTable
This gives us a final dataset which looks like this:
Series  Month  myValue  myPlotValue
A       Jan    4        0.25
A       Feb    8        0.5
A       Mar    16       1
B       Jan    200      0.5
B       Feb    300      0.75
B       Mar    400      1
As you can see all plot values are now between 0 and 1.
I created the chart using the myPlotValue field and had the option of using the original values from the myValue field as data point labels.

After talking to some math people, I learned that this is a standard problem, and it is solved by a process called normalization of the data.
Essentially you rescale every series to fit in a given range (usually 0-1).
You can also scale and add an offset if that makes sense for your problem domain.
https://www.statisticshowto.datasciencecentral.com/normalized/
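
To make the two rescaling ideas above concrete, here is a minimal sketch in plain Python (not SSRS or SQL). The rows mirror the small myTable example; the function names scale_by_max and min_max are made up for illustration, and scale_by_max assumes every series has a positive maximum.

```python
# Hypothetical sample rows (series, month, value), mirroring the myTable example.
rows = [
    ("A", "Jan", 4), ("A", "Feb", 8), ("A", "Mar", 16),
    ("B", "Jan", 200), ("B", "Feb", 300), ("B", "Mar", 400),
]

def scale_by_max(rows):
    """Divide each value by the maximum of its own series (assumes max > 0)."""
    peak = {}
    for series, _, value in rows:
        peak[series] = max(peak.get(series, value), value)
    return [(s, m, v, v / peak[s]) for s, m, v in rows]

def min_max(rows):
    """Rescale each series so its minimum maps to 0 and its maximum to 1."""
    lo, hi = {}, {}
    for s, _, v in rows:
        lo[s] = min(lo.get(s, v), v)
        hi[s] = max(hi.get(s, v), v)
    out = []
    for s, m, v in rows:
        span = hi[s] - lo[s]
        # A constant series has no spread; map it to 0.0 rather than divide by zero.
        out.append((s, m, v, (v - lo[s]) / span if span else 0.0))
    return out

for row in scale_by_max(rows):
    print(row)
```

Dividing by the maximum keeps zero at zero, which suits quantities where zero is meaningful. Min-max stretches every series over the full 0-1 range, which makes small wiggles in a nearly flat series more visible.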

create a weighted undirected graph from data.frame in r?

I have searched for similar questions, but they did not solve my problem.
I have a data frame with three columns. The first two columns are vertices and the third column is the weight.
I want to create a weighted undirected graph. I use code like this:
graph.data.frame(d = aggdat1, directed = FALSE)
but how can I add the weights?
Besides, in my data frame there are some repeated edges, like a-b and b-a. Do I just need to set "directed = F"?
Thank you very much.

Spatial Join for two variable visualization

I want to know if I can use Spatial Join functions to visualize a dataset based on two variables.
My CSV has 541,000 rows and I'm trying to build a visualization in Zeppelin with Spark that minimizes the number of points drawn.
All the examples I've seen are for GIS systems, which is not the type of data I have.
My CSV looks like this:
id, variableX, variableY, type
I'm trying to apply Spatial Join logic to variableX and variableY.
Thank you.

spark-highcharts might do what you want.
Half a million points is too many to plot directly; some aggregation or filtering is needed. spark-highcharts will do the aggregation automatically.
For 2-dimensional data, chart types such as line, area and spline can be used.
For 3-dimensional data, chart types such as arearange and scatter can be used.
The following code plots the bank data provided in the Zeppelin Tutorial. It draws a spline chart with the column age on the xAxis and the aggregated average balance on the yAxis:
import com.knockdata.spark.highcharts._
import com.knockdata.spark.highcharts.model._
highcharts(bank.series("name" -> "age", "y" -> avg($"balance")).orderBy($"age")).
xAxis(new XAxis("age").typ("category")).
chart(Chart.spline).
plot()
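
As a rough illustration of the kind of aggregation that makes half a million points plottable, here is a sketch in plain Python. It is not spark-highcharts and not a spatial join: it simply counts points per grid cell over two variables. The function name bin_points, the random sample data and the cell size of 10 are all assumptions made for the example.

```python
from collections import Counter
import random

def bin_points(points, x_step, y_step):
    """Count points per grid cell; each cell is keyed by its lower-left corner."""
    counts = Counter()
    for x, y in points:
        cell = (x // x_step * x_step, y // y_step * y_step)
        counts[cell] += 1
    return counts

# Hypothetical stand-in for the (variableX, variableY) columns of the CSV.
random.seed(0)
points = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(541_000)]

cells = bin_points(points, x_step=10, y_step=10)
print(len(points), "points reduced to", len(cells), "cells")
```

Each cell can then be drawn as a single marker whose size or colour encodes its count. In Spark, the same idea would presumably be expressed as a groupBy on the floored columns followed by a count.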

A way to import data for each time (ticks) in netlogo?

I started using NetLogo just a few months ago and I am not an expert in it. I want to import some time series data (x, y and z) for each tick during the simulation. I need to know how to import this data from an Excel file that was converted to a CSV file. It has 3 columns and 30 rows: the columns correspond to x, y and z, and each row corresponds to a different year. I want to use one year per tick: for example, the first row should be used at tick 0, the 2nd at tick 1, and so on until the 30th year at tick 29.
The name of the data file is: data.csv

As Seth mentioned, the CSV extension can help with this. The CSV extension documentation has example code that you will probably find quite useful (see the section "Read a file one line per tick").

decimal lag values in ACF plot instead of integers lags

I have a monthly time series. When I run the code acf(timeseries), the lags on the x-axis show up as decimals instead of integers, as shown in the screenshot:
What is wrong? How can I get lags = c(1, 2, 3, 4, 5, 6, etc.) on the x-axis? I need something like this (photoshopped image; excuse the misalignment of values with ticks on the x-axis):

Try the Acf function (the first letter is upper-case) in the "forecast" package.

Perhaps it's because you are passing an unsuitable input format to the acf/ccf function.
I faced the same problem and solved it by converting the input vectors from time series (ts) to numeric:
[variable]<-as.numeric([variable])
That worked. I hope it helps.

Since you are using a tseries object, you need to pass coredata() to the ACF and PACF functions:
acf(coredata(your_ts_object))
pacf(coredata(your_ts_object))
This will pass just the numerical values in the time series and won't make a mess, giving you integer lags.

I think you have to get the results from the acf() function and then plot them yourself, like this:
Storing the acf results:
a=acf(ts,plot = F) #ts is a monthly time series (frequency = 12)
Plotting the acf:
plot(a$lag*12,a$acf,xlab="Lag",ylab="ACF",main="",type="h")
Note that you have to multiply the lag by the frequency of your series, in this case 12.
Plotting horizontal lines:
abline(h=c(-0.19,0,0.19),col=c("blue","black","blue"),lty=c(2,1,2))
h : specifies where to plot the lines
col : the colors of the lines
lty : the type of each line
That worked for me; I hope that's what you are looking for.

As the ts is monthly, the yearly lag is divided into 12 parts. The first figure shows only a portion of the total ACF (approx. 1.5 years). To get the ACF for the full ts, use acf(ts_object, lag.max = <length of your ts_object>). E.g., if you have 15 years of monthly data, set lag.max = 12*15.
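
The relationship between the decimal lags and the integer lags discussed in the answers above can be sketched in plain Python (this is not R's acf, only a small re-implementation of a sample autocorrelation for illustration). The series values and the function name sample_acf are made up. The key assumption is that, for a ts object, the lags are reported in units of the series' time unit (years when frequency = 12), so multiplying by the frequency recovers integer month lags.

```python
def sample_acf(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator, divides by lag-0 sum)."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    return [
        sum((values[t] - mean) * (values[t + k] - mean) for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

frequency = 12  # monthly observations, one cycle per year

# Hypothetical monthly series: 5 years of a repeating seasonal pattern.
values = [float((m % 12) ** 2) for m in range(60)]

acf_values = sample_acf(values, max_lag=18)

# Lags expressed in years (the decimal axis) and converted back to months.
lags_in_years = [k / frequency for k in range(len(acf_values))]
lags_in_months = [round(lag * frequency) for lag in lags_in_years]

for months, years, r in zip(lags_in_months, lags_in_years, acf_values):
    print(f"lag {months:2d} months = {years:.3f} years  acf = {r:+.3f}")
```

Stripping the time series attributes (as.numeric or coredata) has the same effect as the multiplication here: the lags are then counted in observations rather than in years.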