How can I make a bar graph from a BLAST result?

I have a BLAST result in table format. Below are the first three columns. The first column is the query ID (this example has 2 queries, 60317531 and 60317532), and the second column is the hit against the query sequence, which has 3 parts:
a) Swiss-Prot ID: sp|Q10CQ1|
b) gene name: MAD14
c) organism: ORYSJ
I would like to make a bar chart of the genes that are present and how many times each appears for each query.
For example, for the first query (60317531):
MAD14 2 times
MAD15 1 time
AGL8 2 times
AP1 3 times
Fields: query_id subject_id %_identity
gi|60317531|gb|AAX18712.1| sp|Q10CQ1|MAD14_ORYSJ 84.21
gi|60317531|gb|AAX18712.1| sp|P0C5B1|MAD14_ORYSI 83.40
gi|60317531|gb|AAX18712.1| sp|Q6Q9I2|MAD15_ORYSJ 68.91
gi|60317531|gb|AAX18712.1| sp|Q42429|AGL8_SOLTU 57.20
gi|60317531|gb|AAX18712.1| sp|O22328|AGL8_SOLCO 58.00
gi|60317531|gb|AAX18712.1| sp|Q41276|AP1_SINAL 65.79
gi|60317531|gb|AAX18712.1| sp|D7KWY6|AP1_ARALL 65.79
gi|60317531|gb|AAX18712.1| sp|Q8GTF4|AP1C_BRAOB 64.21
gi|60317532|gb|AAX18713.1| sp|B4YPV4|AP1C_BRAOA 64.21
gi|60317532|gb|AAX18713.1| sp|Q96355|1AP1_BRAOT 64.21
gi|60317532|gb|AAX18713.1| sp|P0DI14|AP1_BRARP
In the bar chart the x axis should be genes, the y axis should be the frequency, and the query ID would be the title of the graph.
Is there any automatic way I can do this? I have about 40,000 queries and around 100 hits against each query in a single file.

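Step 1: Extract the 2nd column from your output file and save it to a new file using:
awk '{print $2}'
Step 2: Open the new file in the vim editor and run the following command to keep only the gene name (the text between the last "|" and the "_" that follows it):
:%s!^.*|\([^_]*\)_.*$!\1!
Step 3: Use this file to plot in R.
data <- read.table("ur_file_name.txt", header=F, sep=" ")
# Count how often each gene name occurs, then plot the counts
counts <- table(data$V1)
barplot(counts, xlab="Genes", ylab="Frequency", main="Query ID")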
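The steps above count genes over the whole file, while the question asks for counts per query. As a rough sketch of that grouping (not part of the original answer), the Python below tallies gene names per query ID with collections.Counter. It assumes whitespace-separated columns, a query ID in the second "|" field of column 1, and a gene name before the last "_" of the hit name; the function name count_genes_per_query is made up for illustration.

```python
from collections import Counter, defaultdict


def count_genes_per_query(lines):
    """Return {query_id: Counter({gene: hits})} from tabular BLAST lines."""
    counts = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        # Skip blank lines and header lines such as "Fields: query_id ..."
        if len(fields) < 2 or fields[0].startswith(("#", "Fields")):
            continue
        query_id = fields[0].split("|")[1]   # gi|60317531|gb|... -> 60317531
        entry = fields[1].split("|")[-1]     # sp|Q10CQ1|MAD14_ORYSJ -> MAD14_ORYSJ
        gene = entry.rsplit("_", 1)[0]       # MAD14_ORYSJ -> MAD14
        counts[query_id][gene] += 1
    return counts


# A few rows copied from the question's table
sample = """\
Fields: query_id subject_id %_identity
gi|60317531|gb|AAX18712.1| sp|Q10CQ1|MAD14_ORYSJ 84.21
gi|60317531|gb|AAX18712.1| sp|P0C5B1|MAD14_ORYSI 83.40
gi|60317531|gb|AAX18712.1| sp|Q6Q9I2|MAD15_ORYSJ 68.91
gi|60317532|gb|AAX18713.1| sp|B4YPV4|AP1C_BRAOA 64.21
""".splitlines()

per_query = count_genes_per_query(sample)
for query_id, genes in per_query.items():
    print(query_id, dict(genes))
```

Each Counter then holds the x values (gene names) and y values (frequencies) for one chart, with the query ID available as the title.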

Related

How to prevent stacked bars aggregating by date in DASH?

I have a simple dataframe whose index is a time series. The dataframe looks something like this:
I want to show stacked bars for the 'Percent_Male' and 'Percent_Female' columns using plotly.graph_objects and Dash. This works fine for rows whose date index is unique. However, for rows whose index is not a unique date, such as '2022-10-10', where 4 data samples occur on the same date, the 4 samples all aggregate into one large stacked bar, but I want to keep a separate bar per sample/row. The end result looks like:
The 2 traces are quite simply:
`trace1 = go.Bar(
    x = df.index,
    y = df.Percent_Male,
    name = '% Male',
)
trace2 = go.Bar(
    x = df.index,
    y = df.Percent_Female,
    name = '% Female'
)`
And the Plotly go figure is created in the Dash app.layout like so:
`app.layout = html.Div([
    html.H1('Gender Balance'),
    html.Div([
        dcc.Graph(
            id='plot1',
            figure={
                'data': data2,
                'layout': go.Layout(
                    title='Historic',
                    height=640,
                    width=980,
                    barmode='stack',
                )
            }
        )], style={'display':'inline-block'}
    )
])`
Is there any way to plot a separate bar for each dataframe row?
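One possible approach, offered only as a sketch and not taken from the thread: bars that share an x value are stacked together, so giving every row its own x position and showing the dates as tick labels keeps the rows apart. The code builds a plain figure dict of the same 'data'/'layout' shape the question already passes to dcc.Graph, so it needs no imports. The helper name stacked_bars_per_row and the sample values are invented.

```python
def stacked_bars_per_row(dates, male, female):
    """Build a Plotly-style figure dict with one stacked bar per row."""
    positions = list(range(len(dates)))  # a unique x value for every row
    return {
        "data": [
            {"type": "bar", "x": positions, "y": list(male), "name": "% Male"},
            {"type": "bar", "x": positions, "y": list(female), "name": "% Female"},
        ],
        "layout": {
            "title": "Historic",
            "barmode": "stack",
            # Show the original (possibly repeated) dates as tick labels
            "xaxis": {
                "tickmode": "array",
                "tickvals": positions,
                "ticktext": [str(d) for d in dates],
            },
        },
    }


# Made-up sample rows; two of them share the same date
figure = stacked_bars_per_row(
    ["2022-10-09", "2022-10-10", "2022-10-10"],
    [40, 55, 60],
    [60, 45, 40],
)
```

With the dataframe from the question this would be called as stacked_bars_per_row(df.index, df.Percent_Male, df.Percent_Female) and the result passed as figure= to dcc.Graph. The trade-off is that the x axis is no longer a true date axis, so gaps between dates are not drawn to scale.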

Gnuplot: How to plot the sum of the previous residual to the next residual till (n-1) summs (desired plot values) of my (n) residuals?

I have this residuals .conv file (like a .txt file):
0 -3.39778780952e+00 -3.64444458026e+00 -6.13098312717e+00 -3.33731379258e+00 9.60958415473e-02
1 -5.68002563742e+00 -5.05666214505e+00 -6.80071922409e+00 -5.04462782788e+00 2.22031207076e-03
2 -5.64107082704e+00 -4.97249772797e+00 -6.79863961158e+00 -4.96268354902e+00 2.61811084403e-03
3 -5.80553774139e+00 -5.92252773129e+00 -8.00308750495e+00 -5.70572702588e+00 4.58014609089e-04
4 -5.78806459727e+00 -6.01639219099e+00 -8.59850185227e+00 -5.72299102224e+00 3.93884617760e-04
5 -6.81638698130e+00 -6.73802955972e+00 -8.62025323625e+00 -6.62315587350e+00 8.89255543212e-05
6 -6.51291720873e+00 -6.69293919422e+00 -8.66002661220e+00 -6.43426100314e+00 1.09754466079e-04
7 -7.57778891780e+00 -7.00314649895e+00 -8.63923998027e+00 -6.98816143858e+00 5.94338661679e-05
8 -7.56036077709e+00 -7.01274467096e+00 -8.62954759664e+00 -6.99585319125e+00 5.69443692058e-05
I want to plot in gnuplot the 1st column against a new column (or without creating a new one, if that is not needed) that accumulates the 6th-column residuals: each value adds the next residual of the 6th column to the sum of the previous ones, until I have (n-1) sums (the desired plot values) of the n values in the 6th column. This should give the final deformation at each iteration.
For example, summing the 6th-column values: 9.60e-02 + 2.22e-03 --> 1st value of my desired column data
9.60e-02 + 2.22e-03 + 2.61e-03 --> 2nd value of my desired column data
and so on.
I think I need a function to plot this in gnuplot.
Thank you
With the following code you can plot the running sum of a column. I think it comes close to what I understood from your question. However, the 1st value will be the 1st row, the 2nd value will be the 1st+2nd rows, the 3rd value will be the 1st+2nd+3rd rows, etc. I hope you can modify it to your needs.
Code:
### sum up a column
reset session
$Data <<EOD
0 -3.39778780952e+00 -3.64444458026e+00 -6.13098312717e+00 -3.33731379258e+00 9.60958415473e-02
1 -5.68002563742e+00 -5.05666214505e+00 -6.80071922409e+00 -5.04462782788e+00 2.22031207076e-03
2 -5.64107082704e+00 -4.97249772797e+00 -6.79863961158e+00 -4.96268354902e+00 2.61811084403e-03
3 -5.80553774139e+00 -5.92252773129e+00 -8.00308750495e+00 -5.70572702588e+00 4.58014609089e-04
4 -5.78806459727e+00 -6.01639219099e+00 -8.59850185227e+00 -5.72299102224e+00 3.93884617760e-04
5 -6.81638698130e+00 -6.73802955972e+00 -8.62025323625e+00 -6.62315587350e+00 8.89255543212e-05
6 -6.51291720873e+00 -6.69293919422e+00 -8.66002661220e+00 -6.43426100314e+00 1.09754466079e-04
7 -7.57778891780e+00 -7.00314649895e+00 -8.63923998027e+00 -6.98816143858e+00 5.94338661679e-05
8 -7.56036077709e+00 -7.01274467096e+00 -8.62954759664e+00 -6.99585319125e+00 5.69443692058e-05
EOD
plot s=0 $Data u 1:(s=s+$6) w lp pt 7 title "Sum up"
### end of code
Result:
If I understand correctly, you want to make a cumulative plot, like this:
It was generated by:
plot[1:][0.095:] 'temp.dat' u 1:6 smooth cumulative w p ps 2 lw 2 notitle
The key part is smooth cumulative.
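To make the arithmetic behind both answers concrete, here is a small Python sketch of a running sum using itertools.accumulate. It uses the first four 6th-column values from the question. Dropping the first entry is only my reading of the question's "(n-1) sums", which seem to start at row 1 + row 2.

```python
from itertools import accumulate

# Sixth-column residuals from the first four rows of the question's data
residuals = [9.60958415473e-02, 2.22031207076e-03,
             2.61811084403e-03, 4.58014609089e-04]

# running[i] = residuals[0] + ... + residuals[i], as in the gnuplot answers
running = list(accumulate(residuals))

# Assumed reading of the question: n-1 sums, starting with row 1 + row 2
desired = running[1:]

for iteration, value in enumerate(desired, start=1):
    print(iteration, value)
```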

SSRS Lookup combined with SUM IIF

I have the following situation (in SQL Server 2008 R2):
. Two datasets (Dataset 1 / Dataset 2)
. Rows (A - H)
. Columns (2011 - 2012 - 2013 - P12 M)
I receive the first three columns from Dataset 1, and the last column (Past 12 Months) with a lookup on column A:
=Lookup(Fields!A.Value, Fields!A.Value, Fields!Total.Value, "Dataset 2")
So far so good..
The challenging part:
Row B/A = ROW B divided by ROW A
I use this statement:
=SUM(IIF( Fields!A.Value = "B", Fields!Total.Value, 0)) / SUM(IIF( Fields!A.Value = "A", Fields!Total.Value, 0))
But how do I get there with a lookup?
How do I get to the first question mark (Answer: 2,23)?
I tried to combine the statement with a lookup, so far without result.
I searched the internet and found/tried this URL:
SSRS nested iif expression in lookup
Method 1:
Handle it the way the report is currently designed.
You would need to give the value explicitly in the source expression of Lookup. For row B divided by row A:
=Lookup("B", Fields!A.Value, Fields!Total.Value, "Dataset 2")
/Lookup("A", Fields!A.Value, Fields!Total.Value, "Dataset 2")
Method 2:
I would probably redesign the datasets and combine Dataset 1 and Dataset 2. That removes the need for funky logic in the report. Use the new dataset to populate the matrix.
New Dataset:
SELECT CAST(Year as varchar(10)), A, Total
FROM DataSet1_Table
UNION ALL
SELECT 'L12M', A, Total
FROM DataSet2_Table
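For readers less familiar with Lookup, the idea behind Method 1 can be sketched in Python: find the Total of the row keyed "B" and of the row keyed "A" in the second dataset, then divide. This is only an illustration of the logic, not SSRS code. The lookup helper and the totals are invented, and the guard against a missing or zero denominator is my own addition.

```python
def lookup(source, rows, key_field, value_field):
    """Return value_field of the first row whose key_field equals source, else None."""
    for row in rows:
        if row[key_field] == source:
            return row[value_field]
    return None


# Hypothetical rows standing in for "Dataset 2"; the totals are invented
dataset2 = [
    {"A": "A", "Total": 10.0},
    {"A": "B", "Total": 25.0},
]

a_total = lookup("A", dataset2, "A", "Total")
b_total = lookup("B", dataset2, "A", "Total")

# Row B divided by row A; None when A is missing or zero
ratio = b_total / a_total if a_total else None
print(ratio)
```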

Series Grouping SSRS

I have the following data in the dataset
Key Assignee Sev InOps InTek
1 A 1 Y Y
2 B 2 Y N
3 C 3 N Y
I need to plot the chart so that I get:
Sev on the X axis
Count(Key) on the Y axis
Assignee belongs to Ops (Y) as the Ops bar
Assignee belongs to Tek (Y) as the Tek bar
For each severity we will then have two bars, one for Ops and another for Tek, which will show as follows:
      Ops  Tek
Sev1  1    1
Sev2  1    0
Sev3  0    1
I have the chart configuration done as follows:
In Count I have dragged the Key column.
In Category group I have the Sev column.
In the series group, do I need to put two series, the Ops column and the Tek column respectively?
The simplest way to do this, if possible, would be to pivot the data when generating the Dataset, i.e. having something like this:
From here it's trivial to create the Chart - Series based on InType, Category based on Severity and the data is a Count of Key.
If you can't pivot your data, create a Chart like this:
The expression for the first Chart Series is:
=Sum(IIf(Fields!InOps.Value = "Y", 1, 0))
and the second:
=Sum(IIf(Fields!InTek.Value = "Y", 1, 0))
It's also useful to set a custom Legend text for each of the Series:
Either way, you get the required result:
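As a cross-check of the two series expressions, the same conditional counts can be sketched in Python over the three rows from the question. Each Sum(IIf(... = "Y", 1, 0)) becomes an increment of 1 or 0 per severity. This is an illustration only, not something SSRS runs.

```python
from collections import defaultdict

# The dataset from the question
rows = [
    {"Key": 1, "Assignee": "A", "Sev": 1, "InOps": "Y", "InTek": "Y"},
    {"Key": 2, "Assignee": "B", "Sev": 2, "InOps": "Y", "InTek": "N"},
    {"Key": 3, "Assignee": "C", "Sev": 3, "InOps": "N", "InTek": "Y"},
]

counts = defaultdict(lambda: {"Ops": 0, "Tek": 0})
for row in rows:
    # Same idea as Sum(IIf(Fields!InOps.Value = "Y", 1, 0)) per category group
    counts[row["Sev"]]["Ops"] += 1 if row["InOps"] == "Y" else 0
    counts[row["Sev"]]["Tek"] += 1 if row["InTek"] == "Y" else 0

for sev in sorted(counts):
    print(f"Sev{sev}", counts[sev]["Ops"], counts[sev]["Tek"])
```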

Access partition function: Is there a way to make it show bin categories that don't have a count?

I'm trying to use the Access Partition function to generate the bins for a histogram chart that shows the frequency distribution of my % utilization data set. However, the Partition function shows the category bin ranges (e.g. 0:9, 10:19 etc) only for the categories that have a count. I would like it to show all ranges up to 100.
Example:
Using this function:
% Utilization: Partition([Max],0,100,10)
The Full SQL is:
SELECT Count([qry].[Max]) AS Actuals, Partition([Max],0,100,10) AS [% Utilization]
FROM [qry]
GROUP BY Partition([Max],0,100,10);
gives me:
Actuals | % Utilization
4 | 0: 9
4 | 10: 19
4 | 20: 29
but I want it to show 0s for the ranges that don't have values up to 90:99. Can this be done?
Thanks in Advance
The only way I can think of doing this is with an additional Bins table that contains all the bins you wish to show. The labels in Bins must match the strings returned by Partition exactly, including any padding spaces. Nz turns the missing counts into 0:
SELECT Bins.[% Utilization], Nz(t.Actuals, 0) AS Actuals
FROM Bins
LEFT JOIN
(SELECT Count([Max]) AS Actuals,
Partition([Max],0,100,10) AS [% Utilization]
FROM qry
GROUP BY Partition([Max],0,100,10)) AS t
ON t.[% Utilization] = Bins.[% Utilization]
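The effect of the left join can be sketched in Python: generate every bin first, then fill in the counts, so that empty bins still appear. This is only an illustration of the idea with made-up values. The function name binned_counts is hypothetical, and the labels here are unpadded, unlike the strings Access produces.

```python
from collections import Counter


def binned_counts(values, start=0, stop=100, interval=10):
    """Count values per bin, keeping empty bins at zero."""
    counts = Counter((v - start) // interval for v in values if start <= v <= stop)
    result = []
    for i, low in enumerate(range(start, stop + 1, interval)):
        high = min(low + interval - 1, stop)  # the last bin is 100:100
        result.append((f"{low}:{high}", counts.get(i, 0)))
    return result


# Made-up utilization values
bins = binned_counts([5, 7, 12, 25])
for label, count in bins:
    print(label, count)
```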