Trying to reproduce miRBase results locally with BLAST

I am trying to reproduce locally what I get when running BLAST on the miRBase website. The 'search sequences' option there is mature miRNAs, which I downloaded to my computer and turned into a BLAST database with this command:
./makeblastdb -in /home/marianoavino/Downloads/mature.fa -dbtype 'nucl' -out /home/marianoavino/Downloads/mature
On miRBase I see they use an E-value of 10, which I keep locally as well.
At the end of the analysis, miRBase reports these parameter settings:
Search parameters
Search algorithm: BLASTN
Sequence database: mature
Evalue cutoff: 10
Max alignments: 100
Word size: 4
Match score: +5
Mismatch penalty: -4
and this is the command line I use for BLAST on my computer:
./blastn -db /home/marianoavino/Downloads/mature -evalue 10 -word_size 4 -query /home/marianoavino/Downloads/testinputblast.fasta -task "blastn" -out /home/marianoavino/Downloads/testBLast.out
The results of the two analyses are different, with miRBase finding much more than local BLAST.
Do you have any idea which parameters I should use on the local BLAST command line to match the listed miRBase parameters and get the same answer?

There can be lots of reasons for different results, including the version of BLAST you are using versus the one they used, the parameters (as you said), and differences in the databases (remember, database size is used to calculate things like the E-value, so you may end up with different results).
Exact replication of results may be difficult, but the question is: are the differences meaningful? Just because an alignment has some E-value (and 10 is unusually high) does not mean it is meaningful. For a given sequence, if the searches yield different numbers of alignments but the same number of high-quality alignments (high bit score, low E-value, full alignment between query and subject sequences), does it matter?
I would compare the results to see where the differences are, then move forward.
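That said, if you want the local run to track the listed settings more closely, BLAST+ exposes the match/mismatch scores directly. A sketch (note: with the default pairwise report the alignment cap is spelled -num_descriptions/-num_alignments rather than -max_target_seqs, and -dust no is a guess on my part, since web services sometimes disable the low-complexity filter for short queries):
./blastn -db /home/marianoavino/Downloads/mature -query /home/marianoavino/Downloads/testinputblast.fasta -task blastn -evalue 10 -word_size 4 -reward 5 -penalty -4 -num_descriptions 100 -num_alignments 100 -dust no -out /home/marianoavino/Downloads/testBLast.out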


Weak instruments Stata results

I'm using the command ivregress 2sls, with clusters (each cluster is a school) and with pweights.
I have one endogenous variable, x1, and 4 instruments. I'm trying to test my model and check that my instruments are not weak.
I used the estat firststage command and I'm not sure how to interpret the result.
[picture: estat firststage results]
You might want to try the user-written weakivtest, which is available from SSC (ssc install weakivtest) and is an improvement on estat firststage. If your F-test statistic rejects the null at the common significance levels (or exceeds the critical values when using weakivtest), your instruments are strong.

Reverse engineering a GCN binary (ELF) in C#, but not an expert on ELF or ELF conventions. Do some of these sections/section names look familiar?

My first exposure to ELF-format binaries began less than 2 weeks ago; please excuse the crudeness of my grasp of them (and of course, correction of any misconceptions I display here would be welcome).
The story so far: I have some GCN binaries which I am trying to fully reverse engineer so that I might be able to generate my own with a higher degree of control (i.e. limiting the number of intermediate steps executed by code not my own and not entirely within my understanding). What I've found from some resources online and my own delving is that each binary contains two ELF structures; the first is fairly small, containing three sections (no program headers) named "", ".shstrtab", and ".ddiPipelineHeader".
The ".ddiPipelineHeader" section size is 48 bytes, with the byte 0 being a 1, and bytes 16-19 containing what appears to be a 32bit integer that corresponds to the number of bytes in the binary from the start of the second ELF structure. All the other bytes in this section are 0. A google search of ".ddiPipelineHeader" returned exactly 1 result which I didn't find useful. Before I run off all half-cocked into dangerous, crashy GPU experimentation-land, does this section's structure sound at all familiar? Is there possibly an explanation of what each byte would do (e.g. bytes 4-15 are 0 padding, etc. etc.)?
I also have all the sections contained in the second ELF to ask about, but I figure I'll be able to delve more deeply into those with a better foundation gleaned here, so I'll hold off on that part for now.
Thanks for any insight!

JMeter: Capturing Throughput in Command Line Interface Mode

In JMeter v2.13, is there a way to capture throughput via non-GUI/command-line mode?
I have the jmeter.properties file configured to output via the Summariser and I'm also outputting another [more detailed] .csv results file.
call ..\..\binaries\apache-jmeter-2.13\bin\jmeter -n -t "API Performance.jmx" -l "performanceDetailedResults.csv"
The performanceDetailedResults.csv file provides:
timeStamp
elapsed time
responseCode
responseMessage
threadName
success
failureMessage
bytes sent
grpThreads
allThreads
Latency
However, no amount of tweaking the .properties file or the test itself seems to provide throughput results like I get via the GUI Summary Report's Save Table Data button.
All articles, postings, and blogs seem to indicate it wasn't possible without manual manipulation in a spreadsheet. But I'm hoping someone out there has figured out a way to do this with no, or minimal, manual manipulation as the client doesn't want to have to manually calculate the Throughput value each time.
Throughput is calculated by JMeter Listeners, so it isn't something you can enable via properties files. The same applies to other calculated metrics, such as:
Average response time
50, 90, 95, and 99 percentiles
Standard Deviation
Basically, throughput is calculated simply by dividing the total number of requests by the elapsed time.
Throughput is calculated as requests/unit of time. The time is calculated from the start of the first sample to the end of the last sample. This includes any intervals between samples, as it is supposed to represent the load on the server.
The formula is: Throughput = (number of requests) / (total time)
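That means you can recover it from the detailed CSV you are already writing. For example, assuming the default column layout (timeStamp in epoch milliseconds in column 1, elapsed in milliseconds in column 2) and a header row, a one-liner like this is a sketch of the calculation (adjust the column numbers to your saveservice settings):
awk -F, 'NR > 1 { if (start == "" || $1 < start) start = $1; if ($1 + $2 > end) end = $1 + $2; n++ } END { printf "%.2f requests/second\n", n / ((end - start) / 1000) }' performanceDetailedResults.csv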
Hopefully it won't be too hard for you.
References:
Glossary #1
Glossary #2
Did you take a look at JMeter-Plugins?
This tool set can generate an aggregate report from the command line.
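For example, something along these lines regenerates the GUI's Aggregate Report (which includes throughput) from a results file; check jmeter-plugins.org for the exact invocation, CMDRunner.jar normally sits in lib/ext:
java -jar CMDRunner.jar --tool Reporter --generate-csv aggregate.csv --input-jtl performanceDetailedResults.csv --plugin-type AggregateReport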

Can I test Tesseract OCR in the Windows command line?

I am new to Tesseract OCR. I converted an image to TIFF and tried to run Tesseract on it using cmd in Windows to see the output, but I couldn't get it to work. Can you help me? What is the command to use?
Here is my sample image:
The simplest tesseract.exe syntax is tesseract.exe inputimage outputbase (the recognized text ends up in outputbase.txt).
The assumption here is that tesseract.exe has been added to the PATH environment variable.
You can add the -psm N argument if your text is particularly hard to recognize.
I see that the regular syntax (without any -psm switch) works well enough with the image you attached, unless that level of accuracy is not good enough.
Note that non-English characters (such as the symbol next to "prescription") are not recognized; my default installation only contains the English training data.
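For example, to force single-block segmentation on your image you would run something like this (in Tesseract 4 and later the option is spelled --psm):
tesseract.exe ECL8R.png out -psm 6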
Here's the tesseract syntax description:
C:\Users\vish\Desktop>tesseract.exe
Usage:tesseract.exe imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]
pagesegmode values are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
-l lang and/or -psm pagesegmode must occur before any configfile.
Single options:
-v --version: version info
--list-langs: list available languages for tesseract engine
And here's the output for your image (NOTE: When I downloaded it, it converted to a PNG image):
C:\Users\vish\Desktop>tesseract.exe ECL8R.png out.txt
Tesseract Open Source OCR Engine v3.02 with Leptonica
C:\Users\vish\Desktop>type out.txt.txt
1 Project Background
A prescription (R) is a written order by a physician or medical doctor to a pharmacist in the form of
medication instructions for an individual patient. You can't get prescription medicines unless someone
with authority prescribes them. Usually, this means a written prescription from your doctor. Dentists,
optometrists, midwives and nurse practitioners may also be authorized to prescribe medicines for you.
It can also be defined as an order to take certain medications.
A prescription has legal implications; this means the prescriber must assume his responsibility for the
clinical care ofthe patient.
Recently, the term "prescription" has known a wider usage being used for clinical assessments,

2D non-polynomial function fitting from the command line

I just wrote a simple Unix command line utility that could be implemented a lot more efficiently. I can measure its performance by just running it on a number of inputs and measuring the time it takes. This will produce a set of pairs of numbers, s t, where s is the input size and t the processing time. In order to determine the performance characteristics of my utility, I need to fit a function through these data points. I can do this manually, but I prefer to be lazy and let a utility do it for me.
Does such a utility exist?
Its input is a sequence of pairs of numbers.
Its output is a formula expressing the second number as a function of the first, plus an error measure.
One step of the way is to have a utility that does this just for polynomials.
This has been discussed here but it didn't produce a ready-to-use solution.
The next step is to extend the utility to try non-polynomial terms: negative-degree polynomials (as in y = 1/x) and logarithmic terms (as in y = x log x) will need to be tried as well. One idea to cope with the non-polynomial terms is to just surround the polynomial fitting with x and y scale transformations. I don't know whether that will do. This question is related but not exactly the same.
As I said, I'm lazy: I'm not looking for ideas on how to write this myself, I'm looking for a reliable result of a project that has already done it for me. Any suggestions?
I believe that SAS has this, RS/1 has this, and I think Mathematica has this. Excel and most spreadsheets have a primitive form of this, and there are usually add-ons available for more advanced forms. There are lots of lab analysis and statistical analysis tools with features like this.
Re: command-line tools:
SAS, RS/1, and Minitab were all command-line tools 20 years ago when I used them. I bet at least one of them still has this capability.
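Another option that is still everywhere is gnuplot: its fit command does general non-linear least squares, so the non-polynomial forms from the question can be tried directly. A sketch, where timings.dat is a hypothetical two-column file of s t pairs and the starting values just help convergence:
gnuplot -e "f(x) = a*x*log(x) + b*x + c; a = 1; b = 1; c = 1; fit f(x) 'timings.dat' via a,b,c"
fit prints the fitted parameters with their asymptotic standard errors and the RMS of the residuals, which covers the error-measure requirement; you can swap in f(x) = a/x + b or other candidate forms the same way.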