Emboss needle() warning: "Sequence Character not found in ajSeqCvtKS" ...? - warnings

I am using EMBOSSwin's needle() command-line program, which performs pairwise global alignments, but I encounter a strange warning.
I have 24 pairs of amino acid sequences that need aligning. I run the needle() command from Python using "subprocess.call()", and while this process runs (seemingly smoothly) I get the following warning:
Warning: Sequence character string not found in ajSeqCvtKS
EXTRA CLUES:
Despite this strange warning, alignments are successfully generated by needle() in .fasta format, as you can see...
... BUT... I am getting unexplained "AssertionErrors" when trying to read these alignments back into Python using Biopython's AlignIO.read() function (see: http://bit.ly/1aHK9w7 for my question directly related to this AssertionError)...
*To be clear: these AlignIO.read() AssertionErrors may not be related to the needle() warning, but I am treating the warning as a prime lead in the investigation...!

TCL issues with Octal numbers seen after porting from EDK 1.05 to EDK2

I have an EFI Shell tool which uses EDK 1.05 and TCL 8.3 sources. This tool accepts user commands to display PCI-E adapter information and to upgrade firmware on it. I recently ported it to UDK2017. I am using VS2012x86 toolchain to build the tool.
When I run the binary from EFI Shell, TCL reports errors such as these.
can't use invalid octal number as operand of "||"
syntax error in expression "(1<<0)"
syntax error in expression "(0x1<<0)"
I have read about TCL and Octal numbers
Since this issue is not being seen with EDK 1.05 code with the same TCL version, I am wondering if there is any flag I am missing out. I am hoping there is a simple solution to get past this error since there was no change in the TCL version.
Octal Issue
It's hard to be sure, but I suspect with the octal number issue you've got code that's parsing something like 080808 as a number, which is interpreted as octal because of the leading 0 (just like a constant in C or C++) and so can't contain an 8 (or 9). To parse a number definitely as decimal, the scan command is used:
set val 080808
scan $val "%d" parsedVal
# Properly, should check that [scan] has a result of 1, but I probably wouldn't bother
puts "$val -> $parsedVal"
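The leading-zero pitfall described above is not specific to Tcl. As a rough illustration in Python (not Tcl, and not the tool's actual code), the sketch below contrasts a C-style literal parser with a parser that always reads base 10. Both `parse_c_style` and `parse_decimal` are hypothetical helper names introduced only for this example.

```python
def parse_c_style(text):
    """Parse an integer the way C-style literal rules would (hypothetical helper)."""
    lowered = text.lower()
    if lowered.startswith("0x"):
        return int(lowered, 16)
    if len(text) > 1 and text.startswith("0"):
        # A leading zero selects base 8, so the digits 8 and 9 are rejected.
        return int(text, 8)
    return int(text, 10)


def parse_decimal(text):
    """Always read the digits as base 10, similar in spirit to scanning with %d."""
    return int(text, 10)


if __name__ == "__main__":
    print(parse_decimal("080808"))
    try:
        parse_c_style("080808")
    except ValueError as exc:
        print("octal parse failed:", exc)
```

The point of the comparison is that the same text is either a valid decimal number or an invalid octal one, depending only on which parsing rule is applied.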
Odd Expression Syntax Error
The other syntax error in expression "(1<<0)" errors are stranger, as those are definitely valid syntax. I've only got versions back to 8.4 on this machine, but…
$ tclsh8.4
% expr (1<<0)
1
The only ways that could be an invalid expression are if it is either in some custom expression language (which would be application-specific; you'll have to read the documentation to figure that out) or if you're using an expression string as a numeric value:
% set val (1<<0)
(1<<0)
% expr {$val + 1}
can't use non-numeric string as operand of "+"
but that wouldn't produce exactly the error you are seeing. Very puzzling indeed!
Use Stack Traces
There is something that might help you figure out what is going on. After an error, the global errorInfo variable has a stack trace generated. For example, after the above erroring expr it has this:
% puts $errorInfo
can't use non-numeric string as operand of "+"
while executing
"expr {$val + 1}"
The good thing is that this tells you exactly what command and where gave you the error; that can make a gigantic difference in your detective work to hunt down your problems.

How to decode an HTTP request with utf-8 and treat the surrogate keys (Emojis)

I'm having a hard time dealing with some parsing issues related to Emojis.
I request JSON from the Brandwatch site using urllib. (1) I must then decode it as UTF-8; however, when I do so, I get surrogates and json.loads cannot deal with them. (2)
I've tried BeautifulSoup4, which works well; however, when there's a &quot in the site result, it is transformed to ", and then json.loads cannot deal with it, because it says that a , is missing. After a lot of searching, I gave up trying to escape the ", which would be the ideal fix. (3)
So now I'm stuck with both "solutions/problems". Any ideas on how to proceed?
Obs: This is a program that fetches data from Brandwatch and puts it into a MySQL database, so performance is an issue here.
Obs2: PyJQ is a JQ binding for Python which does the request, and I can change the opener.
(1) - Dealing with the first approach using urllib, these are the relevant parts of the code used for it:
def downloader(url):
    return json.loads(urllib.request.urlopen(url).read().decode('utf8'))
...
parsed = pyjq.all(jqparser,url=url, vars={"today" : start_date}, opener=downloader)
Error raised:
Exception ignored in: '_pyjq.pyobj_to_jv'
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud83d' in position 339: surrogates not allowed
*** Error in `python': munmap_chunk(): invalid pointer: 0x00007f5f806303f0 ***
If I print the result of urllib.request.urlopen(url).read().decode('utf8') instead of sending it to json.loads, this is what appears. These escapes seem to be emojis.
"fullname":"Botinhas\uD83D\uDC62"
(2) Dealing with the second approach using BeautifulSoup4, here's the relevant part of the code. (Same as above, just changed the downloader function)
def downloader(url):
    return json.loads(BeautifulSoup(urllib.request.urlopen(url), 'lxml').get_text())
...
parsed = pyjq.all(jqparser,url=url, vars={"today" : start_date}, opener=downloader)
And this is the error raised:
Expecting ',' delimiter: line 1 column 4814765 (char 4814764)
Doing the print, the " before Diretas Já should be escaped.
"title":"Por "Diretas Já", manifestações pelo país ocorrem em preparação ao "Ocupa Brasília" - Sindicato dos Engenheiros no Estado do Rio de Janeiro"
I've thought of running a regex, however, I'm not sure whether this would be the most appropriate solution to this case as performance is an issue.
(3) - Part of Brandwatch result with the &quot problem mentioned above
UPDATE:
As Martin stated in the comments, I ran a replace swapping &quot for nothing. Then, it raised the former problem, of the emoji.
Exception ignored in: '_pyjq.pyobj_to_jv'
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud83d' in position 339: surrogates not allowed
*** Error in `python': munmap_chunk(): invalid pointer: 0x00007f5f806303f0 ***
UPDATE2:
I've added this to the downloader function:
re.sub(r'\\u(d|D)([a-z|A-Z|0-9]{3})', "", urllib.request.urlopen(url).read().decode('utf-8','ignore'))
It solved the issue; however, I don't think it's the best way to solve it. If anybody knows a better option, please share it.
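One possible alternative to deleting the escapes with a regex, offered only as a sketch: Python's `json.loads` already combines a valid `\uD83D\uDC62` escape pair into a single emoji character, so only unpaired surrogates (like the lone `\ud83d` in the traceback) should remain afterwards. The helpers `scrub_lone_surrogates` and `scrub` below are hypothetical names. Replacing with U+FFFD is an assumption about what is acceptable for the database, and whether pyjq then accepts the result has not been verified here.

```python
import json

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def scrub_lone_surrogates(text):
    """Replace every surrogate code point left in an already-decoded string."""
    return "".join(
        REPLACEMENT if 0xD800 <= ord(ch) <= 0xDFFF else ch for ch in text
    )


def scrub(value):
    """Walk a decoded JSON value and scrub every string inside it."""
    if isinstance(value, str):
        return scrub_lone_surrogates(value)
    if isinstance(value, list):
        return [scrub(item) for item in value]
    if isinstance(value, dict):
        return {scrub(key): scrub(item) for key, item in value.items()}
    return value


if __name__ == "__main__":
    # A valid pair is merged by json.loads into one character and survives.
    paired = json.loads('{"fullname": "Botinhas\\uD83D\\uDC62"}')
    # A lone high surrogate is kept by json.loads and would fail UTF-8 encoding.
    lone = json.loads('{"fullname": "Botinhas\\uD83D"}')
    for doc in (paired, lone):
        print(scrub(doc)["fullname"].encode("utf-8"))
```

Compared with the regex in UPDATE2, this keeps well-formed emojis and only touches code points that cannot be encoded, at the cost of one pass over every string in the parsed document.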

Golang CSV read : extraneous " in field error

I am using a simple program to read a CSV file. I noticed that when the CSV is created with Excel or on a Windows-based computer, the Go library fails to read it. Even when I use the cat command, it only shows me the last line on the terminal. It always results in this error: extraneous " in field.
After some research, I found that it is related to carriage-return differences between operating systems.
But I really want to ask how to make a generic CSV reader. I tried reading the same CSV using pandas and it was read successfully, but I have not been able to achieve this with my Go code.
Also screen shot of correct csv Is here
Your file clearly shows that you've got an extra quote at the end of the content. While programs like pandas may be fine with that, I assume it's not valid csv so go does return an error.
Quick example of what's wrong with your data: https://play.golang.org/p/KBikSc1nzD
Update: After your update and a little bit of searching, I have to apologize: the carriage return does matter and seems to be the main culprit here. Go seems to be OK handling the \r\n Windows variant but not the bare \r one. In that case, what you can do is wrap the bytes.Reader in a custom reader that replaces the \r byte with the \n byte.
Here's an example: https://play.golang.org/p/vNjzwAHmtg
Please note, that the example is just that, an example, it's not handling all the possible cases where \r might be a legit byte.
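For readers who want the normalization idea without following the playground link, here is the same byte replacement sketched in Python rather than Go. It is an illustration of the technique only, not a port of the linked example, and `normalize_newlines` and `read_rows` are hypothetical names. The same caveat applies: a \r that legitimately belongs inside a quoted field is rewritten too.

```python
import csv
import io


def normalize_newlines(data):
    """Turn CRLF and bare CR line endings into LF (operates on bytes)."""
    # Handle CRLF first so a Windows line ending does not become two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def read_rows(data):
    """Normalize line endings, then parse the result as CSV."""
    text = normalize_newlines(data).decode("utf-8")
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    # Sample data using bare carriage returns as record separators.
    raw = b'name,qty\r"apple",1\r"pear",2\r'
    for row in read_rows(raw):
        print(row)
```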

MySQL on remote machine accessed via chromebook terminal returns nonsense unicode which persists after I leave MySQL

I am using the terminal in a chromebook to ssh into a remote server. When I run a MySQL (5.6) select query, sometimes one of the fields will return nonsense unicode (when the field should return an email address) and change the MySQL prompt from:
mysql>
to
└≤⎽─┌>
and whatever text I type is converted into weird unicode. The problem persists even after I exit MySQL
One of the values in your database happened to have the sequence of bytes 0x1B, 0x28, 0x30 (ESC ( 0) in it. When you did the query, MySQL printed this byte sequence directly to your console. You can reproduce the effect by typing in Python:
>>> print('\x1B\x28\x30')
Consoles use control characters (in particular 0x1B, ESC) as a way to allow applications to control aspects of the console other than pure text, such as colours and cursor movements. This behaviour is inherited from the old dumb-terminal devices that they are pretending to be (which is why they are also known as terminal emulators), along with some weirder tricks that we probably don't need any more. One of those is to switch permanently between different character sets (considered encodings, now, but this long predates Unicode).
One of those alternative character sets is the DEC Special Graphics Character Set which it looks like you have here. In this character set the byte 0x6D, usually used in ASCII for m, comes out as the graphical character └.
You could in principle reset your terminal to normal ASCII by printing the byte sequence 0x1B, 0x28, 0x42 (ESC ( B), but this tends to be a pain to arrange when your console is displaying rubbish.
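To make the effect concrete, the sketch below simulates how the prompt gets rewritten and shows one way to emit the reset bytes from Python. The mapping table is deliberately partial: it contains only the five letters whose replacements are visible in the garbled prompt quoted in the question. The names `as_dec_graphics` and `reset_terminal_charset` are hypothetical.

```python
import sys

# Partial DEC Special Graphics mapping, limited to the letters seen in the
# garbled prompt from the question; the real character set covers more bytes.
DEC_GRAPHICS = {"m": "└", "y": "≤", "s": "⎽", "q": "─", "l": "┌"}

ESC = "\x1b"
SELECT_DEC_GRAPHICS = ESC + "(0"  # bytes 0x1B 0x28 0x30
SELECT_ASCII = ESC + "(B"         # bytes 0x1B 0x28 0x42


def as_dec_graphics(text):
    """Show what text looks like once the terminal has switched character set."""
    return "".join(DEC_GRAPHICS.get(ch, ch) for ch in text)


def reset_terminal_charset(stream=sys.stdout):
    """Write the sequence that selects plain ASCII again."""
    stream.write(SELECT_ASCII)
    stream.flush()


if __name__ == "__main__":
    print(as_dec_graphics("mysql>"))
    reset_terminal_charset()
```

Running something like `reset_terminal_charset()` from a one-line script avoids having to type the escape sequence by hand while the display is unreadable.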
There are potentially other ways your console can become confused; it's not, in general, safe to print arbitrary binary data to the console. There even used to be nastier things you could do with the console by faking keyboard input, which made this a security problem, but today it's just an annoyance factor.
However, one wouldn't normally expect to have any control codes in an e-mail address field. I suggest the application using the database should be doing some validation on the input it receives, and dropping or blocking all control codes (other than potentially newlines where necessary).
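A minimal sketch of that kind of input validation, assuming the application is written in Python and that "control codes" means the C0 range plus DEL. The function name `strip_control_chars` and its `keep` parameter are hypothetical.

```python
def strip_control_chars(value, keep="\n"):
    """Drop C0 control characters and DEL, except those listed in keep."""
    return "".join(
        ch for ch in value
        if ch in keep or not (ord(ch) < 0x20 or ord(ch) == 0x7F)
    )


if __name__ == "__main__":
    dirty = "bob\x1b(0@example.com"
    # Only the ESC byte is removed; the printable "(0" that followed it stays.
    print(strip_control_chars(dirty))
```

Note that this removes only the control byte itself, so the printable remainder of an escape sequence is left in the value; an application might prefer to reject such input outright.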
As a quick hack to clean this field for the specific case of the ESC character, you could do something like:
UPDATE things SET email=REPLACE(email, CHAR(0x1B), '');

Escape single and double quote in TCL

I am using the following script, but it is throwing an error message:
tcl;
eval {
add command "Audit Param"\
setting "Error : Part's and Spec's desc contains \"OBS\" or \"REPLACE\"" "(Reference No)"\
user all;
}
It shows the error: Expected word got 'and'.
I tried Part\'s, but it is still not working. How do I escape both single and double quotes when the string contains both?
Single quote and Tcl
In Tcl itself, the single quote character (') has no special meaning at all. It's just an ordinary character like comma (,) or period (.). (Well, except commas have special meaning in expressions and periods are used in floating point values and Tk widget names. Single quote has no meaning at all by comparison.)
With what you have written, any special meaning (and hence any need to quote) is limited to the add command.
Complex quoting situations are often resolved in Tcl by using a different quoting strategy. In particular, putting things in braces disables all substitutions (except backslash-newline-whitespace collapsing). This lets me write the equivalent to what you've written as:
add command "Audit Param" \
setting {Error : Part's and Spec's desc contains "OBS" or "REPLACE"} \
"(Reference No)" user all
Any complaint here is coming from inside that code and is not in the code as written per se. (The eval { ... } adds nothing. Nor does it incur a penalty other than making your code slightly harder to read.)
The real problem
At a very loose guess, that problem string is being used inside an SQL statement with direct string substitution instead of prepared parameters; that could produce that sort of error message. Check the contents of the global errorInfo variable after the failure happens to get a stack trace that can help pin down what went wrong; that might help you see where inside things the code is failing. If it is a piece of naughty SQL, there is code to fix because you've got something that is vulnerable to SQL injection problems (which might or might not be a security problem, depending on the exposure of that command). And if that's the case, doubling up each single quote (changing ' to '') ought to work around the problem in the short run.
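To show why direct string substitution would choke on this particular message, here is a sketch using Python's built-in sqlite3 module. It is an assumption-laden illustration: the `settings` table, the helper names, and the use of SQLite are all hypothetical, and nothing here is known about how the real add command builds its SQL.

```python
import sqlite3

MESSAGE = 'Error : Part\'s and Spec\'s desc contains "OBS" or "REPLACE"'


def make_db():
    """Create a throwaway in-memory database with a hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE settings (name TEXT, message TEXT)")
    return conn


def insert_by_substitution(conn, message):
    # Unsafe: the apostrophe in "Part's" ends the SQL string literal early.
    conn.execute("INSERT INTO settings VALUES ('Audit Param', '%s')" % message)


def insert_with_parameters(conn, message):
    # Prepared parameters: the driver passes the value separately from the SQL.
    conn.execute("INSERT INTO settings VALUES (?, ?)", ("Audit Param", message))


def insert_with_doubled_quotes(conn, message):
    # Short-term workaround: double each single quote before substituting.
    escaped = message.replace("'", "''")
    conn.execute("INSERT INTO settings VALUES ('Audit Param', '%s')" % escaped)


if __name__ == "__main__":
    conn = make_db()
    try:
        insert_by_substitution(conn, MESSAGE)
    except sqlite3.OperationalError as exc:
        print("substitution failed:", exc)
    insert_with_parameters(conn, MESSAGE)
    insert_with_doubled_quotes(conn, MESSAGE)
    print(conn.execute("SELECT message FROM settings").fetchall())
```

The parameterized form is the one that removes the injection risk; the doubled-quote form only stops this specific failure.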