SAM (Sequence Alignment/Map) Format Alignment Tags - duplicates

I am using samtools to remove duplicates. To mark and then remove duplicates, markdup relies on the ms
(mate score) and MC (mate CIGAR) tags that fixmate provides.
Does anyone know exactly what these tags are? How does fixmate work?
Thanks for the help!

The MC tag stands for Mate CIGAR, which is the CIGAR string of the read's mate. ms is not a standard tag, but I assume it is the score of its mate.
You can find the detailed information in the specs: https://samtools.github.io/hts-specs/SAMtags.pdf
fixmate should simply be copying data from one mate to the other and vice versa.
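To make the "copying from one mate to the other" idea concrete, here is a minimal sketch in plain Python. It is not samtools code: the dict-based read records and the `fix_mates` and `mate_score` helpers are made up for illustration, and the scoring rule (sum of base qualities of 15 or higher) is only my understanding of how the mate score is derived, so treat it as an assumption.

```python
def mate_score(qualities, min_qual=15):
    """Sum the base qualities at or above min_qual (assumed scoring rule)."""
    return sum(q for q in qualities if q >= min_qual)


def fix_mates(read1, read2):
    """Copy each read's CIGAR and score onto its mate as MC / ms tags."""
    for read, mate in ((read1, read2), (read2, read1)):
        read["tags"]["MC"] = mate["cigar"]
        read["tags"]["ms"] = mate_score(mate["qual"])


# Hypothetical pair of reads, reduced to the fields the sketch needs.
r1 = {"name": "pair1", "cigar": "50M", "qual": [30, 30, 10, 40], "tags": {}}
r2 = {"name": "pair1", "cigar": "20M1I29M", "qual": [20, 14, 15], "tags": {}}

fix_mates(r1, r2)
print(r1["tags"])  # {'MC': '20M1I29M', 'ms': 35}
print(r2["tags"])  # {'MC': '50M', 'ms': 100}
```

With both tags in place, a duplicate-marking step can compare a whole pair without having to look up the mate record again.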
BTW, this is not a programming question so this is not the proper place to ask about it. Better try in bioinformatics.

Related

How to clean up HTML leaving only <a> <b> <i> <p> tags?

I have to process a very large amount of HTML text for epub conversion, and every "automated" solution I found and tried is way less than satisfactory.
So I was thinking toward a regex batch command solution, but I am too regex illiterate to make it work, especially considering possible nesting instances. Can anybody help or point me to a surefire solution?
Thanks in advance!
The best solution is to use an HTML parser.
For simple cases you may try the following regex: <[abip]>[^<>]*<\/[abip]>|<[abip][^<>]*\/>
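If you go the parser route, a rough sketch with Python's standard `html.parser` module could look like the following. The `TagWhitelist` class and `clean_html` function are names I made up, and keeping only `href` on `<a>` is my own assumption about what an epub conversion needs.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED = {"a", "b", "i", "p"}
DROP_CONTENT = {"script", "style"}  # their text should not leak into the output


class TagWhitelist(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip += 1
        if tag not in ALLOWED:
            return
        # Assumption: only href on <a> is worth keeping; drop everything else.
        kept = [(k, v) for k, v in attrs if tag == "a" and k == "href" and v]
        attr_text = "".join(f' {k}="{escape(v, quote=True)}"' for k, v in kept)
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT and self.skip:
            self.skip -= 1
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))


def clean_html(source):
    parser = TagWhitelist()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


sample = ('<div class="x"><p style="color:red">Hi <b>there</b> <span>you</span>'
          ' &amp; <a href="u.html" onclick="x()">link</a></p></div>')
print(clean_html(sample))
# <p>Hi <b>there</b> you &amp; <a href="u.html">link</a></p>
```

Nesting is not a problem here because the parser walks the tags one by one; the text inside dropped tags such as `<div>` or `<span>` is kept, only the tags themselves disappear.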

Mail-Regex which does not match addresses in HTML-form-elements

With regex, I need to find and replace all the mail addresses in a fully rendered HTML page, because I want to spam-protect all of them. To be precise, I want all addresses except those in form elements (because if validation of a user input fails, I still want to display the entered mail address and not a replaced one).
Finding or writing a regex that simply searches for mail addresses is not a problem. The problem is excluding the ones in form elements. Does anyone have a suggestion for how to solve this? Is this possible with regex?
Some examples:
I want to match "...My content, mail#mail.com, more content......"
But I don't want to match: "...Your mail:mail#mail.com..."
I know it would be better to parse the HTML and simply skip form elements, but performance matters and, as I said before, this task is performed every time the website is called...
Thanks for your help!
It's probably impossible. First, see: RegEx match open tags except XHTML self-contained tags. Second, regex doesn't do a very good job of "not". (Some regex engines support it, some don't, but all are slow at it.) Perhaps someone who is much better at regex than I am might be able to help you, but I suspect doing this is impossible.
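For comparison, the parse-and-skip approach the question mentions is fairly short. Below is a sketch using Python's standard `html.parser`; the `MailProtector` class, the `protect_mails` function, the simplified `EMAIL` pattern, and the " [at] " replacement are all my own placeholders, not anything from the question.

```python
import re
from html.parser import HTMLParser

# Deliberately simple address pattern; swap in whatever regex you already have.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SKIP = {"form", "textarea"}  # text inside these elements is left alone


class MailProtector(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.depth += 1
        # Re-emit the raw tag, so attribute values are never rewritten.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP and self.depth:
            self.depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.depth:
            data = EMAIL.sub(lambda m: m.group(0).replace("@", " [at] "), data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def protect_mails(source):
    parser = MailProtector()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


page = ('<p>Write to me@example.com</p>'
        '<form><input value="me@example.com"><label>me@example.com</label></form>')
print(protect_mails(page))
```

Because start tags are copied through verbatim, an address in an `<input value="...">` survives untouched, while addresses in ordinary text nodes are rewritten. Whether this is fast enough to run on every request is something you would have to measure.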

Tournament bracket [duplicate]

I am trying to create a bracket system using HTML. I've found other solutions, however, most require lots of absolute/relative positioning or tables.
I'm looking for a way to make it flexible, so I can just change the HTML to change it from a 16-man bracket to a 64-man bracket.
[404 - link removed]
Now, I don't see much wrong with my current example; however, I'm curious whether anyone out there has suggestions for improving it or completely changing the way I am doing it.
I'd rather stay away from tables, and definitely stay away from any sort of positioning (this is meant to be flexible).
If you have any ideas, that would be great. :)
Thanks,
Andrew
That actually looks fairly good. What I would do to improve it is encapsulate the logic in a bit of JavaScript, supply the bracket information in some sort of text format, and have the JavaScript parse that format to generate a bracket as deep as you need.
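The idea of generating the bracket from a text format can be sketched as follows. I have written it in Python rather than JavaScript so it can run as a standalone script; the one-name-per-line input format, the function names, and the CSS class names are all assumptions of mine, not taken from the original markup.

```python
from html import escape


def parse_entrants(text):
    """Read one entrant per line; the count must be a power of two."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    if len(names) < 2 or len(names) & (len(names) - 1):
        raise ValueError("entrant count must be a power of two (2, 4, 8, ...)")
    return names


def build_rounds(names):
    """Return a list of rounds; each round is a list of (slot_a, slot_b) pairs."""
    rounds = [[(names[i], names[i + 1]) for i in range(0, len(names), 2)]]
    while len(rounds[-1]) > 1:
        # Later rounds start out with placeholder slots.
        rounds.append([("TBD", "TBD")] * (len(rounds[-1]) // 2))
    return rounds


def render(rounds):
    """Emit one <ul> per round, with no positioning and no tables."""
    parts = ['<div class="bracket">']
    for number, matches in enumerate(rounds, start=1):
        parts.append(f'  <ul class="round round-{number}">')
        for a, b in matches:
            parts.append(f"    <li>{escape(a)} vs {escape(b)}</li>")
        parts.append("  </ul>")
    parts.append("</div>")
    return "\n".join(parts)


entrants = parse_entrants("Ann\nBob\nCy\nDee")
print(render(build_rounds(entrants)))
```

Going from a 16-entrant to a 64-entrant bracket then only means feeding in a longer list; the number of rounds follows from the entrant count.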

Tools to reduce generated HTML size

I'm using Google Docs, and some of the templates we use were created with MS Office.
The resulting HTML is fat and ugly, and Google's 500 KB per document limit makes some cleanup mandatory.
I was able to find redundant "style" attributes and move them to CSS classes, and to rename the most frequently repeated class names to shorter ones, which saved about 50% of the original size.
Are you aware of any existing tools, scripts, or libraries that could do this painful job for me, or at least help me write this magic tool?
Thanks in advance!
EDIT: I tried tidy, demoronizer, and a manual rewrite. The first two gave:
- Input: 140 KB
- Tidy'ed: 110 KB
- Demoronized: 135 KB
So my favorite answer will be "rewrite it!"
Thanks!
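The manual step described above (moving repeated "style" attributes into shared CSS classes) can be sketched with Python's standard `html.parser`. The `StyleHoister` class, the `hoist_styles` function, and the generated class names `c0`, `c1`, ... are hypothetical; the sketch also assumes that no style value contains a ";" inside `url(...)` and that the generated names do not clash with existing classes.

```python
from html import escape
from html.parser import HTMLParser


class StyleHoister(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.classes = {}  # normalized style text -> generated class name

    def _class_for(self, style):
        key = ";".join(p.strip() for p in style.split(";") if p.strip())
        return self.classes.setdefault(key, f"c{len(self.classes)}")

    def _rebuild(self, tag, attrs, close):
        attrs = dict(attrs)
        style = attrs.pop("style", None)
        if style:
            name = self._class_for(style)
            attrs["class"] = f'{attrs["class"]} {name}' if attrs.get("class") else name
        text = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs.items()
        )
        return f"<{tag}{text}{close}"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._rebuild(tag, attrs, ">"))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._rebuild(tag, attrs, " />"))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def hoist_styles(source):
    """Return (html_without_inline_styles, css_rules_for_generated_classes)."""
    parser = StyleHoister()
    parser.feed(source)
    parser.close()
    css = "\n".join(f".{name}{{{rules}}}" for rules, name in parser.classes.items())
    return "".join(parser.out), css


doc = ('<p style="color: red; margin:0">a</p><p style="color: red;margin:0 ">b</p>'
       '<span class="x" style="font-weight:bold">c</span>')
html_out, css_out = hoist_styles(doc)
print(html_out)
print(css_out)
```

One caveat: an inline style outranks class rules in the cascade, so moving declarations into classes can change rendering if other stylesheet rules target the same elements.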
MS Office makes crappy HTML, period. You're better off spending time rebuilding the HTML from the original text than trying to walk through that minefield.
I made a few macros that run search/replace functions in Word to do basic things like wrapping <p> tags around paragraphs, and then I re-mark up the whole thing from scratch.
You could try tidy; it will clean up many things.
Without commenting on its name, I could mention demoronizer, which the author describes as:
...a Perl program available for downloading from this site which corrects numerous errors and incompatibilities in HTML generated by, or edited with, Microsoft applications.
YMMV.
One of my favourite utilities now is actually Windows Live Writer - it does a neat job of stripping rubbish out of Word doc files. Some might disagree, but I use it quite often!