How to remove the lines in a captcha (OCR)

I have a simple captcha and I want to recognize the text in the picture.
The picture looks like this: [captcha image]
I want to use Tesseract: http://code.google.com/p/tesseract-ocr/
But Tesseract only works well on clean pictures, so I need to preprocess the image first.
My preprocessing code is:
from PIL import Image, ImageEnhance

im = Image.open('test.png')
# text = image_to_string(im)
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(4)
img = im.convert("RGBA")  # img is only used for its size in the loops below
width, height = im.size
# pixdata = img.load()
# Pass 1: turn every pixel that is not pure black into white
for y in range(img.size[1]):
    for x in range(img.size[0]):
        if im.getpixel((x, y)) != (0, 0, 0):
            im.putpixel((x, y), (255, 255, 255))
# Pass 2: blacken a white pixel when the pixel above it and the pixel two rows below it are black
for y in range(img.size[1]):
    for x in range(img.size[0]):
        if y < 2 or y > (img.size[1] - 3):
            continue
        if im.getpixel((x, y))[0] == 255 and im.getpixel((x, y + 2))[0] == 0 and im.getpixel((x, y - 1))[0] == 0:
            im.putpixel((x, y), (0, 0, 0))
        # else:
        #     continue
list(im.getdata())
im.show()
After this processing, the picture looks like this: [processed captcha image]
So it did not work. Can anyone give me some tips?
I know how to remove a line that is one pixel wide, but the lines here do not have a consistent width.
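One idea that is often tried for lines of varying but small width is a morphological opening (erosion followed by dilation): any stroke thinner than the structuring element disappears, while thicker glyph strokes survive. The sketch below is not from the original post. It works on a plain 0/1 grid rather than a PIL image, the function names are made up for illustration, and it assumes the characters are at least 3 pixels thick while the noise lines are thinner.

```python
def erode(grid):
    """Keep a pixel only if its whole 3x3 neighbourhood is ink (1)."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(grid[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out


def dilate(grid):
    """Switch on every pixel in the 3x3 neighbourhood of an ink pixel."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if grid[y][x]:
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            out[yy][xx] = 1
    return out


def remove_thin_lines(grid):
    """Morphological opening: strokes thinner than 3 pixels are removed."""
    return dilate(erode(grid))


# Toy image: a 1-pixel-high line on row 1 and a 4x4 "glyph" block below it
grid = [[0] * 10 for _ in range(8)]
for x in range(10):
    grid[1][x] = 1
for y in range(3, 7):
    for x in range(3, 7):
        grid[y][x] = 1

cleaned = remove_thin_lines(grid)
```

In this toy case the line on row 1 is gone and the 4x4 block is restored to its original size. On a real captcha the result depends on how much thicker the characters are than the lines; Pillow's rank filters (ImageFilter.MinFilter and ImageFilter.MaxFilter) are the usual way to apply the same idea to an image.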

Niftis being plotted skewed

When I plot single images they appear skewed, but they don't look that way when I view them in 3DSlicer or another viewer. I'm not sure whether there's something I should be adjusting that I'm not aware of. Below is how I converted from DICOM and plotted the result:
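from glob import glob
import dicom2nifti
import matplotlib.pyplot as plt
import nibabel as nib
from nilearn import plotting  # assumed source of plotting.plot_img
dicom2nifti.convert_directory(path_to_dicom_before, path_to_dicom_before_converted, compression=True, reorient=True)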
dicom2nifti.convert_directory(path_to_dicom_post, path_to_dicom_post_converted, compression=True, reorient=True)
print(glob(path_to_dicom_before_converted + '*.nii.gz'))
nii_before = nib.load(glob(path_to_dicom_before_converted + '*.nii.gz')[0])
nii_after = nib.load(glob(path_to_dicom_post_converted + '*.nii.gz')[0])
nii_before_data = nii_before.get_fdata()
nii_after_data = nii_after.get_fdata()
fig, ax = plt.subplots(figsize=[10, 5])
plotting.plot_img(nii_before, cmap='gray', axes=ax)
plt.show()
fig, ax = plt.subplots(figsize=[10, 5])
plotting.plot_img(nii_after, cmap='gray', axes=ax)
plt.show()
plt.imshow(nii_before_data[100], cmap='bone')
plt.axis('off')
plt.show()
Affine of the first:
[[-3.19454312e-01 7.17869774e-02 3.95075195e-02 6.01478424e+01]
[ 5.83867840e-02 2.97792435e-01 -2.28872180e-01 1.27874863e+02]
[ 4.69673797e-02 1.18071720e-01 5.53225577e-01 1.12181287e+03]
[ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
As explained in this answer, nii_before_data[100] selects row 100 with all columns and all slices, not a single slice. You also need to plot the pixel array nii_before_data rather than the whole NIfTI image nii_before, which carries other data (such as the header and affine) as well.
You can try:
nii_before = nib.load(glob(path_to_dicom_before_converted + '*.nii.gz')[0])
nii_after = nib.load(glob(path_to_dicom_post_converted + '*.nii.gz')[0])
nii_before_data = nii_before.get_fdata()
nii_after_data = nii_after.get_fdata()
## Same goes for nii_after_data
if len(nii_before_data.shape) == 3:
    # 3D volume: show every slice along the third axis
    for slice_Number in range(nii_before_data.shape[2]):
        plt.imshow(nii_before_data[:, :, slice_Number])
        plt.show()
if len(nii_before_data.shape) == 4:
    # 4D volume: show every slice of every frame
    for frame in range(nii_before_data.shape[3]):
        for slice_Number in range(nii_before_data.shape[2]):
            plt.imshow(nii_before_data[:, :, slice_Number, frame])
            plt.show()
If you can provide a sample NIfTI image, the solution can be made more precise for your data.
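The affine posted in the question also hints at why a raw slice can look stretched: the voxel size along each array axis is the length of the corresponding affine column, and here the third axis is noticeably coarser than the first two. The sketch below only uses the numbers from the question; the function and variable names are illustrative. nibabel offers nibabel.affines.voxel_sizes for the same computation.

```python
import math

# Affine copied from the question
affine = [
    [-3.19454312e-01, 7.17869774e-02, 3.95075195e-02, 6.01478424e+01],
    [5.83867840e-02, 2.97792435e-01, -2.28872180e-01, 1.27874863e+02],
    [4.69673797e-02, 1.18071720e-01, 5.53225577e-01, 1.12181287e+03],
    [0.0, 0.0, 0.0, 1.0],
]


def voxel_sizes(aff):
    """Length of each of the first three affine columns (mm per voxel step)."""
    return [
        math.sqrt(sum(aff[row][col] ** 2 for row in range(3)))
        for col in range(3)
    ]


sizes = voxel_sizes(affine)
print([round(s, 3) for s in sizes])

# nii_before_data[100] is a plane over array axes 1 and 2, so its rows step by
# sizes[1] and its columns by sizes[2]; this ratio is the pixel height/width.
aspect_for_first_axis_slice = sizes[1] / sizes[2]
print(round(aspect_for_first_axis_slice, 2))
```

The voxels come out at roughly 0.33 x 0.33 x 0.6 mm, so a plane that includes the third axis is drawn with square pixels even though the voxels are not square there. Passing the ratio as the aspect argument of plt.imshow is one way to compensate. The off-diagonal terms of the affine also suggest an oblique acquisition, which viewers such as 3DSlicer account for and a plain imshow does not.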

Plotly Express: Prevent bars from stacking when Y-axis categories have the same name

I'm new to plotly.
Working with:
Ubuntu 20.04
Python 3.8.10
plotly==5.10.0
I'm building a comparative graph using a horizontal bar chart: different instruments measuring the same chemical compounds. I want an at-a-glance, head-to-head comparison of the measured values across all machines.
The problem is that if a compound has the same name on different instruments, Plotly stacks the data bars into a single bar with segment markers. I very much want each bar to appear individually. Is there a way to prevent Plotly Express from automatically stacking the common bars?
Examples:
CODE
import plotly.graph_objects as go

# _df is the DataFrame of measurements (not shown)
gobardata = []
for blended_name in _df[:20].blended_name: # should always be unique
    ##################################
    # Unaltered compound names
    compound_names = [str(c) for c in _df[_df.blended_name == blended_name]["injcompound_name"].tolist()]
    # Random number added to end of compound_names to make every string unique
    # compound_names = ["{} ({})".format(str(c),random.randint(0, 1000)) for c in _df[_df.blended_name == blended_name]["injcompound_name"].tolist()]
    ##################################
    deltas = _df[_df.blended_name == blended_name]["delta_rettime"].to_list()
    gobardata.append(
        go.Bar(
            name = blended_name,
            x = deltas,
            y = compound_names,
            orientation='h',
        ))
fig = go.Figure(data = gobardata)
fig.update_traces(width=1)
fig.update_layout(
    bargap=1,
    bargroupgap=.1,
    xaxis_title="Delta Retention Time (Expected - actual)",
    yaxis_title="Instrument name(Injection ID)"
)
fig.show()
What I'm getting (Using actual, but repeated, compound names)
What I want (Adding random text to each compound name to make it unique)
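The random suffix used for the second chart can occasionally produce the same number twice. A deterministic counter gives the same "every label is unique" effect without that risk. This is only a sketch of the labelling step, with a made-up helper name and made-up compound names; it does not touch Plotly itself.

```python
from collections import Counter


def make_unique(labels):
    """Append a running occurrence number so repeated labels become distinct."""
    seen = Counter()
    unique = []
    for label in labels:
        seen[label] += 1
        unique.append("{} ({})".format(label, seen[label]))
    return unique


# Hypothetical compound names with a repeat
labels = make_unique(["caffeine", "caffeine", "toluene"])
print(labels)  # ['caffeine (1)', 'caffeine (2)', 'toluene (1)']
```

The resulting list could be used in place of compound_names when building each go.Bar trace.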
OK, figured it out. This is probably pretty kludgy, but it works consistently.
Basically...
Use go.FigureWidget...
...with make_subplots having a common x-axis...
...controlling the height of each subplot based on the number of bars.
Every bar in each subplot is added as an individual trace...
...using a dictionary matching bar name to a common color.
The y-axis labels for each subplot are a list containing the machine name as [0], followed by blank placeholders ('') so that the length of the y-axis list matches the number of bars.
And manually manipulating the legend so each bar name appears only once.
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Get lists of total data
all_compounds = list(_df.injcompound_name.unique())
blended_names = list(_df.blended_name.unique())
#################################################################
# The heights of each subplot have to be set when fig is created.
# fig has to be created before adding traces.
# So, create a list of dfs, and use these to calculate the subplot heights
dfs = []
subplot_height_multiplier = 20
subplot_heights = []
for blended_name in blended_names:
    df = _df[(_df.blended_name == blended_name)]#[["delta_rettime", "injcompound_name"]]
    dfs.append(df)
    subplot_heights.append(df.shape[0] * subplot_height_multiplier)
chart_height = sum(subplot_heights) # Prep for the height of the overall chart.
chart_width = 1000
# Make the figure
fig = make_subplots(
    rows=len(blended_names),
    cols=1,
    row_heights = subplot_heights,
    shared_xaxes=True,
)
# Create the color dictionary to match a color to each compound
# CSS_chart_color_list() is a helper that is not shown in this post; it returns a list of CSS color names
_CSS_color = CSS_chart_color_list()
colors = {}
for compound in all_compounds:
    try:
        colors[compound] = _CSS_color.pop()
    except IndexError:
        # Probably ran out of colors, so refill the list and reuse
        _CSS_color = CSS_chart_color_list()
        colors[compound] = _CSS_color.pop()
rowcount = 1
for df in dfs:
    # Add bars individually to each subplot
    bars = []
    for label, labeldf in df.groupby('injcompound_name'):
        fig.add_trace(
            go.Bar(x = labeldf.delta_rettime,
                   y = [labeldf.blended_name.iloc[0]]+[""]*(len(labeldf.delta_rettime)-1),
                   name = label,
                   marker = {'color': colors[label]},
                   orientation = 'h',
                   ),
            row=rowcount,
            col=1,
        )
    rowcount += 1
# Set figure to FigureWidget
fig = go.FigureWidget(fig)
# Adding individual traces creates redundancies in the legend.
# This removes redundancies from the legend
names = set()
fig.for_each_trace(
    lambda trace:
        trace.update(showlegend=False)
        if (trace.name in names) else names.add(trace.name))
fig.update_layout(
    height=chart_height,
    width=chart_width,
    title_text="∆ of observed RT to expected RT",
    showlegend = True,
)
fig.show()
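The color dictionary in the answer refills its palette when it runs out of entries. The same reuse behaviour can be sketched with itertools.cycle, which avoids the try/except. The palette and compound names below are placeholders, not the author's actual color list, and unlike list.pop() this version hands out colors from the front of the palette.

```python
from itertools import cycle


def assign_colors(names, palette):
    """Map each name to a palette color, wrapping around when the palette runs out."""
    color_cycle = cycle(palette)
    return {name: next(color_cycle) for name in names}


palette = ["crimson", "steelblue", "seagreen"]            # placeholder CSS colors
compounds = ["benzene", "toluene", "xylene", "styrene"]   # placeholder names
colors = assign_colors(compounds, palette)
print(colors["styrene"])  # crimson: the fourth name wraps around to the first color
```

Because the names come from unique(), each compound gets exactly one color, and the mapping can be passed to marker = {'color': colors[label]} as in the answer.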

Not saving html interactive file with R

I am trying to design a circos plot using the BioCircos R package. BioCircos allows saving plots as interactive .html files. However, when I run the script with Rscript, the saved .html file is empty. To save the .html file I used the saveWidget function from the htmlwidgets package. Is something wrong with how I use saveWidget? The code I used follows:
#!/usr/bin/Rscript
######R script for BioCircos test
library(htmlwidgets)
library(BioCircos)
genomes <- list("chra1" = 217471166, "chra2" = 181034961, "chra3" = 153873357, "chra4" = 153961319, "chra5" = 164033575,
"chra6" = 154486312, "chra7" = 133565930, "chra8" = 147241510, "chra9" = 91218944, "chra10" = 52432566, "chrb1" = 843366180, "chrb2" = 842558404, "chrb3" = 707956555, "chrb4" = 635713434, "chrb5" = 567300182,
"chrb6" = 439630435, "chrb7" = 236595445, "chrb8" = 231667822, "chrb9" = 230778867, "chrb10" = 151572763, "chrb11" = 103205957) # custom genome
links_chromosomes_01 <- c("chra1", "chra2", "chra3", "chra4", "chra4", "chra5", "chra6", "chra7", "chra7", "chra8", "chra8", "chra9", "chra10") # Chromosomes on which the links should start
links_chromosomes_02 <- c("chrb2", "chrb3", "chrb1", "chrb9", "chrb10", "chrb4", "chrb5", "chrb6", "chrb1", "chrb8", "chrb3", "chrb7", "chrb6") # Chromosomes on which the links should end
links_pos_01 <- c(115060347, 102611974, 14761160, 128700431, 128681496, 42116205, 58890582, 40356090,
146935315, 136481944, 157464876, 39323393, 84752508, 136164354,
99573657, 102580613,
111139346, 120764772, 90748238, 122164776,
44933176, 18823342,
48771409, 128288229, 150613881, 18509106, 123913217, 51237349,
34237851, 53357604, 78270031,
25306417, 25320614,
94266153,
41447919, 28810876, 2802465,
45583472,
81968637, 27858237, 17263637,
30569409) ### links chra chromosomes
links_pos_02 <- c(410543481, 463189512, 825903588, 353914638, 354135472, 717707494, 643107332, 724899652,
583713545, 558756961, 642015290, 154999098, 340216235, 557731577,
643350872, 655077847,
85356666, 157889318, 226411560, 161566470,
109857786, 25338955,
473876792, 124495704, 46258030, 572314729, 141584107, 426419779,
531245660, 220131772, 353941099,
62422773, 62387030,
116923325,
76544045, 33452274, 7942164,
642047816,
215981114, 39278129, 23302654,
418922633) ### links chrb chromosomes
links_labels <- c("aldh1a3", "amh", "cyp26b1", "dmrt1", "dmrt3", "fgf20", "hhip", "srd5a3",
"amhr2", "dhh", "fgf9", "nr0b1", "rspo1", "wnt1",
"aldh1a2", "cyp19a1",
"lhx9", "pdgfb", "ptch2", "sox10",
"cbln1", "wt1",
"esr1", "foxl2", "gata4", "lrpprc", "serpine2", "srd5a2",
"asns", "ctnnb1", "srd5a1",
"cyp26a1", "cyp26c1",
"wnt4",
"ar", "nr5a1", "ptgds",
"fgf16",
"cxcr4", "pdgfa", "sox8",
"sox9")
tracklist <- BioCircosLinkTrack('myLinkTrack', links_chromosomes_01, links_pos_01,
links_pos_01, links_chromosomes_02, links_pos_02, links_pos_02,
maxRadius = 0.55, labels = links_labels)
#plotting results
plot_chra_chrb <- BioCircos(tracklist, genome = chra_chrb_genomes, genomeFillColor = "RdBu", chrPad = 0.02, displayGenomeBorder = FALSE, genomeLabelTextSize = "10pt", genomeTicksScale = 4e+3,
elementId = "chra_chrb_comp_plot_test.html")
saveWidget(plot_chra_chrb, "chra_chrb_comp_plot_test.html", selfcontained = F, libdir = "lib")
The command line to run this script:
Rscript /path_to/Circle_plot_test.r
I tried running this script in RStudio (without the saveWidget() call), but it took too long to run on my personal computer and the result was not displayed. This could be due to memory limits, because when I removed some of the data, the script generated the plot easily in RStudio and I was able to save it. Is there another way to save interactive .html files in R, or am I doing something wrong with the htmlwidgets package in my script?
Thanks all in advance for any help and comments.
When you said it took too long to run, that was a sign that something was wrong! You weren't getting anything when you used saveWidget because nothing usable was being returned from BioCircos.
I found two problems. The first one results in a blank output: you can't use a '.' in the element ID. This ID is used in the generated HTML.
You were getting huge delays because of the value you set for genomeTicksScale. That value controls the tick marks, and I'm not sure why you set it to 4e+3. When I comment out that line, it renders immediately. I have no issues with saving the widget, either.
One other thing: you had chra_chrb_genomes as the object passed to the genome parameter of BioCircos. I assumed it was meant to be the genomes object from your question, since that was the only unused object.
The only things I changed were in the BioCircos function:
(plot_chra_chrb <- BioCircos(tracklist, genome = genomes, #chra_chrb_genomes,
genomeFillColor = "RdBu",
chrPad = 0.02,
displayGenomeBorder = FALSE,
genomeLabelTextSize = "10pt",
# genomeTicksScale = 4e+3, # problematic
elementId = "chra_chrb_comp_plot_test" # no periods
))
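A quick back-of-the-envelope check makes the delay plausible. Assuming genomeTicksScale is the distance in bases between tick marks (my reading of the parameter, not something stated in this thread), a spacing of 4e+3 over the genome from the question asks for well over a million ticks. The 30 Mb comparison value below is an arbitrary coarser spacing chosen for illustration.

```python
# Chromosome lengths copied from the question's `genomes` list
chromosome_lengths = [
    217471166, 181034961, 153873357, 153961319, 164033575,
    154486312, 133565930, 147241510, 91218944, 52432566,
    843366180, 842558404, 707956555, 635713434, 567300182,
    439630435, 236595445, 231667822, 230778867, 151572763, 103205957,
]
total_bases = sum(chromosome_lengths)  # about 6.4 billion bases


def tick_count(total, spacing):
    """Rough number of ticks if one tick is drawn every `spacing` bases."""
    return total // spacing


ticks_requested = tick_count(total_bases, 4000)      # spacing used in the question
ticks_coarse = tick_count(total_bases, 30_000_000)   # illustrative coarser spacing
print(ticks_requested, ticks_coarse)
```

Under that assumption the widget has to create more than a million tick elements at the question's setting, against a couple of hundred at the coarser spacing, which fits the observation that the plot renders immediately once the line is commented out.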

How to automatically crop an .OBJ 3D model to a bounding box?

In the now obsolete Autodesk ReCap API it was possible to specify a "bounding box" around the scene to be generated from images.
In the resulting models, any vertices outside the bounding box were discarded, and any volumes that extended beyond the bounding box were truncated to have faces at the box boundaries.
I am now using Autodesk's Forge Reality Capture API, which replaced ReCap. Apparently, this new API does not allow the user to specify a bounding box.
So I am now searching for a program that takes an .OBJ file and a specified bounding box as input, and outputs a file of just the vertices and faces within this bounding box.
Given that there is no way to specify the bounding box in the Reality Capture API, I created this Python program. It is crude, in that it only discards faces that have vertices outside the bounding box. It also discards them nondestructively, by commenting them out in the output OBJ file. This allows you to uncomment them and then use a different bounding box.
This may not be what you need if you truly want to remove all relevant v, vn, vt, vp and f lines that are outside the bounding box, because the OBJ file size remains mostly unchanged. But for my particular needs, keeping all the records and just using comments was preferable.
# obj3Dcrop.py
# (c) Scott L. McGregor, Dec 2019
# License: free for all non commercial uses. Contact author for any other uses.
# Changes and Enhancements must be shared with author, and be subject to same use terms
# TL;DR: This program uses a bounding box, and "crops" faces and vertices from a
# Wavefront .OBJ format file, created by Autodesk Forge Reality Capture API
# if one of the vertices in a face is not within the bounds of the box.
#
# METHOD
# 1) All lines other than "v" vertex definitions and "f" faces definitions
# are copied UNCHANGED from the input .OBJ file to an output .OBJ file.
# 2) All "v" vertex definition lines have their (x, y, z) positions tested to see if:
# minX < x < maxX and minY < y < maxY and minZ < z < maxZ ?
# If TRUE, we want to keep this vertex in the new OBJ, so we
# store its IMPLICIT ORDINAL position in the file in a dictionary called v_keepers.
# If FALSE, we will use its absence from the v_keepers file as a way to identify
# faces that contain it and drop them. All "v" lines are also copied unchanged to the
# output file.
# 3) All "f" lines (face definitions) are inspected to verify that all 3 vertices in the face
# are in the v_keepers list. If they are, the f line is output unchanged.
# 4) Any "f" line that refers to a vertex that was cropped, is prefixed by "# CROPPED: "
# in the output file. Lines beginning # are treated as comments, and ignored in future
# processing.
# KNOWN LIMITATIONS: This program generates models in which the outside of bound faces
# have been removed. The vertices that were found outside the bounding box, are still in the
# OBJ file, but they are now disconnected and therefore ignored in later processing.
# The "f" lines for faces with vertices outside the bounding box are also still in the
# output file, but now commented out, so they don't process. Because this is non-destructive,
# we can easily change our bounding box later, uncomment cropped lines and reprocess.
#
# This might be an incomplete solution for some potential users. For such users
# a more complete program would delete unneeded v, vn, vt and vp lines when the v vertex
# that they refer to is dropped. But note that this requires renumbering all references to these
# vertex definitions in the "f" face definition lines. Such a more complete solution would also
# DISCARD all 'f' lines with any vertices that are out of bounds, instead of commenting them out.
# Such a rewritten .OBJ file would be far more compact, but changing the bounding box would require
# saving the pre-cropped original.
# QUIRK: The OBJ file format defines v, vn, vt, vp and f elements by their
# IMPLICIT ordinal occurrence in the file, with each element type maintaining
# its OWN separate sequence. It then references those definitions EXPLICITLY in
# f face definitions. So deleting (or commenting out) element references requires
# appropriate rewriting of all the "f" lines tracking all the new implicit positions.
# Such rewriting is not particularly hard to do, but it is one more place to make
# a mistake, and could make the algorithm more complicated to understand.
# This program doesn't bother, because all further processing of the output
# OBJ file ignores unreferenced v, vn, vt and vp elements.
#
# Saving all lines rather than deleting them to save space is a tradeoff involving considerations of
# Undo capability, compute cycles, compute space (unreferenced lines) and maintenance complexity choice.
# It is left to the motivated programmer to add this complexity if needed.
import sys
# bounding_box = sys.argv[1] # should be the only string passed: (maxX, maxY, maxZ, minX, minY, minZ)
bounding_box = [10, 10, 10, -10, -10, 1]
maxX = bounding_box[0]
maxY = bounding_box[1]
maxZ = bounding_box[2]
minX = bounding_box[3]
minY = bounding_box[4]
minZ = bounding_box[5]
v_keepers = dict() # keeps track of which vertices are within the bounding box
kept_vertices = 0
discarded_vertices = 0
kept_faces = 0
discarded_faces = 0
discarded_lines = 0
kept_lines = 0
obj_file = open('sample.obj','r')
new_obj_file = open('cropped.obj','w')
original_v_number = 1 # the number of the next "v" vertex line to process.
new_v_number = 1 # the new ordinal position of this vertex if out of bounds vertices were discarded.
for line in obj_file:
    line_elements = line.split()
    if not line_elements:
        # blank line: copy it unchanged (and avoid an IndexError on line_elements[0])
        new_obj_file.write(line)
        kept_lines = kept_lines + 1
        continue
    # Python doesn't have a SWITCH statement, but we only have three cases, so we'll just use cascading if stmts
    if line_elements[0] != "f": # if it isn't an "f" type line (face definition)
        if line_elements[0] != "v": # and it isn't a "v" type line either (vertex definition)
            # ************************ PROCESS ALL NON V AND NON F LINE TYPES ******************
            # then we just copy it unchanged from the input OBJ to the output OBJ
            new_obj_file.write(line)
            kept_lines = kept_lines + 1
        else: # then line_elements[0] == "v":
            # ************************ PROCESS VERTICES ****************************************
            # a "v" line looks like this:
            # v x y z ...
            x = float(line_elements[1])
            y = float(line_elements[2])
            z = float(line_elements[3])
            if minX < x < maxX and minY < y < maxY and minZ < z < maxZ:
                # if vertex is within the bounding box, we include it in the new OBJ file
                new_obj_file.write(line)
                v_keepers[str(original_v_number)] = str(new_v_number)
                new_v_number = new_v_number + 1
                kept_vertices = kept_vertices + 1
                kept_lines = kept_lines + 1
            else: # if vertex is NOT in the bounding box, it is still written but not recorded in v_keepers
                new_obj_file.write(line)
                discarded_vertices = discarded_vertices + 1
                discarded_lines = discarded_lines + 1
            original_v_number = original_v_number + 1
    else: # line_elements[0] == "f":
        # ************************ PROCESS FACES ****************************************
        # a "f" line looks like this:
        # f v1/vt1/vn1 v2/vt2/vn2 v3/vt3/vn3 ...
        # We need to drop any face lines where ANY of the 3 vertices v1, v2 or v3 are NOT in v_keepers.
        v = ["", "", ""]
        # Note that v1, v2 and v3 are the first "/" separated elements within each line element.
        for i in range(0, 3):
            v[i] = line_elements[i+1].split('/')[0]
        # for each f line, we need to determine if all 3 vertices are in v_keepers
        if v[0] in v_keepers and v[1] in v_keepers and v[2] in v_keepers:
            new_obj_file.write(line)
            kept_lines = kept_lines + 1
            kept_faces = kept_faces + 1
        else: # at least one of the vertices in this face was cropped, so we comment out the face too.
            discarded_lines = discarded_lines + 1
            discarded_faces = discarded_faces + 1
            new_obj_file.write("# CROPPED: " + line)
# end of line processing loop
obj_file.close()
new_obj_file.close()
print ("kept vertices: ", kept_vertices ,"discarded vertices: ", discarded_vertices)
print ("kept faces: ", kept_faces, "discarded faces: ", discarded_faces)
print ("kept lines: ", kept_lines, "discarded lines: ", discarded_lines)
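The header comments above describe a "more complete" variant that really deletes out-of-bounds vertices and renumbers the face references. A minimal sketch of that idea follows. It is not part of obj3Dcrop.py, and the function name is made up. It assumes positive (absolute) vertex indices, faces that appear after the vertices they reference, and it leaves vt and vn lines and their indices untouched.

```python
def crop_obj_lines(lines, box_min, box_max):
    """Return OBJ lines with out-of-box vertices removed and faces renumbered.

    Only the vertex index (the part before the first "/") of each face corner
    is remapped; a face is dropped if any of its vertices was dropped.
    """
    remap = {}       # original 1-based vertex number -> new 1-based number
    out = []
    original_v = 0
    for line in lines:
        parts = line.split()
        if not parts:
            out.append(line)
        elif parts[0] == "v":
            original_v += 1
            xyz = [float(p) for p in parts[1:4]]
            if all(lo < c < hi for c, lo, hi in zip(xyz, box_min, box_max)):
                remap[original_v] = len(remap) + 1
                out.append(line)
        elif parts[0] == "f":
            corners = []
            for corner in parts[1:]:
                fields = corner.split("/")
                new_index = remap.get(int(fields[0]))
                if new_index is None:
                    break  # this face touches a dropped vertex
                fields[0] = str(new_index)
                corners.append("/".join(fields))
            else:
                # no break: every corner survived, so emit the renumbered face
                out.append("f " + " ".join(corners))
        else:
            out.append(line)
    return out


sample = [
    "v 0 0 5",
    "v 1 0 5",
    "v 50 0 5",   # outside the box in x
    "v 0 1 5",
    "f 1/1/1 2/2/2 3/3/3",
    "f 1/1/1 2/2/2 4/4/4",
]
cropped = crop_obj_lines(sample, (-10, -10, 1), (10, 10, 10))
print("\n".join(cropped))
```

With the sample above, the third vertex and the first face disappear, and the surviving face is rewritten as f 1/1/1 2/2/2 3/4/4 because the old vertex 4 is now vertex 3. The trade-off is the one the author describes: the output is smaller, but changing the bounding box later requires going back to the original file.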
Unfortunately (at least for now), there is no way to specify the bounding box in the Reality Capture API.

octave/matlab read text file line by line and save only numbers into matrix

I have a question regarding data post-processing in Octave or MATLAB.
I have files exported from Fluent like the one below:
"Surface Integral Report"
Mass-Weighted Average
Static Temperature (k)
crossplane-x-0.001 1242.9402
crossplane-x-0.025 1243.0017
crossplane-x-0.050 1243.2036
crossplane-x-0.075 1243.5321
crossplane-x-0.100 1243.9176
I want to use Octave/MATLAB for post-processing.
My idea is to read the file line by line and either save only the lines containing "crossplane-x-" into a new file, or put the data from those lines directly into a matrix. Since I have many similar files, I could then make plots just by calling their titles.
But I have trouble identifying the lines that contain the string "crossplane-x-". I am trying to do something like this:
clear; clc;
% open a file and read line by line
fid = fopen("h20H22_alongHGpath_temp.dat");
pattern = 'crossplane-x-';
data = [];
txtread = fgetl(fid);
while ischar(txtread)   % fgetl returns -1 at end of file
  % keep only the lines that contain the pattern
  idx = strfind(txtread, pattern);
  if ~isempty(idx)
    % parse the numbers that follow the pattern on this line
    x = sscanf(txtread(idx(1) + length(pattern):end), '%f');
    data = [data; x'];
  endif
  txtread = fgetl(fid);
endwhile
fclose(fid);
disp(data)
Could anybody shed some light on this issue? Am I using the right functions? Thank you.
Here's a quick way for your specific file:
>> S = fileread("myfile.dat"); % collect file contents into string
>> C = strsplit(S, "crossplane-x-"); % first cell is the header, rest is data
>> M = str2num (strcat (C{2:end})) % concatenate datastrings, convert to numbers
M =
1.0000e-03 1.2429e+03
2.5000e-02 1.2430e+03
5.0000e-02 1.2432e+03
7.5000e-02 1.2435e+03
1.0000e-01 1.2439e+03
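For comparison, the line-by-line filtering the question describes can be sketched in Python as well. This is an illustration rather than part of the Octave answer: the function name is made up, and it assumes each data line has exactly two whitespace-separated fields, as in the sample report.

```python
def parse_report(text, prefix="crossplane-x-"):
    """Return (position, value) pairs from lines whose first field starts with prefix."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0].startswith(prefix):
            position = float(fields[0][len(prefix):])
            rows.append((position, float(fields[1])))
    return rows


# Sample report copied from the question
sample = """"Surface Integral Report"
Mass-Weighted Average
Static Temperature (k)
crossplane-x-0.001 1242.9402
crossplane-x-0.025 1243.0017
crossplane-x-0.050 1243.2036
crossplane-x-0.075 1243.5321
crossplane-x-0.100 1243.9176
"""

rows = parse_report(sample)
for position, value in rows:
    print(position, value)
```

The header lines are skipped because they do not start with the prefix, and the five data lines come back as number pairs, matching the matrix M shown above.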