I am trying to read a series of DICOM images from a folder named "series 8". Below is the code I use to read the series from that folder. I am getting the error "Index exceeds matrix dimensions" at info = dicominfo(fullfile(fileFolder,fileNames{1})).
clear all;
close all;
clc;
fileFolder = fullfile(pwd, 'series 8');
files = dir ( fullfile (fileFolder, '*.dcm'));
fileNames = {files.name};
%examine file header (metadata , from dicom stack)
info = dicominfo(fullfile(fileFolder,fileNames{1}))
%extract size info from metadata
voxelsize = [info.PixelSpacing;info.SliceThickness];
%read one file to get size
I = dicomread(fullfile(fileFolder,fileNames{1}))
classI = class(I);
sizeI = size(I);
numImages = length(fileNames);
%read slice images populate 3d matrix
hWaitBar = waitbar(0,'reading dicom files');
%create array
mri = zeros(info.Rows, info.Columns, numImages, classI);
for i=length(fileNames):-1:1
fname = fullfile(fileFolder, fileNames{i});
mri(:,:,i) = uint16(dicomread(fname));
waitbar((length(fileNames)-i+1)/length(fileNames))
end
delete(hWaitBar);
Below are two options that will do it. The error means that fileNames is empty, i.e. dir found no .dcm files in the folder. In the first option you wrap the whole approach in a loop; since length(fileNames) is zero when there is no .dcm file, the loop body is never executed. The second option tests whether files is empty and skips the processing if it is.
clear all;
close all;
clc;
fileFolder = fullfile(pwd, 'series 8');
files = dir ( fullfile (fileFolder, '*.dcm'));
fileNames = {files.name};
%examine file header (metadata , from dicom stack)
for i = 1:length(fileNames)
if (i == 1)
info = dicominfo(fullfile(fileFolder,fileNames{i}));
%extract size info from metadata
voxelsize = [info.PixelSpacing;info.SliceThickness];
%read one file to get size
I = dicomread(fullfile(fileFolder,fileNames{i}));
classI = class(I);
sizeI = size(I);
numImages = length(fileNames);
%read slice images populate 3d matrix
hWaitBar = waitbar(0,'reading dicom files');
%create array
mri = zeros(info.Rows, info.Columns, numImages, classI);
%store the first slice, which has already been read
mri(:,:,i) = I;
else
fname = fullfile(fileFolder, fileNames{i});
mri(:,:,i) = uint16(dicomread(fname));
end
waitbar(i/numImages, hWaitBar)
end
%the wait bar only exists if at least one file was found
if exist('hWaitBar','var'), delete(hWaitBar); end
The second option:
clear all;
close all;
clc;
fileFolder = fullfile(pwd, 'series 8');
files = dir ( fullfile (fileFolder, '*.dcm'));
fileNames = {files.name};
%examine file header (metadata , from dicom stack)
if ~isempty(files)
info = dicominfo(fullfile(fileFolder,fileNames{1}))
%extract size info from metadata
voxelsize = [info.PixelSpacing;info.SliceThickness];
%read one file to get size
I = dicomread(fullfile(fileFolder,fileNames{1}))
classI = class(I);
sizeI = size(I);
numImages = length(fileNames);
%read slice images populate 3d matrix
hWaitBar = waitbar(0,'reading dicom files');
%create array
mri = zeros(info.Rows, info.Columns, numImages, classI);
for i=length(fileNames):-1:1
fname = fullfile(fileFolder, fileNames{i});
mri(:,:,i) = uint16(dicomread(fname));
waitbar((length(fileNames)-i+1)/length(fileNames))
end
delete(hWaitBar);
end
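The same guard can be expressed in any language: list the matching files first, and only touch the first element once the list is known to be non-empty. A minimal sketch in Python (not MATLAB), where the helper name first_dicom and the temporary demo folder are made up for illustration:

```python
from pathlib import Path
import tempfile

def first_dicom(folder):
    """Return the first *.dcm path in folder, or None when there is none."""
    names = sorted(Path(folder).glob("*.dcm"))
    if not names:  # empty listing: indexing names[0] would raise an error
        return None
    return names[0]

# Demo with a temporary folder standing in for 'series 8'
with tempfile.TemporaryDirectory() as tmp:
    print(first_dicom(tmp))               # folder has no .dcm files yet
    (Path(tmp) / "slice1.dcm").touch()    # create an empty placeholder file
    print(first_dicom(tmp).name)
```

If the first call reports that nothing was found, the folder path or the file extension is the thing to check before looking at the DICOM code itself.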
I am creating a Python script that can read scanned and tabular PDFs, extract some important data, and write it to JSON so that it can later be loaded into a SQL database (I will also be developing the DB as a project for learning MongoDB).
Basically, my issue is that I have never worked with JSON files before, but that was the format I was recommended to output to. The scraping script works; the pre-processing could be a lot cleaner, but for now it works. The issue I run into is that the keys and values are in the same list, and some of the values are split into two list items because they contained a decimal point.
I don't really know where to start. Since I know the indexes of the list, I could assign keys and values by position, but then the script would not work for any other PDF; that is, the positions cannot be hard-coded.
import re
import PyPDF2 as pdf2
import textract

filename = "TestSpec.pdf"
with open(filename, 'rb') as pdfFileObj:
    pdfReader = pdf2.PdfFileReader(pdfFileObj)
    num_pages = pdfReader.numPages
    count = 0
    text = ""
    # Read every page, not just the first one
    while count < num_pages:
        pageObj = pdfReader.getPage(count)
        count += 1
        text += pageObj.extractText()
# Fall back to OCR when the PDF has no extractable text layer
if text == "":
    text = textract.process(filename, method="tesseract", language="eng")

def cleanText(x):
    '''
    This function takes the byte data extracted from scanned PDFs, and cleans it of all
    unnecessary data.
    Requires re
    '''
    stringedText = str(x)
    cleanText = stringedText.replace('\n','')
    splitText = re.split(r'\W+', cleanText)
    caseingText = [word.lower() for word in splitText]
    cleanOne = [word for word in caseingText if word != 'n']
    dexStop = cleanOne.index("od260")
    dexStart = cleanOne.index("sheet")
    clean = cleanOne[dexStart + 1:dexStop]
    return clean

cleanText = cleanText(text)
This is the current output
['n21', 'feb', '2019', 'nsequence', 'lacz', 'rp', 'n5', 'gat', 'ctc', 'tac', 'cat', 'ggc', 'gca', 'cat', 'ttc', 'ccc', 'gaa', 'aag', 'tgc', '3', 'norder', 'no', '15775199', 'nref', 'no', '207335463', 'n25', 'nmole', 'dna', 'oligo', '36', 'bases', 'nproperties', 'amount', 'of', 'oligo', 'shipped', 'to', 'ntm', '50mm', 'nacl', '66', '8', 'xc2', 'xb0c', '11', '0', '32', '6', 'david', 'cook', 'ngc', 'content', '52', '8', 'd260', 'mmoles', 'kansas', 'state', 'university', 'biotechno', 'nmolecular', 'weight', '10', '965', '1', 'nnmoles']
We want the output as JSON, set up like
{"Date | 21feb2019", "Sequence ID: | lacz-rp", "Sequence 5'-3' | gat..."}
and so on. I'm just not sure how to do that.
Here is a screenshot of the data from my sample PDF.
So, I have figured out some of this. I am still having issues grabbing the last third of the data I need without explicitly programming it in, but here is what I have so far. Once I have everything working, I will worry about optimizing and condensing it.
# for PDF reading
import PyPDF2 as pdf2
import textract
# for data preprocessing
import re
from dateutil.parser import parse
# For generating the JSON file array
import json
# This finds and opens the pdf file, reads the data, and extracts the data.
filename = "*.pdf"
pdfFileObj = open(filename, 'rb')
pdfReader = pdf2.PdfFileReader(pdfFileObj)
text = ""
pageObj = pdfReader.getPage(0)
text += pageObj.extractText()
# checks if extracted data is in string form or picture, if picture textract reads data.
# it then closes the pdf file
if text == "":
    text = textract.process(filename, method="tesseract", language="eng")
pdfFileObj.close()
# Converts text to string from byte data for preprocessing
stringedText = str(text)
# Removed escaped lines and replaced them with actual new lines.
formattedText = stringedText.replace('\\n', '\n').lower()
# Slices the long string into a workable piece (only contains useful data)
slice1 = formattedText[(formattedText.index("sheet") + 10): (formattedText.index("secondary") - 2)]
clean = re.sub('\n', " ", slice1)
clean2 = re.sub(' +', ' ', clean)
# Creating the PrimerData dictionary
with open("PrimerData.json",'w') as file:
    primerDataSlice = clean[clean.index("molecular"): -1]
    primerData = re.split(": |\n", primerDataSlice)
    primerKeys = primerData[0::2]
    primerValues = primerData[1::2]
    primerDict = {"Primer Data": dict(zip(primerKeys,primerValues))}
    # Generating the JSON array "Primer Data"
    primerJSON = json.dumps(primerDict, ensure_ascii=False)
    file.write(primerJSON)
# Grabbing the date (this has just the date, so json will have to add date.)
date = re.findall(r'(\d{2}[/\- ](\d{2}|january|jan|february|feb|march|mar|april|apr|may|june|jun|july|jul|august|aug|september|sep|october|oct|november|nov|december|dec)[/\- ]\d{2,4})', clean2)
Without input data it is difficult to give you working code. A minimal working example with input would help. As for JSON handling, python dictionaries can dump to json easily. See examples here.
https://docs.python-guide.org/scenarios/json/
Get a JSON string from a dictionary and write it to a file. What remains is to figure out how to parse the text into a dictionary.
import json
d = {"Date" : "21feb2019", "Sequence ID" : "lacz-rp", "Sequence 5'-3'" : "gat"}
json_data = json.dumps(d)
print(json_data)
# Write that data to a file ("primer.json" is just a placeholder name)
with open("primer.json", "w") as f:
    f.write(json_data)
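The parsing step that this answer leaves open might look something like the sketch below. It assumes the cleaned text arrives as one "label: value" pair per line, which may not match the real PDF output; the sample text and the helper name parse_pairs are hypothetical.

```python
import json

def parse_pairs(text):
    """Turn 'key: value' lines into a dict, skipping lines without a colon."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # only keep lines that actually contain a colon
            result[key.strip()] = value.strip()
    return result

# Hypothetical sample input; the real extracted text will look different.
sample = "Date: 21feb2019\nSequence ID: lacz-rp\nGC content: 52.8\nstray line"
record = parse_pairs(sample)
json_text = json.dumps(record, indent=2)
print(json_text)
```

Splitting on the first colon of each line keeps a value such as 52.8 in one piece, which avoids the problem of decimal numbers being broken into two list items.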
So, I did figure this out. The problem was that pulling all the data into a single list during pre-processing wasn't a great idea, considering that the keys for the dictionary never change.
Here is the semi-finished result for making the dictionary and JSON file.
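# Imports assumed by this excerpt (not shown in the original post;
# generic_dna requires Biopython older than 1.78)
import re
import json
from datetime import date
from Bio.Seq import Seq
from Bio.Alphabet import generic_dna
from Bio.SeqUtils import MeltingTemp as mt, molecular_weight as mw
# Collect the sequence name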
name = clean2[clean2.index("Sequence") + 11: clean2.index("Sequence") + 19]
# Collecting Shipment info
ordered = input("Who placed this order? ")
received = input("Who is receiving this order? ")
dateOrder = re.findall(
r"(\d{2}[/\- ](\d{2}|January|Jan|February|Feb|March|Mar|April|Apr|May|June|Jun|July|Jul|August|Aug|September|Sep|October|Oct|November|Nov|December|Dec)[/\- ]\d{2,4})",
clean2)
dateReceived = date.today()
refNo = clean2[clean2.index("ref.No. ") + 8: clean2.index("ref.No.") + 17]
orderNo = clean2[clean2.index("Order No.") +
10: clean2.index("Order No.") + 18]
# Finding and grabbing the sequence data. Storing it and then finding the
# GC content and melting temp or TM
bases = int(clean2[clean2.index("bases") - 3:clean2.index("bases") - 1])
seqList = [line for line in clean2 if re.match(r'^[AGCT]+$', line)]
sequence = "".join(i for i in seqList[:bases])
def gc_content(x):
    # Percentage of G and C bases in the sequence x
    count = 0
    for i in x:
        if i == 'G' or i == 'C':
            count += 1
    # Divide by the length of x itself rather than the global "bases"
    return round((count / len(x)) * 100, 1)
gc = gc_content(sequence)
tm = mt.Tm_GC(sequence, Na=50)
moleWeight = round(mw(Seq(sequence, generic_dna)), 2)
dilWeight = float(clean2[clean2.index("ug/OD260:") +
10: clean2.index("ug/OD260:") + 14])
dilution = dilWeight * 10
primerDict = {
    "Primer Data": {
        "Sequence": sequence,
        "Bases": bases,
        "TM (50mM NaCl)": tm,
        "% GC content": gc,
        "Molecular weight": moleWeight,
        "ug/OD260": dilWeight,
        "Dilution volume (uL)": dilution
    },
    "Shipment Info": {
        "Ref. No.": refNo,
        "Order No.": orderNo,
        "Ordered by": ordered,
        "Date of Order": dateOrder,
        "Received By": received,
        "Date Received": str(dateReceived.strftime("%d-%b-%Y"))
    }}
# Generating the JSON array "Primer Data"
with open("".join(name) + ".json", 'w') as file:
    primerJSON = json.dumps(primerDict, ensure_ascii=False)
    file.write(primerJSON)
The source layer is layer and the output layer is output. The script updates the source layer with the new fields and their tallies, along with the output layer. I've tried deleting the fields from layer at the end; setting fc as a different output, copying fc to output at the end, and then deleting the fields from fc/layer; and copying the source layer right off the bat (conceptually this makes the most sense to me... maybe I did it wrong). No dice.
Any ideas that would preserve the source layer as is but get this script to run and tally on the output? Thanks for any input!!
#workspace
arcpy.env.workspace = wspace = arcpy.GetParameterAsText(0)
#buildings
layer = arcpy.GetParameterAsText(1)
#trees
trees = arcpy.GetParameterAsText(2)
#buffer building to search
buffer = arcpy.GetParameterAsText(3)
#tree field interested in - tree condition, tree location, or tree pit
tf = arcpy.GetParameterAsText(4)
#output file
output = arcpy.GetParameterAsText(5)
#make feature layers to reference
treelayer = arcpy.MakeFeatureLayer_management(trees, trees + ".shp")
fc = arcpy.MakeFeatureLayer_management(layer, output)
pit = ["Sidewalk", "Continuous", "Lawn"]
if tf == "Tree Pit":
    for a in pit:
        arcpy.AddField_management(fc, a, "SHORT")
    with arcpy.da.SearchCursor(fc, ["OBJECTID"]) as fcrows:
        for a in fcrows:
            arcpy.SelectLayerByAttribute_management(fc, "NEW_SELECTION", "OBJECTID={}".format(a[0]))
            arcpy.SelectLayerByLocation_management(treelayer, "WITHIN_A_DISTANCE", fc, buffer, "NEW_SELECTION")
            tlrows = arcpy.da.SearchCursor(treelayer, "SITE")
            list1 = []
            for tlrow in tlrows:
                list1.append(int(tlrow[0]))
            fcrows1 = arcpy.da.UpdateCursor(fc, pit)
            for fcrow1 in fcrows1:
                if list1.count(1) > 0:
                    fcrow1[0] = list1.count(1)
                else:
                    fcrow1[0] = 0
                if list1.count(2) > 0:
                    fcrow1[1] = list1.count(2)
                else:
                    fcrow1[1] = 0
                if list1.count(3) > 0:
                    fcrow1[2] = list1.count(3)
                else:
                    fcrow1[2] = 0
                fcrows1.updateRow(fcrow1)
You don't want a variable equal to the function -- just make the feature layer.
arcpy.MakeFeatureLayer_management(layer, output)
Then, subsequent steps should affect only the output layer and ignore the source layer, e.g.:
for a in pit:
    arcpy.AddField_management(output, a, "SHORT")
with arcpy.da.SearchCursor(output, ["OBJECTID"]) as fcrows:
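Separately from the layer question, the per-building tally itself (how many nearby trees have SITE code 1, 2, or 3) can be sketched without arcpy. The mapping of codes 1, 2, 3 to the Sidewalk, Continuous, and Lawn fields follows the order of the pit list in the question; the sample data and the helper name tally_sites are made up for illustration.

```python
from collections import Counter

PIT_FIELDS = ["Sidewalk", "Continuous", "Lawn"]  # field order from the question

def tally_sites(site_codes):
    """Return the counts of SITE codes 1, 2, 3 in PIT_FIELDS order."""
    counts = Counter(site_codes)
    # A Counter returns 0 for missing codes, so no explicit "else 0" is needed
    return [counts[code] for code in (1, 2, 3)]

# Hypothetical SITE values of the trees selected around one building
nearby = [1, 3, 3, 1, 1]
row = tally_sites(nearby)
print(dict(zip(PIT_FIELDS, row)))
```

The returned list has the same shape as the row that the update cursor writes, so in the script it could be assigned to the cursor row in one step instead of three if/else pairs.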
I have yet to find a complete example for using the mkfifo() function online. I am able to make the fifo like this:
mkfifo("file",777)
But when I fopen() this file, Octave just hangs. What is the proper way to create, queue, and dequeue bytes from a mkfifo object?
I would like to create an in-memory fifo in Octave (on-disk is fine too) and read and write it from the same Octave script. My project is running in real time, and so I need a buffer so that I can fill and drain from the same Octave script. I've searched for a fifo library with zero results. Even just creating a vector and pushing and popping will suit my needs. I tried this myself, but I'm running into object oriented programming design problems because Octave does not allow pass by reference or pointers.
There are two issues. First, mkfifo expects the mode as a base-10 integer; if you write 777 you are thinking in octal (base 8), but Octave reads it as decimal. Second, mkfifo applies the umask, so the resulting permissions are (mode & ~umask) (see man 3 mkfifo).
For example:
fn=tempname
[ERR, MSG] = mkfifo (fn, base2dec("744", 8))
stat(fn)
fn = /tmp/oct-83UCBR
ERR = 0
MSG =
ans =
scalar structure containing the fields:
dev = 2053
ino = 3408172
mode = 4580
modestr = prwxr--r--
nlink = 1
uid = 1000
gid = 1000
rdev = 0
size = 0
atime = 1.4311e+09
mtime = 1.4311e+09
ctime = 1.4311e+09
blksize = 4096
blocks = 0
As you can see the modestr is now prwxr--r-- as you would expect from octal 744
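The octal/decimal mix-up and the umask step can be checked with a few lines of arithmetic. The sketch below uses Python purely as a calculator (it is not Octave code); the umask value 0o022 is an assumed, typical default and is not taken from the session above.

```python
import stat

# 777 written in decimal is not the octal permission set 777
print(oct(777))          # 0o1411
print(int("744", 8))     # 484, the value base2dec("744", 8) evaluates to

# mkfifo applies (mode & ~umask); 0o022 is an assumed typical umask
umask = 0o022
effective = 0o744 & ~umask
print(oct(effective))    # 0o744, since 744 grants no group/other write bits

# The 'mode' field reported by stat() above was 4580 (decimal)
mode = 4580
print(oct(mode))             # 0o10744: FIFO type bits plus permissions 744
print(stat.S_ISFIFO(mode))   # True
print(stat.filemode(mode))   # prwxr--r--
```

With the same assumed umask, a requested mode of octal 777 would come out as octal 755, which is why the permissions you see can differ from the ones you asked for.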
Now you can open one end of the FIFO:
fid = fopen (fn, "r")
Of course this blocks until the other end of the fifo gets connected.
The fifo class works, but only up to a certain size. The max size in bytes of a fifo can be found by running:
cat /proc/sys/fs/pipe-max-size
1048576
Below is the code that I wrote for an in-memory fifo. It's fairly crude but it works well:
1; % Prevent Octave from thinking that this is a function file
global fifoCount fifoSamples fifoFiles fifoFids fifoDataType
fifoSamples = zeros(0);
fifoCount = 0;
fifoFiles = cell(1);
fifoFids = zeros(0);
fifoDataType = 'single';
fifoDataTypeSize = 4;
fifoMaxBytes = 1048576; % this is operating system enforced, changing here will not help
function [] = o_fifo_write(index, data)
global fifoCount fifoSamples fifoFiles fifoFids fifoDataType
wrcount = fwrite(fifoFids(index), data, fifoDataType);
[sz,~] = size(data);
fifoSamples(index) = fifoSamples(index) + sz;
if( sz ~= wrcount )
disp(sprintf('o_fifo_write was given %d samples but wrote %d', sz, wrcount));
end
if( ~iscolumn(data) )
disp('data must be columnar in o_fifo_write');
end
end
function [data] = o_fifo_read(index, count)
global fifoCount fifoSamples fifoFiles fifoFids fifoDataType
[data, rdcount] = fread(fifoFids(index), count, fifoDataType);
[sz,~] = size(data);
fifoSamples(index) = fifoSamples(index) - sz;
if( sz ~= rdcount || sz ~= count )
disp(sprintf('in o_fifo_read %d %d %d should all be the same', sz, rdcount, count));
end
end
function [avail] = o_fifo_avail(index)
global fifoCount fifoSamples fifoFiles fifoFids fifoDataType
avail = fifoSamples(index);
end
function index = o_fifo_new()
global fifoCount fifoSamples fifoFiles fifoFids fifoDataType
fifoCount = fifoCount + 1;
index = fifoCount;
fifoSamples(index) = 0;
fifoFiles{index} = tempname;
[ERR, MSG] = mkfifo(fifoFiles{index}, base2dec('744',8));
fifoFids(index) = fopen(fifoFiles{index}, 'a+');
% fcntl(fifoFids(index), F_SETFL, O_NONBLOCK); % uncomment to avoid hangs when trying to overfill fifo
end
% ---- usage -----
txfifo = o_fifo_new();
disp(o_fifo_avail(txfifo));
o_fifo_write(txfifo, [1.243 pi 2*pi 4/3*pi]');
disp(o_fifo_avail(txfifo));
disp(o_fifo_read(txfifo, 4));
disp(o_fifo_avail(txfifo));
I'm learning programming in ComputerCraft (Minecraft) and having some trouble reading some storage cells.
The function I'm working on should go through all the cells and add their storage capacities to a variable in a for loop.
This is what I have so far:
local cell1 = peripheral.wrap("tile_thermalexpansion_cell_reinforced_name_2")
local cell2 = peripheral.wrap("tile_thermalexpansion_cell_reinforced_name_3")
local cell3 = peripheral.wrap("tile_thermalexpansion_cell_reinforced_name_4")
local cell4 = peripheral.wrap("tile_thermalexpansion_cell_reinforced_name_5")
local cell5 = peripheral.wrap("tile_thermalexpansion_cell_reinforced_name_6")
local cell6 = peripheral.wrap("tile_thermalexpansion_cell_reinforced_name_7")
cells = {"cell1", "cell2", "cell3", "cell4", "cell5", "cell6"}
local totStorage
function getTotStorage(table)
for key = 1,6 do
x = table[key]
totStorage = totStorage + x.getMaxEnergyStored()
end
print(totStorage)
end
I get an error on this line
totStorage = totStorage + x.getMaxEnergyStored()
saying "Attempt to call nil".
Any suggestions?
cells = {"cell1", "cell2", "cell3", "cell4", "cell5", "cell6"}
Is shorthand for:
cells = {
-- key value
[1] = "cell1",
[2] = "cell2",
[3] = "cell3",
[4] = "cell4",
[5] = "cell5",
[6] = "cell6"
}
Assuming that your getTotStorage call is similar to below (you didn't post it)
getTotStorage(cells)
In the loop you try to call a method on x which is a string value:
for key = 1,6 do
x = table[key]
totStorage = totStorage + x.getMaxEnergyStored()
end
This is essentially trying to do:
totStorage = totStorage + ("cell1").getMaxEnergyStored()
What you could do is rearrange your code so that the values are the objects returned by peripheral.wrap:
local cells = {
peripheral.wrap("tile_thermalexpansion_cell_reinforced_name_2"),
peripheral.wrap("tile_thermalexpansion_cell_reinforced_name_3"),
peripheral.wrap("tile_thermalexpansion_cell_reinforced_name_4"),
peripheral.wrap("tile_thermalexpansion_cell_reinforced_name_5"),
peripheral.wrap("tile_thermalexpansion_cell_reinforced_name_6"),
peripheral.wrap("tile_thermalexpansion_cell_reinforced_name_7"),
}
function getTotStorage(t)
local totStorage = 0
for i,v in ipairs(t) do
totStorage = totStorage + v.getMaxEnergyStored()
end
print(totStorage)
end
A few observations:
Use local when you can (e.g. for totStorage) to limit the scope of a variable; there is no need for a loop counter to be a global.
Don't give variables names that collide with the standard Lua library (e.g. string, table, math).
ipairs is a better way to loop through a sequence.
Is it possible to insert or save an image to a database table using MATLAB?
Here's my code:
%Code for Database Login
conn = database('vlmsystem','admin','vlog');
indata = imread('C:\Users\Sony Vaio\Documents\Task\0.1 Systems\System 1 - edited\Appendix\images database\auto1.jpg');
a = getframe(h);
indata = a.cdata;
hgsave(h, 'tempfile.fig')
fid = fopen('tempfile.fig', 'r')
indata = fread(fid, inf, '*uint8')
fclose(fid)
s = size(indata);
bdata = reshape(indata,[],1);
x = conn.Handle
StatementObject = x.preparestatement(insertcommand);
StatementObject.setObject(1,bdata)
StatementObject.execute
close(StatementObject)
dbpath = 'C:\Users\Sony Vaio\Documents\Task\0.1 Systems\System 1 - edited\Appendix\vlogdbase.mdb';
tableName = 'vehicleLog';
colnames = {'date_time','plate_number','login_logout','physical_feature'}
colnames1 = {'date_time'}
colnames2 = {'plate_number'}
colnames3 = {'login_logout'}
colnames4 = {'physical_feature'}
dat = datestr(now);
pltno = (f);
lilo = 'login';
physf = {bdata}
coldata = {dat,pltno,lilo,}
insert(conn,tableName,colnames,coldata);
close(conn);
And I am getting this error.
Error using graphicsversion
Input was not a valid graphics object
Error in getframe (line 50)
usingMATLABClasses = ~graphicsversion(parentFig, 'handlegraphics');
Error in licenseplate>StartKnop_Callback (line 248)
a = getframe(h);
Tried copying this solution but I can't seem to make it work. Here's the link.
EDIT: I fixed the code, but how do I insert binary data into the database?
There's no binary option in the database, and the result won't go into the table.
%Code for Database Login
conn = database('vlmsystem','admin','vlog');
indata = imread('C:\Users\Sony Vaio\Documents\Task\0.1 Systems\System 1 - edited\Appendix\images database\auto1.jpg');
s = size(indata);
bdata = reshape(indata,[],1);
dbpath = 'C:\Users\Sony Vaio\Documents\Task\0.1 Systems\System 1 - edited\Appendix\vlogdbase.mdb';
tableName = 'vehicleLog';
colnames = {'date_time','plate_number','login_logout','physical_feature'}
colnames1 = {'date_time'}
colnames2 = {'plate_number'}
colnames3 = {'login_logout'}
colnames4 = {'physical_feature'}
dat = datestr(now);
pltno = (f);
lilo = 'login';
physf = {bdata}
coldata = {dat,pltno,lilo,physf}
insert(conn,tableName,colnames,coldata);
close(conn);
Please read what you are copying.
The solution says:
Alternatively, if you have a figure and want to save a snapshot of it, use the command below:
You copied both blocks: one that reads files, and one that uses getframe to read a frame from a figure handle.