Draw CSV file data as a heatmap using numpy and matplotlib

I was able to load my csv file into a numpy array:
data = np.genfromtxt('csv_file', dtype=None, delimiter=',')
Now I would like to generate a heatmap. I have 19 categories from 11 samples, along these lines:
COG station1 station2 station3 station4
COG0001 0.019393497 0.183122497 0.089911227 0.283250444 0.074110521
COG0002 0.044632051 0.019118032 0.034625785 0.069892277 0.034073709
COG0003 0.033066112 0 0 0 0
COG0004 0.115086472 0.098805295 0.148167492 0.040019101 0.043982814
COG0005 0.064613057 0.03924007 0.105262559 0.076839235 0.031070155
COG0006 0.079920475 0.188586049 0.123607421 0.27101229 0.274806929
COG0007 0.051727492 0.066311584 0.080655401 0.027024185 0.059156417
COG0008 0.126254841 0.108478559 0.139106704 0.056430812 0.099823028
I wanted to use matplotlib's pcolormesh, but I'm at a loss.
All the examples I could find used random number arrays.
Any help and insights would be greatly appreciated.

From what I can gather from your question, you have an 11 x 19 array whose values appear to be real numbers in the range 0 <= x <= 1 (neither assumption is critical to the answer).
Below is the code to create a heatmap of your array such that the smallest values are lighter and the larger values are darker shades of grey (eg, '0' is white, and '1' is black).
So first, create an array identical in shape and value range to yours:
import numpy as NP
M = NP.random.rand(209).reshape(11, 19)
M.shape
# returns: (11, 19)
# if the array returned from your call to 'genfromtxt'
# is not 11 x 19,
# then you need to reshape it so that it is,
# use, e.g., 'data.reshape(11, 19)'
from matplotlib import pyplot as PLT
from matplotlib import cm as CM
fig = PLT.figure()
ax1 = fig.add_subplot(111)
gray_r refers to a particular matplotlib colormap, i.e., a look-up table that maps each cell value in your 2D array to a cell color/hue (put another way, a colormap just maps a palette to data).
The _r suffix means "reversed"; I prefer this mapping because it seems more intuitive to me: white is mapped to 0 and larger values are mapped to darker shades of gray.
The available colormaps are in the module cm; call dir(matplotlib.cm) to get a list of the installed colormaps (there are dozens). The Matplotlib site has an excellent visual display of them (as a set of matplotlib plots, of course).
# select the color map by calling get_cmap and passing in a registered colormap name
# and an integer value for _lut_, which is just the number of different colors desired
# (pyplot.get_cmap is used here because cm.get_cmap was removed in Matplotlib 3.9)
cmap = PLT.get_cmap('gray_r', 10)
# map the colors/shades to your data
ax1.imshow(M, interpolation="nearest", cmap=cmap)
# plot it
PLT.show()
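Since the question asked about pcolormesh and real CSV data rather than random numbers, here is a minimal sketch under stated assumptions: the file is comma-separated, its first row holds the column names, and its first column holds the row names. The inline sample is a hypothetical three-row excerpt built from values in the question; with a real file you would pass its path to genfromtxt instead of the StringIO object.

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical comma-separated excerpt (assumed layout: header row + label column).
csv_text = """COG,station1,station2,station3
COG0001,0.019393497,0.183122497,0.089911227
COG0002,0.044632051,0.019118032,0.034625785
COG0003,0.033066112,0,0
"""

# Read everything as strings first, then split labels from numeric values.
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype=str)
col_labels = raw[0, 1:]
row_labels = raw[1:, 0]
values = raw[1:, 1:].astype(float)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(values, cmap="gray_r")  # small values light, large values dark

# pcolormesh cells span [k, k+1], so put the tick labels at the cell centers.
ax.set_xticks(np.arange(values.shape[1]) + 0.5)
ax.set_xticklabels(col_labels)
ax.set_yticks(np.arange(values.shape[0]) + 0.5)
ax.set_yticklabels(row_labels)
ax.invert_yaxis()  # first CSV row at the top, as in the file
fig.colorbar(mesh, ax=ax)
# In an interactive session, call plt.show() (without the Agg line) to view the figure.
```

Unlike imshow, pcolormesh draws the first row at the bottom by default, which is why the y-axis is inverted here.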

Related

How to read, save and display images, encoded in csv format

I've got some images for training and testing a TensorFlow model encoded in CSV format. Is there a way to extract those images and/or save them in a format like JPG?
Part of the file, opened in Excel, was shown in a screenshot. If you prefer text to hyperlinks, here is part of it as text:
label pixel1 pixel2 ...
6 149 149 ...
5 126 128 ...
10 85 88 ...
0 203 205 ...
There are 785 columns and 7173 rows in total. I have no idea how to deal with that.
You can do it like this:
# imports the snippet relies on
import pandas as pd
import tensorflow as tf
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
# first I create a dummy dataset to work on
data = make_classification(10000, n_features=784, random_state=1234)
df = pd.DataFrame(data[0], columns=[f'col_{i}' for i in range(784)])
df['label'] = data[1]
# now we create an img_vector and labels array from the dataframe
img_vector = df[[f'col_{i}' for i in range(784)]].values
labels = df['label'].values
# splitting the data (this step was missing; the 80/20 split is an assumption)
train_mat, valid_mat, train_label, valid_label = train_test_split(
    img_vector, labels, test_size=0.2, random_state=1234)
# now we create the dataset
def get_img(inputs, labels):
    # here you have 784 pixels, which usually represent a 28*28 image with 1 channel,
    # hence I reshape it that way
    img = tf.reshape(inputs, (28,28,1))
    # you can also add some augmentation
    img = tf.image.flip_left_right(img)
    img = tf.image.flip_up_down(img)
    return img, labels
# we pass the image vectors and labels to make the dataset
train_dataset = tf.data.Dataset.from_tensor_slices((train_mat, train_label))
# map the dataset to get images from it
train_dataset = train_dataset.map(get_img).batch(16)
# same for the validation dataset
valid_dataset = tf.data.Dataset.from_tensor_slices((valid_mat, valid_label))
valid_dataset = valid_dataset.map(get_img).batch(16)
# a sanity check
import matplotlib.pyplot as plt
sample = None
for i in train_dataset:
    sample = i
    break
# drop the channel axis so imshow receives a 28 x 28 array
plt.imshow(sample[0][0].numpy().squeeze())
# creating a model
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(3,3, input_shape=(28,28,1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
# finally train the model
model.fit(train_dataset,
          epochs=10,
          validation_data=valid_dataset)
Also, if you ever take a dataset from Kaggle you will usually find a sample notebook for that dataset in the code section.
You can read any row, plot it and save it as image like this:
import numpy as np
import pandas as pd
# read csv file
df = pd.read_csv("data.csv")
# read pixels
images = np.array(df.iloc[:,1:])
labels = np.array(df.iloc[:,0])
# pick a row index between 0 and 7172
index = 2
# reshape the row's 784 pixel values to 28 height x 28 width
sample_image = images[index,:].reshape(28,28)
# import plt for displaying image
from matplotlib import pyplot as plt
# plot image
plt.imshow(sample_image)
plt.axis('off')
# print its label
print(labels[index])
# save image
plt.savefig("./image{}_label{}".format(index,labels[index]))
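Note that savefig stores the whole figure (including padding), not the raw 28 x 28 pixels. If the goal is one image file per CSV row, a sketch along the following lines may be closer to it. It assumes the pixel values are 0-255 grayscale intensities and uses a small random array as a stand-in for the real table; the temporary output directory and file names are illustrative only.

```python
import os
import tempfile

import numpy as np
import matplotlib
matplotlib.use("Agg")  # no display needed just to write image files
import matplotlib.pyplot as plt

# Hypothetical stand-in for the CSV: 3 rows of [label, 784 pixel values].
rng = np.random.default_rng(0)
table = np.column_stack([np.array([6, 5, 10]),
                         rng.integers(0, 256, size=(3, 784))])

labels = table[:, 0]
# One 28 x 28 image per row; uint8 matches the assumed 0-255 intensity range.
images = table[:, 1:].reshape(-1, 28, 28).astype(np.uint8)

out_dir = tempfile.mkdtemp()  # illustrative location; use any folder you like
paths = []
for i, (img, label) in enumerate(zip(images, labels)):
    path = os.path.join(out_dir, f"image{i}_label{label}.png")
    # imsave writes the array itself, pixel for pixel, without axes or margins.
    plt.imsave(path, img, cmap="gray", vmin=0, vmax=255)
    paths.append(path)
```

With the real file, `table` would be replaced by `pd.read_csv("data.csv").to_numpy()`, keeping the label in the first column as in the answer above.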

How to set decimal places in plotly subplots hoverlabel?

I would be very happy if someone could help me with this:
I created a loop using pandas and plotly express which creates n stacked subplots from a tuple of dataframes selected by the user.
The source data has 10 decimal places, so I set
pd.set_option('precision',10)
The dataframes show adequate decimal precision, the scatter plots work, but I cannot get the hover label to show all 10 decimal places.
I tried to set
fig.update_layout(hoverlabel_namelength=-1)
but it only changes the X-Axis reference in the hoverlabel, not the Y-Axis (containing the numbers).
Can anyone help me?
Thank you very much in advance!!
Maria
Here is my source program:
#import libraries
import tkinter as tk
import tkinter.filedialog
from pathlib import Path
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np
pd.set_option('precision',10)
#select files into tuple 'datafiles' via tkinter
root = tkinter.Tk()
pathdir='/***/DSM_Exports/'
datafiles = tkinter.filedialog.askopenfilenames(parent=root,title='Choose a file', initialdir=pathdir)
datafiles = root.tk.splitlist(datafiles)
#prepare subplots template n rows, 1 column
fig = make_subplots(rows=len(datafiles), cols=1, shared_xaxes=True, vertical_spacing=0.01)
# set up loop to create subplot
for counter in range(0, len(datafiles)):  # loop over the datafiles tuple
    print(counter, datafiles[counter])
    # import file
    table = pd.read_csv(datafiles[counter], sep="\t", header=None)
    pd.set_option('expand_frame_repr', False)
    # extract DSM cumulative dose column
    numrows = table.shape[0]+1
    print('Number of rows', numrows)
    DSMcml = table[[1,2,3]]  # extract columns start time, end time and cumul dose
    # double brackets!
    DSMcml = DSMcml.iloc[1:numrows]  # cut column name
    DSMcml[2] = pd.to_datetime(DSMcml[2])  # convert end time to datetime
    DSMcml[3] = DSMcml[3].str.replace(',','.')  # change decimal comma to dot in [3]
    DSMcml[3] = DSMcml[3].astype(float, errors='raise')  # change [3] to float
    DSMcml = DSMcml[DSMcml[3]>=0].dropna()  # remove lines with values <0
    fig_Xdata = DSMcml[2]  # extract end times for X-axis
    fig_Ydata = DSMcml[3].round(10)  # extract cumul dose for Y-axis
    tracename = Path(datafiles[counter]).stem
    fig.add_trace(
        go.Scatter(x=fig_Xdata, y=fig_Ydata, mode='lines', name=tracename),
        row=counter+1, col=1)
    fig.update_layout(title_text=datafiles[counter], hovermode='x unified', hoverlabel_namelength=-1)
    fig.update_xaxes(showspikes=True, spikecolor='green', spikesnap='cursor', spikemode='across', spikedash='solid')
    counterstring = str(counter+1)  # set x-axis indicator for shared spike-line
    fig.update_traces(xaxis='x'+counterstring)  # set shared spike-line
fig.show()
You can use a hovertemplate when you add your traces:
fig.add_trace(go.Scatter(x=fig_Xdata, y=fig_Ydata,
hovertemplate='%{y:.10f}', mode='lines', name=tracename), row=counter+1, col=1)
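The `.10f` in the template is a d3-format specifier, and d3-format models its syntax on Python's format-spec mini-language, so the effect can be previewed in plain Python without building a figure. The value below is made up for illustration and is not taken from the question's files.

```python
# Made-up dose value with more than 10 decimal places (not from the question's data).
value = 0.12345678912345

# Same spec as in hovertemplate='%{y:.10f}': fixed-point, 10 digits after the point.
label = format(value, ".10f")
print(label)  # 0.1234567891
```

Changing the number in the spec (for example `.4f`) changes how many decimals the hover label shows; the underlying y-data are not rounded by it.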

Reshaping image in Pyfaster RCNN CAFFE model

I am working on a project to train the Pyfaster RCNN model using CAFFE. test.prototxt uses the below input parameters:
name: "ZF"
input: "data"
input_shape {
dim: 1
dim: 3
dim: 224
dim: 224
}
When demo.py is called, this prototxt file is used. Can someone please tell me where exactly the demo image gets reshaped to the above dimensions? I have traced it all the way back to the fast_rcnn.test.py file, which has a function called im_detect. It contains a line that I believe does the reshaping:
def im_detect(net, im, boxes=None):
    """Detect object classes in an image given object proposals.
    Arguments:
        net (caffe.Net): Fast R-CNN network to use
        im (ndarray): color image to test (in BGR order)
        boxes (ndarray): R x 4 array of object proposals or None (for RPN)
    Returns:
        scores (ndarray): R x K array of object class scores (K includes
            background as object category 0)
        boxes (ndarray): R x (4*K) array of predicted bounding boxes
    """
    blobs, im_scales = _get_blobs(im, boxes)
    # When mapping from image ROIs to feature map ROIs, there's some aliasing
    # (some distinct image ROIs get mapped to the same feature ROI).
    # Here, we identify duplicate feature ROIs, so we only compute features
    # on the unique subset.
    if cfg.DEDUP_BOXES > 0 and not cfg.TEST.HAS_RPN:
        v = np.array([1, 1e3, 1e6, 1e9, 1e12])
        hashes = np.round(blobs['rois'] * cfg.DEDUP_BOXES).dot(v)
        _, index, inv_index = np.unique(hashes, return_index=True,
                                        return_inverse=True)
        blobs['rois'] = blobs['rois'][index, :]
        boxes = boxes[index, :]
    if cfg.TEST.HAS_RPN:
        im_blob = blobs['data']
        blobs['im_info'] = np.array(
            [[im_blob.shape[2], im_blob.shape[3], im_scales[0]]],
            dtype=np.float32)
    # reshape network inputs
    net.blobs['data'].reshape(*(blobs['data'].shape))
    if cfg.TEST.HAS_RPN:
        net.blobs['im_info'].reshape(*(blobs['im_info'].shape))
    else:
        net.blobs['rois'].reshape(*(blobs['rois'].shape))
But I am still not able to figure out how can I get to the file/code where these dimensions are defined.
Any help would be appreciated.
Look at the line
# reshape network inputs
net.blobs['data'].reshape(*(blobs['data'].shape))
As you can see, the input blob 'data' is reshaped according to the input image size. Once you call forward, Caffe reshapes all subsequent blobs according to the input shape.

I don't understand the X_train Y_train I/O in scikit learn?

I have been attempting to use scikit-neuralnetwork's MLP (multi-layer perceptron) to train a model for speaker recognition.
I have a CSV file named 'mfcc_kushal.csv' that is arranged with 13 columns (for 13 features) and 1201 rows (one per frame). These are my own voice signal's features.
If I load this as an ndarray, my X_train will have 1201 samples with 13 features each. My Y_train is basically a binary output: it is either me or not.
But there is an error saying the number of samples in X and Y is not the same.
The library tells me to use 1201 classes instead of two. Am I missing something? I think I am not getting the implementation right.
My code:
import csv
import numpy as np
from sknn.mlp import Classifier, Layer
def csv_extractor(csv_file):
    with open(csv_file,'r') as dest_f:
        data_iter = csv.reader(dest_f,
                               delimiter = ',')
        data = [data for data in data_iter]
    feat = np.asarray(data, dtype = None)
    feat = feat.astype(float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent
    return feat
feat = csv_extractor('mfcc_kushal.csv')
print(feat)
print(feat.shape[0])
#NN modeling
y_train = np.arange(2)
NN = Classifier(layers =[Layer("Sigmoid", units = 100), Layer("Softmax")],
learning_rate = 0.001,
n_iter = 500 )
NN.fit(feat,y_train)
The following output shows what it says. I have also printed the ndarray and number of rows in it. The ndarray are MFCCs of a single person for 1201 different frames of the same speech signal.
Output:
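A note on the shape mismatch described above: a classifier's fit(X, y) generally expects one label per row of X, so with 1201 frames y needs 1201 entries, whereas np.arange(2) has only two. The sketch below shows only the array shapes, under loud assumptions: random numbers stand in for the MFCCs, and a hypothetical second recording (`other`) supplies the "not me" class, which the question's data does not contain.

```python
import numpy as np

# Stand-in for the 1201 x 13 MFCC matrix (random values, not real features).
feat = np.random.default_rng(0).normal(size=(1201, 13))

# Hypothetical frames from a different speaker, so that both classes are present.
other = np.random.default_rng(1).normal(size=(900, 13))

# Stack all frames, then build one label per frame: 1 = "me", 0 = "not me".
X_train = np.vstack([feat, other])
y_train = np.concatenate([np.ones(len(feat), dtype=int),
                          np.zeros(len(other), dtype=int)])

print(X_train.shape, y_train.shape)
```

The number of classes is then the number of distinct values in y_train (two here), not the number of rows.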

Make a multiline plot from .CSV file in matplotlib

I've been trying for weeks to plot 3 sets of (x, y) data on the same plot from a .CSV file, and I'm getting nowhere. My data was originally an Excel file which I have converted to a .CSV file and have used pandas to read it into IPython as per the following code:
from pandas import DataFrame, read_csv
import pandas as pd
# define data location
df = read_csv(Location)
df[['LimMag1.3', 'ExpTime1.3', 'LimMag2.0', 'ExpTime2.0', 'LimMag2.5','ExpTime2.5']][:7]
My data is in the following format:
Type mag1 time1 mag2 time2 mag3 time3
M0 8.87 41.11 8.41 41.11 8.16 65.78;
...
M6 13.95 4392.03 14.41 10395.13 14.66 25988.32
I'm trying to plot time1 vs mag1, time2 vs mag2 and time3 vs mag3, all on the same plot, but instead I get plots of time.. vs Type, eg. for the code:
df['ExpTime1.3'].plot()
I get 'ExpTime1.3' (y-axis) plotted against M0 to M6 (x-axis), when what I want is 'ExpTime1.3' vs 'LimMag1.3', with x-labels M0 - M6.
How do I get 'ExpTime..' vs 'LimMag..' plots, with all 3 sets of data on the same plot?
How do I get the M0 - M6 labels on the x-axis for the 'LimMag..' values (also on the x-axis)?
Since trying askewchan's solutions, which did not return any plots for reasons unknown, I've found that I can get a plot of ExpTime vs LimMag using df['ExpTime1.3'].plot() if I change the dataframe index (df.index) to the values of the x-axis (LimMag1.3). However, this appears to mean that I have to convert each desired x-axis to the dataframe index by manually inputting all the values of the desired x-axis. I have an awful lot of data, and this method is just too slow; I can also only plot one set of data at a time, when I need to plot all 3 series for each dataset on the one graph. Is there a way around this problem? Or can someone offer a reason, and a solution, as to why I got no plots whatsoever with the solutions offered by askewchan?
In response to nordev, I have tried the first version again, but no plots are produced, not even an empty figure. Each time I put in one of the ax.plot commands, I do get an output of the type
[<matplotlib.lines.Line2D at 0xb5187b8>], but when I enter the command plt.show() nothing happens.
When I enter plt.show() after the loop in askewchan's second solution, I get an error back saying AttributeError: 'function' object has no attribute 'show'
I have done a bit more fiddling with my original code and can now get a plot of ExpTime1.3 vs LimMag1.3 with the code df['ExpTime1.3'][:7].plot(), by making the index the same as the x-axis (LimMag1.3), but I can't get the other two sets of data on the same plot. I would appreciate any further suggestions you may have. I'm using IPython 0.11.0 via Anaconda 1.5.0 (64-bit) and Spyder on Windows 7 (64-bit); the Python version is 2.7.4.
If I have understood you correctly, both from this question as well as your previous one on the same subject, the following should be basic solutions you could customize to your needs.
Several subplots:
Note that this solution will output as many subplots as there are Spectral classes (M0, M1, ...) vertically on the same figure. If you wish to save the plot of each Spectral class in a separate figure, the code needs some modifications.
import pandas as pd
from pandas import DataFrame, read_csv
import numpy as np
import matplotlib.pyplot as plt
# Here you put your code to read the CSV-file into a DataFrame df
plt.figure(figsize=(7,5)) # Set the size of your figure, customize for more subplots
for i in range(len(df)):
    xs = np.array(df[df.columns[0::2]])[i] # Use values from odd numbered columns as x-values
    ys = np.array(df[df.columns[1::2]])[i] # Use values from even numbered columns as y-values
    plt.subplot(len(df), 1, i+1)
    plt.plot(xs, ys, marker='o') # Plot circle markers with a line connecting the points
    for j in range(len(xs)):
        plt.annotate(df.columns[0::2][j][-3:] + '"', # Annotate every plotted point with last three characters of the column-label
                     xy = (xs[j],ys[j]),
                     xytext = (0, 5),
                     textcoords = 'offset points',
                     va = 'bottom',
                     ha = 'center',
                     clip_on = True)
    plt.title('Spectral class ' + df.index[i])
    plt.xlabel('Limiting Magnitude')
    plt.ylabel('Exposure Time')
    plt.grid(alpha=0.4)
plt.tight_layout()
plt.show()
All in same Axes, grouped by rows (M0, M1, ...)
Here is another solution to get all the different Spectral classes plotted in the same Axes with a legend identifying the different classes. The plt.yscale('log') is optional, but seeing as how the values span such a great range, it is recommended.
import pandas as pd
from pandas import DataFrame, read_csv
import numpy as np
import matplotlib.pyplot as plt
# Here you put your code to read the CSV-file into a DataFrame df
for i in range(len(df)):
    xs = np.array(df[df.columns[0::2]])[i] # Use values from odd numbered columns as x-values
    ys = np.array(df[df.columns[1::2]])[i] # Use values from even numbered columns as y-values
    plt.plot(xs, ys, marker='o', label=df.index[i])
    for j in range(len(xs)):
        plt.annotate(df.columns[0::2][j][-3:] + '"', # Annotate every plotted point with last three characters of the column-label
                     xy = (xs[j],ys[j]),
                     xytext = (0, 6),
                     textcoords = 'offset points',
                     va = 'bottom',
                     ha = 'center',
                     rotation = 90,
                     clip_on = True)
plt.title('Spectral classes')
plt.xlabel('Limiting Magnitude')
plt.ylabel('Exposure Time')
plt.grid(alpha=0.4)
plt.yscale('log')
plt.legend(loc='best', title='Spectral classes')
plt.show()
All in same Axes, grouped by columns (1.3", 2.0", 2.5")
A third solution is as shown below, where the data are grouped by the series (columns 1.3", 2.0", 2.5") rather than by the Spectral class (M0, M1, ...). This example is very similar to @askewchan's solution. One difference is that the y-axis here is a logarithmic axis, making the lines pretty much parallel.
import pandas as pd
from pandas import DataFrame, read_csv
import numpy as np
import matplotlib.pyplot as plt
# Here you put your code to read the CSV-file into a DataFrame df
xs = np.array(df[df.columns[0::2]]) # Use values from odd numbered columns as x-values
ys = np.array(df[df.columns[1::2]]) # Use values from even numbered columns as y-values
for i in range(df.shape[1]//2): # Integer division, so this also runs on Python 3
    plt.plot(xs[:,i], ys[:,i], marker='o', label=df.columns[0::2][i][-3:]+'"')
    for j in range(len(xs[:,i])):
        plt.annotate(df.index[j], # Annotate every plotted point with its Spectral class
                     xy = (xs[:,i][j],ys[:,i][j]),
                     xytext = (0, -6),
                     textcoords = 'offset points',
                     va = 'top',
                     ha = 'center',
                     clip_on = True)
plt.title('Spectral classes')
plt.xlabel('Limiting Magnitude')
plt.ylabel('Exposure Time')
plt.grid(alpha=0.4)
plt.yscale('log')
plt.legend(loc='best', title='Series')
plt.show()
You can call pyplot.plot(mag, time) three different times in the same figure. It would be wise to give each line a label. Something like this:
import matplotlib.pyplot as plt
...
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(df['LimMag1.3'], df['ExpTime1.3'], label="1.3")
ax.plot(df['LimMag2.0'], df['ExpTime2.0'], label="2.0")
ax.plot(df['LimMag2.5'], df['ExpTime2.5'], label="2.5")
ax.legend()  # display the labels
plt.show()
If you want to loop it, this would work:
fig = plt.figure()
ax = fig.add_subplot(111)
for x,y in [['LimMag1.3', 'ExpTime1.3'],['LimMag2.0', 'ExpTime2.0'], ['LimMag2.5','ExpTime2.5']]:
    ax.plot(df[x], df[y], label=y)
ax.legend()  # display the labels
plt.show()
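For a self-contained check of the approaches above, the following sketch builds a small DataFrame from the two data rows visible in the question (M0 and M6) and plots all three series on one Axes. It assumes the CSV is comma-separated and that the columns are named as in the question's df[[...]] line; the inline text is a stand-in for the real file. Reading with index_col="Type" matters for the slicing-based answers, because they rely on df.index holding M0 to M6 and on the remaining columns alternating between magnitude and time.

```python
import io

import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Stand-in for the real file: the M0 and M6 rows from the question,
# with column names assumed to match the question's DataFrame.
csv_text = """Type,LimMag1.3,ExpTime1.3,LimMag2.0,ExpTime2.0,LimMag2.5,ExpTime2.5
M0,8.87,41.11,8.41,41.11,8.16,65.78
M6,13.95,4392.03,14.41,10395.13,14.66,25988.32
"""
df = pd.read_csv(io.StringIO(csv_text), index_col="Type")

fig, ax = plt.subplots()
for suffix in ["1.3", "2.0", "2.5"]:
    # x = limiting magnitude, y = exposure time, one line per series
    ax.plot(df["LimMag" + suffix], df["ExpTime" + suffix],
            marker="o", label=suffix + '"')
ax.set_xlabel("Limiting Magnitude")
ax.set_ylabel("Exposure Time")
ax.set_yscale("log")  # the exposure times span several orders of magnitude
ax.legend(title="Series")
# In an interactive session, call plt.show() (without the Agg line) to view the figure.
```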