How can I cluster 5-dimensional data using HDBSCAN?

I am trying to cluster the NTU-RGB+D 120 skeleton dataset using HDBSCAN. The numpy array of the skeleton data has 5 dimensions:
**dataset.shape=[40091, 3, 300, 25, 2]**
where number of samples = 40091, coordinates = 3 (x, y, z), number of frames = 300, number of joints = 25, and number of bodies in the video = 2.
When I try to cluster it using hdbscan, fit raises an error saying it only accepts 2D data. How can I do this with 5 dimensions? I am completely new to working with skeleton data and clustering.

I have changed the shape of the data using view(), so that it now contains samples * (frames*joints), to reduce the dimensions.
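For reference, a minimal sketch of that flattening with NumPy (the min_cluster_size value is a placeholder, not taken from the original post):

import numpy as np
import hdbscan

# dataset.shape == (40091, 3, 300, 25, 2)
# HDBSCAN, like scikit-learn estimators, expects a 2-D (n_samples, n_features)
# matrix, so flatten every axis except the sample axis.
X = dataset.reshape(dataset.shape[0], -1)  # -> (40091, 3 * 300 * 25 * 2) = (40091, 45000)

clusterer = hdbscan.HDBSCAN(min_cluster_size=15)  # placeholder parameter
labels = clusterer.fit_predict(X)                 # label -1 marks noise points

With 45,000 features per sample, distance computations suffer badly from the curse of dimensionality, so some dimensionality reduction (e.g. PCA, or averaging over frames) before clustering is usually advisable.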

Related

SHAP multiclass summary plot for Deep Explainer

I want to use a SHAP summary plot for a multiclass classification problem using Deep Explainer. I have 3 classes, and for shap_values I get a list of 3 arrays, each of size (1000, 1, 24). With each array representing a class, I can get the summary plot for an individual class:
import shap
background = train_x[np.random.choice(train_x.shape[0], 1000, replace=False)]
explainer = shap.DeepExplainer(model, background)
back = test_x[np.random.choice(test_x.shape[0], 1000, replace=False)]
shap_values = explainer.shap_values(back)
shap.summary_plot(shap_values[0][:,0,:], plot_type = 'bar', feature_names = features)
But when I try to plot all three classes on a single summary plot with this code:
shap.summary_plot(shap_values, back, plot_type="bar", feature_names=features)
it gives me the following error:
IndexError: index 12 is out of bounds for axis 0 with size 1
How do I plot all 3 classes on a single summary plot?
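One likely cause (an assumption based on the shapes above, since the model itself is not shown) is the singleton middle axis: summary_plot expects each per-class array to be 2-D with shape (n_samples, n_features). Squeezing that axis out before plotting may resolve the IndexError:

import numpy as np
import shap

# Squeeze each (1000, 1, 24) class array down to (1000, 24).
shap_values_2d = [np.squeeze(sv, axis=1) for sv in shap_values]

# A list of 2-D arrays yields the stacked multiclass bar plot.
shap.summary_plot(shap_values_2d, plot_type="bar", feature_names=features)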

Training loss is NaN using image segmentation on TPU with TFRecords

I am a beginner trying to work with TPUs using TensorFlow in Kaggle Kernels. I previously trained a U-Net model on a dataset using a GPU, and now I am trying to implement it on a TPU. I made a TFRecord out of the dataset images and masks, and the TFRecord returns an image and a mask. When I try to train on the TPU, the loss is always NaN, even though the accuracy metric is normal. Since this is the same model and loss I used on the GPU, I am guessing the problem is in the TFRecord or in loading the dataset.
The code for loading the data is given below:
def decode_image(image_data):
    image = tf.image.decode_jpeg(image_data, channels=3)
    image = tf.cast(image, tf.float32) / 255.0   # convert image to floats in [0, 1] range
    image = tf.reshape(image, [*IMAGE_SIZE, 3])  # explicit size needed for TPU
    return image

def decode_image_mask(image_data):
    image = tf.image.decode_jpeg(image_data, channels=3)
    image = tf.cast(image, tf.float64) / 255.0   # convert image to floats in [0, 1] range
    image = tf.reshape(image, [*IMAGE_SIZE, 3])  # explicit size needed for TPU
    image = tf.image.rgb_to_grayscale(image)
    image = tf.math.round(image)
    return image

def read_tfrecord(example):
    TFREC_FORMAT = {
        "image": tf.io.FixedLenFeature([], tf.string),  # tf.string means bytestring
        "mask": tf.io.FixedLenFeature([], tf.string),   # shape [] means single element
    }
    example = tf.io.parse_single_example(example, TFREC_FORMAT)
    image = decode_image(example['image'])
    mask = decode_image_mask(example['mask'])
    return image, mask

def load_dataset(filenames, ordered=False):
    # Read from TFRecords. For optimal performance, read from multiple files at once,
    # disregarding data order. Order does not matter since we will be shuffling the data anyway.
    ignore_order = tf.data.Options()
    if not ordered:
        ignore_order.experimental_deterministic = False  # disable order, increase speed
    dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTO)  # automatically interleaves reads from multiple files
    dataset = dataset.with_options(ignore_order)  # uses data as soon as it streams in, rather than in its original order
    dataset = dataset.map(read_tfrecord, num_parallel_calls=AUTO)
    return dataset

def get_training_dataset():
    dataset = load_dataset(TRAINING_FILENAMES)
    dataset = dataset.repeat()  # the training dataset must repeat for several epochs
    dataset = dataset.shuffle(2048)
    dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
    dataset = dataset.prefetch(AUTO)  # prefetch next batch while training (autotune prefetch buffer size)
    return dataset

def get_validation_dataset(ordered=False):
    dataset = load_dataset(VALIDATION_FILENAMES, ordered=ordered)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.cache()
    dataset = dataset.prefetch(AUTO)  # prefetch next batch while training (autotune prefetch buffer size)
    return dataset

def count_data_items(filenames):
    # the number of data items is written in the name of the .tfrec files,
    # e.g. flowers00-230.tfrec = 230 data items
    n = [int(re.compile(r"-([0-9]*)\.").search(filename).group(1)) for filename in filenames]
    return np.sum(n)
So, what am I doing wrong?
Turns out the problem was that I was unbatching the data and re-batching it to 20 to properly view the images and masks in matplotlib, and this was disrupting how the data was being fed to the model, hence the NaN loss. Making another copy of the dataset for viewing the images, while sending the original to training, solved the problem.
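In other words, something like the following sketch, where view_ds is a hypothetical separate copy used only for plotting (the names are placeholders, not from the original notebook):

train_ds = get_training_dataset()                     # goes to model.fit() unchanged
view_ds = load_dataset(TRAINING_FILENAMES).batch(20)  # separate copy, batched only for display

images, masks = next(iter(view_ds))                   # safe to inspect with matplotlib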

Invalid shape error plotting a 2D image in Python for the MNIST Sign Language dataset having pixel values as columns

I have an MNIST Sign Language dataset with pixel values as columns.
I get the following error when I try to plot an image at one of the indexes:
#Training dataset
dfr = pd.read_csv("sign_mnist_train.csv")
X_train_orig = dfr.iloc[:,1:]
Y_train_orig = dfr['label']
#Testing dataset
dfe = pd.read_csv("sign_mnist_test.csv")
X_test_orig = dfe.iloc[:,1:]
Y_test_orig = dfe['label']
#shapes of dataset
print(dfr.shape) #(27455, 785)
print(dfe.shape) #(7172, 785)
#Example of a picture
index = 1
plt.imshow(X_train_orig.iloc[index])
#TypeError: Invalid shape (784,) for image data
Looks like the image you are trying to plot is a flattened one corresponding to [B, N], where N = 28x28 = 784 pixels and B = 27455 images, i.e. your data has shape (27455, 784). This is fine if you want to feed a 784-long vector to a Linear layer, but to plot a single image you have to reshape its row back to (28, 28). You can try this:
image = X_train_orig.iloc[index]
image = np.reshape(image.values, (28, 28))
plt.imshow(image)
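Since the values are single-channel grayscale intensities, passing an explicit colormap may also help (otherwise matplotlib applies its default colormap to the 2-D array):

plt.imshow(image, cmap='gray')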

Plotting maps using OSM or other shapefiles and matplotlib for a standardized report

We are developing a standardized report for our activities. The last graph I need is one displaying the geographic area of the activities (there are close to 100 locations).
The output for these reports is PDF, letter or A4 size.
The report is a matplotlib figure, where:
fig = plt.figure(figsize=(8.5, 11))
rect0 = 0, .7, .18, .3
rect1 = .3, .7, .18, .3
rect2 = .8, .29, .2, .7
rect3 = 0, 0, .8, .4
ax1 = fig.add_axes(rect0)
ax2 = fig.add_axes(rect1)
ax3 = fig.add_axes(rect2)
ax4 = fig.add_axes(rect3)
The contents and layout for axes 1-3 are settled and work great. However, ax4 is where the map contents would (ideally) be displayed.
I was hoping to do something like this:
map1 = Basemap(llcrnrlon=6.819087, llcrnrlat=46.368452, urcrnrlon=6.963978,
               urcrnrlat=46.482906, resolution='h', projection='tmerc',
               lon_0=6.88, lat_0=46.42, ax=ax4)
map1.readshapefile('a valid shape file that works')  # <----- this is the sticking point
map1.draw(insert locator coordinates)
plt.savefig(report to be inserted to document)
plt.show()
However, I have not been successful in obtaining a shapefile that works, whether from OpenStreetMap or other GIS sources.
Nor have I identified the correct process to transform the data from OpenStreetMap,
nor the process to extract that information from the OSM/XML document or the transformed GeoJSON document.
Ideally, I would like to grab the bounding-box information from OpenStreetMap and generate the map directly.
What is the process to get a shapefile that works with the .readshapefile() call?
Or, alternatively, how do I get the resulting map into a matplotlib axes?
It might be easiest to use the cartopy.io.img_tiles module, which will automatically pull the OSM tiles for use with cartopy. Using the pre-rendered tiles avoids the trouble of handling and styling individual shapefiles/XML.
See the cartopy docs on using these tiles within cartopy.
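A minimal sketch of that approach for the ax4 slot of the report figure (the extent values are copied from the Basemap call above; the zoom level is an assumption to tune):

import cartopy.crs as ccrs
import cartopy.io.img_tiles as cimgt
import matplotlib.pyplot as plt

osm_tiles = cimgt.OSM()  # OpenStreetMap tile source

fig = plt.figure(figsize=(8.5, 11))
# The map axes must be created in the tiler's projection.
ax4 = fig.add_axes([0, 0, .8, .4], projection=osm_tiles.crs)
# Same bounding box as the Basemap call: lon_min, lon_max, lat_min, lat_max.
ax4.set_extent([6.819087, 6.963978, 46.368452, 46.482906], crs=ccrs.PlateCarree())
ax4.add_image(osm_tiles, 12)  # zoom level 12 is a guess; raise it for more detail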

Graphhopper: Cannot create location index when graph has invalid bounds

I am using graphhopper 0.8 via Maven in my Java project. I create a network with the following code:
FlagEncoder encoder = new CarFlagEncoder();
EncodingManager em = new EncodingManager(encoder);

// Creating and saving the graph
GraphBuilder gb = new GraphBuilder(em)
        .setLocation(testDir)
        .setStore(true)
        .setCHGraph(new FastestWeighting(encoder));
GraphHopperStorage graph = gb.create();

for (Node node : ALL NODES OF MY NETWORK) {
    graph.getNodeAccess().setNode(uniqueNodeId, nodeX, nodeY);
}

for (Link link : ALL LINKS OF MY NETWORK) {
    EdgeIteratorState edge = graph.edge(fromNodeId, toNodeId);
    edge.setDistance(linkLength);
    edge.setFlags(encoder.setProperties(linkSpeedInMeterPerSecond * 3.6, true, false));
}

Weighting weighting = new FastestWeighting(encoder);
PrepareContractionHierarchies pch = new PrepareContractionHierarchies(
        graph.getDirectory(), graph, graph.getGraph(CHGraph.class),
        weighting, TraversalMode.NODE_BASED);
pch.doWork();
graph.flush();

LocationIndex index = new LocationIndexTree(graph.getBaseGraph(), graph.getDirectory());
index.prepareIndex();
index.flush();
At this point, the bounding box saved in the graph shows the correct numbers, and files are written to disk, including the "location_index". However, reloading the data gives me the following error:
Exception in thread "main" java.lang.IllegalStateException: Cannot create location index when graph has invalid bounds: 1.7976931348623157E308,1.7976931348623157E308,1.7976931348623157E308,1.7976931348623157E308
at com.graphhopper.storage.index.LocationIndexTree.prepareAlgo(LocationIndexTree.java:132)
at com.graphhopper.storage.index.LocationIndexTree.prepareIndex(LocationIndexTree.java:287)
The reading is done with the following code:
FlagEncoder encoder = new CarFlagEncoder();
EncodingManager em = new EncodingManager(encoder);
GraphBuilder gb = new GraphBuilder(em)
        .setLocation(testDir)
        .setStore(true)
        .setCHGraph(new FastestWeighting(encoder));

// Load and use the graph
GraphHopperStorage graph = gb.load();

// Load the index
LocationIndex index = new LocationIndexTree(graph.getBaseGraph(), graph.getDirectory());
if (!index.loadExisting()) {
    index.prepareIndex();
}
So LocationIndexTree.loadExisting runs fine until it enters prepareAlgo. At this point, the graph is loaded, but the bounding box is not set and is kept at its defaults. Reading the location index does not update the bounding box, hence the error downstream. What am I doing wrong? How do I preserve the bounding box in the first place? How do I reconstruct the bbox?
TL;DR: Don't use Cartesian coordinates; stick to the WGS84 coordinates used by OSM.
A Cartesian coordinate system such as EPSG:25832 may have coordinates in the range of millions, and after some arithmetic they can grow even larger. GraphHopper ultimately stores coordinates as scaled integers, so such values can all end up clamped to Integer.MAX_VALUE, hence the invalid bounding box.