Are there major benefits of selecting NIfTi over DICOM (or viz.) as the choice of data format? I am working on 3D Volumetric semantic segmentation. I will have to convert either format to numpy array or tensor before feeding to the network, but curious on the performance benefits of selection.
(This question risks being opinion-based, so trying to stick to facts.)
DICOM is a very powerful, flexible but complex format, and its strength is to provide interoperability between different hardware and software. However, DICOM is not particularly efficient for image processing and analysis. One potential drawback of DICOM is that a single volume is stored as a sequence of 2D slices, which can be cumbersome to deal with.
NIfTi is an improved version of the Analyze file format, which was designed to be simpler than DICOM, while still retaining all the essential metadata. And it has the added benefit of being able to store a volume in a single file, with a simple header followed by raw data. This makes it fast to load and process.
There are several other medical file formats suitable for this task. You may also wish to consider NRRD which has many features in common with NIfTi. Simple format, fast to parse and load, flexible storage encoding for 2,3,4D data. Many tools and libraries can process NRRD files too.
So given your primary need is for efficient storage and analysis, NIfTi or NRRD would be a better choice.
Related
I'm very new in deep learning, and I'm targeting to use GAN (Generative Adversarial Network) to recognize emotional speech. I've only known images being as inputs to most deep learning algorithms, such as GAN. but I'm curious as to how audio data can be an input into it, besides of using images of the spectrograms as the input. also, i'd appreciate it if you can explain it in laymen terms.
Audio data can be be represented in form of numpy arrays but before moving to that you must understand what audio really is. If you give a thought on what an audio looks like, it is nothing but a wave like format of data, where the amplitude of audio change with respect to time.
Assuming that our audio is represented in time domain, we can extract the values at every half-second(arbitrary). This is called sampling rate.
Converting the data into frequency domain can reduce the amount of computation requires as the sampling rate is less.
Now, let's load the data. We'll use a library called librosa , which can be installed using pip.
data, sampling_rate = librosa.load('audio.wav')
Now, you have both the data and the sampling rate. We can plot the waveform now.
librosa.display.waveplot(data, sr=sampling_rate)
Now, you have the audio data in form of numpy array. You can now study the features of the data and extract the ones you find interesting to train your models.
Further to Ayush’s discussion, for information on the challenges and work arounds of dealing with large amounts of data at different time scales in audio data I suggest this post on WaveNet: https://deepmind.com/blog/article/wavenet-generative-model-raw-audio
After that it sounds like you want to do classification. In that case a GAN on it’s own is not suitable. If you have plenty of data you could use a straight LSTM (or another type of RNN) which is designed to model time series, or you can take set sized chunks of input and use a 1-d CNN (similar to WaveNet). If you have lots of unlabelled data from the same or similar domain and limited training data you could use a GAN to learn to generate new samples, then use the discriminator from the GAN as pre-trained weights for a CNN classifier.
Since you are trying to perform Speech Emotion Recognition (SER) using deep learning, you can go for a recurrent architecture (LSTM or GRU) or a combination of CNN and recurrent network architecture (CRNN) instead of GANs since GANs are complicated and difficult to train.
In a CRNN, the CNN layers will extract features of varying details and complexity, whereas the recurrent layers will take care of the temporal dependencies. You can then finally use a fully connected layer for regression or classification output, depending on whether your output label is discrete (for categorical emotions like angry, sad, neutral etc) or continuous (arousal and valence space).
Regarding the choice of input, you can use either a spectrogram input (2D) or raw speech signal (1D) as input. For spectrogram input, you have to use a 2D CNN whereas for a raw speech signal you can use a 1D CNN. Mel scale spectrograms are usually preferred over linear spectrograms since our ears hear frequencies in log scale and not linearly.
I have used a CRNN architecture to estimate the level of verbal conflict arising from conversational speech. Even though it is not SER, it is a very similar task.
You can find more details in the paper
http://www.eecs.qmul.ac.uk/~andrea/papers/2019_SPL_ConflictNET_Rajan_Brutti_Cavallaro.pdf
Also, check my github code for the same paper
https://github.com/smartcameras/ConflictNET
and a SER paper whose code I reproduced in Python
https://github.com/vandana-rajan/1D-Speech-Emotion-Recognition
And finally as Ayush mentioned, Librosa is one of the best Python libraries for audio processing. You have functions to create spectrograms in Librosa.
There are a number of guides I've seen on using LSTMs for time series in tensorflow, but I am still unsure about the current best practices in terms of reading and processing data - in particular, when one is supposed to use the tf.data.Dataset API.
In my situation I have a file data.csv with my features, and would like to do the following two tasks:
Compute targets - the target at time t is the percent change of
some column at some horizon, i.e.,
labels[i] = features[i + h, -1] / features[i, -1] - 1
I would like h to be a parameter here, so I can experiment with different horizons.
Get rolling windows - for training purposes, I need to roll my features into windows of length window:
train_features[i] = features[i: i + window]
I am perfectly comfortable constructing these objects using pandas or numpy, so I'm not asking how to achieve this in general - my question is specifically what such a pipeline ought to look like in tensorflow.
Edit: I guess that I'd also like to know whether the 2 tasks I listed are suited for the dataset api, or if i'm better off using other libraries to deal with them?
First off, note that you can use dataset API with pandas or numpy arrays as described in the tutorial:
If all of your input data fit in memory, the simplest way to create a
Dataset from them is to convert them to tf.Tensor objects and use
Dataset.from_tensor_slices()
A more interesting question is whether you should organize data pipeline with session feed_dict or via Dataset methods. As already stated in the comments, Dataset API is more efficient, because the data flows directly to the device, bypassing the client. From "Performance Guide":
While feeding data using a feed_dict offers a high level of
flexibility, in most instances using feed_dict does not scale
optimally. However, in instances where only a single GPU is being used
the difference can be negligible. Using the Dataset API is still
strongly recommended. Try to avoid the following:
# feed_dict often results in suboptimal performance when using large inputs
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
But, as they say themselves, the difference may be negligible and the GPU can still be fully utilized with ordinary feed_dict input. When the training speed is not critical, there's no difference, use any pipeline you feel comfortable with. When the speed is important and you have a large training set, the Dataset API seems a better choice, especially you plan distributed computation.
The Dataset API works nicely with text data, such as CSV files, checkout this section of the dataset tutorial.
I'm experimenting with writing a game in a functional programming style, which implies representing the game state with a purely functional, immutable data structures.
One of the most important data structures would be a 3D grid representing the world, where objects can be stored at any [x,y,z] grid location. The properties I want for this data structure are:
Immutable
Fast persistent updates - i.e. creation of a new version of the entire grid with small changes is cheap and achieved through structural sharing. The grid may be large so copy-on-write is not a feasible option.
Efficient handling of sparse areas / identical values - empty / unpopulated areas should consume no resources (to allow for large open spaces). Bonus points if it is also efficient at storing large "blocks" of identical values
Unbounded - can grow in any direction as required
Fast reads / lookups - i.e. can quickly retrieve the object(s) at [x,y,z]
Fast volume queries, i.e. quick searches through a region [x1,y1,z1] -> [x2,y2,z2], ideally exploiting sparsity so that empty spaces are quickly skipped over
Any suggestions on the best data structure to use for this?
P.S. I know this may not be the most practical way to write a game, I'm just doing it as a learning experience and to stretch my abilities with FP......
I'd try an octtree. The boundary coordinates of each node are implicit in structure placement, and each nonterminal node keep 8 subtree but no data. You can thus 'unioning' to gain space.
I think that Immutable and Unbounded are (generally) conflicting requirements.
Anyway... to grow a octtree you must must replace the root.
Other requirement you pose should be met.
What are some more interesting graph data structures for working with networks? I am interested in structures which may offer some particular advantage in terms of traversing the network, finding random nodes, size in memory or for insertion/deletion/temporary hiding of nodes for example.
Note: I'm not so much interested in database like designs for addressing external memory problems.
One of my personal favorites is the link/cut tree, a data structure for partitioning a graph into a family of directed trees. This lets you solve network flow problems asymptotically faster than more traditional methods and can be used as a more powerful generalization of the union/find structure you may have heard of before.
I've heard of Skip Graphs ( http://www.google.com/search?ie=UTF-8&oe=UTF-8&sourceid=navclient&gfns=1&q=skip+graphs ), a probabilistic graph structure that is - as far as I know - already in use in some peer-to-peer applications.
These graphs are kind of self-organizing and their goal is to achieve a good connectivity and a small diameter. There is a distributed algorithm that tries to achieve such graphs: http://www14.informatik.tu-muenchen.de/personen/jacob/Publications/podc09.pdf
The graph size is in the billions of nodes, and tens of billions of vertices.
It will store webpages urls, and links between webpages and it will be used for testing ranking algorithms.
Any language is fine but java is prefered.
Solutions i found so far:
neo4j
storing in sorted flat files
Yes, i have already read Best Way to Store/Access a Directed Graph.
Update
The data can be distributed on multiple computers and does not need to be fully in-memory.
Depending on your implementation, another solution could be Terracotta. I think supports object graphs of this magnitude using a distributed virtual heap.
http://www.terracotta.org/web/display/docs/Concept+and+Architecture+Guide#ConceptandArchitectureGuide-VirtualHeap