openSMILE: unreadable CSV file while extracting prosody features from WAV file

I am extracting prosody features from an audio file using the Windows version of openSMILE. It runs successfully and an output CSV is generated, but when I open the CSV, some rows are not readable. I used this command to extract the prosody features:
SMILEXtract -C \opensmile-3.0-win-x64\config\prosody\prosodyShs.conf -I audio_sample_01.wav -O prosody_sample1.csv
And the output CSV looks like this:
(unreadable binary content)
I even tried the sample WAV file in the example audio folder of the openSMILE directory, and the output is the same (not readable). Can someone help me identify where the problem actually is, and how I can fix it?

You need to enable the csvSink component in the configuration file to make it work. The file config\prosody\prosodyShs.conf that you are using does not have this component defined and always writes binary output.
You can verify that it is the standard binary output in this way: omit the -O parameter from your command so it becomes SMILEXtract -C \opensmile-3.0-win-x64\config\prosody\prosodyShs.conf -I audio_sample_01.wav and execute it. You will get an output.htk file which is exactly the same as prosody_sample1.csv.
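If you want to verify that the two outputs are identical programmatically, here is a minimal sketch using Python's standard library (file names taken from the commands above, assuming both have been run):
import filecmp
# shallow=False compares the two files byte by byte
print(filecmp.cmp('output.htk', 'prosody_sample1.csv', shallow=False))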
How to output CSV? You can take a look at the example configuration in opensmile-3.0-win-x64\config\demo\demo1_energy.conf, where a csvSink component is defined.
You can find more information in the official documentation:
Get started page of the openSMILE documentation
The section on configuration files
Documentation for cCsvSink

This is how I solved the issue. First, I added the csvSink component to the list of component instances:
instance[csvSink].type = cCsvSink
Next, I added the configuration parameters for this instance:
[csvSink:cCsvSink]
reader.dmLevel = energy
filename = \cm[outputfile(O){output.csv}:file name of the output CSV file]
delimChar = ;
append = 0
timestamp = 1
number = 1
printHeader = 1
\{../shared/standard_data_output_lldonly.conf.inc}
Now if you run this file, it will throw errors, because reader.dmLevel = energy depends on the waveframes level. So the final changes would be:
[energy:cEnergy]
reader.dmLevel = waveframes
writer.dmLevel = energy
[int:cIntensity]
reader.dmLevel = waveframes
[framer:cFramer]
reader.dmLevel = wave
writer.dmLevel = waveframes
Further reference on how to write openSMILE configuration files can be found in the official documentation.
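Once the csvSink is in place, a quick way to sanity-check the resulting file is to load it with pandas; a minimal sketch, assuming the output.csv default and the delimChar = ; setting from the configuration above:
import pandas as pd
# The csvSink above writes a header row and uses ';' as the delimiter
df = pd.read_csv('output.csv', sep=';')
print(df.head())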

Related

file "(...).csv" not Stata file error in using merge command

I use Stata 12.
I want to add some country code identifiers from file df_all_cities.csv onto my working data.
However, this line of code:
merge 1:1 city country using "df_all_cities.csv", nogen keep(1 3)
Gives me the error:
. run "/var/folders/jg/k6r503pd64bf15kcf394w5mr0000gn/T//SD44694.000000"
file df_all_cities.csv not Stata format
r(610);
This was my attempted solution to a previous problem: the file was a .dta file that this version of Stata could not read, so I used R to convert it to .csv, but that doesn't work either. I assume it's because the merge syntax with "using" doesn't work with csv files, but how should I write it instead?
Your intuition is right. The command merge cannot read a .csv file directly. (using is technically not a command here; it is a syntax element indicating that a file path follows.)
You need to read the .csv file with the insheet command. You can use it like this:
* Preserve saves a snapshot of your data which is brought back at "restore"
preserve
* Read the csv file. clear can safely be used as data is preserved
insheet using "df_all_cities.csv", clear
* Create a tempfile where the data can be saved in .dta format
tempfile country_codes
save `country_codes'
* Bring back into working memory the snapshot saved at "preserve"
restore
* Merge your country codes from the tempfile to the data now back in working memory
merge 1:1 city country using `country_codes', nogen keep(1 3)
Note how insheet also takes using, and that this command does accept .csv files.

CSV Data Set Config not looping

I'm using v5.1.1 of JMeter and attempting to use the "CSV Data Set Config". The file is read correctly as I can tell from the Debug Sampler/Results Tree, but the file is not being read line by line. In other words, it reads the first line and never proceeds to the next line for processing.
I would like to use the data inside the CSV to iterate over a series of HTTP Requests to an external API. I currently have a single thread with only the "CSV Data Set Config" and "HTTP Request".
Do I need to wrap this with a ForEach controller or another looping construct? Perhaps I'm missing it, but I do not see anything in the documentation that would indicate it's necessary.
Thanks
You don't need to wrap this in a ForEach loop. The first line in the CSV file gives the variable names. Let's say your CSV file looks like this:
foo, bar
1, John
2, George
3, Laura
If you then use an HTTP Request sampler, ${foo} and ${bar} will be iterated through sequentially. However, please make sure you are mindful of the CSV Data Set Config options; the setup shown in the original answer's screenshot works for me.
By default, CSV Data Set Config doesn't trigger any "looping"; it reads the next line from the CSV file for each thread (virtual user) on each iteration.
So if you want to see more values from the CSV file - either add more users or loops or both.
Given this CSV file:
line1
line2
line3
and the CSV Data Set Config and Thread Group setups shown in the original answer (screenshots not reproduced here), you will get the following values (using the __threadNum() function to visualize the current virtual user number and the ${__jm__Thread Group__idx} pre-defined variable to show the current Thread Group iteration).
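For illustration (an assumed setup, since the screenshots are not available here): with a single thread, Loop Count = 4, and the default Recycle on EOF = True, each iteration reads the next line and wraps around at the end of the file:
iteration 1 -> line1
iteration 2 -> line2
iteration 3 -> line3
iteration 4 -> line1 (recycled)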
Check out the JMeter Parameterization - The Complete Guide article for more information on various approaches to parameterizing JMeter tests using external data sources.

Saving specific Excel sheet as .csv

I am trying to figure out how to save a specific Excel sheet as CSV via the command line on Linux.
I am able to save the first sheet with the command below:
libreoffice --headless --convert-to csv --outdir /tmp /tmp/test.xls
It seems that there should be a way to specify the sheet I want to save, but I am not able to find one.
Is there a way to save it via LibreOffice?
I know OP has probably moved on by now but since this was the first result in my search, I figured I'd take a stab at leaving an answer that works and is actually usable for the next googler.
First, LibreOffice still only lets you save the first sheet. If that is all you need, then try libreoffice --convert-to csv Test.ods. Interestingly, the GUI does the same thing, only letting you export the active sheet. So it's not that the terminal is ignored so much as it is a limitation in LibreOffice.
I needed to extract several sheets into separate CSV files, so "active sheet only" didn't cut it for me. After seeing that this answer only had a macro as the suggestion, I kept looking. I found a few other ways to get at the other sheets in various places after this page, but I don't recall any of them allowing you to extract a specific sheet (unless it was some random GitHub tool that I skipped over).
I liked the method of using the Gnumeric spreadsheet application because it is in most central repos and doesn't involve converting to xls / xlsx first. However, there are a few caveats to be aware of.
First, if you want to extract only one sheet without knowing the sheet name ahead of time, then this won't work. If you do know the sheet name ahead of time, or are OK with extracting all the sheets, then this works fairly well. The sheet name can be used to create the output files, so it's not completely lost, which is nice too.
Second, if you want the quoting style to match the style you'd get by manually exporting from the LibreOffice GUI, then you will need to forget the term "csv" and think in terms of "txt" until you finish the conversion (e.g. convert to .txt files, then rename them). Otherwise, if you don't care about an exact match on quoting style, this doesn't matter. I will show both ways below. If you don't know what a quoting style is: in CSV, if a cell value contains spaces or a comma, quotes are put around the value to distinguish it from the commas used to separate fields. Some programs quote everything, others quote only values with spaces and/or commas, and others don't quote at all (or only quote for commas?).
Last, there seems to be a difference in precision when converting via LibreOffice versus Gnumeric's ssconvert tool. Not enough to matter for most people in most use cases, but still worth noting. In my original ods file, I had a formula taking the average of three cells containing 58.14, 59.1, and 59.05 respectively. This average came to 58.7633333333333 when I exported via the LibreOffice GUI. With ssconvert, the same value was 58.76333333333333 (i.e. one additional decimal place compared to the LibreOffice version). I didn't really care for my purposes, but if you need to exactly match LibreOffice or don't want the extra precision, it might matter.
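You can reproduce the ssconvert-style value with a quick check; a minimal Python sketch (the exact printed digits depend on the float repr, so treat this as illustrative):
# Average of the three example cells at full double precision
print((58.14 + 59.1 + 59.05) / 3)  # prints ~58.76333333333333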
From man ssconvert, we have the following options:
-S, --export-file-per-sheet: Export a file for each sheet if the exporter only supports one sheet at a time. The output filename is treated as a template in which sheet number is substituted for %n, sheet name is substituted for %s, and sheet object name is substituted for %o in case of graph export. If there are no substitutions, a default of ".%n" is added.
-O, --export-options=optionsstring: Specify parameters for the chosen exporter. optionsstring is a list of parameter=value pairs, separated by spaces. The parameter names and values allowed are specific to the exporter and are documented below. Multiple parameters can be specified.
During my testing, the -O options were ignored if I specified the output file with a .csv extension. But if I used .txt then they worked fine.
I'm not covering them all and I'm paraphrasing so read the man page if you want more details. But some of the options you can provide in the optionsstring are as follows:
sheet: Name of the sheet. You can repeat this option for multiple sheets. In my testing, using indexes did NOT work.
separator: If you want a true comma-separated values file, then you'll need to use commas.
format: I'll be using raw because I want the unformatted values. If you need something special for dates, etc., read the man page.
quoting-mode: When to quote values. Can be always, auto, or never. If you want to mimic LibreOffice as closely as possible, choose never.
So let's get to a terminal.
# install gnumeric on fedora
$ sudo dnf install -y gnumeric
# install gnumeric on ubuntu/mint/debian
$ sudo apt-get install -y gnumeric
# use the ssconvert util from gnumeric to do the conversion
# let it do the default quoting - this will NOT match LibreOffice
# in this example, I am just exporting 1 named sheet using
# -S, --export-file-per-sheet
$ ssconvert -S -O 'sheet=mysheet2' Test.ods test_a_%s.csv
$ ls *.csv
test_a_mysheet2.csv
# same thing but more closely mimicking LibreOffice output
$ ssconvert -S -O 'sheet=mysheet2 separator=, format=raw quoting-mode=never' Test.ods test_b_%s.txt;
$ mv test_b_mysheet2.txt test_b_mysheet2.csv;
# Q: But what if I don't know the sheet names?
# A: then you'll need to export everything
# notice the 'sheet' option is removed from the
# list of -O options vs previous command
$ ssconvert -S -O 'separator=, format=raw quoting-mode=never' Test.ods test_c_%n_%s.txt;
$ ls test_c*
test_c_0_mysheet.txt test_c_3_yoursheet2.txt
test_c_1_mysheet2.txt test_c_4_yoresheet.txt
test_c_2_yoursheet.txt test_c_5_holysheet.txt
# Now to rename all those *.txt files to *.csv
$ prename 's/\.txt/\.csv/g' test_c_*.txt
$ ls test_c*
test_c_0_mysheet.csv test_c_3_yoursheet2.csv
test_c_1_mysheet2.csv test_c_4_yoresheet.csv
test_c_2_yoursheet.csv test_c_5_holysheet.csv
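If prename isn't available on your system, the same batch rename can be done with a short Python sketch (the test_c_ pattern is taken from the example above):
import glob, os
# Rename every test_c_*.txt produced by ssconvert to a .csv extension
for path in glob.glob('test_c_*.txt'):
    os.rename(path, path[:-4] + '.csv')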
Command:
soffice --headless "macro:///Library1.Module1.ConvertSheet(~/Desktop/Software/OpenOffice/examples/input/Test1.ods, Sheet2)"
Code:
Sub ConvertSheet(SpreadSheetPath as String, SheetNameSeek as String)
    REM IN SpreadSheetPath is the FULL PATH and file
    REM IN SheetNameSeek is the sheet name to be found and converted to CSV
    Dim Doc As Object
    Dim Dummy()
    SheetNameSeek = trim(SheetNameSeek)
    If (Not GlobalScope.BasicLibraries.isLibraryLoaded("Tools")) Then
        GlobalScope.BasicLibraries.LoadLibrary("Tools")
    End If
    REM Content of an opened window can be replaced with the help of the frame parameter and SearchFlags:
    SearchFlags = com.sun.star.frame.FrameSearchFlag.CREATE + _
                  com.sun.star.frame.FrameSearchFlag.ALL
    REM Set up a Propval object to store the filter properties
    Dim Propval(1) as New com.sun.star.beans.PropertyValue
    Propval(0).Name = "FilterName"
    Propval(0).Value = "Text - txt - csv (StarCalc)"
    Propval(1).Name = "FilterOptions"
    Propval(1).Value = "44,34,76,1" ' field delimiter 44 (comma), text delimiter 34 ("), charset 76 (UTF-8), export from line 1
    Url = ConvertToUrl(SpreadSheetPath)
    Doc = StarDesktop.loadComponentFromURL(Url, "MyFrame", SearchFlags, Dummy)
    FileN = FileNameoutofPath(Url)
    BaseFilename = Tools.Strings.GetFileNameWithoutExtension(FileN)
    DirLoc = DirectoryNameoutofPath(ConvertFromUrl(Url), "/") + "/"
    Sheets = Doc.Sheets
    NumSheets = Sheets.Count - 1
    For J = 0 to NumSheets
        SheetName = Sheets(J).Name
        If (SheetName = SheetNameSeek) Then
            Doc.getCurrentController.setActiveSheet(Sheets(J))
            Filename = DirLoc + BaseFilename + "." + SheetName + ".csv"
            FileURL = convertToURL(Filename)
            Doc.StoreAsURL(FileURL, Propval())
        End If
    Next J
    Doc.close(true)
End Sub
I ended up using xlsx2csv
Version 0.7.8 supports general xlsx files pretty well. It allows you to specify the tab by number and by name.
It does not do a good job on macros and complicated multi-sheet documents, but it does a very good job on regular multi-sheet xlsx documents.
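xlsx2csv can also be driven from Python rather than the command line; a minimal sketch, assuming the Xlsx2csv class interface from the project's README (the file name and sheet number are placeholders):
from xlsx2csv import Xlsx2csv
# Convert the second tab of input.xlsx; sheetid is 1-based (0 converts all sheets)
Xlsx2csv('input.xlsx', outputencoding='utf-8').convert('output.csv', sheetid=2)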
Unfortunately, xlsx2csv does not support password-protected xlsx, so for that I still have to use the Win32::OLE Perl module and run it in a Windows environment.
From what I can see, LibreOffice still does not have the ability to select the tab via the command line.

Display html report in jupyter with R

The qa() function of the ShortRead bioconductor library generates quality statistics from fastq files. The report() function then prepares a report of the various measures in an html format. A few other questions on this site have recommended using the display_html() function of IRdisplay to show html in jupyter notebooks using R (irkernel). However it only throws errors for me when trying to display an html report generated by the report() function of ShortRead.
library("ShortRead")
sample_dir <- system.file(package="ShortRead", "extdata", "E-MTAB-1147") # A sample fastq file
qa_object <- qa(sample_dir, "*fastq.gz$")
qa_report <- report(qa_object, dest="test") # Makes a "test" directory containing 'image/', 'index.html' and 'QA.css'
library("IRdisplay")
display_html(file = "test/index.html")
Gives me:
Error in read(file, size): unused argument (size)
Traceback:
1. display_html(file = "test/index.html")
2. display_raw("text/html", FALSE, data, file, isolate_full_html(list(`text/html` = data)))
3. prepare_content(isbinary, data, file)
4. read_all(file, isbinary)
Is there another way to display this report in jupyter with R?
It looks like there's a bug in the code. The quick fix is to clone the GitHub repo and make the following edit to ./IRdisplay/R/utils.r: on line 38, change the line from:
read(file,size)
to
read(size)
Save the file, switch to the parent directory, and create a new tarball, e.g.:
tar -zcf IRdisplay.tgz IRdisplay/
and then re-install your new version, e.g. after restarting R, type:
install.packages( "IRdisplay.tgz", repo=NULL )

AWS download all Spark "part-*" files and merge them into a single local file

I've run a Spark job via databricks on AWS, and by calling
big_old_rdd.saveAsTextFile("path/to/my_file.json")
have saved the results of my job into an S3 bucket on AWS. The result of that spark command is a directory path/to/my_file.json containing portions of the result:
_SUCCESS
part-00000
part-00001
part-00002
and so on. I can copy those part files to my local machine using the AWS CLI with a relatively simple command:
aws s3 cp s3://my_bucket/path/to/my_file.json local_dir --recursive
and now I've got all those part-* files locally. Then I can get a single file with
cat $(ls part-*) > result.json
The problem is that this two-stage process is cumbersome and leaves file parts all over the place. I'd like to find a single command that will download and merge the files (ideally in order). When dealing with HDFS directly this is something like hadoop fs -cat "path/to/my_file.json/*" > result.json.
I've looked around through the AWS CLI documentation but haven't found an option to merge the file parts automatically, or to cat the files. I'd be interested in either some fancy tool in the AWS API or some bash magic that will combine the above commands.
Note: Saving the result into a single file via spark is not a viable option as this requires coalescing the data to a single partition during the job. Having multiple part files on AWS is fine, if not desirable. But when I download a local copy, I'd like to merge.
This can be done with a relatively simple function using boto3, the AWS python SDK.
The solution involves listing the part-* objects in a given key, and then downloading each of them and appending to a file object. First, to list the part files in path/to/my_file.json in the bucket my_bucket:
import boto3
bucket = boto3.resource('s3').Bucket('my_bucket')
keys = [obj.key for obj in bucket.objects.filter(Prefix='path/to/my_file.json/part-')]
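S3 lists keys in ascending lexicographic order, so the zero-padded part-00000, part-00001, ... names come back already sorted; a quick sanity check on the list built above:
# Keys arrive in ascending UTF-8 binary order, so the zero-padded
# part numbers are already in the right sequence for concatenation
assert keys == sorted(keys)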
Then, use Bucket.download_fileobj() with a file opened in append mode to write each of the parts. The function I'm now using, with a few other bells and whistles, is:
from os.path import basename
import boto3
def download_parts(base_object, bucket_name, output_name=None, limit_parts=0):
    """Download all file parts into a single local file"""
    base_object = base_object.rstrip('/')
    bucket = boto3.resource('s3').Bucket(bucket_name)
    prefix = '{}/part-'.format(base_object)
    output_name = output_name or basename(base_object)
    with open(output_name, 'ab') as outfile:
        for i, obj in enumerate(bucket.objects.filter(Prefix=prefix)):
            bucket.download_fileobj(obj.key, outfile)
            if limit_parts and i + 1 >= limit_parts:
                print('Terminating download after {} parts.'.format(i + 1))
                break
        else:
            print('Download completed after {} parts.'.format(i + 1))
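A usage sketch for the function above, with the bucket and key names from the question:
# Downloads every part under path/to/my_file.json and appends them,
# in listed (lexicographic) order, into a local file named my_file.json
download_parts('path/to/my_file.json', 'my_bucket')
Note that the output file is opened in append mode ('ab'), so delete any existing local copy before re-running, or the parts will be appended a second time.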
The downloading part may take an extra line of code or two. As for cat'ing in order, you can do it according to time created or alphabetically:
Combined in order of time created: cat $(ls -rt part-*) > outputfile
Combined & Sorted alphabetically: cat $(ls part-* | sort) > outputfile
Combined & Sorted reverse-alphabetically: cat $(ls part-* | sort -r) > outputfile