"I have done the following:
DataFrame input = sqlContext.read().json("/path/t*json");
input.registerTempTable("input"); // register the DataFrame so the SQL below can refer to it as "input"
DataFrame res = sqlContext.sql("SELECT *, Some_Calculation calc FROM input GROUP BY user_id");
res.write().json("/path/result.json");
This reads all the JSON files in the directory whose names start with 't'. So far, so good. But the output of this program is not a single JSON file, as I intended; it is a directory called result.json containing as many .crc and output files as there were input files. Since my calculations are grouped by user_id, the aggregate results check out, but I get just as many output files as I have input files, and since the aggregate calculations have the same result, many of those files are identical.
First of all, how can I generate a single JSON file as the output?
Second of all, how can I get rid of the redundancy?
Thank you!
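(A hedged sketch, not from the original thread, assuming the Spark 1.x Java API: distinct() drops the identical rows, and coalescing to a single partition makes Spark write a single part file. Note the output is still a directory named result.json containing one part file, not a bare file.)

DataFrame deduped = res.distinct();   // remove identical rows across partitions
deduped.coalesce(1)                   // shrink to one partition -> one output part file
       .write().json("/path/result.json");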
I have a DataFrame loaded from a CSV file:
df = pd.read_csv(input_file, header=0)
and I want to process it and eventually save it into multiple JSON files (for example a new file every X rows).
Any suggestion how to achieve that?
This should work:
import numpy as np  # needed for np.arange

for idx, group in df.groupby(np.arange(len(df)) // 10):
    group.to_json(f'{idx}_name.json', orient='index')  # orient: split, records, index, values, table, columns
Change the 10 to the number of rows you wish to write out for each iteration.
I want to be able to do Vimdiffs and Vimfolds on Bookmarks files that have been converted to CSV files, i.e. with one description and one uri per line. However, because the Bookmarks file has multiple levels of folders, the CSV file will also need fields for the different levels of folder names on each line.
I am new to jq, but it seems like it should be able to do this sort of conversion?
Thanks,
Phil.
Have you tried any free tools like https://json-csv.com/
or json2csv: https://www.npmjs.com/package/json2csv
If neither of those works, perhaps try this approach.
When I need to restructure data, I write a set of loops that resolve each property I want for each line of my CSV. Let's say my JSON has Name, Email, and Phone, but for some reason they all sit at different object levels in the JSON.
First write a loop that resolves Name, then a loop for Email, and one for Phone. At the end of the first loop call the second, and from the second call the third.
Then you can use jq -n, which allows you to create JSON with no input.
So your CSV output would come from something like jq -n '{NewName: .["'$Name'"]}'
Once you have a clean JSON with all data points at the same level, CSV conversion is smooth.
Hope this helps
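A minimal jq sketch of that flattening idea (not from the thread; the field names "name", "uri", and "children" are assumptions, and real browser bookmark exports nest differently):

jq -r '
  def flatten_bm(folders):                      # carry the folder path down the tree
    if has("children")
    then (.name // "") as $n | .children[] | flatten_bm(folders + [$n])
    else [(folders | join("/")), .name, .uri]   # one row per bookmark leaf
    end;
  flatten_bm([]) | @csv
' bookmarks.json > bookmarks.csv

Each output line then carries the folder path, the description, and the uri, which should diff and fold cleanly in Vim.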
I'd like to loop over two CSV files in JMeter. I found this, which is close, but I'd like the outer file to give me the CSV filename for the inner CSV.
So the outer file might have
filename
A
B
C
And this would lead to the inner loop looping
A.csv
B.csv
C.csv
When I try the technique referenced above, I get an error that the filename does not exist, and I can see from the error that the problem is that JMeter is not substituting the variable in the filename of the CSV Data Set Config under the Loop Controller. I suspect JMeter evaluates all the variables at a time when the variables introduced by the outer CSV file are not yet defined.
JMeter is not substituting a variable there, but it will substitute a property.
Convert your variable into a property and the approach will start working.
See the Knit One Pearl Two: How to Use Variables in Different Thread Groups guide to learn how you can do it, and this SO answer for a working JMeter script implementing a similar scenario.
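A hedged sketch of that variable-to-property conversion (the element placement is an assumption; the props/vars bindings and the __P function are standard JMeter):

// JSR223 PreProcessor (Groovy) inside the outer loop, run before the inner
// CSV Data Set Config is used: promote the outer CSV's variable to a property.
props.put('innerFile', vars.get('filename') + '.csv')

Then, in the inner CSV Data Set Config, set the Filename field to ${__P(innerFile)} instead of ${filename}.csv.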
I am trying to import in Octave a file (i.e. data.txt) containing 2 columns of integers, such as:
101448,1077
96906,924
105704,1017
I use the following command:
data = load('data.txt')
However, the "data" matrix that results has a 1 x 1 dimension, with all the content of the data.txt file saved in just one cell. If I adjust the numbers to look like floats:
101448.0,1077.0
96906.0,924.0
105704.0,1017.0
the loading works as expected, and I obtain a matrix with 3 rows and 2 columns.
I looked at the various options that can be set for the load command but none of them seem to help. The data file has no headers, just plain integers, comma separated.
Any suggestions on how to load this type of data? How can I force Octave to cast the data as numeric?
The load function is not meant to read CSV files; it is meant to load files saved from Octave itself, which define variables.
To read a CSV file, use csvread("data.txt"). Also, 3.2.4 is a very old version that is no longer supported; you should upgrade.
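For example (a hedged sketch; both functions are standard Octave, and dlmread lets you spell out the delimiter explicitly):

% reads the comma-separated integers into a 3x2 numeric matrix
data = csvread("data.txt");
% equivalent, with the delimiter named explicitly
data = dlmread("data.txt", ",");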
I have files being generated by another program/user with names such as "jh-1.txt, jh-2.txt, ..., jh-100.txt, ..., jh-1024.txt". I'm extracting a column from these files, manipulating the data, and outputting it to a new matrix. The only problem is that Octave uses ASCII ordering, not natural ordering, when reading in the files, so the output matrix is not ordered in a natural way. My question is: can Octave sort file names in natural order? I'm getting the file names in the standard way:
fileDirectory = '/path/to/directory';
filePattern = fullfile(fileDirectory, '*.txt'); % Selects only the txt files.
dataFiles = dir(filePattern); % Gets the info from the txt files in the directory.
baseFileName = {dataFiles.name}'; % Gets all the txt file names.
I can't rename the files because this is a script for another user. They are on a Windows machine and already have Octave installed with Cygwin, and I don't want to make them use the command line more than they have to, because they are unfamiliar with it. Alternatively, it would be nice to have the output with the file names in a column, but I haven't figured that one out either (bit of a noob with Octave myself). That way the user could use Excel (which they are familiar with) to sort the columns.
I don't think there's a built-in natural sort in Octave. However, there is a natural sort submission on MathWorks' File Exchange. I've not used it, but the comments imply it works in Octave too.
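For names that all follow one pattern like jh-<number>.txt, a hedged do-it-yourself sketch (not from the File Exchange submission) is to pull the number out of each name and sort on it:

names = {dataFiles.name}';   % e.g. {'jh-1.txt'; 'jh-1024.txt'; 'jh-2.txt'}
% extract the first run of digits from each name and convert it to a number
nums = cellfun(@(s) str2double(regexp(s, '\d+', 'match', 'once')), names);
[~, order] = sort(nums);     % natural (numeric) order
names = names(order);

The sorted names cell array can also be written out as a column alongside the results, so the user can sort everything in Excel.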