JSON data appears on a single line - json

I have a badly formatted JSON file:
R~{"xData":{"x":[7872,7904,...4670]}} R~{"xData":{"x":[7904,7904,...8000]}} ...
That is, there is only one line, whereas each data record should start on a new line beginning at the character R. The character ~ after R is also unwanted. Since the file is about 1 GB, it is impossible to insert a newline manually before each R. Each x is a vector with the same number of data points, say 5000. If the total number of lines is 100000, i.e. 1e5 occurrences of the character R, how can I obtain a separate output file containing only the matrix of x values? This matrix will have 100000 rows and 5000 columns.
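One possible approach, sketched here in Python under some assumptions: the character R never occurs inside the JSON payload itself (true for the sample shown, which contains only the keys xData and x plus numbers), and each record is small compared with the read chunk. The function names iter_x_rows and write_matrix are made up for this illustration. The file is read in chunks, so the single 1 GB line is never held in memory at once.

```python
import io
import json


def iter_x_rows(stream, chunk_size=1 << 20):
    """Yield the "x" list of each record from a text stream of R~{...} records.

    Assumption: the letter R appears only as the record marker.
    """
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        pieces = buffer.split("R")
        buffer = pieces.pop()  # the last piece may be an incomplete record
        for piece in pieces:
            start = piece.find("{")  # skips the unwanted character after R
            if start != -1:
                yield json.loads(piece[start:])["xData"]["x"]
    # Whatever is left after the last read is the final record.
    start = buffer.find("{")
    if start != -1:
        yield json.loads(buffer[start:])["xData"]["x"]


def write_matrix(src, dst, chunk_size=1 << 20):
    """Write one space-separated row of x values per record."""
    for row in iter_x_rows(src, chunk_size):
        dst.write(" ".join(map(str, row)) + "\n")
```

With real files this would be called as write_matrix(open("input.json"), open("matrix.txt", "w")), where both file names are placeholders. Because each record is located by its first {, it does not matter which character follows the R.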

Related

GnuPlot :: Plotting 3D recorded in an unconventional format

I would like to prepare a script file to draw a 3D plot of some kinetic spectroscopy results. In the experiment the absorption spectrum of a solution is measured sequentially at increasing times from t0 to tf with a constant increase in time Δt.
The plot will show the variation of absorbance (Z) with wavelength and time.
The data are recorded using a UV-VIS spectrometer and saved as a CSV text file.
The file contains a table whose first column holds the wavelengths of the spectra. A further column is added for each measured spectrum, so the number of columns depends on the total time and the time interval between measurements. The time of each spectrum appears in the header line.
I wonder whether the data can be plotted directly, with a minimum of preformatting and without rewriting the data in a more standard XYZ format.
The structure of the data file is something like this:
Title; espectroscopia UV-Vis
Comment;
Date; 23/10/2018 16:41:12
Operator; laboratorios
System Name; Undefined
Wavelength (nm); 0 Min; 0,1 Min; 0,2 Min; 0,3 Min; ... 28,5 Min
400,5551; 1,491613E-03; 1,810312E-03; 2,01891E-03; ... 4,755786E-03
... ... ... ... ... ...
799,2119; -5,509266E-04; 3,26314E-04; -4,319865E-04; ... -5,087912E-04
(EOF)
A sample of the data is included in the file kinetic_spectroscopy.csv.
Thanks.
Your data is in an acceptable form for gnuplot, but persuading the program to plot this as one line per wavelength rather than a gridded surface is more difficult. First let's establish that the file can be read and plotted. The following commands should read in the x/y coordinates (x = first row, y = first column) and the z values to construct a surface.
DATA = 'espectros cinetica.csv'
set datafile separator ';' # csv file with semicolon
# Your data uses , as a decimal point.
set decimal locale # The program can handle this if your locale is correct.
show decimal # confirm this by inspecting the output from "show".
set title DATA
set ylabel "Wavelength"
set xlabel "Time (min)"
set xyplane 0
set style data lines
splot DATA matrix nonuniform using 1:2:3 lc palette
This actually looks OK with your data. For a smaller number of scans it is probably not what you would want. In order to plot separate lines, one per scan, we could break this up into a sequence of line plots rather than a single surface plot:
DATA = 'espectros cinetica.csv'
set datafile separator ";"
set decimal locale
unset key
set title DATA
set style data lines
set ylabel "Wavelength"
set xlabel "Time (min)"
set xtics offset 0,-1 # move labels away from axis
splot for [row=0:*] DATA matrix nonuniform every :::row::row using 1:2:3
This is what I get for the first 100 rows of your data file. The row data is colored sequentially by gnuplot linetypes. Other coloring schemes are possible.
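To make the layout that "matrix nonuniform using 1:2:3" expects more concrete, here is a small Python sketch that expands such a table into explicit (time, wavelength, absorbance) triples. It is only an illustration of how the first row and first column act as coordinates, not a required preprocessing step. The helper names parse_number and nonuniform_to_xyz are invented for this example, and the sketch assumes the metadata lines (Title, Comment, and so on) have already been removed, so that the first line passed in is the header row.

```python
def parse_number(cell):
    """Convert a cell such as '1,491613E-03' or '0,1 Min' to a float."""
    token = cell.strip().split()[0]        # drop a trailing unit such as 'Min'
    return float(token.replace(",", "."))  # decimal comma -> decimal point


def nonuniform_to_xyz(lines, sep=";"):
    """Yield (time, wavelength, absorbance) triples.

    lines[0] is the header row holding the times (x coordinates);
    the first field of every later row is the wavelength (y coordinate).
    """
    rows = [line.strip().split(sep) for line in lines if line.strip()]
    times = [parse_number(cell) for cell in rows[0][1:]]
    for row in rows[1:]:
        wavelength = parse_number(row[0])
        for time, cell in zip(times, row[1:]):
            yield (time, wavelength, parse_number(cell))
```

Each triple corresponds to one point that gnuplot places at x = time, y = wavelength, z = absorbance.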

Plot csv file with multiple rows using gnuplot

I have a csv file which contains 500 rows and 100 columns.
I want to plot the data so that:
Each row represents a curve on the graph.
Every column represents a value on the curve (100 values).
There are 500 such curves on the graph.
The code:
set xrange [0:100]
set yrange [0:20]
set term png
set output 'output.png'
set datafile separator ','
plot 'myplot.csv'
But this does not seem to work.
How can I configure gnuplot to achieve that?
Edit:
The data is in this format (Shortened):
7.898632397,7.834565295,8.114238484,7.636553599,7.759415909,7.829112422
7.898632397,8.379427648,8.418542273,7.921914453,7.558814684,7.237529826
7.898632397,7.862492565,8.132579657,8.419279034,8.350564183,8.578430051
7.898632397,7.613394134,7.213820144,7.42985323,7.74212712,7.144952258
7.898632397,7.736819889,8.14247944,8.025533212,8.256498438,8.133741173
7.898632397,7.906868959,8.032605101,8.308540463,8.238641673,8.143985756
set datafile separator comma
plot for [row=0:*] 'myplot.csv' matrix every :::row::row with lines
However I suspect that with 500 lines the plot will be too crowded to interpret.

Number Format: 1.0 = "0000100000"

I have a task to deliver a numeric/decimal value as part of a fixed length text file. These values will take the following form:
10 chars, w/last 5 chars representing the decimal portion of the string. These will all be positive numbers.
A few examples:
0.123 = "0000012300"
1.0 = "0000100000"
123.456 = "0012345600"
234 = "0023400000"
The numeric data resides in an Access database formatted as numbers (double).
My current thought is:
Retain the original numeric data in one table
Convert to TEXT strings via query, save to a second table
Export to a fixed width flat file using MSAccess export function
Can anyone suggest a reasonable approach to produce the necessary 10 character TEXT conversion?
Thanks!
Perhaps just multiply by 100000 and format?
Format(x * 100000, "0000000000")
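The arithmetic behind this suggestion can be checked outside Access. The following Python sketch (the function name to_fixed_width is made up, and this is a model of the idea rather than Access code) scales by 10^5, rounds to the nearest integer to absorb floating-point noise, and zero-pads to 10 characters. It reproduces the four examples from the question.

```python
def to_fixed_width(value, width=10, decimals=5):
    """Scale value by 10**decimals and zero-pad the result to width digits."""
    scaled = round(value * 10 ** decimals)  # rounding absorbs float noise
    text = f"{scaled:0{width}d}"
    if scaled < 0 or len(text) > width:
        raise ValueError(f"{value!r} does not fit in {width} characters")
    return text
```

Values with more than five integer digits cannot be represented in this layout, which is why the sketch raises an error instead of silently producing an over-long string.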

Reading a CSV file of varying precision in Fortran

I am using an external program to run a simulation which returns to me a csv file containing output data. I need to read the data from this file into my fortran program, which analyses and optimizes the input conditions to rerun the external program.
The CSV file has, say, 20 columns and 70 rows. Each column contains output data for a specific parameter. Since that program is not written by me, I cannot control the precision of the output values. In many cases the external program truncates the digits after the decimal point if they are zero. So it is possible that in run number 1 a certain field has 3 digits after the decimal point, but only 2 digits in run number 2.
What am I supposed to do for this? I cannot use the read command since in that I need to specify in advance the number of digits my program has to read.
I basically need a way for my program to identify the data between commas and read a value of varying precision between the commas.
For input, the decimal part of a format specifier is only used if the input field does not contain a decimal point.
For the last few decades (since the demise of punched cards), users typically expect that a numeric value that doesn't contain a decimal point is an integer value. Consequently, for input, format specifications for real numbers should always have .0 for their decimal part.
For example, after:
CHARACTER(4) :: input
REAL :: a, b
input = '1 '
READ (input, "(F4.0)") a
READ (input, "(F4.1)") b
a will have the value 1.0, and b will have the value 0.1.
(For input, it doesn't particularly matter which particular real data descriptor is used (F, E, D, or G) - they all behave the same regardless of the nature of the input.)
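The rule can be modelled in a few lines. The sketch below is written in Python rather than Fortran, purely as a language-neutral illustration. The function name f_edit_input is invented, it assumes blanks in the field are ignored (which matches the example above), and it deliberately leaves out exponents and other details of real Fortran input.

```python
def f_edit_input(field, d):
    """Simplified model of reading `field` with an Fw.d edit descriptor.

    Assumptions: blanks are ignored and the field has no exponent part.
    """
    text = field.replace(" ", "")
    if not text:
        return 0.0               # an all-blank field reads as zero
    if "." in text:
        return float(text)       # explicit decimal point: d is ignored
    return int(text) / 10 ** d   # no decimal point: last d digits are the fraction
```

In this model f_edit_input("1   ", 0) gives 1.0 and f_edit_input("1   ", 1) gives 0.1, matching a and b above, while a field such as "1.5 " gives 1.5 for any d.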
So, for input, all you have to worry about is getting the field width right. Once you have read a record into a string this is easy enough to do by using the INDEX intrinsic.
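The scanning step can be pictured as follows. This is again a Python sketch rather than Fortran, with str.find standing in for the INDEX intrinsic. The names split_fields and parse_record are hypothetical. Each field is located by the position of the next comma, so its width never has to be known in advance.

```python
def split_fields(record, sep=","):
    """Return the substrings of `record` that lie between separators."""
    fields = []
    start = 0
    while True:
        pos = record.find(sep, start)  # plays the role of Fortran's INDEX
        if pos == -1:
            fields.append(record[start:])  # last field: no further separator
            break
        fields.append(record[start:pos])
        start = pos + 1
    return fields


def parse_record(record):
    """Convert every comma-separated field of one record to a float."""
    return [float(field) for field in split_fields(record)]
```

In Fortran, the equivalent of the float conversion would be an internal READ on each substring, using a format whose decimal part is .0 as described above.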

How to use Gnuplot to create histogram from binned data from CSV file?

I have a CSV file which is generated by a process that outputs the data in pre-defined bins (say from -100 to +100 in steps of 10). So, each line looks somewhat like this:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
i.e. 20 comma separated values, the first representing the frequency in the range -100 to -90, while the last represents the frequency between 90 to 100.
The problem is, Gnuplot seems to require the raw data for it to be able to generate a histogram, whereas I have only the frequency distribution. How do I proceed in this case? I'm looking for the simplest possible histogram, that perhaps displays the data using vertical bars.
You already have histogram data, so you mustn't use "set histogram".
Generate the x-values from the line numbers, and do a simple plot with boxes:
plot dataf using (($0-10)*10):1 with boxes
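The mapping from line number to bin position can be worked through numerically. The Python sketch below (bin_layout is a made-up name) assumes 20 bins of width 10 starting at -100, as in the question. The left edges it computes are exactly the values produced by the expression ($0-10)*10 for line numbers 0 to 19. Since gnuplot centres each box on its x value, the bin centres, which are 5 higher, may be the more natural x coordinate.

```python
def bin_layout(counts, lowest=-100, width=10):
    """Return (left_edge, centre, count) for each pre-binned frequency."""
    layout = []
    for i, count in enumerate(counts):
        left = lowest + i * width
        layout.append((left, left + width / 2, count))
    return layout
```

For the sample frequencies 1 to 20, the first bin has left edge -100 and centre -95, and the last bin has left edge 90 and centre 95.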