---- Update: what I've got so far and what's left to resolve can be found in point 3 below ----
Using Octave I want to create 30 horizontal box and whisker plots without spread (x-axis) from 30 different GeoTIFFs. This is a sketch of how I would like the plot to look:
Ideally the best solution for me would be an Octave code (workflow) that would allow me to place multiple GeoTIFFs in one directory and then, with one click, create a box and whisker plot for all GeoTIFFs at once - just like the sketch above.
A GeoTIFF sample with 3 GeoTIFFs can be downloaded here. The file looks like this in QGIS:
It holds elevation values on band 1 (the ones that each box and whisker plot should be based on) and no-data values (-999), which should be excluded from the plot.
Right now this is what I have:
Using img = imread ("filename.tif") gets the file into Octave. Using hist (img(:), 200); shows that all cells are concentrated around 65300. imagesc (img, [65100 65600]) followed by colorbar displays the image extent, but it's clear that this approach simply doesn't import the real cell values. I can't find a working solution to import GeoTIFFs with their cell values, therefore my current workaround is exporting the GeoTIFF from QGIS with gdal_translate -of aaigrid, which creates an .asc file that I manually edit to remove the header rows, rename to .csv and load into Octave. That .csv can be found here.
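Side note: the manual header editing could probably be done from within Octave instead, since dlmread accepts zero-based row/column offsets. An untested sketch, with a placeholder filename:
# sketch: read the AAIGrid export directly, skipping the 6 header rows
# (dlmread offsets are zero-based); "export.asc" is a placeholder name
d = dlmread ("export.asc", " ", 6, 0);
d(d == -999) = [];   # drop the no-data cells before plotting
boxplot (d(:))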
To load it and create a box plot I'm currently using this code (thanks to @Andy and @Cris Luengo):
pkg load statistics
s = urlread ("https://drive.google.com/uc?export=download&id=1RzJ-EO0OXgfMmMRG8wiCBz-51RcwSM5h");
o = str2double (strsplit (s, ";"));
o(isnan (o)) = [];
boxplot (o)
set(gca,"xtick",[])
view([-90 90])
print out.png
The result is pretty close, but I'm still failing to: A) load GeoTIFFs directly from a folder - if this is not possible I'll have to modify the code to load all *.csv files in a directory into the same box plot and label each plot by filename (which I'm unsure how to accomplish); B) get the x-axis reversed (going from 200-450, not the other way around) - this is caused by the view([-90 90]) that I use to make the box plot horizontal instead of vertical, which is needed for layout reasons.
Does anyone have ideas on how to resolve these last adjustments?
---- Background info ----
I have 30 GeoTIFFs containing results from a viewshed analysis: for every 2x2 meter square there is a value that tells me how high a building can be (in meters) before it becomes visible from the viewshed point. The results cover the whole city of Stockholm, but the above-mentioned 30 GeoTIFFs are smaller clips of an area where new development is planned. The results help planners understand how new development might affect each of the 30 places (which are important for cultural heritage management).
As part of a bigger PDF report (where these results are visualized with different maps at different scales) I'm trying to produce a box and whisker plot (as a complement to the maps) that gives the reader an overview of how much space is left at the planned development area, based on each of the 30 viewshed (GeoTIFF) results (one box and whisker for each of the 30 locations). Below is an example of how a map in the report can look:
This does not read the GeoTIFF directly but calls gdal_translate under the hood. Just place all your .tif files in the same directory and make sure gdal_translate is in your PATH:
pkg load statistics
clear all;
fns = glob ("*.tif");
for k=1:numel (fns)
  ofn = tmpnam;
  cmd = sprintf ('gdal_translate -of aaigrid "%s" "%s"', fns{k}, ofn);
  [s, out] = system (cmd);
  if (s != 0)
    error ('calling gdal_translate failed with "%s"', out);
  endif
  fid = fopen (ofn, "r");
  # read the 6 header lines
  hdr = [];
  for i=1:6
    s = strsplit (fgetl (fid), " ");
    hdr.(s{1}) = str2double (s{2});
  endfor
  d = dlmread (fid);
  # check size against header
  assert (size (d), [hdr.nrows hdr.ncols])
  # set nodata to NA
  d(d == hdr.NODATA_value) = NA;
  raw{k} = d;
  # create a copy with only the existing values
  raw_v{k} = d(! isna (d));
  fclose (fid);
endfor

## generate plot
boxplot (raw_v)
set (gca, "xtick", 1:numel (fns),
          "xticklabel", strrep (fns, ".tif", ""));
view ([-90 90])
zoom (0.95)
print ("out.png")
gives
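As for the reversed value axis (point B in the question): after view ([-90 90]) the value axis of the box plot is the plot's y-axis, so flipping its direction may restore the ascending 200-450 order. An untested sketch:
# untested: flip the value axis so it runs from low to high
set (gca, "ydir", "reverse");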
Related
So I'm currently trying to use Python to transform large amounts of data from a .txt file into a neat and tidy .csv file. The first stage is getting the 8-digit company numbers into one column called 'Company numbers'. I've created the header and just need to put each company number from each line into the column. What I want to know is: how do I tell my script to read the first eight characters of each line in the .txt file (which correspond to the company number) and then write them to the .csv file? This is probably very simple, but I'm new to Python!
So far, I have something which looks like this:
with open(r'C:/Users/test1.txt') as rf:
    with open(r'C:/Users/test2.csv','w',newline='') as wf:
        outputDictWriter = csv.DictWriter(wf,['Company number'])
        outputDictWriter.writeheader()
        rf = rf.read(8)
        for line in rf:
            wf.write(line)
My recommendation would be to 1) read the file in, 2) make the relevant transformation, and then 3) write the results to file. I don't have sample data, so I can't verify whether my solution exactly addresses your case.
with open('input.txt','r') as file_handle:
    file_content = file_handle.read()

list_of_IDs = []
for line in file_content.split('\n'):
    print("line = ", line)
    print("first 8 =", line[0:8])
    list_of_IDs.append(line[0:8])

with open("output.csv", "w") as file_handle:
    file_handle.write("Company\n")
    for line in list_of_IDs:
        file_handle.write(line + "\n")
The value of separating these steps is to enable debugging.
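For comparison, here is a minimal sketch of the same transformation using the csv module that the question already imports (untested against your real data):
import csv

with open('input.txt') as rf, open('output.csv', 'w', newline='') as wf:
    writer = csv.DictWriter(wf, fieldnames=['Company number'])
    writer.writeheader()
    for line in rf:  # file objects iterate line by line
        writer.writerow({'Company number': line[:8]})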
I have a MATLAB script that I would like to run in Octave. But it turns out that the timeseries and synchronize functions from MATLAB are not yet implemented in Octave. So my question is whether there is a way to express or replace these functions in Octave.
For understanding: I have two text files with different numbers of rows, which I want to synchronize into one text file with the same number of rows over a common time base. The content of the text files is:
Text file 1:
1st column contains the distance
2nd column contains the time
Text file 2:
1st column contains the angle
2nd column contains the time
Here is the part of my code that I use in MATLAB to synchronize the files.
ts1 = timeseries(distance,timed);
ts2 = timeseries(angle,timea);
[ts1 ts2] = synchronize(ts1,ts2,'union');
distance = ts1.Data;
angle = ts2.Data;
Thanks in advance for your help.
edit:
Here are some example files.
input distance
input rotation angle
output
The synchronize function seems to create a common time vector from two separate timeseries (here, specifically via their union), and then use interpolation (here 'linear') to find interpolated values for both distance and angle at the common timepoints.
An example of how to achieve this in Octave, producing the same output as your provided output file, follows.
Note: I had to preprocess your input files first, replacing 'decimal commas' with dots and then 'tabs' with commas, to make them valid csv files.
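That preprocessing can also be scripted from within Octave. A sketch, assuming the raw files really do use decimal commas and tab separators, with placeholder filenames:
% sketch: rewrite one raw file as a valid csv
s = fileread ('input_distance.txt');
s = strrep (s, ',', '.');    % decimal commas -> dots (do this first)
s = strrep (s, "\t", ",");   % tabs -> comma separators
fid = fopen ('input_distance_clean.csv', 'w');
fputs (fid, s);
fclose (fid);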
Distance_t = csvread('input_distance.txt', 1, 0); % skip header row
Rotation_t = csvread('input_rotation_angle.txt', 1, 0); % skip header row
Common_t = union( Distance_t(:,2), Rotation_t(:,2) );
InterpolatedDistance = interp1( Distance_t(:,2), Distance_t(:,1), Common_t );
InterpolatedRotation = interp1( Rotation_t(:,2), Rotation_t(:,1), Common_t );
Output = [ InterpolatedRotation, InterpolatedDistance ];
Output = sortrows( Output, -1 ); % sort according to column 1, in descending order
Output = Output(~isna(Output(:,2)), :); % remove NA entries
(Note: the step removing the NA entries was necessary because we did not request extrapolation during the interpolation step, so some of the resulting distance values fall outside the original time range, which Octave marks as NA.)
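Alternatively, if extrapolated values are acceptable for your application, passing extrapolation options to interp1 avoids the NA entries altogether. A sketch:
% sketch: linear extrapolation outside the original time ranges,
% so no NA entries are produced and the removal step can be dropped
InterpolatedDistance = interp1( Distance_t(:,2), Distance_t(:,1), Common_t, 'linear', 'extrap' );
InterpolatedRotation = interp1( Rotation_t(:,2), Rotation_t(:,1), Common_t, 'linear', 'extrap' );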
I habitually use csvRead in Scilab to read my data files, but I am now faced with one that contains blocks of 200 rows, each preceded by 3 header lines, all of which I would like to take into account.
I've tried specifying a range of data, following the example on the Scilab help website for csvRead (the example is right at the bottom of the page) (https://help.scilab.org/doc/6.0.0/en_US/csvRead.html), but I always get the same error messages:
The line and/or colmun indices are outside of the limits
or
Error in the column structure.
My first three lines are headers, which I know can cause a problem, but even if I omit them from my block range I still have the same problem.
Otherwise, my data is ordered such that I have my three header lines (two lines containing a header over just one or two columns, one line containing a header over all columns), 200 lines of data, and a blank line. This represents data from one image, and I have about 500 images in the file. I would like to read and process all of them while keeping track of the headers, because they state the image number, which I need to reference later. Example:
DTN-dist_Devissage-1_0006_0,,,,,,
L0,,,,,,
X [mm],Y [mm],W [mm],exx [1] - Lagrange,eyy [1] - Lagrange,exy [1] - Lagrange,Von Mises Strain [1] - Lagrange
-1.13307,-15.0362,-0.00137507,7.74679e-05,8.30045e-05,5.68249e-05,0.00012711
-1.10417,-14.9504,-0.00193334,7.66086e-05,8.02914e-05,5.43132e-05,0.000122655
-1.07528,-14.8647,-0.00249155,7.57493e-05,7.75786e-05,5.18017e-05,0.0001182
Does anyone have a solution to this?
My current code, following an adapted version of the Scilab help example, looks like this (I have tried varying the blocksize and iblock values to include/omit headers):
blocksize=200;
C1=1;
C2=14;
iblock=1
while (%t)
    R1=(iblock-1)*blocksize+4;
    R2=blocksize+R1-1;
    irange=[R1 C1 R2 C2];
    V=csvRead(filepath+filename,",",".","",[],"",irange);
    iblock=iblock+1
end
Errors
The CSV
A lot of your problems come from the inconsistent number of commas in your csv file. Opening it in LibreOffice Calc and saving it puts the right number of commas on every line, even the empty ones.
R1
Your current code doesn't position R1 at the beginning of the values. The right formula is
R1=(iblock-1)*(blocksize+blanksize+headersize)+1+headersize;
End of file
Currently your code raises an error at the end of the file because R1 becomes greater than the number of lines. To solve this, you can specify the maximum number of blocks or test the value of R1 against the number of lines.
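For example, a sketch of the second option (the improved solution below uses the same idea); the test goes inside the while loop:
// sketch: count the lines once before the loop ...
nlines = size(mgetl(filepath+filename), 'r');
// ... then stop before reading past the end
if R1 > nlines then
    break
end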
Improved solution for a much bigger file
When solving your problem with a big file, two issues came up:
We need to know the number of blocks or the number of lines
Each call of csvRead is really slow because it processes the whole file on each call (about 1 s per block!)
My idea was to read the whole file and store it in a string matrix (since mgetl has been improved in 6.0.0), then use csvTextScan on a submatrix. Doing so also removes the need to hard-code the number of blocks/lines.
The code follows:
clear all
clc

s = filesep()
filepath='.'+s;
filename='DTN_full.csv';

// the header is important as it has the image name
headersize=3;
blocksize=200;
C1=1;
C2=14;
iblock=1

// let's save everything. Good for the example.
bigstruct = struct();

// Read all the lines in one pass;
// using csvTextScan afterwards is much more efficient
text = mgetl(filepath+filename);
nlines = size(text,'r');

while ( %t )
    mprintf("Block #%d",iblock);
    // Let's read the header
    R1=(iblock-1)*(headersize+blocksize+1)+1;
    R2=R1 + headersize-1;
    // if R1 or R2 is bigger than the number of lines, stop
    if sum([R1,R2] > nlines )
        mprintf('; End of file\n')
        break
    end
    // We use csvTextScan only on the lines that matter;
    // this speeds things up, since csvRead reads the whole file
    // every time it is called.
    H=csvTextScan(text(R1:R2),",",".","string");
    mprintf("; %s",H(1,1))
    R1 = R1 + headersize;
    R2 = R1 + blocksize-1;
    if sum([R1,R2] > nlines )
        mprintf('; End of file\n')
        break
    end
    mprintf("; rows %d to %d\n",R1,R2)
    // Let's read the values
    V=csvTextScan(text(R1:R2),",",".","double");
    iblock=iblock+1
    // Let's save these data
    bigstruct(H(1,1)) = V;
end
and returns
Block #1; DTN-dist_0005_0; rows 4 to 203
....
Block #178; DTN-dist_0710_0; rows 36112 to 36311
Block #179; End of file
Time elapsed 1.827092s
This is my first question on here. I work as a meteorologist and have some coding experience, though it is far from professionally taught. Basically, what I have is a .csv file from a weather station that gives me data that is too detailed (65.66 degrees and similar values). What I want to do is automate, via a script file, a way to access the .csv file and round off values that are too detailed: take a temperature from 65.66 to 66 (rounding up for anything at or above .5 and down otherwise), or a pressure from 29.8889 to 29.89, using the same rounding rules. Is this possible? If so, how should I go about it? Again, keep in mind that my batch-scripting skills are not the strongest.
Any help would be much appreciated.
Thanks,
I agree with the comments above. Math in batch is limited to integers, and won't work well for the manipulations you want.
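For example, set /a only does integer arithmetic, so the fractional part is simply lost:
:: integer division truncates: this prints 29, the .89 is gone
set /a result=2989/100
echo %result%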
I'd use PowerShell. Besides easily handling floating point math, it also has built-in methods for objectifying CSV data (as well as XML and other types of structured data). Take the following hypothetical CSV data contained within weather.csv:
date,time,temp,pressure,wx
20160525,12:30,65.66,30.1288,GHCND:US1TNWS0001
20160525,13:00,67.42,30.3942,GHCND:US1TNWS0001
20160525,13:30,68.92,31.0187,GHCND:US1TNWS0001
20160525,14:00,70.23,30.4523,GHCND:US1TNWS0001
20160525,14:30,70.85,29.8889,GHCND:US1TNWS0001
20160525,15:00,69.87,28.7384,GHCND:US1TNWS0001
The first thing you want to do is import that data as an object (using import-csv), then round the numbers as desired: temp rounded to a whole number, and pressure rounded to a precision of 2 decimal places. Rounding to a whole number is easy; just recast the data as an integer and it'll be rounded automatically. Rounding the pressure column is pretty easy as well if you invoke the .NET [math]::round() method.
# grab CSV data as a hierarchical object
$csv = import-csv weather.csv

# for each row of the CSV data...
$csv | foreach-object {
    # recast the "temp" property as an integer
    $_.temp = [int]$_.temp
    # round the "pressure" property to a precision of 2 decimal places
    $_.pressure = [math]::round($_.pressure, 2)
}
Now pretend you want to display the temperature, barometric pressure, and weather station name where "date" = 20160525 and "time" = 14:30.
$row = $csv | where-object { ($_.date -eq 20160525) -and ($_.time -eq "14:30") }
$row | select-object pressure,temp,wx | format-table
Assuming "pressure" started with a value of 29.8889 and "temp" had a value of 70.85, then the output would be:
pressure temp wx
-------- ---- --
29.89 71 GHCND:US1TNWS0001
If the CSV data had had multiple rows with the same date and time values (perhaps measurements from different weather stations), then the table would display with multiple rows.
And if you wanted to export that to a new csv file, just replace the format-table cmdlet with export-csv destination.csv
$row | select-object pressure,temp,wx | export-csv outfile.csv
Handy as a pocket on a shirt, right?
Now, pretend you want to display the human-readable station names rather than NOAA's designations. Make a hash table.
$stations = @{
    "GHCND:US1TNWS0001" = "GRAY 1.5 E TN US"
    "GHCND:US1TNWS0003" = "GRAY 1.9 SSE TN US"
    "GHCND:US1TNWS0016" = "GRAY 1.3 S TN US"
    "GHCND:US1TNWS0018" = "JOHNSON CITY 5.9 NW TN US"
}
Now you can add a "station" property to your "row" object.
$row = $row | select *,"station"
$row.station = $stations[$row.wx]
And now if you do this:
$row | select-object pressure,temp,station | format-table
Your console shows this:
pressure temp station
-------- ---- -------
29.89 71 GRAY 1.5 E TN US
For extra credit, say you want to export this row data to JSON (for a web page or something). That's slightly more complicated, but not impossibly so.
add-type -AssemblyName System.Web.Extensions
$JSON = new-object Web.Script.Serialization.JavaScriptSerializer

# convert $row from a PSCustomObject to a more generic hash table
$obj = @{}
# the % sign in the next line is shorthand for "foreach-object"
$row.psobject.properties | %{
    $obj[$_.Name] = $_.Value
}
# Now, stringify the row and display the result
$JSON.Serialize($obj)
The output of that should be similar to this:
{"station":"GRAY 1.5 E TN US","wx":"GHCND:US1TNWS0001","temp":71,"date":"201605
25","pressure":29.89,"time":"14:30"}
... and you can redirect it to a .json file by using > or pipe it into the out-file cmdlet.
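For example (keeping the same $JSON and $obj from above; the output filename is arbitrary):
# write the serialized row to a file instead of the console
$JSON.Serialize($obj) | out-file row.json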
DOS batch scripting is, by far, not the best place to edit text files. However, it is possible. I will include sample, incomplete DOS batch code at the bottom of this post to demonstrate the point. I recommend you focus on Excel (no coding needed) or Python.
Excel - You don't need to code at all with Excel. Open the csv file. Let's say you have 66.667 in cell B12. In cell C12, enter a formula using the ROUND function (code below). You can also teach yourself some Visual Basic for Applications, but for this simple task that is overkill. When done, if you save in csv format, you will lose your formulae and keep only the data. Consider saving as xlsx or xlsm.
Visual Basic Script - you can run VBScript on your machine with cscript.exe (or wscript.exe), which is part of Windows. But if you're using VBScript, you might as well use VBA in Excel; it is almost identical.
Python - a very high-level language with built-in libraries that make editing a csv file super easy. I recommend Anaconda (a Python suite) from continuum.io, but you can find generic Python at python.org as well. Anaconda comes prepackaged with lots of helpful libraries. For csv editing you will likely want to use the pandas library. You can find plenty of short videos on YouTube.
Excel
Say you have 66.667 in cell B12. Set the formula in C12 to...
"=ROUND(B12,0)" to round to integer
"=ROUND(B12,1)" to round to one decimal place
As you copy and paste, Excel will attempt to intelligently update the formulas for you.
Python
import pandas as pd
import numpy as np

# load csv file to memory. Name your columns using names=[]
df = pd.read_csv("C:/temp/weather.csv", names=["city", "temperature", "date"])
df["temperature"] = df["temperature"].apply(np.round)  # round the temperature column
df.to_csv('newfile.csv')    # export to a new csv file
df.to_excel('newfile.xls')  # or export to an excel file instead
DOS Batch
A Batch script for this is much, much harder. I will not write the whole program, because it is not a great solution. But, I'll give you a taste in DOS batch code at the bottom of this post. Compared to using Python or Excel, it is extremely complex.
Here is a rough sketch of DOS code. Because I don't recommend this method, I didn't take the time to debug this code.
setlocal ENABLEDELAYEDEXPANSION
:: prep our new file for output. Let's write the header row.
echo col1, col2, col3 >newfile.csv
:: read the existing text file line by line
:: since it is csv, we will parse on comma
:: skip lines starting with semi-colon
FOR /F "eol=; tokens=1,2,3* delims=, " %%I in (input_file.txt) do (
    set col1=%%I
    set col2=%%J
    set col3=%%K
    rem truncate col2 to 1 decimal place
    for /f "tokens=1,2 delims=." %%A in ("!col2!") do (
        set integer=%%A
        set decimal=%%B
        set decimal=!decimal:~0,1!
        rem or, you can use an if statement to round up or down
        rem Now, put the integer and decimal together again and
        rem redefine the value for col2.
        set col2=!integer!.!decimal!
        rem write output to a new csv file
        rem > and >> redirect output from console to a text file
        rem >newfile.csv would overwrite the file. We don't want
        rem that, since we are in a loop.
        rem >>newfile.csv appends to the file, perfect!
        echo !col1!, !col2!, !col3! >>newfile.csv
    )
)
:: open the new csv file in the default application
start newfile.csv
I am trying to plot a graph using R which is populated by MySQL query results. I have the following code:
rs = dbSendQuery(con, "SELECT BuildingCode, AccessTime from access")
data = fetch(rs, n=-1)
x = data[,1]
y = data[,2]
cat(colnames(data),x,y)
This gives me an output of:
BuildingCode AccessTime TEST-0 TEST-1 TEST-2 TEST-3 TEST-4 14:40:59 07:05:00 20:10:59 08:40:00 07:30:59
But this is where I get stuck. I have no idea how to pass the "cat" data into an R plot. I have spent hours searching online, and most of the examples of R plots I have come across use read.table(text=""). This is not feasible for me, as the data has to come from a database and not be hard-coded. I also found something about saving the output as a CSV, but MySQL cannot overwrite existing files, so after the code was executed once I was unable to run it again because the file already existed.
My question is: how can I use the "cat" data (or another approach, if there is a better way) to plot a graph using data that isn't hard-coded?
Note: I am using RApache as my web server and I have installed the Brew package.
Make the plot using R and just pass the path to the file back in cat:
<%
## Your other code to get the data, assuming it gets a data.frame called data
## Plot code
library(Cairo)
myplotfilename <- "/path/to/dir/myplot.png"
CairoPNG(filename = myplotfilename, width = 480, height = 480)
plot(x=data[,1],y=data[,2])
tmp <- dev.off()
cat(myplotfilename)
%>
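Since cat returns only the image path, whatever calls the brew page can embed the image itself, e.g. with a hypothetical fragment on the HTML side (assuming the path is reachable by the web server):
<img src="/path/to/dir/myplot.png" alt="AccessTime plot" />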