CsvProvider: start reading a CSV file at a specific row

I want to read a CSV file using FSharp.Data's CsvProvider.
The data looks like:
;Datum;Von;bis;MW
Maximum;16.10.2015;19:00;19:15;9268,000
Minimum;26.12.2015;13:30;13:45;-5195,000
"Datum";"Von";"bis";"Vertikale Netzlast [MW]";
01.01.2015;00:00;00:15;1.216;
01.01.2015;00:15;00:30;1.121;
01.01.2015;00:30;00:45;1.090;
01.01.2015;00:45;01:00;981;
I want to use the following code:
let csvValues = CsvProvider<"http://ws.50hertz.com/web01/api/PhotovoltaicForecast/DownloadFile?fileName=2015.csv&callback=?", ";">.GetSample()
How can I start reading the file at row 5, or at the first row whose first column contains "Datum"?

It works with SkipWhile:
let csvValues =
    CsvProvider<"http://ws.50hertz.com/web01/api/PhotovoltaicForecast/DownloadFile?fileName=2015.csv&callback=?", ";", IgnoreErrors=true>.GetSample()
        .SkipWhile(fun r -> not (r.Column1.Contains("Datum")))
Alternatively, the provider has a static parameter for skipping rows:
let csvValues = CsvProvider<"http://ws.50hertz.com/web01/api/PhotovoltaicForecast/DownloadFile?fileName=2015.csv&callback=?", ";", IgnoreErrors=true, SkipRows=3>.GetSample()
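Once the rows are skipped, the data can be consumed through the Rows property. Here is a hedged usage sketch (the Netzlast alias is just illustrative; the column names and types are inferred by the provider from the header line that follows the skipped rows):
open FSharp.Data

type Netzlast = CsvProvider<"http://ws.50hertz.com/web01/api/PhotovoltaicForecast/DownloadFile?fileName=2015.csv&callback=?", ";", IgnoreErrors=true, SkipRows=3>

// print each typed row; %A avoids assuming specific column names
for row in Netzlast.GetSample().Rows do
    printfn "%A" row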

Issue printing header using Rust's CSV crate

Here is my setup:
I am reading a csv file, the path to which is passed into the built exe as an argument, and I am using the crate Clap for it.
It all reads the file with no problem, but I am having trouble printing the headers.
I'd like to be able to print the headers without the quotes, but when I print it, only the first header/column gets printed without them, and the remaining ones do not.
Here's what I mean:
This is the part of the code that prints the header:
let mut rdr = csv::Reader::from_path(file)?;
let column_names = rdr.headers();
println!("{}", match column_names {
    Ok(v) => v.as_slice(),
    Err(_) => "Error!"
});
With this, the output is:
warning: `csv_reader` (bin "csv_reader") generated 2 warnings
Finished release [optimized] target(s) in 0.13s
Running `target\release\csv_reader.exe -f C:\nkhl\Projects\dataset\hw_25000.csv`
Index "Height(Inches)" "Weight(Pounds)"
()
As you can see, Index does not get printed with the quotes, which is how I'd like the others to be printed too. Printing with the Debug formatter ({:?}), I get this:
let mut rdr = csv::Reader::from_path(file)?;
let column_names = rdr.headers();
println!("{:?}", match column_names {
    Ok(v) => v.as_slice(),
    Err(_) => "Error!"
});
warning: `csv_reader` (bin "csv_reader") generated 2 warnings
Finished release [optimized] target(s) in 1.92s
Running `target\release\csv_reader.exe -f C:\nkhl\Projects\dataset\hw_25000.csv`
"Index \"Height(Inches)\" \"Weight(Pounds)\""
()
The CSV can be found here: https://people.sc.fsu.edu/~jburkardt/data/csv/hw_25000.csv
This is how it looks:
"Index", "Height(Inches)", "Weight(Pounds)"
1, 65.78331, 112.9925
2, 71.51521, 136.4873
3, 69.39874, 153.0269
I hope I am doing something utterly silly, but for the life of me, I am unable to figure it out.
Your CSV data contains extraneous spaces after the commas. Because of that, Rust's csv crate treats the quotes around Height(Inches) as part of the field rather than as quoting.
Unfortunately, the lack of standardization around CSV makes both interpretations valid.
You can use trim to get rid of the extra spaces:
let data: &[u8] = include_bytes!("file.csv");
let mut rdr = csv::ReaderBuilder::new().trim(csv::Trim::All).from_reader(data);
But csv does the unquoting before it applies the trim, so this still leaves you with the same problem.
You can additionally disable quoting to at least get the same behaviour on all columns:
let mut rdr = csv::ReaderBuilder::new().quoting(false).trim(csv::Trim::All).from_reader(data);
If you can somehow remove the spaces from your CSV file, it works just fine:
fn main() {
    let data: &[u8] = br#""Index","Height(Inches)","Weight(Pounds)"
1,65.78331,112.9925
2,71.51521,136.4873
3,69.39874,153.0269"#;
    let mut rdr = csv::Reader::from_reader(data);
    let hd = rdr.headers().unwrap();
    println!("{}", hd.as_slice());
    // prints `IndexHeight(Inches)Weight(Pounds)` without any `"`
}
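If the goal is simply to print each header field individually, rather than the concatenated as_slice() output, a hedged sketch along these lines should work with the same crate (the join separator is just an example):
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let data: &[u8] = br#""Index", "Height(Inches)", "Weight(Pounds)"
1, 65.78331, 112.9925"#;
    let mut rdr = csv::ReaderBuilder::new()
        .trim(csv::Trim::All)
        .from_reader(data);
    // iterate the individual header fields instead of calling as_slice();
    // with quoting left on, the fields after the first still keep their quotes
    // because of the extraneous spaces discussed above
    let fields: Vec<&str> = rdr.headers()?.iter().collect();
    println!("{}", fields.join(", "));
    Ok(())
}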

Python3 Replacing special character from .csv file after convert the same from JSON

I am trying to develop a program using Python 3.6.4 which converts a JSON file into a CSV file, and I also need to clean the data in the CSV file. For example:
My JSON File:
{emp:[{"Name":"Bo#b","email":"bob#gmail.com","Des":"Unknown"},
{"Name":"Martin","email":"mar#tin#gmail.com","Des":"D#eveloper"}]}
Problem 1:
After converting that into CSV, a blank row is created between every two rows, like this:
Name email Des
[<BLANK ROW>]
Bo#b bob#gmail.com Unknown
[<BLANK ROW>]
Martin mar#tin#gmail.com D#eveloper
Problem 2:
In my code I am using emp, but I need to determine the key dynamically.
fobj = open("D:/Users/shamiks/PycharmProjects/jsonSamle.txt")
jsonCont = fobj.read()
print(jsonCont)
fobj.close()
employee_parsed = json.loads(jsonCont)
emp_data = employee_parsed['employee']
We will not know the structure or content of the upcoming JSON file.
Problem 3:
I also need to remove all # characters from the CSV file.
For solving Problem 3, you can use .replace (https://www.tutorialspoint.com/python/string_replace.htm).
For Problem 2, you can use the dictionary's keys and then take the first one:
fobj = open("D:/Users/shamiks/PycharmProjects/jsonSamle.txt")
jsonCont = fobj.read().replace("#", "")
print(jsonCont)
fobj.close()
employee_parsed = json.loads(jsonCont)
first_key = list(employee_parsed.keys())[0]  # dict.keys() is not subscriptable in Python 3, so convert it to a list first
emp_data = employee_parsed[first_key]
I can't solve Problem 1 without seeing more of the code that exports the result. It may be that your data has newlines in it; in that case, you could add .replace("\n", "") and/or .replace("\r", "") after the previous replace, so the line would read fobj.read().replace("#", "").replace("\n", "").replace("\r", "").
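For what it's worth, here is a minimal sketch of how the cleaned data could be written out with csv.DictWriter, assuming the input file contains valid JSON (the output file name is made up). Passing newline='' to open() is what the csv documentation recommends; omitting it is a common cause of the extra blank rows on Windows:
import csv
import json

with open("D:/Users/shamiks/PycharmProjects/jsonSamle.txt") as fobj:
    json_cont = fobj.read().replace("#", "")   # Problem 3: strip the # characters

employee_parsed = json.loads(json_cont)
first_key = list(employee_parsed.keys())[0]    # Problem 2: take whatever the top-level key is
emp_data = employee_parsed[first_key]

# newline='' is recommended by the csv docs; leaving it out commonly
# produces blank rows between records on Windows (Problem 1)
with open("employees.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=list(emp_data[0].keys()))
    writer.writeheader()
    writer.writerows(emp_data)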

Parsing a .json column in Power BI

I want to parse a JSON column in Power BI. I have imported the data directly from the server, and it contains a JSON column along with other columns. Is there a way to parse this JSON column?
Example:
Key IDNumber Module JsonResult
012 200 Dine {"CategoryType":"dining","City":"mumbai","Location":"all"}
97 303 Fly {"JourneyType":"Return","Origin":"Mumbai (BOM)","Destination":"Chennai (MAA)","DepartureDate":"20-Oct-2016","ReturnDate":"21-Oct-2016","FlyAdult":"1","FlyChildren":"0","FlyInfant":"0","PromoCode":""}
276 6303 Stay {"Destination":"Clarion Chennai","CheckInDate":"14-Oct-2016","CheckOutDate":"15-Oct-2016","Rooms":"1","NoOfPax":"2","NoOfAdult":"2","NoOfChildren":"0"}
I wish to retain the other columns and also get the simplified parsed columns.
There is an easier way to do it in the Query Editor, on the column you want to read as JSON:
Right click on the column
Select Transform > JSON
The column then becomes a Record that you can split into the individual properties of the JSON using the button in its top right corner.
Use the Json.Document function like this:
let
    ...
    your_table = imported_the_data_directly_from_the_server,
    json = Table.AddColumn(your_table, "NewColName", each Json.Document([JsonResult]))
in
    json
And then expand the record into columns using Table.ExpandRecordColumn, or by clicking the expand button in the column header.
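Here is a hedged sketch of that expand step; the field names are simply the ones visible in the Dine example row and would need to match the keys in your actual JsonResult values:
let
    your_table = imported_the_data_directly_from_the_server,
    json = Table.AddColumn(your_table, "NewColName", each Json.Document([JsonResult])),
    // the field names below are taken from the Dine example and are only illustrative
    expanded = Table.ExpandRecordColumn(json, "NewColName",
        {"CategoryType", "City", "Location"},
        {"CategoryType", "City", "Location"})
in
    expanded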
Use the Json.Document() function to convert the string to JSON data:
let
    Source = Json.Document(Json.Document(Web.Contents("http://localhost:18091/pools/default/buckets/Aggregation/docs/AvgSumAssuredByProduct"))[json]),
    #"Converted to Table" = Record.ToTable(Source),
    #"Filtered Rows" = Table.SelectRows(#"Converted to Table", each not Text.Contains([Name], "type_")),
    #"Renamed Columns" = Table.RenameColumns(#"Filtered Rows",{{"Name", "AvgSumAssuredByProduct"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Value", type number}})
in
    #"Changed Type"
import json
from urllib.request import urlopen

l = []  # entry_id values
j = []  # field1 values

d_base = urlopen('https://api.thingspeak.com/channels/193888/fields/1.json?results=1')
data = json.load(d_base)

for k in data['feeds']:
    name = k['entry_id']
    value = k['field1']
    l.append(name)
    j.append(value)

print(l[0])
print(j[0])
This Python code may be useful to you. For example, it prints 270 and 1035 (the entry_id and field1 of the most recent feed entry).

How to control quoting on non-numerical entries in a csv file?

I am using Python3's csv module and am wondering why I cannot control quoting correctly. I am using the option quoting = csv.QUOTE_NONNUMERIC but am still seeing all entries quoted. Any idea as to why that is?
Here's my code. Essentially, I am reading in a csv file and want to remove all duplicate lines that have the same text string:
import sys
import csv

class Row:
    def __init__(self, row):
        self.text, self.a, self.b = row
        self.elements = row

with open(sys.argv[2], 'w', newline='') as output:
    writer = csv.writer(output, delimiter=';', quotechar='"',
                        quoting=csv.QUOTE_NONNUMERIC)
    with open(sys.argv[1]) as input:
        reader = csv.reader(input, delimiter=';')
        header = next(reader)
        Row.labels = header
        assert Row.labels[1] == 'Label1'
        writer.writerow(header)
        texts = set()
        for row in reader:
            row_object = Row(row)
            if row_object.text not in texts:
                writer.writerow(row_object.elements)
                texts.add(row_object.text)
When I look at the generated file, the content looks like this:
"Label1";"Label2";"Label3"
"AAA";"123";"456"
...
But I want this:
"Label1";"Label2";"Label3"
"AAA";123;456
...
OK ... I figured it out myself. The answer, I am afraid, was rather simple, and obvious in retrospect. Since the content of each line is obtained from a csv.reader(), its elements are strings by default. As a result, they get quoted by the subsequently employed csv.writer().
To be treated as an int, they first need to be cast to an int:
row_object.elements[1]= int(row_object.a)
This explanation can be proven by inserting a type check before and after this cast:
print('Type: {}'.format(type(row_object.elements[1])))
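Applied to the loop above, the fix could look like this (a sketch that assumes the second and third columns always hold integers):
for row in reader:
    row_object = Row(row)
    if row_object.text not in texts:
        # cast the numeric columns so QUOTE_NONNUMERIC leaves them unquoted
        writer.writerow([row_object.text, int(row_object.a), int(row_object.b)])
        texts.add(row_object.text)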

I'm trying to read 3 or more columns from a CSV file and it's giving me an index error. Does anyone have an idea?

//Create a new filereader object, using the context variable so it can be used between test components
context.fileReader = new BufferedReader(new FileReader('C:/data.csv'))
//Read in the first line of the data file
//this is the code for the test case
firstLine = context.fileReader.readLine()
//Split the first line into a string array and assign the array elements to various test case properties
String[] propData = firstLine.split(",")
testCase.setPropertyValue("data1",propData[0])
testCase.setPropertyValue("data2",propData[1])
testCase.setPropertyValue("data3",propData[2])
//Rename request test steps for readability in the log; append the element name to the test step names
testCase.getTestStepAt(0).setName("data1-" + propData[0])
testCase.getTestStepAt(1).setName("data2-" + propData[1])
testCase.getTestStepAt(2).setName("data3-" + propData[2])
//this is the code that reads from the CSV file
context.fileReader = new BufferedReader(new FileReader('C:/data.csv'))
/*Read in the next line of the file
We can use the same fileReader created in the Setup script because it
was assigned to the context variable.*/
nextLine = context.fileReader.readLine()
/*If the end of the file hasn't been reached (nextLine does NOT equal null)
split the line and assign new property values, rename test request steps,
and go back to the first test request step*/
if(nextLine != null){
    String[] propData = nextLine.split(",")
    curTC = testRunner.testCase
    curTC.setPropertyValue("data1",propData[0])
    curTC.setPropertyValue("data2",propData[1])
    curTC.setPropertyValue("data3",propData[2])
    curTC.getTestStepAt(0).setName("data1-" + propData[0])
    curTC.getTestStepAt(1).setName("data2-" + propData[1])
    curTC.getTestStepAt(2).setName("data3-" + propData[2])
    testRunner.gotoStep(0)
}
This is the error I'm getting when trying to read more than 3 columns from the CSV file. Does anyone have any idea?
TestCase failed [java.lang.IndexOutOfBoundsException: Index: 2, Size: 2:java.lang.IndexOutOfBoundsException: Index: 2, Size: 2], time taken = 0
Here is CSV file data:
Hydrogen,1,H,1.00797,20.4
Carbon,6,C,12.0115,5100
Oxygen,8,O,15.9994,90.2
Gold,79,Gd,196.967,3239
Uranium,92,U,238.03,4091
Use OpenCSV instead for parsing CSV files with Java or Groovy. You can add the jar to the Groovy classpath (and dynamically resolve its dependencies) by using Grapes like this:
@Grab(group='com.opencsv', module='opencsv', version='3.3')
You're in luck: it's no problem that you're new to SoapUI, because OpenCSV doesn't have anything to do with SoapUI :)
How to read a CSV file using OpenCSV and Groovy
@Grab('com.opencsv:opencsv:3.5')
import com.opencsv.CSVReader

/*
 * Mock some CSV data
 */
def reader = new StringReader(
'''column1,column2,column3,column4,column5
Hydrogen,1,H,1.00797,20.4
Carbon,6,C,12.0115,5100
Oxygen,8,O,15.9994,90.2
Gold,79,Gd,196.967,3239
Uranium,92,U,238.03,4091''')

/*
 * A nice mapping to give each field in the CSV file a name.
 * Much better than a bunch of propData[n] all over the place.
 */
def field = [
    ELEMENT: 0
]

reader.withReader {
    new CSVReader(it).eachWithIndex { list, index ->
        if (index == 0) {
            /*
             * Do whatever you need to do with the header of the CSV file.
             * Example:
             * testCase.setPropertyValue("data1", list[field.ELEMENT])
             */
        } else {
            /*
             * Do whatever you need to do with the remaining rows.
             * Example:
             * curTC.setPropertyValue("data1", list[field.ELEMENT])
             */
        }
    }
}
Header and Data
You'll notice that in the eachWithIndex() loop there's an if-else. This makes it possible to process the header and then proceed with the remaining rows without having to restart reading the file.
You should be able to plug your SoapUI-specific code into the appropriate section.
Varying number of fields
If for some reason your data rows don't all have the same number of fields, you can check how many fields there are with list.size().
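A hedged sketch of such a guard, reusing the pattern from the answer above (the mock data includes a deliberately short row):
@Grab('com.opencsv:opencsv:3.5')
import com.opencsv.CSVReader

def reader = new StringReader('''column1,column2,column3
Hydrogen,1,H
Carbon,6''')

reader.withReader {
    new CSVReader(it).eachWithIndex { list, index ->
        // skip the header and any row that is too short,
        // avoiding an IndexOutOfBoundsException on short rows
        if (index > 0 && list.size() >= 3) {
            println "${list[0]} / ${list[1]} / ${list[2]}"
        }
    }
}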