Import csv file data to populate a Prolog knowledge base

I have a csv file example.csv which contains two columns with the headers var1 and var2.
I want to populate an initially empty Prolog knowledge base file import.pl with one fact per row, with every row of example.csv handled the same way:
fact(A1, A2).
fact(B1, B2).
fact(C1, C2).
How can I code this in SWI-Prolog?
EDIT, based on the answer from @Shevliaskovic:
:- use_module(library(csv)).
import:-
csv_read_file('example.csv', Data, [functor(fact), separator(0';)]),
maplist(assert, Data).
When import. is run in the console, the knowledge base is updated as requested, except that the facts are asserted directly in memory rather than written to a file that is consulted afterwards. Note that the header row is asserted as well, as fact(var1, var2).
Check setof([X, Y], fact(X,Y), Z). :
Z = [['A1', 'A2'], ['B1', 'B2'], ['C1', 'C2'], [var1, var2]].

SWI-Prolog has a built-in predicate for this:
csv_read_file(+File, -Rows)
Or you can add some options:
csv_read_file(+File, -Rows, +Options)
See the documentation for more information.
Here is the example that the documentation has:
Suppose we want to create a predicate table/6 from a CSV file that we
know contains 6 fields per record. This can be done using the code
below. Without the option arity(6), this would generate a predicate
table/N, where N is the number of fields per record in the data.
?- csv_read_file(File, Rows, [functor(table), arity(6)]),
maplist(assert, Rows).
For example:
If you have a File.csv that looks like:
A1 A2
B1 B2
C1 C2
You can import it to SWI like:
?- csv_read_file('File.csv', Data).
The result would be:
Data = [row('A1', 'A2'), row('B1', 'B2'), row('C1', 'C2')].

Related

Stata read numeric data as string using variable names

I am reading a csv file into Stata using
import delimited "../data_clean/winter20.csv", encoding(UTF-8)
The raw data looks like:
y id1
-.7709586 000000000020
-.4195721 000000003969
-.8932499 300000000021
-1.256116 200000007153
-.7858037 000000000000
The imported data become:
y id1
-.7709586 20
-.4195721 000000003969
-.8932499 300000000021
-1.256116 200000007153
-.7858037 0
However, some ID columns are read as numeric. I would like to import them as strings, so that the data are read exactly as they appear in the raw file.
The way I found online is:
import delimited "/Users/tianwang/Dropbox/Construction/data_clean/winter20.csv", encoding(UTF-8) stringcols(74 97 116) clear
However, the raw data may be updated and column numbers may change. The following
import delimited "/Users/tianwang/Dropbox/Construction/data_clean/winter20.csv", encoding(UTF-8) stringcols(id1 id2 id3) clear
gives error id1: invalid numlist in stringcols() option. Is there a way to specify variable names rather than column numbers?
The reason is that leading zeros are lost if I read the IDs as numeric. tostring does not recover the leading zeros, and format id1 %09.0f only works if all values have the same number of digits.
I think this should do it.
import delimited "../data_clean/winter20.csv", stringcols(_all) encoding(UTF-8) clear
PS: Tested in Stata16/Win10

Importing a CSV file as a matrix

I would like to import a CSV file (file.csv) as a matrix in Julia to plot it as a heatmap using GR. My CSV file contains 255 rows with 255 entries in each row. Here are some entries from the CSV file to illustrate the format of the rows:
file.csv
-1.838713563526794E-8;-1.863045549663876E-8;-2.334704481052452E-8 ...
-1.7375447279939282E-8;-1.9194929690414267E-8;-2.0258124812468942E-8; ...
⋮
-1.1706980663321613E-8;-1.6244768693064608E-8;-5.443335580296977E-9; ...
Note: The ellipses (...) are not part of the CSV file; they indicate that entries have been omitted.
I have tried importing the file as a matrix using the following line m = CSV.read("./file.csv"), but this results in a 255 by 1 vector rather than the 255 by 255 matrix. Does anyone know of an effective way to import CSV files as matrices in Julia?
You can use
using DelimitedFiles
m = readdlm("./file.csv", ';', Float64)
(last argument specifying type can be omitted if you want Float64)
m = CSV.read("./file.csv") returns a DataFrame.
If CSV.jl reads the file correctly so that all the columns of m are of type Float64 containing no missings, then you can convert m to a Float64 matrix with Matrix{Float64}(m), or obtain the matrix with one line:
m = Matrix{Float64}(CSV.read("./file.csv", header=0, delim=';'))
# or with piping syntax
m = CSV.read("./file.csv", header=0, delim=';') |> Matrix{Float64}
readdlm, though, should normally be enough, and it is the first solution to try for simple CSV files like yours.
2022 Answer
Not sure if there has been a change to CSV.jl, however, if I do CSV.read("file.csv") it will error
provide a valid sink argument, like 'using DataFrames; CSV.read(source, DataFrame)'
You can however use the fact that it wants any Tables.jl compatible type:
using CSV, Tables
M = CSV.read("file.csv", Tables.matrix, header=0)

Selectively Import only Json data in txt file into R.

I have 3 questions I would like to ask, as I am relatively new to both R and the JSON format. I have read quite a bit, but I still don't quite understand.
1.) Can R parse JSON data when the txt file contains other irrelevant information as well?
Assuming I can't, I uploaded the text file into R and did some cleaning up. So that it will be easier to read the file.
require(plyr)
require(rjson)
small.f.2 <- subset(small.f.1, ! V1 %in% c("Level_Index:", "Feature_Type:", "Goals:", "Move_Count:"))
small.f.3 <- small.f.2[,-1]
This would give me a single column with all the json data in each line.
I tried to write a new .txt file.
write.table(small.f.3, file="small clean.txt", row.names = FALSE)
json_data <- fromJSON(file="small.clean")
The problem was it only converted 'x' (first row) into a character and ignored everything else. I imagined it was the problem with "x" so I took that out from the .txt file and ran it again.
json_data <- fromJSON(file="small clean copy.txt")
small <- fromJSON(paste(readLines("small clean copy.txt"), collapse=""))
Both times it worked and I managed to create a list, but it only takes the data from the first row and ignores the rest. This leads to my second question.
I tried this..
small <- fromJSON(paste(readLines("small clean copy.txt"), collapse=","))
Error in fromJSON(paste(readLines("small clean copy.txt"), collapse = ",")) :
unexpected character ','
2.) How can I extract the rest of the rows in the .txt file?
3.) Is it possible for R to read the JSON data from one row, extract only the nested data that I need, and then go on to the next row, like a loop? For example, in each array, I am only interested in the Action vectors and the State Feature vectors, not in the rest of the data. If I can somehow extract only the information I need before moving on to the next array, then I can save a lot of memory.
I validated the array online. However, the .txt file as a whole is not valid JSON; only each individual array is. I hope this makes sense. Each row is a nested array.
The data looks something like this. I have about 65 rows (nested arrays) in total.
{"NonlightningIndices":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],"LightningIndices":[],"SelectedAction":12,"State":{"Features":{"Data":[21.0,58.0,0.599999964237213,12.0,9.0,3.0,1.0,0.0,11.0,2.0,1.0,0.0,0.0,0.0,0.0]}},"Actions":[{"Features":{"Data":[4.0,4.0,1.0,1.0,0.0,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12213890532609,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.13055793241076,0.0,0.0,0.0,0.0,0.0,0.231325346416068,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.949158357257511,0.0,0.0,0.0,0.0,0.0,0.369666537828737,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0851765937900996,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.223409208023677,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.698640447815897,1.69496718435102,0.0,0.0,0.0,0.0,1.42312654023416,0.0,0.38394999584831,0.0,0.0,0.0,0.0,1.0,1.22164326251584,1.30980246401454,1.00411570750454,0.0,0.0,0.0,1.44306759429513,0.0,0.00568191150434618,0.0,0.0,0.0,0.0,0.0,0.0,0.157705869690127,0.0,0.0,0.0,0.0,0.102089274086033,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.37039305683305,2.64354332879095,0.0,0.456876463171171,0.0,0.0,0.208651305680117,0.0,0.0,0.0,0.0,0.0,2.0,0.0,3.46713142511126,2.26785558685153,0.284845692694476,0.29200364444299,0.0,0.562185300773834,1.79134869431988,0.423426746571872,0.0,0.0,0.0,0.0,5.06772310533214,0.0,1.95593334724537,2.08448537685298,1.22045520912269,0.251119892385839,0.0,4.86192274732091,0.0,0.186941346075472,0.0,0.0,0.0,0.0,4.37998688020614,0.0,3.04406665275463,1.0,0.49469909818283,0.0,0.0,1.57589195190525,0.0,0.0,0.0,0.0,0.0,0.0,3.55229001446173]}},......
{"NonlightningIndices":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,24],"LightningIndices":[[15,16,17,18,19,20,21,22,23]],"SelectedAction":15,"State":{"Features":{"Data":[20.0,53.0,0.0,11.0,10.0,2.0,1.0,0.0,12.0,2.0,1.0,0.0,0.0,1.0,0.0]}},"Actions":[{"Features":{"Data":[4.0,4.0,1.0,1.0,0.0,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.110686363475575,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.13427913742728,0.0,0.0,0.0,0.0,0.0,0.218834141070836,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.939443046803111,0.0,0.0,0.0,0.0,0.0,0.357568892126985,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0889329732996782,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.22521492930721,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.700441220022084,1.6762090551226,0.0,0.0,0.0,0.0,1.44526456614638,0.0,0.383689214317325,0.0,0.0,0.0,0.0,1.0,1.22583659574753,1.31795156033445,0.99710368703165,0.0,0.0,0.0,1.44325394830013,0.0,0.00418600599483917,0.0,0.0,0.0,0.0,0.0,0.0,0.157518319482216,0.0,0.0,0.0,0.0,0.110244186273209,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.369899973785845,2.55505143302811,0.0,0.463342609296841,0.0,0.0,0.226088384842823,0.0,0.0,0.0,0.0,0.0,2.0,0.0,3.47842109127488,2.38476342332125,0.0698115810371108,0.276804206873942,0.0,1.53514282355593,1.77391161515718,0.421465101754304,0.0,0.0,0.0,0.0,4.45530484778828,0.0,1.43798302409155,3.46965807176681,0.468528940277049,0.259853183829217,0.0,4.86988325473155,0.0,0.190659677933533,0.0,0.0,0.963116148760181,0.0,4.29930830894124,0.0,2.56201697590845,0.593423384852181,0.46165947868584,0.0,0.0,1.59497392171253,0.0,0.0,0.0,0.0,0.0368838512398189,0.0,4.24538684327048]}},......
I would really appreciate any advice here.
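The row-by-row extraction asked about in question 3 can be sketched as follows. This is a minimal illustration in Python rather than R, and it assumes that every line of the cleaned file is one complete JSON object with the State/Actions layout shown above (the rows as posted are truncated, so the two sample lines below are shortened, made-up stand-ins). Joining the lines with commas fails because the result is several top-level objects in a row, which is not a single JSON document; parsing each line on its own avoids that. The function name extract_features and the output keys are hypothetical.

```python
import json

def extract_features(lines):
    """Parse one JSON object per line and keep only the feature vectors."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        obj = json.loads(line)  # each line is parsed independently
        records.append({
            # state feature vector of this row
            "state": obj["State"]["Features"]["Data"],
            # one feature vector per entry in "Actions"
            "actions": [a["Features"]["Data"] for a in obj["Actions"]],
        })
    return records

# Shortened stand-ins for two rows of the file (assumed layout).
sample = [
    '{"SelectedAction": 12, "State": {"Features": {"Data": [21.0, 58.0]}}, '
    '"Actions": [{"Features": {"Data": [4.0, 4.0]}}]}',
    '{"SelectedAction": 15, "State": {"Features": {"Data": [20.0, 53.0]}}, '
    '"Actions": [{"Features": {"Data": [4.0, 1.0]}}]}',
]
records = extract_features(sample)
```

Because only the two vectors are kept from each row, the rest of each parsed object can be discarded before the next line is read.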

Write data to CSV file from swi-prolog code

How can I write data to CSV file from the prolog code below?
Thanks,
SB
run:-
member(A,[a,b,c,d,e]),
member(B,[1,2,3,4,5,6]),
write(A),write(' '),write(B),nl,
fail.
run.
Simple solution
Since you are using SWI-Prolog, you can use the CSV library.
?- use_module(library(csv)).
?- findall(row(A,B), (member(A, [a,b,c,d,e]), member(B, [1,2,3,4,5])), Rows), csv_write_file('output.csv', Rows).
As you can see, I do this in two steps:
I create terms of the form row(A, B).
Then I hand those terms to csv_write_file/2 which takes care of creating a syntactically correct output file.
Non-standard separators
In your question you are not writing a comma between A and B but a space. If you really want to use the space as a separator you can set this as an option:
csv_write_file('output.csv', Rows, [separator(0' )])
'Unbalanced' arguments
Also, in your question you have more values for B than for A. You can write code that handles this, and there are several ways to deal with it. E.g., (1) you can fill missing cells with nil; (2) you can throw an exception if same_length(As, Bs) fails; (3) you can write only the 'full' rows:
length(As0, N1),
length(Bs0, N2),
N is min(N1, N2),
length(As, N),
append(As, _, As0),
length(Bs, N),
append(Bs, _, Bs0),
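For comparison, the three ways of handling columns of unequal length can be sketched side by side. This is an illustration in Python rather than Prolog, under the assumption that the two lists are meant to be paired up as columns; the names padded, strict_rows and full_rows are hypothetical.

```python
from itertools import zip_longest

letters = ["a", "b", "c", "d", "e"]
numbers = [1, 2, 3, 4, 5, 6]

# (1) Fill the missing cells of the shorter column with a placeholder.
padded = list(zip_longest(letters, numbers, fillvalue="nil"))

# (2) Refuse to pair the columns unless they have the same length.
def strict_rows(xs, ys):
    if len(xs) != len(ys):
        raise ValueError("columns have different lengths")
    return list(zip(xs, ys))

# (3) Keep only the 'full' rows: zip stops at the shorter column.
full_rows = list(zip(letters, numbers))
```

Option (3) corresponds to truncating both lists to the length of the shorter one before building the rows.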

Reading XML data into R from a html source

I'd like to import data into R from a given webpage, say this one.
In the source code (but not on the actual page), the data I'd like to get is stored in a single line of javascript code which starts like this:
chart_Line1.setDataXML("<graph rotateNames (stuff omitted) >
<set value='699.99' name='16.02.2013' />
<set value='731.57' name='18.02.2013' />
<set value='more values' name='more dates' />
...
<trendLines> (now a different command starts, stuff omitted)
</trendLines></graph>")
(Note that I've included line breaks for readability; the data is in one single line in the original file. It would suffice to import only the line which starts with chart_Line1.setDataXML - it's line 56 in the source if you want to have a look yourself)
I can read the whole html file into a string using scan("URLofFile", what="raw"), but how do I extract the data from this?
Can I specify the data format with what="...", keeping in mind that there are no line breaks to separate the data, but several line breaks in the irrelevant prefix and suffix?
Is this something which can be done in a nice way using R tools, or do you suggest that this data acquisition should rather be done with a different script?
With some trial & error, I was able to find the exact line where the data is contained. I read the whole html file, and then dispose of all other lines.
require(zoo)
require(stringr)
# get html data, discard all lines but the interesting one
theurl <- "https://www.magickartenmarkt.de/Black_Lotus_Unlimited.c1p5093.prod"
sec <- scan(file =theurl, what = "character", sep="\n")
sec <- sec[45]
# extract all strings of the form "value='X'", where X is a 1 to 3 digit number with some separator and 2 decimal places
values <- str_extract_all(sec, "value='[0-9]{1,3}.[0-9]{2}'")
# dispose of all non-numerical, non-separator values
values <- str_replace_all(unlist(values),"[^0-9/.]","")
# get all dates in the form "name='DD.MM.YYYY'"
dates <- str_extract_all(sec, "name='[0-9]{2}.[0-9]{2}.[0-9]{4}'")
# dispose of all non-numerical, non-separator values
dates <- str_replace_all(unlist(dates),"[^0-9/.]","")
# convert dates to canonical format
dates <- as.Date(dates,format="%d.%m.%Y")
# put values and dates into a list of ordered observations, converting the values from characters to numbers first.
MyZoo <- zoo(as.numeric(values),dates)
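The same extraction idea can also be written with capture groups, so that each value and its date are pulled out together and no second clean-up pass is needed. The sketch below is in Python rather than R and runs on a shortened stand-in for the line (an assumption; the real line contains more attributes and entries). Unlike the patterns above, it escapes the dots so they match only a literal "." and it does not limit the number of digits before the decimal point.

```python
import re
from datetime import datetime

# Shortened stand-in for the single line of the page source (assumed layout).
line = (
    "chart_Line1.setDataXML(\"<graph>"
    "<set value='699.99' name='16.02.2013' />"
    "<set value='731.57' name='18.02.2013' />"
    "</graph>\")"
)

# One match per <set .../> element: (value, date) as strings.
pairs = re.findall(
    r"<set value='(\d+\.\d{2})' name='(\d{2}\.\d{2}\.\d{4})'", line
)

# Convert to (date, number) observations, ordered by date.
series = sorted(
    (datetime.strptime(d, "%d.%m.%Y").date(), float(v)) for v, d in pairs
)
```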