An ordinary way to make a tuple in Julia is like this:
n = 5
t2 = (n,n) # t2 = (5,5)
t3 = (n,n,n) # t3 = (5,5,5)
I want to make a tuple of arbitrary size functionally.
n = 5
someFunction(n,size) = ???
t10 = someFunction(n,10) # t10 = (5,5,5,5,5,5,5,5,5,5)
How can I realize this?
Any information would be appreciated.
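Maybe what you are looking for is ntuple? ntuple(f, n) builds the tuple (f(1), ..., f(n)), so a function that ignores its argument gives a constant tuple. In your terms, someFunction(n, size) = ntuple(_ -> n, size).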
julia> ntuple(_ -> 5, 10)
(5, 5, 5, 5, 5, 5, 5, 5, 5, 5)
Note that you can also use tuple or Tuple:
julia> tuple((5 for _ in 1:10)...)
(5, 5, 5, 5, 5, 5, 5, 5, 5, 5)
julia> Tuple(5 for _ in 1:10)
(5, 5, 5, 5, 5, 5, 5, 5, 5, 5)
I am converting a nested JSON file with more than 100 records into a flattened CSV file. A sample of the JSON file is shown below:
sampleJson = {
    'record1': {
        'text': [['A', 'fried', 'is', 'a', 'nice', 'companion', '.'],
                 ['The', 'birds', 'are', 'flying', '.']],
        'values': [[0, 1, 0, 0],
                   [1, 1, 0, 1]],
        'pairs': [[0, 2],
                  [2, 1]]
    },
    'record2': {
        'text': [['We', 'can', 'work', 'hard', 'together', '.'],
                 ['Let', 'the', 'things', 'happen', '.'],
                 ['There', 'is', 'always', 'a', 'way', 'out', '.']],
        'values': [[0, 1, 0, 0],
                   [0, 1, 1, 1],
                   [1, 1, 0, 1]],
        'pairs': [[0, 2],
                  [3, 4],
                  [2, 1]]
    },
    # ..... 100 records
}
The CSV structure I want from this nested JSON is:
record1, A friend is a nice companion., 0, 1, 0, 0, [0, 2]
, The birds are flying., 1, 1, 0, 1, [2, 1]
record2, We can work hard together., 0, 1, 0, 0, [0, 2]
, Let the things happen., 0, 1, 1, 1, [3, 4]
, There is always a way out., 1, 1, 0, 1, [2, 1]
record3,
....... up to 100 records
I used the following code to flatten the nested file:
import pandas as pd
def flatten_json(y):
    out = {}
    # Recursively walk dicts and lists; every leaf value gets its own key
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x
    flatten(y)
    return out
flatIt = flatten_json(sampleJson)
df = pd.json_normalize(flatIt)
df.to_csv('outPutFile.csv', encoding='utf-8')
print(df)
Instead, I get a single row with a very long list of columns named like record1.text, record1.values, record1.pairs, record2.text and so on, and each word of every sentence ends up in a separate column.
I would appreciate some help.
Thanks.
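A likely cause is that flatten_json gives every leaf value its own key, and json_normalize turns one flat dict into one row with one column per key. The minimal sketch below runs the same function on a hypothetical one-sentence record named tiny. The only changes are reindentation and enumerate in place of the manual counter, which does not change the behavior.

```python
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            # enumerate replaces the manual i counter; same keys are produced
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            # strip the trailing underscore from the accumulated path
            out[name[:-1]] = x

    flatten(y)
    return out

# Hypothetical tiny input with one sentence of two words
tiny = {'record1': {'text': [['The', 'birds']], 'values': [[1, 0]], 'pairs': [[2, 1]]}}
flat = flatten_json(tiny)
print(flat)
# {'record1_text_0_0': 'The', 'record1_text_0_1': 'birds', 'record1_values_0_0': 1,
#  'record1_values_0_1': 0, 'record1_pairs_0_0': 2, 'record1_pairs_0_1': 1}
```

With 100 records, every word of every sentence therefore becomes a column of one single row. The data needs to be reshaped into one row per sentence instead.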
You can use this example to parse the JSON into a DataFrame:
import pandas as pd
sampleJson = {
    'record1': {
        'text': [['A', 'fried', 'is', 'a', 'nice', 'companion', '.'],
                 ['The', 'birds', 'are', 'flying', '.']],
        'values': [[0, 1, 0, 0],
                   [1, 1, 0, 1]],
        'pairs': [[0, 2],
                  [2, 1]]
    },
    'record2': {
        'text': [['We', 'can', 'work', 'hard', 'together', '.'],
                 ['Let', 'the', 'things', 'happen', '.'],
                 ['There', 'is', 'always', 'a', 'way', 'out', '.']],
        'values': [[0, 1, 0, 0],
                   [0, 1, 1, 1],
                   [1, 1, 0, 1]],
        'pairs': [[0, 2],
                  [3, 4],
                  [2, 1]]
    },
}
all_data = []
for k, v in sampleJson.items():
    texts, values, pairs = v['text'], v['values'], v['pairs']
    for t, val, p in zip(texts, values, pairs):
        all_data.append({
            'record': k,
            'text': ' '.join(t),
            'pairs': p,
            **{'val_{}'.format(i): val_ for i, val_ in enumerate(val, 1)}
        })
df = pd.DataFrame(all_data)
print(df)
This prints the following DataFrame:
    record                           text   pairs  val_1  val_2  val_3  val_4
0  record1  A fried is a nice companion .  [0, 2]      0      1      0      0
1  record1         The birds are flying .  [2, 1]      1      1      0      1
2  record2    We can work hard together .  [0, 2]      0      1      0      0
3  record2        Let the things happen .  [3, 4]      0      1      1      1
4  record2    There is always a way out .  [2, 1]      1      1      0      1
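The DataFrame above keeps the tokenized spacing ("companion .") and repeats the record name on every row. If you need exactly the layout from the question, here is a minimal sketch using the standard csv module. It assumes two things: punctuation tokens should attach to the previous word, and only the first row of each record should carry the record name. The names PUNCT, detokenize and records_to_rows are hypothetical helpers, not part of pandas or csv.

```python
import csv
import io

# Assumption: these tokens attach to the preceding word without a space
PUNCT = {'.', ',', '!', '?', ';', ':'}

def detokenize(tokens):
    """Join tokens with spaces, except before punctuation tokens."""
    out = ''
    for tok in tokens:
        if out and tok not in PUNCT:
            out += ' '
        out += tok
    return out

def records_to_rows(data):
    """Yield one row per sentence; only a record's first row gets its name."""
    for record_name, rec in data.items():
        triples = zip(rec['text'], rec['values'], rec['pairs'])
        for i, (tokens, values, pair) in enumerate(triples):
            yield [record_name if i == 0 else '', detokenize(tokens), *values, str(pair)]

sample = {
    'record1': {
        'text': [['A', 'fried', 'is', 'a', 'nice', 'companion', '.'],
                 ['The', 'birds', 'are', 'flying', '.']],
        'values': [[0, 1, 0, 0], [1, 1, 0, 1]],
        'pairs': [[0, 2], [2, 1]],
    },
}

buf = io.StringIO()
csv.writer(buf).writerows(records_to_rows(sample))
print(buf.getvalue())
```

With the full data, pass sampleJson and write to a file opened with open('outPutFile.csv', 'w', newline='', encoding='utf-8') instead of the StringIO buffer. csv.writer quotes the pairs field (for example "[0, 2]") because it contains a comma; otherwise a CSV reader would split it into two fields.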
I have a CSV file that contains the following data:
A B C
A - 4 5
B 8 - 6
C 2 3 -
I want to generate facts of the following form:
num(a,b,4).
num(a,c,5).
num(b,a,8).
num(b,c,6).
num(c,a,2).
num(c,b,3).
There should be no facts for identical letters, such as num(a,a,-).
I am using Prolog's csv_read_file/3 like this:
csv_read_file(Path, Rows, [functor(num), arity(4)]), maplist(assert, Rows).
and it gives me the following output:
Rows = [num('', 'A', 'B', 'C'), num('A', -, 4, 5), num('B', 8, -, 6), num('C', 2, 3, -)]
It seems like a basic question, but I can't work out the condition needed to do this. Any help will be highly appreciated.
Following Isabelle Newbie's answer, I now have:
open :-
    csv_read_file('Path', Rows, [functor(num), arity(4)]),
    table_entry(Rows, Row).
header_row_entry(Header, Row, Entry) :-
    arg(1, Row, RowName),
    functor(Header, _, Arity),
    between(2, Arity, ArgIndex),
    arg(ArgIndex, Header, ColumnName),
    arg(ArgIndex, Row, Value),
    Entry = num(RowName, ColumnName, Value),
    writeln(Entry).
table_entry(Entries, Entry) :-
    Entries = [Header | Rows],
    member(Row, Rows),
    header_row_entry(Header, Row, Entry).
Now, can anyone explain how and where I should use maplist to turn the rows into facts (ignoring the filtering of '-' and the lowercasing for now), so that when I query:
?- num('A', 'B', X).
I get:
X = 4
My next task is to implement a depth-first search algorithm on it. Any details on this would be highly appreciated.
Consider a table header num('', 'A', 'B', 'C') and a row in the table num('B', 8, -, 6). From this you want to compute a table entry identified by the row's name, which here is 'B', and by a column name: the column name being 'A' for the first value (8), 'B' for the second (-), 'C' for the third (6).
Here's a simple way to do this, involving some typing and the obligatory copy-and-paste errors:
header_row_entry(Header, Row, Entry) :-
    Header = num('', ColumnName, _, _),
    Row = num(RowName, Value, _, _),
    Entry = num(RowName, ColumnName, Value).
header_row_entry(Header, Row, Entry) :-
    Header = num('', _, ColumnName, _),
    Row = num(RowName, _, Value, _),
    Entry = num(RowName, ColumnName, Value).
header_row_entry(Header, Row, Entry) :-
    Header = num('', _, _, ColumnName),
    Row = num(RowName, _, _, Value),
    Entry = num(RowName, ColumnName, Value).
This enumerates all the entries in a row on backtracking:
?- Header = num('', 'A', 'B', 'C'), Row = num('B', 8, -, 6),
header_row_entry(Header, Row, Entry).
Header = num('', 'A', 'B', 'C'),
Row = num('B', 8, -, 6),
Entry = num('B', 'A', 8) ;
Header = num('', 'A', 'B', 'C'),
Row = num('B', 8, -, 6),
Entry = num('B', 'B', -) ;
Header = num('', 'A', 'B', 'C'),
Row = num('B', 8, -, 6),
Entry = num('B', 'C', 6).
To enumerate all the entries in an entire table, it remains to enumerate all rows, then enumerate the entries of each row as above. Here it is:
table_entry(Entries, Entry) :-
    Entries = [Header | Rows],
    member(Row, Rows),
    header_row_entry(Header, Row, Entry).
And now, given your table:
?- Table = [num('', 'A', 'B', 'C'), num('A', -, 4, 5), num('B', 8, -, 6), num('C', 2, 3, -)], table_entry(Table, Entry).
Table = [num('', 'A', 'B', 'C'), num('A', -, 4, 5), num('B', 8, -, 6), num('C', 2, 3, -)],
Entry = num('A', 'A', -) ;
Table = [num('', 'A', 'B', 'C'), num('A', -, 4, 5), num('B', 8, -, 6), num('C', 2, 3, -)],
Entry = num('A', 'B', 4) ;
Table = [num('', 'A', 'B', 'C'), num('A', -, 4, 5), num('B', 8, -, 6), num('C', 2, 3, -)],
Entry = num('A', 'C', 5) ;
Table = [num('', 'A', 'B', 'C'), num('A', -, 4, 5), num('B', 8, -, 6), num('C', 2, 3, -)],
Entry = num('B', 'A', 8) ;
Table = [num('', 'A', 'B', 'C'), num('A', -, 4, 5), num('B', 8, -, 6), num('C', 2, 3, -)],
Entry = num('B', 'B', -) . % etc.
Depending on what you want exactly, it remains to lowercase the row and column names (the irritatingly named downcase_atom in SWI-Prolog, for example) and filter out the - entries. You can then assert the entries using a failure-driven loop or by collecting all of them using findall and asserting using maplist.
Now that we have a working solution, we might want header_row_entry to be a bit nicer. We can use arg/3 to capture more explicitly that we are trying to pair a column name and a value that are at the same argument position in their respective header and row terms:
header_row_entry(Header, Row, Entry) :-
    arg(1, Row, RowName),
    functor(Header, _, Arity),
    between(2, Arity, ArgIndex),
    arg(ArgIndex, Header, ColumnName),
    arg(ArgIndex, Row, Value),
    Entry = num(RowName, ColumnName, Value).
This is shorter than the above and applicable to any number of columns in the table.
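To cross-check which facts the Prolog program should end up asserting, here is the same transformation sketched in Python. It pairs each header cell with the row cell at the same position, filters out '-' entries and the diagonal, and lowercases the names. It assumes the file is comma-separated with an empty top-left cell, which is consistent with the num('', 'A', 'B', 'C') header row that csv_read_file returned. csv_to_facts is a hypothetical helper for illustration, not a replacement for the Prolog solution.

```python
import csv
import io

def csv_to_facts(text):
    """Hypothetical helper: turn the square table into Prolog fact strings."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # skip blank lines
    header, body = rows[0], rows[1:]
    facts = []
    for row in body:
        row_name = row[0].strip().lower()
        # Pair each column name with the value at the same position (like arg/3)
        for col_name, value in zip(header[1:], row[1:]):
            col_name, value = col_name.strip().lower(), value.strip()
            if value == '-' or col_name == row_name:
                continue  # no facts for missing entries or identical letters
            facts.append(f"num({row_name},{col_name},{value}).")
    return facts

table = ",A,B,C\nA,-,4,5\nB,8,-,6\nC,2,3,-\n"
for fact in csv_to_facts(table):
    print(fact)
```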
I ran the code below:
a = ['dog', 'in', 'plants', 'crouches', 'to', 'look', 'at', 'camera']
b = ['a', 'brown', 'dog', 'in', 'the', 'grass', ' ', ' ']
from nltk.translate.bleu_score import corpus_bleu
bleu1 = corpus_bleu(a, b, weights=(1.0, 0, 0, 0))
print(bleu1)
This is the warning I get:
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
  warnings.warn(_msg)
Can someone tell me what the problem is here? I cannot find a solution on Google. Thank you.
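I found the solution. corpus_bleu expects a list of references for each hypothesis (each reference being a list of tokens) and a list of hypotheses (each a list of tokens). With two flat lists of words, each word in a was treated as the references for one hypothesis and each word in b as a hypothesis, so NLTK compared characters rather than words. For a single sentence pair, both arguments need an extra level of nesting:
from nltk.translate.bleu_score import corpus_bleu
a = [[['dog', 'in', 'plants', 'crouches', 'to', 'look', 'at', 'camera']]]  # one hypothesis with one reference
b = [['a', 'brown', 'dog', 'in', 'the', 'grass', ' ', ' ']]  # one tokenized hypothesis
bleu1 = corpus_bleu(a, b, weights=(1.0, 0, 0, 0))
print(bleu1)
NLTK may still print the 0-count warning for the higher n-gram orders. With weights=(1.0, 0, 0, 0) those orders carry zero weight, so the warning does not affect the score.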
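To see what weights=(1.0, 0, 0, 0) actually measures, here is a minimal pure-Python sketch of BLEU-1 for a single hypothesis: clipped unigram precision multiplied by the brevity penalty. It is an illustration, not NLTK's implementation (NLTK's corpus_bleu pools counts over all sentences before dividing), and the function name bleu1 is hypothetical. Its arguments mirror one element of corpus_bleu's inputs: a list of reference token lists and one hypothesis token list.

```python
import math
from collections import Counter

def bleu1(references, hypothesis):
    """Hypothetical helper: BLEU-1 for one hypothesis against its references."""
    if not hypothesis:
        return 0.0
    # For every token, the highest count it reaches in any single reference
    max_ref_counts = Counter()
    for ref in references:
        for tok, cnt in Counter(ref).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], cnt)
    # Clip each hypothesis token count by that maximum, divide by hypothesis length
    clipped = sum(min(cnt, max_ref_counts[tok]) for tok, cnt in Counter(hypothesis).items())
    precision = clipped / len(hypothesis)
    # Brevity penalty against the reference length closest to the hypothesis length
    c = len(hypothesis)
    r = min((len(ref) for ref in references), key=lambda n: (abs(n - c), n))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * precision

reference = ['dog', 'in', 'plants', 'crouches', 'to', 'look', 'at', 'camera']
hypothesis = ['a', 'brown', 'dog', 'in', 'the', 'grass', ' ', ' ']
print(bleu1([reference], hypothesis))  # 'dog' and 'in' match: 2 of 8 tokens -> 0.25
```

Both sentences have 8 tokens, so the brevity penalty is 1 and the score is simply 2/8. This should be the same value corpus_bleu reports for the corrected nesting above.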
I'm practicing NLP with the nltk library and want to build a dataset for that. I combine several documents into a list and then preprocess them: first I tokenize them, then lowercase them, and then I want to remove punctuation. It works for a vector, but not for a list of lists:
Example for a vector:
from nltk.tokenize import word_tokenize
a = 'This is a Testsentence and it is beautiful times 10.'
b = word_tokenize(a)
c = [x.lower() for x in b]
# c == ['this', 'is', 'a', 'testsentence', 'and', 'it', 'is', 'beautiful', 'times', '10', '.']
d = [x for x in c if x.isalpha()]
# d == ['this', 'is', 'a', 'testsentence', 'and', 'it', 'is', 'beautiful', 'times']
Now I want to do the same for a list of lists, but I can't get the final list comprehension right:
aa = 'This is a Testsentence and it is beautiful times 10.'
bb = 'It is a beautiful Testsentence?'
cc = 'Testsentence beautiful!'
dd = [aa, bb, cc]
ee = [word_tokenize(x) for x in dd]
ff = [[x.lower() for x in y] for y in ee]
# ff == [['this', 'is', 'a', 'testsentence', 'and', 'it', 'is', 'beautiful', 'times', '10', '.'], ['it', 'is', 'a', 'beautiful', 'testsentence', '?'], ['testsentence', 'beautiful', '!']]
This is where my problem starts: I can't figure out how to write the list comprehension correctly.
gg = [[j.isalpha() for j in i] for i in ff]
This is the result:
[[True, True, True, True, True, True, True, True, True, False, False], [True, True, True, True, True, False], [True, True, False]]
But I want something like this:
[['this', 'is', 'a', 'testsentence', 'and', 'it', 'is', 'beautiful', 'times'], ['it', 'is', 'a', 'beautiful', 'testsentence'], ['testsentence', 'beautiful']]
Thanks :)
Try the following:
gg = [[j for j in i if j.isalpha()] for i in ff]
This returns the expected result:
[['this', 'is', 'a', 'testsentence', 'and', 'it', 'is', 'beautiful', 'times'],
['it', 'is', 'a', 'beautiful', 'testsentence'],
['testsentence', 'beautiful']]
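The general rule is that the expression before for decides what each kept element is (here j itself, not j.isalpha()), and a trailing if decides which elements are kept. Below is a small sketch with hypothetical helper names, keep_alpha and keep_alpha_flat. The second variant uses two for clauses, which read left to right like nested for loops, and gives a single flat list in case one list for the whole corpus is ever needed.

```python
def keep_alpha(docs):
    """Keep only purely alphabetic tokens, one inner list per document."""
    return [[tok for tok in doc if tok.isalpha()] for doc in docs]

def keep_alpha_flat(docs):
    """Same filter, flattened into one list; outer loop first, inner loop second."""
    return [tok for doc in docs for tok in doc if tok.isalpha()]

ff = [['this', 'is', 'a', 'testsentence', 'and', 'it', 'is', 'beautiful', 'times', '10', '.'],
      ['it', 'is', 'a', 'beautiful', 'testsentence', '?'],
      ['testsentence', 'beautiful', '!']]
print(keep_alpha(ff))
print(keep_alpha_flat(ff))
```

Note that isalpha() also drops number tokens such as '10', which matches the result above.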