How is it possible to get the input of an json file:
{ "name": "John",
"work": "chef",
"age": "29",
"messages": [
{
"msg_name": "Hello",
"msg": "how_are_you"
},
{ "second_msg_name": "hi",
"msg": "fine"
}
]
}
into a Lua table? All json.lua scripts I found did not work for JSON with new lines. Does someone know a solution?
So piglets solution works for the string in the same script.
But how does it work with a JSON file?
local json = require("dkjson")
local file = io.open("C:\\Users\\...\\Documents\\Lua_Plugins\\test_file_reader\\test.json", "r")
local myTable = json.decode(file)
print(myTable)
Here then comes the error "bad argument #1 to 'strfind' (string expected, got FILE*)" on. Does someone see my fault?
local json = require("dkjson")
local yourString = [[{ "name": "John",
"work": "chef",
"age": "29",
"messages": [
{
"msg_name": "Hello",
"msg": "how_are_you"
},
{ "second_msg_name": "hi",
"msg": "fine"
}
]
}]]
local myTable = json.decode(yourString)
http://dkolf.de/src/dkjson-lua.fsl/home
I found the solution:
local json = require("dkjson")
local file = io.open("C:\\Users\\...\\Documents\\Lua_Plugins\\test_file_reader\\test.json", "r")
local content = file:read "*a"
file:close()
local myTable = json.decode(content)
I have a template json file with list of hosts for an example. The same to be replaced with the dynamic value being generated by shell script.
Sample rds.json
{
"lob": "coaching",
"function": "badminton",
"hosts": [
"node1.rds.sports.com",
"node2.rds.sports.com",
"node3.rds.sports.com"
],
"adminserver": "node1.rds.sports.com",
"user_name": "coach",
"sudo_type": "sudo",
"group_name": "admin"
}
echo $myHosts:
"host1.rds.sports.com", "host2.rds.sports.com", "host3.rds.sports.com", "host4.rds.sports.com", "host5.rds.sports.com", "host6.rds.sports.com", "host7.rds.sports.com", "host8.rds.sports.com"
The value of $myHosts should get replaced in hosts elements key.
Desired output:
{
"lob": "coaching",
"function": "badminton",
"hosts": [
"host1.rds.sports.com",
"host2.rds.sports.com",
"host3.rds.sports.com",
"host4.rds.sports.com",
"host5.rds.sports.com",
"host6.rds.sports.com",
"host7.rds.sports.com",
"host8.rds.sports.com"
],
"adminserver": "hosts1.rds.sports.com",
"user_name": "coach",
"sudo_type": "sudo",
"group_name": "admin"
}
I'm going to assume the contents of myHosts is a valid JSON array body.
jq --argjson hosts "[$myHosts]" '.hosts = $hosts | .adminserver = .hosts[0]' rds.json
jqplay
I have two json files. I am validating the response is same or different. I need to show the user where there is an exact change. Some what like the particular key is added or removed or changed in this file.
file1.json
[
{
"Name": "Jack",
"region": "USA",
"tags": [
{
"name": "Name",
"value": "Assistant"
}
]
},
{
"Name": "MATHEW",
"region": "USA",
"tags": [
{
"name": "Name",
"value": "Worker"
}
]
}
]
file2.json
[
{
"Name": "Jack",
"region": "USA",
"tags": [
{
"name": "Name",
"value": "Manager"
}
]
},
{
"Name": "MATHEW",
"region": "US",
"tags": [
{
"name": "Name",
"value": "Assistant"
}
]
}
]
If you see Two JSON you can find the difference as a region in file2.json has changed US and Values changed from manager to assistant and worker. Now I want to show the user that file2.json has some changes like region :US and Manager changed to Assistant.
I have used deepdiff for validating purpose.
from deepdiff import DeepDiff
def difference(oldurl_resp,newurl_resp,file1):
ddiff = DeepDiff(oldurl_resp, newurl_resp,ignore_order=True)
if(ddiff == {}):
print("BOTH JSON FILES MATCH !!!")
return True
else:
print("FAILURE")
output = ddiff
if(output.keys().__contains__('iterable_item_added')):
test = output.get('iterable_item_added')
print('The Resource name are->')
i=[]
for k in test:
print("Name: ",test[k]['Name'])
print("Region: ",test[k]['region'])
msg= (" Name ->"+ test[k]['Name'] +" Region:"+test[k]['region'] +". ")
i.append(msg)
raise JsonCompareError("The json file has KEYS changed!. Please validate for below" +str(i) +"in "+file1)
elif(output.keys().__contains__('iterable_item_removed')):
test2 = output.get('iterable_item_removed')
print('The name are->')
i=[]
for k in test2:
print(test2[k]['Name'])
print(test2[k]['region'])
msg= (" Resource Name ->"+ test2[k]['Name'] +" Region:"+test2[k]['region'] +". ")
i.append(msg)
raise JsonCompareError("The json file has Keys Removed!!. Please validate for below" +str(i)+"in "+file1)
This code just shows the resource Name I want to show the tags also which got changed and added or removed.
Can anybody guide me
If you just print out the values of "test" variables, you will find out that "tag" variable changes are inside of it, test value of test in this example will be:
test = {'root[0]': {'region': 'USA', 'Name': 'Jack', 'tags': [{'name': 'Name', 'value': 'Manager'}]}, 'root[1]': {'region': 'US', 'Name': 'MATHEW', 'tags': [{'name': 'Name', 'value': 'Assistant'}]}}
and you can print test[k]['tags'] or add it your "msg" variable.
Suggestion:
Also, if your data has some primary key (for example they have "id", or their order is always fixed), you can compare their data 1 by 1 (instead of comparing whole lists) and you can have a better comparison. For example if you compare data of "Jack" together, you will have the following comparison:
{'iterable_item_removed': {"root['tags'][0]": {'name': 'Name', 'value': 'Assistant'}}, 'iterable_item_added': {"root['tags'][0]": {'name': 'Name', 'value': 'Manager'}}}
You should try the deepdiff library. It gives you the key where the difference occurs and the old and new value.
from deepdiff import DeepDiff
ddiff = DeepDiff(json_object1, json_object2)
# if you want to compare by ignoring order
ddiff = DeepDiff(json_object1, json_object2, ignore_order=True)
I am trying to solve an issue with the json file that I get from the twitterAPI api.request.
I am trying to extract the links to the media inside of each tweet with this piece of code:
for item in api.request(TWITTER_ENDPOINT, TWITTER_PARAMS):
if 'entities' in item:
if 'media' in item['entities']:
page = item['text']
link = item['entities']['media']['media_url']
elif 'message' in item:
print('ERROR %s: %s\n' % (item['code'], item['message']))
The format of the field in the json file is a str according to the API documentation, and I can print it if I use this:
urls = [user['media_url'] for user in item['entities']['media']]
print(type(urls[0]))
But the problem is that then I have it stored in a list. What would I need to do to create a variable "link" of type string?
The format of the JSON is this:
"entities": {
"hashtags": [],
"symbols": [],
"user_mentions": [],
"urls": [],
"media": [
{
"id": 707667182633619500,
"id_str": "707667182633619456",
"indices": [
23,
46
],
"media_url": "http://pbs.twimg.com/media/CdIjjaAUEAA7-VU.jpg",
I am a beginner in Python, and I would really appreciate a bit of help with this!
I have a dump of the IMDB database in form of a CSV.
The CSV look like this :
name, movie, role
"'El Burro' Van Rankin, Jorge","Serafín (1999)",PLAYED_IN
"'El Burro' Van Rankin, Jorge","Serafín (1999)",PLAYED_IN
"'El Burro' Van Rankin, Jorge","Serafín (1999)",PLAYED_IN
.........
"A.S., Alwi","Rumah masa depan (1984)",PLAYED_IN
"A.S., Giri","Sumangali (1940)",PLAYED_IN
"A.S., Luis","Bob the Drag Queen: Bloodbath (2016)",PLAYED_IN
"A.S., Pragathi","Suli (2016)",PLAYED_IN
"A.S.F. Dancers, The","D' Lucky Ones! (2006)",PLAYED_IN
.........
My goal is to put the data in Elastic Search but I don't want to have duplicate of actors so I want to aggregate the movie they are playing in so that the dataset look like this :
{
"_index": "imdb13",
"_type": "logs",
"_id": "AVmw9JHCrsOFTsZwAmBm",
"_score": 13.028783,
"_source": {
"#timestamp": "2017-01-18T09:42:15.149Z",
"movie": [
"Naomi and Ely's No Kiss List (2015)",
"Staten Island Summer (2015/II)",
"What Happened Last Night (2016)",
...
],
"#version": "1",
"name": "Abernethy, Kevin",
}
}
So I am using Logstash to push the data into ElasticSearch. I use the aggregate plugin and my configuration file is the following :
input {
file {
path => "/home/maeln/imdb-data/roles.csv"
start_position => "beginning"
}
}
filter {
csv {
columns => [ "name", "movie" ]
remove_field => ["role", "message", "host", "column3", "path"]
separator => ","
}
aggregate {
task_id => "%{name}"
code => "
map['movie'] ||= []
event.to_hash.each do |key,value|
map[key] = value unless map.has_key?(key)
map[key] << value if map[key].is_a?(Array)
end
"
push_previous_map_as_event => true
timeout => 30
timeout_tags => ['aggregated']
}
if "aggregated" not in [tags] {
drop {}
}
}
output {
elasticsearch {
hosts => "localhost:9200"
index => "imdb13"
}
}
But then, when I do a simple search on the index, all the actors are duplicated with only one movie in the "movie" field, like this :
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 149,
"max_score": 13.028783,
"hits": [
{
"_index": "imdb13",
"_type": "logs",
"_id": "AVmw9JHCrsOFTsZwAmBm",
"_score": 13.028783,
"_source": {
"#timestamp": "2017-01-18T09:42:15.149Z",
"movie": [
"Naomi and Ely's No Kiss List (2015)"
],
"#version": "1",
"name": "Abernethy, Kevin",
"tags": [
"aggregated"
]
}
},
{
"_index": "imdb13",
"_type": "logs",
"_id": "AVmw9JHCrsOFTsZwAmBq",
"_score": 12.998644,
"_source": {
"#timestamp": "2017-01-18T09:42:15.149Z",
"movie": [
"Staten Island Summer (2015/II)"
],
"#version": "1",
"name": "Abernethy, Kevin",
"tags": [
"aggregated"
]
}
},
{
"_index": "imdb13",
"_type": "logs",
"_id": "AVmw9JHCrsOFTsZwAmBu",
"_score": 12.998644,
"_source": {
"#timestamp": "2017-01-18T09:42:15.150Z",
"movie": [
"What Happened Last Night (2016)"
],
"#version": "1",
"name": "Abernethy, Kevin",
"tags": [
"aggregated"
]
}
},
.....
Is there a way to fix this ?
The log from logstash with the --debug option (only partially, the whole log is around ~1Gio) : paste (I put it on pastebin because of the 30000 chars limit in stackoverflow).
The last lines of the log :
[2017-01-18T11:34:09,977][DEBUG][logstash.filters.csv ] filters/LogStash::Filters::CSV: removing field {:field=>"path"}
[2017-01-18T11:34:09,977][DEBUG][logstash.filters.csv ] filters/LogStash::Filters::CSV: removing field {:field=>"role"}
[2017-01-18T11:34:09,977][DEBUG][logstash.filters.csv ] Event after csv filter {:event=>2017-01-18T10:34:09.900Z %{host} %{message}}
[2017-01-18T11:34:09,977][DEBUG][logstash.filters.csv ] filters/LogStash::Filters::CSV: removing field {:field=>"message"}
[2017-01-18T11:34:09,977][DEBUG][logstash.filters.csv ] filters/LogStash::Filters::CSV: removing field {:field=>"path"}
[2017-01-18T11:34:09,977][DEBUG][logstash.filters.csv ] filters/LogStash::Filters::CSV: removing field {:field=>"host"}
[2017-01-18T11:34:09,977][DEBUG][logstash.pipeline ] output received {"event"=>{"#timestamp"=>2017-01-18T10:34:09.897Z, "movie"=>["Tayong dalawa (2009)"], "#version"=>"1", "name"=>"Anselmuccio, Alex", "tags"=>["aggregated"]}}
[2017-01-18T11:34:09,977][DEBUG][logstash.filters.csv ] Event after csv filter {:event=>2017-01-18T10:34:09.915Z %{host} %{message}}
[2017-01-18T11:34:09,977][DEBUG][logstash.filters.csv ] filters/LogStash::Filters::CSV: removing field {:field=>"column3"}
[2017-01-18T11:34:09,977][DEBUG][logstash.filters.aggregate] Aggregate create_timeout_event call with task_id 'Anson, Christopher'
[2017-01-18T11:34:09,977][DEBUG][logstash.filters.csv ] filters/LogStash::Filters::CSV: removing field {:field=>"path"}
[2017-01-18T11:34:09,977][DEBUG][logstash.util.decorators ] filters/LogStash::Filters::Aggregate: adding tag {"tag"=>"aggregated"}
[2017-01-18T11:34:09,977][DEBUG][logstash.pipeline ] output received {"event"=>{"#timestamp"=>2017-01-18T10:34:09.917Z, "movie"=>["Tabi tabi po! (2001)"], "#version"=>"1", "name"=>"Anson, Alvin", "tags"=>["aggregated"]}}
[2017-01-18T11:34:09,978][DEBUG][logstash.filters.csv ] Event after csv filter {:event=>2017-01-18T10:34:09.921Z %{host} %{message}}
[2017-01-18T11:34:09,978][DEBUG][logstash.filters.aggregate] Aggregate successful filter code execution {:code=>"\n\t\t\t\tmap['movie'] ||= []\n\t\t\t\t\tevent.to_hash.each do |key,value|\n\t\t\t\t\tmap[key] = value unless map.has_key?(key)\n\t\t\t\t\tmap[key] << value if map[key].is_a?(Array)\n\t\t\t\tend\n\t\t\t\t"}
[2017-01-18T11:34:09,978][DEBUG][logstash.pipeline ] output received {"event"=>{"#timestamp"=>2017-01-18T10:34:09.911Z, "movie"=>["21 Jump Street (1987)"], "#version"=>"1", "name"=>"Ansley, Zachary", "tags"=>["aggregated"]}}
[2017-01-18T11:34:09,978][DEBUG][logstash.filters.aggregate] Aggregate create_timeout_event call with task_id 'Anseth, Elias Moussaoui'
[2017-01-18T11:34:09,978][DEBUG][logstash.pipeline ] output received {"event"=>{"#timestamp"=>2017-01-18T10:34:09.897Z, "movie"=>["Tayong dalawa (2009)"], "#version"=>"1", "name"=>"Anselmuccio, Alex", "tags"=>["aggregated"]}}
[2017-01-18T11:34:09,978][DEBUG][logstash.util.decorators ] filters/LogStash::Filters::Aggregate: adding tag {"tag"=>"aggregated"}
[2017-01-18T11:34:09,978][DEBUG][logstash.pipeline ] output received {"event"=>{"#timestamp"=>2017-01-18T10:34:09.917Z, "movie"=>["The Death Match: Fighting Fist of Samurai Joe (2013)"], "#version"=>"1", "name"=>"Anson, Alvin", "tags"=>["aggregated"]}}
[2017-01-18T11:34:09,978][DEBUG][logstash.filters.aggregate] Aggregate successful filter code execution {:code=>"\n\t\t\t\tmap['movie'] ||= []\n\t\t\t\t\tevent.to_hash.each do |key,value|\n\t\t\t\t\tmap[key] = value unless map.has_key?(key)\n\t\t\t\t\tmap[key] << value if map[key].is_a?(Array)\n\t\t\t\tend\n\t\t\t\t"}
[2017-01-18T11:34:09,978][DEBUG][logstash.pipeline ] output received {"event"=>{"#timestamp"=>2017-01-18T10:34:09.917Z, "movie"=>["The Diplomat Hotel (2013)"], "#version"=>"1", "name"=>"Anson, Alvin", "tags"=>["aggregated"]}}
[2017-01-18T11:34:09,978][DEBUG][logstash.filters.aggregate] Aggregate create_timeout_event call with task_id 'Anson, Alvin'
[2017-01-18T11:34:09,978][DEBUG][logstash.pipeline ] output received {"event"=>{"#timestamp"=>2017-01-18T10:34:09.897Z, "movie"=>["Tayong dalawa (2009)"], "#version"=>"1", "name"=>"Anselmuccio, Alex", "tags"=>["aggregated"]}}
[2017-01-18T11:34:09,978][DEBUG][logstash.pipeline ] filter received {"event"=>{"path"=>"/home/maeln/Projets/oracle-of-bacon/imdb-data/roles.csv", "#timestamp"=>2017-01-18T10:34:09.900Z, "#version"=>"1", "host"=>"maeln-GE70-2PE", "message"=>"\"Ansfelt, Jacob\",\"Manden med de gyldne ører (2009)\",PLAYED_IN"}}
[2017-01-18T11:34:09,978][DEBUG][logstash.util.decorators ] filters/LogStash::Filters::Aggregate: adding tag {"tag"=>"aggregated"}
[2017-01-18T11:34:09,978][DEBUG][logstash.filters.aggregate] Aggregate successful filter code execution {:code=>"\n\t\t\t\tmap['movie'] ||= []\n\t\t\t\t\tevent.to_hash.each do |key,value|\n\t\t\t\t\tmap[key] = value unless map.has_key?(key)\n\t\t\t\t\tmap[key] << value if map[key].is_a?(Array)\n\t\t\t\tend\n\t\t\t\t"}
Pastebin with only the line containing logstash.filters.aggregate : link
The issue you're facing relates to the fact that once a line is read it is handed out to a filter+output thread.
If you have several CPUs, several of those threads will be processing your lines in parallel and hence the output order is not guaranteed anymore. More importantly, each of your aggregate filters will be local to a given thread so it's definitely possible that several lines relating to the same actor (even if in order) get processed by different threads in parallel and the output order might differ.
Once solution would be to run logstash with the option -w 1 to only create a single worker thread, but you'll decrease the throughput by doing so.