I'm trying to process the following with a JSON Input step:
{"address":[
{"AddressId":"1_1","Street":"A Street"},
{"AddressId":"1_101","Street":"Another Street"},
{"AddressId":"1_102","Street":"One more street", "Locality":"Buenos Aires"},
{"AddressId":"1_102","Locality":"New York"}
]}
However this seems not to be possible:
Json Input.0 - ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) :
The data structure is not the same inside the resource!
We found 1 values for json path [$..Locality], which is different that the number retourned for path [$..Street] (3509 values).
We MUST have the same number of values for all paths.
The step provides an Ignore Missing Path flag, but it only works if all the rows miss the same path. In that case the step acts as expected and fills the missing values with null.
This limits the power of this step to read uneven data, which was really one of my priorities.
My step Fields are defined as follows:
Am I missing something? Is this the correct behavior?
What I have done is use JSON Input with $.address[*] to read the full map of each element into a jsonRow field, e.g.:
{"address":[
{"AddressId":"1_1","Street":"A Street"},
{"AddressId":"1_101","Street":"Another Street"},
{"AddressId":"1_102","Street":"One more street", "Locality":"Buenos Aires"},
{"AddressId":"1_102","Locality":"New York"}
]}
This results in 4 jsonRows, one for each element, e.g. jsonRow = {"AddressId":"1_101","Street":"Another Street"}. Then, using a JavaScript step, I map my values like this:
var AddressId = getFromMap('AddressId', jsonRow);
var Street = getFromMap('Street', jsonRow);
var Locality = getFromMap('Locality', jsonRow);
In a second script tab I inserted minified JSON parse code from https://github.com/douglascrockford/JSON-js and the getFromMap function:
function getFromMap(key, jsonRow) {
    // Parse the incoming JSON string; route unparsable rows to the step's error handling.
    try {
        var map = JSON.parse(jsonRow);
    }
    catch (e) {
        var message = "Unparsable JSON: " + jsonRow + " Desc: " + e.message;
        var nr_errors = 1;
        var field = "jsonRow";
        var errcode = "JSON_PARSE";
        _step_.putError(getInputRowMeta(), row, nr_errors, message, field, errcode);
        trans_Status = SKIP_TRANSFORMATION;
        return null;
    }
    // Missing keys simply become null, which is what the JSON Input step refused to do.
    if (map[key] == undefined) {
        return null;
    }
    trans_Status = CONTINUE_TRANSFORMATION;
    return map[key];
}
You can solve this by changing the JSONPath and splitting the work into two JSON Input steps. The following website explains a lot about JSONPath: http://goessner.net/articles/JsonPath/
$..AddressId
does in fact return all the AddressIds in the address array. BUT, since Pentaho uses grid rows for input and output [4 rows x 3 columns], it cannot handle a missing (null) value when you ask it to return all the Streets (3 rows) and all the Localities (2 rows), simply because there are no null values in the array itself. It's like trying to drive out of your garage with 3 wheels on your car instead of the usual 4.
I guess your script returns rows like this (where X is a null value):
A S X
A S X
A S L
A X L
The scripting step can be avoided altogether by changing the Fields path of the first JSON Input step to:
$.address[*]
This retrieves all 4 address lines. Then create a second JSON Input step that reads from the new source field containing the address line(s) and retrieves the address details per line:
$.AddressId
$.Street
$.Locality
This yields null values for the four address lines whenever an address detail is not available in a line.
I'm new to Python and am using Python 3 to display the data from my weather station.
The problem I have is that it used to work perfectly until I got a replacement station.
I found the problem.
In the weather data sent there are 3 fields (not sure of the correct name); they are:
lightning_strike_last_distance
lightning_strike_last_distance_msg
lightning_strike_last_epoch
In my new station these fields are completely missing, as there has been no lightning since I got the new one.
As a result the station display just doesn't parse the weather data, because those fields are not there.
How can I get the program to check whether those fields/elements (or whatever the correct name is) are present, parse them as usual if they are there,
but skip them and move on to the next section if they are not?
This is the relevant section of code
lightning_strike_last_distance = forecast_json["current_conditions"]["lightning_strike_last_distance"]
lightning1 = lightning_strike_last_distance*0.621371 # Convert km to miles
data.lightning_strike_last_distance = "{0:.2f} miles".format(lightning1)
lightning_strike_last_epoch = forecast_json["current_conditions"]["lightning_strike_last_epoch"]
data.lightning_strike_last_epoch = time.strftime("%d-%m-%Y %H:%M:%S", time.localtime(lightning_strike_last_epoch))
How can I fix it so the program skips those 3 elements/sections if they are missing?
Try the following pattern:
lightning_strike_last_distance = forecast_json["current_conditions"]["lightning_strike_last_distance"] if "lightning_strike_last_distance" in forecast_json["current_conditions"] else None
It will set lightning_strike_last_distance to the value if it is present, and set it to None if it is not present.
Repeat that pattern for all other assignments.
To test it quickly, try:
data = {"a":{"b":1,},}
valueB = data["a"]["b"] if "b" in data["a"] else None
valueC = data["a"]["c"] if "c" in data["a"] else None
print (valueB)
print (valueC)
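Applied to the three lightning fields from the question, the same idea can also be written with dict.get, which returns None for missing keys. The if guards below are an assumption about how you want to skip the display when the data is absent; this is a sketch, not something tested against your station:

current = forecast_json["current_conditions"]

# dict.get returns None instead of raising KeyError when the key is missing.
lightning_strike_last_distance = current.get("lightning_strike_last_distance")
lightning_strike_last_epoch = current.get("lightning_strike_last_epoch")

if lightning_strike_last_distance is not None:
    lightning1 = lightning_strike_last_distance * 0.621371  # Convert km to miles
    data.lightning_strike_last_distance = "{0:.2f} miles".format(lightning1)

if lightning_strike_last_epoch is not None:
    data.lightning_strike_last_epoch = time.strftime(
        "%d-%m-%Y %H:%M:%S", time.localtime(lightning_strike_last_epoch))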
I have written code like the below to concatenate two arrays together and save them as a JSON file.
In this code, "seg" is an array of numbers which has been produced somewhere in my code. info is also an array containing some data, followed by the "Seg" array.
Defining variable types:
seg: Array<any> = [];
info: Array<any>=[];
final: Array<{info:any, Seg:any}>=[];
Pushing values into the arrays and concatenating them together:
this.info.push({date_created: 25 , description: 'aaa', year:'2015'});
this.final.push({info: this.info ,Seg:this.seg});
this.file.writeFile(this.file.externalApplicationStorageDirectory, 'test.json', JSON.stringify(this.final));
The produced file is something like this:
[{"info":[{"date_created":25,"description":"aaa","year":"2015"}],"Seg":[2,3,4,5]}]
As you can see, the info data is placed between two brackets, so the JSON file treats it as a list, not a record.
Does anyone know how I can remove these brackets from around the info data?
Should I change the type of the variable from an array to something else?
You can do it like this to store it as a record:
seg: Array<any> = [];
info: Array<any>=[];
// Initialize final as an object so the property assignments below work:
final: {info: any, Seg: any} = {info: null, Seg: null};

this.final.Seg = this.seg;
this.final.info = this.info;
My goal is to (1) import Twitter JSON, (2) extract data of interest, (3) create pandas data frame for the variables of interest. Here is my code:
import json
import pandas as pd
tweets = []
for line in open('00.json'):
    try:
        tweet = json.loads(line)
        tweets.append(tweet)
    except:
        continue
# Tweets often have missing data, therefore use -if- when extracting "keys"
tweet = tweets[0]
ids = [tweet['id_str'] for tweet in tweets if 'id_str' in tweet]
text = [tweet['text'] for tweet in tweets if 'text' in tweet]
lang = [tweet['lang'] for tweet in tweets if 'lang' in tweet]
geo = [tweet['geo'] for tweet in tweets if 'geo' in tweet]
place = [tweet['place'] for tweet in tweets if 'place' in tweet]
# Create a data frame (using pd.Index may be "incorrect", but I am a noob)
df = pd.DataFrame({'Ids': pd.Index(ids),
                   'Text': pd.Index(text),
                   'Lang': pd.Index(lang),
                   'Geo': pd.Index(geo),
                   'Place': pd.Index(place)})
# Create a data frame satisfying conditions:
df2 = df[(df['Lang']==('en')) & (df['Geo'].dropna())]
So far, everything seems to be working fine.
Now, the extracted values for Geo result in the following example:
df2.loc[1921,'Geo']
{'coordinates': [39.11890951, -84.48903638], 'type': 'Point'}
To get rid of everything except the coordinates inside the square brackets I tried using:
df2.Geo.str.replace("[({':]", "") ### results in NaN
# and also this:
df2['Geo'] = df2['Geo'].map(lambda x: x.lstrip('{'coordinates': [').rstrip('], 'type': 'Point'')) ### results in syntax error
Please advise on the correct way to obtain coordinates values only.
The following line from your question indicates that this is an issue with understanding the underlying data type of the returned object.
df2.loc[1921,'Geo']
{'coordinates': [39.11890951, -84.48903638], 'type': 'Point'}
You are returning a Python dictionary here -- not a string! If you want to return just the values of the coordinates, you should just use the 'coordinates' key to return those values, e.g.
df2.loc[1921,'Geo']['coordinates']
[39.11890951, -84.48903638]
The returned object in this case will be a Python list object containing the two coordinate values. If you want just one of the values, you can slice the list, e.g.
df2.loc[1921,'Geo']['coordinates'][0]
39.11890951
This workflow is much easier to deal with than casting the dictionary to a string, parsing the string, and recapturing the coordinate values as you are trying to do.
So let's say you want to create a new column called "geo_coord0" which contains all of the coordinates in the first position (as shown above). You could use something like the following:
df2["geo_coord0"] = [x['coordinates'][0] for x in df2['Geo']]
This uses a Python list comprehension to iterate over all entries in the df2['Geo'] column and for each entry it uses the same syntax we used above to return the first coordinate value. It then assigns these values to a new column in df2.
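If some rows can still carry a missing or None value in 'Geo', a slightly defensive version of the same comprehension avoids indexing into nothing; the geo_coord0/geo_coord1 column names below are only illustrative:

df2["geo_coord0"] = [x['coordinates'][0] if x else None for x in df2['Geo']]
df2["geo_coord1"] = [x['coordinates'][1] if x else None for x in df2['Geo']]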
See the Python documentation on data structures for more details on the data structures discussed above.
I'd like to format my query results as a single JSON object containing an array object for each record. Need help writing the script though - the JSON.stringify function is building an array of objects (My JSON is inside out!).
I can always write a function to build the JSON manually but I have a feeling there's already a function to do what I'm looking for. I just can't find it.
The JSON string I want to get:
{["id":1,"info":"Ipsum 0"], ["id":2,"info":"Ipsum 1"],
["id":3,"info":"Ipsum 2"], ["id":4,"info":"Ipsum 3"] (and so on) }
Actual Results
[{"id":1,"info":"Ipsum 0"},{"id":2,"info":"Ipsum 1"},
{"id":3,"info":"Ipsum 2"},{"id":4,"info":"Ipsum 3"},
{"id":5,"info":"Ipsum 4"},{"id":6,"info":"Ipsum 5"},
{"id":7,"info":"Ipsum 6"},{"id":8,"info":"Ipsum 7"},
{"id":9,"info":"Ipsum 8"},{"id":10,"info":"Ipsum 9"}]
My code so far (based on this example)
var sqlite3 = require('sqlite3').verbose();
var db = new sqlite3.Database(':memory:');
db.serialize(function() {
    db.run("CREATE TABLE lorem (info TEXT)");

    var stmt = db.prepare("INSERT INTO lorem VALUES (?)");
    for (var i = 0; i < 10; i++) {
        stmt.run("Ipsum " + i);
    }
    stmt.finalize();

    var sql = "SELECT rowid AS id, info FROM lorem";

    // Print the records as JSON
    db.all(sql, function(err, rows) {
        console.log(JSON.stringify(rows));
    });
});
db.close();
Based on what I know of JSON, I was expecting the whole recordset to be enclosed in curly brackets and each record to be enclosed in square brackets. However, I'm seeing the opposite.
Nope, you have it backwards. Database results will be modeled as an array of objects: one array represents the results of the entire query, and each object in that array represents a single result record. In JSON, arrays use square brackets and objects use curly braces (the same as in actual JavaScript code).
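As a compact illustration of that shape (sketched here in Python only because it is short; JSON.stringify in the Node code behaves the same way):

import json

rows = [{"id": 1, "info": "Ipsum 0"}, {"id": 2, "info": "Ipsum 1"}]
print(json.dumps(rows))
# [{"id": 1, "info": "Ipsum 0"}, {"id": 2, "info": "Ipsum 1"}]
# The whole result set is one array (square brackets);
# each record inside it is one object (curly braces).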
I'm using wget to fetch several dozen JSON files on a daily basis that go like this:
{
  "results": [
    {
      "id": "ABC789",
      "title": "Apple"
    },
    {
      "id": "XYZ123",
      "title": "Orange"
    }
  ]
}
My goal is to find a row's position in each JSON file given a value or set of values (i.e. "In which row is XYZ123 located?"). In the previous example, ABC789 is in row 1, XYZ123 in row 2, and so on.
For now I use Google Refine to "quickly" visualize (using the Text Filter option) where XYZ123 is standing (row 2).
But since it takes a while to do this manually for each file, I was wondering if there is a quick and efficient way to do it in one go.
What can I do, and how can I fetch the files and do the request? Thanks in advance! FoF0
In Python:
import json
# assume json_string = your loaded data
data = json.loads(json_string)
mapped_vals = []
for ent in data['results']:
    mapped_vals.append(ent['id'])
The order of items in the list will match the order in the JSON data, since the list is a sequenced collection.
In PHP:
$data = json_decode($json_string);
$output = array();
foreach($data->results as $values){
    $output[] = $values->id;
}
Again, the ordered nature of PHP arrays ensures that the output will be ordered the same way with regard to indexes.
Either example could be modified to use a mapped dictionary (python) or an associative array (php) if needs demand.
You could adapt these to functions that take the id value as an argument, track how far they are into the array, and when found, break out and return the current index.
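For instance, a minimal Python sketch of that idea might look like this; the find_row_position name and the 1-based row numbering are assumptions made to match the question:

import json

def find_row_position(json_string, wanted_id):
    # Return the 1-based position of wanted_id in the 'results' array, or None.
    data = json.loads(json_string)
    for index, entry in enumerate(data['results'], start=1):
        if entry['id'] == wanted_id:
            return index
    return None

# With the sample document from the question:
# find_row_position(json_string, 'XYZ123')  ->  2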
Wow. I posted the original question 10 months ago when I knew nothing about Python or computer programming whatsoever!
Answer
But I learned basic Python last December and came up with a solution that not only gets the rank order but also inserts the results into a MySQL database:
import urllib.request
import json

# Make the connection and get the content
response = urllib.request.urlopen("http://whatever.com/search?=ids=1212,125,54,454")
content = response.read()

# Decode the JSON search results to type dict
json_search = json.loads(content.decode("utf8"))

# Get the 'results' key-value pairs into a list
search_data_all = []
for i in json_search['results']:
    search_data_all.append(i)

# Prepare a MySQL list with the ranking order for each id item
ranks_list_to_mysql = []
for i in range(len(search_data_all)):
    d = {}
    d['id'] = search_data_all[i]['id']
    d['rank'] = i + 1
    ranks_list_to_mysql.append(d)
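The post stops before the actual database insert, so here is only a hedged sketch of how that last step could look with mysql-connector-python; the connection settings and the search_ranks table and column names are assumptions, not something from the original code:

import mysql.connector  # assumed driver; any DB-API style connector works similarly

conn = mysql.connector.connect(host="localhost", user="user",
                               password="password", database="mydb")
cursor = conn.cursor()
# Table and column names below are illustrative only.
cursor.executemany(
    "INSERT INTO search_ranks (id, `rank`) VALUES (%(id)s, %(rank)s)",
    ranks_list_to_mysql)
conn.commit()
cursor.close()
conn.close()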