I am trying to turn a data frame into nested JSON and have been struggling a bit. Here is an example I created. The use case is to produce a separate document for each guest at a hotel chain, with the guest at the top, the hotel details under the visit data, and the daily charges under the 'measurements' info.
The dataframe:
Here is an example of how I am trying to get the JSON to look
I have tried creating a multilevel index and using to_json:
Is there a way to do this using to_json() or will I need to build some nested loops to create nested dictionaries? This is the best I have been able to get:
I would recommend a programmatic approach. pandas.DataFrame.groupby can be useful here.
def hotel_data_to_json(df):
    return [
        person_data_to_json(person_df)
        for person_id, person_df
        in df.groupby('person_id')
    ]

def person_data_to_json(df):
    row = df.iloc[0]
    return {
        'person_id': row['person_id'],
        'personal_name': row['personal_name'],
        'family_name': row['family_name'],
        'visits': [
            visit_data_to_json(visit_df)
            for visit_id, visit_df
            in df.groupby('visit_id')
        ]
    }

def visit_data_to_json(df):
    row = df.iloc[0]
    # and so on
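To make this concrete, here is a self-contained sketch of the same groupby pattern; the dataframe and its column names (person_id, personal_name, family_name, visit_id, hotel) are made up for illustration and will differ from your real data:

```python
import pandas as pd

# Illustrative data: one row per visit, guest details repeated on each row.
df = pd.DataFrame({
    'person_id': [1, 1, 2],
    'personal_name': ['Ann', 'Ann', 'Bob'],
    'family_name': ['Lee', 'Lee', 'Ray'],
    'visit_id': [10, 11, 12],
    'hotel': ['North', 'South', 'North'],
})

def visit_data_to_json(visit_df):
    # All rows in visit_df share the same visit-level fields.
    row = visit_df.iloc[0]
    return {'visit_id': int(row['visit_id']), 'hotel': row['hotel']}

def person_data_to_json(person_df):
    # All rows in person_df share the same guest-level fields.
    row = person_df.iloc[0]
    return {
        'person_id': int(row['person_id']),
        'personal_name': row['personal_name'],
        'family_name': row['family_name'],
        'visits': [visit_data_to_json(v) for _, v in person_df.groupby('visit_id')],
    }

# One nested document per guest.
result = [person_data_to_json(p) for _, p in df.groupby('person_id')]
print(result)
```

From here, json.dumps(result) gives the serialized output; adding a 'measurements' level under each visit follows the same groupby-then-nest step.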
I have the following text structure in JSON:
{
    "Results": {
        "Burger King": {
            "Location": "New York",
            "Address": "Avenue Dalmatas, 20"
        },
        "Mcdonalds": {
            "Location": "Los Angeles",
            "Address": "Oscar Street, 50"
        }
    }
}
I managed to get the city and address values, but to do that I have to hard-code the restaurant's name to grab the token:
Dim JSON As String = **json here**
Dim token As JToken
token = JObject.Parse(JSON.ToString())
Dim data = token.SelectToken("Results").SelectToken("Burger King").SelectToken("Location")
My question is: how can I list only the restaurant names (Burger King, Mcdonalds, etc.), for example in a ListBox? Then I can add a feature that checks the address and city against the user's choice, which I already know how to do; getting a token with only the restaurant names is what's proving difficult. If a new restaurant name shows up, I don't want to have to add it manually in the code. I have tried many different approaches; the last one was the following:
Dim data = token.SelectToken("Results").SelectToken(0) ' I thought it would print "Burger King"
' or this one
Dim data = token.SelectToken("Results")(0).ToString()
I've tried a few 'For Each' loops but wasn't successful either. I've researched countless methods and nothing works. I believe it's something simple that I'm either overlooking or completely forgetting. Please give me a light! Thanks.
I can post C# code for you; it should be easy to translate to VB if you need.
var jsonObject = JObject.Parse(json);
List<string> restaurantNames = ((JObject)jsonObject["Results"]).Properties()
    .Select(x => x.Name).Distinct().ToList();
Result:

Burger King
Mcdonalds
I'm trying to access items in a list via AppleScript. Try as I might, I can't seem to access them. I've pasted my code below:
tell application "JSON Helper"
    set result to fetch JSON from "https://newsapi.org/v2/top-headlines?sources=bbc-news&pageSize=1&apiKey=X"
    set news to title of articles of result
end tell

set result_string to news & ""
Note: I've removed my API key. The API's response format is:
{"status":"ok","totalResults":10,"articles":[{"source":{"id":"bbc-news","name":"BBC News"},"author":"BBC News","title":"Trump urges Israeli 'care' on settlements","description":"The US president also casts doubt on whether the Palestinians or Israel are ready to talk peace.","url":"http://www.bbc.co.uk/news/world-middle-east-43025705","urlToImage":"https://ichef.bbci.co.uk/news/1024/branded_news/165CD/production/_99979519_044058382.jpg","publishedAt":"2018-02-11T16:47:27Z"}]}
I'm trying to access title but I keep getting "can't get title".
Any input would be much appreciated. Thanks!
Using the sample JSON data you gave, I used this command:
tell application "JSON Helper" to read JSON from ¬
    "{
        \"status\": \"ok\",
        \"totalResults\": 10,
        \"articles\": [
            {
                \"source\": {
                    \"id\": \"bbc-news\",
                    \"name\": \"BBC News\"
                },
                \"author\": \"BBC News\",
                \"title\": \"Trump urges Israeli 'care' on settlements\",
                \"description\": \"The US president also casts doubt on whether the Palestinians or Israel are ready to talk peace.\",
                \"url\": \"http://www.bbc.co.uk/news/world-middle-east-43025705\",
                \"urlToImage\": \"https://ichef.bbci.co.uk/news/1024/branded_news/165CD/production/_99979519_044058382.jpg\",
                \"publishedAt\": \"2018-02-11T16:47:27Z\"
            }
        ]
    }"
to retrieve an AppleScript record. Pretty-printed as above, you can probably already see that articles is a list (containing one item). Therefore, whilst this:
set news to title of articles of result
produces an error, this:
set news to title of item 1 of articles of result
retrieves the correct datum.
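The same constraint applies in any language, since articles is a JSON array and must be indexed before its items' fields can be read. As a cross-check, here is the equivalent access in Python against a trimmed-down version of the sample payload:

```python
import json

# Abbreviated version of the API response from the question.
payload = '{"status": "ok", "articles": [{"title": "Trump urges Israeli \'care\' on settlements"}]}'
data = json.loads(payload)

# data["articles"] is a list, so index item 0 first, then take its "title".
title = data["articles"][0]["title"]
print(title)
```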
I want to parse a string of complex JSON in Pig. Specifically, I want Pig to understand my JSON array as a bag instead of as a single chararray. I found that complex JSON can be parsed using Twitter's Elephant Bird or Mozilla's Akela library. (I found some additional libraries, but I cannot use a 'Loader'-based approach since I use HCatLoader to load data from Hive.)
But the problem is the structure of my data: each value of the map contains a fragment of complex JSON. For example,
1. My table looks like this (WARNING: the type of 'complex_data' is not STRING, but MAP<STRING, STRING>!):
CREATE TABLE temp_table
(
    user_id      BIGINT COMMENT 'user ID.',
    complex_data MAP<STRING, STRING> COMMENT 'complex json data'
)
COMMENT 'temp data.'
PARTITIONED BY (created_date STRING)
STORED AS RCFILE;
2. And 'complex_data' contains the following (the values I want are marked with two *s; essentially I want #'d'#'f' from each element of PARSED_STRING(complex_data#'c')):
{ "a": "[]",
"b": "\"sdf\"",
"**c**":"[{\"**d**\":{\"e\":\"sdfsdf\"
,\"**f**\":\"sdfs\"
,\"g\":\"qweqweqwe\"},
\"c\":[{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"}]
},
{\"**d**\":{\"e\":\"sdfsdf\"
,\"**f**\":\"sdfs\"
,\"g\":\"qweqweqwe\"},
\"c\":[{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"}]
},]"
}
3. So, I tried the following (the same approach for Elephant Bird):

REGISTER '/path/to/akela-0.6-SNAPSHOT.jar';
DEFINE JsonTupleMap com.mozilla.pig.eval.json.JsonTupleMap();

data = LOAD 'temp_table' USING org.apache.hive.hcatalog.pig.HCatLoader();

values_of_map = FOREACH data GENERATE complex_data#'c' AS attr:chararray; -- IT WORKS
-- dump values_of_map shows the correct chararray data per row, e.g.
-- ([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
--   {"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
--   {"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }])
-- ([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
--   {"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
--   {"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }]) ...

attempt1 = FOREACH data GENERATE JsonTupleMap(complex_data#'c'); -- THIS LINE CAUSES AN ERROR
attempt2 = FOREACH data GENERATE JsonTupleMap(CONCAT(CONCAT('{\\"key\\":', complex_data#'c'), '}')); -- IT ALSO DOES NOT WORK
I guessed that attempt1 failed because the value is not a complete JSON document. However, when I CONCAT as in attempt2, extra \ marks are generated (so each line starts with {\"key\": ). I'm not sure whether these extra marks break the parsing or not. In any case, I want to parse the given JSON string so that Pig can understand it. If you have any method or solution, please feel free to let me know.
I finally solved my problem by using the jyson library in a jython UDF.
I know I could solve it with Java or another language, but I think jython with jyson is the simplest answer to this issue.
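For anyone following along, the double-parse logic such a UDF performs can be sketched in plain Python, with the standard json module standing in for jyson (the sample value is abbreviated from the question):

```python
import json

# The map value complex_data#'c' arrives in the UDF as a string
# that itself holds a serialized JSON array.
raw = '[{"d": {"e": "sdfsdf", "f": "sdfs", "g": "qweqweqwe"}}]'

# Second parse: turn the serialized array into real structure,
# then pull #'d'#'f' out of each element.
parsed = json.loads(raw)
values = [elem['d']['f'] for elem in parsed]
print(values)
```

In the actual UDF the same loads-then-navigate step runs per row, and the returned list becomes a bag that Pig can iterate over.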
I have been given a rather large corpus of conversational data with which to import the relevant information into R and run some statistical analysis.
The problem is I do not need half the information provided in each entry. Each line in a specific JSON file from the dataset relates to a particular conversation of the nature A->B->A. The attributes provided are contained within a nested array for each of the respective statements in the conversation. This is best illustrated diagrammatically:
What I need is to simply extract the 'actual_sentence' attribute from each turn (turn_1,turn_2,turn_3 - aka A->B->A) and remove the rest.
So far my efforts have been in vain as I have been using the jsonlite package which seems to import the JSON fine but lacks the 'tree depth' to discern between the specific attributes of each turn.
An example:
The following is an example of one row/record of a provided JSON formatted .txt file:
{"semantic_distance_1": 0.375, "semantic_distance_2": 0.6486486486486487, "turn_2": "{\"sentence\": [\"A\", \"transmission\", \"?\"], \"script_filename\": \"Alien.txt\", \"postag\": [\"AT\", null, \".\"], \"semantic_set\": [\"infection.n.04\", \"vitamin_a.n.01\", \"angstrom.n.01\", \"transmittance.n.01\", \"transmission.n.05\", \"transmission.n.02\", \"transmission.n.01\", \"ampere.n.02\", \"adenine.n.01\", \"a.n.07\", \"a.n.06\", \"deoxyadenosine_monophosphate.n.01\"], \"additional_info\": [], \"original_sentence\": \"A transmission?\", \"actual_sentence\": \"A transmission?\", \"dependency_grammar\": null, \"actor\": \"standard\", \"sentence_type\": null, \"ner\": {}, \"turn_in_file\": 58}", "turn_3": "{\"sentence\": [\"A\", \"voice\", \"transmission\", \".\"], \"script_filename\": \"Alien.txt\", \"postag\": [\"AT\", \"NN\", null, \".\"], \"semantic_set\": [\"vitamin_a.n.01\", \"voice.n.10\", \"voice.n.09\", \"angstrom.n.01\", \"articulation.n.03\", \"deoxyadenosine_monophosphate.n.01\", \"a.n.07\", \"a.n.06\", \"infection.n.04\", \"spokesperson.n.01\", \"transmittance.n.01\", \"voice.n.02\", \"voice.n.03\", \"voice.n.01\", \"voice.n.06\", \"voice.n.07\", \"voice.n.05\", \"voice.v.02\", \"voice.v.01\", \"part.n.11\", \"transmission.n.05\", \"transmission.n.02\", \"transmission.n.01\", \"ampere.n.02\", \"adenine.n.01\"], \"additional_info\": [], \"original_sentence\": \"A voice transmission.\", \"actual_sentence\": \"A voice transmission.\", \"dependency_grammar\": null, \"actor\": \"computer\", \"sentence_type\": null, \"ner\": {}, \"turn_in_file\": 59}", "turn_1": "{\"sentence\": [\"I\", \"have\", \"intercepted\", \"a\", \"transmission\", \"of\", \"unknown\", \"origin\", \".\"], \"script_filename\": \"Alien.txt\", \"postag\": [\"PPSS\", \"HV\", \"VBD\", \"AT\", null, \"IN\", \"JJ\", \"NN\", \".\"], \"semantic_set\": [\"i.n.03\", \"own.v.01\", \"receive.v.01\", \"consume.v.02\", \"accept.v.02\", \"rich_person.n.01\", \"vitamin_a.n.01\", \"have.v.09\", 
\"have.v.07\", \"nameless.s.01\", \"have.v.01\", \"obscure.s.04\", \"have.v.02\", \"stranger.n.01\", \"angstrom.n.01\", \"induce.v.02\", \"hold.v.03\", \"wiretap.v.01\", \"give_birth.v.01\", \"a.n.07\", \"a.n.06\", \"deoxyadenosine_monophosphate.n.01\", \"infection.n.04\", \"unknown.n.03\", \"unknown.s.03\", \"get.v.03\", \"origin.n.03\", \"origin.n.02\", \"transmittance.n.01\", \"origin.n.05\", \"origin.n.04\", \"one.s.01\", \"have.v.17\", \"have.v.12\", \"have.v.10\", \"have.v.11\", \"take.v.35\", \"experience.v.03\", \"intercept.v.01\", \"unknown.n.01\", \"iodine.n.01\", \"strange.s.02\", \"suffer.v.02\", \"beginning.n.04\", \"one.n.01\", \"transmission.n.05\", \"transmission.n.02\", \"transmission.n.01\", \"ampere.n.02\", \"lineage.n.01\", \"unknown.a.01\", \"adenine.n.01\"], \"additional_info\": [], \"original_sentence\": \"I have intercepted a transmission of unknown origin.\", \"actual_sentence\": \"I have intercepted a transmission of unknown origin.\", \"dependency_grammar\": null, \"actor\": \"computer\", \"sentence_type\": null, \"ner\": {}, \"turn_in_file\": 57}", "syntax_distance_1": null, "syntax_distance_2": null}
As you can see, there is a great deal of information that I do not need, and given my poor knowledge of R, importing it (and the rest of the file it is contained in) in this form leads me to the following in R:
The command used for this was:
json = fromJSON(paste("[", paste(readLines("JSONfile.txt"), collapse = ","), "]"))
Essentially it is picking up on syntax_distance_1, syntax_distance_2, semantic_distance_1,semantic_distance_2 and then lumping all of the turn data into three enormous and unstructured arrays.
What I would like to know is if I can somehow either:
Specify a tree depth that enables R to discern between each of the 'turn' variables
OR
Simply cherry pick the turn$actual_sentence information from the outset and remove all the rest in the import process.
Hope that is enough information, please let me know if there is anything else I can add to clear it up.
Since in this case you know that you need to go one level deeper, what you can do is use one of the apply functions to parse the turn_x strings. The following snippet of code illustrates the basic idea:
# Read the json file
json_file <- fromJSON("JSONfile.json")
# use the apply function to parse the turn_x strings.
# Checking that the element is a character helps avoid
# issues with numerical values and nulls.
pjson_file <- lapply(json_file, function(x) {if (is.character(x)){fromJSON(x)}})
If we look at the results, we see that the whole data structure has been parsed this time. To access the actual_sentence field, what you can do is:
> pjson_file$turn_1$actual_sentence
[1] "I have intercepted a transmission of unknown origin."
> pjson_file$turn_2$actual_sentence
[1] "A transmission?"
> pjson_file$turn_3$actual_sentence
[1] "A voice transmission."
If you want to scale this logic so that it works with a large dataset, you can encapsulate it in a function that would return the three sentences as a character vector or a dataframe if you wish.
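If it helps to see the shape of that function spelled out, here is the same per-record extraction sketched in Python (assumptions for illustration: one JSON object per line, with the turn_* values themselves JSON-encoded strings, as in the sample):

```python
import json

def extract_sentences(record_line):
    """Parse one JSON record; return the three actual_sentence values in turn order."""
    record = json.loads(record_line)
    # Each turn_N value is a string of JSON, so it needs a second parse.
    return [json.loads(record['turn_%d' % i])['actual_sentence'] for i in (1, 2, 3)]

# Minimal stand-in for one line of the file, with the nested turns double-encoded.
line = ('{"turn_1": "{\\"actual_sentence\\": \\"I have intercepted a transmission of unknown origin.\\"}", '
        '"turn_2": "{\\"actual_sentence\\": \\"A transmission?\\"}", '
        '"turn_3": "{\\"actual_sentence\\": \\"A voice transmission.\\"}"}')
sentences = extract_sentences(line)
print(sentences)
```

Mapping that function over every line of the file yields one three-sentence record per conversation, which is easy to assemble into a data frame for analysis.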