Importing and converting specific attributes of JSON files in R
I have been given a rather large corpus of conversational data, from which I need to import the relevant information into R and run some statistical analysis.
The problem is that I do not need half the information provided in each entry. Each line in a given JSON file from the dataset relates to a particular conversation of the form A->B->A, and the attributes for each of the respective statements in the conversation are contained within a nested structure. This is best illustrated by the example record below.
What I need is to extract just the 'actual_sentence' attribute from each turn (turn_1, turn_2, turn_3, i.e. A->B->A) and discard the rest.
So far my efforts have been in vain: I have been using the jsonlite package, which seems to import the JSON fine but lacks the 'tree depth' to discern between the specific attributes of each turn.
An example:
The following is an example of one row/record from a provided JSON-formatted .txt file:
{"semantic_distance_1": 0.375, "semantic_distance_2": 0.6486486486486487, "turn_2": "{\"sentence\": [\"A\", \"transmission\", \"?\"], \"script_filename\": \"Alien.txt\", \"postag\": [\"AT\", null, \".\"], \"semantic_set\": [\"infection.n.04\", \"vitamin_a.n.01\", \"angstrom.n.01\", \"transmittance.n.01\", \"transmission.n.05\", \"transmission.n.02\", \"transmission.n.01\", \"ampere.n.02\", \"adenine.n.01\", \"a.n.07\", \"a.n.06\", \"deoxyadenosine_monophosphate.n.01\"], \"additional_info\": [], \"original_sentence\": \"A transmission?\", \"actual_sentence\": \"A transmission?\", \"dependency_grammar\": null, \"actor\": \"standard\", \"sentence_type\": null, \"ner\": {}, \"turn_in_file\": 58}", "turn_3": "{\"sentence\": [\"A\", \"voice\", \"transmission\", \".\"], \"script_filename\": \"Alien.txt\", \"postag\": [\"AT\", \"NN\", null, \".\"], \"semantic_set\": [\"vitamin_a.n.01\", \"voice.n.10\", \"voice.n.09\", \"angstrom.n.01\", \"articulation.n.03\", \"deoxyadenosine_monophosphate.n.01\", \"a.n.07\", \"a.n.06\", \"infection.n.04\", \"spokesperson.n.01\", \"transmittance.n.01\", \"voice.n.02\", \"voice.n.03\", \"voice.n.01\", \"voice.n.06\", \"voice.n.07\", \"voice.n.05\", \"voice.v.02\", \"voice.v.01\", \"part.n.11\", \"transmission.n.05\", \"transmission.n.02\", \"transmission.n.01\", \"ampere.n.02\", \"adenine.n.01\"], \"additional_info\": [], \"original_sentence\": \"A voice transmission.\", \"actual_sentence\": \"A voice transmission.\", \"dependency_grammar\": null, \"actor\": \"computer\", \"sentence_type\": null, \"ner\": {}, \"turn_in_file\": 59}", "turn_1": "{\"sentence\": [\"I\", \"have\", \"intercepted\", \"a\", \"transmission\", \"of\", \"unknown\", \"origin\", \".\"], \"script_filename\": \"Alien.txt\", \"postag\": [\"PPSS\", \"HV\", \"VBD\", \"AT\", null, \"IN\", \"JJ\", \"NN\", \".\"], \"semantic_set\": [\"i.n.03\", \"own.v.01\", \"receive.v.01\", \"consume.v.02\", \"accept.v.02\", \"rich_person.n.01\", \"vitamin_a.n.01\", \"have.v.09\", \"have.v.07\", \"nameless.s.01\", \"have.v.01\", \"obscure.s.04\", \"have.v.02\", \"stranger.n.01\", \"angstrom.n.01\", \"induce.v.02\", \"hold.v.03\", \"wiretap.v.01\", \"give_birth.v.01\", \"a.n.07\", \"a.n.06\", \"deoxyadenosine_monophosphate.n.01\", \"infection.n.04\", \"unknown.n.03\", \"unknown.s.03\", \"get.v.03\", \"origin.n.03\", \"origin.n.02\", \"transmittance.n.01\", \"origin.n.05\", \"origin.n.04\", \"one.s.01\", \"have.v.17\", \"have.v.12\", \"have.v.10\", \"have.v.11\", \"take.v.35\", \"experience.v.03\", \"intercept.v.01\", \"unknown.n.01\", \"iodine.n.01\", \"strange.s.02\", \"suffer.v.02\", \"beginning.n.04\", \"one.n.01\", \"transmission.n.05\", \"transmission.n.02\", \"transmission.n.01\", \"ampere.n.02\", \"lineage.n.01\", \"unknown.a.01\", \"adenine.n.01\"], \"additional_info\": [], \"original_sentence\": \"I have intercepted a transmission of unknown origin.\", \"actual_sentence\": \"I have intercepted a transmission of unknown origin.\", \"dependency_grammar\": null, \"actor\": \"computer\", \"sentence_type\": null, \"ner\": {}, \"turn_in_file\": 57}", "syntax_distance_1": null, "syntax_distance_2": null}
As you can see, there is a great deal of information that I do not need, and given my poor knowledge of R, importing it (and the rest of the file it is contained within) in this form leads me to the flat result described below. The command used for this was:
json <- fromJSON(paste("[", paste(readLines("JSONfile.txt"), collapse = ","), "]"))
Essentially it is picking up on syntax_distance_1, syntax_distance_2, semantic_distance_1, semantic_distance_2, and then lumping all of the turn data into three enormous, unstructured arrays.
What I would like to know is if I can somehow either:
Specify a tree depth that enables R to discern between each of the 'turn' variables
OR
Simply cherry-pick the turn$actual_sentence information from the outset and discard the rest during the import process.
Hope that is enough information; please let me know if there is anything else I can add to clear it up.
Since in this case you know that you need to go one level deeper (each turn_x value is itself a JSON-encoded string), what you can do is use one of the apply functions to parse the turn_x strings. The following snippet of code illustrates the basic idea:
# Read the json file (a single record in this example)
library(jsonlite)
json_file <- fromJSON("JSONfile.json")

# Use lapply to parse the turn_x strings, which are themselves JSON.
# Checking that the element is a character avoids calling fromJSON on
# numerical values and nulls; non-character fields are kept unchanged.
pjson_file <- lapply(json_file, function(x) {
  if (is.character(x)) fromJSON(x) else x
})
If we look at the results, we see that the whole data structure has been parsed this time. To access the actual_sentence field, what you can do is:
> pjson_file$turn_1$actual_sentence
[1] "I have intercepted a transmission of unknown origin."
> pjson_file$turn_2$actual_sentence
[1] "A transmission?"
> pjson_file$turn_3$actual_sentence
[1] "A voice transmission."
If you want to scale this logic so that it works with a large dataset, you can encapsulate it in a function that returns the three sentences as a character vector, or as a data frame if you wish, as sketched below.
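For instance, here is a minimal sketch of such a function, assuming each line of "JSONfile.txt" holds one conversation record like the example above (the name extract_sentences is illustrative, not from the answer):

    library(jsonlite)

    extract_sentences <- function(line) {
      record <- fromJSON(line)
      # Each turn_x value is itself a JSON string; parse it and keep actual_sentence
      sapply(c("turn_1", "turn_2", "turn_3"),
             function(t) fromJSON(record[[t]])$actual_sentence)
    }

    lines <- readLines("JSONfile.txt")
    sentences <- as.data.frame(t(sapply(lines, extract_sentences, USE.NAMES = FALSE)),
                               stringsAsFactors = FALSE)
    # 'sentences' now has one row per conversation, with columns turn_1 to turn_3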
Related
How to print multiple specific JSON values with Python?
I am attempting to print values from an API via JSON response. I was successful when I tried to print the first and foremost "live" value of the response, but I started running into problems when I tried printing anything other than the "live" value. Below is a sample of what I usually receive from the API; my goal here is to print out only every visible "name" value.

    { "live":[ { "id":203003098, "yt_video_key":"K0uWjPoiMRY", "bb_video_id":"None", "title":"【Minecraft】Nature, Please Guide Me! ft. #Ceres Fauna Ch. hololive-EN #holoCouncil", "thumbnail":"None", "status":"live", "live_schedule":"2021-09-14T02:00:00.000Z", "live_start":"2021-09-14T02:00:51.000Z", "live_end":"None", "live_viewers":11000, "channel":{ "id":2260367, "yt_channel_id":"UC3n5uGu18FoCy23ggWWp8tA", "bb_space_id":"None", "name":"Nanashi Mumei Ch. hololive-EN", "photo":"https://yt3.ggpht.com/MI8E8Wfmc_ngNZXUwu8ad0D-OtqDhmqGVULEu25z-ccscwzJpAw-7ewFXzZYLK2jHB9d5OgQDq4=s800-c-k-c0x00ffffff-no-rj", "published_at":"2021-07-26T15:45:01.162Z", "twitter_link":"nanashimumei_en", "view_count":4045014, "subscriber_count":281000, "video_count":14 } },
    { "id":202920144, "yt_video_key":"owk8w59Lcus", "bb_video_id":"None", "title":"【Undertale】平和なPルートでハッピーエンド目指す!【雪花ラミィ/ホロライブ】", "thumbnail":"None", "status":"live", "live_schedule":"2021-09-14T00:00:00.000Z", "live_start":"2021-09-14T00:04:22.000Z", "live_end":"None", "live_viewers":6200, "channel":{ "id":31879, "yt_channel_id":"UCFKOVgVbGmX65RxO3EtH3iw", "bb_space_id":"None", "name":"Lamy Ch. 雪花ラミィ", "description":"ホロライブ所属。\n人里離れた白銀の大地に住む、雪の一族の令嬢。\nホロライブの笑顔や彩りあふれる配信に心を打たれ、\nお供のだいふくと共に家を飛び出した。\n真面目だが世間知らずで抜けたところがある。\n\n\n\nお問い合わせ\nカバー株式会社:http://cover-corp.com/ \n公式Twitter:https://twitter.com/hololivetv", "photo":"https://yt3.ggpht.com/ytc/AKedOLQDR06gp26jxNNXh88Hhv1o-pNrnlKrYruqUIOx=s800-c-k-c0x00ffffff-no-rj", "published_at":"2020-04-13T03:51:15.590Z", "twitter_link":"yukihanalamy", "view_count":66576847, "subscriber_count":813000, "video_count":430 } },
    { "id":203019193, "yt_video_key":"QM2DjVNl1gY", "bb_video_id":"None", "title":"【MINECRAFT】 Adventuring with Mumei! #holoCouncil", "thumbnail":"None", "status":"live", "live_schedule":"2021-09-14T02:00:00.000Z", "live_start":"2021-09-14T02:00:58.000Z", "live_end":"None", "live_viewers":8600, "channel":{ "id":2260365, "yt_channel_id":"UCO_aKKYxn4tvrqPjcTzZ6EQ", "bb_space_id":"None", "name":"Ceres Fauna Ch. hololive-EN", "description":"A member of the Council and the Keeper of \"Nature,\" the second concept created by the Gods.\nShe has materialized in the mortal realm as a druid in a bid to save nature.\nShe has Kirin blood flowing in her veins, and horns that are made out of the branches of a certain tree; they are NOT deer antlers.\n\n\"Nature\" refers to all organic matter on the planet except mankind.\nIt is long said that her whispers, as an avatar of Mother Nature, have healing properties. Whether or not that is true is something only those who have heard them can say.\nWhile she is usually affable, warm, and slightly mischievous, any who anger her will bear the full brunt of Nature\\'s fury.\n\n", "photo":"https://yt3.ggpht.com/0lkccaVapSr1Z3uuXWbnaQxeqRWr9Tcs4R9rLBRSrAsN9gLacpiT2OFWfFKr4NhF97_hqK3eTg=s800-c-k-c0x00ffffff-no-rj", "published_at":"2021-07-26T15:38:58.797Z", "twitter_link":"ceresfauna", "view_count":5003954, "subscriber_count":253000, "video_count":17 } } ],

My code:

    url = "https://api.holotools.app/v1/live"
    response = urlopen(url)
    data_json = json.loads(response.read())
    print(data_json['live'])
I think you're new to programming, so the following is a special note for the new programmer. You did well in printing the data, but this is not the end, because your goal is to get the name; you need to traverse the response one element at a time. Let me show you:

    url = "https://api.holotools.app/v1/live"
    response = urlopen(url)
    data_json = json.loads(response.read())

    # Why use a loop here? Because we need to get every element of the list
    # (data_json['live'] is a list).
    dicts = data_json['live']
    for d in dicts:
        # After getting a single element of the list as a dict,
        # select its "channel" key, then the "name" key inside that.
        print(d["channel"]["name"])

Following are some useful links through which you can learn how to traverse JSON:

https://www.kite.com/python/answers/how-to-iterate-through-a-json-string-in-python
https://www.delftstack.com/howto/python/iterate-through-json-python/

There are also Stack Overflow answers about how to get data from JSON, though they require some programming skills too. The following are the linked answers:

Iterating through a JSON object
Looping through a JSON array in Python
How can I loop over entries in JSON?
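For completeness, once data_json has been loaded as above, the same traversal can be condensed into a single list comprehension (a small sketch, not part of the original answer):

    # Collect every channel name in one pass over the 'live' list
    names = [d["channel"]["name"] for d in data_json["live"]]
    print(names)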
Dumping a list to a JSON file creates a list within a list [["x", "y", "z"]], why?
I want to append multiple list items to a JSON file, but it creates a list within a list, and therefore I cannot access the list from Python. Since the code is overwriting existing data in the JSON file, there should not be any list there. I also tried it with just text in the file, without brackets. It still creates a list within a list, so [["x", "y","z"]] instead of ["x", "y","z"].

    import json

    filename = 'vocabulary.json'
    print("Reading %s" % filename)
    try:
        with open(filename, "rt") as fp:
            data = json.load(fp)
        print("Data: %s" % data)  # check
    except IOError:
        print("Could not read file, starting from scratch")
        data = []

    # Add some data
    TEMPORARY_LIST = []
    new_word = input("give new word: ")
    TEMPORARY_LIST.append(new_word.split())
    print(TEMPORARY_LIST)  # check
    data = TEMPORARY_LIST

    print("Overwriting %s" % filename)
    with open(filename, "wt") as fp:
        json.dump(data, fp)

Example and output, appending a list of split words:

    Reading vocabulary.json
    Data: [['my', 'dads', 'house', 'is', 'nice']]
    give new word: but my house is nicer
    [['but', 'my', 'house', 'is', 'nicer']]
    Overwriting vocabulary.json
So, if I understand what you are trying to accomplish correctly, it looks like you are trying to overwrite a list in a JSON file with a new list created from user input. For easiest data manipulation, set up your JSON file in dictionary form:

    {
        "words": [
            "my",
            "dad's",
            "house",
            "is",
            "nice"
        ]
    }

You should then set up functions to separate your functionality and make it more manageable:

    def load_json(filename):
        with open(filename, "r") as f:
            return json.load(f)

Now we can use that function to load the JSON, access the words list, and overwrite it with the new word:

    data = load_json("vocabulary.json")
    new_word = input("Give new word: ").split()
    data["words"] = new_word
    write_json("vocabulary.json", data)

If the user inputs "but my house is nicer", the JSON file will look like this:

    {
        "words": [
            "but",
            "my",
            "house",
            "is",
            "nicer"
        ]
    }

Edit

Okay, I have a few suggestions to make before I get into solving the issue. Firstly, it's great that you have delegated much of the functionality of the program to respective functions. However, using global variables is generally discouraged, because it makes things extremely difficult to debug: any of the functions that use a variable could have mutated it by accident. To fix this, use method parameters and pass the data around accordingly. With small programs like this, you can think of the main() method as the point that all data comes to and from: main() passes data to the other functions and receives new or edited data back. One final recommendation: only use all-capital variable names for constants. For example, PI = 3.14159 is a constant, so it is conventional to write it in all caps.

Without using global, main() will look much cleaner:

    def main():
        choice = input("Do you want to start or manage the list? (start/manage)")
        if choice == "start":
            data = load_json("vocabulary.json")
            words = data["words"]
            dictee(words)
        elif choice == "manage":
            manage_list()

You can use the load_json() function from earlier (notice that I deleted write_json(); more on that later) if the user chooses to start the game. If the user chooses to manage the file, we can write something like this:

    def manage_list():
        choice = input("Do you want to add or clear the list? (add/clear)")
        if choice == "add":
            words_to_add = get_new_words()
            add_words("vocabulary.json", words_to_add)
        elif choice == "clear":
            clear_words("vocabulary.json")

We get the user input first and then call two other functions, add_words() and clear_words():

    def add_words(filename, words):
        with open(filename, "r+") as f:
            data = json.load(f)
            data["words"].extend(words)
            f.seek(0)
            json.dump(data, f, indent=4)

    def clear_words(filename):
        with open(filename, "w+") as f:
            data = {"words": []}
            json.dump(data, f, indent=4)

I did not utilize the load_json() function in the two functions above, because it would open the file more times than needed, which would hurt performance. Furthermore, in these two functions we already need to open the file, so it is okay to load the JSON data there; it can be done with only one line: data = json.load(f). You may also notice that add_words() uses the file mode "r+", the basic mode for reading and writing. clear_words() uses "w+", because "w+" not only opens the file for reading and writing, it also overwrites the file if it exists (that is also why we don't need to load the JSON data in clear_words()).
Because we have these two functions for writing and/or overwriting data, we don't need the write_json() function that I had initially suggested. We can then add to the list like so:

    >>> Do you want to start or manage the list? (start/manage)manage
    >>> Do you want to add or clear the list? (add/clear)add
    >>> Please enter the words you want to add, separated by spaces: these are new words

And the JSON file becomes:

    {
        "words": [
            "but",
            "my",
            "house",
            "is",
            "nicer",
            "these",
            "are",
            "new",
            "words"
        ]
    }

We can then clear the list like so:

    >>> Do you want to start or manage the list? (start/manage)manage
    >>> Do you want to add or clear the list? (add/clear)clear

And the JSON file becomes:

    {
        "words": []
    }

Great! Now we have implemented the ability for the user to manage the list. Let's move on to creating the functionality for the game: dictee(). You mentioned that you want to randomly select an item from a list and remove it from that list so it doesn't get asked twice. There are a multitude of ways you can accomplish this. For example, you could use random.shuffle:

    def dictee(words):
        correct = 0
        incorrect = 0
        random.shuffle(words)
        for word in words:
            # ask word
            # evaluate response
            # increment correct/incorrect
            # ask if you want to play again
            pass

random.shuffle randomly shuffles the list in place. Then you can iterate through the list using for word in words: and start the game. You don't necessarily need random.choice here, because by shuffling with random.shuffle and then iterating, you are essentially selecting random values. I hope this helped illustrate how powerful functions and function parameters are. They not only help you separate your code, but also make it easier to manage, understand, and write more cleanly.
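To make the skeleton above concrete, a minimal runnable version of dictee() might look like the sketch below; the prompt wording and scoring logic are assumptions, not part of the original answer:

    import random

    def dictee(words):
        correct = 0
        incorrect = 0
        random.shuffle(words)  # ask each word exactly once, in random order
        for word in words:
            answer = input("Spell the word you just heard: ").strip()
            if answer == word:
                correct += 1
            else:
                incorrect += 1
        print("Score: %d correct, %d incorrect" % (correct, incorrect))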
Parse complex Json string contained in Hadoop
I want to parse a string of complex JSON in Pig. Specifically, I want Pig to understand my JSON array as a bag instead of as a single chararray. I found that complex JSON can be parsed by using Twitter's Elephant Bird or Mozilla's Akela library. (I found some additional libraries, but I cannot use a 'Loader'-based approach, since I use HCatalog Loader to load data from Hive.) But the problem is the structure of my data: each value of the Map structure contains the value part of a complex JSON document. For example,

1. My table looks like (WARNING: the type of 'complex_data' is not STRING, but a MAP of <STRING, STRING>!):

    TABLE temp_table
    (
        user_id BIGINT COMMENT 'user ID.',
        complex_data MAP <STRING, STRING> COMMENT 'complex json data'
    )
    COMMENT 'temp data.'
    PARTITIONED BY(created_date STRING)
    STORED AS RCFILE;

2. And 'complex_data' contains (the values I want to get are marked with two *s, so basically #'d'#'f' from each PARSED_STRING(complex_data#'c')):

    {
        "a": "[]",
        "b": "\"sdf\"",
        "**c**":"[{\"**d**\":{\"e\":\"sdfsdf\" ,\"**f**\":\"sdfs\" ,\"g\":\"qweqweqwe\"}, \"c\":[{\"d\":21321,\"e\":\"ewrwer\"}, {\"d\":21321,\"e\":\"ewrwer\"}, {\"d\":21321,\"e\":\"ewrwer\"}] }, {\"**d**\":{\"e\":\"sdfsdf\" ,\"**f**\":\"sdfs\" ,\"g\":\"qweqweqwe\"}, \"c\":[{\"d\":21321,\"e\":\"ewrwer\"}, {\"d\":21321,\"e\":\"ewrwer\"}, {\"d\":21321,\"e\":\"ewrwer\"}] },]"
    }

3. So, I tried... (same approach for Elephant Bird):

    REGISTER '/path/to/akela-0.6-SNAPSHOT.jar';
    DEFINE JsonTupleMap com.mozilla.pig.eval.json.JsonTupleMap();

    data = LOAD temp_table USING org.apache.hive.hcatalog.pig.HCatLoader();
    values_of_map = FOREACH data GENERATE complex_data#'c' AS attr:chararray; -- IT WORKS

    -- dump values_of_map shows correct chararray data per each row
    -- eg) ([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }, {"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }, {"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }])
    --     ([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }, {"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }, {"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }])
    --     ...

    attempt1 = FOREACH data GENERATE JsonTupleMap(complex_data#'c'); -- THIS LINE CAUSES AN ERROR

    attempt2 = FOREACH data GENERATE JsonTupleMap(CONCAT(CONCAT('{\\"key\\":', complex_data#'c'), '}')); -- IT ALSO DOES NOT WORK

I guessed that attempt1 failed because the value doesn't contain full JSON. However, when I CONCAT as in attempt2, an additional \ mark is generated (so each line starts with {\"key\": ). I'm not sure whether these additional marks break the parsing rule or not. In any case, I want to parse the given JSON string so that Pig can understand it. If you have any method or solution, please feel free to let me know.
I finally solved my problem by using the jyson library with a jython UDF. I know that I could solve it by using Java or other languages, but I think jython with jyson is the simplest answer to this issue.
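The original answer does not include its code, so the following is only a sketch of what such a jython UDF might look like; the file name udfs.py, the function name extract_f, and the exact output schema are illustrative assumptions, and the jyson jar must be available to Pig:

    # udfs.py -- a hypothetical Jython UDF that uses jyson to parse the chararray
    from com.xhaus.jyson import JysonCodec as json

    @outputSchema("values:{t:(f_value:chararray)}")
    def extract_f(json_str):
        # json_str is the JSON array stored under complex_data#'c';
        # collect each element's d.f value into a bag of tuples
        out = []
        for item in json.loads(json_str):
            out.append((item['d']['f'],))
        return out

which would then be registered and used from Pig along these lines:

    REGISTER '/path/to/jyson.jar';
    REGISTER 'udfs.py' USING jython AS myudfs;

    result = FOREACH values_of_map GENERATE myudfs.extract_f(attr);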
Strange behaviour in fromJSON in RJSONIO package
Ok, I'm trying to convert the following JSON data into an R data frame. For some reason fromJSON in the RJSONIO package only reads up to about character 380 and then it stops converting the JSON properly. Here is the JSON:- "{\"metricDate\":\"2013-05-01\",\"pageCountTotal\":\"33682\",\"landCountTotal\":\"11838\",\"newLandCountTotal\":\"8023\",\"returnLandCountTotal\":\"3815\",\"spiderCountTotal\":\"84\",\"goalCountTotal\":\"177.000000\",\"callGoalCountTotal\":\"177.000000\",\"callCountTotal\":\"237.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.50\",\"callConversionPerc\":\"74.68\"}\n{\"metricDate\":\"2013-05-02\",\"pageCountTotal\":\"32622\",\"landCountTotal\":\"11626\",\"newLandCountTotal\":\"7945\",\"returnLandCountTotal\":\"3681\",\"spiderCountTotal\":\"58\",\"goalCountTotal\":\"210.000000\",\"callGoalCountTotal\":\"210.000000\",\"callCountTotal\":\"297.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.81\",\"callConversionPerc\":\"70.71\"}\n{\"metricDate\":\"2013-05-03\",\"pageCountTotal\":\"28467\",\"landCountTotal\":\"11102\",\"newLandCountTotal\":\"7786\",\"returnLandCountTotal\":\"3316\",\"spiderCountTotal\":\"56\",\"goalCountTotal\":\"186.000000\",\"callGoalCountTotal\":\"186.000000\",\"callCountTotal\":\"261.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.68\",\"callConversionPerc\":\"71.26\"}\n{\"metricDate\":\"2013-05-04\",\"pageCountTotal\":\"20884\",\"landCountTotal\":\"9031\",\"newLandCountTotal\":\"6670\",\"returnLandCountTotal\":\"2361\",\"spiderCountTotal\":\"51\",\"goalCountTotal\":\"7.000000\",\"callGoalCountTotal\":\"7.000000\",\"callCountTotal\":\"44.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.08\",\"callConversionPerc\":\"15.91\"}\n{\"metricDate\":\"2013-05-05\",\"pageCountTotal\":\"20481\",\"landCountTotal\":\"8782\",\"newLandCountTotal\":\"6390\",\"returnLandCountTotal\":\"2392\",\"spiderCountTotal\":\"58\",\"goalCountTotal\":\"1.000000\",\"callGoalCountTotal\":\"1.000000\",\"callCountTotal\":\"8.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.01\",\"callConversionPerc\":\"12.50\"}\n{\"metricDate\":\"2013-05-06\",\"pageCountTotal\":\"25175\",\"landCountTotal\":\"10019\",\"newLandCountTotal\":\"7082\",\"returnLandCountTotal\":\"2937\",\"spiderCountTotal\":\"62\",\"goalCountTotal\":\"24.000000\",\"callGoalCountTotal\":\"24.000000\",\"callCountTotal\":\"47.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.24\",\"callConversionPerc\":\"51.06\"}\n{\"metricDate\":\"2013-05-07\",\"pageCountTotal\":\"35892\",\"landCountTotal\":\"12615\",\"newLandCountTotal\":\"8391\",\"returnLandCountTotal\":\"4224\",\"spiderCountTotal\":\"62\",\"goalCountTotal\":\"239.000000\",\"callGoalCountTotal\":\"239.000000\",\"callCountTotal\":\"321.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.89\",\"callConversionPerc\":\"74.45\"}\n{\"metricDate\":\"2013-05-08\",\"pageCountTotal\":\"34106\",\"landCountTotal\":\"12391\",\"newLandCountTotal\":\"8389\",\"returnLandCountTotal\":\"4002\",\"spiderCountTotal\":\"90\",\"goalCountTotal\":\"221.000000\",\"callGoalCountTotal\":\"221.000000\",\"callCountTotal\":\"295.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.78\",\"callConversionPerc\":\"74.92\"}\n{\"metricDate\":\"2013-05-09\",\"pageCountTotal\":\"32721\",\"landCountTotal\":\"12447\",\"newLandCountTotal\":\"8541\",\"returnLandCountTotal\":\"3906\",\"spiderCountTotal\":\"54\",\"goalCountTotal\":\"207.000000\",\"callGoalCountTot
al\":\"207.000000\",\"callCountTotal\":\"280.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.66\",\"callConversionPerc\":\"73.93\"}\n{\"metricDate\":\"2013-05-10\",\"pageCountTotal\":\"29724\",\"landCountTotal\":\"11616\",\"newLandCountTotal\":\"8063\",\"returnLandCountTotal\":\"3553\",\"spiderCountTotal\":\"139\",\"goalCountTotal\":\"207.000000\",\"callGoalCountTotal\":\"207.000000\",\"callCountTotal\":\"301.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.78\",\"callConversionPerc\":\"68.77\"}\n{\"metricDate\":\"2013-05-11\",\"pageCountTotal\":\"22061\",\"landCountTotal\":\"9660\",\"newLandCountTotal\":\"6971\",\"returnLandCountTotal\":\"2689\",\"spiderCountTotal\":\"52\",\"goalCountTotal\":\"3.000000\",\"callGoalCountTotal\":\"3.000000\",\"callCountTotal\":\"40.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.03\",\"callConversionPerc\":\"7.50\"}\n{\"metricDate\":\"2013-05-12\",\"pageCountTotal\":\"23341\",\"landCountTotal\":\"9935\",\"newLandCountTotal\":\"6960\",\"returnLandCountTotal\":\"2975\",\"spiderCountTotal\":\"45\",\"goalCountTotal\":\"0.000000\",\"callGoalCountTotal\":\"0.000000\",\"callCountTotal\":\"12.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.00\",\"callConversionPerc\":\"0.00\"}\n{\"metricDate\":\"2013-05-13\",\"pageCountTotal\":\"36565\",\"landCountTotal\":\"13583\",\"newLandCountTotal\":\"9277\",\"returnLandCountTotal\":\"4306\",\"spiderCountTotal\":\"69\",\"goalCountTotal\":\"246.000000\",\"callGoalCountTotal\":\"246.000000\",\"callCountTotal\":\"324.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.81\",\"callConversionPerc\":\"75.93\"}\n{\"metricDate\":\"2013-05-14\",\"pageCountTotal\":\"35260\",\"landCountTotal\":\"13797\",\"newLandCountTotal\":\"9375\",\"returnLandCountTotal\":\"4422\",\"spiderCountTotal\":\"59\",\"goalCountTotal\":\"212.000000\",\"callGoalCountTotal\":\"212.000000\",\"callCountTotal\":\"283.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.54\",\"callConversionPerc\":\"74.91\"}\n{\"metricDate\":\"2013-05-15\",\"pageCountTotal\":\"35836\",\"landCountTotal\":\"13792\",\"newLandCountTotal\":\"9532\",\"returnLandCountTotal\":\"4260\",\"spiderCountTotal\":\"94\",\"goalCountTotal\":\"187.000000\",\"callGoalCountTotal\":\"187.000000\",\"callCountTotal\":\"258.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.36\",\"callConversionPerc\":\"72.48\"}\n{\"metricDate\":\"2013-05-16\",\"pageCountTotal\":\"33136\",\"landCountTotal\":\"12821\",\"newLandCountTotal\":\"8755\",\"returnLandCountTotal\":\"4066\",\"spiderCountTotal\":\"65\",\"goalCountTotal\":\"192.000000\",\"callGoalCountTotal\":\"192.000000\",\"callCountTotal\":\"260.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.50\",\"callConversionPerc\":\"73.85\"}\n{\"metricDate\":\"2013-05-17\",\"pageCountTotal\":\"29564\",\"landCountTotal\":\"11721\",\"newLandCountTotal\":\"8191\",\"returnLandCountTotal\":\"3530\",\"spiderCountTotal\":\"213\",\"goalCountTotal\":\"166.000000\",\"callGoalCountTotal\":\"166.000000\",\"callCountTotal\":\"222.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.42\",\"callConversionPerc\":\"74.77\"}\n{\"metricDate\":\"2013-05-18\",\"pageCountTotal\":\"23686\",\"landCountTotal\":\"9916\",\"newLandCountTotal\":\"7335\",\"returnLandCountTotal\":\"2581\",\"spiderCountTotal\":\"56\",\"goalCountTotal\":\"5.000000\",\"callGoalCountTotal\":\"5.000000\",\"callCountTotal\":\"34.000000\",\"onlineGoalCountT
otal\":\"0.000000\",\"conversionPerc\":\"0.05\",\"callConversionPerc\":\"14.71\"}\n{\"metricDate\":\"2013-05-19\",\"pageCountTotal\":\"23528\",\"landCountTotal\":\"9952\",\"newLandCountTotal\":\"7184\",\"returnLandCountTotal\":\"2768\",\"spiderCountTotal\":\"57\",\"goalCountTotal\":\"1.000000\",\"callGoalCountTotal\":\"1.000000\",\"callCountTotal\":\"14.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.01\",\"callConversionPerc\":\"7.14\"}\n{\"metricDate\":\"2013-05-20\",\"pageCountTotal\":\"37391\",\"landCountTotal\":\"13488\",\"newLandCountTotal\":\"9024\",\"returnLandCountTotal\":\"4464\",\"spiderCountTotal\":\"69\",\"goalCountTotal\":\"227.000000\",\"callGoalCountTotal\":\"227.000000\",\"callCountTotal\":\"291.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.68\",\"callConversionPerc\":\"78.01\"}\n{\"metricDate\":\"2013-05-21\",\"pageCountTotal\":\"36299\",\"landCountTotal\":\"13174\",\"newLandCountTotal\":\"8817\",\"returnLandCountTotal\":\"4357\",\"spiderCountTotal\":\"77\",\"goalCountTotal\":\"164.000000\",\"callGoalCountTotal\":\"164.000000\",\"callCountTotal\":\"221.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.24\",\"callConversionPerc\":\"74.21\"}\n{\"metricDate\":\"2013-05-22\",\"pageCountTotal\":\"34201\",\"landCountTotal\":\"12433\",\"newLandCountTotal\":\"8388\",\"returnLandCountTotal\":\"4045\",\"spiderCountTotal\":\"76\",\"goalCountTotal\":\"195.000000\",\"callGoalCountTotal\":\"195.000000\",\"callCountTotal\":\"262.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.57\",\"callConversionPerc\":\"74.43\"}\n{\"metricDate\":\"2013-05-23\",\"pageCountTotal\":\"32951\",\"landCountTotal\":\"11611\",\"newLandCountTotal\":\"7757\",\"returnLandCountTotal\":\"3854\",\"spiderCountTotal\":\"68\",\"goalCountTotal\":\"167.000000\",\"callGoalCountTotal\":\"167.000000\",\"callCountTotal\":\"231.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.44\",\"callConversionPerc\":\"72.29\"}\n{\"metricDate\":\"2013-05-24\",\"pageCountTotal\":\"28967\",\"landCountTotal\":\"10821\",\"newLandCountTotal\":\"7396\",\"returnLandCountTotal\":\"3425\",\"spiderCountTotal\":\"106\",\"goalCountTotal\":\"167.000000\",\"callGoalCountTotal\":\"167.000000\",\"callCountTotal\":\"203.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.54\",\"callConversionPerc\":\"82.27\"}\n{\"metricDate\":\"2013-05-25\",\"pageCountTotal\":\"19741\",\"landCountTotal\":\"8393\",\"newLandCountTotal\":\"6168\",\"returnLandCountTotal\":\"2225\",\"spiderCountTotal\":\"78\",\"goalCountTotal\":\"0.000000\",\"callGoalCountTotal\":\"0.000000\",\"callCountTotal\":\"28.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.00\",\"callConversionPerc\":\"0.00\"}\n{\"metricDate\":\"2013-05-26\",\"pageCountTotal\":\"19770\",\"landCountTotal\":\"8237\",\"newLandCountTotal\":\"6009\",\"returnLandCountTotal\":\"2228\",\"spiderCountTotal\":\"79\",\"goalCountTotal\":\"0.000000\",\"callGoalCountTotal\":\"0.000000\",\"callCountTotal\":\"8.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.00\",\"callConversionPerc\":\"0.00\"}\n{\"metricDate\":\"2013-05-27\",\"pageCountTotal\":\"26208\",\"landCountTotal\":\"9755\",\"newLandCountTotal\":\"6779\",\"returnLandCountTotal\":\"2976\",\"spiderCountTotal\":\"82\",\"goalCountTotal\":\"26.000000\",\"callGoalCountTotal\":\"26.000000\",\"callCountTotal\":\"40.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.27\",\"callConversionPerc\":\"65.00\
"}\n{\"metricDate\":\"2013-05-28\",\"pageCountTotal\":\"36980\",\"landCountTotal\":\"12463\",\"newLandCountTotal\":\"8226\",\"returnLandCountTotal\":\"4237\",\"spiderCountTotal\":\"132\",\"goalCountTotal\":\"208.000000\",\"callGoalCountTotal\":\"208.000000\",\"callCountTotal\":\"276.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.67\",\"callConversionPerc\":\"75.36\"}\n{\"metricDate\":\"2013-05-29\",\"pageCountTotal\":\"34190\",\"landCountTotal\":\"12014\",\"newLandCountTotal\":\"8279\",\"returnLandCountTotal\":\"3735\",\"spiderCountTotal\":\"90\",\"goalCountTotal\":\"179.000000\",\"callGoalCountTotal\":\"179.000000\",\"callCountTotal\":\"235.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.49\",\"callConversionPerc\":\"76.17\"}\n{\"metricDate\":\"2013-05-30\",\"pageCountTotal\":\"33867\",\"landCountTotal\":\"11965\",\"newLandCountTotal\":\"8231\",\"returnLandCountTotal\":\"3734\",\"spiderCountTotal\":\"63\",\"goalCountTotal\":\"160.000000\",\"callGoalCountTotal\":\"160.000000\",\"callCountTotal\":\"219.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.34\",\"callConversionPerc\":\"73.06\"}\n{\"metricDate\":\"2013-05-31\",\"pageCountTotal\":\"27536\",\"landCountTotal\":\"10302\",\"newLandCountTotal\":\"7333\",\"returnLandCountTotal\":\"2969\",\"spiderCountTotal\":\"108\",\"goalCountTotal\":\"173.000000\",\"callGoalCountTotal\":\"173.000000\",\"callCountTotal\":\"226.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.68\",\"callConversionPerc\":\"76.55\"}\n\r\n" and here is my R output metricDate "2013-05-01" pageCountTotal "33682" landCountTotal "11838" newLandCountTotal "8023" returnLandCountTotal "3815" spiderCountTotal "84" goalCountTotal "177.000000" callGoalCountTotal "177.000000" callCountTotal "237.000000" onlineGoalCountTotal "0.000000" conversionPerc "1.50" callConversionPerc 
"74.68\"}{\"metricDate\":\"2013-05-02\",\"pageCountTotal\":\"32622\",\"landCountTotal\":\"11626\",\"newLandCountTotal\":\"7945\",\"returnLandCountTotal\":\"3681\",\"spiderCountTotal\":\"58\",\"goalCountTotal\":\"210.000000\",\"callGoalCountTotal\":\"210.000000\",\"callCountTotal\":\"297.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.81\",\"callConversionPerc\":\"70.71\"}{\"metricDate\":\"2013-05-03\",\"pageCountTotal\":\"28467\",\"landCountTotal\":\"11102\",\"newLandCountTotal\":\"7786\",\"returnLandCountTotal\":\"3316\",\"spiderCountTotal\":\"56\",\"goalCountTotal\":\"186.000000\",\"callGoalCountTotal\":\"186.000000\",\"callCountTotal\":\"261.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.68\",\"callConversionPerc\":\"71.26\"}{\"metricDate\":\"2013-05-04\",\"pageCountTotal\":\"20884\",\"landCountTotal\":\"9031\",\"newLandCountTotal\":\"6670\",\"returnLandCountTotal\":\"2361\",\"spiderCountTotal\":\"51\",\"goalCountTotal\":\"7.000000\",\"callGoalCountTotal\":\"7.000000\",\"callCountTotal\":\"44.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.08\",\"callConversionPerc\":\"15.91\"}{\"metricDate\":\"2013-05-05\",\"pageCountTotal\":\"20481\",\"landCountTotal\":\"8782\",\"newLandCountTotal\":\"6390\",\"returnLandCountTotal\":\"2392\",\"spiderCountTotal\":\"58\",\"goalCountTotal\":\"1.000000\",\"callGoalCountTotal\":\"1.000000\",\"callCountTotal\":\"8.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.01\",\"callConversionPerc\":\"12.50\"}{\"metricDate\":\"2013-05-06\",\"pageCountTotal\":\"25175\",\"landCountTotal\":\"10019\",\"newLandCountTotal\":\"7082\",\"returnLandCountTotal\":\"2937\",\"spiderCountTotal\":\"62\",\"goalCountTotal\":\"24.000000\",\"callGoalCountTotal\":\"24.000000\",\"callCountTotal\":\"47.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.24\",\"callConversionPerc\":\"51.06\"}{\"metricDate\":\"2013-05-07\",\"pageCountTotal\":\"35892\",\"landCountTotal\":\"12615\",\"newLandCountTotal\":\"8391\",\"returnLandCountTotal\":\"4224\",\"spiderCountTotal\":\"62\",\"goalCountTotal\":\"239.000000\",\"callGoalCountTotal\":\"239.000000\",\"callCountTotal\":\"321.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.89\",\"callConversionPerc\":\"74.45\"}{\"metricDate\":\"2013-05-08\",\"pageCountTotal\":\"34106\",\"landCountTotal\":\"12391\",\"newLandCountTotal\":\"8389\",\"returnLandCountTotal\":\"4002\",\"spiderCountTotal\":\"90\",\"goalCountTotal\":\"221.000000\",\"callGoalCountTotal\":\"221.000000\",\"callCountTotal\":\"295.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.78\",\"callConversionPerc\":\"74.92\"}{\"metricDate\":\"2013-05-09\",\"pageCountTotal\":\"32721\",\"landCountTotal\":\"12447\",\"newLandCountTotal\":\"8541\",\"returnLandCountTotal\":\"3906\",\"spiderCountTotal\":\"54\",\"goalCountTotal\":\"207.000000\",\"callGoalCountTotal\":\"207.000000\",\"callCountTotal\":\"280.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.66\",\"callConversionPerc\":\"73.93\"}{\"metricDate\":\"2013-05-10\",\"pageCountTotal\":\"29724\",\"landCountTotal\":\"11616\",\"newLandCountTotal\":\"8063\",\"returnLandCountTotal\":\"3553\",\"spiderCountTotal\":\"139\",\"goalCountTotal\":\"207.000000\",\"callGoalCountTotal\":\"207.000000\",\"callCountTotal\":\"301.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.78\",\"callConversionPerc\":\"68.77\"}{\"metricDate\":\"2013-05-11\",\"pageCountTotal\":\"22061\",\"landCountTotal\"
:\"9660\",\"newLandCountTotal\":\"6971\",\"returnLandCountTotal\":\"2689\",\"spiderCountTotal\":\"52\",\"goalCountTotal\":\"3.000000\",\"callGoalCountTotal\":\"3.000000\",\"callCountTotal\":\"40.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.03\",\"callConversionPerc\":\"7.50\"}{\"metricDate\":\"2013-05-12\",\"pageCountTotal\":\"23341\",\"landCountTotal\":\"9935\",\"newLandCountTotal\":\"6960\",\"returnLandCountTotal\":\"2975\",\"spiderCountTotal\":\"45\",\"goalCountTotal\":\"0.000000\",\"callGoalCountTotal\":\"0.000000\",\"callCountTotal\":\"12.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.00\",\"callConversionPerc\":\"0.00\"}{\"metricDate\":\"2013-05-13\",\"pageCountTotal\":\"36565\",\"landCountTotal\":\"13583\",\"newLandCountTotal\":\"9277\",\"returnLandCountTotal\":\"4306\",\"spiderCountTotal\":\"69\",\"goalCountTotal\":\"246.000000\",\"callGoalCountTotal\":\"246.000000\",\"callCountTotal\":\"324.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.81\",\"callConversionPerc\":\"75.93\"}{\"metricDate\":\"2013-05-14\",\"pageCountTotal\":\"35260\",\"landCountTotal\":\"13797\",\"newLandCountTotal\":\"9375\",\"returnLandCountTotal\":\"4422\",\"spiderCountTotal\":\"59\",\"goalCountTotal\":\"212.000000\",\"callGoalCountTotal\":\"212.000000\",\"callCountTotal\":\"283.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.54\",\"callConversionPerc\":\"74.91\"}{\"metricDate\":\"2013-05-15\",\"pageCountTotal\":\"35836\",\"landCountTotal\":\"13792\",\"newLandCountTotal\":\"9532\",\"returnLandCountTotal\":\"4260\",\"spiderCountTotal\":\"94\",\"goalCountTotal\":\"187.000000\",\"callGoalCountTotal\":\"187.000000\",\"callCountTotal\":\"258.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.36\",\"callConversionPerc\":\"72.48\"}{\"metricDate\":\"2013-05- (I've truncated the output a little). The R output has been read properly up until "callConversionPerc" and after that the JSON parsing seems to break. Is there some default parameter that I've missed that could couse this behaviour? I have checked for unmasked speechmarks and anything obvious like that I didn't see any. Surely it wouldn't be the new line operator that occurs shortly after, would it? EDIT: So this does appear to be a new line issue. 
Here's another 'JSON' string I've pulled into R; again, the double quote marks are all escaped:

    "{\"modelId\":\"7\",\"igrp\":\"1\",\"modelName\":\"Equally Weighted\",\"modelType\":\"spread\",\"status\":200,\"matchCriteria\":\"\",\"lookbackDays\":90}\n{\"modelId\":\"416\",\"igrp\":\"1\",\"modelName\":\"First and Last Click Weighted \",\"modelType\":\"spread\",\"status\":200,\"matchCriteria\":\"\",\"lookbackDays\":90,\"firstWeight\":3,\"lastWeight\":3}\n{\"modelId\":\"5\",\"igrp\":\"1\",\"modelName\":\"First Click\",\"modelType\":\"first\",\"status\":200,\"matchCriteria\":\"\",\"lookbackDays\":90}\n{\"modelId\":\"8\",\"igrp\":\"1\",\"modelName\":\"First Click Weighted\",\"modelType\":\"spread\",\"status\":200,\"matchCriteria\":\"\",\"lookbackDays\":90,\"firstWeight\":3}\n{\"modelId\":\"128\",\"igrp\":\"1\",\"modelName\":\"First Click Weighted across PPC\",\"modelType\":\"spread\",\"status\":200,\"matchCriteria\":\"\",\"lookbackDays\":90,\"firstWeight\":3,\"channelsMode\":\"include\",\"channels\":[5]}\n{\"modelId\":\"6\",\"igrp\":\"1\",\"modelName\":\"Last Click\",\"modelType\":\"last\",\"status\":200,\"matchCriteria\":\"\",\"lookbackDays\":90}\n{\"modelId\":\"417\",\"igrp\":\"1\",\"modelName\":\"Last Click Weighted \",\"modelType\":\"spread\",\"status\":200,\"matchCriteria\":\"\",\"lookbackDays\":90,\"lastWeight\":3}\n\r\n"

When I try to parse this using fromJSON I get the same problem: it gets to the last term on the first line and then stops parsing properly. Note that in this new case the output is slightly different from before, returning NULL for the last item (instead of the messy string from the previous example):

    $modelId
    [1] "7"

    $igrp
    [1] "1"

    $modelName
    [1] "Equally Weighted"

    $modelType
    [1] "spread"

    $status
    [1] 200

    $matchCriteria
    [1] ""

    $lookbackDays
    NULL

As you can see, the components now use the "$" convention as if they are naming components, and the last item is NULL. I am wondering if this is to do with the way that fromJSON parses the strings: when it is asked to create a variable with the same name as a variable that already exists, it fails and just returns a string or a NULL. I would have thought that dealing with that sort of case would be coded into RJSONIO, as it's pretty standard for JSON data to have repeating names. I'm stumped as to how to fix this.
There are two aspects of the JSON that seem to be causing trouble. The first is the trailing "\n\r\n", so get rid of that:

    contJSON = sub("\n\r\n$", "", contJSON)

The second is that the string is actually a series of valid JSON lines rather than a single JSON object. So either split it into valid JSON objects and process each individually

    lapply(strsplit(contJSON, "\n")[[1]], fromJSON, asText=TRUE)

or create a string representing a single valid JSON object and process that

    fromJSON(sprintf("[%s]", gsub("\n", ",", contJSON)), asText=TRUE)

Both of these rely on details of the data, so they are not generally useful.

It's clear that asText is an argument for fromJSON:

    > args(RJSONIO::fromJSON)
    function (content, handler = NULL, default.size = 100, depth = 150L,
        allowComments = TRUE, asText = isContent(content), data = NULL,
        maxChar = c(0L, nchar(content)), simplify = Strict, nullValue = NULL,
        simplifyWithNames = TRUE, encoding = NA_character_, stringFun = NULL,
        ...)
    NULL

So if R is complaining about an unused parameter, it's likely that you're actually accessing a different function, in particular rjson::fromJSON. Perhaps search() shows that rjson appears before RJSONIO?
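If there is any doubt about which package's fromJSON is being picked up, one way to rule out the masking is to qualify the call explicitly; a small illustrative snippet, not part of the original answer:

    # Call RJSONIO's fromJSON explicitly so rjson's version cannot shadow it
    parsed <- lapply(strsplit(contJSON, "\n")[[1]], RJSONIO::fromJSON, asText = TRUE)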
Using Python's csv.DictReader to search for a specific key and print its value
BACKGROUND: I am having issues trying to search through some CSV files. I've gone through the Python documentation (http://docs.python.org/2/library/csv.html) about the csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds) object of the csv module. My understanding is that csv.DictReader assumes the first line/row of the file contains the fieldnames; however, my CSV dictionary file simply starts with "key","value" pairs and goes on for at least 500,000 lines. My program will ask the user for the title (thus the key) they are looking for, and present the value (which is the 2nd column) on the screen using the print function. My problem is how to use csv.DictReader to search for a specific key and print its value.

Sample Data: Below is an example of the CSV file and its contents...

    "Mamer","285713:13"
    "Champhol","461034:2"
    "Station Palais","972811:0"

So if I want to find "Station Palais" (input), my output will be 972811:0. I am able to manipulate the string and create the overall program; I just need help with csv.DictReader. I appreciate any assistance.

EDITED PART:

    import csv

    def main():
        with open('anchor_summary2.csv', 'rb') as file_data:
            list_of_stuff = []
            reader = csv.DictReader(file_data, ("title", "value"))
            for i in reader:
                list_of_stuff.append(i)
            print list_of_stuff

    main()
The documentation you linked to provides half the answer:

    class csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)

    [...] maps the information read into a dict whose keys are given by the optional fieldnames parameter. If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the fieldnames.

It would seem that if the fieldnames parameter is passed, the given file will not have its first record interpreted as headers (the parameter will be used instead).

    # file_data is an open file object (or any iterable of lines), not the filename
    reader = csv.DictReader(file_data, ("title", "value"))
    for i in reader:
        list_of_stuff.append(i)

which will (apparently; I've been having trouble with it) produce the following data structure:

    [{"title": "Mamer", "value": "285713:13"},
     {"title": "Champhol", "value": "461034:2"},
     {"title": "Station Palais", "value": "972811:0"}]

which may need to be further massaged into a title-to-value mapping by something like this:

    data = {}
    for i in list_of_stuff:
        data[i["title"]] = i["value"]

Now just use the keys and values of data to complete your task. And here it is as a dictionary comprehension:

    data = {row["title"]: row["value"] for row in csv.DictReader(file_data, ("title", "value"))}
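A brief usage sketch of the resulting mapping (the prompt text and the use of raw_input are assumptions; on Python 3 this would be input()):

    # Look up the user's title in the mapping built above
    title = raw_input("Enter a title: ").strip()
    print(data.get(title, "Title not found"))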
The currently accepted answer is fine, but there's a slightly more direct way of getting at the data: the dict() constructor in Python can take any iterable of key/value pairs. In addition, your code might have issues on Python 3, because Python 3's csv module expects the file to be opened in text mode, not binary mode. You can make your code compatible with 2 and 3 by using io.open instead of open.

    import csv
    import io

    with io.open('anchor_summary2.csv', 'r', newline='', encoding='utf-8') as f:
        data = dict(csv.reader(f))

    print(data['Champhol'])

As a warning, if your CSV file has two rows with the same value in the first column, the later value will overwrite the earlier value. (This is also true of the other posted solution.)

If your program really is only supposed to print the result, there's really no reason to build a keyed dictionary:

    import csv
    import io

    # Python 2/3 compat
    try:
        input = raw_input
    except NameError:
        pass

    def main():
        # Case-insensitive & leading/trailing whitespace insensitive
        user_city = input('Enter a city: ').strip().lower()
        with io.open('anchor_summary2.csv', 'r', newline='', encoding='utf-8') as f:
            for city, value in csv.reader(f):
                if user_city == city.lower():
                    print(value)
                    break
            else:
                print("City not found.")

    if __name__ == '__main__':
        main()

The advantage of this technique is that the CSV isn't loaded into memory and the data is only iterated over once. I also added a little code that calls lower() on both keys to make the match case-insensitive. Another advantage is that if the city the user requests is near the top of the file, it returns almost immediately and stops looking through the file. With all that said, if searching performance is your primary consideration, you should consider storing the data in a database.