Regex for getting values from JSON block

Could someone point out the error I am making when grouping a JSON data block to generate a list of (key, value) tuples? I am able to collect all key-value pairs except the last one, which ends with }. I don't understand why the | alternation is not working for the last pair.
\"(.*?)\"[:]+\"?(.*?)\"?[,[^\s](?=\")|[^\s]\}$]
I am using Python's re.findall function to generate the groups.
Example data block:
{
"author_flair_text": null,
"author": "joeinfro",
"id": "d2nvjik",
"link_id": "t3_4h4boa",
"gilded": 0,
"created_utc": 1462060800,
"author_flair_css_class": null,
"parent_id": "t3_4h4boa",
"ups": 1,
"body": "thats 1 case per 5000 people. nice!",
"subreddit_id": "t5_2qh13",
"stickied": false,
"edited": false,
"subreddit": "worldnews",
"distinguished": null,
"score": 1,
"retrieved_on": 1465550534,
"controversiality": 0
}
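EDIT: Finally found the solution. The problem was precedence: outside a group, | has the lowest precedence, so it split the whole pattern into two alternatives instead of only the final delimiter (square brackets define a character class, not a group). Wrapping the two delimiters in a non-capturing group (?:...) fixes it.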
Solution :
\"(.*?)\"[:]+\"?(.*?)\"?(?:,(?=\")|\}$)
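A quick way to check the fixed pattern is to run it with re.findall on a compact, single-line version of the sample. That is an assumption: the lookahead (?=\") only succeeds when each comma is immediately followed by the next key's opening quote, so pretty-printed input with spaces and newlines will not match. Every value comes back as a string; json.loads is shown alongside for comparison, since a real JSON parser keeps types and does not depend on the exact spacing.

```python
import json
import re

# The fixed pattern from the EDIT, as a raw string.
# (?:,(?=\")|\}$) ends a value either at a comma followed by the next key's quote,
# or at the closing brace at the very end of the string.
PAIR_RE = re.compile(r'\"(.*?)\"[:]+\"?(.*?)\"?(?:,(?=\")|\}$)')

# Assumption: the JSON block is compact (no spaces or newlines between tokens).
block = ('{"author":"joeinfro","gilded":0,'
         '"body":"thats 1 case per 5000 people. nice!","controversiality":0}')

pairs = PAIR_RE.findall(block)  # list of (key, value) tuples, values as strings
print(pairs)

# For comparison: a JSON parser returns typed values (0 stays an int).
parsed = json.loads(block)
print(list(parsed.items()))
```

Note that the last pair ("controversiality", "0") is now matched through the \}$ branch.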


How to map jOOQ results to their respective entities

I have this SQL query:
select question.*,
question_option.id
from question
left join question_option on question_option.question_id = question.id;
How do I map the result to the entity, so that the output looks like the following?
Can anyone give sample code for getting this result?
{
"id": 2655,
"type": "MCQSingleCorrect",
"difficultyLevel": "Advanced",
"question": "Which country are you from?",
"answer": null,
"marks": 1.5,
"negativeMarks": 0.5,
"hint": null,
"explanation": null,
"booleanAnswer": null,
"passage": null,
"isPassageQuestion": null,
"audioFile": null,
"videoFile": null,
"questionFiles": [],
"tags": [],
"updatedAt": "2021-12-21T11:57:03.229136Z",
"createdAt": "2021-12-21T11:57:03.229098Z",
"questionOptions": [
{
"id": 2719,
"option": "India",
"index": 1,
"correct": false,
"blank": null
},
{
"id": 2720,
"option": "Newzealand",
"index": 1,
"correct": false,
"blank": null
},
{
"id": 2721,
"option": "England",
"index": 1,
"correct": true,
"blank": null
},
{
"id": 2722,
"option": "Australia",
"index": 1,
"correct": false,
"blank": null
}
]}
I'm answering from the perspective of our discussion in the comments, where I suggested you don't need JPA in the middle, because you can do every mapping/projection with jOOQ directly. In this case, if you're targeting a JSON client, why not just use SQL/JSON? Rather than producing flat joined rows, you nest your collection like this:
ctx.select(jsonObject(
key("id", QUESTION.ID),
key("type", QUESTION.TYPE),
..
key("questionOptions", jsonArrayAgg(jsonObject(
key("id", QUESTION_OPTION.ID),
key("option", QUESTION_OPTION.OPTION),
..
)))
))
.from(QUESTION)
.leftJoin(QUESTION_OPTION)
.on(QUESTION_OPTION.QUESTION_ID.eq(QUESTION.ID))
// Assuming you have a primary key here.
// Otherwise, add also the other QUESTION columns to the GROUP BY clause
.groupBy(QUESTION.ID)
.fetch();
This will produce a NULL JSON array if a question doesn't have any options. You can coalesce() it to an empty array, if needed. There are other ways to achieve the same thing: you could also use MULTISET if you don't actually need JSON, but just a hierarchy of Java objects.
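To make the nesting that GROUP BY plus a JSON array aggregate performs more concrete, here is a plain-Python sketch of the same transformation done in application code on flat LEFT JOIN rows. It is not jOOQ API; the row layout, the helper name nest_options, and the second question (id 2656) are hypothetical. In this sketch a question without options ends up with an empty list, comparable to coalescing the aggregated array to an empty one.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical flat rows shaped like a LEFT JOIN of question and question_option:
# (question.id, question.question, question_option.id, question_option.option).
# A question with no options comes back as one row with None in the option columns.
Row = Tuple[int, str, Optional[int], Optional[str]]


def nest_options(rows: List[Row]) -> List[Dict[str, Any]]:
    """Group flat join rows by question id and collect each question's options."""
    questions: Dict[int, Dict[str, Any]] = {}
    for q_id, q_text, opt_id, opt_text in rows:
        # Create the parent object the first time its id is seen (dicts keep insertion order).
        question = questions.setdefault(
            q_id, {"id": q_id, "question": q_text, "questionOptions": []}
        )
        if opt_id is not None:  # skip the all-NULL row a LEFT JOIN yields for "no options"
            question["questionOptions"].append({"id": opt_id, "option": opt_text})
    return list(questions.values())


rows: List[Row] = [
    (2655, "Which country are you from?", 2719, "India"),
    (2655, "Which country are you from?", 2720, "Newzealand"),
    (2656, "Hypothetical question without options", None, None),
]
nested = nest_options(rows)
print(json.dumps(nested, indent=2))
```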
As a rule of thumb, you hardly ever need JPA in your code when you're using jOOQ, except if you really rely on JPA's object graph persistence features.
You can write the query with jOOQ and then do this:
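Query result = em.createNativeQuery(query.getSQL());
List<Object> values = query.getBindValues(); // jOOQ bind values must be passed on to the JPA query
for (int i = 0; i < values.size(); i++) result.setParameter(i + 1, values.get(i));
result.getResultList(); // or result.getSingleResult(), depending on what you need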
Read more here:
https://www.jooq.org/doc/3.15/manual/sql-execution/alternative-execution-models/using-jooq-with-jpa/using-jooq-with-jpa-native/
JSON can be fetched directly using SQL (and also jOOQ). Here are some examples:
https://72.services/use-the-power-of-your-database-xml-and-json/

Workaround for Sequelize boolean value from MySQL database

I am fetching data from a MySQL database in an Express REST API app, using Sequelize as the ORM.
For a BIT(1) value from MySQL, Sequelize returns an instance of a Buffer object.
{
"id": 4,
"ProductPrice": 12.25,
"ProductQuantityOnHand": 0,
"ProductCode": "P486",
"ProductName": "FirstProduct",
"ProductDescription": null,
"ProductActive": {
"type": "Buffer",
"data": [
1
]
},
"createdAt": "2019-02-02T11:27:00.000Z",
"updatedAt": "2019-02-02T11:27:00.000Z"
}
Here ProductActive is a BIT(1) column, and Sequelize returns an object for it.
How can I get a boolean value instead of an object, like this?
{
"id": 4,
"ProductPrice": 12.25,
"ProductQuantityOnHand": 0,
"ProductCode": "P486",
"ProductName": "FirstProduct",
"ProductDescription": null,
"ProductActive": true,
"createdAt": "2019-02-02T11:27:00.000Z",
"updatedAt": "2019-02-02T11:27:00.000Z"
}
I might suggest that you just use an INT column in your MySQL table. Assuming you only store values 0 and 1, these same values should show up in your ORM/application layer.
Since the value 0 is "falsy" in JavaScript, it would behave the same way as false in conditions, and likewise 1, which is "truthy", would behave like true.
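If changing the column type is not an option, the Buffer can also be converted after serialization. The sketch below does this in Python on the JSON shown in the question; it only illustrates the conversion and is not a Sequelize feature, and the helper names are hypothetical. It relies on the fact that a Node.js Buffer serialized with JSON.stringify takes the {"type": "Buffer", "data": [...]} shape, and assumes a BIT(1) value arrives as exactly one byte. In the Express app itself the same check would be written in JavaScript.

```python
import json
from typing import Any, Dict


def is_bit_buffer(value: Any) -> bool:
    """True for the {"type": "Buffer", "data": [...]} shape of a serialized Node.js Buffer."""
    return (
        isinstance(value, dict)
        and value.get("type") == "Buffer"
        and isinstance(value.get("data"), list)
        and len(value["data"]) == 1  # assumption: a BIT(1) column serializes to one byte
    )


def bits_to_bool(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with single-byte Buffer values replaced by booleans."""
    return {
        key: (value["data"][0] != 0) if is_bit_buffer(value) else value
        for key, value in record.items()
    }


product = json.loads('{"id": 4, "ProductCode": "P486", '
                     '"ProductActive": {"type": "Buffer", "data": [1]}}')
converted = bits_to_bool(product)
print(converted)
```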

How do you get the first element out of a JSON String without knowing the name of the element in FileMaker 16 or 17?

I had an issue today with FileMaker: how do I get the first element out of a JSON result without knowing the key?
Example $json result from an API call:
{
"26298070": {
"task_id": "26298070",
"parent_id": "0",
"name": "DEPOT-0045 Research ODBC Model Extraction via Django To cut down on development time from Filemaker to Postgres",
"external_task_id": "32c8fd51-2066-42b9-b88b-8a2275fafc3f",
"external_parent_id": "64e7c829-d88e-48ae-9ba4-bb7a3871a7ce",
"level": "1",
"add_date": "2018-06-04 21:45:16",
"archived": "0",
"color": "#34C644",
"tags": "DEPOT-0045",
"budgeted": "1",
"checked_date": null,
"root_group_id": "91456",
"assigned_to": null,
"assigned_by": null,
"due_date": null,
"note": "",
"context": null,
"folder": null,
"repeat": null,
"billable": "0",
"budget_unit": "hours",
"public_hash": null,
"modify_time": null
}
}
I tried JSONGetElement( $json, "") and got the original JSON.
I tried JSONGetElement( $json, ".") and got the original JSON.
I tried JSONGetElement( $json, 1 ) and got nothing.
How do you get the first element out of a JSON String without knowing the name of the element in FileMaker 16 or 17?
Try this for the root element:
JSONListKeys ( $json ; "" )
result: 26298070
Once you get the root, you can get the child keys.
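For comparison outside FileMaker, the same two steps (get the root key, then read the child object under it) look like this in Python. The JSON is a trimmed copy of the response above. Python dicts keep the order of keys in the document, so next(iter(...)) returns the first root key; with a single root key this corresponds to what JSONListKeys ( $json ; "" ) returns.

```python
import json

# Trimmed copy of the API response from the question (only a few fields kept).
raw = '{"26298070": {"task_id": "26298070", "parent_id": "0", "level": "1"}}'

data = json.loads(raw)
root_key = next(iter(data))  # first root key, whatever its name
task = data[root_key]        # the child object under that key
child_keys = list(task)      # its keys, in document order
print(root_key, child_keys)
```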
I remembered that FileMaker has a function to extract words from text, so I thought I'd see what happened if I used the first word as the key.
I tried
JSONGetElement ( $json ; MiddleWords ( $json ; 1 ; 1 ) )
and got the result I was looking for.
{
"add_date": "2018-06-04 21:45:16",
"archived": "0",
"assigned_by": null,
"assigned_to": null,
"billable": "0",
"budget_unit": "hours",
"budgeted": "1",
"checked_date": null,
"color": "#34C644",
"context": null,
"due_date": null,
"external_parent_id": "64e7c829-d88e-48ae-9ba4-bb7a3871a7ce",
"external_task_id": "32c8fd51-2066-42b9-b88b-8a2275fafc3f",
"folder": null,
"level": "1",
"modify_time": null,
"name": "DEPOT-0045 Research ODBC Model Extraction via Django To cut down on development time from Filemaker to Postgres",
"note": "",
"parent_id": "0",
"public_hash": null,
"repeat": null,
"root_group_id": "91456",
"tags": "DEPOT-0045",
"task_id": "26298070"
}
which makes it easy to parse simple JSON schemas that use attributes as keys.

R: JSON data imported into wrong columns

I imported some JSON data using the rjson library. The problem I'm facing is that some of the data appears to be misaligned. I suspect this is due to missing values.
How can I detect and re-align the data that is in the wrong columns, and fill empty values with NULL? I cannot share the data; I hope the image will be enough.
Code used to import the data:
library(rjson)
json_data <- do.call(rbind, lapply(readLines(training.file$filepaths[ind]), rjson::fromJSON))
json_data <- as.data.frame(json_data)
I have also tried using the jsonlite::fromJSON function instead of rjson::fromJSON, but I get the following error:
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
parse error: trailing garbage
d_str": null, "place": null} {"truncated": false, "text": "R
(right here) ------^
JSON file format (the data is altered, but all properties are present in this example):
{
"truncated": false, "text": "abc abc", "in_reply_to_status_id": null,
"id": 123, "favorite_count": 0, "retweeted": false, "entities": {
"symbols": [], "user_mentions": [], "hashtags": [], "urls": []
},
"in_reply_to_screen_name": null, "id_str": "123", "retweet_count": 0,
"in_reply_to_user_id": null, "screen_name_statistics": {
"has_underscore": true, "contains_swear": false, "has_digits": false,
"contains_condition": false, "has_chars": true
},
"user": {
"verified": false, "geo_enabled": false, "followers_count": 0,
"utc_offset": -14400, "statuses_count": 17600, "friends_count": 4425,
"lang": "en", "favourites_count": 1900, "screen_name": "1name1",
"url": null, "created_at": "Sat Jun 00 03:36:27 +0000 2012",
"time_zone": "Atlantic Time (Canada)", "listed_count": 2
},
"geo": null, "in_reply_to_user_id_str": null, "lang": "en",
"created_at": "Mon Nov 55 05:18:49 +0000 2013",
"in_reply_to_status_id_str": null, "place": null
}
Further information:
obj1 and obj2 contain different numbers of properties: obj1 contains 19 properties, while obj2 contains 20.
The misalignment occurs when the list is converted to a data frame using as.data.frame. A custom function may be required to take property names into account.
I used the rjson::fromJSON function to import the data. It was imported as a list, which I then converted to a data frame for further analysis using as.data.frame.
At first I did not notice that the JSON objects had different numbers of properties, which was causing the data to be misaligned in the data frame: the column names were not matched.
To fix this, I wrote a custom mapping function that looks at the individual values in the list and maps them into a predefined data frame.
Code specific to my example is available here. Specifically, the "importJSON" function handles the mapping of the list to a data frame.
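The linked code is not reproduced here, so the following is only a sketch of the same idea in Python, not the author's importJSON function: parse one object per line and align values by property name, filling missing properties with None (the counterpart of NA/NULL). Parsing line by line also fits this file layout; the jsonlite "trailing garbage" error above shows a second object starting right after the first one ends, which a parser expecting a single JSON document rejects. The helper name and the input lines are hypothetical.

```python
import json
from typing import Any, Dict, List


def align_records(lines: List[str]) -> List[Dict[str, Any]]:
    """Parse one JSON object per line and align values by property name.
    Columns are taken in order of first appearance; missing properties become None."""
    records = [json.loads(line) for line in lines if line.strip()]
    columns: List[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    return [{column: record.get(column) for column in columns} for record in records]


# Hypothetical input: the second object lacks "truncated" and "place" but adds "text".
lines = [
    '{"truncated": false, "id": 1, "place": null}',
    '{"id": 2, "text": "abc abc"}',
]
aligned = align_records(lines)
for row in aligned:
    print(row)
```

Because every row has the same keys in the same order, converting the result to a table no longer depends on each object having the same number of properties.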

JSON to CSV: How to add filters (columns) in the final Excel table?

First, I apologize if my description is not accurate enough. I am a total newbie and know nothing about programming, so don't hesitate to ask if you need more detailed info; I will try to be as precise as possible.
I have downloaded a bunch of tweets using Twitter's API from the Terminal (through Twurl). All the tweets are in a .json file, which I open with TextWrangler (I'm on a Mac). When I export the .json file to a .csv file, so that I can process and analyze the data more easily in Excel (or rather LibreOffice Calc, its Excel equivalent), I don't get all the fields I need for my study: I am missing the "bio" part of each tweet's information, which is present in the .json file. In other words, my final table has a column for the tweet ID, one for the tweet author, one for the text of the tweet itself, and so on, but no column for the author's bio, even though this information appears in the .json file.
So my question is: is there code or a tool that would let me add a column to my final .csv table containing more of the info present in the original .json file?
Again, this may not be clear, so don't hesitate to tell me if you need me to highlight a specific point.
Thanks in advance for any insight. I really need help with this one: it is for a research project I am carrying out for my PhD, so any help would be more than welcome!
EDIT: As an example, here is a sample of the data I have for one tweet in my original .json file:
{
"created_at": "Mon Apr 28 09:00:40 +0000 2014",
"id": 460705144846712800,
"id_str": "460705144846712832",
"text": "Work can suck a dick today",
"source": "Twitter for iPhone",
"truncated": false,
"in_reply_to_status_id": null,
"in_reply_to_status_id_str": null,
"in_reply_to_user_id": null,
"in_reply_to_user_id_str": null,
"in_reply_to_screen_name": null,
"user": {
"id": 253350311,
"id_str": "253350311",
"name": "JEEEZUS",
"screen_name": "Maxi_Flex",
"location": "Southchestershire",
"url": "http://www.soundcloud.com/maxi_flex",
"description": "Jazz Personality.G Mentality.",
"protected": false,
"followers_count": 457,
"friends_count": 400,
"listed_count": 1,
"created_at": "Thu Feb 17 02:08:57 +0000 2011",
"favourites_count": 1229,
"utc_offset": null,
"time_zone": null,
"geo_enabled": true,
"verified": false,
"statuses_count": 13661,
"lang": "en",
"contributors_enabled": false,
"is_translator": false,
"is_translation_enabled": false,
"profile_background_color": "08ABFC",
"profile_background_image_url": "http://pbs.twimg.com/profile_background_images/444297891977244672/Z1BkfCFB.jpeg",
"profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/444297891977244672/Z1BkfCFB.jpeg",
"profile_background_tile": true,
"profile_image_url": "http://pbs.twimg.com/profile_images/454073282778902529/gCGicDBH_normal.jpeg",
"profile_image_url_https": "https://pbs.twimg.com/profile_images/454073282778902529/gCGicDBH_normal.jpeg",
"profile_banner_url": "https://pbs.twimg.com/profile_banners/253350311/1392339276",
"profile_link_color": "FA05F2",
"profile_sidebar_border_color": "FFFFFF",
"profile_sidebar_fill_color": "DDEEF6",
"profile_text_color": "333333",
"profile_use_background_image": true,
"default_profile": false,
"default_profile_image": false,
"following": null,
"follow_request_sent": null,
"notifications": null
},
"geo": null,
"coordinates": null,
"place": null,
"contributors": null,
"retweet_count": 0,
"favorite_count": 0,
"entities": {
"hashtags": [],
"symbols": [],
"urls": [],
"user_mentions": []
},
"favorited": false,
"retweeted": false,
"filter_level": "medium",
"lang": "en"
}
So my final CSV file has some of the info mentioned above, but what I need to add to it is the "description" field (inside "user") of each tweet. Any help would be appreciated!
The problem is probably that JSON is hierarchical and CSV is not. I'm guessing that you are only getting the top-level JSON elements and not the nested objects. For example, if your JSON is:
{
"name": "test",
"author": {
"id": 123,
"created": ""
}
}
do you only get 'name' and not 'author.id'? If so, check out other questions on SO about flattening JSON for CSV, e.g. "flattening json to csv format".
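A minimal sketch of that flattening in Python, applied to the example object above written as valid JSON: nested objects become dotted column names such as author.id, so the bio in a tweet would land in a user.description column. Storing lists as JSON text is one choice among several, and the flatten helper is hypothetical rather than a specific library's API.

```python
import csv
import io
import json
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted keys, e.g. {"author": {"id": 123}} -> {"author.id": 123}.
    Lists are kept as JSON text because CSV has no nested structure."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


example = json.loads('{"name": "test", "author": {"id": 123, "created": ""}}')
row = flatten(example)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```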
Any good JSON-to-CSV converter will work; try this one. If there is something funky in the JSON, we need an example of the input JSON and of the output you get.
If you just need that one field, enter the following command on the command line:
cat test.json | sed -n 's/.*"description": *"\([^"]*\)".*/Description, \1/p' > result.csv
Where test.json is the file with all the JSON entries in it. The " *" after the colon also allows the space that pretty-printed JSON puts there.
Here is the output from an example I ran:
cat test.json | sed -n 's/.*"description": *"\([^"]*\)".*/\1/p'
Jazz Personality.G Mentality.
Jazz Personality.G Mentality.
Jazz Personality.G Mentality.
Jazz Personality.G Mentality.
If the file is very large, you may need to split it into parts:
split -l N test.json part
Where N is the number of lines per part.
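As an alternative to sed, a small Python script can extract the same field and write proper CSV, where csv.writer quotes values that contain commas (which a bio can easily do). This sketch assumes the file has one JSON tweet per line; a pretty-printed file would need json.load on the whole document instead. The column list, helper names, and the second tweet are hypothetical.

```python
import csv
import io
import json
from typing import Any, Iterable, TextIO

# Columns to export; "user.description" is the author's bio. Adjust as needed.
FIELDS = ["id_str", "user.screen_name", "user.description"]


def lookup(obj: Any, dotted: str) -> Any:
    """Follow a dotted path such as "user.description"; return None if any part is missing."""
    for part in dotted.split("."):
        if not isinstance(obj, dict):
            return None
        obj = obj.get(part)
    return obj


def tweets_to_csv(lines: Iterable[str], out: TextIO) -> None:
    """Write a header plus one CSV row per line-delimited tweet."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    for line in lines:
        if line.strip():
            tweet = json.loads(line)
            writer.writerow([lookup(tweet, field) for field in FIELDS])


lines = [
    '{"id_str": "460705144846712832", "user": {"screen_name": "Maxi_Flex", '
    '"description": "Jazz Personality.G Mentality."}}',
    # Hypothetical second tweet whose bio contains a comma.
    '{"id_str": "1", "user": {"screen_name": "someone", "description": "bio, with a comma"}}',
]
out = io.StringIO()
tweets_to_csv(lines, out)
print(out.getvalue())
```

For a real file, pass an open file object instead of the list, e.g. open("test.json") for reading and open("result.csv", "w", newline="") for writing.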