Parsing and cleaning a text file in Python?

I have a text file that contains raw data. I want to parse that data and clean it so that it can be used further. The following is the raw data:
"{\x0A \x22identifier\x22: {\x0A \x22company_code\x22: \x22TSC\x22,\x0A \x22product_type\x22: \x22airtime-ctg\x22,\x0A \x22host_type\x22: \x22android\x22\x0A },\x0A \x22id\x22: {\x0A \x22type\x22: \x22guest\x22,\x0A \x22group\x22: \x22guest\x22,\x0A \x22uuid\x22: \x221a0d4d6e-0c00-11e7-a16f-0242ac110002\x22,\x0A \x22device_id\x22: \x22423e49efa4b8b013\x22\x0A },\x0A \x22stats\x22: [\x0A {\x0A \x22timestamp\x22: \x222017-03-22T03:21:11+0000\x22,\x0A \x22software_id\x22: \x22A-ACTG\x22,\x0A \x22action_id\x22: \x22open_app\x22,\x0A \x22values\x22: {\x0A \x22device_id\x22: \x22423e49efa4b8b013\x22,\x0A \x22language\x22: \x22en\x22\x0A }\x0A }\x0A ]\x0A}"
I want to remove all the hexadecimal escape sequences. I tried parsing the data, storing it in an array, and cleaning it with re.sub(), but it returns the same data:
for line in f:
    new_data = re.sub(r'[^\x00-\x7f],\x22', r'', line)
    data.append(new_data)

\x0A is the hex escape for a newline. After s = <your json string>, print(s) gives:
>>> print(s)
{
  "identifier": {
    "company_code": "TSC",
    "product_type": "airtime-ctg",
    "host_type": "android"
  },
  "id": {
    "type": "guest",
    "group": "guest",
    "uuid": "1a0d4d6e-0c00-11e7-a16f-0242ac110002",
    "device_id": "423e49efa4b8b013"
  },
  "stats": [
    {
      "timestamp": "2017-03-22T03:21:11+0000",
      "software_id": "A-ACTG",
      "action_id": "open_app",
      "values": {
        "device_id": "423e49efa4b8b013",
        "language": "en"
      }
    }
  ]
}
You should parse this with the json module's load (from a file) or loads (from a string) function. You will get a dict containing two dicts and a list holding one more dict.
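For example, a minimal sketch (hypothetical file name; assuming the file stores the \xNN escapes as literal text, so they are interpreted first with the unicode_escape codec):
import json
with open("raw.txt") as f:  # hypothetical file name
    raw = f.read().strip()
# If the file contains literal \x0A / \x22 sequences, interpret them first.
decoded = raw.encode("latin-1").decode("unicode_escape")
# Strip the surrounding quotes left over from the dump, then parse.
data = json.loads(decoded.strip('"'))
print(data["identifier"]["company_code"])  # TSC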

Related

JSON file into Lua table

How is it possible to get the contents of a JSON file:
{
  "name": "John",
  "work": "chef",
  "age": "29",
  "messages": [
    {
      "msg_name": "Hello",
      "msg": "how_are_you"
    },
    {
      "second_msg_name": "hi",
      "msg": "fine"
    }
  ]
}
into a Lua table? None of the json.lua scripts I found worked for JSON that contains newlines. Does anyone know a solution?
So piglet's solution works for a string defined in the same script.
But how does it work with a JSON file?
local json = require("dkjson")
local file = io.open("C:\\Users\\...\\Documents\\Lua_Plugins\\test_file_reader\\test.json", "r")
local myTable = json.decode(file)
print(myTable)
This produces the error "bad argument #1 to 'strfind' (string expected, got FILE*)". Does anyone see my mistake?
local json = require("dkjson")
local yourString = [[{
  "name": "John",
  "work": "chef",
  "age": "29",
  "messages": [
    {
      "msg_name": "Hello",
      "msg": "how_are_you"
    },
    {
      "second_msg_name": "hi",
      "msg": "fine"
    }
  ]
}]]
local myTable = json.decode(yourString)
http://dkolf.de/src/dkjson-lua.fsl/home
I found the solution: json.decode expects a string, not a file handle (which is why the error mentions strfind), so read the file's contents first:
local json = require("dkjson")
local file = io.open("C:\\Users\\...\\Documents\\Lua_Plugins\\test_file_reader\\test.json", "r")
local content = file:read "*a" -- read the whole file into a string
file:close()
local myTable = json.decode(content) -- decode the string, not the file handle

Dataframe Row to JSON giving incorrect result

I need to convert each row of a DataFrame to JSON.
Each row is converted to JSON by the following line:
data = df.loc[0].to_json()
Output:
{
  "uid": "1234",
  "url": "https:\\/\\/google.com",
  "name": "google.com",
  "queries": [
    "Query1",
    "Query2",
    "Query3"
  ]
}
Why is it not parsing the URLs properly?
Ideally, the JSON form of the record should look like the one below, shouldn't it?
{
  "uid": "1234",
  "url": "https:\/\/google.com",
  "name": "google.com",
  "queries": [
    "Query1",
    "Query2",
    "Query3"
  ]
}
DataFrame.to_json escapes forward slashes (writing / as \/), which is why the URL looks mangled. json.dumps(df.loc[0].to_dict()) returns the data as JSON without escaping the URLs. See below:
In [19]: df
Out[19]:
    uid                 url        name                   queries
0  1234  https://google.com  google.com  [Query1, Query2, Query3]
In [20]: json.dumps(df.loc[0].to_dict())
Out[20]: '{"uid": "1234", "url": "https://google.com", "name": "google.com", "queries": ["Query1", "Query2", "Query3"]}'
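Note that the escaped form is still valid JSON: \/ is simply an escaped /, and both variants decode to the same value. A quick check with the plain json module:
import json
escaped = '{"url": "https:\\/\\/google.com"}'  # the form to_json produces
plain = '{"url": "https://google.com"}'  # the form json.dumps produces
# Both decode to the identical dict; the escaping is purely cosmetic.
assert json.loads(escaped) == json.loads(plain)
print(json.loads(escaped)["url"])  # https://google.com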

Robot Framework: how to count items in a JSON list

I would like to count the "start" items from a JSON API response in Robot Framework:
{
  "result": {
    "api": "xxx",
    "timestamp": "14:41:18",
    "series": [
      {
        "series_code": "test",
        "series_name_eng": "test",
        "unit_eng": "t",
        "series_type": "e",
        "frequency": "s",
        "last_update_date": "2020",
        "observations": [
          {
            "start": "2020-01",
            "value": "999"
          },
          {
            "start": "2020-02",
            "value": "888"
          },
          {
            "start": "2020-03",
            "value": "777"
          },
        ]
      }
I tried this, but it does not work:
${json_string}    Get File    ./example.json
${json_object}    evaluate    json.loads('''${json_string}''')    json
#${value}=    get value from json    ${json_object}    $.result.series[0].observations
${x_count}    Get Length    ${json_object["$.result.series[0].observations"]}
Could you please guide me on how to do this?
The JSON provided in the example above is not valid. It needs to be fixed by closing the series array with ], then closing the result object with }, and then closing the outer object with }.
Valid JSON will look like this:
${Getjson}=    {"result":{"api":"xxx","timestamp":"14:41:18","series":[{"series_code":"test","series_name_eng":"test","unit_eng":"t","series_type":"e","frequency":"s","last_update_date":"2020","observations":[{"start":"2020-01","value":"999"},{"start":"2020-02","value":"888"},{"start":"2020-03","value":"777"}]}]}}
You were close with the JSONPath $.result.series[0].observations. The correct usage is shown in the example below:
${json}=    Convert String to JSON    ${Getjson}
${Start}=    Get Value From Json    ${json}    $.result.series[?(@.observations)].observations[?(@.start)].start
${length}    Get Length    ${Start}
log    ${length}
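If it helps to sanity-check the path outside Robot Framework, the same count in plain Python (assuming the corrected JSON above has been saved as example.json) would be:
import json
with open("example.json") as f:  # hypothetical file holding the corrected JSON
    data = json.load(f)
observations = data["result"]["series"][0]["observations"]
print(len(observations))  # 3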

How to open a .mongo file and export the content into csv?

EDIT 2014-05-01: I tried fromJSON first (as suggested below), but that only parsed the first line. I found out that there were commas missing between the brackets of each JSON line so I changed that in TextEdit and saved the file. I also added [ at the beginning of the file and ] at the end and then it worked with JSON. Now the next step: from a list (with embedded lists) to a dataframe (or csv).
I get a data package from edX every now and then on the courses we are evaluating. Some of these are just plain .csv files, which are quite easy to handle; others are more difficult for me (not having a CS or programming background).
I have 2 files I want to open and parse into csv files for analysis in R. I have tried many json2csv tools out there, but to no avail. I also tried the simple methods described here to turn json into csv.
The data is confidential, so I cannot share the entire data set, but I will share the first two lines of the file; maybe that helps. The problem is that I cannot find anything about .mongo files anywhere, which seems quite strange to me. Do they even exist? Or is this just a JSON file that may be corrupted (which could explain the errors)?
Any suggestions are welcome.
The first 2 lines in one of the .mongo files:
{
  "_id": { "$oid": "52d1e62c350e7a3156000009" },
  "votes": {
    "up": [],
    "down": [],
    "up_count": 0,
    "down_count": 0,
    "count": 0,
    "point": 0
  },
  "visible": true,
  "abuse_flaggers": [],
  "historical_abuse_flaggers": [],
  "parent_ids": [],
  "at_position_list": [],
  "body": "the delft university accredited course with the scholarship (fundamentals of water treatment) is supposed to start in about a month's time. But have the scholarship list been published? Any tentative date??",
  "course_id": "DelftX/CTB3365x/2013_Fall",
  "_type": "Comment",
  "endorsed": false,
  "anonymous": false,
  "anonymous_to_peers": false,
  "author_id": "269835",
  "comment_thread_id": { "$oid": "52cd40c5ab40cf347e00008d" },
  "author_username": "tachak59",
  "sk": "52d1e62c350e7a3156000009",
  "updated_at": { "$date": 1389487660636 },
  "created_at": { "$date": 1389487660636 }
}{
  "_id": { "$oid": "52d0a66bcb3eee318d000012" },
  "votes": {
    "up": [],
    "down": [],
    "up_count": 0,
    "down_count": 0,
    "count": 0,
    "point": 0
  },
  "visible": true,
  "abuse_flaggers": [],
  "historical_abuse_flaggers": [],
  "parent_ids": [ { "$oid": "52c63278100c07c0d1000028" } ],
  "at_position_list": [],
  "body": "I got it. Thank you!",
  "course_id": "DelftX/CTB3365x/2013_Fall",
  "_type": "Comment",
  "endorsed": false,
  "anonymous": false,
  "anonymous_to_peers": false,
  "parent_id": { "$oid": "52c63278100c07c0d1000028" },
  "author_id": "2655027",
  "comment_thread_id": { "$oid": "52c4f303b03c4aba51000013" },
  "author_username": "dmoronta",
  "sk": "52c63278100c07c0d1000028-52d0a66bcb3eee318d000012",
  "updated_at": { "$date": 1389405803386 },
  "created_at": { "$date": 1389405803386 }
}{
  "_id": { "$oid": "52ceea0cada002b72c000059" },
  "votes": {
    "up": [],
    "down": [],
    "up_count": 0,
    "down_count": 0,
    "count": 0,
    "point": 0
  },
  "visible": true,
  "abuse_flaggers": [],
  "historical_abuse_flaggers": [],
  "parent_ids": [ { "$oid": "5287e8d5906c42f5aa000013" } ],
  "at_position_list": [],
  "body": "if u please send by mail \n",
  "course_id": "DelftX/CTB3365x/2013_Fall",
  "_type": "Comment",
  "endorsed": false,
  "anonymous": false,
  "anonymous_to_peers": false,
  "parent_id": { "$oid": "5287e8d5906c42f5aa000013" },
  "author_id": "2276302",
  "comment_thread_id": { "$oid": "528674d784179607d0000011" },
  "author_username": "totah1993",
  "sk": "5287e8d5906c42f5aa000013-52ceea0cada002b72c000059",
  "updated_at": { "$date": 1389292044203 },
  "created_at": { "$date": 1389292044203 }
}
R doesn't have "native" support for these files, but there is a JSON parser in the rjson package, so you might load your .mongo file with:
library(rjson)
myfile <- "path/to/myfile.mongo"
myJSON <- readLines(myfile)
myNiceData <- fromJSON(myJSON)
Since rjson converts into a data structure that fits the object being read, you'll have to do some additional snooping, but once you have an R data type you shouldn't have any trouble working with it from there.
Another package to consider when parsing JSON data is jsonlite. It will make data frames for you so you can write them to a csv format with write.table or some other applicable method for writing objects.
NOTE: if it is easier to connect to the MongoDB and get the data from a request, then RMongo may be a good bet. The R-Bloggers also made a post about using RMongo that has a nice little walkthrough.
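As an aside, once the file has been repaired into a valid JSON array (as described in the edit above), the flatten-and-write step can also be done in Python with pandas. A minimal sketch, with hypothetical file names:
import json
import pandas as pd
with open("myfile_fixed.json") as f:  # hypothetical: the repaired JSON array
    records = json.load(f)
# Flatten nested fields ("_id.$oid", "votes.up_count", ...) into columns.
df = pd.json_normalize(records)
df.to_csv("myfile.csv", index=False)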
I used rjson as suggested by @theWanderer and, with the help of a colleague, wrote the following code to parse the data into columns, choosing the specific columns that are needed and checking each instance to see whether it returns the right variables.
Entire workflow:
Checked some of the data in jsonlint and corrected the errors: },{ instead of }{ between each record, and [ and ] at the beginning and end of the file (a programmatic alternative is sketched after this list)
Made a smaller file to play with, containing about 11 JSON lines
Used the code below to parse the data file, first checking whether the different listItems are lists themselves (that causes problems). As you will see, I also removed things like \n because they caused errors, and added an empty value for parent_id when there is none in the data (otherwise it would mix up the rows)
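For reference, a minimal Python sketch of the repair step from the first item above (hypothetical file name): json.JSONDecoder.raw_decode reads one object at a time, so the missing commas between records never need to be fixed by hand.
import json
decoder = json.JSONDecoder()
records = []
with open("temp.mongo") as f:  # hypothetical file name
    text = f.read()
# The dump is a stream of concatenated JSON objects; decode them one by one.
pos = 0
while pos < len(text):
    # raw_decode does not skip leading whitespace itself.
    while pos < len(text) and text[pos].isspace():
        pos += 1
    if pos >= len(text):
        break
    obj, pos = decoder.raw_decode(text, pos)
    records.append(obj)
print(len(records))  # number of records in the file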
The code to import the .mongo file into R and then parse it into CSV:
library(rjson)
###### set working directory to write out the data file
setwd("/your/favourite/dir/json to csv/")
# never ever convert strings to factors
options(stringsAsFactors = FALSE)
# import the .mongo file to R
temp.data = fromJSON(file="temp.mongo", method="C", unexpected.escape="error")
file.remove("temp.csv") ## removes the old datafile if there is one
                        ## (so the data is not appended to the file,
                        ## but a new file is created)
listItem = temp.data[[1]] ## prepare the listItem the first time
for (listItem in temp.data){
  parent_id = ""
  if (length(listItem$parent_id) > 0){
    parent_id = listItem$parent_id
  }
  write.table(t(c(
    listItem$votes$up_count, listItem$visible, parent_id,
    gsub("\n", "", listItem$body), listItem$course_id, unlist(listItem["_type"]),
    listItem$endorsed, listItem$anonymous, listItem$author_id,
    unlist(listItem$comment_thread_id), listItem$author_username,
    as.POSIXct(unlist(listItem$created_at)/1000, origin="1970-01-01"))), # end t(), c()
    file="temp.csv", sep="\t", append=TRUE, row.names=FALSE, col.names=FALSE)
}

How to store a JSON file using Lift Mapper in MySQL

I am new to Liftweb. I want to store a JSON file in a MySQL database using Lift Mapper.
My JSON file looks like this:
[
  {
    "name": "Root Category",
    "Id": "1",
    "dispName": "",
    "childs": [
      {
        "name": "Sub Category",
        "Id": "",
        "dispName": "",
        "childs": [
          {
            "name": "Spec1",
            "Id": "",
            "dispName": "",
            "childs": []
          }
        ]
      }
    ]
  },
  {
    "name": "Root Category",
    "Id": "",
    "dispName": "",
    "childs": [
      {
        "name": "Sub Category",
        "Id": "",
        "dispName": "",
        "childs": [
          {
            "name": "Spec1",
            "Id": "",
            "dispName": "",
            "childs": []
          }
        ]
      }
    ]
  }
]
Is it possible to store a JSON file with Lift Mapper? Please give me suggestions. It would be great if someone could provide a sample.
Best Regards
GSY
At the moment there is no good support for storing JSON in MySQL; it is not going to provide the capabilities MongoDB provides, for example. However, there are some JSON processing functions provided by the community if you want them. Given all that, you can store it in a VARCHAR, TEXT, or BLOB field as plain text. Here is a Mapper example:
import net.liftweb.mapper._
import net.liftweb.common._

class SomeDbClass extends LongKeyedMapper[SomeDbClass] with IdPK {
  def getSingleton = SomeDbClass

  // set limit of chars - can be used in `validate()`
  object quota_type extends MappedString(this, 1024)
}

object SomeDbClass extends SomeDbClass with LongKeyedMetaMapper[SomeDbClass]
For one of my projects I store JSON as a string in Postgres in a similar way, because I just need to read and write it without parsing it in the DB or querying by fields. Whenever I need efficient JSON storage with query and update support, I use MongoDB with Record plus Casbah or Rogue.