I have a json file containing a list of strings like this:
['Hello\nHow are you?', 'What is your name?\nMy name is john']
I have to read this file and store it as a list of strings but I am so confused that how should I read json file like this. Also, I should use utf-8 encoding format.
Let's assume you have one or multiple lines as described in the json file. Here is my suggestion (Remember to replace the file name test.json to yours):
import ast
with open("test.json", "r") as input_file:
line_list = input_file.readlines()
all_texts = [item for sublist in line_list for item in ast.literal_eval(sublist)]
print(all_texts)
The file you have shown is not in json format. Anyways, to read a json file you have to do following
import json
jsonObj = json.loads('path/to/file.json')
This will return a dictionary object and store it in jsonObj.
Related
I have uploaded a *.mat file that contains a 'struct' to my jupyter lab using:
from pymatreader import read_mat
data = read_mat(mat_file)
Now I have a multi-dimensional dictionary, for example:
data['Forces']['Ss1']['flap'].keys()
Gives the output:
dict_keys(['lf', 'rf', 'lh', 'rh'])
I want to convert this into a JSON file, exactly by the keys that already exist, without manually do so because I want to perform it to many *.mat files with various key numbers.
EDIT:
Unfortunately, I no longer have access to MATLAB.
An example for desired output would look something like this:
json_format = {
"Forces": {
"Ss1": {
"flap": {
"lf": [1,2,3,4],
"rf": [4,5,6,7],
"lh": [23 ,5,6,654,4],
"rh": [4 ,34 ,35, 56, 66]
}
}
}
}
ANOTHER EDIT:
So after making lists of the subkeys (I won't elaborate on it), I did this:
FORCES = []
for ind in individuals:
for force in forces:
for wing in wings:
FORCES.append({
ind: {
force: {
wing: data['Forces'][ind][force][wing].tolist()
}
}
})
Then, to save:
with open(f'{ROOT_PATH}/Forces.json', 'w') as f:
json.dump(FORCES, f)
That worked but only because I looked manually for all of the keys... Also, for some reason, I have squared brackets at the beginning and at the end of this json file.
The json package will output dictionaries to JSON:
import json
with open('filename.json', 'w') as f:
json.dump(data, f)
If you are using MATLAB-R2016b or later, and want to go straight from MATLAB to JSON check out JSONENCODE and JSONDECODE. For your purposes JSONENCODE
encodes data and returns a character vector in JSON format.
MathWorks Docs
Here is a quick example that assumes your data is in the MATLAB variable test_data and writes it to a file specified in the variable json_file
json_data = jsonencode(test_data);
writematrix(json_data,json_file);
Note: Some MATLAB data formats cannot be translate into JSON data due to limitations in the JSON specification. However, it sounds like your data fits well with the JSON specification.
I have the file log.txt with following data:
{"__TIMESTAMP":"2020-07-09T19:05:20.858013","__LABEL":"web_channel","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"Port web_channel/diagnose_client not connected!"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229737","__LABEL":"context_logging_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"startup component"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229761","__LABEL":"context_logging_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"activate component"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229793","__LABEL":"context_monitoring_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"startup component"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229805","__LABEL":"context_monitoring_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"activate component"}
If I define a single row, I can convert in real JSON type:
import json
import datetime
from json import JSONEncoder
log = {
"__TIMESTAMP":"2020-07-09T19:05:21.229737",
"__LABEL":"context_logging_addon",
"__LEVEL":4,
"__DIAGNOSE_SLOT":"",
"msg":"Port web_channel/diagnose_client not connected!"}
class DateTimeEncoder(JSONEncoder):
#Override the default method
def default(self,obj):
if isinstance(obj,(datetime.date,datetime.datetime)):
return obj.isoformat()
print("Printing to check how it will look like")
print(DateTimeEncoder().encode(log))
I have the following output, which format is perfect JSON.
Printing to check how it will look like
{"__TIMESTAMP": "2020-07-09T19:05:21.229737", "__LABEL": "context_logging_addon", "__LEVEL": 4, "__DIAGNOSE_SLOT": "", "msg": "Port web_channel/diagnose_client not connected!"}
But I don't know how should I open the log.txt file, read the data to convert into JSON without any failure.
Could you help me please? Thanks in advance.
Let us say your log.txt file is in the same directory than your .py file.
Just open it with with open(... and then parse your file according to your syntax to create a list of dictionaries (each item corresponding to a row, then parse each dictionary as you're currently doing).
Here is how you could open and parse your file:
with open("log.txt","r") as file:
all_text = file.readlines()
parsed_line = list()
for text in all_text:
parsed_line.append(dict([item.split('":"') for item in text[2:-2].split('","')]))
If you have any question about the parsing let me know. This one is pretty straightforward.
Hope this helped you.
Try it this way:
logs = """[your log file above]"
for log in logs.splitlines():
print(DateTimeEncoder().encode(log))
Output:
"{\"__TIMESTAMP\":\"2020-07-09T19:05:20.858013\",\"__LABEL\":\"web_channel\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"Port web_channel/diagnose_client not connected!\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229737\",\"__LABEL\":\"context_logging_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"startup component\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229761\",\"__LABEL\":\"context_logging_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"activate component\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229793\",\"__LABEL\":\"context_monitoring_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"startup component\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229805\",\"__LABEL\":\"context_monitoring_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"activate component\"}"
This is my code and it convert successfully. However, when i import this json into firebase and it state that Invalid JSON files.
import csv
import json
csvfile = open('C:/Users/Senior/seaborn-data/Denver DatasetCleaning Finalize.csv', 'r')
jsonfile = open('C:/Users/Senior/seaborn-data/Denver DatasetCleaning Finalize.json', 'w')
fieldnames = ("OFFENSE_CODE ","OFFENSE_CATEGORY_ID","FIRST_OCCURRENCE_DATE","DATE","YEAR","MONTH","DAY","TIME","HOUR","MINUTE","INCIDENT_ADDRESS","GEO_LON","GEO_LAT","NEIGHBORHOOD_ID")
reader = csv.DictReader( csvfile, fieldnames)
for row in reader:
json.dump(row, jsonfile)
jsonfile.write('\n')
each time json.dump is called it is outputting json. but several json strings concatenated together are not still json
what you maybe want to do is read the entire csv into a variable, then json.dump that
I want to read json or xml file in pyspark.lf my file is split in multiple line in
rdd= sc.textFile(json or xml)
Input
{
" employees":
[
{
"firstName":"John",
"lastName":"Doe"
},
{
"firstName":"Anna"
]
}
Input is spread across multiple lines.
Expected Output {"employees:[{"firstName:"John",......]}
How to get the complete file in a single line using pyspark?
There are 3 ways (I invented the 3rd one, the first two are standard built-in Spark functions), solutions here are in PySpark:
textFile, wholeTextFile, and a labeled textFile (key = file, value = 1 line from file. This is kind of a mix between the two given ways to parse files).
1.) textFile
input:
rdd = sc.textFile('/home/folder_with_text_files/input_file')
output: array containing 1 line of file as each entry ie. [line1, line2, ...]
2.) wholeTextFiles
input:
rdd = sc.wholeTextFiles('/home/folder_with_text_files/*')
output: array of tuples, first item is the "key" with the filepath, second item contains 1 file's entire contents ie.
[(u'file:/home/folder_with_text_files/', u'file1_contents'), (u'file:/home/folder_with_text_files/', file2_contents), ...]
3.) "Labeled" textFile
input:
import glob
from pyspark import SparkContext
SparkContext.stop(sc)
sc = SparkContext("local","example") # if running locally
sqlContext = SQLContext(sc)
for filename in glob.glob(Data_File + "/*"):
Spark_Full += sc.textFile(filename).keyBy(lambda x: filename)
output: array with each entry containing a tuple using filename-as-key with value = each line of file. (Technically, using this method you can also use a different key besides the actual filepath name- perhaps a hashing representation to save on memory). ie.
[('/home/folder_with_text_files/file1.txt', 'file1_contents_line1'),
('/home/folder_with_text_files/file1.txt', 'file1_contents_line2'),
('/home/folder_with_text_files/file1.txt', 'file1_contents_line3'),
('/home/folder_with_text_files/file2.txt', 'file2_contents_line1'),
...]
You can also recombine either as a list of lines:
Spark_Full.groupByKey().map(lambda x: (x[0], list(x[1]))).collect()
[('/home/folder_with_text_files/file1.txt', ['file1_contents_line1', 'file1_contents_line2','file1_contents_line3']),
('/home/folder_with_text_files/file2.txt', ['file2_contents_line1'])]
Or recombine entire files back to single strings (in this example the result is the same as what you get from wholeTextFiles, but with the string "file:" stripped from the filepathing.):
Spark_Full.groupByKey().map(lambda x: (x[0], ' '.join(list(x[1])))).collect()
If your data is not formed on one line as textFile expects, then use wholeTextFiles.
This will give you the whole file so that you can parse it down into whatever format you would like.
This is how you would do in scala
rdd = sc.wholeTextFiles("hdfs://nameservice1/user/me/test.txt")
rdd.collect.foreach(t=>println(t._2))
"How to read whole [HDFS] file in one string [in Spark, to use as sql]":
e.g.
// Put file to hdfs from edge-node's shell...
hdfs dfs -put <filename>
// Within spark-shell...
// 1. Load file as one string
val f = sc.wholeTextFiles("hdfs:///user/<username>/<filename>")
val hql = f.take(1)(0)._2
// 2. Use string as sql/hql
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val results = hiveContext.sql(hql)
Python way
rdd = spark.sparkContext.wholeTextFiles("hdfs://nameservice1/user/me/test.txt")
json = rdd.collect()[0][1]
I am trying to: 1. read a json file with groovy; 2. change the object a little bit; 3. output/override as a new json file.
so far, I know we can use import groovy.json.JsonSlurper;, but I searched some samples, all for parse the .txt file (convert txt to json format object) like this:
def inputFile = file('json_input/' + fileName)
def inputJson = new JsonSlurper().parseText(inputFile.text)
print out...
but not read json file directly.
Is there any simple way to read and get the json object from a .json file so that I can make some changes of the data?
Thanks!