Converting a JSON file of unknown structure to CSV using R

I am looking for a way to convert a rather large (3 GB) JSON file to CSV. I tried using R; this is the code I used:
library("rjson")
data <- fromJSON(file="C:/POI data 30 Rows.json")
json_data <- as.data.frame(data)
write.csv(json_data, file='C:/POI data 30 Rows Exported.csv')
The example I am using is only a subset of the total data, about 30 rows, which I extracted with EmEditor and pasted into a text file. The problem, however, is that it only converts the first row of the data.
I am not an experienced programmer and have tried everything in YouTube tutorials, from PHP to Excel, and nothing seems to work. The problem is that I have no idea what the structure of the data is, so I cannot create a predetermined data frame, and there are a number of missing values in the data.
Any advice would be greatly appreciated.
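One approach that handles unknown, nested structure: pandas.json_normalize flattens nested objects into columns and fills keys that are missing from some records with NaN, so no predetermined data frame is needed. A minimal sketch in Python (the file names are placeholders; this assumes the extract is a JSON array of objects that fits in memory):

import json
import pandas as pd

# Load the extract (assumes a JSON array of objects that fits in memory).
with open('POI data 30 Rows.json', encoding='utf-8') as f:
    data = json.load(f)

# json_normalize flattens nested objects into dotted column names and
# fills keys missing from some records with NaN.
df = pd.json_normalize(data)
df.to_csv('POI data 30 Rows Exported.csv', index=False)

For the full 3 GB file, if each record sits on its own line (NDJSON), pd.read_json(path, lines=True, chunksize=10000) returns an iterator of chunks, so you can convert the file piece by piece instead of loading it all at once.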

Related

extract sub-headers in R or Python

Desperate newbie here. I have a question for which I just cannot find the right solution. I received a .dta file from which I want to extract the sub-headers of each column. Unfortunately, I am not versed in Stata, nor do I have access to it. I read my .dta file into R and converted it to a data frame and also to a data table. It displays the column names and sub-headers well. However, I cannot extract the sub-headers, and they also disappear when I save the data frame or table locally as a CSV or Excel file. When I call colnames(df) or names(df), I only receive the column names and not the sub-headers. I also tried it with Python, without luck. Unfortunately, I am not allowed to share the data, so I hope my problem is understandable without an example. Thank you in advance!
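If the "sub-headers" are Stata variable labels (the descriptive text Stata stores alongside each variable name), pandas can read them without Stata. A minimal sketch, where mydata.dta is a placeholder file name:

import pandas as pd

# Open the .dta file as a StataReader so the variable labels are accessible.
with pd.read_stata('mydata.dta', iterator=True) as reader:
    labels = reader.variable_labels()  # dict mapping column name -> label
    df = reader.read()                 # the data itself

# Save the name -> label pairs separately so they survive the CSV export.
pd.Series(labels, name='label').to_csv('labels.csv')

The labels disappear on export because CSV and Excel files have no place for a second header row; saving them to their own file (or inserting them as the first data row) preserves them.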

Importing specific columns from a CSV into excel

I am trying to do what the title says, and also do it for new records. I cannot link the CSV file because it exceeds the 255-column limit, so I am attempting to split up the table.
I have the below table in Access:
DateOfTest | Time | PromptTime | TestSequence | PATResults | Logs | Serial Number
1          | 2    | 3          | 4            | 5          | 6    | 7
Obviously, the numbers mark where I want the data from the CSV to be inserted.
I have created a form with a button so I can run some VBA, but I cannot find the right information online for my case; as I am new to VBA, it is also a bit confusing.
I have attempted some random code, but I was just spraying and praying at that point.
I am not sure I understood your question. In the import tool you can choose columns, but if you want to do it with a script, I would suggest a pre-processing phase with plain Python and pandas: read the CSV file, remove any unwanted columns, and save the result to another file to be loaded directly into Excel.
Something like this:

import pandas as pd

# Read the CSV, drop the unwanted column in place, and write the rest to Excel.
df = pd.read_csv('csvfile.csv')
df.drop('column_name', axis=1, inplace=True)
df.to_excel('filename.xlsx', index=False, header=True)
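If you already know which columns you want to keep, a sketch using read_csv's usecols parameter (the column names below are borrowed from the Access table above, as placeholders) avoids loading the unwanted columns at all:

import pandas as pd

# usecols keeps only the named columns while reading, so the other
# columns are never loaded into memory.
df = pd.read_csv('csvfile.csv', usecols=['DateOfTest', 'Time', 'PromptTime'])
df.to_excel('filename.xlsx', index=False)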

How to calculate column averages from csv file and write average rows into a new file using python

Basically, I have a CSV file that gets a new row of data appended to it every 5 seconds.
I need to calculate averages for each column, write those averages as a new row into a separate averages file, and keep appending to it. My file doesn't have any column headers.
First off, I think people will be more inclined to help if you demonstrate that you've made an effort yourself.
To start you off, I'd suggest looking at the Pandas library. It includes the read_csv and to_csv functions.
Because it's a bit hidden in the docs: you can tell pandas to append rows to the end of an existing CSV by setting the Python write mode in the to_csv function:
df.to_csv('averages.csv', mode='a', header=False)
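Putting that together, a minimal sketch (assuming the source file is data.csv, a placeholder, has no header row per the question, and contains only numeric columns):

import pandas as pd

# header=None makes pandas use 0, 1, 2, ... as column names,
# since the file has no header row.
df = pd.read_csv('data.csv', header=None)

# The per-column means, reshaped into a single-row DataFrame.
averages = df.mean().to_frame().T

# mode='a' appends instead of overwriting; header=False and index=False
# keep the averages file as plain rows of numbers.
averages.to_csv('averages.csv', mode='a', header=False, index=False)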
Hit me up if you have any more questions!

how to save dataframe into csv pyspark

I am trying to save a dataframe to HDFS.
It gets saved as part-0000 files, split into multiple parts.
I want to save it as an Excel sheet, or at least as just one part file.
How can we achieve this?
Code used so far:
df1.write.csv('/user/gtree/tree.csv')
Your dataframe is being saved based on its partitions (multiple partitions = multiple files). You can coalesce your partitions down to 1 so that only one file is written.
Link: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.coalesce
df1.coalesce(1).write.csv('/user/gtree/tree.csv')
You can also use .repartition(1) to bring the number of partitions down to 1; note that on a DataFrame the write goes through the writer API:
df.repartition(1).write.csv(filePath)
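For completeness, a sketch (the output path is a placeholder) that also writes a header row; coalesce(1) avoids the full shuffle that repartition(1) triggers, which matters on large dataframes:

# Write one part file with a header row; 'overwrite' replaces existing output.
df1.coalesce(1).write.option('header', True).mode('overwrite').csv('/user/gtree/tree_csv')

Note that Spark still writes a directory containing a single part-* file; to end up with one named .csv you have to rename or move that part file afterwards.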

How to Get Data from CSV File and Send them to Excel Using Pentaho?

I have a tabular CSV file that has seven columns, containing the following data:
ID,Gender,PatientPrefix,PatientFirstName,PatientLastName,PatientSuffix,PatientPrefName
2 ,M ,Mr ,Lawrence ,Harry , ,Larry
I am new to Pentaho, and I want to design a transformation that moves the data (the values of the 7 columns) to an empty Excel sheet. The Excel sheet has different column names but should carry the same data, as shown:
prefix_name,first_name,middle_name,last_name,maiden_name,suffix_name,Gender,ID
I tried to design a transformation using the following series of steps, but it gives me errors at the end that I could not interpret.
What is the proper design to move the data from the csv file to the excel sheet in this case? Any ideas to solve this problem?
As #Brian.D.Myers mentioned in the comments, you can use the Select values step. Here is a step-by-step explanation.
Select all the fields from the CSV file input step.
Configure the Select values step as follows.
In the Content tab of the Excel writer step, click the Get fields button and fill in the fields. Alternatively, you can use the Excel output step.