extract sub-headers in R or Python - csv

Desperate newbie here. I have a question for which I just cannot find the right solution. I received a dta file from which I want to extract the sub-headers of each column. Unfortunately, I am not versed in Stata, nor do I have access to it. I read my dta file into R and converted it to a data frame and also a data table. Both display the column names and sub-headers well. However, I cannot extract the sub-headers, and they also disappear when I save the data frame or table locally as a CSV or Excel file. When I call colnames(df) or names(df), I only receive the column names and not the sub-headers. I also tried it with Python, without luck. Unfortunately, I am not allowed to share the data, so I hope my problem is understandable without an example. Thank you in advance!
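One possible direction, in case it helps: the "sub-headers" shown under the column names are probably Stata variable labels, which live in the dta file as metadata rather than as part of the column names. The sketch below assumes that is the case and that pandas is installed. read_variable_labels and write_label_table are made-up helper names, and data.dta and labels.csv are placeholder paths.

```python
import csv


def read_variable_labels(dta_path):
    """Return {column name: variable label} from a Stata file.

    Assumption: the "sub-headers" are Stata variable labels and
    pandas is available.
    """
    import pandas as pd  # imported lazily so the CSV helper works without it

    with pd.read_stata(dta_path, iterator=True) as reader:
        return reader.variable_labels()


def write_label_table(labels, csv_path):
    """Save the name/label pairs as a two-column CSV."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["column", "label"])
        for column, label in labels.items():
            writer.writerow([column, label])


# Usage (placeholder file names):
# labels = read_variable_labels("data.dta")
# write_label_table(labels, "labels.csv")
```

If the file was read in R with the haven package, the same information should sit in each column's "label" attribute, which would explain why colnames(df) does not return it.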

Related

Is there a way to insert the filename into the record when importing a JSON file into Power BI?

Not sure how to ask this but here goes. I have a collection of 500+ JSON files that I need to import into PowerBI. Each JSON has four different levels of information that I need to parse out. I converted the JSON top-level info into a table and transposed it so I had one row like the attached screenshot.
My first question is: can I easily add the filename to the JSON record? I would like to use the filename as a unique key in later queries.
Thanks!
It looks like you may be connecting to each JSON file individually? If I'm correct, assuming all the JSON files can be in a single folder, you can use the "Folder" connection. That then allows you to right-click on the original folder query and choose "reference" to then create various transformations for each JSON file, and it includes the file name.
Related details:
https://powerbi.tips/2016/06/loading-data-from-folder/
https://learn.microsoft.com/en-us/power-bi/guidance/power-query-referenced-queries
Hoping that helps!
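For anyone preparing the files outside Power BI first, the same idea (one record per file, with the file name kept as a key) can be sketched in Python. This is only an illustration, not what the Folder connection does internally. load_json_folder and the source_file key are made-up names, and the sketch assumes each file holds a single JSON object at the top level.

```python
import json
from pathlib import Path


def load_json_folder(folder):
    """Read every *.json file in `folder` and tag each record with its file name."""
    records = []
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            record = json.load(handle)
        # Assumption: each file holds one JSON object at the top level.
        record["source_file"] = path.name
        records.append(record)
    return records
```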

Metadata map for importing CSV data into IPTC XMP images using Bridge

Let's say I have 100 scanned Tif files. I also have a CSV of the metadata for those 100 Tif files. Each file is named with its unique identifier, which is also column 1 of the csv.
First: How do I find a map that tells me what columns should be named what, in order to stay within the IPTC standard using XMP? (I've googled for most of the day and have found nothing)
Second: How can I merge the metadata in the CSV to each corresponding image?
I'm basically creating a spreadsheet with all 50,000 images in an archival collection, and plan to use the CSV to create the metadata for the images once they're scanned.
Thanks!
To know where to put your metadata, I'd suggest looking at the IPTC Photo Metadata Standard page. Without knowing more about your data, it's hard for someone else to say what data should go where.
As for embedding your data into your files from a CSV file, I'd suggest exiftool. If you change the header of each column to the name of the tag to write to, and make the first column (headed SourceFile) the path/filename of each file, your command would be as simple as
exiftool -csv=file.csv /path/to/files
See exiftool FAQ #26 for more details.
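If the spreadsheet already exists with its own column names, the renaming step could be scripted. Below is a rough Python sketch, assuming column 1 holds the identifier each file is named after, that the scans end in .tif, and that the columns map onto the tags in TAG_MAP. The mapping, the column names title and description, and the helper name build_exiftool_csv are made up for illustration; the right tags depend on your data.

```python
import csv
from pathlib import PurePosixPath

# Hypothetical mapping from spreadsheet columns to exiftool tag names.
TAG_MAP = {"title": "XMP-dc:Title", "description": "XMP-dc:Description"}


def build_exiftool_csv(src, dst, image_dir, tag_map=TAG_MAP, extension=".tif"):
    """Rewrite a metadata CSV so its headers are exiftool tag names."""
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin)
        header = next(reader)
        # Keep only the mapped columns; column 1 becomes the file path.
        kept = [i for i, name in enumerate(header) if i > 0 and name in tag_map]
        writer = csv.writer(fout)
        writer.writerow(["SourceFile"] + [tag_map[header[i]] for i in kept])
        for row in reader:
            source = PurePosixPath(image_dir) / (row[0] + extension)
            writer.writerow([str(source)] + [row[i] for i in kept])
```

Columns without an entry in the mapping are dropped rather than written under a guessed tag name.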

Converting a JSON file of unknown structure to CSV using R

I am looking for a way to convert a rather big (3GB) json file to csv. I tried using R and this is the code that I used.
library("rjson")
data <- fromJSON(file="C:/POI data 30 Rows.json")
json_data <- as.data.frame(data)
write.csv(json_data, file='C:/POI data 30 Rows Exported.csv')
The example I am using is only a subset of about 30 rows of the total data, which I extracted using EmEditor and copied and pasted into a text file. The problem, however, is that it only converts the first row of the data.
I am not an experienced programmer and have tried everything in YouTube tutorials, from PHP to Excel, and nothing seems to work. The problem is that I have no idea what the structure of the data is, so I cannot create a predetermined data frame, and there are a number of missing values within the data.
Any advice would be greatly appreciated.
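One way to cope with an unknown structure is to flatten every record into dotted column names and use the union of all keys as the CSV header, leaving missing values blank. Here is a minimal Python sketch of that idea, assuming the top level of the file is a list of records. flatten and json_to_csv are made-up names, and this has not been tried on the POI data.

```python
import csv
import json


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into one {dotted.key: scalar} dict."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        name = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, name))
    return flat


def json_to_csv(json_path, csv_path):
    with open(json_path, encoding="utf-8") as handle:
        data = json.load(handle)
    # Assumption: the top level is a list of records (one per CSV row).
    rows = [flatten(record) for record in data]
    # Union of all keys, in order of first appearance.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

Note that json.load reads the whole file into memory, so this suits the 30-row sample; the full 3GB file would probably need a streaming parser instead.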

Data Services CSV Flat File there should be a column delimiter after column [n]

I'm really struggling with this one. Data Services (v14.2.3.549) keeps flagging an error saying "A column delimiter was seen after column number <80> for row number <1> in file ", and it says this for what looks like every row it processes.
I've used the same settings as for all the previous files I imported, which are also CSV files. The files are exported from a web front end as Excel, then saved as CSV. I tried opening the file with Excel and clearing the empty columns after the end of the data, in case there was anything in them, then rerunning, to no avail.
I don't really know what to look for in the file, so can anyone help me find out what I should be looking for so I can work my way to the problem? The problem seems to run throughout this collection of files: if I try importing with a wildcard on the end of the file name, the same errors come up in other files.
Many thanks
Andrew
I used "Adaptable Schema" set to "yes" in the file format definition to get around this error.
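To see what the file actually contains before changing the format definition, it can help to count the fields in each row. The error suggests some rows carry more delimiters than the 80 columns defined, for example trailing commas left behind by Excel, though that is a guess. A small Python sketch along those lines, where field_count_report is a made-up helper name and the comma delimiter is assumed:

```python
import csv
from collections import Counter


def field_count_report(csv_path, expected, delimiter=","):
    """Return (Counter of field counts, list of (row number, count) mismatches)."""
    counts = Counter()
    mismatches = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        for row_no, row in enumerate(reader, start=1):
            counts[len(row)] += 1
            if len(row) != expected:
                mismatches.append((row_no, len(row)))
    return counts, mismatches
```

Calling it with expected=80 would list every row whose field count differs from the 80 columns in the format definition.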

Formatting csv files in Excel

Win XP, Excel 2007
I know there are various other posts on CSV formatting, but I couldn't quite find what I needed.
Some of our data is held off site by another company, and they send us a CSV file every morning with the previous day's data.
The problem is that this data comes from web input forms that may have drop-down lists.
For example, there may be a drop-down list for Number of Employees with options like 1-10, 11-25, 26-50, etc.
When we open the CSV file in Excel, certain options like 1-10 have been turned into a date format such as Oct-01, which we do not want.
Is there an easy way to change these back, OR to reformat the cells and do a find...replace? (This didn't seem to work terribly well, as it kept reverting to the date.)
Or is there a better way of opening the CSV file that keeps the formatting intact and saves us doing lots of find...replaces?
Ultimately we will need to open the csv in Excel though.
Grateful for any hints
Isn't that SO annoying? Here's how I deal with this issue:
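When you open the CSV file in Excel, you should get a dialog with parsing options. (If Excel opens the file straight away without showing the dialog, import it through Data > From Text instead.) First you select delimited or fixed width, then you get a screen that previews the data parsing.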
It's easy to miss, but in the upper right corner of the dialog box there's an option to set a specific data format for each column. Select the column you want to protect and set the format to text. (This keeps Excel from dropping the leading zeros in ZIP codes for New England too!)
Once you get it into Excel, you can do a vlookup or replace to reset the values to your own codes.
Hope this helps. Good luck.