In Stata, how do I add variable labels from a separate csv file?

I have a set of csv files that are very simple to load into Stata using the -insheet- command. But they have very uninformative variable names. For each of these files, I also have a file of metadata consisting of two columns: the original (uninformative) variable names, and a description of what the variables actually mean. I'd like to use these metadata files to create variable labels, preferably without going through and typing up all the separate label commands or turning the metadata file into a dictionary for each file. It seems like there must be a quick way of loading the metadata file into Stata and looping through it to generate the label commands, but I don't know what it is. Any thoughts?

Ideally each line of the metadata is something like
varname1 "more interesting description"
in which case you can prefix each line with
label var
and then run the file as if it were a do-file using do. See the help for label. That is easy in a decent text editor: for example, search for the start of each line and replace it with label var followed by a space (note the need for the space).
What could bite here includes:
You don't have double quotes " " as delimiters, in which case you need to insert them.
The extra information does not qualify as a variable label because it is more than 80 characters long. See help limits.
There are other ways to do this with Stata. You could write a program to read in the metadata and write out a do-file using file, but if this were my problem I would reach first for my text editor. (Most experienced Stata programmers use something else as well as doedit.)
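If the text-editor route gets tedious across many files, the same do-file can be generated by a short script outside Stata. What follows is only a minimal sketch in Python, not a Stata feature. It assumes the metadata is a two-column CSV with no header row (variable name, then description); the names make_label_lines and make_label_dofile and the file names are hypothetical. It covers the two pitfalls above by replacing embedded double quotes and truncating labels to 80 characters.

```python
import csv

MAX_LABEL_LEN = 80  # Stata's limit on variable label length (see help limits)

def make_label_lines(rows):
    """Turn (varname, description) pairs into Stata `label variable` commands."""
    lines = []
    for row in rows:
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed metadata rows
        varname = row[0].strip()
        # A double quote inside the label would end the string early, so swap it.
        label = row[1].strip().replace('"', "'")[:MAX_LABEL_LEN]
        lines.append(f'label variable {varname} "{label}"')
    return lines

def make_label_dofile(metadata_csv, dofile_path):
    """Read a two-column metadata CSV and write a do-file of label commands."""
    with open(metadata_csv, newline="") as src:
        lines = make_label_lines(csv.reader(src))
    with open(dofile_path, "w") as dst:
        dst.write("\n".join(lines) + "\n")

# Hypothetical usage: make_label_dofile("metadata.csv", "labels.do"),
# then in Stata, after loading the data: do labels.do
```

The generated file is an ordinary do-file, so it can be inspected and hand-edited before running it.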

Related

How to import csv in KNIME and ignore the quote marks

I have a csv file with data like this:
"Column1; Column2; Column3"
"ValueA; ValueB; ValueC"
"ValueD; ValueE; ValueF"
When I import it using the 'CSV Reader' node, it interprets the quote marks as content.
I need the data to be imported without the quotation marks, though (stripping them afterwards does not feel like a clean way of doing this, and the node then infers the wrong data types).
The setting of the node is as follows: https://i.stack.imgur.com/FJC1k.png
How can I deal with this?
In the configuration dialog, set " as the Quote Char.
@FlipForties Hi,
As a heavy KNIME user, I would recommend trying to load your data via the File Reader node instead. It's much more flexible than the CSV Reader node, and you should be able to load your data as is without issues. I made a test data set and it looked OK upon load.

Not able to load CSV file in weka

I am not able to load a CSV file using Weka. I have removed every special symbol using a text editor, but still no luck. I am attaching the file; I would be obliged if someone could solve this problem.
It shows "Wrong number of values, Read 31, expected 27, read token[EOL], line 3"
link : https://drive.google.com/open?id=0By7zyIPDD6HJMmthWnZLSUk5aFE
You have plenty of empty fields in your file, and if you download it as .csv, even the header gets three commas at its end.
e.g. your 6th line:
,Doug Walker,,,131,,Rob Walker,131,,Documentary,Doug Walker,Star Wars: Episode VII The Force Awakens  ,8,143,,0,,,,,,,,,12,7.1,,0,,,
Similar to the suggestion in the post "Convert NA values to ? automatically while loading", you could use something like Notepad++ or another text editor to replace ",," with ",?," and fill the gaps. You may need to run the replacement more than once, because runs of consecutive empty fields overlap.
I did this, and the first row then contains two question marks as column names, which obviously doesn't work, so change the first row to look like this:
color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,movie_title,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,?,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes,additionalColName1,additionalColName2,additionalColName3
If you now try to import your data, Weka starts telling you which lines it doesn't like and why. By the way, you did not "remove each and every special symbol"!
After removing a few lines containing, e.g., the Ç character, it worked.
That's just an ugly workaround. Try filling the empty values, and find a regular expression or a better way of saving your file to remove the last three commas of every line; I was just too lazy for now. But I could load it into Weka, and that's what you wanted (:
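The manual cleanup described above (filling gaps with ? and dropping the trailing commas) could also be scripted. Here is a rough sketch in Python, under several assumptions: the function names normalize_rows and clean_csv are hypothetical, the file is assumed to be UTF-8, blank header cells get a made-up placeholder name, and extra cells beyond the header are only dropped when they are empty.

```python
import csv

def normalize_rows(rows, missing="?"):
    """Pad or trim each row to the header's width and mark empty cells as missing."""
    rows = [list(r) for r in rows]
    header = rows[0]
    # Drop empty trailing header cells left behind by trailing commas.
    while header and not header[-1].strip():
        header.pop()
    # Give any remaining blank header cell a placeholder name (hypothetical scheme).
    header = [name.strip() or f"col_{i}" for i, name in enumerate(header, start=1)]
    width = len(header)
    cleaned = [header]
    for number, row in enumerate(rows[1:], start=2):
        if any(cell.strip() for cell in row[width:]):
            raise ValueError(f"row {number} has data beyond the last header column")
        row = row[:width] + [""] * (width - len(row))
        cleaned.append([cell.strip() or missing for cell in row])
    return cleaned

def clean_csv(src_path, dst_path, encoding="utf-8"):
    """Rewrite a CSV so every row has the same number of fields."""
    with open(src_path, newline="", encoding=encoding) as src:
        cleaned = normalize_rows(csv.reader(src))
    with open(dst_path, "w", newline="", encoding=encoding) as dst:
        csv.writer(dst).writerows(cleaned)
```

This does nothing about characters such as Ç; those lines would still need to be handled separately if Weka rejects them.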

Taking count in Rapidminer

How do I take a row count of a list that is in a Word document? If the same list is in Excel, I can get the count using the Aggregate operator, but that does not work for a Word document.
I recommend the answer from @awchisholm as it's the easiest solution. However, if you have several Word documents this might become impractical.
In that case you can use the Loop Zip Files operator to unzip the Word document, look inside for the file /word/document.xml, and use RapidMiner's text functions (or Read XML) to find each instance of <w:p ...>...</w:p>. Each one represents a new line, so you can count them from there.
There is also an XML file in the unzipped directory called /docProps/app.xml, which you can read to find some meta information about the document, such as the number of words, characters and pages. Unfortunately, I've found it unreliable for the number of lines, which is why I recommend searching for the <w:p> tag.
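To illustrate the idea outside RapidMiner, here is a small sketch in Python of the same approach: open the .docx as a zip archive, parse word/document.xml, and count the <w:p> elements. The function name count_paragraphs is hypothetical, and skipping paragraphs without text is my own assumption about what should count as a list row.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace bound to the w: prefix in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def count_paragraphs(docx, skip_empty=True):
    """Count <w:p> elements in a .docx file (path or binary file-like object)."""
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    count = 0
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's visible text lives in its <w:t> descendants.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        if text.strip() or not skip_empty:
            count += 1
    return count

# Hypothetical usage: count_paragraphs("list.docx")
```

Note that this counts every paragraph in the document body, including headings and paragraphs inside tables, so it only matches the list length if the document contains nothing but the list.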
RapidMiner cannot easily read Word documents. You have to save the document as a text file and use the Read CSV operator to read the file.

Java.io.IOException: wrong number of values (WEKA CSV to ARFF)

I am currently working on a data mining project in Weka, using a dataset I found. The only issue is that converting my file from CSV format to ARFF format is causing problems.
java.io.IOException: wrong number of values. Read 2, expected 5, Read Token[EOL], line 3
This is the error I am getting. I have browsed around online looking for similar issues and have tried removing all quotes and special characters that throw this exception. Every place I looked told me to remove special characters and I believe there are none left. The link to my dataset is here : https://docs.google.com/spreadsheets/d/1xqEe7MZE9SdKB_yvFSgWeSVYuDrq0b31Eu5oECNbGH0/edit#gid=1736568367&vpid=A1
These are the first three lines of my file, where the first holds the attribute names. The file is comma-separated:
Inequality Adjusted HPI Rank,Sub Region,Inequality Adjusted Life Expectancy,Inquality Adjusted Well being,Footprint
,Inequality adjusted HPI
1,1,73.1,6.9,2.5,48.2
2,6,65.17333333,5.487667631,1.390974448,45.97489063
If you open your file with a text editor, you will see that Footprint has quotes around it. Delete the quotes and you are good to go!
Weka is normally not that good in reading CSV files that include special characters, and ARFF files are normally easier to use. Therefore, in such cases, the easiest way is to convert your CSV file to an ARFF file using R ("RWeka" and "foreign" libraries can handle this conversion).
There is also another possibility. I was creating my CSV file and the header had a different number of elements compared to the rest of the data. So, check the header as well...!
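A quick way to check for that kind of mismatch before handing the file to Weka is to compare each row's field count with the header's. This is just a sketch in Python; the function name find_bad_rows is hypothetical, and it assumes the file parses as ordinary comma-separated text.

```python
import csv

def find_bad_rows(lines):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(lines)
    header = next(reader)
    expected = len(header)
    # reader.line_num is the number of source lines read so far.
    return [(reader.line_num, len(row)) for row in reader if len(row) != expected]

# Hypothetical usage on a file:
# with open("data.csv", newline="") as f:
#     print(find_bad_rows(f))
```

An empty result means every row has as many fields as the header; anything else points at the lines to inspect in a text editor.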

Changing The Delimiter to CTRL+A in Python CSV Module

I'm trying to write a csv file with the delimiter ctrl+a. I'm going to have to eventually write the file to hadoop and I'm unable to use a standard delimiter.
Currently I'm trying this:
writer = csv.writer(f, delimiter="\u0001")
for item in aList:
    writer.writerow(item)
f.close()
However, the outputted excel file doesn't appear to be written correctly...
Some rows are condensed into one block, while others will have one field in the first and then the rest condensed into the second block, etc.
Is the error where I'm setting up the writer object, or am I just not familiar with separating files this way?
You can try using the nonprinting "group separator" character, which can be represented in python code as '\035'
see http://www.asciitable.com/index/asciifull.gif for some other nonprinting characters if you need more.
It may be helpful to include more context about why you want to use a nonstandard delimiter, and whether Excel parsing of the file is necessary or just a quick check to see whether the file might be parsed properly by the target system, Hadoop.
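If Excel is only being used as a sanity check, reading the file back with the same delimiter is probably a more reliable test, since Excel is unlikely to split on a control character by default. Below is a minimal Python 3 sketch of writing and re-reading Ctrl+A-delimited data; the helper names write_ctrl_a and read_ctrl_a are hypothetical, and an in-memory buffer stands in for the real file.

```python
import csv
import io

CTRL_A = "\x01"  # Ctrl+A (SOH), the same character as "\u0001"

def write_ctrl_a(rows, fileobj):
    """Write rows using Ctrl+A as the field delimiter."""
    writer = csv.writer(fileobj, delimiter=CTRL_A, lineterminator="\n")
    writer.writerows(rows)

def read_ctrl_a(fileobj):
    """Read Ctrl+A-delimited rows back into lists of strings."""
    return list(csv.reader(fileobj, delimiter=CTRL_A))

# Round trip through an in-memory buffer instead of a real file.
buf = io.StringIO()
write_ctrl_a([["id", "name"], ["1", "Ada, Countess"]], buf)
buf.seek(0)
rows = read_ctrl_a(buf)
```

If rows comes back with the fields you wrote, the writer setup is fine and the odd layout in Excel is just Excel not recognizing the delimiter.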