I have two almost identical CSV files, with the following differences:
The first has a column, "date".
The second doesn't have "date" and also has 50 fewer rows than the first; both files share the "email" column.
They are lists of subscribers with their creation date. The second, however, is the updated list, with the subscribers who wanted to be removed taken out, but it no longer has the creation date.
Is there any way to import the "date" column from the 1st CSV into the 2nd CSV by referencing the "email" column, so I can get the correct date for each subscriber?
Unfortunately, there doesn't seem to be a ready-made command-line tool for this (writing one would probably be an evening's worth of effort).
There are different ways to approach it; one heavier option is to load both files into database tables, do the merge (a SELECT with a join on the two tables) and export the result back as CSV.
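For instance, here is a minimal sketch of that table-and-join approach using Python's built-in sqlite3 and csv modules. The file names are placeholders, and it assumes (from the question) that both files have an "email" header and the first also has a "date" header:

import csv
import sqlite3

# load both CSVs into in-memory tables
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE full_list (email TEXT, date TEXT)")
cur.execute("CREATE TABLE updated_list (email TEXT)")

with open("csv1.csv", newline="") as f:
    cur.executemany("INSERT INTO full_list VALUES (?, ?)",
                    ((row["email"], row["date"]) for row in csv.DictReader(f)))

with open("csv2.csv", newline="") as f:
    cur.executemany("INSERT INTO updated_list VALUES (?)",
                    ((row["email"],) for row in csv.DictReader(f)))

# join on email so every remaining subscriber gets its original date
rows = cur.execute("SELECT u.email, f.date FROM updated_list u "
                   "JOIN full_list f USING (email)")

with open("merged.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["email", "date"])
    writer.writerows(rows)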
The simplest approach I could think of, though, is to use R (given that your CSVs have header names):
# read both files (read.csv treats the first row as headers by default)
csv1_data <- read.csv('/path/to/csv1.csv')
csv2_data <- read.csv('/path/to/csv2.csv')
# merge on the shared "email" column
merged_csv <- merge(csv1_data, csv2_data, by = "email")
# row.names = FALSE keeps R from prepending a row-number column
write.table(merged_csv, file = "/path/to/merged_csv.csv", sep = ",", row.names = FALSE)
The first two lines load the data into R, the third merges the two data frames on "email", and the final line exports the result as a CSV file with headers.
Hope this helps!
I am trying to do what the title says, and also do it for new records. I cannot link the CSV file because it exceeds Access's limit of 255 fields, so I am attempting to split up the table.
I have the below table in Access:
DateOfTest | Time | PromptTime | TestSequence | PATResults | Logs | Serial Number
1          | 2    | 3          | 4            | 5          | 6    | 7
Obviously, where the numbers are, I want the data from the CSV to be inserted.
I have created a form with a button so I can run some VBA, but I cannot find the right information online for my case; as I am new to VBA, it is also a bit confusing.
I have attempted some random code, but I was just spraying and praying at that point.
I am not sure I understood your question. In the import tool you can choose columns, but if you want to do it with a script, I would suggest a pre-processing phase using simple Python and pandas: read the CSV file, remove any unwanted columns, and save the result to a file you can import directly.
Something like this:
import pandas as pd

# read the CSV, drop the unwanted column, then write an Excel file
df = pd.read_csv('csvfile.csv')
df.drop('column_name', inplace=True, axis=1)
df.to_excel('filename.xlsx', index=False, header=True)
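If the real blocker is the 255-field limit, a variation on the same idea might be to keep only the columns your Access table needs rather than dropping the extras one by one. The column names below are guesses based on the table in the question:

import pandas as pd

# keep only the seven fields the Access table expects (names assumed)
wanted = ["DateOfTest", "Time", "PromptTime", "TestSequence",
          "PATResults", "Logs", "Serial Number"]
df = pd.read_csv("csvfile.csv", usecols=wanted)

# the trimmed CSV should now be narrow enough to link or import in Access
df.to_csv("trimmed.csv", index=False)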
Despite searching both Google and the documentation I can't figure this out:
I have a CSV file that has a header line, like this:
ID,Name,Street,City
1,John Doe,Main Street,Some City
2,Jane Doe,Sideroad,Other City
Importing into FileMaker works well, except for two things:
It imports the header line as a data set, so I get one row that has an ID of "ID", a Name of "Name", etc.
It assigns the items to fields by order, including the default primary key, created date, etc. I have to manually re-assign them, which works but seems like work that could be avoided.
I would like it to understand that the header line is not a data set and that it could use the field names from the header line and match them to the field names in my FileMaker table.
How do I do that? Where is it explained?
When you import records, you have the option to select a record in the source file that contains field names (usually the first row). See #4 here.
Once you have done that, you will get the option to map the fields automatically by matching names.
If you're doing this periodically, it's best to script the action. A script will remember your choices, so you only need to do this once.
Here is what I use.
OS: Linux Mint 18
Editor: LibreOffice Writer 5.1.6.2
Situation
Consider the following foo.csv file (just an example; the raw data contains hundreds of lines):
A,B,C
1,2,3
To create a table in Writer with the data from foo.csv, one usually creates the table via the toolbar and then types in the contents (possibly using TAB to navigate between cells).
Here is the result of the procedure above: [screenshot of the resulting table]
Goal: since the whole foo.csv contains hundreds of lines, how should one proceed?
1st try: copying and pasting the data from foo.csv into the table does not work. [screenshot]
2nd try: copying and pasting the data from foo.csv into the table with all cells selected does not work either. [screenshot]
Question: is it possible to edit an odt file in some way, writing some kind of markup (like we could do with tags in HTML), to produce such a table?
Embedding a Calc spreadsheet is not acceptable.
Just use the "Text to Table" feature:
Insert the csv as "plain text" into your writer document (not into a table, just anywhere else);
Select the inserted lines;
Select Menu "Table" -> "Convert" -> "Text to Table";
Adjust the conversion properties as needed (set the separator to comma: select "Other" and enter a comma into the box at the right);
Hit OK - LO Writer will convert the text content of your CSV into a nice Writer table.
Please note that with this solution there's no "connection" between the Writer table and the csv data: changing the csv won't affect the Writer table. That would be possible only by embedding an object (but that won't result in a Writer table...).
If the csv data is the only content of the odt (Writer) file, there's another option: use LibreOffice Base to create a LO database on top of the csv file (dynamically updated when the csv changes), and use the Report feature to get a tabular output of the csv data. LO Base will store the output layout as a report, making it easy to produce an up-to-date report.
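As to the literal question of producing the table with code: an odt file is just a zip archive of XML, so the table can also be generated programmatically. A minimal sketch using the third-party odfpy library (assuming the foo.csv above; the output file name is arbitrary):

import csv
from odf.opendocument import OpenDocumentText
from odf.table import Table, TableColumn, TableRow, TableCell
from odf.text import P

with open("foo.csv", newline="") as f:
    rows = list(csv.reader(f))

doc = OpenDocumentText()
table = Table(name="foo")
# declare as many columns as the csv header has fields
table.addElement(TableColumn(numbercolumnsrepeated=len(rows[0])))
for row in rows:
    tr = TableRow()
    for value in row:
        cell = TableCell()
        cell.addElement(P(text=value))  # each cell holds one paragraph
        tr.addElement(cell)
    table.addElement(tr)
doc.text.addElement(table)
doc.save("foo.odt")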
I'm working on a flow where I get CSV files. I want to put the records into different directories based on the first field in the CSV record.
For example, the CSV file would look like this:
country,firstname,lastname,ssn,mob_num
US,xxxx,xxxxx,xxxxx,xxxx
UK,xxxx,xxxxx,xxxxx,xxxx
US,xxxx,xxxxx,xxxxx,xxxx
JP,xxxx,xxxxx,xxxxx,xxxx
JP,xxxx,xxxxx,xxxxx,xxxx
I want to take the value of the first field, i.e. country, and put each record into the corresponding directory: US records go to the US directory, UK records go to the UK directory, and so on.
The flow that I have right now is:
GetFile ----> SplitText(line split count = 1 & header line count = 1) ----> ExtractText (line = (.+)) ----> PutFile(Directory = /tmp/data/${line:getDelimitedField(1)}). I need the header line to be replicated across all the split files for a different purpose, so I have to keep it.
The thing is, the incoming CSV file gets split into multiple flow files, each with the header, successfully. However, the regex I have given in the ExtractText processor evaluates against the split flow file's CSV header instead of the record. So instead of getting US or UK in the "line" attribute, I always get "country", and all the files go to /tmp/data/country. How can I resolve this?
I believe getDelimitedField only works on a single line and is likely not moving past the newline in your split file.
I would advocate for a slightly different approach in which you could alter your ExtractText to find the country code through a regular expression and avoid the need to include the contents of the file as an attribute.
Using a regex of ^.*\n+(\w+) will match past the first line and capture the first run of word characters on the following line (i.e., everything up to the comma), placing it in the attribute you specify for capture group 1 (e.g. country.1).
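As a quick illustration of what that expression captures (a Python sketch, not part of the NiFi flow itself):

import re

# one split flow file: the replicated header line plus a single record
split_content = "country,firstname,lastname,ssn,mob_num\nUS,xxxx,xxxxx,xxxxx,xxxx"
m = re.match(r"^.*\n+(\w+)", split_content)
print(m.group(1))  # prints: US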
I have created a template that should get the value you are looking for available at https://github.com/apiri/nifi-review-collateral/blob/master/stackoverflow/42022249/Extract_Country_From_Splits.xml
I'd like to know more about how Stata 13 can work with a .csv dataset of large size (say, larger than the RAM I have).
I can import the first n rows or the first n columns with the following command:
import delimited using filename.csv, rowrange(1:1000) colrange(1:3)
However, it seems I cannot import any of the following without loading the whole dataset first:
the first and last three variables
the first and last 100 lines
a list of lines such that a variable satisfies some condition
Are there ways to do these things in Stata?
I'm not sure you can do this with one command, but you can try importing by parts and using merge. An example:
clear all
set more off
*----- example data -----
copy http://www.stata.com/examples/auto.csv auto.csv, replace
*----- what you want -----
* import first two columns
import delimited using "auto.csv", colrange(1:2) rowrange(1:6)
gen obs = _n
* save in temp file
tempfile first
save "`first'"
* import last two columns
import delimited using "auto.csv", colrange(4:5) rowrange(1:6) clear
gen obs = _n
* merge current data with the tempfile
merge 1:1 obs using "`first'", assert(match) nogen
* list
drop obs
order make foreign price
list
The previous example covers point 1 in your question. For point 2, do something similar, but instead of merge, use append.
The commands infile and use both support if and in in their syntax, which may help you with point 3.
Edit
An example for point 2:
clear all
set more off
*----- example data -----
copy http://www.stata.com/examples/auto.csv auto.csv, replace
*----- what you want -----
* import first two rows of data
import delimited using "auto.csv", colrange(1:4) rowrange(2:3)
* save in temp file
tempfile first
save "`first'"
* import last two rows of data
import delimited using "auto.csv", colrange(1:4) rowrange(10:11) clear
* append current data with the tempfile
append using "`first'"
* list
sort make
list
Observation 1 starts in row 2 (row 1 contains variable names), so we need to shift everything in rowrange() by 1. Curiously, some testing shows that adding the varnames(1) option did nothing to change this behaviour.
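Finally, for point 3, if stepping outside Stata is an option, that kind of row filtering can also be done without ever holding the full file in RAM, e.g. with pandas (a sketch only; the condition and chunk size are arbitrary assumptions):

import pandas as pd

# stream the csv in chunks and keep only the rows matching a condition
chunks = pd.read_csv("auto.csv", chunksize=10000)
filtered = pd.concat(chunk[chunk["price"] > 5000] for chunk in chunks)
filtered.to_csv("filtered.csv", index=False)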