Importing CSV into FileMaker Pro 18 with CSV header line

Despite searching both Google and the documentation I can't figure this out:
I have a CSV file that has a header line, like this:
ID,Name,Street,City
1,John Doe,Main Street,Some City
2,Jane Done,Sideroad,Other City
Importing into FileMaker works well, except for two things:
It imports the header line as a data set, so I get one row that has an ID of "ID", a Name of "Name", etc.
It assigns the items to fields by order, including the default primary key, created date, etc. I have to manually re-assign them, which works but seems like work that could be avoided.
I would like it to understand that the header line is not a data set and that it could use the field names from the header line and match them to the field names in my FileMaker table.
How do I do that? Where is it explained?

When you import records, you have the option to select a record in the source file that contains field names (usually the first row). See #4 here.
Once you have done that, you will get the option to map the fields automatically by matching names.
If you're doing this periodically, it's best to script the action. A script will remember your choices, so you only need to do this once.

Related

AWS glue: crawler does not identify metadata when CSV contains string and timestamp/date values

I have come across an issue when using a CSV file as input to a crawler:
the crawler doesn't identify the column headers when all the data in the CSV is in string format.
#P1 Headers are displayed as col0,col1...colN.
#P2 The actual column names are treated as data.
#P3 The metadata is wrong (i.e. the column datatype is shown as string even when the CSV dataset contains date/timestamp values).
If we use a custom (CSV) classifier, we have to specify the column headers manually.
That covers #P2, i.e. the column names are no longer read as data; however,
#P1 still remains the same: the column headers are displayed as col0,col1...colN.
There are three things I want to fix to get the expected result:
A CSV with strings only should show the actual column names instead of col0,col1...colN.
The metadata of the generated table should be correct (i.e. date/timestamp, string) once it is crawled by the crawler.
If a custom classifier is used, we have to specify the column header names manually in the classifier, and the result is still not satisfactory.
I need a generic solution instead of manual intervention.
I have gone through this document: here
If anyone has already implemented a solution, please help.
Update: I found a solution to one of the above points. The headers (i.e. the first line of the CSV) are picked up by enabling 'Has headings' in the CSV classifier.
However, I have yet to figure out a solution for the following:
The metadata of the CSV file is shown as string even if a column contains timestamp/date values; the crawler reads these datatypes as string.
The custom classifier needs manual intervention; I have listed all the column names in the classifier. Is there a generic solution?
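For anyone who wants to set that classifier up programmatically, here is a rough boto3 sketch (classifier and crawler names are placeholders) of the same idea: a custom CSV classifier with ContainsHeader set to PRESENT, attached to the crawler. Note this only covers the header part; as noted above, the datatypes are still read as string.

import boto3

glue = boto3.client("glue")

# Custom CSV classifier that tells the crawler the first row holds column names.
glue.create_classifier(
    CsvClassifier={
        "Name": "csv-with-header",        # placeholder name
        "Delimiter": ",",
        "ContainsHeader": "PRESENT",      # the programmatic equivalent of 'Has headings'
        # "Header": ["col_a", "col_b"],   # only needed when the file has no header row
    }
)

# Attach the classifier to an existing crawler (crawler name is a placeholder).
glue.update_crawler(
    Name="my-csv-crawler",
    Classifiers=["csv-with-header"],
)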
If you are using pandas' to_csv to write the dataframe, then to avoid getting column names as col1, col2 and so on, add the index_label parameter, for example:
df.to_csv('/path/to/output.csv', index_label='index')
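As a small aside, here is a minimal sketch of what that corrected call produces (the dataframe below is made up): to_csv writes the column names as the first line by default (header=True), and index_label just gives the index column a name.

import pandas as pd

# Hypothetical data purely for illustration.
df = pd.DataFrame({"event_time": ["2021-01-01 10:00:00"], "status": ["ok"]})

# header=True is the default, so the real column names become the first CSV line;
# index_label names the index column instead of leaving it blank.
df.to_csv("output.csv", index_label="index")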

NiFi : Regular Expression in ExtractText gets CSV header instead of data

I'm working on a flow where I get CSV files. I want to put the records into different directories based on the first field in the CSV record.
For ex, the CSV file would look like this
country,firstname,lastname,ssn,mob_num
US,xxxx,xxxxx,xxxxx,xxxx
UK,xxxx,xxxxx,xxxxx,xxxx
US,xxxx,xxxxx,xxxxx,xxxx
JP,xxxx,xxxxx,xxxxx,xxxx
JP,xxxx,xxxxx,xxxxx,xxxx
I want to get the value of the first field, i.e. country, and put those records into a particular directory: US records go to a US directory, UK records go to a UK directory, and so on.
The flow that I have right now is:
GetFile ----> SplitText(line split count = 1 & header line count = 1) ----> ExtractText (line = (.+)) ----> PutFile(Directory = \tmp\data\${line:getDelimitedField(1)}). I need the header line to be replicated across all the split files for a different purpose, so I need them.
The thing is, the incoming CSV file gets split into multiple flow files with the header successfully. However, the regex that I have given in the ExtractText processor evaluates against the split flow files' CSV header instead of the record. So instead of getting US or UK in the "line" attribute, I always get "country", and all the files go to \tmp\data\country. How do I resolve this?
I believe getDelimitedField will only work off a singular line and is likely not moving past the newline in your split file.
I would advocate for a slightly different approach in which you could alter your ExtractText to find the country code through a regular expression and avoid the need to include the contents of the file as an attribute.
Using a regex of ^.*\n+(\w+) will consume the header line, then capture the first run of word characters on the record line (i.e. the first field, up to the comma) and place it in the attribute name you specify plus the capture group number (e.g. country.1).
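To make that concrete, here is a quick Python sketch (the flowfile content below is made up to mirror the sample CSV in the question) showing what the regex matches:

import re

# A split flowfile still carries the replicated header line plus one record.
content = "country,firstname,lastname,ssn,mob_num\nUS,xxxx,xxxxx,xxxxx,xxxx"

# ^.* consumes the header line ('.' does not match '\n'), \n+ skips the
# line break, and (\w+) captures the first field of the data record.
match = re.search(r"^.*\n+(\w+)", content)
print(match.group(1))  # -> US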
I have created a template that should get the value you are looking for available at https://github.com/apiri/nifi-review-collateral/blob/master/stackoverflow/42022249/Extract_Country_From_Splits.xml

Best way to parse a big and intricate JSON file with OpenRefine (or R)

I know how to parse JSON cells in OpenRefine, but this one is too tricky for me.
I've used an API to extract the calendar of 4730 Airbnb rooms, identified by their IDs.
Here is an example of one JSON file: https://fr.airbnb.com/api/v2/calendar_months?key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=EUR&locale=fr&listing_id=4212133&month=11&year=2016&count=12&_format=with_conditions
For each ID and each day of the year from now until November 2017, I would like to extract the availability of the room (true or false) and its price on that day.
I can't figure out how to parse out this information. I guess it involves a series of nested forEach calls, but I can't find the right way to do this with OpenRefine.
I've tried, of course,
forEach(value.parseJson().calendar_months, e, e.days)
The result is an array of arrays of dictionaries that confuses me.
Any help would be appreciated. If the operation is too difficult in OpenRefine, a solution in R (or Python) would also be fine for me.
Rather than just creating your Project as text, and working with GREL to parse out...
The best way is to just select the JSON record part that you want to work with using our visual importer wizard for JSON and XML files (you can even use a URL pointing to a JSON file, as in your example). (A video tutorial shows how: https://www.youtube.com/watch?v=vUxdB-nl0Bw )
Select the JSON part that contains your records that you want to parse and work with (this can be any repeating part, just select one of them and OpenRefine will extract all the rest)
Limit the amount of data rows that you want to load in during creation, or leave default of all rows.
Click Create Project and now you're in Rows mode. However, if you think that Records mode might be better suited for the context, just import the project again as JSON and then select the next outside area of the content, perhaps a larger array that contains a key field. In the example, the key field would probably be the date, which is why I would highlight the whole record for a given date. This way OpenRefine will have keys for each record, and Records mode lets you work with them better than Rows mode.
Feel free to take this example, make it better and even more helpful for all, and add it to our Wiki section on How to Use.
I think you are on the right track. The output of:
forEach(value.parseJson().calendar_months, e, e.days)
is hard to read because OpenRefine and JSON both use square brackets to indicate arrays. What you are getting from this expression is an OpenRefine (OR) array containing twelve items (one for each month of the year). The items in the OR array are JSON - each one an array of the days in that month.
To keep the steps manageable I'd suggest tackling it like this:
First use
forEach(value.parseJson().calendar_months,m,m.days).join("|")
You have to use 'join' because OR can't store OR arrays directly in a cell - it has to be a string.
Then use "Edit Cells->Split multi-valued cells" - this will get you 12 rows per ID, each containing a JSON expression. Now for each ID you have 12 rows in OR
Then use:
forEach(value.parseJson(),d,d).join("|")
This splits the JSON down into the individual days
Then use "Edit Cells->Split multi-valued cells" again to split the details for each day into its own cell.
Using the JSON from example URL above - this gives me 441 rows for the single ID - each contains the JSON describing the availability & price for a single day. At this point you can use the 'fill down' function on the ID column to fill in the ID for each of the rows.
You've now got some pretty easy JSON in each cell - so you can extract availability using
value.parseJson().available
etc.
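And since you mentioned a Python solution would also be fine: a rough sketch along the same lines, assuming the JSON keeps the calendar_months -> days structure used above (the date and price keys are assumptions; inspect one response to confirm them).

import json
import urllib.request

# URL taken from the question; swap listing_id for each of your room IDs.
url = ("https://fr.airbnb.com/api/v2/calendar_months"
       "?key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=EUR&locale=fr"
       "&listing_id=4212133&month=11&year=2016&count=12&_format=with_conditions")

with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

rows = []
for month in data["calendar_months"]:           # one entry per month
    for day in month["days"]:                   # one entry per day
        rows.append({
            "date": day.get("date"),            # assumed key
            "available": day.get("available"),  # same field the GREL above reads
            "price": day.get("price"),          # assumed key; may be a nested object
        })

print(len(rows), rows[:2])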

Import CSV column from different file into new file

I have 2 CSV files almost identical with the following differences:
The first has a column, "date".
The second doesn't have "date" and also has 50 fewer rows than the first ("email").
They are lists of subscribers with the date created. The second, however, is the updated list with the subscribers who wanted to be removed taken out, but it no longer has the date created.
Is there any way to import column "date" from 1st CSV into the 2nd CSV by making a reference to the "email" column so I can get the correct date of that subscriber?
Sorry, there doesn't seem to be a ready-made command line tool available (building one would probably be an evening's worth of effort).
You could look at different approaches; one complex way is to load the files into database tables, do the merge (using a select and join on the two tables), and export the result back as CSV.
The simplest I could think of was to use R (given that you have header names in your CSVs):
csv1_data <- read.csv('/path/to/csv1.csv')   # first file, has the "date" column
csv2_data <- read.csv('/path/to/csv2.csv')   # updated file, no "date" column
merged_csv <- merge(csv1_data, csv2_data, by = "email")   # join on the shared "email" column
write.table(merged_csv, file = "/path/to/merged_csv.csv", sep = ",", row.names = FALSE)
The first two lines load the data into R, the third line merges them on the shared "email" column, and the final line exports the result as a CSV file with the headers (row.names = FALSE keeps R from adding an extra row-name column).
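If you would rather avoid R, here is a rough pandas equivalent (paths are placeholders) that joins on the shared "email" column:

import pandas as pd

csv1 = pd.read_csv("/path/to/csv1.csv")   # has the "date" column
csv2 = pd.read_csv("/path/to/csv2.csv")   # updated list, no "date" column

# Left-join so every subscriber kept in the second file picks up the
# matching creation date from the first file.
merged = csv2.merge(csv1[["email", "date"]], on="email", how="left")
merged.to_csv("/path/to/merged.csv", index=False)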
Hope this helps!

SSIS Need Flat File output with 2 column headers the same

I am trying to use the SSIS Flat File destination, but cannot come up with a workaround for getting the output file to have two columns named the same thing.
I have a requirement for the output file to have the column headers:
first1, last1, email, shortname, email
Whenever I try to map the source data, I get error messages saying things like "This column name already exists" and "There is more than one data source column with the name "email"".
What's the best workaround?
Thanks
Assuming I understand the problem correctly, you need to have the same column name in the output file twice. Doesn't matter whether it's same data or not, just the header needs to be repeated.
It's a little hokey, but in your connection manager, uncheck "Column Names in the first data row" and redefine the columns as email and email1. This will allow you to connect the columns to the right places in the file.
In your flat file destination, you have the ability to define Header row(s). It's very limited, you can't put useful things in there like dynamic checksums and such but in your case, paste in first1, last1, email, shortname, email and run the package. Data will be extracted to the correct columns and a header row will be prepended to the file with all the "right" field names.
Two downsides to this approach. First is the connection manager becomes output only as it would attempt to read in the header row from the file. Second is that any changes to the layout will not be kept in sync with the manual header row.
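To make the mechanism concrete outside SSIS, here is a rough Python sketch of the same trick (file name and data rows are made up): write the data rows without relying on column names, and emit the header line literally, duplicate "email" and all.

import csv

# Made-up rows matching the required layout: first1, last1, email, shortname, email
rows = [
    ["John", "Doe", "jd@example.com", "jdoe", "jd@example.com"],
]

with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # Write the header line manually so the duplicate "email" column is allowed.
    writer.writerow(["first1", "last1", "email", "shortname", "email"])
    writer.writerows(rows)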