What's the syntax to write an array for solr in a csv file?, i need to update a multivalued field but when i upload the file, the data get all in the array but like just one element like this:
multiField:["data1,data2,data3"]
instead of this
multiField:["data1", "data2" , "data3"]
how i can write this in the csv file by default?
You can use the split parameter to split a single field into multiple values:
&f.multiField.split=,
.. should do what you want.
Related
I am trying to store my pyspark output into csv, but when I try to save it in csv, the output does not look the same. I have the output in this form:
When I try to convert this to csv, the Concat tasks column does not show up properly, due to the size of the data. Given my requirement, it's necessary for me to store the data in csv format. Is there a way out for this. (P.S- I also see columns showing nonsensical values, even though the pyspark output shows correct value)
I have a table in Cassandra DB and one of the column has value in JSON format. I am using Datastax DevCenter for querying the DB and when I try to export the result to CSV, JSON value gets broken to separate column wherever there is coma(,). I even tried to export from command prompt without giving and delimiter, that too resulted in broken JSON value.
Is there anyway to achieve this task?
Use the COPY command to export the table as a whole with a different delimiter.
For example :
COPY keyspace.your_table (your_id,your_col) TO 'your_table.csv' WITH DELIMETER='|' ;
Then filter on this data programmatically in whatever way you want.
I want to upload csv data into BigQuery. When the data has different types (like string and int), it is capable of inferring the column names with the headers, because the headers are all strings, whereas the other lines contains integers.
BigQuery infers headers by comparing the first row of the file with
other rows in the data set. If the first line contains only strings,
and the other lines do not, BigQuery assumes that the first row is a
header row.
https://cloud.google.com/bigquery/docs/schema-detect
The problem is when your data is all strings ...
You can specify --skip_leading_rows, but BigQuery still does not use the first row as the name of your variables.
I know I can specify the column names manually, but I would prefer not doing that, as I have a lot of tables. Is there another solution ?
If your data is all in "string" type and if you have the first row of your CSV file containing the metadata, then I guess it is easy to do a quick script that would parse the first line of your CSV and generates a similar "create table" command:
bq mk --schema name:STRING,street:STRING,city:STRING... -t mydataset.myNewTable
Use that command to create a new (void) table, and then load your CSV file into that new table (using --skip_leading_rows as you mentioned)
14/02/2018: Update thanks to Felipe's comment:
Above comment can be simplified this way:
bq mk --schema `head -1 myData.csv` -t mydataset.myNewTable
It's not possible with current API. You can file a feature request in the public BigQuery tracker https://issuetracker.google.com/issues/new?component=187149&template=0.
As a workaround, you can add a single non-string value at the end of the second line in your file, and then set the allowJaggedRows option in the Load configuration. Downside is you'll get an extra column in your table. If having an extra column is not acceptable, you can use query instead of load, and select * EXCEPT the added extra column, but query is not free.
How can Mapreduce parse a CSV file with 80 columns and for each row in excel format it results two to three lines in CSV format? Text input format doesn't work in this case. Does key value input format work in this case?
You can write your own InoutFormat & RecordReader which will read multiple lines and send as a single record to your Mapper.
I have a CSV file that I can open in Excel 2012 and it comes in perfectly. When I try to setup the metadata for this CSV file in Talend the fields (columns) are not splitting the same was as Excel splits them. I suspect I am not properly setting the metadata.
The specific issue is that I have a column with string data in it which may contain commas within the string. For example suppose I have a CSV file with three columns: ID, Name and Age which looks like this:
ID,Name,Age
1,Ralph,34
2,Sue,14
3,"Smith, John", 42
When Excel reads this CSV file it looks at the second element of the third row ("Smith, John") as a single token and places it into a cell by itself.
In Talend it trys to break this same token into two since there is a comma within the token. Apparently Excel ignores all delimeters within a quoted string while Talend by default does not.
My question is how to I get Talend to behave the same as Excel?
if you use tfileinputdelimited component to read this csv file, you can use delimeter as "," and under csv options properties of this component you should enable Text Enclosure """ option or even if you use metadata there would be an option to define string/text enclosure - here you should mention """ to resolve your problem