I have to export data to a CSV file, but I'm not sure if what I'm trying to do is possible. One specific column of the CSV file needs to hold multiple values, and each of these values must have a date attached. Example:
1.50 happened on the 31/01/2021
1.45 happened on the 28/02/2021
1.56 happened on the 31/03/2021
1.55 happened on the 30/04/2021
Can I do the following?
Place;Performance;Name
Berlin;"31/01/2021 1.50" "28/02/2021 1.45" "31/03/2021 1.56" "30/04/2021 1.55";Andrew
With the following, you will get one cell in the 'Performance' column with the value:
"31/01/2021 1.50" "28/02/2021 1.45" "31/03/2021 1.56" "30/04/2021 1.55"
This will be interpreted as a string/text.
You can try designing the desired output in Excel, saving it as CSV, and then opening the file in Notepad to see what the CSV looks like.
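As a rough sketch of the same idea outside Excel, the date/value pairs can be packed into one text cell and split again when the file is read back. The `|` separator and the helper names `pack` and `unpack` are assumptions made for this illustration; they are not part of any CSV standard.

```python
import csv
import io

# Measurements from the question as (date, value) pairs.
performance = [
    ("31/01/2021", "1.50"),
    ("28/02/2021", "1.45"),
    ("31/03/2021", "1.56"),
    ("30/04/2021", "1.55"),
]

def pack(pairs, sep="|"):
    """Join (date, value) pairs into one text cell (hypothetical helper)."""
    return sep.join(f"{date} {value}" for date, value in pairs)

def unpack(cell, sep="|"):
    """Split a packed cell back into (date, value) pairs (hypothetical helper)."""
    return [tuple(item.split(" ", 1)) for item in cell.split(sep)]

# Write a semicolon-delimited CSV into memory.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
writer.writerow(["Place", "Performance", "Name"])
writer.writerow(["Berlin", pack(performance), "Andrew"])
csv_text = buffer.getvalue()

# Read it back: the Performance column is still a single text cell.
rows = list(csv.reader(io.StringIO(csv_text), delimiter=";"))
restored = unpack(rows[1][1])
```

Any program that reads the file only sees one string in the Performance column; splitting it into dates and values is up to the consumer.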
I have a CSV file that contains values like 0234.
When I open that CSV, LibreOffice automatically converts the value to 234 (the leading zero is removed).
LibreOffice also reformats some large numbers, so instead of the original values I get something like: 13323+15
Question: can I set up LibreOffice so that it never changes the original values and opens the file without any auto-formatting?
I am trying to save a DataFrame to HDFS.
It gets saved as multiple part files (part-0000 and so on).
I want to save it as an Excel sheet or as just one part file.
How can we achieve this?
Code used so far:
df1.write.csv('/user/gtree/tree.csv')
Your DataFrame is saved according to its partitions (multiple partitions = multiple files). You can coalesce the DataFrame down to 1 partition so that only 1 file is written. Note that the path still names a directory; the single part file is written inside it.
Link: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.coalesce
df1.coalesce(1).write.csv('/user/gtree/tree.csv')
You can also use .repartition(1) to reduce the DataFrame to a single partition before writing:
df.repartition(1).write.csv(filePath)
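If the part files already exist and have been copied to a local directory, another option is to concatenate them afterwards. The following is only a sketch using the Python standard library, not a Spark API: the helper name `merge_part_files` and the demo file names are assumptions, and it relies on the part files having no header row (which is the case for `write.csv` unless the header option is turned on).

```python
import glob
import os
import shutil
import tempfile

def merge_part_files(part_dir, target_path):
    """Concatenate every part-* file in part_dir into target_path, in name order.

    Hypothetical helper; assumes the part files carry no header line.
    """
    parts = sorted(glob.glob(os.path.join(part_dir, "part-*")))
    with open(target_path, "wb") as target:
        for part in parts:
            with open(part, "rb") as source:
                shutil.copyfileobj(source, target)
    return len(parts)

# Small local demo with two made-up part files.
demo_dir = tempfile.mkdtemp()
for index, text in enumerate(["a,1\nb,2\n", "c,3\n"]):
    with open(os.path.join(demo_dir, f"part-{index:05d}"), "w") as handle:
        handle.write(text)

merged_path = os.path.join(demo_dir, "tree_merged.csv")
count = merge_part_files(demo_dir, merged_path)
with open(merged_path) as handle:
    merged_text = handle.read()
```

Compared with coalesce(1) or repartition(1), this keeps the Spark write parallel and moves the merge to a separate step.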
I'm pasting tab-delimited data from Notepad++ into Excel (about 50k rows and 3 columns). No matter how I try it, Excel treats everything from one " up to the next " as the content of a single cell.
For example, if my data looked like this:
"Apple 1.0 Store
Banana 1.3 Store
"Cherry" 2.5 Garden
Watermelon 4.0 Field
The Excel file looks like this:
Apple1.0StoreBanana1.3Store
Cherry 2.5GardenWatermelon4.0Field
One way to get around this is to open the file as a CSV in Excel; however, Excel then reformats the numeric values using its "General" format. So the data would look like the following:
"Apple 1 Store
Banana 1.3 Store
"Cherry" 2.5 Garden
Watermelon 4 Field
The data I'm getting is coming from SQL Server Studio so my options for file formats are:
.CSV
.Txt (Tab-delimited)
Copy Pasting from Query results
The solution I'm looking for is to have the data represented in Excel with no Excel processing applied to the quotation marks, numbers or any other cell contents.
Don't open the file directly in Excel. Instead, import it and control the data types and file layout.
Open a new Excel document.
Select the Data menu.
Select From Text in the Get External Data section.
Select the file to import.
On step 1 of the import wizard, select Delimited.
Click Next.
Select the Tab checkbox and change the text qualifier to {none}.
Click Next.
Set the column data types to General, Text, Text.
Click Finish.
When you open a file directly, Excel imports the data as best it can, and you lose flexibility and control. It is better to import the file and control the settings yourself to get the fine adjustments you're looking for.
By treating the numbers like text, the zeros don't get messed up.
By setting the text qualifier to none, the quotes don't get messed up.
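The same two settings (no text qualifier, every column kept as text) can be illustrated outside Excel with Python's standard `csv` module. This is only a sketch to show the effect of the settings; the sample string reproduces the question's data with real tab characters, which is an assumption about the original file.

```python
import csv
import io

# Sample rows from the question, tab-delimited.
raw = (
    '"Apple\t1.0\tStore\n'
    'Banana\t1.3\tStore\n'
    '"Cherry"\t2.5\tGarden\n'
    'Watermelon\t4.0\tField\n'
)

# QUOTE_NONE plays the role of "text qualifier = {none}":
# quote characters are treated as ordinary data.
reader = csv.reader(io.StringIO(raw), delimiter="\t", quoting=csv.QUOTE_NONE)

# csv.reader returns every field as a string, so "1.0" stays "1.0".
rows = list(reader)
```

Each line stays a separate row, the quote characters are preserved, and no number is reformatted because nothing is converted from text.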
Have you tried opening it via Text Import?
Go to the Data tab > From Text (third from the left by default).
You will get a window similar to Text to Columns.
Select the correct delimiter, remember to remove the quote sign from Text Qualifier, and mark all columns as Text to avoid Excel's auto-formatting.
Excel tip for saving time when importing CSV files: if you pre-set your Text to Columns delimiter correctly in Excel (e.g. specify tabs as the delimiter) and then copy and paste the CSV data, Excel will split the pasted data directly into the correct columns without you having to go through the Text to Columns rigmarole. This was a particular time saver when I had to import hundreds of bank statements into Excel.
However, if your Text to Columns delimiter is pre-set incorrectly, e.g. to comma while you are importing tab-delimited files, Excel will dump all the data into one column, and you will have to go through the time-consuming process of running Text to Columns for each statement.
Excel looks at the existing Text to Columns delimiter settings to see if it can reuse them when you paste data.
Hope that tip helps (it saved me several hours).
I have a tabular CSV file that has seven columns and contains the following data:
ID,Gender,PatientPrefix,PatientFirstName,PatientLastName,PatientSuffix,PatientPrefName
2 ,M ,Mr ,Lawrence ,Harry , ,Larry
I am new to Pentaho and I want to design a transformation that moves the data (the values of the 7 columns) to an empty Excel sheet. The Excel sheet has different column names, but should carry the same data, as shown:
prefix_name,first_name,middle_name,last_name,maiden_name,suffix_name,Gender,ID
I tried to design a transformation using the following series of steps, but at the end it gives me errors that I could not interpret.
What is the proper design to move the data from the CSV file to the Excel sheet in this case? Any ideas on how to solve this problem?
As @Brian.D.Myers mentioned in the comments, you can use the Select values step. Here is a step-by-step explanation of how to do it.
Select all the fields in the CSV file input step.
Configure the Select values step as follows.
In the Content tab of the Excel writer step, click the Get fields button to fill in the fields. Alternatively, you can use the Excel output step.
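For readers without the screenshots, the renaming that the Select values step performs can be sketched in plain Python. The mapping below is an assumption inferred from the column names in the question (the answer's actual configuration is not visible in the text): `middle_name` and `maiden_name` have no source column and are left empty, and `PatientPrefName` has no obvious target and is dropped. The sketch writes CSV text rather than an Excel file.

```python
import csv
import io

# Source data from the question.
source = (
    "ID,Gender,PatientPrefix,PatientFirstName,PatientLastName,"
    "PatientSuffix,PatientPrefName\n"
    "2 ,M ,Mr ,Lawrence ,Harry , ,Larry\n"
)

# Assumed source -> target mapping (not confirmed by the original answer).
RENAME = {
    "PatientPrefix": "prefix_name",
    "PatientFirstName": "first_name",
    "PatientLastName": "last_name",
    "PatientSuffix": "suffix_name",
    "Gender": "Gender",
    "ID": "ID",
}

# Target column order from the question.
TARGET = [
    "prefix_name", "first_name", "middle_name", "last_name",
    "maiden_name", "suffix_name", "Gender", "ID",
]

def remap(row):
    """Rename and reorder one input row; unmapped targets stay empty."""
    out = {name: "" for name in TARGET}
    for src, dst in RENAME.items():
        out[dst] = row[src].strip()  # drop the trailing padding spaces
    return out

rows = [remap(row) for row in csv.DictReader(io.StringIO(source))]

output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=TARGET, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
result_lines = output.getvalue().splitlines()
```

In the transformation itself, the same mapping would be entered as rename rows in the Select values step, with the field order matching the target sheet.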
I have 2 CSV files almost identical with the following differences:
The first has a column, "date".
The second doesn't have "date" and also has 50 fewer rows than the first (both have an "email" column).
They are a list of subscribers with date created. The second, however, is the updated list with subscribers who wanted to be removed, but this no longer has the date created.
Is there any way to import column "date" from 1st CSV into the 2nd CSV by making a reference to the "email" column so I can get the correct date of that subscriber?
Sorry, there doesn't seem to be a ready-made command-line tool for this (writing one would probably take an evening's worth of effort).
There are different ways to do it. A more complex one is to load both files into database tables, merge them (using a SELECT with a JOIN on the two tables) and export the result back to CSV.
The simplest approach I could think of is to use R (assuming your CSV files have header names):
csv1_data <- read.csv('/path/to/csv1.csv')
csv2_data <- read.csv('/path/to/csv2.csv')
merged_csv <- merge(csv1_data, csv2_data)
write.table(merged_csv,file="/path/to/merged_csv.csv",sep=",",row.names=FALSE)
The first 2 lines load the data into R, the third line merges the two data frames on the columns they have in common (the default for merge), and the final line exports the result as a CSV file with the headers. Row names are switched off so that the header row lines up with the data columns.
Hope this helps!
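For comparison, here is a minimal sketch of the same lookup using only the Python standard library. The sample rows and the `name` column are made up for illustration; only the `email` and `date` column names come from the question.

```python
import csv
import io

# Made-up stand-ins for the two files; only "email" and "date" are from the question.
csv1 = (
    "email,name,date\n"
    "a@example.com,Ann,2020-01-05\n"
    "b@example.com,Bob,2020-02-10\n"
    "c@example.com,Cy,2020-03-15\n"
)
csv2 = (
    "email,name\n"
    "a@example.com,Ann\n"
    "c@example.com,Cy\n"
)

# Build an email -> date lookup from the first (older) file.
date_by_email = {
    row["email"]: row["date"] for row in csv.DictReader(io.StringIO(csv1))
}

# Add the date to every row of the second (updated) file.
merged = []
for row in csv.DictReader(io.StringIO(csv2)):
    row["date"] = date_by_email.get(row["email"], "")  # empty if no match
    merged.append(row)

# Write the result as CSV text.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=["email", "name", "date"],
                        lineterminator="\n")
writer.writeheader()
writer.writerows(merged)
merged_text = output.getvalue()
```

Unlike the R merge, this keeps every row of the second file even when no matching email exists in the first one, leaving the date empty in that case.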