I'm having trouble loading a .CSV file because I don't understand the connection manager editor settings well.
When I load the .CSV file, the first 18 rows load into the table without a problem.
However, from the 19th column the data is not partitioning correctly.
Row delimiter is {CR}{LF}
Column delimiter is Comma {,}
How can I partition the data correctly?
Any help?
Here are some ideas I have with no details.
What happens when you try to import the same .CSV file into Excel? Anything interesting around row 19?
Does there appear to be anything different about row 19?
If you delete row 19, what happens?
See, I bet you've thought of these things as well, and probably more, since you have the details. If you want anything more than superficial bad guesses, you'll have to provide a little detail.
I've found the CSV Import to be a bit limited with regards to bad data. If you're having trouble with the 19th column, I would suggest figuring out why that column is failing. You can try and tell the import task's error conditions to Ignore Errors with data truncation, etc...but that may not fix the issue.
I have often switched complicated or error-prone CSV imports to simply use a SSIS Script Task, then just write my own code to parse out the CSV and handle bad data.
If it's not partitioning correctly, it might be something as trivial as one of your field values on row 19 containing a comma, thus throwing out the import by making that row seem to have more columns. If this is the case, I hope you can get a revised version of the CSV file - this time with a text qualifier set. If possible, use something like | rather than " as the qualifier so that it's less likely to appear in the field values.
Open the file in a text editor such as Notepad++ or TextPad and change the view to show control characters. You will probably find your culprit there.
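If a text editor is not handy, the same check can be scripted. The following is only a rough sketch in Python, outside SSIS: it splits the raw bytes on the {CR}{LF} row delimiter exactly as declared, counts columns naively (it does not honor text qualifiers, so an embedded comma shows up as an extra column), and lists any control characters left inside a record. The function name `find_suspect_records` is made up for this illustration.

```python
def find_suspect_records(data, column_delimiter=b",", row_delimiter=b"\r\n"):
    """Return (record_number, column_count, control_bytes) for every record
    whose naive column count differs from the first record's, or that still
    contains control characters after splitting on the row delimiter."""
    records = data.split(row_delimiter)
    if records and records[-1] == b"":
        records.pop()  # the file ended with a row delimiter
    if not records:
        return []
    expected = records[0].count(column_delimiter) + 1
    suspects = []
    for number, record in enumerate(records, start=1):
        columns = record.count(column_delimiter) + 1
        # Byte values below 32 are control characters; tab (9) is tolerated.
        control = sorted({b for b in record if b < 32 and b != 9})
        if columns != expected or control:
            suspects.append((number, columns, control))
    return suspects


if __name__ == "__main__":
    # Hypothetical data: record 3 has a bare LF instead of CR LF,
    # record 4 has a comma inside a quoted value.
    sample = b'a,b,c\r\n1,2,3\r\n4,5\n6,7,8\r\n9,"x,y",z\r\n'
    for number, columns, control in find_suspect_records(sample):
        print(number, columns, control)
```

A record reported with control byte 10 (LF) or 13 (CR) has a line break that does not match the declared row delimiter; a record with too many columns and no control bytes more likely has a delimiter inside a value.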
Nothing unusual. When I paste it into Excel as one column and use Text to Columns, there is no problem. But in the SSIS preview, the field value where the problem starts contains two square boxes followed by the data of the next row.
If anyone wants to see the file, let me know and I will e-mail it to you.
Have you ever faced an issue when saving (exporting) the results of a GA4 query in BigQuery as JSONL (delimited), where columns containing only NULLs were removed from the downloaded JSON? The same can be seen after uploading the file as a table (upload -> JSONL) and querying it. I also noticed that the order of the columns after importing differed from the original table before exporting. Has anyone faced this issue? If so, I'd appreciate hearing how you worked around it, and how one can download and re-upload a JSON file while keeping the schema intact.
P.S.: If you are wondering why anyone would download and re-upload a JSONL file, it was just to see whether an option outside the Google Cloud ecosystem was viable. To be more specific about the NULLs being removed: fields such as float_value or double_value from the GA4 BigQuery export (event_params) were dropped if they were all NULL.
Thanks a ton in advance
Columns containing NULLs were removed. I expected it to retain the data structure like json local.
The sequence of the columns after importing was different compared to pre-export.
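One possible workaround on the local side, sketched in Python: before re-uploading, rewrite each JSONL line so that every expected column is present (as an explicit null) and in a fixed order. This is only a sketch under assumptions: the column list has to be supplied by hand, `restore_columns` is a made-up helper name, and it handles top-level keys only, not nested records such as event_params. Supplying an explicit schema at load time, rather than relying on auto-detection, is probably needed as well.

```python
import json


def restore_columns(jsonl_text, columns):
    """Yield one dict per JSONL line with every name in `columns` present,
    in that order. Keys missing from a line are filled with None, which
    json.dumps writes out as null."""
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        yield {name: record.get(name) for name in columns}


if __name__ == "__main__":
    # Hypothetical export in which all-NULL columns were dropped.
    exported = '{"event_name": "click", "int_value": 3}\n{"event_name": "view"}\n'
    wanted = ["event_name", "int_value", "float_value", "double_value"]
    for row in restore_columns(exported, wanted):
        print(json.dumps(row))
```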
I am unable to import a CSV table with DATE columns into BigQuery.
The DATEs are not recognized, even though they have the correct format (YYYY-MM-DD) according to this documentation:
https://cloud.google.com/bigquery/docs/schema-detect
So the DATE columns are not recognized and are renamed to _2020-0122, 2020-01-23...
Is the issue that the DATEs are in the first row as column names?
But how can I then import the dates when I want to use them in time series charts (Data Studio)?
Here is a sample of the source CSV:
Province/State,Country/Region,Lat,Long,2020-01-22,2020-01-23,2020-01-24,2020-01-25,2020-01-026
Anhui,China,31.8257,117.2264,1,9,15,39,60
Beijing,China,40.1824,116.4142,14,22,36,41,68
Chongqing,China,30.0572,107.874,6,9,27,57,75
Here is an image from BigQuery:
If you have a finite number of days, you can try unpivoting the table when you use it. See blog post.
Otherwise, if you don't know how many day columns the CSV file has, choose a character that does not occur in the file as the CSV delimiter, so that the whole file loads into a single-column staging table, and then use the split function. You'll also need unnest. This approach requires a full scan and will be more expensive, especially as the file gets bigger.
The issue is that a column name cannot be of a date type, so when the CSV is imported the dates in the header are transformed into names with underscores.
The first way to tackle the problem would be to modify the CSV file, because any import that treats the first row as a header will change the date format, and it will then be harder to get back to a date type. If you have experience in any programming language, you can do the transformation very easily. I can help with this, but I do not know your use case, so maybe this is not possible. Where does this CSV come from?
If modifying the CSV beforehand is not possible, then the second option is what ktopcuoglu said: import the whole file as one column and process it using SQL functions. This is much harder than the first option, and because you import all the data into a single column, it will all have the same data type, which will be a headache too.
If you could explain where the CSV comes from, we may be able to influence it before it is ingested by BigQuery. Otherwise, you'll need to dig into SQL a bit.
Hope it helps!
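For the first option (reshaping the CSV before import), here is a minimal Python sketch. It assumes the layout of the sample above: a few identifying columns followed by one column per date. It writes one row per (place, date) pair, so the dates become values in a `date` column instead of column names. The function name `wide_to_long` and the output column names are made up for illustration; header text is passed through unchanged, so a malformed date in the header would still need fixing by hand.

```python
import csv
import io


def wide_to_long(csv_text, id_names):
    """Turn 'one column per date' CSV text into long format.

    `id_names` gives the output names for the leading identifying columns;
    every remaining header cell is treated as a date label."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = header[len(id_names):]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(id_names) + ["date", "value"])
    for row in reader:
        ids, values = row[:len(id_names)], row[len(id_names):]
        for date, value in zip(dates, values):
            writer.writerow(ids + [date, value])
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "Province/State,Country/Region,Lat,Long,2020-01-22,2020-01-23\n"
        "Anhui,China,31.8257,117.2264,1,9\n"
    )
    print(wide_to_long(sample, ["province_state", "country_region", "lat", "long"]))
```

The resulting file has a fixed set of columns no matter how many days the source contains, which should suit a time series chart better than one column per day.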
Hi, now I can help you further.
First, I found some COVID datasets among the BigQuery public datasets. The one you are taking from GitHub is already in BigQuery, but there are others that may work better for your task, such as the one called “covid19_ecdc” inside bigquery-public-data. It has the confirmed cases and deaths per date and country, so it should be easy to make a time series.
Second, I found an interesting link that does what you described with Python and Data Studio. It’s a Kaggle discussion, so you may not be familiar with it, but it is worth a look. Moreover, the author is using the dataset you are trying to use.
Hope it helps. Do not hesitate to ask!
I have data in a CSV file, which looks something like this in Notepad:
In SSIS, when I try to load this file in Delimited format, the data in the preview gets messed up because of the commas that occur in the numeric values, e.g. in thousands and millions. The data looks something like this:
Is there any way to take care of this problem in the connection manager itself?
Thanks!
Use Text Qualifier as shown here:
This will take care of the quoted columns that have commas inside. Sometimes it gets really bad with CSV data, and I've had to resort to script components doing some cleanup, but that's really rare.
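To see what a text qualifier changes, here is a small Python illustration (not SSIS). It assumes the numeric values in the file are wrapped in double quotes; the sample rows are invented, since the original data is not shown. The first parse honors the qualifier, the second ignores it.

```python
import csv
import io

# Hypothetical sample: thousands separators inside quoted values.
SAMPLE = 'Region,Sales,Units\nNorth,"1,234,567",12\nSouth,"98,765",3\n'


def parse_with_qualifier(text, qualifier='"'):
    """Commas inside qualified values stay part of the value."""
    return list(csv.reader(io.StringIO(text), delimiter=",", quotechar=qualifier))


def parse_without_qualifier(text):
    """Every comma splits a column; the quote marks are kept as plain text."""
    return list(csv.reader(io.StringIO(text), delimiter=",", quoting=csv.QUOTE_NONE))


if __name__ == "__main__":
    print(parse_with_qualifier(SAMPLE)[1])
    print(parse_without_qualifier(SAMPLE)[1])
```

If the values in the file are not qualified at all, no setting can tell a thousands separator from a column delimiter, and the file would have to be regenerated or cleaned up first.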
I need to regularly merge data from multiple CSV files into a single spreadsheet by appending the rows from each source file. Only OpenOffice/LibreOffice is able to read the UTF-8 CSV file, which has quote-delimited fields containing newline characters.
Now, each CSV file has column headings, but the order of the columns varies from file to file. Some files also have missing columns, and some have extra columns.
I have my master list of column names, and the order in which I would like them all to go. What is the best way to tackle this? LibreOffice gets the CSV parsing right (Excel certainly does not). Ultimately the files will all go into a single merged spreadsheet. Every row from each source file must be kept intact, apart from the column ordering.
The steps also need to be handed over to a non-technical third party eventually, so I am looking for an approach that will not offer too many non-expert technical hurdles.
Okay, I'm approaching this problem a different way. I have instead gone back to the source application (WooCommerce) to fix the export, so the spreadsheets list all the same columns and all in the same order, on every export. This does have other consequences that I need to follow up, such as managing patches and trying to get the changes accepted by the source project. But it does avoid having to append the CSV files with mis-matched columns, which seems to be a common issue that no-one has any real solutions for (yes, I have searched, a lot).
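For anyone who still needs to append files with mismatched columns, one possible approach outside the spreadsheet is a short Python script that matches columns by heading. This is only a sketch under assumptions: the files are UTF-8, the first row of each file holds the headings, columns missing from a file are left empty, and extra columns not in the master list are appended after it so that no data is dropped. The name `merge_csv_files` is made up. Python's csv module reads quoted fields containing newlines when the file is opened with `newline=""`.

```python
import csv


def merge_csv_files(source_paths, master_columns, output_path):
    """Append the rows of all source files into one CSV, matching columns
    by heading. Returns the number of data rows written."""
    # First pass: collect headings that are not in the master list.
    columns = list(master_columns)
    for path in source_paths:
        with open(path, newline="", encoding="utf-8") as in_file:
            header = next(csv.reader(in_file), [])
        for name in header:
            if name not in columns:
                columns.append(name)

    # Second pass: write every row, filling absent columns with "".
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.DictWriter(out_file, fieldnames=columns, restval="")
        writer.writeheader()
        for path in source_paths:
            with open(path, newline="", encoding="utf-8") as in_file:
                for row in csv.DictReader(in_file):
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Called as `merge_csv_files(["a.csv", "b.csv"], ["id", "name"], "merged.csv")` (file names hypothetical), it could be wrapped in a double-clickable script for a non-technical user, and the merged file opened in LibreOffice afterwards.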
When exporting a CSV from Access 2007, it automatically converts decimals into scientific notation.
Unfortunately the tool that receives them treats these fields as text, and displays them as is.
The values being exported are from a query being run against some Excel linked tables, and they appear perfectly in the query view.
Is there any way to disable the automatic conversion to scientific notation?
I.e. if it appears as 0.007 in the query, it will appear as 0.007 in the output CSV rather than 7E-3?
Note: I'm constrained to use Excel and Access for this. As much as I'd like to switch to SQL Server, my wife would be unhappy if I put it on her work laptop!
You have a couple of choices:
you can use the Format() function directly in your query to force the data in the offending columns to be formatted a certain way, for instance:
SELECT ID, Format([Price],"standard") as Pricing FROM ORDERS;
you can write your own CSV export routine in VBA.
I posted one recently as an answer to this question.
You can easily modify the code to format numeric types a certain way.
If you don't know how, let me know and I'll modify the code and post it here.
You could write a small amount of VBA code in Access to query the data from the linked table or Access query and write it out to a text file, thus creating your own .CSV and forgoing the "Wizard". I never liked Access' export "wizard" much, and just created the files myself.
One easy way to handle this in a query is to double-convert the value, first to long integer and then to string.
For CSV export it is character data anyway. (ZString and ZLong appear to be the German-locale names of CStr and CLng.)
myValue: ZString(ZLong(123456789))