I am hoping someone can point me in the right direction, in relation to the scenario I am faced with.
Essentially, I am given a CSV each day containing 200+ lines of payment information.
As the payment reference is entered by the user at source, it isn't always in the format I need.
The process is currently done manually, and can take considerable time, therefore I was hoping to come up with a batch file to isolate the reference I require, based on a set of parameters.
Each reference should be 11 digits long, numeric only, and start with 1, 2 or 3.
I have attached a basic example with this post.
It may be that this isn't possible in batch, but any ideas would be appreciated.
Thanks in advance :-)
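I'm not too sure about batch, but Python and regex can help you out here.
Here is a great tutorial on using CSVs with Python.
Once you have that down, you could use a regex to filter out the correct values.
An expression that matches a whole value of that shape is ^[123][0-9]{10}$ (note that inside square brackets a | is a literal character, so [1|2|3] would also accept a pipe as the first character).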
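As a rough sketch of how the two pieces could fit together: the snippet below reads CSV text and pulls the first 11-digit run starting with 1, 2 or 3 out of each free-text reference. The column name "Reference", the sample rows and the helper names are made up for illustration, since the real file layout isn't shown. It uses lookarounds instead of ^ and $ so that a reference buried in other text is still found, while longer digit runs are rejected.

```python
import csv
import io
import re

# An 11-digit run starting with 1, 2 or 3 that is not part of a longer digit run.
REFERENCE_RE = re.compile(r"(?<!\d)[123]\d{10}(?!\d)")


def extract_reference(text):
    """Return the first valid reference found in text, or None."""
    match = REFERENCE_RE.search(text)
    return match.group(0) if match else None


# Hypothetical sample data: the real column names are not known.
SAMPLE = """Amount,Reference
10.00,REF 12345678901 thanks
25.50,inv-31234567890
5.00,412345678901
7.25,no reference given
"""


def process(csv_text, column="Reference"):
    """Extract one reference (or None) per CSV row from the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [extract_reference(row[column]) for row in reader]


if __name__ == "__main__":
    for reference in process(SAMPLE):
        print(reference)
```

Rows that come back as None are the ones that would still need a manual look. For a real file, open(path, newline="") can be passed to csv.DictReader in place of the io.StringIO wrapper.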
I am unable to import the DATE columns of a CSV table into BigQuery.
The dates are not recognized, even though they have the correct format (YYYY-MM-DD) according to this documentation:
https://cloud.google.com/bigquery/docs/schema-detect
So the DATE columns are not recognized and are renamed to _2020-0122, 2020-01-23...
Is the issue that the dates are in the first row, as column names?
If so, how can I import the dates, given that I want to use them in time series charts (Data Studio)?
Here is a sample of the source CSV:
Province/State,Country/Region,Lat,Long,2020-01-22,2020-01-23,2020-01-24,2020-01-25,2020-01-026
Anhui,China,31.8257,117.2264,1,9,15,39,60
Beijing,China,40.1824,116.4142,14,22,36,41,68
Chongqing,China,30.0572,107.874,6,9,27,57,75
If you have a finite number of days, you can try unpivoting the table when using it. See blog post.
Otherwise, if you don't know how many day columns the CSV file has, choose a unique character as the CSV delimiter and load the whole file into a single-column staging table, then use the SPLIT function. You'll also need UNNEST. This approach requires a full scan and will be more expensive, especially as the file gets bigger.
The issue is that a column name cannot be of a date type, so when the CSV is imported the dates are transformed into the format with underscores.
The first way to tackle the problem would be to modify the CSV file, because any import with the first row as a header will change the date format, and it will then be harder to get back to a date type. If you have experience in any programming language, you can do the transformation very easily. I can help with this, but I do not know your use case, so maybe it is not possible. Where does this CSV come from?
If modifying the CSV beforehand is not possible, then the second option is what ktopcuoglu said: import the whole file as one column and process it using SQL functions. This is much harder than the first option, and because you import all the data into a single column, all of it will have the same data type, which will be a headache too.
If you could explain where the CSV comes from, we may be able to influence it before it is ingested by BigQuery. Otherwise, you'll need to dig into SQL a bit.
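To give an idea of what that transformation could look like, here is a minimal Python sketch that turns the wide layout (one column per date) into a long one (one row per location and date), so the date ends up as a value rather than a column name. The output column names "Date" and "Confirmed" and the function name are my own placeholders, and the sample is just the first rows of the CSV from the question.

```python
import csv
import io
import sys

# First rows and date columns of the sample CSV from the question.
SAMPLE = """Province/State,Country/Region,Lat,Long,2020-01-22,2020-01-23,2020-01-24
Anhui,China,31.8257,117.2264,1,9,15
Beijing,China,40.1824,116.4142,14,22,36
"""


def unpivot(csv_text, id_columns=4):
    """Turn one column per date into one row per (location, date)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    id_names, date_names = header[:id_columns], header[id_columns:]
    # "Date" and "Confirmed" are placeholder names for the new columns.
    rows = [id_names + ["Date", "Confirmed"]]
    for record in reader:
        ids, values = record[:id_columns], record[id_columns:]
        for date, value in zip(date_names, values):
            rows.append(ids + [date, value])
    return rows


LONG_ROWS = unpivot(SAMPLE)

if __name__ == "__main__":
    csv.writer(sys.stdout).writerows(LONG_ROWS)
```

With the data in this shape, the Date column holds plain YYYY-MM-DD values, which is the format the schema auto-detection page linked in the question describes for dates.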
Hope it helps!
Hi, now I can help you further.
First, I found some COVID datasets among the public BigQuery datasets. The one you are taking from GitHub is already in BigQuery, but there are many others that may work better for your task, such as the one called “covid19_ecdc”, which is inside bigquery-public-data. This last one has the confirmed cases and deaths per date and country, so it should be easy to make a time series.
Second, I found an interesting link that does what you described with Python and Data Studio. It’s a Kaggle discussion, so you may not be familiar with it, but it is worth a look for sure. Moreover, the author is using the dataset you are trying to use.
Hope it helps. Do not hesitate to ask!
I'm working on the following visualization: frequency of specific job build status results per job. Here's what it looks like:
The question is: how can I sort by one of the specific values, say, by "success"?
I've made dozens of attempts at the issue and re-read the docs, but I'm still failing. Let's say that the field name is buildStatus.
Have you tried to order your elasticsearch output as described in the documentation?
You should also have a look at this Kibana blog post, maybe it will help you.
I'm a geneticist trying to automate a very laborious search over my data. I understand that this question may have been asked before; please bear with me, as I'm not entirely sure what keywords I should use. Thanks in advance!
What I want to achieve:
Search a website for a specific string of numbers from my list of data (CSV file), then select the top option. Once on the page, search for specific keyword(s) and return the results into a CSV file.
Rinse and repeat for the remaining numbers.
That's it. It took me 1 day to do a couple of hundred entries. It takes up too much manpower, and I really hope to use my time better than this.
A customer of mine is looking to mass-create some customizing data related to routes, and as such I have a small program which reads in a CSV file with all of the fields as they would be in the customizing transaction.
I'm having a particular problem wrapping my head around a field TVRO-TRAZTD for a couple of reasons.
The user is only filling in a number which represents a number of days.
There is a conversion exit on TRAZTD, except it's obsolete, use CONVERT TIMESTAMP they say
I don't have a timestamp, I have a decimal number representing a part of a day
For example, TRAZTD would be entered as 0,58 from the CSV file, so why is it represented in the table as 135.512?
I tried it the old-fashioned way and multiplied 0,58 * 24, which gives me 13,92. If I take 13,92 * 10 I get 139.200, which isn't the same, but it's the closest I can get, and I don't get why 10.
Using the conversion exit, even though it's obsolete, doesn't give me a result either: no matter what number I give it, I always get 0 back. I can't use CONVERT TIMESTAMP either because, well, it's not a timestamp, or I didn't look carefully enough at how to use it (I didn't see anything other than strings and characters).
The other thing I tried was just saying "screw it" and placing the data from the CSV directly into the field, hoping the conversion routine would take care of the work, but that doesn't happen either.
Is there anybody out here that can maybe shed some light on where the number after the conversion comes from?
Everybody, I came to a solution, just in case anybody stumbles upon this same problem.
I took the value from the Excel document and multiplied it by 24 to get the number of hours, and then multiplied it by 10000 because, I don't know, I picked it randomly.
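For what it's worth, the 135.512 from the question is consistent with reading the stored number as hours, minutes and seconds packed together: 0,58 days = 13,92 hours = 13 h 55 min 12 s, i.e. 135512. That is only an observation from the arithmetic, not something confirmed by the documentation for TVRO-TRAZTD, but it would explain why multiplying the hours by 10000 lands on 139.200 instead: the 0,92 of an hour is carried over as decimal digits rather than converted to 55 minutes and 12 seconds. A small Python sketch of that reading (the function name is made up, and the real program would of course be ABAP):

```python
def days_to_hhmmss(days_text):
    """Convert a day fraction such as '0,58' into an HHMMSS-style integer.

    ASSUMPTION: the target field packs hours, minutes and seconds as
    H...HMMSS. This is inferred from the single example in the question.
    """
    days = float(days_text.replace(",", "."))  # accept a decimal comma
    total_seconds = round(days * 24 * 60 * 60)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours * 10000 + minutes * 100 + seconds


if __name__ == "__main__":
    print(days_to_hhmmss("0,58"))  # 135512, i.e. 13 h 55 min 12 s
```

If this reading is right, the same three steps (days to seconds, split into hours, minutes and seconds, pack as digits) should reproduce any value the customizing transaction stores, which would be easy to check against a few existing entries.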
I have the following data in my MySQL database. These three columns are a subset of a table that I have selected using a query.
Value Date Time
230.8 13/08/08 15:01:22+22
233.7 13/08/08 15:13:12+22
234.5 13/08/08 15:40:33+22
I want to represent this data on a graph of (Value) versus (Date & Time) in chronological order. What format do I need to put the above data into before using JSON? I've had a look at a few tutorials, and when I apply the same logic (like this: http://www.d3noob.org/2013/02/using-mysql-database-as-source-of-data.html) I don't seem to be getting any graph at all.
Or will JSON and D3.js not work for my requirement? Do I need to look at something else? Like some other JavaScript?
Your question is a little bit vague, but I'll try to address a few of your topics to help you get started.
Firstly, I would suggest finding the visualization that fits your needs. From the data subset that you showed in the question, I would suggest maybe this one. It is interesting because if you have multiple values for different times in a given day, you could construct various time series graphs and compare them interactively. There are other options, so you should explore and find a good starting point to improve and adapt to your needs.
Regarding the origin/format of the data, if you are able to extract the data you showed into a variable (with PHP, for example), you can then manipulate the data and build a structure from it. It doesn't necessarily have to be JSON and/or CSV, as long as you can handle it with d3.js's API functions. It isn't very difficult, but it is something that requires you to understand and read about the topic. First understand how to query for your needs with MySQL. Then, I would suggest starting here if you decide to go with JSON.
The example visualization I mentioned above uses a CSV file as a data source. Another option could be, for instance, to build a CSV file (or data structure, i.e. an array) to feed into d3.js. There are various questions covering "how to create CSV with PHP", so you shouldn't have much difficulty finding the info you need.
Either way, once you feel comfortable with what you know about these topics, start breaking your problem into smaller tasks and finding answers to one question at a time. If you need to, post more questions here on SO and include your attempts at coding a solution; this will definitely get you all the help you might need.
In Python it would look like this:
import json
output = json.dumps(['data', {'data_1': ('230.8', '13/08/08', '15:01:22+22')}, {'data_2': ('233.7', '13/08/08', '15:13:12+22')}, {'data_3': ('234.5', '13/08/08', '15:40:33+22')}])
print(output)
More information about Python and JSON can be found here.
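As a variation on the snippet above, a flat array of objects with one combined timestamp per row is usually easier to feed to a charting library than nested tuples. The sketch below makes two guesses that should be checked against the real data: it reads 13/08/08 as yy/mm/dd, and it drops the "+22" suffix on the time because its meaning isn't explained in the question. The key names "value" and "timestamp" are also just placeholders.

```python
import json
from datetime import datetime

# The three rows shown in the question.
ROWS = [
    ("230.8", "13/08/08", "15:01:22+22"),
    ("233.7", "13/08/08", "15:13:12+22"),
    ("234.5", "13/08/08", "15:40:33+22"),
]


def to_record(value, date_text, time_text):
    """Build one JSON-ready object with a numeric value and an ISO timestamp."""
    # ASSUMPTION: the "+22" suffix is not needed for plotting, so it is dropped.
    clock = time_text.split("+")[0]
    # ASSUMPTION: the date is yy/mm/dd; use "%d/%m/%y" if it is dd/mm/yy.
    stamp = datetime.strptime(f"{date_text} {clock}", "%y/%m/%d %H:%M:%S")
    return {"value": float(value), "timestamp": stamp.isoformat()}


RECORDS = [to_record(*row) for row in ROWS]

if __name__ == "__main__":
    print(json.dumps(RECORDS, indent=2))
```

On the JavaScript side, ISO 8601 strings like these can be turned into Date objects before plotting, and the values are already numbers rather than strings.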