I get monthly data feeds from a source that I need to import into a database, but the feed changes every month: sometimes there are more columns, sometimes fewer. There is no consistency whatsoever.
How do I manage and automate these data feeds?
Rather than expertise or out-of-the-box thinking, I think this needs a forthright answer: automation is impossible unless the data provider commits to a fixed column structure.
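If the provider won't commit to a schema, the most an automated pipeline can safely do is detect the drift and refuse to load. A minimal sketch, assuming CSV feeds and an agreed baseline of columns (all names here are hypothetical):

```python
import csv

# Hypothetical baseline agreed with the provider; adjust to the real feed.
EXPECTED_COLUMNS = {"item_id", "price", "description"}

def check_feed(path: str) -> None:
    """Refuse to import when this month's feed doesn't match the agreed columns."""
    with open(path, newline="") as f:
        header = set(next(csv.reader(f)))
    missing = EXPECTED_COLUMNS - header
    extra = header - EXPECTED_COLUMNS
    if missing or extra:
        raise ValueError(
            f"Feed schema changed: missing {sorted(missing)}, extra {sorted(extra)}")

check_feed("feed_2024_01.csv")  # placeholder file name
```

A check like this turns "silently broken import" into "loud failure you can take back to the provider," which is usually the best you can do without a schema contract.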
I am trying to automate downloading online statistics into my financial models.
I am using the Australian Bureau of Statistics database (http://stat.data.abs.gov.au/index.aspx?) to automatically pull CPI data via an API JSON link, converted into a Power Query using Excel's "Get Data From Web" functionality.
However, I can only bring in the existing data; I can't make it add a new period of data when it's published on the website. I'd be happy with either adding a new data column or replacing existing data, e.g. so that I always have the last four quarters.
I'm not an expert, and I can't find anything useful, probably because I'm not using the right terms in my Google search!
Any advice would be much appreciated.
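One pattern that sidesteps the "add a new period" problem entirely is to re-download the whole series on every refresh and keep only the newest periods (in Power Query, that amounts to sorting by period and keeping the top rows). A minimal sketch of the same idea in Python, assuming a hypothetical endpoint and a payload already flattened to period/value pairs (the real SDMX-JSON response is nested and needs its own parsing step):

```python
import requests  # pip install requests

# Hypothetical endpoint: paste the actual JSON link copied from
# stat.data.abs.gov.au in place of this placeholder.
URL = "http://stat.data.abs.gov.au/sdmx-json/data/CPI/..."

def last_four_quarters(url):
    """Re-download the full series each run, then keep only the newest four periods."""
    payload = requests.get(url, timeout=30).json()
    # Assumption: the payload has been flattened to {"2023-Q1": 132.6, ...};
    # labels like "2023-Q1" sort correctly as plain strings.
    return dict(sorted(payload.items())[-4:])

print(last_four_quarters(URL))
```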
This is more of a conceptual question. There might be some fundamental concepts I don't understand clearly, so please point out any mistakes in my understanding.
I am tasked with designing a framework. Part of it is a MySQL DB fronted by a REST API that acts as the Data Access Layer (DAL). The user should be able to parse various data (JSON, CSV, XML, text, source code, etc.) and send it to the REST API, which persists it to the DB.
Question 1: Should I specify that all data sent to the REST API must be in JSON format, no matter what was parsed? This would (to the best of my understanding) ensure language independence and give the REST API a common format to deal with (see the sketch after these questions).
Question 2: When it comes to a data model, what should I specify? Is it a one-model-fits-all sort of thing, or is the data model subject to change based on the incoming data?
Question 3: When I think of a relational data model, foreign keys come to mind as what creates the relations. But some data may not contain any relations at all. For something like customer/order data the relation is easy to identify, but what if the data has no relations? How does the relational model fit then?
Any help/suggestion is greatly appreciated. Thank you!
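To illustrate Question 1: one way to honour "everything is JSON" without forcing the API to understand every format up front is a small envelope that carries the original payload plus its declared format. A sketch, with made-up field names:

```python
import base64
import json

def to_envelope(payload: bytes, fmt: str) -> str:
    """Wrap any incoming file in one JSON envelope so the REST layer
    always receives the same top-level structure (field names are made up)."""
    return json.dumps({
        "format": fmt,  # "csv", "xml", "text", "source", ...
        "content": base64.b64encode(payload).decode("ascii"),
    })

envelope = to_envelope(b"<order id='1'/>", "xml")
print(envelope)
```

Base64 keeps arbitrary bytes (including binary or oddly encoded files) safe inside the JSON wrapper.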
EDIT:
First off, the data can be both structured (say, XML) and unstructured (say, two text files). I want the DAL to be able to handle and persist whatever data comes in (that's why I put a REST interface in front of the DB).
Secondly, I also recently thought of MongoDB as an option and have been looking into it (I have never used NoSQL DBs before). It makes some sense if the incoming data at the REST API is JSON. From what I understand, I can create a collection in Mongo. Does that make more sense than using a relational DB?
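For reference, a minimal sketch of what persisting such a document looks like with pymongo (database and collection names are made up):

```python
from pymongo import MongoClient  # pip install pymongo

client = MongoClient("mongodb://localhost:27017")
collection = client["dal_db"]["documents"]  # hypothetical database/collection names

# The JSON envelope from the REST layer maps directly onto a Mongo document.
doc_id = collection.insert_one({
    "format": "xml",
    "content": "<order id='1'/>",
}).inserted_id
print(doc_id)
```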
Finally, as to what I want to do with the data: I have a tool that performs a sort of difference analysis (think git diff) on it. Say I send two XML files; the tool retrieves them from the DB, performs the difference analysis, and stores the result back in the DB.
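The diff step itself doesn't require the database to understand the content; Python's standard library can produce a git-style diff of any two stored texts. A sketch:

```python
import difflib

def diff_documents(old_text: str, new_text: str) -> str:
    """Produce a git-style unified diff of two stored documents;
    the resulting string can be written back to the DB."""
    return "\n".join(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="version_a", tofile="version_b", lineterm=""))

print(diff_documents("<a>1</a>", "<a>2</a>"))
```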
Based on these requirements, what would be the optimum way to go about it?
The answer to this will depend on what sort of data it is. Are all these different data types just different notations for the same data? If so, then storing it in normalised database tables is the way to go. If it's just arbitrary strings that happen to have some form of encoding, then it's probably best to store them raw.
Again, it depends on what you want to do with it afterwards. Are you analysing the data and reporting on it? Are you reading one format and converting to another? Is it all some form of key-value pairs in one notation or another?
There's no way to answer this further without understanding what you are trying to achieve.
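To make the two options concrete, here is a sketch of both (using SQLite for brevity; table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option 1: the notations all describe the same fields, so normalise them.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.execute("INSERT INTO orders (customer, total) VALUES (?, ?)", ("Acme", 99.50))

# Option 2: arbitrary encoded strings, so store the raw payload plus its format.
conn.execute("CREATE TABLE raw_payloads (id INTEGER PRIMARY KEY, format TEXT, body TEXT)")
conn.execute("INSERT INTO raw_payloads (format, body) VALUES (?, ?)", ("xml", "<order/>"))

conn.commit()
```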
I'm attempting to take stock data pulled from Google and create tables for each ticker to record historical market data in Access. I can easily import the delimited text data into Access; the problem is that I am pulling multiple tickers in one pull, and the imported data comes in vertically.
I know how to do this easily in Excel, yet I am having the worst time figuring out how to automate it in Access. The reason I am attempting to automate it is that the database will be pulling this data and updating it every 15 minutes for over 300 ticker symbols. Essentially, I need to find 'CVX' and then, in a new table, list the data below it out horizontally.
I have been searching online and am literally going bananas because I can't figure out how to do this (which would be simple in Excel). Does anyone have experience manipulating data in this way, or know of any potential solutions?
After some more research I realized the data I was getting is in JSON format. After digging a little more I was able to find online converters, and one of them worked particularly well. After converting the file to CSV, it was easy to import the data into Access.
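For anyone wanting to script that conversion instead of relying on a web tool, a sketch assuming the feed is a JSON array of flat records (the raw Google payload may need cleanup first; file names are placeholders):

```python
import csv
import json

def json_to_csv(json_path: str, csv_path: str) -> None:
    """Convert a JSON array of flat records into a CSV that Access can import."""
    with open(json_path) as f:
        records = json.load(f)
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)

json_to_csv("quotes.json", "quotes.csv")
```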
I'm running an eCommerce store and I am being provided with a daily data feed of info like Item Number, Price, Description, Inventory Status, etc.
Some notes:
- I know the URL of the .xls file.
- I need to modify the Item Number in the .xls for all products, adding two letters to the beginning.
- Price and Inventory Status in the website database need to be updated daily for each item, matched by Item Number.
- If an Item Number does not exist, a new item is created with all the information contained in the Excel sheet.
- This all needs to be fully automated (this is the part I need the most assistance with).
I used to have a company that took care of this for $50/month, but I now have access to the data myself. My programming experience is limited to web languages (PHP, HTML, etc.) and some basic C++. A side question is whether it's worth taking this responsibility on myself, or whether I should keep working with a company that already has systems in place to handle this.
If you can get the feed as CSV instead of XLS, load it yourself into a staging table, update what you need, and then insert the rows into your production table.
If you're stuck with the XLS, find a PHP library that will let you parse it, and then write the records to the staging table the same way.
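For illustration, the same load-modify-upsert flow sketched in Python with pandas (a PHP library such as PhpSpreadsheet fills the same role; file, column, and credential values here are placeholders):

```python
import pandas as pd        # pip install pandas xlrd (xlrd reads legacy .xls files)
import mysql.connector     # pip install mysql-connector-python

df = pd.read_excel("daily_feed.xls")  # placeholder file name
# Prefix every Item Number with the two required letters (column names assumed).
df["ItemNumber"] = "AB" + df["ItemNumber"].astype(str)

conn = mysql.connector.connect(user="shop", password="...", database="store")
cur = conn.cursor()
for row in df.itertuples(index=False):
    # Upsert: update price/stock when the item exists, create it otherwise
    # (requires a unique key on item_number).
    cur.execute(
        "INSERT INTO products (item_number, price, inventory) VALUES (%s, %s, %s) "
        "ON DUPLICATE KEY UPDATE price = VALUES(price), inventory = VALUES(inventory)",
        (row.ItemNumber, row.Price, row.InventoryStatus),
    )
conn.commit()
```

Run on a daily schedule (cron, Task Scheduler, or similar), this covers the "fully automated" requirement.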
As for your second question: yes, it's absolutely worthwhile to cut out the thieves who are charging you $600/year for something that should take you an hour or two to write yourself.
Good luck.
There are two suggestions here. One involves using mysqlimport, the other TOAD. If you need more explanation of how to implement this, expand your question.
I have a MySQL database and need to store some results from a web service monthly.
The data can have 10 results today but may have 200 next month.
I need to use a BI tool to create charts and whatnot.
Someone proposed serializing the data and saving the blobs in the database. While that solution seems to work, I have a gut feeling that when the time comes to hook it up with the BI tool, all hell will break loose.
Has anyone had this issue before?
Thanks
Edit: adding extra info.
The problem is that we haven't chosen the BI tool yet, but what it needs to do is create charts from the results. Some of the results come from Google Analytics, so we will be charting the number of visitors to a site over the last 6 months, or the number of pages viewed.
The answer is simple: do not store serialized data in a database.
Do some research, atomize your data, and create a proper data structure.
Once you've done that, you will be able to use any BI tool in the world.
That's the purpose of a database and what distinguishes a database from a flat file.
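A sketch of what "atomized" can look like here: one row per source, metric, and period, which absorbs 10 results this month and 200 the next without any schema change (names are illustrative; shown with SQLite for brevity, though the DDL carries over to MySQL):

```python
import sqlite3

conn = sqlite3.connect("metrics.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS metric_values (
        source  TEXT NOT NULL,   -- e.g. 'google_analytics'
        metric  TEXT NOT NULL,   -- e.g. 'visitors', 'pageviews'
        period  TEXT NOT NULL,   -- e.g. '2024-01'
        value   REAL NOT NULL,
        PRIMARY KEY (source, metric, period)
    )
""")
# Ten rows this month, two hundred next month: same table either way.
conn.execute(
    "INSERT OR REPLACE INTO metric_values VALUES (?, ?, ?, ?)",
    ("google_analytics", "visitors", "2024-01", 15234),
)
conn.commit()
```

Any BI tool can group and chart a long, narrow table like this; none of them can see inside a serialized blob.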