How to analyze a data set directly from a website without first downloading it? - data-analysis

I am learning data analysis for my research. There is a website that contains data sets for every day for the past 26 years. I have to write Python code such that when I enter a date, the data set for that day opens. Since the files are in .cdf format, I have to use Python to open them. Can someone tell me what I need to learn and which libraries will help me open the data sets from the website without downloading them first? I have some experience with Python, but not a lot.
Also, is there a good resource I can use to learn more about data analysis with Python?

You can use pandas for this.
Pandas can load a file into a DataFrame directly from a URL, without first downloading it to your local machine.
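For example, a minimal sketch (the URL is a placeholder): pandas' text readers such as read_csv accept a URL directly. Note that .cdf files specifically usually need a dedicated reader first (for example netCDF4 or a CDF library, depending on the exact format), after which the arrays can be put into a DataFrame.

```python
import pandas as pd

# Placeholder URL; the real site and file layout will differ.
url = "https://example.com/data/1998-01-01.csv"

# read_csv fetches the file over HTTP itself, so nothing is
# saved to your local machine first.
df = pd.read_csv(url)
print(df.head())
```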

Related

How to automate data extraction from Elasticsearch Dev Tools?

I have to do the following steps two or three times a day:
1. Log in to Elasticsearch.
2. Go to Dev Tools.
3. Run a specific query by selecting it and pressing Ctrl + Enter. [screenshot: the query I have to run]
4. Select the results returned in the "buckets" and copy them. [screenshot: the yellow highlight marks what I have to select and copy]
5. Go to https://www.convertcsv.com/json-to-csv.htm and paste the results so it converts to CSV. [screenshot: where I have to paste the results]
6. Download the CSV and import it into Google Sheets so I can view the results in a Looker dashboard. [screenshot: the button to download the converted CSV]
This takes me some time every day, and I would like to know if there is any way to automate this routine.
Maybe an ETL tool that can perform at least part of the process, or perhaps some more specific way to do it with Python.
Thanks in advance.
I don't have much experience with what I want to do, and I tried searching online for similar issues but couldn't really find anything useful.
I don't know whether you have tried it, but there is a reporting tool in Elasticsearch under "Stack Management > Reporting". Beyond that, there are other tools you can run from a server with crontab. Here are some of them:
A little bit old, but I think it can work for you: ES2CSV. You can find examples inside the docs folder; you can send queries via a file and export the report to CSV.
Another option, and my preference: use Python's pandas library. You can write a script along the lines of this article and export the results to CSV (a rough sketch follows below this list). The article I mentioned explains it really well.
Another alternative is a library written in Java, but its documentation is a little weak.
Another Python alternative is elasticsearch-tocsv. It has been updated more recently than the first alternative, but its query examples are a little sparse. There is, however, a detailed article you can check.
You can also use elasticdump, which is written in NodeJS and is a great tool for exporting data from Elasticsearch. It has a CSV export option; you can see examples on its GitHub page.
I will try to find more and will update this answer from time to time. Thanks!
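As a rough illustration of the pandas option above, here is a minimal sketch assuming the official elasticsearch Python client; the host, index name, aggregation name, and field are placeholders you would replace with your own query:

```python
from elasticsearch import Elasticsearch
import pandas as pd

# Placeholder host, index, aggregation, and field names.
es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="my-index",
    body={
        "size": 0,
        "aggs": {"my_agg": {"terms": {"field": "my_field", "size": 1000}}},
    },
)

# The "buckets" list is exactly the part that would otherwise be
# copied by hand out of Dev Tools.
buckets = resp["aggregations"]["my_agg"]["buckets"]

# Each bucket is a small dict (e.g. {"key": ..., "doc_count": ...}),
# so it maps straight onto a DataFrame and then a CSV file.
pd.DataFrame(buckets).to_csv("results.csv", index=False)
```

Scheduled with crontab, this covers the query, copy, and JSON-to-CSV steps; pushing the CSV into Google Sheets would still need a separate step.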

Uploading Data into Redis

I am working on a school project where we need to create a website and use Redis to search a database; in my case it will be a movie database. I have a JSON file with the names and ratings of 100 movies. I would like to upload this dataset into Redis instead of entering the entire dataset manually. The JSON file is saved on my desktop and I am using Ubuntu 20.04.
Is there a way to do it?
I have never used Redis, so my question might be very silly. I've been looking all over the internet and cannot find exactly what needs to be done. I might be googling the wrong question; maybe that's why I cannot find the answer.
Any help would be appreciated.
Write an appropriate program to do the job. There's no one-size-fits-all process, because how your data is structured in Redis is up to you; once you decide on that, it should be easy to write a program that parses the JSON and inserts the data.
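For example, if you decide to store each movie as a Redis hash, a minimal sketch with the redis-py client might look like the following; the file path and the "name"/"rating" field names are assumptions about how your JSON is laid out:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Assumes the JSON is a list of objects with "name" and "rating" keys;
# the path is a placeholder for wherever the file lives on your desktop.
with open("/home/me/Desktop/movies.json") as f:
    movies = json.load(f)

for i, movie in enumerate(movies, start=1):
    # One hash per movie, keyed movie:1, movie:2, ...
    r.hset(f"movie:{i}", mapping={"name": movie["name"], "rating": movie["rating"]})

print(r.hgetall("movie:1"))
```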

Conversion of GRIB and NetCDF to my database

I have downloaded "High Resolution Initial Conditions" climate forecast data for one day. It came as a .tar.gz archive, so I extracted it into my local directory and got the files shown in the attached image. I think the files without an extension are GRIB data (because the first word in them is "GRIB"). I want to get the data from the big files (GRIB and NetCDF formats containing gridded climate data such as temperature and pressure) into my database, but they are binary. Can you recommend an easy way to get the data out of these files? I can't find any information about handling these datasets on their website.
Converting these files to .csv would be nice, but I can't find a program that converts the GRIB files.
Using Python and some available modules, it is simple...
The Enthought Python Distribution includes several packages, including netCDF4, for dealing with NetCDF files (see the sketch at the end of this answer).
I've never worked with GRIB files, but a quick search suggests another Python package, pygrib2, exists.
Or you can use PyNio, a Python package that can read and write netCDF3 and the netCDF4 classic format, and read GRIB1 and GRIB2 files.
I don't know the amount of data you have, but it is usually unwise to convert all of it to *.csv! Python is easy to learn and well suited to this kind of data (with the matplotlib package you can even plot it). Or, if you really need a *.csv, you can use Python to select a smaller domain, for example, or only the variables you need...
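A rough sketch of the netCDF4 route; the file name and variable name are placeholders, so inspect ds.variables to see what your files actually contain:

```python
import numpy as np
import pandas as pd
from netCDF4 import Dataset

ds = Dataset("forecast.nc")          # placeholder file name
print(ds.variables.keys())           # list the variables the file provides

# Read one variable (the name is an assumption) into a plain array
# and dump it to CSV for a quick look.
temp = np.asarray(ds.variables["temperature"][:])
pd.DataFrame(temp.reshape(temp.shape[0], -1)).to_csv("temperature.csv", index=False)
```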
For conversion into text, look into http://www.cpc.ncep.noaa.gov/products/wesley/wgrib.html or http://www.cpc.ncep.noaa.gov/products/wesley/wgrib2/
Both are C programs from one of the big names in GRIB.
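If you go the wgrib2 route, its -csv option (assuming your build includes it) makes the conversion scriptable, for example from Python; the file names are placeholders and wgrib2 is assumed to be installed and on your PATH:

```python
import subprocess

# Placeholder file names; assumes the wgrib2 binary is installed and on PATH.
subprocess.run(["wgrib2", "forecast.grb2", "-csv", "forecast.csv"], check=True)
```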
I'm currently dealing with a similar issue.
In my case I'm relying on the GrADS software, which can "easily" transform GRIB data into other formats.
If your dataset is not huge, you can export it to CSV using this tutorial.
My dataset is 80 GB of GRIB binary files, so I'm very restricted in what software I can use to handle it (no R unless I find a computer with more than 80 GB of RAM).

Best way to gather, then import data into drupal?

I am building my first database-driven website with Drupal and I have a few questions.
I am currently populating a Google Docs spreadsheet with all of the data I want to eventually be able to query from the website (after it's imported). Is this the best way to start?
If this is not the best way to start, what would you recommend?
My plan is to populate the spreadsheet and then import it as a CSV into the MySQL database via CCK nodes.
I've seen two ways to do this.
http://drupal.org/node/133705 (importing data into CCK nodes)
http://drupal.org/node/237574 (Inserting data using spreadsheet/csv instead of SQL insert statements)
Basically, my question is: what is the best way to gather and then import data into Drupal?
Thanks in advance for any help and suggestions.
There's a comparison of the available modules at http://groups.drupal.org/node/21338
In the past, when I've done this, I simply wrote code to do it on cron runs (see http://drupal.org/project/phorum for an example framework that you could strip down and rebuild to do what you need).
If I were to do this now, I would probably use the http://drupal.org/project/migrate module, whose philosophy is "get it into MySQL, view the data, import via GUI."
There is a very good module for this: Node Import. It allows you to take your Google Docs spreadsheet and import it as a .csv file.
It's really easy to use: the module lets you map your .csv columns to the node fields you want them to populate, so you don't have to worry about putting the columns in a particular order. Also, if there is an error on some records, it will spit out a .csv listing the failed rows and what caused each error, but it will still import all the good records.
I have imported up to 3,000 nodes with this method.

How to process Excel files stored in an image data type column using SSIS package?

I have a .NET WebForms front end that allows admin users to upload two .xls files for offline processing. As these files will be used for validation (and aggregation), I store them in an image column in a table.
My ultimate goal is to create an SSIS package that will process these files offline. Does anyone know how to use SSIS to read a blob from a table into its native (in this case .xls) format for use in a Data Flow task?
In my (admittedly limited) experience with SSIS, it is quite good at rapidly getting something up and running, but frustratingly limited at producing what "feels" like the most elegant, efficient solution to a programmer.
Since the Excel Source Editor seems to take only files as input, you need to give it a file or reimplement its functionality in code that can take a blob. I understand that this is unsatisfying, but in the end, SSIS is a time-saving tool.
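One way to "give it a file" is to dump the blob back out to a temporary .xls before the Data Flow runs. Inside SSIS this would normally be a Script Task (C#/VB.NET); the sketch below only illustrates the idea in Python with pyodbc, and the connection string, table, and column names are assumptions:

```python
import pyodbc

# Placeholder connection string, table, and column names.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=MyDb;Trusted_Connection=yes;"
)
row = conn.cursor().execute(
    "SELECT FileData FROM UploadedFiles WHERE FileId = ?", 1
).fetchone()

# The image column comes back as raw bytes; written to disk, the Excel
# Source (or any other file-based reader) can open it as a normal .xls.
with open(r"C:\temp\upload_1.xls", "wb") as f:
    f.write(row[0])
```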