As part of my Master's thesis, I'm trying to run some statistics on which factors affect whether crowdfunding campaigns get funded or not. I've been trying to get data from the largest platform Kickstarter.com. Unfortunately, they have removed all the non-successful campaigns from their website (unless you have the direct link).
Luckily, I'm not the only one looking for this data.
Webrobots.io have a scraper robot which crawls all Kickstarter projects and collects data in JSON format (http://webrobots.io/kickstarter-datasets/).
The latest dataset can be found on:
http://webrobots.io/wp-content/uploads/2015/10/Kickstarter_2015-10-22.json_.zip
However, my programming skills are limited, and I don't know how to convert it into an Excel file where I can manipulate the data and run my analysis. I found a few online converters, but the file is far too big for them (approx. 300 MB).
Can someone please help me get the file converted?
It will earn you an acknowledgement in my Master's thesis when it gets published :)
Thanks in advance!!!
I guess the answer to this depends heavily on a few things.
1. What subject is the Master's covering? (Mainly to appease the many people who will probably assume you're hoping for others to do your homework for you! This might explain why the thread has already been down-voted.)
2. You mention your programming skills are limited. What programming skills do you have? What language would you be using to achieve this goal? Bear in mind that even with a fully coded solution, if it's not in a language you know, you might not be able to compile it!
3. What kind of information do you want from the JSON file?
With regards to question 3, I've looked in the JSON file and it contains hierarchical data, which is pretty difficult to replicate in a flat file such as an Excel or CSV file (I should know, we had to do this a lot in a previous job of mine).
But, I would look at the following plan of action to achieve what you're after:
1. Use a JSON parser to deserialize the data into a class structure (Visual Studio can create the classes for you. See this S/O thread: How to show the "paste Json class" in visual studio 2012 when clicking on Paste Special?)
2. Once you've got the objects in memory, step through them one by one, pick out the data you want, append it to a comma-separated string (in C# I'd use the StringBuilder) and write the rows of data out to a file on disk.
3. Once this is complete, you'll have the data you want.
Depending on what data you want from the JSON file, step 2 could be the most difficult part as you'd need to step into the different levels of the data hierarchy.
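As a rough illustration of the plan above, here is a minimal sketch in Python rather than C#, using only the standard library. The record shape and the column names (`name`, `goal`, `category.name`) are made up for the example and are not taken from the actual Kickstarter file. It also assumes the records are already loaded as a list of dicts; for a file of around 300 MB you may need to read it in pieces instead of loading it all at once.

```python
import csv
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Step into the next level of the hierarchy.
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept as-is.
            flat[full_key] = value
    return flat


def json_records_to_csv(records, csv_path, columns):
    """Write the chosen columns of each flattened record to a CSV file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" drops any flattened keys not listed in columns.
        writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        for record in records:
            writer.writerow(flatten(record))


# Hypothetical record shape, for illustration only.
sample = json.loads('[{"name": "Demo", "goal": 500, "category": {"name": "Art"}}]')
json_records_to_csv(sample, "projects.csv", ["name", "goal", "category.name"])
```

The resulting CSV opens directly in Excel. Picking the columns up front is what keeps the hierarchical data manageable in a flat file.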
Hope this points you in the right direction.
You may want to look at this blog post:
http://jdunkerley.co.uk/2015/09/04/downloading-and-parsing-met-office-historic-station-data-with-alteryx/
He uses a process with Alteryx that may line up with what you are trying to do. I am looking to do something similar, but haven't tried it yet. I'll update this answer if I get it to work.
TL;DR: I'm looking for some resources on generating GRIB2 data sets on the fly, ideally using in-house-generated wind data in a CSV format.
We have a bunch of data for a series of localized weather stations monitoring wind information around our city. They report in at ~2-3 minute intervals (far more frequent than standard weather data), and from their reports we have lat, lon, wind speed, and wind direction. Someone went and told the boss about these really slick visualizations, like this one, that can display wind speed and direction, and it's my job to make it happen.
The above plug-in for Leaflet, GitHub here, as well as several others, all use GRIB2 data, which from my research involves a left/right set of data and an up/down set of data for a series of points plotted out across a region.
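The left/right and up/down sets are usually called the u (eastward) and v (northward) wind components. A minimal sketch of deriving them from speed and direction follows. It assumes the stations report direction as the bearing the wind blows from, in degrees clockwise from north, which is the usual meteorological convention but should be checked against the station documentation. The function name is made up for the example.

```python
import math


def wind_to_uv(speed, direction_deg):
    """Convert wind speed and direction to (u, v) components.

    Assumes direction is the bearing the wind blows FROM, in degrees
    clockwise from north (the usual meteorological convention).
    """
    theta = math.radians(direction_deg)
    u = -speed * math.sin(theta)  # eastward (left/right) component
    v = -speed * math.cos(theta)  # northward (up/down) component
    return u, v


# A 10 m/s wind from the west (270 degrees) blows toward the east,
# so u should be positive and v close to zero.
u, v = wind_to_uv(10.0, 270.0)
print(round(u, 6), round(v, 6))
```

The station points would still need to be interpolated onto a regular grid before they can be encoded as GRIB2. That step is not shown here.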
The problem I'm having is that I've only found a handful of tools that interact with GRIB2 data. Most seem to only decode data from a GRIB2 dataset, and only one tool, written in Fortran, seems to exist for assembling data into GRIB2.
So, is there any way to generate GRIB2 data on the fly using proprietary data at 2-3 minute intervals?
I've gone through this resource on NOAA's website, which is where I found a few tools.
I know how frustrating it can be to work with GRIB and some of the other science/weather related formats. This may not be the best answer, but it might be your only answer as I find these types of questions to only gather dust because of the general lack of knowledge with the formats and tools.
From what I remember, CDO tools (link here) can do some magical things, but I am not that experienced with them. I do use them for converting satellite data to plain text, and it's been an absolute lifesaver! So I will explain:
My suggestion is to first convert the CSV to netCDF. I saved a link for this a long time ago but never really came to need it (discussion here). Essentially, some Python code should be able to do the conversion for you. There may be several ways to do this, but I have never looked into it beyond initial research.
Next, you should be able to convert .nc to .grib using CDO. I know it can do quite a lot. Here is a discussion regarding this, so it must be possible.
I also see at this link where someone converts GRIB to netCDF, but you should be able to do it in reverse as well. I just don't know the exact commands. From the link:
As an example of use of CDO, converting from GRIB to netCDF can be as simple as
cdo -f nc copy file.grb file.nc
I would suspect it's just the reverse, probably something like:
cdo -f grb copy file.nc file.grb
Hopefully you can put things together for it to work without being too hack-y.
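You can do this in a simple Python script using pandas, xarray and cfgrib:
import pandas as pd
import cfgrib
# Load the CSV into a DataFrame (to_xarray() below needs xarray installed)
data = pd.read_csv('your_csv_data.csv')
# Convert the DataFrame to an xarray Dataset
xarray_data = data.to_xarray()
# Write the Dataset out as a GRIB file
cfgrib.to_grib(xarray_data, 'out2.grib')
Please note that you have to define the GRIB specifications before you store the data as GRIB.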
I'm looking to identify some possible software options that will allow for custom rules to manipulate bulk data files (.csv). For example: proper capitalization (allowing for state abbreviations to remain capitalized and for unique surnames), counting occurrences of specific words in a field, and some other custom rules. Any guidance would be appreciated.
You could use Talend Open Studio for this task. It is an open-source ETL tool for data manipulation and integration. You can, for example, import CSV >> database >> perform transformations >> export CSV. The possibilities are endless.
You can find it here: http://www.talend.com/products-data-integration/talend-open-studio.php
It also sounds like you might be looking to create a profile of the data. For this you can use Talend Open Profiler; they recently added support for flat files such as your .csv. It is simple to use and you should be up and running in 30 minutes.
You can find the download here: http://www.talend.com/products-data-quality/talend-open-profiler.php
You can find some tutorials here: http://www.talendforge.org/tutorials/menu.php
On the tutorials page, choose the Data Quality tab and scroll down to 'Talend Open Profiler'.
It is my first step in assessing data quality on a new dataset.
A quick Google search for "data scrubbing utilities" turned up this:
http://data-scrubbing.qarchive.org/
They look to be very close to what you're looking for.
It'll really depend on how complex the rules get. If they get much more complex than simple substitutions, you'd probably be better off just coding something up (or having it coded).
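To give an idea of what "coding something up" might look like, here is a minimal Python sketch of two of the rules from the question. The exception lists and function names are made up for the example. The state-code check is deliberately naive: codes such as "IN" or "OR" collide with ordinary words, so in practice it should only be applied to the column that holds the state.

```python
import re

# Hypothetical exception lists; extend them to match your own data.
STATE_CODES = {"NY", "CA", "TX"}
SPECIAL_SURNAMES = {"mcdonald": "McDonald", "o'brien": "O'Brien"}


def proper_case(text):
    """Capitalize each word, keeping state codes and listed surnames intact."""
    words = []
    for word in text.split():
        if word.upper() in STATE_CODES:
            words.append(word.upper())
        elif word.lower() in SPECIAL_SURNAMES:
            words.append(SPECIAL_SURNAMES[word.lower()])
        else:
            words.append(word.capitalize())
    return " ".join(words)


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))
```

Each function works on a single field value, so it can be applied row by row while reading the file with the standard `csv` module.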
Any ideas?
I think the original source was a GoldMine database. Looking around, it appears that the file was likely built using an application called ACT, which I gather is a huge product that I don't really want to deploy for a one-off file with a total size of less than 5 MB.
So ...
Anyone know of a simple tool that I can run this file through to convert it to a standard CSV or something?
It does appear to be (when looking at it in Notepad and Excel) in some sort of CSV-type format, but it's as if the data is encrypted somehow.
OK, this is weird.
I got a little confused because the data looked like a complete mess. In actual fact, the mess was the data; that's what it was meant to look like.
Simply put, I opened the file in Notepad, saw that it seemed to have a sort of pattern, and so I dropped it on Excel.
Apparently Excel has no issues reading these files. Strange, huh!
I am unaware of any third party tooling for opening these files specifically, although there is an SDK available for C# which could resolve your problem with a little elbow grease.
The SDK can be acquired for free here.
There is also a developer forum, which could provide some valuable resources, including training material with sample code, here.
Resources will be provided with the SDK
Also, out of interest, since ACT is a Sage product, do you have any Sage software floating about with which you could attempt to access the data? Most offices have!
Failing all of the above there is a trial available for ACT! Here!
Good luck with your problem!
So, I want to import, export and modify the database. I have read that I have to do that via XML, but I don't really understand their documentation system and I haven't found any good tutorials out there that explain this. I am slowly reading the very expensive and short book, which is somewhat answering my questions, but I crave more.
As a second question, I want to have an order system where I can send out information or emails with my own code. I assume this would be some type of plug-in that would override the default behaviour or be called at a certain time. Any info would be helpful.
Some parts of the Magento data can be imported/exported via the backend (System->Import/Export), namely products and customers.
If you want to deal with the complete DB, use your DB tool of choice (I prefer mysqldump).
When dealing with exported CSV, use OpenOffice. In my experience, it deals better with the separator characters than Excel.
As for your second question: as far as I understand, you will have to develop a module if you want to do something different from the existing functionality and keep the original mail functions. If you don't want to or don't have to keep the original functions, you can opt to override the module, which is much easier as far as I can see. A Google search for "overriding magento module" should turn up at least one decent tutorial.
I found what I was looking for here:
(on the Magento site: Resources -> Magento Core API -> Product API or whichever API you want)
The problem is there is no Order API yet (or none that I've seen).
http://www.magentocommerce.com/wiki/doc/webservices-api/api/catalog_product#examples
This details how you'd write an external PHP script to obtain, edit or delete products (or anything else with an API).
Modules still look daunting, but I am reading through the (very thin) Magento book (the only one available).
I hope this helps someone else.
I want to write a simple application like "stardict" (but not so huge) that searches for a phrase in the dictionary and provides the corresponding value.
I guess that this is a kind of reinventing the wheel and that it has been done many times by different people. But the thing is that the only suitable open software available on the web is "stardict", and I personally find it incredibly ugly.
I think that I can write a back-end that searches for articles in the dictionary and just provides the result in plain form. A second app, the front-end, would just present the result on the screen in an acceptable form.
Please recommend a dictionary file format to start with. I just want to hear suggestions from my fellow programmers.
Requirements: free, open, has converters to and from popular formats.
P.S. Apple's "Dictionary" would be just perfect, but it cannot search for a phrase. So if anyone knows how to extend it with "plugins", just let me know. This app is not free, but that is also acceptable.
Have you thought of Unix spell/ispell and the like?