Translate HTML table to CSV

So... I am a little confused by this problem myself, so I am turning to the one place I know I am certain to find help!
I have a shopping cart where, in order to purchase something, a user must first fill out a form. At the end I display the information they entered in the form along with the product they are purchasing, how much they are paying, and any modifiers the product might have (like size, color, shipping, etc.). A third-party company then needs to retrieve that information as a CSV file.
Any ideas on how this can be achieved? I am new at this sort of thing, so I apologize if this has already been covered somewhere else on the forum.
Thanks!
Jamie

This is actually not too hard! A CSV file is just a comma-separated values file.
Whatever you are using as a backend (PHP, C#, C), you can write to a file like so:
CustomerID,Name,Item1
1,John,Table
That would appear in Excel with CustomerID, Name and Item1 as the header row and 1, John and Table as the first record. (Leave out spaces after the commas: they become part of the field values.) When you create the file, give it a .csv extension (e.g. MyFile.csv) so that Windows associates it with Excel!
Look here for how to properly format your CSV file!
http://creativyst.com/Doc/Articles/CSV/CSV01.htm
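Here is a rough Python illustration. Python is not one of the backends named above and only stands in for whatever server language you use. The sketch writes a header row and the records with Python's standard `csv` module. The module quotes any field that contains a comma or a double quote, which is the formatting problem the linked article mostly deals with. The field names and sample orders are made up.

```python
import csv
import io

# Made-up sample orders; in a real cart these would come from the form/database.
orders = [
    {"CustomerID": 1, "Name": "John", "Item1": "Table"},
    {"CustomerID": 2, "Name": "Smith, Jane", "Item1": 'Chair "Deluxe"'},
]


def orders_to_csv(rows, fieldnames):
    """Return CSV text: a header row, then one line per record.

    csv.DictWriter wraps fields containing commas or quotes in double quotes
    and doubles any embedded quotes, so "Smith, Jane" stays a single field.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def save_csv(path, text):
    # newline="" keeps Python from translating the \r\n line endings csv writes.
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(text)


if __name__ == "__main__":
    print(orders_to_csv(orders, ["CustomerID", "Name", "Item1"]))
```

To produce the file itself, you would pass the returned text to `save_csv("MyFile.csv", ...)`.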
EDIT:
I see now that you said HTML table, so in case you don't have access to server-side code, I'd direct you here for more info:
Export to csv in jQuery
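You may have server-side code but hold the data only as HTML table markup, such as the order summary page. In that case, one option is to parse the table and write each row out as CSV. The sketch below uses only Python's standard library (`html.parser` and `csv`). It assumes well-formed markup with explicit `</td>`/`</tr>` closing tags. It does not handle `colspan` or nested tables. Treat it as a starting point, not a robust scraper; `TableToRows` and `html_table_to_csv` are hypothetical names.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the text of <th>/<td> cells, one list per <tr>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.rows = []
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the cell currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace/newlines inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_csv(html):
    """Return CSV text for every <tr> found in the given HTML."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    csv.writer(buf).writerows(parser.rows)
    return buf.getvalue()
```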

Related

extract sub-headers in R or Python

Desperate newbie here. I have a question for which I just cannot find the right solution. I received a .dta file from which I want to extract the sub-headers of each column. Unfortunately, I am not versed in Stata, nor do I have access to it. I read my .dta file into R and converted it to a data frame and also a data table. It displays the column names and sub-headers well. However, I cannot extract the sub-headers, and they also disappear when I save the data frame or table locally as a CSV or Excel file. When I call colnames(df) or names(df), I only get the column names, not the sub-headers. I also tried it in Python without luck. Unfortunately, I am not allowed to share the data, so I hope my problem is understandable without an example. Thank you in advance!

How to decouple variable names in external files and the code?

Imagine I have an external file dates.csv in the following format:
Name               Date
start_of_fin_year  01.03.2022
end_of_fin_year    28.02.2023
Obviously, the file may get updated in the future, and the dates may change. I wrote a piece of code that checks the file periodically, extracts the needed dates and puts them into the DB/variables. Roughly speaking, I have this pseudocode:
start_of_fin_year = SELECT Date FROM table WHERE Name = 'start_of_fin_year'
The problem I face: my code will break if I or someone else changes the name in the table. How do I prevent this?
FYI, this is a personal project that I developed on my own, but I will have to give others access to the .csv files so they can update the info. I'm afraid they may accidentally change the names, which is why I'm worried.
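One common mitigation is to validate, when the file is loaded, the names the code depends on. This does not stop anyone renaming a row. It does turn a silent break into an immediate, readable error. Below is a minimal Python sketch. It assumes dates.csv is comma-separated with a `Name,Date` header and dd.mm.yyyy dates. `REQUIRED_NAMES` and `load_dates` are hypothetical names.

```python
import csv
import io
from datetime import date, datetime

# Hypothetical: the names the code relies on, matching the Name column.
REQUIRED_NAMES = ("start_of_fin_year", "end_of_fin_year")


def load_dates(lines, required=REQUIRED_NAMES) -> dict[str, date]:
    """Map Name -> date, failing loudly if a required name is missing."""
    found = {}
    for row in csv.DictReader(lines):
        name = row["Name"].strip()
        # Assumed dd.mm.yyyy format, e.g. 01.03.2022 -> 1 March 2022.
        found[name] = datetime.strptime(row["Date"].strip(), "%d.%m.%Y").date()
    missing = [n for n in required if n not in found]
    if missing:
        raise ValueError("dates.csv is missing required names: " + ", ".join(missing))
    return found
```

In practice you would call it as `with open("dates.csv", newline="") as f: dates = load_dates(f)`. You can catch the error and alert whoever edits the file instead of carrying on with stale values.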

Best way to parse a big and intricate JSON file with OpenRefine (or R)

I know how to parse JSON cells in OpenRefine, but this one is too tricky for me.
I've used an API to extract the calendars of 4,730 Airbnb rooms, identified by their IDs.
Here is an example of one JSON file: https://fr.airbnb.com/api/v2/calendar_months?key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=EUR&locale=fr&listing_id=4212133&month=11&year=2016&count=12&_format=with_conditions
For each ID and each day of the year from now until November 2017, I would like to extract the room's availability (true or false) and its price on that day.
I can't figure out how to parse out this information. I guess it involves a series of nested forEach calls, but I can't find the right way to do this in OpenRefine.
I've tried, of course,
forEach(value.parseJson().calendar_months, e, e.days)
The result is an array of arrays of dictionaries, which confuses me.
Any help would be appreciated. If the operation is too difficult in OpenRefine, a solution in R (or Python) would also be fine for me.
Rather than just creating your project as text and working with GREL to parse it out...
The best way is to just select the part of the JSON record you want to work with, using our visual importer wizard for JSON and XML files (you can even use a URL pointing to a JSON file, as in your example). (A video tutorial shows how here: https://www.youtube.com/watch?v=vUxdB-nl0Bw )
Select the part of the JSON that contains the records you want to parse and work with (this can be any repeating part; just select one of them and OpenRefine will extract all the rest).
Limit the number of data rows you want to load during creation, or leave the default of all rows.
Click Create Project, and you are now in Rows mode. However, if you think Records mode might be better suited for context, just import the project again as JSON and select the next enclosing area of the content, perhaps a larger array that contains a key field. In the example, the key field would probably be the date, which is why I highlight the whole record for a given date. This way OpenRefine will have keys for each record, and Records mode lets you work with them better than Rows mode.
Feel free to take this example, make it better and even more helpful for everyone, and add it to our wiki section on How to Use.
I think you are on the right track. The output of:
forEach(value.parseJson().calendar_months, e, e.days)
is hard to read because OpenRefine and JSON both use square brackets to indicate arrays. What you are getting from this expression is an OR (OpenRefine) array containing twelve items, one for each month of the year. Each item in the OR array is JSON: an array of the days in that month.
To keep the steps manageable, I'd suggest tackling it like this:
First use
forEach(value.parseJson().calendar_months,m,m.days).join("|")
You have to use join because OR can't store an OR array directly in a cell; it has to be a string.
Then use "Edit Cells->Split multi-valued cells" with | as the separator. This gets you 12 rows per ID, each containing a JSON expression.
Then use:
forEach(value.parseJson(),d,d).join("|")
This splits the JSON down into the individual days.
Then use "Edit Cells->Split multi-valued cells" again (with | as the separator) so that the details for each day end up in their own cell.
Using the JSON from the example URL above, this gives me 441 rows for the single ID, each containing the JSON that describes the availability and price for a single day. At this point you can use the 'Fill down' function on the ID column to fill in the ID for each of the rows.
You've now got some pretty simple JSON in each cell, so you can extract availability using
value.parseJson().available
etc.
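The question says Python would also be acceptable, so here is a rough Python equivalent of the nested forEach and split steps above. Like the GREL expressions, it assumes each response is an object with a `calendar_months` array whose items each have a `days` array. The `date` and `price` keys on each day object are assumptions. In the real response `price` may be a nested object, so inspect one day and adjust. Fetching the 4,730 responses is left out. `responses` is a hypothetical dict mapping listing ID to raw JSON text, and `flatten_calendar`/`calendars_to_csv` are hypothetical names.

```python
import csv
import io
import json


def flatten_calendar(listing_id, payload):
    """Yield one (listing_id, date, available, price) tuple per day.

    Assumed shape: {"calendar_months": [{"days": [{...}, ...]}, ...]}.
    The "date" and "price" keys are assumptions about each day object.
    """
    for month in payload.get("calendar_months", []):
        for day in month.get("days", []):
            yield (listing_id, day.get("date"), day.get("available"), day.get("price"))


def calendars_to_csv(responses):
    """responses: hypothetical dict of listing_id -> raw JSON text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["listing_id", "date", "available", "price"])
    for listing_id, text in responses.items():
        writer.writerows(flatten_calendar(listing_id, json.loads(text)))
    return buf.getvalue()
```

Writing the listing ID on every row does the job of the 'Fill down' step. The 441 rows per ID may include the same date twice, possibly padding days from neighboring months. If so, you may want to drop duplicates on (listing_id, date).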

Creating a simple web page associating images with an Excel column

I have an Excel sheet with columns such as:
Col1  Col2  Col3
abc   1     3
def   4     5
hgi   9     5
and so on.
And I have various images in a folder with file names corresponding to 'Col1', i.e. abc.jpg, def.jpg, etc.
I want to create a web page that asks the user for a string, searches for it in Col1, then displays the corresponding Col2 and Col3 values and the image corresponding to that Col1 value.
Now, I'm not asking what to do; I just want to know HOW I would do this. Which tools/scripts/languages/software do I need to know to implement a system like this? I'll start learning them straight away.
I have very little experience in web development, hence the very noob-ish question.
Thanks a lot guys!
If you want the user to enter the information on the web page itself, I would recommend JavaScript, which is good enough for what you want to do. For accessing Excel, you could always export the Excel file as a CSV (comma-separated values) file or another format and read it into your JavaScript. This link, for example, shows how to read a file in your JavaScript code.
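The answer above suggests doing this in JavaScript in the browser. To show just the lookup logic, here is the same idea in Python, a swapped-in language that could run as a small server-side script. It assumes the sheet has been exported to CSV with a header row `Col1,Col2,Col3` and that the images live in a folder called `images`. Both names are hypothetical, and so is `find_row`.

```python
import csv
import io
from pathlib import Path

IMAGE_DIR = Path("images")  # hypothetical folder holding abc.jpg, def.jpg, ...


def find_row(lines, query, key="Col1"):
    """Return (Col2, Col3, image_path) for the first row whose key column
    equals the query, or None if there is no match."""
    wanted = query.strip()
    for row in csv.DictReader(lines):
        value = row[key].strip()
        if value == wanted:
            # Image file name is assumed to be the key value plus ".jpg".
            return row["Col2"], row["Col3"], IMAGE_DIR / f"{value}.jpg"
    return None
```

A web page would send the user's string to this function and render the two values plus an image tag pointing at the returned path.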

Formatting csv files in Excel

Win XP, Excel 2007
I know there are various other posts on CSV formatting, but I couldn't quite find what I needed.
Some of our data is held off-site by another company, and they send us a CSV file every morning with the previous day's data.
The problem is that this data comes from web input forms that may have drop-down lists.
For example, there may be a drop-down list for Number of Employees with options like 1-10, 11-25, 26-50, etc.
When we open the CSV file in Excel, certain options like 1-10 get turned into dates in Oct-01 format, which we do not want.
Is there an easy way to change these back, or to reformat the cells and do a find...replace? (That didn't seem to work terribly well, as it kept reverting to the date.)
Indeed, is there a better way of opening the CSV file that keeps the formatting intact and saves us from doing lots of find...replaces?
Ultimately, we will need to open the CSV in Excel, though.
Grateful for any hints
Isn't that SO annoying? Here's how I deal with this issue:
When you open the CSV file in Excel, you should get a dialog with parsing options (if Excel opens the file directly without one, use Data > From Text instead). First you choose Delimited or Fixed width, then you get a screen that previews how the data will be parsed.
It's easy to miss, but the final step of the wizard lets you set a specific data format for each column. Select the column you want to protect and set its format to Text. (This also keeps Excel from dropping the leading zeros in New England ZIP codes!)
Once the data is in Excel, you can use VLOOKUP or find-and-replace to reset the values to your own codes.
Hope this helps. Good luck.
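In the spirit of the last suggestion, you can also do the recoding before Excel ever sees the file, so the ranges are never turned into dates. This Python sketch rewrites one column of the CSV using a lookup table. The column name `Number of Employees`, the codes in `RANGE_CODES` and the function `recode_column` are all hypothetical. Pick codes that Excel will not interpret as dates or numbers.

```python
import csv
import io

# Hypothetical mapping from drop-down ranges to codes Excel won't reinterpret.
RANGE_CODES = {"1-10": "EMP01", "11-25": "EMP02", "26-50": "EMP03"}


def recode_column(text, column, mapping):
    """Return CSV text with values in `column` replaced via `mapping`.

    Values not found in the mapping, and all other columns, are left as-is.
    """
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row[column] = mapping.get(row[column], row[column])
        writer.writerow(row)
    return out.getvalue()
```

To apply it, read the daily file with `open(path, newline="")`, pass its contents to `recode_column(..., "Number of Employees", RANGE_CODES)`, and save the result before opening it in Excel.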