OSM to CSV with osmconvert gives me "empty" amenity columns

I'm trying to convert my country's OpenStreetMap data to a CSV file which I can then load in Tableau and map businesses.
I downloaded osmconvert, which seems to be the default tool for this process.
Then I downloaded the Dominican Republic's data from this URL:
http://download.geofabrik.de/central-america/haiti-and-domrep.html
When I run the following command:
osmconvert64.exe data.osm --csv="@id @lon @lat amenity name shop" --csv-headline --csv-separator=, -o=outfile.csv
I get a sheet like this:
[Screenshot of the resulting Excel sheet with empty amenity, name, and shop columns]
I seem to get the ID, latitude, and longitude data right, but the amenity, name, and shop columns come out empty.
Am I writing the command wrong? I'd appreciate any help, since I can't seem to find a user-friendly tutorial for this tool anywhere on the internet.

I'm not very familiar with osmconvert, but as far as I know it generates an entry for each element in the input file. Most elements won't have an amenity, name, or shop tag, which is why you get so many empty columns. You probably have to apply an additional filter first, so that only elements with an amenity or shop tag are kept, for example with the help of osmfilter.
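An untested sketch of that pipeline (osmfilter prefers .o5m input, so convert first; the flag syntax here is from memory, so check each tool's --help):
osmconvert64.exe data.osm -o=data.o5m
osmfilter data.o5m --keep="amenity= or shop=" -o=filtered.o5m
osmconvert64.exe filtered.o5m --csv="@id @lon @lat amenity name shop" --csv-headline --csv-separator=, -o=outfile.csv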
Alternatively, just use the Overpass API and perform a simple query for all elements with an amenity or shop tag. The Overpass API can also generate CSV output directly.
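A sketch of such a query, runnable at https://overpass-turbo.eu (assuming the country's area can be selected via its ISO3166-1 code; out center fills in coordinates for ways and relations):
[out:csv(::id, ::lat, ::lon, amenity, name, shop; true; ",")];
area["ISO3166-1"="DO"][admin_level=2]->.a;
(
  nwr["amenity"](area.a);
  nwr["shop"](area.a);
);
out center;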

Related

Is there a way to import a bunch of JSON files into Excel

I have about a hundred JSON files of data that I would like to be able to manipulate in Excel. The reason why there are so many files is that the API I pulled from limits responses to 50 items per request, so I chained 100 requests together in Postman and each request generated its own file.
The layout of each file is as follows:
{
  "href": "dsjdsjds.com",
  "total": 4293,
  "next": "sdsadsads.com",
  "prev": "dsjdjsdj.com",
  "limit": 50,
  "offset": 50,
  "itemSummaries": [...]
}
Pretty much all of the data that I want lies inside the itemSummaries array.
I'm pretty new to this and not sure whether the optimal way would be a Python script, or whether there's a way to do it with VBA or something. I was thinking that I'd need to combine all of the data into a single file first, but I don't know how to do that either. I appreciate the help!
Here is what I did in a similar situation, where I had to import multiple JSON files, all with the same structure.
Use Get & Transform in the Data ribbon to import one JSON file as text. Power Query will recognize it as JSON. Edit the result in the Power Query window and expand/transform the imported data until you can show it in tabular form.
You can then convert this manual sequence of steps into a custom function. See here for details - https://www.poweredsolutions.co/2019/02/19/parameters-and-functions-in-power-bi-power-query-custom-functions/
Go back to Excel and this time, instead of importing a single JSON file, import the folder where all the JSON files are available and apply your custom function to each individual file to produce a consolidated table.
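As a rough sketch, such a custom function (fnParseJson here - the name is mine) might look like this, assuming every file has the itemSummaries layout shown in the question:
// parse one JSON file and return its itemSummaries as a table
(fileContent as binary) as table =>
let
    Source   = Json.Document(fileContent),
    Items    = Source[itemSummaries],
    AsTable  = Table.FromList(Items, Splitter.SplitByNothing(), {"Item"}),
    // expand each record into columns, using the first item's field names
    Expanded = Table.ExpandRecordColumn(AsTable, "Item", Record.FieldNames(AsTable{0}[Item]))
in
    Expanded
and the consolidating query over the folder (the path is a placeholder):
let
    Files    = Folder.Files("C:\json"),
    // parse every file, then stack the resulting tables into one
    Parsed   = Table.AddColumn(Files, "Data", each fnParseJson([Content])),
    Combined = Table.Combine(Parsed[Data])
in
    Combined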
I found this article, which covers importing from JSON:
1. Select Data > Get Data > From File > From Folder. The Browse dialog box appears.
2. Locate the folder containing the files you want to combine. (Note: the message "No items match your search" just means Power Query has found a folder and it's displayed in the Folder name box; the files you want to combine are still in the folder, just not visible.)
3. A list of the files in the folder appears in the dialog box. Verify that all the files you want are listed.
4. Select one of the commands at the bottom of the dialog box, for example Combine > Combine & Transform. The Power Query Editor appears.
5. The Value column is a structured List column. Select the Expand icon, and then select Expand to New rows.
6. The Value column is now a structured Record column. Select the Expand icon. A drop-down dialog box appears.
7. Keep all the columns selected. You may want to clear the Use original column name as a prefix check box. Select OK.
8. Select all the columns that contain data values. Select Home, the arrow next to Remove Columns, and then select Remove Other Columns.
9. Select Home > Close & Load.
Result: Power Query automatically creates a query to consolidate the data from each file into a worksheet. The query steps and columns created depend on which command you choose.

Query using references to cell

I have a table with latitudes and longitudes. I want to pull weather data from an API (openweathermap.org) using the coordinates from my table. Any help would be appreciated; I don't have much experience with queries.
let
    Source = Json.Document(Web.Contents("api.openweathermap.org/data/2.5/weather?lat=49.57&lon=-121.79&appid=xxxxx"))
in
    Source
Your API call should contain "https://" in the web address like so:
https://api.openweathermap.org/data/2.5/weather?lat=49.57&lon=-121.79&appid=xxxxx
Also, you might already be aware, but you need to replace xxxxx with your actual appid.
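Since the title asks about referencing cells: one option is to read the coordinates from named ranges in the workbook. A sketch, assuming single-cell named ranges Lat, Lon, and ApiKey exist (the names are mine):
let
    // read single-cell named ranges from the workbook
    Lat = Excel.CurrentWorkbook(){[Name="Lat"]}[Content]{0}[Column1],
    Lon = Excel.CurrentWorkbook(){[Name="Lon"]}[Content]{0}[Column1],
    Key = Excel.CurrentWorkbook(){[Name="ApiKey"]}[Content]{0}[Column1],
    Url = "https://api.openweathermap.org/data/2.5/weather?lat=" & Text.From(Lat)
        & "&lon=" & Text.From(Lon) & "&appid=" & Text.From(Key),
    Source = Json.Document(Web.Contents(Url))
in
    Source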

Mass Upload Files To Specific Contacts Salesforce

I need to upload some 2000 documents to specific users in Salesforce. I have a CSV file that has the Salesforce-assigned ContactID, as well as a direct path to the files on my desktop. Each contact's specific file URL has been included in the CSV. How can I upload them all at once and, especially, to the correct contact?
You indicated in the comments / chat that you want it as "Files".
The "Files" object is bit more complex than Attachments, you'll need to do it in 2-3 steps. What you see as a File (you might see it referred to in documentation as Chatter Files or Salesforce Content) is actually several tables. There's
ContentDocument which can be kind of a file header (title, description, language, tags, linkage to many other areas in SF - because it can be standalone, it can be uploaded to certain SF Content Library, it can be linked to Accounts, Contacts, $_GOD knows what else)
ContentVersion which is well, actual payload. Only most recent version is displayed out of the box but if you really want you can go back in time
and more
The crap part is that you can't insert ContentDocument directly (there's no create() call in the list of operations) .
Theory
So you'll need:
Insert ContentVersion (inserting v1 automatically creates the parent ContentDocuments for you... it sounds a bit ass-backwards but it works). After this is done you'll have a bunch of standalone documents loaded, but not linked to any Contacts
Learn the Ids of their parent ContentDocuments
Insert ContentDocumentLink records that will connect Contacts and their PDFs
Practice
This is my C:\stacktest folder. It contains some SF cheat sheet PDFs.
Here's my file for part 1 of the load
Title,PathOnClient,VersionData
"Lightning Components CheatSheet","C:\stacktest\SF_LightningComponents_cheatsheet_web.pdf","C:\stacktest\SF_LightningComponents_cheatsheet_web.pdf"
"Process Automation CheatSheet","C:\stacktest\SF_Process_Automation_cheatsheet_web.pdf","C:\stacktest\SF_Process_Automation_cheatsheet_web.pdf"
"Admin CheatSheet","C:\stacktest\SF_S1-Admin_cheatsheet_web.pdf","C:\stacktest\SF_S1-Admin_cheatsheet_web.pdf"
"S1 CheatSheet","C:\stacktest\SF_S1-Developer_cheatsheet_web.pdf","C:\stacktest\SF_S1-Developer_cheatsheet_web.pdf"
Fire up Data Loader, select Insert, and select "Show all Salesforce objects". Find ContentVersion. The load should be straightforward (if you're hitting memory issues, set the batch size to something low, even 1 record at a time if really needed).
You'll get back a "success file", but it's useless here: we don't need the Ids of the generated ContentVersions, we need their parents. Fire up "Export" in Data Loader, pick all objects again, and pick ContentDocument. Use a query similar to this:
SELECT Id, Title, FileType, FileExtension
FROM ContentDocument
WHERE CreatedDate = TODAY AND CreatedBy.FirstName = 'Ethan'
You should see something like this:
"ID","TITLE","FILETYPE","FILEEXTENSION"
"0690g0000048G2MAAU","Lightning Components CheatSheet","PDF","pdf"
"0690g0000048G2NAAU","Process Automation CheatSheet","PDF","pdf"
"0690g0000048G2OAAU","Admin CheatSheet","PDF","pdf"
"0690g0000048G2PAAU","S1 CheatSheet","PDF","pdf"
Use Excel and the magic of VLOOKUP (or similar) to link them back by title to the Contacts. You wrote that you already have a file with Contact Ids and titles, so there's hope... Create a file like this:
ContentDocumentId,LinkedEntityId,ShareType,Visibility
0690g0000048G2MAAU,0037000000TWREI,V,InternalUsers
0690g0000048G2NAAU,0030g000027rQ3z,V,InternalUsers
0690g0000048G2OAAU,0030g000027rQ3a,V,InternalUsers
0690g0000048G2PAAU,0030g000027rPz4,V,InternalUsers
The 1st column is the file Id, then the contact Id, then some black magic you can read about & change if needed in the ContentDocumentLink docs.
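For the VLOOKUP itself, a hypothetical formula - assuming the ContentDocument export sits on one sheet with Title in column B, and a Contacts sheet maps Title (column A) to Contact Id (column B); adjust the ranges to your layout:
=VLOOKUP(B2, Contacts!A:B, 2, FALSE)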
Load it as an insert to (again, show all objects) ContentDocumentLink.
Woohoo! Beer time.
Your CSV should contain the following fields:
- ParentId = Id of the object you want to link the attachment to (the Id of the contact)
- Name = name of the file
- ContentType = extension (.xls or .pdf or ...)
- OwnerId = if empty, I believe it takes your user as owner
- Body = the location of the file on your machine (for instance: C:\SFDC\Files\test.pdf)
Use this CSV to insert the records (via Data Loader) into the Attachment object.
You will then see, for each contact, that records have been added to the 'Notes & Attachments' related list.
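For illustration, a single row of such a CSV might look like this (the Id and path are placeholders; it's also worth double-checking against the Attachment docs whether ContentType expects an extension or a MIME type such as application/pdf):
ParentId,Name,ContentType,OwnerId,Body
0037000000TWREI,test.pdf,application/pdf,,C:\SFDC\Files\test.pdf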

Best way to parse a big and intricate JSON file with OpenRefine (or R)

I know how to parse JSON cells in OpenRefine, but this one is too tricky for me.
I've used an API to extract the calendars of 4730 Airbnb rooms, identified by their IDs.
Here is an example of one JSON file: https://fr.airbnb.com/api/v2/calendar_months?key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=EUR&locale=fr&listing_id=4212133&month=11&year=2016&count=12&_format=with_conditions
For each ID and each day from now until November 2017, I would like to extract the availability of the room (true or false) and its price on that day.
I can't figure out how to parse out this information. I guess it requires a series of nested forEach calls, but I can't find the right way to do this in OpenRefine.
I've tried, of course,
forEach(value.parseJson().calendar_months, e, e.days)
The result is an array of arrays of dictionaries, which confuses me.
Any help would be appreciated. If the operation is too difficult in OpenRefine, a solution in R (or Python) would also be fine for me.
Rather than just creating your project as text and working with GREL to parse it out...
The best way is to select the JSON record part that you want to work with using our visual importer wizard for JSON and XML files (you can even use a URL pointing to a JSON file, as in your example; a video tutorial shows how here: https://www.youtube.com/watch?v=vUxdB-nl0Bw ).
Select the JSON part that contains the records you want to parse and work with (this can be any repeating part - just select one of them and OpenRefine will extract all the rest).
Limit the number of data rows to load during creation, or leave the default of all rows.
Click Create Project, and now you're in Rows mode. However, if you think Records mode might suit the context better, import the project again as JSON and this time select the next outside area of the content, perhaps a larger array that contains a key field. In the example, the key field would probably be the date, which is why I'd highlight the whole record for a given date. This way OpenRefine will have keys for each record, and Records mode lets you work with them better than Rows mode does.
Feel free to take this example, make it better and even more helpful for all, and add it to our Wiki section on How to Use.
I think you are on the right track. The output of:
forEach(value.parseJson().calendar_months, e, e.days)
is hard to read because OpenRefine and JSON both use square brackets to indicate arrays. What you are getting from this expression is an OpenRefine ("OR") array containing twelve items (one for each month of the year). The items in the OR array are JSON - each one an array of the days in the month.
To keep the steps manageable I'd suggest tackling it like this:
First use
forEach(value.parseJson().calendar_months,m,m.days).join("|")
You have to use 'join' because OR can't store OR arrays directly in a cell - it has to be a string.
Then use "Edit Cells->Split multi-valued cells" - this will get you 12 rows per ID, each containing a JSON expression. Now for each ID you have 12 rows in OR.
Then use:
forEach(value.parseJson(),d,d).join("|")
This splits the JSON down into the individual days.
Then use "Edit Cells->Split multi-valued cells" again to split the details for each day into its own cell.
Using the JSON from the example URL above, this gives me 441 rows for the single ID - each contains the JSON describing the availability and price for a single day. At this point you can use the 'Fill down' function on the ID column to fill in the ID for each of the rows.
You've now got some pretty easy JSON in each cell - so you can extract availability using
value.parseJson().available
etc.
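Assuming each day object also carries a date field and a nested price record (those names are my guesses - inspect one cell's JSON for the exact field names), the other columns come out the same way:
value.parseJson().date
value.parseJson().price.local_price_formatted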

How to access latitude and longitude data from the page source of a craigslist search?

So I have been trying to pull latitude and longitude data from the page source of a craigslist search, but I can't seem to find the data. I am basically trying to replicate another example that I found on Stack Overflow, found here.
The problem I seem to be having is that, unlike in the example from the link above, I cannot find these data in the source's <p> elements, although a <div> contains id="mapcontainer", data-arealat, and data-arealon.
I am using the following function to pull the attributes that I want:
xpathSApply(X, path = "//p[@attribute I want]", fun = xmlGetAttr, name = 'name of attribute')
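For what it's worth, the same call pattern does pull the area-level attributes straight off that div - a sketch, assuming X is the page parsed with htmlParse:
lat <- xpathSApply(X, path = "//div[@id='mapcontainer']", fun = xmlGetAttr, name = 'data-arealat')
lon <- xpathSApply(X, path = "//div[@id='mapcontainer']", fun = xmlGetAttr, name = 'data-arealon')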
The problem is that I cannot find any per-listing latitude and longitude data in the page source. My question, then, is whether there is some sort of conversion from other attributes that must be done to get latitude and longitude data for each listing of a craigslist search.
Any help is greatly appreciated.