Sort numbers by string - mysql

Hi, I am using an API from Postcode Anywhere. The idea is to add a company by searching by postcode and selecting the address. This is pretty standard and the code works fine.
Just some background info: PAW works in two stages. In stage 1, the postcode search criteria are sent to their service, which returns an array of possible addresses, and you select the address you want. In stage 2, the full PAF record for that ID is returned and stored in the table.
The problem I am having is that the array they send includes an Address Field which includes house number and street address in one field, making it difficult to sort alphanumerically.
This is the sample data I have in my table:
and this is how it looks in my application:
As you can see, it is not ideal, and I have no control over how they send the data.
Does anyone have any ideas on how I can sort a string based on numbers that can be 1, 11, 2 instead of 01, 02, 03, etc., or at the very least split it into two columns? Also, please note that in most cases the postcode search will return business/property names as well as house numbers, as seen in this example.
Any thoughts would be greatly appreciated.

Have you considered using a different API provider for the data? Allies Computing (who I work for) has a single-step API, where the initial postcode search returns all fields in the response. It also orders the results by premise number/name.
Give it a try here - https://developers.alliescomputing.com/postcoder-web-api/address-lookup/premise
There are also other providers of PAF data that do it this way such as Crafty Clicks and Ideal Postcodes.
It might also be worth checking the PAF license with your provider to ensure you comply with that too.
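If switching provider is not an option, the sorting problem from the question can also be handled in application code rather than in MySQL. The following is only a minimal Python sketch. It assumes that the house number, when present, is at the very start of the address field, optionally followed by a single letter (e.g. "2A"). The function name and the sample addresses are made up for illustration.

```python
import re

# Leading house number, optional single-letter suffix, then the rest of the line.
_NUMBERED = re.compile(r"\s*(\d+)([A-Za-z]?)\b[\s,]*(.*)")

def address_sort_key(address):
    """Sort key: numbered addresses first (numerically), named premises after."""
    match = _NUMBERED.match(address)
    if match:
        number, suffix, rest = match.groups()
        return (0, int(number), suffix.lower(), rest.lower())
    # No leading number: treat as a business/property name, sorted alphabetically.
    return (1, 0, "", address.strip().lower())

# Hypothetical sample data, not taken from the PAF response.
addresses = [
    "11 High Street",
    "2 High Street",
    "Rose Cottage, High Street",
    "1 High Street",
    "2A High Street",
]
sorted_addresses = sorted(addresses, key=address_sort_key)
```

With this key, 1, 2, 2A and 11 come out in numeric order, and entries without a leading number are listed afterwards in alphabetical order.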


API Query Syntax - JSON object could not be decoded

This relates to the Patentsview.org API.
http://www.patentsview.org/api/uspc.html#field_list
I would like to modify my current query to limit itself to one or more USPC IDs (US patent classification) I am using id=348 for my test case.
Here is a query I have that works:
PATENTS_API_URL_TEMPLATE = 'http://www.patentsview.org/api/patents/query?q={%22_text_any%22:{%22patent_abstract%22:%22term_placeholder%22}}&f=[%22patent_number%22,%22patent_date%22,%22inventor_last_name%22,%22patent_abstract%22]'
The above query searches for term_placeholder anywhere in the abstract text and returns the patent number, date, inventor's last name, and abstract.
I don't care about the date and name. I only care about the patent number and abstract. But I also want to limit it to one or more patent classes.
I tried the following:
PATENTS_API_URL_TEMPLATE = 'http://www.patentsview.org/api/uspc_mainclasses/query?q={%22_and%22:[{%22_text_any%22:{%22patent_abstract%22:%22term_placeholder%22}},{%22uspc_mainclass_id%22:%22348%22}}]}&f=[%22patent_number%22,%22patent_abstract%22]'
The error I get is ValueError: No JSON object could be decoded.
I was using the following example provided by Patentsview:
http://www.patentsview.org/api/uspc_mainclasses/query?q={"_and":[{"_contains":{"assignee_organization":"Census"}},{"_gte":{"patent_date":"2000-01-01"}},{"_lte":{"patent_date":"2010-12-31"}},{"_contains":{"uspc_mainclass_title":"Electricity"}}]}&f=["inventor_id","inventor_first_name","inventor_last_name"]
I replaced the double quotes with %22, switched from api/patents to api/uspc_mainclasses, removed the name and date fields, and otherwise tried to follow the example.
What am I doing wrong? Thanks!
You just have an extra } after the "348". Without it, the API returns the JSON you are looking for:
[https://api.patentsview.org/uspc_mainclasses/query?q={"_and":\[{"_text_any":{"patent_abstract":"term_placeholder"}},{"uspc_mainclass_id":"348"}\]}&f=\["patent_number","patent_abstract"\]][1]
You can search multiple classes by changing the uspc_mainclass_id to an array
[https://api.patentsview.org/uspc_mainclasses/query?q={"_and":\[{"_text_any":{"patent_abstract":"term_placeholder"}},{"uspc_mainclass_id":\["348","343"\]}\]}&f=\["patent_number","patent_abstract"\]][2]
Note that the patent office stopped using the USPC classifications in June 2015, so your query won't return results more recent than that. You'll need to use the CPC classification fields, such as cpc_group_id, if you want classification searches that return more recent results.
[1]: https://api.patentsview.org/uspc_mainclasses/query?q=%7B%22_and%22:[%7B%22_text_any%22:%7B%22patent_abstract%22:%22term_placeholder%22%7D%7D,%7B%22uspc_mainclass_id%22:%22348%22%7D]%7D&f=[%22patent_number%22,%22patent_abstract%22]
[2]: https://api.patentsview.org/uspc_mainclasses/query?q=%7B%22_and%22:[%7B%22_text_any%22:%7B%22patent_abstract%22:%22term_placeholder%22%7D%7D,%7B%22uspc_mainclass_id%22:[%22348%22,%22343%22]%7D]%7D&f=[%22patent_number%22,%22patent_abstract%22]
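A stray brace like this is easy to miss in a hand-encoded URL. One way to avoid it is to build the query as a Python structure and let the standard library do the JSON serialisation and percent-encoding. The following is a minimal Python 3 sketch. The function name `build_query_url` is made up for illustration, and the endpoint and field names are simply the ones used in the URLs above.

```python
import json
from urllib.parse import quote, urlencode

API_URL = "https://api.patentsview.org/uspc_mainclasses/query"

def build_query_url(term, class_ids,
                    fields=("patent_number", "patent_abstract")):
    """Build the query URL from Python objects instead of a hand-written string."""
    query = {
        "_and": [
            {"_text_any": {"patent_abstract": term}},
            # A list of IDs matches any of the given classes.
            {"uspc_mainclass_id": list(class_ids)},
        ]
    }
    params = {
        # Compact separators keep the encoded URL short.
        "q": json.dumps(query, separators=(",", ":")),
        "f": json.dumps(list(fields), separators=(",", ":")),
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return API_URL + "?" + urlencode(params, quote_via=quote)

url = build_query_url("term_placeholder", ["348", "343"])
```

Because `json.dumps` produces the braces and brackets, they are balanced by construction.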

Best way to parse a big and intricate JSON file with OpenRefine (or R)

I know how to parse JSON cells in OpenRefine, but this one is too tricky for me.
I've used an API to extract the calendars of 4730 Airbnb rooms, identified by their IDs.
Here is an example of one JSON file: https://fr.airbnb.com/api/v2/calendar_months?key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=EUR&locale=fr&listing_id=4212133&month=11&year=2016&count=12&_format=with_conditions
For each ID and each day of the year from now until November 2017, I would like to extract the availability of the room (true or false) and its price on that day.
I can't figure out how to parse out this information. I guess it involves a series of nested forEach calls, but I can't find the right way to do this with OpenRefine.
I've tried, of course,
forEach(value.parseJson().calendar_months, e, e.days)
The result is an array of arrays of dictionaries, which confuses me.
Any help would be appreciated. If the operation is too difficult in OpenRefine, a solution with R (or Python) would also be fine for me.
Rather than just creating your Project as text, and working with GREL to parse out...
The best way is to select the part of the JSON record that you want to work with, using our visual importer wizard for JSON and XML files (you can even use a URL pointing to a JSON file, as in your example). (A video tutorial shows how here: https://www.youtube.com/watch?v=vUxdB-nl0Bw )
Select the JSON part that contains your records that you want to parse and work with (this can be any repeating part, just select one of them and OpenRefine will extract all the rest)
Limit the amount of data rows that you want to load in during creation, or leave default of all rows.
Click Create Project and now you're in Rows mode. However, if you think that Records mode might be better suited to your data, just import the project again as JSON and select the next enclosing area of the content, perhaps a larger array that contains a key field. In the example, the key field would probably be the date, which is why I highlight the whole record for a given date. This way OpenRefine will have keys for each record, and Records mode lets you work with them better than Rows mode.
Feel free to take this example, make it better and even more helpful for all, and add it to our Wiki section on How to Use.
I think you are on the right track. The output of:
forEach(value.parseJson().calendar_months, e, e.days)
is hard to read because OpenRefine and JSON both use square brackets to indicate arrays. What you are getting from this expression is an OR array containing twelve items (one for each month of the year). The items in the OR array are JSON - each one an array of days in the month.
To keep the steps manageable I'd suggest tackling it like this:
First use
forEach(value.parseJson().calendar_months,m,m.days).join("|")
You have to use 'join' because OR can't store OR arrays directly in a cell - it has to be a string.
Then use "Edit Cells->Split multi-valued cells" - this will get you 12 rows per ID, each containing a JSON expression. Now for each ID you have 12 rows in OR
Then use:
forEach(value.parseJson(),d,d).join("|")
This splits the JSON down into the individual days
Then use "Edit Cells->Split multi-valued cells" again to split the details for each day into its own cell.
Using the JSON from the example URL above, this gives me 441 rows for the single ID, each containing the JSON describing the availability & price for a single day. At this point you can use the 'fill down' function on the ID column to fill in the ID for each of the rows.
You've now got some pretty easy JSON in each cell - so you can extract availability using
value.parseJson().available
etc.
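Since the question says a Python solution would also be fine, the same flattening can be sketched outside OpenRefine. This is only a rough sketch: the `calendar_months`, `days` and `available` keys come from the expressions above, while the `date` and `price` keys and the function name `flatten_calendar` are assumptions about the response that you would need to check against the real JSON.

```python
import json

def flatten_calendar(listing_id, payload):
    """Turn one calendar response into a flat list of one dict per day."""
    rows = []
    for month in payload.get("calendar_months", []):
        for day in month.get("days", []):
            rows.append({
                "listing_id": listing_id,
                "date": day.get("date"),            # assumed key name
                "available": day.get("available"),
                "price": day.get("price"),          # assumed key name
            })
    return rows

# Hypothetical, heavily shortened response used only to show the shape.
sample = json.loads("""
{"calendar_months": [
  {"days": [{"date": "2016-11-01", "available": true, "price": 50},
            {"date": "2016-11-02", "available": false, "price": 55}]},
  {"days": [{"date": "2016-12-01", "available": true, "price": 60}]}
]}
""")
rows = flatten_calendar(4212133, sample)
```

Calling this once per listing ID and concatenating the results gives one row per ID and day, which is the same shape the OpenRefine steps above end up with.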

Address parts, proper order

If I look up this address:
Maanweg 174, 2500 BD The Hague, Nederlands
Google Maps finds it perfectly. If I look it up this way:
Maanweg 174, The Hague, Nederlands 2500 BD
It does not, because the postal code is out of place.
Now my question: I have 4 fields: Address, City/Region, Country and Postal Code. If the user starts to type the address, a lookup on Google Maps comes up with a list of addresses the user can pick from. I break the user's selection apart and fill in my 4 fields.
However, if the user changes some part of the address, I need to reassemble it into a string to feed to Google, but I don't know the proper order. In the Netherlands, the postal code goes after the address. In the US it goes at the end, right before the country.
How can I find out what the proper order is?
There is no single universal format; you should use the format that is used in the particular country (see: https://developers.google.com/maps/faq#geocoder_queryformat ).
According to http://www.bitboost.com/ref/international-address-formats/netherlands/ the order of the first example is correct (for the Netherlands).
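One way to apply this is to keep a format template per country and rebuild the string from the 4 fields. The following is a minimal Python sketch. The table and the function name are made up for illustration. The two orderings are just the ones described in the question (Netherlands and US), and the fallback for other countries is an arbitrary assumption.

```python
# Hypothetical lookup table: one template per country code.
ADDRESS_FORMATS = {
    "NL": "{address}, {postal_code} {city}, {country}",
    "US": "{address}, {city} {postal_code}, {country}",
}

# Assumed fallback for countries without an entry in the table.
DEFAULT_FORMAT = "{address}, {city} {postal_code}, {country}"

def format_address(address, city, postal_code, country, country_code):
    """Rebuild a single lookup string in the order used by the given country."""
    template = ADDRESS_FORMATS.get(country_code, DEFAULT_FORMAT)
    return template.format(
        address=address,
        city=city,
        postal_code=postal_code,
        country=country,
    )

query = format_address("Maanweg 174", "The Hague", "2500 BD", "Nederlands", "NL")
```

For the example in the question this rebuilds the first (working) form of the address. Entries for other countries would have to be filled in from a reference on national address formats.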

Is there a way to select fields that are grayed out on the Socrata web client?

I am attempting to get the compared_to_national column for the readmission data located at Data.Medicare.gov
This column is grayed out on the web interface making me think that it is a computed field or a join with another table.
$ curl https://data.medicare.gov/resource/7xux-kdpw? | grep national
shows that this value is not being returned at all even when everything is selected. Am I missing something here or is this data just not available?
Sorry for the late reply, but maybe I can still help out.
The greyed out values are data that was incompatible with the selected datatype at import time. In the case of Readmissions Complications and Deaths - Hospital, that's textual data that the data owner attempted to import into a Numeric column type.
Unfortunately it isn't queryable, but I'll let the account manager for CMS know so that hopefully they can have that fixed.

Valid to return different json-response depending on list or retrieve?

I am currently designing a REST API and am a little stuck on performance matters for 2 of the use cases in the system:
List all campaigns (api/campaigns) - needs to return the campaign data needed for listing and paging campaigns. It may return up to 1000 records, so retrieving and returning detailed data would take ages. The needed data can be returned in a single DB call.
Retrieve campaign item (api/campaigns/id) - needs to return all data about the campaign and may take up to a second to run. Multiple DB calls are needed to get all the data for a single campaign.
My question is: is it valid to return different JSON responses for those 2 calls (if well documented), even though they concern the same resource? I am thinking that the list response would be a subset of the retrieve response. The reason for this is to save DB calls, bandwidth, and parsing.
Thanks in advance!
I think it's both fine and expected for /campaigns and /campaigns/{id} to return different information. I would suggest using query parameters to limit the amount of information you need to return. For instance, only return a URI to each player unless you see a ?expand=players query parameter, in which case you return detailed player information.
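To make the expand idea concrete, here is a small framework-independent Python sketch. All names in it (`parse_expand`, `serialize_campaign`, the `players` field and the `/api/campaigns/{id}/players` URI) are hypothetical and only illustrate the pattern.

```python
from urllib.parse import parse_qs

def parse_expand(query_string):
    """Read ?expand=a,b from a raw query string into a set of names."""
    values = parse_qs(query_string).get("expand", [])
    return {name for value in values for name in value.split(",") if name}

def serialize_campaign(campaign, expand=frozenset()):
    """Return the cheap summary, embedding details only when asked for."""
    body = {"id": campaign["id"], "name": campaign["name"]}
    if "players" in expand:
        # Expensive path: embed the full related data.
        body["players"] = campaign["players"]
    else:
        # Cheap path: only a link the client can follow later.
        body["players"] = {"href": "/api/campaigns/%s/players" % campaign["id"]}
    return body

# Hypothetical record, as it might look after loading from the DB.
campaign = {"id": 7, "name": "Spring sale", "players": [{"id": 1, "name": "Ann"}]}
summary = serialize_campaign(campaign, parse_expand(""))
detailed = serialize_campaign(campaign, parse_expand("expand=players"))
```

In a real service the expensive data would only be loaded from the DB when the name appears in `expand`, so the list endpoint stays at a single DB call by default.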