API Query Syntax - JSON object could not be decoded - json

This relates to the Patentsview.org API.
http://www.patentsview.org/api/uspc.html#field_list
I would like to modify my current query to limit itself to one or more USPC IDs (US patent classifications). I am using id=348 for my test case.
Here is a query I have that works:
PATENTS_API_URL_TEMPLATE = 'http://www.patentsview.org/api/patents/query?q={%22_text_any%22:{%22patent_abstract%22:%22term_placeholder%22}}&f=[%22patent_number%22,%22patent_date%22,%22inventor_last_name%22,%22patent_abstract%22]'
The above query searches for term_placeholder anywhere in the abstract text and returns the patent number, date, inventor's last name, and abstract.
I don't care about the date and name. I only care about the patent number and abstract. But I also want to limit it to one or more patent classes.
I tried the following:
PATENTS_API_URL_TEMPLATE = 'http://www.patentsview.org/api/uspc_mainclasses/query?q={%22_and%22:[{%22_text_any%22:{%22patent_abstract%22:%22term_placeholder%22}},{%22uspc_mainclass_id%22:%22348%22}}]}&f=[%22patent_number%22,%22patent_abstract%22]'
The error I get is ValueError: No JSON object could be decoded.
I was using the following example provided by Patentsview:
http://www.patentsview.org/api/uspc_mainclasses/query?q={"_and":[{"_contains":{"assignee_organization":"Census"}},{"_gte":{"patent_date":"2000-01-01"}},{"_lte":{"patent_date":"2010-12-31"}},{"_contains":{"uspc_mainclass_title":"Electricity"}}]}&f=["inventor_id","inventor_first_name","inventor_last_name"]
I replaced the double quotes with %22, switched from api/patents to api/uspc_mainclasses, removed the name and date fields, and otherwise tried to follow the example.
What am I doing wrong? Thanks!

You just have an extra } after the "348". Without it, the API returns the JSON you are looking for:
[https://api.patentsview.org/uspc_mainclasses/query?q={"_and":\[{"_text_any":{"patent_abstract":"term_placeholder"}},{"uspc_mainclass_id":"348"}\]}&f=\["patent_number","patent_abstract"\]][1]
You can search multiple classes by changing the uspc_mainclass_id to an array
[https://api.patentsview.org/uspc_mainclasses/query?q={"_and":\[{"_text_any":{"patent_abstract":"term_placeholder"}},{"uspc_mainclass_id":\["348","343"\]}\]}&f=\["patent_number","patent_abstract"\]][2]
Note that USPC classifications stopped being used by the patent office in June 2015, so your query won't return results more recent than that. You'll need to use the CPC classification fields, such as cpc_group_id, if you want classification searches that return more recent results.
[1]: https://api.patentsview.org/uspc_mainclasses/query?q=%7B%22_and%22:[%7B%22_text_any%22:%7B%22patent_abstract%22:%22term_placeholder%22%7D%7D,%7B%22uspc_mainclass_id%22:%22348%22%7D]%7D&f=[%22patent_number%22,%22patent_abstract%22]
[2]: https://api.patentsview.org/uspc_mainclasses/query?q=%7B%22_and%22:[%7B%22_text_any%22:%7B%22patent_abstract%22:%22term_placeholder%22%7D%7D,%7B%22uspc_mainclass_id%22:[%22348%22,%22343%22]%7D]%7D&f=[%22patent_number%22,%22patent_abstract%22]
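If you are building these URLs in Python, it is usually easier to let a library do the percent-encoding than to hand-write %22 escapes (which is where the stray brace crept in). A minimal sketch, assuming the requests package is available:

```python
import json
import requests

# Build the query as plain Python structures and serialize with json.dumps;
# requests then percent-encodes the q and f parameters automatically.
query = {"_and": [
    {"_text_any": {"patent_abstract": "term_placeholder"}},
    {"uspc_mainclass_id": ["348", "343"]},  # a list searches multiple classes
]}
fields = ["patent_number", "patent_abstract"]

response = requests.get(
    "https://api.patentsview.org/uspc_mainclasses/query",
    params={"q": json.dumps(query), "f": json.dumps(fields)},
)
response.raise_for_status()
print(response.json())
```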


Trying to pull the Name and/or ID of the code below, but can only pull the Job-Base-Cost

Below is the code I have now. It pulls the Job-Base-Cost just fine; however, I cannot get it to pull the ID and/or Name of the item. Can you help?
Link to the site's XML pull.
=importxml("link","//job-base-cost")
This is a sample of one line of the OP's XML file
<job-base-cost id="24693" name="Abaddon Blueprint">109555912.69</job-base-cost>
The OP wants to use the IMPORTXML function to report the ID and Name as well as the Job Cost from the XML data. Presently, the OP's formula is:
=importxml("link","//job-base-cost")
There are two options:
1 - One long column
=importxml("link","//@id | //@name | //job-base-cost")
Note //@id and //@name in the XPath query: // selects nodes anywhere in the document (at any level, not just the root level) and @ selects attributes. The pipe | is the XPath union operator, which combines the results. So the plain-English query is: display the id, the name, and the job-base-cost. (A standalone check of this XPath appears at the end of this answer.)
2 - Three columns (table format)
={IMPORTXML("link","//@name"),IMPORTXML("link","//job-base-cost"),IMPORTXML("link","//@id")}
This creates a series that will display the fields in each of three columns.
Note: there is an arrayformula that uses a single importXML function described in How do I return multiple columns of data using ImportXML in Google Spreadsheets?. Readers may want to look at whether that option can be implemented.
My thanks to @Tanaike for his comment, which spurred me to look at how XPath works.
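To sanity-check the union XPath outside of Sheets, here is a minimal Python sketch using the sample element from the question (it assumes the lxml package; IMPORTXML itself is not involved):

```python
from lxml import etree

# The single job-base-cost line quoted from the question's XML.
xml = b'<job-base-cost id="24693" name="Abaddon Blueprint">109555912.69</job-base-cost>'
root = etree.fromstring(xml)

# Same union XPath as the formula: two attributes plus the element's text.
for node in root.xpath("//@id | //@name | //job-base-cost"):
    # Attribute results come back as strings; the element needs .text.
    print(node if isinstance(node, str) else node.text)
```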

Extract comma-separated values from JSON Records within a List with PowerQuery

As part of a tool I am creating for my team I am connecting to an internal web service via PowerQuery.
The web service returns nested JSON, and I have trouble parsing the JSON data to the format I am looking for. Specifically, I have a problem with extracting the content of records in a column to a comma separated list.
The data
As you can see, the data contains details related to a specific "race" (race_id). What I want to focus on is the information in the driver_codes column, which is a List of Records. The number of records varies from 0 to 4, and each record is structured as id: 50000 (where 50000 could be any 5-digit number). So it could be:
id: 10000
id: 20000
id: 30000
As requested, an example snippet of the raw JSON:
<race>
<race_id>ABC123445</race_id>
<begin_time>2018-03-23T00:00:00Z</begin_time>
<vehicle_id>gokart_11</vehicle_id>
<driver_code>
<id>90200</id>
</driver_code>
<driver_code>
<id>90500</id>
</driver_code>
</race>
I want it to be structured as:
10000,20000,30000
The problem
When I choose "Extract values" on the column with the list, then I get the following message:
Expression.Error: We cannot convert a value of type Record to type Text.
If I instead choose "Expand to new rows", then duplicate rows are created for each unique driver code. I now have several rows per unique race_id, but what I wanted was one row per unique race_id and a concatenated list of driver codes.
What I have tried
I have tried grouping the data by the race_id, but the operations allowed when grouping data do not include concatenating rows.
I have also tried unpivoting the column, but that leaves me with the same problem: I still get multiple rows.
I have googled (and Stack Overflowed) this issue extensively without luck. It might be that I am using the wrong keywords, however, so I apologize if a duplicate exists.
UPDATE: What I have tried based on the answers so far
I tried Alexis Olson's excellent and very detailed method, but I end up with the following error:
Expression.Error: We cannot convert the value "id" to type Number. Details:
Value=id
Type=Type
The error comes from using either of these lines of M code (one with a List.Transform and one without):
= Table.Group(#"Renamed Columns", {"race_id", "begin_time", "vehicle_id"},
{{"DriverCodes", each Text.Combine([driver_code][id], ","), type text}})
= Table.Group(#"Renamed Columns", {"race_id", "begin_time", "vehicle_id"},
{{"DriverCodes", each Text.Combine(List.Transform([driver_code][id], each Number.ToText(_)), ","), type text}})
NB: if I do not write [driver_code][id] but only [id] then I get another error saying that column [id] does not exist.
Here's the JSON equivalent to the XML example you gave:
{"race": {
"race_id": "ABC123445",
"begin_time": "2018-03-23T00:00:00Z",
"vehicle_id": "gokart_11",
"driver_code": [
{ "id": "90200" },
{ "id": "90500" }
]}}
If you load this into the query editor, convert it to a table, and expand out the Value record, you'll have a table that looks like this:
At this point, choose Expand to New Rows, and then expand the id column so that your table looks like this:
At this point, you can apply the trick @mccard suggested: group by the first columns and aggregate over the last using, say, max.
This last step produces M code like this:
= Table.Group(#"Expanded driver_code1",
{"Name", "race_id", "begin_time", "vehicle_id"},
{{"id", each List.Max([id]), type text}})
Instead of this, you want to replace List.Max with Text.Combine as follows:
= Table.Group(#"Changed Type",
{"Name", "race_id", "begin_time", "vehicle_id"},
{{"id", each Text.Combine([id], ","), type text}})
Note that if your id column is not in text format, this will throw an error. To fix it, insert a step before you group rows using Transform tab > Data Type: Text to convert the type. Another option is to use List.Transform inside your Text.Combine like this:
Text.Combine(List.Transform([id], each Number.ToText(_)), ",")
Either way, you should end up with this:
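As a quick sanity check of the target shape outside Power Query, here is a minimal Python sketch of the same group-and-concatenate logic, run on the sample JSON above (illustrative only; it mirrors what the Text.Combine aggregation produces in M):

```python
import json

sample = """{"race": {
  "race_id": "ABC123445",
  "begin_time": "2018-03-23T00:00:00Z",
  "vehicle_id": "gokart_11",
  "driver_code": [{"id": "90200"}, {"id": "90500"}]
}}"""

race = json.loads(sample)["race"]
# Join the nested driver-code ids into one comma-separated string,
# the Python analogue of Text.Combine([id], ",") after expanding rows.
driver_codes = ",".join(d["id"] for d in race["driver_code"])
print(race["race_id"], driver_codes)  # ABC123445 90200,90500
```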
An alternative approach is to use the Advanced Editor and change the grouping operation directly in the code.
First, create the grouping using one of the operations available in the menu. For instance, create a column "Sum" using the Sum operation. It will give an error, but it gives us the starting code to work on.
Then, open the Advanced Editor and find the code corresponding to the operation. It should be something like:
{{"Sum", each List.Sum([driver_codes]), type text}}
Change it to:
{{"driver_codes", each Text.Combine([driver_codes], ","), type text}}

Sort numbers by string

Hi, I am using an API from Postcode Anywhere (PAW); the idea is to add a company by searching by postcode to select the address. This is pretty standard, and the code works fine.
Just some background info: PAW works in two stages. In stage 1, the postcode search criteria are sent off to their services, which return an array of possible addresses; you then select the address you want, and in stage 2, the full PAF file for that ID is returned and stored to the table.
The problem I am having is that the array they send includes an Address field that combines house number and street address in one field, making it difficult to sort alphanumerically.
This is the sample data I have in my table:
and this is how it looks in my application:
As you can see it is not ideal and I have no control on how they send the data.
Does anyone have any ideas on how I can sort a string based on numbers that come back as 1, 11, 2 instead of 01, 02, 03, etc., or at the very least how to split this into two rows? Also please note that in most cases the postcode search will return business/property names as well as house numbers, as seen in this example.
Any thoughts would be greatly appreciated.
Have you considered using a different API provider for the data? Allies Computing (who I work for) have a single-step API, where the initial postcode search returns all fields in the response. It also orders these results by premise number/name.
Give it a try here - https://developers.alliescomputing.com/postcoder-web-api/address-lookup/premise
There are also other providers of PAF data that do it this way such as Crafty Clicks and Ideal Postcodes.
It might also be worth checking the PAF license with your provider to ensure you comply with that too.
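If staying with the current provider, a natural (alphanumeric) sort can also be done on your side before display. A minimal Python sketch of the usual technique, with made-up sample addresses (not actual Postcode Anywhere output):

```python
import re

def natural_key(address: str):
    # Split into digit and non-digit runs so "2 High St" sorts before "11 High St".
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", address)]

addresses = ["11 High Street", "2 High Street", "1 High Street", "The Old Mill"]
print(sorted(addresses, key=natural_key))
# ['1 High Street', '2 High Street', '11 High Street', 'The Old Mill']
```

The same split trick can serve the fallback idea too: where a leading digit run is present, it is the house number and the remainder is the street.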

Accessing data imported into RavenDB via CSV import

I have successfully imported geodata (originally from a shapefile, converted to CSV) into my RavenDB instance. I am now trying to access the data with a naive, simplistic select (a sanity check to see if everything's there), but I can't get any data member values back. Since I am a total RavenDB newbie and haven't created the data myself (programmatically), my approach was to define a class with the same name as what I find in Raven Studio under Raven-Entity-Name (minus the automatically appended plural 's'), and to declare each of the data members as type string.
The query runs through and retrieves the first 128 results, but all the data members are null. I used this:
List<AdministrativeArea> AdministrativeArea = session
.Query<AdministrativeArea>()
.ToList();
Looking at the entries in Raven Studio, I can see that some of the data member values of the documents are coloured blue (so they have probably already been cast to integer), but that shouldn't be the cause of ALL the data members showing up as null...
No exceptions are being thrown, and the query list contains elements. What am I doing wrong here ?
Thanks for your help !
The problem was the declaration of the data members as int. Even when declaring the int members as nullable, empty strings came up and prevented correct instantiation of the objects.
I suppose that when CSV imports are used, and the field/data member sometimes comes up "empty" (but as a string type) while in other cases you DO have numbers, you have to resort to declaring them all as strings. The only other solution I could think of is to adapt the CSV import code, but I am still too new to RavenDB to attempt that.

Best way to parse a big and intricate JSON file with OpenRefine (or R)

I know how to parse JSON cells in OpenRefine, but this one is too tricky for me.
I've used an API to extract the calendars of 4730 Airbnb rooms, identified by their IDs.
Here is an example of one JSON file: https://fr.airbnb.com/api/v2/calendar_months?key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=EUR&locale=fr&listing_id=4212133&month=11&year=2016&count=12&_format=with_conditions
For each ID and each day of the year from now until November 2017, I would like to extract the availability of the room (true or false) and its price on that day.
I can't figure out how to parse out this information. I guess it implies a series of nested forEach loops, but I can't find the right way to do this with OpenRefine.
I've tried, of course,
forEach(value.parseJson().calendar_months, e, e.days)
The result is an array of arrays of dictionaries that confuses me.
Any help would be appreciated. If the operation is too difficult in OpenRefine, a solution in R (or Python) would also be fine for me.
Rather than just creating your project as text and working with GREL to parse it out...
The best way is to select the JSON record part that you want to work with using our visual importer wizard for JSON files and XML files (you can even use a URL pointing to a JSON file, as in your example). (A video tutorial shows how here: https://www.youtube.com/watch?v=vUxdB-nl0Bw )
Select the JSON part that contains the records you want to parse and work with (this can be any repeating part; just select one of them and OpenRefine will extract all the rest).
Limit the number of data rows to load during creation, or leave the default of all rows.
Click Create Project, and now you're in Rows mode. However, if you think Records mode might be better suited for the context, just import the project again as JSON and then select the next outside area of the content, perhaps a larger array that contains a key field, etc. In the example, the key field would probably be the Date, which is why I highlight the whole record for a given date. This way OpenRefine will have keys for each record, and Records mode lets you work with them better than Rows mode.
Feel free to take this example, make it better and even more helpful for all, and add it to our wiki section on How to Use.
I think you are on the right track. The output of:
forEach(value.parseJson().calendar_months, e, e.days)
is hard to read because OpenRefine and JSON both use square brackets to indicate arrays. What you are getting from this expression is an OpenRefine (OR) array containing twelve items (one for each month of the year). The items in the OR array are JSON - each one an array of the days in the month.
To keep the steps manageable I'd suggest tackling it like this:
First use
forEach(value.parseJson().calendar_months,m,m.days).join("|")
You have to use 'join' because OpenRefine can't store arrays directly in a cell - the cell has to contain a string.
Then use "Edit Cells->Split multi-valued cells" - this will get you 12 rows per ID, each containing a JSON expression (one month's array of days).
Then use:
forEach(value.parseJson(),d,d).join("|")
This splits the JSON down into the individual days
Then use "Edit Cells->Split multi-valued cells" again to split the details for each day into its own cell.
Using the JSON from the example URL above, this gives me 441 rows for the single ID - each contains the JSON describing the availability & price for a single day. At this point you can use the 'fill down' function on the ID column to fill in the ID for each of the rows.
You've now got some pretty easy JSON in each cell - so you can extract availability using
value.parseJson().available
etc.
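Since the question says a Python solution would also be fine: here is a minimal sketch of the same traversal outside OpenRefine. The calendar_months/days structure and the available flag come from the GREL above; the exact names and shapes of the date and price fields inside each day object are assumptions to verify against a live response:

```python
import json
import urllib.request

URL = ("https://fr.airbnb.com/api/v2/calendar_months"
       "?key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=EUR&locale=fr"
       "&listing_id=4212133&month=11&year=2016&count=12&_format=with_conditions")

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# The nested forEach from the GREL, written as two plain loops: months -> days.
for month in data["calendar_months"]:
    for day in month["days"]:
        # "available" is the true/false flag; "date" and "price" are assumed
        # field names - print a raw day dict first if they don't match.
        print(day.get("date"), day.get("available"), day.get("price"))
```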