How to interpret Record Type in Socrata New York City Real Property Legals database?

Can you look at https://data.cityofnewyork.us/City-Government/ERROR-in-record-type/dq2e-3a6q
This shows a record type that appears to be incorrect.
It shows
P:10,item":"Bloomfield"},{"count":9,item":"New Britain"},{"count":8,item":"West Htfd"},{"count":7,item":"Torrington"},{"count":6,item":"Meriden"},{"count":5,item":"Whfd"},{"count":4,item":"Manchester
If you select count(*) and group by record_type you see:
curl 'https://data.cityofnewyork.us/resource/636b-3b5g.json?$select=count(*),record_type&$group=record_type'
[ {
"count" : "1",
"record_type" : "P:10,item\":\"Bloomfield\"},{\"count\":9,item\":\"New Britain\"},{\"count\":8,item\":\"West Htfd\"},{\"count\":7,item\":\"Torrington\"},{\"count\":6,item\":\"Meriden\"},{\"count\":5,item\":\"Whfd\"},{\"count\":4,item\":\"Manchester"
}
, {
"count" : "36631085",
"record_type" : "P"
} ]
This means there are about 36.6 million records whose record_type is "P" and one record with a very odd value.
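To flag stray values like this automatically, the same grouped count can be run from a script and filtered. This is only a minimal sketch: the helper names are made up, and the rule that a valid record type is a short alphabetic code is my assumption, not something documented for this dataset.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://data.cityofnewyork.us/resource/636b-3b5g.json"

def grouped_count_url(column):
    """Build the same SoQL query as the curl call above for any column."""
    query = urllib.parse.urlencode(
        {"$select": f"count(*),{column}", "$group": column}
    )
    return f"{BASE}?{query}"

def odd_values(rows, column, max_len=2):
    """Return values that do not look like a short alphabetic code.

    The max_len rule is a guess for illustration; adjust it to the real codes.
    """
    odd = []
    for row in rows:
        value = row.get(column, "")
        if not (value.isalpha() and len(value) <= max_len):
            odd.append(value)
    return odd

def fetch(url):
    """Download and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

With network access, odd_values(fetch(grouped_count_url("record_type")), "record_type") would list the suspicious values.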
One suggestion for New York City Open Data Law:
We should amend the Open Data Law (http://www1.nyc.gov/site/doitt/initiatives/open-data-law.page) to require New York City government agencies not only to open up their data but also to use the open data portal to power their own public-facing sites.
If we allow agencies to simply dump data into a portal, there is no quality testing, and agencies can trumpet how many datasets are open even though no one is actually using the data.
This simple change, "an agency must use its own data" (aka dogfooding), will encourage quality. The page at http://www1.nyc.gov/site/doitt/initiatives/open-data-law.page mentions quality only once and says nothing about usage of the data. A portal is not a thing to brag about; it is an important way to join technology and government.
Thanks!

Related

data.medicare.gov/resource/4pq5-n9py.json Numbers as Dates

Calling the API https://data.medicare.gov/resource/4pq5-n9py.json returns erratic results.
{
...
"reported_cna_staffing_hours_per_resident_per_day" : "2.53304",
"cycle_2_number_of_complaint_health_deficiencies" : "2017-06-22T00:00:00",
"cycle_2_health_deficiency_score" : "0",
...
}
I believe cycle_2_number_of_complaint_health_deficiencies should be a number. The data on the website is correct, so I'm assuming that it is a problem with the API.
That field appears to be defined as a floating timestamp. Its human-readable name, Rating Cycle 1 Standard Survey Health Date, differs from the API field name you see in the API response, so this looks less like an API bug and more like a confusing naming convention.
Take a look at the metadata page for the underlying API names.

Designing REST - save big set of related entities

In my system, I have an entity (sales) that can serve people in certain ZIP codes.
So each sales entity can have thousands of ZIP codes bound to its account.
I need to develop a REST API that allows loading and editing the list of a sales entity's ZIP codes.
Basically I have 2 options:
1) Create 2 resources: Sales and SalesZip. Submit the Sales data, and then submit a SalesZip record for each supported ZIP code.
2) Create a Sales entity, and load the list of supported ZIP codes like this:
{
  "id": 1,
  "name": "John",
  "zip": [
    "90231",
    "12341",
    ...
  ]
}
And submit zip codes like an array:
zip[]=90231,12341
Both ways have some disadvantages.
If I use the first option, I may need to submit too many separate HTTP requests.
If I use the second option, I may need to send quite a big PUT/POST request.
Question
Which option should I use?
What is the best practice for designing such functionality?
What is exactly "quite big"?
As a rough estimate, if each character takes 2 bytes and a ZIP code has 5 characters, each code is 10 bytes. Assuming the US has 41,741 ZIP codes, the worst case (a salesperson who covers the whole country) needs a payload of around 417,410 bytes, or about 407.6 KB.
On average, how many ZIP codes does a salesperson cover? How is that distributed? How often do you get these requests? You may discover that it is not that bad after all.
There is not enough data to make a decision, but the second option does not look bad.
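The estimate above can be checked with a few lines. This is a rough sketch: the 41,741 figure is the one quoted in the answer, the ZIP values are synthetic placeholders, and I am assuming a compact UTF-8 JSON body, where each digit takes 1 byte rather than 2.

```python
import json

ZIP_COUNT = 41_741  # figure quoted in the answer above

# Estimate from the answer: 5 characters at 2 bytes each.
estimate_bytes = ZIP_COUNT * 5 * 2

# Synthetic 5-digit codes, only used to measure the encoded size.
zips = [f"{n:05d}" for n in range(ZIP_COUNT)]
body = json.dumps({"id": 1, "name": "John", "zip": zips}, separators=(",", ":"))
json_bytes = len(body.encode("utf-8"))

print(estimate_bytes, round(estimate_bytes / 1024, 1))  # 417410 407.6
print(json_bytes, round(json_bytes / 1024, 1))
```

Even with the quotes and commas that JSON adds, the measured body stays below the 2-bytes-per-character estimate, so the worst case remains well under half a megabyte in a single request.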

authorize.net ambiguity in country names

Hi, I am working on a site and integrating the authorize.net payment gateway. I am thinking of adding a dropdown for country names. Will passing "United States Of America" as the country variable work, or should I use "US"? Should I use ISO codes for every country? I tried it on a test developer account, but it seems to accept everything I pass to it as correct!
~Ajit
I know authorize.net doesn't require country names. A simple way to see if they even validate them would be to run a transaction through the production gateway, pass a nonsense value and see if the transaction still goes through.
If you do standardize to support authorize.net (or for another reason), I'd suggest country codes rather than full names. Codes seem to change less often, and they can also be useful as identifiers. For example, I have an application which presents data for roughly 200 countries; I have flag icons (multiple sizes for each country) that use a two-letter country code in their name. Using codes made this fairly easy to implement and maintain.
According to their AIM Guide:
x_country: Optional
Value: The country of the customer’s billing
Format: Up to 60 characters (no symbols)
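One way to wire a dropdown to country codes is to show the full name but submit the two-letter code. The following is only a sketch under assumptions: the COUNTRIES table is a tiny illustrative subset, the helper names are made up, and nothing here confirms which values the gateway itself prefers.

```python
# Illustrative subset; a real list would cover every ISO 3166-1 alpha-2 code.
COUNTRIES = {
    "US": "United States of America",
    "CA": "Canada",
    "IN": "India",
}

def dropdown_options():
    """(value, label) pairs: the code is submitted, the name is displayed."""
    return sorted(COUNTRIES.items(), key=lambda item: item[1])

def x_country(selected_code):
    """Validate the submitted code before passing it on as x_country."""
    code = selected_code.strip().upper()
    if code not in COUNTRIES:
        raise ValueError(f"unknown country code: {selected_code!r}")
    return code

print(dropdown_options()[0])  # ('CA', 'Canada')
print(x_country(" us "))      # US
```

Validating on your side matters here because, as noted above, the gateway may accept any value without complaint.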

How should I populate city/state fields based on the zip?

I'm aware there are databases for zip codes, but how would I grab the city/state fields based on that? Do these databases contain the city/states or do I have to do some sort of lookup to a webservice?
\begin{been-there-done-that}
Important realization: There is not a one-to-one mapping between cities/counties and ZIP codes. A ZIP code is not based on a political area but instead a distribution area as defined for the USPS's internal use. It doesn't make sense to look up a city based on a ZIP code unless you have the +4 or the entire street address to match a record in the USPS address database; otherwise, you won't know if it's RICHMOND or HENRICO, DALLAS or FORT WORTH, there's just not enough information to tell.
This is why, for example, many e-commerce vendors find dealing with New York state sales tax frustrating: that tax scheme is based on county, e-commerce systems typically don't ask for the county, and ZIP codes (the only information they have instead) in New York can span county lines.
The USPS updates its address database every month, and access to it costs real money, so pretty much any list that you find freely available on the Internet is going to be out of date, especially with the USPS closing post offices to save money.
One ZIP code may span multiple place names, and one city often uses several (but not necessarily whole) ZIP codes. Finally, the city name listed in the ZIP code file may not actually be representative of the place in which the addressee actually lives; instead, it represents the location of their post office. Our office mail is addressed to ASHLAND, but we work about 7 miles from the town's actual political limits. ASHLAND just happens to be where our carrier's route originates from.
For guesstimating someone's location, such as for a search of nearby points of interest, these sources and City/State/ZIP sets are probably fine; they don't need to be exact. But for address validation in a data entry scenario? Absolutely not: validate the whole address or don't bother at all.
Just a friendly reminder to take a step back and remember the data source's intended use!
\end{been-there-done-that}
Modern ZIP code databases contain City and State columns.
http://sourceforge.net/projects/zips/
http://www.populardata.com/
Using the Ziptastic HTTP/JSON API
This is a pretty new service, but according to their documentation, it looks like all you need to do is send a GET request to http://ziptasticapi.com, like so:
GET http://ziptasticapi.com/48867
And they will return a JSON object along the lines of:
{"country": "US", "state": "MI", "city": "OWOSSO"}
Indeed, it works. You can test this from a command line by doing something like:
curl http://ziptasticapi.com/48867
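The same lookup can be sketched in Python. This assumes the response always has the shape shown above; the function names are mine, and error handling for unknown ZIP codes is left out because I have not checked what the service returns in that case.

```python
import json
import urllib.request

def parse_ziptastic(payload):
    """Extract (city, state) from a Ziptastic-style JSON string."""
    data = json.loads(payload)
    return data["city"].title(), data["state"]

def lookup(zip_code):
    """Fetch and parse the record for one ZIP code (needs network access)."""
    with urllib.request.urlopen(f"http://ziptasticapi.com/{zip_code}") as resp:
        return parse_ziptastic(resp.read().decode("utf-8"))

# Offline check against the sample response quoted above.
sample = '{"country": "US", "state": "MI", "city": "OWOSSO"}'
print(parse_ziptastic(sample))  # ('Owosso', 'MI')
```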
Using the US Postal Service HTTP/XML API
According to this page on the US Postal Service website which documents their XML based web API, specifically Section 4.0 (page 22) of this PDF document, they have a URL where you can send an XML request containing a 5 digit Zip Code and they will respond with an XML document containing the corresponding City and State.
According to their documentation, here's what you would send:
http://SERVERNAME/ShippingAPITest.dll?API=CityStateLookup&XML=<CityStateLookupRequest%20USERID="xxxxxxx"><ZipCode ID="0"><Zip5>90210</Zip5></ZipCode></CityStateLookupRequest>
And here's what you would receive back:
<?xml version="1.0"?>
<CityStateLookupResponse>
<ZipCode ID="0">
<Zip5>90210</Zip5>
<City>BEVERLY HILLS</City>
<State>CA</State>
</ZipCode>
</CityStateLookupResponse>
USPS does require that you register with them before you can use the API, but, as far as I could tell, there is no charge for access. By the way, their API has some other features: you can do Address Standardization and Zip Code Lookup, as well as the whole suite of tracking, shipping, labels, etc.
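A hedged sketch of the request and response handling in Python follows. The element and parameter names are taken from the example above; the function names are mine, SERVERNAME and the user ID remain placeholders you get when registering, and the network call itself is left out.

```python
import urllib.parse
import xml.etree.ElementTree as ET

def build_request_url(server, user_id, zip5):
    """Build the CityStateLookup URL with the XML payload percent-encoded."""
    root = ET.Element("CityStateLookupRequest", USERID=user_id)
    zip_el = ET.SubElement(root, "ZipCode", ID="0")
    ET.SubElement(zip_el, "Zip5").text = zip5
    xml = ET.tostring(root, encoding="unicode")
    query = urllib.parse.urlencode({"API": "CityStateLookup", "XML": xml})
    return f"http://{server}/ShippingAPITest.dll?{query}"

def parse_response(xml_text):
    """Return (city, state) from a CityStateLookupResponse document."""
    zip_el = ET.fromstring(xml_text).find("ZipCode")
    return zip_el.findtext("City"), zip_el.findtext("State")

# Offline check against the sample response quoted above.
sample = (
    '<?xml version="1.0"?>'
    '<CityStateLookupResponse><ZipCode ID="0"><Zip5>90210</Zip5>'
    "<City>BEVERLY HILLS</City><State>CA</State></ZipCode>"
    "</CityStateLookupResponse>"
)
print(parse_response(sample))  # ('BEVERLY HILLS', 'CA')
```

Percent-encoding the XML avoids the raw spaces and angle brackets that appear in the documentation's example URL.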
I'll try to answer the question "HOW should I populate...", and not "SHOULD I populate..."
Assuming you are going to do this more than once, you would want to build your own database. This could be nothing more than a text file you downloaded from any of the many sources (see Pentium10's reply here). When you need a city name, you search for the ZIP and extract the city/state text. To speed things up, you would sort the table in numeric order by ZIP, build an index of lines, and use a binary search.
If your ZIP database looked like this (from sourceforge):
"zip code", "state abbreviation", "latitude", "longitude", "city", "state"
"35004", "AL", " 33.606379", " -86.50249", "Moody", "Alabama"
"35005", "AL", " 33.592585", " -86.95969", "Adamsville", "Alabama"
"35006", "AL", " 33.451714", " -87.23957", "Adger", "Alabama"
The most simple-minded extraction from the text would go something like
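// lookup() is assumed to return the matching line of the file, or false.
$zipLine = lookup($ZIP);
if ($zipLine) {
    // Split on commas, then strip the padding and surrounding quotes.
    $fields = array_map(function ($f) { return trim($f, " \"\r\n"); }, explode(",", $zipLine));
    $city = $fields[4];
    $state = $fields[5];
} else {
    die("$ZIP not found");
}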
If you are just playing with text in PHP, that's all you need. But if you have a database application, you would do everything in SQL. Further details on your application may elicit more detailed responses.
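The sort-then-binary-search idea described above can be sketched in Python with the standard bisect module. The sample rows are the ones shown earlier; the function names are made up. ZIP codes are compared as strings, which matches numeric order only because they are fixed-width 5-digit values.

```python
import bisect
import csv
import io

SAMPLE = '''"zip code", "state abbreviation", "latitude", "longitude", "city", "state"
"35004", "AL", " 33.606379", " -86.50249", "Moody", "Alabama"
"35005", "AL", " 33.592585", " -86.95969", "Adamsville", "Alabama"
"35006", "AL", " 33.451714", " -87.23957", "Adger", "Alabama"
'''

def load_table(text):
    """Parse the file into rows sorted by ZIP, plus a parallel list of keys."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header line
    rows = sorted((row[0], row[4], row[5]) for row in reader if row)
    keys = [row[0] for row in rows]
    return keys, rows

def city_state(keys, rows, zip_code):
    """Binary-search the sorted keys; return (city, state) or None."""
    i = bisect.bisect_left(keys, zip_code)
    if i < len(keys) and keys[i] == zip_code:
        return rows[i][1], rows[i][2]
    return None

keys, rows = load_table(SAMPLE)
print(city_state(keys, rows, "35005"))  # ('Adamsville', 'Alabama')
```

For a file this size a plain dictionary keyed by ZIP would work just as well; the binary search is shown only because it is the approach the answer describes.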

DopeWars codebase - Where are the main calculations taking place?

I'm not really a C/C++ person, so I was hoping someone could direct me to the files that contain the main calculations of the game.
I am specifically interested in how things are calculated when deciding if the person 'wins' or 'loses' (generally speaking) during events like running/standing/etc.
In other words, winning/losing will be based on many factors: what are they? What are the formulae?
You didn't reference the source, so I Googled DopeWars and found this:
http://dopewars.sourceforge.net/
Looking into the source, serverside.h/c seems to be what you are looking for. But keep in mind a lot of the limits are already predefined in dopewars.c. Take a look at the drug prices in this struct:
struct DRUG DefaultDrug[] = {
  /* The names of the default drugs, and the messages displayed when they
   * are specially cheap or expensive */
  {N_("Acid"), 1000, 4400, TRUE, FALSE,
   N_("The market is flooded with cheap home-made acid!")},
  {N_("Cocaine"), 15000, 29000, FALSE, TRUE, ""},
  /* ... remaining entries omitted ... */
};
Note: sample struct is not complete. Please review the source to see the full listing.
The actual functionality that validates the actions chosen by the player exists in serverside.c.
It is up to the "server" (the game engine) to validate the player's choice, decide the next step to be taken, and communicate it back to the client. The client in this case can be a GUI or a Curses (command line) client. It is the client's responsibility to update the screen and gather new input (typed characters or mouse clicks).