Summing from large public dataset

I'm working on a web app that will let users explore some data from a public API. The idea is to let the user select a U.S. state and some other parameters, and I'll give them a line chart showing, for example, what percentage of home loans in that state were approved or denied over time.
I can make simple queries along these lines work with a small number of rows, but these are rather large datasets, so I'm only seeing a sliver of the whole. Asking for all the data produces an error. I think the solution is to aggregate the data, but that's where I start getting 400 Bad Request responses from the server.
For example, this is an attempt to summarize 2008 California data to give the total number of applications per approval category:
https://api.consumerfinance.gov/data/hmda/slice/hmda_lar.json?$where=as_of_year=2008,state_abbr="CA"&$select=action_taken,SUM(action_taken)&group=action_taken
All summary variations produce a 400 error. I can't find an example that is very similar to what I'm trying to do, so any help is appreciated.
Publisher's information is here:
http://cfpb.github.io/api/hmda/queries.html

Sorry for the delay, but it's worth noting that this API is based on Qu, CFPB's open-source query platform, and not on Socrata, so you can't use the same SoQL query language with it.
It's unclear how best to reach them for help, but there is a "Contact Us" email link on the docs page and an issues forum on GitHub.
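For anyone landing here later, here is a rough sketch of how the request in the question might be restated. It assumes Qu accepts `$select`, `$where` and `$group` clauses, an `AND` operator inside `$where`, and a `COUNT()` aggregate; none of that has been verified against the live service, so check it against the publisher's query documentation linked in the question. The helper name `build_summary_url` is made up for illustration.

```python
from urllib.parse import urlencode, quote

BASE_URL = "https://api.consumerfinance.gov/data/hmda/slice/hmda_lar.json"


def build_summary_url(year: int, state: str) -> str:
    """Build a grouped-count query URL.

    Assumption: the clause names and COUNT() follow Qu's query syntax,
    not SoQL. Verify them against the publisher's docs before relying on this.
    """
    params = {
        # One row per approval category, with the number of applications in it.
        "$select": "action_taken,COUNT()",
        # Conditions joined with AND instead of a comma.
        "$where": f'as_of_year={year} AND state_abbr="{state}"',
        # Every clause carries the leading "$", including the grouping clause.
        "$group": "action_taken",
    }
    # Keep "$", "," and parentheses readable; percent-encode everything else.
    return BASE_URL + "?" + urlencode(params, safe="$,()", quote_via=quote)


url = build_summary_url(2008, "CA")
print(url)
```

Counting rows per category, rather than summing the category code itself, is what gives the number of applications per approval outcome.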

Related

Why can I only get 250 rows (shoe transactions) of StockX data when accessing their API?

I have tried to gather data directly from the StockX API, which seemed possible according to an article from January 2019: https://medium.com/#thewillmundy/stockx-sneaker-data-in-three-simple-steps-8977d0016b80 . With it I am able to get a request URL that returns some transactions in JSON format.
I have tried changing the parameters in the request URL (limit as well as page). That works, but only for the latest 250 transactions. Because of the high sales volume for some shoes, I can only retrieve the sales history for the last few days.
My goal: getting the whole sales history (often several thousand transactions). In the article mentioned above, that is possible.
Could it be a restriction from StockX?
Or is there a way around it?
I would be very grateful for any help!
Best regards, Marvin

I think the API will only give you the 250 most recent sales because that's all the product webpage itself will allow you to load when you click view all sales. Any sales further back in time aren't directly accessible from the product page, and we're essentially requesting the same data that page can request using the link it would use. I guess those are stored and accessed in a different way internally.
I'm guessing StockX changed its API since that article is a little old. I would try to contact StockX about their API via email, but I don't think they're really continuing developer support:
https://twitter.com/stockx/status/1000004306844647424?lang=en
It's pretty disappointing because I was also looking to work with the sales data but what can you do :/

Open Payments - How to reconcile or explain differences between API data versus live site visual data

We are connected to the API for https://openpaymentsdata.cms.gov/. In some cases the data points differ (for example - quantity of general payments) between the raw API data versus what is seen on the live site.
What is the source for these data differences?
Which is the correct data - the API or the live site? Best case scenario is the API is the correct data and the live site wasn't updated yet.
We'd like to reconcile this to pull the correct data, but if it can't be reconciled through the API then how can it best be explained? It seems almost like a dispute on the transaction(s) was initiated which changed one of the data points. If a dispute is initiated by the profile owner, does the API data change to reflect any updates?
This is my first attempt to reach out for an idea how these data differences occur.
Here's a random example. The profile URL: https://openpaymentsdata.cms.gov/physician/209169/
The quantity of general payments shown on the live site is: 139
The quantity of general payments pulled from the API is: 151

I see the issue here. Based on the API call https://openpaymentsdata.cms.gov/resource/bqf5-h6wd.json?physician_profile_id=209169&$$app_token=oXbsFwj7KElCMesuRAZEfTDfB&$select=total_amount_of_payment_usdollars,number_of_payments_included_in_total_amount&$limit=50000&$offset=0 you get a field called "number_of_payments_included_in_total_amount" which has a number associated with it. Most of these are "1", but the 136th entry has "13" assigned to it. You are summing across the field which equals 151.
However, the OpenPayments website appears to only be counting entries/rows. There are 139 entries/rows in the results.
Unfortunately, I don't know which one is accurate. It could be a misconfiguration of the OpenPayments website or it could be a misinterpretation of what number_of_payments_included_in_total_amount means. But, at least, it explains the difference in how you get your counts.
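To make the two counts concrete, here is a small sketch using made-up records shaped like the description above: 139 rows, all with a value of "1" except the 136th, which has "13". The values are assumed to arrive as strings, and the data is illustrative, not the real API response.

```python
FIELD = "number_of_payments_included_in_total_amount"

# Illustrative stand-in for the API response: 138 rows of "1" ...
records = [{FIELD: "1"} for _ in range(138)]
# ... plus one row of "13" placed as the 136th entry (index 135).
records.insert(135, {FIELD: "13"})

# What the website appears to show: the number of rows.
row_count = len(records)

# What summing the field gives: the number of underlying payments.
payment_sum = sum(int(record[FIELD]) for record in records)

print(row_count, payment_sum)
```

The gap between the two numbers is exactly the 12 extra payments bundled into that single row.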

GoogleBetterAds - violatingSites.list - google-apis-explorer

I can get a list of summaries of violating sites, using the following link:
https://developers.google.com/ad-experience-report/[...]/violatingSites/list
My questions:
Is this list exhaustive?
If not, is it possible to get an exhaustive list (or not) and how?
Is it possible to know how these websites are pulled (the share of websites analysed, etc)?

- Is this list exhaustive?
What is the size of the response you actually get back from the API?
If the response keeps growing, with new data added on each new request, you can assume you have the exhaustive list (allowing for some update latency).
If the response always has the same size but different data, meaning old entries disappear and are replaced by new ones, then it is not exhaustive.
- If not, is it possible to get an exhaustive list (or not) and how?
I have no idea at the moment; the total number of websites could be in the billions...
- Is it possible to know how these websites are pulled (the share of websites analysed, etc)?
I have no idea about this either. I think it is either a confidential process or it is described in the terms and conditions and only subtly in the documentation...

Counting views of any element on website

I am using the following MySQL query to count views:
UPDATE content SET views=views+1 WHERE id='$id'
For example, if I want to check how many times a single page has been viewed, I just put the query at the top of the page code. Unfortunately, I always get a count about 5-10x bigger than the results in Google Analytics.
If I am correct, one refresh should increase the value in my database by 1. Don't "Views" in Google Analytics work the same way?
Say Google Analytics reports that a single page has been viewed 100 times and my database says it was viewed 450 times. How could such a simple query generate an additional 350 views? And I don't mean visits or unique visits, just regular views.
Is it possible that Google Analytics interprets such data a little differently and my database result is correct?

There are quite a few reasons why this could be occurring. The most usual culprit is bots and spiders. As soon as you use a third-party API like Google Analytics, or Facebook's API, you'll get their bots making hits to your page.
You need to examine each request in more detail. The user agent is a good place to start, although I do recommend researching this area further - discriminating between human and bot traffic is quite a deep subject.
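As a starting point for the user-agent check, here is a minimal sketch. The marker list and the function name `looks_like_bot` are illustrative assumptions, not an authoritative bot list. Real bot detection needs much more than substring matching, since user agents are easy to fake.

```python
# Substrings that commonly show up in crawler user agents (illustrative only).
BOT_MARKERS = ("bot", "crawler", "spider", "slurp", "facebookexternalhit")


def looks_like_bot(user_agent: str) -> bool:
    """Return True when the user agent is empty or contains a known marker."""
    ua = user_agent.lower()
    return not ua or any(marker in ua for marker in BOT_MARKERS)


samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "",
]
for ua in samples:
    print(looks_like_bot(ua), repr(ua[:40]))
```

One way to use it would be to skip the view-counter update whenever the check returns True, then compare the remaining count with Google Analytics again.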

In Google Analytics the data is reported by the user's browser. For example:
A user views a page on your domain, and their browser is then in charge of communicating the pageview to Google. If something fails along the way, the data will not be included in the reports.
Your SQL counter, on the other hand, is log-based analytics: the data is collected by your own system, which reduces data collection failures.
Seen that way, Google Analytics can miss data on slow connections, for users that don't execute JavaScript (ad blockers or bots), or when the HTML page is not properly rendered***.
Still, 5x is a huge discrepancy. In my experience the discrepancy is usually around 8-25%. (That was tested at the transaction level; for pageviews it may be more.)
What I recommend is:
Save the device and browser information, the IP address, and any other metadata that could be useful, and don't forget the timestamp. That way you can isolate the problem: maybe it is robots or users with ad blockers, or in the worst case your tracking code is not properly implemented (located in the footer, for example).
*** I added this because I once had a huge discrepancy that turned out to be a server error: the HTML was not properly printed, so the user was shown an empty response. MySQL was not fast enough to save the information and process the HTML. I noticed it when a stress test (via Screaming Frog) showed a lot of 5xx errors. (WordPress blog with no cache.)

Instagram API Follower List Export

Does anybody know of a simple way to use the Instagram API to export a list of a user's followers? I'm looking to be able to take a list of people that follow me, and a list of people that follow another user, then find all of the common followers, and finally pick a certain number of random usernames from the common followers. I know it sounds a bit complex, but mostly I just need a way to get the list of followers, and I should be able to figure out the rest. A CSV File would probably do what I need if possible. So does anyone know how to do this, or have a tutorial you could direct me to? Thanks

Yes, there is a way to do this. But you can no longer do it using the Instagram API. The Instagram API is no longer accepting new applications and it is being shut down in stages starting on the 31st of July 2018:
https://developers.facebook.com/blog/post/2018/01/30/instagram-graph-api-updates/
However, there is another way to obtain the data. You can export the followers of any Instagram account into a CSV file using this website:
www.magimetrics.com
It is very easy to use Magi Metrics. Sign up for an account, search for the username of the Instagram account you want to export and choose the option 'Export followers'. Magi Metrics will do all the work and will email you the CSV file once it's done. It's free to use for exports less than 100 rows, and you can pay to upgrade to do more.
Using Magi Metrics you can export a list of all the people that follow you, and all the people who follow another user. You can then take these two CSV files and copy them into Microsoft Excel.
In the first column you will find the Instagram numerical ID, which is unique to each user. If you apply the COUNTIF formula to this column, you can identify the Instagram users who are in both lists:
http://spreadsheetpro.net/comparing-two-columns-unique-values/
Finally, to choose a random subset from the list of Instagram users, you can use this method:
https://www.extendoffice.com/documents/excel/4487-excel-generate-random-number-from-list.html
Using these steps you can achieve what you want: a random subset from the list of Instagram accounts who both follow your account and another user!
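If you prefer a script to a spreadsheet, the same two steps (find the IDs present in both exports, then pick a random subset) can be sketched in Python. The column names `id` and `username` and the inline sample data are assumptions for illustration; adjust them to whatever the exported CSV files actually contain.

```python
import csv
import io
import random


def read_followers(csv_text: str) -> dict:
    """Map each follower's numeric ID to a username (column names assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["id"]: row["username"] for row in reader}


# Tiny made-up exports standing in for the two CSV files.
my_followers = read_followers("id,username\n1,alice\n2,bob\n3,carol\n")
their_followers = read_followers("id,username\n2,bob\n3,carol\n4,dave\n")

# IDs present in both exports (the COUNTIF step in the spreadsheet version).
common_ids = sorted(my_followers.keys() & their_followers.keys())

# Pick a random subset without repeats; never ask for more than exist.
how_many = 1
picked_ids = random.sample(common_ids, k=min(how_many, len(common_ids)))
picked_usernames = [my_followers[follower_id] for follower_id in picked_ids]

print(common_ids, picked_usernames)
```

For real files, replace the inline strings with the contents read from each CSV file. Matching on the numeric ID rather than the username avoids mismatches when someone renames their account.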
I appreciate this question dates from 2016, so your need has probably long gone. But if anyone else on the Internet faces the same problem, I've posted this answer to help them. Good luck!

You're in luck. I found a tool that does this easily without using the Instagram API.
You can give it a try: InsExport
How does it work?
1 Enter an Instagram username
2 Select the export type (follower or following)
3 Click the Export button
https://chrome.google.com/webstore/detail/insexport-get-instagram-f/okmokimdgjhndamggnkdojhbofdmepno?hl=zh-CN&authuser=0

There are multiple ways to do this. Programmatically, you can easily request a list of up to 100 of your own followers. With one token, that's the maximum per request.
Here's the full documentation from Instagram: https://www.instagram.com/developer/endpoints/users/
Automated services will get a list of all your followers by requesting them 100 at a time using multiple tokens, getting around that 5,000 request limit (https://www.crowdbabble.com/download-all-instagram-followers/).
