I can get a list of summaries of violating sites, using the following link:
https://developers.google.com/ad-experience-report/[...]/violatingSites/list
My questions:
Is this list exhaustive?
If not, is it possible to get an exhaustive list (or not) and how?
Is it possible to know how these websites are pulled (the share of websites analysed, etc)?
- Is this list exhaustive?
What is the size of your actual API response?
If the response keeps getting longer, with new data added on each request, you can assume you have the exhaustive list (with a possible update latency).
If the response always has the same size but different data (for example, old entries disappear and are replaced by new ones), it is not exhaustive.
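If it helps, here is a rough sketch of that snapshot comparison in Python. The endpoint URL and the "violatingSites"/"reviewedSite" field names are assumptions on my part, so check them against the Ad Experience Report API reference before relying on this.

    # Rough sketch of the snapshot comparison described above. The endpoint URL and
    # the "violatingSites"/"reviewedSite" field names are assumptions -- verify them
    # against the Ad Experience Report API reference.
    import requests

    ENDPOINT = "https://adexperiencereport.googleapis.com/v1/violatingSites"

    def fetch_violating_sites(api_key):
        resp = requests.get(ENDPOINT, params={"key": api_key})
        resp.raise_for_status()
        return {s.get("reviewedSite") for s in resp.json().get("violatingSites", [])}

    def compare_snapshots(old, new):
        # Entries only ever added -> the list may be exhaustive (with update latency).
        # Old entries rotating out at a roughly constant size -> it is only a sample.
        print("added:", len(new - old), "removed:", len(old - new), "total:", len(new))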
- If not, is it possible to get an exhaustive list (or not) and how?
I have no idea at the moment; the total number of websites could be in the billions...
- Is it possible to know how these websites are pulled (the share of websites analysed, etc)?
I have no idea about this either; I suspect the process is either confidential or described in the general conditions and, subtly, in the documentation...
Short version:
What is the proper way to list/query files by suffix: "fullText contains 'ext'", "fileExtension = 'ext'", or "title contains 'ext'"? These do not always return the same results; only one of them is documented (the first), and it's not consistent.
Long version:
I've been developing Google Drive apps for years. Every now and then I have to change my list queries to get the correct results. My application needs to find files with a certain suffix. The official documentation indicates that I need to use the "fullText contains 'ext'" syntax, but sometimes this fails to find some files. At one time I switched to the undocumented "fileExtension = 'ext'" syntax, but again after some time I found files that wouldn't show up and went back to fullText searches. However, I've again seen files not showing up with that search and tried using "title contains 'ext'" (or v3 "name contains 'ext'"), which seems to work, but for how long? I don't like using undocumented queries which might suddenly stop working.
I feel like I'm going in circles since I don't know why fullText fails (and only for some users, and when it does work I've seen the parents field come up empty sometimes...which doesn't happen with other queries) or why the title search works (not documented to search suffixes...and I'm pretty sure it didn't used to work). I might just perform all three searches, but this affects performance, and the "or" keyword with some combinations of those three searches returns no results at all.
My application has thousands of files, each with multiple revisions, in hundreds of folders, and each folder is shared with dozens of users whose permissions change regularly as people are added to and removed from projects. There are hundreds of different owners of the individual files. I suspect this complexity and the time it takes to propagate permission and file changes affects my queries, but that doesn't explain why one search would work and another wouldn't, or why the information returned on a file in one query would differ from another. That is, even after several days the problem doesn't correct itself, and often a file must be removed and re-uploaded for everyone to see it. I have experienced slow updates to metadata for shared files resulting in mismatches between metadata, files, and search results, but I take all of that into account and still have queries which simply won't work properly.
Maybe I'm expecting too much from a free API? Overall I'm very happy with what I can do, but it can be very frustrating when it's not working and you know you're doing it right! :)
You can search or filter files with the 'files.list' or 'children.list' methods of the Drive API. These methods accept the 'q' parameter, which is a search query.
For more information, see: https://developers.google.com/drive/v3/web/search-parameters
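For illustration, here is a minimal sketch (Python, google-api-python-client, Drive v3) that runs the three query variants discussed above side by side; `service` is assumed to be an already-authorized Drive client, and 'ext' is a placeholder suffix.

    # Compare the three query variants against the Drive v3 files.list endpoint.
    # Assumes `service` is an authorized client, e.g. build("drive", "v3", credentials=creds).
    QUERIES = [
        "fullText contains 'ext'",   # the documented full-text search
        "fileExtension = 'ext'",     # undocumented, but often works
        "name contains 'ext'",       # v3 name search (v2: title contains)
    ]

    def list_by_query(service, q):
        files = []
        request = service.files().list(
            q=q, fields="nextPageToken, files(id, name, parents)")
        while request is not None:
            resp = request.execute()
            files.extend(resp.get("files", []))
            request = service.files().list_next(request, resp)
        return files

    for q in QUERIES:
        print(q, "->", len(list_by_query(service, q)), "files")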
6.30.15 - How can I make this question better and more helpful to others? Feedback would be helpful. Thanks!
I am developing a web application that will handle/manage a very large data set. Currently, any kind of heavy load causes the browser to lock up, whether I'm in the Django REST Framework API or in the Dojo/dgrid. This is kind of a dual question.
I've researched and can't find a clear way to do this on either side.
How do I limit how much data the database sends at one time to Django REST Framework and/or the Dojo dgrid? The dgrid pulls its data from the DRF API, and DRF pulls data directly from the MySQL database.
If I can control how much data is sent at one time, then hopefully it won't lock up the browser as much. Any suggestions, advice, help, or examples would be very welcome. Thanks in advance!
UPDATED 6.22.15 -
Alright, I finally got the pagination to work and it displays the limit/offset in the headers. :) I can also see the data in the response headers. However, the grid won't populate and I keep getting this odd error:
TypeError: transform(...) is null
return transform(value, key).toString();
instrum...tion.js (line 20)
I've gotten this error before, but I've never been able to find a solution to it. After researching, there's not much I can find on how to fix it or even what it is. Any help with this would be greatly appreciated! I'm so close to getting this thing to work correctly after weeks of beating my head against a wall. Please help! :) Thanks in advance!
2nd update - This was an answer from a previous post, but I'm still not sure how to fix it. When I addressed another issue, the error went away for a while, but I still have no idea how to correct it.
Problem 3: "transform(...) is null return transform(value, key).toString();"
This sounds largely tangential to the original issue, but the most common cause is a widget template that is referencing a property via ${...} that doesn't actually exist in the widget.
I don't know how to answer this on the layer between DRF and the database, but as discussed in other SO questions like this one, DRF allows you to limit the amount of data sent with requests via page or offset/limit parameters.
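For instance (a minimal sketch; the page size of 25 is an arbitrary choice), DRF's built-in limit/offset pagination can be switched on globally in settings:

    # settings.py -- enable limit/offset pagination globally for DRF.
    REST_FRAMEWORK = {
        "DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.LimitOffsetPagination",
        "PAGE_SIZE": 25,  # default page size when the client omits ?limit=
    }
    # Requests then look like /api/items/?offset=0&limit=25, and DRF slices the
    # queryset server-side, so the browser only receives one page at a time.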
Based on the phrasing of your question, it sounds like the client side is actually requesting too much data. I'll outline how the flow should work, so hopefully you can spot what you've missed:
A dgrid instance is set up with a collection referencing a dstore/Rest instance
The dstore/Rest instance is created with appropriate properties set. In this case, based on the DRF Documentation:
useRangeHeaders: false (this is already the default)
rangeStartParam: 'offset'
rangeCountParam: 'limit'
As a result, when the grid renders, you should see requests sent to your server, e.g. endpoint?offset=0&limit=25. If you don't see those two parameters, that would be why you're getting too much data.
The server will need to query the database with the respective OFFSET and LIMIT
The server must provide a response with the expected number of items (unless it reaches the end of the data set first, which should be reflected by the total property in the response, presuming the customization in the previous SO answer I linked is used); a rough server-side sketch follows after this list.
Ultimately, if the service is working as expected, the grid should only be requesting a handful of items at a time, and should only be firing one or two requests at any given time.
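Here is a hedged sketch of that server-side shaping with a custom DRF pagination class. The key names ("total", "items") are assumptions; match them to whatever your dstore/Rest collection is actually configured to parse.

    # A sketch of returning the current slice plus a total count, which is the kind
    # of response shape the dstore customization mentioned above expects.
    # The "total"/"items" key names are assumptions -- adjust to your client config.
    from rest_framework.pagination import LimitOffsetPagination
    from rest_framework.response import Response

    class DgridPagination(LimitOffsetPagination):
        default_limit = 25

        def get_paginated_response(self, data):
            return Response({
                "total": self.count,  # total rows matching the query, not just this page
                "items": data,        # the current offset/limit slice
            })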
Would add as a comment, but not enough reputation at the moment ....
Your question is pretty general, but one strategy would be to allow the user to select the number of items they wish to view at a time and then allow the user to page through the data with 'next x items' and 'prev x items' buttons. Your data object query would then use the current position +/- 'x' as the index range to reduce the number of data objects sent to the browser. This is the basic flow for Ebay, Amazon, Google, or any site with thousands of items to display. The 'next' and 'prev' actions could be wired as POST requests.
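A tiny sketch of that offset bookkeeping (names are arbitrary):

    # The client keeps a current offset and a user-chosen page size; each
    # 'next'/'prev' click just shifts the offset before requesting the next slice.
    def page_params(offset, page_size, direction):
        if direction == "next":
            offset += page_size
        elif direction == "prev":
            offset = max(0, offset - page_size)
        return {"offset": offset, "limit": page_size}

    # e.g. page_params(50, 25, "next") -> {"offset": 75, "limit": 25}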
When I use the search functionality of the Scribd docs API to search for a phrase, like
http://api.scribd.com/api?method=docs.search&api_key=API_KEY&query=hello+world
It returns irrelevant results that differ from those of the site's own search. This request, for example, returns results about Guitar Hero, World of Warcraft, Virtual Worlds, etc., whereas the site search on https://www.scribd.com/search-documents?query=hello+world gives documents titled "Hello World", as you would expect. Is there a parameter I can add to the API call that will make it return relevant results?
You may try playing with the simple parameter to see if it makes any difference to your queries. According to the API reference (half of it is inaccessible at the moment) it makes the results the same as for the website:
(optional) This option specifies whether or not to allow advanced search queries (more information). When set to false, the API search behaves the same as the search on Scribd.com. When set to true, the API search allows advanced queries that contain filters such as title:"A Tale of Two Cities". Set to "true" by default.
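For reference, this is roughly what the call looks like with simple set to false (a sketch only; the endpoint and parameters come from your question and the quoted reference):

    # Sketch of docs.search with simple=false, per the API reference quoted above.
    import requests

    resp = requests.get("http://api.scribd.com/api", params={
        "method": "docs.search",
        "api_key": "API_KEY",   # your real key here
        "query": "hello world",
        "simple": "false",      # supposed to make results match Scribd.com search
    })
    print(resp.text)  # inspect the raw response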
I tried your query myself; setting simple to false changes the results a bit, but they are still not adequate. Even if you run their sample queries 1:1, they still give 90% irrelevant results.
Then I found a similar issue discussed in the following Google Groups thread from back in 2011. At the end, Jared Friedman (the CTO of Scribd) himself admits that the API search and the website search work differently and that fixing this is not among their priorities. In 2014 another developer complained. It seems to me that, four years later, this is still the case.
I'd suggest contacting Scribd support directly and asking what the current status of the docs.search API is and whether there is some preliminary approval process in place (for example, they might vet accounts and only then provide relevant results, returning just test results for any query otherwise), although I doubt it.
I'm new to geocoding so I'm not certain this is even the question I should be asking, but all of the other discussions I've seen on this topic (here and on the Google API forum) are so application specific that I feel like I might be missing a very elementary step - I don't need to know how to implement a store finder - I need to know if I should.
Here is my specific situation - I have been contracted to design an application wherein we will build a database of shops (say, independently owned bars and pubs). This list will continually grow and change as shops close and new ones open. The user can enter his/her point of origin (zip code or address) and be shown a list or map containing all the various shops within a given radius in order of proximity.
I know how to deliver these results from a static database:
One would store the longitude and latitude as columns for each row and then just use that information to check distances.
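To make that concrete, here is roughly what I mean (a sketch only; table/column names are placeholders):

    # Roughly what I mean by "just use that information to check distances":
    # a plain haversine calculation over rows that already have lat/lng columns.
    import math

    def haversine_km(lat1, lng1, lat2, lng2):
        r = 6371.0  # mean Earth radius in km
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlmb = math.radians(lng2 - lng1)
        a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def shops_within_radius(shops, origin, radius_km):
        # `shops` is assumed to be rows with "lat"/"lng" fields already populated.
        lat0, lng0 = origin
        nearby = [(haversine_km(lat0, lng0, s["lat"], s["lng"]), s) for s in shops]
        return sorted((p for p in nearby if p[0] <= radius_km), key=lambda p: p[0])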
But I have inherited an (already fairly large) database of shops which have addresses but not coordinates, so I'm not sure of the best way to get coordinates for those addresses. I could write a script to query them one at a time against Google geocoding, I could have a data-entry person manually look up the coordinates for each one and populate the data that way, or maybe there is a third option I'm not aware of.
Is this the right place to be asking this question? Google Maps Geocoding doesn't host a forum of its own, but refers people to Stack Overflow. Other forums on the net dealing with this topic all relate to a specific technical question, but no one seems to be talking about it from a top-down perspective (i.e. the big picture).
Google imposes a limit of 2,500 queries per day on free users and 100,000 queries per day on paid ones; neither of these seems up to the task for a site with even moderate traffic if, every time a user makes a request, the entire database (perhaps thousands of shops) is being checked against Google's data. It seems certain we must store the coordinates locally, but even then there will have to be checks against Google in order to plot them on a map. If I had a finite number of locations (if, for example, I had six hardware shops) and wanted to make a store locator, there would be a wealth of discussions, tutorials, and Stack Overflow questions to point the way for me, but I'm dealing with a potentially vast number of records and not sure how to proceed or where to begin.
Any advice would be welcome - Additionally, if this is not the best place to be asking this question, a helpful response would be to indicate a better place to post it. I've searched for three days but haven't found what looks like a good resource for asking such subjective questions.
The best way, of course, would be to use a geocoding service to get the coordinates and store them in your DB. But that's not possible with Google's geocoding service, because it is not permitted to store the geocoded data permanently.
There are free services without this restriction; some keywords to search for: MapQuest, Nominatim, GeoNames (but these services are less accurate than Google).
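As a rough illustration (a sketch only; the shops_missing_coordinates and save_coordinates helpers are placeholders for your own DB code), a one-at-a-time backfill against Nominatim could look like this. Check their usage policy: it asks for an identifying User-Agent and roughly one request per second.

    # Sketch: geocode existing shop addresses one at a time via Nominatim (OSM).
    import time
    import requests

    def geocode(address):
        resp = requests.get(
            "https://nominatim.openstreetmap.org/search",
            params={"q": address, "format": "json", "limit": 1},
            headers={"User-Agent": "shop-finder-backfill/0.1 (contact@example.com)"},
        )
        resp.raise_for_status()
        results = resp.json()
        return (float(results[0]["lat"]), float(results[0]["lon"])) if results else None

    for shop in shops_missing_coordinates:        # placeholder: however you load these
        coords = geocode(shop["address"])
        if coords:
            save_coordinates(shop["id"], *coords)  # placeholder for your DB update
        time.sleep(1)  # stay well within the public usage policy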
Another option would be to use a FusionTable. The geocoding would run automatically (but the daily limits are the same as for the geocoding service). The benefit: the geocoding is permanent, although you can't access the locations directly (e.g. by downloading a DB dump); you may still use the coordinates for plotting markers (via a FusionTablesLayer) or for filtering (e.g. by distance).
The number of entries shouldn't be an issue; 100k rows is no problem for a database.
Let's say we have a site where we have a list of items. On each of these items you can start a couple of different processes that will result in some kind of output related to the item in question. How should you design for the most appropriate use of the HTTP verbs? What I would like is multiple links per item, with each link triggering one of the actions, but in my scenario that doesn't match the GET verb, which is what would be used if I use links. On the other hand, I don't want buttons that each sit in a separate form with different actions.
It's somewhat hard to explain, but hopefully you understand; there should be some best practices to apply here.
You should NOT use GET. GET requests should be safe which means they are intended only for information retrieval and should not change the state of the server. (i.e. things like logging are OK, but things that actually update the state of the application are a no-no.) Think of a crawler going over your application. Anything you wouldn't mind a crawler going through is fine for GET, but that doesn't sound like your situation (because you said, "start a couple of different processes", but I could be misinterpreting your use case).
That leaves PUT, DELETE and POST. PUT and DELETE must be idempotent, meaning that multiple identical requests should have the same effect as a single request. For example, if you have a request that updates a person's name, calling it once or 100 times leaves the name the same, so it is idempotent.
POST is the most flexible verb. If the processes you are kicking off are not safe or idempotent (or even if they are) you can use POST, which simply doesn't guarantee anything about safety or idempotency. The disadvantages there are:
If you use POST when GET is more semantically correct, it is less communicative of the intent of your request, since POST usually means you are sending a payload.
You can't take advantage of the web's caching infrastructure, which is what makes it so scalable.
In the past, I have used POST with query args to specify custom actions. It made sense in my use case because most of my custom actions needed to pass a payload. Since you do not want to use buttons, you can use GET with query args to specify the different actions, but you have to be very careful that the action you are taking does not have any side effects and is idempotent. As noted in the comment by @jhericks below, there are many things in the network that assume GETs are safe and may repeat them.
From a pure RESTful perspective, though, this is not ideal. Your items will have a specific URI, and a GET on that URI will return the item's representation. Running actions on the item is effectively a change in the state of the item's representation, and that should be done with a POST (or a PUT, depending on who you ask and whether your web server supports PUT). In real life, though, using query args is an easy workaround and may make sense for your use case.
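One concrete way to spell "a POST per action on the item" is a sketch using Django REST Framework's extra actions; the reprocess name and the start_reprocess_job helper are made up for illustration.

    from rest_framework import viewsets
    from rest_framework.decorators import action
    from rest_framework.response import Response

    class ItemViewSet(viewsets.ViewSet):
        def retrieve(self, request, pk=None):
            # GET /items/{pk}/ -- safe: just returns the item's representation.
            return Response({"id": pk})

        @action(detail=True, methods=["post"])
        def reprocess(self, request, pk=None):
            # POST /items/{pk}/reprocess/ -- starts the (non-safe) process for this item.
            job_id = start_reprocess_job(pk)  # placeholder for whatever kicks off the work
            return Response({"job": job_id}, status=202)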
I'm not sure I fully understand your question.
But here's a quick paragraph which might help you.
REST is about making smart clients and simple servers. GET, PUT and DELETE represent the basic operations of file access at the lowest level. What you should be doing is completely ignoring anything the server can offer and offloading that work onto the clients.
So the question is: why is the server being triggered to do many things? Why can't the client do all of these things itself?
Mike Brown