Store URLs in MySQL using TinyURL or similar

I want to store a large number of unique, indexable URLs in MySQL. They will be entered by users, and almost anything has to be accepted, so the input is really broad.
But I don't like the already-answered SO question or its predecessor. Actually, I think there might be a 'better' way of doing this, but I don't completely see its implications, which is why I'm asking this question.
What are the advantages/disadvantages of using tinyurl.com's API for this purpose? In this link we can see an article about how to do this. It's just this code (it could be even shorter, but I prefer it as a function):
function createTinyUrl($strURL)
{
    // urlencode() so query strings and special characters in $strURL
    // don't break the API request
    return file_get_contents("http://tinyurl.com/api-create.php?url=" . urlencode($strURL));
}
Then I would only have to compare the returned alias (e.g. tinyurl.com/example) to check the uniqueness of the link. From the Wikipedia article we can read:
If the URL has already been requested, TinyURL will return the existing alias rather than create a duplicate entry.
Disadvantages (basically the same as with any centralized system):
TinyURL's servers could go down, temporarily or forever. Just in case, the full URL will be stored in another, non-indexed field.
Not reversible. You can get a TinyURL from your URL, but the other way around is more difficult. However, the solution to the first problem also solves this one.
Privacy. In my case the URLs are NOT private, but this could stop someone else from using this solution.
Delay. The requests might be a bit slower, but since it's server-to-server communication, I don't think it will be noticeable. I will run some benchmarks, though.
Any other disadvantages you can see with this method? The main reasons for using it are avoiding duplicate entries and keeping the URLs indexable.

As the user eggyal suggested, a hash-based index of the full URI is really similar to using the TinyURL API, except that I don't need to send data anywhere. I will use that, then.
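For closure, here is a minimal sketch of what that looks like (the table, column names and connection string are my own illustration, using the MySql.Data connector): the full URL lives in a non-indexed TEXT column, and a fixed-width hash of it carries the UNIQUE index, playing the role the TinyURL alias would have played.
// Hypothetical schema:
//   CREATE TABLE urls (
//     url      TEXT NOT NULL,              -- full URL, not indexed
//     url_hash BINARY(20) NOT NULL UNIQUE  -- SHA-1 of url, indexed
//   );
using MySql.Data.MySqlClient;

string fullUrl = "http://example.org/some/very/long/path?q=1";
using var connection = new MySqlConnection("Server=localhost;Database=mydb;Uid=me;Pwd=...");
connection.Open();

using var cmd = connection.CreateCommand();
// INSERT IGNORE skips the row if the hash already exists, which gives the
// same "no duplicate entries" behavior as the TinyURL API.
cmd.CommandText = "INSERT IGNORE INTO urls (url, url_hash) VALUES (@url, UNHEX(SHA1(@url)))";
cmd.Parameters.AddWithValue("@url", fullUrl);
cmd.ExecuteNonQuery();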
NOTE: this answer is here only to add closure to the question

REST: Creating a JSON-based query: which HTTP method to use?

I have a question about RESTful services. In REST, the POST method is used to create an entity.
And GET is used to query entities. Right?
As I read in other posts, HTTP does not allow sending a GET request with a body.
But when I want to send JSON to make a query, what is the best way? Are there any best practices, and how do you solve such JSON queries?
Thanks for your answers.
In REST the POST method is used to create an entity. And GET is used to query entities. Right?
Not really. GET is used to fetch representations of resources. POST is deliberately vague -- anything not worth standardizing can use POST.
when I want to send Json to make a query, what is the best way?
There is no best way to do it, just trade-offs.
The basic plot of HTTP is that you GET representations of resources. If the resource you want doesn't exist, you create a new one. So the "REST" flow would look something like sending a request to the server to create a "the answer to my query" resource, and then using GET to obtain the current representation of that resource. Which is great, because we can fetch the latest representation of that resource any time we're worried that our copy is out of date. Other people with the same query can use the same resource, so we can use a general-purpose cache to take on a lot of the work. The end result is "web scale".
OK, not that great, because we learned that sending information over insecure channels is a bad idea; but we can put a general-purpose caching proxy in front of our server, and get some scale that way.
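A rough sketch of that flow with HttpClient (the /queries endpoint and the JSON payload are invented for illustration):
// Requires: using System; using System.Net.Http; using System.Text;
var client = new HttpClient { BaseAddress = new Uri("https://api.example.org") };

// 1. Ask the server to create a "the answer to my query" resource.
var created = await client.PostAsync("/queries",
    new StringContent("{\"status\":\"open\"}", Encoding.UTF8, "application/json"));

// 2. GET its representation; this request is safe, repeatable and cacheable.
var results = await client.GetStringAsync(created.Headers.Location);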
But "create a new resource" is a lot of ceremony when you only expect to need the query once.
Creating a new resource was using POST in this situation anyway, so why not return a representation of the solution right away? And the answer is: go right ahead! That works great... but doesn't give you any cache support at all. You are effectively performing a remote call under the guise of modifying a resource.
Also, POST doesn't promise idempotent semantics -- on an unreliable network, requests can get lost, and general purpose components won't know that in this particular case it is harmless to just repeat the same request.
PUT has idempotent semantics... but it also has very specific opinions about the contents of the payload that don't match "query" at all.
You can dig through other standardized methods, but there aren't really any good fits. The only methods that are close are SEARCH and REPORT, which are coupled to WebDAV semantics.
You can invent your own non-standard method; but general-purpose components won't understand it.
You can standardize a new method with the semantics you need, but that's a lot of work.
Or you can just use POST.
Remember, the web took over the world using nothing more than GET and POST. So it's probably fine.
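For comparison, a sketch of that "just use POST" approach; one round trip, no intermediate resource (endpoint and JSON are again invented):
// Requires the same usings as the sketch above.
var response = await client.PostAsync("/products/search",
    new StringContent("{\"color\":\"red\"}", Encoding.UTF8, "application/json"));
string resultsJson = await response.Content.ReadAsStringAsync();
// One request, one response -- but no general-purpose cache can help you here.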

REST and Can Delete/Update etc. actions

I am writing a REST API. However, one of the requirements is to allow the caller to determine if an action may be performed (so that, for example, a button can be enabled or disabled, etc.)
The action might not be allowed for several reasons: perhaps user permissions, but possibly because, for example, you can't delete a shared object, or you can't create an item with the same name as another item, or any of an array of other business rules.
All the logic to determine whether something can be deleted should live in the back end, but the front end must reflect it in the GUI.
I am trying to find the right pattern for this in REST, and am coming up a bit short. I could create a parallel API, so that for every entity endpoint there is an EntityPermissions endpoint, but that seems like overkill. I could also add an HTTP header indicating that the request should only check permissions, not actually perform the action, but that seems a bit dubious and likely to mess up the HTTP cache.
Can anyone point me to a common pattern for doing something like this? Does it have a name? Or a web page that discusses it? I'm sure everyone has their own ideas on this (like my dumb ideas), but this seems to be a common enough requirement that I figure there must be an established pattern for it. But Google didn't help much.
There are going to be multiple opinionated answers to this. I'll share mine. It might not be the best fit for your problem, but it's a valid solution.
If you followed the real definition of REST, you would be building a hypermedia/HATEOAS-style web service. URLs would not be hardcoded; they would be discovered, and available actions would be advertised by the existence of a link.
If an action may not be performed, you simply omit the link. A user who fetches the resource sees exactly the actions that are available, right there.
A popular format for hypermedia APIs is HAL. You can decorate the links further with more information using HTTP link hints.
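To illustrate (the resource and relation names are invented), a HAL response for an item the current user may update but not delete could look like the following; the client decides which buttons to enable purely from which links are present:
{
  "name": "Quarterly report",
  "_links": {
    "self":   { "href": "/items/42" },
    "update": { "href": "/items/42" }
  }
}
The absence of a "delete" link is itself the signal that deleting is not allowed, so no parallel permissions API is needed.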
If this is the first time you've heard of hypermedia APIs, there may be a bit of a learning curve, but the results of learning this can be very beneficial.

ASP.NET Core 2 Images

I have a couple of questions about images, since I don't know what is better for my purposes. This might also be helpful for other people, because I couldn't find this info in other questions.
Well, although this is an ASP.NET Core 2.0 application, the first question is a general question about images.
QUESTION 1
When I have images that I want to reload every time, I usually add a query string so that browsers like Chrome or IE don't use the cached image they have. In my case I add the time ticks to the URL of the image; this way it loads the image every time, since the query string is always different:
filePath += "?" + DateTime.Now.Ticks; // e.g. photo.jpg?636424... (ticks value illustrative)
But in my case I have a panel where the administrators of the site can change a lot of images. The problem: when they change those images, if there is no query string, users will keep seeing the old image stored in their browser cache.
The question is: isn't adding a query string to many images bad for performance? Is there any other solution for this?
QUESTION 2
I also have photos of the users and other images stored on the site. When I show an image, all visitors of the site can see its path (for example: www.site.com/user_files/user_001/photo001.jpg).
Is there a way to hide those paths, or to transform them into something else, in ASP.NET Core 2.0?
Thanks a lot.
Using something like ticks will get the job done, but in a very naive way. You're going to put more stress both on your server and the clients, since effectively the image will have to be refetched every single time, regardless of whether it has changed or not. If you will have any mobile users, the situation is far worse for them, as they'll be forced to redownload all these resources over and over, usually over limited (and costly) data plans.
A far better approach is to use a cryptographic digest, often called a "hash". Essentially, the same data run through the same hash function will always produce the same hash. Hashes are usually used to detect tampering with transmitted data, but since each message will (generally) have a unique hash, and that hash is the same each time for the same piece of data, you can also use one to generate a cache-busting query string that changes only when the image data itself changes.
Now, to be thorough, there's technically no guarantee that two messages won't result in the same hash. Instances where that occurs are called "collisions", and they can happen. However, if you use a sufficiently complex algorithm like SHA-256, the likelihood of collisions is greatly reduced. Regardless, it should not be a real issue of concern for this particular use case of cache-busting images.
Simplistically, to create the hash, you simply do something like:
// Requires: using System; using System.IO; using System.Security.Cryptography;
byte[] imageBytes = File.ReadAllBytes("wwwroot/images/photo.jpg"); // illustrative path
string hash;
using (var sha256 = SHA256.Create())
{
    hash = Convert.ToBase64String(sha256.ComputeHash(imageBytes));
}
The value of hash then will be something like z1JZs/EwmDGW97RuXtRDjlt277kH+11EEBHtkbVsUhE=.
However, ASP.NET Core has an ImageTagHelper built-in that will handle this for you. Essentially, you just need to do:
<img src="/path/to/image.jpg" asp-append-version="true" />
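When the page renders, the helper appends a version parameter derived from a hash of the file's content, producing markup something like <img src="/path/to/image.jpg?v=Ym9lQ..."> (value shortened here), so the URL changes exactly when the image does.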
As for your second question, about hiding or obfuscating the image path, that's not strictly possible, but can be worked around. The URL you use to reference the image uniquely identifies that resource. If you change it in any way, it's effectively not the same resource any more, and thus, would not locate the actual image you wanted to display. So, in a strict sense, no, you cannot change the URL. However, you can proxy the request through a different URL, effectively obfuscating the URL for the original image.
Simply, you'd have an action on some controller that takes an image path (as part of the query string), loads it from the filesystem, and returns it as a response. Care should be taken to limit the scope of files that can be returned this way, both by directory (only allow your image directory, for example, not C:\Windows\, etc.) and by file type (only allow images to be returned, not random text files, config files, etc.). That portion is straightforward enough, and you can find many examples online if you need them.
Ultimately, this doesn't really solve anything, though, because now your image path is simply in the query string instead. However, now that you've set this part up, you can encrypt that part of the query string using the Data Protection API. There's some basic getting started information available in the docs. Essentially, you're just going to encrypt the image path when creating the URL, and then in your action that returns the image, you decrypt the path first before running the rest of the code. For the encryption part, you can create a tag helper to do this for you without having to have a ton of logic in your views.
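To tie both halves together, here is a minimal sketch of such a proxy action (the controller name, the protector "purpose" string, the user_files folder and the jpg-only check are all assumptions made to keep the example short):
// Requires: using System.IO; using Microsoft.AspNetCore.DataProtection;
//           using Microsoft.AspNetCore.Mvc;
public class ImagesController : Controller
{
    private readonly IDataProtector _protector;
    private readonly string _imageRoot =
        Path.Combine(Directory.GetCurrentDirectory(), "user_files");

    public ImagesController(IDataProtectionProvider provider)
    {
        // Must match the purpose string used when the URL was encrypted.
        _protector = provider.CreateProtector("ImageUrls");
    }

    public IActionResult Get(string token)
    {
        // Decrypt the opaque token back into the relative file path.
        string relativePath = _protector.Unprotect(token);
        string fullPath = Path.GetFullPath(Path.Combine(_imageRoot, relativePath));

        // Limit scope: only files under the image root, and only images.
        if (!fullPath.StartsWith(_imageRoot) || Path.GetExtension(fullPath) != ".jpg")
            return NotFound();

        return PhysicalFile(fullPath, "image/jpeg");
    }
}
The link you render then looks like /images/get?token=..., with the matching Protect(relativePath) call made wherever you build the URL (e.g. in the tag helper mentioned above).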

Which verb should be used for custom actions?

Let's say we have a site with a list of items. On each of these items you can start a couple of different processes that will result in some kind of output related to the item in question. How should you design this for the most appropriate use of the HTTP verbs? What I would like is multiple links per item, with each link triggering one of the actions; but in my scenario that doesn't match the HTTP verb GET, which is what links use. On the other hand, I don't want buttons that each sit in a separate form with a different action.
It's somewhat hard to explain, but hopefully you understand; there should be some best practices to apply here.
You should NOT use GET. GET requests should be safe which means they are intended only for information retrieval and should not change the state of the server. (i.e. things like logging are OK, but things that actually update the state of the application are a no-no.) Think of a crawler going over your application. Anything you wouldn't mind a crawler going through is fine for GET, but that doesn't sound like your situation (because you said, "start a couple of different processes", but I could be misinterpreting your use case).
That leaves PUT, DELETE and POST. PUT and DELETE must be idempotent, meaning that multiple identical requests should have the same effect as a single request. So if you had a request that updated a person's name, for example, if you called it once or 100 times, the person's name would still be the same, so it is idempotent.
POST is the most flexible verb. If the processes you are kicking off are not safe or idempotent (or even if they are) you can use POST, which simply doesn't guarantee anything about safety or idempotency. The disadvantages there are:
If you use POST when GET is more semantically correct, it is less communicative of the intent of your request, since POST usually means you are sending a payload.
You can't take advantage of the web's caching infrastructure, which is what makes it so scalable.
In the past, I have used POST with query args to specify custom actions. It made sense in my use case because the majority of my custom actions needed to pass a payload. Since you do not want to use buttons, you can use GET with query args to specify the different actions, but you have to be very careful that the action you are taking has no side effects and is idempotent. As noted in the comment by @jhericks below, there are many things in the network that assume GETs are safe and may repeat them.
From a pure RESTful perspective, though, this is not ideal. Your items will have a specific URI, and GET on that URI will return the item's representation. Running an action on the item is effectively a change in the state of the item's representation, and that should be done with a POST (or a PUT, depending on who you ask and whether your web server supports PUT). In real life, though, using query args is an easy workaround, and it may make sense for your use case.
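Concretely, the POST-with-query-args style looks something like this (the item id and action name are invented):
// Requires: using System; using System.Net.Http;
var client = new HttpClient { BaseAddress = new Uri("https://api.example.org") };

// The query arg names the custom action. POST makes no safety or idempotency
// promises, so intermediaries won't blindly retry or cache the request.
await client.PostAsync("/items/42?action=archive", content: null);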
I'm not sure I fully understand your question.
But here's a quick paragraph which might help you.
REST is about making smart clients and simple servers. GET, PUT, DELETE represent the basic operations of file access at the lowest level. What you should be doing is completely ignoring anything the server can offer and offloading that work onto clients.
So the question is: why is the server being triggered to do so many things? Why can't the client do all of these things itself?
Mike Brown

Generally a Good Idea to Always Hash Unique Identifiers in URL?

Most sites which use an auto-increment primary key display it openly in the URL.
i.e.
example.org/?id=5
This makes it very easy for anyone to spider a site and collect all the information by simply incrementing the value of id. I can understand how in some cases this is a bad thing, if permissions/authentication are not set up correctly and anyone could view anything by simply guessing the id, but is it ever a good thing?
example.org/?id=e4da3b7fbbce2345d7772b0674a318d5
Is there ever a situation where hashing the id to prevent crawling is bad practice (besides losing the time it takes to set up this functionality)? Or is this all a moot topic, because by putting something on the web you accept the risk of it being stolen/mined?
Generally with web-sites you're trying to make them easy to crawl and get access to all the information so that you can get good search rankings and drive traffic to your site. Good web developers design their HTML with search engines in mind, and often also provide things like RSS feeds and site maps to make it easier to crawl content. So if you're trying to make crawling more difficult by not using sequential identifiers then (a) you aren't making it more difficult, because crawlers work by following links, not by guessing URLs, and (b) you're trying to make something more difficult that you also spend time trying to make easier, which makes no sense.
If you need security then use actual security. Use checks of the principal to authorize or deny access to resources. Obfuscating URLs is no security at all.
So I don't see any problem with using numeric identifiers, or any value in trying to obfuscate them.
Using a hash like MD5 or SHA on the ID is not a good idea:
There is always the possibility of collisions. That is, two different IDs could hash to the same value.
How are you going to unhash it back to the actual ID?
A better approach if you're set on avoiding incrementing IDs would be to use a GUID, or just a random value when you create the ID.
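A quick sketch of that alternative; there is no "unhashing" problem, because the random value itself is the ID you store and look up:
// Requires: using System;
Guid publicId = Guid.NewGuid(); // random, effectively unguessable
string url = $"https://example.org/orders/{publicId:N}";
// e.g. https://example.org/orders/8f3c2e1d9a474b6e8d1f0a2b3c4d5e6f (illustrative)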
That said, if your application security relies on people not guessing an ID, that shows some flaws elsewhere in the system. My advice: stick to the plain and easy auto-incrementing ID and apply some proper access control.
I think hashing publicly accessible IDs is not a bad thing, whereas showing sequential IDs will in some cases be a bad thing. Even better, use GUIDs/UUIDs for all your IDs. In a lot of technologies you can even use sequential GUIDs, which are faster at the insert stage (though not as good in a distributed environment).
Hashing or randomizing identifiers or other URL components can be a good practice when you don't want your URLs to be traversable. This is not security, but it will discourage the use (or abuse) of your server resources by crawlers, and can help you to identify when it does happen.
In general, you don't want to expose application state, such as which IDs will be allocated in the future, since it may allow an attacker to use that prediction in ways that you didn't foresee. For example, BIND's sequential transaction IDs were a security flaw.
If you do want to encourage crawling or other traversal, a more rigorous way is to provide links, rather than relying on an implementation detail which may change in the future.
Using sequential integers as IDs can make many things cheaper on your end, and might be a reasonable tradeoff to make.
My opinion is that if something is on the web and is served without requiring authorization, it was put there with the intention that it be publicly accessible. Actively trying to make it more difficult to access seems counter-intuitive.
Often, spidering a site is a Good Thing. If you want your information available as much as possible, you want sites like Google to gather data on your site, so that others can find it.
If you don't want people to read through your site, use authentication, and deny access to people who don't have access.
Random-looking URLs only give the impression of security, without giving the reality. If you put account information (hidden) in a URL, everyone who finds that URL, including web spiders, will have access to that account.
My general rule is to use a GUID if I'm showing something that has to be displayed in a URL and also requires credentials to access or is unique to a particular user (like an order id). http://site.com/orders?id=e4da3b7fbbce2345d7772b0674a318d5
That way another user won't be able to "peek" at the next order by hacking the URL. They may be denied access to someone else's order anyway, but throwing a zillion letters and numbers at them is a pretty clear way to say "don't mess with this".
If I'm showing something that's public and not tied to a particular user, then I may use the integer key. For example, for displaying pictures, you might wish to allow your users to hack the url to see the next picture.
http://example.org/pictures?id=4, http://example.org/pictures?id=5, etc.
(I actually wouldn't do either as a simple GET parameter; I'd use mod_rewrite (or something) to make readable URLs, something like http://example.org/pictures/4 -> /pictures.php?picture_id=4, etc.)
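In Apache terms that's a one-line rewrite (matching the hypothetical pictures.php above):
RewriteEngine On
RewriteRule ^pictures/(\d+)$ /pictures.php?picture_id=$1 [L]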
Hashing an integer is a poor implementation of security by obscurity, so if that's the goal, a true GUID or even a "sequential" GUID (whether via NEWSEQUENTIALID() or COMB algorithm) is much better.
Either way, no one types URLs anymore, so I don't see much sense in worrying about the difference in length.