I have a couple of questions about images, since I don't know what is better for my purposes. This might also be helpful for other people, because I couldn't find this info in other questions.
Although this is an ASP.NET Core 2.0 application, the first question is a general one about images.
QUESTION 1
When I have images that I want to reload every time, I usually add a query string so that browsers like Chrome or IE don't serve the cached copy they already have. In my case I append the current time's ticks to the URL of the image; this loads the image every time, since the query string is always different:
filePath += "?" + DateTime.Now.Ticks;
But in my case I have a panel where the administrators of the page can change a lot of images. The problem is that when they change those images, if there is no query string, users will keep seeing the old image stored in their browser cache.
The question is: if I add a query string to many images, isn't that bad for performance? Is there any other solution for this?
QUESTION 2
I also have photos of the users and other images stored in the site. When I show an image, all visitors of the site can see its path (for example: www.site.com/user_files/user_001/photo001.jpg).
Is there a way to hide those paths, or transform them into something else, in ASP.NET Core 2.0?
Thanks a lot.
Using something like ticks will get the job done, but in a very naive way. You're going to put more stress both on your server and the clients, since effectively the image will have to be refetched every single time, regardless of whether it has changed or not. If you will have any mobile users, the situation is far worse for them, as they'll be forced to redownload all these resources over and over, usually over limited (and costly) data plans.
A far better approach is to use a cryptographic digest, often called a "hash". Essentially, the same data run through the same hash algorithm will always return the same hash. Hashes are usually used to detect tampering with transmitted data, but since each message will (generally) have a unique hash, and that hash will be the same each time for the same piece of data, you can also use one to generate a cache-busting query string that only changes when the image data itself changes.
Now, to be thorough, there's technically no guarantee that two messages won't produce the same hash. Instances where that occurs are called "collisions", and they can happen. However, if you use a sufficiently strong algorithm like SHA-256, the likelihood of collisions is greatly reduced. In any case, it's not a real concern for this particular use case of cache-busting images.
Simplistically, to create the hash, you simply do something like:
// requires: using System.Security.Cryptography;
string hash;
using (var sha256 = SHA256.Create())
{
    hash = Convert.ToBase64String(sha256.ComputeHash(imageBytes));
}
The value of hash then will be something like z1JZs/EwmDGW97RuXtRDjlt277kH+11EEBHtkbVsUhE=.
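For illustration only (the app itself is C#), the same digest can be computed in a few lines of Python; the input bytes here are just a stand-in for the image data:

```python
import base64
import hashlib

def cache_busting_hash(image_bytes: bytes) -> str:
    """Return a base64-encoded SHA-256 digest suitable for a cache-busting query string."""
    digest = hashlib.sha256(image_bytes).digest()
    return base64.b64encode(digest).decode("ascii")

# The same bytes always produce the same hash, so the query string
# only changes when the image itself changes.
print(cache_busting_hash(b"hello"))  # LPJNul+wow4m6DsqxbninhsWHlwfp0JecwQzYpOLmCQ=
```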
However, ASP.NET Core has an ImageTagHelper built-in that will handle this for you. Essentially, you just need to do:
<img src="/path/to/image.jpg" asp-append-version="true" />
As for your second question, about hiding or obfuscating the image path, that's not strictly possible, but can be worked around. The URL you use to reference the image uniquely identifies that resource. If you change it in any way, it's effectively not the same resource any more, and thus, would not locate the actual image you wanted to display. So, in a strict sense, no, you cannot change the URL. However, you can proxy the request through a different URL, effectively obfuscating the URL for the original image.
Simply, you'd have an action on some controller that takes an image path (as part of the query string), loads it from the filesystem, and returns it as a response. Care should be taken to limit the scope of files that can be returned like this, both by directory (only allow your image directory, for example, not C:\Windows\, etc.) and by file type (only allow images to be returned, not random text files, config files, etc.). That portion is straightforward enough, and you can find many examples online if you need them.
Ultimately, this doesn't really solve anything, though, because now your image path is simply in the query string instead. However, now that you've set this part up, you can encrypt that part of the query string using the Data Protection API. There's some basic getting started information available in the docs. Essentially, you're just going to encrypt the image path when creating the URL, and then in your action that returns the image, you decrypt the path first before running the rest of the code. For the encryption part, you can create a tag helper to do this for you without having to have a ton of logic in your views.
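The directory and file-type checks described above could be sketched like this (Python for illustration; the root path and extension list are hypothetical):

```python
import os

# Hypothetical settings; a real app would read these from configuration.
IMAGE_ROOT = "/var/www/app/user_files"
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}

def resolve_image_path(requested):
    """Resolve a client-supplied relative path, refusing anything that
    escapes the image directory or is not an image file."""
    candidate = os.path.realpath(os.path.join(IMAGE_ROOT, requested))
    inside_root = candidate.startswith(IMAGE_ROOT + os.sep)
    is_image = os.path.splitext(candidate)[1].lower() in ALLOWED_EXTENSIONS
    return candidate if inside_root and is_image else None
```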
Related
We are developing a server with a REST API, which accepts and responds with JSON. The problem is how to upload images from the client to the server.
Note: I am also talking about a use case where the entity (user) can have multiple files (carPhoto, licensePhoto) as well as other properties (name, email, ...), but when you create a new user, you don't send these images; they are added after the registration process.
These are the solutions I am aware of, but each of them has some flaws:
1. Use multipart/form-data instead of JSON
good : POST and PUT requests are as RESTful as possible, they can contain text inputs together with file.
cons : It is not JSON anymore, which is much easier to test, debug, etc. compared to multipart/form-data
2. Allow to update separate files
The POST request for creating a new user does not allow adding images (which is ok in our use case, as I said at the beginning); uploading pictures is done by a PUT request as multipart/form-data to, for example, /users/4/carPhoto
good : Everything (except the file uploading itself) remains in JSON, it is easy to test and debug (you can log complete JSON requests without being afraid of their length)
cons : It is not intuitive; you can't POST or PUT all variables of the entity at once. Also, the address /users/4/carPhoto can be considered more of a collection (the standard use case for a REST API looks like /users/4/shipments). Usually you can't (and don't want to) GET/PUT each variable of an entity, for example users/4/name. You can get the name with GET and change it with PUT at users/4. If there is something after the ID, it is usually another collection, like users/4/reviews
3. Use Base64
Send it as JSON but encode files with Base64.
good : Same as the first solution; the service stays as RESTful as possible.
cons : Once again, testing and debugging are a lot worse (the body can contain megabytes of data), and there is an increase in size and processing time on both the client and the server
I would really like to use solution no. 2, but it has its cons... Can anyone give me better insight into which solution is best?
My goal is to have RESTful services following as many standards as possible, while keeping everything as simple as I can.
OP here (I am answering this question after two years; the post made by Daniel Cerecedo was not bad at the time, but web services are evolving very fast)
After three years of full-time software development (with a focus also on software architecture, project management and microservice architecture) I would definitely choose the second way (but with one general endpoint) as the best one.
If you have a special endpoint for images, it gives you much more power over handling those images.
We have the same REST API (Node.js) for both mobile apps (iOS/Android) and the frontend (using React). This is 2017, so you don't want to store images locally; you want to upload them to some cloud storage (Google Cloud, S3, Cloudinary, ...), and therefore you want some general handling for them.
Our typical flow is that as soon as you select an image, it starts uploading in the background (usually a POST to an /images endpoint), returning the ID after the upload. This is really user-friendly, because the user chooses an image and then typically proceeds to fill in other fields (i.e. address, name, ...), so by the time they hit the "send" button, the image is usually already uploaded. They don't have to wait, staring at a screen that says "uploading...".
The same goes for fetching images. Especially because of mobile phones and limited mobile data, you don't want to send original images; you want to send resized ones, so they don't take up as much bandwidth (and to make your mobile apps faster, you often want an image that fits your view exactly, so the client doesn't have to resize it at all). For this reason, good apps use something like Cloudinary (or we have our own image server for resizing).
Also, if the data is not private, you send the app/frontend just a URL and it downloads the image from cloud storage directly, which is a huge saving of bandwidth and processing time for your server. In our bigger apps there are many terabytes downloaded every month; you don't want to handle that directly on each of your REST API servers, which are focused on CRUD operations. You want to handle it in one place (our image server, which has caching etc.) or let cloud services handle all of it.
Small 2023 update: if possible, put a CDN in front of the pictures; it will usually save you a lot of money and make the pictures even more available (i.e. no issues when traffic peaks happen).
Cons : The only "con" you should think about is "unassigned images". The user selects an image and continues filling in other fields, but then says "nah" and closes the app or tab, while you have meanwhile successfully uploaded the image. This means you have an uploaded image that is not assigned anywhere.
There are several ways of handling this. The easiest one is "I don't care", which is a relevant one if this doesn't happen very often, or if you actually want to store every image users send you (for whatever reason) and don't want any deletion.
Another one is easy too: you have a cron job that, say, every week deletes all unassigned images older than one week.
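A minimal sketch of the selection logic such a cron job would run (Python for illustration; the names and data shapes are assumptions, and a real job would query your storage):

```python
from datetime import datetime, timedelta

def find_orphaned_images(images, assigned_ids, now=None):
    """Return IDs of uploaded images that were never assigned to any
    entity and are older than one week.
    `images` maps image ID -> upload timestamp."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(weeks=1)
    return [
        image_id
        for image_id, uploaded_at in images.items()
        if image_id not in assigned_ids and uploaded_at < cutoff
    ]
```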
There are several decisions to make:
The first about resource path:
Model the image as a resource on its own:
Nested in user (/user/:id/image): the relationship between the user and the image is made implicitly
In the root path (/image):
The client is held responsible for establishing the relationship between the image and the user, or;
If a security context is being provided with the POST request used to create an image, the server can implicitly establish a relationship between the authenticated user and the image.
Embed the image as part of the user
The second decision is about how to represent the image resource:
As Base 64 encoded JSON payload
As a multipart payload
This would be my decision track:
I usually favor design over performance unless there is a strong case for it. It makes the system more maintainable and can be more easily understood by integrators.
So my first thought is to go for a Base64 representation of the image resource because it lets you keep everything JSON. If you chose this option you can model the resource path as you like.
If the relationship between user and image is 1 to 1, I'd favor modeling the image as an attribute, especially if both data sets are updated at the same time. In any other case you can freely choose to model the image either as an attribute, updating it via PUT or PATCH, or as a separate resource.
If you choose a multipart payload, I'd feel compelled to model the image as a resource on its own, so that other resources (in our case, the user resource) are not impacted by the decision to use a binary representation for the image.
Then comes the question: is there any performance impact in choosing base64 vs multipart? We might think that exchanging data in multipart format should be more efficient, but this article shows how little the two representations differ in size.
My choice Base64:
Consistent design decision
Negligible performance impact
As browsers understand data URIs (base64 encoded images), there is no need to transform these if the client is a browser
I won't cast a vote on whether to have it as an attribute or standalone resource, it depends on your problem domain (which I don't know) and your personal preference.
Your second solution is probably the most correct. You should use the HTTP spec and mimetypes the way they were intended and upload the file via multipart/form-data. As far as handling the relationships, I'd use this process (keeping in mind I know zero about your assumptions or system design):
POST to /users to create the user entity.
POST the image to /images, making sure to return a Location header to where the image can be retrieved per the HTTP spec.
PATCH to /users/4/carPhoto and assign it the ID of the photo given in the Location header of step 2.
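The three steps could be sketched against a tiny in-memory stand-in for the API (Python, purely illustrative; no HTTP involved, and all names are assumptions):

```python
import itertools

# In-memory stand-in for the API, just to illustrate the sequence:
# 1. POST /users  2. POST /images (returns a Location)  3. PATCH the user.
users, images = {}, {}
next_id = itertools.count(1)

def post_user(data):
    user_id = next(next_id)
    users[user_id] = dict(data, carPhoto=None)
    return user_id

def post_image(blob):
    image_id = next(next_id)
    images[image_id] = blob
    return f"/images/{image_id}"          # equivalent of the Location header

def patch_user_car_photo(user_id, location):
    users[user_id]["carPhoto"] = location

user_id = post_user({"name": "Ada"})
location = post_image(b"\xff\xd8...jpeg bytes...")
patch_user_car_photo(user_id, location)
```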
There's no easy solution. Each way has its pros and cons. But the canonical way is the first option: multipart/form-data. As the W3C recommendation says:
The content type "multipart/form-data" should be used for submitting forms that contain files, non-ASCII data, and binary data.
We aren't really sending forms, but the implicit principle still applies. Using base64 as a binary representation is incorrect because you're using the wrong tool to accomplish your goal; on the other hand, the second option forces your API clients to do more work in order to consume your API service. You should do the hard work on the server side in order to supply an easy-to-consume API. The first option is not easy to debug, but once you get it working, it probably never changes.
Using multipart/form-data, you stick with the REST/HTTP philosophy. You can view an answer to a similar question here.
Another option is mixing the alternatives: you can use multipart/form-data but, instead of sending every value separately, send a single value named payload with the JSON payload inside it. (I tried this approach using ASP.NET Web API 2 and it works fine.)
I have an optimization question.
Let's say that I'm making a website, and it has a JSON file with 5,000 pairs (about 582 kB); through a combination of 3 sliders and some select tags it is possible to display every value, so the time to switch between pairs should be in microseconds.
My question is: if the website is also meant to run on mobile browsers, where is it more efficient to have the 5,000 pairs of data, in a JSON file or in a database, and why?
I am building a photo site with similar requirements, and I can say after months of investigation and experimentation that there is no easy answer to that question. But I will try to give you some hints:
Try to divide the data in chunks, for example - if your sliders are selecting values between 1 through 100, instead of delivering exactly what the client selected, round up a bit maybe +-10 or maybe more, that way you can continue filtering on the client side without a server roundtrip. Save all data in client memory before querying.
Don't render more than what is visible on the screen, JSON storage and filtering is fast but DOM is very slow, minimize the visible elements.
Use 304 caching, meaning: whenever the client requests the same data twice, send a proper 304 response with an ETag. A good rule of thumb here is to derive the ETag from something you can compute very easily, like the max ID in the database, to see whether any data has been updated since the last call. If not, just send 304 and the client will use whatever it had last time.
Use absolute positioning. Don't even try to use the CSS float or something like that, it will not work. Just calculate each position of each element. This will also help you to achieve tip nr 2 (by filtering out all elements that are outside of the visible screen). You can still use CSS transitions which gives nice animations when they change sliders.
You could experiment with IndexedDB to help with the client-side querying, but unfortunately support across browsers is still not good enough, plus you hit a ceiling on storage; better to use the ordinary cache with proper headers.
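The 304/ETag tip could be sketched like this (Python for illustration; deriving the ETag from the database's max ID, as suggested above, with a stand-in for the real query):

```python
def load_data():
    return [{"id": i} for i in range(3)]   # stand-in for the real query

def handle_request(request_etag, current_max_id):
    """Sketch of conditional-GET handling: the ETag is derived from
    something cheap to compute, e.g. the max ID in the database."""
    etag = f'"{current_max_id}"'
    if request_etag == etag:
        return 304, None, etag             # client's cache is still valid
    return 200, load_data(), etag          # send fresh data + new ETag
```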
Good Luck!
A database like MongoDB would be good for this. It still uses the JSON syntax for storage so you can use the values from the JSON file. The querying is very fast too and you wouldn't have to parse the JSON file and store it in an object before using it.
Given the size of the data (just 582 kB), I would opt for the JSON file.
The drawback is that you pay a penalty at app startup while loading the data into memory, but after that all queries run very fast in memory, which is a big advantage.
You need to weigh how many accesses (queries) your app will make to the database against loading the file just once. And think about whether your main target is mobile browsers or PCs.
For this volume of data I wouldn't try a database (another process consuming resources); just measure how many resources (time, memory) are needed to load the JSON file.
If the data is going to grow... then you will need to rethink this, or maybe split your json file following some criteria.
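The load-once-then-query-in-memory approach could be sketched as follows (Python for illustration; the data shape is an assumption):

```python
import json

# Hypothetical data shape: a list of {"key": ..., "value": ...} pairs,
# here generated in place of reading the real 582 kB file from disk.
RAW = json.dumps([{"key": i, "value": i * i} for i in range(5000)])

# Parse once at startup; every subsequent query runs against memory.
PAIRS = json.loads(RAW)

def query(min_key, max_key):
    """Filter the in-memory pairs, e.g. for a slider-selected range."""
    return [p for p in PAIRS if min_key <= p["key"] <= max_key]
```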
I want to store a large number of unique, indexable urls in mysql. They will be written by the user and will have a really broad acceptance, so almost anything goes there.
But I don't like the already-answered SO question, nor its predecessor. Actually I think there might be a "better" way of doing this, but I don't completely see its implications, which is why I'm asking this question.
What are the advantages/disadvantages of using tinyurl.com's API for this purpose? In this link we can see an article about how to do this. It's just this code (could be even shorter but I prefer it in a function):
function createTinyUrl($strURL)
{
    // urlencode() so query strings and special characters in $strURL survive the round trip
    return file_get_contents("http://tinyurl.com/api-create.php?url=".urlencode($strURL));
}
Then I would only have to compare the returned alias (e.g. tinyurl.com/example) to check the uniqueness of the link. From the Wikipedia article we can read:
If the URL has already been requested, TinyURL will return the existing alias rather than create a duplicate entry.
Disadvantages (basically same ones as using any centralized system):
TinyURL's servers could go down temporarily or forever. Just in case, the full URL will be stored in another, non-indexed field.
Non-reversible. You can get a tinyurl from your URL, but going the other way is more difficult. However, the solution to the first problem also solves this.
In my case, the urls are NOT private, but this could prevent someone from using this solution.
Delay. The requests might be a bit slower, but since it's server-to-server communication, I don't think this will be noticeable. I will do some benchmarks, though.
Any other disadvantage that you can see using this method? The main reason for using it is not to have duplicated entries and to make them indexable.
As the user eggyal suggested, a hash-based index over the full URI is really similar to using the TinyURL API, except I don't need to send data anywhere. I will use that instead.
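The hash-based index could be sketched as follows (Python for illustration; SHA-256 is one reasonable choice of algorithm):

```python
import hashlib

def url_index_key(url):
    """Fixed-length key for a unique index over arbitrarily long URLs.
    The full URL itself is stored in a separate, non-indexed column."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

# Identical URLs always map to the same key, so a UNIQUE constraint on
# the hash column rejects duplicates without indexing the long URL itself.
```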
NOTE: this answer is here only to add closure to the question
Some setup:
We have some static images, publicly available. However, we want to be able to reference these images with alternative URLs in the image tag.
So, for example, we have an image with a URL like:
http://server.com/images/2/1/account_number/public/assets/images/my_cool_image.jpg
And, we want to insert that into our front html as:
<img src="http://server.com/image/2/my_cool_image.jpg">
instead of
<img src="http://server.com/images/2/1/account_number/public/assets/images/my_cool_image.jpg">
One really neat solution we've come up with is to use 301 redirects. Now, our testing has rendered some pretty neat results (all current generation browsers work), but I am wondering if there are caveats to this approach that I may be missing.
EDIT: To clarify, the reason we want to use this approach is that we are also planning to use an external host to serve resources, and we want to be able to turn this off on occasion. So, perhaps the URL in the img tag would be
http://client.com/image/3/cool_image.jpg
in addition to the "default" way of accessing
Another technique that doesn't require a server round-trip is to use your HTTP server's equivalent of a "rewrite engine". This allows you to specify, in the server configuration, that requests for one URL should be satisfied by sending the results for some other URL instead. This is more efficient than telling the browser to "no, go look over there".
One possible downside may be SEO - a long and complex file name with generic folder names may hinder a shorter/snappier URL.
The main issue though is that there will be an extra HTTP request for every image. In general you should try and minimise HTTP requests to improve performance. I think you should rewrite the URLs to the longer versions behind the scenes as others have said.
You don't have to rewrite the entire URL with the final URL. If you want a little more flexibility, you could rewrite the images like this:
http://server.com/image/2/my_cool_image.jpg
Rewritten as:
http://server.com/getImage?id=2&name=my_cool_image.jpg
Then the "getImage" script would read a config file and either serve up the longer-named file on server.com or the other one from client.com. You only make one trip to the server, with a tiny bit of overhead on the server itself, unnoticeable to the visitor.
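The config-driven lookup that "getImage" script would perform could be sketched like this (Python for illustration; the templates and flag name are assumptions):

```python
# Hypothetical config flag: whether to pull assets from the external host.
CONFIG = {"use_external_host": False}

LOCAL_TEMPLATE = "http://server.com/images/{image_id}/1/account_number/public/assets/images/{name}"
EXTERNAL_TEMPLATE = "http://client.com/image/{image_id}/{name}"

def resolve_source(image_id, name):
    """Decide where the getImage script should fetch the file from,
    based on a config flag that can be toggled at any time."""
    template = EXTERNAL_TEMPLATE if CONFIG["use_external_host"] else LOCAL_TEMPLATE
    return template.format(image_id=image_id, name=name)
```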
The only real downside is a second DNS lookup and a little server overhead to calculate the redirect, which both affect performance. Otherwise I can't think of any problem with this technique.
A slug is the part of a URL that describes or titles a page, and is usually keyword-rich for that page, improving SEO. E.g. in this URL, PHP/JS - Create thumbnails on the fly or store as files, the last section "php-js-create-thumbnails-on-the-fly-or-store-as-files" is the slug.
Currently I am storing the slug for each page with the page's record in the DB. The slug is generated from the Title field when the page is generated and stored with the page. However, I'm considering generating the slug on the fly in case I want to change it. I'm trying to work out which is better and what others have done.
So far I've come up with these pro points for each one:
Store slug:
- "Faster" processor doesn't need to generate it each time (is generated once)
Generate-on-the fly:
- Flexible (can adjust slug algorithm and don't need to regen for whole table).
- Uses less space in DB
- Less data transferred from DB to App
What else have I missed and how do/would you do it?
EDIT:
I'd just like to clarify what looks like a misunderstanding in the answers. The slug has no effect on landing on the correct page. To understand this just chop off or mangle any part of the slug on this site. e.g.:
PHP/JS - Create thumbnails on the fly or store as files
PHP/JS - Create thumbnails on the fly or store as files
PHP/JS - Create thumbnails on the fly or store as files
will all take you to the same page. The slug is never indexed.
You wouldn't need to save the old slugs. If you land on a page with an "old slug", you can detect that and just do a 301 redirect to the correctly slugged one. In the examples above, if Stack Overflow implemented this, then when you landed on any of the links with truncated slugs, it would compare the slug in the URL to the one generated by the current slug algorithm and, if different, do a 301 redirect to the same page with the new slug.
Remember that all internally generated links would immediately be using the new algorithm and only links from outside pointing in would be using the old slug.
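The detect-and-redirect logic could be sketched like this (Python for illustration; the slug algorithm shown is a hypothetical stand-in for whatever the site actually uses):

```python
import re

def slugify(title):
    """Hypothetical current slug algorithm: lowercase, with runs of
    non-alphanumerics collapsed into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def respond(requested_slug, title):
    """Return (status, canonical_slug): serve the page if the slug matches
    the current algorithm, otherwise 301-redirect to the canonical one."""
    canonical = slugify(title)
    if requested_slug == canonical:
        return 200, canonical
    return 301, canonical    # old or truncated slugs get redirected
```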
Wouldn't changing the slugs for existing pages be a really bad idea? It would break all your inlinks for a start.
Edit, following Guy's clarification in the question: You still need to take old slugs into account. For instance: if you change your slug algorithm Google could start to see multiple versions of each page, and you could suffer a duplicate content penalty, or at best end up sharing PR and SERPs between multiple versions of the same page. To avoid that, you'd need a canonical version of the page that any non-canonical slugs redirected to - and hence you'd need the canonical slug in the database anyway.
You might need to take another thing into consideration, what if you want the user/yourself to be able to define their own slugs. Maybe the algorithm isn't always sufficient.
If so you more or less need to store it in the database anyhow.
If not, I don't think it matters much; you could generate them on the fly, but if you are unsure whether you'll want to change them, keep them in the database. In my eyes there is no real performance issue with either method (unless the on-the-fly generation is very slow or something like that).
Choose the one which is the most flexible.
For slug generation I don't think that generation time should be an issue, unless your slug algorithm is insanely complicated! Similarly, storage space won't be an issue.
I would store the slug in the database for the simple reason that slugs usually form part of a permalink and once a permalink is out in the wild it should be considered immutable. Having the ability to change a slug for published data seems like a bad idea.
The best way to handle slugs is to store only the speaking part of the slug in the database, and keep the routing part with the unique identifier for dynamic generation. Otherwise (if you store the whole URL or URI in the database), it might become a massive task to rewrite all the slugs in the database if you change your mind about how to name your routes.
Let's take this question's SO slug as an example:
/questions/807195/should-i-create-a-slug-on-the-fly-or-store-in-db
it's:
/route/unique-ID/the-speaking-part-thats-not-so-important
The dynamic part is obviously:
/route/unique-ID/
And the one I would store in the database is the speaking part:
the-speaking-part-thats-not-so-important
This allows you to always change your mind about the route's name and do the proper redirects without having to look inside the database first, and you're not forced to make DB changes. The unique ID is always your data's unique ID in the database, so you can identify the record correctly, and you of course know what your routes are.
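That split, where the route and unique ID drive the lookup and the speaking part is ignored, could be sketched as (Python for illustration; the route shape mirrors the SO example above):

```python
import re

def parse(path):
    """Extract the unique ID from a /questions/<id>/<slug> style path;
    the trailing speaking part plays no role in the lookup."""
    m = re.match(r"^/questions/(\d+)(?:/.*)?$", path)
    return int(m.group(1)) if m else None

def canonical_path(question_id, speaking_part):
    """Rebuild the canonical URL from the ID and the stored speaking part."""
    return f"/questions/{question_id}/{speaking_part}"
```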
And don't forget to set the canonical tag. If you look at this page's source, it's there:
<link rel="canonical" href="http://stackoverflow.com/questions/807195/should-i-create-a-slug-on-the-fly-or-store-in-db" />
This allows search engines to identify the correct page link and ignore others in case you have duplicate content.