I am writing a web app and I would like to display timestamps on the page in the user's localtime. There seems to be several ways to do this but it is not obvious what is a good way.
Use geolocation from the IP address to get the timezone - This seems like a lot of overhead.
Use javascript to finally render the datetime on the page - This seems like a lot of complex client side javascript.
Use javascript to get the timezone and locale from the user's preferences and save it in a cookie. The server can then use this to format the date. Server side code is nice, but there don't seem to be many good ways to get this for the first page load.
Any options for http request headers? Which ones? How reliable across browsers are they?
Any advice on good ways to implement this?
Try my timeago jquery plugin. It converts UTC timestamps into fuzzy time phrases (e.g. "about 2 hours ago").
You output HTML like this:
<abbr class="timeago" title="2008-07-17T09:24:17Z">July 17, 2008</abbr>
and the plugin turns that into something like this:
<abbr class="timeago" title="July 17, 2008">about a year ago</abbr>
by using a little jQuery like this:
jQuery(document).ready(function() {
jQuery('abbr.timeago').timeago();
});
#2 is not very complex, you could even use something like http://ejohn.org/blog/javascript-pretty-date/ to display times like on this site ("2 hours ago").
If you absolutely against that, make a guess based on either JS time (seems less error-prone than guessing by IP) and let the user change it.
1) Yes, it's overhead. But you could get it once per session, and then store the offset.
2) Displaying the date with JavaScript isn't too complex. Look at the Date object.
I would make a guess at the user's timezone from his computer's local timezone (using JS), and then store that. I would also allow them to change the offset in their preferences of your web app.
Related
I have a couple of questions about images, since I don't know what is better for my purposes. Also this might me helpful for other people because I couldn't find this info in other questions.
Well, although this is an asp.net core 2.0 application the first question could is a general question about images.
QUESTION 1
When I have images that I want to load everytime I usually add a query string so the explorers like Chrome or IE don't get the chached image they have. In my case I add the time ticks to the url of the image, this way it loads the image everytime since the query string is always different:
filePath += "?" + DateTime.Now.Ticks;
But in my case I have a panel where the administrators of the page can change a lot of images. The problem, when they change those images if there is no query string the users are going to see an old image they have stored in their explorer cache.
The question is, if I add the query string to many images is not bad for the performance? is there any other solution for this?
QUESTION 2
I also have photos of the users and other images stored in the site. When I saw a image all the visitors of the site can see the path (for example: www.site.com/user_files/user_001/photo001.jpg).
Is there a way to hide those paths or transform in another thing is asp.net core 2.0?
Thanks a lot.
Using something like ticks will get the job done, but in a very naive way. You're going to put more stress both on your server and the clients, since effectively the image will have to be refetched every single time, regardless of whether it has changed or not. If you will have any mobile users, the situation is far worse for them, as they'll be forced to redownload all these resources over and over, usually over limited (and costly) data plans.
A far better approach is to use a cryptographic digest, often called a "hash". Essentially, the same data encrypted in the same way will return the same hash. It's usually used to detect tampering with transmitted data, but since each message will (generally) have a unique hash and that hash will be the same each time for the same piece of data, you can also use this to generate a cache-busting query string that only changes when the image data itself changes.
Now, to be thorough, there's technically no guarantee that two messages won't result in the same hash. Instances where that occurs are called "collisions" and they can happen. However, if you use a sufficiently complex algorithm like SHA256, the likelihood of collisions is greatly reduced. Regardless, it should not be a real issue for concern for this particular use case of cache-busting images.
Simplistically, to create the hash, you simply do something like:
string hash;
using (var sha256 = SHA256.Create())
{
hash = Convert.ToBase64String(sha256.ComputeHash(imageBytes));
}
The value of hash then will be something like z1JZs/EwmDGW97RuXtRDjlt277kH+11EEBHtkbVsUhE=.
However, ASP.NET Core has an ImageTagHelper built-in that will handle this for you. Essentially, you just need to do:
<img src="/path/to/image.jpg" asp-append-version="true" />
As for your second question, about hiding or obfuscating the image path, that's not strictly possible, but can be worked around. The URL you use to reference the image uniquely identifies that resource. If you change it in any way, it's effectively not the same resource any more, and thus, would not locate the actual image you wanted to display. So, in a strict sense, no, you cannot change the URL. However, you can proxy the request through a different URL, effectively obfuscating the URL for the original image.
Simply, you'd just have an action on some controller that takes an image path (as part of the query string), loads that from the filesystem and returns it as a response. Care should be taken limit the scope of files that can be returned like this, both based on directory (only allow your image directory, for example, not C:\Windows\, etc.) and file type (only allow images to be returned, not random text files, config files, etc.). That portion is straight-forward enough, and you can find many examples online if you need them.
Ultimately, this doesn't really solve anything, though, because now your image path is simply in the query string instead. However, now that you've set this part up, you can encrypt that part of the query string using the Data Protection API. There's some basic getting started information available in the docs. Essentially, you're just going to encrypt the image path when creating the URL, and then in your action that returns the image, you decrypt the path first before running the rest of the code. For the encryption part, you can create a tag helper to do this for you without having to have a ton of logic in your views.
I am implementing i18n in my webapp and am in the testing phase at the moment. I am using java.util.Locale on the server side to pass the locale to the relevant APIs (date time etc) that consume the information. Here is my setup:
browser language has been set to "Hindi"
operating System country has been set to "India"
I send a request to the server expecting the "Accept-Language" header to be hi-IN but the value remains hi regardless of country setting on my OS ... actual value Accept-Language:hi;en-US,en;q=0.8,q=0.6
my webapp uses the incoming value in the request header and does i18n or l10n accordingly by loading the appropriate language translation from resource files
I have a test case where I manually send in new Locale("hi", "IN") to indicate language and country. This test case prints values in the correct language as I expect but since the value coming in from the request is only hi, I am unable to see the desired result.
Not sure why the browsers (Chrome and Firefox) do not support the language_country format for all entries in their selection. Is there anything I am missing?
Edit: I made a few corrections based on the answer by #pawel-dyda. To quote a part of his reponse
Your language tag should be hi-IN, which I believe should explain the odd behaviour.
The crux of the issue (the reason I am raising this question here) is that I am unable to get my browser to send the value hi-IN to the server in the Accept-Language header.
I think you're missing few things.
Regarding to second point, setting operating system country usually doesn't affect what web browser sends on its Accept-Language list. Usually, because I can give you the counter example: Safari on Mac OS X.
There is a slight chance that it has some effect on mobile web browsers, but I haven't performed any tests myself.
In regards to points 3 and 5... Well, you gave an example of Accept-Language list. Please take a closer look on it: it contains en-US, that is English (US). Your language tag should be hi-IN, which I believe should explain the odd behaviour.
I am not sure what you meant in point 4. Not knowing the implementation details, I can only guess that you're trying to load resource files (and judging by the locale format it would be Java properties...) as well as have some defaults for things like formatting.
For properties files usually (not always!) language alone is enough. But there is a problem with formatting.
Well, most of the times you will receive merely the language and you have no choice, but to accept this fact. There are two ways to mitigate this issue:
You can implement a user profile and let user choose his/her preferred UI language and formatting settings (it is best practice to keep those separated).
You can "guess" the most likely country. In case of Hindi, it's quite obvious what will be the result of guessing. It is a bit more complicated in case of for example German, which is used in Germany ("default"), Austria and Switzerland. There are obviously many more cases, if you want to find the aid in "guessing", CLDR is the best source of information.
The best approach is to actually implement locale settings in the user profile, but use smart guessing based on data taken from CLDR; basically you combine points 1 & 2.
And don't forget about fallback! That is locale fallback (going through the list in Accept-Language header until you find something that your application supports) and resource fallback (should you have messages_fr.properties, but no messages_fr_ca.properties, but the request came as fr-CA, it makes sense to return French translations from the prior file).
By the way: you can open Firefox about:config site. It has a setting named intl.accept_languages. I bet, that if you change its contents, you'll be able to send what you want. However, as I said it is useless, because users won't change their settings...
I have an optimization question.
Lets say that I'm making a website, and it has a JSON file with 5,000 pairs (about 582 kb) and through the combination of 3 sliders and some select tags it is possible to display every value. So the time to appear between every pair is in microseconds.
My question is: If the website is also made to run on mobile browsers, where is it more efficient to have the 5000 pairs of data - in a JSON file or in the data base? and why?
I am building a photo site with similar requirements and I can say after months of investigations and experimenting that there are no easy answer to that question. But I will try to give you some hints:
Try to divide the data in chunks, for example - if your sliders are selecting values between 1 through 100, instead of delivering exactly what the client selected, round up a bit maybe +-10 or maybe more, that way you can continue filtering on the client side without a server roundtrip. Save all data in client memory before querying.
Don't render more than what is visible on the screen, JSON storage and filtering is fast but DOM is very slow, minimize the visible elements.
Use 304 caching - meaning - whenever the client is requesting the same data twice; send a proper 304 response with etag. For example - a good rule of thumb here is to use something you know very easily, like the max ID in the database or so to see if any new data has been updated since the last call. If not, just send 304 and the client will use whatever he had the last time.
Use absolute positioning. Don't even try to use the CSS float or something like that, it will not work. Just calculate each position of each element. This will also help you to achieve tip nr 2 (by filtering out all elements that are outside of the visible screen). You can still use CSS transitions which gives nice animations when they change sliders.
You could experiment with IndexedDB to help with the client side querying but unfortunately the support in different browsers are still not good enough plus you hit the roof on storage, better to use the ordinary cache and with proper headings.
Good Luck!
A database like MongoDB would be good for this. It still uses the JSON syntax for storage so you can use the values from the JSON file. The querying is very fast too and you wouldn't have to parse the JSON file and store it in an object before using it.
Given the size of the data (just 582Kb) I will opt for the Json file.
The drawback is you will have a penalty starting the app and loading the data in memory, but then all queries will run very fast in memory as a good advantage.
You need to think about how much acceses will your app do to the database (how many queries) against load the file just once. And think if your main objective are mobile browsers or pcs.
For this volume of data I wouldn't try a database (another process consuming resources), just try how much resources (time, memory) are needed to load the JSON file.
If the data is going to grow... then you will need to rethink this, or maybe split your json file following some criteria.
Horribly worded question...I know.
I'm working on an application that processes data for the previous day. The problem is that I know the customer is going to eventually ask to it for every hour or some other arbitrary time interval. I know that languages such as Java or SQL have masks for defining dates. Well what about a way to define a time interval?
Let me ask it this way. If someone asked you to create a configurable piece of software how would you allow the user to specify the time intervals?
UPDATE
What I'm looking for is a textual representation. Imagine somethign that would be put into a config file.
Here's how Google App Engine's cron API is configured, it's a bit more user-friendly than cron:
http://code.google.com/appengine/docs/python/config/cron.html#The_Schedule_Format
Python (and Java?) source to implement it should be available in the SDK. Haven't looked to see how easy it would be to extract, but it should at least provide some ideas.
I think it'd be possible to add various things to the format as required - it's structured enough to be extensible. For example it's currently missing the ability to say "every hour at 4 minutes past", which is common in UNIX cron but not really relevant to GAE because although perhaps the engine could figure out which minutes are busy and which aren't, and balance load, the user certainly can't.
Obviously the major weakness of offering something that looks like natural language to the average user, is that they'll think your code is psychic, and expect it to also understand things like "every other Wednesday except the week after Easter", or "whenever the clocks change" ;-)
I'm not sure if I understand your question correctly. Are you asking for a way to input time intervals?
I don't know your environment, but most language frameworks offer some more or less sophisticated GUI components. Something like a calendar control may be what you are looking for?
Edit: Ah, command line (or config file) it is.
Can you parse an XML file? In this case, you could just serialize a timespan (most languages have a class like this). The use may edit the XML file, typically just replacing, say, the value of the <minutes> part. Deserializing this XML file to obtain a Timespan again is easy.
If you prefer plaintext or command line input, it is a bit less easy. However, many languages support some kind of Timespan.Parse (string text, string format). You should have a look if this concept is present in your environment.
Many environment offer some kind of sscanf(), that parses an input string. Sometimes there is a "time" format as well.
If the earlier ideas don't provide a solution, you can try to parse the date using regex. There should be many resources on this topic on this site as well as the web.
If you dislike regex, split the input string according to your own format rules. That's kind of an ugly solution though.
If you don't feel like splitting user input stringy by hand, use Random.NextInt(1000) and hope nobody notices. 0:-)
When screen-scraping, what are the "gotcha"s to look out for?
The inspiration for this is: my spouse's co-worker asked me to scrape all the pages from a Blogger-hosted blog that her friend with cancer kept in her final months and this lady wanted to keep all of the posts in case the blog were ever deleted. I eventually found a free tool that was barely good enough.
One issue with scraping many Blogger pages is that there's often a navigation menu where you can click on the triangles to expand the post lists by year or month. These little buggers created insane amounts of duplicate content because you'd have the same page over and over again with different combinations of the menus being expanded/collapsed. In Blogger's case I'm not sure this is avoidable since the links are all formatted as real http links and not obvious JavaScript calls. Still, it got me thinking:
If you were to scrape a website, what kinds of potentially non-obvious things would you compensate for?
Do not use regex to scrape
While regular expressions can be good for a large variety of tasks, I find it usually falls short when parsing HTML DOM. The problem with HTML is that the structure of your document is so variable that it is hard to accurately (and by accurately I mean 100% success rate with no false positive) extract a tag.
What I recommend you do is use a DOM parser such as BeautifulSoup or equivalent (SimpleHTMLDom in PHP).
Some may think this is overkill, but in the end, it will be easier to maintain and also allows for more extensibility.
A regular expression could be devised to achieve the same goal but would be limited. For example, developing a regex to get the src and alt tag would force the alt attribute to be after the src or the opposite, and to overcome this limitation would add more complexity to the regular expression.
Also, consider the following. To properly match an <img> tag using regular expressions and to get only the src attribute (captured in group 2), you need the following regular expression:
<\s*?img\s+?[^>]*?\s*?src\s*?=\s*?(["'])((\\?+.)*?)\1[^>]*?>
And then again, the above can fail if:
The attribute or tag name is in capital and the i modifier is not used.
Quotes are not used around the src attribute.
Another attribute then src uses the > character somewhere in their value.
Some other reason I have not foreseen.
So again, simply don't use regular expressions to parse a dom document.
I screen scrape a lot. Some advice:
Emulate a User-Agent string for some browser you want to use. Different websites frequently return very different results depending on what your user agent is. If they don't recognize the User-Agent they will often revert to lowest common denominator, so it's usually best to start with some recent browser. (For example the World of Warcraft Armory returns beautiful, easy to parse XML if it thinks you're a recent Firefox. If it doesn't know what you are it sends terrible HTML).
Be polite to the site you're scraping; don't hit it too hard. Your scraper will go faster if you multi-thread it, making many requests at once, but that will annoy the site owner.
Be smart about error handling. Do not write code like while (1) { makeRequest(); }. If your code or the server throws an error a loop like this will immediately fetch another request, generating another error. It can get ugly quickly. Handle errors well and consider putting in sleeps or exits if you see a lot of errors.
When developing your parsing code, test against a cached version rather than hitting the server every time. Will make your development go faster and is the basis of a simple test suite.
First, I'd check for an RSS feed. On blogger, you just have to add /rss to the root url, if I remember correctly.
Then I'd check if there isn't already some tool to scrape blogger.
Then if there's no RSS feed, and no existing tool, I'd give up and do it by hand with copy/paste. Unless we're talking 5000 pages, it's much faster and easier that way. Take it from someone who's tried.
If you have access to the actual account, blogger has an export function.
edit: Or of course, you could try mechanical turk.
As far as gotchas are concerned..It's usually a good idea to limit the amount of requests made over a certain period of time. Smashing a site with alot of requests in a short space of time is a good way to have your requests rejected.
Aside from the technical considerations, make sure your not putting yourself at legal risk. Most large sites have specific legal language in their terms of use that disallows programmatic access to their services via an automated computer program, and also, the obvious copyright concerns.
From a technical standpoint, definitely use a DOM parser library and you'll save loads of time. Many provide the ability to read HTML into an XML structure that can be queried using XPath to find exactly what you need.
If you know someone who has access to the account, they can use Blogger's export "Export blog" feature.