I am currently scraping Instagram comments for a sentiment analysis project using an Instagram scraper. It is supposed to output a comment file but doesn't, so my workaround is to find the query URL in the log file and paste it into a browser.
An example URL would be this https://www.instagram.com/graphql/query/?query_hash=33ba35852cb50da46f5b5e889df7d159&variables={%22shortcode%22:%22CMex-IGn1G-%22,%22first%22:50,%22after%22:%22QVFCaERkTm84aWF3T1Exbmw5V0xhb05haVBEY2JaYmxhSTNGWVZ4M2RQWi0yVzVUSExlUlRYOUtsOVEtM0trRzBmSGxyYjdJV094a1hlYm1aLXZjdkVpZQ==%22}.
In Firefox I can view the JSON response, and I can download it in two ways:
CTRL + A to select all and paste into a JSON file.
Download webpage as a JSON file.
The issue is that neither of these methods retains the emoji data. The first loses the emojis: they are not stored in Unicode, but as question marks (???). I assumed this was related to the encoding, so I tried pasting the raw response into Unicode-encoded files. In those, the emojis are stored as the literal emoji characters (🙌👏😍) rather than in the Unicode format.
The second method saves either just the message {"message":"rate limited","status":"fail"} or some other incorrect format.
A few months ago I scraped some pages and managed to save the comments with the emojis stored in the Unicode format. This is frustrating because I know it can be done, but I can't remember how I did it. It would have been something basic, like the methods I have outlined.
I am out of ideas and would greatly appreciate any help. Thank you.
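One way to get emojis stored as escapes instead of literal characters or question marks is to re-serialise the JSON with ASCII-only output. This is a minimal Python sketch, assuming the response body is UTF-8 JSON and that "the Unicode format" means \uXXXX escapes. The function names and the sample payload are hypothetical and do not reflect Instagram's actual response schema.

```python
import json


def to_escaped_json(raw_body: bytes) -> str:
    """Decode a UTF-8 JSON body and re-serialise it with \\uXXXX escapes."""
    data = json.loads(raw_body.decode("utf-8"))
    # ensure_ascii=True (the default) writes every non-ASCII character,
    # emojis included, as \uXXXX escapes (a surrogate pair per emoji).
    return json.dumps(data, ensure_ascii=True, indent=2)


def save_escaped(path: str, raw_body: bytes) -> None:
    """Write the escaped JSON with an explicit encoding."""
    # An explicit encoding avoids a platform default that may replace
    # characters it cannot represent with question marks.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(to_escaped_json(raw_body))


# Hypothetical sample standing in for a saved response body.
sample = '{"text": "love it 🙌"}'.encode("utf-8")
escaped = to_escaped_json(sample)
print(escaped)
```

Loading the saved file with `json.load` turns the escapes back into the original emojis, so nothing is lost in the round trip.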
I have a file hosted on an AWS Linux AMI. The link is http://54.179.188.146/a/a.docx. I can visit the link and download the file.
I am trying to use the Microsoft Online Doc Viewer to view the Word file online at this link https://view.officeapps.live.com/op/view.aspx?src=http://54.179.188.146/a/a.docx but it returns a page stating "An error occurred. We're sorry, but for some reason we can't open this for you."
I have chmod-ed the file to 775, but it still cannot be viewed.
I uploaded the file to another server and it works there. May I know what is wrong? Is it a server configuration issue? Please advise.
Thanks.
This is old, but to give new visitors some more pointers, I am posting a consolidated answer on the root cause of the "We’re sorry, but for some reason we can’t open this for you" error in https://view.officeapps.live.com/op/view.aspx?src=
If you see the error, "We’re sorry, but for some reason we can’t open this for you," it means the document could not be found or could not be displayed. Likely reasons include:
There’s no document to be found at the URL you provided. Make sure
you provide the correct URL.
The document is too large. Word and PowerPoint documents must be less
than 10 megabytes; Excel must be less than five megabytes.
The document was not saved in a format that is supported for opening
in a web browser. Try saving your document in one of the following
formats:
Word: docx, docm, dotm, dotx
Excel: xlsx, xlsb, xls, xlsm
PowerPoint: pptx, ppsx, ppt, pps, pptm, potm, ppam, potx, ppsm
You need to sign in or provide a password to open the document. Make
the document publicly available to view.
The document’s file name contains invalid characters. Try encoding
the file name when you type the document’s URL, or rename the file to
use only letters and numbers. For example, to encode a URL that
includes an ampersand (i.e. &), you would type %26 for the ampersand
character. For more information about URL encoding, also known as
percent encoding, see Percent-encoding on Wikipedia.
More info can be found here.
The value after "src=" should be URL-encoded. See details on MS Page
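To illustrate the encoding step, here is a minimal Python sketch that percent-encodes a document URL before appending it to the viewer address. The helper name `viewer_url` is hypothetical. The base address and the sample document URL are the ones from the question.

```python
from urllib.parse import quote

VIEWER_BASE = "https://view.officeapps.live.com/op/view.aspx?src="


def viewer_url(document_url: str) -> str:
    """Build a viewer link with the document URL percent-encoded."""
    # safe="" makes quote() escape ":", "/", "&" and "?" as well,
    # so the whole document URL travels as a single parameter value.
    return VIEWER_BASE + quote(document_url, safe="")


print(viewer_url("http://54.179.188.146/a/a.docx"))
# https://view.officeapps.live.com/op/view.aspx?src=http%3A%2F%2F54.179.188.146%2Fa%2Fa.docx
```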
You should check all of the reasons from here:
There’s no document to be found at the URL you provided. Make sure you provide the correct URL.
Try to open the file from a browser.
Make sure you are not sending the preview service a path to a file on your localhost, which obviously cannot be reached from the Internet.
The path to the file must start with http:// or https://
If the path to your file starts with https://, make sure your site has the necessary security certificate.
The domain name matters.
Will not open in the preview service:
http://185.231.70.200/vacuumcleanerprocedure.doc
Will open in the preview service:
http://domainname.com/vacuumcleanerprocedure.doc
The document is too large. Word and PowerPoint documents must be less than 10 megabytes; Excel must be less than five megabytes.
Try different files with different Microsoft file types.
The document was not saved in a format that is supported for opening in a web browser. Try saving your document in one of the following formats:
Word: docx, dotx
Excel: xlsx, xlsb, xls, xlsm
PowerPoint: pptx, ppsx, ppt, pps, potx, ppsm
Try different files with different Microsoft file types.
You need to sign in or provide a password to open the document. Make the document publicly available to view.
File permissions and folder mode should be 775.
Check that the .htaccess file on your Apache server allows access to MS Office files.
Check that your file is available from the Internet. Try to open the file from a browser. If you see “You don't have permission to access filename on this server”, see the answer here.
The document’s file name contains invalid characters. Try encoding the file name when you type the document’s URL, or rename the file to
use only letters and numbers. For example, to encode a URL that
includes an ampersand (&), you would type %26 for the ampersand
character. For more information about URL encoding, also known as
percent encoding, see Percent-encoding on Wikipedia.
The value after "src=" should be URL-encoded. When you paste the link into the preview service, it already encodes it for the preview. You can also encode the link yourself here, but the result will be the same.
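A few of the checks above can be run before the link ever reaches the preview service. The sketch below is only an illustration: the function name `preflight` is hypothetical, and the rule that a bare IP address is refused comes from this answer rather than from anything verified against the service.

```python
import ipaddress
from urllib.parse import urlparse


def preflight(document_url: str) -> list[str]:
    """Return likely problems; an empty list means none were found."""
    problems = []
    parts = urlparse(document_url)
    if parts.scheme not in ("http", "https"):
        problems.append("path must start with http:// or https://")
    host = parts.hostname or ""
    if host in ("", "localhost"):
        problems.append("host is not reachable from the Internet")
    else:
        try:
            ipaddress.ip_address(host)
        except ValueError:
            pass  # not an IP literal, so it is a domain name
        else:
            # Assumption taken from the answer: IP hosts are refused.
            problems.append("host is a bare IP address, not a domain name")
    return problems


print(preflight("http://185.231.70.200/vacuumcleanerprocedure.doc"))
print(preflight("http://domainname.com/vacuumcleanerprocedure.doc"))
```

File size, file format and access permissions still have to be checked on the server itself.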
I'm trying to use the MediaWiki filepath magic word so that I can create some template links that pass a specific MediaWiki file. Unfortunately, with certain file types, filepath just returns nothing.
The file I'm trying to get the path for that's failing is a text file in this case. I have confirmed that I am using the correct filename as I can create a regular file link using [[File:Name.txt]], and {{filepath:Image.png}} works properly.
Example of what I'm trying to accomplish:
[http://server/processfile.php?path={{filepath:<filename>}} Process A File]
Is this a known issue? Is there an easy way that I can debug what's happening here?
After digging around a bunch more, I was able to resolve the issue. It turns out that even though MediaWiki would accept the file, it was being assigned a random MIME type because it was a .yaml file.
After updating mime.types and mime.info in MediaWiki and adding the mime type (text/yaml) to my IIS configuration, I was able to get the downloads working and the file links showing up.
Full disclosure: I may have been using an incorrectly cased file name even though I said that I was using the correct file name. :P
I have a file named Test%3f.htm on my webserver. I am trying to access the file through a web browser. I realize the %3f decodes to a question mark which I do not want. So I have tried to access it as http://mysite.com/Test%253f.htm but have had no luck. Any help would be greatly appreciated.
You need to decode the URL. This is an encoded URL; %3f stands for ?.
For what it is worth, I found that on IIS7, I was able to turn on
<requestFiltering allowDoubleEscaping="true"/>
in the web config. This allowed URLs with that character in the file name to be processed.
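Why the %253f form is the right one to request, and why a server that rejects double escaping refuses it, can be seen by encoding and decoding the name step by step. This is a minimal Python sketch using only the standard library; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

file_name = "Test%3f.htm"            # literal name of the file on disk
in_url = quote(file_name, safe="")   # the "%" itself becomes "%25"

print(in_url)                    # Test%253f.htm
print(unquote(in_url))           # Test%3f.htm  (one decode: the file name)
print(unquote(unquote(in_url)))  # Test?.htm    (a second decode: wrong)
```

After one decoding pass the path still contains an escape sequence, which is the pattern that the double-escaping check looks for.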
http://www.mamstore.co.uk/bin/pxisapi1.exe/catalogue?level=805838
Look where it is meant to say "£5 T-shirts". Instead, the '£' comes up as an invalid character, yet the exact same character is shown correctly just below, on the products.
I get the same problem when I pull a PHP file's contents in with jQuery. The PHP file itself shows the characters correctly (without any head/body set, etc.), but as soon as I pull it into the site it suddenly has issues with them.
It's stored in an SQL DB on a custom-built CMS/WMS system.
Any suggestions would be much appreciated.
Cheers
Your page is encoded as UTF, but the character in the breadcrumbs is encoded as ISO. What encoding does your database use?
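The mismatch can be reproduced in a few lines. The Python sketch below assumes that "UTF" means UTF-8 and "ISO" means ISO-8859-1, which the answer does not state explicitly. The function name is hypothetical.

```python
def show_as_utf8(raw: bytes) -> str:
    """Decode bytes the way a UTF-8 page would, marking invalid bytes."""
    return raw.decode("utf-8", errors="replace")


text = "£5 T-shirts"
as_utf8 = text.encode("utf-8")         # "£" becomes the two bytes C2 A3
as_latin1 = text.encode("iso-8859-1")  # "£" becomes the single byte A3

print(show_as_utf8(as_utf8))    # £5 T-shirts
# A lone A3 byte is not valid UTF-8, so it is shown as U+FFFD.
print(show_as_utf8(as_latin1) == "\ufffd5 T-shirts")  # True

# Re-encoding the ISO-8859-1 bytes as UTF-8 repairs the string.
fixed = as_latin1.decode("iso-8859-1").encode("utf-8")
print(fixed == as_utf8)  # True
```

If the database or its connection delivers ISO-8859-1 bytes, converting them to UTF-8 before output (or switching the connection charset) should make the breadcrumb match the rest of the page.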