Rename an HTML page according to an image within it

First, some background on the situation.
I have a website of approximately 56k pages. Each page contains a mapped sketch of a machine part. The machine part is made up of smaller parts, which are outlined in the image and labelled with numbers. When you hover over a number, a box with the part's item code shows up.
I order parts by these item codes, but recently many of the item codes have changed, so I am looking for a solution.
I have a database with data on all 56k parts, and I want to link the relevant web page to each record by the name of the part (a column in my database). The problem is that the web page file names have no logic that connects them to the part name, but the image displayed on each page has the exact name of the part.
I want to rename all my HTML files according to the images displayed within them. How can I do that without renaming all 56k pages manually?
Additionally, how can I automatically add the links to all 56k pages to my database once the above is done?
Thank you for your patience; I know this was long.

If you have a *nix shell, a simple grep will get you far:
grep -rEo --include='*.html' '<img[^>]*src="[^"]*"' . > list
The regexp will have to be adapted to match the part you are looking for, of course.
You could then do some search/replace on the resulting list to create a batch script that does all the renaming for you.

Pick your favorite scripting language and parse each html file to find the image name to use in renaming the file. Personally I would use Perl as it makes parsing the files and updating a database at the same time with the URL easy.
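As a rough illustration of the scripting approach (in Python rather than Perl), the sketch below reads each page, takes the file name of the first <img> tag, and plans a rename to that name. It assumes the pages end in .html, sit in a single folder, and that the first image on each page is the part sketch; the function names and the sample file names are hypothetical. The resulting old-name/new-name mapping is also what you would load into the database as the link for each part.

```python
import os
import tempfile
from html.parser import HTMLParser
from pathlib import Path


class FirstImageFinder(HTMLParser):
    """Remember the src attribute of the first <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")


def image_stem(html_text):
    """Return the image file name without directory or extension, or None."""
    finder = FirstImageFinder()
    finder.feed(html_text)
    if not finder.src:
        return None
    # Drop any query string, then keep only the bare file name.
    return Path(finder.src.split("?")[0]).stem or None


def plan_renames(folder):
    """Map each .html file in folder to a new path named after its image."""
    plan = {}
    taken = set()
    for page in sorted(Path(folder).glob("*.html")):
        stem = image_stem(page.read_text(encoding="utf-8", errors="replace"))
        if stem is None:
            continue  # no image found, leave the page alone
        target = page.with_name(stem + ".html")
        if target in taken or (target.exists() and target != page):
            continue  # never overwrite another page
        taken.add(target)
        plan[page] = target
    return plan


def apply_renames(plan):
    """Perform the renames computed by plan_renames."""
    for old, new in plan.items():
        os.rename(old, new)


if __name__ == "__main__":
    # Small self-contained demo in a temporary folder.
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "page00017.html").write_text(
            '<html><body><img src="images/gearbox_a12.png" usemap="#parts">'
            "</body></html>",
            encoding="utf-8",
        )
        plan = plan_renames(folder)
        for old, new in plan.items():
            print(old.name, "->", new.name)
        apply_renames(plan)
```

Reviewing the plan before calling apply_renames is the point of splitting the two steps: with 56k pages, it is worth checking for skipped or colliding names first.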

Related

Visual Studio Code - Compare two csv and find data that not in the two files

So, I have two CSV files that I need to compare. However, I am not sure whether using Compare Active File in VS Code will help me.
File 1.csv --> the starting point
id,name
12a,mark
134,jon
151,pete
z18,sab
329,lin
m32,sam
kla,kop
l5h,ming
File 2.csv --> the modified one (basically, I made some changes and deleted two ids)
id,name
12a,mark
134,jon
151,pete
l5h,ming
kla,kop
329,lin
So, I want to use VS Code to compare these two files and find out which lines from 1.csv have been removed. If I use Compare Active File in VS Code, it only shows me which lines differ from the original. I cannot tell which data/ids were removed between the original file (1.csv) and the modified one (2.csv).
I am not sure whether VS Code can do this, or what keywords I should search for on Google to find a solution, so I am wondering if anyone could help me with this.
PS
The real files that I need to deal with in the same situation have more than a thousand ids.
Sorry if this has already been asked or resolved somewhere on Stack Overflow; English is not my native language and I don't know which keywords to search for.
Thanks!
If you multi-select (Ctrl+Left-Click on Windows) both files (1.csv, 2.csv) in the VS Code (File) Explorer window, then select Compare Selected from the right-click menu in the (File) Explorer window, a diff window will open with the comparison you desire.
Note the first-selected file will appear on the left pane of the diff window.
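Because the rows in the two files are not in the same order, a line-based diff will also flag rows that merely moved. If stepping outside VS Code is acceptable, a short script can compare the id column as a set instead. This is only a sketch, assuming both files have an id header column as in the samples above; the function names are made up for the example.

```python
import csv
import io


def read_ids(csv_file, key="id"):
    """Return the values of the key column, in file order."""
    return [row[key] for row in csv.DictReader(csv_file)]


def removed_ids(original_file, modified_file, key="id"):
    """Ids present in the original file but missing from the modified one."""
    remaining = set(read_ids(modified_file, key))
    return [i for i in read_ids(original_file, key) if i not in remaining]


if __name__ == "__main__":
    # The sample data from the question, inlined so the demo is self-contained.
    file1 = io.StringIO(
        "id,name\n12a,mark\n134,jon\n151,pete\nz18,sab\n"
        "329,lin\nm32,sam\nkla,kop\nl5h,ming\n"
    )
    file2 = io.StringIO(
        "id,name\n12a,mark\n134,jon\n151,pete\nl5h,ming\nkla,kop\n329,lin\n"
    )
    print(removed_ids(file1, file2))  # ['z18', 'm32']
```

For the real files, replace the io.StringIO objects with open("1.csv", newline="") and open("2.csv", newline="").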

PhpStorm searching in project tool window

I'm using PhpStorm 10.0.4
When I start typing characters in the Project tool window, it searches for files containing the typed text.
Is it possible to change this behavior so that only files that begin with the typed text are matched?
AFAIK no. There are no GUI settings for this at all.
Plus, this Speed Search is used in many places/tool windows and search logic is the same.
P.S. If you need to search for files, why not try the more appropriate (in a general sense) Navigate | File... instead?
Speed Search only finds items in already expanded nodes (as it's a basic search on already displayed text), but Navigate | File... will look for files everywhere in the project.
It's not possible directly but you can create and use a scope for that.
Open Settings and go to Appearance & Behavior -> Scopes. Create a new scope, give it a name (let's say "My Files") and put file:*/c* in the Pattern edit box.
In the big list of files under the Pattern edit box you can preview its effects. The files that are included in the scope are colored in green, the directories that contain included files are colored in blue.
This simple pattern selects only the files whose names start with c, in all directories. You can build slightly more complex filters using wildcards, including or excluding entire directories, etc. With a little practice you can create filters that match the usual needs pretty well.
When you are pleased with the scope definition, close the Settings box and go back to the Project view. Click on the arrow next to Project and you'll get a list of views of the project files. All the scopes you created should be there. Select "My Files" and only the files (and directories) included in that scope will be displayed in the Project view.
It is not a dynamic filter, you have to work a little to set it up, but it is useful when you work on large projects, with thousands of files, and you need to hide the files not important for your task.

Bash diff body text of html file only

I'm writing a shell script which tracks the changes of a website and emails me with the contents of the change if one occurs. The idea is to use wget to grab a copy of the html and compare it to the version from the last time the script ran. Wget works fine to save the html file but I'm having trouble comparing the files. The trouble is that I'm only interested in changes in the html file's plain text, not the code, links, etc.
Diff works to find all the changes in the two files but it ALWAYS returns changes even when the plain text is identical. This is because each link on the site has a corresponding authenticity token that differs each time the page is accessed. In order to diff only the lines that include plain text I'm attempting to filter it to exclude any line that begins with "<" OR "(any_amount_of_spaces)<". I've looked at the diff man page but I can't seem to find an operator that will do what I need. I don't know much about REGEX but would that work with diff -I for this?
Thanks!
You could use lynx -dump to render the pages and feed those to diff. However, since you are not interested in links, you would need to strip the References section that this yields (e.g. with awk), which makes this a not-so-robust solution (but maybe good enough for your use case).
If you don't mind using something third-party, go for html2text:
diff <(html2text before.html) <(html2text after.html)
PS: There are two different programs called html2text.
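If neither lynx nor html2text is available, the same idea (compare only the rendered text, not the markup) can be sketched with Python's standard library. This is an assumption-laden illustration, not a replacement for a real renderer: it simply collects the text between tags, skips script and style content, and diffs the result, so attribute values such as authenticity tokens never reach the comparison. The class and function names are invented for the example.

```python
import difflib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags, ignoring script and style content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        text = " ".join(data.split())  # normalise whitespace
        if text:
            self.lines.append(text)


def visible_text(html):
    """Return the page's text content as a list of non-empty lines."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.lines


def text_diff(before_html, after_html):
    """Unified diff of the text content only; an empty list means no change."""
    return list(
        difflib.unified_diff(
            visible_text(before_html),
            visible_text(after_html),
            "before",
            "after",
            lineterm="",
        )
    )


if __name__ == "__main__":
    before = '<a href="/x?authenticity_token=abc">Home</a><p>Price: 10</p>'
    after = '<a href="/x?authenticity_token=xyz">Home</a><p>Price: 12</p>'
    print("\n".join(text_diff(before, after)))
```

A monitoring script would send the email only when text_diff returns a non-empty list.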

Move images from service and update paths in forum

I run a popular forum, and one of our members, who has made a lot of awesome contributions, recently contacted us. He has posted several hundred images from his webshots gallery, but the service is changing, and so are all the image URLs. I need to change all the image src paths in all of his posts in our MySQL database.
He was given the opportunity to download all of his images, and he has passed them on to me. Because I will have to make a lot of these changes in production, I need to make sure that I don't screw this up.
The image src values in his posts look similar to this, where I believe 0103935217 is his user ID.
http://inlinethumb25.webshots.com/47576/2156388770103935217S500x500Q85.jpg
The images downloaded from the service look like this. Notice the S500x500Q85 has been replaced with a random string.
2156388770103935217Reacil_fs.jpg
So I have two tasks:
1. Rename all the files I've put on my server, removing the random characters and the _fs designation.
2. Change the file paths in all of his posts, removing the domain and container and replacing them with mine. In addition, I need to remove the S500x500Q85 designation.
For 1, I have a regex but I'm unsure how to do the replacement: 0103935217\w+?_fs
For 2, I know my query needs to be something along the lines of the below. I'm a little unsure how to do this, though. Is it done with a regex?
UPDATE posts SET post_body = replace(post_body, '','') WHERE user_id = 1234
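Since replace() in the query above only substitutes fixed strings, one option is to do both transformations in a script and write the rewritten post bodies back afterwards. The sketch below shows only the two regex substitutions, using capture groups to keep the numeric prefix and the extension. It assumes every file name and URL follows the two sample patterns from the question; NEW_BASE is a hypothetical placeholder for the forum's own image path, and the function names are invented.

```python
import re

USER_ID = "0103935217"  # the user ID from the question
NEW_BASE = "https://forum.example.com/images"  # hypothetical new location


def clean_file_name(name, user_id=USER_ID):
    """Drop the random string and the _fs suffix from a downloaded file name.

    Returns None when the name does not follow the expected pattern.
    """
    pattern = re.compile(r"^(\d*?" + re.escape(user_id) + r")\w+?_fs(\.\w+)$")
    match = pattern.match(name)
    return match.group(1) + match.group(2) if match else None


def rewrite_urls(post_body, user_id=USER_ID, new_base=NEW_BASE):
    """Point webshots image URLs at new_base and drop the SxxxQxx suffix."""
    pattern = re.compile(
        r"https?://[\w.-]+\.webshots\.com/\d+/"
        r"(\d*?" + re.escape(user_id) + r")S\d+x\d+Q\d+(\.\w+)"
    )
    return pattern.sub(
        lambda m: f"{new_base}/{m.group(1)}{m.group(2)}", post_body
    )


if __name__ == "__main__":
    print(clean_file_name("2156388770103935217Reacil_fs.jpg"))
    print(
        rewrite_urls(
            '<img src="http://inlinethumb25.webshots.com/47576/'
            '2156388770103935217S500x500Q85.jpg">'
        )
    )
```

Running both functions over a copy of the data first, and comparing the cleaned file names against the rewritten URLs, is a cheap way to confirm that every post still points at an existing file before touching production.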

How to search a word in a html file without any java coding?

I'm working on a Java project that creates a user manual for a piece of software (HTML files that are linked together, like the Windows "Help and Support Centre"). Once a user manual has been created, all I have left are the HTML files. Now I want to find the HTML files that contain a specified keyword (like a search engine). How can I do this without Java code?
grep, find, python script, or open any file with a text editor and try edit->search
(on windows use windows search in file)
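For the Python-script option, a minimal sketch might look like the following. It assumes the manual's pages end in .html and live under one folder, strips tags with a crude regex so that only the visible text is searched, and matches the keyword case-insensitively; find_pages and strip_tags are hypothetical names.

```python
import re
import tempfile
from pathlib import Path


def strip_tags(html):
    """Crudely remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", " ", html)


def find_pages(folder, keyword):
    """Return the .html files under folder whose text contains keyword."""
    needle = keyword.lower()
    hits = []
    for page in sorted(Path(folder).rglob("*.html")):
        text = strip_tags(page.read_text(encoding="utf-8", errors="replace"))
        if needle in text.lower():
            hits.append(page)
    return hits


if __name__ == "__main__":
    # Self-contained demo with two throwaway pages.
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "intro.html").write_text(
            "<h1>Installing</h1><p>Run the installer.</p>", encoding="utf-8"
        )
        Path(folder, "faq.html").write_text(
            '<a href="installing.html">Next</a>', encoding="utf-8"
        )
        for page in find_pages(folder, "install"):
            print(page.name)
```

Stripping the tags first keeps matches in attribute values, such as link targets, from counting as hits.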
If all of your other code is written in Java, then it would be sensible (without knowing your use case) to use Java for searching as well. You could of course use command-line programs such as grep or find, or the built-in search functionality of a web browser, but if the search should be part of a Java application anyway, why not go for Java and e.g. Lucene?
If this 'help' is going to be online, then you can embed Google search in it (limiting the search results to your site with the site: operator). Alternatively, if you're hosting the pages yourself, you can use htdig to index the pages.
However, if it's going to be offline, you'll be better off generating a static index page with links to topics. To create a more help-system-like user experience, you can hide the contents of the index in invisible HTML DIV tags and add a JavaScript that takes the searched phrase as input and unhides the matched words with their links.
Maybe I'm missing something, but have you looked at JavaHelp? It has indexing and searching built in, and can be used online or offline.