Digitize old diagrams and tables - OCR

While reading old science papers I found nomograms with chemical content, for example a nomogram of sulfuric acid and its enthalpy.
Is it possible to digitize this nomogram in a mathematically correct way if one does not have the table of numbers that was used to create it?
My approach would be to "give" a program the grid and its values and then have it "read" the curve of each individual data series.
Is this possible, or is this the wrong forum for such a question?

You can try tesseract-ocr. This tool gives you the extracted text and the bounding boxes of each word or letter, which might solve your problem.
I used this tool to detect whether words belong to the same line, i.e. whether they are at the same level on the page.
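The grid-based idea from the question can be sketched in a few lines. This is only a minimal sketch under loud assumptions: the axes are linear, the scan is neither rotated nor skewed, and the pixel positions of two labelled ticks per axis as well as the traced curve points are already known (typed in by hand, or taken from OCR bounding boxes). The names `Axis`, `digitize` and `interpolate`, and all numbers, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Axis:
    """Linear axis calibrated from two reference ticks (pixel, value)."""
    pixel_a: float
    value_a: float
    pixel_b: float
    value_b: float

    def to_value(self, pixel: float) -> float:
        # Linear map pixel -> data value through the two reference ticks.
        scale = (self.value_b - self.value_a) / (self.pixel_b - self.pixel_a)
        return self.value_a + (pixel - self.pixel_a) * scale


def digitize(points_px, x_axis, y_axis):
    """Convert traced pixel points (px, py) into data coordinates."""
    return [(x_axis.to_value(px), y_axis.to_value(py)) for px, py in points_px]


def interpolate(curve, x):
    """Piecewise-linear lookup on a digitized curve sorted by x."""
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x is outside the digitized range")


# Hypothetical calibration: image y grows downwards, so the y axis is inverted.
x_axis = Axis(pixel_a=100, value_a=0.0, pixel_b=900, value_b=100.0)
y_axis = Axis(pixel_a=700, value_a=0.0, pixel_b=100, value_b=300.0)
curve = digitize([(100, 700), (500, 400), (900, 100)], x_axis, y_axis)
print(curve)
print(interpolate(curve, 25.0))
```

A logarithmic axis would need the same mapping applied to the logarithm of the tick values, and the accuracy can never exceed that of the scan and of the traced points.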


Creating a common embedding for two languages

My task deals with multiple languages (English and Hindi). For that I need a common embedding to represent both languages.
I know there are methods for learning multilingual embeddings, like 'MUSE', but these represent the two embeddings in a common vector space. The embeddings are then similar, but not the same.
So I wanted to know if there is any method or approach that can learn a single embedding that represents both languages.
Any lead is much appreciated!
I think a good lead would be to look at past work that has been done in the field. A good overview to start with is Sebastian Ruder's talk, which gives you a multitude of approaches, depending on the level of information you have about your source/target language. This is basically what MUSE is doing, and I'm relatively sure that it is considered state-of-the-art.
The basic idea in most approaches is to map embedding spaces such that you minimize some (usually Euclidean) distance between the two (see p. 16 of the link). This obviously works best if you have a known dictionary and can precisely map the different translations, and works even better if the two languages have similar linguistic properties (not so sure about Hindi and English, to be honest).
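To make the mapping idea concrete, here is a toy sketch. It is not MUSE's implementation: it is a deliberately small 2D special case in which the map is restricted to a rotation and the least-squares solution has a closed form. Real systems fit an orthogonal matrix in hundreds of dimensions, usually via an SVD. The vectors are invented, and the seed dictionary is assumed to pair them index by index.

```python
import math


def fit_rotation(source, target):
    """Angle of the rotation that minimizes the summed squared Euclidean
    distance between rotated source vectors and their paired targets (2D only)."""
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(cross, dot)


def rotate(vec, angle):
    """Rotate a 2D vector counter-clockwise by `angle` radians."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def mean_distance(a, b):
    """Average Euclidean distance between paired vectors."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


# Invented "English" vectors and their dictionary translations in a "Hindi"
# space that, by construction, is the English space rotated by 60 degrees.
english = [(1.0, 0.0), (0.8, 0.6), (-0.5, 1.0)]
hindi = [rotate(v, math.pi / 3) for v in english]

angle = fit_rotation(english, hindi)
mapped = [rotate(v, angle) for v in english]
print(math.degrees(angle), mean_distance(mapped, hindi))
```

After the mapping, both languages live in one space and a translation pair lands on (nearly) the same point, which is as close to a "single embedding" as this family of methods gets.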
Another recent approach is the one by Multilingual-BERT (mBERT), or similarly, XLM-RoBERTa, but those learn embeddings based on a shared vocabulary. This might again be less desirable if you have morphologically dissimilar languages, and also has the drawback that they incorporate a bunch of other, unrelated, languages.
Otherwise, I'm unclear on what exactly you are expecting from a "common embedding", but happy to extend the answer once clarified.

Storing site data in columns or rows

This is a question about best practice for storing data from a web page, such as texts, image URLs, links, etc.
I have a CMS where you can create web pages. Here you can edit texts and upload images. In the future it would also be nice to "add new elements", add links to a-tags, etc.
I need a robust and flexible solution that also has good performance, both in getting and receiving this data.
Let's assume I have 1000 pages, each with around 25 elements that can be updated and stored in the database.
Alternative 1)
Create a table with one column for each element on these pages, for example columns like:
title_1, title_2, image_1, image_2.
Here we have a set of columns that we can update and use on the web page.
Alternative 2)
Create 1 table with the columns (id, namespace, page_id, data)
And for each element on the page I add the namespace in association with the page_id to make the data output unique. In the data I can add any kind of information; text, links etc.
What do you suggest as a good solution for this issue? I'm of course also open to other alternatives.
Thanks!
I would recommend option two, with the addition of a column identifying the element id and/or type, if indeed the element id is somehow comparable. That is to say, if anchor text (say) is always stored as element id = 4, then you might want an element id = 4 so that you could compare anchor texts across multiple documents.
If, on the other hand (and this is the scenario I imagine is more likely), you may have 1-25 elements on a page and each of them could be different (e.g. document one has three anchor texts and four images, document two has one anchor text and no images, etc.), it would make sense to add an element_type_id column backed by an element type table that stores a bit of information about the element types. This is assuming that you ever have any interest in comparing (say) images across multiple documents, or anchor texts across multiple documents, etc.
Another thing to consider: if you are likely to see the same element over and over again, it actually makes more sense to effectively parameterize those elements by way of a lookup table. So basically store each (say) unique anchor text in one table and reference its id in your actual data table.
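A minimal sketch of option two with an element type lookup table, using Python's built-in sqlite3 module. The table names `element_type` and `page_element`, the helper functions and the sample rows are assumptions chosen for illustration; only the columns id, namespace, page_id and data come from the question.

```python
import sqlite3

SCHEMA = """
CREATE TABLE element_type (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE            -- e.g. 'title', 'image'
);
CREATE TABLE page_element (
    id              INTEGER PRIMARY KEY,
    page_id         INTEGER NOT NULL,
    namespace       TEXT    NOT NULL,    -- e.g. 'title_1'
    element_type_id INTEGER NOT NULL REFERENCES element_type(id),
    data            TEXT    NOT NULL,
    UNIQUE (page_id, namespace)          -- one row per element per page
);
"""


def create_db():
    """Create an in-memory database with two hypothetical element types."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO element_type (id, name) VALUES (?, ?)",
        [(1, "title"), (2, "image")],
    )
    return conn


def save_element(conn, page_id, namespace, type_name, data):
    """Insert an element, or replace the row already stored under
    the same (page_id, namespace)."""
    conn.execute(
        """INSERT OR REPLACE INTO page_element
               (page_id, namespace, element_type_id, data)
           VALUES (?, ?, (SELECT id FROM element_type WHERE name = ?), ?)""",
        (page_id, namespace, type_name, data),
    )


def elements_of_type(conn, type_name):
    """All elements of one type across every page, e.g. to compare images."""
    return conn.execute(
        """SELECT pe.page_id, pe.namespace, pe.data
           FROM page_element AS pe
           JOIN element_type AS et ON et.id = pe.element_type_id
           WHERE et.name = ?
           ORDER BY pe.page_id, pe.namespace""",
        (type_name,),
    ).fetchall()


conn = create_db()
save_element(conn, 1, "title_1", "title", "Home")
save_element(conn, 1, "image_1", "image", "/img/a.png")
save_element(conn, 2, "image_1", "image", "/img/b.png")
print(elements_of_type(conn, "image"))
```

Adding a new kind of element then means inserting a row into `element_type` rather than altering the table, which is the flexibility that option one lacks.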
If I may add one additional thing: SO may not be the best place for the particular question you are asking. I'm not totally sure of that and maybe I'm wrong, but I would poke around the Stack Exchange network and see if other forums deal more closely with the type of question you are asking. At the very least, I'd observe that your question is fairly vague, and the goal of achieving a "robust and flexible solution that also {has} good performance" is not likely to be accomplished simply by asking for advice on SO. There is a LOT that goes into data architecture, and many of the details I would consider important in designing this myself are not present in your question. And if you're not sure what those details are, I am not sure SO is really the best place to set about learning them. I think https://softwareengineering.stackexchange.com/ may be a better fit for this question.
Just my opinion, and I could be wrong. Either way, I would consider learning a bit about database normal forms (http://www.bkent.net/Doc/simple5.htm or Google it) as well as do a little research on the types of design considerations that go into building a database (an old but still good SO article on that is here: What are the most important considerations when designing a database?)

Multi-level numbered headings in HTML5 for outlining the document

So I am writing a manual in html5, and it's going to need numbering.
The headings will need to be numbered, e.g. "Section 4: Some Stuff".
Some subheadings will need to be numbered, e.g. "4.01: the first point you need to know about some stuff".
Just to be difficult, the manual will have tables and images, so they will need to be numbered too, e.g.
"Fig 4.03 A cat. Most of the images on the internet are of cats."
Also, there are lots of process lists in the manual. It would be nice if these were numbered under the subheadings, e.g.
4.05 A simple process
4.05.01 Pull a leaf from the tree
4.05.02 Eat it
4.05.03 Now you are a caterpillar
4.05.04 Turn into a beautiful butterfly
I've been researching the different ways to number my headings, subheadings, figures, and lists. I'm finding answers, just not good answers.
imperfect solution 1: use CSS counters
These can't be copied to editing programs (word etc)
They also apparently don't work with screen-readers
imperfect solution 2: Use ordered lists
These won't 'fail gracefully' afaik: if all my headings are a 'heading' class of ordered list, they will just look like a plain list without CSS.
Has someone solved this problem already? What's the solution?
Super extra kudos for anyone who can supply a smart way of auto-updating my figure cross references!
Use text.
<h3>4.01: the first point you need to know about some stuff</h3>
The numbering is not just styling (you might want to reference these numbers, right?), so a CSS solution is out of the question.
Using ol can work in some cases, but it has many drawbacks:
You don’t want to use an ol for your whole document, do you?
User-agents don’t have to render any numbers at all.
Many user-agents will not let you search for or copy the numbers.
You can’t get the exact kind of numbering scheme you want to have (e.g., nested ol typically don’t render a delimiter like . but start again with the first value).
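If the numbers are written into the text, they can still be generated at build time rather than typed by hand. The following is a minimal sketch of such a build step, not an established tool: the `Numberer` class, the `render` function and the `{ref:label}` placeholder syntax for cross references are all invented here. Numbering starts at 1, and levels are assumed not to be skipped.

```python
import html


class Numberer:
    """Produce hierarchical numbers such as 4, 4.05, 4.05.01 as plain text."""

    def __init__(self):
        self.counters = []

    def next(self, level):
        # Drop counters deeper than this level, then extend up to it.
        del self.counters[level:]
        while len(self.counters) < level:
            self.counters.append(0)
        self.counters[level - 1] += 1
        head, tail = self.counters[0], self.counters[1:]
        return ".".join([str(head)] + [f"{n:02d}" for n in tail])


def render(items):
    """items: (level, label, title) tuples in document order.
    Returns HTML heading lines with the numbers written as literal text."""
    numberer = Numberer()
    numbers = {}
    numbered = []
    for level, label, title in items:
        number = numberer.next(level)
        numbers[label] = number
        numbered.append((level, number, title))

    lines = []
    for level, number, title in numbered:
        # Second pass: every number is known, so forward references work too.
        for label, ref in numbers.items():
            title = title.replace("{ref:" + label + "}", ref)
        tag = f"h{min(level + 1, 6)}"
        lines.append(f"<{tag}>{number}: {html.escape(title)}</{tag}>")
    return lines


outline = [
    (1, "stuff", "Some stuff"),
    (2, "first", "The first point"),
    (2, "process", "A simple process (see {ref:first})"),
    (3, "leaf", "Pull a leaf from the tree"),
]
print("\n".join(render(outline)))
```

Because the output is ordinary text inside the headings, it can be copied, searched and read aloud, and inserting a section only requires running the script again. Figures could be numbered in the same pass with a separate counter.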

DIV or Table for showing database data

I know the advice is to use a table for tabular data, but I see that many websites and CMSs use divs for showing database content, for example in the admin area for editing it. Shouldn't they use a table for showing this data? What's the best way?
Use a table, since it's tabular data. Unordered, ordered, or definition lists should be used when you want to present data in a non-tabular fashion, like the list of questions on the front page of SO.
I will answer your question with another question:
Do you want your data to remain presentable if CSS is not available?
Yes: definitely go for tables.
No: it's up to you, whichever makes you all warm and fuzzy inside ;-)
Typically you would use DIVs for page layout and TABLEs to display tabular data. In your question you ask about the admin areas for a CMS. If in the admin area they are displaying a grid that represents one or more tables in a database then yes it would probably be best displayed as a table.
However, the distinction should be made based on how you are actually presenting the data. Just because the data started out as tabular data (in a database table) doesn't mean that it is inherently tabular data. If you intend to display it in some other form, then DIVs might be the better choice.
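For the admin-grid case described above, the markup can be produced directly from the query result. This is a small sketch with an invented helper, `rows_to_table`, and sample rows that stand in for real database output. It emits a plain semantic table with header cells and escapes every value.

```python
from html import escape


def rows_to_table(columns, rows):
    """Render column names and row tuples as a semantic HTML table."""
    head = "".join(f'<th scope="col">{escape(str(c))}</th>' for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


# Sample rows standing in for something like cursor.fetchall().
print(rows_to_table(["id", "title"], [(1, "Home"), (2, "News & events")]))
```

Without any CSS the result still reads as rows and columns, which is the practical argument for using a table here.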
It completely depends on what type of data we are talking about. Unless you can give an example of the data, then you won't get a very good answer.
Edit: Per your comment, yes, use a table. If you're showing lists of things from a database then you should use a LIST. There is no golden rule -- the format you use should reflect the data coming from the database.
Table because it is data.
The best markup to use to present a piece of data is always that which is most semantically appropriate. This, of course, raises the question of exactly what is semantically appropriate. This is not a trivial question, and it depends entirely on the sort of data that you're presenting. If your data is tabular in nature, then you should definitely put it in a table. Most data is not tabular in nature, so it shouldn't go in a table.
The reason that using tables is discouraged is because they have historically been misused for non-semantic presentational purposes. Often, authors would place data that wasn't even remotely tabular in nature inside a table tag solely to get it to appear a certain way. This is poor practice, and one should instead create the desired appearance using CSS. This criticism, however, applies not to the use of tables in general, but merely to the use of tables for inappropriate content.
To address a couple of other things:
Don't worry about browsers without CSS. This isn't a problem in this day and age, unless you're using non-graphical browsers.
Search engines prefer semantic content. If tables are the proper semantics, then the search engines will prefer them.
Div is more widely supported by browsers, while table has some quirks and exceptions that make it cumbersome.
A div is more general purpose; you can do a lot with it, not just tables. Take a close look at:
http://www.w3schools.com/tags/tag_DIV.asp
Search engines like divs more.

Does it make sense to use the <table> tag on a "modern" website? [duplicate]

Closed 13 years ago.
I am developing a "modern" website, and I'm having a lot of trouble getting the CSS to make everything line up properly. I feel like the layout would be a lot easier if I just used a table, but I've been avoiding <table> tags, because I've been told that they are "old-fashioned" and not the right way to do things.
Is it okay to use tables? How do I decide when a table is appropriate, and when I should use CSS instead? Do I just do whatever is easier?
The answer is yes, it's fine to use tables. The general rule of thumb is that if you are displaying tabular data, a table is probably a good way to go. You should generally try to style your table with css as much as you can though.
Also, this pie graph might help you:
http://www.ratemyeverything.net/image/7292/0/Time_Breakdown_of_Modern_Web_Design.ashx
EDIT: Tables are fine. For displaying data. Just like my second sentence stated. The question was "is it ok to use tables". The answer is - yes, it is ok to use tables. It is not illegal.
Since even though it's implied to use tables for data in my general rule of thumb, apparently I must also state that the corollary is that it's not ok to use tables for anything else, even though the poster already seemed to grasp this concept. So, for the record, the general rule of thumb is to not use tables for laying out your site.
Tables should be used to represent tabular data. CSS should be used for presentation and layout.
This question has also been exhaustively answered here:
Why not use tables for layout in HTML?
Essentially - if you have tabular data, then use a table. There's really no need now to use tables for layout. Sure, they were often considered 'easier', but semantically the page is horrid, and such layouts were often considered inaccessible.
See some discussion:
css-discuss
and a particularly comical URL - shouldiusetablesforlayout.com
In the 'modern' approach of tables it is not about using table tags or div tags, but about using the right tag for the right purpose.
The table tag is used for tabular data. There is nothing wrong with using it for that!
For using CSS, there are a lot of tutorials and guides (good and bad) around. An indicator of a bad tutorial is heavy use of blocks (divs) that only make sense for the layout and not for the content. Good signs are tutorials that advise you to use the right tags for the right content and teach you how to style those tags.
Tables are only appropriate for tabular data. Imagine you have to add some spreadsheet like data, where you have clear row/column headers, and some data inside those rows.
A product comparison, for example, is also a valid table item.
I believe that tables are OK for display of rectilinear data of arbitrary rows and/or columns. That's about it. Tables should not be used for layout purposes anymore.
In general, HTML markup should describe the structure and content of a web page—it should not be used to control presentational aspects such as layout and styling (that's what CSS is for). A <table> tag, like most have already said, should represent tabular data—something that would appear as a table of information.
The reason why people rag on tables so much is that in the old days, there was no such thing as CSS—all page layout was done directly in HTML. Tags were not thought of as describing content—all anyone really cared about was how a tag would make things look in a web browser. As a result of this, people figured that, since they could organize things into rows and columns, tables must be good for laying out elements of a web page. This became a really popular technique—in fact, I'd wager that using tables was considered the preferred method of laying out web pages for quite some time.
So when people tell you that tables are "old-fashioned," they are specifically referring to this abuse of the <table> tag that was so popular back in the old days. Like I said, there's nothing wrong with HTML tables themselves, but using them for web page layout just doesn't make sense nowadays.
(Plus, from a purely pragmatic standpoint, layouts done with HTML tables are very inflexible and hard to maintain.)
It's OK to use tables when you are showing data in a grid / tabular format. However, for the general structure of the site, it's highly recommended that you use CSS-driven div, ul, li elements to give you a more lucid website.
If you decide to work with tables anyway, you must consider the following cons:
they are not SEO friendly
they are quite rigid in terms of their structure and at times difficult to maintain as well
You may spend a little extra time on a div-based website, but it's worth every minute spent.
The whole "anti-Table" movement is a reaction to a time when deeply nested tables were the only method to layout pages, leading to HTML that was very hard to understand.
Tables are a valid method for tabular (data) layout, and if a table is the easiest way to implement a layout, then by any means use a table.
Table is always the right choice when you have the need to present data in a grid.
Quoting SitePoint's book HTML Utopia: Designing Without Tables Using CSS:
If you have tabular data and the appearance of that data is less important than its appropriate display in connection with other portions of the same data set, then a table is in order. If you have information that would best be displayed in a spreadsheet such as Excel, you have tabular data.
I would say no to using tables to construct your layout. Tables make sense only for actual tabular data you need to represent. If you spend enough time figuring the CSS out, you will find it's easier than using tables for a layout. Just remember: Tables for displaying data. CSS for page layouts.
Tables are just that: Tables.
They are frowned upon because they should not be used for layout, as was the fashionable thing to do before browsers could position stuff properly.
If what you want to mark up is, in fact, a table, then use a table. Other than that, try to stay away.
One small thing: aligning two parts of text to the exact same line so that they won't move apart (think username and post date). There, using a table is IMHO an option.
First get it working. Then get it perfect.
Get the layout done in some way before making it perfect or better.
How many people per day will go to the page you are working on? A million? Or 20?
How much time are you going to spend on CSS issues instead of other issues? Does your boss want you to spend this much time on the issue? Does he/she know what you are doing?
Absolutely. I don't know where CSS zealots invented the idea that tables are not naturally used for "layout". Tables have been used for laying things out since their invention, whether those things be numbers, words, or pretty pictures. That's what they do. Moreover, table is part of all versions of (X)HTML so there are no deprecation concerns.
Absolutely.
All that HTML offers was originally intended for you to define the markup of your page. In my book, absolute and relative positions of elements on a page belong to markup. So both divs and tables are very much suited for this task. Pick up what works best for your particular need.
CSS adds many styling possibilities and also layout tricks but it complements HTML options not replaces them.
There is actually a very fine line between seeing something as a markup or styling issue. CSS proponents would say that with CSS you can completely relocate and reshuffle all the big and little pieces of a page. I cannot, however, imagine putting the header at the bottom, the footer at the top, and making things appear in reverse order.
Take an example. You design a notebook. You know where to place the major components: mainboard, cooling system, keyboard, display and ports. You may certainly wish to rearrange the port connectors a little, on which side and in which sequence they appear, but you don't really expect to put the display where the keyboard is, put the keyboard on the lid, make the fans blow in your face and have all connectors on the bottom, to be reached through holes in your desk.
Using tables can make it slightly difficult to rearrange elements on a page. This might be true. However, in most cases you know in advance approximately what your page should look like, and you would not want to change everything drastically. If you can't say that before you begin your work, you probably have no clear idea what you are doing and what for.
Moreover, only tables possess elastic properties, which allow them to stretch to the width/height of their content. Nothing else in HTML/CSS can be used to do that.
CSS design on one side allows you to create quite adjustable designs. On the other hand, it locks you out from designing a page adjustable to its content. Both wins and losses.
Table is also the only tool to make very complex and precise interfaces. For example, the SO page is very simple. It can probably be done with pure CSS. Meanwhile, have you seen any enterprise-class software like CRMs, SRMs, etc.? That multitude of buttons, text fields, check boxes and drop-down lists, all precisely located on a screen? Good luck achieving that kind of complexity with just CSS. And these layouts migrate from desktop applications onto the web each day (keyword: software-as-a-service).
So choose what suits best your current need and don't trust those CSS lovers. Actually don't trust any fanatics at all.