How to add publication in academicpages.github.io? - jekyll

I forked https://github.com/academicpages/academicpages.github.io and I am trying to adjust the content. I want to add a publication on the publications page, so I created "2021-05-21-willingness-to-vaccinate-against-COVID-19.md" in the _publications folder. However, it does not show up on https://dangraeber.github.io. How can that be? My repo: https://github.com/dangraeber/dangraeber.github.io.
And is there any documentation for this template?
Thanks in advance!
Best
Daniel

When I do a local build I'm getting the following error:
Conversion error: Jekyll::Converters::Markdown encountered an error while converting '_publications/2021-05-21-willingness-to-vaccinate-against-COVID-19.md':
The source text contains invalid characters for the used encoding UTF-8
It's the ö character in Schröder. If I remove that, I have no build errors.
You have three options I can think of:
Change the ö to o.
Use the HTML entity, like so: Schr&ouml;der.
Change your encoding. (I don't know enough about language encodings to give advice about this.)
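For reference, here is a rough sketch of the front matter for a _publications file (field names follow the template's sample files; the venue and citation values are illustrative, and Schr&ouml;der uses the entity workaround above):

---
title: "Willingness to vaccinate against COVID-19"
collection: publications
permalink: /publication/2021-05-21-willingness-to-vaccinate-against-COVID-19
date: 2021-05-21
venue: 'Venue name here'
citation: 'Graeber, D. and Schr&ouml;der, C. (2021). "Willingness to vaccinate against COVID-19."'
---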

Related

Liquibase - Trim whitespaces from CSV

I have a formatted CSV file for Liquibase's <loadData .../>.
It contains some whitespace to make it look nice, but because of that whitespace I get wrong data in my DB.
How can I solve this? Is there a "flag" or something to force Liquibase to trim the whitespace?
I tried to make it look something like the following:
id;name ;surname
1 ;test123;test123
2 ;test1 ;test123
3 ;"test" ;test123
Anyway, my DB contains test1__ and test"_ as well, where _ is a space.
Also quotchar=""" didn't help (as expected, since it's redundant anyway).
Btw, the id column, which is defined as numeric, is OK (1, 2, 3, etc., with no errors).
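For context, the loadData change looks roughly like this (table and file names are made up; the columns match the CSV above):

<loadData tableName="persons" file="data/persons.csv" separator=";">
    <column name="id" type="NUMERIC"/>
    <column name="name" type="STRING"/>
    <column name="surname" type="STRING"/>
</loadData>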
Check out this Jira issue.
To quote Nathan Voxland:
It probably makes sense to keep the default as trimming since I think
that will cause less surprises. However, I added a global
configuration flag that lets you change the default.
You can set it either through a liquibase.trimCsvWhitespace=false
system property or by using the
LiquibaseConfiguration.getInstance().getProperty(GlobalConfiguration.class,
GlobalConfiguration.CSV_TRIM_WHITESPACE).setValue() API call.
Try adding the liquibase.trimCsvWhitespace=false property.
On further review, it looks like it was a change just in 3.5.0. I
usually try to keep backwards compatibility, even when it is
unexpected behavior but was thinking it had changed with 3.4.0 and so
changing it back to preserving whitespace would break other people
that are now expecting it to be trimming.
However, since it did change unexpectedly in 3.5.0 only, it is
definitely a bug and so I'm just setting the logic back to preserving
whitespace.
According to the Jira ticket this bug was fixed in Liquibase version 3.5.1, but it looks like it actually wasn't.
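Until it is, you can flip the flag yourself, e.g. as a JVM system property (a sketch; the jar and changelog names are illustrative):

java -Dliquibase.trimCsvWhitespace=false -jar liquibase.jar --changeLogFile=changelog.xml update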

Extract a html tag that contains a string in openrefine?

There is not much to add to the title; it's what I'm trying to do. Any suggestions?
I reviewed the docs on GitHub and googled extensively.
The best I got is:
value.parseHtml().select('p[contains('xyz')]')
It results in a syntax error.
The 'select' syntax is based on the select syntax in jsoup (http://jsoup.org/cookbook/extracting-data/selector-syntax)
In this case I believe the syntax you need is:
value.parseHtml().select("p:contains(xyz)")
Owen
Perhaps you missed my writeup (and WARNING) on the wiki :) here?
https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML#extract-html-attributes-text-links-with-integrated-grel-jsoup-commands
WARNING: Make sure to use .toString() suffixes when needed to output strings into Refine cells while working with the built-in HTML GREL commands (the default output is org.jsoup.nodes objects). Otherwise you'll get a preview just fine in the Expression Editor, BUT no data shown in the Refine cells when you apply it!
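Putting the two answers together, something like this should land a plain string in the cell (xyz stands in for whatever text you're matching; [0] takes the first matching element):

value.parseHtml().select("p:contains(xyz)")[0].toString()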
BTW, how could we make the docs better, and where, so that someone doesn't miss this in the future?
I even gave folks a nice example in our docs that shows using .toString():
https://github.com/OpenRefine/OpenRefine/wiki/GREL-Other-Functions#selectelement-e-string-s

NLTK letter 'u' in front of text result?

I'm learning NLTK with a tutorial, and whenever I try to print some text contents, the output has a 'u' in front of it.
In the tutorial it looks like this,
firefox.txt Cookie Manager: "Don't allow sites that set removed cookies to se...
But in my result, it looks like this
(u'firefox.txt', u'Cookie Manager: "Don\'t allow sites that set removed cookies to se', '...')
I am not sure why. I followed the tutorial exactly. Can someone help me understand this problem? Thank you!
That leading u just means that the string is Unicode; you see it because Python 2's repr of a unicode string shows the prefix (all strings are Unicode in Python 3, so it isn't shown there). The parentheses mean that you are dealing with a tuple. Both will go away if you print the individual elements of the tuple, as with t[0], t[1], and so on (assuming that t is your tuple).
If you want to print the whole tuple at once, without the u's and parentheses, try the following:
print " ".join(t)
As mentioned in the other answer, the leading u just means the string is Unicode. str() can be used to convert a single unicode value to str, but there doesn't seem to be a direct way to convert all the values in a tuple from unicode to string.
Use a simple function like the one below whenever you are dealing with a tuple in NLTK:
>>> def str_tuple(t, encoding="ascii"):
...     return tuple([i.encode(encoding) for i in t])
>>> str_tuple(nltk.corpus.gutenberg.fileids())
('austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'bryant-stories.txt', 'burgess-busterbrown.txt', 'carroll-alice.txt', 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt', 'edgeworth-parents.txt', 'melville-moby_dick.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt')
I guess you are using Python 2.6 or some other version before 3.0.
Early versions of Python allowed the same operations on str and unicode objects, and implicit conversion between the two relied in some cases on the default encoding, which on most platforms is ASCII. That's probably the cause of your problem. Here are two ways that may solve it:
First, decode the strings manually. For example:
>>> for name in nltk.corpus.gutenberg.fileids():
...     name = name.decode('utf-8')
...     print(name)
The other way is to upgrade your Python to version 3.0+ (recommended). They fixed this problem in Python 3.0. Here is a link to the detailed description of the change:
https://docs.python.org/release/3.0.1/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit
Hope this helps you.

Getting MySQL to properly distinguish Japanese characters in SELECT calls

I'm setting up a database to do some linguistic analysis, and Japanese kana are giving me just a bit of trouble.
Unlike other questions on this so far, I don't know that it's an encoding issue per se. I've set the collation to utf8_unicode_ci, and on the surface it's saving and recalling most things all right.
The problem, however, is when I get into related kana, such as キ (ki) and ギ (gi). For sorting purposes, Japanese doesn't distinguish between the two unless they are in direct conflict. So for example:
ぎ (gi) comes before きかい (kikai)
きる (kiru) comes before ぎわく (giwaku)
き (ki) comes before ぎ (gi)
It's this behavior that I think is at the root of my problem. When loading my data set from an external file, I had it do a SELECT call to verify that specific readings in Japanese had not already been logged. If it was already there, it would fetch the ID so it could be paired to a headword; otherwise a new entry was added and paired thereafter.
What I noticed after I put everything in is that wherever two such similar readings occurred, the first one encountered would be logged and would then produce a false positive when the other one appeared. For example:
キョウ (kyou) appeared first, so characters with ギョウ (gyou) got paired with kyou instead
ズ (zu) appeared before ス (su), so likewise even more characters got incorrectly matched.
I can go through and manually sort it out if need be, but what I would really like to do is set the database up to take a stricter view regarding differentiating between characters (e.g. if the characters have two different UTF-8 code points, treat them as different characters). Is there any way to get this behavior?
You can use utf8_bin to get a collation that compares characters by their Unicode code points.
The utf8_general_ci collation also distinguishes キョウ and ギョウ.
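For example (the table and column names here are hypothetical):

-- store and compare the readings by code point
ALTER TABLE readings MODIFY reading VARCHAR(64) CHARACTER SET utf8 COLLATE utf8_bin;
-- now キョウ and ギョウ no longer match each other
SELECT id FROM readings WHERE reading = 'ギョウ';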
When saving to the database, save it as binary, and when reading it back, convert it back to Japanese.
The same problem occurred for me with the Arabic language.

PDF Open Parameters: comment=commentID doesn't work

According to Adobe's manual on PDF Open Parameters, PDF files can be opened with certain parameters from the command line or from a link in HTML.
These open parameters include page=pagenum, zoom=scale, comment=commentID and others (the first parameter should be preceded by a # and each subsequent one by a &).
The official PDF Open Parameters document from Adobe gives this example:
#page=1&comment=452fde0e-fd22-457c-84aa-2cf5bed5a349
but the comment part doesn't work for me!
page=pagenum and zoom=scale work for me well. But comment=commentID does not work. I tried on Adobe reader 6.0.0 and Adobe Pro Extended 9.0.0: I can't get to the specified comment.
Also, I get the comment ID by exporting the comments in XFDF format; in the resulting file there is a name attribute for every comment that I hope corresponds to the ID (at least it looks like the example in the manual).
I thought maybe there is a setting that I should first enable (or maybe disable) in Adobe, or maybe I am getting the comment IDs wrong, or maybe something else?!
Any help would be extremely appreciated
According to the docs, you must include a page=X along with your comment=foo. Your copied sample has it, but it's copied from the docs, not something you did yourself.
Are you missing a page= when setting comment?
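For example, a link of the form the manual describes, with both parameters (the host and file name are made up; the ID is the one from Adobe's sample):

http://example.org/doc.pdf#page=1&comment=452fde0e-fd22-457c-84aa-2cf5bed5a349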
BASTARDS!
From the last page of the manual you linked:
URL Limitations
●Only one digit following a decimal point is retained for float values.
●Individual parameters, together with their values (separated by & or #), can be no greater than 32 characters in length.
Emphasis added.
The comment ID is a 16-byte value expressed as hex (32 hex digits), with four hyphens thrown in to break up the monotony. That's 36 characters right there... and "comment=" adds another 8 characters: 44 characters total.
According to that, a comment ID can NEVER WORK, including the samples they show in their docs.
Are you just trying it on the command line, or have you tried via a web browser too? I wonder if that makes a difference. If not, we're looking at a feature that CANNOT WORK. EVER... and probably never has.