Jekyll: What is the default _data sorting criteria?

When iterating through an array of files in the _data folder, what is the default criteria for sorting the files?
At first I expected them to be sorted alphabetically, but after some testing I realized they were not. Still, I couldn't figure out what criteria were being used to sort the files.
{%- for file in site.data.folder -%}
{{ file | inspect }}
<br />
<br />
{%- endfor -%}
From what I understand, each file is an array containing the filename as the first element and the data as the second element, so I'm not sure using sort with a property name would work. When I tried, I got this error message:
Liquid Exception: no implicit conversion of String into Integer
When using sort with no arguments, I could get the files sorted alphabetically by filename:
{%- assign files = site.data.folder | sort -%}
{%- for file in files -%}
{{ file | inspect }}
<br />
<br />
{%- endfor -%}
So my questions are:
What is the default sorting criteria for _data files?
Is sorting in relation to an object property possible? (I'm thinking the issue with that one is having an array and not the pure objects when you access site.data.folder)
Example:
After creating the default Jekyll page, I created the _data/folder directory, where I'd include 5 random .json files:
_data/folder/a.json
_data/folder/b.json
_data/folder/c.json
_data/folder/d.json
_data/folder/e.json
Each of them has the following content:
_data/folder/a.json:
{"name":"Mike"}
_data/folder/b.json:
{"id":"4343"}
_data/folder/c.json:
[{"age":"29"},{"job":"journalist"}]
_data/folder/d.json:
{"name":"John"}
_data/folder/e.json:
{"haircolor":"green"}
With those files in place, I created a page named page.html on the root directory with:
---
---
<pre>{{ site.data.folder | inspect }}</pre>
<br />
<br />
{%- for file in site.data.folder -%}
<pre>{{ file | inspect }}</pre>
<br />
{%- endfor -%}
And the output of that page was:
{"e"=>{"haircolor"=>"green"}, "c"=>[{"age"=>"29"}, {"job"=>"journalist"}], "d"=>{"name"=>"John"}, "a"=>{"name"=>"Mike"}, "b"=>{"id"=>"4343"}}
["e", {"haircolor"=>"green"}]
["c", [{"age"=>"29"}, {"job"=>"journalist"}]]
["d", {"name"=>"John"}]
["a", {"name"=>"Mike"}]
["b", {"id"=>"4343"}]
The files were not ordered alphabetically, but instead in some apparently random order. I can get them in alphabetical order by using:
---
---
<pre>{{ site.data.folder | sort | inspect }}</pre>
<br />
<br />
{%- assign folder = site.data.folder | sort -%}
{%- for file in folder -%}
<pre>{{ file | inspect }}</pre>
<br />
{%- endfor -%}
Output:
[["a", {"name"=>"Mike"}], ["b", {"id"=>"4343"}], ["c", [{"age"=>"29"}, {"job"=>"journalist"}]], ["d", {"name"=>"John"}], ["e", {"haircolor"=>"green"}]]
["a", {"name"=>"Mike"}]
["b", {"id"=>"4343"}]
["c", [{"age"=>"29"}, {"job"=>"journalist"}]]
["d", {"name"=>"John"}]
["e", {"haircolor"=>"green"}]
But it's still unclear what ordering criteria apply to the call without sort.

Going from @ashmaroli's assumption that this was not a Jekyll issue, I did a little research on file ordering and ran into the following resources:
File ordering behavior while using Dir on Ruby
Indeterministic File order using Dir
The link describes a counterintuitive behavior when loading multiple dependencies. If the order in which the files are loaded matters, the shortcut below could load them in a different order than expected.
Dir[File.join(File.dirname(__FILE__), 'example/*.rb')].each{ |f| require f }
This is apparently due to the underlying glob system call according to the answer in the link.
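A minimal Ruby sketch of the usual fix: sort the glob result before using it, so the order no longer depends on the filesystem. The temporary directory, the file names, and the sorted_data_names helper are made up for illustration. As far as I know, Ruby 3.0 and later already sort Dir[] and Dir.glob results by default (the sort: keyword), so the explicit sort mainly matters on older Rubies.

```ruby
require "tmpdir"

# Return the base names of all .json files in dir, in a deterministic order.
# (Hypothetical helper, not part of Jekyll.)
def sorted_data_names(dir)
  paths = Dir.glob(File.join(dir, "*.json")) # raw order may depend on the OS
  paths.map { |path| File.basename(path, ".json") }.sort
end

# Create five hypothetical files in a scrambled order, then list them.
names = Dir.mktmpdir do |dir|
  %w[e c d a b].each { |name| File.write(File.join(dir, "#{name}.json"), "{}") }
  sorted_data_names(dir)
end

p names # => ["a", "b", "c", "d", "e"]
```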
Python glob ordering
How is Pythons glob.glob ordered?
In the SO question above, the user is asking why the returned glob file order in Python is different than the order on the output of ls -l. Even though the question is about Python and not Ruby, the underlying call to the OS is likely the same. The OS is not required to deliver the files in any order, so they should be sorted after the call.
The first answer states that if you run ls -U you get the unordered list of files, which matches the order I have here when I make a list of _data objects on Jekyll without sorting. So this is most likely the cause of the weird ordering: it's OS dependent.
Since Jekyll orders the _posts files, I think it wouldn't be a major issue to order _data files by default as well, to avoid any confusion. But as stated in the question itself, it can easily be done with the sort filter.
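On the second question (sorting by a property): the error is consistent with each item being a [filename, data] pair rather than a hash. Here is a plain-Ruby sketch of the same situation, using the data from the question. It is not Jekyll's or Liquid's actual code, and it assumes the property lookup ends up indexing the pair with a string, which is my reading of the error message.

```ruby
# The same data as in the question, in the unsorted order it was read.
data = {
  "e" => { "haircolor" => "green" },
  "c" => [{ "age" => "29" }, { "job" => "journalist" }],
  "d" => { "name" => "John" },
  "a" => { "name" => "Mike" },
  "b" => { "id" => "4343" },
}

# Iterating a Hash yields [key, value] arrays, like `file` in the Liquid loop.
first_pair = data.to_a.first # ["e", {"haircolor"=>"green"}]

# Indexing an Array with a String is what raises the TypeError.
error = begin
  first_pair["name"]
rescue TypeError => e
  e.message
end
puts error # => no implicit conversion of String into Integer

# Sorting by a property only works on entries whose value has that property.
named = data.select { |_key, value| value.is_a?(Hash) && value.key?("name") }
by_name = named.sort_by { |_key, value| value["name"] }.map(&:first)
p by_name # => ["d", "a"]
```

So to sort by a property you would first have to collect the values (file[1]) that actually have that property, and sort that list instead of site.data.folder itself.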

Related

Does Jinja support variable assignment as a result of a loop?

I've been using Jinja and DBT for a month now, and despite reading a lot about it, I didn't quite figure out how to create a list from another, using a simple for loop like I would in Python.
Just a toy example:
{%- set not_wanted_columns = ['apple', 'banana'] -%}
{%- set all_columns = ['kiwi', 'peach', 'apple', 'banana', 'apricot', 'pineapple'] -%}
What I want is a list as so:
{% set filtered_columns = ['kiwi', 'peach', 'apricot', 'pineapple'] %}
Naturally, I don't want to manually write this result because the full list might be dynamic or too long. I'm not even sure if Jinja does actually support this, although I do think this is a common problem.
As you have probably read from the documentation:
Please note that assignments in loops will be cleared at the end of the iteration and cannot outlive the loop scope. Older versions of Jinja had a bug where in some circumstances it appeared that assignments would work. This is not supported.
Source: https://jinja.palletsprojects.com/en/3.1.x/templates/#for
And I guess that when you speak about
"using a simple for loop like I would in Python"
what you mean is using a list comprehension.
So, as shown in the documentation, Jinja uses filters to achieve this:
Example usage:
{{ numbers|select("odd") }}
{{ numbers|select("divisibleby", 3) }}
Similar to a generator comprehension such as:
(n for n in numbers if test_odd(n))
(n for n in numbers if test_divisibleby(n, 3))
Source: https://jinja.palletsprojects.com/en/3.1.x/templates/#jinja-filters.select
There are actually four such filters that act like generator comprehensions:
reject
rejectattr
select
selectattr
So, in your case, a reject filter would totally do the trick:
{%- set filtered_columns = all_columns
| reject('in', not_wanted_columns)
| list
-%}
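But, if you really want, you could also achieve it with a for loop, as long as the list is created before the loop:
{%- set filtered_columns = [] -%}
{%- for column in all_columns if column not in not_wanted_columns -%}
{% do filtered_columns.append(column) %}
{%- endfor -%}
This works because append() mutates the existing list instead of assigning a new variable inside the loop. The do statement, provided by the expression-statement extension (jinja2.ext.do), is a way to call list.append() without its return value being printed: https://jinja.palletsprojects.com/en/3.1.x/templates/#expression-statement.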

How to loop through all files in Jekyll's _data folder, and pull their filenames?

Similar to the question at How to loop through all files in Jekyll's _data folder?, how would one loop through the files in their /_data directory (or a subdirectory) and pull the filenames of each file?
For example, if you had:
_data/
  navigation.yml
  news.yml
  people/
    advisors.yml
    board.yml
    staff.yml
... and you wanted to get the list of files inside /_data/people/?
If you loop through the subdirectory, each for item (in this case "file") will have file[0] as the filename (without the extension) and file[1] as the content of the file.
Thus, your code can look like:
{% for file in site.data.people %}
{{ file[0] }}
{% endfor %}
This would result in:
advisors
board
staff

How to Convert a folder of csv files to json files within Jekyll

Is it possible to convert a folder of csv files to json as part of a Jekyll workflow? I currently use a python script to do this but would like to do it entirely within Jekyll
You can call your Python script as part of the build process, but you'll have to make a teeny tiny plugin to do it. This also assumes you're not using GitHub Pages, because they don't like plugins.
Make a _plugins directory in your site root.
Inside that directory, create csv_to_json.rb
In that Ruby file, call your Python script with the following code:
# The following line tells Jekyll to run everything between 'do' and 'end'
# when it finishes writing the site to disk:
Jekyll::Hooks.register :site, :post_write do |_site|
  # Backticks are one way to call shell commands from Ruby:
  `python your_script_here.py` # replace with the correct filename
end
This is untested code. Relevant documentation here and here
There are quite a few ways to do this, but I think that's the simplest for your case.
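If you'd rather drop the Python dependency, the conversion itself is short in Ruby and could be called from the same kind of hook. A sketch using only the csv and json libraries; the sample rows and the csv_to_json helper name are made up, and it assumes every CSV file has a header row. On recent Rubies csv is a bundled gem, so it may need to be listed in your Gemfile.

```ruby
require "csv"
require "json"

# Convert CSV text (first row = headers) into a JSON array of objects.
# Hypothetical helper, not part of Jekyll.
def csv_to_json(csv_text)
  rows = CSV.parse(csv_text, headers: true).map(&:to_h)
  JSON.pretty_generate(rows)
end

# Made-up sample data; in a plugin you would read each file with File.read.
sample = <<~CSV
  name,position
  thingy m. bob,coffee fetcher
  mars e. pan,head honcho
CSV

json = csv_to_json(sample)
puts json
```

All values come out as strings, because CSV.parse does not convert types unless asked to.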
You can:
1 - Store your CSV files in _data/foldername (e.g. _data/members); see Jekyll's data files
2 - Put all your data in a new array with the concat filter
{% comment %} ### Create an empty array{% endcomment %}
{% assign all-members = "" | split: "" %}
{% for part in site.data.members %}
{% assign all-members = all-members | concat: part[1] %}
{% endfor %}
3 - Output the data using the jsonify filter: {{ all-members | jsonify }}
All in one, a members.json file can look like:
---
layout: null
---
{% assign all-members = "" | split: "" %}
{%- for part in site.data.members %}
{% assign all-members = all-members | concat: part[1] %}
{% endfor -%}
{{ all-members | jsonify }}
Thank you to both of you. I have gone the plugin route as this is the easiest and I may need to develop another plugin at some point so I may as well learn how to make one.

Use variable from YFM in post

How can I use a dynamic data file?
Say I have several data files: file1.yml, file2.yml, file3.yml and in YFM I want to tell which data file to use:
---
datafilename: file1
---
{{ site.data.datafilename.person.name }}
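             ^
How do I tell Liquid that file1 should be used here?
Ideally, I would use the post's file name, so that post1.md would use the post1.yml data file and so on.
This should work from inside a post:
{{ site.data[page.slug].person.name }}
To use the front matter variable instead, {{ site.data[page.datafilename].person.name }} works the same way.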

Create multiple jekyll pages from yaml without using a plugin

I am working on a Jekyll site and I want to be able to have a page for each person in the group. I know I can use collections to generate pages if the files in the collection are Markdown. I want to be able to have YAML files in the collection, then generate pages after passing each YAML file to a template.
People files might look like this:
# person-1.yaml
name: thingy m. bob
position: coffee fetcher
bio: no bio needed
# person-2.yaml
name: mars e. pan
position: head honcho
bio: expert in everything
Then a template file like this (people-template.md):
# {{ page.name }} - {{ page.position }}
{{ page.bio }}
And the output would be individual files under /people/, i.e., /people/person-1, /people/person-2, which are formatted as in the template, but using the .yaml files.
I am using GitHub Pages, so I don't want to have to use any plugins that it doesn't support.
I have implemented something similar ... this is the setup I created:
- _layouts
  - person.html
...
- people
  - index.md (list of people - see code below)
  - _posts
    - 2015-01-01-person-one.md (ordering defined by date which is thrown away)
    - 2015-01-02-person-two.md
    - 2015-01-03-person-three.md
...
Then for a list of people you can use something like:
<ul>
{% for person in site.categories.people %}
<li></li>
{% endfor %}
</ul>
with each person being in the form
---
name: "thingy m. bob"
# using quotes to avoid issues with non-alpha characters
position: "coffee fetcher"
bio: "no bio needed"
layout: person
---
any markdown if you want more of a description
I hope that has given you something to start with ... I think that putting the _posts folder under the people folder will automatically set the category to people. If I am wrong, just add category: people to the yaml.
You can set the pattern for the post URLs in _config.yml (the permalink setting, for example /:categories/:title/) if you want to remove the date part.
Good luck