I have the following Ansible project structure:
├── demo.yml
├── hosts
├── group_vars
│   └── all
└── roles
    ├── common
    │   ├── tasks
    │   │   └── main.yml
    │   └── templates
    │       └── init.j2
Inside 'hosts', I have:
[primary]
server1
[secondary]
server2
In roles/common/templates/init.j2, I want to be able to refer to the [primary] group. Since Ansible uses Jinja2 for its template module, I was directed to the Jinja2 documentation.
I tried:
print("{{ group['primary'] }}")
But it will return:
['server1']
Right now I can only get it within a loop:
{% for host in groups['primary'] %}
print("{{ host }}")
{% endfor %}
It will return what I want:
server1
But how do I get this result without using a loop?
Try this...
groups['primary'][0]
or just print groups and you should be able to see how the data is stored.
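For example, inside init.j2 you could write something like the following (the print(...) wrapper is kept only to mirror the question):
print("{{ groups['primary'][0] }}")
With the inventory above, that renders as print("server1").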
Hope this helps!
I'm new to Chef and Test Kitchen, and I'm trying to use an arbitrary JSON file as an attribute or environment file (preferably an attribute file), but unfortunately I can't access the JSON values from the recipes.
I'm using the following directory structure:
uat
├── attributes
│   ├── dev.json
│   ├── .kitchen
│   │   └── logs
│   │       └── kitchen.log
│   └── prod.json
├── Berksfile
├── Berksfile.lock
├── chefignore
├── environments
│   ├── dev.json
│   └── prod.json
├── Gemfile
├── Gemfile.lock
├── .kitchen
│   ├── default-windows.yml
│   └── logs
│       ├── default-windows.log
│       └── kitchen.log
├── .kitchen.yml
├── metadata.rb
└── recipes
    ├── default.rb
    ├── prep.rb
    └── service_install.rb
This is the .kitchen.yml:
---
driver:
  name: machine
  username: sample_user
  password: sample_pass
  hostname: 192.168.1.102
  port: 5985
provisioner:
  name: chef_zero
  json_attributes: true
  environments_path: 'environments/dev'
platforms:
  - name: windows
suites:
  - name: default
    run_list:
      - recipe[uat::default]
This is the dev.json:
{
  "groupID": "Project-name",
  "directoryName": "sample_folder",
  "environmentType": "UAT"
}
This is the recipe prep.rb:
directory "C:/Users/test/#{node['directoryName']}" do
  recursive true
  action :create
end
If I create something.rb in the attributes folder with the content default['directoryName'] = 'sample_folder', it works like a charm, but I need to use a JSON file to store parameters company-wide.
Could you please help me find what I'm doing wrong?
So a couple of issues. First, the environments_path points at a folder, not the specific file, so that should just be environments/. Second, it has to be an actual environment object, see https://docs.chef.io/environments.html#json for a description of the schema. Third, you would need to actually apply the environment to the test node:
provisioner:
  # Other stuff ...
  client_rb:
    chef_environment: dev
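For reference, a minimal sketch of what environments/dev.json could look like as a proper environment object (placing the question's values under default_attributes is an assumption; see the schema doc linked above):
{
  "name": "dev",
  "description": "Dev environment used by Test Kitchen",
  "json_class": "Chef::Environment",
  "chef_type": "environment",
  "default_attributes": {
    "groupID": "Project-name",
    "directoryName": "sample_folder",
    "environmentType": "UAT"
  }
}
With that in place, the recipe can keep reading node['directoryName'] as before.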
I have a Jekyll blog, and some of the raw posts have additional files at the same level as the markdown file. For example:
.
├── _posts
│   └── first-post
│       ├── 2007-10-29-first-post.md
│       └── download.zip
How can I end up with a generated structure such as
.
├── _sites
│   └── first-post
│       ├── index.html
│       └── download.zip
The download.zip file needs to be in the same location as its dependent post (I cannot use any includes or other redirect tricks)
Try jekyll-postfiles, which is:
A Jekyll plugin that copies static files from the _posts to the _site folder
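Installation is the usual gem-based plugin setup; a sketch, assuming the plugin is distributed as a gem under that name (check its README for the exact instructions):
# Gemfile -- load the plugin as a Jekyll plugin gem
gem "jekyll-postfiles", group: :jekyll_plugins
Then run bundle install and rebuild the site; the files sitting next to each post should be copied alongside the generated index.html.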
I get the following error when I try to deploy to AWS Elastic Beanstalk.
Printing Status:
INFO: createEnvironment is starting.
INFO: Using elasticbeanstalk-us-west-2-695359152326 as Amazon S3 storage bucket for environment data.
ERROR: InvalidParameterValue: Unknown template parameter: StaticFiles
ERROR: Failed to launch environment.
I am using a preconfigured Docker Python template with the following structure.
.
├── application.config
├── application.py
├── Dockerfile
├── Dockerrun.aws.json
├── iam_policy.json
├── LICENSE.txt
├── misc
├── NOTICE.txt
├── README.md
├── requirements.txt
├── static
│   ├── bootstrap
│   ├── images
│   └── jquery
└── templates
    ├── aboutus.html
    ├── clients.html
    ├── commonheaderincludes.html
    ├── commonhtmlheader.html
    ├── footer.html
    ├── header.html
    ├── index.html
    └── services.html
Please help.
I have some daily data to save to multiple folders (mostly based on time). I have two formats to store the files in, parquet and csv, and I would like to use the parquet format to save some space.
The folder structure is like the following:
[root#hdp raw]# tree
.
├── entityid=10001
│   └── year=2017
│       └── quarter=1
│           └── month=1
│               ├── day=6
│               │   └── part-r-00000-84f964ec-f3ea-46fd-9fe6-8b36c2433e8e.snappy.parquet
│               └── day=7
│                   └── part-r-00000-84f964ec-f3ea-46fd-9fe6-8b36c2433e8e.snappy.parquet
├── entityid=100055
│   └── year=2017
│       └── quarter=1
│           └── month=1
│               ├── day=6
│               │   └── part-r-00000-84f964ec-f3ea-46fd-9fe6-8b36c2433e8e.snappy.parquet
│               └── day=7
│                   └── part-r-00000-84f964ec-f3ea-46fd-9fe6-8b36c2433e8e.snappy.parquet
├── entityid=100082
│   └── year=2017
│       └── quarter=1
│           └── month=1
│               ├── day=6
│               │   └── part-r-00000-84f964ec-f3ea-46fd-9fe6-8b36c2433e8e.snappy.parquet
│               └── day=7
│                   └── part-r-00000-84f964ec-f3ea-46fd-9fe6-8b36c2433e8e.snappy.parquet
└── entityid=10012
    └── year=2017
        └── quarter=1
            └── month=1
                ├── day=6
                │   └── part-r-00000-84f964ec-f3ea-46fd-9fe6-8b36c2433e8e.snappy.parquet
                └── day=7
                    └── part-r-00000-84f964ec-f3ea-46fd-9fe6-8b36c2433e8e.snappy.parquet
Now I have a Python list that stores all the folders that need to be read. Each run only needs to read some of the folders, based on filter conditions.
folderList = df_inc.collect()
folderString = []
for x in folderList:
    folderString.append(x.folders)
In [44]: folderString
Out[44]:
[u'/data/raw/entityid=100055/year=2017/quarter=1/month=1/day=7',
u'/data/raw/entityid=10012/year=2017/quarter=1/month=1/day=6',
u'/data/raw/entityid=100082/year=2017/quarter=1/month=1/day=7',
u'/data/raw/entityid=100055/year=2017/quarter=1/month=1/day=6',
u'/data/raw/entityid=100082/year=2017/quarter=1/month=1/day=6',
u'/data/raw/entityid=10012/year=2017/quarter=1/month=1/day=7']
The files were written by:
df_join_with_time.coalesce(1).write.partitionBy("entityid","year","quarter","month","day").mode("append").parquet(rawFolderPrefix)
When I try to read the folders stored in folderString with df_batch=spark.read.parquet(folderString), I get the error java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String.
If I save the files in csv format and read them with the code below, it works fine. Is there any way to read the file list for the parquet folders? Much appreciated!
In [46]: folderList=df_inc.collect()
    ...: folderString=[]
    ...:
    ...: for x in folderList:
    ...:     folderString.append(x.folders)
    ...: df_batch=spark.read.csv(folderString)
    ...:
In [47]: df_batch.show()
+------------+---+-------------------+----------+----------+
| _c0|_c1| _c2| _c3| _c4|
+------------+---+-------------------+----------+----------+
|6C25B9C3DD54| 1|2017-01-07 00:00:01|1483718401|1483718400|
|38BC1ADB0164| 3|2017-01-06 00:00:01|1483632001|1483632000|
|38BC1ADB0164| 3|2017-01-07 00:00:01|1483718401|1483718400|
You are facing a misunderstanding of how partitioning works in Hadoop and Parquet.
See, I have a simple file structure partitioned by year-month. It is like this:
my_folder
.
├── year-month=2016-12
│   └── my_files.parquet
├── year-month=2016-11
│   └── my_files.parquet
If I read from my_folder without any filter in my dataframe reader, like this:
df = spark.read.parquet("path/to/my_folder")
df.show()
If you check the Spark DAG visualization, you can see that in this case it reads all of my partitions, as you said (in that visualization, each point in the first square is one partition of my data).
But if I change my code to this:
from pyspark.sql.functions import col, lit

df = spark.read.parquet("path/to/my_folder") \
    .filter((col('year-month') >= lit(my_date.strftime('%Y-%m'))) &
            (col('year-month') <= lit(my_date.strftime('%Y-%m'))))
The DAG visualization will show how many partitions I'm using:
So, if you filter by the column that the data is partitioned on, you will not read all the files, only the ones you need, and you don't need the workaround of reading one folder at a time.
I got this solved by:
df = spark.read.parquet(folderString[0])
y = 0
for x in folderString:
    if y > 0:
        df = df.union(spark.read.parquet(x))
    y = y + 1
It's a very ugly solution; if you have a better idea, please let me know. Many thanks.
A few days later, I found the perfect way to solve the problem:
df=spark.read.parquet(*folderString)
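This works because DataFrameReader.parquet accepts a variable number of path arguments rather than a single list, so unpacking the Python list passes each folder as its own argument. A minimal sketch of the difference:
# Passing the list object itself raises the ClassCastException above,
# because each positional argument must be a single path string:
# df_batch = spark.read.parquet(folderString)

# Unpacking the list passes every folder as a separate path argument:
df_batch = spark.read.parquet(*folderString)  # parquet(path1, path2, ...)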
I thought that setting the relative_directory (Jekyll Collection Docs) (github PR) property would help me keep my files organized without compromising my desired output, but it seems to be ignored/not used when producing files. I don't want my collections to be in the root directory, because I find it confusing to have ~10 collection folders adjacent to _assets, _data, _includes, _layouts, and others.
Fixes or alternative solutions are welcome, as long as the output is the same and my pages are in their own directory, without needing to put permalink front matter on every single page.
_config.yaml
collections:
  root:
    relative_directory: '_pages/root'
    output: true
    permalink: /:path.html
  root-worthy:
    relative_directory: '_pages/root-worthy'
    output: true
    permalink: /:path.html
  docs:
    relative_directory: '_pages/docs'
    output: true
    permalink: /docs/:path.html
Directory Structure:
├── ...
├── _layouts
├── _pages
│   ├── root
│   │   ├── about.html
│   │   └── contact.html
│   ├── root_worthy
│   │   ├── quickstart.html
│   │   └── seo-worthy-page.html
│   └── docs
│       ├── errors.html
│       └── api.html
├── _posts
└── index.html
Desired output:
├── ...
├── _site
│   ├── about.html
│   ├── contact.html
│   ├── quickstart.html
│   ├── seo-worthy-page.html
│   └── docs
│       ├── errors.html
│       └── api.html
└── ...
It seems that the PR you mention is still not merged.
For 3.1.6 and the upcoming 3.2, the Jekyll code is still:
@relative_directory ||= "_#{label}"
But the requester made a plugin that looks like this:
_plugins/collection_relative_directory.rb
module Jekyll
  class Collection
    def relative_directory
      @relative_directory ||= (metadata['relative_directory'] && site.in_source_dir(metadata['relative_directory']) || "_#{label}")
    end
  end
end
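With that file saved under _plugins/, the relative_directory keys in the _config.yaml above should be picked up, so the collections can live under _pages/ while producing the same output paths (note that local plugins are ignored when Jekyll runs in safe mode, for example on GitHub Pages).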