Generate fake CSV to test with rspec - csv

I want to test my method which imports a CSV file, but I don't know how to generate fake CSV files to test it with.
I tried a lot of solutions I already found on Stack Overflow, but none of them work in my case.
Here is the original CSV file:
firstname,lastname,home_phone_number,mobile_phone_number,email,address
orsay,dup,0154862548,0658965848,orsay.dup#gmail.com,2 rue du pré paris
richard,planc,0145878596,0625147895,richard.planc#gmail.com,45 avenue du general leclerc
person.rb
# at the top of person.rb: require 'csv' (the CSV library is not loaded by default in Rails)
def self.import_data(file)
  filename = File.join Rails.root, file
  CSV.foreach(filename, headers: true, col_sep: ',') do |row|
    person = Person.find_or_create_by(firstname: row['firstname'], lastname: row['lastname'], address: row['address'])
    unless person.is_former_email?(row['email'])
      person.update_attributes(firstname: row['firstname'], lastname: row['lastname'], home_phone_number: row['home_phone_number'], mobile_phone_number: row['mobile_phone_number'], address: row['address'], email: row['email'])
    end
  end
end
person_spec.rb:
require "rails_helper"

RSpec.describe Person, :type => :model do
  describe "CSV file is valid" do
    file = # fake file
    it "should read in the csv" do
    end
    it "should have result" do
    end
  end

  describe "import valid data" do
    valid_data_file = # fake file
    it "save new people" do
      Person.delete_all
      expect { Person.import_data(valid_data_file) }.to change { Person.count }.by(2)
      expect(Person.find_by(lastname: 'dup').email).to eq "orsay.dup#gmail.com"
    end
    it "update with new email" do
    end
  end

  describe "import invalid data" do
    invalid_data_file = # fake file
    it "should not update with former email" do
    end
    it "should not import twice from CSV" do
    end
  end
end

I successfully used the Faked CSV Gem from https://github.com/jiananlu/faked_csv to generate a CSV file with fake data for exactly this purpose.
Follow these steps to use it:
Open your command line (e.g. on OSX, open Spotlight with CMD+Space and enter "Terminal").
Install the Faked CSV Gem by running gem install faked_csv. Note: in a Ruby on Rails project, add gem 'faked_csv' to your Gemfile and then run bundle install.
Check that the Faked CSV Gem installed successfully by running faked_csv --version in the terminal.
Create a configuration file for the Faked CSV Gem, in which you define how the fake data should be generated. For example, the configuration below generates a CSV file with 200 rows (edit this to as many as you wish) containing comma-separated columns for each field. If the value of a field's type is prefixed with faker:, refer to the "Usage" section of the Faker Gem (https://github.com/stympy/faker) for examples.
my_faked_config.csv.json
{
  "rows": 200,
  "fields": [
    {
      "name": "firstname",
      "type": "faker:name:first_name",
      "inject": ["luke", "dup", "planc"]
    },
    {
      "name": "lastname",
      "type": "faker:name:last_name",
      "inject": ["schoen", "orsay", "richard"]
    },
    {
      "name": "home_phone_number",
      "type": "rand:int",
      "range": [1000000000, 9999999999]
    },
    {
      "name": "mobile_phone_number",
      "type": "rand:int",
      "range": [1000000000, 9999999999]
    },
    {
      "name": "email",
      "type": "faker:internet:email"
    },
    {
      "name": "address",
      "type": "faker:address:street_address",
      "rotate": 200
    }
  ]
}
Run the following command to use the configuration file my_faked_config.csv.json to generate a CSV file named my_faked_data.csv in the current folder containing the fake data: faked_csv -i my_faked_config.csv.json -o my_faked_data.csv
Since the generated file may not include the header row with the label for each column, manually insert the following line at the top of my_faked_data.csv: firstname,lastname,home_phone_number,mobile_phone_number,email,address
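If you prefer not to edit the file by hand, a short Ruby script can prepend the header line; a minimal sketch, assuming the my_faked_data.csv file name from the previous step:
# prepend_header.rb - prepends the column header to the generated file
header = "firstname,lastname,home_phone_number,mobile_phone_number,email,address\n"
rows = File.read("my_faked_data.csv")
File.write("my_faked_data.csv", header + rows)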
Review the final contents of the my_faked_data.csv file, which should look similar to the following:
my_faked_data.csv
firstname,lastname,home_phone_number,mobile_phone_number,email,address
Kyler,Eichmann,8120675609,7804878030,norene#bergnaum.io,56006 Fadel Mission
Hanna,Barton,9424088332,8720530995,anabel#moengoyette.name,874 Leannon Ways
Mortimer,Stokes,5645028548,9662617821,moses#kihnlegros.org,566 Wilderman Falls
Camden,Langworth,2622619338,1951547890,vincenza#gaylordkemmer.info,823 Esmeralda Pike
Nikolas,Hessel,5476149226,1051193757,jonathon#ziemannnitzsche.name,276 Reinger Parks
...
Modify your person_spec.rb unit test using the technique shown below, which passes mock data in to test the functionality of the import_data method in your person.rb file.
person_spec.rb
require 'rails_helper'
require 'stringio'

RSpec.describe Person, type: :model do
  describe 'Class' do
    subject { Person }
    it { should respond_to(:import_data) }

    let(:data) { "firstname,lastname,home_phone_number,mobile_phone_number,email,address\nKyler,Eichmann,8120675609,7804878030,norene#bergnaum.io,56006 Fadel Mission" }

    describe "#import_data" do
      it "saves new people" do
        # stub File.open so that CSV.foreach reads the in-memory data instead of a real file
        allow(File).to receive(:open).and_return(StringIO.new(data))
        Person.import_data("filename")
        # phone numbers are compared as strings here, assuming string columns (the sample CSV has leading zeros)
        expect(Person.find_by(firstname: 'Kyler').mobile_phone_number).to eq "7804878030"
      end
    end
  end
end
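If you would rather not stub File.open, another option is to write the fake rows to a file under Rails.root and pass its relative path to import_data (the method joins the path onto Rails.root). A minimal sketch, with an illustrative file name and assuming the Person model from the question:
require "rails_helper"

RSpec.describe Person, type: :model do
  describe ".import_data" do
    let(:csv_path) { "tmp/fake_people.csv" }  # relative path, because import_data joins it onto Rails.root
    let(:csv_content) do
      "firstname,lastname,home_phone_number,mobile_phone_number,email,address\n" \
      "Kyler,Eichmann,8120675609,7804878030,norene@bergnaum.io,56006 Fadel Mission\n"
    end

    before { File.write(Rails.root.join(csv_path), csv_content) }
    after  { File.delete(Rails.root.join(csv_path)) }

    it "imports the rows from the file" do
      expect { Person.import_data(csv_path) }.to change { Person.count }.by(1)
    end
  end
end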
Note: I used this approach myself to generate a large CSV file with meaningful fake data for my Ruby on Rails CSV app. The app allows a user to upload a CSV file containing specific column names, persists it to a PostgreSQL database, and then displays the data in a paginated table view with AJAX-based search and sort.

Use a spreadsheet program such as OpenOffice or Excel and save the file as a .csv file via the save options.
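If you would rather script it instead, plain Ruby with the csv and faker gems can generate a similar file; a minimal sketch (the file name, row count, and faker >= 2 keyword syntax are assumptions):
require 'csv'
require 'faker'

CSV.open("fake_people.csv", "w") do |csv|
  csv << %w[firstname lastname home_phone_number mobile_phone_number email address]
  20.times do
    csv << [Faker::Name.first_name, Faker::Name.last_name,
            Faker::Number.number(digits: 10), Faker::Number.number(digits: 10),
            Faker::Internet.email, Faker::Address.street_address]
  end
end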


Python Lambda actioning CSV file once but not the second time

I am experiencing a strange issue with my Python code. Its objectives are the following:
Retrieve a .csv from S3
Convert that .csv into JSON (it's an array of objects)
Add a few key/value pairs to each object in the array, and change the original keys
Validate the JSON
Send the JSON to an /output S3 bucket
Load the JSON into Dynamo
Here's what the .csv looks like:
Prefix,Provider
ABCDE,Provider A
QWERT,Provider B
ASDFG,Provider C
ZXCVB,Provider D
POIUY,Provider E
And here's my python script:
import json
import boto3
import ast
import csv
import os
import datetime as dt
from datetime import datetime
import jsonschema
from jsonschema import validate

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')

providerCodesSchema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "providerCode": {"type": "string", "maxLength": 5},
            "providerName": {"type": "string"},
            "activeFrom": {"type": "string", "format": "date"},
            "activeTo": {"type": "string"},
            "apiActiveFrom": {"type": "string"},
            "apiActiveTo": {"type": "string"},
            "countThreshold": {"type": "string"}
        },
        "required": ["providerCode", "providerName"]
    }
}

datestamp = dt.datetime.now().strftime("%Y/%m/%d")
timestamp = dt.datetime.now().strftime("%s")
updateTime = dt.datetime.now().strftime("%Y/%m/%d/%H:%M:%S")
nowdatetime = dt.datetime.now()
yesterday = nowdatetime - dt.timedelta(days=1)
nintydaysfromnow = nowdatetime + dt.timedelta(days=90)


def lambda_handler(event, context):
    filename_json = "/tmp/file_{ts}.json".format(ts=timestamp)
    filename_csv = "/tmp/file_{ts}.csv".format(ts=timestamp)
    keyname_s3 = "newloader-ptv/output/{ds}/{ts}.json".format(ds=datestamp, ts=timestamp)
    json_data = []

    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        key_name = record['s3']['object']['key']
        s3_object = s3.get_object(Bucket=bucket_name, Key=key_name)
        data = s3_object['Body'].read()
        contents = data.decode('latin')
        with open(filename_csv, 'a', encoding='utf-8') as csv_data:
            csv_data.write(contents)
        with open(filename_csv, encoding='utf-8-sig') as csv_data:
            csv_reader = csv.DictReader(csv_data)
            for csv_row in csv_reader:
                json_data.append(csv_row)

    for elem in json_data:
        elem['providerCode'] = elem.pop('Prefix')
        elem['providerName'] = elem.pop('Provider')

    for element in json_data:
        element['activeFrom'] = yesterday.strftime("%Y-%m-%dT%H:%M:%S.00-00:00")
        element['activeTo'] = nintydaysfromnow.strftime("%Y-%m-%dT%H:%M:%S.00-00:00")
        element['apiActiveFrom'] = " "
        element['apiActiveTo'] = " "
        element['countThreshold'] = "3"
        element['updateDate'] = updateTime

    try:
        validate(instance=json_data, schema=providerCodesSchema)
    except jsonschema.exceptions.ValidationError as err:
        print(err)
        err = "Given JSON data is InValid"
        return None

    with open(filename_json, 'w', encoding='utf-8-sig') as json_file:
        json_file.write(json.dumps(json_data, default=str))

    with open(filename_json, 'r', encoding='utf-8-sig') as json_file_contents:
        response = s3.put_object(Bucket=bucket_name, Key=keyname_s3, Body=json_file_contents.read())

    for jsonElement in json_data:
        table = dynamodb.Table('privateProviders-loader')
        table.put_item(Item=jsonElement)

    print("finished enriching JSON")
    os.remove(filename_csv)
    os.remove(filename_json)
    return None
I'm new to Python, so please forgive any amateur mistakes in the code.
Here's my issue:
When I deploy the code, and add a valid .csv into my S3 bucket, everything works.
When I then add an invalid .csv into my S3 bucket, it again works as intended: the import fails, as the validation kicks in and tells me the problem.
However, when I add the valid .csv back into the S3 bucket, I get the same CloudWatch log as I did for the invalid .csv, my Dynamo isn't updated, and no output JSON file is sent to /output in S3.
With some troubleshooting I've noticed the following behaviour:
When I first deploy the code, the first .csv loads as expected (Dynamo table updated + JSON file sent to S3 + CloudWatch logs documenting the process).
If I add the same valid .csv into the S3 bucket again, it gives me the same nice-looking CloudWatch logs, but none of the other actions take place (Dynamo not updated, etc.).
If I add the invalid .csv, that seems to break the cycle, and I get a CloudWatch log showing the validation has kicked in. But if I then reload the valid .csv, which just previously resulted in good CloudWatch logs (but no actual outputs), I now get a repeat of the validation error log.
In short, the first time the function is invoked it seems to work; the second time it doesn't.
It seems as though the Python function is caching something or not cleaning up when it finishes. I've played about with the return command etc., but nothing I've tried works. I've sunk many hours into moving parts of the code around, thinking the structure or order of events was the problem, and the code above gives me the closest behaviour to what I expect, given that it works completely the first and only time I load the .csv into S3.
Any help or general pointers would be massively appreciated.
Thanks
P.S. Here's an example of the CloudWatch log when validation kicks in and stops an invalid .csv from being processed. If I then add a valid .csv to S3, the function is triggered, but I get this same error, even though the file is actually good.
2021-06-29T22:12:27.709+01:00 'ABCDEE' is too long
2021-06-29T22:12:27.709+01:00 Failed validating 'maxLength' in schema['items']['properties']['providerCode']:
2021-06-29T22:12:27.709+01:00 {'maxLength': 5, 'type': 'string'}
2021-06-29T22:12:27.709+01:00 On instance[2]['providerCode']:
2021-06-29T22:12:27.709+01:00 'ABCDEE'
2021-06-29T22:12:27.710+01:00 END RequestId: 81dd6a2d-130b-4c8f-ad08-39307841adf9
2021-06-29T22:12:27.710+01:00 REPORT RequestId: 81dd6a2d-130b-4c8f-ad08-39307841adf9 Duration: 482.43 ms Billed Duration: 483

bigrquery bq_table_load csv file with tab delimiter

I am trying to use bigrquery's bq_table_load() command to move a tab-delimited CSV file from Google Storage to BigQuery. It works, but it doesn't automatically recognize the column names. Doing the same thing interactively (i.e. in the BigQuery cloud console) works well. Comparing the job metadata for the two jobs (the R-induced job vs the cloud console job), I note that the column delimiter is not set to TAB for the R job. This is despite me including it in my command call, e.g. as follows:
bq_table_load(<x>,<uri>, fieldDelimiter="Tab", source_format = "CSV", autodetect=TRUE)
I tried all sorts of variations of this...nothing seems to work (i.e. the R job will always have the Comma delimiter set)...here are some of the variations I tried:
bq_table_load(<x>,<uri>, field_delimiter="Tab", source_format = "CSV", autodetect=TRUE)
bq_table_load(<x>,<uri>, field_delimiter="\t", source_format = "CSV", autodetect=TRUE)
bq_table_load(<x>,<uri>, field_delimiter="tab", source_format = "CSV", autodetect=TRUE)
Any suggestions?
You can define the schema using a schema file; a sample is given below.
Sample bq load command, where $schema_dir/$TABLENAME.json represents the schema file:
bq --nosync load --source_format=CSV --skip_leading_rows=3 --allow_jagged_rows=TRUE --max_bad_records=10000 \
--allow_quoted_newlines=TRUE $projectid:$dataset.$TABLENAME \
$csv_data_path/$FILENAME $schema_dir/$TABLENAME.json
Sample Schema file
[
  {
    "mode": "NULLABLE",
    "name": "C1",
    "type": "STRING"
  }
]

How to define config file variables?

I have a configuration file with:
{path, "/mnt/test/"}.
{name, "Joe"}.
The path and the name can be changed by a user. As far as I know, there is a way to store those variables in a module by using file:consult/1 inside
-define(VARIABLE, <parsing of the config file>).
Are there better ways to read a config file when the module starts working, without writing a parsing function inside -define? (As far as I know, Erlang developers consider it bad practice to put complicated functions in -define.)
If you only need to read the config when you start the application, you can use the application config file, which is referenced in rebar.config:
{profiles, [
    {local, [
        {relx, [
            {dev_mode, false},
            {include_erts, true},
            {include_src, false},
            {vm_args, "config/local/vm.args"},
            {sys_config, "config/local/yourapplication.config"}
        ]}
    ]}
]}.
More info about this is here: rebar3 configuration
The next step is to create yourapplication.config and store it in your application folder, e.g. /app/config/local/yourapplication.config
This configuration file should have a structure like this example:
[
    {yourapplicationname, [
        {path, "/mnt/test/"},
        {name, "Joe"}
    ]}
].
Then, when your application is started, you can get the config data with:
{ok, "/mnt/test/"} = application:get_env(yourapplicationname, path)
{ok, "Joe"} = application:get_env(yourapplicationname, name)
and now you can -define these variables like:
-define(VARIABLE,
    case application:get_env(yourapplicationname, path) of
        {ok, Data} -> Data;
        _ -> undefined
    end
).

How to keep appium capabilities in json file and call in code

Following is my appium capability set to run a test
cap = new DesiredCapabilities();
cap.setCapability(CapabilityType.PLATFORM, "Android");
cap.setCapability(CapabilityType.VERSION, "5.1.0");
cap.setCapability("deviceName", "mygeny510");
cap.setCapability("appPackage", "com.android.dialer");
cap.setCapability("appActivity", "com.android.dialer.DialtactsActivity");
driver = new AndroidDriver<MobileElement>(new URL("http://127.0.0.1:4723/wd/hub"), cap);
I want to keep the capabilities in an apm.json file:
[
  {
    "platformName": "android",
    "appPackage": "com.android.dialer",
    "appActivity": "com.android.dialer.DialtactsActivity",
    "deviceName": "mygeny510"
  }
]
Can anyone help me load apm.json in the code instead of writing each capability with cap.setCapability(...)?
You can place all the desired capabilities in a separate file and load that file wherever you need to reference it.
For example, I have the desired capabilities in env.rb:
def abc
  {
    caps: {
      platformName: "iOS",
      deviceName: "",
      udid: "",
      app: (File.join(File.dirname(__FILE__), "")),
      bundleId: "",
      automationName: "XCUITest",
      xcodeOrgId: "",
      xcodeSigningId: "",
      platformVersion: "9.3.2",
      noReset: "true",
      fullReset: "false",
      showIOSLog: "true"
    }
  }
end
Now go to the file where you want to launch these desired capabilities. For this, you need to load the capabilities file there; I used require_relative to load it so the method can be called. Once you do that, you can start the session with:
def AnyName
  Appium::Driver.new(abc)               # Pass capabilities for appium inside the driver
  Appium.promote_appium_methods Object  # Makes all appium_lib methods accessible from steps
  $driver.start_driver                  # Starts appium driver before the tests begin
end
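Since the question keeps the capabilities in apm.json rather than in a Ruby hash, here is a minimal sketch of reading that JSON file into the caps hash (the file location and method names are illustrative, and the appium_lib gem is assumed):
require 'json'
require 'appium_lib'

def caps_from_json
  file = File.join(File.dirname(__FILE__), 'apm.json')
  # the question's apm.json is an array with one object, so take the first entry
  { caps: JSON.parse(File.read(file), symbolize_names: true).first }
end

def start_session
  Appium::Driver.new(caps_from_json)    # pass the parsed capabilities to the driver
  Appium.promote_appium_methods Object  # make appium_lib methods available
  $driver.start_driver                  # start the appium session
end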
Hope this helps!

extract values from json using Ruby

I need to extract only the values for 'admins' from this JSON using Ruby:
JSON:
{
  "Roles": [
    {
      "admins": [
        "me"
      ],
      "role": "cleanup"
    },
    {
      "admins": [
        "tester"
      ],
      "role": "create a mess"
    }
  ]
}
RUBY:
require 'json'

file = File.read('adminlist_Feb_2017.json')
thismonthlist = JSON.parse(file)
puts thismonthlist['admins']
Output: this gives me blank output. However, if I change the last line to:
puts thismonthlist['Roles']
it gives me everything. I just want the list of admins.
Try something like this (JSON.parse returns string keys by default, so use 'Roles' and 'admins'):
thismonthlist['Roles'].flat_map { |role| role['admins'] }
# => ["me", "tester"]
admins = []
File.open('adminlist_Feb_2017.json', 'r') do |file|
  json = JSON.parse(file.read)
  admins = json["Roles"].flat_map { |role| role["admins"] }.uniq
end
admins
# => ["me", "tester"]
I open the file and process it in a block to ensure it is closed at the end. In the block I read the file content and parse the JSON string into a hash. Then I go through the "Roles" of the hash, grab the "admins" arrays, and return them as a single array with Enumerable#flat_map. Finally I use Enumerable#uniq to return each admin only once.