Extracting multiple agencies from a GTFS file - gtfs

I can use the transitland-lib to extract a single agency from a GTFS file and create a new one like this:
sudo ./transitland-linux extract -extract-agency 53 germany.gtfs.zip my_agency.gtfs.zip
Is there a convenient way of extracting multiple agencies in a similar fashion? For example, can I use the extract command like this:
sudo ./transitland-linux extract -extract-agency 53,54 germany.gtfs.zip my_agency.gtfs.zip
... to create a new GTFS file that contains only agencies 53 and 54?

You should be able to extract multiple agencies by repeating the -extract-agency argument, like so:
sudo ./transitland-linux extract -extract-agency 53 -extract-agency 54 germany.gtfs.zip my_agency.gtfs.zip
P.S. Glad you are making good use of transitland-lib!

This can be done by using the onebusaway-gtfs-transformer-cli command-line application.
Specifically "retain an entity":
{"op":"retain", "match":{"file":"agency.txt", "agency_id":"53"}}
{"op":"retain", "match":{"file":"agency.txt", "agency_id":"54"}}

Related

Formatting wide output via 'column' (or similar) command(s)

This question actually asks for the 'inverse' of the solution given here: I would like to wrap the long column (column 4) onto multiple lines. In effect, the output should look like:
cat test.csv | column -s"," -t -c5
col1 col2 col3 col4 col5
1 2 3 longLineOfText 5
ThatIWantTo
InspectAndWould
LikeToWrap
(excuse the u.u.o.c. duplicated over here :) )
The solution would ideally:
make use of standard *nix text processing utilities (e.g. column, paste, pr, which are usually present on any modern Linux machine nowadays, typically coming from the coreutils package);
avoid jq, as it is not necessarily present on every (production) system;
not overheat the brain: yes... I am looking mainly at you, awk & co. gurus :). "Normal" awk / perl / sed is fine.
as a special bonus, a solution using vim would be even more welcome (again, no brain smoke please), since that would allow for syntax coloring as well.
The background: I want to be able to make sense of the output of docker history, so as a last resort even some Go template magic would suit, as would using jq.
In extreme cases, downloading a new (preferably self-contained / statically linked) utility to the server is OK, if the benefits of ease of remembering and use outweigh the inconvenience; so is using JSON processing commands (in which case Python's json module would be preferred).
Thanks!
LE:
Please keep in mind that docker's output has its columns separated by several spaces, which unfortunately confuses most commands :(
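For illustration, a rough Python sketch of the desired wrapping (not one of the coreutils-only solutions asked for above; the column index, wrap width, and the two-space separator are illustrative assumptions):
import re
import sys
import textwrap

WRAP_COL = 3   # 0-based index of the long column (column 4 in the question); illustrative
WIDTH = 40     # wrap width for that column; illustrative

for line in sys.stdin:
    cells = re.split(r'\s{2,}', line.rstrip('\n'))   # columns separated by 2+ spaces
    if len(cells) <= WRAP_COL:
        print(line.rstrip('\n'))
        continue
    chunks = textwrap.wrap(cells[WRAP_COL], WIDTH) or ['']
    prefix = '  '.join(cells[:WRAP_COL]) + '  '
    print(prefix + '  '.join([chunks[0]] + cells[WRAP_COL + 1:]))
    for chunk in chunks[1:]:
        print(' ' * len(prefix) + chunk)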

Line feed within a column in CSV

I have a CSV like the one below. Some columns contain line breaks, like column B below. When I do wc -l file.csv, Unix returns 4, but these are actually 3 records. I don't want to replace the line breaks with spaces; I am going to load the data into a database using SQL*Loader and want to load it as is. What should I do so that Unix counts a record with an embedded line break as one record?
A,B,C,D
1,"hello
world",sds,sds
2,sdsd,sdds,sdds
Unless you're dealing with trivial cases (no quoted fields, no embedded commas, no embedded newlines, etc.), CSV data is best processed with tools that understand the format. Languages like Perl and Python have CSV parsing libraries available, there are packages like csvkit that provide useful utilities, and more.
Using csvstat from csvkit on your example:
$ csvstat -H --count foo.csv
Row count: 3
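Python's csv module handles the quoted, embedded newlines the same way; a minimal sketch of the same count (file name as in the example above):
import csv

with open('foo.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)                     # skip the header row
    print(sum(1 for _ in reader))    # prints 3 for the sample data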

Does anyone have a script to convert a Chrome Bookmarks file with [sub]*folders into a CSV file?

I want to be able to do Vimdiffs and Vimfolds on Bookmarks files that have been converted to CSV files, i.e. with one description and one URI per line. However, because the Bookmarks file has multiple levels of folders, the CSV file will also need fields for the different levels of folder names on each line.
I am new to jq, but it seems like it should be able to do this sort of conversion?
Thanks,
Phil.
Have you tried to use any free tools like: https://json-csv.com/
or json2csv: https://www.npmjs.com/package/json2csv
If neither of those works, perhaps this approach.
When I need to reconstruct data I write a set of loops that identify each property I want for each line in my CSV. Let's say my JSON has Name, Email, Phone but for some reason all are at different object levels in my JSON.
First write a loop that resolves Name, then a loop for Email, and one for Phone. At the end of the first loop call the second, and from the second call the third.
Then you can use jq -n, which allows you to create JSON with no input.
So your CSV output would be like jq -n '{NewName: .["'$Name'"]}'
Once you have clean JSON with all data points at the same level, CSV conversion is smooth.
Hope this helps
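If a small script is acceptable, here is a rough Python sketch (assuming Chrome's Bookmarks file layout: a top-level "roots" object holding nested folder nodes with "children" lists, and leaf nodes of "type": "url"). It emits one row per bookmark, with the folder names in the leading columns followed by the description and the URI:
import csv
import json
import sys

def walk(node, path, rows):
    # Leaf bookmark: record the folder path, the description, and the URI.
    if node.get('type') == 'url':
        rows.append(path + [node.get('name', ''), node.get('url', '')])
    # Folder: descend, extending the path with this folder's name.
    for child in node.get('children', []):
        walk(child, path + [node.get('name', '')], rows)

with open(sys.argv[1], encoding='utf-8') as f:   # path to Chrome's Bookmarks file
    roots = json.load(f)['roots']

rows = []
for root in roots.values():
    if isinstance(root, dict):
        walk(root, [], rows)

csv.writer(sys.stdout).writerows(rows)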

Spark: How to generate file paths to read from S3 with Scala

How do I generate and load multiple S3 file paths in Scala so that I can use:
sqlContext.read.json ("s3://..../*/*/*")
I know I can use wildcards to read multiple files, but is there any way I can generate the paths? For example, my file structure looks like this:
BucketName/year/month/day/files
s3://testBucket/2016/10/16/part00000
These files are all JSON. The issue is that I need to load files for a specific duration only; e.g., for a 16-day duration with start day Oct 16, I need to load files for Oct 1 to Oct 16.
With a 28-day duration for the same start day, I would like to read from Sep 18.
Can someone tell me a way to do this?
You can take a look at this answer. You can specify whole directories, use wildcards, and even a comma-separated list of directories and wildcards. E.g.:
sc.textFile("/my/dir1,/my/paths/part-00[0-5]*,/another/dir,/a/specific/file")
Or you can use the AWS API to get the list of file locations and read those files using Spark.
You can look at this answer about AWS S3 file search.
You can generate a comma-separated path list:
sqlContext.read.json("s3://testBucket/2016/10/16/,s3://testBucket/2016/10/15/,...")
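For the date-range part, a rough Scala sketch (bucket name and dates taken from the question; whether a single comma-separated string is accepted, as suggested above, may depend on your Spark version):
import java.time.LocalDate

// Build one day-prefix per day, counting back `days` days from the start day.
def dayPaths(bucket: String, start: LocalDate, days: Int): Seq[String] =
  (0 until days).map { i =>
    val d = start.minusDays(i)
    f"s3://$bucket/${d.getYear}/${d.getMonthValue}%02d/${d.getDayOfMonth}%02d/"
  }

val paths = dayPaths("testBucket", LocalDate.of(2016, 10, 16), 16)   // Oct 1 .. Oct 16
val df = sqlContext.read.json(paths.mkString(","))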

creating a list from json csv file using python

I am sorry for asking this question, but I have already looked through and could not find the answer. I am honestly a newbie. I am trying to generate a list of whole words from a JSON CSV file. I have already created a list of lines, but then I cannot use split() to generate a new list containing the separate words (later I need to count word occurrences).
My input file contains twitter information:
twitter data
I tried to write simple code:
myfile = open('fileName', 'r')
words = []
for line in myfile:
    words.append(line.split())
# len(words) == 82
I also tried reader=csv.reader(myFile) and reader=csv.DictReader(myFile), but in all cases I can get each line; how do I further split the string/line into independent words? Sorry, and thank you in advance.
My data (I changed to a different example, as maybe the last one was badly formatted):
id,flags,expiration,cas,value
493926581610364928,0,0,2635740904247446,"{""contributors"":null,""truncated"":false,""text"":""#xaaronh #blueredandgold If Namco Bandai's One Piece Unlimited World is anything to go by, no local retail release means no eShop either =\\"",""in_reply_to_status_id"":493925918998425600,""id"":493926581610364928,""favorite_count"":0,""source"":""Twitter Web Client"",""retweeted"":false,""coordinates"":null,""entities"":{""symbols"":[],""user_mentions"":[{""id"":139852376,""indices"":[0,8],""id_str"":""139852376"",""screen_name"":""xaaronh"",""name"":""Aaron""},{""id"":74393990,""indices"":[9,24],""id_str"":""74393990"",""screen_name"":""blueredandgold"",""name"":""Leigh""}],""hashtags"":[],""urls"":[]},""in_reply_to_screen_name"":""xaaronh"",""in_reply_to_user_id"":139852376,""retweet_count"":0,""id_str"":""493926581610364928"",""favorited"":false,""user"":{""follow_request_sent"":false,""profile_use_background_image"":true,""default_profile_image"":false,""id"":42302246,""profile_background_image_url_hp"":""hp://pbs.twimg.com/profile_background_images/464279459932020736/v1xnMcrV.jpeg"",""verified"":false,""profile_text_color"":""333333"",""profile_image_url_https"":""hp://pbs.twimg.com/profile_images/490791031487463424/udSldTQ3_normal.png"",""profile_sidebar_fill_color"":""DDEEF6"",""entities"":{""description"":{""urls"":[{""url"":""hp:tttt"",""indices"":[67,89],""expanded_url"":""hp://infernalmonkey.com"",""display_url"":""infernalmonkey.com""}]}},""followers_count"":506,""profile_sidebar_border_color"":""000000"",""id_str"":""42302246"",""profile_background_color"":""1A1B1F"",""listed_count"":22,""is_translation_enabled"":false,""utc_offset"":36000,""statuses_count"":8676,""description"":""I probably tweet about video games and onaholes. Let's be friends! (NSFW)"",""friends_count"":261,""location"":""Sydney, Australia"",""profile_link_color"":""2FC2EF"",""profile_image_url"":""hp://pbs.twimg.com/profile_images/490791031487463424/udSldTQ3_normal.png"",""following"":false,""geo_enabled"":false,""profile_banner_url"":""hp://pbs.twimg.com/profile_banners/42302246/1406105444"",""profile_background_image_url"":""hp://pbs.twimg.com/profile_background_images/464279459932020736/v1xnMcrV.jpeg"",""screen_name"":""infernal_monkey"",""lang"":""en"",""profile_background_tile"":false,""favourites_count"":2018,""name"":""Lance McGill"",""notifications"":false,""url"":null,""created_at"":""Sun May 24 23:20:25 +0000 2009"",""contributors_enabled"":false,""time_zone"":""Sydney"",""protected"":false,""default_profile"":false,""is_translator"":false},""geo"":null,""in_reply_to_user_id_str"":""139852376"",""lang"":""en"",""_id"":""493926581610364928"",""created_at"":""Tue Jul 29 01:10:48 +0000 2014"",""in_reply_to_status_id_str"":""493925918998425600"",""place"":null,""metadata"":{""iso_language_code"":""en"",""result_type"":""recent""}}"
This is not the best solution, just an effort from a noob (me); it definitely needs further editing for better output. I am using Windows OS.
import csv
import json

abc = []
myList = []
myDict = {}
myFile = open('fileName.csv', 'r', encoding='utf-8')
myReader = csv.reader(myFile)
header = next(myReader)                 # skip the header row
for line in myReader:
    abc = json.loads(line[4])           # parse the tweet JSON stored in the "value" column
    myDict = abc
    myList.append(myDict['text'])       # collect just the tweet text
dct = {}
for eachLine in myList:
    item = eachLine.split()
    for one in item:
        if one in dct:
            dct[one] += 1
        else:
            dct[one] = 1
finalList = list(dct.items())
finalList.sort()
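For what it's worth, a slightly tidier sketch of the same idea (same assumed file layout, with the tweet JSON in the fifth column), using collections.Counter:
import csv
import json
from collections import Counter

word_counts = Counter()
with open('fileName.csv', 'r', encoding='utf-8', newline='') as my_file:
    reader = csv.reader(my_file)
    next(reader)                      # skip the header row
    for row in reader:
        tweet = json.loads(row[4])    # the JSON blob lives in the "value" column
        word_counts.update(tweet['text'].split())

for word, count in word_counts.most_common():
    print(word, count)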