I need a csv file with 1 million entries [closed] - csv
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I want a sample CSV file with about 1 million entries. Where can I get one? Can anybody please help me with this?
Make your own...
perl -E 'for($i=0;$i<1000000;$i++){say "Line $i,field2,field3,",int rand 100}' > BigBoy.csv
Output
Line 0,field2,field3,58
Line 1,field2,field3,4
Line 2,field2,field3,12
Line 3,field2,field3,39
Line 4,field2,field3,41
Line 5,field2,field3,18
...
...
Line 999998,field2,field3,67
Line 999999,field2,field3,62
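For anyone without Perl to hand, roughly the same file can be produced with Python's standard library. This is only a sketch: the function name `write_sample_csv` and its parameters are my own invention, and the row layout just mimics the Perl output above.

```python
import csv
import io
import random


def write_sample_csv(out, rows=1_000_000, seed=None):
    """Write `rows` lines shaped like the Perl output to a text stream."""
    rng = random.Random(seed)  # pass a seed for repeatable data
    writer = csv.writer(out, lineterminator="\n")
    for i in range(rows):
        # randrange(100) yields 0..99, like Perl's `int rand 100`
        writer.writerow([f"Line {i}", "field2", "field3", rng.randrange(100)])


# Small in-memory demonstration so the sketch runs quickly
buffer = io.StringIO()
write_sample_csv(buffer, rows=3, seed=42)
print(buffer.getvalue(), end="")
```

To write the full file, open it with `open("BigBoy.csv", "w", newline="")` and pass the file object with the default `rows`.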
Or use the Faker package in Python. Here is how to generate fake (dummy) names and addresses:
#!/usr/bin/env python3
# Synthesize dummy address CSV
# Requires: pip install Faker
from faker import Faker

fake = Faker('en_GB')  # Use different locale, e.g. 'de_DE' or leave blank for USA
Faker.seed(42)         # Omit this line if you want different data on each run
print('firstname, lastname, street, city, postcode, Email, telephone')
for _ in range(10):
    first = fake.first_name()
    last = fake.last_name()
    street = fake.street_name()
    city = fake.city()
    postcode = fake.postcode()  # renamed from `zip` to avoid shadowing the builtin
    email = fake.ascii_safe_email()
    tel = fake.phone_number()
    print(f'{first},{last},{street},{city},{postcode},{email},{tel}')
Output
firstname, lastname, street, city, postcode, Email, telephone
Ruth,Griffiths,Smith dam,Lake Janiceland,S0A 3JW,moorefrancesca@example.net,+44191 4960637
Joan,White,Sam square,Cookberg,N11 1QQ,samuel78@example.org,0141 496 0184
Teresa,Hurst,Mellor squares,North Irenebury,BT3 6LT,ben55@example.org,(029) 2018 0419
Heather,Thompson,Ben mountain,Dixonside,N03 5RL,kellykirsty@example.net,+441214960376
Carly,Hale,Davidson summit,Fionachester,S5D 8UD,taylorcarl@example.net,(0116) 4960691
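One caveat with joining fields by hand: a generated value that happens to contain a comma would shift the columns. A possible way around that is to let the standard `csv` module do the quoting. The sketch below uses a few hard-coded stand-in records instead of Faker so that it runs without extra packages; the helper name `to_csv` and the second street value are made up for illustration.

```python
import csv
import io

# Hypothetical stand-in records; the second street contains a comma on purpose.
records = [
    ("Ruth", "Griffiths", "Smith dam", "Lake Janiceland"),
    ("Joan", "White", "Flat 2, Sam square", "Cookberg"),
]


def to_csv(rows, header):
    """Serialize rows to CSV text, quoting only the fields that need it."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


text = to_csv(records, ["firstname", "lastname", "street", "city"])
print(text, end="")
```

With Faker, the same writer could be fed `[fake.first_name(), fake.last_name(), ...]` for each row instead of the fixed tuples.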
Or to generate fake (dummy) URIs, IP addresses and MAC addresses:
#!/usr/bin/env python3
# Synthesize dummy IP address and MAC addresses
# Requires: pip install Faker
from faker import Faker

fake = Faker()
Faker.seed(42)  # Omit this line if you want different data on each run
print('URI, IP, MAC')
for _ in range(10):
    URI = fake.uri()
    IP = fake.ipv4_public()
    MAC = fake.mac_address()
    print(f'{URI},{IP},{MAC}')
Sample Output
URI, IP, MAC
http://walker.com/,203.75.32.207,d8:10:0f:2f:6f:77
http://www.santos.org/posts/app/privacy.html,216.228.82.113,4f:6e:ac:34:2f:c2
https://baker.com/,146.195.110.208,b9:62:23:17:74:94
http://www.ramirez-reid.com/,101.107.68.129,88:24:57:7d:53:ec
Try using Majestic Million CSV which is free.
If you don't mind paying a small fee you can try BrianDunning.com
Related
Apple .numbers to csv using command line? [closed]
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 days ago. This post was edited and submitted for review 6 days ago.
I am doing a project where I want to automate the downloading and use of each data file for reproducibility. Unfortunately, one of the data sources I need is only provided in Apple .numbers format. With LibreOffice, I can open the file and save it as a .csv, but I want something I can script. I found the tool ssconvert, which is part of the Gnumeric package, but I do not see .numbers on its list of importers.
$ ssconvert --list-importers
ID | Description
Gnumeric_Excel:excel | MS Excel™ (*.xls)
Gnumeric_Excel:excel_enc | MS Excel™ (*.xls) requiring encoding specification
Gnumeric_Excel:excel_xml | MS Excel™ 2003 SpreadsheetML
Gnumeric_Excel:xlsx | ECMA 376 / Office Open XML [MS Excel™ 2007/2010] (*.xlsx)
Gnumeric_OpenCalc:openoffice | Open Document Format (*.sxc, *.ods)
Gnumeric_QPro:qpro | Quattro Pro (*.wb1, *.wb2, *.wb3)
Gnumeric_XmlIO:sax | Gnumeric XML (*.gnumeric)
Gnumeric_applix:applix | Applix (*.as)
Gnumeric_dif:dif | Data Interchange Format (*.dif)
Gnumeric_html:html | HTML (*.html, *.htm)
Gnumeric_lotus:lotus | Lotus 123 (*.wk1, *.wks, *.123)
Gnumeric_mps:mps | Linear and integer program (*.mps) file format
Gnumeric_oleo:oleo | GNU Oleo (*.oleo)
Gnumeric_plan_perfect:pln | Plan Perfect Format (PLN) import
Gnumeric_sc:sc | SC/xspread
Gnumeric_stf:stf_csvtab | Comma or tab separated values (CSV/TSV)
Gnumeric_sylk:sylk | MultiPlan (SYLK)
Gnumeric_xbase:xbase | Xbase (*.dbf) file format
Is there a straightforward way to do this conversion programmatically in Linux?
@Tripleee provided the necessary clue in a comment. Thus, with LibreOffice installed, I can do what I want with:
soffice --headless --convert-to csv *.numbers --outdir .
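If the conversion needs to sit inside a larger Python pipeline, the same soffice call can be wrapped with `subprocess`. This is a sketch under assumptions: the function names are hypothetical, `soffice` is assumed to be on the PATH, and the flags are just the ones from the command above.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(files, outdir="."):
    """Return the argument list for the soffice call quoted above."""
    return ["soffice", "--headless", "--convert-to", "csv",
            *map(str, files), "--outdir", str(outdir)]


def convert_numbers_to_csv(directory=".", outdir="."):
    """Convert every *.numbers file in `directory`; return the files handled."""
    files = sorted(Path(directory).glob("*.numbers"))
    if not files:
        return []  # nothing to convert
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found on PATH (is LibreOffice installed?)")
    subprocess.run(build_convert_command(files, outdir), check=True)
    return files


# Show the command that would be run for a hypothetical file name
print(" ".join(build_convert_command(["example.numbers"])))
```

Passing the file list explicitly avoids relying on shell globbing, and `check=True` makes a failed conversion raise instead of passing silently.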
Making all the decimals equal with sed [closed]
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I have a CSV file with this structure:
123;rr;2;RRyO, chess mobil;pio;25.766;1;0;24353;21.6;;S
1243;rho;9;RpO, chess yext cat;downpio;67.98;1;0;237753;25.34600;;S
I want all the numbers in a specific column to have exactly 2 decimals (adding or removing decimals as needed), giving this output:
123;rr;2;RRyO, chess mobil;pio;25.766;1;0;24353;21.60;;S
1243;rho;9;RpO, chess yext cat;downpio;67.98;1;0;237753;25.34;;S
I have tried this, but it doesn't work:
sed 's/[[:digit:]]*\.//g' data.csv
Any ideas? Maybe a script is needed?
Perl to the rescue!
perl -F\; -ane '$F[9] = sprintf "%.2f", $F[9]; print join ";", @F' -- file.csv
Note that it will set the value on line 2 to 25.35, not 25.34, as that's how %f rounds 25.346. You can use
$F[9] = sprintf "%.2f", int($F[9] * 100) / 100
to get the output you want.
In sed, you need to distinguish the two cases: there's only a single decimal, or there are more than two.
sed -E -e 's/(;[0-9]+)\.([0-9])(;[^;]*;[^;]*)$/\1.\20\3/' \
       -e 's/(;[0-9]+)\.([0-9]{2})[0-9]+(;[^;]*;[^;]*)$/\1.\2\3/'
$ awk '{$(NF-2) = sprintf( "%0.2f", $(NF-2))}1' FS=\; OFS=\; input
123;rr;2;RRyO, chess mobil;pio;25.766;1;0;24353;21.60;;S
1243;rho;9;RpO, chess yext cat;downpio;67.98;1;0;237753;25.35;;S
This might work for you (GNU sed):
sed -E 's/^/;/;s/;[0-9]*\.[0-9]*/&00/g;s/(;[0-9]*\.[0-9]{2})[^;]*/\1/g;s/.//' file
Prepend a CSV delimiter to the start of the line so that the global regexp will match successfully.
If a field looks like a decimal, append two 0's.
If a field looks like a decimal, shorten it to two decimal places.
Remove the introduced CSV delimiter.
N.B. This does not account for rounding. If rounding is required, perhaps:
sed -E 's/^/;/;s/;([0-9]*\.[0-9]*)/$(printf ";%.2f" \1)/g;s/.(.*)/echo "\1"/e' file
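Since the answers above differ on rounding (25.35) versus truncation (25.34), here is a small Python sketch that makes the choice explicit. It uses `decimal` so that values are not affected by binary floating point. The function name `fix_decimals` is hypothetical, and the sketch assumes the tenth field (index 9, as in the Perl answer) always holds a number.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def fix_decimals(line, column=9, rounding=ROUND_HALF_UP, sep=";"):
    """Rewrite one field of a delimited line so it has exactly two decimals."""
    fields = line.rstrip("\n").split(sep)
    value = Decimal(fields[column])  # raises if the field is not numeric
    fields[column] = str(value.quantize(Decimal("0.01"), rounding=rounding))
    return sep.join(fields)


rows = [
    "123;rr;2;RRyO, chess mobil;pio;25.766;1;0;24353;21.6;;S",
    "1243;rho;9;RpO, chess yext cat;downpio;67.98;1;0;237753;25.34600;;S",
]
for row in rows:
    print(fix_decimals(row, rounding=ROUND_DOWN))  # truncate, as in the question
```

With `ROUND_DOWN` the second row ends in `25.34;;S`, matching the output asked for; with the default `ROUND_HALF_UP` it ends in `25.35;;S`, matching the awk answer.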
Prettify a one-line JSON file [closed]
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I downloaded a 203775480-byte (~200 MiB; the exact size matters for a later error) JSON file which has all of its entries on one line. Needless to say, my text editor (Vim) cannot navigate it efficiently and I'm not able to understand anything from it. I'd like to prettify it. I tried:
cat file.json | jq '.'
jq '.' file.json
cat file.json | python -m json.tool
but none of them worked. The first two commands print nothing on stdout, while the last one says:
Expecting object: line 1 column 203775480 (char 203775479)
I guess the file is broken somewhere near the end, but of course I cannot tell where, as I cannot even navigate it. Have you got any other idea for prettifying it? (I've also tried gg=G in Vim: it did not work.)
I found that the file was indeed broken: I happened to notice a '[' at the beginning of the file, so I struggled my way to the end of the file and added a ']' there (it took me maybe 5 minutes). Then I reran cat file.json | python -m json.tool and it worked like a charm.
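When a parser only reports a character offset, it can help to print the text around that offset instead of scrolling through the file in an editor. A minimal sketch using the standard `json` module follows; the function name `prettify_or_diagnose` and the `context` parameter are made up for illustration, and the whole document is assumed to fit in memory.

```python
import json


def prettify_or_diagnose(text, context=40):
    """Return indented JSON, or raise ValueError showing text near the failure."""
    try:
        return json.dumps(json.loads(text), indent=2)
    except json.JSONDecodeError as err:
        start = max(0, err.pos - context)
        snippet = text[start:err.pos + context]
        raise ValueError(
            f"{err.msg} at char {err.pos}; nearby text: {snippet!r}"
        ) from err


# A well-formed document is simply re-indented
print(prettify_or_diagnose('[{"id": 1}, {"id": 2}]'))

# A document missing its closing bracket reports where parsing gave up
try:
    prettify_or_diagnose('[{"id": 1}, {"id": 2}')
except ValueError as problem:
    print(problem)
```

For a real file, read it with `open("file.json", encoding="utf-8").read()` and pass the result in; the reported snippet shows whether the tail of the file is truncated.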
merging CSV files in Hadoop [closed]
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 5 years ago.
I am new to the Hadoop framework and would really appreciate it if someone could walk me through this. I am trying to merge two .csv files. The two files have the same headers, are ordered the same, etc. The thing is that I have no idea how to merge these files into one and then clean out the empty lines and unused columns.
"The two files have the same headers, are ordered the same, etc."
Since the files have the same structure, you can upload them to the same directory.
hdfs dfs -mkdir -p /path/to/input
hdfs dfs -put file1.csv /path/to/input
hdfs dfs -put file2.csv /path/to/input
HDFS will natively treat these as "parts of a single file" if you read from hdfs:///path/to/input
Note, you'll want to strip the header from both files before placing them into HDFS in this fashion.
Another option would be to concatenate the files locally. (Again, remove the headers first, or at least from all but the first file.)
cat file1.csv file2.csv > file3.csv
hdfs dfs -put file3.csv /path/to/input
After that, use whatever Hadoop tools you know to read the files.
Since they have the same structure, load them both using Pig into 2 relations and then UNION the 2 relations. Finally, you can FILTER the records that match certain criteria. I am assuming the files have 2 fields each for simplicity.
A = LOAD '/path/file1.csv' USING PigStorage(',') AS (a1:chararray, a2:chararray);
B = LOAD '/path/file2.csv' USING PigStorage(',') AS (b1:chararray, b2:chararray);
C = UNION A, B;
D = FILTER C BY ($0 IS NOT NULL AND $1 IS NOT NULL); -- drop records whose first or second column is null
DUMP D;
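The local clean-up step (keep one header, drop empty lines) can also be done before uploading, for example with Python's `csv` module. This is only a sketch: `merge_csv` is a hypothetical helper, it assumes every input starts with the same header row, and it does not remove unused columns.

```python
import csv
import io


def merge_csv(sources, out):
    """Concatenate CSV streams, keeping the first header and skipping empty rows."""
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for src in sources:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            continue  # completely empty input
        if not header_written:
            writer.writerow(header)
            header_written = True
        for row in reader:
            # Blank lines parse as [] and lines like ",," hold only empty cells
            if any(cell.strip() for cell in row):
                writer.writerow(row)


# In-memory demonstration with two small hypothetical files
file1 = io.StringIO("id,name\n1,x\n\n2,y\n")
file2 = io.StringIO("id,name\n3,z\n,\n")
merged = io.StringIO()
merge_csv([file1, file2], merged)
print(merged.getvalue(), end="")
```

With real files, pass objects opened with `newline=""` and then upload the merged result with `hdfs dfs -put` as described above.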
how to get OS name in Windows Powershell using functions [closed]
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 8 years ago.
I'm trying to return the OS name using functions in Windows PowerShell. I built this code, but I don't get any results. Any help, please?
Function Get-OSName
{
    (Get-WmiObject Win32_OperatingSystem).Name
}
"Name of the OS: $(Get-OSName)"
Thank you.
Try exploring the object to find out what property you want:
Get-WmiObject Win32_OperatingSystem | select -Property *
You will notice the 'Caption' property contains the friendly OS name, as micky-balladelli mentioned. Your example would change to:
Function Get-OSName
{
    (Get-WmiObject Win32_OperatingSystem).Caption
}
Cheers!
You have omitted a critical part of your screen in your image. What is important is the line directly after the last line shown. If the last line shown is truly the last, then you still need to press Enter once more. Consider this command:
PS > 'hello world'
hello world
Notice the result is printed as soon as I hit Enter. However, certain actions, like defining a function, cause PowerShell to enter interactive mode. Once PowerShell is in interactive mode, it requires pressing Enter twice to start the evaluation. Example:
PS > function foo {
>> echo bar
>> }
>> 'hello world'
>> 'dog bird mouse'
>>
hello world
dog bird mouse
Notice that this time around I was able to enter another command after the same 'hello world' command.