I downloaded a 203775480-byte JSON file (~194 MiB; the exact size matters for an error shown below) in which all entries are on a single line. Needless to say, my text editor (Vim) cannot navigate it efficiently, and I can't make sense of its contents. I'd like to prettify it. I tried cat file.json | jq '.', jq '.' file.json, and cat file.json | python -m json.tool, but none of them worked. The first two commands print nothing on stdout, while the last one says Expecting object: line 1 column 203775480 (char 203775479).
I guess it's broken somewhere near the end, but of course I can't tell where, as I can't even navigate it.
Do you have any other ideas for prettifying it? (I also tried gg=G in Vim; it did not work.)
I found that the file was indeed broken: I happened to notice a '[' at the beginning of the file, so I struggled my way to the end of the file and added the missing ']' there (it took me maybe 5 minutes).
Then I reran cat file.json | python -m json.tool and it worked like a charm.
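For anyone hitting the same wall: rather than scrolling through the file, you can ask the parser where it gave up and print only the text around that offset. A minimal Python sketch, assuming the file fits in memory; describe_json_error is a hypothetical helper name, and the short string below stands in for the real file:

```python
import json


def describe_json_error(text, context=40):
    """Return None if text parses, else (message, position, snippet near the error)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        start = max(0, err.pos - context)
        return err.msg, err.pos, text[start:err.pos + context]
    return None


# Stand-in for the real file: a JSON array whose closing bracket is missing.
broken = '[{"a": 1}, {"b": 2}'
report = describe_json_error(broken)
print(report)

# Once the missing bracket is appended, the text parses and can be pretty-printed.
pretty = json.dumps(json.loads(broken + "]"), indent=2)
print(pretty)
```

For the real file you would read the text with open('file.json', encoding='utf-8').read() and pass it in. A reported position at the very end of the file, as in the error above, suggests that something (such as a closing bracket) is missing there.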
We are using Forge to import a STEP file into the modelspace of an output.DWG. Then a DLL combines modelspace geometry of several DWG files into several layout/paperspace of a single DWG. This sheet combination was working perfectly until just recently, when the combination process completely stopped happening.
Has something in Forge changed recently that we're not aware of? Updates/patches, or something like that which could have caused this issue?
This is an issue for a production application and is considered an outage at this point, and is very time-sensitive.
Edit: Here are some differences we noticed between the log files generated by this process. In this first section, the verbiage being written by AutoCAD has changed slightly during an extraction process:
[08/01/2019 17:15:35] End downloading https://.... 1556909 bytes have been unpacked to folder T:\Aces\Jobs\a43e5ca7faaa4db8b5374aaef71b36d3\cadlayouts.
[08/19/2019 17:25:53] End downloading file https://.... 1771363 bytes have been written to T:\Aces\Jobs\d12f3bed13b84d29b31226222e3cf3c9\cadlayouts.
In the log from 8/19, all lines logged between:
Start AutoCAD Core Engine standard output dump.
and:
End AutoCAD Core Engine standard output dump.
are written twice. This did not happen in the log file from August 1st or in any of the logs before that date.
Edit 2:
Yesterday we used the .NET DirectoryInfo class to pull all directories into one list and all files into another and write them all to the log. The cadlayouts entity that should be recognized as a directory (because it's a zip that is extracted by Forge) is instead listed as a file. Our process runs a Directory.Exists() check before the work item merges the DWGs into the output, and this call returns false for the cadlayouts folder, bypassing our combination logic. How can the Forge zip extraction process be working correctly if the resulting entity on the file system is not considered a directory?
It sounds like you have an input argument that is a zip and you expect it to be unzipped into a folder. Please look at row 4 in the table below. I suspect that this is what you are experiencing. There WAS a recent change here: we used to look at the downloaded bits and unconditionally uncompress them if we found a zip header (i.e. we acted identically for rows 3 and 4). We now only do this if you ask us to.
EDIT: The Activity column in the table is the value of the zip attribute of the Activity's parameters, while the Workitem column is the pathInZip attribute of the Workitem's arguments.
+---+------------+-----------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| # | Activity   | Workitem        | Arg direction | Comments                                                                                                                                                    |
+---+------------+-----------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | zip==true  | pathInZip!=null | input         | Zip is uncompressed to the folder specified in localName. Any path reference to this argument will expand to full path of pathInZip.                        |
+---+------------+-----------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2 | zip==false | pathInZip!=null | input         | Zip is uncompressed to the folder specified in localName. Any path reference to this argument will expand to full path of pathInZip.                        |
+---+------------+-----------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 3 | zip==true  | pathInZip==null | input         | If zip is provided then it is uncompressed to the folder specified in localName. Any path reference to this argument will expand to full path of localName. |
+---+------------+-----------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 4 | zip==false | pathInZip==null | input         | If zip is provided then it is left compressed. Any variable referencing this argument will expand to full path of localName.                                |
+---+------------+-----------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 5 | zip==true  | pathInZip!=null | output        | Workitem will be rejected.                                                                                                                                  |
+---+------------+-----------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 6 | zip==false | pathInZip!=null | output        | Workitem will be rejected.                                                                                                                                  |
+---+------------+-----------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 7 | zip==true  | pathInZip==null | output        | Output(s) at localName will be zipped if localName is a folder.                                                                                             |
+---+------------+-----------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 8 | zip==false | pathInZip==null | output        | Output at localName will not be zipped.                                                                                                                     |
+---+------------+-----------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
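If the argument does arrive still compressed (row 4), code that expects a folder can detect that and unpack the archive itself before checking for the directory. The sketch below only illustrates that defensive check; it is written in Python rather than .NET, does not use any Forge API, and ensure_directory, the .unpacked suffix and sheet1.dwg are hypothetical names:

```python
import os
import tempfile
import zipfile


def ensure_directory(path):
    """Make sure `path` is a folder, unpacking it in place if it is a zip file."""
    if os.path.isdir(path):
        return "already a directory"
    if os.path.isfile(path) and zipfile.is_zipfile(path):
        staging = path + ".unpacked"  # hypothetical temporary folder name
        with zipfile.ZipFile(path) as archive:
            archive.extractall(staging)
        os.remove(path)            # drop the archive...
        os.rename(staging, path)   # ...and let the folder take over its name
        return "unpacked"
    return "missing or not a zip"


# Simulate the situation from the question: "cadlayouts" exists, but as a zip file.
with tempfile.TemporaryDirectory() as job_dir:
    target = os.path.join(job_dir, "cadlayouts")
    with zipfile.ZipFile(target, "w") as archive:
        archive.writestr("sheet1.dwg", b"placeholder")
    was_directory = os.path.isdir(target)
    status = ensure_directory(target)
    is_directory = os.path.isdir(target)
    extracted = sorted(os.listdir(target))
    print(was_directory, status, is_directory, extracted)
```

The same idea should carry over to the .NET side: test whether the entry is a file with a zip header before relying on Directory.Exists().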
I am new to the Hadoop framework and would really appreciate it if someone could walk me through this.
I am trying to merge two .csv files.
The two files have the same headers, are ordered the same, etc.
The thing is that I have no idea how to merge these files into one and then clean out the empty lines and unused columns.
The two files have the same headers, are ordered the same, etc.
Since the files are the same, you can upload them to the same directory.
hdfs dfs -mkdir -p /path/to/input
hdfs dfs -put file1.csv /path/to/input
hdfs dfs -put file2.csv /path/to/input
Hadoop tools that read from hdfs:///path/to/input will treat all files in that directory as "parts of a single file".
Note, you'll want to strip the header from both files before placing them into HDFS in this fashion.
Another option would be to concatenate the files locally. (Again, remove the headers first, or at least from all but the first file)
cat file1.csv file2.csv > file3.csv
hdfs dfs -put file3.csv /path/to/input
After that, use whatever Hadoop tools you know to read the files.
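The local route above (one header, then the rows of both files) can also be done in a few lines of Python, which makes it easy to drop empty lines and unused columns at the same time, as the question asks. A minimal sketch using only the standard library; merge_csv, the column names and the sample data are made up for illustration:

```python
import csv
import io


def merge_csv(sources, keep_columns):
    """Merge CSV texts that share a header: one header, no blank rows, selected columns only."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(keep_columns)
    for text in sources:
        # DictReader consumes each file's header line and skips completely blank lines.
        for row in csv.DictReader(io.StringIO(text)):
            values = [row.get(col) or "" for col in keep_columns]
            if any(value.strip() for value in values):  # drop rows such as ",,"
                writer.writerow(values)
    return out.getvalue()


file1 = "id,name,unused\n1,apple,x\n\n2,pear,y\n"
file2 = "id,name,unused\n3,plum,z\n,,\n"
merged = merge_csv([file1, file2], ["id", "name"])
print(merged, end="")
```

With real files you would pass in the contents of file1.csv and file2.csv, write the result to file3.csv, and then put that file into HDFS as shown above.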
Since they have the same structure, load them both using Pig into 2 relations and then UNION the 2 relations. Finally, you can FILTER the records that match certain criteria. I am assuming the files have 2 fields each for simplicity.
A = LOAD '/path/file1.csv' USING PigStorage(',') AS (a1:chararray, a2:chararray);
B = LOAD '/path/file2.csv' USING PigStorage(',') AS (b1:chararray, b2:chararray);
C = UNION A, B;
-- Keep only the records where neither the first nor the second column is null.
D = FILTER C BY ($0 IS NOT NULL AND $1 IS NOT NULL);
DUMP D;
I want a sample CSV file with about 1 million entries in it. Where can I get one? Can anybody please help me with this?
Make your own...
perl -E 'for($i=0;$i<1000000;$i++){say "Line $i,field2,field3,",int rand 100}' > BigBoy.csv
Output
Line 0,field2,field3,58
Line 1,field2,field3,4
Line 2,field2,field3,12
Line 3,field2,field3,39
Line 4,field2,field3,41
Line 5,field2,field3,18
...
...
Line 999998,field2,field3,67
Line 999999,field2,field3,62
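If Perl is not at hand, roughly the same file can be produced with Python's standard library. This is a sketch rather than an exact port (the random numbers will differ from the Perl output), and write_rows is a made-up function name. It is shown here with 5 rows written to an in-memory buffer:

```python
import csv
import io
import random


def write_rows(stream, count, seed=None):
    """Write `count` rows shaped like 'Line N,field2,field3,<random 0-99>' to stream."""
    rng = random.Random(seed)  # pass a seed for repeatable data
    writer = csv.writer(stream, lineterminator="\n")
    for i in range(count):
        writer.writerow([f"Line {i}", "field2", "field3", rng.randrange(100)])


buffer = io.StringIO()
write_rows(buffer, 5, seed=42)
print(buffer.getvalue(), end="")
```

For the full-size file, open BigBoy.csv with open('BigBoy.csv', 'w', newline='') and call write_rows on it with a count of 1000000.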
Or use the Faker package in Python. Here is how to generate fake (dummy) names and addresses:
#!/usr/bin/env python3
# Synthesize dummy address CSV
# Requires: pip install Faker
from faker import Faker
fake = Faker('en_GB')  # Use a different locale, e.g. 'de_DE', or leave blank for USA
Faker.seed(42)  # Omit this line if you want different data on each run
print('firstname, lastname, street, city, postcode, Email, telephone')
for _ in range(10):
    first = fake.first_name()
    last = fake.last_name()
    street = fake.street_name()
    city = fake.city()
    postcode = fake.postcode()  # renamed from "zip" to avoid shadowing the built-in
    email = fake.ascii_safe_email()
    tel = fake.phone_number()
    print(f'{first},{last},{street},{city},{postcode},{email},{tel}')
Output
firstname, lastname, street, city, postcode, Email, telephone
Ruth,Griffiths,Smith dam,Lake Janiceland,S0A 3JW,moorefrancesca@example.net,+44191 4960637
Joan,White,Sam square,Cookberg,N11 1QQ,samuel78@example.org,0141 496 0184
Teresa,Hurst,Mellor squares,North Irenebury,BT3 6LT,ben55@example.org,(029) 2018 0419
Heather,Thompson,Ben mountain,Dixonside,N03 5RL,kellykirsty@example.net,+441214960376
Carly,Hale,Davidson summit,Fionachester,S5D 8UD,taylorcarl@example.net,(0116) 4960691
Or to generate fake (dummy) URIs, IP addresses and MAC addresses:
#!/usr/bin/env python3
# Synthesize dummy IP address and MAC addresses
# Requires: pip install Faker
from faker import Faker
fake = Faker()
Faker.seed(42)  # Omit this line if you want different data on each run
print('URI, IP, MAC')
for _ in range(10):
    URI = fake.uri()
    IP = fake.ipv4_public()
    MAC = fake.mac_address()
    print(f'{URI},{IP},{MAC}')
Sample Output
URI, IP, MAC
http://walker.com/,203.75.32.207,d8:10:0f:2f:6f:77
http://www.santos.org/posts/app/privacy.html,216.228.82.113,4f:6e:ac:34:2f:c2
https://baker.com/,146.195.110.208,b9:62:23:17:74:94
http://www.ramirez-reid.com/,101.107.68.129,88:24:57:7d:53:ec
Try the Majestic Million CSV, which is free.
If you don't mind paying a small fee, you can try BrianDunning.com.
CSV and JSON, for example, are text formats that are both human- and machine-readable.
Now I am looking for something similar but even more visual for representing tabular data.
Instead of:
1,"machines",14.91
3,"mammals",1.92
50,"fruit",4.239
789,"funghi",29.3
which is CSV style or
[
[1,"machines",14.91],
[3,"mammals",1.92],
[50,"fruit",4.239],
[789,"funghi",29.3]
]
which is JSON style (and I am not going to give an XML example), something like this is what I have in mind:
1 | "machines"| 14.91
3 | "mammals" | 1.92
50 | "fruit" | 4.239
789 | "funghi" | 29.3
There should be reader and writer libraries for it in several languages, and it should be some kind of standard. Of course I could roll my own, but if there is a standard I'd go with that.
I have seen similar things as part of wiki or markup languages, but this should serve as a data definition format that humans can easily edit and that software libraries can both read and write.
That's not exactly what markup and wiki languages are for. What I am looking for belongs more to the CSV, JSON and XML family.
I would check out Textile. It has a table syntax almost exactly like what you described.
For example, the table in your example would be constructed like this:
| 1 | machines | 14.91 |
| 3 | mammals | 1.92 |
| 50 | fruit | 4.239 |
| 789 | funghi | 29.3 |
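Reading and writing rows in this pipe-delimited shape takes very little code. The following Python sketch is not an implementation of the Textile specification (no headers, spans or escaping of "|" inside a cell); it only handles plain rows like the ones above, and parse_pipe_table and format_pipe_table are hypothetical names:

```python
def parse_pipe_table(text):
    """Turn lines like '| 1 | machines | 14.91 |' into lists of cell strings."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Remove exactly one leading and one trailing pipe, then split on the rest.
        if line.startswith("|"):
            line = line[1:]
        if line.endswith("|"):
            line = line[:-1]
        rows.append([cell.strip() for cell in line.split("|")])
    return rows


def format_pipe_table(rows):
    """Write rows back out with every column padded to a common width."""
    widths = [max(len(str(cell)) for cell in column) for column in zip(*rows)]
    lines = []
    for row in rows:
        padded = [str(cell).ljust(width) for cell, width in zip(row, widths)]
        lines.append("| " + " | ".join(padded) + " |")
    return "\n".join(lines)


sample = """
| 1 | machines | 14.91 |
| 3 | mammals | 1.92 |
| 50 | fruit | 4.239 |
| 789 | funghi | 29.3 |
"""
rows = parse_pipe_table(sample)
formatted = format_pipe_table(rows)
print(formatted)
```

All cells come back as strings; converting them to numbers is left to the caller, much as with a CSV reader.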
An alternative (albeit not optimized for tabular data) is YAML, which is nice for JSON-ish data.
Alternatively, you could also look at CSV editors, e.g.:
CsvEd
CsvEasy
ReCsvEditor
Their whole purpose is to display CSV and let you update the data in a more readable format. ReCsvEditor will display both XML and CSV files in a similar format.
Search for "CSV editor" and you will find plenty more.