New job, new platform: Pega. Pega is being used to host a relief map of ranchers and farmers for a conservation project. A data table can be generated with about 20-22 columns. The developer told me that the CSV export uses a function exposed by Pega to convert a JSON payload to a .CSV file.
The first problem is that empty cells in the data table in the browser contain a "-" (dash) because our client doesn't want empty data cells. The dashes are not carried over to the .CSV export.
The second problem is that a column that displays its data in quotation marks is being exported to .CSV with doubled quotation marks.
I have been going over the code in developer view trying to figure out where and how that data is being exported that way, but I can't find the exact string.
Does anyone else have experience with Pega using a JSON command to export a .CSV file?
I basically have a procedure where I make multiple calls to an API and, using a token within the JSON response, pass that back to a function to call the API again and get the next "paginated" file.
In total I have to call and download 88 JSON files totalling 758 MB. The JSON files are all formatted the same way and have the same "schema", or at least they should. I have tried reading each JSON file into a dataframe after it has been downloaded, and then attempted to union that dataframe to a master dataframe, so that essentially I end up with one big dataframe containing all 88 JSON files.
However, the problem I encounter is that at roughly file 66 the system (Python/Databricks/Spark) decides to change the inferred type of a field. It is always a string, and then, I'm guessing, when a value actually appears in that field it changes to a boolean. The unionByName then fails because of the different data types.
What is the best way for me to resolve this? I thought about using "extend" to merge all the JSON files into one big file, however a 758 MB JSON file would be a huge read and undertaking.
Could the other solution be to explicitly set the schema that the JSON file is read into so that it is always the same type?
If you know the attributes of those files, you can define the schema before reading them and create an empty df with that schema, so you can do a unionByName with allowMissingColumns=True:
something like:
from pyspark.sql.types import *

# Explicit schema so Spark never has to infer (and change) the field types
my_schema = StructType([
    StructField('file_name', StringType(), True),
    StructField('id', LongType(), True),
    StructField('dataset_name', StringType(), True),
    StructField('snapshotdate', TimestampType(), True)
])

# Empty dataframe with that schema to union everything into
output = spark.createDataFrame([], my_schema)

df_json = spark.read.json("...path to your JSON file...")
output = output.unionByName(df_json, allowMissingColumns=True)
I'm not sure this is what you are looking for. I hope it helps
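To address your second idea (explicitly setting the schema at read time) directly, here is a minimal sketch; the directory path and file-name pattern are placeholders, assuming the 88 downloaded files sit together in one folder:

from pyspark.sql.types import StructType, StructField, StringType, LongType, TimestampType
from functools import reduce

# Same explicit schema as above; adjust the fields to your real attributes
my_schema = StructType([
    StructField('file_name', StringType(), True),
    StructField('id', LongType(), True),
    StructField('dataset_name', StringType(), True),
    StructField('snapshotdate', TimestampType(), True)
])

# Hypothetical location of the 88 downloaded files
paths = ["/dbfs/tmp/json_pages/page_{}.json".format(i) for i in range(1, 89)]

# Reading every file with the same explicit schema means Spark never infers
# (and never flips) a field's type, so the unions stay consistent
dfs = [spark.read.schema(my_schema).json(p) for p in paths]
combined = reduce(lambda a, b: a.unionByName(b, allowMissingColumns=True), dfs)

If the files all live in one directory, spark.read.schema(my_schema).json("/dbfs/tmp/json_pages/") reads them in a single pass, which avoids doing 88 separate unions.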
When integrating data with DFO365 we use Data Projects to integrate data from other systems. Definitions for these projects can be exported, giving a zip file containing multiple XML files: one per data entity in the project, plus a PackageHeader.xml and a Manifest.xml file. Within the manifest file is an element called QueryData, which seems to be a byte array held as a string.
The QueryData field looks like this:
<QueryData>4a012f270000110001e649010000000a4de9030000862b00008c2b0000882b00
008b2b0000000084045400610078005600410054004e0075006d005400610062
006c00650045006e0074006900740079000000110001e8032e00540061007800
5600410054004e0075006d005400610062006c00650045006e00740069007400
79005f0031000000e2092a005400610078005600410054004e0075006d005400
610062006c00650045006e0074006900740079000000094de8030000f3190000
00920402001100010000ffffffffffffffff9b04ffff9a04ffff000000000000
01ffffffff009005000000000000000000000000000000000000000000000000
000000000000</QueryData>
I tried treating this as a byte-encoded string with some success; i.e. the PowerShell I used converts the above value to:
Ŋ✯ ʼn 蘀+谀+蠀+謀+ 萀各愀砀嘀䄀吀一甀洀吀愀戀氀攀䔀渀琀椀琀礀 ᄀĀϨ.TaxVATNumTableEntity_1 ৢ*TaxVATNumTableEntity 䴉Ϩ ᧳ 鈀ȄᄀĀ
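For reference, a rough Python equivalent of that approach is sketched below; it is an assumption that this matches the original PowerShell, but parsing the hex into bytes and decoding them as UTF-16LE produces the same mix of readable entity names and binary noise:

import re

# First part of the QueryData value above, truncated for brevity;
# in practice paste the full string from Manifest.xml here
query_data_hex = (
    "4a012f270000110001e649010000000a4de9030000862b00008c2b0000882b00"
    "008b2b0000000084045400610078005600410054004e0075006d005400610062"
)

# Strip any whitespace/newlines introduced by the XML formatting
clean_hex = re.sub(r"\s+", "", query_data_hex)

# Parse the hex into raw bytes and decode as UTF-16LE: embedded strings such as
# TaxVATNumTableEntity become readable where the alignment happens to be even,
# while the surrounding binary structure decodes to noise
raw = bytes.fromhex(clean_hex)
print(raw.decode("utf-16-le", errors="replace"))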
I am using a Spark job to read CSV file data from a staging area and copy that data into HDFS, using the following code:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("WCRemoteReadHDFSWrite").set("spark.hadoop.validateOutputSpecs", "true")
val sc = new SparkContext(conf)

// Read the CSV from the staging area and save it to HDFS
val rdd = sc.textFile(source)
rdd.saveAsTextFile(destination)
The CSV file has data in the following format:
CTId,C3UID,region,product,KeyWord
1,1004634181441040000,East,Mobile,NA
2,1004634181441040000,West,Tablet,NA
whereas when the data goes into HDFS it ends up in the following format:
CTId,C3UID,region,product,KeyWord
1,1.00463E+18,East,Mobile,NA
2,1.00463E+18,West,Tablet,NA
I am not able to find any valid reason behind this.
Any kind of help would be appreciated.
Regards,
Bhupesh
What happens is that because your C3UID is a large number, it gets parsed as a Double and is then saved in standard Double notation. You need to fix the schema and make sure you read the second column as Long, BigDecimal, or String; then there will be no change in the string representation.
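As a concrete illustration of that fix, here is a minimal sketch with an explicit schema; it uses PySpark rather than the Scala above, and the header option and the input/output paths are assumptions:

from pyspark.sql.types import StructType, StructField, IntegerType, LongType, StringType

# C3UID declared as Long (StringType would also work) so it is never inferred as Double
csv_schema = StructType([
    StructField("CTId", IntegerType(), True),
    StructField("C3UID", LongType(), True),
    StructField("region", StringType(), True),
    StructField("product", StringType(), True),
    StructField("KeyWord", StringType(), True)
])

df = spark.read.schema(csv_schema).option("header", "true").csv("/staging/input")
df.write.option("header", "true").csv("/output/on/hdfs")  # 1004634181441040000 survives intact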
Sometimes your CSV file could also be the culprit. Do NOT open the CSV file in Excel, as Excel can convert those big numeric values into exponential format, and once you use the Spark job to import the data into HDFS, it goes in exactly as it appears in the file.
So be very sure that the CSV is never opened in Excel before importing it into HDFS with the Spark job. If you really want to see the contents of the file, use Notepad++ or any other text editor.
I need to take an Excel file that includes many columns, two of which are longitude and latitude.
How do I get ArcMap to accept this file as spatial data, and map it based on the lat/long data?
My data is from this page, which allows developers to access the raw data. I downloaded the data and loaded it into an Excel file, and that's as far as I could get.
What you're looking for is Add XY Data. You can find it in the File menu (File / Add Data / Add XY Data...)
The dialog box that comes up asks you to indicate the table that was added, what columns contain XY data, and (ideally) the coordinate system of the XY data.
Note: Sometimes it helps to convert an Excel spreadsheet to plain CSV data first; ArcMap can be finicky about fields formatted as text instead of numbers, for example.
Add XY Data will do the job. Just make sure that the latitude and longitude values do not have trailing whitespace, otherwise ArcMap won't show those columns when it prompts you to choose the columns for X and Y.
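If you'd rather script the same workflow, a rough arcpy sketch is below; the file paths, field names, and the WGS 84 spatial reference are assumptions to adapt to your data:

import arcpy

# Assumed inputs: a worksheet with Longitude/Latitude columns, coordinates in WGS 84
table = r"C:\data\mydata.xlsx\Sheet1$"
sr = arcpy.SpatialReference(4326)  # WGS 84

# Build an XY event layer from the coordinate columns, then save it as a feature class
arcpy.MakeXYEventLayer_management(table, "Longitude", "Latitude", "points_lyr", sr)
arcpy.CopyFeatures_management("points_lyr", r"C:\data\mydata_points.shp")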
Using this DevCenter, how do I save the result of a query to JSON (I actually get a JSON table) as a file on disk? I know there is "COPY All as CSV" when right-clicking on any row, however I don't see a copy-as-JSON option. If I just use copy as CSV, the original format is modified and is therefore of no use for further access in any other tool.
In short, there should be some way to export to whatever format I need, but it seems this visual tool only offers export as CSV.
DevCenter 1.4 was recently released (Jul 20, '15) and has support for the Cassandra 2.2 INSERT JSON and SELECT JSON statements and the fromJson and toJson functions. As part of this support, DevCenter can display JSON for entire result sets or for individual cell contents.
So with the combination of Cassandra 2.2 and DevCenter 1.4 you can query your data using either SELECT JSON or toJson (for a single column) and then copy/paste the JSON data as you wish.
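If you need the JSON in a file on disk rather than via copy/paste, one option is to run the SELECT JSON query through a driver instead of DevCenter. A minimal sketch with the Python driver follows, where the contact point, keyspace, table, and output path are assumptions:

from cassandra.cluster import Cluster

# Assumed contact point and keyspace
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

# SELECT JSON returns each row as a single JSON-encoded text column
rows = session.execute("SELECT JSON * FROM my_table")

# Write one JSON document per line
with open("my_table.json", "w") as out:
    for row in rows:
        out.write(row[0] + "\n")

cluster.shutdown()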