Please keep in mind this is an open-ended question; I am not looking for a specific answer, just approaches and routes I can take.
Essentially I am getting a CSV file from my AWS S3 bucket. I am able to get it successfully using:
AmazonS3 s3Client = new AmazonS3Client(new ProfileCredentialsProvider());
S3Object object = s3Client.getObject(
new GetObjectRequest(bucketName, key));
Now I want to populate a DynamoDB table using this JSON file.
I was confused, as I found all sorts of stuff online.
Here is one suggestion - this approach, however, only reads the file; it does not insert anything into the DynamoDB table.
Here is another suggestion - this approach is a lot closer to what I am looking for; it populates a table from a JSON file.
However, I was wondering: is there a generic way to read any JSON file and populate a DynamoDB table based on it? Also, which approach is best for my case?
Since I originally asked the question, I have done more work.
What I have done so far
I have a CSV file sitting in S3 that looks like this:
name,position,points,assists,rebounds
Lebron James,SF,41,12,11
Kyrie Irving,PG,41,7,5
Stephen Curry,PG,29,8,4
Klay Thompson,SG,31,5,5
I am able to successfully pick it up as an S3Object by doing the following:
AmazonS3 s3client = new AmazonS3Client(/**new ProfileCredentialsProvider()*/);
S3Object object = s3client.getObject(
new GetObjectRequest("lambda-function-bucket-blah-blah", "nba.json"));
InputStream objectData = object.getObjectContent();
Now I want to insert this into my DynamoDB table, so I am attempting the following:
AmazonDynamoDBClient dbClient = new AmazonDynamoDBClient();
dbClient.setRegion(Region.getRegion(Regions.US_BLAH_1));
DynamoDB dynamoDB = new DynamoDB(dbClient);
//DynamoDB dynamoDB = new DynamoDB(client);
Table table = dynamoDB.getTable("MyTable");
// after this point I have tried many JSON parsers, did table.put(item), etc., but nothing has worked. I would appreciate some help
For CSV parsing, you can use a plain reader, as your file looks quite simple:
AmazonS3 s3client = new AmazonS3Client(/**new ProfileCredentialsProvider()*/);
S3Object object = s3client.getObject(
new GetObjectRequest("lambda-function-bucket-blah-blah", "nba.json"));
InputStream objectData = object.getObjectContent();
AmazonDynamoDBClient dbClient = new AmazonDynamoDBClient();
dbClient.setRegion(Region.getRegion(Regions.US_BLAH_1));
DynamoDB dynamoDB = new DynamoDB(dbClient);
//DynamoDB dynamoDB = new DynamoDB(client);
Table table = dynamoDB.getTable("MyTable");
String line = "";
String cvsSplitBy = ",";
try (BufferedReader br = new BufferedReader(
new InputStreamReader(objectData, "UTF-8"));
while ((line = br.readLine()) != null) {
// use comma as separator
String[] elements = line.split(cvsSplitBy);
try {
table.putItem(new Item()
.withPrimaryKey("name", elements[0])
.withString("position", elements[1])
.withInt("points", elements[2])
.....);
System.out.println("PutItem succeeded: " + elements[0]);
} catch (Exception e) {
System.err.println("Unable to add user: " + elements);
System.err.println(e.getMessage());
break;
}
}
} catch (IOException e) {
e.printStackTrace();
}
Depending on the complexity of your CSV, you can also use a 3rd-party library like Apache Commons CSV or OpenCSV, as sketched below.
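For example, a minimal sketch with Apache Commons CSV (this assumes the commons-csv dependency is on your classpath; the column names come from the header row shown above):
try (Reader reader = new InputStreamReader(objectData, StandardCharsets.UTF_8)) {
    Iterable<CSVRecord> records = CSVFormat.DEFAULT
            .withFirstRecordAsHeader()   // skip the header row and allow access by column name
            .parse(reader);
    for (CSVRecord record : records) {
        table.putItem(new Item()
                .withPrimaryKey("name", record.get("name"))
                .withString("position", record.get("position"))
                .withInt("points", Integer.parseInt(record.get("points")))
                .withInt("assists", Integer.parseInt(record.get("assists")))
                .withInt("rebounds", Integer.parseInt(record.get("rebounds"))));
    }
}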
I will leave the original answer for parsing JSON.
I would use the Jackson library and, following your code, do the following:
AmazonS3 s3client = new AmazonS3Client(/**new ProfileCredentialsProvider()*/);
S3Object object = s3client.getObject(
new GetObjectRequest("lambda-function-bucket-blah-blah", "nba.json"));
InputStream objectData = object.getObjectContent();
AmazonDynamoDBClient dbClient = new AmazonDynamoDBClient();
dbClient.setRegion(Region.getRegion(Regions.US_BLAH_1));
DynamoDB dynamoDB = new DynamoDB(dbClient);
//DynamoDB dynamoDB = new DynamoDB(client);
Table table = dynamoDB.getTable("MyTable");
JsonParser parser = new JsonFactory()
.createParser(objectData);
JsonNode rootNode = new ObjectMapper().readTree(parser);
Iterator<JsonNode> iter = rootNode.iterator();
ObjectNode currentNode;
while (iter.hasNext()) {
currentNode = (ObjectNode) iter.next();
String lastName = currentNode.path("lastName").asText();
String firstName = currentNode.path("firstName").asText();
int minutes = currentNode.path("minutes").asInt();
// read all attributes from your JSon file
try {
table.putItem(new Item()
.withPrimaryKey("lastName", lastName, "firstName", firstName)
.withInt("minutes", minutes));
System.out.println("PutItem succeeded: " + lastName + " " + firstName);
} catch (Exception e) {
System.err.println("Unable to add user: " + lastName + " " + firstName);
System.err.println(e.getMessage());
break;
}
}
parser.close();
Inserting the records into your table will depend on your schema. I just used an arbitrary example, but this should cover reading your file and inserting into the DynamoDB table.
As you talked about different approaches, another possibility is to set up an AWS Data Pipeline.
Related
I am working on a C# utility to migrate data from SQL Server 2017 to MongoDB. Below are the steps I am following:
1) Getting data from SQL Server in JSON format (FOR JSON AUTO)
2) Parsing it into a BSON document
3) Then trying to insert it into MongoDB
But I am getting an error while reading the JSON data from SQL.
My JSON data is a combination of root attributes as well as nested objects.
So it is dynamic data that I want to push as-is to MongoDB.
string jsonData = string.Empty;
foreach (var userId in userIdList)
{
using (SqlConnection con = new SqlConnection("Data Source=;Initial Catalog=;Integrated Security=True"))
{
using (SqlCommand cmd = new SqlCommand("Usp_GetUserdata", con))
{
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add("#userId", SqlDbType.Int).Value = userId;
con.Open();
var reader = cmd.ExecuteReader();
jsonResult = new StringBuilder();
//cmd.ExecuteNonQuery();
if (!reader.HasRows)
{
jsonResult.Append("[]");
}
else
{
while (reader.Read())
{
jsonResult.Append(reader.GetValue(0));
jsonData = reader.GetValue(0).ToString();
File.WriteAllText(#"c:\a.txt", jsonResult.ToString());
File.WriteAllText(#"c:\a.txt",jsonData);
jsonData.TrimEnd(']');
jsonData.TrimStart('[');
//Create client connection to our MongoDB database
var client = new MongoClient(MongoDBConnectionString);
//Create a session object that is used when leveraging transactions
var session = client.StartSession();
//Create the collection object that represents the "products" collection
var employeeCollection = session.Client.GetDatabase("mongodev").GetCollection<BsonDocument>("EmpData");
//Begin transaction
session.StartTransaction();
try
{
dynamic resultJson = JsonConvert.DeserializeObject(result);
var document = BsonSerializer.Deserialize<BsonDocument>(resultJson);
//MongoDB.Bson.BsonDocument document
// = MongoDB.Bson.Serialization.BsonSerializer.Deserialize<BsonDocument>(jsonResult);
employeeCollection.InsertOneAsync(document);
//BsonArray pipeline =
// MongoDB.Bson.Serialization.BsonSerializer.Deserialize<BsonArray>(jsonData);
//var documents = pipeline.Select(val => val.AsBsonDocument);
//employeeCollection.InsertManyAsync(documents);
session.CommitTransaction();
}
catch (Exception e)
{
Console.WriteLine(e);
session.AbortTransaction();
throw;
}
}
}
}
}
}
I have a file that contains a JSON array of objects:
[
{
"test1": "abc"
},
{
"test2": [1, 2, 3]
}
]
I wish to use Jackson's JsonParser to take an InputStream from this file, and at every call to .next(), I want it to return an object from the array until it runs out of objects or fails.
Is this possible?
Use case:
I have a large file with a json array filled with a large number of objects with varying schemas. I want to get one object at a time to avoid loading everything into memory.
EDIT:
I completely forgot to mention: my input is a string that is added to over time; it slowly accumulates JSON. I was hoping to be able to parse it object by object, removing each parsed object from the string.
But I suppose that doesn't matter! I can do this manually as long as the JsonParser returns the index into the string.
Yes, you can achieve this sort of part-streaming-part-tree-model processing style using an ObjectMapper:
ObjectMapper mapper = new ObjectMapper();
JsonParser parser = mapper.getFactory().createParser(new File(...));
if(parser.nextToken() != JsonToken.START_ARRAY) {
throw new IllegalStateException("Expected an array");
}
while(parser.nextToken() == JsonToken.START_OBJECT) {
// read everything from this START_OBJECT to the matching END_OBJECT
// and return it as a tree model ObjectNode
ObjectNode node = mapper.readTree(parser);
// do whatever you need to do with this object
}
parser.close();
What you are looking for is called the Jackson Streaming API. Here is a code snippet using the Jackson Streaming API that could help you achieve what you need:
JsonFactory factory = new JsonFactory();
JsonParser parser = factory.createJsonParser(new File(yourPathToFile));
JsonToken token = parser.nextToken();
if (token == null) {
// return or throw exception
}
// the first token is supposed to be the start of array '['
if (!JsonToken.START_ARRAY.equals(token)) {
// return or throw exception
}
// iterate through the content of the array
while (true) {
token = parser.nextToken();
if (token == null) {
    break;
}
if (!JsonToken.START_OBJECT.equals(token)) {
    break;
}
// parse your objects by means of parser.getXxxValue() and/or other parser's methods
}
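Inside that loop, a minimal sketch of reading one object's fields with the streaming API might look like this (the field names test1/test2 are only illustrative, taken from the sample file earlier in the thread):
while (!JsonToken.END_OBJECT.equals(parser.nextToken())) {
    String fieldName = parser.getCurrentName();
    parser.nextToken(); // advance to the field's value
    if ("test1".equals(fieldName)) {
        System.out.println("test1 = " + parser.getValueAsString());
    } else {
        parser.skipChildren(); // skips nested arrays/objects; no-op for scalar values
    }
}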
This example reads custom objects directly from a stream:
source is a java.io.File
ObjectMapper mapper = new ObjectMapper();
JsonParser parser = mapper.getFactory().createParser( source );
if ( parser.nextToken() != JsonToken.START_ARRAY ) {
throw new Exception( "no array" );
}
while ( parser.nextToken() == JsonToken.START_OBJECT ) {
CustomObj custom = mapper.readValue( parser, CustomObj.class );
System.out.println( "" + custom );
}
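Here CustomObj is just a placeholder for your own POJO. A hypothetical version matching the sample array shown earlier might look like this (each object in that file carries only one of the two fields, so the other simply stays null):
import java.util.List;

// Hypothetical POJO; Jackson binds the JSON properties to these fields by name
public class CustomObj {
    public String test1;          // present in the first sample object
    public List<Integer> test2;   // present in the second sample object

    @Override
    public String toString() {
        return "CustomObj{test1=" + test1 + ", test2=" + test2 + "}";
    }
}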
This is a late answer that builds on Ian Roberts' answer. You can also use a JsonPointer to find the start position if it is nested into a document. This avoids custom coding the slightly cumbersome streaming token approach to get to the start point. In this case, the basePath is "/", but it can be any path that JsonPointer understands.
Path sourceFile = Paths.get("/path/to/my/file.json");
// Point the basePath to a starting point in the file
JsonPointer basePath = JsonPointer.compile("/");
ObjectMapper mapper = new ObjectMapper();
try (InputStream inputSource = Files.newInputStream(sourceFile);
JsonParser baseParser = mapper.getFactory().createParser(inputSource);
JsonParser filteredParser = new FilteringParserDelegate(baseParser,
new JsonPointerBasedFilter(basePath), false, false);) {
// Call nextToken once to initialize the filteredParser
JsonToken basePathToken = filteredParser.nextToken();
if (basePathToken != JsonToken.START_ARRAY) {
throw new IllegalStateException("Base path did not point to an array: found "
+ basePathToken);
}
while (filteredParser.nextToken() == JsonToken.START_OBJECT) {
// Parse each object inside of the array into a separate tree model
// to keep a fixed memory footprint when parsing files
// larger than the available memory
JsonNode nextNode = mapper.readTree(filteredParser);
// Consume/process the node for example:
JsonPointer fieldRelativePath = JsonPointer.compile("/test1");
JsonNode valueNode = nextNode.at(fieldRelativePath);
if (!valueNode.isValueNode()) {
throw new IllegalStateException("Did not find value at "
+ fieldRelativePath.toString()
+ " after setting base to " + basePath.toString());
}
System.out.println(valueNode.asText());
}
}
Here the headers are also being inserted into the database. I am uploading a CSV file with comma-separated data.
string Feedback = string.Empty;
string connString = ConfigurationManager.ConnectionStrings["DataBaseConnectionString"].ConnectionString;
using (MySqlConnection conn = new MySqlConnection(connString))
{
var copy = new MySqlBulkLoader(conn);
conn.Open();
try
{
copy.TableName = "BulkImportDetails";
copy.FileName = fileName;
copy.FieldTerminator = ",";
copy.LineTerminator = @"\n";
copy.Load();
Feedback = "Upload complete";
}
catch (Exception ex)
{
Feedback = ex.Message;
}
finally { conn.Close(); }
}
return Feedback;
Use the NumberOfLinesToSkip property to skip the first line, like so:
copy.NumberOfLinesToSkip = 1;
The use of this property is clearly shown in the documentation for MySqlBulkLoader. You should make a habit of reading the documentation to resolve your queries before posting a question here.
Hi, below is my code to extract particular metadata tags and write those tags to a JSON file. I have imported json-lib.jar and tika-app.jar into my build path.
File dir = new File("C:/pdffiles");
File listDir[] = dir.listFiles();
for (int i = 0; i < listDir.length; i++)
{
System.out.println("files"+listDir.length);
String file=listDir[i].toString();
File file1 = new File(file);
InputStream input = new FileInputStream(file1);
Metadata metadata = new Metadata();
BodyContentHandler handler = new BodyContentHandler(10*1024*1024);
AutoDetectParser parser = new AutoDetectParser();
parser.parse(input, handler, metadata);
Map<String, String> map = new HashMap<String, String>();
map.put("File name: ", listDir[i].getName());
map.put("Title: " , metadata.get("title"));
map.put("Author: " , metadata.get("Author"));
map.put("Content type: " , metadata.get("Content-Type"));
JSONObject json = new JSONObject();
json.accumulateAll(map);
FileWriter file2;
file2 = new FileWriter("C:\\test.json");
file2.write(json.toString());
file2.flush();
}
But it is writing only a single file's metadata to the JSON file. Is there any problem with my code? Please suggest.
Maybe you should use:
file2.write(json.toJSONString());
instead of this line:
file2.write(json.toString());
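Another thing worth checking, looking at the loop above: the FileWriter for C:\test.json is re-created on every iteration, so each pass overwrites the previous file's metadata and the writer is never closed. A rough sketch that collects every file's metadata and writes one JSON array at the end might look like this (it reuses the same json-lib and Tika classes as above; the throws clause stands in for real error handling):
static void extractAll(File dir) throws Exception {
    JSONArray allDocs = new JSONArray();
    for (File pdf : dir.listFiles()) {
        Metadata metadata = new Metadata();
        BodyContentHandler handler = new BodyContentHandler(10 * 1024 * 1024);
        try (InputStream input = new FileInputStream(pdf)) {
            new AutoDetectParser().parse(input, handler, metadata);
        }
        JSONObject json = new JSONObject();
        json.put("File name", pdf.getName());
        json.put("Title", metadata.get("title"));
        json.put("Author", metadata.get("Author"));
        json.put("Content type", metadata.get("Content-Type"));
        allDocs.add(json);
    }
    // open the writer once, after the loop, so earlier entries are not overwritten
    try (FileWriter out = new FileWriter("C:\\test.json")) {
        out.write(allDocs.toString());
    }
}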
I am trying to dynamically populate my jqGrid...
I have been having a hell of a time getting my jQuery grid to populate with data. How would you set up your JSON string? I create an object like so...
public static object JsonHelper(TemplateModel model){
var values = model.Template;
var JsonDataList = new {
total = 1,
page = 1,
records = model.Template.Count,
rows = (from val in values
select new {
cell = //new string(
":[\"id\" :\"" + val.EncounterId +",\""+
"\""+val.MRN + ",\""+
"\""+val.HostpitalFinNumber +",\""+
"\""+val.FirstName+",\"" +
"\""+val.LastName +",\"" +
"\""+val.DateOfBirth.ToString() +",\""+
"\""+val.CompletedPathway +",\""+
"\""+val.CompletedPathwayReason +",\""+
"\""+val.PCPAppointmentDateTime.ToString() + ",\""+
"\""+ val.SpecialistAppointmentDateTime.ToString() + ",\""+
"\""+val.AdminDate.ToString()+"\"]"
}).ToString()//.ToArray()
};
return JsonDataList;
}
That is just an object.
However, I return the object using the Json method call...
Here is what I do...
return Json(DataRepository.JsonHelper(model.FirstOrDefault()), JsonRequestBehavior.AllowGet);
I get the model from the search call... I have no idea what I am doing wrong... Can somebody give me a simple example of how to turn a simple object into JSON?
I suggest you look into Google's Gson library. I used it when working with JSON and it worked perfectly. A rough illustration follows.
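For what it's worth, Gson is a Java library, so the snippet below is only a sketch of the idea rather than a drop-in fix for the ASP.NET code above; the class and field names are made up:
import com.google.gson.Gson;

public class GsonExample {
    // Hypothetical row object; Gson serializes its fields by name
    static class Row {
        String id = "1";
        String firstName = "LeBron";
        String lastName = "James";
    }

    public static void main(String[] args) {
        Gson gson = new Gson();
        // Produces {"id":"1","firstName":"LeBron","lastName":"James"}
        System.out.println(gson.toJson(new Row()));
    }
}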
Well, I just used a string builder and a good JSON debugger to get the right strings, and it appears as though it works...