How to update a MySQL database from CSV files in Groovy/Grails?

I have a table in my database which needs to be updated with values for some rows and columns from a CSV file (this file lives outside of the Grails application). The CSV file contains a large data set that maps addresses to cities. Some of the addresses in my application have the wrong city. So I want to get a city from the application database, compare it with the city in the CSV file, map the address to it, and add that address to the application database.
What is the best approach?

For Grails 3, use https://bintray.com/sachinverma/plugins/org.grails.plugins:csv to parse the CSV and add the following to build.gradle. The plugin is also available for Grails 2.
repositories {
    // repository for the csv plugin; URL taken from the Bintray page above
    maven { url "https://bintray.com/sachinverma/plugins/org.grails.plugins:csv" }
}
dependencies {
    compile "org.grails.plugins:csv:1+"
}
Then use it in your service like this:
def is
try {
    is = params.csvfile.getInputStream()
    def csvMapReader = new CSVMapReader( new InputStreamReader( is ) )
    csvMapReader.fieldKeys = ["city", "address1", "address2"]
    csvMapReader.eachWithIndex { map, idx ->
        def dbEntry = DomainObject.findByAddress1AndAddress2( map.address1, map.address2 )
        if ( dbEntry && map.city != dbEntry.city ) {
            // assuming we're just updating the city on the current entry
            dbEntry.city = map.city
            dbEntry.save()
        }
        // do whatever other logic you need
    }
} finally {
    is?.close()
}
This is of course a simplified version, as I don't know your CSV or schema layout.
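Since the question mentions the CSV sits outside the application rather than arriving as an upload, here is a minimal sketch of the same idea reading straight from disk; the path /data/cities.csv and the field names are assumptions to adjust to your file:
def csvFile = new File( '/data/cities.csv' )
csvFile.withReader { reader ->
    def csvMapReader = new CSVMapReader( reader )
    csvMapReader.fieldKeys = ["city", "address1", "address2"]
    csvMapReader.eachWithIndex { map, idx ->
        // look up the address and correct its city if the CSV disagrees
        def dbEntry = DomainObject.findByAddress1AndAddress2( map.address1, map.address2 )
        if ( dbEntry && dbEntry.city != map.city ) {
            dbEntry.city = map.city
            dbEntry.save()
        }
    }
}
Wrapping the loop in DomainObject.withTransaction { } is worth considering so a bad row doesn't leave the update half applied.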

Related

Why is Google Fit not syncing steps data?

I am a developer working on integrating Google Fit with a smart wearable companion app. I am exporting steps data to the Google Fit app. It works most of the time, but occasionally it doesn't. All hourly data is put into a DataSet and inserted via the Fitness.getHistoryClient function. The DataSet object had steps for the hours and I got a 200 response from the API, but the data is not visible in the Google Fit app.
Could someone help me?
Here is my code:
val dataSource = DataSource.Builder()
    .setAppPackageName(context)
    .setDataType(DataType.TYPE_STEP_COUNT_DELTA)
    .setStreamName(TAG + AppConstants.STEPSCOUNT_FIT.value)
    .setType(DataSource.TYPE_RAW)
    .build()

// Create a data set
var dataSet = DataSet.create(dataSource)
dataSet = fillStepsData(dataSet)

LogHelper.i(TAG, "Inserting the dataset in the History API.")
val lastSignedInAccount = GoogleSignIn.getLastSignedInAccount(context)
return if (lastSignedInAccount != null) {
    Fitness.getHistoryClient(context, lastSignedInAccount)
        .insertData(dataSet)
        .addOnCompleteListener { task ->
            if (task.isSuccessful) {
                // At this point, the data has been inserted and can be read.
                LogHelper.i(TAG, "Data insert was successful!")
                readHistoryData()
            } else {
                LogHelper.e(
                    TAG,
                    "There was a problem inserting the dataset.",
                    task.exception
                )
            }
        }
}
fillStepsData(dataSet) - this function returns a DataSet. The DataSet contains DataPoint objects covering all the hourly data.

Azure ADF - Get field list of .csv file from Lookup activity

Context: Azure ADF. Brief process description:
Get a list of the fields defined in the first row of a .csv (blob) file. This is the first step: detect the fields.
The 2nd step would be a kind of comparison with the actual columns of an SQL table.
The 3rd is a stored procedure execution to perform the ALTER TABLE task, finishing with a (customized) table containing all the fields needed to successfully load the .csv file into the SQL table.
To begin my ADF pipeline, I set up a Lookup activity that queries the first line of my blob file, with the "First row only" flag = ON. As a second pipeline activity, an "Append Variable" task, I would like to get all the .csv fields (first row) retrieved from the Lookup activity as a list.
Here is where it's getting to be a nightmare.
As far as I know, with dynamic content I can get an object with all the values (with a format like {"field1_name":"field1_value_1st_row", "field2_name":"field2_value_1st_row", etc.})
with something like #activity('Lookup1').output.firstrow,
or any single element with #activity('Lookup1').output.firstrow.<element_name>,
but I can't figure out how to get a list of all the field names (keys?) of that object.
I will appreciate any advice, many thanks!
I will skip the Lookup activity part because it seems that you are already familiar with it.
You could use an Azure Function (HTTP trigger) to get the key list of the firstrow JSON object. For example, say your JSON object looks like this, as you mentioned in your question:
{"field1_name":"field1_value_1st_row", "field2_name":"field2_value_1st_row"}
Azure Function code:
module.exports = async function (context, req) {
    context.log('JavaScript HTTP trigger function processed a request.');
    var array = [];
    for (var key in req.body) {
        array.push(key);
    }
    context.res = {
        body: { "keyValue": array }
    };
};
Then use Azure Function Activity to get the output:
#activity('<AzureFunctionActivityName>').keyValue
Use Foreach Activity to loop the keyValue array:
#item()
Still based on the above sample input data, please refer to my sample code:
dct = {"field1_name": "field1_value_1st_row", "field2_name": "field2_value_1st_row"}
list = []
for key in dct.keys():
    list.append(key)
print(list)
dicOutput = {"keys": list}
print(dicOutput)
Have you considered doing this in ADF data flow? You would map the incoming fields to a SQL dataset without a target schema. Define a new table name in the dataset definition and then map the incoming fields from your CSV to a new target table schema definition. ADF will write the rows to a new table using that file's schema.

Entity Framework (MySQL) - move data from one database to another using two dbcontexts

Using Entity Framework 6 and MySQL, I am trying to archive data from a 'production' database table to an 'archive' database. I have created two DbContexts, one for each database. Each database has the same schema.
I can move an entire table of data from the production database to the archive database using the following code:
using (MyDBContext archiveContext =
    MyDBContext.CreateEntitiesForSpecificDatabaseName("archive_db"))
using (MyDBContext prodContext =
    MyDBContext.CreateEntitiesForSpecificDatabaseName("prod_db"))
{
    if (prodContext.myTable.Any())
    {
        archiveContext.myTable.AddRange(prodContext.myTable.AsNoTracking());
        archiveContext.SaveChanges();
    }
}
However, I don't want to archive the whole table; I only wish to archive data older than a certain date, so I tried the following:
using (MyDBContext archiveContext =
    MyDBContext.CreateEntitiesForSpecificDatabaseName("archive_db"))
using (MyDBContext prodContext =
    MyDBContext.CreateEntitiesForSpecificDatabaseName("prod_db"))
{
    IQueryable<myTable> dataToArchive =
        from mt in prodContext.myTable
        where mt.date < DateTimeSixMonths
        select mt;
    archiveContext.myTable.AddRange(dataToArchive);
    archiveContext.SaveChanges();
}
but I cannot get around the exception I get when I run this:
System.InvalidOperationException: 'An entity object cannot be
referenced by multiple instances of IEntityChangeTracker.'
It occurs on this line:
archiveContext.myTable.AddRange(dataToArchive);
Is it possible to somehow remove the tracking from 'dataToArchive'?
Have you tried disposing the first DataContext after retrieving data? Something like this:
List<myTable> dataToArchive;
using (MyDBContext prodContext =
    MyDBContext.CreateEntitiesForSpecificDatabaseName("prod_db"))
{
    dataToArchive = (from mt in prodContext.myTable
                     where mt.date < DateTimeSixMonths
                     select mt).ToList();
}
using (MyDBContext archiveContext =
    MyDBContext.CreateEntitiesForSpecificDatabaseName("archive_db"))
{
    archiveContext.myTable.AddRange(dataToArchive);
    archiveContext.SaveChanges();
}
Using EF to manage archiving data isn't ideal; something like that would be better served at the database level, using insert-select + delete for low to moderate data volumes, or detachable partitions (i.e. 3-6 month partition sizes) that can be moved between databases.
To do this with EF (only recommended for small and non-complex domain models), you should be able to accomplish it by disabling proxy generation in your read context, loading the data with AsNoTracking, then adding it to the new context's DbSet. This example does not handle associated entities, or do the delete from the prod DbSet.
using (MyDBContext prodContext =
    MyDBContext.CreateEntitiesForSpecificDatabaseName("prod_db"))
{
    prodContext.Configuration.ProxyCreationEnabled = false;
    var dataToArchive = prodContext.myTable.AsNoTracking()
        .Where(mt => mt.Date < DateTimeSixMonths);
    using (MyDBContext archiveContext =
        MyDBContext.CreateEntitiesForSpecificDatabaseName("archive_db"))
    {
        archiveContext.myTable.AddRange(dataToArchive);
        archiveContext.SaveChanges();
    }
}

Groovy: Parsing file with specific extension for import to MySQL and then rename it to *.bak

I made a script for connecting to a database and changing data in a specific column for a given number.
Now I want to read the numbers from a text file with a specific extension, make the changes for those numbers in the database, and then rename the file with a .bak extension.
Help me, please. I appreciate your help in advance!
import groovy.sql.Sql

sql = Sql.newInstance('jdbc:mysql://localhost:3306/database', 'login', 'password', 'com.mysql.jdbc.Driver')
int rowsAffected = sql.executeUpdate("update tablename set column = '01' where number = $NumberFromFile")
println "updated: ${rowsAffected}"
Something like this should work:
def newValue = '01'
new File( '/path/to/data.input' ).with { file ->
    file.withReader { reader ->
        new Scanner( reader ).useDelimiter( ';' ).with { scanner ->
            while( scanner.hasNext() ) {
                sql.executeUpdate "UPDATE tablename SET column=$newValue WHERE number=${scanner.nextInt()}"
            }
        }
    }
    file.renameTo( new File( file.parent, "${file.name}.bak" ) )
}
Obviously, you probably want to do it in a transaction, or a batch, but this should give you the idea.
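For example, here is a minimal sketch of a batched, transactional variant using groovy.sql.Sql's withTransaction and withBatch; the file path, delimiter and the '01' value are carried over from the snippets above as assumptions:
def newValue = '01'
new File( '/path/to/data.input' ).withReader { reader ->
    sql.withTransaction {
        // batch the updates so they are sent to MySQL in chunks of 100
        sql.withBatch( 100, 'UPDATE tablename SET column = ? WHERE number = ?' ) { ps ->
            new Scanner( reader ).useDelimiter( ';' ).with { scanner ->
                while( scanner.hasNext() ) {
                    ps.addBatch( [ newValue, scanner.nextInt() ] )
                }
            }
        }
    }
}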
Now I have a script:
def tmn_file = ~/.*\.tmn/
def tmc_file = ~/.*\.tmc/
def newTerm = new Properties().with { props ->
    new File( inputPath ).eachFileMatch( tmn_file ) { file ->  // eachFileMatch filters files by name pattern
        file.withReader { reader ->
            props.load( reader )
            println "Read data from file $file:"
            // ...something read from file...
            switch( props.ACTION ) {
                case 'NEW':
                    // do something...
                    break
            }
            switch( props.ACTION ) {
                case 'CHANGE':
                    // do something...
                    break
            }
        }
    }
}
This script looks in the directory at inputPath for files matching tmn_file, which can contain ACTION - NEW or CHANGE.
The script works great, but I want to add another thing:
if the file has the extension *.tmn (tmn_file) - run only the ACTION with the NEW case
if the file has the extension *.tmc (tmc_file) - run only the ACTION with the CHANGE case
How can I implement this decision?
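One hedged sketch of that branching, reusing the same Properties/ACTION layout as above (the rename to .bak is kept from the earlier answer; the "do the ... work" bodies are placeholders):
new File( inputPath ).eachFileMatch( ~/.*\.(tmn|tmc)/ ) { file ->
    // derive the expected ACTION from the file extension
    def expected = file.name.endsWith( '.tmn' ) ? 'NEW' : 'CHANGE'
    def props = new Properties()
    file.withReader { reader -> props.load( reader ) }
    if( props.ACTION == expected ) {
        switch( props.ACTION ) {
            case 'NEW':
                // do the NEW work
                break
            case 'CHANGE':
                // do the CHANGE work
                break
        }
    }
    file.renameTo( new File( file.parent, "${file.name}.bak" ) )
}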

Excluding Content From SQL Bulk Insert

I want to import my IIS logs into SQL for reporting using BULK INSERT, but the comment lines - the ones that start with a # - cause a problem because those lines do not have the same number of fields as the data lines.
If I manually delete the comments, I can perform a bulk insert.
Is there a way to perform a bulk insert while excluding lines based on a match, such as: any line that begins with a "#"?
Thanks.
The approach I generally use with BULK INSERT and irregular data is to push the incoming data into a temporary staging table with a single VARCHAR(MAX) column.
Once it's in there, I can use more flexible decision-making tools like SQL queries and string functions to decide which rows I want to select out of the staging table and bring into my main tables. This is also helpful because BULK INSERT can be maddeningly cryptic about why and how it fails on a specific file.
The only other option I can think of is using pre-upload scripting to trim comments and other lines that don't fit your tabular criteria before you do your bulk insert.
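If you go the pre-processing route, a minimal sketch in Groovy (the language of the main question; the input and output folders are assumptions) that drops the comment lines before the bulk insert could look like:
new File( '/logs/in' ).eachFileMatch( ~/.*\.log/ ) { file ->
    def cleaned = new File( '/logs/out', file.name )
    cleaned.withWriter { writer ->
        // keep every line that is not an IIS comment line
        file.eachLine { line ->
            if( !line.startsWith( '#' ) ) {
                writer.writeLine( line )
            }
        }
    }
}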
I recommend using logparser.exe instead. LogParser has some pretty neat capabilities on its own, but it can also be used to format the IIS log to be properly imported by SQL Server.
Microsoft has a tool called "PrepWebLog" http://support.microsoft.com/kb/296093 - which strips out these hash/pound characters; however, I'm running it now (using a PowerShell script for multiple files) and am finding its performance intolerably slow.
I think it'd be faster if I wrote a C# program (or maybe even a macro).
Update: PrepWebLog just crashed on me. I'd avoid it.
Update #2, I looked at PowerShell's Get-Content and Set-Content commands but didn't like the syntax and possible performance. So I wrote this little C# console app:
if (args.Length == 2)
{
    string path = args[0];
    string outPath = args[1];
    Regex hashString = new Regex("^#.+\r\n", RegexOptions.Multiline | RegexOptions.Compiled);
    foreach (string file in Directory.GetFiles(path, "*.log"))
    {
        string data;
        using (StreamReader sr = new StreamReader(file))
        {
            data = sr.ReadToEnd();
        }
        string output = hashString.Replace(data, string.Empty);
        using (StreamWriter sw = new StreamWriter(Path.Combine(outPath, new FileInfo(file).Name), false))
        {
            sw.Write(output);
        }
    }
}
else
{
    Console.WriteLine("Source and Destination Log Path required or too many arguments");
}
It's pretty quick.
Following up on what PeterX wrote, I modified the application to handle large log files, since anything sufficiently large would create an out-of-memory exception. Also, since we're only interested in whether the first character of a line is a hash, we can just use the StartsWith() method on each line as it is read.
class Program
{
    static void Main(string[] args)
    {
        if (args.Length == 2)
        {
            string path = args[0];
            string outPath = args[1];
            string line;
            foreach (string file in Directory.GetFiles(path, "*.log"))
            {
                using (StreamReader sr = new StreamReader(file))
                {
                    using (StreamWriter sw = new StreamWriter(Path.Combine(outPath, new FileInfo(file).Name), false))
                    {
                        while ((line = sr.ReadLine()) != null)
                        {
                            if (!line.StartsWith("#"))
                            {
                                sw.WriteLine(line);
                            }
                        }
                    }
                }
            }
        }
        else
        {
            Console.WriteLine("Source and Destination Log Path required or too many arguments");
        }
    }
}