I am currently in the process of moving my sensor reading data from Azure Blob Storage into a SQL database. I have multiple .csv files, and those files have columns that hold the date (in the format 25/4/2017), time, sensor_location and sensor_readings.
My question: if I want to store the data in their respective columns using Logic Apps, what steps should I take? And how do I push the second file's data into the rows after the first file's data? Thanks.
You will need to either write a script to import your data (any high-level language with MySQL support or extensions will do: Python, PHP, Node.js, etc.) or use a MySQL client like Sequel Pro (https://www.sequelpro.com/), which can import CSV files.
Here is a link on how to insert data into MySQL with PHP using prepared statements:
http://php.net/manual/en/pdo.prepared-statements.php
You can read the CSV file with:
$contents = file_get_contents('filename.csv');
$lines = explode("\n", $contents);
foreach ($lines as $line) {
    $fields = str_getcsv($line); // split the line into its columns
    // insert the row into MySQL here
}
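Filling that in, here is a rough sketch using a PDO prepared statement (the table sensor_data and its columns are assumptions based on the question, and the 25/4/2017 dates are converted to MySQL's DATE format):

<?php
// Sketch only: table and column names are assumed from the question.
$pdo = new PDO('mysql:host=localhost;dbname=sensors;charset=utf8mb4', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $pdo->prepare(
    'INSERT INTO sensor_data (reading_date, reading_time, sensor_location, sensor_reading)
     VALUES (?, ?, ?, ?)'
);

$lines = explode("\n", file_get_contents('filename.csv'));
foreach ($lines as $line) {
    if (trim($line) === '') {
        continue; // skip blank lines (skip a header row first if your files have one)
    }
    [$date, $time, $location, $reading] = str_getcsv($line);
    // Convert 25/4/2017 into MySQL's DATE format (2017-04-25).
    $mysqlDate = DateTime::createFromFormat('j/n/Y', $date)->format('Y-m-d');
    $stmt->execute([$mysqlDate, $time, $location, $reading]);
}

Running the same script against the second file simply appends its rows after the first file's rows, since each INSERT adds new rows to the same table.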
I have a spreadsheet which really has only one complicated table. I basically convert the spreadsheet to a CSV and use a Groovy script to generate the INSERT statements.
However, I cannot do this with a table that has 28 fields, where the data in some of the fields on the spreadsheet makes exporting to CSV even more complicated. As a result, the fields in the new CSV are not separated properly, or my script has not accounted for them.
Does anyone have any suggestions on a better approach to do this? Thanks.
Have a look at the LOAD DATA INFILE statement. It will help you import data from the CSV file into a table.
This is a recurring question on Stack Overflow. Here is an updated answer.
There are actually several ways to import an Excel file into a MySQL database, with varying degrees of complexity and success.
Excel2MySQL or Navicat utilities. Full disclosure: I am the author of Excel2MySQL. These two utilities aren't free, but they are the easiest option and have the fewest limitations. They also include additional features to help with importing Excel data into MySQL. For example, Excel2MySQL automatically creates your table and automatically optimizes field data types like dates, times, floats, etc. If you're in a hurry or can't get the other options to work with your data, then these utilities may suit your needs.
LOAD DATA INFILE: This popular option is perhaps the most technical and requires some understanding of MySQL command execution. You must manually create your table before loading and use appropriately sized VARCHAR field types, so your field data types are not optimized. LOAD DATA INFILE has trouble importing large files that exceed 'max_allowed_packet' size. Special attention is required to avoid problems when importing special characters and foreign Unicode characters. I recently used it to import a CSV file named test.csv.
phpMyAdmin: Select your database first, then select the Import tab. phpMyAdmin will automatically create your table and size your VARCHAR fields, but it won't optimize the field types. phpMyAdmin has trouble importing large files that exceed 'max_allowed_packet' size.
MySQL for Excel: This is a free Excel Add-in from Oracle. This option is a bit tedious because it uses a wizard and the import is slow and buggy with large files, but this may be a good option for small files with VARCHAR data. Fields are not optimized.
For comma-separated values (CSV) files, the results view panel in MySQL Workbench has an "Import records from external file" option that imports CSV data directly into the result set. Execute that and click "Apply" to commit the changes.
For Excel files, consider using the official MySQL for Excel plugin.
A while back I answered a very similar question on the EE site and offered the following block of Perl as a quick and dirty example of how you could directly load an Excel sheet into MySQL, bypassing the need to export/import via CSV, hopefully preserving more of those special characters, and eliminating the need to worry about escaping the content.
#!/usr/bin/perl -w
# Purpose: Insert each worksheet, in an Excel workbook, into an existing MySQL DB of the same name as the Excel file (.xls).
# The worksheet names are mapped to the table names, and the column names to column names.
# Assumes each sheet is named and that the first ROW on each sheet contains the column (field) names.
#
use strict;
use Spreadsheet::ParseExcel;
use DBI;
use Tie::IxHash;

die "You must provide a filename to $0 to be parsed as an Excel file" unless @ARGV;

my $sDbName = $ARGV[0];
$sDbName =~ s/\.xls//i;

my $oExcel = new Spreadsheet::ParseExcel;
my $oBook = $oExcel->Parse($ARGV[0]);

my $dbh = DBI->connect("DBI:mysql:database=$sDbName;host=192.168.123.123", "root", "xxxxxx", {'RaiseError' => 1, AutoCommit => 1});

my ($sTableName, %hNewDoc, $sFieldName, $iR, $iC, $oWkS, $oWkC, $sSql);

print "FILE: ", $oBook->{File}, "\n";
print "DB: $sDbName\n";
print "Collection Count: ", $oBook->{SheetCount}, "\n";

for (my $iSheet = 0; $iSheet < $oBook->{SheetCount}; $iSheet++)
{
    $oWkS = $oBook->{Worksheet}[$iSheet];
    $sTableName = $oWkS->{Name};
    print "Table(WorkSheet name):", $sTableName, "\n";
    for (my $iR = $oWkS->{MinRow}; defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow}; $iR++)
    {
        tie(%hNewDoc, "Tie::IxHash");
        for (my $iC = $oWkS->{MinCol}; defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol}; $iC++)
        {
            $sFieldName = $oWkS->{Cells}[$oWkS->{MinRow}][$iC]->Value;
            $sFieldName =~ s/[^A-Z0-9]//gi; # Strip non-alphanumerics from the column name
            $oWkC = $oWkS->{Cells}[$iR][$iC];
            $hNewDoc{$sFieldName} = $dbh->quote($oWkC->Value) if ($oWkC && $sFieldName);
        }
        if ($iR == $oWkS->{MinRow}) {
            # Header row: create the table (one VARCHAR column per field) if it doesn't exist yet.
            #eval { $dbh->do("DROP TABLE $sTableName") };
            $sSql = "CREATE TABLE IF NOT EXISTS $sTableName (" . (join " VARCHAR(512), ", keys(%hNewDoc)) . " VARCHAR(255))";
            #print "$sSql \n\n";
            $dbh->do("$sSql");
        } else {
            # Data row: insert the already-quoted values.
            $sSql = "INSERT INTO $sTableName (" . (join ", ", keys(%hNewDoc)) . ") VALUES (" . (join ", ", values(%hNewDoc)) . ")\n";
            #print "$sSql \n\n";
            eval { $dbh->do("$sSql") };
        }
    }
    print "Rows inserted(Rows):", ($oWkS->{MaxRow} - $oWkS->{MinRow}), "\n";
}

# Disconnect from the database.
$dbh->disconnect();
Note:
Change the connection string in the DBI->connect call to suit your environment, and if needed adjust the user-id and password arguments.
If you need XLSX support, a quick switch to Spreadsheet::XLSX is all that's needed. Alternatively, it only takes a few lines of code to detect the file type and call the appropriate library.
The above is a simple hack that assumes everything in a cell is a string/scalar. If preserving types is important, a small function with a few regexps can be used in conjunction with a few if statements to ensure numbers and dates remain in the applicable format when written to the DB.
The above code depends on a number of CPAN modules, which you can install (assuming outbound FTP access is permitted) via:
cpan YAML Data::Dumper Spreadsheet::ParseExcel Tie::IxHash Encode Scalar::Util File::Basename DBD::mysql
This should return something along the following lines (it's rather slow, due to the auto commit):
# ./Excel2mysql.pl test.xls
FILE: test.xls
DB: test
Collection Count: 1
Table(WorkSheet name):Sheet1
Rows inserted(Rows):9892
Suppose we are getting lots of list data in JSON format, and for every single day the API returns the same data.
Now, if I apply a filter on the JSON, where can I store the API's JSON result for the current day so that there is no need to call the API multiple times?
How can I store it in a text file, in a database, or maybe in a cache?
It depends on your aims. You may use a text file or a DB field.
You may use Redis as a cache.
Try starting with a text file first. It will probably be enough for you.
1) Draft usage of a text (.json) file:
// $json = json_encode($array); // if you don't have json data
$filePath = sprintf('%s_cache.json', date('Y-m-d'));
file_put_contents($filePath, $json);
2) Usage of JSON in MySQL
INSERT INTO table VALUES (JSON_OBJECT("key", "value")); -- something like this
INSERT INTO table VALUES ('{"key": "value"}'); -- or this one
More details about MySQL's JSON support are here: https://dev.mysql.com/doc/refman/5.7/en/json.html
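As a rough sketch of option 1 (the API URL below is a placeholder), the daily cache can be checked before calling the API at all:

<?php
// Minimal sketch of a per-day JSON cache; the API URL is a placeholder.
$cacheFile = sprintf('%s_cache.json', date('Y-m-d'));

if (file_exists($cacheFile)) {
    // Reuse today's cached result instead of calling the API again.
    $data = json_decode(file_get_contents($cacheFile), true);
} else {
    $json = file_get_contents('https://api.example.com/list'); // one API call per day
    file_put_contents($cacheFile, $json);
    $data = json_decode($json, true);
}

// Apply your filter to $data here.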
I started pulling a Glue table via PyAthena last week. However, one annoying thing I noticed is that if I write my code as shown below, sometimes it works and returns a pandas DataFrame, but other times this piece of code creates a CSV and a CSV metadata file in the folder where the physical data (Parquet) are stored in S3 and registered in Glue.
I know that if you use the pandas cursor it may end up with these two files, but I wonder if I can access the data without them, since every time these two files are generated in S3, my read process fails.
Thank you!
import os
import pandas as pd
from pyathena import connect

access_key_id = os.getenv('AWS_ACCESS_KEY_ID')
secret_access_key = os.getenv('AWS_SECRET_ACCESS_KEY')

connect1 = connect(s3_staging_dir='s3://xxxxxxxxxxxxx')
df = pd.read_sql("select * from abc.table_name", connect1)
df.head()
Go to Athena.
Click settings -> workgroup name -> edit workgroup.
Update "Query result location".
Click "Override client-side settings".
Note: if you have not set up any other workgroups for your Athena environment, you should only find one workgroup, named "Primary".
This should resolve your problem. For more information you can read:
https://docs.aws.amazon.com/athena/latest/ug/querying.html
I have around four self-contained *.sql dumps (about 20 GB each) which I need to convert to datasets in Apache Spark.
I have tried installing a local database using InnoDB and importing the dump, but that seems too slow (I spent around 10 hours on that).
I directly read the file into Spark using:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, monotonically_increasing_id}

// Assumes spark-shell, where sc and the toDF implicits are already in scope.
var sparkSession = SparkSession.builder().appName("sparkSession").getOrCreate()

var myQueryFile = sc.textFile("C:/Users/some_db.sql")

// Convert this to an indexed dataframe so you can parse multi-line CREATE / INSERT statements.
// This will also show you the structure of the sql dump for your use case.
var myQueryFileDF = myQueryFile.toDF.withColumn("index", monotonically_increasing_id()).withColumnRenamed("value", "text")

// Identify all tables and data in the sql dump along with their indexes
var tableStructures = myQueryFileDF.filter(col("text").contains("CREATE TABLE"))
var tableStructureEnds = myQueryFileDF.filter(col("text").contains(") ENGINE"))
println("If there is a count mismatch between these values choose a different substring " + tableStructures.count() + " " + tableStructureEnds.count())

var tableData = myQueryFileDF.filter(col("text").contains("INSERT INTO "))
The problem is that the dump contains multiple tables, each of which needs to become a dataset. To do that, I need to understand whether it can be done for even one table. Is there any .sql parser written for Scala Spark?
Is there a faster way of going about it? Can I read it directly into Hive from the self-contained .sql file?
UPDATE 1: I am writing the parser for this based on the input given by Ajay.
UPDATE 2: Changing everything to Dataset-based code to use the SQL parser, as suggested.
Is there any .sql parser written for Scala Spark?
Yes, there is one and you seem to be using it already. That's Spark SQL itself! Surprised?
The SQL parser interface (ParserInterface) can create relational entities from the textual representation of a SQL statement. That's almost your case, isn't it?
Please note that ParserInterface deals with a single SQL statement at a time so you'd have to somehow parse the entire dumps and find the table definitions and rows.
The ParserInterface is available as sqlParser of a SessionState.
scala> :type spark
org.apache.spark.sql.SparkSession
scala> :type spark.sessionState.sqlParser
org.apache.spark.sql.catalyst.parser.ParserInterface
Spark SQL comes with several methods that offer an entry point to the interface, e.g. SparkSession.sql, Dataset.selectExpr or simply the expr standard function. You may also use the SQL parser directly.
Shameless plug: you may want to read about ParserInterface — SQL Parser Contract in the Mastering Spark SQL book.
You need to parse it yourself. It requires the following steps:
Create a class for each table.
Load the files using textFile.
Filter out all statements other than INSERT statements.
Then split the RDD using filter into multiple RDDs, based on the table name present in the INSERT statement.
For each RDD, use map to parse the values present in the INSERT statement and create an object.
Now convert the RDDs to Datasets.
I've got a pretty substantial XLS file a client provided: 830 total tabs/sheets.
I've designed a multi-table database with phpMyAdmin (MySQL, obviously) to house the information that's in there, and have populated about 5 of those sheets by hand to ensure the data will fit into the designed database.
Is there a piece of software or some sort of tool that will help me format this XLS document and map it to the right places in the database?
According to this thread, you can import a CSV file exported from Excel with PHP.
Quoting @noelthefish:
As you seem to be using PHP, here is a function that is built into PHP:
array fgetcsv ( resource $handle [, int $length [, string $delimiter [, string $enclosure [, string $escape ]]]] )
<?php
$row = 1;
$handle = fopen("test.csv", "r");
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    $num = count($data);
    echo "<p> $num fields in line $row: </p>\n";
    $row++;
    for ($c = 0; $c < $num; $c++) {
        echo $data[$c] . "\n";
    }
}
fclose($handle);
?>
This is the very basics of what you need. You would of course put the database update part within the while loop (see the sketch below). Take out the echo statements as well, unless you are debugging, but basically, with a little alteration, this small piece of code will do what you need. If you need more info, check out uk.php.net.
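As a hedged sketch of that database update part (the table client_data and its columns are assumptions, filled positionally from each CSV row), a prepared statement inside the loop, wrapped in a transaction, keeps large imports reasonably fast:

<?php
// Sketch only: assumed table `client_data` with columns matching the CSV layout.
$pdo = new PDO('mysql:host=localhost;dbname=clientdb;charset=utf8mb4', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$stmt = $pdo->prepare('INSERT INTO client_data (col_a, col_b, col_c) VALUES (?, ?, ?)');

$handle = fopen('test.csv', 'r');
$pdo->beginTransaction(); // one commit per file is far faster than one per row
while (($data = fgetcsv($handle, 1000, ',')) !== false) {
    $stmt->execute([$data[0], $data[1], $data[2]]);
}
$pdo->commit();
fclose($handle);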
It's possible to import data from a spreadsheet into Access and map the fields to whatever table/column you want. It's also possible to use an ODBC connector to connect Access to a MySQL DB, essentially using Access as a front-end to MySQL.
Alternatively, you can do as toomanyairmiles suggests and simply write the PHP code to massage the CSV data into the format your MySQL DB needs. This is what I've done in the past when we needed to import sales data from disparate sources into an in-house sales/royalties-tracking system. If you need to do frequent imports, I would suggest automating a system (e.g. via an Excel macro) to export the individual sheets into CSVs in a single directory or, if it's more convenient, you can zip them together and upload the zip file to the PHP app.
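If you go the zip-upload route, a rough sketch of the receiving PHP side (importCsv() is a hypothetical stand-in for your per-file import code) could look like this:

<?php
// Sketch: unpack an uploaded zip of CSV exports and hand each file to an import routine.
$zip = new ZipArchive();
if ($zip->open('/tmp/uploaded_sheets.zip') === true) {
    $extractDir = sys_get_temp_dir() . '/sheet_import';
    if (!is_dir($extractDir)) {
        mkdir($extractDir, 0700, true);
    }
    $zip->extractTo($extractDir);
    $zip->close();

    foreach (glob($extractDir . '/*.csv') as $csvFile) {
        importCsv($csvFile); // e.g. the fgetcsv loop shown earlier
    }
}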
If you're doing bulk imports into MySQL, you may want to consider using LOAD DATA INFILE, which is generally the fastest way to import data into MySQL—to the point where it's actually faster to write a new CSV file to disk after you've massaged the data, and then use LOAD DATA INFILE rather than doing a bulk INSERT directly. Doing this on my intranet actually cut the insertion time from over 3 seconds (already an improvement over the ~3min. it took to do thousands of individual INSERTs) down to 240ms.
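As a rough sketch of that massage-then-load pattern (the table sales_import and the $massagedRows variable are placeholders), you might write the cleaned rows back out with fputcsv and hand the file to LOAD DATA LOCAL INFILE; note that both the server and the client connection must allow LOCAL INFILE for this to work:

<?php
// Sketch of the massage-then-LOAD DATA pattern; table and column layout are assumptions.
$pdo = new PDO(
    'mysql:host=localhost;dbname=sales;charset=utf8mb4',
    'user',
    'password',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true] // needed for LOAD DATA LOCAL INFILE
);

// 1. Massage the source rows and write them to a fresh CSV on disk.
$tmpFile = tempnam(sys_get_temp_dir(), 'import_') . '.csv';
$out = fopen($tmpFile, 'w');
foreach ($massagedRows as $row) { // $massagedRows: your already-cleaned arrays of values
    fputcsv($out, $row);
}
fclose($out);

// 2. Bulk-load the file in one statement instead of thousands of INSERTs.
$pdo->exec(
    "LOAD DATA LOCAL INFILE " . $pdo->quote($tmpFile) . "
     INTO TABLE sales_import
     FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
     LINES TERMINATED BY '\\n'"
);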