Apache Drill HBase Query - Audit Columns

How can I write queries that filter data on conditions based on audit columns such as TimeRange?
I am actually looking to filter data based on timestamp, i.e. to extract the data for a specific version.
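For filtering on a timestamp kept as an ordinary column value, a hedged sketch of a Drill query against the HBase storage plugin could look like this (the table name mytable, column family audit, and qualifier ts are assumptions; HBase hands Drill raw bytes, so values must be decoded with CONVERT_FROM):
-- Decode the row key and the audit timestamp, then filter on the timestamp.
SELECT CONVERT_FROM(t.row_key, 'UTF8') AS row_id,
       CONVERT_FROM(t.audit.ts, 'UTF8') AS audit_ts
FROM hbase.`mytable` t
WHERE CONVERT_FROM(t.audit.ts, 'UTF8') >= '2016-01-01';
Note that this filters on a stored column value, not on HBase's internal cell timestamps/versions.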

As per Write Drill query output to csv (or some other format):
use dfs.tmp;  -- switch to a writable workspace
alter session set `store.format`='csv';  -- make CTAS write CSV files
create table dfs.tmp.my_output as select * from cp.`employee.json`;  -- CTAS writes the query result as CSV under dfs.tmp

Related

Export blob column from MySQL DB to disk and replace it with new file name

So I'm working on a legacy database, and unfortunately its performance is very slow: a simple select query can take up to 10 seconds on tables with fewer than 10,000 records.
I investigated the problem and found that dropping the column used to store files (mostly videos and images) fixes it and improves performance a lot.
Along with adding proper indexes, the exact same query that used to take 10-15 seconds now runs in under 1 second.
So my question is: is there any existing tool or script I can use to export those blobs (videos) from the database, save them to disk, and update each row with the new file name/path on the file system?
If not, is there a proper way to optimize the database so that those blobs don't impact performance as much?
Hint: some of the clients consuming this database use high-level ORMs, so we don't have much control over the queries the ORM uses to fetch rows and their relations. So I cannot optimize the queries directly.
How about this way?
SELECT column FROM table1 WHERE id = 1 INTO DUMPFILE 'name.png';
There is also INTO OUTFILE instead of INTO DUMPFILE.
13.2.10.1 SELECT ... INTO Statement
The SELECT ... INTO form of SELECT enables a query result to be stored in variables or written to a file:
SELECT ... INTO var_list selects column values and stores them into variables.
SELECT ... INTO OUTFILE writes the selected rows to a file. Column and line terminators can be specified to produce a specific output format.
SELECT ... INTO DUMPFILE writes a single row to a file without any formatting.
Link: https://dev.mysql.com/doc/refman/8.0/en/select-into.html
Link: https://dev.mysql.com/doc/refman/8.0/en/select.html
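A minimal sketch of the export-and-replace step (the files table, data blob column, and file_path column are hypothetical; MySQL's secure_file_priv setting restricts where the server may write, and the account needs the FILE privilege):
-- Write one row's blob to a file on the server's filesystem.
SELECT data FROM files WHERE id = 1 INTO DUMPFILE '/var/lib/mysql-files/file_1.bin';
-- Record the new path and free the blob storage for that row.
UPDATE files SET file_path = '/var/lib/mysql-files/file_1.bin', data = NULL WHERE id = 1;
Repeating this per row from a small driver script gives the export/update behavior the question asks for.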

Create new column with SQL SELECT from existing column data

I am looking to create a CSV with output generated from a SQL query.
Right now it is a SELECT statement which takes data from multiple tables; however, I'd like to add a column to my output which takes specific data from an existing column and presents it as a new column in the CSV output, for easier reading.
Is this possible?
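Yes: a derived column can be produced in the SELECT itself, with no schema change needed. A minimal sketch, using hypothetical orders/customers tables:
-- The CONCAT expression becomes its own field in the CSV output.
SELECT o.order_id,
       c.name,
       CONCAT(c.city, ', ', c.country) AS location
FROM orders o
JOIN customers c ON c.id = o.customer_id;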

DataPipeline: Use only first 4 values from CSV in pipeline

I have a CSV with a variable structure, from which I only want to take the first 4 values. The CSV stored in S3 has between 7 and 8 fields, and I would like to take just the first 4. I have attempted to use the following prepared statement:
INSERT INTO locations (timestamp, item_id, latitude, longitude) VALUES (?, ?, ?, ?);
However I am getting:
Parameter index out of range (5 > number of parameters, which is 4).
I believe this means it is attempting to load the other fields in the CSV as well. Is it possible to just take the first 4 values? Or otherwise deal with a variable-length CSV?
Use the transformSql option. You didn't mention what you are loading into; from the Redshift docs:
The SQL SELECT expression used to transform the input data. When you copy data from DynamoDB or Amazon S3, AWS Data Pipeline creates a table called staging and initially loads it in there. Data from this table is used to update the target table. If the transformSql option is specified, a second staging table is created from the specified SQL statement. The data from this second staging table is then updated in the final target table. transformSql must be run on the table named staging and the output schema of transformSql must match the final target table's schema.
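So a transformSql value that keeps only the first four fields could look like this (the column names are assumptions and must match the staging table's schema):
-- Project only the four columns the target table expects; the extra CSV fields are dropped.
SELECT timestamp, item_id, latitude, longitude FROM staging;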

Hive External Table - Where is data location metadata stored?

I am using Hive external tables on Amazon EMR. Often these tables are partitioned, with each partition pointing to a different bucket in S3. I am using MySQL for Hive metadata storage.
I want to be able to see the location/bucket on S3 that each partition is pointing to. I have looked into the metadata tables in MySQL. I can see partition information there, but nothing that indicates the actual location of the data.
Is this data available in MySQL, or can it be obtained by Hive commands?
The following hive command can be used to get the location:
hive> show create table <TableName>;
Look for the LOCATION line in the output of the above command.
For an external partitioned table, each partition has a location, rather than the table itself having a location. You need to use something like
show partitions employees
to get the partition list then
describe extended employees partition (year='2016', month='05', day='25')
to see the location of a particular partition.
Other commands like show create table employees may not give useful info about the data location:
LOCATION
'hdfs://nameservice1/user/hive/warehouse/something.db/employees'
describe extended table_name
will provide all details about the table, including tableName:ca_data, dbName:suman, owner:suman, createTime:1494368591, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:...) and many more.
Another Command:
desc formatted table_name;
If you want to see the actual data storage location of a Hive table, there are several ways.
1) hive> show create table <TableName>;
It will show the table DDL along with the path where the actual data is located.
2) describe extended table_name or describe formatted table_name.
It will give you the location, owner, comments, table type and other details.
3) The above commands only help when you want the location of a single table; they won't help if you want to check the locations of multiple tables across multiple databases.
For that you can query the Hive metastore directly and get the locations of multiple tables with a single query.
There is a good article about how to get the HDFS paths of all Hive tables; please read it:
https://askdoubts.com/question/how-to-find-out-list-of-all-hive-external-tables-and-hdfs-paths-from-hive-metastore/#comment-19
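A minimal sketch against a MySQL-backed metastore (TBLS, DBS, SDS and PARTITIONS are the standard metastore tables, but the schema can vary slightly between Hive versions):
-- Location of every table, across all databases.
SELECT d.NAME AS db_name, t.TBL_NAME, s.LOCATION
FROM TBLS t
JOIN DBS d ON d.DB_ID = t.DB_ID
JOIN SDS s ON s.SD_ID = t.SD_ID;
-- Per-partition locations (what matters for an external partitioned table).
SELECT t.TBL_NAME, p.PART_NAME, s.LOCATION
FROM PARTITIONS p
JOIN TBLS t ON t.TBL_ID = p.TBL_ID
JOIN SDS s ON s.SD_ID = p.SD_ID;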
As h4ck3r mentioned, you could use the "Show create table" command to look for location information.
To see partition-specific information, use SHOW TABLE/PARTITION EXTENDED:
SHOW TABLE EXTENDED will list information for all tables matching the given regular expression. Users cannot use a regular expression for the table name if a partition specification is present. This command's output includes basic table information and file system information like totalNumberFiles, totalFileSize, maxFileSize, minFileSize, lastAccessTime, and lastUpdateTime. If a partition is present, it will output the given partition's file system information instead of the table's file system information.
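For example (syntax per the Hive docs; the table and partition values are taken from the earlier answer):
SHOW TABLE EXTENDED LIKE 'employees' PARTITION (year='2016', month='05', day='25');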

Dump MySQL Table(s) as XML using SQL command (like table_to_xml() or query_to_xml() in PostgreSQL?)

Is there a way to retrieve the result of a query, or to dump a whole table, as an XML fragment that can be retrieved by issuing a query? I know there is something like this for PostgreSQL (9.0): table_to_xml() and query_to_xml().
I also know that mysqldump --xml can export XML, but I'm looking for something that lets me issue a simple query. The application I'm working on should allow some users to dump a certain table into an XML file on their machine, therefore I need to issue a query and obtain a string or something similar (is there an XML type in MySQL?).
I need the result to be XML in the result set of a query, not a file on the server or something.
The query resulting in a SQL script similar to a MySQL dump for a single table would have three parts:
SHOW CREATE TABLE tblname - to generate the CREATE TABLE statement.
DESCRIBE tblname - to retrieve the column names for the column-list part of the INSERT queries.
SELECT * FROM tblname - to retrieve the values for the VALUES(...) part of the INSERT queries. Each row in the result set will correspond to one INSERT statement; the INSERT statements are generated in the loop handling the result set.
If this is to be done from MySQL, it can be wrapped into a stored procedure.
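For a hypothetical employees table, the three statements would be:
SHOW CREATE TABLE employees;
DESCRIBE employees;
SELECT * FROM employees;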
Found this in a question here at Stack Overflow, as linked in the comments. It proposes manually building the XML in a query, like:
SELECT concat("<this-is-xml>", field1, "</this-is-xml>") FROM ...
Of course, XML character escaping and so on has to be done manually.
There seems to be no native way to directly get the result of a query as XML.
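A slightly fuller sketch with manual escaping (the employees table and name column are hypothetical; only &, < and > are handled here, and & must be replaced first):
SELECT CONCAT('<row><id>', id, '</id><name>',
              REPLACE(REPLACE(REPLACE(name, '&', '&amp;'), '<', '&lt;'), '>', '&gt;'),
              '</name></row>') AS xml_row
FROM employees;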
There is also a library (lib_mysqludf_xql) for mysql which provides XML functionality for MySQL.
INTO OUTFILE will dump the results to a file on the server (an XML file, if the query builds XML strings as above), so you could then send that to a client.