How does Greenplum implement external tables?

I am learning gpdb, and I am confused about external tables. The following is my understanding.
For a read-only external table, gpdb only creates metadata for the table. All data is stored on a remote server such as HDFS. When querying, data is transferred from the remote server to the segments; the data is not saved on the segments when the query ends.
For a write-only external table, gpdb will load data from the remote server, and all data will be saved on the segments. An insert will modify data on the local segments, not the remote server.
Is my understanding right?

An External Table is a table whose data is stored and managed outside of the database. An example would be a CSV or TEXT file.
A Readable External Table has its metadata stored in the database and, like you said, all data is stored somewhere else, such as HDFS. It can also be the local filesystem of the Greenplum hosts. The most common kind of External Table uses gpfdist to "serve" the file(s); gpfdist is basically a web server that multiple segments can read from at the same time.
Example:
First, I will start gpfdist on four different hosts. These hosts must be accessible by all segment hosts too.
[gpadmin@host1] gpfdist -p 8999 -d /landing/ &
[gpadmin@host2] gpfdist -p 8999 -d /landing/ &
[gpadmin@host3] gpfdist -p 8999 -d /landing/ &
[gpadmin@host4] gpfdist -p 8999 -d /landing/ &
And now put an example file in each:
[gpadmin@host1] hostname > /landing/hostname.txt
[gpadmin@host2] hostname > /landing/hostname.txt
[gpadmin@host3] hostname > /landing/hostname.txt
[gpadmin@host4] hostname > /landing/hostname.txt
Create an External Table to read it:
[gpadmin@master] psql
gpadmin=# create external table public.ext_hostnames
(hostname text)
location (
'gpfdist://host1:8999/hostname.txt',
'gpfdist://host2:8999/hostname.txt',
'gpfdist://host3:8999/hostname.txt',
'gpfdist://host4:8999/hostname.txt')
format 'text' (delimiter '|' null as '');
This simple example shows files managed outside of the database that are now accessible by Greenplum with an External Table.
A Writable External Table reverses this and allows you to write data to an external system such as HDFS or a POSIX filesystem.
[gpadmin@master] gpfdist -p 8999 -d /home/gpadmin/ &
[gpadmin@master] psql
gpadmin=# create writable external table public.wrt_example
(foo text)
location ('gpfdist://master:8999/foo.txt')
format 'text' (delimiter '|' null as '');
gpadmin=# insert into public.wrt_example select 'jon';
gpadmin=# \q
[gpadmin@master] cat /home/gpadmin/foo.txt
jon
The use case for a Writable External Table is to take data from Greenplum and put it somewhere else.
A common example is when you use HDFS as a Data Lake and Greenplum for analytics. You read data out of HDFS with a Readable External Table and INSERT it into Greenplum tables. You then analyze this data, use analytical functions from the MADlib package, and find new insights about your data. Now you want to push this new data from Greenplum back to HDFS so that other consumers can benefit from the insights, so you use a Writable External Table to INSERT the data from Greenplum into HDFS.
The most common use case for a Readable External Table is loading files from external sources into Greenplum.
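For illustration, here is a rough sketch of that HDFS round trip. It assumes the PXF protocol is configured for HDFS access; the paths, schemas, tables, and columns are made up for the example.
-- Read raw events from HDFS into Greenplum (hypothetical paths and columns).
create external table public.ext_events (event_id int, payload text)
location ('pxf://data/lake/events?PROFILE=hdfs:text')
format 'text' (delimiter '|' null as '');

insert into analytics.events select * from public.ext_events;
-- ... analyze in Greenplum, e.g. with MADlib functions ...

-- Push the derived insights back to HDFS for other consumers.
create writable external table public.wrt_insights (event_id int, score float8)
location ('pxf://data/lake/insights?PROFILE=hdfs:text')
format 'text' (delimiter '|' null as '');

insert into public.wrt_insights select event_id, score from analytics.insights;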

Related

Hadoop/Hive : Loading data from .csv on a remote machine

I have a CSV file that comes from an http URL. Is there any way I can load it from there?
This is what I am trying:
LOAD DATA INPATH 'http://192.168.56.101:8081/TeamHalf.csv' OVERWRITE INTO TABLE csvdata;
The Hive LOAD command is as follows:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
1) If LOCAL is specified, it loads from a local filesystem filepath.
2) If LOCAL is not specified, it loads from an HDFS filepath only, i.e.:
filepath must refer to files within the same filesystem as the table's (or partition's) location
So loading from a remote http: path won't work; refer to the Hive DML documentation. The workaround is staging: fetch the data from the remote http: path to the local filesystem or HDFS, then load it into the Hive warehouse.
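A minimal staging sketch, assuming wget and the Hive CLI are available on the machine (the temporary and staging paths are made up):
# Pull the CSV from the remote HTTP server to the local filesystem.
wget -O /tmp/TeamHalf.csv http://192.168.56.101:8081/TeamHalf.csv

# Either load it straight from the local filesystem ...
hive -e "LOAD DATA LOCAL INPATH '/tmp/TeamHalf.csv' OVERWRITE INTO TABLE csvdata;"

# ... or stage it in HDFS first and load from there.
hdfs dfs -put /tmp/TeamHalf.csv /staging/TeamHalf.csv
hive -e "LOAD DATA INPATH '/staging/TeamHalf.csv' OVERWRITE INTO TABLE csvdata;"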

Update automatic attributes in Opscode Chef (serialized_object)

I had a couple of nodes in my Chef server that had a problem while bootstrapping and missed the FQDN and domain automatic attributes, due to which they were not indexed by Solr and not searchable by knife. I could not re-bootstrap these machines, but I wanted to fix this, and it took me a while to do so. Therefore I am posting this, hoping that it will save others some time.
Automatic attributes are stored by Chef in the database and are not editable by knife (see Chef Attributes Overview). They are stored in Chef's database in a column named serialized_object in the nodes table, in hex, and it is in fact a gzipped JSON string.
To obtain the JSON string:
Use a PostgreSQL client to connect to the chef PostgreSQL (you can find the credentials on the chef server in /etc/chef-server/chef-server-secrets.json)
Save the contents of serialized_object to a file, say serialized_object.hex (it should look something like '\x1f8b08000...')
Run: xxd -p -r serialized_object.hex > serialized_object.gz
Run: gunzip serialized_object.gz
Now the file serialized_object contains the attributes in JSON format, which you can edit. After editing, you can store its contents back in the Chef server as follows:
Run: gzip serialized_object
Run: xxd -p serialized_object.gz > serialized_object.hex
Now use the PostgreSQL client to write the hex data back (be sure to remove the leading backslash and x from the hex string) with the following query:
update nodes set serialized_object = decode('1f8b08000...','hex') where name = ''
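The whole round trip, condensed into a shell sketch. The database and user name opscode_chef, the node name mynode, and the connection details are assumptions; take the real values from chef-server-secrets.json.
# Export the hex blob for one node.
psql -h 127.0.0.1 -U opscode_chef -d opscode_chef \
  -Atc "select serialized_object from nodes where name = 'mynode'" > serialized_object.hex

# Hex -> gzip -> JSON (drop the leading \x that psql prints for bytea).
sed -i 's/^\\x//' serialized_object.hex
xxd -p -r serialized_object.hex > serialized_object.gz
gunzip serialized_object.gz

# ... edit serialized_object (plain JSON) ...

# JSON -> gzip -> hex, and write it back.
gzip serialized_object
xxd -p serialized_object.gz | tr -d '\n' > serialized_object_new.hex
psql -h 127.0.0.1 -U opscode_chef -d opscode_chef \
  -c "update nodes set serialized_object = decode('$(cat serialized_object_new.hex)','hex') where name = 'mynode'"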
Hope this helps someone :)

MySQL : Load data infile

I am getting an error when using LOAD DATA to load a file.
"load data infile '/home/bharathi/out.txt' into table Summary"
This file is there in the location, but MySQL throws the error below.
ERROR 29 (HY000): File '/home/bharathi/out.txt' not found (Errcode: 13)
show variables like 'data%';
+---------------+-----------------+
| Variable_name | Value           |
+---------------+-----------------+
| datadir       | /var/lib/mysql/ |
+---------------+-----------------+
The data dir is pointing to a root-permissioned folder, and I can't change this variable because it's read-only.
How can I make the LOAD DATA INFILE operation work?
I tried changing file permissions and LOAD DATA LOCAL INFILE; neither works.
As documented under LOAD DATA INFILE Syntax:
For security reasons, when reading text files located on the server, the files must either reside in the database directory or be readable by all. Also, to use LOAD DATA INFILE on server files, you must have the FILE privilege. See Section 6.2.1, “Privileges Provided by MySQL”. For non-LOCAL load operations, if the secure_file_priv system variable is set to a nonempty directory name, the file to be loaded must be located in that directory.
You should therefore either:
Ensure that your MySQL user has the FILE privilege and, assuming that the secure_file_priv system variable is not set:
make the file readable by all; or
move the file into the database directory.
Or else, use the LOCAL keyword to have the file read by your client and transmitted to the server. However, note that:
LOCAL works only if your server and your client both have been configured to permit it. For example, if mysqld was started with --local-infile=0, LOCAL does not work. See Section 6.1.6, “Security Issues with LOAD DATA LOCAL”.
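For example, a sketch of the LOCAL route, assuming the server permits local_infile (the database name mydb is made up):
# Enable LOCAL on the client side and send the file from the client machine.
mysql --local-infile=1 -u user -p mydb \
  -e "LOAD DATA LOCAL INFILE '/home/bharathi/out.txt' INTO TABLE Summary;"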
The solution that actually worked for me was:
sudo chown mysql:mysql /path/to/the/file/to/be/read.csv
Adding it for future reference.

importing mysql structures only or restore ignoring non-existing tables and columns?

I have 2 servers (1 for development, 1 for service).
I keep ADDing/DELETing columns and CREATEing/DELETing indexes on my development server, so these 2 servers have similar but different MySQL data structures.
I know there's an option to export structures only (like --no-data).
Is there a way (except 3rd-party software like mysqldiff.org) to import the structure only into a database that already has data?
Alternatively, is there a way to import only data, ignoring non-existing tables and columns?
(I thought this might do the trick if I back up the data -> import the structure -> restore the data.)
Thanks in advance.
I had exactly the same problem. I solved it like this:
Like you already mentioned:
back up data -> import structure -> restore the data, as follows:
#DEVELOPMENT (export structure only)
mysqldump -u user -p -d test > structure-only.sql

#DEPLOYMENT (export data only)
mysqldump -u user -p -t -c test > data-only.sql

#DEPLOYMENT (import structure only)
mysql -u user -p test < structure-only.sql

#DEPLOYMENT (import data only)
mysql -u user -p test < data-only.sql
The trick that made it work for me was the -c switch on the second export (this way full column names are used, and there are no 'column count' errors when you import it back in the new structure).
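For reference, the two exports again with the long-form switches spelled out (-d is --no-data, -t is --no-create-info, -c is --complete-insert):
# Export structure only from development.
mysqldump -u user -p --no-data test > structure-only.sql
# Export data only from deployment, with full column names in each INSERT.
mysqldump -u user -p --no-create-info --complete-insert test > data-only.sql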

.db file and MySQL

I am having real issues with a .db file. It's around 20 GB in size, with three tables and the rest data.
I am on a Mac, so I am having to use some crappy apps, and it won't open in Access.
Does anyone know what software produces a .db file, and what software will allow me to open it and export it as a CSV or MySQL file?
Also, if the connection was interrupted during transit, could this affect the file?
Since macOS is BSD-based, try opening a terminal and executing the command file /path/to/large/db -- it should tell you at least what file type the DB is, and from there you can determine what program to use to open it. It might be MySQL, might be PostgreSQL, might be SQLite -- file will tell you.
Example:
$ file a.db
a.db: SQLite 3.x database
$ file ~/.kde/share/apps/amarok/mysqle/amarok/tracks.{frm,MYD,MYI}
~/.kde/share/apps/amarok/mysqle/amarok/tracks.frm: MySQL table definition file Version 10
~/.kde/share/apps/amarok/mysqle/amarok/tracks.MYD: data
~/.kde/share/apps/amarok/mysqle/amarok/tracks.MYI: MySQL MISAM compressed data file Version 1
So it's SQLite v3? Then try
sqlite3 /path/to/db
and you can perform pretty much standard SQL from the CLI. At the CLI, you can type .tables to list all the tables in that DB. -- Or if you prefer a GUI, there are a few options listed in this question. Accepted answer was SQLite manager for Firefox.
Then you could drop tables or delete as you see fit.
Here's an example of dumping a csv to stdout:
$ sqlite3 -separator ',' -list a.db "SELECT * FROM t"
3,4
3,5
100,200
And to store it to a file -- the > operator redirects output to a file you name:
$ sqlite3 -separator ',' -list a.db "SELECT * FROM t" > a.csv
$ cat a.csv # puts the contents of a.csv on stdout
3,4
3,5
100,200
-separator ',' indicates that fields should be delimited by a comma; -list means to put row data on the same line, using the delimiter; a.db indicates which db to use; and "SELECT * FROM t" is just the SQL command to execute.
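An equivalent route from inside the sqlite3 shell, using its built-in dot commands: .mode csv switches output to comma-separated values and .output redirects it to a file (the table name t is just the placeholder from the example above).
$ sqlite3 a.db
sqlite> .mode csv
sqlite> .output a.csv
sqlite> SELECT * FROM t;
sqlite> .quit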
I'm not a Mac user but if it's a SQLite file I've heard great things about Base.