I'm running a Cygnus instance in a FI-LAB VM, compiled from the release/0.6.0 branch. Everything works except that the Hive extended tables for context updates are not created. Cygnus is successfully receiving updates from Orion, they are forwarded to Cosmos, and the HDFS files are created.
The Cygnus log says Creating Hive external table=... and does not raise any error. If I log into the Cosmos head node I can see the HDFS files, but when I open the Hive console the tables are not there.
If I type SHOW TABLES; in the Hive console, I can see some tables ending with _row and _column, so I guess table creation is working for other users.
Any hint? Should I use another version of Cygnus?
Finally I found the problem. Flume ships with libthrift 0.7, but Cygnus needs 0.9. The solution is in the README (at the end of the section "Installing Cygnus and its dependencies (from sources)"), but I had skipped it: you have to manually overwrite the jar file in the Flume binary distribution with the 0.9 version.
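For anyone hitting the same thing, the manual step looks roughly like this; the Flume install path, the jar file names, and the location of the 0.9.1 jar (here taken from the local Maven repository used to build Cygnus) are assumptions from my own setup, so adjust them:

# remove the old Thrift jar shipped with the Flume binary distribution (path is an example)
rm /usr/local/apache-flume-1.4.0-bin/lib/libthrift-0.7.0.jar
# copy in the 0.9.x jar Cygnus was compiled against (here taken from the local Maven repo)
cp ~/.m2/repository/org/apache/thrift/libthrift/0.9.1/libthrift-0.9.1.jar /usr/local/apache-flume-1.4.0-bin/lib/

After replacing the jar, restart Cygnus so Flume picks up the new Thrift classes.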
I can't seem to get Flask to migrate my models. I'm following along with the Flask Mega-Tutorial series, and at the database setup part I'm just trying to substitute MySQL for the SQLite database used in the tutorial.
I followed SQLAlchemy's instructions for connecting to a MySQL database, and I've included mysqlclient in my Pipfile.
But when I run:
flask db init
flask db migrate
I get the following:
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
ERROR [root] Error: Can't locate revision identified by '7bb962b87f19'
I've tried deleting the migrations folder, and I've deleted the environment and recreated it to see if that would fix anything, but it keeps saying it can't locate the revision. I thought maybe it had done something to the database; I've read other solutions saying you have to flush the database, but there aren't any tables or schemas in the database at all.
Other info:
DATABASE_URL="mysql://root:mypassword#localhost/flask_tutorial"
The SQLite database runs fine and makes the migrations, so it must be something I've done with the MySQL setup.
Any suggestions or ideas that may lead me in the right direction would be greatly appreciated.
It turned out the shell session was holding on to incorrectly exported environment variables. I had to completely restart my machine for the settings to reset; even unset did not remove the variables. Once Flask was able to pick up the correct variable settings for my configuration, everything worked as planned.
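If someone else hits this, a quick way to check what the shell is actually handing to Flask before resorting to a reboot is to print the variable from both the shell and Python (DATABASE_URL is the variable from the question):

echo $DATABASE_URL
python -c "import os; print(os.environ.get('DATABASE_URL'))"

If these disagree with what is in your .env or .flaskenv file, a stale export in the session is the likely culprit.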
Please make sure you have set SQLALCHEMY_TRACK_MODIFICATIONS = True in your configuration.
Did you install the MySQLdb library to connect to your MySQL database?
I had the same issue. I had deleted the migrations folder but hadn't done the same for the app.db file, so every time I tried to create any db migrations it failed because it was still using the old (and still existing) app.db file.
This is how my database was configured:
import os

basedir = os.path.abspath(os.path.dirname(__file__))

class Config(object):
    # Database configuration: fall back to a local SQLite file if DATABASE_URL is not set
    SQLALCHEMY_DATABASE_URI = os.environ.get('DATABASE_URL') or \
        'sqlite:///' + os.path.join(basedir, 'app.db')
    SQLALCHEMY_TRACK_MODIFICATIONS = False
Deleting the app.db file solved the error. Subsequent db migrations and upgrades worked well.
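For reference, the reset sequence looked roughly like this (assuming app.db and the migrations folder live in the project root):

rm app.db
rm -rf migrations
flask db init
flask db migrate -m "initial tables"
flask db upgrade

After that, Flask-Migrate generates a fresh initial revision instead of looking for the old one.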
I have installed CDH 5.16 on a RHEL 7 server and installed Kafka separately.
I am trying to load data from MySQL into HDFS or a Hive table in real time (a CDC approach). That is, if some data is updated or added in a MySQL table, it should be reflected in HDFS or the Hive table immediately.
The approach I have come up with: use Kafka Connect to connect to the MySQL server and push the table data to a Kafka topic, then write a consumer in Spark Streaming that reads the data from the topic and stores it in HDFS (a sketch of such a consumer is below).
One problem with this approach is that a Hive table built on top of these files would have to be refreshed periodically for the updates to be reflected.
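As a minimal sketch of that consumer, assuming the DStream API shipped with Spark on CDH 5.x and placeholder broker, topic, and HDFS path names:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # submit with the spark-streaming-kafka package on the classpath

sc = SparkContext(appName="mysql-cdc-to-hdfs")
ssc = StreamingContext(sc, 30)  # 30-second micro-batches

# Read the change records that Kafka Connect pushed into the topic (names are placeholders)
stream = KafkaUtils.createDirectStream(
    ssc, ["mysql_cdc_topic"], {"metadata.broker.list": "broker1:9092"})

# Keep only the message value (the change record itself) and append each batch to HDFS
stream.map(lambda kv: kv[1]) \
      .saveAsTextFiles("hdfs:///data/mysql_cdc/batch")

ssc.start()
ssc.awaitTermination()

Each micro-batch lands as a new directory of part files under the given prefix, which is exactly why a Hive table defined on top of them still needs to be refreshed periodically, as noted above.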
I also came to know of the Kafka-Hive integration in HDP 3.1. Unfortunately I am using Hadoop 2.6.0, so I can't leverage this feature.
Is there any better way to achieve this?
I am using Hadoop 2.6.0 and CDH 5.16.1
Today when trying to start mysql service, I got this error :
"mysql: job failed to start"
I needed to work with MySQL, so I made a backup of my data directory (/var/lib/mysql) and reinstalled the server (mysql-server-5.6). Note that I couldn't use mysqldump because MySQL wouldn't start, even with innodb_force_recovery > 0.
Now MySQL starts just fine, but if I put back the old data directory, it shows the error mentioned before, so I concluded that the problem comes from corrupted data.
Now all I have is this data folder (containing ibdata1, ib_logfile* and such) and I want to restore all the data (not only the structure).
Thank you
"Old" and "new" are the same version 5.6? Try to check privilege of files, folders too.
Good luck.
I hope you tried all innodb_force_recovery values, including 6. If MySQL still doesn't start, follow the instructions in my post: https://twindb.com/recover-corrupt-mysql-database/. There is a web interface to the data recovery toolkit at https://recovery.twindb.com/; you can upload and recover the database there if you are OK with uploading the data.
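For completeness, the usual way to try those values is to add the option to the [mysqld] section of my.cnf, starting at 1 and increasing only as far as needed (values of 4 and above can be destructive, and the server becomes read-only at the higher levels), then dump whatever the server can read; this is a generic sketch, not specific to this data directory:

[mysqld]
innodb_force_recovery = 6

mysqldump --all-databases > all_databases.sql

Once you have the dump, reinitialize a clean data directory and reload it.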
I need to migrate Couchbase data into HDFS, but the database and Hadoop clusters are not accessible to each other, so I cannot use Sqoop in the recommended way. Is there a way to import Couchbase data into local files (instead of HDFS) using Sqoop? If it is possible, I can do that, transfer the local files using FTP, and then use Sqoop again to transfer them to HDFS.
If that's a bad solution, is there any other way I can transfer all the Couchbase data to local files? Creating views on this Couchbase cluster is a difficult task and I would like to avoid it.
Alternative solution (perhaps not as elegant, but it works):
Use the Couchbase backup utility, cbbackup, to save all the data locally (a sketch of the commands is shown after these steps).
Transfer the backup files to a network host from which HDFS is reachable.
Install Couchbase in the network segment where HDFS is reachable and use the Couchbase restore-from-backup procedure to populate that instance.
Use Sqoop (in the recommended way) against that Couchbase instance, which now has access to HDFS.
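The backup and restore steps use the stock Couchbase tools; roughly like this, where the hosts, credentials, and bucket name are placeholders:

cbbackup http://couchbase-host:8091 /tmp/cb-backup -u Administrator -p password -b mybucket
cbrestore /tmp/cb-backup http://new-couchbase-host:8091 -u Administrator -p password -b mybucket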
You can use the cbbackup utility that comes with the Couchbase installation to export all data to backup files. By default the backups are actually stored in SQLite format, so you can move them to your Hadoop cluster and then use any JDBC SQLite driver to import the data from each *.cbb file individually with Sqoop. I actually wrote a blog post about this a while ago; you can check it out.
To get you started, here's one of the many JDBC SQLite drivers out there.
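If you go that route, the Sqoop call would look roughly like the sketch below; the backup file path and especially the table name inside the .cbb file (cbb_msg here) are assumptions you would need to verify against your own backup, and the driver class is the one from the Xerial sqlite-jdbc driver:

sqoop import \
  --connect jdbc:sqlite:/backups/bucket-data.cbb \
  --driver org.sqlite.JDBC \
  --table cbb_msg \
  --target-dir /user/hive/couchbase_import \
  -m 1

The -m 1 keeps the import to a single mapper, which is the safest choice for a local SQLite file.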
You can use the Couchbase Kafka adapter to stream data from Couchbase to Kafka, and from Kafka you can store the data in any file system you like. The CouchbaseKafka adapter uses the TAP protocol to push data to Kafka.
https://github.com/paypal/couchbasekafka
With a Cloudera install of HBase, I see configuration information in three places:
/etc/hbase/conf/hbase-site.xml,
/usr/lib/hbase/conf/hbase-site.xml,
and /var/run/cloudera-scm-agent/process/*-hbase-MASTER
Which one exactly is in effect? Or maybe all of them do?
In all cases HBase always reads the /etc/hbase/conf/hbase-site.xml file. /usr/lib/hbase/conf/hbase-site.xml is a symlink to /etc/hbase/conf/hbase-site.xml, so it is the same file.
Lastly, anything in /var/run/ is runtime state, and in your case it belongs to the Cloudera Manager Agent. The Manager Agents are responsible for the management console and logging, amongst other tasks.
I hope that helps,
Pat
The config file used is /usr/lib/hbase/conf/hbase-site.xml. The other files aren't symbolic links.
Since the same configuration information needs to be used by other processes such as the HMaster and RegionServer, the /usr/lib/hbase/conf/hbase-site.xml file is synced to different locations when these daemons are initialized. Hence it is advised to make any configuration changes in /usr/lib/hbase/conf/hbase-site.xml only.
You also need to make the same changes to this file on all nodes in your cluster and restart the HBase daemons.
I hope this answers your question.
Per my search and learning, HBase actually has two types of hbase-site.xml files: one for the HMaster/RegionServer, and the other for clients.
In Cloudera's distribution, the hbase-site.xml file in the folder /var/run/cloudera-scm-agent/process/*-hbase-MASTER is the config used by the running HMaster process; the same goes for the RegionServer.
The site.xml file under /usr/lib/hbase/conf/ and /etc/hbase/conf/, symlinked from one to the other (according to @apesa), is for client usage. If one starts the HBase shell on an HMaster host or a RegionServer host, this client config file is used so the shell knows how to connect to the ZooKeeper quorum and reach the running HBase service. If one wants to use the HBase service from a client host, this client xml file needs to be copied to that host.
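For illustration, a client-side hbase-site.xml often only needs to tell the shell where ZooKeeper is; a minimal sketch with placeholder hostnames would be:

<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>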
For a regular Apache installation of HBase, as indicated in Sachin's answer, the same hbase-site.xml is used for both purposes, though the HMaster, RegionServer, and client processes each use only the options they need and ignore the rest.
From experimenting with the hbase binary version 1.2.0-cdh5.16.1, it appears to use the Java classpath to find the hbase-site.xml file to use, whether running as a server or a client shell. There is a configuration parameter (--config) you can pass to hbase to control the config directory used, which by default is ./conf (run hbase to view the documented help on this).
This observation is supported by other answers on this topic (e.g. Question 14327367).
Therefore, to answer your specific question: to determine which config file is used on your machine, run hbase classpath and see which of the three directories appears first in the classpath.
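For example, something along these lines prints the first conf directory on the classpath (standard Unix tools plus the hbase classpath subcommand):

hbase classpath | tr ':' '\n' | grep conf | head -1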