ESG Data Node Deployment Experience at NERSC

This document explains our experience in ESG data node installation at NERSC.

Please note that there is an installation script provided at the distribution site in order to automate the process (visit http://rainbow.llnl.gov/dist). The esg-node script prepares environment variables, downloads required software, installs and configures all necessary components, and also checks for updates automatically. However, it requires root privileges, and some of the installation directories and path information are embedded inside the script.

In order to complete the installation without root privileges, we have edited the installation scripts (see esg-node-nersc and esg-globus-nersc). Please take a look at those; you will need to change the installation directory, log files, user passwords, script directory, etc. In addition, you might need to manually start/stop some services such as PostgreSQL. Here is the diff output for those modified scripts:

A Step-by-Step Installation Guide

The following is a step-by-step guide to install ESG Data node components based on the information given in the installation script (version 0.2.9).

Relevant info

Prerequisites:

Here is a list of components you need to have before starting the installation. You also need an account on the ESG gateway portal (from the associated gateway, e.g. http://pcmdi3.llnl.gov/esgcet) with the publishing role. You will need this to authenticate with the MyProxy client.

Note: Make sure your account has been added as a data publisher!

We assume that you already have necessary components installed (PostgreSQL, Apache Ant, JAVA, git, and curl), and PATH and LD_LIBRARY_PATH are set properly. If not, see NecessaryComponents first.
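
Before starting, it can help to confirm that the prerequisite tools are reachable on PATH. The following is a minimal sketch, not part of the ESG installation scripts; the helper name check_prereqs and the command list in the usage comment (psql, ant, java, git, curl) are assumptions based on the components named above.

```shell
# check_prereqs CMD...: print every command in the argument list that
# cannot be found on PATH. Returns 0 if all were found, 1 otherwise.
# (Hypothetical helper for illustration only.)
check_prereqs() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            missing=1
        fi
    done
    return $missing
}

# Example usage with the tools this guide assumes are installed:
# check_prereqs psql ant java git curl
```

If anything is reported as missing, install it first (see NecessaryComponents) or fix PATH before continuing.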

Determine an installation directory and set an environment variable for it, INSTALL_HOME (this will be referred to in the next steps).

Initializing PostgreSQL database

Create data and log directories for PostgreSQL, and don't forget to set proper ownership/permissions for those directories.

export PGDATA=$INSTALL_HOME/pgsql/data
mkdir -p $PGDATA
mkdir -p $INSTALL_HOME/pgsql/log
chmod 700 $PGDATA

Initialize the database by running initdb:

initdb -D $PGDATA

By default, "trust" authentication is enabled for local connections, which means any local user can connect to the database. Change this by editing "pg_hba.conf", or pass the -A option to initdb. "md5" is recommended, since the password is then sent as an MD5 hash rather than in clear text.

Note: I recommend changing it to "md5" only after you run esgsetup --db (after the CDAT installation); I had a problem when esgsetup was connecting to the database.
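
The switch from "trust" to "md5" can be sketched as a small shell helper. This is only an illustration under assumptions: the function name set_local_auth_md5 is hypothetical, and it assumes the authentication method is the last field on each active line of pg_hba.conf, as in the file generated by initdb.

```shell
# set_local_auth_md5 FILE: replace a trailing "trust" method with "md5"
# on every non-comment line of FILE. The original is kept as FILE.bak.
# (Hypothetical helper; review the result before reloading PostgreSQL.)
set_local_auth_md5() {
    hba_file=$1
    cp "$hba_file" "$hba_file.bak" || return 1
    # Skip comment lines; rewrite only a method at the end of the line.
    sed -e '/^[[:space:]]*#/!s/trust[[:space:]]*$/md5/' \
        "$hba_file.bak" > "$hba_file"
}

# Example usage:
# set_local_auth_md5 "$PGDATA/pg_hba.conf"
# pg_ctl -D "$PGDATA" reload
```

The reload step makes the running server re-read pg_hba.conf without a full restart.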

Start database:

pg_ctl -D $PGDATA start

Set environment variables:

export PGUSER=dbsuper
export PGPORT=5432
export PGHOST=localhost

Then create a database user ("dbsuper") and set a password - this will be needed while setting up ESGCET later.

createuser -P -s -e dbsuper

Edit "$PGDATA/postgresql.conf" to change the port number (default is 5432) and other parameters such as logging options (log directory and log filename).

Verify your installation by running:

psql -U dbsuper postgres

Installing CDAT (Python + CDMS)

Set an installation directory for CDAT (Climate Data Analysis Tool):

export CDAT_HOME=$INSTALL_HOME/cdat

We will be using version eb8b668. Download the package and compile it...

git clone http://esg-repo.llnl.gov/git/cdat.git
cd cdat
git checkout eb8b668

If you have Python already installed, specify its path. Note that Python must have tk/tcl support, so install Tkinter first.

./configure --prefix=$CDAT_HOME --with-python=/usr/bin/python --enable-esg
make

Alternatively, if you don't give the Python path, the CDAT installer will download and install Python itself.

./configure --prefix=$CDAT_HOME --enable-esg
make

Note: On Ubuntu, first install the Tkinter packages and then specify the path for Python while running "configure". The Python installed by CDAT (the default) did not work for us; it reported missing tk/tcl support in Python.

Also update path information (make sure cdat/bin is before /usr/bin in your path):

export PATH=$CDAT_HOME/bin:$CDAT_HOME/Externals/bin:$PATH
export LD_LIBRARY_PATH=$CDAT_HOME/lib:$CDAT_HOME/Externals/lib:$LD_LIBRARY_PATH
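
To confirm that cdat/bin really comes before /usr/bin, the PATH order can be checked with a short function. This is a sketch only; the name path_precedes is hypothetical and the comparison is purely textual (no symlink resolution).

```shell
# path_precedes FIRST SECOND [PATHSTRING]: succeed (return 0) if directory
# FIRST appears before directory SECOND in PATHSTRING (default: $PATH).
# Returns 1 if SECOND comes first or FIRST is not present at all.
path_precedes() {
    first=$1
    second=$2
    path_value=${3:-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $path_value; do
        if [ "$dir" = "$first" ]; then IFS=$old_ifs; return 0; fi
        if [ "$dir" = "$second" ]; then IFS=$old_ifs; return 1; fi
    done
    IFS=$old_ifs
    return 1
}

# Example usage:
# path_precedes "$CDAT_HOME/bin" /usr/bin || echo "cdat/bin is not first"
```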

Installing ESGCET (esgcet-2.4-py2.6.egg)

Download the ESGCET package (it provides the scripts and packages required for publishing):

wget http://rainbow.llnl.gov/dist/externals/esgcet-2.4-py2.6.egg
chmod 755 esgcet-2.4-py2.6.egg
easy_install esgcet-2.4-py2.6.egg

Complete the setup by giving an organization ID (rootid in the following):

$CDAT_HOME/bin/esgsetup --config --rootid nersc

$HOME/.esgcet/esg.ini will be created and initial configuration will be saved in esg.ini file.

Before proceeding further, please make sure that PostgreSQL is up and running. Run esgsetup to create the database entries. It will ask for the database admin user (dbsuper), and will create the esgcet database with owner esgcet (you also need to set a password for the esgcet database user).

$CDAT_HOME/bin/esgsetup --db

Update the environment by adding the organization ID as follows (advised):

export ESG_ROOT_ID=nersc

Note that you might need to edit ~/.esgcet/esg.ini to set the password for esgcet database user.

There is already a "test" project defined in esg.ini

Download a sample data file, then scan this sample dataset and publish it (ESG_ROOT_ID is nersc). Note that the sample file should be inside the Thredds root catalog directory! Also, you need to specify the full path of the directory while running esgscan_directory.

mkdir -p $INSTALL_HOME/data/testdir
cd $INSTALL_HOME/data/testdir
wget http://rainbow.llnl.gov/dist/externals/sftlf.nc
cd ..
esgscan_directory --dataset pcmdi.nersc.test --project test $INSTALL_HOME/data/testdir > scan.out
esgpublish --map scan.out --project test
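
Since the scan needs a full path located under the Thredds root, a quick pre-check along these lines may save a failed run. This is a minimal sketch; the function name is_under_root is hypothetical, and the check is textual only (it does not resolve symlinks or ".." components).

```shell
# is_under_root DIR ROOT: return 0 if the absolute path DIR lies inside
# ROOT, 1 if it does not, and 2 if DIR is not an absolute path.
is_under_root() {
    case $1 in
        /*) ;;
        *) echo "not an absolute path: $1" >&2; return 2 ;;
    esac
    # Append a slash so that /a/database is not mistaken for /a/data.
    case $1/ in
        "$2"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage:
# is_under_root "$INSTALL_HOME/data/testdir" "$INSTALL_HOME/data" \
#     || echo "testdir is outside the Thredds root"
```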

Tomcat Installation

Download and install tomcat:

wget http://download.filehat.com/apache/tomcat/tomcat-6/v6.0.26/bin/apache-tomcat-6.0.26.tar.gz
tar xvf apache-tomcat-6.0.26.tar.gz -C $INSTALL_HOME
cd $INSTALL_HOME
ln -s apache-tomcat-6.0.26 tomcat

Set the TOMCAT_HOME environment variable and build jsvc from the bundled source:

export TOMCAT_HOME=$INSTALL_HOME/tomcat
cd $TOMCAT_HOME/bin
tar xvf jsvc.tar.gz
cd jsvc-src
autoconf
chmod 755 configure; ./configure --with-java=$JAVA_HOME
make
cp jsvc $TOMCAT_HOME/bin

Next step is to configure tomcat by editing $TOMCAT_HOME/conf/server.xml.

Make sure server.xml has appropriate permissions (chmod 600 server.xml). By default, port 8080 and 8443 will be used. If you want to change and use 80 and 443 instead, edit Connector port numbers in server.xml.

You may want to look at Tomcat documentation for Servlet/JSP and SSL configuration.

Now, we set up the keystore:

$JAVA_HOME/bin/keytool -genkey -alias tomcat -keyalg RSA -keystore $TOMCAT_HOME/conf/keystore-tomcat -validity 365

It will ask for the keystore password and the key password for tomcat (default is "changeit").

Go to conf directory and download truststore file:

cd $TOMCAT_HOME/conf
wget http://rainbow.llnl.gov/dist/externals/jssecacerts

Open server.xml and edit the paths for the keystore and truststore. You can search for keystore_file and truststore_file in this preconfigured sample server.xml file.

It is beneficial to create a "tomcat" user and start tomcat with this user's privileges. In that case, change the ownership of the tomcat directory (chown -R tomcat $TOMCAT_HOME).

It is useful to set the CATALINA_HOME environment variable. You can start/stop tomcat using the catalina.sh script.

export CATALINA_HOME=$TOMCAT_HOME
$CATALINA_HOME/bin/catalina.sh stop
$CATALINA_HOME/bin/catalina.sh start

In order to use jsvc, start tomcat (make sure JAVA_HOME is set) by running the following command (preparing a startup script will be helpful):

cd $TOMCAT_HOME
./bin/jsvc -Djava.endorsed.dirs=./endorsed -pidfile /tmp/tomcat-jsvc.pid \
-cp $TOMCAT_HOME/bin/bootstrap.jar:$TOMCAT_HOME/bin/tomcat-juli.jar:$TOMCAT_HOME/bin/commons-daemon.jar \
-outfile ./logs/catalina.out -errfile ./logs/catalina.err -Xmx2048m -Xms1024m \
-Dsun.security.ssl.allowUnsafeRenegotiation=true org.apache.catalina.startup.Bootstrap

Stop tomcat by running the following (jsvc):

cd $TOMCAT_HOME
./bin/jsvc -pidfile /tmp/tomcat-jsvc.pid -stop org.apache.catalina.startup.Bootstrap
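
A startup script wrapping the two jsvc commands above might look like the following sketch. The function name tomcat_jsvc and the DRY_RUN switch are assumptions added for illustration; the jsvc options are the ones used above, with the relative paths written out under $TOMCAT_HOME.

```shell
# tomcat_jsvc start|stop: start or stop Tomcat through jsvc.
# Set DRY_RUN=1 to print the command instead of running it.
# (Hypothetical wrapper; assumes TOMCAT_HOME and JAVA_HOME are set.)
tomcat_jsvc() {
    action=$1
    pidfile=/tmp/tomcat-jsvc.pid
    main_class=org.apache.catalina.startup.Bootstrap
    case $action in
        start)
            set -- "$TOMCAT_HOME/bin/jsvc" \
                -Djava.endorsed.dirs="$TOMCAT_HOME/endorsed" \
                -pidfile "$pidfile" \
                -cp "$TOMCAT_HOME/bin/bootstrap.jar:$TOMCAT_HOME/bin/tomcat-juli.jar:$TOMCAT_HOME/bin/commons-daemon.jar" \
                -outfile "$TOMCAT_HOME/logs/catalina.out" \
                -errfile "$TOMCAT_HOME/logs/catalina.err" \
                -Xmx2048m -Xms1024m \
                -Dsun.security.ssl.allowUnsafeRenegotiation=true \
                "$main_class" ;;
        stop)
            set -- "$TOMCAT_HOME/bin/jsvc" -pidfile "$pidfile" \
                -stop "$main_class" ;;
        *)
            echo "usage: tomcat_jsvc start|stop" >&2
            return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        (cd "$TOMCAT_HOME" && "$@")
    fi
}

# Example usage:
# DRY_RUN=1 tomcat_jsvc start    # show the command only
# tomcat_jsvc stop
```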

Thredds data server (v 4.1.6)

Download Thredds war file and put into the tomcat "webapps" directory:

cd $TOMCAT_HOME/webapps
wget http://rainbow.llnl.gov/dist/thredds/4.1.6/thredds.war

Restart tomcat (after restart the war file will be extracted under webapps directory)

Edit $TOMCAT_HOME/conf/tomcat-users.xml. Search for the user entries in tomcat-users.xml and add a user (dnode_user) with administrative privileges. The entry should look like:

<tomcat-users>
<role rolename="tdsConfig"/>
<role rolename="manager"/>
<role rolename="tdrAdmin"/>
<user username="dnode_user" password="digest_password_here" roles="tdrAdmin,tdsConfig"/>
</tomcat-users>

To obtain the password digest, generate a password hash by running:

$TOMCAT_HOME/bin/digest.sh -a SHA <password for dnode_user>

Use this password hash in the user line, shown below, that you add to the tomcat-users.xml file. Then restart tomcat.

<user username="dnode_user" password="<PASSWORD_HASH_HERE>" roles="tdrAdmin,tdsConfig"/>
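
As an illustration, the user line can also be generated in one step. This sketch rests on an assumption: that "digest.sh -a SHA" yields the hexadecimal SHA-1 digest of the password, which the coreutils sha1sum tool also computes. The function name make_tomcat_user_entry is hypothetical; verify the result against the digest.sh output before relying on it.

```shell
# make_tomcat_user_entry USER PASSWORD: print a <user> element whose
# password attribute is the hex SHA-1 digest of PASSWORD.
# (Assumption: this matches what Tomcat's digest.sh -a SHA produces.)
# Note: a password given as an argument may be visible in the process list.
make_tomcat_user_entry() {
    user=$1
    hash=$(printf '%s' "$2" | sha1sum | cut -d' ' -f1)
    printf '<user username="%s" password="%s" roles="tdrAdmin,tdsConfig"/>\n' \
        "$user" "$hash"
}

# Example usage:
# make_tomcat_user_entry dnode_user 'my-secret-password'
```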

Configure tomcat for digest authentication. Create the directory $TOMCAT_HOME/conf/Catalina/localhost if it does not exist. Add or edit the thredds.xml file in $TOMCAT_HOME/conf/Catalina/localhost. It should look like:

<?xml version="1.0" encoding="UTF-8"?>
<Context path="/thredds">
<Realm className="org.apache.catalina.realm.MemoryRealm" digest="SHA" />
</Context>

A sample web.xml file is given here. Make sure SSL is enabled (this is used by the ESG publisher to re-initialize the Thredds Data Server and check logs). It should look like:

<user-data-constraint>
<transport-guarantee>CONFIDENTIAL</transport-guarantee>
</user-data-constraint>

Note: esg.ini is important. Make sure "thredds_url" "thredds_reinit_error_url" and "thredds_reinit_url" are correct (they should point to full host name - not localhost)
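
A quick way to spot a leftover localhost in those three settings is sketched below. The function name find_localhost_urls is hypothetical, and the sketch assumes each option is written on one line as "name = value" in esg.ini.

```shell
# find_localhost_urls INI_FILE: print the thredds_url, thredds_reinit_url
# and thredds_reinit_error_url lines that still point at localhost.
# Returns 0 if at least one such line was found, non-zero otherwise.
find_localhost_urls() {
    grep -E '^[[:space:]]*thredds_(url|reinit_url|reinit_error_url)[[:space:]]*=' "$1" \
        | grep -E 'localhost|127\.0\.0\.1'
}

# Example usage:
# if find_localhost_urls "$HOME/.esgcet/esg.ini"; then
#     echo "replace localhost with the full host name in the lines above"
# fi
```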

Security-Token-Filters and Certificate From Gateway

Here, we are using gateway node ESG-PCMDI(pcmdi3.llnl.gov/esgcet) as myProxy end-point (default myProxy port 2119).

Download necessary classes into a temporary directory:

cd $TOMCAT_HOME/temp
wget http://rainbow.llnl.gov/dist/utils/InstallCert.class
wget 'http://rainbow.llnl.gov/dist/utils/InstallCert$SavingTrustManager.class'

End-point is pcmdi3.llnl.gov, SSL port is 443, and default password for SSL end point is "changeit"

cd $TOMCAT_HOME/conf
cp jssecacerts jssecacerts.bak
$JAVA_HOME/bin/java -classpath .:$TOMCAT_HOME/temp InstallCert  pcmdi3.llnl.gov:443 <password>

This will add the certificate to the keystore "jssecacerts". Change the owner and permissions of that file (chmod 644 jssecacerts; chown tomcat jssecacerts).

Note: The installation script also copies jssecacerts into $JAVA_HOME/jre/lib/security, but this is probably not necessary, since its path has already been specified in server.xml. To do the same:

cp -p $TOMCAT_HOME/conf/jssecacerts $JAVA_HOME/jre/lib/security

Add the following to the environment (optional):

export ESG_GATEWAY_NAME=ESG-PCMDI
export ESG_GATEWAY_SVC_ROOT=pcmdi3.llnl.gov/esgcet
export MYPROXY_SERVER=pcmdi3.llnl.gov

Download ESG token validator filters

cd $TOMCAT_HOME/webapps/thredds/WEB-INF/lib
wget http://rainbow.llnl.gov/dist/filters/eske.jar
wget http://rainbow.llnl.gov/dist/filters/hessian-3.0.20.jar

Now, you need to edit $TOMCAT_HOME/webapps/thredds/WEB-INF/web.xml and add the ESG security token filter and servlet entries.

A sample web.xml file is given here.

More information about ESG token validation filter can be found at ESG data node documentation

Restart Tomcat. Make sure PostgreSQL is running.

$CDAT_HOME/bin/esgsetup --thredds --publish --gateway pcmdi3.llnl.gov

In this step, you need to specify the Thredds content directory and the ESG data path root directory. If they don't exist, create the root directory (and the replica directory).

mkdir $INSTALL_HOME/data
mkdir $INSTALL_HOME/data.replica

You may also need to edit esg.ini file and change the path for content directory. It should look like (give full path)

thredds_dataset_roots = esg_dataroot | /project/projectdirs/esg/datanode/data

Make sure thredds_username and thredds_password are set correctly.

Verify that everything is configured properly (don't forget to restart Tomcat) by creating the Thredds catalog for the dataset we scanned before (ESG_ROOT_ID is nersc).

esgpublish --use-existing pcmdi.nersc.test --noscan --thredds

This step might take some time. It will reinitialize the Thredds Data Server, so make sure the URLs are set correctly in ~/.esgcet/esg.ini.

Node Manager (0.0.2)

cd $TOMCAT_HOME/temp/
wget http://rainbow.llnl.gov/dist/esg-node/esg-node.0.0.2.tar.gz
tar xzf esg-node.0.0.2.tar.gz
cd esg-node.0.0.2

Go to Tomcat webapp directory, and replace tokens in node.properties

mkdir -p $TOMCAT_HOME/webapps/esg-node
cd $TOMCAT_HOME/webapps/esg-node
jar xvf $TOMCAT_HOME/temp/esg-node.0.0.2/esg-node.war
cd WEB-INF/classes

Edit node.properties (in webapps/esg-node/WEB-INF/classes) and change the following options:

db.driver -> org.postgresql.Driver
db.protocol -> jdbc:postgresql
db.host -> localhost
db.port -> 5432
db.database -> esgcet
db.user -> dbsuper
db.password -> <dbsuper password>
mail.smtp.host -> <mail.admin.address>
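
These edits can also be scripted. The sketch below assumes node.properties uses plain "key=value" lines with no spaces around the key, which is an assumption about the file's layout; the function name set_property is hypothetical.

```shell
# set_property FILE KEY VALUE: set KEY to VALUE in a key=value properties
# file, replacing an existing line for KEY or appending one if missing.
# (Hypothetical helper; backslashes in VALUE are interpreted by awk.)
set_property() {
    file=$1
    key=$2
    value=$3
    awk -v k="$key" -v v="$value" '
        BEGIN { FS = "="; done = 0 }
        $1 == k { print k "=" v; done = 1; next }
        { print }
        END { if (!done) print k "=" v }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example usage (run in webapps/esg-node/WEB-INF/classes):
# set_property node.properties db.host localhost
# set_property node.properties db.port 5432
# set_property node.properties db.database esgcet
```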

Create esgcet database if not created yet

createdb esgcet

Configure PostgreSQL by running:

cd $TOMCAT_HOME/temp/esg-node.0.0.2/db
ant -buildfile database-tasks.ant.xml \
-Dnode.property.file=$TOMCAT_HOME/webapps/esg-node/WEB-INF/classes/node.properties \
-Dsql.jdbc.base.url=jdbc:postgresql://localhost:5432/ -Dsql.jdbc.database.name=esgcet \
-Dsql.jdbc.database.user=dbsuper -Dsql.jdbc.database.password=<dbsuper_password> \
-Dsql.jdbc.driver.jar=$TOMCAT_HOME/webapps/esg-node/WEB-INF/lib/postgresql-8.3-603.jdbc3.jar \
make_node_db

Restart Tomcat.

Globus Installation

See esg-globus-nersc.

Set Environmental Variables

Set the installation directory (INSTALL_HOME) and create an environment file "$INSTALL_HOME/env.sh" that can be sourced to set up the environment.

cat > $INSTALL_HOME/env.sh << EOF
export CDAT_HOME=$INSTALL_HOME/cdat
export TOMCAT_HOME=$INSTALL_HOME/tomcat
export CATALINA_HOME=$TOMCAT_HOME
export GLOBUS_HOME=$INSTALL_HOME/globus

export LD_LIBRARY_PATH=$CDAT_HOME/lib:$CDAT_HOME/Externals/lib:$GLOBUS_HOME/lib:$LD_LIBRARY_PATH
export PATH=$CDAT_HOME/bin:$CDAT_HOME/Externals/bin:$TOMCAT_HOME/bin:$GLOBUS_HOME/bin:$PATH

export PGDATA=$INSTALL_HOME/pgsql/data
export PGUSER=dbsuper
export PGPORT=5432
export PGHOST=localhost

export ESG_ROOT_ID=nersc
export ESG_GATEWAY_NAME=ESG-PCMDI
export ESG_GATEWAY_SVC_ROOT=pcmdi3.llnl.gov/esgcet

export MYPROXY_SERVER=pcmdi3.llnl.gov
export X509_CERT_DIR=~/.globus/certificates

EOF
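
After sourcing the file, a short check can confirm that the expected variables are actually set. This is a sketch; the helper name check_env_vars is hypothetical, and the variable list in the usage comment simply mirrors the names defined above.

```shell
# check_env_vars NAME...: print every variable name in the argument list
# that is unset or empty. Returns 0 if all are set, 1 otherwise.
check_env_vars() {
    status=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset: $name"
            status=1
        fi
    done
    return $status
}

# Example usage:
# . "$INSTALL_HOME/env.sh"
# check_env_vars CDAT_HOME TOMCAT_HOME CATALINA_HOME GLOBUS_HOME \
#     PGDATA PGUSER PGPORT PGHOST ESG_ROOT_ID MYPROXY_SERVER
```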

Test Publication

myproxy-logon -s pcmdi3.llnl.gov -l <username_of_your_account_from gateway> -p 2119 -o ~/.globus/certificate-file -T

esglist_files pcmdi.nersc.test

esgpublish --use-existing pcmdi.nersc.test --noscan --publish

esgunpublish --skip-thredds pcmdi.nersc.test

Installing Necessary Components

Curl

export CURL_HOME=$INSTALL_HOME/curl
wget http://curl.haxx.se/download/curl-7.20.1.tar.gz
tar xvzf curl-7.20.1.tar.gz
cd curl-7.20.1
./configure --prefix=$CURL_HOME
make all
make install
$CURL_HOME/bin/curl --version
export PATH=$CURL_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CURL_HOME/lib:$LD_LIBRARY_PATH

GIT (with libcurl)

export GIT_HOME=$INSTALL_HOME/git
wget http://kernel.org/pub/software/scm/git/git-1.7.1.tar.gz
tar xvzf git-1.7.1.tar.gz
cd git-1.7.1
./configure --prefix=$GIT_HOME
make all
make install

export PATH=$GIT_HOME/bin:$PATH
export LD_LIBRARY_PATH=$GIT_HOME/lib:$LD_LIBRARY_PATH

Java

export JAVA_HOME=$INSTALL_HOME/java
wget http://rainbow.llnl.gov/dist/java/1.6.0_20/jdk1.6.0_20-32.tar.gz
tar xvfz jdk1.6.0_20-32.tar.gz -C $INSTALL_HOME
ln -s $INSTALL_HOME/jdk1.6.0_20 $JAVA_HOME
$JAVA_HOME/bin/java -version
export PATH=$JAVA_HOME/bin:$PATH

Apache ANT

export ANT_HOME=$INSTALL_HOME/ant
wget http://www.trieuvan.com/apache/ant/binaries/apache-ant-1.8.1-bin.tar.gz
tar xvfz apache-ant-1.8.1-bin.tar.gz -C $INSTALL_HOME
ln -s $INSTALL_HOME/apache-ant-1.8.1 $ANT_HOME
$ANT_HOME/bin/ant -version
export PATH=$ANT_HOME/bin:$PATH

PostgreSQL

export PGHOME=$INSTALL_HOME/pgsql
wget http://ftp9.us.postgresql.org/pub/mirrors/postgresql/source/v8.4.3/postgresql-8.4.3.tar.gz
tar xvzf postgresql-8.4.3.tar.gz
cd postgresql-8.4.3
./configure --prefix=$PGHOME --enable-thread-safety
make
make install
cd contrib/tablefunc
make
make install
export PATH=$PGHOME/bin:$PATH
export LD_LIBRARY_PATH=$PGHOME/lib:$LD_LIBRARY_PATH