2.1. Manual Installation
2.1.1. Install Conda
To use viral-ngs, you need to install the Conda package manager, which is most easily obtained via the Miniconda Python distribution. Miniconda can be installed on your system without admin privileges.
After installing Miniconda for your platform, be sure to update it:
conda update -y conda
2.1.2. Configure Conda
The viral-ngs software and its dependencies are distributed through the bioconda channel for the conda package manager. Add it and the other required channels to the conda config:
conda config --add channels broad-viral
conda config --add channels bioconda
conda config --add channels r
conda config --add channels conda-forge
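The four commands above can also be issued from a loop. The following is a minimal sketch and not part of viral-ngs: the function name add_viral_ngs_channels and its --dry-run flag are made up for this example. Because conda config --add places each new channel at the top of the channel list, the channel added last (conda-forge here) ends up with the highest priority.

```shell
#!/bin/sh
# Sketch: add the channels listed above, in the same order.
# add_viral_ngs_channels and --dry-run are hypothetical names for this example.
add_viral_ngs_channels() {
    dry_run=0
    if [ "${1:-}" = "--dry-run" ]; then
        dry_run=1
    fi
    for channel in broad-viral bioconda r conda-forge; do
        if [ "$dry_run" = "1" ]; then
            # Only print the command that would be run.
            echo "conda config --add channels $channel"
        else
            conda config --add channels "$channel"
        fi
    done
}

# Preview the commands without touching the conda configuration.
add_viral_ngs_channels --dry-run
```

Calling the function without --dry-run applies the changes; conda config --show channels then lists the resulting order.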
2.3. Virtualized Installation (Easy Deploy)
The viral-ngs package includes a script that can be used to set up a complete virtualized environment for running viral-ngs either on a local machine via VirtualBox, or on AWS EC2. This is an easier way to get the software up and running, as it sets up most dependencies automatically within an environment known to work.
As noted above, GATK and NovoAlign cannot be installed automatically due to
licensing restrictions. In order to run the easy deployment script, you will
first need to license and download these tools, and set the
NOVOALIGN_LICENSE_PATH environment variables.
The easy deployment script has been tested to run on OS X 10.11 (El Capitan) and Ubuntu 15.04 (Vivid Vervet).
2.3.2. Requirements for running on AWS EC2
In order to deploy a virtualized viral-ngs environment to AWS EC2, you will first need
to set up the appropriate credentials for creating EC2 instances. AWS credentials and
SSH keypairs are passed in as environment variables, and
run.sh will prompt for
the values if the environment variables are not set (though the values given
interactively are ephemeral).
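The prompt-when-unset behavior described above can be pictured with a small helper. This is an illustrative sketch only, not code taken from run.sh: ensure_env and EXAMPLE_SETTING are hypothetical names, and the real script may work differently. A value typed at the prompt lives only in the current shell session, which is why interactively supplied values do not persist.

```shell
#!/bin/sh
# Hypothetical helper (not from run.sh): keep a variable's value if it is
# already set, otherwise prompt for one and export it for this session only.
ensure_env() {
    name=$1
    # Read the current value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        printf 'Enter a value for %s: ' "$name" >&2
        IFS= read -r value
        eval "$name=\$value"
    fi
    export "$name"
}

# EXAMPLE_SETTING is a placeholder name; it is already set, so no prompt appears.
EXAMPLE_SETTING=demo-value
ensure_env EXAMPLE_SETTING
echo "EXAMPLE_SETTING=$EXAMPLE_SETTING"
```

To avoid being prompted on every run, the variables can instead be exported from a shell profile before the script is started.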
The following environment variables are needed:
For more information, see the following AWS pages:
- Getting set up with AWS
- How to create an AWS EC2 key pair
- Defining security group rules
- List of EC2 regions
Note that the EC2 instance created by the easy-deploy script is currently configured to be an m4.2xlarge, which costs ~$0.55/hour to run. It is suggested that the instance be terminated via the AWS web console once processing with viral-ngs is complete. See the AWS page for current pricing.
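To get a feel for what leaving the instance running costs, the hourly rate can be multiplied by the hours of use. The sketch below assumes the approximate $0.55/hour figure quoted above, which may be out of date; estimate_cost is a made-up helper name.

```shell
#!/bin/sh
# Rough cost estimate: hours of use multiplied by the hourly rate.
# The default rate of 0.55 is the approximate figure quoted in the text;
# pass a second argument to use the current price instead.
estimate_cost() {
    awk -v hours="$1" -v rate="${2:-0.55}" 'BEGIN { printf "%.2f\n", hours * rate }'
}

# One full day at the quoted rate.
estimate_cost 24    # prints 13.20
```

At that rate, an instance left running for a 30-day month would cost roughly $396, so terminating it after processing is worthwhile.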
As viral-ngs does not currently build a depletion database for BMTagger or BLAST automatically,
it is the responsibility of the user to create a depletion database for use within the virtualized
viral-ngs environment. It can be created within the virtual machine (VM), or uploaded
after the fact via
2.3.4. Running Easy Deploy
Running Easy Deploy to create a virtualized viral-ngs environment is as simple as running
easy-deploy-virtualized/run.sh. Before running this script, copy any data you wish to have in the VM to the
easy-deploy-virtualized/data directory on your local machine. During setup, the
files will be copied into the
~/data/ directory of the virtual machine.
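The two steps above, staging data and then launching the script, might be wrapped as follows. This is a sketch under assumptions: stage_and_deploy is a hypothetical helper, the checkout path and file name in the usage comment are placeholders, and running run.sh from inside its own directory without arguments is a guess rather than a documented requirement.

```shell
#!/bin/sh
# Sketch: copy input files into easy-deploy-virtualized/data, then start
# Easy Deploy if run.sh is present and executable.
# Usage: stage_and_deploy <viral-ngs checkout> <file>...
stage_and_deploy() {
    repo_dir=$1
    shift
    data_dir="$repo_dir/easy-deploy-virtualized/data"
    mkdir -p "$data_dir"
    for f in "$@"; do
        cp -R "$f" "$data_dir/"
    done
    if [ -x "$repo_dir/easy-deploy-virtualized/run.sh" ]; then
        # Assumption: the script is started from its own directory.
        (cd "$repo_dir/easy-deploy-virtualized" && ./run.sh)
    else
        echo "run.sh not found under $repo_dir; files were staged only" >&2
    fi
}

# Example call (placeholder paths):
# stage_and_deploy "$HOME/viral-ngs" sample1.bam
```

Staging the files first matters because they are copied into the VM during setup.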
To start, the script
run.sh installs the necessary dependencies on the user’s machine (ansible, vagrant, virtualbox, and vagrant-aws). The provisioning is handled by Ansible, with Vagrant handling creation of the VMs and EC2 instances. On OS X the script depends on Homebrew and will install it if it is not present. On Linux it depends on apt. Ruby >=2.0 is required for vagrant-aws, so versions of Ubuntu older than 15.04 (notably 14.04 LTS) will need to have Ruby >=2.0 installed and made the default.
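Before or after running the script, it can be useful to see which of these tools are already available. The following sketch is not part of run.sh; missing_tools is a hypothetical helper, and the executable names (ansible, vagrant, VBoxManage for VirtualBox, ruby) are assumptions about how the tools appear on PATH.

```shell
#!/bin/sh
# Print the name of each given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed executable names; VBoxManage is the VirtualBox command-line tool.
missing=$(missing_tools ansible vagrant VBoxManage ruby)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found on PATH:" $missing
else
    echo "All listed tools are on PATH."
fi
```

vagrant-aws is a Vagrant plugin rather than a standalone executable, so it would be checked with vagrant plugin list, and ruby --version shows whether the installed Ruby meets the >=2.0 requirement.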
2.3.5. Details on Easy Deploy
Per the Vagrantfile, local VM RAM usage is set to 8GB. On EC2 it currently uses an m4.2xlarge instance with 32GB of RAM and 8 vCPUs.
Ansible clones the master branch of viral-ngs from GitHub, creates a Python 3 virtual environment, and installs the viral-ngs Python dependencies. The viral-ngs tool unit tests are run to download, install, and build all of the viral-ngs tools. A
Snakefile for viral-ngs is copied to the home directory of the VM (locally:
/home/vagrant/, on EC2: /home/ubuntu/), along with an associated
config.yaml file. Files to contain sample names (
sample-depletion.txt, etc.) are also created. A directory is created within the VM,
~/data/, to store data to be processed. This directory on the VM is synced to the
./data/ directory on the host machine, relative to the location of the
easy-deploy-virtualized/Vagrantfile. On local VMs, syncing of the directory is two-way and fast. On EC2 instances, the syncing is currently one-way (local->EC2) due to Vagrant limitations.