I live right outside of Washington D.C. I can walk to NIH, and I pass three hospitals on the way to the grocery store.
Not surprisingly, I have met a few biologists who needed programming help.
After helping a few biologists in completely independent situations, I began to notice a pattern. Researchers have data, can write R, but can’t organize all of the packages and dependencies they need to execute their analysis. Every time, I install R using Anaconda and hook into the Bioconda channel, and all is well.
After the second encounter, I wrote a guide using GitHub Markdown. I have given it out three or four times since then! I figure it is about time to make it public. See the guide below.
Installing R and RStudio via Anaconda for Biologists
This guide will take users through installation of R in a new Anaconda virtual environment. I always recommend biologists use Anaconda for managing R and its dependencies, because it gives us access to the Bioconda channel. The Bioconda channel is an incredibly powerful hub for many of the most important bioinformatic software. Not only does it consolidate the packages into a single channel, it manages version dependencies between them.
Any software stack that can be built with Anaconda can be replicated on any other similar system with ease. Reproducibility of environments and results is a must for biological research in the computer age.
We will make frequent mention of Python in this guide. This is because, at its core, Anaconda is a virtualizer for Python. There is a close relationship between Python and R in the biology community, so they are managed simultaneously by Anaconda. The principles in this guide can be similarly applied to creating Python environments. See the relevant section of the related guide in this repository on installing Python.
Installing and Setting Up Anaconda (UNIX)
Download Anaconda version 4.3.1 (for Python 3.6) for the appropriate system here: https://www.continuum.io/downloads
On UNIX-like systems, install with the following command:
bash Anaconda3-4.3.1-Linux-x86_64.sh
You are encouraged to use the bash command regardless of whether you are in a bash shell. It is very important that Linux users do not run this with sudo, as it will confuse Anaconda, causing it to install in the /root/ directory.
When prompted to append the Anaconda path to your system’s $PATH, it is advised you say [no] unless you intend for Anaconda Python to overwrite your system Python.
Test by navigating to the installation directory, then navigating to bin/ and running conda --version.
For example:
cd ~/anaconda3/bin
./conda --version
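The same check can be wrapped in a small script that reports whether the installer put conda where you expect. This is a minimal sketch, assuming the default install location ~/anaconda3; the function name find_conda is illustrative and not part of Anaconda.

```shell
#!/bin/sh
# Hypothetical helper (not part of Anaconda): report whether a conda
# file exists under the given Anaconda install directory.
find_conda() {
    anaconda_home="$1"
    conda_bin="$anaconda_home/bin/conda"
    if [ -f "$conda_bin" ]; then
        echo "found: $conda_bin"
        return 0
    fi
    echo "missing: $conda_bin"
    return 1
}

# Usage: print the version only if the file is really there.
# ~/anaconda3 is the installer's default location; adjust if you chose another.
if find_conda "$HOME/anaconda3"; then
    "$HOME/anaconda3/bin/conda" --version
fi
```

If the script prints "missing", the installer most likely used a different directory than the default.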
Installing and Setting Up Anaconda (Windows)
Download Anaconda3 version 4.3.1 (for Python 3.6) for the appropriate system here: https://www.continuum.io/downloads
On Windows, simply run the installer with the default settings.
Windows users may want to untick the box that overwrites the system Python with Anaconda Python 3.6.1.
To verify a successful installation, open the Anaconda Prompt executable (typically found by searching in your Windows taskbar) and run conda --version.
Symlinking Anaconda Commands (UNIX)
If you said [no] to adding Anaconda to your $PATH variable (which was advised), you may have to symlink common Anaconda commands to continue using this guide. This section does not apply to Windows users, as we will assume they are working out of the Anaconda Prompt rather than their system prompt.
Recall the installation location of Anaconda. If you went with the default paths it will be installed at ~/anaconda3 for Linux users. We want to symlink conda, activate, and deactivate so that we can manage virtual environments without messing with our system $PATH variables. Use the following commands to symlink the Anaconda commands.
NOTE: Make sure these do not interfere with any existing shell commands and adjust if necessary. For most people, this will not be a problem.
sudo ln -s /home/<user>/anaconda3/bin/conda /usr/bin/conda
sudo ln -s /home/<user>/anaconda3/bin/activate /usr/bin/activate
sudo ln -s /home/<user>/anaconda3/bin/deactivate /usr/bin/deactivate
Test by running conda --version from outside the Anaconda installation directory.
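Before creating the symlinks, it can help to confirm that none of the three names already resolves to something else on your system. The following is a minimal sketch; check_free is a hypothetical helper name, and it relies only on the standard `command -v` lookup.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command name is already taken.
# `command -v` prints how the shell would resolve the name, and fails
# if nothing by that name is found.
check_free() {
    name="$1"
    if existing="$(command -v "$name" 2>/dev/null)"; then
        echo "$name: already resolves to $existing"
        return 1
    fi
    echo "$name: free"
    return 0
}

# Check the three names this guide symlinks into /usr/bin.
for cmd in conda activate deactivate; do
    check_free "$cmd" || true
done
```

Any name reported as already taken deserves a closer look before you run the ln commands.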
Creating a Virtual Environment (UNIX)
We will reference this document: conda.io/docs/using/envs.html throughout.
We will name our environment “BioSandbox” for these examples.
To create an environment: conda create --name BioSandbox python=3.5
To activate the environment: source activate BioSandbox
To deactivate the environment: source deactivate BioSandbox
To list environments: conda info --envs
To remove an environment: conda remove --name BioSandbox --all
After activating the environment, the command line will be prepended by (BioSandbox) and the $PATH variable will be modified to point to anaconda3/envs/BioSandbox/bin.
We will install dependencies to the virtual environment, so make sure the (BioSandbox) environment is active for the next steps.
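If you script your setup, you may want to create the environment only when it does not already exist. The following is a sketch under assumptions: env_exists is a hypothetical helper, and it assumes `conda info --envs` prints one environment per line with the name in the first column and header lines starting with '#'.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a conda environment with this name exists.
# Assumes `conda info --envs` lists one environment per line, with the
# name in the first column and header lines starting with '#'.
env_exists() {
    conda info --envs | awk '$1 !~ /^#/ {print $1}' | grep -qxF "$1"
}

# Usage (with conda available on your PATH):
#   if env_exists BioSandbox; then
#       echo "BioSandbox already exists"
#   else
#       conda create --name BioSandbox python=3.5
#   fi
```

The exact-match grep keeps an environment called Sandbox from being mistaken for BioSandbox.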
Creating a Virtual Environment (Windows)
We will reference this document: conda.io/docs/using/envs.html throughout.
Windows users will run these commands in the Anaconda Prompt. The only difference is that Windows users omit the source command when activating and deactivating virtual environments.
To create an environment: conda create --name BioSandbox python=3.5
To activate the environment: activate BioSandbox
To deactivate the environment: deactivate BioSandbox
To list environments: conda info --envs
To remove an environment: conda remove --name BioSandbox --all
After activating the environment, the command line will be prepended by (BioSandbox) and the $PATH variable will be modified to point to anaconda3/envs/BioSandbox/bin.
We will install dependencies to the virtual environment, so make sure the (BioSandbox) environment is active for the next steps.
Installing R and RStudio
Run these commands in order. This will install R, RStudio, and add various channels for future package installations.
conda config --add channels conda-forge
conda config --add channels defaults
conda config --add channels r
conda config --add channels bioconda
conda install r
conda install rstudio
Using the Anaconda Environment
Now, whenever you are working on this project, make sure you are operating in the (BioSandbox) virtual environment. When you are using an IDE, you may have to tweak the settings to make sure it is using the (BioSandbox) environment. When you are using the command line, make sure that you have activated the environment and (BioSandbox) is prepended to the command line.
For maximum convenience, start RStudio from the Anaconda Prompt or command line by running rstudio from your virtual environment. This will load RStudio with your virtual environment’s R installation rather than your system’s R installation.
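A quick guard can catch the common mistake of launching tools from the wrong environment. This is a minimal sketch, assuming conda sets the CONDA_DEFAULT_ENV variable to the environment name on activation (which it does for named environments in the releases I am aware of); require_env is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the named conda environment is active.
# Assumes conda exports CONDA_DEFAULT_ENV with the environment's name
# when the environment is activated.
require_env() {
    wanted="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" = "$wanted" ]; then
        echo "OK: $wanted is active"
        return 0
    fi
    echo "Not active: run 'source activate $wanted' first"
    return 1
}

# Usage: only launch RStudio from the right environment, e.g.
#   require_env BioSandbox && rstudio
require_env BioSandbox || true
```

On Windows the same idea applies, but the check would need to be written for the Anaconda Prompt rather than a UNIX shell.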
Installing Bioconda Packages
Most of the popular bioinformatics packages that you will encounter are managed by Bioconda. In addition, most of them are UNIX-only. Use the following commands to install a few you may be interested in:
conda install bioconductor-phyloseq
conda install bioconductor-rsamtools
conda install bwa
conda install fastqc
conda install multiqc
conda install delly
conda install bowtie
conda install bedtools
Check out bioconda.github.io/recipes.html for the full list of Bioconda packages, and https://docs.continuum.io/anaconda/packages/pkg-docs for a full list of Anaconda packages.
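Once the packages are installed, the reproducibility mentioned at the start of this guide comes from recording the environment in a file that a collaborator can rebuild from. The following sketch uses `conda env export` and `conda env create`; the helper name snapshot_env and the file name environment.yml are illustrative choices.

```shell
#!/bin/sh
# Hypothetical helper: write a conda environment's package list to a YAML file.
# The default output name environment.yml is only a convention.
snapshot_env() {
    env_name="$1"
    out_file="${2:-environment.yml}"
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH" >&2
        return 1
    fi
    conda env export --name "$env_name" > "$out_file"
}

# Usage (with conda installed):
#   snapshot_env BioSandbox environment.yml
#   conda env create --file environment.yml   # run on the other machine
```

Keeping the exported file under version control next to your analysis scripts gives each result a record of the software stack that produced it.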
Opening RStudio with your Anaconda Environment
Simply run:
rstudio
in the Anaconda Prompt or the terminal with your Anaconda Environment active. RStudio will open with the R version and packages of the environment. No hassle.
Thank you for sharing this guide! Unfortunately I could not get this to work, error message:
dyld: Library not loaded: @rpath/libintl.9.dylib
Referenced from: /Users/jpwhalley/anaconda3/envs/BioSandbox/lib/R/lib/libR.dylib
Reason: image not found
Trace/BPT trap: 5
A possible solution is to install R from a different channel:
https://github.com/ContinuumIO/anaconda-issues/issues/6312
However this probably causes issues with other dependencies.
For me it would be great to get python3, R and rpy2 (and various packages) all working together. Alas, I think this is beyond my technical capabilities.
Thank you for sharing this guide. I had some difficulties installing RStudio, presumably because I also had Python 2.7, which conflicts with the installation. I used the following line instead to get RStudio installed.
conda create -n rstudio rstudio python=3.7
(courtesy of Ray Donnelly [mingwandroid] from GitHub)
Hey Chris, Thanks a lot for the tutorial, very simple to follow and everything worked perfectly. Just wanted to make sure I understand the intended workflow:
Previously, I had Rstudio installed in the Applications folder of my Mac, and would load bioconductor packages to run specific applications in R.
Now I have it installed in my BioSandbox environment, and would use rstudio by first source activating this environment and then loading rstudio from command line (at which point the IDE pops up).
When I need to download bioinformatics packages through conda, I should then install them using conda only after first source activating BioSandbox, at which point they will be installed into that environment.
Is it also true that if I install an R package in RStudio after I’ve opened it from BioSandbox, it will be preserved in that environment and that RStudio (vs. the RStudio I have in my Applications folder)? Should I actually delete the RStudio in my Applications folder?
Thanks and sorry if any of these questions come off as too basic.
One additional related question. When I tried to install bioconductor-phyloseq and bioconductor-rsamtools I got an unsatisfiable error message noting that they were both found to be in conflict with r-recipes.
I’m not sure how to navigate this issue and would greatly appreciate any help.
Thanks!
Hi Paul,
If you start RStudio from the Anaconda terminal, it will use an independent instance of R and all of its package installs should be attached to the Anaconda environment. They will be preserved in the Anaconda environment.
There is nothing wrong with still having a global install of R/RStudio (Python users are used to dealing with a similar situation.) Just remember which is which to get the most out of Anaconda.
I haven’t encountered your issue with bioconductor-phyloseq or bioconductor-rsamtools, although I haven’t tried to install them recently. I would try installing an older version of R or R recipes (maybe R 3.3.3). Anyone else have guidance on this?
Best,
Chris
Hey Chris, thanks for the reply. Makes a lot of sense.
As far as the last piece, still no luck with that. I used conda info to look at the dependencies for bioconductor-rsamtools and r-recipes. Seems like the conflicting dependency is r-base.
bioconductor-rsamtools lists r-base 3.4.1* as a dependency
r-recipes lists r-base >=3.4.2,<3.4.3a0
Not sure if that makes any sense to you or if there is a quick solution that comes to mind.
I also wanted to separately note that after doing this for some reason I can't use source activate or conda from my home directory. I have to 'log into bash' first. Any thoughts on that?
They work fine once I activate the BioSandbox environment or if I type bash first.
Hello, I am trying to install R using Anaconda-Navigator. In theory it installs but when trying to launch it RStudio won’t open but instead it requests a file be saved then quits. Any fix for this?
Thanks for the helpful guide. I have few questions here…a. can I have a single conda environment for both python and r? b. Can “conda install” resolve dependencies while installing r packages?
According to this very helpful post: https://community.rstudio.com/t/why-not-r-via-conda/9438/4
You should NOT “mix proprietary channels (anaconda,r) with open-source channels (conda-forge, bioconda, defaults). Problems will arise.”
The author notes that R versions from different channels are often not compatible, since they are built with different internals. The anaconda and r channels should largely be compatible, since the r channel is more or less a subset of the anaconda channel.
If you want to use bioconda the author recommends the following: “The conda-forge, bioconda, and defaults channels are open-source channels. They are largely compatible with the following caveat. In your .condarc, you should always use the following order: 1. conda-forge, 2. bioconda, 3. defaults. If you stick to that order, you should not have any problems.”
Try making a new environment, activating it, and then running the conda config commands in the following order: “1. conda-forge, 2. bioconda, 3. defaults”. Leave out “r” and see if that helps.
Thank you for the awesome guide! Worked perfectly for me!
Thank you!!! That’s the best tutorial for R installation in Anaconda I’ve seen! And probably the only one for biologists )
I had several failed RStudio installations before I found this, even with the official manual for creating an R environment. I don’t know what was wrong, but now everything works, which makes me really happy! Thank you again
Major update!
As of issue 334, this channel priority has changed!
https://github.com/bioconda/bioconda-utils/pull/334
Bioconda is now lower priority to conda-forge. Here’s how you set this up today:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
I’m not sure if the r channel should be the lowest or second lowest priority, but it’s way down there.
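The order of those three commands matters because each `conda config --add channels` call puts the new channel at the top of the list, so the channel added last gets the highest priority. The sketch below only mimics that behavior in plain shell to make the resulting order visible; priority_after_adds is a hypothetical helper and does not call conda.

```shell
#!/bin/sh
# Illustration only: mimic how repeated `conda config --add channels X`
# calls build the channel list. Each --add places the channel at the top,
# so the channel added last ends up with the highest priority.
priority_after_adds() {
    order=""
    for channel in "$@"; do
        order="$channel${order:+ $order}"
    done
    echo "$order"
}

# Channels in the order they are added above, printed highest priority first.
priority_after_adds defaults bioconda conda-forge
# prints: conda-forge bioconda defaults
```

To see the list conda has actually stored, run `conda config --get channels`.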