Wednesday, April 29, 2015

How to display dplyr's SQL query

dplyr verbs can be chained to query a database without writing SQL. dplyr uses lazy evaluation: database queries are prepared, but executed only when a specific verb such as collect() asks for the results. I was wondering whether it is possible to display the SQL query generated by dplyr.

Indeed dplyr::explain() displays the SQL query generated by dplyr. I have copied a reproducible example below based on the dplyr database vignette.
 

Wednesday, April 08, 2015

IPython notebook and R

I chose to use Python 3. Several of the shell commands below have a "3" suffix in Debian testing as of April 2015: ipython3, pip3.

Install programs

I installed ipython3-notebook (in Debian Jessie) with the Synaptic package manager.

To install the R module, I first installed pip for Python 3 with the Synaptic package manager. pip is Python's package installer; it installs modules from the Python Package Index (PyPI). Then I used pip3 to install rpy2:
sudo pip3 install rpy2
There is a blog post on how to avoid using sudo to install pip modules, by passing the --user flag.

Install statsmodels, a module for statistical modelling and econometrics in Python. Maybe I should have installed python-statsmodels as a Debian package instead? But it seems to be linked to Python 2.x rather than Python 3 (it had a dependency on python2.7-dev). Therefore I installed statsmodels with pip3, using the --user flag mentioned above to install it as a user-only module.
pip3 install --user statsmodels
The installation took several minutes on my system. It seemed to be installing a number of dependencies. Many warnings about variables defined but not used were returned but the installation kept running. The final message was:
Successfully installed statsmodels numpy scipy pandas patsy python-dateutil pytz
Cleaning up...

Starting the IPython notebook

Move to the directory where the notebooks will be stored and start the IPython notebook server:
cd python
ipython3 notebook

Shortcuts

See also the IPython Notebook shortcuts. Useful shortcuts are ESCAPE to go into navigation mode and ENTER to enter edit mode. It seems one can use the vim navigation keys j and k to move up and down between cells. Pressing the "d" key twice deletes a cell. CTRL+ENTER runs the cell in place, SHIFT+ENTER runs the cell and jumps to the next one, and ALT+ENTER runs the cell and inserts a new cell below.

Run R commands in the IPython notebook

Load the IPython extension that deals with R commands
%load_ext rpy2.ipython
Display a standard R dataset
%R head(cars)
%R plot(cars)
Use data from the Python statsmodels module, based on this page.
import statsmodels.datasets as sd
data = sd.longley.load_pandas()
Print column names of the dataset
print(data.endog_name)
print(data.exog_name)
Display a dataset as an HTML table by simply giving its name in the cell. For example, this data frame contains the exogenous variables:
data.exog
Python can pass variables to R with the following command:
totemp = data.endog
gnp = data.exog['GNP']
%R -i totemp,gnp
Estimate a linear model with R
%%R
fit <- lm(totemp ~ gnp)  # Least-squares regression
print(fit$coefficients)  # Display the coefficients of the fit.
plot(gnp, totemp)  # Plot the data points.
abline(fit)  # And plot the linear regression.
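To make clear what the simple regression of totemp on gnp estimates, here is a minimal pure-Python sketch of the least-squares intercept and slope for one predictor. The function name simple_ols and the sample points are hypothetical, chosen for illustration; they are not the Longley data and this is not how R's lm() is implemented internally.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up points lying exactly on y = 1 + 2x (illustration only).
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [3.0, 5.0, 7.0, 9.0]
intercept, slope = simple_ols(x_demo, y_demo)
print(intercept, slope)
```

In a notebook, this can be run in an ordinary Python cell and its two numbers compared with the two coefficients printed by R for the same data.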
Plot the datapoints and linear regression with the ggplot2 package
%%R
library(ggplot2)
ggplot(data = NULL, aes(x = gnp, y = totemp)) +
    geom_point() +
    geom_abline(aes(intercept = coef(fit)[1], slope = coef(fit)[2]))

Wednesday, April 01, 2015

Virtual Machine setup for development purposes


Creating a virtual machine with Vagrant and PuPHPet.


According to those 2013 Stack Overflow questions, there were many reasons not to develop in a VM, unless one specifically had to develop for several operating systems.
But in the same year, the PuPHPet developer explained why he thinks that one has to develop in a virtual machine.

Running a VM 

I followed the Vagrant instructions to install a basic VM.
vagrant init hashicorp/precise32
vagrant up
The second command failed with this message:
"The guest machine entered an invalid state while waiting for it to boot. " [...] "If the provider you're using has a GUI that comes with it, it is often helpful to open that and watch the machine"
I started the virtual machine in VirtualBox, and an error message came up:
"VT-x is disabled in the BIOS. (VERR_VMX_MSR_VMXON_DISABLED)."
Under Machine / Settings / System / Acceleration, I disabled hardware virtualisation. The VM could then start. This works for 32-bit systems. Unfortunately, 64-bit systems require hardware virtualisation, which means I cannot change this setting for 64-bit systems. I'll have to enable VT-x in the BIOS later on.
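On a Linux host, one quick check is whether the CPU advertises hardware virtualisation at all, by looking for the vmx (Intel VT-x) or svm (AMD-V) flag in /proc/cpuinfo. The sketch below assumes a Linux host, and the function name virtualisation_flags is hypothetical. It only reports what the CPU advertises; it does not by itself confirm that VT-x is enabled in the BIOS.

```python
from pathlib import Path


def virtualisation_flags(cpuinfo_text):
    """Return the hardware virtualisation flags found in /proc/cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # vmx = Intel VT-x, svm = AMD-V
            found |= {"vmx", "svm"} & set(value.split())
    return found


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        flags = virtualisation_flags(cpuinfo.read_text())
        print("Virtualisation flags:", ", ".join(sorted(flags)) or "none")
    else:
        print("/proc/cpuinfo not found (not a Linux host?)")
```

If no flag is listed, the CPU most likely has no hardware virtualisation support, and changing the BIOS setting would not help.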

After I installed VirtualBox, my mouse pointer became invisible. This may be because the mouse was captured by the virtual machine's window and I didn't know the host key (by default the right Ctrl key) that releases it.

Connecting to the virtual machine

Connecting from the VirtualBox GUI: the default user is "vagrant" and the password is "vagrant".

Connecting with SSH into the machine from a command prompt:
vagrant ssh

 

Shared folder

A folder can be shared with the host operating system. In the VirtualBox settings for the machine, under Shared Folders, create a machine folder and set it to auto-mount in the guest operating system.

Other tools



Messages by the Vagrant creator

Tao of HashiCorp
Comparing Filesystem Performance in Virtual Machines
Automation Obsessed