Monday, October 10, 2016

Writing a thesis with Lyx / Latex

The Lyx editor is a WYSIWYM - What You See Is What You Mean editor for Latex. I have been using it for 3 years to write research articles. It is now time to compile 4 articles as thesis chapters. A collection of thesis templates has been made for Lyx. I used the default Lyx thesis template which comes pre-installed. The template makes it possible to compile many child documents into a main documents.




Modification to existing articles to integrate them as child documents in the thesis


One of my chapter title was too long to be used as page header on all even pages. This answer solved the issue: I inserted
\chaptermark{short title name} in ERT (Evil Red Text) after the chapter title.

Article 3 had an appendix which caused subsequent article 4 to have weird section number A,B,C instead of 1,2,3. I read the appendix after each chapter question and considered using  subapendix from the appendix package. But it returned an error "\AtBeginEnvironment{subappendices}" is an undefined control sequance. Maybe because I forgot to remove the \appendix option in Document/start appendix here. For the moment I decided not to use subappendices.


Wednesday, August 17, 2016

I miss the iceweasel logo

Becomes


Iceweasel was a re-branding of Firefox introduced by Debian because of license issue. The issue was solved in 2016, as explained in this bug report renaming Iceweasel to Firefox:
= About branding =
Mozilla & Debian both acknowledge that the branding issue mentioned in
bug 354622 is no longer relevant.
The Firefox logo was released under a free copyright license which
matches the DFSG.
As PhilGil wrote on the Debian Forum :
"I'll miss Iceweasel too, but eliminating the rebranding will free up the Mozilla team for more important work and maybe allow them to push out updates in a more timely manner."
A far more detailed article: The end of the Iceweasel Age.
"In essence, then, the logo-licensing problem, the trademark-usage incompatibility, and the patch-maintenance problem have all been resolved, so, Ledru said, Debian could return to the Firefox branding."
Note the patch maintenance problem was solved thanks to the Firefox ESR (Extended Support Release).

Wednesday, May 25, 2016

Jmulti over wine on Linux

Jmulti is a Java based time series analysis software. Unfortunately the Linux version is not maintained any more. I installed wine in the hope to use Jmulti on top of wine. I downloaded the executable “jmultiVM_win-4.24.exe” which is said to contain the Java virtual machine. Then I ran “wine jmultiVM_win-4.24.exe”. Install complains about “InvokeShellLinker failed to extract icon from L"C:\\jmulti4\\jmulti.exe" “ but installation is successful nevertheless.

To start jmulti go to the newly created directory
cd ~/.wine/drive_c/jmulti4/  
then:
 wine jmulti.exe

Black screen issue

Jmulti starts but I have black areas in menu. Rmathew explains how to remove DirectX-based acceleration for Java 2D completely.  Looking for a registry key like:
HKEY_CURRENT_USER\Software\JavaSoft\Java2D\1.5.0_11
and setting the value of "DXAcceleration" to "0" fixes it.

Friday, March 25, 2016

Estimating panel data models with the R package plm

Panel data, also called longitudinal data concerns individuals observed through time. It is said to have both a cross section and time series dimension. The R package plm provides panel data estimators for econometricians and is documented in a detailed vignette.

Default settings of the plm() function

By default, the plm() function assumes that the individual and time indexes are in the first two columns. If this is  not the case, an index argument has to specify the name of those two variables in the dataset. For example the argument index = c("country","year") would specify that the individual index is in the column country and the time index is in the column year.

The plm() function's default settings perform a "Oneway (individual) effect Within Model". "Oneway (individual) effect" is a model specification considering that each individual i has a constant, unobserved effect \alpha_i. "Within Model" is an estimation method, identical to the Least Square with Dummy Variables (LSDV) estimation.

Monday, February 08, 2016

FreeRDP

FreeRDP enables a connection to a windows machine from Debian GNU-Linux. freerdp-x11 can be installed from the Synaptic package manager. The program is called xfreerdp.


Usage example:
xfreerdp /u:**user** /p:**password** /v:**IP** /drive:data,/home/paul/R/forestproductsdemand/data-end
Connect in fullscreen mode using a stored configuration connection.rdp and the password Pwd123! :
xfreerdp connection.rdp /p:Pwd123! /f
Connect to host rdp.contoso.com with user CONTOSO\\JohnDoe and password Pwd123!
xfreerdp /u:CONTOSO\\JohnDoe /p:Pwd123! /v:rdp.contoso.com

Connect to host 192.168.1.100 on port 4489 with user JohnDoe, password Pwd123!. The screen width is set to 1366 and the height to 768. See also option /drive below.
xfreerdp /u:JohnDoe /p:Pwd123! /w:1366 /h:768 /v:192.168.1.100:

The screen shot below shows a windows session accessed from within the Gnome desktop. A folder is shared an can be visible in both the windows and Gnome file explorers.


Options

Command line options
/drive:nameofdrive,localfolder : Redirect localfolder as nameofdrive on the server machine

Window size

Pressing CTRL+ALT+ENTER will switch from windowed to full screen and vice-versa. This is sufficient for me to see the full screen on a Gnome Desktop. But there is no option for scroll bars if the distant screen is to large to appear on screen.

Multiple RDP connections to a windows machine

I'm working with a colleague on the same machine. She has long running processes on the machine and I need to use specific software. According to this Microsoft forum, multiple RDP connection is not possible on windows.

Change keyboard layout

List available keyboard layout
xfreerdp /kbd-list
I didn't manage to enter the keyboard option. I asked a question on SuperUser.

Automation for a thin client

Command script to open xfreerdp and reopen it when it closes suggests this script: 
while (true); do xfreerdp -f xxx.xxx.xxx.xxx ; sleep 2; done

Monday, January 25, 2016

A year on Debian

Inspired by other users seen on the Linux setup. I switched my work computer to Debian GNU/Linux in January 2015. I used the Jessie version which was "unstable" at the time. It became "stable" in april 2015 and I think the adjective is deserved. I'm happy.


I mostly use the R statistical software, the iceweasel web browser, the Lyx editor, the reference manager Jabref and the Evolution mail and calendar program. I learned a lot by using git (file revision system) and databases through the command line. I sometimes use vim to edit text files and I use the RStudio editor in Vim mode. I also use the Libre office Calc, Writer and Draw software.

As long as you stick with what is in the huge Debian package repository, software updates are easy. If you need programs updated very recently (in the last months), installation can become tricky. Although it mostly involves adding new software repositories or installing .deb files directly. For example I managed to install the proprietary statistical software STATA.

R packages installation on Debian Jessie


This morning I wanted to install packagedocs to produce web documentation for a package. Packagedocs is based on staticdocs. They required an upgrade to R >= 3.2.0.

Installing R 3.2.3

Based on Debian packages for R software, I added this line to /etc/apt/sources.list
deb http://cran.univ-paris1.fr/bin/linux/debian jessie-cran3/
Reloaded package information in the synaptic package manager, marked all upgrades and installed the package.

Installing packages

Some packages can be installed from the CRAN repository
install.packages("Rcpp")
install.packages("ggplot2")
install.packages(c("dplyr","tidyr"))
install.packages("ggplot2")
install.packages(c("dplyr","tidyr")
Other packages are not available on CRAN. They can be installed from github:
devtools::install_github("hadley/staticdocs")
devtools::install_github("hafen/packagedocs")

See also my older post on R packages and RStudio install on Debian Wheezy.

Friday, December 18, 2015

Using SSH keys to access remote servers and git repositories

An SSH key can be used to access a virtual private server or a remote git repository without the need to enter a password every time. By sharing your public key with the remote server, your compter is authenticated as a trusted access point.


Creating SSH keys 

In Debian GNU Linux, using the Gnome desktop, you can create a private and public SSH key pair with for example the seahorse key manager. Under File / New / Secure Shell Key.

Created keys will be visible under ~/.ssh/ the private key is called id_rsa and the public key id_rsa.pub. You should only share the public key.

At the command line, you can create keys with
ssh-keygen -t rsa -C "your_email@example.com"

Virtual Private Server

I bought a virtual private server with Debian pre-installed. A public key can be added in the file ~/.ssh/authorized_keys. When connected to the server, edit the file:
vim ~/.ssh/authorized_keys
You might need to change access permission to that file as explained in this gist.

Bitbucket

Your public key can be added to your bitbucket account under manage account / security / SSH keys. This page explains how to use the SSH protocol with Bitbucket in more details.

Github

Your public key can be added to your Github account under profile / settings / SSH key. More details on how to generate and use SSH keys for github.

Then at the top of your Github repository you should see the "clone URL". Copy the SSH URL, in the form: git@github.com:yourusername/yourrepository.git
Add it as a remote origin:
git remote add origin git@github.com:yourusername/yourrepository.git
If there was already a remote repository you might need to delete it first with git remote remove origin.



The push and set the remote repository as an upstream repository:
git push --set-upstream origin master
Subsequent push can be simply made with
git push

See also

See also my other blog posts on the bash shell commands and on git commands.

Wednesday, November 25, 2015

Ruby, Perl, R, Bash

A comparison of some programming languages, couldn't add python because it's not recognised as a programming language by the Google trend website.

The trend for one keyword is relative to all other searchers over the same time period. The decreasing trend of Perl in this graph does not mean that searches for Perl decreased in absolute number. It means that the proportion of these searches to the overall Google searches was decreasing. How Trends data is adjusted.

Tuesday, October 20, 2015

Data integration with Knime and the R statistical software

I am testing the Knime software to create data pipelines. I started by installing the following extensions:
  •   KNIME Connectors for Common Databases    
  •   KNIME Interactive R Statistics Integration    

Database operations


I tried chaining the node database Row filter after database selector (containing an SQL statement of the form "select * from table"). But the query was taking ages because my source table is rather large.  I replaced the SQL statement in the node database row filter by a statement of the form "select * from table where code = 999". This time the query was much shorter.
Unlike dplyr which updates the SQL query - based on the group_by(), select(), filter() verbs - before  executing a final SQL query, it seems that Knime is executing all SQL queries one after the other.

Interaction with the R statistical program


Then I pushed the data to R. input data frame is called knime.in One issue is that most character vectors are transformed into factors. This was causing various errors. max(year) returned an error, and various merge operation were failing. I had to tell R to change back all those column types to character or numeric.

I wanted to use a filter before using a plot. But I needed to filter on 2 columns. I didn't know how to implement this in Knime. A Google search returned this forum. Rule based row filter seems to work.




In the workflow above, I used R View to display  a plot generated with ggplot.

Workflow are a nice way to display data integration steps and probably easy to explain to others. Node configuration is rather straightforward, once you have found the right node in the repository. I haven't figured out yet how to use input forms and flow variables.

I don't know how easy it is to maintain functional workflows on the long term.

Monday, September 14, 2015

Programming a test harness

I would like to build a test harness around programs. Automated tests should increase my confidence in the reproducibility of their outcome.
"Whenever you are tempted to type something into a print statement or a debugger expression, write it as a test instead." — Martin Fowler. Quoted here.

Where to store test data

While trying to find out where to place test data, this answer thought me to distinguish between unit tests, which are meant to test each function individually on small mock data and integration tests, which would be based on a larger, real dataset.

Testthat

In a commit called "Don't attach dplyr backends", Hadley Wickham removed direct function calls from loaded packages. Probably to ensure that packages are not loaded directly, he changed function calls to a form of packagename::function().

The author of the testthat R package wrote that autotest
"[...] promotes a workflow where the only way you test your code is through tests. Instead of modify-save-source-check you just modify and save, then watch the automated test output for problems."

Debian Continuous Integration

ci.debian.net

"How often are test suites executed?
The test suite for a source package will be executed:

  • when any package in the dependency chain of its binary packages changes;
  • when the package itself changes;
  • when 1 month is passed since the test suite was run for the last time."

Online Continuous Integration


Wednesday, June 17, 2015

Rstudio tips - Key bindings to program and explore data with R


Rstudio is an editor for the R statistical programming language which can be installed on windows, mac and Linux. See my post explaining R setup under Debian.

Edit code

  • TAB auto complete object names
  • F1 on a function name shows the help page of that function
  • F2 on a function name jumps to the code where that function was created. I found this key so useful that I decided to created this blog post.
  • CTRL+W  close a tab
  • CTRL+F find and replace text
  • CTRL+SHIFT+F find in all files in a directory (like grep), then click on results lines to jump in the files

Explore data

In the environment window, click on a data frame to view it then click on filter to filter the data frame according to various criteria.

Create pdf or html reports

When editing a markdown .Rmd document, the pdf or html report can be generated with CTRL+SHIFT+K.

Create a package

The R packages book by Hadley Wickham explains how to create R packages. Useful short-cuts when working with packages:
  • CTRL+SHIFT+B build the package
  • CTRL+SHIFT+D generate documentation
  • CTRL+SHIFT+T run devtools::test()
The documentation step can be set to run automatically with the package building under build / configure build tools / generate documentation with Roxygen / configure.

Vim mode

Vim mode can be activated under Tools / Global options / Code. Enter command mode with ":" and ask for ":help".  I use primarily the following keys:
  • jklhw$ggG navigate text
  • iaoA enter edit mode to insert text
  • Escape return to navigation mode
  • v select text
  • ypP copy selected text and paste
  • d delete 
  • /nN search 

Wednesday, May 13, 2015

Entreprise Resource Planing trends

Interest for the search topics: Odoo (formerly open ERP), SAP ERP and Microsoft Dynamics in Germany (link to trends for the whole world).
This Google trend chart shows interest in search topics, not just search query. Which means, that trends also include queries that are related to the search topic but do not contain the same exact wording. Explanation from the Google Trends page:
"When you measure interest in a search topic (Tokyo - Capital of Japan) our algorithms count many different search queries that may relate to the same topic (東京, Токио, Tokyyo, Tokkyo, Japan Capital, etc). When you measure interest in a search query (Toyko - Search term), our systems will count only searches including that string of text ("Tokyo")."

Wednesday, April 29, 2015

How to display dplyr's SQL query

dplyr verbs can be chained to query a database without writing SQL queries. dplyr uses lazy evaluation, meaning that database queries are prepared and only executed when asked by a specific verb such as collect(). I was wondering if it is possible to display the SQL query generated by dplyr?

Indeed dplyr::explain() displays the SQL query generated by dplyr. I have copied a reproducible example below based on the dplyr database vignette.
 

Wednesday, April 08, 2015

Ipython notebook and R

I chose to use python 3. Several of the shell commands below have a "3" suffix in Debian testing as of April 2015: ipython3, pip3.

Install programs

I installed ipython-3-notebook (in Debian Jessie) from the synaptic package manager.

In order to install the R module, I installed PIP for python 3 in the synaptic package manager. PIP is the Python Package Index, a module installation tool. Then I used pip3 to install rpy2
sudo pip3 install rpy2
There is a blog post on how to avoid using sudo to install pip modules.

Install statsmodel, a module for statistical modelling and econometrics in python. Maybe I should have installed python-statsmodels as a Debian package instead? But I it seems to be linked to python 2.x instead of python 3 (it had a dependency on python 2.7-dev). Therefore I installed statsmodels with pip3, using the --user flag mentioned above to install is as a user only module.
pip3 install --user statsmodels
The installation took several minutes on my system. It seemed to be installing a number of dependencies. Many warnings about variables defined but not used were returned but the installation kept running. The final message was:
Successfully installed statsmodels numpy scipy pandas patsy python-dateutil pytz
Cleaning up...

Starting the Ipython notebook

Move to a directory where the notebooks will be stored, start a ipython notebook kernel
cd python
ipython3 notebook

Shortcuts

See also the Ipython Notebook shortcuts. Useful shorcuts are ESCAPE to go in navigation mode, ENTER, to enter edit mode. It seems one can use vim navigation keys j and k to move up and down cells. Pressing the "d" key twice deletes a cell. CTRL+ENTER run cell in place, SHIFT+ENTER to run the cell and jump to the next one, and ALT+ENTER to run the cell and insert a new cell below. 

Run R commands in the Ipython notebook


Load an ipython extension that deals with R commands
%load_ext rpy2.ipython
 Display a standard R dataset
%R head(cars)
%R plot(cars)
Use data from the python statsmodels module based on this page.
import statsmodels.datasets as sd
data = sd.longley.load_pandas()
Print column names of the dataset
print(data.endog_name)
print(data.exog_name)
Print a dataset as an html table by simply giving its name in the cell. For example this data frame contains exogenous variables:
data.exog
Python can pass variables to R with the following command:
totemp = data.endog
gnp = data.exog['GNP']
%R -i totemp,gnp
Estimate a linear model with R
%%R
fit <- br="" gnp="" least-squares="" lm="" nbsp="" regression="" totemp="">print(fit$coefficients)  # Display the coefficients of the fit.
plot(gnp, totemp)  # Plot the data points.
abline(fit)  # And plot the linear regression.
Plot the datapoints and linear regression with the ggplot2 package
%%R
library(ggplot2)
ggplot(data = NULL, aes(x =gnp, y = totemp)) +
    geom_point() +
    geom_abline( aes(intercept=coef(fit)[1], slope=coef(fit)[2]))

Wednesday, April 01, 2015

Virtual Machine setup for development purposes


Creating a Virtual machine with Vagrant and PuPHeT.


According to those 2013 stack overflow questions, there were many reasons not to develop in a VM, unless one had to specifically develop for several OS:
But in the same year, the PhPHet developer explained why he thinks that one has to develop in a virtual machine.

Running a VM 

I followed the vagrant instructions to install a basic VM.
vagrant init hashicorp/precise32 vagrant up
"The guest machine entered an invalid state while waiting for it
to boot. " [...] "If the provider you're using has a GUI that comes with it, it is often helpful to open that and watch the machine"
I started the virtual machine in virtual box, an error message came up: 
"VT-x is disabled in the BIOS. (VERR_VMX_MSR_VMXON_DISABLED)."
Under Machine / Settings/ System / Acceleration, I disabled the Hardware virtualisation. The VM could then start. This works for 32 bits systems. Unfortunately 64 bit systems require hardware virtualisation, this means I cannot change this setting for 64 systems. I'll have to enable VT-x in the BIOS later on.

After I installed Virtual box, my mouse was rendered invisible. This may be due to the fact that the mouse was captured and that I didn't know the host capture key (default to the right Ctrl key) to free the mouse from the virtual machine's window.

Connecting to the virtual machine

Connecting from the virtual box GUI. The default user is "vagrant" and password "vagrant".

Connecting with SSH into the machine from a command prompt:
 vagrant ssh

 

Shared folder

A folder can be share with the host operating system. In virtual Box settings for the machine, under shared folder, create a machine folder and set it to auto-mount in the guest operating system.

Other tools



Messages by the vagrant creator

Tao of hashicorp
Comparing Filesystem Performance in Virtual Machines Automation Obsessed

Tuesday, March 31, 2015

Gauss commands

Comments begin "/*" end "*/" or begin "@" end "@"

    /* Comments */
    @ Comments @


Change working directory:

    chdir
 

Load data 
The filename can be either a literal or a string. If the filename is in a string variable, then the ^ (caret) operator must precede the name of the string, as in:

    filestr = "data/filename.txt";
    loadm x = ^filestr;
 

Run a script 

    run file_name;
     

Indexing matrices
See help aptech.com.gauss.13.0/doc/LF.6-DataTypes.html
The statement

    y = x[1:3,5:8];
 

Will put the intersection of the first three rows and the 
fifth through eighth columns of x into the matrix y.

Plot

plotXY(datax[.,1], datax[.,2:cols(datax)]) 
plotXY(datay[.,1], datay[.,2:cols(datay)])

Gauss resources

Basic GAUSS workshop 2002
Aptech Tutorial, running a program file

Wednesday, March 25, 2015

Octave commands

I am trying to run Matlab based test statistics in GNU Octave.

Octave commands 

List variables available in memory
who %this is a comment
whos %provides class details
Change and display working directory
cd directory_name
pwd
Manipulate data structures:
x.a = 1;
x.b = [1, 2; 3, 4];
x.c = "string";
Display the value of a variable
disp(x)
Loop over a list of files
csvfiles = dir("*.csv")
for file= csvfiles'
fprintf(1,'Doing something with %s\n',file.name)
end
Creating character arrays
"In the MATLAB® computing environment, all variables are arrays, and strings are of type char (character arrays)."

Reading data from an Excel or CSV file

The test statistics I wanted to use loads data from an Excel file but this returned the error :" 'xlsread' undefined ". Reading excel file is provided by the IO package which is not installed by default. The package is available in the Debian repository under "octave-io" , with the description "This package [...] contains functions to [...] read Excel spreadsheet (xlsread) and OpenDocument spreadsheet (odsread)." It is based on Apache POI. Load the package an try to read a file:
pkg load io;
data=xlsread('file_name.xls');
xlsread returns an error "Detected XLS interfaces: None."  This forum post recommends to load the java and windows packages as well. Those packages are not available in the Debian repositories.
I decided to convert the Excel file to csv and use csvread instead.

The script now gives the same output as on a windows machine running Matlab.

Warning: possible Matlab-style short-circuit operator 

Short-circuit boolean operators explains that:
"MATLAB has special behavior that allows the operators ‘&’ and ‘|’ to short-circuit when used in the truth expression for if and while statements. The Octave parser may be instructed to behave in the same manner, but its use is strongly discouraged." [...]
I wonder why it is strongly discouraged. 
"To obtain short-circuit behavior for logical expressions in new programs, you should always use the ‘&&’ and ‘||’ operators."
I replaced "|" by "||" in the code.

Writing test results to a file

Matlab low level file IO, explains how to use fprintf (a vectorised implementation of the c function) to write text data to a file.

Thursday, March 12, 2015

Stata commands

Load csv data

cd /home/paul/ 
insheet using filename.csv

tsset and xtset for panel variables

The 2 commands are basically similar (STATA forum discussion). tsset mentions "If you tsset panelvar timevar, you do not need to xtset panelvar timevar to use the xt commands."
xtset country year

View available test results

How to access stored estimation results
 
Stata help: "to see what was returned from an estimation command", type:
ereturn list

Then display results with:
 display e(depvar)
matrix list e(b)

View the source code of a command

viewsource xtset.ado

Thursday, March 05, 2015

Why should research organisations release free software?

Research organisations protect software with Intellectual Property (IP) rights. Some of these IP rights authorise the release of source code but some prevent source code release. Within the organisation, a decision maker should ask herself:
  • Can the organisation pay a person or a group of person in years to come to maintain that program in the long run?
If the answer is no, read on.

Researchers frequently move to other job positions. Once a researcher has moved to another job, the software codes he/she wrote is likely to sit idle on the organisation's storage drives. When no insider knows how to modify a computer program's code, the value of that program for the organisation will depend on the possibility for outsiders to modify the code.
  1. If researchers outside the organisation are not allowed to update the software, it will not be used. IP rights preventing source code modification don't have any value.
  2. If on the other hand, the piece of software is released as free and open source software, researchers outside the organisations are likely to update the software once the need arises. IP rights ensure that the first creator's contribution with its organisation's affiliation remains cast in the software's stone. An acknowledgement mentioning the organisation will travel with the piece of software as long as this piece of code is useful. This is likely to attract future project contribution and funding to the host organisation.