Thursday, December 26, 2013

Python-pandas importing a data frame from MySQL

I wanted to load tables from a mysql database and to run analyses on them. Had already done some analysis on R, but wanted to make them portable to a website, and thought that python would be better suited for that. The version of pandas currently shipped with Ubuntu is outdated 0.7. I had to use another method to get a newer version. Pandas source code is currently hosted on GitHub at: http://github.com/pydata/pandas 
After a
    sudo apt-get install python-pip 
 I installed pandas via ``pip``::
    pip install --upgrade pandas 
Still my script with
    dtf = pandas.io.sql.read_frame("SELECT * FROM Table", db) 
was returning an error, expecting list got tuple.
And  pandas.__version__ was still at 0.7.0.
I uninstalled the python-pandas package.:
    sudo apt-get remove python-pandas
And ran again
    sudo pip install pandas
After that, pandas.io.sql.read_frame() was working as expected.
And dtf.head() showed me a proper vue of the table.
Columns can be selected with dtf.columnname or dtf['columnname'].

Tuesday, December 17, 2013

Lyx article editing tips

The Lyx "What You See is What You Mean" editor is available at Lyx and from your package manager.

Citations

Lyx nicely integrates with a bibtex bibliography. You can create a shortcut such as CTRL+SHIFT+C to insert a reference. A small menu will pop-up to choose from the references.

Unfortunately  "BibTeX does not support files encoded in UTF-8". To get characters such as "é" encoded correctly latin1 has to be used.This can be set as the database property in Jabref.

Shortcuts

Under Tools / preferences / Editing / Shortcuts /  Stackoverflow question on how to find the command name for a shortcut. I mapped:
  • citation-insert to Ctrl+Alt+C 
  • href-insert to Ctrl+K
  • label-insert to Ctrl+Alt+L
  • statistics to Ctrl+Alt+T
See also the Lyx wiki page on shortcuts.

Merge and filter bibliographies 

Answers to a question on merging 2 bibtex file recommend using Jabref file / import into current database.

Create a shorter bibliography with only cited documents from another bibliography Extracting only the BibTex records relevant to the paper. Answers recommend using bibtool or bibexport.

I have written an article in Lyx which cites 2 bibliographies and a colleague asked to have a .tex version of this article. Here is my recipe:
  1. Use Jabref to merge two bibliographies. 
  2. Export Lyx file to LaTeX (PDFLaTeX).
  3. Edit the .tex file to insert a link to the merged bibliographies. 
    1. \phantomsection\addcontentsline{toc}{section}{\refname}\bibliography{long.bib} 
  4. Compile the .tex with TeXstudio. 
  5. Used bibexport to extract a short bibliography corresponding only to the citations used in one article.
bibexport -o short.bib article.aux
  1. Replace long.bib by short.bib in the .tex file.
I have done it 3 times already. Could I create a script that does all this automatically?

Spell checker

A spell checker can be activated under Tools / preferences / language settings / spellchecker. On my system the only spell check engine available is Enchant, "a wrapper that provides automatic access to many spell checking libraries (ispell, aspell, hspell, hunspell, etc.)".

The button displaying a red-underlined A can be activated to spell check continuously.

Language of document

Under Document / Settings / Language, it's possible to change the document language, which will have an impact on the language used for spell checking. In my system compiling a document in French returned an error:
"Package babel Error: Unknown option `french'."
This stackexchange post suggest installing the texlive-lang-french Debian package. This fixed the error.

Math mode

Press CTRL+M at the cursor or on highlighted text to enter math mode. Latex syntax "\beta", "x_t" can be used to enter equations and is displayed directly as math symbols.

Change input encoding

This question on "how do I prevent  Lyx from loading inputenc?" explains where this part of the Latex preamble is defined.
In Document > Settings..., under Language > Encoding, choose "Other" and select "Language Default (no inputenc)".

Instead I replaced Latin1 by utf8 as input encoding. This works for the moment.

Encoding issues and Greek symbols used outside math mode

"Encoding file 'lgrenc.def' not found"  was fixed by installing the debian package texlive-lang-greek (as explained here on stackoverflow). Other errors appeared related to the use of beta signs outside math mode. As one user explained on stackoverflow, placing \beta signs in math mode usually fixes the issue.

Hyperlinks in PDF

How to change hyperlink colors in Lyx
Document > Settings > PDF Properties under the Additional options field. For example, a custom set of instructions may be the following: linkcolor=black, urlcolor=blue, citecolor=green, pdfstartview={FitH}, hyperfootnotes=false, unicode=true 
I changed marineblue to blue, as my system complained that marineblue was not available.

The lyx function that creates hyperlinks is called href-insert. You can create a CTRL+K shortcut that inserts hyperlink under Tools / preferences / Editing / Shortcuts / "Cursor, Mouse and Editing functions" by inserting the function name: href-insert.

Presentation template, called beamer

The beamer theme and color theme can be changed by inserting this in Document / Seetings / Latex preamble:
\usetheme{Boadilla}
\usecolortheme{orchid}
An overview of themes is available in the beamer theme matrix.
Here is an explanation on how to change to any other RGB colour. Beware the definition of RGB colours are unusual in Latex, divide values by 255.
\definecolor{darkgreen}{rgb}{.2,.4,.2}
\usecolortheme[named=darkgreen]{structure}

Slide numbering can be added to the beamer template. By inserting this code block in Document / Settings / latex preamble:
\addtobeamertemplate{navigation symbols}{}{%
    \usebeamerfont{footline}%
    \usebeamercolor[fg]{footline}%
    \hspace{1em}%
    \insertframenumber/\inserttotalframenumber
}

Tables too wide for a slide can be resized by placing the table between the following ERT. If there is a table float, this instruction has to be placed inside the table float.


}

 Elsevier article template

Elsevier provides a latex template for articles called elsarticle which is usable in Lyx. However compiling with this template doesn't seem to work when using the hyperref package with the general option "automatically fill header". After deactivating this option, compilation works again.

The elsarticle template uses numerical references by default. it is possible to activate author year reference according to this stack exchange answer.

 Tips

  • CTRL+D uses yap, a faster (on windows) previewer than the default PDF one. There is no noticeable difference on Linux.

To produce a meeting report with a wider template than the default article.
  • I would have liked to find an overview of the templates with screenshots of PDF results. But I couldn't find such overview online. LyX wiki has a list of examples and a list of layouts
  • In fact all that was needed was to edit the Document \ Settings \ Page margins, I set the inner and out margins to 2 cm and was happy with the result.

knitr and Lyx

Export to Microsoft Word compatible format RTF

Under File / Export / Rich Text Format, it's possible to export Lyx files to a format compatible with Microsoft Word.
My Lyx file uses many packages and formatting options. Export to RTF didn't work and returned the error:
"An error occurred while running: latex2rtf"
I made a copy in a temporary directory (/tmp) and removed external packages and options in the latex preamble, before performing the conversion. Then the conversion to RTF worked.

Compare different versions of a document

Use Tools/Compare to highlight differences between a new and an old document. Then show changes in output to show changes in red and blue in the PDF output.
Alternatively, you can export the Lyx files to Latex and use Latexdiff as such:
latexdiff draft.tex revision.tex > diff.tex

Avoid complex tables or figures in the diff, as they can cause issues. I had to create a fake old documents with final figures and table for latexdiff to return a usable output.

Monday, December 16, 2013

R display an estimated equation as a latex formula with coefficients in parenthesis

In a Rnw file this function can display an estimated equation in latex form with coefficients in parenthesis.

Wednesday, December 11, 2013

MySQL commands

As root, create a database and grant permissions to a new user

To connect to the mysql client as root
mysql -u root -p
You'll be prompted for the password.

How to create a new user
CREATE USER 'newuser'@'localhost' IDENTIFIED BY 'password';
Create a database
create database databasename
Connect to a database
connect databasename
 Grant permissions to a user
GRANT ALL PRIVILEGES ON * . * TO 'newuser'@'localhost';
Then you can log out of mysql
quit

As a user 

To connect to the mysql client as a user
mysql -u username -p
If you have the correct privileges, you can also create a database with a pipe. How to create a database
echo "create database databasename" | mysql -u username -p
Load a dump
cat file.sql | mysql -u username -p databasename

Graphical client

On Ubuntu, I used mysql navigator by shi bok jang.
On Debian, I used  mysql workbench

More on users

Change password, as the user itself or as root:
SET PASSWORD FOR 'Karl'@'localhost' = PASSWORD('cleartext password');
As root , list all users
SELECT User FROM mysql.user;
As root , delete a user
drop user Rasdfas@localhost;
MySQL documentation on adding user accounts.

More on databases

Commands below work if your shell user name is the same as your mysql user name. If it's dufferent add -u username to the command.

List all databases (in mysql client)
show databases
Delete a database (in the mysql client)
DROP database databasename;
Rename a database  (in the shell)
mysqladmin -p create new_database
mysqldump -p old_database | mysql -p new_database
After you have verified that everything is in order
drop database old_database
Move a table from one database to another
mysqldump -p database_1 table_name | mysql -p database_2 
Back up only part of a database with the where option
mysqldump -p -w"productcode=440799" tradeflows raw_flow_yearly > sawnwood99raw.sql
Back up only the structure of a database (not the data)
mysqldump -p -d tradeflows > tradeflows.sql
Load a dump into a database :
cat file.sql | mysql -u username -p databasename

Configuration file my.cnf

It's not desirable to share user name and password in software disseminated over the internet. User name, password and database names can be placed under groups in the configuration file. For example for a given project enter this group in ~/.my.cnf
[project_name]
user = user
password = password
host = localhost
database = dbname
Then simply specify group=project_name to the client that accesses the database.

Data Visualisation Tools

A list of Data visualisation tools I've tried.

Desktop tools
  • R with ggplot2 package
  • Excel with pivot tables and charts
Web tools:
Sample websites:
Data storage tools:

Tuesday, December 03, 2013

Presentation with Beamer and Rnw

Copied from this post by Paul Hiemstra quoting a presentation by Yihui Xie.
There was a slight mistake in this presentation made with beamer and Rnw file. The code chunk options were not quoted properly. I corrected this in the code below and now it works.


\documentclass{beamer}
% Inspiration from
% http://www.r-bloggers.com/r-and-presentations-a-basic-example-of-knitr-and-beamer/

\begin{document}

\title{A Minimal Demo of knitr}
\author{Yihui Xie}

\maketitle

\begin{frame}[fragile]
You can test if \textbf{knitr} works with this minimal demo. OK, let's
get started with some boring random numbers:

<>=

set.seed(1121)
(x=rnorm(20))
mean(x);var(x)
@
\end{frame}

\begin{frame}[fragile]
The first element of \texttt{x} is \Sexpr{x[1]}. Boring boxplots
and histograms recorded by the PDF device:

<>=
## two plots side by side (option fig.show=hold)
boxplot(x)
hist(x,main='')
@
\end{frame}

\begin{frame}[fragile]
Plots
<>=

## two plots side by side (option fig.show=hold)
boxplot(x)
hist(x,main='')
@
\end{frame}

\end{document}

Sunday, November 17, 2013

HP printer on Ubuntu Linux

I used the add printer dialogue on Ubuntu Linux but I didn't manage to install a HP deskjet 2514 printer. Following a recommendation read on Ubuntu-fr doc I installed the HPLIP tools (HP Linux Imaging and Printing). And it worked, I can now print and scan from Ubuntu on this cheap Deskjet printer. Once HPLIP is installed, new HP printers can be installed using the command: hp-setup.

Printers installed on the local computer can be accessed through: http://localhost:631/printers/

Next step is to plug the printer to my internet router to enable printing through the wifi. My internet router has a windows printer sharing capability. On Lubuntu I needed to install the smb client to be able to access a printer shared on a Windows network.


-----
HPLIP Installer ver. 5.1
[...]
Enable the universe/multiverse repositories. Also be sure you are using the Ubuntu "Main" Repositories. See: https://help.ubuntu.com/community/Repositories/Ubuntu for more information.  Disable the CD-ROM/DVD source if you do not have the Ubuntu installation media inserted in the drive.  During the install process you will be added to the lp and lpadmin group, please quit the installer before the setup stage, log out, log back in, and run hp-setup to complete the install.

Wednesday, November 06, 2013

Code documentation

"I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature."
—Donald Knuth, “Literate Programming

Inspired  roxygen

Further quotes from Literate Programming by Donald Knuth:
"first [...], I thought that I would be designing a language for “top-down” programming, where a top-level description is given [...] and successively refined. On the other hand I knew that I often created major parts of programs in a “bottom-up” fashion, starting with the definitions of basic procedures and data structures and gradually building more and more powerful subroutines. I had the feeling that top-down and bottom-up were opposing methodologies: one more suitable for program exposition and the other more suitable for  program creation.  But after gaining experience with WEB, I have come to realize that there is no need to choose once and for all between top-down and bottom-up, because a program is best thought of as a web instead of a tree. A hierarchical structure is present, but the most important thing about a program is its structural relationships.
[...]
Thus, WEB may be only for the subset of computer scientists who like to write and to explain what they are doing. "

Friday, October 25, 2013

Why R


R is an open source statistical software. Many package enable statistical data analysis.

Task views

 R in the field of forest and environmental economics:

Graphics

In the press

"R first appeared in 1996, when the statistics professors Ross Ihaka and Robert Gentleman of the University of Auckland in New Zealand released the code as a free software package. [...] Some statisticians who took an early look at the software considered it rough around the edges. But despite its shortcomings, R immediately gained a following with people who saw the possibilities in customizing the free software. "
Examples of R usage by David Smith:
  • R is used frequently by The New York Times to produce consistently great infographics.
  • Google uses R to calculate the ROI on advertising campaigns.

Courses

Stanford university offers a course on Machine Learning in R.




Code golf

R performed quite well in a decimal to roman number conversion challenge. R code is shorter than python or perl implementation but longer than common lisp. 

See also:

Git commands

See also my other posts labelled git:
I've started using GIT to track file modifications and I followed this advice to set it up:  create a repository.
I use git both on windows and on Ubuntu / Debian GNU-Linux.

The commands I've used to upload content to github.com/paul4forest/forestproductsdemand are:
git remote add origin https://github.com/paul4forest/forestproductsdemand
git pull origin master
git add
git commit -m "Explanatory message"
git push origin master
Alternatively "git commit -a"" is a replacement for "git add"" and "git commit". What is the difference between pull and clone: "I like to think of 'clone' as "make me a local copy of that repo" and 'pull' as "get me the updates from some specified remote."

The commands to setup a fresh repository from bitbucket :
mkdir /path/to/your/project
cd /path/to/your/project
git init
git remote add origin ssh://git@bitbucket.org/username/bbreponame.git
# Upload and set the local changes as upstream
git push -u origin master
See also this discussion on why do I need to set upstream?

Commands to copy an existing repository from bitbucket :
 git clone git@bitbucket.org/username/bbreponame.git

Go back in time 

Display the modification log
git log 
Display the log of a particular branch (after a fetch for example)
git log origin/master
Display a compact log for one file or one directory only
git log --abbrev-commit --pretty=oneline path_to_file
Identify the commit identity in the log and copy its sha number. Then to go back to this state for the whole folder: 
git reset --hard commit_sha
To go back to this state for only one file, see git checkout
git checkout commit_hash  path_to_file/file_name
No commit hash to get to get the file back to the latest commit.

Chekout the older revision of a file under a new name

git show commit_sha:filename > new_file_name
See also alias and git grep below.

 Help

Get help on a command (will start a web browser):
git init --help

Configure user name and email

Display your user name, email and remote repositories
git config -l
To change username and email
git config --global user.name "Your Name"
git config --global user.email you@example.com
Setting your email in git explains how to change the email for the current repository only.

Branching

To start work in a new branch:
git branch new_branch_name
git checkout
new_branch_name
To compare a file between 2 branches:
git diff branch1 branch2 file_name
To merge changes back to the master branch:
git checkout master
git merge branch1
If there were conflicts, they will be presented in this way:
"The area where a pair of conflicting changes happened is marked with markers <<<<<<<, =======, and >>>>>>>. The part before the ======= is typically your side, and the part afterwards is typically their side."

I might need to delete a branch at some point:
git branch -d branchname
Delete a remote branch (stackoverflow question)
git push --delete origin temp
Deleting your master branch.

If I am on a detached head, it is recommended to create a temporary branch (stackoverflow).
git branch temp
git checkout temp
git add -a
git  commit -m "description of changes"
git checkout master
git merge temp
Delete uncommitted changes in current working directory:
git checkout branch_name .
See also below git clean.

Add minor change to the previous commit (git commit --amend):
git commit --amend

Tagging

Creating an annotated tag
git tag -a v1.4 -m 'my version 1.4'
You can add a tag after the fact. To tag an earlier commit, specify the commit checksum or part of it:
git log --pretty=oneline
git tag -a v1.2 -m 'version 1.2' 9fceb02
Delete a tag
git tag -d tag_name

A regular push command won't push a tag (bitbucket), to push all your tags :
git push origin --tags

Display changes

To view modified files that have not been committed and to view commit history you can use: 
git status
git log

git log --pretty=oneline
Shows the changes between the working directory and the index.
git diff

Shows the changes between the index and the HEAD
git diff --cached
Shows all the changes between the working directory and HEAD
git diff HEAD
The 3 lines above were copied from this question on git diff.


Show when the tip of branches have been updated
git reflog
 Alternatively, call the repository browser with:
gitk
To view a shorter version of the log file, and get an idea at where I am in the history:
git log --graph --decorate --all --pretty=oneline
You can define an alias for git log as explained be Fred here:
git config --global alias.lg "log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr)%C(bold blue)<%an>%Creset' --abbrev-commit"
The new alias can then be used with
git lg
Use tags to specify important points in history, such as software versions.

Working with files

Get back a file to the last commit 
git checkout path_to_file/file_name
Get back a file to a previous commit, using the commit hash
git checkout 4fb987f175210c09daaa4d0240070ffc9641120b path_to_file/file_name
Rename a file
git mv file_name file_name_new
Change the case of a file on a windows FAT 32 system:
git mv load.r load2.R
git mv load2.R load.R
Sometimes the vi editor starts. To exit the vi editor:
ESC:q!
If a file or folder has been renamed outside of git, I get this warning:
$ git add .
warning: You ran 'git add' with neither '-A (--all)' or '--ignore-remo
whose behaviour will change in Git 2.0 with respect to paths you removed
Paths like 'docs/efi/efi_logo_rgb_small_siw.jpg' that are
removed from your working tree are ignored with this version of Git.

* 'git add --ignore-removal ', which is the current default,
  ignores paths you removed from your working tree.

* 'git add --all ' will let you also record the removals.
Therefore I think I should always run "git add --all "

Remove local (untracked) files from my current Git branch
Show what will be deleted with the -n option:
git clean -f -n
Then - beware: this will delete files - run:
git clean -f
Alternatively clean in interactive mode:
git clean -i

Search text

 Search all files in the subdirectory "subdir" for lines containing the words "factor" and "item". Show 2 lines of context (2 leading and 2 trailing lines).
git grep -e item --and -e factor -C 2 -- subdir/
Stackoverflow:  How to search committed code in the git history?

Bulk replace strings

Use git grep to replace strings in many files in the directory :
git grep -l 'original_text' | xargs sed -i 's/original_text/new_text/g'

Save local modification temporarily

# I had edited the current file in between so needed to use
git stash # save local modifications away
git checkout __commit__hash__
# do some stuff in there ...
# Get back to most recent version of the code
git checkout branch_name
git stash pop # reload local modifications

.gitignore

To ignore all files in a folder but not the folder itself. Put this .gitignore into the folder, then git add .gitignore
*
!.gitignore
 To exclude everything except a specific directory foo/bar (note the /* - without the slash, the wildcard would also exclude everything within foo/bar):
    /*
    !/foo
    /foo/*
    !/foo/bar

Remote

When a repository is connected to several remote repositories, to change the default git remote, push with :
git push -u origin master
Then later push of that branch to that remote can be made simply with:
git push

Another command without specifying the remote and the branch
$ git push -u
fatal: The current branch master has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin master

After I run this set upstream flag, I can push to the remote server. Then I get this message

[...]
 * [new branch]      master -> master
Branch master set up to track remote branch master from origin.
I'll have to figure out what this does.

Using the gh-branch to publish project documentation on github

SO Answer to the question "How to add a git repo as a submodule of itself? (Or: How to generate GitHub Pages programmatically?)": An alternative to using Git Submodules to generate GitHub Pages is to use Git Subtree Merge Strategy.

In fact I didn't use quite that strategy and I instead cloned a temporary copy of my repository. Created the gh-page  branch. Pushed it to github. Then I went back to the original repository (where I have a few large untracked data files I find handy to keep for analyses purposes).

Then within the inst folder, I cloned only the gh-branch. To clone only one branch:
git clone -b mybranch --single-branch git://sub.domain.com/repo.git
Then I renamed the folder to "web", so that I had a inst/web folder, tracking the gh-branch. inst/web is ignored in the main repository.

References

Presentations:
Workflow:
Turorial:

Friday, September 20, 2013

Export R ggplot as windows meta file for powerpoint

I want to make a poster with graphs imported from R, using the ggplot package.
  1. I could export the images as png, but they look pixelised on a big file. 
  2. I would have liked to import SVG but it's not possible in microsoft powerpoint. 
  3. win.metafile() is the solution
Took me a while to figure out that the windows metafile is also a scalable graphic format. It can be used as such:
win.metafile("file name.emf",width=17, height=9) # width and height in inches,
# You can divide by 72 to have the size in pixel
ggplot() + geom_line(data=dtf, aes(...))
dev.off()

Thursday, August 22, 2013

Export Excel files with multiple sheets from R using XLSX

require(xlsx)
wb = createWorkbook()

# Create a first sheet
sheet  = createSheet(wb, sheetName="NewSheet.1")
rows = createRow(sheet, rowIndex=1:2)
cells = createCell(rows, colIndex=1:8)
setCellValue(cells[[1,1]],"Text")
setCellValue(cells[[1,2]],-0.1)
setCellValue(cells[[1,3]],"=B1+1")

# Create another sheet
sheet  = createSheet(wb, sheetName="NewSheet.2")
rows = createRow(sheet, rowIndex=1:2)
cells = createCell(rows, colIndex=1:8)
setCellValue(cells[[1,1]],"Text")

# Save Excel file and show me the names of the sheets that have been created
saveWorkbook(wb, "trying out.xlsx")
names(getSheets(wb) )

XLSX Package on CRAN

XLSX is based on the Apache  POI project. I still need to figure out  how to evaluate formulas in the Excel file. There may be some help at the Apache POI formula evaluation page.

Tuesday, July 09, 2013

Latex to word

My thesis co-director recommended me to use LaTeX, so I produced a document in LaTeX and exported it to a PDF. But one of the persons I'm collaborating with wants to comment on a word file. I tried latex2rtf but the output doesn't take into account my tables formatted with tabularx and the bibliography, so the reviewer will be missing on part of the document. Latex2rtf is not working for me so I'll be using the export option of Adobe Acrobat.

I might also try Pandoc.

Friday, March 08, 2013

Installing RExcel

I wanted to install RExcel from the cran package.
The think is that you need this packages installed before:
  • rscproxy 2.0-5
  • rcom 2.2-5
For the next install, If I need to reinstall RExcel for someone else, I think I'll try using this RandFriends package. Which installs R together with the required extension for running RExcel.

Thursday, February 28, 2013

60 tabs opened in Firefox

I can leave my computer on, just put it on pause for the night. Firefox browser obviously remembers all tabs previously opened, this handy when I want to comeback to something I was reading the day before.

Sometimes I overdo it, like today I had 60 tabs opened: 

Looks like I need to "flush" tabs down, which I did :)

There is a study on "How many tabs do people use?" by Patrick Dubroy.