If you are going to be doing quantitative analysis of any kind then you should write using a good text editor. The same can be said, I’d argue, for working with any highly structured document subject to a lot of revision, such as a scholarly paper. Text editors are different from word processors. Unlike applications such as Microsoft Word, text editors generally don’t make a big effort to make what you write look as though it is being written on a printed page. Instead, they focus on working with text efficiently, while keeping it in a plain and portable format, as opposed to binary file formats like .docx. Figure 1 shows an example.
Text editors can also help you where word processors are not much use. If you are writing code to do some statistical analysis, for instance, then at a minimum a good editor will highlight keywords and operators in a way that makes the code more readable. Typically, it will also passively signal to you when you’ve done something wrong syntactically (such as forgetting a closing brace or semicolon or quotation mark), and automagically indent or tidy up your code as you write it. More advanced editors can work with a linter to more actively check and flag stylistic or syntactical errors as you go. If you are writing a scholarly paper or a dissertation that incorporates data of any sort, and especially numerical data, a good text editor can make it easier to maintain control of things. Just as the actual numbers are crunched by your stats program, not your text editor, the typesetting of your paper is handled by a specialized application, too. That tool should automatically take care of things like entries in your bibliography, the labelling of tables and figures, and cross-references and other paraphernalia. The best editors can closely integrate with the tools you use to do the various pieces of your work.
Emacs is a text editor, in the same way the blue whale is a mammal. It does the things I have just described, and rather more besides, if you want it to. Combining Emacs with some other applications and add-ons allows you to manage writing and data-analysis effectively. If it seems odd to do a bunch of different tasks inside an editor, the blogger Rekado makes a useful analogy to the way people use web browsers:
While very powerful and flexible, Emacs can be annoying. Indeed, to many people encountering it for the first time—especially those used to standard applications on Windows or Mac OS—its conventions seem bizarre and byzantine. As applications go, Emacs is quite ancient. The first version was written by Richard Stallman in the 1970s. Because it evolved in a much earlier era of computing (before the development of decent graphical displays, for instance, and window managers, and possibly also fire), it doesn’t share many of the conventions of modern applications. Like most powerful text editors, Emacs offers many opportunities to waste your time learning its particular conventions, tweaking its settings, and generally customizing it. There are several good alternatives on each major platform.
Given all that, why mention it in the first place? Partly because it’s the editor I use. Partly because it is available for all of the main desktop and laptop computing platforms. And partly because it is very, very good at doing what I want it to do. There are many good reasons to use something like TextMate or Sublime Text instead of Emacs. Similarly, when doing data analysis with R, you may just want to use the RStudio environment. You will do fine if you go with these alternatives.
When you write papers in plain text, how do you manage the formatting, sectioning, and other related aspects of your document? Markdown is a loosely-standardized way of writing plain text that includes information about the formatting of your document. It was originally developed by John Gruber, with input from Aaron Swartz. The aim was to make a simple format that could incorporate structural information about the document (such as headings and subheadings, emphasis, hyperlinks, lists, footnotes, and so on), with minimal loss of readability. Formats like HTML or TeX are much more extensive markup languages, but Markdown was meant to be simple. Over the years it has become a de facto standard. Text editors and note-taking applications support it, and tools exist to convert Markdown not just into HTML (its original target output format) but many other document types as well. Listing 1 shows the markdown source for this paragraph and its subheading.
Listing 1: The Markdown source for a nearby part of this document.
# Use Markdown

When you write papers in plain text, how do you manage the formatting, sectioning, and other related aspects of your document? [Markdown](http://en.wikipedia.org/wiki/Markdown) is a loosely-standardized way. It was originally developed by John Gruber, with input from Aaron Swartz. The aim was to make a simple format that could incorporate structural information about the document (such as headings and subheadings, *emphasis*, [hyperlinks](http://daringfireball.net/markdown), lists, footnotes, and so on), with minimal loss of readability. Formats like HTML or TeX are much more extensive markup languages, but Markdown was meant to be simple. Over the years it has become a *de facto* standard. Text editors and note-taking applications support it, and tools exist to convert Markdown not just into HTML (its original target output format) but many other document types as well. @lst:markdown-example shows the markdown source for this paragraph and its subheading.
The excerpt in Listing 1 shows a few of the most common Markdown conventions, most notably how it represents headings and subheadings (a # symbol for a top-level header, with ## for the next level down, and so on), how it represents hyperlinks, and how it emphasizes text. There are a number of Markdown variants, or “flavors”, that have extended it to manage things like cross-references and labels, citations, and other textual elements. Citations are particularly important. The pandoc-citeproc filter is an add-on that handles these. It can be installed alongside pandoc. Your bibliography can be stored in one of a variety of formats (such as BibTeX, or EndNote). Within your .md document, cites are referred to by their key, prefixed with an @ symbol. When pandoc converts your document, the cite key is replaced with the reference information like this (Healy and Moody 2014), and the full bibliographic entry is included in an automatically-generated list of references. Read Pandoc’s documentation for more details about citations. At the end of the excerpt you can also see that the code listing is referred to by its label, @lst:markdown-example. A Pandoc filter named pandoc-crossref extends this @label convention to deal with labeled Figures, Tables, and so on. Using Markdown in this way means you do not have to worry whether your reference list is complete, or whether cross-references (to ‘Figure 3’ for example) remain correct after you move things around in your text.
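To make the citation and cross-reference conventions concrete, here is a minimal sketch of what a source file might look like. The file names, the cite key healy2014, and the labels fig:trend and tbl:summary are all hypothetical, invented for illustration. The label syntax follows pandoc-crossref's conventions as I understand them, so check the filter's documentation against your installed version.

```markdown
---
title: "A Hypothetical Paper"
bibliography: references.bib   # assumed file name; BibTeX is one supported format
---

<!-- All keys, labels, and file names here are made up for illustration. -->

# Introduction

Visualization matters in sociological work [@healy2014]. The trend
in @fig:trend is upward, and @tbl:summary gives the underlying numbers.

![A hypothetical trend plot.](figures/trend.png){#fig:trend}

| Year | Count |
|-----:|------:|
| 2012 |    10 |
| 2013 |    14 |

: Hypothetical summary counts. {#tbl:summary}

# References
```

With a recent Pandoc, a command along the lines of `pandoc paper.md --filter pandoc-crossref --citeproc -o paper.html` should resolve both kinds of reference. Releases before 2.11 used `--filter pandoc-citeproc` in place of `--citeproc`. The cross-reference filter needs to run before citation processing, because both rely on the @ syntax.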
You will probably be doing some—perhaps a great deal—of quantitative data analysis. R is an environment for statistical computing. It’s well-supported, continually improving, and has a very active expert-user community. The documentation that comes with the software is complete, if somewhat terse, but there are a large number of excellent reference and teaching texts that illustrate its use. These include Dalgaard (2008), Venables and Ripley (2002), Maindonald and Braun (2003), Fox (2002), Harrell (2016), Matloff (2011), and Gelman and Hill (2007). Although it is a command-line tool at its core, it can easily be used in conjunction with the RStudio IDE. You can download R from The R Project Homepage.
R can be used directly within Emacs by way of a package called ESS (for “Emacs Speaks Statistics”). As shown in Figure 2, it allows you to work with your code in one Emacs frame and a live R session in another right beside it. Because everything is inside Emacs, it is easy to do things like send a chunk of your code over to R using a keystroke. This is a very efficient way of doing interactive data analysis while building up code you can use again in future.
You’ll present your results in papers, but also in talks where you will likely use some kind of presentation software. You can use Microsoft PowerPoint or Apple’s Keynote. Or, you can produce HTML or PDF slides directly from plain text documents.[2]
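As one hedged illustration of the plain-text route, Pandoc can turn a Markdown file into slides. Pandoc is an assumption here, since the text above does not name a tool for this step. In the sketch below, the title, author, and slide contents are placeholders.

```markdown
---
title: "A Hypothetical Talk"
author: "A. N. Author"
---

<!-- Placeholder content; here each top-level heading starts a new slide. -->

# The Question

- Why write in plain text?
- What does it buy you?

# The Data

A sentence or two describing the data goes here.

# Conclusion

- Plain text is portable
- The same source can yield HTML or PDF
```

Assuming this is saved as slides.md, `pandoc -t revealjs -s slides.md -o slides.html` should produce HTML slides, and `pandoc -t beamer slides.md -o slides.pdf` should produce a PDF. The first needs the reveal.js assets to be reachable and the second needs a working LaTeX installation.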
Dalgaard, Peter. 2008. Introductory Statistics with R. Second edition. New York: Springer.
Fox, John. 2002. An R and S-Plus Companion to Applied Regression. Thousand Oaks: Sage.
Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. New York: Cambridge University Press.
Harrell, Frank. 2016. Regression Modeling Strategies. Second edition. New York: Springer.
Healy, Kieran, and James Moody. 2014. “Data Visualization in Sociology.” Annual Review of Sociology 40: 105–28.
Maindonald, John, and John Braun. 2003. Data Analysis and Graphics Using R: An Example-Based Approach. New York: Cambridge University Press.
Matloff, Norman. 2011. The Art of R Programming. San Francisco: No Starch Press.
Venables, W.N., and B.D. Ripley. 2002. Modern Applied Statistics with S. Fourth edition. New York: Springer.
[2] The actual business of giving talks based on your work is beyond the scope of this discussion. Suffice to say that there is plenty of good advice available via Google, and you should pay attention to it.