I really like developing software and making my own life and work easier with it. But what I enjoy even more is seeing others actually use it! So every now and then I look at the CRAN download counts of my R packages. I'm not in any top-10 rankings or anything. But that was also never the point. I just like sharing my knowledge and seeing others use it!
Everyone is talking about AI at the moment. So when I talked to my colleagues Mariken and Kasper the other day about how to make teaching R more engaging and how to help students overcome their problems, it was no big surprise that the conversation eventually found its way to the large language model GPT-3.5 by OpenAI and its chat interface ChatGPT. Its advantages for learning R (or any programming language) are rather obvious:
I have tried to venture into Python several times over the years. The language itself seems simple enough to learn but as someone who has only ever used R (and a bit of Stata), there were two things that held me back:
I never really found an IDE that I liked. I tried a few different ones, including Spyder and Jupyter Notebook (not technically an IDE), and compared to RStudio and R Markdown they felt rather limited.
As an example, let’s make different versions of a simple plot and let the user decide which one to display. First I make the plots and save them in a sub-directory:
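A minimal sketch of that first step in base R. It assumes the built-in `mtcars` data, three hypothetical colour variants and a sub-directory called `plots`; none of these names come from the original post.

```r
# Hypothetical example: three colour variants of one simple scatter plot,
# each saved as a PNG file in the sub-directory "plots"
dir.create("plots", showWarnings = FALSE)

colours <- c("steelblue", "firebrick", "darkgreen")
for (col in colours) {
  png(file.path("plots", paste0("plot_", col, ".png")), width = 600, height = 400)
  plot(mtcars$wt, mtcars$mpg, col = col, pch = 19,
       xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
       main = paste("Colour:", col))
  dev.off()  # close the device so the file is written
}

# One file per variant, ready for the user to choose from
list.files("plots")
```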
R 4.0.0 was released on 2020-04-24. Among the many changes, two stand out for me: First, R now uses stringsAsFactors = FALSE by default, which is especially welcome when reading in data (e.g., via read.csv) and when constructing data.frames. The second change that caught my eye was that all packages need to be reinstalled under the new version.
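A quick check of the new default, assuming R >= 4.0.0; the toy data are made up for illustration.

```r
# In R >= 4.0.0, character columns stay characters by default
df <- data.frame(x = c("a", "b", "c"))
class(df$x)
#> [1] "character"

# The old R 3.x behaviour can still be requested explicitly
df_old <- data.frame(x = c("a", "b", "c"), stringsAsFactors = TRUE)
class(df_old$x)
#> [1] "factor"

# The same default applies when reading data with read.csv()
csv <- read.csv(text = "id,name\n1,Alice\n2,Bob")
class(csv$name)
#> [1] "character"
```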
Reinstalling every package can be rather cumbersome if you have collected a large number of them on your machine while using R 3.
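One common way to handle this, sketched here as an assumption rather than as the approach from the original post: save the names of all non-base packages from the old installation, then reinstall whatever is missing under R 4. The file name `r3_packages.rds` is hypothetical.

```r
# Step 1: run in the old R 3.x installation before upgrading.
# priority = "NA" selects packages without a priority, i.e. it
# excludes the base and recommended packages that ship with R.
old_pkgs <- rownames(installed.packages(priority = "NA"))
saveRDS(old_pkgs, "r3_packages.rds")

# Step 2: run in the new R 4.x installation.
old_pkgs <- readRDS("r3_packages.rds")
# Only reinstall packages that are not in the new library yet
to_install <- setdiff(old_pkgs, rownames(installed.packages()))
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

Packages that were installed from GitHub or Bioconductor are not on CRAN, so `install.packages()` will not find them; those need to be reinstalled separately.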
Today I was struggling with a relatively simple operation: unnest() from the tidyr package. What it's supposed to do is pretty simple. When you have a data.frame where one or more columns are lists, you can unnest these columns, duplicating the values in the other columns whenever an element has a length greater than 1.
```r
library(tibble)
df <- tibble(
  a = LETTERS[1:5],
  b = LETTERS[6:10],
  list_column = list(c(LETTERS[1:5]), "F", "G", "H", "I")
)
df
## # A tibble: 5 x 3
##   a     b     list_column
##   <chr> <chr> <list>
## 1 A     F     <chr [5]>
## 2 B     G     <chr [1]>
## 3 C     H     <chr [1]>
## 4 D     I     <chr [1]>
## 5 E     J     <chr [1]>
library(tidyr)
unnest(df, list_column)
## # A tibble: 9 x 3
##   a     b     list_column
##   <chr> <chr> <chr>
## 1 A     F     A
## 2 A     F     B
## 3 A     F     C
## 4 A     F     D
## 5 A     F     E
## 6 B     G     F
## 7 C     H     G
## 8 D     I     H
## 9 E     J     I
```

I came across this a lot while working on data from Twitter, since individual tweets can contain multiple hashtags, mentions, URLs and so on, which is why they are stored in lists.
I'm happy to announce that rwhatsapp is now on CRAN. After a year of testing by users on GitHub, I decided it is time to make the package available to a wider audience. The goal of the package is to make working with 'WhatsApp' chat logs as easy as possible.
'WhatsApp' seems to be becoming increasingly important not just as a messaging service but also as a social network, thanks to its group chat capabilities.
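A minimal usage sketch, assuming rwhatsapp is installed from CRAN and still ships a `sample.txt` chat log in its `extdata` folder. `rwa_read()` parses a chat log into a data frame with one row per message.

```r
library(rwhatsapp)

# Path to the example chat log bundled with the package (assumed location)
history <- system.file("extdata", "sample.txt", package = "rwhatsapp")

# Parse the log: one row per message, with time stamp, author and text
chat <- rwa_read(history)
head(chat)
```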
Some time ago, I saw a presentation by Wouter van Atteveldt in which he showed that wordclouds aren't necessarily stupid. I was amazed, since wordclouds were one of the first things I ever did in R and they are still often shown in introductions to text analysis. But the way they are usually done is, in fact, not very informative. Because the positions of the individual words in the cloud do not mean anything, the only information is conveyed through the font size and sometimes the font colour of the words.
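One simple way to make position carry meaning, sketched in base R: place each word by how strongly it leans towards one of two corpora (x-axis) and how often it occurs overall (y-axis), keeping font size only as a secondary cue. This is an illustrative substitute, not necessarily the approach from the presentation, and the two mini-corpora are hypothetical toy texts.

```r
# Two hypothetical toy "corpora" (made up purely for illustration)
text_a <- "tax cuts tax growth jobs growth market market market"
text_b <- "climate jobs climate health care health climate growth"

# Tokenise on whitespace
tok_a <- strsplit(text_a, "\\s+")[[1]]
tok_b <- strsplit(text_b, "\\s+")[[1]]

# Count every word in both corpora over a shared vocabulary
words <- sort(unique(c(tok_a, tok_b)))
count_a <- as.numeric(table(factor(tok_a, levels = words)))
count_b <- as.numeric(table(factor(tok_b, levels = words)))

stats <- data.frame(
  word = words,
  total = count_a + count_b,
  # log2 ratio with +1 smoothing: > 0 leans towards corpus A, < 0 towards B
  log_ratio = log2((count_a + 1) / (count_b + 1))
)

# Position now carries information; font size merely repeats the frequency
png(file.path(tempdir(), "meaningful_wordcloud.png"), width = 700, height = 450)
plot(stats$log_ratio, stats$total, type = "n",
     xlim = range(stats$log_ratio) * 1.2,
     ylim = c(0, max(stats$total) + 1),
     xlab = "log2 ratio (corpus A vs. corpus B)", ylab = "Total frequency")
text(stats$log_ratio, stats$total, labels = stats$word,
     cex = 0.8 + stats$total / max(stats$total))
dev.off()
```

Words near the left or right edge are characteristic of one corpus, words near the centre are shared, and words higher up are more frequent overall.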
My PhD supervisor once told me that everyone doing newspaper analysis starts by writing code to read in files from the ‘LexisNexis’ newspaper archive. However, while I do recommend this exercise, not everyone has the time.
These are the first words of the introduction to my first R package, LexisNexisTools. My PhD supervisor was also my supervisor for my master's dissertation, and he said these words before he gave me my very first book about R.