I love everything() from the tidyverse

It is probably true that I love everything from the tidyverse, full stop, but in this case the parens matter. Reordering columns comes up a lot and it can be a pain, but everything() makes it easy: inside select(), it effectively stands in for all the columns you haven’t already named. Let’s give it a go.

How I’d do it in Python

Say I have some data on rainfall and I want to add a column for the decade in which the reading was taken (Source: tidytuesday).

import pandas as pd
pd.set_option("display.max_columns", 20)

df = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-07/rainfall.csv')

df["decade"] = df.year // 10 * 10

df.head()
##    station_code city_name  year  month  day  rainfall  period quality    lat  \
## 0          9151     Perth  1967      1    1       NaN     NaN     NaN -31.96   
## 1          9151     Perth  1967      1    2       NaN     NaN     NaN -31.96   
## 2          9151     Perth  1967      1    3       NaN     NaN     NaN -31.96   
## 3          9151     Perth  1967      1    4       NaN     NaN     NaN -31.96   
## 4          9151     Perth  1967      1    5       NaN     NaN     NaN -31.96   
## 
##      long                        station_name  decade  
## 0  115.79  Subiaco Wastewater Treatment Plant    1960  
## 1  115.79  Subiaco Wastewater Treatment Plant    1960  
## 2  115.79  Subiaco Wastewater Treatment Plant    1960  
## 3  115.79  Subiaco Wastewater Treatment Plant    1960  
## 4  115.79  Subiaco Wastewater Treatment Plant    1960

If I create a new column, it is tacked onto the end of the dataframe, but I’d prefer the decade and year columns to be closer together. In Python, I find this to be kind of a pain since you need to know the column positions and you have to do list conversions and junk (insert is also an option, but it won’t accept a column name that already exists, so you have to drop the column first…it’s a whole deal):

cols = list(df.columns)
# manual ordering of list...ewww
cols = cols[:3] + [cols[-1]] + cols[3:-1]

df[cols].head()
##    station_code city_name  year  decade  month  day  rainfall  period quality  \
## 0          9151     Perth  1967    1960      1    1       NaN     NaN     NaN   
## 1          9151     Perth  1967    1960      1    2       NaN     NaN     NaN   
## 2          9151     Perth  1967    1960      1    3       NaN     NaN     NaN   
## 3          9151     Perth  1967    1960      1    4       NaN     NaN     NaN   
## 4          9151     Perth  1967    1960      1    5       NaN     NaN     NaN   
## 
##      lat    long                        station_name  
## 0 -31.96  115.79  Subiaco Wastewater Treatment Plant  
## 1 -31.96  115.79  Subiaco Wastewater Treatment Plant  
## 2 -31.96  115.79  Subiaco Wastewater Treatment Plant  
## 3 -31.96  115.79  Subiaco Wastewater Treatment Plant  
## 4 -31.96  115.79  Subiaco Wastewater Treatment Plant
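For completeness, here is a minimal sketch of the insert route mentioned above. It uses a small made-up dataframe rather than the real rainfall data (the values are placeholders, not readings from the dataset), and it assumes column names are unique. DataFrame.insert raises an error if the column name already exists, so the column is popped off first and then put back where I want it.

```python
import pandas as pd

# Toy stand-in for the rainfall data; these values are made up for illustration.
df = pd.DataFrame({
    "city_name": ["Perth", "Perth", "Sydney"],
    "year": [1967, 1972, 1985],
    "month": [1, 2, 3],
    "rainfall": [0.0, 1.2, 3.4],
})

# As in the post, the new column lands at the end of the dataframe.
df["decade"] = df["year"] // 10 * 10

# pop() removes the column and returns it, so insert() will not
# complain about a duplicate column name.
decade = df.pop("decade")

# Look up the position of "year" by name instead of hard-coding an index,
# then insert "decade" right after it (insert modifies df in place).
df.insert(df.columns.get_loc("year") + 1, "decade", decade)

print(list(df.columns))
```

If the column does not exist yet, the pop step can be skipped by passing the computed values straight to insert, which avoids the tack-on-then-move dance but still requires looking up a position.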

How Tidyverse does it

I find it much easier to work with column names than with positions. In comes everything() to the rescue. Combined with select(), you can get pretty fancy.

library(tidyverse)

rainfall <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-07/rainfall.csv') %>% 
  mutate(decade = year %/% 10 * 10)
# drop it right into place
rainfall %>% 
  select(station_code:year, decade, everything())
## # A tibble: 179,273 x 12
##    station_code city_name  year decade month day   rainfall period quality   lat
##    <chr>        <chr>     <dbl>  <dbl> <chr> <chr>    <dbl>  <dbl> <chr>   <dbl>
##  1 009151       Perth      1967   1960 01    01          NA     NA <NA>    -32.0
##  2 009151       Perth      1967   1960 01    02          NA     NA <NA>    -32.0
##  3 009151       Perth      1967   1960 01    03          NA     NA <NA>    -32.0
##  4 009151       Perth      1967   1960 01    04          NA     NA <NA>    -32.0
##  5 009151       Perth      1967   1960 01    05          NA     NA <NA>    -32.0
##  6 009151       Perth      1967   1960 01    06          NA     NA <NA>    -32.0
##  7 009151       Perth      1967   1960 01    07          NA     NA <NA>    -32.0
##  8 009151       Perth      1967   1960 01    08          NA     NA <NA>    -32.0
##  9 009151       Perth      1967   1960 01    09          NA     NA <NA>    -32.0
## 10 009151       Perth      1967   1960 01    10          NA     NA <NA>    -32.0
## # … with 179,263 more rows, and 2 more variables: long <dbl>,
## #   station_name <chr>

The reason I love everything() is you don’t have to think about it. Get the thing in place, then shove the rest on and it’ll deal with it nicely. I also learned after writing this that last_col() is available to help out.
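For comparison, the same "name the front, shove the rest on" idea can be roughly approximated in pandas. This is only a sketch: select_then_everything is a hypothetical helper of my own naming, not part of pandas, and it assumes column names are unique. The toy dataframe again uses made-up values.

```python
import pandas as pd


def select_then_everything(df, first):
    """Return df with the columns in `first` up front, followed by all
    remaining columns in their original order.

    Hypothetical helper, not a pandas API; assumes unique column names.
    """
    first = list(first)
    rest = [col for col in df.columns if col not in first]
    return df[first + rest]


# Toy stand-in for the rainfall data; these values are made up for illustration.
df = pd.DataFrame({
    "city_name": ["Perth", "Perth", "Sydney"],
    "year": [1967, 1972, 1985],
    "month": [1, 2, 3],
    "rainfall": [0.0, 1.2, 3.4],
})
df["decade"] = df["year"] // 10 * 10

# A label slice with .loc is inclusive on both ends, which gives a rough
# analogue of the city_name:year style of range used in select().
front = list(df.loc[:, "city_name":"year"].columns) + ["decade"]
reordered = select_then_everything(df, front)

print(list(reordered.columns))
```

It works, but it is a helper I have to carry around myself, which is sort of the point: in the tidyverse the equivalent comes built in.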

Image Credit

Heart by Zach Bogart from the Noun Project

Zach Bogart
Data Explorer

Science, Design, & Data. I’ll know it when I see it.
