I love everything() from the tidyverse
It is probably true that I love everything from the tidyverse, full stop, but in this case the parens matter. Reordering columns comes up a lot and it can be a pain, but everything() allows for easy manipulation of columns in a dataframe. Let’s give it a go.
How I’d do it in Python
Say I have some data on rainfall and I want to add a column for the decade in which the reading was taken (Source: tidytuesday).
import pandas as pd
pd.set_option("display.max_columns", 20)
df = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-07/rainfall.csv')
# floor each year to the start of its decade, e.g. 1967 -> 1960
df["decade"] = df["year"] // 10 * 10
df.head()
## station_code city_name year month day rainfall period quality lat \
## 0 9151 Perth 1967 1 1 NaN NaN NaN -31.96
## 1 9151 Perth 1967 1 2 NaN NaN NaN -31.96
## 2 9151 Perth 1967 1 3 NaN NaN NaN -31.96
## 3 9151 Perth 1967 1 4 NaN NaN NaN -31.96
## 4 9151 Perth 1967 1 5 NaN NaN NaN -31.96
##
## long station_name decade
## 0 115.79 Subiaco Wastewater Treatment Plant 1960
## 1 115.79 Subiaco Wastewater Treatment Plant 1960
## 2 115.79 Subiaco Wastewater Treatment Plant 1960
## 3 115.79 Subiaco Wastewater Treatment Plant 1960
## 4 115.79 Subiaco Wastewater Treatment Plant 1960
If I create a new column, it is tacked onto the end of the dataframe, but I’d prefer the decade and year columns to be closer together. In Python, I find this to be kind of a pain since you need to know the column positions and convert the columns to a list to slice them by hand (insert is also an option, but the column already exists, so you have to pop or drop it first or end up with duplicate columns…it’s a whole deal):
cols = list(df.columns)
# manual ordering of list...ewww
cols = cols[:3] + [cols[-1]] + cols[3:-1]
df[cols].head()
## station_code city_name year decade month day rainfall period quality \
## 0 9151 Perth 1967 1960 1 1 NaN NaN NaN
## 1 9151 Perth 1967 1960 1 2 NaN NaN NaN
## 2 9151 Perth 1967 1960 1 3 NaN NaN NaN
## 3 9151 Perth 1967 1960 1 4 NaN NaN NaN
## 4 9151 Perth 1967 1960 1 5 NaN NaN NaN
##
## lat long station_name
## 0 -31.96 115.79 Subiaco Wastewater Treatment Plant
## 1 -31.96 115.79 Subiaco Wastewater Treatment Plant
## 2 -31.96 115.79 Subiaco Wastewater Treatment Plant
## 3 -31.96 115.79 Subiaco Wastewater Treatment Plant
## 4 -31.96 115.79 Subiaco Wastewater Treatment Plant
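To be fair to pandas, the position doesn't have to be hard-coded. Here is a minimal sketch, assuming a tiny stand-in dataframe with made-up values rather than the full rainfall data: look up where year sits by name, pop the new column off the end, and insert it right after. It still takes more ceremony than I'd like, but there are no magic numbers.

```python
import pandas as pd

# Small stand-in for the rainfall data (values are made up for illustration)
df = pd.DataFrame({
    "station_code": [9151, 9151],
    "city_name": ["Perth", "Perth"],
    "year": [1967, 1972],
    "month": [1, 2],
    "rainfall": [0.0, 1.2],
})
df["decade"] = df["year"] // 10 * 10

# pop() removes the column first, so insert() does not trip over a duplicate name
decade = df.pop("decade")

# find the position of `year` by name instead of counting columns by hand
year_pos = df.columns.get_loc("year")

# place `decade` in the slot right after `year` (modifies df in place)
df.insert(year_pos + 1, "decade", decade)

print(list(df.columns))
```

Note that insert() changes the dataframe in place, unlike the df[cols] approach above, which returns a reordered copy.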
How Tidyverse does it
I find it much easier to manipulate columns by name in the tidyverse. In comes everything() to the rescue. Combined with select(), you can get pretty fancy.
library(tidyverse)
# read the rainfall data and add a decade column (e.g. 1967 -> 1960)
rainfall <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-07/rainfall.csv') %>%
  mutate(decade = year %/% 10 * 10)
# drop it right into place
rainfall %>%
  select(station_code:year, decade, everything())
## # A tibble: 179,273 x 12
## station_code city_name year decade month day rainfall period quality lat
## <chr> <chr> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <chr> <dbl>
## 1 009151 Perth 1967 1960 01 01 NA NA <NA> -32.0
## 2 009151 Perth 1967 1960 01 02 NA NA <NA> -32.0
## 3 009151 Perth 1967 1960 01 03 NA NA <NA> -32.0
## 4 009151 Perth 1967 1960 01 04 NA NA <NA> -32.0
## 5 009151 Perth 1967 1960 01 05 NA NA <NA> -32.0
## 6 009151 Perth 1967 1960 01 06 NA NA <NA> -32.0
## 7 009151 Perth 1967 1960 01 07 NA NA <NA> -32.0
## 8 009151 Perth 1967 1960 01 08 NA NA <NA> -32.0
## 9 009151 Perth 1967 1960 01 09 NA NA <NA> -32.0
## 10 009151 Perth 1967 1960 01 10 NA NA <NA> -32.0
## # … with 179,263 more rows, and 2 more variables: long <dbl>,
## # station_name <chr>
The reason I love everything() is that you don’t have to think about it. Get the thing in place, then shove the rest on and it’ll deal with it nicely. I also learned after writing this that last_col() is available to help out.
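If you want the same "put these first, then everything else" feel back in pandas, it can be roughly approximated. This is only a sketch: select_then_everything is a hypothetical helper I made up for illustration (it is not part of pandas), and the dataframe is a one-row stand-in with made-up values.

```python
import pandas as pd


def select_then_everything(df, first):
    """Hypothetical helper: put the `first` columns up front, then the
    remaining columns in their original order (loosely like
    select(..., everything()) in dplyr)."""
    first = list(first)
    rest = [col for col in df.columns if col not in first]
    return df[first + rest]


# One-row stand-in for the rainfall data (values are made up for illustration)
df = pd.DataFrame({
    "station_code": [9151],
    "city_name": ["Perth"],
    "year": [1967],
    "month": [1],
    "day": [1],
    "decade": [1960],
})

reordered = select_then_everything(df, ["station_code", "city_name", "year", "decade"])
print(list(reordered.columns))
```

It doesn't give you range selection like station_code:year, so the leading columns still have to be spelled out, which is part of why I keep reaching for everything().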
Image Credit
Heart by Zach Bogart from the Noun Project