For this edition of TidyTuesday exploration, I’m going to give map data in R a go. Let’s look at some trees in San Francisco and see if we can plot all of them on a map.
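The code in this post refers to a data frame called `trees` that is never created in the text. Here is a minimal sketch of one way to load it, assuming the data is the `sf_trees.csv` file published in the TidyTuesday repository for 2020-01-28. The URL and the helper name `read_sf_trees` are assumptions for illustration, not something the post states.

```r
# Assumed location of the TidyTuesday San Francisco trees data (2020-01-28).
sf_trees_url <- paste0(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/",
  "master/data/2020/2020-01-28/sf_trees.csv"
)

# Hypothetical helper: read the CSV and check for the columns used later on.
read_sf_trees <- function(path = sf_trees_url) {
  df <- readr::read_csv(path)
  needed <- c("longitude", "latitude", "legal_status", "species")
  missing <- setdiff(needed, names(df))
  if (length(missing) > 0) {
    stop("Missing expected columns: ", paste(missing, collapse = ", "))
  }
  df
}
```

With that in place, `trees = read_sf_trees()` would create the data frame used below. Without a step like this, the name `trees` resolves to R’s built-in `datasets::trees` (girth, height, and volume of cherry trees), which has no coordinates at all.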
Trying A County Overlay
Let’s try plotting things overlaid on the county lines of San Francisco. Here I’m helped by a great tutorial from Eric Anderson. I’ll grab the state outline first, then pick out San Francisco County.
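library(tidyverse)  # ggplot2, dplyr, stringr; map_data() also needs the maps package installed

# State outlines, narrowed to California
states = map_data("state")
ca_df = states %>%
  filter(region == 'california')

# Base map: California outline with a fixed aspect ratio
ca_base = ca_df %>%
  ggplot(aes(long, lat, group = group)) +
  coord_fixed(1.2) +
  geom_polygon(color = 'grey', fill = NA)

# County outlines, narrowed to San Francisco County
counties = map_data("county")
san_fran = counties %>%
  filter(str_detect(subregion, "fran") & str_detect(region, "califor"))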
Here’s where San Francisco County is.
with_san_fran = ca_base +
  geom_polygon(data = san_fran, fill = NA, color = "red")
with_san_fran
We can do some zooming in to get a closer look.
with_san_fran +
  coord_fixed(xlim = c(-122.55, -122.3),
              ylim = c(37.68, 37.83),
              ratio = 1.3)
And if we add in the trees, we see they don’t line up very nicely with the county outline. I’m not sure why this is happening. After checking Google Maps, the tree locations make sense in relation to other landmarks, but the county lines seem to be off. The outline may be a little less accurate, or I may be doing something wrong in the plotting.
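with_san_fran +
  # group = NA overrides the group aesthetic inherited from the base plot,
  # since the trees data has no group column
  geom_point(data = trees, aes(longitude, latitude), group = NA,
             alpha = 0.1, color = "seagreen") +
  coord_fixed(xlim = c(-122.55, -122.3),
              ylim = c(37.68, 37.83),
              ratio = 1.3)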
New Approach: Just the Trees
Let’s ditch the county overlay and just plot the trees that make up the majority of the region. Making the circles smaller helps to reveal the detail.
sf_trees = trees %>%
  filter(between(longitude, -122.525, -122.35) &
           between(latitude, 37, 38))
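A note on the filter above: `between()` includes both endpoints, and `filter()` drops rows where the condition comes out as `NA`, so any trees with missing coordinates disappear at this step too. A small sketch with made-up coordinates (the `toy` data frame is purely illustrative):

```r
library(dplyr)

# Made-up points: one inside the box, one exactly on the western edge,
# one outside the box, and one with a missing longitude.
toy <- tibble(
  id        = 1:4,
  longitude = c(-122.40, -122.525, -121.90, NA),
  latitude  = c(37.75, 37.75, 37.75, 37.75)
)

kept <- toy %>%
  filter(between(longitude, -122.525, -122.35) &
           between(latitude, 37, 38))

kept$id  # ids 1 and 2 remain: the edge point is kept, the NA row is dropped
```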
sf_trees %>%
  ggplot(aes(longitude, latitude)) +
  geom_point(alpha = 0.1, color = "seagreen", size = 0.2) +
  coord_fixed(xlim = c(-122.525, -122.35),
              ylim = c(37.7, 37.82),
              ratio = 1.3) +
  theme_void()
Simply gorgeous!
Let’s Paint Some Happy Little Trees
Let’s see if we can color in the trees based on something. Let’s start with their legal_status.
sf_trees %>%
  count(legal_status, sort = TRUE)
## # A tibble: 10 x 2
##    legal_status                      n
##    <chr>                         <int>
##  1 DPW Maintained               140598
##  2 Permitted Site                37995
##  3 Undocumented                   8080
##  4 Significant Tree               1620
##  5 Planning Code 138.1 required    949
##  6 Property Tree                   315
##  7 Section 143                     225
##  8 Private                         156
##  9 <NA>                             54
## 10 Landmark tree                    33
Slim things down a bit to limit colors.
sf_trees %>%
  mutate(legal_status_simple = if_else(
    legal_status %in% c("DPW Maintained", "Permitted Site"),
    legal_status, "Other")) %>%
  ggplot(aes(longitude, latitude)) +
  geom_point(aes(color = legal_status_simple), alpha = 0.4, size = 0.1) +
  coord_fixed(xlim = c(-122.525, -122.35),
              ylim = c(37.7, 37.82),
              ratio = 1.3) +
  scale_color_manual(values = c("seagreen", "grey", "indianred")) +
  theme_void() +
  labs(title = "San Francisco's Trees",
       color = "Legal Status",
       caption = "zachbogart.com\nSource: TidyTuesday") +
  guides(color = guide_legend(override.aes = list(alpha = 1, size = 3)))
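A side note on the colors in the plot above: with an unnamed vector, `scale_color_manual()` hands out colors in the order of the levels, which for a character column is alphabetical ("DPW Maintained", "Other", "Permitted Site"). Naming the values makes the pairing explicit. A sketch with made-up rows (the `toy` data and the `status_colors` name are illustrative only):

```r
library(ggplot2)

# Three made-up trees, one per simplified legal status.
toy <- data.frame(
  longitude = c(-122.45, -122.42, -122.40),
  latitude = c(37.75, 37.76, 37.77),
  legal_status_simple = c("Permitted Site", "DPW Maintained", "Other")
)

# Named palette: each status is tied to its color by name, not by position.
status_colors <- c(
  "DPW Maintained" = "seagreen",
  "Other" = "grey",
  "Permitted Site" = "indianred"
)

p <- ggplot(toy, aes(longitude, latitude)) +
  geom_point(aes(color = legal_status_simple)) +
  scale_color_manual(values = status_colors)

# Inspect the color each point actually receives.
layer_data(p)$colour
```

The same named vector could be dropped into the full plot in place of the unnamed one, and the result should not change if a category is later added or renamed.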
Let’s try coloring by the type of tree, too.
sf_trees %>%
  count(species, sort = TRUE)
## # A tibble: 570 x 2
##    species                                                                     n
##    <chr>                                                                   <int>
##  1 Platanus x hispanica :: Sycamore: London Plane                          11502
##  2 Tree(s) ::                                                              10391
##  3 Metrosideros excelsa :: New Zealand Xmas Tree                            8684
##  4 Lophostemon confertus :: Brisbane Box                                    8486
##  5 Tristaniopsis laurina :: Swamp Myrtle                                    7173
##  6 Pittosporum undulatum :: Victorian Box                                   7086
##  7 Prunus cerasifera :: Cherry Plum                                         6700
##  8 Magnolia grandiflora :: Southern Magnolia                                6250
##  9 Ficus microcarpa nitida 'Green Gem' :: Indian Laurel Fig Tree 'Green G…  5623
## 10 Arbutus 'Marina' :: Hybrid Strawberry Tree                               5611
## # … with 560 more rows
sf_trees %>%
  mutate(species_simple = if_else(
    species %in% c("Platanus x hispanica :: Sycamore: London Plane",
                   "Metrosideros excelsa :: New Zealand Xmas Tree",
                   "Lophostemon confertus :: Brisbane Box",
                   "Tristaniopsis laurina :: Swamp Myrtle",
                   "Pittosporum undulatum :: Victorian Box",
                   "Prunus cerasifera :: Cherry Plum"),
    species, "Other")) %>%
  ggplot(aes(longitude, latitude)) +
  geom_point(aes(color = species_simple), alpha = 0.2, size = 0.1) +
  coord_fixed(xlim = c(-122.525, -122.35),
              ylim = c(37.7, 37.82),
              ratio = 1.3) +
  scale_color_manual(values = c("seagreen", "indianred", "grey", "sandybrown",
                                "limegreen", "firebrick", "palevioletred")) +
  theme_void() +
  labs(title = "San Francisco's Trees",
       color = "Tree Type",
       caption = "zachbogart.com\nSource: TidyTuesday") +
  guides(color = guide_legend(override.aes = list(alpha = 1, size = 3)))
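Typing out the top species by hand works, but it has to be redone if the data changes. As an alternative (not what this post does), `forcats::fct_lump_n()` keeps the n most frequent levels and lumps the rest into "Other". A sketch on made-up data with shortened species names follows. One difference from the `if_else()` version is that missing values stay `NA` rather than joining "Other".

```r
library(dplyr)
library(forcats)

# Made-up species counts: 5, 4, 3, 2 and 1 trees respectively.
toy <- tibble(
  species = c(rep("London Plane", 5), rep("Brisbane Box", 4),
              rep("Cherry Plum", 3), rep("Swamp Myrtle", 2),
              "Victorian Box")
)

# Keep the two most common species; everything else becomes "Other".
lumped <- toy %>%
  mutate(species_simple = fct_lump_n(species, n = 2, other_level = "Other"))

lumped %>%
  count(species_simple)
```

Here the two most common species keep their names and the remaining six trees are counted under "Other".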
Schweet. There seem to be some interesting concentrations, like the one along Sunset Boulevard (the vertical line on the left). Some well-known parks are also mostly missing, such as the Presidio and Golden Gate Park (a single dot in the center of the big blank rectangle). Either way, it’s a gorgeous picture of the complexity of a busy city, made just by looking at the trees that dot the landscape.
Till next time!
Learning
- Haven’t done map work in R before. coord_fixed() is important for preserving the visual representation of the data. Will need further understanding, since I was mostly going off of online tutorials.
- When working with map data, it is helpful to compare with another source to confirm things are the right way round.
- Didn’t trip me up this time, but I was reminded as I worked that people say “lat/long”, while plotting in x/y space needs the reverse: “long/lat”.
- I skimmed Eric Anderson’s tutorial on mapping in R to get me started. A helpful guide.
- Interesting problem: when plotting so many points, the desire is to crank down the opacity, but coloring points means the legend will be super faded. Learned that the legend aesthetics can be overridden, allowing large opaque circles to be used in the legend. Will be helpful in the future.
- Always looking for nice hex colors that are named, so this color site was helpful.
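On the coord_fixed() point, here is a rough sketch of where a ratio like 1.3 comes from. A degree of longitude covers less ground than a degree of latitude by a factor of cos(latitude), so the y unit needs to be drawn about 1 / cos(latitude) times longer than the x unit. The helper name `lat_long_ratio` is made up for illustration, and this is an approximation that only suits small areas like a single city.

```r
# Hypothetical helper: aspect ratio for a lat/long plot centred on a given
# latitude (in degrees). A degree of longitude shrinks by cos(latitude)
# relative to a degree of latitude, so y units are stretched by 1 / cos().
lat_long_ratio <- function(lat_deg) {
  1 / cos(lat_deg * pi / 180)
}

mid_lat <- mean(c(37.7, 37.82))    # middle of the zoomed-in window
round(lat_long_ratio(mid_lat), 2)  # about 1.26
```

That lands close to the 1.3 used in the plots. ggplot2’s coord_quickmap() applies this kind of approximation automatically and may be worth trying in place of a hand-picked ratio.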
Image Credit
twig by Zach Bogart from the Noun Project