TidyTuesday #4: Trees in San Francisco

Reveal the City, Just Through Trees

For this edition of my TidyTuesday exploration, I’m going to give map data in R a go. Let’s look at the trees in San Francisco and see if we can plot all of them on a map.
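The code in this post works on a data frame called trees. Here is a minimal sketch of one way to load it, assuming it is the San Francisco trees file from the TidyTuesday repository (week of 2020-01-28). The URL and the load_trees() helper are assumptions for illustration, so adjust them if the data lives somewhere else.

```r
library(readr)

# Assumption: the TidyTuesday file for the week of 2020-01-28.
trees_url <- paste0(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/",
  "master/data/2020/2020-01-28/sf_trees.csv"
)

# Hypothetical helper: reads the CSV from a URL or from a local path
load_trees <- function(path = trees_url) {
  read_csv(path, show_col_types = FALSE)
}

# trees <- load_trees()   # uncomment to download the full file
```

Keeping the path as an argument means the same helper also works on a local copy of the file.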

Trying A County Overlay

Let’s try plotting things overlaid on the county lines of San Francisco. Here I’m helped by a great tutorial from Eric Anderson. I’m going to get the outline of California first and then pick out San Francisco County.

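library(tidyverse)  # dplyr, ggplot2 and stringr are all used below
# map_data() also needs the maps package to be installed

states = map_data("state")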

ca_df = states %>% 
  filter(region == 'california')

ca_base = ca_df %>% 
  ggplot(aes(long, lat, group = group)) +
    coord_fixed(1.2) +
    geom_polygon(color = 'grey', fill = NA)

counties = map_data("county")

san_fran = counties %>% 
  filter(str_detect(subregion, "fran") & str_detect(region, "califor"))

Here’s where San Francisco County is.

with_san_fran = ca_base +
  geom_polygon(data = san_fran, fill = NA, color = "red")

with_san_fran

We can do some zooming in to get a closer look.

with_san_fran +
  coord_fixed(xlim = c(-122.55, -122.3), 
              ylim = c(37.68, 37.83),
              ratio = 1.3)

And if we add in the trees, we see they don’t line up very nicely with the county outline. I’m not sure why this is happening. After checking Google Maps, the tree locations make sense in relation to other landmarks, but the county lines seem to be off. The outline may be a little less accurate, or I may be doing something wrong in the plotting.

with_san_fran +
  # inherit.aes = FALSE: the tree data has no group column to inherit
  geom_point(data = trees, aes(longitude, latitude), inherit.aes = FALSE,
             alpha=0.1, color = "seagreen") +
  coord_fixed(xlim = c(-122.55, -122.3), 
              ylim = c(37.68, 37.83),
              ratio = 1.3)

New Approach: Just the Trees

Let’s ditch the county overlay and just plot the trees that make up the majority of the region. Making the circles smaller helps to reveal the detail.

sf_trees = trees %>% 
  filter(between(longitude, -122.525, -122.35) &
         between(latitude, 37, 38))
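A quick sketch of what that filter keeps, using a toy data frame with made-up coordinates (not rows from the real data). As far as I understand it, between() includes both endpoints, and filter() drops rows where the condition comes out as NA, so trees with missing coordinates disappear along with the far-away ones.

```r
library(dplyr)

# Toy rows: the coordinates are invented for illustration
toy <- tibble(
  tree_id   = 1:4,
  longitude = c(-122.45, -122.525, -121.90, NA),
  latitude  = c(37.76, 37.70, 37.75, 37.80)
)

toy_sf <- toy %>%
  filter(between(longitude, -122.525, -122.35) &
         between(latitude, 37, 38))

# Tree 2 sits exactly on the western edge and is kept,
# tree 3 is too far east, tree 4 has no longitude
toy_sf$tree_id
```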

sf_trees %>% 
  ggplot(aes(longitude, latitude)) +
  geom_point(alpha=0.1, color = "seagreen", size = 0.2) +
  coord_fixed(xlim = c(-122.525, -122.35), 
              ylim = c(37.7, 37.82),
              ratio = 1.3) +
  theme_void()

Simply gorgeous!

Let’s Paint Some Happy Little Trees

Let’s see if we can color in the trees based on something. Let’s start with their legal_status.

sf_trees %>% 
  count(legal_status, sort = TRUE)
## # A tibble: 10 x 2
##    legal_status                      n
##    <chr>                         <int>
##  1 DPW Maintained               140598
##  2 Permitted Site                37995
##  3 Undocumented                   8080
##  4 Significant Tree               1620
##  5 Planning Code 138.1 required    949
##  6 Property Tree                   315
##  7 Section 143                     225
##  8 Private                         156
##  9 <NA>                             54
## 10 Landmark tree                    33

Slim things down a bit to limit colors.

sf_trees %>% 
  mutate(legal_status_simple = if_else(
    legal_status %in% c("DPW Maintained", "Permitted Site"),
    legal_status, "Other")) %>% 
  ggplot(aes(longitude, latitude)) +
  geom_point(aes(color = legal_status_simple), alpha=0.4, size = 0.1) +
  coord_fixed(xlim = c(-122.525, -122.35), 
              ylim = c(37.7, 37.82),
              ratio = 1.3) +
  scale_color_manual(values = c("seagreen", "grey", "indianred")) +
  theme_void() +
  labs(title = "San Francisco's Trees",
       color = "Legal Status",
       caption = "zachbogart.com\nSource: TidyTuesday") +
  guides(color = guide_legend(override.aes = list(alpha=1, size = 3)))
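One thing to watch with scale_color_manual(): an unnamed values vector is matched to the categories by position, and for a character column that order is alphabetical. A hedged sketch of the same idea with a named vector, so each color is tied to its category by name. The toy rows and the status_colors name are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy rows with invented coordinates, only to make the sketch runnable
toy <- tibble(
  longitude    = c(-122.45, -122.44, -122.43, -122.42),
  latitude     = c(37.75, 37.76, 37.77, 37.78),
  legal_status = c("DPW Maintained", "Permitted Site", "Undocumented", NA)
)

# Same collapsing step as above; NA also ends up in "Other"
# because NA %in% c(...) is FALSE
toy_simple <- toy %>%
  mutate(legal_status_simple = if_else(
    legal_status %in% c("DPW Maintained", "Permitted Site"),
    legal_status, "Other"))

# Named vector: colors are matched to categories by name, not position
status_colors <- c("DPW Maintained" = "seagreen",
                   "Other"          = "grey",
                   "Permitted Site" = "indianred")

p <- ggplot(toy_simple, aes(longitude, latitude)) +
  geom_point(aes(color = legal_status_simple)) +
  scale_color_manual(values = status_colors)
```

With names in place, adding or renaming a category can’t silently shift grey onto the wrong group.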

Let’s try coloring by the type of tree, too.

sf_trees %>% 
  count(species, sort=TRUE)
## # A tibble: 570 x 2
##    species                                                                     n
##    <chr>                                                                   <int>
##  1 Platanus x hispanica :: Sycamore: London Plane                          11502
##  2 Tree(s) ::                                                              10391
##  3 Metrosideros excelsa :: New Zealand Xmas Tree                            8684
##  4 Lophostemon confertus :: Brisbane Box                                    8486
##  5 Tristaniopsis laurina :: Swamp Myrtle                                    7173
##  6 Pittosporum undulatum :: Victorian Box                                   7086
##  7 Prunus cerasifera :: Cherry Plum                                         6700
##  8 Magnolia grandiflora :: Southern Magnolia                                6250
##  9 Ficus microcarpa nitida 'Green Gem' :: Indian Laurel Fig Tree 'Green G…  5623
## 10 Arbutus 'Marina' :: Hybrid Strawberry Tree                               5611
## # … with 560 more rows

sf_trees %>% 
  mutate(species_simple = if_else(species %in% c("Platanus x hispanica :: Sycamore: London Plane", 
                                                 "Metrosideros excelsa :: New Zealand Xmas Tree",
                                                 "Lophostemon confertus :: Brisbane Box",
                                                 "Tristaniopsis laurina :: Swamp Myrtle",
                                                 "Pittosporum undulatum :: Victorian Box",
                                                 "Prunus cerasifera :: Cherry Plum"), species, "Other")) %>% 
  
  ggplot(aes(longitude, latitude)) +
  geom_point(aes(color = species_simple), alpha=0.2, size = 0.1) +
  coord_fixed(xlim = c(-122.525, -122.35), 
              ylim = c(37.7, 37.82),
              ratio = 1.3) +
  scale_color_manual(values = c("seagreen", "indianred", "grey", "sandybrown",
                                "limegreen", "firebrick", "palevioletred")) +
  theme_void() +
  labs(title = "San Francisco's Trees",
       color = "Tree Type",
       caption = "zachbogart.com\nSource: TidyTuesday") +
  guides(color = guide_legend(override.aes = list(alpha=1, size = 3)))
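Typing out the species names by hand works, but the list can also be pulled from the counts. A minimal sketch on toy data: the species names come from the table above, the counts are made up, and skipping "Tree(s) ::" assumes it is a catch-all entry rather than a real species (check the exact string in the real data).

```r
library(dplyr)

# Toy data: names from the table above, counts invented
toy <- tibble(species = c(
  rep("Tree(s) ::", 4),
  rep("Platanus x hispanica :: Sycamore: London Plane", 3),
  rep("Metrosideros excelsa :: New Zealand Xmas Tree", 2),
  "Prunus cerasifera :: Cherry Plum"
))

# Most common species, leaving out the generic "Tree(s) ::" entry
top_species <- toy %>%
  filter(species != "Tree(s) ::") %>%
  count(species, sort = TRUE) %>%
  slice_head(n = 2) %>%   # the plot above keeps six; two suits the toy data
  pull(species)

# Everything outside the top list is collapsed into "Other"
toy_simple <- toy %>%
  mutate(species_simple = if_else(species %in% top_species, species, "Other"))
```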

Schweet. There seem to be some interesting concentrations, like along Sunset Boulevard (the vertical line on the left). Some well-known parks are also mostly missing, such as the Presidio and Golden Gate Park (a single dot in the center of the big blank rectangle). Either way, it’s a gorgeous result: the complexity of a busy city, revealed just by looking at the trees that dot the landscape.

Till next time!

Learning

  • Haven’t done map work before in R. coord_fixed() is important for preserving the visual proportions of the data. I will need to understand it better, since I was mostly going off of online tutorials.
  • When working with map data, it is helpful to compare with another source to confirm things are the right way round.
  • Didn’t trip me up this time, but I was reminded as I worked that people say “lat/long”, while in x/y plotting space the order is reversed: “long/lat”.
  • I skimmed Eric Anderson’s tutorial on mapping in R to get me started. A helpful guide.
  • Interesting problem: when plotting so many points, the desire is to crank down the opacity, but coloring points means the legend will be super faded. Learned that the legend aesthetics can be overridden with guide_legend(override.aes = ...), allowing large opaque circles to be used in the legend. Will be helpful in the future.
  • Always looking for nice hex colors that are named, so this color site was helpful.
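On the coord_fixed() point, here is a rough way to see where a ratio like 1.3 comes from. One degree of longitude covers less ground than one degree of latitude, by a factor of about cos(latitude), so a ratio near 1 / cos(latitude) keeps distances roughly in proportion. A small sketch, assuming the midpoint of the ylim used in the plots above. The variable names are just for illustration.

```r
# Midpoint of the plotted latitude window, ylim = c(37.7, 37.82)
mid_lat <- mean(c(37.7, 37.82))

# cos() works in radians, so convert from degrees first
lat_long_ratio <- 1 / cos(mid_lat * pi / 180)

lat_long_ratio
```

That comes out a bit under 1.3, so the ratio used in the plots is in the right neighborhood. ggplot2 also has coord_quickmap(), which, as I understand it, works out this kind of ratio for you.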

Image Credit

twig by Zach Bogart from the Noun Project

Zach Bogart
Data Explorer

Science, Design, & Data. I’ll know it when I see it.
