Street Trees in San Francisco, CA

Introduction

This is an analysis of the trees planted and maintained along the streets of San Francisco, California. The data were acquired from DataSF and are updated daily. The link to the data: https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq.

This report examines the following three questions.
1. What are the most common tree species on San Francisco streets and in what quantity?
2. How does the DBH (diameter at breast height) of Juniperus chinensis change with age, and how are its values distributed?
3. How do the DBH values compare for the top three most prevalent species planted?

Data Import and First Look

Library imports:

library(ggplot2)
library(readr)
library(knitr)
library(magrittr)
library(lubridate)
library(dplyr)

Data imports:

df <- read_csv("Street_Tree_List.csv",
    # %I (12-hour clock) with %p (AM/PM marker) parses both AM and PM timestamps
    col_types = cols(PlantDate = col_datetime(format = "%m/%d/%Y %I:%M:%S %p")))
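The PlantDate format string has to match the timestamp text in the CSV. A format such as "%H:%M:%S AM" treats "AM" as literal text: a timestamp like 12:00:00 AM is read as noon, and a PM timestamp would not match at all. A minimal base-R sketch of the 12-hour alternative, assuming PlantDate strings look like "01/30/2018 12:00:00 AM" (the two sample strings are made up). Base R reads %p according to the current locale, so this assumes an English or C locale.

```r
# Hypothetical timestamps in the "MM/DD/YYYY hh:mm:ss AM/PM" layout assumed for PlantDate
raw <- c("01/30/2018 12:00:00 AM", "01/30/2018 03:15:00 PM")

# %I is the hour on a 12-hour clock and %p the AM/PM marker, so midnight
# and afternoon times both land on the right 24-hour value
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

print(format(parsed, "%Y-%m-%d %H:%M"))
# "2018-01-30 00:00" "2018-01-30 15:15"
```

readr’s col_datetime() accepts the same %I and %p directives, so the same format string can be used in read_csv().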

Let’s take a quick look at the data, isolating the variables we will be using:

simpledf <- df %>% select(PlantType, qSpecies, DBH, PlantDate)
kable(head(simpledf))

PlantType	qSpecies	DBH	PlantDate
Tree	Eriobotrya deflexa :: Bronze Loquat	3	2018-01-30 12:00:00
Tree	Tristaniopsis laurina :: Swamp Myrtle	3	2018-01-30 12:00:00
Tree	Pinus radiata :: Monterey Pine	NA	NA
Tree	Tristania conferta ::	3	2018-01-30 12:00:00
Tree	Acacia melanoxylon :: Blackwood Acacia	17	NA
Tree	Eriobotrya deflexa :: Bronze Loquat	3	2018-01-30 12:00:00

Data Analysis

Question 1: What are the most common tree species on San Francisco streets and in what quantity?

First, let’s count how many distinct species are cataloged in this dataset.

length(unique(simpledf$qSpecies))

## [1] 565

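That’s a lot of species! Note that this count treats the placeholder "::" (used for records with no species name, as the frequency table below shows) as one of the species.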
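A count that leaves out the "::" placeholder and any missing values could look like the base-R sketch below. The function name count_species() is hypothetical, and the input vector is made up to stand in for simpledf$qSpecies.

```r
# Hypothetical helper: number of distinct species names,
# ignoring the "::" placeholder and missing values
count_species <- function(species) {
  distinct <- unique(species)
  length(distinct[!is.na(distinct) & distinct != "::"])
}

# Made-up vector standing in for simpledf$qSpecies
example <- c("Acacia baileyana :: Bailey's Acacia", "::", NA,
             "Acacia cyclops :: Cyclops wattle", "::",
             "Acacia baileyana :: Bailey's Acacia")

print(count_species(example))  # 2
```

On the real data the call would be count_species(simpledf$qSpecies).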

Next, we tabulate how many trees of each species are planted on SF streets.

# table() already drops NA values by default
x <- as.data.frame(table(simpledf$qSpecies))
kable(head(x))

Var1	Freq
::	1912
Abutilon hybridum :: Flowering maple	5
Acacia baileyana :: Bailey’s Acacia	832
Acacia baileyana ‘Purpurea’ :: Purple-leaf Acacia	21
Acacia cognata :: River Wattle	27
Acacia cyclops :: Cyclops wattle	16

Looking through the full data in x, we can see that the most common species is Platanus x hispanica :: Sycamore: London Plane, with an impressive 11,489 trees. Next, we import the SF_Top_Ten dataset, which contains only the common names and frequency counts of the ten most common species.

SF_Top_Ten <- read_csv("SF_Top_Ten.csv")
kable(head(SF_Top_Ten))

Species	Frequency
Sycamore: London Plane	11489
New Zealand Xmas Tree	8702
Brisbane Box	8494
Victorian Box	7030
Swamp Myrtle	6975
Cherry Plum	6698
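The Deviations note at the end explains that SF_Top_Ten.csv was made by hand. A table like it could also be built in code, assuming every qSpecies value follows the "scientific name :: common name" pattern seen above. A base-R sketch; the helpers common_name() and top_n_species() are hypothetical, and the input is made up:

```r
# Hypothetical helper: keep the text after the last "::" (the common name) and trim spaces.
# An entry with no common name, such as "Tristania conferta ::", becomes "".
common_name <- function(q) trimws(sub("^.*::", "", q))

# Hypothetical helper: a Species/Frequency table like SF_Top_Ten from a qSpecies vector
top_n_species <- function(q, n = 10) {
  q <- q[!is.na(q) & q != "::"]              # drop missing values and the placeholder
  counts <- sort(table(q), decreasing = TRUE)
  top <- head(counts, n)
  data.frame(Species = common_name(names(top)),
             Frequency = as.integer(top),
             stringsAsFactors = FALSE)
}

# Made-up input standing in for simpledf$qSpecies
example <- c("Platanus x hispanica :: Sycamore: London Plane",
             "Platanus x hispanica :: Sycamore: London Plane",
             "Metrosideros excelsa :: New Zealand Xmas Tree",
             "::")
print(top_n_species(example, n = 2))
```

On the real data, top_n_species(simpledf$qSpecies) would take the place of reading SF_Top_Ten.csv, provided the naming pattern holds throughout.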

Using this table, we can construct a bar plot of the ten most common tree species and their frequencies.

ggplot(data = SF_Top_Ten, aes(x = reorder(Species, -Frequency), y = Frequency)) +
  geom_col(fill = "steelblue") +   # geom_col() is shorthand for geom_bar(stat = "identity")
  theme_minimal() +
  labs(x = "Tree species", y = "Number of tree individuals",
       title = "Top ten tree species planted on the streets of San Francisco") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

This graph shows that nearly 12,000 London Plane Sycamores are planted on SF streets. The count drops sharply for the second most frequent tree, the New Zealand Christmas tree, with 8,702 trees: roughly 2,800 (about 24%) fewer.
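The gaps between consecutive ranks can be computed directly from the six frequencies printed in the SF_Top_Ten table above. A short base-R sketch:

```r
# Frequencies of the six most common species, copied from the SF_Top_Ten table above
freq <- c(11489, 8702, 8494, 7030, 6975, 6698)
names(freq) <- c("Sycamore: London Plane", "New Zealand Xmas Tree", "Brisbane Box",
                 "Victorian Box", "Swamp Myrtle", "Cherry Plum")

# Drop from each species to the next one in the ranking
gap <- unname(-diff(freq))
# Relative drop from first to second place, as a percentage
pct_drop <- round(100 * (freq[[1]] - freq[[2]]) / freq[[1]], 1)

print(gap)       # 2787  208 1464   55  277
print(pct_drop)  # 24.3
```

The first gap is larger than any of the later ones, which is what makes the Sycamore stand out in the bar plot.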

Question 2: How does the DBH (diameter at breast height) of Juniperus chinensis change with age, and how are its values distributed?

We first filter simpledf to isolate the Juniperus chinensis records, then remove any rows with missing DBH or PlantDate values.

Juniperdf <- simpledf %>%
  filter(qSpecies == "Juniperus chinensis :: Juniper") %>%
  filter(!is.na(DBH), !is.na(PlantDate))  # drop rows missing DBH or PlantDate

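Next, we compute each tree’s age as the number of days between its planting date and today’s date. Because this relies on today(), the ages, and the plot below, shift depending on when the report is run.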

# as.numeric() turns the difftime into a plain number of days for plotting
Juniperdf$age <- as.numeric(today() - as.Date(Juniperdf$PlantDate))
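Subtracting two Date objects yields a difftime rather than a plain number, which ggplot2 may not know how to scale automatically. A base-R sketch of the same age calculation as a plain number of days; the helper age_in_days() is hypothetical, the dates are made up, and a fixed reference date stands in for today() so the result is reproducible:

```r
# Hypothetical helper: age in days between planting and a reference date
age_in_days <- function(plant_date, reference_date) {
  as.numeric(difftime(as.Date(reference_date), as.Date(plant_date), units = "days"))
}

# Made-up planting dates; in the report the reference would be today()
plant <- as.Date(c("2020-01-01", "2019-01-01"))
print(age_in_days(plant, as.Date("2020-01-11")))  # 10 375
```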

ggplot() +
  geom_point(data = Juniperdf, mapping = aes(x = age, y = DBH), color = "darkgreen") +
  labs(y = "DBH in cm", x = "Age in days", title = "Change in DBH as Juniper ages") +
  ylim(0, 30)  # points with DBH above 30 are dropped from this plot

For reference, 5,000 days is roughly 13.7 years and 15,000 days is about 41 years. There is a slight upward trend in DBH as the Junipers age, but most trees stay below 15 cm in diameter at breast height. Because the y-axis is limited to 0 to 30 cm, trees with a DBH above 30 cm are left out of this plot.
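To put a number on the upward trend, one option is a least-squares line of DBH on age in years. A base-R sketch, assuming age is stored as a numeric number of days; the helper names are hypothetical, and the toy data is constructed to have an exact slope, so it only demonstrates the mechanics and says nothing about the real Juniper records:

```r
# Convert age in days to years (365.25 days per year accounts for leap years)
days_to_years <- function(days) days / 365.25

# Hypothetical helper: slope of a straight-line fit of DBH on age, in DBH units per year
dbh_trend_per_year <- function(age_days, dbh) {
  fit <- lm(dbh ~ days_to_years(age_days))
  unname(coef(fit)[2])
}

print(round(days_to_years(c(5000, 15000)), 1))  # 13.7 41.1

# Toy data built so that DBH grows by exactly 0.5 per year
toy_age <- c(1, 5, 10, 20) * 365.25
toy_dbh <- 2 + 0.5 * days_to_years(toy_age)
print(dbh_trend_per_year(toy_age, toy_dbh))  # 0.5
```

On the real data the call would be dbh_trend_per_year(Juniperdf$age, Juniperdf$DBH). A straight line is only a rough summary; it does not capture whether growth slows as trees get older.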

Next, we look at the distribution of Juniper DBH values to see where most diameters fall.

ggplot() +
  geom_histogram(data = Juniperdf, mapping = aes(x = DBH), color = "darkblue", fill = "lightblue") +
  labs(title = "Diameter at Breast Height (cm) distribution for Juniper", y = "Count", x = "DBH (cm)")

This graph shows that most planted Juniper trees have a DBH of roughly 17 cm or less, with a few outliers above 60 cm.
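Thresholds read off a histogram are approximate. A base-R sketch that computes them directly, assuming a numeric DBH vector; the helper summarise_dbh() is hypothetical and the values are made up:

```r
# Hypothetical helper: median, share at or below a cutoff, and maximum of a DBH vector
summarise_dbh <- function(dbh, cutoff = 17) {
  dbh <- dbh[!is.na(dbh)]
  c(median = median(dbh),
    share_at_or_below = mean(dbh <= cutoff),
    max = max(dbh))
}

# Made-up DBH values, including one large outlier and a missing value
example_dbh <- c(4, 6, 8, 10, 12, 15, 20, 62, NA)
print(summarise_dbh(example_dbh))
```

On the real data, summarise_dbh(Juniperdf$DBH) would give the exact share of Junipers at or below 17 cm.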

Question 3: How do the DBH values compare for the top three most prevalent species planted?

For this analysis, we first filter the data to isolate the top three species: London Plane Sycamore, New Zealand Christmas tree, and Brisbane Box. We then use a facet grid to compare their DBH values side by side.

topthree <- simpledf %>%
  filter(qSpecies %in% c("Platanus x hispanica :: Sycamore: London Plane",
                         "Metrosideros excelsa :: New Zealand Xmas Tree",
                         "Lophostemon confertus :: Brisbane Box"))

ggplot() +
  geom_histogram(data = topthree,
                 mapping = aes(x = DBH, fill = qSpecies), bins = 20) +
  facet_grid(. ~ qSpecies) +
  scale_x_log10() +  # DBH values of 0 cannot be shown on a log scale and are dropped
  labs(title = "Distribution of DBHs for the top three most common species in SF streets",
       y = "Tree individuals", x = "DBH in cm") +
  scale_fill_discrete(name = "Species")
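The faceted histograms compare the shapes of the distributions; a per-species median adds a single number to compare. A base-R sketch using tapply(), assuming a data frame with qSpecies and DBH columns like topthree; the rows below are invented:

```r
# Invented rows standing in for the topthree data frame (qSpecies and DBH columns)
topthree_example <- data.frame(
  qSpecies = c(rep("Platanus x hispanica :: Sycamore: London Plane", 3),
               rep("Metrosideros excelsa :: New Zealand Xmas Tree", 3),
               rep("Lophostemon confertus :: Brisbane Box", 2)),
  DBH = c(8, 12, 30, 5, 9, NA, 10, 16)
)

# Median DBH per species; na.rm = TRUE is passed through tapply() to median()
median_dbh <- tapply(topthree_example$DBH, topthree_example$qSpecies,
                     median, na.rm = TRUE)
print(sort(median_dbh, decreasing = TRUE))
```

On the real data the call would be tapply(topthree$DBH, topthree$qSpecies, median, na.rm = TRUE).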

Conclusion

We learned several things from this dataset.

First, we learned which of the 565 recorded species names were the most common on the streets of SF. The most common species, the London Plane Sycamore, has almost 12,000 trees planted, an enormous number!

Second, we looked at the age span of Juniper trees and their DBH values. DBH trends slightly upward as the trees age, but most trees stay below 15 cm in diameter. The histogram of Juniper DBH values also showed where most trees’ DBH values fall.

Third, we compared the DBH values of the three most common species. For all three species, DBH values clustered near 10 cm, and the plot also showed how many more Sycamores there are than trees of the other two species.

Deviations: My only deviation was creating and importing the SF_Top_Ten.csv file. It was an enormous help: it let me simplify the species names so they were readable and record their frequencies easily. I tried for a long time to make these adjustments in R, but decided that for so few values (10), the best approach was to create my own CSV file with just the simple information I needed.

Thank you very much for all your help in making this project!