Introduction

This is an analysis of the trees that have been planted and are maintained on the streets of San Francisco, California. The data comes from DataSF and is updated daily. The link to the data: https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq.

The three questions I will be examining in this report are as follows.
1. What are the most common tree species on San Francisco streets and in what quantity?
2. How do the DBH (diameter at breast height) values of Juniperus chinensis change with age, and what does their distribution look like?
3. How do the DBH values compare for the top three most prevalent species planted?

Data Import and First Look

Library imports:
library(ggplot2)
library(readr)
library(knitr)
library(magrittr)
library(lubridate)
library(dplyr)
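Data imports:
# PlantDate is parsed as a datetime. The literal "AM" in the format means
# any timestamp that does not end in "AM" fails to parse and becomes NA.
df <- read_csv("Street_Tree_List.csv", 
    col_types = cols(PlantDate = col_datetime(format = "%m/%d/%Y %H:%M:%S AM")))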
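
As a side note, the format string above treats "AM" as fixed text, so midnight is displayed as 12:00:00. A minimal sketch of an alternative is shown here, assuming the raw file stores PlantDate as month/day/year with a 12-hour clock (the value of raw below is a made-up example in that layout, not a row taken from the file). It uses mdy_hms() from lubridate, which recognises the AM/PM marker.

```r
library(lubridate)

# Hypothetical example value in the month/day/year 12-hour layout
raw <- "01/30/2018 12:00:00 AM"

# mdy_hms() recognises the AM/PM marker, so 12:00:00 AM is read as midnight
parsed <- mdy_hms(raw)

# The calendar date is all that the later age calculation needs
plant_date <- as.Date(parsed)
plant_date
```

Since only the date part is used later in this report, either way of parsing gives the same ages for the AM timestamps.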

Let’s take a quick look at the data, isolating the variables we will be using:

simpledf <- df %>% select(PlantType, qSpecies, DBH, PlantDate)
kable(head(simpledf))
| PlantType | qSpecies | DBH | PlantDate |
|---|---|---|---|
| Tree | Eriobotrya deflexa :: Bronze Loquat | 3 | 2018-01-30 12:00:00 |
| Tree | Tristaniopsis laurina :: Swamp Myrtle | 3 | 2018-01-30 12:00:00 |
| Tree | Pinus radiata :: Monterey Pine | NA | NA |
| Tree | Tristania conferta :: | 3 | 2018-01-30 12:00:00 |
| Tree | Acacia melanoxylon :: Blackwood Acacia | 17 | NA |
| Tree | Eriobotrya deflexa :: Bronze Loquat | 3 | 2018-01-30 12:00:00 |

Data Analysis

Question 1: What are the most common tree species on San Francisco streets and in what quantity?

First we will see how many total species are cataloged in this dataset.

length(unique(simpledf$qSpecies))
## [1] 565

That’s a lot of species!

Next, we can take a glimpse at the frequency of each planted tree species in SF streets.

# table() drops NA values by default, so no na.rm argument is needed
x <- as.data.frame(table(simpledf$qSpecies))
kable(head(x))
| Var1 | Freq |
|---|---|
| :: | 1912 |
| Abutilon hybridum :: Flowering maple | 5 |
| Acacia baileyana :: Bailey’s Acacia | 832 |
| Acacia baileyana ‘Purpurea’ :: Purple-leaf Acacia | 21 |
| Acacia cognata :: River Wattle | 27 |
| Acacia cyclops :: Cyclops wattle | 16 |

Looking through the data in x, we can see that the most common species is the Sycamore: London Plane, with 11,489 individual trees, an impressive number. Next, we will import the SF_Top_Ten dataset, which contains solely the top ten species by common name and their frequency counts.

SF_Top_Ten <- read_csv("SF_Top_Ten.csv")
kable(head(SF_Top_Ten))
| Species | Frequency |
|---|---|
| Sycamore: London Plane | 11489 |
| New Zealand Xmas Tree | 8702 |
| Brisbane Box | 8494 |
| Victorian Box | 7030 |
| Swamp Myrtle | 6975 |
| Cherry Plum | 6698 |

Using this we can construct a barplot of the top ten tree species and their frequency.

ggplot(data = SF_Top_Ten, aes(x = reorder(Species, -Frequency), y = Frequency)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  theme_minimal() +
  labs(x = "Tree species", y = "Number of tree individuals",
       title = "Top ten tree species planted on the streets of San Francisco") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Looking at this graph, we see that there are roughly 11,500 London Plane Sycamore trees planted on SF streets. There is a sizable drop of about 2,800 trees when moving to the second most frequent species, the New Zealand Christmas tree.
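
For reference, a table like SF_Top_Ten could probably also be built directly in R. The sketch below is only one possible approach, and it runs on a small made-up data frame (toy) rather than the real simpledf. It assumes that the common name is whatever follows the first "::" in qSpecies, and that entries with no common name should be dropped. Note that counting by common name would merge any species that happen to share one.

```r
library(dplyr)

# Made-up stand-in for simpledf, only to show the steps
toy <- data.frame(
  qSpecies = c(
    "Platanus x hispanica :: Sycamore: London Plane",
    "Platanus x hispanica :: Sycamore: London Plane",
    "Lophostemon confertus :: Brisbane Box",
    "Tristania conferta ::",
    NA
  ),
  stringsAsFactors = FALSE
)

top_species <- toy %>%
  filter(!is.na(qSpecies)) %>%
  # Keep only the text after the first "::" as the common name
  mutate(Species = trimws(sub("^.*?::", "", qSpecies, perl = TRUE))) %>%
  # Drop entries that have no common name
  filter(Species != "") %>%
  # Count trees per common name, most frequent first
  count(Species, name = "Frequency", sort = TRUE) %>%
  # Keep at most the ten most frequent names
  slice_head(n = 10)

top_species
```

Replacing toy with simpledf should give a table with the same two columns as SF_Top_Ten.csv, although the counts would need to be checked against the hand-made file.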

Question 2: How do the DBH (diameter at breast height) values of Juniperus chinensis species change with age?

We are going to filter the simpledf data to isolate only the Juniperus chinensis records. We will also remove any NA values from the DBH and PlantDate columns.

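# Keep only the Juniper records and drop rows with missing values,
# saving the result so the NA rows are actually removed
Juniperdf <- simpledf %>%
  filter(qSpecies == "Juniperus chinensis :: Juniper") %>%
  na.omit()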

From here we will find the elapsed time between planting date and today’s date.

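# Age in days as a plain number, so ggplot treats it as a continuous axis
Juniperdf$age <- as.numeric(today() - as.Date(Juniperdf$PlantDate), units = "days")
ggplot() +
  geom_point(data = Juniperdf, mapping = aes(x = age, y = DBH), color = "darkgreen") +
  labs(y = "DBH in cm", x = "Age in days", title = "Change in DBH as Juniper ages") +
  ylim(0, 30)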

It’s important to note that 5,000 days is roughly 13.7 years and 15,000 days is around 41 years. We can see that there is a slight upward trend in DBH as Juniper ages, but most trees don’t grow over 15 cm in diameter at breast height.
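
The conversion from days to years can be checked with a quick calculation. This is a small sketch that assumes an average year length of 365.25 days.

```r
# Ages in days, as read off the x-axis of the scatter plot
age_days <- c(5000, 15000)

# Convert to years, assuming 365.25 days per year on average
age_years <- round(age_days / 365.25, 1)
age_years
## [1] 13.7 41.1
```

The same division could be applied to the age column to plot age in years instead of days.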

Next, we can also look at the distribution of DBH values for our Juniper species to see where most diameters fall.

ggplot() +
  # bins = 30 is the geom_histogram() default, stated here explicitly
  geom_histogram(data = Juniperdf, mapping = aes(x = DBH), bins = 30,
                 color = "darkblue", fill = "lightblue") +
  labs(title = "Diameter at Breast Height (cm) distribution for Juniper",
       y = "Count", x = "DBH (cm)")

This graph shows us that the majority of planted Juniper trees are roughly 17 cm in diameter or less, but there are some outliers extending above 60 cm in diameter.
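
A reading like this from a histogram can be backed up with a couple of numbers. The sketch below uses a made-up vector of DBH values (dbh), not the real Juniper data, to show how quantiles and the share of trees at or below a cutoff could be computed.

```r
# Made-up DBH values, only to show the calculation
dbh <- c(3, 5, 8, 10, 12, 15, 17, 20, 35, 62)

# Median, upper quartile and 90th percentile of the diameters
dbh_quantiles <- quantile(dbh, probs = c(0.5, 0.75, 0.9), na.rm = TRUE)
dbh_quantiles

# Share of trees with a DBH of 17 or less
share_small <- mean(dbh <= 17, na.rm = TRUE)
share_small
## [1] 0.7
```

With the real data, dbh would be replaced by Juniperdf$DBH.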

Question 3: How do the DBH values compare for the top three most prevalent species planted?

To do this analysis, we must first filter our data so we isolate the top three species - London Plane Sycamore, New Zealand Christmas tree, and Brisbane Box. From there we can create a facet grid, comparing the DBH values of these species.

topthree <- simpledf %>%
  filter(qSpecies %in% c("Platanus x hispanica :: Sycamore: London Plane",
                         "Metrosideros excelsa :: New Zealand Xmas Tree",
                         "Lophostemon confertus :: Brisbane Box"))

ggplot()+
  geom_histogram(data= topthree, 
                 mapping = aes(x=DBH, fill=qSpecies), bins= 20) +
  facet_grid(. ~ qSpecies) +
  scale_x_log10() +
  labs(title= "Distribution of DBHs for the top three most common species in SF streets", y="Tree individuals", x= "DBH in cm")+
  scale_fill_discrete(name = "Species")
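
To put numbers next to a faceted histogram like this one, the DBH values can also be summarised per species. This is a minimal sketch on a made-up data frame (toy_topthree) with placeholder species names; the column names n_trees and median_DBH are chosen here just for illustration.

```r
library(dplyr)

# Made-up stand-in for topthree, with placeholder species names
toy_topthree <- data.frame(
  qSpecies = c("A :: Alpha", "A :: Alpha", "A :: Alpha", "B :: Beta", "B :: Beta"),
  DBH = c(4, 10, NA, 12, 20),
  stringsAsFactors = FALSE
)

dbh_summary <- toy_topthree %>%
  group_by(qSpecies) %>%
  summarise(
    n_trees = n(),                          # number of records per species
    median_DBH = median(DBH, na.rm = TRUE), # typical diameter, ignoring NA
    .groups = "drop"
  )

dbh_summary
```

Running the same summary on topthree would give one row per species, which makes the comparison in the plot easier to state in words.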

Conclusion

There are many things we have learned from this dataset.

First, we learned which species, out of 565, were the most common on the streets of SF. We also saw that the most common species, the London Plane Sycamore, has about 11,500 trees planted, which is an enormous number!

Second, we saw the age span of Juniper trees and their respective DBH values. There is a slight upward trend in DBH as age increases, but most trees stay under 15 cm in diameter. Additionally, the histogram of DBH values for Juniper gave us a good idea of where most trees’ DBH values fall.

Third, we got a comparison of the DBH values of the top three most common species planted. Across species, the trees’ DBH values were mostly clustered near 10 cm, and we also saw the difference in the number of trees between the London Plane Sycamore and the other two species.

Deviations: The only deviation I made was creating and importing the SF_Top_Ten.csv file. It was an enormous help to be able to simplify the species names so they were readable and to record their frequencies easily. I tried for a long time to make these adjustments in R, but decided that for so few values (10), the best way to work with the data was to create my own CSV file with just the needed information.

Thank you very much for all your help in making this project!