This is an analysis of the trees that have been planted and are maintained on the streets of San Francisco, California. The data comes from DataSF and is updated daily. The link to the data: https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq
The three questions I will be examining in this report are as follows.
1. What are the most common tree species on San Francisco streets and in what quantity?
2. What does the distribution of DBH (diameter at breast height) values of Juniperus chinensis look like?
3. How do the DBH values compare for the top three most prevalent species planted?
library(ggplot2)
library(readr)
library(knitr)
library(magrittr)
library(lubridate)
library(dplyr)
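# Note: the literal "AM" in this format only matches AM timestamps;
# any PM values would fail to parse and come through as NA.
df <- read_csv("Street_Tree_List.csv",
               col_types = cols(PlantDate = col_datetime(format = "%m/%d/%Y %H:%M:%S AM")))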
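As a side note on the date format, here is a minimal sketch of how both AM and PM timestamps could be parsed with readr. The two sample strings are made up for illustration and are not taken from the dataset; `%I` reads a 12-hour clock and `%p` reads the AM/PM marker.

```r
library(readr)

# Two illustrative timestamps in the same layout as the PlantDate column
stamps <- c("01/30/2018 12:00:00 AM", "06/15/2017 03:30:00 PM")

# %I = hour on a 12-hour clock, %p = AM/PM indicator
parsed <- parse_datetime(stamps, format = "%m/%d/%Y %I:%M:%S %p")
parsed
```

Since this report only uses the date part of PlantDate, the time of day does not affect the results, but a format like this would avoid losing any rows recorded with a PM time.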
Let’s take a quick look at the data, isolating the variables we will be using:
simpledf <- df %>% select(PlantType, qSpecies, DBH, PlantDate)
kable(head(simpledf))
PlantType | qSpecies | DBH | PlantDate |
---|---|---|---|
Tree | Eriobotrya deflexa :: Bronze Loquat | 3 | 2018-01-30 12:00:00 |
Tree | Tristaniopsis laurina :: Swamp Myrtle | 3 | 2018-01-30 12:00:00 |
Tree | Pinus radiata :: Monterey Pine | NA | NA |
Tree | Tristania conferta :: | 3 | 2018-01-30 12:00:00 |
Tree | Acacia melanoxylon :: Blackwood Acacia | 17 | NA |
Tree | Eriobotrya deflexa :: Bronze Loquat | 3 | 2018-01-30 12:00:00 |
First we will see how many total species are cataloged in this dataset.
length(unique(simpledf$qSpecies))
## [1] 565
That’s a lot of species!
Next, we can take a glimpse at how often each tree species has been planted on SF streets.
# table() already drops NA values by default, so no na.rm argument is needed
x <- as.data.frame(table(simpledf$qSpecies))
kable(head(x))
Var1 | Freq |
---|---|
:: | 1912 |
Abutilon hybridum :: Flowering maple | 5 |
Acacia baileyana :: Bailey’s Acacia | 832 |
Acacia baileyana ‘Purpurea’ :: Purple-leaf Acacia | 21 |
Acacia cognata :: River Wattle | 27 |
Acacia cyclops :: Cyclops wattle | 16 |
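The table above is in alphabetical order, so the most common species are easier to spot after sorting by count. A minimal sketch in base R, using a small made-up vector in place of `simpledf$qSpecies` (the names `species`, `freq`, and `freq_sorted` are only for illustration):

```r
# Toy species labels standing in for simpledf$qSpecies
species <- c("A :: Alpha", "B :: Beta", "A :: Alpha",
             "C :: Gamma", "A :: Alpha", "B :: Beta")

# Count each label, then order the rows from most to least frequent
freq <- as.data.frame(table(species), stringsAsFactors = FALSE)
freq_sorted <- freq[order(-freq$Freq), ]

# Show the ten most frequent labels (here there are only three)
head(freq_sorted, 10)
```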
Looking at the data in x, we can see that the most common species is Sycamore: London Plane, with 11,489 individual trees, which is impressive. Next, we will import the SF_Top_Ten dataset, which includes only the top ten species, listed by common name, and their frequency counts.
SF_Top_Ten <- read_csv("SF_Top_Ten.csv")
kable(head(SF_Top_Ten))
Species | Frequency |
---|---|
Sycamore: London Plane | 11489 |
New Zealand Xmas Tree | 8702 |
Brisbane Box | 8494 |
Victorian Box | 7030 |
Swamp Myrtle | 6975 |
Cherry Plum | 6698 |
Using this, we can construct a bar plot of the top ten tree species and their frequencies.
ggplot(data = SF_Top_Ten, aes(x = reorder(Species, -Frequency), y = Frequency)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  theme_minimal() +
  labs(x = "Tree species", y = "Number of tree individuals",
       title = "Top ten tree species planted on the streets of San Francisco") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
Looking at this graph, we see that there are nearly 11,500 London Plane Sycamore trees planted on SF streets. The count then drops by almost 2,800 for the second most frequent tree, the New Zealand Christmas tree (8,702).
We are going to filter the simpledf data to isolate only the Juniperus chinensis records. We will also remove any rows with NA values in the DBH or PlantDate columns.
Juniperdf <- simpledf %>%
  filter(qSpecies == "Juniperus chinensis :: Juniper") %>%
  na.omit()  # assign the result so rows with NA values are actually dropped
From here we will find the elapsed time between planting date and today’s date.
Juniperdf$age <- today() - as.Date(Juniperdf$PlantDate)
ggplot() +
  geom_point(data = Juniperdf, mapping = aes(x = age, y = DBH), color = "darkgreen") +
  labs(y = "DBH in cm", x = "Age in days", title = "Change in DBH as Juniper ages") +
  ylim(0, 30)
It’s important to note that 5,000 days is roughly 13.7 years and 15,000 days is around 41 years. We can see that there is a slight upward trend in DBH as Juniper ages, but most trees don’t grow beyond 15 cm in diameter at breast height.
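The conversion from days to years can be checked with a couple of lines of arithmetic. This is just a quick sketch; dividing by 365.25 is my assumption for the average length of a year, and the names `days` and `years` are illustrative.

```r
# Ages in days taken from the x-axis of the scatter plot
days <- c(5000, 15000)

# Convert to years, using 365.25 days per year to allow for leap years
years <- days / 365.25
round(years, 1)
```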
Next, we can also look at the distribution of DBH values for our Juniper species to see where most diameters fall.
ggplot() +
  geom_histogram(data = Juniperdf, mapping = aes(x = DBH), color = "darkblue", fill = "lightblue") +
  labs(title = "Diameter at Breast Height (cm) distribution for Juniper", y = "Count", x = "DBH (cm)")
This graph shows us that the majority of planted Juniper trees are roughly 17 cm in diameter or less, but there are some outliers extending above 60 cm.
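A reading like this from a histogram can also be backed up with numbers. The sketch below uses a handful of made-up DBH values in place of `Juniperdf$DBH`, so the results it prints are not results from the real data; the 17 cm threshold is the one mentioned above.

```r
# Toy DBH values (cm) standing in for Juniperdf$DBH
dbh <- c(3, 5, 6, 8, 10, 12, 14, 17, 25, 62)

# Median and 90th percentile of the DBH values
quantile(dbh, probs = c(0.5, 0.9), na.rm = TRUE)

# Share of trees at or below 17 cm
share_small <- mean(dbh <= 17, na.rm = TRUE)
share_small
```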
To compare DBH values across the most prevalent species, we must first filter our data to isolate the top three: London Plane Sycamore, New Zealand Christmas tree, and Brisbane Box. From there, we can create a facet grid comparing the DBH values of these species.
topthree <- simpledf %>%
  filter(qSpecies %in% c("Platanus x hispanica :: Sycamore: London Plane",
                         "Metrosideros excelsa :: New Zealand Xmas Tree",
                         "Lophostemon confertus :: Brisbane Box"))
ggplot() +
  geom_histogram(data = topthree,
                 mapping = aes(x = DBH, fill = qSpecies), bins = 20) +
  facet_grid(. ~ qSpecies) +
  scale_x_log10() +
  labs(title = "Distribution of DBHs for the top three most common species in SF streets",
       y = "Tree individuals", x = "DBH in cm") +
  scale_fill_discrete(name = "Species")
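To put numbers next to the facet plot, a per-species summary could be computed with dplyr. This is only a sketch on a small made-up data frame (`toy`, with placeholder species names), not output from the real data; the column names `n_trees` and `median_dbh` are my own.

```r
library(dplyr)

# Toy data standing in for topthree: placeholder species and DBH values
toy <- data.frame(
  qSpecies = c("Species A", "Species A", "Species A",
               "Species B", "Species B", "Species C"),
  DBH = c(8, 10, 12, 9, 11, 20)
)

# Count the trees and take the median DBH within each species
dbh_summary <- toy %>%
  group_by(qSpecies) %>%
  summarise(n_trees = n(),
            median_dbh = median(DBH, na.rm = TRUE),
            .groups = "drop")
dbh_summary
```

Running the same pipeline on `topthree` would give the tree count and median DBH for each of the three species, which makes the comparison less dependent on reading the histogram bars.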
There are many things we have learned from this dataset.
First, we learned which of the 565 species were the most common on the streets of SF. We also saw that the most common species, the London Plane Sycamore, has nearly 11,500 trees planted, an enormous number!
Second, we saw the age span of Juniper trees and their respective DBH values. There is a slight upward trend in DBH as age increases, but most trees stay below 15 cm in diameter. Additionally, the histogram of DBH values for Juniper gave us a good idea of where most trees’ DBH values fall.
Third, we got a comparison of the DBH values of the top three most common species planted. Across species, the trees’ DBH values were mostly clustered near 10 cm, but we also saw the difference in the number of trees between Sycamore and the other two species.
Deviations: The only deviation I made was creating and importing the SF_Top_Ten.csv file. It was an enormous help to be able to simplify the species names so they were readable and to record their frequencies easily. I tried for a long time to make these adjustments in R, but decided that for so few values (10), the best way to work with the data was to create my own csv file with just the simple information needed.
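For reference, here is one possible way the same kind of table might be built directly in R. This is a sketch on made-up rows, not the method used for this report: it assumes the common name is whatever follows `::` in qSpecies, and the names `toy` and `top_species` are illustrative.

```r
library(dplyr)

# Toy rows in the same "Latin name :: Common name" layout as qSpecies
toy <- data.frame(qSpecies = c(
  "Platanus x hispanica :: Sycamore: London Plane",
  "Platanus x hispanica :: Sycamore: London Plane",
  "Lophostemon confertus :: Brisbane Box",
  "Tristania conferta :: ",
  NA
), stringsAsFactors = FALSE)

top_species <- toy %>%
  filter(!is.na(qSpecies)) %>%                        # drop missing labels
  mutate(Species = sub("^.*:: *", "", qSpecies)) %>%  # keep the text after "::"
  filter(Species != "") %>%                           # drop rows with no common name
  count(Species, name = "Frequency", sort = TRUE) %>% # count and sort descending
  slice_head(n = 10)                                  # keep the ten most frequent
top_species
```

Applied to `simpledf`, a pipeline like this would produce a Species and Frequency table in the same shape as SF_Top_Ten.csv.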
Thank you very much for all your help in making this project!