
## Classification methods in GIS

In this article we discuss the classic classification methods used in GIS, and why we classify data in the first place. Sometimes you have continuous data that you want to discretize, in other words, group into a small number of classes. For example, a digital elevation model (DEM) might contain elevation values ranging from 1,000 m to 5,000 m. To map the relief, you may want to group these values into a few discrete elevation classes: this discretization is what we call classification. Several methods are available, and six of them have so far been implemented in standard GIS software. As we will see with polygon data, the first purpose of classification in GIS is to turn continuous data into discrete data so that the map becomes easier to read and understand. Classification can also be used to highlight certain aspects of the data and play down others. The six classification methods are:

1) Equal interval,

2) Natural breaks (Jenks),

3) Quantile,

4) Equal area,

5) Standard deviation,

6) Geometrical interval, a more recent addition that is quite useful.
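The DEM example from the introduction can be sketched in a few lines of base R. The elevation values below are invented for illustration. The class limits, 1,000 m steps from 1,000 m to 5,000 m, are also an assumption. They were chosen only to show how `cut()` turns continuous values into discrete classes:

```r
# Hypothetical elevation values in metres (invented stand-ins for DEM cells)
elev <- c(1000, 1250, 1800, 2300, 2950, 3400, 4100, 4700, 5000)

# Assumed class limits: 1,000 m steps between 1,000 m and 5,000 m
brks <- seq(1000, 5000, by = 1000)

# cut() maps each continuous value to a discrete class;
# include.lowest = TRUE keeps the 1,000 m minimum in the first class
relief <- cut(elev, breaks = brks, include.lowest = TRUE, dig.lab = 4)

table(relief)
```

In this sketch, nine continuous values collapse into four relief classes containing 3, 2, 1 and 3 values.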

In this article we apply two of these methods and compare their impact on the resulting maps.

Here you see two maps of Al Haouz province in Morocco: one classified using the quantile method, the other using the equal interval method. Let’s take a look at these two methods.

Getting started: the quantile method

– if you have highly variable data

– if you don’t mind outlier values being less visible

– if you wish to focus attention on relative rankings

– if you wish to avoid empty classes and produce an even distribution of map colors

then use quantiles to map your data. A quantile classification places an approximately equal number of observations into each class. Here is the map generated from our data using quantiles.

You can see we have broken the data into five classes and each class has an approximately equal number of observations. This is what we get using the quantile classification.

Let’s take a closer look at how we arrived at this result with our Al Haouz data. We have 217 observations and we want to divide them into five classes. 217 divided by 5 is 43.4, which is not a whole number, so each class receives approximately 43 observations.
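The quantile arithmetic above can be sketched in base R. This is a minimal illustration with two assumptions. First, the 217 values are randomly generated and right-skewed, and are invented only to stand in for the real census values. Second, we rely on R's default `quantile()` definition:

```r
# Invented, right-skewed stand-in for 217 observations (NOT the Al Haouz census data)
set.seed(42)
x <- rlnorm(217, meanlog = 9, sdlog = 0.6)

# Quantile classification: breaks at the 0%, 20%, 40%, 60%, 80% and 100% quantiles
brks <- quantile(x, probs = seq(0, 1, length.out = 6))

# Assign each observation to a class; include.lowest keeps the minimum value
cls <- cut(x, breaks = brks, include.lowest = TRUE)

# Class sizes: 44 43 43 43 44 (217 / 5 = 43.4, so sizes differ by at most one)
as.integer(table(cls))
```

Because 217 is not a multiple of 5, the two leftover observations end up in the first and last classes. Every class still holds roughly one fifth of the data.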

Now let’s take a look at the equal interval method of classification:

– if your data are not highly skewed and are more continuous

– if you wish to focus attention on the outliers

– if you wish each class range to be equal

then use the equal interval method of classification. Class ranges will be equal, but the number of observations per class may differ considerably, especially if the data are skewed. In the Al Haouz data, a couple of communes have a much higher population than most of the other communes. Here is the map generated using the equal interval classification.

You can see that one class contains only one observation, another contains two observations and another only three, while all administrative units with fewer than 60k fall into the lowest class. This map looks very different: it draws special attention to the outlying high values at one end of the data range. Let’s take a closer look at how we arrived at this result using our Al Haouz data.

We have a low value of approximately 2.4k and a high value of approximately 28k. Subtracting the low value from the high value and dividing this range by five classes gives five equal ranges:
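Written generally, take $k$ classes with lowest value $x_{\min}$ and highest value $x_{\max}$. The equal interval class width $w$ and the break values $b_i$ are then:

$$
w = \frac{x_{\max} - x_{\min}}{k}, \qquad b_i = x_{\min} + i\,w, \quad i = 0, 1, \dots, k
$$

With $k = 5$, $b_0$ is the lowest value, $b_5$ is the highest value, and each class spans exactly one width $w$.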

211 observations fall into the lowest range, five into the middle ranges and one into the highest. As you can see, this second method of classification gives a very different result.
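The equal interval arithmetic can be sketched the same way in base R. Again the 217 values are invented and are not the Al Haouz census data. They consist of 211 small values plus six large outliers, loosely mimicking the situation described above:

```r
# Invented, skewed stand-in for 217 observations (NOT the Al Haouz census data):
# 211 small values plus 6 much larger ones
x <- c(seq(2400, 6000, length.out = 211), 9000, 10000, 13000, 16000, 17000, 28000)

k <- 5                                   # number of classes
w <- (max(x) - min(x)) / k               # equal class width: (28000 - 2400) / 5 = 5120
brks <- min(x) + (0:k) * w               # 2400 7520 12640 17760 22880 28000

# Assign each observation to a class; include.lowest keeps the minimum value
cls <- cut(x, breaks = brks, include.lowest = TRUE)

# Class sizes: 211 2 3 0 1 (equal ranges, very unequal counts, one empty class)
as.integer(table(cls))
```

Skewed data produce exactly this pattern: equal ranges with very unequal counts, including an empty class. The classInt package loaded in Step 1 offers both methods directly through `classIntervals(x, n = 5, style = "equal")` and `style = "quantile"`. The base R version is shown only to make the arithmetic visible.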

Be careful when choosing your map classification method: different methods are useful for different types of data.

Application in R:

Generating a choropleth map with different classification methods in R, in four steps

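Step 1:

```r
# Install the required packages (only needed once)
install.packages(c("sf", "tidyverse", "classInt", "viridis", "readxl"))

# Load the packages
library(sf)         # reading and handling vector spatial data
library(tidyverse)  # dplyr joins and ggplot2 plotting
library(classInt)   # class interval computation (e.g. Jenks)
library(viridis)    # colour palettes
library(readxl)     # reading Excel attribute files
```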


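Step 2:

```r
# Read the commune shapefile of Al Haouz
al_haouz <- st_read("… /Data/communes.shp")

# Read the attribute file containing the census data
# (assumption: an Excel file read with readxl; the path is not given here)
my_data <- read_excel("…")
```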

Plotting the data using ggplot2

```r
# Plot the commune boundaries with ggplot2
ggplot() +
  geom_sf(data = al_haouz, linewidth = 1, color = "black", fill = "cyan1") +  # use size = in ggplot2 < 3.4.0
  ggtitle("Province d'Al Haouz") +
  coord_sf()
```

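Step 3:

Joining the census attributes to the commune polygons

```r
# Attribute join (not a spatial join): match census rows to polygons by commune name.
# The first column of my_data is renamed to match the shapefile's Nom_Com field.
colnames(my_data)[1] <- "Nom_Com"
haouz_census <- left_join(al_haouz, my_data, by = "Nom_Com")
```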

Step 4:

## Creating a choropleth map

A choropleth map uses colour to show variation in the values of a variable across administrative areas. There are two types of choropleth map: classed and unclassed. In classed choropleth maps, values are grouped into intervals using one of several classification methods (such as the equal interval, quantile and natural breaks methods mentioned above). These classes are then mapped to a few discrete colours.

To create a classed choropleth map we need to assign groupings to the ‘Population_active’ variable. There are several ways to do this. The simplest is to assign an equal number of observations to each group, which is the quantile method described above. We can do this using the cut_number() function from the ggplot2 package: all we need to supply is the variable name and the number of class intervals.

```r
# Plot the joined data, classed with cut_number() (equal number of observations per class)
ggplot() +                                              # initialise a ggplot object
  geom_sf(data = haouz_census,                          # add a simple features (sf) object
          aes(fill = cut_number(Population_active, 5)), # 5 classes with roughly equal counts (quantiles)
          alpha = 0.8,                                  # add transparency to the fill
          colour = "white",                             # make polygon boundaries white
          linewidth = 0.3) +                            # boundary width (size = in ggplot2 < 3.4.0)
  scale_fill_brewer(palette = "PuBu",                   # choose a http://colorbrewer2.org/ palette
                    name = "Population active") +       # add legend title
  labs(x = NULL, y = NULL,                              # drop axis titles
       title = "Al Haouz Map showing …",                # add title
       subtitle = "Source: HCP 2014")                   # add subtitle
```

Natural breaks (Fisher-Jenks algorithm) is a more sophisticated classification method. It creates groups of similar values by minimising the sum of squared deviations from the class means, that is, the within-class variance. We’ll use the classIntervals() function from the classInt package. We just need to supply the ‘Population_active’ variable that we want to group into classes, the number of classes, and the classification method. We’ll choose 5 class intervals and the natural breaks (‘jenks’) method for grouping values.

```r
# Compute 5 classes with the Jenks natural breaks method
classes <- classIntervals(haouz_census$Population_active, n = 5, style = "jenks")
```

You can check the class intervals by running:

```r
classes$brks
```
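To make the natural breaks criterion concrete, here is an illustrative base R sketch of Fisher's exact dynamic-programming solution to the same problem. The helper `natural_breaks()` is our own and is not part of classInt. classInt's "jenks" style has its own implementation, and its output format may differ, so treat this as an explanation rather than a replacement:

```r
# Illustrative sketch: exact natural breaks via dynamic programming (Fisher's
# optimal one-dimensional classification). It minimises the total within-class
# sum of squared deviations from the class means, the criterion behind Jenks.
natural_breaks <- function(x, k) {
  x <- sort(x)
  n <- length(x)
  stopifnot(k >= 1, k <= n)

  s1 <- c(0, cumsum(x))    # prefix sums of x
  s2 <- c(0, cumsum(x^2))  # prefix sums of x^2

  # Sum of squared deviations from the mean for the sorted run x[i..j]
  ssd <- function(i, j) {
    (s2[j + 1] - s2[i]) - (s1[j + 1] - s1[i])^2 / (j - i + 1)
  }

  # cost[m, j]: lowest total SSD when x[1..j] is split into m classes
  # first[m, j]: index where the last of those m classes starts
  cost  <- matrix(Inf, nrow = k, ncol = n)
  first <- matrix(NA_integer_, nrow = k, ncol = n)
  for (j in 1:n) {
    cost[1, j]  <- ssd(1, j)
    first[1, j] <- 1L
  }
  if (k > 1) {
    for (m in 2:k) {
      for (j in m:n) {
        for (i in m:j) {  # try x[i..j] as the last class
          v <- cost[m - 1, i - 1] + ssd(i, j)
          if (v < cost[m, j]) {
            cost[m, j]  <- v
            first[m, j] <- i
          }
        }
      }
    }
  }

  # Walk back through 'first' to recover the upper bound of each class
  upper <- numeric(k)
  j <- n
  for (m in k:1) {
    upper[m] <- x[j]
    j <- first[m, j] - 1L
  }
  c(x[1], upper)  # breaks: minimum, then the upper bound of each class
}

natural_breaks(c(1, 2, 3, 10, 11, 12, 20, 21, 22), k = 3)
# returns 1 3 12 22
```

The triple loop costs roughly k·n²/2 evaluations. That is quick for a few hundred communes but slow for very large datasets.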

Next we’ll create a new column in our sf object, using the base R cut() function to split the Population_active variable into distinct groups.

```r
# Add the Jenks class of each commune as a new column
haouz_census <- haouz_census %>%
  mutate(Population_active_prc = cut(Population_active, classes$brks, include.lowest = TRUE))
```

Now we are ready to plot again. The code is essentially the same as for the previous plot, except that the fill aesthetic is mapped to the newly created ‘Population_active_prc’ variable.

```r
# Plot the communes classed with Jenks natural breaks
ggplot() +
  geom_sf(data = haouz_census,
          aes(fill = Population_active_prc),  # fill mapped to the Jenks classes
          alpha = 0.8,
          colour = "white",
          linewidth = 0.3) +                 # size = in ggplot2 < 3.4.0
  scale_fill_brewer(palette = "PuBu",
                    name = "Population active") +
  labs(x = NULL, y = NULL,
       title = "Al Haouz Map showing …2011",
       subtitle = "Source: HCP 2014",
       caption = "Contains HCP data © copyright and database right (2014)")
```


Written by

AZMI RIDA, PhD in GIS and Remote Sensing
