Advanced object-oriented programming in R: statistical programming for data science, analysis and finance.
Main Author: | |
---|---|
Format: | Book |
Language: | English |
Published: | Apress, 2020 |
Subjects: | |
Online Access: | http://dspace.uniten.edu.my/jspui/handle/123456789/15355 |
Summary: | Welcome to Object-oriented Programming in R. I wrote this book to have
teaching material beyond the typical introductory level of most textbooks on
R. This book is intended to introduce objects and classes in R and how object-oriented
programming is done in R. Object-oriented programming is based on
the concept of objects and on designing programs in terms of operations that one
can do with objects and how objects communicate with other objects.
This is often thought of in terms of objects with states, where operations on
objects change the object state. Think of an object such as a bank account. Its
state would be its balance, and depositing or withdrawing money
would change that state. Operations we do on objects are often called “methods”
in the literature, but in some programming languages the conceptual model
is that objects are communicating and sending each other messages, and the
operations you do on an object are how it responds to messages it receives.
In R, data is immutable, so you don’t write code where you change an
object’s state. Rather, you work with objects as values, and operations on objects
create new objects when you need new “state”. Objects and classes in R are more
like abstract data structures. You have values and associated operations you can
do on these values. Such abstract data structures are implemented differently in
different programming languages. Most object-oriented languages implement
them using classes and class hierarchies while many functional languages define
them using some kind of type specifications that define which functions can be
applied to objects.
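
A minimal sketch of this value-based style, using the bank account from above. The names `account`, `deposit`, and `withdraw` are hypothetical and chosen only for illustration; they do not come from the book. Each operation returns a new account value and leaves its argument untouched.

```r
# Hypothetical constructor: an account is a plain list tagged with a class.
account <- function(balance = 0) {
  structure(list(balance = balance), class = "account")
}

# Depositing does not modify `acc`; it builds and returns a new account.
deposit <- function(acc, amount) {
  account(acc$balance + amount)
}

# Withdrawing likewise returns a new value, after a simple sanity check.
withdraw <- function(acc, amount) {
  stopifnot(amount <= acc$balance)
  account(acc$balance - amount)
}

a1 <- account(100)
a2 <- deposit(a1, 50)
a3 <- withdraw(a2, 30)

a1$balance  # 100: the original value is unchanged
a3$balance  # 120: the new "state" lives in a new object
```

To keep the new state, the caller has to bind the returned value to a name; nothing changes behind the scenes.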
Types determine what you can do with objects. You can, for example, add
numbers, and you can concatenate strings, but you can’t really add strings or
concatenate numbers. In some programming languages, so-called statically
typed languages, you associate types with variables, which restricts which objects
the variables can refer to and enables some consistency checks of the code before
you run it. In such languages, you can specify new types by defining which
operations you can do on them, and you then need to add type specifications to
variables referring to them. Other programming languages, called dynamically
typed languages, do not associate types with variables but let them refer to any
kind of objects. R is dynamically typed, so you do not specify abstract data types
through a type specification. The operations you can do on objects are simply
determined by which functions you can call on the objects. You can still think
of these as specifications of abstract data structures; however, they are just
implicitly defined.
Abstract data structures can be implemented in different ways, which is
what makes them abstract, and the way to separate implementation from an
interface is through polymorphic or generic functions, a construction founded
on object-oriented programming. Generic functions are implemented through
a class mechanism, also derived from object-oriented programming. The
functions implemented by a class determine the interface of objects in the class,
and by constructing hierarchies of classes, you can share the implementation of
common functions between classes.
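
As a rough illustration of generic functions and class hierarchies, here is a small sketch using R's S3 system. The classes `rectangle`, `circle`, and `shape` and the generics `area` and `describe` are made up for this example. `area` is implemented separately for each class, while `describe` is written once for the shared base class.

```r
# Generic functions: UseMethod() dispatches on the class attribute of `x`.
area <- function(x, ...) UseMethod("area")
describe <- function(x, ...) UseMethod("describe")

# Hypothetical constructors; both classes list "shape" as their base class.
rectangle <- function(w, h) {
  structure(list(w = w, h = h), class = c("rectangle", "shape"))
}
circle <- function(r) {
  structure(list(r = r), class = c("circle", "shape"))
}

# Class-specific implementations of the same interface.
area.rectangle <- function(x, ...) x$w * x$h
area.circle <- function(x, ...) pi * x$r^2

# Implemented once on the base class and shared by both subclasses.
describe.shape <- function(x, ...) {
  paste0(class(x)[1], " with area ", format(area(x), digits = 3))
}

shapes <- list(rectangle(2, 3), circle(1))
vapply(shapes, describe, character(1))
# "rectangle with area 6" "circle with area 3.14"
```

Code that calls `describe` or `area` never needs to know which concrete class it was handed, which is the separation of interface from implementation described above.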
Abstract data structures are often used in algorithmic programming to
achieve efficient code, but such programming is frequently not the objective of
R programs. There, we are more interested in fitting models to data and the like,
which frequently does not require algorithmic data structures. Fitted models,
however, are also examples of abstract data structures in the sense that I use the
term in this book. Models have an abstract interface that allows us to plot fitted
models, predict new response variables for new data, and so forth, and we can
use the same generic functions for such operations. Different models implement
their own versions of these generic functions, so you can write generic code that
will work on linear models, decision trees, or neural networks, for example.
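
A short sketch of this idea with two model classes from base R. The helper `predict_at` is hypothetical; it relies only on the generic `predict()` and never inspects the class of the model it receives. The `cars` data set ships with R.

```r
# Two fitted models of different classes on the built-in `cars` data.
fit_lm  <- lm(dist ~ speed, data = cars)
fit_glm <- glm(dist ~ speed, data = cars, family = gaussian())

# Hypothetical helper: works for any model whose class has a predict() method.
predict_at <- function(model, speeds) {
  unname(predict(model, newdata = data.frame(speed = speeds)))
}

predict_at(fit_lm, c(10, 20))
predict_at(fit_glm, c(10, 20))

class(fit_lm)   # "lm"
class(fit_glm)  # "glm" "lm"
```

With a Gaussian family and identity link, the two fits should give matching predictions, so the example mainly shows that the same calling code serves both classes.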
Object-oriented programming was not built into the R language initially
but was added later, and unfortunately, more than one object-oriented system
was added. There are actually three different ways to implement object-oriented
constructions in R, each with different pros and cons, and these three systems
do not operate well together. I will cover all three in this book (S3, S4, and R6)
but put most emphasis on the S3 system, which is the basis of the so-called
“tidyverse”, packages such as tidyr, dplyr, and ggplot2 that form the basis of
most data analysis pipelines these days.
When developing your own software, I strongly recommend that you
stick to one object-oriented system instead of mixing them, but which one you
choose is a matter of taste and which other packages your code is intended to
work with.
Most books I have read on object-oriented programming, and the classes I
have taken on object-oriented programming, have centered on object-oriented
modeling and software design. There, the focus is on how object-orientation
can be used to structure how you think about your software and how the
software can reflect physical or conceptual aspects of the world that you try to
model in your software. If, for instance, you implement software for dealing
with accounting, you would model accounts as objects with operations for
depositing and withdrawing money. You would try to map
concepts from the problem domain to software as directly as possible. This is
a powerful approach to designing your software, but there are always aspects
of software that do not readily fit into such modeling, especially when it comes
to algorithmic programming and design of data structures. Search trees and
sorting algorithms, for instance, usually do not reflect anything concrete in a
problem domain.
Object-oriented programming, however, is also a very powerful tool to
use when designing algorithms and data structures. The way I was taught
programming, algorithms and data structures were covered in separate classes
from those in which I was taught object-orientation. Combining object-orientation
and algorithmic programming was something I had to teach myself
by writing software. I think this was a pity since the two really fit together well.
In this book, I will try to cover object-orientation both as a modeling
technique for designing software and as a tool for developing reusable
algorithmic software. Polymorphism, a cornerstone of object-oriented
programming, lends itself readily to developing flexible algorithms and to
combining different concrete implementations of abstract data types to tailor
abstract algorithms to concrete problems. A main use of R is machine learning
and data science where efficient and flexible algorithms are more important
than modeling a problem domain, so much of the book will focus on those
aspects of object-oriented programming.
To read this book, you need to know the fundamentals of R programming:
how to manipulate data and how to write functions. We will not see particularly
complex R programming, so you do not need a fundamental knowledge of how
to do functional programming in R, but should you want to learn how, I suggest
reading the first book in this series which is about exactly that. You should be
able to follow the book without having read it, though. |
---|