1.0 Overview

With the Federal Reserve once again cutting interest rates to zero and embarking on massive quantitative easing measures in response to the economic malaise brought about by COVID-19, retail investors are becoming increasingly aware that leaving their hard-earned money in the bank is financially imprudent.

However, it may seem daunting for first-time investors looking to enter the market, with so many finance websites offering different resources and opinions. Our application looks to guide the beginner investor in their journey to begin investing by zeroing in on a select few stocks, commodities and bonds.

2.0 Motivation and Objectives

Existing financial data websites such as Yahoo Finance do a good job in providing historical price data and technical indicators, but the beginner investor lacks the knowledge to properly utilise and benefit from these. For example, there is a whole plethora of technical indicators available for use, but whether the investor is able to truly understand and use them properly is questionable. We have also identified a gap in such websites, that they do not provide any form of forecasting or cross-asset analysis to aid in investors’ decisions. In addition, it may be informative for an investor to see how macro events on a global scale may affect assets.

This project aims to utilise various R packages, mainly tidyverse, TTR and dtwcluster in building an interactive R Shiny application. The R Shiny application (details in section 2) will be an integrated dashboard which allows users to perform different types of analysis while aiding them in making informed decisions in portfolio construction.

3.0 Scope

Financial Asset Universe - The list of assets covered in our application is shown below:

S/N	Stock Market Indices	Commodities	Bonds
1	US	Gold	Investment Grade
2	UK	Silver	High-Yield
3	Germany	Copper
4	France	Iron
5	Italy	Natural Gas
6	Spain	Crude Oil
7	Brazil
8	Singapore
9	Hong Kong
10	Japan
11	China
12	Australia
13	Korea
14	India

4.0 Data

We will extract all historical price data from Jan 1 1997 to Dec 31 2020 for our selected assets using Bloomberg Terminal. The fields which we will extract data for include:

Open Price: The asset’s opening price on a trading day

High Price: The asset’s highest price on a trading day

Low Price: The asset’s lowest price on a trading day

Close Price: The asset’s closing price on a trading day

5.0 Application Features

The main target audience for our application are prospective investors who are looking for a tool to help them build their portfolio and diversify their holdings. This section explores features of our application and some ways it can help our users. The application itself is split up into 3 main sections address different parts of the investing process.

5.1 Exploratory Data Analysis (EDA)

Base visualizations and analysis of key financial asset returns metrics

Exploratory Data Analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. It will be useful for potential investors to have a visual tool to understand various key metrics when analysing financial assets. Some of the key metrics our application aims to help users visualize include:

Correlation
- Correlation matrix to show correlation between selected assets at different time horizons
Distribution of returns
- Histogram showing distribution of returns for different assets will allow users to visualize volatility
Risk to Reward
- Purely looking at gross returns often does not paint the full picture as most high-returns assets come with greater volatility. Metrics such as Sharpe - Ratio will allow users to get a better sense of the Risk:Reward profile of different assets
Technical indicators
- Technical analysis is a trading discipline employed to evaluate investments and identify trading opportunities by analyzing statistical trends gathered from trading activity, such as price movement and volume.
- Our application will categorize indicators based on their category (trend, momentum or volatility) so as to allow users to know precisely what the indicator is looking at. This is in contrast to many existing websites where users have a whole plethora of indicators at their disposal but not knowing what they are actually evaluating.

Possible insights: Diversification through identifying weakly correlated markets and indices. This allows the user to ‘put their eggs across different baskets’ and spread out their risk.

5.2 Time-Series Forecasting

Predicting financial asset prices using an ARIMA model

ARIMA is an acronym that stands for AutoRegressive Integrated Moving Average. It is a class of model that captures a suite of different standard temporal structures in time series data.

AR: Autoregression. A regression model that uses the dependencies between an observation and a number of lagged observations.
I: Integrated. To make the time series stationary by measuring the differences of observations at different time.
MA: Moving Average. An approach that takes into accounts the dependency between observations and the residual error terms when a moving average model is used to the lagged observations (q).

Along with various parameters to be tuned, an ARIMA model uses purely price action data for forecasting future prices.

The general form of an ARIMA model is denoted as ARIMA (p, q, d). With seasonal time series data, it is likely that short-run non-seasonal components contribute to the model. ARIMA model is typically represented as ARIMA (p, q, d), where: —

p is the number of lag observations utilized in training the model (i.e., lag order).
d is the number of times differencing is applied (i.e. the degree of differencing).
q is known as the size of the moving average window (i.e., order of moving average).

Possible insights: Forecast what are the likely price movements in the period ahead in order to identify optimal entry points

5.3 Time-Series Clustering

Clustering financial assets based on historical returns

Time series clustering is to partition time series data into groups based on similarity or distance, so that time series in the same cluster are similar. Dynamic Time Warping (DTW) is used as the similarity/distance metric and finds optimal alignment between two time series.

The first step is to work out an appropriate distance/similarity metric (i.e. DTW).
The second step, use existing clustering techniques, such as k-means, hierarchical clustering, density-based clustering or subspace clustering, to find clustering structures.
The final result will then be presented on the visualization.

Possible insights: Time series in the same cluster are likely to undergo similar cycles and risks.This means that users can expect similar movements across cycles/periods and act accordingly.

6.0 Tools and Packages

S/N	Package	Documentation
1	tidyverse	https://www.tidyverse.org/
2	dplyr	https://www.rdocumentation.org/packages/dplyr/versions/0.7.8
3	TTR	https://cran.r-project.org/web/packages/TTR/index.html
4	plotly	https://www.rdocumentation.org/packages/plotly/versions/4.9.3/topics/plot_ly
5	ggplot2	https://www.rdocumentation.org/packages/ggplot2/versions/3.3.3
6	Shiny	https://www.rdocumentation.org/packages/shiny/versions/1.6.0
7	dtwcluster	https://www.rdocumentation.org/packages/dtwclust/versions/3.1.1
8	dygraphs	https://www.rdocumentation.org/packages/dygraphs/versions/1.1.1.6
9	tidyquant	https://cran.r-project.org/web/packages/tidyquant/index.html
10	readr	https://www.rdocumentation.org/packages/readr/versions/1.3.1
11	rticles	https://www.rdocumentation.org/packages/rticles/versions/0.18

Project Proposal