Fintech in Investment Management, Level II | New reading 2019



Fintech (finance + technology) is playing a major role in the advancement and improvement of:

  • the investment management industry (e.g., assessment of investment opportunities, portfolio optimization, risk mitigation);
  • investment advisory services (e.g., robo-advisors, with or without the intervention of human advisors, providing tailored, low-cost, actionable advice to investors); and
  • financial record keeping, where blockchain and distributed ledger technology (DLT) offer improved ways of recording, tracking, and storing financial assets.


For the purposes of this reading, the term ‘fintech’ refers to technology-driven innovation in the field of financial services and products.

Note: In common usage, fintech may also refer to companies associated with new technologies or innovations.

Initially, the scope of fintech was limited to data processing and the automation of routine tasks. Today, advanced computer systems use artificial intelligence and machine learning to perform decision-making tasks such as investment advice, financial planning, and business lending/payments.

Some salient fintech developments related to the investment industry include:

  • Analysis of large datasets: The professional investment decision-making process now draws on extensive amounts of traditional data (e.g., economic indicators, financial statements) as well as non-traditional data (e.g., social media, sensor networks) in the pursuit of profits.
  • Analytical tools: There is a growing need for techniques involving artificial intelligence (AI) to identify complex, non-linear relationships within such vast datasets.
  • Automated trading: Its advantages include lower transaction costs, greater market liquidity, anonymity, and more efficient trade execution.
  • Automated advice: Robo-advisors, or automated personal wealth management services, are low-cost alternatives for retail investors.
  • Financial record keeping: DLT (distributed ledger technology) provides advanced, secure means of keeping records and tracing ownership of financial assets on a peer-to-peer (P2P) basis, reducing the involvement of financial intermediaries.


Big data refers to the vast amounts of data generated by traditional and non-traditional data sources.

Traditional sources include data generated within the financial system, such as financial statements and economic indicators; non-traditional (alternative) sources include data generated by individuals, business processes, and sensors, discussed in Section 3.1 below.

Big data typically has the following features:

  • Volume
  • Velocity
  • Variety


Volume refers to the quantity of data, often denoted in millions or even billions of data points. Data volumes have grown from megabytes (MB) and gigabytes (GB) to much larger sizes such as terabytes (TB) and petabytes (PB).


Velocity refers to how quickly data is communicated. Based on the time delay involved, data is classified as real-time or near-time.


Variety refers to the range of forms in which data is collected, including:

  • structured data – items are arranged in tables where each field represents a similar type of information (e.g., SQL tables, CSV files)
  • unstructured data – cannot be organized in tables and requires special applications or programs to process (e.g., social media posts, email, text messages, pictures, sensor output, video/voice messages)
  • semi-structured data – contains attributes of both structured and unstructured data (e.g., HTML code)
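For illustration, the three forms can be distinguished with Python's standard library: structured CSV rows parse directly into a table, JSON (which, like HTML, is often classed as semi-structured) carries self-describing field names, and free text needs a special processing step, sketched here as a simple word count. The sample records are invented:

```python
import csv
import io
import json
from collections import Counter

# Structured: CSV rows map cleanly onto a table with one field per column.
csv_text = "ticker,price\nAAPL,150.0\nMSFT,250.0\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]["ticker"], float(rows[0]["price"]))  # AAPL 150.0

# Semi-structured: JSON mixes self-describing structure with free-form text.
doc = json.loads('{"user": "anon", "review": "great phone, terrible battery"}')

# Unstructured: the review text itself has no schema; a special step
# (here, a word-frequency count) is needed before analysis.
words = Counter(doc["review"].replace(",", "").split())
print(words)
```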

Exhibit: Big Data Characteristics: Volume, Velocity & Variety (Reading 6, Level II CFA 2019; exhibit not reproduced)

3.1 Sources of Big Data

In addition to traditional data sources, alternative data sources provide further information (regarding consumer behavior, company performance, and other important investment-related activities) for use in investment decision-making processes.

The main sources of alternative data are data generated by:

1.     Individuals:

Data in the form of text, video, photos, audio, and other online activity (customer reviews, e-commerce). This type of data is often unstructured and is growing considerably in volume.

2.     Business processes:

Data (often structured) generated by corporations and other public entities, e.g., sales information and ‘corporate exhaust’. Corporate exhaust includes bank records, point-of-sale data, and supply chain information.


  • Traditional corporate metrics (annual, quarterly reports) are lagging indicators of business performance.
  • Business process data are real-time or leading indicators of business performance.

3.     Sensors:

Data (often unstructured) collected from devices connected via wireless networks. The volume of such data is growing exponentially compared with the other two sources. The Internet of Things (IoT) is the network of physical devices, home appliances, and smart buildings that enables these objects to interact and share information.

Alternative datasets are used increasingly in investment decision-making models. Investment professionals must be vigilant about using information on individuals that is not in the public domain or was collected without their explicit knowledge or consent.

3.2 Big Data Challenges

In investment analysis, using big data poses challenges relating to its quality (selection bias, missing data, outliers), volume (data sufficiency), and suitability. In most cases, data must be sourced, cleansed, and organized before use, and performing these steps with alternative data is particularly challenging given its largely qualitative nature. Artificial intelligence and machine learning tools help address these issues.


Artificial intelligence (AI) technology enables computer systems to perform tasks that involve cognitive and decision-making abilities similar or superior to those of the human brain.

Initially, AI programs were used in specific problem-solving frameworks, following ‘if-then’ rules. Later, advances in processing power enabled AI programs such as neural networks (which are based on how the human brain processes information) to be used in financial analysis, data mining, logistics, and other fields.

Machine learning (ML) algorithms are computer programs that perform tasks and improve their performance over time with experience. ML requires large amounts of data (big data) to model relationships accurately.

ML algorithms take inputs (sets of variables or datasets) and learn from the data by identifying relationships within it, refining the process to model outputs (targets). If no targets are given, algorithms are instead used to describe the underlying structure of the data.

ML divides data into two sets:

  • Training data: used by the ML algorithm to identify relationships between inputs and outputs from historical patterns.
  • Validation data: used to validate the model’s performance by testing the relationships developed using the training data.
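The train/validate split can be sketched in a few lines of Python; the dataset and the simple least-squares “model” below are invented for illustration:

```python
from statistics import mean

# Invented dataset: pairs of (input x, target y) with a roughly linear relation.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1), (5, 9.8), (6, 12.2)]

# Split the data: the first four points train the model, the last two validate it.
train, validation = data[:4], data[4:]

# "Learn" the input-output relationship from the training data alone,
# here with a closed-form least-squares slope and intercept.
xs = [x for x, _ in train]
mx = mean(xs)
my = mean(y for _, y in train)
slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# Validate: measure average error on points the model has never seen.
val_error = mean(abs(slope * x + intercept - y) for x, y in validation)
print(round(slope, 2), round(val_error, 2))
```

A model that validates poorly here would be rejected or retrained before being trusted on live data.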

ML still depends on human judgment to select suitable techniques for data analysis. ML also requires a sufficiently large amount of data that is clean, authentic, and free from biases.

The problem of overfitting (a model that is too complex) occurs when an algorithm models the training data too precisely. An over-trained model treats noise as true parameters, and such models fail to predict outcomes accurately on out-of-sample data.

The problem of underfitting (a model that is too simple) occurs when a model treats true parameters as noise and fails to recognize the relationships within the training data.
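A toy sketch of both failure modes (the data and the deliberately naive “models” are invented): the overfit model memorizes every training point, noise included, while the underfit model ignores the relationship entirely:

```python
from statistics import mean

# Invented data: y is roughly 2*x plus a little noise.
train = [(1, 2.2), (2, 3.9), (3, 6.1), (4, 8.0)]
test = [(5, 10.1), (6, 11.9)]  # out-of-sample data

lookup = dict(train)

def overfit(x):
    # Memorizes the training points exactly (noise and all); for unseen
    # inputs it can only fall back on the average training target.
    return lookup.get(x, mean(lookup.values()))

def underfit(x):
    # Too simple: predicts the training mean everywhere, treating the
    # true x-y relationship as if it were noise.
    return mean(y for _, y in train)

def fit(x):
    # Captures the broad linear pattern without chasing the noise.
    return 2.0 * x

def mse(model, points):
    return mean((model(x) - y) ** 2 for x, y in points)

print(mse(overfit, train))                      # 0.0: perfect in-sample...
print(mse(overfit, test) > mse(fit, test))      # True: ...poor out-of-sample
print(mse(underfit, train) > mse(fit, train))   # True: poor even in-sample
```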

Because ML techniques are not explicitly programmed, their results are sometimes unclear and difficult to interpret; the techniques may appear opaque, like a ‘black box’.


4.1 Types of Machine Learning

ML approaches are used to identify relationships between variables, detect patterns, or structure data. The two main types of machine learning are:

1.     Supervised learning:

Uses labeled training data (sets of inputs and corresponding outputs supplied to the program) and processes that information to map inputs to outputs. Supervised learning follows the logic of ‘X leads to Y’. It can be used, for example, to forecast a stock’s future returns or to predict the stock market’s performance on the next business day.

2.     Unsupervised learning:

Does not make use of labeled training data and does not follow the logic of ‘X leads to Y’. There are no outcomes to match to; instead, the input data is analyzed and the program discovers structure within the data itself, e.g., splitting the data into groups based on similar attributes.
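Grouping by similar attributes can be sketched with a tiny k-means-style loop. The six observations and the two starting centers are invented; a production model would use a library implementation rather than this hand-rolled version:

```python
# Unlabeled data (invented): six observations with no target attached.
points = [0.9, 1.1, 1.0, 5.8, 6.2, 6.0]

# Two starting guesses for the group centers; the loop refines them.
centers = [0.0, 10.0]

for _ in range(10):  # a few assign/update rounds converge on this data
    # Assign each point to its nearest center.
    groups = [[], []]
    for p in points:
        nearest = min(range(2), key=lambda i: abs(p - centers[i]))
        groups[nearest].append(p)
    # Move each center to the mean of the points assigned to it.
    centers = [sum(g) / len(g) for g in groups]

print(groups)   # [[0.9, 1.1, 1.0], [5.8, 6.2, 6.0]]
print(centers)
```

No labels were supplied, yet the program discovers the two natural clusters on its own.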

Deep Learning Nets (DLNs):

Some approaches combine supervised and unsupervised ML techniques. For example, deep learning nets (DLNs) use neural networks, often with many hidden layers, to perform non-linear data processing such as image, pattern, or speech recognition and forecasting.
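The role of a hidden layer can be illustrated with a minimal two-layer network computing XOR, a classic non-linear pattern that no single layer of weights can represent. The weights below are picked by hand purely for illustration; a real DLN learns them from data:

```python
def step(v):
    # Hard-threshold activation: the unit "fires" when its input is positive.
    return 1 if v > 0 else 0

def tiny_net(x1, x2):
    # Hidden layer: h1 fires if at least one input is on, h2 if both are.
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    # Output layer: fires for "at least one, but not both" -- i.e., XOR.
    return step(h1 - h2 - 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", tiny_net(a, b))  # outputs 0, 1, 1, 0
```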

Advanced ML techniques play a significant role in the evolution of investment research. Their rise has been made possible by

  • greater data availability
  • the capability to analyze big data
  • improved software processing speeds
  • reduced storage costs

As a result, ML techniques are providing insights at the level of individual firms as well as at national and global levels, and are a great help in predicting trends or events. For example, image recognition algorithms are applied to store parking lots, shipping and manufacturing activity, agricultural fields, etc.


Data science is an interdisciplinary field that uses scientific methods (ML, statistics, algorithms, computing techniques) to obtain information from big data, or data in general.

The unstructured nature of big data requires specialized treatment (performed by data scientists) before the data can be used for analysis.

5.1 Data Processing Methods

Various data processing methods are used by data scientists to prepare and manage big data for further examination. Five such methods are given below.


Data capture refers to how data is collected and formatted for further analysis. Low-latency systems communicate high volumes of data with minimal delay, such as applications based on real-time prices and events. High-latency systems involve longer delays and do not require access to real-time data and calculations.


Data curation refers to managing and cleaning data to ensure data quality. This process involves detecting data errors and adjusting for missing data.
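Both curation steps can be sketched in a few lines of Python. The price series, the error rule (flagging values far from the median), and the fill rule (mean imputation) are all invented for illustration:

```python
from statistics import mean, median

# Raw series (invented): daily prices with one missing value and one bad tick.
raw = [101.2, 102.0, None, 103.1, 990.0, 103.8]

# Detect errors: flag values implausibly far (here, >50%) from the median.
observed = [v for v in raw if v is not None]
med = median(observed)
errors = [v for v in observed if abs(v - med) > 0.5 * med]
print(errors)  # [990.0] -- the bad tick

# Adjust for missing data: treat the bad tick as missing too, then fill
# every gap with the mean of the remaining clean observations.
clean = [v for v in observed if v not in errors]
fill = round(mean(clean), 2)
filled = [v if v is not None and v not in errors else fill for v in raw]
print(filled)
```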


Data storage refers to archiving and storing data. Different types of data (structured, unstructured) require different storage formats.


Search refers to how to locate requested data. Advanced applications are required to search across big data.


Data transfer refers to how data moves from its storage location to the underlying analytical tool. Data retrieved directly from a stock exchange’s price feed is an example of a direct data feed.

5.2 Data Visualization

Data visualization refers to how data is formatted and displayed visually in graphical form.

Data visualization for:

A. Traditional structured data can be done using tables, charts, and trends.

B. Non-traditional unstructured data requires newer visualization techniques such as:

  • interactive 3D graphics
  • colors, shapes, and sizes to represent multidimensional (more than three-dimensional) data
  • the tag cloud, where words are sized and displayed based on their frequency in the file
  • the mind map, a variation of the tag cloud, which shows how different concepts are related to each other

Exhibit: Data Visualization – Tag Cloud Example (not reproduced)
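The sizing rule behind a tag cloud can be sketched as a word-frequency count. The snippet of text and the 10pt-base/4pt-per-occurrence scaling are invented:

```python
from collections import Counter

# Invented text to visualize.
text = "big data velocity big data volume big data variety data"

# A tag cloud sizes each word in proportion to its frequency in the file.
freq = Counter(text.split())
for word, count in freq.most_common():
    # Scale frequency to a font size: base 10pt plus 4pt per occurrence.
    print(f"{word}: size {10 + 4 * count}pt")
```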


6.1 Text Analytics and Natural Language Processing




CFA Institute does not endorse, promote or warrant the accuracy or quality of FinQuiz. CFA® and Chartered Financial Analyst® are registered trademarks owned by CFA Institute. BA II Plus is registered trademark owned by Texas Instruments.

Copyright © 2008-2020 FinQuiz:CFA Exam Prep. All rights reserved.