This document presents a Hadoop project that analyzes stock prices using MapReduce and Hive. The project finds the adjusted closing price for each stock day without a dividend by joining the daily stock price and dividend datasets. The technical architecture uses Eclipse, Hadoop, and AWS EC2 for development and clustering. Pseudo code is provided for the MapReduce jobs that parse the input data and output the adjusted closing prices. Of the 75 input records, 44 remain in the output after removing the 3 headers, the 21 dividend records, and the 7 price records whose dates matched a dividend.
Hadoop Stock Analyzer Project Using MapReduce and Hive
1. Hadoop Project
Stock Analyzer
(MapReduce and Hive Implementation)
Presented by
Punit Kishore (A13011)
Debayan Datta (A13006)
Sunil Kumar P (A13020)
Maruthi Nataraj K (A13009)
Ashish Ranjan (A13004)
Praxis Business School
2. AGENDA
Understanding of the problem
Technical Architecture
Basic Structure
Pseudo Code
Final Result
Business Implications
Electronics Template
3. UNDERSTANDING OF THE PROBLEM
Objective: To find the adjusted closing price for each
day on which a stock did not report a dividend.
Data Sources:
NYSE daily prices dataset, with the following schema:
exchange
stock_symbol
date
stock_price_open
stock_price_high
stock_price_low
stock_price_close
stock_volume
stock_price_adj_close
NYSE dividends dataset, with the following schema:
exchange
stock_symbol
date
dividends
Separating the dividend days from the rest of the data gives a clearer
picture of the company, because firms sometimes avoid
cutting dividends even when earnings drop.
Framework: MapReduce / Hive
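Viewed as a data operation, the objective is an anti-join of the daily prices against the dividends on (stock_symbol, date). The minimal in-memory Java sketch below illustrates that logic on small inputs; it is not the project's code. It assumes both files are comma-separated, that header rows start with "exchange", and that the adjusted close is the ninth (last) column of the prices file, as in the schema. The class name DividendAntiJoin is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory sketch of the objective: keep the adjusted close of every
// daily-price row whose (stock_symbol, date) has no dividend row.
// Assumption: header rows begin with the literal column name "exchange".
class DividendAntiJoin {

    // Returns "symbol,date,adjClose" for each price row with no dividend that day.
    static List<String> adjCloseOnNonDividendDays(List<String> priceLines,
                                                  List<String> dividendLines) {
        Set<String> dividendDays = new HashSet<>();
        for (String line : dividendLines) {
            String[] f = line.split(",");
            if (f[0].equals("exchange")) continue;          // skip header
            dividendDays.add(f[1] + "|" + f[2]);            // symbol|date
        }
        List<String> result = new ArrayList<>();
        for (String line : priceLines) {
            String[] f = line.split(",");
            if (f[0].equals("exchange")) continue;          // skip header
            if (!dividendDays.contains(f[1] + "|" + f[2])) {
                result.add(f[1] + "," + f[2] + "," + f[8]); // f[8] = stock_price_adj_close
            }
        }
        return result;
    }
}
```

In HiveQL the same result would typically be written as a LEFT OUTER JOIN of prices to dividends on symbol and date, keeping only the rows where the dividend side IS NULL.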
11. TECHNICAL ARCHITECTURE
Sample data: NYSE_daily_prices_AT.csv (testing was done on sample data only, due to
load and time constraints).
14. BASIC STRUCTURE
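Input key-value pair:
<byte offset of the line, NYSE,AIT,12-11-2009,X,X,X,X,X,20.69>
(X = the open, high, low, close and volume values, omitted here)
Intermediate key-value pairs (value = count~adjusted close~flag):
<AIT12-11-2009, 1~20.69~0> (daily price record)
<AIT12-11-2009, 1~Null~1> (dividend record, which has no adjusted close)
Output/result key-value pair:
AIT 12-11-2009 20.69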
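Between the map and reduce phases, Hadoop groups the intermediate pairs by key, so a price record and a dividend record for the same symbol and date reach the same reduce call. The small Java sketch below imitates that grouping in memory; the class and method names are hypothetical. With the two intermediate pairs for AIT12-11-2009, that key would collect both values, giving a count of 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the shuffle: every value sharing a key (symbol + date) is
// collected into one list, which is what a single reduce call receives.
class ShuffleSketch {

    // Each pair is {key, value}. Keys come out sorted, as they do within one
    // Hadoop reducer; value order inside a group is NOT guaranteed by Hadoop
    // (here it is simply insertion order).
    static Map<String, List<String>> group(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }
}
```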
15. PSEUDO CODE
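// Imports (old org.apache.hadoop.mapred API)
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
Mapper
public static class StockAnalysisMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text>
{
    // Declaration of map key and map value
    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException
    {
        // Declaration of private variables
        // Switch case to parse the input lines and store the data
        // Check for null values in the key
        // Skip the header and send the key-value pair to the output collector
    }
}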
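The mapper comments above leave the parsing details open. One plausible way to fill them in is sketched below in plain Java, without Hadoop types, so it can be run on its own. The column counts used to tell the two files apart (9 for prices, 4 for dividends), the "exchange" header test and the class name StockMapSketch are assumptions, not taken from the project.

```java
// Plain-Java sketch of the map step: turns one CSV line into the intermediate
// pair {symbol+date, count~adjClose~flag}, or null when nothing is emitted.
// Assumptions: price rows have 9 columns, dividend rows have 4, and header
// rows start with "exchange".
class StockMapSketch {

    static String[] map(String line) {
        String[] f = line.split(",");
        if (f.length < 3 || f[0].equals("exchange")) {
            return null;                                  // header or malformed line
        }
        String key = f[1] + f[2];                         // e.g. "AIT12-11-2009"
        switch (f.length) {
            case 9:                                       // daily price row
                return new String[] {key, "1~" + f[8] + "~0"};
            case 4:                                       // dividend row: no adjusted close
                return new String[] {key, "1~Null~1"};
            default:
                return null;                              // unexpected column count
        }
    }
}
```

Inside StockAnalysisMapper.map, a non-null result kv would be emitted with output.collect(new Text(kv[0]), new Text(kv[1])).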
16. PSEUDO CODE
Reducer
public static class StockAnalysisReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text>
{
    // Declaration of required private variables
    @Override
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException
    {
        // Declaration of sum and flag variables
        while (values.hasNext())
        {
            // Parse the inputs: count, stock adjusted closing price and check flag
            // Store them as required after parsing
            // Check for null values of the stock adjusted closing price
            // Increment the sum
        }
        // Write to output if sum is 1 and the adjusted closing price is not null
    }
}
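A matching plain-Java sketch of the reduce step follows. It assumes the count~adjClose~flag value layout from the Basic Structure slide and writes a day only when exactly one record arrived and that record carried an adjusted close; the class name StockReduceSketch is hypothetical.

```java
import java.util.Iterator;

// Plain-Java sketch of the reduce step for one key (symbol + date).
// Sums the counts and remembers any non-null adjusted close; the day is
// written only when a single price record arrived with no dividend record.
class StockReduceSketch {

    // Returns the adjusted close to write, or null when nothing is written.
    static String reduce(Iterator<String> values) {
        int sum = 0;
        String adjClose = null;
        while (values.hasNext()) {
            String[] parts = values.next().split("~");    // count~adjClose~flag
            sum += Integer.parseInt(parts[0]);
            if (!parts[1].equals("Null")) {
                adjClose = parts[1];                      // came from a price record
            }
        }
        return (sum == 1 && adjClose != null) ? adjClose : null;
    }
}
```

In the Hadoop reducer, a non-null result r would be written with output.collect(key, new Text(r)).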
18. FINAL RESULT
• NYSE Daily A: 14 records, including 1 header
• NYSE Daily B: 39 records, including 1 header
• Dividends file: 22 records, including 1 header
Total: 75
19. FINAL RESULT
• Total: 75
• Matching records (price days that also had a dividend): 7
• Headers: 3
• Dividend records: 21
• Final output: 44 records
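The counts on the two result slides reconcile as follows, reading "Matching records" as price rows whose symbol and date also appear in the dividends file (our interpretation of the slide):

```latex
\begin{aligned}
14 + 39 + 22 &= 75 \quad \text{(input lines)}\\
(14 - 1) + (39 - 1) &= 51 \quad \text{(price records, headers removed)}\\
75 - \underbrace{3}_{\text{headers}} - \underbrace{21}_{\text{dividend records}} - \underbrace{7}_{\text{matching price records}} &= 44 = 51 - 7
\end{aligned}
```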
22. BUSINESS IMPLICATIONS
The daily closing stock prices are adjusted for dividend distributions and stock
splits, because dividends are part of the total return and both events would
otherwise distort historical volatility estimates.
The adjusted closing price is used primarily to build an accurate track
record of a stock's performance. Comparing a stock's historical adjusted
closing price with its current price gives the true rate of
return.
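As a rough illustration of that comparison, the sketch below computes the holding-period return and an annualized (compound) return from two adjusted closing prices. The class and method names are hypothetical, and the prices passed in are inputs you supply, not project data.

```java
// Sketch: returns computed from adjusted closing prices, which already
// fold dividends and splits into the price series.
class AdjustedReturn {

    // Holding-period return, e.g. 0.10 means +10% over the whole period.
    static double totalReturn(double adjCloseThen, double adjCloseNow) {
        return adjCloseNow / adjCloseThen - 1.0;
    }

    // Compound annual growth rate over the given number of years.
    static double annualizedReturn(double adjCloseThen, double adjCloseNow, double years) {
        return Math.pow(adjCloseNow / adjCloseThen, 1.0 / years) - 1.0;
    }
}
```

Because the historical series is adjusted backward, the most recent adjusted close normally equals the actual current close, so the "current price" can be used directly as adjCloseNow.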
Graphing the volatility history of the target firm alongside those of its
competitors and the market index can provide insights into risk and
comparative advantages (the frequency distribution of returns can also be used).
Historical stock price volatility may also have implications for business valuators.
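One common way to quantify that volatility history is the sample standard deviation of daily log returns, annualized by the square root of the number of trading days per year. The sketch below assumes 252 trading days per year, a market convention rather than something taken from the project; the class name HistoricalVolatility is hypothetical.

```java
// Sketch of a historical-volatility estimate from a series of adjusted closes:
// sample standard deviation of daily log returns, annualized with an
// ASSUMED 252 trading days per year.
class HistoricalVolatility {

    static double annualized(double[] adjCloses) {
        int n = adjCloses.length - 1;                     // number of daily returns
        if (n < 2) throw new IllegalArgumentException("need at least 3 prices");
        double[] r = new double[n];
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            r[i] = Math.log(adjCloses[i + 1] / adjCloses[i]);
            mean += r[i];
        }
        mean /= n;
        double ss = 0.0;
        for (double x : r) ss += (x - mean) * (x - mean);
        double dailySd = Math.sqrt(ss / (n - 1));         // sample standard deviation
        return dailySd * Math.sqrt(252.0);
    }
}
```

Computing this over a rolling window for the firm, its competitors and the index gives the volatility histories to graph side by side.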