KCSE 2015 Tutorial: Applying Big Data Analytics to Software Engineering

Big Data Analytics
Hype Cycle for Emerging Technologies, 2014.8
Data Science
Prescriptive Analytics
At the Peak
Big Data
Sliding Into the Trough
Gartner's top 10 strategic technologies for 2015 include Advanced, Pervasive and Invisible Analytics, Context-Rich Systems, and Smart Machines
Big data and analytics topics have been selected as strategic technologies since 2011 (2014.10)

BIG DATA CHARACTERISTICS
http://www.ibmbigdatahub.com/infographic/four-vs-big-data
2013

SKT
Data analysis based on call volume

KT
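KT base station data
Livestock-industry vehicle data from the Animal and Plant Quarantine Agency
Avian influenza (AI) spread routes and movement data of suspected carriers
3 billion records of KT call volume statistics
Route data held by the Seoul city government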

NETFLIX
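Produced the series “House of Cards” based on big data. Actors and the director were cast by analyzing data
Daily average: 30 million video playback records, 4 million user ratings, 3 million searches, plus location and device information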

GOOGLE
Self-driving cars process more than 1 GB of sensor data per second in real time

AGRICULTURE
Cows are fed a sensor called an ePill, and their vital signs are monitored in real time.
Provides insight into disease, nutritional status, and environmental effects such as heat stress

Ag Equipment: mobile networks
• Tractor and Implement(s) are acting as one network
• Always connected!
Cloud / Internet

AGRICULTURE, IOT, BIG DATA
Analytics
GPS
Thermostat
wireless (WiFi, Bluetooth, …)
wired
Drone
Biosensor
• Metering
• Temperature
• Humidity
• Water pH
• Soil moisture level
• Heart rate
• Rumination
• Wind speed
• Land dryness level
• Tractor-mounted computers
• Weather, Climate
• Rainfall, Drought
• Atmosphere
• Currents
• Ground water
• Map
• Soil
• Crops
• Irrigation
• Planting Recomm. (planting, fertilizing & harvesting time)
• Prediction of yields, disease outbreak
• Selection of Seeds
• Cattle management
• Sprinkler (when to spray)
• Herbicide, Clearing Weeds
• Data Service
• Data-driven planting advice (planting depth or the distance between crop rows)
• Price in Market
• Plant Height
• Weeds
Google Maps
National Weather Service
Image & Data
Internet Services
• Lack of water
• Food (preventing hunger)
• Increasing yields
Smartphone, pad
Tractor & Combine

GOOGLE’S ANNOUNCEMENT
Google Partners with Cloudera to Bring Cloud Dataflow to Apache Spark (2015.1.20)
http://techcrunch.com/2015/01/20/google-partners-with-cloudera-to-bring-cloud-dataflow-to-apache-spark/
Google Cloud Dataflow is a simple, flexible, and powerful system you can use to perform data processing tasks of any size.
https://cloud.google.com/dataflow/what-is-google-cloud-dataflow

“No More New Algorithm Implementations for Hadoop MapReduce”

LET’S TAKE A LOOK AT THE TRENDS

Open Source Processing Engine for Hadoop Data
AMPLab, UC Berkeley
Developed in 2009
Open Sourced in 2010

SPARK IS FAST
World record set for 100TB sort by open source and public cloud team
http://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html
Spark sorted the same data 3X faster using 10X fewer machines.
Daytona GraySort contest

WHY IS SPARK FASTER THAN HADOOP?
Hadoop MapReduce: reads from and writes to disk between every single iteration
Spark: keeps data in RAM as read-only datasets (RDDs)

SPARK
https://spark.apache.org/

SUMMARY
Data File: demands_201405
Data Size: 788.8 MB
Elapsed Time: 1 min 6 secs
# Nodes: 1 (Single)
Output File: UsageStats.txt
CPU: 2.4 GHz Intel Core i5
RAM: 8 GB

THE CODE IN SCALA
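// Split every CSV row into its fields.
val ami = sc.textFile("demands_201405").map(_.split(","))
// Number of distinct values of the part of field 1 before the first space.
val days = ami.map(x => x(1).split(" ")(0)).distinct().count()

// (field 3, field 8 as Double) pairs, grouped by field 3.
val ami0 = ami.map(x => (x(3), x(8).toDouble))
val ami1 = ami0.groupByKey()
// diff, lt and gt are helper functions that are not shown on the slide.
val ami2 = ami1.map(x => {
  val d = diff(x._2.foldLeft((0.0, 0.0))
    ((minmax, x) => (lt(minmax._1, x), gt(minmax._2, x))))
  (x._1, d, d / days)
})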
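The slide does not show the helpers `diff`, `lt` and `gt`. The sketch below is one plausible reading using plain Scala collections instead of Spark: it assumes `lt` and `gt` pick the smaller and larger value and `diff` subtracts them, and it seeds the fold with the first reading of each key. The `Reading` layout, the object name and the sample values are hypothetical.

```scala
object UsageRange {
  // Hypothetical reading layout: (meter id, day, reading value).
  type Reading = (String, String, Double)

  // For each meter: (max - min) of its readings, and that range divided by
  // the number of distinct days in the whole data set.
  def usage(readings: Seq[Reading]): Map[String, (Double, Double)] = {
    val days = readings.map(_._2).distinct.size
    readings.groupBy(_._1).map { case (meter, rs) =>
      val values = rs.map(_._3)
      // Assumption: fold a (min, max) pair over the readings, starting from
      // the first reading of the group (groups are never empty).
      val (lo, hi) = values.tail.foldLeft((values.head, values.head)) {
        case ((mn, mx), v) => (math.min(mn, v), math.max(mx, v))
      }
      val range = hi - lo
      meter -> ((range, range / days))
    }
  }
}
```

Calling `UsageRange.usage` on a handful of readings returns one `(range, range per day)` pair per meter, which corresponds to the `(x._1, d, d / days)` triple in the slide's code.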

WORD COUNT
/*
 * @(#)Figure.java
 *
 * Project:   JHotdraw - a GUI framework for technical drawings
 *            http://www.jhotdraw.org
 *            http://jhotdraw.sourceforge.net
 * Copyright: © by the original author(s) and all contributors
 * License:   Lesser GNU Public License (LGPL)
 *            http://www.opensource.org/licenses/lgpl-license.html
 */
 
package org.jhotdraw.framework; 
 
import org.jhotdraw.util.*; 
import org.jhotdraw.standard.TextHolder; 
 
import java.awt.*; 
import java.io.Serializable; 
 
/** 
* The interface of a graphical figure. A figure knows 
* its display box and can draw itself. A figure can be 
* composed of several figures. To interact and manipulate 
* with a figure it can provide Handles and Connectors.<p> 
* A figure has a set of handles to manipulate its shape or
(Figure,./Figure.java,1)
(java,./Figure.java,1)
(Project,./Figure.java,1)
(JHotdraw,./Figure.java,1)
(a,./Figure.java,1)
…
the: 68: ./Figure.java
figure: 56: ./Figure.java
a: 47: ./Figure.java
public: 45: ./Figure.java
to: 22: ./Figure.java
…
Map: (word, path, 1)
Reduce: (word, count, pathlist)
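The two phases can be sketched with plain Scala collections, without Hadoop or Spark. This is only an illustration of the data shapes named on the slide. The object and function names are hypothetical.

```scala
object WordCountSketch {
  // Map phase: one (word, path, 1) triple per word occurrence.
  def mapPhase(path: String, lines: Seq[String]): Seq[(String, String, Int)] =
    lines
      .flatMap(_.split("""\W+""").toSeq)
      .filter(_.nonEmpty)
      .map(word => (word, path, 1))

  // Reduce phase: per word, the summed count and the set of paths it occurs in.
  def reducePhase(triples: Seq[(String, String, Int)]): Map[String, (Int, Set[String])] =
    triples.groupBy(_._1).map { case (word, ts) =>
      word -> ((ts.map(_._3).sum, ts.map(_._2).toSet))
    }
}
```

Feeding the output of `mapPhase` for several files into one `reducePhase` call yields the (word, count, pathlist) records shown on the slide.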

HADOOP
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class LineIndexer {
  public static void main(String[] args) {
    JobClient client = new JobClient();
    JobConf conf = new JobConf(LineIndexer.class);
    conf.setJobName("LineIndexer");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(conf, new Path("input"));
    FileOutputFormat.setOutputPath(conf, new Path("output"));
    conf.setMapperClass(LineIndexMapper.class);
    conf.setReducerClass(LineIndexReducer.class);
    client.setConf(conf);
    try {
      JobClient.runJob(conf);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  // Reducer: joins all file names collected for a word into one comma-separated list.
  public static class LineIndexReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
      boolean first = true;
      StringBuilder toReturn = new StringBuilder();
      while (values.hasNext()) {
        if (!first) toReturn.append(", ");
        first = false;
        toReturn.append(values.next().toString());
      }
      output.collect(key, new Text(toReturn.toString()));
    }
  }

  // Mapper: emits (word, file name) for every token of every input line.
  public static class LineIndexMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    private final static Text word = new Text();
    private final static Text location = new Text();

    public void map(LongWritable key, Text val,
        OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
      FileSplit fileSplit = (FileSplit) reporter.getInputSplit();
      String fileName = fileSplit.getPath().getName();
      location.set(fileName);
      String line = val.toString();
      StringTokenizer itr = new StringTokenizer(line.toLowerCase());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, location);
      }
    }
  }
}
http://deanwampler.github.io/polyglotprogramming/papers/index.html

SPARK
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object InvertedIndex {
  def main(args: Array[String]) = {
    val sc = new SparkContext("local", "Inverted Index")
    sc.textFile("data/crawl").map { line =>
      // Each input line holds a path and its text, separated by a tab.
      val array = line.split("\t", 2)
      (array(0), array(1))
    }.flatMap {
      case (path, text) =>
        // Split the text on non-word characters.
        text.split("""\W+""") map { word => (word, path) }
    }.map {
      case (w, p) => ((w, p), 1)
    }.reduceByKey {
      case (n1, n2) => n1 + n2
    }.map {
      case ((w, p), n) => (w, (p, n))
    }.groupBy {
      case (w, (p, n)) => w
    }.map {
      case (w, seq) =>
        val seq2 = seq map { case (_, (p, n)) => (p, n) }
        (w, seq2.mkString(", "))
    }.saveAsTextFile(argz.outpath) // argz is not defined in this excerpt
    sc.stop()
  }
}

DON’T PROGRAM ON FRIDAYS, CVS
MSR 2005
Locate fix-inducing changes by linking a version archive to a bug database.
CVS <=> BUGZILLA
Link transactions to bug reports
Locate fix-inducing changes for bug 42233
The bug is finally fixed in revision 1.18
Candidate pairs (fix-inducing revision, fix revision): (1.14, 1.18) and (1.16, 1.18)
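A minimal sketch of the two steps on this slide, under simplifying assumptions: commits are linked to bug reports with a plain regular expression on the log message, and the revisions that last touched the lines changed by the fix are given as input (for example from an annotate command). The `Commit` record, the message convention and all function names are hypothetical. The linking heuristics of the original study are not shown on the slide.

```scala
object FixLinker {
  // Hypothetical commit record: revision id plus log message.
  final case class Commit(revision: String, message: String)

  // Assumed convention: log messages mention reports as "bug 42233" or "Bug #42233".
  private val BugRef = """(?i)bug\s*#?\s*(\d+)""".r

  // All bug ids mentioned in a log message.
  def bugIds(message: String): Set[Int] =
    BugRef.findAllMatchIn(message).map(_.group(1).toInt).toSet

  // Revisions whose log message mentions a known bug report, grouped by bug id.
  def fixesByBug(commits: Seq[Commit], knownBugs: Set[Int]): Map[Int, Seq[String]] =
    commits
      .flatMap(c => bugIds(c.message).filter(knownBugs).toSeq.map(id => id -> c.revision))
      .groupBy(_._1)
      .map { case (id, pairs) => id -> pairs.map(_._2) }

  // Pairs every revision that last touched a line changed by the fix with the fix revision.
  def candidatePairs(fixRevision: String, lastChangedBy: Seq[String]): Seq[(String, String)] =
    lastChangedBy.distinct.map(rev => rev -> fixRevision)
}
```

With fix revision 1.18 and lines last changed in revisions 1.14 and 1.16, `candidatePairs` returns the two pairs listed on the slide.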

Average sizes of fix and fix-inducing transactions
(ECLIPSE)
MOZILLA
Changes that later need rework (fix-inducing changes) are about three times as large as ordinary changes
ECLIPSE
P(Bug | Fix): the probability that a fix itself needs rework
P(Bug and Fix) / P(Fix) = the share of bug-fix changes that need rework (a fix for a fix)
Bug: fix-inducing changes; Fix: changes linked to the bug database
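The conditional probability above can be computed directly from labeled transactions. The sketch below assumes each transaction already carries two flags, one for "linked to the bug database" (Fix) and one for "fix-inducing" (Bug). The `Transaction` record and the function name are hypothetical.

```scala
object FixStats {
  // Hypothetical transaction record.
  // isFix: the change is linked to a bug report; isBug: the change is fix-inducing.
  final case class Transaction(id: String, isFix: Boolean, isBug: Boolean)

  // P(Bug | Fix) = P(Bug and Fix) / P(Fix); None when there are no fixes at all.
  def pBugGivenFix(txs: Seq[Transaction]): Option[Double] = {
    val fixes = txs.filter(_.isFix)
    if (fixes.isEmpty) None
    else Some(fixes.count(_.isBug).toDouble / fixes.size)
  }
}
```

Dividing the counts inside the set of fixes is equivalent to dividing the two probabilities, because the total number of transactions cancels out.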

QUESTIONS TRENDS, STACK OVERFLOW
Categorize Stack Overflow questions and rank them by importance
Categories of JavaScript-based discussions
Share of web related questions on Stack Overflow.
MSR ’14
4.13 million questions from January 2009 to December 2012, about 85,950 questions per month on average

Temporal trends in JavaScript-based discussions.
Mobile related questions on Stack Overflow.

REPUTATION SCORES, STACK OVERFLOW
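Quantifying developer reputation (Reputation = technical expertise and sustained effort)
How can a high reputation score be earned quickly?
MSR 2013
Building reputation in the community is a major motivation for contributing to Stack Overflow, and reputation is scored through answers to questions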

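Questions posted on weekends are more likely to be answered, and their answers are accepted at a higher rate - First Answer Interval, Accepted Ratios
From 23:00 to 05:00, answers take longer to arrive and fewer questions are posted - First Answer Interval, % of Questions
Charts: Unanswered Ratios, Accepted Ratios, First Answer Intervals, Percentage of Questions Posted
• Many questions concern .NET, OOP, and web development, so knowing these topics well helps raise reputation quickly. Focusing on topics with relatively few experts, such as Facebook, XCode, or mobile development, is another option.
• Being active while many experts are resting is another option (from 4 a.m. until just before 8 a.m.)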
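The accepted ratio and the first-answer interval can be derived from question records roughly as follows. This is a sketch with a hypothetical `Question` record, not the schema of the Stack Overflow data dump, and the mean is used for the interval only as a simple stand-in for whatever statistic the study reports.

```scala
object AnswerStats {
  // Hypothetical question record; firstAnswerMinutes is None while unanswered.
  final case class Question(hour: Int, weekend: Boolean,
                            firstAnswerMinutes: Option[Int], accepted: Boolean)

  // Share of questions with an accepted answer, for weekend (true) and weekday (false) posts.
  def acceptedRatio(qs: Seq[Question]): Map[Boolean, Double] =
    qs.groupBy(_.weekend).map { case (weekend, group) =>
      weekend -> group.count(_.accepted).toDouble / group.size
    }

  // Mean minutes until the first answer per posting hour, over answered questions only.
  def meanFirstAnswer(qs: Seq[Question]): Map[Int, Double] =
    qs.filter(_.firstAnswerMinutes.isDefined).groupBy(_.hour).map { case (hour, group) =>
      hour -> group.flatMap(_.firstAnswerMinutes).sum.toDouble / group.size
    }
}
```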

API RECOMMENDATION
Fields in a JIRA Issue - Summary, Description, Component, Reporter, Priority
Link Between A JIRA Issue And A Commit in A Version Control System
Takes the description of a feature request as input and recommends library API methods for implementing the feature
ASE 2013

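History Based
• Vectorize feature descriptions and compute vector similarity
• Compute the distance between the new feature description and the descriptions of existing Closed/Resolved features to find the K nearest neighbors
• Find the methods used by the K neighbors and compute a score for each method
Description Based
• Vectorize each method description in the API documentation
• Compute its similarity to the feature vector
Combines API recommendation based on the implementation history of past features with recommendation based on similarity to API method descriptions
Vectorization through text analysis
• Based on how often a word occurs in a document and on the number of documents that contain the word:
• 1) Define the similarity between a feature description and earlier feature descriptions
• 2) Define the similarity between a feature description and API descriptions
• A word is more important the more often it occurs in a document and the less common it is across all documents
• Sparse vector
New Feature description
Closed/Resolved Feature
APIs used
API description
New Feature description
KNN, K Nearest Neighbors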
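The vectorization and neighbor search described above can be sketched as TF-IDF weighting with cosine similarity. The slide does not give the exact weighting, so `tf * log(N / df)` is used here as one common choice. The tokenizer, the object name and the function names are assumptions for illustration only.

```scala
object TfIdf {
  // Sparse vector: term -> weight.
  type Vec = Map[String, Double]

  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("""\W+""").filter(_.nonEmpty).toSeq

  // One vector per document; weight = term count in the document * log(N / document frequency).
  def vectorize(docs: Seq[String]): Seq[Vec] = {
    val tokenized = docs.map(tokenize)
    val n = docs.size.toDouble
    val df: Map[String, Int] =
      tokenized.flatMap(_.distinct).groupBy(identity).map { case (t, ts) => t -> ts.size }
    tokenized.map { tokens =>
      tokens.groupBy(identity).map { case (t, ts) => t -> ts.size * math.log(n / df(t)) }
    }
  }

  // Cosine similarity of two sparse vectors; 0.0 if either vector has zero length.
  def cosine(a: Vec, b: Vec): Double = {
    val dot = a.iterator.map { case (t, w) => w * b.getOrElse(t, 0.0) }.sum
    val norm = math.sqrt(a.values.map(w => w * w).sum) * math.sqrt(b.values.map(w => w * w).sum)
    if (norm == 0.0) 0.0 else dot / norm
  }

  // Indices of the k candidates most similar to the query, most similar first.
  def nearest(query: Vec, candidates: Seq[Vec], k: Int): Seq[Int] =
    candidates.zipWithIndex
      .map { case (v, i) => (cosine(query, v), i) }
      .sortWith(_._1 > _._1)
      .take(k)
      .map(_._2)
}
```

In the history-based variant the candidates would be the vectors of Closed/Resolved feature descriptions, and in the description-based variant the vectors of API method descriptions. The new feature description has to be vectorized together with the candidates so that all vectors share the same document frequencies.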

Recall-Rate@5 - the probability that a relevant method is among the 5 recommended methods
Recall-Rate@10 - the probability that a relevant method is among the 10 recommended methods
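As a small illustration, Recall-Rate@k can be computed as the fraction of feature requests for which at least one relevant method appears among the top k recommendations. The input shape (a ranked list plus a set of relevant methods per request) and the names are assumptions.

```scala
object RecallRate {
  // cases: per feature request, the ranked recommendations and the set of relevant methods.
  // Returns the fraction of requests with at least one relevant method in the top k.
  def recallRateAt(k: Int, cases: Seq[(Seq[String], Set[String])]): Double =
    if (cases.isEmpty) 0.0
    else cases.count { case (ranked, relevant) =>
      ranked.take(k).exists(relevant)
    }.toDouble / cases.size
}
```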

MINING SOCIAL NETWORK
Builds a communication network from Postgres DB mailing list data
January, 1998 to February, 2006.
110,260 messages
4,075 unique email addresses => 3,293 unique “identities”
• Core developers are the key nodes of the social network (HIGH Out/In/Between)
• More active developers tend to be more important.
The social network is built from the sender/recipient information of the messages on the mailing list
An edge in the network means that 150 or more messages were sent or received
MSR 2006
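A rough sketch of how such a network could be assembled from sender/recipient pairs. It assumes directed edges and a per-pair message count that must reach a threshold (150 on the slide, a smaller value is handy for experiments). The `Message` record and the function names are hypothetical, and identity merging of email addresses is left out.

```scala
object MailNetwork {
  // Hypothetical message record reduced to sender and recipient identities.
  final case class Message(from: String, to: String)

  // Directed edges (from, to) with their message counts, kept only at or above the threshold.
  def edges(messages: Seq[Message], minMessages: Int): Map[(String, String), Int] =
    messages
      .groupBy(m => (m.from, m.to))
      .map { case (pair, ms) => pair -> ms.size }
      .filter { case (_, n) => n >= minMessages }

  // Number of outgoing edges per node.
  def outDegree(es: Map[(String, String), Int]): Map[String, Int] =
    es.keys.toSeq.groupBy(_._1).map { case (node, out) => node -> out.size }

  // Number of incoming edges per node.
  def inDegree(es: Map[(String, String), Int]): Map[String, Int] =
    es.keys.toSeq.groupBy(_._2).map { case (node, in) => node -> in.size }
}
```

Nodes with high out-degree and in-degree are the candidates for the "key nodes" mentioned above. Betweenness would need a shortest-path computation that is not part of this sketch.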

CONWAY’S LAW
Linux:
“The structure of a software system is a direct reflection of the structure of the development team” [Bowman et al. 99]
Conceptual architecture - system documentation
Ownership architecture - system documentation & source code repository log
Concrete architecture - source code structure
Figure panels: Conceptual architecture, Ownership architecture, Concrete architecture
The conceptual and concrete architectures differ considerably
The ownership architecture is closer to the concrete architecture
CASCON '98

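Predicted edges (E) - the number of predicted edges
Correct edges (K) - the number of predicted edges that are in the Concrete architecture
False negatives (M) - the number of edges that are not predicted but are in the Concrete architecture
False positives (V) - the number of edges that are predicted but are not in the Concrete architecture
According to Conway's law, the Ownership architecture determines the Concrete (as-built) architecture
=> More research on the Ownership architecture is needed
• The Ownership architecture also predicts the edges of the Concrete architecture well - its number of correct edges K is better
• The Ownership architecture has more false positives (over-estimation) - when a developer is involved in two or more modules, the developer's role in each module needs closer examination (for example, whether the code is always changed together)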
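The four counts defined above follow directly from set operations on the two edge sets. A minimal sketch, with edges modeled as pairs of module names. The type and object names are hypothetical.

```scala
object EdgeMetrics {
  // An edge is a (from module, to module) pair.
  type Edge = (String, String)

  final case class Result(predicted: Int, correct: Int, falseNegatives: Int, falsePositives: Int)

  // Compares a predicted architecture (conceptual or ownership) with the concrete one.
  def compare(predicted: Set[Edge], concrete: Set[Edge]): Result =
    Result(
      predicted = predicted.size,                     // E
      correct = (predicted & concrete).size,          // K
      falseNegatives = (concrete -- predicted).size,  // M
      falsePositives = (predicted -- concrete).size   // V
    )
}
```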

VISUALIZATION
ArgoUML - 140 kLOC Java system for handling UML diagrams
ICSE 2008

ArgoUML’s design problems

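MINING SE DATA
Mining
Process SE data into useful data to support data-driven decisions and actions
Find hidden patterns and trends that are not immediately visible
Empirical understanding of the software development history
Prediction and planning needed by the project
Identifying and responding to risks
Source code repository
Issue tracker (bugs, tasks)
Mail and documents
Execution logs
• When can we release? How much follow-up work would be needed if we released now? Is the product stabilizing?
• Can we meet the schedule at the current development pace? Are we short of developers?
• Is testing lagging so that development gets no real feedback?
• Is work piling up on particular developers?
• Does the number of bugs increase when a particular module is added?
• …
Product Manager
Engineering Manager
INSIGHT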

DATA AND ANALYSIS NEEDS
Survey of 110 Microsoft engineers and managers (57 managers, 53 engineers)
What factors influence your decision making process?
Importance of factors to decision making amongst
Importance of measuring artifacts
ICSE 2012

What indicators do you currently use?
What would you like to use?
indicators in making decisions (use or would use)

Data Analysis with Spark

SPARK SHELL
spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.0
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_25)

val file = "./Figure.java"
val (path, text) = (file, sc.textFile(file))

// Split each line on non-word characters and emit (word, path, 1), dropping empty tokens.
val wpclist = text.flatMap(x => x.split("""\W+""").map(word => (word, path, 1)).filter(_._1.length > 0))
wpclist.take(5).foreach(println(_))
(Figure,./Figure.java,1)
(java,./Figure.java,1)
(Project,./Figure.java,1)
(JHotdraw,./Figure.java,1)
(a,./Figure.java,1)

val wg = wpclist.groupBy(_._1) 
wg.take(5).foreach(x => println(x._1 + ": " + x._2.size)) 
 
FigureChangeListener: 6
package: 1
means: 1
call: 2
this: 11
// For each word, collect the set of paths it occurs in and sum its counts.
val wpc = wg.map(x => {
  val (word, wpclist) = x
  val (path, count) = wpclist.foldLeft((Set[String](), 0))((r, wpc) => (r._1 + wpc._2, r._2 + wpc._3))
  (word, path, count)
})
FigureChangeListener: ./Figure.java: 6
package: ./Figure.java: 1
means: ./Figure.java: 1
call: ./Figure.java: 2
this: ./Figure.java: 11
 
wpc.sortBy(_._3, false).take(10).foreach(x => println(x._1 + ": " + x._3 + ": " + x._2.mkString(", ")))
the: 68: ./Figure.java
figure: 56: ./Figure.java
a: 47: ./Figure.java
public: 45: ./Figure.java
to: 22: ./Figure.java

WORD COUNT
./JHotDraw60b1/src → SourceFile → File List (Figure.java, …) → WordCount
WordCount: (Word, Path, 1) … → (Word, { Path }, n) …

WORD COUNT
import org.apache.spark.rdd.RDD

object WordCount {
  // files: (path, lines of the file as an RDD) pairs.
  def analyze(files: List[(String, RDD[String])]) = {
    // (word, path, 1) for every non-empty token of every file.
    val wpc = files.flatMap(x => {
      val (path, code) = x
      code.flatMap(t =>
        t.split("""\W+""").map(word => (word, path, 1))
          .filter(_._1.length > 0)
      ).toLocalIterator
    })

    // Per word: the set of paths it occurs in and its total count, sorted by count (descending).
    wpc.groupBy(_._1).map(x => {
      val (key, wpclist) = x
      val (path, count) =
        wpclist.foldLeft((Set[String](), 0))((r, wpc) =>
          (r._1 + wpc._2, r._2 + wpc._3))
      (key, path, count)
    }).toList.sortWith(_._3 > _._3)
  }
}

import java.text.SimpleDateFormat
import java.util.Calendar

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object DistCodeAnalyzer {
  def main(args: Array[String]): Unit = {

    val conf = new SparkConf(true).setMaster("local")
      .setAppName("CodeAnalyzer")
    val sc = new SparkContext(conf)

    val dir = "./JHotDraw60b1/src"
    val (filelist, count, size) = (new SourceFile).read(dir, ".java")

    // One (relative path, RDD of lines) pair per source file.
    val data = filelist.map(x =>
      (x.substring(dir.length), sc.textFile(x)))
    val result = WordCount.analyze(data)

    println(dir + "; # of files - " + count + ", size - " + size)
    result.take(10).foreach(x => println(x._1 + ": " + x._3))
  }
}

import java.io.{File, IOException}
import java.nio.file.attribute.BasicFileAttributes
import java.nio.file.{Files, FileVisitResult, Path, SimpleFileVisitor}
import java.nio.file.FileVisitResult._

// Walks a directory tree and collects the files whose names end with a given extension.
class SourceFile extends SimpleFileVisitor[Path] {
  var filter = ""
  var files = List[String]()
  var count = 0
  var size = 0L

  def read(dir: String, ext: String): (List[String], Int, Long) = {
    filter = ext
    files = List[String]()
    count = 0 // reset together with files and size so that repeated calls start fresh
    size = 0
    val path: Path = new File(dir).toPath
    Files.walkFileTree(path, this)
    (files, count, size)
  }

  override def visitFile(file: Path, attr: BasicFileAttributes): FileVisitResult = {
    if (attr.isRegularFile() && file.toString.endsWith(filter)) {
      files = file.toString :: files
      size = size + attr.size()
      count = count + 1
    }
    CONTINUE
  }
}

BIG DATA SOFTWARE ENGINEERING
Big Data analytics is able to handle
- data volume (large data sets),
- velocity (data arriving at high frequency),
- variety (heterogeneous and unstructured data) and
- veracity (data uncertainty),
the so-called four Vs of Big Data.
Research on software analytics and mining software repositories has delivered promising results mainly focusing on data volume. However, novel opportunities may arise when leveraging the remaining three Vs of Big Data.
Examples include using streaming data (velocity), such as monitoring data from services and things, and combining a broad range of heterogeneous data sources (variety) to take decisions about dynamic software adaptation.

VOLUME
Ultra-large-scale software repositories, new library of Alexandria.
e.g. SourceForge (350,000+ projects), GitHub (250,000+ projects), and Google Code (250,000+ projects)
Domain-specific language and infrastructure to ease testing MSR-related hypotheses
Which open source license is used most?
Which projects still use DES encryption?
Code control policy (closed, open)?
How long does it take to fix critical bugs?
Code clone analysis for open source projects?
Evolutionary analysis of every commit since the birth of a project?
http://boa.cs.iastate.edu/
ICSE 2013

VELOCITY
Analytics, Continuous Data, Decision
Monitoring
Detection
Dashboard
Quick Actions
val ssc = new StreamingContext(sparkContext, Seconds(1))
val tweets = TwitterUtils.createStream(ssc, auth)
val hashTags = tweets.flatMap(status => getTags(status))
hashTags.saveAsHadoopFiles("hdfs://...")
hashTags DStream
[#cat, #dog, … ]
Usage Logs, Failure Logs
Security Threats Monitoring
IDE Editor Logs
Feedback
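The streaming snippet on this slide processes tweets in one-second batches. The idea of micro-batching can be imitated with plain Scala collections, as in the sketch below. It does not use Spark Streaming, and the `Tweet` record, the `getTags` implementation and the batch grouping are assumptions for illustration.

```scala
object HashTagBatches {
  // Hypothetical tweet record with an arrival time in milliseconds.
  final case class Tweet(timestampMs: Long, text: String)

  private val Tag = """#\w+""".r

  // All hashtags in a text, in order of appearance.
  def getTags(text: String): Seq[String] = Tag.findAllIn(text).toList

  // Groups tweets into fixed-length batches and extracts the hashtags of each batch.
  // Returns (batch index, hashtags) pairs ordered by batch index; empty batches are omitted.
  def batches(tweets: Seq[Tweet], batchMs: Long): Seq[(Long, Seq[String])] =
    tweets
      .groupBy(_.timestampMs / batchMs)
      .toSeq
      .sortBy(_._1)
      .map { case (batch, ts) => batch -> ts.flatMap(t => getTags(t.text)) }
}
```

Each element of the result plays the role of one batch of the `hashTags` stream, which a monitoring or dashboard component could then consume.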

VARIETY
Git & GitHub
StackOverflow
Jira
Documentation
E-mail
Sensor
Camera
Logs
Field Calls
+
Bug Detection from Camera - Black Box Machine Test Engineer
Bug Report Generation from Service Calls

DATA ANALYSIS AREA
ICSE 2014

“For each bug, at what stage in the development cycle was the bug found, at what stage was it introduced, at what stage could it have been found?”
“What is the net cost of refactoring legacy code so that it can be covered with unit tests, compared to leaving legacy code uncovered and making changes in it?”
“What are the best and worst practices [of] teams that miss deadlines, or slip beyond their release schedule call out? It would be great to note what practices teams adopt and/or cut when they are feeling pressured to meet certain deadlines.”
“Some means to get a relation defined between crashes found in real world and tests in our collateral which hit the same code path. Knowing we hit this code path but still we have a crashing bug in there would be good to know.”
“How many features of the s/w are used by people and at what percentage? It will help me understand what is the impact/ROI on adding a new complex feature in terms of the investment we put to add new feature vs. its impact on user.”
“Are there measurable differences in productivity between experienced developers with CS degrees and developers with unrelated or no degrees?”
Practice
Best Practice
Evaluating Quality
Customers and Requirements
Bug
Productivity
