1. Big Data as innovation
PDMA Masterclass Big data @CISCO - Amsterdam
Jurjen Helmus
University of Applied Sciences Amsterdam
or: Innovating with Big Data
2. @JRHelmus / Father & partner / Fiat X1/9 innovator / Lateral thinker / e-mobility big data researcher
Charge volume
Charge point address
Connection time
RFID
3. First remark
This presentation is partly based on the article below, available to members on the PDMA.nl website
4. Second remark
This talk tries to strike a balance between complexity and "for dummies"
So please say so if something is too simple or already familiar
6. Big Data is not the answer
(to a question that has not been asked)
7. Big Data is not directly innovation
(but it can lead to it)
8. Big Data is not one specific new method
(but an umbrella term)
9. Big Data is in essence not new
(but a convergence of 4 worlds)
1. Data generation: sensors, web 2.0, machine data
2. Data storage and processing: besides SQL also NoSQL storage (for non-structured data), streaming data, metadata
3. Data analysis: faster, more complex, better, but above all machine learning (1980s) and deep learning
4. Data visualization: faster, more flexible, more intuitive
Data generation
Storage and processing
Statistical analysis
Visualization
Source: Gartner.com
12. Big data analytics follows a clear methodology
Source: IDO-LAAD RAAKPRO proposal & Gartner
14. 6 typical analyses in relation to Big Data
Visualization of statistical techniques:
Cluster analysis, Classification, Regression, Sentiment analysis, Association rule learning, Neural network
Source: Tan, P.-N., Steinbach, M. and Kumar, V. (2005), Introduction to Data Mining, Pearson Education, Boston, MA
15. Machine learning algorithms in particular have developed strongly under the influence of enormous datasets
Illustration of a deep learning algorithm
deeplearning.stanford.edu/
21. Our dataset consists of >715,000 charge sessions from charge point operators in the 4 largest cities
Parameter | Example | Explanation
Charge point address | Admiralengracht 44 | Address of the charge point
Charge point operator | Nuon | Owner of the charge point
Charging service provider | Essent | Owner of the used charging card
Charge point city | Amsterdam |
Charge point postal code | 1057EW | ZIP code of the area of the charge point
Volume | 0,86 | Charged energy [kWh]
Connection time | 0:14:23 | Time the car was connected
Start Date | 18-04-2012 | Date the session started
End Date | 18-04-2012 | Date the session ended
Start Time | 23:20:55 | Time the session started
End Time | 23:35:18 | Time the session ended
Charging time | 0:14:23 | Time the car is actually charging
RFID | 60DF4D78 | RFID code of a charging card
Key fields: Charge volume, Charge point address, Connection time, RFID
The data is enriched with information from the municipality and Statistics Netherlands (CBS), such as parking zones, neighborhoods, and demographic and social information
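To make the record layout concrete, here is a minimal Python sketch that parses the example row from the table and derives the connection time from the start and end timestamps. The dictionary keys simply reuse the parameter names from the slide, and the function names `parse_duration` and `parse_session` are hypothetical; how the data is actually stored is not described in this deck.

```python
from datetime import datetime, timedelta


def parse_duration(text):
    """Parse an 'H:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def parse_session(raw):
    """Convert one raw charge-session record (all strings) into typed values."""
    fmt = "%d-%m-%Y %H:%M:%S"  # day-month-year, as in the example row
    start = datetime.strptime(f"{raw['Start Date']} {raw['Start Time']}", fmt)
    end = datetime.strptime(f"{raw['End Date']} {raw['End Time']}", fmt)
    return {
        "rfid": raw["RFID"],
        "postal_code": raw["Charge point postal code"],
        # The example uses a decimal comma ("0,86"), so convert it first.
        "kwh": float(raw["Volume"].replace(",", ".")),
        "start": start,
        "end": end,
        "connection_time": end - start,
        "charging_time": parse_duration(raw["Charging time"]),
    }


# The example row from the slide.
example = {
    "Charge point postal code": "1057EW",
    "Volume": "0,86",
    "Start Date": "18-04-2012",
    "End Date": "18-04-2012",
    "Start Time": "23:20:55",
    "End Time": "23:35:18",
    "Charging time": "0:14:23",
    "RFID": "60DF4D78",
}

session = parse_session(example)
print(session["connection_time"])  # 0:14:23
```

The derived connection time matches the "Connection time" value in the table, which is a quick consistency check on the timestamps.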
22. Making customer behavior measurable
Diagram relating the data fields (Charge point address, RFID, Connection date time) to measurable behavior at three levels: individual customer, meso level, macro level.
Labels in the diagram: social interaction, moment-bound behavior, recurring pattern, EV characteristics (EV range, max capacity), learning ability, charging speed, charging pattern, infrastructure, transition moment from unsteady to steady state, pattern relation, other users, arrival pattern, time ratio, location loyalty (honkvastheid), loyalty, weather sensitivity, wear, queues, local dynamics
23. Customer behavior can be described mathematically, from which customer groups emerge by means of cluster analysis
Sign | Explanation
Start time | Mean and standard deviation of the start time of the first charge session of the pattern. This is measured at the left side of the pattern, see Figure 5.
End time | Mean and standard deviation of the end time of the last charge session of the pattern. This is measured at the right side of the pattern, see Figure 5.
Duration | Mean and standard deviation of the connection time.
TBSweekdays | Mean and standard deviation of the time between two charge sessions during weekdays.
TBSweekends | Mean and standard deviation of the time between two charge sessions during weekends.
kWh | Two types of parameters are taken into account: the mean and standard deviation of the kWh charged, and the mean and standard deviation of the kWh charged divided by the largest charge session over all charge sessions. The latter discounts the effect of the car type.
Charging point volatility | Variability of the number of charging points per charge session, corrected by the available charging points per session. This parameter is used both in absolute and in relative form. Absolute is the mean number of charging points used per charge session. Relative takes into account the relevant available charging points per session for the specific EV user.
Time Ratio | Mean and standard deviation of the charging time divided by the connection time.
C,L,kWh | Correlation between the time between two charge sessions and the amount of kWh charged in the last session. For this parameter 0 is no correlation and 1 is maximum correlation.
C,S,TR | Correlation between the time ratio and the start time of the last session. For this parameter 0 is no correlation and 1 is maximum correlation.
Pattern type | Type of pattern as displayed in the example figures. The pattern is formed by the percentage of total connection hours per hour of the day.
Overview of the measurability of user behavior in ...
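As an illustration of how a few of these parameters could be computed for a single user, here is a minimal Python sketch. It is not the method used for the slides: the function name `user_features`, the input field names, the use of the population standard deviation, and the treatment of start time as a plain number of hours (which ignores wrap-around at midnight) are all assumptions made for the example. The resulting values per user would form the feature vector that a cluster analysis takes as input.

```python
from statistics import mean, pstdev


def mean_std(values):
    """Return (mean, population standard deviation) of an iterable of numbers."""
    values = list(values)
    return mean(values), pstdev(values)


def user_features(sessions):
    """Compute a few behavior parameters for the sessions of one user.

    Each session is a dict with hypothetical keys:
    start_hour (hour of day), connection_h and charging_h (hours), kwh.
    """
    largest = max(s["kwh"] for s in sessions)  # largest charge session of this user
    return {
        "start_time": mean_std(s["start_hour"] for s in sessions),
        "duration": mean_std(s["connection_h"] for s in sessions),
        "kwh": mean_std(s["kwh"] for s in sessions),
        # kWh relative to the largest session discounts the effect of the car type.
        "kwh_relative": mean_std(s["kwh"] / largest for s in sessions),
        "time_ratio": mean_std(s["charging_h"] / s["connection_h"] for s in sessions),
    }


# Made-up sessions for one user, for illustration only.
sessions = [
    {"start_hour": 18.0, "connection_h": 12.0, "charging_h": 3.0, "kwh": 9.0},
    {"start_hour": 19.0, "connection_h": 10.0, "charging_h": 2.5, "kwh": 6.0},
    {"start_hour": 17.0, "connection_h": 14.0, "charging_h": 3.5, "kwh": 12.0},
]

features = user_features(sessions)
for name, (avg, std) in features.items():
    print(f"{name}: mean={avg:.2f}, std={std:.2f}")
```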
24. Charge patterns are used for segmentation
At least six (car independent*) user types could be distinguished from the dataset
* Sub categories could be defined after taking PHEV/BEV differences into account
** user is regarded as visitor since all charge sessions occurred during weekends
Figure: charge pattern per user type, one panel each, with the hour of the day (0 to 23) on the horizontal axis. Panels: Commuter, Car sharing car, Early resident, Late resident, Visitor **, Taxi
Source: CHIEF database
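The pattern behind each panel is described earlier as the percentage of total connection hours per hour of the day. A minimal Python sketch of that calculation follows, assuming each session is available as a (start, end) pair of datetimes; the function name `hourly_pattern` is hypothetical and daylight saving time is ignored.

```python
from datetime import datetime, timedelta


def hourly_pattern(sessions):
    """Return 24 percentages: the share of all connection hours per hour of day.

    sessions is an iterable of (start, end) datetime pairs.
    """
    hours = [0.0] * 24
    for start, end in sessions:
        cursor = start
        while cursor < end:
            # Walk through the session one clock hour at a time.
            next_hour = cursor.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            segment_end = min(next_hour, end)
            hours[cursor.hour] += (segment_end - cursor).total_seconds() / 3600
            cursor = segment_end
    total = sum(hours)
    if total == 0:
        return hours
    return [100 * h / total for h in hours]


# One made-up session that crosses midnight: 23:30 to 01:30.
pattern = hourly_pattern([(datetime(2012, 4, 18, 23, 30), datetime(2012, 4, 19, 1, 30))])
print(pattern[23], pattern[0], pattern[1])  # 25.0 50.0 25.0
```

Plotting the 24 values per user gives a profile that can be compared with the panel shapes above.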
25. Example: taxi operators turn out to have a stabilizing usage pattern
Mean charge session size versus standard deviation and number of charge sessions at t=T
26. Example: the time ratio is a leading factor for V2X applications
Note:
1. In Amsterdam the non-smart charging points start charging immediately after connection
2. Slack exists only after charging has finished while the connection remains
3. To identify the maximum battery capacity, the data requires one session with a time ratio of 100% and one session with a time ratio << 100%
The time ratio is defined as the charge time divided by the connection time
27. Time Ratio is a leading factor for V2X applications
The time ratio is defined as the charge time divided by the connection time
Slack
Note:
1. Sessions with a time ratio << 100% are best suited for V2X applications
2. Sessions with a time ratio of 100% are not useful for V2X; these mostly occur in car sharing sessions
Time ratio of 100%: no slack for power delivery, postponing or slower charging, thus low V2X potential
Time ratio << 100%: slack for other charging modalities, thus high V2X potential
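The definition above translates directly into code. The following Python sketch computes the time ratio and the slack for a session; the function names are hypothetical, and the second example session is made up for illustration.

```python
from datetime import timedelta


def time_ratio(charge_time, connection_time):
    """Charge time divided by connection time (1.0 means charging the whole time)."""
    if connection_time <= timedelta(0):
        raise ValueError("connection time must be positive")
    return charge_time / connection_time  # timedelta / timedelta gives a float


def slack(charge_time, connection_time):
    """Time the car stays connected after charging has finished."""
    return connection_time - charge_time


# Example row from the dataset slide: charging time equals connection time.
full = time_ratio(timedelta(minutes=14, seconds=23), timedelta(minutes=14, seconds=23))
print(full)  # 1.0

# Made-up overnight session: 2 hours of charging while connected for 10 hours.
overnight = time_ratio(timedelta(hours=2), timedelta(hours=10))
print(overnight, slack(timedelta(hours=2), timedelta(hours=10)))  # 0.2 8:00:00
```

The first session leaves no slack, the second leaves eight hours in which charging could be postponed, slowed down, or reversed.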
28. Predictive V2X technology based on charging behavior reveals sweet spots for different applications
Figure: time ratio versus kWh charged for Amsterdam. Axes: time ratio (0 to 1) and kWh charged (0 to 90). Annotated regions: potential sweet spot for peak shaving, potential sweet spot for power delivery, high V2X potential, low V2X potential
The dispersion in the graph is indicative of the predictability of the time ratio.
Note: for this graph a subset of the data was used since not all charging times are present in the data
Source: CHIEF database
29. The avg kWh charged per session per user reveals several
potential clusters
Map of Amsterdam with avg kWh per user
Source: CHIEF database
30. Similar clusters were found for mean potential
kWh at start of session
Map of Amsterdam with mean potential kWh at start of session per user
Source: CHIEF database
31. Local mean time ratio displays a different pattern
Map of Amsterdam with mean time ratio per user
Low mean time ratios occur at different places than the previous slides display
Source: CHIEF database
Note: a slightly different dataset was used because of the data required to calculate the time ratio
Prescriptive analytics not only anticipates what will happen and when it will happen, but also why it will happen. Further, prescriptive analytics suggests decision options on how to take advantage of a future opportunity or mitigate a future risk and shows the implication of each decision option. Prescriptive analytics can continually take in new data to re-predict and re-prescribe, thus automatically improving prediction accuracy and prescribing better decision options.
Two forms of data use
= Data mining
= Problem-driven data analysis
http://youtu.be/pgaEE27nsQw
Proceedings of the 36th International Computers and Industrial Engineering Conference, C&IE 2006, June 2006, Taipei, Taiwan, pp.1-8.
Give several examples
Sweet spot for peak shaving when applied
Sweet spot for household power delivery
Avg kwh
-- Per RFID: the most visited postal code in Amsterdam and the mean time ratio there
SELECT po.RFID,
       usetype AS [Usertype],
       PostalCode AS [Most visited postalcode],
       [localTimeratio]
FROM (
    -- Sessions per RFID and postal code, with the mean time ratio per postal code
    SELECT TBL_ChargeSessions.RFID,
           PostalCode,
           COUNT(PostalCode) AS [Count of Postalcode],
           pb.[Max count],
           AVG([ChargeTime] / [ConnectionTime]) AS [localTimeratio]
    FROM TBL_ChargeSessions
    LEFT OUTER JOIN (
        -- Highest session count over all postal codes, per RFID.
        -- Note: this count is taken over all sessions, before the WHERE filter below.
        SELECT RFID, MAX([Count of postalcode]) AS [Max count]
        FROM (
            SELECT RFID, PostalCode, COUNT(PostalCode) AS [Count of postalcode]
            FROM TBL_ChargeSessions
            GROUP BY RFID, PostalCode
        ) AS ps
        GROUP BY RFID
    ) AS pb ON TBL_ChargeSessions.RFID = pb.RFID
    WHERE City = 'Amsterdam' AND kWh > 1 AND [ConnectionTime] > ChargeTime
    GROUP BY TBL_ChargeSessions.RFID, PostalCode, [Max count]
) AS po
LEFT OUTER JOIN TBL_RFID ON po.RFID = TBL_RFID.RIFD
WHERE [Count of Postalcode] = [Max count]
ORDER BY localTimeratio DESC
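For readers without access to the database, the core of the query above can be sketched in plain Python. This is an approximation under stated assumptions, not a translation: the field names are hypothetical, times are given in seconds, the most visited postal code is determined after filtering (the query counts over all sessions), ties keep the first postal code encountered, and the join that adds the user type is left out.

```python
from collections import defaultdict


def local_time_ratio(sessions, city="Amsterdam"):
    """Per RFID: most visited postal code and the mean time ratio at that postal code.

    Returns (rfid, postal_code, mean_time_ratio) tuples, highest ratio first.
    """
    ratios = defaultdict(list)  # (rfid, postal_code) -> time ratios of its sessions
    for s in sessions:
        # Same filter as the query: city, more than 1 kWh, connection longer than charge.
        if s["city"] == city and s["kwh"] > 1 and s["connection_s"] > s["charge_s"]:
            ratios[(s["rfid"], s["postal_code"])].append(s["charge_s"] / s["connection_s"])

    best = {}  # rfid -> (postal_code, session_count, mean_ratio)
    for (rfid, postal_code), values in ratios.items():
        if rfid not in best or len(values) > best[rfid][1]:
            best[rfid] = (postal_code, len(values), sum(values) / len(values))

    rows = [(rfid, postal_code, ratio) for rfid, (postal_code, _, ratio) in best.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up sessions for two hypothetical charging cards.
sessions = [
    {"rfid": "CARD_A", "city": "Amsterdam", "postal_code": "1057EW", "kwh": 8.0, "charge_s": 1800, "connection_s": 3600},
    {"rfid": "CARD_A", "city": "Amsterdam", "postal_code": "1057EW", "kwh": 5.0, "charge_s": 900, "connection_s": 3600},
    {"rfid": "CARD_A", "city": "Amsterdam", "postal_code": "1011AB", "kwh": 6.0, "charge_s": 3240, "connection_s": 3600},
    {"rfid": "CARD_B", "city": "Amsterdam", "postal_code": "1012CD", "kwh": 7.0, "charge_s": 2880, "connection_s": 3600},
    # Filtered out: less than 1 kWh and no slack.
    {"rfid": "CARD_B", "city": "Amsterdam", "postal_code": "1057EW", "kwh": 0.86, "charge_s": 863, "connection_s": 863},
]

result = local_time_ratio(sessions)
for row in result:
    print(row)
```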