44 times as much data in the next decade: 15 ZB expected in 2015.
Data silos (ERP, CRM, …)
Customers:
- Trimble (3 TB in their database system)
- Truvo (changing an index takes 24 hours)
Traditional systems cannot handle this volume.
How much data do you have?
Turn the 12 terabytes of tweets created each day into improved product sentiment analysis.
Convert 350 billion annual meter readings into better predictions of power consumption.
Real time: time-sensitive decision making.
- Fraud detection
- Energy allocation
- Marketing campaigns
- Market transactions
Solution: real-time systems in combination with batch (Hadoop); NoSQL systems.
Structured vs. unstructured: 80% of data is unstructured.
A key drawback of traditional relational database systems is that they are not good at handling variable data; a flexible data model is needed.
Word documents, email, photos, text, video, …
What are your needs regarding variety?
The end result: bringing structure into unstructured data.
Monitor hundreds of live video feeds from surveillance cameras to target points of interest.
Exploit the 80% data growth in images, video and documents to improve customer satisfaction.
We live in an ever-changing world: agility is an organization's ability to quickly modify its computing systems in response to changing business requirements.
Change control (adding or changing features of a system between releases).
Customization of specific data services.
How agile are you? How to be agile:
- Culture
- Systems
- Schema flexibility
It is easier to store all data in a cost-effective way. Compare this to the DWH (data warehouse) world.
The # of followers on Twitter = all follow and unfollow events combined.
Data = event
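The two notes above can be sketched in code: instead of storing a mutable follower count, store every follow/unfollow as an immutable event and derive the count by folding over the event log. This is a minimal illustration, not Twitter's actual data model; the event fields and function names are made up.

```python
from dataclasses import dataclass

# Hypothetical event records: each follow/unfollow is an immutable event
# rather than a mutation of a stored "follower_count" field.
@dataclass(frozen=True)
class FollowEvent:
    follower: str
    followee: str
    kind: str  # "follow" or "unfollow"

def follower_count(events, user):
    """Derive the current follower count by replaying all events."""
    followers = set()
    for e in events:
        if e.followee != user:
            continue
        if e.kind == "follow":
            followers.add(e.follower)
        else:
            followers.discard(e.follower)
    return len(followers)

events = [
    FollowEvent("alice", "bob", "follow"),
    FollowEvent("carol", "bob", "follow"),
    FollowEvent("alice", "bob", "unfollow"),
]
print(follower_count(events, "bob"))  # 1
```

The stored data is the raw events; the count is a view computed from them.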
Only CD, no more CRUD: records are only created (and eventually deleted), never updated in place. Information might of course still change. Fault tolerance.
Allows state regeneration, e.g. "What was my bank balance on 1 May 2005?"
Modelling queries as pure functions that take all data as input is the most general formulation. Different functions may look at different portions of the data and aggregate information in different ways.
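As a sketch of that formulation, here are two hypothetical queries as pure functions over the same master dataset, each looking at a different portion and aggregating differently. The dataset and function names are illustrative only.

```python
# The master dataset: raw, immutable pageview records (made up for illustration).
pageviews = [
    {"url": "/home", "user": "alice"},
    {"url": "/home", "user": "bob"},
    {"url": "/about", "user": "alice"},
]

def views_per_url(data):
    """Query 1: aggregate by URL."""
    counts = {}
    for pv in data:
        counts[pv["url"]] = counts.get(pv["url"], 0) + 1
    return counts

def unique_visitors(data):
    """Query 2: look only at the user field, aggregate as a distinct count."""
    return len({pv["user"] for pv in data})

print(views_per_url(pageviews))    # {'/home': 2, '/about': 1}
print(unique_visitors(pageviews))  # 2
```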
Too slow: the dataset might be petabyte-scale.
The batch layer can calculate anything (given enough time).
It doesn't have to be Hadoop. What matters here is a distributed file system combined with a processing framework.
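To make the "processing framework" half concrete, here is a toy, single-process map/reduce word count. It is only a sketch of the programming model: real batch layers (Hadoop, Spark, …) run the same two phases distributed across a cluster, reading input from a distributed file system.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit (key, value) pairs; here, (word, 1) for each word."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: combine all values per key; here, sum the counts."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

lines = ["big data", "big batch"]
print(reduce_phase(map_phase(lines)))  # {'big': 2, 'data': 1, 'batch': 1}
```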
Source: PolybasePass2012.pptx
In some circumstances.
In some circumstances.
CAP theorem:
- Consistency: all nodes see the same data at the same time.
- Availability: a guarantee that every request receives a response about whether it was successful or failed.
- Partition tolerance: the system continues to operate despite arbitrary message loss or failure of part of the system.
http://codahale.com/you-cant-sacrifice-partition-tolerance/
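The trade-off behind these three properties can be sketched as a toy replica that, during a network partition, either refuses writes it cannot replicate (giving up availability for consistency) or accepts them and risks divergence (giving up consistency for availability). This is an illustration of the idea only, not a real replication protocol.

```python
class Replica:
    """Toy replica: 'CP' sacrifices availability under partition, 'AP' sacrifices consistency."""
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (hypothetical labels)
        self.value = None
        self.partitioned = False  # True = cut off from the other replicas

    def write(self, value):
        if self.partitioned and self.mode == "CP":
            # Consistent choice: refuse rather than accept an unreplicated write.
            raise RuntimeError("unavailable: cannot reach quorum")
        # Available choice: accept the write; other replicas may now diverge.
        self.value = value
        return "ok"

ap = Replica("AP")
ap.partitioned = True
print(ap.write("x"))  # ok

cp = Replica("CP")
cp.partitioned = True
try:
    cp.write("x")
except RuntimeError as e:
    print(e)  # unavailable: cannot reach quorum
```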
E.g. unique counts.
Nimbus: manages the cluster.
Worker node:
- Supervisor: manages the workers; restarts them if needed.
- Worker: physical JVM process. Executes tasks (those are spread evenly across the workers).
- Task: runs in its own thread; is the actual Bolt or Spout. Processes the stream.
Tuple: named list of values; dynamically typed.
Stream: sequence of tuples.
Spout: source of streams; sometimes replayable.
Bolt: stream transformations; at least 1 input stream, 0 to many output streams.
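The spout/bolt model above can be imitated in a few lines of plain Python: a spout emits a stream of tuples and each bolt transforms the stream. The names are illustrative; real Storm topologies are written against the Storm API (typically in Java), with tuples routed between tasks across the cluster.

```python
def sentence_spout():
    """Spout: a source of tuples (here, one-field tuples carrying a sentence)."""
    for sentence in ["storm processes streams", "streams of tuples"]:
        yield (sentence,)

def split_bolt(stream):
    """Bolt: one input stream, one output stream of (word,) tuples."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)

def count_bolt(stream):
    """Bolt: aggregates the word stream into counts (no output stream)."""
    counts = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
    return counts

print(count_bolt(split_bolt(sentence_spout())))
# {'storm': 1, 'processes': 1, 'streams': 2, 'of': 1, 'tuples': 1}
```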