Data science | What is Data science

This video introduces data science for beginners.
It also explains the data science process, data science job roles, and the stages in a data science project.

  1. WHAT IS DATA SCIENCE? BY SHILPA KRISHNA, RESEARCH SCHOLAR
  2. Data Science Process
     - DISCOVERY
     - DATA PREPARATION
     - MODEL PLANNING
     - MODEL BUILDING
     - OPERATION
     - COMMUNICATE RESULTS
  3. DISCOVERY
     - Acquire data from all the identified internal and external sources that can help answer the business question.
     - The data can be:
       1. Logs from web servers
       2. Data gathered from social media
       3. Census datasets
       4. Data streamed from online sources using APIs
  4. DATA PREPARATION
     - Data can have many inconsistencies, such as missing values, blank columns, and incorrect data formats, which need to be cleaned.
     - You need to process, explore, and condition the data before modeling.
     - The cleaner your data, the better your predictions.
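The cleaning described on the data preparation slide can be sketched in plain Python. The records, the column names, and the policy of filling missing ages with the median are assumptions made for illustration; they do not come from the slides.

```python
from statistics import median

# Hypothetical raw records showing the problems named on the slide:
# a missing value, a blank column, and an incorrect data format.
raw_rows = [
    {"name": "Asha", "age": "34", "notes": ""},
    {"name": "Ben", "age": "", "notes": ""},
    {"name": "Chen", "age": "forty", "notes": ""},
    {"name": "Dina", "age": "28", "notes": ""},
]


def parse_age(text):
    """Return the age as an int, or None if it is missing or malformed."""
    try:
        return int(text)
    except ValueError:
        return None


def clean(rows):
    # Drop columns that are blank in every row.
    blank_columns = {
        key for key in rows[0]
        if all(row[key].strip() == "" for row in rows)
    }
    parsed = [
        {k: v for k, v in row.items() if k not in blank_columns}
        for row in rows
    ]
    # Convert ages to integers; unparseable values become None.
    for row in parsed:
        row["age"] = parse_age(row["age"])
    # Fill missing ages with the median of the known ages
    # (one possible policy among many).
    known = [row["age"] for row in parsed if row["age"] is not None]
    fill = median(known)
    for row in parsed:
        if row["age"] is None:
            row["age"] = fill
    return parsed


cleaned = clean(raw_rows)
```

Whether to fill, drop, or flag bad values depends on the project; the median is used here only because it is simple and not distorted by a few extreme values.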
  5. MODEL PLANNING
     - In this stage, you determine the methods and techniques used to draw out the relationships between input variables.
     - Model planning is carried out with statistical formulas and with tools such as SQL Analysis Services, R, and SAS/ACCESS.
  6. MODEL BUILDING
     - The data scientist splits the data into training and testing datasets.
     - Techniques like association, classification, and clustering are applied to the training dataset.
     - Once prepared, the model is tested against the "testing" dataset.
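The train/test workflow on the model building slide might look like the following minimal sketch. The nearest-class-mean classifier, the helper names, and the toy data are assumptions chosen to keep the example short; the slides do not prescribe a particular technique.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_centroids(train):
    """Learn one mean value per class label from the training rows."""
    sums, counts = {}, {}
    for value, label in train:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Classify a value by the closest class mean."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


# Hypothetical labelled data: (measurement, class label).
data = [(x, "low") for x in (1.0, 1.5, 2.0, 2.5, 3.0, 3.5)] + \
       [(x, "high") for x in (8.0, 8.5, 9.0, 9.5, 10.0, 10.5)]

train, test = train_test_split(data)
model = fit_centroids(train)  # fitted on the training rows only
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
```

The point of the split is that `accuracy` is measured on rows the model never saw during fitting, so it gives a fairer picture of how the model would behave on new data.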
  7. OPERATIONALIZE
     - You deliver the final baselined model with reports, code, and technical documents.
     - The model is deployed into a real-time production environment after thorough testing.
  8. COMMUNICATE RESULTS
     - The key findings are communicated to all stakeholders.
     - This helps you decide whether the results of the project are a success or a failure, based on the inputs from the model.
  9. MOST PROMINENT DATA SCIENTIST JOB TITLES ARE:
     1) Data scientist
     2) Data engineer
     3) Data analyst
     4) Statistician
     5) Data admin
     6) Business analyst
  10. Data Scientist
      - ROLE: A professional who manages enormous amounts of data to come up with compelling business visions by using various tools, techniques, methodologies, and algorithms.
      - LANGUAGES: R, SAS, Python, SQL, Hive, MATLAB, Pig, Spark
  11. Data Engineer
      - ROLE: Works with large amounts of data; develops, constructs, tests, and maintains architectures such as large-scale processing systems and databases.
      - LANGUAGES: SQL, Hive, R, SAS, MATLAB, Python, Java, Ruby, C++, Perl
  12. Data Analyst
      - ROLE: Responsible for mining vast amounts of data and looking for relationships, patterns, and trends in it. Later delivers compelling reports and visualizations of the analysis to support the most viable business decisions.
      - LANGUAGES: R, Python, HTML, JS, C, C++, SQL
  13. Statistician
      - ROLE: Collects, analyzes, and interprets qualitative and quantitative data by using statistical theories and methods.
      - LANGUAGES: SQL, R, MATLAB, Tableau, Python, Perl, Spark, Hive
  14. Data Administrator
      - ROLE: Ensures that the database is accessible to all relevant users, that it is performing correctly, and that it is kept safe from hacking.
      - LANGUAGES: Ruby on Rails, SQL, Java, C#, Python
  15. Business Analyst
      - ROLE: Improves business processes and acts as an intermediary between the business executive team and the IT department.
      - LANGUAGES: SQL, Tableau, Power BI, Python
  16. DEFINE THE GOAL
      - Define a measurable and quantifiable goal.
      - The goal should be specific and precise.
      - The aim is to come up with candidate hypotheses. These hypotheses can then be turned into concrete questions or goals for a full-scale modeling project.
  17. COLLECT AND MANAGE DATA
      - A time-consuming step
      - Conduct initial exploration and visualization of the data
      - Clean data: repair data errors and transform variables as needed
  18. BUILD THE MODEL
      The most common data science modeling tasks are:
      - Classification
      - Scoring
      - Ranking
      - Clustering
      - Finding relations
      - Characterization
  19. EVALUATE AND CRITIQUE MODEL
      Once you have a model, you need to determine if it meets your goals:
      - Is it accurate enough for your needs?
      - Does it perform better than the obvious guess?
      - Do the results of the model make sense in the context of the problem domain?
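One way to check whether a model performs better than the obvious guess is to compare it with a baseline that always predicts the most common label. The labels and predictions below are made up for illustration, and the majority-class baseline is only one reasonable reading of "the obvious guess".

```python
from collections import Counter


def accuracy(predictions, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predictions, actual)) / len(actual)


def majority_baseline(train_labels, n):
    """The 'obvious guess': always predict the most common training label."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return [most_common_label] * n


# Hypothetical labels and model output.
train_labels = ["no", "no", "no", "yes", "no", "yes", "no", "no"]
test_labels = ["no", "yes", "no", "no", "yes"]
model_predictions = ["no", "yes", "no", "yes", "yes"]

baseline_acc = accuracy(
    majority_baseline(train_labels, len(test_labels)), test_labels
)
model_acc = accuracy(model_predictions, test_labels)
beats_baseline = model_acc > baseline_acc
```

A model whose accuracy only matches the baseline has not learned anything the obvious guess did not already capture, however high the raw number looks.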
  20. PRESENT RESULTS AND DOCUMENT
      - Present results to your project sponsor and other stakeholders.
      - Document the model for those in the organization who are responsible for using, running, and maintaining the model once it has been deployed.
  21. DEPLOY MODEL
      - Make sure that the model can be updated as its environment changes.
      - The model can initially be deployed in a small pilot program.
  22. Several ways of gathering data for analysis are:
      - CSV file
      - Flat file (tab, space, or any other separator)
      - Text file (read all at once as a single file, or read line by line)
      - ZIP file
      - APIs (JSON)
      - Multiple text files (data is split over multiple text files)
      - File downloaded from the internet (file hosted on a server)
      - Web page (scraping)
      - RDBMS (SQL tables)
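Several of the sources listed above can be read with Python's standard library alone. The sketch below uses in-memory stand-ins instead of real files so that it runs anywhere; the file name `scores.csv`, the column names, and the values are hypothetical.

```python
import csv
import io
import json
import zipfile

# In-memory stand-ins for real files and for an API response body.
csv_text = "id,score\n1,0.5\n2,0.9\n"
tsv_text = "id\tscore\n3\t0.7\n"
json_text = '{"results": [{"id": 4, "score": 0.2}]}'

# CSV file: each row becomes a dict keyed by the header line.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Flat file with another separator: same reader, different delimiter.
tsv_rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Text file: read everything at once, or iterate line by line.
whole_text = io.StringIO(csv_text).read()
lines = [line.rstrip("\n") for line in io.StringIO(csv_text)]

# ZIP file: build a small archive, then read one member as text.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("scores.csv", csv_text)
with zipfile.ZipFile(buffer) as archive:
    with archive.open("scores.csv") as member:
        text_stream = io.TextIOWrapper(member, encoding="utf-8")
        zipped_rows = list(csv.DictReader(text_stream))

# API (JSON): parse the response body into Python objects.
api_rows = json.loads(json_text)["results"]
```

With real files, `io.StringIO(...)` would be replaced by `open(path, newline="")`. Note that the `csv` module returns every field as a string, so numeric columns still need converting.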
  23. RDBMS
      - A relational database stores data in tables; each row of a table is called a record.
      - Connections among records are established by using primary keys and foreign keys.
      - This allows users to establish defined relationships between tables.
      - In an RDBMS, we use SQL statements to query and analyze the data.
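The primary key and foreign key relationship can be tried out with Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are hypothetical, and SQLite stands in here for whichever RDBMS a project actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
""")

conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Asha"), (2, "Ben")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0)],
)

# Join the two tables through the key columns and total the orders.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.name
""").fetchall()

# An order that points at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The primary key identifies each record in `customers`, and the foreign key in `orders` guarantees that every order refers to a real customer, which is what makes the join reliable.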
  24. SOME COMMONLY USED PLOTS FOR EDA ARE:
      - Histogram
      - Scatter plots
      - Maps
      - Feature correlation plot (heatmap)
      - Time series plots
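Two of these plots rest on simple calculations: a histogram is a set of bin counts, and each cell of a correlation heatmap is a correlation coefficient between two features. The sketch below computes both without a plotting library. The `hours` and `scores` values are made up, and equal-width bins and Pearson correlation are assumed choices.

```python
from math import sqrt


def histogram_counts(values, n_bins):
    """Count values in n_bins equal-width bins spanning min..max."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical features: hours studied and exam score.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 71, 78, 85]

counts = histogram_counts(scores, 3)  # heights of the histogram bars
r = pearson(hours, scores)            # one cell of a correlation heatmap
```

In practice a plotting library would draw these, but seeing the underlying numbers makes it easier to read the resulting charts.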
  25. Data management platforms enable organizations and enterprises to use data analytics in beneficial ways, such as:
      - Personalizing the customer experience
      - Adding value to customer interactions
      - Improving customer engagement
      - Increasing customer loyalty
      - Reaping the revenues associated with data-driven marketing
      - Identifying the root causes of marketing failures and business issues in real time
