

Key Roles In Data-Driven Organisation

  1. Key Roles In Data-Driven Organisation Presented By: Durgesh Gupta, Mayura Zadane
  2. Agenda: ● Introduction ● Key Roles in a Data-Driven Organisation ○ Data Analyst ○ Data Engineer ○ Applied ML Engineering ■ Data Scientist ■ Statistician ■ Applied ML Engineer ■ Ethicist ■ Social Scientist ■ Researcher ○ Tech Lead ■ Analytics Manager ■ Decision Maker
  3. Introduction ● Data science jobs are among the most sought-after jobs of the 21st century, and demand for them is growing every day. ● Industry uses many different data science job titles. ● It is hard to get a general understanding of how these roles differ in skill sets and in what they work on. ● A brief overview of the key responsibilities and skills/qualifications for each title can help clarify the roles in the data science field.
  4. Introduction
  5. Key Roles in Data-Driven Organisation
  6. Data Engineer ● For many organizations, Data Engineers are the first hires on a data team. ● Data Engineers develop, construct, test, and maintain database architectures and data systems. ● They gather data from external sources, such as websites (through web scraping), APIs, or IoT devices, and ingest it into the data warehouse. ● Data Engineers create ETL (Extract, Transform, Load) processes to make sure that data reaches the data warehouse. ● They are responsible for building efficient data pipelines. Skill sets: ● Big data tools: Hadoop, Spark, Kafka, etc. ● SQL and NoSQL databases such as PostgreSQL, Cassandra, and MongoDB ● Programming languages: R, Python, C/C++ ● Cloud services
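The ETL responsibility described above can be illustrated with a minimal sketch that uses only Python's standard library. The CSV payload, the `readings` table, and the in-memory SQLite database (standing in for a real data warehouse) are hypothetical choices for illustration, not something taken from the slides.

```python
import csv
import io
import sqlite3

# Hypothetical raw export, e.g. what an API or a scraper might return as CSV.
RAW_CSV = """sensor_id,temp_c,recorded_at
s1, 21.5 ,2024-01-01T10:00
s2,,2024-01-01T10:00
s1,22.1,2024-01-01T11:00
"""

def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: strip whitespace, cast types, drop rows missing a reading."""
    clean = []
    for row in rows:
        temp = row["temp_c"].strip()
        if not temp:
            continue  # skip incomplete records
        clean.append((row["sensor_id"].strip(), float(temp), row["recorded_at"].strip()))
    return clean

def load(records, conn):
    """Load: write the cleaned records into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT, temp_c REAL, recorded_at TEXT)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT COUNT(*), AVG(temp_c) FROM readings").fetchone())
```

Keeping extract, transform, and load as separate functions mirrors the three ETL stages, so each stage can be tested or swapped (for example, a different source or warehouse) independently.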
  7. Data Analyst ● A Data Analyst collects and processes data, performs statistical analysis on it, and creates visualizations. ● Analysts perform feature engineering and feature selection, and clean data using programming languages, spreadsheets, and business intelligence tools in order to describe and categorize it. ● Analysts manage the collected master data, including creating, updating, and deleting records and handling confidential data. ● Analysts create reports and analyses, and provide expertise on data storage structures, data mining, and data cleaning. Skill sets: ● Structured Query Language (SQL) and databases ● Data mining and cleaning ● Data analysis and visualization ● R or Python programming ● Presentation skills
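The cleaning and descriptive-statistics work described for the Data Analyst can be sketched in a few lines of standard-library Python. The order records, the `region`/`amount` fields, and the cleaning rules (normalize labels, drop missing amounts) are hypothetical examples chosen for illustration.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical order records: None marks a missing amount, and the
# region labels are inconsistently formatted, as real data often is.
orders = [
    {"region": "North", "amount": 120.0},
    {"region": "north ", "amount": 80.0},
    {"region": "South", "amount": None},
    {"region": "South", "amount": 200.0},
    {"region": "EAST", "amount": 100.0},
]

def clean(records):
    """Normalize category labels and drop rows with a missing amount."""
    return [
        {"region": r["region"].strip().title(), "amount": r["amount"]}
        for r in records
        if r["amount"] is not None
    ]

def describe(records):
    """Summarize the amounts and count records per category."""
    amounts = [r["amount"] for r in records]
    return {
        "count": len(amounts),
        "mean": mean(amounts),
        "median": median(amounts),
        "stdev": stdev(amounts),  # sample standard deviation
        "by_region": Counter(r["region"] for r in records),
    }

summary = describe(clean(orders))
print(summary)
```

Normalizing labels before counting matters: without `strip().title()`, "North" and "north " would be reported as two different regions.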
  8. Applied ML Engineering
  9. Statistician ● Statisticians are professionals who apply statistical methods and models to real-world problems. ● They gather, analyze, and interpret data to support business decision-making. ● Statisticians are valuable employees in a range of industries and often seek roles in areas such as business, health and medicine, government, the physical sciences, and the environmental sciences. ● Daily tasks are likely to include: ○ Collecting, analyzing, and interpreting data ○ Identifying trends and relationships in data ○ Designing processes for data collection ○ Communicating findings to stakeholders ○ Advising on organizational and business strategy ○ Assisting in decision making Skill sets: ● Statistical theory and methods ● Data mining & machine learning ● Distributed computing (Hadoop) ● Databases (SQL and NoSQL) ● Programming: R, Python, Spark
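"Identifying trends and relationships in data" often starts with a correlation coefficient and a fitted trend line. The sketch below computes the Pearson correlation and an ordinary least-squares line by hand; the ad-spend and sales figures are made-up values for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def linear_trend(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical data: monthly ad spend (x) vs. sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(spend, sales)
slope, intercept = linear_trend(spend, sales)
print(f"r = {r:.3f}, sales = {slope:.2f} * spend + {intercept:.2f}")
```

A high r indicates a strong linear relationship, not that one variable causes the other; establishing causation would require a designed experiment or additional analysis.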
  10. Applied ML Engineer ● The work of a Machine Learning Engineer is to bridge the gap between the data scientist's work and the production environment. ● A Machine Learning Engineer is primarily concerned with deploying production-ready models. ● Removes errors from data sets and finds suitable data representations. ● Deploys machine learning models so that they can be integrated into an application or website. ● Scales and optimizes models for production. ● Monitors and maintains deployed models. Skill sets: ● Probability & statistics ● Data modeling and evaluation ● MLOps ● Applying machine learning algorithms and libraries (TensorFlow, PyTorch) ● Software engineering and system design (AWS, Azure, GCP)
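One common way to monitor a deployed model is to check whether the inputs it sees in production have drifted away from the training data. The sketch below computes the Population Stability Index (PSI), one widely used drift measure, for a single feature; the sample values, bin edges, and the 0.25 alert threshold are illustrative assumptions rather than anything prescribed by the slides.

```python
from bisect import bisect_right
from math import log

def bin_proportions(values, edges, eps=1e-6):
    """Share of values falling in each bin defined by sorted `edges`."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        i = min(max(i, 0), len(counts) - 1)  # clamp out-of-range values into end bins
        counts[i] += 1
    # eps keeps empty bins from producing log(0) or division by zero
    return [max(c / len(values), eps) for c in counts]

def psi(expected, actual, edges):
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    e = bin_proportions(expected, edges)
    a = bin_proportions(actual, edges)
    return sum((ai - ei) * log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical values of one feature at training time and in production.
train_values = [3, 5, 8, 12, 14, 15, 18, 22, 25, 27]
prod_values = [15, 18, 21, 22, 24, 25, 26, 27, 28, 29]
edges = [0, 10, 20, 30]

score = psi(train_values, prod_values, edges)
# Assumed threshold: a commonly quoted rule of thumb treats PSI > 0.25 as a major shift.
print(f"PSI = {score:.3f}", "-> investigate/retrain" if score > 0.25 else "-> stable")
```

Identical distributions give a PSI of 0, and the value grows as production data moves away from the training distribution, which makes it easy to wire into an automated alert.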
  11. Data Scientist ● A Data Scientist works from the visualizations provided by the data analytics team to build and optimize classifiers using machine learning techniques. ● Thoroughly cleans data to discard irrelevant information and prepares it for preprocessing and modeling. ● Performs exploratory data analysis (EDA) to determine how to handle missing data. ● Develops new algorithms to solve problems and builds programs to improve current strategies. ● Performs feature engineering and feature selection, and applies analytical, machine learning, and statistical methods to prepare data for predictive and prescriptive modeling. Skill sets: ● Programming: Python, Java ● Applying machine learning algorithms and libraries (scikit-learn, TensorFlow, PyTorch) ● Predictive modeling ● Mathematics and statistics ● Effective communication
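The EDA step for missing data can be sketched as two parts: measure how much of each column is missing, then decide per column whether to drop it or fill the gaps. The rows, column names, the 50% drop threshold, and the median/mode fill strategy below are all hypothetical choices for illustration; real projects pick these rules based on the EDA findings.

```python
from statistics import median, mode

# Hypothetical tabular data: each row is a dict; None marks a missing value.
rows = [
    {"age": 34, "income": 52000.0, "city": "A", "coupon": None},
    {"age": None, "income": 61000.0, "city": "B", "coupon": None},
    {"age": 45, "income": None, "city": None, "coupon": "X"},
    {"age": 29, "income": 48000.0, "city": "A", "coupon": None},
]

def missing_report(data):
    """EDA step: fraction of missing values per column."""
    return {c: sum(r[c] is None for r in data) / len(data) for c in data[0]}

def impute(data, max_missing=0.5):
    """Drop columns missing more than `max_missing` of their values; fill the
    rest with the median (numeric columns) or the most frequent value."""
    fills = {}
    for col, frac in missing_report(data).items():
        if frac > max_missing:
            continue  # too sparse to impute reliably, so drop the column
        present = [r[col] for r in data if r[col] is not None]
        if all(isinstance(v, (int, float)) for v in present):
            fills[col] = median(present)
        else:
            fills[col] = mode(present)
    return [{c: (r[c] if r[c] is not None else fills[c]) for c in fills} for r in data]

print(missing_report(rows))
print(impute(rows))
```

The median is used for numeric columns because, unlike the mean, it is not pulled around by a few extreme values.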
  12. Ethicist ● Data ethics is a cross-cutting discipline that assesses the wider societal impact of technology and produces recommendations for technologists and data professionals. It involves thinking about fairness, accountability, the law, moral dilemmas, and the risks involved in creating technology and data products and policies. ● Having Data Ethicists on teams enables Data Engineers and Data Scientists to innovate responsibly and to respond to the ongoing demand for data ethics best practices. ● This role has gained traction in the private sector in recent years, particularly in the development of high-risk data and artificial intelligence (AI) products. ● Skill sets: ○ Communication skills (data) ○ Applied knowledge of the social sciences ○ Stakeholder relationship management ○ Analysis and synthesis (data ethics) ○ Bridging the gap between the technical and non-technical (data ethics) ○ Product development (data ethics) ○ Empathy and inclusivity ○ Ethics and privacy ○ Problem-solving ○ Facilitating decisions and managing risks
  13. Social Scientist ● AI has the potential to bring diverse benefits to our health, safety, and general well-being. ● A Social Scientist researches the link between AI and its societal impact. ● They can identify potential uses of AI by considering the societal implications of these technologies. ● Such individuals may be especially well equipped to spot problems in AI that aggravate long-ingrained prejudices. ● They have the relevant domain knowledge for the problem that AI is being used to solve.
  14. Researcher ● AI researchers conceptualize and explore new ways of leveraging data by developing new AI algorithms, i.e., they create and ask new questions that can be answered using AI. ● AI researchers focus on finding innovative ways to analyze data for automated decision-making and action. ● AI researchers study novel forms of AI technology to create new applications that use data to drive autonomous actions. ● Skill set: ○ AI programming skills: This goes without saying, but coding skills are a given for any professional in the AI and data science domain. Commonly used programming languages for AI development include Python, Lisp, Prolog, R, C/C++, and Java. Of these, Python is the most preferred by both tech companies and AI researchers themselves, possibly because of its ease of use. ○ Analytical thinking: Since artificial intelligence is closely intertwined with data analysis, analytical skills are necessary for aspiring AI researchers. Good analytical skills translate into the ability to ■ make sense of data ■ verify the validity of the data gathered ■ identify connections between different variables, and ■ form logical conclusions based on the available data.
  15. Tech Lead Roles
  16. Analytics Manager ● The complete analytics cycle revolves around the enterprise's goals. ● Identify the key business variables that the analysis needs to predict. ● Define the project goals by asking and refining "sharp" questions that are relevant, specific, and unambiguous. ● Find the relevant data that helps answer the questions that define the project's objectives. ● An Analytics Manager manages a team of analysts and data scientists. Skill sets: ● R, Python, SQL, SAS, Java programming ● Leadership & project management ● Data mining & predictive modeling ● Interpersonal communication
  17. Decision Maker ● Real-world data sets are often noisy, have missing values, or contain a host of other discrepancies. ● The aim is to produce a clean, high-quality data set whose relationship to the target variables is understood. ● Develop a solution architecture for a data pipeline that refreshes and scores the data regularly.
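A pipeline that "refreshes and scores the data regularly" can be reduced to one function that cleans the newest batch and scores every remaining row, which a scheduler then runs on each refresh. The sketch below assumes a hypothetical pre-trained logistic model (the `WEIGHTS` and `BIAS` values are invented) and simple cleaning rules (drop duplicate IDs and rows with missing features); it illustrates the shape of such a pipeline, not a production design.

```python
from math import exp

# Hypothetical pre-trained logistic model: weights and bias are made up.
WEIGHTS = {"visits": 0.4, "days_since_last_order": -0.05}
BIAS = -1.0
REQUIRED = tuple(WEIGHTS)

def clean_batch(raw):
    """Drop rows without an ID, duplicate IDs, and rows with missing or non-numeric features."""
    seen, clean = set(), []
    for row in raw:
        if "id" not in row or row["id"] in seen:
            continue
        if not all(isinstance(row.get(f), (int, float)) for f in REQUIRED):
            continue
        seen.add(row["id"])
        clean.append(row)
    return clean

def score(row):
    """Map the linear model's output to (0, 1) with the logistic function."""
    z = BIAS + sum(WEIGHTS[f] * row[f] for f in REQUIRED)
    return 1.0 / (1.0 + exp(-z))

def refresh_and_score(raw):
    """One pipeline run: clean the latest batch, then score every row."""
    return {row["id"]: round(score(row), 3) for row in clean_batch(raw)}

# A scheduler (for example, a cron job) would call refresh_and_score on each new batch.
batch = [
    {"id": "c1", "visits": 10, "days_since_last_order": 5},
    {"id": "c1", "visits": 10, "days_since_last_order": 5},    # duplicate
    {"id": "c2", "visits": None, "days_since_last_order": 40},  # missing value
    {"id": "c3", "visits": 2, "days_since_last_order": 60},
]
print(refresh_and_score(batch))
```

Keeping cleaning and scoring inside one idempotent function means each scheduled run produces the same scores for the same input batch, which makes the pipeline easier to rerun and audit.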