Slides for my talk on machine learning and data science, with working examples and an accompanying repo containing reproducible code and data sets.
The science of statistics deals with the collection, analysis, interpretation, and presentation of data. We see and use data in our everyday lives. Statistical significance is the measure of whether the results of research were due to chance. The more statistical significance assigned to an observation, the less likely it is that the observation occurred by chance.
Get details on what machine learning is about and its applications in computational biology, web search, finance, e-commerce, social media and much more.
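To make the idea of statistical significance concrete, here is a minimal sketch of a permutation test, one common way to estimate how likely an observed difference is to arise by chance. It is not taken from the slides; the function name `permutation_p_value` and the two sample groups are made up for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate how often a difference in means at least as large as the
    observed one appears when the group labels are shuffled at random."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a

    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical measurements, invented for this example.
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
treated = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2]

p_value = permutation_p_value(control, treated)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value means random relabeling rarely reproduces a gap as large as the one observed, which is what "less likely to have occurred by chance" refers to.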
Just finished a basic course on data science (highly recommend it if you wish to explore what data science is all about). Here are my takeaways from the course.
BigMLSchool: ML in the Healthcare Industry (BigML, Inc)
Text Analysis: Discovering Insights for the Healthcare Industry.
Learn how Machine Learning helps discover insights for the Healthcare industry by analyzing text. The lecturer is Tomáš Kliegr, Associate Professor at the Department of Information and Knowledge Engineering at Prague University of Economics and Business (VSE).
*Machine Learning School for Business Schools 2021: Virtual Conference.
This course provides an overview of data analytics and business intelligence. It teaches students how to analyze and tell stories with data, which is an in-demand skill as data collection increases. The course covers topics such as SQL, data modeling, Power BI, data warehousing, and big data to prepare students for careers as data analysts. Upon completing the hands-on training, students will feel confident to begin work in the industry analyzing, extracting, transforming, and loading data according to business requirements.
WiTPeu meetup: Analytics con Andrea Villanes (witperu)
The document promotes an analytics meetup event. It discusses the goals of analytics, which include deriving insights from vast amounts of data and communicating those insights to support decision making. It also discusses the high demand for analytics talent and profiles data scientist careers. A variety of industries that can benefit from analytics are listed. Text mining techniques like categorization and cluster analysis are briefly explained. Finally, the document advertises a Master's in Analytics program.
This document discusses data science and data scientists. It defines data science as using scientific methods and processes to extract knowledge and insights from structured and unstructured data. Data scientists are analytical experts who use technical skills and curiosity to solve complex problems by straddling both business and IT. They have skills in mathematics, technology, and business strategy. As data has become more valuable, data scientist roles have evolved from statisticians and analysts to help organizations gain insights from large data sources. Managers should learn to identify data science talent to make their organizations more productive by adding data-driven insights.
Want to pursue a career in Data Science? Only aware of a limited set of opportunities? Don't worry!
This e-book helps readers learn about the top career opportunities one can pursue in Data Science. Further info: https://www.henryharvin.com/business-analytics-course-with-python
In this talk you will see how Asw:maximus, a real-time decision engine, can be applied to decision making in sales departments. You will get a chance to see state-of-the-art algorithms for time series, as well as a simulation of sales using these algorithms.
Attend The Data Science Course in Bangalore From ExcelR. Practical Data Science Course in Bangalore Sessions With Assured Placement Support From Experienced Faculty. ExcelR Offers The Data Science Course in Bangalore.
This document provides an introduction to analytics and data science. It defines analytics as the use of data, analysis, modeling, and fact-based management to drive decisions and actions. The benefits of analytics include better understanding of business dynamics, improved performance, and stronger decision making. Analytics can provide competitive advantages by exploiting unique organizational data. However, analytics may not be practical when there is no time or data, or when decisions rely heavily on experience. Becoming a data scientist requires skills in statistics, programming, communication, and more.
Predictive analytics uses past data and statistical analysis techniques to create models that predict future outcomes and trends. It has applications like LinkedIn connection recommendations, Amazon product recommendations, and Netflix movie recommendations. Good predictive models rely on robust data from the past and present, accounting for factors like customer purchase history, and require ongoing evaluation of assumptions and changing trends to maintain accuracy over time.
This use case showcases how Machine Learning can help you understand your customers to better develop personalized relationships. The lecturer is Arturo Moreno, Associate Professor at ICADE Business School, and a technology entrepreneur, investor, and innovative leader working on the intersection of venture capital and Machine Learning.
*Machine Learning School for Business Schools 2021: Virtual Conference.
This document provides an overview of the steps involved in data analysis by comparing it to cooking. It outlines key steps such as picking a question to analyze, identifying appropriate data sources, cleaning the data, using tools to analyze the data, and presenting the final results. Specific data sources that could be relevant for adult education programs are also mentioned, such as ASISTS, Census data, and other government open data portals. Examples of sample questions that could be analyzed and specific exercises for cleaning employment status data are provided. The document emphasizes that data analysis is an iterative process and encourages sharing results through blogs and presentations.
This document provides tips for aspiring data scientists. It advises them to start by focusing on a topic that interests them and to clearly define their objectives and data collection process. It also recommends that they visualize their data, understand the context, look for additional insights, evaluate results, and find effective uses of the data. The document notes that data is becoming increasingly important in all industries and companies without data-savvy managers will be at a disadvantage.
Prediction of company bankruptcy. Learn how Machine Learning finds insights into the Czech business landscape, presented by Lucie Beranová, Ph.D. student at Prague University of Economics and Business (VSE) and Data Scientist at Vodafone.
*Machine Learning School for Business Schools 2021: Virtual Conference.
This document provides an overview of the steps involved in conducting a data analysis project, likening it to a cooking process. It outlines 10 key steps: 1) defining the question and scope, 2) identifying relevant data sources, 3) assessing data accuracy and validity, 4) cleaning the data, 5) choosing analysis tools, 6) preparing the data, 7) analyzing the data, 8) presenting the results, 9) iterating based on findings, and 10) sharing results. Specific data sources like ASISTS, Census data, and tools like Excel are also discussed. The document aims to provide guidance on executing an effective data analysis from start to finish.
This document is a resume for Grace Han Yang that summarizes her education and experience in business analytics. She received a Master's degree in Business Analytics from University College London, and a Bachelor's degree in Business Intelligence and Information Management from Southwestern University of Finance and Economics in China. Her experience includes data analysis internships at the City Council of Croydon and EDF Energy, where she conducted analytics, visualization, and machine learning projects on large datasets.
This document provides an overview and introduction to big data. It discusses the technical challenges of big data including issues of volume, variety, velocity and veracity. It also discusses solutions like Hadoop, MapReduce, and big data databases. Additionally, it covers big data analytics including different levels of analytics maturity and techniques like data mining, machine learning, and predictive analytics. Finally, it provides resources for learning more about big data including online courses, sandbox environments, open source tools, and public datasets.
Help Me, Help You: Supporting Your Data (Data Con LA)
Data Con LA 2020
Description
Understand the data product lifecycle and ensure your data is set up for success
In order to get the most out of your data team, understanding the infrastructure needs at every step of the data product lifecycle is imperative. In my presentation we'll cover:
- Collect the Right Data: Collect what you want in the future not where you are now
- Silo to Warehouse: Consolidating disparate data sources and establish source of truth
- Setting Your Team Up for Success: Development Platform and DataOps
- Don't Forget to A.I.M. - Thinking about product adoption, implementation, and monitoring
- So What? - Tracking impact and making the case for more data
Speaker
Kisa Brostrom, boodleAI, Vice President of Data
Data science involves analyzing data to extract meaningful insights. It uses principles from fields like mathematics, statistics, and computer science. Data scientists analyze large amounts of data to answer questions about what happened, why it happened, and what will happen. This helps generate meaning from data. There are different types of data analysis including descriptive analysis, which looks at past data, diagnostic analysis, which finds causes of past events, and predictive analysis, which forecasts future trends. The data analysis process involves specifying requirements, collecting and cleaning data, analyzing it, interpreting results, and reporting findings. Tools like SAS, Excel, R and Python are used for these tasks.
This document provides an overview of investing in AI-driven startups. It outlines Dr. Roy Lowrance's background working with machine learning systems and startups. It then lists 100 AI startups that have raised over $11.7 billion total. The agenda covers an overview of AI, machine learning and big data, the life cycle of AI projects, and sustainable competitive advantages for AI-based startups.
Analytics is Taking over the World (Again) - UKOUG Tech'17 (Rittman Analytics)
Mark Rittman presented at the UKOUG Tech'17 Conference in December 2017. He discussed how analytics has changed business models and driven disruption twice already. The first wave focused on using analytics for operational efficiency. The second wave saw companies like Amazon, Netflix, and Uber build entirely new data-driven business models. Now, a third wave is underway where analytics and machine learning are being embedded into all business applications and fueling new personalized, data-enriched offerings.
As business owners and execs, as product managers and sales people, we are surrounded by big data. Yet, we have big questions about our customers that we still don't have the answers to. We know a lot about what people are doing but not really the underlying reasons why. To get at that why you need to leverage the power of SMALL data.
Ten years after the term ‘Big Data’ infiltrated the world of marketing, why is it still complex to embed it in the decision-making process? In this webinar, we delve into exploiting data and analytics in favor of your business.
I show some methods for extracting value from your marketing analytics data using modelling techniques. Topics include:
● Sales Forecasting
● What’s in your customers’ shopping carts?
● What are your customers searching for?
Python source code for all the charts is available on Ayima's GitHub:
● https://github.com/Ayima/google-merch-data-mining
● https://github.com/Ayima/onsite-search-data-mining
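As a rough illustration of the shopping-cart question above, the sketch below counts how often pairs of products appear in the same cart. It is not Ayima's code (see the repositories above for that); the function name `pair_counts` and the sample carts are invented for this example.

```python
from collections import Counter
from itertools import combinations


def pair_counts(carts):
    """Count how many carts contain each unordered pair of products."""
    counts = Counter()
    for cart in carts:
        # De-duplicate and sort so each pair is counted once per cart
        # and always appears in the same order.
        for pair in combinations(sorted(set(cart)), 2):
            counts[pair] += 1
    return counts


# Hypothetical carts, invented for this example.
carts = [
    ["t-shirt", "mug", "sticker"],
    ["t-shirt", "mug"],
    ["mug", "notebook"],
    ["t-shirt", "sticker"],
]

counts = pair_counts(carts)
for pair, n in counts.most_common(2):
    print(pair, n)
```

Raw co-occurrence counts are only a starting point; measures such as support, confidence and lift normalize them by how popular each product is on its own.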
The document discusses machine learning considerations at Meetup. It describes how Meetup uses machine learning to improve personalization and insights through recommendations and predictions. It also discusses how Meetup's data, machine learning, and data science teams work together to build ML products. Some key challenges covered include selecting objective functions, making progress on cross-domain projects, prioritizing data needs, translating local model impacts to global effects, and determining model ownership and governance.
Visuals present better and quicker insights when forecasting sales. Business strategies can be planned at a glance: choose time periods and geographic locations, pick variables that highlight what works or doesn't and where it scores or doesn't, join two or more variables that work in specific geographical locations or don't, etc. All this put together makes data visualization a very nifty tool to project what can make or break your sales predictions!
The right path to making search relevant - Taxonomy Bootcamp London 2019 (OpenSource Connections)
This document discusses improving search relevance. It notes that search quality has three aspects: relevance, performance, and experience. It emphasizes that improving relevance requires a cross-functional search team that is educated, empowered, and builds skills internally. It also stresses the importance of continuous measurement and refinement through metrics, instrumentation, and open source tools. The overall message is that achieving search relevance is as much a people problem as a technical one.
This is the result of a research study conducted at the Pivotal Chicago office to discover the metrics of product success, how product decisions are prioritized, and the overall key learnings and feedback from the user interviews conducted.
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio... (DATAVERSITY)
Artificial Intelligence (AI) may conjure up images of robots and science fiction. But AI has practical applications in today’s data-driven organization for product recommendation engines, customer support, inventory management, and more. To support AI in order to drive concrete business outcomes, a strong data foundation is needed. This webinar will discuss practical applications for AI in your organization, and how to build a data architecture to support its use.
How to Use Data Effectively by Abra Sr. Business Analyst (Product School)
Key Takeaways from this presentation include:
- How data is used to run day to day operations
- How data is used to influence product decisions and marketing strategies
- Which skills are necessary to become self-serving in data tasks regardless of core responsibilities
Business leaders everywhere are looking to data to inform their decision making. Accompanying this demand are misunderstandings of what it takes to transform data into something that can inform a decision. What is the data infrastructure required? In this talk, I'll dispel some of these misunderstandings and discuss what it takes to build good data infrastructure. I'll discuss the components of a good data infrastructure: the best practices and available tools for gathering data, processing it, storing it, analyzing it and communicating the results. The goal is for these components to create a data infrastructure which can evolve from simple reporting to sophisticated insights for decision making.
Presented at OpenWest 2018
What Are the Basics of Product Manager Interviews by Google PM (Product School)
Ankit walked through an intro to the Product Manager role, the skills needed, and how the role differs between small and large companies. He wrapped up with some advice that's helped him in his Product Manager interviews over the years.
He gave a structured approach to thinking about what a Product Manager actually does (structured, meaning no "top 10" lists) and which skills you need to do well as a Product Manager.
How to Leverage Traditional Media for a Successful Omnichannel Strategy (Tinuiti)
Today’s shoppers are changing and evolving, and your omnichannel retail strategy should too. Shopping journeys now pass through a variety of branded touchpoints, both digital and physical, and they are nowhere near linear. Retailers need to be agile and responsive to customer needs at those touchpoints to ensure a consistent buying experience across channels, both online and offline. Tune into our webinar as our team of experts discusses how we were able to optimize in-store coupon redemption, attribute in-store revenue to the proper digital channels, and properly retarget consumers for a cohesive customer experience and our client’s success.
How can I become a data scientist? What are the most valuable skills to learn for a data scientist now? Could I learn how to be a data scientist by going through online tutorials? What does a data scientist do?
These are only some of the questions that are being discussed online, on blogs, on forums and on knowledge-sharing platforms like Quora.
Let me share the Beginner's Guide to Data Science, which should be really helpful to you.
Also check out: http://bit.ly/2Mub6xP
Better Living Through Analytics - Strategies for Data Decisions (Product School)
Data is king! Get ready to understand how a successful analytics team can empower managers from product, marketing, and other areas to make effective, data-driven decisions.
Louis Cialdella, a data scientist at ZipRecruiter, shared some case studies and successful strategies that he has used at ZipRecruiter as well as previous experiences. The purpose of this data talk was to enlighten people on how to make sure that analysts can successfully partner with other departments and get them the information they need to do great things.
Data Drive Your Content Creation - Dawn of the Data Age Lecture Series (Luciano Pesci, PhD)
Content really is king. Whether you’re trying to improve your organic search ranking, test new lead generation channels, or establish your brand as an industry thought leader, you need to create engaging content. But a prerequisite for this is understanding who your audience is and what their pain points are at every stage of their customer journey. This lecture will teach you a data-driven approach to the content creation process and will include our first guest lecturer, Trevor Crump the Director of Acquisition Marketing for Alliance Health, who will reveal his data-driven approach to content creation and how this has been a huge success for his organization.
This Lecture Will:
-TEACH THE DATA-DRIVEN APPROACH TO CONTENT CREATION.
-SHOW YOU HOW TO OPTIMIZE CONTENT FOR THE ENTIRE CUSTOMER JOURNEY.
-EXPLAIN HOW TO MEASURE THE IMPACT OF DATA-DRIVEN CONTENT ON SALES.
You can watch this lecture here: https://youtu.be/g8UXdIchqrw
Similar to Machine Learning - Startup weekend UCSB 2018 (20)
Build applications with generative AI on Google Cloud (Márton Kodok)
We will explore experiences powered by Vertex AI Model Garden and learn more about integrating these generative AI APIs. We will see in action what the Gemini family of generative models offers developers for building and deploying AI-driven applications. Vertex AI includes a suite of foundation models, referred to as the PaLM and Gemini families of generative AI models, which come in different versions. We will cover how to use the API to:
- execute prompts in text and chat
- cover multimodal use cases with image prompts
- fine-tune and distill to improve knowledge domains
- run function calls with foundation models to optimize them for specific tasks
At the end of the session, developers will understand how to innovate with generative AI and develop apps using generative AI industry trends.
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... (Aggregage)
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with 'Financial Odyssey,' our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance."
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
2. Introduction
● Raul Eulogio
○ Data Analyst at Hospice of Santa Barbara
○ Co-founder: inertia7.com
○ President of Data Science at UCSB
○ Self-taught machine learning enthusiast
3. Data Science at UCSB and Farmers Insurance Competition
Farmers Insurance is challenging you to put your data skills to the test. Seize this opportunity to practically apply data science to tackle a problem in the insurance field.
The top-performing teams will bring home:
● 1st place: $2000
● 2nd place: $1000
● 3rd place: $500
Additionally, all winning teams will get to present their work to a panel of Farmers employees. You must be a UCSB student and a paid member of Data Science at UCSB. Application here
4. Why Machine Learning*?
● Use data collection to your advantage
○ The authors of Lean Analytics state: “data driven learning is the cornerstone of success in startups. It’s how you learn what’s working and iterate towards the right product and market before the money runs out.”
● Data Science
○ Enhancing the interpretation of reality
○ Automating machines to respond to their environments
* I will use Machine Learning and Data Science interchangeably
6. I’m not here ...
● to tell you Data Science “is the sexiest job of the 21st century”
● to tell you that studies show less than 1% of data is being analyzed
● to show you the usual Venn Diagram that is presented at almost every data science talk
7.
8. Multifaceted Domain
Machine Learning can be ...
○ Exploratory Analysis
■ Exploring trends in data
■ Creating a narrative with data
○ Unsupervised Learning
■ Exploring trends and patterns on a larger scale
■ Finding hidden structure in data
○ Supervised Learning
■ Predicting an output based on inputs
■ Regression and Classification
9. Case Studies
Shown through examples: all data is available online, and all work is open source in my GitHub repository!
○ Exploratory Analysis - Apple Watch Data
■ Existing data collected by customer/user
● Data Collection (Python) and Data Exploration (R)
○ Unsupervised Learning - Spotify Data
■ Data made available by 3rd party source
● Data Exploration and Data Modeling (Python)
○ Supervised Learning - IBM Customer Churn Data
■ Data collected by organization
● Data Exploration (R) and Data Modeling (Python)
10. Exploratory Analysis: A case study on Apple Watch
● Sisense states: “You do [EDA] by taking a broad look at patterns, trends,
outliers, unexpected results and so on in your existing data, using
visual and quantitative methods to get a sense of the story this tells.
You’re looking for clues that suggest your logical next steps, questions
or areas of research.”
● The example data was gathered daily by an Apple Watch for fitness tracking and other health-related measurements
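As a rough illustration of what a "broad look at patterns, trends, outliers" can mean for daily fitness data, here is a minimal sketch in plain Python. The step counts, the start date, and the helper names `weekly_means` and `low_activity_days` are all assumptions made up for this example; they are not taken from the talk's repository or from real Apple Watch exports.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical daily step counts: two weeks of made-up values, not Apple Watch data.
start = date(2018, 1, 1)  # a Monday
steps = [8200, 7900, 8500, 4100, 9000, 12000, 3000,
         8100, 2500, 2700, 2400, 8800, 11000, 3200]
daily = [(start + timedelta(days=i), s) for i, s in enumerate(steps)]


def weekly_means(records):
    """Average steps per ISO week, keyed by (ISO year, ISO week number)."""
    buckets = {}
    for day, count in records:
        iso = day.isocalendar()
        buckets.setdefault((iso[0], iso[1]), []).append(count)
    return {week: mean(values) for week, values in sorted(buckets.items())}


def low_activity_days(records, fraction=0.5):
    """Days whose step count falls below `fraction` of the overall mean."""
    threshold = fraction * mean([count for _, count in records])
    return [day for day, count in records if count < threshold]


print(weekly_means(daily))       # trend: compare one week against the next
print(low_activity_days(daily))  # candidate outliers worth a closer look
```

The weekly averages give a coarse trend, and the low-activity days are the kind of clue that suggests a next question, such as why activity dropped mid-week.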
11.
12. Questions to consider
How can we use customers’ daily fitness regimens to identify important trends?
Can we detect and prevent days/weeks where our customers will reduce their workout regimen?
Is there a correlation between our customers’ workout regimens and their use of our services?
13. Unsupervised Learning: A Case Study on Spotify Music
● Noticing trends and patterns within the data
○ Combining all features, usually with unlabelled data
● Using the Spotify API to create a recommender based on a distance metric
● Ability to create clusters within our data
○ Recommend other use cases/products based on customer preference
■ Examples include Recommended videos on YouTube, Customers also bought on Amazon, Daily Mix on Spotify
14. How does it work?
● Nearest Neighbor Algorithm
● Algorithm creates feature space using all inputs
● Inputs include:
○ Danceability
○ Loudness
○ Tempo
○ Category
○ And more
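The idea on this slide can be sketched in a few lines of plain Python: place every track in a feature space, then recommend the tracks closest to a seed track. This is only a sketch under stated assumptions. The track names and feature values are invented, Euclidean distance and min-max scaling are choices made for this example, the categorical "Category" input is left out, and the talk's own recommender may be built differently.

```python
import math

# Hypothetical tracks with made-up feature values; names and numbers are illustrative only.
tracks = {
    "track_a": {"danceability": 0.80, "loudness": -5.0, "tempo": 120.0},
    "track_b": {"danceability": 0.78, "loudness": -5.5, "tempo": 118.0},
    "track_c": {"danceability": 0.30, "loudness": -12.0, "tempo": 70.0},
    "track_d": {"danceability": 0.35, "loudness": -11.0, "tempo": 75.0},
}
FEATURES = ["danceability", "loudness", "tempo"]


def scale(tracks, features):
    """Min-max scale each feature to [0, 1] so tempo does not dominate the distance."""
    scaled = {name: [] for name in tracks}
    for f in features:
        values = [t[f] for t in tracks.values()]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid dividing by zero for constant features
        for name, t in tracks.items():
            scaled[name].append((t[f] - lo) / span)
    return scaled


def euclidean(u, v):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def recommend(seed, tracks, features, k=1):
    """Return the k tracks nearest to `seed` in the scaled feature space."""
    vectors = scale(tracks, features)
    others = [name for name in vectors if name != seed]
    others.sort(key=lambda name: euclidean(vectors[seed], vectors[name]))
    return others[:k]


print(recommend("track_a", tracks, FEATURES, k=1))
```

Scaling matters here because tempo is measured in the tens or hundreds while danceability lies between 0 and 1; without it, the distance would be driven almost entirely by tempo.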
15. Examples of recommender at work
After some research I found that a lot of these songs were very similar in nature!
I know hip hop best, so I can say that many of these were good recommendations. Some interesting songs were Don’t Wanna Know and Summertime Sadness.
16. Questions to consider
How can we provide a seamless music experience for users?
Can we understand a user’s musical taste to maximize their daily workout regimen?
How can we effectively utilize 3rd party data to benefit our product?
17. Supervised Learning: A case study on Churn Rate
● Can we accurately predict when a customer will stop using a service/business?
● Binary Classification problem using customer information including:
○ Tenure
○ Total Charges
○ Monthly Charges
○ Gender
○ Utilization of Phone Services
○ And more...
● Models used:
○ Gradient Boosting
○ Logistic Regression
● Things to consider: Data Preprocessing, Data leakage, Class Imbalance
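To make the binary classification setup concrete, here is a minimal logistic regression fitted by batch gradient descent in plain Python. It is a sketch, not the talk's pipeline (the resources point to scikit-learn for the real models). The eight customers, the two features (tenure in years and a month-to-month contract flag), and the function names are assumptions invented for this example.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that customer `x` churns."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical customers: (tenure in years, month-to-month contract flag). Made-up data.
X = [(0.1, 1), (0.2, 1), (0.3, 1), (0.5, 1), (2.0, 0), (3.0, 0), (4.0, 0), (5.0, 0)]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = churned, 0 = stayed

w, b = fit_logistic(X, y)
accuracy = sum((predict_proba(w, b, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y)) / len(y)
print(w, b, accuracy)
```

The sign of each fitted weight is what makes this model easy to interpret: a negative weight on tenure means longer-tenured customers are estimated as less likely to churn. Note that the accuracy here is measured on the training rows, which is exactly the kind of optimistic number that cross-validation is meant to avoid.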
22. Final Results on Churn Data Set
● Gradient Boosting: 76% Accuracy (CV)
● Logistic Regression: 77% Accuracy (CV)
● Not high but we can still gain insight
○ Variable Importance for GB
○ Coefficients for variables for LR
● Customers with Month-to-Month Contracts most likely to Churn
● Neural Networks? Careful of Black Box Model
● Collect more data!
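A cross-validated accuracy is easier to judge next to a baseline, especially with class imbalance. The sketch below, in plain Python, computes the cross-validated accuracy of always predicting the majority class. The labels are made up (they are not the IBM churn data), the folds are contiguous and unshuffled for simplicity, and the function names are assumptions for this example; a real workflow would typically shuffle or stratify the folds.

```python
from collections import Counter


def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds over n rows."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def majority_class(labels):
    """Most frequent label in `labels`."""
    return Counter(labels).most_common(1)[0][0]


def cv_accuracy_of_baseline(y, k=5):
    """Cross-validated accuracy of always predicting the training fold's majority class."""
    scores = []
    for train, test in k_fold_splits(len(y), k):
        guess = majority_class([y[i] for i in train])
        scores.append(sum(y[i] == guess for i in test) / len(test))
    return sum(scores) / len(scores)


# Hypothetical labels: 1 = churned, 0 = stayed; 3 of 10 churned (made-up values).
y = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
print(cv_accuracy_of_baseline(y, k=5))
```

If a model's cross-validated accuracy is only slightly above this kind of baseline, accuracy alone says little, and metrics such as precision and recall on the churn class become more informative.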
23. Results
● “All models are wrong but some are useful” - George Box
● Roughly 76% to 77% cross-validated accuracy for Gradient Boosting and Logistic Regression: not groundbreaking, but the models can still give insight. As a rough rule of thumb, 90% accuracy is a good start, though the right bar depends on the problem and its class balance.
● Key Takeaway: Data Science/Machine Learning is a life cycle, not a one-and-done procedure.
● Iterations are key; if the model and data did not produce the desired results, collect more data. Ask key stakeholders what data should be collected and how it should be collected.
24. Questions to consider
How can we integrate Customer Reviews into our Machine Learning process?
What other covariates can we consider when creating our models?
Are we collecting the right data?
Which model can give us the most insight into our data without being too computationally expensive?
25. Q&A
● If you have any questions or would like to contribute to these projects, email me: raul.eulogio@inertia7.com
● Check out inertia7.com if you want to learn all things Machine Learning and Data Science
26. Resources
● Overview of Machine Learning using scikit-learn
● Introduction to Gradient Boosting
● Book Recommender (inspired the Spotify recommender)
● GitHub repo with source code for presentation
● Logistic Regression with Scikit-learn