Fraud detection is a classic adversarial analytics challenge: as soon as an automated system learns to stop one scheme, fraudsters move on to attack in another way. Each scheme requires looking for different signals (i.e., features) to catch; is relatively rare (one in millions for finance or e-commerce); and a single case may take months to investigate (in healthcare or tax, for example), making quality training data scarce.

This talk covers, via live demo and code walk-through, the key lessons we've learned while building such real-world software systems over the past few years. We'll be looking for fraud signals in public email datasets, using IPython and popular open-source libraries (scikit-learn, statsmodels, NLTK, etc.) for data science, and Apache Spark as the compute engine for scalable parallel processing.

We will iteratively build a machine-learned hybrid model, combining features from different data sources and algorithmic approaches to catch diverse aspects of suspect behavior:

- Natural language processing: finding keywords in relevant context within unstructured text
- Statistical NLP: sentiment analysis via supervised machine learning
- Time series analysis: understanding daily/weekly cycles and changes in habitual behavior
- Graph analysis: finding actions outside the usual or expected network of people
- Heuristic rules: finding suspect actions based on past schemes or external datasets
- Topic modeling: highlighting use of keywords outside an expected context
- Anomaly detection: fully unsupervised ranking of unusual behavior

This talk assumes a basic understanding of these data science tools, so we can focus on their applicability to this use case and on how they complement each other. Apache Spark is used to run these models at scale: in batch mode for model training and with Spark Streaming for production use.
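To give a flavor of the time-series signal above, here is a minimal, hypothetical sketch (not code from the talk, standard library only): it groups daily email counts by weekday to capture the weekly cycle, then flags days whose volume is a robust outlier for that weekday, using the modified z-score (median and MAD) so a single spike cannot mask itself by inflating the mean.

```python
from collections import defaultdict
from statistics import median

def flag_unusual_days(daily_counts, threshold=3.5):
    """Flag indices of days whose volume breaks the usual weekly cycle.

    daily_counts: list of (weekday, count) pairs, weekday in 0-6.
    A day is flagged when its modified z-score against the other
    observations for the same weekday exceeds `threshold`.
    """
    # Group counts by weekday to model the habitual weekly pattern.
    by_weekday = defaultdict(list)
    for weekday, count in daily_counts:
        by_weekday[weekday].append(count)

    flagged = []
    for i, (weekday, count) in enumerate(daily_counts):
        counts = by_weekday[weekday]
        med = median(counts)
        mad = median(abs(c - med) for c in counts)
        if mad == 0:
            continue  # no spread to compare against
        # 0.6745 scales MAD to be comparable to a standard deviation.
        if 0.6745 * abs(count - med) / mad > threshold:
            flagged.append(i)
    return flagged
```

In a real pipeline this per-sender score would become one feature among many, combined with the NLP, graph, and heuristic signals into the hybrid model's overall ranking.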
We’ll discuss the data model, computation, and feedback workflows, as well as some tools and libraries built on top of the open-source components to enable faster experimentation, optimization, and productization of the models.