Cloud Tools Guidance
Authors: Timothy Spann, Michael Kohs, George Vetticaden
Date: 04/19/2023
Last Updated: 5/01/2023
Notice
This document assumes that you have registered for an account, activated it and logged into
the CDP Sandbox. This is for authorized users only who have attended the webinar and have
read the training materials.
A short guide and references are listed here.
THIS IS NOT FOR USE WITH THE FIRST TWO
TUTORIALS. THIS IS FOR BUILDING ASSETS FOR
YOUR OWN NEW FLOWS.
1. How To Build Data Assets
1.1 Create a Kafka Topic
1. Navigate to Data Hub Clusters
2. Navigate to the oss-kafka-demo cluster
3. Navigate to Streams Messaging Manager
Info: Streams Messaging Manager (SMM) is a tool for working with Apache Kafka.
4. You are now in SMM.
5. Click the Topics button, the round icon third from the top.
6. You are now in the Topic browser.
7. Click Add New to build a new topic.
8. Enter the name of your topic, prefixed with your Workload User Name:
<yourusername>_yournewtopic, ex: tim_yournewtopic.
9. For settings, create the topic with 3 partitions, cleanup.policy: delete, and
availability: maximum, as shown above.
Congratulations! You have built a new topic.
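The naming rule and settings from steps 8 and 9 can be captured in a small helper. This is a minimal sketch, not part of the sandbox tooling: the function name build_topic_request and the dictionary layout are hypothetical, and "availability: maximum" is kept as the SMM form label rather than translated into broker settings.

```python
import re

# Kafka topic names may contain only letters, digits, '.', '_' and '-'.
_VALID_TOPIC = re.compile(r"^[A-Za-z0-9._-]+$")


def build_topic_request(workload_user: str, topic: str) -> dict:
    """Return the values to enter in the SMM Add New form (hypothetical layout)."""
    name = f"{workload_user}_{topic}"
    if not _VALID_TOPIC.match(name):
        raise ValueError(f"invalid topic name: {name!r}")
    return {
        "name": name,
        "partitions": 3,
        "cleanup.policy": "delete",
        "availability": "maximum",  # SMM form label, not a Kafka property
    }


request = build_topic_request("tim", "yournewtopic")
print(request["name"])
```

Checking the name before opening the form helps catch spaces or other characters that Kafka rejects.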
1.2 Create a Schema If You Need One. Not Required For Using Kafka Topics or Tutorials.
1. Navigate to Schema Registry from the Kafka Data Hub.
2. You will see existing schemas.
3. Click the white plus sign in the gray hexagon to create a new schema.
4. Add a new schema by entering a unique name starting with your Workload User Name
(ex: tim), followed by a short description and then the schema text as shown. If you
need examples, see the GitHub list at the end of this guide.
5. Click Save and you have a new schema. If there are errors, they will be shown and
you can fix them. For more help, see Schema Registry Documentation and Schema
Registry Public Cloud.
Congratulations! You have built a new schema. Start using it in your DataFlow application.
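For illustration, the schema text in step 4 could look like the Avro record built below. This is a sketch under assumptions: it assumes an Avro schema, the field names and types are inferred from the Weather example row in section 2, and the schema name tim_weather and namespace example.sandbox are hypothetical. Check the GitHub schema list for the schemas actually in use.

```python
import json


def weather_schema(workload_user: str) -> dict:
    """Build an Avro record schema whose name starts with the Workload User Name."""
    return {
        "type": "record",
        "name": f"{workload_user}_weather",  # hypothetical schema name
        "namespace": "example.sandbox",      # hypothetical namespace
        "fields": [
            {"name": "city", "type": "string"},
            {"name": "temp_c", "type": "int"},
            {"name": "description", "type": "string"},
        ],
    }


# Text that could be pasted into the schema text box.
schema_text = json.dumps(weather_schema("tim"), indent=2)
print(schema_text)
```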
1.3 Create an Apache Iceberg Table
1. Navigate to oss-kudu-demo from the Data Hubs list
2. Navigate to Hue from the Kudu Data Hub.
3. Inside of Hue you can now create your table.
4. Navigate to your database. It was created for you.
Info: The database name is your email address with all special characters replaced by
underscores and _db appended. A Ranger policy limits access to you and to members of the
admin group.
5. Create your Apache Iceberg table. Its name must be prefixed with your Workload User
Name (userid).
CREATE TABLE <<userid>>_syslog_critical_archive (
  priority int,
  severity int,
  facility int,
  version int,
  event_timestamp bigint,
  hostname string,
  body string,
  appName string,
  procid string,
  messageid string,
  structureddata struct<sdid:struct<eventid:string,eventsource:string,iut:string>>
)
STORED BY ICEBERG;
6. Your table is created in s3a://oss-uat2/iceberg/
7. Once you have sent data to your table, you can query it.
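The naming rules in the steps above can be sketched as follows. The helpers are hypothetical and rest on assumptions: "special characters" is read as anything other than letters and digits, the email address tim@example.com is made up, and the query is only an example of what you might run in Hue once data has arrived.

```python
import re


def database_name(email: str) -> str:
    """Replace every character that is not a letter or digit with '_', then append '_db'."""
    return re.sub(r"[^A-Za-z0-9]", "_", email) + "_db"


def table_name(userid: str) -> str:
    """Fill the <<userid>> placeholder used in the CREATE TABLE statement."""
    return f"{userid}_syslog_critical_archive"


def sample_query(email: str, userid: str, limit: int = 10) -> str:
    """Build a simple SELECT against the table (example query, not from the guide)."""
    return f"SELECT * FROM {database_name(email)}.{table_name(userid)} LIMIT {limit};"


print(database_name("tim@example.com"))
print(sample_query("tim@example.com", "tim"))
```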
Additional Documentation
● Create a Table
● Query a Table
● Apache Iceberg Table Properties
2. Streaming Data Sets Available for Apps
The following Kafka topics are being populated with streaming data for you.
These come from the read-only Kafka cluster.
Navigate to the Data Hub Clusters.
Click on oss-kafka-datagen.
Click Schema Registry.
Click Streams Messaging Manager.
Use these brokers to connect to them:
Brokers
oss-kafka-datagen-corebroker1.oss-demo.qsm5-opic.cloudera.site:9093,oss-kafka-datagen-corebroker0.oss-demo.qsm5-opic.cloudera.site:9093,oss-kafka-datagen-corebroker2.oss-demo.qsm5-opic.cloudera.site:9093
Use this link for Schema Registry:
https://#{Schema2}:7790/api/v1
Set the Schema Registry hostname parameter, Schema2, to:
oss-kafka-datagen-master0.oss-demo.qsm5-opic.cloudera.site
To view schemas in the Schema Registry, click the icon from the Data Hub:
https://oss-kafka-datagen-gateway.oss-demo.qsm5-opic.cloudera.site/oss-kafka-datagen/cdp-proxy/schema-registry/ui/#/
Schemas
https://github.com/tspannhw/FLaNK-DataFlows/tree/main/schemas
Group ID: yourid_cdf
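The #{Schema2} token in the Schema Registry link is a parameter reference that is replaced with the hostname listed above. A minimal sketch of that substitution, together with the connection values from this section, follows. resolve_parameters is a hypothetical helper, not a NiFi API, the group ID assumes yourid is tim, and security settings are not listed in this guide, so they are left out.

```python
import re

BROKERS = [
    "oss-kafka-datagen-corebroker1.oss-demo.qsm5-opic.cloudera.site:9093",
    "oss-kafka-datagen-corebroker0.oss-demo.qsm5-opic.cloudera.site:9093",
    "oss-kafka-datagen-corebroker2.oss-demo.qsm5-opic.cloudera.site:9093",
]

PARAMETERS = {
    "Schema2": "oss-kafka-datagen-master0.oss-demo.qsm5-opic.cloudera.site",
}


def resolve_parameters(text: str, parameters: dict) -> str:
    """Replace each #{Name} reference with its value; unknown names raise KeyError."""
    return re.sub(r"#\{([^}]+)\}", lambda m: parameters[m.group(1)], text)


schema_registry_url = resolve_parameters("https://#{Schema2}:7790/api/v1", PARAMETERS)

consumer_settings = {
    "bootstrap.servers": ",".join(BROKERS),
    "group.id": "tim_cdf",  # yourid_cdf with yourid = tim (assumed)
}

print(schema_registry_url)
print(consumer_settings["group.id"])
```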
Customers (customer)
Example Row
{"first_name":"Charley","last_name":"Farrell","age":19,"city":"Sawaynside","country":"Guinea","email":"keven.herzog@hotmail.com","phone_number":"312-269-6619"}
IP Tables (ip_address)
Example Row
{"source_ip":"216.25.204.241","dest_port":219,"tcp_flags_ack":0,"tcp_flags_reset":0,"ts":"2023-04-20 15:26:45.517"}
Orders (orders)
Example Row
{"order_id":84170282,"city":"Wintheiserton","street_address":"80206 Caroyln Lakes","amount":29,"order_time":"2023-04-20 13:25:06.097","order_status":"DELIVERED"}
Plants (plant)
Example Row
{"plant_id":829,"city":"Lake Gerald","lat":"39.568679","lon":"-151.64497","country":"Eritrea"}
Sensors (sensor)
Example Row
{"sensor_id":264,"timestamp_of_production":"2023-04-20 18:28:42.751"}
Sensor Data (sensor_data)
Example Row
{"sensor_id":250,"timestamp_of_production":"2023-04-20 18:42:04.847","sensor_value":-72}
Weather (weather)
Example Row
{"city":"New Ernesto","temp_c":21,"description":"Sleet"}
Transactions (transactions)
Example Row
{"sender_id":40816,"receiver_id":96057,"amount":557,"execution_date":"2023-04-20 16:15:30.744","currency":"UYU"}
These are realistic generated data sources that you can use. They are available from
read-only Kafka topics and can be consumed by any developer in the sandbox.
Name your Kafka consumer with your Workload User Name, an underscore, and a name of
your choice.
Ex: tim_customerdata_reader
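Once a row has been consumed, it can be handled as ordinary JSON. The sketch below parses the Sensor Data example row and builds a consumer name following the rule above. The helper names are hypothetical, the timestamp format is inferred from the example rows, and no time zone is assumed, so the parsed value is a naive datetime.

```python
import json
from datetime import datetime

# Format inferred from the example rows, e.g. "2023-04-20 18:42:04.847".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_sensor_data(raw: str) -> dict:
    """Decode one sensor_data message and convert its timestamp to a datetime."""
    row = json.loads(raw)
    row["timestamp_of_production"] = datetime.strptime(
        row["timestamp_of_production"], TIMESTAMP_FORMAT
    )
    return row


def consumer_name(workload_user: str, purpose: str) -> str:
    """Join the Workload User Name and a descriptive name with an underscore."""
    return f"{workload_user}_{purpose}"


example = (
    '{"sensor_id":250,"timestamp_of_production":"2023-04-20 18:42:04.847",'
    '"sensor_value":-72}'
)
row = parse_sensor_data(example)
print(consumer_name("tim", "customerdata_reader"), row["sensor_id"], row["sensor_value"])
```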
3. Bring Your Own Data (Public Only)
● Data is visible and downloadable to all, so make sure it is safe, free, open, public data.
● Public REST Feeds are good
○ Wikipedia:
https://docs.cloudera.com/dataflow/cloud/flow-designer-beginners-guide-readyflow/topics/cdf-flow-designer-getting-started-readyflow.html
○ https://gbfs.citibikenyc.com/gbfs/en/station_status.json
○ https://travel.state.gov/_res/rss/TAsTWs.xml
○ https://www.njtransit.com/rss/BusAdvisories_feed.xml
○ https://www.njtransit.com/rss/RailAdvisories_feed.xml
○ https://www.njtransit.com/rss/LightRailAdvisories_feed.xml
○ https://www.njtransit.com/rss/CustomerNotices_feed.xml
○ https://w1.weather.gov/xml/current_obs/all_xml.zip
○ https://dailymed.nlm.nih.gov/dailymed/services/v2/spls.json?page=1&pagesize=100
○ https://dailymed.nlm.nih.gov/dailymed/services/v2/drugnames.json?pagesize=100
○ https://dailymed.nlm.nih.gov/dailymed/rss.cfm
● Generic data files
○ https://aws.amazon.com/data-exchange
● Simulators
○ Use external data simulators via REST
○ Use GenerateFlowFile, see:
https://www.datainmotion.dev/2019/04/integration-testing-for-apache-nifi.html
● Schemas, Data Sources and Examples
○ https://github.com/tspannhw/FLaNK-AllTheStreams/
○ https://github.com/tspannhw/FLaNK-DataFlows
○ https://github.com/tspannhw/FLaNK-TravelAdvisory/
○ https://github.com/tspannhw/FLiP-Current22-LetsMonitorAllTheThings
○ https://github.com/tspannhw/create-nifi-kafka-flink-apps
○ https://www.datainmotion.dev/2021/01/flank-real-time-transit-information-for.html
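Before wiring a public feed into a flow, it can help to look at its structure outside the sandbox. The sketch below fetches a feed and lists its items. It is only an illustration: it assumes the feed is RSS 2.0 with item, title and link elements, the helper names are hypothetical, and the sample document is made up so the parser can be tried without network access.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_feed(url: str, timeout: float = 10.0) -> bytes:
    """Download a feed and return the raw bytes (performs a network call)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def rss_items(xml_data) -> list:
    """Return the title and link of every <item> element in an RSS document."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
            }
        )
    return items


# Made-up sample document, used instead of a live feed.
SAMPLE = """<rss version="2.0"><channel><title>Sample</title>
<item><title>Advisory one</title><link>https://example.com/1</link></item>
<item><title>Advisory two</title><link>https://example.com/2</link></item>
</channel></rss>"""

for entry in rss_items(SAMPLE):
    print(entry["title"], entry["link"])
```

To try it against a real feed, pass the result of fetch_feed for one of the RSS URLs listed above to rss_items.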