RapidMiner5 2.9 - Word vector tool and RapidMiner
Word Vector Tool: The Word & Web Vector Tool (WVTool) is a flexible Java library for statistical language modeling and for integrating Web and Web-service-based data sources. It supports the creation of word vector representations of text documents in the vector space model, which is the point of departure for many text processing applications.
Installation 1. Download the archive from the WVTool SourceForge website.
Installation 2. Put it into the lib/plugins directory of your RapidMiner installation, for example: D:\Program Files\Rapid-I\RapidMiner5\lib\plugins
Word Vector Tool: The aim of the WVTool is to provide a simple-to-use, simple-to-extend pure Java library for text and web mining. It can easily be invoked from any Java application.
Word Vector Tool: WVTool bridges the gap between highly sophisticated linguistic packages such as the GATE system on the one side and the many partial solutions that are part of diverse text and information retrieval applications on the other side.
Functions
Word List: A word list contains all terms used for vectorization, together with some statistics (e.g. the number of documents in which a term appears). The word list is needed during vectorization to define which terms are treated as dimensions of the vector space, and for weighting.
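To make the idea of a word list concrete, the following is a minimal, self-contained sketch that collects every term in a small corpus together with its document frequency. It does not use the WVTool API: the class WordListSketch, its methods, and the simple alphanumeric tokenizer are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from WVTool.
final class WordListSketch {

    // Splits text into lower-case alphanumeric tokens, dropping empty strings.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Maps each term to the number of documents in which it appears.
    static Map<String, Integer> documentFrequencies(List<String> documents) {
        Map<String, Integer> df = new TreeMap<>();
        for (String doc : documents) {
            // A set ensures each term is counted at most once per document.
            Set<String> distinct = new HashSet<>(tokenize(doc));
            for (String term : distinct) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("the cat sat", "the dog sat down", "a cat");
        // Prints {a=1, cat=2, dog=1, down=1, sat=2, the=2}
        System.out.println(documentFrequencies(docs));
    }
}
```

The keys of the resulting map play the role of the vector space dimensions, and the counts are the kind of statistic a weighting scheme could draw on.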
WVTool Function Inputs: (1) an input list that tells the system which text documents to process; (2) a configuration object that tells the system which methods to use in the individual steps.
Defining the Input: The input list tells the WVTool which texts should be processed. Every item in the list contains the following information: a URI; the language the document is written in (optional); the type of the document (optional); the character encoding of the document, e.g. UTF-8 (optional); a class label.
Using Predefined Word Lists: In some cases it is necessary to define the dimensions of the vector space exactly, while still leaving the counting of terms and documents to the WVTool. This can be achieved by calling the word list creation function with a list of String values.
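The idea of fixing the dimensions up front while leaving the counting to the tool can be sketched in plain Java. This is not the WVTool word list creation function; PredefinedWordListSketch and countDocuments are hypothetical names, and the sketch assumes the predefined terms are already lower-case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the WVTool API.
final class PredefinedWordListSketch {

    // Counts, for each predefined term, the number of documents containing it.
    // Terms outside the predefined list are ignored, so the dimensions stay fixed.
    static Map<String, Integer> countDocuments(List<String> terms, List<String> documents) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : terms) {
            counts.put(term, 0); // every predefined term becomes a dimension
        }
        for (String doc : documents) {
            Set<String> present = new HashSet<>(
                    Arrays.asList(doc.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                if (present.contains(entry.getKey())) {
                    entry.setValue(entry.getValue() + 1);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("cat", "fish");
        List<String> docs = List.of("The cat sat", "A cat and a dog");
        // Prints {cat=2, fish=0}
        System.out.println(countDocuments(terms, docs));
    }
}
```

Note that "fish" stays in the result with a count of zero, and "dog" never becomes a dimension because it is not in the predefined list.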
Text Input: The TextInput operator creates an ExampleSet from a collection of texts. The output ExampleSet contains one row for each text document and one column for each term.
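The row-per-document, column-per-term layout can be illustrated with a small term-count matrix. This sketch is independent of RapidMiner: TermMatrixSketch and termCounts are hypothetical names, raw term counts are an assumed weighting, and the real operator produces an ExampleSet rather than an int array.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not the RapidMiner TextInput operator.
final class TermMatrixSketch {

    // Returns a matrix with one row per document and one column per vocabulary term.
    // Each cell holds how often the term occurs in that document.
    static int[][] termCounts(List<String> vocabulary, List<String> documents) {
        int[][] rows = new int[documents.size()][vocabulary.size()];
        for (int d = 0; d < documents.size(); d++) {
            String[] tokens = documents.get(d).toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
            for (String token : tokens) {
                int column = vocabulary.indexOf(token);
                if (column >= 0) { // tokens outside the vocabulary are skipped
                    rows[d][column]++;
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> vocabulary = List.of("cat", "sat", "dog");
        List<String> docs = List.of("the cat sat", "dog dog cat");
        // Prints [[1, 1, 0], [1, 0, 2]]
        System.out.println(Arrays.deepToString(termCounts(vocabulary, docs)));
    }
}
```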
Text Classification, Clustering and Visualization: For text classification, the class labels (e.g. positive, negative) are defined in the TextInput operator, as described above. Using clustering or dimensionality reduction, text documents can be visualized directly from the RapidMiner Visualization panel.
Creating and Maintaining Word Lists - Creating an Initial Word List: An initial word list can be created by using a chain of operators.
Creating and Maintaining Word Lists - Applying a Word List: You can apply a word list in two ways: 1. To use the actual weights, first create word vectors using the TextInput operator and then use the AttributeWeightsLoader and AttributeWeightsApplier on the resulting ExampleSet.
Creating and Maintaining Word Lists - Applying a Word List: You can apply a word list in two ways: 2. To use the word list only as a selection of relevant terms and leave the actual weighting to TextInput, use the AttributeWeightsLoader before the TextInput operator. TextInput will then create vectors whose dimensions are only those terms in the word list that have a weight larger than zero.
Creating and Maintaining Word Lists - Updating a Word List: If you add new documents to your corpus, additional terms will usually become relevant and should be added to the word list. After the InteractiveAttributeWeighting operator pops up, use the load function to load your original word list.
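The two ways of applying a word list can be contrasted in a short sketch. It does not call any RapidMiner operator: WordListApplicationSketch and its methods are hypothetical, and it assumes that "using the actual weights" means scaling each vector component by the stored weight of its term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not RapidMiner operator code.
final class WordListApplicationSketch {

    // Way 1 (assumed semantics): scale each component by the stored weight of its term.
    static double[] applyWeights(double[] vector, double[] weights) {
        if (vector.length != weights.length) {
            throw new IllegalArgumentException("vector and weights must have equal length");
        }
        double[] result = new double[vector.length];
        for (int i = 0; i < vector.length; i++) {
            result[i] = vector[i] * weights[i];
        }
        return result;
    }

    // Way 2: keep only terms whose weight is larger than zero as dimensions;
    // the weighting itself is left to a later step.
    static List<String> selectTerms(Map<String, Double> weights) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Double> entry : weights.entrySet()) {
            if (entry.getValue() > 0.0) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("cat", 0.5);
        weights.put("the", 0.0);
        weights.put("dog", 1.0);

        // Prints [1.0, 0.0, 3.0]
        System.out.println(Arrays.toString(
                applyWeights(new double[]{2, 1, 3}, new double[]{0.5, 0.0, 1.0})));
        // Prints [cat, dog]
        System.out.println(selectTerms(weights));
    }
}
```

In the first way a zero weight still leaves the term as a dimension with value zero, whereas in the second way the term is dropped from the vector space altogether.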
More Questions? Reach us at support@dataminingtools.net Visit: www.dataminingtools.net
