Final Project - CS6243
Transcription Factor DNA Binding Prediction

Team Members:
Badri Sampath α
Iffat Sharmin Chowdhury α
Prosunjit Biswas α
Tahmina Ahmed α

α Department of Computer Science, University of Texas at San Antonio.

1. Defining the Scope of the Project:

In this project, we are given a number of labeled DNA sequences (labeled p or n) and a number of
unlabeled DNA sequences, which we have to label using a model built from the labeled
sequences. The scope of the problem is therefore to build a binary classifier from the given
training DNA sequences and apply it to label the unlabeled DNA sequences.

        1.1 Challenges of the Project:

In a conventional classification problem, there are a number of different attributes that we can readily use to
build the classifier. In this project, we are given only the sequences and their labels. Part of the work for this
project is therefore to find a way to generate meaningful attributes from the sequences.




                                 Fig. 1 : Overall scope of the project.

    2. K-mer Based Approach:

In the K-mer approach, we generated all possible combinations of DNA characters for a
specified length K. The K-mer approach is shown in detail in Figure 2. The important steps of the
K-mer approach are discussed in the following paragraphs.
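
As a minimal sketch of the generation step, the snippet below enumerates every string of length K over the DNA alphabet. The alphabet "ACGT" and the function name all_kmers are assumptions made for illustration; the report does not show its own implementation.

```python
from itertools import product

# Assumed DNA alphabet; the report only says "DNA characters".
DNA_ALPHABET = "ACGT"


def all_kmers(k):
    """Return every possible string of length k over the DNA alphabet.

    There are 4**k such strings, so k = 5 yields 1024 candidate attributes.
    """
    return ["".join(chars) for chars in product(DNA_ALPHABET, repeat=k)]


if __name__ == "__main__":
    kmers = all_kmers(5)
    print(len(kmers))
    print(kmers[:3])
```

Because the number of K-mers grows as 4^K, each extra unit of K multiplies the number of attributes by four, which is one reason K is worth tuning.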




                                 Fig 2: Overall K-mer based process.

After generating the K-mers, we counted their frequencies using three different
approaches: i) strict matching, ii) matching with mismatches, and iii) matching based
on regular expressions.
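
The three counting strategies can be sketched as follows. This is an illustration under stated assumptions, not the report's code: occurrences are counted with overlaps, a mismatch is measured as Hamming distance within a window of length K, and the function names and the max_mismatches parameter are hypothetical.

```python
import re


def count_strict(seq, kmer):
    """Count exact occurrences of kmer in seq, including overlapping ones."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_with_mismatch(seq, kmer, max_mismatches=1):
    """Count windows of seq that differ from kmer in at most max_mismatches positions."""
    k = len(kmer)
    return sum(
        1
        for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], kmer) <= max_mismatches
    )


def count_regex(seq, pattern):
    """Count start positions in seq where the regular expression matches.

    The pattern is wrapped in a lookahead so that overlapping matches are counted.
    """
    return len(re.findall(f"(?=(?:{pattern}))", seq))


if __name__ == "__main__":
    seq = "ACGTCG"
    print(count_strict(seq, "ACG"))
    print(count_with_mismatch(seq, "ACG", max_mismatches=1))
    print(count_regex(seq, "[AT]CG"))
```

Applying one of these functions to every K-mer for every sequence yields a fixed-length count vector per sequence, which can then be passed to a classifier.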

In order to build an optimal model, we tuned different parameters of the model. Some of
the parameters and their impact on the classifier are shown in Table I.

    3. PWM Based Approach:

We used the motif-finding tool MEME [1] to generate a specified number of motifs within a
specified minimum and maximum length, and the Motif Alignment and Search Tool MAST [2] to
obtain an E-value (capped at 100) for each sequence. We derived a score from each E-value by
subtracting it from 100, so that sequences can be ordered by E-value with better matches
scoring higher. We used these per-motif scores as attributes of the sequences and fed them to
different classifiers. Table II gives a synopsis of the parameters and their impact on the model.
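
The score derivation described above amounts to score = 100 - min(E, 100). A small sketch follows; the function names are hypothetical, and the E-values are assumed to have already been read from the MAST output, since parsing that output is not shown here.

```python
# Cap on the E-value, as stated in the report.
E_VALUE_CAP = 100.0


def evalue_to_score(e_value, cap=E_VALUE_CAP):
    """Turn an E-value into a score where larger means a better motif match."""
    return cap - min(e_value, cap)


def pwm_features(e_values_per_motif):
    """Build one attribute per motif from a sequence's per-motif E-values."""
    return [evalue_to_score(e) for e in e_values_per_motif]


if __name__ == "__main__":
    # Hypothetical E-values of one sequence against three motifs.
    print(pwm_features([0.5, 42.0, 250.0]))
```

With this transformation a sequence with no meaningful match to a motif (an E-value at or above the cap) receives a score of 0 for that motif.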

Table I: Synopsis of the parameters and their effect in the K-mer model building process.

  K-mer Value            Classifier Selection        String Match                             Mismatch                                 Regular Expression
  5 (best)               Logistic (best)             Applied: performs best                   Not applied: performs best               Not significant
  4 (reasonably good)    SMO (good)                  Not applied: performs relatively worse   Applied: performs relatively worse
  6 (comparatively bad)  J48 (comparatively weak)



Table II: Synopsis of the parameters for the PWM approach and their effect on the model.

  No. of Motifs    No. of Sites a Motif Appears    Min-Max Length of Motif    Classifier
  10               18                              6-15                       J48 (best)
  8                20                              5-16                       Logistic (moderate)
  5                10                              6-15                       Naïve Bayes (comparatively bad)



   4. Combining K-mer & PWM approach:

In order to obtain a better model, we combined the K-mer and PWM approaches, using the
best known parameters for each. We found a reasonable improvement for the combined approach
when applying it to the training data.
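
The report does not state how the two attribute sets were merged. One plausible way, shown here purely as an assumption, is to concatenate each sequence's K-mer counts and PWM scores into a single attribute vector before training; combine_features is a hypothetical helper.

```python
def combine_features(kmer_counts, pwm_scores):
    """Concatenate K-mer counts and PWM scores into one attribute vector.

    Assumption: simple concatenation; the report does not specify the method.
    """
    return list(kmer_counts) + list(pwm_scores)


if __name__ == "__main__":
    # Hypothetical values for one sequence: 4 K-mer counts and 2 motif scores.
    print(combine_features([2, 0, 1, 3], [99.5, 58.0]))
```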

   5. Some Difficulties and Limitations of our Work:

Tuning the parameters for the classifier was the most challenging part of the project. We think
we ran a reasonable set of experiments for choosing the parameters, given the limited timeline.

   6. Acknowledgement:

At the end of the project, we would like to thank Dr. Ruan for assigning us such a challenging
project. It gave us good working knowledge of practical machine learning and data mining.
Working in a group was also a nice experience and an opportunity to share knowledge.

References:

[1-2] "MEME Suite", available at: http://meme.sdsc.edu/meme/meme-download.html
[3] "Weka", available at: http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html

Transcription Factor DNA Binding Prediction

  • 1. Final Project – CS6243: Transcription Factor DNA Binding Prediction. Team Members: Badri Sampath α, Iffat Sharmin Chowdhury α, Prosunjit Biswas α, Tahmina Ahmed α. α Department of Computer Science, University of Texas at San Antonio.
  • 2.
    1. Defining the Scope of the Project:

    In this project, we are given a number of labeled DNA sequences (labels p and n) and a number of unlabeled DNA sequences, which we have to label using a model built from the labeled sequences. The scope of the problem is therefore to build a binary classifier from the given training DNA sequences and apply it to label the unlabeled DNA sequences.

        1.1 Challenges of the Project:

    In a conventional classification problem, there are a number of attributes that can readily be used to build the classifier. In this project, we are given only sequences and labels, so part of the work is to find a way to generate meaningful attributes.

    Fig. 1: Overall scope of the project.

    2. K-mer Based Approach:

    In the K-mer approach, we generated all possible combinations of DNA characters for a specified length K. The K-mer approach is shown in detail in Figure 2. The important steps of the K-mer approach are discussed in the following paragraphs.

    Fig. 2: Overall K-mer based process.

    After generating the K-mers, we followed different approaches to count their frequencies: i) strict matching, ii) matching with mismatches, and iii) matching based on regular expressions.

    In order to build an optimal model, we tuned different parameters of the model. Some of the parameters and their impact on the classifier are shown in Table I.

    3. PWM Based Approach:

    We used a motif-finding tool named MEME [1] to generate a specified number of motifs of specified minimum and maximum length, and the Motif Alignment and Search Tool MAST [2] to get the E-value (bounded to 100) for each sequence. We derived scores from these E-values by subtracting the E-value from 100, so that the sequences are ordered according to their E-value.

  • 3.
    We used these scores, specific to each motif, as attributes of the sequences and fed them to different classifiers. Table II gives a synopsis of the parameters and their impact on the model.

    Table I: Synopsis of the parameters and their effect in the K-mer model building process.

    K-mer Value            Classifier Selection       String Match                                  MisMatch                                  Regular Expression
    5 (Best)               Logistic (Best)            When applied (perform best)                   When not applied (perform best)           Not significant
    4 (reasonably good)    SMO (Good)                 When not applied (perform relatively worse)   When applied (perform relatively worse)
    6 (Comparatively bad)  J48 (Comparatively weak)

    Table II: Synopsis of the parameters for the PWM approach and their effect in the model.

    No. of Motif   No. of Sites a Motif appear   Min / Max Length of Motif   Classifier
    10             18                            6-15                        J48 (Best)
    8              20                            5-16                        Logistic (Moderate)
    5              10                            6-15                        Naïve Bayes (comparatively Bad)

    4. Combining K-mer & PWM Approach:

    In order to obtain a better model, we combined the K-mer and PWM approaches with the best known parameters. We found a reasonable improvement for the combined approach when applying it to the training data.

    5. Some Difficulties and Limitations of our Work:

    Tuning the parameters for the classifier was the most challenging part of the project. We think we have done reasonable experiments for choosing the parameters given the limited timeline.

    6. Acknowledgement:

    At the end of the project, we would like to thank Dr. Ruan for assigning us such a challenging project. It gave us good working knowledge of practical machine learning and data mining. Working in a group was also a nice experience and an opportunity for knowledge sharing.

    References:

    [1-2] "MEME Suite", available at http://meme.sdsc.edu/meme/meme-download.html
    [3] "Weka", available at: http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html
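As an illustration of the attribute generation described in Sections 2 and 3, the following is a minimal Python sketch, not the project's actual code. It enumerates all K-mers over the alphabet A, C, G, T, counts their occurrences in a sequence with strict matching or with a bounded number of mismatches, and converts an E-value into a score by subtracting it from 100. The function names (`all_kmers`, `kmer_counts`, `evalue_to_score`), the use of Hamming distance to model "matching with mismatch", and the assumption of uppercase input are all hypothetical choices made for this example. Regular-expression matching and the MEME/MAST calls themselves are not shown.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: sequences are uppercase DNA


def all_kmers(k):
    """Enumerate every length-k string over the DNA alphabet (4**k strings)."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def kmer_counts(seq, k, max_mismatch=0):
    """Count, for each K-mer, the windows of seq within max_mismatch substitutions.

    max_mismatch=0 is strict matching; larger values allow mismatches
    (modelled here as Hamming distance, which is an assumption).
    """
    kmers = all_kmers(k)
    counts = dict.fromkeys(kmers, 0)
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    for window in windows:
        for kmer in kmers:
            if hamming(window, kmer) <= max_mismatch:
                counts[kmer] += 1
    return counts


def evalue_to_score(evalue, cap=100.0):
    """Turn an E-value bounded to `cap` into a score where higher is better."""
    return cap - min(evalue, cap)


if __name__ == "__main__":
    strict = kmer_counts("ACGTAC", 2)
    relaxed = kmer_counts("ACGTAC", 2, max_mismatch=1)
    print(strict["AC"], relaxed["AA"])
    print(evalue_to_score(0.5))
```

Each sequence's count dictionary (one entry per K-mer) or its per-motif scores can then be written out as one row of attributes for a classifier. The brute-force loop over all 4^K K-mers is fine for the small K values in Table I but would need a smarter neighbourhood search for larger K.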