SlideShare une entreprise Scribd logo
1  sur  20
Télécharger pour lire hors ligne
Sequence Matrix
 Gene concatenation made easy
  Gaurav Vaidya1, David Lohman2, Rudolf Meier2

                           1: NeatCo Asia, Singapore.
                           2: Department of Biological Sciences,
                              National University of Singapore, Singapore.
Our goals


 ✤   Many powerful tools exist for concatenating sequences.

 ✤   Adding new sequences to an existing dataset is tedious and time consuming.

 ✤   Our initial goal: simple, user-friendly program for concatenating sequences.

 ✤   We also added a few tools to help you look for lab contamination in your dataset.
Sequence Matrix


✤   Written in Java.

    ✤   Graphical user interface libraries.

    ✤   Works on different operating systems.

    ✤   Easy to install: download and run the batch file.
Importing sequences



✤   You can use the sequence names as
    entered in the input file.

✤   Or you can ask Sequence Matrix to try
    to identify the species names.
Importing sequences

✤   Sequences mode:                                      ✤   Species name
    ✤   gi|237510679|gb|AY556753.2|Daubentonia               ✤   Daubentonia madagascariensis
        madagascariensis voucher WE94001 5.8S
        ribosomal RNA gene, partial sequence; internal
        transcribed spacer 2, complete sequence; and
        28S ribosomal RNA gene, partial sequence

    ✤   gi|237510678|gb|AY556735.2|Macaca                    ✤   Macaca sylvanus
        sylvanus voucher OK96022 5.8S ribosomal
        RNA gene, partial sequence; internal
        transcribed spacer 2, complete sequence; and
        28S ribosomal RNA gene, partial sequence
Importing sequences



✤   A common source of error is forgetting
    to recode leading and trailing gaps as
    missing information.

✤   Sequence Matrix can automatically
    replace such gaps with question marks.
Importing sequences: Naming



✤   Sequences from one dataset are matched up to another dataset by sequence name.

    ✤   Errors in sequence naming need to be fixed.

✤   We recommend naming your files by gene name: ‘coi’, ‘cytb’, ‘28S’ and so on.
Export: Taxonsets


✤   By default, we generate taxonsets on the
    basis of:

    ✤   Combined length.

    ✤   Number of character sets

    ✤   Information for a particular gene.
Gene trees



✤   Two ways to do them:

    ✤   Use the taxonset of taxa having information for a particular gene to exclude other
        taxa.

    ✤   Export the entire dataset with one file per column.
Export features



✤   You can also export the Sequence Matrix table as an Excel-readable text file.

    ✤   Supervisory mode.

    ✤   Keep track of a project as it grows.
Character sets


✤   We can read character sets defined in
    Nexus CHARSET and TNT xgroup
    commands.

✤   These can be “split” into individual
    columns, or imported as a single
    column representing the entire file.
Excision


✤   Individual sequences can be excised
    from the dataset.

✤   Excised sequences will not be exported.

    ✤   Sequence Matrix will warn you about
        that.
Contamination


✤   You thought you were sequencing Gorilla gorilla

    ✤   but you were really sequencing Homo sapiens.

✤   We have two tools you can use:

    ✤   If Homo sapiens is in your dataset.

    ✤   If Homo sapiens is not in your dataset (experimental!).
H. sapiens in dataset

✤   Looks for pairs of sequences whose
    pairwise distance is very low.

✤   Expected difference depends on gene:

    ✤   28S doesn’t change very much, but

    ✤   COI changes very quickly.

✤   Some interpretation is required.
H. sapiens not present

✤   Use “Pairwise Distance Mode” to look
    for unusual pairwise distances.

✤   Ignore one charset, then sort taxa based
    on their pairwise distance to a
    “reference taxon”.

    ✤   Colour sequences by their individual
        pairwise distances to the reference
        taxon.
H. sapiens not present

✤   Colour pairwise distances on the gene
    in question by their pairwise distance to
    the reference taxon.

✤   Look for colour variation which is
    unusual or out of place.

✤   We would expect sequences from
    different species to be correlated
    together.
Pairwise distance
mode

✤   You need to vary:

    ✤   The gene you are studying.

    ✤   The reference taxon being compared
        against.

✤   Possibly helpful as an alert mechanism.
Summary

✤   Sequence Matrix allows you to assemble and examine multigene, multitaxon datasets.

✤   Taxonsets allow you to analyse subsets of your data in downstream programs.

✤   Excising sequences gives you greater control over which sequences to analyse.

✤   You can look for contamination in two ways:

    ✤   Looking for very low pairwise distances across your entire dataset.

    ✤   Looking for unusual pairwise distances in Pairwise Distance Mode.
Acknowledgements

✤   Rudolf Meier

✤   Zhang Guanyang

✤   Farhan Ali

✤   David Lohman

✤   Everybody at the NUS DBS
    Evolutionary Biology lab.
Question time!

Contenu connexe

Tendances

Comparative genomics
Comparative genomicsComparative genomics
Comparative genomicskiran singh
 
Scope of GM crops in india
Scope of GM crops in indiaScope of GM crops in india
Scope of GM crops in indiaMD SALMAN ANJUM
 
Anticipatory plant breeding-1.pptx
Anticipatory plant breeding-1.pptxAnticipatory plant breeding-1.pptx
Anticipatory plant breeding-1.pptxMARUTHI PRASAD
 
Genome wide association studies seminar Prepared by Ms Varsha Gaitonde.
Genome wide association studies seminar Prepared by Ms Varsha Gaitonde.Genome wide association studies seminar Prepared by Ms Varsha Gaitonde.
Genome wide association studies seminar Prepared by Ms Varsha Gaitonde.Varsha Gayatonde
 
Stat 1163 -correlation and regression
Stat 1163 -correlation and regressionStat 1163 -correlation and regression
Stat 1163 -correlation and regressionKhulna University
 
Approaches to apply mas in plant breeding
Approaches to apply mas in plant breedingApproaches to apply mas in plant breeding
Approaches to apply mas in plant breedingsudha2555
 
Lecture 5 pharmacophore and qsar
Lecture 5  pharmacophore and  qsarLecture 5  pharmacophore and  qsar
Lecture 5 pharmacophore and qsarRAJAN ROLTA
 
Correlation and Path analysis in breeding experiments
Correlation and Path analysis in breeding experimentsCorrelation and Path analysis in breeding experiments
Correlation and Path analysis in breeding experimentsankit dhillon
 
Introducción a la bioinformatica
Introducción a la bioinformaticaIntroducción a la bioinformatica
Introducción a la bioinformaticaMartín Arrieta
 
Omics in plant breeding
Omics in plant breedingOmics in plant breeding
Omics in plant breedingpoornimakn04
 
Bioinformatics p1-perl-introduction v2013
Bioinformatics p1-perl-introduction v2013Bioinformatics p1-perl-introduction v2013
Bioinformatics p1-perl-introduction v2013Prof. Wim Van Criekinge
 
Softwares For Phylogentic Analysis
Softwares For Phylogentic AnalysisSoftwares For Phylogentic Analysis
Softwares For Phylogentic AnalysisPrasanthperceptron
 
Structural Bioinformatics in Drug Discovery.ppt
Structural Bioinformatics in Drug Discovery.pptStructural Bioinformatics in Drug Discovery.ppt
Structural Bioinformatics in Drug Discovery.pptAnandKumar279666
 
Gene pyramiding in tomato
Gene pyramiding in tomatoGene pyramiding in tomato
Gene pyramiding in tomatoRashmi kumari
 

Tendances (20)

Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
ChIP-seq Theory
ChIP-seq TheoryChIP-seq Theory
ChIP-seq Theory
 
Scope of GM crops in india
Scope of GM crops in indiaScope of GM crops in india
Scope of GM crops in india
 
Basics of association_mapping
Basics of association_mappingBasics of association_mapping
Basics of association_mapping
 
Anticipatory plant breeding-1.pptx
Anticipatory plant breeding-1.pptxAnticipatory plant breeding-1.pptx
Anticipatory plant breeding-1.pptx
 
Genome wide association studies seminar Prepared by Ms Varsha Gaitonde.
Genome wide association studies seminar Prepared by Ms Varsha Gaitonde.Genome wide association studies seminar Prepared by Ms Varsha Gaitonde.
Genome wide association studies seminar Prepared by Ms Varsha Gaitonde.
 
QTL mapping
QTL mappingQTL mapping
QTL mapping
 
Stat 1163 -correlation and regression
Stat 1163 -correlation and regressionStat 1163 -correlation and regression
Stat 1163 -correlation and regression
 
Approaches to apply mas in plant breeding
Approaches to apply mas in plant breedingApproaches to apply mas in plant breeding
Approaches to apply mas in plant breeding
 
Lecture 5 pharmacophore and qsar
Lecture 5  pharmacophore and  qsarLecture 5  pharmacophore and  qsar
Lecture 5 pharmacophore and qsar
 
Correlation and Path analysis in breeding experiments
Correlation and Path analysis in breeding experimentsCorrelation and Path analysis in breeding experiments
Correlation and Path analysis in breeding experiments
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Introducción a la bioinformatica
Introducción a la bioinformaticaIntroducción a la bioinformatica
Introducción a la bioinformatica
 
Omics in plant breeding
Omics in plant breedingOmics in plant breeding
Omics in plant breeding
 
Bioinformatics p1-perl-introduction v2013
Bioinformatics p1-perl-introduction v2013Bioinformatics p1-perl-introduction v2013
Bioinformatics p1-perl-introduction v2013
 
Softwares For Phylogentic Analysis
Softwares For Phylogentic AnalysisSoftwares For Phylogentic Analysis
Softwares For Phylogentic Analysis
 
Structural Bioinformatics in Drug Discovery.ppt
Structural Bioinformatics in Drug Discovery.pptStructural Bioinformatics in Drug Discovery.ppt
Structural Bioinformatics in Drug Discovery.ppt
 
Gene pyramiding in tomato
Gene pyramiding in tomatoGene pyramiding in tomato
Gene pyramiding in tomato
 
Biopesticides
BiopesticidesBiopesticides
Biopesticides
 
RNA-Seq with R-Bioconductor
RNA-Seq with R-BioconductorRNA-Seq with R-Bioconductor
RNA-Seq with R-Bioconductor
 

Similaire à Sequence Matrix: Gene concatenation made easy

sequence alignment
sequence alignmentsequence alignment
sequence alignmentammar kareem
 
Introduction to sequence alignment
Introduction to sequence alignmentIntroduction to sequence alignment
Introduction to sequence alignmentKubuldinho
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqEnis Afgan
 
Bioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxBioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxRanjan Jyoti Sarma
 
EST Clustering.ppt
EST Clustering.pptEST Clustering.ppt
EST Clustering.pptMedhavi27
 
RNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGSRNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGSHAMNAHAMNA8
 
RNASeq Experiment Design
RNASeq Experiment DesignRNASeq Experiment Design
RNASeq Experiment DesignYaoyu Wang
 
презентация за варшава
презентация за варшавапрезентация за варшава
презентация за варшаваValeriya Simeonova
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfH K Yoon
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomicsajay301
 
Scalable Genome Analysis With ADAM
Scalable Genome Analysis With ADAMScalable Genome Analysis With ADAM
Scalable Genome Analysis With ADAMfnothaft
 
Part 2 of RNA-seq for DE analysis: Investigating raw data
Part 2 of RNA-seq for DE analysis: Investigating raw dataPart 2 of RNA-seq for DE analysis: Investigating raw data
Part 2 of RNA-seq for DE analysis: Investigating raw dataJoachim Jacob
 
XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...
XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...
XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...Mark Evans
 
RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2BITS
 

Similaire à Sequence Matrix: Gene concatenation made easy (20)

31931 31941
31931 3194131931 31941
31931 31941
 
sequence alignment
sequence alignmentsequence alignment
sequence alignment
 
Seq 301116
Seq 301116Seq 301116
Seq 301116
 
1 md2016 homology
1 md2016 homology1 md2016 homology
1 md2016 homology
 
Introduction to sequence alignment
Introduction to sequence alignmentIntroduction to sequence alignment
Introduction to sequence alignment
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
 
Bioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxBioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptx
 
EST Clustering.ppt
EST Clustering.pptEST Clustering.ppt
EST Clustering.ppt
 
RNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGSRNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGS
 
Sequence Analysis
Sequence AnalysisSequence Analysis
Sequence Analysis
 
RNASeq Experiment Design
RNASeq Experiment DesignRNASeq Experiment Design
RNASeq Experiment Design
 
презентация за варшава
презентация за варшавапрезентация за варшава
презентация за варшава
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdf
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
 
Scalable Genome Analysis With ADAM
Scalable Genome Analysis With ADAMScalable Genome Analysis With ADAM
Scalable Genome Analysis With ADAM
 
Part 2 of RNA-seq for DE analysis: Investigating raw data
Part 2 of RNA-seq for DE analysis: Investigating raw dataPart 2 of RNA-seq for DE analysis: Investigating raw data
Part 2 of RNA-seq for DE analysis: Investigating raw data
 
XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...
XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...
XabTracker & SeqAgent: Integrated LIMS & Sequence Analysis Tools for Antibody...
 
RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2
 
Ensembl annotation
Ensembl annotationEnsembl annotation
Ensembl annotation
 

Dernier

Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 

Sequence Matrix: Gene concatenation made easy

  • 1. Sequence Matrix Gene concatenation made easy Gaurav Vaidya1, David Lohman2, Rudolf Meier2 1: NeatCo Asia, Singapore. 2: Department of Biological Sciences, National University of Singapore, Singapore.
  • 2. Our goals ✤ Many powerful tools exist for concatenating sequences. ✤ Adding new sequences to an existing dataset is tedious and time consuming. ✤ Our initial goal: simple, user-friendly program for concatenating sequences. ✤ We also added a few tools to help you look for lab contamination in your dataset.
  • 3. Sequence Matrix ✤ Written in Java. ✤ Graphical user interface libraries. ✤ Works on different operating systems. ✤ Easy to install: download and run the batch file.
  • 4. Importing sequences ✤ You can use the sequence names as entered in the input file. ✤ Or you can ask Sequence Matrix to try to identify the species names.
  • 5. Importing sequences ✤ Sequences mode: ✤ Species name ✤ gi|237510679|gb|AY556753.2|Daubentonia ✤ Daubentonia madagascariensis madagascariensis voucher WE94001 5.8S ribosomal RNA gene, partial sequence; internal transcribed spacer 2, complete sequence; and 28S ribosomal RNA gene, partial sequence ✤ gi|237510678|gb|AY556735.2|Macaca ✤ Macaca sylvanus sylvanus voucher OK96022 5.8S ribosomal RNA gene, partial sequence; internal transcribed spacer 2, complete sequence; and 28S ribosomal RNA gene, partial sequence
  • 6. Importing sequences ✤ A common source of error is forgetting to recode leading and trailing gaps as missing information. ✤ Sequence Matrix can automatically replace such gaps with question marks.
  • 7. Importing sequences: Naming ✤ Sequences from one dataset are matched up to another dataset by sequence name. ✤ Errors in sequence naming need to be fixed. ✤ We recommend naming your files by gene name: ‘coi’, ‘cytb’, ‘28S’ and so on.
  • 8. Export: Taxonsets ✤ By default, we generate taxonsets on the basis of: ✤ Combined length. ✤ Number of character sets ✤ Information for a particular gene.
  • 9. Gene trees ✤ Two ways to do them: ✤ Use the taxonset of taxa having information for a particular gene to exclude other taxa. ✤ Export the entire dataset with one file per column.
  • 10. Export features ✤ You can also export the Sequence Matrix table as an Excel-readable text file. ✤ Supervisory mode. ✤ Keep track of a project as it grows.
  • 11. Character sets ✤ We can read character sets defined in Nexus CHARSET and TNT xgroup commands. ✤ These can be “split” into individual columns, or imported as a single column representing the entire file.
  • 12. Excision ✤ Individual sequences can be excised from the dataset. ✤ Excised sequences will not be exported. ✤ Sequence Matrix will warn you about that.
  • 13. Contamination ✤ You thought you were sequencing Gorilla gorilla ✤ but you were really sequencing Homo sapiens. ✤ We have two tools you can use: ✤ If Homo sapiens is in your dataset. ✤ If Homo sapiens is not in your dataset (experimental!).
  • 14. H. sapiens in dataset ✤ Looks for pairs of sequences whose pairwise distance is very low. ✤ Expected difference depends on gene: ✤ 28S doesn’t change very much, but ✤ COI changes very quickly. ✤ Some interpretation is required.
  • 15. H. sapiens not present ✤ Use “Pairwise Distance Mode” to look for unusual pairwise distances. ✤ Ignore one charset, then sort taxa based on their pairwise distance to a “reference taxon”. ✤ Colour sequences by their individual pairwise distances to the reference taxon.
  • 16. H. sapiens not present ✤ Colour pairwise distances on the gene in question by their pairwise distance to the reference taxon. ✤ Look for colour variation which is unusual or out of place. ✤ We would expect sequences from different species to be correlated together.
  • 17. Pairwise distance mode ✤ You need to vary: ✤ The gene you are studying. ✤ The reference taxon being compared against. ✤ Possibly helpful as an alert mechanism.
  • 18. Summary ✤ Sequence Matrix allows you to assemble and examine multigene, multitaxon datasets. ✤ Taxonsets allow you to analyse subsets of your data in downstream programs. ✤ Excising sequences gives you greater control over which sequences to analyse. ✤ You can look for contamination in two ways: ✤ Looking for very low pairwise distances across your entire dataset. ✤ Looking for unusual pairwise distances in Pairwise Distance Mode.
  • 19. Acknowledgements ✤ Rudolf Meier ✤ Zhang Guanyang ✤ Farhan Ali ✤ David Lohman ✤ Everybody at the NUS DBS Evolutionary Biology lab.