The ArthropodEST pipeline is a web-accessible tool developed at Kansas State University that allows users to analyze expressed sequence tags (ESTs). It cleans input sequences, screens for contaminants, clusters ESTs into contigs through assembly, and predicts open reading frames and signal peptides. Users can access analysis results and a summary report through a unique URL sent to their email. The pipeline utilizes freely available bioinformatics software and provides more options for EST analysis than other existing online tools.
ArthropodEST pipeline poster
ArthropodEST: K-State Bioinformatics EST analysis pipeline *

Sanjay Chellapilla¹, Yoonseong Park², Doina Caragea³ and Susan J. Brown¹
¹ Bioinformatics Center, Division of Biology
² Department of Entomology
³ Department of Computing and Information Sciences
Kansas State University, Manhattan KS 66506

ABSTRACT

Expressed Sequence Tags (ESTs), produced by single-pass end-sequencing of cDNA clones, generate large datasets that are instrumental in gene discovery and gene sequence determination. Although several EST data analysis pipelines are available on the WWW (e.g. ESTpass, EGassembler, ESTexplorer), the WWW-accessible K-State Bioinformatics EST analysis pipeline 'ArthropodEST' goes further than these existing pipelines by providing more options and analyses, along with a user-friendly interface. The pipeline was developed using freely available bioinformatics and system software (academic or F/OSS licenses). Available options in the pipeline include cleaning input sequences and screening them for vectors and contaminants, masking repetitive sequences using repeat databases, clustering and assembly into contigs, computing ORFs (Open Reading Frames) and/or signal-peptide predictions, and assigning functional annotations to the contigs and singletons. The pipeline automatically emails the user a result notification containing a unique URL from which the results can be downloaded. An automatically generated summary report of the analyses is included in the downloadable results. The pipeline is accessible at http://bioinformatics.ksu.edu/ArthropodEST/

Acknowledgements: Supported by KSU-TE-AGC (SC), KSU Bioinformatics Center (DC, SC) and K-INBRE (DC, SC).
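To make the ORF computation named in the abstract concrete, here is a minimal Python sketch that scans all six reading frames for the longest ATG-to-stop ORF. It is an illustration only: the poster does not say how the pipeline computes ORFs, and the function names below (`reverse_complement`, `orfs_in_frame`, `longest_orf`) are hypothetical, not part of ArthropodEST or EMBOSS.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, frame: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for each ATG...stop ORF in one frame.

    `end` is exclusive and includes the stop codon.
    """
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first start codon opens an ORF
        elif start is not None and codon in STOP_CODONS:
            yield start, i + 3
            start = None  # look for the next ORF in this frame


def longest_orf(seq: str) -> str:
    """Return the longest complete ORF over all six frames ('' if none)."""
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for start, end in orfs_in_frame(strand, frame):
                if end - start > len(best):
                    best = strand[start:end]
    return best


if __name__ == "__main__":
    print(longest_orf("ccATGAAATTTTAGgg"))
```

Both strands are scanned because single-pass EST reads can come from either end of a cDNA clone. This sketch ignores ORFs that run off the end of a read, which a production tool would normally report as partial.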
KANSAS STATE UNIVERSITY, KSU BIOINFORMATICS CENTER, KSU ARTHROPOD GENOMICS CENTER, K-INBRE

WORKFLOW

client-side (User): ArthropodEST homepage
- User-input: project name, e-mail address, input files and options/parameters for analyses

server-side CGI script:
- Process user inputs, display project-receipt confirmation and summary, send automatic confirmation email, invoke pipeline shell script

server-side Pipeline shell-script:
- Input sequences cleaning
- Vector/contaminant screening
- Repeat-masking with standard RepBase libraries
- Assembly with optional prior clustering into contigs, singletons
- Further analyses: functional annotations and/or signal-peptide predictions

client-side (User):
- User downloads results and report from unique URL automatically sent by email

COMPONENTS OF THE PIPELINE

(a) System software: GNU/Linux Ubuntu 2.6.24-23-server, bash 3.2.39, Apache 2.2.8 with mod_perl/2.0.3, PERL 5.8.8 with PERL modules CGI 3.29, Mail::Mailer 1.74, File::Temp 0.18; MySQL 5.0; and Postfix 2.5.4 Mail Transport Agent (MTA).

(b) Bioinformatics software:
- TGICL software suite [ http://compbio.dfci.harvard.edu/tgi/software/ ]
- Vector databases: NCBI UniVec [ http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html ], EMBL EmVec [ ftp://ftp.ebi.ac.uk/pub/databases/emvec/ ]
- RepeatMasker [ http://www.RepeatMasker.org/ ] and associated RepBase libraries [ http://www.girinst.org/ ]; requires either cross_match [ http://www.phrap.org/phredphrapconsed.html ] or wu-blastall [ http://blast.wustl.edu/ ]
- CAP3 sequence-assembly program [ http://seq.cs.iastate.edu/ ]
- NCBI BLAST suite [ http://www.ncbi.nlm.nih.gov/BLAST/download.shtml ] and/or wu-blastall [ http://blast.wustl.edu/ ]
- blast2GO pipeline version B2G4PIPE [ http://blast2go.bioinfo.cipf.es/ ]
- signalp [ http://www.cbs.dtu.dk/services/SignalP/ ] and EMBOSS [ http://emboss.sourceforge.net/ ]

(c) In-house developed software: WWW-interface HTML/CSS, server-side CGI, PERL, bash shell and awk scripts
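The workflow on the poster can be read as a fixed sequence of server-side stages, some of which depend on the options the user submits. The Python sketch below shows that idea only. It assumes the stage order given in the abstract, and assumes that repeat masking, clustering, annotation and signal-peptide prediction are the user-selectable steps. The `AnalysisOptions` fields and the `plan_stages` function are hypothetical names; the actual pipeline is implemented with CGI, PERL, bash and awk scripts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnalysisOptions:
    """Hypothetical option flags; the real form fields are not given on the poster."""
    mask_repeats: bool = True
    cluster_before_assembly: bool = False
    functional_annotation: bool = False
    signal_peptides: bool = False


def plan_stages(opts: AnalysisOptions) -> List[str]:
    """Return the ordered stage names for one submitted project."""
    stages = ["clean input sequences", "screen for vectors/contaminants"]
    if opts.mask_repeats:
        stages.append("mask repeats (RepBase libraries)")
    if opts.cluster_before_assembly:
        stages.append("cluster ESTs")  # optional step before assembly
    stages.append("assemble into contigs and singletons")
    if opts.functional_annotation:
        stages.append("assign functional annotations")
    if opts.signal_peptides:
        stages.append("predict signal peptides")
    stages.append("write summary report")
    stages.append("email unique result URL")
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(plan_stages(AnalysisOptions()), start=1):
        print(number, stage)
```

Planning the stage list up front, before running anything, would also let the CGI step show the user a project summary at submission time, which matches the project-receipt confirmation described in the workflow.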