Slides of the I Workshop on command-line tools with the collaboration of CAG (Center for Applied Genomics - Children's Hospital of Philadelphia) bioinformatics analysts.
2nd day
Workshop on command line tools - day 1 (Leandro Lima)
Slides of the I Workshop on command-line tools with the collaboration of CAG (Center for Applied Genomics - Children's Hospital of Philadelphia) bioinformatics analysts.
1st day
This document provides a cheat sheet for using the UNIX C Shell, covering topics like file manipulation, terminal setup, changing directories, listing and displaying files, copying, moving, removing files, printing files, finding files, comparing files, setting permissions, input/output redirection, C shell specific commands, job control, and special startup files.
This document discusses using sliding windows to aggregate streaming data in MapReduce. It proposes buffering input tuples in mappers until a window is full, then emitting the aggregate. Combiners and reducers combine partial aggregates across windows. Window ranges are initialized and updated during merging to remove outdated data and handle late arrivals. This approach allows streaming aggregation queries to be executed with MapReduce.
The document provides an overview of basic shell syntax and commands in UNIX shells. It discusses command line options, how shells find commands, aliases, standard input/output redirection, running jobs in background, pattern matching, switching shells, and combining multiple commands. Key points covered include using options and arguments with commands, the PATH variable, built-in commands, piping output, redirecting I/O, listing and managing jobs, wildcard patterns, and grouping commands.
This document contains a lecture on working with arrays, scripts, and SSH/SCP in UNIX systems. It discusses special variables used in scripts, how to define and manipulate arrays, examples of useful scripts for renaming files, backing up data, and extracting video files from DVDs, and how to use SSH to securely connect to remote systems and SCP to securely transfer files between systems. It also covers generating and using public/private key pairs for passwordless SSH login.
The document describes a trash command that provides a recycle bin functionality for Linux similar to Windows. It works by moving deleted files to the $HOME/.trash directory instead of permanently removing them. Users can restore deleted files by running the rm -l command and specifying the file row number. The trash command also checks the trash directory size and automatically deletes the oldest files if the space limit is exceeded.
The document provides an introduction to basic UNIX commands written by Razor on January 15, 2000 for new UNIX users. It includes commands for working with files and permissions, such as cp and mv to copy and move files, cd to change directories, pwd to show the current directory, mkdir to create directories, and rm to delete files and directories. The first part focuses on commands for copying, moving, changing directories, viewing the current directory, creating directories, and deleting files and directories.
The document describes how to create a simple character device driver for Linux. It involves writing C code for file operations like open, read, write and close. The code is compiled into a kernel module which is loaded and tested. Key steps include creating files for the driver code, adding a Makefile, building the kernel object, loading the module, creating a device file, and verifying the file operations by reading kernel logs. The module can then be unloaded after testing is completed.
This document discusses using tracing, awk, and xgraph to analyze network performance parameters from ns2 trace files. It provides details on the wired trace format, examples of awk scripts to calculate link throughput and end-to-end throughput between nodes, as well as a script to calculate average link delay between transmitting and receiving nodes.
This document summarizes the key capabilities of Warp 10, a time series data ingestion, processing, and visualization platform:
1. Warp 10 can ingest high volumes of time series data from sensors and other IoT devices via HTTP, WebSockets, and many collection tools in a performant manner.
2. It provides a feature-rich scripting language called WarpScript that allows users to manipulate, analyze, and transform ingested time series data using over 690 functions and frameworks.
3. Warp 10 includes tools to visualize time series data in real-time through widgets that can display charts, images, and more generated from WarpScript. Dynamic tile widgets also enable building configurable
Shell Script to Extract IP Address, MAC Address Information (VCP Muthukrishna)
This script collects the active MAC addresses, IP addresses, and associated hardware vendor information on a system. It uses the arp command to gather this network information and outputs it to an HTML file. The HTML file displays the IP address, MAC address, and includes a hyperlink to lookup the IEEE vendor information based on the first three octets of the MAC address. It also includes an option to email the results in an HTML formatted email.
1. The document describes an ns-2 tutorial exercise on simulating computer networks using the ns-2 simulator. It provides example scripts for basic network simulations.
2. The example scripts simulate simple network topologies with increasing complexity, including UDP and TCP traffic over droptail and queue configurations.
3. Later examples introduce more complex scenarios like dynamic routing protocols and simulating link failures to observe network behavior.
Linux Shell Scripts and Shell Commands ✌️ (Nazmul Hyder)
A short description of some Linux shell scripts and shell commands. Hopefully it will help you with file editing, directory creation and deletion, grep, pipelines, and lots of other stuff in your Linux/Mac terminal.
Abstract:
This talk will introduce you to the concept of Kubernetes Volume plugins. We will not only help you understand the basic concepts, but more importantly, using practical examples, we will show how you can develop your own volume plugins and contribute them back to the community of the OSS project as large as Kubernetes.
We will conclude the talk by discussing various challenges one can come across when contributing to a high velocity OSS project of Kubernetes' size which can help you avoid the pain and enjoy the path.
Sched Link: http://sched.co/6BYB
Artimon is a scalable metrics collection and analysis framework. It collects metrics called 'variable instances' that have a name, labels, and timestamped values. Metrics can be exported via a Thrift service and stored in distributed systems like Kafka for later analysis using Groovy scripts. Artimon is designed to collect both IT and business metrics and can adapt to collect from third party sources using agents.
1) The document provides instructions for setting up an AWS account and launching an EC2 instance with an AMI that contains tools and documentation for a hands-on tutorial on NoSQL databases and MongoDB.
2) The tutorial covers basic MongoDB commands and demonstrates how to create, insert, update, and query document data using the mongo shell client. Embedded and nested documents are explored along with geospatial queries.
3) A map-reduce example aggregates historical check-in data to calculate popular locations over different time periods, demonstrating how MongoDB supports batch operations.
This document provides an overview of Bash scripting concepts including file systems, variables and strings, math operations, file ownership and permissions, users and privileges, processes and subshells, loops, conditional statements, I/O redirection, named pipes, signals, and GUI tools. It also includes examples of Bluetooth file sharing, auto-shutdown scripts, lockscreen notifications, web crawling scripts, and time tracking automation. References are provided for further reading.
"PostgreSQL and Python" Lightning Talk @EuroPython2014 (Henning Jacobs)
PL/Python allows users to write PostgreSQL functions and procedures using Python. It enables accessing PostgreSQL data and running Python code from within SQL queries. For example, a function could query a database table, process the results in Python by accessing modules, and return a value to the SQL query. This opens up possibilities to leverage Python's extensive libraries and expressiveness to expose data and perform complex validation from PostgreSQL.
Bash Script Disk Space Utilization Report and EMail (VCP Muthukrishna)
This bash script generates an HTML disk usage report and emails it. It collects disk usage information using df, formats it into an HTML table, and emails the report. If disk usage exceeds 90%, the row is highlighted red and a critical alert is shown. Usage between 70-80% is highlighted orange. The report is generated daily and emailed to a recipient.
File Space Usage Information and EMail Report - Shell Script (VCP Muthukrishna)
This script generates an HTML report of the top 10 largest files and directories on a server by size and emails it. It uses the du command to get disk usage information, sorts the results in descending order of size, and writes the top 10 to an HTML table, which is then emailed via sendmail to notify of disk space usage.
Maxym Kharchenko presented ways to manage Oracle databases with Python. He demonstrated a Python tool to ping multiple Oracle databases concurrently and time the execution. The tool reports the status and timing for each database pinged. Python enforces good coding practices and interfaces well with databases, APIs, and other systems. Learning Python helps develop a more Pythonic way of thinking that can improve code quality and productivity.
Coming Out Of Your Shell - A Comparison of *Nix Shells (Kel Cecil)
This document provides an overview of several popular shell options including bash, zsh, and fish. It discusses their origins, key features, and popular frameworks used to enhance them. The document encourages exploring options like oh-my-zsh and oh-my-fish to benefit from community configurations while also highlighting capabilities in each shell beyond their initial reputation. The takeaways emphasize that with tweaking, bash is capable of more than assumed, zsh rewards investment in unlocking its power, and fish offers useful features out of the box.
The document discusses the dplyr package for R. It provides examples of using dplyr verbs like filter, select, mutate, and summarise to subset and transform data frames. It also demonstrates grouping data with group_by and joining data with inner_join. The key features of dplyr are its simple verbs for filtering, modifying, arranging and summarizing data, its use of piping with %>%, and its convenience for working with tabular data.
The document discusses using functional programming techniques in Perl to efficiently calculate tree hashes of large files uploaded in chunks to cloud storage services. It presents a tree_fold keyword and implementation that allows recursively reducing a list of values using a block in a tail-call optimized manner to avoid stack overflows. This approach is shown to provide concise, efficient and elegant functional code for calculating tree hashes in both Perl 5 and Perl 6.
The document provides instructions for configuring Postfix to integrate with Active Directory for user authentication. It includes configuring Postfix configuration files and LDAP settings to query user information from Active Directory for mail delivery, alias lookups, and more. Commands are provided to install required packages, configure ClamAV for antivirus scanning, and set up virtual users on the mail server using directories mounted from an iSCSI LUN.
Process monitoring in UNIX shell scripting (Dan Morrill)
This script monitors a hardcoded process called "ssh" and restarts it if it stops running. It will attempt to restart the process 3 times before reporting a failure. The script logs status messages to a log file called "procmon.log". It uses color codes to identify status messages. The script contains functions to monitor the process, detect failures, and close the script logging the ending status.
At the Dublin Fashion Insights Centre, we are exploring methods of categorising the web into a set of known fashion related topics. This raises questions such as: How many fashion related topics are there? How closely are they related to each other, or to other non-fashion topics? Furthermore, what topic hierarchies exist in this landscape? Using Clojure and MLlib to harness the data available from crowd-sourced websites such as DMOZ (a categorisation of millions of websites) and Common Crawl (a monthly crawl of billions of websites), we are answering these questions to understand fashion in a quantitative manner.
The latest generation of big data tools such as Apache Spark routinely handle petabytes of data while also addressing real-world realities like node and network failures. Spark's transformations and operations on data sets are a natural fit with Clojure's everyday use of transformations and reductions. Spark MLlib's excellent implementations of distributed machine learning algorithms puts the power of large-scale analytics in the hands of Clojure developers. At Zalando's Dublin Fashion Insights Centre, we're using the Clojure bindings to Spark and MLlib to answer fashion-related questions that until recently have been nearly impossible to answer quantitatively.
Hunter Kelly @retnuh
tech.zalando.com
The document discusses best practices for using the command line interface (CLI) efficiently. It provides examples of using common UNIX commands like grep, find, awk, sort, uniq, and xargs to analyze command history, filter output, run commands in parallel, and handle encoding. The document emphasizes building small, focused tools according to the UNIX philosophy and provides tips for common tasks like capturing command output, checking exit statuses, running subprocesses, and handling different operating systems. Recommendations are made for CLI productivity tools and the linter ShellCheck.
Shell Script Disk Usage Report and E-Mail Current Threshold Status (VCP Muthukrishna)
This shell script generates a disk usage report for each disk partition on a server and emails the report. It checks disk usage percentages against thresholds of 90%, 80%, and 70% and colors partitions red, orange, or green accordingly in the report. It also calculates the difference in disk usage from the previous report 12 hours ago and includes this in the emailed report. Running the script generates an HTML report file and uses sendmail to email the file to specified recipients.
The fundamentals and advanced applications of Node will be covered. We will explore the design choices that make Node.js unique, how this changes the way applications are built and how systems of applications work most effectively in this model. You will learn how to create modular code that’s robust, expressive and clear. Understand when to use callbacks, event emitters and streams.
This document discusses refactoring Java code to Clojure using macros. It provides examples of refactoring Java code that uses method chaining to equivalent Clojure code using the threading macros (->> and -<>). It also discusses other Clojure features like type hints, the doto macro, and polyglot projects using Leiningen.
This document loads various libraries and reads in multiple csv files containing transportation data. It then performs some data cleaning and preprocessing steps. Various outputs are defined to render tables and plots of subsets of the data. Plots are created to visualize relationships between weighted time, cost, and safety metrics. Interactive elements are added to output text describing user input from the plots. Maps and motion charts are also defined as outputs to visualize additional data aspects.
This document contains an assignment submission for an Operating Systems lab course. It includes commands practiced during classwork and homework on topics like file manipulation and shell scripting. The homework portion focuses on shell scripting, with examples of scripts using basic constructs like loops, conditionals, variables, and input/output redirection.
Introduction to Unix - POS420 Unix Lab Exercise Week 3 BTo.docx (mariuse18nolet)
This document provides an introduction and overview of common Unix commands including find, grep, sort, uniq, diff, and awk. It includes over 30 examples of using each command to find, search, filter, compare, and manipulate text-based files. The examples cover basic and advanced uses of each command, such as recursively searching directories with find, searching for patterns with grep, sorting and de-duplicating lines with sort and uniq, comparing differences between files with diff, and selecting, calculating, and transforming data with awk.
From mysql to MongoDB (MongoDB 2011 Beijing meetup) (Night Sailer)
The document summarizes differences between MySQL and MongoDB data types and operations. MongoDB uses BSON for data types rather than separate numeric, text and blob types. It supports embedded documents and arrays. Unlike MySQL, MongoDB does not have tables or rows, but collections and documents. Operations like insert, update, find, sort and index are discussed as alternatives to SQL equivalents.
paexec distributes tasks over a network or CPUs. It allows processing large amounts of data or tasks in parallel by running tasks on multiple machines or CPUs. It supports heterogeneous environments like BSD, Linux, and Windows. Tasks can have dependencies, and paexec can build a dependency graph to ensure dependent tasks run in the correct order. It is resistant to network and calculator failures and will retry or redistribute failed tasks.
This document discusses time series analysis techniques in R, including decomposition, forecasting, clustering, and classification. It provides examples of decomposing the AirPassengers dataset, forecasting with ARIMA models, hierarchical clustering on synthetic control chart data using Euclidean and DTW distances, and classifying the control chart data using decision trees with DWT features. Accuracy of over 88% was achieved on the classification task.
This document discusses using the doSNOW package in R to perform parallel programming and speed up simulations. It explains how to register clusters, use foreach loops with .combine functions, and load necessary packages within loops. Testing with different numbers of clusters shows speedups over serial execution, with optimal speedups achieved when the number of clusters matches or exceeds the number of cores. Processing jobs in parallel reduces the elapsed time for each job.
Kafka Streams: Revisiting the decisions of the past (How I could have made it better) (confluent)
Jason Bell, Kafka DevOps Engineer @ Digitalis.io
https://www.meetup.com/Cleveland-Kafka/events/272339276/
Ns is a network simulator developed at UC Berkeley and elsewhere that allows modeling of TCP/IP networks and wireless networks using C++ and OTcl. It provides objects for nodes, links, network traffic and wireless channel modeling. The document outlines how to install ns, create basic simulations with nodes and traffic, and extend it for wireless simulations using various protocols.
CLI Wizardry - A Friendly Intro To sed/awk/grep (All Things Open)
This document provides an introduction to common command line interface (CLI) tools including grep, sed, awk, and xargs. It explains that grep fetches lines containing a search term, sed replaces text within lines, awk processes output by columns, and xargs pipes output to command line arguments. The document demonstrates examples of each tool and how they can be combined in pipelines to extract and transform text for tasks like analyzing log files or creating a storage pool.
Beyond PHP - It's not (just) about the code (Wim Godden)
Most PHP developers focus on writing code. But creating Web applications is about much more than just writing PHP. Take a step outside the PHP cocoon and into the big PHP ecosphere to find out how small code changes can make a world of difference on servers and network. This talk is an eye-opener for developers who spend over 80% of their time coding, debugging and testing.
Code is not text! How graph technologies can help us to understand our code b... (Andreas Dewes)
Today, we almost exclusively think of code in software projects as a collection of text files. The tools that we use (version control systems, IDEs, code analyzers) also use text as the primary storage format for code. In fact, the belief that “code is text” is so deeply ingrained in our heads that we never question its validity or even become aware of the fact that there are other ways to look at code.
In my talk I will explain why treating code as text is a very bad idea which actively holds back our understanding and creates a range of problems in large software projects. I will then show how we can overcome (some of) these problems by treating and storing code as data, and more specifically as a graph. I will show specific examples of how we can use this approach to improve our understanding of large code bases, increase code quality and automate certain aspects of software development.
Finally, I will outline my personal vision of the future of programming, which is a future where we no longer primarily interact with code bases using simple text editors. I will also give some ideas on how we might get to that future.
R is a language and environment for statistical computing and graphics. It is based on S, an earlier language developed at Bell Labs. R features include being cross-platform, open source, having a package-based repository, strong graphics capabilities, and active user and developer communities. Useful URLs and books for learning R are provided. Instructions for installing R and RStudio on different platforms are given. R can be used for a wide range of statistical analyses and data visualization.
Designing Operation Oriented Web Applications / YAPC::Asia Tokyo 2011 (Masahiro Nagano)
The document describes using Log::Minimal to log messages with timestamps, severity levels, and stack traces. Log::Minimal provides functions like debugf(), infof(), warnf() that log messages, and configuration options like AUTODUMP and PRINT to customize the output format. It can be used to log messages from multi-threaded or distributed applications.
Similar to Workshop on command line tools - day 2 (20)
Genetic studies of Mendelian and complex diseases (Leandro Lima)
The document discusses genetic studies of complex diseases, including: (1) genome-wide association studies that search for common genetic variants associated with diseases; (2) challenges such as explaining only a small part of the genetic variation and including rare variants; (3) the use of protein-protein interactions to better understand the genetic architecture of diseases.
Using Cytoscape for Network Visualization and Analysis (Leandro Lima)
The document discusses the Cytoscape software, which is used for visualization and analysis of biological networks. It allows visualizing molecular interaction networks and metabolic pathways, integrating these networks with gene expression data and other information. Cytoscape has built-in tools and plugins for different analyses, and allows accessing public databases, exporting images, and saving sessions.
Brokers and Bridges (genes in a protein-protein interaction network) (Leandro Lima)
The document discusses bridges and brokers in a network. Bridges connect important parts of the network despite having few connections, and are important to keep parts of the network from becoming disconnected. Brokers have many links and act as intermediaries, connecting people who do not know each other; their neighborhood becomes disconnected if they are removed from the network.
Intro. to Bioinformatics (FMU - 08/05/2012) (Leandro Lima)
The document introduces the field of bioinformatics, discussing DNA, genomes, sequencing, genome assembly and annotation. It also covers sequence alignment using dynamic programming and applications such as gene expression studies and biological networks.
Complex Networks applied to Social Networks (09/05/2012 - FMU) (Leandro Lima)
The document discusses complex networks applied to social networks. It introduces the speaker and his academic background, defines what networks are and how they can be represented, and discusses how social, biological, and influence networks can be analyzed using complex network methods.
Workshop on command line tools - day 2
1. I Workshop on command-line tools (day 2)
Center for Applied Genomics
Children's Hospital of Philadelphia
February 12-13, 2015
2. awk - a powerful way to check conditions and show specific columns
Example: show only CNVs spanning at most 3 targets (exons)
tail -n +2 DATA.xcnv | awk '$8 <= 3'
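Conditions and column printing combine naturally. A minimal sketch, assuming the XHMM .xcnv layout where column 1 is the sample and column 3 the CNV interval (check your own header):
# show sample and interval for CNVs spanning at most 3 targets
tail -n +2 DATA.xcnv | awk '$8 <= 3 {print $1, $3}'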
3. awk - different ways to do the same thing
tail -n +2 DATA.xcnv | awk '$8 <= 3'
# same effect 1
tail -n +2 DATA.xcnv | awk '$8 <= 3 {print}'
# same effect 2
tail -n +2 DATA.xcnv | awk '{if ($8 <= 3) print}'
# same effect 3
tail -n +2 DATA.xcnv | awk '{if ($8 <= 3) print $0}'
# different effect
tail -n +2 DATA.xcnv | awk '{if ($8 <= 3) print $1}'
5. diff - compare files line by line
# Compare
diff DATA.gold.xcnv DATA.gold2.xcnv
# Tip: install tkdiff to use a graphical version of diff
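Two standard diff options worth knowing:
# unified format, easier to read (and the format used by patch)
diff -u DATA.gold.xcnv DATA.gold2.xcnv
# only report whether the files differ
diff -q DATA.gold.xcnv DATA.gold2.xcnv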
6. Exercises
1. Using adhd.map, show 10 SNPs with rsID starting with 'rs' on chrom. 2, between positions 1Mb and 2Mb
2. Check which chromosome has the most SNPs
3. Check which SNP IDs are duplicated
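One possible set of solutions, assuming adhd.map follows the PLINK .map layout (chromosome, SNP ID, genetic distance, position):
# 1. 10 SNPs on chrom. 2 with rsID starting with 'rs', between 1Mb and 2Mb
awk '$1 == 2 && $2 ~ /^rs/ && $4 >= 1000000 && $4 <= 2000000' adhd.map | head
# 2. chromosome with the most SNPs
awk '{print $1}' adhd.map | sort | uniq -c | sort -rn | head -n 1
# 3. duplicated SNP IDs
awk '{print $2}' adhd.map | sort | uniq -d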
9. Using awk to check number of variants in ped files
# Options using only awk, but they take (much) more time
# (in a .ped file the first 6 columns are family/individual IDs, parents, sex,
# and phenotype; each SNP then adds 2 allele columns, hence (NF-6)/2)
awk 'NR == 1 {print (NF-6)/2}' adhd.ped
awk 'NR < 2 {print (NF-6)/2}' adhd.ped # Slow, too
# Better alternative
head -n 1 adhd.ped | awk '{print (NF-6)/2}'
# Now, the map file
wc -l adhd.map
10. time - time command execution
time head -n 1 adhd.ped | awk '{print (NF-6)/2}'
real 0m0.485s
user 0m0.391s
sys 0m0.064s
time awk 'NR < 2 {print (NF-6)/2}' adhd.ped
# Forget… just press Ctrl+C
real 1m0.611s
user 0m51.261s
sys 0m0.826s
11. top - display and update sorted information about processes / display Linux tasks
top
z : color
k : kill process
u : choose specific user
c : show complete commands running
1 : show usage of individual CPUs
q : quit
12. screen - screen manager with terminal emulation (i)
screen
screen -S <session_name>
Ctrl+a, then c: create window
Ctrl+a, then n: go to next window
Ctrl+a, then p: go to previous window
Ctrl+a, then 0: go to window number 0
Ctrl+a, then d: detach from your session, but keep it running
13. screen - screen manager with terminal emulation (ii)
Ctrl+a, then [ : activate copy mode (to scroll screen)
q : quit copy mode
exit : close current window
screen -r : resume the only detached session
screen -r <session_name> : resume a specific detached session
screen -rD <session_name> : reattach a session (detaching it elsewhere first)
14. split - split a file into pieces
split -l <lines_of_each_piece> <input> <prefix>
# Example
split -l 100000 adhd.map map_
wc -l map_*
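Sanity check: the default suffixes sort in order (map_aa, map_ab, ...), so concatenating the pieces should reproduce the original:
cat map_* > adhd_rebuilt.map
wc -l adhd_rebuilt.map adhd.map # line counts should match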
15. in-line Perl/sed to find and replace (i)
head DATA.gold.xcnv | cut -f3 | perl -pe 's/chr/CHR/g'
head DATA.gold.xcnv | cut -f3 | perl -pe 's/chr//g'
# Other possibilities
head DATA.gold.xcnv | cut -f3 | perl -pe 's|chr||g'
head DATA.gold.xcnv | cut -f3 | perl -pe 's!chr!!g'
head DATA.gold.xcnv | cut -f3 | sed 's/chr//g'
# Creating a BED file (replace ':' and '-' with tabs)
head DATA.gold.xcnv | cut -f3 | perl -pe 's/[:-]/\t/g'
16. in-line Perl/sed to find and replace (ii)
# "s" means substitute
# "g" means global (replace all matches, not only first)
# See the difference...
head DATA.gold.xcnv | cut -f3 | sed 's/9/nine/g'
head DATA.gold.xcnv | cut -f3 | sed 's/9/nine/'
# Adding more replacements
head DATA.gold.xcnv | cut -f3 | sed 's/1/one/g; s/2/two/g'
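To replace inside a file instead of a pipeline, sed can edit in place; note that GNU sed (Linux) takes -i alone, while BSD sed (macOS) needs an argument (myfile.txt is just a placeholder):
sed -i 's/chr//g' myfile.txt # GNU sed
sed -i '' 's/chr//g' myfile.txt # BSD sed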
17. copy from terminal to clipboard / paste from clipboard to terminal
# This is like Ctrl+V in your terminal
pbpaste
# This is like Ctrl+C from your terminal
head DATA.xcnv | pbcopy
# Then, Ctrl+V in other text editor
# On Linux, you can install "xclip"
http://sourceforge.net/projects/xclip/
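With xclip installed, rough Linux equivalents (assuming an X11 session) would be:
head DATA.xcnv | xclip -selection clipboard # like pbcopy
xclip -selection clipboard -o # like pbpaste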
18. datamash - command-line calculations
tail -n +2 DATA.xcnv |
head |
cut -f6,10,11 |
datamash mean 1 sum 2 min 3
# mean of 1st column
# sum of 2nd column
# minimum of 3rd column
http://www.gnu.org/software/datamash/
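datamash can also aggregate per group with -g (input must be sorted on the grouping column). A sketch, assuming column 2 of DATA.xcnv holds the CNV type (DEL/DUP):
tail -n +2 DATA.xcnv | cut -f2,10 | sort -k1,1 | datamash -g 1 mean 2
# mean of the 10th original column, per CNV type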
19. touch - change file access and modification times
ls -lh DATA.gold.xcnv
touch DATA.gold.xcnv
ls -lh DATA.gold.xcnv
20. Introduction to "for" loop
tail -n +2 DATA.xcnv | cut -f1 | sort | uniq | head > samples.txt
for sample in `cat samples.txt`; do touch $sample.txt; done
ls -lh Sample*
for sample in `cat samples.txt`; do
mv $sample.txt $sample.csv;
done
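An alternative that reads the file line by line and quotes the variable (safer if names ever contain spaces):
while read -r sample; do
  mv "${sample}.txt" "${sample}.csv"
done < samples.txt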
24. Exercise
1. Create a program that shows input parameters/arguments
2. Create a program (say, "fields", or "colnames") that prints the column names of a <tab>-delimited file (example: DATA.xcnv)
3. Send this program to your PATH
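One possible sketch for exercise 2 (the name "colnames" and the numbering via cat -n are just one choice):
#!/bin/bash
# colnames: print the column names of a <tab>-delimited file, one per line, numbered
head -n 1 "$1" | tr '\t' '\n' | cat -n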
25. Running a bash script (i)
cat > arguments.sh
echo Your program is $0
echo Your first argument is $1
echo Your second argument is $2
echo You entered $# parameters.
# Ctrl+C to exit "cat"
26. Running a bash script (ii)
bash arguments.sh
bash arguments.sh A B C D E
27. chmod - set permissions (i)
ls -lh arguments.sh
-rw-r--r--
# First character
b Block special file.
c Character special file.
d Directory.
l Symbolic link.
s Socket link.
p FIFO.
- Regular file.
28. chmod - set permissions (ii)
Next characters:
user, group, others | read, write, execute
ls -lh arguments.sh
-rw-r--r--
# Everybody can read
# Only user can write/modify
29. chmod - set permissions (iii)
# Add writing permission to group
chmod g+w arguments.sh
ls -lh arguments.sh
# Remove writing permission from group
chmod g-w arguments.sh
ls -lh arguments.sh
# Add execution permission to all
chmod a+x arguments.sh
ls -lh arguments.sh
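The same permissions can be set numerically, summing r=4, w=2, x=1 for user/group/others:
chmod 755 arguments.sh # rwxr-xr-x
chmod 644 arguments.sh # rw-r--r--
ls -lh arguments.sh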
30. Run your program again
# Execute your program
./arguments.sh
./arguments.sh A B C D E
# change the name
mv arguments.sh arguments
# Send to your PATH (showing on Mac)
sudo cp arguments /usr/local/bin/
# Go to other directory
# Type argu<Tab>, and "which arguments"
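An alternative to copying into /usr/local/bin is keeping a personal bin directory on your PATH (e.g. set in ~/.bashrc; $HOME/bin is just a conventional choice):
mkdir -p $HOME/bin
cp arguments $HOME/bin/
export PATH="$HOME/bin:$PATH"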