RStudio is a multi-platform integrated development environment (IDE) for R that allows users to develop R code on desktop or mobile devices. It provides features like code completion, executing code directly from source files, navigating to files and functions, version control, and interactive graphics. RStudio can be run locally or accessed via the web, making it a useful tool for developing R code from any device.
This document describes an R workshop on analyzing graphs and networks. It discusses representing graphs as mathematical objects in R and available R packages for graph analysis. Several graph analysis packages in R are listed, including igraph, which allows network visualization and analysis. The workshop agenda includes an introduction to graph concepts in R, possibilities for graph analysis in R, and an example analysis project. The goal is to help participants learn how to represent and analyze their relational data using the R programming language.
Network analysis in the social sciences in general... and in history in particular — Laurent Beauguitte
Presentation on network analysis in the social sciences and in history, given at the Ferney-Voltaire summer school on 25 August 2014 (http://ferney2014.sciencesconf.org/)
Presentation given at the Ferney-Voltaire 2014 summer school (http://ferney2014.sciencesconf.org/): an introduction to network analysis with R (statnet and igraph packages)
Parallel R in snow (English after 2nd slide) — Cdiscount
This presentation discusses parallelizing computations in R using the snow package. It demonstrates how to:
1. Create a cluster with multiple R sessions using makeCluster()
2. Split data across the sessions using clusterSplit() and export data to each node
3. Write functions to execute in parallel on each node using clusterEvalQ()
4. Collect and combine the per-node results (for example by summing them) to obtain the final value. As an example, it shows how to parallelize the likelihood calculation of a probit regression model, substantially reducing computation time.
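The four steps above can be sketched with the base `parallel` package, which ships the same cluster API as snow (`makeCluster`, `clusterSplit`, `clusterApply`); the summed-logs computation below is an illustrative stand-in for the probit likelihood in the slides, and `clusterApply` is used in place of the slides' `clusterEvalQ`:

```r
# Minimal sketch of the snow-style parallel workflow, using the base
# `parallel` package (same cluster API as the snow package).
library(parallel)

cl <- makeCluster(2)                  # 1. start a cluster of two local R sessions

x <- runif(10000)                     #    toy data standing in for observations
chunks <- clusterSplit(cl, x)         # 2. split the data, one chunk per node

# 3. run a partial computation on each node: here a partial sum of logs,
#    a stand-in for the per-chunk log-likelihood terms of the probit example
partial <- clusterApply(cl, chunks, function(chunk) sum(log(chunk)))

total <- Reduce(`+`, partial)         # 4. combine the per-node results
stopCluster(cl)
```

The combined `total` matches the serial computation `sum(log(x))`; the speed-up comes from each node handling only its own chunk.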
- The document discusses strategies for analyzing large datasets that are too big to fit into memory, including using cloud computing, the ff and rsqlite packages in R, and sampling with the data.sample package.
- The ff and rsqlite packages allow working with data beyond RAM limits but require rewriting code, while data.sample provides sampling without rewriting code but introduces sampling error.
- Cloud computing avoids rewriting code and has no memory limits but requires setup, while sampling works well for exploratory analysis but is unsuitable when exact values must be reported.
The document introduces building a data science platform in the cloud using Amazon Web Services and open source technologies. It discusses motivations for using a cloud-based approach for flexibility and cost effectiveness. The key building blocks are described as Amazon EC2 for infrastructure, Vertica for fast data storage and querying, and RStudio Server for analytical capabilities. Step-by-step instructions are provided to set up these components, including launching an EC2 instance, attaching an EBS volume for storage, installing Vertica and RStudio Server, and configuring connectivity between components. The platform allows for experimenting and iterating quickly on data analysis projects in the cloud.
This document discusses mixing R source code and documentation in LaTeX documents using knitr. It recommends using knitr in RStudio to embed R code chunks and output (like graphs and tables) in LaTeX documents. Code chunks can include any R code to evaluate, show, or hide. Graphs and tables from R code chunks will be included in the LaTeX output.
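As a minimal sketch, a knitr `.Rnw` source mixing LaTeX and an R chunk looks like this (file contents illustrative; the `<<...>>=` and `@` markers delimit the chunk):

```latex
\documentclass{article}
\begin{document}
Stopping distance as a function of speed:

% knitr evaluates the chunk and inserts the echoed code and the plot here
<<cars-plot, echo=TRUE, fig.width=4, fig.height=3>>=
plot(cars$speed, cars$dist, xlab = "speed (mph)", ylab = "distance (ft)")
@
\end{document}
```

Running `knitr::knit()` on the `.Rnw` file produces a `.tex` file with the code and figure embedded, ready to compile with pdflatex.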
This document describes a collapsed dynamic factor analysis model for macroeconomic forecasting. It summarizes that multivariate time series models can more accurately capture relationships between economic variables compared to univariate models. The document then presents a collapsed dynamic factor model that relates a target time series (yt) to unobserved dynamic factors (Ft) estimated from related macroeconomic data (gt). Out-of-sample forecasting experiments on US personal income and industrial production data demonstrate the model achieves more accurate point forecasts than univariate benchmarks like random walk or AR(2) models.
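In the notation of this summary, such a dynamic factor model can be sketched as follows (an illustrative linear Gaussian formulation; the paper's exact specification may differ):

```latex
\begin{aligned}
g_t     &= \Lambda F_t + e_t             && \text{(related series load on the factors)}\\
y_t     &= \beta' F_t + \varepsilon_t    && \text{(target series driven by the factors)}\\
F_{t+1} &= \Phi F_t + \eta_t             && \text{(factor dynamics)}
\end{aligned}
```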
This document discusses time series forecasting and summarizes four illustrations of time series analysis and forecasting:
1. A multivariate model is used to analyze the European business cycle based on trends, common cycles, and leads/lags between economic indicators like GDP, industrial production, and confidence.
2. A bivariate unobserved components model is applied to daily Nordpool electricity spot prices and consumption data. The model decomposes the data into trends, seasons, cycles and residuals. Forecasting results show the bivariate model outperforms the univariate.
3. A periodic dynamic factor model is fitted jointly to the 24 hourly series of French electricity load data. The model accounts for long-term trends, various seasonal patterns,
This document discusses state space methods for time series analysis and forecasting. It begins by introducing the basic state space model framework, which represents a time series using unobserved states that evolve over time according to a state equation and generate observations according to an observation equation. The document then provides examples of how various time series models, such as regression models with time-varying coefficients, ARMA models, and univariate component models can be expressed as state space models. Finally, it introduces the Kalman filter algorithm, which provides a recursive means of estimating the unobserved states from the observations.
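The framework described above can be written in the standard linear Gaussian state space form (a generic sketch; the document's own notation may differ):

```latex
\begin{aligned}
y_t          &= Z_t \alpha_t + \varepsilon_t, & \varepsilon_t &\sim N(0, H_t)
             && \text{(observation equation)}\\
\alpha_{t+1} &= T_t \alpha_t + R_t \eta_t,    & \eta_t        &\sim N(0, Q_t)
             && \text{(state equation)}
\end{aligned}
```

The local level model is the special case with a scalar state $\alpha_t = \mu_t$ and $Z_t = T_t = R_t = 1$, and the Kalman filter recursively estimates $\alpha_t$ from the observations $y_1, \dots, y_t$.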
This document provides an overview of a course on forecasting time series using state space methods and unobserved components models. The course covers introduction to univariate component models, state space methods, forecasting different time series components, and exercises for practical forecasting applications with examples. Key topics include white noise processes, random walk processes, the local level model, and simulated data from a local level model.
Electricity consumption forecasting with adaptive GAM — Cdiscount
The document discusses generalized additive models (GAM) for short-term electricity load forecasting. GAMs are smooth additive models that decompose a response variable into additive components like trends, cyclic patterns, and nonlinear effects. They summarize how GAMs can model various drivers of electricity consumption, including temperature effects, day-of-week patterns, and lagged load values. Big additive models (BAM) allow applying GAMs to large electricity load datasets. BAMs use QR decomposition and online updating to efficiently estimate high-dimensional additive models.
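A minimal sketch of such a model with the mgcv package, whose `gam()`/`bam()` functions implement (big) additive models; the simulated load data and the chosen terms below are illustrative, not the presentation's actual dataset:

```r
library(mgcv)  # recommended package shipped with R; provides gam() and bam()

set.seed(1)
n <- 2000
d <- data.frame(
  temp = runif(n, -5, 30),                        # temperature
  dow  = factor(sample(1:7, n, replace = TRUE)),  # day of week
  lag1 = rnorm(n)                                 # stand-in for lagged load
)
# hypothetical response: nonlinear temperature effect + weekday shift + lagged load
d$load <- 50 - 0.05 * (d$temp - 18)^2 + as.numeric(d$dow) + 0.5 * d$lag1 + rnorm(n)

# bam() fits the same additive structure as gam() but is designed for large
# datasets (QR-based updating), matching the BAM approach in the slides
fit <- bam(load ~ s(temp) + dow + lag1, data = d)
summary(fit)
```

The `s(temp)` term lets the model recover the nonlinear temperature effect without specifying its shape in advance.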
This document proposes a framework for predicting links in dynamic graph sequences. It formulates the problem as a convex optimization that minimizes three terms: (1) how well feature vectors of past graphs predict future feature vectors, (2) how well predicted features match predicted graph features, and (3) a penalty on the predicted graph to encourage simplicity. The framework assumes graph features change gradually over time and the predicted graph is low rank. It aims to leverage trade-offs between these terms to select predictive graph features.
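An illustrative way to write an objective with these three terms (all notation here is assumed, not taken from the paper): with $u_t$ the feature vector of graph $G_t$, $\phi$ a feature forecaster, $u(\cdot)$ the feature map, and $\hat A$ the predicted adjacency matrix,

```latex
\min_{\hat u,\, \hat A}\;
\underbrace{\bigl\| \hat u - \phi(u_1, \dots, u_T) \bigr\|^2}_{\text{feature forecast}}
\;+\; \mu\, \underbrace{\bigl\| u(\hat A) - \hat u \bigr\|^2}_{\text{feature matching}}
\;+\; \lambda\, \underbrace{\| \hat A \|_{*}}_{\text{low-rank penalty}}
```

where the nuclear norm $\|\cdot\|_{*}$ encourages a simple (low-rank) predicted graph, matching the framework's low-rank assumption.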
2. Outline
Problem statement and definitions
Graph theory: the igraph package
1. Graph partitioning:
e.g. communities
2. Network analysis:
e.g. popularity
April 2012
4. Problem statement
Starting from a flow matrix, which relationships are preferential?
Which node acts as an exchange hub?
To answer these questions, we use graph theory
via the igraph library.
5. Definition: modularity
It is:
the sum of the flows internal to a community
minus
the sum of the flows linking the same communes in a complete graph.
The weight of each flow is rescaled so as to preserve the degree
of the nodes.
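The verbal definition on this slide corresponds to the standard (Newman–Girvan) modularity for a weighted graph, which is also what igraph's modularity() computes — a sketch in the usual notation:

```latex
Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right)
    \delta(c_i, c_j)
```

where $A_{ij}$ is the flow between nodes $i$ and $j$, $k_i = \sum_j A_{ij}$ the weighted degree of node $i$, $2m = \sum_{i,j} A_{ij}$ the total weight, and $\delta(c_i, c_j) = 1$ when $i$ and $j$ belong to the same community. The subtracted term is the expected flow under a null model that preserves node degrees, which is the rescaling the slide refers to.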
6. Definition: modularity
Modularity is a measure of the quality of a partition
of a graph's nodes into communities.
The goal of partitioning is to maximize modularity
(with or without constraints).
7. Graph partitioning
In the divisive approach, we start from a single community;
successive splits must improve modularity.
8. Graph partitioning
igraph function: leading.eigenvector.community
Code:
library(igraph)
library(foreign)
base <- "C:/Flux_dt_au.dbf"
base_flux <- read.dbf(base, as.is = TRUE)
# matriceflux is the weighted flow matrix built from base_flux
# (its construction is not shown on the slides)
g <- graph.adjacency(matriceflux, mode = "undirected", diag = FALSE,
                     weighted = TRUE, add.rownames = "name")
lec <- leading.eigenvector.community(g)
communautes <- data.frame(V(g)$name, lec$membership)
modularite <- modularity(g, lec$membership)
9. Graph partitioning
Conversely, the greedy approach starts from atomic communities
and recursively merges them.
10. Graph partitioning
igraph function: fastgreedy.community
Code:
g <- graph.adjacency(matriceflux, mode = "undirected", diag = FALSE, weighted = TRUE)
fgc <- fastgreedy.community(g, merges = TRUE, modularity = TRUE,
                            weights = E(g)$weight)
communautes <- community.to.membership(g, fgc$merges, steps = 20)
modularite <- modularity(g, communautes$membership)
11. Network analysis: popularity
Which indicator identifies the driving territory behind the
exchanges of a zone?
The Facebook answer is: "the most popular node is the one
with popular friends."
Several methods are based on the leading eigenvector of the
flow matrix, including PageRank.
g <- graph.adjacency(matriceflux, mode = "directed", weighted = TRUE,
                     diag = FALSE, add.rownames = "name")
pr <- data.frame(V(g)$name, page.rank(g)$vector)