RDBL - The open source R package for database interaction
1. RDBL – The open source R package for database access
RDBL
The R database layer package
For information, please contact us:
http://functionalfinances.com
info@functionalfinances.com
2. Overview
RDBL is an open source package
An abstraction layer that lets R developers interact with databases
No prior SQL knowledge required
data.frame-based syntax
Lazy loading – data is loaded only when absolutely necessary
Maps tables to db.model objects, analogous to data.frames
Package description
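The lazy-loading idea above can be illustrated with a minimal base R sketch. This is not RDBL code: `lazy_model`, `select_columns` and `to_sql` are hypothetical names invented for the illustration, and no database is contacted. The object only records what should be queried, and the SQL text is produced on demand.

```r
# Hypothetical lazy table reference: stores a query description, not the data.
lazy_model <- function(table, columns = "*", where = NULL) {
  structure(list(table = table, columns = columns, where = where),
            class = "lazy_model")
}

# Build the SQL text that would be sent to the database at load time.
to_sql <- function(model) {
  sql <- paste("SELECT", paste(model$columns, collapse = ", "),
               "FROM", model$table)
  if (!is.null(model$where)) {
    sql <- paste(sql, "WHERE", model$where)
  }
  sql
}

# Narrowing the columns returns a new description; nothing is loaded yet.
select_columns <- function(model, columns) {
  lazy_model(model$table, columns, model$where)
}

artist <- lazy_model("Artist")
names_only <- select_columns(artist, c("ArtistId", "Name"))
sql_text <- to_sql(names_only)
print(sql_text)
```

In a real abstraction layer the query would be executed only at the point where the data is requested, which is the role the slides assign to as.data.frame.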
3. Main functionalities
Supports Oracle, MySQL, SQLite, PostgreSQL, MS Access
Loading data from the database with column and row filtering
as.data.frame performs the actual data load
merge creates SQL joins
logical indexing-like syntax for WHERE clauses
aggregate for GROUP BY queries
derived columns
What we have done already
4. Planned functionalities
INSERT INTO statements
Additional analytical queries (full range of standard SQL functionalities)
UPDATE statements
Optimizing generated SQL queries
What we are working on
5. Premium services
RDBL is an open source package available for free
Missing features you need can be ordered for a fee
We provide on-site training
We offer 24/7 support if required
What we can do for you
6. Detailed functionalities
library(RMySQL)
library(RDBL)  # assumes the package is installed under the name RDBL
# Create a MySQL connection using DBI
# Database layout: Chinook database
db.connection <- dbConnect(MySQL(), dbname = "test")
# Set it as the connection used by RDBL
SetConnection(db.connection)
# Create several database models
artist.db <- db.model("Artist")
album.db <- db.model("Album")
# Load data (the queries are executed here)
artists <- as.data.frame(artist.db)
albums <- as.data.frame(album.db)
Creating and loading a db.model
7. Detailed functionalities
# Create the artist and album models
artist.db <- db.model("Artist")
album.db <- db.model("Album")
names(artist.db)
# [1] "ArtistId" "Name"
names(album.db)
# [1] "AlbumId" "Title" "ArtistId"
# Join tables together
# all.x = TRUE means LEFT JOIN
albums.of.artists.db <-
  merge(artist.db,
        album.db,
        by = "ArtistId",  # same as by.x = "ArtistId", by.y = "ArtistId"
        all.x = TRUE)
# Load data
albums.of.artists <- as.data.frame(albums.of.artists.db)
Merging db.models
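For comparison, the same merge call on ordinary in-memory data frames behaves as follows in base R. This is a sketch with made-up rows standing in for the Artist and Album tables; it does not use RDBL or a database, and only shows the join semantics that the all.x argument selects.

```r
# Small in-memory stand-ins for the Artist and Album tables (made-up rows).
artist <- data.frame(ArtistId = c(1, 2, 3),
                     Name = c("Artist A", "Artist B", "Artist C"),
                     stringsAsFactors = FALSE)
album <- data.frame(AlbumId = c(10, 11, 12),
                    Title = c("Album X", "Album Y", "Album Z"),
                    ArtistId = c(1, 1, 2),
                    stringsAsFactors = FALSE)

# all.x = TRUE keeps every artist, even one without an album. In SQL terms:
# SELECT ... FROM Artist LEFT JOIN Album ON Artist.ArtistId = Album.ArtistId
left <- merge(artist, album, by = "ArtistId", all.x = TRUE)

# Without all.x only matching rows remain, which corresponds to an INNER JOIN.
inner <- merge(artist, album, by = "ArtistId")

print(left)
```

"Artist C" has no album, so it appears in `left` with NA in the album columns and is absent from `inner`.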
8. Detailed functionalities
# Column filtering
aoa.db <-
  albums.of.artists.db[, c("Name", "Title")]
# Row filtering
aoa.db <-
  aoa.db[aoa.db$Name == "ACDC" &
           aoa.db$Title == "Thunderstruck", ]
# Load data
thunder <- as.data.frame(aoa.db)
Filtering
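The filtering syntax above is the one base R uses for in-memory data frames. A minimal sketch with made-up rows (no RDBL, no database) shows which part of a SQL statement each indexing step corresponds to.

```r
# Made-up stand-in for the joined artist/album data.
albums_of_artists <- data.frame(
  ArtistId = c(1, 1, 2),
  Name = c("Artist A", "Artist A", "Artist B"),
  Title = c("Album X", "Album Y", "Album Z"),
  stringsAsFactors = FALSE
)

# Column filtering: corresponds to SELECT Name, Title
aoa <- albums_of_artists[, c("Name", "Title")]

# Row filtering: corresponds to WHERE Name = 'Artist A' AND Title = 'Album Y'
hit <- aoa[aoa$Name == "Artist A" & aoa$Title == "Album Y", ]

print(hit)
```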
9. Detailed functionalities
# Models for the remaining tables (Chinook table names assumed)
track.db <- db.model("Track")
genre.db <- db.model("Genre")
# Rename columns that share the same name
rename(genre.db, "Name", "GenreName")
rename(artist.db, "Name", "ArtistName")
# Join tables together:
# artist x album x track x genre to get the
# mapping between artists and genres
artist.genre.db <-
  merge(merge(merge(artist.db, album.db,
                    by = "ArtistId")[, c("ArtistName",
                                         "AlbumId")],
              track.db,
              by = "AlbumId")[, c("ArtistName",
                                  "GenreId")],
        genre.db,
        by = "GenreId")[, c("ArtistName",
                            "GenreName")]
Merge – complex example
10. Detailed functionalities
# Which group has the most rock songs?
rocks <- artist.genre.db[artist.genre.db$GenreName == "Rock", ]
most.rock.db <-
  aggregate(rocks,
            list(SongCount = count(rocks$GenreName)),
            by = "ArtistName")
most.rock.db <-
  most.rock.db[order.db(most.rock.db$SongCount,
                        decreasing = TRUE)]
rock.band <- as.data.frame(most.rock.db)[1, "ArtistName"]
Aggregation
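The same question can be answered on an in-memory data frame with base R, which makes the GROUP BY and ORDER BY steps easy to inspect. This is a sketch with made-up rows; note that base R's aggregate takes a formula or a grouping list, so its call signature differs from the RDBL call shown above.

```r
# Made-up artist/genre pairs, one row per track.
tracks <- data.frame(
  ArtistName = c("Artist A", "Artist A", "Artist B",
                 "Artist A", "Artist B", "Artist A"),
  GenreName = c("Rock", "Rock", "Rock", "Jazz", "Jazz", "Rock"),
  stringsAsFactors = FALSE
)

# WHERE GenreName = 'Rock'
rocks <- tracks[tracks$GenreName == "Rock", ]

# SELECT ArtistName, COUNT(GenreName) AS SongCount ... GROUP BY ArtistName
song_count <- aggregate(GenreName ~ ArtistName, data = rocks, FUN = length)
names(song_count)[2] <- "SongCount"

# ORDER BY SongCount DESC
song_count <- song_count[order(song_count$SongCount, decreasing = TRUE), ]

# The first row now holds the artist with the most rock tracks.
rock_band <- song_count[1, "ArtistName"]
print(rock_band)
```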
11. Detailed functionalities
# Which group has the most rock songs?
rocks <- artist.genre.db[artist.genre.db$GenreName == "Rock", ]
most.rock.db <-
  aggregate(rocks,
            list(SongCount = count(rocks$GenreName)),
            by = "ArtistName")
# Derived columns are computed from existing ones
most.rock.db$SongCount2 <-
  most.rock.db$SongCount * 2
most.rock.db$SongCount3 <-
  most.rock.db$SongCount + most.rock.db$SongCount2
# Load data
most.rock <- as.data.frame(most.rock.db)
Derived columns
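The derived-column assignments follow the usual data.frame idiom. A base R sketch on a made-up aggregated result (no RDBL involved) shows the values such expressions produce; in SQL the same columns would be written as computed expressions in the SELECT list.

```r
# Made-up aggregated result: rock song count per artist.
most_rock <- data.frame(ArtistName = c("Artist A", "Artist B"),
                        SongCount = c(3, 1),
                        stringsAsFactors = FALSE)

# SELECT SongCount * 2 AS SongCount2
most_rock$SongCount2 <- most_rock$SongCount * 2

# A derived column may itself build on another derived column.
most_rock$SongCount3 <- most_rock$SongCount + most_rock$SongCount2

print(most_rock)
```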