Make SQL Server the preferred choice for managing Unstructured Data

Make SQL Server the preferred choice for managing unstructured data and enable rich application experiences on top of it

Scale up storage and search to 100 million to 500 million documents
Easy access to unstructured data from all applications
Rich insight into unstructured data to make better decisions

[Figure: SQL Server unstructured-data stack. Database applications, Windows apps, and SQL apps use transactional access or streaming Win32 access (SMB share, Files/Folders API) to BLOBs, FILESTREAM, and FileTable data; rich services include full-text search and semantic similarity; scale-up uses multiple FILESTREAM containers across disks.]

[Figure: Remote BLOB Storage (RBS). A customer application calls the SQL RBS API, whose provider libraries (FILESTREAM, Azure, Centera) store BLOBs in a SQL database with FILESTREAM, in Azure, or in Centera. Also labeled: integrated administration; integrated backup/replication/AlwaysOn.]

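RBS write path (figure; the components span a machine boundary between the BLOB store and SQL Server):
1. Write BLOB (photo): the application hands the BLOB to the RBS client library, which stores it through the BLOB store provider library.
2. Return BLOB ID: the RBS client library returns the new BLOB's ID to the application.
3. Write BLOB ID to the PhotoRef field: the application stores the ID in its SQL Server table.
RBS client library services: Create, Fetch, GC (garbage collection), Delete
Example row in SQL Server: ClaimID = 4390, ClaimDate = 6/5/2007, PhotoRef = <Binary(20)>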

// Requires the RBS client library (SqlRemoteBlobContext, SqlRemoteBlob).
// sqlConn: a SqlConnection to the RBS-enabled database.

// Store a new blob.
byte[] myBlobId;
SqlRemoteBlobContext blobContext = new SqlRemoteBlobContext(sqlConn);

using (SqlRemoteBlob newBlob = blobContext.CreateNewBlob())
{
    // Write to the blob as a System.IO.Stream object ...
    newBlob.Write(…);
    // ... or, alternatively, copy an existing stream into it:
    // newBlob.WriteFromStream(inputStream);
    newBlob.Close();
    myBlobId = newBlob.BlobId;
}

// Add a new row to the database table that stores myBlobId
// (for example, in a binary(20) column such as PhotoRef).

// Fetch the blob.
using (SqlRemoteBlob existingBlob = blobContext.OpenBlob(myBlobId))
{
    // Read from the blob as a System.IO.Stream object ...
    existingBlob.Read(...);
    // ... or, alternatively, copy it into another stream:
    // existingBlob.ReadToStream(outputStream);
}
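The RBS pattern (the BLOB lives in an external store, the row keeps only an opaque 20-byte BLOB ID, and GC removes BLOBs that no row references any more) can be modeled in a few lines of C#. This is a minimal in-memory sketch, not the RBS API: InMemoryBlobStore and its methods are hypothetical names chosen to mirror the Create, Fetch, GC, and Delete services, and real provider libraries persist the data outside the process.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical in-memory stand-in for an RBS BLOB store (not the RBS API).
public sealed class InMemoryBlobStore
{
    private readonly Dictionary<string, byte[]> _blobs = new Dictionary<string, byte[]>();

    public int Count => _blobs.Count;

    // Create: keep a copy of the bytes and return an opaque 20-byte ID,
    // the value an application would store in a binary(20) column such as PhotoRef.
    public byte[] Create(byte[] content)
    {
        byte[] id = new byte[20];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(id);
        }
        _blobs[Key(id)] = (byte[])content.Clone();
        return id;
    }

    // Fetch: resolve an ID read back from the database to the bytes
    // (throws KeyNotFoundException for an unknown ID).
    public byte[] Fetch(byte[] blobId) => (byte[])_blobs[Key(blobId)].Clone();

    // Delete: remove one BLOB explicitly; returns false if it did not exist.
    public bool Delete(byte[] blobId) => _blobs.Remove(Key(blobId));

    // GC: drop every BLOB whose ID no row references; returns how many were removed.
    public int CollectGarbage(IEnumerable<byte[]> referencedIds)
    {
        var live = new HashSet<string>(referencedIds.Select(Key));
        var orphans = _blobs.Keys.Where(k => !live.Contains(k)).ToList();
        foreach (var k in orphans)
        {
            _blobs.Remove(k);
        }
        return orphans.Count;
    }

    // Byte arrays compare by reference, so key the dictionary on their Base64 text.
    private static string Key(byte[] id) => Convert.ToBase64String(id);
}
```

A BLOB created but never recorded in a row (for example, because the INSERT failed) becomes an orphan, which is exactly what the GC service exists to clean up.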

Store BLOBs in DB + File System (FILESTREAM)
[Figure: application accessing BLOB data stored through the DB in the file system]

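-- New T-SQL function: Get_filestream_transaction_context()
-- Returns the current transaction context; call it inside a transaction.
SELECT Get_filestream_transaction_context()

-- New T-SQL function PathName(), called on a FILESTREAM column:
-- returns the logical path used to open the BLOB as a stream.
SELECT ClaimImage.PathName()
FROM Insurancedb..Claims

// New SqlFileStream class (System.Data.SqlTypes) in VS 2008 SP1 / .NET Framework 3.5 SP1.
// path:  value returned by PathName()
// txnId: value returned by Get_filestream_transaction_context() in the same open transaction
using (SqlFileStream sfs = new SqlFileStream(path, txnId, System.IO.FileAccess.Read))
// Output file to read into.
using (System.IO.FileStream fs = new System.IO.FileStream(@"c:\output2.jpg", System.IO.FileMode.Create))
{
    byte[] buffer = new byte[512 * 1024];
    int cbBytesRead;
    // Read until Read returns 0: a single Read may return fewer bytes
    // than requested before the end of the stream.
    while ((cbBytesRead = sfs.Read(buffer, 0, buffer.Length)) > 0)
    {
        fs.Write(buffer, 0, cbBytesRead);
    }
}

// Writing works the same way: open the SqlFileStream with FileAccess.Write and call sfs.Write.

// Commit the SQL transaction and close the SQL connection.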
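Stream.Read is allowed to return fewer bytes than requested before the end of the stream; only a return value of 0 means the end. A copy loop that stops as soon as one Read returns less than a full buffer can therefore truncate the data. The sketch below makes this visible with a hypothetical TrickleStream that caps every Read, comparing both loop conditions on in-memory data (MemoryStream stands in for SqlFileStream, which is also a System.IO.Stream; StreamCopy and its methods are illustrative names).

```csharp
using System;
using System.IO;

// Hypothetical wrapper that returns at most maxChunk bytes per Read call,
// which the Stream contract permits.
public sealed class TrickleStream : Stream
{
    private readonly Stream _inner;
    private readonly int _maxChunk;

    public TrickleStream(Stream inner, int maxChunk)
    {
        if (maxChunk <= 0) throw new ArgumentOutOfRangeException(nameof(maxChunk));
        _inner = inner;
        _maxChunk = maxChunk;
    }

    public override int Read(byte[] buffer, int offset, int count)
        => _inner.Read(buffer, offset, Math.Min(count, _maxChunk));

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

public static class StreamCopy
{
    // Stops as soon as one Read returns fewer bytes than the buffer size.
    public static long CopyUntilShortRead(Stream src, Stream dst, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n = buffer.Length;
        while (n == buffer.Length)
        {
            n = src.Read(buffer, 0, buffer.Length);
            dst.Write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Stops only when Read returns 0, the actual end-of-stream signal.
    public static long CopyUntilEnd(Stream src, Stream dst, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = src.Read(buffer, 0, buffer.Length)) > 0)
        {
            dst.Write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

With 10,000 bytes, a 4,096-byte buffer, and reads capped at 1,000 bytes, the first loop stops after 1,000 bytes while the second copies everything.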

FileTable Folder Hierarchy
• Machine: my_machine
• FILESTREAM share: MSSQLSERVER
• Database directories: Private Docs (Database1), Office Docs (Database2)
• FileTable directories: Media, Documents, LogFiles (each one a FileTable)
• Below each FileTable directory: a user-defined directory structure
• Example path: \\my_machine\MSSQLSERVER\Office Docs\Documents

ALTER DATABASE Contoso SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL,
    DIRECTORY_NAME = N'Contoso');

CREATE TABLE Contoso..Documents AS FILETABLE
WITH (FILETABLE_DIRECTORY = N'Document Library');

-- The FileTable appears in the file system as:
-- \\<machine name>\<FILESTREAM share>\Contoso\Document Library

FileTable Schema
File attribute name | Type | Purpose
path_locator | hierarchyid | Position of this node in the hierarchical file namespace
parent_path_locator | hierarchyid | hierarchyid of the parent directory (computed column)
stream_id | uniqueidentifier | Unique ID of the FILESTREAM data
file_stream | varbinary(max) FILESTREAM | FILESTREAM data
file_type | nvarchar(255) | Type of the file; can be used when creating a full-text index
cached_file_size | bigint | Size of the FILESTREAM data (cached value)
name | nvarchar(255) | File or folder name (e.g., foo.txt)
creation_time | datetime2 | Creation time
last_write_time | datetime2 | Last write time
last_access_time | datetime2 | Last access time
is_directory | bit | TRUE for directories
is_offline | bit | Offline attribute
is_hidden | bit | Hidden attribute
is_readonly | bit | Read-only attribute
is_archive | bit | Archive attribute
is_system | bit | System attribute
is_temporary | bit | Temporary attribute
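The is_* bit columns correspond one-to-one to Win32 file attribute flags, which is what makes file attributes available relationally. A sketch of that mapping in C#, assuming the bits of one row have already been read into a hypothetical FileTableRowAttributes class; the pairing follows the attribute names, and FileAttributes.Normal is returned only when no other flag is set, as Win32 requires.

```csharp
using System;
using System.IO;

// Hypothetical holder for the attribute bit columns of one FileTable row.
public sealed class FileTableRowAttributes
{
    public bool IsDirectory { get; set; }
    public bool IsOffline { get; set; }
    public bool IsHidden { get; set; }
    public bool IsReadOnly { get; set; }
    public bool IsArchive { get; set; }
    public bool IsSystem { get; set; }
    public bool IsTemporary { get; set; }

    // Fold the bit columns into the equivalent Win32 attribute flags.
    public FileAttributes ToFileAttributes()
    {
        FileAttributes a = 0;
        if (IsDirectory) a |= FileAttributes.Directory;
        if (IsOffline) a |= FileAttributes.Offline;
        if (IsHidden) a |= FileAttributes.Hidden;
        if (IsReadOnly) a |= FileAttributes.ReadOnly;
        if (IsArchive) a |= FileAttributes.Archive;
        if (IsSystem) a |= FileAttributes.System;
        if (IsTemporary) a |= FileAttributes.Temporary;
        return a == 0 ? FileAttributes.Normal : a;
    }

    // Split Win32 attribute flags back into the bit columns.
    public static FileTableRowAttributes FromFileAttributes(FileAttributes a)
    {
        return new FileTableRowAttributes
        {
            IsDirectory = a.HasFlag(FileAttributes.Directory),
            IsOffline = a.HasFlag(FileAttributes.Offline),
            IsHidden = a.HasFlag(FileAttributes.Hidden),
            IsReadOnly = a.HasFlag(FileAttributes.ReadOnly),
            IsArchive = a.HasFlag(FileAttributes.Archive),
            IsSystem = a.HasFlag(FileAttributes.System),
            IsTemporary = a.HasFlag(FileAttributes.Temporary),
        };
    }
}
```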

ALTER TABLE Documents DISABLE FILETABLE_NAMESPACE;
-- Disables the FileTable namespace for this table; ENABLE FILETABLE_NAMESPACE turns it back on.

\\machine\<FILESTREAM share>\<Database_directory>\<FileTable_Directory>\...
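The UNC path of a FileTable item is built from four fixed components followed by the user-defined directory structure. A sketch of that composition with a hypothetical helper (FileTablePaths.Compose); in practice FileTableRootPath() and GetFileNamespacePath() return these paths, so building them by hand is only for illustration. Windows separators are written explicitly so the result does not depend on the OS running the code.

```csharp
using System;
using System.Linq;

public static class FileTablePaths
{
    // \\machine\share\database_directory\filetable_directory\relative\parts
    public static string Compose(string machine, string share, string databaseDirectory,
                                 string fileTableDirectory, params string[] relativeParts)
    {
        var parts = new[] { machine, share, databaseDirectory, fileTableDirectory }
            .Concat(relativeParts)
            .Select(p => p.Trim('\\'));   // tolerate stray leading or trailing backslashes
        return @"\\" + string.Join(@"\", parts);
    }
}
```

For example, Compose("my_machine", "MSSQLSERVER", "Contoso", "Document Library", "MySpec.doc") yields \\my_machine\MSSQLSERVER\Contoso\Document Library\MySpec.doc.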

FileTable functions: FileTableRootPath(), GetFileNamespacePath(), GetPathLocator()

DECLARE @path nvarchar(max);

-- Get the FileNamespace path; passing 1 returns the full UNC path.
SELECT @path = file_stream.GetFileNamespacePath(1)
FROM DocumentStore WHERE name = N'MySpec.doc';

// In the Win32 client, open a file handle on that path
// (path holds the value returned in @path).
HANDLE handle = CreateFile(path, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL, NULL);
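Because FileTable items are ordinary files on the share, any Win32 or .NET file API can open them. A sketch of the .NET counterpart of the CreateFile call above, using a hypothetical helper (FileTableWin32Write.WriteAll); fullPath would be the full UNC path of a FileTable item, and the test below exercises it against a local temp file instead.

```csharp
using System.IO;

public static class FileTableWin32Write
{
    // Same semantics as CreateFile(path, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL):
    // FileAccess.Write = GENERIC_WRITE, FileShare.None = share mode 0,
    // FileMode.Create = CREATE_ALWAYS (create the file, or truncate it if it exists).
    public static void WriteAll(string fullPath, byte[] content)
    {
        using (var fs = new FileStream(fullPath, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            fs.Write(content, 0, content.Length);
        }
    }
}
```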

sys.dm_filestream_non_transacted_handles: view the open non-transacted file handles

sp_kill_filestream_non_transacted_handles: close (kill) open non-transacted file handles

Multiple FILESTREAM containers
• CREATE/ALTER DATABASE with MAX_SIZE per container
• DBCC SHRINKFILE with EMPTYFILE
• Use of multiple spindles for better I/O scalability

[Table: comparison of File Stores / External Blob Stores (CAS), SQL BLOBs, Remote Blob API, FILESTREAM, and FileTable on streaming performance, Win32 app compatibility, link-level consistency, data-level consistency, integrated query & management, and use of non-local Windows file servers and external BLOB stores. For streaming performance and Win32 app compatibility, two of the options are rated "depends on external store".]

Feature | File server + DB solution | SQL Server 2008 FILESTREAM | SQL Server 2012 FileTable
Integrated admin operations for relational and file data (backup/restore, HA/mirroring) | No | Yes | Yes
Integrated services for relational and file data (text/semantic search, reports, query, etc.) | No | Yes | Yes
Integrated security model | No | Yes | Yes
In-place update of FILESTREAM data (non-transacted) | Yes | No | Yes
Fully transacted update of FILESTREAM data | No | Yes | Yes
File/directory hierarchy in DB | No | No | Yes
Win32 app compatibility | Yes | No | Yes
Relational access to file attributes | No | No | Yes

Queries over a 350M-document database, with random DML running in the background: SQL Server 2012 beats SQL Server 2005 with a scale factor of more than 2x and, on average, 60x better throughput.

[Chart: query average execution time (ms), SQL Server 2005/2008 vs. 2012, for 50 to 2,000 concurrent connections in a customer playback benchmark]

New search filter for document properties:
CONTAINS ( PROPERTY ( column_name, 'property_name' ), 'contains_search_condition' )

[Figure: full-text and semantic processing. A source table of documents (D1 Annual Budget, D2 Corporate Earnings, D3 Marketing Reports, ...) feeds a full-text keyword index (keyword, column ID, document IDs, occurrences) and semantic processing. Semantic processing produces a Keyphrases table (T1 revenue, T2 growth, T3 Windows, T4 Azure, ...), a KeyphraseDocuments table (revenue: Annual Budget, Finance Report; growth: Corporate Earnings; Windows: Marketing Reports, Azure Strategy; Azure: Azure Strategy), and a DocumentSimilarity table of matched pairs (Annual Budget with Corporate Earnings, Annual Budget with Finance Report, Marketing Reports with Azure Strategy).]

-- Enable statistical semantics when creating the full-text index:
CREATE FULLTEXT INDEX ON Production.Document
(
    Title LANGUAGE 1033,
    Document TYPE COLUMN FileExtension LANGUAGE 1033 STATISTICAL_SEMANTICS
)
KEY INDEX PK_Document_DocumentID
ON documents_catalog
WITH CHANGE_TRACKING OFF, NO POPULATION;

-- Or add statistical semantics to a column of an existing full-text index:
ALTER FULLTEXT INDEX ON Production.Document
ALTER COLUMN Document ADD STATISTICAL_SEMANTICS
WITH NO POPULATION;
…
ALTER FULLTEXT INDEX ON Production.Document
START FULL POPULATION;
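The idea behind the DocumentSimilarity table (documents that share key phrases are related) can be illustrated with a toy sketch. This is a swapped-in technique, not SQL Server's: SQL Server scores key phrases and similarity with its own statistical model (it needs the Semantic Language Statistics database) and exposes results through functions such as SEMANTICSIMILARITYTABLE. Here, similarity is simply the Jaccard overlap |A ∩ B| / |A ∪ B| of two key-phrase sets; ToySemanticSimilarity and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToySemanticSimilarity
{
    // Jaccard overlap of two key-phrase sets (0 when both are empty).
    public static double Jaccard(ISet<string> a, ISet<string> b)
    {
        if (a.Count == 0 && b.Count == 0) return 0.0;
        int inter = a.Intersect(b).Count();
        int union = a.Count + b.Count - inter;
        return (double)inter / union;
    }

    // Rank every other document by overlap with the source document,
    // keeping only documents that share at least one key phrase.
    public static List<(string DocId, double Score)> MostSimilar(
        string sourceDocId, IDictionary<string, HashSet<string>> keyphrases)
    {
        var source = keyphrases[sourceDocId];
        return keyphrases
            .Where(kv => kv.Key != sourceDocId)
            .Select(kv => (DocId: kv.Key, Score: Jaccard(source, kv.Value)))
            .Where(t => t.Score > 0)
            .OrderByDescending(t => t.Score)
            .ThenBy(t => t.DocId, StringComparer.Ordinal)
            .ToList();
    }
}
```

With toy key phrases D1 = {revenue, budget}, D7 = {revenue, forecast}, D3 = {windows}, D11 = {windows, azure}, D1 matches only D7 (score 1/3) and D11 matches only D3 (score 1/2).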


More from Michael Rys
