SQL Server 2012 introduces new full-text search capabilities that allow rich semantic search over documents stored in SQL Server. The key features include:
1) Integrated full-text indexing and search over both structured and unstructured data stored in SQL Server tables.
2) Semantic search capabilities that understand relationships between concepts and terms in documents.
3) Support for filtering search results based on document properties and metadata.
Make SQL Server the preferred choice for managing Unstructured Data
2. Make SQL Server the preferred choice for managing unstructured data, and allow building rich application experiences on top
3. • Scale up storage and search to 100 million to 500 million documents
• Easy use of and access to unstructured data from all applications
• Rich insight into unstructured data to make better decisions
4. [Architecture diagram]
Clients: database applications, Windows apps, SQL apps
Access paths: transactional access, streaming Win32 access, Files/Folders API over an SMB share
Storage: BLOBs, FILESTREAM (multiple containers across disks for scale-up), FileTable
Rich services: full-text search, semantic similarity
Remote BLOB Storage: customer application -> SQL RBS API -> provider libraries (Centera, Azure, SQL FILESTREAM)
Integrated administration: backup, replication, AlwaysOn
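5. [Diagram: Remote BLOB Storage write path across a machine boundary, from the application to the BLOB store and SQL Server]
1 The application writes the BLOB (a photo) through the RBS client library and the BLOB store provider library.
2 The BLOB ID is returned to the application.
3 The application writes the BLOB ID to the PhotoRef field.
RBS client library services:
• Create
• Fetch
• GC
• Delete
Example row in SQL Server: ClaimID 4390, ClaimDate 6/5/2007, PhotoRef <Binary(20)>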
6. // Store a new blob.
byte[] myBlobId;
SqlRemoteBlobContext blobContext = new SqlRemoteBlobContext(sqlConn);
using (SqlRemoteBlob newBlob = blobContext.CreateNewBlob()) {
    // Write to a System.IO.Stream object.
    newBlob.Write(…);
    // Alternative way to write:
    // newBlob.WriteFromStream(inputStream);
    newBlob.Close();
    // The blob ID is available once the blob has been closed.
    myBlobId = newBlob.BlobId;
}
7. // Add a new row including the blob ID to the database
// table.
// Fetch the blob.
using (SqlRemoteBlob existingBlob = blobContext.OpenBlob(myBlobId)) {
    // Read from System.IO.Stream object.
    existingBlob.Read(...);
    // Alternative way to read:
    // existingBlob.ReadToStream(outputStream);
}
14. ALTER DATABASE Contoso SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL,
    DIRECTORY_NAME = N'Contoso');
CREATE TABLE Contoso..Documents AS FILETABLE
    WITH (FILETABLE_DIRECTORY = N'Document Library');
\\<machine name>\<FILESTREAM share>\Contoso\Document Library
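The share path above can also be read back from T-SQL. The following is a minimal sketch, assuming the Contoso database and Documents FileTable created on this slide, and assuming the table landed in the dbo schema; the file name report.docx is hypothetical.

```sql
USE Contoso;

-- Root UNC path of the FileTable directory, of the form
-- \\<machine name>\<FILESTREAM share>\Contoso\Document Library
SELECT FileTableRootPath(N'dbo.Documents') AS RootPath;

-- Full UNC path of one stored file.
-- Passing 1 asks for the full path rather than the database-relative one.
-- 'report.docx' is a hypothetical file name used only for illustration.
SELECT name,
       file_stream.GetFileNamespacePath(1) AS FullPath
FROM dbo.Documents
WHERE name = N'report.docx';
```

A Win32 application can open the returned path like any other file share path, provided non-transacted access is enabled as shown in the ALTER DATABASE statement.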
15. FileTable Schema
File attribute name    Type                        Purpose
path_locator           hierarchyid                 Position of this node in the hierarchical file namespace
parent_path_locator    hierarchyid                 hierarchyid of the parent directory (a computed column)
stream_id              uniqueidentifier            Unique ID for the FILESTREAM data
file_stream            varbinary(max) filestream   FILESTREAM data
file_type              nvarchar(255)               Type of the file; can be used for full-text index creation
cached_file_size       bigint                      Size of the FILESTREAM data (cached value)
name                   nvarchar(255)               File or folder name (e.g. foo.txt)
creation_time          datetime2                   Creation time
last_write_time        datetime2                   Last write time
last_access_time       datetime2                   Last access time
is_directory           bit                         TRUE for directories
is_offline             bit                         Offline attribute
is_hidden              bit                         Hidden attribute
is_readonly            bit                         Read-only attribute
is_archive             bit                         Archive attribute
is_system              bit                         System attribute
is_temporary           bit                         Temporary attribute
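Because these attributes are ordinary columns, files and folders can be created and queried with plain T-SQL. The sketch below assumes a FileTable named dbo.Documents (the schema and the file and folder names are assumptions made for illustration, not part of the slides).

```sql
-- Create a folder: a directory row has is_directory = 1 and no stream data.
-- is_archive is set explicitly to 0 for the directory row.
INSERT INTO dbo.Documents (name, is_directory, is_archive)
VALUES (N'Reports', 1, 0);

-- Create a small file at the root of the FileTable.
INSERT INTO dbo.Documents (name, file_stream)
VALUES (N'hello.txt', CAST('Hello, FileTable' AS varbinary(max)));

-- Relational query over file attributes: largest visible files first.
SELECT name, file_type, cached_file_size, last_write_time
FROM dbo.Documents
WHERE is_directory = 0
  AND is_hidden = 0
ORDER BY cached_file_size DESC;
```

The same rows appear as files and folders in the Windows share, so either access path can be used to inspect the result.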
28. [Comparison matrix]
Columns: File Stores / External Blob Stores (CAS), SQL BLOBs, Remote Blob API, FILESTREAM, FILETABLE
Rows: Streaming performance, Win32 app compat, Link-level consistency, Data-level consistency, Integrated query & management, Non-local Windows file servers, External blob stores
Text cells: "Depends on external store" appears twice in each of the Streaming performance and Win32 app compat rows; "n/a" appears once in each of the Non-local Windows file servers and External blob stores rows.
29. Features                                                   FileServer+DB   SQL 2008      SQL 2012
                                                               Solution        FILESTREAM    FileTable
Integrated admin operations for relational and file data      No              Yes           Yes
  (backup/restore, HA/mirroring)
Integrated services for relational and file data              No              Yes           Yes
  (text/semantic search, reports, query, etc.)
Integrated security model                                      No              Yes           Yes
In-place update of FILESTREAM data (non-transacted)            Yes             No            Yes
Fully transacted update of FILESTREAM data                     No              Yes           Yes
File/directory hierarchy in DB                                 No              No            Yes
Win32 app compatibility                                        Yes             No            Yes
Relational access to file attributes                           No              No            Yes
33. Queries over a 350-million-document database, with random DML running in the background.
SQL Server 2012 beats SQL Server 2005 by a scale factor of more than 2x, with 60x better throughput on average.
34. 2005/8 vs 2012
[Chart: average query execution time (ms) at various numbers of connections (50 to 2000 users) for the customer playback benchmark; series: 2005/8 and 2012]
36. New Search Filter for Document Properties
CONTAINS ( PROPERTY ( { column_name }, 'property_name' ), 'contains_search_condition' )
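A property can only be searched once it is registered in a search property list attached to the full-text index. The following is a sketch under stated assumptions: the list name DocumentPropertyList is hypothetical, Production.Document is assumed to have a full-text index on its Document column, and the GUID and integer ID are the documented identifiers of the Windows System.Title property.

```sql
-- Register the Title property in a new search property list.
-- DocumentPropertyList is a hypothetical name chosen for this example.
CREATE SEARCH PROPERTY LIST DocumentPropertyList;

ALTER SEARCH PROPERTY LIST DocumentPropertyList
    ADD 'Title'
    WITH (PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9',
          PROPERTY_INT_ID = 2,
          PROPERTY_DESCRIPTION = 'System.Title - Title of the item.');

-- Attach the list to the existing full-text index (this repopulates the index).
ALTER FULLTEXT INDEX ON Production.Document
    SET SEARCH PROPERTY LIST DocumentPropertyList;

-- Search only inside the Title property of the indexed documents.
SELECT Title
FROM Production.Document
WHERE CONTAINS(PROPERTY(Document, 'Title'), 'Maintenance OR Repair');
```

Property extraction also depends on an installed filter (IFilter) for the document type that can emit the property.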
41. -- Option 1: enable semantic indexing when the full-text index is created.
CREATE FULLTEXT INDEX ON Production.Document (
    Title LANGUAGE 1033,
    Document
        TYPE COLUMN FileExtension
        LANGUAGE 1033
        STATISTICAL_SEMANTICS
)
KEY INDEX PK_Document_DocumentID
ON documents_catalog
WITH CHANGE_TRACKING OFF, NO POPULATION;

-- Option 2: add semantic indexing to an existing full-text index.
ALTER FULLTEXT INDEX ON Production.Document
    ALTER COLUMN Document
    ADD STATISTICAL_SEMANTICS
    WITH NO POPULATION;
…
…
ALTER FULLTEXT INDEX ON Production.Document
    START FULL POPULATION;
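Once the index has been populated with STATISTICAL_SEMANTICS enabled, the semantic rowset functions can be queried. This is a minimal sketch, assuming the key column behind PK_Document_DocumentID is an int column named DocumentID (an assumption inferred from the index name) and that a document with key 1 exists.

```sql
-- Top key phrases across the semantically indexed Document column.
SELECT TOP (10) d.Title, kp.keyphrase, kp.score
FROM SEMANTICKEYPHRASETABLE(Production.Document, Document) AS kp
JOIN Production.Document AS d
    ON d.DocumentID = kp.document_key   -- DocumentID is an assumed key column
ORDER BY kp.score DESC;

-- Documents most similar to one source document.
DECLARE @SourceKey int = 1;             -- hypothetical document key
SELECT TOP (10) d.Title, s.score
FROM SEMANTICSIMILARITYTABLE(Production.Document, Document, @SourceKey) AS s
JOIN Production.Document AS d
    ON d.DocumentID = s.matched_document_key
ORDER BY s.score DESC;
```

Both functions return scores between 0 and 1, so ordering by score descending puts the strongest matches first.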
Editor's notes
SQL Server 2008 provides FILESTREAM as a way to add large BLOBs and unstructured data streams to SQL Server while still being able to open a Win32 handle (using the SQL API) and get high streaming performance for the data.
Win32 namespace support in SQL Server 2012 has the following goals:
• Reduce the barrier to entry for customers who have data on file servers and Win32 applications that currently work on it. With the Win32 namespace enabled, SQL Server generates a Windows share that can be exposed to existing Win32 applications like any file server share. Win32 applications and mid-tier servers (such as IIS) can then work with this data without having to understand database or transaction semantics.
• Provide a single integrated set of admin tools: SQL backup/restore, replication, HA solutions, etc.
• Scale up: add multiple disks on a machine for storing FILESTREAM data.
• Use SQL services such as full-text search for both FILESTREAM data and relational metadata, plus a property promotion infrastructure for extracting interesting properties from SQL BLOBs/FILESTREAM data and surfacing them as relational columns for query.
RBS API is exposed in RBS client library.
The blob ID is generated after close. Now the app can store the blob ID in the RBS column.
To get the transaction context, you need a transaction. This is a SQL transaction.
We are reading from SqlFilestream and writing the bytes read into the output buffer.
URI: HealthCare.MRI.JoeSmith
Application::GetResourceStream method: returns a resource stream for a resource data file that is located at the specified URI.
Writing into a SqlFileStream: we use a buffer that we read into and write from.
File option: 0 => default: buffered reads, no write-through. Without write-through, it might be a bit faster in some cases.
The native API shipped first; we wanted the client FILESTREAM code to be aggressive about flushing cached writes. The managed SqlFileStream class shipped some time after the native API.
If the file access is ReadWrite, the SqlFileStream handle is positioned at the beginning of the file. Use the System.IO seek methods to move the handle.
Reading bigger buffers gives better performance.
FILESTREAM volumes:
• Dedicated volumes means volumes not used for tempdb (or for the OS, paging, or SQL data and log files).
• If the stored files are large, as we generally recommend, format with 64 KB clusters.
• Do compress FILESTREAM volumes or FILESTREAM containers, but ONLY if the data to be stored is compressible. Note that in this case the NTFS cluster size must be 4 KB.
• One volume per container enables space management at the volume level.
• Antivirus software should be configured to quarantine infected files rather than delete them. Otherwise corruption will be reported.
SMB: With 60 KB, a read can happen in one single I/O, ideally coming back in one single TCP/IP packet. It is not 64 KB because 64 KB of data can't fit in one single TCP/IP buffer.
Partitioning: FILESTREAM columns require the presence of the ROWGUID unique index for aligned partitioning or, where this is not possible, explicitly specifying the data placement option for the unique or primary key constraint on the ROWGUID column.
Optimized hot paths, removed unnecessary serialization, expensive FileSystem operations etc
Not the first extraction; another instance. Each has specialty syntax. The user has to just know, and remember. Better to have one construct for all extraction-related BR services.
Expose this data to users. Customize: don't want fancy relationships, just sharing concepts!
In all examples: choose value, choose storage. Imagine IntelliSense: start typing, here's the value!