Topic: Document Storage Management for PCMS
To: Development Team
Dated: 3rd March 2010
1. Objective:
To analyze options for storing large files in a Microsoft SQL Server database.
2. Problem Definition:
PCMS has many documents that need to be uploaded against Job Cards in all the modules. As the volume of
documents increases over time, it causes major development and operational overhead. The document store
grows to many gigabytes within a few months.
A study was conducted a few months ago to evaluate third-party file systems for keeping documents outside
the actual database. Many solutions were analyzed, but none satisfied all the selection criteria:
security, access speed, storage efficiency and operational management.
Microsoft provides a native solution to this problem. SQL Server 2008 combines the benefits of file
system storage and database storage in a feature named FILESTREAM.
With FILESTREAM, large files are stored on the file system but are managed through the database.
3. FILESTREAM Definition:
FILESTREAM integrates the SQL Server Database Engine with an NTFS file system by
storing varbinary(max) binary large object (BLOB) data as files on the file system.
To specify that a column should store data on the file system, specify the FILESTREAM
attribute on a varbinary(max) column. This causes the Database Engine to store all data for
that column on the file system, but not in the database file.
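4. FILESTREAM Operations Summary:
4.1. How to: Enable FILESTREAM
Go to Start >> All Programs >> Microsoft SQL Server 2008 >> Configuration Tools >> SQL
Server Configuration Manager >> SQL Server Services. Right-click the SQL Server instance, click
Properties, and on the FILESTREAM tab select Enable FILESTREAM for Transact-SQL access.
To allow the Win32 streaming access described in section 5, also select Enable FILESTREAM for
file I/O streaming access.
Then run the following in a query window: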
EXEC sp_configure filestream_access_level, 2
RECONFIGURE
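To confirm that the setting took effect, the server properties can be queried. The following is a minimal check rather than a required step. The effective level can be lower than the configured level when the Windows-level options in SQL Server Configuration Manager have not been enabled.

```sql
-- Levels: 0 = FILESTREAM disabled,
--         1 = Transact-SQL access only,
--         2 = Transact-SQL and Win32 streaming access.
SELECT SERVERPROPERTY('FilestreamConfiguredLevel') AS ConfiguredLevel,
       SERVERPROPERTY('FilestreamEffectiveLevel')  AS EffectiveLevel,
       SERVERPROPERTY('FilestreamShareName')       AS ShareName;
GO
```

If EffectiveLevel is lower than the value passed to sp_configure, recheck the options on the FILESTREAM tab of the instance properties.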
4.2. How to: Create a FILESTREAM-Enabled Database
CREATE DATABASE Archive
ON
PRIMARY ( NAME = Arch1,
FILENAME = 'c:\data\archdat1.mdf'),
FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM( NAME = Arch3,
FILENAME = 'c:\data\filestream1')
LOG ON ( NAME = Archlog1,
FILENAME = 'c:\data\archlog1.ldf')
GO
The database contains two filegroups: PRIMARY and FileStreamGroup1. PRIMARY is a regular filegroup that
cannot contain FILESTREAM data; Arch1 is the logical name of its data file, not a filegroup.
FileStreamGroup1 is the FILESTREAM filegroup, and Arch3 is the logical name of its container folder.
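The filegroups of the new database can be listed from the catalog. This is a small verification sketch that assumes the Archive database from the statement above has been created.

```sql
USE Archive;
GO
-- type 'FG' is a regular rows filegroup;
-- type 'FD' is a FILESTREAM data filegroup.
SELECT name, type, type_desc
FROM sys.filegroups;
GO
```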
4.3. How to: Move a FILESTREAM-Enabled Database
Display the location of the physical database files that the FILESTREAM database uses.
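One way to list the files is to query the sys.database_files catalog view from inside the database. The sketch below assumes the Archive database created in section 4.2.

```sql
USE Archive;
GO
-- type_desc is ROWS, LOG or FILESTREAM;
-- physical_name is the file or folder to move.
SELECT type_desc, name, physical_name
FROM sys.database_files;
GO
```

Note the physical_name values before detaching the database, because the view is not available once the database is offline.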
Take the Archive database offline by detaching it.
USE master
EXEC sp_detach_db Archive
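Create the folder C:\moved_location, and then move the files and folders that were listed in the
first step into it.
Bring the Archive database back online by attaching the files from their new location. The database
is already detached at this point, so sp_detach_db must not be run again.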
CREATE DATABASE Archive ON
PRIMARY ( NAME = Arch1,
FILENAME = 'c:\moved_location\archdat1.mdf'),
FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM( NAME = Arch3,
FILENAME = 'c:\moved_location\filestream1')
LOG ON ( NAME = Archlog1,
FILENAME = 'c:\moved_location\archlog1.ldf')
FOR ATTACH
GO
4.4. How to: Create a Table for Storing FILESTREAM Data
To specify that a column contains FILESTREAM data, you create a varbinary(max) column and add the
FILESTREAM attribute.
CREATE TABLE Archive.dbo.Records
(
[Id] [uniqueidentifier] ROWGUIDCOL NOT NULL UNIQUE,
[SerialNumber] INTEGER UNIQUE,
[Chart] VARBINARY(MAX) FILESTREAM NULL
)
GO
4.5. Managing FILESTREAM Data by Using Transact-SQL
Inserting NULL
INSERT INTO Archive.dbo.Records
VALUES (newid (), 1, NULL);
Creating a Data File
INSERT INTO Archive.dbo.Records
VALUES (newid (), 3,
CAST ('Seismic Data' as varbinary(max)));
GO
Updating FILESTREAM Data
UPDATE Archive.dbo.Records
SET [Chart] = CAST('Xray 1' as varbinary(max))
WHERE [SerialNumber] = 3;
Deleting FILESTREAM Data
DELETE Archive.dbo.Records
WHERE SerialNumber = 1;
GO
Selecting File Path
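-- PathName() returns nvarchar(max). Row 3 is used because it holds data;
-- row 1 was inserted with NULL and deleted above.
DECLARE @filePath nvarchar(max)
SELECT @filePath = Chart.PathName()
FROM Archive.dbo.Records
WHERE SerialNumber = 3
PRINT @filePath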
GO
5. Dual Programming Model to Access BLOB Data:
5.1. Transact-SQL Access:
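By using Transact-SQL, you can insert, update, and delete FILESTREAM data.
The sample below uses Transact-SQL to read the file path and the transaction context, and then reads
and writes the BLOB through the SqlFileStream class. It requires the System, System.Data.SqlClient,
System.Data.SqlTypes and System.Text namespaces.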
SqlConnection sqlConnection = new SqlConnection(
"Data Source=COMSOFT-23;Initial Catalog=Archive;Integrated Security=True");
SqlCommand sqlCommand = new SqlCommand();
sqlCommand.Connection = sqlConnection;
try
{
sqlConnection.Open();
//The first task is to retrieve the file path
//of the SQL FILESTREAM BLOB that we want to
//access in the application.
sqlCommand.CommandText =
"SELECT Chart.PathName()"
+ " FROM Archive.dbo.Records"
+ " WHERE SerialNumber = 3";
String filePath = null;
Object pathObj = sqlCommand.ExecuteScalar();
if (DBNull.Value != pathObj)
filePath = (string)pathObj;
else
{
throw new System.Exception(
"Chart.PathName() failed"
+ " to read the path name "
+ " for the Chart column.");
}
//The next task is to obtain a transaction
//context. All FILESTREAM BLOB operations
//occur within a transaction context to
//maintain data consistency.
//All SQL FILESTREAM BLOB access must occur in
//a transaction. MARS-enabled connections
//have specific rules for batch scoped transactions,
//which the Transact-SQL BEGIN TRANSACTION statement
//violates. To avoid this issue, client applications
//should use appropriate API facilities for transaction
//management, such as the SqlTransaction class.
SqlTransaction transaction = sqlConnection.BeginTransaction("mainTransaction");
sqlCommand.Transaction = transaction;
sqlCommand.CommandText =
"SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()";
Object obj = sqlCommand.ExecuteScalar();
byte[] txContext = (byte[])obj;
//The next step is to obtain a handle that
//can be passed to the Win32 FILE APIs.
SqlFileStream sqlFileStream = new SqlFileStream(filePath,
txContext,System.IO.FileAccess.ReadWrite);
byte[] buffer = new byte[512];
int numBytes = 0;
//Write the string, "EKG data." to the FILESTREAM BLOB.
//In your application this string would be replaced with
//the binary data that you want to write.
string someData = "EKG data.";
//GetEncoding(0) returns the system default ANSI code page, not Unicode.
Encoding encoding = Encoding.GetEncoding(0);
byte[] payload = encoding.GetBytes(someData);
//Write the encoded byte count, which can differ from the character count.
sqlFileStream.Write(payload, 0, payload.Length);
//Read the data from the FILESTREAM
//BLOB.
sqlFileStream.Seek(0L, System.IO.SeekOrigin.Begin);
numBytes = sqlFileStream.Read(buffer, 0, buffer.Length);
if (numBytes != 0)
{
//Decode only the bytes that were actually read.
string readData = encoding.GetString(buffer, 0, numBytes);
// Console.WriteLine(readData);
System.Windows.MessageBox.Show(readData);
}
//Because reading and writing are finished, FILESTREAM
//must be closed. This closes the C# FileStream class,
//but does not necessarily close the underlying
//FILESTREAM handle.
sqlFileStream.Close();
//The final step is to commit or roll back the read and write
//operations that were performed on the FILESTREAM BLOB.
sqlCommand.Transaction.Commit();
}
catch (System.Exception ex)
{
Console.WriteLine(ex.ToString());
}
finally
{
sqlConnection.Close();
}
return;
5.2. File System Streaming Access:
The Win32 streaming support works in the context of a SQL Server transaction.
Steps:
Read the FILESTREAM file path.
Read the current transaction context.
Obtain a Win32 handle and use the handle to read and write data to the FILESTREAM BLOB.
Each cell in a FILESTREAM table has a file path that is associated with it. To read the path, use the
PathName property of a varbinary(max) column in a Transact-SQL statement.
//The connection string is hard-coded here for brevity.
using (SqlConnection connection =
new SqlConnection("Data Source=COMSOFT-23;Initial Catalog=Archive;Integrated Security=True"))
{
connection.Open();
SqlCommand command = connection.CreateCommand();
try
{
// Setup the command to execute the stored procedure.
command.CommandText = "GetData";
command.CommandType =System.Data.CommandType.StoredProcedure;
// Set up the input parameter for the DocumentID.
SqlParameter paramID =
new SqlParameter("@Id", System.Data.SqlDbType.Int);
paramID.Value = 3;
command.Parameters.Add(paramID);
// Set up the output parameter to retrieve the summary.
SqlParameter paramSummary =
new SqlParameter("@Chart",
System.Data.SqlDbType.VarChar, -1);
paramSummary.Direction =System.Data.ParameterDirection.Output;
command.Parameters.Add(paramSummary);
// Execute the stored procedure.
command.ExecuteNonQuery();
Console.WriteLine((String)(paramSummary.Value));
System.Windows.MessageBox.Show( (String)(paramSummary.Value));
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
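The GetData stored procedure called above is not defined in this document. The following is a hypothetical definition that matches the two parameters used by the C# code. It assumes that @Id is compared with SerialNumber and that the chart content is text that can be cast to varchar(max). The real procedure may differ.

```sql
USE Archive;
GO
-- Hypothetical definition: the original procedure is not shown here.
CREATE PROCEDURE dbo.GetData
    @Id    int,
    @Chart varchar(max) OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    -- Assumption: @Id identifies the row by SerialNumber,
    -- the only integer key in the Records table.
    SELECT @Chart = CAST([Chart] AS varchar(max))
    FROM dbo.Records
    WHERE [SerialNumber] = @Id;
END
GO
-- Example call from Transact-SQL.
DECLARE @result varchar(max);
EXEC dbo.GetData @Id = 3, @Chart = @result OUTPUT;
PRINT @result;
GO
```

If no row matches, @Chart stays NULL, so the C# caller should check the output parameter for DBNull.Value before casting it to a string.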
6. FILESTREAM Best Practices
6.1. Physical Configuration and Maintenance
When you set up FILESTREAM storage volumes, consider the following guidelines:
• Turn off short file names on FILESTREAM computer systems. Short file names take significantly
longer to create. To disable short file names, use the Windows fsutil utility.
• Regularly defragment FILESTREAM computer systems.
• Use 64-KB NTFS clusters. Compressed volumes must be set to 4-KB NTFS clusters.
• Disable indexing on FILESTREAM volumes and set disablelastaccess. To set disablelastaccess,
use the Windows fsutil utility.
• Disable antivirus scanning of FILESTREAM volumes when it is not necessary. If antivirus
scanning is necessary, avoid setting policies that will automatically delete offending files.
• Set up and tune the RAID level for fault tolerance and the performance that is required by an
application.
RAID level         Write performance  Read performance  Fault tolerance  Remarks
RAID 5             Normal             Normal            Excellent        Performance is better than one disk or JBOD,
                                                                         and less than RAID 0 or RAID 5 with striping.
RAID 0             Excellent          Excellent         None
RAID 5 + striping  Excellent          Excellent         Excellent        Most expensive option.
6.2. Physical Database Design
When you design a FILESTREAM database, consider the following guidelines:
• FILESTREAM columns must be accompanied by a corresponding uniqueidentifier ROWGUIDCOL
column. These kinds of tables must also be accompanied by a unique index. Typically this index is
not a clustered index. If the database's business logic requires a clustered index, you have to make
sure that the values stored in the index are not random. Random values will cause the index to be
reordered every time that a row is added or removed from the table.
• For performance reasons, FILESTREAM filegroups and containers should reside on volumes
other than the operating system, SQL Server database, SQL Server log, tempdb, or paging file.
• Space management and policies are not directly supported by FILESTREAM. However, you can
manage space and apply policies indirectly by assigning each FILESTREAM filegroup to a
separate volume and using the volume's management features.
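As an illustration of placing FILESTREAM data on its own volume, a second FILESTREAM filegroup can be added and a table directed to it. The drive letter, folder, filegroup name, logical file name and table name below are hypothetical. The parent folder must already exist, and the last folder in the path must not.

```sql
-- Assumption: E: is a dedicated data volume and e:\fsdata exists;
-- the folder filestream2 is created by SQL Server.
ALTER DATABASE Archive
ADD FILEGROUP FileStreamGroup2 CONTAINS FILESTREAM;
GO
ALTER DATABASE Archive
ADD FILE ( NAME = Arch4,
    FILENAME = 'e:\fsdata\filestream2')
TO FILEGROUP FileStreamGroup2;
GO
-- FILESTREAM_ON selects the filegroup that stores this table's BLOB files.
CREATE TABLE Archive.dbo.Scans
(
    [Id] uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE,
    [Image] VARBINARY(MAX) FILESTREAM NULL
) FILESTREAM_ON FileStreamGroup2;
GO
```

Quotas and other space policies can then be applied to the E: volume without affecting the database and log files.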
7. Application Design and Implementation
When you are designing and implementing applications that use FILESTREAM, consider the
following guidelines:
• Use NULL instead of 0x to represent a non-initialized FILESTREAM column. The 0x value
causes a file to be created, and NULL does not.
• Avoid insert and delete operations in tables that contain nonnull FILESTREAM columns. Insert
and delete operations can modify the FILESTREAM tables that are used for garbage collection.
This can cause an application's performance to decrease over time.
• In applications that use replication, use NEWSEQUENTIALID() instead of NEWID().
NEWSEQUENTIALID() performs better than NEWID() for GUID generation in these
applications.
• The FILESTREAM API is designed for Win32 streaming access to data. Avoid using Transact-
SQL to read or write FILESTREAM binary large objects (BLOBs) that are larger than 2 MB. If
you must read or write BLOB data from Transact-SQL, make sure that all BLOB data is
consumed before you try to open the FILESTREAM BLOB from Win32. Failure to consume all
the Transact-SQL data might cause any successive FILESTREAM open or close operations to fail.
• Avoid Transact-SQL statements that update, append or prepend data to the FILESTREAM BLOB.
This causes the BLOB data to be spooled into the tempdb database and then back into a new
physical file.
• Avoid appending small BLOB updates to a FILESTREAM BLOB. Each append causes the
underlying FILESTREAM files to be copied. If an application has to append small BLOBs, write
the BLOBs into a varbinary(max) column, and then perform a single write operation to the
FILESTREAM BLOB when the number of BLOBs reaches a predetermined limit.
• Avoid retrieving the data length of lots of BLOB files in an application. This is a time-consuming
operation because the size is not stored in the SQL Server Database Engine. If you must determine
the length of a BLOB file, use the Transact-SQL DATALENGTH() function to determine the size
of the BLOB if it is closed. DATALENGTH() does not open the BLOB file to determine its size.
• If an application uses the Server Message Block 1 (SMB1) protocol, FILESTREAM BLOB data should be
read in 60-KB multiples to optimize performance.
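Several of the guidelines above can be combined in a single table definition. This is a minimal sketch with hypothetical table and column names, not the PCMS schema. It uses NEWSEQUENTIALID() as the key default, NULL for a document that has not been uploaded yet, and DATALENGTH() to read sizes.

```sql
CREATE TABLE Archive.dbo.Documents
(
    -- NEWSEQUENTIALID() is only allowed in a DEFAULT constraint.
    [Id] uniqueidentifier NOT NULL
         DEFAULT NEWSEQUENTIALID() ROWGUIDCOL UNIQUE,
    [JobCardNo] INTEGER NOT NULL,
    [Content] VARBINARY(MAX) FILESTREAM NULL
);
GO
-- NULL creates no file on disk; 0x would create an empty file.
INSERT INTO Archive.dbo.Documents ([JobCardNo], [Content])
VALUES (1001, NULL);
GO
-- DATALENGTH returns the size in bytes, or NULL when there is no content.
SELECT [JobCardNo], DATALENGTH([Content]) AS SizeInBytes
FROM Archive.dbo.Documents;
GO
```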
8. When to Use:
The size and use of the data determine whether you should use database storage or file system storage.
Consider FILESTREAM when the following conditions are true:
• Objects that are being stored are, on average, larger than 1 MB.
• Fast read access is important.
• You are developing applications that use a middle tier for application logic.
For smaller objects, storing varbinary(max) BLOBs in the database often provides better
streaming performance.
The size of a FILESTREAM BLOB is limited only by the volume size of the file system.
The standard varbinary(max) limit of 2 GB does not apply to BLOBs that are
stored in the file system.
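To check the 1 MB guideline against existing data, the average document size can be measured before migrating. The table and column names below are hypothetical placeholders for wherever PCMS currently keeps its documents in the database.

```sql
-- Hypothetical names: replace dbo.JobCardDocuments and [Content]
-- with the table and varbinary(max) column that hold the documents.
SELECT COUNT(*) AS DocumentCount,
       AVG(CAST(DATALENGTH([Content]) AS bigint)) AS AvgBytes,
       CASE
           WHEN AVG(CAST(DATALENGTH([Content]) AS bigint)) > 1048576
               THEN 'FILESTREAM candidate'
           ELSE 'in-database varbinary(max) may be sufficient'
       END AS Suggestion
FROM dbo.JobCardDocuments;
GO
```

The average is only a rough indicator. Read patterns and the middle-tier design listed above should be weighed as well.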
9. Integrated Security
• In SQL Server, FILESTREAM data is secured just like other data is secured: by granting
permissions at the table or column levels. If a user has permission to the FILESTREAM column in
a table, the user can open the associated files.
• Encryption is not supported on FILESTREAM data.
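Column-level permissions on a FILESTREAM column are granted in the same way as for any other column. The sketch below uses a hypothetical database user named PcmsReader and the Records table from section 4.4.

```sql
USE Archive;
GO
-- Hypothetical principal, created without a login for illustration only.
CREATE USER PcmsReader WITHOUT LOGIN;
GO
-- Allow reading the key and the FILESTREAM column.
GRANT SELECT ON dbo.Records ([SerialNumber], [Chart]) TO PcmsReader;
-- Allow replacing the chart content, but no other column.
GRANT UPDATE ON dbo.Records ([Chart]) TO PcmsReader;
GO
```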
10. Integrated Management
SQL Server management tools and functions work without modification for
FILESTREAM data.
All backup and recovery models work with FILESTREAM data.
A partial backup can be used to exclude FILESTREAM filegroups.
11. Using FILESTREAM with Other SQL Server Features
The following features have limitations or special considerations when used with FILESTREAM data:
• Database Snapshots
• Replication
• Log Shipping
• Database Mirroring
• Full-Text
• Failover Clustering
• SQL Server Express