Monday, 15 July 2013

What is the difference between Checkpoint and Lazywriter?



Simplified version:

Checkpoint

1. The objective of a checkpoint is to keep recovery time to a minimum.
2. It does this by scanning the buffer cache for dirty pages and writing them to disk.
3. Checkpoint never puts buffers on the free list.
4. The scope of a checkpoint is a single database.
5. A checkpoint can be issued manually (CHECKPOINT command) or triggered automatically
at regular intervals (recovery interval property).
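
A minimal sketch of both triggers (the recovery interval value shown is only an example):

-- Manually flush dirty pages for the current database
CHECKPOINT

-- View or change the automatic checkpoint target (advanced server option)
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'recovery interval (min)'        -- show the current value
-- EXEC sp_configure 'recovery interval (min)', 5  -- example: target roughly 5 minutes
-- RECONFIGURE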

Lazywriter

1. Lazywriter is a background thread.
2. The objective of the lazywriter is to handle memory pressure.
3. Each SQL Server instance has one or more lazywriter threads (one per NUMA node).
4. The lazywriter thread wakes up at regular intervals.
5. If the free buffer count is low, the lazywriter scans 64 buffers at a time, writes dirty pages to disk,
and adds the cleaned buffers to the free list.
6. Buffers are released back to the OS when available memory is low or when heavy paging is detected.
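
A rough way to watch both processes at work is the Buffer Manager performance counters (a sketch; these are the standard counter names, and the object name carries an instance prefix on named instances):

SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Lazy writes/sec', 'Checkpoint pages/sec')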

Saturday, 13 July 2013

Difference Between Onsite and Offsite Data Backup


All computer systems have the potential to fail. When that happens, significant information may be lost forever. Most business professionals understand how critical it is to back up company data. Backup systems have one purpose – to restore data in the case of an emergency. The advancements in cloud backup technology have opened up possibilities that improve the backup process. Knowing the difference between onsite and offsite data backup will come in handy when it is time to create storage protocols.
A Little about Cloud Technology
By now, anyone with computer savvy has at least heard of cloud technology. Working in the cloud means companies get all the benefits of expensive hardware without the capital expenditure. A cloud service provides high-quality servers with regular maintenance and upgrades by IT professionals. For businesses, there is no costly or space-consuming equipment that requires full-time attention, just a service fee.
Onsite Backup
Onsite backup systems are limiting. They work for data that is not permanent or critical. With an onsite process, information sits on a local storage device. This usually means a flash drive, DVD, external hard drive – any form of external equipment. Advantages to onsite storage include:
  • Data is readily available
  • Companies own the storage medium
  • Access to information does not require Internet service
The downside to this method of backup is the potential for catastrophe. Onsite storage media are not permanent and can fail if a disaster happens. A broken water pipe is all it takes to ruin onsite storage. The data exists in a very fragile environment.
Offsite Backup
Offsite backup offers most of the same advantages as onsite storage but without the risks. A cloud backup system is like renting a virtual offsite storage container to keep valuables safe. Cloud technology allows businesses to access the platform via the Internet. Offsite data servers provide:
  • Immediate access from any location
  • Data preservation
The beauty of an offsite platform is the dual backup process. A company backs up its system to the cloud, and the cloud service also backs up that information. Even if one side goes down, a copy of the data remains intact.
With an offsite program, all a business needs to access its information is an Internet connection or FTP. This allows the company to recover at any location. For example, if a fire shuts down the main office, a remote location can keep the company going and reduce downtime. A proper disaster recovery plan includes storage of a backup system. Cloud technology offers everything a company needs to protect itself from disaster.
Making a Choice
Deciding whether a business will do better with an onsite or offsite backup plan is a matter of preference. Many companies opt to do a little of both. Critical data that requires permanent storage can go to the cloud backup platform. Local media are available for saving information that is less pertinent. Data not vital to the company’s survival can sit on an external disk.
When it comes down to it, a backup system is only as reliable as the storage medium. The advancements in technology have made devices such as external hard drives almost indestructible, but there is always the possibility of system failure. A cloud backup platform eliminates that risk with its backup program. Offsite data storage means information is safely tucked away inside the cloud.

What Is the Difference Between RTO and RPO?

Both RPO and RTO are critical factors in an effective business continuity plan. Business continuity establishes a schematic that helps a company recover after a disaster hits. Disasters, by definition, are rarely expected. Fire, flood, and theft are all examples of disasters that can put a poorly organised business under permanently. A continuity plan is a roadmap that details how an organisation continues to thrive while rebuilding. This is where RPO and RTO come into play.

Recovery Point Objective

Recovery point objective, or RPO, is a complex concept geared specifically towards data backup. A business that relies heavily on data is vulnerable during a shutdown. Consider, for example, a company that maintains a database that feeds an ecommerce site. If disaster strikes the datacentre, the inventory disappears or becomes out of date. As part of business continuity planning, the management must figure out how long they can afford to have no access to that system before the business fails. Or in other words, how much data they can afford to lose before it has serious consequences for their business. The answer is crucial to developing a system backup and disaster recovery schedule.
If that same business has a revolving inventory, updating the backup every hour improves the odds of recovering after the main system goes down. Companies that have few changes to their database might be able to update once a week and still stay in business.
RPO is that deadline: the amount of data a business can afford to lose before the failure causes severe problems or shuts it down.

Recovery Time Objective

Recovery time objective, or RTO, is simpler. It is the target time for resuming IT activities after a disaster has struck. A business that can afford to take a week before being fully operational again does not need to put as much money into disaster recovery preparation as an organisation that needs the doors open within two hours.
A data entry operation has a short RTO, so the company should invest heavily in disaster recovery systems, maybe even a second DR site. This secondary location would maintain a full system backup with workstations able to support the business if the main office is unable to open. A small boutique would have a longer RTO and not budget for a disaster recovery centre.

RPO vs. RTO

RPO is specifically about data backup in order to maintain continuity. It is essential for determining how often a business should schedule data backups on its network. RTO is how long it will take an organisation to get back up and running to the recovery point objective.
Although one does not necessarily have anything to do with the other, both are elements of disaster recovery and business continuity management. One is about how long the company can survive without data, while the other is about how long it can take to reopen its doors. A company could have an RPO of three days but an RTO of just one. For example, a restaurant may be able to operate without a computer system, but it loses money and inventory with the doors shut.
RPO and RTO are important concepts for companies to consider when developing a system that allows them to survive after disaster strikes. Although not directly related, they are both a necessary part of the process.
=============================== In Short ===============================
Recovery time objective: how long an application/service can be unavailable in the event of a disaster.
Recovery point objective: how much data may be lost, i.e. what will be recovered.
The actual definitions :)
RPO (Recovery Point Objective) refers to the amount of data at risk. It is determined by the amount of time between data protection events and reflects the amount of data that could potentially be lost during a disaster recovery.

RTO (Recovery Time Objective) is related to downtime. The metric refers to the amount of time it takes to recover from a data loss event and return to service. RTO, then, is the amount of time the system's data is unavailable or inaccessible, preventing normal service.

Thursday, 11 July 2013

SQL SERVER – Checkpoint

A checkpoint is the process that writes all dirty pages for the current database to disk. Checkpoints help minimize the time needed for a later recovery by creating a point at which all dirty pages are guaranteed to have been written to disk.

  • Checkpoints can be performed concurrently on multiple databases.
  • If a checkpoint fails or is interrupted and a recovery is required, the database engine cannot recover from the failed checkpoint; it must recover from the last successful checkpoint.
  • The database engine makes modifications to database pages in memory (for performance reasons), which produces dirty pages. Those dirty pages are not written to disk on each modification; they are written to disk when a checkpoint occurs.
  • When do checkpoints occur?
    • Before a backup, the database engine performs a checkpoint so that all changes to database pages (dirty pages) are contained in the backup.
    • Stopping the server with any of the following methods causes a checkpoint:
      • Using the SHUTDOWN statement.
      • Stopping SQL Server through SQL Server Configuration Manager, SSMS, net stop mssqlserver, or Control Panel -> Services -> SQL Server service.
      • Note: SHUTDOWN WITH NOWAIT does not execute a checkpoint on the database.
    • When the recovery interval server configuration value is reached, that is, when the active portion of the log exceeds the size the server could recover in the amount of time defined by the recovery interval setting.
    • When the transaction log is 70% full and the database is in log truncate mode.
      • A database is in log truncate mode when it uses the simple recovery model and a backup statement has been executed.
  • The time required to perform a checkpoint depends directly on the number of dirty pages the checkpoint must write.

[Questions & Answers]

Q: What is a dirty page?
A: Dirty pages are data pages that exist in the buffer cache and have been modified, but not yet written to disk.

Q: What is the checkpoint syntax?
A:

CHECKPOINT [ duration_time ]

duration_time:
  • The requested amount of time for the checkpoint process to complete.
  • Specified in seconds.
  • Must be greater than 0.
  • When it is omitted, the database engine automatically adjusts the duration to minimize the performance impact.
  • The duration defined determines the amount of resources SQL Server assigns to the checkpoint process. For example, if duration_time is set to 10 seconds but the checkpoint would normally take 20 seconds, SQL Server assigns more resources (at a performance cost) to try to meet the 10-second target. On the other hand, if the checkpoint would only take 5 seconds, SQL Server assigns fewer resources than it would by default. In practice the checkpoint usually takes somewhat more or less time than specified.
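
A short illustration (the 10-second target is just an example value):

-- Let SQL Server pace the checkpoint itself
CHECKPOINT

-- Ask for the checkpoint to complete in roughly 10 seconds; the engine adjusts
-- the resources it assigns to approach, not guarantee, this target
CHECKPOINT 10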

MSSQL SERVER TEMPDB Restrictions

The following operations cannot be performed on the tempdb database:
  • Adding filegroups.
  • Backing up or restoring the database.
  • Changing collation. The default collation is the server collation.
  • Changing the database owner. tempdb is owned by dbo.
  • Creating a database snapshot.
  • Dropping the database.
  • Dropping the guest user from the database.
  • Enabling change data capture.
  • Participating in database mirroring.
  • Removing the primary filegroup, primary data file, or log file.
  • Renaming the database or primary filegroup.
  • Running DBCC CHECKALLOC.
  • Running DBCC CHECKCATALOG.
  • Setting the database to OFFLINE.
  • Setting the database or primary filegroup to READ_ONLY.
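
Since the collation and owner of tempdb cannot be changed, a quick sanity check of those fixed properties can be made against sys.databases (a minimal sketch):

SELECT name,
       collation_name,                       -- always the server collation
       SUSER_SNAME(owner_sid) AS owner_name  -- the owner cannot be changed
FROM sys.databases
WHERE name = 'tempdb'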

Thursday, 4 July 2013

Index Optimization Tips

Consider creating indexes on column(s) frequently used in the WHERE, ORDER BY, and GROUP BY clauses.
These columns are the best candidates for indexing. Analyze your queries carefully to avoid creating indexes that will not be used.
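
For example (the table and column names are hypothetical):

-- dbo.Orders is assumed to be filtered by CustomerID and sorted by OrderDate
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
ON dbo.Orders (CustomerID, OrderDate)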


*****

Keep your indexes as narrow as possible.
Because each index takes up disk space, try to minimize the size of the index key to avoid using superfluous disk space. A narrower key reduces the number of reads required to read the index and boosts overall index performance.


*****

Drop indexes that are not used.
Because each index takes up disk space and slows the adding, deleting, and updating of rows, you should drop indexes that are not used. You can use the Index Tuning Wizard to identify indexes that are not used by your queries.
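
On SQL Server 2005 and later, the sys.dm_db_index_usage_stats DMV offers a rough alternative: indexes with no seeks, scans, or lookups since the last restart are candidates for review (a sketch; the counters reset when the instance restarts):

SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id
      AND s.index_id = i.index_id
      AND s.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY s.user_seeks, s.user_scans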


*****

Try to create indexes on columns that have integer values rather than character values.
Because integer values are usually smaller than character values (the int data type is 4 bytes, the bigint data type is 8 bytes), you can reduce the number of index pages used to store the index keys. This reduces the number of reads required to read the index and boosts overall index performance.


*****

Limit the number of indexes if your application updates data very frequently.
Because each index takes up disk space and slows the adding, deleting, and updating of rows, you should create new indexes only after analyzing how the data is used, the types and frequencies of the queries performed, and how your queries will use the new indexes. In many cases, the speed advantages of creating a new index outweigh the disadvantages of additional space used and slower row modification. However, avoid redundant indexes and create them only when necessary. For a read-only table, the number of indexes can be increased.


*****

Check that the index you are about to create does not already exist.
Keep in mind that when you create a primary key or unique constraint, SQL Server automatically creates an index on the column(s) participating in that constraint. If you specify a different index name, you can end up creating an index on the same column(s) again and again.
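
A quick way to see what already exists on a table before adding anything (the table name is hypothetical; the catalog views shown are available from SQL Server 2005 on):

EXEC sp_helpindex 'dbo.Orders'

-- Or query the catalog directly
SELECT i.name AS index_name, i.type_desc, c.name AS column_name, ic.key_ordinal
FROM sys.indexes AS i
JOIN sys.index_columns AS ic ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.Orders')
ORDER BY i.name, ic.key_ordinal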


*****

Create a clustered index instead of a nonclustered index to increase the performance of queries that return a range of values and of queries that contain GROUP BY or ORDER BY clauses and return sorted results.
Because every table can have only one clustered index, you should choose the column(s) for this index very carefully. Try to analyze all your queries, choose the most frequently used ones, and include in the clustered index only those column(s) that gain the most performance benefit from it.
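
A minimal sketch on a hypothetical table whose queries filter and sort by date ranges:

CREATE CLUSTERED INDEX CIX_Orders_OrderDate
ON dbo.Orders (OrderDate)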


*****

Create nonclustered indexes to increase the performance of queries that return few rows and where the index has good selectivity.
In comparison with a clustered index, of which each table can have only one, a table can have as many as 249 nonclustered indexes. However, you should consider nonclustered index creation as carefully as clustered index creation, because each index takes up disk space and is a drag on data modification.


*****

Create the clustered index on column(s) that are not updated very frequently.
Because the leaf nodes of a nonclustered index contain the clustered index key if the table has a clustered index, every time a column used in the clustered index is modified, all of the nonclustered indexes must also be modified.


*****

Create the clustered index on a single column that is as narrow as possible.
Because nonclustered indexes contain the clustered index key in their leaf nodes and use the clustered index to locate data rows, creating the clustered index on a single narrow column reduces not only the size of the clustered index but also the size of every nonclustered index on the table.


*****

Avoid creating a clustered index based on an incrementing key.
For example, if a table has a surrogate integer primary key declared as IDENTITY and the clustered index is created on that column, then every time data is inserted into the table, the rows are added to the end of the table. When many rows are added, a "hot spot" can occur. A "hot spot" occurs when many queries try to read or write data in the same area at the same time, and it results in an I/O bottleneck.
Note: by default, SQL Server creates a clustered index for the primary key constraint, so in this case you should explicitly specify the NONCLUSTERED keyword to indicate that a nonclustered index is created for the primary key constraint.
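
A sketch of that note, using a hypothetical table: keep the IDENTITY primary key nonclustered and cluster on another column instead.

CREATE TABLE dbo.Sales
(
    SaleID   INT IDENTITY(1,1) NOT NULL,
    SaleDate DATETIME NOT NULL,
    Amount   MONEY NOT NULL,
    CONSTRAINT PK_Sales PRIMARY KEY NONCLUSTERED (SaleID)
)

CREATE CLUSTERED INDEX CIX_Sales_SaleDate ON dbo.Sales (SaleDate)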


*****

Create a clustered index for each table.
If you create a table without a clustered index, the data rows are not stored in any particular order; this structure is called a heap. Every time data is inserted into such a table, the row is added to the end of the table, and when many rows are added a "hot spot" can occur. To avoid hot spots and improve concurrency, you should create a clustered index for each table.


*****

Don't create indexes on column(s) whose values have low selectivity.
For example, don't create an index on a column with many duplicate values, such as a "Sex" column (which has only "Male" and "Female" values), because in this case the disadvantages of additional space used and slower row modification outweigh the speed advantages of creating a new index.


*****

If you create a composite (multi-column) index, try to order the columns in the key to enhance selectivity, with the most selective columns leftmost in the key.
The order of the columns in a composite (multi-column) index is very important. The right order can increase the chance that the index will be used.


*****

If you create a composite (multi-column) index, try to order the columns in the key so that the WHERE clauses of the frequently used queries match the column(s) that are leftmost in the index.
The order of the columns in a composite (multi-column) index is very important. The index will be used to evaluate a query only if the leftmost index key column is specified in the WHERE clause of the query. For example, if you create a composite index on (Name, Age), then a query with the WHERE clause "WHERE Name = 'Alex'" will use the index, but a query with the WHERE clause "WHERE Age = 28" will not.
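
The same example as a sketch (the table name is hypothetical):

CREATE NONCLUSTERED INDEX IX_Person_Name_Age ON dbo.Person (Name, Age)

SELECT * FROM dbo.Person WHERE Name = 'Alex'              -- can seek on the leftmost key
SELECT * FROM dbo.Person WHERE Name = 'Alex' AND Age = 28 -- can seek on both keys
SELECT * FROM dbo.Person WHERE Age = 28                   -- leftmost column missing, no seek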


*****

If you need to join several tables very frequently, consider creating indexes on the joined columns.
This can significantly improve the performance of queries against the joined tables.


*****

Consider creating a surrogate integer primary key (identity, for example).
Every table must have a primary key (a unique identifier for a row within a database table). A surrogate primary key is a column that has a unique value but no actual meaning to the record itself, so users should never see or change a surrogate primary key. Some developers use surrogate primary keys; others use the data columns themselves as the primary key. If a primary key consists of many data columns and is large, consider creating a surrogate integer primary key. This can improve the performance of your queries.


*****

Consider creating indexes on all the columns referenced in the WHERE clauses of your most frequently used queries that contain the OR operator.
If the WHERE clause in a query contains an OR operator and any of the columns referenced in the OR clause are not indexed, a table or clustered index scan will be performed. In this case, creating indexes on all such columns can significantly improve query performance.


*****

If your application performs the same query over and over on the same table, consider creating a covering index that includes the columns from this query.
A covering index is an index that includes all of the columns referenced in the query, so all the data for the query is contained within the index itself and only the index pages, not the data pages, are used to retrieve the data. Covering indexes can give a query a large performance boost because they can save a huge amount of I/O operations.
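
A sketch for a hypothetical query that selects OrderDate and Amount for one CustomerID; on SQL Server 2005 and later the INCLUDE clause keeps the non-key columns out of the index key:

CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, Amount)

-- On SQL Server 2000, the same effect needs a wider composite key:
-- CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
-- ON dbo.Orders (CustomerID, OrderDate, Amount)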


*****

Use the DBCC DBREINDEX statement periodically (for example, once per week on Sunday) to rebuild all the indexes on all the tables in your database and reduce fragmentation.
Because fragmented data can cause SQL Server to perform unnecessary data reads, and query performance against a heavily fragmented table can be very poor, you should periodically rebuild all indexes to reduce fragmentation. Try to schedule the DBCC DBREINDEX statement during CPU idle time and slow production periods.
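
For example (the table name and fill factor are placeholders; on SQL Server 2005 and later, ALTER INDEX ... REBUILD is the replacement):

-- Rebuild every index on one table with a 90 percent fill factor
DBCC DBREINDEX ('dbo.Orders', '', 90)

-- SQL Server 2005+ equivalent
-- ALTER INDEX ALL ON dbo.Orders REBUILD WITH (FILLFACTOR = 90)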


*****

Use the DBCC INDEXDEFRAG statement to defragment the clustered and secondary indexes of a specified table or view.
The DBCC INDEXDEFRAG statement was introduced in SQL Server 2000 and was not supported in previous versions. Unlike DBCC DBREINDEX, DBCC INDEXDEFRAG does not hold locks long term and thus will not block running queries or updates. So try to use the DBCC INDEXDEFRAG command instead of DBCC DBREINDEX whenever possible.
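
For example (the database, table, and index names are placeholders; ALTER INDEX ... REORGANIZE is the SQL Server 2005+ replacement):

DBCC INDEXDEFRAG ('MyDB', 'dbo.Orders', 'IX_Orders_CustomerID_OrderDate')

-- SQL Server 2005+ equivalent
-- ALTER INDEX IX_Orders_CustomerID_OrderDate ON dbo.Orders REORGANIZE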


*****

Consider using the SORT_IN_TEMPDB option when you create an index and tempdb is on a different set of disks than the user database.
The SORT_IN_TEMPDB option is a SQL Server 2000 feature that was not supported in previous versions. When you create an index with the SORT_IN_TEMPDB option, SQL Server uses the tempdb database, instead of the current database, to sort data during the index creation. Using this option can reduce the time it takes to create an index, but it increases the amount of disk space used to create the index.
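
A sketch with a hypothetical index (the option syntax changed slightly in SQL Server 2005):

-- SQL Server 2000 syntax
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
ON dbo.Orders (OrderDate)
WITH SORT_IN_TEMPDB

-- SQL Server 2005+ syntax
-- CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
-- ON dbo.Orders (OrderDate)
-- WITH (SORT_IN_TEMPDB = ON)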


*****

Use the SQL Server Profiler Create Trace Wizard with the "Identify Scans of Large Tables" trace to determine which tables in your database may need indexes.
This trace shows which tables are being scanned by queries instead of being accessed through an index.

Stopping a SQL Server Transaction Log That Grows Too Fast

Problem
I had a production problem recently: the transaction log grew 1 GB every minute and filled up the entire drive. The error message I got was: "The transaction log for database 'MyDB' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases." My SQL Server is clustered and the issue happened after a node failover occurred. In this tip I cover the steps I took to resolve this problem.

Solution
The message I got was that the transaction log drive was full and users were unable to access the application. DBAs never want to hear this! The usual DBA question is "what was changed?"

My first priority was to fix the problem immediately so users could access the application, and then do root cause analysis. For the database in question, I changed the database recovery model to SIMPLE. Then I shrank the transaction log file to a reasonable size (note: you can't shrink it below its original size). After these changes, the log growth stopped, but this still did not tell me what the issue was. Something to note is that changing the recovery model to SIMPLE is not possible if mirroring or log shipping is set up unless you remove these first, but they can be reconfigured later; the important thing is to stop the bleeding ASAP.

Other options would have been to create another transaction log file on a different drive that had plenty of space, or to clean up files on the drive that was running out of space. But in my case the file was growing so rapidly that I needed to change the recovery model so the growth would stop until the issue was resolved.
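
The error message itself suggests a first check; a minimal query along those lines would be:

SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'MyDB'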

Below are the steps that I took to check the size, change the recovery model and shrink the log file.

SELECT recovery_model_desc FROM sys.databases WHERE name = 'MyDB' --Before

ALTER DATABASE MyDB SET RECOVERY SIMPLE

SELECT recovery_model_desc FROM sys.databases WHERE name = 'MyDB' --After

EXEC xp_fixeddrives --Check free drive space 

EXEC sp_helpdb MyDB -- Note the size of the log before shrink

DBCC SHRINKFILE(MyDB_log, 1024) -- shrink log to 1 GB

EXEC sp_helpdb MyDB -- Note the size of the log after shrink

EXEC xp_fixeddrives -- Check free drive space 


First aid is complete, database is up and users are able to use the application again.

Now I need to find out what was causing the log to grow so fast, fix the issue, put the database back in FULL recovery mode, do a full backup and make sure my log backups are working correctly. 

The drive now has enough space. I put the database in Full recovery mode again to see if I could catch the offending transaction and I monitored drive space and transaction log size. I immediately noticed the transaction log growing about 1 GB per minute. I then executed the block of code below to find out what may be causing the log growth.

DBCC opentran --get the spid and feed it into the next query

DECLARE @handle BINARY(20)
DECLARE @SPID INT
SET @SPID = 100 -- 100 was the spid from the dbcc opentran

SELECT @handle = sql_handle
FROM MASTER..sysprocesses
WHERE spid = @SPID
SELECT [text] FROM ::fn_get_sql(@handle)
GO 

The culprit was a stored procedure that showed up every time I executed the block of code above, although the SPID changed each time. I looked into the stored procedure, which was not very complicated, but it does a bunch of deletes and inserts based on some conditions. This procedure was called by an automated process every minute. What happens when a cluster node failover occurs? The cache is flushed and all the query plans are gone. Since there had not been any issue with this procedure in the past, my first option was to recompile the procedure. Voila, it worked! The transaction log stopped growing so fast and I no longer saw this procedure in the open transactions from the above code. So a simple stored procedure that had always worked fine suddenly caused a major production issue that shut down the application, and the fix was as simple as a recompile. Quite interesting.
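
The recompile itself is a one-line command; the procedure name below is a placeholder, not the actual procedure from this incident:

EXEC sp_recompile 'dbo.usp_MyProcedure'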

My next step was to start a full database backup ASAP. While the full database backup was running, I kicked off transaction log backups every 5 minutes and they seemed to work fine, so I then returned the transaction log backup to its normal schedule. After the full backup completed, I reviewed the SQL Server error log, drive space, transaction log size, and open transactions, and everything seemed fine. If it is a cluster, you could also try failing back to the other node to see if that fixes the problem, which in my case was the need for a recompile. In a cluster the storage is shared by the nodes, so there is no benefit from a storage point of view.