The data is present in random order, but the logical ordering is specified by the index. The data rows may be randomly spread throughout the table. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the page and the row number in the data page. In non-clustered index:
- The physical order of the rows is not the same as the index order.
- Typically created on column used in JOIN, WHERE, and ORDER BY clauses.
- Good for tables whose values may be modified frequently.
Microsoft SQL Server creates non-clustered indices by default when CREATE INDEX command is given. There can be more than one non-clustered index on a database table. There can be as many as 249 non-clustered indexes per table in SQL Server 2005 and 999 non-clustered indexes per table in SQL Server 2008. It is appropriate to create non-clustered index on columns which are:
- Frequently used in the search criteria.
- Used to join other tables.
- Used as foreign key fields.
- Of having high selectivity (column which returns a low percentage (0-5%) of rows from a total number of rows on a particular value)
- Used in the ORDER BY clause.
- Of type XML (primary and secondary indexes need to be created; more on this in the coming articles).
Clustering alters the data block into a certain distinct order to match the index, resulting in the row data being stored in order. Therefore, only one clustered index can be created on a given database table. Clustered indices can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected.
Since the physical records are in this sort order on disk, the next row item in the sequence is immediately before or after the last one, and so fewer data block reads are required. The primary feature of a clustered index is therefore the ordering of the physical data rows in accordance with the index blocks that point to them. Some databases separate the data and index blocks into separate files, others put two completely different data blocks within the same physical file(s). Create an object where the physical order of rows is same as the index order of the rows and the bottom(leaf) level of clustered index contains the actual data rows.
Many developers tent to think that SQL Server stores data according to the order the records were entered into the table. This is not true. SQL Server stores data according to the way you create a clustered index. Keep in mind that SQL Server enforces PRIMARY KEY constraint by placing unique index on table. If there is no other clustered index on table, SQL Server will make PRIMARY KEY index cluster. When creating indexes other that PRIMARY KEY or UNIQUE KEY indexes, by default SQL Server will create them as non-clustered.
It is very important to have a clustered index on every table. If table does not have a clustered index, than all new records have to be added to the last data page occupied by the table. If table has a clustered index, than new records can be added to the last page or in the middle of the table. They can be added to the more suitable position according to he clustered index.
You can extend functionality of non-clustered indexes by including non-key columns to the leaf level of the non-clustered index. By including non-key columns, you can create non-clustered indexes that cover more queries. This indexes are called Covering indexes or Indexes with included columns. The notion of covering indexes is that SQL Server doesn’t need to user lookups between the non-clustered index and the table to return query results. Because a clustered index is the actual table, clustered index always cover queries. Included non-key columns have the following benefits:
- They can be data types not allowed as index key columns.
- They are not considered by the Database Engine when calculating the number of index key columns or index key size.
- A covering indexes always perform better than a non-covering indexes.
When you are creating covering index you should keep in mind some guidelines:
- Non-key columns are defined in the INCLUDE clause of the CREATE INDEX statement.
- Non-key columns can only be defined on non-clustered indexes on tables or indexed views.
- All data types are allowed except text, ntext, and image.
- Computed columns that are deterministic and either precise or imprecise can be included columns.
- As with key columns, computed columns derived from image, ntext, and text data types can be non-key (included) columns as long as the computed column data type is allowed as a non-key index column.
- Column names cannot be specified in both the INCLUDE list and in the key column list.
- Column names cannot be repeated in the INCLUDE list.
- A maximum of 1023 additional columns can be used as non-key columns (a table can have a maximum of 1024 columns).
Performance benefit gained by using covering indexes is typically great for queries that return a large number of rows (by the way this queries are called a non-selective queries). For queries that return only a small number of rows performance is small. But here you can ask, what is the small number of rows? Small numer of rows could be 10 rows for table with hundreds of rows or 1000 rows for table with 1 000 000 rows.
With SQL Server 2008 comes new type of index called Filtered Index. Filtered Index is a non-clustered index, especially suited to cover subset of data determined by simple WHERE clause. The B-tree containing rows for filtered index will contain only rows which satisfy the filter criteria used while creating index and hence well designed Filtered Index can rapidly improve query performance, reduce index maintenance costs and rapidly reduce index storage costs. Filtered index offers some benefits over standard full non-clustered index:
- As discussed above, the filtered index contains only those rows which satisfy the defined filter criteria. As a result it reduces the storage space requirement for the index. In the example below I will explain it more.
- The filtered statistics or statistics for a filtered index are more compact and accurate, because they consider only the rows in the filtered index and the smaller size of the filtered index reduces the cost/overhead of updating the statistics.
- The impact of data modification is less with a filtered index as it is updated only when the data of the index is impacted or a new record is inserted matching the filter criteria.
- Maintenance costs will be reduced as well since only a subset of rows will be in consideration while re-organizing or rebuilding the index.
- And most important, as it is an optimized non-clustered index, the query performance will improve if only a subset of data, which is covered by the filtered index criteria, is required.
In previous versions of SQL Server to get a similar benefit you had an option to use indexed vies. Using indexed views looks similar to filtered indexes but you can find some differences between this tho approaches. Here is a table with some of them: