# Table Indexing#

Once a `Table` has been created, it is possible to create indices on one or more columns of the table. An index internally sorts the rows of a table based on the index column(s), allowing for element retrieval by column value and improved performance for certain table operations.

## Creating an Index#

To create an index on a table, use the `add_index()` method:

```>>> from astropy.table import Table
>>> t = Table([(2, 3, 2, 1), (8, 7, 6, 5)], names=('a', 'b'))
```

The optional argument `unique` may be specified to create an index with uniquely valued elements.

To create a composite index on multiple columns, pass a list of columns instead:

```>>> t.add_index(['a', 'b'])
```

In particular, the first index created using the `add_index()` method is considered the default index or the “primary key.” To retrieve an index from a table, use the `indices` property:

```>>> t.indices['a']
<SlicedIndex original=True index=<Index columns=('a',) data=<SortedArray length=4>
a  rows
--- ----
1    3
2    0
2    2
3    1>>
>>> t.indices['a', 'b']
<SlicedIndex original=True index=<Index columns=('a', 'b') data=<SortedArray length=4>
a   b  rows
--- --- ----
1   5    3
2   6    2
2   8    0
3   7    1>>
```

## Row Retrieval using Indices#

Row retrieval can be accomplished using two table properties: `loc` and `iloc`. The `loc` property can be indexed either by column value, range of column values (including the bounds), or a `list` or `numpy.ndarray` of column values:

```>>> t = Table([(1, 2, 3, 4), (10, 1, 9, 9)], names=('a', 'b'), dtype=['i8', 'i8'])
>>> t.loc[2]  # the row(s) where a == 2
<Row index=1>
a     b
int64 int64
----- -----
2     1
>>> t.loc[[1, 4]]  # the row(s) where a in [1, 4]
<Table length=2>
a     b
int64 int64
----- -----
1    10
4     9
>>> t.loc[1:3]  # the row(s) where a in [1, 2, 3]
<Table length=3>
a     b
int64 int64
----- -----
1    10
2     1
3     9
>>> t.loc[:]
<Table length=4>
a     b
int64 int64
----- -----
1    10
2     1
3     9
4     9
```

Note that by default, `loc` uses the primary index, which here is column `'a'`. To use a different index, pass the indexed column name before the retrieval data:

```>>> t.add_index('b')
>>> t.loc['b', 8:10]
<Table length=3>
a     b
int64 int64
----- -----
3     9
4     9
1    10
```

The property `iloc` works similarly, except that the retrieval information must be either an integer or a `slice`, and relates to the sorted order of the index rather than column values. For example:

```>>> t.iloc[0] # smallest row by value 'a'
<Row index=0>
a     b
int64 int64
----- -----
1    10
>>> t.iloc['b', 1:] # all but smallest value of 'b'
<Table length=3>
a     b
int64 int64
----- -----
3     9
4     9
1    10
```

## Effects on Performance#

Table operations change somewhat when indices are present, and there are a number of factors to consider when deciding whether the use of indices will improve performance. In general, indexing offers the following advantages:

• Table grouping and sorting based on indexed column(s) both become faster.

• Retrieving values by index is faster than custom searching.

There are certain caveats, however:

• Creating an index requires time and memory.

• Table modifications become slower due to automatic index updates.

• Slicing a table becomes slower due to index relabeling.

See here for an IPython notebook profiling various aspects of table indexing.

## Index Modes#

The `index_mode()` method allows for some flexibility in the behavior of table indexing by allowing the user to enter a specific indexing mode via a context manager. There are currently three indexing modes: `'freeze'`, `'copy_on_getitem'`, and `'discard_on_copy'`.

The `'freeze'` mode prevents automatic index updates whenever a column of the index is modified, and all indices refresh themselves after the context ends:

```>>> with t.index_mode('freeze'):
...    t['a'][0] = 0
...    print(t.indices['a']) # unmodified
<SlicedIndex original=True index=<Index columns=('a',) data=<SortedArray length=4>
a  rows
--- ----
1    0
2    1
3    2
4    3>>
>>> print(t.indices['a']) # modified
<SlicedIndex original=True index=<Index columns=('a',) data=<SortedArray length=4>
a  rows
--- ----
0    0
2    1
3    2
4    3>>
```

The `'copy_on_getitem'` mode forces columns to copy and relabel their indices upon slicing. In the absence of this mode, table slices will preserve indices while column slices will not:

```>>> ca = t['a'][[1, 3]]
>>> ca.info.indices
[]
>>> with t.index_mode('copy_on_getitem'):
...     ca = t['a'][[1, 3]]
...     print(ca.info.indices)
[<SlicedIndex original=True index=<Index columns=('a',) data=<SortedArray length=2>
a  rows
--- ----
2    0
4    1>>]
```

The `'discard_on_copy'` mode prevents indices from being copied whenever a column or table is copied:

```>>> t2 = Table(t)
>>> t2.indices['a']
<SlicedIndex original=True index=<Index columns=('a',) data=<SortedArray length=4>
a  rows
--- ----
0    0
2    1
3    2
4    3>>
...    t2 = Table(t)
...    print(t2.indices)
[]
```

## Updating Rows using Indices#

Row updates can be accomplished by assigning the table property `loc` a complete row or a list of rows:

```>>> t = Table([('w', 'x', 'y', 'z'), (10, 1, 9, 9)], names=('a', 'b'), dtype=['str', 'i8'])
>>> t.loc['x']
<Row index=1>
a     b
str1 int64
---- -----
x     1
>>> t.loc['x'] = ['a', 12]
>>> t
<Table length=4>
a     b
str1 int64
---- -----
w    10
a    12
y     9
z     9
>>> t.loc[['w', 'y']]
<Table length=2>
a     b
str1 int64
---- -----
w    10
y     9
>>> t.loc[['w', 'z']] = [['b', 23], ['c', 56]]
>>> t
<Table length=4>
a     b
str1 int64
---- -----
b    23
a    12
y     9
c    56
```

## Retrieving the Location of Rows using Indices#

Retrieval of the location of rows can be accomplished using a table property: `loc_indices`. The `loc_indices` property can be indexed either by column value, range of column values (including the bounds), or a `list` or `numpy.ndarray` of column values:

```>>> t = Table([('w', 'x', 'y', 'z'), (10, 1, 9, 9)], names=('a', 'b'), dtype=['str', 'i8'])
When creating an index via `add_index()`, the keyword argument `engine` may be specified to use a particular indexing engine. The available engines are:
The SCEngine depends on the `sortedcontainers` dependency. The most important takeaway is that `SortedArray` (the default engine) is usually best, although `SCEngine` may be more appropriate for an index created on an empty column since adding new values is quicker.
The `BST` engine demonstrates a simple pure Python implementation of a search tree engine, but the performance is poor for larger tables. This is available in the code largely as an implementation reference.