What are cardinality statistics?

When Derby creates statistics for a table's index, it calculates and stores the following in the system tables.

For example, consider the primary key on the table FlightAvailability:


For this index, Derby keeps the following information:

How does Derby use these two numbers -- the number of rows in a table and the cardinality of a particular key -- to determine the selectivity of a query? Take this example:

SELECT * FROM Flights, FlightAvailability
WHERE Flights.flight_id = OtherTable.flight_id

If the cardinality for flight_id in Flights is 250, the selectivity of the predicate is 1/250. The optimizer would estimate the number of rows read to be as follows:

((Rows in Flights) * (Rows in OtherTable))/250
Related concepts
Determinations of rows scanned from disk for a table scan
Estimations of rows scanned from disk for an index scan
Statistics-based versus hard-wired selectivity
Working with cardinality statistics