1. Although selection sort and insertion sort are both O(n^2) in the
worst and average cases, insertion sort is O(n) in the best case.
The best case for insertion sort is when the data is in sorted order.
When the data is sorted, "inserting" each element into its correct
spot in the already sorted portion of the list takes only constant
work: one comparison with the element before it, and no shifting.
Even if a few of the items in the list are out of order, inserting
each of the other items still takes only constant time, so insertion
sort is still quite fast on a mostly sorted list. Selection sort, on
the other hand, scans the entire unsorted portion on every pass, so it
makes the same number of comparisons, n(n-1)/2, regardless of input.
2. Big-O notation only describes how the run time of an algorithm
grows as the input size grows. In the limit (that is, for very large
input sizes) A will be faster than B. However, there could still be
some inputs where B is faster than A.
Take insertion sort, for example. It is an O(n^2) algorithm, yet on
some inputs, such as already sorted data, it runs quite quickly.
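
The claims in both answers can be checked by counting comparisons
instead of timing. The following is a minimal Python sketch, not a
reference implementation: the function names, the input size of 1000,
and the choice to count only element comparisons are assumptions made
for illustration.

```python
def insertion_sort_comparisons(items):
    """Insertion-sort a copy of items; return the number of element comparisons."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Walk left through the sorted portion until key's spot is found.
        while j >= 0:
            comparisons += 1
            if a[j] <= key:
                break          # key is already in place relative to a[j]
            a[j + 1] = a[j]    # shift the larger element one slot right
            j -= 1
        a[j + 1] = key
    return comparisons


def selection_sort_comparisons(items):
    """Selection-sort a copy of items; return the number of element comparisons."""
    a = list(items)
    comparisons = 0
    n = len(a)
    for i in range(n - 1):
        smallest = i
        # Always scans the whole unsorted portion, whatever the input order.
        for j in range(i + 1, n):
            comparisons += 1
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return comparisons


n = 1000  # illustrative input size
sorted_data = list(range(n))
reversed_data = sorted_data[::-1]
nearly_sorted = sorted_data[:]
# Put a single adjacent pair out of order.
nearly_sorted[10], nearly_sorted[11] = nearly_sorted[11], nearly_sorted[10]

print(insertion_sort_comparisons(sorted_data))    # 999     (n - 1)
print(insertion_sort_comparisons(nearly_sorted))  # 1000
print(insertion_sort_comparisons(reversed_data))  # 499500  (n(n-1)/2)
print(selection_sort_comparisons(sorted_data))    # 499500
print(selection_sort_comparisons(reversed_data))  # 499500
```

On sorted and nearly sorted input, insertion sort makes about n
comparisons, while selection sort makes n(n-1)/2 on every input. Both
are O(n^2) algorithms, yet on these inputs insertion sort does far
less work.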