Associative Containers

Associative and sequential containers differ from one another in a fundamental way: Elements in an associative container are stored and retrieved by a key. In contrast, elements in a sequential container are stored and accessed sequentially by their position in the container.

Associative containers support efficient lookup and retrieval by a key. The two primary associative-container types are map and set. The elements in a map are key–value pairs: The key serves as an index into the map, and the value represents the data associated with that index. A set element contains only a key; a set supports efficient queries as to whether a given key is present. We might use a set to hold words that we want to ignore during some kind of text processing. A dictionary would be a good use for a map: The word would be the key, and its definition would be the value.

The library provides eight associative containers, listed in next table. These eight differ along three dimensions: Each container is (1) a set or a map, (2) requires unique keys or allows multiple keys, and (3) stores the elements in order or not.

Type	definition
map	Associated array; Key-value pairs
set	unique key;
multimap	key can appear multiple times
multiset	key can appear multiple times
unordered_map	hash ~
unordered_set	hash ~
unordered_multimap	hash ~
unordered_multiset	hash ~

Using an Associative Container

Using a map

A classic example that relies on associative arrays is a word-counting program, this program reads its input and reports how often each word appears.

//read word from istream, when word is EOF end read
int read_word(istream& in, string& word) {
    in >> word;
    if (word == ("EOF")) {
        return 0;
    }
    return 1;
}

int main() {
    // count the number of times each word occurs in the input
    map<string, size_t> count_words;
    string word;
    while (read_word(cin, word)) {
        ++count_words[word];
    }
    //in vs2019 type rfor can autofill for range
    for (const auto& w : count_words) {
        cout << w.first << " appear " << w.second << " times." << endl;
    }
    return 0;
}
//C++
//java
//C++
//c#
//EOF
//C++ appear 2 times.
//c# appear 1 times.
//java appear 1 times.

Like the sequential containers, the associative containers are templates. To define a map, we must specify both the key and value types. In this program, the map stores elements in which the keys are string and the values are size_t. When we subscript word_count, we use a string as the subscript, and we get back the size_t counter associated with that string.

Using a set

As a example, suppose some words we want exclude when count word, we can use a set to save the words need to exclude.

//SAME------
//save exclude word
while (read_word(cin, word)) {
    exclude.insert(word);
}
//count word
while (read_word(cin, word)) {
    if (exclude.find(word) == exclude.end())
        ++count_words[word];
}
//SAME------

Like the other containers, set is a template. To define a set, we specify the type of its elements, which in this case are string.

Overview of the Associative Containers

Associative containers (both ordered and unordered) support the general container operations covered in previous chapter table. However, the associative containers do not support the sequential-container position-specific operations, such as push_front or back.

Defining an Associative Container

As we’ve just seen, when we define a map, we must indicate both the key and value type; when we define a set, we specify only a key type, because there is no value type. Each of the associative containers defines a default constructor, which creates an empty container of the specified type. We can also initialize an associative container as a copy of another container of the same type or from a range of values, so long as those values can be converted to the type of the container. Under the new standard, we can also list initialize the elements:

map<string, size_t> count_words = { {"hello",3} };
set<string> exclude = { "c++","Java" };

The keys in a map or a set must be unique; there can be only one element with a given key. The multimap and multiset containers have no such restriction; there can be several elements with the same key. For example, the map we used to count words must have only one element per given word. On the other hand, a dictionary could have several definitions associated with a particular word.

// define a vector with 20 elements, holding two copies of each number from 0 to 9
vector<int> ivec;
for (vector<int>::size_type i = 0; i != 10; ++i) {
    ivec.push_back(i);
    ivec.push_back(i); // duplicate copies of each number
}
// iset holds unique elements from ivec; miset holds all 20 elements
set<int> iset(ivec.cbegin(), ivec.cend());
multiset<int> miset(ivec.cbegin(), ivec.cend());
cout << ivec.size() << endl; // prints 20
cout << iset.size() << endl; // prints 10
cout << miset.size() << endl; // prints 20

Requirements on Key Type

The associative containers place constraints on the type that is used as a key. We’ll cover the requirements for keys in the unordered containers in section 4. For the ordered containers, the key type must define a way to compare the elements. By default, the library uses the < operator for the key type to compare the keys. In the set types, the key is the element type; in the map types, the key is the first type.

The pair Type

Before we look at the operations on associative containers, we need to know about the library type named pair, which is defined in the utility header. A pair holds two data members. Like the containers, pair is a template from which we generate specific types. We must supply two type names when we create a pair. The data members of the pair have the corresponding types. There is no requirement that the two types be the same:

pair<string, string> anon; // holds two strings
pair<string, size_t> word_count; // holds a string and an size_t
pair<string, vector<int>> line; // holds string and vector<int>

We can also provide initializers for each member:

pair<string, string> author{"James", "Joyce"};

Unlike other library types, the data members of pair are public. These members are named first and second, respectively.

cout <<author.first<<author.second << endl;

Imagine we have a function that needs to return a pair. Under the new standard we can list initialize the return value:

pair<string, int> process(vector<string>& v) {
    // process v
    if (!v.empty())
        return { v.back(), v.back().size() }; // list initialize
    else
        return pair<string, int>(); // explicitly constructed return value
}

Alternatively, we could have used make_pair to generate a new pair of the appropriate type from its two arguments:

return make_pair(v.back(), v.back().size());

Operations on Associative Containers

In addition to the types listed in previous chapter, the associative containers define the types listed in next list:

key_type: type of the key for container tyoe
map_type: type associated with each key.
value_type: for set same as key_type

set<string>::value_type v1; // v1 is a string
set<string>::key_type v2; // v2 is a string
map<string, int>::value_type v3; // v3 is a pair<const string, int>
map<string, int>::key_type v4; // v4 is a string
map<string, int>::mapped_type v5; // v5 is an int

Associative Container Iterators

When we dereference an iterator, we get a reference to a value of the container’s value_type. In the case of map, the value_type is a pair in which first holds the const key and second holds the value:

// get an iterator to an element in word_count
map<string, size_t> count_words = { {"hello",3} };
auto map_it = count_words.begin();
// *map_it is a reference to a pair<const string, size_t> object
cout << map_it->first; // prints the key for this element
cout << " " << map_it->second; // prints the value of the element
map_it->first = "new key"; // error: key is const
++map_it->second; // ok: we can change the value through an iterator

Although the set types define both the iterator and const_iterator types, both types of iterators give us read-only access to the elements in the set. Just as we cannot change the key part of a map element, the keys in a set are also const. We can use a set iterator to read, but not write, an element’s value:

set<int> iset = {0,1,2,3,4,5,6,7,8,9};
set<int>::iterator set_it = iset.begin();
if (set_it != iset.end()) {
    *set_it = 42; // error: keys in a set are read-only
    cout << *set_it << endl; // ok: can read the key
}

In general, we do not use the generic algorithms with the associative containers. The fact that the keys are const means that we cannot pass associative container iterators to algorithms that write to or reorder container elements. Such algorithms need to write to the elements. The elements in the set types are const, and those in maps are pairs whose first element is const.

Associative containers can be used with the algorithms that read elements. However, many of these algorithms search the sequence. Because elements in an associative container can be found (quickly) by their key, it is almost always a bad idea to use a generic search algorithm.

Adding Elements

The insert members add one element or a range of elements. Because map and set (and the corresponding unordered types) contain unique keys, inserting an element that is already present has no effect:

vector<string> k = { "ourb","uvrgsu","ourb" };
auto m = process(k);
set<string> s;
s.insert(k.begin(), k.end());
s.insert({ "ourb","ivgr" });

The versions of insert that take a pair of iterators or an initializer list work similarly to the corresponding constructors—only the first element with a given key is inserted.

When we insert into a map, we must remember that the element type is a pair. Often, we don’t have a pair object that we want to insert. Instead, we create a pair in the argument list to insert:

// four ways to add word to word_count
word_count.insert({word, 1});
word_count.insert(make_pair(word, 1));
word_count.insert(pair<string, size_t>(word, 1));
word_count.insert(map<string, size_t>::value_type(word, 1));

The value returned by insert (or emplace) depends on the container type and the parameters. For the containers that have unique keys, the versions of insert and emplace that add a single element return a pair that lets us know whether the insertion happened. The first member of the pair is an iterator to the element with the given key; the second is a bool indicating whether that element was inserted, or was already there. If the key is already in the container, then insert does nothing, and the bool portion of the return value is false. If the key isn’t present, then the element is inserted and the bool is true.

while (read(cin,word)) {
    auto isSuccess = count_words.insert({ word,1 });
    if (!isSuccess.second) {
        ++isSuccess.first->second;
    }
}

For each word, we attempt to insert it with a value 1. If word is already in the map, then nothing happens. In particular, the counter associated with word is unchanged. If word is not already in the map, then that string is added to the map and its counter value is set to 1.

Our word-counting program depends on the fact that a given key can occur only once. That way, there is only one counter associated with any given word. Sometimes, we want to be able to add additional elements with the same key.

multimap<string, int> au;
au.insert({ "A",10 });
au.insert({ "A",20 });
cout << au.size() << endl; // 2

Erasing Elements

The associative containers define three versions of erase, which are described in next table. As with the sequential containers, we can erase one element or a range of elements by passing erase an iterator or an iterator pair. These versions of erase are similar to the corresponding operations on sequential containers: The indicated element(s) are removed and the function returns void.

version	means
erase (const_iterator position);	erase by a iterator.
erase (const key_type& k);	erase by a key.
erase (first, last)	erase by a range of element by a pair of iterator.

The version need parameter k removes all the elements, if any, with the given key and returns a count of how many elements were removed.

if (word_count.erase(removal_word))
    cout << "ok: " << removal_word << " removed\n";
else cout << "oops: " << removal_word << " not found!\n";

Subscripting a map

The map and unordered_map containers provide the subscript operator and a corresponding a~t function. The set types do not support subscripting because there is no “value” associated with a key in a set. The elements are themselves keys, so the operation of “fetching the value associated with a key” is meaningless. We can not subscript a multimap or an unordered_multimap because there may be more than one value associated with a given key.

Like the other subscript operators we’ve used, the map subscript takes an index (that is, a key) and fetches the value associated with that key. However, unlike other subscript operators, if the key is not already present, a new element is created and inserted into the map for that key.

// a empty map
map<string, string> au;
// insert a value-initialized element with key a; then assign 10 to its value
au["a"] = "10";
// a invalid kay will return initialize value and save
cout << au["b"] << endl;

au is searched for the element whose key is a. The element is not found.
A new key-value pair is inserted into au. The key is a const string holding Anna. The value is value initialized, meaning in this case that the value is "".
The newly inserted element is fetched and is given the value “10“.

The fact that the subscript operator adds an element if it is not already in the map, On the other hand, sometimes we only want to know whether an element is present and do not want to add the element if it is not.

Accessing Elements

The associative containers provide various ways to find a given element, which are described in next table. Which operation to use depends on what problem we are trying to solve. If all we care about is whether a particular element is in the container, it is probably best to use find. For the containers that can hold only unique keys, it probably doesn’t matter whether we use find or count. However, for the containers with multiple keys, count has to do more work: If the element is present, it still has to count how many elements have the same key. If we don’t need the count, it’s best to use find:

methods	means
find	Get iterator to element
count	Count elements with a specific key
lower_bound	Return iterator to lower bound
upper_bound	Return iterator to upper bound
equal_range	Get range of equal elements

multiset<int> iset = { 0,1,2,3,4,5,6,7,8,9,1 };
auto fa1 = iset.find(1); //returns an iterator that refers to the element with key = 1
auto fa2 = iset.find(11); // returns the iterator == iset.end()
auto fa3 = iset.count(1); // returns 2
auto fa4 = iset.count(11); // returns 0

Finding an element in an associative container that requires unique keys is a simple matter—the element is or is not in the container. For the containers that allow multiple keys, the process is more complicated: There may be many elements with the given key. When a multimap or multiset has multiple elements of a given key, those elements will be adjacent within the container.

multimap<int, int> ikdn;
ikdn.insert({ 1,2 });
ikdn.insert({ 2,5 });
ikdn.insert({ 1,7 });
auto num = ikdn.count(1);
auto iter = ikdn.find(1); // a iterator point to a pair
while (num) {
    cout << iter->second << endl;
    ++iter;
    --num;
}

We are guaranteed that iterating across a multimap or multiset returns all the elements with a given key in sequence.

Alternatively, we can solve our problem using lower_bound and upper_bound. Each of these operations take a key and returns an iterator. If the key is in the container, the iterator returned from lower_bound will refer to the first instance of that key and the iterator returned by upper_bound will refer just after the last instance of the key. If the element is not in the multimap, then lower_bound and upper_bound will return equal iterators; both will refer to the point at which the key can be inserted without disrupting the order. Thus, calling lower_bound and upper_bound on the same key yields an iterator range that denotes all the elements with that key.

multimap<int, int> ikdn;
ikdn.insert({ 1,2 });
ikdn.insert({ 2,5 });
ikdn.insert({ 1,7 });
for (auto i = ikdn.lower_bound(1); i != ikdn.upper_bound(1); i++) {
    cout << i->second << endl;
}

The remaining way to solve this problem is the most direct of the three approaches: Instead of calling upper_bound and lower_bound, we can call equal_range. This function takes a key and returns a pair of iterators. If the key is present, then the first iterator refers to the first instance of the key and the second iterator refers one past the last instance of the key.

for (auto pos = ikdn.equal_range(1); pos.first != pos.second; ++pos.first) {
    cout << pos.first->second << endl;
}

A Word Transformation Map

#include<bits/stdc++.h>
using namespace std;

void transform(ifstream& in_que, ifstream& in_tran) {
    map<string, string> dir;

    //read transform rules
    string sou, con, text, word;
    while (in_tran >> sou >> con) {
        dir[sou] = con;
    }
    in_tran.close();

    //read question
    while (getline(in_que, text)) {
        istringstream ss(text);
        while (ss >> word) {
            if (dir.find(word) != dir.end()) {
                cout << dir[word] << " ";
            }
            else {
                cout << word << " ";
            }
        }
        cout << endl;
    }
}

int main() {
    ifstream in_que("ques.txt");
    ifstream in_tran("trans.txt");
    transform(in_que, in_tran);
    return 0;
}

The Unordered Containers

The new standard defines four unordered associative containers. Rather than using a comparison operation to organize their elements, these containers use a hash function and the key type’s == operator. An unordered container is most useful when we have a key type for which there is no obvious ordering relationship among the elements. These containers are also useful for applications in which the cost of maintaining the elements in order is prohibitive.

Using an Unordered Container

Aside from operations that manage the hashing, the unordered containers provide the same operations (find, insert, and so on) as the ordered containers. That means that the operations we’ve used on map and set apply to unordered_map and unordered_set as well. Similarly for the unordered versions of the containers that allow multiple keys.

As a result, we can usually use an unordered container in place of the corresponding ordered container, and vice versa. However, because the elements are not stored in order, the output of a program that uses an unordered container will (ordinarily) differ from the same program using an ordered container.

For example, we can rewrite our original word-counting program to use an unordered_map and compare them;

#include<bits/stdc++.h>
using namespace std;

//uppercase string
void upperCase(string& str) {
    for (auto& i : str) {
        i = toupper(i);
    }
}

//read word from istream, when word is EOF end read
int read_word(istream& in, string& word) {
    in >> word;
    upperCase(word);
    if (word == ("EOF")) {
        return 0;
    }
    return 1;
}

int main() {
    // count the number of times each word occurs in the input
    map<string, size_t> count_words;
    unordered_map<string, size_t> unorder_count_words;

    set<string> exclude = { "c++","Java" };
    string word;
    while (read_word(cin, word)) {
        // add to order map
        auto isSuccess1 = count_words.insert({ word,1 });
        if (!isSuccess1.second) {
            ++isSuccess1.first->second;
        }
        // add to unorder map
        auto isSuccess2 = unorder_count_words.insert({ word,1 });
        if (!isSuccess2.second) {
            ++isSuccess2.first->second;
        }
    }
    //order map
    for (const auto& w : count_words) {
        cout << w.first << " appear " << w.second << " times." << endl;
    }
    cout << "__________" << endl;
    //unorder map
    for (const auto& w : unorder_count_words) {
        cout << w.first << " appear " << w.second << " times." << endl;
    }
    return 0;
}
//c
//a
//b
//eof
//A appear 1 times.
//B appear 1 times.
//C appear 1 times.
//__________
//B appear 1 times.
//C appear 1 times.
//A appear 1 times.

The unordered containers are organized as a collection of buckets, each of which holds zero or more elements. These containers use a hash function to map elements to buckets. To access an element, the container first computes the element’s hash code, which tells which bucket to search. The container puts all of its elements with a given hash value into the same bucket. If the container allows multiple elements with a given key, all the elements with the same key will be in the same bucket. As a result, the performance of an unordered container depends on the quality of its hash function and on the number and size of its buckets.

The unordered containers provide a set of functions, in here, that let us manage the buckets. These members let us inquire about the state of the container and force the container to reorganize itself as needed.

Requirements on Key Type for Unordered Containers

By default, the unordered containers use the == operator on the key type to compare elements. They also use an object of type hash<key_type> to generate the hash code for each element. The library supplies versions of the hash template for the built in types, including pointers. It also defines hash for some of the library types, including strings and the smart pointer types that we will describe in next chapter. Thus, we can directly define unordered containers whose key is one of the built-in types (including pointer types), or a string, or a smart pointer.

However, we cannot directly define an unordered container that uses a our own class types for its key type. Unlike the containers, we cannot use the hash template directly. Instead, we must supply our own version of the hash template.

(ch7) Associative Containers