Library: Localization
codecvt derives from codecvt_base and locale::facet.
A code conversion facet
#include <locale>

namespace std {
  class codecvt_base;
  template <class internT, class externT, class stateT>
  class codecvt;
}

template<> class codecvt<char, char, mbstate_t>;
template<> class codecvt<wchar_t, char, mbstate_t>;
The codecvt class template provides code conversion facilities. The implementation of codecvt<char, char, mbstate_t> performs no conversions. The codecvt<wchar_t, char, mbstate_t> specialization performs a widening/narrowing during in/out operations. (This qualifies as a conversion according to the standard, even though the value of the character remains unchanged.)
namespace std {

  class codecvt_base {
  public:
    enum result { ok, partial, error, noconv };
  };

  template <class internT, class externT, class stateT>
  class codecvt : public locale::facet, public codecvt_base {
  public:
    typedef internT intern_type;
    typedef externT extern_type;
    typedef stateT state_type;

    explicit codecvt(size_t = 0);

    result out(state_type&, const intern_type*, const intern_type*,
               const intern_type*&, extern_type*, extern_type*,
               extern_type*&) const;
    result unshift(state_type&, extern_type*, extern_type*,
                   extern_type*&) const;
    result in(state_type&, const extern_type*, const extern_type*,
              const extern_type*&, intern_type*, intern_type*,
              intern_type*&) const;
    int encoding() const throw();
    bool always_noconv() const throw();
    int length(state_type&, const extern_type*, const extern_type*,
               size_t) const;
    int max_length() const throw();

    static locale::id id;

  protected:
    virtual ~codecvt();

    virtual result do_out(state_type&, const intern_type*,
                          const intern_type*, const intern_type*&,
                          extern_type*, extern_type*,
                          extern_type*&) const;
    virtual result do_in(state_type&, const extern_type*,
                         const extern_type*, const extern_type*&,
                         intern_type*, intern_type*,
                         intern_type*&) const;
    virtual result do_unshift(state_type&, extern_type*, extern_type*,
                              extern_type*&) const;
    virtual int do_encoding() const throw();
    virtual bool do_always_noconv() const throw();
    virtual int do_length(state_type&, const extern_type*,
                          const extern_type*, size_t) const;
    virtual int do_max_length() const throw();
  };
}
intern_type
Type used for internal representation of a character value.
extern_type
Type used for external representation of a character value.
state_type
Type of the third template argument.
explicit codecvt(size_t refs = 0)
Constructs a codecvt object. Calls locale::facet (refs).
The refs argument is set to the initial value of the object's reference count. A codecvt object f constructed with (refs == 0) that is installed in one or more locale objects will be destroyed and the storage it occupies will be deallocated when the last locale object containing the facet is destroyed, as if by calling delete static_cast<locale::facet*>(&f). A codecvt object constructed with (refs != 0) will not be destroyed by any locale objects into which it may have been installed.
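The ownership rule above can be observed with a small experiment. The following is a minimal sketch, assuming C++11 or later; counting_codecvt and destroyed_by_locale are hypothetical names introduced only for this illustration. The facet counts its own destructions, so the function can report whether the locale destroyed it.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>

// Hypothetical facet that records each destruction in a caller-owned counter.
class counting_codecvt : public std::codecvt<char, char, std::mbstate_t> {
public:
    counting_codecvt(int& destroyed, std::size_t refs)
        : std::codecvt<char, char, std::mbstate_t>(refs),
          destroyed_(destroyed) {}

    // Public, so that the caller can delete the facet when refs != 0.
    ~counting_codecvt() { ++destroyed_; }

private:
    int& destroyed_;
};

// Returns how many times the facet was destroyed by the locale alone.
int destroyed_by_locale(std::size_t refs) {
    int destroyed = 0;
    counting_codecvt* facet = new counting_codecvt(destroyed, refs);
    {
        // Install the facet in a temporary locale.
        std::locale loc(std::locale::classic(), facet);
    }   // loc, the only locale containing the facet, is destroyed here

    const int count = destroyed;
    if (refs != 0)
        delete facet;   // with refs != 0 the caller keeps ownership
    return count;
}
```

Under these assumptions, destroyed_by_locale(0) should report one destruction and destroyed_by_locale(1) should report none.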
static locale::id id;
Unique identifier for this type of facet.
The public members of the codecvt facet include an interface to protected members. Each public member xxx() has a corresponding virtual protected member do_xxx(). All work is delegated to these protected members. For instance, the public length() function simply calls its protected cousin do_length().
bool always_noconv() const throw();
int encoding() const throw();
result in(state_type& state, const extern_type *from, const extern_type *from_end, const extern_type*& from_next, intern_type *to, intern_type *to_limit, intern_type*& to_next) const;
int length(state_type& state, const extern_type *from, const extern_type *end, size_t max) const;
int max_length() const throw();
result out(state_type& state, const intern_type *from, const intern_type *from_end, const intern_type*& from_next, extern_type *to, extern_type *to_limit, extern_type*& to_next) const;
result unshift(state_type& state, extern_type *to, extern_type *to_limit, extern_type*& to_next) const;
Each of these public member functions xxx simply calls the corresponding protected virtual do_xxx function.
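To illustrate the delegation from public members to protected virtuals, here is a minimal sketch of a user-defined facet, assuming C++11 or later. The names upper_codecvt and convert_out are hypothetical and not part of the library. The facet overrides only do_out() and do_always_noconv(); callers still use the public out().

```cpp
#include <cctype>
#include <cwchar>
#include <locale>
#include <string>

typedef std::codecvt<char, char, std::mbstate_t> char_codecvt;

// Hypothetical facet: upper-cases each character on output.
class upper_codecvt : public char_codecvt {
protected:
    result do_out(state_type&, const intern_type* from,
                  const intern_type* from_end, const intern_type*& from_next,
                  extern_type* to, extern_type* to_limit,
                  extern_type*& to_next) const override {
        from_next = from;
        to_next = to;
        while (from_next != from_end && to_next != to_limit) {
            const unsigned char c = static_cast<unsigned char>(*from_next++);
            *to_next++ = static_cast<char>(std::toupper(c));
        }
        return from_next == from_end ? ok : partial;
    }

    bool do_always_noconv() const noexcept override { return false; }
};

// Calls the public out(), which forwards to the protected do_out().
std::string convert_out(const char_codecvt& cvt, const std::string& text) {
    std::mbstate_t state = std::mbstate_t();
    std::string buffer(text.size(), '\0');
    const char* from = text.data();
    const char* from_next = from;
    char* to = &buffer[0];
    char* to_next = to;

    const std::codecvt_base::result r =
        cvt.out(state, from, from + text.size(), from_next,
                to, to + buffer.size(), to_next);
    if (r == std::codecvt_base::noconv)
        return text;                     // nothing was written to the buffer
    buffer.resize(static_cast<std::size_t>(to_next - to));
    return buffer;
}
```

With this sketch, convert_out(upper_codecvt(), "hello") goes through upper_codecvt::do_out(), while passing the classic locale's codecvt<char, char, mbstate_t> facet takes the noconv path and returns the input unchanged.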
virtual bool do_always_noconv() const throw();
Returns true if no conversion is required. This is the case if do_in and do_out return noconv for all valid arguments. The specialization codecvt<char,char,mbstate_t> returns true, while all other specializations return false (widening/narrowing is considered to be conversion).
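A caller can use always_noconv() to decide whether the conversion step can be skipped entirely. The following is a small sketch; the helper name conversion_can_be_skipped and the typedefs are hypothetical.

```cpp
#include <cwchar>
#include <locale>

typedef std::codecvt<char, char, std::mbstate_t> char_codecvt;
typedef std::codecvt<wchar_t, char, std::mbstate_t> wide_codecvt;

// True if the Facet installed in loc never needs to convert anything.
template <class Facet>
bool conversion_can_be_skipped(const std::locale& loc) {
    return std::use_facet<Facet>(loc).always_noconv();
}
```

For the classic locale, this is expected to be true for char_codecvt and false for wide_codecvt, matching the description above.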
virtual int do_encoding() const throw();
Returns one of the following:
-1, if the external representation of a character uses a stateful encoding
a constant number representing the maximum width in extern_type elements used to represent a character in a fixed-width encoding
0, if the external representation of the characters in the character set uses a variable size encoding
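The three kinds of return value listed above can be told apart as in the following sketch. The function names describe_encoding and classic_char_encoding are hypothetical, introduced only for illustration.

```cpp
#include <cwchar>
#include <locale>
#include <string>

typedef std::codecvt<char, char, std::mbstate_t> char_codecvt;

// Classifies the external encoding from the value returned by encoding().
std::string describe_encoding(int encoding) {
    if (encoding == -1)
        return "stateful";
    if (encoding == 0)
        return "variable-width";
    return "fixed-width";   // encoding is the width in extern_type elements
}

// encoding() of the codecvt<char, char, mbstate_t> facet in the classic locale.
int classic_char_encoding() {
    return std::use_facet<char_codecvt>(std::locale::classic()).encoding();
}
```

The codecvt<char, char, mbstate_t> facet of the classic locale uses one external element per character, so it falls in the fixed-width category.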
virtual result do_in(state_type& state, const extern_type *from, const extern_type *from_end, const extern_type*& from_next, intern_type *to, intern_type *to_limit, intern_type*& to_next) const;
virtual result do_out(state_type& state, const intern_type *from, const intern_type *from_end, const intern_type*& from_next, extern_type *to, extern_type *to_limit, extern_type*& to_next) const;
Both functions take characters in the range of [from,from_end), apply an appropriate conversion, and place the resulting characters in the buffer starting at to. Each function converts at most from_end-from source characters, and stores no more than to_limit-to characters of the destination type. Both do_out and do_in stop if they find a character they cannot convert. In any case, from_next and to_next are always left pointing to the next character beyond the last one successfully converted.
Functions do_out and do_in must be called under the following preconditions:
from <= from_end
to <= to_limit
state is either initialized to the beginning of a sequence or equal to the result of the previous conversion on the sequence.
In the case where no conversion is required, from_next is set to from and to_next is set to to. In all cases, regardless of return value, next pointers are set to point to one character past the last character that is successfully converted.
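As an illustration of the calling convention, the sketch below widens a plain ASCII string through the public in() member of the classic locale's codecvt<wchar_t, char, mbstate_t> facet. The function name widen_ascii is hypothetical, and the input is assumed to contain only ASCII characters without embedded nulls.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

typedef std::codecvt<wchar_t, char, std::mbstate_t> wide_codecvt;

// Converts an ASCII string from extern_type (char) to intern_type (wchar_t).
std::wstring widen_ascii(const std::string& text) {
    const wide_codecvt& cvt =
        std::use_facet<wide_codecvt>(std::locale::classic());

    // The state must start out in the initial shift state.
    std::mbstate_t state = std::mbstate_t();

    // One wchar_t per char is enough room for ASCII input.
    std::wstring result(text.size(), L'\0');

    const char* from = text.data();
    const char* from_next = from;
    wchar_t* to = &result[0];
    wchar_t* to_next = to;

    cvt.in(state, from, from + text.size(), from_next,
           to, to + result.size(), to_next);

    // to_next points one past the last converted character.
    result.resize(static_cast<std::size_t>(to_next - to));
    return result;
}
```

The preconditions hold by construction: from <= from_end, to <= to_limit, and state is zero-initialized before the first call.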
do_out and do_in return one of the following:
Return Value | Meaning |
ok | Successfully completed the conversion of all complete characters in the source range. |
partial | The characters in the source range would, after conversions, require space greater than that available in the destination range. |
error | Encountered either a sequence of elements in the source range forming a valid source character that could not be converted to a destination character, or a sequence of elements in the source range that could not possibly form a valid source character. |
noconv | intern_type and extern_type are the same type, and input sequence is identical to converted sequence. |
If no conversion is required, from_next is set to from and to_next is set to to. When conversion occurs, from_next is set to the element after the last complete character successfully converted.
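The return values matter most when the destination buffer is smaller than the converted text. The following sketch converts in chunks and resumes from from_next after each call. The name widen_in_chunks and the chunk parameter are hypothetical; ASCII input and the classic locale are assumed.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>
#include <vector>

typedef std::codecvt<wchar_t, char, std::mbstate_t> wide_codecvt;

// Converts text through a destination buffer of only `chunk` elements.
std::wstring widen_in_chunks(const std::string& text, std::size_t chunk) {
    std::wstring result;
    if (chunk == 0)
        return result;                    // no room to convert anything

    const wide_codecvt& cvt =
        std::use_facet<wide_codecvt>(std::locale::classic());
    std::mbstate_t state = std::mbstate_t();
    std::vector<wchar_t> buffer(chunk);

    const char* from = text.data();
    const char* const from_end = from + text.size();

    while (from != from_end) {
        const char* from_next = from;
        wchar_t* const to = &buffer[0];
        wchar_t* to_next = to;

        const std::codecvt_base::result r =
            cvt.in(state, from, from_end, from_next,
                   to, to + buffer.size(), to_next);

        // Keep whatever was converted, whether r is ok or partial.
        result.append(to, static_cast<std::size_t>(to_next - to));

        // Stop on error, or if no progress was made (for example noconv).
        if (r == std::codecvt_base::error || from_next == from)
            break;
        from = from_next;                 // resume after the converted part
    }
    return result;
}
```

The same state object is passed to every call, which satisfies the precondition that state equals the result of the previous conversion on the sequence.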
virtual int do_length(state_type& state, const extern_type *from, const extern_type *end, size_t max) const;
Determines and returns n, where n is the number of elements of extern_type in the source range [from,end) that can be converted to max or fewer characters of intern_type, as if by a call to in(state, from, end, from_next, to, to_limit, to_next) where (to_limit == to + max).
Sets the value of state to correspond to the shift state of the sequence starting at (from + n).
Function do_length must be called under the following preconditions:
state is either initialized to the beginning of a sequence or equal to the result of the previous conversion on the sequence.
(from <= end) is well-defined and true.
Note that this function does not behave similarly to the C Standard Library function mbsrtowcs(). See the mbsrtowcs.cpp example program for an implementation of this function using the codecvt facet.
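A small sketch of calling length() follows. The name external_length is hypothetical, and the input is assumed to be ASCII converted with the classic locale's facet, where each external element yields one internal character.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

typedef std::codecvt<wchar_t, char, std::mbstate_t> wide_codecvt;

// Number of chars at the start of text that convert to at most max wchar_t.
int external_length(const std::string& text, std::size_t max) {
    const wide_codecvt& cvt =
        std::use_facet<wide_codecvt>(std::locale::classic());
    std::mbstate_t state = std::mbstate_t();   // initial shift state
    const char* from = text.data();
    return cvt.length(state, from, from + text.size(), max);
}
```

Under these assumptions the result is the smaller of max and the length of the source range.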
virtual int do_max_length() const throw();
Returns the maximum value that do_length() can return for any valid combination of its first three arguments, with the fourth argument (max) set to 1.
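One common use of max_length() is sizing an external buffer before calling out(). The sketch below shows that heuristic; the name worst_case_external_size is hypothetical, and a stateful encoding may need additional room for shift sequences, which this estimate does not include.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>

typedef std::codecvt<char, char, std::mbstate_t> char_codecvt;
typedef std::codecvt<wchar_t, char, std::mbstate_t> wide_codecvt;

// Estimated number of extern_type elements for internal_chars characters.
template <class Facet>
std::size_t worst_case_external_size(const Facet& cvt,
                                     std::size_t internal_chars) {
    return internal_chars * static_cast<std::size_t>(cvt.max_length());
}
```

For codecvt<char, char, mbstate_t>, max_length() is 1, so the estimate equals the number of internal characters.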
virtual result do_out(state_type& state, const intern_type *from, const intern_type *from_end, const intern_type*& from_next, extern_type *to, extern_type *to_limit, extern_type*& to_next) const;
See do_in above.
virtual result do_unshift(state_type& state, extern_type *to, extern_type *to_limit, extern_type*& to_next) const;
Determines the sequence of extern_type elements that should be appended to a sequence whose state is given by state, in order to terminate the sequence; that is, to return it to the default or initial or unshifted state. Stores the terminating sequence starting at to, proceeding no farther than to_limit. Sets to_next to point past the last extern_type element stored. The specializations codecvt<wchar_t, char, mbstate_t> and codecvt<char, char, mbstate_t> store no characters.
do_unshift must be called under the following preconditions:
to <= to_limit
The return value from do_unshift is as shown in Table 14:
Return Value | Meaning |
ok | Terminating sequence was stored successfully. |
partial | More space is needed in the destination buffer to store the shift sequence. |
error | The state is invalid. |
noconv | No terminating sequence is needed for this state. |
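The sketch below shows how a caller might finish an output sequence with unshift(). The name finish_sequence is hypothetical, and the MB_LEN_MAX buffer size is an assumption about how long a terminating sequence can be, not something the facet guarantees.

```cpp
#include <climits>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

typedef std::codecvt<char, char, std::mbstate_t> char_codecvt;
typedef std::codecvt<wchar_t, char, std::mbstate_t> wide_codecvt;

// Appends the terminating sequence for `state`, if any, to `out`.
template <class Facet>
std::codecvt_base::result finish_sequence(const Facet& cvt,
                                          std::mbstate_t& state,
                                          std::string& out) {
    char buffer[MB_LEN_MAX];      // assumed large enough for the sequence
    char* to_next = buffer;

    const std::codecvt_base::result r =
        cvt.unshift(state, buffer, buffer + sizeof buffer, to_next);

    // Only a fully stored sequence is appended.
    if (r == std::codecvt_base::ok)
        out.append(buffer, static_cast<std::size_t>(to_next - buffer));
    return r;
}
```

For the two standard specializations no characters are stored, so out is left unchanged.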
//
// codecvt.cpp
//

#include <iostream>
#include <codecvte.h>

int main ()
{
    // not used, must be zero-initialized and supplied to facet
    std::mbstate_t state = std::mbstate_t ();

    // three strings to use as buffers
    std::string ins ("\xfc \xcc \xcd \x61 \xe1 \xd9 \xc6 \xe6 \xf5");
    std::string ins2 (ins.size (), '.');
    std::string outs (ins.size () / ex_codecvt ().encoding (), '.');

    // Print initial contents of buffers
    std::cout << "Before:\n"
              << ins << '\n'
              << ins2 << '\n'
              << outs << "\n\n";

    // Create a user defined codecvt facet
    // This facet converts from ISO Latin Alphabet No. 1
    // (ISO 8859-1) to U.S. ASCII code page 437.
    // Replace the default codecvt<char, char, mbstate_t>.
    std::locale loc (std::cout.getloc (), new ex_codecvt);

    // Retrieve the facet from the locale.
    typedef std::codecvt<char, char, std::mbstate_t> CodeCvt;
    const CodeCvt& cdcvt = std::use_facet<CodeCvt>(loc);

    // unused, must be provided to codecvt<>::in/out
    const char *const_out_next = 0;
    const char *const_in_next = 0;
    char *in_next = 0;
    char *out_next = 0;

    // convert the buffer
    cdcvt.in (state, ins.c_str(), ins.c_str() + ins.length(),
              const_in_next,
              &outs[0], &outs[0] + outs.length(), out_next);

    std::cout << "After in:\n"
              << ins << '\n'
              << ins2 << '\n'
              << outs << "\n\n";

    // zero-initialize (unused) state object
    state = std::mbstate_t ();

    // Finally, convert back to the original codeset
    cdcvt.out (state, outs.c_str(), outs.c_str() + outs.length(),
               const_out_next,
               &ins[0], &ins[0] + ins.length(), in_next);

    std::cout << "After out:\n"
              << ins << '\n'
              << ins2 << '\n'
              << outs << '\n';

    return 0;
}

Program Output:

Before:
ü Ì Í a á Ù Æ æ õ
.................
.................

After in:
ü Ì Í a á Ù Æ æ õ
.................
I I a U ' ` o

After out:
ü I I ã á U Æ æ õ
.................
I I a U ' ` o
locale, Facets, codecvt_byname
ISO/IEC 14882:1998 -- International Standard for Information Systems -- Programming Language C++, Section 22.2.1.5