Concatenated SMS
Concatenated SMS refers to the message segmentation and reassembly procedure used
to handle messages longer than a standard SMS message. This technology makes it
possible to exceed the size limit of text messages. Being familiar with
concatenated SMS is important because it affects messaging costs.

Standard SMS Messages
Standard messages have a maximum payload of 140 bytes (1120 bits). GSM phones use a 7-bit character encoding, so a standard SMS message holds at most 160 characters:

1120 bits / (7 bits/character) = 160 characters

In other words, as long as the text contains only characters from the GSM 7-bit character set, 160 7-bit characters are packed into 140 8-bit bytes, which produces the familiar 160-character limit (160 * 7 = 140 * 8).

ETSI GSM 03.38 also defines a few characters that take up two 7-bit characters each when included in a text message. Since there are only a few, they are listed here: "^", "{", "}", "\", "[", "]", "~", "|" and "€".

Unicode phones use a 16-bit character encoding, so a standard SMS message holds at most 70 characters:

1120 bits / (16 bits/character) = 70 characters

If a message contains any character that is not part of the GSM 7-bit character set (Chinese, Arabic, Thai, Cyrillic, etc.), the entire text of the SMS that goes out over the air must be encoded in the Unicode UCS-2 character set. In UCS-2 each character is encoded with 16 bits (two 8-bit bytes), so the message is limited to 70 characters (70 * 16 = 140 * 8). The size limit is therefore determined by the character set used to transmit the message.

Encodings
Phones for languages written in a Latin-based alphabet (e.g. English, French) usually support the GSM character encoding, which represents every character with 7 bits (similar to ASCII). These characters are defined in the ETSI GSM 03.38 standard. In contrast, phones for languages that are not written in a Latin-based alphabet (e.g. Chinese, Arabic) support Unicode. These phones usually use UTF-16 or UCS-2 character encoding.
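
The limits described above can be checked with a short Python sketch. It is only an illustration under stated assumptions: the names `GSM_BASIC`, `GSM_EXTENDED`, `septet_count` and `fits_single_sms` are hypothetical and do not come from any SMS library, the character tables are typed out from the GSM 03.38 default alphabet and its extension table, and national language shift tables are ignored.

```python
# Characters of the GSM 03.38 default alphabet; each one costs a single
# 7-bit character (septet). The escape code itself is left out.
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# Extension characters; each one costs two septets (escape + character).
GSM_EXTENDED = set("^{}\\[]~|€")


def septet_count(text):
    """Return the septets needed in GSM 7-bit, or None if Unicode is required."""
    total = 0
    for ch in text:
        if ch in GSM_BASIC:
            total += 1
        elif ch in GSM_EXTENDED:
            total += 2
        else:
            return None  # one unsupported character forces UCS-2 for the whole text
    return total


def fits_single_sms(text):
    """Check the 160 (GSM 7-bit) or 70 (16-bit Unicode) limit of one standard SMS."""
    septets = septet_count(text)
    if septets is not None:
        return septets <= 160
    # Count 16-bit units: two bytes per unit in UTF-16.
    units = len(text.encode("utf-16-be")) // 2
    return units <= 70
```

In this sketch a text made only of basic characters is measured one septet per character, while a character such as "{" is counted twice, so 81 of them no longer fit into a single message.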
UTF-16 and UCS-2 are both built on 16-bit units: UCS-2 always uses one unit per character, while UTF-16 uses two units for characters outside the Basic Multilingual Plane. The Unicode character set can be used to send special symbols and characters of all languages, including Chinese, Arabic, Hebrew, Cyrillic and special Eastern European characters.

Character conversions and character sets
When you send SMS messages from your PC (with the help of SMS gateway software), the text is in a Windows or UNIX character set such as UTF-8, ISO-8859-1 or ISO-8859-2, not in the GSM 7-bit or Unicode (UCS-2) character set used for SMS. Character conversion is therefore required to map PC characters to the appropriate SMS characters. This conversion determines the type of the message (SMS with GSM 7-bit characters or SMS with Unicode characters), so it needs to be handled carefully: a single character that cannot be mapped to the GSM 7-bit character set forces the whole message into Unicode and lowers the limit to 70 characters, which can lead to extra costs. Appropriate SMS gateway software handles this conversion for you.

Concatenated SMS
Concatenated SMS messages (also called multipart SMS) allow you to exceed the size limit of standard text messages by using segmentation and reassembly. For example, if you wish to send an English text message that is longer than 160 characters, the message is first segmented and transmitted through the GSM network as several SMS messages. The recipient device then reassembles the segments after receiving them and displays the result to the user as a single long text. In case of international characters this process starts when the SMS message is longer than 70 characters.

Costs
When you utilize multipart technology, you can calculate the cost of a message from the number of SMS segments it is split into.
In other words, the cost of a concatenated SMS depends on the number of messages used to transmit the text over the GSM network. For example, if an SMS message holds 240 English characters, it will be sent in two SMS messages, so it costs twice as much as a single 160-character SMS.

Note: From the above you might expect a 320-character SMS to take two SMSs. However, this is not the case. In a concatenated SMS, a single message holds only 153 characters, because the segmentation information takes up some space in the message. This information is needed to reassemble the message parts in the correct order. A 320-character SMS therefore takes 3 SMSs: the first two messages hold 153 characters each and the last message holds 14 characters. In case of international characters, a multipart message segment holds 67 characters.

Headers
When a long SMS message is segmented, a special header is added to each segment. This header tells the recipient device that the segment belongs to a multipart message that needs to be reassembled. These headers are called segmentation headers, concatenation headers or SAR headers. They are 6 bytes long (8 bits each) and are included in every physical part of the segmented SMS. SAR headers are placed in the User Data Header (UDH) field of the SMS, but they still count against the overall size limit of the message.

Summary
If you send a long text message containing only characters that are part of the GSM 03.38 character set, each SMS segment can contain up to 153 characters. (140 bytes - 6 bytes for the concatenation header leaves 134 available bytes, or 8 * 134 = 1072 bits. At most 153 7-bit characters fit into 1072 bits, since 153 * 7 = 1071.) If you send a long text message that includes any characters that require Unicode encoding, each SMS segment can contain up to 67 characters.
(67 * 16 = 1072 bits). To get |
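
The segment arithmetic and the 6-byte header can be sketched in Python as follows. This is a rough illustration, not a gateway implementation: the function names and the `price_per_sms` parameter are hypothetical, `length` is assumed to be already measured in 7-bit or 16-bit characters (with two-septet characters counted twice), and the header byte layout is taken from the common 8-bit-reference concatenation element of 3GPP TS 23.040, which this article does not spell out.

```python
SINGLE_LIMIT = {"gsm7": 160, "ucs2": 70}  # characters in a standalone SMS
PART_LIMIT = {"gsm7": 153, "ucs2": 67}    # characters per concatenated part


def part_lengths(length, encoding="gsm7"):
    """Return how many characters each SMS part carries."""
    if length <= SINGLE_LIMIT[encoding]:
        return [length]  # fits into one standard SMS, no header needed
    per_part = PART_LIMIT[encoding]
    full, rest = divmod(length, per_part)
    return [per_part] * full + ([rest] if rest else [])


def message_cost(length, price_per_sms, encoding="gsm7"):
    """Cost is the number of parts times the (assumed) price of one SMS."""
    return len(part_lengths(length, encoding)) * price_per_sms


def concat_udh(reference, total_parts, part_number):
    """Build the 6-byte concatenation header (8-bit reference variant).

    Bytes: header length 0x05, element id 0x00, element length 0x03,
    then the message reference, the total part count and the part number.
    """
    return bytes([0x05, 0x00, 0x03, reference & 0xFF, total_parts, part_number])
```

For the example used in this article, `part_lengths(320)` returns `[153, 153, 14]`, and `message_cost(240, 1)` returns `2`. A real splitter would also have to avoid cutting a two-septet character or a UTF-16 surrogate pair across two parts, which this sketch does not attempt.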
