C and C++ enforce subtle differences on the expressions to the left and right of the assignment operator

If you've been programming in either C or C++ for a while, it's likely that you've heard the terms lvalue (pronounced "ELL-value") and rvalue (pronounced "AR-value"), if only because they occasionally appear in compiler error messages. There's also a good chance that you have only a vague understanding of what they are. If so, it's not your fault.

Most books on C or C++ do not explain lvalues and rvalues very well. (I looked in a dozen books and couldn't find one explanation I liked.) This may be due to the lack of a consistent definition even among the language standards. The 1999 C Standard defines lvalue differently from the 1989 C Standard, and each of those definitions is different from the one in the C++ Standard. And none of the standards is clear.

Given the disparity in the definitions for lvalue and rvalue among the language standards, I'm not prepared to offer precise definitions. However, I can explain the underlying concepts common to the standards.

As is often the case with discussions of esoteric language concepts, it's reasonable for you to ask why you should care. Admittedly, if you program only in C, you can get by without understanding what lvalues and rvalues really are. Many programmers do. But understanding lvalues and rvalues provides valuable insights into the behavior of built-in operators and the code compilers generate to execute those operators. If you program in C++, understanding the built-in operators is essential background for writing well-behaved overloaded operators.


Basic concepts
--------------------------------------------------------------------------------

Kernighan and Ritchie coined the term lvalue to distinguish certain expressions from others. In The C Programming Language (Prentice-Hall, 1988), they wrote "An object is a manipulatable region of storage; an lvalue is an expression referring to an object....The name 'lvalue' comes from the assignment expression E1 = E2 in which the left operand E1 must be an lvalue expression."

In other words, the left and right operands of an assignment expression are themselves expressions. For the assignment to be valid, the left operand must refer to an object; it must be an lvalue. The right operand can be any expression. It need not be an lvalue. For example:

int n;

declares n as an object of type int. When you use n in an assignment expression such as:

n = 3;

n is an expression (a subexpression of the assignment expression) referring to an int object. The expression n is an lvalue.

Suppose you switch the left and right operands around:

3 = n;

Unless you're a former Fortran programmer, this is obviously a silly thing to do. The assignment is trying to change the value of an integer constant. Fortunately, C and C++ compilers reject it as an error. The basis for the rejection is that, although the assignment's left operand 3 is an expression, it's not an lvalue. It's an rvalue. It doesn't refer to an object; it just represents a value.

I don't know where the term rvalue comes from. Neither edition of the C Standard uses it, other than in a footnote stating "What is sometimes called 'rvalue' is in this standard described as the 'value of an expression.'"

The C++ Standard does use the term rvalue, defining it indirectly with this sentence: "Every expression is either an lvalue or an rvalue." So an rvalue is any expression that is not an lvalue.

Numeric literals, such as 3 and 3.14159, are rvalues. So are character literals, such as 'a'. An identifier that refers to an object is an lvalue, but an identifier that names an enumeration constant is an rvalue. For example:

enum color { red, green, blue };
color c;
...
c = green;    // ok
blue = green;    // error

The second assignment is an error because blue is an rvalue.
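The classification above can be checked at compile time. What follows is only a sketch, and it assumes a C++11 compiler, which postdates this column: the IS_LVALUE macro and the paint function are hypothetical names introduced for illustration. The macro relies on decltype((e)) producing a type of the form T& exactly when the expression e is an lvalue.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper (not from the column): true when expression e is an lvalue.
// decltype((e)) is an lvalue reference type T& exactly when e is an lvalue.
#define IS_LVALUE(e) (std::is_lvalue_reference<decltype((e))>::value)

enum color { red, green, blue };

color c = red;

static_assert(IS_LVALUE(c), "c names an object, so it is an lvalue");
static_assert(!IS_LVALUE(green), "an enumeration constant is an rvalue");
static_assert(!IS_LVALUE(3), "a numeric literal is an rvalue");
static_assert(!IS_LVALUE('a'), "a character literal is an rvalue");

// Hypothetical function: assigns an rvalue to the object designated by c.
color paint() {
    c = green;        // ok: the left operand is an lvalue
    // blue = green;  // error: blue is an rvalue
    return c;
}
```

If any static_assert here were wrong, the file would fail to compile, so a clean build is itself the confirmation.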

Although you can't use an rvalue as an lvalue, you can use an lvalue as an rvalue. For example, given:

int m, n;

you can assign the value in n to the object designated by m using:

m = n;

This assignment uses the lvalue expression n as an rvalue. Strictly speaking, a compiler performs what the C++ Standard calls an lvalue-to-rvalue conversion to obtain the value stored in the object to which n refers.
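One way to see that the conversion fetches a value rather than tying m to the object n is to change n afterward. This is a small sketch with a hypothetical function name, copy_then_change, chosen only for illustration.

```cpp
#include <cassert>

// Hypothetical function: shows that m = n copies the value stored in n.
int copy_then_change() {
    int m = 0, n = 42;
    m = n;     // n is an lvalue used as an rvalue: its stored value is read
    n = 7;     // a later change to n does not affect m
    return m;  // still the value that was read from n
}
```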


Lvalues in other expressions
-------------------------------------------------------------------------------

Although lvalues and rvalues got their names from their roles in assignment expressions, the concepts apply in all expressions, even those involving other built-in operators.

For example, both operands of the built-in binary operator + must be expressions. Obviously, those expressions must have suitable types. After conversions, both expressions must have the same arithmetic type, or one expression must have a pointer type and the other must have an integer type. But either operand can be either an lvalue or an rvalue. Thus, both x + 2 and 2 + x are valid expressions.

Although the operands of a binary + operator may be lvalues, the result is always an rvalue. For example, given integer objects m and n:

m + 1 = n;

is an error. The + operator has higher precedence than the = operator. Thus, the assignment expression is equivalent to:

(m + 1) = n;    // error

which is an error because m + 1 is an rvalue.
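Overload resolution offers another way to observe that the result of + is an rvalue. The sketch below assumes C++11 rvalue references, which the column does not discuss; the category and probe_* functions are hypothetical names. An int& parameter binds only to lvalues, and an int&& parameter binds only to rvalues.

```cpp
#include <cassert>
#include <string>

// Hypothetical probes: the compiler selects int& for an lvalue argument
// and int&& for an rvalue argument.
std::string category(int&)  { return "lvalue"; }
std::string category(int&&) { return "rvalue"; }

std::string probe_object() {
    int m = 4;
    return category(m);      // m designates an object
}

std::string probe_sum() {
    int m = 4;
    return category(m + 1);  // the result of + is just a value
}

std::string probe_swapped() {
    int x = 4;
    return category(2 + x);  // operand order makes no difference
}
```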

As another example, the unary & (address-of) operator requires an lvalue as its operand. That is, &n is a valid expression only if n is an lvalue. Thus, an expression such as &3 is an error. Again, 3 does not refer to an object, so it's not addressable.

Although the unary & requires an lvalue as its operand, its result is an rvalue. For example:

int n, *p;
...
p = &n;    // ok
&n = p;    // error: &n is an rvalue

In contrast to unary &, unary * produces an lvalue as its result. A valid, non-null pointer p always points to an object, so *p is an lvalue. For example:

int a[N];
int *p = a;
...
*p = 3;     // ok

Although the result is an lvalue, the operand can be an rvalue, as in:

*(p + 1) = 4;    // ok
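The behavior of unary & and unary * can be sketched as a compilable fragment. It assumes a C++11 compiler; the IS_LVALUE macro, the function name, and the array size N = 4 are all assumptions made for illustration, since the column leaves N unspecified.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: true when expression e is an lvalue.
#define IS_LVALUE(e) (std::is_lvalue_reference<decltype((e))>::value)

const int N = 4;  // assumed size; not given in the column

// Hypothetical function: stores through *p and *(p + 1).
int store_through_pointer() {
    int a[N] = {0, 0, 0, 0};
    int *p = a;

    static_assert(!IS_LVALUE(&a[0]), "the result of unary & is an rvalue");
    static_assert(IS_LVALUE(*p), "the result of unary * is an lvalue");
    static_assert(!IS_LVALUE(p + 1), "p + 1 is an rvalue");
    static_assert(IS_LVALUE(*(p + 1)), "*(p + 1) is nonetheless an lvalue");

    *p = 3;        // ok
    *(p + 1) = 4;  // ok: rvalue operand, lvalue result
    return a[0] * 10 + a[1];
}
```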


Data storage for rvalues
--------------------------------------------------------------------------------

Conceptually, an rvalue is just a value; it doesn't refer to an object. In practice, it's not that an rvalue can't refer to an object. It's just that an rvalue doesn't necessarily refer to an object. Therefore, both C and C++ insist that you program as if rvalues don't refer to objects.

The assumption that rvalues do not refer to objects gives C and C++ compilers considerable freedom in generating code for rvalue expressions. Consider an assignment such as:

n = 1;

where n is an int. A compiler might generate named data storage initialized with the value 1, as if 1 were an lvalue. It would then generate code to copy from that initialized storage to the storage allocated for n. In assembly language, this might look like:

one: .word 1
...
mov (one), n

Many machines provide instructions with immediate operand addressing, in which the source operand can be part of the instruction rather than separate data. In assembly, this might look like:

mov #1, n

In this case, the rvalue 1 never appears as an object in the data space. Rather, it appears as part of an instruction in the code space.

On some machines, the fastest way to put the value 1 into an object is to clear it and then increment it, as in:

clr n
inc n

Clearing the object sets it to zero. Incrementing adds one. Yet data representing the values 0 and 1 appear nowhere in the object code.


More to come
--------------------------------------------------------------------------------

Although it's true that rvalues in C do not refer to objects, it's not so in C++. In C++, rvalues of a class type do refer to objects, but they still aren't lvalues. Thus, everything I've said thus far about rvalues is true as long as we're not dealing with rvalues of a class type.
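A brief sketch of what it means for a class rvalue to refer to an object: a member function can be called on it, and that function reads the object's data through this. The Widget type and both functions are hypothetical and exist only for illustration.

```cpp
#include <cassert>

// Hypothetical class for illustration.
struct Widget {
    int value;
    // A member function operates on the object it is called on, via this.
    int get() const { return value; }
};

// Returns a Widget by value, so the call expression is an rvalue of class type.
Widget make_widget(int v) {
    Widget w = {v};
    return w;
}

int read_from_temporary() {
    // make_widget(7) is an rvalue, yet there is an object whose member is read.
    return make_widget(7).get();
}
```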

Although lvalues do designate objects, not all lvalues can appear as the left operand of an assignment. I'll pick up with this in my next column.

The follow-up column follows:

 

Non-modifiable Lvalues
--------------------------------------------------------------------------------


Lvalues actually come in a variety of flavors. If you really want to understand how compilers evaluate expressions, you'd better develop a taste.

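The meaning of the const qualifier, as in int const m;
It does not say that the value of the object can never be changed. It says that the const-qualified expression cannot be used to modify the object it refers to!
e.g.:
int m;
int const *p = &m;
m += 1;  // ok: m is a modifiable lvalue
*p += 1; // error: *p is a non-modifiable lvalue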
 

An expression is a sequence of operators and operands that specifies a computation. That computation might produce a resulting value and it might generate side effects. An assignment expression has the form:

e1 = e2

where e1 and e2 are themselves expressions. The right operand e2 can be any expression, but the left operand e1 must be an lvalue expression. That is, it must be an expression that refers to an object. As I explained last month ("Lvalues and Rvalues," June 2001, p. 70), the "l" in lvalue stands for "left," as in "the left side of an assignment expression." For example:

int n;

declares n as an object of type int. When you use n in an assignment expression such as:

n = 3;

the n is an expression (a subexpression of the assignment expression) referring to an int object. The expression n is an lvalue. On the other hand:

3 = n;

causes a compilation error, and well it should, because it's trying to change the value of an integer constant. Although the assignment's left operand 3 is an expression, it's not an lvalue. It's an rvalue. An rvalue is simply any expression that is not an lvalue. It doesn't refer to an object; it just represents a value.

Although lvalue gets its name from the kind of expression that must appear to the left of an assignment operator, that's not really how Kernighan and Ritchie defined it. In the first edition of The C Programming Language (Prentice-Hall, 1978), they defined an lvalue as "an expression referring to an object." At that time, the set of expressions referring to objects was exactly the same as the set of expressions eligible to appear to the left of an assignment operator. But that was before the const qualifier became part of C and C++.

The const qualifier renders the basic notion of lvalues inadequate to describe the semantics of expressions. We need to be able to distinguish between different kinds of lvalues. And that's what I'm about to show you how to do. But first, let me recap.


A few key points
--------------------------------------------------------------------------------

The assignment operator is not the only operator that requires an lvalue as an operand. The unary & (address-of) operator requires an lvalue as its sole operand. That is, &n is a valid expression only if n is an lvalue. Thus, an expression such as &3 is an error. The literal 3 does not refer to an object, so it's not addressable.

Not only is every operand either an lvalue or an rvalue, but every operator yields either an lvalue or an rvalue as its result. For example, the binary + operator yields an rvalue. Given integer objects m and n:

m + 1 = n;

is an error. The + operator has higher precedence than the = operator. Thus, the assignment expression is equivalent to:

(m + 1) = n; // error

which is an error because m + 1 is an rvalue.

An operator may require an lvalue operand, yet yield an rvalue result. The unary & is one such operator. For example:

int n, *p;
...
p = &n; // ok
&n = p; // error: &n is an rvalue


On the other hand, an operator may accept an rvalue operand, yet yield an lvalue result, as is the case with the unary * operator. A valid, non-null pointer p always points to an object, so *p is an lvalue. For example:

int a[N];
int *p = a;
...
*p = 3; // ok

Although the result is an lvalue, the operand can be an rvalue, as in:

*(p + 1) = 4; // ok

With this in mind, let's look at how the const qualifier complicates the notion of lvalues.


Lvalues and the const qualifier
--------------------------------------------------------------------------------

A const qualifier appearing in a declaration modifies the type in that declaration, or some portion thereof. For example:

int const n = 127;

declares n as an object of type "const int." The expression n refers to an object, almost as if const weren't there, except that n refers to an object the program can't modify. For example, an assignment such as:

n = 0; // error, can't modify n

produces a compile-time error, as does:

++n; // error, can't modify n

(I covered the const qualifier in depth in several of my earlier columns. See "Placing const in Declarations," June 1998, p. 19 or "const T vs. T const," February 1999, p. 13, among others.) How is an expression referring to a const object such as n any different from an rvalue? After all, if you rewrite each of the previous two expressions with an integer literal in place of n, as in:

7 = 0; // error, can't modify literal
++7; // error, can't modify literal

they're both still errors. You can't modify n any more than you can an rvalue, so why not just say n is an rvalue, too? The difference is that you can take the address of a const object, but you can't take the address of an integer literal. For example:

int const *p;
...
p = &n; // ok
p = &7; // error

Notice that p declared just above must be a "pointer to const int." If you omitted const from the pointer type, as in:

int *p;

then the assignment:

p = &n; // error, invalid conversion

would be an error. When you take the address of a const int object, you get a value of type "pointer to const int," which you cannot convert to "pointer to int" unless you use a cast, as in:

p = (int *)&n; // (barely) ok

Although the cast makes the compiler stop complaining about the conversion, it's still a hazardous thing to do. (See "What const Really Means," August 1998, p. 11.)

Thus, an expression that refers to a const object is indeed an lvalue, not an rvalue. However, it's a special kind of lvalue called a non-modifiable lvalue: an lvalue that you can't use to modify the object to which it refers. This is in contrast to a modifiable lvalue, which you can use to modify the object to which it refers.
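The three properties of n discussed so far (it is an lvalue, it is not assignable, and its address has type "pointer to const int") can be stated as compile-time checks. This is a sketch assuming a C++11 compiler and the standard <type_traits> header; read_through_pointer is a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>

int const n = 127;

// n is an lvalue: decltype((n)) is a reference type.
static_assert(std::is_lvalue_reference<decltype((n))>::value,
              "n refers to an object");

// n is non-modifiable: an int cannot be assigned to it.
static_assert(!std::is_assignable<decltype((n)), int>::value,
              "n = 0 would not compile");

// Taking its address is allowed and yields a pointer to const int.
static_assert(std::is_same<decltype(&n), int const *>::value,
              "&n has type pointer to const int");

// Hypothetical function: reading through the pointer is fine.
int read_through_pointer() {
    int const *p = &n;  // ok
    // int *q = &n;     // error, invalid conversion
    return *p;
}
```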

Once you factor in the const qualifier, it's no longer accurate to say that the left operand of an assignment must be an lvalue. Rather, it must be a modifiable lvalue. In fact, every arithmetic assignment operator, such as += and *=, requires a modifiable lvalue as its left operand. For all scalar types:

x += y; // arithmetic assignment

is equivalent to:

x = x + y; // assignment

except that it evaluates x only once. Since the x in this assignment must be a modifiable lvalue, it must also be a modifiable lvalue in the arithmetic assignment.

Not every operator that requires an lvalue operand requires a modifiable lvalue. The unary & operator accepts either a modifiable or a non-modifiable lvalue as its operand. For example, given:

int m;
int const n = 10;

&m is a valid expression returning a result of type "pointer to int," and &n is a valid expression returning a result of type "pointer to const int."
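Two claims from this passage lend themselves to a quick check: that x += y evaluates x only once, and that unary & accepts both kinds of lvalue. The sketch below uses a hypothetical helper, next_slot, whose only job is to count how often the left operand is evaluated; the other names are likewise illustrative, and the type checks assume C++11.

```cpp
#include <cassert>
#include <type_traits>

int calls = 0;
int a[2] = {10, 20};

// Hypothetical helper with a side effect: counts its calls, always yields index 0.
int next_slot() { ++calls; return 0; }

int compound_calls() {
    calls = 0;
    a[next_slot()] += 1;                  // left operand evaluated once
    return calls;
}

int spelled_out_calls() {
    calls = 0;
    a[next_slot()] = a[next_slot()] + 1;  // operand expression written, and evaluated, twice
    return calls;
}

int m;
int const n = 10;

static_assert(std::is_same<decltype(&m), int *>::value,
              "&m has type pointer to int");
static_assert(std::is_same<decltype(&n), int const *>::value,
              "&n has type pointer to const int");
```

The difference matters whenever the left operand has side effects, as a[next_slot()] does here.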


What it is that's really non-modifiable
--------------------------------------------------------------------------------
Earlier, I said a non-modifiable lvalue is an lvalue that you can't use to modify an object. Notice that I did not say a non-modifiable lvalue refers to an object that you can't modify; I said you can't use the lvalue to modify the object. The distinction is subtle but nonetheless important, as shown in the following example. Consider:

int n = 0;
int const *p;
...
p = &n;

At this point, p points to n, so *p and n are two different expressions referring to the same object. However, *p and n have different types. As I explained in an earlier column ("What const Really Means"), this assignment uses a qualification conversion to convert a value of type "pointer to int" into a value of type "pointer to const int." Expression n has type "(non-const) int." It is a modifiable lvalue. Thus, you can use n to modify the object it designates, as in:

n += 2;

On the other hand, p has type "pointer to const int," so *p has type "const int." Expression *p is a non-modifiable lvalue. You cannot use *p to modify the object n, as in:

*p += 2;

even though you can use expression n to do it. Such are the semantics of const in C and C++.
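The example can be assembled into one function to show that a change made through n is visible through *p, because both expressions designate the same object. This is a sketch; the function name is hypothetical and the static_assert checks assume C++11.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical function: modifies the object via n, then reads it via *p.
int observe_through_const_pointer() {
    int n = 0;
    int const *p = &n;  // qualification conversion: int * to int const *

    static_assert(std::is_same<decltype((n)), int &>::value,
                  "n is a modifiable lvalue");
    static_assert(std::is_same<decltype((*p)), int const &>::value,
                  "*p is a non-modifiable lvalue");

    n += 2;      // ok
    // *p += 2;  // error: cannot modify the object through *p
    return *p;   // reads the same object that n designates
}
```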


In summary
--------------------------------------------------------------------------------
Every expression in C and C++ is either an lvalue or an rvalue. An lvalue is an expression that designates (refers to) an object. Every lvalue is, in turn, either modifiable or non-modifiable. An rvalue is any expression that isn't an lvalue. Operationally, the difference among these kinds of expressions is this:

  • A modifiable lvalue is addressable (can be the operand of unary &) and assignable (can be the left operand of =).
  • A non-modifiable lvalue is addressable, but not assignable.
  • An rvalue is neither addressable nor assignable. 

Again, as I cautioned last month, all this applies only to rvalues of a non-class type. Classes in C++ mess up these concepts even further.
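The three-way split in this summary can be sketched as a small classifier. It is an illustration under assumptions, not a complete test: it requires C++11, it looks only at top-level const, and it is meant for expressions of scalar type such as int. The classify template, the CLASSIFY macro, and the kind_of_* functions are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical classifier; T is expected to be decltype((expr)).
template <typename T>
std::string classify() {
    if (!std::is_lvalue_reference<T>::value)
        return "rvalue";
    if (std::is_const<typename std::remove_reference<T>::type>::value)
        return "non-modifiable lvalue";
    return "modifiable lvalue";
}

#define CLASSIFY(e) (classify<decltype((e))>())

int m = 0;
int const n = 10;

std::string kind_of_m()   { return CLASSIFY(m); }      // addressable and assignable
std::string kind_of_n()   { return CLASSIFY(n); }      // addressable, not assignable
std::string kind_of_sum() { return CLASSIFY(m + 1); }  // neither
```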

Dan Saks is a high school track coach and the president of Saks & Associates, a C/C++ training and consulting company. You can write to him at dsaks@wittenberg.edu.

 

This article comes from a CSDN blog; when reposting, please cite the source: http://blog.csdn.net/SeeSeaBee/archive/2007/09/08/1777120.aspx

Posted on 2010-02-06 22:41 by zhaoyg. Category: C/C++ study notes
