Interacting with Physical Objects

chapter 8 Interacting with Physical Objects

section 8.1 Affordance revisited — what we can do and what we think we can do
section 8.2 Affordances of the artificial

figure 8.1 Glued coins and door handles

section 8.3 Adapted for new actions

figure 8.2 John Locke’s ‘An Essay Concerning Humane Understanding’ [Lo90].
figure 8.3 Screen icons imitate physical controls

section 8.4 Action as investigation

figure 8.4 Embodiment in Tetris: (top) mirror image pieces are especially hard to rotate mentally; (bottom) where does it fit? Rotating the piece makes it easier to tell.

section 8.5 Letting the world help

figure 8.5 Compliant action: (i) screw slightly misplaced, but naturally (ii) slides, and (iii) rotates, (iv) into place.
box 8.1 Tight screws
figure 8.6 Folding clean linen in perfect thirds

Objects in the world may have all sorts of properties and impact on our senses in many ways, but when thinking about design the most critical are what we can do with them (e.g. pick up, throw), or what they may do to us (hit us!). When we interact with objects they become part of our own activities and lives.

8.1 Affordance revisited — what we can do and what we think we can do

We briefly mentioned James Gibson’s affordance theory in Chapter 5. Recall that affordance is about what it is possible to do with a thing, the potential for action of an object or the world in general [Gi79]. Note that what the environment affords depends on the person or animal looking at it. Whether or not a rock shelf affords sitting upon depends on your leg length; whether a large stone affords throwing depends on whether you are an Olympic shot putter or not. The differences between humans are relatively small, but certainly what the environment affords a song thrush is very different to what it affords a snow tiger. Affordance is fundamentally relational, and can only be understood by considering both the environment and the agent who can act (or be acted upon) in the environment. It is for this reason that Gibson’s approach is often called ecological psychology: it is about the creature in an environment.

However, we are creatures adapted to our (natural) environment, and so Gibson argued further that our whole perceptual system is attuned to seeking out the action potential of the environment. Our eyes, our ears, indeed all our senses and the brain processing that goes with them, are designed in order to act. This is precisely the emphasis of embodied mind philosophy introduced in Chapter 5.

Because of this adaptation to the environment, Gibson argues that the perception of affordance is immediate. We do not need to go through abstract reasoning steps:

rock is about four inches across and rock is in reach,
therefore rock can be picked up

Instead we are immediately aware of the pertinent qualities of the object itself, of our own bodies in the environment, and of the available affordances. Even that awareness may itself be subconscious. We do not explicitly think, ‘that cup can be picked up’, we just do it and drink.

8.2 Affordances of the artificial

Gibson’s illustrations of affordance include artificial phenomena and objects as well as natural ones. However, there is a crucial difference. It is reasonable to believe that the human species has adapted over millennia to the natural world and hence its affordances are directly available to us. The low-level discrimination of our eyes, the range of frequencies that our ears pick up, the turning of our eyes and head in response to movement at the edge of our vision, the ways in which our brains process these and react to them — they are all there to support acting in this natural environment.

It is not so clear why we are able to ascertain that a cup can be picked up, let alone that a television can be switched on using a remote control. Indeed, in the human-crafted world things often go awry. Norman highlighted many examples in his book ‘The Design of Everyday Things’ (originally entitled ‘The Psychology of Everyday Things’, but when it moved into paperback the publishers obviously thought ‘design’ was a better draw than ‘psychology’), which popularised the word ‘affordance’ within the HCI and design communities, albeit with a somewhat different meaning from Gibson’s [No98].

Many building doors look symmetric. Though they only open in one direction, the handles look the same whichever side you approach from. You see the door in the distance, and as you get closer you can see a sign attached to it. The sign says ‘Push’. You read the sign, acknowledge it and reach for the handle. Then you pull the door and nearly jar your arm out of its socket. If you have done this, the good news is that it’s not your fault. Essentially, the door carries two signs: the handle itself, telling you to pull, and the written one telling you to push. The physical feel of the handle triggers your immediate response to it, and you pull. It is possible to overrule this urge, but only if you are concentrating, and this is not the type of task we generally concentrate on!

The problem is that the perceived affordance is different from the actual affordance.

Fig. 8.1 Glued coins and door handles

http://www.flickr.com/photos/joecws/3656880626/ https://www.blog.theteamw.com/2011/01/19/100-things-you-should-know-about-people-53-people-see-cues-about-how-to-use-an-object/

Sometimes these disparities between appearance and action are deliberate, as in the joke of gluing a coin to the pavement in order to see people struggle to pick it up. More often, however, as in the case of the door handles, they are the result of accident or poor design (see Figure 8.1).

While our senses are tuned to the affordances of the natural world, there is no a priori reason why the affordances of constructed (and hence unnatural) objects should be apparent, unless they are designed to be so. That is, we have a design heuristic:

design so that the affordances suggested by appearance match those actually afforded by the artefact

The surprising thing is not that there are problems like confusing door handles in the constructed environment, but that there are not more. How is it that we manage at all?

It is partly to do with the objects themselves, and partly with the skill of the designer:

necessary physical correlates — The purpose and use of an artificial object, that is what it affords, are constrained by the properties of the physical world (including people) in which it will be used. If you design a cup to hold drinks then it will need to have a suitable hollow in it, and this hollow will be similar to those in natural objects that afford holding fluid. Likewise, if the cup is designed to be picked up, it will tend to have a size and shape similar to natural pick-up-able objects.
design selection — Successful designs can be used easily. Therefore it is reasonable to suppose that over time traditional designs will end up with perceived affordances that match their actual affordances. The pace of mass manufacture means that this is not necessarily the case for more explicitly designed goods, though it is still likely that those with successful sales match appearance and affordance, whether deliberately or accidentally.
by design — A good designer may either explicitly realise that they need to match perceived and actual affordance, or simply ‘know’ what is right.
imitation and artefact ancestry — Often artificial objects mimic natural objects, or derive from previous artificial objects that themselves mimicked natural ones or were subject to physical correlates, design selection or plain good design.

8.3 Adapted for new actions

Not only are objects adapted to people, we also adapt to the world. Some creatures operate almost purely by instinctive reaction. Frogs have dedicated visual areas that respond to a small moving dot (that is, a fly) and trigger the tongue to dart in the right direction to catch it. If you try to feed a frog dead flies, no matter how fresh, it will never notice them and will starve with food in front of it.

Higher animals, and especially humans, are much more adaptable and learn both in infancy and as adults. Rather than being born pre-programmed for the affordances of a particular natural environment, we are instead able to learn these affordances. That is, the immediate perception of (natural) affordances is not entirely innate; rather, we are born with the ability to form new associations between perception and action. This has given us the ability to adapt ourselves to different natural environments, from savannah to tundra, and also to artificial environments of our own making.

Note that this adaptability and ability to learn does not mean we are born without innate responses. The idea of the child’s mind being a tabula rasa, a blank slate ready to be written upon by experience and education, was present in some of the earliest philosophy, but is most closely associated with John Locke’s ‘An Essay Concerning Humane Understanding’ [Lo90]. However, more recent philosophers and psychologists, especially those taking a more embodied view of the mind, suggest that there is far more that is ‘inbuilt’.

Fig. 8.2 John Locke’s ‘An Essay Concerning Humane Understanding’ [Lo90].

http://www.gutenberg.org/ebooks/10615

Evidence for this includes the fact that babies can perform mirror actions from an early age, imitating facial expressions and movements before there has been any chance to learn the association [Ga05, p.159;MM77]. Similarly they can move their heads in the direction of sounds and in other ways that suggest that a basic model of the physical self and an egocentric ‘model’ space are there from birth or develop spontaneously soon after.

Brain imaging has shown the existence of so-called ‘mirror neurons’. When we see another person doing an action, particular parts of our brain (the mirror neurons) fire, and these in turn connect to exactly the same parts of the brain as are active when we perform the same action ourselves [Ga05, p.220; Ra04, p.37,107]. Seeing someone else doing something is nearly the same as doing it oneself!

It is possible that the relationship between our proprioceptive sense of body position and the actions of muscle movement is effectively ‘learned’ through association, maybe even in the womb. However, the ability to see another person and mirror their actions, that is, the development of the mirror neurons, cannot be learned before we see others and so appears to be born in us, part of our initial neural wiring.

Looking at other animals, it is clear that they too have an innate sense of body presence and movement, and indeed their ‘initial wiring’ seems if anything far stronger than our own. The young of most animals, including those that exhibit learning, can function to some extent either from birth or at a young age; think of those Bambi-like pictures of young fawns standing and running with the herd moments after birth. However, animals with more complex social behaviour (dogs, elephants) often have infants who are more dependent for longer, suggesting that a stronger role for life-long learning necessitates weaker innate instincts.

Humans are the extreme, and indeed Clive Bromhall’s book ‘The Eternal Child’ [Br03] suggests that we are effectively always infants, not so much naked apes as ones that never mature enough to have hair! He argues that the infantile features and behaviours in some species allowed them to be more social, because they avoided the extremes of territoriality that come with adulthood. In humans this effect may have been amplified by runaway sexual selection. Following this argument, our cognitive flexibility is simply a side-effect of our perpetual neonate status; we retain the neural plasticity of a pre-born and the curiosity and sociability of a child, while having the physical size and strength of an adult.

Others would perhaps swap these round and see the cognitive advantages coming first, but in both views it would be a big step from weaker instincts to no instinctive understanding. While the price of sociability and cognitive flexibility may be that human infants do not have the ability to run like a deer, it seems likely both from empirical evidence and common sense that we all start with a level of innate grasp of the body and world that acts as a bootstrap for later learned understanding.

The problem of distinguishing innate understanding from understanding developed early as an infant arises precisely because of our rich ability to learn, especially if we are ‘hard wired’ to learn certain kinds of things easily. As tiny infants we reach out, touch things, push beads and blocks. Are we born with the understanding that if we push things they move, or is it that we pick this up in our first few months of life? Whichever it is, our grasp of the physical world is a form of birthright, available to every child brought up in normal circumstances, anywhere in the world and at any time in history.

The end-point of this ability to cope with savannah and tundra is that we are also able to learn about light-switches and TV remotes. We are natural born affordance-seekers, learning new ways to act in the world and then adding these to our repertoire. The conventions that we learn, whether turning a water tap or switching on a light, are cultural affordances, which then become part of the way that future action potential is perceived.

However, while the basic properties of the physical world do not change, these conventions do. Cultures vary geographically and travellers find it difficult to re-learn how to turn on a shower, or even a light. Cultures also change over time. Sometimes old conventions are maintained long after the physical reasons for them have faded; for example the play/pause/fast-forward/rewind buttons of the old analogue tape recorder are now preserved in digital controls. However, changing conventions can disenfranchise. For example, elderly people may have difficulty in grasping the action potential of onscreen menus that seem second nature to those who have grown up with computers.

Fig. 8.3 Screen icons imitate physical controls

Andressa Rodrigues (CC0): https://pixabay.com/en/record-player-jukebox-tape-176970/

OpenClipart (CC0): https://pixabay.com/en/sound-button-glossy-set-player-145674/

Our cognitive flexibility also allows us to reason, even when the appearance of things does not immediately suggest use:

1. There is a water tap, but no handle to twist or press.
2. There is a foot pedal and it is attached to the water pipe.
3. Therefore try pressing the foot pedal.

Of course this reasoning itself makes use of a combination of basic properties of the physical world, learned associations, and culturally specific clues. There is a hierarchy here, from those things that are either innate or learned as a tiny infant, to those that we learn culturally, to those that we have to work out in the situation. As we move up this hierarchy to more and more complex behaviours, our reactions tend to be slower, require more mental effort and potentially become more error prone.

It is perhaps the times when errors occur ‘lower down’ the hierarchy that things go most dramatically wrong. A short while ago, Alan was in a stairwell at night. There was no natural light and in the dark he felt around close to the door he had just emerged from. Eventually he found a small square pad, rather like (cultural affordance) the timed light switches that you press to give you a few minutes of light. He gently pressed, it did not instantly move, so he pressed a little harder (physical understanding — if it doesn’t move press harder). A moment later he felt and heard a crack beneath his fingers and the fire alarm rang!

Earlier we suggested that a good heuristic for affordances was to design things so that the perceptual affordances suggest the actual action potential, but even more important as a design rule is:

make sure that the perceptual affordances (natural or cultural) do not suggest erroneous and harmful behaviour

If you can’t get it right, don’t make it too wrong!

8.4 Action as investigation

Situation 1: You are in your own home and wake early in the morning. You go to your kitchen, open a cupboard and take out the coffee. A few minutes later you return to bed with a mug of coffee and a book.

Situation 2: A few days later you are visiting a friend. You wake early in the morning and go to the kitchen to make a coffee. In front of you is an array of identical cupboards, but which contains the coffee? You have various ways of finding out. You could go and wake your friend to ask. You could reason it out, maybe selecting the cupboard nearest the stove. However, what you are likely to do is open a few doors until you find the coffee.

In Situation 1 you know where the coffee is, since it is your own kitchen (the knowledge is in your head), but in Situation 2 the knowledge is in a sense ‘out there’ in the kitchen, so you need to explore to find out. In both situations you open a cupboard, but in your own kitchen the action is pragmatic, it achieves a concrete purpose, whereas opening the cupboards in Situation 2 is an epistemic action, an action intended to give you knowledge.

You can probably think of many epistemic actions: looking in the fridge to see what you need to replenish, turning a tomato in a supermarket to check it is not damaged on the hidden side, tugging the leaves of a pineapple to see if it is ripe, craning your neck to check the road is clear before driving off. Many of these are incidental to the main purpose of your activity (shopping, getting from A to B, drinking a mug of coffee), but in some cases, such as turning the pages of a book, the primary purpose of the activity is purely informational.

We have already discussed how perception and thinking are not disembodied, but are intimately tied to our being creatures who act in the world: we perceive in order to act. Epistemic actions are in a sense the other side of this, in that we act in order to perceive. Indeed, Gibson argues that the boundaries are largely artificial, so that seeing is not just about a single image on our retina, nor even the flow of images as our eyes scan in front and our head rotates to see behind, but is part and parcel of an integrated sensing-acting system that includes our hands and bodies.

Just as we saw when discussing the embodied mind in Chapter 5, Clark’s ‘007 principle’ of parsimony applies [Cl98]. In general we do not bother to hold information in our heads that can be more quickly or easily seen, heard or felt in the environment.

One aspect of this that has been studied in detail is mental rotation; that is, the ability to turn an object round in one’s head in two or three dimensions. This is often tested by asking people to decide whether two shapes shown at different angles are the same. The time taken to make the decision increases roughly linearly with the angle through which you have to ‘turn’ the objects in your head in order to match them. Players of Tetris typically turn the fresh pieces as soon as they appear, as it only takes a few hundred milliseconds to do an onscreen rotation (taking into account visual and motor delays), whereas the equivalent mental rotations may take over a second [KM94]: it is quicker to turn the actual pieces than to think about it. Likewise with physical puzzles: if you are completing a jigsaw you will not just stare at a piece trying to decide if it will fit, but instead spin it round and try it in position.
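The trade-off between turning a piece in the head and turning it on screen can be made concrete with a rough cost model. The sketch below is illustrative only: the linear form of the mental rotation time and every constant in it (MENTAL_BASE_MS, MENTAL_MS_PER_DEGREE, KEYPRESS_MS) are assumptions chosen to match the orders of magnitude quoted above, not measurements from [KM94].

```python
# Illustrative cost model: rotate in the head, or rotate on screen?
# All constants are assumed for illustration, not taken from [KM94].

MENTAL_BASE_MS = 400.0        # assumed fixed cost of comparing two shapes
MENTAL_MS_PER_DEGREE = 5.0    # assumed extra time per degree of mental rotation
KEYPRESS_MS = 200.0           # assumed time to press a key and see one 90-degree turn


def mental_rotation_ms(angle_degrees):
    """Time to rotate a shape mentally: assumed to grow linearly with the angle."""
    return MENTAL_BASE_MS + MENTAL_MS_PER_DEGREE * angle_degrees


def physical_rotation_ms(angle_degrees):
    """Time to rotate on screen: one key press per 90-degree step."""
    steps = round(angle_degrees / 90)
    return KEYPRESS_MS * steps


def cheaper_strategy(angle_degrees):
    """Pick whichever strategy is quicker for this angle."""
    mental = mental_rotation_ms(angle_degrees)
    physical = physical_rotation_ms(angle_degrees)
    return "rotate on screen" if physical < mental else "rotate in the head"


for angle in (90, 180, 270):
    print(angle, mental_rotation_ms(angle), physical_rotation_ms(angle),
          cheaper_strategy(angle))
```

With these assumed numbers the onscreen rotation wins at every angle, which is the pattern the Tetris players exhibit; different constants could of course tip the balance the other way.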

Fig. 8.4 Embodiment in Tetris: (top) mirror image pieces are especially hard to rotate mentally; (bottom) where does it fit? Rotating the piece makes it easier to tell.

It may seem odd that we are not better at mental rotation, which is effectively about the physical world of which we are part, but in fact drawing is a fairly new skill, in evolutionary terms. Before we drew, we could always ‘try things out’ or move to see an object from a different angle, so we did not need to develop better mental skills. Note this is a cost-benefit trade-off again, but at a longer timescale. The ‘choice’ in Tetris or the jigsaw puzzle to manipulate objects in the environment instead of the head is something we do moment to moment and largely unconsciously. The fact that we are not better at mental rotation is about our development as a species.

Of course, acting to perceive is sometimes costly or dangerous, especially if the act of investigating has irreversible consequences, as was the case with Pandora’s box. In such circumstances, we often have to consciously stop ourselves from acting, for example reaching out to open a door if there is smoke coming through, or looking over your shoulder instead of at the road when driving past a car accident.

The boundaries between action and perception blur further because many actions that are about ‘getting things done’ also give us fresh information as a side effect. As you drive down the road you can see further round the corner, as you pick up a box you feel how robust and how heavy it is. Equally, epistemic actions often become the effective action for which they are seeking information. You turn the jigsaw puzzle piece round, try it in position and find it fits — so you leave it there. Finding out if it fits and putting it in place are one and the same action. If you were to ask your friend where to find the coffee in their kitchen, the acts of finding out (asking) and getting the coffee would be separate. When you investigate yourself, the point at which you open a cupboard to see what is inside and find the coffee there is coincident with opening the cupboard to take it out.

8.5 Letting the world help

Our bodies are covered in soft skin, our tendons and muscles are elastic; even our bones have some flexibility. This flexibility is important for many reasons. It protects us from injury: we are less likely to break limbs when we fall, or strain muscles when we exercise. It is also efficient: when we run, the foot that lands absorbs the energy of the body and leg hitting the ground and then ‘bounces back’, helping us on the next stride and meaning we can run faster with less energy.

This flexibility also helps when we interact with objects in the world. Imagine a classic science fiction robot with a pincer grip picking up a crystal vase. It needs to exert a very precise grip: too tight and it will shatter the glass, too loose and the vase drops. However, when you grip the vase the flexibility of the pads of your fingers gives a small margin.

The objects that we manipulate also make a difference to how easy it is to work with them. More natural materials typically have more flexibility: think about putting on a woollen jumper compared with a suit of armour! Tight, precision fit is often important in mechanical items, but is typically harder to assemble.

While humans cope, robots do not, and those designing automated production lines have to design some of this natural flexibility back into both the robots and the items being assembled. Take the example of locating a screw in a hole. If the screw is a tight fit, it may be very difficult to place it in precisely the right position. However, if the screw is slightly pointed at its end, or the hole is drilled with a slight countersink, the screw will find the hole even if the location and angle are not entirely perfect. The robot arm also has to ‘give’ a little so that the screw can guide itself into position; in robotics this is called ‘compliant action’.

Fig. 8.5 Compliant action: (i) screw slightly misplaced, but naturally (ii) slides, and (iii) rotates, (iv) into place.
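The benefit of a countersunk hole and an arm that ‘gives’ can be illustrated with a toy one-dimensional model. This is only a sketch under stated assumptions: the function names, the clearance, the countersink radius and the idea that each push removes a fixed fraction of the misalignment are all invented for illustration and do not come from any robotics library or from real assembly data.

```python
# Toy 1-D model of compliant insertion. All numbers and names are
# illustrative assumptions, not taken from any robotics library.

CLEARANCE_MM = 0.05     # assumed gap between a flat-ended screw and its hole
COUNTERSINK_MM = 1.5    # assumed radius of the sloped rim around the hole
SLIDE_FACTOR = 0.5      # assumed fraction of the error left after each push


def rigid_insert(offset_mm):
    """Stiff arm, flat screw, plain hole: works only if almost perfectly aligned."""
    return abs(offset_mm) <= CLEARANCE_MM


def compliant_insert(offset_mm, max_steps=20):
    """Arm that 'gives', with a countersunk hole: the slope guides the tip inwards."""
    if abs(offset_mm) > COUNTERSINK_MM:
        return False                      # tip lands outside the slope: nothing guides it
    for _ in range(max_steps):
        if abs(offset_mm) <= CLEARANCE_MM:
            return True                   # tip has slid into the hole
        offset_mm *= SLIDE_FACTOR         # each push slides the tip part-way to the centre
    return abs(offset_mm) <= CLEARANCE_MM


for offset in (0.02, 0.8, 2.0):
    print(offset, rigid_insert(offset), compliant_insert(offset))
```

In this model the rigid approach succeeds only for the 0.02 mm error, the compliant approach also copes with the 0.8 mm error, and neither helps when the screw misses the countersink altogether.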

Box 8.1 Tight screws

If you are putting in a small screw yourself, you may have similar problems to the robot. If you tighten the screw when it is in the wrong position you might ‘cross thread’ the screw, spoiling either it or the work piece or both. Instead try holding the screw in the hole and with slight pressure turn it backwards (counter clockwise for a normal screw). At some point you feel a slight click into position. When you feel this you can then start to turn the right way and (usually!) it fits easily.

In Chapter 5, we noted how one solves mathematical problems using paper and in Chapter 4 how match puzzles usually require playing with the matches, not just thinking about them. In both these cases the problems are rather discrete (unless one breaks the matches!) and cerebral (even if enacted with physical materials). However, we often use the more analogue properties of materials to help us solve problems or ‘compute’ things.

It is laundry day and you have just taken the freshly cleaned sheets off the washing line. You want to fold the sheet in half. Do you carefully measure the mid-point? No. If there are two of you, you each take opposite corners and then bring the corners together, hold them with one hand, put your finger in the fold and then pull tight. Your finger naturally slides into the middle position. To fold end to end one of you brings their end to the other and the weight of the sheet again neatly halves it. Even thirds are quite easy using variants of the finger and slide techniques.

Fig. 8.6 Folding clean linen in perfect thirds

As well as helping us to do things, physical properties can prevent us doing things. Think of the time-set button on a wristwatch: it is usually small and recessed, so that you have to use a pen point or something similar to hold it in position. While this makes it rather difficult to deliberately set the time, it also makes it almost impossible to accidentally do so. In a similar way, flip phones have a flap over the keyboard so that you cannot accidentally ring a number. Phones with physical keys but no keyboard flap need to have an explicit keyboard lock function, usually pressing several keys in combination. Notice the choice: the flip phone works by exploiting the physicality of the phone, the keyboard lock uses digital interaction.

There are clearly many issues in play when making design decisions, but when a physical constraint can be used, such as the keyboard flap or the recessed time-set button, it is often more intuitive and robust.

The flip phone flap does not just prevent accidental phone calls. As you open the flap new opportunities for interaction are made apparent. In terms of affordances, the flap affords opening and the buttons afford pressing. When using a digital keyboard lock for a phone with physical buttons, the buttons are still visible, and so it is possible to start to enter a number before one realises the phone is locked. In this case the perceptual affordance of the keys is there whether or not they actually afford entering a phone number, in contrast to the flap, where the perceptual affordances of the buttons are only revealed when they can be used.
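The contrast between the flap and the digital lock can be sketched as two small models that report what the keys appear to offer and what they actually do. The class and method names here are hypothetical, chosen only for this illustration.

```python
# Sketch: physical flap versus digital key lock.
# Class and method names are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class FlipPhone:
    flap_open: bool = False

    def keys_visible(self):
        return self.flap_open          # the flap hides the keys when closed

    def keys_active(self):
        return self.flap_open          # keys can only be pressed when uncovered


@dataclass
class KeyLockPhone:
    locked: bool = True

    def keys_visible(self):
        return True                    # keys are always on show

    def keys_active(self):
        return not self.locked         # but only work when unlocked


def misleading(phone):
    """True when the keys look usable but are not."""
    return phone.keys_visible() and not phone.keys_active()


for phone in (FlipPhone(False), FlipPhone(True), KeyLockPhone(True), KeyLockPhone(False)):
    print(phone, misleading(phone))
```

In this sketch the flip phone can never be in a misleading state, because one physical fact (the flap position) drives both appearance and behaviour, whereas the key-lock phone is misleading whenever it is locked.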

Gaver calls this process sequential affordance, where performing an action corresponding to one affordance makes new affordances available and apparent. He gives the example of a twist door handle. The handle itself (if designed well) is hand shaped and sized and affords grasping. Once you have grasped the door handle, it then affords twisting, and finally having twisted the door handle, the door affords opening.

Note that the actual affordances have a sequence: you cannot twist the handle unless you are grasping it, you cannot pull open the door unless you have twisted the handle. The door handle also has perceptual affordances: the hand size and shape is visually apparent. But what about the twisting and opening affordances? How do we know what to do?
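The sequence of actual affordances for the twist door handle can be written down as a tiny state machine, in which each state offers only certain actions. This is a sketch, not Gaver’s own formulation: the state names, the action names and the ‘release’ action are assumptions added for illustration.

```python
# Sketch of the twist door handle as a tiny state machine.
# State and action names are illustrative assumptions.

# For each state: the actions it actually affords, and the state each leads to.
DOOR_HANDLE = {
    "free":    {"grasp": "grasped"},
    "grasped": {"twist": "twisted", "release": "free"},
    "twisted": {"pull": "open", "release": "free"},
    "open":    {},
}


def afforded(state):
    """Actions that are actually possible in this state."""
    return sorted(DOOR_HANDLE[state])


def act(state, action):
    """Perform an action, or fail if the current state does not afford it."""
    if action not in DOOR_HANDLE[state]:
        raise ValueError(f"'{action}' is not afforded in state '{state}'")
    return DOOR_HANDLE[state][action]


state = "free"
for action in ("grasp", "twist", "pull"):
    print(state, afforded(state))
    state = act(state, action)
print(state)
```

Trying `act("free", "twist")` raises an error, mirroring the point that you cannot twist the handle unless you are grasping it. The model captures only the actual affordances; it says nothing about how each one is perceived.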

Obviously there are cultural issues in play. We know door handles often twist and that doors open, and that typically (though not always) door handles twist downwards. There may also be subtle visible signs in the shape of the door surround or the visibility of the hinges, which tell us whether the door opens inwards or outwards. Yet we do cope remarkably well when we do not know and indeed if we initially get it wrong.

One reason is that the door handle ‘gives’ a little if you put pressure in the right direction; that is, you can work out what the affordances are by feel. This ‘give’ is common in physical situations. You are not sure whether a box is full, and too heavy to lift, or empty, so you give it a little trial pull, not fully lifting it, but with enough pressure to see if it would lift if you really tried. You are not sure whether a door is locked, so you give it a little trial pull to see if it starts to move, but not enough to fully open it.

In contrast many digital controls lack this ‘try it out’ nature. For example, with many touchscreens if you put your finger on an area and it happens to be active, then you have already selected it. Additional visual clues may be provided, for example the mouse cursor changes shape, a screen area changes colour, or a tool tip appears, or when you depress the mouse button over an active area the screen object may change colour or highlight. All of these offer similar information to the small ‘give’ of the door handle, but they lack the intuitive and immediate feel of the physical feedback and rely more on the visual senses.

References

1. [Br03] Bromhall, C. (2003). The Eternal Child: Staying Young and the Secret of Human Success. Ebury Press. ISBN-10: 0091885744 [8.3]

2. [Cl98] Clark, A. (1998). Being There: Putting Brain, Body and the World Together Again. MIT Press. [8.4]

3. [Ga05] Gallagher, S. (2005). How the Body Shapes the Mind. Oxford University Press. ISBN: 9780199271948 [8.3]

4. [Gi79] Gibson, J. (1979). The Ecological Approach to Visual Perception. Lawrence Erlbaum Associates, New Jersey, USA. [8.1]

5. [KM94] Kirsh, D., & Maglio, P. (1994). On Distinguishing Epistemic from Pragmatic Action. Cognitive Science: A Multidisciplinary Journal, 18(4), 513-549. [8.4]

6. [Lo90] Locke, J. (1690). An Essay Concerning Humane Understanding. [8.3]

7. [MM77] Meltzoff, A. N., & Moore, M. K. (1977). Imitation of Facial and Manual Gestures by Human Neonates. Science, New Series, 198(4312), 75-78. DOI: 10.1126/science.897687 [8.3]

8. [No98] Norman, D. A. (1998). The Design of Everyday Things. MIT Press. ISBN-10: 0-262-64037-6 [8.2]

9. [Ra04] Ramachandran, V. (2004). A Brief Tour of Human Consciousness. Pi Press, New York. [8.3]