CVNet - summary of query "Does size matter?"

Color and Vision Network (cvnet@lawton.ewind.com)
Tue, 17 Nov 1998 00:29:14 -0800

Subject: summary of "Whether size matters"
From: Denis Pelli <denis@cns.nyu.edu>
To: "CVNet" <cvnet@skivs.ski.org>
Cc: "Beau Watson" <abwatson@mail.arc.nasa.gov>

Here are the collected responses to my question. Thanks to all.

denis

p.s. I'm submitting an article about the topic to Science (fingers
crossed) and am including a link to this summary (which will appear at
the Vision Science site), to share this rich background with interested
readers.

****** My question:

Subject: who said size doesn't matter?
From: Denis Pelli <denis@wotan.cns.nyu.edu>
To: "CVNet" <cvnet@skivs.ski.org>

BACKGROUND: Most of us assume that we recognize shape in pretty much the
same way at all sizes. (A more precise version, dealing with the obvious
limitation of acuity, says that we use essentially the same computation
to recognize retinal images that differ solely in size.) This assumption
is appealing. The alternative to this assumption is to suppose that we
recognize a friend or the letter "a" differently at every distance or
size. There are some recent theories, e.g. Ullman and Koch, that propose
ways of implementing the assumption, but I think the idea is much older
than that.

QUESTION: Has anyone ever written down this assumption, or argued for it?
Aristotle?, Leonardo?, Kant?, Helmholtz?, Barlow?, Attneave?, Rock?

Please send replies to: denis@psych.nyu.edu

I'll post a summary. Hope you can help me.

denis pelli
professor of psychology and neural science
nyu

********* From: Irving Biederman, bieder@usc.edu

Dear Denis:

This response to your inquiry got much longer than I initially intended,
so feel free to truncate it after the first paragraph.

I do not know who first might have argued for size invariance in
recognition, but one needs to specify the "recognition" task to know
whether size invariance actually will obtain. Whereas picture priming (as
assessed by naming) shows complete size invariance, old-new picture memory
is impaired if the objects are shown at a different size than when
originally viewed (Biederman & Cooper, 1992). [Details below or available
from I.B.] In addition to these memory effects, there are attentional
adjustments to different sizes or scales.

The priming results are consistent with the classical notion of
shape constancy and your observation that "Most of us assume that we
recognize shape in pretty much the same way at all sizes." However, you
probably can remember whether a particular object was large or small.
Incidentally, there is nothing unique about size in this regard. Position
shows a similar pattern to that of size variations: No effect of
translation (e.g., left to right or top to bottom) on priming but
interference for a change of position on explicit memory. Reflection and
rotation in depth (up to parts occlusion) show a similar pattern of
effects.

We conjectured that the invariance was achieved in the ventral
pathway and that naming could be generated from such representations.
Size (and position and pose) would be specified by the dorsal stream and
be bound to the ventral representation of shape, which would produce an
explicit memory for an episode. Changing a dorsal attribute (size,
position, orientation) would lower its similarity to the original
experience and, consequently, reduce the feeling of familiarity mediating
old-new judgments.

Some details and references:

a) Priming of the naming of pictures shows complete size invariance.
Observers see a series of briefly presented, masked pictures of common
objects and animals and have to name them as quickly as possible. The
pictures are shown at one of two sizes, small (3.5 deg) or large (6.2
deg). They then see pictures again: some of the shapes are identical to
those shown initially, some are at the other size, and some have the same
name but are of a different shape (grand v. upright piano), also of the
same or different size. In this task a large portion of the priming can
be visual (and not just conceptual or verbal) in that the same shape
exemplars are named faster and more accurately than different shaped
exemplars. But there is no effect of changing the size of the stimuli.

b) Episodic (Old-New) memory for pictures shows strong size specificity.
The stimuli and presentations and first block task (naming) are identical
to those of a). But in the second block, subjects are to judge whether
they have seen that shape before, ignoring its size. (So they are supposed
to respond NO to the same name, different shaped exemplars.) Now there is
massive interference (increase in RTs and error rates) from changing the
size of the stimulus. We perform this task on the basis of feelings of
familiarity which are reduced by a different size. The effect is, again,
perceptual as there is no effect of changing the size of the same name,
different shaped distractors (New items).

These experiments on priming and episodic memory were done by
Biederman and Cooper (1992, JEP:HPP) with line drawings. Fiser and
Biederman (1995, Perception) replicated the priming effects with gray
level images. The Fiser & Biederman 1995 and 1997 ARVO posters show
invariance of priming over scale (low<-->high spatial frequency).

As noted earlier, the priming and episodic memory results for size
variations hold for changes in position (invariance for priming; no
invariance for old-new recognition) (Biederman & Cooper, 1991, Perception;
Cooper, Biederman, & Hummel, 1992, Canadian Journal of Psychology), as
well as rotation in depth up to parts occlusion (Biederman & Gerhardstein,
1993; 1995, JEP:HPP for priming; E. E. Cooper, unpublished for old-new
recognition).

Biederman & Cooper (1992) also review some of the work of Bundesen
and Larsen on attentional adjustments in "size scaling" in shape matching.
The Fiser, Subramaniam, & Biederman ARVO (1996) poster shows that under
RSVP presentations, the usual advantage for large/low-passed stimuli in
single trial presentations can be eliminated through attentional
adjustments (requiring not more than 500 msec) that can be made during an
RSVP sequence. That is, identification of small or high-passed pictures
in a like sequence of such pictures in an RSVP sequence is as accurate as
that with large or low-passed RSVP pictures.

References (all of which I am sending to you):

Biederman, I., & Gerhardstein, P. C. (1995). Viewpoint-dependent
mechanisms in visual object recognition: Reply to Tarr and Bülthoff
(1995). Journal of Experimental Psychology: Human Perception and
Performance, 21, 1506-1514.

Fiser, J., & Biederman, I. (1995). Size invariance in visual object
priming of gray scale images. Perception, 24, 741-748.

Biederman, I., & Gerhardstein, P. C. (1993). Recognizing depth-rotated
objects: Evidence and conditions for 3D viewpoint invariance. Journal of
Experimental Psychology: Human Perception and Performance, 19, 1162-1182.

Cooper, E. E., Biederman, I., & Hummel, J. E. (1992). Metric invariance
in object recognition: A review and further evidence. Canadian Journal
of Psychology, 46, 191-214.

Biederman, I., & Cooper, E. E. (1992). Size invariance in visual object
priming. Journal of Experimental Psychology: Human Perception and
Performance, 18, 121-133.

Biederman, I., & Cooper, E. E. (1991). Evidence for complete
translational and reflectional invariance in visual object priming.
Perception, 20, 585-593.

Fiser, J., & Biederman, I. (1997). Independence of visual priming to
hemisphere, scale, and reflection changes. Investigative Ophthalmology &
Visual Science, 38, 1005.

Fiser, J., & Biederman, I. (1995). Priming with complementary gray-scale
images in the spatial-frequency and orientation domains. Investigative
Ophthalmology & Visual Science, 36, 475.

Fiser, J., Subramaniam, S., & Biederman, I. (1996). The effect of
changing size and spatial frequency content of gray-scale object images in
RSVP identification tasks. Investigative Ophthalmology & Visual Science,
37, 178.

Irving Biederman, Ph. D.
William M. Keck Professor of Cognitive Neuroscience
Neuroscience Program and Department of Psychology
University of Southern California
Hedco Neurosciences Building, MC 2520
Los Angeles, CA 90089-2520

bieder@usc.edu
(213) 740-6094 (Office); (213) 740-5687 (Fax);
(310) 823-8980 (Home); (213) 740-6102 (Lab)
http://rana.usc.edu:8376/~ib/iul.html

****** From: Lloyd Kaufman <lk@psych.nyu.edu>

Dear Denis,

I do not recall that any of the great minds you cite actually dealt with
this problem, but I'm probably wrong. However, I do recall some relevant
discussion in the old Cognitive Psychology book by Ulric Neisser. I do not
have a copy, but as best I remember, he dealt with the seemingly
impossible task (for the brain) of storing a template for every
orientation and size of an object so that it could be recognized
regardless of its orientation and size. Hence, Selfridge and Neisser went
to great lengths to devise a "theory" in which objects are stored in
memory as lists of features. This could be viewed as consistent with
Hebb's 1949 idea that the eyes tend to move along contours, and have to
change direction at corners. When correctly done, the eye movements are
reinforced because of the implicit reward of staying on the contour. This
leads to the establishment of a cell assembly which "fires" in some phase
sequence whenever a similar feature is presented, e.g., a corner, and
three corners if the object is a triangle. The activation of three cell
assemblies comes to mean triangle, so recognizing an object as a triangle
no longer requires eye movements but simply the elicitation of the
activity of these feature-linked cell assemblies. Presumably, this would
imply that recognition is independent of size differences, just so long
as the sizes permit activation of all of the relevant cell assemblies. In
Neisser and Selfridge's terms, the feature list can match regardless of
size or orientation.

Posner and Mitchell did a classic study in which subjects had to indicate
if two letters were the same or not. If both letters were the same and
both were of upper (or lower) case, RT was faster than if both were the
same but one was of lower case and the other of upper. But here we have
real shape differences, and not merely size differences.

Rock has a very interesting little book on orientation and form. He
presents a great deal of evidence that both retinal orientation and
orientation with respect to the environment affect recognizability of
objects.

To my mind a great classic in this area is an experiment by Wallach. He
made use of an ambiguous figure. When vertically oriented it could be
perceived as either a chef or a dog. When tilted to the right it was
normally perceived as a chef, and when tilted to the left as a dog. He
presented the figure to different subjects in one of the two unambiguous
orientations in one quadrant of their visual field. In a subsequent
recognition test he presented the figure (along with others) in its
ambiguous vertical orientation to all quadrants of the visual field. It
turned out that when it was presented to the original quadrant (where it
had been either a chef or a dog), subjects tended to perceive it as they
had previously. However, when presented to other quadrants in the vertical
orientation, the same subjects were equally likely to describe it as a
chef or dog. This strikes me as strong evidence that over the short term
memories are locally stored.

I'll think about it some more. Frankly, I am not surprised by your result,
but I am sure that most other people will denounce it as absurd. Keep me
posted.

...

I wasn't very alert yesterday. Last night I realized that indirect
consideration was given to your problem. It comes from the subject of
size constancy. How, they asked, can we tell that an object subtending a
particular angle at the eye is or is not the same as another object
subtending a different angle? The answer, of course, is that we can tell
they are the same object by taking account of the differences in distance
to them. When you are 10 meters away, I know you are the same guy who was
20 meters away, even though your image on my retina is twice as tall. In
Helmholtzian terms this implies unconscious inference in which I
effectively compute your distal (linear) size from your image size and
cues to your distance. If the computation does not show equal linear
sizes, then I perceive you as two different guys -- one smaller than the
other. I infer from this view that we do not automatically bypass image
size in deciding if otherwise similar objects are the same. We can decide
they are not the same automatically and without awareness if the stimuli
are at different distances or if I have information indicating a
difference in distance, e.g., as when my convergence increases because
somebody puts prisms in front of my eyes.
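
A minimal sketch of the size-distance computation described above, not part of the original message. It assumes simple viewing geometry (linear size = 2 x distance x tan(angle / 2)); the function names and the 1.8 m height are illustrative assumptions.

```python
import math


def visual_angle_deg(linear_size_m, distance_m):
    """Visual angle (degrees) subtended by an object of a given linear size."""
    return math.degrees(2.0 * math.atan(linear_size_m / (2.0 * distance_m)))


def linear_size_m(angle_deg, distance_m):
    """Distal (linear) size recovered from visual angle and assumed distance."""
    return 2.0 * distance_m * math.tan(math.radians(angle_deg) / 2.0)


height = 1.8  # assumed height of the person, in meters

angle_far = visual_angle_deg(height, 20.0)
angle_near = visual_angle_deg(height, 10.0)

# With the correct distances, both images map back to the same linear size.
size_far = linear_size_m(angle_far, 20.0)
size_near = linear_size_m(angle_near, 10.0)

# With a wrong distance estimate (e.g. 5 m instead of 10 m), the same image
# yields a different computed size.
size_misjudged = linear_size_m(angle_near, 5.0)

print(angle_far, angle_near, size_far, size_near, size_misjudged)
```

The near image subtends close to twice the angle of the far one, yet both recover the same linear size once distance is taken into account. Halving the assumed distance halves the computed size, which is the kind of error a change in convergence could induce.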

On this interpretation I would have to say that Helmholtz and, therefore,
Wallach, Hochberg, Gregory, Rock, Kaufman, and many others would all
agree with your conclusion. Of course, they will all have to think about
it for a bit. It seems to me that you could have happened on the first
evidence for an underlying process in which the brain might be engaged in
distance-checking, among other possibilities.

I've been doing some writing on this issue, so I'd like to know more
about your study.

Lloyd

************* From: Joe Lappin (joe.lappin@vanderbilt.edu)

Hi Denis,

I've recently done some research, theoretical and experimental, on this
and related issues. I will send a couple of relevant reprints and a ms.
submitted to Psych Review on such questions. A couple of relevant remarks
are the following:

1) Note that the nature of the problem depends on the format for
representing spatial structure of visual input. If the elementary
reference frame for defining spatial positions and spatial relations is
assumed to be (a) independent of the visual stimulation and (b) given by
independent local signs of retinal positions, then the perception of
object shapes is computationally difficult and vulnerable to noise in both
the images and in the physiological signals. The distance between two
features, for example, would be given by the differences in the local
signs of the two positions, and, insofar as these are independent (which
is implicit in the local-sign assumption), then spatial uncertainty about
the separation between the features is greater by a factor of about the
square root of 2 than the uncertainty about the positions of the
individual features. Information about the relative positions of 3
collinear features (e.g., whether one feature is centered between the
other two) would involve still another such difference between these
pair-wise differences, and the spatial uncertainty about this 2nd-order
relation would be greater by a factor of about 2 than information about
the positions of the individual features.

2) The preceding difficulties are avoided if the primitive spatial
structure of the input is defined by the intrinsic structure of the input
rather than defined extrinsically as above. The initial visual
representation of spatial relations can (in principle) be based on
higher-order differential structure of the input, and this then buys a
lot of advantages in representing the shapes of objects. The best
development of the latter theoretical approach has been by Koenderink &
van Doorn. Two especially relevant papers are: K & v D (1992) Generic
neighborhood operators. IEEE PAMI, 14, 597-605; K & v D (1992)
Second-order optic flow. JOSA A, 9, 530-538.

3) Assumptions that the primitive spatial structure of the visual input is
given by local signs are implicit in many contemporary models of spatial
vision, but this assumption is not supported by any empirical evidence
that I know about. In contrast, there is good psychophysical evidence
against this assumption. Visual acuity for spatial relations does not show
the predicted decrement in acuity for spatial relations described in #1
above. This is explicitly tested in several experiments in the papers I
will send. One of these is: Lappin & Craft (1997) Definition and detection
of binocular disparity. Vis Res, 37(21), 2953-2974. More extensive
experimental tests are given in a 1997 Ph.D. thesis by Craft (Vanderbilt
U), and these are described in the manuscript we have submitted to Psych
Rev. Another of my favorite experiments on the acuity for spatial
relations is by Karen De Valois et al: (1990) Discrimination of relative
spatial position. Vis Res, 30, 1649-1660.
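
A minimal sketch of the noise-propagation argument in point 1, not part of the original message. It assumes each feature position carries independent noise with the same standard deviation; the helper name and the unit sigma are illustrative assumptions.

```python
import math


def sd_of_linear_combination(weights, sigma=1.0):
    """SD of sum(w_i * x_i) for independent x_i that share the SD sigma."""
    return sigma * math.sqrt(sum(w * w for w in weights))


# Uncertainty about a single feature position.
position_sd = sd_of_linear_combination([1])

# Separation between two features: x2 - x1.
separation_sd = sd_of_linear_combination([-1, 1])

# Difference of two separations, if the two separations are treated as
# independent of each other: another factor of sqrt(2).
second_order_sd_independent = math.sqrt(2) * separation_sd

# Same 2nd-order relation written out for three collinear features that
# share the middle one: (x3 - x2) - (x2 - x1) = x1 - 2*x2 + x3.
second_order_sd_shared = sd_of_linear_combination([1, -2, 1])

print(position_sd, separation_sd,
      second_order_sd_independent, second_order_sd_shared)
```

Under these assumptions the separation is noisier than a single position by sqrt(2), and the difference of two separations by a factor of 2 when the separations are treated as independent. When the middle feature is shared, the same arithmetic gives sqrt(6), about 2.45, so "about 2" reads as a lower-end figure for the local-sign account.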

I hope this is helpful. I'd be interested in any comments.

Regards,
Joe

Joseph S. Lappin (joe.lappin@vanderbilt.edu)
Dept. of Psychology phone: 615 - 322 - 2398
301 Wilson Hall fax: 615 - 343 - 8449
Vanderbilt University

******* From: William McIlhagga, william@axp.psl.ku.dk

What a coincidence. I am working here in Copenhagen with a couple of
people (Axel Larsen and Claus Bundesen) who some time ago looked at size
differences in pattern matching. Relevant papers are:

Bundesen & Larsen (1975), Visual transformation of size, JEP:HPP, v1 #3,
pp. 214-220.

Larsen & Bundesen (1978), Size scaling in visual pattern recognition,
JEP:HPP, v4 #1, pp. 1-20.

In brief they showed that size differences have much the same effect on
reaction times as orientation differences (the famous mental
rotation experiments), and that the reaction time to say whether a pair
of random objects is the same or different is linearly related to size
ratio (not the log of size ratio, as you might expect). Kosslyn & Cave
have also done some work on it, but it seems that the paradigm is
fraught with difficulty, especially when you have only a few objects
and it is possible to memorize them (this yields different results, more
like a log transform, see the second paper above).
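
A minimal sketch contrasting the two reaction-time models mentioned above, not part of the original message. The intercept and slope are hypothetical placeholder values, not estimates from the cited papers.

```python
import math

INTERCEPT_MS = 500.0  # hypothetical RT for a same-size pair
SLOPE_MS = 100.0      # hypothetical cost per unit of the predictor


def rt_linear(size_ratio):
    """RT that grows linearly with the size ratio of the two patterns."""
    return INTERCEPT_MS + SLOPE_MS * (size_ratio - 1.0)


def rt_log(size_ratio):
    """RT that grows with the log (base 2) of the size ratio."""
    return INTERCEPT_MS + SLOPE_MS * math.log2(size_ratio)


for ratio in (1.0, 2.0, 3.0, 4.0):
    print(ratio, rt_linear(ratio), rt_log(ratio))
```

Under the linear model, equal steps in size ratio (1 to 2, 2 to 3) cost equal time. Under the log model, equal multiplicative steps (1 to 2, 2 to 4) cost equal time, so the two accounts diverge most clearly at large ratios.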

Hope this info is relevant,

William McIlhagga

******* From: Matteo Carandini, matteo@cns.nyu.edu

Denis,
maybe the following paper has a good bibliography:
Blakemore, C., Garner, E. T. and Sweet, J. A. (1972). The site of size
constancy. Perception, 1(1): 111-9. I haven't read it in a while, and
can't find it anymore, but it should be worth reading. Yours
-Matteo

******* From: Stijn Oomes, oomes@psyche.mit.edu

Hi Denis,

a nice example of the influence of size on the perception of shape is
the construction of the Statue of Liberty; the sculptor Auguste
Bartholdi built a small model first and then they scaled it up in two
(or three?) steps to get to the present 46.50m height (engineering was
done by Gustave Eiffel) - apparently they had to change the shape with
every step to make it resemble the original model

i'm afraid i can't give you a reference for this because i read it on
the web some years ago - it shouldn't be a problem for you to find a
book on the construction of the statue in NYC

Stijn

--
Stijn Oomes
Perceptual Science Group, MIT
http://www-bcs.mit.edu/~oomes/

******* From: Richard Aslin, aslin@cvs.rochester.edu

Denis, back in the 1970s a fellow named T. G. R. Bower wrote a book (I can't seem to find my copy) in which he took seriously the notion that every fixation (retinal image) for the HUMAN INFANT was a "new" event. He had some odd rationale for this, and it may have been based on some older theorists (philosophers). However, you should know that Bower was (is) thought to be eccentric and perhaps to play fast and loose with data. You could check with Sue Carey for a reference.

Best,

Dick

****** From: Alice J Otoole, otoole@utdallas.edu

Hi Denis

the question of size invariance in recognition has always fascinated me ...and I am not sure that I would necessarily buy the assumption... I have no data of course...though there is some (which disagrees with me :-) an older paper (about 1989?) of Irv's with size priming working regardless of changes in object size...anyway I think this generated some other work...you might want to have a look at ....

I look forward to your summary!

Alice

***** From: Pierre Jolicoeur, pjolicoe@cgl.uwaterloo.ca

this is not as old or well known as Aristotle, but it might be relevant:

Jolicoeur, P. (1987). A size-congruency effect in memory for visual shape. Memory & Cognition, 15, 531--543.

Milliken, B., & Jolicoeur, P. (1992). Size effects in visual recognition memory are determined by perceived size. Memory & Cognition, 20, 83--95.

Cheers, Pierre Jolicoeur

****** From: Harry Wyatt, wyatt@sunyopt.edu

In regard to your cvnet query --

I am not sure how far you are trying to push the time envelope, but I believe some proposals were made in the early spatial channel papers. (Fergus Campbell? Colin Blakemore?)

******* From: Bruno A. Olshausen, bruno@palm.ucdavis.edu

Denis,

The first discussion of mechanisms for invariant recognition that I have found in the literature is a paper by Karl Lashley:

Lashley, KS (1942) "The problem of cerebral organization in vision." Biol. Symp., 7: 301-322.

He refers to this as "the most elementary problem of cerebral function and I have come to doubt any progress will be made toward a genuine understanding of nervous integration until the problem of equivalent neural connections, or as it is more generally termed, stimulus equivalence, is solved." He poses the problem in neural terms (i.e., how does the nervous system map different input patterns into the same response pattern?) and he presents a neural solution to the problem that is based on setting up standing waves of activity in the cortex that are unique to each pattern independent of position, size, etc.

The first explicit neural shifter/scaling circuit was proposed by Pitts and McCulloch:

Pitts W, McCulloch WS (1947) "How we know universals: the perception of auditory and visual forms." Bulletin of Mathematical Biophysics, 9: 127-147.

Rather than using an attentional mechanism to gate the connections as in our model (Olshausen, Anderson and Van Essen, J Neurosci, 13:4700-4719), they draw upon an oscillatory circuit (based on the alpha rhythm) to scan sequentially through different sizes of an object.

Bruno

Bruno A. Olshausen              Phone: (530) 757-8749
Center for Neuroscience         Fax: (530) 757-8827
UC Davis                        Email: baolshausen@ucdavis.edu
1544 Newton Ct.                 WWW: http://redwood.ucdavis.edu/bruno
Davis, CA 95616

******* From: Walt Makous, walt@cvs.rochester.edu

Denis:

Dating the concept back to my graduate student days in the late 50s, I include this as part of the general problem then referred to as the problem of stimulus equivalence: i.e., stimuli are treated as equivalent in spite of translation, rotation, and expansion/contraction. Klüver and Lashley did experimental work on the problem, and one of the Gestalt psychologists hypothesised in the 20s or 30s that translational and size invariance could be achieved if contours set up waves of excitation across the cortex propagating away from the (topographical) representation of the contours in the direction perpendicular to the representation of the contours. In that case, the same pattern of waves and the interferences among them would be set up, no matter what the size or location of the object (imagine a triangle as an example). The pattern of waves, then, would be the signature for a given object, not the strict representation of its contours. Rotational invariance, of course, had to be handled some other way.

In the 40s or 50s, McCulloch and Pitts put these ideas into mathematical form.

I know this is all very vague with respect to authors and citations, but I don't have them readily at hand. If you think this might be useful to you, I'll try harder to find the references.

Regards,

Walt

********* From: Frans W Cornelissen, f.w.cornelissen@med.rug.nl

Hi Denis,

I just yesterday heard a presentation by Bart ter Haar-Romeny about the use of "scale space theory" in image processing. Although he didn't really mention your idea precisely, it seems that it might be related to your question. Their group has a web site: http://www.cv.ruu.nl

Greetings,

Frans

Frans W. Cornelissen
Laboratory of Experimental Ophthalmology
Graduate School for Behavioral and Cognitive Neurosciences (BCN)
University of Groningen
P.O. Box 30.001
Hanzeplein 1
9700 RB Groningen, The Netherlands

********* From: David Rose, d.rose@surrey.ac.uk

Kant argued against it: he was into proving whether we could know that things exist, and if so how we know, and countering Hume who said we couldn't and Berkeley who said they didn't. But they all (except Berkeley) assumed that matter exists constantly, even if it moves around (Descartes via Democritus). Kant's chapters on the analogies of experience and on phenomena and noumena are the most relevant: we assume or understand or know that objects exist constantly despite the ever-varying sensory percepts we get, and we can only do this because we have what we would nowadays call functional mechanisms for drawing those conclusions.

...

I think maybe the work of the philosophical behaviourists (Russell? - certainly Wittgenstein, Ryle, Ayer) is one place to look for the origins of the view that multiple views are what is learned. They were against the internal model / representation / category type of theory. They tended to discuss mainly language rather than vision, but were in general criticising the notion of internal states, at least integrative or generalizational states, and were more in favour of meaning being given by networks of associations between particulars. I guess this sounds like what you are thinking of - although perhaps at a different level (words rather than size and shape - though they did talk about pain and colour ...)

I guess associationism goes back to Locke and Hume too, but that's too much to think about now!

Best wishes

Dave

Dave Rose
d.rose@surrey.ac.uk
Dept. Psychology, Univ. Surrey, Guildford, Surrey GU2 5XH, UK
Voice +44 148 330 0800 x2889
Fax +44 148 325 9553

******** From: Laurence Harris, harris@YorkU.CA

Dear Denis

in response to your recent CVnet posting about recognizing things of different sizes. I do not know about the history of the idea of a common mechanism for big and small, but I suspect that the ancients did NOT think there was a common mechanism. Some thoughts on the topic:

1) small things that fit roughly on the fovea can be taken in with no gaze movements; bigger things need eye movements and very big things need eye and head movements. Thus gaze needs to be taken into account by slightly different mechanisms with increasingly challenging tasks.

2) the example you suggest of a letter 'a' might represent a fairly typical stimulus in a psychophysics lab, but, apart from contrived examples like posters, is not a very ecologically valid stimulus. Most things would contain a different set of stereoscopic cues when blown up.

3) And combining points 1 and 2, walking around a giant object would provide parallax cues and many complex cues about the relation of views. These would be different from a tiny version which might be manipulated in the hand with associated tactile stereognosis cues.

Nice question.

Laurence

Laurence Harris                 phone: (416) 736-2100 x 66108
Department of Psychology        fax: (416) 736-5814
York University
Toronto, Ontario M3J 1P3
CANADA
YORKVIS web page: http://www.yorku.ca/dept/psych/yorkvis/

******* From: Helen Ross, h.e.ross@stir.ac.uk

SHAPE AND SIZE

One should always check Ptolemy's Optics (c. 160-170 AD) and Alhazen's Optics (c. 1034 AD) for early statements on perceptual matters. However, neither of them seems to have been very specific on this question.

Ptolemy (Optics, Book 2) discusses slant and shape constancy, and the perception of convexity/concavity. He also says (paragraph 73): "Distortion in visual perceptions of this kind [i.e. slant/shape distortions] depends on the orientation of the figures and on the displacement of the viewers, but it is impossible for one, single case of such distortion to represent every one. If, however, one is disposed to examine many such cases together, the more carefully [he does so, the more] miraculous the natural capacity of the visual flux to arrange visual information turns out to be..... Moreover, it does this swiftly, without delay or interruption, and it carries out a careful scrutiny with a marvelous, nearly incredible power, and it does this unconsciously because of its speed." (Translated by A. Mark Smith: Ptolemy's Theory of Visual Perception, Transactions of the American Philosophical Society, Vol. 86, Part 2, 1996, pp. 101-102.) Thus Ptolemy was aware that viewing distance was a factor in shape perception that was in need of explanation. However, he did not offer an explanation, but merely asserted that the visual processing was swift and unconscious.

Alhazen is usually more detailed and clearer than Ptolemy, but is not so in this case. He does not specifically mention viewing distance as a problem. In his Optics, II.3 he discusses form perception in a repetitious manner, without specifying what is involved. A typical circular statement is: "The similarity of the two forms can only be perceived by comparing one of them with the other and perceiving in each of them that property in respect of which they are similar." He also echoes Ptolemy's 'unconscious inference': "For the shape or size of a body ... are in most cases perceived extremely quickly, and because of this speed one is not aware of having perceived them by inference and judgment. Now the speed with which these properties are perceived by inference is due only to the manifestness of their premisses and to the fact that the faculty of judgment has been much accustomed to discern those properties." (Translated by A.I.Sabra: The Optics of Ibn Al-Haytham, Vol. 1. p. 126 and 131. The Warburg Institute, University of London, 1989.)

It is unlikely that there existed clearer earlier statements on shape perception that were not repeated by Ptolemy and Alhazen. So the first clear statement is probably later than Alhazen.

Helen E. Ross
Department of Psychology
University of Stirling
Stirling FK9 4LA
h.e.ross@stir.ac.uk

continued:

... I'd be fascinated to know who first formalised a statement about the ratios remaining the same regardless of size. One of the benefits of checking the Ptolemy and Alhazen passages was to discover that they predate Helmholtz on the idea of an 'unconscious inference'. I never supposed Helmholtz was the first in this respect, but didn't realise the idea was current by Ptolemy's time. In general, most authors who believe in size-distance invariance (as an explanation of 'size constancy') also believe in unconscious inference (or the 'taking account of distance'). The two are not necessarily linked, as size-distance scaling could in principle occur in a totally automatic manner (without any need for learning).

I'd be very interested to see your paper at some point.

Best wishes, Helen

******** From: Zygmunt Pizlo, pizlo@psych.purdue.edu

There was some discussion of mental size transformation in the 1970s. Bundesen, Larsen, and Kosslyn participated in it. The main idea was similar to mental rotation. They did not claim that different sizes involve different mechanisms, but only that before processing shape one has to normalize size. I also have a paper that summarizes this discussion. It is in Vision Research '95.

Zyg Pizlo

******** From: George Chaikin, george@cooper.edu

Dear Prof. Pelli,

I am afraid that I am uncertain as to what exactly you are asking for in your question....

However, I may be able to direct you to some research done by myself and Carl Weiman and later by Eric Schwartz on rotational and scale invariance in the retinotopic mapping based on a complex logarithmic model. This model predicts scale invariance in the cortical representation for objects with the same fixation point but different scales in the retinal projection.
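
A minimal sketch of the scale-invariance property of a complex logarithmic map, not part of the original message. It assumes retinal positions are complex numbers with the fixation point at the origin, and it uses the plain complex log as a stand-in for the cortical mapping; the sample points and the function name are illustrative assumptions.

```python
import cmath

# Hypothetical retinal positions of three object features (fixation at 0).
points = [complex(1.0, 0.5), complex(-0.3, 2.0), complex(0.7, -1.2)]


def cortical(z):
    """Complex-log map: real part = log eccentricity, imaginary = polar angle."""
    return cmath.log(z)


# Enlarging the retinal image about fixation by a factor of 3.
scale = 3.0
scale_shifts = [cortical(scale * z) - cortical(z) for z in points]

# Rotating the retinal image about fixation by 0.3 rad (small enough that
# no point crosses the branch cut of the logarithm).
rotation = cmath.rect(1.0, 0.3)
rotation_shifts = [cortical(rotation * z) - cortical(z) for z in points]

print(scale_shifts)     # every feature shifts by log(3) along the real axis
print(rotation_shifts)  # every feature shifts by 0.3 along the imaginary axis
```

Every feature moves by the same amount in the mapped plane, so a size change about fixation becomes a rigid translation of the mapped pattern and leaves its shape unchanged. The property holds only for scaling about the fixation point, as the message notes.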

I hope this helps

Sincerely yours,

George Chaikin

****** From: Peter Foldiak, Peter.Foldiak@st-andrews.ac.uk

hi Denis,

I don't know the answer to your question but the idea I proposed in a 1991 Neural Computation paper might be relevant to this. Basically, I am saying that a network can learn invariance to any transformation (e.g. size) if it is exposed to continuous variations in that parameter (e.g. continuous retinal size change that you get when moving forward in space). In the paper I use another example (position) but it should work in exactly the same way for size. A modified ('trace') Hebbian rule can learn to connect intermediate-level feature detectors to a unit that comes to ignore these changes. I wrote another (longer) paper that discusses the same idea, but its still in press (long after it is supposed to have come out).
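
A minimal sketch of a trace-modified Hebbian rule of the kind described above, not part of the original message. It is a single-unit toy, not the network in the paper: the one-hot inputs stand for one feature seen at successive sizes, and the learning rate, trace constant, initial weights, and per-sweep trace reset are all illustrative assumptions.

```python
def train(num_inputs=4, sweeps=50, alpha=0.1, delta=0.2):
    """Train one unit on a stimulus that sweeps through the inputs in order.

    trace   <- (1 - delta) * trace + delta * y
    weights <- weights + alpha * trace * (x - weights)
    With delta = 1.0 the trace equals the current output (plain Hebbian).
    """
    # Assumption: the unit initially responds only to the first input.
    weights = [1.0] + [0.0] * (num_inputs - 1)
    for _ in range(sweeps):
        trace = 0.0  # assumption: the trace is reset between sweeps
        for active in range(num_inputs):
            x = [1.0 if i == active else 0.0 for i in range(num_inputs)]
            y = sum(w * xi for w, xi in zip(weights, x))
            trace = (1.0 - delta) * trace + delta * y
            weights = [w + alpha * trace * (xi - w)
                       for w, xi in zip(weights, x)]
    return weights


with_trace = train(delta=0.2)
no_trace = train(delta=1.0)

print(with_trace)  # weights have spread to every input in the sequence
print(no_trace)    # unit still responds only to the first input
```

Because the trace carries activity over from the preceding frames, the unit gains connections to inputs that follow one another in time, which is how continuous size change could link views of the same object. Without the trace, later frames produce no output and therefore no learning.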

P. Foldiak, Learning invariance from transformation sequences, Neural Computation, vol. 3, pp. 194-200, 1991

P. Foldiak, Learning constancies for object perception, in Visual Constancies: Why things look as they do, eds. V Walsh & J J Kulikowski, Cambridge, U.K.: Cambridge Univ. Press, 1998, in press.

Peter

Peter Foldiak                http://psych.st-and.ac.uk:8080/~pf2
Psychological Laboratory     phone: +44 411 297469, office: 01334 462087
University of St Andrews     fax: +44 1334 463042, SMS: foldiak@genie.co.uk
St Andrews KY16 9JU, U.K.    e-mail: Peter.Foldiak@st-andrews.ac.uk

******* From: Dario Ringach, dario@cns.nyu.edu

Modern references to scale independence of vision include almost all the work on scale-space representation in image processing, from the early pyramids of Rosenfeld to modern wavelets. Also, almost every work in object recognition in computer vision starts with the assumption that you want a system that will show scale invariance.
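
A minimal sketch of an image pyramid, not part of the original message. It uses plain 2x2 block averaging, which is one simple way to build such a pyramid and not necessarily the construction used in the early work mentioned above; the test image and function names are illustrative assumptions.

```python
def downsample(image):
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def build_pyramid(image):
    """Return the image followed by successively coarser versions."""
    levels = [image]
    while len(levels[-1]) >= 2 and len(levels[-1][0]) >= 2:
        levels.append(downsample(levels[-1]))
    return levels


# An 8x8 checkerboard whose squares are 2 pixels wide.
image = [[float((r // 2 + c // 2) % 2) for c in range(8)] for r in range(8)]

pyramid = build_pyramid(image)
sizes = [(len(level), len(level[0])) for level in pyramid]
print(sizes)
```

The same pattern is held at several resolutions at once, so an analysis tuned to one pixel scale can be applied at every level. Here the checkerboard is still a checkerboard one level up and averages out to uniform gray at the coarser levels.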

*********** THE END