Given the impressive accuracy of VLMs in answering questions about diagrams and charts (e.g., Sonnet-3.5 scoring 94.7% on AI2D and 90.8% on ChartQA) [1], a reasonable hypothesis is that VLMs must be able to see whether two graphs intersect in a chart. Here, we test this hypothesis by asking VLMs to count the number of intersections between two 2-segment piecewise linear functions.
Images
We create 1800 images of 2D line plots drawn on a white canvas. Each line plot consists of two line segments, defined by three points whose x-coordinates are fixed and equally spaced. The y-coordinates are randomly sampled to create two plots that intersect at exactly 0, 1 or 2 points. See Appendix A for more details.
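The construction above can be illustrated with a minimal sketch. It assumes that both plots share the same three x-coordinates, so that the difference between them is linear on each segment and a crossing corresponds to a sign change of that difference. The function names, the unit range of the y-values, and the rejection-sampling loop are illustrative assumptions, not the authors' actual procedure (which is described in Appendix A).

```python
import random

def count_intersections(y_blue, y_red):
    """Count crossings of two piecewise-linear plots that share x-coordinates.

    Assumption of this sketch: the plots never have equal y-values at a
    shared x-coordinate, so every intersection is a strict sign change of
    the difference between consecutive vertices.
    """
    diffs = [b - r for b, r in zip(y_blue, y_red)]
    if any(d == 0 for d in diffs):
        raise ValueError("plots touch at a shared x-coordinate (not handled here)")
    return sum(1 for d0, d1 in zip(diffs, diffs[1:]) if d0 * d1 < 0)

def sample_pair(target, rng, max_tries=10_000):
    """Rejection-sample y-coordinates until the pair has `target` crossings."""
    for _ in range(max_tries):
        y_blue = [rng.random() for _ in range(3)]
        y_red = [rng.random() for _ in range(3)]
        if any(b == r for b, r in zip(y_blue, y_red)):
            continue  # skip the (very unlikely) tie at a vertex
        if count_intersections(y_blue, y_red) == target:
            return y_blue, y_red
    raise RuntimeError("no pair found with the requested number of crossings")

if __name__ == "__main__":
    rng = random.Random(0)
    for target in (0, 1, 2):
        blue, red = sample_pair(target, rng)
        print(target, count_intersections(blue, red))
```

With three vertices per plot there are only two segments, so the count can never exceed 2, which matches the answer set used in the task.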
Prompts
We ask each question using two different wordings:
- "How many times do the blue and red lines touch each other? Answer with a number in curly brackets, e.g., {5}."
- "Count the intersection points where the blue and red lines meet. Put your answer in curly brackets, e.g., {2}."
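Both prompts ask for the answer in curly brackets, which makes responses easy to parse automatically. The sketch below shows one way this could be done; the function name `extract_answer` and the choice to take the last bracketed number in the response are assumptions for illustration, not the authors' documented parsing rule.

```python
import re
from typing import Optional

# Matches a non-negative integer wrapped in curly brackets, e.g. "{2}".
ANSWER_PATTERN = re.compile(r"\{\s*(\d+)\s*\}")

def extract_answer(response: str) -> Optional[int]:
    """Return the last integer written in curly brackets, or None if absent."""
    matches = ANSWER_PATTERN.findall(response)
    if not matches:
        return None
    return int(matches[-1])

if __name__ == "__main__":
    print(extract_answer("The blue and red lines meet twice. {2}"))
    print(extract_answer("I cannot tell."))
```

A response with no bracketed number yields `None`, which a scoring script could simply treat as an incorrect answer.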
Groundtruth
Answers ∈ {0, 1, 2} (random-baseline accuracy: 33%).
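As a small illustration of how the 33% baseline arises, the sketch below scores a list of predictions against the ground truth and computes the expected accuracy of a uniform random guess over the three labels. The function name `accuracy` and the treatment of unparseable answers (`None`) as wrong are assumptions of this sketch.

```python
def accuracy(predictions, ground_truth):
    """Percentage of predictions that exactly match the ground truth."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground truth differ in length")
    if not ground_truth:
        raise ValueError("no examples to score")
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return 100.0 * correct / len(ground_truth)

# A guess drawn uniformly from the three labels is right one time in three,
# regardless of how the ground-truth labels are distributed.
LABELS = (0, 1, 2)
random_baseline = 100.0 / len(LABELS)

if __name__ == "__main__":
    print(accuracy([0, 1, 2, None], [0, 1, 1, 2]))  # None counts as wrong
    print(round(random_baseline, 2))
```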
Results
The following table shows the accuracy (%) of the four models on the task of counting line intersections.
Line width | GPT-4o | Gemini-1.5 Pro | Sonnet-3 | Sonnet-3.5 |
---|---|---|---|---|
0.005 × C | 45.00 | 67.55 | 45.22 | 75.83 |
0.01 × C | 38.22 | 66.33 | 41.61 | 74.88 |
Mean | 41.61 | 66.94 | 43.41 | 75.36 |