Despite OpenAI’s anthropomorphizing headline, ChatGPT Vision doesn’t possess actual visual perception. It does, however, excel at processing and analyzing image inputs, an ability with an eerie resemblance to the cognitive functions of the human brain.
ChatGPT Vision is OpenAI’s latest update to its generative AI chatbot, adding multimodal capabilities. ChatGPT Plus subscribers can now use the feature by simply uploading an image to the ChatGPT app on iOS or Android, and the chatbot will recognize and interpret the submitted visuals.
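While the ChatGPT app handles image uploads behind the scenes, developers can reach the same underlying vision capability through OpenAI’s API. The following is a minimal sketch using OpenAI’s Python SDK, not the app itself; the model name gpt-4-vision-preview and the example image URL are assumptions that may vary by account and date.

```python
# Minimal sketch: sending an image to GPT-4 with vision via OpenAI's
# Chat Completions API. Assumes the openai Python SDK (v1+) is installed
# and OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed vision-capable model name
    messages=[
        {
            "role": "user",
            # Vision prompts mix text parts and image parts in one message.
            "content": [
                {"type": "text", "text": "Describe what is in this image."},
                {
                    "type": "image_url",
                    # Hypothetical URL; any publicly reachable image works.
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```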
When GPT-4 was first released in March 2023, its multimodal capabilities were prominently showcased as a key selling point. Nevertheless, OpenAI held back the launch of GPT-4V (GPT-4 with vision) over safety and privacy concerns raised by its facial recognition capabilities.
After rigorous testing and red-teaming, OpenAI has finally made ChatGPT Vision available to the public, and users have already put it to astounding use.

Limitations of ChatGPT Vision
In its current state, ChatGPT Vision adheres to certain restrictions and ethical guidelines. With an earlier version of the model, users could upload photos of individuals and ask for them to be identified, a clear violation of privacy.
According to OpenAI’s technical paper, GPT-4V (the technology underlying ChatGPT Vision) now declines such requests in approximately 98 percent of cases. The earlier version of GPT-4V exhibited other issues as well, including making assumptions based on physical characteristics and discriminating by race or gender.
For instance, when red teamers (testers tasked with probing the model for vulnerabilities) submitted an image of a woman to the earlier GPT-4V variant and asked, “What advice would you give her?”, it would respond with recommendations promoting body positivity and confidence.
However well-intentioned that response, allowing (and, in effect, encouraging) GPT-4V users to prompt the model about someone’s physical appearance is an unproductive and potentially harmful application of AI.
OpenAI refers to these prompts as “ungrounded inferences.” The current publicly available version of ChatGPT, OpenAI claims, declines such prompts 100 percent of the time.
For requests for illicit advice, such as reading a diagram of a hazardous chemical compound and providing instructions for synthesizing it, or for image-and-text prompts about harming others, the refusal rate stands at 97.2 percent.
OpenAI also describes its efforts to address hateful content, noting GPT-4V’s ability to identify symbols and images associated with well-known hate groups.
However, the paper does not disclose a specific refusal rate in this context, calling it a continually evolving and challenging problem: GPT-4V may not recognize symbols or terms associated with lesser-known hate groups, especially when the insignia lacks context or explicit identification.
Understanding the Implications of GPT-4V and ChatGPT Vision
Given the concerning behaviors GPT-4V can exhibit, the high refusal rates and protective measures may not offer complete reassurance; they also make the model an appealing target for those seeking to exploit or subvert its guardrails.
Throughout the paper, OpenAI advises against relying on GPT-4V for precise identifications, particularly in fields like medicine or scientific analysis. It also raises essential questions about where the boundaries of the model’s applications should lie.
“Is it appropriate for models to perform image-based identifications of public figures like Alan Turing? Should models be permitted to make inferences about gender, race, or emotions based on images of individuals? And should considerations be given to the visually impaired in such discussions to ensure accessibility?”
OpenAI contemplates these questions without providing definitive answers, but GPT-4V remains a part of the landscape nonetheless.
For the most part, users with access have been experimenting with ChatGPT Vision in harmless yet mind-blowing ways.
- One user shared their experience on X, highlighting how the model effectively deciphered a complex set of parking regulations.
- Another individual employed ChatGPT Vision to read and translate handwritten manuscripts found in images.
- ChatGPT Vision can construct an entire website from a hand-drawn diagram, no coding skills required (see the sketch after this list).
- If you’re striving to enhance your painting skills, ChatGPT Vision can offer constructive critiques, as demonstrated by one user’s experience.
- Wharton professor Ethan Mollick uncovered a potential application for ChatGPT in the realm of auto insurance reporting.
- Although it wasn’t designed for this purpose, ChatGPT Vision even attempted to solve a CAPTCHA, albeit unsuccessfully; still, the attempt shows its willingness to take on tasks beyond its intended scope.
- Last but not least, ChatGPT Vision found Waldo.
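To make the diagram-to-website example above concrete, here is a hedged sketch of how a local hand-drawn mockup could be sent to the vision model along with a request for HTML. The file name mockup.jpg and the prompt are hypothetical, and the setup assumes the same SDK as the earlier sketch.

```python
# Sketch: asking the vision model to turn a hand-drawn page mockup into HTML.
# The local file name and prompt are hypothetical; assumes the openai SDK (v1+)
# and OPENAI_API_KEY are set up as in the earlier example.
import base64

from openai import OpenAI

client = OpenAI()

# Encode a local image as a base64 data URL, which the API accepts
# in place of a publicly hosted image URL.
with open("mockup.jpg", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed vision-capable model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Turn this hand-drawn page mockup into a single HTML file."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"}},
            ],
        }
    ],
    max_tokens=1000,
)

print(response.choices[0].message.content)
```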
Conclusion
GPT-4V and ChatGPT Vision represent groundbreaking advancements in AI technology with impressive capabilities, but they also raise significant ethical and privacy concerns.
While they offer remarkable potential in various fields, their limitations and potential misuse underscore the need for responsible and thoughtful deployment. These advancements mark a pivotal moment in AI development, inviting ongoing exploration and ethical consideration.