Comprehensive Usability Analysis Using Userlytics’ ULX® Benchmarking Score
Foreword from our CEO
The evolution of conversational AI represents one of the most significant inventions since the advent of the internet. As we witness this transformation, the question is no longer whether these platforms will reshape how we work and interact with technology, but rather which platforms will deliver the superior user experiences that drive meaningful adoption.
At Userlytics, we’ve dedicated over 15 years to understanding what truly drives exceptional user experiences. Our journey in the remote usability testing space led us to develop the comprehensive ULX® Benchmarking Score, guided by a fundamental belief: that user experience, not just functionality, ultimately determines the success of any digital platform.
This study represents more than a simple comparison of AI platforms. It demonstrates the power of scientific UX measurement in an era where subjective opinions and technical benchmarks often dominate the conversation. By applying our rigorous ULX® methodology to ChatGPT, Claude, and DeepSeek, we reveal the nuanced differences in user perception, trust, and satisfaction that contribute to the adoption and the future of generative AI tools.
What emerged from our research confirms what we’ve long advocated: user experience is multidimensional. A platform may excel in raw capability while failing in trust. Another might impress with visual design while lacking the reliability users demand. Only through comprehensive measurement across all dimensions of the user experience can we truly understand which platforms deliver on their promise.
The findings within this report provide more than academic insight: they offer actionable guidance for organizations navigating the complex landscape of AI platform selection. Whether you prioritize reliability, visual excellence, or analytical depth, the data and methodology presented here will inform better decisions and, ultimately, better user experiences.
As we stand at the threshold of an AI-driven future, let us not forget that technology’s true value lies not in its sophistication but in its ability to serve human needs effectively. This study is our contribution to ensuring that the platforms we adopt and build deliver on that fundamental promise.
We hope you find these insights as valuable as we have in creating them.
Alejandro Rivas-Micoud
CEO & Founder, Userlytics
Introduction: The AI Conversational Platform Landscape
The conversational AI landscape has exploded with innovation, presenting organizations and individuals with an increasingly complex choice: which platform delivers the best user experience for their specific needs?
While technical capabilities often dominate the conversation, the reality is that the user experience can significantly determine adoption, customer satisfaction, and long-term success.
Traditional evaluation methods fall short when assessing these sophisticated platforms. Simple feature comparisons or anecdotal experiences fail to capture the nuanced differences in user perception, trust, and overall satisfaction that drive real-world usage decisions.
The AI Revolution in Context
Understanding the explosive growth and strategic importance of conversational AI in today’s enterprise landscape.
Market Size:
- $14.8B in 2025¹
- $49.9B projected by 2030²
- 23.7% CAGR (2025-2030)⁵
The conversational AI market is growing rapidly, with multiple sources projecting compound annual growth rates (CAGR) of 20% to 25% through 2030.³⁴
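As a quick arithmetic check, the growth rate implied by two endpoint values can be computed directly. The sketch below applies the standard CAGR formula to the two market-size figures quoted above; the function name `cagr` is illustrative. Those two figures are cited from different market reports, so the implied rate should not be expected to match the 23.7% headline figure, which comes from a separate source.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Market-size figures quoted above, in billions of USD.
implied = cagr(14.8, 49.9, years=5)
print(f"Implied CAGR 2025-2030: {implied:.1%}")
```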
Enterprise Adoption Surge
Adoption Statistics:
- 78% of organizations are now using AI⁶
- AI adoption jumped from 55% to 78% in 2024 alone, with large enterprises leading the charge.⁷
- This makes AI the fastest-adopted business technology in history.⁸
Why UX Measurement Matters More Than Ever in AI
Trust & Transparency
AI’s “black box” nature creates unique trust challenges that traditional UX metrics can’t capture.
User Expectations
Conversational AI creates new interaction paradigms requiring novel measurement approaches.
Competitive Differentiation
As capabilities commoditize, user experience becomes a primary differentiator.
Study Objective
This comprehensive study addresses these challenges by applying Userlytics’ proprietary ULX® Benchmarking Score to rigorously evaluate three leading conversational AI platforms across real-world use cases, providing the scientific rigor needed for informed decision-making. The study set out to:
- Scientifically measure the user experience across ChatGPT, Claude, and DeepSeek.
- Identify statistically significant differences in user perception and satisfaction.
- Provide actionable insights for platform selection and improvement.
- Demonstrate the power of Userlytics’ ULX® Benchmarking Score methodology.
What is the ULX® Benchmarking Score?
Traditional UX metrics like System Usability Scale (SUS), Net Promoter Score (NPS) or Customer Satisfaction Score (CSAT) have long served as go-to tools for gauging the usability and appeal of digital products/platforms. However, these methods were built for simpler, more linear user experiences and can fall short in measuring the multifaceted, context-aware, and adaptive nature of generative AI tools.
SUS, for example, focuses almost exclusively on ease of use, while NPS captures loyalty without context. It tells you if someone would recommend the product, but not why. CSAT might give a snapshot of user happiness, but it won’t reveal if the product delivered a unique or memorable experience. For this reason, these tools can’t holistically gauge whether an AI model feels intelligent, trustworthy, or nuanced enough for the task at hand.
With generative AI—as with many digital product journeys today—success is often emotional and experiential. A tool might be functional but uninspiring, accurate but slow, or visually polished but forgettable. Understanding these trade-offs requires a multidimensional approach. Ideally, one that looks beyond the basic yes/no of task completion to examine perception, trust, usability, performance, and emotional resonance all at once.
That’s exactly what the ULX® Benchmarking Score is designed to do. It bridges the gap by capturing not only how well a platform works, but how well it feels and why that matters. By measuring 18 attributes across eight constructs like trust, appeal, performance, and distinction, it delivers a 360° view of the complete user experience.
The ULX® Benchmarking Score Methodology
The ULX score was designed with 18 scientifically-validated attributes organized into 8 comprehensive constructs, each weighted based on statistical impact on overall user experience.
The 8 ULX® Constructs
1. Appeal
The desire to use the platform – how engaging the content is and the overall attractiveness that motivates users to interact with the service.
2. Adequacy
Sufficiency and appropriateness of information – whether the platform meets expectations, provides complete information, and delivers overall satisfaction.
3. Distinction
Uniqueness and valuable features not found elsewhere – what makes the platform better than imagined and sets it apart from competitors.
4. Usability
Ease-of-use and findability of important information – how intuitive navigation is and how efficiently users can complete their tasks.
5. Trust
Credibility of information and data privacy perceptions – how secure users feel sharing their data and trusting the platform’s content.
6. Performance
Speed, responsiveness and technical reliability – the platform’s ability to execute quickly and efficiently without technical hiccups.
7. Affinity
Future usage propensity and recommendation likelihood – whether users would choose this platform again and recommend it to others.
8. Appearance
Visual design quality and modern aesthetics – how users perceive the platform’s looks and overall visual presentation.
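To make the idea of a weighted, construct-based score concrete, here is a minimal sketch of a weighted average over the eight constructs. It is not the ULX® formula: the report does not publish the actual weights or aggregation method, so the equal weights and the `weighted_score` helper below are hypothetical placeholders. The construct scores are ChatGPT's, as reported in the statistical analysis table of this report.

```python
CONSTRUCTS = [
    "Appeal", "Adequacy", "Distinction", "Usability",
    "Trust", "Performance", "Affinity", "Appearance",
]


def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted average of construct scores; weights need not sum to 1."""
    missing = set(scores) - set(weights)
    if missing:
        raise ValueError(f"No weight given for: {sorted(missing)}")
    total_weight = sum(weights[c] for c in scores)
    return sum(scores[c] * weights[c] for c in scores) / total_weight


# ChatGPT construct scores from the results table in this report.
chatgpt = {
    "Appeal": 83, "Adequacy": 83, "Distinction": 79, "Usability": 83,
    "Trust": 80, "Performance": 84, "Affinity": 83, "Appearance": 83,
}

# HYPOTHETICAL weights: equal weighting, used only for illustration.
equal_weights = {c: 1.0 for c in CONSTRUCTS}

print(round(weighted_score(chatgpt, equal_weights), 2))
```

With equal weights this is a plain mean, so it should not be expected to reproduce the reported overall score; the published score reflects weights derived from each construct's statistical impact.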
More Than a Score: A Holistic Approach
This methodology goes beyond isolated ratings by combining structured benchmarking with behavioral context, delivering strategic direction through rich, user-driven insights.
Study Design & Methodology
We created a study using a mixed-methods approach combining quantitative benchmarking with qualitative user insights.
Study Overview:
- 216 Survey Participants
- 24 Qualitative Sessions
- 3 Use Case Scenarios
- 4 English-Speaking Markets
Most participants (90%) completed a 30-minute quantitative survey in which they tried each generative AI tool by working through three tasks, then filled out the ULX Score questionnaire to rate their overall experience.
The remaining 10% completed the same activities in 45-minute unmoderated qualitative sessions, thinking aloud while using the tools, which gave us deeper insight into their experience. We randomized the order in which participants used the tools to prevent ordering effects from skewing the results.
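The order randomization described above can be sketched as follows. This is an illustration only, assuming a simple independent shuffle per participant; the report does not state the exact procedure used, and the `assign_orders` helper and participant IDs are hypothetical.

```python
import random

PLATFORMS = ["ChatGPT", "Claude", "DeepSeek"]


def assign_orders(participant_ids, seed=None):
    """Give each participant an independently shuffled platform order."""
    rng = random.Random(seed)  # seeded generator keeps assignments reproducible
    orders = {}
    for pid in participant_ids:
        order = PLATFORMS[:]   # copy so the master list is never mutated
        rng.shuffle(order)
        orders[pid] = order
    return orders


# Hypothetical participant IDs, for illustration only.
for pid, order in assign_orders(["P001", "P002", "P003"], seed=42).items():
    print(pid, " -> ".join(order))
```

A stricter alternative is full counterbalancing, where the six possible orders are assigned in equal numbers rather than left to chance.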
Testing Scenarios
- Healthcare – Social Media Prevention Campaign
Develop a targeted campaign for yellow fever outbreak response, including key messaging, content strategy, and misinformation management.
- Finance – NVIDIA Stock Analysis Email
Create professional client communication with technical analysis, market trends, and recovery scenarios following specific format requirements.
- Education – Chatbot Development Guidance
Provide beginner-friendly chatbot creation guidance including resources, timeline, tools, and visual project flow for internal presentation.
Participant Criteria
- ✔ Daily generative AI users
- ✔ Paid subscription holders
- ✔ High proficiency levels (4-5/5)
- ✔ English-speaking markets (US, UK, Canada, Australia)
Executive Summary
This executive summary highlights key findings from our comprehensive analysis of ChatGPT, Claude, and DeepSeek using the ULX® Benchmarking Score methodology.
Study Statistics:
- 216 Survey Participants
- 24 Qualitative Sessions
- 18 ULX Attributes Measured
- 8 UX Constructs Analyzed
Key Findings:
Overall Winner
ChatGPT leads with a ULX® score of 83, demonstrating its position as the reliable generalist that excels across all dimensions while benefiting from strong brand recognition and first-mover advantage.
Statistical Significance
Meaningful differences were identified using t-tests, particularly between ChatGPT and DeepSeek in Distinction, Performance, Affinity, and Appearance constructs (p<0.05).
Visual Excellence
Claude scores 81, distinguishing itself through superior visual presentation and UI design, creating genuine ‘wow moments’ by generating actual visualizations rather than just code.
Key Differentiators
Each platform excels in different areas: ChatGPT for reliability, Claude for visual design, and DeepSeek for analytical depth, offering clear guidance for use case selection.
Sophisticated Analysis
DeepSeek achieves 79 with the most detailed responses, featuring transparent thinking processes and specialist-level analysis, though facing trust challenges in Western markets.
Actionable Insights
All platforms scored between 79 and 83, indicating strong overall performance while revealing specific areas for improvement and competitive positioning opportunities.
Benchmark Results at a Glance
While all platforms performed well score-wise, statistically significant differences reveal distinct strengths and positioning in the conversational AI landscape.
Platform Rankings
#1 Overall Winner – ChatGPT (83)
The versatile generalist that excels across all dimensions
Key Strengths:
- Top-of-mind brand awareness
- Highest distinction score (79 vs 73 DeepSeek)
- Superior performance perception (84 vs 78 DeepSeek)
- Strong affinity ratings (83 vs 79 Claude)
#2 Close Second – Claude (81)
The visual virtuoso with exceptional UX presentation
Key Strengths:
- Superior visual presentation and UI design
- Generates actual visualizations (not just code)
- Professional, modern typography
- Polished, attractive interface
#3 The Specialist – DeepSeek (79)
The sophisticated analyst with a transparent thinking process
Key Strengths:
- Most sophisticated and detailed results
- Transparent thinking process display
- Excellent context retention
- Specialist-level analysis depth
Platform Deep Dive: Strengths & Positioning
Understanding each platform’s unique value proposition and competitive advantages.
ChatGPT: The Reliable Generalist
ChatGPT maintains its position as the “Swiss Army knife” of conversational AI, delivering consistent quality across all use cases while benefiting from strong brand recognition and first-mover advantage.
“My main impression is that ChatGPT provided planning and execution steps with the goal, target audience, content strategy and formats, platform strategy, and posting schedule. I think that’s a very comprehensive approach.”
“Overall, I’m very satisfied with ChatGPT. It’s something I will use on a daily basis, as it kind of helps me get started in providing an outline. I think that’s what ChatGPT is really good at—providing that base that you can build on top of.”
Key Insight: Top-of-mind awareness drives preference, with statistically significant advantages in Distinction, Performance, Affinity, and Appearance over DeepSeek.
Claude: The Visual Virtuoso
Claude creates genuine “wow moments” through superior visual presentation and UI design, going beyond text to create actual visualizations and professional-grade outputs.
“It has given [me] so much to work with—from campaign planning and execution to content strategy, partnerships, and best practices for public health communication. It covers challenges, solutions, measurement, and adaptation. It’s a really solid, comprehensive set of insights I can build on with my teammates.”
“Wow, it gives me the full campaign planning and execution. The content strategies, the call to action, best practices, challenges, and solutions—it all feels really complete. That’s crazy.”
Key Differentiator: Only platform that generated actual project diagrams and formatted emails in separate windows, creating tangible value for business users.
DeepSeek: The Sophisticated Analyst
Despite being the newest platform, DeepSeek impresses with its transparent thinking process and sophisticated analysis, though it faces challenges with trust and brand awareness in Western markets.
“From a glance, I feel like DeepSeek has provided a more comprehensive email outline, overview of performance, key drivers, analysis insights, short-term, long-term scenarios… it really provides a lot of foresight too.”
“I feel like DeepSeek gives a pretty similar outline, but it also includes a lot of good resources, like a sample project on GitHub and a project visualization flow. I think this version from DeepSeek is more comprehensive.”
Unique Feature: Shows its thinking process transparently, breaking down analysis steps – perceived as delivering more sophisticated results than competitors.
Statistical Analysis: Where Differences Matter
Understanding which platform advantages are statistically significant and actionable.
ULX Construct | ChatGPT | Claude | DeepSeek | Significant Differences |
Appeal | 83 | 80 | 80 | None |
Adequacy | 83 | 82 | 80 | None |
Distinction | 79 | 78 | 73 | ChatGPT > DeepSeek (p<0.05) |
Usability | 83 | 82 | 82 | None |
Trust | 80 | 79 | 76 | None |
Performance | 84 | 82 | 78 | ChatGPT > DeepSeek (p<0.05) |
Affinity | 83 | 79 | 76 | ChatGPT > Claude (p<0.01); ChatGPT > DeepSeek (p<0.01) |
Appearance | 83 | 80 | 78 | ChatGPT > DeepSeek (p<0.05) |
Key Statistical Insights
Largest Gaps (5+ points)
- Trust: “I trust this website with my data” (+6 ChatGPT vs DeepSeek)
- Performance: “Website is quick and responsive” (+5 ChatGPT vs DeepSeek)
- Affinity: “Would be my preferred choice” (+9 ChatGPT vs DeepSeek)
Pattern Analysis
- DeepSeek scored lowest across all 18 individual questions
- Main differences consistently between ChatGPT and DeepSeek
- Claude-ChatGPT differences only significant in Affinity
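The significance column in the table above is based on t-tests. As a rough illustration of the underlying calculation, the sketch below computes Welch's t statistic and its degrees of freedom for two sets of ratings using only the Python standard library. The report does not say which t-test variant was used, so Welch's unequal-variance form is an assumption here, and the ratings are made-up placeholder values, not study data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


# HYPOTHETICAL 0-100 ratings for one construct, for illustration only.
platform_a = [85, 90, 78, 88, 92, 80, 86, 84]
platform_b = [75, 82, 70, 79, 85, 72, 77, 74]

t_stat, dof = welch_t(platform_a, platform_b)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The p-value is then read from the t distribution with `dof` degrees of freedom. Because each participant used all three tools, a paired test on per-participant differences would be another reasonable choice.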
Sample ULX® Benchmarking Report
Here’s a look at a sample ULX® Benchmarking Report based on this study. This excerpt offers a glimpse into the type of insights, analysis, and deliverables our clients receive.
Implications for AI Platform Selection
Strategic guidance for choosing the right conversational AI platform based on your specific needs.
Choose ChatGPT When:
- Reliability is paramount – Consistent quality across diverse use cases
- Brand trust matters – Well-established reputation with users
- General-purpose needs – Versatile generalist approach
- User adoption is critical – Highest preference scores
Choose Claude When:
- Visual presentation is key – Superior UI and output design
- Professional deliverables needed – Creates presentation-ready outputs
- User experience matters – Polished, modern interface
- Wow factor is important – Creates memorable user moments
Choose DeepSeek When:
- Analytical depth is crucial – Most sophisticated responses
- Transparency is valued – Shows thinking process
- Context retention matters – Excellent instruction following
- Specialist knowledge needed – Detailed, nuanced analysis
Conclusion
This study shows that user trust, perceived intelligence, visual polish, and emotional resonance shape user preference just as much as raw performance. Each LLM studied brings different strengths, but the nuances in how users engage with these tools reveal the deeper story.
The ULX® Benchmarking Score provides organizations with a scientific, comprehensive approach to uncover these insights, making abstract user sentiment measurable and actionable.
Key Findings Summary
- All platforms perform well (79-83 range), indicating the maturity of the conversational AI market.
- Distinct positioning emerges: ChatGPT as the reliable generalist, Claude as the visual virtuoso, DeepSeek as the analytical specialist.
- Trust and brand awareness remain significant factors, particularly affecting newer platforms in Western markets such as DeepSeek.
- Visual presentation and UX design create meaningful differentiation and user preference.
- Statistical significance validates that perceived differences translate to measurable user experience gaps.
Discover How Your Product Performs
This comprehensive analysis demonstrates the power of scientific UX measurement to reveal where a product excels and where it needs improvement.
For more information about Userlytics’ ULX® Benchmarking Score methodology or to commission your own study, contact our research team.
References
¹ Fortune Business Insights. (2024). Conversational AI Market Size, Share & Trends. Available online.
² MarketsandMarkets. (2024). Conversational AI Market worth $49.9 billion by 2030. Available online.
³ Grand View Research. (2025). Market to be worth $41.39 Billion by 2030 at CAGR 23.7%. Available online.
⁴ IMARC Group. (2024). Conversational AI Market Size, Share and Growth to 2033. Available online.
⁵ Global Market Insights. (2024). Conversational AI Market Size, Growth Analysis 2024-2032. Available online.
⁶ McKinsey. (2025). The state of AI: How organizations are rewiring to capture value. Available online.
⁷ Sullivan, D. (2024). AI use jumps to 78% among businesses as costs drop. Available online.
⁸ Forbes Technology Council. (2023). Suddenly AI: The fastest adopted business technology in history. Forbes. Available online.
Author:
Liliana Camacho
Liliana leads content marketing initiatives across Userlytics. With over a decade of experience in B2B SaaS and a focus on content writing, she’s passionate about the craft of corporate storytelling and thought leadership. She holds a degree in English Language and Literature from Western University in Canada.