Large Language Model as an Assignment Evaluator

A Comprehensive Analysis of the Effectiveness of Large Language ...

Lastly, we study the impact of different ensemble strategies on the dialogue evaluation performance, including dimension-level and model-level ensembles.

Large Language Models for Explainability in Machine Learning

We investigate the potential of large language models (LLMs) in explainable artificial intelligence (XAI) by examining their ability to ...

HELM Lite - Holistic Evaluation of Language Models (HELM)

The Holistic Evaluation of Language Models (HELM) serves as a living benchmark for transparency in language models. Providing broad coverage and recognizing ...

Scale AI to set the Pentagon's path for testing and evaluating large ...

Experts at Scale AI will adopt a similar approach for T&E with large language models, but because they are generative in nature and the English ...

Evaluating Large Language Models Generated Contents with ...

It's been an eternity since I last endured Dr. Andrew Ng's sermon on evaluation strategies and metrics for scrutinizing the AI-generated ...

Applying Large Language Models to Enhance the Assessment of ...

Evaluation approach and research contributions. To evaluate our approach, we assessed ChatGPT-4's performance in grading programming assignments and compared ...

Automated evaluation of retrieval-augmented language models with ...

We propose a new method to measure the task-specific accuracy of Retrieval-Augmented Large Language Models (RAG). Evaluation is performed by scoring the RAG ...

Successful language model evals - Jason Wei

Everybody uses evaluation benchmarks (“evals”), but I think they deserve more attention than they are currently getting.

What Is Language Modeling? | Definition from TechTarget

Large language models (LLMs) also use language modeling. These are advanced language models, such as OpenAI's GPT-3 and Google's PaLM 2, that handle ...

Redefining Evaluation: Towards Generation-Based Metrics for ...

The exploration of large language models (LLMs) has significantly advanced the capabilities of machines in understanding and generating ...

Evaluation of Large Language Model Performance and Reliability ...

Background: Large language models (LLMs) have gained prominence since the release of ChatGPT in late 2022. Objective: The aim of this study ...

On Characterizations of Large Language Models and Creativity ...

We conclude that LLMs are beyond “mere generation” and perceivable as creative, but we may need to reassess some frameworks for creativity evaluation.

A HOLISTIC APPROACH FOR TEST AND EVALUATION OF LARGE ...

As large language models (LLMs) become increasingly prevalent in diverse applications, ensuring the utility and safety of model generations becomes paramount.

Can Large Language Models Assess Personality From ...

In particular, the swift advancement of Large Language Models (LLMs) has significantly decreased the cost and technical barrier to developing AI ...

What Should Data Science Education Do With Large Language ...

The rapid advances of large language models (LLMs), such as ChatGPT, are revolutionizing data science and statistics. These state-of-the-art ...

Evaluating Large Language Models at Evaluating Instruction ...

As research in large language models (LLMs) continues to accelerate, LLM-based evaluation has emerged as a scalable and cost-effective alternative to human ...

Evaluating Language-Model Agents on Realistic Autonomous Tasks

We have just released our first public report. It introduces methodology for assessing the capacity of LLM agents to acquire resources, create ...

Transforming Assessment: The Impacts and Implications of Large ...

Applying cutting-edge large language models (LLMs) and generative AI to assessment holds great promise in boosting efficiency, mitigating bias, ...

A Large Language Model Approach to Educational Survey ...

This paper assesses the potential for the large language models (LLMs) GPT-4 and GPT-3.5 to aid in deriving insight from education feedback surveys.

Main Conference - EMNLP 2024

Fine-Tuning Large Language Models to Translate: Will a Touch of Noisy Data in Misaligned Languages Suffice? ... Can Large Language Models Always ...