Intelligent Food Portioning System Using Vision-Language-Action (VLA) Models for Small-Scale Food Operations

Yang, Ran; Cai, Siwei; Zhou, Lifeng; Feng, Yiming

Files

Accepted version (2 MB)

Date

2026-01

Publisher

Elsevier

Abstract

The food industry faces significant barriers to adopting automation because most food service, retail, and processing operations are small businesses that lack the financial capacity to invest in conventional industrial automation systems. Food portioning is a fundamental operation across food industry sectors, yet it remains highly labor-intensive for small businesses. Existing automated portioning systems are generally designed for single-product, large-scale processing, which makes them financially prohibitive and operationally inflexible for small-scale operations with diverse product requirements. Advances in artificial intelligence (AI) offer a promising route to cost-effective automation for small food businesses, with adaptable solutions capable of handling multiple food types flexibly and precisely. This study proposes an AI-driven, low-cost food portioning framework as a proof-of-concept solution that integrates weight sensing with vision-language-action (VLA) control to enable adaptable handling of diverse food products. The system employs You-Only-Look-Once (YOLO)-based vision models to interpret digital scale readings while coordinating robotic picking mechanisms that transfer food items until the target weight is reached. Three vision-language models, namely Action Chunking with Transformers (ACT), OpenVLA with Optimized Fine-Tuning (OpenVLA-OFT), and π0, were evaluated on shrimp (30 g), grapes (50 g), and garlic (20 g), demonstrating adaptability across diverse food types. The π0 model achieved a 100% success rate using only 30–50 demonstrations per food type and demonstrated efficient operational performance (e.g., 15.23 seconds to portion 30 g of shrimp). This framework demonstrates the potential for adaptive automation in small-scale food businesses, providing a preliminary foundation that addresses single-product automation limitations in food packaging, distribution, and service operations.
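
The weight-feedback loop described in the abstract (transfer items until the scale reading reaches the target) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names portion_until_target, read_weight_g, pick_one_item, and SimulatedScale are hypothetical, the scale is simulated rather than read by a YOLO model, and the pick step stands in for a learned robot policy. The max_picks safety limit is also an assumption.

```python
from typing import Callable, Tuple


def portion_until_target(
    read_weight_g: Callable[[], float],   # hypothetical: returns the current scale reading in grams
    pick_one_item: Callable[[], None],    # hypothetical: triggers one pick-and-place action
    target_g: float,
    max_picks: int = 50,                  # assumed safety limit, not taken from the paper
) -> Tuple[float, int]:
    """Transfer items one at a time until the scale reads at least target_g."""
    picks = 0
    weight = read_weight_g()
    while weight < target_g and picks < max_picks:
        pick_one_item()
        picks += 1
        weight = read_weight_g()  # re-read the scale after every transfer
    return weight, picks


class SimulatedScale:
    """Stand-in for the real scale and robot: each pick adds the next item's weight."""

    def __init__(self, item_weights_g):
        self._items = list(item_weights_g)
        self.weight_g = 0.0

    def add_next_item(self) -> None:
        if self._items:
            self.weight_g += self._items.pop(0)

    def read(self) -> float:
        return self.weight_g


# Illustrative item weights only; these are not measurements from the study.
scale = SimulatedScale([8.0, 7.5, 9.0, 8.5, 7.0])
final_weight, picks = portion_until_target(scale.read, scale.add_next_item, target_g=30.0)
print(final_weight, picks)  # 33.0 4
```

In this sketch the loop stops at the first reading at or above the target, so the final portion can overshoot by up to one item's weight; how the actual system handles overshoot is not stated in the abstract.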

Persistent link

https://hdl.handle.net/10919/140900

Collections

All Faculty Deposits
Scholarly Works, Virginia Agricultural Experiment Station
