Intelligent Food Portioning System Using Vision-Language-Action (VLA) Models for Small-Scale Food Operations
Abstract
The food industry faces significant barriers to adopting automation because most food service, retail, and processing operations are small businesses that lack the financial capacity to invest in conventional industrial automation systems. Food portioning is a fundamental operation across food industry sectors, yet it remains highly labor-intensive for small businesses. Existing automated portioning systems are generally designed for single-product, large-scale processing, which makes them financially prohibitive and operationally inflexible for small-scale operations with diverse product requirements. Advances in artificial intelligence (AI) offer a promising route to cost-effective automation for small food businesses, with adaptable systems capable of handling multiple food types flexibly and precisely. This study proposes an AI-driven, low-cost food portioning framework as a proof of concept that integrates weight sensing with vision-language-action (VLA) control to enable adaptable handling of diverse food products. The system uses You-Only-Look-Once (YOLO)-based vision models to interpret digital scale readings while coordinating robotic picking mechanisms that transfer food items until the target weight is reached. Three models, namely Action Chunking with Transformers (ACT), OpenVLA with Optimized Fine-Tuning (OpenVLA-OFT), and π0, were evaluated on shrimp (30 g), grapes (50 g), and garlic (20 g), demonstrating adaptability across diverse food types. The π0 model achieved a 100% success rate using only 30–50 demonstrations per food type and operated efficiently (e.g., 15.23 seconds to portion 30 g of shrimp). The framework demonstrates the potential for adaptive automation in small-scale food businesses and provides a preliminary foundation for overcoming the single-product limitation of existing automation in food packaging, distribution, and service operations.
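The weight-feedback loop described in the abstract (transfer items until the scale reading reaches the target) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `portion_until_target`, `read_weight_g`, `transfer_one_item`, and `SimulatedScale` are hypothetical, the item weights are made-up values, and the YOLO-based scale reader and the learned picking policy are stood in for by plain callables.

```python
from typing import Callable, List, Tuple


def portion_until_target(
    read_weight_g: Callable[[], float],
    transfer_one_item: Callable[[], None],
    target_g: float,
    max_picks: int = 100,
) -> Tuple[float, int]:
    """Transfer items one at a time until the scale reads at least target_g.

    read_weight_g stands in for the vision-based scale reader and
    transfer_one_item for one pick-and-place action of the robot policy.
    Both are assumptions made for illustration only.
    """
    picks = 0
    weight = read_weight_g()
    # max_picks is a safety bound so the loop ends even if picks keep failing.
    while weight < target_g and picks < max_picks:
        transfer_one_item()
        picks += 1
        weight = read_weight_g()
    return weight, picks


class SimulatedScale:
    """Toy stand-in for the scale and robot: each pick adds the next item weight."""

    def __init__(self, item_weights_g: List[float]) -> None:
        self._items = list(item_weights_g)
        self.total_g = 0.0

    def read(self) -> float:
        return self.total_g

    def add_next_item(self) -> None:
        if self._items:
            self.total_g += self._items.pop(0)


# Hypothetical item weights in grams; the 30 g target mirrors the shrimp task.
scale = SimulatedScale([8.0, 7.5, 9.0, 8.5, 7.0])
final_weight, picks = portion_until_target(
    scale.read, scale.add_next_item, target_g=30.0
)
print(final_weight, picks)
```

In this sketch the loop stops at the first reading at or above the target, so the final portion can overshoot by up to one item's weight. How the actual system handles overshoot or tolerance is not stated in the abstract.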