Feature Request: Efficient Chat Bundling for LLM API Calls in High-Traffic Environments
Estelle Pienaar
Summary: In high-traffic environments with multiple users (such as clubs or group activities), the current system sends a separate LLM API call for each chat message. This results in excessive API usage, increased operational costs, and less natural AI character behavior. I propose implementing a dynamic chat-bundling mechanism that aggregates chat messages before sending them to the LLM, based on both time intervals and message volume thresholds. This would optimize resource usage, reduce costs, and improve the realism of AI character interactions.
Problem Statement
- Resource Waste: Rapid, individual API calls for every chat message in group settings cause unnecessary API usage and increased costs.
- Unnatural AI Responses: AI characters responding to each individual message can disrupt conversation flow and detract from user experience.
- Scalability Issues: As the number of users in an SL region grows, the current behavior does not scale efficiently.
Proposed Solution: Implement a smart chat-bundling system that operates as follows:
- Dynamic Interval and Thresholds: Aggregate chat messages over either a set interval (e.g., 30, 45, or 60 seconds) or until a minimum message threshold is reached (e.g., 3, 5, or 10 messages), whichever comes first, or when a contextual trigger occurs (see below).
- User Count Awareness: The bundling strategy adapts based on the number of users in the open chat:
  3–8 users: Send a bundle every 30 seconds or after 3 messages.
  9–14 users: Every 45 seconds or after 5 messages.
  15+ users: Every 60 seconds or after 10 messages.
(If no message at all has been written by anyone during the interval, an empty bundle could still be sent to the LLM, giving the AI character a chance to boost chat interaction by initiating a conversation, which could be very useful for AI characters in a club setting.)
- Owner and Direct Mentions Override: If the AI character’s owner speaks or someone mentions the AI character by name, immediately forward the current message bundle to the LLM to ensure timely, context-appropriate responses.
- Automatic Mode Switching: If other people leave and only the owner remains present, revert to immediate/individual messaging mode for responsiveness. (A rough sketch of this overall policy follows below.)
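To make the proposal above more concrete, here is a minimal Python sketch of how such a bundling policy could work. Everything in it is an illustrative assumption on my part (the tier values, the ChatBundler class, the function names); it is not an existing LL or Convai API, just a way to show that the logic is simple.

```python
import time
from dataclasses import dataclass, field

# Tier table: (min_users, max_users, interval_seconds, message_threshold).
# The numbers mirror the values proposed above; they are placeholders, not fixed.
TIERS = (
    (3, 8, 30, 3),
    (9, 14, 45, 5),
    (15, None, 60, 10),
)

def params_for(active_users):
    """Return (interval, threshold) for the current user count, or None for
    immediate/individual messaging mode (fewer than 3 users present)."""
    for lo, hi, interval, threshold in TIERS:
        if active_users >= lo and (hi is None or active_users <= hi):
            return interval, threshold
    return None

@dataclass
class ChatBundler:
    owner_name: str
    character_name: str
    buffer: list = field(default_factory=list)
    window_start: float = field(default_factory=time.monotonic)

    def on_message(self, sender, text, active_users):
        """Return a bundle of (sender, text) pairs to forward to the LLM now,
        or an empty list if the message should keep waiting in the buffer."""
        params = params_for(active_users)
        if params is None:
            return [(sender, text)]  # immediate mode: only the owner is around
        interval, threshold = params
        self.buffer.append((sender, text))
        owner_spoke = sender == self.owner_name
        mentioned = self.character_name.lower() in text.lower()
        timed_out = time.monotonic() - self.window_start >= interval
        if owner_spoke or mentioned or len(self.buffer) >= threshold or timed_out:
            return self._flush()
        return []

    def on_timer(self, active_users):
        """Called periodically. Returns None while the window is still open;
        otherwise returns the bundle, which may be empty. An empty bundle is
        the cue for the character to initiate chat on its own."""
        params = params_for(active_users)
        if params is None:
            return None
        interval, _ = params
        if time.monotonic() - self.window_start >= interval:
            return self._flush()
        return None

    def _flush(self):
        bundle, self.buffer = self.buffer, []
        self.window_start = time.monotonic()
        return bundle
```

In practice, a caller would feed every open-chat line into on_message() and poll on_timer() a few times per minute; whatever non-empty (or deliberately empty) bundle comes back is what gets forwarded to the LLM as a single request.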
Technical Considerations
- User Presence Detection: The AI Characters would need a reliable method for counting active users within open chat range.
- Configurable Parameters: Thresholds and intervals should be easily adjustable (perhaps even, within a certain range, by users) and, ideally, dynamic based on real-time chat activity (see the small sketch after this list).
- Contextual Triggers: Monitoring for owner speech or direct name mentions should take precedence to maintain engagement and responsiveness.
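As a small illustration of the "configurable parameters" point, the snippet below shows one way user-supplied overrides could be kept within an allowed range. The parameter names and limits are invented for the example and are not an existing setting.

```python
# Hypothetical allowed ranges for user-adjustable bundling parameters.
ALLOWED_RANGES = {
    "interval_seconds": (15, 120),
    "message_threshold": (2, 20),
}

def apply_user_overrides(defaults: dict, overrides: dict) -> dict:
    """Merge user-supplied overrides into the defaults, clamped to allowed ranges."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if key not in ALLOWED_RANGES:
            continue  # ignore unknown parameters
        lo, hi = ALLOWED_RANGES[key]
        merged[key] = max(lo, min(hi, value))
    return merged

# Example: a user asks for extreme values; the result stays within the limits.
print(apply_user_overrides(
    {"interval_seconds": 45, "message_threshold": 5},
    {"interval_seconds": 300, "message_threshold": 1},
))
# -> {'interval_seconds': 120, 'message_threshold': 2}
```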
Expected Benefits
- Cost Reduction: Significantly fewer LLM API calls, lowering operational expenses.
- Improved User Experience: More natural, context-aware AI responses that mirror real user behavior.
- Scalability: The AI Character system becomes more robust in high-traffic environments, ensuring sustainable performance as user interaction in a region grows.
Summary
By batching chat messages and dynamically adjusting the sending strategy based on group size and activity, LL could deliver a more authentic and cost-effective AI character experience. This approach would align with both user expectations and business goals.
Estelle Pienaar
Open chat participation of AI Characters has been completely "nerfed" this week, and some use cases are no longer possible. Therefore, before any such API chat bundling is implemented, a user setting for chat participation should be created: https://feedback.secondlife.com/character-designer/p/feature-request-customizable-group-chat-participation-setting-for-ai-characters
AlphaStud Resident
Support. I had proposed something similar, but less detailed than yours: https://feedback.secondlife.com/character-designer/p/add-response-time-or-line-count-delay-option-for-messages-sent-to-characters
I believe what we're suggesting is the best way to tackle the issue. I hope they'll consider it. It's not too difficult to implement either, and I've proven it works.
Estelle Pienaar
AlphaStud Resident Oh no, I tell people to keep older proposals in mind, and now I didn't think of yours. Sorry!
I think this proposal would probably best be implemented at the level of Convai itself, because it would benefit all of their collaborations and applications. But they would need the detection of the group setting, so they can't do it alone either.
Maybe it's easier to do the aggregating on the LL side and to send the collected messages, just like a normal single chat contribution, via Convai to the LLM.
In any case, it's probably tricky to implement due to the different parties involved, but with a potentially high benefit.