Do you know exactly what your LLM apps are answering in production?
Because LLM output is difficult to predict, track accuracy and quality daily to ensure reliability. Teammate Intel helps bring your LLM app to production-ready quality.
If monitoring reveals a problem, identify the cause and fix it quickly with the four steps below.
1. Identify logs with low quality scores
Look at graphs such as accuracy score and coherence score to find the logs with low scores. Teammate Intel supports a wide variety of metrics.
2. Analyze the contents of logs to identify the causes
Review the contents of the low-scoring logs and identify patterns and causes. For example, the prompt instructions may not be conveyed well, or the data sources used for RAG may be insufficient.
3. Improve prompts on "Lang" and RAGs on "Aug"
Address the cause on "Teammate Lang" and "Teammate Aug". The iterative framework of Teammate AI Services makes it easy to improve prompts and RAGs.
4. Deploy with traffic splitting on "Lang"
Publish the improved version to 50% or less of the traffic on "Teammate Lang", and compare the results on "Intel". If the new version wins, publish it fully; if not, improve it and try again.
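To make the loop above concrete, here is a minimal sketch of steps 1 and 4 in plain Python. It does not use any Teammate API: the `LogRecord` fields, the `version` labels, and the 0.6 threshold are all assumptions made for illustration, and scores are assumed to fall between 0 and 1.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LogRecord:
    """Hypothetical shape of one scored log entry (not the Intel schema)."""
    log_id: str
    version: str      # assumed label, e.g. "current" or "candidate"
    accuracy: float   # assumed to be in [0, 1]
    coherence: float  # assumed to be in [0, 1]


def low_score_logs(logs, threshold=0.6):
    """Step 1: keep logs whose worst metric falls below the threshold."""
    return [r for r in logs if min(r.accuracy, r.coherence) < threshold]


def mean_by_version(logs, metric="accuracy"):
    """Step 4: average one metric per deployed version."""
    scores = {}
    for r in logs:
        scores.setdefault(r.version, []).append(getattr(r, metric))
    return {version: mean(values) for version, values in scores.items()}


def pick_winner(logs, metric="accuracy", baseline="current", candidate="candidate"):
    """Return the candidate only if its mean score beats the baseline."""
    means = mean_by_version(logs, metric)
    return candidate if means[candidate] > means[baseline] else baseline


logs = [
    LogRecord("a", "current", 0.90, 0.80),
    LogRecord("b", "current", 0.50, 0.90),
    LogRecord("c", "candidate", 0.95, 0.90),
    LogRecord("d", "candidate", 0.85, 0.40),
]
flagged = [r.log_id for r in low_score_logs(logs)]
best = pick_winner(logs, metric="accuracy")
```

Comparing raw means is the simplest possible rule. With little traffic on the new version, a statistical significance check before publishing fully would be a reasonable addition.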
Choose which metrics to collect and how to visualize them on dashboards. Customizable dashboards let you focus on the metrics you monitor daily.
If you use "Lang", "Aug" and "Infer", their data is automatically connected to "Intel". If you don't, you can connect your own data to "Intel" through the collection APIs.
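As a rough illustration of sending your own data, the sketch below builds an HTTP request carrying one log entry as JSON. The collection API itself is not described here, so the endpoint URL, the header, and every payload field name are placeholders rather than the real interface. Check the product documentation for the actual schema.

```python
import json
import urllib.request

# Placeholder endpoint: the real collection API URL is not given in this text.
COLLECT_URL = "https://example.invalid/collect"


def build_log_request(prompt, response, app_name, url=COLLECT_URL):
    """Build (but do not send) a POST request with one hypothetical log entry."""
    payload = {
        "app": app_name,       # assumed field name
        "prompt": prompt,      # assumed field name
        "response": response,  # assumed field name
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_log_request("What is RAG?", "Retrieval-augmented generation.", "faq-bot")
# Sending would be urllib.request.urlopen(req) once a real URL and credentials are set.
```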