Log parsing, the process of transforming unstructured log data into a structured format, remains a vital task for diagnosing failures in large online systems. Researchers Jinrui Sun, Tong Jia, Minghua He, and Ying Li from Peking University present VarParser, a novel approach that addresses a significant limitation of current large language model (LLM)-based log parsers: their neglect of the variable components within log messages. The work moves beyond a constant-centric strategy, demonstrating that actively utilising variable data improves log grouping, reduces LLM processing costs, and, crucially, preserves valuable system information often lost in traditional parsing methods. VarParser combines variable contribution sampling, a variable-centric parsing cache, and adaptive in-context learning to achieve higher accuracy and efficiency than existing techniques, as evidenced by extensive evaluations on large-scale datasets.
Log parsing, a critical first step in tasks like anomaly detection and failure diagnosis, extracts structured events from vast amounts of unstructured log data.
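To make the task concrete, here is a minimal sketch of what log parsing produces — not the paper's algorithm, just an illustration using a hand-written regex: variable-looking tokens are abstracted into `<*>` placeholders, yielding a constant template plus the variable values.

```python
import re

# Tokens that look like variables (IP addresses, then plain numbers).
# The ordering matters: the IP alternative must be tried first.
VAR_PATTERN = re.compile(r"\d+\.\d+\.\d+\.\d+|\d+")

def parse(message: str):
    """Split a raw log message into a constant template and variable values."""
    variables = VAR_PATTERN.findall(message)
    template = VAR_PATTERN.sub("<*>", message)
    return template, variables

template, variables = parse("Connection from 10.0.0.5 closed after 342 ms")
print(template)   # Connection from <*> closed after <*> ms
print(variables)  # ['10.0.0.5', '342']
```

A constant-centric parser cares only about the template line; the point of this paper is that the discarded `variables` list is itself worth exploiting.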
The research addresses limitations in current large language model (LLM)-based log parsers, which predominantly focus on the constant parts of logs while overlooking the valuable contributions of variable information. This constant-centric approach leads to inefficient log grouping, a high number of LLM invocations, increased costs, and a loss of system visibility.
The team achieved this breakthrough by introducing variable contribution sampling, a variable-centric parsing cache, and adaptive variable-aware in-context learning. These innovations enable efficient capture of variable log parts and their effective use in the parsing process. By introducing variable units, the researchers preserve rich variable information, thereby enhancing the integrity of the resulting log parsing results.
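The "variable unit" idea can be pictured with a small data structure — the naming and fields below are our own illustration, not the paper's definition: instead of emitting a placeholder-only template, each parsed result keeps the concrete variable values alongside their slots.

```python
from dataclasses import dataclass

# Hypothetical sketch of a "variable unit": unlike placeholder-only output,
# each parsed variable keeps its value (and slot position) next to the template.
@dataclass
class VariableUnit:
    position: int   # index of the <*> slot in the template
    value: str      # concrete value observed in the raw message

@dataclass
class ParsedLog:
    template: str
    variables: list

placeholder_only = "Disk <*> usage at <*>%"   # loses system visibility
with_units = ParsedLog(
    template="Disk <*> usage at <*>%",
    variables=[VariableUnit(0, "/dev/sda1"), VariableUnit(1, "92")],
)
print(with_units.variables[0].value)  # /dev/sda1 — the value survives parsing
```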
This approach moves beyond simply identifying placeholders, retaining crucial system details embedded within the variable data. Experiments demonstrate that VarParser surpasses existing methods in accuracy, significantly improving parsing efficiency and reducing the costs associated with LLM invocations.
The study reveals that by focusing on variables, the system can more effectively group and sample logs, reducing the need for repeated LLM calls. This variable-centric strategy not only enhances the precision of log analysis but also offers a more cost-effective solution for managing the ever-increasing volume of log data generated by modern online services. The work opens new avenues for more insightful and efficient system monitoring and failure diagnosis.
Variable contribution sampling and adaptive in-context learning show promising results for efficient log parsing
Researchers developed VarParser, a variable-centric log parsing strategy to address limitations in existing large language model (LLM)-based approaches. Current methods primarily focus on constant log components, leading to inefficiencies in log grouping and sampling, increased LLM invocations, and high invocation costs.
The study pioneers a method that efficiently captures and leverages variable parts of logs, enhancing parsing accuracy and reducing computational demands. Scientists implemented variable contribution sampling to identify and prioritise log messages containing informative variable data. This technique enables more effective log grouping and reduces the number of LLM calls required for parsing.
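The intuition behind contribution-style sampling can be sketched as follows — the paper's exact scoring function is not reproduced here; this is a simple proxy in which messages whose variable-looking tokens are rare across the batch are assumed to carry more new information for the LLM.

```python
from collections import Counter

def looks_variable(token: str) -> bool:
    # crude heuristic: digits or path separators suggest a variable token
    return any(ch.isdigit() for ch in token) or "/" in token

def sample_informative(messages, k):
    """Rank messages by the rarity of their variable tokens; keep the top k."""
    freq = Counter(
        tok for msg in messages for tok in msg.split() if looks_variable(tok)
    )
    def score(msg):
        var_toks = [t for t in msg.split() if looks_variable(t)]
        # rarer variable tokens -> higher contribution score
        return sum(1.0 / freq[t] for t in var_toks)
    return sorted(messages, key=score, reverse=True)[:k]

logs = [
    "user 42 logged in",
    "user 42 logged in",
    "mount /data/archive-7 failed",
]
print(sample_informative(logs, 1))  # the rare mount failure ranks first
```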
A variable-centric parsing cache was engineered to store and reuse parsed variable units, further minimising redundant LLM invocations. The team also developed adaptive variable-aware in-context learning, allowing the LLM to better understand and incorporate variable information during the parsing process.
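A parsing cache of this kind can be illustrated with a short sketch (the keying scheme below is our own assumption, not the paper's): by abstracting variables out of the message before lookup, a template parsed once by the expensive LLM call is reused for every later message that shares its shape.

```python
import re

class ParsingCache:
    """Cache parsed templates, keyed on the message with variables abstracted."""

    def __init__(self, llm_parse):
        self._cache = {}
        self._llm_parse = llm_parse  # expensive fallback, e.g. an LLM call
        self.llm_calls = 0

    @staticmethod
    def _key(message: str) -> str:
        # abstract numbers and path-like tokens into a shape signature
        return re.sub(r"\d+|/[\w./-]+", "<*>", message)

    def parse(self, message: str) -> str:
        key = self._key(message)
        if key not in self._cache:
            self.llm_calls += 1
            self._cache[key] = self._llm_parse(message)
        return self._cache[key]

# stand-in for a real LLM parser
cache = ParsingCache(llm_parse=lambda m: re.sub(r"\d+", "<*>", m))
cache.parse("job 17 finished in 8 ms")
cache.parse("job 93 finished in 5 ms")  # cache hit: no second LLM call
print(cache.llm_calls)  # 1
```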
Experiments employed large-scale datasets to evaluate VarParser’s performance. The research introduced variable units to preserve rich variable information, improving the integrity of parsed log results. This approach contrasts with previous methods that retained only placeholders, thereby losing valuable system visibility.
Extensive evaluations demonstrated that VarParser achieves higher accuracy than existing methods, significantly improving parsing efficiency and reducing LLM invocation costs. The system delivers a more comprehensive and informative log parsing solution, crucial for effective anomaly detection and failure diagnosis in large-scale online service systems.
Variable-centric parsing significantly improves large-scale log parsing accuracy and efficiency
Scientists have developed VarParser, a variable-centric log parsing strategy to improve the analysis of large-scale online service systems. Logs serve as a primary source of information for diagnosing failures, and log parsing, the extraction of structured events from unstructured data, is a critical first step.
The team addressed limitations in existing large language model (LLM)-based log parsers, which traditionally focus solely on the constant parts of logs. Experiments revealed that this constant-centric approach leads to inefficient log grouping and sampling. Researchers discovered that a constant-based parsing cache results in a relatively large number of LLM invocations, impacting both accuracy and efficiency.
Measurements confirm that retaining only placeholders in results causes a loss of system visibility provided by variable information within logs. VarParser employs variable contribution sampling, a variable-centric parsing cache, and adaptive variable-aware in-context learning to efficiently capture and leverage the variable parts of logs.
By introducing variable units, the work preserves rich variable information, enhancing the integrity of log parsing results. Tests demonstrate that VarParser achieves higher accuracy compared to existing methods. The breakthrough delivers significant improvements in parsing efficiency while simultaneously reducing LLM invocation costs.
Data shows the approach effectively addresses the four key problems associated with constant-centric strategies: inefficient grouping, excessive LLM calls, high invocation costs, and loss of variable information. The research provides a new strategy for automated log analysis, with potential applications in anomaly detection, failure diagnosis, and recovery support.
Variable extraction enhances log analysis using large language models by identifying key data points
Scientists have developed VarParser, a novel log parsing approach leveraging large language models (LLMs) that prioritises the variable components of log data. Existing LLM-based parsers typically focus on the constant elements of logs, potentially overlooking valuable information contained within the variable parts.
VarParser addresses this limitation through variable contribution sampling, a variable-centric parsing cache, and adaptive variable-aware in-context learning, enabling more efficient and accurate parsing. The research demonstrates that VarParser outperforms existing methods in both accuracy and efficiency, while also preserving richer variable-related information.
Analysis of large-scale datasets revealed frequently occurring variables such as file paths and hardware identifiers, highlighting their importance in system monitoring and operation. The authors acknowledge a limitation in that their current work focuses on improving parsing performance, and does not address the complexities of handling extremely diverse or unstructured log formats. Future research will explore combining LLMs with smaller models to further enhance performance and scalability.
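Recognising the variable categories highlighted by that analysis — file paths and hardware identifiers — could look something like the sketch below; the patterns are our own guesses for illustration, not those used in the paper.

```python
import re

# Rough, hand-written recognisers for two variable categories the
# analysis highlights; real device naming is far more varied than this.
CATEGORIES = {
    "file_path": re.compile(r"^/[\w.-]+(?:/[\w.-]+)*$"),
    "hardware_id": re.compile(r"^(?:sd[a-z]\d*|eth\d+|cpu\d+|gpu\d+)$"),
}

def categorise(token: str) -> str:
    for name, pattern in CATEGORIES.items():
        if pattern.match(token):
            return name
    return "other"

print(categorise("/var/log/syslog"))  # file_path
print(categorise("eth0"))             # hardware_id
```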
👉 More information
🗞 VarParser: Unleashing the Neglected Power of Variables for LLM-based Log Parsing
🧠 ArXiv: https://arxiv.org/abs/2601.22676
