Splunk — A developer’s perspective
This article gives a developer’s perspective on using Splunk: how to get quickly acquainted with it, and how to understand SPL (Splunk’s Search Processing Language), with some tips and tricks along the way for writing simple, readable, and optimized queries.
Developer Lab setup for practice
You can download the trial edition of Splunk or Splunk Cloud to get started. Any data that can be parsed using regular expressions can be used as input to Splunk: for example, CSV files, JSON, or log files. Data can be uploaded to Splunk, or it can be streamed to Splunk.
In an enterprise setup, the data would already have been loaded from the different system logs using Splunk agents.
For the purpose of the lab, you can load some data and play with it. Here, I have downloaded the Titanic training dataset from Kaggle and uploaded it to Splunk to get started.
Tip: Follow naming conventions for all knowledge objects. Especially in enterprise setups that are shared across different solution spaces, having consistent names and namespaces makes maintenance easier in the long run.
When uploading/onboarding data you can specify:
- Source — the location of your log or CSV file
- SourceType — multiple sources (physical-location-of-log@host) can share one source type
Tip: Your searches should use the sourcetype, not the source. This way, when the physical location of the logs changes, your dashboard queries don't have to change.
- Index — You can specify which index the dataset falls within.
Tip: For future scaling, you may want to keep disconnected datasets in different indexes.
Splunk can do field discovery at 2 stages:
- Indexing stage — When the data is being parsed, trimmed, and extracted.
- Searching stage — From the fields that are searched.
Splunk not only uses field discovery for the current search; the algorithm is written in such a way that Splunk makes use of searched fields for future field discovery.
The delimiter is specified when onboarding the log or data file; the default is a space character. This plays an important role during field discovery.
Field discovery — Configurations
By default, Splunk discovers the first 50 fields. This default number is configurable in the settings.
Field discovery — Search Head Configurations — There are 3 configurations for field discovery on the GUI.
- The Fast mode — field discovery is off. The search result is therefore fastest here.
- The Smart mode (default) — Selective use for field discovery as a balance between the other 2 modes.
- The Verbose mode — field discovery is done for all fields
You can change this option using the drop-down on the right side below the search bar.
Tip: You can also configure how field discovery happens through the search head. This has a significant impact on performance and on the quality of the results you get back, so it is important to be aware of it.
Selected Fields & Interesting Fields
Selected Fields — By default, there are 3 fields that are selected as seen in the left navbar. In the main page, along with the search results, the selected fields are highlighted in the results.
Interesting fields — The other fields discovered by Splunk are seen in this section. The user can move a frequently queried interesting field to the selected fields section.
Using the timeline, you can choose how much time-series data you want to analyze. It also gives you a bird’s-eye view of the volume of data you are analyzing over time.
There are 4 views of the data, seen in 4 tabs.
1. Events Tab
Each event is listed separately; this is also the default view of the search.
2. Patterns Tab
This helps in seeing the percentage of each pattern in the data.
Usage — This is useful for seeing the % occurrence of, say, exceptions or service denials, or even patterns within a specific log, to understand the big picture very quickly.
3. Statistics Tab
This shows statistical data in tabular form. From here you can drill down to specific problem areas.
4. Visualization Tab
This helps plot the statistics in the form of a graph/chart. There are several visualizations you can choose from. Using special time-series commands, you can plot stats against a particular timeline.
Tip: Normally the Visualization enables us to identify the time window where some pattern change is seen. From there you can drill down to identify the stats. You can keep drilling down on the stats, till you get to the series of events that caused the change in pattern.
You can see the results of the query in any of these forms.
Splunk Query Language
There are several operators supported by SPL:
1. Pipe operator
“|” — The pipe operator chains commands; the output of one command is fed as input to the next command.
2. Arithmetic operator
All the known comparison and arithmetic operators are available, namely =, <, >, <=, >=, !=, +, *. They are typically used with the eval command.
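For example (a sketch — the index, sourcetype, and field names are the illustrative login-service ones used later in this article), eval can derive a new field that comparison operators then filter on:
index=platform-idx sourcetype=login_service | eval process_time_ms = login_process_time * 1000 | where process_time_ms >= 500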
3. Relational operators
OR, AND operators work like they do in any other language.
4. SPACE — acts as an AND operator. To search for data that contains a space, put the data within double quotes.
5. Keywords
AS, BY, OVER, WHERE are special keywords, so you cannot query on field names that belong to this list.
6. Project fields
You can add or remove fields using the fields command with a “+” or “-” sign.
| fields +[field1] -[field2]
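For example, to keep only the fields of interest (a sketch using the illustrative login-service names that appear later in this article):
index=platform-idx sourcetype=login_service | fields + principal login-status | fields - _raw
Trimming fields early also reduces the amount of data carried through the rest of the pipeline.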
7. De-duplicate data
This helps remove duplicate data.
| dedup [field1] [field2]
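For example, to see just one (the most recent) login event per principal (illustrative names, as used later in this article):
index=platform-idx sourcetype=login_service | dedup principal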
8. Rename fields
| rename [fieldOldName1] AS [fieldNewName1]
9. Ordering command
| head [n] — top n events; the default is 10
| tail [n] — bottom n events; the default is 10
| sort [field1] — sort on the field in ascending order
| sort -[field1] — sort on the field in descending order
| reverse — reverse the order of the current result set
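Combined, these answer questions like “what were the 5 slowest batch runs?” (a sketch reusing the hypothetical batch_service names that appear later in this article):
index=platform-idx sourcetype=batch_service | sort -batch_process_time | head 5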
10. Transforming commands
| top [field1] — values with the maximum occurrences appear at the top
| rare [field1] — values with the minimum occurrences; useful to identify anomalies in the data
| highlight [field1] [field2] — highlight the occurrences of particular values
| contingency [field1] [field2] — shows a matrix of the data in tabular form, with field1 on the X axis and field2 on the Y axis
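For example, to list the 5 least frequently ordered products (illustrative names from the order-service examples later in this article):
index=platform-idx sourcetype=order-service | rare limit=5 product_id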
11. Stats command
| stats avg(field1) by field2 — this will find the average, grouped by field2
| stats count(field1)
| stats min(field1)
| stats max(field1)
| stats sum(field1)
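For example, to count login events per principal (a sketch using the illustrative names that appear later in this article):
index=platform-idx sourcetype=login_service | stats count by principal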
Examples of SPL (Search Processing Language) queries
I will demonstrate the typical SPL constructs, explained against a workflow of a login service, product-catalog listing, ordering, and a logout service.
Tip: For faster results, always include the name of the index.
Get all logs of the login-service, a microservice running on different nodes:
index=platform-idx sourcetype=login_service
Search for “Permission Denied” in the login service:
index=platform-idx sourcetype=login_service "Permission Denied"
Using an arithmetic operator “!=”
Search for “Permission Denied” in the login service where the principal is not “John”:
index=platform-idx sourcetype=login_service "Permission Denied" principal!=John
Since the default delimiter is a space, you need double quotes when you search for a string that has a space in the search criteria.
Search for “Permission Denied” in the login service where the principal is not “John Miller”:
index=platform-idx sourcetype=login_service "Permission Denied" principal!="John Miller"
Using a wildcard search
Search all the log files for “ux002345”:
index=platform-idx sourcetype=* ux002345
Using the relational “OR” operator
Search the logs of both login_service and logout_service for “ux002345”:
index=platform-idx sourcetype=login_service OR sourcetype=logout_service ux002345
Using stats command for count
Get the total count of successful logins to the application:
index=platform-idx sourcetype=login_service login-status=SUCCESS | stats count(login-status)
Using stats command for mean with grouping
Get the average login processing time, grouped by principal:
index=platform-idx sourcetype=login_service login-process-time | stats avg(login-process-time) by principal
Using timechart with a span interval — for time-series visualization
Create a time-series chart that can be used to monitor the processing time of a process by batch_name; the time scale is 1 day, as the process runs twice within the same calendar day:
index=platform-idx sourcetype=batch_service | timechart span=1d avg(batch_process_time) by batch_name
Now that we know how to use simple queries, let's move on to writing more complex queries.
Knowledge Objects
- Knowledge objects help us write complex queries as simple, easy-to-interpret pieces.
- Knowledge objects are created inside the Settings menu.
- Visibility is private by default.
Knowledge objects can be categorized into 5 segments:
1. Data Interpretation
This has 2 parts to it:
a. Fields
From the data, fields are extracted as name/value pairs.
Tip: Try to have logs in key/value-pair or JSON format so that Splunk can easily extract your fields and you can build complex queries around them.
b. Field Extraction (Calculated fields)
Extraction of meaningful fields from the data. Calculated fields usually involve the eval function. They can be difficult to maintain, but used the right way they increase the reusability and readability of queries.
2. Data Enrichment
Data extracted by Splunk can be enriched using static or dynamic lookups to make the data easier to interpret.
a. Lookups (static lookups)
Using external lookup data, which can be in a *.csv file or a table, the data can be translated or linked to a more meaningful, understandable notation.
employeeStatus “0” → becomes → employeeStatusName “unemployed”
employeeStatus “1” → becomes → employeeStatusName “employed”
| lookup employmentStatus.csv employeeStatus OUTPUT employeeStatusName
You can also set up automatic lookups. Here, you don’t have to explicitly mention the lookup in your query; Splunk tries to map to the lookup automatically. But it also means Splunk will try to use the lookup for every query using the field, so it comes at a performance cost which may not be desirable.
b. Workflow Actions (dynamic lookups)
Workflow actions help enrich the data by giving it context, looking up external sources via HTTP requests. You can associate workflow actions with specific fields; for example, a URL request to a website like google.com to get more context on the data.
Use Case: Say you get the Supplier-Industry-Code (SIC) 1523 in your logs, and you typically end up explicitly googling it to find the supplier name, supplier industry, or other information.
By associating the SIC field with a workflow action, the click of a button can automatically launch the browser with a URL GET request for SIC code 1523, taking you to the Google search results. It takes a little practice to work out what URL to use so that you get the correct search results.
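As a sketch, the workflow action's URI embeds the field value using Splunk's $fieldname$ token. Assuming the extracted field is named sic (a hypothetical name), the configured URI might look like:
https://www.google.com/search?q=SIC+code+$sic$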
3. Data Normalization
Helps in the normalization of the data for easy categorization later.
a. Tags
A particular “<key> = <value>” pair on which you query frequently can be tagged, and then the tag can be used to query. There can be multiple tags for the same key/value pair.
Example: Index names in an enterprise solution follow naming conventions and are therefore a little difficult to remember. They can easily be replaced by a user-defined name.
Original query:
index=platform-idx sourcetype=order-service book-line-item product-id | transaction product_id maxpause<1d
Create a tag PLATFORM for "index=platform-idx". New query:
tag=PLATFORM sourcetype=order-service book-line-item product-id | transaction product_id maxpause<1d
b. Field Aliases
A field can be renamed to have an alias. Like tags, there can be multiple aliases for a field.
| rename <field1> as <newfieldname>
Tip: This is very useful for renaming long calculated fields to a meaningful name.
c. Combining fields
Combine several fields into one field, so that all of them can be searched using a single field.
|eval <newfieldname> =coalesce(<oldfield1>, <oldfield2>)
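For example (a sketch with hypothetical field names), if login events carry a principal field and logout events a username field, they can be unified and queried as one:
| eval user = coalesce(principal, username) | stats count by user
coalesce returns the first non-null value among its arguments.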
4. Data Classification
a. Event Type
It is a user-defined field that can represent a category of events. The events are related by the fact that they can be matched by a common search string. In some sense, it is like a saved search.
Color code — You can assign a color code to an event type. When data is rendered, events that match the event type are color-coded in the display.
Usage — It looks similar to tags, but tags take key/value pairs; unlike tags, event types can also use wildcards.
Tip: Prefer tags over event types, as event types carry a performance overhead. Use an event type only when you need wildcard support.
Define the event type "successful_purchase":
index=app_idx sourcetype=order_service status=200 action=purchase
Any event that would be returned by the above search gets an additional user-defined field "eventtype=successful_purchase", even when you are running a totally different search.
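Once defined, the event type itself becomes a searchable field, so a later query can use it directly (index name as in the example above):
index=app_idx eventtype=successful_purchase | stats count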
b. Transactions
Within a set/series of logs, transactions let you mark the start and end of a unit of work and do operations on the subset of the logs demarcated by the transaction.
Get all the log events for the username “JohnMiller” from “check-out” to “payment-failed”:
index=platform-idx sourcetype=payment-service username=JohnMiller | transaction startswith="step=check_out" endswith="payment-failed"
Using transaction, eval, and aliases
Get the average time users spend inside the website, to understand which days users prefer to shop:
index=platform-idx sourcetype=login-service OR sourcetype=logout-service | transaction startswith="login_status=SUCCESS" endswith="logout_status=SUCCESS" | timechart span=1d eval(avg(duration) * 1000) as average-shopping-time
Using maxpause with transaction
Identify all the products in the product catalog sold twice within a minute:
index=platform-idx sourcetype=order-service book-line-item product-id | transaction product_id maxpause<1m
5. Data Models
Data models drive the Pivot tool. It enables users of the Pivot to create compelling reports and dashboards without designing the searches that generate them.
Data models are helpful in making data easier to use, simplifying complex data through abstraction. They help group specific types of data.
This is an advanced topic and out of the scope of the current article.
Performance Optimization Tips
1. Separate indexes
Use different indexes for disconnected datasets. When an index grows beyond what can be supported on a single node, it will be easy to migrate the entire dataset and its index to a new set of nodes.
2. Narrowest Criteria
Search based on the narrowest criteria
3. Use index name
Wherever possible, include the index name in the search criteria.
4. Time range
The narrower the time range, the faster the response. This makes a huge difference to the response, so it is typically the first thing set before the search query is formulated.
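The time range can also be set inside the query itself with the earliest and latest time modifiers; for example, to restrict the (illustrative) login search to the last 24 hours:
index=platform-idx sourcetype=login_service earliest=-24h latest=now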
Splunk for Hadoop = HUNK.
Hunk lets you access data in remote Hadoop clusters through virtual indexes and lets you use the Splunk Search Processing Language to analyze your data.
The intent of this article is to provide a jump start for developers to begin using Splunk. Like everything else, practice makes perfect, and business needs will drive you to explore more of Splunk and build complex queries.
Go ahead, get your hands dirty: visualize problems in your areas, and see how you can build dashboards, panels, and alerts that help you proactively monitor systems.
https://docs.splunk.com/Documentation — This is the official product documentation of Splunk. Splunk documentation is the most authoritative and authentic source of Splunk usage and it is reviewed and updated on a frequent basis. The documentation is elaborate with good explanations and covers the width and depth of various operations possible in Splunk
https://community.splunk.com/t5/Community/ct-p/en-us — This is a community-driven forum where most of your inquiries will get a prompt response. It is also supported by Splunk and is a good platform for knowledge exchange.