Splunk - An Overview

Any Question, Any Data, One Splunk

Splunk is not merely a log aggregator. It can analyze data in near real time by cleaning, segregating, extracting and classifying streams of machine-generated data. Through dashboards, it helps you visualize and analyze that data as charts and statistics.

Data Explosion

Machine-generated data is any data, structured or unstructured, that is produced by machines: system and server logs, data from IoT devices, weather forecasts and so on. According to IDC (International Data Corporation), worldwide data will grow 61% to 175 zettabytes by 2025, and 80% of that data is predicted to be machine-generated.

What data can be ingested by Splunk?

Anything from which a regex can extract results can be used as input to Splunk, for example CSV files, JSON and log files. Data can be uploaded to Splunk or streamed to it.
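To make the regex idea concrete, here is a minimal Python sketch of pulling named fields out of a single log line. The log format, the field names and the `extract_fields` helper are all assumptions made up for illustration; this is not Splunk's own extraction engine, only the general technique of regex-based field extraction.

```python
import re

# Hypothetical log line; the format is an assumption for illustration only.
LOG_LINE = "2020-08-14 10:15:32 INFO user=alice action=login status=200 duration_ms=135"

# Each named group plays the role of an extracted field.
PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"user=(?P<user>\w+) action=(?P<action>\w+) "
    r"status=(?P<status>\d{3}) duration_ms=(?P<duration_ms>\d+)"
)


def extract_fields(line):
    """Return a dict of field name -> value, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


fields = extract_fields(LOG_LINE)
print(fields)
```

A line that does not fit the pattern simply yields `None`, which is the point of the paragraph above: if a regex can get results out of the data, the data is usable as input.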

History of Splunk

Splunk Inc. was founded in 2003 and went public in 2012. Recently Splunk Inc. reported sales of over 1 billion even amid the Covid-19 crisis. Covid-19 is also forcing many companies to increase their digital footprint, and more of them are now seriously considering Splunk to improve their sales efficiency using its web-analytics abilities. This year the cloud segment has seen revenue growth of 80%. Splunk has also partnered with Google to improve its cloud presence.

Splunk Flavors

Splunk is proprietary software and today comes in 3 flavors:

  1. Splunk Enterprise - The most popular flavor: an on-prem solution and Splunk's stream processing system.
  2. Splunk Light - A subset of the functionality offered by Splunk Enterprise; it does not come with a commercial licence.
  3. Splunk Cloud - Splunk offered in the cloud as a SaaS offering. Companies with data-governance concerns should evaluate this carefully, as the data is stored in the cloud.

Splunk Licensing & Plans

Splunk is licensed by ingest, i.e. by the amount of data that Splunk indexes: how many bytes of data reach your Splunk environment each day.

  1. Splunk Free - Limited to 500 MB of data per day.
  2. Splunk Enterprise - The popular choice. The cost varies. You can negotiate the price based on the volume of data you expect, so that you don't pay for more than you need. As the volume increases, the cost per GB goes down.
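Since licensing is driven by daily ingest, a quick back-of-the-envelope estimate helps when choosing a plan. The sketch below is plain arithmetic in Python; the event counts, the average event size and the choice to treat 1 MB as 1024 * 1024 bytes are assumptions for illustration, not figures from Splunk.

```python
# 500 MB per day, treating 1 MB as 1024 * 1024 bytes (an assumption).
FREE_DAILY_LIMIT_BYTES = 500 * 1024 * 1024


def estimate_daily_ingest_bytes(events_per_day, avg_event_bytes):
    """Rough daily ingest: number of events times average event size."""
    return events_per_day * avg_event_bytes


def within_free_limit(total_bytes, limit=FREE_DAILY_LIMIT_BYTES):
    """True if the estimated daily volume fits under the daily limit."""
    return total_bytes <= limit


# Hypothetical workload: 2 million events per day, about 300 bytes each.
total = estimate_daily_ingest_bytes(2_000_000, 300)
print(f"{total / (1024 * 1024):.1f} MB per day, fits free tier: {within_free_limit(total)}")
```

With these made-up numbers the workload comes to 600,000,000 bytes a day, which is above the 500 MB limit, so it would need a paid plan.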

Splunk Distributed Clustered Deployment

It is built on 3 core components:

1. Forwarder

The forwarder is an agent that typically runs on the devices where the logs are kept and forwards the data to the Splunk indexers. It load-balances at the source, spreading data across multiple indexers, and in a clustered deployment multiple copies of the same data are kept on different indexers, so the data is not lost when a node fails.
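The load-balancing idea can be sketched in a few lines of Python. This is a deliberately simplified round-robin, switching target on every event; the indexer names and the `distribute` function are hypothetical, and a real forwarder's switching policy is configurable and not modelled here.

```python
from itertools import cycle


def distribute(events, indexers):
    """Assign each event to the next indexer in round-robin order."""
    assignment = {name: [] for name in indexers}
    targets = cycle(indexers)  # endlessly repeats the indexer names in order
    for event in events:
        assignment[next(targets)].append(event)
    return assignment


# Hypothetical events and indexer names.
result = distribute(["e1", "e2", "e3", "e4", "e5"], ["idx-a", "idx-b"])
print(result)
```

The sketch only shows how a source can spread load across indexers; keeping extra copies of the data for fault tolerance is a separate concern and is not part of this example.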

2. Indexer

The indexer parses the received data, eliminates junk data, extracts the relevant/meaningful fields, then indexes and stores the data for future use. This is why Splunk is able to retrieve data quickly.
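Why does indexing make retrieval fast? A toy inverted index in Python illustrates the general principle: each token points straight at the events that contain it, so a search does not have to scan every event. This is only a sketch of the concept with made-up events, and it says nothing about Splunk's actual on-disk format.

```python
from collections import defaultdict


def build_index(events):
    """Map each lowercase token to the set of event ids that contain it."""
    index = defaultdict(set)
    for event_id, text in enumerate(events):
        for token in text.lower().split():
            index[token].add(event_id)
    return index


def search(index, events, term):
    """Look the term up in the index and return matching events in original order."""
    return [events[i] for i in sorted(index.get(term.lower(), set()))]


# Hypothetical events for illustration.
EVENTS = [
    "ERROR payment failed for order 17",
    "INFO user alice logged in",
    "ERROR timeout calling inventory service",
]

index = build_index(EVENTS)
print(search(index, EVENTS, "error"))
```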

3. The Search Head

This is the end-user-facing graphical interface that we deal with day to day and where we place all our search queries. It delegates each search to the indexers, aggregates the results they return, then analyses and presents the data as graphs and charts that expose patterns for the end user to visualize and drill down into.
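The delegate-then-aggregate flow described above can be sketched as a small scatter-gather in Python. The indexer names, the sample events and both functions are hypothetical; the sketch only shows the shape of the idea, where each indexer computes a partial result over its own data and the search head merges them.

```python
from collections import Counter

# Hypothetical indexers, each holding its own slice of the events.
INDEXERS = {
    "idx-a": [{"status": "200"}, {"status": "500"}, {"status": "200"}],
    "idx-b": [{"status": "404"}, {"status": "200"}],
}


def search_on_indexer(events, field):
    """Partial result: count the values of one field on a single indexer."""
    return Counter(event[field] for event in events if field in event)


def search_head_count_by(field):
    """Delegate the count to every indexer, then merge the partial counts."""
    total = Counter()
    for events in INDEXERS.values():
        total.update(search_on_indexer(events, field))
    return total


counts = search_head_count_by("status")
print(counts)
```

The merged counts are what a chart on a dashboard would then be drawn from.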

Image sourced from https://www.edureka.co/blog/splunk-architecture/
Splunk Distributed Clustered Deployment

Splunk SVA (Splunk Validated Architectures)

These are Splunk-recommended deployments that you can pick and choose from. The SVAs are built on 5 pillars:

  1. Availability - Should be able to recover quickly from planned/unplanned outages
  2. Performance - Provide an optimal level of service in spite of varying usage patterns
  3. Scaling - Should be able to scale to provide for increased future usage
  4. Security - Protect data, configuration & assets
  5. Manageability - Centrally operable and easily manageable across all tiers

Splunk Architecture

Image source https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781785884351/1/ch01lvl1sec08/splunk-s-architecture

Splunk Roles

There are 3 user roles:

  1. User - Can build and use their own searches and knowledge objects
  2. Power - Can also use knowledge objects shared by other users
  3. Admin - Can do everything; admins make up the smallest share of users

Splunk Configuration — Data governance for storage

This defines how Splunk manages the storage of data for optimal search. It does this by categorizing data into 4 buckets. The buckets are configurable by the admin. An example of how these buckets might be used:

  1. Hot Bucket - Data from the last day - All writes happen here. Super fast search
  2. Warm Bucket - Data from the last 3 months - No writes. Fast search
  3. Cold Bucket - Data older than 3 months - No writes. Slow search
  4. Frozen Bucket - Data older than, say, 6 months - No writes. No search. To search this data, it has to be restored (thawed) first.
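The example retention scheme above maps naturally onto a small age-based classifier. The Python sketch below uses the thresholds from the example (1 day, roughly 3 months, roughly 6 months); the function name, the day counts chosen for "3 months" and "6 months", and the idea of classifying purely by age are assumptions for illustration. In a real deployment the rules are set by the admin's configuration.

```python
def bucket_for_age(age_days, hot_days=1, warm_days=90, cold_days=180):
    """Classify data by age in days, following the example thresholds above."""
    if age_days <= hot_days:
        return "hot"      # still being written, fastest to search
    if age_days <= warm_days:
        return "warm"     # read-only, fast search
    if age_days <= cold_days:
        return "cold"     # read-only, slower search
    return "frozen"       # not searchable until restored


for age in (0, 30, 120, 365):
    print(age, bucket_for_age(age))
```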


The above is a Splunk overview in a nutshell, touching on the things Splunk does for you under the hood.

Watch out for my next article : Splunk — A developer’s perspective.



Sarada Sastri

Java Architect | MongoDB | Oracle DB| Application Performance Tuning | Design Thinking | https://www.linkedin.com/in/saradasastri/