A year working with Splunk Enterprise

Reflecting on a year of working with Splunk Enterprise, support tooling, and on-premises troubleshooting.

Over the past year, working with Splunk Enterprise has been a mix of debugging, building internal tools, and learning how customers run complex on-premises environments at scale. We went through major upgrades of Splunk Enterprise, our custom Splunk apps, and the underlying infrastructure.

First impressions

One word: scale. A production Splunk Enterprise setup has a lot of hosts. You will have to set up:

  • Deployment Server: A centralized management system that pushes configurations, apps, and updates to forwarders and other components.
  • Forwarders: Lightweight agents installed on endpoints to collect and route data to the indexers. Universal Forwarders send data largely unparsed; use Heavy Forwarders when you need to parse or filter data before sending.
  • Indexers / Indexer Cluster: The core data repository. Clusters use an index replication factor (typically 3) to ensure data redundancy, high availability, and fault tolerance. Suggested host count: 8.
  • Search Heads: The frontend user interface. For a production deployment, group these into a Search Head Cluster (managed by a cluster deployer) for load balancing and high availability. Suggested host count: 4.
  • License Master (called the license manager in current Splunk documentation): A dedicated instance or configuration that manages your data ingestion volume and license compliance.

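Adding these up (one Deployment Server, eight indexers, four search heads, and one License Master), a Splunk Enterprise production setup needs at least 14 hosts, not counting the forwarders on your endpoints. Managing that many hosts gets busy. Thankfully, the newest version has been very stable for us, so we rarely encounter on-call incidents.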
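
The host tally can be sketched in a few lines of Python. The counts for indexers and search heads come from the list above; one host each for the Deployment Server and the License Master is an assumption, and forwarders are left out because they run on the endpoints themselves.

```python
# Suggested host counts per role. Indexer and search head counts come from
# the list above; a single host for the other two roles is an assumption.
SUGGESTED_HOSTS = {
    "deployment_server": 1,
    "indexers": 8,
    "search_heads": 4,
    "license_master": 1,
}


def minimum_hosts(roles: dict[str, int]) -> int:
    """Return the total number of dedicated hosts across all roles."""
    return sum(roles.values())


print(minimum_hosts(SUGGESTED_HOSTS))  # prints 14
```

If you also dedicate hosts to the indexer cluster manager or the Search Head Cluster deployer, add them to the dictionary and the total grows accordingly.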

How does one manage it, you might ask? Simply start using tmux. You can share a terminal session between several engineers, which comes in handy when work has to be picked up in another timezone. It’s gotten pretty popular in the AI space as well.

What I worked on

We went through a major Splunk Enterprise upgrade from 9.0 to 9.4. Splunk prefers that your certificates are up to date. While the upgrade was smooth, we still had to deal with a common KV Store issue. Overall, the production upgrade took a few hours at most.

Another major point was making sure we phased out Python 2 leftovers and upgraded Python 3 to the latest version. Somehow, an old Python version was still lingering in the system. It’s good to delete old unused libraries.
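
One rough way to hunt for Python 2 leftovers is to check which scripts no longer parse under Python 3. The sketch below is an illustration, not the process we actually followed; the function names are hypothetical. It only catches syntax-level problems such as the old `print` statement, not runtime differences such as `dict.iteritems()`.

```python
import pathlib


def is_python3_compatible(source: str) -> bool:
    """Return True if the source text parses as Python 3."""
    try:
        # compile() only parses and byte-compiles; nothing is executed.
        compile(source, "<scan>", "exec")
    except (SyntaxError, ValueError):
        return False
    return True


def find_python2_leftovers(root: str) -> list[str]:
    """Return paths of .py files under root that do not parse as Python 3."""
    suspects = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8", errors="replace")
        if not is_python3_compatible(source):
            suspects.append(str(path))
    return suspects


print(is_python3_compatible('print("hello")'))  # True
print(is_python3_compatible('print "hello"'))   # False
```

Pointing `find_python2_leftovers` at an app's `bin` directory gives a first list of files to review by hand.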

When you also support a Splunk App, there is another ecosystem on top of Splunk that you have to learn about. It’s important to understand how to develop a Splunk custom app and how its configs are honoured.
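
As a small illustration of how configs are honoured: within a single app, settings in the `local` directory override the same settings in `default`, stanza by stanza and setting by setting. The sketch below models only that one layer with plain dictionaries; the stanza and values are hypothetical, and the full precedence order across apps, users, and system directories is more involved.

```python
Conf = dict[str, dict[str, str]]


def merge_conf(default: Conf, local: Conf) -> Conf:
    """Overlay local stanzas on default stanzas, setting by setting."""
    merged = {stanza: dict(settings) for stanza, settings in default.items()}
    for stanza, settings in local.items():
        merged.setdefault(stanza, {}).update(settings)
    return merged


# Hypothetical inputs.conf content from an app's default/ and local/ directories.
default_conf = {"monitor:///var/log/app.log": {"index": "main", "disabled": "0"}}
local_conf = {"monitor:///var/log/app.log": {"index": "app_logs"}}

effective = merge_conf(default_conf, local_conf)
print(effective["monitor:///var/log/app.log"])
# {'index': 'app_logs', 'disabled': '0'}
```

On a real instance, `splunk btool inputs list --debug` shows the merged result along with the file each setting came from, which is the quickest way to check what Splunk actually applies.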

Lessons learned

It’s good practice to plan out your strategy for maintaining Splunk Enterprise. You will have to think about a few things:

  • major version upgrades, which can be a problem if you don’t prepare beforehand,
  • maintenance windows, which are tricky if you want to support your users 24/7; Splunk’s rolling restarts help here by minimizing the impact on end users,
  • proactive alerts, since it’s a good idea to monitor your instances for any anomalies in the system, especially CPU and memory spikes,
  • vulnerability upgrades, which you will eventually have to address.
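
To make the proactive alerting point concrete, here is a minimal sketch of spike detection over CPU or memory samples. It uses a simple z-score check, which is an assumption for illustration rather than how Splunk's own alerting works; the function name, the threshold, and the sample values are all hypothetical.

```python
from statistics import mean, stdev


def find_spikes(samples: list[float], threshold: float = 2.0) -> list[int]:
    """Return indexes of samples more than `threshold` standard deviations above the mean."""
    if len(samples) < 2:
        return []
    avg = mean(samples)
    sd = stdev(samples)
    if sd == 0:
        # All samples are identical, so nothing stands out.
        return []
    return [i for i, value in enumerate(samples) if (value - avg) / sd > threshold]


# Hypothetical CPU usage percentages sampled at regular intervals.
cpu_percent = [20, 22, 21, 19, 23, 20, 95, 21]
print(find_spikes(cpu_percent))  # [6]
```

In practice you would feed this kind of check from your monitoring data and tune the threshold to your own baseline; a fixed percentage limit is a reasonable alternative when the normal load is well known.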

Looking ahead

I enjoy working with Splunk Enterprise. It requires minimal maintenance effort. We will plan around minor Splunk 10 upgrades and see how we can further bulletproof the platform. So far, so good! We’re looking forward to exploring more AI solutions in it. That seems to be the direction the industry is heading, and likely how the platform will evolve over the next few years.
