Smells Like Teen Systems: DevOps Nirvana

Frank Wiles (@fwiles, @revsys). Slides will be online later.

Smells Like Teen Systems: Advice for raising healthy happy systems and getting to DevOps nirvana

People fear change, so start small: baby steps. Be agile with a little "a", not a big "A": be spiritual, not fundamentalist. Don’t mandate a practice just because you read about it somewhere; if it doesn’t work for your organization, you don’t have to do it. Have ammunition: managers need data and explanations to make decisions.

Apply a metrics mentality to:

change requests
trouble tickets and bugs
deployments
outages, even of the smallest magnitude
interoffice political fights
approved and denied requests for equipment or funds
hires, fires, and quits
$$, labor hours, etc.

“We spend on average 19 hours per week requesting more information”
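A number like that only exists if someone records the raw data. A minimal Python sketch, assuming a hypothetical export of (date, category, hours) entries from whatever time or ticket tracker you already use; the record shape and category names are illustrative, not from the talk:

```python
from collections import defaultdict
from datetime import date


def average_weekly_hours(entries, category):
    """Average hours per ISO week spent on `category`.

    `entries` is an iterable of (date, category, hours) tuples, a
    hypothetical export format. Only weeks with at least one matching
    entry are counted.
    """
    per_week = defaultdict(float)
    for day, cat, hours in entries:
        if cat == category:
            year, week, _ = day.isocalendar()
            per_week[(year, week)] += hours
    if not per_week:
        return 0.0
    return sum(per_week.values()) / len(per_week)


if __name__ == "__main__":
    log = [
        (date(2015, 3, 2), "info-request", 8.0),
        (date(2015, 3, 4), "info-request", 11.0),
        (date(2015, 3, 10), "info-request", 19.0),
        (date(2015, 3, 11), "deploy", 2.0),
    ]
    print(average_weekly_hours(log, "info-request"))
```

Skipping empty weeks keeps the sketch short but overstates the average; pad the dictionary with zero-hour weeks if you want a true calendar average.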

Guilt tripping: show them there’s no other option if they want to keep up.

“Once we put <insert system> in place, we realized we no longer needed that weekly meeting…”

DevOps: Develop Everything Visibly, Operate/Automate Paranoid Services

DEV: Develop Everything Visibly: “Everything has to happen out in the open”

OPS: Operate/Automate Paranoid Services: “Automate everything with ridiculous amounts of monitoring and metrics”

Everything is version-controlled, which gives you a log of why things happened.
Everything is tracked: tickets, Trello, bugs, etc.

Even more visibility:

Level 1: Team chat, like Slack. Email is for outsiders.
Level 2: ChatOps (mmmmm, bots!)
Level 3: Have some fun (fun bots)

ChatOps suggestions:

Deployments and config changes
Status summaries: bot check load db3
Maintenance: bot start maintenance file-server-1
Display Alerts and Warnings
Server boot/shutdown messages
Ops logs: bot log Upgraded redis to 2.8.19
Resolutions: bot resolve ticket #8 Ended up just needing to restart Apache
Common actions: bot restart apache on production
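The command examples above boil down to matching a chat message against a table of patterns and calling a handler. A minimal Python sketch of that dispatch step, independent of any chat service; the handlers are hypothetical and only build reply strings, where a real bot would call your deploy, monitoring, or ticketing tools:

```python
import re

# Registry of (compiled pattern, handler) pairs, checked in order.
COMMANDS = []


def command(pattern):
    """Register a handler for messages matching `pattern`."""
    def register(func):
        COMMANDS.append((re.compile(pattern), func))
        return func
    return register


@command(r"^bot check load (?P<host>\S+)$")
def check_load(host):
    return f"Checking load on {host}..."


@command(r"^bot log (?P<entry>.+)$")
def ops_log(entry):
    # Hypothetical: a real bot would append this to a persistent ops log.
    return f"Logged: {entry}"


@command(r"^bot resolve ticket #(?P<ticket>\d+) (?P<note>.+)$")
def resolve_ticket(ticket, note):
    return f"Ticket {ticket} resolved: {note}"


@command(r"^bot restart (?P<service>\S+) on (?P<env>\S+)$")
def restart(service, env):
    return f"Restarting {service} on {env}..."


def dispatch(message):
    """Return the reply for a chat message, or None if no command matches."""
    for pattern, handler in COMMANDS:
        match = pattern.match(message.strip())
        if match:
            return handler(**match.groupdict())
    return None
```

Because every command and reply passes through the team channel, the log of who did what is visible to everyone by default.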

Tools: This is how we do it

Python: a scripting language that is relatively easy to learn and readable, with libraries for talking to everything. Fabric is highly recommended: shell scripting on steroids.
SaltStack: a master plus a Salt minion on each managed system. It can be as simple or as complicated as you want; fast communication even among hundreds of systems (ZeroMQ + AES); extensible via Python; the ability to return data to the master for monitoring or metrics purposes; anything from simple to crazy-complicated orchestration between systems. Examples of uses: targeting (/srv/salt/top.sls); pillars (/srv/pillar/*: config differences as data); templating.
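Salt’s Python extensibility and its ability to return data to the master can be illustrated with a custom execution module. A minimal sketch, assuming a hypothetical module named `sysmetrics` saved under the `_modules` directory of the Salt file roots (e.g. /srv/salt/_modules/sysmetrics.py); Salt exposes its public functions as `sysmetrics.<function>`:

```python
# /srv/salt/_modules/sysmetrics.py  (hypothetical module name)
"""Custom Salt execution module returning basic disk metrics.

Each public function returns plain, JSON-serializable data, which the
minion sends back to the master as the job result.
"""
import shutil


def disk_usage(path="/"):
    """Return total/used/free bytes and percent used for `path`."""
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "total": usage.total,
        "used": usage.used,
        "free": usage.free,
        "percent_used": round(100.0 * usage.used / usage.total, 1),
    }
```

After pushing the file to minions with `salt '*' saltutil.sync_modules`, it could be called as `salt 'web*' sysmetrics.disk_usage /var`, and the returned dictionaries feed whatever monitoring or metrics system you point them at.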
Consul: service discovery and monitoring: health checks; discover services via DNS or HTTP REST APIs; deadman health checks.
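Discovery over Consul’s HTTP API can be sketched with the standard library alone. This assumes a local Consul agent on its default HTTP port (8500) and uses the health endpoint filtered to instances whose checks pass; the service name is a placeholder:

```python
import json
from urllib.request import urlopen

# Assumption: a Consul agent listening locally on the default HTTP port.
CONSUL = "http://127.0.0.1:8500"


def healthy_instances(entries):
    """Turn a /v1/health/service response into (address, port) pairs.

    Service.Address is empty when a service registered without its own
    address; in that case fall back to the node's address.
    """
    instances = []
    for entry in entries:
        service = entry["Service"]
        address = service.get("Address") or entry["Node"]["Address"]
        instances.append((address, service["Port"]))
    return instances


def discover(name):
    """Ask the agent for instances of `name` whose health checks pass."""
    url = f"{CONSUL}/v1/health/service/{name}?passing=true"
    with urlopen(url, timeout=5) as resp:
        return healthy_instances(json.load(resp))
```

The DNS interface answers the same question without any client code, e.g. `dig @127.0.0.1 -p 8600 web.service.consul SRV` against the agent’s default DNS port.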
ELK: Elasticsearch/Logstash/Kibana: fast log searching for when you don’t.
“Logs that aren’t centralized are rarely checked and logs that aren’t searchable are never correlated” -Frank Wiles
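Centralized logs are far easier to search when each line is already structured. A minimal sketch of a Python `logging` formatter that writes one JSON object per line; shipping these lines into Logstash (for example with a JSON codec) is an assumption about your pipeline, not something the talk specified:

```python
import json
import logging
import socket
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        payload = {
            "@timestamp": datetime.fromtimestamp(
                record.created, timezone.utc).isoformat(),
            "host": socket.gethostname(),
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep tracebacks inside the same JSON document.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_logger(name):
    """Return a logger that writes JSON lines to stderr."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Fields such as `host` and `level` then become directly filterable in Kibana instead of being buried in free text.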
Grafana: for metrics visualization; pretty graphs.
Don’t capture exceptions in your inbox; put them in a system: Exception.io, Rollbar. Rollbar also tracks deployments.
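The idea of routing exceptions into a system rather than email can be sketched as a decorator that hands every exception to a reporter and then re-raises it. The `reporter` callable is a hypothetical stand-in for an error tracker’s client; hosted services such as Rollbar ship their own libraries, which this sketch does not use:

```python
import functools
import traceback


def report_exceptions(reporter):
    """Decorator: pass details of any exception to `reporter`, then re-raise.

    `reporter` is a hypothetical callable that accepts a dict, standing in
    for an error-tracking client.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                reporter({
                    "function": func.__qualname__,
                    "type": type(exc).__name__,
                    "message": str(exc),
                    "traceback": traceback.format_exc(),
                })
                raise  # never swallow the error; just make it visible
        return wrapper
    return decorate
```

Re-raising keeps the program’s behavior unchanged, so the reporter only adds visibility.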
What to capture? As much as you can store.
- general collectd system stats
- logins/signups/emails sent
- failed login attempts/emails bounced
- run time of crons and batch jobs
- backup run times and file size(s)
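Cron and backup run times and file sizes are easy to capture at the source. A minimal sketch, assuming the metrics go to a Graphite/Carbon server (one common Grafana data source) over its plaintext protocol on TCP port 2003; the host name and metric prefix are hypothetical:

```python
import os
import socket
import subprocess
import time

# Assumption: a Graphite/Carbon plaintext listener; host is hypothetical.
CARBON = ("graphite.example.com", 2003)


def metric_line(path, value, timestamp=None):
    """Format one Graphite plaintext line: '<path> <value> <timestamp>\\n'."""
    ts = int(timestamp if timestamp is not None else time.time())
    return f"{path} {value} {ts}\n"


def send_metrics(lines):
    """Push pre-formatted metric lines to the Carbon listener."""
    with socket.create_connection(CARBON, timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


def timed_backup(command, archive_path, prefix="backups.nightly"):
    """Run a backup command; return its run time and archive size as metric lines."""
    start = time.monotonic()
    subprocess.run(command, check=True)  # raises if the backup fails
    elapsed = time.monotonic() - start
    return [
        metric_line(f"{prefix}.runtime_seconds", round(elapsed, 2)),
        metric_line(f"{prefix}.size_bytes", os.path.getsize(archive_path)),
    ]
```

A cron wrapper might call `send_metrics(timed_backup(["tar", "czf", "/backups/db.tgz", "/var/lib/db"], "/backups/db.tgz"))`; a backup that suddenly takes twice as long or shrinks to a few bytes then shows up on a graph instead of going unnoticed.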

Resistance: route around it. If you don’t work with the process…

Maverick, by Ricardo Semler (1993)

Turn resistance back on others; sometimes make it so cumbersome that it burdens their way of thinking.
