November 2020 | Anthony Dahanne's blog

Kubecon NA 2020 was virtual, and took place from Tuesday November 17th to Friday November 20th 2020.

Unfortunately, I could not join any co-located event on the 17th (such as Cloud Native Security Day, ServiceMeshCon or any other sponsored event), so my notes only cover the « regular » keynotes, talks and tutorials.

Day 1 Keynotes

After a tribute to former CNCF Executive Director Dan Kohn (who passed away shortly before the conference), the keynotes started with a presentation of new platinum CNCF members (VolcanoEngine from Bytedance, Veeam) and a new level of certification called Certified Kubernetes Security Specialist.

The Apple keynote was presented by a platform engineer describing the challenges her company faced adopting Kubernetes: trying to provide a secure and enjoyable experience for developers and SREs alike (Namespace as a service? Can cluster-level CRDs be used?). They are now investigating virtual clusters, microVMs, multi-cluster management, better observability, and improving the CI/CD developer experience.

CNCF Project Updates

Falco (consumes and enriches kernel events – incubating): several improvements and an additional supported platform (ARMv7)
Thanos (highly available metrics system with unlimited(!) storage – incubating): new UI, multi-tenancy
Rook (open source storage for k8s – graduated): support for Ceph stretched clusters
Vitess (scale MySQL by sharding it, migrate from bare metal to cloud, manage a large number of SQL DBs – graduated)
TiKV, TUF, Harbor, and Helm all graduated

Kubernetes Project Updates

Ingress enhancements went GA
SIG multi cluster: APIs in Alpha
Container Storage Interface (CSI) updates were released: ephemeral volumes, Windows support, volume snapshot feature progressing
SIG Docs: website migration to modern theme
Infra: moving from Google infra to community infra

Day 2 keynotes

Using OpenTelemetry to empower end users

Merger of OpenCensus and OpenTracing took place in 2019.

Why OpenTelemetry? Because observability is having its moment, because the project prioritizes end users, and because vendor lock-in is a pain.

The OpenTelemetry collector is a binary that receives the telemetry data, transforms it and sends it along; it’s pluggable and supports popular open source protocols.

It can be deployed on a host gathering its traces or as a collector for several hosts.

The OpenTelemetry collector can gather data for Prometheus, Zipkin and Jaeger in a single format, the « otel » protocol (you can still send your data to the collector in the existing vendor protocol), and transform this data into backend-specific formats; along the way, it can also enrich the data with some context.

The collector abstracts away the back-end you’re using, allowing you to break free from vendor lock-in.
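
To make the receive / transform / export flow more concrete, here is a minimal sketch in plain Python. It does not use the real collector or any OpenTelemetry library: the `Span` type, the `enrich` processor and the two exporters are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Span:
    """Toy telemetry record (assumption: real spans carry many more fields)."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


def enrich(span: Span, context: Dict[str, str]) -> Span:
    """Processor: return a copy of the span with extra context attributes."""
    return Span(span.name, {**span.attributes, **context})


def as_line(span: Span) -> str:
    """Hypothetical exporter producing one text line per span."""
    pairs = [f"{key}={value}" for key, value in sorted(span.attributes.items())]
    return " ".join([span.name] + pairs)


def as_dict(span: Span) -> dict:
    """Hypothetical exporter producing a JSON-friendly dictionary."""
    return {"name": span.name, "attributes": dict(span.attributes)}


def run_pipeline(
    spans: List[Span],
    processors: List[Callable[[Span], Span]],
    exporters: Dict[str, Callable[[Span], object]],
) -> Dict[str, list]:
    """Receive spans, run every processor, then fan out to every exporter."""
    output: Dict[str, list] = {name: [] for name in exporters}
    for span in spans:
        for process in processors:
            span = process(span)
        for name, export in exporters.items():
            output[name].append(export(span))
    return output
```

Calling `run_pipeline` with one span, an `enrich` step that adds a host name, and both exporters yields the same enriched data in two shapes; the application that produced the span never needs to know which back-end format is in use.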

The GA is coming soon.

Moving Cloud Native beyond HTTP

HTTP is the lingua franca of the web.

IoT is a domain where lots of protocols have emerged recently (for power consumption, management, etc.), but those protocols aren’t as well supported as HTTP: you’re on your own if you want to implement load balancing, HA, etc.

How can we implement any protocol in a cloud native way?

The speaker came up with a shared document (super interesting to read) where people can share their concerns with those new protocols.

Envoy supports Redis and Postgres protocols now, due to popular demand; and now makes adding your own protocol easier.

CloudEvents is a spec that covers protocol bindings (HTTP and others) and serialization formats (JSON and others), and is already used in projects such as Knative.
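
As a rough illustration of the JSON serialization, the sketch below builds an event carrying the four context attributes that CloudEvents 1.0 marks as required (`specversion`, `id`, `source`, `type`). The source path, event type and payload are made-up values, and the helper names are hypothetical; no CloudEvents SDK is used.

```python
import json
import uuid

# Context attributes that CloudEvents 1.0 marks as required.
REQUIRED = ("specversion", "id", "source", "type")


def make_event(source, event_type, data, event_id=None):
    """Build an event as a plain dictionary, ready for JSON serialization."""
    return {
        "specversion": "1.0",
        "id": event_id or str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "datacontenttype": "application/json",
        "data": data,
    }


def missing_attributes(event):
    """Return the required attributes that are absent or empty."""
    return [name for name in REQUIRED if not event.get(name)]


def to_json(event):
    """Serialize the event; refuse to emit one that lacks a required attribute."""
    missing = missing_attributes(event)
    if missing:
        raise ValueError(f"missing required attributes: {missing}")
    return json.dumps(event, sort_keys=True)
```

The same dictionary could then be carried over HTTP or another transport; only the binding changes, not the event itself.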

More Power, Less Pain: Building an Internal Platform with CNCF Tools

Speaker (David Sudia) tried to improve the developer experience at his company.

They migrated from Heroku to K8s, to leverage the ecosystem.

Of course, that introduced complexity versus the simplicity provided by a standalone PaaS provider.

« Everything will be simpler in 6 months »: the landscape is moving so fast that it’s often true! Should you wait before migrating?

Observability: using OpenTelemetry.

Build and deploy: no more Dockerfiles, but Heroku buildpacks.

Your dev tools and platform are products: treat them as such!

The question you have to ask yourself before migrating is: has your organization reached the critical size to consider developing its own platform, or should you keep paying for a PaaS?

Day 3 keynotes

Stephen Augustus started the keynotes describing his role as a Kubernetes open source engineer at VMware.

In particular, Stephen told us about the release frequency of Kubernetes: should it be yearly, 4 times a year, an LTS, a nightly, etc.? He gathered feedback, and most people answered 3 times a year.

Then two engineers described how we can be more inclusive as cloud developers welcoming all users no matter their backgrounds.

The unofficial SIG Honk (a reference to Untitled Goose Game and its funny focus on breaking things) hosted a panel and described how developers should be careful when designing their container images (the convenience of bringing debug tools such as bash or curl versus the security concern of having those tools in production, where they could be useful to attackers) and when deploying them (being careful when propagating secrets, defining trust boundaries, etc.). They shared an interesting website to learn about securing Kubernetes.

During a sponsored keynote, the interesting website KubeAcademy was introduced by VMware.

Liz Rice talked about governance in trending areas such as: chaos engineering, developer experience, edge, eBPF and service mesh.

Sessions

Stop Writing Operators

Link to slides.

That’s definitely an idea that has been around this year: from blog posts to tech articles and this talk, many people have started advising you not to write your own operator!

CoreOS coined the operator pattern term back in 2016: it was meant for managing stateful apps.

When you don’t want an operator

no statefulness or k8s has the necessary state handling
apps you maintain: splitting the management features out into an operator adds another piece to maintain and to explain
security concerns: operators have privileged access to the cluster.

What are the alternatives

why not use a plain CRD controller without state?
no-code operators, such as KUDO
run outside the cluster

When operators are needed

Someone else’s app: and its state has to be managed inside Kubernetes
to leverage Kubernetes latest features from your app

What about routine operations? Think of the non-operator way to run them.

I’m writing an operator anyway, now what?

Understand CRDs and Controllers
Look at the operator landscape: KUDO, Operator Framework, Kubebuilder
Can you write it using your favorite programming language?

When you write the operator, maintain loose coupling; no code is often better, so write the minimum.

Production CI/CD with Cloud Native Buildpacks

Link to the slides

What is a buildpack?

detect: runs against the source to determine which buildpack to use (if there’s a pom.xml, it’s Java Maven; if there’s a package.json, it’s JavaScript / TypeScript)
build: downloads build time and runtime dependencies, then compiles if needed
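
The detect phase can be pictured with a short Python sketch. This is only an illustration of the idea, not how the real lifecycle or any published buildpack is implemented; the marker table reuses the two examples above, and the buildpack labels are hypothetical.

```python
from typing import Iterable, Optional

# Marker file -> buildpack label, following the two examples from the talk.
# The labels are made up for this sketch.
MARKERS = {
    "pom.xml": "java-maven",
    "package.json": "javascript-typescript",
}


def detect(files: Iterable[str]) -> Optional[str]:
    """Return the first buildpack whose marker file is present, else None."""
    present = set(files)
    for marker, buildpack in MARKERS.items():
        if marker in present:
            return buildpack
    return None
```

For instance, `detect(["pom.xml", "src/main/java/App.java"])` selects the Java Maven buildpack, while a directory with neither marker file is left undetected, in which case the build would not proceed with these two buildpacks.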

Most buildpacks are implemented by Google, Heroku and Paketo, not by CNCF.

During the demo of pack on a Mac with a Java project, we could see the phases in action: detect, analyze, restore, build and then export, which created a Docker image.

Demos:

Buildpacks in action at GitLab, using the pack CLI

CircleCI: Buildpacks orb, using the pack CLI

Tekton interacts directly with the phases, not using the CLI

kpack: a tool by VMware; it is k8s native, uses the lifecycle and supports building and rebasing. kpack picks up a source code commit hash and builds from there, ideally after the tests have run successfully.

Constructing Chaos Workflow with Argo and LitmusChaos

Link to the slides

What are Chaos Engineering and Litmus?

What is Chaos Engineering? The practice of injecting faults into a system in a steady state, to find its weaknesses.

Litmus is deployed into K8s as an operator, configurable in a declarative way.

A Helm chart is provided to deploy the portal, then you declare a chaos workflow to configure the operator.

Litmus shares metrics and events; Prometheus can pick them up and aggregate them.

From the ChaosHub, you can download Chaos Experiments; for example among the 22 generic experiments you have: Pod Delete, Disk Fill, Node Taint, Pod Network Loss, etc.

Litmus allows you to Bring Your Own Chaos (BYOC): through an SDK, you can define your own experiments.

Through the portal you can design a workflow of experiments, run them (using Litmus running in its own namespace, targeting another namespace) and then get the results.

The results can be exported to an operation data lake.

Demo (source available on Github)

The first scenario was deploying a couple of Nginx pods, and having Litmus destroy some of them (then making sure they’re scheduled back).
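
The shape of such a pod-delete experiment (check the steady state, inject the fault, check recovery) can be simulated in a few lines of Python. Nothing here talks to Kubernetes or Litmus: the pod names, the `reschedule` stand-in for the Deployment controller and the parameter defaults are all assumptions made for this sketch.

```python
import random


def steady_state(pods, desired):
    """Steady-state hypothesis: the expected number of pods is running."""
    return len(pods) == desired


def pod_delete(pods, count, rng):
    """Fault injection: remove `count` randomly chosen pods."""
    victims = set(rng.sample(sorted(pods), count))
    return [pod for pod in pods if pod not in victims]


def reschedule(pods, desired):
    """Stand-in for the Deployment controller topping the replicas back up."""
    pods = list(pods)
    index = 0
    while len(pods) < desired:
        pods.append(f"nginx-replacement-{index}")
        index += 1
    return pods


def run_experiment(desired=3, kill=2, seed=0):
    """Run one simulated experiment and report what was observed."""
    rng = random.Random(seed)
    pods = [f"nginx-{i}" for i in range(desired)]
    if not steady_state(pods, desired):
        raise RuntimeError("system is not in a steady state, aborting")
    pods = pod_delete(pods, kill, rng)
    after_fault = len(pods)
    pods = reschedule(pods, desired)
    return {"after_fault": after_fault, "recovered": steady_state(pods, desired)}
```

In a real cluster the interesting part is the last check: if the pods do not come back within the expected time, the experiment has found a weakness.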

In the second scenario, the experiment was scheduled using ArgoCD

A High Schoolers Guide to Kubernetes Network Observability

Link to the slides

The speaker, a high schooler (!) tells the audience about his journey into Kubernetes networking internals.

The project he worked on, kube-netc, is a tool to grab all network statistics at the TCP layer.

It all starts with eBPF, which allows you to run custom code in the Linux kernel; it’s basically a bytecode.

To implement an eBPF package, you can start with C or Go bindings.

Here are some popular eBPF projects: bpftrace, cilium, falco, kubectl-trace.

Once he could write eBPF code, he had to integrate with Prometheus to share the metrics generated.
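
Sharing metrics with Prometheus essentially means exposing them in its text exposition format on an HTTP endpoint. The sketch below renders a counter in that format using plain Python; the metric name and labels are hypothetical (they are not kube-netc's actual metrics), and label values are assumed to contain no quotes, backslashes or newlines, so no escaping is done.

```python
def render_sample(name, labels, value):
    """Render one sample line: metric_name{label="value",...} value"""
    rendered = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    return f"{name}{{{rendered}}} {value}"


def render_counter(name, help_text, samples):
    """Render a counter with its HELP and TYPE header lines.

    `samples` is a list of (labels, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    lines += [render_sample(name, labels, value) for labels, value in samples]
    return "\n".join(lines) + "\n"
```

A scrape of such an endpoint would then return one line per observed connection, which Prometheus stores as separate time series keyed by the label values.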

After that, he had to deploy his package using a DaemonSet.

The result, kube-netc, allows you to discover network metrics on your k8s nodes; you can find out more by reading this blog post.

Kubernetes CronJobs – Does Anyone Actually Use This [in Production]?

Link to the slides

Interesting talk by a Lyft engineer, sharing his experience using k8s CronJobs, and why they would sometimes not start.

The reasons include API client rate limiting (every 30 sec. the CronJobController would connect to the API server; with several hundred CronJobs, the calls add up…) and « Too many missed starts », which can cancel your cron job. The worst part is that only a few logs or metrics can be gathered… so good luck finding out what happened.
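
The « too many missed starts » behaviour can be sketched as follows. This is a simplification, not the controller's code: the schedule is modelled as a fixed interval instead of a cron expression, and the limit of 100 is taken from the Kubernetes CronJob documentation as best recalled, so it is exposed as a parameter and should be treated as an assumption.

```python
from datetime import datetime, timedelta


def missed_starts(last_scheduled, now, interval):
    """Count the schedule times that elapsed since the last scheduled run."""
    if now <= last_scheduled:
        return 0
    return (now - last_scheduled) // interval


def can_start(last_scheduled, now, interval, limit=100):
    """Refuse to start when more than `limit` runs were missed (assumed default)."""
    return missed_starts(last_scheduled, now, interval) <= limit
```

With a job scheduled every minute, an outage of 30 minutes leaves 30 missed starts and the job resumes, whereas an outage of two hours leaves 120 and, under this model, the job is no longer started.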

The presenter went on to explain what would make the user experience better with k8s Cron Jobs (observability basically)

All hope is not lost, a KEP was recently merged so progress is on its way!

A Walk Through the Kubernetes UI Landscape

Link to the slides

The presenters went through the advantages and disadvantages of several k8s UIs (all open source)

Kubernetes Dashboard: classic, not extensible, some management possibilities
K8Dash: a simple alternative to K8s Dashboard
Kubevious: can analyze your cluster and even time machine your cluster state, read only
Octant: local only, for developers; nice auto port forwarding feature
Lens: desktop client, ships kubectl along with it
Kubenav: mobile first!
Headlamp: extensible, actions cancellable with grace period!
Kubernetes Web View: read only
K9s: command line interface, very convenient

Beyond the buzzword: BPF’s Unexpected role in Kubernetes

Link to the slides

What is (e)BPF? A fundamental change to a 15-year-old problem: it allows you to run programs loaded from user space in the context of the kernel.

Why do you care? Customizable networking, debugging and performance analysis, monitoring and security.

Evolution

Used in 1997 for tcpdump; then in 2014 came a new JIT compiler, followed by IO Visor entering the Linux Foundation; from 2017, several tools in the Kubernetes space, like Cilium and Istio, started leveraging those features.

The eBPF OSS landscape is divided into 4 categories: low level tools, API libraries (python, Go and C), security and networking, and visibility.

Inspektor Gadget: a swiss army knife for eBPF tooling, focused on Kubernetes use case.

Inspektor Gadget can be used through kubectl gadget, which creates a DaemonSet that augments the node kernels with eBPF « gadgets ».

Gadgets available today: add capabilities, execsnoop, tcptop, tcptracer, profile, network policy advisor (to discover pods interactions)

Alban showed 2 demos

the first with execsnoop: start a pod that executes several commands and record all their calls in a text file
the second with network policy advisor: he observed interactions between microservices and got a network policy configuration generated
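
The idea behind the second demo, turning observed traffic into policies, can be sketched in Python. This is not Inspektor Gadget's implementation: the flow tuples, the use of an `app` label to select pods and the policy naming are assumptions; only the NetworkPolicy manifest layout follows the `networking.k8s.io/v1` API.

```python
def network_policies(flows):
    """Build one ingress NetworkPolicy per destination from observed flows.

    `flows` is an iterable of (source_app, destination_app, tcp_port) tuples.
    """
    by_destination = {}
    for source, destination, port in flows:
        by_destination.setdefault(destination, set()).add((source, port))

    policies = []
    for destination in sorted(by_destination):
        rules = [
            {
                "from": [{"podSelector": {"matchLabels": {"app": source}}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }
            for source, port in sorted(by_destination[destination])
        ]
        policies.append({
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            "metadata": {"name": f"{destination}-ingress"},
            "spec": {
                # Pods the policy applies to (assumed to carry an "app" label).
                "podSelector": {"matchLabels": {"app": destination}},
                "policyTypes": ["Ingress"],
                "ingress": rules,
            },
        })
    return policies
```

Duplicate observations collapse into a single rule, so the generated policy only allows what was actually seen; anything not observed during the recording would be blocked once the policy is applied.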

GitOps Is Likely More Than You Think It Is

GitOps is about: config code / app code, automation and deployment (kubernetes here)

We got there by extending CI into CD, but they’re different and should be kept separate from each other (separation of concerns, many deployment environments, decoupling build from deployment).

Having the CI access the deployment keys is also a huge security risk.

So the best pattern is to have the deployment targets pull the images to deploy; but how do they know when to pull?

Using image update automation, we can reconcile the desired state (use latest stable version for example).
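
A « latest stable version » policy boils down to filtering and ordering image tags. The sketch below is a plain-Python illustration under assumptions: stable means a bare `MAJOR.MINOR.PATCH` tag (optionally prefixed with `v`), and the function names are hypothetical rather than taken from any GitOps tool.

```python
import re

# Assumption: a stable tag is MAJOR.MINOR.PATCH with nothing after it,
# so pre-releases such as 1.11.0-rc.1 and floating tags such as "latest" are ignored.
STABLE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def latest_stable(tags):
    """Return the highest stable tag, compared numerically, or None."""
    stable = []
    for tag in tags:
        match = STABLE.match(tag)
        if match:
            version = tuple(int(part) for part in match.groups())
            stable.append((version, tag))
    return max(stable)[1] if stable else None


def plan_update(current, tags):
    """Return the tag to roll out, or None when already up to date."""
    target = latest_stable(tags)
    return target if target is not None and target != current else None
```

Note that the comparison is numeric: `1.10.0` is newer than `1.9.0`, which a naive string sort would get wrong.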

To apply overlays on top of the basic configuration (for environment specific configuration), kustomize can be used.

Flagger is an open source tool that can extend Kubernetes reconciliation, allowing the administrator to set up canary deployments (Flagger talks to the ingress controller / service mesh to redirect traffic among the different deployed versions).
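
A canary rollout can be pictured as a loop that shifts a little more traffic at each step while a health metric stays good. The sketch below is a deliberately simplified model, not Flagger's algorithm: the step size, maximum weight and success-rate threshold are assumed values, and the canary is rolled back on the first bad measurement.

```python
def canary_rollout(success_rates, step=10, max_weight=50, min_success=99.0):
    """Simulate a canary analysis.

    `success_rates` holds one measured success rate (in percent) per interval.
    Returns the final status and the canary traffic weight after each interval.
    """
    weight = 0
    history = []
    for rate in success_rates:
        if rate < min_success:
            # Metric check failed: send all traffic back to the stable version.
            history.append(0)
            return "rolled back", history
        weight = min(weight + step, max_weight)
        history.append(weight)
        if weight >= max_weight:
            return "promoted", history
    return "in progress", history
```

Five healthy intervals take the canary from 10% to 50% of the traffic and then promote it; a single interval at 95% success sends everything back to the stable version.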

GitOps is the result of Continuous Delivery + Continuous Operations.

Declarative Testing clusters with KUTTL

Link to the slides

Developers are familiar with unit, integration and end to end testing.

In the context of Kubernetes, how can we smooth out e2e testing?

Declarative testing allows the developer to assert the desired state is obtained.

kuttl comes with a CLI to make the assertions.

Kuttl creates a namespace for each test, to provide isolation.

Ken demoed a few use cases, the simplest one being bringing a pod up and asserting it was in a running state; more complex cases brought up a full test suite including several steps.

Kuttl can also just assert a specific state, instead of deploying + asserting.
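
Asserting a desired state usually means checking that the live object contains the fields you declared, while ignoring everything else the cluster added. Here is a minimal Python sketch of such a partial match; it illustrates the idea only and is not kuttl's code, and requiring lists to have the same length is a choice made for this sketch.

```python
def matches(expected, actual):
    """Return True when every field declared in `expected` is found in `actual`.

    Dictionaries are compared as subsets, lists element by element
    (same length required, an assumption of this sketch), scalars by equality.
    """
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and matches(value, actual[key])
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        return (
            isinstance(actual, list)
            and len(expected) == len(actual)
            and all(matches(e, a) for e, a in zip(expected, actual))
        )
    return expected == actual
```

An assertion that only declares `kind: Pod` and `status.phase: Running` therefore passes against a full pod object, whatever its other fields, and fails while the pod is still `Pending`.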

In search of a kubectl blame command

Link to slides

Why? To display cause -> effect.

Kubernetes is a control loop: you express the desired state, it runs a diff, and then applies a function based on the output.
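
That loop (express the desired state, diff, apply) can be written down in a few lines of Python. The resource names and the dictionary-based « cluster » are stand-ins invented for this sketch; real controllers work against the API server and handle far more cases.

```python
def diff(desired, actual):
    """Compare two name -> spec mappings and list the actions needed."""
    actions = []
    for name in sorted(desired.keys() - actual.keys()):
        actions.append(("create", name))
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name))
    for name in sorted(desired.keys() & actual.keys()):
        if desired[name] != actual[name]:
            actions.append(("update", name))
    return actions


def apply(actions, desired, actual):
    """Apply the actions to a copy of the actual state and return it."""
    result = dict(actual)
    for verb, name in actions:
        if verb == "delete":
            del result[name]
        else:  # create or update
            result[name] = desired[name]
    return result


def reconcile(desired, actual):
    """One pass of the loop: returns the new state and the actions taken."""
    actions = diff(desired, actual)
    return apply(actions, desired, actual), actions
```

The list of actions is exactly the cause -> effect record a « blame » command would want: each change in the cluster can be traced back to a difference between desired and actual state. Running `reconcile` a second time yields no actions, since the loop has converged.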

Nick went on to take the example of CSS, a declarative language that can be hard to debug.

Say goodbye to YAML engineering with the CDK for Kubernetes

Link to the tutorial code

Not a session per se, but rather a sponsored tutorial.

The presenters started describing how YAML can be useful, in particular in the k8s context.

With cdk8s, though, we are introduced to the advantages of code reusability, rather than copying and pasting like we usually do with YAML.

As of today, 4 languages are supported: TypeScript, JavaScript, Java and Python; Go and .NET are on the roadmap.

Using cdk8s, you can envision writing your business logic and infrastructure code using the same programming language and set of tools.

cdk8s becomes the « compiler », and YAML the « assembly language ».
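
The « compiler » idea can be illustrated without cdk8s itself. The sketch below uses plain Python dictionaries and the standard `json` module (Kubernetes accepts JSON manifests as well as YAML); `web_app` and `synth` are hypothetical helper names and do not reflect the cdk8s API, and the image names are made up.

```python
import json


def web_app(name, image, replicas=1, port=80):
    """Reusable building block: a Deployment plus the Service exposing it."""
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                    }],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {"selector": labels, "ports": [{"port": port, "targetPort": port}]},
    }
    return [deployment, service]


def synth(manifests):
    """'Compile' the in-memory objects to one JSON document per line."""
    return "\n".join(json.dumps(manifest, sort_keys=True) for manifest in manifests)
```

Declaring a second application is then a one-line function call instead of another copied and edited pair of YAML files, which is the reusability argument made in the tutorial.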

The presenters continued with a demo, that is available from the link above.

cdk8s can import the « regular » k8s API as well as CRD APIs, as long as you provide the required YAML defining your CRD.

There are 2 levels of APIs: the first one allows the developer to describe, using code, something very similar to YAML; while the « plus » API, or L2 API, is a higher-level construct where you only deploy what you need (Deployment.addContainer(...)).

One nice feature they demoed was the ability to define a Docker image builder using cdk8s; they even published it to npm (during the demo, they used TypeScript, so in the same class they could reference the image build AND the deployment, service and ingress).

Confusingly enough, using the cdk construct creates a « chart » of resources; although Helm charts can also be created using cdk8s – they used the Redis chart during the demo to provide persistence to their demo counter app.

CDK is growing: not only k8s, but Terraform and other tools; after all, it can generate any type of deployment resource!

Conclusion – key take aways

For sure, a virtual conference is less fun: no interacting with speakers and the audience during the breaks and evenings, and the virtual sponsor booths are less engaging. Technically, this KubeCon NA Virtual also suffered regular problems (pauses) with the Intrado-provided streams.

But…

Content was as good as the previous years, and definitely worth the time.

For this edition, to me, the key topics were:

using eBPF based tools to monitor and debug Kubernetes resources
Developer experience: building without Dockerfiles (buildpacks), and deploying to Kubernetes without YAML (cdk8s, fabric8 API, etc.)
Operator fatigue: you probably don’t need one for your app
Kubernetes UIs: Kubernetes Dashboard is far from your only (or best!) choice
GitOps: becoming the standard way of deploying
Chaos Engineering: becoming mainstream thanks to the mature tooling


Kubecon North america 2020 virtual – recap
