In this blog, I take a moment to interview the “RuxStar” 🌟 of the CTO.ai team, Ruxandra Fediuc, on how to automate your workflows as a developer by running “Ops” or automations in your CLI and Slack using The Ops Platform.
We cover everything from “Op” workflow best practices to security to, everyone’s favorite, Kubernetes automations.
Scroll down to get the full scoOp.
A short table of contents with linked questions is also available here to more easily scan the entire interview:
- What is an “Op”?
- What are some of the benefits of “Opifying” your workflow?
- What open source Ops do you recommend?
- What about if I want to build my own Op from scratch?
- How can I level up my team’s productivity with Ops?
- How can you use Ops collaboratively to help level up your entire team?
- How does security work on The Ops Platform?
- How does the CTO.ai Slack app work?
- Is there anything I can't do with The Ops Platform?
- What tips or tricks or common patterns do you see when building or running Ops?
Tristan Pollock: What is an “Op”?
Ruxandra Fediuc: The way I look at it is that an Op is a repeatable task or sequence of tasks (i.e. a workflow) that you package within a Docker container, easily share with anyone, and run from either the terminal or Slack.
Tristan Pollock: What are some of the benefits of “Opifying” your workflow?
Ruxandra Fediuc: There are multiple benefits. First of all, you remove manual work. Let’s say you have to do a sequence of things manually to achieve a particular objective in your day-to-day job; in that case, you have to keep all the details of each step in mind. Now, imagine you can get rid of all that mental load and leave all the responsibility to the Op to figure out what's the right sequence, what inputs are needed and where to get them from, what exactly has to be done at each step, what kind of feedback the user running it needs to get, etc. And then the Op just does it for you!
Second of all, you get to hide or abstract out complexity, which is extremely powerful especially if there's a lot of domain knowledge around that specific task or workflow. Oftentimes, you would have these domain experts that become bottlenecks, because they are the only people who know or have access to execute specific steps in these workflows. Let’s say you need to update something in a Kubernetes cluster; everyone who doesn’t know Kubernetes (or shouldn’t have direct access to the cluster) needs to now go through the internal cluster admins, and wait for them to have the time and capacity to do these tasks for everyone, over and over again. This slows down the process, and keeps these domain experts away from focusing on innovation and adding value.
Now, imagine you can pretty much encapsulate their domain knowledge into an Op that automates what they do, using just the right level of access by leveraging specific access tokens stored securely in a secret store and hidden from the users’ view. The domain experts help build the Op, make sure everything gets done following best practices and internal standards, and only expose to the users the appropriate level of customization. The complexity is hidden from the final users, the domain experts get visibility into who is using the Op, when, and how by leveraging tracking, and the bottlenecks are removed. Removing bottlenecks reduces pressure on the domain experts, and they get to spend their time more wisely instead of just doing unmotivating, routine tasks.
Last but not least, I believe one really cool benefit of “Opifying” your workflows is that, once you've made the effort to build them, you and your colleagues get to run them not just inside the terminal, but also in Slack. On any device! I personally don't have a terminal on my phone and I don't want to set one up for this purpose, because I’m pretty sure it's going to be painful and I'm going to hit some roadblocks. Having the freedom to run such workflows while on the go is pretty awesome, if you ask me! 😊
Tristan Pollock: What are some cool open source Ops that you'd recommend for the community to check out?
Ruxandra Fediuc: The place to start is the CTO.ai Registry. First of all, check out the CTO.ai Official Ops and see what our internal team has built using the platform. There are some interesting and powerful integrations and workflows we’ve covered in our examples, including Ops for both AWS (Amazon Web Services) and GCP (Google Cloud Platform).
We also have a number of Ops to help create and manage Kubernetes clusters. Our EKS (Elastic Kubernetes Service) and GKE (Google Kubernetes Engine) Ops are dedicated to each cloud provider’s managed Kubernetes services and simplify the process of creating or deleting Kubernetes clusters, following best practices in terms of security and networking. The Ops also set up some inside-cluster tooling, as well as enable and guide users on how to access the cluster once the creation is complete. Some might think that creating a Kubernetes cluster (be it a sandbox or for a production environment) is just a matter of making a few clicks in the EKS or GKE consoles, but that’s not really that simple; you might want to first create a private network for your cluster, maybe set up a bastion host, ensure the cluster is then configured correctly, etc. The Ops will take care of all these steps for you, making your life significantly easier! To take this further, our K8s Op is meant to help manage and interact with an existing cluster, from checking the status of resources to deploying apps inside it, creating Kubernetes resources, installing popular tools inside the cluster, and more.
Our Kubernetes Ops are actively being worked on by our team and new features will be coming up, but we encourage our users to jump in and contribute, or fork the Ops and adapt them to their specific setup.
Secondly, there are some strong contributions from our community in our registry. We've seen some really cool Ops added by community members like TJ Holowaychuk and Julian Gruber, covering issue triage, npm dependents, and Go binaries.
These are just a few examples that come to mind, but there are many more in the registry to look at. I think this is a good start for anyone who's excited about automating their workflows, but doesn’t know where to start or what is possible.
Tristan Pollock: So you just talked about some of the open source Ops that are out there, for people to pick up and build upon. What about if I want to create my own Op from scratch?
Ruxandra Fediuc: Right. So, first of all, one must start by deciding on what is the actual task or workflow they would like automated. Think about the things you have to do on a recurring basis and that bore you to death, or that you just find stressful and time-consuming because you have to pay attention to a lot of details, and ensure you don’t skip any steps. Estimate how much time you would save if you could automate these workflows. Look at the tools or platforms that you use on a regular basis, where you find yourself jumping from making clicks in a browser console to running a series of commands in the CLI, and so on.
Next, confirm it is possible to interface with the desired workflow through an SDK, API, or CLI tool. Once you have confirmed that you have a way to interact with these tools and platforms in a programmatic fashion, it's just a matter of going through our documentation and picking up one of our SDKs to build the Op with, based on your language of preference. We currently have support for NodeJS, Golang, Python, and Bash. You can scaffold a very simple Op to get you started by running ops init, and then just build upon it. Leverage trusted external packages and/or libraries to help simplify your work, in addition to what you get out of the box with our SDK and UX modules.
Tristan Pollock: So what about if, I'm—let's say—a technical startup founder or a team lead? How can I think about using The Ops Platform and building and running Ops, or sharing Ops to level up our team and our team's productivity?
Ruxandra Fediuc: Right. It's interesting that in particular for smaller teams, what often happens is you have these people that sort of need to be “Jack of all trades”, learn a little bit of everything and handle everything. So they don't get the luxury to really dive deep and become experts in a specific field, right? You don't have the time to do that when you need to move fast as a startup.
Now, imagine you become aware of a way to help your team level up their skills and knowledge by automating some of the time-consuming, repetitive tasks that slow them down or completely block them from innovating and delivering value. You can encourage your team to do a little bit of research and identify some opportunities to automate and “Opify” these tasks or workflows. They’d have to invest a bit of time to build these automations, but, once built and ready to be run, these will pay for themselves in a short time.
Tristan Pollock: If you're a team that wants to automate more of your workflows, how can you use Ops collaboratively?
Ruxandra Fediuc: Well, the beauty of building Ops is that Ops are code you can host in a repository and collaborate on with all the other developers on your team, just like you do on all the other internal tools or applications. The Ops Platform handles the team management aspect, as well as the versioning, so anyone qualified can add enhancements and help with any maintenance required. Ops can easily be built collaboratively by a team, not just by individuals.
As we discussed a bit earlier, one great aspect of building Ops is that it enables developers and operations staff to take a lot of responsibility off their shoulders and reduce their mental load. All the details one might need to keep in mind as they execute a task or sequence of tasks become code and live inside the Op’s source code, as do any security and infrastructure standards. Without Ops, we either lock specific tasks in the hands of a few people or rely a lot on trusting that our teammates follow internal documentation and specifications when executing particular tasks. And we depend on manual reviews that often slow down the process, or introduce complex approval flows. With Ops, all of that can be easily automated or simply removed, and the engineering team feels more confident about doing things the right way. Think about creating infrastructure components and how easy it is for someone to configure them incorrectly and introduce vulnerabilities, or accrue unbudgeted costs.
To add to all this, by leveraging tracking inside the Ops for the specific events the team considers significant, the business owners gain valuable insights into how the engineering team is using the tools and platforms they spend on.
Tristan Pollock: Yeah. And you mentioned security. I know there's a secret feature on The Ops Platform. How does that work?
Ruxandra Fediuc: Right! That's probably one of the most powerful features we have and a must if you think about the world of Software Development and DevOps in general. As one interacts with various tools and platforms or builds automations around them, they deal with a lot of sensitive information, such as access tokens, API keys, or various credentials. Most tools require some sort of authentication and authorization management, right? Through our Secrets Management feature, the Ops Platform offers users the option to easily store and access such sensitive information in a secrets store at the team level.
So, imagine you would have your developers do something on AWS, but you don't want to grant them too much access. What you would do in this case as an admin is generate the right tokens or credentials (with the right level of access configured) and set them as secrets in the specific Ops teams whose members need to access them through Ops. Your teammates then just get to leverage them while running Ops, but without seeing their values. They can also build additional automations without having to depend on admins to give them the information, or without having to get elevated access themselves.
Tristan Pollock: And this works both in the CLI and in Slack?
Ruxandra Fediuc: Yes, it works in both places. What is worth highlighting here is that, in Slack, we are extra cautious as to how we take care of these secrets. There is no actual secret being exposed to Slack; everything happens in our infrastructure. So when you try to set a secret or retrieve the value of a secret while running an Op in Slack, we take you out of Slack to our website and use our secure APIs. We really want to make sure that people feel safe about using our secrets management feature.
Tristan Pollock: That's cool. To circle back on this Slack component and what's really cool about The Ops Platform and the level of collaboration it encourages (especially with everyone working remote right now), I'd like us to talk a little bit about how important the Slack collaboration component is, and how it provides that necessary transparency and speed, and all other things that are really beneficial for a development team…
Ruxandra Fediuc: Right. So, I think, you know, most development teams nowadays probably leverage a number of Slack apps to enable integrations with all the tools they use, just so that they get visibility into what's happening in their CI/CD pipelines, in their PagerDuty accounts, etc. Right? Because that's the beauty of it: You know you have the team there in one place and that they're going to pay attention to notifications and be aware of what is happening. I think this is super powerful, as long as you trust the ones who build these Slack apps to handle your credentials and any data that flows through them. The more apps you use, the more such actors you have to trust. As far as I'm concerned, I think what is even more powerful is being able to design and build your very own such integrations without having to learn how to configure, build, and publish Slack apps (or, separately, the same integration built as a CLI tool), and without having to trust multiple actors to store your sensitive information and get visibility into your data and processes.
The Ops Platform really simplifies making these Ops available in Slack and CLI in one go: you build once in your language of choice, you run in both places. And it's just smooth, it works, and the interesting thing is you run these either individually on your machine, in the CLI, or in a really collaborative fashion, in Slack, in the channels where your teammates are present as well and where, together with you, they can see and interact with the Op as it runs.
Tristan Pollock: I love that! That’s probably one of the coolest parts about The Ops Platform. Are there any limitations? Is there anything I can't do with The Ops Platform?
Ruxandra Fediuc: Well, there are a few limitations, yes. As I've touched on earlier, if you're using some obscure tool or platform out there that doesn't offer you a way to interface with it, you’re not going to be able to automate any tasks around it. That being said, most tools and platforms nowadays offer users programmatic access; if that’s not the case, you should maybe consider updating to a more modern option.
One thing in particular that could impact running Ops in Slack and that is good to be aware of is that there is currently no easy way to interact directly with your file system through an Op. So if, let's say you would like to build an automation that would involve uploading some files somewhere and modifying them (e.g. image optimization), as of now there is no smooth way to do that from within Slack. In the future, we're looking to make this very easy for the user. Every team will get access to dedicated and secure file storage. Anything they upload in there they will be able to access while running Ops.
Also, even though The Ops Platform enables some very rich ways to display information in the CLI (colors, formatting, tables, trees, etc.), as of now, Slack limits the way information can be printed out. There is also a limit on the number of characters that can be printed in one go in Slack, so we recommend that users truncate or split their outputs into multiple parts. That being said, our SDKs give Op builders a way to detect the running interface type (terminal or Slack). This way, Op builders get the option to handle each case separately and don't have to compromise the experience in either.
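To make the "split your outputs" advice concrete, here is a minimal Python sketch. It does not use the CTO.ai SDK: the function names, the "terminal"/"slack" strings, and the 3000-character default are all hypothetical placeholders chosen for illustration, so check the real limit and the SDK's own interface-type call before relying on any of it.

```python
from typing import List


def split_output(text: str, limit: int = 3000) -> List[str]:
    """Split text into chunks of at most `limit` characters.

    Chunks break on line boundaries where possible; a single line longer
    than `limit` is cut at the limit. The default of 3000 is a placeholder,
    not a documented Slack value.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Hard-split any single line that cannot fit in one chunk.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        # Start a new chunk when this line would overflow the current one.
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


def render_output(text: str, interface: str, limit: int = 3000) -> List[str]:
    """Return the messages to print for a given interface.

    `interface` is a hypothetical string ("terminal" or "slack") standing in
    for whatever value the SDK reports at run time.
    """
    if interface == "slack":
        return split_output(text, limit)
    return [text]  # The terminal can take the full output in one go.
```

An Op would then print each returned message in turn, so the terminal path keeps its rich single output while the Slack path stays under the character limit.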
Tristan Pollock: Cool. One last question: Any other tips or tricks, or common patterns you see when building or running or sharing Ops?
Ruxandra Fediuc: Absolutely, let’s see a few that come to mind right now. First of all, as our Ops Platform users are identifying opportunities for automation, one great thing to do before they start building anything from scratch is to check out what already exists in the Registry, or have a look at the Open Source Ops in the CTO.ai GitHub account. Especially if they’re interested in an integration with something very popular like, let’s say, AWS services. Finding a similar integration as part of an existing Op can really skyrocket one’s experience.
One other thing I recommend Op builders to keep in mind is to really keep responsibilities separate inside the source code. User prompts or other UX-related functionality should be kept separate from business logic or configuration, so that Op updates and enhancements can be done fast and without regressions. Documentation for the Op (covering usage, ways to contribute, etc.) should never be neglected. The README is automatically pulled from the Op’s source code URL (specified inside ops.yml) and displayed on the main page of each Op in the Registry.
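As an illustration of keeping prompts apart from business logic, here is a small Python sketch. Everything in it is hypothetical (the DeployRequest type, the function names, and the two steps in the plan); it does not use the CTO.ai SDK, and the prompt function is passed in so the same logic could sit behind a terminal prompt or a Slack prompt.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class DeployRequest:
    """Hypothetical input for an illustrative deployment Op."""
    service: str
    replicas: int


def build_plan(request: DeployRequest) -> List[str]:
    """Business logic: a pure function with no prompts and no printing."""
    if request.replicas < 1:
        raise ValueError("replicas must be at least 1")
    return [
        f"validate manifest for {request.service}",
        f"scale {request.service} to {request.replicas} replicas",
    ]


def collect_request(ask: Callable[[str], str]) -> DeployRequest:
    """UX layer: gathers input through whichever prompt function is supplied."""
    service = ask("Which service?").strip()
    replicas = int(ask("How many replicas?"))
    return DeployRequest(service=service, replicas=replicas)
```

In a terminal, collect_request(input) would gather the answers interactively; in tests, a canned function can stand in for the prompt, which is what makes changes to either layer cheap to verify without touching the other.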
Builders should leverage The Ops Platform’s secret management and config management features wherever appropriate, and make sure they only use external libraries or packages they can trust. To gain insights into how often specific steps inside their Ops are running or with what metadata, I highly recommend leveraging our SDKs to track these events.
Last but not least, never underestimate the UX (User Experience) of your Ops! Each Op you are building is a separate product in itself and should target a specific audience, speaking the right language of its final users, and offering them the opportunity of a delightful experience. Avoid outputs that consist of large blocks of text (especially unformatted), use spacing wisely to help speed up parsing information, help guide the user with prerequisites information and helpful URLs where possible, watch out for jargon or clunky text, etc. You know, don’t build something you’d not enjoy using, even if it’s super technical and powerful 🙂
Follow our Best Practices documentation section for more tips and tricks, as we’ll be covering more and more aspects related to building powerful and efficient Ops.
Want to talk to the CTO.ai Team live? Join us in our Slack Community and continue the discussion.