[0:00:01] ANNOUNCER: There are hundreds of observability companies out there and many ways to think about observability, such as application performance monitoring, server monitoring, and tracing. In a production application, multiple tools are often needed to get proper visibility into the application. This creates some challenges. Applications can produce lots of different observability data, but how should the data be routed to the various downstream tools? In addition, how can the data be selectively sent to different storage tiers to minimize costs?

Calyptia is a service that helps manage observability data from source to destination. Eduardo Silva is the Founder and CEO of Calyptia, and he joins us in this episode.

This episode is hosted by Lee Atchison. Lee Atchison is a software architect, author, and thought leader on cloud computing and application modernization. His best-selling book, Architecting for Scale, is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments. Lee is the host of the podcast Modern Digital Business, produced for people looking to build and grow their digital business. Listen at mdb.fm. Follow Lee at softwarearchitectureinsights.com and see all his content at leeatchison.com.

[INTERVIEW]

[0:01:30] LA: Eduardo, welcome to Software Engineering Daily.

[0:01:33] ES: Hi, Lee. Thank you for the invite. I'm really happy to be here.

[0:01:36] LA: I'm glad you're here. First of all, is it Claptia, or Clapcha? How is it pronounced?

[0:01:44] ES: We pronounce the company name as Calyptia.

[0:01:47] LA: Calyptia.

[0:01:47] ES: Calyptia.

[0:01:48] LA: Calyptia. Okay, so I was closer this time than any of the other times. It's always a struggle of mine to figure out how to pronounce some of these names, and some people's names, too. Are you an observability company, or do you work with observability companies, or both? How would you describe yourself?

[0:02:08] ES: Okay. I would describe Calyptia as both an observability and a security company that sits in the middle to move, transform, or reduce data between sources and destinations.

[0:02:23] LA: You play in both the observability space and the security space, is the idea?

[0:02:27] ES: Yeah. Actually, if you abstract away what it means to work with data, we are very agnostic. When we talk about security and observability, there are different types of outcomes, or data value extraction, for different types of users, but all of them come from the same data.

[0:02:47] LA: Yeah. The data is the same. It's just different use cases for the data, sounds like.

[0:02:52] ES: Yes, different solutions, different use cases, and different needs around them, right?

[0:02:57] LA: Right, right. Yeah, so data that's critical for one use case is not necessarily critical for another, but something else is in its place. That makes sense. How do you compare yourself? Let's look first at observability companies specifically. Let's talk about companies like, let's say, Datadog or New Relic. How do you compare to them? You're not a traditional observability company, or for that matter, a traditional security company. Where do you fit in that niche?

[0:03:24] ES: Yeah, that's a really good question. Actually, what everybody understands as observability is a description that has been set by the biggest players, like Splunk and Datadog, or others.
Like, "Observability is my platform where you send all the data, and I provide you the right tooling for analysis, insights, and everything you might need for security, for observability, or any other type of use case." Now, when you think about what the user is trying to accomplish, at the end of the day, they just want to analyze data. Get insights, right? When they make that decision, they go one step back and say, how? How means, oh, I might need to centralize all the data in one platform that has all the capabilities to extract and run analysis, or reduce my data, or do some kind of math around it. Then you start thinking about Splunk, you start thinking about Datadog, or Elasticsearch, OpenSearch, or any other major player.

Where is Calyptia? Calyptia is a company that specializes in collecting and extracting data from multiple sources for multiple destinations, and it integrates them with Splunk, with Datadog, with OpenSearch, with Elasticsearch, Amazon S3. We have more than a hundred connections between sources and destinations. While we don't replace Splunk, we don't replace Datadog; actually, we provide a better experience for users that rely on those platforms. We are very, very agnostic about the backend piece. We call it the backend, meaning where we send the data for analysis. We sit in the middle.

[0:05:11] LA: Got it. Companies like Datadog, New Relic, Splunk, etc., all generate data and also provide a way of displaying the data. What you do is you collect, aggregate, and promote the data. Promote may not be a good word, but you collect, aggregate, and make the data available for use.

[0:05:33] ES: Yeah. There's a concept that's newish, but it has been there for a couple of years, what is called data pipelines, or telemetry pipelines. If you think of the backend, it's like the ocean. You need some way to move the water into that ocean, and those are the telemetry pipelines. At Calyptia, we focus on full integration with different types of sources. When I say sources, that could be application logs, application metrics, system metrics, or, for example, firewall messages from the security side that need to be sent to a central place for analysis. Because, as I said, the user wants to do data analysis.

This is a constant challenge. Every company is generating 20% to 30% more data every year. That's one fact. The other: Gartner just published that by 2026, in three years, at least 40% of companies will have a solution, something like Calyptia, in order to control, collect, enrich, and transport the data before sending the data to these specialized backends. There are a couple of reasons for that. The challenges that arise when people start generating data with applications are around volume. First of all, you lose control of the volume of data that you have. Your backend provider will tell you, "Hey, please send me all the data. I will take care of it." Yeah, but maybe you are analyzing 20% of that data, and you're paying for the 100%. This is the traditional approach to data processing: send me everything, and I will process it and give you the insight. Now, what we are proposing as Calyptia, and it's a new trend in the market, is: hey, move certain processing capabilities and certain types of analysis into the process that runs before the data gets ingested. This is the data pipeline.
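[A quick illustration of the idea: dropping uninteresting records before they ever reach a paid backend looks roughly like the following in the open-source Fluent Bit, which comes up later in the conversation. This is a minimal sketch; the paths, tags, field names, and endpoint are hypothetical placeholders, not Calyptia's product configuration, and it assumes structured records that carry a "level" field.]

    # Tail application logs, drop debug-level records in the pipeline,
    # and forward only the remaining records to the backend.
    [INPUT]
        Name    tail
        Path    /var/log/app/*.log
        Tag     app.logs

    [FILTER]
        Name     grep
        Match    app.logs
        # Exclude any record whose 'level' field matches 'debug'.
        Exclude  level debug

    [OUTPUT]
        Name    http
        Match   app.logs
        Host    ingest.example.com
        Port    443
        tls     On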
While you are collecting data from different sources, or receiving this data from the network, you have the ability to say, "I'm not interested in this data. I want to send this to Splunk, maybe for real-time analysis, or maybe I just want to send a fraction, and everything else to Amazon S3, which is the cheapest storage, right? It's not for real-time data analysis." All the users are starting a journey now where they don't think about just observability. They think, what is my strategy to accomplish good and scalable observability in the future? If you want to scale in all areas possible, you need to take control, to control and measure. That's what data pipelines are, and Calyptia provides the best one in the market for that.

[0:08:08] LA: Got it. Once again, companies like Datadog, etc., generate data, and rather than keeping the data internal, you pass it out into a generic pipeline. That's where you come in. Your purpose is essentially the filtering, the aggregation, the determination of where the data should go and how it should be processed, sending it out to the backends, which might be Splunk, might be an S3 bucket for long-term storage, might be some other mechanism. You're aggregating the data. You may or may not be processing it at some level, but you're definitely filtering and deciding which data goes where.

[0:08:46] ES: Actually, we do processing. We can do processing in motion, while the data is in flight. For example, one use case from one of our customers in the security space: they have tons of firewalls, and those firewalls ship tons of syslog messages. They use Splunk to do real-time data analysis. They found that out of, I don't know, 200 messages, maybe five are useful to them. They are not sending all 200 messages per device to Splunk. They are sending only the ones that are relevant. With Calyptia, they are able to detect, oh, this is the information they really care about; I'm just going to send this information to Splunk. Now, a data source could be a firewall, or it could be applications that you have in your own cloud and your own VMs that are generating info in logs, right? Logs, in the majority of use cases, tell you the state of the application, what's happening inside the application: informational messages, errors, warnings, and so on.

[0:09:43] LA: That makes perfect sense. There's an open-source offering that does this sort of data pipeline that I'm aware of called Fluent Bit. Now, you're related to Fluent Bit. Do you want to tell everyone how you're connected with Fluent Bit?

[0:09:56] ES: Yeah, Fluent Bit is my baby. I created Fluent Bit, like almost –

[0:10:01] LA: You're the creator of Fluent Bit.

[0:10:03] ES: Yeah. I created Fluent Bit. The parent project is called Fluentd. Fluentd was created by Sada, Sadayuki Furuhashi. I created Fluent Bit. I was part of the Fluentd team. Fluent Bit and Fluentd both solve the problem of how to collect data from multiple sources and send it to multiple destinations, while also dealing with different formats. Syslog is not the same as an Apache web server log, or NGINX. They are all different. At the end of the day, you need certain processing capabilities in order to structure the data in a way that lets you do efficient analysis of that data. Fluentd and Fluent Bit allow you to do that. Fluent Bit has been a very successful project. It has been around for almost seven years.
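[Eduardo's earlier Splunk-versus-S3 example maps naturally onto a Fluent Bit configuration. A sketch of one way it can be expressed; the parser, tags, bucket, and token are hypothetical placeholders.]

    # Parse NGINX access logs; records with a 5xx status code are also
    # emitted under a second tag that routes to Splunk for real-time
    # analysis, while the full stream is archived cheaply in Amazon S3.
    [INPUT]
        Name    tail
        Path    /var/log/nginx/access.log
        Parser  nginx
        Tag     web.access

    [FILTER]
        Name   rewrite_tag
        Match  web.access
        # $code is the status field produced by the built-in nginx parser;
        # the trailing 'true' keeps the original record flowing under web.access.
        Rule   $code ^5\d\d$ web.alert true

    [OUTPUT]
        Name          splunk
        Match         web.alert
        Host          splunk.example.com
        Port          8088
        Splunk_Token  ${SPLUNK_TOKEN}

    [OUTPUT]
        Name    s3
        Match   web.access
        Bucket  example-log-archive
        Region  us-east-1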
I think that one of the reasons these projects got the biggest traction in the market as the primary solutions to create data pipelines is that, when we worked at a previous company called Treasure Data, we donated those projects to the CNCF. I think Fluentd, and Fluent Bit with it, joined as something like the fourth or fifth project in the CNCF. The CNCF was pretty new. It was just Kubernetes, Prometheus, and I think OpenTracing, when Jerry was there. We are from the early days. These projects started to get a lot of adoption, primarily because, hey, people are generating more data. I think it's one plus one, right? Every year, people generate more data, so they need to analyze it, they need to move the data, and they found that relying on Fluentd, and now Fluent Bit, was the best solution for that, because they were scalable and they have a pluggable architecture.

I think that one of the most important points for big corporations is that they're really vendor-neutral. They are with the CNCF. It's not like, "Oh, I'm New Relic, and I own this. Or I'm Splunk, or Datadog, and I own this technology." If tomorrow I change the rules of the game, I don't know, something could happen, right? You might be in trouble. By being in a vendor-agnostic place, this opened the possibility for companies to trust and rely on this technology. When I say companies, for example, today, Microsoft, Google, and Amazon run Fluent Bit at a very, very high scale in their infrastructures.

Fluent Bit is an open-source project. It grew a lot. It has all the integrations possible between sources and backend vendors. After a couple of years, I decided with my co-founder today, Anurag, that, hey, it's time to start the next chapter of the Fluent ecosystem, and this is Calyptia. Because Fluent Bit got massively adopted. In a few weeks, as we were talking before, we're going to approach 10 billion downloads in total, which is insane.

[0:12:40] LA: That's an amazing success for an open-source product.

[0:12:41] ES: It's amazing. Yeah. Yeah, so the connection between Fluent Bit and Calyptia: Calyptia is just the enterprise chapter of Fluent. Enterprises have other needs around security, performance, auto-healing, or sometimes cloud-native solutions, more integration around cloud native, scalability, secret management, and those types of things.

[0:13:04] LA: Do you take Fluent Bit as a basis and add things like support and enterprise-grade capabilities on top of it? Or did you take Fluent Bit as a starting point and create a new product for the enterprise?

[0:13:20] ES: Oh, it's totally a new product. Actually, what we call Calyptia Core is a full platform built on top of Fluent Bit. We are not selling a Fluent Bit++. We're selling a whole platform that allows you to create data pipelines, not just for aggregation, but also remote pipelines. You can call it fleet management, where, remotely, you can create pipelines on nodes that collect, process, and send the data directly to any place without the need for centralization or aggregation. For example, if you have, I don't know, a hundred VMs and you need to process all the information collected from those VMs, pre-process it, or maybe reduce the information, you can deploy Calyptia Core and remotely use the Core agent that is based on Fluent Bit. Our platform, through a simple UI, allows you to create a data pipeline and say, "This is how I want to collect the data. This is the logic with which I want to process my data," and then send the results to one or multiple destinations.
Now, if you do aggregation, we have this concept of an edge pipeline versus a centralized pipeline, or aggregation. Aggregation in a centralized pipeline is very scalable, because it's not just one pipeline. You can scale it up to a hundred if you want. It depends on the load of data that you want to receive. Maybe the next question is, hey, can you do the same thing with plain Fluent Bit? Yes, but it might take you a couple of weeks and a couple of folks from your engineering team to manage and maintain this in a consistent way, while with the product, you can get up and running in 10 minutes.

[0:14:58] LA: Now, you're also a SaaS application, right? People buy you as a service, versus as a downloaded software product. Is that correct? I know there's an agent and everything, but forgetting that, the core of your product is a SaaS application?

[0:15:12] ES: It's a hybrid mode. Let me explain how. Our control plane lives in the cloud, in our cloud. The control plane doesn't store any customer data. What you do is you deploy Core on your own instance, in your own data center, or on your own computer, even a laptop if you want. Core gets up and running, connects to our cloud, registers itself, and says, "Hey, I'm here." Now, from the cloud, you can start creating the data pipelines that get deployed on your own computer, in your own environment. The data never goes through our cloud. The data always flows in the way that you tell it to flow.

[0:15:51] LA: The data flows, the pipeline itself, are all on the customer's premises, or wherever they install the agents.

[0:15:57] ES: Yeah.

[0:15:59] LA: The management of it is what goes on in the cloud.

[0:16:02] ES: Yes, but there's another version that we just shipped a couple of weeks ago. We have a couple of customers who run in a private cloud. Private cloud, you might think about finance and government. They have special needs around, hey, we cannot talk to the Internet. For them, we ship a version of Calyptia Core that runs with all the components in their own environment, so they don't talk to our cloud. You have these two types of deployment mechanisms: hybrid, in a shared cloud environment, but where all your data stays on-prem; and the other, fully in your own VPC, with your private cloud.

[0:16:37] LA: Got it. Okay. There is an installation process to get the product working. What do you do to make that a smooth process for people?

[0:16:47] ES: Oh, we just provide two commands. A shell script, or a Helm command, and you're ready to go, copy-paste. It takes two minutes to get installed. It depends on your network bandwidth, right? It just downloads the artifacts and everything, and it's ready to go. If you are in Kubernetes, you get an even better experience. Actually, as a side note, our solution is really interesting for how we look at the future of deployments for platforms that support different types of workloads. Our solution is fully based on Kubernetes. Even if you don't have Kubernetes running, you can deploy Calyptia Core. Our installer will recognize, oh, you are not in Kubernetes. That's fine. I'm going to spin up a small single-node Kubernetes of my own as a single process, and I'm going to start doing all the magic. If you are in a real Kubernetes cluster, it's going to install all the components and manage all the load balancing and networking for you automatically.

[0:17:41] LA: Cool. Okay, so there's an on-premise component. There is your backend component.
You do the management on your end, but all the data plane is entirely within the customer's premises, so there's no shared data that goes on. Okay. That makes you very different from many observability companies, like the New Relics and the Datadogs of the world, where very much, they're in control of the customer's data.

[0:18:07] ES: Yeah. They are in control when they send the data to them. We give the control back to the users, and the customer's data stays in their own environment. At the end of the day, this is not Calyptia versus the others. Actually, we found that our customers get a better experience with their own backend, like New Relic or Splunk, because at the end of the day, the data gets processed in a more efficient way, with the right structure. The user doesn't need to wait until all the data gets ingested into the backend and indexed in order to do some processing or analysis, because you can do this in parallel, right, and in a distributed fashion, while using Calyptia Core technology. There's a huge difference between waiting 30 minutes to process your data, and doing it in one second or less, while the data is being collected in real time.

[0:19:00] LA: Makes sense. When I think about observability, and I know you want to talk about both observability and security, but let's focus on the observability side for a moment, because I think that's the broader area, at least for this conversation. There are multiple different types of observability data that can be captured and sent through a pipeline such as yours. When I think of observability data, I think of three major classifications of data. I think of event data, which includes log files and things like that. These are things that have happened in your application. Here's a notice that something happened, and you log it and do whatever you need with it. The second one is metrics. These are point-in-time values. This is what New Relic and Datadog historically have been very good at. It's like, what's the CPU of the server? What's the amount of free memory? In my virtual memory stack, in this part of the application, how much free memory do I have? How much have I used? All those sorts of things are point-in-time values that are part of an observability story. The third one is tracing, which is following a request end to end, all the way through a potentially multi-service application and all the way back, taking data from multiple sources, aggregating it together, and assembling it to create that entire trace. When I think of observability data, I think of those three separate things.

Now, different companies, I think, specialize in different types of observability data. The Datadogs of the world really came from a metrics background and have added tracing and logging and things like that. You have companies like Splunk, which came from the logging space and added the other ones in later. Few companies do them all well. Most companies do one or two of them well, but not all of them. Do you focus on one or two of those categories, or do you have capabilities, and do you focus on helping, in all three?

[0:21:05] ES: A great question. Actually, our platform supports logs, metrics, and traces. Logs have historically been our major focus. For metrics, we can do, for example, Prometheus scraping and send those metrics as OpenTelemetry metrics.
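[A sketch of that conversion with the open-source Fluent Bit: scrape a Prometheus endpoint and forward the samples as OpenTelemetry metrics. The hosts and ports below are hypothetical placeholders.]

    # Scrape a Prometheus /metrics endpoint every 10 seconds and ship
    # the samples to an OpenTelemetry HTTP endpoint.
    [INPUT]
        Name             prometheus_scrape
        Host             127.0.0.1
        Port             9100
        Metrics_Path     /metrics
        Scrape_Interval  10s

    [OUTPUT]
        Name         opentelemetry
        Match        *
        Host         otel-gateway.example.com
        Port         4318
        Metrics_uri  /v1/metrics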
We have all these conversion layers: we can also receive OpenTelemetry metrics and send them on, or export them as a Prometheus exporter. We can convert between different formats through different layers. We have done that for a couple of years already in the open source, and we are now leveraging these features from a product perspective, with some simplification. Now, you're right in saying that observability cares about logs, metrics, and traces, and, for example, security cares about data sources and destinations, like a SIEM, period. Now, for metrics and traces, yeah, we support processing capabilities, too. I agree that everybody has been focusing on one or the other. Now, we are seeing in the market that everybody's trying to converge: "Hey, we need a solution that gives us the ability to correlate all the data together."

[0:22:11] LA: They realize that you need all three in order to have a full solution.

[0:22:14] ES: I think that need versus ideal is case by case, right? There are companies that are fine with just logs, or just with metrics. For example, in our case, we are going to ship a new feature in our product that allows you to convert logs to metrics, right? This doesn't exist everywhere (there's a sketch of this a little further down). For example, imagine that you're receiving a thousand logs, right? You know the structure, you know the values they have, but you don't care about the logs. You just want to know, for example, how many of them belong to a certain category, or how many of them have a specific value, for example, HTTP queries, and so on. It's more valuable for the user to take those logs and, right in the pipeline, be able to process them and ship a single metric that says, "Hey, this is the value." Then you can take some action. For example, alerting: hey, are things going well? You set the threshold, and you can do whatever you want with that. Most of the backends provide those types of features by processing logs, but only after the logs have been ingested. If I talk about what I see from the whole perspective of logs, metrics, and traces, the journey to have a single platform for everything, I would say that we're still at an early stage in the market for that. I'm glad to see that all vendors are jumping into that journey, because everybody's coming up with really interesting approaches for data analysis. Now, the biggest problem for them is the volume of data.

[0:23:44] LA: I hear you as far as the aggregation, like converting logs to metrics and vice versa, because metrics can be changed to events by triggers. That's essentially what a trigger is: a notification of something happening. One thing that I know has always been a challenge, and I know things are getting a lot better now with many open-source products, is that at least in the early days of tracing, request correlation was one of the hard things to do. When New Relic first started working on tracing, and I was still at New Relic at the time, it was always hard to figure out how to do the correlation successfully without adding huge complexity to your application, to your infrastructure. Is that correlation something that Calyptia can help with?

[0:24:35] ES: The way that we help, for example, with traces specifically: we can receive OpenTelemetry payloads; OpenTelemetry is the standard for tracing, as Prometheus is for metrics. That's the industry standard. We get the traces. The ability that we have is to reduce the content of the traces.
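[The logs-to-metrics idea Eduardo describes above can be approximated today with the open-source Fluent Bit's log_to_metrics filter. A minimal sketch; the tag and the "code" field are hypothetical placeholders carried over from the earlier access-log example.]

    # Count HTTP 5xx responses as log records stream through the pipeline,
    # emitting a counter metric instead of forwarding every raw log line.
    [FILTER]
        Name                log_to_metrics
        Match               web.access
        Tag                 web.metrics
        Metric_mode         counter
        Metric_name         http_5xx_total
        Metric_description  HTTP 5xx responses seen in access logs
        Regex               code ^5\d\d$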
Sometimes traces are very noisy. A span can carry information that is not necessary, or certain spans are not useful at all. The same as in logs, where you have a lot of noise, the same thing happens with traces. Actually, traces are very, very noisy. Our capability is not around doing correlation. It's around data transformation in those spaces. We leave all the correlation to happen in New Relic, or in Datadog, or whatever platform the customer is using.

[0:25:22] LA: Okay. Correlation from the standpoint of relating one piece of independent data to another piece of independent data, to find the commonality and the interactions between them, you'd leave that to the New Relics of the world. When it comes to actually creating an entire trace, merging all the data that goes along with it, and determining what parts of that trace are relevant and what parts are not, that's something that you can handle within the pipeline.

[0:25:51] ES: Yes. What the user does, for example, is instrument their applications with OpenTelemetry. They say, "Oh, this is the endpoint to send my traces to." They use Calyptia Core. We receive the traces, and they create their own rules to say, "I want to discard certain spans based on certain content," so those spans don't flow to the backend. That's the thing that we do for tracing. The backend says, "Oh, I got all these spans. I'll try to correlate them. Oh, this is a full trace from end to end." Now, you have less data, right? You have something that is easier to process.

[0:26:27] LA: This reminds me of a quote that I saw on your website and wrote down, and I just see it in my notes here now. The quote was "transform complexity into insights." That caught my attention, because IT complexity is a huge topic of mine. I've even written a book, an O'Reilly book, specifically on that topic. I care a lot about complexity. I didn't really know how that applied to your product, but I think you're explaining some of it now. That filtering and aggregation is really part of the reducing-complexity message you were trying to get across with that statement, right?

[0:27:10] ES: Yeah. There are a lot of complexities in this space. For example, the moment that users need to deal with different applications, with different sources, plus different formats, and they want a unified experience of data analysis, that is really, really complex. As for the way that we solve this problem, we started this journey 10 years ago with Fluentd. Everything that we're building now is building blocks on top of that experience. We don't want the user focusing on how to collect the data, how to transform the data, how to send it, which is a very heavy task for many teams. We see this day to day in all the sales calls, all the POCs. Yeah, people struggle with this, because you want to do data analysis, right? Nobody wakes up in the morning and says, "Hey, I just want to manage my logs." No, it's not like that. They just want to extract value from them. Managing logs, managing metrics and traces, is a very complex and heavy task. Even more so now that environments are growing, right? Growing, meaning the number of VMs, and that calls for complexity around how hard it is to attack the problem in an efficient way, right? The only way to do it is divide and conquer.
We built this technology to simplify how to divide and how to conquer easily, and to level up the user in a way that they don't need to touch a Unix file anymore. They can just do a drag and drop, click a couple of buttons, and get everything that used to take a couple of days in just minutes. That's simplicity for us. Honestly, we pay a lot of attention to our user interface and how the user interacts with it. If there's any extra burden that is not needed, or an extra action that makes things hard to understand, we don't want that. Most of our customers are really happy with the interface, because it's very intuitive, and they can accomplish things and extract value right away without taking 10 hours of training. So, yeah, for us it's about reducing this complexity, and we see that our customers are getting insights faster and hitting their goals in a better way. We are very customer-focused.

[0:29:30] LA: That makes sense. Certainly, that's one of the dimensions of complexity: the complexity of the user experience. That makes perfect sense. I also sense another complexity angle in your mix there, too. One of the things I talk about in the book is that there's a trade-off between depth of data and complexity. The more data you get, the more analysis you can do, but the more complex the system is; there's a correlation there. Keeping a system simple often means losing analytics. It means losing visibility into how it works. That's the way you make a system simpler. If you want more visibility and more notice of what's going on within the system, that typically means a more complex system. Observability in general, the idea of opening up a system so you can see more of what's going on inside, is, by its very nature, an increase in complexity. In some cases, a huge increase in complexity. In a large-scale application, using the fullest capabilities of a Datadog or a New Relic can be very complex, just to even wrap your mind around everything that's going on.

What's valuable is finding a sweet spot, right? Adjusting things, finding a sweet spot between how much data you want, how much observability you need into your system, and the level of complexity that goes with that. I see a product like yours as something that is able to fine-tune that adjustment, if you will: change the amount of observability data and, therefore, change the complexity. A feature of what you do is that users can change that. I want to send this level of detail to this application; I'll store all the nitty-gritty detail in S3 for later analysis if necessary. But in real time, I just need this high-level summary, and for alerting, I need a high-level summary, etc. Those sorts of things. You're essentially making decisions trading off detail against complexity. Do you see it that way, or am I missing something in that analysis?

[0:31:49] ES: Yeah, there's an interesting piece in your analysis: what the user wants and what the user needs. What we've seen in daily work is that we know they want to analyze the data, but when it's about the "what," most users don't have an idea how much information they have. That's a reality. Unless you start doing some inventory and control: okay, how much data do you have? We have one customer that had a lot of tools built in-house to collect the data from their applications, and they thought that they were processing a couple of terabytes per day.
We changed the logic, we implemented the product, and they got surprised, because, hey, now we're collecting more data than we used to before. Yeah, you were losing data there. You didn't know that you had more data before. This is what you have, for real. Now, they get into this discovery process: "Hey, I take control. I understand what I have. Now I can make some decisions about what I need." The need that you had before taking control might not be the need after you put some tool in place and start analyzing: hey, this is all that I have. Because the moment you decide to implement a telemetry pipeline solution, whether for security or for observability, you get more control, you are able to process and gather more data, and you have a more holistic view of what you really have in your environment. You're always racing against time, because you know that month over month, your data is growing. The amount of data per minute is more than last month. If you don't stop and put a solution in place, it will be hard. Everybody has a plan until they get punched in the face, right? There's a saying like that. I think in this space, the same thing happens.

[0:33:42] LA: Hey, we have more data than we think we have.

[0:33:44] ES: Yeah, yeah. The other thing: more data doesn't correlate with value. That's the biggest problem. The more data you have, the harder it is to find the value. You have to clean it up. You have to make sure that, oh, this data is –

[0:33:58] LA: That's what the complexity argument is. If it's too complex – a complex system can't be analyzed easily. If you have so much data that it makes your system so complex that you don't know what the data is, it's not useful to you anymore.

[0:34:14] ES: Yeah. You might be surprised how much money companies and users pay to store all this data. Then, "Oh, I need to store it, because maybe I'll need it." If you go one step forward and put some control, or some solution, before that, you might be in better shape for what's coming from a data growth, data volume perspective in the future. Yeah, there are a lot of stories from different users, but it's an interesting problem. The same problem that we used to have 10 years ago, we have it again. With this volume, it's hard to extract the value. I think you cannot rely on the way of working from a couple of years ago. I think that everything is shifting more to the left, where the left is where the data is being collected, where the data is being processed. There's a ton of value in that. Yeah, I'm happy to see that. Besides Calyptia, there are more emerging companies also trying to hit this space with their own approach.

[0:35:10] LA: There definitely are. Yeah. It's like what you were saying, too, about the wasted data. I don't think there are any more expensive five words in the English language than, "Maybe I might need it." Those five words, I think, cost a lot of people a lot of money. I agree with you. When I worked at New Relic, I used to say that most applications have significantly more analytics data than they do business data. That always shocked people. I always thought afterwards, you don't realize that in some cases, it's even an order of magnitude more analytics data than business data. It's huge quantities of data that you're collecting. You may not even know about it. What you do with it all is the complexity. Obviously, New Relic had their solutions for dealing with it when I was working there.
The point being, people were always shocked to hear that most companies have more analytics data than business data. It's a significant difference.

[0:36:16] ES: Yeah. I'd say that most people discover this through practice after a couple of years. I think it's an interesting journey. There are interesting challenges every day around data. Getting the data, getting the insight, is really hard. In the end, it's up to companies to decide: are they going to adopt a strategy to fight that back or not? If they're not, yeah, they'll have to do it in the future at some point. There's a lot of –

[0:36:45] LA: That makes sense.

[0:36:46] ES: Yeah, there are a lot of interesting journeys around observability, operations, and security.

[0:36:53] LA: I know we're running short on your time, and I appreciate your hard stop coming up here. We haven't yet talked about security. I'm wondering if we could talk a little bit about security. Now, I was going to talk to you about the difference between SIEM and SOAR, because I know you play a little bit in both places, but you're primarily a SIEM offering. Is that correct? Just for everyone who's listening, if you're not familiar with those terms: SIEM is security information and event management. It's about the events that happen that can cause your business harm from a security standpoint. SOAR is security orchestration, automation, and response; that's the R. That's more about the systems that you build up to manage the security of your applications and what the processes are for doing that. SOAR is more preventive, and SIEM is more about detection and notification. Do you consider yourself more of a SIEM company, or more of a SOAR company? Or do you play in both?

[0:37:56] ES: Oh, I would say we're both. We play in both. The thing is the following. We have our base product, and it has two ways to be consumed: for security and operations, and for observability. If I can do a comparison between one and the other: in security, you care about security sources, like firewalls, security messages that you should send to a SIEM. If you're in the observability space, maybe you are a DevOps person or an SRE, and you just care about the logs, metrics, and traces. You just care about that, analyze that, and that's it. On the other side, in security, you care about parsers for firewalls, Windows log formatting, processing, everything around processing to save costs on your SIEM. While in observability, you care about data enrichment, metadata, reducing traces, dev logs. You might say, "Oh, this is similar." Yeah, but for different personas. Totally different. For example, in security, the buyer is a CSO, right? The chief security officer. While in observability, it's usually the VP of engineering, or, now, there's VP of observability, which is a new title. For example, if you ask people what the major problem is: in security, people say, "Oh, I cannot connect my data sources. I cannot extract this information from this application, and this is relevant, because it could be a security threat; or if I'm being hacked, that information is there and I'm not aware of it." It's really focused on connectivity, on integrating different systems. While in observability, if I cannot connect my data, the problem is different: I cannot serve my end customers. I may not have my highly available services, and so on.
The spectrum of problems and personas is totally different, but yeah, data is data, right? The way that you consume it and the way that you want to extract value might be different based on your own needs. From a security standpoint –

[0:40:06] LA: The use of the data is different, but the data is still the same.

[0:40:10] ES: Yeah. Well, data is data. Just a bunch of bytes.

[0:40:15] LA: Real quickly before we stop, you list a ton of integrations on your website. Do you want to mention some that you think are the most important for your customers, or the most critical ones? Obviously, OpenTelemetry. Which ones are the ones that the people listening are going to be most likely to be interested in?

[0:40:35] ES: Yeah. For example, I would say the common one is Amazon S3, which is very cheap object storage. That is common across the majority of our customers; they use it. Well, unless they are in a private VPC and don't have access to it; then they might not use it. The other is Splunk. Very heavily used. Most of our customers use Splunk; they're mid-sized and big companies. Then HTTP, to receive events over HTTP and send over HTTP. Now, OpenTelemetry, yeah, mostly for tracing, but I would say it's not as hot as logs in our case. I think the majority of the use cases that we're solving today are around logs and metrics, and a few with traces.

[0:41:15] LA: Still, things like syslog integration are more important than OpenTelemetry in most cases.

[0:41:22] ES: Yeah. The thing is that you have to solve the problems that you have today, not the problems that you want to have in the future. Now, there are a lot of companies migrating to OpenTelemetry, to do logging with OpenTelemetry and metrics with OpenTelemetry, besides traces. For traces, it's the standard. If we are very honest, the industry is still running like this: for moving logs it's Fluent, for processing metrics it's Prometheus, and for traces it's OpenTelemetry. This might change. I think the change is happening. It might take a couple of years. Still, you have thousands and thousands of machines running syslog, or firewalls sending messages over syslog. You have to connect those devices. Now, we see that companies work in hybrid mode; they have their old systems, but they also use the cloud. They want to leverage and be able to collect data from every single source of information. As I said, firewalls, which I've mentioned a couple of times already, or devices, application logs. Even if you're running an instance on AWS, you might want to extract application logs and then enrich them with metadata, like, oh, this was running on an AWS server with this label, or with this tag. Then you can correlate information based on the type of environment where this was running. Yeah, it's a bit of everything. Anybody who's just trying to say, "We're going to standardize on this protocol, and our solution is just for this protocol," yeah, that's fine. Maybe you're looking for customers in five years. Today, everything is about syslog, HTTP, Splunk; even Elastic, as you said, is used a lot, too. There's a mix of everything.

[0:43:08] LA: Great. Well, thank you very much. My guest today has been Eduardo Silva, who is the Founder and CEO of Calyptia. Calyptia, Calyptia, right?

[0:43:18] ES: Calyptia. Calyptia.

[0:43:20] LA: Calyptia. Okay.

[0:43:21] ES: Okay. Let me tell you why, if I can tell you why. The original word is Calypte. Calypte, right? With a T-E at the end. Okay. Calypte is a genus of hummingbirds.
That blue and beige logo is a hummingbird. But the way you might pronounce Calypte in Spanish sounds like Calyptia. That's why we chose Calyptia. Yeah.

[0:43:48] LA: Makes sense. Calyptia. Okay.

[0:43:50] ES: Yeah, perfect.

[0:43:51] LA: Anyway, Calyptia is a company that helps you manage your observability and your security data. Once again, this is Eduardo Silva, who is the Founder and CEO. Eduardo, thank you so much for joining me today on Software Engineering Daily.

[0:44:04] ES: Lee, I appreciate the invitation. I would be happy to have another conversation anytime soon. Thanks again.

[END]