Episode 537: Adam Warski on Scala and Tapir : Software Engineering Radio

Adam Warski, the co-founder and CTO of SoftwareMill, discusses Scala programming and the Tapir library. Scala is a general-purpose JVM language, and Tapir is a back-end library used to describe HTTP API endpoints as immutable Scala values. Host Philip Winston speaks with Warski about the implications of Scala being a JVM language, the Scala type system, the Scala community’s view of functional vs. object-oriented programming, and the transition of the ecosystem from Scala 2 to Scala 3. The Tapir discussion explores why Tapir is a library and not a framework, how server interpreters work in Tapir, how interceptors work, and what observability features are included with Tapir.

Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.

Philip Winston 00:00:16 Hello. This is Philip Winston for Software Engineering Radio. Today I’m here with Adam Warski. Adam is a co-founder and the CTO of SoftwareMill, where he is an expert on Scala and distributed systems. For over 10 years, SoftwareMill has used Scala and other technologies for custom software development. Adam is also the founder or key contributor on a number of open-source projects, including STTP client, STTP Tapir, Envers, Quicklens, and ElasticMQ. Adam has a master’s degree in Computer Science from the University of Warsaw. Today we’re going to discuss the Scala programming language and the Tapir library. Let’s start just by defining each of these briefly. Let’s start with Scala. What is Scala, and when did you personally start using it?

Adam Warski 00:01:04 So I started using, well, I first encountered Scala back in my university days on a seminar on functional programming. It seemed to be quite a weird and partly obscure language back then. I was like on the second year, so I was quite young. Nevertheless, it was quite interesting. But that was like my first, first time when I saw the language. Then I got into Java as a paying job and we started a company. So about like probably eight years later we got our first paying project in Scala, and Scala was way more popular already back then. So, it was this time it was a conscious decision to actually try out something new, and by luck or by choice — well probably half-half — we ended up using Scala. And you know, there’s nothing better to learn a language than actually writing code in that language. And so, thanks to that client and to the openness of that client to us trying out a new language, we managed to learn quite a lot and that’s how we started.

Philip Winston 00:02:07 Can you give me some examples of problem domains where Scala is particularly well-suited, either that you’ve worked on or just from the communities or the precedent for using Scala?

Adam Warski 00:02:18 Well, Scala is a general-purpose language, right? So, you can, in theory at least, write anything using Scala. That said, at least in our company, we mostly use Scala on the back end. So, we use it again as a general-purpose back-end language. So, any kind of APIs, data processing, distributed systems, stuff like that. In the community, Scala is also very popular through the Spark project. However, we don’t do that much data science ourselves, so that’s not where we use Scala. There’s also the possibility of using Scala on the front end through Scala.js. But that’s also not a domain that we’ve been exploring too much. So, in our case, it’s mostly the back end, it’s mostly business code. We found Scala to be very flexible in the way we can define abstractions and the way we can express various domain concepts.

Adam Warski 00:03:17 So, when using other languages — so, we’ve used Java a lot as well — so very often you were able to express various domain concepts in the language, but they were intertwined with some infrastructure code, right? So, the domain concepts sometimes drowned among all the infrastructure and all the boilerplate that you needed to define as well. So, with Scala it’s much easier to define the abstractions, which allow you to actually make a clear boundary between your business code and your infrastructure code. So, then it’s crystal clear which one is which, right? And this makes it easier to read the code and to understand it, right? If you have the domain concepts fleshed out quite clearly, it’s quite easy to understand how things work. And then if you have the infrastructure separate and the abstractions separately, it’s also easier to understand how the whole thing is orchestrated. So I guess, yeah, that’s, that’s our main use case for Scala.

Philip Winston 00:04:12 So talking about back end, is some of your use cases e-commerce or telecommunications, or like, what specific domain?

Adam Warski 00:04:21 We don’t really focus on any particular industry. The problems tend to be very similar as far as back-end development goes, right? It’s the same problems, maybe the words are a bit different, right? So, the domains are different of course, and the business people express their problems using different vocabulary, but in the end, on the technical side, you end up writing more or less the same things. That’s why we are a very technically focused company. Our specialty is not an industry, but the technical side. So, as I said, you know, back-end distributed systems and so on. That said, a lot of our clients do come from some specific industries. So, we’ve had a couple of clients from telco and we had some clients from medtech, so medical. We had a couple of clients from the entertainment industry, and of course fintech is the fourth large group. So, I guess you can say that maybe these are industries which have these kinds of problems particularly often, but without any special focus on our side, that’s where we’ve seen our projects end up.

Philip Winston 00:05:27 Let’s also briefly talk about Tapir, and then we’ll dive back for about half the show into Scala and half into Tapir. But I just want to let people know where we’re heading. So, what problem did you set out to solve with Tapir? And if you can mention the STTP family of libraries, where does Tapir fit into that?

Adam Warski 00:05:48 Okay, so STTP stands for Scala HTTP. So it’s a family of libraries which are written in Scala and for Scala and deal with various HTTP-related problems. So as far as Tapir is concerned, what we wanted to do is we wanted to expose an HTTP server alongside OpenAPI documentation. So that was the original problem statement. It’s not that easy to do. Maybe it should be, but it isn’t. So there are some, of course, other approaches. One of them is writing the YAML OpenAPI definition by hand, which I think a programmer shouldn’t really have to do because it’s not a language meant for developers to write. I think it’s more like a machine language. You can use Java and annotations, but annotations have a lot of drawbacks and I’m not a particular fan of annotations. So that’s another approach. And that’s basically it, right? So these are the two alternatives. So, we hoped to find a better way and that’s where Tapir comes in. So Tapir is a library which allows you to describe HTTP endpoints using a DSL in Scala, using an immutable data structure and some helper methods to build out the data structure and to describe the endpoint. And once you have this description, you can interpret it either as a server or you can interpret it as OpenAPI documentation.
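The idea of describing an endpoint once as an immutable value and then interpreting it in two different ways can be sketched with the standard library alone. This is a toy model under stated assumptions, not Tapir's API: `EndpointDesc`, `toDocs`, and `toHandler` are hypothetical names invented for illustration, and the "documentation" is a single line rather than an OpenAPI document.

```scala
object EndpointSketch {
  // Hypothetical toy description of an endpoint; NOT Tapir's data structure.
  final case class EndpointDesc(method: String, path: String, queryParams: List[String]) {
    // Helper that returns a new, extended description (the original is unchanged).
    def query(name: String): EndpointDesc = copy(queryParams = queryParams :+ name)
  }

  // Interpreter 1: render a one-line documentation string from the description.
  def toDocs(e: EndpointDesc): String = {
    val qs = if (e.queryParams.isEmpty) "" else e.queryParams.mkString("?", "&", "")
    s"${e.method} ${e.path}$qs"
  }

  // Interpreter 2: combine the same description with business logic
  // to obtain a request handler (method, path, query parameters).
  def toHandler(e: EndpointDesc)(
      logic: Map[String, String] => String
  ): (String, String, Map[String, String]) => Option[String] =
    (method, path, params) =>
      if (method == e.method && path == e.path && e.queryParams.forall(params.contains))
        Some(logic(params))
      else None

  // A plain value: nothing is served or documented until it is interpreted.
  val hello: EndpointDesc = EndpointDesc("GET", "/hello", Nil).query("name")
}
```

Because `hello` is only data, both interpreters read the same source of truth, which is the property that keeps a server and its documentation from drifting apart.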

Philip Winston 00:07:21 Before we go back to Scala for a while, let me mention three shows in past episodes that are relevant. So, on Scala specifically, there’s Episode 171, “Scala Update with Martin Odersky” and Episode 62, “Martin Odersky on Scala.” Both of those are over 10 years old though. On functional programming in general, we have Episode 418, “Functional Programming in Enterprise Applications.” That episode is coming from a .NET F# perspective, but it contains a lot of general information about functional programming. So, let’s dive into Scala more focusing on more recent developments and actual usage and community. Scala is a JVM language. What is a JVM language and what are some of the benefits and drawbacks to Scala being a JVM language?

Adam Warski 00:08:14 So to be precise, the JVM is the main platform to which you can compile Scala code, right? There are also two others. So we can also compile Scala to JavaScript and to native code as well. But the most popular, like probably 90-something percent of Scala usage comes from the JVM.

Philip Winston 00:08:33 So can you describe how using the JVM impacts developer productivity and also runtime performance?

Adam Warski 00:08:40 I think the main implication of being on the JVM is that you have access to the whole JVM ecosystem. There’s probably a library for everything on the JVM and in Java. So it might not have a native Scala interface, right? So, it might not expose exactly what you would expect from a Scala library, so it might use different collections, the types might be different, but it’s there. So in case you really need it, you always have the option to use the Java libraries for some specific task. And I think that’s a great option to have, and it makes your life much easier as a programmer. And so in some ways you can think of it as a backup option. Maybe if, you know, if there’s nothing in Scala that fits your needs, you can always use the Java version of the library or maybe even some other language. However, mixing, I don’t know, a Clojure library and Scala, that might be tricky, so probably I wouldn’t recommend that.

Adam Warski 00:09:40 So, another thing is that the runtime is really mature and the garbage collection algorithms are really fine-tuned. So, memory management isn’t really a problem. So, you can safely create lots of objects and, unless your application is under very high load, you don’t really have to care about that. And you know, it’s one less problem that you have to think about as a programmer. So, you can just freely create objects and just dispose of them when you don’t need them. And it’s a nice property of garbage-collected languages in general. But in Java, I think it’s one of the best VMs and garbage collectors out there, which, you know, just saves you time when you write your applications so that you can focus on the business instead of focusing on, for example, managing memory. So of course, there are also downsides of the JVM, startup time being one of them.

Adam Warski 00:10:36 There is some movement in the Java world. Project Leyden just got announced a couple of months ago, which aims to actually improve the startup time of the JVM, but it’s still, you know, a couple of years ahead of us, right? So, for now we have to live with that. So Java as a runtime may not be the ideal choice for serverless functions or command-line tools where this extra second or two really matters, but it’s not really an issue, you know, for server applications; if it’s a long-running process, if it starts up in a second and then continues working for a month, like who cares, right? And for those other use cases where you do need this fast startup time, you always have the option to compile down to native code using Scala Native. You can compile down to JavaScript using Scala.js, or you can use GraalVM native image, which I think works particularly well with Scala. In a way, probably it works better with Scala than with Java because Scala libraries and the whole ecosystem don’t rely on reflection, which is a problem with native image in Java. So, I think by coincidence native image is actually a very good fit for Scala.

Philip Winston 00:11:49 We’re going to move on now from the JVM, but I want to mention one more episode. This is Episode 266, Charles Nutter on the JVM as a Language Platform. Scala supports both functional programming and object-oriented programming. Are there communities who insist on purely functional code versus ones that mix the two, and where do you lie on that spectrum?

Adam Warski 00:12:16 That’s a very good question. That’s probably the biggest problem in Scala, that there are various approaches to how you can program using Scala. The language is quite flexible as I mentioned, and allows you to create a lot of … well, it’s very flexible in creating abstractions, which makes people do various sometimes crazy things (and sometimes not crazy, but just “original,” let’s say). So, there is one part of the Scala community which is very functional-programming oriented, and they do try to do pure functional programming using Scala. So, this usually means working with some kind of an IO monad and representing computations as values. This also brings its own problems because, you know, to sequence two computations you need to use flatMap. You can’t just write two statements one after another. So, you need to switch your whole programming model to a different approach, and it needs some time to get used to that model and it has a certain learning curve.
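What "representing computations as values" and "sequencing with flatMap" mean can be shown with a deliberately tiny IO type. This is a minimal sketch, assuming a hypothetical `IO` wrapper around a thunk; real libraries such as Cats Effect and ZIO are far richer (stack safety, concurrency, resource handling), and none of the names below come from them.

```scala
object IoSketch {
  // Hypothetical minimal IO: a description of a computation, run only on demand.
  final case class IO[A](unsafeRun: () => A) {
    def map[B](f: A => B): IO[B] = IO(() => f(unsafeRun()))
    // flatMap is how two descriptions are sequenced into one larger description.
    def flatMap[B](f: A => IO[B]): IO[B] = IO(() => f(unsafeRun()).unsafeRun())
  }

  // A mutable log so that the moment a side effect happens is observable.
  val log = scala.collection.mutable.ListBuffer.empty[String]

  def say(msg: String): IO[Unit] = IO(() => { log += msg; () })

  // The for-comprehension desugars to flatMap/map calls.
  // Building `program` runs nothing; it only assembles a description.
  val program: IO[Int] = for {
    _ <- say("first")
    _ <- say("second")
  } yield log.size
}
```

The effects happen only when `IoSketch.program.unsafeRun()` is called, in the order fixed by the `flatMap` chain. Writing `say("first"); say("second")` as two plain statements would build two descriptions and silently discard the first, which is why the programming model has to change.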

Adam Warski 00:13:26 Of course, once you do get over that and get to grok how this pure functional programming approach works, it has its benefits, and it definitely is a very interesting one. The second approach is more moderate and tries to leverage more of the combination that Scala is between object-oriented and functional programming. So it doesn’t reject side-effecting computations in general and doesn’t try to capture every side-effecting computation inside a value. Instead, in Scala you can use mutable values; you can do side effects if you like. The language allows you to do that, and the compiler allows you to do that. So, the second camp would be more moderate in that area and would still use the functional programming constructs that are there, but not in a very restrictive way, right? So, I think there are some aspects in which both communities agree, like using immutable collections. It’s something that everybody does.

Adam Warski 00:14:32 Every library in Scala, the standard library, the whole ecosystem is based on immutable collections and on immutable data structures. And that’s not something that people really debate, right? So it’s a very uncontroversial issue. Higher-kinded types (so these are types which create types), that’s, for example, a more controversial issue, with some people trying to embrace this way of creating abstractions that Scala allows, and some people trying to minimize the usage to be more friendly for beginners. And there are a couple more of these, of course. So as for me, where I stand, I’m not sure yet. I’m trying to understand that. It is a dilemma, right? Because on one hand, pure functional programming has its benefits and it has a certain charm, which is sometimes hard to resist because the code can be very elegant and it has all those nice properties that the compiler verifies for you.

Adam Warski 00:15:37 On the other hand, I can see that it’s much harder for beginners to understand. It has a higher entry level. Sometimes simple things like sequencing some side-effecting computations are not as nice as they would be in an imperative language. So, you know, it’s a question. There are always trade-offs in computer science, right? So, do we want to have this elegance of pure functional programming or do we want to be more practical maybe and allow some side effects? So, it’s something I try to answer for myself, to find the golden mean. I haven’t found it yet, and it’s in fact an ongoing discussion in the Scala ecosystem, especially with the introduction of Project Loom in Java, which introduced green threads or lightweight threads into the platform, which kind of solved in a different way one of the main use cases for the IO monad or for futures in Java, which was asynchronous computations.

Adam Warski 00:16:41 So now they’re like baked into the language using the direct style of writing programs. So now people started to wonder, like, do we use IOs and futures and so on because of their elegance and because of their functional properties, because of referential transparency, because of some other reasons? Or have we used them only for the asynchronous programming aspect? And it is an ongoing discussion and it’s a very interesting one from, you know, even from a purely academic perspective I think. As far as the libraries which we’ve mentioned go, so both Tapir and STTP, they are designed in a way which works with both representations. So, we try to take a neutral stance, and as I said, you know, for the base data structures (for example, the data structure for describing the endpoints) it doesn’t really matter how you represent side effects because it’s not concerned with that.

Adam Warski 00:17:43 In fact, it tries very hard to separate the description of the problem domain from the business logic and from the effects that then happen. So this allows us to define the description as a pure immutable value, and it’s done the same way whatever approach in Scala you prefer. And then you can define the business logic, so whatever happens when you invoke the endpoint, with whatever representation of side effects you prefer and you choose. So in this respect we try to work with everybody. Of course it’s not without its own downsides. So the API is a bit more complicated because of that, but it is possible to actually use the same library whatever Scala style you are using.

Philip Winston 00:18:35 You mentioned monads once or twice, I’m going to refer to Episode 266 to define that. So can you give an example of a purely functional library or framework that you really like in Scala besides your own, and then maybe one that is more object-oriented or has side effects that you feel is popular and you like despite those limitations or those choices?

Adam Warski 00:19:02 So, just to again be precise, Tapir isn’t really all pure functional programming because it works with both sides, right? So it is functional in its style, but it allows you to work with both styles. As far as functional libraries go, I think there are two particularly nice implementations of libraries which implement support for purely functional side effects. One is called Cats Effect and the second is called ZIO. They both try to solve the same problem in slightly different ways, and it’s also interesting to see how they in a way compete and how they implement the same features. So, when one library implements a feature, the other tries to catch up and vice versa, but they also sometimes make different decisions. So it’s very educational to actually see the development going on. So, the problem domain they’re trying to solve is representing computations, which might involve side effects, as a value.

Adam Warski 00:20:04 Once you have a computation represented as a value, you can do a lot of things with it. In particular, you can pass it to functions which somehow modify this computation, right? So, for example, you have a computation which represents fetching something from a webpage, right? And now you can pass it to a timeout method which will modify this description of a computation to return another description of a computation, which will actually impose a timeout on the whole process, and so on. There’s a lot of these combinators which allow you to modify these descriptions, or to build larger descriptions from smaller descriptions and more complex ones from simpler ones. And as far as any kind of concurrency or fault tolerance goes, there’s probably an operator for that in both of these libraries. They differ in some details in how they handle concurrency, but the biggest difference I think is the way in which they handle errors.
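A combinator that takes a description of a computation and returns a new description with a timeout imposed can be sketched as follows. The `Task` type and its `timeout` method are hypothetical, built on the standard library's `Future` and `Await` for illustration; this is not how Cats Effect or ZIO implement timeouts, and unlike those libraries this sketch does not interrupt the underlying work, it only stops waiting for it.

```scala
import scala.concurrent.{Await, Future, TimeoutException}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TimeoutSketch {
  // Hypothetical lazy computation: nothing happens until `run()` is called.
  final case class Task[A](run: () => A) {
    // Returns a NEW description; the original task value is left untouched.
    def timeout(limit: FiniteDuration): Task[Option[A]] =
      Task { () =>
        try Some(Await.result(Future(run()), limit))
        catch { case _: TimeoutException => None } // gave up waiting
      }
  }

  // Stand-ins for "fetching something from a webpage".
  val fast: Task[String] = Task(() => "page contents")
  val slow: Task[String] = Task(() => { Thread.sleep(500); "too late" })
}
```

Here `TimeoutSketch.slow.timeout(50.millis)` is itself just a value; it can be stored, passed around, or wrapped by further combinators before anything runs.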

Adam Warski 00:21:06 So in ZIO, we have a dedicated error channel. So each computation is defined through its type: not only by the type of value that the computation produces once it is run, but also by the type of the error which might happen when the computation is run. So this way you can define computations which should never fail and should never return an error by just saying that the error type is Nothing, which is a type which has no inhabitants, or you can say that arbitrary exceptions may occur, for example. So this is an interesting approach to how errors can be handled, and this is done very nicely throughout the ZIO library and other ZIO libraries, as well, and very consistently. So you know, error handling is in general a very important subject, as errors actually define how you write your code, right? And it’s the number one concern you should have when writing code: what will happen when things go wrong?
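The effect of putting the error type into a computation's signature, including `Nothing` for "cannot fail", can be illustrated with the standard library's `Either`. This is only a sketch of the typing idea: `Result`, `ParseError`, and the functions below are hypothetical, and ZIO's own effect type is lazy and carries more type parameters than shown here.

```scala
object ErrorChannelSketch {
  // Hypothetical stand-in for an effect with a typed error channel.
  type Result[+E, +A] = Either[E, A]

  sealed trait ParseError
  case object Empty extends ParseError
  final case class NotANumber(raw: String) extends ParseError

  // The signature states exactly which errors can occur.
  def parse(raw: String): Result[ParseError, Int] =
    if (raw.isEmpty) Left(Empty)
    else raw.toIntOption.toRight(NotANumber(raw))

  // Error type Nothing: no Left value can be constructed, so this cannot fail.
  val infallible: Result[Nothing, Int] = Right(42)

  // Handling every error changes the error type to Nothing,
  // and the compiler tracks that change.
  def parseOrZero(raw: String): Result[Nothing, Int] =
    Right(parse(raw).getOrElse(0))
}
```

A caller of `parseOrZero` does not need any error handling, and the compiler knows it, because a value of type `Nothing` cannot exist.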

Adam Warski 00:22:04 So these are the functional libraries which I think are very interesting to take a look at. As for not purely functional libraries, I think I would say Akka is the most interesting one. Unfortunately, it has recently been moved from an open-source license to a source-available license. But nevertheless the library is interesting in itself as well. So, Akka is an implementation. Well, Akka is a lot of things, but at its core it’s an implementation of the actor model for the JVM. It’s available both in Scala and in Java, but the implementation itself is in Scala. So the actor model is one where you have actors which can enclose some behavior, and the only way to communicate with actors is by sending them messages in an asynchronous way, and it’s not purely functional because actually sending a message to an actor is a side-effecting operation, right?

Adam Warski 00:22:59 So it’s like a fire-and-forget. So that’s not purely functional at all, quite the opposite. However, the way you can define actor behavior can be done in a functional way, and Akka has a very nice API for that. Apart from that, Akka has great APIs for streaming and for HTTP, which I think are among the most programmer-friendly ones. I would probably use Akka HTTP to write an HTTP server if I didn’t use Tapir. But yeah, for example, as far as streaming goes, it’s also the most developer-friendly API out there. There are other APIs for defining streaming computations in Scala as well and they’re great. But I think Akka Streams still has an edge over them in terms of how easy it is to understand the code and to write the code. And one thing to say about Akka: although it is now becoming not fully open-source, there is an initiative to create a fork in Apache. So maybe the open-source Akka will continue in some form.

Philip Winston 00:24:07 You mentioned three libraries, I’m going to look those up and put them in the show notes, I’ll put links to them. Scala is strongly typed. Can you talk a little bit about how Scala’s type system compares to Java? One of the trends we see in the industry is Python adding gradual typing through type hints and TypeScript adding sort of gradual typing to JavaScript. What benefits do you see from Scala having strong typing from the beginning? And if you could just give one example in Tapir or another library where something sophisticated was done with the types that really helped the implementation.

Adam Warski 00:24:48 So I think first of all, the static versus dynamic typing is a matter of taste in many cases and personal preference. So, I doubt there ever will be a clear winner as to, you know, which approach is better. I think both are good, just some people prefer to use one set of tools and other people prefer to use other tools, right? So, in my case, I have always liked static typing. I have always liked the fact that the compiler tracks all those boring properties for me, and these are the properties which are proved to be correct and I don’t have to write tests for them, right? And I think the fact that Python introduced some form of static typing, that TypeScript exists, and so on, this kind of validates the fact that in large code bases and in more complex systems you do need the static types to navigate code.

Adam Warski 00:25:43 Especially in cases where you can’t fit the whole system in your head and when you work on somebody else’s code, when you get introduced to a project, that’s when even the simplest types are very beneficial just for code navigation, you know, and for naming things. These might seem trivial properties, but they’re actually very helpful I think. So as for Scala and Java and their type systems, the Scala type system is actually very regular and in some ways it might also be viewed as simpler than Java’s. Scala in general as a language is actually a lot simpler than Java because it has way fewer special cases and corner cases, and probably the same goes for the type system. So as far as the language goes, the grammar size might be an indicator, and that’s a property that Martin Odersky, the creator of Scala, often shows: that the grammar size for Scala is actually much smaller than the grammar size for C#, Java, and so on.

Adam Warski 00:26:49 The language is just way more regular. It has a couple of features that you can always use, and it’s the interaction between the features that gives the language its power. Anyway, going back to the type systems, so everything you can express in Java, you can express in Scala as well. However, Scala has a number of additions which again make it more regular but also make it more powerful. So higher-kinded types, which I have already mentioned, are one example. So in Java you’ve got the generics, so you can parameterize your class with some type. In Scala you can do the same, but you can also parameterize a type with a type constructor. So you can parameterize a class with, for example, some kind of a constructor which needs to be provided with a type to produce another type. So an example of a type constructor is a list, right?

Adam Warski 00:27:42 A list in itself is not a type, it’s a type constructor. You need to provide it with a type of the elements to actually get a type. So a list of strings is a proper type and the list is a type constructor. So you can use those higher-kinded types to create abstractions and that’s very useful in Tapir, in the way we implement our integration with various approaches to handling side effects in Scala. So when you provide the business logic for an endpoint, which I’ve also mentioned earlier, you need to provide the function which takes the input parameters and produces the output parameters, which are then mapped to the HTTP response. And this function needs to produce the output parameters using some kind of effect, right? It can be the IO effect from Cats Effect, it can be the ZIO effect from ZIO, it can be Future from Akka, it can also be the identity effect if you would like to use Project Loom, for example, and write synchronous direct-style code.
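Parameterizing a definition with a type constructor, so that the same "server logic" shape works with different effect wrappers, can be sketched with the standard library. `ServerLogic` and `Id` below are hypothetical names for illustration and are not Tapir's types; `Future` and `Option` stand in for the effect types mentioned in the conversation.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object HktSketch {
  // Identity "effect": the value is produced directly, in synchronous style.
  type Id[A] = A

  // F[_] is a type constructor parameter: List, Option, Future, Id all fit,
  // while a proper type such as Int or List[String] would not.
  final case class ServerLogic[F[_], I, O](run: I => F[O])

  // The same logic shape, instantiated with three different wrappers.
  val direct: ServerLogic[Id, String, Int] =
    ServerLogic[Id, String, Int](name => name.length)

  val async: ServerLogic[Future, String, Int] =
    ServerLogic[Future, String, Int](name => Future(name.length))

  val partial: ServerLogic[Option, String, Int] =
    ServerLogic[Option, String, Int](name => if (name.isEmpty) None else Some(name.length))
}
```

Java generics can express `ServerLogic<I, O>` but have no way to abstract over the wrapper itself, which is the extra degree of freedom described above.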

Adam Warski 00:28:38 That’s also possible, but because this server logic function is parameterized with a higher-kinded type, you can just plug in anything there. So that’s the kind of flexibility that Scala allows, and it’s just a no-brainer to actually do that. Scala also has other, I think especially useful, types that come with Scala 3. There are some new kinds of types that got introduced, which are not so well known yet, I guess, in the wider audience. So, for example, newtypes, known in Scala as opaque types, allow you to create a kind of zero-cost abstraction. So, they allow you to wrap an existing type with something that is distinct from that type at compilation time. So, for example, you can wrap a string into an email type, and when you compile things this email type would be different from a string.
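The email example can be written down as a short Scala 3 sketch (opaque types do not exist in Scala 2). The names `Emails`, `Email.parse`, `asString`, and `send`, as well as the validation rule, are hypothetical and chosen only for illustration.

```scala
object Emails {
  // Inside this object Email is known to be a String; outside it is a distinct type.
  opaque type Email = String

  object Email {
    // Hypothetical validation rule, for illustration only.
    def parse(raw: String): Option[Email] =
      if (raw.contains("@")) Some(raw) else None

    // The only way out of the abstraction is an explicit accessor.
    extension (e: Email) def asString: String = e
  }
}

import Emails.Email

def send(to: Email): String = s"sending to ${to.asString}"

// send("a@b.c") would not compile here: a String is not an Email.
```

Values of type `Email` can only be obtained through `Email.parse`, so any function that accepts an `Email` can rely on the check having happened, while at runtime the value is represented as the underlying `String` with no wrapper object.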

Adam Warski 00:29:40 So you can’t mix those two, right? But at runtime everything is erased, and this opaque type behaves just as a string without any runtime overhead. And there’s a couple more examples of these types that have been added to Scala. As for how Tapir uses it, I’ve already given one example of how you can define the business logic, but I think going one step earlier is the way Tapir provides type safety of its input and output parameters. So, when you describe an endpoint using Tapir, you do so incrementally: you incrementally define the inputs of an endpoint and the outputs. So, the inputs are the things that are extracted from the HTTP request (so, this might be a query parameter; this might be a header; this might be the request body, for example), and you incrementally say that, you know, this endpoint has a query parameter name that should be read as a string.

Adam Warski 00:30:45 It has a header, something which should be parsed as an Int, and it has a JSON body, right? So, you just call three times a method which adds an input, and the type of the endpoint each time is extended by the type of the input that you add, right? So, if you add three inputs, a String, an Int, and a JSON body, you end up with a tuple which has three elements corresponding to these types. And the same thing is done with the outputs. So then when you need to define the logic of the endpoint, you need to provide the function which has this exact type, right? So, everything is well typed and verified by the compiler, and I think there’s nothing particularly fancy in Scala to actually build those tuples. It is like some very simple type-level programming which you can do, but it has very nice compile-time properties so that you can see the shape of the endpoint: what are the inputs and their exact types, and the outputs. A very important property here is that once you write an endpoint, the IDE can infer the type of the endpoint, right? So, you don’t have to write it by hand, you can just click in IntelliJ or whatever IDE you use to please infer the type and you will get the correct type generated for you.
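The incremental build-up of the input tuple looks roughly like this in Tapir itself. This is a sketch assuming Tapir 1.x with the `tapir-core` module on the classpath, so it does not run with the standard library alone; the endpoint path, the parameter names `name` and `X-Count`, and the value name `greet` are made up for illustration, and a plain string body stands in for the JSON body, which would need one of Tapir's JSON integration modules.

```scala
import sttp.tapir._

object TapirInputsSketch {
  // Each .in(...) call extends the input type of the endpoint:
  //   after query[String]      -> String
  //   after header[Int]        -> (String, Int)
  //   after stringBody         -> (String, Int, String)
  // The explicit type annotation is optional; it is what an IDE would infer.
  val greet: PublicEndpoint[(String, Int, String), Unit, String, Any] =
    endpoint.post
      .in("greet")                 // fixed path segment, contributes no value
      .in(query[String]("name"))   // query parameter read as a String
      .in(header[Int]("X-Count"))  // header parsed as an Int
      .in(stringBody)              // request body as a String
      .out(stringBody)             // response body as a String

  // Business logic for this endpoint has to accept exactly the input tuple.
  val logic: ((String, Int, String)) => String = {
    case (name, count, body) => s"$name sent $count item(s): $body"
  }
}
```

Changing any input, for example parsing the header as a `Long`, changes the endpoint's type, and any logic function written against the old tuple then fails to compile rather than failing at runtime.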

Philip Winston 00:32:13 So we got into Tapir there relative to the type system, but I wanted to call out one thing you mentioned, which was Scala 3. So, Scala 3 was released in 2021 after maybe eight years of development? I just wanted your opinion on how the transition is going from 2 to 3. Python famously had a very long transition period; I think more than 10 years in some sense. Can you just talk about how that transition is going for either your work at SoftwareMill or the wider ecosystem, and maybe mention, besides the type changes, an additional Scala 3 feature that you like and maybe one that you’re less excited about or that maybe you have reservations about?

Adam Warski 00:32:59 Sure. So, I think that everybody hoped the migration would actually go faster, but as always things go slower and that’s nothing that’s exceptional in Scala, I guess. Just a general rule of life. As an introduction to that subject, Scala is much better suited for such migrations than Python because it is statically typed, and you have the compilation phase and the compiler will actually tell you if things work or not upfront, right? So that’s one thing. But another thing is that because of the types, there is a chance to write a tool that migrates Scala 2 code to Scala 3 code, and such tools do exist. There are some syntax changes, there are some semantic changes, and there are some tools which will actually allow you to migrate the code base. So that’s not a big problem. The bigger problem is the ecosystem and how fast all of the libraries get migrated.

Adam Warski 00:33:59 So there are some libraries that have migrated very fast. There are some libraries that are catching up right now. There are some which are still lagging behind, Akka here being a prime example: there still is no release of Akka for Scala 3, unfortunately. So, it depends which part of the ecosystem you’re using. Now, in our company, we are still mainly using Scala 2. We are only starting our first Scala 3 projects, I think, either this or next month. So it’s slowly getting there, but some work still needs to be done, especially in the ecosystem migration because that simply requires manual labor and it often requires maintaining two versions of the code base, right? So it’s not very common, but in some cases you do need to have different code for Scala 2 and Scala 3. So you can share most of the code, but you also need to actually create two different parts of the source, so that one is included in Scala 2 and one is included in Scala 3.

Adam Warski 00:34:57 And you know, being a maintainer of STTP, I can say that maybe it’s not a big problem, but it does take some time to actually do. However, I haven’t seen like any big problems out there. It’s not like there are some showstoppers or there are some major obstacles, apart from people having to invest their time, which is understandable, you know, it’s open-source, you can’t really expect people to do the work unless, you know, you enter a business relation with them. So, you can either do it yourself or you can wait for others when they have time. So, I’m optimistic as to how this will progress in the future. I think in a year or so we will see a much higher Scala 3 adoption, and there are also companies, including mine, which invest in Scala and in Scala tooling and in the migration efforts of Scala. So hopefully this will pay off.

Adam Warski 00:35:53 As for the Scala 3 features, I think my favorite feature, and I think something that is unique to Scala in general, is its macro system. So, macros have been present in Scala 2 as an experimental feature. They have seen two or three iterations of how a macro is written and defined. However, in Scala 3 we get a brand-new way of actually writing macros, which is a good thing because the new way of writing macros is much more principled and it’s cleaned up, and it’s much more friendly for developers in certain aspects. However, it also means that if you have used macros in Scala 2, you now have to rewrite the macro in a completely different way in Scala 3, and that’s like one big part that is not compatible between those two releases. I think it’s the only major part, in fact.

Adam Warski 00:36:50 However, macros actually allow you to do a lot of things. So, macros allow you to generate code at compile time using Scala code. So, you write Scala code which manipulates the abstract syntax tree of your program and generates some other code at compile time so that it’s compiled later by the Scala compiler. And I think it’s a great replacement for the annotations that are used or abused in Java very often. So, in Java, for example, if you want to encode or decode JSON, you will often see classes annotated with JSON mapping annotations and then at runtime these annotations are read using reflection and some byte code is generated to actually handle the serialization and deserialization. And you know, it works. It has its downsides.

Adam Warski 00:37:47 I think there’s quite a lot of downsides using annotations in Java this way and relying so much on reflection. And I think there is a better way through macros here. What you can do instead is, you can sometimes even also use annotations, but these annotations are processed at compile time, so you can generate code which will actually handle the JSON reading and writing. And one big benefit here is that any errors that might happen (so, any mistakes in the mapping) will actually get caught and surface at compile time instead of runtime. Also, the runtime penalty is lower because you can just generate code once when you compile instead of doing it over and over again at runtime when the application starts up. And also, the API for actually generating the code, well, it’s just Scala code that you write. It’s not some annotation processor, it’s not some reflection API that you have to rely on. It is simply Scala code that generates other Scala code.

Adam Warski 00:38:44 But macros is, maybe, I shouldn’t even say that, I shouldn’t call this feature macros, it’s a whole meta-programming aspect. So macros is one part, but also inline functions which sometimes even allow you to do a lot in terms of code generation without actually writing a macro. So, you just can write some inline, you can do conditionals in there, you can do pattern matching in there on types, all at compile time. So that’s a feature I really like, and I think it’s quite unique because in Java you cannot do anything like that, or in Kotlin. So, I think that’s something that really stands out as far as languages on the JVM in general go. As for the feature I wouldn’t like so much in Scala 3. That’s a good question. I don’t really know, I don’t know.
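To illustrate the inline features mentioned in this answer, here is a minimal Scala 3 sketch of a conditional and a pattern match on types that are both resolved at compile time, without writing a macro. The names `jsonTypeName` and `describe` are hypothetical and chosen only for this illustration; `erasedValue` comes from the standard `scala.compiletime` package.

```scala
import scala.compiletime.erasedValue

// The match is on the type T and is resolved at compile time: only the
// selected branch ends up in the generated code. A type with no matching
// case (for example Double) is rejected by the compiler.
inline def jsonTypeName[T]: String = inline erasedValue[T] match
  case _: Int     => "number"
  case _: String  => "string"
  case _: Boolean => "boolean"

// An inline conditional: `verbose` has to be a compile-time constant,
// so the untaken branch is dropped during compilation.
inline def describe[T](inline verbose: Boolean): String =
  inline if verbose then "JSON type: " + jsonTypeName[T]
  else jsonTypeName[T]

@main def inlineDemo(): Unit =
  println(describe[Int](true))
  println(describe[String](false))
```

A call such as `jsonTypeName[Double]` would fail to compile rather than fail at runtime, which is the kind of early feedback described above.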

Philip Winston 00:39:29 That’s fine. It was interesting to hear about Scala 3. Now I want to shift gears to Tapir itself. Obviously, if you want to reference a Scala feature relative to Tapir, that’s great, but Tapir version 1.0 was released this summer, June 2022. Tapir started development, I think, in 2018. What was the path like from origin to release of 1.0, and can you give just one specific example of maybe a technical issue that was difficult to overcome or took a lot of effort and then maybe a community issue as far as attracting attention to the library?

Adam Warski 00:40:10 So I must say that Tapir caught on pretty quickly. So, I think it solved a really common problem that people had, that people really wanted to generate documentation out of the endpoints. And the other approaches that I mentioned aren’t really that great, and Tapir here really filled a niche that needed to be filled. There were also other approaches like endpoints4s, which I think still do exist. They take a bit of a different approach but in general they try to solve the same problem of how do you define an endpoint alongside the docs. That said, as you said, the development of Tapir took about four years until Tapir 1.0. It’s not like finished, finished. It’s just the core module that’s declared as stable. I’m not sure if it was a community issue, I think it was just a good community that we managed to gather, but it took a lot of iterations on various design elements.

Adam Warski 00:41:11 So quite often we had like, I think 20 minor releases, so 0.1, 0.2 up to 0.21 or something like that. And each of them actually meant that you had to rewrite part of your code, which probably isn’t such a great experience for people using Tapir. But they did, they did migrate from version to version, and they did report problems back. So that was very helpful in actually understanding how people use the library, what they expect and so on. Still, you know, it was a zero dot version, so some breakage is expected, I guess. But I have to say, they were very patient as to how we tried to find the best representation for various concepts.

Philip Winston 00:41:54 Can you give some examples of production applications that are built with Tapir, maybe not just companies but actual applications people might have heard of or that you just feel are a good representation of what Tapir can do?

Adam Warski 00:42:09 We use Tapir a lot inside our company because we build applications for our clients. I can’t share their names, unfortunately, for that reason. It’s not usually something that you would know. Well, Tapir functionality in a way is user-facing because you end up using a REST API, but you wouldn’t know that it’s Tapir, right? It can be any other library out there. The same if you take a look at the Swagger editor or the OpenAPI docs, you wouldn’t know that it’s generated by Tapir, right? It’s just a standard format. So, there’s a list of Tapir adopters on the Tapir documentation site, and there’s a couple of companies that publicly agreed to share their names. So if you’re interested you can take a look over there. Beside that I don’t really know, you know, how widely Tapir is used. It’s in general a hard problem in open-source: getting to know whether your library is used or not.

Adam Warski 00:43:01 There are some indicators like how often do you get bug reports? So, if you do get bug reports, then obviously people do use your library. And in Tapir, I guess we get a fair amount of questions (sometimes bugs, sometimes feature requests), which shows a certain kind of activity which is very encouraging and very promising. You can also take a look at the number of downloads on Maven Central, however that’s, you know, very inaccurate, right? Because it’s just CI systems downloading the same stuff over and over again. Although it does give you some indication. So again, here I have no idea what the exact numbers are, anything like that, but we can see some nice growth in how Tapir is being used. So, it’s either people just running their builds more and more often or it’s actually new projects being created with Tapir.

Adam Warski 00:43:53 But you know, and I think as I mentioned in the beginning, because we are talking about exposing a REST API, it’s not any particular type of problem domain, right? Most projects nowadays need a REST API of some sort, and you need to document the API for others to consume it. So, the nice thing about Tapir is that you describe your endpoints once, and you do that using a high-level language and a type-safe language, instead of writing YAML. When you write an endpoint using Tapir, you not only get type safety, but you also get code completion, you get the compiler verifying that the types at least at the basic level match. So, these are some important characteristics when it comes to the developer experience of actually writing, well the task of exposing a REST API probably isn’t the most interesting one, right? You can think of more exciting things.

Adam Warski 00:44:52 So I think it’s important that we actually have a good and efficient way of describing what the API should look like. And one thing I think that’s also worth mentioning is that you can also interpret a Tapir endpoint as a client. So, you can use the same description to actually call an endpoint that you have exposed. So, if your clients are also written in Scala, it might be Scala.js and it might run in the front end or it might be another microservice. You can also use the Tapir description to create a client and call out to your service which is being described by Tapir. You can even go as far as describing other services using the Tapir data structures and maybe documenting them even if the server doesn’t run using Tapir and, you know, generating docs based on that. I think some people are doing that and I can’t blame them. I would prefer describing endpoints using a high-level language and a properly typed language instead of YAML, which I’m not a particular fan of.
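The idea of describing an endpoint once and interpreting the same description in several ways can be sketched in plain Scala 3. This is a toy model written for illustration and is not Tapir's API: the `Endpoint` case class and the `toDocs` and `toRequest` functions are hypothetical, and the sketch skips details such as URL encoding.

```scala
// Toy model, NOT Tapir's API: an endpoint is plain immutable data.
final case class Endpoint(
    method: String,
    path: List[String],        // fixed path segments
    queryParams: List[String], // names of required query parameters
    description: String
)

// Interpretation 1: render one line of documentation from the description.
def toDocs(e: Endpoint): String =
  val query =
    if e.queryParams.isEmpty then ""
    else e.queryParams.map(p => s"$p={$p}").mkString("?", "&", "")
  s"${e.method} /${e.path.mkString("/")}$query - ${e.description}"

// Interpretation 2: build a client request line from the same description.
// Assumes every required parameter is present in `args`; no URL encoding.
def toRequest(e: Endpoint, baseUrl: String, args: Map[String, String]): String =
  val query = e.queryParams.map(p => s"$p=${args(p)}").mkString("&")
  val suffix = if query.isEmpty then "" else s"?$query"
  s"${e.method} $baseUrl/${e.path.mkString("/")}$suffix"

val hello = Endpoint("GET", List("hello"), List("name"), "Greets the caller")

@main def endpointDemo(): Unit =
  println(toDocs(hello))
  println(toRequest(hello, "http://localhost:8080", Map("name" -> "Ada")))
```

Because the description is only data, adding another interpretation (for example a server or a different documentation format) means adding another function, not changing the endpoint.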

Philip Winston 00:45:58 What do you feel is the primary difference between a library and a framework? I’m assuming that Tapir is a library. Do you feel that Scala as a language biases people more towards libraries, or is it also possible to write a framework in Scala and do you maybe have an example of a framework that you do actually use in Scala and just kind of contrast the two?

Adam Warski 00:46:24 Right, so I think the difference might be subtle, but the major difference is how you actually use a certain piece of code, right? With the library, you are in full control and you decide when to invoke the functionality in that dependency, right? So, it’s you invoking the library, not the library invoking you. Of course, you can get callbacks and so on, that’s normal, but it’s about the main mode of operation, how you actually structure and write your code. Whereas in a framework you have to adapt to the way the framework imagines you will structure and write your code, and you have to follow the recipes that the framework authors have created for you. So in a way it’s much more constraining, which can be a good thing and a bad thing. A good thing because you don’t have to think about how do I structure my code because it’s already there, right?

Adam Warski 00:47:16 It’s already defined by the framework author. It’s a bad thing because it constrains you. So, it’s a double-edged sword, right? Sometimes constraints are nice and, in a way, liberating, as Runar said in one of his talks. So, Tapir definitely falls in the library category. So, there is nothing prescriptive in Tapir as to how you should write your code. You use the Tapir APIs to describe the endpoint; you use the Tapir APIs to couple the endpoint with the server logic that should be run when the endpoint is invoked. But then, you know, where you define the endpoint, where the logic lives, that’s up to you, right? You just need to pass in the function. So, where that function is defined, is it defined in some other class that is, I don’t know, wired using some dependency injection library, or maybe we are just using singleton objects, whatever, it’s not a concern of Tapir.

Adam Warski 00:48:17 You just need to pass in the functions and then you pass in this description into another function which turns it into a server, which you still have to start, right? So, in all stages it’s your responsibility to actually invoke the Tapir functionality, and you have to include all of that in your code base, which I think is a good thing because it allows you to actually have an application with a main method where the main method is like the main entry point, not only to the application but also to reading the application, reading the code. So, you can, again, using simple code navigation in the IDE, you can understand what happens step by step when the application starts and where the components are defined. So, there’s no, you know, magic auto discovery, whatever. So, I think this library approach is actually, at least for me, much easier to follow and to understand as I have clearly defined places in code where I know things happen, right?

Adam Warski 00:49:18 And I know that other things won’t happen unless they’re written in the main function and code reachable from that main function. And I think that’s an overall approach in Scala. Scala as an ecosystem and as a community, either the functional one or the less functional one, they both tend to prefer libraries over frameworks. I think maybe, in a way, ZIO tends to go a little bit more in the direction of a framework than a library, but it’s also quite subtle and you can still use ZIO as a library as well. Akka here is also an example, at least in some parts of its functionality, where it is a bit framework-like, but you can still use Akka as a library if you prefer to do so. All of its components are usable standalone. So you will always get the dependency on Akka, for example, but you can use the streaming independent of HTTP and so on.

Adam Warski 00:50:18 So I don’t think there will be like a Scala framework coming. Maybe instead what will happen is we will see some kind of an integrated set of libraries being introduced. So, libraries which are documented in a similar way, which behave in a similar way, maybe which are configured in a similar way. Just so that you can have the same feeling when using the library, you know what to expect, what kind of approach to expect because the code style is similar, the naming conventions are similar and so on. So, I think we might see something like that, and I would definitely be a fan of this idea because, as I said, I do prefer libraries over frameworks. I think they give you the right amount of control, but of course you don’t want to learn a new approach with every library. So having some integrated set would actually be very nice to have in the Scala ecosystem.

Adam Warski 00:51:18 And this might be happening, there’s an initiative led by Scala Center and VirtusLab, which is called Scala Toolkit and it will contain a number of libraries which are like a companion to the standard library. So, there will be, for example, a library to parse JSON, there will be a library to access the file system, and a part of it also will be the STTP client, which will allow you to make HTTP client requests. And the goal here is to create a toolkit for which you have the documentation in one place in a similar format and the integrations are there so that one part of the toolkit works with another, and so on. So that’s I think coming sometime next year.

Philip Winston 00:52:04 I’ll definitely put links to that project in the show notes. Two kind of technical topics in Tapir documentation that sounded, I don’t know if they’re unique but not commonly used phrases. One was “server interpreters,” and one was “interceptors.” I thought it’d be interesting to hear your explanation of what these two are, what value do they provide, and maybe if you know, are they general concepts used outside of Tapir and just kind of let us know about that.

Adam Warski 00:52:38 Sure. First let’s maybe talk about the interpreters. The first thing that you do with Tapir is you describe an endpoint using our API, right? You get an immutable value, which is a description, but it’s just that, right? It doesn’t contain any logic as to what should happen when the endpoint is invoked. It doesn’t contain any logic as to how to expose a server to the outside world. It’s just a data structure with the metadata, right? It also allows us to cleanly separate the structure of the endpoint, the shape, from any code that implements the business logic. So, this is the first step. Now you would probably want to actually expose a server, right? And for that, Tapir has server interpreters. So, Tapir itself doesn’t implement an HTTP server. There’s a ton of great HTTP servers out there, and writing yet another one probably would be a long effort and I’m not sure if it would implement anything better than what already exists.

Adam Warski 00:53:44 So instead, you can take an endpoint description, put it inside the server interpreter, which is just a function in the end, and it turns the description into some kind of other representation that is understood by an actual HTTP server implementation. So for example, there is a Netty interpreter. Netty is a networking library for Java, but it’s also usable in Scala. So you can take a Tapir endpoint, put it inside the Netty server interpreter, and you get a Netty handler, which you can attach to a Netty server and expose it on the web. In a similar way, you have an Akka interpreter which converts an endpoint into an Akka route, which you can then expose. We also have interpreters for Vert.x, for Play, for Armeria, for http4s, and probably some others as well. The latest interpreter is for Helidon Nima, which is the Loom-first implementation of an HTTP server in Java using Project Loom.
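The notion of a server interpreter as "just a function" can be sketched with a toy model in plain Scala 3. None of the names below (`Endpoint`, `ServerEndpoint`, `Handler`, `interpret`) come from Tapir; they are assumptions made for this illustration, and the `Handler` type stands in for whatever representation a real HTTP server would accept.

```scala
// Toy model, NOT Tapir's API.
final case class Request(method: String, path: String)
final case class Response(status: Int, body: String)

// Description only: no logic and no server.
final case class Endpoint(method: String, path: String)

// The description coupled with the function to run when it is invoked.
final case class ServerEndpoint(endpoint: Endpoint, logic: Request => String)

// What a hypothetical HTTP server understands: a handler that may or may
// not know how to answer a given request.
type Handler = Request => Option[Response]

// The "interpreter": a plain function from descriptions to that handler.
def interpret(endpoints: List[ServerEndpoint]): Handler = request =>
  endpoints
    .find(se => se.endpoint.method == request.method && se.endpoint.path == request.path)
    .map(se => Response(200, se.logic(request)))

val hello = ServerEndpoint(Endpoint("GET", "/hello"), _ => "Hello, world")
val handler: Handler = interpret(List(hello))

// Everything is wired explicitly from the entry point; nothing is discovered
// automatically.
@main def interpreterDemo(): Unit =
  println(handler(Request("GET", "/hello")))
  println(handler(Request("GET", "/missing")))
```

In this sketch, supporting a different server would mean writing another `interpret` function that targets that server's own handler type, while the endpoint descriptions stay unchanged.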

Adam Warski 00:54:57 So these interpreters are, you can think of them as functions which take the description of an endpoint and turn it into an actual server which you can then attach to some server implementation. And we provide nice APIs which allow you to actually expose those endpoints so that you don’t have to write too much code. So that’s one part. The interceptors, on the other hand, they’re also part of the server aspect of Tapir. So, there are some crosscutting concerns which you want to address. For example, exception handling, for example, gathering metrics, or what should happen when a parameter can’t be decoded because, I don’t know, the JSON body is malformed or you are expecting a query parameter that you said you want to be an integer but it’s actually, you know, a string and it doesn’t parse.

Adam Warski 00:55:51 So these are some components which you can plug in to the server interpreter and you can specify the behavior for all endpoints. Usually, you don’t want to specify this in a different way for each endpoint, right? If an exception happens inside your server logic, whatever the endpoint is, you probably want to just return a 500 internal server error, log the exception, and go further, right? A nice thing about interceptors and the way Tapir endpoints are defined is the way we can handle observability. So, one of the interceptors that’s there by default is the metrics interceptor, which well, you have to enable it, but it’s part of the Tapir project. So, we can actually leverage the structure of the endpoint as it is described in the data structure to provide some more information for metrics, for logging, compared to what we would have if the endpoint was just an opaque entity, right?

Adam Warski 00:56:55 So for example, the interceptor knows, and it gets a callback that the request is matching a certain endpoint and that we will actually try to invoke the server logic for that endpoint, right? Because the query parameters match, the path matches, the headers match, and so on. So, using that knowledge we can actually log some more information that, you know, now we are trying to invoke an endpoint with a given name or with a given path or with a given path template, right? Because maybe the path included some variable elements, some variable path segments, and this makes it much easier to implement both metrics and logging in a nice way because you have access to that whole endpoint metadata that is defined with the endpoint description.
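A rough sketch of how interceptors can address crosscutting concerns for all endpoints while using endpoint metadata is shown below. It is a toy model in plain Scala 3 and not Tapir's interceptor API: `Interceptor`, `MetricsInterceptor`, `ExceptionInterceptor`, and `withInterceptors` are hypothetical names introduced only for this example.

```scala
import scala.collection.mutable

// Toy model, NOT Tapir's API.
final case class Endpoint(name: String, pathTemplate: String)
final case class Response(status: Int, body: String)
type Logic = () => Response

// An interceptor wraps the invocation of any endpoint's server logic.
trait Interceptor:
  def apply(endpoint: Endpoint, next: Logic): Logic

// Counts invocations per path template (e.g. /users/{id}) rather than per
// concrete path, which is only possible because the endpoint is not opaque.
final class MetricsInterceptor extends Interceptor:
  val counts: mutable.Map[String, Int] =
    mutable.Map.empty[String, Int].withDefaultValue(0)
  def apply(endpoint: Endpoint, next: Logic): Logic = () =>
    counts(endpoint.pathTemplate) += 1
    next()

// Turns any exception thrown by the server logic into a 500 response.
object ExceptionInterceptor extends Interceptor:
  def apply(endpoint: Endpoint, next: Logic): Logic = () =>
    try next()
    catch
      case e: Exception =>
        println(s"[${endpoint.name}] failed: ${e.getMessage}")
        Response(500, "Internal Server Error")

// The first interceptor in the list becomes the outermost wrapper.
def withInterceptors(interceptors: List[Interceptor], endpoint: Endpoint, logic: Logic): Logic =
  interceptors.foldRight(logic)((interceptor, next) => interceptor(endpoint, next))

val getUser = Endpoint("getUser", "/users/{id}")
val metrics = new MetricsInterceptor
val interceptors = List[Interceptor](metrics, ExceptionInterceptor)

val ok = withInterceptors(interceptors, getUser, () => Response(200, "user 42"))
val failing = withInterceptors(interceptors, getUser, () => throw new RuntimeException("db down"))

@main def interceptorDemo(): Unit =
  println(ok())
  println(failing())
  println(metrics.counts)
```

The behavior is specified once and applied to every endpoint, which mirrors the point made above about not configuring exception handling or metrics per endpoint.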

Philip Winston 00:57:47 So I think we’re talking somewhat about what’s called observability, I think today maybe that includes error handling, logging, any debugging features. Rather than get too deep into those, let’s maybe hear a real-world debugging story, a time that you had to use some of these observability features to, you know, you can change the names a little bit, but to debug a specific problem.

Adam Warski 00:58:15 Right. So debugging, it’s not always that easy in Scala. So that’s actually one of the weaker sides I would say in Scala, especially when you use the effect systems, that is because they multiplex your code onto multiple threads, right? And this way they allow you to write code which uses library-level fibers or green threads on a bounded thread pool. So, this might change with Project Loom, but so far we are on the old Java implementation and because of that the stack traces aren’t always that informative because you can get a very short stack trace just, you know, with the internal run loop exposed in the stack trace instead of the whole history of where the invocation actually came from. So, this makes debugging not as easy as it might be, and sometimes you just need to rely on the logs or print lines, which is I think the most popular debugging method out there.

Adam Warski 00:59:16 So yeah, so, but that’s like Scala in general. As far as Tapir goes, a very nice feature is that we can actually see, and we can enable it in Tapir, which endpoints are attempted to be decoded, one by one. So, by default that’s not turned on, but if you have some problematic endpoints, and especially in the early days of Tapir, I often got bug reports that people were expecting that a certain endpoint is invoked but it wasn’t, or that the endpoints are invoked out of order, or something like that. So what you can do then is you can enable this detailed logging which allows you to see that, well, the interpreter tried to decode the request for this particular endpoint, but the query parameter called AGE didn’t match. So, we reject this and we go to the next one, and here the path didn’t match. So we go to the next one and here we try to decode the body, and once we try to decode the body, we don’t try any subsequent endpoints because we’ve already consumed the HTTP request. So, we just return a 400 bad request, right? So you can see this detailed trace of what the server is actually trying to do, and in cases where you actually expect the endpoint to be invoked, but it wasn’t, that’s very helpful. And that’s what I often use to debug various problems that people report when using Tapir.
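The decode trace described in this answer can be approximated with a small toy model in plain Scala 3. It is not Tapir's implementation or logging format: the `Endpoint` fields, the `Decode` enum, and the log messages are all hypothetical, and the model ignores HTTP methods and headers to stay short.

```scala
import scala.collection.mutable

// Toy model, NOT Tapir's API.
final case class Request(path: String, query: Map[String, String], body: String)
final case class Endpoint(name: String, path: String, requiredQuery: List[String], intBody: Boolean)

enum Decode:
  case Matched(endpoint: String)
  case BadRequest(endpoint: String)
  case NoMatch

// Tries endpoints one by one and logs why each candidate was rejected.
// A path or query mismatch moves on to the next endpoint; a body that fails
// to decode ends the search, since the request body has been consumed.
def decode(endpoints: List[Endpoint], req: Request, log: String => Unit): Decode =
  endpoints match
    case Nil => Decode.NoMatch
    case e :: rest =>
      val missingQuery = e.requiredQuery.find(q => !req.query.contains(q))
      if e.path != req.path then
        log(s"${e.name}: path didn't match, trying next endpoint")
        decode(rest, req, log)
      else if missingQuery.nonEmpty then
        log(s"${e.name}: query parameter ${missingQuery.mkString} missing, trying next endpoint")
        decode(rest, req, log)
      else if e.intBody && req.body.toIntOption.isEmpty then
        log(s"${e.name}: body didn't decode, responding with 400")
        Decode.BadRequest(e.name)
      else
        log(s"${e.name}: decoded successfully")
        Decode.Matched(e.name)

val endpoints = List(
  Endpoint("findByAge", "/users", List("age"), intBody = false),
  Endpoint("health", "/health", Nil, intBody = false),
  Endpoint("setLimit", "/users", Nil, intBody = true)
)

val trace = mutable.ListBuffer.empty[String]
val result = decode(endpoints, Request("/users", Map.empty, "abc"), line => { trace += line; () })

@main def decodeTraceDemo(): Unit =
  trace.foreach(println)
  println(result)
```

Reading the collected trace top to bottom shows which endpoint each request was tested against and why it was rejected, which is the kind of information that helps when an endpoint that was expected to run never did.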

Philip Winston 01:00:43 Let’s start wrapping up. Can you tell me what’s next for Tapir? Either as far as features, community adoption, what do you see taking up your time in the next year or so?

Adam Warski 01:00:55 Right, so as I said, I think we are going to explore the direction in which Scala and the Scala libraries would evolve, and both try to observe the community and maybe take part in the development itself, as well. So, there’s the question of how effects should be represented in Scala, should we focus on the functional representation of effects, so the IO monad? Should we go the Loom way using direct style code? There’s also a research project that aims to add capabilities to Scala, which is, I think it’s going to be an implementation of algebraic effects. So, something that allows you to capture what kind of side effects a certain function performs inside the type signature, but without using monads. So, it’s trying to do the best of both worlds. So, this is a very promising direction, but it’s still probably a couple of years out.

Adam Warski 01:01:55 But who knows? Maybe we’ll see some of that. I think the base machinery for that is there in the form of context functions and contextual types, but it might need to be refined. So that’s one direction that we will observe. And however the community evolves, we’ll try to adapt Tapir and STTP to the new libraries that come to light. And as I said, hopefully it’s not going to be a very hard job because we try to be flexible in the approaches that we support. But we’ll see. Probably some work will need to be done. So, another area that we are starting to explore is can we also expose an endpoint using gRPC? Using the same endpoint description as we are using for the HTTP version. So, there’s a preview version of that, and I think that’s also an interesting approach if you could actually have a single description, which you can interpret as a gRPC endpoint or as an HTTP endpoint, although there’s some model differences in both, which make it hard.

Adam Warski 01:03:03 So yeah, we will just have to, you know, experiment and see how it evolves. Another direction is serverless, which I think is also very promising. We can actually leverage the metadata that we have. So, we have the whole metadata available to us at runtime, which we can actually use to generate a serverless description of an endpoint. So, there’s already some code in Tapir which allows you to interpret a Tapir endpoint as a Lambda function on AWS, right? And it generates the whole YAML for that for you. So you just need to, you know, there’s one component that generates the Docker image, which actually runs the code, and there’s another component which generates the AWS configuration, which you have to plug in to actually expose and configure the Lambda. So, I think that’s also an interesting direction for Tapir. Maybe there will be other ways as well in which you can actually leverage the description of an endpoint, which I haven’t envisioned yet, but these are our most immediate plans.

Adam Warski 01:04:05 Also, we would like probably to stabilize the other modules of Tapir. So far, we have stabilized core, and now that 1.0 is out, there’s a guarantee that things will be binary compatible between releases. We should probably do the same for the server and client modules. So, it’s not like the most exciting work or the most visible work. So, you probably won’t see a lot of interesting features out there, but it’s something that needs to be done, you know, just because it’s nice for the users to know that they won’t have to do any code changes between Tapir releases. So yeah, I guess those are our plans for the next half year at least.

Philip Winston 01:04:50 I’m glad to hear about that YAML generation for serverless. I am also not a fan of writing too much YAML. So how can listeners learn more about you and Software Mill? And I will put the links in the show notes.

Adam Warski 01:05:05 I think the best way is to visit our blog. We try to put a lot of emphasis on writing good technical blogs on subjects that we find interesting. So we have a whole incentive program in our company so that people actually share what they learn by writing blogs. I think it’s a very nice skill to have to be able to communicate efficiently in writing. And it’s also what I practice. I write quite a lot of blogs, so I think the technical blog is a great place to start. We do a lot of content on functional programming, on event sourcing, well and a lot of other subjects as well. I would also invite people to try out the Tapir documentation. We try to put a lot of effort into writing actually good docs so that you can easily find solutions to your problems. There’s a generator where you can generate a simple Tapir project. It’s called Adopt a Tapir. So maybe you can try it out and just preview the code so you can see if the way the code looks seems nice to you and seems elegant, and hopefully we will make a good first impression.

Philip Winston 01:06:14 That’s great. Thanks for taking the time today, Adam.

Adam Warski 01:06:17 Thank you.

Philip Winston 01:06:18 This is Philip Winston for Software Engineering Radio. Thanks for listening.

[End of Audio]