The Smaato Blog

Spark on Docker on Amazon EC2: Only the Code Tells You Everything

Posted by Dr. Stefan Shadwinkel on November 13, 2015

Our global real-time advertising platform processes vast amounts of data per second. Therefore, managing, supporting, and enhancing all its tools and processes with data-driven solutions is crucial to our success.

Developing these solutions requires a flexible setup that can also be easily scaled to allow testing on reasonable data sizes. One part of our current setup is to run Apache Spark on Docker on Amazon EC2 instances.

Using plain EC2 instances instead of EMR lowers costs and lets us directly run the latest version or development builds of Spark.

In this blog post, we will look into the peculiarities of configuring Spark on Docker on EC2 and dive into some Spark code excerpts to understand Spark's behavior.

Read more »

Microservices: Are They the Right Architecture for You?

Posted by Arne Schipper on July 29, 2015

These days, many across our industry and others are talking about microservices. It’s one of the buzzwords of the moment, even though the topic as such has a far longer history. With companies like Netflix, Gilt and LinkedIn, among others, drawing attention to this architecture, many smaller companies find themselves confronting this very issue. With cloud providers offering more and more microservice support, and with tools and frameworks evolving around this topic, we at Smaato have also been asking if this is the direction we’d like to go.

Read more »

Big Data & NoSQL Meetup Hamburg with Apache Flink at Smaato

Posted by Dr. Stefan Shadwinkel on July 17, 2015

Smaato was very happy to host the spring to summer edition of the Big Data and NoSQL Hamburg (BDNSHH) meetup with two great guests from Berlin: Aljoscha Krettek and Maximilian Michels from dataArtisans, the company behind Apache Flink.

Apache Flink is an open source platform for scalable batch and stream data processing. At its core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Interesting features are its custom dataflow optimizer, custom memory management, and its strategies to perform well when memory runs out.

We’ve interviewed our guests to dig deeper into Apache Flink:

Read more »

Tuning Spark Streaming Applications

Posted by Stephan Brosinski on April 20, 2015

Distributed streaming applications are like Rube Goldberg machines. Lots of levers and knobs. You feel like you have to observe them in order to figure out how they work. This article is not about the one magic Spark parameter making your app scream. It's about efficient Spark performance tuning in order to optimize your freshly developed Spark Streaming app.

Read more »

Quickstart a Web Development Stack Using Vagrant & Docker

Posted by Roland Von Ohlen on December 11, 2014

A homogeneous web development stack for heterogeneous environments

Developing and maintaining a mobile monetization platform like the Smaato RTB Ad Exchange led us to use Docker and Vagrant to set up a development stack that could address some of our challenges. In projects like ours, with multiple developers spread across varying time zones, setting up a local development environment can be a time-consuming task. Add a team with different skill sets, including a backend developer, a frontend developer, and a designer working on the same codebase, and it gets tricky.

Read more »

