
Sage+Archer Case Study: Transitioning from Hadoop / Apache Spark to Apache Druid / Apache Flink

Sebastian Zontek
November 23, 2023

Note: Since the publication of this article, Sage+Archer, a Vistar Media company, has changed its name to Vistar Media. While the content remains relevant, please note that references to the former company name may appear within the article.

Background

Sage+Archer, a cutting-edge European advertising company in Digital-Out-Of-Home, was using a combination of Hadoop and Apache Spark for data processing. However, inefficiencies in the Hadoop/Spark batch pipeline meant that results from online bidding auctions arrived hours late. This delay impeded the company's ability to adapt quickly to the marketplace and refine real-time business decisions. Maintaining the Hadoop cluster and the Spark codebase also proved to be a challenge.

To tackle these challenges, Sage+Archer wanted to move from batch to real-time data processing and to build a more maintainable data pipeline. They chose to partner with Deep.BI, a big data company with extensive experience in open-source data technologies, to overhaul their data processing infrastructure.

Project Scope

Deep.BI's task was to architect a new data processing pipeline and support Sage+Archer in implementing it. This involved replacing Hadoop and Spark with Apache Flink and Apache Druid, reconstructing the entire data pipeline, and incorporating multiple data streams, including bid and no-bid events and ad metrics. Additionally, Deep.BI helped implement data enrichment from static SQL databases into the Kafka data stream, providing a richer, more comprehensive data set for analysis.
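
To give a flavor of how such enrichment can look, here is a minimal PyFlink sketch that loads a small, static campaign table from a SQL database and joins it onto each Kafka event. All names here (the `campaigns.db` database, the table, the fields) are illustrative placeholders, not the actual Sage+Archer schema:

```python
# Hypothetical sketch: enriching a Kafka event stream with static campaign
# metadata from a SQL database. Database, table, and field names are
# placeholders, not the actual Sage+Archer schema.
import json
import sqlite3  # stand-in for whatever SQL database holds the reference data

from pyflink.datastream.functions import MapFunction, RuntimeContext


class EnrichWithCampaignData(MapFunction):
    """Loads a small, static lookup table once per task, then enriches events."""

    def open(self, runtime_context: RuntimeContext):
        conn = sqlite3.connect("campaigns.db")
        rows = conn.execute(
            "SELECT campaign_id, advertiser, currency FROM campaigns")
        self.campaigns = {cid: {"advertiser": adv, "currency": cur}
                          for cid, adv, cur in rows}
        conn.close()

    def map(self, raw_event: str) -> str:
        event = json.loads(raw_event)
        # Attach campaign metadata; unknown campaigns pass through unchanged.
        event.update(self.campaigns.get(event.get("campaign_id"), {}))
        return json.dumps(event)
```

For larger or frequently changing reference tables, Flink's broadcast state or async I/O lookups would be the more typical pattern than an in-memory dictionary loaded at startup.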

Implementation

Fortunately, most of the bidstream and event data was already available as streams in Kafka. The Hadoop/Spark setup wrote from Kafka to Google Cloud Storage (GCS) and then re-read and processed those files overnight. The approach was to build the new Flink-based pipeline alongside the existing one, run both for some time to compare results, and then migrate to the new setup. This proved to be a sound approach.

Because Flink allows code to be changed and redeployed quickly, the effects of logic changes could be observed in real time and the resulting data compared. Re-running the pipeline on older data still retained in Kafka was also a convenient way to verify the correctness of newly implemented code. Moreover, since the Sage+Archer development team was able to move from Scala to Python, they could reduce future development costs and iterate quickly during the development cycle.
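
As an illustration, a PyFlink job skeleton for consuming the bidstream might look like the following. The broker address, topic, and group id are placeholders; setting the starting offsets to `earliest` is what makes replaying retained history possible:

```python
# Hypothetical PyFlink job skeleton for consuming the bidstream from Kafka
# (requires the Flink Kafka connector jar on the classpath). Setting the
# starting offsets to `earliest` replays whatever history Kafka still retains.
from pyflink.common import WatermarkStrategy
from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors.kafka import (KafkaOffsetsInitializer,
                                                 KafkaSource)

env = StreamExecutionEnvironment.get_execution_environment()

source = (
    KafkaSource.builder()
    .set_bootstrap_servers("kafka:9092")                       # placeholder
    .set_topics("bid-events")                                  # placeholder
    .set_group_id("flink-replay-test")
    .set_starting_offsets(KafkaOffsetsInitializer.earliest())  # replay history
    .set_value_only_deserializer(SimpleStringSchema())
    .build()
)

bid_events = env.from_source(
    source, WatermarkStrategy.no_watermarks(), "bid-events")
bid_events.print()  # in the real job: enrichment, clearing logic, sinks
env.execute("replay-test")
```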

Druid was already in use as an analytics engine. With the Flink setup, ingestion into Druid also became real-time, as opposed to the earlier Hadoop batch-based ingestion. Paired with Flink, Druid enables Sage+Archer to aggregate and analyze their data streams with exceptional speed and flexibility. Late-arriving events can now be captured and made visible as well, which was not possible with the batch job and its hard cut-off at 01:00 at night.
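
For context, streaming ingestion from Kafka into Druid is driven by a supervisor spec submitted to Druid's supervisor API. The sketch below shows roughly what such a spec could look like for this kind of auction data; the datasource, topic, column names, and host are invented for illustration:

```python
# Hypothetical Druid Kafka ingestion supervisor spec. Datasource, topic, and
# column names are illustrative. Streaming ingestion keeps recent segments
# open, so late-arriving events are still indexed, unlike a nightly batch
# load with a hard cut-off.
import requests

supervisor_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "auction_results",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            "dimensionsSpec": {
                "dimensions": ["campaign_id", "publisher", "screen_id"]},
            "metricsSpec": [
                {"type": "count", "name": "events"},
                {"type": "doubleSum", "name": "spend",
                 "fieldName": "clearing_price"},
            ],
            "granularitySpec": {"segmentGranularity": "HOUR",
                                "queryGranularity": "MINUTE",
                                "rollup": True},
        },
        "ioConfig": {
            "topic": "auction-results",
            "consumerProperties": {"bootstrap.servers": "kafka:9092"},
            "useEarliestOffset": False,
        },
        "tuningConfig": {"type": "kafka"},
    },
}

# Submit the spec to Druid's supervisor endpoint (host is a placeholder).
requests.post("http://druid-router:8888/druid/indexer/v1/supervisor",
              json=supervisor_spec, timeout=30)
```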

Another important part of the system's logic concerns clearing the prices of bids and computing the related metrics. This includes converting between currencies and measuring the use of certain targeting options in the campaigns. It was relatively simple to re-implement this logic in Flink, replacing the Redis- and Cassandra-based system that previously did this work.
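
A minimal sketch of the currency conversion part of the clearing step is shown below. The rate table and event fields are hypothetical; in practice the rates would come from a reference source rather than being hard-coded, with Flink taking over the role previously played by Redis and Cassandra:

```python
# Minimal sketch of the clearing step: normalizing bid prices to a single
# reporting currency. Rates and field names are illustrative placeholders.
import json

from pyflink.datastream.functions import MapFunction

EUR_RATES = {"EUR": 1.0, "USD": 0.92, "GBP": 1.15}  # placeholder rates


class ClearPrice(MapFunction):
    def map(self, raw_event: str) -> str:
        event = json.loads(raw_event)
        rate = EUR_RATES.get(event.get("currency", "EUR"), 1.0)
        # Keep the original price and add the normalized one for metrics.
        event["clearing_price_eur"] = event["clearing_price"] * rate
        return json.dumps(event)
```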

A final advantage over the Spark-based logic is that it is now much easier to implement different business rules depending on the incoming data. For example, if one publisher has an event timeout of 15 minutes and another of 90 minutes, this is straightforward to express, as sketched below.
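
The sketch below illustrates how such per-publisher timeouts can be expressed with a keyed process function and timers. The 15- and 90-minute values come from the example above; the event schema, publisher names, and expiry behavior are assumptions:

```python
# Sketch of per-publisher business rules: a pending bid is held in keyed state
# until a matching result arrives or a publisher-specific timeout fires. The
# 15- and 90-minute values come from the article's example; everything else
# here is hypothetical.
import json

from pyflink.common import Types
from pyflink.datastream.functions import KeyedProcessFunction, RuntimeContext
from pyflink.datastream.state import ValueStateDescriptor

DEFAULT_TIMEOUT_MS = 15 * 60 * 1000
TIMEOUT_MS = {"publisher_a": 15 * 60 * 1000,   # 15-minute event window
              "publisher_b": 90 * 60 * 1000}   # 90-minute event window


class ExpirePendingBids(KeyedProcessFunction):
    """Keyed by auction id; emits results on arrival, expiry markers on timeout."""

    def open(self, runtime_context: RuntimeContext):
        self.pending = runtime_context.get_state(
            ValueStateDescriptor("pending-bid", Types.STRING()))

    def process_element(self, raw_event, ctx):
        event = json.loads(raw_event)
        if event.get("type") == "result":
            # A matching result arrived in time: settle and clear the state.
            self.pending.clear()
            yield raw_event
        else:
            # Hold the bid and schedule expiry per the publisher's own rule.
            self.pending.update(raw_event)
            timeout = TIMEOUT_MS.get(event.get("publisher"), DEFAULT_TIMEOUT_MS)
            ctx.timer_service().register_processing_time_timer(
                ctx.timer_service().current_processing_time() + timeout)

    def on_timer(self, timestamp, ctx):
        if self.pending.value() is not None:
            # No result arrived within the publisher's window: mark as expired.
            yield json.dumps({"status": "expired",
                              "bid": json.loads(self.pending.value())})
            self.pending.clear()
```

In a job, this would be applied with something like `bid_events.key_by(...).process(ExpirePendingBids(), output_type=Types.STRING())`; adding a new publisher rule is then just another entry in the timeout table.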

The architecture designed by Deep.BI was a perfect fit for these challenges: Sage+Archer could instantly process auction results, adapt its bidding strategies in real time, and show customers real-time metrics for running campaigns.

[Diagram: Analytics and Pacing for Real-time Bidding]

Outcomes

The introduction of Flink and Druid significantly amplified the speed, efficiency, and quality of Sage+Archer's data processing pipeline. Information about the outcomes of online bidding auctions became available almost instantaneously, so Sage+Archer could adjust its strategies and make more informed business decisions on the fly.

Furthermore, the transition reduced infrastructure costs and sharply cut the number of different technologies in use. It also accelerated the development of new features, a process expedited by Sage+Archer's use of Python with Flink. The switch from batch to real-time processing gave Sage+Archer a competitive edge in the fast-paced advertising industry.

About Sage+Archer

Sage+Archer is an industry-leading automated self-service buying platform for Digital-Out-Of-Home advertising. Trusted by global brands, Sage+Archer offers an inventive solution that allows buyers to control how they engage with consumers in real time. Using mobile data and dynamic creative, Sage+Archer delivers effective and efficient advertising that resonates with consumers.

About Deep.BI

Deep.BI is a leading company in deploying next-generation data processing and analytics pipelines. They have successfully executed over 50 projects across the USA, Canada, Latin America, Europe, and Asia, assisting businesses to transition from batch to real-time processing, drastically reducing query times, and enabling a shift from costly commercial software to open-source solutions. Deep.BI excels as a reliable partner for designing, developing, and providing support for advanced data processing pipelines.
