90 seconds @ Burda is a new series that brings you closer to people, brands, departments or projects at Burda. Today, we visit the Data Science Team.
In the Focus Online lab, prototypes and new functions for brands such as Focus Online, Huffington Post and Chip are tested to further improve the user’s product experience. I spoke to Christian Essling, Director Analytics in the Focus Online lab, about big data and analysing masses of information. In the following interview, he explains why even perceived “mathematical late bloomers” can enjoy this discipline and how to bring order to numerical chaos.
How did you end up in this job – were you always a maths nerd?
Quite the opposite; for most of my school days I couldn’t get to grips with maths at all. It was too theoretical for me and I didn’t understand how it could be useful. When I began to study economics, maths suddenly transformed into optimising a company’s profits, which I found much more exciting.
My passion for data and statistics really came to the fore during my doctorate. I was fascinated by using algorithms to detect patterns in data. Then I had to decide which direction to take – to remain in academia and write research papers, or enter industry to analyse data and solve thrilling problems. I chose the latter.
Big data starts as an unstructured mass of little use. How do you bring order to the chaos?
A colleague of mine always compared it to a bar. Say we have 20 different spirits: if we pour them all into a bucket and stir, you wouldn’t drink the result even on the Magaluf strip. It would be much smarter to understand what we have and how to measure and mix it to make a delicious cocktail.
It’s the same with data. Unfortunately, many people still think that all you have to do is collect enough data, let a super smart algorithm do all the work and wait for the €500 notes to pop out at the end. Understanding what you have is an integral part of every data project.
Can you provide a practical example?
I once had a project that revolved around predicting engine failure based on entries in the control unit, in other words predictive maintenance. We could tell from the vehicles when the component had been replaced in the workshop, and we knew which entries were contained in the engine control unit at that time. In principle, that was all we needed. But with 75% forecast accuracy, the performance of our algorithm was disappointing.
When we went deeper, we realised that we needed to question the selection of vehicles provided to us. The component had been replaced in all the vehicles, but it transpired that in around half of them the component was still intact and had been replaced purely as a precaution. Once we excluded these cases, the forecast accuracy soared to over 95% and we learned that you also need to scrutinise factors you can’t even see.
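The effect of such mislabelled cases can be illustrated with a toy calculation. The sketch below is not the project's actual data or model: the fleet composition, the `fault_entries` feature, the threshold of 3 and the presence of vehicles without a replacement are all invented assumptions. It only shows how an otherwise perfect detector is capped at 75% accuracy when a quarter of the records carry a "replaced" label although the component was still intact.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    fault_entries: int   # hypothetical feature: relevant control-unit entries
    replaced: bool       # label taken from workshop records
    truly_failed: bool   # ground truth, only discovered later in the story


def build_fleet():
    """Invented fleet: 100 healthy, 50 real failures, 50 intact-but-replaced."""
    healthy = [Vehicle(0, False, False) for _ in range(100)]
    failed = [Vehicle(5, True, True) for _ in range(50)]
    intact_but_replaced = [Vehicle(0, True, False) for _ in range(50)]
    return healthy + failed + intact_but_replaced


def predict_failure(vehicle):
    """Assumed rule: many fault entries means the component has failed."""
    return vehicle.fault_entries >= 3


def accuracy(vehicles):
    """Share of vehicles where the prediction matches the workshop label."""
    correct = sum(predict_failure(v) == v.replaced for v in vehicles)
    return correct / len(vehicles)


fleet = build_fleet()
# Keep only records whose workshop label agrees with the real condition.
cleaned = [v for v in fleet if v.replaced == v.truly_failed]

print(accuracy(fleet))    # 0.75
print(accuracy(cleaned))  # 1.0
```

The toy data contains no measurement noise, so the cleaned score reaches 100% rather than the "over 95%" reported for the real project.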
Where does all this data actually come from?
There are, of course, huge numbers of sources: log files from machines, accounting data from the controlling department, data from weather stations. At BurdaForward, we are currently concentrating primarily on usage data, which tells us how the product is being used. We are also working to understand the themes of the articles a user has read in order to establish content preferences. For example, are they interested in regional news from Munich? If we have news from the Munich region when that user returns, we would offer it to them directly.
And what about data protection?
Data protection is important in everything we do. On the one hand, we are noticing that users are more careful and sceptical and want to know what is happening to their data. We want to counter this with increased transparency and make it clear that we only use their data to make their experience more pleasant.
On the other hand, much more stringent data protection rules will come into force in the coming year with the General Data Protection Regulation and the E-Privacy Directive. We are already preparing so that we are fully ready by the middle of the coming year.
How do you ensure that information is not interpreted and analysed incorrectly?
Unfortunately, the misinterpretation of analyses is a recurring problem. It helps to ask whether all the variables that could have an effect are taken into account in the model. In some cases, omitting influential variables can seriously affect the validity of the results. However, it is even more serious – and yet much less obvious – when a model is developed on the basis of non-representative data, meaning that the dataset was not randomly sampled.
I’d like to give an example here. During World War II, a military initiative investigated which parts of a fighter aircraft’s armour should be reinforced. To find out, they gathered aircraft that had returned from combat and counted the bullet holes. They counted an average of 1.7 bullet holes on the front half of the aircraft and 2.2 on the back half. However, it would be incorrect to conclude that the rear fuselage requires greater reinforcement. In fact, the areas with the fewest bullet holes need to be reinforced: an aircraft that crashed after taking a shot to the engine never made it into the sample, although it would have been much more interesting to examine.
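The aircraft example can be reproduced with a small simulation. This is a sketch under invented assumptions rather than a reconstruction of the historical study: every aircraft takes four hits, each hit lands on the engine or the fuselage with equal probability, and the loss probabilities per hit (`P_LOSS_PER_HIT`) are made up. Counting holes only on returning aircraft makes the engine look like the section that is hit least, although both sections were hit equally often.

```python
import random

# Assumed probability that a single hit on a section brings the aircraft down.
P_LOSS_PER_HIT = {"engine": 0.6, "fuselage": 0.05}
HITS_PER_AIRCRAFT = 4  # assumption: every aircraft takes the same number of hits


def simulate_sorties(n_aircraft=10_000, seed=0):
    """Return average hits per section for all aircraft and for survivors."""
    rng = random.Random(seed)
    sections = list(P_LOSS_PER_HIT)
    all_hits = {s: 0 for s in sections}
    returned_hits = {s: 0 for s in sections}
    n_returned = 0

    for _ in range(n_aircraft):
        # Distribute the hits uniformly over the sections.
        hits = {s: 0 for s in sections}
        for _ in range(HITS_PER_AIRCRAFT):
            hits[rng.choice(sections)] += 1

        # Each hit independently risks bringing the aircraft down.
        survived = True
        for section, count in hits.items():
            for _ in range(count):
                if rng.random() < P_LOSS_PER_HIT[section]:
                    survived = False

        for s in sections:
            all_hits[s] += hits[s]
            if survived:
                returned_hits[s] += hits[s]
        n_returned += survived

    avg_all = {s: all_hits[s] / n_aircraft for s in sections}
    avg_returned = {s: returned_hits[s] / n_returned for s in sections}
    return avg_all, avg_returned


avg_all, avg_returned = simulate_sorties()
print("all aircraft:     ", avg_all)
print("returned aircraft:", avg_returned)
```

Across all aircraft the two averages are close to two hits each, while the returned aircraft show clearly fewer engine hits than fuselage hits, which is exactly the pattern that misleads an analyst who only sees the survivors.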
Where do you think data science will be used most in the future?
I believe that data science will change all areas of our lives for the foreseeable future. Just look at all the things based on data today: You order a taxi via Alexa and probably pay with an app, rather than cash. Your car knows when it needs to go to the workshop to avoid a critical failure. Your heating knows when you leave the house and automatically shuts down to save money. Your streaming service gets to know your taste in music and plays more of the songs you really like.
At BurdaForward, we are also focusing intensively on ways to make our users’ lives more pleasant. For example, if our algorithm learns that you are a fan of FC Bayern, the results of their games could be displayed at the top of the start page every Saturday afternoon. We have a whole host of ideas that we want to test in our Focus Online lab over the coming months.
The Focus Online lab
In the Focus Online lab, prototypes and new functions are tested to further improve the product experience for every single user of brands such as Focus Online, Huffington Post and Chip. For example, the team is experimenting with chatbots, augmented reality and digital products for the home to significantly improve user experiences using the data acquired in the lab. Tests are performed in the usual usage environment of a mobile device or stationary computer.