A few weeks ago, I met with a group of IT consultants who had been hired to provide data science expertise for an Industry 4.0 project at a large German industrial company. When I saw them, they looked frazzled and frustrated. At the beginning of our meeting, they spoke about the source of their frustration: ‘grabbing a bunch of sensor data’ from a turbine had turned out to be a daunting task. It had looked so simple on the surface. But it wasn’t.
Data hungry Industry 4.0
In my last blog post, I looked at the Industry 4.0 movement. It’s an exciting and worthy cause, but executing it well requires a ton of data. Sensor data (aka industrial time-series data) from various assets and control systems is key. But acquiring this type of data, processing it in real time, and archiving and managing it for further analysis turns out to be extremely problematic if you use the wrong tools. So, what’s so difficult? Here are the common problems people encounter.
1. The asset jungle
When we look at a typical industrial environment such as a packaging line, a transmission network or a chemical plant, we find a plethora of equipment from different manufacturers, assets of different ages (it’s not unusual for industrial equipment to operate for decades) and control and automation systems from different vendors (e.g. Rockwell, Emerson, Siemens). To make things worse, there is also a multitude of communication standards and protocols such as OPC DA, IEEE C37.118 and Modbus, just to name a few. As a result, it’s not easy to communicate with industrial equipment. There is no single standard. Instead, you typically need to develop and operate a multitude of interfaces. Just ‘grabbing’ a bunch of sensor data suddenly turns difficult. There is no one-size-fits-all solution.
2. Speedy data
Once you have started communicating with an asset, you will find that its data can be quite fast. It’s not unusual for an asset to send data in the millisecond or second range. Capturing and processing data this fast requires special technology. And we do want to capture data at this resolution, as it could provide critical insights. How about analyzing and monitoring that data in real time? This is often a requirement for Industry 4.0 scenarios.
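To make this concrete, here is a minimal sketch of the kind of rolling-window check a real-time monitoring layer might perform on a fast stream of readings. The function name, window size, alert threshold and temperature values are all invented for illustration; production systems use dedicated streaming infrastructure, but the core idea is the same:

```python
from collections import deque

def rolling_monitor(readings, window=5, limit=80.0):
    """Keep a rolling average over the last `window` readings and
    flag every position where that average exceeds `limit`."""
    buf = deque(maxlen=window)   # old readings fall off automatically
    alerts = []
    for i, value in enumerate(readings):
        buf.append(value)
        if len(buf) == window:
            avg = sum(buf) / window
            if avg > limit:
                alerts.append((i, round(avg, 2)))
    return alerts

# A bearing temperature that slowly creeps upward:
temps = [70, 71, 72, 74, 78, 83, 88, 91, 93, 95]
print(rolling_monitor(temps))   # → [(7, 82.8), (8, 86.6), (9, 90.0)]
```

A rolling average smooths out single noisy spikes, so the alert fires on a sustained trend rather than on one bad sample.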
3. Big data volumes
Not only is the data super fast, it’s also big. Modern assets can easily send 500–10,000 distinct signals or tags (e.g. bearing vibration, temperature, etc.). A modern wind turbine has more than 1,000 important signals. A complex packaging machine for the pharmaceutical industry captures 300–1,000 signals.
The sheer volume creates a number of problems:
Storage: Think about the volume of data that is generated in a day, week or month: 10,000 signals per second can easily grow into a significant amount of data. Storing this in a relational database can be very tricky and slow. You are quickly looking at many terabytes.
Context: Sensors usually have a signal or tag name that can be quite cryptic. The local engineer might know the context, but what about the data scientist? How would she know that tag AC03.Air_Flow belongs to turbine A in Italy and not pump B in Denmark?
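A quick back-of-envelope calculation shows how fast the volume adds up. The assumptions below (16 bytes per raw sample, no compression) are mine, not a vendor spec; real historians compress heavily, so treat this as an upper bound:

```python
# Rough storage estimate for raw, uncompressed sensor data.
# Assumption: 16 bytes per sample (8-byte timestamp + 8-byte float value).
SIGNALS = 10_000          # distinct tags across a site
RATE_HZ = 1               # one sample per signal per second
BYTES_PER_SAMPLE = 16

per_day = SIGNALS * RATE_HZ * BYTES_PER_SAMPLE * 86_400
per_year = per_day * 365

print(f"{per_day / 1e9:.1f} GB/day, {per_year / 1e12:.1f} TB/year")
# → 13.8 GB/day, 5.0 TB/year
```

Even at a modest one sample per second, a site-wide tag count lands you in terabyte territory within a year – and many signals arrive far faster than 1 Hz.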
4. Tricky time-series
Last but not least, managing and analyzing industrial time-series data is not that easy. Performing time-based calculations such as averages requires specific functions that are not readily available in common tools such as Hadoop, SQL Server and Excel. To make things worse, units of measure are also tricky when it comes to industrial data. This can be an especially huge problem when you work across different regions (think degrees Celsius vs. Fahrenheit). You really have to make sure that you are comparing apples to apples.
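As a sketch of what such a time-based calculation involves, here is a simple time-weighted average over irregularly spaced samples, plus the Celsius-to-Fahrenheit conversion. The timestamps and values are invented, and the step-interpolation assumption (each value holds until the next reading arrives) is one common convention, not the only one:

```python
def time_weighted_avg(samples):
    """Time-weighted average of (timestamp_seconds, value) pairs.
    Assumes step interpolation: each value holds until the next timestamp."""
    total = 0.0
    duration = 0.0
    for (t0, v0), (t1, _) in zip(samples, samples[1:]):
        total += v0 * (t1 - t0)   # weight each value by how long it held
        duration += t1 - t0
    return total / duration

def c_to_f(celsius):
    return celsius * 9 / 5 + 32

# Irregularly spaced temperature readings in °C: (seconds, value)
readings = [(0, 20.0), (10, 20.0), (40, 26.0), (60, 20.0)]
avg_c = time_weighted_avg(readings)
print(avg_c, c_to_f(avg_c))   # → 22.0 71.6
```

Note that a naive mean of the four values would give 21.5 °C; the time-weighted version correctly accounts for how long each reading was in effect. This is exactly the kind of function missing from generic tools.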
5. Analytics ready data
An often overlooked problem is that sensor data is not necessarily clean. Data is usually sent at uneven points in time. There might be a sensor failure, or a value might simply not change very often. As a result, you always end up with unevenly spaced data, which is really hard to manage in a relational database (just google the problem). Data scientists usually require equidistant data for their analytics projects. Getting the data into the right shape can be immensely time-consuming (think interpolation, etc.).
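To illustrate the reshaping work, here is a minimal linear-interpolation routine that resamples unevenly spaced readings onto an equidistant grid. Real projects would lean on a proper time-series library, but the underlying idea looks like this:

```python
def resample_linear(samples, step):
    """Resample sorted (t, value) pairs onto an equidistant grid
    via linear interpolation between neighboring readings."""
    out = []
    t = samples[0][0]
    i = 0
    while t <= samples[-1][0]:
        # advance to the interval [t0, t1] that contains t
        while samples[i + 1][0] < t:
            i += 1
        (t0, v0), (t1, v1) = samples[i], samples[i + 1]
        frac = (t - t0) / (t1 - t0)
        out.append((t, v0 + frac * (v1 - v0)))
        t += step
    return out

# Readings arrived at 0 s, 4 s and 10 s; we want a clean 5-second grid:
irregular = [(0, 0.0), (4, 8.0), (10, 20.0)]
print([(t, round(v, 3)) for t, v in resample_linear(irregular, 5)])
# → [(0, 0.0), (5, 10.0), (10, 20.0)]
```

This is the easy case. Handling sensor dropouts, stale values and gaps that are too long to interpolate across is where the real time sink lies.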
That tricky sensor data
To summarize: ‘grabbing a bunch of sensor data’ is anything but easy. Industry 4.0 initiatives require a solid data foundation, as discussed in my last post. Without it, you run the risk of wasting a ton of time and resources. Also, chances are that the results will be disappointing. Imagine a data scientist attempting to train a predictive maintenance model with just a small set of noisy and incomplete data.
To do this properly, you need special tools such as the OSIsoft PI System. The PI System provides a unique real-time data infrastructure for all your Industry 4.0 projects. In my next post, I will describe how this works.
What are your experiences with industrial time-series data?
If you work in a manufacturing-related industry, it’s difficult to escape the ideas and concepts of Industry 4.0. A brainchild of the German government, Industry 4.0 is a framework that is intended to revolutionize the manufacturing world. Similar to what the steam engine did for earlier generations, smart usage of modern technology will allow manufacturers to significantly increase effectiveness.
While there is a general framework that describes what Industry 4.0 should be, I have noticed that most companies have developed their own definitions. As a matter of fact, most of my clients lump the terms Industry 4.0, Digitalization and IoT together. Also, the desired objectives cover a wide range and include items such as:
Improve product quality
Reduce cycle time
Industry 4.0 initiatives
With a wide definition of Industry 4.0/Digitalization comes an equally wide interpretation of what kind of tactics and initiatives should be undertaken to achieve the desired outcomes. Based on my own experience, I see companies looking at a variety of activities, including:
When you think about it, each one of these programs requires a ton of data. How else would you go about it? Consider the easiest example: energy management. Reducing the amount of money spent on energy throughout a large plant by gut feel or experience alone is almost impossible. It is the smart use of data that allows you to identify energy usage patterns and hot spots of consumption. Data must therefore be the foundation of every Industry 4.0 undertaking.
Big Data & Industry 4.0
What type of data does Industry 4.0 require? It depends. Typical scenarios could include relational data about industrial equipment (such as maintenance intervals, critical component descriptions, etc.), geospatial data (e.g. equipment locations, routes) and, most importantly, sensor data (e.g. temperatures, pressures, flow rates, vibration).
Sensors and automation systems are the heart of your Industry 4.0 program: they pump a vast amount of highly critical time-series data through your various initiatives. Just like the vital signs of a human being allow a doctor to diagnose a disease, industrial time-series data allows us to learn more about our operations and to diagnose problems with our assets and processes early on.
The value of industrial time series data
Assets such as turbines, reactors, tablet presses, pumps or trains are complex things. Each one of them has thousands of valves, screws, pipes, etc. Instead of relying on intuition, hard-earned experience and luck, we can collect data about their status through sensors. It’s not unusual for a single asset to produce upwards of 1,000–5,000 signals. Combine a number of assets for a specific production process and you end up with some really BIG DATA. This data, however, allows engineers and data scientists to monitor operations in real time, to detect specific patterns, to gain new insights and ultimately to increase the effectiveness of their operations.
Industry 4.0/Digitalization is an exciting opportunity for most companies. While many organizations have already run similar projects in the past, the hype around Industry 4.0 allows project teams to secure funds for value-add initiatives. It surely is an exciting time for that reason.
But is dealing with industrial time series data easy? Collecting, archiving and managing this type of data can be a huge problem if not done properly. In the next blog post, I will speak about the common challenges and ideas for making this easier.
Activity trackers such as the ubiquitous Fitbit, Jawbone and the Garmin Vivofit are extremely popular these days. You can frequently spot them on colleagues, friends and customers. Their popularity raises a question: Does the collected data add value to your personal life? As a data-hungry endurance athlete who relies on various technologies such as heart rate monitors, accelerometers and power meters to improve my training, I could not resist finding out. For the past three months I have worn a Garmin Vivofit to collect and analyze data. Here are my experiences and a simple process for getting value out of your activity tracker.
What do activity trackers actually do? The devices count the number of steps that you take each day (they also estimate the distance you have covered). In addition, they also track data about your sleep. The Garmin Vivofit and the Polar Loop also allow you to measure your heart rate and the associated calories burned during workouts. Pretty basic stuff really, nothing too fancy. Once the data has been collected you can review it in an app. The reports are very easy to understand, but it’s easy to brush over them. As a matter of fact, many people I know don’t use the dashboards. Instead, they simply look at their total step number. I believe that you can do more. Last year I wrote a very popular post called “Data is only useful if you use it!“. The activity tracker is a prime example. Here is the process that I leverage.
Five easy steps
1. Collect a bunch of data.
Start using your activity tracker for a few weeks. Make sure to wear it every single day. Wear it all the time. Synchronize frequently to avoid losing data. Also, make sure to familiarize yourself with the reports that are available for your device.
2. Analyze your lifestyle.
Once you have collected data, spend some time to look at the reports. I discovered a few surprises:
Reaching the typical goal of 10,000 steps per day is not that hard for me. A typical morning run can easily get me above 10,000 steps before 8am.
A typical workday is a bit of a shocker: Conference calls, admin work and email create long periods of complete inactivity except for the occasional walk to the coffee machine or the bathroom. As a matter of fact, the morning runs often account for 80% of the activity for the entire day.
Weekends and vacation days usually show a high activity level. I typically move around a lot and it is spread evenly throughout the day.
No wonder conferences and trade shows are so exhausting: my five most active days (as measured in steps) are all linked to conferences. You constantly move, hardly ever sit down and often walk long distances.
Check out the charts below. Pretty interesting stuff.
3. Identify weak spots.
Now that you have found some interesting patterns, identify your weak spots. I found three specific areas:
Not enough sleep
Too many periods of complete inactivity during working hours
Hardly any activity on workdays when I don’t work out (steps below 5000)
It’s fairly easy to get this information out of the reports.
4. Make changes to your lifestyle
It’s time to make some changes. In general, scientists recommend staying active throughout the day to keep your metabolism engaged. And some of the activity trackers can help you with that. My Garmin Vivofit, for example, features a red bar at the top of the display that indicates inactivity. To clear this bar, you basically have to move and do something.
In general, here are some of the things that I have changed:
Instead of taking mental breaks at my desk (surfing, reading the news, personal email), I now get up every 45-60 minutes and spend a few minutes doing an activity (walking, push-ups, stretching).
3-4 short walks on rest days. It’s good to get out!
Focus on sleep
5. Use the activity tracker for daily motivation
Once you have some goals and objectives, you can also use the activity tracker to stay motivated. First of all, there is the daily goal that all of these devices provide. Then some of them also award badges for certain achievements. It’s kind of fun to work on earning them. Last but not least, you can also participate in step challenges with friends and family.
Activity trackers can definitely provide you with some interesting insights. However, you do have to make an effort to analyze the data. If all you ever do is glance at the total number of steps, a cheap pedometer would do the job. It’s the analysis where you get the bang for your buck. Will I continue wearing the Garmin Vivofit? I certainly will. I am currently assessing how activity levels between really hard workouts affect my recovery. What are your experiences?
Last week, I had the honor to moderate the OSIsoft 2014 user conference in San Francisco. Over 2000 professionals came together to discuss the value and use of real-time data across different industries. There were a ton of really interesting and inspiring customer presentations. It’s just amazing to see how much companies rely on analytics these days to keep their operations running and/or to improve their situation.
Combating the Polar Vortex
One of the keynote presentations of the conference really stuck out and I want to share the content with you. Columbia Pipeline Group (CPG) operates close to 16,000 miles of natural gas pipelines in the US. Keeping the gas flowing reliably and safely is not easy to begin with. But doing so during the polar vortex that struck the East Coast of the US earlier this year is even harder. CPG turned to real-time data and analytics to keep their assets safe. The benefits of using data, as outlined in Emily Rawlings’ presentation, are tremendous:
Estimated $2.8M in savings from event (outage) prevention
Increased customer confidence
Improved asset reliability
Expanded operational visibility.
If you have a few minutes to spare, take a look at Emily’s cool presentation:
We have all become data collectors. This is true for corporations and individuals. Organizations store petabytes’ worth of customer transactions, social sentiment and machine data. SAP’s Timo Elliott recently wrote a nice blog post about the ‘datafication’ of our own private lives. Just to give you a personal example, I have over 2 GB of exercise data (heart rate, running pace, cycling power, GPS info, etc.) going back to 2003. But there is a growing problem – too many people and organizations are just really good at collecting data. Not enough people are doing anything with it. Let’s face it – data is only valuable if we really use it!
The inertia problem
Leveraging data for your benefit can be a struggle: you have to process it, look at it, analyze it and think about it. Here is an example: let’s say I am a runner wearing a heart rate monitor that is connected to my iPhone. I will only get value out of that data if I am willing and qualified to analyze it after each run. Letting the data sit on my iPhone will not help me identify trends and patterns. And then there is the step of developing and implementing specific actions: should I rest, do I need to run harder to improve my marathon time, or do I actually need to slow down to accelerate recovery? The same thing happens in organizations. Learning to trust your analytics is yet another big issue.
How can we avoid becoming masters of data collection and instead become champions of analytics? Based on my experience, there are a number of actions we should all consider (personal and professional):
Examine your available data and make sure that you really understand what it all means. This includes knowledge of the data sources, the meaning of KPIs, collection methods, etc.
Sit down and clearly identify why you are collecting the data. Identify goals such as increasing sales, setting a PR in the next marathon or improving machine performance.
Develop a habit of working with your data on a daily basis – practice makes perfect.
Acquire the right skills (attend training, read a book, meet a thought-leader etc.) – we all need to work on our skills
Invest in the right tools – not every piece of software makes it easy to perform analysis.
Collaborate with other people, i.e. share your data, discuss findings
Celebrate success when you are able to achieve your desired outcomes
What are your experiences? Are you really leveraging your data or are you just collecting it? What else can we do?
No doubt – there is tremendous value in data. I use data collected from a small sensor in my bike to improve my cycling performance. Factories leverage data to keep their machines humming as long and as efficiently as possible. Unfortunately, most companies have historically tried to keep data for themselves. Sharing was a foreign concept. Security concerns and cultural barriers (“It’s my data!”) have fostered this environment.
“Share your knowledge. It is a way to achieve immortality.”― Dalai Lama XIV
What if we could share critical data with relevant stakeholders in a secure and effective way? Would we be able to improve our performance? Take a look at this short video to see what can happen if you start sharing subsets of your data. It is a fascinating scenario.
Amazon.com recently recommended the book Naked Statistics: Stripping the Dread from the Data. Since I already knew the author Charles Wheelan from his awesome book Naked Economics: Undressing the Dismal Science (Fully Revised and Updated), I went ahead and bought this one for my Kindle. Great decision – it is one of those books that are fun to read while also adding (hopefully) long-lasting value. To make it short: Business Analytics professionals should read Naked Statistics. We work with data on a daily basis and there is an increasing emphasis on Predictive Analytics. Professionals therefore have a growing need for a decent working knowledge of statistics.
Many people have a hard time with statistics. College and university courses usually throw around a wild mix of scary-looking formulas containing lots of Greek symbols. It certainly took me a while to make sense of my professor’s scribble. As a result, lots of people develop a fear of the subject. Naked Statistics, however, demonstrates that it is possible to teach a seemingly complex topic in a simple manner. Charles Wheelan provides a journey through some of the most important statistical concepts, and he makes it fun and easy to understand.
Naked Statistics covers a broad range of the most fundamental statistical concepts such as median, standard deviation, probability, correlation, regression analysis, central limit theorem and hypothesis testing. Each concept is explained in simple terms. The author also uses a mix of fictitious stories (some of them are funny) and real-life examples to show how things work and why they are relevant. Math is kept to a bare minimum – you will only find a few formulas in the main text. Reading is easy and fun. I was surprised to find that I devoured many chapters late at night in bed (I don’t usually read business books that late).
Naked Statistics is a great read. It provides you with a sound working knowledge of statistics and it actually motivates you to dig deeper (I pulled out one of my old textbooks). For those who already know statistics, this book can help you brush up on some concepts. Analytics professionals might also want to recommend this read to colleagues who are starting to work with predictive analytics and other advanced tools. Students should buy a copy before they attend statistics classes – they will certainly be able to grasp the more advanced subjects more easily. I wish I had had this book back at university. It would have saved me some sleepless nights. Two thumbs up – Charles Wheelan does strip the dread from the data.
2012 is almost over and I just realized that I have not yet posted a single entry about big data. Clearly a big mistake – right? Let’s see: Software vendors, media and industry analysts are all over the topic. If you listen to some of the messages, it seems that big data will create billions of jobs, solve all problems and will make us happier individuals. Really? Not really – at least in my humble opinion. It rather seems to me that big data fills a number of functions for a select group of people:
It provides analysts with a fresh and fancy-sounding topic
Media have something big to write about
BI companies obtain a ‘fresh’ marketing message
Professionals can have ‘smart’ discussions
Consultants can sell new assessment projects
Big data – really?
I do apologize for sounding so negative. But I have a hard time finding big value in this big data discussion. Please don’t get me wrong – I would be the last person to deny that there is a tremendous amount of value in big data. But it does not deserve the hype. On the contrary, I personally find that the current discussions ignore the fact that most of us do not have the skills to do big data. We need to get the foundation right and make sure that we can tame the ‘small data lion’ before we tackle the big data Godzilla. Don’t believe me? Consider the following:
Spreadsheets are still the number one data analysis tool in most organizations.
Managers still argue about whose revenue and unit numbers are correct.
Knowledge workers have yet to learn how to make sense of even simple corporate data sets.
3D pie charts are floating around boardrooms.
Companies spend over six months collecting and aggregating budgets, only to find that a stupid formula mistake messed up the final report.
Hardly any professional has ever read a book or attended a course about proper data analysis.
Here is the thing: Dealing with big data is a big challenge. It will require a lot more skills than most of us currently have (try finding meaning in gazillion TBs of data using a 3D pie chart!).
A big data problem
Earlier this year, I acquired a 36-megapixel camera. You can take some amazingly gorgeous photos with it. But that comes at a cost. Each photo consumes 65-75 MB on my poor hard drive. Vacations now create a big data challenge for me. But guess what: this camera is anything but easy to handle. You have to really slow down and put 100% effort into each and every photo. 36 megapixels reveal every single flaw: the slightest camera shake is recorded and exposed. Minimal focus deviations that a small camera would not register kill an otherwise solid photo. In other words: this big data camera requires big skills. And here is something else: the damn camera won’t help you create awesome photos. No, you still need to learn the basics such as composition and proper lighting. That’s the hard stuff. But let me tell you this: if you know the basics, this big data camera certainly does some magic for you.
Big data – what’s next
Ok. That was my big data rant. I love data and analytics. No doubt – there is a tremendous amount of value we can gain from those new data sources. But let’s not forget that we need to learn the basics first. A Formula 1 driver learned his skills on the kart track. At the same time, there is a lot of information hidden in our ‘small data’ sources such as ERPs, CRMs and historians. Let’s take a step back and put things into perspective. Big data is important but not THAT important.
With that: Thank you for following this blog. Happy holidays and see you next year!
December is always an interesting month. Analysts, software companies and journalists post a ton of predictions, reviews and opinions to celebrate the start of the new year. 2012 is no different. Here are a few posts that I highly recommend reading.
Most influential visualizations
Tableau Software without a doubt knows a lot about data visualization techniques. That’s why I happily viewed one of their new presentations on Slideshare. It’s called ‘The 5 most influential data visualizations of all time’. Some of the featured visualizations have been discussed by Stephen Few and Edward Tufte, but it’s well worth spending a few minutes reviewing them and thinking about how they changed the course of history.
Are you ready for some hilarious reading? Well, here it is. The good folks over at the Simply Statistics blog compiled a number of data visualizations that appeared on Fox News (don’t worry – this is NOT about politics). Most of the featured charts are flawed from a technical point of view, but it turns out that they do an excellent job of communicating the intended message (which can be very different from what the actual data says…). Read with a smile, but don’t lose sight of the important lesson here! Most of us strive to produce visualizations, dashboards and reports that provide an accurate portrait of reality. But we can also twist this around and do the opposite: confuse and mislead. You might also want to take a quick look at the comment section of that blog entry. That’s where the post gets political.
Nucleus Top Ten Predictions for 2013
Nucleus is one of those research houses that produce very interesting reports. I don’t always agree with the stuff that they write, but it is certainly amongst the most tangible in the industry. Their 2013 predictions don’t disappoint. And guess what – BI is on top of the list. The remaining predictions represent a mixture of different trends – most of which affect analytics to a certain degree. In any case, the free report is well worth a five-minute investment. One of my favorite statements is: “It’s time to make sure HP has signed its organ donor card.” You can download the free report from the Nucleus website.
Many of us get really frustrated when business people do not immediately embrace our analytics solutions. But let’s step into their shoes for a moment. Trusting analytics for decision-making is a leap of faith. Imagine you are a manager who is used to listening to his gut feeling and intuition. We can’t expect that person to immediately embrace the latest and greatest analytics solution. As a matter of fact, data can often be viewed as somewhat scary. Starting to rely on analytics can therefore feel like the proverbial leap of faith.
Why is that so? When we simplify the feelings that a new analytics user experiences, we can identify three major stages.
Reject: Can I trust the data? What am I supposed to do with it?
Accept: I can see the value but I can’t identify the stories
Embrace: This is cool! What else can I do with this?
We as analytics professionals have the duty to help people make that leap of faith. We have to make it easy for them to get from stage 1 to stage 3.
A personal story
About ten years ago, I got really serious about my running and cycling. Instead of just following my gut feeling for developing a training plan, I purchased a heart rate monitor, a cycling power meter and some analytics software.
Stage 1 – Reject: The initial experience was intimidating. Getting everything to work was complicated and there were a ton of data drop-outs. What about the data itself? It did not tell me anything. All I saw was a bunch of colorful charts and nothing else. I was ready to throw the stuff out of the door. It felt like a waste of time.
Stage 2 – Accept: After a few weeks, however, things started to work smoothly and a coach finally helped me understand the charts and taught me how to identify a few weaknesses in my approach. Based on those insights, I tweaked my plan a little bit. It was a positive step forward but I was still waiting for the big impact.
Stage 3 – Embrace: Studying books and consulting with other athletes allowed me to achieve a real breakthrough. That’s when I finally learned to really rely on the data. Here is an example: Analysis showed that I had trained too hard for over two years. I needed to change my approach and spend more time recovering. It sounded scary: Train slower to race faster? Guess what – it worked! Once I started to back off, I was able to dramatically improve my performance. And that is my personal story of moving from stage 1 (reject) to stage 3 (embrace).
Don’t expect your users to immediately embrace your cool analytics solution. It is a leap of faith. It is your job to help and coach them. Show them how they can apply their data and the associated insights. Also, make sure that you develop solutions that are easy to use and communicate clearly. Don’t leave them alone. Move them along these three stages. It’s your responsibility! You can also find some ideas on how to do that on this blog.