Analytics at Scale: What Data Analysts Need to Know

With every technology company shouting “Big Data,” we are led to think analytics challenges can be solved simply by storing a whole mess of data. With current technology, storing large volumes of data is easy. On its own, it also provides absolutely no value. Value comes from data only when it is examined, manipulated, learned from, and acted upon. To extract business insights you have to move data and exercise data (or is it exorcise?), and that isn’t easy at all.

With IT increasingly acting as a technology provider rather than a solution provider, it is becoming the analyst’s responsibility to choose among technologies, techniques, and engines. Gaining value from ever-increasing volumes, varieties, and velocities of data shines a spotlight on two weaknesses technologies have always had: performance at scale and the ability to support concurrent workloads.


With the increasing need to consider entire data populations instead of just samples, data volumes are greater at every step of the analytic process. Not all engines and platforms perform equally. If you are using the right tools, even large, complex processes should run in a few minutes. If they don’t, reconsider your approach to the solution and the technology you’re relying on.

In the current market, analysts are constantly pelted with buzzwords: AI, machine learning, deep learning, neural networks. Remember, it’s all just math, and computers are really good at math. All computers. Some technologies want you to think they are the only choice for certain types of advanced analytics. There is no test or technique that can only be done with a single language or on a single platform. Start where the data lies. Start with a simple approach on your fastest platform.
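As a small illustration of the "it's all just math" point, the sketch below fits a straight line by ordinary least squares using nothing but the Python standard library. The function name `fit_line` and the sample numbers are made up for illustration; the same arithmetic could be written in SQL, R, or any other language.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical data lying exactly on y = 2x + 1.
    slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(slope, intercept)
```

Nothing here depends on a specialized platform, which is the point: pick the engine for its speed and proximity to the data, not because a vendor claims the technique lives only there.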

More data means larger platforms and more sharing of resources. Concurrency has been a vulnerability for many technologies, and that trend continues. Fancy platforms aren’t going to hold up through the analytic lifecycle if they can only support a handful of users or concurrent processes. Almost all platforms and engines will hit a concurrency ceiling no matter how much compute power they have. What matters is where that ceiling is. This has been a consistent challenge for open source projects. Because they are often not tested at scale, many of their bugs are concurrency related. Be aware and use appropriate caution.

As numbers people, it is easy for us analysts to get hung up on precision. In a business context, there is usually a clear preference for tolerating either false negatives or false positives, so you can work to an intentionally lower precision threshold. Don’t over-engineer your solution. Choose the simplest and most performant solution that provides the business value you need. Basic statistics are still the best fit for binary data. Set logic is faster than procedural logic. Take advantage of parallel systems. You will be a hero as you rapidly show results. You will also produce processes that can actually be operationalized, which is a superpower in itself.
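To make the set-logic point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `sales`, its columns, and the function names are hypothetical, chosen only for illustration; the same contrast applies on any SQL engine. Both functions compute total revenue per region: one loops over rows in application code, the other hands a single set-based statement to the engine.

```python
import sqlite3
from collections import defaultdict


def build_demo_db():
    """Create an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [
        ("east", 10.0),
        ("west", 5.0),
        ("east", 2.5),
        ("west", 7.5),
        ("north", 4.0),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def totals_procedural(conn):
    """Row by row: pull every row to the client and accumulate in a loop."""
    totals = defaultdict(float)
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] += amount
    return dict(totals)


def totals_set_based(conn):
    """Set-based: one statement; the engine does the grouping and summing."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(conn.execute(query).fetchall())


if __name__ == "__main__":
    conn = build_demo_db()
    print(totals_procedural(conn))
    print(totals_set_based(conn))
```

On a toy table the two are indistinguishable. The difference shows up at scale, where the set-based form lets a parallel engine do the work where the data lives and return only the small result.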

Most importantly, don’t overlook the value of concluding that a path isn’t showing results and that it’s time to move on to the next challenge.

 



(Author):
Karen Diamond

Karen Diamond has been working in analytics for 23 years and with big data for 19. She has worked with over 15 Fortune 500 companies designing scalable solutions in advanced analytics. From a business perspective, she has experience in profitability, forecasting, churn, product mix management, trade spend, product quality, and post-acquisition data integration. She has architected AI solutions through the full life cycle from requirements to reporting. Since moving to San Francisco in 2016, she has had the pleasure of supporting the needs of the Apple business.
