Elsevier: How to Gain Data Agility in the Cloud

Presenting at Talend Connect London 2018 is Reed Elsevier (part of RELX Group), a $7 billion data and analytics company with 31,000 employees, serving scientists, lawyers, doctors, and insurance companies among its many clients. The company helps scientists make discoveries, lawyers win cases, doctors save lives, and insurance companies offer customers lower prices, and it saves taxpayers money by preventing fraud.

Standardizing business practices for successful growth

As the business grew over the years, different parts of the organization began buying and deploying integration tools, which created management challenges for central IT. It was a “shadow IT” situation, where individual business departments were implementing their own integrations with their own different tools.

Without standardization, each unit handled integration separately, which made it harder for different parts of the enterprise to share data. Central IT wanted to bring order to the process and deploy a system that met the company’s needs and could scale to keep pace with growth.

Moving to the cloud

One of the essential requirements was that any new solution be a cloud-based offering. A few years ago, Elsevier became a “cloud first” company, mandating that any new IT services be delivered via the cloud and that nothing be hosted on-premises. It also adopted agile methodologies and a continuous deployment approach to become as nimble as possible when bringing new products or releases to market.

Elsevier selected Talend as a solution and began using it in 2016. Among the vital selection factors were platform flexibility, alignment with the company’s existing infrastructure, and Talend’s ability to generate Java code as output and to support microservices and containers.

In its Talend Connect session, “Delivering Agile integration platforms,” Elsevier will discuss how it got up and running rapidly with Talend despite having a diverse development environment, and how it is using Talend, along with Amazon Web Services, to build a data platform for transforming raw data into insight at scale across the business. You’ll learn how Elsevier created a dynamic platform using containers, serverless data processing, and continuous integration/continuous deployment to gain agility and speed.

Agility is among the most significant benefits of Elsevier’s approach using Talend. Elsevier spins up servers as needed and enables groups to independently develop integrations on a common platform without central IT being a bottleneck. Since the platform was built, internal demand has far surpassed the company’s expectations, as it is delivering cost savings and insight at a whole new level.

Attend this session to learn more about how you can transform your integration environment.

The post Elsevier: How to Gain Data Agility in the Cloud appeared first on Talend Real-Time Open Source Data Integration Software.

Source: Elsevier: How to Gain Data Agility in the Cloud by analyticsweekpick

Dec 06, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data security

[ AnalyticsWeek BYTES]

>> Kaggle Joins Google Cloud by analyticsweek

>> Customer Loyalty 2.0 Article in Quirk’s Marketing Research Review by bobehayes

>> The Big Data Problem in Customer Experience Management: Understanding Sampling Error by bobehayes

[ NEWS BYTES]

>> Want Safer Internet of Things? Change Government Buying Rules. – Nextgov (Under: Internet Of Things)

>> Winter May Bring Bouts of Extreme Cold to Some, Drought Relief to … – Global Banking And Finance Review (press release) (Under: Financial Analytics)

>> Will Cloud and Improving Margins Dominate Amazon’s Earning Report? – Motley Fool (Under: Cloud)

[ FEATURED COURSE]

Hadoop Starter Kit

Hadoop learning made easy and fun. Learn HDFS, MapReduce and introduction to Pig and Hive with FREE cluster access…

[ FEATURED READ]

Thinking, Fast and Slow

Drawing on decades of research in psychology that resulted in a Nobel Prize in Economic Sciences, Daniel Kahneman takes readers on an exploration of what influences thought example by example, sometimes with unlikely wor…

[ TIPS & TRICKS OF THE WEEK]

Keeping biases in check during the last mile of decision making
Today, a data-driven leader, data scientist, or data-driven expert is constantly put to the test, helping the team solve problems with skill and expertise. Believe it or not, part of that decision tree derives from intuition, which adds bias to our judgment and taints our suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we give in to small traps and find ourselves caught in biases that impair our judgment. So it is important to keep intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q: What do you think about the idea of injecting noise into your data set to test the sensitivity of your models?
A:
* The effect would be similar to regularization: it helps avoid overfitting
* It is used to increase robustness
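
As a rough illustration of the idea in the question, here is a minimal Python sketch that refits a simple model on noisy copies of the data and measures how much the fitted slope moves. Everything in it is an assumption made for the example: the toy data, the one-variable least-squares model, and the helper names `fit_line` and `noise_sensitivity` are hypothetical and not taken from any library.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def noise_sensitivity(xs, ys, noise_std, trials=200, seed=0):
    """Refit on noisy copies of ys; return the spread of the fitted slope."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    slopes = []
    for _ in range(trials):
        noisy = [y + rng.gauss(0.0, noise_std) for y in ys]
        slopes.append(fit_line(xs, noisy)[0])
    return statistics.pstdev(slopes)


# Hypothetical toy data: an exact line y = 2x + 1.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

baseline_slope, baseline_intercept = fit_line(xs, ys)
small = noise_sensitivity(xs, ys, noise_std=0.1)
large = noise_sensitivity(xs, ys, noise_std=1.0)
print(f"slope spread at std 0.1: {small:.4f}, at std 1.0: {large:.4f}")
```

A model whose fitted parameters swing widely under small amounts of injected noise is a candidate for more regularization or more data.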

[ VIDEO OF THE WEEK]

@EdwardBoudrot / @Optum on #DesignThinking & #DataDriven Products #FutureOfData #Podcast

[ QUOTE OF THE WEEK]

Data really powers everything that we do. – Jeff Weiner

[ PODCAST OF THE WEEK]

#FutureOfData with Rob(@telerob) / @ConnellyAgency on running innovation in agency

[ FACT OF THE WEEK]

Akamai analyzes 75 million events per day to better target advertisements.

Sourced from: Analytics.CLUB #WEB Newsletter

How to use the TRACE function to track down problems in QlikView

Mark Viecelli
Business Intelligence Specialist, Qlik
John Daniel Associates, Inc.

What is the TRACE Function?

Many Qlik developers will tell you that one of the most tedious tasks when developing an application is tracking down where and why your script has failed, especially when you are working on an application with complex scripting. Sometimes these errors are obvious, but oftentimes you can spend unfathomable amounts of time rereading your script trying to find out exactly where things went wrong. Throughout my time working with Qlik, I have personally experienced countless iterations of digging through my failed script executions until finally locating my problems after what always seems like an eternity… That was until I stumbled across the TRACE function.

The TRACE function has become one of my favorite tools in my arsenal of QlikView tips and tricks due to its ability to save time and eliminate many of the frustrations that come with debugging your script. In fact, I have found this function so useful that I have made a habit of adding it to all of my applications no matter the size or complexity. I believe that after reading this blog you too will find yourself adding this TRACE function to each and every one of your scripts!

In its simplest form, the TRACE function writes a string to the Script Execution Progress window and the script log file as shown in the images below.

Trace Function 1

Trace Function 2

As you can see in the images above, the TRACE function allows you to set markers within the Script Execution Progress window and log file so that if a script fails you can walk through your script and follow each marker, eventually pinpointing the exact area where things went awry. These “markers,” or messages, are versatile and customizable, which allows you to tailor your TRACE markers to any application or situation.

One important thing to remember is that the TRACE function does not specifically spell out what failed, but rather it shows you what pieces of the script were successfully completed, thus allowing the developer to follow the success markers up until the point of script failure. The image below shows what would happen in both the Script Execution Progress window and the application log file if the script were to fail.

Trace Function 3

Trace Function 4

You can see that the Glossary Load started and finished, which tells me that everything seems to be working fine with my Glossary. I then scroll down to see that my Change Log Load starts, but I do not see any marker telling me that the load of this table was finished. Due to the missing TRACE marker in the Script Execution Progress window and the presence of the “Execution Failed” notification in the log file, I know that something went wrong within my Change Log table load. These events and markers allow me to save time and jump right to that piece of script within my application and continue my research.

While this simple example is being used for the sake of this blog and explanation, in larger and more complex scripts being able to see what was fully completed and what was not can be a life saver. If you are running your script manually within the QlikView application you will still receive your normal script error dialog box, but the TRACE function can be invaluable when running jobs from the QMC where the specific error notifications are not as pronounced unless viewing the log file.
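
When the log file is the only evidence, as with jobs run from the QMC, the marker-following routine described above can also be automated outside of QlikView. The following is a minimal Python sketch under stated assumptions: it assumes every TRACE message has the hypothetical form `..........<Section> Started` or `..........<Section> Finished`, and the sample log lines are invented for illustration rather than copied from a real QlikView log. The function name `unfinished_sections` is likewise made up for this example.

```python
import re

# Assumed marker convention: three or more dots, a section name,
# then "Started" or "Finished" at the end of the log line.
MARKER = re.compile(r"\.{3,}\s*(?P<name>.+?)\s+(?P<state>Started|Finished)\s*$")


def unfinished_sections(log_lines):
    """Return section names that have a Started marker but no Finished marker."""
    started = []      # keeps first-seen order
    finished = set()
    for line in log_lines:
        match = MARKER.search(line)
        if match is None:
            continue  # not a TRACE marker line
        name = match.group("name")
        if match.group("state") == "Started":
            if name not in started:
                started.append(name)
        else:
            finished.add(name)
    return [name for name in started if name not in finished]


# Invented sample mirroring the Glossary / Change Log scenario above.
sample_log = [
    "0002  ..........Glossary Load Started",
    "0005  ..........Glossary Load Finished",
    "0006  ..........Change Log Load Started",
    "0009  Execution Failed",
]

print(unfinished_sections(sample_log))
```

Any section the function returns is where the search for the script error should begin.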

How is the TRACE function used?

There are a variety of ways to use the TRACE function, but the two I find most useful are as follows:

  1. Designating a specific string by simply typing the desired output directly into the script as you have seen throughout this blog

Trace Function 5

Notice that there do not need to be quotes around your string unless you are adding additional components such as a variable. In that case you would need to use the format of ‘This is an example ’& $(vExample).

  2. Creating a variable that feeds into your TRACE function

Trace Function 6

Trace Function 7

NOTE: If running the script manually from within the Qlik application, it is VERY helpful to deselect the “Close when finished” checkbox at the bottom left-hand side of the Script Execution Progress window. This eliminates the need to access the log file in the event of a failure. You can simply scroll within your Script Execution Progress window to locate the area of your script error. This can also be configured in the User Preferences menu located under Settings. Both methods are shown below.

Trace Function 8

Trace Function 9

*DEVELOPMENT TIP* – Because the TRACE function does not print anything to signify where a TRACE has been inserted, it is sometimes hard to locate the markers that you have worked into your script. I find it extremely helpful to insert a string of characters that stands out into your TRACE function. Strings such as ‘……….’ (this string is used in the example image above), ‘********’, or ‘>>>>>>>’ will make the TRACE much easier to read when scrolling through your Script Execution Progress window or application log file.

Here’s a quick side-by-side look at what would happen in the Script Execution Progress window if I removed the ‘……….’ preceding my TRACE string:

Trace Function 10

Trace Function 11

While the TRACE information is still present, it does not jump out at the developer the same way it would if they were to include preceding characters. For best results, use something that sticks out to you. Again, this helps to save time and increase readability.

NOTE: The TRACE function must be used before or after your script statements. For example, it CANNOT be used in the middle of a LOAD, such as after a field name.

As I stated earlier, one limitation of the TRACE function is that it will not give you the exact record or field the script fails on. This is mainly because the TRACE cannot be placed within a given statement. Although the TRACE function will not point out an exact record or field of error, it can substantially narrow down the possible sources of errors. Because of this, I offer the following development tip:

*DEVELOPMENT TIP* – Use as many TRACE functions as you see fit. The more TRACE functions you use, the easier it will be to debug your application.

For experienced developers it is easy to ignore certain functions or practices because they may seem useless or may take an extra few seconds to enter into your script. However, I maintain that it is these “basic” functions that we so often overlook that can actually be game changers when it comes crunch time. It is important to not become jaded by the development experience you obtain and always remember that going back to basics is not a sign of ignorance, but rather one of the smartest moves developers of all experience levels can make.

The TRACE function is debugging made easy. The next time you develop an application or begin to make changes to an old script give this method a try… You won’t be disappointed!

The post How to use the TRACE function to track down problems in QlikView appeared first on John Daniel Associates, Inc.

Source by analyticsweek

Don’t Let your Data Lake become a Data Swamp

In an always-on, competitive business environment, organizations are looking to gain an edge through digital transformation. Consequently, many companies feel a sense of urgency to transform across all areas of their enterprise, from manufacturing to business operations, in the constant pursuit of continuous innovation and process efficiency.

Data is at the heart of all these digital transformation projects. It is the critical component that helps generate smarter, improved decision-making by empowering business users to eliminate gut feelings, unclear hypotheses, and false assumptions. As a result, many organizations believe building a massive data lake is the ‘silver bullet’ for delivering real-time business insights. In fact, according to a survey by CIO review from IDG, 75 percent of business leaders believe their future success will be driven by their organization’s ability to make the most of their information assets. However, only four percent of these organizations said they are set up with a data-driven approach to successfully benefit from their information.

Is your Data Lake becoming more of a hindrance than an enabler?

The reality is that all these new initiatives and technologies come with a unique set of generated data, which creates additional complexity in the decision-making process. To cope with the growing volume and complexity of data and alleviate IT pressure, some are migrating to the cloud.

But this transition, in turn, creates other issues. For example, once data is made more broadly available via the cloud, more employees want access to that information. Growing numbers and varieties of business roles are looking to extract value from increasingly diverse data sets, faster than ever. This puts pressure on IT organizations to deliver real-time data access that serves the diverse needs of business users looking to apply real-time analytics to their everyday jobs. However, it’s not just about better analytics: business users also frequently want tools that allow them to prepare, share, and manage data.

To minimize tension and friction between IT and business departments, moving raw data to one place where everybody can access it sounded like a good move. The concept of the data lake, a term coined by James Dixon in 2010, envisioned a large body of raw data in a more natural state, where different users come to examine it, delve into it, or extract samples from it. However, organizations are increasingly realizing that all the time and effort spent building massive data lakes has frequently made things worse due to poor data governance and management, which has resulted in the formation of so-called “Data Swamps”.

Bad data clogging up the machinery

The same way data warehouses failed to manage data analytics a decade ago, data lakes will undoubtedly become “Data Swamps” if companies don’t manage them in the correct way. Putting all your data in a single place won’t in and of itself solve a broader data access problem. Leaving data uncontrolled, unenriched, unqualified, and unmanaged will dramatically hamper the benefits of a data lake, because only a limited number of experts with a specialized set of skills will be able to use it properly.

A successful system of real-time business insights starts with a system of trust. To illustrate the negative impact of bad data and bad governance, let’s take a look at what happened with Dieselgate. The Dieselgate emissions scandal highlighted the difference between real-world and official air pollutant emissions data. In this case, the issue was not a problem of data quality, but of ethics, since some car manufacturers misled the measurement system by injecting fake data. This resulted in fines for car manufacturers running into the tens of billions of dollars and in consumers losing faith in the industry. After all, how can consumers trust the performance of cars now that they know the system of measurement has been intentionally tampered with?

The takeaway in the context of an enterprise data lake is that its value will depend on the level of trust employees have in the data contained in the lake. Failing to control data accuracy and quality within the lake will create mistrust amongst employees, seed doubt about the competency of IT, and jeopardize the whole data value chain, which then negatively impacts overall company performance.

A cloud data warehouse to deliver trusted insights for the masses

Leading firms believe governed cloud data lakes represent an adequate solution for overcoming some of these more traditional data lake stumbling blocks. The following four-step approach helps modernize a cloud data warehouse while providing better insight across the entire organization.

  1. Unite all data sources and reconcile them: Make sure the organization has the capacity to integrate a wide array of data sources, formats and sizes. Storing a wide variety of data in one place is the first step, but it’s not enough. Bridging data pipelines and reconciling them is another way to gain the capacity to manage insights. Verify the company has a cloud-enabled data management platform combining rich integration capabilities and cloud elasticity to process high data volumes at a reasonable price.
  2. Accelerate trusted insights to the masses: Efficiently manage data with cloud data integration solutions that help prepare, profile, cleanse, and mask data while monitoring data quality over time regardless of file format and size. When coupled with cloud data warehouse capabilities, data integration can enable companies to create trusted data for access, reporting, and analytics in a fraction of the time and cost of traditional data warehouses.
  3. Collaborative data governance to the rescue: The old schema of a data value chain where data is produced solely by IT in data warehouses and consumed by business users is no longer valid. Now everyone wants to create content, add context, enrich data, and share it with others. Take the example of the internet and a knowledge platform such as Wikipedia, where everybody can contribute, moderate, and create new entries in the encyclopedia. In the same way Wikipedia established collaborative governance, companies should instill collaborative governance in their organization by delegating the appropriate role-based authority or access rights to citizen data scientists, line-of-business experts, and data analysts.
  4. Democratize data access and encourage users to be part of the Data Value Chain: Without making people accountable for what they’re doing, analyzing, and operating, there is little chance that organizations will succeed in implementing the right data strategy across business lines. Thus, you need to build a continuous Data Value Chain where business users contribute, share, and enrich the data flow in combination with a cloud data warehouse multi-cluster architecture that will accelerate data usage by load balancing data processing across diverse audiences.
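
To make the "prepare, profile, cleanse, and mask" step in the list above a little more concrete, here is a minimal Python sketch. It is not how Talend or any particular cloud data integration product works; the record layout, the field names, and the helpers `profile`, `cleanse`, and `mask` are all hypothetical and chosen only for illustration.

```python
import hashlib


def profile(records, fields):
    """Return the share of records holding a non-empty value for each field."""
    total = len(records) or 1  # avoid dividing by zero on an empty batch
    return {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / total
        for field in fields
    }


def cleanse(records):
    """Trim stray whitespace from string values; other values pass through."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
        for r in records
    ]


def mask(value):
    """Replace a sensitive value with a short, repeatable SHA-256 token."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


# Invented sample records.
raw = [
    {"customer": " Alice ", "email": "alice@example.com"},
    {"customer": "Bob", "email": ""},
]

clean = cleanse(raw)
completeness = profile(clean, ["customer", "email"])
shared = [
    {**r, "email": mask(r["email"]) if r["email"] else r["email"]} for r in clean
]
print(completeness)
```

Tracking a completeness figure like this over time is one simple way to monitor data quality, and masking before sharing lets more users work with the data without seeing the raw values. Note that a truncated hash is only a lightweight form of pseudonymization, not strong anonymization.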

In summary, think of data as the next strategic asset. Right now, it’s more like a hidden treasure at the bottom of many companies. Once modernized, shared and processed, data will reveal its true value, delivering better and faster insights to help companies get ahead of the competition.

The post Don’t Let your Data Lake become a Data Swamp appeared first on Talend Real-Time Open Source Data Integration Software.

Source by analyticsweek