Aug 29, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

Warning: file_get_contents(http://news.analyticsweek.com/tw/newspull.php): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://news.analyticsweek.com/tw/newspull.php): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://news.analyticsweek.com/tw/newspull.php): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

[  COVER OF THE WEEK ]

image
Accuracy  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Is Your Data Product Ready for Launch? by analyticsweek

>> Are U.S. Hospitals Delivering a Better Patient Experience? by bobehayes

>> 50 UX Metrics, Methods, & Measurement Articles from 2018 by analyticsweek

Wanna write? Click Here

[ FEATURED COURSE]

Master Statistics with R

image

In this Specialization, you will learn to analyze and visualize data in R and created reproducible data analysis reports, demonstrate a conceptual understanding of the unified nature of statistical inference, perform fre… more

[ FEATURED READ]

On Intelligence

image

Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one strok… more

[ TIPS & TRICKS OF THE WEEK]

Analytics Strategy that is Startup Compliant
With right tools, capturing data is easy but not being able to handle data could lead to chaos. One of the most reliable startup strategy for adopting data analytics is TUM or The Ultimate Metric. This is the metric that matters the most to your startup. Some advantages of TUM: It answers the most important business question, it cleans up your goals, it inspires innovation and helps you understand the entire quantified business.

[ DATA SCIENCE Q&A]

Q:Given two fair dices, what is the probability of getting scores that sum to 4? to 8?
A: * Total: 36 combinations
* Of these, 3 involve a score of 4: (1,3), (3,1), (2,2)
* So: 3/36=1/12
* Considering a score of 8: (2,6), (3,5), (4,4), (6,2), (5,3)
* So: 5/36

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek: Big Data Health Informatics for the 21st Century: Gil Alterovitz

 @AnalyticsWeek: Big Data Health Informatics for the 21st Century: Gil Alterovitz

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Information is the oil of the 21st century, and analytics is the combustion engine. – Peter Sondergaard

[ PODCAST OF THE WEEK]

#FutureOfData Podcast: Conversation With Sean Naismith, Enova Decisions

 #FutureOfData Podcast: Conversation With Sean Naismith, Enova Decisions

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Bad data or poor data quality costs US businesses $600 billion annually.

Sourced from: Analytics.CLUB #WEB Newsletter

August 28, 2017 Health and Biotech analytics news roundup

Genome sequencing method can detect clinically relevant mutations using 5 CTCs: Researchers showed that a technique that can sequence very long stretches of the genome can accurately quantify mutations using only 5 ‘circulating tumor cells’ (although they used 34 in this study).

Artificial intelligence predicts dementia before onset of symptoms: Using only one scan of the brain per patient, McGill scientists were able to accurately predict Alzheimer’s 2 years before its onset.

Using machine learning to improve patient care: Two papers from MIT made strides in the field, one that used ICU data to predict necessary treatments and another that trained models of mortality and length of stay based on electronic health record data.

How CROs Are Helping With Healthcare’s Data Problem: Clinical trial costs are a major cause of rising health care costs. To help streamline this, pharmaceutical companies are increasingly using ‘contract research organizations’ to conduct trials, as they can use their expertise and specialized business intelligence tools to cut costs.

I was worried about artificial intelligence—until it saved my life: Krista Jones had a rare form of cancer that was only able to be correctly treated with machine learning technology.

Genomic Medicine Has Entered the Building: Some types of genome sequences now cost as much as an MRI, which has allowed organizations to undertake large-scale studies in personalized medicine.

Source: August 28, 2017 Health and Biotech analytics news roundup

The 37 best tools for data visualization

Creating charts and info graphics can be time-consuming. But these tools make it easier.

It’s often said that data is the new world currency, and the web is the exchange bureau through which it’s traded. As consumers, we’re positively swimming in data; it’s everywhere from labels on food packaging design to World Health Organisation reports. As a result, for the designer it’s becoming increasingly difficult to present data in a way that stands out from the mass of competing data streams.

One of the best ways to get your message across is to use a visualization to quickly draw attention to the key messages, and by presenting data visually it’s also possible to uncover surprising patterns and observations that wouldn’t be apparent from looking at stats alone.

EVENT PROMOTION

Not a web designer or developer? You may prefer free tools for creating infographics.

As author, data journalist and information designer David McCandless said in his TED talk: “By visualizing information, we turn it into a landscape that you can explore with your eyes, a sort of information map. And when you’re lost in information, an information map is kind of useful.”

There are many different ways of telling a story, but everything starts with an idea. So to help you get started we’ve rounded up some of the most awesome data visualization tools available on the web.

01. Dygraphs

Help visitors explore dense data sets with JavaScript library Dygraphs

Dygraphs is a fast, flexible open source JavaScript charting library that allows users to explore and interpret dense data sets. It’s highly customizable, works in all major browsers, and you can even pinch to zoom on mobile and tablet devices.

02. ZingChart

ZingChart lets you create HTML5 Canvas charts and more

ZingChart is a JavaScript charting library and feature-rich API set that lets you build interactive Flash or HTML5 charts. It offer over 100 chart types to fit your data.

03. InstantAtlas

InstantAtlas enables you to create highly engaging visualisations around map data

If you’re looking for a data viz tool with mapping, InstantAtlas is worth checking out. This tool enables you to create highly-interactive dynamic and profile reports that combine statistics and map data to create engaging data visualizations.

04. Timeline

 Timeline
Timeline creates beautiful interactive visualizations

Timeline is a fantastic widget which renders a beautiful interactive timeline that responds to the user’s mouse, making it easy to create advanced timelines that convey a lot of information in a compressed space.

Each element can be clicked to reveal more in-depth information, making this a great way to give a big-picture view while still providing full detail.

05. Exhibit

 Exhibit
Exhibit makes data visualization a doddle

Developed by MIT, and fully open-source, Exhibit makes it easy to create interactive maps, and other data-based visualizations that are orientated towards teaching or static/historical based data sets, such as flags pinned to countries, or birth-places of famous people.

06. Modest Maps

 Modest Maps
Integrate and develop interactive maps within your site with this cool tool

Modest Maps is a lightweight, simple mapping tool for web designers that makes it easy to integrate and develop interactive maps within your site, using them as a data visualization tool.

The API is easy to get to grips with, and offers a useful number of hooks for adding your own interaction code, making it a good choice for designers looking to fully customise their user’s experience to match their website or web app. The basic library can also be extended with additional plugins, adding to its core functionality and offering some very useful data integration options.

07. Leaflet

 Leaflet
Use OpenStreetMap data and integrate data visualisation in an HTML5/CSS3 wrapper

Another mapping tool, Leaflet makes it easy to use OpenStreetMap data and integrate fully interactive data visualisation in an HTML5/CSS3 wrapper.

The core library itself is very small, but there are a wide range of plugins available that extend the functionality with specialist functionality such as animated markers, masks and heatmaps. Perfect for any project where you need to show data overlaid on a geographical projection (including unusual projections!).

08. WolframAlpha

 Wolfram Alpha
Wolfram Alpha is excellent at creating charts

Billed as a “computational knowledge engine”, the Google rival WolframAlpha is really good at intelligently displaying charts in response to data queries without the need for any configuration. If you’re using publically available data, this offers a simple widget builder to make it really simple to get visualizations on your site.

09. Visual.ly

 Visual.ly
Visual.ly makes data visualization as simple as it can be

Visual.ly is a combined gallery and infographic generation tool. It offers a simple toolset for building stunning data representations, as well as a platform to share your creations. This goes beyond pure data visualisation, but if you want to create something that stands on its own, it’s a fantastic resource and an info-junkie’s dream come true!

10. Visualize Free

 Visualize Free
Make visualizations for free!

Visualize Free is a hosted tool that allows you to use publicly available datasets, or upload your own, and build interactive visualizations to illustrate the data. The visualizations go well beyond simple charts, and the service is completely free plus while development work requires Flash, output can be done through HTML5.

11. Better World Flux

 Better World Flux
Making the ugly beautiful – that’s Better World Flux

Orientated towards making positive change to the world, Better World Flux has some lovely visualizations of some pretty depressing data. It would be very useful, for example, if you were writing an article about world poverty, child undernourishment or access to clean water. This tool doesn’t allow you to upload your own data, but does offer a rich interactive output.

12. FusionCharts

FusionCharts Suite XT
A comprehensive JavaScript/HTML5 charting solution for your data visualization needs

FusionCharts Suite XT brings you 90+ charts and gauges, 965 data-driven maps, and ready-made business dashboards and demos. FusionCharts comes with extensive JavaScript API that makes it easy to integrate it with any AJAX application or JavaScript framework. These charts, maps and dashboards are highly interactive, customizable and work across all devices and platforms. They also have a comparison of the top JavaScript charting libraries which is worth checking out.

13. jqPlot

 jQPlot
jqPlot is a nice solution for line and point charts

Another jQuery plugin, jqPlot is a nice solution for line and point charts. It comes with a few nice additional features such as the ability to generate trend lines automatically, and interactive points that can be adjusted by the website visitor, updating the dataset accordingly.

14. Dipity

 Dipity
Dipity has free and premium versions to suit your needs

Dipity allows you to create rich interactive timelines and embed them on your website. It offers a free version and a premium product, with the usual restrictions and limitations present. The timelines it outputs are beautiful and fully customisable, and are very easy to embed directly into your page.

15. Many Eyes

 Many Eyes
Many Eyes was developed by IBM

Developed by IBM, Many Eyes allows you to quickly build visualizations from publically available or uploaded data sets, and features a wide range of analysis types including the ability to scan text for keyword density and saturation. This is another great example of a big company supporting research and sharing the results openly.

16. D3.js

 D3.js
You can render some amazing diagrams with D3

D3.js is a JavaScript library that uses HTML, SVG, and CSS to render some amazing diagrams and charts from a variety of data sources. This library, more than most, is capable of some seriously advanced visualizations with complex data sets. It’s open source, and uses web standards so is very accessible. It also includes some fantastic user interaction support.

17. JavaScript InfoVis Toolkit

 JavaScript InfoVis Toolkit
JavaScript InfoVis Toolkit includes a handy modular structure

A fantastic library written by Nicolas Belmonte, the JavaScript InfoVis Toolkit includes a modular structure, allowing you to only force visitors to download what’s absolutely necessary to display your chosen data visualizations. This library has a number of unique styles and swish animation effects, and is free to use (although donations are encouraged).

18. jpGraph

 jpGraph
jpGraph is a PHP-based data visualization tool

If you need to generate charts and graphs server-side, jpGraph offers a PHP-based solution with a wide range of chart types. It’s free for non-commercial use, and features extensive documentation. By rendering on the server, this is guaranteed to provide a consistent visual output, albeit at the expense of interactivity and accessibility.

19. Highcharts

 Highcharts
Highcharts has a huge range of options available

Highcharts is a JavaScript charting library with a huge range of chart options available. The output is rendered using SVG in modern browsers and VML in Internet Explorer. The charts are beautifully animated into view automatically, and the framework also supports live data streams. It’s free to download and use non-commercially (and licensable for commercial use). You can also play with the extensive demos using JSFiddle.

20. Google Charts

 Google Charts
Google Charts has an excellent selection of tools available

The seminal charting solution for much of the web, Google Charts is highly flexible and has an excellent set of developer tools behind it. It’s an especially useful tool for specialist visualizations such as geocharts and gauges, and it also includes built-in animation and user interaction controls.

21. Excel

 Excel
It isn’t graphically flexible, but Excel is a good way to explore data: for example, by creating ‘heat maps’ like this one

You can actually do some pretty complex things with Excel, from ‘heat maps’ of cells to scatter plots. As an entry-level tool, it can be a good way of quickly exploring data, or creating visualizations for internal use, but the limited default set of colours, lines and styles make it difficult to create graphics that would be usable in a professional publication or website. Nevertheless, as a means of rapidly communicating ideas, Excel should be part of your toolbox.

Excel comes as part of the commercial Microsoft Office suite, so if you don’t have access to it, Google’s spreadsheets – part ofGoogle Docs and Google Drive – can do many of the same things. Google ‘eats its own dog food’, so the spreadsheet can generate the same charts as the Google Chart API. This will get your familiar with what is possible before stepping off and using the API directly for your own projects.

22. CSV/JSON

CSV (Comma-Separated Values) and JSON (JavaScript Object Notation) aren’t actual visualization tools, but they are common formats for data. You’ll need to understand their structures and how to get data in or out of them.

23. Crossfilter

 Crossfilter
Crossfilter in action: by restricting the input range on any one chart, data is affected everywhere. This is a great tool for dashboards or other interactive tools with large volumes of data behind them

As we build more complex tools to enable clients to wade through their data, we are starting to create graphs and charts that double as interactive GUI widgets. JavaScript library Crossfilter can be both of these. It displays data, but at the same time, you can restrict the range of that data and see other linked charts react.

24. Tangle

 Tangle
Tangle creates complex interactive graphics. Pulling on any one of the knobs affects data throughout all of the linked charts. This creates a real-time feedback loop, enabling you to understand complex equations in a more intuitive way

The line between content and control blurs even further with Tangle. When you are trying to describe a complex interaction or equation, letting the reader tweak the input values and see the outcome for themselves provides both a sense of control and a powerful way to explore data. JavaScript library Tangle is a set of tools to do just this.

Dragging on variables enables you to increase or decrease their values and see an accompanying chart update automatically. The results are only just short of magical.

25. Polymaps

 Polymaps
Aimed more at specialist data visualisers, the Polymaps library creates image and vector-tiled maps using SVG

Polymaps is a mapping library that is aimed squarely at a data visualization audience. Offering a unique approach to styling the the maps it creates, analagous to CSS selectors, it’s a great resource to know about.

26. OpenLayers

 OpenLayers
It isn’t easy to master, but OpenLayers is arguably the most complete, robust mapping solution discussed here

OpenLayers is probably the most robust of these mapping libraries. The documentation isn’t great and the learning curve is steep, but for certain tasks nothing else can compete. When you need a very specific tool no other library provides, OpenLayers is always there.

27. Kartograph

 Kartograph
Kartograph’s projections breathe new life into our standard slippy maps

Kartograph’s tag line is ‘rethink mapping’ and that is exactly what its developers are doing. We’re all used to the Mercator projection, but Kartograph brings far more choices to the table. If you aren’t working with worldwide data, and can place your map in a defined box, Kartograph has the options you need to stand out from the crowd.

28. CartoDB

 CartoDB
CartoDB provides an unparalleled way to combine maps and tabular data to create visualisations

CartoDB is a must-know site. The ease with which you can combine tabular data with maps is second to none. For example, you can feed in a CSV file of address strings and it will convert them to latitudes and longitudes and plot them on a map, but there are many other users. It’s free for up to five tables; after that, there are monthly pricing plans.

29. Processing

 Processing
Processing provides a cross-platform environment for creating images, animations, and interactions

Processing has become the poster child for interactive visualizations. It enables you to write much simpler code which is in turn compiled into Java.

There is also a Processing.js project to make it easier for websites to use Processing without Java applets, plus a port to Objective-C so you can use it on iOS. It is a desktop application, but can be run on all platforms, and given that it is now several years old, there are plenty of examples and code from the community.

30. NodeBox

 NodeBox
NodeBox is a quick, easy way for Python-savvy developers to create 2D visualisations

NodeBox is an OS X application for creating 2D graphics and visualizations. You need to know and understand Python code, but beyond that it’s a quick and easy way to tweak variables and see results instantly. It’s similar to Processing, but without all the interactivity.

31. R

 R
A powerful free software environment for statistical computing and graphics, R is the most complex of the tools listed here

How many other pieces of software have an entire search enginededicated to them? A statistical package used to parse large data sets, R is a very complex tool, and one that takes a while to understand, but has a strong community and package library, with more and more being produced.

The learning curve is one of the steepest of any of these tools listed here, but you must be comfortable using it if you want to get to this level.

32. Weka

 Weka
A collection of machine-learning algorithms for data-mining tasks, Weka is a powerful way to explore data

When you get deeper into being a data scientist, you will need to expand your capabilities from just creating visualizations to data mining. Weka is a good tool for classifying and clustering data based on various attributes – both powerful ways to explore data – but it also has the ability to generate simple plots.

33. Gephi

 Gelphi
Gephi in action. Coloured regions represent clusters of data that the system is guessing are similar

When people talk about relatedness, social graphs and co-relations, they are really talking about how two nodes are related to one another relative to the other nodes in a network. The nodes in question could be people in a company, words in a document or passes in a football game, but the maths is the same.

Gephi, a graph-based visualiser and data explorer, can not only crunch large data sets and produce beautiful visualizations, but also allows you to clean and sort the data. It’s a very niche use case and a complex piece of software, but it puts you ahead of anyone else in the field who doesn’t know about this gem.

34. iCharts

 iCharts
iCharts can have interactive elements, and you can pull in data from Google Docs

The iCharts service provides a hosted solution for creating and presenting compelling charts for inclusion on your website. There are many different chart types available, and each is fully customisable to suit the subject matter and colour scheme of your site.

Charts can have interactive elements, and can pull data from Google Docs, Excel spreadsheets and other sources. The free account lets you create basic charts, while you can pay to upgrade for additional features and branding-free options.

35. Flot

 Flot
Create animated visualisations with this jQuery plugin

Flot is a specialised plotting library for jQuery, but it has many handy features and crucially works across all common browsers including Internet Explorer 6. Data can be animated and, because it’s a jQuery plugin, you can fully control all the aspects of animation, presentation and user interaction. This does mean that you need to be familiar with (and comfortable with) jQuery, but if that’s the case, this makes a great option for including interactive charts on your website.

36. Raphaël

 Raphael
This handy JavaScript library offers a range of data visualisation options

This handy JavaScript library offers a wide range of data visualization options which are rendered using SVG. This makes for a flexible approach that can easily be integrated within your own web site/app code, and is limited only by your own imagination.

That said, it’s a bit more hands-on than some of the other tools featured here (a victim of being so flexible), so unless you’re a hardcore coder, you might want to check out some of the more point-and-click orientated options first!

37. jQuery Visualize

 JQuery Visualise
jQuery Visualize Plugin is an open source charting plugin

Written by the team behind jQuery’s ThemeRoller and jQuery UI websites, jQuery Visualize Plugin is an open source charting plugin for jQuery that uses HTML Canvas to draw a number of different chart types. One of the key features of this plugin is its focus on achieving ARIA support, making it friendly to screen-readers. It’s free to download from this page on GitHub.

Further reading

  • A great Tumblr blog for visualization examples and inspiration:vizualize.tumblr.com
  • Nicholas Felton’s annual reports are now infamous, but he also has a Tumblr blog of great things he finds.
  • From the guy who helped bring Processing into the world:benfry.com/writing
  • Stamen Design is always creating interesting projects:stamen.com
  • Eyeo Festival brings some of the greatest minds in data visualization together in one place, and you can watch the videos online.

Brian Suda is a master informatician and author of Designing with Data, a practical guide to data visualisation.

Originally posted via “The 37 best tools for data visualization”

 

Source

Aug 22, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

Warning: file_get_contents(http://news.analyticsweek.com/tw/newspull.php): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://news.analyticsweek.com/tw/newspull.php): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://news.analyticsweek.com/tw/newspull.php): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

[  COVER OF THE WEEK ]

image
Data security  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> How Small Businesses Can Embrace Big Data by analyticsweekpick

>> How the NFL is Using Big Data by analyticsweekpick

>> Key Considerations for Converting Legacy ETL to Modern ETL – Part II by analyticsweekpick

Wanna write? Click Here

[ FEATURED COURSE]

CS229 – Machine Learning

image

This course provides a broad introduction to machine learning and statistical pattern recognition. … more

[ FEATURED READ]

Superintelligence: Paths, Dangers, Strategies

image

The human brain has some capabilities that the brains of other animals lack. It is to these distinctive capabilities that our species owes its dominant position. Other animals have stronger muscles or sharper claws, but … more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from zombie apocalypse from unscalable models
One living and breathing zombie in today’s analytical models is the pulsating absence of error bars. Not every model is scalable or holds ground with increasing data. Error bars that is tagged to almost every models should be duly calibrated. As business models rake in more data the error bars keep it sensible and in check. If error bars are not accounted for, we will make our models susceptible to failure leading us to halloween that we never wants to see.

[ DATA SCIENCE Q&A]

Q:Explain what a false positive and a false negative are. Why is it important these from each other? Provide examples when false positives are more important than false negatives, false negatives are more important than false positives and when these two types of errors are equally important
A: * False positive
Improperly reporting the presence of a condition when it’s not in reality. Example: HIV positive test when the patient is actually HIV negative

* False negative
Improperly reporting the absence of a condition when in reality it’s the case. Example: not detecting a disease when the patient has this disease.

When false positives are more important than false negatives:
– In a non-contagious disease, where treatment delay doesn’t have any long-term consequences but the treatment itself is grueling
– HIV test: psychological impact

When false negatives are more important than false positives:
– If early treatment is important for good outcomes
– In quality control: a defective item passes through the cracks!
– Software testing: a test to catch a virus has failed

Source

[ VIDEO OF THE WEEK]

@RCKashyap @Cylance on State of Security & Technologist Mindset #FutureOfData #Podcast

 @RCKashyap @Cylance on State of Security & Technologist Mindset #FutureOfData #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Data are becoming the new raw material of business. – Craig Mundie

[ PODCAST OF THE WEEK]

#GlobalBusiness at the speed of The #BigAnalytics

 #GlobalBusiness at the speed of The #BigAnalytics

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Data production will be 44 times greater in 2020 than it was in 2009.

Sourced from: Analytics.CLUB #WEB Newsletter

Best Practices for Using Context Variables with Talend – Part 1

A question I was regularly asked when working on different customer sites and answering questions on forums was “What is the best practice when using context variables?”

My years of working with Talend have led me to work with context variables in a way that minimizes the effort I need to put into ongoing maintenance and moving them between environments. This blog series is intended to give you an insight into the best practices I use as well as highlight the potential pitfalls that can arise from using the Talend context variable functionality without fully understanding it.

Contexts, Context Variables and Context Groups

To start, I want to ensure that we are all on the same page with regard to terminology. There are 3 ways “Context” is used in Talend:

  • Context variable: A variable which can be set either at compile time or runtime. It can be changed and allows variables which would otherwise be hardcoded to be more dynamic.
  • Context: The environment or category of the value held by the context variable. Most of the time Contexts are DEV, TEST, PROD, UAT, etc. This allows you to set up one context variable and assign a different value per environment.
  • Context Group: A group of context variables which are packaged together for ease of use. Context Groups can be dragged and dropped into jobs so that you do not have to set up the same context variables in different jobs. They can also be updated (added to) in one location and then the changes can be distributed to the jobs that use those Context Groups.

I’ve found that many people will refer to “context variables” as “contexts”. This leads to confusion in discussions, so if these terms are used incorrectly online it really can confuse the issue. So, now that we have a common set of definitions, let’s move forward.

Potential Pitfalls with Contexts

While context variables are incredibly useful when working with Talend, they can also introduce some unforeseen problems if not fully understood. The biggest cause of problems in my experience are the contexts. Quite simply, I do not use anything but a Default Context.

At the beginning of your Talend journey, they come across as a genius idea which allows developers to build against one environment, using that environment’s context variable values, then when the code is ready to test, change the context at the flick of a switch. That is true (kind of), but mainly for smaller data integration jobs. However, more often than not they open up developers and testers to horrible and time-consuming unexpected behavior. Below is just one scenario demonstrating this.

<>

Let’s say a developer has built a job which uses a Context Group configured to supply database connection parameters. She has set up 4 Contexts (DEV1, DEV2, TEST and PROD) and has configured the different Context Variable values for each Context. In her main job, she reads from the database and then passes some of the data to Child Jobs using tRunJob components. Some of these Child Jobs have their own Child Jobs and all Child Jobs make use of the database. Thus, all jobs make use of the Context Group holding the database credentials. While she is developing, she sets the Context within the tRunJobs to DEV1. This is great. She can debug her Job until she is happy that it is working. However, she needs to test on DEV2 because it has a slightly cleaner environment. When she runs the Parent Job she changes the default Context from DEV1 to DEV2 and runs the Job. It seems to work, but she cannot see the database updates in her DEV1 database. Why? She then realizes that her Child Jobs are all defaulted to use DEV1 and not DEV2.

Now there are ways around this, she could ensure that all of her tRunJobs are set with the correct Context. But what if she has dozens of them? How long will that take? She could ensure that “Transmit whole context” is set in each tRunJob. But what happens if a Child Job is using a Context variable or Context Group that is not used by any of the Parent Jobs? We are back to the same problem of having to change all of the tRunJob Contexts. But this doesn’t affect us outside of the Talend Studio, right? Wrong.

If the developer compiled that job to use on the command-line, even if she sets “Apply Context to children jobs” on the Build Job page, all this does is hardcode all of the Child Jobs’ Contexts to that selected in the Context scripts drop down. When you run it, if you change the Context that the Job needs to run for, the Child Jobs stick with the one that has been compiled. The same thing happens in the Talend Administration Center (TAC) as well.

Now, this does have some uses. Maybe your Contexts are not for environments and you want to be able to use different Contexts within the same environment? That is a legitimate (if not slightly unusual) scenario. There are other examples of these sorts of problems, but I think you get the idea.

In the early days of Talend, Contexts were brilliant. But these days (unless you have a particular use case where multiple Contexts are used within a single environment), there are better ways of handling Context variables for multiple environments. I’ll cover all of those ways and best practices in part two and three of our blog series coming out next week. Until next time!

The post Best Practices for Using Context Variables with Talend – Part 1 appeared first on Talend Real-Time Open Source Data Integration Software.

Originally Posted at: Best Practices for Using Context Variables with Talend – Part 1 by analyticsweekpick

Aug 15, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

Warning: file_get_contents(http://news.analyticsweek.com/tw/newspull.php): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://news.analyticsweek.com/tw/newspull.php): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://news.analyticsweek.com/tw/newspull.php): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

[  COVER OF THE WEEK ]

image
Complex data  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Jul 26, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

>> Tutorial: Azure Data Lake analytics with R by analyticsweek

>> Underpinning the Internet of Things with GPUs by jelaniharper

Wanna write? Click Here

[ FEATURED COURSE]

R, ggplot, and Simple Linear Regression

image

Begin to use R and ggplot while learning the basics of linear regression… more

[ FEATURED READ]

The Black Swan: The Impact of the Highly Improbable

image

A black swan is an event, positive or negative, that is deemed improbable yet causes massive consequences. In this groundbreaking and prophetic book, Taleb shows in a playful way that Black Swan events explain almost eve… more

[ TIPS & TRICKS OF THE WEEK]

Data Analytics Success Starts with Empowerment
Being Data Driven is not as much of a tech challenge as it is an adoption challenge. Adoption has it’s root in cultural DNA of any organization. Great data driven organizations rungs the data driven culture into the corporate DNA. A culture of connection, interactions, sharing and collaboration is what it takes to be data driven. Its about being empowered more than its about being educated.

[ DATA SCIENCE Q&A]

Q:Do you think 50 small decision trees are better than a large one? Why?
A: * Yes!
* More robust model (ensemble of weak learners that come and make a strong learner)
* Better to improve a model by taking many small steps than fewer large steps
* If one tree is erroneous, it can be auto-corrected by the following
* Less prone to overfitting

Source

[ VIDEO OF THE WEEK]

#DataScience Approach to Reducing #Employee #Attrition

 #DataScience Approach to Reducing #Employee #Attrition

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

If you can’t explain it simply, you don’t understand it well enough. – Albert Einstein

[ PODCAST OF THE WEEK]

Solving #FutureOfOrgs with #Detonate mindset (by @steven_goldbach & @geofftuff) #FutureOfData #Podcast

 Solving #FutureOfOrgs with #Detonate mindset (by @steven_goldbach & @geofftuff) #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Walmart handles more than 1 million customer transactions every hour, which is imported into databases estimated to contain more than 2.5 petabytes of data.

Sourced from: Analytics.CLUB #WEB Newsletter

Data skills – Many hands make light work in a world awash with data

While the transformation to a data-driven culture needs to come from the top of the organization, data skills must permeate through all areas of the business.

Rather than being the responsibility of one person or department, assuring data availability and integrity must be a team sport in modern data-centric businesses. Everyone must be involved and made accountable throughout the process.  

The challenge for enterprises is to effectively enable greater data access among the workforce while maintaining oversight and quality.

The Evolution of the Data Team

Businesses are recognizing the value and opportunities that data creates. There is an understanding that data needs to be handled and processed efficiently. For some companies, this has led to the formation of a new department of data analysts and scientists.

The data team is led by a Chief Data Officer (CDO), a role that is set to become key to business success in the digital era, according to recent research from Gartner. While earlier iterations of roles within the data team centered on data governance, data quality and regulatory issues, the focus is shifting. Data analysts and scientists are now expected to contribute and deliver a data-driven culture across the company, while also driving business value. According to the Gartner survey, the skills required for roles within the data team have expanded to span data management, analytics, data science, ethics, and digital transformation.

Businesses are clearly recognizing the importance of the data team’s functions and are making significant investments in it. Office budgets for the data team increased by an impressive 23% between 2016 and 2017 according to Gartner. What’s more, some 15% of the CDOs that took part in the study revealed that their budgets were more than $20 million for their departments, compared with just 7% who said the same in 2016. The increasing popularity and evolution of these new data roles has largely been driven by GDPR in Europe and by new data protection regulations in the US. And the evidence suggests that the position will be essential for ensuring the successful transfer of data skills throughout businesses of all sizes.

The Data Skills Shortage

Data is an incredibly valuable resource, but businesses can only unlock its full potential if they have the talent to analyze that data and produce actionable insights that help them to better understand their customers’ needs. However, companies are already struggling to cope with the big data ecosystem due to a skills shortage and the problem shows little sign of improving. In fact, Europe could see a shortage of up to 500,000 IT professionals by 2020, according to the latest research from consultancy firm Empirica.

The rapidly evolving digital landscape is partly to blame as the skills required have changed radically in recent years. The required data science skills needed at today’s data-driven companies are more wide-ranging than ever before. The modern workforce is now required to have a firm grasp of computer science including everything from databases to the cloud, according to strategic advisor and best-selling author Bernard Marr. In addition, analytical skills are essential to make sense of the ever-increasing data gathered by enterprises, while mathematical skills are also vital as much of the data captured will be numerical as this is largely due to IoT and sensor data. These skills must also sit alongside more traditional business and communication skills, as well as the ability to be creative and adapt to developing technologies.

The need for these skills is set to increase, with IBM predicting that the number of jobs for data professionals will rise by a massive 28% by 2020. The good news is that businesses are already recognizing the importance of digital skills in the workforce, with the role of Data Scientist taking the number one spot in Glassdoor’s Best Jobs in America for the past three years, with a staggering 4,524 positions available in 2018. 

Data Training Employees

Data quality management is a task that extends across all functional areas of a company. It, therefore, makes sense to provide the employees in the specialist departments with tools to ensure data quality in self-service. Cloud-based tools that can be rolled out quickly and easily in the departments are essential. This way, companies can gradually improve their data quality whilst also increasing the value of their data.

While the number of data workers triples and to stay competitive with GDPR, businesses must think of good data management as a team sport. Investing in the Chief Data Officer role and data skills now will enable forward-thinking businesses to reap the rewards, both in the short-term and further into the future.

The post Data skills – Many hands make light work in a world awash with data appeared first on Talend Real-Time Open Source Data Integration Software.

Originally Posted at: Data skills – Many hands make light work in a world awash with data by analyticsweekpick

Aug 08, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

Warning: file_get_contents(http://news.analyticsweek.com/tw/newspull.php): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://news.analyticsweek.com/tw/newspull.php): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

Warning: file_get_contents(http://news.analyticsweek.com/tw/newspull.php): failed to open stream: HTTP request failed! in /home3/vishaltao/public_html/mytao/script/includeit.php on line 15

[  COVER OF THE WEEK ]

image
Trust the data  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> The 4 Common Challenges of Predictive Analytics by analyticsweek

>> Inside CXM: New Global Thought Leader Hub for Customer Experience Professionals by bobehayes

>> What Motivates People to Take Free Surveys? by analyticsweek

Wanna write? Click Here

[ FEATURED COURSE]

Deep Learning Prerequisites: The Numpy Stack in Python

image

The Numpy, Scipy, Pandas, and Matplotlib stack: prep for deep learning, machine learning, and artificial intelligence… more

[ FEATURED READ]

Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 4th Edition

image

The eagerly anticipated Fourth Edition of the title that pioneered the comparison of qualitative, quantitative, and mixed methods research design is here! For all three approaches, Creswell includes a preliminary conside… more

[ TIPS & TRICKS OF THE WEEK]

Data Have Meaning
We live in a Big Data world in which everything is quantified. While the emphasis of Big Data has been focused on distinguishing the three characteristics of data (the infamous three Vs), we need to be cognizant of the fact that data have meaning. That is, the numbers in your data represent something of interest, an outcome that is important to your business. The meaning of those numbers is about the veracity of your data.

[ DATA SCIENCE Q&A]

Q:How frequently an algorithm must be updated?
A: You want to update an algorithm when:
– You want the model to evolve as data streams through infrastructure
– The underlying data source is changing
– Example: a retail store model that remains accurate as the business grows
– Dealing with non-stationarity

Some options:
– Incremental algorithms: the model is updated every time it sees a new training example
Note: simple, you always have an up-to-date model but you can’t incorporate data to different degrees.
Sometimes mandatory: when data must be discarded once seen (privacy)
– Periodic re-training in “batch” mode: simply buffer the relevant data and update the model every-so-often
Note: more decisions and more complex implementations

How frequently?
– Is the sacrifice worth it?
– Data horizon: how quickly do you need the most recent training example to be part of your model?
– Data obsolescence: how long does it take before data is irrelevant to the model? Are some older instances
more relevant than the newer ones?
Economics: generally, newer instances are more relevant than older ones. However, data from the same month, quarter or year of the last year can be more relevant than the same periods of the current year. In a recession period: data from previous recessions can be more relevant than newer data from different economic cycles.

Source

[ VIDEO OF THE WEEK]

Want to fix #DataScience ? fix #governance by @StephenGatchell @Dell #FutureOfData #Podcast

 Want to fix #DataScience ? fix #governance by @StephenGatchell @Dell #FutureOfData #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Torture the data, and it will confess to anything. – Ronald Coase

[ PODCAST OF THE WEEK]

#FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership

 #FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Bad data or poor data quality costs US businesses $600 billion annually.

Sourced from: Analytics.CLUB #WEB Newsletter

How big data can improve manufacturing

Manufacturers taking advantage of advanced analytics can reduce process flaws, saving time and money.

In the past 20 years or so, manufacturers have been able to reduce waste and variability in their production processes and dramatically improve product quality and yield (the amount of output per unit of input) by implementing lean and Six Sigma programs. However, in certain processing environments—pharmaceuticals, chemicals, and mining, for instance—extreme swings in variability are a fact of life, sometimes even after lean techniques have been applied. Given the sheer number and complexity of production activities that influence yield in these and other industries, manufacturers need a more granular approach to diagnosing and correcting process flaws. Advanced analytics provides just such an approach.

Advanced analytics refers to the application of statistics and other mathematical tools to business data in order to assess and improve practices (exhibit). In manufacturing, operations managers can use advanced analytics to take a deep dive into historical process data, identify patterns and relationships among discrete process steps and inputs, and then optimize the factors that prove to have the greatest effect on yield. Many global manufacturers in a range of industries and geographies now have an abundance of real-time shop-floor data and the capability to conduct such sophisticated statistical assessments. They are taking previously isolated data sets, aggregating them, and analyzing them to reveal important insights.

Exhibit

Consider the production of biopharmaceuticals, a category of healthcare products that includes vaccines, hormones, and blood components. They are manufactured using live, genetically engineered cells, and production teams must often monitor more than 200 variables within the production flow to ensure the purity of the ingredients as well as the substances being made. Two batches of a particular substance, produced using an identical process, can still exhibit a variation in yield of between 50 and 100 percent. This huge unexplained variability can create issues with capacity and product quality and can draw increased regulatory scrutiny.

One top-five biopharmaceuticals maker used advanced analytics to significantly increase its yield in vaccine production while incurring no additional capital expenditures. The company segmented its entire process into clusters of closely related production activities; for each cluster, it took far-flung data about process steps and the materials used and gathered them in a central database.

A project team then applied various forms of statistical analysis to the data to determine interdependencies among the different process parameters (upstream and downstream) and their impact on yield. Nine parameters proved to be most influential, especially time to inoculate cells and conductivity measures associated with one of the chromatography steps. The manufacturer made targeted process changes to account for these nine parameters and was able to increase its vaccine yield by more than 50 percent—worth between $5 million and $10 million in yearly savings for a single substance, one of hundreds it produces.

Developing unexpected insights

Even within manufacturing operations that are considered best in class, the use of advanced analytics may reveal further opportunities to increase yield. This was the case at one established European maker of functional and specialty chemicals for a number of industries, including paper, detergents, and metalworking. It boasted a strong history of process improvements since the 1960s, and its average yield was consistently higher than industry benchmarks. In fact, staffers were skeptical that there was much room for improvement. “This is the plant that everybody uses as a reference,” one engineer pointed out.

However, several unexpected insights emerged when the company used neural-network techniques (a form of advanced analytics based on the way the human brain processes information) to measure and compare the relative impact of different production inputs on yield. Among the factors it examined were coolant pressures, temperatures, quantity, and carbon dioxide flow. The analysis revealed a number of previously unseen sensitivities—for instance, levels of variability in carbon dioxide flow prompted significant reductions in yield. By resetting its parameters accordingly, the chemical company was able to reduce its waste of raw materials by 20 percent and its energy costs by around 15 percent, thereby improving overall yield. It is now implementing advanced process controls to complement its basic systems and steer production automatically.

Meanwhile, a precious-metals mine was able to increase its yield and profitability by rigorously assessing production data that were less than complete. The mine was going through a period in which the grade of its ore was declining; one of the only ways it could maintain production levels was to try to speed up or otherwise optimize its extraction and refining processes. The recovery of precious metals from ore is incredibly complex, typically involving between 10 and 15 variables and more than 15 pieces of machinery; extraction treatments may include cyanidation, oxidation, grinding, and leaching.

The production and process data that the operations team at the mine were working with were extremely fragmented, so the first step for the analytics team was to clean it up, using mathematical approaches to reconcile inconsistencies and account for information gaps. The team then examined the data on a number of process parameters—reagents, flow rates, density, and so on—before recognizing that variability in levels of dissolved oxygen (a key parameter in the leaching process) seemed to have the biggest impact on yield. Specifically, the team spotted fluctuations in oxygen concentration, which indicated that there were challenges in process control. The analysis also showed that the best demonstrated performance at the mine occurred on days in which oxygen levels were highest.

As a result of these findings, the mine made minor changes to its leach-recovery processes and increased its average yield by 3.7 percent within three months—a significant gain in a period during which ore grade had declined by some 20 percent. The increase in yield translated into a sustainable $10 million to $20 million annual profit impact for the mine, without it having to make additional capital investments or implement major change initiatives.

Capitalizing on big data

The critical first step for manufacturers that want to use advanced analytics to improve yield is to consider how much data the company has at its disposal. Most companies collect vast troves of process data but typically use them only for tracking purposes, not as a basis for improving operations. For these players, the challenge is to invest in the systems and skill sets that will allow them to optimize their use of existing process information—for instance, centralizing or indexing data from multiple sources so they can be analyzed more easily and hiring data analysts who are trained in spotting patterns and drawing actionable insights from information.

Some companies, particularly those with months- and sometimes years-long production cycles, have too little data to be statistically meaningful when put under an analyst’s lens. The challenge for senior leaders at these companies will be taking a long-term focus and investing in systems and practices to collect more data. They can invest incrementally—for instance, gathering information about one particularly important or particularly complex process step within the larger chain of activities, and then applying sophisticated analysis to that part of the process.

The big data era has only just emerged, but the practice of advanced analytics is grounded in years of mathematical research and scientific application. It can be a critical tool for realizing improvements in yield, particularly in any manufacturing environment in which process complexity, process variability, and capacity restraints are present. Indeed, companies that successfully build up their capabilities in conducting quantitative assessments can set themselves far apart from competitors.

About the authors

Eric Auschitzky is a consultant in McKinsey’s Lyon office, Markus Hammer is a senior expert in the Lisbon office, and Agesan Rajagopaul is an associate principal in the Johannesburg office.

The authors would like to thank Stewart Goodman, Jean-Baptiste Pelletier, Paul Rutten, Alberto Santagostino, Christoph Schmitz, and Ken Somers for their contributions to this article.

Originally posted via “How big data can improve manufacturing”

PDF

Source by analyticsweekpick