Jan 30, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Image: Conditional Risk (Source)

[ AnalyticsWeek BYTES]

>> Entrepreneurs Wishlist for Santa by v1shal

>> Building Big Analytics as a Sustainable Competitive Advantage by v1shal

>> Accountants Increasingly Use Data Analysis to Catch Fraud by analyticsweekpick

Wanna write? Click Here

[ FEATURED COURSE]

Tackle Real Data Challenges


Learn scalable data management, evaluate big data technologies, and design effective visualizations…. more

[ FEATURED READ]

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World


In the world’s top research labs and universities, the race is on to invent the ultimate learning algorithm: one capable of discovering any knowledge from data, and doing anything we want, before we even ask. In The Mast… more

[ TIPS & TRICKS OF THE WEEK]

Data Analytics Success Starts with Empowerment
Being data driven is not so much a technology challenge as an adoption challenge. Adoption has its roots in the cultural DNA of an organization. Great data-driven organizations weave the data-driven culture into their corporate DNA. A culture of connection, interaction, sharing, and collaboration is what it takes to be data driven. It's about being empowered more than it is about being educated.

[ DATA SCIENCE Q&A]

Q: Explain what a long-tailed distribution is and provide three examples of relevant phenomena that have long tails. Why are they important in classification and regression problems?
A: * In long-tailed distributions, a high-frequency population is followed by a low-frequency population that gradually tails off asymptotically
* Rule of thumb: the majority of occurrences (more than half, and 80% when the Pareto principle applies) are accounted for by the first 20% of items in the distribution
* Collectively, the least frequently occurring 80% of items still account for a meaningful share of the total population
* Related concepts: Zipf's law, the Pareto distribution, power laws

Examples:
1. Natural language
– Given a corpus of natural language, the frequency of any word is inversely proportional to its rank in the frequency table
– The most frequent word occurs roughly twice as often as the second most frequent, three times as often as the third most frequent, and so on
– "The" accounts for about 7% of all word occurrences (70,000 per million)
– "of" accounts for about 3.5%, followed by "and"…
– Only 135 vocabulary items are needed to account for half the English corpus!

2. Allocation of wealth among individuals: the larger portion of the wealth of any society is controlled by a smaller percentage of the people

3. File size distribution of Internet traffic

Additional: hard disk error rates, values of oil reserves in a field (a few large fields, many small ones), sizes of sand particles, sizes of meteorites

Importance in classification and regression problems:
– The distribution is skewed
– Which metrics to use? Beware the accuracy paradox in classification; consider the F-score or AUC instead
– Models that assume linearity (e.g., linear regression) may require a monotone transformation of the data (logarithm, square root, sigmoid…)
– Sampling can make the data even more unbalanced: use stratified sampling rather than plain random sampling, SMOTE ("Synthetic Minority Over-sampling Technique", N. V. Chawla et al.), or an anomaly-detection approach
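To make the shape of such a distribution concrete, here is a minimal sketch, assuming only the Python standard library and a made-up toy corpus (both are illustrative, not from the original answer), that ranks word frequencies in the spirit of the Zipf example above:

```python
# Minimal sketch: word-frequency ranking on a toy corpus to show the
# long-tail / Zipf pattern. The corpus string is made up for illustration.
from collections import Counter

corpus = (
    "the cat sat on the mat and the dog sat on the rug "
    "the cat and the dog saw the bird on the mat"
).split()

counts = Counter(corpus)
total = len(corpus)

# Rank words by frequency and show each word's share of all tokens:
# a few head words dominate, and the tail of rare words trails off.
for rank, (word, count) in enumerate(counts.most_common(), start=1):
    print(f"rank {rank:2d}  {word:5s}  count={count:2d}  share={count / total:.2%}")
```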

Source

[ VIDEO OF THE WEEK]

Understanding #FutureOfData in #Health & #Medicine - @thedataguru / @InovaHealth #FutureOfData #Podcast


Subscribe to YouTube

[ QUOTE OF THE WEEK]

Processed data is information. Processed information is knowledge. Processed knowledge is wisdom. – Ankala V. Subbarao

[ PODCAST OF THE WEEK]

#FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

In one survey, by a small but noticeable margin, executives at small companies (fewer than 1,000 employees) are nearly 10 percent more likely than their counterparts at large enterprises to view data as a strategic differentiator.

Sourced from: Analytics.CLUB #WEB Newsletter

Self-Service Master Data Management

Once data is under management in its best-fit, leverageable platform in an organization, it is as prepared as it can be to serve its many purposes. It is in position to be used operationally and analytically, across the full spectrum of need. Ideas emerge from business areas no longer encumbered with the burden of managing data, which can be 60%–70% of the effort to bring an idea to reality. Walls of distrust in data come down, and the organization can truly excel with an important barrier to success removed.

An important goal of the information management function in an organization is to get all data under management by this definition, and to keep it under management as systems come and go over time.

Master Data Management (MDM) is one of these key leveragable platforms. It is the elegant place for data with widespread use in the organization. It becomes the system of record for customer, product, store, material, reference and all other non-transactional data. MDM data can be accessed directly from the hub or, more commonly, mapped and distributed widely throughout the organization. This use of MDM data does not even account for the significant MDM benefit of efficiently creating and curating master data to begin with.

MDM benefits are many, including hierarchy management, data quality, data governance/workflow, data curation, and data distribution. One overlooked benefit is simply having a database where trusted data can be accessed. As with any data provisioned for access, visualization matters. Because MDM data is strongly associative, a graph representation works quite well.

Graph traversals are a natural way for analyzing network patterns. Graphs can handle high degrees of separation with ease and facilitate visualization and exploration of networks and hierarchies. Graph databases themselves are no substitute for MDM as they provide only one of the many necessary functions that an MDM tool does. However, when graph technology is embedded within MDM, such as what IBM is doing in InfoSphere MDM – it is very powerful.
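To illustrate why graphs suit master-data relationships, here is a minimal sketch using the networkx library. The library choice and the customer hierarchy are assumptions for illustration; this is not the graph engine embedded in InfoSphere MDM or any other product.

```python
# Minimal sketch: a small master-data hierarchy represented as a directed
# graph, with roll-up, drill-down, and "sideways" traversals.
import networkx as nx

G = nx.DiGraph()
# Hypothetical "organization contains subsidiary" hierarchy.
G.add_edge("Acme Holdings", "Acme Europe")
G.add_edge("Acme Holdings", "Acme Americas")
G.add_edge("Acme Europe", "Acme France")
G.add_edge("Acme Europe", "Acme Germany")
G.add_edge("Acme Americas", "Acme Canada")

# Roll up: every entity that ultimately belongs to the top of the hierarchy.
print(sorted(nx.descendants(G, "Acme Holdings")))

# Drill down from a mid-level node, e.g. for a regional report.
print(sorted(nx.descendants(G, "Acme Europe")))

# "Walk sideways": how two entities connect, ignoring edge direction.
print(nx.shortest_path(G.to_undirected(), "Acme France", "Acme Canada"))
```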

Graph technology is one of the many ways to facilitate self-service to MDM. Long a goal of business intelligence, self-service has significant applicability to MDM as well. Self-service is opportunity oriented. Users may want to validate a hypothesis, experiment, innovate, and so on. Long development cycles or laborious processes between a user and the data can be frustrating.

Historically, the burden for all MDM functions has fallen squarely on a centralized development function. It is overloaded and, as with the self-service business intelligence movement, needs disintermediation. IBM is fundamentally changing this dynamic with the next release of InfoSphere MDM. Its self-service data import, matching, and lightweight analytics allow the business user to find, share, and get insight from both MDM and other data.

Then there’s Big Match. Big Match can analyze structured and unstructured customer data together to gain deeper customer insights. It can enable fast, efficient linking of data from multiple sources to grow and curate customer information. The majority of the information in your organization that is not under management is unstructured data. Unstructured data has always been a valuable asset to organizations, but it can be difficult to manage. Emails, documents, medical records, contracts, design specifications, legal agreements, advertisements, delivery instructions, and other text-based sources of information do not fit neatly into tabular relational databases. Most BI tools on MDM data offer the ability to drill down and roll up data in reports and dashboards, which is good. But what about the ability to “walk sideways” across data sources to discover how different parts of the business interrelate?

Using unstructured data for customer profiling allows organizations to unify diverse data from inside and outside the enterprise—even the “ugly” stuff; that is, dirty data that is incompatible with highly structured, fact-dimension data that would have been too costly to combine using traditional integration and ETL methods.

Finally, unstructured data management enables text analytics, so that organizations can gain insight into customer sentiment, competitive trends, current news trends, and other critical business information. In text analytics, everything is fair game for consideration, including customer complaints, product reviews from the web, call center transcripts, medical records, and comment/note fields in an operational system. Combining unstructured data with artificial intelligence and natural language processing can extract new attributes and facts for entities such as people, location, and sentiment from text, which can then be used to enrich the analytic experience.

All of these uses and capabilities are enhanced if they can be provided using a self-service interface that users can easily leverage to enrich data from within their apps and sources. This opens up a whole new world for discovery.

With graph technology, distribution of the publishing function, and the integration of all data including unstructured data, MDM can truly have important data under management, empower the business user, be a cornerstone of digital transformation, and truly be self-service.

Source by analyticsweek

Sep 28, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Image: Fake data (Source)

[ NEWS BYTES]

>> Microsoft Azure to Feature New Big Data Analytics Platform – MeriTalk (blog) Under Big Data Analytics

>> Digital Guardian Declares a New Dawn for Data Loss Prevention – insideBIGDATA Under Big Data Security

>> The DCIM tool and its place in the modern data center – TechTarget Under Data Center

More NEWS? Click Here

[ FEATURED COURSE]

Statistical Thinking and Data Analysis


This course is an introduction to statistical data analysis. Topics are chosen from applied probability, sampling, estimation, hypothesis testing, linear regression, analysis of variance, categorical data analysis, and n… more

[ FEATURED READ]

How to Create a Mind: The Secret of Human Thought Revealed


Ray Kurzweil is arguably today’s most influential—and often controversial—futurist. In How to Create a Mind, Kurzweil presents a provocative exploration of the most important project in human-machine civilization—reverse… more

[ TIPS & TRICKS OF THE WEEK]

Data Analytics Success Starts with Empowerment
Being data driven is not so much a technology challenge as an adoption challenge. Adoption has its roots in the cultural DNA of an organization. Great data-driven organizations weave the data-driven culture into their corporate DNA. A culture of connection, interaction, sharing, and collaboration is what it takes to be data driven. It's about being empowered more than it is about being educated.

[ DATA SCIENCE Q&A]

Q:What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?
A: Lift:
It's a measure of the performance of a targeting model (or a rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole), measured against a random-choice targeting model. Lift is simply: target response / average response.

Suppose a population has an average response rate of 5% (for a mailing, for instance). If a model (or rule) identifies a segment with a response rate of 20%, then lift = 20/5 = 4.

Typically, the modeler divides the population into quantiles and ranks the quantiles by lift. Each quantile can then be considered and, by weighing the predicted response rate against the cost, the modeler can decide whether to market to it. For example: "If we use the probability scores on customers, we can get 60% of the total responders we'd get from mailing randomly by mailing only the top 30% of the scored customers." A minimal lift-by-decile calculation is sketched below.
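Here is a small sketch of that lift-by-decile computation, assuming NumPy; the scores and response rates are synthetic and purely illustrative, not taken from the example above.

```python
# Minimal sketch: lift per decile from model scores and binary responses.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
scores = rng.random(n)                                  # hypothetical model scores
responded = rng.random(n) < (0.02 + 0.06 * scores)      # response more likely at high scores

overall_rate = responded.mean()

# Rank customers by score (best first) and split into 10 equal-sized deciles.
order = np.argsort(-scores)
deciles = np.array_split(responded[order], 10)

for i, d in enumerate(deciles, start=1):
    lift = d.mean() / overall_rate                      # decile response rate vs. average
    print(f"decile {i:2d}: response rate={d.mean():.3f}  lift={lift:.2f}")
```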

KPI:
– Key performance indicator
– A type of performance measurement
– Examples: 0 defects, 10/10 customer satisfaction
– Relies upon a good understanding of what is important to the organization

More examples:

Marketing & Sales:
– New customers acquisition
– Customer attrition
– Revenue (turnover) generated by segments of the customer population
– Often done with a data management platform

IT operations:
– Mean time between failure
– Mean time to repair

Robustness:
– Statistics with good performance even if the underlying distribution is not normal
– Statistics that are not affected by outliers
– A learning algorithm that can reduce the chance of fitting noise is called robust
– Median is a robust measure of central tendency, while mean is not
– Median absolute deviation is also more robust than the standard deviation
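A quick numeric sketch of those last two bullets, assuming NumPy and a made-up sample containing a single gross outlier:

```python
# Minimal sketch: mean vs. median (and std vs. MAD) under a single outlier.
import numpy as np

data = np.array([9.8, 10.1, 9.9, 10.2, 10.0, 95.0])    # one gross outlier

print("mean  :", data.mean())      # dragged far from the bulk of the data
print("median:", np.median(data))  # barely moves

mad = np.median(np.abs(data - np.median(data)))         # median absolute deviation
print("std   :", data.std(ddof=1))                      # inflated by the outlier
print("MAD   :", mad)                                   # stays small
```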

Model fitting:
– How well a statistical model fits a set of observations
– Examples: AIC, R2, Kolmogorov-Smirnov test, Chi 2, deviance (glm)

Design of experiments:
The design of any task that aims to describe or explain the variation of information under conditions that are hypothesized to reflect the variation.
In its simplest form, an experiment aims at predicting the outcome by changing the preconditions, the predictors.
– Selection of the suitable predictors and outcomes
– Delivery of the experiment under statistically optimal conditions
– Randomization
– Blocking: an experiment may be conducted with the same equipment to avoid any unwanted variations in the input
– Replication: performing the same combination run more than once, in order to get an estimate for the amount of random error that could be part of the process
– Interaction: when an experiment has 3 or more variables, the situation in which the interaction of two variables on a third is not additive

80/20 rule:
– Pareto principle
– 80% of the effects come from 20% of the causes
– 80% of your sales come from 20% of your clients
– 80% of a company's complaints come from 20% of its customers

Source

[ VIDEO OF THE WEEK]

Data-As-A-Service (#DAAS) to enable compliance reporting


Subscribe to YouTube

[ QUOTE OF THE WEEK]

Everybody gets so much information all day long that they lose their common sense. – Gertrude Stein

[ PODCAST OF THE WEEK]

#DataScience Approach to Reducing #Employee #Attrition


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

We are seeing massive growth in video and photo data: every minute, up to 300 hours of video are uploaded to YouTube alone.

Sourced from: Analytics.CLUB #WEB Newsletter

Source

Jan 23, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Image: Data security (Source)

[ AnalyticsWeek BYTES]

>> Apr 11, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

>> Can Police Use Data Science to Prevent Deadly Encounters? by analyticsweekpick

>> Tackling 4th Industrial Revolution with HR4.0 by v1shal

Wanna write? Click Here

[ FEATURED COURSE]

R Basics – R Programming Language Introduction


Learn the essentials of R Programming – R Beginner Level!… more

[ FEATURED READ]

Rise of the Robots: Technology and the Threat of a Jobless Future


What are the jobs of the future? How many will there be? And who will have them? As technology continues to accelerate and machines begin taking care of themselves, fewer people will be necessary. Artificial intelligence… more

[ TIPS & TRICKS OF THE WEEK]

Finding success in your data science career? Find a mentor
Yes, most of us don't feel the need, but most of us really could use one. Because most data science professionals work in relative isolation, getting an unbiased perspective is not easy. It is also often hard to see how a data science career will progress. A network of mentors addresses these issues: it gives data professionals an outside perspective and an unbiased ally. It is extremely important for successful data science professionals to build a mentor network and use it throughout their careers.

[ DATA SCIENCE Q&A]

Q:Define: quality assurance, six sigma?
A: Quality assurance:
– A way of preventing mistakes or defects in manufacturing products or when delivering services to customers
– In a machine learning context: anomaly detection

Six sigma:
– Set of techniques and tools for process improvement
– 99.99966% of products are defect-free (3.4 defects per 1 million opportunities)
– The nearest specification limit sits six standard deviations from the process mean
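A small sketch, assuming SciPy, that reconciles the two figures above: the familiar 3.4-per-million number follows from the Six Sigma convention of allowing a 1.5-sigma long-term drift of the process mean, so the defect rate is the normal tail beyond 6 - 1.5 = 4.5 standard deviations.

```python
# Minimal sketch: defect rates implied by a given sigma level, assuming a
# normally distributed process and the conventional 1.5-sigma long-term shift.
from scipy.stats import norm

def defects_per_million(sigma_level, shift=1.5):
    # One-sided tail beyond the specification limit after the mean drifts.
    return norm.sf(sigma_level - shift) * 1_000_000

for level in (3, 4, 5, 6):
    print(f"{level} sigma: {defects_per_million(level):,.1f} DPMO")

# The 6-sigma line prints roughly 3.4 defects per million opportunities.
```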

Source

[ VIDEO OF THE WEEK]

Scott Harrison (@SRHarrisonJD) on leading the learning organization #JobsOfFuture #Podcast


Subscribe to YouTube

[ QUOTE OF THE WEEK]

Getting information off the Internet is like taking a drink from a firehose. – Mitchell Kapor

[ PODCAST OF THE WEEK]

Solving #FutureOfOrgs with #Detonate mindset (by @steven_goldbach & @geofftuff) #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Every person in the US tweeting three tweets per minute for 26,976 years.

Sourced from: Analytics.CLUB #WEB Newsletter

Six Do’s and Don’ts of Collaborative Data Management

Data Quality Projects are not technical projects anymore. They are becoming collaborative and team driven.

As organizations strive to succeed at their digital transformation, data professionals realize they need to work as a team with business operations, since those are the people who need better data to run their operations. Being in the cockpit, Chief Data Officers need to master some simple but useful do's and don'ts about running their Data Quality Projects.

Let’s list a few of these.

 DO’S

 Set your expectations from the start.

Why Data Quality? What do you target? How deeply will you impact your organization's business performance? Find your Data Quality answers among business people. Make sure you know your finish line, so you can set intermediate goals and milestones on a project calendar.

Build your interdisciplinary team.

Of course, it's about having the right technical people on board: people who master Data Management Platforms. But it's also about finding the right people who understand how Data Quality impacts the business, and making them your local champions in their respective departments. For example, Digital Marketing experts often struggle with bad leads and low-performing tactics due to the lack of good contact information. Moreover, new regulations such as GDPR have made marketing professionals aware of how important personal data is. By putting tools such as Data Preparation in their hands, you give them a way to act on their data without losing control. They will be your allies in your Data Quality journey.

Deliver quick wins.

While it's key to stretch people's capabilities and set ambitious objectives, it's also necessary to prove very quickly that your data quality project has positive business value. Don't spend too much time on heavy planning. You need to prove business impact with immediate results. Some Talend customers achieved business results very quickly by enabling business people with apps such as Data Prep or Data Stewardship. If you deliver better and faster time to insight, you will gain instant credibility and people will support your project. After gaining credibility and confidence, it will be easier to ask for additional means when presenting your projects to the board. In the end, remember: many small wins make a big one.

DON’TS

Don’t underestimate the power of bad communication

We often think technical projects need technical answers, but Data Quality is a strategic topic, and it would be misleading to treat it as a purely technical challenge. To succeed, your project must be widely known within your organization. Take control of your own project story instead of letting bad communication spread across departments. For that, you must master the right mix of know-how and communication skills so that your results are known and properly communicated within your organization. Marketing suffers from bad leads, operations from missing information, strategists from biased insights. People may ask you to extend your project and solve their data quality issues, which is a good reason to ask for more budget.

Don't overengineer your projects, making them too complex and sophisticated.

Talend provides a simple and powerful platform to produce fast results, so you can start small and deliver big. One example of implementing Data Management from the start is Carhartt, which managed to clean 50,000 records in one day. You don't necessarily need to wait a long time to see results.

Don't leave the clock running or leave your team without clear direction

Set and meet deadlines as often as possible; it will bolster your credibility. Time runs fast and your organization may shift to short-term business priorities, so track your route and stay focused on your end goals. Make sure you deliver projects on time, then celebrate success. When finishing a project milestone, take the time to celebrate with your team and within the organization.

 

To learn more about Data Quality, please download our Definitive Guide to Data Quality.

 

The post Six Do’s and Don’ts of Collaborative Data Management appeared first on Talend Real-Time Open Source Data Integration Software.

Source: Six Do’s and Don’ts of Collaborative Data Management

August 14, 2017 Health and Biotech analytics news roundup

Biohackers Encoded Malware in a Strand of DNA: University of Washington researchers encoded a sequence of DNA to cause a buffer overflow in the program used to compress it, potentially allowing an attacker to take control. Note, however, that the attack was not always successful, and that the program had been modified to allow the attack to take place.

Geneformics Announces the First Truly Scalable Genomics Data Compression Solution to Accelerate the Migration of Precision Medicine to the Cloud: The large size of genomics data limits the ability to store and analyze it, which this solution helps to address.

The 5 Smartest Companies Analyzing Your DNA: Overviews of 23andMe, Illumina, Oxford Nanopore, Sophia Genetics, and Veritas Genetics.

New tumor database deployed to battle childhood Cancer at UC Santa Cruz: The database is public and free to use at https://treehouse.xenahubs.net.

Deep Learning Thrives in Cancer Moonshot: The CANcer Distributed Learning Environment (CANDLE) project is looking to develop new models and frameworks to help make new cancer therapies.

Source

Jan 16, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Image: Fake data (Source)

[ AnalyticsWeek BYTES]

>> Ashok Srivastava(@aerotrekker) @Intuit on Winning the Art of #DataScience #FutureOfData #Podcast by admin

>> CISOs’ newest fear? Criminals with a big data strategy by analyticsweekpick

>> How to Generate Game of Thrones Characters Using StyleGAN by administrator

Wanna write? Click Here

[ FEATURED COURSE]

A Course in Machine Learning


Machine learning is the study of algorithms that learn from data and experience. It is applied in a vast variety of application areas, from medicine to advertising, from military to pedestrian. Any area in which you need… more

[ FEATURED READ]

Superintelligence: Paths, Dangers, Strategies


The human brain has some capabilities that the brains of other animals lack. It is to these distinctive capabilities that our species owes its dominant position. Other animals have stronger muscles or sharper claws, but … more

[ TIPS & TRICKS OF THE WEEK]

Data aids, but does not replace, judgment
Data is a tool and a means to help build consensus and facilitate human decision-making, not replace it. Analysis converts data into information; information, given context, leads to insight. Insights lead to decisions, which ultimately lead to outcomes that bring value. So data is just the start; context and intuition still play a role.

[ DATA SCIENCE Q&A]

Q: Explain what a long-tailed distribution is and provide three examples of relevant phenomena that have long tails. Why are they important in classification and regression problems?
A: * In long-tailed distributions, a high-frequency population is followed by a low-frequency population that gradually tails off asymptotically
* Rule of thumb: the majority of occurrences (more than half, and 80% when the Pareto principle applies) are accounted for by the first 20% of items in the distribution
* Collectively, the least frequently occurring 80% of items still account for a meaningful share of the total population
* Related concepts: Zipf's law, the Pareto distribution, power laws

Examples:
1. Natural language
– Given a corpus of natural language, the frequency of any word is inversely proportional to its rank in the frequency table
– The most frequent word occurs roughly twice as often as the second most frequent, three times as often as the third most frequent, and so on
– "The" accounts for about 7% of all word occurrences (70,000 per million)
– "of" accounts for about 3.5%, followed by "and"…
– Only 135 vocabulary items are needed to account for half the English corpus!

2. Allocation of wealth among individuals: the larger portion of the wealth of any society is controlled by a smaller percentage of the people

3. File size distribution of Internet traffic

Additional: hard disk error rates, values of oil reserves in a field (a few large fields, many small ones), sizes of sand particles, sizes of meteorites

Importance in classification and regression problems:
– The distribution is skewed
– Which metrics to use? Beware the accuracy paradox in classification; consider the F-score or AUC instead
– Models that assume linearity (e.g., linear regression) may require a monotone transformation of the data (logarithm, square root, sigmoid…)
– Sampling can make the data even more unbalanced: use stratified sampling rather than plain random sampling, SMOTE ("Synthetic Minority Over-sampling Technique", N. V. Chawla et al.), or an anomaly-detection approach (see the stratified-sampling sketch below)
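Since the sampling bullet is the one that bites most often in practice, here is a minimal sketch, assuming scikit-learn and a synthetic imbalanced label vector (illustrative only), of how stratified sampling keeps the rare class represented in a split:

```python
# Minimal sketch: stratified vs. plain random splitting on imbalanced labels.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1_000, 3))
y = (rng.random(1_000) < 0.05).astype(int)      # ~5% positive class

# Plain random split: a small test set can badly misrepresent the rare class.
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.1, random_state=0)

# Stratified split: class proportions are preserved in train and test sets.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.1, random_state=0, stratify=y
)

print("overall positive rate :", y.mean())
print("plain split test rate :", y_test_plain.mean())
print("stratified test rate  :", y_test_strat.mean())
```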

Source

[ VIDEO OF THE WEEK]

Discussing #InfoSec with @travturn, @hrbrmstr(@rapid7) @thebearconomist(@boozallen) @yaxa_io


Subscribe to YouTube

[ QUOTE OF THE WEEK]

Torture the data, and it will confess to anything. – Ronald Coase

[ PODCAST OF THE WEEK]

@EdwardBoudrot / @Optum on #DesignThinking & #DataDriven Products #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Bad data or poor data quality costs US businesses $600 billion annually.

Sourced from: Analytics.CLUB #WEB Newsletter

Building Data Dashboards for Business Professionals


Everyone wants to get more out of their data, but how exactly to do that can leave you scratching your head. Our BI Best Practices demystify the analytics world and empower you with actionable how-to guidance.

Preserving insights

One of the biggest pitfalls in data is the preservation of insights when analysis is handed off from the data team to a business professional. Often, the data experts have been exploring the data for a while, developing a clear sense of its structure, assumptions, and conclusions. The data analyst has had a great opportunity to pinpoint an insight, but when it comes to sharing that work with the business person who will ultimately make a decision, they fail to fully communicate that information. Here are some tips to help you improve your data visualization so that you can add the most value to your teams via a dashboard.

The first step to making the most out of data collaborations is to set up a meeting where you discuss the relevant business questions. I outlined some important guidelines for this meeting in 6 Tips for Data Professionals to Improve Collaboration, so if you haven’t already read that, it’s a good place to start. Data experts should expect to come out of that meeting with a list of questions that they can translate into queries to build the initial dashboard. 

In this post, we’ll cover the next steps in the dashboarding process.  


Work from a blueprint

The first step in building a great dashboard is to review that list of questions and group them into larger buckets. When you read the entire list, think about which themes emerge. When you’re identifying those buckets, you want to look for general topics that are only answered by combining a few of the individual questions. In the ideal scenario, all of your individual questions can be grouped into a handful of broader themes.

Once the questions are grouped into buckets, it's time to build a blueprint of the dashboard using those general themes as the headers and the individual questions as the charts that fit underneath. My personal blueprinting process uses a lot of sticky notes for this. I'll write down each of the questions on my list, then arrange them into groups on paper until I'm happy with the design. It also helps me to make sketches of the individual charts to get the most complete idea of the final dashboard.

At this stage, I'm making decisions about what is included in each chart. For example, if there are three line charts next to each other that can be combined into one chart with three lines, this is the step where I decide to do that. If I combine charts, I need to make sure the title and scale of the combined chart fit all the information appropriately. If I keep three separate line charts, I think about how to format each one to show its unique information clearly.
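As a purely illustrative version of that chart-combining decision, here is a minimal matplotlib sketch with made-up monthly revenue series; the point is that one shared axis, a legend, and a specific title can carry the same information as three separate charts:

```python
# Minimal sketch: merging three related line charts into one shared-axis chart.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = {                      # hypothetical figures, in $ thousands
    "North America": [120, 132, 128, 140, 151, 158],
    "Europe":        [80, 85, 90, 88, 95, 102],
    "APAC":          [40, 48, 55, 61, 66, 75],
}

fig, ax = plt.subplots(figsize=(8, 4))
for region, values in revenue.items():
    ax.plot(months, values, marker="o", label=region)

ax.set_title("Monthly revenue by region ($K)")   # specific title, shared scale
ax.set_ylabel("Revenue ($K)")
ax.legend()
fig.tight_layout()
plt.show()
```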

Getting your charts in order

Once all the organizational work is done, I start arranging the charts to match my blueprint. Every question from that original list gets turned into a chart that’s placed on my dashboard according to the blueprint I made in the first part of the process. 

I like to build a dashboard in horizontal layers, with the very top layer being the most important high-level KPIs and then each of the layers below tackling one of the buckets I identified in blueprinting. To help guide users to understand the purpose and context for each section of the dashboard, I often use text as signposts. Additionally, within each chart, I use titles, colors, and other visual cues to help the charts explain themselves. Finally, when deciding on the arrangement of the charts within each section, I start with the chart that will be most frequently referenced on the left and then work my way to the right in order of decreasing frequency of views.

For the additional layers, I like to provide context for those high-level charts. For example, if the most critical chart in the dashboard is a revenue tracker, the layer directly underneath has more charts that answer questions related to revenue. The next layer down contains more detailed information about the second high-level chart, and so on. This design strategy lets that first row of the dashboard act not only as a quick summary of the most important information, it also turns it into a table of contents for the rest of the layers.

In my ideal dashboard, the top layer has one high-level KPI chart to summarize each of the buckets I identified in my initial conversation with the business professional who requested this data. Then, each section underneath that contains answers to all of the related questions we listed that belong to that topic, starting with the information that will need to be referenced the most frequently and working down to the information that will be referenced the least. Data is messy and it doesn’t always fit neatly into those layers, but this mindset makes it easy to compartmentalize and organize a long list of charts.

Readability means insight preservation

The goal of your dashboard isn’t to allow business professionals to easily find answers, it’s to help them find the right answers easily.

An important thing to remember when creating a dashboard is that most of your consumers are busy professionals with their own long list of work priorities and deliverables. When they read your dashboards, they are most likely looking for quick answers to a particular problem. They need to take away the important learnings that you’ve found in the data, but they won’t have the same amount of time to spend studying the data as you did. This attention gap is a place where the insight can erode significantly, so you need to make sure that it’s easy to get the right insights quickly.

One of the best ways to focus a reader’s eye is through the use of color. Using the same color for all the charts related to one topic of your dashboard is a shortcut to making sure all of that information is digested together. The goal of your dashboard isn’t to allow business professionals to easily find answers, it’s to help them find the right answers easily.

It’s always good practice to title the charts as specifically as possible to minimize confusion about the insights. Translating the blueprint you designed to an actual live dashboard will always result in a few unexpected hiccups, so it’s crucial that you review the dashboard as a whole and the topical layers for places where insights could get lost in the handoff back to a business professional. Through iteration, your dashboard will not only have all the right data, but it will also have a form that resonates with the person taking action, leading to higher adoption and higher impact.


Christine Quan is a seasoned data and analytics veteran, focused on data visualization theory and building tools to empower data teams. She’s an expert at constructing SQL queries and building visualizations in R, Python, or Javascript.

Source

October 3, 2016 Health and Biotech Analytics News Roundup

The latest health and biotech analytics news and opinion:

insideBIGDATA Guide to Healthcare & Life Sciences: The news source has come out with a white paper evaluating different areas of healthcare that require big data technology.

A $280 billion healthcare problem ripe for technology innovation and predictive analytics: Behavioral health analytics could potentially have a very large impact on health care costs.

The New England Journal of Medicine Announces the SPRINT Data Analysis Challenge: The contest is looking for new ways to evaluate clinical trial datasets.

Advances in Next-Generation Sequencing: Long reads, single-cell sequencing, and cancer screening with DNA in the blood are exciting new areas in DNA sequencing.

The Sequencing App and the Quest for Fun: Joe Pickrell, a Columbia biology professor, has launched a company providing low-quality, low-cost genomes. He hopes it will get more people interested in biology.

Originally Posted at: October 3, 2016 Health and Biotech Analytics News Roundup by pstein

Jan 09, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Image: Data Mining (Source)

[ AnalyticsWeek BYTES]

>> Will China use big data as a tool of the state? by analyticsweekpick

>> Review: myCharge Hubmax Portable Charge Power Bank by administrator

>> 6 Operational Reporting Capabilities to Consider by analyticsweek

Wanna write? Click Here

[ FEATURED COURSE]

Statistical Thinking and Data Analysis


This course is an introduction to statistical data analysis. Topics are chosen from applied probability, sampling, estimation, hypothesis testing, linear regression, analysis of variance, categorical data analysis, and n… more

[ FEATURED READ]

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython


Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored f… more

[ TIPS & TRICKS OF THE WEEK]

Data Have Meaning
We live in a Big Data world in which everything is quantified. While the emphasis of Big Data has been focused on distinguishing the three characteristics of data (the infamous three Vs), we need to be cognizant of the fact that data have meaning. That is, the numbers in your data represent something of interest, an outcome that is important to your business. The meaning of those numbers is about the veracity of your data.

[ DATA SCIENCE Q&A]

Q: Name a few famous APIs (for instance, Google Search)
A: Google APIs (Google Analytics, Picasa), the Twitter API (interact with Twitter functions), the GitHub API, the LinkedIn API (user data)… A minimal example call is sketched below.
Source
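To make that concrete, here is a minimal sketch of calling one of those public APIs, assuming the requests package is available; the GitHub endpoint is its public users endpoint and the username shown is just an example:

```python
# Minimal sketch: calling the public GitHub REST API for a user's profile.
# Unauthenticated requests work but are rate-limited; illustrative only.
import requests

response = requests.get(
    "https://api.github.com/users/octocat",
    headers={"Accept": "application/vnd.github+json"},
    timeout=10,
)
response.raise_for_status()

user = response.json()
print(user["login"], "-", user.get("name"), "-", user["public_repos"], "public repos")
```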

[ VIDEO OF THE WEEK]

Ashok Srivastava(@aerotrekker @intuit) on Winning the Art of #DataScience #FutureOfData #Podcast


Subscribe to YouTube

[ QUOTE OF THE WEEK]

If you can’t explain it simply, you don’t understand it well enough. – Albert Einstein

[ PODCAST OF THE WEEK]

#FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Estimates suggest that by better integrating big data, healthcare could save as much as $300 billion a year — that’s equal to reducing costs by $1000 a year for every man, woman, and child.

Sourced from: Analytics.CLUB #WEB Newsletter