Jun 25, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Ethics  Source

[ AnalyticsWeek BYTES]

>> How The Guardian’s Ophan analytics engine helps editors make better decisions by analyticsweekpick

>> May 23, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

>> Data Matching with Different Regional Data Sets by analyticsweekpick

Wanna write? Click Here

[ FEATURED COURSE]

Applied Data Science: An Introduction


As the world’s data grow exponentially, organizations across all sectors, including government and not-for-profit, need to understand, manage and use big, complex data sets—known as big data…. more

[ FEATURED READ]

Data Science from Scratch: First Principles with Python


Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn … more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics in decision making. Data- and analytics-driven decision making is rapidly working its way into our core corporate DNA, yet we are not building practice grounds to test those models fast enough. Snug-looking models can hide nails that cause uncharted pain if they go unchecked. Now is the right time to think about setting up an Analytics Club [Data Analytics CoE] in your workplace to lab out best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q: Provide a simple example of how an experimental design can help answer a question about behavior. How does experimental data contrast with observational data?
A: * Suppose you are researching the effect of music-listening on studying efficiency
* You might divide your subjects into two groups: one would listen to music while studying, and the other (the control group) wouldn’t listen to anything
* You then give both groups a test
* Finally, you compare grades between the two groups

Differences between observational and experimental data:
– Observational data: measures the characteristics of a population by studying individuals in a sample, but doesn’t attempt to manipulate or influence the variables of interest
– Experimental data: applies a treatment to individuals and attempts to isolate the effects of the treatment on a response variable

Observational data: find 100 women aged 30, of whom 50 have been smoking a pack a day for 10 years while the other 50 have been smoke-free for 10 years. Measure lung capacity for each of the 100 women. Analyze, interpret and draw conclusions from the data.

Experimental data: find 100 women aged 20 who don’t currently smoke. Randomly assign 50 of the 100 women to the smoking treatment and the other 50 to the no-smoking treatment. Those in the smoking group smoke a pack a day for 10 years while those in the control group remain smoke-free for 10 years. Measure lung capacity for each of the 100 women. Analyze, interpret and draw conclusions from the data.
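
As a hedged illustration of how such experimental data might be analyzed, here is a minimal simulation of the study above with a two-sample t-test; the effect size and noise level are made-up numbers, purely for illustration.

```python
# Minimal simulation of the experiment above: two groups of 50,
# one "smoking" treatment and one control, with lung capacity as the
# response variable. All numbers are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

control = rng.normal(loc=4.2, scale=0.5, size=50)    # litres, hypothetical
treatment = rng.normal(loc=3.6, scale=0.5, size=50)  # assumed lower mean capacity

# Two-sample t-test: is the difference in means larger than chance alone explains?
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"mean diff = {treatment.mean() - control.mean():.2f} L, p = {p_value:.4f}")
```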

Source

[ VIDEO OF THE WEEK]

@DrewConway on creating socially responsible data science practice #FutureOfData #Podcast


Subscribe on  YouTube

[ QUOTE OF THE WEEK]

Torture the data, and it will confess to anything. – Ronald Coase

[ PODCAST OF THE WEEK]

Solving #FutureOfWork with #Detonate mindset (by @steven_goldbach & @geofftuff) #JobsOfFuture #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Decoding the human genome originally took 10 years to process; now it can be achieved in one week.

Sourced from: Analytics.CLUB #WEB Newsletter

An Introduction to Central Limit Theorem

In machine learning, statistics plays a significant role in understanding data distributions and in inferential statistics. A data scientist must understand the math behind sample data, and the Central Limit Theorem answers most of these problems. Let us discuss the concept of the Central Limit Theorem. It assumes that the distribution in the sample […]
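
As a quick illustration of the theorem the post introduces, here is a minimal simulation sketch: sample means drawn from a heavily skewed population become approximately normal as the sample size grows. The population choice and sizes are illustrative assumptions.

```python
# Minimal CLT demo: means of samples from a skewed (exponential)
# population approach a normal distribution as sample size n grows.
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=1_000_000)  # skewed, true mean = 2

for n in (2, 30, 200):
    # Draw 10,000 samples of size n and record each sample's mean
    means = rng.choice(population, size=(10_000, n)).mean(axis=1)
    # The spread of the sample mean should shrink roughly like std/sqrt(n)
    print(f"n={n:3d}  mean of means={means.mean():.3f}  std of means={means.std():.3f}")
```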

The post An Introduction to Central Limit Theorem appeared first on GreatLearning.

Source

April 3, 2017 Health and Biotech analytics news roundup

Advantages of a Truly Open-Access Data-Sharing Model: Traditionally, data from clinical trials has been siloed, but now there is support for making such data open for all to use. Project Data Sphere is one such effort.

Detecting mutations could lead to earlier liver cancer diagnosis: Aflatoxin induces a mutation that can cause liver cancer. Now, MIT researchers have developed a method to detect this mutation before cancer develops.

Australia launches machine-learning centre to decrypt the personal genome: Geneticists and computer scientists have launched the Garvan-Deakin Program in Advanced Genomic Investigation (PAGI). They hope to work out the complex genetic causes of diseases.

More genomic sequencing announced for Victoria: Selected Australian patients will have access to genomic sequencing. This project is intended to help track drug-resistant “superbugs” as well as 4 other personalized conditions.

No, We Can’t Say Whether Cancer Is Mostly Bad Luck: Last week’s news on the mutations that cause cancer is disputed among cancer scientists.

Source: April 3, 2017 Health and Biotech analytics news roundup by pstein

Jun 18, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Statistics  Source

[ AnalyticsWeek BYTES]

>> Proactive Services Data Management in the Age of Hyper-Distribution by analyticsweekpick

>> How to install and use the Datumbox Machine Learning Framework by administrator

>> Oct 10, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

Wanna write? Click Here

[ FEATURED COURSE]

Artificial Intelligence


This course includes interactive demonstrations which are intended to stimulate interest and to help students gain intuition about how artificial intelligence methods work under a variety of circumstances…. more

[ FEATURED READ]

How to Create a Mind: The Secret of Human Thought Revealed


Ray Kurzweil is arguably today’s most influential—and often controversial—futurist. In How to Create a Mind, Kurzweil presents a provocative exploration of the most important project in human-machine civilization—reverse… more

[ TIPS & TRICKS OF THE WEEK]

Finding success in your data science career? Find a mentor
Yes, most of us don’t feel the need, but most of us really could use one. As most data science professionals work in isolation, getting an unbiased perspective is not easy. Many times, it is also not easy to foresee how one’s data science career will progress. A network of mentors addresses these issues: it gives data professionals an outside perspective and an unbiased ally. It’s extremely important for successful data science professionals to build a mentor network and use it throughout their careers.

[ DATA SCIENCE Q&A]

Q: How do you take millions of users with 100s of transactions each, amongst 10ks of products, and group the users together in meaningful segments?
A: 1. Some exploratory data analysis (get a first insight):

* Transactions by date
* Count of customers vs number of items bought
* Total items vs total basket per customer
* Total items vs total basket per area

2. Create new features (per customer):

Counts:

* Total baskets (unique days)
* Total items
* Total spent
* Unique product IDs

Distributions:

* Items per basket
* Spend per basket
* Product IDs per basket
* Duration between visits
* Product preferences: proportion of items per product category per basket

3. Too many features? Apply dimension reduction, e.g. PCA

4. Clustering:

* e.g. k-means on the dimension-reduced features

5. Interpreting model fit
* View the clustering by principal-component axis pairs, e.g. PC1 vs PC2, PC2 vs PC3
* Interpret each principal component via the linear combination it is obtained from; example: PC1 = “spendy” axis (proportion of baskets containing spendy items, raw counts of items and visits)

A minimal end-to-end sketch of this pipeline follows below.
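
To make the steps concrete, here is a minimal sketch of the pipeline on a toy per-customer feature table. The feature names, distributions, and cluster count are illustrative assumptions, not part of the original answer.

```python
# Toy segmentation pipeline: scale features, reduce dimensions with PCA,
# cluster with k-means, then inspect segments in PC coordinates.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical per-customer features:
# total_baskets, total_items, total_spent, unique_products
X = rng.lognormal(mean=2.0, sigma=0.7, size=(10_000, 4))

X_scaled = StandardScaler().fit_transform(X)          # put features on comparable scales
X_pca = PCA(n_components=2).fit_transform(X_scaled)   # dimension reduction (step 3)

# Clustering on the reduced features (step 4); 4 segments is an assumption
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_pca)

# Interpret segments by their principal-component coordinates (step 5)
for k in range(4):
    centre = X_pca[labels == k].mean(axis=0)
    print(f"segment {k}: n={np.sum(labels == k):5d}, PC1={centre[0]:+.2f}, PC2={centre[1]:+.2f}")
```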

Source

[ VIDEO OF THE WEEK]

Data-As-A-Service (#DAAS) to enable compliance reporting


Subscribe on  YouTube

[ QUOTE OF THE WEEK]

We chose it because we deal with huge amounts of data. Besides, it sounds really cool. – Larry Page

[ PODCAST OF THE WEEK]

Unconference Panel Discussion: #Workforce #Analytics Leadership Panel


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Data is growing faster than ever before and by the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet.

Sourced from: Analytics.CLUB #WEB Newsletter

6 things that you should know about VMware vSphere 6.5

vSphere 6.5 offers a resilient, highly available, on-demand infrastructure that is the perfect groundwork for any cloud environment. It provides innovation that will assist digital transformation for the business and make the job of the IT administrator simpler, freeing up most of their time so they can work on further innovation instead of maintaining the status quo. Furthermore, vSphere is the foundation of VMware’s hybrid cloud strategy and is necessary for cross-cloud architectures. Here are essential features of the new and updated vSphere.

vCenter Server appliance

vCenter is an essential backend tool that controls the virtual infrastructure of VMware. vCenter 6.5 has lots of innovative upgraded features. It has a migration tool that aids in shifting from vSphere 5.5 or 6.0 to vSphere 6.5. The vCenter Server appliance also includes the VMware Update Manager that eliminates the need for restarting external VM tasks or using pesky plugins.

vSphere client

In the past, the front-end client used to access the vCenter Server was quite old-fashioned and clunky. The vSphere client has now undergone the necessary HTML5 alterations. Aside from the foreseeable performance upgrades, the change also makes the tool cross-browser compatible and more mobile-friendly. Plugins are no longer needed, and the UI has been switched to a more cutting-edge aesthetic founded on the VMware Clarity UI.

Backup and restore

The backup and restore capability of vSphere 6.5 is an excellent feature that enables clients to back up data on any Platform Services Controller appliance or the vCenter Server directly from the Application Programming Interface (API) or the Virtual Appliance Management Interface (VAMI). In addition, it is able to back up both VUM and Auto Deploy implanted within the appliance. This backup mainly consists of files that are streamed to a preferred storage device through the SCP, FTP(S), or HTTP(S) protocols.

Superior automation capabilities

With regards to automation, VMware vSphere 6.5 works perfectly because of the new upgrades. The new PowerCLI tweak has been an excellent addition on VMware’s part because it is completely module-based, and its APIs are currently in very high demand. This feature enables IT administrators to fully automate tasks down to the virtual machine level.

 Secure boot

The secure boot element of vSphere 6.5 covers secure-boot-enabled virtual machines. This feature is available in both Linux and Windows VMs, and it allows secure boot to be enabled by clicking a simplified checkbox situated in the VM properties. After it is enabled, only properly signed VMs can use the virtual environment for booting.

 Improved auditing

vSphere 6.5 offers clients improved audit-quality logging. This aids in accessing more forensic detail about user actions, making it easier to determine what was done, when, and by whom, and whether any investigation of anomalies or security threats is warranted.

VMware’s vSphere developed out of the complexity and necessity of an expanding virtualization market. The earlier server products were not robust enough to deal with the increasing demands of IT departments. As businesses invested in virtualization, they had to consolidate and simplify their physical server farms into virtualized ones, and this triggered the need for virtual infrastructure. With these vSphere 6.5 features in mind, you can unleash its full potential. Make the switch today to the new and innovative VMware vSphere 6.5.

 

Source by thomassujain

The power of data in the financial services industry

Changes in business and technology in the financial services industry have opened up new possibilities. With the rapidly growing number of customer interactions through digital banking, there is a huge volume of customer data now available that can provide strategic opportunities for business growth and tremendous prospects for improved management tools.

However, after experiencing the challenges of the recent financial crisis, most Philippine financial services companies are understandably more focused on compliance and risk management, rather than on growth opportunities resulting from improved data and analytics. They are still dominated by data management solutions and have yet to truly embed analytics into business decisions. Data is used operationally and not strategically. They have yet to embrace the key awareness that, in this digital age, acknowledging the value of data as a strategic asset, deploying sophisticated analytics to realize the benefits of that asset, and converting information into insights and practical actions create a competitive advantage.

The industry has always used big data, particularly in credit analysis. Many analytics tools are currently available such as ACL, SQL, SAS, Falcon, Lavastorm, Tableau, Spotfire and Qlikview. At present, the business functions that are most advanced in terms of analytics are finance and risk management. There is also increased use in compliance and internal audits. However, the power of data has remained largely unexploited and untapped. Insights from big data can also be used to make well-informed strategic decisions by using data to effectively extract value from customers, identify risks, and improve operational efficiency.

A few high-growth financial services companies in the Philippines, mostly foreign, are beginning to embed data analytics in sales, marketing, budgeting and planning. They understand that product and service models must be fine-tuned to respond to changing customer preferences and expectations. Using big data techniques can help enhance customer targeting, as well as inform and adjust pricing and resource allocation. Other companies in the financial services industry should consider adopting these initiatives in order to do well in light of increasing competition.

THE CHALLENGES
As with any new idea, gaining an appreciation for the opportunities in data analytics is not without difficulty. Regulations, data privacy, fragmentation and skills shortages are among the challenges facing the financial services industries in this regard.

• Regulation and data privacy concerns still dominate the financial services industry because failure may cause irreversible financial and reputational damage; after all, these businesses rely largely on credibility. The industry has also become a target for cyber attacks. Cybercriminals have developed advanced techniques to infiltrate businesses and fraudulently access sensitive information such as usernames, passwords and credit card details. Top cyber attacks in the financial services industry include phishing (unsolicited e-mails sent without the recipients’ consent to steal login credentials and banking details), and remote access Trojans (fraudulently gaining access to sensitive and private information). Consequently, customers continue to take issue with digital banking.

This should not, however, dissuade companies in the financial services industry; this challenge does not prevent them from exploiting the full potential of data analytics. The industry must find ways to use big data to improve customer service without violating privacy concerns. It must continually reassure customers that their data is valuable and that their privacy has not been violated.

To retain confidence in their ability to safeguard customer data, financial services companies will need to consistently update information security policies, systems and infrastructures, and ensure that they are abreast with best practices.

• The infrastructure of many financial services companies is set up along products or business lines using legacy IT systems that are poorly connected and are unable to communicate with one another. Bridging the gaps between these fragmented systems and replacing them with new platforms represent a serious challenge, making it difficult to introduce new technology solutions. It requires simultaneously running the business while unwinding the legacy systems and migrating smoothly to a new platform with a central data system.

• Another important technical challenge is the lack of skilled data specialists who are able to understand and manage the complexity of the data from the emerging tools and technology and provide high-class analytics with business implications.

STRONG LEADERSHIP AND GOVERNANCE
Strong leadership and governance are the keys to success in the use of data analytics. Leaders with vision and character who are attuned to the fast, continuous growth in business and technology must first make a firm decision to give more impetus to data analytics: integrate the whole company’s data management team, hire skilled data analysts, and orchestrate the extraction and exploitation of big data to achieve competitive advantage.

Data analysis was previously considered an IT-level matter. Given the scale of digitization, data analysis must now be treated as a core strategic issue, moved to the top level of management, and given due attention.

Effective data governance requires an integrated approach. Leaders should commit not just to the technology, but must also see the need to invest in the people, processes and structures necessary to ensure that technology delivers value throughout the business.

Part of the task requires re-educating the organization. Formalized data governance processes must be disseminated, understood, and complied with throughout the business.

Potential data issues should be identified through regular data quality audits, continuously training staff on governance policies and procedures, and conducting regular risk assessments aimed at identifying potential data vulnerabilities.

With these complex requirements and tasks, it may take time for companies to fully appreciate the advantages of data analytics. But with the rapid evolution of technology and increasing competition, forward-looking organizations might seriously consider fast-tracking the necessary steps to fully appreciate the power of data.

Veronica Mae A. Arce is a Senior Director of SGV & Co.

 

Originally posted via “The power of data in the financial services industry”

Source: The power of data in the financial services industry by analyticsweekpick

Jun 11, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data analyst  Source

[ AnalyticsWeek BYTES]

>> Stephen Wunker on future of customer success through cost innovation and data by v1shal

>> Hit the “Easy” Button with Talend & Databricks to Process Data at Scale in the Cloud by analyticsweekpick

>> Voices in AI – Episode 98 – A Conversation with Jerome Glenn by analyticsweekpick

Wanna write? Click Here

[ FEATURED COURSE]

Probability & Statistics


This course introduces students to the basic concepts and logic of statistical reasoning and gives the students introductory-level practical ability to choose, generate, and properly interpret appropriate descriptive and… more

[ FEATURED READ]

Hypothesis Testing: A Visual Introduction To Statistical Significance


Statistical significance is a way of determining if an outcome occurred by random chance, or did something cause that outcome to be different than the expected baseline. Statistical significance calculations find their … more

[ TIPS & TRICKS OF THE WEEK]

Analytics Strategy that is Startup Compliant
With the right tools, capturing data is easy, but not being able to handle that data can lead to chaos. One of the most reliable startup strategies for adopting data analytics is TUM, or The Ultimate Metric: the metric that matters most to your startup. Some advantages of TUM: it answers the most important business question, it cleans up your goals, it inspires innovation, and it helps you understand the entire quantified business.

[ DATA SCIENCE Q&A]

Q: What is the life cycle of a data science project?
A: 1. Data acquisition
Acquiring data from both internal and external sources, including social media or web scraping. In a steady state, data extraction routines should be in place, and new sources, once identified, would be acquired following the established processes.

2. Data preparation
Also called data wrangling: cleaning the data and shaping it into a suitable form for later analyses. Involves exploratory data analysis and feature extraction.

3. Hypothesis & modelling
Like in data mining, but with all the data instead of samples. Applying machine learning techniques to all the data. A key sub-step: model selection. This involves preparing a training set for model candidates, and validation and test sets for comparing model performance, selecting the best-performing model, gauging model accuracy and preventing overfitting (a minimal sketch follows below).

4. Evaluation & interpretation

Steps 2 to 4 are repeated a number of times as needed; as the understanding of the data and business becomes clearer and results from initial models and hypotheses are evaluated, further tweaks are performed. These may sometimes include step 5 and be performed in a pre-production environment.

5. Deployment

6. Operations
Regular maintenance and operations. Includes performance tests to measure model performance, and can alert when performance goes beyond a certain acceptable threshold.

7. Optimization
Can be triggered by failing performance, or by the need to add new data sources, retrain the model, or even deploy new versions of an improved model.

Note: with increasing maturity and well-defined project goals, pre-defined performance criteria can help evaluate the feasibility of a data science project early enough in the life cycle. This early comparison helps the team refine hypotheses, discard the project if non-viable, or change approaches.
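
As a concrete illustration of the model-selection sub-step in step 3, here is a minimal sketch using a synthetic dataset; the dataset and candidate models are illustrative assumptions, not prescribed by the answer above.

```python
# Model selection with a train/validation/test split: pick the best
# candidate on the validation set, report final accuracy on held-out data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
# Select on validation data, never on the test set (guards against overfitting)
best = max(candidates, key=lambda n: candidates[n].fit(X_train, y_train).score(X_val, y_val))
print(f"selected: {best}, test accuracy = {candidates[best].score(X_test, y_test):.3f}")
```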


Source

[ VIDEO OF THE WEEK]

@TimothyChou on World of #IOT & Its #Future Part 2 #FutureOfData #Podcast


Subscribe on  YouTube

[ QUOTE OF THE WEEK]

Everybody gets so much information all day long that they lose their common sense. – Gertrude Stein

[ PODCAST OF THE WEEK]

@DrewConway on fabric of an IOT Startup #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

The world’s data is equivalent to more than 200bn HD movies – which would take a person 47m years to watch.

Sourced from: Analytics.CLUB #WEB Newsletter

Improving Self-Service Business Intelligence and Data Science

The heterogeneous complexities of big data present the foremost challenge in delivering that data to the end users who need it most. Those complexities are characterized by:

  • Disparate data sources: The influx of big data multiplied the sheer number of data sources almost exponentially, including both external and internal ones. Moreover, the quantity of sources required today is made more complex by…
  • Multiple technologies powering those sources: For almost every instance in which SQL is still deployed, there is seemingly another application, use case, or data source which involves an assortment of alternative technologies. Moreover, accounting for the plethora of technologies in use today is frequently aggravated by contemporary…
  • Architecture and infrastructure complications: With numerous advantages for deployments in the cloud, on-premises, and in hybrid manifestations of the two, contemporary enterprise architecture and infrastructure is increasingly ensnared in a process which protracts time to value for accessing data. The dilatory nature of this reality is only worsened in the wake of…
  • Heightened expectations for data: As data becomes ever more entrenched in the personal lives of business users, the traditionally lengthy periods for business intelligence and data insight are becoming less tolerable. According to Dremio Chief Marketing Officer Kelly Stirman, “In our personal lives, when we want to use data to answer questions, it’s just a few seconds away on Google…And then you get to work, and your experience is nothing like that. If you want to answer a question or want some piece of data, it’s a multi-week or multi-month process, and you have to ask IT for things. It’s frustrating as well.”

However, a number of recent developments have taken place within the ever-shifting data landscape to substantially accelerate self-service BI and certain aspects of data science. The end result is that despite the variegated factors characterizing today’s big data environments, “for a user, all of my data looks like it’s in a single high performance relational database,” Stirman revealed. “That’s exactly what every analytical tool was designed for. But behind the scenes, your data’s spread across hundreds of different systems and dozens of different technologies.”

Avoiding ETL

Conventional BI platforms were routinely hampered by the ETL process, a prerequisite for both integrating and loading data into tools with schema at variance with that of source systems. The ETL process was significant for three reasons. It was the traditional way of transforming data for application consumption. It was typically the part of the analytics process which absorbed a significant amount of time—and skill—because it required the manual writing of code. Furthermore, it resulted in multiple copies of data which could be extremely costly to organizations. Stirman observed that, “Each time you need a different set of transformations you’re making a different copy of the data. A big financial services institution that we spoke to recently said that on average they have eight copies of every piece of data, and that consumes about 40 percent of their entire IT budget which is over a billion dollars.” ETL is one of the facets of the data engineering process which monopolizes the time and resources of data scientists, who are frequently tasked with transforming data prior to leveraging them.

Modern self-service BI platforms eschew ETL with automated mechanisms that provide virtual (instead of physical) copies of data for transformation. Thus, each subsequent transformation is applied to the virtual replication of the data with swift in-memory technologies that not only accelerate the process, but eliminate the need to dedicate resources to physical copies. “We use a distributed process that can run on thousands of servers and take advantage of the aggregate RAM across thousands of servers,” Stirman said. “We can execute these transformations dynamically and give you a great high-performance experience on the data, even though we’re transforming it on the fly.” End users can enact this process visually without involving script.
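
As a toy illustration of the “virtual copy” idea, the sketch below uses SQLite to define a view: the transformation runs on the fly at query time instead of materializing a second physical copy, as classic ETL would. This is a stand-in under stated assumptions, not the distributed in-memory engine the article describes, and the table and column names are hypothetical.

```python
# Virtual transformation via a view: no second physical copy of the data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount_cents INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("alice", 1250), ("bob", 300), ("alice", 4700)])

# The "transformed copy" is virtual: cents -> dollars computed at query time
conn.execute("""
    CREATE VIEW orders_usd AS
    SELECT customer, amount_cents / 100.0 AS amount_usd FROM orders
""")

for row in conn.execute("SELECT customer, SUM(amount_usd) FROM orders_usd GROUP BY customer"):
    print(row)
```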
Reflections

Today’s self-service BI and data science platforms have also expedited time to insight by making data more available than traditional solutions did. Virtual replications of datasets are useful in this regard because they are stored in the underlying BI solution—instead of in the actual source of data. Thus, these platforms can access that data without retrieving them from the initial data source and incurring the intrinsic delays associated with architectural complexities or slow source systems. According to Stirman, the more of these “copies of the data in a highly optimized format” such a self-service BI or data science solution has, the faster it is at retrieving relevant data for a query. Stirman noted this approach is similar to one used by Google, in which there are not only copies of web pages available but also “all these different ways of structuring data about the data, so when you ask a question they can give you an answer very quickly.” Self-service analytics solutions which optimize their data copies in this manner produce the same effect.

Prioritizing SQL

Competitive platforms in this space are able to account for the multiplicity of technologies the enterprise has to contend with in a holistic fashion. Furthermore, they’re able to do so by continuing to prioritize SQL as the preferred query language which is rewritten into the language relevant to the source data’s technology—even when it isn’t SQL. By rewriting SQL into the query language of the host of non-relational technology options, users effectively have “a single, unified future-proof way to query any data source,” Stirman said. Thus, they can effectively query any data source without understanding its technology or its query language, because the self-service BI platform does. In those instances in which “those sources have something you can’t express in SQL, we augment those capabilities with our distributed execution engine,” Stirman remarked.
User Experience

The crux of self-service platforms for BI and data science is that by eschewing ETL for quicker versions of transformation, leveraging in-memory technologies to access virtual copies of data, and re-writing queries from non-relational technologies into familiar relational ones, users can rely on their tool of choice for analytics. Business end users can choose from any popular Tableau, Qlik, or any other preferred tool, while data scientists can use R, Python, or any other popular data science platform. The fact that these solutions are able to facilitate these advantages at scale and in cloud environments adds to their viability. Consequently, “You log in as a consumer of data and you can see the data, and you can shape it the way you want to yourself without being able to program, without knowing these low level IT skills, and you get the data the way you want it through a powerful self-service model instead of asking IT to do it for you,” Stirman said. “That’s a fundamentally very different approach from the traditional approach.”

 

Source by jelaniharper

AI and Privacy: What’s in store for the future?

One of the most common use cases of artificial intelligence at the moment is its ability to handle massive datasets, processing and interpreting them. A task that human data analysts would take ages to complete, if at all, is performed in no time, and without the possibility of human error. At the same time, the average person creates an increasingly larger digital footprint, leaving a trace in the form of a vast amount of personal information on the internet.

Corporations and governments, then, gather, store, and feed that information to powerful AI algorithms in order to learn as much as possible about that person for marketing (and other) purposes. All this has led to heated debates over the safety of our personal data and its potential misuse.

No doubt AI holds tremendous potential to disrupt and improve our lives, but there are some hidden traps and pitfalls that have to be discussed and overcome.

Is There Such a Thing as Too Much Data?

It depends on the point of view. Brands seem to need every single bit of information on their target audience in order to better understand their needs and preferences so that they can tailor the right marketing message.

While that’s in a way a legitimate thing, the rise of advanced technologies, including AI, has led this thirst for information to get in the way of their customers’ privacy.

Namely, before AI and big data analytics, it was impossible to properly interpret unstructured data coming from different sources and in different formats, which left a big chunk of information uninterpretable and thus unused.

But, once the technologies managed to crack this code and translate illegible data into the actual information, the concept of digital privacy became an issue.

In 2012, an incident showed how intimidatingly accurate data analytics can be, and what that means for an ordinary user. In an attempt to assist its customers in finding everything they might need, Target sent coupons for cribs and baby clothes to a high school girl through the mail. Her unsuspecting father went to complain, only to find out that this wasn’t just a random mistake – the store’s algorithm picked up different cues based on what kind of products the pregnant girl purchased and viewed.

Similarly, it’s possible to track and locate people with the help of their own mobile devices and wearables, which means that it’s virtually impossible to go off the radar and seclude oneself.

Voice and facial recognition additionally complicate things as these technologies are capable of completely obliterating anonymity in public places.

Although it’s somewhat comforting to know that this way many wrongdoings and crimes can be prevented and identified, the lack of regulations might put us all under surveillance. Besides, there are growing fears of misidentification and wrongful convictions. According to research studies, this technology isn’t accurate when it comes to identifying people of color, which can have grave consequences.

The Concept of Privacy in a Digital Age

The Facebook-Cambridge Analytica scandal was just one in a line of numerous incidents that demonstrated how unprotected our data is and how easy it is to obtain, with almost no repercussions.

Just 20 years ago privacy was still a concept that existed only in the offline, physical world. And it was much easier to protect yourself and your personal data by not disclosing your credit card or Social Security number.

Today, as we use numerous online services, it’s hard to keep your data to yourself. If you want to purchase something online, you’ll have to provide your credit card number and authorize the transaction. Websites store this sensitive information online, and a single hacker attack can expose it.

For example, the data of up to 500 million Marriott International guests was compromised in a data breach in 2018.

But, it’s not only hackers and cybercriminals that jeopardize our privacy.

It’s not a secret that many companies use social media and the internet to find out more about their potential and existing employees. This can have severe implications, as people can be (and usually are) held accountable for what they post online. Some have even lost their jobs due to certain online activities like posting a profanity-laced tweet, which is exactly what happened to a NASA intern.

Is There a Solution to This Issue?

It can’t be denied that being constantly monitored and under surveillance can be frustrating.

But it would be a shame to curb the development of such immense technological advancement because of unresolved privacy issues.

AI, big data analytics, IoT, and 5G, for example, are much maligned in some circles because they rely heavily on gargantuan amounts of data and because they enable a massive network of interconnected devices that can be controlled remotely.

What does this mean?

It can be both a gigantic blessing and a curse. When combined, these technologies allow, for example, the possibility of remote surgery that could save millions of lives. Similarly, IoT is a network that enables remote control of cars, homes, and appliances.

On the other hand, the data generated by these technologies can be compromised or used for harmful purposes.

Another example is AI-powered chatbots that have become indispensable in numerous industries, thanks to the fact that they can improve customer engagement and juggle multiple customer queries at the same time. This way, they help customers and increase satisfaction.

They are also capable of collecting, analyzing, and storing customer information in order to personalize every subsequent customer touchpoint and offer the best and most personalized service. This way, companies can reduce operational costs and boost customer retention rates.

Luckily, there are ways to make the most of all these AI benefits without compromising users’ privacy.

A New Dawn of Digital Privacy

How are we going to achieve this win-win situation and give brands our data without any fears of it being misused?

The trick is in combining cryptography and machine learning, which will result in AI’s ability to learn from data without actually seeing it.

This way, the privacy of end-users will be protected, and at the same time, companies will be able to leverage their data without breaking any laws of ethics.

Several technologies will make this happen:

  • Federated learning: This concept describes a decentralized AI framework distributed across millions of devices. Federated learning will enable scientists to create, train, improve, and assess a shared prediction model while keeping all the data on the device. In a nutshell, companies won’t have access to users’ raw data, nor any possibility of labeling it. A synergy of AI, blockchain, and IoT, this approach keeps users’ privacy safe and yet provides all the benefits of aggregated model improvement.
  • Differential privacy: A number of applications, including maps or fitness and health apps, collect individual users’ data so that they can make traffic predictions or analyze users’ fitness levels and other parameters. At the moment, it’s theoretically possible to match individual contributors to their data. Differential privacy adds some randomness to the entire procedure and makes it impossible to trace the information back. As a result, it won’t be possible to expose the identity of individual contributors, while still allowing their data to be collected and analyzed (a minimal sketch follows after this list).
  • Homomorphic encryption: This technology uses machine learning algorithms to process and analyze encrypted data without accessing sensitive information. This data is encrypted and analyzed on a remote system. The results are sent in an encrypted form too, and they can be unlocked only by using a unique key so that the privacy of users whose data is being analyzed can be protected.
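
As a concrete illustration of the differential-privacy idea referenced above, here is a minimal sketch of the Laplace mechanism applied to a count query; the dataset and the epsilon value are illustrative assumptions.

```python
# Differential privacy's core trick: answer a count query with calibrated
# Laplace noise so no single individual's presence can be inferred, while
# aggregates remain useful. Smaller epsilon = more noise = stronger privacy.
import numpy as np

rng = np.random.default_rng(7)
ages = rng.integers(18, 80, size=10_000)  # toy user data

def dp_count(condition_mask, epsilon=0.5):
    """Noisy count; the sensitivity of a count query is 1."""
    true_count = int(condition_mask.sum())
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

print("true count:", int((ages > 60).sum()))
print("DP count  :", round(dp_count(ages > 60), 1))
```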

We’re still far from a complete solution to the problem of privacy, but these small steps help keep things under control. AI and other technologies keep evolving, which means that new obstacles will emerge and that scientists and security experts will have to keep pace and constantly upgrade security protocols.

The post AI and Privacy: What’s in store for the future? appeared first on Big Data Made Simple.

Source: AI and Privacy: What’s in store for the future? by administrator

Jun 04, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data security  Source

[ AnalyticsWeek BYTES]

>> Is this video software company set to take hockey analytics to the next level? by analyticsweekpick

>> Dave Ulrich (@dave_ulrich) talks about role / responsibility of HR in #FutureOfWork #JobsOfFuture #Podcast by v1shal

>> Six Do’s and Don’ts of Collaborative Data Management by analyticsweekpick

Wanna write? Click Here

[ FEATURED COURSE]

Introduction to Apache Spark


Learn the fundamentals and architecture of Apache Spark, the leading cluster-computing framework among professionals…. more

[ FEATURED READ]

Hypothesis Testing: A Visual Introduction To Statistical Significance


Statistical significance is a way of determining if an outcome occurred by random chance, or did something cause that outcome to be different than the expected baseline. Statistical significance calculations find their … more

[ TIPS & TRICKS OF THE WEEK]

Grow at the speed of collaboration
Research by Cornerstone OnDemand pointed out the need for better collaboration within the workforce, and the data analytics domain is no different. A rapidly changing and growing industry like data analytics is very difficult for an isolated workforce to keep up with. A good collaborative work environment facilitates a better flow of ideas, improved team dynamics, rapid learning, and an increasing ability to cut through the noise. So, embrace collaborative team dynamics.

[ DATA SCIENCE Q&A]

Q: What is the life cycle of a data science project?
A: 1. Data acquisition
Acquiring data from both internal and external sources, including social media or web scraping. In a steady state, data extraction routines should be in place, and new sources, once identified, would be acquired following the established processes.

2. Data preparation
Also called data wrangling: cleaning the data and shaping it into a suitable form for later analyses. Involves exploratory data analysis and feature extraction.

3. Hypothesis & modelling
Like in data mining, but with all the data instead of samples. Applying machine learning techniques to all the data. A key sub-step: model selection. This involves preparing a training set for model candidates, and validation and test sets for comparing model performance, selecting the best-performing model, gauging model accuracy and preventing overfitting.

4. Evaluation & interpretation

Steps 2 to 4 are repeated a number of times as needed; as the understanding of the data and business becomes clearer and results from initial models and hypotheses are evaluated, further tweaks are performed. These may sometimes include step 5 and be performed in a pre-production environment.

5. Deployment

6. Operations
Regular maintenance and operations. Includes performance tests to measure model performance, and can alert when performance goes beyond a certain acceptable threshold (a minimal sketch of such an alerting loop follows below).

7. Optimization
Can be triggered by failing performance, or by the need to add new data sources, retrain the model, or even deploy new versions of an improved model.

Note: with increasing maturity and well-defined project goals, pre-defined performance criteria can help evaluate the feasibility of a data science project early enough in the life cycle. This early comparison helps the team refine hypotheses, discard the project if non-viable, or change approaches.
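
As a concrete illustration of the alerting described in step 6, here is a minimal sketch of a sliding-window accuracy monitor; the threshold and window size are illustrative assumptions.

```python
# Track a model's live accuracy over a sliding window and raise an alert
# when it drops below an acceptable threshold, which (per step 7) would
# trigger retraining or a new model version.
from collections import deque

WINDOW, THRESHOLD = 500, 0.85
recent = deque(maxlen=WINDOW)  # 1 = correct prediction, 0 = incorrect

def record_outcome(correct: bool) -> None:
    recent.append(1 if correct else 0)
    if len(recent) == WINDOW:
        accuracy = sum(recent) / WINDOW
        if accuracy < THRESHOLD:
            # In production this would page on-call or kick off retraining
            print(f"ALERT: windowed accuracy {accuracy:.3f} < {THRESHOLD}")

# Usage: feed outcomes as ground truth arrives
for outcome in [True] * 400 + [False] * 100:
    record_outcome(outcome)
```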


Source

[ VIDEO OF THE WEEK]

Understanding #FutureOfData in #Health & #Medicine - @thedataguru / @InovaHealth #FutureOfData #Podcast


Subscribe on  YouTube

[ QUOTE OF THE WEEK]

I’m sure the highest-capacity storage device will not be enough to record all our stories; because every time with you is very valuable data.

[ PODCAST OF THE WEEK]

@BrianHaugli @The_Hanover on Building a #Leadership #Security #Mindset #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

The largest AT&T database boasts titles including the largest volume of data in one unique database (312 terabytes) and the second largest number of rows in a unique database (1.9 trillion), which comprises AT&T’s extensive calling records.

Sourced from: Analytics.CLUB #WEB Newsletter