With big data invading campus, universities risk unfairly profiling their students

Obama’s proposed Student Digital Privacy Act aims to limit what schools can do with data collected from apps used in K-12 classrooms. But college students are just as vulnerable to privacy violations.

Privacy advocates have long been pushing for laws governing how schools and companies treat data gathered from students using technology in the classroom. Most now applaud President Obama’s newly announced Student Digital Privacy Act to ensure “data collected in the educational context is used only for educational purposes.”

But while young students are vulnerable to privacy harms, things are tricky for college students, too. This is especially true as many universities and colleges gather and analyze more data about students’ academic — and personal — lives than ever before.

Jeffrey Alan Johnson, assistant director of institutional effectiveness and planning at Utah Valley University, has written about some of the main issues for universities and college students in the era of big data. I spoke with him about the ethical and privacy implications of universities using more data analytics techniques.

Recommended: Do you have a clue about teenage behavior? Take our quiz!

Selinger: Privacy advocates worry about companies creating profiles of us. Is there an analog in the academic space? Are profiles being created that can have troubling experiential effects?

Johnson: Absolutely. We’ve got an early warning system [called Stoplight] in place on our campus that allows instructors to see what a student’s risk level is for completing a class. You don’t come in and start demonstrating what kind of a student you are. The instructor already knows that. The profile shows a red light, a green light, or a yellow light based on factors such as whether you have attempted the class before, your overall level of performance, and whether you fit any of the demographic categories related to risk. These profiles tend to follow students around, even after folks change how they approach school. The profile says they took three attempts to pass a basic math course, and that suggests they’re going to be pretty shaky in advanced calculus.
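To make the idea concrete, here is a hypothetical sketch of how such a red/yellow/green flag might be computed from the factors Johnson mentions. The function name, weights, and thresholds are all invented for illustration; this is not Utah Valley University’s actual model.

```python
# Hypothetical "stoplight" early-warning flag; weights and cutoffs are made up.
def stoplight(prior_attempts, gpa, demographic_risk_flags):
    score = 0
    score += 2 * max(prior_attempts - 1, 0)               # repeated attempts raise risk
    score += 2 if gpa < 2.0 else (1 if gpa < 3.0 else 0)  # weak overall performance
    score += demographic_risk_flags                       # count of risk categories matched
    if score >= 4:
        return "red"
    return "yellow" if score >= 2 else "green"

print(stoplight(prior_attempts=3, gpa=2.5, demographic_risk_flags=1))  # red
```

Note how a student who needed three attempts at a course stays "red" even if nothing else about them has changed, which is exactly the profiling concern raised above.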

Selinger: Is this transparent to students? Do they actually know what information the professor sees?

Johnson: No, not unless the professor tells them. I don’t think students are being told about Stoplight at all. I don’t think students are being told about many of the systems in place. To my knowledge, they aren’t told about the basis of the advising system that Austin Peay put in place where they’re recommending courses to students based, in part, on their likelihood of success. They’re as unaware of these things as the general public is about how Facebook determines what users should see.

Evan Selinger is an associate professor of philosophy at Rochester Institute of Technology. Follow him on Twitter @EvanSelinger.

Originally posted at: With big data invading campus, universities risk unfairly profiling their students by analyticsweekpick

Jul 25, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


[  COVER OF THE WEEK ]

Fake data


[ AnalyticsWeek BYTES]

>> Are You Evolving Your Analytics? by analyticsweek

>> Meet Us in DC for the 2019 Logi Conference by analyticsweek

>> Big Data Is No Longer Confined to the Big Business Playbook by analyticsweekpick


[ FEATURED COURSE]

Applied Data Science: An Introduction


As the world’s data grow exponentially, organizations across all sectors, including government and not-for-profit, need to understand, manage and use big, complex data sets—known as big data…. more

[ FEATURED READ]

Thinking, Fast and Slow


Drawing on decades of research in psychology that resulted in a Nobel Prize in Economic Sciences, Daniel Kahneman takes readers on an exploration of what influences thought example by example, sometimes with unlikely wor… more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been the bottleneck to achieving comparative enterprise adoption. One of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up and create awareness within the organization. An aware organization goes a long way toward quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q: What is the difference between supervised learning and unsupervised learning? Give concrete examples.

A: * Supervised learning: inferring a function from labeled training data
* Supervised learning: predictor measurements associated with a response measurement; we wish to fit a model that relates both for better understanding the relation between them (inference) or with the aim to accurately predicting the response for future observations (prediction)
* Supervised learning: support vector machines, neural networks, linear regression, logistic regression, extreme gradient boosting
* Supervised learning examples: predict the price of a house based on its area and size; churn prediction; predict the relevance of search engine results.
* Unsupervised learning: inferring a function to describe hidden structure of unlabeled data
* Unsupervised learning: we lack a response variable that can supervise our analysis
* Unsupervised learning: clustering, principal component analysis, singular value decomposition; identify group of customers
* Unsupervised learning examples: find customer segments; image segmentation; classify US senators by their voting records.
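The distinction can be shown with a minimal, pure-Python toy (in practice you would use a library such as scikit-learn): the same points are classified using given labels (supervised), and then grouped with no labels at all (unsupervised).

```python
points = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]

# Supervised: labels are provided; learn a decision rule (a midpoint threshold).
labels = [0, 0, 0, 1, 1, 1]
mean0 = sum(p for p, y in zip(points, labels) if y == 0) / labels.count(0)
mean1 = sum(p for p, y in zip(points, labels) if y == 1) / labels.count(1)
threshold = (mean0 + mean1) / 2

def predict(x):
    """Classify a new point using the rule learned from the labels."""
    return 0 if x < threshold else 1

# Unsupervised: no labels; let 1-D k-means discover the two groups itself.
def kmeans_1d(data, iters=10):
    c0, c1 = min(data), max(data)  # initialize centroids at the extremes
    for _ in range(iters):
        g0 = [x for x in data if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in data if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return sorted((c0, c1))

print(predict(7.5))        # 1
print(kmeans_1d(points))   # two centroids, one near each cluster
```

The supervised rule needs the labels to exist; k-means recovers essentially the same two groups without ever seeing them, which is the whole point of the contrast above.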

Source

[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @Beena_Ammanath, @GE


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

The world is one big data problem. – Andrew McAfee

[ PODCAST OF THE WEEK]

#FutureOfData Podcast: Conversation With Sean Naismith, Enova Decisions


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

For a typical Fortune 1000 company, just a 10% increase in data accessibility will result in more than $65 million additional net income.

Sourced from: Analytics.CLUB #WEB Newsletter

Logi Tutorial: Step-by-Step Guide for Setting Up a Logi Application on AWS

This post originally appeared on dbSeer, a business analytics consulting firm and Logi Analytics partner.

Last year, we wrote a blog about how to use AWS Auto Scaling with Logi Analytics Applications. In that blog, we promised to release this step-by-step guide outlining the technical details of how a Logi Application can be configured to harness the scalability and elasticity features of AWS.

Enabling a multi-web-server Logi application on AWS Windows instances requires the right configuration for some of the shared Logi files (cache files, secure key, bookmarks, etc.). To support these shared files, we need a shared network drive that can be accessed by the different Logi web servers. Currently EFS (Elastic File System) is not natively supported on Windows on AWS. Below we describe how EFS can be mounted for Windows servers and set up so that you can use Logi’s scalability features.

Setting Up the File Server

Overview

In order for our distributed Logi application to function properly, it needs access to a shared file location. This can be easily implemented with Amazon’s Elastic File System (EFS). However, if you’re using a Windows server to run your Logi application, extra steps are necessary, as Windows does not currently support EFS drives. In order to get around this constraint, it is necessary to create Linux based EC2 instances to serve as an in-between file server. The EFS volumes will be mounted on these locations and then our Windows servers will access the files via the Samba (SMB) protocol.

Steps

  • Create EC2:
    • Follow the steps as outlined in this AWS Get Started guide and choose: Image: “Ubuntu Server 16.04 LTS (HVM), SSD Volume Type”
    • Create an instance of the desired type, e.g. “t2.micro”
  • Create AWS EFS volume:
    • Follow the steps listed here and use same VPC and availability zone as used above
  • Setup AWS EFS inside the EC2:
    • Connect to the EC2 instance we created in Step 1 using SSH
  • Mount the EFS to the EC2 using the following commands:
    • sudo apt-get install -y nfs-common
    • sudo mkdir -p /mnt/efs
    • sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 EFS_IP_ADDRESS_HERE:/ /mnt/efs
  • Re-export NFS share to be used in Windows:
    • Give your Windows user access to the files using Samba. Drop the following into your shell to install SMB (Samba) services on your Ubuntu EC2 instance:
  • Run the following commands:
    • sudo apt-get install -y samba samba-common python-glade2 system-config-samba
    • sudo cp -pf /etc/samba/smb.conf /etc/samba/smb.conf.bak
    • sudo sh -c 'cat /dev/null > /etc/samba/smb.conf'
    • sudo nano /etc/samba/smb.conf
  • And then, paste the text below inside the smb.conf file:

[global]
workgroup = WORKGROUP
server string = AWS-EFS-Windows
netbios name = ubuntu
dns proxy = no
socket options = TCP_NODELAY

[efs]
path = /mnt/efs
read only = no
browseable = yes
guest ok = yes
writeable = yes

  • Create a Samba user/password. Use the same credentials as your EC2 user:
    • sudo smbpasswd -a ubuntu
  • Give Ubuntu user access to the mounted folder:
    • sudo chown ubuntu:ubuntu /mnt/efs/
  • And finally, restart the samba service:
    • sudo /etc/init.d/smbd restart

Setting Up the Application Server

Overview

Logi applications require setup in the form of settings, files, licenses, and more. In order to accommodate the elastic auto-scaling, we’ll set up one server – from creation to connecting to our shared drive to installing and configuring Logi – and then make an Amazon Machine Image (AMI) for use later.

Steps

  • Create EC2:
    • Follow the steps as outlined in this AWS Get Started guide and choose: Image: “Microsoft Windows Server 2016 Base”
    • Instance type: “t2.micro” or whatever type your application requires
  • Deploy code:
    • Clone your project repository and deploy the code in IIS
  • Set Up User Access:
    • Allow your application in IIS to access the shared folder (EFS) that we created inside the File server
    • From the control panel, choose users accounts → manage another account → add a user account
    • Use same username and password we created for the samba user in Ubuntu file server
    • In IIS, add the new Windows user we created above to the application connection pool: IIS → Application Pools → right-click your project’s application pool → Identity → Custom account → fill in the new username and password we created earlier.
  • Test EFS (shared folder) connection:
    • To test the connection between the Windows application server and the Ubuntu file server, go to:
    • This PC → Computer tab → Map network drive → in the Folder textbox, type “\\FILE_SERVER_IP_ADDRESS\efs” → if a credentials window appears, use the new username and password we created earlier.

Configuring the Logi Application

Sticky and Non-Sticky Sessions

In a standard environment with one server, a session is established with the first HTTP request and all subsequent requests, for the life of the session, will be handled by that same server. However, in a load-balanced or clustered environment, there are two possibilities for handling requests: “sticky” sessions (sometimes called session affinity) and “non-sticky” sessions.

With sticky sessions, all of a user’s HTTP requests are routed to the same server for the life of the session. Even so, you must create a centralized, shared location for cached data (the rdDataCache folder), saved Bookmark files, the _metaData folder, and saved Dashboard files, because they must be accessible to all servers in the cluster.
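As a quick illustration of session affinity, here is a hash-based routing sketch. This is a simplified stand-in, not how AWS ELB works (ELB implements stickiness with load-balancer-generated cookies, configured later in this guide); the server names are invented.

```python
import hashlib

servers = ["web-1", "web-2", "web-3"]  # hypothetical cluster nodes

def route(session_id):
    # Sticky: the hash of the session id always picks the same server,
    # so in-process ("InProc") session state on that server stays valid.
    h = int(hashlib.sha256(session_id.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

# Non-sticky routing (e.g. pure round-robin) could send each request to a
# different server, which is why shared files must live on a network path.
print(route("user-42") == route("user-42"))  # True
```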

Managing Session State

IIS is configured by default to manage session information using the “InProc” option. For both standalone and load-balanced, sticky environments, this option allows a single server to manage the session information for the life of the session.

Centralization of Application Resources

In a load-balanced environment, each web server must have Logi Server installed and properly licensed, and must have its own copy of the Logi application with its folder structure, system files, etc. This includes everything in the _SupportFiles folder such as images, style sheets, XML data files, etc., any custom themes, and any HTML or script files. We will achieve this by creating one instance with all the proper configurations, and then using an AMI.

Some application files should be centralized, which also allows for easier configuration management. These files include:

  • Definitions: Copies of report, process, widget, template, and any other necessary definitions (except _Settings) can be installed on each web server as part of the application, or centralized definitions may be used for easier maintenance (if desired).
    The location of definitions is configured in _Settings definition, using the Path element’s Alternative Definition Folder attribute, as shown above. This should be set to the UNC path to a shared network location accessible by all web servers, and the attribute value should include the _Definitions folder. Physically, within that folder, you should create the folders _Reports, _Processes, _Widgets, and _Templates as necessary. Do not include the _Settings definition in any alternate location; it must remain in the application folder on the web server as usual.
  • “Saved” Files: Many super-elements, such as the Dashboard and Analysis Grid, allow the user to save the current configuration to a file for later reuse. The locations of these files are specified in attributes of the elements.
    As shown in the example above, the Save File attribute value should be the UNC path to a shared network location (with file name, if applicable) accessible by all web servers.
  • Bookmarks: If used in an application, the location of these files should also be centralized:
    As shown above, in the _Settings definition, configure the General element’s Bookmark Folder Location attribute, with a UNC path to a shared network folder accessible by all web servers.

Using SecureKey Security

If you’re using Logi SecureKey security in a load-balanced environment, you need to configure security to share requests.

In the _Settings definition, set the Security element’s SecureKey Shared Folder attribute to a network path, as shown above. Files in the SecureKey folder are automatically deleted over time, so do not use this folder to store other files. You must create the rdSecureKey folder under the myProject shared folder yourself, since Logi does not create it automatically.

Note: “Authentication Client Addresses” must be replaced later with the subnet IP address ranges of the load balancer VPC after completing the load balancer setup below.

You can specify ranges of IP addresses with wildcards. To use wildcards, specify an IP address, the space character, then the wildcard mask. For example, to allow all addresses in the range 172.16.*.*, specify:

172.16.0.0 0.0.255.255
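A wildcard mask is the bitwise inverse of an ordinary subnet mask, so “172.16.0.0 0.0.255.255” covers the same addresses as 172.16.0.0/16. A quick sanity check with Python’s standard ipaddress module (the helper name is ours):

```python
import ipaddress

def wildcard_to_network(addr, wildcard):
    # Invert the wildcard mask to get the ordinary netmask, then build the network.
    netmask = ipaddress.IPv4Address(int(ipaddress.IPv4Address(wildcard)) ^ 0xFFFFFFFF)
    return ipaddress.IPv4Network(f"{addr}/{netmask}")

net = wildcard_to_network("172.16.0.0", "0.0.255.255")
print(net)                                          # 172.16.0.0/16
print(ipaddress.IPv4Address("172.16.31.7") in net)  # True
```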

Centralizing the Data Cache

The data cache repository is, by default, the rdDataCache folder in a Logi application’s root folder. In a standalone environment, where all the requests are processed by the same server, this default cache configuration is sufficient.

In a load-balanced environment, centralizing the data cache repository is required.

This is accomplished in Studio by editing a Logi application’s _Settings definition, as shown above. The General element’s Data Cache Location attribute value should be set to the UNC path of a shared network location accessible by all web servers. This change should be made in the _Settings definition for each instance of the Logi application (i.e. on each web server).

Note: The “mySharedFileServer” IP/DNS address should be replaced later with the file servers’ load-balancer DNS name after completing the load balancer setup below.

Creating and Configuring Your Load-Balancer

Overview

You’ll need to set up load balancers for both the Linux file server and the Windows application/web server. This process is relatively simple and is outlined below, and in the Getting Started guide here.

Steps

  • Windows application/web servers load balancer:
    • Use classic load balancers.
    • Use the same VPC that our Ubuntu file servers use.
    • Listener configuration: Keep defaults.
    • Health check configuration: Keep defaults and make sure the ping path exists, e.g. “/myProject/rdlogon.aspx”
  • Add Instances: Add all Windows web/application servers to the load balancer and check the status. All servers should show “InService” within 20-30 seconds.
    • To enable stickiness, select ELB > port configuration > edit stickiness > choose “enable load balancer generated cookie stickiness”, and set the expiration period as well.
  • Linux file servers load balancer:
    • Use classic load balancers.
    • Use the same VPC that the EFS volume uses.
    • Listener configuration:
    • Health check configuration: Keep defaults and make sure the ping path exists, e.g. “/index.html”
    • Note: A simple web application must be deployed to the Linux file servers in order to set up the health check. It should run inside a web container such as Tomcat; then point the health checker’s ping path at the deployed application path.
  • Add Instances: Add all Ubuntu file servers to the load balancer and check the status. All servers should show “InService” within 20-30 seconds.

Using Auto-Scaling

Overview

In order to achieve auto-scaling, you need to set up a launch configuration and an Auto Scaling group. You can follow the steps in the link here, or the ones outlined below.

Steps

  • Create Launch Configuration:
    • Search and select the AMI that you created above.
    • Use same security group you used in your app server EC2 instance. (Windows)
  • Create an Auto Scaling Group
    • Make sure to select the launch configuration that we created above.
    • Make sure to set the group size, i.e., how many EC2 instances you want in the auto scaling group at all times.
    • Make sure to use same VPC we used for the Windows application server EC2s.
  • Set the Auto scaling policies:
    • Set min/max size of the group:
    • Min: minimum number of instances that will be launched at all times.
    • Max: maximum number of instances that will be launched once a metric condition is met.
  • Click on “Scale the Auto Scaling group using step or simple scaling policies”
  • Set the required values for:
    • Increase group size
      • Make sure that you create a new alarm that will notify your auto scaling group when the CPU utilization exceeds certain limits.
      • Make sure that you specify the action “add” and the number of instances to add when the above alarm is triggered.
    • Decrease group size
      • Make sure that you create a new alarm that will notify your auto scaling group when the CPU utilization is below certain limits.
      • Make sure that you specify the action “remove” and the number of instances to remove when the above alarm is triggered.
      • You can set the warm up time for the EC2, if necessary. This will depend on whether you have any initialization tasks that run after launching the EC2 instance, and if you want to wait for them to finish before starting to use the newly created instance.
      • You can also add a notification service to know when any instance is launched, terminated, failed to launch or failed to terminate by the auto scaling process.
  • Add tags to the auto scaling group. You can optionally choose to apply these tags to the instances in the group when they launch.
  • Review your settings and then click on Create Auto Scaling Group.

We hope this detailed how-to guide was helpful in setting up your Logi application on AWS.

Please contact dbSeer if you have any questions or have any other how-to guide requests. We’re always happy to hear from you!


Originally Posted at: Logi Tutorial: Step-by-Step Guide for Setting Up a Logi Application on AWS

Paul Ballew(@Ford) on running global data science group #FutureOfData #Podcast


Youtube: https://youtu.be/__H9DtfnG54
iTunes: http://apple.co/2ktprer

In this podcast Paul Ballew (@Ford) talks about best practices for running a data science organization spanning multiple continents. He shares the importance of being smart, nice, and inquisitive in creating tomorrow’s workforce today. He sheds some light on the importance of appreciating culture when defining forward-looking policies. He also makes a case for non-native groups and discusses ways to implement data science as a central organization (with no hub-and-spoke model). This podcast is great for future data science leaders leading organizations with a broad consumer base and multiple geopolitical silos.

Paul’s Recommended Read:
The Outsiders Paperback – S. E. Hinton http://amzn.to/2Ai84Gl

Podcast Link:
iTunes: http://math.im/itunes
GooglePlay: http://math.im/gplay

Paul’s BIO:
Paul Ballew is vice president and Global Chief Data and Analytics officer, Ford Motor Company, effective June 1, 2017. At the same time, he also was elected a Ford Motor Company officer. In this role, he leads Ford’s global data and analytics teams for the enterprise.
Previously, Ballew was Global Chief Data and Analytics officer, a position to which he was named in December 2014. In this role, he has been responsible for establishing and growing the company’s industry-leading data and analytics operations, which drive significant business value throughout the enterprise.
Prior to joining Ford, he was Chief Data, Insight & Analytics Officer at Dun & Bradstreet. In this capacity, he was responsible for the company’s global data and analytic activities along with the company’s strategic consulting practice.
Previously, Ballew served as Nationwide’s senior vice president for Customer Insight and Analytics. He directed customer analytics, market research, and information and data management functions, and supported the company’s marketing strategy. His responsibilities included development of Nationwide’s customer analytics, data operations and strategy. Ballew joined Nationwide in November 2007 and established the company’s Customer Insights and Analytics capabilities.
Prior to joining Nationwide, Ballew served as General Motors Corporation’s executive director for Global Market and Industry Analysis. He was responsible for the company’s research, consumer data and information, forecasting and sales, and customer, economic and industry analytic functions. He also directed the company’s sales and marketing strategic planning activities in North America and served as a senior director for global product planning activities.
Prior to joining GM, Ballew was a partner for J.D. Power and Associates from 1995 to 1999. At J.D. Power, Paul was responsible for global analysis, forecasting and the establishment of the firm’s consulting activities. During his tenure, he was the company’s senior advisor on industry conditions and corporate strategies.
Before joining J.D. Power, Ballew was a research officer and senior economist with the Federal Reserve from 1988 to 1995, specializing in the automotive industry. His responsibilities included oversight of the Fed’s automotive research activities, serving as an advisor to the president of the Federal Reserve Bank of Chicago, and advising the Board of Governors.
Ballew sits on the boards of Neustar, Inc. and Hyatt Hotels Corporation. He was born in 1964 and has bachelor’s and master’s degrees in Economics from the University of Detroit.

About #Podcast:
#FutureOfData podcast is a conversation starter to bring leaders, influencers and lead practitioners to come on show and discuss their journey in creating the data driven future.

Wanna Join?
If you or anyone you know wants to join in,
Register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor?
Email us @ info@analyticsweek.com

Keywords:
#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Source: Paul Ballew(@Ford) on running global data science group #FutureOfData #Podcast

Jul 18, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


[  COVER OF THE WEEK ]

Data shortage


[ AnalyticsWeek BYTES]

>> Jul 27, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

>> Are APIs becoming the keys to customer experience? by analyticsweekpick

>> Could these 5 big data projects stop climate change? by analyticsweekpick


[ FEATURED COURSE]

Process Mining: Data science in Action


Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be ap… more

[ FEATURED READ]

The Misbehavior of Markets: A Fractal View of Financial Turbulence


Mathematical superstar and inventor of fractal geometry, Benoit Mandelbrot, has spent the past forty years studying the underlying mathematics of space and natural patterns. What many of his followers don’t realize is th… more

[ TIPS & TRICKS OF THE WEEK]

Keeping Biases Checked during the last mile of decision making
Today a data-driven leader, data scientist, or data-driven expert is constantly put to the test by helping their team solve problems using their skills and expertise. Believe it or not, part of that decision tree is derived from intuition, which adds a bias to our judgment and taints the suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we give in to tiny traps and can find ourselves caught in biases that impair our judgment. So it is important to keep intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q:Explain selection bias (with regard to a dataset, not variable selection). Why is it important? How can data management procedures such as missing data handling make it worse?
A: * Selection of individuals, groups or data for analysis in such a way that proper randomization is not achieved
Types:
– Sampling bias: systematic error due to a non-random sample of a population causing some members to be less likely to be included than others
– Time interval: a trial may be terminated early at an extreme value (for ethical reasons), but the extreme value is likely to be reached by the variable with the largest variance, even if all the variables have similar means
– Data: “cherry picking,” when specific subsets of the data are chosen to support a conclusion (e.g. citing examples of plane crashes as evidence that airline flight is unsafe, while ignoring the far more common flights that complete safely)
– Studies: performing experiments and reporting only the most favorable results
– Can lead to inaccurate or even erroneous conclusions
– Statistical methods can generally not overcome it

Why does data handling make it worse?
– Example: individuals who know or suspect that they are HIV positive are less likely to participate in HIV surveys
– Missing-data handling will increase this effect, since imputation will be based mostly on HIV-negative respondents
– Prevalence estimates will be inaccurate
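A small simulation makes the HIV-survey example concrete (all rates here are made up for illustration): when positives respond less often, the naive survey estimate understates the true prevalence, and imputing missing answers from responders only locks the bias in.

```python
import random

random.seed(1)
TRUE_PREVALENCE = 0.10
N = 100_000

responses = []
for _ in range(N):
    positive = random.random() < TRUE_PREVALENCE
    # Nonresponse is not random: positives answer the survey less often.
    answers = random.random() < (0.40 if positive else 0.90)
    responses.append(positive if answers else None)

observed = [r for r in responses if r is not None]
naive_estimate = sum(observed) / len(observed)

# "Handling" the missing data by imputing the responders' rate changes nothing:
imputed = [r if r is not None else (random.random() < naive_estimate)
           for r in responses]
imputed_estimate = sum(imputed) / len(imputed)

print(round(naive_estimate, 3))    # well below the true 0.10
print(round(imputed_estimate, 3))  # still biased low
```

The expected naive estimate here is P(positive | responded) = (0.10 × 0.40) / (0.10 × 0.40 + 0.90 × 0.90) ≈ 0.047, less than half the true prevalence, and no amount of imputation from the responders can recover it.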

Source

[ VIDEO OF THE WEEK]

Pascal Marmier (@pmarmier) @SwissRe discusses running data driven innovation catalyst


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

You can have data without information, but you cannot have information without data. – Daniel Keys Moran

[ PODCAST OF THE WEEK]

@AlexWG on Unwrapping Intelligence in #ArtificialIntelligence #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

In the developed economies of Europe, government administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone by using big data, not including using big data to reduce fraud and errors and boost the collection of tax revenues.

Sourced from: Analytics.CLUB #WEB Newsletter

Jul 11, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


[  COVER OF THE WEEK ]

Statistics


[ AnalyticsWeek BYTES]

>> Aureus at InsureTech Connect 2017, Las Vegas by analyticsweek

>> Are You Headed for the Analytics Cliff? by analyticsweek

>> Big data: The critical ingredient by analyticsweekpick


[ FEATURED COURSE]

Deep Learning Prerequisites: The Numpy Stack in Python


The Numpy, Scipy, Pandas, and Matplotlib stack: prep for deep learning, machine learning, and artificial intelligence… more

[ FEATURED READ]

The Future of the Professions: How Technology Will Transform the Work of Human Experts


This book predicts the decline of today’s professions and describes the people and systems that will replace them. In an Internet society, according to Richard Susskind and Daniel Susskind, we will neither need nor want … more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from a zombie apocalypse of unscalable models
One living, breathing zombie in today’s analytical models is the conspicuous absence of error bars. Not every model is scalable or holds its ground as data grows. Error bars, which should be attached to almost every model, must be duly calibrated. As business models rake in more data, the error bars keep them sensible and in check. If error bars are not accounted for, we make our models susceptible to failure, leading us to a Halloween we never want to see.

[ DATA SCIENCE Q&A]

Q:What is the Law of Large Numbers?
A: * A theorem that describes the result of performing the same experiment a large number of times
* Forms the basis of frequency-style thinking
* It states that the sample mean, sample variance, and sample standard deviation converge to the population quantities they estimate as the sample size grows
* Example: roll a fair die; the expected value is 3.5. Over a large number of rolls, the sample average converges to 3.5
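The die example above can be simulated in a few lines; as a sketch, the running average of simulated rolls approaches the expected value of 3.5 as the number of rolls grows:

```python
import random

def average_die_rolls(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and return the sample mean."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

# The sample mean drifts toward the expected value of 3.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, average_die_rolls(n))
```

With a small n the average can land far from 3.5; with 100,000 rolls it is reliably within a few hundredths, which is the Law of Large Numbers at work.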

Source

[ VIDEO OF THE WEEK]

@Schmarzo @DellEMC on Ingredients of healthy #DataScience practice #FutureOfData #Podcast


[ QUOTE OF THE WEEK]

Data are becoming the new raw material of business. – Craig Mundie

[ PODCAST OF THE WEEK]

#FutureOfData with @CharlieDataMine, @Oracle discussing running analytics in an enterprise


[ FACT OF THE WEEK]

571 new websites are created every minute of the day.

Sourced from: Analytics.CLUB #WEB Newsletter

BARC Survey Shows New Benefits from Embedded Analytics

Application teams are embedding analytics in their products at an increasingly rapid pace. More than 85 percent of application teams have embedded dashboards, reports, and analytics in their software products, according to Logi’s 2018 State of Embedded Analytics Report. And they’re seeing value from their efforts: 92 percent of respondents say enhancing their products with analytics has increased competitive differentiation, with over 90 percent crediting it with improving win rates, increasing adoption, and reducing customer churn.

Now a new survey from the Business Application Research Center (BARC) indicates even more value may come from embedded analytics. According to The BI Survey 2018, the world’s largest annual survey of business intelligence (BI) software users, companies that encourage more users to adopt BI also see additional business benefits from their BI projects.

Related: New Study: Top 3 Trends in Embedded Analytics

“Companies claiming to have achieved the most benefit from their BI tools (‘Best-in-Class’) have on average nine percent more BI users than those achieving the least benefit (‘Laggards’), suggesting that there is a relationship between the number of BI users and the degree of benefits an organization gains,” writes BARC in the report. “This should provide an incentive for businesses to maximize BI tool penetration and train as many employees as possible to use their BI tool.”

If more BI users means more business benefits, the natural question becomes: how do you get more BI users? As Logi’s own data from the 2018 State of Embedded Analytics Report shows, the best way to increase adoption of BI tools is to embed analytics in the applications people already use. (Chart: adoption of standalone vs. embedded BI.)

In fact, embedded analytics sees twice the adoption rates of standalone BI solutions. Why? Because business users want to stay in one place, not jump from application to application to get what they need. In the 2017 State of Analytics Adoption Report, over 83 percent of business professionals expressed a strong desire to stay in one application, when and where a decision is needed, instead of wasting precious time switching applications. People clearly want their information in context of where they work, and embedded analytics delivers on this need.

According to our survey, 67 percent of application teams say time spent in their applications increased after they embedded analytics. On top of that, they cite the substantial business benefits of embedding analytics:

  • 96 percent of companies said embedded analytics contributes to overall revenue growth
  • 94 percent said it boosts user satisfaction
  • 93 percent said they’ve improved user experiences

 

Ready to embed analytics in your application? Gartner outlines best practices on evaluating solutions in its analyst paper, “5 Best Practices for Choosing an Embedded Analytics Platform Provider.”

 

Source by analyticsweek

Guide to business intelligence and health IT analytics

large_article_im1353_Data_Analytics

Introduction
Technology is frequently used as a tool through which healthcare providers and their IT departments can monitor and improve the business and personal performance of every aspect of their organization. For example, an analytics program that is deployed to examine a patient population’s medical data can then become the starting point for a provider’s business intelligence program. The results found by mining patient data can inform future care decisions and help the IT team discover any technology-related operational malfunctions.

There’s no doubt technology can be a valuable asset to healthcare practitioners when used properly, but convincing them to use new technology hasn’t been a cinch. Some physicians neglect clinical decision support tools in favor of consulting a colleague. Installing new technology that contains patient data also creates additional security concerns for healthcare organizations. The ability of new technology to analyze data without improperly exposing protected health information will be key to determining how much it can improve the delivery of healthcare.

1. Business intelligence
Applications of healthcare business intelligence

There is more data than ever for healthcare providers to use to maximize their operational efficiency. Information derived from social media and captured on patients’ mobile health devices are two examples. This section covers how providers are using business intelligence tools to analyze data and improve the experience of their patients. Business intelligence through cloud computing is an option for providers, but it comes with its own set of security issues.

Tip
Discover how providers apply business intelligence to big data

Social media is yet another source of data through which providers can monitor patients and health trends. Learn how they can apply this data to their business goals. Continue Reading

Tip
Business advantages of cloud have the attention of healthcare organizations

Security is a particularly strong concern for healthcare organizations that deploy cloud services. Continue Reading

Tip
Five keys to mastering healthcare business intelligence

A successful business intelligence program starts with good data. What’s required to turn that data into meaningful analysis may be a surprise. Continue Reading

Tip
Tips for patching analytics, business intelligence errors

Find out why healthcare analytics and business intelligence technology can fail, even after those systems are up and running. Continue Reading

Tip
Boost in computing power multiplies power of health IT

Cloud computing and artificial intelligence are only two of the business intelligence tools that are molding the future of healthcare. Continue Reading
2. Analytics at the point of care
Clinical decision support and health IT analytics

How can providers mine health data for information without exposing patients’ private information? That important question is examined in this section of the guide. Also, learn why some physicians have accepted the analysis provided to them via clinical decision support tools and why others still refuse to consult this form of technology for a second opinion when making a decision about a patient’s care. Like every other form of technology, healthcare analytics resources are only as good as their security and backup measures allow them to be. A cybersecurity expert explains how to approach protecting your health IT department from today’s threats.

News
The ups and downs of clinical decision support

A years-long government-sponsored study turned up some surprising results about the efficacy of analytics. Continue Reading

Podcast
Cybersecurity pro examines threats in healthcare

Analytics are no use unless healthcare organizations protect their data. Mac McMillan dishes out advice on what security precautions to take. Continue Reading

Tip
Privacy a top concern during clinical analysis

An analytics expert explains under which circumstances a patient’s identifying information should be available. Continue Reading

Tip
Analytics becoming a way of life for providers

Discover why more healthcare organizations are using analytics tools to keep up with regulatory changes. Continue Reading

Tip
Real-time analytics slowly working its way into patient care

Find out why physicians are wary of becoming too reliant on clinical decision support tools. Continue Reading

Tip
Quantity of data challenges healthcare analysts

There are a few simple steps health IT analysts should follow when examining data they are unfamiliar with. Continue Reading

Tip
Analytics backups preserve clinical decision support

Too many healthcare organizations take analytics for granted and don’t realize what would happen to their workflows if their backups failed. Continue Reading
3. Population health management
How technology controls population health

Population health management, or the collective treatment of a group of patients, is an area that has matured along with the use of technology in healthcare. Though technology has come a long way, there are still hurdles, including those involving the exchange of health information among care facilities, that are causing hospitals to achieve treatment advances at different rates. This section contains information on why participating in an accountable care organization is one way for healthcare providers to commit to improving their population’s health and why that commitment has proven elusive for some.

Feature
Population health management in the home

Find out when healthcare services could become part of your cable bill. Continue Reading

Tip
Accountable care progress held up by technology

Technology that supports health information exchange is being adopted at a plodding rate. Learn why this is affecting accountable care organizations. Continue Reading

Feature
Karen DeSalvo, M.D. explains her public health mission

Karen DeSalvo goes into why public health goals shouldn’t be brushed aside. Continue Reading

Feature
Clinical decision support education a must

Too many physicians still don’t know how to use clinical decision support technology to their advantage. Continue Reading

Podcast
Chief information officer walks through population health process

A CIO of a New Jersey hospital system shares his organization’s technology-based population health plan and how it will lead them to accountable care. Continue Reading

Feature
National health IT coordinator talks population health

The head of the Office of the National Coordinator for Health IT explains her career background and the early days of her government tenure. Continue Reading

Note: This article originally appeared in TechTarget. Click for link here.

Originally Posted at: Guide to business intelligence and health IT analytics by analyticsweekpick

Jul 04, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


[  COVER OF THE WEEK ]

Fake data (cover image)


[ FEATURED COURSE]

Pattern Discovery in Data Mining


Learn the general concepts of data mining along with basic methodologies and applications. Then dive into one subfield in data mining: pattern discovery. Learn in-depth concepts, methods, and applications of pattern disc… more

[ FEATURED READ]

The Misbehavior of Markets: A Fractal View of Financial Turbulence


Mathematical superstar and inventor of fractal geometry, Benoit Mandelbrot, has spent the past forty years studying the underlying mathematics of space and natural patterns. What many of his followers don’t realize is th… more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards, and talent has long been the bottleneck to comparable enterprise adoption. One primary reason is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up and create awareness within the organization. An aware organization goes a long way toward quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q:Explain what a long-tailed distribution is and provide three examples of relevant phenomena that have long tails. Why are they important in classification and regression problems?
A: * In long tailed distributions, a high frequency population is followed by a low frequency population, which gradually tails off asymptotically
* Rule of thumb: the majority of occurrences (more than half, and, when the Pareto principle applies, 80%) are accounted for by the first 20% of items in the distribution
* The least frequently occurring 80% of items can nonetheless account for a substantial proportion of the total population
* Zipf’s law, Pareto distribution, power laws

Examples:
1) Natural language
– Given a corpus of natural language, the frequency of any word is inversely proportional to its rank in the frequency table
– The most frequent word occurs twice as often as the second most frequent, three times as often as the third most frequent, and so on
– ‘The’ accounts for about 7% of all word occurrences (roughly 70,000 per million)
– ‘of’ accounts for about 3.5%, followed by ‘and’…
– Only 135 vocabulary items are needed to account for half the English corpus!

2) Allocation of wealth among individuals: the larger portion of the wealth of any society is controlled by a smaller percentage of the people

3) File size distribution of Internet traffic

Additional: Hard disk error rates, values of oil reserves in a field (a few large fields, many small ones), sizes of sand particles, sizes of meteorites

Importance in classification and regression problems:
– Skewed distribution
– Which metrics to use? Accuracy paradox (classification), F-score, AUC
– Issue when using models that assume linearity (e.g., linear regression): apply a monotone transformation to the data (logarithm, square root, sigmoid function…)
– Issue when sampling: your data becomes even more unbalanced! Use stratified sampling instead of random sampling, SMOTE (‘Synthetic Minority Over-sampling Technique’, NV Chawla), or an anomaly-detection approach
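The accuracy paradox mentioned above can be sketched on a long-tailed (imbalanced) label distribution with made-up data: a classifier that always predicts the majority class scores high accuracy while never identifying a single rare case. The function name and the 2% positive rate are illustrative assumptions:

```python
import random

def make_imbalanced_labels(n, positive_rate=0.02, seed=7):
    """Binary labels with a rare positive class, mimicking a long-tailed problem."""
    rng = random.Random(seed)
    return [1 if rng.random() < positive_rate else 0 for _ in range(n)]

labels = make_imbalanced_labels(10_000)

# The constant "always predict 0" classifier: its accuracy equals the
# majority-class share, yet its recall on the rare class is zero.
accuracy = labels.count(0) / len(labels)
recall_on_positives = 0.0  # it never predicts the positive class

print(f"accuracy: {accuracy:.1%}, positive-class recall: {recall_on_positives:.0%}")
```

Accuracy comes out near 98% despite the model being useless on the minority class, which is why skewed problems call for metrics like the F-score or AUC and for resampling strategies such as stratified sampling or SMOTE.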

Source

[ VIDEO OF THE WEEK]

@JohnNives on ways to demystify AI for enterprise #FutureOfData #Podcast


[ QUOTE OF THE WEEK]

You can have data without information, but you cannot have information without data. – Daniel Keys Moran

[ PODCAST OF THE WEEK]

@JohnTLangton from @Wolters_Kluwer discussed his #AI Lead Startup Journey #FutureOfData #Podcast


[ FACT OF THE WEEK]

As recently as 2009 there were only a handful of big data projects and total industry revenues were under $100 million. By the end of 2012 more than 90 percent of the Fortune 500 will likely have at least some big data initiatives under way.

Sourced from: Analytics.CLUB #WEB Newsletter

Enhancing CRM with Real-Time, Distributed Data Integrations

The world of CRM is relatively slow to change. These repositories still excel at making available numerous types of largely historical data based on user accounts, frequently asked questions, manual notes, and databases containing these and other forms of customer information.

According to UJET CEO Anand Janefalkar, however, CRM is much less effective for real-time data, particularly data spawned from heterogeneous settings involving contemporary applications of the Internet of Things and mobile technologies: “It just takes a different level of focus to not only reduce the latency, not only shift its intent, but also have a specific focus on real-time interactions and user experience.”

Nonetheless, a number of contemporary developments within the CRM space (and customer service in general) are designed to enrich the customer experience, and the service organizations provide their endpoint customers, at velocities comparable to those of modern mobile and big data technologies.

Prudent usage of these mechanisms produces “bi-directional, smart, high-bandwidth communication so that way, there are no artificial limits, and there’s all of the options available for someone to really curate and configure their vision of the user journey,” Janefalkar mentioned.

Embedded, Cloud-Based Data Integrations
Embedding contemporary contact center options, typically in the form of widgets or adaptors, inside of CRM makes them suddenly viable for a host of real-time data sources. Many contact center solutions are facilitated through the cloud, so that they offer omni-channel experiences in which users can communicate with the enterprise via text, chats, mobile apps, web sites, phone calls, and just about any other form of electronic communication. Highly competitive platforms “design an extremely meticulous user experience to enable agents and customers to communicate visually and contextually,” Janefalkar said.

By embedding the adaptors for these solutions into CRM, organizations can now make available an assortment of low-latency data that would otherwise have proved too arduous to assemble quickly enough, and that can drastically improve customer service. Examples of these data sources include “photos, videos, screenshots, sensor data, [which] is either requested by an agent or sent from the web site or the smart phone app to the agent,” Janefalkar revealed. “All of that gets stored into the CRM in real time.” With this approach, CRM is suddenly equipped with a multitude of largely unstructured data to associate with specific customers.

Decentralized Use Cases
The practical business value of enhancing CRM with low-latency data integrations from distributed sources varies by vertical, yet is almost always demonstrable. Perhaps the most significant aspect of this methodology is that it enables low-latency integration of data from distributed sources outside enterprise firewalls. In insurance, for example, if a customer gets into a fender bender, he or she can use a mobile application to present digital identification to the law enforcement officer summoned, then inform the process with contextual and visual information regarding the encounter. This information might include photographs or even videos of the scene, all of which is transmitted alongside any other digital information attained at the time (such as the other party’s contact and insurance information), and embedded into “the case or the ticket,” Janefalkar said.

The resulting workflow efficiency contributes to faster resolutions and better performance because “when the first agent is contacted, they take this information and it gets logged into the CRM,” Janefalkar explained. “And then, that gets passed over to a claims assessor. All that information’s already there. The claims assessor doesn’t have to call you back and ask the same questions, ask you to send an email with the photos that you have. Obviously, since it’s after the fact you wouldn’t have access to a video of the site, because you may not have taken it.”

Visual and Contextual Data Integrations
The rapid integration of visual and contextual decentralized data inside CRM to expedite and improve customer service is also an integral approach to handling claims of damaged or incorrect items from e-commerce sites, and there is a wide range of applicability in other verticals as well.

The true power of these rapid integrations of data within CRM is that they expand the utility of these platforms, effectively modernize them at the pace of contemporary business, and “make them even better by providing a deep integration into the CRMs so that all of the data and business rules are fetched in real time, so that the agent doesn’t have to go back and forth between different tabs or windows,” Janefalkar said. “But also, when the conversation is done and then the photos and the secure information, they’re not going through any different source. It gets completely archived from us and put back into the source of truth, which usually is the CRM.”

Source: Enhancing CRM with Real-Time, Distributed Data Integrations by jelaniharper