5 tips to becoming a big data superhero


Who’s the most powerful superhero?

Rishi Sikka, MD has a favorite and it’s one most people have probably never even heard of: Waverider.

Sikka, senior vice president of clinical transformation at Advocate Healthcare, considers Waverider the most powerful superhero because he can surf the time stream and predict the future.

Leading up to his presentation here at the Healthcare IT News Big Data and Healthcare Analytics Forum in New York, Sikka looked up the word “hero” and found that it has existed for millennia — it was even used prior to tongues we can trace — and the root concept is “to protect.”

Based on that root meaning, and with a focus on population health management in mind, Sikka shared a fistful of tips for becoming a big data superhero.

1. Your power is looking to the future, but your strength lies in the past and present. Healthcare professionals and organizations must therefore assemble the data necessary to understand their current state, including knowing as much as possible about their patients.

2. Pick your battles wisely. “All the great superheroes know when it’s time to move on,” Sikka said, pointing to the need for risk stratification and strategic resource allocation, which is “where big data and population health intersect.”

3. Your enemy has a name – and it’s regression to the mean. “I know it’s not very sexy,” Sikka said of that enemy. He recommended that healthcare organizations consider the impactability of what they are doing – that is, focus on where they can have the biggest impact. “I hope impactability will become a buzzword in the next year or two.”

4. Your superhero name is not … Cassandra. “It’s a lovely name,” Sikka explained, “just don’t pick it as a superhero name.” Why not? In Greek mythology, Cassandra, daughter of Apollo and a mortal mother, could predict the future. That was the blessing. The curse: Nobody believed her. “We don’t want population health to be an academic exercise.”

5. Don’t forget your mission. Every superhero is out fighting the bad guys, saving humanity, but sometimes even they can forget why they’re on this earth. “When we talk about population health we talk a lot about cost. We talk about bending the cost curve,” he added, “but I don’t know a single person on the front lines of care who gets jazzed up to bend the cost curve. The work revolves fundamentally around health.” Sikka suggested that healthcare professionals work to steer the dialogue back to clinical outcomes and wellness.

Sikka wound back around to the root of the word hero: “Our goal with respect to analytics, big data, population health,” he said, “is to protect, aid, support, those who give and receive care.”

To read the original article on Healthcare IT News, click here.

Originally Posted at: 5 tips to becoming a big data superhero

Sep 13, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data security

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> How to Use MLflow, TensorFlow, and Keras with PyCharm by analyticsweek

>> The Upper Echelons of Cognitive Computing: Deriving Business Value from Speech Recognition by jelaniharper

>> 20 Best Practices for Customer Feedback Programs: Business Process Integration by bobehayes

Wanna write? Click Here

[ NEWS BYTES]

>> Senior Analyst, Marketing Analytics – Built In Chicago, under Marketing Analytics

>> Global Financial Analytics Market 2017-2026 By Raw Materials, Manufacturing Expenses And Process Analysis – DailyHover, under Financial Analytics

>> 5 Tactics That Separate Analytics Leaders From Followers – Forbes, under Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

CS109 Data Science


Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data managem… more

[ FEATURED READ]

Superintelligence: Paths, Dangers, Strategies


The human brain has some capabilities that the brains of other animals lack. It is to these distinctive capabilities that our species owes its dominant position. Other animals have stronger muscles or sharper claws, but … more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics to help with our decision making. Data- and analytics-driven decision making is rapidly working its way into our core corporate DNA, yet we are not churning out practice grounds to test those models fast enough. Snug-looking models can hide nails that inflict uncharted pain if they go unchecked. This is the right time to start thinking about putting an Analytics Club [Data Analytics CoE] in your workplace to lab out best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q: Explain selection bias (with regard to a dataset, not variable selection). Why is it important? How can data management procedures such as missing data handling make it worse?
A: * Selection of individuals, groups or data for analysis in such a way that proper randomization is not achieved
Types:
– Sampling bias: systematic error due to a non-random sample of a population causing some members to be less likely to be included than others
– Time interval: a trial may be terminated early at an extreme value (for ethical reasons), but the extreme value is likely to be reached by the variable with the largest variance, even if all the variables have similar means
– Data: “cherry picking”, when specific subsets of the data are chosen to support a conclusion (e.g. citing plane crashes as evidence that air travel is unsafe, while ignoring the far more common flights that complete safely)
– Studies: performing experiments and reporting only the most favorable results
– Can lead to inaccurate or even erroneous conclusions
– Statistical methods can generally not overcome it

Why can missing data handling make it worse?
– Example: individuals who know or suspect that they are HIV positive are less likely to participate in HIV surveys
– Missing data handling amplifies this effect, since the data that remains comes mostly from HIV-negative individuals
– Prevalence estimates will therefore be inaccurate
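A minimal simulation of the HIV-survey example above, in Python with made-up numbers, shows how complete-case analysis (simply dropping non-respondents) inherits the selection bias and pulls the prevalence estimate well below the true value:

import numpy as np

rng = np.random.default_rng(42)

n = 100_000
true_prevalence = 0.10
hiv_positive = rng.random(n) < true_prevalence

# Assumed response rates: HIV-positive individuals are less likely to take part
p_respond = np.where(hiv_positive, 0.30, 0.80)
responded = rng.random(n) < p_respond

# "Handling" the missing data by dropping non-respondents keeps the bias
observed = hiv_positive[responded]
print(f"True prevalence:        {hiv_positive.mean():.3f}")
print(f"Complete-case estimate: {observed.mean():.3f}")   # roughly 0.04, not 0.10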

Source

[ VIDEO OF THE WEEK]

#GlobalBusiness at the speed of The #BigAnalytics


[ QUOTE OF THE WEEK]

I keep saying that the sexy job in the next 10 years will be statisticians. And I’m not kidding. – Hal Varian

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Nathaniel Lin (@analytics123), @NFPA


[ FACT OF THE WEEK]

More than 200bn HD movies – which would take a person 47m years to watch.

Sourced from: Analytics.CLUB #WEB Newsletter

Big universe, big data, astronomical opportunity

Open cluster Messier 39 in the constellation Cygnus — Image by © Alan Dyer/Stocktrek Images/Corbis

Astronomical data is, and has always been, “big data”. Once that was only true metaphorically; now it is true in all senses. We acquire it far more rapidly than we can process, analyse and exploit it. This means we are creating a vast global repository that may already hold answers to some of the fundamental questions about the Universe that we are seeking.

Does this mean we should cancel our upcoming missions and telescopes – after all, why continue to order food when the table is replete? Of course not. What it means is that, while we continue our inevitable yet budget-limited advancement into the future, we must also do justice to the data we have already acquired.

In a small way we are already doing this. Consider citizen science, where public participation in the analysis of archived data increases the possibility of real scientific discovery. It’s a natural evolution, giving those with spare time on their hands the chance to advance scientific knowledge.

However, soon this will not be sufficient. What we need is a new breed of professional astronomy data-miners eager to get their hands dirty with “old” data, with the capacity to exploit more readily the results and findings.

Thus far, human ingenuity, and current technology have ensured that data storage capabilities have kept pace with the massive output of the electronic stargazers. The real struggle is now figuring out how to search and synthesize that output.

The greatest challenges for tackling large astronomical data sets are:

Visualisation of astronomical datasets
Creation and utilisation of efficient algorithms for processing large datasets
The efficient development of, and interaction with, large databases
The use of “machine learning” methodologies

The challenges unique to astronomical data are borne out of the characteristics of big data – the three Vs: volume (the amount of data), variety (the complexity of the data and the sources it is gathered from) and velocity (the rate of data and information flow). It is a problem that is getting worse.

In 2004, the data I used for my Masters had been acquired in the mid-1990s by the United Kingdom Infra-Red Telescope (UKIRT) in Hawaii. In total it amounted to a few tens of gigabytes.

Moving on just a matter of months to my PhD, I was studying data taken from one of the most successful ground-based surveys in the history of astronomy, the Sloan Digital Sky Survey (SDSS). The volume of data I had to cope with was orders of magnitude greater.

SDSS entered routine operations in 2000. At the time of Data Release 12 (DR12) in July 2014 the total volume of that release was 116TB. Even this pales next to the Large Synoptic Survey Telescope (LSST). Planned to enter operation in 2022, it is aiming to gather 30TB a night.

To make progress with this massive data set, astronomy must embrace a new era of data-mining techniques and technologies. These include the application of artificial intelligence, machine learning, statistics, and database systems, to extract information from a data set and transform it into an understandable structure for further use.

Now while many scientists find themselves focused on solving these issues, let’s just pull back a moment and ask the tough questions. For what purpose are we gathering all this new data? What value do we gain from just collecting it? For that matter, have we learned all that we can from the data that we have?

It seems that the original science of data, astronomy, has a lot to learn from the new kid on the block, data science. Think about it. What if, as we strive to acquire and process more photons from across the farther reaches of the universe, from ever more exotic sources with ever more complex instrumentation, somewhere in a dusty server on Earth the answers are already here – if we would only pick up that dataset and look at it … possibly for the first time.

Dr Maya Dillon is the community manager for Pivigo. The company supports analytical PhDs making the transition into the world of Data Science and also runs S2DS: Europe’s largest data science boot-camp.

To read the original article on The Guardian, click here.

Source: Big universe, big data, astronomical opportunity by analyticsweekpick

Sep 06, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Big Data knows everything

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Google Cloud security updates for SEO before 2018 GDPR to change business data interactions! by thomassujain

>> Jul 27, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

>> Seven ways predictive analytics can improve healthcare by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>> UI data science degree sees rise in enrollment as chance of employment soars – Daily Illini, under Data Science

>> Know What will be the Future Scenario of Hadoop-as-a-Service Market – Truthful Observer, under Hadoop

>> IoT Time Podcast S.3 Ep.24 Smart City of San Antonio – IoT Evolution World (blog), under IOT

More NEWS ? Click Here

[ FEATURED COURSE]

Data Mining


Data that has relevance for managerial decisions is accumulating at an incredible rate due to a host of technological advances. Electronic data capture has become inexpensive and ubiquitous as a by-product of innovations… more

[ FEATURED READ]

Thinking, Fast and Slow


Drawing on decades of research in psychology that resulted in a Nobel Prize in Economic Sciences, Daniel Kahneman takes readers on an exploration of what influences thought example by example, sometimes with unlikely wor… more

[ TIPS & TRICKS OF THE WEEK]

Keeping Biases Checked during the last mile of decision making
Today a data-driven leader, data scientist or data-driven expert is constantly put to the test by helping their team solve problems using their skills and expertise. Believe it or not, part of that decision tree is derived from intuition, which adds a bias to our judgement and taints the suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we give in to tiny traps and find ourselves caught in biases that impair our judgement. So it is important that we keep the intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q: When would you use random forests vs SVM, and why?
A: * In the case of a multi-class classification problem: SVM will require a one-against-all approach (memory intensive)
* If one needs to know variable importance (random forests provide it directly)
* If one needs to get a model fast (an SVM takes long to tune: you need to choose the appropriate kernel and its parameters, for instance sigma and epsilon)
* In a semi-supervised learning context (random forest with a dissimilarity measure): SVM can only work in a supervised learning mode
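A rough sketch in Python (scikit-learn, using an arbitrary toy dataset) of two of the points above: a random forest exposes variable importance directly and needs little tuning, while an SVM requires a kernel/parameter search, which is the slow part in practice:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)          # a small 3-class problem
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("RF accuracy:", rf.score(X_te, y_te))
print("RF feature importances:", rf.feature_importances_)

# Note: scikit-learn's SVC handles multi-class via one-vs-one internally;
# the kernel and regularization parameters still have to be searched.
svm = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1]}, cv=5)
svm.fit(X_tr, y_tr)
print("SVM accuracy:", svm.score(X_te, y_te), "best params:", svm.best_params_)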

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek: Big Data Health Informatics for the 21st Century: Gil Alterovitz


[ QUOTE OF THE WEEK]

Processed data is information. Processed information is knowledge. Processed knowledge is wisdom. – Ankala V. Subbarao

[ PODCAST OF THE WEEK]

@BrianHaugli @The_Hanover ?on Building a #Leadership #Security #Mindset #FutureOfData #Podcast


[ FACT OF THE WEEK]

571 new websites are created every minute of the day.

Sourced from: Analytics.CLUB #WEB Newsletter

2017 Trends in the Internet of Things

The Internet of Things is lurching forward into the coming year like never before. Its growth is manifesting rapidly, even exponentially, with an increasingly broad array of use cases and applications influencing verticals well removed from its conventional patronage in the industrial internet.

With advances throughout the public and private sectors, its sway is extending beyond retail and supply chain management to encompass facets of delivery route optimization, financial services, healthcare and the marked expansion of the telecommunication industry in the form of connected cities and connected cars.

An onset of technological approaches, some novel, some refined, will emerge in the coming year to facilitate the analytics and security functionality necessary to solidify the IoT’s impact across the data sphere with a unique blend of big data, cloud, cognitive computing and processing advancements for customized applications of this expressivity of IT.

The result will be a personalization of business opportunities and consumer services veering ever closer to laymen users.

Speed of Thought Analytics
The interminable sensor generation and streaming of data foundational to the IoT warrants a heightened analytic productivity facilitated in a variety of ways. Surmounting the typical schema constraints germane to the relational world can involve semantic technologies with naturally evolving models to accommodate time-sensitive data. Other techniques involve file formats capable of deriving schema on the fly. “Self-describing formats is the umbrella,” MapR Senior Vice President of Data and Applications Jack Norris reflected. “There are different types of files that kind of fall into that, such as JSON and Avro.” Still other approaches involve Graphics Processing Units (GPUs), which have emerged as a preferable alternative to conventional Central Processing Units (CPUs) to enable what Kinetica VP of Global Solution Engineering Eric Mizell referred to as answering questions at “the speed of thought”—in which organizations are not limited by schema and indexing designs for the number, speed, and type of questions provisioned by analytics in real-time.
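To make the “self-describing formats” point concrete, here is an illustrative Python sketch (the sensor records are invented): with JSON Lines the schema travels with the records and can be derived on read, rather than being declared up front as in a relational table.

import io
import pandas as pd

sensor_stream = io.StringIO(
    '{"device": "pump-1", "ts": "2017-01-05T10:00:00", "temp_c": 71.2}\n'
    '{"device": "pump-2", "ts": "2017-01-05T10:00:01", "temp_c": 69.8, "vibration": 0.4}\n'
)

# pandas infers columns and types from the records themselves; the extra
# "vibration" field simply becomes a new column with missing values elsewhere.
df = pd.read_json(sensor_stream, lines=True)
print(df.dtypes)
print(df)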

According to Mizell, GPUs are “purpose-built for repetitive tasks at parallel with thousands of cores for aggregation, mathematics, and those type of things” whereas CPUs are better for discrete, sequential operations. Analytics platforms—particularly those designed for the IoT—leveraging GPUs are not bound by schema and rigid indexing, allowing multiple questions to be asked at a pace commensurate with the speed at which data is generated, especially when augmented with visualization mechanisms illustrating fluctuating data states. “You can ask questions of the data without having to have predetermined questions and find answers in human readable time,” Mizell explained. “We’re getting tremendous response from customers able to load hundreds of millions and billions of rows [and] include them in interactive time. It transforms what business can do.” These capabilities are integral to the expansion of the IoT in the telecommunications industry, as “Connected cities and connected cars are huge with a lot of the telcos,” according to Mizell.

Machine Interaction
The best means of deriving value from the IoT actually transcends analytics and necessitates creating action between connected machines. Effecting such action in real-time will increasingly come to rely on the various forms of artificial intelligence pervading throughout modern enterprises, which is most readily accessible with options for machine learning and deep learning. Furthermore, Forbes contends AI is steadily moving to the cloud, which is instrumental in making these capabilities available to all organizations—not just those armed with a slew of data scientists. Regarding the options for libraries of deep learning and machine learning algorithms available to organizations today, Mizell remarked, “We’re exposing those libraries for consumers to use on analytics and streaming. On the data streaming end we’ll be able to execute those libraries on demand to make decisions in real-time.” The most cogent use case for machine-to-machine interaction involving the IoT pertains to connected cars, autonomous vehicles, and some of the more cutting edge applications for race car drivers. These vehicles are able to account for the requisite action necessary in such time-sensitive applications by leveraging GPU-facilitated AI in real time. “For autonomous cars, the Tesla has a bank of GPUs in the trunk,” Mizell commented. “That’s how it’s able to read the road in real-time.”

Back from the Edge
Another substantial trend to impact the IoT in the coming year is the evolving nature of the cloud as it relates to remote streaming and sensor data applications. Cloud developments in previous years were focused on the need for edge computing. The coming year will likely see a greater emphasis on hybrid models combining the decentralized paradigm with the traditional centralized one. In circumstances in which organizations have real-time, remote data sources on a national scale, “You can’t respond to it fast enough if you’re piping it all the way down to your data center,” Mizell said. “You’ll have a mix of hybrid but the aggregation will come local. The rest will become global.” One of the best use cases for such hybrid cloud models for the IoT comes from the U.S. Postal Service, which Mizell mentioned is utilizing the IoT to track mail carriers, optimize their routes, and increase overall efficiency. This use case is similar to deployments in retail in which route optimization is ascertained for supply chain management and the procurement of resources. Still, the most prominent development affecting the IoT’s cloud developments could be that “all of the cloud vendors are now providing GPUs,” Mizell said. “That’s very new this year. You’ve got all the big three with a bank of GPUs at the ready.” This development is aligned with the broadening AI capabilities found in the cloud.

Software Defined Security
Implementing IoT action and analytics in a secure environment could very well represent the central issue of the viability of this technology to the enterprise. Numerous developments in security are taking place to reduce the number and magnitude of attacks on the IoT. One of the means of protecting both endpoint devices and the centralized networks upon which they are based is to utilize software defined networking, which is enjoying a resurgence of sorts due to IoT security concerns. The core of the software defined networking approach is the intelligent provisioning of resources on demand for the various concerns of a particular network. In some instances this capability includes dedicating resources for bandwidth and trafficking, in others it directly applies to security. In the latter instance the network can create routes for various devices—on-the-fly—to either connect or disconnect devices to centralized frameworks according to security protocols. “Devices are popping up left and right,” Mizell acknowledged. “If it’s an unknown device shut it down. Even if it has a username and a password, don’t give it access.” Some of the applications of the IoT certainly warrant such security measures, including financial industry forays into the realm of digital banking in which mobile devices function as ATM machines allowing users to withdraw funds from their phones and have cash delivered to them. “That’s what they say is in the works,” Mizell observed.

Endpoint Security
Security challenges for the IoT are exacerbated by the involvement of endpoint devices, which typically have much less robust security than centralized frameworks do. Moreover, such devices can actually be used to perpetrate attacks on the IoT and wreak havoc on centralized mainframes. Strengthening device security can now take the form of endpoint device registration and authorization. According to Mizell: “There’s a notion of device registration, whether it’s on the network or not. If [you] can bring your phone or whatever device to work, it detects the device by its signature, and then says it only has access to the internet. So you start locking devices into a certain channel.” Blockchain technologies can also prove influential in securing the IoT. These technologies have natural encryption techniques that are useful for this purpose. Moreover, they utilize a decentralized framework in which the validity of an action or addendum to the blockchain (which could pertain to IoT devices in this case) is determined by reaching a consensus among those involved in it. This decentralized, consensus-based authorization could prove valuable for protecting the IoT from attacks.

Democratization
As the use cases for the IoT become more and more surprising, it is perhaps reassuring to realize that the technologies enabling them are becoming more dependable. Accessing the cognitive computing capabilities to implement machine-based action and swift analytics via the cloud is within the grasp of most organizations. The plethora of security options can strengthen IoT networks, helping to justify their investments. Hybrid cloud models use the best of both worlds for instantaneous action as well as data aggregation. Thus, the advantages of the continuous connectivity and data generation of this technology are showing significant signs of democratization.

Source: 2017 Trends in the Internet of Things by jelaniharper

Dipping Customer Satisfaction? 5 Lessons from a cab driver

Yes, you read it right. Great customer experience can come from anywhere. I want to bring your kind attention to a personal service encounter of a leading customer experience advocate, Scott McKain. This encounter centers on a cab journey with a cab driver, “Taxi Terry”. No… no… I am not going to bug you with a word-for-word transcript of something you could extract more pleasure from by watching the video at the end.
This experience offers some fundamental lessons that we could all learn from.

The top 5 lessons are:
1.      Understand your customer well:

In the video, cab driver Taxi Terry works really hard at connecting with his customers and understanding their stories. He is not only listening to those stories but also recording them. It is something we could all learn from. In today’s world, when competition is at its peak and companies are sitting on tight budgets, it has become absolutely essential to retain existing clients.
What better way to retain an existing client than to know their story and connect with them. The more you know about your customers, the better the bond that forms – one that is comfortable and trustworthy. This not only leads to long-term business relationships, but also to more referral/word-of-mouth opportunities.

2.      Build a system to deliver experience and not just service:
From the statement “Are you ready for the best cab ride of your life?” to referring Scott to Taxi Terry’s website for his receipt and future bookings, everything was built not just to deliver a service, but to deliver an experience. The outcome? Look at this blog, the video, and the hits on that YouTube video. People do not remember service, but they do remember experience. So, if an effort is put in place to build a system that delivers an experience, more customers will carry it along for effective word-of-mouth, ultimately resulting in better branding.

Consider watching a movie trailer. Movie trailers are not made to tell you about an upcoming movie, but to deliver an experience, which you take with you when you leave and share with the people around you. With surging social media channels, nothing could be better than a satisfied customer willing to share their story with friends and family. So, any effort in delivering an experience will ultimately deliver more word-of-mouth and better loyalists.

3.      Keep on minimizing complexities for your customers:
One more thing that stood out in the video was the cab driver’s consistent effort to make sure the customer encountered minimum difficulty. Every inch of effort went into making the customer feel comfortable and at ease. It resulted in consistent spurts of WOW moments at how easy something frustrating, dry and mundane had been made to deal with. This ultimately helped the customer focus on other valuable aspects of the service. The more complicated the services, the more they distract customers from a good experience. So, it is worth every penny to fine-tune processes to make them simple and easy to follow. Companies like Amazon, Zappos and Southwest are pioneers at this.

4.      Always work to create a WOW experience, even if it costs extra:
While delivering services, how many of us actually focus on delivering a WOW moment to amuse customers? This is another big hole that Taxi Terry addresses. His constant effort to entertain his clients has paid off handsomely. Surely he might have had to spend something extra to integrate this into his routine service, but it ultimately helped him gain loyal customers and build a strong, customer-centric brand. What these small things did for him could not have been achieved by big investments in marketing alone. So, investing in that WOW experience should always be considered. It not only helps build a powerful and sustainable brand around customer centricity, but also helps gain loyal customers along the way.
5.      Always use the best tools for the job that helps you excel in customer engagement:
A cab driver with a weather map is surely something no one expected. Even a database of customer history is something one could never have associated with him. This openness to the latest tools and techniques for delivering a quality experience is something we should all learn from this video.

Having better tools to help us deliver a WOW experience is always an asset. It not only helps in delivering a better service, but also helps in establishing a competitive edge, which ultimately results in building a stronger and more identifiable value proposition.

With these 5 lessons, one could easily create a well-rounded customer experience approach: a system that generates continuous and sustainable WOW while building a customer-centric company.

Did it hit you in a different way? I would love to know your thoughts. Please leave a comment with any suggestions or criticism.

Certainly would love to give a shout out to Scott McKain for this video.

Now, with no further ado the video:

Source by v1shal

Aug 30, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data security

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Enterprise Architecture for the Internet of Things: Containerization and Microservices by jelaniharper

>> Future of Public Sector and Jobs in #BigData World #FutureOfData #Podcast by v1shal

>> Data And Analytics Collaboration Is A Win-Win-Win For Manufacturers, Retailers And Consumers by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>> Master the fundamentals of cloud application security – TechTarget, under Cloud Security

>> Hadoop and Big Data Analytics Market Segmentation, Opportunities, Trends & Future Scope to 2026 – Coherent Chronicle (press release) (blog), under Hadoop

>> HR Tech Startup meQuilibrium Raises $7M in Series C – American Inno, under Talent Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Process Mining: Data science in Action


Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be ap… more

[ FEATURED READ]

Superintelligence: Paths, Dangers, Strategies


The human brain has some capabilities that the brains of other animals lack. It is to these distinctive capabilities that our species owes its dominant position. Other animals have stronger muscles or sharper claws, but … more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from zombie apocalypse from unscalable models
One living, breathing zombie in today’s analytical models is the glaring absence of error bars. Not every model is scalable or holds its ground as data grows. The error bars attached to almost every model should be duly calibrated. As business models take in more data, error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failure, leading to a Halloween we never want to see.

[ DATA SCIENCE Q&A]

Q: What is the life cycle of a data science project?
A: 1. Data acquisition
Acquiring data from both internal and external sources, including social media or web scraping. In a steady state, data extraction routines should be in place, and new sources, once identified, would be acquired following the established processes.

2. Data preparation
Also called data wrangling: cleaning the data and shaping it into a suitable form for later analyses. Involves exploratory data analysis and feature extraction.

3. Hypothesis & modelling
As in data mining, but with all the data rather than a sample. Applying machine learning techniques to all the data. A key sub-step: model selection. This involves preparing a training set for the model candidates, and validation and test sets for comparing model performance, selecting the best performing model, gauging model accuracy and preventing overfitting.

4. Evaluation & interpretation

Steps 2 to 4 are repeated a number of times as needed; as the understanding of the data and the business becomes clearer and results from initial models and hypotheses are evaluated, further tweaks are performed. These may sometimes include step 5 and be performed in a pre-production environment.

5. Deployment

6. Operations
Regular maintenance and operations. Includes performance tests to measure model performance, and can alert when performance goes beyond a certain acceptable threshold

7. Optimization
Can be triggered by failing performance, or due to the need to add new data sources and retraining the model or even to deploy new versions of an improved model

Note: with increasing maturity and well-defined project goals, pre-defined performance criteria can help evaluate the feasibility of the data science project early in the life cycle. This early comparison helps the team refine hypotheses, discard the project if it is non-viable, or change approaches.
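As a small illustration of the model-selection sub-step in stage 3, the sketch below (scikit-learn, with a placeholder dataset and two placeholder candidate models) carves the data into training, validation and test sets, picks the best candidate on the validation set and reports final accuracy on the held-out test set:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
val_scores = {name: m.fit(X_train, y_train).score(X_val, y_val) for name, m in candidates.items()}
best = max(val_scores, key=val_scores.get)
print("validation scores:", val_scores)
print("selected:", best, "| test accuracy:", candidates[best].score(X_test, y_test))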


Source

[ VIDEO OF THE WEEK]

The History and Use of R


[ QUOTE OF THE WEEK]

Information is the oil of the 21st century, and analytics is the combustion engine. – Peter Sondergaard

[ PODCAST OF THE WEEK]

Discussing Forecasting with Brett McLaughlin (@akabret), @Akamai


[ FACT OF THE WEEK]

Estimates suggest that by better integrating big data, healthcare could save as much as $300 billion a year — that’s equal to reducing costs by $1000 a year for every man, woman, and child.

Sourced from: Analytics.CLUB #WEB Newsletter

What Is a Residential IP, Data Center Proxy and what are the Differences?

A residential IP simply means a connection assigned by an ISP to a residential owner. When you connect to the internet, you connect using an IP address. To find your current IP address, you can use the What Is My IP site. It will display your IP address and your ISP name, as well as the country you are connecting to the internet from.

An IP address is a set of numbers arranged in a pattern and separated by full stops, such as 198.162.122.1. If you use a residential proxy when connecting to the internet from your residence, your real IP address is masked: you are assigned a different address that belongs to an ISP, which is called a residential IP address.

What is a datacenter proxy?

Unlike a residential IP, which is owned by an ISP, a datacenter proxy is not. It acts as a shield between you and the web, so anyone spying on what you are doing cannot track you. Your home IP address and all the information related to it are hidden; only the datacenter proxy is displayed, together with the details of the datacenter proxy provider. A datacenter proxy can thus mask your actual IP address and all your information, although it is not as effective at this as a residential IP.
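As a minimal sketch of how any proxy (residential or datacenter) sits between you and the web, the Python snippet below routes a request through a proxy and asks an IP-echo service which address it sees; the proxy address used here is a placeholder, not a real service:

import requests

proxies = {
    "http": "http://203.0.113.10:8080",    # hypothetical proxy address
    "https": "http://203.0.113.10:8080",
}

# httpbin.org/ip echoes back the IP address it sees; through the proxy it
# reports the proxy's address rather than your own.
direct = requests.get("https://httpbin.org/ip", timeout=10).json()
via_proxy = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=10).json()
print("Direct:", direct)
print("Via proxy:", via_proxy)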

Difference between a residential IP and a datacenter proxy

Let’s say you are browsing the web from public Wi-Fi and you need to hide your real IP, since most public Wi-Fi connections are not secure. The real point of using a residential IP address is to ensure that the sites you visit don’t know exactly who you are, since no information associated with you is made available to them.

Residential IP Proxies

Genuine and legitimate: It is easy to create multiple datacenter proxies, but obtaining many residential proxies is difficult, since residential IPs are mainly issued for residential use. This is why residential IPs are considered more genuine and legitimate than datacenter proxies.

Costly, with few providers: Residential IPs are difficult to obtain, which makes them more expensive, since fewer providers offer them; in fact, a monthly subscription for hundreds of residential IPs is extremely expensive. That said, a monthly subscription for hundreds of residential IPs can sometimes be cheaper than a much larger monthly subscription for datacenter proxies.

Residential IPs are sometimes prone to being blacklisted: Although they are genuine and legitimate, they are also liable to be abused. In such situations, they get blacklisted by some security technologies and databases. Using a residential proxy connection is therefore good, although not perfect.

Datacenter proxies

Less genuine, though still protective: Websites can detect a user who is accessing them via a proxy connection, and since many users spam these websites through proxies, you could be held accountable when accessing them through one. However, what the websites detect is the datacenter proxy, since your real IP address and all the information associated with you is shielded. It is therefore better to use a fresh datacenter proxy for each account than to access the web with your real IP for all your accounts.

Cheaper, with more providers: It’s easy to obtain datacenter proxies and they are offered by hundreds of providers. This makes them less expensive; in fact, they cost a fraction of what residential IP proxies would cost you.

Which is best: residential IPs or datacenter proxies?

This post is not aimed at selling either of the two, so it is up to you to decide which one best suits your needs. However, it is good to be careful when getting advice from a proxy or VPN provider.

Datacenter proxies are easy to get and they are less expensive. Using them could cost you a fraction of what residential IPs would, but if legitimacy is your main concern, you are better off using residential IPs.

Conclusion

Having learned the difference between residential IPs and datacenter proxies, it’s your turn to choose which one is suitable for your needs. However, it is good to consider using something that is genuine at all times.

Source

Assess Your Data Science Expertise

Data Skills Scoring System

What kind of a data scientist are you? Take the free Data Skills Scoring System Survey at http://pxl.me/awrds3

Companies rely on experts who can make sense of their data. Often referred to as data scientists, these people bring their specific skills to bear in helping extract insight from the data. These skills include such things as Hacking, Math & Statistics and Substantive Expertise. In an interesting study published by O’Reilly, Harlan D. Harris, Sean Patrick Murphy and Marck Vaisman surveyed several hundred practitioners, asking them about their proficiency in 22 different data skills. They found that data skills fell into five broad areas: Business, ML / Big Data, Math / OR, Programming and Statistics.

Complementary Data Skills Required

There are three major tasks involved in analytics projects. First, you need to ask the right questions, requiring deep knowledge of your domain of interest, whether that be for-profit business, non-profits or healthcare organizations. When you know your domain area well, you are better equipped to know what questions to ask to get the most value from your data. Second, you need access to the data to help you answer those questions. These data might be housed in multiple data sources, requiring a data worker with programming skills to access and intelligently integrate data silos. Finally, you need somebody to make sense of the data to answer the questions proposed earlier. This step requires data workers who are more statistically-minded and can apply the right analytics to the data. Answering these questions could be more exploratory or intentional in nature, requiring different types of statistical and mathematical approaches.

Getting value from data is no simple task, often requiring data experts with complementary skills. After all, I know of nobody who possesses all the data skills needed to successfully tackle data problems. No wonder data science has been referred to as a team sport.

Data Skills Scoring System (DS3)

We at AnalyticsWeek have developed the Data Skills Scoring System (DS3), a free web-based self-assessment survey that measures proficiency across five broad data science skills: business, technology, math and modeling, programming and statistics. Our hope is that the DS3 can optimize the value of data by improving how data professionals work together. If you are a data professional, the DS3 can help you:

  1. identify your analytics strengths
  2. understand where to improve your analytics skill set
  3. identify team members who complement your skills
  4. capitalize on job postings that match your skill set

While the publicly available DS3 is best suited for individual data professionals, we are customizing the DS3 for enterprises to help them optimize the value of their data science teams. By integrating DS3 scores with other data sources, enterprises will be able to improve how they acquire, retain and manage data professionals.

Find out your data skills score by taking the free Data Skills Scoring System Survey:

http://pxl.me/awrds3

We are also conducting research using the DS3 that will advance our understanding of the emerging field of data science. Some questions we would like to answer are:

  • Do certain data skills cluster together?
  • Are some data skills more important than others in determining project success?
  • Are data science teams with comprehensive data skills more satisfied with their work than data science teams where some skills are lacking?

Respondents will receive a free executive summary of our findings.

Source by bobehayes

Aug 23, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Productivity

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Which Customer Loyalty Metric is the Best? My Interview with Jeff Olsen of Allegiance Radio by bobehayes

>> Measuring Customer Loyalty in Non-Competitive Environments by bobehayes

>> Four Use Cases for Healthcare Predictive Analytics, Big Data by analyticsweekpick

Wanna write? Click Here

[ FEATURED COURSE]

Process Mining: Data science in Action


Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be ap… more

[ FEATURED READ]

The Black Swan: The Impact of the Highly Improbable


A black swan is an event, positive or negative, that is deemed improbable yet causes massive consequences. In this groundbreaking and prophetic book, Taleb shows in a playful way that Black Swan events explain almost eve… more

[ TIPS & TRICKS OF THE WEEK]

Keeping Biases Checked during the last mile of decision making
Today a data-driven leader, data scientist or data-driven expert is constantly put to the test by helping their team solve problems using their skills and expertise. Believe it or not, part of that decision tree is derived from intuition, which adds a bias to our judgement and taints the suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we give in to tiny traps and find ourselves caught in biases that impair our judgement. So it is important that we keep the intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q: Do you think 50 small decision trees are better than a large one? Why?
A: * Yes!
* A more robust model (an ensemble of weak learners combines into a strong learner)
* It is better to improve a model by taking many small steps than a few large steps
* If one tree is erroneous, it can be corrected by the following ones
* Less prone to overfitting
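A quick sketch of that intuition in Python (scikit-learn, with synthetic noisy data): a single tree grown to full depth tends to overfit, while an ensemble of 50 shallow trees usually generalises better:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

big_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)      # grown to full depth
forest = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0).fit(X_tr, y_tr)

print("Single large tree, test accuracy:", big_tree.score(X_te, y_te))
print("50 small trees,    test accuracy:", forest.score(X_te, y_te))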

Source

[ VIDEO OF THE WEEK]

Solving #FutureOfWork with #Detonate mindset (by @steven_goldbach & @geofftuff) #JobsOfFuture #Podcast


[ QUOTE OF THE WEEK]

Numbers have an important story to tell. They rely on you to give them a voice. – Stephen Few

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData with Jon Gibs(@jonathangibs) @L2_Digital


[ FACT OF THE WEEK]

73% of organizations have already invested or plan to invest in big data by 2016

Sourced from: Analytics.CLUB #WEB Newsletter