[ COVER OF THE WEEK ]
Data security
[ LOCAL EVENTS & SESSIONS]
- Sep 02, 2018 #WEB Develop a Successful Big Data & Analytics Startup Today! Amsterdam
- Sep 16, 2018 #WEB Free Webinar on Big Data with Scala & Spark – Live Instructor Led Session | Limited Seats | Houston, TX
- Sep 17, 2018 #WEB [1 hr Free] PipelineAI, GPU, TPU, Spark, TensorFlow, Kubernetes, Kafka, Scikit
[ AnalyticsWeek BYTES]
>> Enterprise Architecture for the Internet of Things: Containerization and Microservices by jelaniharper
>> Future of Public Sector and Jobs in #BigData World #FutureOfData #Podcast by v1shal
>> Data And Analytics Collaboration Is A Win-Win-Win For Manufacturers, Retailers And Consumers by analyticsweekpick
[ NEWS BYTES]
>> Master the fundamentals of cloud application security – TechTarget Under Cloud Security
>> Hadoop and Big Data Analytics Market Segmentation, Opportunities, Trends & Future Scope to 2026 – Coherent Chronicle (press release) (blog) Under Hadoop
>> HR Tech Startup meQuilibrium Raises $7M in Series C – American Inno Under Talent Analytics
[ FEATURED COURSE]
Process Mining: Data science in Action
[ FEATURED READ]
Superintelligence: Paths, Dangers, Strategies
[ TIPS & TRICKS OF THE WEEK]
Save yourself from a zombie apocalypse of unscalable models
One living and breathing zombie in today's analytical models is the absence of error bars. Not every model is scalable or holds its ground as data grows. Error bars should be attached to almost every model and duly calibrated: as business models rake in more data, the error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failure, leading us to a Halloween we never want to see. A minimal calibration sketch follows.
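As a hedged illustration (not from the tip itself; the residuals below are synthetic stand-ins), one simple way to keep an error bar calibrated as data grows is to bootstrap a confidence interval for the model's error metric whenever fresh data arrives:

```python
import numpy as np

def bootstrap_error_bar(residuals, n_boot=1000, seed=42):
    """95% confidence interval for mean absolute error, estimated
    by bootstrap resampling of per-record residuals."""
    rng = np.random.default_rng(seed)
    maes = [np.mean(np.abs(rng.choice(residuals, size=len(residuals))))
            for _ in range(n_boot)]
    return np.percentile(maes, [2.5, 97.5])

# Synthetic residuals standing in for a deployed model's errors;
# recompute the interval each time new data comes in
residuals = np.random.default_rng(0).normal(0.0, 1.5, size=500)
low, high = bootstrap_error_bar(residuals)
print(f"MAE 95% CI: [{low:.3f}, {high:.3f}]")
```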
[ DATA SCIENCE Q&A]
Q: What is the life cycle of a data science project?
A: 1. Data acquisition
Acquiring data from both internal and external sources, including social media and web scraping. In a steady state, data extraction routines should be in place, and new sources, once identified, should be brought in through the established processes. A minimal acquisition sketch follows.
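For illustration only (the endpoint, field names, and file name below are hypothetical, not from the answer): pull records from an external JSON API and land them raw, leaving all cleaning to the preparation step.

```python
import csv
import requests

# Hypothetical external source; swap in a real endpoint
URL = "https://example.com/api/sales"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()
records = resp.json()  # assumed shape: a list of flat dicts

# Land the raw extract untouched; cleaning belongs to data preparation
with open("raw_sales.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)
```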
2. Data preparation
Also called data wrangling: cleaning the data and shaping it into a form suitable for later analysis. Involves exploratory data analysis and feature extraction; a small wrangling pass is sketched below.
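A minimal pandas pass over the hypothetical extract above (the column names are assumptions made for illustration):

```python
import numpy as np
import pandas as pd

df = pd.read_csv("raw_sales.csv")  # raw extract from the acquisition step

# Cleaning: drop duplicates and rows with unusable key fields
df = df.drop_duplicates()
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df = df.dropna(subset=["order_date", "amount"])

# Simple feature extraction ahead of the modelling step
df["month"] = df["order_date"].dt.month
df["log_amount"] = np.log1p(df["amount"])
df.to_csv("clean_sales.csv", index=False)
```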
3. Hypothesis & modelling
As in data mining, but applied to all the data rather than to samples. A key sub-step is model selection: preparing a training set for the model candidates, plus validation and test sets for comparing model performance, selecting the best-performing model, gauging its accuracy, and preventing overfitting. That sub-step is sketched below.
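A hedged scikit-learn sketch of the model-selection sub-step (synthetic data; the two candidate models are arbitrary choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)  # stand-in data

# Train / validation / test: fit on train, compare candidates on
# validation, report the winner once on the held-out test set
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(random_state=0),
}
val_scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_scores[name] = accuracy_score(y_val, model.predict(X_val))

best = max(val_scores, key=val_scores.get)
print("validation:", val_scores, "-> best:", best)
print("test accuracy:", accuracy_score(y_test, candidates[best].predict(X_test)))
```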
4. Evaluation & interpretation
Steps 2 to 4 are repeated as many times as needed; as the understanding of the data and the business becomes clearer and results from the initial models and hypotheses are evaluated, further tweaks are made. These may sometimes include step 5 and be performed in a pre-production environment.
5. Deployment
6. Operations
Regular maintenance and operations. Includes performance tests that measure model performance and can raise an alert when performance degrades beyond an acceptable threshold; a small monitoring sketch follows.
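An illustrative monitor (the threshold value and names are hypothetical):

```python
import logging

ACCEPTABLE_ACCURACY = 0.85  # pre-defined, hypothetical threshold

def check_model_health(y_true, y_pred):
    """Periodic performance test: alert when accuracy degrades
    beyond the acceptable threshold."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    if accuracy < ACCEPTABLE_ACCURACY:
        logging.warning("accuracy %.3f below threshold %.3f",
                        accuracy, ACCEPTABLE_ACCURACY)
        return False
    return True

# Example: compare recent predictions against their ground-truth labels
healthy = check_model_health([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])
```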
7. Optimization
Can be triggered by failing performance, by the need to add new data sources and retrain the model, or even by the deployment of a new, improved model version; one such trigger is sketched below.
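A hedged sketch of a retraining trigger that reuses the monitor above (the policy and names are illustrative, not prescribed by the answer):

```python
from sklearn.base import clone

def retrain_if_needed(model, X_all, y_all, healthy):
    """Optimization trigger: refit the same estimator on the enlarged
    dataset when monitoring reports degraded performance."""
    if healthy:
        return model
    new_model = clone(model)      # fresh, unfitted copy of the estimator
    new_model.fit(X_all, y_all)   # retrain on all data, incl. new sources
    return new_model              # candidate for the next deployed version
```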
Note: with increasing maturity and well-defined project goals, pre-defined performance criteria can help evaluate the feasibility of a data science project early in its life cycle. This early comparison helps the team refine its hypotheses, discard the project if it is non-viable, or change approaches.
[ VIDEO OF THE WEEK]
[ QUOTE OF THE WEEK]
Information is the oil of the 21st century, and analytics is the combustion engine. – Peter Sondergaard
[ PODCAST OF THE WEEK]
Discussing Forecasting with Brett McLaughlin (@akabret), @Akamai
[ FACT OF THE WEEK]
Estimates suggest that by better integrating big data, healthcare could save as much as $300 billion a year; that's equal to reducing costs by $1,000 a year for every man, woman, and child.