Wednesday, October 28, 2020
In our software-driven world, business success depends on engineering organizations. But manual data gathering, meeting cycles, and ad hoc reports slow engineering teams down, while disjointed, overlapping data silos prevent them from collaborating productively.
This session will outline the problems engineering teams face in creating streamlined workflows, and the AI and machine learning solutions that enable engineering organizations to produce accurate insights on team performance and potential. Jeff will address how machine learning and applied AI leverage data science to transform engineering operations from the bottom up, allowing engineers to focus on what they do best: building software.
Applying AI to healthcare is a great opportunity: better predictions of who is likely to develop diabetes, back pain, and other chronic diseases, and of which patients will require hospital readmission, not only save money but also improve patient health. In this talk, we will discuss our technology solution and the challenges we faced building AI/ML solutions in this domain:
* We built a data ingestion and extraction process using Apache Beam and Google Cloud Dataflow. We will describe the obstacles we hit joining and normalizing disparate patient datasets and the heuristics we used to solve this problem. We will also talk about performance and scalability obstacles and our solutions.
* We built model training and serving pipelines using Kubeflow (TensorFlow on Kubernetes and Istio). We will talk about how we built a HIPAA/SOC2 compliant infrastructure with these technologies and our experience using Katib for model tuning.
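The speakers' actual heuristics aren't described here, but joining disparate patient datasets typically reduces to normalizing each record to a common key and then doing a CoGroupByKey-style join. A minimal stdlib sketch of that idea (the field names and normalization rules are hypothetical, not the talk's real pipeline):

```python
from collections import defaultdict

def normalize_key(patient_id: str) -> str:
    """Canonicalize a patient identifier so records exported from
    different systems key to the same value (hypothetical rules)."""
    return patient_id.strip().lower().replace("-", "")

def join_datasets(claims, labs):
    """Group claim and lab records by normalized patient key,
    mimicking what a Beam CoGroupByKey join produces."""
    grouped = defaultdict(lambda: {"claims": [], "labs": []})
    for pid, record in claims:
        grouped[normalize_key(pid)]["claims"].append(record)
    for pid, record in labs:
        grouped[normalize_key(pid)]["labs"].append(record)
    return dict(grouped)

claims = [("P-001 ", {"dx": "E11.9"}), ("p002", {"dx": "M54.5"})]
labs = [("p001", {"a1c": 7.2})]
joined = join_datasets(claims, labs)
print(joined["p001"])  # the claim and the lab land under one normalized key
```

In Beam, the same shape becomes two normalized `PCollection`s fed into `beam.CoGroupByKey()`, which lets Dataflow scale the grouping across workers.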
Thursday, October 29, 2020
As a C-level executive, you understand that a hypothesis-driven market strategy is key to the success of your business. Increasingly, however, the IT department is taking over, using data analysis to furnish “answers” and creating an environment where hypotheses ending in questions are “defeats.” You know that data explains the past while your domain knowledge and experience informs the future. If this success/failure climate takes hold and persists at your company, your business will fail. IT is analyzing the past, but mistaking it for the future. For the holistic health and future of your business, you must reclaim your hypothesis-driven methodology. You must use your experience to reconnoiter the analysts’ perspective, remind everyone of the company credo, and recalibrate to encourage hypothesis-driven analysis.
NLP is a key component in many data science systems that must understand or reason about text. This hands-on tutorial uses the open-source Spark NLP library to explore advanced NLP in Python. Spark NLP provides state-of-the-art accuracy, speed, and scalability for language understanding by delivering production-grade implementations of some of the most recent research in applied deep learning. It's the most widely used NLP library in the enterprise today. You'll edit and extend a set of executable Python notebooks by implementing these common NLP tasks: named entity recognition, sentiment analysis, spell checking and correction, document classification, and multilingual and multi-domain support. The discussion of each NLP task includes the latest advances in deep learning used to tackle it, including the prebuilt use of BERT embeddings within Spark NLP, using tuned embeddings, and 'post-BERT' research results like XLNet, ALBERT, and RoBERTa. Spark NLP builds on the Apache Spark and TensorFlow ecosystems, and as such it's the only open-source NLP library that can natively scale to use any Spark cluster, as well as take advantage of the latest processors from Intel and Nvidia. You'll run the notebooks locally on your laptop, but we'll explain and show a complete case study and benchmarks on how to scale an NLP pipeline for both training and inference.