The Big Data community is back in November with a new Machine Learning workshop, this time on building a model with Spark ML logistic regression and gradient boosting. RSVP here to save your front-row spot on November 5, starting at 14:00!
About the workshop
Join us for an afternoon exploring Apache Spark's machine learning capabilities. We'll use Spark to build a credit scoring model that estimates the probability of default for new and existing customers.
1. Intro, problem description and setup
2. Loading data
3. Exploratory data analysis
4. Feature engineering
5. Building our first model
6. Testing and validation
7. Improving the model with cross-validation and hyper-parameter tuning
8. Considerations for running in production
Bring your laptop! Beyond that, all you need is a web browser: the lab runs on Azure Databricks, a managed Spark platform, and we will provide access to it along with everything else you need.
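For a preview, the agenda above (feature engineering, a first model, validation, then cross-validation and hyper-parameter tuning) maps naturally onto a Spark ML `Pipeline`. The sketch below is not the workshop's lab code: it assumes a hypothetical `credit.csv` file with a numeric 0/1 `default` label and numeric `income`, `debt` and `age` columns, and uses only standard `org.apache.spark.ml` classes.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.SparkSession

object CreditScoringSketch {
  // Hypothetical predictor columns; adapt them to the real dataset.
  val featureCols: Array[String] = Array("income", "debt", "age")

  // Logistic regression on a 0/1 "default" label (hypothetical column name).
  val lr: LogisticRegression = new LogisticRegression()
    .setLabelCol("default")
    .setFeaturesCol("features")
    .setMaxIter(50)

  // Feature assembly + model as one Pipeline that CrossValidator tunes as a unit.
  def buildPipeline(): Pipeline = {
    val assembler = new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("features")
    new Pipeline().setStages(Array[PipelineStage](assembler, lr))
  }

  def main(args: Array[String]): Unit = {
    // Local mode, for experimentation only.
    val spark = SparkSession.builder
      .appName("credit-scoring-sketch")
      .master("local[*]")
      .getOrCreate()

    // Assumed input: CSV with a header row and numeric columns.
    val data = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("credit.csv")
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), 42L)

    // Area under ROC, computed from the model's rawPrediction column.
    val evaluator = new BinaryClassificationEvaluator()
      .setLabelCol("default")
      .setMetricName("areaUnderROC")

    // Hyper-parameter grid: regularisation strength and L1/L2 mix.
    val grid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.0, 0.01, 0.1))
      .addGrid(lr.elasticNetParam, Array(0.0, 0.5))
      .build()

    // 3-fold cross-validation picks the best grid point on the training set.
    val cv = new CrossValidator()
      .setEstimator(buildPipeline())
      .setEvaluator(evaluator)
      .setEstimatorParamMaps(grid)
      .setNumFolds(3)
    val model = cv.fit(train)

    // Held-out evaluation; "probability" holds [P(no default), P(default)].
    val scored = model.transform(test)
    println(s"Test AUC: ${evaluator.evaluate(scored)}")
    scored.select("default", "probability").show(5)

    spark.stop()
  }
}
```

The second element of each `probability` vector is the model's estimated probability of default. Swapping `lr` for a `GBTClassifier` is one way to try the gradient-boosting variant; the parameter grid would then need to range over that estimator's own parameters, such as `maxDepth`.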
Do you want to learn more about the features newly introduced in Spark v2.2 and about Spark's integration with Kafka and Cassandra for streaming pipelines? RSVP here now and save your front-row place on November 3, starting at 3:30 PM!
About the event
- The first part of the workshop will cover Spark SQL with Scala, specifically the kind of limited toy examples that Spark documentation and tutorials emphasize. Used in isolation, Spark SQL is realistically suited only to such didactic cases: on real-world datasets it quickly shows its limitations, so more powerful techniques are needed.
- The second part of the workshop will cover those techniques, without which Spark SQL is largely ineffective. This section is about sharing lessons learned the hard way and experience gathered in the trenches of real-world projects.
- The third part of the workshop, titled "Machine Learning By Example", will cover multiclass classification in Scala using the Spark ML Pipeline API. Spark ML is the machine learning module that ships with Spark.
- In the remaining time, the trainer will walk through a Scala/Spark Streaming application that ingests data from Apache Kafka (an open-source, high-performance, distributed message queue), performs streaming analytics and writes the results back to Kafka.
If participants want to dive deeper into more complex topics, the trainer will instead live-code ad-hoc demos.
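The Kafka round trip described in the last part can be sketched with Spark Structured Streaming's Kafka source and sink (in Spark 2.2 these come from the separate `spark-sql-kafka-0-10` package). This is an illustrative sketch, not the trainer's application: the broker address, the topic names `events-in` and `events-out`, the checkpoint path and the per-key, per-minute count used as the "analytics" step are all assumptions, and the workshop itself may use the DStream-based Spark Streaming API instead.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, to_json, window}

object KafkaRoundTripSketch {
  // Hypothetical "analytics" step: events per key in 1-minute windows.
  def analytics(events: DataFrame): DataFrame =
    events
      // Bounds the aggregation state; has no effect on batch DataFrames.
      .withWatermark("timestamp", "2 minutes")
      .groupBy(window(col("timestamp"), "1 minute"), col("key"))
      .count()
      // The Kafka sink reads a string/binary "value" (and optional "key").
      .select(
        col("key"),
        to_json(struct(col("window"), col("count"))).as("value"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("kafka-roundtrip-sketch").getOrCreate()

    // Assumed broker and input topic.
    val input = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events-in")
      .load()
      // Kafka rows carry binary key/value plus metadata such as timestamp.
      .select(col("key").cast("string").as("key"), col("timestamp"))

    // Write each updated window count back to an (assumed) output topic.
    val query = analytics(input).writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "events-out")
      .option("checkpointLocation", "/tmp/kafka-roundtrip-checkpoint")
      .outputMode("update")
      .start()

    query.awaitTermination()
  }
}
```

Keeping `analytics` as a plain `DataFrame => DataFrame` function means it can be unit-tested on a static DataFrame without a running broker, since the watermark is simply ignored in batch queries.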
- 16:00 – 17:00 Spark SQL with Scala
- 17:00 – 18:00 Spark SQL techniques
- 18:00 – 19:00 Machine Learning By Example
- 19:00 – 20:00 Scala/Spark Streaming application
RSVP here to confirm your attendance at the Spark v2.2 Workshop.
This event is hosted through TechSociety (http://techsociety.co/), an initiative that aims to make the local tech community stronger by providing free event space, as well as logistical and communication support, to anyone organizing free tech-related events.
Thinking about organizing a meetup or an event for the tech community? Join TechSociety (http://techsociety.co/) and we'll help you out! All you have to do is submit the registration form on our website, and we'll get back to you to sort out the details.