Spark/Scala Developer Resume

Atlanta, GA

SUMMARY

  • 10+ years of experience in the IT industry implementing, developing, and maintaining various web-based applications using Java, J2EE technologies, and the Big Data ecosystem.
  • Strong background in object-oriented development, including Perl, C++, Java, Scala, and shell scripting.
  • Good understanding of writing Python scripts.
  • Good experience in writing Spark applications using Python and Scala.
  • Used Python scripting for automation and have a deep understanding of data analysis libraries such as NumPy, Pandas, and Matplotlib.
  • Configured Spark Streaming to consume real-time data from Kafka and persist the streams to HDFS (see the sketch following this summary).
  • Designed and developed data loading strategies and transformations that enable the business to analyze the datasets.
  • Processed flat files in various formats and stored them under various partitioning schemes in HDFS.
  • Responsible for building, developing, and testing shared components used across modules.
  • Involved in creating an RDF data model for the existing PME application on the Semantic Web.
  • Developed shell, Perl, and Python scripts to automate and provide control flow to Pig scripts.
  • Created various parser programs to extract data from Autosys, Tibco Business Objects, XML, Informatica, Java, and database views.
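
A minimal sketch of the Kafka-to-HDFS streaming flow described above, using the Spark 1.x direct-stream API; the broker address, topic name, batch interval, and output path are illustrative placeholders, not values from an actual engagement.

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaToHdfs")
        val ssc  = new StreamingContext(conf, Seconds(30))

        // Placeholder broker and topic
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
        val topics      = Set("server-logs")

        // Direct stream: one RDD partition per Kafka partition
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        // Persist each micro-batch of message values to HDFS
        stream.map(_._2).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty())
            rdd.saveAsTextFile(s"hdfs:///data/raw/logs/batch-${time.milliseconds}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }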

PROFESSIONAL EXPERIENCE

Confidential

Lead Spark/Scala Developer

Responsibilities:

  • Analyzed and defined the researchers' strategy and determined the system architecture and requirements needed to achieve project goals.
  • Developed multiple Kafka producers and consumers per the software requirement specifications.
  • Used Kafka for log aggregation, gathering physical log files from servers and placing them in a central location such as HDFS for processing.
  • Configured Spark Streaming to consume real-time data from Kafka and persist the streams to HDFS.
  • Used various Spark transformations and actions to cleanse the input data.
  • Developed shell scripts to generate Hive CREATE TABLE statements from the data and load the data into the tables.
  • Wrote MapReduce jobs using the Java API and Pig Latin.
  • Optimized HiveQL and Pig scripts by using execution engines such as Tez and Spark.
  • Involved in writing custom MapReduce programs using the Java API for data processing.
  • Integrated the Maven build and designed workflows to automate the build and deploy process.
  • Involved in developing a linear regression model, built with the Spark Scala API, to predict continuous measurements and improve observations on wind turbine data.
  • Created Hive tables as internal or external per requirements, defined with appropriate static or dynamic partitions and bucketing for efficiency (a DDL sketch follows this role's Environment list).
  • Loaded and transformed large sets of structured and semi-structured data using Hive.
  • Developed monitoring and notification tools using Python.
  • Developed Python, shell, Perl, and PowerShell scripts for automation purposes.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS (see the micro-batch sketch after this list).
  • Used Spark and Spark SQL with the Scala API to read the Parquet data and create the corresponding tables in Hive.
  • Developed Hive queries for the analysts.
  • Worked in the AWS environment to develop and deploy custom Hadoop applications.
  • Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances.
  • Implemented Cassandra integration using the DataStax Java API.
  • Very good understanding of Cassandra cluster mechanics, including replication strategies, snitches, gossip, consistent hashing, and consistency levels.
  • Experienced in using the Spark application master to monitor Spark jobs and capture their logs.
  • Implemented Spark applications in Scala, utilizing DataFrames and the Spark SQL API for faster testing and processing of data.
  • Applied data science and machine learning techniques in Zeppelin to improve the search engine at a wealth management firm.
  • Involved in performing analytics and visualization on log data to estimate error rates and study the probability of future errors using regression models.
  • Used the WebHDFS REST API to issue HTTP GET, PUT, POST, and DELETE requests from the web server to perform analytics on the data lake.
  • Worked with libraries such as Pandas for data manipulation and analysis.
  • Used Kafka to build a customer activity tracking pipeline as a set of real-time publish-subscribe feeds.
  • Provided cluster coordination services through ZooKeeper.
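
The real-time feed bullet above corresponds to a micro-batch handler along these lines; this is a sketch only, assuming the feed arrives as JSON strings and using a hypothetical HDFS path.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.SQLContext

    // Invoked on each micro-batch of the Kafka DStream: convert the
    // RDD of JSON strings to a DataFrame and append it as Parquet.
    def persistBatch(batch: RDD[String]): Unit = {
      if (!batch.isEmpty()) {
        val sqlContext = SQLContext.getOrCreate(batch.sparkContext)
        val events = sqlContext.read.json(batch) // schema inferred from the feed
        events.write.mode("append").parquet("hdfs:///data/curated/events")
      }
    }

Once the data lands as Parquet, a Hive table can be declared over the same path so analysts can query it directly, as the DDL sketch after the Environment list shows.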

Environment: HDP 2.3.4, Hadoop, Hive, Python, HDFS, HPC, WebHDFS, WebHCat, Spark, Spark SQL, Kafka, AWS, Zeppelin, Java, Scala, web servers, Maven and SBT builds, Rally
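
A hedged sketch of the internal/external partitioned and bucketed Hive tables mentioned in this role, issued through a Spark 1.x HiveContext (sc is the active SparkContext); the table name, columns, bucket count, and location are illustrative only.

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    // External table over the curated Parquet path, partitioned by load
    // date and bucketed by device id for efficient scans and joins.
    hiveContext.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS curated_events (
        device_id STRING,
        metric    DOUBLE,
        ts        TIMESTAMP
      )
      PARTITIONED BY (load_date STRING)
      CLUSTERED BY (device_id) INTO 32 BUCKETS
      STORED AS PARQUET
      LOCATION 'hdfs:///data/curated/events'
    """)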

Confidential, FL

Lead Spark/Scala Developer

Responsibilities:

  • Designed and developed data loading strategies and transformations that enable the business to analyze the datasets.
  • Processed flat files in various formats and stored them under various partitioning schemes in HDFS.
  • Responsible for building, developing, and testing shared components used across modules.
  • Responsible for performing sorts, joins, aggregations, filters, and other transformations on the datasets using Spark (a transformation sketch follows this role's Environment list).
  • Involved in developing a linear regression model for predicting continuous measurements.
  • Responsible for implementing advanced procedures such as text analytics and processing, using in-memory computing frameworks like Spark.
  • Experienced in extracting appropriate features from datasets and handling bad, null, and partial records using Spark SQL.
  • Collected data from AWS S3 buckets in near real time using Spark Streaming, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on formats such as text and CSV files.
  • Expert in implementing Spark with Scala and Spark SQL for faster testing and processing of data, and responsible for managing data from different sources.
  • Managed and scheduled jobs on a Hadoop cluster using Oozie.
  • Used the Spark-Cassandra connector to load data to and from Cassandra (see the connector sketch after this list).
  • Experienced in building real-time data pipelines with Kafka Connect and Spark Streaming.
  • Responsible for developing a Spark-Cassandra connector job to load data from flat files into Cassandra for analysis.
  • Imported data from sources such as AWS S3 and the local file system into Spark RDDs.
  • Responsible for creating consumer APIs using Kafka.
  • Responsible for creating Hive tables, loading them with data, and writing Hive queries.
  • Developed end-to-end data processing pipelines that receive data through the distributed messaging system Kafka and persist it into Cassandra.
  • Worked on a POC to perform sentiment analysis of Twitter data using the OpenNLP API.
  • Used the Pandas library for flexible reshaping and pivoting of datasets.
  • Responsible for creating mappings and workflows to extract and load data from relational databases, flat-file sources, and legacy systems using Talend.
  • Developed and designed ETL jobs using Talend Integration Suite in Talend 5.2.2.
  • Applied data science and machine learning techniques in Zeppelin to improve the search engine at a wealth management firm.
  • Wrote application logic in Python.
  • Experienced in managing and reviewing log files using the web UI and Cloudera Manager.
  • Involved in creating external Hive tables, loading data, and writing Hive UDFs.
  • Experienced in using compression codecs such as Snappy, LZO, and Gzip to save data and optimize data transfer over the network with Avro and Parquet formats.
  • Involved in unit testing and user documentation, and used Log4j for logging.
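
A minimal sketch of the flat-file-to-Cassandra load described above, using the DataStax Spark-Cassandra connector; the keyspace, table, file layout, and column names are hypothetical.

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    case class Reading(sensorId: String, ts: Long, value: Double)

    val conf = new SparkConf()
      .setAppName("FlatFileToCassandra")
      .set("spark.cassandra.connection.host", "cassandra-host")
    val sc = new SparkContext(conf)

    // Parse the flat file into typed records
    val readings = sc.textFile("hdfs:///data/in/readings.csv")
      .map(_.split(","))
      .map(f => Reading(f(0), f(1).toLong, f(2).toDouble))

    // Write to Cassandra, mapping case-class fields to columns
    readings.saveToCassandra("telemetry", "readings",
      SomeColumns("sensor_id", "ts", "value"))

    // Read back from Cassandra into an RDD for analysis
    val loaded = sc.cassandraTable[Reading]("telemetry", "readings")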

Environment: Apache Spark, Hadoop, HDFS, Hive, Kafka, Sqoop, Scala, Talend, Cassandra, Oozie, Cloudera, Impala, Linux
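
The sort/join/aggregate/filter bullet in this role maps onto a DataFrame pass like the one below; it is a sketch that assumes a sqlContext is in scope, and the table paths and columns are placeholders.

    import org.apache.spark.sql.functions._

    val orders    = sqlContext.read.parquet("hdfs:///data/curated/orders")
    val customers = sqlContext.read.parquet("hdfs:///data/curated/customers")

    val summary = orders
      .filter(col("status") === "SHIPPED")          // filter
      .join(customers, Seq("customer_id"))          // join
      .groupBy("region")                            // aggregate
      .agg(count("order_id").as("orders"),
           sum("amount").as("revenue"))
      .orderBy(col("revenue").desc)                 // sort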

Confidential, Phoenix AZ

Scala Spark Programmer

Responsibilities:

  • Worked with the BI team on Big Data Hadoop cluster implementation and data integration, developing large-scale system software.
  • Processed incoming files using the native Spark API.
  • Used the Spark Streaming and Spark SQL APIs to process the files.
  • Developed Spark scripts using Scala shell commands per requirements.
  • Processed schema-oriented and non-schema-oriented data using Scala and Spark.
  • Designed and developed a system to collect data from multiple portals using Kafka and process it with Spark.
  • Designed and developed automated processes using shell scripting for data movement and purging.
  • Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Hadoop Streaming MapReduce jobs using Python; performed real-time stream processing with Apache Kafka and Apache Storm on Hadoop distributions (Hortonworks and Cloudera).
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Handled large datasets during the ingestion process itself using partitioning, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations (see the broadcast-join sketch below).
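
As a sketch of the broadcast technique named in the last bullet: broadcasting the small dimension table avoids shuffling the large fact table during the join (Spark 1.5+ broadcast hint; the paths and join key are placeholders, and a sqlContext is assumed to be in scope).

    import org.apache.spark.sql.functions.broadcast

    val facts = sqlContext.read.parquet("hdfs:///data/facts")
    val dims  = sqlContext.read.parquet("hdfs:///data/dims") // small table

    // The broadcast hint ships `dims` to every executor, so the join
    // runs map-side with no shuffle of `facts`.
    val joined = facts.join(broadcast(dims), Seq("dim_key"))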

Environment: Hadoop, MapReduce, HDFS, Scala, Spark, Cloudera Manager, Pig, Sqoop, ZooKeeper, Teradata, PL/SQL, MySQL, Windows, HBase.

Confidential, Atlanta GA

Spark/Scala Developer

Responsibilities:

  • Designed and developed data loading strategies and transformations that enable the business to analyze the datasets.
  • Processed flat files in various formats and stored them under various partitioning schemes in HDFS.
  • Responsible for building, developing, and testing shared components used across modules.
  • Responsible for performing sorts, joins, aggregations, filters, and other transformations on the datasets using Spark.
  • Responsible for implementing advanced procedures such as text analytics and processing, using in-memory computing frameworks like Spark.
  • Experienced in extracting appropriate features from datasets and handling bad, null, and partial records using Spark SQL.
  • Collected data from AWS S3 buckets in near real time using Spark Streaming, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
  • Involved in making code changes to a turbine simulation module so it could be processed across the cluster via spark-submit.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on formats such as text and CSV files.
  • Expert in implementing Spark with Scala and Spark SQL for faster testing and processing of data, and responsible for managing data from different sources.
  • Used Python to write scripts that move data across clusters.
  • Managed and scheduled jobs on a Hadoop cluster using Oozie.
  • Used the Spark-Cassandra connector to load data to and from Cassandra.
  • Experienced in building real-time data pipelines with Kafka Connect and Spark Streaming.
  • Responsible for developing a Spark-Cassandra connector job to load data from flat files into Cassandra for analysis.
  • Imported data from sources such as AWS S3 and the local file system into Spark RDDs.
  • Responsible for creating consumer APIs using Kafka.
  • Responsible for creating Hive tables, loading them with data, and writing Hive queries.
  • Developed end-to-end data processing pipelines that receive data through the distributed messaging system Kafka and persist it into Cassandra.
  • Involved in developing a linear regression model, built with the Spark Scala API, to predict continuous measurements and improve observations on wind turbine data (see the regression sketch after this list).
  • Worked extensively with Spark and MLlib to develop regression models for logistics data.
  • Responsible for creating mappings and workflows to extract and load data from relational databases, flat-file sources, and legacy systems using Talend.
  • Developed and designed ETL jobs using Talend Integration Suite in Talend 5.2.2.
  • Experienced in managing and reviewing log files using the web UI and Cloudera Manager.
  • Involved in creating external Hive tables, loading data, and writing Hive UDFs (a UDF sketch follows this role's Environment list).
  • Experienced in using compression codecs such as Snappy, LZO, and Gzip to save data and optimize data transfer over the network with Avro and Parquet formats.
  • Involved in unit testing and user documentation, and used Log4j for logging.
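
A hedged sketch of the wind-turbine regression described above, using the Spark 1.6 spark.ml API; the feature columns, label, and path are hypothetical stand-ins for the actual sensor fields, and a sqlContext is assumed to be in scope.

    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.ml.regression.LinearRegression

    val turbines = sqlContext.read.parquet("hdfs:///data/turbines")

    // Assemble the raw sensor columns into a single feature vector
    val assembler = new VectorAssembler()
      .setInputCols(Array("wind_speed", "rotor_rpm", "ambient_temp"))
      .setOutputCol("features")

    // Fit a linear model predicting the continuous power output
    val lr = new LinearRegression()
      .setLabelCol("power_output")
      .setFeaturesCol("features")

    val model = lr.fit(assembler.transform(turbines))
    println(s"RMSE: ${model.summary.rootMeanSquaredError}")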

Environment: Apache Spark, Hadoop, HDFS, Hive, Kafka, Sqoop, Scala, Talend, AWS, Cassandra, Oozie, Cloudera, Impala, Linux
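
A minimal Scala take on the Hive UDF work mentioned in this role, using Hive's classic UDF base class; the function name and logic are illustrative, and the class must be packaged into a jar and registered with CREATE TEMPORARY FUNCTION before use.

    import org.apache.hadoop.hive.ql.exec.UDF

    // Hive locates the `evaluate` method by reflection
    class NormalizeUdf extends UDF {
      def evaluate(s: String): String =
        if (s == null) null else s.trim.toLowerCase
    }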

Confidential

Java/Scala Developer

Responsibilities:

  • Involved in creating an RDF data model for the existing PME application on the Semantic Web.
  • Developed shell, Perl, and Python scripts to automate and provide control flow to Pig scripts.
  • Created various parser programs in Scala to extract data from Autosys, Tibco Business Objects, XML, Informatica, Java, and database views (see the parser sketch after this list).
  • Created an ingestor to publish the data extracted with the data model to the Cesium environment.
  • Wrote the Stratio Metashell interpreter for Zeppelin, enabling interactive data analytics and facilitating data-driven, collaborative documents in SQL, Scala, Markdown, shell, and Metashell.
  • Configured the Jenkins build for this application to run automatically every day.
  • Interacted with the offshore team daily, following Agile methodology.
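
A toy version of one of the Scala parser programs mentioned above, pulling job names and commands out of an exported XML definition; the element and attribute names are illustrative, not the real Autosys or Tibco export schema.

    import scala.xml.XML

    case class JobDef(name: String, command: String)

    def parseJobs(path: String): Seq[JobDef] = {
      val doc = XML.loadFile(path)
      // Collect every <job name="..."><command>...</command></job>
      (doc \\ "job").map { node =>
        JobDef((node \ "@name").text, (node \ "command").text)
      }
    }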

Environment: Scala 1.7, Spring 2.5, Hibernate 3.x, JMS, Sybase, Toad 10.5, Linux, XML, Log4j, GitHub, Hudson, Scala 4.4.1, Ivy, Semantic Web (RDF) and SPARQL, TopBraid
