Sr. Spark Developer Resume

Atlanta, GA

PROFESSIONAL SUMMARY:

  • 7 years of IT experience in architecture, analysis, design, development, implementation, maintenance, and support, with experience in developing strategic methods for deploying big data technologies to efficiently solve Big Data processing requirements.
  • 5+ years of experience in Big Data using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, and ZooKeeper.
  • Experience using Hadoop ecosystem components such as MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Pig, Flume, and Spark on Cloudera.
  • Knowledge and experience in Spark using Python and Scala.
  • Knowledge of the big data store HBase and the NoSQL databases MongoDB and Cassandra.
  • Experience working with Hadoop clusters on Cloudera and Hortonworks distributions.
  • Experience includes requirements gathering, design, development, integration, documentation, testing, and build.
  • Experience developing Spark applications in Scala to ease the transition from Hadoop MapReduce.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Solid knowledge of Hadoop architecture and core components: NameNode, DataNodes, JobTracker, TaskTrackers, Oozie, Scribe, Hue, Flume, HBase, etc.
  • Extensively worked on development and optimization of MapReduce programs, Pig scripts, and Hive queries to create structured data for data mining.
  • Ingested data from RDBMS sources, performed data transformations, and exported the transformed data to Cassandra per business requirements.
  • Expertise in various AWS services across compute, storage, networking, databases, monitoring, and security.
  • Wrote many shell/Python scripts to automate daily tasks and query data on AWS resources.
  • Set up AWS Config to track configuration changes and configured automatic notifications using AWS SNS.
  • Designed roles and groups using AWS Identity and Access Management (IAM).
  • Working knowledge of AWS Lambda for serverless computing using Python.
  • Deployed and configured a 15-node Kafka cluster on the Hortonworks platform.
  • Good experience with Kafka management tools such as Kafka Manager and Kafka Offset Monitor.
  • Implemented custom Kafka Connect connectors (sink and source) for HDFS and MySQL.
  • Good experience with Kafka for log aggregation, stream processing, and event sourcing.
  • Good experience with Kafka's commit-log re-sync mechanism for failed nodes to restore their data.
  • Implemented Kafka security using Kerberos and SSL.
  • Implemented custom Spark-Kafka offset management using ZooKeeper.
  • Handled gigabytes of data per day using Kafka Streams.
  • Implemented SQL operations on Kafka topic data using KSQL streams and tables.
  • Worked on provisioning and managing multi-tenant Hadoop clusters on a public cloud environment, Amazon Web Services (AWS), and on private cloud infrastructure, the OpenStack cloud platform.
  • Strong experience with Kafka implementation and engineering: sizing, security, replication, and monitoring.
  • Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances.
  • Loaded data into Cassandra for fast retrieval.
  • Worked with both Scala and Java; created frameworks for processing data pipelines through Spark.
  • Implemented batch-processing solutions for unstructured, high-volume data using the Hadoop MapReduce framework.
  • Very good experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for data analysis.
  • Experience in database design, data analysis, and SQL programming.
  • Experience in extending Hive and Pig core functionality using custom user-defined functions.
  • Experience in writing custom classes, functions, procedures, problem management, library controls, and reusable components.
  • Working knowledge of Oozie, a workflow scheduler system used to manage jobs that run Pig, Hive, and Sqoop.
  • Experienced in integrating Java-based web applications in a UNIX environment.
  • Experience working with Red Hat Enterprise Linux.
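The Hive partitioning and bucketing mentioned above rest on a simple idea: each row is routed to a bucket file by hashing the bucket column modulo the bucket count. A minimal Python sketch of that routing (Python's built-in `hash` stands in for Hive's own hash function, and the names are illustrative):

```python
def bucket_for(value, num_buckets):
    """Pick a bucket for a row by hashing its bucket column.
    Python's hash() stands in for Hive's internal hash scheme."""
    return hash(value) % num_buckets

# Route each row into one of 4 buckets, as a CLUSTERED BY (user_id)
# INTO 4 BUCKETS table definition would.
rows = [{"user_id": u} for u in ("alice", "bob", "carol", "dave")]
buckets = {}
for row in rows:
    buckets.setdefault(bucket_for(row["user_id"], 4), []).append(row)
```

Because the bucket is a pure function of the key, equal keys always land in the same file, which is what lets Hive prune buckets and perform bucketed map-side joins.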

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, and Spark (Python and Scala)

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts, R

Java/J2EE Technologies: Applets, Swing, JDBC, JSON, JSTL, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks: MVC, Struts, Spring, Hibernate

ETL: IBM WebSphere / Oracle

Operating Systems: Sun Solaris, UNIX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle, SQL Server, MySQL

Tools and IDEs: Eclipse, NetBeans, JDeveloper, DbVisualizer

Network Protocols: TCP/IP, UDP, HTTP, DNS

PROFESSIONAL EXPERIENCE:

Sr. Spark Developer

Confidential, Atlanta, GA

Responsibilities:

  • Deployed and configured a 15-node Kafka cluster.
  • Administered Kafka with management tools such as Kafka Manager and Kafka Offset Monitor.
  • Implemented custom Kafka Connect connectors (sink and source) for HDFS and MySQL.
  • Used Kafka for log aggregation, stream processing, and event sourcing.
  • Worked with various AWS services across compute, storage, networking, databases, monitoring, and security.
  • Wrote many shell/Python scripts to automate daily tasks and query data on AWS resources.
  • Set up AWS Config to track configuration changes and configured automatic notifications using AWS SNS.
  • Designed roles and groups using AWS Identity and Access Management (IAM).
  • Used AWS Lambda for serverless computing with Python.
  • Strong experience working with Spark SQL, DataFrames, and Pair RDDs.
  • Experienced in handling large DataFrames using partitioning, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself.
  • Experience writing Spark SQL UDFs, which help with code optimization.
  • Actively involved in Spark tuning techniques such as caching RDDs and increasing the number of executors per node.
  • Experienced in working with different file formats and compression codecs (JSON, Parquet, Snappy, etc.) and writing to S3 with the desired partitioning.
  • Successfully created workflows using the Databricks REST APIs for running production data.
  • Knowledge of Databricks and Airflow.
  • Created metric files (JSON format) for the Kibana dashboards.
  • Worked on developing Unix shell scripts to automate Spark SQL.
  • Developed Spark transformations and jobs in Scala.
  • Experience working with GitHub, Git Bash, gitk, etc.
  • Experience working with Rally to provide test cases and test plans for the daily CI builds, and updating defects, user stories, etc. using the Rally APIs.
  • Experience working with Spark in Scala in a functional style to improve performance.
  • Used Spark SQL to query headers to learn the composition of the data, allowing comparison of data from various sources.
  • Involved in gathering business requirements and preparing the information template used to identify data elements for future reporting needs.
  • Used Amazon Redshift to load data from the cloud.
  • Maintained documentation supporting system configuration, training, and user experience.
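Writing to S3 "with the desired partitioning", as described above, usually means a Hive-style key=value directory layout that downstream engines can prune. A minimal sketch of that layout, with local directories standing in for S3 and all names illustrative:

```python
import json
import os
import tempfile

def write_partitioned(records, root, partition_col):
    """Write records under Hive-style key=value directories,
    mimicking a partitioned S3 layout (local dirs stand in for S3)."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[partition_col], []).append(rec)
    paths = []
    for value, recs in groups.items():
        part_dir = os.path.join(root, f"{partition_col}={value}")
        os.makedirs(part_dir, exist_ok=True)
        path = os.path.join(part_dir, "part-00000.json")  # one file per partition
        with open(path, "w") as f:
            for rec in recs:
                f.write(json.dumps(rec) + "\n")
        paths.append(path)
    return sorted(paths)

out_dir = tempfile.mkdtemp()
paths = write_partitioned(
    [{"dt": "2020-01-01", "v": 1}, {"dt": "2020-01-02", "v": 2}],
    out_dir,
    "dt",
)
```

In Spark itself the same result comes from `df.write.partitionBy("dt")`; the sketch only shows the directory convention that makes partition pruning possible.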

Confidential, Atlanta, GA

Spark Developer

Responsibilities:

  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Streamed data in real time using Spark Streaming with Kafka.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Upgraded the Hadoop cluster from CDH3 to CDH4, set up a High Availability cluster, and integrated Hive with existing applications.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and Pair RDDs.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
  • Created HBase tables to store various formats of PII data coming from different portfolios.
  • Helped with the sizing and performance tuning of the Cassandra cluster.
  • Involved in Cassandra data modeling and building efficient data structures.
  • Trained and mentored analysts and test teams on the Hadoop framework, HDFS, MapReduce concepts, and the Hadoop ecosystem.
  • Responsible for architecting Hadoop clusters.
  • Assist with the addition of Hadoop processing to the IT infrastructure.
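The Sqoop extractions above typically rely on incremental imports: pull only rows whose check column exceeds a saved "last value", then advance that checkpoint. A minimal Python sketch of the pattern (names are illustrative; Sqoop itself does this via `--incremental append` with `--check-column` and `--last-value`):

```python
def incremental_import(fetch_rows, last_value):
    """Mimic Sqoop's incremental-append pattern: pull only rows whose
    check column ("id" here) exceeds the saved last-value checkpoint,
    then advance the checkpoint."""
    new_rows = [r for r in fetch_rows() if r["id"] > last_value]
    new_last = max((r["id"] for r in new_rows), default=last_value)
    return new_rows, new_last

# Hypothetical source table; a real job would query the RDBMS instead.
source = [{"id": 1}, {"id": 2}, {"id": 3}]
rows, last = incremental_import(lambda: source, last_value=1)
```

Persisting `last` between runs (Sqoop stores it in the saved-job metastore) is what keeps repeated imports from re-copying rows already in HDFS.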

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Spark, Cloudera Manager, Storm, Cassandra, Pig, Sqoop, PL/SQL, MySQL, Windows, Hortonworks, Oozie, HBase

Hadoop Developer

Confidential

Responsibilities:

  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Mentored analysts and test teams in writing Hive queries.
  • Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, XML, JMS, JBoss, and web services.
  • Used Nagios Remote Plugin Executor (NRPE) for the remote system monitoring.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Used the Ganglia PHP web front-end to view the gathered monitoring information.
  • Developed multiple MapReduce jobs in java for data cleaning and preprocessing.
  • Involved in loading data from LINUX file system to HDFS.
  • Responsible for managing data from multiple sources.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Exported data to an RDBMS via Sqoop to check whether the power-saving program was successful.
  • Extensively used Sqoop for importing the data from RDBMS to HDFS.
  • Used ZooKeeper to coordinate the clusters.
  • Handled the imported data to perform transformations, cleaning, and filtering using Hive and MapReduce.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System (HDFS) and Pig to pre-process the data.
  • Installed and configured MapReduce, HIVE and the HDFS; implemented CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
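The Hadoop streaming jobs noted above follow the mapper / sort-shuffle / reducer contract: the mapper emits key-value lines, the framework sorts by key, and the reducer aggregates each key's group. A self-contained Python sketch of that contract (word counting stands in for the actual cleaning logic):

```python
import itertools

def mapper(lines):
    """Map step of a Hadoop streaming job: emit (key, 1) pairs.
    Word counting stands in for the actual cleaning logic."""
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce step: sum counts per key. Input must be sorted by key,
    which the Hadoop sort/shuffle phase guarantees."""
    for key, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

# sorted() plays the role of the shuffle between map and reduce.
lines = ["big data", "Big pipelines"]
counts = dict(reducer(sorted(mapper(lines))))
```

In a real streaming job the mapper and reducer would be separate scripts reading stdin and writing tab-separated stdout, with Hadoop handling the sort between them.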
