Spark Developer Resume
NC
SUMMARY:
- 6 years of professional IT experience, including 4 years of comprehensive experience as an Apache Hadoop and Spark developer working with related technologies.
- Expertise in writing Hadoop jobs in Java and Scala.
- In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce, Spark, and Spark SQL.
- Experience converting Hive/SQL queries into Spark transformations using Spark RDDs (see the code sketch at the end of this list).
- Experience developing SQL scripts in Spark for handling different data sets and verifying their performance against equivalent MapReduce jobs.
- Experience importing and exporting multi-terabyte volumes of data between HDFS and relational database systems (RDBMS) using Sqoop.
- Experienced in working with Hadoop/Big Data storage and analytical frameworks on the Amazon AWS cloud using tools such as SSH, PuTTY, and MindTerm.
- Good understanding of data mining and machine learning techniques such as Random Forest, Logistic Regression, and K-Means.
- Experience implementing custom Partitioners and Combiners for effective data distribution.
- Experience writing simple to complex ad hoc Pig scripts and Pig UDFs.
- Experience writing simple to complex ad hoc Hive scripts as well as Hive UDFs, UDTFs, and UDAFs.
- Experience writing shell scripts to dump shared data from MySQL and Oracle servers to HDFS.
- Good knowledge of building event-processing data pipelines using Kafka and Storm.
- Good understanding of configuring simple to complex workflows using Oozie.
- Good understanding of NoSQL databases such as MongoDB and Cassandra.
- Proficient in working with development tools including Eclipse and VMware.
- Very good experience in customer specification study, requirements gathering and analysis, design, development, testing, and implementation.
- Worked on different operating systems, including UNIX/Linux and Windows.
- Exceptional ability to quickly master new concepts; capable of working in a group as well as independently, with excellent communication skills.
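Below is a minimal sketch of the Hive-to-Spark conversion pattern referenced above. The table and column names (events, user_id, amount) are hypothetical, used only to illustrate rewriting a Hive aggregation as pair-RDD transformations.

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import scala.Tuple2;

    public class HiveToSparkSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("HiveToSparkSketch")
                    .enableHiveSupport()
                    .getOrCreate();

            // Original Hive query:
            //   SELECT user_id, SUM(amount) FROM events GROUP BY user_id
            Dataset<Row> events = spark.table("events"); // hypothetical table

            // Equivalent expressed as pair-RDD transformations.
            JavaPairRDD<String, Double> totals = events.javaRDD()
                    .mapToPair(row -> new Tuple2<>(
                            row.<String>getAs("user_id"),
                            row.<Double>getAs("amount")))
                    .reduceByKey(Double::sum);

            totals.take(10).forEach(System.out::println);
            spark.stop();
        }
    }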
TECHNICAL SKILLS:
Languages and Technologies: Java, Scala, R, C, C++, XML, SQL, Shell Script, Pig Latin, Impala, MapReduce, Hive, Sqoop, Spark, Spark SQL, AWS, ZooKeeper, HBase, Kafka, Oozie, Storm, Flume
Operating Systems: Linux, Windows
Databases: MySQL, MSSQL, MongoDB, Cassandra
Tools: Eclipse, WinSCP, Wireshark, JIRA, IBM Tivoli
Scripting Languages: Scala, JavaScript, PHP, Python
Others: HTML, XML, JSON, REST, SOAP
PROFESSIONAL EXPERIENCE:
Confidential
Spark Developer
Responsibilities:
- Developed Spark code using Scala and Spark SQL/Spark Streaming for faster data processing.
- Prepared Spark builds from source code and ran Pig scripts on Spark rather than as MapReduce jobs for better performance.
- Used Sqoop to load data from MySQL into HDFS on a regular basis.
- Implemented machine learning techniques such as Random Forest, K-Means, and Logistic Regression for prediction and pattern identification using Spark MLlib.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Wrote Hive queries for data analysis to meet the business requirements.
- Developed Kafka producers and consumers for message handling (a minimal producer sketch follows this list).
- Responsible for analyzing multi-platform applications using Python.
- Installed and configured multi-node Apache Hadoop clusters on AWS EC2.
- Used Storm as an automated mechanism to analyze large amounts of non-unique data points with low latency and high throughput.
- Migrated servers, databases, and applications from on-premises environments to AWS, Azure, and Google Cloud Platform.
- Developed MapReduce jobs in Python for data cleaning and data processing.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Attended weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
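A minimal sketch of the producer side of the Kafka message handling described above; the broker address and the topic name "events" are placeholders, not values from the actual project.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class EventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Broker address is a placeholder; point this at the real cluster.
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // The key drives partition assignment; "events" is an illustrative topic.
                producer.send(new ProducerRecord<>("events", "user-42", "login"));
                producer.flush();
            }
        }
    }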
Environment: CDH4, Scala, Spark, HDFS, AWS, Hive, Pig, Linux, Python, MySQL, MySQL Workbench, Eclipse, PL/SQL, SQL connector.
Confidential, NC
Hadoop Developer
Responsibilities:
- Worked as a Hadoop developer analyzing large amounts of data for regulatory reports by creating MapReduce jobs in Java.
- Imported data into HDFS and Hive using Sqoop for report analysis.
- Worked on User Defined Functions in Hive to run aggregation functions over multiple rows of data loaded from HDFS.
- Created a MapReduce job to perform look-ups of specific entries using key-value pairs.
- Developed Pig Latin scripts to load data from output files and store it in HDFS.
- Monitored and managed the Hadoop cluster using the Cloudera Manager web interface.
- Developed and implemented custom Hive UDFs involving date functions (see the sketch after this list).
- Used the Oozie workflow engine to run multiple Hive and Pig jobs.
- Created stored procedures, triggers, and functions to operate on report data in MySQL.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations.
- Attended weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
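A minimal sketch of the kind of date-oriented Hive UDF described above; the MM/dd/yyyy input format and ISO output format are assumptions made for illustration.

    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import org.apache.hadoop.hive.ql.exec.UDF;

    // Normalizes a date string; the input format is an assumption.
    public final class ToIsoDate extends UDF {
        private final SimpleDateFormat in  = new SimpleDateFormat("MM/dd/yyyy");
        private final SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd");

        // Hive resolves evaluate() by reflection; null input yields null.
        public String evaluate(String raw) {
            if (raw == null) {
                return null;
            }
            try {
                return out.format(in.parse(raw));
            } catch (ParseException e) {
                return null; // unparseable dates become null instead of failing the query
            }
        }
    }

Such a UDF would be packaged in a jar and registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before use.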
Environment: UNIX Scripting, Java, Hadoop, MapReduce, HDFS, Pig, Sqoop, Hive, Oracle, Teradata, and Eclipse.
Confidential, Durham, NC
Hadoop Developer/ Administrator
Responsibilities:
- Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
- Wrote MapReduce jobs in Java to run on Hadoop clusters (a simplified job sketch follows this list).
- Implemented a High Availability and automatic failover infrastructure using ZooKeeper services to eliminate the NameNode as a single point of failure.
- Developed Pig scripts to transform raw data into intelligent data as specified by business users.
- Worked on the Hadoop cluster and used Hive as the data querying tool to store and retrieve data.
- Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
- Exported analyzed data from HDFS using Sqoop for generating reports.
- Imported and exported data into HDFS and Hive using Sqoop and Flume.
- Worked with the Oozie workflow engine to run multiple MapReduce jobs.
- Worked with the applications team to install Hadoop updates and upgrades as required.
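A stripped-down word-count job illustrating the structure of the Java MapReduce jobs described above; input and output paths come from the command line, and the reducer doubles as a combiner to cut shuffle volume. This is a generic sketch, not the project's actual job.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        public static class TokenMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE); // emit (word, 1)
                    }
                }
            }
        }

        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum)); // total count per word
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class); // combiner reduces shuffle volume
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }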
Environment: Hadoop, MapReduce, HDFS, Pig, Sqoop, Hive, Oracle, Teradata, Eclipse and Unix Scripting.
Confidential
Java Developer
Responsibilities:
- Followed Agile software development with Scrum methodology.
- Designed and developed various modules of the application with OOAD.
- Implemented Java/J2EE design patterns such as Factory, Singleton, DTO, DAO, and Session Facade.
- Utilized J2SE 7 extensively to develop business logic.
- Implemented dynamic screen functionality using jQuery and asynchronous data retrieval using AJAX.
- Responsible for designing and coding user interfaces using the Spring MVC framework.
- Implemented AJAX components to fetch dynamic values from the database and update forms.
- Developed both front-end and back-end code.
- Developed classes in the DAO and service layers (a minimal sketch follows this list).
- Consumed RESTful web services using JAX-RS.
- Used SVN as the source control repository.
- Used supervised machine learning techniques, including logistic regression, for developing prediction models.
- Used third-party libraries such as JFreeChart for data visualization.
- Used Swing components in creating the dashboard.
- Configured Spring with Hibernate properties and validations for dependency injection.
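A minimal sketch of the DAO/service layering described above, built around a hypothetical User entity; in the actual application the DAO would be wired in through Spring and backed by Hibernate rather than constructed by hand.

    import java.util.List;

    // Hypothetical domain object, used only to illustrate the layering.
    class User {
        private final long id;
        private final String name;

        User(long id, String name) {
            this.id = id;
            this.name = name;
        }

        long getId() { return id; }
        String getName() { return name; }
    }

    // DAO interface: the service layer depends on this abstraction,
    // so the persistence technology (JDBC, Hibernate) can be swapped.
    interface UserDao {
        User findById(long id);
        List<User> findAll();
        void save(User user);
    }

    // Service layer holds business logic and delegates persistence to the DAO;
    // in practice the DAO would be injected by Spring.
    class UserService {
        private final UserDao userDao;

        UserService(UserDao userDao) {
            this.userDao = userDao;
        }

        User getUser(long id) {
            User user = userDao.findById(id);
            if (user == null) {
                throw new IllegalArgumentException("No user with id " + id);
            }
            return user;
        }
    }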
Environment: Java, JSP, Servlets, Web Sphere Application Server, Eclipse, Java Script, Oracle, PL/SQL and JDBC.