
Spark / Hadoop Developer Resume


Wall Street, New York

PROFESSIONAL SUMMARY:

  • 10+ years of experience in the IT industry, including 3 years of hands-on experience with Hadoop ecosystem technologies such as Spark, Scala, Hive, Pig, Oozie, Flume, Sqoop, MapReduce, HBase and ZooKeeper.
  • Proficiency in formulating a strategic vision and a tactical roadmap to address clients' critical Business Intelligence / Analytics needs in conformance with overall corporate objectives.
  • Technical evangelist skilled at developing new applications on Hadoop according to business needs and converting existing applications to the Hadoop environment.
  • Analysis, design, development and production support of Data Warehouse, ETL (Talend, Informatica), Core Java and Mainframe applications.
  • Responsible for analyzing big data and providing technical expertise and recommendations to improve existing systems.
  • Hands on experience in Capacity planning, monitoring and Performance Tuning of Hadoop Clusters.
  • Involved in finding, evaluating and deploying new Big Data technologies and tools.
  • Proficient in Apache Spark and Scala programming for analyzing large datasets, and in Storm and Kafka for processing real-time data.
  • Worked on writing custom UDFs in Java for Hive and Pig.
  • Involved in building and evolving a reporting framework on top of the Hadoop cluster to facilitate data mining, analytics and dashboarding.
  • Supported a wide variety of ad hoc data needs.
  • Extensive experience creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka (a representative sketch follows this summary).
  • Worked on Oozie to manage and schedule jobs on the Hadoop cluster.
  • Experience with shell and Python scripting.
  • Hands-on experience writing MapReduce programs in Java.
  • Strong ability to prepare and present data in a visually appealing and easy-to-understand manner.
  • Built high-volume, real-time data processing applications on the Hadoop platform.
  • Imported and exported data into HDFS, Hive and HBase using Sqoop.
  • Hands-on experience with full life cycle implementations using Hortonworks Data Platform (HDP), CDH (Cloudera Distribution of Hadoop) and MapR.
  • Involved in designing and architecting Big Data solutions using the Hadoop ecosystem.
  • Experience working with large-scale databases such as Oracle 11g and DB2, and with data sources including XML, MS Excel and flat files.
  • Strong background in data warehousing and dimensional modeling concepts.
  • Proficient in interacting with business users and conducting client meetings during the requirements analysis phase.
  • Involved in Performance Tuning and Productivity Improvement activities
  • Excellent knowledge of Unit Testing, Regression Testing, Integration Testing, User Acceptance Testing, Production implementation and Maintenance.
  • Demonstrated ability to communicate and gather requirements, partner with Enterprise Architects, Business Users, Analysts and development teams to deliver rapid iterations of complex solutions
  • Effective leadership qualities with good skills in strategy, business development, client management and project management.
  • Excellent global exposure to various work cultures and client interaction with diverse teams
  • Ability to work effectively in cross-functional team environments and experience providing support to business users.
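
The snippet below is a minimal, illustrative sketch of the kind of real-time Spark Streaming/Kafka ingestion referred to in the summary above. The broker list, topic name, record layout and output path are hypothetical placeholders rather than details of any engagement listed below, and it assumes the Spark 1.x streaming API with the Kafka direct-stream connector.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ClickStreamIngest {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("ClickStreamIngest"), Seconds(30))

    // Receiver-less (direct) Kafka stream; broker list and topic name are placeholders
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("clickstream"))

    // Count events per page within each 30-second batch and land the results in HDFS
    stream.map { case (_, line) => (line.split(',')(1), 1L) } // assumes field 1 holds the page URL
      .reduceByKey(_ + _)
      .saveAsTextFiles("hdfs:///data/clickstream/page_counts")

    ssc.start()
    ssc.awaitTermination()
  }
}
```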

TECHNICAL SKILLS:

Environments: Win 95/98, Win NT, Unix, Linux and Win XP

Languages: C, C++, Core Java, PL/SQL, Pig, HiveQL, Linux Shell Scripting, Scala and Python

Hadoop-related Big Data Technologies: HDFS, YARN, MapReduce, Hive, Impala, Pig, Sqoop, Oozie, Flume, Spark, Spark Streaming, Machine Learning (MLlib), K-Means, DistCp, MapR, Cloudera, Hortonworks, Apache Hadoop, ZooKeeper, NoSQL (HBase), Cassandra & MongoDB

Analytical tools: Tableau, Datameer and Platfora

ETL Tools: Informatica PowerCenter 9.1/8.6

Databases: Oracle 11g/10g, Teradata, IBM DB2, SQL Server 2008, MS Access

Source control: SVN, Git, Bitbucket

Job Scheduling: Crontab, Autosys & Oozie

Web: Apache Tomcat

Tools & Utilities: RALLY, Tectia Client, PuTTY, WinSCP, Autosys, Eclipse, Toad, Maven, FileZilla, SPUFI, AbendAid, Endevor, Remedy, QMF and File-Aid

GUI: SQL Developer, SQL Server Management Studio, VB 6.0 and Developer 2000

Mainframe Technologies: S/390 Mainframe, COBOL II, JCL, DB2, CICS, IMS DB and VSAM

Project Management: PRINCE2, ITIL V3.0, Agile Scrum

PROFESSIONAL EXPERIENCE:

Confidential - Wall Street, New York

SPARK / Hadoop Developer

Responsibilities:

  • Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS for aggregations.
  • Participated in design reviews and daily project scrums.
  • Worked closely with business analysts to convert business requirements into technical requirements and prepared low- and high-level documentation.
  • Involved in improving processing performance using Apache Spark.
  • Architected, designed and implemented a Big Data initiative using the Hadoop framework (MapReduce, Pig, Hive, Spark, HBase) to process large volumes of structured and unstructured data.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Maintained and monitored the Hive data warehouse: created tables, managed data distribution by implementing partitioning and bucketing, and wrote and optimized HiveQL queries.
  • Worked hand-in-hand with the architect; enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
  • Monitored and tuned Spark jobs running on the cluster
  • Hands-on experience joining raw data with existing data sets using Pig scripting.
  • Wrote custom UDFs for Hive and Pig using Java and Python.
  • Developed Scala scripts and UDFs in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop (a representative sketch follows this list).
  • Used Spark to perform analytics on data in Hive.
  • Created real-time data ingestion into Hadoop using Spark Streaming.
  • Hands-on experience extracting data from different databases and copying the data into HDFS using Sqoop.
  • Created Oozie coordinated workflow to execute Sqoop incremental job daily.
  • Hands-on experience exporting results into relational databases using Sqoop for visualization and report generation by the BI team.
  • Involved in installing and configuring Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Communicated deliverable status to users, stakeholders and the client, and drove periodic review meetings.
  • Completed tasks and the project on time, in line with quality goals.
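
The sketch below illustrates the kind of Spark 1.6/Scala aggregation with a registered UDF described in the bullets above, writing its result to a Hive table that a downstream Sqoop export could push to the OLTP system. The database, table, column and UDF names, and the run-date argument, are hypothetical placeholders.

```scala
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object DailyTradeAggregation {
  def main(args: Array[String]): Unit = {
    val runDate = args(0) // e.g. "2016-03-01", supplied by the scheduler
    val sc = new SparkContext(new SparkConf().setAppName("DailyTradeAggregation"))
    val hiveContext = new HiveContext(sc)

    // Hypothetical UDF: normalise ticker symbols before grouping
    hiveContext.udf.register("clean_symbol", (s: String) => s.trim.toUpperCase)

    // Aggregate a partitioned/bucketed Hive table; schema and names are placeholders
    val daily = hiveContext.sql(
      s"""SELECT clean_symbol(symbol) AS symbol, SUM(quantity) AS total_qty
         |FROM trades.raw_trades
         |WHERE trade_date = '$runDate'
         |GROUP BY clean_symbol(symbol)""".stripMargin)

    // Persist the aggregate back to Hive; a Sqoop export can then move it to the OLTP database
    daily.write.mode("overwrite").saveAsTable("trades.daily_totals")
  }
}
```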

Environment: Hadoop, HDFS, Spark, Scala, Machine Learning (MLlib), K-Means, MapReduce, Hive, Pig, Sqoop, HBase, Cassandra, Oozie, Teradata, MySQL, Git, PuTTY, ZooKeeper, Linux Shell Scripting.

Confidential - New Jersey

SPARK / Hadoop Developer

Responsibilities:

  • Worked on requirements gathering and analysis, and translated business requirements into technical designs for the Hadoop ecosystem.
  • Worked collaboratively to manage build-outs of large data clusters with Spark.
  • Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle.
  • Extracted data from MySQL and Oracle through Sqoop, placed it in HDFS and processed it.
  • Used Core Java and object-oriented concepts to develop UDFs.
  • Created Target tables and Staging tables.
  • Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for huge volumes of data.
  • Built Spark pipelines and workflows using Scala.
  • Analyzed the SQL scripts and designed the solution for implementation in Scala.
  • Wrote Spark applications in Scala that interact with the MySQL database through SQLContext and access Hive tables through HiveContext (illustrated in the sketch after this list).
  • Optimized Hive tables using techniques such as partitioning and bucketing to improve Hive query performance.
  • Created big data workflows to ingest data from various sources into Hadoop using Oozie; these workflows comprise heterogeneous jobs such as Hive and Sqoop actions.
  • Created JIRA projects integrating workflows, screen schemes, field configuration schemes, permission schemes, project roles and notification schemes.
  • Experienced in Agile processes and delivered quality solutions in regular sprints.
  • Optimized and tuned existing ETL scripts (SQL and PL/SQL).
  • Moved data from HDFS to RDBMS and vice versa using Sqoop.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Experienced in writing HIVE JOIN Queries.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.
  • Experience developing custom UDFs for Hive.
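
The sketch below illustrates the kind of Spark 1.6 application described above: reading a MySQL table over JDBC, joining it with an existing Hive table through HiveContext (a subclass of SQLContext), and writing the result back to Hive. The JDBC URL, credentials, and table/column names are hypothetical placeholders.

```scala
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object CustomerOrderEnrichment {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CustomerOrderEnrichment"))
    val hiveContext = new HiveContext(sc) // HiveContext extends SQLContext

    // Read a reference table from MySQL over JDBC; connection details are placeholders
    val customers = hiveContext.read.format("jdbc").options(Map(
      "url"      -> "jdbc:mysql://dbhost:3306/crm",
      "dbtable"  -> "customers",
      "user"     -> "etl_user",
      "password" -> "********",
      "driver"   -> "com.mysql.jdbc.Driver")).load()
    customers.registerTempTable("customers")

    // Join the JDBC data with an existing Hive table and write the result back to Hive
    val enriched = hiveContext.sql(
      """SELECT o.order_id, o.amount, c.segment
        |FROM sales.orders o
        |JOIN customers c ON o.customer_id = c.customer_id""".stripMargin)

    enriched.write.mode("overwrite").saveAsTable("sales.orders_enriched")
  }
}
```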

Environment: Hadoop, HDFS, Spark 1.6, Scala, Machine Learning (MLlib), K-Means, MapReduce, Hive, Pig, Sqoop, HBase, Oozie, Teradata, MySQL, SVN, PuTTY, ZooKeeper, Linux Shell Scripting.

Confidential - New Jersey

Hadoop Developer

Responsibilities:

  • Hadoop ecosystem experience (HDFS, MapReduce, Spark, Hive, Impala, Oozie, YARN, Pig).
  • Processed HDFS data, created external tables using Hive, and developed reusable scripts to ingest data and repair tables across the project (see the sketch after this list).
  • Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to load data files.
  • Extensive understanding of Hadoop architecture and its components, including rack awareness; involved in Hadoop cluster planning and management, taking hardware, software and network considerations into account.
  • Planned and built the cluster with data growth in mind.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Expert in importing and exporting data into HDFS using Sqoop and Flume.
  • Cluster configuration and inter- and intra-cluster data transfer (DistCp).
  • Experience in setting up Test, QA, and Prod environment.
  • Involved in loading data from UNIX file system to HDFS.
  • Creating Hive tables and working on them using HiveQL.
  • Hands-on experience installing and configuring Apache Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Pig, Impala, ZooKeeper and Sqoop.
  • Strong knowledge on YARN terminology and the High-Availability Hadoop Clusters.
  • Experience in HDFS data storage and support for running map-reduce jobs.
  • Commissioned and decommissioned nodes on a running Hadoop cluster.
  • Excellent understanding and knowledge of Hadoop architecture and various components such as HDFS, NameNode, DataNode, MapReduce, YARN, JobTracker, TaskTracker, NodeManager, ResourceManager and ApplicationMaster.
  • Insight into cluster planning and management: aspects to consider when planning and setting up a new cluster, capacity sizing, vendor recommendations and the different Hadoop distributions.
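
The sketch below illustrates the external-table creation and repair step mentioned above, expressed through Spark's HiveContext to stay consistent with the other snippets (the same HiveQL could equally be run from the Hive CLI; passing MSCK REPAIR through to Hive is an assumption about the setup). The database, schema and HDFS landing path are hypothetical placeholders.

```scala
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object ExternalTableSetup {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ExternalTableSetup"))
    val hiveContext = new HiveContext(sc)

    // External table over files landed in HDFS by the ingest jobs; columns and path are placeholders
    hiveContext.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS staging.web_logs (
        |  ip STRING, event_ts STRING, url STRING, http_status INT)
        |PARTITIONED BY (load_date STRING)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        |LOCATION 'hdfs:///data/landing/web_logs'""".stripMargin)

    // "Repair" step: register partitions written directly to HDFS since the last run
    hiveContext.sql("MSCK REPAIR TABLE staging.web_logs")
  }
}
```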

Environment: Hadoop, HDFS, Spark, MLlib, Scala, MapReduce, Hive, Pig, Sqoop, HBase, Oozie, MySQL, Teradata, SVN, PuTTY, ZooKeeper, Linux Shell Scripting.

Confidential, New York

Hadoop Developer

Responsibilities:

  • Hands-on experience extracting data from different databases and copying it into HDFS using Sqoop, with expertise in using compression techniques to optimize data storage.
  • Implemented ETL code to load data from multiple sources into HDFS using Pig scripts.
  • Wrote MapReduce jobs that used access tokens to retrieve data from customers.
  • Hands-on experience creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Developed simple to complex MapReduce jobs using Hive and Pig to analyze the data.
  • Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources to derive results from the data.
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms (see the sketch after this list).
  • Used Oozie workflow to automate all the jobs.
  • Hands-on experience exporting the analyzed data into relational databases using Sqoop for visualization and report generation by the BI team.
  • Involved in installing and configuring Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Deep understanding of scalable distributed computing systems, software architecture, data structures and algorithms using Hadoop, Apache Spark, etc.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.
  • Hands-on experience with Tableau for data visualization and analysis of large data sets, drawing various conclusions.
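
The sketch below shows the kind of output-compression settings referred to above, in a bare Hadoop MapReduce job driver written in Scala for consistency with the other snippets. The mapper and reducer classes are deliberately omitted, Snappy is only one example of the "various compression mechanisms" a real job might use, and the input/output paths come from the command line.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.compress.SnappyCodec
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

object CompressedJobDriver {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Compress intermediate map output to cut shuffle I/O
    conf.setBoolean("mapreduce.map.output.compress", true)

    val job = Job.getInstance(conf, "compressed-output-job")
    job.setJarByClass(getClass)
    // Mapper/Reducer classes omitted; this sketch only shows the compression settings
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))

    // Compress the final output written to HDFS
    FileOutputFormat.setCompressOutput(job, true)
    FileOutputFormat.setOutputCompressorClass(job, classOf[SnappyCodec])

    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```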

Environment: Hadoop, HDFS, Spark, Scala, MapReduce, Hive, Pig, Sqoop, HBase, Oozie, Teradata, MySQL, SVN, PuTTY, ZooKeeper, Linux Shell Scripting, Cornerstone and Fasttrack.

Confidential, Dearborn, MI

Developer

Responsibilities:

  • Involved in creating the Enterprise Change Management (ECM), Build Documents and Change Review Scripts (CRS), and getting approval from the CAB.
  • Involved in development team reviews, including review of code, unit test cases and results, and system and integration test cases and results, and promoted the CLs to the Endevor Model region.
  • Created Oozie coordinated workflow to execute Sqoop incremental job daily.
  • Used Oozie workflow engine to run multiple Hive and Pig jobs.
  • Wrote custom UDFs in Hive.
  • Hands-on experience extracting data from different databases and copying the data into HDFS using Sqoop.
  • Wrote a Sqoop incremental import job to move new/updated records from the database to HDFS (see the sketch after this list).
  • Involved in installing and configuring Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Working with clients on requirements based on their business needs.
  • Communicated deliverable status to users, stakeholders and the client, and drove periodic review meetings.
  • Completed tasks and the project on time, in line with quality goals.
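
The sketch below illustrates the Sqoop incremental-append import mentioned above. In practice this is typically a sqoop command line scheduled through the Oozie coordinator; it is shown here through Sqoop 1.x's programmatic entry point (org.apache.sqoop.Sqoop.runTool) only to stay in Scala, and that choice, along with the connection string, credentials, table and check column, is an assumption/placeholder.

```scala
import org.apache.sqoop.Sqoop

object OrdersIncrementalImport {
  def main(cmdLine: Array[String]): Unit = {
    val lastValue = cmdLine(0) // highest order_id already imported, tracked by the scheduler

    // Same arguments the nightly command-line job would pass to `sqoop import`
    val sqoopArgs = Array(
      "import",
      "--connect", "jdbc:mysql://dbhost:3306/sales",
      "--username", "etl_user",
      "--password-file", "/user/etl/.sqoop.pwd",
      "--table", "orders",
      "--target-dir", "/data/raw/orders",
      "--incremental", "append",
      "--check-column", "order_id",
      "--last-value", lastValue)

    System.exit(Sqoop.runTool(sqoopArgs))
  }
}
```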

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Oozie, MySQL, SVN, PuTTY, ZooKeeper, Linux Shell Scripting, Informatica PowerCenter, COBOL II, DB2, IMS DB, VSAM, JCL, CA-7

Confidential, NYC, NY

Developer

Responsibilities:

  • Involved in source system analysis and business requirements gathering with users.
  • Worked closely with the team responsible for gathering the reporting needs, as well as ensuring that the sourced data was not already available in the existing data warehouse.
  • Responsible for end-to-end verification of the gathered requirements and functional specifications, and for producing the technical design document and source-to-target mapping documents.
  • Developed and reviewed mappings that extract data from flat file and Oracle sources into an Oracle database.
  • Involved in creating the Enterprise Change Management (ECM), Build Documents and Change Review Scripts (CRS), and getting approval from the CAB.
  • Involved in development team reviews, including review of code, unit test cases and results, and system and integration test cases and results, and promoted the CIs to the Endevor Model region.
  • Worked on identifying mapping bottlenecks in sources, targets and mappings to improve performance.
  • Involved in performance tuning the ETL processes and testing stored procedures and functions, Informatica sessions, batches and the target data.
  • Performed defect tracking; logged and updated defects and solutions encountered during the project. Provided direction and guidance to a team of developers, including workload allocation and management through status calls and code reviews.

Environment: Informatica PowerCenter 9.1, Oracle, SQL, PL/SQL, Remedy, SQL Developer, Flat Files, FileZilla, Shell scripting, COBOL II, DB2, IMS DB, VSAM, JCL, CA-7

Confidential, Michigan, MI

System Analyst

Responsibilities:

  • Involved in source system analysis and business requirements gathering with users.
  • Worked closely with the team responsible for gathering the reporting needs, as well as ensuring that the sourced data was not already available in the existing data warehouse.
  • Worked closely with the data modelers to come up with the data model and ensure that it conforms to dimensional modeling and reporting needs.
  • Created design documents, system test cases, unit test cases, review documents and migration documents; involved in performance tuning the ETL processes and testing stored procedures and functions, Informatica sessions, batches and the target data.
  • Worked with source system teams to resolve data quality issues raised by end users.
  • Working with clients on requirements based on their business needs.
  • Providing direction and guidance to a team of developers, including allocation and management of workload by conducting status calls and code reviews.

Environment: Informatica PowerCenter 9.1, COBOL, JCL, VSAM, CICS, DB2, IMS-DB, Easytrieve, ENDEVOR, SPUFI, FILE-AID, CA7.

Confidential, Detroit, MI

Technical Lead

Responsibilities:

  • Involved in source system analysis and business requirements gathering with users.
  • Worked closely with the team responsible for gathering the reporting needs, as well as ensuring that the sourced data was not already available in the existing data warehouse.
  • Responsible for end-to-end verification of the gathered requirements and functional specifications, and for producing the technical design document and source-to-target mapping documents.
  • Developed and reviewed mappings that extract data from flat file and Oracle sources into an Oracle database.
  • Involved in creating the Enterprise Change Management (ECM), Build Documents and Change Review Scripts (CRS), and getting approval from the CAB.
  • Involved in development team reviews, including review of code, unit test cases and results, and system and integration test cases and results, and promoted the CIs to the Endevor Model region.
  • Worked on identifying mapping bottlenecks in sources, targets and mappings to improve performance.
  • Involved in performance tuning the ETL processes and testing stored procedures and functions, Informatica sessions, batches and the target data.
  • Performed defect tracking; logged and updated defects and solutions encountered during the project. Provided direction and guidance to a team of developers, including workload allocation and management through status calls and code reviews.

Environment: Informatica, COBOL, JCL, VSAM, CICS, DB2, IMS-DB, Easytrieve, ENDEVOR, SPUFI, FILE-AID, CA7

Confidential, Charlotte, NC

Business Associate

Responsibilities:

  • Interacting with client to define business requirements and scope of the project.
  • Created system test cases, unit test cases, review documents and migration documents.
  • Maintained, developed and fixed bugs for applications.
  • Solid background in Object-Oriented analysis and design.
  • Compiled and ran the software.
  • Executed test cases and fixed bugs through unit testing.
  • Generated daily progress reports.
  • Monitored daily production jobs and provided production support for the application.
  • Coordinating with other programmers in the team to ensure that all the modules complement each other well.
  • Working with clients on requirements based on their business needs.

Environment: COBOL, JCL, VSAM, DB2, Java, JSP, Servlets, XML, Rational Rose, Web Services, Windows XP, LINUX.
