Sr. Hadoop/Spark Developer Resume
Milwaukee, WI
SUMMARY:
- IT professional with 8+ years of overall experience and an in-depth understanding of HDFS and Spark architecture, including Spark Core, Spark SQL, DataFrames, and Spark Streaming.
- Expertise in writing Spark RDD and DataFrame transformations, actions, and case classes for the required input data, and in performing data transformations with Spark Core.
- Good knowledge of evaluating big data analytics libraries and using Spark SQL for exploratory data analysis.
- Experience in writing Hadoop jobs to analyze structured and unstructured data from web logs and social networks using HDFS, Hive, HBase, Pig, Spark, Kafka, and Scala.
- Good knowledge of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce concepts.
- Experience in designing and implementing complete end-to-end Hadoop infrastructure, including Pig, Hive, Sqoop, Oozie, Flume, and ZooKeeper.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Hands-on experience with the Cloudera and Hortonworks Hadoop distributions.
- Good knowledge of shell scripting for dumping shared data from MySQL servers to HDFS.
- Experience in performance tuning Hadoop clusters by gathering metrics and analyzing the existing infrastructure.
- Strong debugging and problem-solving skills with excellent understanding of system development methodologies, techniques and tools.
- Skilled in Amazon Web Services (AWS), using EC2 for compute, S3 for storage, and cloud services such as EBS, RDS, and VPC.
- In-depth Knowledge of Data Structures, Design and Analysis of Algorithms.
- Ability to articulate complex statistical concepts and identify key insights from data.
- Exceptional ability to quickly master new concepts, capable of working in a group as well as independently, with excellent communication skills.
TECHNICAL COMPETENCIES:
Big Data Tools: Hadoop, Spark, HDFS, MapReduce, Tez, YARN, Hive, Pig, ZooKeeper, Sqoop, Oozie, Impala, HBase, Kafka, Flume
Hadoop Distributions: Cloudera, Hortonworks
Languages: Scala, COBOL, JCL, Java, C, C++, JavaScript, XML, HTML, and CSS
Databases: MySQL, Oracle, Microsoft SQL Server, DB2, MongoDB, IMS DB
IDE & Build Tools: Eclipse, IntelliJ, Sublime Text, Maven, SBT
Operating Systems: Windows, Linux/Unix
Methodologies: Agile (Scrum), Waterfall, UML, Design Patterns, SDLC
Version Controls: GIT, SVN, CA Harvest
AWS Services: EC2, S3, DynamoDB, EMR, IAM, SNS, SQS, VPC, CloudFormation, EBS, Elastic Beanstalk, Elastic Load Balancer, RDS
Other Tools: In Sync, ISPF, DB2I, QMF, BMC Catalog Manager, Dump master, Freeze Frame, Apptune, Endevor, BMC Remedy, NDM, MQ, JIRA
PROFESSIONAL EXPERIENCE:
Confidential, Milwaukee, WI
Sr. Hadoop/Spark Developer
Responsibilities:
- Worked in Agile Iterative sessions to create Hadoop Data Lake for the client.
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed UDFs using Scala scripts, used in DataFrames/Spark SQL and RDDs for data aggregation and queries, and wrote data back into the RDBMS through Sqoop (see the sketch after this list).
- Designed and developed POCs in Spark using Scala to compare the performance of Spark against MapReduce and Hive.
- Worked on the DAG execution plan for the entire Spark application flow.
- Implemented Spark scripts using Scala and Spark SQL to read Hive tables into Spark for faster data processing.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Handled Hive queries using Spark SQL integrated with the Spark environment.
- Worked closely with the customer to understand the business requirements and implemented them.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Wrote Hive queries on the analyzed data for aggregation and reporting.
- Imported and exported data from different databases into HDFS and Hive using Sqoop.
- Used HUE for Hive Query execution.
- Used Sqoop for loading existing data from Oracle into HDFS.
- Collected the necessary data into one central big data lake; this centralized data lake feeds Tableau dashboards to provide clear reports.
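A minimal sketch of the kind of Scala UDF and Spark SQL aggregation over a Hive table described above; the table and column names (customer_txns, region, amount) are hypothetical, and the export of results back to the RDBMS via Sqoop is assumed to run outside this job.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{sum, udf}

object TxnAggregation {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read warehouse tables directly
    val spark = SparkSession.builder()
      .appName("TxnAggregation")
      .enableHiveSupport()
      .getOrCreate()

    // Example UDF: normalize free-form region codes before aggregating
    val normalizeRegion = udf((r: String) =>
      Option(r).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))
    spark.udf.register("normalize_region", normalizeRegion)

    // Read a Hive table into a DataFrame and aggregate with the UDF applied
    val txns = spark.table("customer_txns")
    val summary = txns
      .withColumn("region", normalizeRegion(txns("region")))
      .groupBy("region")
      .agg(sum("amount").as("total_amount"))

    // Persist the aggregate as a Hive table; the export to the RDBMS was handled separately by Sqoop
    summary.write.mode("overwrite").saveAsTable("txn_summary_by_region")

    spark.stop()
  }
}
```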
Environment: HDFS, Hive, Pig, Spark, Spark Streaming, YARN, Linux, Sqoop, Scala, Tableau, IntelliJ, Oracle, Git, Shell Scripting
Confidential, Portsmouth, NH
Hadoop Developer
Responsibilities:
- Evaluated business requirements and prepared detailed specifications that follow project guidelines and are required to develop the programs.
- Devised procedures that solve complex business problems with due consideration for hardware/software capacity and limitations, operating times, and desired results.
- Analyzed large data sets to determine the optimal way to aggregate and report on them. Provided quick responses to ad hoc internal and external client requests for data and created ad hoc reports.
- Migrated an existing on-premises application to AWS and implemented a 10-node EMR cluster in the test environment to validate the move to the cloud.
- Knowledge of Amazon EC2 Spot integration and Amazon S3 integration.
- Managed Amazon Redshift clusters, including launching clusters with the required node configuration and running data analysis queries.
- Optimized EMRFS so that Hadoop reads and writes directly, in parallel, to AWS S3.
- Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by applications.
- Responsible for building scalable distributed data solutions using Hadoop.
- Handled the import of data from various data sources, performed transformations, and loaded the data into HDFS.
- Installed the Oozie workflow engine to run multiple Spark jobs.
- Improved performance and optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs (see the sketch after this list).
- Imported and exported data from different databases into HDFS and Hive using Sqoop.
- Migrated MapReduce programs into Spark transformations using Spark and Scala.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark SQL, Python and Scala.
- Developed Spark scripts using Scala shell commands as per the requirements.
- Extensively worked on the Spark Core and Spark SQL modules.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive and Spark SQL.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Created views over HBase tables and used SQL queries to retrieve alerts and metadata.
- Worked with the HBase NoSQL database.
- Helped and directed testing team to get up to speed on Hadoop Data testing.
- Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
- Used Oozie Workflow engine to run multiple Hive and Pig jobs.
- Created stored procedures, triggers and functions to operate on report data in MySQL.
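A minimal sketch of converting a Hive/SQL rollup into equivalent Spark SQL and pair-RDD transformations, as described above; the table and column names (web_logs, page) are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ClickstreamRollup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClickstreamRollup")
      .enableHiveSupport()
      .getOrCreate()

    // The original Hive/SQL rollup, run through Spark SQL
    val bySql = spark.sql(
      "SELECT page, COUNT(*) AS hits FROM web_logs GROUP BY page")

    // Equivalent pair-RDD version: map each row to (page, 1) and reduce by key
    val byRdd = spark.table("web_logs").rdd
      .map(row => (row.getAs[String]("page"), 1L))
      .reduceByKey(_ + _)

    bySql.write.mode("overwrite").saveAsTable("page_hits")
    byRdd.take(10).foreach(println) // spot-check the RDD result against the SQL rollup

    spark.stop()
  }
}
```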
Environment: Cloudera, Spark, Spark Streaming, YARN, Sqoop, Scala, Oozie, Tableau, IntelliJ, Oracle, Git, Shell Scripting
Confidential, New York, NY
Hadoop Developer
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Installed and configured Hadoop MapReduce and developed multiple MapReduce jobs in Java for data cleaning and pre-processing (see the sketch after this list).
- Developed MapReduce jobs for users; maintained, updated, and scheduled periodic jobs, ranging from updates to recurring MapReduce jobs to ad hoc jobs for business users.
- Imported and exported data into HDFS and Hive using Sqoop.
- Worked on managing and reviewing Hadoop log files.
- Extracted data from SQL databases through Sqoop, placed it in HDFS, and processed it.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from various sources.
- Supported MapReduce programs running on the cluster.
- Created NDM jobs on the mainframe to copy the daily SOR files from the mainframe to the edge node (Unix).
- Involved in loading data from the UNIX file system to HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed a custom file system plug-in for Hadoop so it can access files on the data platform.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Setup and benchmarked Hadoop/HBase clusters for internal use.
- Analyzed large data sets to determine the optimal way to aggregate and report on them. Provided quick responses to ad hoc internal and external client requests for data and created ad hoc reports.
- Facilitated knowledge transfer sessions.
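A minimal Scala sketch, against the standard Hadoop MapReduce API, of the kind of map-only data-cleaning job described above (the production jobs were written in Java); the pipe delimiter and the five-field validity check are illustrative assumptions.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Map-only cleaning job: drop malformed rows (fewer than five pipe-delimited,
// non-empty fields) and emit the trimmed record.
class CleanMapper extends Mapper[LongWritable, Text, Text, NullWritable] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, NullWritable]#Context): Unit = {
    val fields = value.toString.split("\\|", -1)
    if (fields.length >= 5 && fields.forall(_.nonEmpty)) {
      context.write(new Text(fields.map(_.trim).mkString("|")), NullWritable.get())
    }
  }
}

object CleanJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "data-cleaning")
    job.setJarByClass(classOf[CleanMapper])
    job.setMapperClass(classOf[CleanMapper])
    job.setNumReduceTasks(0)                 // map-only job
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[NullWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```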
Environment: Eclipse, Oracle 11g/10g, Hadoop, Hive, HBase, Linux, MapReduce, Java (JDK 1.6), Cloudera, DataStax, IBM DataStage 8.1, PL/SQL, SQL*Plus, UNIX Shell Scripting
Confidential
Java Developer
Responsibilities:
- Effectively worked on all the stages of a Software Development Life Cycle (SDLC).
- Extensively used Core Java, Servlets, JSP and XML.
- Extensively applied design patterns such as MVC, Front Controller, Factory, Singleton, and DAO throughout the application for a clear and manageable distribution of roles.
- Used JavaScript code, HTML and CSS style declarations to enrich websites.
- Implemented the application using Spring MVC Framework which is based on MVC design pattern.
- Wrote stored procedures to process the data and store it in an organized manner.
- Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
- Developed SQL stored procedures and prepared statements for updating and accessing data from database.
- Fine-tuned SQL queries for maximum efficiency to improve the performance.
- Used MQ Series for integrating with other legacy systems.
- Developed application service components and configured beans using Spring IoC (applicationContext.xml).
- Designed User Interface and the business logic for customer registration and maintenance.
- Integrated web services and worked with data on different servers.
- Involved in designing and developing SOA services using web services.
- Gathered and understood requirements from business users and end users.
- Worked with XML/XSLT files.
- Created UML class and sequence diagrams.
- Created tables, views, triggers, indexes, constraints, and functions in SQL Server 2005.
- Worked on content management for versioning and notifications.
- Involved in configuring and deploying the application with Tomcat.
- Involved in Unit testing, Integration testing and User Acceptance testing.
- Used Java and SQL daily to debug and fix issues with client processes.
Environment: Java, Python, J2EE, HTML, CSS, JavaScript, Struts, Hibernate, Eclipse IDE
Confidential
Mainframe Developer
Responsibilities:
- Involved in requirement analysis, coding, testing and release of all the elements related to the project.
- Involved in designing and creating new tables for the development.
- Involved in converting the database from IMS DB to DB2.
- Worked on backend implementation using the MVC framework.
- Automated the creation of JCL modules from the COBOL programs.
- Created code blocks for the application.
- Developed programs to access the database and create XML-formatted requests for the web application.
- Wrote SQL code blocks using cursors to shift records between various tables based on validation checks.
- Wrote procedures and triggers for validating the consistency of metadata.
- Learned the existing end-to-end logic for the migration and analyzed the impact on existing modules.
- Prepared the detailed design document.
- Prepared element migration requests and checklists.
- Followed coding and documentation standards.
Environment: COBOL, DB2, DB2 Utilities, JCL, Sort, MQ