Senior Hadoop/Spark Developer Resume
Denver, CO
SUMMARY
- IT Consultant with 10 years of extensive experience in Operations, developing, maintaining, monitoring and upgrading Hadoop Clusters (Hortonworks distributions).
- Extensive Retail and Telecom domain knowledge, with a primary skill set in Merchandising, Finance, Product Design and Development, and Supply Chain Management.
- Good experience in translating clients' Big Data business requirements into Hadoop-centric solutions.
- Hands-on experience in installing, configuring and maintaining Apache Hadoop clusters for application development, and Hadoop tools like Hive, Pig, Spark, Kafka, Zookeeper, Hue and Sqoop, using Hortonworks.
- Hands-on experience in developing and deploying enterprise-based applications using major components of the Hadoop ecosystem like Hadoop 2.x, YARN, Hive, Pig, MapReduce, Spark, Kafka, Storm, Oozie, HBase, Flume, Sqoop and Zookeeper.
- Experience in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, efficient joins and other transformations during the ingestion process itself.
- Experience in converting Hive/SQL queries into Spark transformations using Java. Experience in ETL development using Kafka, Flume, and Sqoop.
- Built large-scale data processing pipelines and data storage platforms using open-source big data technologies.
- Experience in installation, configuration, management and deployment of Big Data solutions and the underlying infrastructure of Hadoop Cluster.
- Experience in installing and configuring Hive, its services and the Metastore. Exposure to Hive Query Language, including table operations such as importing data and altering and dropping tables.
- Experience in installing and running Pig, its execution types, Grunt and Pig Latin editors. Good knowledge of how to load, store, filter, combine and split data.
- Experience in tuning and debugging running Spark applications.
- Experience integrating Kafka with Spark for real-time data processing.
- Involved in all phases of the Software Development Life Cycle (SDLC) and worked on all activities related to the operations, implementation, administration and support of ETL processes for large-scale data warehouses.
- In-depth knowledge of database imports; worked with imported data to populate tables in Hive. Exposure to exporting data from relational databases to the Hadoop Distributed File System.
- Experience in setting up high-availability Hadoop clusters.
- Good knowledge of planning a Hadoop cluster, including choosing the distribution, selecting hardware for both master and slave nodes, and cluster sizing.
- Experience in developing Shell Scripts for system management.
- Experience in Hadoop administration with good knowledge about Hadoop features like safe mode, auditing.
- Responsible for writing J2EE compliant code using Java for an application development effort.
- Implemented Java and J2EE Design patterns like Business Delegate and Data Transfer Object (DTO), Data Access Object and Service Locator.
- Experience with Software Development Processes & Models: Agile, Waterfall & Scrum Model.
- Good knowledge of sprint planning tools like Rally and Jira, as well as GitHub for version control.
- Team player and fast learner with good analytical and problem-solving skills.
- Self-starter able to work independently as well as in a team.
- Experience in UNIX shell scripting and a good understanding of OOP and data structures.
TECHNICAL SKILLS
Operating Systems: MS-DOS, Windows 95/98/NT/2000/XP, Windows 7, z/OS, UNIX
Project Management Tools: MS-Project, Unified Modeling Language (UML), Rational Unified Process (RUP), Software Development Life Cycle (SDLC), Agile (Scrum), Kanban
Process/Model Tools: Rational Rose, MS Visio, Rally, Jira
Hadoop/Big Data Technologies: HDFS, Spark, Scala, Hive, Pig, Sqoop, Flume, Java, Kafka, Gobblin
Language: JCL, REXX, EXTRIEVE, SQL, COBOL, JDK 1.8, Java/J2EE, JDBC
Database: DB2, MS Access, Oracle 9i, HBase
Database Tools: IBM DB2 Connect, TOAD, SQL Developer
Testing Strategies: System Integration Testing, Regression and System Testing
Testing Tools: HP Quality Center
Office Tools: MS Word, MS Excel, MS PowerPoint, MS Access, MS Project
Web related: HTML, XML, VBScript, and JavaScript
Others: Tandem (OutSide Overview), TotalSystem (TSYS)
PROFESSIONAL EXPERIENCE
Confidential - Denver, CO
Senior Hadoop/Spark Developer
Responsibilities:
- Experience in deploying Hadoop clusters on public and private cloud environments like Amazon AWS, Rackspace and OpenStack.
- Experience in the Spark (using RDDs, DataFrames and SQL) and Hadoop (using MapReduce) ecosystems, with Scala as the underlying programming language.
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, efficient joins and other transformations during the ingestion process itself.
- Worked on Spark SQL and DataFrames for faster execution of Hive queries using the Spark SQLContext.
- Analyzed the client's existing Hadoop infrastructure, identified performance bottlenecks and tuned performance accordingly.
- Defined job flows in the Hadoop environment using tools like Oozie and UC4 for data scrubbing and processing.
- Loaded logs from multiple sources directly into HDFS using tools like Flume.
- Strong knowledge of administering and developing with Hive and Pig, using HiveQL and Pig Latin scripts respectively.
- Used Hive and Pig to analyze data in HDFS to identify issues and behavioral patterns.
- Worked with Sqoop for importing and exporting data between HDFS/Hive and databases like MySQL and Oracle.
- Effectively used Oozie to develop automatic workflows of Sqoop, MapReduce and Hive jobs.
- Scheduled jobs in UC4 as per the deployments and set up workflows in UC4.
- Troubleshot and monitored the cluster.
- Worked on Hive queries from the Hue environment.
- Created Hive tables and was involved in loading data and writing Hive queries.
- Monitored user jobs from the ResourceManager and optimized long-running jobs.
- Worked with Toad for Oracle 11.6 for data ingestion.
- Created Kafka topics, provided ACLs to users and set up rest mirror and MirrorMaker to transfer data between two Kafka clusters.
- Helped the users to connect to Kerberized Hive from SQL Workbench and BI tools.
- Wrote scripts for disk monitoring and log compression.
- Handled data imports from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS.
- Wrote scripts to automate processes such as taking periodic backups and setting up user batch jobs.
- Deployed multi-module applications with build tools like Maven and integrated them with continuous integration servers like Jenkins.
- Developed test cases using JUnit and configured Git to maintain the project repository.
Environment: Hadoop 2.6.0, HDFS, MapReduce, Spark Core, Spark SQL, Scala, Pig 0.14, Hive 1.2.1, Sqoop 1.4.4, Flume 1.6.0, Kafka, Gobblin, Knox 0.6.0, Ambari 2.4.1, Storm 0.9.3, JDK 1.8, Java/J2EE, JDBC, JUnit 4, Maven 2.0, Databricks.
Confidential - Denver, CO
Senior Hadoop Developer
Responsibilities:
- Coordinated with business customers to gather business requirements, interacted with technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Extensively involved in the design phase and delivered design documents.
- Worked on analyzing the Hadoop cluster and different Big Data analytic tools including Pig, Hive, the HBase database and Sqoop.
- Imported and exported data into HDFS and Hive using Sqoop.
- Migrated large volumes of data from different databases (e.g. Oracle, SQL Server) to Hadoop.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Experienced in defining job flows.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Experienced in managing and reviewing the Hadoop log files.
- Loaded and transformed large sets of structured and semi-structured data.
- Responsible for managing data coming from different sources.
- Involved in creating Hive tables, loading data and writing Hive queries.
- Utilized the Apache Hadoop environment from Hortonworks.
- Created the data model for Hive tables.
- Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization.
- Wrote helper classes using the Java Collections Framework.
- Wrote JUnit test cases for the classes developed.
- Worked on Oozie workflow engine for job scheduling.
- Performed unit testing of newly developed components using JUnit.
- Involved in setting up the automation environment using Eclipse, Java, Selenium WebDriver Java language bindings and TestNG jars.
- Involved in unit testing and delivered unit test plans and results documents.
Environment: Hadoop 2.x, HDFS, MapReduce, Pig 0.12.1, Hive 0.13.1, Sqoop 1.4.4, Flume 1.6.0, Unix, JDK 1.8, Java/J2EE, JDBC, JUnit, JSON, Maven 2.0
Confidential - Minneapolis, MN
Hadoop Developer
Responsibilities:
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Developed and executed custom MapReduce programs, Pig Latin scripts and HQL queries.
- Implemented Java and J2EE Design patterns like Business Delegate and Data Transfer Object (DTO), Data Access Object and Service Locator.
- Worked on importing the data from different databases into Hive Partitions directly using Sqoop.
- Performed data analytics in Hive and then exported the metrics to RDBMS using Sqoop.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Extensively used Pig for data cleaning and optimization.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Thoroughly tested MapReduce programs using the MRUnit and JUnit testing frameworks.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Extracted tables from MS SQL Server through Sqoop, placed them in HDFS and processed the records.
- Used Flume to collect and aggregate weblog data from different sources and pushed it to HDFS.
- Deployed multi-module applications with build tools like Maven and integrated them with continuous integration servers like Jenkins.
Environment: Hadoop 1.x, HDFS, MapReduce, Pig 0.11, Hive 0.10, Sqoop, Unix, JDK 1.8, Java/J2EE, JDBC, JUnit, JSON, Maven 2.0
Confidential
Java Developer
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC), such as design, development and unit testing.
- Developed and deployed UI layer logic of sites using JSP, XML, JavaScript, HTML/DHTML, and Ajax.
- Prepared design specifications for application development, covering front-end and back-end, using design patterns.
- Developed prototype test screens in HTML and JavaScript.
- Involved in developing JSPs for client data presentation and client-side data validation within the forms.
- Developed the application by using the Spring MVC framework.
- Used the Collections framework to transfer objects between the different layers of the application.
- Used Spring IoC to inject values for the dynamic parameters.
- Actively involved in code review and bug fixing to improve performance.
- Documented the application's functionality and its enhanced features.
- Created connections through JDBC and used JDBC statements to call stored procedures.
- Created UML diagrams like use cases, class diagrams, interaction diagrams, and activity diagrams.
- Worked extensively on the user interface for a few modules using JSPs, JavaScript and Ajax.
- Wrote complex SQL queries and stored procedures.
- Developed the XML Schema and Web services for the data maintenance and structures.
- Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for a PostgreSQL database.
- Used Hibernate ORM framework with Spring framework for data persistence and transaction management.
- Used the Struts validation framework for form-level validation.
- Wrote test cases in JUnit for unit testing of classes.
- Involved in creating templates and screens in HTML and JavaScript.
Environment: Core Java, Eclipse, Java SDK 1.6, XML, JavaScript, HTML/DHTML
Confidential
Developer
Responsibilities:
- Analyzed abended jobs in CA7 and followed the appropriate recovery procedure for all incidents.
- Involved in synchronizing primary Finance application with production.
- Processed testing team requests and ad hoc requests on the host.
- Involved in root cause analysis and permanent fixes for critical abends.
- Involved in casting of the elements and implementation.
- Involved in loading and unloading tables based on requests from the testing team.
- Involved in casting the package to move element changes into the Integration region.
- Involved in setting up the stream for testing.
- Mapped the NUID to the testing team members.
- Involved in reviewing the Test Criteria Form received from the testing team before element implementation.
- Wrote complex SQL queries and stored procedures.
- Developed the XML Schema and Web services for the data maintenance and structures.
Environment: COBOL, JCL, DB2, VSAM, Core Java, Eclipse, Java SDK 1.6, XML
