Hadoop And Spark Developer Resume
Erie, PA
PROFESSIONAL SUMMARY:
- Overall 9 years of professional IT experience in enterprise application development across all phases of the software development life cycle, including Big Data analytics, with over 3 years implementing end-to-end Hadoop solutions.
- Hands-on experience with Apache Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, Spark, Oozie, Sqoop, Kafka, Storm, and Hue, along with Maven/SBT builds and Java/J2EE application development.
- Working experience with the Cloudera (CDH) and Hortonworks (HDP) distributions, including the Cloudera Hue web interface for executing scripts.
- Experience in importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
- Experience in analyzing data using HiveQL, Pig Latin, custom MapReduce programs in Java, and Spark.
- Experience in optimizing MapReduce jobs using combiners and partitioners to deliver the best results.
- Knowledge of job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Expertise in core Java, J2EE, JDBC, web services, Hibernate, and shell scripting; proficient in using Java APIs for application development.
- Proficient in processing data with user-defined functions (UDFs) written in Java for Hive (a minimal sketch follows this list).
- Extensively used ETL methodologies for data ingestion, extraction, transformation, and loading, with attention to optimization and performance tuning.
- Strong database skills with Oracle, MySQL, Teradata, and DB2, including programming in SQL and PL/SQL (stored procedures, triggers, functions, and packages) and writing DDL, DML, and transaction queries with development tools.
- Expertise in working with different kinds of data files such as flat files, CSV, JSON, Avro, ORC, and XML, with a good understanding of structured and unstructured data.
- Excellent working experience in Scrum/Agile and Waterfall project execution methodologies.
- Extensive experience in customer specification study, requirements gathering, and system architectural design, with domain expertise in Financial, Retail, and Insurance.
- Strong technical background, excellent analytical ability, good debugging and communication skills, a team player, goal-oriented, and able to quickly learn new technologies as required.
- Good understanding of NoSQL databases such as HBase, Cassandra, MongoDB, and MarkLogic.
- Experience scheduling Hadoop workflows and dependencies using Autosys and Control-M.
- Hands-on experience with message brokers such as Apache Kafka and IBM WebSphere MQ.
- Hands-on experience with Amazon Web Services (AWS), including Amazon S3, Amazon EC2, and Amazon Elastic MapReduce (EMR).
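As an illustration of the Hive UDF experience above, a minimal sketch on the JVM (the UDFs themselves were written in Java; Scala is used here only for consistency with the other sketches in this resume, and the class name and semantics are hypothetical):

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical one-argument Hive UDF: normalizes free-form state codes
// ("pa", " Pa ") to a canonical upper-case form ("PA").
class NormalizeState extends UDF {
  def evaluate(input: Text): Text =
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
}
```

After packaging the class into a JAR, it would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_state AS 'NormalizeState'.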
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Spark, Kafka, Storm
Java/J2EE Technologies: Java, J2EE, Servlets, JSP, XML, AJAX, SOAP and REST web services, Hibernate
Enterprise Frameworks: MVC, Struts 2/1, Hibernate 3, Spring 3/2.5/2
Version and Source Control: RAD, SVN, Git
Programming Languages: C, Java, XML, Unix Shell scripting, SQL and PL SQL, Scala, Python
Web Technologies: HTML, DHTML, XML, XSLT, JavaScript, CSS
IDE Tools: IntelliJ IDEA, Eclipse, NetBeans
Application Servers: WebLogic, WebSphere, JBoss
Databases: Oracle, DB2, MySQL, Teradata, HBase, Apache Cassandra, MarkLogic
Frameworks: MVC, Struts, Log4j, JUnit, Maven/SBT
Operating Systems: Windows, UNIX, Linux, CentOS
PROFESSIONAL EXPERIENCE:
Hadoop and Spark Developer
Confidential, Erie, PA
Responsibilities:
- Responsible for understanding project scope and gathering requirements.
- Broke down large system requirements into manageable parts for each line of business.
- Worked extensively with Sqoop for importing data from Oracle and DB2.
- Utilized Apache Hadoop ecosystem tools such as HDFS, Hive, and Pig for analysis of large datasets.
- Designed and deployed Hadoop clusters and various Big Data analytic tools, including Pig, Hive, HBase, Oozie, Sqoop, Flume, Spark, and Impala, on the Cloudera distribution.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Assisted in upgrading, configuring, and maintaining Hadoop ecosystem components such as Pig, Hive, and HBase.
- Developed workflows and coordinator jobs in Oozie.
- Developed Spark scripts in Scala using the Spark shell, as per requirements.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed UDFs in MapReduce and Scala scripts in Spark using DataFrames, RDDs, and Spark SQL for data aggregation and queries, writing data back into the RDBMS through Sqoop (see the sketch after this list).
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Ingested data from various sources into HDFS and built reports using Tableau.
- Performed real time analysis on the incoming data.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Developed a data pipeline using Kafka and Storm to store data in HDFS (a producer sketch follows this section).
- Performed transformation, cleaning, and filtering of imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Loaded data into Spark RDDs and DataFrames and performed in-memory computation to generate output for reporting and the data warehouse.
- Loaded data into Teradata using bulk and non-bulk loads.
- Configured ZooKeeper for cluster coordination services.
- Used Jira for bug tracking, issue tracking, and project management.
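A minimal sketch of the Spark-over-Hive aggregation pattern described above, assuming a Spark 1.x HiveContext as shipped with CDH at the time; the application name, table, and columns are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions._

object ClaimsAgg {
  def main(args: Array[String]): Unit = {
    // Runs on YARN when launched via spark-submit --master yarn.
    val sc = new SparkContext(new SparkConf().setAppName("ClaimsAgg"))
    val hc = new HiveContext(sc)

    // Hypothetical Hive table: one row per claim, with state and amount columns.
    val claims = hc.table("analytics.claims")

    // In-memory aggregation with the DataFrame API.
    val totals = claims
      .filter(col("amount") > 0)
      .groupBy("state")
      .agg(sum("amount").as("total_amount"), count(lit(1)).as("claim_count"))

    // Persist back to Hive for reporting; the RDBMS export itself went through Sqoop.
    totals.write.mode("overwrite").saveAsTable("analytics.claim_totals")

    sc.stop()
  }
}
```

The aggregated table would then feed the Tableau reports mentioned above.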
Environment: Cloudera, HDFS, MapReduce, Spark, Hive, Pig, HBase, Flume, Sqoop, Kafka, Java, UNIX shell scripting, Oracle, DB2, Teradata, Cassandra, MarkLogic.
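For the Kafka-to-Storm pipeline above, a minimal sketch of the producer side using the Kafka producer client API; the broker addresses, topic, and payload are hypothetical, and the Storm topology that consumes the topic and writes to HDFS is not shown:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Hypothetical broker list for the multi-node Dev cluster.
    props.put("bootstrap.servers", "kafka1:9092,kafka2:9092,kafka3:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all") // wait for full in-sync-replica acknowledgement

    val producer = new KafkaProducer[String, String](props)
    try {
      // Each event lands on the hypothetical "events" topic; the Storm
      // topology on the consuming side persists the stream to HDFS.
      producer.send(new ProducerRecord("events", "user-42", """{"action":"click"}"""))
    } finally {
      producer.close()
    }
  }
}
```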
Hadoop Data Engineer
Confidential, Harrisburg, PA
Responsibilities:
- Performed benchmarking of HDFS and the ResourceManager using TestDFSIO and TeraSort.
- Worked with Sqoop to import data from various relational data sources.
- Worked with Flume to bring clickstream data in from front-facing application logs.
- Worked on strategizing Sqoop jobs to parallelize data loads from source systems.
- Provided inputs for the design of ingestion patterns and strategized loads so as not to impact front-facing applications.
- Worked on the design of a Hive data store to hold data from various data sources.
- Involved in brainstorming sessions for sizing the Hadoop cluster and provided inputs to the analyst team for functional testing.
- Worked with source-system load-testing teams to perform loads while ingestion jobs were in progress.
- Worked on data standardization using Pig scripts.
- Worked on installation and configuration of a Hortonworks cluster from the ground up.
- Managed various user groups with different queue configurations.
- Worked on building analytical data stores for data science team's model development.
- Worked on design and development of Oozie workflows to orchestrate Pig and Hive jobs.
- Worked on performance tuning of Hive queries through partitioning and bucketing (see the sketch after this list).
- Exposure to managing various data provisioning formats such as Avro, Parquet, and EBCDIC.
- Analyzed existing SQL scripts and designed solutions to implement them using PySpark.
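A sketch of the partitioning-and-bucketing pattern behind the Hive tuning bullet above, submitted through the Hive JDBC driver so the statements run inside Hive itself; the endpoint, database, tables, and columns are hypothetical, and Scala is used only for consistency with the other sketches (the SQL-to-PySpark work mentioned above is separate):

```scala
import java.sql.DriverManager

object HiveTuning {
  def main(args: Array[String]): Unit = {
    // Hypothetical HiveServer2 endpoint.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://hive-host:10000/analytics", "etl", "")
    val stmt = conn.createStatement()
    try {
      // Partition by load date for partition pruning; bucket by customer_id
      // so joins on customer_id can use bucketed map joins.
      stmt.execute(
        """CREATE TABLE IF NOT EXISTS orders
          |  (order_id BIGINT, customer_id BIGINT, amount DOUBLE)
          |PARTITIONED BY (load_date STRING)
          |CLUSTERED BY (customer_id) INTO 32 BUCKETS
          |STORED AS ORC""".stripMargin)

      stmt.execute("SET hive.exec.dynamic.partition=true")
      stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
      stmt.execute("SET hive.enforce.bucketing=true")

      // One INSERT populates every load_date partition present in the staging data.
      stmt.execute(
        """INSERT OVERWRITE TABLE orders PARTITION (load_date)
          |SELECT order_id, customer_id, amount, load_date
          |FROM staging_orders_raw""".stripMargin)
    } finally {
      stmt.close(); conn.close()
    }
  }
}
```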
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Storm, Flume, Oracle 11g, Core Java, Hortonworks, HDFS, Eclipse.
Programmer and Hadoop ETL Analyst
Confidential, Charlotte, NC
Responsibilities:
- Initially involved in development using Java/J2EE technologies, web services, and the Hibernate ORM framework.
- Involved in troubleshooting and customer support.
- Interacted with end applications and performed business analysis and detailed design of the system from business requirement documents.
- Performed development and ETL design in a Hadoop cluster environment.
- Developed Hive queries and UDFs as per requirements.
- Involved in extracting customers' big data from various sources into HDFS; this included data from Netezza, Teradata, Oracle, and DB2 enterprise databases.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis.
- Created Hive tables as per requirements, as managed or external tables defined with appropriate static and dynamic partitions for efficiency (see the sketch after this list).
- Implemented partitioning and bucketing in Hive for better organization of the data.
- Exposure to managing various data provisioning formats such as Avro, Parquet, and EBCDIC.
- Developed many Hive queries for data analysis to meet the business requirements.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Implemented automatic NameNode failover using ZooKeeper and the ZooKeeper Failover Controller.
- Designed and developed many Oozie workflows to execute Hive, Pig, Java main, shell, and email actions.
- Developed smoke-test scripts to validate the cluster following platform updates and maintenance.
- Scheduled Hadoop workflows and dependencies using the Autosys scheduler.
- Worked on scoping the Cloudera Hadoop upgrade from CDH3 to CDH4.x.
- Worked on providing user support and application support on Hadoop Infrastructure.
- Worked on evaluating and comparing different tools for test data management with Hadoop.
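A sketch of the managed-versus-external table distinction noted above, again through the Hive JDBC driver; the endpoint, table, columns, and HDFS paths are hypothetical:

```scala
import java.sql.DriverManager

object ExternalTables {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://hive-host:10000/staging", "etl", "")
    val stmt = conn.createStatement()
    try {
      // External table: Hive tracks only metadata, so dropping the table
      // leaves the Sqoop/Flume-landed files in HDFS untouched.
      stmt.execute(
        """CREATE EXTERNAL TABLE IF NOT EXISTS txn_raw
          |  (txn_id BIGINT, account STRING, amount DOUBLE)
          |PARTITIONED BY (ingest_date STRING)
          |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
          |LOCATION '/data/raw/txn'""".stripMargin)

      // Static partition: point one known date at its landing directory.
      stmt.execute(
        """ALTER TABLE txn_raw ADD IF NOT EXISTS
          |PARTITION (ingest_date='2015-06-01')
          |LOCATION '/data/raw/txn/2015-06-01'""".stripMargin)
    } finally {
      stmt.close(); conn.close()
    }
  }
}
```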
Environment: Java, Web Services, Hadoop v1.2.1, HDFS, Scala, MapReduce, Hive, Sqoop, Pig, DB2, Oracle, CouchDB, Cassandra, CDH4.x.
Java/J2EE consultant
Confidential - Harrisburg, PA
Responsibilities:
- Involved in gathering and analyzing the requirements, design, development and support of the application.
- Developed web-tier module using JSPs, Servlets, and Struts.
- Developed user interfaces using HTML and JavaScript.
- Worked in an Agile Scrum methodology.
- Involved in configuration of Spring MVC and Integration with Hibernate for Suntoll module.
- Developed action classes and service classes that support the Struts framework.
- Used design patterns such as Singleton and DAO (Data Access Object) to connect to the backend database.
- Experience working with Spring MVC using AOP, DI/IoC, and the JDBC template.
- Developed services for business logic and data access layer with Hibernate.
- Used Java and object-oriented programming to write business logic for different requirements.
- Hands on Experience with Spring Framework, Maven, JIRA and Agile methodologies.
- Developed reusable AJAX components for web-tier.
- Configured the JMS application server to make asynchronous calls for app admin role.
- Implemented the Hibernate ORM framework to interact with the database to update, retrieve, insert, and delete values effectively.
- Extensively used JSON objects with AJAX for UI displays. Created XML schemas, XML templates, and XSL. Used HTML, CSS, JavaScript, and AngularJS to design the front end.
- Used Maven to build the J2EE application.
- Created page flows for new business requirements by coordinating with the business users.
- Developed new functionality within AS400 to receive messages from MDM and generate new master records in corporate systems.
- Developed VB scripts for users to automate manual key-in activities with AS400 master menus.
- Automated the master data job monitoring process to trigger an email to the support team in case of failures.
- Designed new AS400 architectures and functionality to replace the item adoption logic from the CIM system and implement it in AS400.
- Transmitted master data from AS400 to Windows systems through ICF communication.
Environment: AS400, DB2/400, IBM MDM, Java, JMS, RPGLE, CL, CLLE, VB scripts, UNIX shell scripting.
Programmer Analyst
Confidential, Fort Worth, TX
Responsibilities:
- Technology development and support of mainframe applications and middleware tools.
- Handled onsite-offshore communication with the client; allocated work to and mentored the team.
- Responsible for overseeing the Quality procedures related to the project and the Software development.
- Developed domain objects using Hibernate, with the respective configuration in XML files, and used the Hibernate framework for retrieving data from the database.
- Developed and tested the Efficiency Management module using Core Java, EJB, Servlets and JSP components in WebLogic Application Server
- Worked on front-end technologies: JavaScript, jQuery, HTML, and CSS.
- Developed with the Struts framework, providing access to system functions of the server's business layer.
- Install application releases to production and provide Tier1 post production support.
- Identify feature improvements and provide time critical enhancement suggestions to business teams for increased efficiency and usability of the applications.
- Implemented business components as a persistent object model using EJB CMP and BMP entity beans for storing and retrieving data objects.
- Implemented the application's MVC architecture using the Struts framework.
- Wrote stored procedures in PL/SQL to interact with the Oracle database required by the Efficiency module, alongside a Teradata tool.
- Built the PTC data-mining algorithm as a reusable component so that it is utilized for different needs, such as Retro Running and Single Train Data Analytics, apart from its mainstream execution as a scheduled job.
Environment: Java, J2EE, JSP, Servlets, HTML, CSS, SOAP, XML, WSDL, SVN, Spring IoC, Spring AOP, Hibernate, Spring MVC, JNDI, Axis 2, Oracle 10g, SQL Navigator, WebLogic Application Server, Windows 2003, UNIX and shell script, COBOL, VSAM, JCL, DB2, IMS DB/DC, CICS.
