Hadoop Developer Resume
Minneapolis, MN
SUMMARY:
- Overall 7 years of professional IT experience in software development and requirement analysis in an Agile work environment, with 4+ years of Big Data ecosystem experience in ingestion, storage, querying, processing and analysis of Big Data.
- Experience working with Apache Hadoop ecosystem components such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Oozie, Mahout, Python, Spark, Cassandra and MongoDB.
- Good understanding/knowledge of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode and MapReduce concepts.
- Experienced in managing NoSQL databases on large Hadoop distributions such as Cloudera, Hortonworks HDP and MapR M series.
- Experienced in developing Hadoop integrations for data ingestion, data mapping and data processing capabilities.
- Worked with various data sources such as flat files and RDBMS (Teradata, SQL Server 2005, Netezza and Oracle). Extensive ETL work covering data sourcing, mapping, transformation and conversion.
- Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop Map Reduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Storm, Spark, Kafka and Flume.
- Strong understanding of Data Modelling and experience with Data Cleansing, Data Profiling and Data analysis.
- Designed and implemented Apache Spark Streaming applications using Python (PySpark) and Scala; a minimal PySpark sketch follows this summary.
- Experience in ETL (DataStage) analysis, design, development, testing and implementation of ETL processes, including performance tuning and query optimization of databases.
- Experience in extracting source data from Sequential files, XML files, Excel files, transforming and loading it into the target data warehouse.
- Strong experience with Java/J2EE technologies such as Core Java, JDBC, JSP, JSTL, HTML, JavaScript, JSON.
- Proficiency in programming with IDEs such as Eclipse and NetBeans.
- Involved in database design, creating Tables, Views, Stored Procedures, Functions, Triggers and Indexes.
- Good understanding of Service-Oriented Architecture (SOA) and related web services standards such as XML, XSD, WSDL and SOAP.
- Good knowledge of scalable, secure cloud architecture based on Amazon Web Services (leveraging AWS services such as EC2, CloudFormation, VPC and S3).
- Good Knowledge on Hadoop Cluster architecture and monitoring the cluster.
- In-depth understanding of Data Structure and Algorithms.
- Experience in managing and troubleshooting Hadoop related issues.
- Expertise in setting up standards and processes for Hadoop based application design and implementation.
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Experience in managing Hadoop clusters using Cloudera Manager.
- Hands-on experience with VPN, PuTTY, WinSCP, etc.
- Excellent communication and interpersonal skills; flexible and adaptive to new environments, self-motivated, a team player and a positive thinker who enjoys working in multicultural environments.
- Analytical, organized and enthusiastic to work in a fast paced and team-oriented environment.
- Expertise in interacting with business users, understanding their requirements and providing solutions to match them.
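As a minimal illustration of the PySpark streaming experience noted above: a DStream-based Spark Streaming word count with a placeholder socket source (host, port and application name are illustrative; production jobs would typically consume from Kafka or Flume).

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# One micro-batch every 10 seconds.
sc = SparkContext(appName="streaming-sketch")
ssc = StreamingContext(sc, 10)

# Placeholder text source; swap in a Kafka or Flume receiver for real pipelines.
lines = ssc.socketTextStream("localhost", 9999)

# Count words within each micro-batch and print the result.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()
```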
TECHNICAL SKILLS:
Programming Languages: SQL, Java (Core), Python, C++, C
Operating System: Windows (NT/2000/XP/7/8), Linux, UNIX
Databases: Oracle 10g/11g, MS SQL Server 2008, MySQL, HBase (NoSQL), MongoDB(NoSQL)
Big Data ecosystem: Hadoop - HDFS, MapReduce, Apache Pig, Hive, Apache Spark, HBase, Flume, Oozie, MongoDB
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
IDE Tools: Eclipse, NetBeans
Web Technologies: ASP.NET, HTML, XML
OLAP concepts: Data warehousing
Other Technologies: SQL Developer, TOAD
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential, Minneapolis, MN
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development and major components of Hadoop Ecosystem: Hive, Pig, HBase, Sqoop, Flume, Oozie, Spark and Zookeeper.
- Used Sqoop to transfer data between RDBMS and HDFS.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache.
- Designed and implemented custom Writables, custom InputFormats, custom partitioners and custom comparators in MapReduce.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Converted existing SQL queries into HiveQL queries.
- Implemented UDFs, UDAFs and UDTFs in Java for Hive to handle processing that cannot be done with Hive's built-in functions.
- Effectively used Oozie to develop automated workflows of Sqoop, MapReduce and Hive jobs.
- Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Weekly meetings with technical collaborators and active participation in code review sessions with senior and junior developers.
- Loaded and analyzed Omniture logs generated by different web applications.
- Loaded and transformed large sets of structured, semi-structured and unstructured data in various formats such as text, zip, XML and JSON.
- Refined the Website clickstream data from Omniture logs and moved it into Hive.
- Wrote multiple MapReduce programs for extraction, transformation and aggregation of data from multiple file formats including XML, JSON, CSV and other compressed file formats.
- Defined job flows and developed simple to complex MapReduce jobs as per requirements.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Developed Pig UDFs for manipulating data according to business requirements and also worked on developing custom Pig loaders.
- Worked on developing ETL processes (DataStage Open Studio) to load data from multiple data sources to HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
- Responsible for creating Hive tables based on business requirements.
- Developed Scala and SQL code to extract data from various databases.
- Worked on regular expression related text-processing using the in-memory computing capabilities of Spark using Scala.
- Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access.
- Involved in NoSQL database design, integration and implementation.
- Loaded data into the NoSQL database HBase.
- Handled Hive queries using Spark SQL integrated with the Spark environment; see the sketch after this list.
- Also explored the Spark MLlib library for a POC on a recommendation engine.
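A minimal PySpark sketch of the Hive-on-Spark-SQL pattern referenced above, combined with the partitioned-table layout described earlier in this list; the table, columns and partition values are hypothetical examples, not from the actual project.

```python
from pyspark.sql import SparkSession

# Hive support lets spark.sql() resolve tables registered in the Hive metastore.
spark = (SparkSession.builder
         .appName("hive-spark-sql-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Illustrative partitioned Hive table.
spark.sql("""
    CREATE TABLE IF NOT EXISTS web_events (
        user_id STRING,
        page    STRING,
        visits  INT
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
""")

# HiveQL executed through Spark SQL; filtering on the partition column
# lets Spark prune partitions instead of scanning the whole table.
daily = spark.sql("""
    SELECT page, SUM(visits) AS total_visits
    FROM web_events
    WHERE event_date = '2016-01-01'
    GROUP BY page
""")
daily.show()
```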
Environment: Hadoop, CDH4, MapReduce, HDFS, Pig, Hive, Impala, Oozie, Java, Spark, Kafka, Flume, Storm, Knox, Linux, Scala, Maven, JavaScript, Oracle 11g/10g, SVN
Big Data Developer
Confidential, Baltimore, MD
Responsibilities:
- Analyzed requirements and laid out the plan to execute each task.
- Analyzed the files obtained from claims submissions and third-party vendor data dumps using Pig Latin and Hive.
- Developed Hadoop Streaming MapReduce jobs using Python; see the sketch after this list.
- Modified the database as per client requests using SQL queries against DB2.
- Wrote optimized SQL queries to extract data from the data warehouse per business user requirements.
- Used Sqoop to move data between HDFS and MySQL on a regular basis.
- Developed Pig scripts to implement ETL transformations.
- Developed join data set scripts using Pig Latin join operations.
- Wrote Pig user-defined functions wherever necessary for carrying out ETL tasks.
- Developed Hive UDFs to incorporate external business logic.
- Imported bulk data into HBase using MapReduce programs.
- Good understanding of Spark architecture and its components.
- Good experience on Spark with Java/Scala.
- Good knowledge of the NoSQL databases HBase and MongoDB.
- Good knowledge on scripting languages like Python and Scala.
- Experienced in writing Spark Applications in Scala and Python (Pyspark).
- Good experience in handling data manipulation using Python scripts.
- Worked on Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm and webMethods.
- Implemented Spring Boot microservices to process messages into the Kafka cluster.
- Worked closely with the Kafka admin team to set up the Kafka cluster in the QA and production environments.
- Implemented Kafka producer and consumer applications on the Kafka cluster with the help of Zookeeper.
- Prepared dashboards using Tableau for data analysis.
- Recommended dashboards based on Tableau visualization features.
- Implemented advanced geographic mapping techniques and used custom images and geocoding to build spatial visualizations of non-geographic data.
- Purchased, set up and configured a Tableau Server for data warehouse reporting.
- Ensured that claims were processed on time per the SLA.
- Used the Snowflake Connector for Spark, which brings Snowflake into the Apache Spark ecosystem, enabling Spark to read data from and write data to Snowflake; see the sketch after this list.
- Worked on populating a Spark DataFrame from a table in Snowflake and writing the contents of a Spark DataFrame back to a table in Snowflake.
- Worked on EMR notebooks for iterative development, collaboration and access to data stored in AWS S3.
- Stored EMR data in an AWS S3 data lake to scale and process it easily on an EMR cluster using Apache Spark.
- When a batch job failed because claims files arrived in the wrong format, analyzed the files and worked with the EDI team to make sure they were resent in the correct format.
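A minimal sketch of the Hadoop Streaming pattern mentioned above: the mapper and reducer are plain Python scripts that read stdin and emit tab-separated key/value pairs. The pipe-delimited record layout and the claim-status field position are assumptions for illustration only.

```python
#!/usr/bin/env python
# mapper.py -- emit (claim_status, 1) for every input record.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("|")   # hypothetical pipe-delimited claim record
    if len(fields) > 3:
        status = fields[3]                  # hypothetical claim-status column
        print("%s\t1" % status)
```

```python
#!/usr/bin/env python
# reducer.py -- sum the counts per status key (streaming input arrives sorted by key).
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key != current_key:
        if current_key is not None:
            print("%s\t%d" % (current_key, count))
        current_key, count = key, 0
    count += int(value)
if current_key is not None:
    print("%s\t%d" % (current_key, count))
```

Such scripts are typically submitted with the standard streaming jar, along the lines of hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /claims/raw -output /claims/status_counts (all paths illustrative).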
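A hedged PySpark sketch of the Snowflake read/write flow described above. It assumes the Snowflake Spark connector is on the classpath; all connection options and table names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-sketch").getOrCreate()

SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"

# Placeholder connection options; real values would come from a secrets store.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# Populate a Spark DataFrame from a Snowflake table (hypothetical table name).
claims_df = (spark.read.format(SNOWFLAKE_SOURCE)
             .options(**sf_options)
             .option("dbtable", "CLAIMS")
             .load())

# Write the contents of a DataFrame back to another Snowflake table.
(claims_df.groupBy("CLAIM_STATUS").count()
 .write.format(SNOWFLAKE_SOURCE)
 .options(**sf_options)
 .option("dbtable", "CLAIM_STATUS_COUNTS")
 .mode("overwrite")
 .save())
```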
Environment: Hadoop (Pig Latin, Hive, HBase, Apache Spark, MapReduce), SQL Server 2008, SQL Server Management Studio, Tableau 7/8.1, Tableau Server, COBOL, JCL
Hadoop Developer
Confidential, Denver, CO
Responsibilities:
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables
- Installed & maintained Cloudera Hadoop distribution.
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Involved in loading the data from Linux file system to HDFS.
- Implemented MapReduce programs to transform raw log data into structured records for finding user information; an equivalent PySpark sketch follows this list.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Installed Oozie workflow engine to run multiple MapReduce, Hive and Pig jobs.
- Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into those tables, and writing Hive queries to further analyze the logs and identify issues and behavioral patterns.
- Imported data frequently from MS SQL Server, Cassandra and other NoSQL stores into HDFS using Sqoop.
- Managed and upgraded the MVC framework and Scala versions.
- Refactored the existing Spark batch processes, written in Scala, for different logs.
- Worked on Spark machine learning techniques implemented in Scala.
- Good knowledge on scripting languages like Python and Scala.
- Experienced in writing Spark Applications in Scala and Python (Pyspark).
- Supported operation team in Hadoop cluster maintenance activities including commissioning and decommissioning nodes and upgrades.
- Used the ETL tool Talend for transformations, event joins, filtering and some pre-aggregations.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Used Tableau for visualization and to generate reports.
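The log-structuring step above was implemented with MapReduce; purely as an illustration, here is a roughly equivalent PySpark sketch that parses raw log lines with a regular expression and lands the structured records in a partitioned staging location. The log layout, regex and paths are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_extract

spark = SparkSession.builder.appName("log-structuring-sketch").getOrCreate()

# Hypothetical raw log location and layout: "<ip> <timestamp> <user> <url>".
raw = spark.read.text("/data/raw/access_logs/")

pattern = r"^(\S+) (\S+) (\S+) (\S+)$"
structured = raw.select(
    regexp_extract("value", pattern, 1).alias("ip"),
    regexp_extract("value", pattern, 2).alias("ts"),
    regexp_extract("value", pattern, 3).alias("user"),
    regexp_extract("value", pattern, 4).alias("url"),
)

# Write partitioned Parquet that a Hive staging table can sit on top of.
(structured.write
 .partitionBy("user")
 .mode("overwrite")
 .parquet("/data/staging/user_activity/"))
```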
Environment: Hadoop, Cloudera, MapReduce, Hive, Sqoop, Spark, Flume, Talend, Python, MS-SQL Server, Tableau, ETL, NoSQL.
Hadoop Developer
Confidential, Cincinnati, OH
Responsibilities:
- Worked with the business users to gather, define business requirements and analyze the possible technical solutions.
- Installed the NameNode, Secondary NameNode, YARN components (ResourceManager, NodeManager, ApplicationMaster) and DataNodes.
- Installed and configured HDP 2.2.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Migrated complex MapReduce programs to in-memory Spark processing using transformations and actions.
- Good knowledge of Teradata Manager, TDWM, PMON, DBQL, SQL Assistant and BTEQ.
- Gathered system design requirements, and designed and wrote system specifications.
- Designed and developed UNIX shell scripts as part of the ETL process to compare control totals, automate the process of loading, pulling and pushing data from and to different servers.
- Experienced in working with different scripting technologies such as Python and UNIX shell scripts.
- Mentored the analyst and test teams in writing Hive queries.
- Worked on optimizing and tuning Teradata views and SQL to improve batch performance and data response times for users.
- Designed workflows with many sessions with decision, assignment task, event wait, and event raise tasks.
- Extracted the needed data from server and into HDFS and bulk loaded the cleaned data into HBase.
- Handled time-series data in HBase, storing it so that time-based analytics and range queries retrieve quickly; see the sketch after this list.
- Involved in Agile methodologies, daily scrum meetings and sprint planning.
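A hedged sketch of the HBase time-series pattern above, using the third-party happybase Python client (an assumption; the original does not name a client). The row-key scheme of entity id plus zero-padded timestamp, the table name and the column family are all illustrative.

```python
import happybase  # HBase Thrift client; assumes an HBase Thrift server is running

connection = happybase.Connection("hbase-thrift-host")   # hypothetical host
table = connection.table("metrics")                      # hypothetical table with family "d"

# Row key = entity id + zero-padded epoch seconds, so rows for one entity sort
# chronologically and a time range maps to a contiguous range of keys.
def row_key(entity_id, epoch_seconds):
    return ("%s#%012d" % (entity_id, epoch_seconds)).encode("utf-8")

# Write one reading.
table.put(row_key("meter-42", 1461110400),
          {b"d:value": b"17.5", b"d:unit": b"kwh"})

# Read one entity's readings for a time window with a bounded scan.
start = row_key("meter-42", 1461110400)
stop = row_key("meter-42", 1461196800)
for key, data in table.scan(row_start=start, row_stop=stop):
    print(key, data)
```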
Environment: Hadoop, HDFS, Hive, Pig, Flume, Sqoop, Spark, MapReduce, Cloudera, Avro, Snappy, Zookeeper, CDH, NoSQL, HBase, Java (JDK 1.6), Eclipse, Python, MySQL.
Java/J2EE Developer
Confidential
Responsibilities:
- Analyzed project requirements for this product and involved in designing using UML infrastructure.
- Interacting with the system analysts & business users for design & requirement clarification.
- Extensive use of HTML5 with Angular JS, JSTL, JSP, jQuery and Bootstrap for the presentation layer along with JavaScript for client-side validation.
- Handled the Java multithreading parts in back-end components.
- Developed HTML reports for various modules as per the requirement.
- Developed web services using SOAP, SOA, WSDL and Spring MVC, and developed DTDs and XSD schemas for XML (parsing, processing and design) to communicate with the Active Directory application using a RESTful API.
- Created multiple RESTful web services using the Jersey 2 framework.
- Used AquaLogic BPM (Business Process Management) for workflow management.
- Developed the application using the NoSQL database MongoDB for storing data on the server.
- Developed the complete business tier with stateful session beans and CMP entity beans using EJB 2.0.
- Developed integration services using SOA, Web Services, SOAP, and WSDL.
- Designed, developed and maintained the data layer using the Hibernate ORM framework.
- Used the Spring framework's JMS support for writing to JMS queues and HibernateDaoSupport for interfacing with the database, and integrated Spring with JSF.
- Involved in writing Unit test cases using JUnit and involved in integration testing.
Environment: Java, J2EE, HTML, CSS, JSP, JavaScript, Bootstrap, AngularJS, Servlets, JDBC, EJB, Java Beans, Hibernate, Spring MVC, Restful, JMS, MQ Series, AJAX, WebSphere Application Server, SOAP, XML, MongoDB, JUnit, Rational Suite, CVS Repository.
Oracle PL/SQL Developer
Confidential
Responsibilities:
- Developed Oracle PL/SQL stored procedures, functions, packages and SQL scripts.
- Worked with users and application developers to identify business needs and provide solutions.
- Created Database Objects, such as Tables, Indexes, Views, and Constraints.
- Enforced database integrity using primary keys and foreign keys.
- Tuned pre-existing PL/SQL programs for better performance.
- Created many complex SQL queries and used them in Oracle Reports to generate reports.
- Implemented data validations using Database Triggers.
- Used import/export utilities such as UTL_FILE for data transfer between tables and flat files.
- Performed SQL tuning using Explain Plan.
- Provided support in the implementation of the project.
- Worked with built-in Oracle standard packages such as DBMS_SQL, DBMS_JOB and DBMS_OUTPUT.
- Created and implemented report modules in the database from the client system using Oracle Reports per business requirements.
- Used dynamic PL/SQL procedures during package creation.
Environment: Oracle 9i, Oracle Reports, SQL, PL/SQL, SQL*Plus, SQL*Loader, Windows XP.