Senior Hadoop Developer Resume
Columbus, GA
SUMMARY
- Highly skilled IT professional with 7+ years of experience, including 4+ years of hands-on expertise in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Spark, Scala, Hive, Sqoop, Flume, HBase, Cassandra, MongoDB, Akka framework) for implementation, maintenance, ETL, and Big Data analysis operations
- Well versed in installing, configuring, supporting, and managing Hadoop clusters and the underlying Big Data infrastructure
- Exposure to different Big Data distributions such as Cloudera, Hortonworks, and Apache
- Experience in Apache Spark, Spark Streaming, Spark SQL, and NoSQL databases such as Cassandra and HBase
- Extensive experience in data processing and analysis using HiveQL, Pig Latin, custom MapReduce programs in Java, and Python scripts with Spark and Spark SQL
- Experience in importing and exporting data between databases such as MySQL, Oracle, and Teradata and HDFS using Sqoop
- Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this list)
- Experience in managing and reviewing Hadoop Log files
- Experience with the Oozie workflow engine, running workflow jobs with Hadoop MapReduce and Pig actions
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala
- Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions and data warehouse tools for reporting and data analysis
- Exposure to migrations from data warehouses to the Hadoop ecosystem
- Experience in NoSQL databases such as HBase, Cassandra, and MongoDB.
- Worked extensively with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems (RDBMS) such as Teradata
- Skilled in creating workflows using Oozie for Autosys jobs
- Hands on experience with message brokers such as Apache Kafka
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting
- Well acquainted with the software development life cycle (SDLC), with experience in Agile and Waterfall methodologies.
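Illustrative sketch for the partitioned managed/external Hive table design noted above. This is a minimal example only: the table, column, and path names are hypothetical, and it assumes Spark SQL with Hive support as the submission vehicle.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: external vs. managed partitioned Hive tables via Spark SQL.
// All table, column, and path names are hypothetical.
object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table: dropping the table leaves the files at the HDFS location
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        |  customer_id BIGINT,
        |  amount      DOUBLE
        |)
        |PARTITIONED BY (load_dt STRING)
        |STORED AS ORC
        |LOCATION '/data/raw/sales'""".stripMargin)

    // Register a new partition once a day's files land under the location
    spark.sql("ALTER TABLE sales_ext ADD IF NOT EXISTS PARTITION (load_dt='2017-01-01')")

    // Managed table: Hive owns the data; partition pruning keeps scans narrow
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales_managed (
        |  customer_id BIGINT,
        |  amount      DOUBLE
        |)
        |PARTITIONED BY (load_dt STRING)
        |STORED AS ORC""".stripMargin)

    spark.stop()
  }
}
```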
TECHNICAL SKILLS
Hadoop Core Services: HDFS, Map Reduce, Spark, YARN
Hadoop Distributions: Cloudera CDH, Hortonworks HDP, MapR, Apache
Hadoop Data Services: Hive, Pig, Sqoop, Flume, Impala, Kafka
Hadoop Operational Services: Zookeeper, Oozie
Monitoring Tools: Cloudera Manager, Ambari
Programming Languages: Java (J2EE), Python, Scala, SQL, Shell Scripting, C, PL/SQL
Operating Systems: Windows XP, Windows Server 2008, Linux, Unix
IDE Tools: Eclipse, IntelliJ, NetBeans
Application Servers: Red Hat JBoss, WebLogic, Apache Tomcat
Databases: MySQL, HBase, Cassandra, MongoDB, Oracle
Others: Git, Putty
PROFESSIONAL EXPERIENCE
Senior Hadoop Developer
Confidential, Columbus, GA
Responsibilities:
- Worked closely with business analysts to convert business requirements into technical requirements and prepared low- and high-level documentation
- Performed transformations using Hive and MapReduce; copied .log and Snappy files into HDFS from Greenplum using Flume and Kafka, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop
- Involved in preparing the S2TM document per the business requirements and worked with source-system SMEs to understand source data behavior
- Imported required tables from RDBMS into HDFS using Sqoop, and used Storm/Spark Streaming with Kafka to stream data in real time into HBase
- Wrote MapReduce jobs for text mining, worked with the predictive analysis team, and worked with Hadoop components such as HBase, Spark, YARN, Kafka, ZooKeeper, Pig, Hive, Sqoop, Oozie, Impala, and Flume
- Wrote Hive UDFs per requirements to handle different schemas and XML data
- Designed and developed MapReduce jobs to process data arriving in different file formats such as XML, CSV, and JSON
- Involved in Apache Spark testing
- Implemented ETL code to load data from multiple sources into HDFS using Pig Scripts
- Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files, and sequence files for log data
- Responsible for reviewing test cases in HP ALM
- Developed Spark applications using Scala for easy Hadoop transitions; hands-on experience writing Spark jobs and using the Spark Streaming API with Scala and Python
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data
- Installed Oozie workflow engine to run multiple Hive and Pig jobs
- Designed and developed user-defined functions (UDFs) for Hive, developed Pig UDFs to pre-process data for analysis, and wrote UDAFs for custom, data-specific processing
- Assisted in problem solving with Big Data technologies for integration of Hive with HBase and Sqoop with HBase
- Designed and developed the core data pipeline code in Java and Python, built on Kafka and Storm
- Good knowledge of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables for optimized performance
- Tuned performance by partitioning and bucketing Impala tables
- Fetched live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka (see the sketch after this list)
- Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper
- Worked on NoSQL databases including HBase and Cassandra
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka
- Created a POC (proof of concept) to store server log data in Cassandra to identify system alert metrics.
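Illustrative sketch of the Spark Streaming ingest into HBase described above. This is a minimal example only: the broker address, topic, HBase table, and column family are hypothetical, and it assumes the spark-streaming-kafka-0-10 integration and the HBase 1.x client API.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

// Minimal sketch: consume a Kafka topic with Spark Streaming and write each record to HBase.
object KafkaToHBaseStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-to-hbase-sketch"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",              // hypothetical broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "hbase-loader",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One HBase connection per partition, reused for every record in it
        val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = connection.getTable(TableName.valueOf("events"))
        records.foreach { record =>
          val rowKey = Option(record.key()).getOrElse(java.util.UUID.randomUUID().toString)
          val put = new Put(Bytes.toBytes(rowKey))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(record.value()))
          table.put(put)
        }
        table.close()
        connection.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```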
Environment: Map Reduce, HDFS, Hive, Pig, HBase, Python, SQL, Sqoop, Flume, Oozie, Impala, Scala, Spark, Apache Kafka, Zookeeper, J2EE, Linux Red Hat, HP-ALM, Eclipse, Cassandra, Talend, Informatica.
Senior Hadoop Developer
Confidential, Philadelphia, PA
Responsibilities:
- Involved in the complete Big Data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing the data in HDFS.
- Used Flume to handle streaming data and loaded the data into the Hadoop cluster.
- Created shell scripts to ingest files from the edge node into HDFS.
- Created MapReduce scripts for processing the data.
- Worked extensively with Hive, Sqoop, MapReduce, shell scripting, Pig, and Python.
- Used Sqoop to move structured data from MySQL into HDFS, Hive, Pig, and HBase.
- Wrote Hive join queries.
- Used Pig built-in functions to convert fixed-width files into delimited files.
- Worked with different Big Data file formats such as text, sequence, Avro, and Parquet, along with Snappy compression.
- Developed Hive and Scala queries on different data formats such as text and CSV files.
- Used Java to read Avro files.
- Developed HiveQL scripts to perform incremental loads (see the sketch after this list).
- Used Hive join queries to join multiple source-system tables and load the results into Elasticsearch.
- Analyzed the SQL scripts and designed the solution for implementation in Scala.
- Imported and exported Big Data between CDH and other data analytics ecosystems.
- Involved in data migration from one cluster to another.
- Analyzed HBase and compared it with other open-source NoSQL databases to determine which best suited the current requirements.
- Created Hive tables and partitioned tables, using Hive indexes and buckets to ease data analytics.
- Manipulated structured, semi-structured, and unstructured data in HBase.
- Used Oozie to schedule workflows that perform shell and Hive actions.
- Wrote business logic for defining DAT and CSV files for MapReduce.
- Wrote aggregation logic in various combinations to perform complex data analytics for business needs.
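Illustrative sketch of the Avro read and incremental Hive load described above. This is a minimal example only: it assumes Spark's built-in Avro source (Spark 2.4+; earlier releases need the external spark-avro package), an existing Hive table `orders` partitioned by `load_dt`, and hypothetical column and path names.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: read a daily Avro drop and append it as a new partition of a Hive table.
object AvroIncrementalLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-incremental-load-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Allow dynamic partitioning for the incremental insert
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Read the day's Avro files (path and schema are hypothetical)
    val incoming = spark.read.format("avro").load("/data/landing/orders/2017-06-01")
    incoming.createOrReplaceTempView("orders_incoming")

    // Append only the new data; assumes a Hive table 'orders' partitioned by load_dt exists
    spark.sql(
      """INSERT INTO TABLE orders PARTITION (load_dt)
        |SELECT order_id, customer_id, amount, load_dt
        |FROM orders_incoming""".stripMargin)

    spark.stop()
  }
}
```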
Environment: Hadoop, HDFS, Hive, Pig, Sqoop, Spark, Scala, MapReduce, Cloudera, NoSQL, HBase, Shell Scripting, Linux.
Hadoop Developer
Confidential, Pasadena, CA
Responsibilities:
- Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Developed Sqoop jobs to import and store massive volumes of data in HDFS and Hive.
- Designed and developed PIG data transformation scripts to work against unstructured data from various data points and created a base line.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Developed Java MapReduce programs to analyze sample log files stored in the cluster.
- Developed MapReduce programs for data analysis and data cleaning.
- Developed Pig UDFs (including UDAFs) to manipulate data according to business requirements and worked on developing custom Pig loaders.
- Implemented a data pipeline by chaining multiple mappers using ChainMapper.
- Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
- Migrated processes from Oracle to Hive to test easier data manipulation.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Implemented data ingestion systems by creating Kafka brokers, producers, consumers, and custom encoders (see the sketch after this list).
- Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Developed some utility helper classes to get data from HBase tables.
- Good experience in troubleshooting performance issues and tuning Hadoop cluster.
- Built components, modules, and plugins using AngularJS and Bootstrap.
- Created Java Interfaces and Abstract classes for different functionalities.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, sequence, XML, and JSON. Wrote multiple MapReduce programs for extracting, transforming, and aggregating data from multiple file formats including XML, JSON, CSV, and other compressed file formats.
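Illustrative sketch of a Kafka producer of the kind described above, using the standard kafka-clients producer API. This is a minimal example only: the broker address, topic name, key, and log path are hypothetical.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

// Minimal sketch: publish each line of a log file to a Kafka topic.
object LogEventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092")   // hypothetical broker
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.ACKS_CONFIG, "all")                         // wait for all in-sync replicas

    val producer = new KafkaProducer[String, String](props)
    try {
      // Each log line becomes one record on the 'app-logs' topic, keyed by host
      scala.io.Source.fromFile("/var/log/app/app.log").getLines().foreach { line =>
        producer.send(new ProducerRecord[String, String]("app-logs", "host-01", line))
      }
    } finally {
      producer.close()
    }
  }
}
```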
Environment: Apache Hadoop, HDFS, Cloudera Manager, Java, MapReduce, Eclipse, Hive, PIG, Sqoop, Oozie, SQL, Zookeeper, CDH3, Cassandra, Oracle, NoSQL and Unix/Linux, Kafka.
Big Data Engineer
Confidential, Las Vegas, NV
Responsibilities:
- Responsible for creating sample Datasets required for testing various Map Reduce applications from various sources.
- Developed Hive UDF to parse the staged raw data to get the item details from a specific store.
- Built re-usable Hive UDF libraries for business requirements, which enabled users to use these UDFs in Hive querying.
- Designed workflow by scheduling Hive processes for Log file data, which is streamed into HDFS using Flume.
- Developed SQL scripts to compare all the records for every field and table at each phase of the data movement process from the original source system to the final target.
- Implemented MapReduce jobs to process standard data in the Hadoop cluster.
- Involved in the performance enhancement by analyzing the workflows, joins, configuration parameters etc.
- Collaborating with business users/product owners/developers to contribute to the analysis of functional requirements.
- Design & Develop workflow using Oozie for business requirements, which includes automating the extraction of data from MySQL database into HDFS using Sqoop scripts.
- Worked on migration of Informatica Power center to Hadoop eco system. Assisted with the admin department with issues related to migration.
- Performed data analytics in Hive and then exported this metrics back to Oracle Database using Sqoop.
- Provided ad-hoc queries and data metrics to the Business Users using Hive, Pig.
- Developed Map Reduce Programs in Java for applying business rules on the data.
- Implemented Partitioning, Dynamic Partitions, Bucketing in Hive.
- Documented and managed failure/recovery steps for production issues.
- Involved in Minor and Major Release work activities.
- Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
- Imported data from various sources such as HDFS/HBase into Kafka.
- Wrote MapReduce jobs using Java and Pig Latin.
- Worked on NoSQL databases including HBase and Cassandra. Configured a SQL database to store the Hive metadata.
- Participated in the development/implementation of the Cloudera Impala Hadoop environment.
- Used Sqoop to load data from MySQL into HDFS on a regular basis.
- Used Pig as an ETL tool (similar to Informatica) to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Developed application components and APIs using Scala.
- Created ETL (Informatica) jobs to generate and distribute reports from a MySQL database.
- Loaded data from the Linux file system into HDFS using Sqoop and exported the analyzed data to relational databases using Sqoop, for visualization and report generation for the Business Intelligence (BI) team.
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Extracted data from Teradata and Oracle into Hive using Sqoop.
- Worked on Agile Methodology.
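Illustrative sketch of the Hive metrics computation and Sqoop export staging described above. This is a minimal example only: the table, column, and path names are hypothetical, and the Sqoop export command in the comment is indicative rather than taken from an actual job.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: aggregate a partitioned Hive table and stage the result for a Sqoop export.
object DailyMetricsExport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-metrics-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Aggregate the (hypothetical) partitioned fact table into daily metrics
    val metrics = spark.sql(
      """SELECT load_dt, COUNT(*) AS txn_count, SUM(amount) AS total_amount
        |FROM transactions
        |GROUP BY load_dt""".stripMargin)

    // Land the result as delimited files so a Sqoop export job can push it to Oracle, e.g.:
    //   sqoop export --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
    //     --table DAILY_METRICS --export-dir /data/export/daily_metrics
    metrics.coalesce(1)
      .write
      .mode("overwrite")
      .option("sep", ",")
      .csv("/data/export/daily_metrics")

    spark.stop()
  }
}
```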
Environment: Hadoop, HDFS, Pig, Zookeeper, Sqoop, HBase, Shell Scripting, Ubuntu, Linux Red Hat, Kafka, Cassandra.
Java Developer
Confidential
Responsibilities:
- Involved in requirements collection and analysis with the business team.
- Created the design documents with use case diagram, class diagrams, and sequence diagrams using Rational Rose.
- Implemented the MVC architecture using Apache Struts Framework.
- Implemented Action Classes and server-side validations for account activity, payment history and Transactions.
- Implemented views using Struts tags, JSTL and Expression Language.
- Implemented session beans to handle business logic for fund transfer, loan, credit card & fixed deposit modules.
- Worked with various Java design patterns, such as Singleton and Factory, at the business layer for effective object behavior.
- Used the Java Collections API to handle data objects between the business layer and the front end.
- Worked on resolving java thread synchronization issues in existing applications.
- Developed Unit test cases using JUnit.
- Used Clear Case for source code maintenance.
Environment: J2EE 1.4, Java 2, Tiles, JSP 1.2, JNDI, Java Mail, Clear Case, ANT, JavaScript, JMS.
Java Developer/J2EE
Confidential
Responsibilities:
- Designed use cases for different scenarios.
- Involved in acquiring requirements from the clients.
- Developed functional code and met expected requirements.
- Wrote product technical documentation as necessary.
- Designed the presentation layer in JSP (dynamic content) and HTML (static pages).
- Designed business logic in EJBs and business facades.
- Used Resource Manager to schedule jobs on the UNIX server.
- Wrote numerous session and message-driven beans for operation on JBoss and WebLogic.
- Used Apache Tomcat Server to deploy the application.
- Built the modules in a Linux environment with Ant scripts.
- Used MDBs (JMS) and MQSeries for account information exchange between the current and legacy systems.
- Created Connection pools and Data Sources.
- Involved in enhancements of database tables and procedures.
- Deployed this application, which uses the J2EE architecture model and the Struts framework, first on WebLogic, and helped migrate it to the JBoss application server.
- Participated in code reviews and optimization of code.
- Followed Change Control Process by utilizing CVS Version Manager.
Environment: J2EE, JSP, HTML, Struts Frame Work, EJB, JMS, Web Logic Server, JBoss Server, PL/SQL, CVS, MS PowerPoint, MS Outlook.