Sr. Hadoop Developer Resume
Houston, TX
SUMMARY
- 8+ years of extensive experience in Information Technology, spanning Hadoop ecosystem administration, enterprise application development, and the analysis, planning, design and implementation of enterprise data solutions.
- Worked with multiple Hadoop distributions (Cloudera, MapR).
- Hands-on experience developing applications using Hadoop ecosystem components such as MapReduce, Spark, Hive, Pig, Flume, Sqoop and HBase.
- Excellent understanding of Hadoop architecture and core components such as Name Node, Data Node, Resource Manager, Node Manager and other distributed components in the Hadoop platform.
- Hands-on experience writing ad-hoc queries to move data from HDFS to Hive and analyzing the data using HiveQL.
- Worked extensively in importing and exporting data using Sqoop from Relational Database Systems (RDBMS) to HDFS.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Performed aggregations and analysis on large sets of log data collected using custom-built input adapters and Sqoop.
- Involved in converting Hive/SQL queries into Spark transformations using Python and Scala.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF) and User Defined Aggregating Functions (UDAF).
- Created Sqoop jobs with incremental load to populate Hive External tables.
- Good understanding of partitioning, bucketing, join optimizations and query optimizations in Hive.
- Wrote custom UDFs in Hive and Pig to meet specific business requirements.
- Experience in the successful implementation of an ETL solution between OLTP and OLAP databases in support of Decision Support Systems/Business Intelligence, with expertise in all phases of the SDLC.
- Familiarity with the Hadoop information architecture, design of data ingestion pipeline, data mining and modeling, advanced data processing and machine learning. Experience in optimizing ETL workflows.
- Well versed in configuring Hadoop clusters using major Hadoop distributions such as MapR and Cloudera.
- Imported and exported data between relational databases and HDFS, Hive and HBase using Sqoop.
- Worked on various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Experienced with different file formats like CSV, Text files, Sequence files, ORC, Parquet, XML, JSON and Avro files.
- Experience in scripting languages such as Bash and Korn shell.
- Expertise in doing Unit Testing, Integration Testing, System Testing and Data Validation for Developed Informatica Mappings.
- Experience in Java, J2EE, XML, Struts, jQuery, AJAX, Spring, Hibernate, Oracle SOA Suite 11g (BPEL, BPM), Unix and Oracle PL/SQL.
- Expertise in SOAP, WSDL and Web Services, RESTful Web Services, XSD, XML, XSLT and XPATH.
- Experience in messaging using Oracle JMS and ORACLE MQ.
- Experience in Oracle Project Management product Primavera P6.
- In-depth understanding of J2EE architecture and implementation.
- Implemented MVC and Design Patterns in web architecture.
- Experience in Agile (Scrum) Methodologies for software development and management.
- Excellent analytical and problem-solving skills; motivated team player with strong interpersonal skills.
- Experience working in an onsite/offshore model.
TECHNICAL SKILLS
RDBMS/OLTP: SQL Server 2008/2008 R2/2012/2014/2016, Oracle 9i/10g/11g, MS Access
Data Warehouse/OLAP: SQL Server Analysis Services (SSAS), IBM Cognos, Tableau Server
Reporting Tools: SQL Server Reporting Services (SSRS), Tableau (8.x/9.x/10.x), IBM Cognos, Power BI, QlikView (1.x/2.x/3.x), Domo
ETL: SQL Server Integration Services (SSIS), Data Transformation Services, SQL Server Data Tools (SSDT), Alteryx
Project Management Tools: JIRA, SharePoint, GitHub, MS Project Planner, TFS
Platform: Visual Studio 2005/2008/2010/2012
Programming Language: SQL, T-SQL, PL/SQL, R, Python (2.x/3.x)
PROFESSIONAL EXPERIENCE
Confidential - Houston, TX
Sr. Hadoop Developer
Responsibilities:
- Worked on the Cloudera distribution of Hadoop.
- Implemented a data interface to retrieve customer information through a REST API, pre-processed the data using MapReduce and stored it in HDFS.
- Used Talend and Sqoop to import and export data between RDBMS and HDFS.
- Configured Oozie workflows to automate data flow, pre-processing and cleaning tasks using Hadoop actions.
- Implemented GenericWritable to incorporate multiple data sources into the reducer for recommendation-based reports using MapReduce programs.
- Implemented MapReduce programs to find the top failure locations of ATMs using different tracking devices.
- Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
- Implemented optimized joins to analyze different data sets using MapReduce programs.
- Optimized the shuffle and sort phase of MapReduce jobs.
- Wrote business logic as Hive UDFs to perform ad-hoc queries on structured data.
- Experience with Hive DDL and Hive Query Language (HQL).
- Worked on dashboards that internally use Hive queries to perform analytics on structured, Avro and JSON data.
- Experienced in transferring data from different data sources into HDFS systems using Kafka producers, consumers and Kafka brokers.
- Involved in importing real-time data to Hadoop using Kafka and implemented the Oozie job for daily
- Integrated Apache Storm with Kafka to perform web analytics and to load clickstream data from Kafka into HDFS.
- Experienced in handling Avro and JSON data in Hive using Hive SerDes.
- Good knowledge and understanding of the REST architectural style and its application to well-performing websites for global usage.
- Wrote MapReduce programs and Pig scripts that specify the conditions for separating fraudulent claims.
- Performed performance tuning and applied indexing strategies using MongoDB utilities such as mongostat and mongotop.
- Migrated MongoDB database systems from non-SSL to SSL authentication using certificates.
- Migrated ETL operations into the Hadoop system using Pig Latin scripts.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Experience with Hive queries for data analysis to meet the business requirements.
- Worked on test scripts to support test driven development and continuous integration.
- Responsible for managing data coming from different sources.
- Experience in managing and reviewing Hadoop log files.
- Managed and monitored jobs on a Hadoop cluster using Ganglia.
- Used Pig as ETL tool to do transformations, event joins, filter bot traffic and some pre-aggregations before storing the data onto HDFS.
- Experience writing build scripts with Maven and using continuous integration systems such as Jenkins.
Environment: Hadoop, Hive, MapReduce, HDFS, Pig, Sqoop, Maven, Jenkins, Java 6 (JDK 1.6), Eclipse, Oracle 10g, PL/SQL, SQL*Plus, Linux
Confidential - Stamford, CT
Sr. Hadoop Developer
Responsibilities:
- Used Sqoop to import and export data to and from HDFS and Hive.
- Wrote Pig programs to filter streaming data loaded into HDFS using Flume.
- Joined and pre-processed data from different data sets using Pig join operations.
- Moved large amounts of data into HBase using MapReduce integration.
- Wrote MapReduce programs to clean and aggregate data.
- Developed HBase data model on top of HDFS data to perform real time analytics using Java API.
- Worked on different kinds of custom filters and used pre-defined filters on HBase data through the API.
- Strong understanding of Hadoop ecosystem components such as HDFS, MapReduce, HBase, ZooKeeper, Pig, Hadoop Streaming, Sqoop, Oozie and Hive.
- Developed counters on HBase data to count total records on different tables.
- Experienced in handling Avro data files by passing schema into HDFS using Avro tools and MapReduce.
- Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
- Implemented secondary sorting to sort reducer output globally in MapReduce.
- Implemented a data pipeline by chaining multiple mappers using ChainMapper.
- Created Hive dynamic partitions to load time-series data.
- Experienced in handling different types of joins in Hive, such as map joins, bucket map joins and sorted bucket map joins.
- Created tables, partitions and buckets, and performed analytics using Hive ad-hoc queries.
- Imported and exported data between HDFS/Hive and relational databases, including Teradata, using Sqoop.
- Handled continuous streaming data from different sources using Flume, with HDFS as the destination.
- Integrated Spring schedulers with the Oozie client as beans to handle cron jobs.
- Experience with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, RDBMS/DB, Flat files, MySQL, CSV, Avro data files.
Confidential - Irving, TX
Hadoop Developer
Responsibilities:
- Hands-on experience developing applications using Hadoop ecosystem components such as MapReduce, Hive, Drill, Spark, Pig, Flume, Sqoop and HBase.
- Assessed business rules, worked on source to target data mappings and collaborated with the stakeholders.
- Handled structured and unstructured data and applied ETL processes.
- Wrote MapReduce programs for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other compressed file formats.
- Expertise in data migration from various databases to Hadoop HDFS and Hive using Sqoop.
- Worked with Hive's data warehousing infrastructure to analyze large structured datasets.
- Experienced in creating Hive schema, external tables and managing views.
- Responsible for data loading, including creating Hive tables and partitions based on requirements.
- Executed Map Reduce programs to cleanse data in HDFS gathered from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Wrote Spark applications in Scala using the DataFrame and Spark SQL APIs.
- Strong knowledge of distributed systems architecture and parallel processing; in-depth understanding of the MapReduce programming paradigm.
- Imported data into HDFS using Sqoop, including incremental loads.
- Designed and developed MapReduce jobs to process logs and feed the data warehouse, load Hive tables for analytics and store the daily data feed on HDFS for use by other teams.
- Developed automated shell scripts responsible for data flow, monitoring and status reporting.
- Took on-call responsibilities and responded to issues with Hadoop jobs or clusters as needed.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Spark, Sqoop, HBase, Impala, Cloudera, Tabula, Eclipse, Scala, UNIX Shell Scripts, Java, RestClient, Firebug, Cassandra, Amazon Web Services with Cloud, Business Intelligence, HTML, XML, XMLSpy, PuTTY.
Confidential - Wilmington, DE
MS SQL/Business Intelligence Developer
Responsibilities:
- Involved in gathering user requirements from business users and IT managers and created documentation for the project.
- Wrote complex SQL queries involving recurring subqueries, multiple joins and unions.
- Designed and created Data Warehouse solutions using SSIS packages with Lookup, Merge and Union All transformations.
- Created SSIS packages that involved dealing with different source formats (Text files, XML, Database Tables).
- Applied Agile project management methods throughout the project.
- Tuned the performance of existing SQL queries using SQL Server Profiler and Database Engine Tuning Advisor.
- Developed dashboards in Tabular Modes to help management identify critical KPIs and facilitate strategic planning in the organization.
- Designed SSRS dashboards with geo maps, heat maps, and drill-through and drill-down report functionality.
- Deployed the reporting solutions on production servers and verified data quality and accuracy before data refreshes on dashboards.
- Created corporate standard variables in the metadata to ensure standard measures and best practices are followed.
- Made dashboards more effective by including guided navigation links and drill-downs, and set dashboard visibility for employees based on their responsibilities.
- Created groups in the repository, added users to them and granted privileges explicitly and through group inheritance.
- Designed and developed various reports (pivot, chart, tabular, drill-down and narrative reports) using global and local filters.
- Created calculated members for complex calculations in reports using Business Intelligence.
- Involved in the development, testing and deployment phases.
Environment: SQL Server 2012, SQL Server Management Studio, SQL BI Suite (SSIS, SSRS), XML, MS Project, MS Access & Windows Server 2008 R2.
Confidential - Irving, TX
Hadoop Developer
Responsibilities:
- Optimized Hive queries using various file formats such as JSON, Avro, ORC and Parquet.
- Used Spark RDD transformations to implement business analysis and applied actions on top of the transformations.
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on formats such as text, Avro and Parquet files.
- Used Spark Streaming to consume ongoing data from Kafka and store the stream in HDFS.
- Developed Pig Latin scripts and Pig command-line transformations for data joins, custom processing of MapReduce outputs and loading tables from Hadoop to various clusters.
- Developed Talend jobs for data ingestion, enrichment and provisioning.
- Migrated HiveQL queries to Impala to reduce query response time.
- Loaded data from the edge node into HDFS using shell scripts.
- Used DataFrames for data transformation.
- Integrated Kerberos with the Hadoop cluster to strengthen security and prevent unauthorized access.
- Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small data sets.
- Created Hive tables, dynamic partitions and buckets for sampling, and queried them using HQL.
- Worked on Spark using Scala and Spark SQL for faster testing and processing of data.
- Built a proof of concept using Kafka and HBase for processing streaming data.
- Performed advanced processing such as text analytics using the in-memory computing capabilities of Apache Spark, with code written in Scala.
- Worked with BI teams to generate reports and design ETL workflows on Tableau. Loaded data from various sources into HDFS and built reports using Tableau.
- Wrote Python scripts to analyze customer data.
- Implemented Talend jobs to load data from different sources and integrated with Kafka.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
Environment: MapReduce, HDFS, Spark, Scala, Python, Kafka, Hive, Pig, Spark Streaming, Talend, HBase, Tableau, Maven, Jenkins, UNIX, MRUnit, Git.
