Senior Hadoop Developer Resume
Austin, TX
SUMMARY
- 7 years of professional experience working with data, including 5+ years of hands-on experience in the analysis, design, development, and maintenance of Hadoop applications.
- Extensive development experience with the Hadoop ecosystem, covering MapReduce, HDFS, YARN, Hive, Impala, Pig, HBase, Spark, Sqoop, Oozie, and Cloudera.
- Full-scale knowledge of Hadoop ecosystem components such as HDFS, JobTracker, NameNode, and DataNode.
- In-depth understanding of the strategy and practical implementation of AWS cloud technologies, including IAM, EC2, EMR, SNS, RDS, Redshift, Athena, DynamoDB, Lambda, CloudWatch, Auto Scaling, S3, and Route 53.
- Strong experience in analyzing data using HiveQL, Spark SQL, HBase, and custom MapReduce programs.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience in writing shell scripts to dump shared data from MySQL servers to HDFS.
- Working knowledge of Python and Scala for Spark development.
- Experience with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Handled different file formats like Parquet, Avro, and RC files using different SerDes in Hive.
- Performed Data Ingestion from multiple disparate sources and systems using Kafka.
- Experience in managing and monitoring Hadoop cluster using Cloudera Manager.
- Experience in writing custom UDFs that extend Hive and Pig core functionality.
- Good knowledge of NoSQL databases: Cassandra, MongoDB, and HBase.
- Worked on HBase to load and retrieve data for real-time processing using REST APIs.
- Experience in developing applications using Waterfall and Agile (XP and Scrum) methodologies.
- Experienced with build tools Ant and Maven and continuous integration tools like Jenkins.
- Proficiency in working with databases like Oracle, MySQL.
- Extensive experience in writing stored procedures and functions using SQL and PL/SQL.
- Experience in Amazon AWS cloud administration; actively involved in building highly available, scalable, cost-effective, and fault-tolerant systems using multiple AWS services.
- Able to assess business rules, collaborate with stakeholders, and perform source-to-target data mapping, design, and review.
- Strong problem-solving, organizing, team management, communication, and planning skills, with the ability to work in a team environment. Able to write clear, well-documented, well-commented, and efficient code to requirements.
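The custom MapReduce programs mentioned above follow the standard mapper/reducer pattern. A minimal, pure-Python sketch of that pattern (a hypothetical word-count example, not code from the actual projects; in a real Hadoop Streaming job these functions would read from stdin and write to stdout):

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit (word, 1) pairs for each word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def reducer(pairs):
    """Reduce phase: sum the counts per key, as happens after the shuffle."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

def word_count(lines):
    """Run the map and reduce phases locally over an iterable of lines."""
    pairs = []
    for line in lines:
        pairs.extend(mapper(line))
    return reducer(pairs)
```

On a cluster, the framework handles the shuffle between the two phases; this local driver just concatenates the mapper outputs before reducing.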
TECHNICAL SKILLS
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Sqoop, Flume, NoSQL (HBase, Cassandra), Spark, Kafka, ZooKeeper, Oozie, Hue, Cloudera Manager, Amazon AWS, Hortonworks clusters
AWS Ecosystems: S3, EC2, EMR, Redshift, Athena, Glue, Lambda, SNS, CloudWatch
Java/J2EE & Web Technologies: J2EE, JMS, JSF, JDBC, Servlets, HTML, CSS, XML, XHTML, AJAX, JavaScript
Languages: C, C++, Core Java, Shell Scripting, SQL, PL/SQL, Python, Pig Latin
Operating systems: Windows, Linux and Unix
DBMS / RDBMS & NoSQL: Oracle, Talend ETL, Microsoft SQL Server 2012/2008, MySQL, DB2, Teradata, MongoDB, Cassandra, HBase
IDE and Build Tools: Eclipse, NetBeans, MS Visual Studio, Ant, Maven, JIRA, Confluence
Version Control: Git, SVN, CVS
Web Services: RESTful, SOAP
Web Servers: WebLogic, WebSphere, Apache Tomcat
PROFESSIONAL EXPERIENCE
Senior Hadoop Developer
Confidential, Austin, TX
Responsibilities:
- Created and maintained Technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
- Developed data pipelines using Spark, Hive, Pig, Python, Impala, and HBase to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis.
- Monitored and reviewed Hadoop log files and wrote queries to analyze them.
- Conducted POCs and mocks with the client to understand business requirements; attended defect triage meetings with the UAT and QA teams to ensure defects were resolved in a timely manner.
- Worked with Kafka on a proof of concept for carrying out log processing on a distributed system.
- Analyzed the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MRv2, Hive, Sqoop, and Pig Latin.
- Loaded data from different sources (Teradata and DB2) into HDFS using Sqoop and Flume, and loaded it into partitioned Hive tables.
- Developed HiveQL queries, mappings, tables, and external tables in Hive for analysis across different banners; worked on partitioning, optimization, compilation, and execution.
- Wrote complex queries to get data into HBase; responsible for executing Hive queries using the Hive command line and Hue.
- Designed and implemented proprietary data solutions by correlating data from SQL and NoSQL databases using Kafka.
- Used Pig as ETL tool to do transformations and some pre-aggregations before storing the analyzed data into HDFS.
- Developed PySpark code to save data in Avro and Parquet formats and build Hive tables on top of them.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Developed bash scripts to fetch TLog files from the FTP server and process them for loading into Hive tables; scheduled all bash scripts using the Resource Manager scheduler.
- Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
- Developed Spark programs using Scala; involved in creating Spark SQL queries and developed Oozie workflows for Spark jobs.
Environment: HDFS, Hadoop 2.x YARN, Teradata, NoSQL, PySpark, MapReduce, Pig, Hive, Sqoop, Spark 2.3, Scala, Oozie, Java, Python, MongoDB, Shell/Bash scripting.
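Converting Hive/SQL queries into Spark transformations, as described above, typically maps a WHERE clause onto `filter()` and a GROUP BY aggregate onto `map()` plus `reduceByKey()`. A pure-Python analogue of that shape (the `customer`/`amount` column names are hypothetical, since the actual schemas are not shown):

```python
from itertools import groupby
from operator import itemgetter

# SQL:   SELECT customer, SUM(amount) FROM txns WHERE amount > 0 GROUP BY customer
# Spark: rdd.filter(lambda r: r.amount > 0)
#           .map(lambda r: (r.customer, r.amount))
#           .reduceByKey(lambda a, b: a + b)
def sum_by_customer(rows):
    """Local stand-in for the filter/map/reduceByKey pipeline above."""
    # filter() analogue: keep positive amounts only
    pairs = [(r["customer"], r["amount"]) for r in rows if r["amount"] > 0]
    # reduceByKey() analogue: group on the key, then sum each group
    pairs.sort(key=itemgetter(0))
    return {k: sum(v for _, v in g) for k, g in groupby(pairs, key=itemgetter(0))}
```

The sort-then-group step here plays the role of Spark's shuffle; on a cluster, `reduceByKey` also pre-aggregates on each partition before shuffling.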
Hadoop Developer
Confidential, Irving, TX
Responsibilities:
- Worked on Sqoop jobs with incremental loads to populate HAWQ external tables and move data into internal tables.
- Built data import and export jobs to copy data to and from HDFS using Sqoop.
- Worked with the Spark Core, Spark Streaming, and Spark SQL modules.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Worked with Pig, Sqoop, and the NoSQL database HBase for analyzing the Hadoop cluster as well as big data.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Created Hive tables and worked on them for data analysis to meet business requirements.
- Developed a data pipeline using Spark and Hive to ingest, transform, and analyze data.
- Worked on data cleansing to populate Hive external and internal tables.
- Experience in using Sequence files, RCFile, Avro, and HAR file formats.
- Hands-on experience writing Pig scripts to tokenize sensitive information using Protegrity.
- Supported and built Data Science team projects on Hadoop.
- Used Flume to dump application server logs into HDFS.
- Automated backups with shell scripts on Linux to transfer data to an S3 bucket.
- Experience working with the NoSQL database HBase for real-time data analytics.
- Hands-on experience working as a production support engineer.
- Worked on RCA (root cause analysis) documentation.
- Automated incremental loads to load data into the production cluster.
- Ingested data from various file systems into HDFS using Unix command-line utilities.
- Hands-on experience moving data from one cluster to another using DistCp.
- Experience reviewing Hadoop log files to detect failures.
- Worked on EPIC user stories and delivered on time.
- Worked on data ingestion for a malicious-intent model; automated incremental jobs to run on a daily basis.
- Hands-on experience with Agile and Scrum methodologies.
Environment: MapReduce, HDFS, Hive, HBase, Sqoop, Pig, Flume, Oracle 11/10g, DB2, Teradata, MySQL, HAWQ, PL/SQL, Java, Linux, Shell Scripting, SQL Developer, HP ALM.
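The incremental loads mentioned above follow Sqoop's `--incremental append` semantics: select only rows whose check column exceeds the last recorded value, then advance that value. A small sketch of the bookkeeping (row shapes and the `id` column are hypothetical):

```python
def incremental_append(rows, check_column, last_value):
    """Mimic Sqoop's --incremental append bookkeeping on a
    monotonically increasing check column.

    Returns (new_rows, new_last_value); new_last_value is what a
    saved Sqoop job would store for the next run's --last-value."""
    new_rows = [r for r in rows if r[check_column] > last_value]
    new_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_last
```

A saved Sqoop job persists `new_last` in its metastore, so each daily run picks up only rows inserted since the previous run.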
Hadoop Developer
Confidential, Charlotte, NC
Responsibilities:
- Developed optimal strategies for distributing the web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop.
- Hands-on experience loading data from the UNIX file system and Teradata into HDFS.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website.
- Installed, configured, and operated the Apache stack (Hive, HBase, Pig, Sqoop, ZooKeeper, Oozie, Flume, and Mahout) on the Hadoop cluster.
- Created a high-level design for the data ingestion and data extraction modules; enhanced a Hadoop MapReduce job to join the incoming slices of data and pick only the fields needed for further processing.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Worked with AWS cloud services (VPC, EC2, S3, EMR, DynamoDB, SNS, SQS).
- Part of the team that set up the infrastructure in AWS.
- Used Amazon S3 as a storage mechanism and wrote Python scripts to dump data into S3.
- Tested raw data and executed performance scripts.
- Worked with NoSQL database HBase to create tables and store data.
- Developed industry-specific UDFs (user-defined functions).
- Used Flume to collect, aggregate, and store web log data from various sources (web servers, mobile, and network devices) and pushed it to HDFS.
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java map reduce, Hive, Pig, and Sqoop.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
Environment: Cloudera, MapReduce, Hive, Solr, Pig, HDFS, HBase, Flume, Sqoop, Zookeeper, Python, Flat files, AWS, Unix/Linux.
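The unique-visitors-per-day analysis described above is essentially `SELECT dt, COUNT(DISTINCT visitor_id) FROM weblogs GROUP BY dt` in HiveQL. A pure-Python equivalent of that aggregation (the `(date, visitor_id)` record shape is a simplifying assumption, not the actual log schema):

```python
from collections import defaultdict

# HiveQL: SELECT dt, COUNT(DISTINCT visitor_id) FROM weblogs GROUP BY dt
def unique_visitors_per_day(log_records):
    """Count distinct visitor IDs per day.

    log_records: iterable of (date_string, visitor_id) tuples."""
    visitors = defaultdict(set)
    for dt, visitor_id in log_records:
        visitors[dt].add(visitor_id)   # a set deduplicates repeat visits
    return {dt: len(ids) for dt, ids in visitors.items()}
```

In Hive, `COUNT(DISTINCT ...)` forces the same deduplication per group; partitioning the table by date lets the query prune to the days of interest.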
Hadoop Developer
Confidential
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
- Worked on moving all log files generated from various sources to HDFS for further processing.
- Developed workflows using custom MapReduce, Pig, Hive, and Sqoop.
- Developed a predictive analytics product using Apache Spark and SQL/HiveQL.
- Wrote Spark programs to load, parse, refine, and store sensor data into Hadoop, and to process, analyze, and aggregate data for visualizations.
- Created various views for HBase tables and utilized the performance of Hive on top of HBase.
- Developed an Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
- Designed and developed Apache Storm topologies for inbound and outbound data, performing real-time ETL to find the latest trends and keywords.
- Developed MapReduce programs for parsing data and loading it into HDFS.
- Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
- Wrote a Hive UDF to sort struct fields and return a complex data type.
- Responsible for loading data from the UNIX file system to HDFS.
- Developed ETL applications using Hive, Spark, Impala, and Sqoop, automated with Oozie.
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MRUnit testing library.
- Designed and developed a distributed processing system to process binary files in parallel and crunch the analysis metrics into a data warehousing platform for reporting.
- Developed workflows in Control-M to automate loading data into HDFS and pre-processing it with Pig.
- Provided cluster coordination services through ZooKeeper.
- Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
- Modeled Hive partitions extensively for data separation and faster data processing, and followed Pig and Hive best practices for tuning.
Environment: HiveQL, MySQL, HBase, HDFS, Hive, Eclipse (Kepler), Hadoop, Oracle 11g, PL/SQL, SQL*Plus, Toad 9.6, Flume, Pig, Oozie, Sqoop, AWS, Spark, Unix, Tableau, Cosmos.
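The Hive partitioning mentioned above works because Hive lays each partition out as its own `key=value` directory under the table root, letting queries prune whole directories. A small sketch of that path layout (the `/warehouse/sales` root and `dt`/`region` keys are hypothetical):

```python
def hive_partition_path(table_root, **partition_keys):
    """Build the directory path Hive uses for one partition of a
    partitioned table, e.g. /warehouse/sales/dt=2015-06-01/region=us.

    Keyword-argument order must match the table's PARTITIONED BY order."""
    parts = "/".join(f"{key}={value}" for key, value in partition_keys.items())
    return f"{table_root}/{parts}"
```

A query filtering on `dt` and `region` then only scans the matching directories instead of the full table, which is where the "faster data processing" comes from.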
Hadoop Developer
Confidential
Responsibilities:
- Experienced with Python frameworks like webapp2 and Flask.
- Experienced in WAMP (Windows, Apache, MySQL, and PHP/Python) stacks and MVC Struts.
- Developed mobile, cross-browser web applications using AngularJS and JavaScript APIs.
- Successfully migrated the Django database from SQLite to MySQL to PostgreSQL with complete data integrity.
- Used Celery with RabbitMQ and Flask to create a distributed worker framework.
- Created an automation test framework using Selenium.
- Responsible for the design and development of web pages using PHP, HTML, Joomla, and CSS, including Ajax controls and XML.
- Developed an intranet portal for managing Amazon EC2 servers using Tornado and MongoDB.
- Expertise in developing web applications implementing Model-View-Controller (MVC) architectures using full-stack frameworks such as TurboGears.
- Implemented monitoring and established best practices around using Elasticsearch.
- Strong experience building large, responsive REST web applications; experienced with the CherryPy framework and Python.
- Used a test-driven development (TDD) approach for developing the services required by the application.
Environment: Python 2.7/3.0, PL/SQL, C++, Redshift, XML, Agile (Scrum), PyUnit, MySQL, Apache, CSS, DHTML, HTML, JavaScript, Shell Scripts, Git, Linux, Unix, and Windows.
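In the TDD approach mentioned above, the test is written first and the service code is written to satisfy it. A minimal sketch in the PyUnit (`unittest`) style listed in the environment (the `slugify` helper is hypothetical, not a function from the actual project):

```python
import unittest

def slugify(title):
    """Lowercase a title and join its words with hyphens for use in URLs.
    Written after the tests below, test-first."""
    return "-".join(title.lower().split())

class TestSlugify(unittest.TestCase):
    # These cases are specified before implementing slugify().
    def test_basic(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_whitespace(self):
        self.assertEqual(slugify("  A   B "), "a-b")

if __name__ == "__main__":
    unittest.main()
```

Running the suite first (red), then implementing until it passes (green), keeps each service change covered by a failing test before any code exists.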