Senior Hadoop Developer Resume
Austin, TX
SUMMARY
- 7 years of professional experience working with data, including 5+ years of hands-on experience in the analysis, design, development, and maintenance of Hadoop applications.
- Extensive development experience with the Hadoop ecosystem, covering MapReduce, HDFS, YARN, Hive, Impala, Pig, HBase, Spark, Sqoop, Oozie, and Cloudera.
- Full-scale knowledge of Hadoop ecosystem components such as HDFS, JobTracker, NameNode, and DataNode.
- In-depth understanding of the strategy and practical implementation of AWS cloud technologies, including IAM, EC2, EMR, SNS, RDS, Redshift, Athena, DynamoDB, Lambda, CloudWatch, Auto Scaling, S3, and Route 53.
- Strong experience in analyzing data using HiveQL, Spark SQL, HBase, and custom MapReduce programs.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience in writing shell scripts to dump shared data from MySQL servers to HDFS.
- Working knowledge of Python and Scala for Spark development.
- Experience with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Handled different file formats like Parquet, Avro, and RC files using different SerDes in Hive.
- Performed Data Ingestion from multiple disparate sources and systems using Kafka.
- Experience in managing and monitoring Hadoop cluster using Cloudera Manager.
- Experience in writing custom UDFs that extend Hive and Pig core functionality.
- Good knowledge of NoSQL databases: Cassandra, MongoDB, and HBase.
- Worked on HBase to load and retrieve data for real-time processing using REST APIs.
- Experience in developing applications using Waterfall and Agile (XP and Scrum) methodologies.
- Experienced with build tools Ant and Maven and continuous integration tools like Jenkins.
- Proficiency in working with databases like Oracle, MySQL.
- Extensive experience in writing stored procedures and functions using SQL and PL/SQL.
- Experience in Amazon AWS cloud administration; actively involved in building highly available, scalable, cost-effective, and fault-tolerant systems using multiple AWS services.
- Able to assess business rules, collaborate with stakeholders, and perform source-to-target data mapping, design, and review.
- Strong problem-solving, organizing, team management, communication, and planning skills, with the ability to work in a team environment. Able to write clear, well-documented, well-commented, and efficient code to requirements.
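The custom MapReduce programs mentioned above follow the standard mapper/reducer pattern. A minimal, pure-Python sketch of that pattern (a hypothetical word-count example, not code from the actual projects; in a real Hadoop Streaming job these functions would read from stdin and write to stdout):

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit (word, 1) pairs for each word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def reducer(pairs):
    """Reduce phase: sum the counts per key, as happens after the shuffle."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

def word_count(lines):
    """Run the map and reduce phases locally over an iterable of lines."""
    pairs = []
    for line in lines:
        pairs.extend(mapper(line))
    return reducer(pairs)
```

On a cluster, the framework handles the shuffle between the two phases; this local driver just concatenates the mapper outputs before reducing.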
TECHNICAL SKILLS
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Sqoop, Flume, NoSQL (HBase, Cassandra), Spark, Kafka, ZooKeeper, Oozie, Hue, Cloudera Manager, Amazon AWS, Hortonworks clusters
AWS Ecosystems: S3, EC2, EMR, Redshift, Athena, Glue, Lambda, SNS, CloudWatch
Java/J2EE & Web Technologies: J2EE, JMS, JSF, JDBC, Servlets, HTML, CSS, XML, XHTML, AJAX, JavaScript
Languages: C, C++, Core Java, Shell Scripting, SQL, PL/SQL, Python, Pig Latin
Operating systems: Windows, Linux and Unix
DBMS / RDBMS & NoSQL: Oracle, Talend ETL, Microsoft SQL Server 2012/2008, MySQL, DB2, Teradata, MongoDB, Cassandra, HBase
IDE and Build Tools: Eclipse, NetBeans, MS Visual Studio, Ant, Maven, JIRA, Confluence
Version Control: Git, SVN, CVS
Web Services: RESTful, SOAP
Web Servers: WebLogic, WebSphere, Apache Tomcat
PROFESSIONAL EXPERIENCE
Senior Hadoop Developer
Confidential, Austin, TX
Responsibilities:
- Created and maintained Technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
- Developed data pipelines using Spark, Hive, Pig, Python, Impala, and HBase to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis.
- Monitored and reviewed Hadoop log files and wrote queries to analyze them.
- Conducted POCs and mocks with the client to understand business requirements; attended defect triage meetings with the UAT and QA teams to ensure defects were resolved in a timely manner.
- Worked with Kafka on a proof of concept for carrying out log processing on a distributed system.
- Analyzed the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MRv2, Hive, Sqoop, and Pig Latin.
- Loaded data from different sources (Teradata and DB2) into HDFS using Sqoop and Flume, and loaded it into partitioned Hive tables.
- Developed HiveQL queries, mappings, tables, and external tables in Hive for analysis across different banners; worked on partitioning, optimization, compilation, and execution.
- Wrote complex queries to get data into HBase; responsible for executing Hive queries using the Hive command line and Hue.
- Designed and implemented proprietary data solutions by correlating data from SQL and NoSQL databases using Kafka.
- Used Pig as ETL tool to do transformations and some pre-aggregations before storing the analyzed data into HDFS.
- Developed PySpark code to save data in Avro and Parquet formats and build Hive tables on top of them.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Developed bash scripts to fetch TLog files from the FTP server and process them for loading into Hive tables; scheduled all bash scripts using the Resource Manager scheduler.
- Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
- Developed Spark programs using Scala; involved in creating Spark SQL queries and developed Oozie workflows for Spark jobs.
Environment: HDFS, Hadoop 2.x YARN, Teradata, NoSQL, PySpark, MapReduce, Pig, Hive, Sqoop, Spark 2.3, Scala, Oozie, Java, Python, MongoDB, Shell/Bash scripting.
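Converting Hive/SQL queries into Spark transformations, as described above, typically maps a WHERE clause onto `filter()` and a GROUP BY aggregate onto `map()` plus `reduceByKey()`. A pure-Python analogue of that shape (the `customer`/`amount` column names are hypothetical, since the actual schemas are not shown):

```python
from itertools import groupby
from operator import itemgetter

# SQL:   SELECT customer, SUM(amount) FROM txns WHERE amount > 0 GROUP BY customer
# Spark: rdd.filter(lambda r: r.amount > 0)
#           .map(lambda r: (r.customer, r.amount))
#           .reduceByKey(lambda a, b: a + b)
def sum_by_customer(rows):
    """Local stand-in for the filter/map/reduceByKey pipeline above."""
    # filter() analogue: keep positive amounts only
    pairs = [(r["customer"], r["amount"]) for r in rows if r["amount"] > 0]
    # reduceByKey() analogue: group on the key, then sum each group
    pairs.sort(key=itemgetter(0))
    return {k: sum(v for _, v in g) for k, g in groupby(pairs, key=itemgetter(0))}
```

The sort-then-group step here plays the role of Spark's shuffle; on a cluster, `reduceByKey` also pre-aggregates on each partition before shuffling.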
Hadoop Developer
Confidential, Irving, TX
Responsibilities:
- Worked on Sqoop jobs with incremental loads to populate HAWQ external tables and move data into internal tables.
- Built data import and export jobs to copy data to and from HDFS using Sqoop.
- Worked with the Spark Core, Spark Streaming, and Spark SQL modules.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Worked with Pig, Sqoop, and the NoSQL database HBase for analyzing the Hadoop cluster as well as big data.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Created Hive tables and worked on them for data analysis to meet business requirements.
- Developed a data pipeline using Spark and Hive to ingest, transform, and analyze data.
- Worked on data cleansing to populate Hive external and internal tables.
- Experience in using Sequence files, RCFile, Avro, and HAR file formats.
- Hands-on experience writing Pig scripts to tokenize sensitive information using Protegrity.
- Supported and built Data Science team projects on Hadoop.
- Used Flume to dump application server logs into HDFS.
- Automated backups with shell scripts on Linux to transfer data to an S3 bucket.
- Experience working with the NoSQL database HBase for real-time data analytics.
- Hands-on experience working as a production support engineer.
- Worked on RCA (root cause analysis) documentation.
- Automated incremental loads to load data into the production cluster.
- Ingested data from various file systems into HDFS using Unix command-line utilities.
- Hands-on experience moving data from one cluster to another using DistCp.
- Experience reviewing Hadoop log files to detect failures.
- Worked on EPIC user stories and delivered on time.
- Worked on data ingestion for a malicious-intent model; automated incremental jobs to run on a daily basis.
- Hands-on experience with Agile and Scrum methodologies.
Environment: MapReduce, HDFS, Hive, HBase, Sqoop, Pig, Flume, Oracle 11/10g, DB2, Teradata, MySQL, HAWQ, PL/SQL, Java, Linux, Shell Scripting, SQL Developer, HP ALM.
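The incremental loads mentioned above follow Sqoop's `--incremental append` semantics: select only rows whose check column exceeds the last recorded value, then advance that value. A small sketch of the bookkeeping (row shapes and the `id` column are hypothetical):

```python
def incremental_append(rows, check_column, last_value):
    """Mimic Sqoop's --incremental append bookkeeping on a
    monotonically increasing check column.

    Returns (new_rows, new_last_value); new_last_value is what a
    saved Sqoop job would store for the next run's --last-value."""
    new_rows = [r for r in rows if r[check_column] > last_value]
    new_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_last
```

A saved Sqoop job persists `new_last` in its metastore, so each daily run picks up only rows inserted since the previous run.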
Hadoop Developer
Confidential, Charlotte, NC
Responsibilities:
- Developed optimal strategies for distributing the web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop.
- Hands-on experience loading data from the UNIX file system and Teradata into HDFS.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website.
- Installed, configured, and operated the Apache stack (Hive, HBase, Pig, Sqoop, ZooKeeper, Oozie, Flume, and Mahout) on the Hadoop cluster.
- Created a high-level design for the data ingestion and data extraction modules; enhanced a Hadoop MapReduce job to join the incoming slices of data and pick only the fields needed for further processing.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Worked with AWS cloud services (VPC, EC2, S3, EMR, DynamoDB, SNS, SQS).
- Part of the team that set up the infrastructure in AWS.
- Used Amazon S3 as a storage mechanism and wrote Python scripts to dump data into S3.
- Tested raw data and executed performance scripts.
- Worked with NoSQL database HBase to create tables and store data.
- Developed industry-specific UDFs (user-defined functions).
- Used Flume to collect, aggregate, and store web log data from various sources (web servers, mobile, and network devices) and pushed it to HDFS.
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java map reduce, Hive, Pig, and Sqoop.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
Environment: Cloudera, MapReduce, Hive, Solr, Pig, HDFS, HBase, Flume, Sqoop, Zookeeper, Python, Flat files, AWS, Unix/Linux.
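The unique-visitors-per-day analysis described above is essentially `SELECT dt, COUNT(DISTINCT visitor_id) FROM weblogs GROUP BY dt` in HiveQL. A pure-Python equivalent of that aggregation (the `(date, visitor_id)` record shape is a simplifying assumption, not the actual log schema):

```python
from collections import defaultdict

# HiveQL: SELECT dt, COUNT(DISTINCT visitor_id) FROM weblogs GROUP BY dt
def unique_visitors_per_day(log_records):
    """Count distinct visitor IDs per day.

    log_records: iterable of (date_string, visitor_id) tuples."""
    visitors = defaultdict(set)
    for dt, visitor_id in log_records:
        visitors[dt].add(visitor_id)   # a set deduplicates repeat visits
    return {dt: len(ids) for dt, ids in visitors.items()}
```

In Hive, `COUNT(DISTINCT ...)` forces the same deduplication per group; partitioning the table by date lets the query prune to the days of interest.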
Hadoop Developer
Confidential
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
- Worked on moving all log files generated from various sources to HDFS for further processing.
- Developed workflows using custom MapReduce, Pig, Hive, and Sqoop.
- Developed a predictive analytics product using Apache Spark and SQL/HiveQL.
- Wrote Spark programs to load, parse, refine, and store sensor data into Hadoop, and to process, analyze, and aggregate data for visualizations.
- Created various views for HBase tables and utilized the performance of Hive on top of HBase.
- Developed an Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
- Designed and developed Apache Storm topologies for inbound and outbound data, performing real-time ETL to find the latest trends and keywords.
- Developed MapReduce programs for parsing data and loading it into HDFS.
- Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
- Wrote a Hive UDF to sort struct fields and return a complex data type.
- Responsible for loading data from the UNIX file system to HDFS.
- Developed ETL applications using Hive, Spark, Impala, and Sqoop, automated with Oozie.
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MRUnit testing library.
- Designed and developed a distributed processing system to process binary files in parallel and crunch the analysis metrics into a data warehousing platform for reporting.
- Developed workflows in Control-M to automate loading data into HDFS and pre-processing it with Pig.
- Provided cluster coordination services through ZooKeeper.
- Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
- Modeled Hive partitions extensively for data separation and faster data processing, and followed Pig and Hive best practices for tuning.
Environment: HiveQL, MySQL, HBase, HDFS, Hive, Eclipse (Kepler), Hadoop, Oracle 11g, PL/SQL, SQL*Plus, Toad 9.6, Flume, Pig, Oozie, Sqoop, AWS, Spark, Unix, Tableau, Cosmos.
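The Hive partitioning mentioned above works because Hive lays each partition out as its own `key=value` directory under the table root, letting queries prune whole directories. A small sketch of that path layout (the `/warehouse/sales` root and `dt`/`region` keys are hypothetical):

```python
def hive_partition_path(table_root, **partition_keys):
    """Build the directory path Hive uses for one partition of a
    partitioned table, e.g. /warehouse/sales/dt=2015-06-01/region=us.

    Keyword-argument order must match the table's PARTITIONED BY order."""
    parts = "/".join(f"{key}={value}" for key, value in partition_keys.items())
    return f"{table_root}/{parts}"
```

A query filtering on `dt` and `region` then only scans the matching directories instead of the full table, which is where the "faster data processing" comes from.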
Hadoop Developer
Confidential
Responsibilities:
- Experienced with Python frameworks like webapp2 and Flask.
- Experienced in WAMP (Windows, Apache, MySQL, and PHP/Python) stacks and MVC Struts.
- Developed mobile, cross-browser web applications using AngularJS and JavaScript APIs.
- Successfully migrated the Django database from SQLite to MySQL to PostgreSQL with complete data integrity.
- Used Celery with RabbitMQ and Flask to create a distributed worker framework.
- Created an automation test framework using Selenium.
- Responsible for the design and development of web pages using PHP, HTML, Joomla, and CSS, including Ajax controls and XML.
- Developed an intranet portal for managing Amazon EC2 servers using Tornado and MongoDB.
- Expertise in developing web applications implementing Model-View-Controller (MVC) architectures using full-stack frameworks such as TurboGears.
- Implemented monitoring and established best practices around using Elasticsearch.
- Strong experience building large, responsive REST web applications; experienced with the CherryPy framework and Python.
- Used a test-driven development (TDD) approach for developing the services required by the application.
Environment: Python 2.7/3.0, PL/SQL, C++, Redshift, XML, Agile (Scrum), PyUnit, MySQL, Apache, CSS, DHTML, HTML, JavaScript, Shell Scripts, Git, Linux, Unix, and Windows.
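In the TDD approach mentioned above, the test is written first and the service code is written to satisfy it. A minimal sketch in the PyUnit (`unittest`) style listed in the environment (the `slugify` helper is hypothetical, not a function from the actual project):

```python
import unittest

def slugify(title):
    """Lowercase a title and join its words with hyphens for use in URLs.
    Written after the tests below, test-first."""
    return "-".join(title.lower().split())

class TestSlugify(unittest.TestCase):
    # These cases are specified before implementing slugify().
    def test_basic(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_whitespace(self):
        self.assertEqual(slugify("  A   B "), "a-b")

if __name__ == "__main__":
    unittest.main()
```

Running the suite first (red), then implementing until it passes (green), keeps each service change covered by a failing test before any code exists.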