Sr. Hadoop Developer Resume Reston, VA - Hire IT People

SUMMARY

Over 7 years of IT experience as a Hadoop Developer, with working knowledge of the Hadoop ecosystem and expertise in application design and development across various domains, with an emphasis on data warehousing tools and industry-accepted methodologies
Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, Pig, Flume, Sqoop, Kafka, and Oozie
Good understanding/knowledge of Hadoop Architecture and its components such as HDFS, Resource Manager, Node Manager, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce
Expertise in writing Hadoop Jobs for analyzing data using MapReduce and Hive
Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa
Wrote ad-hoc queries for analyzing data using HiveQL
Strong knowledge of NoSQL databases like HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters
Experience in managing Hadoop clusters using Cloudera Manager tool
Experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as Regex, JSON, and Parquet
Wrote several Sqoop scripts to load data directly into HDFS and Hive tables from different sources
Worked on developing ETL processes to load data from multiple data sources to HDFS using Sqoop, perform structural modifications using MapReduce and Hive, and analyze data using visualization/reporting tools
Experienced in running queries using Impala and using BI tools to run ad-hoc queries directly on Hadoop
Experienced in transferring data from different data sources into HDFS systems using Kafka producers, consumers, and Kafka brokers
Very good understanding of job workflow scheduling and monitoring tools like Oozie and Control-M
Experience in managing and reviewing Hadoop log files using Flume and Kafka; developed Pig UDFs and Hive UDFs to pre-process data for analysis
Worked on Hadoop data operation components like Zookeeper and Oozie
Experience handling Hive queries using Spark SQL, which integrates with the Spark environment
Good understanding and working experience on Hadoop Distributions like Cloudera and Hortonworks
Working knowledge on AWS technologies like S3 and EMR for storage, Big Data processing and analysis
Good understanding of Amazon Web services to design Data Pipeline using various services
Experience in integrating various data sources in RDBMSs like Oracle, SQL Server, MySQL, and Teradata
Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing and other services of the AWS family
Knowledge in build automation tools like Maven and Jenkins
Good experience and knowledge in Unix commands and Shell Scripting
Good knowledge and understanding of Python programming
Good knowledge and understanding of Swift programming
Experienced in project management tools like Jira, Confluence, and SharePoint
Quick learning, self-motivated, hardworking, good team player with excellent communication skills and strong affinity towards learning new technologies.

TECHNICAL SKILLS

Big Data/Hadoop: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Hue, Flume, Kafka, Zookeeper

Programming Languages/Scripting: Java, Python, Bash, Swift

Query Languages: SQL, PL/SQL, T-SQL

Virtualization: VMware, Virtual Center, VirtualBox

Hadoop Distributions: Cloudera, Hortonworks

Cloud Platform: AWS, Azure

Cloud Services: AWS EC2, S3, ELB, EBS, CloudWatch, SNS, SQS, SES, Route 53, CloudTrail, ECS, EMR, DynamoDB, RDS, Glacier, Lambda

Databases: MySQL, SQL Server, Oracle, Teradata

NoSQL Databases: HBase, Cassandra, MongoDB

ETL Tools: SSIS, Informatica, DataStage

Reporting Tools: SSRS, Tableau, Cognos BI

Version Control System: Subversion (SVN), Git, Bitbucket

Build Tools: ANT, Maven

CI Tools: Jenkins, Bamboo

Operating Systems: Windows, Linux, UNIX, RHEL/CentOS 5.x/6.x/7, Mac OS

Web/App Servers: Apache Tomcat, WebLogic, WebSphere, JBoss

Automated Test Execution: WebDriver, TestNG, JUnit

Bug Tracking Tools: JIRA, ServiceNow, Confluence

Web Technologies: HTML5 & CSS3, JavaScript, JDBC, JSP, JSON

IDE/Tools: Eclipse, NetBeans

Monitoring Tools: Nagios, Splunk, and CloudWatch

PROFESSIONAL EXPERIENCE

Confidential, Reston, VA

Sr. Hadoop Developer

Responsibilities:

Involved in complete project life cycle starting from design discussion to production deployment.
Worked across the Hadoop ecosystem on ingestion, storage, querying, processing, and analysis of big data
Worked on analyzing Hadoop stack and different big data analytic tools including Pig, Hive, HBase and Sqoop
Developed Hive scripts in HiveQL to de-normalize and aggregate the data
Created HBase tables and column families to store the user event data
Wrote automated HBase test cases for data quality checks using HBase command line tools
Used Hive and Impala to query the data in HBase
Converted CSV files to Parquet format, loaded the Parquet files into DataFrames, and queried them using Spark SQL
Created multiple Hive tables, implemented Hive queries, partitioning, dynamic partitioning and buckets in Hive for efficient data access
Ran Spark jobs using the Spark Core and Spark SQL libraries to process the data
Created Hive DDLs on top of the Parquet schema in the HDFS location, as requested by the source team
Built a continuous ETL pipeline using Kafka, Spark Streaming, and HDFS
Performed ETL on data from different sources and formats such as JSON, Parquet, and databases, then ran ad-hoc queries using Spark SQL
Involved in importing the real time data to Hadoop using Kafka and implemented the Oozie job for daily imports
Worked extensively on AWS components like Elastic MapReduce (EMR), Elastic Compute Cloud (EC2), Simple Storage Service (S3)
Used Amazon CloudWatch to monitor and track resources on AWS
Worked with various compression codecs and file formats such as Snappy, Gzip, Bzip2, Avro, Parquet, and text
Extracted data from disparate source systems such as Oracle, Hive, Snowflake and Files (CSV)
Created Hive tables to store the processed results in a tabular format
Bulk loaded data into and unloaded data from Snowflake tables using the COPY command
Experienced in designing, building, deploying, and using a broad range of the AWS stack (EMR, RDS, S3, Athena, DynamoDB, Redshift, Glue, and EC2), focusing on high availability, fault tolerance, and auto-scaling
Worked extensively with AWS services, with a broad and in-depth understanding of each
Developed an ETL pipeline to extract log data and store it in an AWS S3 data lake.

Environment: Hadoop, MapReduce, YARN, Snowflake, HDFS, PySpark, Hive, Java, SQL, Spark, Pig, Sqoop, Oozie, Zookeeper, AWS, Python, Teradata, PL/SQL, MySQL, Windows, HBase, Git, Jenkins, Maven.

Confidential, New York, NY

Hadoop Developer

Responsibilities:

Extracted and updated the data into HDFS using Sqoop import and export command line utility interface
Responsible for developing data pipeline using Flume, Sqoop, and Pig to extract the data from weblogs and store in HDFS
Developed transformations using custom MapReduce, Pig, and Hive
Performed map-side joins in both Pig and Hive
Optimized joins in Hive using techniques such as sort-merge join and map-side join
Controlled parallelism at the relational level and script level in Pig
Implemented partitioning and bucketing techniques in Hive
Worked with Senior Engineer on configuring Kafka for streaming data
Worked with Spark Streaming to consume ongoing data from Kafka and store the stream data in HDFS.
Developed and configured Kafka brokers to pipe server log data into Spark Streaming
Responsible for developing scalable distributed data solutions using Hadoop
Loaded cache data into HBase using Sqoop
Built Spark DataFrames to process large amounts of structured data
Used JSON to represent complex data structures within a MapReduce job
Stored and preprocessed logs and semi-structured content on HDFS using MapReduce and imported it into the Hive warehouse
Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class
Hands-on experience with Data warehouse and SQL databases like Oracle
Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability
Developed Hive Queries in Spark-SQL for analysis and processing the data.
Expertise in understanding Partitions, Bucketing concepts in Hive
Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner
Responsible for loading data from the UNIX file system to HDFS
Responsible for Cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing, and reviewing data backups and Hadoop log files.

Environment: Hadoop 2.x, Apache Spark, Spark SQL, DataFrames, Scala, HDFS, Hive, Oozie, Kafka, Autosys, Oracle, Teradata, Python/PySpark, MapReduce, Sqoop, HBase, Shell Scripting, Pig, Core Java, Cassandra, NiFi, Cloudera Hadoop Distribution, PL/SQL, Toad, Windows NT, Linux.

Confidential, Waltham, MA

Jr Hadoop Developer

Responsibilities:

Worked with Spark Streaming, creating RDDs and applying transformations and actions
Managing and scheduling Spark Jobs on a Hadoop Cluster using Oozie.
Extensively wrote Shell scripts
Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs
Designed and implemented incremental imports into Hive tables and wrote Hive queries to run on Tez
Worked on data pipelines using Kafka to handle terabytes of data
Wrote shell scripts that run multiple Hive jobs to incrementally update different Hive tables, which are used to generate reports in Tableau for business use
Worked on Spark SQL: created DataFrames by loading data from Hive tables, prepared the data, and stored it in AWS S3
Extensively worked on Text, ORC, Avro and Parquet file formats and compression techniques like Snappy, Gzip and Zlib
Created, managed, and used policies for S3 buckets and Glacier for storage and backup on AWS
Developed simple to complex MapReduce jobs using Hive and Pig
Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN
Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
Worked on NoSQL databases such as MongoDB
Worked on debugging, performance tuning of Hive & Pig Jobs
Implemented test scripts to support test driven development and continuous integration
Imported and exported data into HDFS and Hive using Sqoop
Experience working on processing unstructured data using Pig and Hive
Gained experience in managing and reviewing Hadoop log files.
Designed Hive UDFs to format data and apply predetermined quick transformations and hashing functions.
Involved in end-to-end development and scheduling of all Hive, MapReduce, and Pig jobs in Oozie workflows.
Responsible for managing data coming from different sources.

Environment: Hadoop, YARN, Resource Manager, SQL, Python, Kafka, Hive, Sqoop, Qlik Sense, Tableau, Oozie, Jenkins, Linux, Scala, Spark.

Confidential, Milwaukee, WI

Software Engineer

Responsibilities:

Involved in the design and development of the web front end using HTML, JavaScript, CSS, and JSPs for the Administration, Efficiency Management, and Self-Assessment modules; also part of the Data Warehousing development team using Informatica
Developed and tested the Efficiency Management module using EJB, Servlets, and JSP & Core Java components in WebLogic Application Server
Developed components using the Struts framework to provide access to system functions of the server's business layer
Developed Informatica transformations using Informatica Power Center designer
Developed Workflows using Informatica and automated them using Unix Scripts
Implemented business components as a persistent object model using EJB CMP and BMP entity beans for storing and retrieving data objects from resources
Implemented the application's MVC architecture using the Struts framework
Involved in developing stored procedures using PL/SQL to interact with the Oracle database, as required by the Efficiency Management module and the Informatica tool
Deployed web components, presentation components and business components in WebLogic Application Server.

Environment: Java, J2EE (Servlets, JDBC, EJB, JSP, JMS), HTML, CSS, JavaScript, Eclipse, Struts Framework 1.1, ANT, XML, CVS, Oracle 8i, PL/SQL, Log4j, Windows XP.
