Sr. Hadoop Developer Resume
Reston, VA
SUMMARY
- 7+ years of IT experience as a Hadoop Developer, with working knowledge of the Hadoop ecosystem and expertise in application design and development in various domains, with an emphasis on data warehousing tools using industry-accepted methodologies
- Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, Pig, Flume, Sqoop, Kafka, and Oozie
- Good understanding/knowledge of Hadoop Architecture and its components such as HDFS, Resource Manager, Node Manager, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce
- Expertise in writing Hadoop Jobs for analyzing data using MapReduce and Hive
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa
- Wrote ad-hoc queries for analyzing data using HiveQL
- Strong knowledge of NoSQL databases like HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters
- Experience in managing Hadoop clusters using Cloudera Manager tool
- Experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as Regex, JSON, and Parquet
- Wrote several Sqoop scripts to load data directly into HDFS and Hive tables from different sources
- Worked on developing ETL processes to load data from multiple data sources to HDFS using Sqoop, perform structural modifications using MapReduce and Hive, and analyze data using visualization/reporting tools
- Experienced in running queries using Impala and using BI tools to run ad-hoc queries directly on Hadoop
- Experienced in transferring data from different data sources into HDFS systems using Kafka producers, consumers, and Kafka brokers
- Very good understanding of job workflow scheduling and monitoring tools like Oozie and Control-M
- Experience in managing and reviewing Hadoop log files using Flume and Kafka, and in developing Pig UDFs and Hive UDFs to pre-process data for analysis
- Worked on Hadoop data operation components like Zookeeper and Oozie
- Experience in handling Hive queries using Spark SQL, which integrates with the Spark environment
- Good understanding and working experience on Hadoop Distributions like Cloudera and Hortonworks
- Working knowledge on AWS technologies like S3 and EMR for storage, Big Data processing and analysis
- Good understanding of Amazon Web services to design Data Pipeline using various services
- Experience in integrating various data sources in RDBMSs like Oracle, SQL Server, MySQL, and Teradata
- Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing and other services of the AWS family
- Knowledge in build automation tools like Maven and Jenkins
- Good experience and knowledge in Unix commands and Shell Scripting
- Good knowledge and understanding of Python programming
- Good knowledge and understanding of Swift programming
- Experienced in project management tools like Jira, Confluence, and SharePoint
- Quick learning, self-motivated, hardworking, good team player with excellent communication skills and strong affinity towards learning new technologies.
TECHNICAL SKILLS
Big Data/Hadoop: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Hue, Flume, Kafka, Zookeeper
Programming Languages/Scripting: Java, Python, Bash, Swift
Query Languages: SQL, PL/SQL, T-SQL
Virtualization: VMware, Virtual Center, VirtualBox
Hadoop Distributions: Cloudera, Hortonworks
Cloud Platform: AWS, Azure
Cloud Services: AWS EC2, S3, ELB, EBS, CloudWatch, SNS, SQS, SES, Route 53, CloudTrail, ECS, EMR, DynamoDB, RDS, Glacier, Lambda
Databases: MySQL, SQL Server, Oracle, Teradata
NoSQL Databases: HBase, Cassandra, MongoDB
ETL Tools: SSIS, Informatica, DataStage
Reporting Tools: SSRS, Tableau, Cognos BI
Version Control System: Subversion (SVN), Git, Bitbucket
Build Tools: ANT, Maven
CI Tools: Jenkins, Bamboo
Operating Systems: Windows, Linux, UNIX, RHEL/CentOS 5.x/6.x/7, Mac OS
Web/App Servers: Apache Tomcat, WebLogic, WebSphere, JBoss
Automated Test Execution: WebDriver, TestNG, JUnit
Bug Tracking Tools: JIRA, ServiceNow, Confluence
Web Technologies: HTML5 & CSS3, JavaScript, JDBC, JSP, JSON
IDE/Tools: Eclipse, NetBeans
Monitoring Tools: Nagios, Splunk, and CloudWatch
PROFESSIONAL EXPERIENCE
Confidential, Reston, VA
Sr. Hadoop Developer
Responsibilities:
- Involved in complete project life cycle starting from design discussion to production deployment.
- Gained experience across the Hadoop ecosystem in ingestion, storage, querying, processing, and analysis of big data
- Worked on analyzing Hadoop stack and different big data analytic tools including Pig, Hive, HBase and Sqoop
- Developed Hive scripts in HiveQL to de-normalize and aggregate the data
- Created HBase tables and column families to store the user event data
- Wrote automated HBase test cases for data quality checks using HBase command line tools
- Used Hive and Impala to query the data in HBase
- Converted CSV files into Parquet format, loaded the Parquet files into DataFrames, and queried them using Spark and SQL
- Created multiple Hive tables, implemented Hive queries, partitioning, dynamic partitioning and buckets in Hive for efficient data access
- Ran Spark jobs using the Spark Core and Spark SQL libraries to process the data
- Created Hive DDLs on top of the Parquet schema in the HDFS location as requested by the source team
- Built a continuous ETL pipeline using Kafka, Spark Streaming, and HDFS
- Performed ETL on data from different formats such as JSON, Parquet, and database sources, then ran ad-hoc queries using Spark SQL
- Involved in importing the real time data to Hadoop using Kafka and implemented the Oozie job for daily imports
- Worked extensively on AWS components like Elastic MapReduce (EMR), Elastic Compute Cloud (EC2), Simple Storage Service (S3)
- Used Amazon CloudWatch to monitor and track resources on AWS
- Worked with various compression codecs (Snappy, Gzip, Bzip2) and file formats (Avro, Parquet, text)
- Extracted data from disparate source systems such as Oracle, Hive, Snowflake and Files (CSV)
- Created Hive tables to store the processed results in a tabular format
- Bulk loaded and unloaded data into and from Snowflake tables using the COPY command
- Experienced in designing, building, deploying, and utilizing much of the AWS stack (EMR, RDS, S3, Athena, DynamoDB, Redshift, Glue, and EC2), focusing on high availability, fault tolerance, and auto-scaling
- Worked extensively with AWS services, with a broad and in-depth understanding of each. Developed an ETL pipeline to extract log data and store it in an AWS S3 data lake.
Environment: Hadoop, MapReduce, YARN, Snowflake, HDFS, PySpark, Hive, Java, SQL, Spark, Pig, Sqoop, Oozie, Zookeeper, AWS, Python, Teradata, PL/SQL, MySQL, Windows, HBase, Git, Jenkins, Maven.
Confidential, New York, NY
Hadoop Developer
Responsibilities:
- Extracted data into HDFS and updated it using the Sqoop import and export command-line utilities
- Responsible for developing data pipeline using Flume, Sqoop, and Pig to extract the data from weblogs and store in HDFS
- Developed transformations using custom MapReduce, Pig, and Hive
- Performed map-side joins in both Pig and Hive
- Optimized joins in Hive using techniques such as sort-merge join and map-side join
- Controlled parallelism at the relational level and script level in Pig
- Implemented partitioning and bucketing techniques in Hive
- Worked with Senior Engineer on configuring Kafka for streaming data
- Used Spark Streaming to consume ongoing data from Kafka and store the stream in HDFS.
- Developed and configured Kafka brokers to pipe server log data into Spark Streaming
- Responsible for developing scalable distributed data solutions using Hadoop
- Loaded cache data into HBase using Sqoop
- Built Spark DataFrames to process huge amounts of structured data
- Used JSON to represent complex data structures within a MapReduce job
- Stored and preprocessed logs and semi-structured content on HDFS using MapReduce and imported it into the Hive warehouse
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class
- Hands-on experience with Data warehouse and SQL databases like Oracle
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability
- Developed Hive queries in Spark SQL for analyzing and processing the data.
- Strong understanding of partitioning and bucketing concepts in Hive
- Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner. Responsible for loading data from the UNIX file system to HDFS
- Responsible for Cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing, and reviewing data backups and Hadoop log files.
Environment: Hadoop 2.x, Apache Spark, Spark SQL, DataFrames, Scala, HDFS, Hive, Oozie, Kafka, Autosys, Oracle, Teradata, Python/PySpark, MapReduce, Sqoop, HBase, Shell Scripting, Pig, Core Java, Cassandra, NiFi, Cloudera Hadoop Distribution, PL/SQL, Toad, Windows NT, Linux.
Confidential, Waltham, MA
Jr Hadoop Developer
Responsibilities:
- Worked with Spark Streaming, creating RDDs and applying transformations and actions
- Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
- Extensively wrote Shell scripts
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs
- Designed and implemented incremental imports into Hive tables and wrote Hive queries to run on Tez
- Worked on data pipelines using Kafka for handling terabytes of data
- Wrote shell scripts that run multiple Hive jobs to incrementally update different Hive tables, which are used to generate reports in Tableau for business use
- Worked with Spark SQL: created DataFrames by loading data from Hive tables, prepared the data, and stored it in AWS S3
- Extensively worked on Text, ORC, Avro and Parquet file formats and compression techniques like Snappy, Gzip and Zlib
- Created, managed, and utilized policies for S3 buckets and Glacier for storage and backup on AWS
- Developed simple to complex MapReduce jobs using Hive and Pig
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Worked on NoSQL databases such as MongoDB
- Worked on debugging and performance tuning of Hive and Pig jobs
- Implemented test scripts to support test driven development and continuous integration
- Imported and exported data into and out of HDFS and Hive using Sqoop
- Processed unstructured data using Pig and Hive
- Gained experience in managing and reviewing Hadoop log files.
- Designed Hive UDFs to format data and apply predetermined quick transformations and hashing functions.
- Involved in end-to-end development and scheduling of all Hive, MapReduce, and Pig jobs in Oozie workflows.
- Responsible for managing data coming from different sources.
Environment: Hadoop, YARN, Resource Manager, SQL, Python, Kafka, Hive, Sqoop, Qlik Sense, Tableau, Oozie, Jenkins, Linux, Scala, Spark.
Confidential, Milwaukee, WI
Software Engineer
Responsibilities:
- Involved in design and development of the web front end using HTML, JavaScript, CSS, and JSPs for the Administration, Efficiency Management, and Self-Assessment modules; also part of the data warehousing development team using Informatica
- Developed and tested the Efficiency Management module using EJB, Servlets, JSP, and Core Java components in WebLogic Application Server
- Developed Struts framework components providing access to system functions of the server's business layer
- Developed Informatica transformations using Informatica PowerCenter Designer
- Developed Workflows using Informatica and automated them using Unix Scripts
- Implemented business components as a persistent object model using EJB CMP and BMP entity beans for storing and retrieving data objects from resources
- Implemented the application's MVC architecture using the Struts framework
- Involved in writing stored procedures in PL/SQL to interact with the Oracle database, as required by the Efficiency Management module and the Informatica tool
- Deployed web components, presentation components and business components in WebLogic Application Server.
Environment: Java, J2EE (Servlets, JDBC, EJB, JSP, JMS), HTML, CSS, JavaScript, Eclipse, Struts Framework 1.1, ANT, XML, CVS, Oracle 8i, PL/SQL, Log4j, Windows XP.
