Senior Hadoop Developer Resume
FL
SUMMARY
- 7 years of IT experience as a Hadoop Developer/Data Engineer with a strong emphasis on the Hadoop ecosystem, Azure, Spark, Informatica, SQL, and Tableau; involved in various analysis, development, and implementation projects.
- Hands-on experience with Hadoop and its components, including HDFS, Spark, Scala, MapReduce, Hive, Impala, Sqoop, Pig, Oozie, Kafka, MongoDB, PostgreSQL, and SQL Server.
- Hands-on experience in Tableau
- Hands-on experience writing SQL queries for data analysis.
- Analyzed large data sets for insights, creating visually compelling, actionable, interactive reports and dashboards.
- Proficient in designing and developing dashboards and reports using Tableau visualizations such as dual-axis charts, bar graphs, scatter plots, heat maps, bubble charts, tree maps, waterfall charts, and geographic visualizations, making use of actions and local and global filters according to end-user requirements.
- Hands-on experience with database concepts such as data modeling, physical and logical schema design, creating triggers, indexes, and views, snowflake and star schemas, dimension and fact tables, slowly changing dimensions, and normalization.
- Performed data quality checks on small data sets using Excel (VLOOKUP, pivot tables).
- Ingested data from RDBMS into Hadoop and vice versa using Sqoop.
- Hands-on experience performing ETL using Hive scripts and loading data back into HDFS/Hive.
- Strong knowledge of partitioning data in Hive using static and dynamic partitioning, as well as bucketing.
- Strong understanding of the Hive metastore; used Impala for ad hoc and additional analysis.
- Hands-on experience writing PySpark DataFrame code in Jupyter notebooks and writing the output back to HDFS.
- Automated, scheduled, and coordinated workflows in Hadoop using Oozie XML workflow definitions.
- Experience using text, CSV, Excel, and JSON file formats in the Hadoop ecosystem.
- Hands-on experience with Informatica and Teradata; supported multiple projects through the full life cycle.
- Handled small files in HDFS and resolved NameNode issues.
- Optimized Spark performance using broadcast joins, coalesce, and repartitioning (a minimal sketch follows this list).
- Good knowledge of UNIX commands and writing shell scripts for QA purposes.
- Worked through the software development life cycle (analysis, design, development, testing, implementation, deployment, support) using Waterfall, SAFe, Scrum, Agile, and Kanban methodologies.
- Experience with the Azure cloud (ADF, Azure DevOps, Azure Data Lake, Databricks) and NoSQL databases such as MongoDB and HBase.
- Coordinated work across an onsite/offshore model.
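For illustration, a minimal PySpark sketch of the broadcast-join and dynamic-partitioning techniques listed above; the database, table, and column names (db.sales, db.dim_store, store_id, region) are hypothetical stand-ins, not from an actual project.

    # Minimal sketch, assuming a hypothetical large fact table db.sales and a
    # small dimension db.dim_store sharing a store_id key.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = (SparkSession.builder
             .appName("broadcast-join-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Allow Hive-style dynamic partitioning for the write below.
    spark.conf.set("hive.exec.dynamic.partition", "true")
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

    sales = spark.table("db.sales")        # large fact data
    stores = spark.table("db.dim_store")   # small dimension, safe to broadcast

    # Broadcasting the small side avoids shuffling the large fact table.
    enriched = sales.join(broadcast(stores), on="store_id", how="left")

    # coalesce() reduces the number of output files before the partitioned write.
    (enriched.coalesce(10)
             .write.mode("overwrite")
             .partitionBy("region")
             .format("parquet")
             .saveAsTable("db.sales_by_region"))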
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Spark, Kafka, YARN, NiFi
NoSQL Databases: MongoDB, HBase
Scripting Languages: Python, SQL, HiveQL, Unix shell scripts
Operating Systems: Windows, Linux, Unix
Databases: Oracle, Teradata, SQL Server
Tools and IDEs: Eclipse, IntelliJ
Version control/ Build Tools: GitHub
Methodologies: Waterfall, Agile
Cloud: Azure
Visualization tools: Tableau, Excel (VLOOKUP, pivot tables)
ETL Tool: Informatica
PROFESSIONAL EXPERIENCE
Confidential, FL
Senior Hadoop Developer
Responsibilities:
- Designed and developed data flows from source through staging to Hive.
- Supported enterprise NiFi data pipelines that ingest data from various sources into HDFS.
- Wrote Sqoop scripts to import and export data between Teradata/SQL Server and HDFS.
- Developed Spark applications using PySpark.
- Developed Spark scripts using DataFrames and RDDs.
- Extensively worked with Parquet and Avro file formats.
- Built SCD-type dimension and fact tables using Spark applications.
- Consumed Kafka messages using Spark (a minimal streaming sketch follows this section).
- Extensively used Hive partitioned tables, map joins, and bucketing; gained a good understanding of dynamic partitioning and worked with Impala for ad hoc analysis.
- Designed both time-driven and data-driven automated workflows using Oozie.
- Gained good exposure to the Hue interface for monitoring job status, managing HDFS files, tracking scheduled jobs, and managing Oozie workflows.
- Performed optimizations and performance tuning in Spark and Hive.
- Developed UNIX scripts to automate data loads into HDFS.
- Experienced in managing and reviewing Hadoop log files.
- Good knowledge of Git commands, version tagging, and pull requests.
- Performed unit, integration, and regression testing after development and participated in code reviews.
- Deprecated on-premises processes by migrating them to Azure using Databricks and ADF pipelines.
- Used Azure DevOps for build pipelines and automated deployment.
- Interacted with business analysts to understand business requirements and translate them into technical requirements.
- Collaborated with technical experts, architects, and developers on the design and implementation of technical requirements.
- Documented business requirements, technical specifications, and process flows.
Environment: Hadoop (Cloudera), Spark, Hive, Impala, Sqoop, Oozie, Azure, Git, UNIX shell scripting, Teradata, SQL
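A minimal PySpark Structured Streaming sketch of the Kafka consumption referenced above, landing messages on HDFS as Parquet; the broker address, topic name, and HDFS paths are hypothetical.

    # Minimal sketch, assuming a hypothetical Kafka topic "events" on broker1:9092.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "events")
           .option("startingOffsets", "latest")
           .load())

    # Kafka delivers key/value as binary; cast to strings before persisting.
    messages = raw.select(col("key").cast("string"),
                          col("value").cast("string"))

    # Append each micro-batch to HDFS as Parquet, with checkpointing for recovery.
    query = (messages.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/raw/events")
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .outputMode("append")
             .start())
    query.awaitTermination()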
Confidential
Hadoop Developer/ ETL Developer
Responsibilities:
- Analyzed data sets from different sources/domains using SQL and Excel.
- Hands-on experience in writing SQL queries.
- Built sample aggregated reports using Excel pivot tables and VLOOKUPs.
- Analyzed and supported ETL workflows in Informatica.
- Developed code using ETL tools and documented the artifacts needed to deploy it.
- Used Informatica PowerCenter 8.6 and 9.5 for extraction, transformation, and loading (ETL) of data.
- Extensively used transformations such as Router, Aggregator, Normalizer, Joiner, Expression, Lookup, Update Strategy, Sequence Generator, Stored Procedure, XML, and SQL.
- Scheduled jobs using the AutoSys scheduler.
- Wrote Hive queries for data analysis and loaded data to meet business requirements.
- Developed Spark ETL jobs to apply business rules and transform data.
- Analyzed areas for performance improvement and implemented design approaches to increase performance.
- Created Hive tables and loaded them using Spark (a minimal sketch follows this section).
- Extracted data from various RDBMSs into HDFS using Sqoop.
- Worked on several enhancements to meet client requirements and delivered them on time with no defects.
- Produced high- and low-level designs, provided pseudocode, implemented prototypes, and conducted design reviews.
- Responded to assigned issues, conducted analysis, and suggested and implemented workarounds.
- Performed root-cause analysis of post-implementation issues and worked on fixes.
Environment: UNIX shell scripting, Informatica (ETL), Teradata, Agile development, SQL
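A minimal PySpark sketch of the create-and-load pattern referenced above (Hive DDL through Spark SQL, a simple rule applied with DataFrames, then a load into the Hive table); the table, path, and the amount-banding rule are hypothetical.

    # Minimal sketch, assuming hypothetical staged Parquet data at
    # hdfs:///staging/orders with order_id and order_amt columns.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, when

    spark = (SparkSession.builder
             .appName("hive-load-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Hive DDL issued through Spark SQL.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS db.orders (
            order_id  BIGINT,
            order_amt DOUBLE,
            amt_band  STRING
        ) STORED AS PARQUET
    """)

    staged = spark.read.parquet("hdfs:///staging/orders")

    # Example business rule: band each order by amount.
    orders = staged.withColumn(
        "amt_band",
        when(col("order_amt") >= 1000, "HIGH").otherwise("STANDARD"))

    # insertInto matches columns by position, so select in DDL order.
    (orders.select("order_id", "order_amt", "amt_band")
           .write.mode("append")
           .insertInto("db.orders"))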
Confidential
Hadoop Developer
Responsibilities:
- Interacted with business analysts to understand business requirements and translate them into technical requirements.
- Collaborated with technical experts, architects, and developers on the design and implementation of technical requirements.
- Built workflows using Informatica to load data into Teradata.
- Designed and developed data ingestion scripts using Sqoop from Teradata to Hadoop/HDFS.
- Consumed raw data and performed data cleaning to create a clean data source using Hive and Spark scripts (a minimal sketch follows this section).
- Implemented business logic using Spark DataFrames (adding new columns, joining multiple DataFrames) and loaded the data into Hive tables using Hive scripts.
- Performed QA checks across various environments using the Hadoop tech stack and shell scripting.
- Knowledge of SCD-type dimension and fact tables (star schema).
- Extensively used Hive partitioned tables, map-side joins, and bucketing, and gained a good understanding of dynamic partitioning.
- Used Hive and Impala to query and analyze data as part of QA.
- Wrote Sqoop scripts to import and export data between Teradata and HDFS.
- Created reports for the product/client team using Tableau.
- Performed optimizations and performance tuning in Spark.
- Knowledge of Git commands, version tagging, and pull requests.
- Practical experience with developing applications in IntelliJ and Jupyter Notebooks.
- Handled offshore and onsite change approval calls.
- Documented business requirements, technical specifications, and process flows.
Environment: Hadoop, Cloudera, Spark, Apache Pig, Apache Hive, HDFS, Sqoop, Oozie, IntelliJ, Git, UNIX shell scripting, Informatica, Teradata, Linux, Tableau, Agile development
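A minimal PySpark sketch of the cleaning step referenced above (deduplication, null filtering, derived columns, and enrichment through a DataFrame join) before loading to Hive; all table and column names are hypothetical.

    # Minimal sketch, assuming hypothetical tables db.customers_raw and
    # db.ref_state_codes sharing a state_code key.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, trim, to_date

    spark = (SparkSession.builder
             .appName("cleaning-sketch")
             .enableHiveSupport()
             .getOrCreate())

    raw = spark.table("db.customers_raw")
    ref = spark.table("db.ref_state_codes")   # small reference table

    clean = (raw.dropDuplicates(["customer_id"])
             .filter(col("customer_id").isNotNull())
             .withColumn("name", trim(col("name")))               # normalize text
             .withColumn("signup_dt", to_date(col("signup_ts")))  # derived column
             .join(ref, on="state_code", how="left"))             # enrich via join

    clean.write.mode("overwrite").saveAsTable("db.customers_clean")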
