Hadoop Developer Resume

San Jose

SUMMARY:

  • Hadoop Developer with 8+ years of Information Technology experience developing, enhancing, and maintaining banking and financial applications as well as retail data warehouse (DW) applications
  • Expert in designing data pipelines spanning ingestion, processing, and reporting
  • 4 years of hands-on experience implementing data platforms in Hadoop
  • Experience working with Agile and Waterfall models
  • Hands-on experience in SQL, HiveQL, Spark, DataStage (ETL), and performance tuning
  • Expert in process automation using scripting languages such as Python and Shell
  • Adept at maintaining focus on achieving bottom-line results while formulating and implementing advanced technology and business solutions to meet a diversity of needs
  • An active team player with effective communication and interpersonal skills
  • Enjoy brainstorming the merits of new ideas, establishing a direction, and proceeding with resolve. A quiet achiever, encouraging mentor, and decisive problem-solver
  • Consistently deliver business-critical projects on schedule and within budget, despite intense pressure and tight timelines

TECHNICAL SKILLS:

Big Data Platform: Hadoop, Hive, Sqoop, SparkSQL, PySpark, MapReduce (Python), YARN, HDFS, HBase

Scripting: Shell Scripting, Python

DBMS: Teradata, Oracle, DB2, MySQL

ETL: DataStage, Talend for Big Data

Version Control: Git, SVN, TFS

Cloud: AWS S3, EC2, Redshift

PROFESSIONAL EXPERIENCE:

Confidential, San Jose

Hadoop Developer

Technology: Hortonworks HDP 2.5.3, Hadoop, Hive, SparkSQL, PySpark, Teradata, Shell, Sqoop, Apache Livy, Git, Rally, HBase

Responsibilities:

  • Design data ingestion processes using Sqoop, Flume, and Kafka
  • Design ETL data pipelines using a combination of tools such as Hive, SparkSQL, HBase, and PySpark
  • Migrate existing production Spark jobs to run via the “Spark Compute as a Service using Apache Livy” framework, which enabled SparkSession sharing and improved performance
  • Design and develop a homogeneous layer on Hive so that various data sources adhere to the same data model
  • Design logic to implement ACID in Hive by integrating Hive with HBase
  • Process and load real-time data onto HDFS every 30 minutes using HiveQL
  • Develop reporting queries using OLAP functions on top of financial data in Hive and publish the results to business users at regular intervals
  • Automate manually run reports using shell scripting and Teradata, scheduled via crontab
  • Migrate Teradata tables to Hadoop using Hive, with orchestration via an internal Python framework

Confidential, Austin, TX

Hadoop Developer

Technology: MapR Hadoop v2, Hive, Pig, Spark, Shell, JIRA, Oozie

Responsibilities:

  • Design ETL pipelines using Hive and Spark (PySpark)
  • Develop job orchestration using Shell scripting
  • Refactor data ingestion into the Hadoop data lake from disparate third-party vendors for better performance, using SFTP, gsutil, and the Teradata connector for Sqoop
  • Refactor HDFS schema design according to best practices
  • Design scalable data layout in Hive by choosing the right file formats (Parquet, SequenceFile, ORC) and compression codecs (Snappy, LZO, etc.)
  • Develop SparkSQL code to replace traditional Hive MapReduce jobs
  • Work in Scrum mode, participating in sprint retrospectives and planning sessions
  • Develop automated testing scripts to perform QA
  • Schedule jobs using Oozie

Confidential, Bentonville, AR

Hadoop Developer

Technology: Talend, Hadoop, HiveQL, HDFS, Pig, Sqoop, Spark, UNIX, Oracle, Teradata, Azure

Responsibilities:

  • Provide BI consulting solutions
  • Use Big Data technologies such as Hadoop and Cassandra in BI data delivery
  • Involved in deployment of HDP cluster on Microsoft Azure
  • Migrate data from existing Teradata systems to a Hortonworks HDInsight cluster on Azure
  • Leverage core expertise in solution design and in managing enterprise-wide BI (Data Warehousing/Data Integration) implementations
  • Design and implement ETL solutions with tools such as DataStage and Talend Open Studio for Big Data
  • Perform data analysis over large datasets using Apache Pig, Apache Hive and Spark
  • Design and build data staging and summary (aggregated) areas in the Hive DW
  • Build data visualizations such as dashboards and scorecards using Tableau

Confidential

Hadoop Developer

Technology: DataStage, Unix Scripting, Teradata, Oracle, DB2, TFS

Responsibilities:

  • Create Technical Design and ETL mapping documents
  • Lead offshore team; allocate and track assigned tasks
  • Design DataStage ETL jobs, DataStage Sequences and Shell scripts
  • Unit test DataStage ETL jobs
  • Design the flow of execution using DataStage
  • Tune performance of SQL queries and DataStage jobs
  • Perform version control activities using Tortoise SVN

Confidential

Hadoop Developer

Technology: DataStage, Oracle, DB2, MySQL, Toad, IBM SORSA, SVN

Responsibilities:

  • Create Technical Design and ETL mapping documents
  • Perform impact analysis pertaining to DML and DDL changes to the Banking Data Warehouse
  • Prepare DDL and DML scripts
  • Design DataStage ETL jobs, DataStage Sequences and Shell scripts
  • Unit test DataStage ETL jobs
  • Design the flow of execution using DataStage
  • Tune performance of SQL queries and DataStage jobs
  • Perform version control activities using Tortoise SVN