Hadoop Developer Resume
San Jose
SUMMARY:
- Hadoop Developer with 8+ years of Information Technology experience in the development, enhancement and maintenance of Banking and Financial applications as well as Retail DW applications
- Expert in designing end-to-end data pipelines, from ingestion through processing to reporting
- 4 years of hands-on experience in implementing data platforms in Hadoop
- Experience in working with Agile and Waterfall models
- Hands-on experience in SQL, HiveQL, Spark, DataStage (ETL) and performance tuning
- Expert in process automation using scripting languages such as Python and Shell
- Adept at maintaining focus on achieving bottom-line results while formulating and implementing advanced technology and business solutions to meet a diversity of needs
- An active team player with effective communication and interpersonal skills
- Enjoy brainstorming the merits of new ideas, establishing a direction, and proceeding with resolve. A quiet achiever, encouraging mentor, and decisive problem-solver
- Consistently deliver business-critical projects on schedule and within budget, despite intense pressure and tight timelines
TECHNICAL SKILLS:
Big Data Platform: Hadoop, Hive, Sqoop, SparkSQL, PySpark, MapReduce (Python), YARN, HDFS, HBase
Scripting: Shell Scripting, Python
DBMS: Teradata, Oracle, DB2, MySQL
ETL: DataStage, Talend for Big Data
Version Control: Git, SVN, TFS
Cloud: AWS S3, EC2, Redshift
PROFESSIONAL EXPERIENCE:
Confidential, San Jose
Hadoop Developer
Technology: Hortonworks HDP 2.5.3, Hadoop, Hive, SparkSQL, PySpark, Teradata, Shell, Sqoop, Apache Livy, Git, Rally, HBase
Responsibilities:
- Design data ingestion processes using Sqoop, Flume and Kafka
- Design ETL data pipelines using a combination of tools such as Hive, SparkSQL, HBase and PySpark
- Migrate existing production Spark jobs to run via a "Spark Compute as a Service" framework built on Apache Livy, which enabled SparkSession sharing and improved performance
- Design and develop a homogeneous layer on Hive so that various data sources adhere to the same data model
- Design logic to implement ACID in Hive by integrating Hive with HBase
- Process and load real-time data onto HDFS every 30 minutes using HiveQL
- Develop reporting queries using OLAP functions on top of the financial data in Hive and publish the results to business users at regular intervals
- Automate manually run reports using shell scripting and Teradata, scheduled via crontab
- Migrate Teradata tables to Hadoop using Hive, with orchestration via an internal Python framework
Confidential, Austin, TX
Hadoop Developer
Programming Languages: MapR Hadoop v2, Hive, Pig, Spark, Shell, JIRA, Oozie
Responsibilities:
- Design ETL pipelines using Hive and Spark (PySpark)
- Develop job orchestration using Shell scripting
- Refactor data ingestion into the Hadoop data lake from disparate third-party vendors for better performance, using SFTP, gsutil and the Teradata connector for Sqoop
- Refactor HDFS schema design according to best practices
- Design scalable data layout in Hive by choosing the right file formats (Parquet, SequenceFile, ORC) and compression codecs (Snappy, LZO, etc.)
- Develop SparkSQL code to replace traditional Hive MapReduce jobs
- Work in a Scrum team; participate in sprint retrospectives and planning sessions
- Develop automated test scripts to perform QA
- Schedule jobs using Oozie
Confidential, Bentonville, AR
Hadoop Developer
Programming Languages: Talend, Hadoop, HiveQL, HDFS, Pig, Sqoop, Spark, UNIX, Oracle, Teradata, Azure
Responsibilities:
- Provide BI consulting solutions
- Use big data technologies such as Hadoop and Cassandra in BI data delivery
- Participate in the deployment of an HDP cluster on Microsoft Azure
- Migrate data from existing Teradata systems to a Hortonworks HDInsight cluster on Azure
- Leverage core expertise in solution design and in managing enterprise-wide BI (Data Warehousing/Data Integration) implementations
- Design and implement ETL solutions with tools such as DataStage and Talend Open Studio for Big Data
- Perform data analysis over large datasets using Apache Pig, Apache Hive and Spark
- Design and build data staging and summary (aggregated) areas in the Hive DW
- Build data visualizations such as dashboards and scorecards using Tableau
Confidential
Hadoop Developer
Programming Languages: DataStage, Unix Scripting, Teradata, Oracle, DB2, TFS
Responsibilities:
- Create Technical Design and ETL mapping documents
- Lead the offshore team; allocate and track assigned tasks
- Design DataStage ETL jobs, DataStage Sequences and Shell scripts
- Unit test DataStage ETL jobs
- Design the flow of execution using DataStage
- Tune the performance of SQL queries and DataStage jobs
- Participate in version control activities using TortoiseSVN
Confidential
Hadoop Developer
Programming Languages: DataStage, Oracle, DB2, MySQL, Toad, IBM SORSA, SVN
Responsibilities:
- Create Technical Design and ETL mapping documents
- Perform impact analysis pertaining to DML and DDL changes to the Banking Data Warehouse
- Prepare DDL and DML scripts
- Design DataStage ETL jobs, DataStage Sequences and Shell scripts
- Unit test DataStage ETL jobs
- Design the flow of execution using DataStage
- Tune the performance of SQL queries and DataStage jobs
- Participate in version control activities using TortoiseSVN
