Senior Hadoop Developer Resume
Iowa City, IA
SUMMARY:
- 11+ years of IT industry experience encompassing a wide range of skill sets, roles, and industry verticals.
- Certified Big Data (Hadoop) developer; certified in IBM Database Developer and Lean concepts.
- Hands-on experience in development and data analytics using Hadoop (Big Data) tools and technologies, including HDFS, MapReduce, Hive, Pig, HBase, Spark, Flume, Sqoop, DMX Sync Sort, HDInsight, Azure, and Oozie.
- Strong database skills in DB2, Hive, Oracle, SQL Server, MySQL, and BigSQL; experience with NoSQL databases such as HBase and familiarity with Cassandra.
- Experienced in installing, configuring, managing, and testing Hadoop ecosystem components.
- Experienced in developing MapReduce programs using Java.
- Used Apache Spark with Scala for large-scale data processing, real-time analytics, and ETL design.
- Experienced in data warehouse concepts and ETL tools (Ab Initio).
- Experienced in using Teradata SQL Assistant and in data import/export and loading with utilities such as BTEQ, MultiLoad, FastLoad, and FastExport in UNIX environments.
- Experienced with stored procedures, functions, triggers, macros, and SQL*Loader.
- Experienced in UNIX Shell Scripting
- Good knowledge of Maestro, StarTeam, Git, Build Forge, TWS Scheduler, and Control-M.
- Experienced with workflow schedulers and data architecture, including data ingestion pipeline design and data modeling.
- Possess functional knowledge in fixed insurance, retirement portfolios, financial systems, banking systems, and healthcare systems.
- Good experience in all phases of the systems life cycle: development, testing (unit, system, integration, and regression testing), and pre-production support.
- Proficient in analyzing and translating business requirements to technical requirements and architecture.
- Experienced in data governance and the PI classification process.
- Experienced in handling internal and external functional, process and data audits.
TECHNICAL SKILLS:
Big Data Ecosystems: HDFS, Hive, Pig, MapReduce, Spark, Sqoop, HBase, Cassandra, ZooKeeper, Flume, DMX, Oozie, Avro, and Hue
Languages: Scala, C#, .NET, Java, PL/SQL, Python, UNIX shell scripting, HiveQL, Pig Latin
Databases: MySQL, BigSQL, NoSQL, SQL Server, Oracle, Exadata, DMS 1100, DB2, PostgreSQL
Operating Systems: UNIX, Windows, MVS/ESA, z/OS
ETL/Reporting: Teradata
Methodologies: Waterfall, Scrum
Tools: RPM, MPP, TestDirector, TWS Scheduler, Jira, Quality Center, Service Center, SFTP, Teradata SQL Assistant, Toad, SSH, Hue, Eclipse, Maven, PuTTY, BigInsights, Cloudera, Hortonworks, Beeline Connect, Visual Studio, Visual Studio Code, CuteFTP, SQL Server Management Studio, Azure DevOps, Team Server, Power BI
PROFESSIONAL EXPERIENCE:
Confidential, Iowa City, IA
Senior Hadoop Developer
Responsibilities:
- Strong understanding of and practical experience in developing Spark applications with Scala.
- Developed Spark scripts using Spark shell commands as per requirements.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs in Spark for data aggregation (see the sketch after this list).
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Experienced in developing Spark SQL applications using both SQL and the DataFrame DSL.
- Worked extensively with the Parquet file format and gained practical knowledge in writing Spark and Hive applications against Parquet data.
- Experienced in using various compression codecs with the Parquet file format.
- Experienced in managing datasets and in creating test datasets for development purposes.
- Experienced in building dimension and fact tables using Spark Scala applications.
- Practical knowledge of writing Scala applications that interact with Hive through Spark.
- Extensively used Hive partitioned tables, map joins, and bucketing, and gained a good understanding of dynamic partitioning.
- Performed a POC on writing Spark applications in Scala, Python, and R.
- Good hands-on experience with Hive for data queries and analysis as part of QA.
- Practical experience using Pig for QA by calculating statistics on the final output.
- Experienced in designing both time-driven and data-driven automated workflows using Oozie.
- Experienced in writing Sqoop scripts to import data from Exadata into HDFS.
- Good exposure to MongoDB, its functionality, and its use cases.
- Gained good exposure to the Hue interface for monitoring job status, managing HDFS files, tracking scheduled jobs, and managing Oozie workflows.
- Performed optimizations and performance tuning in Spark and Hive
- Developed UNIX scripts to automate data loads into HDFS.
- Strong knowledge of HDFS commands for file management and a good understanding of managing the file system through Spark Scala applications.
- Extensive use of aliases for Oozie and HDFS commands.
- Experienced in managing and reviewing Hadoop log files.
- Experienced in controlling logging for Spark applications, with extensive use of Log4j to log each phase of the application.
- Good knowledge of Git commands, version tagging, and pull requests.
- Performed unit and integration testing after development and participated in code reviews.
- Experienced in writing JUnit test cases for Spark and Spark SQL applications.
- Practical experience developing applications with IntelliJ and Maven.
- Good exposure to Agile environments; participated in daily stand-ups, big room planning, sprint meetings, and team retrospectives.
- Interacted with business analysts to understand business requirements and translate them into technical requirements.
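A minimal sketch of the DataFrame/RDD aggregation and partitioned Parquet output pattern described above, assuming the Spark 1.6/Scala 2.10 stack listed in the environment below; the object name, paths, and column names (claims_raw, state, claim_year, claim_amt) are hypothetical placeholders rather than project artifacts.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions.sum

// Illustrative sketch only: paths and column names are placeholders.
object ClaimsAggregation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ClaimsAggregation"))
    val sqlContext = new HiveContext(sc)

    // Read a Parquet dataset and aggregate with the DataFrame API.
    val claims = sqlContext.read.parquet("/data/claims_raw")
    val byState = claims.groupBy("state", "claim_year")
                        .agg(sum("claim_amt").as("total_amt"))

    // The same totals expressed as a pair-RDD aggregation, for comparison.
    val totalsRdd = claims
      .map(r => (r.getAs[String]("state"), r.getAs[Double]("claim_amt")))
      .reduceByKey(_ + _)

    // Persist the aggregate as Parquet, partitioned by year.
    byState.write.mode("overwrite").partitionBy("claim_year").parquet("/data/claims_agg")

    sc.stop()
  }
}
```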
Environment: Hadoop 2.6.0-cdh5.7.0, Java 1.8.0_92, Spark 1.6.0, SparkSQL, R, Python, Scala 2.10.5, MongoDB, Apache Pig 0.12.0, Apache Hive 1.1.0, HDFS, Sqoop, Oozie, Maven, IntelliJ, Git, UNIX shell scripting, Oracle 11g/10g, Log4j, Linux, Agile development
Confidential
Senior Data Engineer
Responsibilities:
- Planned, designed, and performed end-to-end setup of data lake sandbox instances.
- Developed numerous Spark jobs in Scala for data cleansing and for analyzing data in Impala 2.1.0.
- Developed FTP scripts to bring files from different sources into the Hadoop data lake.
- Responsible for ingestion of data from Blob storage to Kusto and for maintaining the PPE and PROD pipelines.
- Developed Python scripts for NLP pattern search.
- Responsible for creating Hive tables and partitions, loading data, and writing Hive queries (see the sketch after this list).
- Imported and exported data with Sqoop between the Hadoop Distributed File System (HDFS) and relational database systems.
- Responsible for building reporting dashboards in Power BI and publishing them to the cloud for users.
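As a companion to the Hive table and partition work above, a minimal, hypothetical Spark/Scala sketch follows (assuming a Spark 2.x SparkSession with Hive support); the database, table, and column names are invented for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch only: database, table, and column names are hypothetical.
object HivePartitionLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HivePartitionLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Create a partitioned Hive table if it does not already exist.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS telemetry.events (
        |  event_id   STRING,
        |  event_type STRING,
        |  payload    STRING)
        |PARTITIONED BY (event_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Load staged rows into the matching partitions via dynamic partitioning.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE telemetry.events PARTITION (event_date)
        |SELECT event_id, event_type, payload, event_date
        |FROM telemetry.events_staging""".stripMargin)

    spark.stop()
  }
}
```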
Environment: Cloudera, Azure, HDFS, Azure DevOps, HIVE, Scala, PYTHON, OOZIE, SQL Server, UBUNTU/UNIX, Visual Studio Code, Scrum/Agile, Power BI, HUE, PuTTY, Beeline Connect, TWS Scheduler.
Confidential
Lead Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hive, Impala, Scala & DMX
- Imported and exported data into HDFS and Hive using Sqoop.
- Developed FTP scripts to bring files from different sources into Hadoop.
- Used DMX Sync Sort to build an ETL pipeline for packed data.
- Implemented A/B switch logic with Oracle Exadata for parallel processing.
- Implemented partitioning in Hive.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Deployed algorithms in PySpark using complex datasets.
- Experienced in using SequenceFile, Avro, and Parquet file formats.
- Prepared project plans and estimations.
Environment: HADOOP, HDFS, MAPREDUCE, HIVE, PIG, Scala, PYTHON, HBASE, OOZIE, DMX Sync Sort, YARN, Spark, Core Java, Oracle Exadata, UBUNTU/UNIX, Eclipse, JDBC drivers, MySQL, Linux, XML, CRM, SVN, HUE, PuTTY, Cloudera, Beeline Connect, TWS Scheduler.
Confidential, New York
Lead Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hive, Impala, Spark & Greenplum
- Implemented partitioning and bucketing in Hive (see the sketch after this list).
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Deployed algorithms in Scala with Spark using complex datasets and performed Spark-based development in Scala.
- Created Java UDFs for Pig and Hive.
- Experienced in using SequenceFile, Avro, Parquet, and text file formats.
- Good working knowledge of Amazon Web Services components such as EC2, EMR, S3, EBS, and ELB.
- Prepared estimations and technical design specifications for projects.
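The partitioning and bucketing bullet above can be illustrated with a short, hypothetical Spark/Scala sketch; note that it uses Spark's writer-side partitionBy/bucketBy API as a stand-in for Hive DDL, and the table, path, and column names are invented, so it shows the general pattern rather than the project's actual code.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Illustrative sketch only: table, path, and column names are hypothetical,
// and bucketBy here is Spark's writer-side bucketing rather than Hive DDL.
object BucketedCustomerTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BucketedCustomerTable")
      .enableHiveSupport()
      .getOrCreate()

    val txns = spark.read.parquet("/data/customer_txns")

    // Partition by transaction date and bucket by customer id so that
    // joins and aggregations on cust_id can avoid a full shuffle.
    txns.write
      .mode(SaveMode.Overwrite)
      .partitionBy("txn_date")
      .bucketBy(32, "cust_id")
      .sortBy("cust_id")
      .saveAsTable("analytics.customer_txns_bucketed")

    spark.stop()
  }
}
```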
Environment: HADOOP, HDFS, MAPREDUCE, HIVE, PIG, Scala, HBASE, OOZIE, yarn, Spark, Core Java, Teradata, SQL, UBUNTU/UNIX, eclipse, Maven, JDBC drivers, MySQL, Linux, AWS, XML, CRM, SVN, HUE, Putty, Cloudera, Beeline connect, TWS Scheduler.
Confidential, CA
Hadoop Developer
Responsibilities:
- Created the project using Hive, BigSQL, and Pig.
- Involved in data modeling in Hadoop.
- Created Hive tables and worked on them using HiveQL.
- Wrote Apache Pig scripts to process HDFS data.
- Automated tasks using UNIX shell scripts.
Environment: HADOOP, HDFS, MAPREDUCE, HIVE, PIG, Scala, Python, HBASE, OOZIE, yarn, Spark, Core Java, Oracle, SQL, UBUNTU/UNIX, eclipse, Maven, JDBC drivers, Mainframe, MySQL, Linux, AWS, XML, CRM, SVN, PDSH, Putty, BigInsights
Confidential, NJ
Senior Developer
Responsibilities:
- Understood the requirements and built the HBase data model.
- Loaded historical data as well as incremental customer and other data into Hadoop through Hive.
- Imported and exported large data sets between various data sources and HDFS using Sqoop.
- Load balancing of data across the cluster and performance tuning of various jobs running on the cluster.
- Developed Oozie workflow for scheduling and orchestrating the ETL process.
- Developed applications using Eclipse
- Performed process enhancement by SQL Tuning.
Environment: HADOOP, HDFS, MAPREDUCE, Java, HIVE, Hue, PIG, Flume, SQOOP, HBASE, OOZIE, Yarn, Zookeeper, Eclipse, Maven, BigInsights
Confidential, NJ
Senior Developer
Responsibilities:
- Designed the TDD (low level) from the SRS (high level).
- Used Python scripts to transform the data.
- Fixed issues with existing FastLoad/MultiLoad scripts for smoother, more effective loading of data into the warehouse.
- Created BTEQ scripts with data transformations for loading the base tables.
- Generated reports using Teradata BTEQ.
- Worked on optimizing and tuning Teradata SQL to improve batch performance and data response time for users.
- Used the FastExport utility to extract large volumes of data and send files to downstream applications.
Environment: Teradata V2R12, Teradata SQL Assistant, MLOAD, FASTLOAD, BTEQ, Erwin, Unix Shell Scripting, Macros, Stored procedure, Db2, Cobol, Python, SAS, PL/SQL, FileZilla
Confidential
Developer
Responsibilities:
- Developed Windows applications for the annual APR review process using VB.NET.
- Created and reorganized all types of database objects, including tables, views, indexes, sequences, and synonyms, setting proper parameters and values for each object.
- Wrote database triggers, stored procedures, stored functions, and stored packages to perform various automated tasks for better performance.
- Created Shell Scripts for invoking SQL scripts.
- Effectively made use of Table Functions, Indexes, Table Partitioning, Analytical functions, and Materialized Views
- Experienced in performance tuning for Oracle RDBMS using EXPLAIN PLAN and hints.
- Involved in the continuous enhancements and fixing of production problems.
- Verified and validated data using SQL queries.
Environment: SQL Server, .NET, C#, SQL, UNIX, SQL*Loader, ASPOSE, Spreadsheet gear, SQL Navigator, TOAD, SQL DEVELOPER.
Confidential
Developer
Responsibilities:
- Analyzed requirements and prepared design documents and flow charts of the old system.
- Provided post-production support for developed modules.
- Involved in coding and unit testing.
- Prepared MTP and DTP for BIs.
- Scheduled quality assurance reviews (QAs) as and when required.
- Performed system testing for different procedures under different BIs.
- Involved in CAT (Combined Application Testing) and provided support.
Environment: Cobol, HP service center, Db2, Oracle, Unix, Shell script, Macros