
Big Data Developer Resume


SUMMARY

  • 9+ years of IT experience, including 4+ years as a Big Data Developer using Hadoop and 8+ years in Informatica/Teradata, with extensive knowledge of the Banking, Retail, and Telecom domains.
  • Experience in using SDLC methodologies like Waterfall, Agile Scrum for design and development.
  • Hands on experience in major components of Big Data/Hadoop ecosystem like Map Reduce, HDFS, Hive, Pig, HBase, Zookeeper, Sqoop, Oozie, YARN, Spark, Spark Streaming, Spark SQL, Kafka, Impala.
  • Experience in managing data extraction jobs and building new data pipelines from various structured and unstructured sources into Hadoop.
  • Experienced with RDBMS and NoSQL databases like Oracle, DB2, SQL Server, and HBase.
  • Experience in writing Spark RDD transformations and actions on input data, and in using Spark SQL queries and DataFrames to import data from data sources, perform transformations and read/write operations with Spark Core, and save the results to an output directory in HDFS (a minimal sketch follows this list).
  • Experience in Spark Core and Spark SQL with the Java and Scala APIs.
  • Experience in Amazon cloud services like the Amazon EMR File System and Simple Storage Service (S3).
  • Good experience in shell scripting for automation and monitoring.
  • Good experience in ETL, data integration, and migration; extensively used ETL methodology to support data extraction, transformation, and loading in Hive, Pig, and HBase.
  • Experience with Cloudera Hadoop distributions (CDH 4/CDH 5).
  • Expertise in optimizing network traffic using combiners, joining datasets with multiple schemas using joins, and organizing data using partitions and buckets.
  • Experience in data load management, importing and exporting data between HDFS and relational and non-relational database systems using Sqoop.
  • Extensive experience with real-time streaming applications and batch-style, large-scale distributed computing applications that integrate Kafka and Spark.
  • Experience in developing data pipelines using Kafka, Spark, and Hive to ingest, transform, and analyse data (see the streaming sketch after this list).
  • Experience in scheduling MapReduce/Hive jobs using Oozie.
  • Experience in ingesting large volumes of data into Hadoop using Sqoop.
  • Experience in writing real-time query processing using Cloudera Impala.
  • Hands on knowledge on RDD transformations, DataFrame transformations in Spark.
  • Developed Python scripts to test SQL queries for SQL injection vulnerabilities.
  • Implemented automation, traceability, and transparency for every step of the process to build trust in data and streamline data science efforts using Python, Java, Hadoop Streaming, Apache Spark, Spark SQL, Scala, Hive, and Pig.
  • Performed data validation and transformation using Python and Hadoop streaming.
  • Developed Spark scripts by using Python shell commands.
  • Worked with different data sources like Avro data files, XML files, JSON files, SQL server and Oracle to load data into Hive tables.
  • Specified the cluster size, resource pool allocation, and Hadoop distribution by writing the specifications in JSON format.
  • Experience in collecting log data and JSON data into HDFS using Flume and processing the data with Hive/Pig.
  • Involved in developing a linear regression model, built using Spark with the Scala API, to predict a continuous measurement and improve observations on wind turbine data.
  • Used Spark and Spark SQL with the Scala API to read Parquet data and create Hive tables (see the Parquet-to-Hive sketch after this list).
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Strong working experience in planning and carrying out Teradata system extraction using Informatica, loading processes, data warehousing, large-scale database management, and re-engineering.
  • Experience in developing, supporting, and maintaining ETL processes using Informatica PowerCenter.
  • Highly experienced in creating complex Informatica mappings and workflows working with major transformations.
  • Good knowledge of Data Warehousing, ETL development, Distributed Computing, and large-scale data processing.
  • Good knowledge on Agile Methodology and the scrum process.
  • Excellent interpersonal skills comfortable presenting to large Groups, preparing written communications and presentation material.
  • Flexible and Quick learner, who can adapt and execute in any fast-paced environment.
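
The following is a minimal sketch of the Spark RDD and DataFrame work referenced above, assuming hypothetical input paths, column names, and the TransformJob object name (none of these come from the resume itself): raw text is read into an RDD, transformed and aggregated, queried through Spark SQL as a DataFrame, and written back to HDFS.

```scala
import org.apache.spark.sql.SparkSession

object TransformJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("TransformJob").getOrCreate()
    import spark.implicits._

    // RDD transformations and actions on raw input (placeholder path)
    val lines = spark.sparkContext.textFile("hdfs:///data/input/transactions.csv")
    val amounts = lines.map(_.split(","))
      .filter(_.length >= 3)
      .map(fields => (fields(0), fields(2).toDouble)) // (accountId, amount)
    val totalByAccount = amounts.reduceByKey(_ + _)    // transformation
    println(s"Accounts processed: ${totalByAccount.count()}") // action

    // DataFrame / Spark SQL over the aggregated data
    val df = totalByAccount.toDF("account_id", "total_amount")
    df.createOrReplaceTempView("account_totals")
    val top = spark.sql(
      "SELECT account_id, total_amount FROM account_totals ORDER BY total_amount DESC")

    // Save the results to an output directory in HDFS
    top.write.mode("overwrite").parquet("hdfs:///data/output/account_totals")
    spark.stop()
  }
}
```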
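
The streaming sketch below illustrates the Kafka-to-Hadoop ingestion pattern mentioned above, using Spark Structured Streaming rather than the DStream API; the broker address, topic, and HDFS paths are assumptions, and the spark-sql-kafka connector is assumed to be on the classpath. Micro-batches land as Parquet files that a Hive external table can be defined over.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("KafkaIngest").getOrCreate()

    // Read a Kafka topic as a streaming DataFrame (broker and topic are placeholders)
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()

    // Kafka keys/values arrive as bytes; cast to string and add an ingest timestamp
    val events = stream
      .selectExpr("CAST(key AS STRING) AS event_key", "CAST(value AS STRING) AS payload")
      .withColumn("ingest_ts", current_timestamp())

    // Append micro-batches as Parquet files under an HDFS location
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///warehouse/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```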
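
A short Parquet-to-Hive sketch of the step described above, assuming hypothetical paths and a hypothetical database/table name; it reads Parquet data with the Scala API and materializes it as a Hive table via a CTAS statement.

```scala
import org.apache.spark.sql.SparkSession

object ParquetToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetToHive")
      .enableHiveSupport() // needed so the table is registered in the Hive metastore
      .getOrCreate()

    // Read Parquet data from HDFS (placeholder path)
    val df = spark.read.parquet("hdfs:///data/curated/orders")

    // Register a temporary view and persist it as a Hive table
    df.createOrReplaceTempView("orders_staging")
    spark.sql(
      """CREATE TABLE IF NOT EXISTS analytics.orders
        |STORED AS PARQUET
        |AS SELECT * FROM orders_staging""".stripMargin)

    spark.stop()
  }
}
```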

TECHNICAL SKILLS

Operating Systems: Windows XP/NT/2000, Unix/Linux

Programming Languages: Java, SQL, PL/SQL, Scala

Frameworks: Hadoop (Sqoop, HDFS, Hive, Impala, Pig, MapReduce), Kafka, Spark (Spark Core, Spark SQL, Spark Streaming)

RDBMS: Teradata, Oracle 9i/10g/11g/12c, MySQL 5.5, DB2, SQL Server

Tools: SQL Developer, Toad, Tableau, Jira, Informatica PowerCenter 9.5/9.0.1/9/8.6.1, MS Visio, ServiceNow, HP Quality Center, Autosys

Version Controller: Git, SVN

IDE: IntelliJ, Eclipse

PROFESSIONAL EXPERIENCE

Confidential

Big Data Developer

Responsibilities:

  • Developed and executed shell scripts to automate jobs and wrote complex Hive queries.
  • Worked on reading multiple data formats on HDFS using Spark.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the conversion sketch after this list).
  • Analysed the SQL scripts and designed the solution to implement using Spark.
  • Involved in loading data from Oracle to HDFS and AWS S3.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Created files and tuned the SQL queries in Hive utilizing HUE (Hadoop User Experience).
  • Analysed substantial data sets by running Hive queries and Pig scripts.
  • Implemented the Kafka real-time streaming API to load tables from SQL Server to the Teradata server.
  • Implemented the Kafka real-time streaming API using Kafka-Kyte to load data in real time from SQL Server to Hive/Hadoop.
  • Automated Kafka real-time integration with Udeploy (the Cigna DevOps application); integrated with Git-based CI/CD for environments in the Hadoop cluster.
  • Implemented Kafka Connect to ingest real-time streams of HL7 data (i.e., CCD/SCD data) into HDFS.
  • Involved in analysis, design, testing phases and responsible for documenting technical specifications.
  • Developed Kafka producers and consumers, HBase clients, Spark jobs, and Hadoop MapReduce jobs, along with components on HDFS and Hive, using AWS EMR (a producer/consumer sketch follows this list).
  • Implemented automation, traceability, and transparency for every step of the process to build trust in data and streamline data science efforts using Python, Java, Hadoop Streaming, Apache Spark, Spark SQL, Scala, Hive, and Pig.
  • Performed data validation and transformation using Python and Hadoop streaming.
  • Developed Spark scripts by using Python shell commands.
  • Worked with the Spark ecosystem, using Spark SQL and Scala queries on different formats like text and CSV files.
  • Managed real-time data processing and real time Data Ingestion.
  • Facilitated daily scrum meetings, sprint planning, sprint reviews, and sprint retrospectives.
  • Worked on the core and Spark SQL modules of Spark extensively.
  • Involved in importing the real time data to Hadoop using Kafka and implemented Oozie job for daily imports.
  • Involved in migrating tables from RDBMS (Oracle, SQL Server, DB2) into Hive tables using Sqoop.
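
A hedged illustration of the Hive-to-Spark conversion described above: an aggregation that might be written in HiveQL (shown in a comment) re-expressed as RDD transformations in Scala. The table name, column names, and HiveToRdd object are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HiveToRdd {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToRdd")
      .enableHiveSupport()
      .getOrCreate()

    // Illustrative Hive query being converted:
    //   SELECT region, COUNT(*) FROM sales WHERE amount > 100 GROUP BY region
    val salesRdd = spark.table("sales").rdd // rows from an existing Hive table

    val countsByRegion = salesRdd
      .filter(row => row.getAs[Double]("amount") > 100.0) // WHERE amount > 100
      .map(row => (row.getAs[String]("region"), 1L))      // project the grouping key
      .reduceByKey(_ + _)                                 // GROUP BY region, COUNT(*)

    countsByRegion.collect().foreach { case (region, cnt) =>
      println(s"$region\t$cnt")
    }
    spark.stop()
  }
}
```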
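
A minimal producer/consumer sketch of the kind of Kafka client work listed above, written against the standard Kafka Java client from Scala; the broker address, topic, and group id are placeholders, not values from the project.

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaClients {
  private val Brokers = "broker1:9092" // placeholder broker list

  // Minimal producer: send one string record to a topic
  def send(topic: String, key: String, value: String): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", Brokers)
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord[String, String](topic, key, value))
    producer.close() // flushes any pending records
  }

  // Minimal consumer: poll once and print whatever arrived
  def pollOnce(topic: String): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", Brokers)
    props.put("group.id", "example-group")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList(topic))
    val records = consumer.poll(Duration.ofSeconds(5)).iterator()
    while (records.hasNext) {
      val r = records.next()
      println(s"${r.key()} -> ${r.value()}")
    }
    consumer.close()
  }
}
```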

Confidential

Big Data Developer

Responsibilities:

  • Created Sqoop jobs to import data from SQL Server, Oracle, and Teradata to HDFS.
  • Created Hive tables to push the data to MongoDB.
  • Designed a highly efficient data model for optimizing large-scale queries, utilizing Hive complex data types and the Parquet file format.
  • Performed data validation and transformation using Python and Hadoop streaming.
  • Worked on loading data from different data sources (Teradata and DB2) into HDFS using Sqoop and loading it into partitioned Hive tables (see the sketch after this list).
  • Automated workflows using shell scripts and Control-M jobs to pull data from various databases into the Hadoop data lake.
  • Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala, with good experience using Spark Shell and Spark Streaming.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
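
A hedged sketch of the partitioned Hive/Parquet loading step described above: the source extract is assumed to have already landed on HDFS (for example via a Sqoop import), and the paths, partition column, and database/table names are all hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object LoadPartitionedHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LoadPartitionedHive")
      .enableHiveSupport()
      .getOrCreate()

    // Staged extract on HDFS (e.g., landed by a Sqoop import); path is a placeholder
    val staged = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/teradata/customer_txn")

    // Write into a Hive table partitioned by load date, stored as Parquet
    staged.write
      .mode("append")
      .format("parquet")
      .partitionBy("load_date") // assumes a load_date column in the extract
      .saveAsTable("warehouse.customer_txn") // assumes the warehouse database exists

    spark.stop()
  }
}
```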

Confidential

ETL Developer

Responsibilities:

  • Developed and implemented technical best practices for data movement, data quality, data cleansing, and other ETL-related activities. Developed and ensured adherence to locally defined standards for all developed components.
  • Managed the ETL repository, maintained appropriate backups, and supported a multi-project, multi-concurrent-release ETL environment.
  • Documented designs and data maps, developed data quality components, and established and conducted unit tests.
  • Analyzed source system data to assess data quality and worked with technical and business representatives to determine strategies for handling identified data anomalies (a small profiling sketch follows this list).
  • Designed ETL processes and developed source-to-target data mappings, integration workflows, and load processes.
  • Designed, developed, and implemented enterprise-class data warehousing solutions.
  • Developed, tested, integrated and deployed ETL routines using ETL tools and external programming/scripting languages as necessary. Provided data analysis and technical documentation for both source and target mappings.
  • Helped administer and maintain the ETL environment in a support role with systems administrators, including performing configuration management activities related to the ETL environment.
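
The mapping and data-quality work in this role was done in Informatica PowerCenter, which is configured in the Designer GUI rather than written as code. Purely as an illustration of the source-data profiling described in the bullets above, the following hedged Spark/Scala sketch (all paths and column names hypothetical) counts null and duplicate keys in a staged extract, the kind of anomaly that would then be raised with business representatives.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DataQualityCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DataQualityCheck").getOrCreate()

    // Source extract staged as CSV (placeholder path and key column)
    val src = spark.read.option("header", "true").csv("hdfs:///staging/source_extract")

    // Simple profile: total rows, null keys, and duplicate keys
    val totalRows = src.count()
    val nullKeys  = src.filter(col("customer_id").isNull).count()
    val dupKeys   = src.groupBy("customer_id").count().filter(col("count") > 1).count()

    println(s"rows=$totalRows nullKeys=$nullKeys duplicateKeys=$dupKeys")
    spark.stop()
  }
}
```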
