Software Engineer - Big Data Resume

PA

SUMMARY

  • 7+ years of professional IT experience, including about 3 years with Big Data ecosystem technologies such as Hadoop HDFS, MapReduce, Apache Pig, Hive, Sqoop, Spark, Scala, HBase, Flume, and Oozie.
  • Very good knowledge of object-oriented concepts, with complete software development life cycle (SDLC) experience: requirements gathering, detailed design, development, and system and user acceptance testing.
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop.
  • Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • Worked on UNIX/Linux operating systems and developed various shell scripts.
  • Worked with HiveQL to query data from Hive tables in HDFS.
  • Worked with the Cloudera CDH 3 and 4 Big Data distributions. Used Pig Latin scripts and custom UDFs to analyze large data sets.
  • Good working knowledge of ETL and Big Data query tools such as Pig Latin and HiveQL.
  • Good hands-on NoSQL database experience with HBase.
  • Extracted and loaded data from MySQL, Oracle, and SQL Server using Sqoop.
  • Hands-on experience writing MapReduce jobs in the Hadoop ecosystem, including Hive and Pig.
  • Experience developing pipelines that process data from various sources with Hive and Pig.
  • Good working knowledge of core Java concepts.
  • Experience working with Flume to load log data from multiple sources directly into HDFS.
  • Experience designing both time-driven and data-driven automated workflows using Oozie.
  • Worked on Linux shell scripts for business processes and for loading data from different interfaces into HDFS.
  • Knowledge and understanding of the NoSQL database MongoDB.
  • Experience in Scrum, Agile and Waterfall models.
  • Deployed VMs in AWS, GCP, and Azure using Terraform providers.
  • Experience designing, architecting, and implementing scalable cloud-based web applications using AWS and Azure.
  • Strong experience with Informatica PowerCenter, combined with strong business understanding and knowledge of extraction, transformation, and loading of data from source systems such as flat files, Excel, XML, Oracle, and SQL Server.
  • Extensively involved in ETL data warehousing using Informatica PowerCenter 7.x/8.x/9.x Designer tools such as Source Analyzer, Target Designer, Mapping Designer, Mapplet Designer, Transformation Developer, Workflow Manager, and Workflow Monitor.
  • Experience creating reusable transformations (Joiner, Sorter, Aggregator, Expression, Lookup, Router, Filter, Update Strategy, Normalizer, and Rank) and mappings using Informatica Designer, and processing tasks using Workflow Manager to move data from multiple sources into targets.
  • Experience with dimensional modeling using star schema.
  • Hands-on experience identifying and resolving performance bottlenecks at various levels, such as sources, mappings, and sessions.
  • Good understanding of data warehousing (DW) principles, including fact tables, dimension tables, and star schema.
  • Involved in unit testing to verify that data loads into targets are accurate.
  • Good working knowledge of writing SQL joins, nested queries, and unions.
  • Created slowly changing dimension (SCD) Type 1/2 mappings.
  • Proficient in coding with SQL, SQL*Plus, PL/SQL, and stored procedures/functions.
  • Good experience performing and supporting unit testing, system integration testing (SIT), UAT, and production support for issues raised by application users.
  • A team player with good technical, communication, and interpersonal skills, fast learning, and creative analytical abilities.

TECHNICAL SKILLS

Programming Languages: PL/SQL, Java (Core), Scala, VB.NET, C#.NET.

Operating Systems: Windows (NT/2000/XP/7/8), Linux, UNIX

Databases: Oracle 10g/11g, MS SQL Server 2008, MySQL, HBase (NoSQL), MongoDB (NoSQL)

Big Data ecosystem: Hadoop - HDFS, MapReduce, Apache Pig, Hive, HBase, Flume, Oozie, MongoDB

IDE Tools: Eclipse

Web Technologies: ASP.NET, HTML, XML

ETL Tools: Informatica PowerCenter 7.x/8.x/9.x

OLAP concepts: Data warehousing

Other Technologies: SQL Developer, TOAD.

PROFESSIONAL EXPERIENCE

Software Engineer - Big Data

Confidential - PA

Responsibilities:

  • Analyzed and defined the client’s strategy and determined the system architecture and requirements needed to achieve its goals.
  • Created Hive tables, and loaded and analyzed data using Hive queries for reports.
  • Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
  • Developed Hive queries for the analysts.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs.
  • Worked within the Spark ecosystem using Spark SQL and Scala queries on formats such as text and CSV files.
  • Worked with a variety of data formats such as JSON, compressed CSV, and Parquet.
  • Worked in an AWS environment for development and deployment of custom Hadoop applications.
  • Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon AWS EC2 instances.
  • Very good experience with version control.
  • Developed and deployed results using Spark and Scala code on a Hadoop cluster running on Azure.
  • Imported and exported data into HDFS and Hive using Spark and Scala.
  • Analyzed and transformed data with Hive and Pig.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Developed job flows to automate the workflow for loading data to AWS S3 buckets and Hive.
  • Worked on CDAP to automate and schedule the tasks.
  • Designed and implemented multi-level partitioning and bucketing in Hive.
  • Loaded aggregated data onto Amazon S3 buckets from the Hadoop environment using Sqoop for dashboard reporting.
  • Performed ETL operations such as transformations, event joins, bot traffic filtering, and some pre-aggregations before storing the data in the cloud.
  • Designed and developed mappings, mapplets, and sessions from various source databases to target databases using Informatica PowerCenter, and tuned mappings to improve performance.
  • Good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Used Agile methodology in developing the application, including iterative development, weekly status reports, and stand-up meetings.
  • Worked with Spark on top of YARN/MRv2 for interactive and batch analysis.
  • Worked closely with AWS EC2 infrastructure teams to troubleshoot complex issues.
  • Responsible for the design and development of high-performance, robust software using Scala.
  • Developed various reports from large amounts of data that can be integrated with the application.
  • Used Azure Data Factory to transform and move data from virtual machines to Data Factory, Blob storage, and SQL Server.
  • Wrote Scala code using higher-order functions for iterative algorithms in Spark, with performance in mind.
  • Worked with support to investigate problems, perform root-cause analysis, and deliver bug fixes.
  • Used the Spark application master to monitor Spark jobs and capture their logs.
  • Implemented Spark jobs in Scala using DataFrames and the Spark SQL API for faster testing and processing of data.
  • Performed analytics and visualization on log data to estimate the error rate and study the probability of future errors using regression models.
  • Worked on Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods.
  • Implemented Spring Boot microservices to process messages into the Kafka cluster.
  • Implemented Kafka producer and consumer applications on a Kafka cluster with the help of ZooKeeper.
  • Managed and upgraded the MVC framework and Scala.
  • Refactored existing Spark batch processes for different logs, written in Scala and Python.
  • Applied Spark machine learning techniques implemented in Scala and Python.

Environment: JDK 1.6, HDFS, MapReduce, Apache Pig, Sqoop, Hive, Ubuntu/CentOS, Oracle 10g, Eclipse, Linux.

Hadoop Consultant

Confidential

Responsibilities:

  • Moved data from Oracle to HDFS and vice versa using Sqoop.
  • Wrote Apache Pig scripts to process HDFS data.
  • Created Hive tables, and loaded and analyzed data using Hive queries for reports.
  • Installed and configured Pig, and wrote Pig Latin scripts.
  • Wrote MapReduce jobs using Pig Latin.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Analyzed and transformed data with Hive and Pig.
  • Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinated tasks among the team.
  • Used Google Cloud Functions with Python to load CSV files into BigQuery as they arrived in a GCS bucket.
  • Developed and deployed results using Spark and Scala code on a Hadoop cluster running on GCP (Google Cloud).
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Worked with Kudu to stream data in from live real-time sources using the Java client, then processed it immediately upon arrival using Spark, Impala, or MapReduce.
  • Developed job flows to automate the workflow for Pig and Hive jobs.
  • Collected and aggregated large amounts of data using Apache Flume and staged it in HDFS for further analysis.
  • Designed and implemented multi-level partitioning and bucketing in Hive.
  • Loaded aggregated data into Oracle from the Hadoop environment using Sqoop for dashboard reporting.
  • Extensively involved in performance tuning of Oracle queries.
  • Good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Used Agile methodology in developing the application, including iterative development, weekly status reports, and stand-up meetings.
  • Worked with Spark on top of YARN/MRv2 for interactive and batch analysis.
  • Worked closely with AWS EC2 infrastructure teams to troubleshoot complex issues.
  • Wrote Scala code using higher-order functions for iterative algorithms in Spark, with performance in mind.
  • Worked on Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods.
  • Implemented Spring Boot microservices to process messages into the Kafka cluster.
  • Implemented Kafka producer and consumer applications on a Kafka cluster with the help of ZooKeeper.
  • Managed and upgraded the MVC framework and Scala.
  • Proficient with container systems such as Docker and container orchestration platforms such as EC2 Container Service and Kubernetes; worked with Terraform.
  • Managed Docker orchestration and containerization using Kubernetes.
  • Used Kubernetes to orchestrate the deployment, scaling, and management of Docker containers.
  • Performed performance tuning of mappings and ETL procedures at both the mapping and session levels.
  • Used Workflow Monitor to monitor tasks, workflows, and performance.
  • Worked within a team to populate Type I and Type II slowly changing dimension customer tables.
  • Loaded facts and dimensions from source to target data marts.
  • Used mapplets, parameters, and variables to implement object-oriented techniques and facilitate code reuse.

Environment: JDK 1.6, HDFS, MapReduce, Apache Pig, Sqoop, Hive, Ubuntu/CentOS, Oracle 10g, Eclipse, Linux.
