
Big Data Engineer Resume


Virginia Beach, VA

SUMMARY

  • Over 12 years of experience across the software development lifecycle - software analysis, design, development, testing, deployment, and maintenance.
  • Strong background in Big Data technologies such as Spark, Azure, Scala, Hadoop, Storm, batch processing, HDFS, MapReduce, Kafka, Hive, Cassandra, Python, Sqoop, and Pig.
  • Hands-on experience with Apache Spark and its components (Spark Core and Spark SQL)
  • Experienced in converting HiveQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this list)
  • Hands-on experience in in-memory data processing with Apache Spark
  • Developed Spark scripts using the Scala shell as per requirements
  • Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle
  • Broad understanding of and experience with real-time analytics and batch processing using Apache Spark.
  • Hands-on experience with AWS (Amazon Web Services), Cassandra, Python, and cloud computing.
  • Experience with agile development practices such as Scrum, Test-Driven Development, and Continuous Integration
  • Ability to translate business requirements into system design
  • Experience in importing and exporting data between HDFS and RDBMS/non-RDBMS systems using Sqoop
  • Analyzed large data sets by writing Pig scripts and Hive queries.
  • Hands-on experience writing Pig Latin scripts and Pig commands
  • Experience with front end technologies like HTML, CSS and JavaScript
  • Experienced in using tools such as Eclipse, NetBeans, Git, TortoiseSVN, and TOAD.
  • Experience in database development using SQL and PL/SQL on databases such as Oracle 9i/11g, MySQL, and SQL Server.
  • Effective team player with excellent communication skills and the insight to determine priorities, schedule work, and meet critical timelines.
  • Certified in FINRA (Financial Industry Regulatory Authority, Inc.).
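
A minimal sketch (in Scala) of the HiveQL-to-Spark conversion referenced above, assuming a Hive-enabled SparkSession; the table and column names are hypothetical:

    import org.apache.spark.sql.SparkSession

    object HiveQlToSpark {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hiveql-to-spark-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // HiveQL being converted (hypothetical table):
        //   SELECT region, COUNT(*) FROM sales.orders
        //   WHERE status = 'SHIPPED' GROUP BY region;

        // The same logic expressed as Spark RDD transformations in Scala
        val counts = spark.table("sales.orders").rdd
          .map(row => (row.getAs[String]("region"), row.getAs[String]("status")))
          .filter { case (_, status) => status == "SHIPPED" }
          .map { case (region, _) => (region, 1L) }
          .reduceByKey(_ + _)

        counts.collect().foreach { case (region, n) => println(s"$region\t$n") }
        spark.stop()
      }
    }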

TECHNICAL SKILLS

Big Data: Apache Spark, Scala, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, PostgreSQL

Databases: Oracle 9i/11g, MySQL, SQL Server 2000/2005

Hadoop distributions: Cloudera, Hortonworks, AWS

DWH (Reporting): OBIEE 10.1.3.2.0/11g

DWH (ETL): Informatica Power Center 9.6.x

Languages: SQL, PL/SQL, Python, Java

UI: HTML, CSS, JavaScript

Defect Tracking Tools: Quality Center, JIRA

Tools: SQL Tools, TOAD

Version Control: TortoiseSVN, GitHub

Operating Systems: Windows ..., Linux/Unix

PROFESSIONAL EXPERIENCE

Confidential - Virginia Beach, VA

Big Data Engineer

Responsibilities:

  • Implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
  • Performed data analysis, feature selection, and feature extraction using Apache Spark's machine learning and streaming libraries in Python.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
  • Implemented solutions on AWS using services such as EC2, S3, RDS, Redshift, and VPC.
  • Extensive development experience in IDEs such as Eclipse, NetBeans, and IntelliJ.
  • Worked as a Hadoop consultant on MapReduce, Pig, Hive, and Sqoop.
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig, and MapReduce.
  • Good exposure to GitHub and Jenkins.
  • Exposed to Agile environment and familiar with tools like JIRA, Confluence.
  • Provided recommendations to the machine learning group about the customer roadmap.
  • Sound knowledge of Agile methodology (Scrum) and Rational tools.
  • Led architecture and design of data processing, warehousing, and analytics initiatives.
  • Explored Spark to improve performance and optimize existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Used Apache NiFi to ingest data from IBM MQ message queues.
  • Identified query duplication, complexity, and dependencies to minimize migration effort. Technology stack: Oracle, Cloudera, Hortonworks HDP cluster, Cloudera Navigator Optimizer, AWS Cloud, and DynamoDB.
  • As a proof of concept, used Spark for data transformation of larger data sets.
  • Set up and configured AWS EMR clusters and used Amazon IAM to grant users fine-grained access to AWS resources.
  • Worked with sequence files, RC files, map-side joins, bucketing, and partitioning to improve Hive performance and storage.
  • Enabled and configured Hadoop services such as HDFS, YARN, Hive, HBase, Kafka, Sqoop, Notebook, and Spark/Spark2.
  • Worked with Spark, Scala, Python, Storm, and Impala.
  • Extensive experience with Spark Streaming (version 1.5.2) through the core Spark API in Scala and Java, transforming raw data from several data sources into baseline data (see the sketch after this list).
  • Created dashboards in Tableau and in Elasticsearch with Kibana.
  • Hands-on expertise in running Spark and Spark SQL.
  • Experienced in analyzing and optimizing RDDs by controlling partitioning for the given data.
  • Worked on the MapR Hadoop platform to implement big data solutions using Hive, MapReduce, shell scripting, and Java technologies.
  • Used Struts (MVC) to implement business model logic.
  • Experienced in querying data using Spark SQL on top of Spark engine.
  • Experience in managing and monitoring Hadoop cluster using Cloudera Manager.
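
A minimal sketch of the Spark Streaming ingestion-and-transform flow referenced above, assuming the Spark 1.x streaming API with a Kafka direct stream; the broker, topic, record layout, and output path are hypothetical:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object BaselineEventStream {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("baseline-events"), Seconds(30))

        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // hypothetical broker
        val topics = Set("raw-events")                                  // hypothetical topic

        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        // Keep well-formed pipe-delimited records and normalize them into (key, payload) pairs
        stream.map(_._2)
          .map(_.split('|'))
          .filter(_.length >= 2)
          .map(fields => (fields(0), fields(1)))
          .saveAsTextFiles("hdfs:///data/baseline/events") // hypothetical HDFS output prefix

        ssc.start()
        ssc.awaitTermination()
      }
    }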

Environment: Big Data, JDBC, NoSQL, Spark, YARN, Hive, Pig, Scala, AWS EMR, Python, Hadoop, Redshift.

Confidential - Irvine, CA

Big Data Engineer

Responsibilities:

  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Hands-on experience with Spark, Cassandra, Scala, Python, and Spark Streaming, creating RDDs and applying transformations and actions.
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
  • Imported data from various sources, performed transformations using Hive and Spark, and loaded the data into HDFS (see the sketch after this list).
  • Developed Spark and Spark SQL code for faster testing and processing of data.
  • Snapshotted the cleansed data to the analytics cluster for business reporting.
  • Hands-on experience on the AWS platform with S3 and EMR.
  • Experience working with different data formats such as flat files, ORC, Avro, and JSON.
  • Automated business reports on the data lake using Bash scripts in UNIX and delivered them to business owners.
  • Provided design recommendations and thought leadership to sponsors and stakeholders, improving review processes, resolving technical problems, and suggesting solutions.
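
A minimal sketch of the import, transform, and load flow described above, assuming the Spark DataFrame API with Hive support; the paths, columns, and partition key are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object DailyMetricsLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("daily-metrics-load")
          .enableHiveSupport()
          .getOrCreate()

        // Import: read raw JSON landed on HDFS (hypothetical path)
        val raw = spark.read.json("hdfs:///landing/events/")

        // Transform: filter and aggregate into reporting metrics (hypothetical columns)
        val metrics = raw
          .filter(col("event_type") === "purchase")
          .groupBy(col("event_date"), col("store_id"))
          .agg(count("*").as("purchases"), sum("amount").as("revenue"))

        // Load: write back to HDFS partitioned by date so Hive can query it efficiently
        metrics.write
          .mode("overwrite")
          .partitionBy("event_date")
          .parquet("hdfs:///warehouse/metrics/daily")

        spark.stop()
      }
    }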

Environment: Apache Spark, Scala, Spark-Core, Spark-SQL, Python, Hadoop, MapReduce, HDFS, Hive, Pig, MongoDB, Sqoop, Oozie, MySQL, Java (jdk1.7), AWS

Confidential - Sunnyvale, CA

Big Data Engineer

Responsibilities:

  • Built patterns according to business requirements to detect market violations and generate alerts using Big Data technologies (Hive, Tez, Spark, Scala) on AWS
  • Worked as a Scrum Master, facilitating team productivity and monitoring project progress by applying Scrum and Kanban on the JIRA board to ensure the quality of deliverables
  • Optimized long-running patterns by writing shell scripts and using Hive optimization settings (e.g. reduced a 20-hour daily pattern to a 7-hour run by identifying data skew in a TB-scale table, a fix that was adopted company-wide and saved around 50,000 USD per year; see the sketch after this list)
  • Migrated on-prem RDBMS (Oracle, Greenplum) code into HiveQL and Spark SQL running on AWS EMR
  • Participated in a machine learning project, including decision tree modeling and feature engineering
  • Responsible for ETL and data warehouse processes to transfer and register data in AWS S3
  • Developed Hive UDFs in Java and modified framework code in Python
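
The production fix above was done with shell scripts and Hive optimization settings; as one illustrative skew-handling pattern in Spark/Scala (not necessarily the one used), here is a key-salting sketch with hypothetical table and column names:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object SaltedJoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("salted-join-sketch")
          .enableHiveSupport()
          .getOrCreate()

        val saltBuckets = 16

        // Large, skewed fact table: spread each hot key across `saltBuckets` salt values
        val trades = spark.table("surveillance.trades")
          .withColumn("salt", (rand() * saltBuckets).cast("long"))

        // Small dimension table: replicate each row once per salt value so the join still matches
        val saltValues = spark.range(saltBuckets).toDF("salt")
        val instruments = spark.table("surveillance.instruments").crossJoin(saltValues)

        trades.join(instruments, Seq("instrument_id", "salt"))
          .drop("salt")
          .write.mode("overwrite")
          .saveAsTable("surveillance.trades_enriched")
      }
    }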

Environment: Apache Spark, Scala, Spark-Core, Spark-Streaming, Python, Spark-SQL, Hadoop, MapReduce, HDFS, Hive, Pig, MongoDB, Sqoop, Oozie, MySQL, Java (jdk1.7), AWS

Confidential - Dallas, TX

Sr. Apache Spark Consultant

Responsibilities:

  • Gathered business requirements for the project by coordinating with business users and data warehousing (front-end) team members.
  • Involved in product data ingestion into HDFS using Spark.
  • Created partitioned tables and bucketed data in Hive to improve performance.
  • Used Amazon Web Services (AWS): EC2 for compute and S3 for storage.
  • Loaded data into MongoDB using hive-mongo connector jars for report generation.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Optimized existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala (see the sketch after this list).
  • Handled importing and exporting of data between Oracle and HDFS using Sqoop.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
  • Migrated various Hive UDFs and queries into Spark SQL for faster execution.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
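
A minimal sketch of the MapReduce-to-Spark migration pattern referenced above, using the RDD API; the input path and record layout are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}

    object ProductCounts {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("product-counts"))

        // Mapper equivalent: parse each CSV line and emit (productId, 1)
        val pairs = sc.textFile("hdfs:///data/products/events/")
          .map(_.split(','))
          .filter(_.nonEmpty)
          .map(fields => (fields(0), 1L))

        // Reducer equivalent: sum the counts per product
        val counts = pairs.reduceByKey(_ + _)

        counts.saveAsTextFile("hdfs:///data/products/counts")
        sc.stop()
      }
    }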

Environment: Apache Spark, Scala, Spark-Core, Spark-Streaming, Spark-SQL, Hadoop, MapReduce, HDFS, Hive, Pig, MongoDB, Sqoop, Oozie, Python, MySQL, Java (jdk1.7), AWS

Confidential

Big Data Developer

Responsibilities:

  • AML Cards is a compliance project handling all credit card transactions (both retail and consumer). The main goal is to detect fraudulent transactions and generate alerts on them over roughly 400 GB of data per month for the USA and Canada alone. The project is divided into two parts:
  • Segmentation (12 months of historical data is provided to analysts).
  • Transaction Monitoring (alerts are generated on 12 months of data plus recurring feeds; this is a rule-based alert generation model).
  • Led the AML Cards North America development and DQ team to successfully implement the compliance project.
  • Involved in the project from the POC stage and worked from data staging through data mart population and reporting. Worked in an onsite-offshore environment.
  • Fully responsible for creating the data model for storing and processing data and for generating and reporting alerts. This model is being implemented as the standard across all regions as a global solution.
  • Involved in discussions with and guiding other regional teams on the SCB Big Data platform and the AML Cards data model and strategy.
  • Responsible for technical design and review of the data dictionary (business requirements).
  • Responsible for providing technical solutions and workarounds.
  • Migrated the required data from the data warehouse and product processors into HDFS using Sqoop, and imported various formats of flat files into HDFS.
  • Involved in discussions with source system teams on data quality (DQ) issues.
  • Implemented partitioning, dynamic partitions, bucketing, and custom UDFs in Hive (see the sketch after this list).
  • Used Hive for data processing and batch data filtering.
  • Supported and monitored MapReduce programs running on the cluster.
  • Monitored logs and responded to any warning or failure conditions.
  • Responsible for preserving code and design integrity using SVN and SharePoint.
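
An illustrative custom Hive UDF of the kind referenced above, written in Scala here to keep the examples in one language (such UDFs are more commonly written in Java); the masking rule and all names are hypothetical:

    import org.apache.hadoop.hive.ql.exec.UDF

    // Hive resolves the evaluate method by reflection at query time.
    class MaskCardNumber extends UDF {
      def evaluate(pan: String): String = {
        if (pan == null || pan.length <= 4) pan
        else ("*" * (pan.length - 4)) + pan.takeRight(4)
      }
    }

    // Example registration and use from the Hive CLI (hypothetical jar and table names):
    //   ADD JAR mask-card-udf.jar;
    //   CREATE TEMPORARY FUNCTION mask_card AS 'MaskCardNumber';
    //   SELECT mask_card(card_number) FROM card_transactions;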

Environment: Apache Hadoop, HDFS, Hive, MapReduce, Pig, HBase, ZooKeeper, Oozie, MongoDB, Python, Java, Sqoop

Confidential

Data Quality Engineer

Responsibilities:

  • Designed, developed, and maintained an internal interface application allowing one application to share data with another.
  • Analyzed 90% of all changes and modifications to the interface application.
  • Coordinated development work efforts that spanned multiple applications and developers.
  • Developed and maintained data models for internal and external interfaces.
  • Worked with other Bureaus in the Department of State to implement data sharing interfaces.
  • Attended Configuration Management Process Working Group and Configuration Control Board meetings.
  • Performed DDL (CREATE, ALTER, DROP, TRUNCATE and RENAME), DML (INSERT, UPDATE, DELETE and SELECT) and DCL (GRANT and REVOKE) operations where permitted.
  • Designed and developed database applications.
  • Designed the database structure for an application.
  • Estimated storage requirements for an application.
  • Specified modifications to the database structures for an application.
  • Kept the database administrator informed of required changes.
  • Tuned the application during development.
  • Established an application's security requirements during development.
  • Created Functions, Procedures and Packages as part of the development.
  • Assisted the Configuration Management group to design new procedures and processes.
  • Led the Interfaces Team, with responsibility for maintaining and supporting both internal and external interfaces.
  • Responsible for following all processes and procedures in place for the entire Software Development Life Cycle.
  • Wrote documents in support of the SDLC phases. Documents include requirements and analysis reports, design documents, and technical documentation.
  • Created MS Project schedules for large work efforts.

Environment: Oracle 9i, Informatica 7.1.x, Control-M, TOAD, Linux/Unix
