Big Data Engineer Resume Virginia Beach, VA - Hire IT People

SUMMARY

Over 12 years of experience across the software development lifecycle: analysis, design, development, testing, deployment and maintenance.
Strong background with Big Data technologies like Spark, Azure, Scala, Hadoop, Storm, Batch, HDFS, MapReduce, Kafka, Hive, Cassandra, Python, Sqoop, and Pig.
Hands-on experience with Apache Spark and its components (Spark core and Spark SQL)
Experienced in converting HiveQL queries into Spark transformations using Spark RDDs and Scala
Hands-on experience in in-memory data processing with Apache Spark
Developed Spark scripts by using Scala shell commands as per the requirement
Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle
Broad understanding and experience of real-time analytics and batch processing using Apache Spark.
Hands-on experience in AWS (Amazon Web Services), Cassandra, Python and cloud computing.
Experience with Agile development practices such as Scrum, Test-Driven Development and Continuous Integration
Ability to translate business requirements into system design
Experience in importing and exporting data between HDFS and RDBMS/non-RDBMS systems using Sqoop
Analyzed large data sets by writing Pig scripts and Hive queries.
Hands-on experience in writing Pig Latin scripts and Pig commands
Experience with front end technologies like HTML, CSS and JavaScript
Experienced in using tools like Eclipse, NetBeans, GIT, Tortoise SVN and TOAD.
Experience in database development using SQL and PL/SQL and experience working on databases like Oracle 9i/11g, MySQL and SQL Server.
Effective team player with excellent communication skills and the insight to determine priorities, schedule work and meet critical timelines.
Certified in FINRA (Financial Industry Regulatory Authority, Inc)

TECHNICAL SKILLS

Big Data: Apache Spark, Scala, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, PostgreSQL

Databases: Oracle 9i/11g, MySQL, SQL Server 2000/2005

Hadoop distributions: Cloudera, Hortonworks, AWS

DWH (Reporting): OBIEE 10.1.3.2.0/11g

DWH (ETL): Informatica Power Center 9.6.x

Languages: SQL, PL/SQL, Python, Java

UI: HTML, CSS, JavaScript

Defect Tracking Tools: Quality Center, JIRA

Tools: SQL Tools, TOAD

Version Control: Tortoise SVN, GitHub

Operating Systems: Windows ..., Linux/Unix

PROFESSIONAL EXPERIENCE

Confidential - Virginia Beach, VA

Big Data Engineer

Responsibilities:

Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, Map Reduce Frameworks, HBase, Hive.
Performed data analysis, feature selection and feature extraction using Apache Spark machine learning and streaming libraries in Python.
Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
Implemented solutions on AWS using services such as EC2, S3, RDS, Redshift and VPC.
Extensive development experience in IDEs such as Eclipse, NetBeans and IntelliJ.
Worked as a Hadoop consultant on MapReduce, Pig, Hive and Sqoop.
Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig and MapReduce.
Good exposure to GitHub and Jenkins.
Exposed to Agile environment and familiar with tools like JIRA, Confluence.
Provided recommendations to machine learning group about customer roadmap.
Sound knowledge of Agile methodology (Scrum) and Rational tools.
Led architecture and design of data processing, warehousing and analytics initiatives.
Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
Used Apache NiFi to ingest data from IBM MQ (message queues).
Identified query duplication, complexity and dependencies to minimize migration effort. Technology stack: Oracle, Cloudera, Hortonworks HDP cluster, Cloudera Navigator Optimizer, AWS Cloud and DynamoDB.
As a POC, used Spark for data transformation of larger data sets.
Set up and configured AWS EMR clusters and used Amazon IAM to grant users fine-grained access to AWS resources.
Worked on SequenceFiles, RCFiles, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
Enabled and configured Hadoop services such as HDFS, YARN, Hive, HBase, Kafka, Sqoop, Notebook and Spark/Spark2.
Worked on Spark, Scala, Python, Storm, Impala.
Extensive experience in Spark Streaming (version 1.5.2) through the core Spark API, using Scala and Java to transform raw data from several sources into baseline data.
Created dashboards in Tableau and in Elasticsearch with Kibana.
Hands-on expertise in running Spark and Spark SQL.
Experienced in analyzing and optimizing RDDs by controlling the partitioning of the given data.
Worked on the MapR Hadoop platform to implement big data solutions using Hive, MapReduce, shell scripting and Java technologies.
Used Struts (MVC) to implement business model logic.
Experienced in querying data using Spark SQL on top of Spark engine.
Experience in managing and monitoring Hadoop cluster using Cloudera Manager.

Environment: Big Data, JDBC, NOSQL, Spark, YARN, HIVE, Pig, Scala, AWS EMR, Python, Hadoop, Redshift.

Confidential - Irvine, CA

Big Data Engineer

Responsibilities:

Analyzed large data sets to determine the optimal way to aggregate and report on them.
Hands-on experience in Spark, Cassandra, Scala, Python and Spark Streaming, creating RDDs and applying transformations and actions.
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
Imported data from various sources, performed transformations using Hive and Spark, and loaded the data into HDFS.
Developed Spark and Spark SQL code for faster testing and processing of data.
Snapped the cleansed data to the analytics cluster for business reporting.
Hands-on experience on the AWS platform with S3 and EMR.
Experience working with different data formats such as flat files, ORC, Avro and JSON.
Automated business reports on the data lake using Bash scripts in UNIX, delivering them to business owners.
Provided design recommendations and thought leadership to sponsors and stakeholders, which improved review processes and resolved technical problems.

Environment: Apache Spark, Scala, Spark-Core, Spark-SQL, Python, Hadoop, MapReduce, HDFS, Hive, Pig, MongoDB, Sqoop, Oozie, MySQL, Java (jdk1.7), AWS

Confidential - Sunnyvale, CA

Big Data Engineer

Responsibilities:

Built patterns according to business requirements to find violations in the market and generate alerts using Big Data technologies (Hive, Tez, Spark, Scala) on AWS.
Worked as Scrum Master, facilitating team productivity and monitoring project progress by applying Agile methodologies (Scrum and Kanban) on a JIRA board to ensure quality of deliverables.
Optimized long-running patterns by writing shell scripts and tuning Hive optimization settings (e.g. reduced a 20-hour daily pattern to a 7-hour run by resolving data skew in a TB-scale table; the fix was adopted company-wide and saved around 50,000 USD per year).
Migrated on-prem RDBMS (Oracle, Greenplum) code to HiveQL and Spark SQL running on AWS EMR.
Participated in a machine learning project, including decision tree modeling and feature engineering.
Responsible for ETL and data warehouse processes to transfer and register data into AWS S3.
Developed Hive UDFs in Java and modified framework code in Python.

Environment: Apache Spark, Scala, Spark-Core, Spark-Streaming, Python, Spark-SQL, Hadoop, MapReduce, HDFS, Hive, Pig, MongoDB, Sqoop, Oozie, MySQL, Java (jdk1.7), AWS

Confidential - Dallas, TX

Sr. Apache Spark Consultant

Responsibilities:

Gathered business requirements for the project by coordinating with business users and data warehousing (front-end) team members.
Involved in product data ingestion into HDFS using Spark.
Created partitioned tables and bucketed data in Hive to improve performance.
Used Amazon Web Services (AWS): EC2 for computing and S3 as the storage mechanism.
Loaded data into MongoDB using hive-mongo connection JARs for report generation.
Developed Spark scripts by using Scala shell commands as per the requirement.
Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
Performed advanced procedures like text analytics and processing with Spark's in-memory computing capabilities in Scala.
Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
Handled importing data from various sources, including Oracle, into HDFS and vice versa using Sqoop.
Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
Migrated various Hive UDFs and queries into Spark SQL for faster requests.
Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.

Environment: Apache Spark, Scala, Spark-Core, Spark-Streaming, Spark-SQL, Hadoop, MapReduce, HDFS, Hive, Pig, MongoDB, Sqoop, Oozie, Python, MySQL, Java (jdk1.7), AWS

Confidential

Big Data Developer

Responsibilities:

AML Cards is a compliance project handling all credit card transactions (both retail and consumer). The main goal is to detect fraudulent transactions and generate alerts on them, over about 400 GB of data per month for the USA and Canada alone. The project is divided into two parts:
Segmentation (12-month historical data is provided to analysts).
Transaction Monitoring (alerts are generated on 12 months of data and the recurring feed data. This is a rule-based alert generation model).
Led the AML Cards North America development and DQ team to successfully implement the compliance project.
Involved in the project from the POC onward, working from data staging through saturation of the DataMart and reporting. Worked in an onsite-offshore environment.
Fully responsible for creating the data model for storing and processing data and for generating and reporting alerts. This model is being implemented as the standard across all regions as a global solution.
Involved in discussions with, and guidance of, other regional teams on the SCB Big Data platform and the AML Cards data model and strategy.
Responsible for technical design and review of data dictionary (Business requirement).
Responsible for providing technical solutions and workarounds.
Migrated the needed data from the data warehouse and product processors into HDFS using Sqoop and imported flat files in various formats into HDFS.
Involved in discussions with source systems on DQ issues in the data.
Implemented partitioning, dynamic partitions, buckets and custom UDFs in Hive.
Used Hive for data processing and batch data filtering.
Supported and monitored MapReduce programs running on the cluster.
Monitored logs and responded accordingly to any warning or failure conditions.
Responsible for preserving code and design integrity using SVN and SharePoint.

Environment: Apache Hadoop, HDFS, Hive, MapReduce, Pig, HBase, Zookeeper, Oozie, MongoDB, Python, Java, Sqoop

Confidential

Data Quality Engineer

Responsibilities:

Designed, developed, and maintained an internal interface application allowing one application to share data with another.
Analyzed 90% of all changes and modifications to the interface application.
Coordinated development work efforts that spanned multiple applications and developers.
Developed and maintained data models for internal and external interfaces.
Worked with other Bureaus in the Department of State to implement data sharing interfaces.
Attended Configuration Management Process Working Group and Configuration Control Board meetings.
Performed DDL (CREATE, ALTER, DROP, TRUNCATE and RENAME), DML (INSERT, UPDATE, DELETE and SELECT) and DCL (GRANT and REVOKE) operations where permitted.
Designed and developed database applications.
Designed the database structure for an application.
Estimated storage requirements for an application.
Specified modifications of the database structures for an application.
Kept the database administrator informed of required changes.
Tuned the application during development.
Established an application's security requirements during development.
Created Functions, Procedures and Packages as part of the development.
Assisted the Configuration Management group to design new procedures and processes.
Led the Interfaces Team with responsibility to maintain and support both internal and external interfaces.
Responsible for following all processes and procedures in place for the entire Software Development Life Cycle.
Wrote documents in support of the SDLC phases. Documents include requirements and analysis reports, design documents, and technical documentation.
Created MS Project schedules for large work efforts.

Environment: Oracle 9i, Informatica 7.1.x, Control-M, TOAD, Linux/Unix
