Big Data Engineer Resume Plano, TX - Hire IT People

SUMMARY

Over 4 years of professional IT experience as a Big Data Engineer.
Experience across the complete Software Development Life Cycle (SDLC), including requirement analysis, design, coding, testing, and implementation, using Agile (Scrum), TDD, and other development methodologies.
Expertise in AWS Cloud Services (S3, EMR, EC2) and Snowflake Computing.
Strong understanding of data warehouse concepts such as ETL, star and snowflake schemas, normalization, business process analysis, dimensional data modeling, and physical and logical data modeling.
Experience setting up clusters on Amazon EC2 and S3, including automating cluster setup and extension in the AWS cloud.
Hands-on experience with data transformation operations, implementing various functions for loading and evaluating data in relations.
Loaded log data into HDFS by collecting and aggregating data from various sources.
Worked extensively with Oracle and MySQL databases, with good database programming experience in SQL.
Expert in handling various data sources such as flat files, DB2, MS SQL Server, Excel, Oracle, CSV files, Teradata, and XML files.
Used Git for version control and Jira for project tracking.

TECHNICAL SKILLS

Programming Languages: Core Java, Python, Scala.

AWS Services: EMR, S3, EC2, Lambda

Big Data Technologies: Hadoop, HDFS, Snowflake Computing, Scala, Spark.

Databases: MySQL, SQL Server.

NoSQL Databases: HBase and Cassandra.

Scripting and Query Languages: UNIX Shell scripting, SQL and PL/SQL.

Operating Systems: Windows, Linux

Other Tools: Eclipse, Tableau 10.1, Informatica, Control-M, ServiceNow

PROFESSIONAL EXPERIENCE

Confidential - Plano, TX

Big Data Engineer

Responsibilities:

Involved in gathering business requirements, design, and development.
Responsible for reviewing and understanding how Quantum (a Spark wrapper) is used to ingest and process batch and real-time data using Apache Spark, Scala, and SQL.
This project required an understanding of business rules, business logic, and use cases to be implemented.
Worked in a cloud environment on Amazon AWS using a multistage deployment environment.
The project involved sources of data as disparate as CSV, Parquet, Avro, Kafka, Snowflake tables, etc.
Developed Quantum workflows to read parquet files from S3 buckets and apply transformations, joins, filters, and SQL queries to different dataframes and create output datasets.
Synchronized data sources and ensured their high availability across AWS regions.
Prepared use cases, mock data, and error scenarios to test workflow execution in an EMR cluster, for deployment from Dev to Test to QA.
Developed an exclusion-rules workflow and connected it to the external Ability-to-Pay process using Spark, Quantum, SQL, and Python.
Designed and implemented an ELK (Elasticsearch, Logstash, and Kibana) stack solution for proactive monitoring of application logs and statistics.
Used Scala to read Parquet files in HDFS and perform preprocessing.
Created partitions and buckets based on state for further processing with bucket-based Hive joins.
Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
Used Partitioning, Dynamic-Partitioning and Bucketing concepts for performance optimization in Hive.
Created Hive external tables on top of HBase data.
Managed data in NoSQL databases such as HBase.
Stored data in HBase from Spark streams in a Kafka and Spark Streaming POC.
Tuned performance by converting various joins to map-side joins, and improved Hive query performance using partitioning and bucketing.
Worked within and across Agile teams to design, develop, test, implement, and support technical solutions across a full stack of development tools and technologies.
Automated deployments on AWS using GitHub and Jenkins.
Verified and validated that the Ability-to-Pay AWS Lambda function triggered jobs appropriately to run the cluster and process the accounts.
Set up the CI/CD pipelines using Jenkins, Maven, GitHub and AWS.
Used GitHub for version control and Jira for issue and project tracking.

Confidential - Round Rock, TX

Hadoop Developer

Responsibilities:

Worked with business teams and created Hive queries for ad-hoc access.
Responsible for managing data coming from different sources.
Involved in loading data from UNIX file system to HDFS.
Worked on Spark batch applications to convert HiveQL into Spark SQL using DataFrames and DataSets.
Created Hive tables and executed Hive queries on Hive warehouse.
Involved in review of functional and non-functional requirements and developed Hive queries for the analysts.
Extensively used Scala programming for developing Spark applications.
Processed schema-oriented data using Scala and Spark.
Designed and implemented Hive tables (partitioned, non-partitioned, and bucketed).
Involved in HDFS maintenance and loading of structured and unstructured data.
Loaded the processed data into Hive tables. Worked in Hive on developing external tables, managed tables, and the pipeline for smooth ETL processing.
Applied transformations to the data loaded into Spark DataFrames and performed in-memory data computation to generate the output response.
Worked on migrating MapReduce programs, initially written in Python, into Spark transformations using Spark and Scala.
Used Git for code version control and maintained repositories as a best practice.
Developed multiple POCs using Spark with Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
Used Hive to analyze the partitioned data and compute various metrics for reporting.
Imported data from different sources such as HDFS into Spark DataFrames.
Experienced with SparkContext, Spark SQL, DataFrames, and pair RDDs.
Reduced the latency of Spark jobs by tweaking Spark configurations and applying other performance and optimization techniques.
Developed various data connections from data sources to SSIS and Tableau Server for report and dashboard development.
Used Tableau for Data visualization to identify the role of various factors.
Involved in identifying KPIs.

Confidential

ETL Developer

Responsibilities:

Interacted with business community and gathered requirements based on changing needs. Incorporated identified factors into Informatica mappings to build Data Warehouses.
Developed a standard ETL framework to enable the reusability of similar logic across the board. Involved in System Documentation of Dataflow and methodology.
Identified all the dimensions to be included in the target warehouse design and confirmed the granularity of the facts in the fact tables.
Analyzed the logical model of the databases, normalized it when necessary, and was involved in identifying the fact and dimension tables.
Extensively used Informatica PowerCenter for extracting, transforming, and loading data into different databases.
Wrote PL/SQL stored procedures and triggers for implementing business rules and transformations.
Developed transformation logic as per the requirement, created mappings and loaded data into respective targets.
Stored reformatted data from relational, flat-file, and XML sources using Informatica (ETL) and developed mappings to load the data into slowly changing dimensions.
Replicated operational tables into staging tables, to transform and load data into the enterprise data warehouse using Informatica.
Involved in Performance Tuning at various levels including Target, Source, Mapping, and Session for large data files.
Documented data mappings and transformations as per the business requirements.
Performed testing and knowledge transfer, and mentored other team members.
