Data Engineer Resume
Texas
SUMMARY
- Data Engineer with 8 years of experience in Big Data and analytics, spanning storage, querying, processing, and analysis, with hands-on cloud infrastructure experience developing data pipelines.
- Expertise in designing scalable big data solutions and data warehouse models on large-scale distributed data and in performing a wide range of analytics.
- Worked across the full SDLC under Agile Scrum, from feasibility analysis and conceptual design through implementation, including documentation, user training, and operations support. Eager to contribute to team success through hard work, attention to detail, and strong organizational skills.
- Experienced in querying Snowflake, Oracle, Redshift, and MS SQL Server databases for OLTP and OLAP workloads
- Developed Spark and Hive Jobs to summarize and transform data
- Strong understanding of RDBMS concepts, performance tuning, and query optimization
- Ingested data from different sources into HDFS using Sqoop and Flume and performed transformations using Hive and MapReduce
- Worked on data processing: collecting, aggregating, and moving data from various sources using Apache Flume and Kafka
- Developed automated regression scripts in Python to validate ETL processes across multiple databases including AWS Redshift, Oracle, MongoDB, and SQL Server.
- Responsible for the design and development of Spark SQL scripts based on functional specifications
- Worked on cluster coordination services through Zookeeper
- Deployed data pipelines with CI/CD process
- Worked on ETL pipelines over S3 Parquet files in data lakes using AWS Glue
- Used Python and shell scripting to build pipelines
- Performed advanced procedures such as text analytics and processing, leveraging the in-memory computing capabilities of Spark with Scala
- Responsible for resolving issues, troubleshooting performance problems in Hadoop clusters, and fine-tuning failing Spark applications
- Hands-on experience in developing SQL scripts
- Involved in file movement between HDFS and AWS S3 and worked extensively with S3 buckets
- Helped implement and automate detective controls in the cloud environment to alert on critical security issues
- Monitored applications migrated to AWS using CloudTrail, CloudWatch, and AWS Config
- Wrote Python scripts to automate launching EMR clusters and configuring Hadoop applications (a brief sketch follows this summary)
- Experienced in Waterfall and Agile development (SCRUM) methodologies
- Experience building and supporting data transformations, data structures, metadata, dependency management, and workload management.
- Implemented and maintained security controls that reduce risk and allow risk-based reporting on cloud security posture.
- Worked closely with AWS cloud security subject matter experts and served as an advisor to IT product management staff on dynamic and static code scans, vulnerability scans, web application scans, and other cloud security reviews
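
The EMR automation mentioned above can be illustrated with a minimal boto3 sketch; the cluster name, release label, instance types, and log bucket below are assumed placeholders rather than details from the actual engagement.

# Illustrative sketch only: automating an EMR cluster launch with boto3.
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # region is an assumption

response = emr.run_job_flow(
    Name="etl-cluster",                      # hypothetical cluster name
    ReleaseLabel="emr-6.4.0",                # assumed EMR release
    Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}, {"Name": "Hive"}],
    LogUri="s3://example-bucket/emr-logs/",  # placeholder log bucket
    Instances={
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster started:", response["JobFlowId"])
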
TECHNICAL SKILLS
Data Tools / Technologies: Spark, Spark SQL, Spark Streaming, Hive, Sqoop, Hadoop, HDFS, MapReduce, Pig, Flume, Kafka, Zookeeper, Airflow, Data Lake
Programming Languages: Python, SQL, HiveQL, T-SQL, NoSQL, Shell Scripting, Java
NoSQL Databases: HBase, MongoDB, DynamoDB
Tools: PyCharm, Visual Studio Code, Tableau, Databricks, MySQL Workbench, Maven, Jupyter Notebook, Git, Eclipse, Informatica, Terraform
Databases: Microsoft SQL Server 2008/2010/2012, MySQL 4.x/5.x, Oracle 11g/12c
Cloud Platforms/Services: Snowflake, AWS, AWS CLI, EC2, S3, EMR, IAM, Redshift, DynamoDB, AWS Lambda, Glue, Athena, VPC, Databricks
Hadoop Distributions: Cloudera, Hortonworks
Platforms: UNIX, Windows, Linux
PROFESSIONAL EXPERIENCE
Confidential, Texas
Data Engineer
Responsibilities:
- Involved in gathering requirements from different teams to design the ETL migration process from existing RDBMS systems to the Hadoop cluster using Sqoop.
- Created Bash Scripts to load data from Linux/UNIX file system into HDFS
- Developed Hive queries for data transformation and data analysis.
- Loaded data from existing DWH sources (Teradata and Oracle) into HDFS using Sqoop and loaded it into partitioned Hive tables.
- Converted existing Sqoop and Hive jobs to Spark SQL applications that read data from RDBMS sources over JDBC and write to Hive tables
- Used Hive optimizations such as partitioning, bucketing, map joins, and table statistics for efficient data access, reducing execution time by 30%
- Collaborated with the Predictive Analytics Engineering team to develop end-to-end data solutions for building a data lake and migrating the data warehouse from on-premises systems to the Hadoop cluster
- Developed PySpark scripts that reduced organizational costs by 30% by migrating customer data from the DWH (Teradata) to Hadoop.
- Built data pipelines using Apache NiFi to pull structured data from Splunk for analysis and created Hive tables
- Wrote advanced SQL queries against Snowflake and saved the results as Delta tables.
- Developed Spark jobs to sessionize clickstream data residing in Snowflake
- Worked on implementing a log producer in Scala that sends logs to a Kafka- and ZooKeeper-based log collection platform
- Experienced in handling JSON datasets and wrote custom Python functions that are reused by various applications across the enterprise
- Responsible for assessing and improving the quality of customer data.
- Worked on the end-to-end data quality process setup on AWS for the entire health insurance division
- Designed ETL pipelines in Amazon EMR using PySpark to process raw data from Amazon S3 and copy it into Amazon Redshift; created views to enable fast access to the data and improved view performance using sort keys (see the sketch after this section)
- Developed ETL pipelines over S3 Parquet files in the data lake using AWS Glue
- Created and scheduled cron jobs to automate the execution of mass data quality checks
- Developed a Java-based ETL tool that extracts data from sources such as IBM Cognos (XML) and MySQL and loads it into target tables in a MySQL database
- Monitored datasets on EC2 instances with EBS volumes attached.
- Involved in migrating the quality monitoring tool's code from AWS EC2 to AWS Lambda to reduce costs incurred from reserved EC2 instances
Environment: Apache Spark, Apache Hive, Python, AWS, S3, Pyspark, NIFI, Zookeeper, Kafka, Oracle Primavera, Hadoop, Data Lake, EMR
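
A minimal PySpark sketch of the S3-to-EMR-to-Redshift pattern referenced in this section; bucket names, paths, and column names are assumed placeholders, and the final Redshift load is summarized in a comment rather than shown.

# Illustrative sketch only: process raw S3 data on EMR and stage it for Redshift.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-to-redshift-etl").getOrCreate()

# Read raw data landed in S3 (path is hypothetical)
raw = spark.read.json("s3://example-raw-bucket/events/")

# Basic cleansing/transformation before loading to the warehouse
cleaned = (raw
           .filter(F.col("event_id").isNotNull())
           .withColumn("event_date", F.to_date("event_ts")))

# Stage the transformed data back to S3 as Parquet; a Redshift COPY command
# (or a JDBC write) would then load the staged files into the target table.
cleaned.write.mode("overwrite").parquet("s3://example-curated-bucket/events/")
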
Confidential, Ohio
Data Engineer
Responsibilities:
- Involved in designing and extracting data from different ETL tools and then applying transformation logic
- Used FastLoad and MultiLoad utilities for loading data into staging and target tables
- Developed complex mapping in DataStage
- Worked on data profiling and created logical datasets in Snowflake to administer the quality monitoring process
- Developed code in Hadoop technologies and performed unit testing
- Designed and developed Spark Scala ingestion pipelines both in real-time and batch
- Bulk loaded data from external and internal stages into Snowflake and performed transformations based on business requirements using Databricks, Spark SQL, PySpark, S3, and Delta.
- Developed ETL pipelines using a combination of Python, Snowflake, and SnowSQL, and wrote advanced SQL queries against Snowflake
- Developed Spark Streaming programs to consume data from Kafka and apply both stateless and stateful transformations (see the sketch after this section)
- Built and implemented automated procedures to split large files into smaller batches to facilitate FTP transfer, reducing execution time by 60%
- Created Spark programs to parse, analyze and implement the solutions based on customer needs
- Migrated all fact/OLAP tables written in Hive to PySpark
- Ingested data and performed RDD transformations using Spark to run streaming analytics in Databricks
- Worked on implementing a log producer in Scala for application logs, transforming incremental logs and sending them to Kafka- and ZooKeeper-based log collection platforms
- Transformed data using AWS Glue DynamicFrames with PySpark
- Implemented AWS Step Functions to automate Amazon SageMaker related tasks such as publishing data to S3
- Created graphs/charts with detailed data analysis results
- Participated in Agile Scrum daily stand-up meetings to discuss work progress and blockers
- Actively participated in sprint reviews held every 15 days to demo work to clients and gather their feedback.
Environment: Databricks, Spark, Python, AWS, S3, Snowflake, Pyspark, SparkSQL, Kafka, Zookeeper
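
A minimal sketch of the Kafka consumption described above, written with PySpark Structured Streaming; the broker address, topic name, and window size are assumed placeholders, and the spark-sql-kafka connector package is expected on the classpath.

# Illustrative sketch only: stateless and stateful processing of a Kafka stream.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-streaming").getOrCreate()

# Read a stream from Kafka (broker and topic are hypothetical)
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "app-logs")
          .load())

# Stateless transformation: parse and filter each record independently
parsed = events.selectExpr("CAST(value AS STRING) AS line", "timestamp")
errors = parsed.filter(F.col("line").contains("ERROR"))

# Stateful transformation: a windowed count, which keeps state across micro-batches
counts = errors.groupBy(F.window("timestamp", "5 minutes")).count()

query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
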
Confidential
Data Engineer
Responsibilities:
- Designed, built, and maintained the data pipelines that bring data into data lakes.
- Involved in designing ETL pipelines to automate data ingestion and facilitate data analysis
- Ingested streaming data from multiple sources into HDFS for storage and analysis using Apache Flume
- Worked with global teams to feed data into various environments so that downstream applications could be tested
- Used Hive to run MapReduce jobs for data aggregation and transformation across multiple file formats including XML, JSON, CSV, and other compressed formats
- Used Spark transformations to build both simple and complex ETL applications that create structured data from incoming unstructured data
- Created HBase tables to store variable data formats coming from different portfolios
- Designed internal and external table schemas in Hive with appropriate static and dynamic partitions for efficiency (see the sketch after this section).
- Involved in preparing design, unit, and integration tests documents.
- Developed Hadoop solutions on AWS, in roles spanning developer to admin, utilizing the Hortonworks Hadoop stack
- Managed AWS role-based security, Hadoop administration, and load balancing on AWS EC2 clusters.
Environment: Apache Spark, Apache Hive, Python, AWS, S3, Snowflake, Pyspark, Spark SQL, Kafka, Oracle Primavera, Hadoop, Data Lake
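
A minimal sketch of the partitioned Hive table design mentioned above, issued through PySpark for consistency with the other examples; the table names, columns, and S3 location are assumed placeholders.

# Illustrative sketch only: an external Hive table with dynamic partitioning.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning")
         .enableHiveSupport()
         .getOrCreate())

# External table whose partitions map to directories under the (placeholder) location
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        order_id STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS PARQUET
    LOCATION 's3://example-bucket/warehouse/sales_ext/'
""")

# A tiny stand-in for the real staging data
spark.createDataFrame(
    [("o1", 10.0, "2020-01-01"), ("o2", 25.5, "2020-01-02")],
    ["order_id", "amount", "order_date"],
).createOrReplaceTempView("staging_sales")

# Dynamic partitioning derives the partition value from the data itself
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE sales_ext PARTITION (order_date)
    SELECT order_id, amount, order_date FROM staging_sales
""")
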
Confidential
Data Engineer
Responsibilities:
- Created consumption views on top of metrics to reduce the running time for complex queries.
- Involved in functional, integration, regression, smoke, and performance testing; tested Hadoop MapReduce jobs developed in Python, Pig, and Hive
- Generated custom SQL to verify dependencies for the daily, weekly, and monthly jobs.
- Using Nebula metadata, registered business and technical datasets for the corresponding SQL scripts
- Created performance dashboards in Tableau, Excel, and PowerPoint for key stakeholders
- Implemented Defect Tracking process using JIRA tool by assigning bugs to Development Team
- Developed Spark code and Spark-SQL/streaming for faster testing and processing of data.
- Evaluated the traffic and performance of Daily Deals PLA ads and compared those items with non-deal items to assess the potential for increasing ROI; suggested improvements and modified existing BI components (reports, stored procedures)
- As part of data migration, wrote many SQL scripts to reconcile data mismatches and worked on loading history data from Teradata to Snowflake.
- Created metric tables and end-user views in Snowflake to feed data for Tableau refreshes.
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files (see the sketch after this section).
Environment: Hadoop, MapReduce, Hive, Apache Spark, Sqoop, Snowflake, Nebula, Teradata, SQL Server, Python, Pig, GitHub, Tableau
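
A minimal sketch of querying CSV and text files with Spark SQL, shown here in PySpark for consistency with the other examples (the original work used Scala, as noted above); file paths and column names are assumed placeholders.

# Illustrative sketch only: Spark SQL over CSV and plain-text files.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("file-format-queries").getOrCreate()

# CSV with a header row, letting Spark infer the schema
orders = (spark.read
          .option("header", True)
          .option("inferSchema", True)
          .csv("/data/orders.csv"))
orders.createOrReplaceTempView("orders")

daily_totals = spark.sql("""
    SELECT order_date, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
""")
daily_totals.show()

# Plain text files come back as a single 'value' column
logs = spark.read.text("/data/app.log")
logs.filter(logs.value.contains("ERROR")).show(truncate=False)
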
Confidential
Data Analyst
Responsibilities:
- Created database objects such as tables, views, stored procedures, and triggers
- Designed and implemented data integration modules and ETL data analysis techniques to validate business rules and identify low-quality and missing data in the existing enterprise data warehouse (see the sketch after this section)
- Performed analysis and presented results using SQL, Python, SSIS, MS Access, Excel, and Visual Basic scripts
- Analyzed and validated findings, creating reports, presentations, and visualizations
- Designed data tables in coordination with client services and internal departments
- Coordinated with QA testers for end-to-end unit testing and postproduction testing
- Performed Tableau Server admin duties like installation, configuration, security, migration, upgrades, maintenance, and monitoring
- Created dashboards in Tableau to deliver interactive, reliable reporting and visually compelling, accurate dashboards; tested, cleaned, and standardized data to meet business standards
- Worked with relational DBMS environments and ER diagramming
- Resolved and troubleshot complex issues
Environment: SQL Server, Python, Tableau, MS Excel, MS Power Point.
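
A minimal Python sketch of the kind of data-quality validation described above; the connection string, table, and column names are hypothetical placeholders.

# Illustrative sketch only: flag rows with missing or out-of-range values.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://user:password@dsn_name")  # hypothetical DSN

customers = pd.read_sql(
    "SELECT customer_id, email, signup_date FROM dbo.customers", engine
)

# Identify low-quality rows: missing emails or signup dates in the future
issues = customers[
    customers["email"].isna()
    | (pd.to_datetime(customers["signup_date"]) > pd.Timestamp.now())
]

print(f"{len(issues)} of {len(customers)} rows failed validation")
issues.to_csv("customer_quality_issues.csv", index=False)
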