Senior Engineer - Big Data Resume

SUMMARY

  • Solid experience working with a Fortune 1 company on Big Data engineering
  • Helped program and senior management design and plan the implementation of critical projects
  • Provided technical leadership to project teams from design through deployment: giving guidance, performing reviews, and preventing and resolving technical issues
  • Extensive experience with Agile Scrum development and lean startup principles
  • Proven ability to build client relationships and provide innovative solutions
  • Actively led teams in shaping data architecture vision, standards and guidelines
  • Highly organized and efficient in fast-paced, multitasking environments spanning Cloudera, cloud platforms and Python
  • Strong ability to prioritize effectively and accomplish objectives with commitment and enthusiasm
  • Successfully led teams through high-impact projects using the newest data lake technologies
  • Supported technical teams in creating reusable framework scripts for broader use
  • Strong judgment, decision-making and prioritization abilities
  • Worked extensively with Python for automation and for connecting different ecosystems
  • Robust experience in niche technologies such as Apache Spark for parallel data processing
  • Strong working experience with technologies such as Hadoop, Spark, ETL pipelines and Spark Streaming
  • Strong experience in data science programming and its ecosystem
  • Well versed in Python packages such as pandas, NumPy and other data science packages
  • Proven ability to master niche skills and technologies required by customers in a short time frame
  • Strong ability to solve the most complex, large-scale data lake challenges in the Big Data space
  • Met tight deadlines with on-time task completion and zero issues, and picked up additional tasks

TECHNICAL SKILLS

Programming languages: Python, PySpark, Shell Scripting, SQL, PL/SQL and UNIX Bash

Big Data: Hadoop, Sqoop, Apache Spark, PySpark, Spark SQL, NiFi, AWS, GCP, Azure, Kafka, Snowflake, Cloudera, Hortonworks

Operating Systems: UNIX, Linux, Solaris, Mainframes

Databases: Oracle, DB2, Sybase, Netezza, Hive, Impala

IDE Tools: Aginity for Hadoop, PyCharm, Toad, SQL Developer, SQL*Plus, Sublime Text, vi editor

Others: AutoSys, Crontab, ArcGIS, Clarity, Informatica, Business Objects, IBM MQ, Splunk

PROFESSIONAL EXPERIENCE

Confidential

Senior Engineer - Big Data

Responsibilities:

  • Worked in depth with terabytes of order data in a multi-cloud, multi-technology environment
  • Successfully converted Pig Latin code into advanced PySpark for the customer's Best Sellers orders pipeline
  • Fully automated the backfilling of a large volume of historical order data from Azure into GCP
  • Wrote extensive error loggers with full pipeline recovery for worst-case scenarios
  • Turned complex tasks into simple programming logic using Python, PySpark and shell to process huge volumes of data
  • Built CI/CD pipelines using Git and other open-source tools
  • Developed new Python frameworks and shell wrapper scripts to embed the programming logic

Technical Environment: PySpark, Python, Azure, GCP, Hive, Shell Scripting, Pig Latin, SQL, Git

Confidential, Irving, TX

Lead Engineer - Big Data

Responsibilities:

  • Created Hive tables, loaded data, wrote Hive queries, and defined partitions and buckets for optimization
  • Imported data from Oracle into HDFS using Sqoop; performed full and incremental imports using Sqoop jobs
  • Built highly scalable ETL data pipelines using Big Data and cloud technologies
  • Transferred ETL data into AWS (EC2, S3 and EMR) and finally into Snowflake
  • Automated repetitive tasks using Python and UNIX Bash scripting
  • Worked extensively in Cloudera Hue to debug logs and queries
  • Used Apache Spark to transform bulk data for further data munging
  • Wrote Python scripts to move data across different systems
  • Made extensive use of the Python pandas package for data handling
  • Processed huge volumes of data through PySpark parallel processing
  • Contributed to data modeling for several applications
  • Efficiently handled structured and unstructured data in file formats including Parquet, ORC, JSON and Avro

Technical Environment: Python, AWS Cloud, Cloudera, Sqoop, Hive, Apache Spark, Oracle, SQL, UNIX and Snowflake

Confidential

Lead Engineer - Big Data

Responsibilities:

  • Clearly articulated pros and cons of various technologies and platforms
  • Benchmarked systems, analyzed system bottlenecks and proposed solutions to eliminate them
  • Worked directly with senior stakeholders across the Technology Division and their business sponsors
  • Converted hard-to-grasp technical requirements into outstanding designs
  • Applied strategic thinking and recommended technical solutions to teams engaged in Big Data initiatives
  • Led the team in optimizing Hive queries, reducing time from 20 minutes to 2 minutes
  • Led teams in using PySpark to build ETL data pipelines
  • Robust working experience with Spark Streaming for real-time data processing
  • Strong technical acumen in implementing Big Data ETL projects involving huge volumes and large datasets
  • Guided the team in writing Sqoop scripts to transfer data from RDBMS to Hadoop
  • Recommended the NiFi tool for transferring data between ecosystems
  • Worked extensively on PySpark scripts to transform data in Hadoop and make it usable in Snowflake
  • Built ETL data pipelines using the pandas package on Cloudera
  • Proposed high-level implementation standards and rolled them out across the team
  • Involved end to end in converting complex SAS reports into scheduled Hadoop jobs
  • Efficiently handled different file formats including Parquet, ORC, JSON and Avro

Technical Environment: Python, Apache Spark, Hadoop (Hortonworks), Sqoop, Hive, Kafka, NiFi, SAS and Snowflake
