Senior Engineer - Big Data Resume
SUMMARY
- Solid experience working with a Fortune 1 company on Big Data engineering
- Helped program and senior management design and plan the implementation of critical projects
- Provided technical leadership to project teams from design through deployment: gave guidance, performed reviews, and prevented and resolved technical issues
- Extensive experience with Agile Scrum development and Lean Startup principles
- Proven ability to build client relationships and deliver innovative solutions
- Actively led teams in shaping the data architecture vision, standards and guidelines
- Highly organized and efficient in fast-paced, multitasking environments working with Cloudera, cloud platforms and Python
- Prioritizes effectively to accomplish objectives with commitment and enthusiasm
- Successfully led teams through high-impact projects using the latest data lake technologies
- Supported technical teams in creating reusable framework scripts for broader use
- Strong judgment, decision-making and prioritization abilities
- Extensive Python programming for automation and for integrating different ecosystems
- Strong experience with specialized technologies such as Apache Spark for parallel data processing
- Hands-on experience with Hadoop, Spark, Spark Streaming and ETL pipelines
- Strong experience in data science programming and its ecosystem
- Well versed in Python packages such as pandas, NumPy and other data science libraries
- Proven ability to master specialized skills and technologies required by the customer in a short time frame
- Able to solve complex, high-scale data lake challenges in the big data space
- Consistently met tight deadlines with on-time, issue-free task completion while picking up additional tasks
TECHNICAL SKILLS
Programming Languages: Python, PySpark, Shell Scripting, SQL, PL/SQL and UNIX Bash
Big Data: Hadoop, Sqoop, Apache Spark, Spark SQL, NiFi, Kafka, Snowflake, Cloudera, Hortonworks, AWS, GCP, Azure
Operating Systems: UNIX, Linux, Solaris, Mainframe
Databases: Oracle, DB2, Sybase, Netezza, Hive, Impala
IDE Tools: Aginity for Hadoop, PyCharm, Toad, SQL Developer, SQL*Plus, Sublime Text, vi editor
Others: AutoSys, Crontab, ArcGIS, Clarity, Informatica, Business Objects, IBM MQ, Splunk
PROFESSIONAL EXPERIENCE
Confidential
Senior Engineer - Big Data
Responsibilities:
- Worked in depth with terabytes of order data in a multi-cloud, multi-technology environment
- Converted Pig Latin code to PySpark for the customer's Best Sellers orders pipeline
- Fully automated the backfill of large volumes of historical order data from Azure to GCP
- Wrote extensive error logging with full pipeline recovery for worst-case failure scenarios
- Turned complex tasks into simple programming logic using Python, PySpark and shell scripting to process large volumes of data
- Built CI/CD pipelines using Git and other open-source tools
- Developed new Python frameworks and shell wrapper scripts to encapsulate the programming logic
Technical Environment: PySpark, Python, Azure, GCP, Hive, Shell Scripting, Pig Latin, SQL, Git
Confidential, Irving, TX
Lead Engineer - Big Data
Responsibilities:
- Created Hive tables, loaded data, wrote Hive queries, and implemented partitioning and bucketing for query optimization
- Imported data from Oracle into HDFS using Sqoop; performed full and incremental imports using Sqoop jobs
- Built highly scalable ETL data pipelines using big data and cloud technologies
- Transferred ETL data into AWS (EC2, S3 and EMR) and finally into Snowflake
- Automated repetitive tasks using Python and UNIX Bash scripting
- Used Cloudera Hue extensively to debug logs and queries
- Used Apache Spark to transform bulk data for further data munging
- Wrote Python scripts to move data across different systems
- Used the Python pandas package extensively for data handling
- Processed large volumes of data using PySpark parallel processing
- Contributed to data modeling for several applications
- Handled structured and unstructured data in multiple file formats, including Parquet, ORC, JSON and Avro
Technical Environment: Python, AWS Cloud, Cloudera, Sqoop, Hive, Apache Spark, Oracle, SQL, UNIX and Snowflake
Confidential
Lead Engineer - Big Data
Responsibilities:
- Clearly articulated the pros and cons of various technologies and platforms
- Benchmarked systems, analyzed bottlenecks and proposed solutions to eliminate them
- Worked directly with senior stakeholders across the Technology Division and their business sponsors
- Translated complex technical requirements into clear, workable designs
- Applied strategic thinking and recommended technical solutions to teams engaged in big data initiatives
- Led the team in optimizing Hive queries, reducing run time from 20 minutes to 2 minutes
- Led teams in building ETL data pipelines with PySpark
- Used Spark Streaming for real-time data processing
- Implemented big data ETL projects involving very large datasets
- Guided the team in writing Sqoop scripts to transfer data from RDBMS to Hadoop
- Recommended Apache NiFi for transferring data between ecosystems
- Wrote PySpark scripts to transform data in Hadoop and prepare it for Snowflake
- Built ETL data pipelines with pandas on Cloudera
- Proposed high-level implementation standards and applied them across the team
- Handled the end-to-end conversion of complex SAS reports into scheduled Hadoop jobs
- Handled multiple file formats, including Parquet, ORC, JSON and Avro
Technical Environment: Python, Apache Spark, Hadoop (Hortonworks), Sqoop, Hive, Kafka, NiFi, SAS and Snowflake
