Big Data Developer/Data Engineer Resume
SUMMARY:
- 4+ years of strong experience working with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Kafka, Spark, and Python (PySpark) on CDH 4 and 5 distributions.
- Over 20 years of extensive experience in the design, development, implementation, and support of data warehousing, cloud data handling, unstructured data, data modeling, and data migration, with experience in the Hadoop stack.
- Experience migrating applications to the cloud and handling unstructured data.
- Hands-on experience with AWS cloud services (VPC, EC2, S3, IAM, Redshift, EMR, Lambda, SNS).
- Expertise in data warehousing/ETL (DataStage) programming and fulfilment of data warehouse project tasks such as data extraction, cleansing, aggregation, validation, transformation, and loading.
- Solid understanding of Scala/Python programming and development.
- Diverse experience working with a variety of databases such as Teradata, Oracle, and Netezza.
- Experience tuning Hive queries to improve throughput.
- Extensively used the Scala, Spark SQL, and Python APIs to query and transform data in Hive using DataFrames.
- Experienced with event-driven and scheduled AWS Lambda functions that trigger various AWS resources.
- Experience with version control and source code management tools such as SVN.
- Expert in query tuning, performance optimization, and implementing workload management.
- Good knowledge of scripting languages such as Linux/Unix shell scripting and Python.
- Excellent experience in different project stages such as requirement analysis, systems analysis, performance tuning, data loading, data extraction, and data transformation.
- Strong skills in coding and debugging Teradata utilities such as FastLoad, FastExport, MultiLoad, and TPump for processing large volumes of data.
- Proven expertise in query tuning and performance improvement of Data Warehousing applications.
TECHNICAL SKILLS:
Extensive domain knowledge in the hospitality, retail, and telecom industries, and proficient in:
Big Data technologies: Hadoop, HDFS, Hive, HBase, Hue, Zeppelin, Apache Spark, Docker, YARN, Kafka, Oozie, Avro, Ambari, JSON, Pig, NoSQL
Programming languages/Paradigms: Python, Scala, Java, C/C++, Shell Scripting, DevOps/CICD
AWS Cloud Services: S3, EBS, EC2, VPC, ELB, RDS, SQS, Lambda, Amazon EMR, DynamoDB, Aurora DB
Databases: Teradata, Oracle, DB2, SQL Server
Version Control Tools: GitLab, GitHub, TFS, ClearCase
Operating Systems: Windows, AIX, Linux/Unix flavors
Agile Development practices:
ETL Tools: DataStage, Informatica
PROFESSIONAL EXPERIENCE:
Big Data Developer/Data Engineer
Confidential
Responsibilities:
- Consumed data fed through Kafka streams
- Developed Spark programs for processing and loading data to HDFS
- Configured Lambdas to be triggered by CloudWatch events
- Created SSM parameters to be used by Lambda functions
- Created DDL/DML statements for Aurora DB
- Worked extensively on Hadoop components such as HDFS, NameNode, DataNode, and YARN, and on Spark programming.
- Analyzed data to be loaded into Hadoop and contacted the respective source teams to obtain table information and connection details.
- Designed, developed, and implemented ETL pipelines on AWS using the Spark Python API (PySpark), writing reusable, testable, and efficient code.
- Used Sqoop to import data from different RDBMS systems like Oracle, DB2 and Netezza and loaded into HDFS.
- Used RDDs to perform transformations and actions on datasets.
- Strong experience writing applications in Scala and Python (PySpark) using libraries such as Pandas and NumPy.
- Created Hive tables and partitioned data for better performance. Implemented Hive UDFs and tuned query performance.
- Developed Oozie workflows to manage and schedule jobs on the Hadoop cluster, triggering daily, weekly, and monthly batch cycles.
- Strong knowledge of data warehouses, RDBMS, and MPP databases, including query
- Exported the analyzed data from Hive tables to SQL databases using Sqoop for visualization and to generate reports for the BI team.
- Created partitioned tables in Hive, and managed and reviewed Hadoop log files.
- Used Bash scripts to validate files moved from Unix to HDFS file systems.
Skills: Python, Java, Hadoop, HDFS, GitLab, Kafka, Spark, YAML, Kubernetes/EKS, Aurora DB, Lambda
DW Consultant
Confidential, Orlando, FL
Responsibilities:
- Migrated Simba On-Prem data to Hadoop using Hadoop stack.
- Designed and implemented Hive tables for Simba Tactical table structures over HDFS and HBase.
- Developed and tested Spark and Kafka jobs in Sandbox and Latest PDP environments.
- Used Zeppelin notebooks, Ambari, Yarn for execution and monitoring of Cloud jobs and data validation.
- Implemented QueryGrid views for accessing Cloud data via Teradata interfaces.
- Created and managed access to HDFS folders for project team in lower environment.
- Made changes related to migration of Merchandise Forecast Planning source feeds to the Cloud.
- Analyzed the new version changes and modified and tested the Semantic code.
- Wrote Python jobs for extracting source data with Data Mover component.
- Made necessary configuration and scheduling changes for implementing the new SIMBA in the cloud system.
- When the project took the on-prem route for final implementation, used system and Python knowledge to modify the existing jobs and meet tight deadlines.
- Integrated OmniCart shopper data using SAS and Hadoop for CCP consumption
- $1M reduction in annual maintenance expense
- Significantly higher computing and storage capacity, enabling new capabilities and projects in FY17 and beyond
- Lightning-fast (up to 10x) performance for capabilities used by business partners (e.g., faster reports and batch jobs)
- Significant efficiency & productivity gains by consolidating 2 data warehousing platforms (D3 & XI) into one (D3P)
- Volume and revenue are now available for realized room nights as well as rooms on the books for future arrivals; before this project, only volume for future arrivals was available.
- Life cycle stages (walks, upgrades, and upsells) are now identifiable for each room, a new capability.
- Integrated Convention Space Sales data with Dreams groups data for more robust data around groups, a new capability.
- Improved actual room count accuracy; accurate counts are now available.
- Worked on application development and support for Dreams/Lilo, NGE, Accovia, DME and HRMS systems.
- Worked with the Financial Systems and Lodging line-of-business teams at Walt Disney Parks and Resorts (WDPR) to research reporting requirements and data queries.
- Developed ETL processes to enhance and support the D3 data warehouse.
- Supported interfaces from D3 to Dreams/Lilo, GoMaster, Guest Data Mart (GDM), NGE, Accovia, DME, and Revenue Management systems.
- Worked on performance tuning of load jobs and reporting queries.
- Performed build activities to match Lilo changes in D3.
- Wrote scripts for maintaining indexes and statistics on Teradata tables as part of Build activities.
- Designed, coded and implemented D3 ETL functionality to integrate new DME system data and DTR enhancements made to the Composite
ETL Architect
Confidential, Lakeland, FL
Responsibilities:
- Prepared source to target mapping documents as per business requirements and physical data model.
- Used DataStage stages such as Teradata API, Teradata Enterprise, and Teradata MLoad to load transformed data into the Teradata database (EDW) based on the table type.
- Tuned Teradata queries on the target tables to enhance ETL and BI performance.
- Used Partitioned Primary Indexes, Secondary Indexes, Join Indexes, etc. to enhance performance.
- Created Shell scripts to automate the data load processes to the target Data Warehouse.
Environment: IBM DataStage, Teradata, Teradata utilities, Oracle, Business Objects, DB2, SQL Assistant, Shell Scripting, Perl, AIX 5.2 and ERwin.
Sr. Software Engineer
Confidential, Richardson, TX
Responsibilities:
- Designed and implemented Bulk NE Addition feature to add multiple NEs of different NE Model types to CTM.
- Implemented CTM’s proprietary notification mechanism for events and alarms.
- Designed and implemented IPS (intelligent protection switching) module according to Resilient Packet Ring (RPR) specification.
Skills: C/C++, Java, XML, CORBA, VisiBroker, JacORB, JProbe, Oracle, Toad, OOAD, ClearCase, and Rational Rose
Software Engineer
Confidential, Clearwater, FL
Responsibilities:
- Worked on an embedded platform for building test equipment for telecom networks ranging from DS1, DS3, SDH, and SONET to ATM, at rates up to 10G. A CORBA server provided remote access to this platform.
Skills: C/C++, CORBA, VisiBroker, RogueWave, Java, OOAD, PVCS, Rational Rose, UML and Win NT-E.
