Big Data Engineer Resume
Atlanta, GA
SUMMARY
- 7 years of technical experience in data analysis and data modeling, addressing clients' business needs, developing effective and efficient solutions, and ensuring client deliverables are met within committed timelines.
- Experience in Python, Scala, C, and SQL, as well as a thorough understanding of Distributed Systems Architecture and Parallel Processing Frameworks.
- Constructing and manipulating large datasets of structured, semi-structured, and unstructured data and supporting systems application architecture using tools like SAS, SQL, Python, R, MiniTab, PowerBI, and more to extract multi-factor interactions and drive change.
- Created scripts using the S3 CLI tools to create new snapshots and delete existing snapshots in S3.
- Expertise with Hive data warehouse architecture, including table creation, data distribution using Partitioning and Bucketing, and query development and tuning in Hive Query Language.
- Deep knowledge and strong deployment experience in Hadoop and Big Data ecosystems: HDFS, MapReduce, Spark, Pig, Sqoop, Hive, Oozie, Kafka, ZooKeeper, and HBase.
- Good understanding of client-server architecture for developing applications using TCP and UDP.
- Used Software methodologies like Agile, Scrum, TDD, and Waterfall.
- Experience in optimizing volumes and EC2 instances, creating multiple VPC instances, and creating alarms and notifications for EC2 instances using CloudWatch.
- Good experience in developing desktop and web applications using Java, Spring, JDBC, Eclipse, React.
- Experienced in leading enhancement, architecture, and ongoing evolution using a wide array of technologies (Spark, Python, GoLang, Apigee, Delta Lake, Databricks, Kafka, as well as more traditional technologies such as MuleSoft and SQL) across the Amazon cloud environment.
- Knowledge of job workflow management and monitoring tools like Oozie and ZooKeeper.
- Proficient in writing Bash, Perl, and Python scripts to automate and provide control flow.
- Knowledge of current trends in data technologies, data services, data virtualization and data integration.
- Experience in moving data into and out of the HDFS and Relational Database Systems (RDBMS) using Apache Sqoop.
- Created logical and physical data models using Erwin, based on requirement analysis.
- Expertise in AWS resources such as EC2, S3, EBS, VPC, ELB, AMI, SNS, RDS, IAM, Route 53, Auto Scaling, CloudFormation, CloudWatch, and Security Groups.
- Significant experience writing custom UDFs in Hive and custom Input Formats in MapReduce.
- Used various Hadoop distributions (Cloudera, Hortonworks, Amazon EMR) to fully implement and leverage new Hadoop features.
- Created Hive tables, loaded data into them, and wrote ad hoc Hive queries that run internally on MapReduce and Spark.
- Proficient in using Linux CLI.
- Strong experience building end-to-end data pipelines on the Hadoop platform.
- Experience working with NoSQL database technologies, including MongoDB, Cassandra, and HBase.
- Developed Spark applications using Spark RDD, Spark SQL, and DataFrame APIs.
- Strong communication and interpersonal skills.
TECHNICAL SKILLS
PROGRAMMING: Python | R | SAS Programming | SQL | Scala | Shell Scripting
DATABASE DESIGN TOOLS: MS Visio | Fact and Dimension tables | Normalization and De-normalization techniques | Kimball and Inmon Methodologies
DATA MODELLING TOOLS: Erwin Data Modeler and Manager | ER Studio v17 | Physical and logical data modeling
ETL/DATA WAREHOUSE TOOLS: Informatica PowerCenter | Talend | Tableau | Pentaho | SSIS | DataStage
QUERYING LANGUAGES: SQL | NoSQL | PostgreSQL | MySQL | Microsoft SQL | Spark-SQL | Sqoop 1.4.4
DATABASES: AWS RDS | Teradata | Hadoop FS | SQL Server | Oracle | Netezza | Microsoft SQL | DB2
NOSQL DATABASES: MongoDB | Hadoop HBase | Apache Cassandra
CLOUD TECHNOLOGIES: AWS | Azure
HADOOP ECOSYSTEM: Hadoop | MapReduce | Yarn | HDFS | Kafka | Storm | Pig | Oozie | Zookeeper
BIGDATA ECOSYSTEM: Spark | Spark SQL | Spark Streaming | PySpark | Hive | Impala
INTEGRATION TOOLS: Git | Gerrit | Jenkins | Ant | Maven
STREAMING: Flume 1.6 | Spark Streaming | Streaming Analytics
METHODOLOGIES: Agile | Scrum | Waterfall | UML
FAMILIAR: Microsoft Office | GitHub | Bitbucket | Slack
PROFESSIONAL EXPERIENCE
Confidential, Atlanta, GA
Big Data Engineer
Responsibilities:
- Involved in impact assessment, covering schedule changes, dependency impact, and code changes, for various change requests on the existing Data Warehouse applications running in a production environment.
- Developed Data Mapping, Data Governance, Transformation, and Cleansing rules for the Master Data Management Architecture involving OLTP, ODS, and OLAP.
- Worked on performance tuning of the database, including indexing and optimizing SQL statements.
- Created tables, views, sequences, triggers, tablespaces, constraints, and generated DDL scripts for physical implementation.
- Worked on Optimization of the application and the designing of the database tables with the right partitioning keys using the DPF feature of hash partitioning and range partitioning.
- Used SQL for Querying the database in the UNIX environment.
- Experience in Project development and coordination with onshore-offshore ETL/BI developers & Business Analysts.
- Performed various data analyses at the source level and determined the key attributes for designing Fact and Dimension tables using star schema for an effective Data Warehouse and Data Mart.
- Used Model Mart of Erwin for effective model management of sharing, dividing, and reusing model information and design for productivity improvement.
- Worked at conceptual/logical/physical data model level using Erwin according to requirements.
- Worked closely with the Business Analytics manager, who was also a part of the design/data modeling team.
- Excellent Data Analytical / User interaction and Presentation Skills.
- Performed data analysis and data profiling using complex SQL on various source systems, including Oracle, SQL Server, and DB2.
- Analyzed database performance with SQL Profiler and Optimized indexes to significantly improve performance.
- Created logical and physical data models using Erwin and reviewed these models with the business team and data architecture team.
- Performed data mining using complex SQL queries to discover patterns.
- Performed data management projects and fulfilled ad hoc requests according to user specifications by utilizing data management software programs and tools like Perl, Toad, MS Access, Excel, and SQL.
- Prepared scripts for model and data migration from DB2 to the new Appliance environments.
- Ensured onsite to offshore transition, QA Processes, and closure of problems & issues.
- Experienced in creating UNIX scripts for file transfer and file manipulation.
Confidential, San Antonio, TX
Sr. Data Engineer
Responsibilities:
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Designed and implemented scalable infrastructure and platform for large amounts of data ingestion, aggregation, integration, and analytics in Hadoop, including Spark, Hive, and HBase.
- Built scalable distributed data solutions using Hadoop.
- Creating MapReduce programs to enable data for transformation, extraction, and aggregation of multiple formats like Avro, Parquet, XML, JSON, CSV, and other compressed file formats.
- Developing an architecture to move the project from Ab Initio to PySpark and Scala Spark.
- Used Python and Scala daily to perform transformations that apply business logic.
- Setting up an HBase column-based storage repository for archiving data on a daily basis.
- Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool, in both directions.
- Used the enterprise data lake to support use cases including analytics, storage, and reporting on voluminous, rapidly changing structured and unstructured data.
- Involved in designing, developing, testing, and documenting an application to combine personal loan, credit card, and mortgage data from different countries and load it from the Hive database into a Sybase database for reporting insights.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
- Writing Hive Queries in Spark-SQL for analysis and processing of the data.
- Using Sqoop to load data from HDFS, Hive, MySQL, and many other sources on a daily basis.
- Creating job workflows using the Oozie scheduler.
- Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
- Converting data load pipeline algorithms written in Python and SQL to Scala Spark and PySpark.
Confidential, Providence, RI
Data Engineer
Responsibilities:
- Involved in building a scalable distributed data lake system for Confidential's real-time and batch analytical needs.
- Involved in designing, reviewing, optimizing data transformation processes using Apache Storm.
- Experienced in job management using Fair Scheduling; developed job processing scripts using Control-M workflows.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications, including setting the right batch interval time, the correct level of parallelism, and memory tuning.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capacities of Spark using Scala.
- Imported data from Kafka consumers into HBase using Spark Streaming.
- Experienced in using Zookeeper and Oozie Operational Services for coordinating the cluster and scheduling workflows.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, effective and efficient joins, transformations, and other techniques during the ingestion process itself.
- Worked on migrating legacy MapReduce programs into Spark transformations using Spark and Scala.
- Worked on a POC to compare processing time for Impala with Apache Hive for batch applications to implement the former in the project.
- Worked extensively with Sqoop for importing metadata from Oracle.
Confidential
PL/SQL Developer
Responsibilities:
- Using SQL Navigator, created advanced PL/SQL packages, procedures, triggers, functions, indexes, and collections to implement business logic. For the remote instance, generated server-side PL/SQL scripts for data manipulation and validation, as well as materialized views.
- Involved in determining the main systems' process flow, workflow, and data flow.
- Used exception handling extensively to handle errors, making debugging and displaying error messages in the program easier.
- Built high-performance data integration solutions, including extraction, transformation, and load packages for data warehousing, using the SQL Server SSIS tool. Extracted data from XML files and loaded it into the database.
- Tested all forms and PL/SQL code for logical correctness.
- Performed data analysis for data conversion, including data mapping from source to target database schemas, writing specifications, and building data extract scripts and data conversion programs in test and production environments.
- Extensive learning and development activities.
- Administered all database objects, including tables, clusters, indexes, views, sequences, packages, and procedures.
- Involved in the full development cycle of Planning, Analysis, Design, Development, Testing, and Implementation.
- Designed and created all the tables and views for the system in Oracle.
- Involved in building, debugging, and running forms.
- Involved in Data loading and Extracting functions using SQL*Loader.
- Developed PL/SQL triggers and master tables for automatic creation of primary keys.
- Designed and developed Oracle forms and reports, generating up to 60 reports.
- Designed and developed form validation procedures for querying and updating data.
