Data Engineer Resume
Irving, Texas
SUMMARY
- 8 years of IT experience in analysis, design, and development with Big Data technologies such as Spark, MapReduce, Hive, YARN, and HDFS, using programming languages including Java, Scala, and Python.
- Strong experience building data pipelines and performing large-scale data transformations.
- In-depth knowledge of working with distributed computing systems and parallel processing techniques to efficiently handle Big Data.
- Firm understanding of Hadoop architecture and its components, including HDFS, YARN, MapReduce, Hive, Pig, HBase, Kafka, and Oozie.
- Strong experience building Spark applications using Scala and Python.
- Good experience troubleshooting and fine-tuning long-running Spark applications.
- Strong experience using the Spark RDD API, DataFrame/Dataset API, Spark SQL, and Spark ML for building end-to-end data pipelines (a brief PySpark sketch follows this summary).
- Good experience working with real-time streaming pipelines using Kafka and Spark Streaming.
- Strong experience working with Hive for data analysis.
- Detailed exposure to Hive concepts such as partitioning, bucketing, join optimizations, SerDes, and built-in and custom UDFs.
- Good experience automating end-to-end data pipelines using the Oozie workflow orchestrator.
- Good experience working with Cloudera, Hortonworks and AWS big data services.
- Strong experience using and integrating AWS cloud services such as S3, EMR, the Glue Data Catalog, Athena, and Redshift into data pipelines.
- Strong experience leading multiple Azure Big Data and data transformation implementations in the health domain.
- Worked with Docker-based containers for running Airflow.
- Expertise in configuring and installing PostgreSQL and Postgres Plus Advanced Server for OLTP and OLAP systems, from high-end to low-end environments.
- Experience in backup/restore of PostgreSQL databases. Strong experience in performance tuning & index maintenance.
- Detailed exposure to Azure tools such as Azure Data Lake, Azure Databricks, Azure Data Factory, HDInsight, Azure SQL Server, and Azure DevOps.
- Knowledge of setting up and maintaining Postgres master-slave clusters using streaming replication.
- Experience in analyzing, designing, and developing ETL Strategies and processes, writing ETL specifications.
- Excellent understanding of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Proficient knowledge and hands-on experience writing shell scripts on Linux.
- Experienced in requirement analysis, application development, application migration and maintenance using Software Development Lifecycle (SDLC) and Python/Java technologies.
- Excellent technical and analytical skills, with a clear understanding of design goals for OLTP development and dimensional modeling for OLAP.
- Adequate knowledge and working experience in Agile and Waterfall Methodologies.
- Defined user stories and drove the agile board in JIRA during project execution; participated in sprint demos and retrospectives.
- Completed POCs on newly adopted technologies such as Apache Airflow, Snowflake, and GitLab.
- Good interpersonal and communication skills, strong problem-solving skills, ease in exploring and adopting new technologies, and a strong team player.
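Below is a minimal, illustrative PySpark sketch of the kind of DataFrame/Spark SQL batch pipeline described above; the paths, table names, and columns are placeholder assumptions, not details from an actual engagement.

```python
# Minimal PySpark sketch of a batch pipeline; all names and paths are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sample_pipeline").getOrCreate()

# Read raw events from a partitioned source (e.g. S3 or HDFS); path is a placeholder.
events = spark.read.parquet("s3://example-bucket/raw/events/")

# DataFrame API transformation: filter, derive a date column, aggregate.
daily_usage = (
    events
    .filter(F.col("event_type") == "page_view")
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "customer_id")
    .agg(F.count("*").alias("views"))
)

# The same logic expressed in Spark SQL via a temporary view.
events.createOrReplaceTempView("events")
daily_usage_sql = spark.sql("""
    SELECT to_date(event_ts) AS event_date, customer_id, COUNT(*) AS views
    FROM events
    WHERE event_type = 'page_view'
    GROUP BY to_date(event_ts), customer_id
""")

# Write results partitioned by date for downstream Hive/Athena queries.
daily_usage.write.mode("overwrite").partitionBy("event_date") \
    .parquet("s3://example-bucket/curated/daily_usage/")
```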
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, SQL, YARN, Pig Latin, MapReduce, Hive, Sqoop, Spark, Storm, Zookeeper, Oozie, Kafka, Flume
Programming Languages: Python, PySpark, Java, JavaScript, Shell Scripting
Big Data Platforms: Hortonworks, Cloudera
AWS Platform: EC2, S3, EMR, Redshift, DynamoDB, Aurora, VPC, Glue, Kinesis, Boto3, Lambda
Operating Systems: Linux, Windows, UNIX
Databases: Netezza, MySQL, UDB, HBase, MongoDB, Cassandra, Snowflake
Development Methods: Agile/Scrum, Waterfall
IDEs: PyCharm, IntelliJ, Ambari
Data Visualization: Tableau, BO Reports, Splunk
PROFESSIONAL EXPERIENCE
Confidential, Irving, Texas
Data Engineer
Responsibilities:
- Led multiple Azure Big Data and data transformation implementations in the health domain.
- Implemented large Lambda architectures using Azure data platform capabilities such as Azure Data Lake, Azure Data Factory, HDInsight, and Azure SQL Server.
- Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (see the illustrative snippet after this role's environment line).
- Analyzed data from different sources on the Hadoop-based big data solution using Azure Data Factory, Azure Data Lake, Azure Data Lake Analytics, HDInsight, Hive, and Sqoop.
- Designed end-to-end scalable architectures on the Azure cloud platform to solve business problems using components such as HDInsight, Data Factory, Data Lake, Azure Monitor, Key Vault, Function Apps, and Event Hubs.
- Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back.
- Good experience tracking and logging end-to-end software application builds using Azure DevOps.
- Used Terraform scripts to deploy applications to higher environments.
- Involved in various SDLC phases such as development, deployment, testing, documentation, implementation, and maintenance of application software.
- Experience transporting and processing real-time streaming data using Kafka.
Environment: Azure, Data Lake, Data Factory, Event Hubs, Kafka, Function App, Key Vault, Azure SQL, Azure Monitor, Azure DevOps
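The following is an illustrative Databricks (PySpark) snippet of the Spark SQL extract/transform/aggregate work described in this role; the ADLS path, schema, and target table are hypothetical placeholders, not the actual production configuration.

```python
# Illustrative Databricks (PySpark) extract/transform/aggregate step.
from pyspark.sql import functions as F

# Databricks notebooks provide a pre-configured `spark` session;
# the ADLS Gen2 container and account below are placeholders.
claims = (spark.read
          .option("header", "true")
          .csv("abfss://raw@exampleaccount.dfs.core.windows.net/claims/"))

# Cast and aggregate to a per-member usage summary (columns are assumptions).
usage = (claims
         .withColumn("claim_amount", F.col("claim_amount").cast("double"))
         .groupBy("member_id")
         .agg(F.sum("claim_amount").alias("total_claims"),
              F.count("*").alias("claim_count")))

# Persist as a Delta table for downstream Azure SQL / reporting loads.
usage.write.format("delta").mode("overwrite").saveAsTable("curated.member_usage")
```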
Confidential, Atlanta, GA
Data Engineer
Responsibilities:
- Responsible for the design, implementation and architecture of very large-scale data intelligence solutions around big data platforms.
- Analyzed large and critical datasets using HDFS, HBase, Hive, HQL, Pig, Sqoop and Zookeeper.
- Developed multiple POCs using Spark and Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Used Amazon Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as the storage mechanism.
- Used AWS services such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
- Worked on SQL queries in dimensional data warehouses and relational data warehouses. Performed Data Analysis and Data Profiling using Complex SQL queries on various systems.
- Troubleshot and resolved data processing issues and proactively engaged in data modeling discussions.
- Worked on the RDD architecture, implementing Spark operations on RDDs and optimizing transformations and actions in Spark.
- Wrote Spark programs in Python (PySpark) for performance tuning, optimization, and data quality validation.
- Worked with Docker-based containers for running Airflow.
- Developed Kafka producers and consumers for streaming millions of events per second.
- Implemented a distributed messaging queue with Apache Kafka to integrate with Cassandra.
- Hands-on experience fetching live stream data from UDB into HBase tables using PySpark streaming and Apache Kafka (a representative sketch follows this role's environment line).
- Used Tableau to build customized interactive reports, worksheets, and dashboards.
Environment: HDFS, Python, SQL, Web Services, MapReduce, Spark, Kafka, Hive, Yarn, Pig, Flume, Zookeeper, Sqoop, UDB, Tableau, AWS, GitHub, Shell Scripting.
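Below is a hedged sketch of a PySpark Structured Streaming consumer for a Kafka topic, in the spirit of the streaming work described in this role; the broker address, topic name, payload schema, and output paths are illustrative assumptions.

```python
# PySpark Structured Streaming sketch reading from Kafka; requires the
# spark-sql-kafka connector package on the cluster. All names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka_stream_example").getOrCreate()

# Assumed JSON payload schema for incoming events.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "events-topic")
       .load())

# Kafka values arrive as bytes; parse the JSON payload into columns.
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", event_schema).alias("e"))
          .select("e.*"))

# Write micro-batches out (path is a placeholder); a foreachBatch sink could
# instead upsert into HBase or Cassandra as described above.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "/data/streams/events/")
         .option("checkpointLocation", "/data/streams/_checkpoints/events/")
         .start())
query.awaitTermination()
```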
Confidential, California
Hadoop Engineer
Responsibilities:
- Designed and developed applications on the data lake to transform data according to business users' requirements for analytics.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Worked with different file formats such as CSV, TXT, and fixed-width to load data from various sources into raw tables.
- Conducted data model reviews with team members and captured technical metadata through modelling tools.
- Implemented ETL processes; wrote and optimized SQL queries to perform data extraction and merging from a SQL Server database.
- Loaded logs from multiple sources into HDFS using Flume.
- Worked with NoSQL databases like HBase in creating HBase tables to store large sets of semi-structured data coming from various data sources.
- Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.
- Developed complex MapReduce jobs for performing efficient data transformations.
- Performed data cleaning, pre-processing, and modeling using Java MapReduce.
- Strong experience writing SQL queries.
- Responsible for triggering jobs using Control-M.
Environment: Java, SQL, ETL, Hadoop, HDFS, HBase, MySQL, Netezza, Web Services, Shell Script, Control-M.
Confidential, Jacksonville FL
Lead ETL Developer
Responsibilities:
- Worked on Trade Transaction Management Projects Like Fixed Income Business Automation, Mortgage Based Securities and Consolidated Data Repository (CDR).
- Created various ETL jobs in SSIS, namely ICITrades, ICISecurityData, Bloomberg Trades, Global Transaction Trades, Factors, and Financial Monitoring Units, using Integration Catalogue 2014.
- Created various real-time and batch job schedules in Autosys. Used databases such as Oracle and Teradata.
- Created various SSRS reports, including Broker Dealer Liquidity Forecasting reporting for FINRA 4210, Legal/Regulatory and Unclassified report enhancements, Extended Settlements reports, FINRA Master Summary reports, and LOB Monitoring reports.
- Created SSIS packages to extract, transform, and load (ETL) data from different sources into destination targets such as the data warehouse, flat files, Excel, and OLE DB using SQL Server Integration Services.
- Involved in monitoring the workflows and in optimizing the load times.
- Handled Performance Tuning and Optimization on SSIS and MDX, with strong analytical and troubleshooting skills for quick issue resolution in large-scale production environments located globally.
- Involved in Tabular data warehouse and DAX operations for SSAS 2012 and 2008 OLAP databases.
Environment: SQL Server 2012, Perl, JSON, T-SQL, SSIS, SSRS, SSAS, MDX queries.
Confidential, Jacksonville FL
ETL Developer
Responsibilities:
- Worked on various SSRS report projects, including Green Sheet Capacity Planning Management reports, Release Calendar reports, Quality and Chargeback reports, and dashboards on capacity vs. estimates, using SQL Server 2012 Reporting Services.
- Created various Chargeback Reports for Quality Center Team.
- Created various data-driven subscriptions for various reports.
- Created various SSIS ETL packages for retrieving and updating the Global Directory; loading QC and project, region, and Cost Center/GOC data; loading actuals and estimates from an external system (Plainview); and loading various SSAS dimension and fact tables. Moved data from production to development, UAT, and staging environments.
- Worked on the Signing Party Requests application (MVC), which provides the interface to create and maintain weekly requests for Travel, Headcount, and other items, and provides an automated workflow solution; these requests are reviewed by the GCT Chief Information Officer for exception approval. Tools used: ASP.NET, WCF web services, C#/LINQ, AJAX, jQuery, JSON for validation rules, and Infragistics web controls.
- Worked on some modules of the TCAP Windows application to maintain and monitor estimates for resources, including Resources Time Off and Integration of Reports.
- Created various Dashboards for the Quality Center.
- Imported data to SQL Server from Teradata and Oracle Databases.
- Created SSAS cubes for estimates vs. actuals and for Quality Center Management Statistics analytical dashboard reports.
Environment: SQL Server 2012, jQuery, JSON, T-SQL, C#, SSIS, SSRS, SSAS, MDX queries.