
Sr Data Engineer Resume


Chicago, IL

OBJECTIVE

  • To work in a challenging environment where I can demonstrate my efficiency, intellect, and software engineering skills. Adept IT professional with 7+ years of professional IT experience in Data Warehousing/Big Data, including 4 years with Big Data ecosystem technologies such as Hadoop, MapReduce, Pig, Hive, and Spark, as well as data visualization, reporting, and data quality solutions.

SUMMARY

  • Knowledge in installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Spark, Kafka, Storm, Zookeeper, and Flume.
  • Experience in installing, configuring, supporting, and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Experience in implementing Azure data solutions: provisioning storage accounts, Azure Data Factory, SQL Server, SQL Databases, SQL Data Warehouse, Azure Databricks, and Azure Cosmos DB.
  • Good experience designing cloud-based solutions in Azure by creating Azure SQL databases, setting up Elastic Pool jobs, and designing tabular models in Azure Analysis Services.
  • Experienced with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses.
  • Expertise in building CI/CD on AWS using AWS CodeCommit, CodeBuild, CodeDeploy, and CodePipeline, and experience using AWS CloudFormation, API Gateway, and AWS Lambda to automate and secure infrastructure on AWS.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, Hive, Pig, Sqoop, JobTracker, TaskTracker, NameNode, and DataNode.
  • Expert in designing Parallel jobs using various stages like Join, Merge, Lookup, remove duplicates, Filter, Dataset, Lookup file set, Complex flat file, Modify, Aggregator, XML.
  • Good knowledge in Database Creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2, MongoDB, HBase and SQL Server databases.
  • Experience in installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as CDH4 and CDH5.
  • Involved in the design and development of ETL processes loading benefits and offers data into the data warehouse from different sources.
  • Strong documentation and knowledge-sharing skills: conducted data modeling sessions for different user groups, facilitated common data models between applications, and participated in requirements sessions to identify logical entities.
  • Extensive experience in relational Data modeling, Dimensional data modeling, logical/Physical Design, ER Diagrams and OLTP and OLAP System Study and Analysis.
  • Extensively used ERWIN and PowerDesigner to design logical and physical data models, forward- and reverse-engineer data models, and publish data models to Acrobat PDF files.
  • Extensive knowledge and experience in producing tables, reports, graphs, and listings using various procedures, and in handling large databases to perform complex data manipulations.
  • Excellent knowledge in preparing required project documentation and in tracking and regularly reporting project status to all project stakeholders.
  • Experience in UNIX shell scripting for processing large volumes of data from varied sources and loading into databases like Teradata
  • Proficient in data modeling techniques using Star Schema, Snowflake Schema, fact and dimension tables, RDBMS, and physical and logical data modeling for Data Warehouses and Data Marts.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
  • Highly skilled in using visualization tools such as Tableau, ggplot2, Dash, Power BI, and Flask for creating dashboards.
  • Experience with Core Java, J2EE, IO Streams, Struts, ANT, Log4j, JUnit, JDBC, Servlets, JSP, Exception Handling, Multithreading, Java Beans, JNDI, XML/XSL, Socket programming, HTML, JavaScript, CSS, UML.
  • Experience with AngularJS, Node.js, MongoDB, GitHub, Git, Amazon AWS, EC2, S3, and CloudFront.
  • Excellent Java development skills using J2EE, Spring, J2SE, Servlets, JUnit, MRUnit, JSP, and JDBC.
  • Excellent Project implementation skills using Core Java, Java Beans, J2EE (JSP, Servlets), EJB, JMS, JNDI, JSF, Struts, Spring, Hibernate, JDBC, XML, Web Services and Design Patterns.
  • Experience in application development using Java, RDBMS, TALEND and Linux shell scripting and DB2.

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop Map Reduce, Impala, HDFS, Hive, Pig, HBase, Flume, Storm, Sqoop, Oozie, Kafka, Spark and Zookeeper

Hadoop Distributions: Apache Hadoop 2.x/1.x, Cloudera CDP, Hortonworks HDP, Amazon EMR (EMR, EC2, EBS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, DynamoDB, Redshift, ECS, QuickSight), Azure HDInsight (Databricks, Data Lake, Blob Storage, Data Factory (ADF), SQL DB, SQL DWH, Cosmos DB, Azure AD)

Programming Languages: Python, R, Scala, SAS, Java, SQL, HiveQL, PL/SQL, UNIX shell Scripting, Pig Latin

Machine learning: Linear Regression, Logistic Regression, Decision Tree, Random Forest, SVM, XGBoost, GBM, CatBoost, Naïve Bayes, PCA, LDA, K-Means, KNN, Neural Network

Deep Learning: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), LSTM, GRUs, GANs.

Databases: Snowflake, MySQL, Teradata, Oracle, MS SQL SERVER, PostgreSQL, DB2

NoSQL Databases: HBase, Cassandra, Mongo DB, DynamoDB and CosmosDB

Version Control: Git, SVN, Bitbucket

ETL/BI: Informatica, SSIS, SSRS, SSAS, Tableau, Power BI, QlikView, Arcadia, Erwin

Operating System: Mac OS, Windows 7/8/10, Unix, Linux, Ubuntu

Methodologies: RAD, JAD, UML, System Development Life Cycle (SDLC), Jira, Confluence, Agile, Waterfall Model

PROFESSIONAL EXPERIENCE

Confidential, Chicago, IL

Sr Data Engineer

Responsibilities:

  • Consulted with leadership and stakeholders to share design recommendations, identify product and technical requirements, resolve technical problems, and propose Big Data-based analytical solutions.
  • Designed and implemented a real-time data pipeline to process ~150 billion raw records integrated from multiple data sources using SQL, SnowSQL, and Jenkins, and stored the processed data in Snowflake.
  • Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for small data set processing and storage, and maintained the Hadoop cluster on AWS EMR.
  • Architected and implemented medium to large scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Bigdata technologies. Extracted Mega Data from Amazon Redshift, AWS, and Elastic Search engine using SQL Queries to create reports.
  • Good experience working with Azure Blob and Data Lake storage and loading data into Azure Synapse Analytics (SQL DW).
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back to the source systems.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using SQL activities.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (see the PySpark sketch after this list).
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, choosing the correct level of parallelism, and memory tuning.
  • Used Azure Data Lake, Azure Data Factory, and Azure Databricks to move and conform data from on-premises to the cloud to serve the analytical needs of the company.
  • Involved in creating UNIX shell scripts for database connectivity and executing queries in parallel job execution.
  • Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle) for product level forecast. Extracted the data from Teradata into HDFS using Sqoop.
  • Controlled and granted database access, and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Worked on the Kafka REST API to collect and load data onto the Hadoop file system, and used Sqoop to load data from relational databases.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in Parquet format in HDFS.
  • Analyzed existing systems and proposed process and system improvements using modern scheduling tools such as Airflow, and migrated legacy systems into an enterprise data lake built on Azure Cloud.
  • Instantiated, created, and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications; worked with automation tools such as Git, Terraform, and Ansible.
  • Created a data pipeline package to move data from an Amazon S3 bucket to a MySQL database, and executed MySQL stored procedures using events to load data into tables.
  • Built an AWS CI/CD data pipeline and an AWS data lake using EC2, AWS Glue, and AWS Lambda.
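
Illustrative sketch (not from the original project) of the PySpark extraction/aggregation pattern described above; the ADLS paths, storage account, and column names are assumptions for illustration.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("usage-aggregation")
         .getOrCreate())

# Read raw events landed in the data lake (path and schema are illustrative).
raw = spark.read.json("abfss://raw@examplestorage.dfs.core.windows.net/events/")

# Basic cleansing plus a Spark SQL-style aggregation of customer usage.
usage = (raw
         .filter(F.col("event_type").isNotNull())
         .withColumn("event_date", F.to_date("event_ts"))
         .groupBy("customer_id", "event_date")
         .agg(F.count("*").alias("event_count"),
              F.sum("bytes_processed").alias("total_bytes")))

# Persist the conformed output as partitioned Parquet for downstream analytics.
(usage.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("abfss://curated@examplestorage.dfs.core.windows.net/usage_daily/"))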

Environment: Databricks, Azure Synapse, Cosmos DB, ADF, SSRS, Power BI, Azure Data Lake, ARM, Azure HDInsight, Blob Storage, Apache Spark, AWS, Azure ADF V2, ADLS, Spark SQL, Python/Scala, Ansible scripts, Azure SQL DW (Synapse), Azure SQL DB

Confidential, Miami, FL

Sr.Big Data Developer

Responsibilities:

  • As a Big Data Developer, implemented solutions for ingesting data from various sources and processing data-at-rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, MongoDB, Hive, Oozie, Flume, Sqoop, and Talend.
  • Processed data into HDFS by developing solutions, analyzed the data using MapReduce, Pig, and Hive, and produced summary results from Hadoop for downstream systems.
  • Developed data pipelines using Sqoop, Pig and Hive to ingest customer member data, clinical, biometrics, lab and claims data into HDFS to perform data analytics.
  • Managed and supported enterprise Data Warehouse operation, big data advanced predictive application development using Cloudera & Hortonworks HDP.
  • Implemented a CI/CD pipeline with Docker, Jenkins (TFS plugin installed), Team Foundation Server (TFS), GitHub, and Azure Container Service; whenever a new TFS/GitHub branch is started, Jenkins, our continuous integration (CI) server, automatically attempts to build a new Docker container from it.
  • Involved in designing and deploying a Hadoop cluster and different Big Data analytical tools including Pig, Hive, HBase, Oozie, Zookeeper, Sqoop, Flume, Spark, Impala, and Cassandra with the Hortonworks distribution.
  • Designed, built, and maintained Big Data workflows to process millions of records using Oozie workflows and Data Stage, and maintained tables in HDFS using partitions in encrypted Parquet format.
  • Worked on developing ETL processes (Data Stage Open Studio) to load data from multiple data sources to HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
  • Developed Spark scripts and UDFs using both the Spark DSL and Spark SQL queries for data aggregation and querying, and wrote data back into RDBMS through Sqoop.
  • Experienced in using Zookeeper and OOZIE Operational Services for coordinating the cluster and scheduling workflows.
  • Implement solutions for ingesting data from various sources and processing the Datasets utilizing Big Data technologies such as Hadoop, Hive, Kafka, Map Reduce Frameworks and HBase.
  • Effective working relationships with client team to understand support requirements, and effectively manage client expectations. Ability to learn new technologies quickly.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Used Sqoop to import data from RDBMS to the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
  • Created numerous pipelines in Azure using Azure Data Factory v2 to get data from disparate source systems using different Azure activities such as Move & Transform, Copy, Filter, ForEach, and Databricks.
  • Built pipelines to move hashed and un-hashed data from Azure Blob to Data Lake.
  • Implemented Hive queries to aggregate the data and extract useful information by sorting the data according to the required attributes (see the Spark-on-Hive sketch after this list).
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing, analyzing, and training the classifier using MapReduce jobs, Pig jobs, and Hive jobs.
  • Developed data pipeline using Kafka, Sqoop, Hive and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, effective and efficient joins, transformations, and other operations during the ingestion process itself.
  • Worked on partitioning HIVE tables and running the scripts in parallel to reduce run-time of the scripts.
  • Written multiple MapReduce programs to power data for extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed file formats.
  • Used Pig to do transformations, event joins, filtering of bot traffic, and some pre-aggregations before storing the data onto HDFS.
  • Developed Python scripts for Spark to perform operations like Data inspection, Cleaning, Loading and Transformation for Large Datasets of Structured and semi-structured imported data.
  • Applied MapReduce framework jobs in java for data processing by installing and configuring Hadoop, HDFS.
  • Worked on analyzing Hadoop clusters and different big data analytic tools including Pig, the HBase database, and Sqoop.
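
A small illustrative sketch of the partitioned-Hive aggregation pattern referenced above, run through Spark with Hive support; the database, table, column, and partition names are hypothetical.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("claims-summary")
         .enableHiveSupport()   # lets Spark read Hive-managed, partitioned tables
         .getOrCreate())

# Filtering on the partition column (load_date) lets Hive/Spark prune partitions
# instead of scanning the whole table, which keeps parallel runs fast.
summary = spark.sql("""
    SELECT member_id,
           COUNT(*)       AS claim_count,
           SUM(claim_amt) AS total_claim_amt
    FROM   claims_db.claims
    WHERE  load_date = '2020-01-01'
    GROUP  BY member_id
""")

# Publish the summary results for downstream systems.
summary.write.mode("overwrite").saveAsTable("claims_db.claims_daily_summary")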

Environment: Hadoop, Spark, HDFS, Hive, Pig, HBase, Big Data, Apache Storm, Oozie, Sqoop, Kafka, Flume, Zookeeper, MapReduce, Cassandra, Scala, Linux, NoSQL, MySQL Workbench, Java, Eclipse, Oracle 10g, SQL

Confidential, Houston, TX

Data Engineer

Responsibilities:

  • Designed and constructed AWS data pipelines using various AWS resources, including AWS API Gateway to receive responses from AWS Lambda functions that retrieve data from Snowflake and return it in JSON format, along with DynamoDB and AWS S3 (a Lambda sketch follows this list).
  • Developed and implemented data acquisition jobs using Scala, Sqoop, Hive, and Pig, optimizing MapReduce jobs to use HDFS efficiently through various compression mechanisms, orchestrated with Oozie workflows.
  • Designed and developed Spark workflows using Scala to pull data from AWS S3 buckets and Snowflake and apply transformations on it.
  • Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for small data set processing and storage, and maintained the Hadoop cluster on AWS EMR.
  • Analyzed large and critical datasets using Cloudera, HDFS, MapReduce, Hive, Hive UDF, Pig, Sqoop and Spark.
  • Used Git version control to manage the source code and integrating Git with Jenkins to support build automation and integrated with Jira to monitor the commits.
  • Written Terraform scripts to automate AWS services which include ELB, CloudFront distribution, RDS, EC2, database security groups, Route 53, VPC, Subnets, Security Groups, and S3 Bucket and converted existing AWS infrastructure to AWS Lambda deployed via Terraform and AWS CloudFormation.
  • Worked on setting up a data lake and data catalog on AWS Glue.
  • Set up data models and created a data lake on S3 queried through AWS Athena for visualization in AWS QuickSight.
  • Hands-on experience in Hadoop, Hive, Pig, Sqoop, Kafka, AWS EMR, AWS S3, AWS Redshift, Oozie, Flume, HBase, Hue, HDP, IBM Mainframes, HP NonStop, and RedHat 5.6.
  • Worked on Snowflake schemas and data warehousing, and processed batch and streaming data load pipelines using Snowpipe and Matillion from the Confidential data lake AWS S3 bucket.
  • Responsible for the design, development, and administration of complex T-SQL queries (DDL/DML), stored procedures, views, and functions for transactional and analytical data structures.
  • Identify and interpret trends and patterns in large and complex datasets. Analyze trends in key metrics
  • Involved in migrating tables from RDBMS into Hive tables using SQOOP and later generated data visualizations using Tableau.
  • Collaborated with Data engineers and operation team to implement ETL process, Snowflake models, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Involved in Writing Detailed Level Test Documentation for reports and Universe testing. Involved in developing detailed Test strategy, Test plan, Test cases and Test procedures using Quality Center for Functional and Regression Testing.
  • Interfacing with business customers, gathering requirements and creating data sets/data to be used by business users for visualization.
  • Experience in migrating Enterprise Data (Trust data) and staging procedures from Microsoft SQL to AWS Redshift using AWS Glue, S3
  • Demonstrated expertise in data modeling, ETL development, and data warehousing per needs of project requirements.
  • Used AWS Data Pipeline for data extraction, transformation, and loading from homogeneous or heterogeneous data sources, and built various graphs for business decision-making using the Python Matplotlib library.
  • Deployed the code to EMR via CI/CD using Jenkins.
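
A hedged sketch of the API Gateway -> Lambda -> Snowflake pattern described in the first bullet of this list; the environment variable names, query, and table are illustrative assumptions, and the Snowflake Python connector is assumed to be bundled with the function.

import json
import os

import snowflake.connector  # assumed to be packaged in the deployment bundle or a layer


def lambda_handler(event, context):
    # Connection settings supplied via Lambda environment variables (names are illustrative).
    conn = snowflake.connector.connect(
        account=os.environ["SF_ACCOUNT"],
        user=os.environ["SF_USER"],
        password=os.environ["SF_PASSWORD"],
        warehouse=os.environ["SF_WAREHOUSE"],
        database=os.environ["SF_DATABASE"],
    )
    try:
        cur = conn.cursor()
        cur.execute("SELECT order_id, status, total_amt FROM analytics.orders LIMIT 100")
        cols = [c[0] for c in cur.description]
        rows = [dict(zip(cols, rec)) for rec in cur.fetchall()]
    finally:
        conn.close()

    # Response shape expected by API Gateway's Lambda proxy integration.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(rows, default=str),
    }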

Environment: AWS EMR, S3, RDS, Redshift, Lambda, Boto3, DynamoDB, Amazon SageMaker, Apache Spark, HBase, Apache Kafka, HIVE, SQOOP, Map Reduce, Snowflake, Apache Pig, Python, SSRS, Tableau

Confidential

Data Analyst

Responsibilities:

  • Created and analyzed business requirements to compose functional and implementable technical data solutions.
  • Analyzed functional and non-functional categorized data elements for data profiling and mapping from source to target data environments; developed working documents to support findings and assign specific tasks.
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata (a small profiling sketch in Python follows this list).
  • Responsible for analyzing various data sources such as flat files, ASCII data, EBCDIC data, and relational data (Oracle, DB2 UDB, MS SQL Server) from various heterogeneous data sources.
  • Generated, wrote, and ran SQL scripts to create or update indexes and to create views and stored procedures.
  • Created data dictionary, Data mapping for ETL and application support, DFD, ERD, mapping documents, metadata, DDL and DML as required.
  • Involved in information-gathering meetings and JAD sessions to gather business requirements, deliver business requirements document and preliminary logical data model.
  • Defined the ETL mapping specification and designed the ETL process to source the data from source systems and load it into DWH tables.
  • Developed Mappings using Source Qualifier, Expression, Filter, Look up, Update Strategy, Sorter, Joiner, Normalizer and Router transformations
  • Performed data management projects and fulfilled ad-hoc requests according to user specifications by utilizing data management software programs and tools such as Perl, Toad, MS Access, Excel, and SQL.
  • Identified and analyzed source data coming from Oracle, SQL server and flat files
  • Performed forward and reverse engineering, applying DDLs to database in restructuring the existing data Model using ERWIN
  • Designed ETL specification documents to load the data in target using various transformations according to the business requirements
  • Involved in writing, testing, and implementing triggers, stored procedures and functions at Database level using PL/SQL.
  • Extensively used ERWIN to design and restructure Logical and Physical Data Models.
  • Used Excel in setting up pivot tables to create various reports using a set of data from an SQL query.
  • Tested complex ETL mappings and sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Extensively used Python's multiple data science packages like Pandas, NumPy, matplotlib, Seaborn, SciPy, Scikit-learn and NLTK.
  • Cleansed, mapped, and transformed data; created job streams; and added and deleted components in job streams on Data Manager based on requirements.
  • Developed Teradata SQL scripts using RANK functions to improve the query performance while pulling the data from large tables.
  • Designed and deployed reports with Drill Down, Drill Through and Drop-down menu option and Parameterized and Linked reports using Tableau.
  • Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
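
A brief, hypothetical pandas sketch of the profiling checks referenced above; the file name, key column, and checks are illustrative, not taken from the engagement.

import pandas as pd

df = pd.read_csv("member_extract.csv")

# Column-level profile: types, completeness, and cardinality.
profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "non_null": df.notna().sum(),
    "null_pct": (df.isna().mean() * 100).round(2),
    "distinct": df.nunique(),
})
print(profile)

# Duplicate keys are a common source-to-target mapping issue.
dupes = df[df.duplicated(subset=["member_id"], keep=False)]
print(f"{len(dupes)} rows share a duplicate member_id")

# Numeric distribution summary for range/outlier checks.
print(df.select_dtypes("number").describe().T)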

Environment: Informatica PowerCenter v8.6.1, PowerExchange, IBM Rational Data Architect, MS SQL Server, Teradata, PL/SQL, IBM Control Center, TOAD, Microsoft Project Plan, Repository Manager, Workflow Manager, ERWIN 3.0, Oracle 10g/9i, UNIX, and shell scripting

Confidential

Java developer

Responsibilities:

  • Successfully completed the architecture, detailed design, and development of modules; interacted with end users to gather, analyze, and implement project requirements.
  • Provided support in all phases of the software development life cycle (SDLC), quality management systems, and project life cycle processes; utilized databases such as MySQL and followed HTTP and WSDL standards to design REST/SOAP-based web APIs using XML, JSON, HTML, and DOM technologies.
  • Provided end-to-end support for Enterprise Architectures, from requirements analysis and process modeling with IBM Rational Rose.
  • Developed and coded J2EE components with JSP, Java Beans, and business objects with Hibernate.
  • Implemented the email module, which included setting up a JMS message queue, designing and developing an email client that sent Java messages to the queue, and designing and developing message-driven beans that consumed the messages from the queue and sent emails using contents from the message.
  • Developed and implemented a GWT-based web application, and maintained an asynchronous, AJAX-based rich client for improved customer experience using XML data and XSLT templates.
  • Involved in developing the front end (UI) of the application using Angular 4, Bootstrap, JavaScript, jQuery, HTML5, and CSS3; handled active coordination and project documentation updates.
  • Designed and developed the front-end graphical user interface with JSP, HTML, CSS3, JavaScript, jQuery, and Flexbox.
  • Developed server-side utilities using J2EE technologies Servlets, JSP, JDBC using JDeveloper.
  • Hands-on experience in conducting Joint Application Development (JAD) sessions with end users, SMEs, developers, QAs, and other stakeholders for project meetings, walkthroughs, and customer interviews.
  • Used Git to maintain file versions and took responsibility for code merges and for creating new branches when new feature implementations started.
  • Implemented procedures, packages, triggers, and different Joins to retrieve the data base using PL/SQL, SQL scripts. Created DDL, DML scripts to create tables and privileges on respective tables in the database.
  • Developed business objects, request handlers and JSP’s for the boost mobile site using JAVA (Servlets, and XML)
  • Developed dispatcher Servlets class to handle all the requests matching the URL pattern
  • Developed business objects, request handlers and JSPs for the Project I site using JAVA (Servlets).

Environment: Java 6, Eclipse 3.6, JavaScript, jQuery 1.10, Axis 1.4, JDeveloper 12, Oracle EBS OAF, ADF R12, Oracle DB 11g, TOAD, PL/SQL, JProfiler, Linux, Windows XP
