Sr. Data Engineer Resume
Hartford, CT
SUMMARY
- 7+ years of progressive experience in software development, Big Data technologies, and the Hadoop ecosystem, including design, integration, maintenance, implementation, and testing of various web applications and interfaces.
- Good experience with Hive, Spark, HBase, Sqoop, and Python for data extraction, processing, storage, and analysis.
- Expertise in designing data-intensive applications using the Hadoop ecosystem, Big Data analytics, cloud data engineering, data warehouse, data visualization, reporting, and data quality solutions.
- Developed and maintained optimal data pipeline architecture in the Snowflake data warehouse.
- Assembled large, complex data sets that meet functional and non-functional business requirements.
- Strong experience in migrating other databases to Snowflake.
- Experience in building Snowpipe pipelines for data ingestion.
- Experience in using Snowflake Clone and Time Travel.
- Experience in coding stored procedures, triggers, and tasks.
- Strong experience with cloud computing platforms, particularly the AWS environment and Spark on AWS.
- Experience with Spark Core, Spark SQL, and Spark Streaming, and in creating and handling DataFrames in Spark with Scala.
- Expertise in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Spark, Scala, Hive, Sqoop, Flume, HBase, Cassandra, MongoDB, and Kafka), including implementation, maintenance, ETL, and Big Data analysis operations.
- Identified, designed, and implemented internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability.
- Improved reporting performance by building an aggregate table schema that reduces the quantity of data scanned by report queries, thereby increasing their speed.
- Worked on analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
- Worked with stakeholders, including the Executive, Product, Data, and Design teams, to assist with data-related technical issues and support their data infrastructure needs.
- Strong knowledge of writing Hive UDFs and generic UDFs to incorporate complex business logic into Hive queries.
- Experienced in optimizing Hive queries by tuning configuration parameters.
- Involved in designing the data model in Hive for migrating the ETL process into Hadoop, and wrote Pig scripts to load data into the Hadoop environment.
- Experience in setting up and building AWS infrastructure resources such as VPC, EC2, S3, IAM, EBS, Lambda, Security Groups, RDS, DynamoDB, CloudFront, Elasticsearch, SNS, and CloudFormation.
- Exposure to migrating data warehouses to the Hadoop ecosystem.
- Strong knowledge of and experience with the architecture and components of Spark; efficient in working with Spark Core and Spark SQL.
- Extensive knowledge of programming with Resilient Distributed Datasets (RDDs).
- Familiar with importing and exporting data to and from HDFS and Hive using Sqoop.
- Experience with cloud services, specifically Microsoft Azure data pipeline services such as Azure Databricks (PySpark), Azure Data Factory, and Azure Blob Storage.
- Experience in using tools such as Zena, DE tool, and Airflow to schedule and monitor workflows.
- Good knowledge of and experience in storing, retrieving, and maintaining data in a Snowflake data warehouse.
- Strong knowledge of current streaming tools such as Kafka, NiFi, and Spark Streaming.
- Expertise in implementing Spark modules and tuning its performance.
- Experience in handling different file formats such as text files, sequence files, and Avro data files using different SerDes in Hive.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both internal and external Hive tables to optimize performance (see the sketch following this summary).
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Good experience with single-node and multi-node cluster configurations.
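A minimal PySpark sketch of the Hive partitioning approach noted above; the table, column, and path names are hypothetical placeholders:

```python
# Sketch: write a DataFrame as a partitioned external Hive table so report
# queries that filter on the partition column can prune partitions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("partitioned-hive-example")
    .enableHiveSupport()          # lets Spark SQL read and write Hive tables
    .getOrCreate()
)

claims = spark.read.parquet("/data/raw/claims")    # hypothetical source path

(
    claims
    .write
    .mode("overwrite")
    .partitionBy("load_date")                       # partition column assumed for illustration
    .option("path", "/data/warehouse/claims_ext")   # external table location
    .saveAsTable("analytics.claims_partitioned")
)

# Queries filtering on load_date now scan only the matching partition:
spark.sql(
    "SELECT COUNT(*) FROM analytics.claims_partitioned WHERE load_date = '2021-01-01'"
).show()
```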
TECHNICAL SKILLS
Big Data Technologies: HDFS, HBase, Hadoop, MapReduce, Azure Databricks, Hive, Pig, Sqoop, Kafka, Spark Core, Spark Streaming, Spark SQL, and Zookeeper
Languages: C, Scala, Python, SQL, T-SQL, and Shell scripting
Databases: MySQL, Oracle, DB2, SQL Server, MongoDB, Cassandra, Synapse, and HiveQL
Operating Systems: Linux, UNIX, Windows
Workflow Schedulers: Data Factory, Airflow, DE tool, Zena
Build Tools: Jenkins
PROFESSIONAL EXPERIENCE
Confidential | Hartford, CT
Sr. Data Engineer
Responsibilities:
- Worked closely with business analysts to convert business requirements into technical requirements, prepared low- and high-level documentation, and collaborated closely with architects.
- Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (see the sketch following this section).
- Performed transformations using Hive and MapReduce; copied .log and Snappy-compressed files from Greenplum into HDFS using Flume and Kafka, and extracted data from MySQL into HDFS using Sqoop.
- Imported required tables from RDBMS to HDFS using Sqoop, and used Storm/Spark Streaming with Kafka to stream real-time data into HBase.
- Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive and the AWS cloud.
- Used AWS Redshift, S3, Redshift Spectrum, and Athena to query large amounts of data stored on S3 and create a virtual data lake without going through an ETL process.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
- Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.
- Wrote MapReduce jobs for text mining, worked with the predictive analytics team, and worked with Hadoop components such as HBase, Spark, YARN, Kafka, Zookeeper, Pig, Hive, Sqoop, Oozie, Impala, and Flume.
- Wrote Hive UDFs per requirements to handle different schemas and XML data.
- Implemented ETL code to load data from multiple sources into HDFS using Pig Scripts.
- Developed data pipelines using Python and Hive to load data into the data lake, and performed data analysis and data mapping for several data sources.
- Loaded data into S3 buckets using AWS Glue and PySpark. Involved in filtering data stored in S3 buckets using Elasticsearch and loaded data into Hive external tables.
- Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
- Designed a new member and provider booking system that allows providers to book new slots, sending the member leg and provider leg directly to TP through DataLink.
- Analyzed various types of raw files such as JSON, CSV, and XML with Python using pandas, NumPy, etc.
- Developed Spark applications using Scala for easy Hadoop transitions, with hands-on experience writing Spark jobs and using the Spark Streaming API in Scala and Python.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Automated the existing scripts for performance calculations using scheduling tools such as Airflow.
- Created cloud-based software solutions written in Scala using Spray IO, Akka, and Slick.
- Fetched live streaming data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
- Experienced with job workflow scheduling and monitoring tools such as Oozie and Zookeeper.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
Environment: Map Reduce, HDFS, Hive, Pig, HBase, Python, SQL, Sqoop, Flume, Oozie, Impala, Scala, Spark, Apache Kafka, Play, AWS, AKKA, Zookeeper, Linux Red Hat, HP-ALM, Eclipse, Cassandra, SSIS.
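A minimal sketch of the multi-format Spark SQL extraction and aggregation described in this section; the S3 paths and column names (customer_id, event_ts, duration_sec) are illustrative assumptions:

```python
# Sketch: read CSV and Parquet usage data, union the sources, and aggregate
# daily usage per customer. Assumes Spark 3.1+ for allowMissingColumns.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

csv_usage = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("s3://example-bucket/usage/csv/")          # assumed landing path
)
parquet_usage = spark.read.parquet("s3://example-bucket/usage/parquet/")

usage = csv_usage.unionByName(parquet_usage, allowMissingColumns=True)

daily_usage = (
    usage
    .groupBy("customer_id", F.to_date("event_ts").alias("event_date"))
    .agg(
        F.count("*").alias("events"),
        F.sum("duration_sec").alias("total_duration_sec"),
    )
)

# Land the curated aggregate for downstream reporting.
daily_usage.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_usage/")
```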
Confidential | Chicago, IL
Data Engineer
Responsibilities:
- Performed transformations using Hive and MapReduce; copied .log and Snappy-compressed files from Greenplum into HDFS using Flume and Kafka, and extracted data from MySQL into HDFS using Sqoop and Databricks.
- Imported required tables from RDBMS to HDFS using Sqoop, and used Storm/Spark Streaming with Kafka to stream real-time data into HBase.
- Developed and monitored applications in a Hortonworks Hadoop data lake environment.
- Developed scripts with SQL mappings and joins in Hive using HiveQL to perform testing.
- Configured Spark Streaming in Python to receive real-time data from Kafka and store it on HDFS (see the sketch following this section).
- Prepared the data per business needs using PySpark and published the processed data to HDFS.
- Performed data profiling, analysis, and visualization for PHI and other sensitive data; analyzed and visualized the types of data coming from the sources and created the mapping document.
- Conducted data profiling and analysis to identify data anomalies and nuances within the databases.
- Performed project-specific, ad hoc analysis and developed reproducible scripts implementing analytical approaches to meet business requirements.
- Performed complex HiveQL coding, including data flows for evaluating business rules and creating the necessary workflows to segregate data and load it into final database objects for data visualization using the analytical features of Hive.
- Performed ETL and Sqoop operations on received data files to extract the data and load it into proprietary databases such as AAH, PSI, DB2, DataLake, etc.
- Extended the design to document low-level design specifications, including creation of data flows, workflows, data integration rules, data normalization, and data standardization methods.
- Communicated the day-to-day progress of both onsite and offshore teams to the client manager and ensured work was tracked and completed per project schedules.
- Took an active part in the Agile development process as a key participant in all Agile ceremonies, including scrums, sprint planning, backlog grooming, sprint demos, and retrospectives.
Environment: HDFS, Hive, HBase, Python, SQL, Sqoop, Flume, Oozie, Impala, Scala, Spark, Apache Kafka, Play, AKKA, Zookeeper, Linux Red Hat, HP-ALM.
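A minimal sketch of the Kafka-to-HDFS Spark Structured Streaming setup described in this section; the broker address, topic name, and output paths are placeholders:

```python
# Sketch: consume real-time records from Kafka and land them on HDFS as Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # assumed broker
    .option("subscribe", "member-events")                # assumed topic
    .option("startingOffsets", "latest")
    .load()
    .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/landing/member_events")         # assumed output dir
    .option("checkpointLocation", "hdfs:///checkpoints/member_events")
    .trigger(processingTime="1 minute")
    .start()
)

query.awaitTermination()
```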
Confidential | San Diego, CA
Data Engineer
Responsibilities:
- Developed scripts for the data flow process from various sources to the target database using Big Data technologies to test the process flow of data.
- Used Spark, HiveQL, and Pig extensively for data retrieval, querying, storage, and transformation.
- Developed Apache Kafka streams using console consumers and created Phoenix tables on streaming JSON data for user validation.
- Created Spark Streaming jobs to move real-time and batch data from the source (DB2) to the data lake.
- Parsed different types of files such as XML, JSON, and fixed-width, flattening them to text format and loading them into tables (see the sketch following this section).
- Used Sqoop to load batch data from source to target.
- Implemented Kafka to consume live streaming data and integrated the Zena tool using shell scripts to automate the data process.
- Created Hive tables and performed the required queries on top of them.
- Scheduled jobs based on time- and event-based triggers.
- Analyzed the quality, size, format, and frequency of the data from sources.
- Designed transformation rules based on the tables involved and analyzed the join criteria to extract the required fields.
- Designed mapping documents by analyzing the data, for reference while developing scripts and unit testing as an entry criterion for QA.
- Documented test cases and workflow instructions.
- Used JIRA and ALM for tracking issues/defects and loading test cases and test results.
- Performed knowledge transfer from the client to the offshore team, covering the necessary technical and business knowledge.
- Communicated timelines, progress, and delays of assigned work on a daily basis from the team to the client and vice versa.
- Conducted review meetings with the offshore and support teams on daily assignments and managed development and testing tasks.
Environment: Map Reduce, Hive, Pig, HBase, Python, SQL, Sqoop, Flume, Oozie, Impala, Scala, Spark, Zookeeper, Linux Red Hat, HP-ALM, Eclipse.
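A minimal Python sketch of the file-flattening work described in this section; the field names, fixed-width layout, and file paths are hypothetical:

```python
# Sketch: flatten JSON and fixed-width records into pipe-delimited text files
# suitable for loading into tables.
import csv
import json

FIELDS = ["member_id", "provider_id", "visit_date"]           # assumed output columns
FIXED_WIDTH_LAYOUT = [(0, 10), (10, 20), (20, 30)]            # assumed column slices

def flatten_json_line(line):
    """Pull the expected fields out of one JSON record."""
    record = json.loads(line)
    return [str(record.get(field, "")) for field in FIELDS]

def flatten_fixed_width_line(line):
    """Slice a fixed-width record into its columns and trim padding."""
    return [line[start:end].strip() for start, end in FIXED_WIDTH_LAYOUT]

def convert(in_path, out_path, parser):
    """Write each parsed record as a pipe-delimited row with a header."""
    with open(in_path) as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="|")
        writer.writerow(FIELDS)
        for line in src:
            if line.strip():
                writer.writerow(parser(line))

if __name__ == "__main__":
    convert("visits.json", "visits_json.txt", flatten_json_line)
    convert("visits.dat", "visits_fixed.txt", flatten_fixed_width_line)
```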
Confidential
Software Developer
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project.
- Exposed to various phases of Software Development Life Cycle using Agile - Scrum Software development methodology.
- Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.
- Developed the customer complaints application using the Django framework in Python.
- Strong understanding of and practical experience in developing Spark applications with Python.
- Developed Scala scripts and UDFs using both DataFrames and SQL in Spark for data aggregations.
- Designed, developed, tested, deployed, and maintained the website.
- Developed entire frontend and backend modules using Python on the Django web framework.
- Developed Python scripts to update content in the database and manipulate files.
- Rewrote an existing Java application as a Python module to deliver data in a specific format.
- Designed and developed the UI of the website using HTML, XHTML, AJAX, CSS, and JavaScript.
- Wrote Python scripts to parse XML documents and load the data into the database (see the sketch following this section).
- Generated property list for every application dynamically using Python.
- Handled all client-side validation using JavaScript.
- Performed testing using Django’s Test Module.
- Designed and developed data management system using MySQL.
- Created unit-test/regression-test frameworks for existing and new code.
- Responsible for search engine optimization to improve the visibility of the website.
- Responsible for debugging and troubleshooting the web application.
Environment: Python, Django, Java, MySQL, Linux, HTML, XHTML, CSS, AJAX, JavaScript, Apache Web Server
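A minimal sketch of the XML-to-database scripting described in this section; the XML element names, table schema, and SQLite target are illustrative assumptions:

```python
# Sketch: parse <complaint> elements from an XML file and insert them into a table.
import sqlite3
import xml.etree.ElementTree as ET

def load_complaints(xml_path, db_path):
    """Parse complaint records from XML and load them into a complaints table."""
    tree = ET.parse(xml_path)
    rows = [
        (
            item.findtext("id"),
            item.findtext("customer"),
            item.findtext("description"),
        )
        for item in tree.getroot().iter("complaint")
    ]

    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS complaints "
            "(id TEXT, customer TEXT, description TEXT)"
        )
        conn.executemany(
            "INSERT INTO complaints (id, customer, description) VALUES (?, ?, ?)",
            rows,
        )
    conn.close()

if __name__ == "__main__":
    load_complaints("complaints.xml", "complaints.db")
```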