
Senior Big Data Engineer Resume


Woonsocket, RI

SUMMARY

  • Over 8 years of IT experience in analysis, design, development, implementation, maintenance, and support, with experience in developing strategic methods for deploying big data technologies to efficiently solve Big Data processing requirements.
  • Strong experience in the Hadoop Distributed File System (HDFS), Impala, Sqoop, Kafka, Hive, Spark, Hue, the MapReduce framework, Oozie, Zookeeper and Pig.
  • Installed packages and set up a CDH cluster coordinating Zookeeper, Spark, Kafka and HDFS.
  • Experience in data analysis using Hive, Pig Latin, HBase and custom MapReduce programs in Java.
  • Experience in writing custom UDFs in Java and Scala to extend Hive and Pig functionality.
  • Responsible for importing data into HDFS using Sqoop from different RDBMS servers and exporting aggregated data back to those RDBMS servers using Sqoop for other ETL operations.
  • Experience with Cloudera and Hortonworks distributions.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and from RDBMS to HDFS.
  • Experience in migrating other databases to Snowflake.
  • Well versed with Big Data on AWS cloud services, i.e. EC2, S3, Glue, Athena, DynamoDB and Redshift.
  • Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a Cassandra cluster (a PySpark sketch of this pattern follows this summary).
  • Experience in creating Power BI reports and upgrading Power Pivot reports to Power BI.
  • Defined virtual warehouse sizing in Snowflake for different types of workloads.
  • Solid experience with Azure Data Lake Analytics, Azure Databricks, Azure Data Lake Storage, Azure Data Factory, Azure SQL databases and Azure SQL Data Warehouse for providing analytics and reports that improve marketing strategies.
  • Experienced in processing large datasets with Spark using Python.
  • Implemented a Hadoop backup strategy to back up Hive, HDFS, HBase, Oozie, etc.
  • Wrote Oozie workflows to invoke jobs at predefined intervals.
  • Experienced with Docker and Kubernetes on multiple cloud providers, from helping developers build and containerize their applications (CI/CD) to deploying them on public or private clouds.
  • Worked on a POC with Kafka and NiFi to pull real-time events into the Hadoop cluster.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and YARN.
  • Experienced in managing Hadoop clusters using Hortonworks Ambari.
  • Experience in working with Flume to load log data from multiple sources directly into HDFS.
  • Read data from HBase into Spark to perform joins on different tables.
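
A minimal PySpark sketch of the Kafka-to-Cassandra streaming pattern referenced in the summary above. This is an illustration, not the original job: the broker, topic, schema, keyspace and table names are placeholders, and it assumes the Spark Kafka source and the spark-cassandra-connector package are available on the cluster.

    # Sketch: consume events from Kafka, apply light transformations, persist to Cassandra.
    # Broker, topic, schema, keyspace and table names below are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import LongType, StringType, StructField, StructType

    spark = SparkSession.builder.appName("learner-model-stream").getOrCreate()

    event_schema = StructType([
        StructField("learner_id", StringType()),
        StructField("event_type", StringType()),
        StructField("ts", LongType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
           .option("subscribe", "learner_events")                # placeholder topic
           .load())

    # Decode the Kafka value, parse the JSON payload, and flatten it into columns.
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

    # Write each micro-batch to Cassandra through the connector (placeholder keyspace/table).
    query = (events.writeStream
             .foreachBatch(lambda batch_df, _: batch_df.write
                           .format("org.apache.spark.sql.cassandra")
                           .options(keyspace="analytics", table="learner_model")
                           .mode("append")
                           .save())
             .start())
    query.awaitTermination()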

TECHNICAL SKILLS

Languages: Scala, Python, SQL, Java, PL/SQL, Linux shell scripts

Hadoop Ecosystem: HDFS, Spark/PySpark, YARN, Hive, HBase, Impala, Zookeeper, Sqoop, Oozie, Drill, Flume, Solr, Avro, AWS, Amazon EC2, S3, Azure Databricks

Enterprise Technologies: J2EE, Event Hub, Kinesis, JDBC, Kafka

Operating Systems: Windows, Linux, UNIX, macOS.

IDEs: Eclipse, IntelliJ

Cloud Environment: AWS, Azure, Snowflake

Relational Databases: Oracle, SQL Server, DB2, MySQL, Teradata

NoSQL Databases: HBase, MongoDB, Cassandra

Markup Languages: HTML, XHTML, XML, DHTML.

Build & Management Tools: ANT, MAVEN, SVN.

Query Languages: SQL, PL/SQL.

Methodologies: SDLC, OOAD, Agile.

Continuous Integration Tools: Jenkins, Docker, Kubernetes

PROFESSIONAL EXPERIENCE

Confidential, Woonsocket, RI

Senior Big Data Engineer

Responsibilities:

  • Involved in the complete Big Data flow of the application, from upstream data ingestion into HDFS through processing and analysis of the data in HDFS.
  • Designed and implemented topic configuration in the new Kafka cluster across all environments.
  • Demonstrated expert-level technical capabilities in Azure batch and interactive solutions, Azure Machine Learning solutions, and operationalizing end-to-end Azure cloud analytics solutions.
  • Designed end-to-end scalable architecture to solve business problems using various Azure components such as HDInsight, Data Factory, Data Lake, Storage and Machine Learning Studio.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Designed changes to transform current Hadoop jobs to HBase.
  • Partnered with ETL developers to ensure that data was well cleaned and the data warehouse stayed up to date for reporting purposes, using Pig.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the Cosmos activity.
  • Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.
  • Created pipelines in ADF using Linked Services, Datasets and Pipelines to extract, transform and load data between different sources such as Azure SQL, Blob Storage and Azure SQL Data Warehouse, as well as write-back tools.
  • Configured Flume to extract data from the web server output files and load it into HDFS.
  • Used Flume to collect, aggregate and store web log data from different sources such as web servers, mobile and network devices, and pushed it into HDFS.
  • Developed solutions that leverage ETL tools and identified opportunities for process improvements using Informatica and Python.
  • Used Windows Azure SQL Reporting Services to create reports with tables, charts and maps.
  • Wrote Flume configuration files for importing streaming log data into HBase with Flume.
  • Imported several transactional logs from web servers with Flume to ingest the data into HDFS.
  • Used Flume with a spooling directory source to load data from the local file system (LFS) into HDFS.
  • Involved in designing the HBase row key to store text and JSON as key values, structuring the key so that rows can be retrieved and scanned in sorted order.
  • Created partitioned Hive tables and worked with them using HiveQL (see the sketch after this list).
  • Responsible for data services and data movement infrastructures
  • Experienced in ETL concepts, building ETL solutions and Data modeling
  • Worked on architecting the ETL transformation layers and writing Spark jobs to do the processing.
  • Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per the requirements.
  • Implemented Kafka security features using SSL, initially without Kerberos; later set up Kerberos for users and groups to enable more fine-grained, advanced security features.
  • Measured the efficiency of the Hadoop/Hive environment, ensuring the SLA was met.
  • Analyzed the system for new enhancements/functionalities and performed impact analysis of the application for implementing ETL changes.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Worked with the SCRUM team in delivering agreed user stories on time for every sprint.
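
As noted in the partitioned Hive table bullet above, the sketch below shows the general pattern of creating, loading and querying a partitioned Hive table from PySpark. The database, table, columns and input path are illustrative placeholders, not the project's actual schema.

    # Sketch: create a partitioned Hive table, load one day's data into a partition,
    # and query it with HiveQL. Names and paths below are placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("partitioned-hive-demo")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS retail")
    spark.sql("""
        CREATE TABLE IF NOT EXISTS retail.sales (
            order_id STRING,
            amount   DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        STORED AS PARQUET
    """)

    # Load one day's extract (placeholder path) and insert it into its own partition.
    daily = spark.read.parquet("/data/incoming/sales/2020-01-01")
    daily.createOrReplaceTempView("daily_sales")
    spark.sql("""
        INSERT INTO TABLE retail.sales PARTITION (order_date = '2020-01-01')
        SELECT order_id, amount FROM daily_sales
    """)

    # Partition pruning: the filter restricts the scan to a single HDFS directory.
    spark.sql("""
        SELECT order_date, SUM(amount) AS daily_total
        FROM retail.sales
        WHERE order_date = '2020-01-01'
        GROUP BY order_date
    """).show()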

Environment: Hadoop YARN, Spark, Spark Streaming, MapReduce, Spark SQL, Kafka, Scala, Azure, Python, Hive, Sqoop, Impala, Tableau, Talend, Oozie, Control-M, HBase, Java, Oracle 12c, Linux

Confidential, Dallas, TX

Big Data Engineer

Responsibilities:

  • Used Python in Spark to extract data from Snowflake and upload it to Salesforce on a daily basis.
  • Wrote an event-based Python service using AWS Lambda to deliver real-time data to One-Lake (a Data Lake solution in Cap-One Enterprise); a minimal handler sketch follows this list.
  • Analyzed SQL scripts and designed the solution to implement them using PySpark.
  • Exported tables from Teradata to HDFS using Sqoop and built tables in Hive.
  • Used Spark SQL to load JSON data, create schema RDDs and load them into Hive tables, and handled structured data using Spark SQL.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
  • Developed Spark programs with Python and applied principles of functional programming to process complex structured data sets.
  • Developed automated regression scripts for validation of ETL processes between multiple databases such as AWS Redshift, Oracle, MongoDB, T-SQL, and SQL Server using Python.
  • Used Airflow for scheduling the Hive, Spark and MapReduce jobs.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
  • Data sources are extracted, transformed and loaded to generate CSV data files wif Python programming and SQL queries.
  • Worked with Hadoop infrastructure to store data in HDFS and used Spark/Hive SQL to migrate the underlying SQL codebase to AWS.
  • Worked with the Hadoop ecosystem and implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib.
  • Developed a Spark Streaming job to consume data from Kafka topics of different source systems and push the data into HDFS locations.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and PySpark.
  • Filtered and cleaned data using Scala code and SQL queries.
  • Troubleshot errors in the HBase shell/API, Pig, Hive and MapReduce.
  • Implemented installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) on EC2.
  • Used Talend for Big Data Integration using Spark and Hadoop.
  • Responsible for analyzing large data sets and derive customer usage patterns by developing new MapReduce programs using Java.
  • Designed a Kafka producer client using Confluent Kafka and produced events into Kafka topics (a minimal producer sketch appears after this section's Environment line).
  • Developed reusable objects such as PL/SQL program units and libraries, database procedures, functions and triggers to be used by the team while satisfying the business rules.
  • Subscribed to the Kafka topic with a Kafka consumer client and processed the events in real time using Spark.
  • Collected data using Spark Streaming from an AWS S3 bucket in near real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
  • Generated metadata and created Talend ETL jobs and mappings to load the data warehouse and data lake.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala and Hive to perform streaming ETL and apply machine learning.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka and Spark Streaming with a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc.
  • Experience with data analytics, data reporting, ad-hoc reporting, graphs, scales, PivotTables and OLTP reporting.
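
The Lambda bullet at the top of this role describes an event-based service pushing real-time data into the One-Lake data lake. Below is a hedged, minimal handler sketch of that pattern using boto3; the bucket name, key prefix, event shape and return value are assumptions for illustration, not the actual One-Lake integration.

    # Sketch of an event-driven AWS Lambda handler that lands incoming events in an
    # S3-backed data lake prefix. Bucket, prefix and event fields are placeholders.
    import json
    import uuid
    from datetime import datetime, timezone

    import boto3

    s3 = boto3.client("s3")
    LAKE_BUCKET = "example-one-lake-raw"    # placeholder bucket
    LAKE_PREFIX = "salesforce/accounts"     # placeholder prefix

    def lambda_handler(event, context):
        now = datetime.now(timezone.utc)
        record = {
            "received_at": now.isoformat(),
            "payload": event,               # pass the triggering event through as-is
        }
        key = f"{LAKE_PREFIX}/dt={now:%Y-%m-%d}/{uuid.uuid4()}.json"
        s3.put_object(
            Bucket=LAKE_BUCKET,
            Key=key,
            Body=json.dumps(record).encode("utf-8"),
            ContentType="application/json",
        )
        return {"status": "stored", "key": key}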

Environment: Hadoop, Spark, Scala, HBase, Hive, Python, PL/SQL, AWS, EC2, S3, Lambda, Auto Scaling, CloudWatch, CloudFormation, IBM InfoSphere DataStage, MapReduce, Oracle 12c, flat files, TOAD, MS SQL Server database, XML files, Cassandra, MongoDB, Kafka, MS Access database, Autosys, UNIX, Erwin.
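
As referenced in the Confluent Kafka producer bullet above, the following is a minimal Python producer sketch using the confluent-kafka client. The broker address, topic name and payload are placeholders, and the production client may well have been written in Scala or Java rather than Python.

    # Sketch of a Confluent Kafka producer publishing a JSON event to a topic.
    # Broker, topic and payload below are placeholders.
    import json

    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "broker1:9092"})   # placeholder broker

    def delivery_report(err, msg):
        # Called once per message to confirm delivery or surface the error.
        if err is not None:
            print(f"delivery failed: {err}")
        else:
            print(f"delivered to {msg.topic()} [{msg.partition()}] @ offset {msg.offset()}")

    event = {"order_id": "1001", "status": "SHIPPED"}
    producer.produce(
        topic="order_events",                                    # placeholder topic
        key=event["order_id"],
        value=json.dumps(event),
        callback=delivery_report,
    )
    producer.flush()   # block until outstanding messages are delivered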

Confidential, Charlotte, NC

Data Engineer

Responsibilities:

  • Developed SSRS reports and SSIS packages to extract, transform and load data from various source systems.
  • Implemented and managed ETL solutions and automated operational processes.
  • Was responsible for ETL and data validation using SQL Server Integration Services.
  • Measured Efficiency of Hadoop/Hive environment ensuring SLA is met
  • Optimized the TensorFlow model for efficiency.
  • Created ad hoc queries and reports to support business decisions using SQL Server Reporting Services (SSRS).
  • Analyzed existing application programs and tuned SQL queries using execution plans, Query Analyzer, SQL Profiler and the Database Engine Tuning Advisor to enhance performance.
  • Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
  • Migrated the on-premises database structure to the Confidential Redshift data warehouse (a COPY-based load sketch follows this list).
  • Created Entity Relationship Diagrams (ERD), Functional diagrams, Data flow diagrams and enforced referential integrity constraints and created logical and physical models using Erwin.
  • Defined and deployed monitoring, metrics, and logging systems on AWS.
  • Compiled data from various sources to perform complex analysis for actionable results
  • Strong understanding of AWS components such as EC2 and S3
  • Managed security groups on AWS, focusing on high availability, fault tolerance, and auto scaling using Terraform templates, along with continuous integration and continuous deployment using AWS Lambda and AWS CodePipeline.
  • Defined facts and dimensions and designed the data marts using Ralph Kimball's dimensional data mart modeling methodology in Erwin.
  • Worked on Big Data on AWS cloud services, i.e. EC2, S3, EMR and DynamoDB.
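
A hedged sketch of the S3-to-Redshift load pattern behind the migration bullet above: stage the extracted files in S3, then issue a COPY so Redshift loads them in parallel. The connection details, table name, S3 path and IAM role are placeholders, not the actual environment.

    # Sketch: bulk-load staged S3 files into a Redshift table with COPY via psycopg2.
    # Host, credentials, table, S3 path and IAM role below are placeholders.
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="etl_user",
        password="********",
    )

    copy_sql = """
        COPY public.sales_staging
        FROM 's3://example-bucket/exports/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy-role'
        FORMAT AS CSV
        IGNOREHEADER 1
        TIMEFORMAT 'auto';
    """

    with conn, conn.cursor() as cur:
        cur.execute(copy_sql)   # Redshift parallelizes the load across slices
    conn.close()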

Environment: AWS, EC2, S3, SQL Server, Erwin, Oracle, Redshift, Informatica, RDS, NoSQL, Snowflake Schema, MySQL, DynamoDB, Docker, PostgreSQL, Tableau, GitHub

Confidential

Data Engineer

Responsibilities:

  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Responsible for coding Java batch jobs, RESTful services, MapReduce programs and Hive queries, along with testing, debugging, peer code review, troubleshooting and maintaining status reports.
  • Developed Pig Latin scripts to extract and filter relevant data from the web server output files and load it into HDFS.
  • Used AWS S3 to store large amounts of data in an identical/similar repository.
  • Developed job workflows in Oozie to automate the tasks of loading the data into HDFS and a few other Hive jobs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Experience in writing custom MapReduce programs and UDFs in Java to extend Hive and Pig core functionality (an illustrative TRANSFORM script follows this list).
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Experienced in managing and reviewing the Hadoop log files using shell scripts.
  • Developed Flume agents for loading and filtering the streaming data into HDFS.
  • Handled continuous streaming data coming from different sources using Flume, with HDFS set as the destination.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Flume.
  • Experience in creating various Oozie jobs to manage processing workflows.
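
The custom UDF bullet above refers to Java UDFs; purely as an illustration of the same idea of extending Hive, the sketch below uses Hive's streaming TRANSFORM interface with a small Python script. The column layout, script name and table are assumptions, not the original code.

    # normalize_urls.py: illustrative Hive TRANSFORM script. Hive pipes rows in as
    # tab-separated lines on stdin and reads transformed rows back from stdout.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue                       # skip malformed rows
        ip, url = fields[0], fields[1]
        cleaned = url.lower().rstrip("/")  # example transformation
        print(f"{ip}\t{cleaned}")

    # Invoked from Hive along these lines (placeholder table and columns):
    #   ADD FILE normalize_urls.py;
    #   SELECT TRANSFORM(ip, url) USING 'python normalize_urls.py' AS (ip, cleaned_url)
    #   FROM web_logs;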

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, AWS, Flume, Oozie, HBase, Sqoop, RDBMS/DB, flat files, MySQL, Java.

Confidential

Data Analyst

Responsibilities:

  • Created SSIS packages to pull data from SQL Server and export it to Excel spreadsheets and vice versa (see the Python sketch after this list).
  • Created new procedures to handle complex business logic and modified existing stored procedures, functions, views and tables for new enhancements of the project and to resolve existing defects.
  • Automated the process of extracting various files, such as flat and Excel files, from sources like FTP and SFTP (Secure FTP).
  • Deployed and scheduled reports using SSRS to generate daily, weekly, monthly and quarterly reports.
  • Loaded data from various sources, such as OLE DB and flat files, into the SQL Server database using SSIS packages and created data mappings to load the data from source to destination.
  • Extensive use of expressions, variables and row counts in SSIS packages.
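
The export work above was built with SSIS packages; purely as an illustration of the same extract-to-flat-file flow, the sketch below pulls a SQL Server result set into a CSV file with pyodbc. The server, database, query and output path are placeholders.

    # Sketch: export a SQL Server query result to a flat CSV file with pyodbc.
    # Connection string, query and output path below are placeholders.
    import csv

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=example-sql-server;DATABASE=SalesDB;Trusted_Connection=yes;"
    )

    query = "SELECT OrderID, CustomerID, OrderDate, TotalDue FROM dbo.Orders"

    cur = conn.cursor()
    cur.execute(query)
    with open("orders_export.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([col[0] for col in cur.description])   # header row from metadata
        for row in cur:
            writer.writerow(row)

    cur.close()
    conn.close()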

Environment: MS SQL Server, SQL Server Business Intelligence Development Studio, SSIS, SSRS, Report Builder, Office, Excel, Flat Files, T-SQL.
