
Big Data Engineer Resume


Minnesota, MN

SUMMARY

  • Around 8 years of experience using Big Data Ecosystems & Java.
  • Extensive experience in Apache Spark with Scala, Apache Solr, and Python.
  • Extensive experience in data ingestion technologies like Flume, Kafka and NiFi.
  • Utilized Flume, Kafka and NiFi to ingest real-time and near real-time streaming data into HDFS from different data sources.
  • Good knowledge in using NiFi to automate data movement between different Hadoop systems.
  • Developed ELT workflows using NiFi to load data into Hive and Teradata.
  • Solid foundation in mathematics, probability and statistics, with broad practical statistical and data mining techniques built through industry work and academic programs.
  • Delivered models for high-stakes information retrieval and statistical analysis, e.g., fraud detection.
  • Experience in machine learning, including supervised and unsupervised learning techniques and algorithms (e.g. k-NN, SVM, RVM, Naïve Bayes, Decision trees, etc.)
  • Design and develop models for building and deploying scalable cloud based predictive and prescriptive intelligence solutions such as recommender systems.
  • Knowledge of Spark Core, Spark SQL, Spark Streaming and machine learning using the Scala and Python programming languages.
  • Hands-on experience with Java 8, Scala and the Play/Akka frameworks.
  • Distributed Application Development using Actor Models for extreme scalability using Akka.
  • Involved in the Software Development Life Cycle (SDLC) phases, which include Analysis, Design, Implementation, Testing and Maintenance.
  • Extensively worked on CI/CD pipelines for code deployment, engaging different tools (Git, Jenkins) in the process right from developer code check-in to production deployment.
  • Integrated services like GitHub, AWS CodePipeline, and Jenkins to create a deployment pipeline.
  • Responsible for building a fully automated application deployment pipeline into AWS using Jenkins, Artifactory, Puppet and Terraform.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Used the DataFrame API in Scala for working with distributed collections of data organized into named columns.
  • Strong technical, administration and mentoring knowledge in Linux and Big Data/Hadoop technologies.
  • Involved in designing and architecting data warehouses and data lakes on regular (Oracle, SQL Server), high-performance (Netezza and Teradata) and big data (Hadoop - MongoDB, Hive, Cassandra and HBase) databases.
  • Sound knowledge of in-memory MemSQL.
  • Experience with developing and maintaining applications written for Amazon Simple Storage Service (S3), AWS Elastic MapReduce, and AWS CloudFormation.
  • Worked on migrating on-premises applications to AWS.
  • Hands-on experience with major components of the Hadoop ecosystem like Hadoop MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume and Avro.
  • Experienced in the deployment of Hadoop clusters using the Puppet tool.
  • Used Kafka Streams to configure Spark Streaming to get information and then store it in HDFS.
  • Work experience with cloud infrastructure like Amazon Web Services (AWS).
  • Experience in importing and exporting data using Sqoop from HDFS to relational database systems/mainframe and vice versa.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Expertise in working with ETL architects, data analysts and data modelers to translate business rules/requirements into conceptual, physical and logical dimensional models, and worked with complex normalized and denormalized data models.
  • Installing, configuring and managing of Hadoop Clusters and Data Science tools.
  • Managing the Hadoop distribution with Cloudera Manager, Cloudera Navigator and Hue.
  • Setting up high availability for Hadoop cluster components and edge nodes.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Used the Spark SQL Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs (a brief Scala sketch follows this summary).
  • Configured and deployed Azure Automation scripts for applications utilizing the Azure stack, including Compute, Blobs, Azure Data Lake, Azure Data Factory (ADF), Azure SQL, Cloud Services and ARM templates, with a focus on automation.
  • Strong experience in writing Python applications using libraries like Pandas, Scikit-learn, NumPy, SciPy, Matplotlib, etc.
  • Experience in developing Shell scripts and Python Scripts for system management.
  • Well versed in using Software development methodologies like Rapid Application Development (RAD), Agile Methodology and Scrum software development processes.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs, and Scala
  • Analyzed the Cassandra/SQL scripts and designed the solution to implement using Scala.
  • Experience with Object Oriented Analysis and Design (OOAD) methodologies.
  • Migrated Python Machine learning modules to scalable, high performance and fault-tolerant distributed systems like Apache Spark.
  • Delivery experience on major Hadoop ecosystem components such as Pig, Hive, Spark, Kafka, Elasticsearch and HBase, and monitoring with Cloudera Manager.
  • Worked on Data Modelling using various ML (Machine Learning Algorithms) via R and Python.
  • Used AWS services like EC2 and S3 for small data set processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR.
  • Hands-on experience in developing ETL data pipelines using PySpark on AWS EMR.
  • Decent skill in data warehousing and AWS migration with tools such as AWS SCT, DMS and Data Pipeline.
  • Experience in installations of software, writing test cases, debugging, and testing of batch and online systems.
  • Supported data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud; exported and imported data into S3.
  • Experience in working on AWS and Flume to load log data from multiple sources directly into HDFS and running Pig and Hive scripts.
  • Experience in production, quality assurance (QA), system integration testing (SIT) and user acceptance testing (UAT).
  • Hands-on experience in Spark using Scala and Python, creating RDDs and applying operations - transformations and actions.
  • Expertise in J2EE technologies like JSP, Servlets, EJB 2.0, JDBC, JNDI and AJAX.
  • Experience in Performance Tuning and Debugging of existing ETL processes.
  • Expertise in applying Java Message Service (JMS) for reliable information exchange across Java applications.
  • Proficient with core Java and AWT, and also with markup languages like HTML 5.0, XHTML, DHTML, CSS, XML 1.1, XSL, XSLT, XPath, XQuery, Angular.js, Node.js.
  • Expert in understanding data and designing/implementing enterprise platforms like Hadoop data lakes and large data warehouses.
  • Experience wif various technology platforms, application architecture, design, and delivery including experience architecting large big data enterprise data lake projects.
  • Worked with version control systems like Subversion, Perforce and Git to provide a common platform for all developers.
  • Articulate in written and verbal communication along with strong interpersonal, analytical, and organizational skills.
  • Hands on experience wif Microsoft Azure Cloud services, Storage Accounts and Virtual Networks.
  • Good at managing hosting plans for Azure infrastructure, implementing and deploying workloads on Azure virtual machines (VMs).
  • Highly motivated team player with the ability to work independently and adapt quickly to new and emerging technologies.
  • Creatively communicate and present models to business customers and executives, utilizing a variety of formats and visualization methodologies.
  • Developed code using Databricks and integrated Snowflake to load data into cloud databases.
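
A minimal Scala sketch of the Hive/SQL-to-Spark pattern referenced in the summary above; the Order case class, column names and query are hypothetical placeholders, not taken from the original resume.

    import org.apache.spark.sql.SparkSession

    // Hypothetical record type used only for illustration.
    case class Order(id: Long, product: String, amount: Double)

    object HiveToSparkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("hive-to-spark-sketch").getOrCreate()
        import spark.implicits._

        // An RDD of case classes (in practice sourced from Hive, HDFS, Cassandra, etc.).
        val ordersRDD = spark.sparkContext.parallelize(Seq(
          Order(1L, "book", 12.50), Order(2L, "pen", 1.25)))

        // Case classes are converted to a DataFrame with named columns automatically.
        val ordersDF = ordersRDD.toDF()
        ordersDF.createOrReplaceTempView("orders")

        // The original Hive/SQL query can then run unchanged on Spark SQL.
        spark.sql("SELECT product, SUM(amount) AS total FROM orders GROUP BY product").show()

        spark.stop()
      }
    }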

TECHNICAL SKILLS

Languages: C, C++, Java 6, Java 7, Java 8, Scala.

Big Data Skills: MapReduce, Hadoop, Spark, Kafka, Python, Snowflake, Databricks

Web Technologies: HTML5, JavaScript, Ajax, CSS, jQuery, XML, Bootstrap.

Servers: WebSphere, Tomcat 6.x, IIS (Microsoft Internet Information Services)

Case Tools and IDE: Eclipse, NetBeans, RAD, IntelliJ, Netezza, PyCharm.

Frameworks in Hadoop: Spark, Kafka, Storm

Databases: DB2, Oracle, MySQL Server, MongoDB, Cassandra

Version Tools: SVN, CVS, ClearCase

Web Services: SOAP, REST

PROFESSIONAL EXPERIENCE

Confidential, Minnesota, MN

Big Data Engineer

Responsibilities:

  • Programmed MapReduce and Hadoop solutions in Python and Scala.
  • Used NiFi to extract and parse streaming data.
  • Design, Development and Enhancements of various types of reports.
  • Designed & developed scalable and reliable near real-time stream processing solutions using (NiFi & Kafka).
  • Re-designed and developed a critical ingestion pipeline to process over 100 GB of data.
  • Provided technical expertise and created software design proposals for upcoming components.
  • Responsible for maintaining quality reference data by performing operations such as cleaning, transformation and ensuring Integrity in a relational environment.
  • Initiated several fine-tuning mechanisms to tune the database as well as the queries to complete a given set of jobs or tasks in optimal time.
  • Used AWS extensively, including EMR and other Amazon Web Services (S3, EC2, IAM).
  • Design, develop, deploy, and manage a reliable and scalable data analysis pipeline, using technologies including AWS, open-source tooling, and custom-built frameworks.
  • Create business intelligence dashboards in QuickSight for reconciliation and verifying data.
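
The NiFi flows above are configured in the NiFi UI rather than in code, so only the Kafka-consuming side is sketched here; it assumes Spark Structured Streaming (used elsewhere in this resume), and the broker address, topic name and HDFS paths are placeholders.

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: consume a Kafka topic and land raw events in HDFS as Parquet.
    object KafkaToHdfsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-to-hdfs-sketch").getOrCreate()

        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")    // placeholder broker
          .option("subscribe", "events-topic")                 // placeholder topic
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")

        // Each micro-batch is appended to HDFS; the checkpoint makes the job restartable.
        val query = events.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/raw/events")
          .option("checkpointLocation", "hdfs:///checkpoints/events")
          .start()

        query.awaitTermination()
      }
    }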

Environment: Apache Spark, Spark SQL, NiFi, Scala, Python, PySpark, AWS.

Confidential, Chicago, IL

Big Data Engineer

Responsibilities:

  • Generated detailed studies on potential third-party data handling solutions, verifying compliance with internal needs and stakeholder requirements.
  • Designed compliance frameworks for multi-site data warehousing efforts to verify conformity with state and federal data security guidelines.
  • Verified and validated the target data against the source data during data migration using a data reconciliation process.
  • Configured panels in reconciliation and enabled one to model complex pipeline systems.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Performed large-scale data conversions, transferring EBCDIC data into standardized Parquet/ASCII formats.
  • Converted EBCDIC files to Parquet/ASCII using a built-in program and worked with Cobrix (see the Cobrix sketch after this list).
  • Developed a Scala program for rewards conversion (Earn, Spend, Bonus) and integrated Spark with MongoDB and Cassandra.
  • Collaborated with the DevOps team on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Developed the code and coordinated with the operations team on deploying, monitoring, analyzing, and tuning data in MongoDB.
  • Authored specifications for data processing tools and technologies, defining SOPs (Standard Operating Procedures) for converting and loading NPI/PCI data.
  • Mapped data between source systems and warehouses.
  • Completed quality reviews for designs, codes, test plans and documentation methods.
  • Validated warehouse data structure and accuracy.
  • Collaborated wif multi-functional roles to communicate and align development efforts.
  • Resolved conversion problems, improved operations and provided exceptional client support.
  • Developed Spark code in Scala and Python for a regular expression (regex) project in Databricks for big data resources.
  • Expertise in implementing Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
  • Extracted, transformed and loaded data sources to generate CSV data files with Python programming and SQL queries.
  • Worked on data pre-processing and cleaning to perform feature engineering, and performed data imputation techniques for missing values in the dataset using Python.
  • Developed Python code on Databricks for automation and integrated it with Snowflake (see the Snowflake sketch after this list).
  • Analyzed the SQL scripts and redesigned them using PySpark SQL for faster performance.
  • Developed Python code on Databricks to run all notebooks in parallel and load into the Snowflake database.
  • Worked on POCs with AWS QuickSight to create dashboards for data loaded through Databricks into Snowflake databases.
  • Developed a data lake using AWS; hands-on experience in AWS Glue, EMR, Athena, QuickSight and Lake Formation.
  • Worked on data ingestion activities to bring large volumes of data into our data lake.
  • Worked with IT support to create ETL/ELT interfaces to the data lake, and created and visualized the data and data products on the data lake.
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 buckets in AWS.
  • Developed Python code to provide data analysis and generate complex data report.
  • Deployed the Cassandra cluster in a cloud (Amazon AWS) environment with scalable nodes as per the business requirement.
  • Implemented the ETL design to dump the MapReduce data cubes to the Cassandra cluster.
  • Developed various shell scripts and Python scripts to automate Spark jobs and Hive scripts.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Defined the data management strategy and direction of data organization; managed metadata within data lakes.
  • Managed quality assurance program, including on-site evaluations, internal audits and customer surveys.
  • Involved in conversion from WHIRL (Confidential) systems to TSYS systems.
  • Utilized various data analysis and data visualization tools to accomplish data analysis, report design and report delivery.
  • Resolved conflicts and negotiated mutually beneficial agreements between parties.
  • Prepared functional and technical documentation data for warehouses.
  • Selected methods and criteria for warehouse data evaluation procedures.
  • Tested software applications and systems to identify enhancement opportunities.
  • Performed systems and data analysis using variety of computer languages and procedures.
  • Developed and modified programs to meet customer requirements.
  • Coordinated troubleshooting support for warehouse personnel.
  • Cooperated fully with product owners and enterprise architects to understand requirements.
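
A hedged Scala sketch of the EBCDIC-to-Parquet conversion described above, assuming the open-source Cobrix (spark-cobol) data source; the copybook and HDFS paths are placeholders, and the za.co.absa.cobrix spark-cobol artifact is assumed to be on the classpath.

    import org.apache.spark.sql.SparkSession

    // Sketch: read an EBCDIC mainframe file described by a COBOL copybook and write Parquet.
    object EbcdicToParquetSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ebcdic-to-parquet-sketch").getOrCreate()

        val records = spark.read
          .format("cobol")                                      // Cobrix data source short name
          .option("copybook", "hdfs:///copybooks/rewards.cpy")  // copybook describing the record layout
          .load("hdfs:///landing/ebcdic/rewards")

        records.write.mode("overwrite").parquet("hdfs:///curated/rewards_parquet")

        spark.stop()
      }
    }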
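
A hedged Scala sketch of the Databricks-to-Snowflake load pattern mentioned above, assuming the Spark-Snowflake connector; the connection options, credentials and table names are placeholders, not values from the original resume.

    import org.apache.spark.sql.SparkSession

    // Sketch: write a DataFrame from a Spark/Databricks job into a Snowflake table.
    object SnowflakeLoadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("snowflake-load-sketch").getOrCreate()

        // Placeholder connection options for the Spark-Snowflake connector.
        val sfOptions = Map(
          "sfURL"       -> "myaccount.snowflakecomputing.com",
          "sfUser"      -> "etl_user",
          "sfPassword"  -> "********",
          "sfDatabase"  -> "ANALYTICS",
          "sfSchema"    -> "PUBLIC",
          "sfWarehouse" -> "LOAD_WH")

        val df = spark.read.parquet("hdfs:///curated/rewards_parquet")

        df.write
          .format("snowflake")          // short name registered by the connector on Databricks
          .options(sfOptions)
          .option("dbtable", "REWARDS")
          .mode("append")
          .save()

        spark.stop()
      }
    }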

Environment: Apache Spark, Spark Streaming, Spark SQL, Scala, PySpark, Python, MongoDB, AWS tools (S3, EC2, QuickSight), Databricks, Snowflake.

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Migrated Oracle tables to HDFS using Sqoop.
  • Developed a process for batch ingestion of CSV files and Sqoop imports from different sources, and for generating views on the data source using shell scripting and Python.
  • Developed managed beans and defined navigation rules for the application using JSF.
  • Designed and implemented an Apache Spark job which takes sequence files from HDFS and migrates them to HBase.
  • Motivated and assisted team of six members in reaching individual and team goals for quality, productivity and revenue generation.
  • Data cleaning, pre-processing and modelling using Spark and Python.
  • Used WebSphere Application Server Developer Tools for Eclipse (WDT) to create Java batch projects based on the Java Batch 1.0 standard (JSR 352) and submit them to a Liberty profile server.
  • Worked on Apache NiFi to ingest raw data to HDFS; configured the SSL Context Service and different keytab parameters in Apache NiFi.
  • Involved in and guided the team in preparing the technical specification.
  • Involved in application development, build and deployment.
  • Implemented microservices in order to separate tasks and avoid dependencies on other parallel ongoing tasks of the same application.
  • Extended Hive and Pig core functionality by writing custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregating Functions (UDAFs) for Hive and Pig using Python.
  • Identified opportunities to improve infrastructure that effectively and efficiently utilizes Microsoft Azure, Windows Server 2008/2012/R2, Microsoft SQL Server, Microsoft Visual Studio, Windows PowerShell and cloud infrastructure.
  • Deployed Azure IaaS virtual machines (VMs) and Cloud services (PaaS role instances) into secure VNets and subnets.
  • Developed RESTful web services with Jersey, implemented JAX-RS, and provided security using SSL.
  • Created HBase tables and implemented salting of the HBase row keys (see the salting sketch after this list).
  • Migrated tables from Oracle to HBase on a per-tenant basis.
  • Experienced in performance tuning of Spark Applications for setting right Batch Interval time, correct level of Parallelism and memory tuning.
  • Designed and developed ETL code using Informatica mappings to load data from heterogeneous source systems like flat files, XMLs, MS Access files and Oracle to a target Oracle system under Stage, then to the data warehouse, and then to data mart tables for reporting.
  • Developed ETL with SCDs, caches and complex joins with optimized SQL queries.
  • Written Programs in Spark using Scala and Python for Data quality check.
  • Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze customer behavioral data.
  • Developed real-time data processing applications using Scala and Python, and implemented Apache Spark Streaming from various streaming sources like Kafka.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Contribute to teh support forums (specific to Azure Networking, Azure Virtual Machines, Azure Active Directory, Azure Storage) for Microsoft Developers Network.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
  • Performed a one-time migration of 300 billion records using Apache Spark bulk loading.
  • Involved in delta migration using Sqoop incremental updates.
  • Involved in the typical steps of building a data lake: setting up storage, moving data, cleansing, prepping and cataloging data, configuring and enforcing security and compliance policies, and making data available for analytics.
  • Used shell scripts to perform the ETL process by calling SQL or pmcmd commands, handled pre/post-ETL steps like file validation, zipping, massaging and archiving the source and target files, and used UNIX scripting to manage the file systems.
  • Maintained, structured, and surveyed documents within the NoSQL MongoDB database, ensuring data integrity, correcting anomalies, and increasing the overall maintainability of the database.
  • Involved in architecting the data pipeline for the add/update flow of the real-time analysis flow for the IMS application.
  • Architected the whole design for the data migration process and also for the real-time analysis.
  • Written Apache Spark Jobs using Scala API
  • Worked with Kafka to get near real-time data onto the big data cluster and required data into Spark for analysis.
  • Created Kafka-Spark streaming data pipelines to consume data from external sources and perform transformations in Scala, and contributed towards developing a data pipeline to load data from different sources like the web, RDBMS and NoSQL into Apache Kafka or the Spark cluster.
  • Involved in administering and configuring the MapR distribution.
  • Implemented MapR Streams (Kafka 0.9 API) with Spark Streaming using the Java API.
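
A hedged Scala sketch of the HBase row-key salting mentioned above; the bucket count, column family and key format are illustrative assumptions rather than the original design.

    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.util.Bytes

    // Sketch: salt sequential row keys so writes spread evenly across region servers.
    object SaltedRowKeySketch {
      val SaltBuckets = 16

      // Prefix the natural key with a stable hash-based salt, e.g. "07|customer123".
      def saltedKey(naturalKey: String): String = {
        val bucket = (naturalKey.hashCode & Int.MaxValue) % SaltBuckets
        f"$bucket%02d|$naturalKey"
      }

      // Build an HBase Put for one record using the salted key.
      def toPut(naturalKey: String, value: String): Put = {
        val put = new Put(Bytes.toBytes(saltedKey(naturalKey)))
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("value"), Bytes.toBytes(value))
        put
      }
    }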

Environment: Apache Spark, Spark Streaming, Spark SQL, Hadoop Security, NiFi, MapR Streams, MapR 5.1, OpenShift, Scala, ETL Informatica, Java, HBase, Eclipse, MVN, Sequence Files.

Confidential, Irving, TX

Hadoop Developer

Responsibilities:

  • Performed hands-on data manipulation, transformation, hypothesis testing and predictive modeling Developed robust set of codes that are tested, automated, structured and efficient
  • Defined service layer using EJB3.0 and also defined remote and local services.
  • Accessed remote and local EJB services from controller.
  • Developed application using JSP, Tag libraries, JSF and Struts (MVC) Framework.
  • Exposed web services to client developing WSDL also involved in developing web client for application interactions.
  • Evaluated, refined, and continuously improved the efficiency and accuracy of existing predictive models using Netezza.
  • Involved in developing several data pipelines using Apache Kafka.
  • Decent skills in data warehousing and AWS migration with tools such as AWS SCT, DMS and Data Pipeline.
  • Developed a framework API for tax calculations in Yoda using server-side components with J2EE and the Spring Framework.
  • Designed, developed and implemented a messaging module using Java Message Service (JMS) point-to-point messaging and Message Driven Beans to listen to the messages in the queue for interactions with client ordering data. Collaborated on insights with other Data Scientists, Business Analysts, and Partners.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs (see the S3 sketch after this list).
  • Worked on importing metadata into Hive and migrating existing tables and applications to work on Hive and the AWS cloud.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to work on the AWS cloud (S3).
  • Evaluate, refine, and continuously improve teh efficiency and accuracy of existing Predictive Models.
  • Used Java Message Service (JMS) for reliable and asynchronous exchange of important information, such as payment status reports, to the MQ Server using MQ Series.
  • Worked on POCs with Apache Spark using Scala to implement Spark in the project.
  • Consumed the data from Kafka using Apache Spark.
  • Worked with Apache Spark, which provides a fast and general engine for large-scale data processing integrated with the functional programming language Scala.
  • Experienced in Apache Spark for implementing advanced procedures like text analytics and processing using its in-memory computing capabilities, written in Scala.
  • Responsible for the design and development of ETL and technical design documents for Informatica mappings with high-level DFDs, process flow descriptions, source-to-target row-level transformation logic, performance tuning, and scheduling.
  • Imported and exported data between HDFS and RDBMS using Sqoop and Kafka.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Developed Scala and SQL code to extract data from various databases
  • Championed new innovative ideas around the Data Science and Advanced Analytics practices.
  • Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
  • Participate in planning, implementation, and growth of our Amazon Web Services (AWS) foundational footprint.
  • Created stored procedures and packages in Oracle as part of the pre- and post-ETL process.
  • Uploaded data to Hadoop and Hive and combined new tables with existing databases.
  • Developed statistical models to forecast inventory and procurement cycles.
  • Implemented data backup strategies for the data in the Cassandra cluster.
  • Generated data cubes using Hive, Pig and Java MapReduce on a provisioned Hadoop cluster in AWS.
  • Imported data from relational databases into HDFS using Sqoop.
  • Tune and optimize ETL jobs and SQL Queries for performance and throughput
  • Utilized Python Pandas DataFrames to provide data analysis.
  • Worked on the Hortonworks 2.3 distribution.
  • Utilized Python regular expression operations (NLP) to analyze customer reviews.
  • Understanding of data storage and retrieval techniques, ETL, and databases, to include graph stores, relational databases, tuple stores, NOSQL, Hadoop, PIG, MySQL and Oracle databases
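
A minimal Scala sketch of the S3-to-RDD pattern referenced above; the bucket, prefix and record format are placeholders, and S3 access is assumed to be configured via the s3a connector.

    import org.apache.spark.sql.SparkSession

    // Sketch: import data from S3 into an RDD and apply transformations and actions.
    object S3RddSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("s3-rdd-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Transformations: read text lines, split CSV-like fields, keep large orders.
        val lines  = sc.textFile("s3a://example-bucket/input/orders/")
        val fields = lines.map(_.split(","))
        val large  = fields.filter(cols => cols.length > 2 && cols(2).toDouble > 100.0)

        // Actions trigger execution of the lineage above.
        println(s"Large orders: ${large.count()}")
        large.map(_.mkString("|")).saveAsTextFile("s3a://example-bucket/output/large-orders/")

        spark.stop()
      }
    }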

Environment: Apache Spark, PySpark, Spark Streaming, Spark SQL, Scala, Apache Kafka, Apache Flume, Python Pandas, Cassandra, Hortonworks (HDP) 2.3, AWS, Akka, Hive, Pig, ETL Informatica.

Confidential

Jr. Java/J2EE Developer

Responsibilities:

  • Developed the User Interfaces module using JSP, JavaScript, DHTML and form beans for the presentation layer.
  • Developed Servlets and Java Server Pages (JSP).
  • Developed PL/SQL queries and wrote stored procedures and JDBC routines to generate reports based on client requirements.
  • Enhanced the system according to customer requirements.
  • Involved in the customization of the available functionalities of the software for an NBFC (Non-Banking Financial Company).
  • Involved in putting proper review processes and documentation for functionality development.
  • Providing support and guidance for Production and Implementation Issues.
  • Used JavaScript validation in JSP.
  • Used the Hibernate framework to access data from the back-end SQL Server database.
  • Used AJAX (Asynchronous JavaScript and XML) to implement a user-friendly and efficient client interface.
  • Used MDBs for consuming messages from JMS queues/topics.
  • Designed and developed the web application using the Struts framework.
  • Used ANT to compile and generate EAR, WAR, and JAR files.
  • Created test case scenarios for functional testing and wrote unit test cases with JUnit.
  • Responsible for integration, unit testing, system testing and stress testing for all the phases of the project.
Environment: Java, J2EE, JSP 1.2, Performance Tuning, Spring 1.2, Hibernate 2.0, JSF 1.2, EJB 1.2, IBM WebSphere 6.0, Servlets, JDBC, XML, XSLT, DOM, CSS, HTML, DHTML, SQL, JavaScript, Log4J, ANT 1.6, WSAD 6.0, Oracle 9i, Windows 2000.
