
Big Data Engineer Resume


Minnesota, MN

SUMMARY

  • Around 8 years of experience using Big Data Ecosystems & Java.
  • Extensive experience in Apache Spark with Scala, Apache Solr, and Python
  • Extensive experience in data ingestion technologies like Flume, Kafka, and NiFi
  • Utilize Flume, Kafka, and NiFi to ingest real-time and near real-time streaming data into HDFS from different data sources.
  • Good knowledge of using NiFi to automate data movement between different Hadoop systems.
  • Developed ELT workflows using NiFi to load data into Hive and Teradata.
  • Solid foundation in mathematics, probability, and statistics, with broad practical statistical and data mining skills cultivated through industry work and academic programs.
  • Delivered models for high-stakes information retrieval and statistical analysis, e.g., fraud detection.
  • Experience in machine learning, including supervised and unsupervised learning techniques and algorithms (e.g., k-NN, SVM, RVM, Naïve Bayes, decision trees).
  • Design and develop models for building and deploying scalable cloud-based predictive and prescriptive intelligence solutions such as recommender systems.
  • Knowledge of Spark Core, Spark SQL, Spark Streaming, and machine learning using the Scala and Python programming languages.
  • Hands-on experience with Java 8, Scala, and the Play/Akka frameworks.
  • Developed distributed applications using the Akka actor model for extreme scalability.
  • Involved in all Software Development Life Cycle (SDLC) phases, including analysis, design, implementation, testing, and maintenance.
  • Extensively worked on CI/CD pipelines for code deployment, engaging different tools (Git, Jenkins) from developer code check-in through production deployment.
  • Integrated services like GitHub, AWS CodePipeline, and Jenkins to create a deployment pipeline.
  • Responsible for building a fully automated application deployment pipeline into AWS using Jenkins, Artifactory, Puppet, and Terraform.
  • Developed Spark code using Scala and Spark SQL/Spark Streaming for faster testing and processing of data.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
  • Strong technical, administration & mentoring knowledge in Linux and Big Data/Hadoop technologies.
  • Involved in designing and architecting data warehouses and data lakes on relational (Oracle, SQL Server), high-performance (Netezza and Teradata), and big data (Hadoop: MongoDB, Hive, Cassandra, and HBase) databases.
  • Sound knowledge of the in-memory database MemSQL.
  • Experience with developing and maintaining applications written for Amazon Simple Storage Service (S3), AWS Elastic MapReduce, and AWS CloudFormation.
  • Worked on migrating on-premises applications to AWS.
  • Hands-on experience with major components in the Hadoop ecosystem, including Hadoop MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume, and Avro.
  • Experienced in deploying Hadoop clusters using Puppet.
  • Configured Spark Streaming to consume Kafka streams and store the data in HDFS.
  • Work experience with cloud infrastructure like Amazon Web Services (AWS).
  • Experience in importing and exporting the data using SQOOP from HDFS to Relational Database systems/mainframe and vice-versa
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Expertise in working with ETL Architects, Data Analysts and data modelers to translate business rules/requirements into conceptual, physical and logical dimensional models and worked with complex normalized and denormalized data models.
  • Installing, configuring, and managing Hadoop clusters and data science tools.
  • Managing the Hadoop distribution with Cloudera Manager, Cloudera Navigator, Hue.
  • Setting up the High-Availability for Hadoop Clusters components and Edge nodes.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Used Spark SQL's Scala and Python interfaces, which automatically convert RDDs of case classes into DataFrames (schema RDDs); a brief sketch follows this summary.
  • Configured and deployed Azure Automation scripts for applications utilizing the Azure stack, including Compute, Blob storage, Azure Data Factory (ADF), Azure Data Lake, Azure SQL, Cloud Services, and ARM templates, with a focus on automation.
  • Strong experience in writing Python applications using libraries such as Pandas, scikit-learn, NumPy, SciPy, and Matplotlib.
  • Experience in developing Shell scripts and Python Scripts for system management.
  • Well versed in using Software development methodologies like Rapid Application Development (RAD), Agile Methodology and Scrum software development processes.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs, and Scala
  • Analyzed the Cassandra/SQL scripts and designed the solution to implement using Scala
  • Experience with Object-Oriented Analysis and Design (OOAD) methodologies.
  • Migrated Python machine learning modules to scalable, high-performance, and fault-tolerant distributed systems like Apache Spark.
  • Delivery experience on major Hadoop ecosystem components such as Pig, Hive, Spark, Kafka, Elasticsearch, and HBase, with monitoring through Cloudera Manager.
  • Worked on data modeling using various machine learning algorithms in R and Python.
  • Used AWS services like EC2 and S3 for processing and storing small data sets; experienced in maintaining Hadoop clusters on AWS EMR.
  • Hands-on experience in developing ETL data pipelines using PySpark on AWS EMR.
  • Decent skill in data warehousing and AWS migration with tools such as AWS SCT, DMS, and Data Pipeline.
  • Experience in installations of software, writing test cases, debugging, and testing of batch and online systems.
  • Supported data analysis projects using Elastic Map Reduce on the Amazon Web Services (AWS) cloud. Exporting and importing data into S3.
  • Experience working with AWS and Flume to load log data from multiple sources directly into HDFS and running Pig and Hive scripts.
  • Experience in production, quality assurance (QA), system integration testing (SIT), and user acceptance testing (UAT).
  • Hands-on experience in Spark using Scala and Python, creating RDDs and applying operations (transformations and actions).
  • Expertise in J2EE technologies like JSP, Servlets, EJB 2.0, JDBC, JNDI, and AJAX.
  • Experience in Performance Tuning and Debugging of existing ETL processes.
  • Expertise in applying Java Messaging Service (JMS) for reliable information exchange across Java applications.
  • Proficient with Core Java and AWT, as well as markup and web technologies like HTML 5.0, XHTML, DHTML, CSS, XML 1.1, XSL, XSLT, XPath, XQuery, Angular.js, and Node.js.
  • Expert in understanding data and designing/implementing enterprise platforms like Hadoop data lakes and large data warehouses.
  • Experience with various technology platforms, application architecture, design, and delivery including experience architecting large big data enterprise data lake projects.
  • Worked with version control systems like Subversion, Perforce, and GIT for providing common platform for all the developers.
  • Articulate in written and verbal communication along with strong interpersonal, analytical, and organizational skills.
  • Hands on experience with Microsoft Azure Cloud services, Storage Accounts and Virtual Networks.
  • Good at managing hosting plans for Azure infrastructure and implementing and deploying workloads on Azure virtual machines (VMs).
  • Highly motivated team player with the ability to work independently and adapt quickly to new and emerging technologies.
  • Creatively communicate and present models to business customers and executives, utilizing a variety of formats and visualization methodologies.
  • Develop code using Databricks and integrate Snowflake to load data into cloud databases.
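
A brief, illustrative sketch of the DataFrame API usage noted above (assuming Spark 2.x in local mode; the case class, data, and query are hypothetical placeholders, not project code):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type used only for this sketch.
case class Transaction(id: Long, account: String, amount: Double)

object DataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrameSketch")
      .master("local[*]")            // local mode, for the sketch only
      .getOrCreate()
    import spark.implicits._

    // An RDD of case classes becomes a DataFrame with named columns.
    val rdd = spark.sparkContext.parallelize(Seq(
      Transaction(1L, "acct-1", 120.50),
      Transaction(2L, "acct-2", 75.00)
    ))
    val df = rdd.toDF()

    // A Hive/SQL-style aggregate expressed through Spark SQL.
    df.createOrReplaceTempView("transactions")
    spark.sql("SELECT account, SUM(amount) AS total FROM transactions GROUP BY account").show()

    spark.stop()
  }
}
```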

TECHNICAL SKILLS

Languages: C, C++, Java 6, Java 7, Java 8, Scala.

Big Data Skills: MapReduce, Hadoop, Spark, Kafka, Python, Snowflake, Databricks

Web Technologies: HTML5, JavaScript, Ajax, CSS, jQuery, XML, Bootstrap.

Servers: WebSphere, Tomcat 6.x, IIS (Microsoft Internet Information Services)

Case Tools and IDE: Eclipse, NetBeans, RAD, IntelliJ, Netezza, PyCharm.

Frameworks in Hadoop: Spark, Kafka, Storm

Databases: DB2, Oracle, MySQL Server, MongoDB, Cassandra

Version Tools: SVN, CVS, ClearCase

Web Services: SOAP, REST

PROFESSIONAL EXPERIENCE

Confidential, Minnesota, MN

Big Data Engineer

Responsibilities:

  • Programmed MapReduce and Hadoop solutions in Python and Scala.
  • Used NiFi to extract and parse streaming data.
  • Design, development, and enhancement of various types of reports.
  • Designed and developed scalable and reliable near real-time stream processing solutions using NiFi and Kafka; a brief sketch follows this list.
  • Re-designed and developed a critical ingestion pipeline to process over 100 GB of data.
  • Provided technical expertise and created software design proposals for upcoming components.
  • Responsible for maintaining quality reference data by performing operations such as cleaning, transformation and ensuring Integrity in a relational environment.
  • Introduced numerous fine-tuning mechanisms for the database and its queries so that a given set of jobs or tasks completed in optimal time.
  • Used AWS extensively, including EMR and other AWS services (S3, EC2, IAM).
  • Design, develop, deploy, and manage a reliable and scalable data analysis pipeline, using technologies including AWS, open-source tooling, and custom-built frameworks.
  • Create business intelligence dashboards in QuickSight for reconciliation and verifying data.
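
A brief, illustrative sketch of the Kafka-to-HDFS leg of such a near real-time pipeline (assuming Spark Structured Streaming with the spark-sql-kafka connector on the classpath; the broker, topic, and paths are placeholders, not project values):

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("KafkaToHdfsSketch").getOrCreate()

    // Read a Kafka topic as a streaming DataFrame.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder broker
      .option("subscribe", "ingest-topic")                // placeholder topic
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Append each micro-batch to HDFS as Parquet, with checkpointing for reliability.
    val query = stream.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/landing/ingest")           // placeholder path
      .option("checkpointLocation", "hdfs:///checkpoints/ingest")
      .start()

    query.awaitTermination()
  }
}
```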

Environment: Apache Spark, Spark SQL, NiFi, Scala, Python, PySpark, AWS.

Confidential, Chicago, IL

Big Data Engineer

Responsibilities:

  • Generated detailed studies on potential third-party data handling solutions, verifying compliance with internal needs and stakeholder requirements.
  • Designed compliance frameworks for multi-site data warehousing efforts to verify conformity with state and federal data security guidelines.
  • Verified and validated target data against source data during data migration using a data reconciliation process.
  • Configured reconciliation panels, enabling the modeling of complex pipeline systems.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Performed large-scale data conversions, transferring EBCDIC data into standardized Parquet/ASCII formats.
  • Converted EBCDIC files to Parquet/ASCII using a built-in program and worked with Cobrix; a brief sketch follows this list.
  • Developed a Scala program for rewards conversion (Earn, Spend, Bonus) and integrated Spark with MongoDB and Cassandra.
  • Collaborated with the DevOps team on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Developed the code and coordinated with the operations team on deploying, monitoring, analyzing, and tuning data in MongoDB.
  • Authored specifications for data processing tools and technologies, defining standard operating procedures (SOPs) for converting and loading NPI/PCI data.
  • Mapped data between source systems and warehouses.
  • Completed quality reviews for designs, codes, test plans and documentation methods.
  • Validated warehouse data structure and accuracy.
  • Collaborated with multi-functional roles to communicate and align development efforts.
  • Resolved conversion problems, improved operations and provided exceptional client support.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in Databricks for big data resources.
  • Expert in implementing Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
  • Extracted, transformed, and loaded data from source systems to generate CSV data files using Python programming and SQL queries.
  • Worked on data pre-processing and cleaning the data to perform feature engineering and performed data imputation techniques for the missing values in the dataset using Python.
  • Developed Python code on Databricks for automation and integrated it with Snowflake.
  • Analyzed SQL scripts and redesigned them using PySpark SQL for faster performance.
  • Developed Python code on Databricks to run all notebooks in parallel and load data into the Snowflake database.
  • Worked on POCs with AWS QuickSight to create dashboards for data loaded through Databricks into Snowflake databases.
  • Developed a data lake on AWS; hands-on experience with AWS Glue, EMR, Athena, QuickSight, and Lake Formation.
  • Worked on data ingestion activities to bring large volumes of data into the data lake.
  • Work with IT support to create ETL / ELT interfaces to the data lake and create and visualize the data and data products on the data lake.
  • Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
  • Developed Python code to provide data analysis and generate complex data reports.
  • Deployed the Cassandra cluster in cloud (Amazon AWS) environment with scalable nodes as per the business requirement.
  • Implemented the ETL design to dump the Map-Reduce data cubes to Cassandra cluster.
  • Developed various shell scripts and python scripts to automate Spark jobs and hive scripts.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Implemented Spark using Scala, utilizing the DataFrame and Spark SQL APIs for faster testing and processing of data.
  • Defined the data management strategy, setting the direction of data organization and metadata management within data lakes.
  • Managed quality assurance program, including on-site evaluations, internal audits and customer surveys.
  • Involved in conversion from WHIRL (Confidential) systems to TSYS systems.
  • Utilized various data analysis and data visualization tools to accomplish data analysis, report design and report delivery.
  • Resolved conflicts and negotiated mutually beneficial agreements between parties.
  • Prepared functional and technical documentation data for warehouses.
  • Selected methods and criteria for warehouse data evaluation procedures.
  • Tested software applications and systems to identify enhancement opportunities.
  • Performed systems and data analysis using variety of computer languages and procedures.
  • Developed and modified programs to meet customer requirements.
  • Coordinated troubleshooting support for warehouse personnel.
  • Cooperated fully with product owners and enterprise architects to understand requirements.
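
A brief, illustrative sketch of the EBCDIC-to-Parquet conversion referenced above, using the Cobrix (spark-cobol) data source (assumed to be on the classpath; the copybook and file paths are placeholders, not project values):

```scala
import org.apache.spark.sql.SparkSession

object EbcdicToParquetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("EbcdicToParquetSketch").getOrCreate()

    // Decode mainframe EBCDIC files using a COBOL copybook for the record layout.
    val df = spark.read
      .format("cobol")                                      // Cobrix data source
      .option("copybook", "hdfs:///copybooks/rewards.cpy")  // placeholder copybook
      .load("hdfs:///landing/ebcdic/rewards")               // placeholder input files

    // Persist the decoded records as columnar Parquet for downstream processing.
    df.write.mode("overwrite").parquet("hdfs:///curated/rewards_parquet")

    spark.stop()
  }
}
```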

Environment: Apache Spark, Spark Streaming, Spark SQL, Scala, PySpark, Python, MongoDB, AWS tools (S3, EC2, QuickSight), Databricks, Snowflake.

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Migrated Oracle tables to HDFS using Sqoop.
  • Developed a process for batch ingestion of CSV files and Sqoop imports from different sources, and generated views on the data sources using shell scripting and Python.
  • Developed Managed beans and defined Navigation rules for the application using JSF.
  • Designed and implemented an Apache Spark job that reads sequence files from HDFS and migrates them to HBase.
  • Motivated and assisted team of six members in reaching individual and team goals for quality, productivity and revenue generation.
  • Data cleaning, pre-processing and modelling using Spark and Python.
  • Used WebSphere Application Server Developer Tools for Eclipse (WDT) to create Java batch projects based on the Java Batch 1.0 standard (JSR 352) and submit them to a Liberty profile server.
  • Worked on Apache NiFi to ingest raw data into HDFS; configured the SSL Context Service and keytab parameters in Apache NiFi.
  • Guided the team in preparing the technical specifications.
  • Involved in application development, build, and deployment.
  • Implemented microservices to separate tasks and avoid dependencies on other parallel ongoing tasks of the same application.
  • Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs) written in Python.
  • Identified opportunities to improve infrastructure that effectively and efficiently utilizes Microsoft Azure, Windows Server 2008/2012/R2, Microsoft SQL Server, Microsoft Visual Studio, Windows PowerShell, and cloud infrastructure.
  • Deployed Azure IaaS virtual machines (VMs) and Cloud services (PaaS role instances) into secure VNets and subnets.
  • Developed RESTful web services with Jersey, implemented JAX-RS, and provided security using SSL.
  • Created HBase tables and implemented salting on the HBase row keys; a brief sketch follows this list.
  • Migrated tables from Oracle to HBase on a per-tenant basis.
  • Experienced in performance tuning of Spark Applications for setting right Batch Interval time, correct level of Parallelism and memory tuning.
  • Design and develop ETL code using Informatica Mappings to load data from heterogeneous Source systems like flat files, XML’s, MS Access files, Oracle to target system Oracle under Stage, then to data warehouse and then to Data Mart tables for reporting.
  • Developed ETL with SCD’s, caches, complex joins with optimized SQL queries.
  • Wrote Spark programs in Scala and Python for data quality checks.
  • Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
  • Developed real time data processing applications by using Scala and Python and implemented Apache Spark Streaming from various streaming sources like Kafka.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Contribute to the support forums (specific to Azure Networking, Azure Virtual Machines, Azure Active Directory, Azure Storage) for Microsoft Developers Network.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark
  • Performed a one-time migration of 300 billion records using Apache Spark bulk loading.
  • Involved in delta migration using Sqoop incremental updates.
  • Involved in the typical steps of building a data lake: setting up storage; moving data; cleansing, prepping, and cataloging data; configuring and enforcing security and compliance policies; and making data available for analytics.
  • Used shell scripts to perform the ETL process by calling SQL or pmcmd commands, to handle pre/post-ETL steps such as file validation, zipping, massaging, and archiving of source and target files, and used UNIX scripting to manage the file systems.
  • Maintained, structured, and surveyed documents within the NoSQL MongoDB database; ensuring data integrity, correcting anomalies, and increasing the overall maintainability of the database.
  • Involved in architecting the data pipeline for the add/update flow of real-time analysis for the IMS application.
  • Architected the overall design for the data migration process and for the real-time analysis.
  • Wrote Apache Spark jobs using the Scala API.
  • Worked with Kafka to get near real-time data onto the big data cluster and the required data into Spark for analysis.
  • Created Kafka/Spark Streaming data pipelines for consuming data from external sources and performing transformations in Scala, and contributed to developing a data pipeline that loads data from different sources such as web, RDBMS, and NoSQL systems into Apache Kafka or the Spark cluster.
  • Involved in administering and configuring the MapR distribution.
  • Integrated MapR Streams (Kafka 0.9 API) with Spark Streaming using the Java API.
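
A brief, illustrative sketch of the HBase row-key salting approach mentioned above: a small hash-derived prefix spreads otherwise sequential keys across region servers to avoid hotspotting. The bucket count and key format here are hypothetical choices, not project values:

```scala
object SaltedRowKey {
  // Number of salt buckets; typically chosen to roughly match the number of regions.
  val SaltBuckets = 16

  /** Prefix a natural key with a stable salt bucket, e.g. "07|ORD-123". */
  def salted(naturalKey: String): String = {
    val bucket = (naturalKey.hashCode & Integer.MAX_VALUE) % SaltBuckets
    f"$bucket%02d|$naturalKey"
  }

  def main(args: Array[String]): Unit = {
    // Consecutive keys land in different buckets, spreading the write load.
    Seq("ORD-123", "ORD-124", "ORD-125").foreach { k =>
      println(s"$k -> ${salted(k)}")
    }
  }
}
```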

Environment: Apache Spark, Spark Streaming, Spark SQL, Hadoop Security, NiFi, MapR Streams, MapR 5.1, OpenShift, Scala, ETL Informatica, Java, HBase, Eclipse, Maven, Sequence Files.

Confidential, Irving, TX

Hadoop Developer

Responsibilities:

  • Performed hands-on data manipulation, transformation, hypothesis testing, and predictive modeling; developed a robust set of code that is tested, automated, structured, and efficient.
  • Defined the service layer using EJB 3.0, including remote and local services.
  • Accessed remote and local EJB services from controller.
  • Developed application using JSP, Tag libraries, JSF and Struts (MVC) Framework.
  • Exposed web services to clients by developing WSDLs, and was also involved in developing a web client for application interactions.
  • Evaluate, refine, and continuously improve the efficiency and accuracy of existing Predictive Models using Netezza.
  • Involved in developing several Data Pipelines using Apache Kafka.
  • Decent skills in data warehousing and AWS migration with tools such as AWS SCT, DMS, and Data Pipeline.
  • Developed a framework API for tax calculations in Yoda using server-side components built with J2EE and the Spring framework.
  • Designed, developed, and implemented a messaging module using Java Messaging Service (JMS) point-to-point messaging and Message-Driven Beans to listen to messages in the queue for interactions with client ordering data. Collaborated on insights with other data scientists, business analysts, and partners.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Worked on importing metadata into Hive and migrating existing tables and applications to work on Hive and the AWS cloud.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to work on AWS cloud (S3).
  • Used Java Messaging Services (JMS) for reliable and asynchronous exchange of important information, such as payment status reports, to the MQ Server using MQ Series.
  • Worked on POCs with Apache Spark using Scala to implement Spark in the project.
  • Consumed data from Kafka using Apache Spark.
  • Worked with Apache Spark, which provides a fast and general engine for large-scale data processing, integrated with the functional programming language Scala.
  • Experienced in Apache Spark for implementing advanced procedures like text analytics and processing using the in-memory computing capabilities written in Scala.
  • Responsible for Design and Development of ETL and Technical Design documents for Informatica mappings with the High level DFD’s, Process flow description, Source to Target row level transformation logic, performance tuning, and scheduling.
  • Importing and exporting the data from HDFS to RDBMS using Sqoop and Kafka.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Developed Scala and SQL code to extract data from various databases; a brief sketch follows this list.
  • Championed new and innovative ideas around data science and advanced analytics practices.
  • Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
  • Participate in planning, implementation, and growth of our Amazon Web Services (AWS) foundational footprint.
  • Created stored procedures and packages in Oracle as a part of the pre and Post ETL process.
  • Uploaded data to Hadoop, Hive and combined new tables with existing databases.
  • Developed statistical models to forecast inventory and procurement cycles.
  • Implemented the data backup strategies for the data in the Cassandra cluster.
  • Generated data cubes using Hive, Pig, and Java MapReduce on a provisioned Hadoop cluster in AWS.
  • Imported the data from relational databases into HDFS using SQOOP.
  • Tune and optimize ETL jobs and SQL Queries for performance and throughput
  • Utilized Python Pandas DataFrames to provide data analysis.
  • Worked on the Hortonworks 2.3 distribution.
  • Utilized Python regular expression operations (NLP) to analyze customer reviews.
  • Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, tuple stores, NoSQL, Hadoop, Pig, MySQL, and Oracle databases.
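
A brief, illustrative sketch of extracting data from a relational database with Spark's JDBC source, as referenced above (the connection URL, table, and credentials are placeholders, and the Oracle JDBC driver is assumed to be on the classpath):

```scala
import org.apache.spark.sql.SparkSession

object JdbcExtractSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("JdbcExtractSketch").getOrCreate()

    // Read a table over JDBC into a DataFrame.
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")  // placeholder URL
      .option("dbtable", "SALES.ORDERS")                       // placeholder table
      .option("user", sys.env.getOrElse("DB_USER", "user"))
      .option("password", sys.env.getOrElse("DB_PASSWORD", "password"))
      .load()

    // A SQL-style aggregation over the extracted data.
    orders.createOrReplaceTempView("orders")
    spark.sql("SELECT status, COUNT(*) AS cnt FROM orders GROUP BY status").show()

    spark.stop()
  }
}
```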

Environment: Apache Spark, PySpark, Spark Streaming, Spark SQL, Scala, Apache Kafka, Apache Flume, Python Pandas, Cassandra, Hortonworks (HDP) 2.3, AWS, Akka, Hive, Pig, ETL Informatica.

Confidential

Jr. Java/J2EE Developer

Responsibilities:

  • Developed the user interface module using JSP, JavaScript, DHTML, and form beans for the presentation layer.
  • Developed Servlets and Java Server Pages (JSP).
  • Developed PL/SQL queries and wrote stored procedures and JDBC routines to generate reports based on client requirements.
  • Enhancement of the System according to the customer requirements.
  • Involved in the customization of the available functionalities of the software for an NBFC (Non-Banking Financial Company).
  • Involved in putting proper review processes and documentation for functionality development.
  • Providing support and guidance for Production and Implementation Issues.
  • Used JavaScript validation in JSP.
  • Used the Hibernate framework to access data from the back-end SQL Server database.
  • Used AJAX (Asynchronous JavaScript and XML) to implement a user-friendly and efficient client interface.
  • Used MDBs for consuming messages from JMS queues/topics.
  • Designed and developed the web application using the Struts framework.
  • Used ANT to compile and generate EAR, WAR, and JAR files.
  • Created test case scenarios for functional testing and wrote unit test cases with JUnit.
  • Responsible for Integration, unit testing, system testing and stress testing for all the phases of project.
Environment: Java, J2EE, JSP 1.2, Performance Tuning, Spring 1.2, Hibernate 2.0, JSF 1.2, EJB 1.2, IBM WebSphere 6.0, Servlets, JDBC, XML, XSLT, DOM, CSS, HTML, DHTML, SQL, JavaScript, Log4J, ANT 1.6, WSAD 6.0, Oracle 9i, Windows 2000.
