
Big Data Engineer Resume

Mooresville, NC

SUMMARY

  • Cloudera Certified Spark and Hadoop Developer with 7+ years of experience in Hadoop and Spark.
  • Experience in machine learning, using random forests and deep learning to understand shopping trends.
  • Experience in installing, configuring, and testing Hadoop ecosystem components.
  • Experience in migrating data between HDFS and relational database systems in both directions using Sqoop, according to client requirements.
  • Experience in managing different file formats and compressions.
  • Extensive knowledge of Spark Core APIs, DataFrames, and Spark SQL.
  • Very good understanding of partitioning and bucketing in Hive.
  • Knowledge of ingesting real-time or near-real-time streaming data into HDFS using Flume, Kafka, and Spark Streaming (see the sketch after this list).
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Knowledge of integrating Flume with Kafka, Flume with Spark Streaming, and Kafka with Spark Streaming.
  • Experience in analyzing data using Python, R, SQL, Microsoft Excel, Hive, PySpark, and Spark SQL for data mining, data cleansing, and data munging.
  • Hands-on experience with various Hadoop distributions: Cloudera (CDH 4/CDH 5), Hortonworks, MapR, IBM BigInsights, Apache, and Amazon EMR.
  • Hands-on experience with AWS cloud services (VPC, EC2, S3, IAM, RDS, Redshift, Data Pipeline, EMR, DynamoDB, WorkSpaces, Lambda, Kinesis, SNS, SQS).
  • Designed Hive queries & Pig scripts to perform data analysis, data transfer, and table design.
  • Expertise in Hive queries and extensive knowledge of joins.
  • Extensive knowledge of Sqoop imports and exports.
  • Created Hive scripts to extract, transform, load (ETL), and store data.
  • Managed and reviewed Hadoop log files. Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, I/O, multi-threading, and serialization/deserialization of streaming applications.
  • Experience in NoSQL databases such as HBase and Cassandra.
  • Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).
  • Used Postman & SoapUI for REST service testing.
  • Experience productionizing Apache NiFi for data flows with significant processing requirements and controlling data flow security.
  • Involved in various projects related to data modeling, system/data analysis, design, and development for both OLTP and data warehousing environments.
  • Practical understanding of data modeling (dimensional & relational) concepts like star-schema modeling, snowflake-schema modeling, and fact and dimension tables.
  • Experience in software design, development, and implementation of client/server web-based applications using JSTL, jQuery, JavaScript, JavaBeans, JDBC, Struts, PL/SQL, SQL, HTML, CSS, XML, and AJAX, plus a bird's-eye view of the React JavaScript library.
  • Sound knowledge of Oracle 9i, Core Java, JSP, and Servlets, and experience with SQL and PL/SQL concepts: database stored procedures, functions, and triggers.
  • Experience in deploying NiFi data flows to production and integrating data from multiple sources like Cassandra and MongoDB.
  • Experience with DevOps tools (GitHub, JIRA) and methodologies (Agile, Scrum).
  • Experience in mentoring team members by training, guiding, and monitoring their tasks.
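
For illustration, here is a minimal PySpark Structured Streaming sketch of the Kafka-to-HDFS ingestion pattern noted above; the broker address, topic name, and paths are hypothetical placeholders, not from any specific engagement.

    # Minimal sketch: ingest a Kafka topic into HDFS with Spark Structured Streaming.
    # Broker, topic, and paths below are placeholder values.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    # Read the raw Kafka stream; each record arrives as binary key/value.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "events-topic")
              .load()
              .select(col("value").cast("string").alias("payload")))

    # Append micro-batches to HDFS as Parquet, with a checkpoint for recovery.
    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/raw/events")
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .start())

    query.awaitTermination()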

TECHNICAL SKILLS

Languages: Python, SQL, shell script, Java

Big Data Technologies: HDFS, Hive, Pig, Spark, Kafka, Sqoop, Flume, Kudu, Oozie, Apache Airflow

Java Technologies: Core Java, JSP, JDBC, Eclipse, JBoss

Cloud Technologies: Amazon AWS, AWS Deequ, AWS EMR, Amazon Redshift, AWS EC2, Azure Data Lake, Azure Data Factory, Azure Databricks, Azure SQL Database, Azure SQL Data Warehouse

NoSQL Databases: HBase, MongoDB 3.2, & Cassandra

IDEs & Tools: Eclipse, IntelliJ IDEA, PuTTY, Visual Studio, Jenkins

Databases: MySQL, Oracle, Teradata, DynamoDB

Operating Systems: Unix, Windows, Linux

Business Intelligence Tools: Tableau

Modelling Language: UML

PROFESSIONAL EXPERIENCE

Confidential, Mooresville, NC

Big Data Engineer

Responsibilities:

  • Worked as a Big Data Engineer on the Scorecard team to store, process, and manage the large volumes of data collected from various sources in day-to-day operations. Most of my work involved building data pipelines for the dashboards.
  • Built a data quality framework and designed the data quality architecture to run data quality rules using the AWS Deequ library; developed Spark/Scala code for the framework.
  • Configured different environments in Jenkins to execute test cases against hard or soft launches.
  • Set up databases in AWS using RDS and storage using S3 buckets, and configured instance backups to S3.
  • Created data pipelines to pull data from REST APIs into Hadoop and Hive.
  • Created Hive tables and wrote Hive queries for data analysis to meet business requirements; experienced in using Sqoop to import and export data from Oracle & MySQL.
  • Developed Python code to pull data from REST APIs, created Hive queries to transform data for further downstream processing, and worked with Kafka to move data from sources into Hadoop in both batch and streaming modes.
  • Used Sqoop to import data from DB2 into Hadoop.
  • Developed Spark/Scala code to download files from an AWS S3 bucket and load them into Hadoop.
  • Worked extensively on Oozie scheduler to automate the data pipeline process.
  • Performed data analysis on the data sources whenever a source changed, to determine which metrics needed to be recalculated.
  • Created views on top of metrics in Hive for the BI team to build cubes.
  • Worked on Postman to test sample data from the APIs.
  • Wrote Python code to download Splunk logs into CSV files.
  • Created a data pipeline for the Splunk log dashboards.
  • Good knowledge of the Splunk monitoring tool.
  • Performed aggregations on the Splunk data according to business requirements using Python.
  • Worked on Groovy scripts to deploy changes to edge nodes.
  • Worked on Spark SQL UDFs and Hive UDFs.
  • Extensively used Jenkins to create Hive tables and run shell commands in the QA and PROD environments.
  • Created a new repository in enterprise Jenkins and built a CI/CD pipeline to deploy code through Jenkins.
  • Deployed build and child jobs in Groovy to create CI/CD for the feature branches in the repository through Jenkins.
  • Worked closely with the data analyst and BI teams.
  • Developed Python code for the tasks, dependencies, SLA watchers, and time sensors of each job, for workflow management and automation with Apache Airflow (see the sketch after this list). Worked on migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse.
  • Developed notebooks in Spark/Scala using Azure Databricks to transform the data to business requirements.
  • Worked on Azure Data Lake Storage Gen2 (ADLS Gen2), copying files from an AWS S3 bucket using the Azure Data Factory (ADF) Copy activity.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
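
As a rough illustration of the Airflow work in this list, here is a minimal DAG sketch (Airflow 2.x style) with task dependencies, an SLA, and a time sensor; the DAG id, schedule, task names, and commands are hypothetical placeholders.

    # Hypothetical sketch of an Airflow DAG with dependencies, an SLA, and a time sensor.
    from datetime import datetime, time, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.sensors.time_sensor import TimeSensor

    default_args = {
        "owner": "data-eng",
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(hours=2),  # SLA watcher: a miss raises an SLA event
    }

    with DAG(
        dag_id="scorecard_pipeline",        # placeholder name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        default_args=default_args,
        catchup=False,
    ) as dag:
        # Time sensor: wait until upstream data is expected to have landed.
        wait_for_window = TimeSensor(task_id="wait_until_6am", target_time=time(6, 0))

        extract = BashOperator(task_id="extract", bash_command="echo extract")
        transform = BashOperator(task_id="transform", bash_command="echo transform")
        load = BashOperator(task_id="load", bash_command="echo load")

        # Dependencies: sensor gates the ETL chain.
        wait_for_window >> extract >> transform >> load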

Confidential, San Diego, CA

Big Data Engineer

Responsibilities:

  • Worked as a Big Data Engineer on the Risk Management team, where the bank wanted to store, process, and manage the large volumes of data collected from various sources in day-to-day operations. The system mainly checks the credibility of the customer and looks for credit risks.
  • Collaborated with various teams & management to understand the requirements and design the complete system.
  • Implemented the complete big data pipeline with real-time processing.
  • Worked with Cloudera 5.12.x and its different components.
  • Developed Crawlers, a Java ETL framework, to extract data from Cerner client databases and ingest it into HDFS & HBase for long-term storage.
  • Used the Spark 2.1.x API to stream data from various sources in real time.
  • Took care of performance and security across all the RESTful APIs.
  • Prepared the required RESTful API guide for the user interface developers; the HTML front end consumes these RESTful web services.
  • Developed Spark code in Scala using Spark SQL, EMR, Google Cloud, and DataFrames for aggregation.
  • Experience in monitoring and managing the Cassandra cluster.
  • Imported and exported data between an Oracle 10.2 database and HDFS using Sqoop.
  • Extracted files from the Cassandra NoSQL database through Sqoop and placed them in HDFS for processing.
  • Responsible for managing data coming from different sources.
  • Worked with Kafka for data collection.
  • Created schemas in Hive with performance optimization using bucketing & partitioning, and worked rigorously on Hive optimization (see the sketch after this list).
  • Worked with Impala 2.8.x for executing ad-hoc queries; worked with source control management tools such as GitHub to push and pull code from repositories.
  • Wrote Hive queries to transform data for further downstream processing.
  • Built a data reconciliation tool in the Spark environment.
  • Built and deployed Java applications into multiple Unix-based environments and produced both unit and functional test results along with release notes.
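
A minimal sketch of the kind of partitioned, bucketed Hive schema mentioned in this list, issued through Spark SQL; the database, table, column names, and bucket count are illustrative assumptions, not the actual schema.

    # Sketch: create a partitioned, bucketed Hive table through Spark SQL.
    # Database, table, and column names are illustrative only.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-schema")
             .enableHiveSupport()
             .getOrCreate())

    # Partition by load date to prune scans; bucket by customer_id to speed joins.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS risk.credit_events (
            customer_id BIGINT,
            event_type  STRING,
            amount      DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
    """)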

Confidential, San Diego, CA

Spark/Hadoop Developer

Responsibilities:

  • The primary aim of the project was to demonstrate how a consumption strategy can be developed on top of Hadoop. Key roles and responsibilities in the project were:
  • Involved in various projects related to data modeling, system/data analysis, design, and development for both OLTP and data warehousing environments.
  • Installed and set up a multi-node Cloudera cluster on the AWS cloud.
  • Created data pipelines for different events to load data from DynamoDB to an AWS S3 bucket and then into HDFS.
  • Used CloudWatch to monitor AWS cloud resources and the applications deployed on AWS, creating alarms, enabling notification services, and actively monitoring stats from all services in the AWS solutions.
  • Used AWS services like EC2 and S3 for small data sets.
  • Experienced in managing IAM users: creating new users, giving them limited access per need, and assigning roles and policies to specific users.
  • Installed and set up AtScale on top of the Hadoop cluster, using Hive and Impala as the SQL engines.
  • Used the AWS EMR distribution for Hadoop.
  • Implemented custom UDFs for Confidential Kudu.
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
  • Used JSON schema to define table and column mapping from S3 data to Redshift.
  • Developed cubes involving multiple facts and dimensions.
  • Developed calculations leveraging query datasets.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources (see the sketch after this list).
  • Performed infrastructure activities such as managing the AtScale database instance, backing up and upgrading instances, reviewing logs, and troubleshooting the Hadoop and AtScale environments.
  • Defined & managed aggregates.
  • Used the Model Mart of the Erwin data modeling tool for effective model management: sharing, dividing, and reusing model information and designs for productivity improvement.
  • Built a Kafka-Spark-Cassandra simulator in Scala for Met stream, a big data consultancy, along with Kafka-Spark-Cassandra prototypes.
  • Supported Workflow Notifications to receive feedback about workflow status changes and get alerted on failures. Workflows send notifications to SNS topics, one per stage, to which the Workflow Notifications service's SQS queue is subscribed during the deploy process. The service reads from the queue and publishes human-readable messages to Slack and SignalFx.
  • Ran many performance tests using the cassandra-stress tool to measure and improve the read and write performance of the cluster.
  • Used the Erwin data modeling tool for reverse engineering: connecting to the existing database and ODS to create graphical representations in the form of entity relationships and elicit more information.
  • Developed fine-grained access and data-level security for the region, branch, and main-branch hierarchy.
  • Conducted statistical analysis to validate data and interpretations using Python, presented research findings and status reports, and assisted with collecting user feedback to improve processes and tools.
  • Developed access patterns for data access from Excel and Tableau.
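
A small sketch of the Spark regex approach referenced in this list; the log format, pattern, and column names are invented for the example.

    # Sketch: extract fields from raw log lines with a regex in PySpark.
    # The log format and pattern here are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import regexp_extract

    spark = SparkSession.builder.appName("regex-etl").getOrCreate()

    logs = spark.createDataFrame(
        [("2018-07-01 12:01:02 INFO job=etl-42 status=ok",)], ["line"])

    pattern = r"^(\S+ \S+) (\w+) job=(\S+) status=(\w+)$"
    parsed = logs.select(
        regexp_extract("line", pattern, 1).alias("timestamp"),
        regexp_extract("line", pattern, 2).alias("level"),
        regexp_extract("line", pattern, 3).alias("job"),
        regexp_extract("line", pattern, 4).alias("status"),
    )
    parsed.show(truncate=False)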

Environment: 8-node cluster with AtScale on the edge node in the cloud, Cloudera CDH 5.15, AtScale 7.2, Tableau 2018.

Confidential, Dallas, TX

Hadoop Developer

Responsibilities:

  • Developed the framework for multiple data source ingestion capabilities; configured the framework with the required metadata for data ingestion for all the data sources.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Developed a mapping of data ingestion type to program (Sqoop, Pig, or Kafka).
  • Developed ingestion capabilities using Sqoop, Kafka, and Pig; leveraged Spark for data processing and transformation.
  • Developed the real-time / near-real-time framework using Kafka and Flume capabilities.
  • Developed a framework to decide on data formats like Parquet, Avro, and ORC.
  • Developed Spark code using Python and Spark SQL for faster processing and testing.
  • Developed merge jobs in Python to extract and load data into a MySQL database.
  • Worked on Python OpenStack APIs and used Python scripts to update content in the database and manipulate files.
  • Implemented code in Python to retrieve and manipulate data; created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data; created various types of data visualizations using Python and Tableau.
  • Worked on Spark SQL to join multiple Hive tables, write the result to a final Hive table, and store it on S3 (see the sketch after this list).
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
  • Performed querying of both managed and external tables created by Hive.
  • Used Flume to ingest streaming data into HDFS or Kafka topics, where it acted as a Kafka producer; multiple Flume agents were also used to collect data from multiple sources into a Flume collector.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive & Google Cloud.
  • Responsible for creating Hive tables, loading them with data, and writing Hive queries.
  • Set up Cloudera clusters and added nodes/hosts; worked with source control management tools such as GitHub to push and pull code from repositories.
  • Performed performance tuning of Cloudera clusters in terms of resource allocation across YARN, HDFS, and Impala.
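
A brief sketch of the Spark SQL join-and-persist pattern described in this list; the database, table, and bucket names are placeholders.

    # Sketch: join Hive tables with Spark SQL and persist the result to S3.
    # All object names below are placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-join")
             .enableHiveSupport()
             .getOrCreate())

    joined = spark.sql("""
        SELECT o.order_id, o.amount, c.segment
        FROM   sales.orders o
        JOIN   sales.customers c
          ON   o.customer_id = c.customer_id
    """)

    # Write to S3 as Parquet and register the result as the final Hive table.
    (joined.write
           .mode("overwrite")
           .option("path", "s3a://example-bucket/warehouse/final_orders")
           .saveAsTable("sales.final_orders"))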

Environment: Hadoop, Cloudera 5.15, Spark 2.1, HDFS, Python, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Java, Unix

Confidential

Java Developer

Responsibilities:

  • Designed the user interfaces using JSP.
  • Developed Custom tags, JSTL to support custom User Interfaces.
  • Developed the application using Struts (MVC) Framework.
  • Implemented business processes such as user authentication and account transfer using Session EJBs.
  • Used Eclipse to write the code for JSP, Servlets, Struts and EJBs.
  • Deployed the applications on the WebLogic Application Server.
  • Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Sqoop, Pig, Hive, Impala, and NoSQL databases; developed Hadoop data processes using Hive and/or Impala.
  • Worked on Postman to test sample data from the APIs.
  • Used the Java Message Service (JMS) and backend messaging for reliable and asynchronous exchange of important information such as payment status reports.
  • Developed the entire application through Eclipse.
  • Worked with the WebLogic Application Server to deploy the applications.
  • Developed the Ant scripts for preparing WAR files used to deploy J2EE components.
  • Used JDBC for database connectivity to Oracle.
  • Worked with the Oracle database to create tables, procedures, functions, and select statements.
  • Used JUnit for testing, debugging, and bug fixing.
  • Used Log4J to capture logs, including runtime exceptions, and developed a WAR framework to alert the client and production support in case of application failures.
  • Performed data-driven testing using Selenium and TestNG functions that read data from property and XML files; involved in the CI/CD process using Git, Jenkins job creation, and Maven build and publish.
  • Used Maven to build and run the Selenium automation framework; involved in building and deploying scripts using Maven to generate WAR, EAR, and JAR files.
  • Worked on importing data from HDFS to an Oracle database and vice versa using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for the Hive tables.
  • Imported and exported data from MySQL/Oracle to Hive using Sqoop.
  • Worked on deploying a Hadoop cluster with multiple nodes and different big data analytic tools including Pig, the HBase database, and Sqoop; gained good experience with NoSQL databases.

Environment: Java, J2EE, Eclipse, WebSphere Application Server, WebLogic 8.1, JSP, Servlets, Struts, Hibernate, HTML, XML, JAXP, JAX-RPC, AXIS, SOAP, AJAX, TOAD, JNDI, JMS, JDBC, JavaScript, Oracle 10g, MySQL Workbench, NoSQL, Jenkins, Maven, Ant, CVS, Log4J, JUnit, UNIX, Linux, UNIX Shell Scripting, Rational Unified Process (RUP).

Confidential 

Java Developer

Responsibilities:

  • Responsible and active in the analysis, design, implementation, and deployment phases of the full Software Development Life Cycle (SDLC) of the project.
  • Designed and developed user interface using JSP, HTML and JavaScript.
  • Defined the search criteria and pulled the customer's record from the database; made the required changes and saved the updated record back to the database.
  • Validated the fields of user registration screen and login screen by writing JavaScript and jQuery validations.
  • Used DAO and JDBC for database access.
  • Developed stored procedures and triggers using PL/SQL to calculate and update tables to implement business logic.
  • Designed and developed XML processing components for dynamic menus in the application.
  • Involved in postproduction support and maintenance of the application.
  • Involved in the analysis, design, implementation, and testing of the project modules.
  • Implemented the presentation layer with HTML, XHTML, and JavaScript.
  • Developed web components using JSP and JDBC.
  • Deployed the application to the JBoss Application Server.
  • Gathered requirements from various stakeholders of the project.
  • Estimated effort and timelines for development tasks.
  • Used J2EE and EJB to handle the business flow and functionality.
  • Implemented the database using SQL Server.
  • Designed tables and indexes.
  • Wrote complex SQL queries and stored procedures; worked with source control management tools such as GitHub to push and pull code from repositories.
  • Involved in fixing bugs and unit testing with test cases using JUnit.
  • Created user and technical documentation.

Environment: Java, Oracle, HTML, XML, SQL, J2EE, JUnit, JDBC, JSP, Tomcat, SQL Server, MongoDB, JavaScript, GitHub, SourceTree, NetBeans.
