Big Data Engineer/Developer Resume
OH
PROFESSIONAL SUMMARY:
- 7+ years of hands-on experience in the IT industry, including development experience with the Big Data/Hadoop ecosystem.
- Experience in installing, supporting, and managing Hadoop clusters on Cloudera and Hortonworks distributions and on Microsoft Azure Cloud.
- Good experience in processing unstructured, semi-structured, and structured data.
- Thorough understanding of HDFS, the MapReduce framework, the Spark framework, and MapReduce jobs.
- Experienced in building highly scalable big data solutions using Hadoop across multiple distributions (e.g., Hortonworks) and NoSQL platforms.
- Hands-on experience with major components of the Hadoop ecosystem, including Hive, HBase, Sqoop, Kafka, and Scala.
- Ability to move data between Hadoop, RDBMS, and NoSQL systems using Sqoop and other traditional data movement technologies.
- Good understanding of and experience with HBase schema design.
- Experience supporting Big Data solutions in production, including monitoring the cluster and the nightly batch job schedule.
- Worked with TDD, Git/GitHub for source control, and TeamCity for continuous integration, automating pipelines and Spark jobs for Maven projects.
- Knowledge of low-level design and use case diagrams.
- Experience with Hadoop distributions such as Cloudera, Hortonworks, and Azure, and with Hadoop application monitoring, debugging, and performance tuning.
- Hands-on experience in data warehouse design and in loading tables with large volumes of data at enterprise scale.
- Good understanding of Scrum and Agile methodologies, Test-Driven Development, and continuous integration.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Expertise in developing Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data, and in improving the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and YARN (a brief Spark SQL sketch follows this summary).
- Experienced in importing and exporting data with Sqoop between HDFS and relational databases, and in job workflow scheduling and monitoring tools such as Oozie and TWS (Tivoli Workload Scheduler).
- Involved in integrating applications with tools such as TeamCity, GitLab, Bitbucket, and JIRA for issue and story tracking.
- Ability to work independently or in a group with minimal supervision to meet deadlines.
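A minimal Spark SQL sketch in Scala of the kind of aggregation work described above, assuming Spark 2.x with Hive support; the table names, column names, and output path are hypothetical placeholders rather than details from any project listed below.

```scala
// Minimal Spark SQL aggregation sketch (assumed Spark 2.x; placeholder names).
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-aggregation")
      .enableHiveSupport()            // lets Spark read/write Hive tables
      .getOrCreate()

    // Read a Hive table (placeholder name) into a DataFrame
    val txns = spark.table("staging.transactions")

    // Aggregate with Spark SQL functions instead of hand-written MapReduce
    val daily = txns
      .filter(col("amount") > 0)
      .groupBy(col("account_id"), col("txn_date"))
      .agg(sum("amount").alias("total_amount"), count("*").alias("txn_count"))

    // Persist the result as Parquet for downstream consumers
    daily.write.mode("overwrite").parquet("/data/curated/daily_transactions")

    spark.stop()
  }
}
```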
TECHNICAL SKILLS:
Programming Languages: Java, C, C++, SQL, PL/SQL, UML, Python, Scala
Big Data Ecosystem: HDFS, MapReduce, YARN, Hive, Spark, Spark SQL, HBase, Sqoop, ZooKeeper, Oozie, TWS
Tools: Maven, log4j, SVN, CVS, Git, DB Visualizer, TeamCity, TWS (Tivoli Workload Scheduler), PuTTY, WinSCP, Hortonworks
IDEs: Eclipse, MyEclipse, IntelliJ
RDBMS: Oracle, MySQL, DB2
Methodologies: Agile Scrum, Waterfall model
Operating Systems: Windows, Linux/Unix.
NoSQL Databases: HBase
WORK EXPERIENCE:
Confidential, OH
Big Data Engineer/Developer
Responsibilities:
- Developed Spark 2.1/2.4 Scala components to process the business logic and store computation results for 10 TB of data in HBase, accessed by downstream web apps through the IBM Big SQL (DB2) database in a Hortonworks Apache Hadoop environment (see the Spark-to-HBase sketch at the end of this role).
- Worked on submitting Spark jobs to the cluster with up to 250 cores, on a single node as well as multiple nodes.
- Developed a good understanding of the architectural designs and worked closely with business analysts to gather the business information needed to speed up challenging processes.
- Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to move data from on-premises systems to the Azure cloud; created activities for data ingestion using Azure Data Factory.
- Used ADLS Gen2 on top of Blob Storage for storing the data and explored Azure Synapse for querying the data in some use cases.
- Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the Cosmos activity.
- Used the Spark UI to observe submitted Spark jobs at the node level.
- Used PuTTY to run Spark SQL commands and Spark jobs and tuned the code according to job performance on the cluster.
- Tested the developed application modules using the JUnit library and testing framework.
- Analyzed structured, unstructured, and file system data, loaded it into HBase tables per project requirements using IBM Big SQL with Sqoop, processed it using Spark SQL in-memory computation, and published the results to Hive and HBase.
- Handled importing other enterprise data from different sources into HDFS using JDBC and Big SQL Load Hadoop, and applied Spark transformations and actions on the fly to build the common learner data model, which receives upstream data in near real time and persists it into HBase.
- Worked with different Hive file formats (text, CSV, SequenceFile, Parquet) to analyze the data and build the data model, reading Parquet files from HDFS, processing them, and loading the results into HBase tables.
- Developed batch jobs in Scala to process data from files and tables, transform it with the business logic, and deliver it to the user.
- Worked on the Continuous Deployment module, used to create new tables or update existing table structures in different environments, along with DDL (Data Definition Language) creation for the tables.
- Loaded data from the Linux/Unix file system to HDFS and used PuTTY for communication between Unix and Windows systems and for accessing data files in the Hadoop environment.
- Involved in Spark tuning to improve job performance based on metrics from the Pepper Data monitoring tool.
- Developed and implemented a custom Spark ETL component to extract data from upstream systems, push it to HDFS, and store it in HBase in a wide-row format.
- Enhanced the application with new features and improved performance across all modules, exploring and applying Spark techniques such as partitioning the data by keys and writing it to Parquet files.
- Worked with continuous integration tools such as Maven, TeamCity, and IntelliJ, and scheduled jobs with the TWS (Tivoli Workload Scheduler) tool.
- Created and cloned jobs and job streams in TWS, promoted them to higher environments, and monitored the nightly batch schedule of Hadoop jobs on the cluster, along with production support.
- Coordinated with co-developers, the agile development and project management teams, and external systems; responsible for demos and presentations of developed modules to the project management team.
Technical Environment: Scala, Spark framework, Linux, Jira, Bitbucket, IBM Big SQL, Hive, HBase, IntelliJ IDEA, Maven, DB Visualizer, ETL, TeamCity, WinSCP, PuTTY, IBM TWS (Tivoli Workload Scheduler), Windows, Microsoft Azure, Azure Data Factory, Azure Blob Storage, Azure SQL Data Warehouse
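As referenced in the first bullet of this role, a hedged sketch of the Parquet-to-HBase pattern, assuming Spark 2.x and the standard HBase Java client on the classpath; the table name, column family, paths, and columns are illustrative placeholders, not actual project details.

```scala
// Sketch of reading Parquet from HDFS, transforming, and writing to HBase
// (assumed Spark 2.x + HBase Java client; all names are placeholders).
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ParquetToHBase {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("parquet-to-hbase").getOrCreate()

    // Read the upstream extract landed on HDFS as Parquet
    val events = spark.read.parquet("/data/landing/learner_events")

    // Placeholder business transformation: filter and derive an HBase row key
    val model = events
      .filter(col("event_type") === "ENROLLMENT")
      .withColumn("row_key", concat_ws("|", col("learner_id"), col("course_id")))

    // Write each partition through the plain HBase Java client (wide-row layout)
    model.rdd.foreachPartition { rows =>
      val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = conn.getTable(TableName.valueOf("learner_model"))
      rows.foreach { r =>
        val put = new Put(Bytes.toBytes(r.getAs[String]("row_key")))
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"),
          Bytes.toBytes(r.getAs[String]("event_type")))
        table.put(put)
      }
      table.close()
      conn.close()
    }

    spark.stop()
  }
}
```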
Confidential, Parsippany NJ
Big Data Engineer/Developer
Responsibilities:
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data to HDFS using Python (see the streaming sketch at the end of this role).
- Carried out various POCs and proposed the pros and cons of different approaches.
- Used Kafka to produce a centralized feed of pet data metrics from different inputs, persisting the data to disk via Kafka and processing it as required.
- Developed solutions to process data into HDFS and analyzed the data using MapReduce and Hive to produce summary results from Hadoop for downstream systems.
- Developed Scala scripts and UDFs using PySpark, DataFrames/SQL, and RDDs in Spark 1.3 for data aggregation and queries, writing data back into the OLTP system directly or through Sqoop.
- Wrote Spark scripts to improve the performance and optimization of existing Hadoop algorithms using Spark Context/Session, Spark SQL, DataFrames, and pair RDDs.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzing them with Hive queries; created managed and external tables in Hive and loaded data from HDFS.
- Scheduled several time-based Oozie workflows by developing Python scripts.
- Designed the ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, and MySQL.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as MapReduce, Hive, and Sqoop.
- Worked in Spark and Spark Streaming creating RDDs and applying transformations and actions; created partitioned tables in Hive and loaded data using both static and dynamic partitioning.
- Launched EC2 instances and created security groups, auto scaling, load balancers, Route 53, SES, and SNS within the defined virtual private cloud.
- Designed and provisioned the platform architecture to execute Hadoop and machine learning use cases on AWS cloud infrastructure (EMR and S3).
- Used Kafka for publish-subscribe messaging as a distributed commit log, gaining experience with its speed, scalability, and durability.
- Completed data extraction, aggregation, and analysis in HDFS using PySpark, stored the required data in Hive, and followed a Test-Driven Development (TDD) process with extensive use of Agile/Scrum methodology.
- Researched, evaluated, and utilized new technologies, tools, and frameworks around the Hadoop ecosystem, and improved performance by tuning Hive.
Environment: HDFS, MapReduce, Python, Hive, Sqoop, Oozie Scheduler, PySpark, Shell Scripts, Oracle, HBase, Cloudera, Kafka, Spark, Scala, and ETL.
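This role used PySpark on Spark 1.3; the sketch below (referenced in the first bullet) shows an equivalent Kafka-to-HDFS streaming flow written in Scala against the newer spark-streaming-kafka-0-10 integration, with placeholder broker, topic, and path names.

```scala
// Scala equivalent of the Kafka -> Spark Streaming -> HDFS flow described above
// (assumed spark-streaming-kafka-0-10; broker/topic/path names are placeholders).
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "pet-metrics-consumer",
      "auto.offset.reset"  -> "latest"
    )

    // Subscribe to the (placeholder) metrics topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("pet-metrics"), kafkaParams)
    )

    // Persist each micro-batch of message values to HDFS as text files
    stream.map(_.value).saveAsTextFiles("hdfs:///data/raw/pet_metrics/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```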
Confidential, OH
Spark/Scala Developer
Responsibilities:
- Worked on loading disparate data sets from different sources into the Hadoop environment using Spark.
- Maintained data in the data lake (ETL) sourced from the Teradata database.
- Responsible for creating Hive tables to load data coming from MySQL via Sqoop.
- Performed join operations in Spark against Hive tables and wrote HQL statements per the user requirements (see the Hive-join sketch at the end of this role).
- Explored Spark for improving the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Used the DataFrame API to pre-process large sets of structured data in different file formats (text, CSV, SequenceFile, and Parquet), converting distributed collections of data into named columns.
- Worked extensively with Hive to create, alter, and drop tables, and was involved in writing Hive queries.
- Created views from Hive tables on top of data residing in the data lake. Worked closely with the scrum master and team to gather information and perform daily activities.
- Deployed a 32-node cluster on YARN and later upgraded it to a 60-node cluster for the initial work of coding and testing the code.
- Implemented the business rules in Spark/Scala to capture the business logic and developed code from scratch in Spark using Scala according to the technical requirements.
- Used the Spark UI to observe submitted Spark jobs at the node level.
- Used PuTTY to run Spark SQL commands and Spark jobs and tuned the code according to job performance on the cluster.
- Used Microsoft Visio to represent the complex working structure in diagrams.
- Used WinSCP to view the data storage structure on the server and to upload JARs used for spark-submit.
Environment: Hadoop, HDFS, Hive, Spark 1.6, SQL, HBase, UNIX Shell Scripting, MapReduce, PuTTY, WinSCP, IntelliJ, Linux.
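As referenced in the join bullet of this role, a minimal sketch of joining Sqoop-loaded Hive tables with an HQL statement, written against the Spark 1.6 HiveContext listed in the environment; the database, table, and path names are illustrative placeholders.

```scala
// Sketch of a Spark 1.6 Hive join via HiveContext (placeholder names throughout).
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveJoinJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-join-job"))
    val hc = new HiveContext(sc)

    // Join the Sqoop-loaded Hive tables with an HQL statement
    val enriched = hc.sql(
      """SELECT o.order_id, o.order_date, c.customer_name, o.amount
        |FROM   sales.orders o
        |JOIN   sales.customers c ON o.customer_id = c.customer_id
        |WHERE  o.order_date >= '2017-01-01'""".stripMargin)

    // Write the joined result back to the data lake as Parquet
    enriched.write.mode("overwrite").parquet("/datalake/curated/orders_enriched")

    sc.stop()
  }
}
```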
Confidential
Java Developer
Responsibilities:
- Worked on Agile Methodology (Scrum) to meet timelines with quality deliverables.
- Developed web services, tested them with SoapUI, and deployed them on the WebLogic application server.
- Developed and consumed RESTful web services and used SoapUI to test web service request/response scenarios.
- Deployed web services, JSPs, Servlets, and server-side components on the Tomcat application server and consumed SOAP services developed using JAX-WS.
- Utilized Log4j for request/response/error logging.
- Integrated REST URLs on the client side and developed REST (JAX-RS) and SOAP (JAX-WS) APIs according to client requirements.
- Implemented the DAO layer using JPA (Hibernate framework) to interact with the IBM database.
- Developed applications using Spring Boot.
- Migrated the legacy monolithic Spring application to a microservices platform.
- Used the Spring MVC framework with JDBC to develop the entire business logic of the system.
- Used Oracle SQL Developer to implement the database and store information such as text, taxId, memberId, groupId, SSN, etc.
- Incorporated JDBC API to create, retrieve and update data from the database.
- Extensively wrote PL/SQL queries, triggers and stored functions to manipulate data stored within the database.
- Worked on the user interface layer for the entire application using HTML, JSP, jQuery, React JS, and Angular JS.
- Used the Spring MVC architecture to separate the application into models, views, and controllers.
- Designed and developed the application to be responsive using the Bootstrap CSS framework. Worked on creating login authentication modules using JavaScript.
- Used Jira for task management and story estimation, and Maven as the build tool for the application.
- Performed unit testing using the JUnit test framework.
Technical Environment: Java, J2EE, Spring, Spring MVC, Servlets, JavaScript, Custom Tags, JDBC, XML, JAX-RS, JAX-WS, Oracle, Hibernate, WebLogic Application Server, IntelliJ IDE, Log4j, SoapUI, Apache HTTP Server, jQuery, React JS, Bootstrap 3, Windows.
Confidential
Software Engineer
Responsibilities:
- Analyzed requirements and created detailed Technical Design Document.
- Analyzed functional specifications and reviewed changes.
- Used XML to create data transfer logic from other formats into XML files for the billing module.
- Used the Oracle database to design the database schema and create the database structure, tables, and relationship diagrams.
- Used WebSphere 4.0 as the application server.
- Developed JSPs for the front end and Servlets and session beans in the middle tier.
- Wrote test cases for the Payment module.
- Designed and developed the DCB and data transmission modules.
- Migrated hardcoded account numbers to the database.
Technical Environment: Java, J2EE, JSP, Servlets, JavaScript, Custom Tags, JDBC, XML, JAXB, Oracle, Sybase, WebSphere 4.0 Application Server, Log4j, VSS, Windows NT