Software Engineer/Big Data Resume
Durham, NC
PROFESSIONAL SUMMARY:
- 8 years of professional IT experience in analyzing requirements and designing and building highly distributed, mission-critical products and applications.
- Highly dedicated and results-oriented Hadoop Developer with 3+ years of strong end-to-end experience in Hadoop development, with varying levels of expertise across different Big Data environments and projects.
- Expertise in core Hadoop and Hadoop technology stack which includes HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Flume, HBase, Spark, Kafka, and Zookeeper.
- Extensive knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB and their integration with Hadoop clusters.
- Created a 3-node NiFi cluster configuration for an existing process and built the end-to-end dataflow.
- Worked on the ELK stack (Elasticsearch, Logstash, Kibana) and created a 5-node cluster to store data.
- Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
- Developed a fan-out workflow using Flume and Kafka to ingest data from various data sources such as web servers and REST APIs using network sources, and loaded the data into Hadoop with the HDFS sink.
- Good Experience in importing and exporting data between HDFS and RDBMS using Sqoop.
- Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
- Experienced in working with structured data using Hive QL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
- Involved in creating custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HiveQL.
- Excellent working knowledge of Spark Core, Spark SQL, and Spark Streaming.
- Planned and created pipelines for real-time data ingestion using Kafka, Spark Streaming, and different NoSQL databases (see the streaming sketch at the end of this list).
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and loaded data into HDFS through Sqoop.
- Good experience on Zookeeper for coordinating cluster resources and Oozie for scheduling.
- Experienced with full-text search; implemented data querying with faceted search using Solr.
- Experienced in working with monitoring tools to check cluster status using Cloudera Manager.
- Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
- Extensive experience with SQL, PL/SQL, Oracle, and DB2 database concepts.
- Implemented data quality in the ETL tool Talend and have good knowledge of data warehousing and ETL tools such as IBM DataStage, Informatica, and Talend.
- Experienced in backend development using SQL, stored procedures on Oracle 10g and 11g.
- Good working knowledge of multithreaded Core Java, J2EE, JDBC, jQuery, JavaScript, and Web Services (SOAP, REST).
- Good understanding in content publishing/deployment, end-to-end content lifecycle, delivery processes and web content management.
- Good problem-solving and analytical skills; eager to learn new technical skills.
- Experience in working with Onsite-Offshore model.
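A minimal sketch of the kind of Kafka-to-HDFS streaming ingestion described above, written with PySpark Structured Streaming (one of the Spark APIs listed in this summary). The broker address, topic name, event schema, and HDFS paths are hypothetical placeholders, and the spark-sql-kafka connector is assumed to be on the classpath; the actual pipelines used Spark Streaming with various NoSQL sinks, so treat this only as an illustration of the general shape of the code.

# Illustrative sketch only: broker, topic, schema, and paths below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

# Assumed JSON event layout; adjust to the real message schema.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("payload", StringType()),
    StructField("event_time", TimestampType()),
])

# Read the raw Kafka stream (requires the spark-sql-kafka connector package).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
       .option("subscribe", "events")                        # hypothetical topic
       .load())

# Kafka values arrive as bytes: cast to string and parse the JSON payload.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

# Persist the parsed stream to HDFS as Parquet with checkpointing for fault tolerance.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")              # hypothetical output path
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()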
TECHNICAL SKILLS:
Hadoop Distribution: Hortonworks, Cloudera, Apache, EMR
Hadoop Data Services: Hive, Pig, Sqoop, Flume, Spark, Kafka, NiFi
Hadoop Operational Services: Zookeeper, Oozie
NoSQL Databases: HBase, Cassandra, MongoDB, Elasticsearch
Cloud Services: Amazon AWS
Languages: SQL, PL/SQL, Pig Latin, HiveQL, Unix Shell Scripting, Java, Python, Scala
ETL Tools: Informatica, IBM DataStage, Talend
Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, JSP, JDBC
Application Servers: WebLogic, WebSphere, Tomcat
Databases: Oracle, MySQL, DB2, Teradata, MS SQL Server
Operating Systems: UNIX, Windows, iOS, LINUX
WORK EXPERIENCE:
Confidential, Durham, NC
Software Engineer/Big Data
Responsibilities:
- Analyzed the business requirements and transformed them into detailed design specifications.
- Created a 5-node Elasticsearch cluster with Logstash and Kibana to store Q&A pairs.
- Used a Logstash pipeline to transfer data from PostgreSQL to Elasticsearch (see the sketch at the end of this section).
- Performed data visualization and monitored the complete data set using Kibana.
- Did a POC on NiFi to design an end-to-end dataflow through NiFi for the existing process.
- Created a NiFi cluster manager configuration with 3 nodes using Zookeeper.
- Performed FlowFile operations in NiFi using different processors.
- Ingested data from various data sources into Hadoop HDFS/Hive tables and managed data pipelines that provide data to business users and data scientists for analytics.
- Performed analysis and tuning (partitioning and bucketing) of Hive queries to improve time efficiency by 15% for monthly processing.
- Working on a proof-of-concept analysis to understand the performance benefits of replacing the current Hive metastore architecture with Apache Kudu as the underlying data store.
- Analyzed the data processing from the backend server to the PostgreSQL database.
- Worked on RESTful web services to create APIs for the Chat Bot.
- Worked on designing and documenting the RESTful web services with the help of Swagger.
- Involved in Big Data Nirvana brainstorming for future projects.
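The PostgreSQL-to-Elasticsearch transfer above was done with a Logstash pipeline; as a rough Python equivalent of the same flow, the sketch below reads Q&A rows with psycopg2 and bulk-indexes them with the official Elasticsearch client. The connection settings, the qa_pairs table, and the index name are hypothetical placeholders.

import psycopg2
from elasticsearch import Elasticsearch, helpers

# Hypothetical connection settings, source table, and index name.
PG_DSN = "dbname=chatbot user=etl host=pg-host"
ES_HOSTS = ["http://es-node1:9200"]
INDEX = "qa_pairs"

def fetch_qa_pairs():
    """Stream question/answer rows out of PostgreSQL as bulk actions."""
    with psycopg2.connect(PG_DSN) as conn, conn.cursor() as cur:
        cur.execute("SELECT id, question, answer FROM qa_pairs")
        for row_id, question, answer in cur:
            yield {
                "_index": INDEX,
                "_id": row_id,
                "_source": {"question": question, "answer": answer},
            }

def main():
    es = Elasticsearch(ES_HOSTS)
    # Bulk-index the documents so they can be searched and visualized in Kibana.
    helpers.bulk(es, fetch_qa_pairs())

if __name__ == "__main__":
    main()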
Confidential, Sterling, VA
Spark/Hadoop Developer
Responsibilities:
- Extensively migrated the existing architecture to Spark Streaming to process live streaming data.
- Responsible for Spark Core configuration based on the type of input source.
- Executed Spark code written in Scala for Spark Streaming/SQL for faster data processing.
- Performed SQL joins among Hive tables to get input for the Spark batch process.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Developed Kafka producers and configured brokers for message handling.
- Involved in importing the real-time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
- Used the AWS CLI for data transfers to and from Amazon S3 buckets.
- Executed Hadoop/Spark jobs on AWS EMR using programs and data stored in S3 buckets.
- Experienced in pulling data from Amazon S3 buckets into the data lake, building Hive tables on top of it, and creating DataFrames in Spark to perform further analysis.
- Implemented Spark RDD transformations and actions to carry out business analysis.
- Used Amazon CloudWatch to monitor and track resources on AWS.
- Worked on data serialization formats for converting complex objects into sequences of bytes using the Avro, Parquet, JSON, and CSV formats.
- Applied various transformations and actions to the RDDs and DataFrames and stored the results in HDFS as Parquet files.
- Developed Spark scripts using Scala shell commands as per the requirements.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra.
- Implemented CRUD operations using CQL on top of the Cassandra file system (see the sketch below).
- Worked on analyzing and examining customer behavioral data using Cassandra.
- Set up SolrCloud for distributed indexing and search.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Involved in daily Scrum meetings to discuss development progress and was active in making the meetings more productive.
Environment: Cassandra, Kafka, Spark, Pig, Hive, Oozie, AWS, Solr, SQL, Scala, Python, Java, FileZilla, PuTTY, IntelliJ, GitHub.
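A minimal sketch of the CQL CRUD operations mentioned above, using the DataStax Python driver (the project also worked with Scala and Java APIs). The contact point, keyspace, table, and columns are hypothetical; assume user_id is the partition key.

from cassandra.cluster import Cluster

# Hypothetical contact point, keyspace, and table.
cluster = Cluster(["cassandra-node1"])
session = cluster.connect("learner_ks")

# Create: insert a learner-profile row.
session.execute(
    "INSERT INTO learner_profile (user_id, course, score) VALUES (%s, %s, %s)",
    ("u123", "spark-101", 92),
)

# Read: fetch the row back by its partition key.
row = session.execute(
    "SELECT user_id, course, score FROM learner_profile WHERE user_id = %s",
    ("u123",),
).one()
print(row)

# Update: overwrite a single column.
session.execute(
    "UPDATE learner_profile SET score = %s WHERE user_id = %s",
    (95, "u123"),
)

# Delete: remove the row.
session.execute("DELETE FROM learner_profile WHERE user_id = %s", ("u123",))

cluster.shutdown()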
Confidential, Nashville, TN
Hadoop Developer
Responsibilities:
- Developed an Apache Flume client to send data as events to the Flume server and store them in files and HDFS.
- Developed Flume configurations to extract log data from different sources and transfer data in different file formats (JSON, XML, Parquet) to Hive tables using different SerDes.
- Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
- Involved in developing Hive DDLs to create, alter and drop Hive tables.
- Enabled concurrent access to Hive tables with shared and exclusive locking, supported by the Zookeeper implementation in the cluster.
- Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Developed the Sqoop scripts to make the interaction between HDFS and RDBMS (Oracle, MySQL).
- Planned, deployed, monitored, and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes and VMware as required in the environment.
- Stored and loaded data from HDFS to Amazon S3 and backed up the namespace data to NFS.
- Implemented NameNode backup using NFS for high availability.
- Used Pig to perform data validation on the data ingested using Sqoop and Flume, and pushed the cleansed data set into MongoDB.
- Designed and implemented the MongoDB schema.
- Wrote services to store and retrieve user data from MongoDB for the application on devices (see the sketch below).
- Used the Mongoose API to access MongoDB from Node.js.
- Created and implemented business validation and coverage Price Gap rules on Hive using the Talend tool.
- Involved in development of Talend components to validate the data quality across different data sources.
- Automated and scheduled the rules on a weekly and monthly basis in TAC (Talend Administration Center).
- Wrote shell scripts to automate rolling day-to-day processes.
- Wrote shell scripts to monitor Hadoop daemon services and respond accordingly to any warning or failure conditions.
Environment: Apache Flume, Hive, Pig, HDFS, Zookeeper, Sqoop, RDBMS, AWS, MongoDB, Talend, Shell Scripts, Eclipse, WinSCP.
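The device-facing services above accessed MongoDB through the Mongoose API in Node.js; the sketch below shows the equivalent store/retrieve pattern with pymongo, to keep the examples in one language. The connection string, database, collection, and document fields are hypothetical.

from pymongo import MongoClient

# Hypothetical connection string, database, and collection names.
client = MongoClient("mongodb://mongo-host:27017")
users = client["app_db"]["user_profiles"]

def save_user(user_doc):
    """Upsert a user document keyed by user_id."""
    users.update_one(
        {"user_id": user_doc["user_id"]},
        {"$set": user_doc},
        upsert=True,
    )

def get_user(user_id):
    """Fetch a single user document, or None if it does not exist."""
    return users.find_one({"user_id": user_id}, {"_id": 0})

if __name__ == "__main__":
    save_user({"user_id": "u42", "device": "mobile", "preferences": {"lang": "en"}})
    print(get_user("u42"))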
Confidential, Schaumburg, IL
Big Data Engineer
Responsibilities:
- Developed MapReduce/ EMR jobs to analyze the data and provide heuristics and reports. The heuristics were used for improving campaign targeting and efficiency.
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed simple to complex MapReduce jobs implemented using Hive and Pig.
- Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior, and used UDFs to implement business logic in Hadoop.
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
- Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX and NoSQL sources.
- Supported HBase architecture design with the Hadoop architect group to build up a database design in HDFS.
- Experience in creating, dropping, and altering HBase tables at run time without blocking updates and queries (see the sketch below).
- Wrote Flume configuration files for importing streaming log data into HBase with Flume.
- Utilized the Apache Hadoop environment through the Cloudera distribution.
- Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
- Used Oozie workflows and enabled email alerts for any failure cases.
- Managed and reviewed Hadoop log files to identify issues when jobs fail.
Environment: MapReduce, Hive, Pig, Sqoop, Oozie, HBase, Flume, Cloudera, HDFS, RDBMS, Java, Eclipse, VMware, mainframes, WinSCP, PuTTY.
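A minimal sketch of creating and using an HBase table from Python via the happybase client, which talks to HBase through the Thrift gateway; in the project the streaming log data was loaded by Flume, so this only illustrates the table and row access pattern. The host, table name, column family, and row-key scheme are hypothetical.

import happybase

# Hypothetical Thrift gateway host, table, and column family.
connection = happybase.Connection("hbase-thrift-host")

# Create the table once if it does not already exist.
if b"web_logs" not in connection.tables():
    connection.create_table("web_logs", {"log": dict(max_versions=3)})

table = connection.table("web_logs")

# Write a log event; the row key combines a timestamp and the source host.
table.put(b"20160101120000-web01", {
    b"log:level": b"WARN",
    b"log:message": b"slow response from upstream service",
})

# Scan a row-key prefix to pull back events for a given time window.
for key, data in table.scan(row_prefix=b"20160101"):
    print(key, data)

connection.close()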
Confidential, Santa Clara, CA
Hadoop Developer
Responsibilities:
- Involved in the complete Software Development Life Cycle (SDLC) to develop the application.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Sqoop, Zookeeper.
- Involved in loading data from Linux file system to HDFS.
- Exported the analyzed data to the relational database using Sqoop for visualization and to generate reports for the BI team.
- Imported and exported data between SQL Server and HDFS/Hive using Sqoop.
- Developed multiple MapReduce jobs in Java for data cleaning.
- Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Supported MapReduce programs running on the cluster.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Implemented frameworks using Java and Python to automate the ingestion flow (see the sketch below).
- Worked on tuning the performance of Pig queries.
- Mentored the analyst and test teams in writing Hive queries.
- Used Oozie workflow engine to run multiple MapReduce jobs.
- Worked on Zookeeper for coordination between the master node and data nodes.
Environment: HDFS, MapReduce, Hive, Pig, Sqoop, Linux, Java, Python, Oozie, Zookeeper, SQL Server.
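A minimal sketch of how the Python side of the ingestion framework above might wrap a Sqoop import; the JDBC URL, credentials file, table, and target directory are hypothetical placeholders that would normally come from configuration.

import subprocess

# Hypothetical JDBC URL, credentials file, table, and target directory.
SQOOP_IMPORT = [
    "sqoop", "import",
    "--connect", "jdbc:sqlserver://db-host:1433;databaseName=sales",
    "--username", "etl_user",
    "--password-file", "/user/etl/.sqoop_pwd",
    "--table", "orders",
    "--target-dir", "/data/raw/orders",
    "--num-mappers", "4",
]

def run_import():
    """Run the Sqoop import and fail loudly on a non-zero exit code."""
    result = subprocess.run(SQOOP_IMPORT, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError("Sqoop import failed:\n" + result.stderr)
    return result.stdout

if __name__ == "__main__":
    print(run_import())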
Confidential
Software Developer
Responsibilities:
- Involved in End to End Design and Development of UI Layer, Service Layer, and Persistence Layer.
- Implemented Spring MVC for designing and implementing the UI Layer for the application.
- Implemented UI screens using JSF for defining and executing UI flow in the application for the Order Guide module.
- Used AJAX to retrieve data from the server asynchronously in the background, without interfering with the display of the existing page, in an interactive way.
- Used DWR (Direct Web Remoting) generated scripts to make AJAX calls to Java.
- Involved in writing JavaScript for dynamic manipulation of the elements on the screen and to validate the input.
- Used the pair programming model for development.
- Involved in writing Spring Validator Classes for validating the input data.
- Set up Acegi Security for the application using the Spring framework.
- Used JAXB to marshal and unmarshal Java objects to communicate with the backend mainframe system.
- Involved in writing complex PL/SQL and SQL blocks for the application.
- Worked on persistence layer using O/R Mapping Tool Hibernate with Oracle 10g Database.
- Provided expertise for performance optimizations on the end-to-end solution, implemented performance enhancements to database interaction objects.
- Used the Log4j package for debug, info, and error tracing.
Environment: MVC architecture, Spring IoC, Hibernate/JPA, Java Web Services (REST), Spring Batch, XML, JavaScript, Oracle Database (10g), JSF UI, JUnit.
Confidential
Java Developer
Responsibilities:
- Used Servlets, Struts, JSP, and JavaBeans for developing the Performance module on top of legacy code.
- Involved in all activities of the EMI (Repayments), Cheque Bounce, and Deposits modules.
- Involved in coding for JSP pages, Form Beans, and Action Classes in Struts.
- Created Custom Tag Libraries to support the Struts framework.
- Involved in Writing Validations.
- Involved in Database Connectivity through JDBC.
- Involved in Writing DAO's.
- Developed JUnit Test cases and performed application testing for QC team.
- Used JavaScript for client-side validations.
- Participated in weekly project meetings and updates, and provided estimates for assigned tasks.
Environment: Java, J2EE, Struts, JSP, JDBC, ANT, XML, IBM WebSphere, WSAD, JUnit, DB2, Rational Rose, CVS, SOAP, and RUP.