Software Developer / Big Data Resume
Malvern, PA
SUMMARY
- 8+ years of experience in software development, including 3+ years in Big Data/Hadoop covering analysis, design and development. Passionate about working with Hadoop and Big Data technologies, Big Data processing, J2EE application development, analytics and visualization.
- In-depth knowledge of object-oriented programming (OOP) and object-oriented features such as inheritance, polymorphism, exception handling and templates, with development experience in Java technologies and Unix scripting.
- Around 4 years of working experience in setting up, configuring and monitoring Hadoop clusters on Cloudera and Hortonworks distributions, and Spark.
- Good experience in system monitoring, development and support related activities for Hadoop and Hadoop Admin, Java/J2EE Technologies and Spark Streaming, Spark SQL, AWS RDS.
- Experience in working with Apache Hadoop ecosystem components like MapReduce, HDFS, Hive, Pig, HBase, Flume, Sqoop and Oozie.
- Good understanding of Hadoop MR1 and MR2 (YARN) architecture.
- Experience with Hadoop distributions like Apache, Cloudera (CDH) and Hortonworks.
- Excellent understanding of Hadoop cluster building and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode and the HDFS framework.
- Extensive experience in using Flume to transfer log data files to Hadoop Distributed File System.
- Good knowledge of Apache Spark data processing to handle data from RDBMS and streaming sources with Spark Streaming.
- Worked with Kafka streaming to fetch data in real time or near real time.
- Good knowledge of Spark SQL queries to load tables into HDFS and run select queries on top of them.
- Experience in developing HiveQL scripts for data analysis and ETL purposes, and extended the default functionality by writing User Defined Functions (UDFs) for data-specific processing.
- Good experience working with different Hadoop file formats like SequenceFile, JSON, ORC, Avro and Parquet.
- Experience with migrating data to and from RDBMS and unstructured sources into HDFS using Sqoop.
- Tested various Flume agents and data ingestion into HDFS, and retrieved and validated Snappy files.
- Experience in NoSQL databases, including column-oriented stores like Cassandra and HBase as well as MongoDB, and their integration with Hadoop clusters.
- Very good understanding of the Cassandra cluster mechanism, including replication strategies, snitch, gossip, consistent hashing and consistency levels.
- Experience in job workflow scheduling and monitoring tools like Oozie.
- Experience with Linux/Unix shell scripting.
- Good knowledge of importing refined data from HDFS into Tableau for data visualization and reporting.
- Good skills in writing Spark jobs in Scala for processing large sets of structured and semi-structured data and storing them in HDFS.
- Good knowledge of Spark SQL and Spark using Python (PySpark).
- Strong work ethic with desire to succeed and make significant contributions to the organization.
- Strong problem-solving, communication and interpersonal skills; a good team player.
- Motivated to take independent responsibility, with the ability to contribute as a productive team member.
- Working knowledge of Teradata.
- Good knowledge of Autosys.
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, YARN, Flume, HBase, Impala, Cassandra, Oozie, Zookeeper, Spark, PySpark, Scala, Git.
Programming Languages: Java (JDK 1.4/1.5/1.6, JDK 7/8), C/C++, R, HTML, SQL, PL/SQL, Python.
Operating Systems: UNIX, Linux, Windows.
Application Servers: IBM WebSphere, Tomcat, WebLogic.
Web technologies: XHTML, JavaScript, AngularJS, AJAX, HTML, XML, CSS, DOM, jQuery, AWS.
Databases: Oracle 8i/9i/10g, Microsoft SQL Server, Teradata, MySQL 4.x/5.x, MongoDB (NoSQL), DynamoDB.
Tools: Informatica 7/8.x, SQL Developer, Talend, Power BI.
PROFESSIONAL EXPERIENCE
Software Developer / Big Data
Confidential, Malvern, PA
Responsibilities:
- Collaborated on insights with other data scientists and business analysts.
- Evaluated, refined and continuously improved the efficiency of existing predictive models.
- Developed Python and SQL code to extract data from Enterprise DB2 database.
- Implemented data mapping for converting the data warehouse to on-prem Hadoop.
- Created Continuous Integration build and deploy plans between Bitbucket and Bamboo.
- Worked on the AWS data migration framework from Enterprise DB2 to Amazon Cloud.
- Created repositories in Bitbucket, used Spring Tool Suite (STS) to develop the code, and used Git in STS to perform push, pull and commit.
- Created a stack environment to spin up the EMR cluster and created S3 buckets to store the application.
- Used Sqoop to import the enterprise tables to S3.
- Developed Hive scripts to analyze the data processing.
- Developed PySpark scripts in STS for calculating data items and the final score.
- Wrote shell scripts for Spark jobs to trigger the workflow.
- Used Spark SQL to load JSON data, create a SchemaRDD and load it into Hive tables, and handled structured data using Spark SQL.
- Developed and maintained workflow scheduling jobs in Oozie for importing data from RDBMS to Hive.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
- Analyzed the SQL scripts and designed the solution for implementation in PySpark.
Environment: Apache Hadoop, DB2, Java, HDFS, Spring Tool Suite, Hive, Oozie, AWS, EMR, Spark, Sqoop, Cloudera Distribution, Python, SQL, Control-M.
Hadoop Developer
Confidential, Brentwood, TN
Responsibilities:
- Hands-on experience in loading source data such as web logs into HDFS using Kafka pipelines.
- Loaded and transformed large data sets from a Cassandra source through Kafka and placed them in HDFS for further processing.
- Created Hive tables and loaded transactional data from RDBMS using Kafka.
- Used the Spark SQL Scala and Python interfaces, which automatically convert RDDs of case classes to SchemaRDDs.
- Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties and the Thrift server in Hive.
- Wrote optimized Pig scripts and developed and tested Pig Latin scripts.
- Working knowledge in writing Pig’s Load and Store functions.
- Developed Spark jobs to transform log data into a structured form.
- Developed optimal strategies for distributing the web log data over the cluster, importing and exporting the stored web log data into HDFS and Hive using Kafka connectors.
- Collected and aggregated large amounts of web log data from different sources such as webservers, mobile and network devices using Apache Kafka and stored the data into HDFS for analysis.
- Monitored workload, job performance and capacity planning using Cloudera Distribution.
- Developed Pig scripts for the analysis of semi-structured data.
- Developed industry-specific UDFs (user-defined functions).
- Able to design applications that take advantage of disaster recovery.
- Used Kafka to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Implemented Cassandra connector for Spark 1.6.1.
- Queried and analyzed data from Cassandra for quick searching, sorting and grouping through CQL.
- Familiar with AWS services (S3, EMR, EC2), used for some Big Data workloads.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most purchased product on the website.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box as well as system-specific jobs.
Environment: Apache Hadoop, Teradata, MapReduce, Java (JDK 1.7), HDFS, CentOS 6.4, Spark, Scala, Hive, Pig, Oozie, Flume, AWS, EMR, Eclipse, Sqoop, Cloudera Distribution, Python, HBase, SQL.
Big Data Developer
Confidential, Grand Rapids, MI
Responsibilities:
- Involved in the complete Big Data flow of the application, from data ingestion from upstream into HDFS to processing and analyzing the data in HDFS.
- Implemented ETL processes using Flume to load data from different source databases into HDFS.
- Worked on a live 160-node Hadoop cluster running CDH5.
- Involved in migrating MapReduce jobs to Spark jobs and used Spark SQL and the DataFrames API to load structured and semi-structured data into Spark clusters.
- Worked with 1 PB - 2 PB of semi-structured data.
- Used the Spark API to import data from Teradata into HDFS and created Hive tables.
- Developed Flume jobs to import data in Avro file format from RDBMS databases and created Hive tables on top of it.
- Ran Hive scripts through Hive, Hive on Spark and, in some cases, Spark SQL.
- Loaded data into Parquet files by applying transformations using Hive.
- Implemented various MapReduce jobs in custom environments and updated HBase tables by generating Hive queries; worked with ORC files.
- Prepared a record-based storage layer using HBase that enables fast random reads and writes.
- Developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
- Developed Apache Spark jobs using Scala in a test environment for faster data processing and used Spark SQL for querying.
- Implemented a Cassandra connection with Resilient Distributed Datasets (RDDs).
- Migrated HiveQL queries on structured data to Spark SQL to improve performance.
- Analyzed data using the Hadoop components Hive and Pig and created tables in Hive for the end users.
- Developed a Flume ETL job handling data from an HTTP source with HDFS as the sink.
- Involved in writing Hive queries and Pig scripts for data analysis to meet the business requirements.
- Wrote Oozie workflows and shell scripts to automate the flow.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Managed and reviewed Hadoop log files.
- Used Pig as an ETL tool for transformations, joins and some pre-aggregations before storing the data in HDFS.
- Hands-on experience in developing Linux shell scripts.
- Integrated Hive and Tableau Desktop reports and published them to Tableau Server.
- Responsible for managing data coming from different sources.
- Transferred HiveQL queries to Impala to minimize query response time.
- Used the Apache Hadoop environment provided by Cloudera.
Environment: Hadoop, MapReduce, HDFS, Hive, HBase, Pig, Java, MySQL, Cloudera Manager, Teradata, Flume, Oozie, Python, Spark, Scala, PySpark, Tableau, SQL.
Hadoop Developer
Confidential, Albany, NY
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Experience in installing, configuring, supporting and monitoring Hadoop clusters using Apache and Cloudera distributions.
- Helped the team to increase cluster size from 55 nodes to 145+ nodes.
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Experience in data warehousing and ETL processes, with strong database, SQL, ETL and data analysis skills.
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
- Worked on importing data from HDFS to a MySQL database and vice versa using Sqoop.
- Implemented MapReduce jobs in Hive by querying the available data.
- Configured Hive metastore with MySQL, which stores the metadata for Hive tables.
- Wrote Hive and Pig scripts as per requirements.
- Responsible for developing multiple Kafka producers and consumers as per the software requirement specifications.
- Created, altered and deleted topics (Kafka Queues) when required with varying
- Performed performance tuning using partitioning and bucketing of Impala tables.
- Experience in NoSQL databases such as HBase and MongoDB.
- Involved in cluster maintenance and monitoring.
- Good knowledge of Agile methodologies.
- Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.
- Connected HDFS, Hive and Cassandra using StreamSets.
- Loaded and transformed large sets of structured and semi-structured data.
- Involved in loading data from UNIX file system to HDFS.
- Used Python for cleansing data sets that differ from classic relational databases.
- Involved in knowledge transition activities to the team members.
- Successful in creating and implementing complex code changes.
Environment: Hadoop 1.0.1, Java, HDFS, Python, MapReduce, Pig, Hive, Impala, Sqoop, Kafka, MongoDB, HBase, SQL scripting, Linux shell scripting, Eclipse and Cloudera.
Java/Hadoop Developer
Confidential
Responsibilities:
- Developed MapReduce jobs using Java API.
- Wrote MapReduce jobs using Pig Latin.
- Developed workflow using Oozie for running MapReduce jobs and Hive Queries.
- Worked on Cluster coordination services through Zookeeper.
- Worked on loading log data directly into HDFS using Flume.
- Involved in loading data from LINUX file system to HDFS.
- Responsible for managing data from multiple sources; BI/DW and analytics knowledge.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Responsible for managing data coming from different sources.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Experienced in importing and exporting data into HDFS and assisted in exporting analyzed data to RDBMS using Sqoop.
- Implemented JMS for asynchronous auditing purposes.
- Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.
- Experience in defining, designing and developing Java applications, especially using Hadoop MapReduce by leveraging frameworks such as Cascading and Hive.
- Experience in documenting designs and procedures for building and managing Hadoop clusters.
- Strong experience in troubleshooting operating system issues, cluster issues and Java-related bugs.
- Experience in automating deployment, management and self-serve troubleshooting of applications.
- Defined and evolved the existing architecture to scale with growth in data volume, users and usage.
- Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.
- Installed and configured Hive and wrote Hive UDFs.
- Experience in managing CVS and migrating to Subversion.
Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Pig, Eclipse, MySQL and Ubuntu, Java (JDK 1.6).
Java Developer
Confidential
Responsibilities:
- Involved in producing design, development and analysis documents and sharing them with clients.
- Developed web pages using the Struts framework, JSP, XML, JavaScript, Hibernate, Spring, HTML/DHTML and CSS; configured the Struts application and used tag libraries.
- Developed the application using Spring, Hibernate and Spring Batch, with SOAP and RESTful web services.
- Used Spring Framework at Business Tier and also Spring's Bean Factory for initializing services.
- Used AJAX, JavaScript to create interactive user interface.
- Implemented client side validations using JavaScript & server side validations.
- Developed a single-page application using AngularJS and Backbone.js.
- Implemented Hibernate to persist the data into Database and wrote HQL based queries to implement CRUD operations on the data.
- Developed an API to write XML documents from a database. Utilized XML and XSL Transformation for dynamic web-content and database connectivity.
- Database modeling, administration and development using SQL and PL/SQL in Oracle 11g.
- Coded different deployment descriptors using XML. Generated JAR files were deployed on Apache Tomcat Server.
- Involved in the development of presentation layer and GUI framework in JSP. Client-Side validations were done using JavaScript.
- Involved in configuring and deploying the application using WebSphere.
- Involved in code reviews and mentored the team in resolving issues.
- Undertook the Integration and testing of the various parts of the application.
- Developed automated Build files using ANT.
- Used Subversion for version control and log4j for logging errors.
- Conducted code walkthroughs and prepared test cases and test plans.
Environment: HTML5, JSP, Servlets, JDBC, JavaScript, JSON, jQuery, Spring, SQL, Oracle 11g, Eclipse IDE, XML, XSL, ANT, Tomcat 5.
Associate Java Developer
Confidential
Responsibilities:
- Analysis, design and development of application based on J2EE and design patterns
- Involved in all phases of SDLC (Software Development Life Cycle)
- Developed user interface using JSP, HTML, CSS and JavaScript
- Involved in developing functional model, object model and dynamic model using UML
- Development of the Java classes to be used in JSP and Servlets
- Implemented asynchronous functionalities like e-mail notification using JMS.
- Developed and performed unit testing using JUnit framework in a Test-Driven environment (TDD).
- Implemented Multithreading to achieve consistent concurrency in the application
- Used the Struts framework for managing the navigation and page flow
- Created SQL queries and used PL/SQL stored procedures
- Used JDBC for database transactions
- Developed stored procedures in Oracle
- Involved in developing the helper classes for better data exchange between the MVC layers
- Used Test Driven Development approach and wrote many unit and integration test cases
- Used Eclipse as IDE tool to develop the application and JIRA for bug and issue tracking
- Ran integration tests using JUnit and used XML for building the data structures required for the web service
- Used ANT tool for building and packaging the application
- Code repository management using SVN
Environment: Java/J2EE, Struts, Servlets, HTML, CSS, JSP, XML, JavaScript, Waterfall, Eclipse IDE, Oracle, SQL, JDBC, JBoss, JUnit, ANT, SVN, Apache Tomcat Server.