- 8+ years of IT experience across the complete software development life cycle, using object-oriented analysis and design with Big Data technologies / Hadoop ecosystem, SQL, Java, and J2EE technologies.
- Around 5 years of experience working on Big Data and Data Science, building advanced customer insight and product analytics platforms using Big Data and open source technologies.
- Broad experience in data mining, real-time analytics, business intelligence, machine learning and web development.
- Strong skills in developing applications involving Big Data technologies like Hadoop, Spark, Elasticsearch, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Hortonworks, Cloudera, Mahout, Avro and Scala.
- Skilled in programming with the MapReduce framework and the Hadoop ecosystem.
- Very good experience in designing and implementing MapReduce jobs to support distributed data processing and process large data sets utilizing the Hadoop cluster.
- Experience in implementing an inverted indexing algorithm using MapReduce.
- Extensive experience in creating Hive tables, loading them with data and writing Hive queries that execute internally as MapReduce jobs.
- Hands-on experience in migrating complex MapReduce programs into Apache Spark RDD transformations.
- Experience in setting up standards and processes for Hadoop based application design and implementation.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting and HDFS.
- Developed ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop, perform structural modifications using MapReduce and Hive, and analyze data using visualization/reporting tools.
- Experience in writing Pig UDFs (Eval, Filter, Load and Store) and macros.
- Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
- Exposure to using Apache Kafka to develop data pipelines that handle logs as a stream of messages using producers and consumers.
- Experience in integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
- Developed and maintained operational best practices for smooth operation of Cassandra/Hadoop clusters.
- Very good understanding of NoSQL databases like MongoDB, Cassandra and HBase.
- Extracted files from MongoDB through Sqoop and placed them in HDFS for processing.
- Experience in coordinating Cluster services through ZooKeeper.
- Hands-on experience in setting up Apache Hadoop, MapR and Hortonworks clusters.
- Good knowledge of Apache Hadoop cluster planning, which includes choosing the hardware and operating systems to host an Apache Hadoop cluster.
- Experience with Hadoop distributions and platforms like Cloudera, Hortonworks, BigInsights, MapR and Windows Azure, as well as Impala.
- Experience using integrated development environments like Eclipse, NetBeans, JDeveloper and MyEclipse.
- Excellent understanding of relational databases as they pertain to application development, using several RDBMS including IBM DB2, Oracle 10g, MS SQL Server 2005/2008 and MySQL, with strong database skills including SQL, stored procedures and PL/SQL.
- Working knowledge of J2EE development with the Spring, Struts and Hibernate frameworks in various projects, and expertise in web services (JAXB, SOAP, WSDL, RESTful) development.
- Experience in writing tests using Specs2, ScalaTest, Selenium, TestNG and JUnit.
- Ability to work on diverse application servers like JBoss, Apache Tomcat and WebSphere.
- Worked on different OS like UNIX/Linux, Windows XP, and Windows
- A passion for learning new things (new languages or new implementations) has kept me up to date with the latest trends and industry standards.
- Adapts quickly to new work environments and technologies.
- Quick learner and self-motivated team player with excellent interpersonal skills.
- Well focused and able to meet expected deadlines.
- Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, MongoDB, Avro, Hadoop Streaming, Cassandra, Oozie, ZooKeeper, Spark, Storm, Kafka
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans
IDEs: Eclipse, NetBeans, WSAD, Oracle SQL Developer
Languages: C, C++, Java, Python, Linux shell scripts, SQL
Databases: Cassandra, MongoDB, HBase, Teradata, Oracle, MySQL, DB2
Web Servers: JBoss, WebLogic, WebSphere, Apache Tomcat
Confidential, Minneapolis, MN
- Developed MapReduce programs to parse the raw data and create intermediate data to be loaded into partitioned Hive tables.
- Involved in data ingestion into HDFS using Sqoop for full loads and Flume for incremental loads from a variety of sources like web servers, RDBMS and data APIs.
- Ran multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Developed MapReduce programs to join data from different data sources using optimized joins, implementing bucketed joins or map joins depending on the requirement.
- Implemented the business logic layer for MongoDB services.
- Executed Hive queries on tables stored in Hive to perform data analysis that met the business requirements.
- Created Kafka topics and partitions and wrote custom partitioner classes.
- Worked on Big Data Integration and Analytics based on Hadoop, Spark and Kafka.
- Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
- Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Performed real-time analysis of the incoming data using the Kafka consumer API, Kafka topics and Spark Streaming with Scala.
- Used DataStax OpsCenter for maintenance operations and for keyspace and table management.
- Implemented advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
- Developed Spark code using Python and Spark SQL/Streaming for faster processing of data.
- Streamed data in real time using Spark with Kafka.
- Built a real-time pipeline for streaming data using Kafka and Spark Streaming.
- Developed Kafka producers and consumers, Cassandra clients and Spark components, along with components on HDFS and Hive.
- Processed large data sets in parallel across the Hadoop cluster for pre-processing.
- Developed the code for importing data into and exporting data from HDFS using Sqoop.
- Imported data from structured data sources into HDFS using Sqoop incremental imports.
- Implemented custom Kafka partitioners to send data to different categorized topics.
- Implemented a Storm topology with stream grouping to perform real-time analytical operations.
- Implemented Kafka spouts for streaming data and different bolts to consume the data.
- Created Hive tables and partitions and implemented incremental imports to support ad-hoc queries on structured data.
- Wrote shell scripts that run multiple Hive jobs to update different Hive tables incrementally; these tables are used to generate reports in Tableau for business use.
Environment: Hadoop, Hive, Flume, Linux, Shell Scripting, Java, Eclipse, MongoDB, Kafka, Spark, Zookeeper, Sqoop, Ambari.
Confidential, Austin, TX
- Developed MapReduce jobs in Java for data cleansing and preprocessing, and implemented complex data analytical algorithms.
- Created Hive generic UDFs to process business logic with HiveQL.
- Optimized Hive queries and improved performance by configuring Hive query parameters.
- Used Cassandra Query Language (CQL) to perform analytics on time series data.
- Worked on HBase Shell, CQL, HBase API and Cassandra Hector API as part of the above proof of concept.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Responsible for running Hadoop Streaming jobs to process terabytes of XML data.
- Developed an Oozie workflow for orchestrating and scheduling the ETL process.
- Involved in implementing Avro, ORC and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Used Flume to collect, aggregate and store web log data from different sources like web servers, mobile and network devices, and pushed it to HDFS.
- Wrote Unix shell scripts in combination with Talend data maps to process the source files and load them into the staging database.
- Retrieved transaction data from RDBMS into HDFS, computed the total transacted amount per user using MapReduce and saved the output in a Hive table.
- Implemented Kafka consumers and producers by extending the Kafka high-level API in Java, ingesting data into HDFS or HBase depending on the context.
- Created the workflow to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Developed SQL scripts using Spark for handling different data sets and compared their performance against MapReduce jobs.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs and Python.
- Worked with AWS services like EMR and EC2 for fast and efficient processing of Big Data.
- Responsible for maintaining and expanding AWS cloud infrastructure using SNS and SQS.
- Developed Spark scripts using Python shell commands as required.
- Experience implementing machine learning techniques in Spark using Spark MLlib.
- Involved in moving data from Hive tables into Cassandra for real-time analytics on that data.
- Used Hadoop benchmarks to monitor and test the Hadoop cluster.
- Involved in implementing test cases and testing MapReduce programs using MRUnit and other mocking frameworks.
- Involved in cluster maintenance, which includes adding and removing cluster nodes, cluster monitoring and troubleshooting, and reviewing and managing data backups and Hadoop log files.
- Implemented Maven build scripts for Maven projects and integrated them with Jenkins.
Environment: Hadoop, AWS, Map Reduce, Hive, Spark, Avro, Kafka, Storm, Linux, Sqoop, Shell Scripting, Oozie, Cassandra, Git, XML, Scala, Java, Maven, Eclipse, Oracle.
Confidential, Miami, FL
- Involved in the full life cycle of the project, including design, analysis, logical and physical architecture modeling, development, implementation and testing.
- Responsible for managing data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Developed algorithms for identifying influencers within specified social network channels.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Analyzed data with Hive, Pig and Hadoop Streaming.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Experienced in working with Apache Storm.
- Hands-on experience in application development using Java, RDBMS and Linux shell scripting.
- Performed Data mining investigations to find new insights related to customers.
- Involved in forecasting based on current results and insights derived from data analysis.
- Developed a domain-specific sentiment analysis system using supervised machine learning.
- Configured the Hadoop environment with Kerberos authentication, NameNodes and DataNodes.
- Designed source-to-target mappings from SQL Server and Excel/flat files to Oracle using Informatica PowerCenter.
- Created data marts and loaded the data using Informatica.
- Developed and generated insights based on brand conversations, which in turn helped drive brand awareness, engagement and traffic to social media pages.
- Involved in identification of topics and trends and building context around that brand.
- Developed different formulas for calculating engagement on social media posts.
- Involved in identifying and analyzing defects, questionable function errors and inconsistencies in output.
Environment: Java, NLP, HBase, Machine Learning, Hadoop, HDFS, Map Reduce, Hortonworks, Hive, Apache Storm, Sqoop, Flume, Oozie, Apache Kafka, Zookeeper, MySQL, Eclipse.
Confidential, Houston, TX
- Developed high-level design documents, Use case documents, detailed design documents and Unit Test Plan documents and created Use Cases, Class Diagrams and Sequence Diagrams using UML.
- Extensive involvement in database design and development, including coding of stored procedures, DDL and DML statements, functions and triggers.
- Utilized Hibernate for Object/Relational Mapping purposes for transparent persistence onto the SQL server.
- Developed a portlet-style user experience using Ajax and jQuery.
- Used Spring IoC to create the beans to be injected at runtime.
- Modified the existing JSP pages using JSTL.
- Integrated Spring dependency injection across the different layers of the application, with Hibernate as the O/R mapping tool, for rapid development and ease of maintenance.
- Designed and developed the web-tier using HTML5, CSS3, JSP, Servlets, Struts and Tiles framework.
- Developed the UI panels using JSF, XHTML, CSS and jQuery.
- Extensively involved in writing object-relational mapping code using Hibernate, and developed Hibernate mapping files for configuring Hibernate POJOs for relational mapping.
- Developed RESTful web services using Spring IoC to give users a way to run the job and generate a daily status report.
- Developed and exposed SOAP web services using JAX-WS, WSDL, Axis, JAXP and JAXB.
- Involved in developing business components using EJB Session Beans and persistence using EJB Entity beans.
- Implemented connectivity to the database server using JDBC.
- Consumed Web Services using Apache CXF framework for getting remote information.
- Used Eclipse as the IDE and configured and deployed the application on the WebLogic application server.
- Used Maven build scripts to automate the build and deployment process.
- Used JMS in the project for sending and receiving the messages on the queue.
- Developed the UI using HTML, CSS, JavaScript and AJAX.
- Used Oracle IDE to create web services for EI application using top down approach.
- Worked on creating basic framework for spring and web services enabled environment for EI applications as web service provider.
- Created SOAP Handler to enable authentication and audit logging during Web Service calls.
- Created service layer APIs and domain objects using Struts.
- Designed, developed and configured the applications using Struts Framework.
- Created Spring DAO classes to call the database through the Spring JPA ORM framework.
- Wrote PL/SQL queries, created stored procedures and invoked them using Spring JDBC.
- Used exception handling and multi-threading for optimum performance of the application.
- Used the Core Java concepts to implement the Business Logic.
- Created High level Design Document for Web Services and EI common framework and participated in review discussion meeting with client.
- Deployed and configured the data source for the database in the WebLogic application server, used log4j for tracking errors and debugging, and maintained the source code using Subversion.
- Used ClearCase for build management and Ant for application configuration and integration.
- Created, executed and documented the tests necessary to ensure that an application and/or environment meets performance requirements (technical, functional and user interface).
Environment: Windows, Linux, Rational ClearCase, Java, JAX-WS, SOAP, WSDL, JSP, JavaScript, Ajax, Oracle IDE, log4j, Ant, Struts, JPA, XML, HTML5, CSS3, Oracle WebLogic.
Software Developer - Intern
- Worked as a Development Team Member.
- Coordinated with Business Analysts to gather the requirement and prepare data flow diagrams and technical documents.
- Identified Use Cases and generated Class, Sequence and State diagrams using UML.
- Used JMS for the asynchronous exchange of critical business data and events among J2EE components and legacy system.
- Involved in designing, coding and maintaining Entity Beans and Session Beans using the EJB 2.1 specification.
- Involved in the development of Web Interface using MVC Struts Framework.
- Developed the user interface using JSP and tags, CSS, HTML and JavaScript.
- Configured database connections using properties files.
- Used a session filter to implement timeouts for idle users.
- Used stored procedures to interact with the database.
- Developed the persistence layer using DAO and the Hibernate framework.
Environment: J2EE, Struts 1.0, JavaScript, Swing, CSS, HTML, XML, XSLT, DTD, JUnit, EJB 2.1, Oracle, Tomcat, Eclipse, WebLogic 7.0/8.1.