- Big Data developer with 8+ years of professional IT experience, including 4 years in the field of Big Data.
- Extensive experience working with Hadoop distributions, including the enterprise versions of Cloudera and Hortonworks, and good knowledge of the MapR distribution and Amazon EMR.
- In-depth experience using Hadoop ecosystem tools such as HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Spark, Storm, Kafka, Oozie, Elasticsearch, HBase, and ZooKeeper.
- Extensive knowledge of Hadoop architecture and its components.
- Good knowledge of installing, configuring, monitoring, and troubleshooting Hadoop clusters and their ecosystem components.
- Exposure to data lake implementation using Apache Spark.
- Developed data pipelines and applied business logic using Spark.
- Well-versed in Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Worked extensively with Spark Streaming and Apache Kafka to fetch live stream data.
- Used Scala and Python to convert Hive/SQL queries into RDD transformations in Apache Spark.
- Experience integrating Hive queries into the Spark environment using Spark SQL.
- Expertise in performing real-time analytics on big data using HBase and Cassandra.
- Imported data from RDBMS into HDFS, and exported data from HDFS back to RDBMS, using Sqoop.
- Extensive experience moving streaming data into and out of HDFS using streaming platforms such as Flume and Kafka.
- Experience developing data pipelines using Pig, Sqoop, and Flume to extract data from weblogs and store it in HDFS.
- Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
- Hands-on experience with orchestration tools such as Oozie and Airflow.
- Proficient in NoSQL databases including HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters.
- Expertise in cluster management and configuration of Cassandra databases.
- Strong familiarity with creating Hive tables, writing Hive joins and HQL queries, and developing complex Hive UDFs.
- Accomplished in developing Pig Latin scripts and using Hive Query Language (HiveQL) for data analytics.
- Worked with various compression codecs (ZIO, Snappy, Gzip) and file formats (ORC, Avro, TextFile, Parquet).
- Practical experience implementing AWS technologies, including IAM, Elastic Compute Cloud (EC2), ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda, and EBS.
- Built secure AWS solutions by creating VPCs with public and private subnets.
- Worked with data warehousing and ETL tools such as Informatica, Talend, and Pentaho.
- Expertise in Java/J2EE, JDBC, ODBC, JSP, Eclipse, JavaBeans, EJB, and Servlets.
- Developed user interfaces using JSP, Java Swing, and HTML.
- Experience working with the Spring and Hibernate frameworks for Java.
- Worked in various programming languages using IDEs such as Eclipse, NetBeans, and IntelliJ IDEA.
- Proficient in version control tools such as PVCS, SVN, VSS, and Git.
- Development experience with DBMSs such as Oracle, MS SQL Server, Teradata, and MySQL.
- Developed stored procedures and queries using PL/SQL.
- Experience applying best practices in web services development and integration (both REST and SOAP).
- Experienced in using build tools such as Ant, Gradle, SBT, and Maven to build applications and deploy them to servers.
- Knowledge of Unified Modeling Language (UML) and expertise in Object-Oriented Analysis and Design (OOAD).
- Experience in the complete Software Development Life Cycle (SDLC) under both Waterfall and Agile methodologies.
- Knowledge of creating dashboards and data visualizations in Tableau to provide business insights.
- Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively at all levels of the organization, including technical staff, management, and customers.
Languages/Tools: Java, C, C++, C#, Scala, VB, XML, HTML/XHTML, HDML, DHTML.
Big Data: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, ZooKeeper, Spark, Mahout, Kafka, Storm, Cassandra, Solr, Impala, Greenplum, MongoDB
J2EE Standards: JDBC, JNDI, JMS, JavaMail, and XML Deployment Descriptors.
Web/Distributed Technologies: J2EE, Servlets 2.1/2.2, JSP 2.0, Struts 1.1, Hibernate 3.0, JSF, JSTL 1.1, EJB 1.1/2.0, RMI, JNI, XML, JAXP, XSL, XSLT, UML, MVC, Spring 2.0, CORBA, Java Threads.
Operating Systems: Windows 95/98/NT/2000/XP, MS-DOS, UNIX, multiple flavors of Linux.
Databases / NoSQL: Oracle 10g, MS SQL Server 2000, DB2, MS Access, MySQL, Teradata, Cassandra, Greenplum, and MongoDB.
Browser Languages: HTML, XHTML, CSS, XML, XSL, XSD, XSLT.
Browser Scripting: JavaScript, HTML DOM, DHTML, AJAX.
App/Web Servers: IBM WebSphere 5.1.2/5.0/4.0/3.5, BEA WebLogic 5.1/7.0, JDeveloper, Apache Tomcat, JBoss.
GUI Environment: Swing, AWT, Applets.
Messaging & Web Services Technology: SOAP, WSDL, UDDI, XML, SOA, JAX-RPC, IBM WebSphere MQ v5.3, JMS.
Networking Protocols: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP, POP3.
Testing & CASE Tools: JUnit, Log4j, Rational ClearCase, CVS, Ant, Maven, JBuilder.
Version Control Systems: Git, SVN, CVS
Confidential - Ohio
Sr. Big Data/Spark Developer
- Involved in analyzing business requirements and preparing detailed specifications that follow the project guidelines for development.
- Used Sqoop to import data from relational databases such as MySQL and Oracle.
- Involved in importing structured and unstructured data into HDFS.
- Responsible for fetching real-time data using Kafka and processing it using Spark and Scala.
- Used Kafka to import real-time weblogs and ingested the data into Spark Streaming.
- Developed business logic using the Kafka Direct Stream in Spark Streaming and implemented business transformations.
- Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
- Used Hive to implement web interfacing and stored the data in Hive tables.
- Migrated MapReduce programs into Spark transformations using Spark and Scala.
- Experienced with SparkContext, Spark SQL, and Spark on YARN.
- Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Implemented data quality checks using Spark Streaming and flagged records as passable or bad.
- Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
- Involved in data querying and summarization using Hive and Pig, and created UDFs, UDAFs, and UDTFs.
- Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
- Extensively used ZooKeeper as a backup server and job scheduler for Spark jobs.
- Developed traits, case classes, and other constructs in Scala.
- Developed Spark scripts using Scala shell commands as per the business requirements.
- Worked on the Cloudera distribution and deployed it on AWS EC2 instances.
- Experienced in loading real-time data into NoSQL databases such as Cassandra.
- Well-versed in data manipulation and compaction in Cassandra.
- Experience in retrieving the data present in Cassandra cluster by running queries in CQL (Cassandra Query Language).
- Connected the Cassandra database to the Amazon EMR File System (EMRFS) to store the database in S3.
- Used Amazon EMR to process big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Deployed the project on Amazon EMR with S3 connectivity for backup storage.
- Well-versed in using Elastic Load Balancing with Auto Scaling for EC2 servers.
- Configured Oozie workflows involving Hadoop actions.
- Used Python for pattern matching in build logs to format warnings and errors.
- Coordinated with the Scrum team to deliver agreed user stories on time in every sprint.
Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cassandra, Cloudera, Oracle 10g, Linux.
Confidential - Coraopolis, PA
Hadoop/Big Data Analyst
- Developed MapReduce programs to parse and filter raw data and store the refined data in partitioned tables in Greenplum.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with Greenplum reference tables and historical metrics.
- Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Involved in running MapReduce jobs for processing millions of records.
- Built reusable Hive UDF libraries for business requirements, enabling users to call these UDFs in Hive queries.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Experienced in migrating HiveQL queries to Impala to minimize query response time.
- Responsible for data modeling in Cassandra according to project requirements.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Managed and scheduled jobs on a Hadoop cluster using Oozie and cron.
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Used Elasticsearch and MongoDB for storing and querying the offers and non-offers data.
- Created UDFs to calculate the pending payment for a given Residential or Small Business customer, and used them in Pig and Hive scripts.
- Built and deployed the application using Maven.
- Maintained Hadoop, the Hadoop ecosystem, third-party software, and databases with updates and upgrades, and performed performance tuning and monitoring using Ambari.
- Gained solid experience with the NoSQL database Cassandra.
- Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
- Experience in managing and reviewing Hadoop log files.
- Experienced in moving data from Hive tables into Cassandra for real-time analytics on Hive tables.
- Used Python scripting for large-scale text processing utilities.
- Handled importing data from various sources and performed transformations using Hive (external tables, partitioning).
- Involved in NoSQL (DataStax Cassandra) database design, integration and implementation.
- Implemented CRUD operations involving lists, sets and maps in DataStax Cassandra.
- Responsible for data modeling in MongoDB to load both structured and unstructured incoming data.
- Processed unstructured files such as XML and JSON using a custom-built Java API and pushed them into MongoDB.
- Participated in development/implementation of Cloudera Hadoop environment.
- Created tables, inserted data, and executed various Cassandra Query Language (CQL 3) commands on tables from Java code and from the cqlsh command-line client.
- Wrote MRUnit test cases for unit testing MapReduce programs.
- Created business logic using Servlets and session beans and deployed them on WebLogic Server.
- Developed the XML Schema and web services for data maintenance and structures.
- Provided technical support for production environments: resolved issues, analyzed defects, and provided and implemented solutions.
- Built and deployed Java applications into multiple Unix based environments and produced both unit and functional test results along with release notes.
Confidential - Peoria, IL
Hadoop/Big Data Analyst
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Created HBase tables to store PII data in various formats coming from different portfolios.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Involved in loading data from the Linux file system to HDFS.
- Imported and exported data into and out of HDFS and Hive using Sqoop.
- Experience processing unstructured data using Pig and Hive.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Gained experience in managing and reviewing Hadoop log files.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
- Extensively used Pig for data cleansing.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Implemented SQL and PL/SQL stored procedures.
- Actively involved in code review and bug fixing for improving the performance.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Linux, Cloudera, Big Data, Java APIs, Java Collections, SQL, AJAX.
Confidential - San Francisco, CA
- Involved in design and development phases of Software Development Life Cycle (SDLC).
- Involved in designing UML Use case diagrams, Class diagrams, and Sequence diagrams using Rational Rose.
- Followed Agile methodology and Scrum meetings to track, optimize, and tailor features to customer needs.
- Developed the user interface using JSP, JSP tag libraries, and JavaScript to simplify the complexities of the application.
- Implemented Model View Controller (MVC) architecture using the Jakarta Struts 1.3 framework at the presentation tier.
- Developed a Dojo-based front end, including forms and controls, and programmed event handling.
- Implemented a service-oriented architecture (SOA) with web services using JAX-RS (REST) and JAX-WS (SOAP).
- Developed various Enterprise JavaBeans (EJB) components to fulfill the business functionality.
- Created Action Classes which route submittals to appropriate EJB components and render retrieved information.
- Validated all forms using Struts validation framework and implemented Tiles framework in the presentation layer.
- Applied core Java and object-oriented concepts.
- Extensively used Hibernate 3.0 in data access layer to access and update information in the database.
- Used the Spring 2.0 Framework for dependency injection and integrated it with the Struts framework and Hibernate.
- Used JDBC to connect to the back-end databases, Oracle and SQL Server 2005.
- Proficient in writing SQL queries and stored procedures for multiple databases, including Oracle 10g and SQL Server 2005.
- Wrote stored procedures using PL/SQL and performed query optimization to achieve faster indexing and make the system more scalable.
- Used Java Message Service (JMS) for reliable and asynchronous exchange of important information such as payment status reports.
- Used web services (WSDL and REST) to retrieve credit card information from a third party, and used SAX and DOM XML parsers for data retrieval.
- Extensively used IBM RAD 7.0 for writing code.
- Implemented Persistence layer using Hibernate to interact with Oracle 10g and SQL Server 2005 databases.
- Used Ant scripts to build the application and deployed it on WebSphere Application Server.
- Extensively involved in the design and development of JSP screens to suit specific modules.
- Converted the application’s console printing of process information to proper logging technology using log4j.
- Developed the business components (in core Java) used in the JSP screens.
- Involved in the implementation of logical and physical database design by creating suitable tables, views, and triggers.
- Developed related procedures and functions used by JDBC calls in the above components.
- Extensively involved in performance tuning of Oracle queries.
- Created components to extract application messages stored in XML files.
- Executed UNIX shell scripts for command-line administrative access to the Oracle database and for scheduling backup jobs.
- Created WAR files and deployed them to the web server.
- Performed source and version control using VSS.
- Involved in maintenance support.
Junior Java Developer
- Involved in the analysis, design, implementation, and testing of the project.
- Developed web components using JSP, Servlets and JDBC.
- Designed tables and indexes.
- Worked extensively with JUnit to test the application code for client-server data transfer.
- Developed and enhanced product designs in alignment with business objectives.
- Used SVN as a repository for managing/deploying application code.
- Successfully participated in system integration and user acceptance testing.
- Developed the front end using JSTL, JSP, HTML, and JavaScript.
- Wrote complex SQL queries and stored procedures.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Actively involved in the system testing.
- Involved in implementing the service layer using the Spring IoC module.
- Prepared the installation guide, customer guide, and configuration document, which were delivered to the customer along with the product.