- Big Data developer with over 8 years of professional IT experience, including 4 years in the field of Big Data.
- Extensive experience working with various distributions of Hadoop, including enterprise versions of Cloudera and Hortonworks, with good knowledge of the MapR distribution and Amazon EMR.
- In-depth experience using various Hadoop ecosystem tools such as HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Spark, Storm, Kafka, Oozie, Elasticsearch, HBase, and ZooKeeper.
- Extensive knowledge of Hadoop architecture and its components.
- Good knowledge of installing, configuring, monitoring, and troubleshooting Hadoop clusters and their ecosystem components.
- Exposure to Data Lake Implementation using Apache Spark.
- Developed data pipelines and applied business logic using Spark.
- Well-versed in Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data.
- Used Scala and Python to convert Hive/SQL queries into RDD transformations in Apache Spark.
- Experience in integrating Hive queries into Spark environment using Spark SQL.
- Expertise in performing real time analytics on big data using HBase and Cassandra.
- Handled importing data from RDBMS into HDFS using Sqoop and vice-versa.
- Extensive experience in importing and exporting streaming data into HDFS using stream processing platforms like Flume and Kafka.
- Experience in developing data pipeline using Pig, Sqoop, and Flume to extract the data from weblogs and store in HDFS.
- Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
- Hands-on experience with tools like Oozie and Airflow to orchestrate jobs.
- Proficient in NoSQL databases including HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters.
- Expertise in Cluster management and configuring Cassandra Database.
- Great familiarity with creating Hive tables, Hive joins, and HQL for querying databases, up to writing complex Hive UDFs.
- Accomplished developing Pig Latin Scripts and using Hive Query Language for data analytics.
- Worked on different compression codecs (LZO, Snappy, Gzip) and file formats (ORC, Avro, TextFile, Parquet).
- Experience in practical implementation of cloud-specific AWS technologies including IAM and Amazon cloud services such as Elastic Compute Cloud (EC2), ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda, and EBS.
- Built AWS secured solutions by creating VPC with public and private subnets.
- Worked on data warehousing and ETL tools like Informatica, Talend, and Pentaho.
- Expertise working with Java/J2EE, JDBC, ODBC, JSP, Eclipse, JavaBeans, EJB, and Servlets.
- Developed web page interfaces using JSP, Java Swings, and HTML scripting languages.
- Experience working with Spring and Hibernate frameworks for JAVA.
- Worked with various programming languages using IDEs such as Eclipse, NetBeans, and IntelliJ.
- Excelled in using version control tools like PVCS, SVN, VSS and GIT.
- Development experience in DBMS like Oracle, MS SQL Server, Teradata, and MYSQL.
- Developed stored procedures and queries using PL/SQL.
- Experience with best practices of web services development and integration (both REST and SOAP).
- Experienced in using build tools like Ant, Gradle, SBT, Maven to build and deploy applications into the server.
- Knowledge of Unified Modeling Language (UML) and expertise in Object-Oriented Analysis and Design (OOAD).
- Experience with the complete Software Development Life Cycle (SDLC) in both Waterfall and Agile methodologies.
- Knowledge of creating dashboards and data visualizations using Tableau to provide business insights.
- Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively at all levels of the organization, including technical staff, management, and customers.
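The Hive/SQL-to-RDD conversion work noted above can be illustrated with a minimal sketch. The example below simulates Spark's map/reduceByKey pattern in plain Python so it runs without a cluster; the schema and query are hypothetical, not from any specific project.

```python
from collections import defaultdict

# Hive query being translated (illustrative):
#   SELECT category, SUM(amount) FROM sales GROUP BY category
def sum_by_category(rows):
    """Simulate Spark's map + reduceByKey over (category, amount) records."""
    # map: each row -> (key, value) pair
    pairs = [(row["category"], row["amount"]) for row in rows]
    # reduceByKey: combine all values that share a key
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

sales = [
    {"category": "books", "amount": 12.0},
    {"category": "games", "amount": 30.0},
    {"category": "books", "amount": 8.0},
]
print(sum_by_category(sales))  # {'books': 20.0, 'games': 30.0}
```

In actual Spark the pair list would be an RDD and the loop would be `rdd.map(...).reduceByKey(_ + _)`; the shape of the computation is the same.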
Languages/Tools: Java, C, C++, C#, Scala, VB, XML, HTML/XHTML, HDML, DHTML.
Big Data: HDFS, MapReduce, HIVE, PIG, HBase, SQOOP, Oozie, Zookeeper, Spark, Mahout, Kafka, Storm, Cassandra, Solr, Impala, Greenplum, MongoDB
J2EE Standards: JDBC, JNDI, JMS, Java Mail & XML Deployment Descriptors.
Web/Distributed Technologies: J2EE, Servlets 2.1/2.2, JSP 2.0, Struts 1.1, Hibernate 3.0, JSF, JSTL 1.1, EJB 1.1/2.0, RMI, JNI, XML, JAXP, XSL, XSLT, UML, MVC, Spring 2.0, CORBA, Java Threads.
Operating System: Windows 95/98/NT/2000/XP, MS-DOS, UNIX, multiple flavors of Linux.
Databases / NoSQL: Oracle 10g, MS SQL Server 2000, DB2, MS Access, MySQL, Teradata, Cassandra, Greenplum, and MongoDB.
Browser Languages: HTML, XHTML, CSS, XML, XSL, XSD, XSLT.
Browser Scripting: JavaScript, HTML DOM, DHTML, AJAX.
App/Web Servers: IBM WebSphere 5.1.2/5.0/4.0/3.5, BEA WebLogic 5.1/7.0, JDeveloper, Apache Tomcat, JBoss.
GUI Environment: Swing, AWT, Applets.
Messaging & Web Services Technology: SOAP, WSDL, UDDI, XML, SOA, JAX-RPC, IBM WebSphere MQ v5.3, JMS.
Networking Protocols: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP, POP3.
Testing & Case Tools: JUnit, Log4j, Rational ClearCase, CVS, Ant, Maven, JBuilder.
Version Control Systems: Git, SVN, CVS
Confidential - Ohio
Sr. Big Data/Scala Developer
- Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
- Used Sqoop to import data from Relational Databases like MySQL, Oracle.
- Involved in importing structured and unstructured data into HDFS.
- Responsible for fetching real time data using Kafka and processing using Spark and Scala.
- Worked on Kafka to import real time weblogs and ingested the data to Spark Streaming.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
- Worked on Building and implementing real-time streaming ETL pipeline using Kafka Streams API.
- Worked on Hive to implement web interfacing and stored the data in Hive tables.
- Migrated MapReduce programs into Spark transformations using Spark and Scala.
- Experienced with Spark Context, Spark SQL, and Spark on YARN.
- Used Vertica, a columnar relational database management system, for data warehousing and Big Data analytics.
- Strengthened T-SQL coding skills.
- Created schema, table, and T-SQL scripts to archive Vertica resources usage data for trend analysis.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Implemented data quality checks using Spark Streaming and flagged records as passable or bad.
- Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
- Involved in Data Querying and Summarization using Hive and Pig and created UDF’s, UDAF’s and UDTF’s.
- Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
- Extensively used ZooKeeper as a backup server and job scheduler for Spark jobs.
- Developed traits, case classes, and related constructs in Scala.
- Developed Spark scripts using Scala shell commands as per the business requirement.
- Worked on Cloudera distribution and deployed on AWS EC2 Instances.
- Experienced in loading the real-time data to NoSQL database like Cassandra.
- Well versed in using Data Manipulations, Compactions, in Cassandra.
- Experience in retrieving the data present in Cassandra cluster by running queries in CQL (Cassandra Query Language).
- Worked on connecting Cassandra database to the Amazon EMR File System for storing the database in S3.
- Implemented usage of Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Deployed the project on Amazon EMR with S3 connectivity for setting a backup storage.
- Well versed in using Elastic Load Balancer for auto scaling of EC2 servers.
- Configured workflows that involve Hadoop actions using Oozie.
- Used Python for pattern matching in build logs to format warnings and errors.
- Coordinated with SCRUM team in delivering agreed user stories on time for every sprint.
Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cassandra, Cloudera, Oracle 10g, Linux.
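The Kafka/Spark Streaming weblog work described above can be sketched as a micro-batch transformation. The function below is plain Python standing in for the body of one streaming batch; the log format and field positions are assumed for illustration only.

```python
def process_batch(lines):
    """Parse one micro-batch of weblog lines, drop malformed records,
    and count requests per HTTP status (mirrors a reduceByKey step)."""
    counts = {}
    for line in lines:
        parts = line.split()
        # expected layout (hypothetical): <ip> <method> <path> <status>
        if len(parts) != 4 or not parts[3].isdigit():
            continue  # bad record: skip it rather than failing the batch
        status = parts[3]
        counts[status] = counts.get(status, 0) + 1
    return counts

batch = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /missing 404",
    "garbled line",
    "10.0.0.3 POST /login 200",
]
print(process_batch(batch))  # {'200': 2, '404': 1}
```

In a real pipeline this logic would run inside a Spark Streaming batch fed by a Kafka direct stream; the skip-on-bad-record rule mirrors the passable/bad flagging described above.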
Confidential - Coraopolis, PA
Hadoop/Big Data Analyst
- Developed MapReduce programs to parse and filter the raw data and store the refined data in partitioned HBase tables.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with HBase reference tables and historical metrics.
- Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Involved in running MapReduce jobs for processing millions of records.
- Built reusable Hive UDF libraries for business requirements which enabled users to use these UDF's in Hive Querying.
- Developed a data pipeline using Scala to store data into HDFS.
- Experienced in migrating HiveQL into Impala to minimize query response time.
- Responsible for data modeling in HBase as per our requirements.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Managed and scheduled jobs on a Hadoop cluster using NiFi.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Created UDFs to calculate the pending payment for the given Residential or Small Business customer, and used in Pig and Hive Scripts.
- Deployed and built the application using Maven.
- Maintained Hadoop, Hadoop ecosystems, third-party software, and databases with updates/upgrades, performance tuning, and monitoring using Ambari.
- Obtained good experience with the NoSQL database HBase.
- Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
- Experience in managing and reviewing Hadoop log files.
- Experienced in moving data from Hive tables into HBase for real-time analytics on Hive tables.
- Handled importing of data from various data sources and performed transformations using Hive (external tables, partitioning).
- Involved in NoSQL (DataStax Cassandra) database design, integration, and implementation.
- Implemented CRUD operations involving lists, sets, and maps in DataStax Cassandra.
- Responsible for data modeling in HBase in order to load data arriving in both structured and unstructured form.
- Processed unstructured files such as XML and JSON using a custom-built Java API and pushed them into MongoDB.
- Participated in the development/implementation of a Cloudera Hadoop environment.
- Created tables, inserted data, and executed various Cassandra Query Language (CQL 3) commands on tables from Java code and using the cqlsh command-line client.
- Wrote test cases in MRUnit for unit testing of MapReduce programs.
- Created business logic using Servlets and Session Beans and deployed them on a WebLogic server.
- Developed the XML Schema and Web services for the data maintenance and structures
- Provided technical support for production environments, resolving issues, analyzing defects, and providing and implementing solutions for those defects.
- Built and deployed Java applications into multiple Unix based environments and produced both unit and functional test results along with release notes.
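The pending-payment UDF described above (for Residential and Small Business customers) reduces to a small piece of function logic. The sketch below expresses it in plain Python; the tier names and discount rule are hypothetical placeholders, and a real Hive or Pig UDF would wrap the same calculation in the respective UDF API.

```python
def pending_payment(total_due, payments, customer_type):
    """UDF-style calculation of the pending payment for a customer.
    Tier names and the 5% rule are illustrative assumptions."""
    paid = sum(payments)
    pending = max(total_due - paid, 0.0)  # never report a negative balance
    # hypothetical rule: small-business accounts get a 5% settlement discount
    if customer_type == "SMALL_BUSINESS":
        pending *= 0.95
    return round(pending, 2)

print(pending_payment(100.0, [30.0, 20.0], "RESIDENTIAL"))     # 50.0
print(pending_payment(100.0, [30.0, 20.0], "SMALL_BUSINESS"))  # 47.5
```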
Confidential - Peoria, IL
Hadoop/Big Data Analyst
- Exported data from HDFS to MySQL using Sqoop and an NFS mount approach.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Developed MapReduce programs for applying business rules to the data.
- Developed and executed Hive queries for denormalizing the data.
- Worked with ETL workflows, analyzed big data, and loaded it into the Hadoop cluster.
- Installed and configured Hadoop Cluster for development and testing environment.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the users' MapReduce jobs.
- Involved in creating shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala, and MapReduce) and to move data into and out of HDFS.
- Automated the workflow using shell scripts.
- Performance-tuned Hive queries written by other developers.
- Mastered major Hadoop distributions (HDP, CDH) and numerous open-source projects.
- Prototyped various applications that utilize modern Big Data tools.
Environment: Linux, Java, Map Reduce, HDFS, DB2, Cassandra, Hive, Pig, Sqoop, FTP.
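The denormalizing Hive queries mentioned above boil down to join-and-widen logic, sketched here in plain Python with illustrative schemas; in practice this would be a HiveQL JOIN producing one wide record per order.

```python
def denormalize(orders, customers):
    """Join orders to customer attributes the way a Hive JOIN would,
    producing one wide record per order (field names are illustrative)."""
    by_id = {c["id"]: c for c in customers}  # build the join index once
    wide = []
    for order in orders:
        cust = by_id.get(order["customer_id"])
        if cust is None:
            continue  # inner-join semantics: drop unmatched orders
        wide.append({**order,
                     "customer_name": cust["name"],
                     "region": cust["region"]})
    return wide

customers = [{"id": 1, "name": "Acme", "region": "IL"}]
orders = [{"order_id": 10, "customer_id": 1, "amount": 99.0},
          {"order_id": 11, "customer_id": 2, "amount": 5.0}]
print(denormalize(orders, customers))
```

The second order has no matching customer and is dropped, matching the behavior of an inner join in Hive.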
Confidential - San Francisco, CA
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Involved in loading data from LINUX file system to HDFS.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Experience working on processing unstructured data using Pig and Hive.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Gained experience in managing and reviewing Hadoop log files.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
- Extensively used Pig for data cleansing.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Implemented SQL, PL/SQL Stored Procedures.
- Actively involved in code review and bug fixing for improving the performance.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, LINUX, Cloudera, Big Data, Java APIs, Java collection, SQL, AJAX.
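The Pig-based data cleansing noted above can be sketched as a simple filter-and-normalize pass; the record format and cleansing rules below are illustrative assumptions (in Pig this would be FILTER and FOREACH ... GENERATE statements over the web server output).

```python
def cleanse(records):
    """Pig-style cleansing pass: trim whitespace, drop blank or comment
    lines, and normalize the request path to lowercase (rules illustrative)."""
    cleaned = []
    for raw in records:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # drop noise, as a Pig FILTER BY would
        ip, path = line.split(",", 1)  # assumed layout: <ip>,<path>
        cleaned.append((ip, path.strip().lower()))
    return cleaned

raw = ["  10.0.0.1, /Home ", "", "# comment", "10.0.0.2,/About"]
print(cleanse(raw))  # [('10.0.0.1', '/home'), ('10.0.0.2', '/about')]
```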
- Extensively involved in the design and development of JSP screens to suit specific modules.
- Converted the application’s console printing of process information to proper logging technology using log4j.
- Developed the business components (in core Java) used in the JSP screens.
- Involved in the implementation of logical and physical database design by creating suitable tables, views, and triggers.
- Developed related procedures and functions used by JDBC calls in the above components.
- Extensively involved in performance tuning of Oracle queries.
- Created components to extract application messages stored in xml files.
- Executed UNIX shell scripts for command-line administrative access to the Oracle database and for scheduling backup jobs.
- Created war files and deployed in web server.
- Performed source and version control using VSS.
- Involved in maintenance support.
Junior JAVA Developer
- Involved in the analysis, design, implementation, and testing of the project.
- Developed web components using JSP, Servlets and JDBC.
- Designed tables and indexes.
- Extensively worked on JUnit for testing the application code of server-client data transferring.
- Developed and enhanced products in design and in alignment with business objectives.
- Used SVN as a repository for managing/deploying application code.
- Involved in the system integration and user acceptance tests successfully.
- Developed front end using JSTL, JSP, HTML, and JavaScript.
- Wrote complex SQL queries and stored procedures.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Actively involved in the system testing.
- Involved in implementing service layer using Spring IOC module.
- Prepared the Installation, Customer guide and Configuration document which were delivered to the customer along with the product.