- 6+ years of overall experience in the Information Technology (IT) industry, including 2+ years of hands-on experience in Big Data ecosystem technologies such as Apache Hadoop, MapReduce, Spark, Hive, Pig, HBase, Oozie, Sqoop, Avro, Cassandra, Flume, Kafka, and Zookeeper
- Technically skilled at developing new applications on Hadoop according to business needs and converting existing applications to the Hadoop environment
- Analysis, design, development, and production support using data warehouse, Extract, Transform, and Load (ETL), core Java and mainframe applications
- Responsible for analysing big data and providing technical expertise and recommendations to improve existing systems
- Hands-on experience in capacity planning, monitoring, and performance tuning of Hadoop clusters
- Good knowledge of distributed programming with Spark, specifically in Scala and Python
- Proficient in Apache Spark and Scala programming to analyse large datasets, using Spark Streaming and Kafka to process real-time data
- Worked on writing custom UDFs in Python and Java for Hive
- Extensive experience in importing and exporting data using Sqoop from HDFS/Hive/HBase to Relational Database Systems (RDBMS) and vice versa.
- Experience in collecting, aggregating, and moving large amounts of streaming data using Flume, Kafka, and Spark Streaming
- Hands on experience in writing MapReduce programs in Java and Python
- Hands-on experience in full life cycle implementation using MapReduce, CDH (Cloudera) and HDP (Hortonworks Data Platform)
- Involved in design and architecting of Big Data solutions using Hadoop ecosystem
- Tuned YARN configurations for YARN-Spark optimization
- Experience in database design, entity relationships, database analysis, SQL programming, PL/SQL stored procedures, packages, and triggers in Oracle, and in MongoDB on Unix/Linux
- Strong in core Java, data structures, algorithm design, Object-Oriented Design (OOD), and Java components such as the Collections Framework, exception handling, and the I/O system
- Strong knowledge of Object-Oriented Programming (OOP) concepts, including polymorphism, abstraction, inheritance, and encapsulation
- Demonstrated ability to communicate and gather requirements, partner with enterprise architects, business users, analysts and development teams to deliver rapid iteration of complex solutions
- Excellent global exposure to various work cultures and client interaction with diverse teams
- Experience in Tableau reporting.
- Experience in Agile, Waterfall, and Scrum Development environments by using Git and JIRA
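The hands-on MapReduce work in Python noted above follows the Hadoop Streaming pattern; a minimal sketch, with an illustrative word-count task and sample input (not from any specific project):

```python
from itertools import groupby

def mapper(lines):
    # Map phase: emit (word, 1) for every word. In a real Hadoop
    # Streaming job this reads sys.stdin and prints tab-separated pairs.
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

def reducer(pairs):
    # Reduce phase: Hadoop delivers pairs grouped and sorted by key;
    # here we sort locally, then sum the counts per word.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

# Example: dict(reducer(mapper(["big data big pipelines", "data engineering"])))
# yields per-word counts for the two sample lines.
```

In a cluster run, Hadoop Streaming wires the two phases together and performs the shuffle/sort between them.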
Operating Systems: Linux, MacOS, Raspbian, Windows, Robot
Languages: Python 2.7/3.6, Scala, Java 8, R, MATLAB
Tools: Tableau, Plotly, Microsoft Office (Word, Excel with macros, PowerPoint), Putty, Fritzing
Big Data Technologies: Apache Hadoop 2.5, Spark 1.6/2.3, MapReduce, Hive 1.2.1, HDFS, Kafka 0.10.0.1, Pig, Flume 1.5.2, Oozie
Confidential, Woodcliff Lake, NJ
- Built data pipeline using Kafka, Sqoop, Hive and Spark to ingest data into HDFS for analysis
- Designed schemas and created Hive tables for importing data from multiple data sources using Sqoop
- Handled importing of data from various data sources, performed transformations using Spark and loaded data into Hive Database and HDFS
- Enhanced and optimized production Spark code to aggregate, group, and run data mining tasks using the Spark framework
- Experienced with SparkContext, Spark SQL, DataFrames, Pair RDDs, and Spark on YARN
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Created and populated bucketed tables in Hive to allow for faster map-side joins, more efficient jobs, and more efficient sampling; also partitioned data to optimize Hive queries
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from Kafka using the Kafka Connect API
- Worked in transforming JSON data in streams for data pipelines.
- Applied other HDFS file formats and structures (Avro, Parquet) to support fast data retrieval, user analytics, and analysis
- Experienced with Amazon EMR and EC2
- Worked on building custom ETL workflows using Spark/Hive to perform data cleaning and mapping.
- Created Oozie coordinated workflow to execute Sqoop incremental job daily
- Worked on automating Data pipelines for production level data.
Environment: Scala 2.11.8, Apache Hadoop 2.5, Hive, HDFS, YARN, Apache Spark 1.6.0, Spark SQL, Sqoop, Kafka, HBase, Elasticsearch, JIRA.
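The JSON stream transformation work described above can be sketched in Python; the event field names (`user.id`, `event_type`, `ts`) are illustrative assumptions, not the original pipeline's schema:

```python
import json

def transform_record(raw):
    # Parse one JSON event and flatten it into the shape a downstream
    # Hive table might expect. Field names here are assumed for illustration.
    event = json.loads(raw)
    return {
        "user_id": event["user"]["id"],
        "event_type": event["event_type"],
        "event_ts": event["ts"],
    }

def transform_stream(lines):
    # Transform a stream of raw JSON lines, skipping blank or malformed
    # records rather than failing the whole batch.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield transform_record(line)
        except (json.JSONDecodeError, KeyError):
            continue
```

In a Spark Streaming job, the same per-record function would typically be applied inside a `map`/`flatMap` over each micro-batch.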
Confidential, New York City, NY
Big Data Engineer
- Worked on analysing the Hadoop cluster using different big data analytic tools including Flume, Hive, Sqoop, Spark, and Spark SQL
- Worked with technology and business groups for Hadoop migration strategy.
- Worked with Sqoop import and export functionalities to handle large data set transfers between the DB2 database and HDFS
- Worked with Flume to import log data from syslog into the Hadoop cluster
- Loaded data into Cassandra tables from HDFS using Flume
- Implemented Partitioning, Bucketing in Hive for better organization of the data.
- Created Hive queries for extracting data and sending it to clients
- Created Scala programs to develop the reports for Business users.
- Created Hive UDFs in Python for formatting data
- Performed distributed programming with Spark, primarily in Scala
- Performed transformation and analysis in Hive/Pig, parsing raw data using MapReduce and Spark
- Collaborated with business users, product owners, and developers to contribute to the analysis of functional requirements
Environment: Hadoop 2.5, Hive, Pig, Spark 1.6, Scala, MapReduce, HBase, Sqoop, Kafka, MongoDB
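A Python Hive UDF of the kind mentioned above is usually a streaming script invoked via Hive's `SELECT TRANSFORM`; this minimal sketch assumes a tab-separated row layout and an illustrative "country code in column 2" formatting rule:

```python
import sys

def format_row(line):
    # Normalize one tab-separated Hive row: trim each field and
    # upper-case the country-code column (an assumed column position).
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    if len(fields) >= 2:
        fields[1] = fields[1].upper()
    return "\t".join(fields)

# Hive streams rows to this script's stdin when invoked via, e.g.:
#   SELECT TRANSFORM (name, country, total) USING 'format_udf.py' ...
if __name__ == "__main__":
    for row in sys.stdin:
        print(format_row(row))
```

Keeping the per-row logic in a pure function makes the script unit-testable outside of Hive.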
- Built business process models using MS Visio 2013, creating case diagrams and flow diagrams showing the required flow of steps
- Designed, implemented, automated modelling and analysis procedures on existing and experimentally created data
- Developed ad-hoc analyses to aid the team in understanding customer behaviour, developed POCs, and drove decision-making
- Collaborated with Data Architects on changes associated with Data Systems and System Interfaces
- Debugged many PL/SQL packages, procedures, functions, cursors, and types for applications
- Created dynamic linear models to perform trend analysis using Python
- Used MS Excel, MS Access and SQL to write and run various queries.
- Used traceability matrix to trace the requirements of the organization.
- Analysed the data and created dashboards using Tableau
- Managed the Metadata associated with the ETL processes used to populate the Data Warehouse
- Partnered with other analysts to develop data infrastructure (data pipelines, reports, etc.) and other tools to make analytics easier and more effective
Environment: Python 2.7.0, SQL, Oracle 11g, MS Office, MS Visio, Tableau
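The linear trend analysis mentioned above can be sketched as a plain least-squares fit in Python; the evenly spaced time index and the sample series are illustrative assumptions:

```python
def fit_trend(ys):
    # Ordinary least-squares fit of y = slope*x + intercept against
    # x = 0..n-1 (an assumed evenly spaced time index).
    # Returns (slope, intercept); a minimal stand-in for a trend model.
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x
```

For a perfectly linear series such as `[1, 3, 5, 7]` this recovers slope 2 and intercept 1; in practice one would use a library such as statsmodels or NumPy rather than hand-rolled sums.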
- Deployed the production site using Apache and Docker
- Used web services and APIs in Python to create interactive applications
- Proficient in databases including MySQL, PostgreSQL, Oracle, and MongoDB
- Involved in Unit testing and Integration testing of the code using PyTest
- Experienced in working with various Python Integrated Development Environments like IDLE, PyCharm, Atom, Eclipse, PyDev and Sublime Text
- Experience in creating an initial website prototype from the Django skeleton and building out views and templates using CSS for the whole site, following Django's MVT (Model-View-Template) architecture
Environment: Python2.7, PyTest, IDLE, PyCharm, Atom, Eclipse, PyDev, Sublime Text, MySQL, PostgreSQL, Oracle, MongoDB, Jupyter Notebook, Anaconda, GitHub, Jira
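The PyTest-based unit testing noted above follows a simple pattern: keep logic in small pure functions and let PyTest discover `test_*` functions. The function under test here (`normalize_email`) is an illustrative example, not from the original codebase:

```python
def normalize_email(raw):
    # Lower-case and trim an email address; reject values without "@".
    email = raw.strip().lower()
    if "@" not in email:
        raise ValueError("not an email: %r" % raw)
    return email

# PyTest discovers test_* functions automatically; plain asserts suffice
# (pytest.raises is the more common idiom for the exception case).
def test_normalize_email_trims_and_lowercases():
    assert normalize_email("  Alice@Example.COM ") == "alice@example.com"

def test_normalize_email_rejects_bad_input():
    try:
        normalize_email("not-an-address")
        assert False, "expected ValueError"
    except ValueError:
        pass
```

Running `pytest` in the project directory collects and executes both tests.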
- Interacted with clients to gather specifications of the product
- Analysed the proposed specifications and designed the use cases using Microsoft Visio
- Used the Model-View-Controller (MVC) framework to build the web portal application
- Wrote SQL scripts to create and maintain the database, roles, users, tables, views, procedures and triggers in Oracle
- Interaction of data between different layers was achieved through XML
- Participated in designing the user interface for the application using HTML and Java Server Pages (JSP).
- Prepared solution design document for website design and development
- Developed and executed a test suite for unit and integration testing using JUnit test cases
- Conducted IQA, EQA, code walk-throughs, and reviews with the team
- Researched and implemented bug fixes and documented them
- Provided post-implementation application maintenance and enhancement support to the client for the product/software application
- Deployed the application on Oracle WebLogic Server by creating WAR files of the complete application
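The XML-based data interchange between layers mentioned above can be sketched as a serialize/parse round trip; the sketch is in Python for brevity (the original stack was Java), and the `order` element names are illustrative, not the original schema:

```python
import xml.etree.ElementTree as ET

def order_to_xml(order):
    # Serialize a dict into the XML payload passed between layers.
    root = ET.Element("order", id=str(order["id"]))
    ET.SubElement(root, "customer").text = order["customer"]
    ET.SubElement(root, "total").text = "%.2f" % order["total"]
    return ET.tostring(root, encoding="unicode")

def xml_to_order(payload):
    # Parse the payload back into the same dict shape on the other layer.
    root = ET.fromstring(payload)
    return {
        "id": int(root.get("id")),
        "customer": root.findtext("customer"),
        "total": float(root.findtext("total")),
    }
```

A round trip through both functions returns the original record, which is the property the layer boundary relies on.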