- IT professional with 6 years of experience in Big Data, Big Data analytics, and Hadoop administration.
- Experienced in installing and configuring Hadoop clusters on the major Hadoop distributions.
- Hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, and Pig.
- Experience importing and exporting data between RDBMS databases (MySQL, Oracle, and DB2) and the Hadoop data lake using Sqoop jobs.
- Experience developing Sqoop jobs in incremental mode, in both append and last-modified modes.
- Experience handling file formats such as Parquet, Apache Avro, SequenceFile, JSON, text, XML, and flat files.
- Experience developing Spark applications using the Spark RDD, Spark SQL, DataFrame, and Dataset APIs.
- Experience with Big Data platforms such as Hortonworks and Cloudera.
- Expertise in optimizing network traffic when joining multiple tables, and in organizing data using partitions and buckets.
- Strong knowledge of Hadoop, Hive, and Hive's analytical functions.
- Efficient in building Hive, Pig, and MapReduce scripts.
- Successfully loaded files into Hive and HDFS from MySQL.
- Loaded datasets into Hive for ETL operations.
- Good understanding of designing and querying NoSQL databases such as HBase for searching, grouping, and sorting.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Good understanding of cloud configuration on Amazon Web Services (AWS).
- Strong written, oral, interpersonal, and presentation skills.
- Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
- Extensively used dynamic SQL commands, stored procedures, functions, and joins to interact with databases.
- Strong interpersonal and communication skills with the ability to understand both business and technical needs from clients and customers.
- Experienced in performing analytics on structured and unstructured data using Hive queries.
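The partition-and-bucket organization mentioned above can be sketched with a Hive DDL fragment; the table, column names, and bucket count are hypothetical, chosen only to illustrate the technique:

```sql
-- Hypothetical orders table, partitioned by ingest date and bucketed
-- on customer_id so equi-joins on customer_id can use bucketed joins
-- instead of a full shuffle, reducing network traffic.
CREATE TABLE orders (
  order_id     BIGINT,
  customer_id  BIGINT,
  amount       DECIMAL(10,2)
)
PARTITIONED BY (ingest_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS PARQUET;
```

Partition pruning on `ingest_date` then limits each query to the relevant directories, and matching bucket counts on both join sides lets Hive join bucket-by-bucket.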
Distributed Computing: Hadoop, Hive, MapReduce, Spark, Kafka, HBase, Impala, Oozie
Amazon Web Services: EC2, EMR, Mobile Hub, S3, DynamoDB
Databases: MS SQL Server 2005/2008, Oracle, MySQL, MongoDB, MS Access
Tools: Git, Jenkins, XCode, Jupyter Notebook, Selenium
Software Methodologies: Agile, Scrum, Waterfall
Confidential, Pennington, NJ
- Building and processing data pipelines on a Big Data platform using Hadoop and its components: Hive, Spark, MapReduce, Sqoop, HBase, Oozie, and Impala.
- Extensive knowledge of all stages of the Software Development Life Cycle (SDLC), from initiation and definition through implementation and support.
- Working on a trading application: managing the process across various trading flows and routing orders to execution engines.
- Experienced with FIX messaging and routing-engine logic.
- Developed various ETL data-ingestion workflows and pipelines using Apache Oozie.
- Developed an ETL data-streaming pipeline for near-real-time application-generated data, which is stored in MongoDB and moved to HBase using Spark Streaming.
- Imported data into HDFS from source systems using Sqoop with ODBC and JDBC connectors.
- Created and maintained Hive, HBase, MongoDB, and Apache Phoenix databases and tables for audit-related validations such as CAT/ARC/UNITY reporting.
- Performed end-to-end validations on client-facing applications such as Merrill Edge.
- Developed Spark scripts in Scala through the Spark shell as required.
- Used Kafka to transfer data from different source systems to HDFS.
- Created Spark jobs to identify trends in orders placed by users.
- Responsible for generating actionable insights from complex data to drive real business results for various application teams.
- Ingested data from RDBMS sources, performed data transformations, and exported the transformed data to HBase per business requirements.
- Converted unstructured data to structured data using PySpark.
- Developed Spark code in Scala and Spark SQL for faster processing and testing.
- Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive.
- Exported the analyzed data to relational databases using Sqoop so the BI team could visualize it and generate reports.
- Delivered proof-of-concept and production development on HBase/Hive for loading large datasets into HBase.
- Created mapping documents for the data flow and ETL process to support the project once deployed in production.
Environment: Hive, Sqoop, Spark Core, Spark SQL, Python, PySpark, Kafka, HBase, Oracle, MongoDB, Scala
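The incremental RDBMS ingestion described above is driven by a check column and a stored high-water mark: each run pulls only rows newer than the last value seen. A minimal pure-Python sketch of that bookkeeping, with made-up table data and column names:

```python
# Sketch of the bookkeeping behind an incremental "append" import:
# only rows whose check column exceeds the last imported value are
# pulled on each run. Row data and column names are illustrative.

def incremental_append(rows, check_column, last_value):
    """Return the new rows and the updated high-water mark."""
    new_rows = [r for r in rows if r[check_column] > last_value]
    new_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_last

rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]
# First run already imported up to id 1, so only ids 2 and 3 are new.
batch, last = incremental_append(rows, "id", last_value=1)
```

Last-modified mode works the same way, except the check column is a timestamp and updated rows (not just appended ones) cross the mark.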
Confidential, Hartford, CT
Big Data Developer
- Building data structures and pipelines to organize and manage large-scale consumer health data.
- Analyzing structured and unstructured data with Hadoop, Python, Hive, Unix, and Spark.
- Implementing data-quality improvements based on project needs.
- Evaluating analytical results and supporting business decisions aligned with project goals.
- Worked on stories related to ingestion, transformation, and publication of data, delivered on time.
- Used Spark for real-time data ingestion from web servers (unstructured and structured).
- Implementing data import and export jobs into HDFS and Hive using Sqoop.
- Converting unstructured data into a structured format using Pig.
- Used Hive as the data warehouse in Hadoop, running HQL on the structured data.
- Worked with Big Data Hadoop Application on cloud through Amazon Web Services (AWS) EC2 and S3.
- Used Spark to improve the performance and optimization of existing Hadoop algorithms via SparkContext, Spark SQL, and DataFrames.
- Imported data from different sources, such as cloud services and the local file system, into Spark RDDs; worked with AWS cloud services (EMR, S3, EC2, and Lambda).
- Used the Spark Streaming APIs to perform transformations and actions on the fly while building the common learner data model, which consumes data from Kafka in near real time and persists it to Cassandra.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BA team.
- Actively involved in all phases of the SDLC process and ensured project delivery.
Environment: Hive, Sqoop, Spark Core, Spark SQL, Python, PySpark, Kafka, Pig, AWS, HDFS, Lambda.
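The unstructured-to-structured conversion described above comes down to parsing raw records into typed fields before loading them into the warehouse. A minimal pure-Python sketch of that logic, independent of the Spark or Pig runtime; the log format and field names are hypothetical:

```python
import re

# Hypothetical web-server access-log line -> structured record.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) - - \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+)'
)

def parse_line(line):
    """Return a structured dict for a matching line, else None."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])  # type the numeric field
    return rec

raw = '10.0.0.1 - - [12/Mar/2019:10:15:32 +0000] "GET /orders HTTP/1.1" 200'
record = parse_line(raw)
```

In a distributed job, the same function would be applied per record (e.g. via a map over the input), with non-matching lines routed to a reject path instead of silently dropped.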
Confidential, Boston, MA
- Designed architecture and developed intelligent system POC for Knowledge Management.
- Wrote a hierarchical clustering algorithm from scratch and implemented supervised machine-learning algorithms.
- Worked with the MEAN stack (MongoDB, Express, Angular, Node.js), Python (NumPy, SciPy, scikit-learn, TensorFlow), Elasticsearch, and Git.
- Deployed and managed the system on an AWS instance.
- Used Sqoop to load data from MySQL into HDFS on a regular basis.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Worked with NoSQL databases including MongoDB, Cassandra, and HBase.
- Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.
- Gained experience in managing and reviewing Hadoop log files.
- Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
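An Oozie workflow of the kind scheduled above is declared in XML as a graph of action nodes; a minimal sketch running a single Hive script, where the workflow name, script path, and schema versions are illustrative:

```xml
<!-- Hypothetical Oozie workflow running one Hive script; names,
     paths, and schema versions are illustrative. -->
<workflow-app name="daily-hive-load" xmlns="uri:oozie:workflow:0.4">
  <start to="hive-load"/>
  <action name="hive-load">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_daily.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Hive load failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Running several Hive jobs is a matter of chaining additional action nodes (or forking them) between `start` and `end`, with a coordinator definition handling the recurring schedule.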
Environment: Hive, Sqoop, Spark Core, Spark SQL, MongoDB, Python, ExpressJS, NodeJS, Angular, AWS.
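The storage benefit behind the compression tuning mentioned above can be illustrated with Python's standard library; the sample payload is made up, and real jobs would favor splittable codecs (e.g. Snappy with block-compressed SequenceFiles) over plain gzip:

```python
import gzip

# Repetitive text, like many log files, compresses well; storing such
# data compressed in HDFS cuts disk usage and the I/O each task reads.
payload = b"status=200 path=/orders user=alice\n" * 1000
compressed = gzip.compress(payload)
ratio = len(compressed) / len(payload)
```

The trade-off is CPU spent compressing and decompressing, and whether the codec is splittable (gzip is not, so one file becomes one map task).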
- Coordinated the project with other development teams, system managers, and the webmaster, and fostered a good working environment.
- Generated business logic using servlets and session beans and deployed them on WebLogic Server.
- Created complex SQL queries and stored procedures for the trading application of the client, Confidential .
- Performed code enhancements for an existing application using Java and Git.
- Performed Unit testing using Selenium.
- Validated various business flows related to trading platforms.
- Analyzed the banking and existing system requirements and validated them against trading flows.
- Designed the process flow between front-end and server-side components.
- Proficient in object-oriented design using UML (Rational Rose).
- Created Technical Design Documents (TDD) based on the business specifications.
- Worked with QA team for testing and resolving defects.