- 3+ years of experience in analysis, design, development, testing, implementation, maintenance, and enhancement of Big Data applications, including end-to-end Hadoop solutions.
- Experience designing use cases and class, sequence, and collaboration diagrams for multi-tiered object-oriented system architectures using Unified Modeling Language (UML) tools such as Rational Rose, following the Rational Unified Process (RUP).
- Working knowledge of Agile development, Test-Driven Development (TDD), and Behavior-Driven Development (BDD) methodologies.
- Solid knowledge of Amazon Web Services (AWS), including EMR and EC2, for fast and efficient data processing.
- Hands-on experience with the Hadoop ecosystem, including Spark, Kafka, HBase, MapReduce, HDFS, Hive, Pig, Impala, Sqoop, Oozie, Flume, Mahout, ZooKeeper, and Storm, along with Tableau and Talend.
- Extensive knowledge of client-server technology, web-based n-tier architecture, database design, and application development using J2EE design patterns such as Singleton, Session Facade, Factory, and Business Delegate.
- Experience with different Hadoop distributions, including Cloudera (CDH3 and CDH4) and Hortonworks Data Platform (HDP).
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and PySpark.
- Experience working with SQL and PL/SQL on relational databases such as Microsoft SQL Server and Oracle, and with NoSQL databases such as HBase and Cassandra.
- Experience importing and exporting data between HDFS and databases such as MySQL, Oracle, Netezza, Teradata, and DB2 using Sqoop and Talend.
- Worked with the Amazon Redshift data warehouse.
- Good experience designing Talend jobs and transformations and loading data sequentially and in parallel for initial and incremental loads.
- Experience in developing and scheduling ETL workflows in Hadoop using Oozie.
- Experience developing applications using MapReduce, Pig, and Hive.
- Experience loading log data directly into HDFS using Flume.
- Implemented Kerberos authentication for client/server applications using secret-key cryptography.
- Experience using Tableau as a reporting tool.
- Good knowledge of developing RESTful web services.
- Working knowledge of databases such as Oracle, Microsoft SQL Server, and DB2.
- Strong experience in database design and in writing complex SQL queries and stored procedures.
- Extensive experience building and deploying applications on web/application servers such as WebLogic, WebSphere, JBoss, and Tomcat.
- Experience building, deploying, and integrating with Ant and Maven, and using the JUnit testing framework for debugging.
- Excellent communication skills and strong architecture skills.
- Ability to learn and adapt quickly to emerging technologies.
Programming languages: Python, R, Scala, SAS, SAS Enterprise Guide, SQL, C, C++, Shell Scripting
Database: SQL, MySQL, T-SQL, MS Access, Oracle, Hive, MongoDB, Cassandra, PostgreSQL, Informatica, OpenShift
Statistical Software: SPSS, R, SAS
BI Tools: Tableau, Crystal Reports, Amazon Redshift, Azure Data Warehouse, Splunk, Power BI
Data Science/Data Analysis Tools & Techniques: Generalized Linear Models, Logistic Regressions, Boxplots, K-Means, Clustering, Neural networks, AI, Teradata, Tableau, KNIME, Azure ML, Alteryx, SVN, PuTTY, WinSCP, Redmine (Bug Tracking, Documentation, Scrum)
Development Tools: RStudio, Eclipse, IntelliJ, Jupyter Notebook, GitLab
Big Data Frameworks: Hadoop, HDFS, MapReduce, Pig, ZooKeeper, Hive, Sqoop, Flume, Kafka, Spark, Impala
Methodologies: Agile, Scrum, Kanban
Machine Learning: Naïve Bayes, Decision Trees, Ensemble Learning, Regression Models, Random Forests, Time Series, K-Means, K-NN, ARIMA, ETS, Text Mining, NLP, Recommendation Systems, Apriori
Deep Learning: ANN, CNN, LSTM, Reinforcement Learning
Cloud Technologies: AWS (EC2, S3, RDS, EBS, VPC, IAM, Security Groups, EMR), Microsoft Azure, OpenStack, GCP
Operating Systems: Windows, Linux, Unix, Macintosh, Red Hat
Confidential, Westborough, MA
- Profiled data and merged data from multiple sources.
- Took part in Big Data requirements analysis and designed and developed solutions for ETL and Business Intelligence platforms.
- Installed Kafka on the Hadoop cluster and configured producers and consumers to move data from sources to HDFS.
- Loaded real-time data from various data sources into HDFS using Kafka.
- Read multiple data formats on HDFS using Python.
- Implemented Spark jobs using Python (PySpark) and Spark SQL for faster testing and processing of data.
- Loaded data into Spark RDDs and performed in-memory computation.
- Converted Hive/SQL queries into Spark transformations using APIs such as Spark SQL and DataFrames in Python.
- Analyzed the SQL scripts and designed a solution to implement them in Python.
- Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Performed transformations, cleaning, and filtering on imported data using the Spark DataFrame API, Hive, and MapReduce, and loaded the final data into Hive.
- Converted MapReduce programs into Spark transformations using Spark RDDs in Python.
- Developed Spark scripts using Python shell commands as required.
- Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Designed and developed the HBase target schema.
- Used deep learning frameworks such as TensorFlow and Keras to help clients build deep learning models.
- Worked on visualizing the reports using Tableau and SAS Visual Analytics.
Environment: AWS, Spark, Python, PySpark, Linux, Hadoop, Docker, Shell Scripting, PL/SQL, Agile methodologies.
Confidential, Brockton, MA
- Developed a Big Data web application in Scala, which combines functional and object-oriented programming, following Agile methodology.
- Work with different data sources such as HDFS, Hive, and Teradata for Spark to process.
- Use Spark to process data before ingesting it into HBase. Create both batch and real-time Spark jobs in Scala.
- Use HBase to store application data, as it is a highly scalable, distributed, column-oriented NoSQL database that supports real-time querying.
- Use Kafka, a publish-subscribe messaging system, to ingest data into the application for Spark to process, creating topics with producers and consumers, including topics for application and system logs.
- Use the Play Framework, which integrates easily with Akka, to build web applications.
- Configure ZooKeeper to coordinate and support the distributed applications, as it offers high throughput and availability with low latency.
- Create and update the Terraform scripts to create the infrastructure and Consul scripts to enable Service Discovery for the application’s systems.
- Configure Nginx to serve the static content of the web pages, reducing the load on the web server.
- Write SQL queries using Play Slick to perform CRUD operations on PostgreSQL tables.
- Perform database migrations as and when needed.
- Use SBT to build the Scala project.
- Present the application to customers once a month, explaining new features, answering questions, and taking suggestions to improve the user experience.
- Create and update Jenkins jobs to build pipelines that deploy the application to different environments such as development, QA, and production.
- Use Git extensively for version control of the code.
Environment: Spark, Scala, Python, IntelliJ IDEA, Kafka, Play Framework, Slick, PostgreSQL, AWS CLI, Terraform, Consul, SBT, HBase, Akka.
- Teamed up with data engineers to implement ETL processes; wrote and optimized SQL queries to extract and merge data from a SQL Server database.
- Worked on big data, machine learning, and data visualization projects using environments such as RStudio and Jupyter Notebook.
- Used Python and R packages to perform exploratory data analysis and connected to Tableau Desktop to visualize the results.
- Used statistical hypothesis testing to validate data and interpretations, and presented findings to the team to improve strategies and operations.
- Worked with business stakeholders, application developers, and production teams across functional units to identify business needs and discuss solution options.
- Analyzed business requirements, system requirements, and data mapping requirement specifications, and documented functional and supplementary requirements.
- Worked with consumers and different teams to gain insights about the data concepts behind their business.
- Identified missing, outlying, and invalid data and applied appropriate data management techniques.
- Ensured best practices were applied and data integrity was maintained through security, documentation, and change management.
- Organized data into the required type and format for further manipulation.
- Used advanced Microsoft Excel features to create pivot tables, using Excel functions and the Data Analysis package.
- Worked with SQL, SQL*Plus, and Oracle PL/SQL stored procedures, triggers, and SQL queries, and loaded data into data warehouses and data marts.
Environment: Python, R, Machine Learning, Oracle, Tableau, MongoDB, SQL Server, Git, Data Collection, Statistical Analysis, Excel, MySQL
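
Illustrative sample: the data-quality checks listed above (missing, invalid, and outlying values) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the code used on the project: the function name `profile_column`, the sample `ages` data, the plausible range of 0 to 120, and the use of Tukey's 1.5 × IQR rule for outliers are all hypothetical choices made for illustration.

```python
from statistics import quantiles


def profile_column(values, valid_range=None):
    """Return the positions of missing, invalid, and outlying values.

    Hypothetical helper for illustration only.
    """
    # Missing: entries recorded as None.
    missing = [i for i, v in enumerate(values) if v is None]
    present = [(i, v) for i, v in enumerate(values) if v is not None]

    # Invalid: outside a caller-supplied plausible range (an assumption).
    invalid = []
    if valid_range is not None:
        low, high = valid_range
        invalid = [i for i, v in present if not low <= v <= high]
        present = [(i, v) for i, v in present if low <= v <= high]

    # Outliers: Tukey's rule, more than 1.5 * IQR beyond the quartiles.
    q1, _, q3 = quantiles([v for _, v in present], n=4)
    spread = 1.5 * (q3 - q1)
    outliers = [i for i, v in present if v < q1 - spread or v > q3 + spread]

    return {"missing": missing, "invalid": invalid, "outliers": outliers}


# Made-up sample data: one missing value, one impossible age, one extreme age.
ages = [34, 29, None, 41, 38, -5, 36, 33, 97, 31]
report = profile_column(ages, valid_range=(0, 120))
print(report)  # {'missing': [2], 'invalid': [5], 'outliers': [8]}
```

Invalid values are removed before the quartiles are computed so that an impossible entry such as -5 does not distort the outlier thresholds.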