- Over 5 years of diverse experience in Information Technology, including Big Data technologies such as Hadoop and Apache Spark.
- Expertise in Big Data technologies: Apache Hadoop, Apache Spark, HDFS, MapReduce, Hive, Spark SQL, YARN, and Mesos.
- Hands-on experience with AWS cloud services: S3, EMR, Athena, EC2, Glue, CloudFormation, CloudWatch, and Lambda.
- Hands-on experience in Python and Java, data mining, and Big Data analytics.
- Hands-on experience with the PySpark shell, shell scripting, and performance tuning of Hive queries (see the sketch following this list).
- Programming experience in Java, C, Python, and COBOL, and with databases such as Oracle, MySQL, and SQL Server.
- Highly motivated, productive, and customer-focused team player with strong communication, interpersonal, organizational, time-management, analytical, and problem-solving skills.
- Reliable and dedicated, with the ability to grasp and apply new procedures quickly and to organize and prioritize tasks to meet deadlines.
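As context for the Hive query tuning mentioned above, here is a minimal sketch of two common techniques, partition pruning and a broadcast join, run through Spark SQL from the PySpark shell. The table names (sales, dim_store), columns, and partition value are hypothetical placeholders, not taken from any specific project.

```python
# A minimal sketch of two common Hive tuning techniques via Spark SQL.
# Table names (sales, dim_store), columns, and the partition value are
# hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (SparkSession.builder
         .appName("hive-query-tuning")
         .enableHiveSupport()
         .getOrCreate())

# Filter on the partition column early so only the needed partitions are read.
sales = spark.table("sales").where("ds = '2020-01-01'")

# Broadcast the small dimension table to avoid a shuffle-heavy join.
stores = spark.table("dim_store")
joined = sales.join(broadcast(stores), "store_id")

joined.groupBy("region").count().show()
```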
Big Data Technologies: Apache Spark, Hadoop, Hive, Oozie, YARN, MapReduce, HBase, HDFS, Sqoop, Flume, AWS S3, Machine Learning, Data Mining
Languages: Python, Java, JavaScript, SQL, PL/SQL, WSDL, XML, C
CI/CD: Jenkins, Docker, Maven, GitHub, Artifactory
Frameworks: MapReduce, MVC, Hibernate
Databases: Oracle, SQL Server, MySQL
NoSQL Databases: Apache Cassandra, Apache HBase, MongoDB
Reporting Tools: Qlik Sense, Tableau, Splunk
Operating Systems: Windows, UNIX, Linux
Confidential
Lead Offshore Team
- Designed an optimized solution using multiple AWS services to move data from on-premises systems to S3 buckets.
- Built data pipelines on AWS to load and stage that on-premises data in S3.
- Transformed and cleansed the data using AWS EMR, Spark, and Hive with Python (see the sketch following this list).
- Performed SQL queries on AWS with Athena and Redshift.
- Used AWS Glue to infer and catalog the schema of the data as it arrives.
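A minimal sketch of the transform-and-cleanse step above, as a PySpark job that could run on EMR. The bucket names (example-ingest, example-curated) and all column names are hypothetical placeholders, and the job assumes S3 access is already configured.

```python
# A minimal sketch of an S3-to-S3 cleanse job on EMR. Bucket names
# (example-ingest, example-curated) and column names are hypothetical
# placeholders; S3 credentials/roles are assumed to be configured.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, trim, to_date

spark = SparkSession.builder.appName("s3-cleanse").getOrCreate()

# Read the raw extracts landed from on-premises into the ingest bucket.
raw = spark.read.option("header", "true").csv("s3://example-ingest/orders/")

# Basic cleansing: trim strings, cast types, drop rows missing key fields.
clean = (raw
         .withColumn("customer_id", trim(col("customer_id")))
         .withColumn("amount", col("amount").cast("double"))
         .withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))
         .dropna(subset=["customer_id", "order_date"]))

# Write Parquet partitioned by date so downstream queries can prune partitions.
(clean.write.mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3://example-curated/orders/"))
```

Writing partitioned Parquet back to S3 fits the Athena and Glue bullets above: a Glue crawler can catalog the curated output, and Athena can then query it with partition pruning.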
Lead Offshore Team
- Built data pipelines to load and transform large sets of structured and semi-structured data.
- Wrote Spark jobs implementing the required business logic.
- Improved the performance of existing jobs by tuning Spark components such as SparkContext, Spark SQL, DataFrames, pair RDDs, and accumulators (see the sketch following this list).
- Defined job flows for scheduled data processing.
- Managed data arriving from multiple sources.
- Supported MapReduce programs running on the cluster.
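A minimal sketch of the pair-RDD and accumulator usage mentioned above: an accumulator counts malformed records during parsing while reduceByKey aggregates a pair RDD. The input path and the two-column record layout are hypothetical placeholders, not the original job.

```python
# A minimal sketch, not the original job: counts malformed records with an
# accumulator while aggregating a pair RDD. The input path and two-column
# record layout are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pair-rdd-accumulator").getOrCreate()
sc = spark.sparkContext

# Accumulator: counts bad records on the executors without an extra pass.
bad_records = sc.accumulator(0)

def parse(line):
    """Parse 'key,value' lines; route malformed ones to the accumulator."""
    try:
        key, value = line.split(",")
        return [(key, int(value))]
    except ValueError:
        bad_records.add(1)
        return []

lines = sc.textFile("s3://example-ingest/events.csv")  # hypothetical path
totals = lines.flatMap(parse).reduceByKey(lambda a, b: a + b)  # pair RDD

print(totals.take(10))
# Accumulator values are reliable only after an action has run.
print("malformed records:", bad_records.value)
```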