- 8+ years of overall IT experience in application development with SQL databases and Big Data technologies (Hadoop, Spark).
- Expertise in end-to-end project planning and implementation, from scope management in various environments (release-based maintenance, custom application development, enterprise-wide application deployment, testing support, and quality management), in adherence to international guidelines and norms.
- Cloudera Certified Hadoop Developer with hands-on experience with major Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Oozie, and Flume.
- Experience in using Kettle by Pentaho.
- Expertise in setting up processes for Hadoop based application design and implementation.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Experience in managing and reviewing Hadoop log files.
- Experienced in processing Big data on the Apache Hadoop framework using MapReduce programs.
- Excellent understanding and knowledge of NoSQL databases like HBase and MongoDB.
- Experience in working with Windows, UNIX/LINUX platform with different technologies such as Big Data, SQL, XML, HTML, Core Java, Shell Scripting etc.
- Experience in giving training and guiding new team members in the Project.
- Experience in detailed system design using use case analysis and functional analysis, modeling programs with class, sequence, activity, and state diagrams using UML and Rational Rose.
- Proficient in Retail, Telecom and Banking Domains.
- Extensive experience in Hadoop and its components such as HDFS, MapReduce, Pig, Hive, Sqoop, HBase, Oozie, Spark, and Scala.
- Around 6 years of professional experience involving project development, implementation, deployment, and maintenance using Big Data technologies, designing and implementing complete end-to-end Hadoop-based data analytical solutions using HDFS, MapReduce, Spark, Scala, YARN, Kafka, Pig, Hive, Sqoop, Flume, Oozie, Impala, and HBase.
- Developed the Unix shell/Python scripts for creating the reports from Hive data.
- Involved in loading data from the UNIX file system to HDFS.
- Experience in working with different Hadoop distributions like CDH and Hortonworks.
- Experienced in improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Experience developing Pig Latin and HiveQL scripts for data analysis and ETL purposes, extending the default functionality by writing User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom data-specific processing.
- Strong knowledge of the architecture of distributed systems and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
- Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real time streaming at an enterprise level.
- Expert in working with Hive data warehouse tool-creating tables, data distribution by implementing partitioning and bucketing, writing and optimizing the HiveQL queries.
- In depth understanding of Hadoop Architecture and its various components such as Resource Manager, Application Master, Name Node, Data Node, HBase design principles etc.
- Experience developing iterative algorithms using Spark Streaming in Scala and Python to build near-real-time dashboards.
- Experience with migrating data to and from RDBMS and unstructured sources into HDFS using Sqoop.
- Experience in job workflow scheduling and monitoring tools like Oozie and good knowledge on Zookeeper.
- Profound understanding of Partitions and Bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
- Worked on NoSQL databases including HBase and MongoDB.
- Experienced in performing CRUD operations using the HBase Java Client API and the Solr API.
- Good experience in working with cloud environment like Amazon Web Services (AWS) EC2 and S3.
- Experience in Implementing Continuous Delivery pipeline with Maven, Ant, Jenkins and AWS.
- Experience writing Shell scripts in Linux OS and integrating them with other solutions.
- Strong Experience in working with Databases like Oracle 10g, DB2, SQL Server 2008 and MySQL and proficiency in writing complex SQL queries.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Excellent communication, interpersonal and analytical skills and a highly motivated team player with the ability to work independently.
- Experience using Jira for ticketing issues and Jenkins for continuous integration.
- Knowledge of Cassandra and AWS.
- Very good experience in customer specification study, requirements gathering, system architectural design and turning the requirements into final product.
- Experience in interacting with customers and working at client locations for real time field testing of products and services.
- Working experience with Jupyter Notebook for live coding, computational output, and documentation.
- Ability to work effectively with associates at all levels within the organization.
- Experience with Apache Hadoop components such as MapReduce, HDFS, Hive, Pig, and Sqoop, as well as Pivotal HD, HAWQ, and Alteryx.
- Strong background in mathematics with very good analytical and problem-solving skills.
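The Sqoop import/export work summarized above can be sketched as a small Python helper that assembles a typical `sqoop import` command line. This is an illustrative sketch only; the connection string, table, and target directory are hypothetical examples, not values from any actual project.

```python
# Illustrative sketch: assemble a typical Sqoop import command for moving an
# RDBMS table into HDFS. All connection values below are hypothetical.

def build_sqoop_import(jdbc_url, username, table, target_dir, num_mappers=4):
    """Return the argument list for a sqoop import of one table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
        "--as-textfile",
    ]

cmd = build_sqoop_import(
    "jdbc:mysql://dbhost:3306/sales",  # hypothetical source database
    "etl_user",                        # hypothetical account
    "orders",
    "/data/raw/orders",
)
print(" ".join(cmd))
```

In practice such a command would be run via the shell or wrapped in an Oozie action; the export direction (`sqoop export`) mirrors the same argument structure.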
Hadoop/Big Data: HDFS, MapReduce, Sqoop, Hive, Pig, HBase, ZooKeeper, cluster configuration, Flume, AWS
Databases: SQL, NoSQL (HBase), MySQL, Oracle, PL/SQL
Programming Languages: C, C++, C#, Java, SQL, Shell, Python
IDEs/Utilities: Eclipse
Web Technologies: J2EE, JMS, Web Services
Operating System: Windows, Mac, Linux and Unix
IDEs: Eclipse, Microsoft Visual Studio 2008/2012, Flex Builder
Version control: Git, SVN, CVS
Tools: FileZilla, Putty, PL/SQL Developer, JUnit
Confidential, Franklin Lakes, NJ
Sr. Big Data Engineer
- Web service and distributed systems development experience using Scala.
- Expertise in using the Alteryx data blending tool.
- Experience in Alteryx for quick implementation and end-to-end analytics.
- Created workflows to fetch data from different sources into HDFS using Alteryx and scheduled jobs.
- Strong Scala coding experience and web services development skills.
- Implemented Angular Router to enable navigation from one view to the next as the customer performs application tasks.
- Developed AngularJS components such as controllers, services, filters, and models.
- Experienced in Oracle PL/SQL; able to create functions in PL/SQL.
- Supported the design, development, and implementation of Big Data solutions for the Informatics Team.
- Built and automated a data engineering ETL pipeline over Snowflake using Apache Spark, integrated data from disparate sources with Python APIs, and consolidated them in a data mart (star schema).
- Planned and coordinated the analysis, design, and extraction of encounter data from multiple source systems into the data warehouse relational database while ensuring data integrity.
- Experienced with Docker and Kubernetes on multiple cloud providers, from helping developers build and containerize their applications (CI/CD) to deploying on either public or private cloud.
- Experience with the scikit-learn package in Python, as well as R and MATLAB, for applying machine learning algorithms.
- Experience with Kubernetes and its framework for distributed systems.
- Kubernetes experience with service discovery and load balancing.
- Experienced in microservices architecture using Spring Boot with the 12-factor app methodology; deployed, scaled, configured, and wrote manifest files for various microservices in PCF.
- Implemented REST microservices using Spring Boot; generated metrics with method-level granularity and persistence using Spring AOP and Spring Boot Actuator.
- Enhanced and expanded the encounter data warehouse model through sound detailed analysis of business requirements.
- Orchestrated the entire pipeline using Apache Airflow, delivering daily/weekly metric email reports from the Power BI server to facilitate on-the-go decision making for business users.
- Designed multiple deployment strategies and Continuous Integration / Continuous Delivery pipelines using Jenkins to ensure zero downtime and shortened deployment cycles.
- Designed, developed, and maintained applications using Scala, with excellent coding skills in Scala and Java.
- In charge of all architecture activities pertaining to applications.
- Used sbt to develop Scala-coded Spark projects and executed them using spark-submit.
- Involved in loading data from the UNIX file system to HDFS.
- Involved in developing a linear regression model to predict a continuous measurement, improving observation of wind turbine data, developed using Spark with the Scala API.
- Assessed requirements and determined the best technologies and design patterns to use for data-centric solutions.
- Experience in wealth and asset management; able to perform analysis using Hive, Oracle, and ETL processes and related tools.
- Design, build and launch extremely efficient & reliable data pipelines for real time streaming, search and Indexing solutions.
- Worked closely with Cloudera Administrators to optimize the usage of our clusters and plan for future expansion and usage.
- Developed Oozie workflows for daily incremental loads, which gets data from Teradata and then imported into Hive Tables
- Created Sentry policy files to provide access to the required databases and tables, viewable from Impala by the business users in the dev, test, and prod environments.
- Developed the Unix shell/Python scripts for creating the reports from Hive data.
- Developed AWS MapReduce (EMR)/Spark Python modules for machine learning & predictive analytics in Hadoop.
- Generated data cubes using Hive, Pig, and Java MapReduce on a provisioned Hadoop cluster in AWS.
- Created S3 buckets (configuration, policies, permissions) and used AWS S3 for storage and backup of data into AWS, with AWS Glacier to store archived data.
- Experienced in Spark performance tuning: setting the right batch interval, the correct level of parallelism, data structure tuning, and memory tuning.
- Used Apache Parquet to store large-scale data to increase Spark SQL performance.
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
- Developed a PySpark code for saving data into AVRO and Parquet format and building Hive tables on top of them
- Used Python and Spark to implement different machine learning algorithms including Generalized Linear Model.
- Implemented HBase/SOLR for real time search/indexing at production scale.
- Designed Control-M jobs by domain area, which resulted in longer wait times for other jobs that were not dependent.
- Created event-based job execution across various application projects.
- Evaluated the top 10 long running jobs for optimization and performance efficiencies.
- Allocated maximum server resources to the processes enabling parallel runs for all jobs with maximum threads per job.
- Worked with Sqoop in Importing and exporting data from different databases like DB2, AQDB into HDFS and Hive.
- Created queues and allocated cluster resources to provide priority for jobs.
- Captured the data logs from web server into HDFS using Flume for analysis.
- Worked in an agile environment, collaborate with Solution Architects, Scrum Masters, Developers and Testers.
Environment: Hadoop, HDFS, Hive, Impala, Machine Learning, AWS, IBM Q-Rep, Sqoop, HiveQL, Apache Airflow, Oozie, Spark, Avro, Parquet, Ab Initio, MQ, Flume, Kafka, Control-M, DB2, AQDB, Jenkins, Cloudera, Python, PySpark, Scala.
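The star-schema consolidation described in this role can be illustrated with a minimal pure-Python sketch of the idea (the real pipeline used Spark over Snowflake; the field names below are hypothetical examples, not the actual schema):

```python
# Pure-Python sketch of star-schema consolidation: split raw records into a
# de-duplicated dimension table and a fact table of keys plus measures.
# Field names ("customer_id", "amount", ...) are hypothetical.

def build_star_schema(records):
    """Split raw sales records into a customer dimension and a fact table."""
    customer_dim = {}  # customer_id -> dimension row, de-duplicated on key
    fact_rows = []
    for rec in records:
        cust_id = rec["customer_id"]
        customer_dim.setdefault(
            cust_id, {"customer_id": cust_id, "name": rec["customer_name"]}
        )
        # Fact rows keep only the foreign key and the measure.
        fact_rows.append({"customer_id": cust_id, "amount": rec["amount"]})
    return customer_dim, fact_rows

raw = [
    {"customer_id": 1, "customer_name": "Acme", "amount": 10.0},
    {"customer_id": 1, "customer_name": "Acme", "amount": 5.0},
    {"customer_id": 2, "customer_name": "Globex", "amount": 7.5},
]
dim, facts = build_star_schema(raw)
```

In the Spark version the same split would be expressed as DataFrame `dropDuplicates` and `select` operations before writing each table to the data mart.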
Confidential, Northbrook, IL
Big Data/Sr Hadoop Developer
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Developed java Map Reduce programs using core concepts like OOPS, Multithreading, Collections and IO.
- Gathered requirements, developed, and deployed a big data solution using BigInsights (the IBM Hadoop distribution).
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required
- Developed the Unix shell/Python scripts for creating the reports from Hive data.
- Helped the team to increase cluster size from 35 nodes to 118 nodes. The configuration for additional data nodes was managed using Puppet.
- Responsible to manage data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.
- Used Sqoop to import data into Cassandra tables from databases, and imported data from various sources to the Cassandra cluster using Java APIs.
- Supported HBase Architecture Design with the Hadoop Architect team to develop a Database Design in HDFS.
- Created Cassandra tables to load large sets of structured, semi-structured and unstructured data coming from Linux, NoSQL and a variety of portfolios.
- Involved in creating data-models for customer data using Cassandra Query Language.
- Developed multiple Map Reduce jobs in Java for data cleaning and preprocessing.
- Developed the Pig UDF'S to pre-process the data for analysis.
- Hands on writing Map Reduce code to make semi structured data as structured data and for inserting data into HBase from HDFS.
- Implemented a script to transmit information from Webservers to Hadoop using Flume.
- Used Zookeeper to manage coordination among the clusters.
- Used Apache Kafka and Apache Storm to gather log data and fed into HDFS.
- Created workflows in Oozie to automate tasks of loading data into Amazon S3 and preprocessing with Pig; utilized Oozie for data scrubbing and processing.
- Developed scripts and deployed them to pre-process the data before moving to HDFS.
- Performed extensive analysis on data with Hive and Pig.
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a High Availability cluster, and integrated Hive with existing applications.
- Integrated Big Data technologies and analysis tools into the overall architecture.
- Handled importing of data from various data sources and performed transformations using Hive and MapReduce.
- Analyzed data by performing Hive queries and running Pig scripts to understand user behavior.
Environment: Apache Hadoop, HIVE, PIG, HDFS, Zookeeper, Kafka, Java, UNIX, MYSQL, Eclipse, Oozie, Sqoop, Storm
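The MapReduce data-cleansing pattern used throughout this role can be illustrated with a Hadoop Streaming-style mapper and reducer in Python. This is a simplified word-count sketch of the map/reduce contract, not the actual production code, which was written in Java:

```python
# Simplified Hadoop Streaming-style map/reduce in Python. In a real streaming
# job, mapper and reducer would each read stdin and write tab-separated
# key/value lines; here they are plain functions for clarity.

def mapper(line):
    """Emit (word, 1) pairs for each whitespace-separated token."""
    for word in line.strip().lower().split():
        yield (word, 1)

def reducer(pairs):
    """Sum counts per key; assumes the framework has grouped pairs by key."""
    counts = {}
    for key, value in pairs:
        counts[key] = counts.get(key, 0) + value
    return counts

lines = ["Hadoop MapReduce", "hadoop streaming"]
pairs = [p for line in lines for p in mapper(line)]
result = reducer(pairs)
```

The same structure (tokenize in the map phase, aggregate in the reduce phase) underlies the cleansing and preprocessing jobs mentioned above.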
Confidential, Houston, Texas
- Designed and developed data loading strategies, transformation for business to analyze the datasets.
- Processed flat files in various file formats and stored them in various partition models in HDFS.
- Responsible for building, developing, and testing shared components used across modules.
- Responsible in performing sort, join, aggregations, filter, and other transformations on the datasets using Spark.
- Involved in developing a linear regression model for predicting continuous measurement.
- Responsible for implementing advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
- Experience in extracting appropriate features from data sets in order to handle bad, null, partial records using Spark SQL.
- Collected data using Spark Streaming from an AWS S3 bucket in near real time and performed the necessary transformations and aggregations to build the data model and persist the data in HDFS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
- Expert in implementing Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
- Managing and scheduling Jobs on a Hadoop cluster using Oozie.
- Used the Spark-Cassandra connector to load data to and from Cassandra.
- Experience in building Real-time Data Pipelines with Kafka Connect and Spark Streaming.
- Responsible for development with the Spark-Cassandra connector to load data from flat files into Cassandra for analysis.
- Imported data from different sources like AWS S3 and LFS into Spark RDDs.
- Responsible for creating consumer APIs using Kafka.
- Responsible for creating Hive tables, loading them with data, and writing Hive queries.
- Developed end-to-end data processing pipelines that begin by receiving data via the Kafka distributed messaging system and persist it into Cassandra.
- Worked on a POC to perform sentiment analysis of Twitter data using the OpenNLP API.
- Responsible for creating mappings and workflows to extract and load data from relational databases, flat file sources, and legacy systems using Talend.
- Developed and designed ETL jobs using Talend Integration Suite in Talend 5.2.2.
- Experienced in managing and reviewing log files using Web UI and Cloudera Manager.
- Involved in creating External Hive tables and involved in data loading and writing Hive UDFs.
- Experience using various compression techniques like Snappy, LZO, and gzip to save data and optimize data transfer over the network, using Avro and Parquet.
- Involved in unit testing and user documentation and used Log4j for creating the logs.
Environment: Apache Spark, Hadoop, HDFS, Hive, Kafka, Sqoop, Scala, Talend, Cassandra, Oozie, Cloudera, Impala, Linux
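The sort/join/filter/aggregate transformations described in this role can be sketched in pure Python (the actual work used Spark RDDs and DataFrames; the keys and values here are hypothetical examples):

```python
# Pure-Python sketch of the join/filter/aggregate/sort transformation chain;
# in Spark these would be join(), filter(), reduceByKey(), and sortByKey()
# over RDDs. All field names below are hypothetical.

def join_and_aggregate(events, users, min_amount):
    """Join events to users on user_id, drop small amounts, sum per name."""
    user_by_id = {u["user_id"]: u["name"] for u in users}  # broadcast-style map
    totals = {}
    for e in events:
        if e["amount"] < min_amount:          # filter transformation
            continue
        name = user_by_id.get(e["user_id"])   # join transformation
        if name is None:
            continue
        totals[name] = totals.get(name, 0) + e["amount"]  # aggregation
    return dict(sorted(totals.items()))       # sort by key

events = [{"user_id": 1, "amount": 4}, {"user_id": 1, "amount": 1},
          {"user_id": 2, "amount": 9}]
users = [{"user_id": 1, "name": "a"}, {"user_id": 2, "name": "b"}]
out = join_and_aggregate(events, users, min_amount=2)
```

Expressing the lookup table as a plain dict mirrors a Spark broadcast join, which avoids a shuffle when one side of the join is small.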
SQL Database Administrator/ SQL DBA
- Configured, Installed, and Maintained SQL Server 2000 & 2005 in development, test, and production environment
- Installed and Configured Operating Systems, Windows Server (2000 & 2003)
- Installed and Configured SQL Server Reporting Services (SSRS)
- Configured and Maintained Replications, Log Shipping
- Upgraded databases from SQL Server 7.0/2000 to 2005 in testing and production environment
- Upgraded MS Access 2000/97 databases into MS SQL Server 2000, and 2005
- Applied SP (Service Pack) and patches on SQL Server 2000, 2005, Windows Server 2003 Enterprise Edition
- Expertise in database Performance tuning
- Writing T-SQL and Stored-Procedures
- Strong working experience in creating, modifying tables, Index (Cluster/Non-Cluster), Constraints (Unique/Check), Views, Stored Procedures, Triggers
- Data Modeling, developing E-R Diagram using ERWin and SQL Server Data Diagram
- Backward & Forward Engineering with data modeling tools mentioned above
- Export & Import data from Flat file, CSV file, Oracle, and Access to/from SQL Server Database using DTS, SSIS, and BCP
- Generated reports using SSRS, as well as in Excel, HTML, and text, from the database.
- Working experience in Database Backup, Restore and Point-in-Time Recovery
- Scheduled jobs to automate different database related activities including backup, monitoring database health, disk space, backup verification
- Developed Different Maintenance Plans for database monitoring
- Monitored server activity, error logs, and space usage, and solved problems as needed.
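The table, index, constraint, and trigger work listed above can be sketched with SQLite standing in for SQL Server (T-SQL syntax differs in details, and the schema below is a hypothetical example, not a production one):

```python
# Sketch of table/index/trigger creation, using Python's sqlite3 as a
# lightweight stand-in for SQL Server; real T-SQL syntax differs somewhat.
# The "orders" schema is hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    amount   REAL NOT NULL CHECK (amount >= 0),  -- check constraint
    audited  INTEGER DEFAULT 0
);
CREATE INDEX ix_orders_amount ON orders(amount); -- non-clustered-style index

-- Trigger marks each inserted row as audited, mimicking an audit trigger.
CREATE TRIGGER trg_orders_audit AFTER INSERT ON orders
BEGIN
    UPDATE orders SET audited = 1 WHERE order_id = NEW.order_id;
END;
""")
cur.execute("INSERT INTO orders (amount) VALUES (25.0)")
conn.commit()
row = cur.execute("SELECT amount, audited FROM orders").fetchone()
```

In SQL Server the equivalent objects would be created with T-SQL (`CREATE NONCLUSTERED INDEX`, `CREATE TRIGGER ... ON ... AFTER INSERT`), but the design pattern is the same.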