Big Data Developer/Administrator Resume
NJ
SUMMARY
- An overall experience of around 7 years in professional IT services, including Big Data and cloud-based applications spanning multiple technologies and business domains
- Certified in Big Data/NoSQL technologies as a Developer/Administrator
- Hands-on experience with Hadoop ecosystem components - Hadoop MapReduce, HDFS, Oozie, HiveQL, Sqoop, HBase, MongoDB, ZooKeeper, Pig, and Flume on M5 and CDH 3 & 4 clusters, as well as EMR cloud computing with Amazon Web Services (AWS)
- Excellent understanding/knowledge of Hadoop (1 & 2) architecture; hands-on experience with components in the Hadoop ecosystem and knowledge of the Mapper/Reducer/HDFS framework for scalability, distributed computing (e.g., CDN), and high-performance computing
- Strong understanding of Storm architecture and its working structure for real-time processing
- Experience in analyzing data using Pig Latin, HiveQL, and custom MapReduce programming in Java and Python (streaming), using development tools such as Eclipse and Visual Studio
- Worked on NoSQL databases including Cassandra, MongoDB, MarkLogic, and HBase; managed and reviewed Hadoop log files and worked with HCatalog to open up access to Hive's Metastore
- Experience in importing/exporting data using Sqoop between HDFS and relational database systems/mainframes; used the Hadoop Streaming utility to run MapReduce jobs
- Good working knowledge of Hadoop administration activities such as installing clusters, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration, and installing and configuring HBase, HDFS, Pig, Hive, and Hadoop MapReduce
- Expertise in Tableau Server Configuration and Dashboard building
- Experience in data cleansing and visualization using Paxata, Tableau
- Hands-on experience configuring fully automated Microsoft cloud solutions such as Microsoft System Center, utilizing Hyper-V and VMware virtualization solutions
- Performed database administration activities such as backup, recovery, integrity checks, and index reorganization; involved in disaster recovery solutions such as replication and log shipping
- Hands-on experience in application development using Java, Python, C, COBOL, JCL, RDBMS, DB2, and Linux shell scripting
- Expertise in DB2 UDB, Oracle, SQL Server 2008/2005/2000, PL/SQL, and MySQL, with Oracle Workforce certifications
- Hands on experience in implementing Lambda Architecture
- Experience in working with Flume, Kafka, and Storm to handle large volumes of streaming data
- Hands-on experience in developing Sqoop jobs to import data from RDBMS sources such as MySQL, Oracle, and PostgreSQL into HDFS, as well as exporting data back
- Experience in writing workflows using Apache Oozie with job controllers such as Hive and Sqoop
- Expertise in handling multiple relational databases like SQL Server, PostgreSQL, MySQL and Oracle
- Expertise in NoSQL databases like MarkLogic, Cassandra, MongoDB, HBase
- Highly motivated, with strong analytical skills and excellent communication and interpersonal skills
TECHNICAL SKILLS
PROGRAMMING LANGUAGES: JAVA/J2EE | JAVASCRIPT | PYTHON | C | PL/SQL
NOSQL DATASTORE: MARKLOGIC | CASSANDRA | MONGODB | HBASE
BIGDATA TECHNOLOGIES: HADOOP ECOSYSTEM | SOLR
HADOOP DISTRIBUTIONS: CLOUDERA | HORTONWORKS | MAPR
HADOOP ECOSYSTEM: HDFS | MAPREDUCE | PIG | HIVE | OOZIE | SQOOP | ZOOKEEPER | MAHOUT | FLUME | SPARK | KAFKA
CLOUD PLATFORMS: AWS | MICROSOFT SYSTEM CENTER | AZURE
DATA PREPARATION/BI TOOLS: PAXATA | TABLEAU
AMAZON WEB SERVICES: EC2 | S3 | RDS | EMR | VPC | GLACIER | CLOUDWATCH | CLOUDTRAIL | IDENTITY AND ACCESS MANAGEMENT | DATA PIPELINE | CLOUDFORMATION | SES | SNS | REDSHIFT
NETWORKING: VIRTUALIZATION | RIP V2 | VLANS | NETWORK ADMINISTRATION - UNIX | WINDOWS
DATABASE: SQL SERVER | ORACLE 9i/10g
CONFIGURATION MANAGEMENT: CHEF | PUPPET
WEB TECHNOLOGY: RESTFUL SERVICES | HTML | XHTML | CSS | JAVASCRIPT
METHODOLOGY: AGILE SCRUM
IDES, FRAMEWORKS, TOOLS: JMS | MAVEN | ECLIPSE IDE | MS OFFICE | ADOBE APPS
SERVERS: APACHE TOMCAT
MAINFRAME APPLICATION: OS 360 | Z-OS | SQL | ORACLE 9I | 10G | COBOL | JCL | IMS DB | DB2 | TSO/ISPF | SDF | VSAM | FILE-MANAGER
SOFTWARE DEVELOPMENT: SOFTWARE PROJECT MANAGEMENT | SOFTWARE QUALITY ASSURANCE
OPERATING SYSTEMS: CENTOS | FEDORA | UBUNTU | Z/OS | OS 360 | WINDOWS 2008/2012
PROFESSIONAL EXPERIENCE
Confidential, NJ
Big Data Developer/Administrator
Responsibilities:
- Involved in analysis, design, and development of technical specifications incorporating various AWS cloud technologies, making use of parallel transformation frameworks such as MapReduce on EMR
- Wrote JSON templates to launch infrastructure configurations with AWS CloudFormation
- Moved centralized, in-house department store data to S3 for transformation and analytics
- Used Amazon Elastic MapReduce (EMR) to transform the data stored in S3 and store the cleansed data back to S3
- Wrote custom MapReduce jobs along with Hive and Pig scripts to perform the same transformations in EMR (a sketch follows this project entry)
- As part of the process, loaded transformed data from S3 into Redshift on a scheduled basis for warehousing
- Used AWS Data Pipeline to schedule and orchestrate the complete ETL workflow
- Installed and configured Tableau Server on an AWS EC2 instance connected to Redshift for BI analytics across departments
- Used Active Directory via AWS Directory Service to authenticate Tableau Server users
Environment: Amazon EMR, S3, Data Pipeline, Redshift, CloudFormation, Hive, Pig, Tableau Server
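The EMR cleansing step above can be illustrated with a minimal map-only MapReduce job. This is a sketch, not the original production code: the bucket paths, column count, and class names are placeholders.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CleanseRecordsJob {

    // Map-only cleansing step: keep only rows with the expected column count and a non-empty key field
    public static class CleanseMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length == 12 && !fields[0].isEmpty()) {   // column count is illustrative
                context.write(NullWritable.get(), value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "s3-cleanse");
        job.setJarByClass(CleanseRecordsJob.class);
        job.setMapperClass(CleanseMapper.class);
        job.setNumReduceTasks(0);                                 // no reduce phase needed
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("s3://store-data/raw/"));        // placeholder bucket
        FileOutputFormat.setOutputPath(job, new Path("s3://store-data/cleansed/")); // placeholder bucket
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Run on EMR, the same jar can be submitted as a custom JAR step, with the cleansed output then loaded to Redshift on the Data Pipeline schedule described above.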
Confidential, IL
Big Data Engineer
Responsibilities:
- Designed, developed, and deployed a Lambda Architecture
- Designed and implemented the persistence layer for sensor data collected on the app server through the MQTT and HTTP protocols
- IoT devices send real-time events through their built-in protocols, either MQTT or HTTP
- All events are captured by the respective brokers, which run behind a load balancer
- Implemented the messaging system using a distributed Kafka cluster; all brokers (or the app server for HTTP events) push incoming events through Kafka
- Designed and implemented the processing layer (speed layer) for these events using Apache Storm, consuming them from the Kafka cluster (a topology sketch follows this project entry)
- Implemented complex business rules in Storm and stored the processed data in Cassandra as well as Solr
- Storm also handles the ingestion of raw data from Kafka into HDFS
- HDFS serves the same raw data for batch analytics
- Designed the retrieval layer for the enterprise app, obtaining data from Cassandra as well as Solr
- Designed the data model in Cassandra for faster query retrieval, and the search mechanism in Apache Solr
- Built scheduled reporting on the raw datastore for BI analytics
Environment: Hadoop, Hive, Zookeeper, Storm, Cassandra, Java, Spring MVC, Apache Tomcat, Apache SOLR, Kafka, HortonWorks
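A minimal sketch of the speed-layer wiring described above, assuming the storm-kafka-client spout. The broker address, topic name, parallelism hints, and the simplified bolt are illustrative; the actual Cassandra/Solr/HDFS writes are only indicated in comments.

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

public class SensorEventTopology {

    // Placeholder for the business-rules bolt; the real bolt applied the rules and
    // persisted results to Cassandra and Solr, with a sibling bolt landing raw events in HDFS
    public static class BusinessRulesBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String event = tuple.getStringByField("value");   // raw Kafka record value
            // apply business rules, then write to Cassandra/Solr (omitted in this sketch)
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt in this sketch: no downstream stream declared
        }
    }

    public static void main(String[] args) throws Exception {
        // Broker address and topic name are placeholders
        KafkaSpoutConfig<String, String> spoutConfig =
                KafkaSpoutConfig.builder("kafka-broker:9092", "sensor-events").build();

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig), 2);
        builder.setBolt("rules-bolt", new BusinessRulesBolt(), 4).shuffleGrouping("kafka-spout");

        Config conf = new Config();
        conf.setNumWorkers(2);
        StormSubmitter.submitTopology("sensor-event-topology", conf, builder.createTopology());
    }
}
```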
Confidential
Responsibilities:
- Coded a Java Maven module to perform various Amazon S3 operations (a sketch follows this project entry)
- Wrote a custom MapReduce program to cleanse and enrich data in EMR
- Loaded transformed data back to S3
- Loaded data into a MarkLogic server in AWS
- Coded JavaScript for search operations to build a search app with MarkLogic
- Implemented keyword search queries with facets such as number of years and URIs against skill sets
- Coded a Python module for URI-based analysis
- Built a Python dictionary of predefined keywords from URI crawling data
- Ingested the above data into MarkLogic
- Built a faceted search on top of the data stored in MarkLogic
Environment: Amazon EMR, S3, Java, JavaScript, Python, MarkLogic (NoSQL)
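A minimal sketch of the kind of helper the S3 Maven module above might expose, using the AWS SDK for Java v1. The method names and the use of the default credentials chain are assumptions, not the original code.

```java
import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class S3Operations {

    // Client built from the default credentials/region chain
    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    // Upload a local file to the given bucket and key
    public void upload(String bucket, String key, File file) {
        s3.putObject(bucket, key, file);
    }

    // Download an object, e.g. for downstream MarkLogic ingestion
    public S3Object download(String bucket, String key) {
        return s3.getObject(bucket, key);
    }

    // List object keys under a prefix
    public void listKeys(String bucket, String prefix) {
        for (S3ObjectSummary summary : s3.listObjectsV2(bucket, prefix).getObjectSummaries()) {
            System.out.println(summary.getKey());
        }
    }
}
```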
Confidential, MN
Big Data Developer/Analyst
Responsibilities:
- Involved in analysis, design, and development of technical specifications incorporating various Big Data technologies
- Involved in the design and development phases of the Software Development Life Cycle (SDLC) using the Scrum methodology
- Managed the existing cluster; worked on commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration
- Developed the data pipeline using Flume and Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis
- Used Pig to perform data validation on the data ingested via Sqoop and Flume; the cleansed data set was pushed into HBase (a write sketch follows this project entry)
- Used Cassandra integrations and computed various metrics for reporting on the Tableau dashboard
- Extensively used Cloudera CDH distribution of Hadoop
- Developed job flows in Oozie to automate the workflow for Pig and Hive jobs
- Loaded the aggregated data from the Hadoop environment into DB2 using Sqoop for reporting on the dashboard
- Used Impala to connect with Tableau for reporting
- Participated in the requirement gathering and analysis phase of the project, documenting business requirements by conducting workshops and meetings with various business users
Environment: Red Hat Linux, HDFS, Cloudera CDH, Map-Reduce, Hive, Java JDK1.6, Pig, Sqoop, Flume, Zookeeper, Oozie, DB2, Impala, Tableau
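A minimal sketch of how a cleansed record might be written into HBase with the Java client, as referenced above. The table name, column family, row key, and values are illustrative rather than the project's actual schema.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CleansedRecordWriter {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // Table, column family, and row values below are placeholders
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("customer_behavior"))) {
            Put put = new Put(Bytes.toBytes("customer#12345"));   // row key: customer id
            put.addColumn(Bytes.toBytes("purchases"), Bytes.toBytes("order_count"), Bytes.toBytes("7"));
            put.addColumn(Bytes.toBytes("purchases"), Bytes.toBytes("last_order_total"), Bytes.toBytes("59.99"));
            table.put(put);
        }
    }
}
```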
Confidential, NY
NoSQL/DB2 BI Developer
Responsibilities:
- Loaded structured data from different applications, stored mainly in DB2, into MongoDB using JDBC-ODBC connectivity
- Converted data extracted from different RDBMS sources to JSON documents and pushed them to MongoDB (a loader sketch follows this project entry)
- Processed unstructured files such as XML and JSON using a custom-built Java API and pushed them into MongoDB
- Responsible for data modeling in MongoDB to load data arriving as both structured and unstructured data
- Developed a unified Turnover system that accepts both structured and unstructured data and inserts it into MongoDB according to the rules defined in the Turnover system
- Performed data validation in the Turnover system while ingesting data into MongoDB
- Developed data export options in the Turnover system from the data warehouse, allowing data to be loaded on demand into any RDBMS or exported as flat files
Environment: Linux, Java JDK1.6, MongoDB, Java J2EE, various RDBMS
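The RDBMS-to-MongoDB conversion described above can be sketched as follows, assuming the standard DB2 JDBC driver and the MongoDB Java driver. The JDBC URL, credentials, table, database, and collection names are placeholders.

```java
import java.sql.*;
import org.bson.Document;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;

public class Db2ToMongoLoader {
    public static void main(String[] args) throws SQLException {
        // Connection details and object names are placeholders
        try (Connection db2 = DriverManager.getConnection("jdbc:db2://db2host:50000/SAMPLE", "user", "pass");
             MongoClient mongo = MongoClients.create("mongodb://mongohost:27017")) {

            MongoCollection<Document> target = mongo.getDatabase("turnover").getCollection("accounts");

            try (Statement stmt = db2.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT * FROM ACCOUNTS")) {
                ResultSetMetaData meta = rs.getMetaData();
                while (rs.next()) {
                    // Convert each relational row into a JSON-style document, column by column
                    Document doc = new Document();
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        doc.append(meta.getColumnLabel(i), rs.getObject(i));
                    }
                    target.insertOne(doc);
                }
            }
        }
    }
}
```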
Confidential
Mainframe/ DB2 Developer
Responsibilities:
- Coded JCL with changes to storage disks
- Coded new COBOL modules
- Analyzed the structural schema for improvements
- Integrated new DB2 queries for module improvements
- Performed tool-based analysis with SPUFI, SDF, etc.
Environment: Z/OS, JCL, COBOL, DB2, CICS, SPUFI, QMF, SDF
Confidential
Responsibilities:
- Tracked modules for various DML operations on DB2
- Altered DB2 modules to improve efficiency
- Performed tool-based analysis with SPUFI, SDF, etc.
- Performed CICS operations on various modules for additions, deletions, updates, etc.
- Tuning operations on recovery modules
Environment: Z/OS, JCL, COBOL, DB2, CICS, SPUFI, QMF, SDF
Confidential
Responsibilities:
- Performed day-to-day maintenance on data retrieval
- Tracked modules for the various hierarchical relationships implemented in IMS DB
- Analyzed the structural schema for improvements
- Coded modules for fetch operations with IMS DB
- Tuning operations on recovery modules
Environment: Z/OS, JCL, COBOL, VSAM, IMS DB