Big Data Engineer Resume
Virginia Beach, VA
SUMMARY
- 3 years of experience in the software development industry, mainly focused on Big Data technologies and distributed computing.
- Experience developing MapReduce programs on Apache Hadoop for processing Big Data.
- Hands-on experience creating databases, schemas, and tables in PostgreSQL.
- Experience with SQL, PL/SQL and database concepts.
- Responsible for growth of new and existing accounts, utilizing knowledge of PaaS, IaaS, and SaaS.
- Experience creating SQL objects such as tables, stored procedures, views, indexes, triggers, functions, user-defined data types, rules, and defaults.
- Hands-on experience with the Hadoop ecosystem: HDFS for data storage and MapReduce for distributed processing.
- Good knowledge of Apache Spark 2.0 using Scala.
- Knowledge of real-time data processing using Apache Spark.
- Hands-on experience converting Hive/SQL queries into Spark transformations using Spark RDDs and Spark SQL (see the sketch after this list).
- Hands-on experience coding Apache Spark applications in Python for faster data processing.
- Worked on various individual projects relevant to data mining.
- Good understanding of concepts such as enterprise data warehousing, ETL, data modeling, and data mapping.
- Experience in designing and maintaining high-performing ETL/ELT processes.
- Experience working with Hive: creating tables, distributing data through partitioning and bucketing, and writing and optimizing Hive SQL queries per client requirements.
- Hands-on experience creating external and internal (managed) tables in Hive.
- Developed UNIX shell scripts for generating reports from Hive data.
- Work experience with cloud infrastructure such as Google Cloud and Microsoft Azure.
- Experience with NoSQL databases such as HBase and MongoDB.
- Ability to meet deadlines and handle multiple tasks; flexible with work schedules; excellent communication skills.
- Articulate in written and verbal communication, with strong interpersonal, analytical, and organizational skills.
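A minimal sketch of the Hive-to-Spark conversion described above; the sales table and its region and amount columns are hypothetical placeholders, not from a specific engagement:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    object HiveToSpark {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToSpark")
          .enableHiveSupport()   // read existing Hive tables through the metastore
          .getOrCreate()

        // The original HiveQL query, run unchanged through Spark SQL
        val bySql = spark.sql(
          "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

        // The same query expressed as DataFrame transformations
        val byApi = spark.table("sales")
          .groupBy("region")
          .agg(sum("amount").as("total"))

        byApi.show()
        spark.stop()
      }
    }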
TECHNICAL SKILLS
Programming: Python, R, Scala, Bash scripting, and Linux kernel development.
Hadoop ecosystem: Spark, MapReduce, Pig, Hive, Sqoop, and HBase.
Methodologies: Agile and Waterfall.
SQL databases: PostgreSQL and MySQL Server.
NoSQL databases: Cassandra, MongoDB, and HBase.
Data warehousing and data mining: Teradata, Hive, RapidMiner, and R.
Cloud services: Google Cloud and Microsoft Azure.
Apache Spark: Spark Core, DataFrames, Spark SQL, and Spark Streaming.
IDEs: Eclipse, IntelliJ, Eclipse Scala IDE, and PyCharm (Python).
PROFESSIONAL EXPERIENCE
Confidential, Virginia Beach
Big Data Engineer
Environment: Hadoop HDFS, MapReduce, Apache Spark 2.1.1, Apache Kafka 0.10.2, Spark Streaming, Apache Cassandra 3.10, Apache ZooKeeper 3.4.10, Microsoft Azure, and Scala 2.11.
Responsibilities:
- Worked on a data pipeline (data processing job) built on Spark Streaming and Kafka.
- Read input data, applied transformations, and wrote out the resulting output.
- Created a program with multiple pipelines.
- Worked with pipeline transforms, composite transforms, and root transforms.
- Processed both streaming and batch data using Apache Spark and Spark Streaming.
- Streamed data through the Kafka cluster; the collected data was fed to Spark Core via the Spark Streaming API.
- Data was divided into small chunks (micro-batches) and processed.
- Used Spark for cleaning, processing, extracting relevant fields and performing aggregations.
- Performed data transformations using Spark RDDs and DataFrames.
- Kafka was used as the messaging system to gather data from different sources, which was then processed and pushed into Kafka topics.
- Developed a Kafka consumer program in Scala (see the sketch after this list).
- Used Cassandra as the NoSQL database and gained solid working experience with NoSQL databases.
- Hadoop HDFS was used for archiving incoming data and performing ETL.
- Used Apache ZooKeeper for maintaining configuration information, naming, and providing distributed synchronization and group services.
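A minimal sketch of the Scala Kafka consumer with Spark Streaming described above, assuming Spark 2.1 with the spark-streaming-kafka-0-10 integration; the broker address, topic name, group id, and aggregation logic are placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaStreamJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaStreamJob")
        val ssc = new StreamingContext(conf, Seconds(10))   // 10-second micro-batches

        // Kafka consumer configuration (broker and group id are placeholders)
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "pipeline-consumer",
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean))

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Clean records, extract a key field, and aggregate per micro-batch
        stream.map(_.value)
          .filter(_.nonEmpty)
          .map(line => (line.split(",")(0), 1L))
          .reduceByKey(_ + _)
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }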
Confidential
Hadoop Developer
Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Shell Scripting, Apache Crunch, Ubuntu, and Red Hat Linux.
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, the HBase database, and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
- Implemented the Apache Crunch library on top of MapReduce and Spark for data aggregation.
- Involved in loading data from the Linux file system into HDFS (see the sketch after this list).
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Created HBase tables to store variable data formats of PII (personally identifiable information) data coming from different portfolios.
- Implemented a script to transmit sysprin information from Oracle to HBase using Sqoop.
- Implemented best income logic using Pig scripts and UDFs.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Responsible for managing data coming from different sources.
- Involved in loading data from the file system into HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Provided cluster coordination services through ZooKeeper.
- Experience in managing and reviewing Hadoop log files.
- Managed jobs using the Fair Scheduler.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
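A minimal sketch of loading files from the Linux file system into HDFS, as mentioned above, using the Hadoop FileSystem API from Scala; the source and destination paths are placeholders:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object LoadToHdfs {
      def main(args: Array[String]): Unit = {
        // Picks up core-site.xml / hdfs-site.xml from the classpath
        val conf = new Configuration()
        val fs = FileSystem.get(conf)

        // Copy a local file into an HDFS landing directory (paths are placeholders)
        val localFile = new Path("file:///data/incoming/records.csv")
        val hdfsDir   = new Path("/user/hadoop/landing/")
        fs.copyFromLocalFile(localFile, hdfsDir)

        fs.close()
      }
    }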
Confidential
Java Developer
Environment: JAVA, Eclipse IDE, HTML, PL/SQL.
Responsibilities:
- Worked on designing and developing the web application user interface and implemented its related functionality in Java/J2EE for the product.
- Designed and developed applications using JSP, Servlets and HTML.
- Used the Hibernate ORM module as an object-relational mapping tool for back-end operations.
- Provided the Hibernate configuration and mapping files, and was involved in integrating Struts with the Hibernate libraries.
- Extensively used Java multi-threading for downloading files from a URL (see the sketch after this list).
- Extensively used Eclipse IDE for developing, debugging, integrating and deploying the application.
- Developed Web Service client interface for invoking the methods using SOAP.
- Created a navigation component that reads the next-page details from an XML config file.
- Developed applications with HTML, JSP and Tag libraries.
- Developed required stored procedures and database functions using PL/SQL.
- Developed, tested, and debugged various components on WebLogic Application Server.
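A minimal sketch of the multi-threaded file-download pattern mentioned above, written in Scala against the same java.net and java.util.concurrent JVM APIs; the URLs and pool size are placeholders:

    import java.io.FileOutputStream
    import java.net.URL
    import java.util.concurrent.Executors

    object ParallelDownloader {
      def main(args: Array[String]): Unit = {
        // Placeholder URLs; each download runs on its own pool thread
        val urls = Seq(
          "https://example.com/files/report1.pdf",
          "https://example.com/files/report2.pdf")

        val pool = Executors.newFixedThreadPool(4)

        urls.foreach { address =>
          pool.submit(new Runnable {
            override def run(): Unit = {
              val in  = new URL(address).openStream()
              val out = new FileOutputStream(address.split("/").last)
              try {
                // Copy the remote stream to a local file in 8 KB chunks
                val buf = new Array[Byte](8192)
                var n = in.read(buf)
                while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
              } finally { in.close(); out.close() }
            }
          })
        }

        pool.shutdown()
      }
    }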
Software Assurance and Risk Mitigation
Confidential, Montgomery
Responsibilities:
- Studied iPad technology in hospitals: the usage of iPads by doctors to enter all the required patient information.
- Security challenges: CIA (confidentiality, integrity, availability).
- Gained experience and a deeper understanding of software-induced security risks and how to manage them.
- Worked with FIPS 200, NIST SP 800-30 Rev. 1, and NIST SP 800-53 Rev. 4.
Confidential, Montgomery, Alabama
Quantitative Risk Assessment and Management
Responsibilities:
- Conducted a research survey as part of the Cyber Systems and Information Security degree program requirements.
- The survey comprises questions related to risk areas associated with the respondent's particular field of work.
- The purpose of the survey is to utilize experienced company personnel to help identify vulnerabilities in the area of banking.
- The survey will assist in extracting the specific threats behind each vulnerability and any countermeasures employed to minimize the risk/impact of vulnerability exploitation.
- Once a sufficient number of sample surveys are collected from each company, the results will be input into a security risk assessment program developed at the university.
- The program will analyze the vulnerability, threat, and countermeasure survey results, calculate a cost to mitigate the identified vulnerabilities, and produce an overall residual risk percentage.
- The program will then optimize the results down to a desired risk percentage.
- The optimized results include a cost to optimize and recommendations on areas to focus on to achieve the best return on investment.
- The results of the optimization will be made available to the company along with a detailed explanation of the findings.
- The results of the survey/optimization process may be used in published articles from the university, but the actual company name will never be identified.